My Mega Drive emulator has gotten pretty big: it has multiple core components (68000 emulator, Z80 emulator, YM2612 emulator, etc.), two separate frontends (a standalone SDL2/Dear ImGui frontend, and a libretro frontend), and even some tools which I never committed, like a 68000 test suite and a performance benchmarker. This all creates a pretty bloated Git repository that pulls in a whole bunch of dependencies, despite the core emulator being a lightweight blob of ANSI C code with no dependencies beyond the C standard library.
I think that this harms clownmdemu as a software library, since anyone who wants to use the core emulator as a Git submodule in their project has to pull in a bunch of unrelated and unnecessary code. This is a problem that I've encountered with other libraries like libdeflate and libxmp: all I want to do is compile and link the library, but the all-or-nothing nature of Git submodules means that I have to check out their test suites, documentation, and example code, none of which gets used at all.
Another issue that monolithic repositories create is with notifications: a person who likes to keep up to date on a project's development may not be interested in certain subprojects, and so does not want to be notified when commits are made that only affect those subprojects. I, for one, hate being disappointed by seeing a project that I like at the top of my 'recently updated starred repositories' list, only for the update to be some boring test suite maintenance.
Finally, monolithic repositories create problems with build reliability. When regression testing, there are few things more frustrating than failed builds, as they either bring the regression testing process to a grinding halt while the build errors are addressed, or cause the commit to not be tested at all, potentially resulting in the cause of the regression being missed. When you have a repository with a library in it, as well as multiple standalone projects which use that library, then any backwards-incompatible change to that library will cause all of those standalone projects to break. Until those projects are fixed, there will be commits where they cannot be built or run properly, complicating later regression testing. By giving each project its own repository, those projects are able to include the library's repository as a Git submodule, allowing them to use a specific commit snapshot of the library that is known to work. With this, the number of commits in the project's repository where the project cannot build or execute properly is reduced, potentially by a great amount.
Despite all of these downsides, I've only ever seen one project split across multiple repositories: mupen64plus, which has repositories for its emulation core, frontends, and video/audio/controller plugins. And yet, I don't think mupen64plus does this because of any of the aforementioned downsides, but rather only because the mupen64plus project is just the emulation core: the frontends and plugins are all developed by third parties, and are essentially unofficial extensions.
After a lot of fighting with git filter-repo, I've split the original clownmdemu repository into four: the emulator core, the standalone frontend, the libretro frontend, and common frontend components such as the FM/PSG mixer. Repositories for the 68000 test suite and performance benchmarker are being worked on too. Since their modular design makes them easy to use in other projects, I may also split the Z80, YM2612, and SN76496 emulators into their own repositories at some point.