I’ve been working on a C++ project (blame Qt), and I recently stumbled across an issue that seemed to be caused by not following the Rule of 3/5: after ‘reconstructing’ an object by assigning a newly-constructed temporary object to it, my program began crashing with some kind of use-after-free error.
I decided to do some research, which sent me down the rabbit hole that is copy constructors, copy assignment operators, move constructors, and move assignment operators.
After I picked my jaw up off the floor, I set about adding these to one of my classes. Unfortunately, the code was quite large, verbose, and full of duplication:
I didn’t like this, especially since a copy/move constructor and its corresponding assignment operator seemed to mostly do the same thing – could these not share code somehow?
A method I found that did allow a constructor and assignment operator to share code was the ‘copy and swap idiom’. Not only that, but it also allowed copy constructors/operators to share code with move constructors/operators. This code compactness seemed great, but I didn’t like that the process of swapping required a third, temporary object. Considering that my objects were responsible for large buffers, this seemed like an awful waste of RAM.
The code that I’d written had a lot of duplication: the code used to copy/move each of the object’s buffers was exactly the same. This had me wondering if I could make a buffer class that would allow me to make both buffers share their copy/move code. But, wait, doesn’t C++ already have a bunch of container classes that do that? After giving it some thought, I settled on replacing my class’s buffers with vectors, and, as a result, I was able to greatly simplify the constructors and assignment operators:
My, it’s so minimal! It’s so sleek! It’s so efficient that the destructor doesn’t need any code!
Wait… the destructor doesn’t need any code?
The Rule of 3/5 says that if you need a copy/move constructor, copy/move assignment operator, or a destructor, then you probably need all of them. But clearly I don’t actually need an explicit destructor anymore, as the default implicit one will do the job just fine!
Actually… now that I think about it, all of those methods can be replaced with their defaults.
After doing that, here’s my code:
That’s right: there isn’t any! By using the proper containers, I don’t need explicit copy/move constructors, copy/move assignment operators, or even a destructor anymore! The code’s practically writing itself!
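To make the end result concrete, here is a minimal sketch of what such a class can look like. The class name, its members, and the `Reset` helper are all hypothetical and only serve as an illustration:

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical class that is responsible for two large buffers.
// It declares no destructor, no copy/move constructors, and no copy/move
// assignment operators: the implicit ones copy or move each std::vector.
class Recording
{
public:
    explicit Recording(const std::size_t total_samples)
        : left(total_samples, 0)
        , right(total_samples, 0)
    {}

    std::vector<std::int16_t> left;
    std::vector<std::int16_t> right;
};

// 'Reconstructs' an object by assigning a newly-constructed temporary to it.
// The implicit move assignment operator hands the temporary's buffers over
// to 'recording', and the old buffers are freed exactly once.
inline void Reset(Recording &recording, const std::size_t total_samples)
{
    recording = Recording(total_samples);
}
```

Copying a `Recording` duplicates both buffers, moving one transfers them without copying, and neither requires a single line of hand-written code.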
Dear ImGui’s default style looked a bit tacky and ‘programmer art’-like to me, so I’ve made my own:
I’ve tried to create a typical ‘dark mode’ theme while still keeping a high degree of contrast in order to maintain good legibility. Another focus of the style is technical and artistic minimalism, hence the removal of borders, tab/scrollbar rounding, and colour (colour introduces visual noise, and rounding greatly increases the number of rendered polygons).
Optimised VRAM Viewer
The VRAM viewer wasn’t very efficient: it would try to render every single tile even when only a fraction of them are actually visible to the user. This is something that can be seen with Dear ImGui’s built-in debugger:
As the debugger shows, hidden tiles are rendered above and below the section of the viewable region. This causes 4096 polygons to be drawn in total, when far fewer are actually needed.
I discovered that Dear ImGui has a feature specifically intended to address this kind of problem: the List Clipper! The List Clipper automates the process of selecting only the elements in a list that are visible to be rendered. Applying this to the VRAM viewer only required the smallest bit of refactoring (changing a single for-loop that iterated over each tile into two for-loops that iterate over each row and then each tile in each row) as well as the removal of a small hack, and the problem was solved!
In this example, the number of polygons has been reduced to just over 500!
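The arithmetic behind this kind of clipping can be sketched without Dear ImGui at all. To be clear, this is not the List Clipper’s API or implementation: it is just an illustration of the idea, and `RowRange` and `VisibleRows` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

struct RowRange
{
    std::size_t begin; // The first visible row.
    std::size_t end;   // One past the last visible row.
};

// Given a scrollable list of equally-tall rows, works out which rows overlap
// the visible region, so that only those rows need to be rendered.
inline RowRange VisibleRows(const float scroll_y, const float view_height, const float row_height, const std::size_t total_rows)
{
    if (row_height <= 0.0f || view_height <= 0.0f || total_rows == 0)
        return {0, 0};

    const float top = std::max(scroll_y, 0.0f);
    const float bottom = top + view_height;

    const std::size_t begin = std::min(static_cast<std::size_t>(top / row_height), total_rows);
    // Round up so that a partially-visible final row is still rendered.
    const std::size_t end = std::min(static_cast<std::size_t>(std::ceil(bottom / row_height)), total_rows);

    return {begin, end};
}
```

This is why the single loop over every tile had to become two loops: the outer loop only visits the rows in the returned range, and the inner loop draws each tile in that row.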
Support for the YM2612’s Timers
Huzzah – an improvement to the actual Mega Drive emulation!
The Mega Drive’s primary sound chip – the YM2612 – has a nifty little feature that went tragically underused by Sega: a pair of timers that are capable of raising CPU interrupts. This is notable because the only other interrupts of its kind are the VDP’s V-blank and H-blank interrupts, but those interrupts’ timings vary based on whether the console is a PAL or NTSC model. Additionally, these timers are fully configurable, allowing for the possibility of arbitrarily-timed interrupts!
…And Sega didn’t think to connect these timers to either of the Mega Drive’s CPUs. What a waste.
These timers are still usable, but they must be manually polled by the CPU to check if they’ve expired yet. This wastes precious CPU time, and squanders the timers’ potential for certain uses.
Games typically use these timers for controlling the timing of their sound engines. There are two alternative ways to achieve this, but they both have their own downsides:
One way to control the speed of music and sound effects is to use the V-blank interrupt. However, this interrupt occurs less often on PAL consoles, in turn causing the sound engine to update less often. This causes music and sound effects to play slower on PAL consoles, which is perhaps best known from Sonic the Hedgehog (1991). The game’s sequels avoid this by detecting PAL consoles and forcing every fifth V-blank interrupt to update the music twice, resulting in it having roughly the correct speed, albeit with some minor distortion. Using the YM2612’s timers instead would have avoided this issue entirely as they are almost exactly the same on PAL consoles as they are on NTSC consoles.
Another method of controlling the speed of audio is to manually time code execution by writing the code so that it uses a certain number of CPU cycles. This approach is quite extreme, but it does see heavy use in Z80 code that is responsible for feeding PCM samples to the YM2612’s DAC channel (such code is called a ‘DAC driver’). Most DAC drivers use idle loops to waste CPU cycles until it is time to send the next sample. Were the YM2612’s timers capable of raising CPU interrupts, then this technique would be largely unnecessary except for in the most advanced DAC drivers.
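The ‘every fifth V-blank updates twice’ trick from the first method is easy to check with some arithmetic. This sketch assumes roughly 50 V-blanks per second on PAL and roughly 60 on NTSC; the function name is hypothetical and this is not code from the games themselves.

```cpp
// Counts how many times the sound engine is updated over a number of
// V-blank interrupts, with or without the PAL compensation.
inline unsigned int CountSoundUpdates(const unsigned int total_vblanks, const bool pal_fix)
{
    unsigned int updates = 0;
    unsigned int counter = 0;

    for (unsigned int i = 0; i < total_vblanks; ++i)
    {
        ++updates; // The regular once-per-V-blank update.

        if (pal_fix && ++counter == 5)
        {
            counter = 0;
            ++updates; // Every fifth V-blank updates a second time.
        }
    }

    return updates;
}
```

With the compensation enabled, 50 V-blanks produce 60 updates, which matches what 60 V-blanks produce without it. The extra updates arrive in bursts rather than being evenly spaced, which is where the minor distortion comes from.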
The YM2612 timers are quite rudimentary: every time the YM2612 outputs a full frame of audio, Timer A is decremented, and every 16 times a full frame is output, Timer B is decremented.
Unfortunately, I ended up not noticing this, and redundantly refactoring my YM2612 emulator to operate in cycles instead of audio frames (there are 144 cycles to 1 audio frame). The reason that I made this mistake was that the (unofficially translated) official documentation for the YM2608 (the chip that the YM2612 is derived from) states that Timer A decrements every 72 cycles, not 144. And yet, my own testing and the documentation found here both suggest that the timers are twice as slow as the YM2608 manual claims that they are. This may have something to do with how the YM2612 differs from the YM2608 in how it “mixes” all six channels together (it doesn’t: it just cycles between outputting each one hundreds of thousands of times per second).
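The frame-based scheme described above can be sketched as follows. This is a simplified illustration rather than my emulator’s actual code: the names are hypothetical, and the reload-on-expiry behaviour in particular is an assumption.

```cpp
struct Timer
{
    unsigned int reload_value = 1; // Assumption: the counter restarts from this upon expiring.
    unsigned int counter = 1;
    bool expired = false;          // The flag that the CPU has to poll manually.

    void Decrement()
    {
        if (counter != 0)
            --counter;

        if (counter == 0)
        {
            expired = true;
            counter = reload_value;
        }
    }
};

struct Timers
{
    Timer a, b;
    unsigned int frames_until_timer_b = 16;

    // Called once for every full frame of audio that the YM2612 outputs.
    void OnAudioFrame()
    {
        // Timer A is decremented every audio frame...
        a.Decrement();

        // ...while Timer B is only decremented every 16 audio frames.
        if (--frames_until_timer_b == 0)
        {
            frames_until_timer_b = 16;
            b.Decrement();
        }
    }
};
```

An emulator that works in cycles rather than audio frames would simply call `OnAudioFrame` once every 144 cycles.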
Since these timers are typically used to time sound engines, the lack of emulating this feature causes many games to output no audio. Such games include Vectorman and Castlevania: Bloodlines. With support for the timers added, these games now produce audio.
Another game affected by this is an old ROM-hack of mine – ‘Sonic 2 except the music goes as fast as you do’:
Unlike the vanilla Sonic the Hedgehog 2, this ROM-hack uses Timer A to control the speed of the sound engine. By adjusting the timer, the speed of the music and sound effects is changed!
clownmdemu’s frontend is a rather complex bit of software, so, to do the things that it does, it leverages a number of open-source libraries (and fonts). These libraries are made available under certain conditions: for example, a library may require that its authors are credited in the documentation of any software that uses it, while other libraries go a step further and require that an entire copy of the library’s licence is provided in said documentation. The libraries used by my frontend tend to require the latter.
Being a minimalist, I don’t like the idea of every release of my frontend bundling the executable with a dozen text files, so instead I had the idea of embedding the licences into the frontend itself. To this end, there is now an ‘About’ menu which gives a brief overview of what the program is and provides a list of open-source licences.
This makes it a lot easier for myself and anyone else who uses my emulator to abide by the various licences, as there no longer has to be any worry about forgetting to reproduce licences with every binary distribution.
Personally, I think the requirement to reproduce a big blob of legalese with every binary distribution of non-copyleft software is stupid, which is why my libraries are usually zlib- or 0BSD-licensed instead, as they don’t have that requirement.
Support for the Window Plane
Another big feature for Mega Drive emulation!
The Window Plane is an oddity to me: the first two Sonic games (whose codebases I am very familiar with due to spending over a decade reverse-engineering them) never use it, so it’s completely alien to me. That’s why it took me so long to add support for the Window Plane: I never had the need nor knowledge to.
The Window Plane is a bizarre override for Plane A. Unlike Plane A, the Window Plane cannot be scrolled, but it is otherwise capable of everything that Plane A is. The Window Plane is not rendered on top of Plane A, like Plane A is to Plane B, but rather the Window Plane renders instead of Plane A. The VDP specifies two boundaries – one vertical and one horizontal – that determine where Plane A stops being drawn and the Window Plane starts being drawn instead (or vice versa).
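As a rough sketch, choosing between the two planes could look like the following. The names are hypothetical, positions are in pixels for simplicity, and combining the two boundaries with a logical OR is an assumption on my part rather than something taken from Sega’s documentation.

```cpp
struct WindowBoundary
{
    unsigned int position; // Where the boundary lies, in pixels.
    bool window_is_after;  // true: the Window Plane covers 'position' onwards.
                           // false: the Window Plane covers everything before 'position'.
};

// Checks whether a coordinate lies on the Window Plane's side of a boundary.
inline bool OnWindowSide(const WindowBoundary &boundary, const unsigned int coordinate)
{
    return boundary.window_is_after ? coordinate >= boundary.position : coordinate < boundary.position;
}

// Decides whether the Window Plane should be drawn instead of Plane A at a given pixel.
// Assumption: being on the window side of either boundary is enough.
inline bool UseWindowPlane(const WindowBoundary &horizontal, const WindowBoundary &vertical, const unsigned int x, const unsigned int y)
{
    return OnWindowSide(horizontal, x) || OnWindowSide(vertical, y);
}
```

A HUD that occupies the top 32 rows of pixels would use a vertical boundary of `{32, false}` and a horizontal boundary of `{0, false}`, the latter of which covers nothing.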
This feature tends to be used for drawing a HUD that does not scroll with the game’s foreground or background. An example of a game that does this is Castlevania: Bloodlines:
This feature has a glaring bug on real Mega Drives: if the Window Plane is drawn to the left of Plane A, and Plane A is scrolled horizontally by a number of pixels that is not a multiple of 16, then the two columns of Plane A tiles that are next to the Window Plane will be “disfigured”. This bug is noted in Sega’s official Mega Drive developer documentation (the “Genesis Software Manual”, page 50). My emulator does not yet reproduce this bug, but I plan to add it in the future.
One other bit of software that I have on-hand to test the Window Plane is a little bit of homebrew that I found here, which was made by someone called ‘Fonzie’. It’s useful for illustrating the aforementioned bug on real Mega Drives, but it’s also handy for testing that emulators support the Window Plane. Here’s a screenshot of it running in my emulator:
Like the Sprite Plane, Plane A, and Plane B, the Window Plane can be disabled in my frontend’s debugging toggles menu. As a novelty, disabling the Window Plane will cause Plane A to be drawn in its place. This allows the user to see the part of Plane A that is “hidden” by the Window Plane.
Support for the 68000’s BCD Instructions
The 68000 has three instructions for performing binary-coded-decimal arithmetic. The long and short of it is that performing a BCD addition between 0x1 and 0x9 results in 0x10 instead of 0xA. This is useful in situations where you need to extract individual decimal digits from a number but don’t want to resort to repeatedly dividing by 10 to do so, as, if the number is in BCD format, you can instead just use bit-shifts and bit-masks, which are way faster than divisions. The two games that I know of which use BCD instructions use them for HUD elements, which makes sense since each digit needs to be extracted so that it can be used to determine which number graphic to display on the HUD.
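Here is a small sketch of both halves of that: a BCD addition and the cheap digit extraction that it enables. The function names are hypothetical, and the addition only handles valid BCD inputs: it makes no attempt to reproduce what a real 68000 does when given non-BCD values.

```cpp
#include <cstdint>

// Adds two packed BCD bytes (two decimal digits per byte).
// 'carry' is both an input and an output, so that additions can be chained.
inline std::uint8_t AddBCD(const std::uint8_t a, const std::uint8_t b, bool &carry)
{
    unsigned int low = (a & 0xF) + (b & 0xF) + (carry ? 1 : 0);
    unsigned int high = (a >> 4) + (b >> 4);

    // If the low digit overflowed past 9, carry into the high digit.
    if (low > 9)
    {
        low -= 10;
        ++high;
    }

    // If the high digit overflowed past 9, carry out of the byte.
    carry = high > 9;

    if (carry)
        high -= 10;

    return static_cast<std::uint8_t>((high << 4) | low);
}

// Extracting decimal digits needs only a shift and a mask: no division by 10.
inline unsigned int TensDigit(const std::uint8_t bcd)
{
    return bcd >> 4;
}

inline unsigned int OnesDigit(const std::uint8_t bcd)
{
    return bcd & 0xF;
}
```

Adding 0x01 to 0x09 this way gives 0x10, whose digits can be read straight out as 1 and 0.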
The reason for these instructions taking so long to be implemented, besides them being quite niche, is that they have an absurd number of undocumented behaviours and edge-cases. For instance, the BCD instructions are some of the only instructions to have their overflow condition code behaviour officially specified by Motorola themselves as ‘undefined’, meaning that the official documentation is of no help in understanding how it works. The documentation also fails to explain what is supposed to happen when a BCD operation is performed on non-BCD numbers. For instance, what’s supposed to happen when 2 is added to 0xC? Is the result 0xE? 0x12? 0x10?
Luckily, other emulator developers encountered this problem too, and exhaustively documented how a real 68000 performs its BCD operations, figuring out every quirk and feature. This information has been collected in this SpritesMind thread. Most notably, Flamewing provided some homebrew that you can run on an emulator or a real Mega Drive to verify that its BCD instructions do as they should. Safe to say, my emulator now passes all of this homebrew’s tests!
With this, three of the 68000’s quirkiest instructions are now fully emulated!
The lack of BCD instruction emulation broke Castlevania: Bloodlines in a couple of humorous ways: it was impossible to use items because the ‘gem’ counter would never go above 0, and it was also not possible to get a Game Over because the lives counter would never go below 0, which effectively gave the player infinite lives.
Vastly-Improved Z80 Emulation Accuracy
Some people have told me that certain Sonic ROM-hacks were missing their DAC audio output when run in my emulator. One such hack that I was able to verify this with was Sonic 1 Megahack: Ultra Edition, which lacked the music and sound effects during the “Sonic have pased” popup after completing a level.
Unlike my 68000 emulator, my Z80 emulator had not been verified against any test suites, so it was likely that my Z80 emulator had a number of bugs and inaccuracies that these ROM-hacks were invoking, causing them to misbehave and not output audio through the DAC channel.
Back when I initially developed my Z80 emulator, I had read a blog post that detailed something called ‘ZEXDOC’, which is a program that is written in Z80 assembly and can be run on both real and emulated Z80 CPUs to verify that each instruction performs properly, which it does by analysing RAM, registers, and flags before and after each instruction. The catch was that this program is intended to be run on CP/M, which is an old operating system for the Z80. With that said, the dependency on CP/M was quite minimal: ZEXDOC only used two of CP/M’s console-printing system calls, and expected itself to be placed at address 0x100.
I was able to quickly rig up a small program that implemented just enough of CP/M to get ZEXDOC running on my Z80 emulator, and it signalled that many instructions were not working as intended:
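Handling those two system calls takes very little code. The following is a hedged sketch rather than my actual test harness: it assumes CP/M’s usual convention of the program calling address 0x0005 with the function number in register C, where function 2 prints the character in register E and function 9 prints the ‘$’-terminated string that register pair DE points to. The `Memory` type and the function’s interface are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

using Memory = std::array<std::uint8_t, 0x10000>;

// To be called by the emulator when the Z80's program counter reaches 0x0005.
// The register values are passed in directly, and output is appended to 'console'.
inline void HandleBDOSCall(const std::uint8_t c, const std::uint16_t de, const Memory &memory, std::string &console)
{
    switch (c)
    {
        case 2: // Print the character in register E (the low byte of DE).
            console += static_cast<char>(de & 0xFF);
            break;

        case 9: // Print the '$'-terminated string at address DE.
            // The loop is bounded so that a missing terminator cannot hang the host.
            for (std::size_t i = 0; i < memory.size(); ++i)
            {
                const std::uint8_t byte = memory[(de + i) & 0xFFFF];

                if (byte == '$')
                    break;

                console += static_cast<char>(byte);
            }
            break;

        default: // Any other system call is not needed here, so it is ignored.
            break;
    }
}
```

Aside from this, the harness only has to load the program at address 0x100 and start executing from there.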
Unfortunately, ZEXDOC gave little feedback as to exactly which instructions went wrong and how they went wrong. Not to mention, if my emulator is performing Z80 instructions incorrectly, and ZEXDOC uses Z80 instructions to perform the calculations that determine whether other Z80 instructions work correctly, then how can I be sure that its calculations are being performed correctly? What if some of these failed tests were false-positives?
So I couldn’t trust ZEXDOC’s output, nor make attempts to narrow-down exactly what inaccuracies my Z80 emulator had.
This problem didn’t exist with my 68000 emulator’s test suite, as the actual tests are not performed by the emulated 68000 itself but rather the host computer’s CPU. If I could find a test suite like that for the Z80, then that would be a great help. Not to mention that a test suite like the one used for my 68000 emulator would do a much better job of helping me verify exactly which instructions are incorrect and why, since it lists the exact contents of each register and memory address before and after the tested instruction’s executions.
I was in luck, because I stumbled across a Z80 test suite that was almost identical to the test suite that I used for my 68000 validator!
Due to its similarity to the 68000 test suite, I was able to make a modified version of my 68000 validator read the Z80 test suite’s data, and begin performing said tests on my Z80 emulator. It immediately uncovered numerous bugs, of which some were minor (such as undocumented flag behaviours) and others were severe (such as entire swaths of instructions using the wrong operands).
Soon, I had my Z80 emulator passing every test, with the exception of instructions which I had yet to implement in the first place such as IN, OUT, CPI, CPIR, and DAA. Now confident that my emulator could run ZEXDOC at least somewhat properly, I tried it again and, this time, most of the tests passed:
The only failed tests were for instructions which I had not yet implemented in my emulator. I have never seen these instructions used in any Z80 code for the Mega Drive, be it code from official games or homebrew. Because of this, implementing these instructions was never a priority for me, and I had no test-cases for them either. However, because both the Z80 test suite and ZEXDOC provide exhaustive tests for these instructions, I decided that I would finally add them to my emulator. Eventually, these new instructions passed both validators, meaning that every ZEXDOC test passed:
There are still three instructions that are not yet implemented: HALT, IN, and OUT. The latter two are not implemented because they are special instructions that access “IO ports”, which do not exist on the Mega Drive.
The instructions that I did implement are CPI, CPD, CPIR, CPDR, and DAA. The first four are all variants of each other, and are used for searching for a particular byte in a block of memory (much like C’s memchr function). The DAA instruction is for performing “BCD correction” to the accumulator register. It essentially performs the same task as the second half of the 68000’s ABCD, NBCD, and SBCD instructions: it computes a ‘correction factor’ based on the output of the previous arithmetic instruction, and adds it to the output to make it into a valid BCD number.
Now that my Z80 emulator is much closer to behaving like a real Z80, those Sonic ROM-hacks should all have working DAC audio output, right?
Well, the joke’s on me, because it turns out that Sonic 1 Megahack: Ultra Edition still has missing DAC audio. But why? Why does the audio still not work despite the Z80 emulation being so much more accurate? Well… it’s because the audio doesn’t work on a real Mega Drive!
Yep, you read that right: that ROM-hack only works properly on inaccurate emulators, because that’s all that the developer used to test the game during its development. This is actually a fairly common occurrence when it comes to ROM-hacks, since running a ROM-hack on a real Mega Drive requires an expensive flash-cartridge, so most developers just stick to testing exclusively with emulators.
So, in the end, the audio in Sonic 1 Megahack: Ultra Edition was bugged, not because my emulator was inaccurate, but because it was too accurate. I suppose I should be proud of that.
Improved FM Debug Menu
While debugging Sonic 1 Megahack: Ultra Edition’s missing audio, I realised that the FM debugger could be improved, so I’ve moved the per-FM-channel data out of the tabbed section and into a shared table, and exposed the timers, latched address and port, and channel panning.
Fix 1-Cell Horizontal Scrolling Mode
This is another bug that I didn’t notice because no Sonic game or ROM-hack that I know of uses it. The Mega Drive’s VDP has three ways of scrolling the screen horizontally:
Scrolling the entire screen.
Scrolling each row of pixels individually.
Scrolling each row of pixels in groups of 8 (the size of a tile, or, as Sega’s official documentation calls it, a “cell”).
This bug involves that last one: it just didn’t work at all. This can be seen in Earthworm Jim’s “What the heck?” level:
So what’s going on? When I first wrote my VDP emulator, I assumed that 1-cell mode kept all of its scroll values next to each other in memory, just like in 1-line mode. However, that is not the case: strangely, each cell’s scroll value is actually spaced 8 values apart. Correcting this behaviour fixes Earthworm Jim, and presumably everything else that uses 1-cell mode.
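In other words, the fix boils down to how the scroll table is indexed. A minimal sketch, with hypothetical names, where an ‘entry’ is one scanline’s worth of scroll values:

```cpp
enum class HScrollMode
{
    FullScreen, // One scroll value for the entire screen.
    OneCell,    // One scroll value for each group of 8 rows of pixels.
    OneLine     // One scroll value for each row of pixels.
};

// Returns which entry of the horizontal scroll table a given scanline uses.
inline unsigned int HScrollTableEntry(const HScrollMode mode, const unsigned int scanline)
{
    switch (mode)
    {
        case HScrollMode::FullScreen:
            return 0;

        case HScrollMode::OneCell:
            // My original, incorrect assumption was 'scanline / 8', which packs the
            // entries together. Instead, each cell's entry is 8 entries apart.
            return scanline & ~7u;

        case HScrollMode::OneLine:
            return scanline;
    }

    return 0;
}
```

So scanlines 8 to 15 all use entry 8, scanlines 16 to 23 all use entry 16, and so on: 1-cell mode reads the same table layout as 1-line mode, but skips seven out of every eight entries.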
Add General VDP Debugger
To diagnose the above bug, I needed information on what the VDP was doing, so I added a menu to show the VDP’s various settings. It’s not all that pretty, but it gets the job done for now:
It was with this that I figured out that Earthworm Jim’s scrolling was only broken when the VDP was in 1-cell horizontal scrolling mode, so this menu should prove useful in the future too.
Abandon ‘tiny file dialogs’ Library
Dear ImGui and SDL2 are great and all, but they don’t provide a cross-platform way to browse files. This is a problem since it’s important for the user to be able to select a ROM image or a save-state file to load or save. To this end, the frontend made use of the ‘tiny file dialogs’ library, which enables the use of the operating system’s standard file dialogs.
Unfortunately, the library is as much a help as it is a burden: its code is of questionable quality, producing various compiler warnings for such novice mistakes as returning pointers to local arrays. In addition, despite emphasising portability and supporting POSIX, the library is incompatible with the BSDs. Finally, its support for symbolic links is allegedly completely broken.
With the library being riddled with code hygiene issues and actively limiting the portability of my frontend, I’ve decided to ditch it. In its place, I’ve added a barebones file input prompt that leverages Dear ImGui.
Neither C++11 nor SDL2 provide a way of querying directories for their contents, so this is the only universally-compatible solution. I could have used C++17’s file-system API, but I worry that such a new API is not very ubiquitous yet.
Of course, using such a limited, clunky way of opening files would harm the frontend’s usability, so I intend to add platform-specific logic to use the native file dialogs whenever possible. Right now, this has been done for Windows, and I intend to do the same for Linux (GTK and Qt) soon. Users of other operating systems will have to get comfy with the barebones dialog, but it’s still an improvement over the frontend not compiling at all.
The barebones file dialog supports drag-and-drop: just drag the desired file onto the window and its path will automatically be entered into the text box. The file dialog doesn’t even have to be open: ROMs and save states can be dragged onto the window at any time, and the frontend will apply them appropriately. Intuitiveness is great.
Fix VRAM Fill
The Mega Drive’s Video Display Processor has three Direct Memory Access modes: 68000 to VDP, VRAM Fill, and VRAM to VRAM. The first is used quite often and implemented in my emulator, the second is much less common and supported by my emulator but not well tested, and the third is rare and not yet in my emulator at all.
Across a large number of games (including Sonic, Castlevania, Vectorman, and Earthworm Jim), VRAM Fill is seemingly only used for setting large swathes of VRAM to 0. This basic usage is easy to emulate, but does not serve well to ensure that said emulation is entirely accurate to the behaviour of a real Mega Drive.
Cue Mega Man: The Wily Wars, which makes some interesting use of VRAM Fill: at the start of VRAM, it stores a handful of blank tiles, each of a different colour – these tiles are all generated with VRAM Fill.
This behaviour was enough to expose issues with my emulator’s implementation of VRAM Fill:
Looking at the VRAM debugger shows exactly what’s going wrong:
Every other pair of pixels is not being filled with the correct colour. I immediately had my suspicions about the cause of this: back when I was first writing the VDP emulator in September 2021, I learnt that VRAM is apparently 8-bit, while CRAM and VSRAM are 16-bit. This caught me by surprise, as I’d always assumed that all three of them are 16-bit, so I was not sure what the proper way to implement this quirk was. Instead, I implemented VRAM as 16-bit, as I was originally going to. I figured that this might cause certain edge-cases – such as uploading data to an odd VRAM address – to behave incorrectly, but otherwise it would not be an issue. However, seeing this bug in Wily Wars convinced me that this workaround had to go.
It turns out that many aspects of my emulator’s implementation of VRAM Fill were incorrect: for instance, it mistakenly assumed that the length was measured in 16-bit words, when in reality it was measured in 8-bit bytes minus one. Additionally, the 16-bit word that specifies the value to fill the VRAM with is not entirely used: only the upper 8 bits of it are actually written to VRAM.
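Those two corrections can be sketched like so. This is a heavy simplification with hypothetical names: it treats VRAM as a flat array of bytes, it reads “bytes minus one” as meaning that a length value of N fills N + 1 bytes (which is an assumption), and it ignores any other quirks that a real VDP may have.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Fills VRAM one byte at a time, starting at 'address' and advancing by
// 'increment' after each byte.
inline void VRAMFill(std::vector<std::uint8_t> &vram, std::uint16_t address, const std::uint16_t length, const std::uint16_t fill_word, const std::uint16_t increment)
{
    if (vram.empty())
        return;

    // Only the upper 8 bits of the 16-bit fill value are written to VRAM.
    const std::uint8_t fill_byte = static_cast<std::uint8_t>(fill_word >> 8);

    // The length is in 8-bit bytes, not 16-bit words.
    // Assumption: a length of N means that N + 1 bytes are written.
    const std::size_t total_bytes = static_cast<std::size_t>(length) + 1;

    for (std::size_t i = 0; i < total_bytes; ++i)
    {
        vram[address % vram.size()] = fill_byte;
        address = static_cast<std::uint16_t>(address + increment);
    }
}
```

Compare this to my old implementation, which wrote whole 16-bit words: with a fill value whose two bytes differ, half of the written bytes would have held the wrong value.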
Addressing these issues fixed the bug, and Wily Wars now looks much better:
Wily Wars still has one small bug: those four coloured dots near the start of VRAM. The game appears to accidentally write a VDP command ($8F02) to the VDP’s data port instead of its command port, causing it to be uploaded to VRAM as graphics. While the game certainly makes the same mistake on a real Mega Drive, it doesn’t seem to result in those pixels being written to that particular place in VRAM, as it causes artifacts to be visible in the level which aren’t present on a real Mega Drive:
See the pixels in that pit? Those aren’t there on a real Mega Drive. Just another inaccuracy to fix in the future, I suppose.
Implement YM2612’s ‘BUSY’ Flag
This is a feature that wasn’t too important to emulate, but it’s so simple that I figured that I might as well.
The YM2612 takes time to process the data that it is given. To signal to the CPU that it can’t accept more data yet, the YM2612 sets the high bit of its status byte. The CPU obtains this byte by reading from one of the YM2612’s address ports.
On the YM2612, the busy flag is extremely basic: it is set whenever either data port is written to, and always lasts for 32 YM2612 cycles (192 68000 cycles), regardless of how long the submitted data actually takes to process. I’ve heard that the longest YM2612 operation is only 24 cycles, and that the YM3438 actually does set the busy flag for lengths that match each operation’s duration.
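A flag this basic only needs a countdown. The following is a minimal sketch with hypothetical names, not my emulator’s actual code:

```cpp
#include <algorithm>
#include <cstdint>

struct BusyFlag
{
    unsigned int cycles_remaining = 0;

    // Any write to either data port makes the chip 'busy' for a fixed 32 YM2612
    // cycles, regardless of how long the data actually takes to process.
    void OnDataPortWrite()
    {
        cycles_remaining = 32;
    }

    // Advances time by a number of YM2612 cycles.
    void Advance(const unsigned int cycles)
    {
        cycles_remaining -= std::min(cycles, cycles_remaining);
    }

    // Produces the status byte that the CPU reads: the busy flag is the high bit.
    // 'other_bits' stands in for the rest of the status byte.
    std::uint8_t StatusByte(const std::uint8_t other_bits) const
    {
        return static_cast<std::uint8_t>((other_bits & 0x7F) | (cycles_remaining != 0 ? 0x80 : 0));
    }
};
```

Emulating the YM3438’s per-operation durations would only require `OnDataPortWrite` to pick a different cycle count for each kind of write.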
With this feature implemented, any software that explicitly relies on the busy flag for timing should work correctly now. An Earthworm Jim game and a game called Hellfire apparently rely on this.
Add ‘Other’ Debugging Menu
To expose yet more of the Mega Drive’s internal state to the user, a menu has been added to show information about the general console, rather than specific components. Right now, this menu revolves around the bus arbiter, but it will be expanded with other information in the future, as the need arises.
This has been a pretty massive update (and a massive blog post), so I’m feeling a bit burnt-out on working on this emulator. Progress may be slow for a while after this. I’m pretty happy to see how far this emulator has come, though! It wasn’t long ago that this emulator couldn’t even boot anything, and now look at what it can do: play the classic Sonic trilogy, run Linux, and even run ROM-hacks and homebrew!
This is just a quick update to address some issues in the previous v0.3 release.
Make FM Debugger More Compact
The FM debugger was a bit ‘verbose’ in v0.3…
As you can see, each channel was given its own window, which meant that it was a lot of effort to simply switch from one channel to another without just having all windows open at the same time, which would take up a lot of the screen.
Since it’s unlikely that a user would ever need to see more than one FM channel’s registers at a time, these windows have all been merged into a single tabbed window:
Fix DPI Support
Unfortunately, after hyping it up so much in v0.3’s release, the default window sizes were broken on DPIs that weren’t 150% the standard. I was expecting Dear ImGui to handle DPI differences like this automatically like it usually does, but that’s not the case here.
I’ll have to remember to test this frontend at alternate DPIs before each release to prevent a repeat of this mistake.
Add a Horizontal Scrollbar to the Plane Debugger
As the result of yet another strange quirk of Dear ImGui, horizontal scrollbars do not exist by default, even in windows that need them. This affected the VDP’s Plane A/B debuggers, which only had a vertical scrollbar. By explicitly telling Dear ImGui to create a horizontal scrollbar, this issue is no more:
User-Friendliness Improvements to Keyboard Rebinding
Sometimes it’s the small things that matter most.
When the user is repeatedly adding key bindings, the newly-extended binding list would push the ‘Add Binding’ button off-screen, requiring the user to scroll down to be able to press it again. This is a small annoyance, but an annoyance nonetheless, so it has been corrected by automatically scrolling the window down after a new binding is added.
Additionally, when selecting an action to bind to a selected key, the selected key is displayed to the user. This extra feedback allows the user to verify that they selected the correct key, instead of them being left in the dark.
Fix Phantom Keyboard Inputs After Rebinding
Sometimes, after rebinding the keyboard inputs, the emulated Control Pad would behave as if certain buttons were held when they are not. This was due to edge-cases in how the key-binding system works. For instance, if a key’s binding were changed after it has been pressed but before it is released, then the emulator would ‘forget’ which Control Pad button to release when the key is released. This should no longer be the case.
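One way to avoid this class of bug is to remember, for each held key, which button it was bound to at the moment that it was pressed. Here is a sketch of that idea: the names are hypothetical and this is not necessarily how the frontend itself does it.

```cpp
#include <map>

using Key = int;    // Stand-in for a keyboard scancode.
using Button = int; // Stand-in for a Control Pad button.

class KeyboardInput
{
public:
    void Bind(const Key key, const Button button)
    {
        bindings[key] = button;
    }

    void KeyDown(const Key key)
    {
        if (pressed.count(key) != 0)
            return; // Ignore key-repeat events.

        const auto binding = bindings.find(key);

        if (binding == bindings.end())
            return; // The key is not bound to anything.

        // Record which button this key press affects, so that a later rebind cannot change it.
        pressed[key] = binding->second;
        ++held_counts[binding->second];
    }

    void KeyUp(const Key key)
    {
        const auto entry = pressed.find(key);

        if (entry == pressed.end())
            return;

        // Release the button that was recorded on key-down, not the current binding.
        --held_counts[entry->second];
        pressed.erase(entry);
    }

    bool IsHeld(const Button button) const
    {
        const auto count = held_counts.find(button);
        return count != held_counts.end() && count->second != 0;
    }

private:
    std::map<Key, Button> bindings;
    std::map<Key, Button> pressed;              // The button that each held key was bound to when pressed.
    std::map<Button, unsigned int> held_counts; // How many held keys are pressing each button.
};
```

Counting held keys per button also means that, when multiple keys are bound to the same button, releasing one of them does not release the button while another is still held.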
Fix Ugly Seams Around Tiles in VRAM Debugger
Depending on the display’s DPI, odd artifacts could appear around the tiles in the VRAM viewer:
This was the result of some accidental fractional image scaling. This has been corrected to use the proper integer image scaling, eliminating the seams.
With this much-needed polishing complete, hopefully the next update will include some improvements to the core emulation: Window Plane, SRAM, LFO, SSG-EG, YM2612 Timers – there are plenty of things left to add.
This update mostly affects the standalone frontend, but a couple of the changes also apply to the libretro core.
One shortcoming of the standalone frontend is that it lacks keyboard rebinding: the W, A, S, and D keys will always control the Control Pad’s D-Pad, and so on.
But not anymore!
New to the frontend is full keyboard rebinding! In addition, the default key bindings have been switched to the more common arrow keys and Z/X/C keys combination.
Unlike some other emulators, this system allows the user to bind multiple keys to the same action: for instance, if the user wanted to bind both the ‘Z’ key and the ‘space’ key to the Control Pad’s ‘A’ button, then they can do so!
It would be pretty frustrating for binding customisations to be lost whenever the program is closed, so support has been added for persistent configuration: settings are saved to a file called ‘clownmdemu-frontend.ini’, allowing settings such as the keyboard bindings, console region, and V-sync to be remembered by the emulator.
Previously, the options would all be managed through the menu bar, but this is quite clunky as the menu bar would close after each option is toggled. To improve the user experience, the options have now been moved to a dedicated menu:
This menu provides a much more intuitive way to change options! Additionally, each option shows a tooltip when hovered over with the mouse, allowing unfamiliar users to understand what they do!
Default Window Sizes
Another improvement to the user experience is that windows are now given a sane default size, meaning that they will now have a proper size when opened for the first time.
Opening the same ROM file over and over again is tedious, so now the emulator keeps a list of the 10 most recent files used:
FM and PSG Debugging Toggles
The standalone frontend has had the ability to disable individual VDP planes for ages, but now it can also toggle FM and PSG channels. A dedicated menu has been added for this:
This feature is also available in the libretro core:
PSG Debugger Overhaul
The PSG debugging menu was butt-ugly before, and has been given a makeover:
Support for Alternate PAL Detection Method
Previously, when playing Sonic the Hedgehog 2 with the emulated Mega Drive in PAL mode, the music would play at a slightly slower speed, just like it does in the first game. This shouldn’t happen.
The reason that this was occurring was that the game relies on an alternative method of detecting the PAL video mode: by checking bit 0 of the VDP’s control port. This bit should reflect whether PAL mode is enabled or not. Now that the emulator sets this bit correctly, the game properly detects and accounts for the speed difference in its music, allowing it to play at the proper speed.
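The behaviour described above can be sketched as follows. This is a minimal illustration rather than the emulator's real code: the function name and the `other_status_bits` parameter (which stands in for the rest of the status flags) are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of the status word that is read back from the VDP's control port. */
#define VDP_STATUS_PAL 0x0001u

/* Build the value returned by a read of the control port.
   'other_status_bits' is a hypothetical stand-in for the remaining flags. */
uint16_t ReadVDPControlPort(uint16_t other_status_bits, bool pal_mode_enabled)
{
    /* Start from the other flags, with the PAL bit cleared. */
    uint16_t status = other_status_bits & (uint16_t)~VDP_STATUS_PAL;

    /* Reflect the emulated console's video mode in bit 0. */
    if (pal_mode_enabled)
        status |= VDP_STATUS_PAL;

    return status;
}
```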
User-friendliness has been a focus of this update, so hopefully this will make the standalone frontend much more accessible to new users!
It’s been too long, but finally my emulator has an update!
Since the first release, the emulator has been greatly optimised, some inaccuracies in the 68000 interpreter have been addressed, and the occasional missing CPU instruction has been added. Compatibility with games should be a bit better than before, but still not great as many essential features of the Mega Drive are not emulated.
The standalone frontend has had some extra debug menus added, which allow you to view the registers of the YM2612, 68000, and Z80:
New to the emulator is a libretro core frontend, allowing the emulator to be used by libretro implementations such as RetroArch. It lacks the debug menus of the standalone frontend, but makes up for it with features that libretro cores get for free, like customisable controllers and shaders:
In theory, the libretro core should provide a simple way of getting this emulator running on a variety of platforms: just compile the core into a library (static or shared), and use it in tandem with a libretro frontend such as RetroArch.
During the development of this update, I have set up a test suite for the 68000 interpreter which allows me to check that each instruction does as it is supposed to. It was this test suite that notified me of how the word-size ADDA, SUBA, and CMPA instructions were pitifully broken. I’m surprised that this didn’t break Sonic 1, 2, or 3&K, but it did break Linux.
I also made a small benchmarking tool which measures the speed of the core emulation logic. This is useful for measuring the impact of optimisations and the difference in speed between platforms.
Overall, this has been a rather incremental update. Rather than being focussed on optimisation and refactoring, I hope that the next update will be focussed on improving compatibility and emulating more features of the Mega Drive.
You can find the standalone frontend here, and the libretro frontend here.
Years ago, I wrote an MD5 hasher. For some reason, I never gave it a proper release, instead only including a copy of it in one or two of my projects. That’s finally changed, and I figured that I’d mark the occasion by giving a recap of its history here. It’s a bit more complex than you’d expect.
I originally wrote my MD5 hasher as part of a university assignment. It was meant to be written in C#, but I preferred C, so I wrote it in that instead and converted it to C# after I had it fully tested and working.
This was simple enough: getting the hasher to produce the correct hashes was a bit of a nightmare due to parts of the specification being easily glossed-over, but the actual conversion to C# only had one mishap: right-shifting by 32 resulted in a right-shift by 0 instead. This is actually undefined behaviour in C, so I had to correct that to get consistent behaviour between the two languages.
It seems like this trick paid off, because I never caught any flak for the code not being written like ‘proper’ C# or anything like that.
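To illustrate the shift mishap mentioned above: in C, shifting a 32-bit value by 32 or more bits is undefined behaviour, whereas C# masks the shift count of a 32-bit operand to its low five bits, turning a shift by 32 into a shift by 0. One way to get the same result in both languages is to handle the large counts explicitly. The helper below is a hypothetical sketch of that idea, and is not necessarily how the hasher itself was corrected.

```c
#include <stdint.h>

/* Right shift that is well-defined for any count: a 32-bit value shifted
   by 32 or more bits has no bits left, so the result is defined to be 0. */
uint32_t ShiftRight32(uint32_t value, unsigned int count)
{
    return count >= 32 ? 0 : value >> count;
}
```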
After submitting my MD5 hasher, I forgot about it for months (or maybe years) until stumbling across it again and deciding to clean it up a little: I converted it to a single-header library (one of the first that I had ever made), and overhauled the API to be lower-level by allowing data to be streamed to it a chunk at a time instead of all at once.
Despite this, I didn’t release the new and improved hasher, and instead just placed it in a directory called ‘clownlibs’ which contained assorted small libraries of varying degrees of polish. I had considered releasing them all on GitHub in a single repository, à la stb, but I became paranoid about how it would be impossible to star a particular library, or have a submodule pull in one specific library (something that came to a head a few blog posts ago), so I ended up endlessly putting it off.
A long time later, I was overhauling the build systems of the various Sonic the Hedgehog disassemblies, converting them from Batch/Bash/Python to Lua. The disassemblies relied on being able to produce hashes of the assembled ROM image, and comparing them against a series of hashes to determine the ROM image’s accuracy. Previously, this had been done with Python, but, with Python being replaced with Lua, there was no longer a hasher built into the language’s standard library that could be relied on. I tried to source a Lua hasher online, but all of the ones that I could find were absurdly slow. In hindsight, this was probably because Lua had only recently introduced support for integers and bitwise operations like AND and OR, meaning that those hashers were instead simulating them using floating-point operations, which, frankly, blows my mind.
Not realising this at the time, I instead assumed that the problem was simply that neither Lua nor the hashers that I had tried were very fast. This made me remember my own MD5 hasher, which was optimised for performance and portability above all, and I figured that I should try porting that to Lua to see if it performed any better than the others.
The process of porting the hasher to Lua wasn’t too complicated, though Lua does have a number of syntax differences from C that had to be accounted for. Lua’s ubiquitous tables also meant that portions of the code had to be rewritten to be more natural to the language.
Before long, I had a working MD5 hasher written in Lua that performed wonderfully, annihilating the other hashers in terms of speed. This hasher would find its way into the disassemblies of Sonic 1, Sonic 2, and Sonic 3 & Knuckles.
My MD5 hasher came in handy once more as I was working on my Wii U port of Sonic Mania: that game’s built-in MD5 hasher was garbage, and I’d always wanted to test my MD5 hasher on a big-endian platform like the Wii U, so I swapped the two. My hasher integrated into the codebase pretty well, and even eliminated some thread-safety issues. As expected, it worked perfectly on the big-endian CPU, putting the prior hasher to shame. The game does a lot of hashing, so it was really putting my hasher to the test! It was also just so cool to see my software being leveraged by an actual game.
After that, my MD5 hasher returned to its slumber once more, until today: I happened to take a look in my ‘clownlibs’ directory, and noticed that my hasher was the only library in there which I hadn’t eventually released: clowncommon.h and clownresampler.h have their own GitHub repositories now, but clownmd5.h still remained hidden. At last, I figured I’d put it off long enough, and finally created a GitHub repository for my hasher, years after first writing it.
Honestly, I didn’t expect to get so much mileage out of this library: it was just some university coursework, and yet it ended up being used by a bunch of different projects. Since it was so useful to me, it will hopefully be useful to others too. It’s licensed under the 0BSD licence, so there’s no reason not to go nuts with it!
Back in the pre-apocalypse days of 2018, I received a message from a Cave Story modder called zxin; he was interested in adding Ogg Vorbis support to his own mod, but didn’t want to bundle my entire mod into his: he just wanted the audio subsystem.
At the time, the audio subsystem was tangled with the rest of the mod’s code, so it took a lot of refactoring to separate it all neatly. Once I had the audio subsystem separated, I placed its code in a directory which I named ‘audio_lib’. I then wired the audio library into the code of zxin’s mod, and listened to it spring to life. Noob-y 2018 me found it so novel to have this code work completely outside of the environment it was designed for. Even now, code reusability is something that I aim for in all of my projects.
zxin had one more request, however: rather than use libvorbis directly, he wanted my audio library to leverage the libsndfile library. libsndfile is a wrapper around libvorbis and multiple other libraries, meaning that, by using libsndfile, my audio library would gain support for many more audio formats than just Ogg Vorbis, including FLAC, WAV, and AIFF.
I found libsndfile to be pretty cool, so, after integrating it into zxin’s mod, I tried backporting it to my mod. However, I eventually realised that its licensing would be a problem: unlike libvorbis, libsndfile is under a copyleft licence (the LGPL), and I didn’t want to force modders who use my mod to deal with a bunch of complicated copyleft obligations. To resolve this, I decided to make libsndfile optional.
These would be the first of clownaudio’s many decoder backends: within days, I added decoder backends for libFLAC (for playing FLACs), libopenmpt (for playing tracker formats), and even snes_spc (for playing SNES music). In the following months, I also added support for libtremor (an integer-only Ogg Vorbis decoder) and PxTone (a chiptune format made by the same person that originally made Cave Story – Daisuke “Pixel” Amaya).
But, more to the point, this was also the start of clownaudio’s life as a standalone library. Being shared by two projects, even if only briefly, was enough to make me whip the code into shape, encapsulating it from the surrounding code. This made it easy to drop into future projects, most notably CSE2, but that particular can of worms is a topic for another time.
My Mega Drive emulator has gotten pretty big: it has multiple core components (68000 emulator, Z80 emulator, YM2612 emulator, etc.), two separate frontends (a standalone SDL2/Dear ImGui frontend, and a libretro frontend), and even some tools which I never committed, like a 68000 test suite and a performance benchmarker. This all creates a pretty bloated Git repository that pulls in a whole bunch of dependencies, despite the core emulator being a lightweight blob of ANSI C code with no dependencies beyond the C standard library.
I think that this harms clownmdemu as a software library, since anyone who wants to use the core emulator as a Git submodule in their project has to pull in a bunch of unrelated and unnecessary code. This is a problem that I’ve encountered with other libraries like libdeflate and libxmp, where all I want to do is compile and link the library, but the all-or-nothing nature of Git submodules means that I have to check out their test suites, documentation, and example code, none of which gets used at all.
Another issue that monolithic repositories create is with notifications: a person that likes to keep up-to-date on a project’s development may not be interested in certain subprojects, and so does not want to be notified when commits are made that only affect those subprojects. I for one hate being disappointed by seeing a project that I like at the top of my ‘recently updated starred repositories’ list, only for the update to be some boring test suite maintenance.
Finally, monolithic repositories create problems with build reliability. When regression testing, there are few things more frustrating than failed builds, as they bring the regression testing process to a grinding halt while the build errors are addressed, or they cause the commit to not be tested at all, potentially resulting in the cause of the regression being missed. When you have a repository with a library in it, as well as multiple standalone projects which use that library, then any backwards-incompatible changes to that library will cause all of those standalone projects to break. Until those projects are fixed, there will be commits where they cannot be built or run properly, complicating later regression testing. By giving each project its own repository, those projects are able to make the library’s repository a Git submodule, allowing them to use a specific commit snapshot of the library that is known to work. With this, the number of commits in the project’s repository where the project cannot build or execute properly is reduced, potentially by a great amount.
Despite all of these downsides, I’ve only ever seen one project split across multiple repositories: mupen64plus, which has repositories for its emulation core, frontends, and video/audio/controller plugins. And yet, I don’t think mupen64plus does this because of any of the aforementioned downsides, but rather only because the mupen64plus project is just the emulation core: the frontends and plugins are all developed by third parties, and are essentially unofficial extensions.
After writing the last blog post, I was able to find a test suite for the Motorola 68000, allowing me to verify the accuracy of my 68000 emulator. After addressing a number of inaccuracies, Linux could finally finish booting! The issue that was breaking kmalloc was the CMPA.W, ADDA.W, and SUBA.W instructions not setting their condition codes properly due to quirks related to sign-extension, which presumably broke a branch or two somewhere.
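The sign-extension quirk mentioned above can be sketched like so: the word-size forms of these instructions sign-extend their 16-bit source to 32 bits and then operate on the whole address register. The function names here are hypothetical, and the comparison helper only models the zero flag of a CMPA.W rather than the full set of condition codes.

```c
#include <stdint.h>

/* Sign-extend a 16-bit operand to 32 bits, as the word-size forms of
   CMPA, ADDA, and SUBA do before operating on the full address register. */
uint32_t SignExtendWord(uint16_t value)
{
    return (value & 0x8000u) ? (0xFFFF0000u | value) : value;
}

/* Zero flag of a hypothetical CMPA.W: the comparison is 32 bits wide,
   so the upper half of the address register matters too. */
int CompareAddressWordIsEqual(uint32_t address_register, uint16_t source)
{
    return address_register - SignExtendWord(source) == 0;
}
```

With this, comparing an address register holding 0xFFFFFFFF against the word 0xFFFF reports equality, while a register holding 0x0000FFFF does not, which is easy to get wrong if the operation is treated as being only 16 bits wide.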
Upon booting Linux, I noticed that it wasn’t recognising serial input. This turned out to be because Linux expects a level 2 interrupt to occur when there is pending data in the serial port’s FIFO. Curiously, code in 68 Katy’s Linux port suggests that it should also support a level 7 interrupt, which combines the serial input update with a 100Hz timer update; however, it does not appear to work. That aside, adding the level 2 interrupt was simple enough, and, with input now working, I could try out the various executables that were bundled with Linux.
The 68 Katy’s Linux port is very barebones, only sporting vi, expand, ledblink, and sash. It also includes Colossal Cave Adventure, which is an old-school text-based adventure game. Sash is neat, because, despite being a shell, it has some BusyBox-like functionality that allows it to perform commands such as mount, mkdir, touch, and ls, which is enough to do some basic file-management. The inclusion of a fun little terminal-friendly game is nice too, even if its executable does take up a lot of space (86KiB, out of the 512KiB ROM).
Running dusty old executables is fun and all, but I had something else in mind: I wanted to see if I could port my Mega Drive emulator – clownmdemu. You see, my Mega Drive emulator and my 68 Katy emulator use the exact same 68000 emulator, meaning that if I can get the former to run on the latter, then my 68000 emulator would be emulating itself!
Being restricted to a terminal means that my emulator can’t create any graphical output, but I can at least run the emulator as a benchmark and get an idea of its performance. Compiling my Mega Drive emulator to target the 68 Katy was simple enough, since it doesn’t have any dependencies beyond the C standard library, though I did have to increase the 68 Katy’s ROM and RAM to fit the files and give the emulator the memory it needs.
So, how fast is it?
This benchmark is running the game Knuckles the Echidna in Sonic the Hedgehog 2 for a single frame, which takes 0.5 seconds. That’s right: it’s running at 2 frames per second. Note that the 68 Katy’s emulated CPU is running as fast as the host platform will let it, which in my case is around 350MHz.
I would have gotten the entire 68 Katy emulator running in itself, but the awkward timer interrupt and terminal input logic mean that it has to rely on POSIX threads, which don’t appear to be compatible with the 68 Katy’s ancient toolchain.
I tried to port a newer version of Linux to the 68 Katy, but it seems that only the Linux 2.0.X build of uClinux supports the vanilla Motorola 68000: the closest thing that later versions of Linux support is the Motorola 68328 – a souped-up 68000 with additional features such as built-in timers and an improved interrupt mechanism. While I was able to eliminate the dependencies on these extra features from Linux 4.4 and get it to partially boot in my emulator, it would still crash before completing its boot process.
Despite that setback, I was still successful in running Linux on my 68000 emulator, even if it was just Linux 2.0.X. I think that this is a good place to leave the project for now, so I’ve cleaned up the codebase and made it available on GitHub. In contrast to my usual naming scheme, I’ve named this project ‘Virtual 68 Katy’ just because I think it sounds cool. You can find its Git repository here.
Unfortunately, I am not familiar with porting Linux at all, and with my 68000 emulator not being very mature, I wasn’t sure if it was even capable of running Linux in the first place; if I encountered a crash, how would I know if it’s an issue with the Linux port or my emulator?
So it looked like this idea wasn’t going to pan out… but wait – what if I instead emulated an existing 68000 Linux port that I know already works?
While researching how to cross-compile Linux for the 68000, I read this series of blog posts about a 68008-based computer that was originally designed on a breadboard – the 68 Katy. It was amazing to see how similar wiring an old CPU up to ROM/RAM chips and some peripheral devices was to what I’d done with a PIC microcontroller back in university. I guess I figured that CPUs wouldn’t be as simple. Anyway, the blog posts provided a pre-built copy of the 68 Katy’s Linux port (complete with a bootloader and filesystem) in a single flat-mapped binary blob that was ready to be placed at the start of the 68000’s address space – it couldn’t be any simpler! All I’d have to do is implement the 68 Katy’s memory map and serial communication port, and I could run this blob in my emulator!
Oh, right, there was one feature that I needed to add to my 68000 emulator first: user mode. Previously, my emulator had only ever run software that operated in supervisor mode, but Linux extensively makes use of user mode, which has its own unique stack pointer and raises exceptions if certain privileged instructions are used. Implementing this was simple enough, but it took a while to weed out the subtle bugs.
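For a rough idea of what user mode involves, the sketch below swaps the active stack pointer whenever the supervisor bit of the status register changes, and reports when a privileged instruction should be refused. The `CPUState` structure and both function names are hypothetical simplifications, not the layout used by clownmdemu.

```c
#include <stdint.h>

/* The S (supervisor) bit of the 68000's status register. */
#define SR_SUPERVISOR 0x2000u

/* Hypothetical, heavily reduced CPU state. */
typedef struct
{
    uint32_t a7;                  /* The active stack pointer. */
    uint32_t other_stack_pointer; /* The stack pointer of the inactive mode. */
    uint16_t status_register;
} CPUState;

/* Write the status register, swapping stack pointers if the mode changes. */
void SetStatusRegister(CPUState *cpu, uint16_t value)
{
    if ((cpu->status_register ^ value) & SR_SUPERVISOR)
    {
        const uint32_t previous = cpu->a7;
        cpu->a7 = cpu->other_stack_pointer;
        cpu->other_stack_pointer = previous;
    }

    cpu->status_register = value;
}

/* Returns non-zero if executing a privileged instruction right now
   should raise an exception instead (that is, if the CPU is in user mode). */
int IsPrivilegeViolation(const CPUState *cpu)
{
    return (cpu->status_register & SR_SUPERVISOR) == 0;
}
```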
With this last feature added, I could proceed to emulating the 68 Katy!
My 68000 emulator is actually just a part of my Mega Drive emulator – clownmdemu – but each component of my Mega Drive emulator was designed to be modular and usable independent from the rest of the project. It definitely paid off in this case, as it was easy to pluck out the 68000 emulator and begin wiring it up to a new environment: all it needs is some initialisation and two call-back functions for reading and writing memory.
The memory map is simple: 0x00000-0x77FFF for ROM, 0x78000-0x7FFFF for IO, and 0x80000-0xFFFFF for RAM. The serial communication port exists in the IO space, and was pretty complicated to implement: the actual device on a 68 Katy is an FT245, but all that really matters is that it’s a FIFO with a couple of status bits to say when there’s pending data to be read, or no more room in the FIFO for data to be written. Figuring out the details required reading the code of the 68 Katy’s system monitor (which is written in pure, largely-undocumented assembly), the FT245’s manual, and the FT245’s kernel driver.
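The memory map above could be decoded along these lines. This is a sketch under a few assumptions: the names are hypothetical, addresses are masked to 20 bits, IO reads return a placeholder value, and writes to ROM are silently ignored. The real read and write call-backs would route the IO range to the emulated FT245.

```c
#include <stdint.h>

typedef enum { REGION_ROM, REGION_IO, REGION_RAM } Region;

static uint8_t rom[0x78000]; /* 0x00000-0x77FFF */
static uint8_t ram[0x80000]; /* 0x80000-0xFFFFF */

/* Classify an address according to the memory map.
   Masking to 20 bits is an assumption of this sketch. */
Region RegionForAddress(uint32_t address)
{
    address &= 0xFFFFF;

    if (address < 0x78000)
        return REGION_ROM;
    else if (address < 0x80000)
        return REGION_IO;
    else
        return REGION_RAM;
}

/* Read call-back: IO is left as a placeholder that always reads as 0. */
uint8_t ReadByte(uint32_t address)
{
    address &= 0xFFFFF;

    switch (RegionForAddress(address))
    {
        case REGION_ROM: return rom[address];
        case REGION_RAM: return ram[address - 0x80000];
        default:         return 0;
    }
}

/* Write call-back: only RAM is writable in this sketch. */
void WriteByte(uint32_t address, uint8_t value)
{
    address &= 0xFFFFF;

    if (RegionForAddress(address) == REGION_RAM)
        ram[address - 0x80000] = value;
}
```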
With this implemented, I was able to boot the 68 Katy binary blob and enter the system monitor. At first, my emulator only had support for serial output, so I couldn’t give any input, but I could at least see the monitor boot and print a message. Unfortunately, input is required to make the monitor boot the Linux kernel. Once I had input working, I was able to boot the kernel by entering the command ‘j003000’ (jump to address 0x003000, which is where the kernel’s code begins in the binary blob).
At first, this resulted in an immediate crash, but this turned out to just be the effect of a bug in the memory map implementation (IO was being mapped to ROM – oops). With that addressed, the kernel was able to print a few messages before hanging on ‘Calibrating delay loop..’.
This same issue was detailed in the 68 Katy’s development blog: Linux apparently needs a timer interrupt in order to do stuff. The 68 Katy has a timer wired up to the 68008’s interrupt pins, raising a level 5 interrupt every 100th of a second. Once that was recreated in my emulator, Linux was able to proceed a little bit further.
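One way to recreate a 100Hz timer in an emulator is to count emulated CPU cycles and raise the interrupt each time a hundredth of a second's worth has passed. The sketch below takes that approach, which is not necessarily how Virtual 68 Katy does it: the clock speed is a made-up figure, and `Timer` and `TimerAdvance` are hypothetical names.

```c
/* Hypothetical clock speed; the real figure depends on the machine being emulated. */
#define CPU_CLOCK_HZ 8000000ul
#define TIMER_HZ 100ul
#define CYCLES_PER_TICK (CPU_CLOCK_HZ / TIMER_HZ)

typedef struct
{
    unsigned long cycles_since_tick;
} Timer;

/* Advance the timer by a number of emulated CPU cycles, returning how many
   level 5 interrupts became due during that time. */
unsigned int TimerAdvance(Timer *timer, unsigned long cycles)
{
    unsigned int interrupts_due;

    timer->cycles_since_tick += cycles;
    interrupts_due = (unsigned int)(timer->cycles_since_tick / CYCLES_PER_TICK);

    /* Keep the leftover cycles so that the tick rate does not drift. */
    timer->cycles_since_tick %= CYCLES_PER_TICK;

    return interrupts_due;
}
```

The caller would then raise a level 5 interrupt on the emulated CPU once for each tick that is reported.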
After hours of debugging, I found that this hang is caused by a failure to create the kernel thread which is responsible for running the ‘init’ function, which presumably completes the rest of Linux’s initialisation process. The thread is unable to be created because a call to ‘kmalloc’ fails to allocate memory. Unfortunately, this is where the extent of my debugging abilities ends.
I can only imagine that my 68000 emulator has a bug in it, which somehow is not exposed when running any of the Mega Drive games and homebrew that I have on hand. I was hoping that any inaccuracies in my emulator would result in easily-debugged hard crashes rather than an insidious little state corruption like this.
The best way that I can think of to debug this is to swap out my 68000 emulator with one that I know is accurate, and then compare various ‘kmalloc’-related variables at various points throughout the boot process, but that doesn’t sound like the most fun. Admittedly, a proper test suite for my 68000 emulator would help a lot to find inaccuracies. I wonder if there’s anything like the Z80’s ‘ZEXALL’ instruction set exerciser for the 68000…