More rambling about overhauling build systems? More rambling about overhauling build systems.
The recent p2bin overhaul wasn’t the first time that I’ve meddled with the build systems of the various Sonic disassemblies, as almost a year ago I’d taken on a much more complex task: converting the build scripts to Lua.
Much like p2bin, the build scripts of the Sonic disassemblies used to be extremely fragmented: the ‘canonical’ build scripts were written in Batch, but only worked on Windows, requiring that alternate build scripts be provided and maintained for Linux and Mac. To serve this purpose, the Sonic 2 disassembly offered a Bash script, while Sonic 1, Sonic 3, and Sonic & Knuckles offered a Python script. This naturally resulted in duplicate code galore as four Batch scripts, two Python scripts, and one Bash script all tried to do the exact same thing: invoke the assembler, produce and patch the ROM file, and display any errors to the user. Things were complicated further by additional scripts being present for verifying the accuracy of built ROMs, which were also subject to the same fragmentation and duplication.
The build scripts were a pain to edit: if you wanted to add a feature to the Batch script, then you’d have to add it again to the Bash/Python script as well. Additionally, because each script was intended for a different OS, you had to swap between OSs to test that both scripts worked properly.
Wouldn’t it be great if each disassembly had only one build script which worked on Windows, Linux, and Mac alike? Batch didn’t seem a good fit for this role, being a seemingly Windows-only language. Bash, while ubiquitous on Unix-like OSs, lacked native support on Windows. Python came close to being ideal, supporting Windows, Linux, and Mac, but suffered from being too heavy a dependency: a Windows user would need to download and install Python separately, harming the ready-to-use nature of the Sonic disassemblies (which can normally be used without any setup whatsoever). Bundling a Python runtime with the disassembly was also not an option due to how large it would be.
What the disassemblies needed was a language that was well-known, supported all major operating systems, powerful enough to handle complex logic, and lightweight enough to be bundled into the repository. Lua fit the bill perfectly.
I’d started learning Lua not long prior, and this was the ideal opportunity to put my knowledge of it to the test. Lua’s a nice language: it has a minimalist design while still being extremely flexible, all made possible by its introduction of the ‘table’, which essentially combines a struct, array, and class into one absurdly powerful construct. Lua’s minimalism also extends to its dependencies: its only dependency is ANSI C, which its runtime is written in. This allows Lua’s runtime to be condensed into a single 500KiB dependency-less executable – more than lightweight enough to bundle with the disassembly.
With the language decided, I set about writing a new build script in it. Before long, it was at feature parity with the assorted Batch, Bash, and Python scripts, while running just as well on Windows as on Linux or Mac. With the number of build scripts per disassembly reduced from two to one, it was no longer necessary to perform the same modification twice whenever changing the build process.
An unintended upside of switching to Lua is that, as lightweight as it is, its flexibility and power allow it to perform tasks that previously required native binaries. One such process is calculating the ROM’s checksum and adding it to the ROM’s header: the Batch and Bash scripts left this task to a native executable while the Lua script (and the old Python script) can do it directly. Theoretically, the Lua build script can replace the functionality of the ‘p2bin’ and ‘fixpointer’ tools as well, though that has yet to happen.
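The checksum logic that the script now performs itself really is that simple: the standard Mega Drive header checksum is the 16-bit sum of every big-endian word from offset 0x200 to the end of the ROM, stored at header offset 0x18E. Expressed in C++ for illustration (the actual build script does this in Lua):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Mega Drive header checksum: the 16-bit sum of every big-endian word from
// offset 0x200 to the end of the ROM, written to header offset 0x18E.
void PatchChecksum(std::vector<std::uint8_t> &rom)
{
    std::uint16_t checksum = 0;

    for (std::size_t i = 0x200; i + 1 < rom.size(); i += 2)
        checksum += static_cast<std::uint16_t>(rom[i] << 8 | rom[i + 1]);

    rom[0x18E] = static_cast<std::uint8_t>(checksum >> 8);
    rom[0x18F] = static_cast<std::uint8_t>(checksum & 0xFF);
}
```

Previously, performing this one loop was reason enough to ship a native executable.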
With this done, the build scripts of the Sonic disassemblies were vastly improved, now all behaving consistently with one another and sharing code wherever possible. If you ever need an interpreted language that’s lightweight but also powerful, then Lua might be what you’re looking for. It sure was in this case.
‘p2bin’ is the name of a tool that is used to turn the output of Macro Assembler AS into a usable binary. The various Sonic disassemblies use their own p2bin instead of the one that comes with AS, which is entirely custom and able to handle the compression of the games’ Z80 sound driver code.
Unfortunately, each disassembly comes with its own variant of the custom p2bin (named ‘s1p2bin’, ‘s2p2bin’, ‘s3p2bin’, etc.), with each one existing as a fork of another. This fragmentation means that each p2bin, despite being mostly the same as the others, is slightly different.
The differences lie in two things: the value of bytes inserted between segments and the compression of certain Z80 segments:
Sonic 2 uses ‘00’ bytes between segments, while Sonic 1, 3, and Sonic & Knuckles use ‘FF’.
Sonic 1 and Sonic & Knuckles use Kosinski compression while Sonic 2 uses Saxman compression and Sonic 3 uses no compression. Additionally, Sonic 1 and 2 have a single compressed segment that is inserted between its surrounding segments while Sonic & Knuckles has two compressed segments that are written over the previous segment.
Having so many forks that differ in such minor ways is a huge annoyance, as bugfixes made to one of them aren’t always propagated to the others, and sometimes the same feature is implemented in one fork, and then implemented again in another fork in a completely different way. The latter occurred with the Sonic 1 and Sonic & Knuckles disassemblies, which each have their own way of setting the value of padding bytes to 0xFF.
Another major annoyance with these many forks is that it makes it hard to provide binaries for all of the supported platforms: each disassembly supports five different platforms (x86 Windows, x86 Linux, x86-64 Linux, x86-64 Mac, and x86-64 FreeBSD), meaning that, to cover all three disassemblies, 15 binaries are needed. For Mac, I rely on random contributors to provide binaries as I can’t build those myself, but because of the fragmentation Mac binaries provided for one disassembly cannot be used for the others, meaning that said random contributor would need to compile three different p2bin forks to get every disassembly working. In practice this never happens, with the contributor instead only providing the one binary for the one disassembly that they happened to be using at the time.
This issue is made even worse by the fact that custom sound drivers often require their own custom p2bin, as is the case for my Sonic 2 Clone Driver v2 and Flamewing’s Flamedriver, meaning that even if binaries were provided for all five platforms in all three disassemblies, there’s still an extra matrix of binaries that need to be provided for these platforms to support every custom sound driver.
To address all of this, I’ve created a new p2bin. It’s written from scratch to be highly-configurable and to allow me to release it under a permissive licence (the p2bin variants previously used by the disassemblies lacked licensing altogether).
The configurability is achieved through its command-line interface:
Usage: p2bin [options] [input filename] [output filename] [header filename]
Set padding byte to the specified value.
Specify a compressed series of Z80 segments where...
address = Starting address of first compressed segment.
compression = Compression format:
uncompressed = Uncompressed
kosinski = Kosinski (authentic)
kosinski-optimised = Kosinski (optimised)
saxman = Saxman (authentic)
saxman-optimised = Saxman (optimised)
kosinskiplus = Kosinski+
constant = Constant that is used to reserve space for the compressed
type = Method of inserting compressed data:
before = Overlap the previous segment.
after = Insert after the previous segment.
This tool converts a Macro Assembler AS '.p' code file to a ROM file.
Consecutive Z80 segments starting at a specified address can be compressed in a
specified format, and the size of this compressed data will be written to the
The two differences mentioned earlier are accounted for with two option arguments. Now, the padding value and the format of the compressed Z80 code and data can be adjusted by simply tweaking the build script – no new binaries needed.
For added fun, this new p2bin is written entirely in C++-compatible ANSI C. Even its dependencies, spanning three whole compression libraries, are C++-compatible ANSI C. This allows it to be built with pretty much any C/C++ compiler.
With this, p2bin fragmentation should be entirely eliminated, allowing a single set of binaries to cover every supported platform for every disassembly. Hooray for good programming.
Recently Dolphin (a GameCube and Wii emulator) was removed from Steam in response to a Cease & Desist order that Valve received from Nintendo. Nintendo alleged that Dolphin violated the DMCA’s restrictions on distributing ‘circumvention’ software by including copies of cryptographic keys that are required to make games playable.
This begs the obvious question of why one should need Nintendo’s permission to have access to the unencrypted form of the data on the disc that they bought, but my concern is more with what makes distributing a 32-digit number tantamount to circumvention and where the line is drawn.
A decryption key is little more than a magic number that is applied to encrypted data in some arbitrary manner to produce the unencrypted original. It’s like a cypher: take the text “uftu” and offset each letter by -1 to get the original text “test”. In this example, the decryption key is the number ‘-1’.
The main difference between modern encryption and simple cyphers and XOR masks is that it’s feasible to just figure out or guess the decryption key of a simple cypher or XOR mask, while it’s utterly infeasible to determine the decryption key of a modern encryption scheme. This means that the only way to get the decryption key is to obtain it from someone who already knows it.
To my understanding, the main reason that you can’t share random data that you pull out of a game or operating system on the internet is copyright protection: you can’t share a texture or song that you didn’t make because they’re both creative in nature. Decryption keys are different: it’s not that you can’t share them because they’re creative in nature (because they’re not), but rather because the DMCA restricts them under its anti-‘effective-technological-measure circumvention’ umbrella.
It’s easy to get the two confused, however: how is distributing 16 bytes (a decryption key) that you ripped from a Wii any different from distributing 4 gigabytes (an entire game) that you ripped from a Wii disc?
But what if I told you that encryption and decryption don’t have to involve an arbitrary string of bytes? What if I told you that it doesn’t have to involve any data at all? What if I told you that the secret to decrypting some data could be a mere algorithm?
What if I told you that forbidding the distribution of decryption keys by extension suggests that it’s forbidden to disclose such an algorithm?
Imagine that – outlawing an algorithm that can decrypt your legally-owned data because some random cunts on the other side of the planet don’t want you to decrypt it.
Let’s go back to the cypher comparison: I said that the key to decrypting ‘uftu’ is ‘-1’, but that’s not the whole story: you also need to know what to do with ‘-1’. In this example, the key is used to offset each letter of the text. ‘-1’ is the key, and ‘offset the value of each letter by the key’ is the algorithm. We could easily eliminate the need for a key entirely by changing the algorithm to, say, ‘invert the value of each letter’, making the encrypted form of ‘test’ ‘gvhg’.
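To make the difference concrete, here are both forms as toy C++ (lowercase letters only, and obviously nothing resembling real cryptography):

```cpp
#include <string>

// Key-based: 'decrypt' by offsetting each letter by the key (-1 in my example).
std::string OffsetLetters(const std::string &text, int key)
{
    std::string result = text;

    for (char &character : result)
        character = static_cast<char>('a' + ((character - 'a' + key) % 26 + 26) % 26);

    return result;
}

// Keyless: the algorithm alone ('invert the value of each letter') does the
// job, with no magic number involved anywhere.
std::string InvertLetters(const std::string &text)
{
    std::string result = text;

    for (char &character : result)
        character = static_cast<char>('a' + ('z' - character));

    return result;
}
```

‘OffsetLetters’ with a key of -1 turns ‘uftu’ back into ‘test’, while ‘InvertLetters’ is its own inverse: applying it to ‘gvhg’ yields ‘test’ and vice versa, with no key in sight.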
If it’s not legal to distribute a magic number that’s solely responsible for enabling the decryption of “protected” (restricted) data, then surely it’s not legal to distribute an algorithm that has the same effect.
Not to mention how blurry the line between encryption and compression is: if it’s illegal to share an algorithm that converts data from an incomprehensible format to one that’s actually useful, then am I in the wrong for creating and sharing a decompressor for the Kosinski compression format to gain access to deobfuscated assets from the Sonic the Hedgehog games?
That begs the question of how I figured out Kosinski’s format in the first place, and likewise how Dolphin’s developers knew the Wii’s decryption keys: if it’s infeasible to determine these keys and algorithms, then how are they public knowledge? Well it’s because the very people who use these things to (over)protect their intellectual property by “managing” (restricting) the rights of their customers… gave them to us.
That’s right: they made it impossible to figure out how to decrypt their data only to hand the keys straight to us and then throw a tantrum because we “found out” their “secrets”.
The Wii’s decryption keys? They’re in every Wii ever made and can easily be extracted (and likewise, for the Kosinski compression format, there’s decompression code for it in millions of Mega Drive cartridges). Frankly, these decryption keys are public knowledge whether Nintendo like it or not: if they didn’t want us to have them, then they shouldn’t have sold them to us. A secret isn’t a secret if you tell it to everybody who asks.
But I’ll humour the idea for just a minute: let’s say that it is somehow wrong to share a decryption key that over 100,000,000 people own a physical copy of – that’s not so bad, right? Software like Dolphin can just ask users to provide it themselves, like Yuzu and rom-properties do. It’s just a matter of copying some bytes – there’s nothing infeasible about that.
But what about when there is no decryption key, and instead only a decryption algorithm? How is the user supposed to extract that? How is Dolphin or rom-properties supposed to read that? Under this premise, users are either supposed to extract the raw machine code of the Wii’s decryption logic and have the software emulate it, or users are supposed to extract and somehow comprehend the machine code in order to determine the underlying algorithm, correctly implement it in pseudocode, and then have the software read that.
It’s asinine. It shouldn’t be a secret how to decrypt something when that information is already everywhere. Decryption keys are just a layer of obfuscation to hide the fact that decryption is nothing more than an arbitrary algorithm. An idea. Forbidden knowledge that many, many people have but aren’t allowed to share.
I’ve been busy lately, but I’ve come across some free time and figured that I should put together a list of all of my projects from over the years. It can be found by clicking the ‘Projects’ link at the top right of the site.
So far, it lists 41 projects, stretching all the way back to 2009 and spanning my time as a Game Maker noob, Sonic ROM-hacker, Cave Story modder, and general programming hobbyist. It should serve as a nice centralised place to catalogue my projects, which were previously scattered across GitHub and assorted forums, in the same way that the blog is a centralised place to catalogue my recaps of what I’m currently working on.
Speaking of which – where have I been since the last blog post? I was mainly working on ClownMapEd, which is that Qt project that I was talking about in the last post. ClownMapEd is a clone of the SonMapEd sprite editor. It’s close enough to completion that I’ve made a couple of releases. I actually wrote a couple of drafts for a blog post about it, but I always ran out of steam and couldn’t find anything to say.
Another thing I’ve been working on is my game engine: I’ve side-stepped the collision engine issues that have been holding up the project for the last six months and instead focussed on other aspects such as font rendering.
As seen here, it’s now possible to examine things by walking up to them and pressing a button.
Another thing I’ve done is produce a tutorial on restoring the cut Hidden Palace Zone level to Sonic 2:
I think it’s my best video tutorial yet! Unlike in my previous tutorials, I frequently pause and resume the recording to avoid dead air. With it, I was able to condense what took over four hours to do down to a one-and-a-half-hour video.
It’s also the best restoration of Hidden Palace Zone that I’ve done to date: all badniks, all level objects, a few bugfixes, a complete debug object list, and assets taken from the latest prototype to still feature Hidden Palace Zone.
Restoring Hidden Palace Zone is actually one of the first things that I tried to do when I started hacking Sonic 2 way back in 2012. Naturally, it was a struggle, so I could never do it completely.
Hopefully sometime soon I’ll take on a project that I actually have something to say about.
I’ve been working on a C++ project (blame Qt), and I recently stumbled across an issue that seemed to be caused by not following the Rule of 3/5: after ‘reconstructing’ an object by assigning a newly-constructed temporary object to it, my program began crashing with some kind of use-after-free error.
I decided to do some research, which sent me down the rabbit hole that is copy constructors, copy assignment operators, move constructors, and move assignment operators.
After I picked my jaw up off the floor, I set about adding these to one of my classes. Unfortunately, the code was quite large, verbose, and full of duplication:
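To illustrate (with a hypothetical buffer-owning class rather than my actual one), the boilerplate looked something like this – note how the copy constructor and the copy assignment operator duplicate the exact same copying logic, as do the move constructor and the move assignment operator:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

class Image
{
private:
    unsigned char *buffer;
    std::size_t size;

public:
    Image(std::size_t size) : buffer(new unsigned char[size]()), size(size) {}

    // Copy constructor.
    Image(const Image &other) : buffer(new unsigned char[other.size]), size(other.size)
    {
        std::copy(other.buffer, other.buffer + size, buffer);
    }

    // Move constructor.
    Image(Image &&other) noexcept : buffer(other.buffer), size(other.size)
    {
        other.buffer = nullptr;
        other.size = 0;
    }

    // Copy assignment operator: the same copying logic all over again.
    Image& operator=(const Image &other)
    {
        if (this != &other)
        {
            delete[] buffer;
            buffer = new unsigned char[other.size];
            size = other.size;
            std::copy(other.buffer, other.buffer + size, buffer);
        }

        return *this;
    }

    // Move assignment operator: the same moving logic all over again.
    Image& operator=(Image &&other) noexcept
    {
        if (this != &other)
        {
            delete[] buffer;
            buffer = other.buffer;
            size = other.size;
            other.buffer = nullptr;
            other.size = 0;
        }

        return *this;
    }

    ~Image() { delete[] buffer; }

    std::size_t Size() const { return size; }
};
```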
I didn’t like this, especially since a copy/move constructor and its corresponding assignment operator seemed to mostly do the same thing – could these not share code somehow?
A method I found that did allow a constructor and assignment operator to share code was the ‘copy and swap idiom’. Not only that, but it also allowed copy constructors/operators to share code with move constructors/operators. This code compactness seemed great, but I didn’t like that the process of swapping required a third, temporary object. Considering that my objects were responsible for large buffers, this seemed like an awful waste of RAM.
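A sketch of the idiom (using a hypothetical buffer-owning class, not my actual code): a single by-value assignment operator covers both copying and moving, but that by-value parameter is precisely the temporary object that put me off:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

class Image
{
private:
    unsigned char *buffer;
    std::size_t size;

public:
    Image(std::size_t size) : buffer(new unsigned char[size]()), size(size) {}

    Image(const Image &other) : buffer(new unsigned char[other.size]), size(other.size)
    {
        std::copy(other.buffer, other.buffer + size, buffer);
    }

    Image(Image &&other) noexcept : buffer(nullptr), size(0)
    {
        swap(*this, other);
    }

    // One assignment operator for both copy and move: 'other' is a freshly
    // copied (or moved-into) temporary whose contents we simply swap with.
    Image& operator=(Image other) noexcept
    {
        swap(*this, other);
        return *this;
    }

    ~Image() { delete[] buffer; }

    friend void swap(Image &first, Image &second) noexcept
    {
        std::swap(first.buffer, second.buffer);
        std::swap(first.size, second.size);
    }

    std::size_t Size() const { return size; }
};
```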
The code that I’d written had a lot of duplication: the code used to copy/move each of the object’s buffers was exactly the same. This had me wondering if I could make a buffer class that would allow both buffers to share their copy/move code. But, wait, doesn’t C++ already have a bunch of container classes that do that? After giving it some thought, I settled on replacing my class’s buffers with vectors, and, as a result, I was able to greatly simplify the constructors and assignment operators:
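As a sketch of that simplification (again with a hypothetical class, not my actual code), the raw buffer becomes a std::vector and every special member function collapses into a one-liner:

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

class Image
{
private:
    std::vector<std::uint8_t> buffer;

public:
    Image(std::size_t size) : buffer(size) {}

    Image(const Image &other) : buffer(other.buffer) {}
    Image(Image &&other) noexcept : buffer(std::move(other.buffer)) {}
    Image& operator=(const Image &other) { buffer = other.buffer; return *this; }
    Image& operator=(Image &&other) noexcept { buffer = std::move(other.buffer); return *this; }
    ~Image() {} // Nothing to do: the vector cleans up after itself!

    std::size_t Size() const { return buffer.size(); }
};
```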
My, it’s so minimal! It’s so sleek! It’s so efficient that the destructor doesn’t need any code!
Wait… the destructor doesn’t need any code?
The Rule of 3/5 says that if you need a copy/move constructor, copy/move assignment operator, or a destructor, then you probably need all of them. But clearly I don’t actually need an explicit destructor anymore, as the default implicit one will do the job just fine!
Actually… now that I think about it, all of those methods can be replaced with their defaults.
After doing that, here’s my code:
That’s right: there isn’t any! By using the proper containers, I don’t need explicit copy/move constructors, copy/move assignment operators, or even a destructor anymore! The code’s practically writing itself!
Dear ImGui’s default style looked a bit tacky and ‘programmer art’-like to me, so I’ve made my own:
I’ve tried to create a typical ‘dark mode’ theme while still keeping a high degree of contrast in order to maintain good legibility. Another focus of the style is technical and artistic minimalism, hence the removal of borders, tab/scrollbar rounding, and colour (colour introduces visual noise, and rounding greatly increases the number of rendered polygons).
Optimised VRAM Viewer
The VRAM viewer wasn’t very efficient: it would try to render every single tile even when only a fraction of them are actually visible to the user. This is something that can be seen with Dear ImGui’s built-in debugger:
As the debugger shows, hidden tiles are rendered above and below the viewable region. This causes 4096 polygons to be drawn in total, when far fewer are actually needed.
I discovered that Dear ImGui has a feature specifically intended to address this kind of problem: the List Clipper! The List Clipper automates the process of selecting only the elements in a list that are visible to be rendered. Applying this to the VRAM viewer only required the smallest bit of refactoring (changing a single for-loop that iterated over each tile into two for-loops that iterate over each row and then each tile in each row) as well as the removal of a small hack, and the problem was solved!
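The List Clipper handles the bookkeeping for you, but the arithmetic at its heart is simple: derive the first and last visible rows from the scroll position and viewport height, and submit only those. A standalone sketch of that calculation (not Dear ImGui’s actual implementation):

```cpp
#include <algorithm>

struct VisibleRows
{
    int first, last; // Half-open range: rows [first, last) are visible.
};

// Given the scroll offset and viewport size, determine which rows of a
// fixed-height list actually need to be rendered.
VisibleRows ComputeVisibleRows(float scroll_y, float viewport_height, float row_height, int total_rows)
{
    const int first = std::max(0, static_cast<int>(scroll_y / row_height));
    const int last = std::min(total_rows, static_cast<int>((scroll_y + viewport_height) / row_height) + 1);

    return {first, last};
}
```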
In this example, the number of polygons has been reduced to just over 500!
Support for the YM2612’s Timers
Huzzah – an improvement to the actual Mega Drive emulation!
The Mega Drive’s primary sound chip – the YM2612 – has a nifty little feature that went tragically underused by Sega: a pair of timers that are capable of raising CPU interrupts. This is notable because the only other interrupts of this kind are the VDP’s V-blank and H-blank interrupts, but those interrupts’ timings vary based on whether the console is a PAL or NTSC model. Additionally, these timers are fully configurable, allowing for the possibility of arbitrarily-timed interrupts!
…And Sega didn’t think to connect these timers to either of the Mega Drive’s CPUs. What a waste.
These timers are still usable, but they must be manually polled by the CPU to check if they’ve expired yet. This wastes precious CPU time, and squanders the timers’ potential for certain uses.
Games typically use these timers for controlling the timing of their sound engines. There are two alternative ways to achieve this, but they both have their own downsides:
One way to control the speed of music and sound effects is to use the V-blank interrupt; however, this interrupt occurs less often on PAL consoles, in turn causing the sound engine to update less often. This causes music and sound effects to play slower on PAL consoles, which is perhaps most famously the case in Sonic the Hedgehog (1991). The game’s sequels avoid this by detecting PAL consoles and forcing every fifth V-blank interrupt to update the music twice, resulting in it having roughly the correct speed, albeit with some minor distortion. Using the YM2612’s timers instead would have avoided this issue entirely, as they run at almost exactly the same rate on PAL consoles as they do on NTSC consoles.
Another method of controlling the speed of audio is to manually time code execution by writing the code so that it uses a certain number of CPU cycles. This approach is quite extreme, but it does see heavy use in Z80 code that is responsible for feeding PCM samples to the YM2612’s DAC channel (such code is called a ‘DAC driver’). Most DAC drivers use idle loops to waste CPU cycles until it is time to send the next sample. Were the YM2612’s timers capable of raising CPU interrupts, then this technique would be largely unnecessary except in the most advanced DAC drivers.
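For reference, the PAL workaround used by Sonic’s sequels boils down to a few lines: 50 V-blanks per second with every fifth one updating the sound engine twice yields roughly 60 updates per second, matching NTSC. A sketch of the idea:

```cpp
// How many times the sound engine should update during this V-blank.
// On PAL (50Hz), every fifth V-blank updates twice: 50 * 6 / 5 = 60 updates
// per second, approximating NTSC's 60Hz.
int SoundEngineUpdates(bool is_pal, unsigned int vblank_counter)
{
    return is_pal && vblank_counter % 5 == 0 ? 2 : 1;
}
```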
The YM2612 timers are quite rudimentary: every time the YM2612 outputs a full frame of audio, Timer A is decremented, and every 16 times a full frame is output, Timer B is decremented.
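In code, that behaviour is tiny. Here’s a sketch reflecting my understanding of the reload semantics – Timer A expires every 1024 − load audio frames, Timer B every (256 − load) × 16 frames – with Timer B’s divide-by-16 prescaler folded into a single countdown:

```cpp
// The YM2612's two interval timers, advanced once per output audio frame
// (one audio frame = 144 of the chip's clock cycles).
struct YM2612Timers
{
    unsigned int a_period, b_period; // Periods, measured in audio frames.
    unsigned int a_remaining, b_remaining;
    bool a_expired = false, b_expired = false;

    YM2612Timers(unsigned int a_load, unsigned int b_load)
        : a_period(1024 - a_load), b_period((256 - b_load) * 16)
        , a_remaining(a_period), b_remaining(b_period) {}

    void AdvanceFrame()
    {
        if (--a_remaining == 0)
        {
            a_remaining = a_period;
            a_expired = true; // The CPU polls (or, on other hardware, would be interrupted by) this flag.
        }

        if (--b_remaining == 0)
        {
            b_remaining = b_period;
            b_expired = true;
        }
    }
};
```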
Unfortunately, I ended up not realising this, and redundantly refactored my YM2612 emulator to operate in cycles instead of audio frames (there are 144 cycles to 1 audio frame). The reason that I made this mistake was that the (unofficially translated) official documentation for the YM2608 (the chip that the YM2612 is derived from) states that Timer A decrements every 72 cycles, not 144. And yet, my own testing and the documentation found here both suggest that the timers are twice as slow as the YM2608 manual claims. This may have something to do with how the YM2612 differs from the YM2608 in how it “mixes” all six channels together (it doesn’t: it just cycles between outputting each one hundreds of thousands of times per second).
Since these timers are typically used to time sound engines, failing to emulate them caused many games to output no audio. Such games include Vectorman and Castlevania: Bloodlines. With support for the timers added, these games now produce audio.
Another game affected by this is an old ROM-hack of mine – ‘Sonic 2 except the music goes as fast as you do’:
Unlike the vanilla Sonic the Hedgehog 2, this ROM-hack uses Timer A to control the speed of the sound engine. By adjusting the timer, the speed of the music and sound effects is changed!
clownmdemu’s frontend is a rather complex bit of software, so, to do the things that it does, it leverages a number of open-source libraries (and fonts). These libraries are made available under certain conditions: for example, a library may require that its authors are credited in the documentation of any software that uses it, while other libraries go a step further and require that an entire copy of the library’s licence is provided in said documentation. The libraries used by my frontend tend to require the latter.
Being a minimalist, I don’t like the idea of every release of my frontend bundling the executable with a dozen text files, so instead I had the idea of embedding the licences into the frontend itself. To this end, there is now an ‘About’ menu which gives a brief overview of what the program is and provides a list of open-source licences.
This makes it a lot easier for myself and anyone else who uses my emulator to abide by the various licences, as there no longer has to be any worry about forgetting to reproduce licences with every binary distribution.
Personally, I think the requirement to reproduce a big blob of legalese with every binary distribution of non-copyleft software is stupid, which is why my libraries are usually zlib- or 0BSD-licensed instead, as they don’t have that requirement.
Support for the Window Plane
Another big feature for Mega Drive emulation!
The Window Plane is an oddity to me: the first two Sonic games (whose codebases I am very familiar with due to spending over a decade reverse-engineering them) never use it, so it’s completely alien to me. That’s why it took me so long to add support for the Window Plane: I had neither the need nor the knowledge to.
The Window Plane is essentially a bizarre override of Plane A. Unlike Plane A, the Window Plane cannot be scrolled, but it is otherwise capable of everything that Plane A is. The Window Plane is not rendered on top of Plane A in the way that Plane A is rendered on top of Plane B; rather, the Window Plane renders instead of Plane A. The VDP specifies two boundaries – one vertical and one horizontal – that determine where Plane A stops being drawn and the Window Plane starts being drawn instead (or vice versa).
This feature tends to be used for drawing a HUD that does not scroll with the game’s foreground or background. An example of a game that does this is Castlevania: Bloodlines:
This feature has a glaring bug on real Mega Drives: if the Window Plane is drawn to the left of Plane A, and Plane A is scrolled horizontally by a number of pixels that is not a multiple of 16, then the two columns of Plane A tiles that are next to the Window Plane will be “disfigured”. This bug is noted in Sega’s official Mega Drive developer documentation (the “Genesis Software Manual”, page 50). My emulator does not yet reproduce this bug, but I plan to add it in the future.
One other bit of software that I have on-hand to test the Window Plane is a little bit of homebrew that I found here, which was made by someone called ‘Fonzie’. It’s useful for illustrating the aforementioned bug on real Mega Drives, but it’s also handy for testing that emulators support the Window Plane. Here’s a screenshot of it running in my emulator:
Like the Sprite Plane, Plane A, and Plane B, the Window Plane can be disabled in my frontend’s debugging toggles menu. As a novelty, disabling the Window Plane will cause Plane A to be drawn in its place. This allows the user to see the part of Plane A that is “hidden” by the Window Plane.
Support for the 68000’s BCD Instructions
The 68000 has three instructions for performing binary-coded-decimal arithmetic. The long and short of it is that performing a BCD addition between 0x1 and 0x9 results in 0x10 instead of 0xA. This is useful in situations where you need to extract individual decimal digits from a number but don’t want to resort to repeatedly dividing by 10 to do so, as, if the number is in BCD format, you can instead just use bit-shifts and bit-masks, which are way faster than divisions. The two games that I know of which use BCD instructions use them for HUD elements, which makes sense since each digit needs to be extracted so that it can be used to determine which number graphic to display on the HUD.
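The documented half of BCD addition is just nibble-wise addition with a decimal correction whenever a nibble exceeds 9. A sketch for a single packed-BCD byte (deliberately not modelling the 68000’s undocumented edge-cases, flags, or the extend bit):

```cpp
// Add two packed-BCD bytes (two decimal digits each), e.g. 0x45 + 0x38 = 0x83.
unsigned int AddBCDByte(unsigned int a, unsigned int b)
{
    unsigned int low = (a & 0xF) + (b & 0xF);
    unsigned int high = (a >> 4 & 0xF) + (b >> 4 & 0xF);

    // Decimal correction: skip the six unused values (0xA-0xF) of each nibble.
    if (low > 9)
        low += 6;

    high += low >> 4;

    if (high > 9)
        high += 6;

    // Bit 8 of the result acts as the decimal carry.
    return (high << 4 | (low & 0xF)) & 0x1FF;
}
```

This is why 0x1 + 0x9 yields 0x10: the low-nibble sum of 10 spills into the high nibble once corrected.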
The reason for these instructions taking so long to be implemented, besides them being quite niche, is that they have an absurd number of undocumented behaviours and edge-cases. For instance, the BCD instructions are some of the only instructions to have their overflow condition code behaviour officially specified by Motorola themselves as ‘undefined’, meaning that the official documentation is of no help in understanding how it works. The documentation also fails to explain what is supposed to happen when a BCD operation is performed on non-BCD numbers. For instance, what’s supposed to happen when 2 is added to 0xC? Is the result 0xE? 0x12? 0x10?
Luckily, other emulator developers encountered this problem too, and exhaustively documented how a real 68000 performs its BCD operations, figuring out every quirk and feature. This information has been collected in this SpritesMind thread. Most notably, Flamewing provided some homebrew that you can run on an emulator or a real Mega Drive to verify that its BCD instructions do as they should. Safe to say, my emulator now passes all of this homebrew’s tests!
With this, three of the 68000’s quirkiest instructions are now fully emulated!
The lack of BCD instruction emulation broke Castlevania: Bloodlines in a couple of humorous ways: it was impossible to use items because the ‘gem’ counter would never go above 0, and it was also not possible to get a Game Over because the lives counter would never go below 0, which effectively gave the player infinite lives.
Vastly-Improved Z80 Emulation Accuracy
Some people have told me that certain Sonic ROM-hacks were missing their DAC audio output when run in my emulator. One such hack that I was able to verify this with was Sonic 1 Megahack: Ultra Edition, which lacked the music and sound effects during the “Sonic have pased” popup after completing a level.
Unlike my 68000 emulator, my Z80 emulator had not been verified against any test suites, so it was likely that my Z80 emulator had a number of bugs and inaccuracies that these ROM-hacks were invoking, causing them to misbehave and not output audio through the DAC channel.
Back when I initially developed my Z80 emulator, I had read a blog post that detailed something called ‘ZEXDOC’, which is a program that is written in Z80 assembly and can be run on both real and emulated Z80 CPUs to verify that each instruction performs properly, which it does by analysing RAM, registers, and flags before and after each instruction. The catch was that this program is intended to be run on CP/M, which is an old operating system for the Z80. With that said, the dependency on CP/M was quite minimal: ZEXDOC only used two of CP/M’s console-printing system calls, and expected itself to be placed at address 0x100.
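Those two system calls are CP/M’s BDOS console functions, invoked by a ‘CALL 5’ with the function number in register C: function 2 prints the character in register E, while function 9 prints the ‘$’-terminated string pointed to by register pair DE. A sketch of a host-side handler (the register-passing interface here is an assumption about the emulator, and output is collected into a string rather than printed, for simplicity):

```cpp
#include <cstdint>
#include <string>

// Handle a trapped 'CALL 5' from ZEXDOC: the BDOS function number is in
// register C, with the argument in register E or the DE register pair.
void HandleBDOSCall(std::uint8_t c, std::uint8_t d, std::uint8_t e, const std::uint8_t *memory, std::string &console_output)
{
    switch (c)
    {
        case 2: // C_WRITE: print the character in E.
            console_output += static_cast<char>(e);
            break;

        case 9: // C_WRITESTR: print the string at DE until a '$' is found.
            for (std::uint16_t address = d << 8 | e; memory[address] != '$'; ++address)
                console_output += static_cast<char>(memory[address]);
            break;
    }
}
```

Trap the call, load ZEXDOC at 0x100, and that really is ‘just enough of CP/M’ for it to run.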
I was able to quickly rig up a small program that implemented just enough of CP/M to get ZEXDOC running on my Z80 emulator, and it signalled that many instructions were not working as intended:
Unfortunately, ZEXDOC gave little feedback as to exactly which instructions went wrong and how. Not to mention, if my emulator is performing Z80 instructions incorrectly, and ZEXDOC uses Z80 instructions to perform the calculations that determine whether other Z80 instructions work correctly, then how can I be sure that its calculations are being performed correctly? What if some of these failed tests were false positives?
So I couldn’t trust ZEXDOC’s output, nor make attempts to narrow down exactly what inaccuracies my Z80 emulator had.
This problem didn’t exist with my 68000 emulator’s test suite, as the actual tests are not performed by the emulated 68000 itself but rather the host computer’s CPU. If I could find a test suite like that for the Z80, then that would be a great help. Not to mention that a test suite like the one used for my 68000 emulator would do a much better job of helping me verify exactly which instructions are incorrect and why, since it lists the exact contents of each register and memory address before and after the tested instruction’s execution.
I was in luck, because I stumbled across a Z80 test suite that was almost identical to the test suite that I used for my 68000 validator!
Due to its similarity to the 68000 test suite, I was able to make a modified version of my 68000 validator read the Z80 test suite’s data and begin performing said tests on my Z80 emulator. It immediately uncovered numerous bugs, of which some were minor (such as undocumented flag behaviours) and others were severe (such as entire swaths of instructions using the wrong operands).
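The idea behind this style of test suite can be sketched simply: each test case supplies a complete ‘before’ and ‘after’ CPU state, and the harness loads the former, executes a single instruction, and compares the result against the latter. The state layout and step function below are illustrative, not my validator’s actual API:

```c
#include <stdint.h>
#include <string.h>

typedef struct
{
    uint32_t data_registers[8];
    uint32_t address_registers[8];
    uint32_t program_counter;
} CPUState;

typedef void (*StepFunction)(CPUState *state);

/* Returns 1 if executing one instruction transforms 'initial' into 'expected'. */
int run_test_case(const CPUState *initial, const CPUState *expected, StepFunction step)
{
    CPUState state = *initial;

    step(&state);

    return memcmp(&state, expected, sizeof(CPUState)) == 0;
}

/* Example 'instruction' for demonstration: a NOP that just advances the PC. */
static void example_nop_step(CPUState *state)
{
    state->program_counter += 2;
}
```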
Soon, I had my Z80 emulator passing every test, with the exception of instructions which I had yet to implement in the first place, such as IN, OUT, CPI, CPIR, and DAA. Now confident that my emulator could run ZEXDOC at least somewhat properly, I tried it again and, this time, most of the tests passed:
The only failed tests were for instructions which I had not yet implemented in my emulator. I have never seen these instructions used in any Z80 code for the Mega Drive, be it code from official games or homebrew. Because of this, implementing these instructions was never a priority for me, and I had no test-cases for them either. However, because both the Z80 test suite and ZEXDOC provide exhaustive tests for these instructions, I decided that I would finally add them to my emulator. Eventually, these new instructions passed both validators, meaning that every ZEXDOC test passed:
There are still three instructions that are not yet implemented: HALT, IN, and OUT. The reason that the latter two are not implemented is because they are special instructions that write to “IO ports”, which do not exist on the Mega Drive.
The instructions that I did implement are CPI, CPD, CPIR, CPDR, and DAA. The first four are all variants of each other, and are used for searching for a particular byte in a block of memory (much like C’s memchr function). The DAA instruction is for performing “BCD correction” to the accumulator register. It essentially performs the same task as the second half of the 68000’s ABCD, NBCD, and SBCD instructions: it computes a ‘correction factor’ based on the output of the previous arithmetic instruction, and adds it to the output to make it into a valid BCD number.
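As a rough illustration, the correction factor for the addition case can be computed like so. This is a deliberately simplified sketch: the subtraction case, the flag outputs, and the undocumented edge-cases are all omitted here.

```c
#include <stdint.h>

/* Compute and apply the BCD correction factor after an addition, given the
   half-carry and carry flags produced by that addition. */
uint8_t bcd_correct_after_addition(uint8_t value, int half_carry, int carry)
{
    uint8_t correction = 0;

    /* If the low nibble overflowed past 9, push it back into BCD range. */
    if (half_carry || (value & 0x0F) > 0x09)
        correction |= 0x06;

    /* Likewise for the high nibble. */
    if (carry || value > 0x99)
        correction |= 0x60;

    return value + correction;
}
```

For example, adding the BCD numbers 0x15 and 0x27 produces the binary sum 0x3C, which the correction turns into the valid BCD result 0x42.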
Now that my Z80 emulator is much closer to behaving like a real Z80, those Sonic ROM-hacks should all have working DAC audio output, right?
Well, the joke’s on me, because it turns out that Sonic 1 Megahack: Ultra Edition still has missing DAC audio. But why? Why does the audio still not work despite the Z80 emulation being so much more accurate? Well… it’s because the audio doesn’t work on a real Mega Drive!
Yep, you read that right: that ROM-hack only works properly on inaccurate emulators, because that’s all that the developer used to test the game during its development. This is actually a fairly common occurrence when it comes to ROM-hacks, since running a ROM-hack on a real Mega Drive requires an expensive flash-cartridge, so most developers just stick to testing exclusively with emulators.
So, in the end, the audio in Sonic 1 Megahack: Ultra Edition was bugged, not because my emulator was inaccurate, but because it was too accurate. I suppose I should be proud of that.
Improved FM Debug Menu
While debugging Sonic 1 Megahack: Ultra Edition‘s missing audio, I realised that the FM debugger could be improved, so I’ve moved the per-FM-channel data out of the tabbed section and into a shared table, and exposed the timers, latched address and port, and channel panning.
Fix 1-Cell Horizontal Scrolling Mode
This is another bug that I didn’t notice because no Sonic game or ROM-hack that I know of uses it. The Mega Drive’s VDP has three ways of scrolling the screen horizontally:
Scrolling the entire screen.
Scrolling each row of pixels individually.
Scrolling each row of pixels in groups of 8 (the size of a tile, or, as Sega’s official documentation calls it, a “cell”).
This bug involves that last one: it just didn’t work at all. This can be seen in Earthworm Jim‘s “What the heck?” level:
So what’s going on? When I first wrote my VDP emulator, I assumed that 1-cell mode kept all of its scroll values next to each other in memory, just like in 1-line mode. However, that is not the case: strangely, each cell’s scroll value is actually spaced 8 values apart. Correcting this behaviour fixes Earthworm Jim, and presumably everything else that uses 1-cell mode.
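The difference can be expressed as a pair of tiny indexing functions. This is a sketch with illustrative names, and my first, incorrect assumption is shown in the comment:

```c
#include <stddef.h>

/* 1-line mode: one scroll entry per scanline. */
size_t hscroll_index_line_mode(size_t scanline)
{
    return scanline;
}

/* 1-cell mode: one scroll entry per group of 8 scanlines, but with each
   entry spaced 8 entries apart rather than packed consecutively.
   (My original, incorrect assumption was simply 'scanline / 8'.) */
size_t hscroll_index_cell_mode(size_t scanline)
{
    return (scanline / 8) * 8;
}
```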
Add General VDP Debugger
To diagnose the above bug, I needed information on what the VDP was doing, so I added a menu to show the VDP’s various settings. It’s not all that pretty, but it gets the job done for now:
It was with this that I figured out that Earthworm Jim‘s scrolling was only broken when the VDP was in 1-cell horizontal scrolling mode, so this menu should prove useful in the future too.
Abandon ‘tiny file dialogs’ Library
Dear ImGui and SDL2 are great and all, but they don’t provide a cross-platform way to browse files. This is a problem since it’s important for the user to be able to select a ROM image or a save-state file to load or save. To this end, the frontend made use of the ‘tiny file dialogs’ library, which enables the use of the operating system’s standard file dialogs.
Unfortunately, the library is as much a help as it is a burden: its code is of questionable quality, producing various compiler warnings for such novice mistakes as returning pointers to local arrays. In addition, despite emphasising portability and supporting POSIX, the library is incompatible with the BSDs. Finally, its support for symbolic links is allegedly completely broken.
With the library being riddled with code hygiene issues and actively limiting the portability of my frontend, I’ve decided to ditch it. In its place, I’ve added a barebones file input prompt that leverages Dear ImGui.
Neither C++11 nor SDL2 provides a way of querying directories for their contents, so this is the only universally-compatible solution. I could have used C++17’s file-system API, but I worry that such a new API is not yet ubiquitous.
Of course, using such a limited, clunky way of opening files would harm the frontend’s usability, so I intend to add platform-specific logic to use the native file dialogs whenever possible. Right now, this has been done for Windows, and I intend to do the same for Linux (GTK and Qt) soon. Users of other operating systems will have to get comfy with the barebones dialog, but it’s still an improvement over the frontend not compiling at all.
The barebones file dialog supports drag-and-drop: just drag the desired file onto the window and its path will automatically be entered into the text box. The file dialog doesn’t even have to be open: ROMs and save states can be dragged onto the window at any time, and the frontend will apply them appropriately. Intuitiveness is great.
Fix VRAM Fill
The Mega Drive’s Video Display Processor has three Direct Memory Access modes: 68000 to VDP, VRAM Fill, and VRAM to VRAM. The first is used quite often and implemented in my emulator, the second is much less common and supported by my emulator but not well tested, and the third is rare and not yet in my emulator at all.
Across a large number of games (including Sonic, Castlevania, Vectorman, and Earthworm Jim), VRAM Fill is seemingly only used for setting large swaths of VRAM to 0. This basic usage is easy to emulate, but does not serve well to ensure that said emulation is entirely accurate to the behaviour of a real Mega Drive.
Cue Mega Man: The Wily Wars, which makes some interesting use of VRAM Fill: at the start of VRAM, it stores a handful of blank tiles, each of a different colour – these tiles are all generated with VRAM Fill.
This behaviour was enough to expose issues with my emulator’s implementation of VRAM Fill:
Looking at the VRAM debugger shows exactly what’s going wrong:
Every other pair of pixels was being filled with the wrong colour. I immediately had my suspicions about the cause: back when I was first writing the VDP emulator in September 2021, I learnt that VRAM is apparently 8-bit, while CRAM and VSRAM are 16-bit. This caught me by surprise, as I’d always assumed that all three of them were 16-bit, so I was not sure what the proper way to implement this quirk was. Instead, I implemented VRAM as 16-bit, as I had originally planned. I figured that this might cause certain edge-cases – such as uploading data to an odd VRAM address – to behave incorrectly, but otherwise it would not be an issue. However, seeing this bug in Wily Wars convinced me that this workaround had to go.
It turns out that many aspects of my emulator’s implementation of VRAM Fill were incorrect: for instance, it mistakenly assumed that the length was measured in 16-bit words, when in reality it was measured in 8-bit bytes minus one. Additionally, the 16-bit word that specifies the value to fill the VRAM with is not entirely used: only the upper 8 bits of it are actually written to VRAM.
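Those two corrections can be sketched like this. The names and the address handling are illustrative, and other VDP details (such as the exact address-wrapping and increment behaviour) are omitted:

```c
#include <stddef.h>
#include <stdint.h>

/* Fill VRAM as described above: the length register holds the byte count
   minus one, and only the upper 8 bits of the fill value are written. */
void vram_fill(uint8_t *vram, size_t vram_size, uint16_t address,
               uint16_t length_minus_one, uint16_t value)
{
    const uint8_t fill_byte = value >> 8; /* Only the upper byte is used. */
    uint32_t i;

    for (i = 0; i <= length_minus_one; ++i)
        vram[(address + i) % vram_size] = fill_byte;
}
```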
Addressing these issues fixed the bug, and Wily Wars now looks much better:
Wily Wars still has one small bug: those four coloured dots near the start of VRAM. The game appears to accidentally write a VDP command ($8F02) to the VDP’s data port instead of its command port, causing it to be uploaded to VRAM as graphics. While the game certainly makes the same mistake on a real Mega Drive, it doesn’t seem to result in those pixels being written to that particular place in VRAM, as it causes artifacts to be visible in the level which aren’t present on a real Mega Drive:
See the pixels in that pit? Those aren’t there on a real Mega Drive. Just another inaccuracy to fix in the future, I suppose.
Implement YM2612’s ‘BUSY’ Flag
This is a feature that wasn’t too important to emulate, but it’s so simple that I figured I might as well.
The YM2612 takes time to process the data that it is given. To signal to the CPU that it can’t accept more data yet, the YM2612 sets the high bit of its status byte. The CPU obtains this byte by reading from one of the YM2612’s address ports.
On the YM2612, the busy flag is extremely basic: it is set whenever either data port is written to, and always lasts for 32 YM2612 cycles (192 68000 cycles), regardless of how long the submitted data actually takes to process. I’ve heard that the longest YM2612 operation is only 24 cycles, and that the YM3438 actually sets the busy flag for durations that match each operation’s processing time.
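Measured in 68000 cycles, that behaviour boils down to very little code. The struct and function names below are made up for illustration:

```c
#include <stdint.h>

#define BUSY_DURATION_68K_CYCLES 192 /* 32 YM2612 cycles */

typedef struct
{
    uint32_t busy_until_cycle;
} YM2612;

/* Any write to either data port starts the fixed-length busy period. */
void ym2612_data_port_written(YM2612 *ym2612, uint32_t current_cycle)
{
    ym2612->busy_until_cycle = current_cycle + BUSY_DURATION_68K_CYCLES;
}

/* The status byte's high bit is the busy flag. */
uint8_t ym2612_status(const YM2612 *ym2612, uint32_t current_cycle)
{
    return current_cycle < ym2612->busy_until_cycle ? 0x80 : 0x00;
}
```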
With this feature implemented, any software that explicitly relies on the busy flag for timing should work correctly now. An Earthworm Jim game and a game called Hellfire apparently rely on this.
Add ‘Other’ Debugging Menu
To expose yet more of the Mega Drive’s internal state to the user, a menu has been added to show information about the general console, rather than specific components. Right now, this menu revolves around the bus arbiter, but it will be expanded with other information in the future, as the need arises.
This has been a pretty massive update (and a massive blog post), so I’m feeling a bit burnt-out on working on this emulator. Progress may be slow for a while after this. I’m pretty happy to see how far this emulator has come, though! It wasn’t long ago that this emulator couldn’t even boot anything, and now look at what it can do: play the classic Sonic trilogy, run Linux, and even run ROM-hacks and homebrew!
This is just a quick update to address some issues in the previous v0.3 release.
Make FM Debugger More Compact
The FM debugger was a bit ‘verbose’ in v0.3…
As you can see, each channel was given its own window, which meant that it was a lot of effort to simply switch from one channel to another without just having all windows open at the same time, which would take up a lot of the screen.
Since it’s unlikely that a user would ever need to see more than one FM channel’s registers at a time, these windows have all been merged into a single tabbed window:
Fix DPI Support
Unfortunately, despite being hyped-up so much in v0.3’s release, the default window sizes were broken at DPI scales other than 150%. I was expecting Dear ImGui to handle DPI differences like this automatically, as it usually does, but that’s not the case here.
I’ll have to remember to test this frontend at alternate DPIs before each release to prevent a repeat of this mistake.
Add a Horizontal Scrollbar to the Plane Debugger
As the result of yet another strange quirk of Dear ImGui, horizontal scrollbars do not exist by default, even in windows that need them. This affected the VDP’s Plane A/B debuggers, which only had a vertical scrollbar. By explicitly telling Dear ImGui to create a horizontal scrollbar, this issue is no more:
User-Friendliness Improvements to Keyboard Rebinding
Sometimes it’s the small things that matter most.
When the user is repeatedly adding key bindings, the newly-extended binding list would push the ‘Add Binding’ button off-screen, requiring the user to scroll down to be able to press it again. This is a small annoyance, but an annoyance nonetheless, so it has been corrected by automatically scrolling the window down after a new binding is added.
Additionally, when selecting an action to bind to a selected key, the selected key is displayed to the user. This extra feedback allows the user to verify that they selected the correct key, instead of them being left in the dark.
Fix Phantom Keyboard Inputs After Rebinding
Sometimes, after rebinding the keyboard inputs, the emulated Control Pad would behave as if certain buttons were held when they were not. This was due to edge-cases in how the key-binding system worked: for instance, if a key’s binding was changed after the key had been pressed but before it was released, then the emulator would ‘forget’ which Control Pad button to release when the key was released. This should no longer be the case.
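The fix can be sketched as latching the bound action at press time, so that the matching release always targets the same action even if the binding changes while the key is held. These names are illustrative, not the frontend’s actual code:

```c
#include <stddef.h>

#define TOTAL_KEYS 256

typedef enum { ACTION_NONE, ACTION_BUTTON_A, ACTION_BUTTON_B } Action;

static Action bindings[TOTAL_KEYS];    /* Current key-to-action mapping. */
static Action held_action[TOTAL_KEYS]; /* Action latched at press time.  */

Action key_pressed(size_t key)
{
    /* Remember what the key meant at the moment it was pressed. */
    held_action[key] = bindings[key];
    return held_action[key];
}

Action key_released(size_t key)
{
    /* Release the latched action, not whatever the key is bound to now. */
    const Action action = held_action[key];
    held_action[key] = ACTION_NONE;
    return action;
}
```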
Fix Ugly Seams Around Tiles in VRAM Debugger
Depending on the display’s DPI, odd artifacts could appear around the tiles in the VRAM viewer:
This was the result of some accidental fractional image scaling. This has been corrected to use the proper integer image scaling, eliminating the seams.
With this much-needed polishing complete, hopefully the next update will include some improvements to the core emulation: Window Plane, SRAM, LFO, SSG-EG, YM2612 Timers – there are plenty of things left to add.
This update mostly affects the standalone frontend, but a couple of the changes also apply to the libretro core.
One shortcoming of the standalone frontend is that it lacks keyboard rebinding: the W, A, S, and D keys will always control the Control Pad’s D-Pad, and so on.
But not anymore!
New to the frontend is full keyboard rebinding! In addition, the default key bindings have been switched to the more common arrow keys and Z/X/C keys combination.
Unlike some other emulators, this system allows the user to bind multiple keys to the same action: for instance, if the user wanted to bind both the ‘Z’ key and the ‘space’ key to the Control Pad’s ‘A’ button, then they can do so!
It would be pretty frustrating for binding customisations to be lost whenever the program is closed, so support has been added for persistent configuration: settings are saved to a file called ‘clownmdemu-frontend.ini’, allowing settings such as the keyboard bindings, console region, and V-sync to be remembered by the emulator.
Previously, the options would all be managed through the menu bar, but this is quite clunky as the menu bar would close after each option is toggled. To improve the user experience, the options have now been moved to a dedicated menu:
This menu provides a much more intuitive way to change options! Additionally, each option shows a tooltip when hovered over with the mouse, allowing unfamiliar users to understand what they do!
Default Window Sizes
Another improvement to the user experience is that windows are now given a sane default size, meaning that they will now have a proper size when opened for the first time.
Opening the same ROM file over and over again is tedious, so now the emulator keeps a list of the 10 most recent files used:
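A recent-file list like this amounts to a small most-recently-used structure: the newest path goes to the front, duplicates are removed, and the oldest entry falls off the end once the cap is reached. A minimal sketch, with illustrative names and limits:

```c
#include <stddef.h>
#include <string.h>

#define MAX_RECENT_FILES 10
#define MAX_PATH_LENGTH 260

static char recent_files[MAX_RECENT_FILES][MAX_PATH_LENGTH];
static size_t total_recent_files;

void add_recent_file(const char *path)
{
    size_t i;

    /* If the path is already in the list, remove it first. */
    for (i = 0; i < total_recent_files; ++i)
    {
        if (strcmp(recent_files[i], path) == 0)
        {
            memmove(&recent_files[i], &recent_files[i + 1],
                    (total_recent_files - i - 1) * MAX_PATH_LENGTH);
            --total_recent_files;
            break;
        }
    }

    /* Drop the oldest entry if the list is full. */
    if (total_recent_files == MAX_RECENT_FILES)
        --total_recent_files;

    /* Shift everything down and place the new path at the front. */
    memmove(&recent_files[1], &recent_files[0],
            total_recent_files * MAX_PATH_LENGTH);
    strncpy(recent_files[0], path, MAX_PATH_LENGTH - 1);
    recent_files[0][MAX_PATH_LENGTH - 1] = '\0';
    ++total_recent_files;
}
```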
FM and PSG Debugging Toggles
The standalone frontend has had the ability to disable individual VDP planes for ages, but now it can also toggle FM and PSG channels. A dedicated menu has been added for this:
This feature is also available in the libretro core:
PSG Debugger Overhaul
The PSG debugging menu was butt-ugly before, and has been given a makeover:
Support for Alternate PAL Detection Method
Previously, when playing Sonic the Hedgehog 2 with the emulated Mega Drive in PAL mode, the music would play at a slightly slower speed, just like it does in the first game. This shouldn’t happen.
The reason that this was occurring was that the game relies on an alternative method of detecting the PAL video mode: checking bit 0 of the VDP’s control port, which should reflect whether PAL mode is enabled. Now that my emulator implements this, the game properly detects and accounts for the speed difference in its music, allowing it to play at the proper speed.
User-friendliness has been a focus of this update, so hopefully this will make the standalone frontend much more accessible to new users!
It’s been too long, but finally my emulator has an update!
Since the first release, the emulator has been greatly optimised, some inaccuracies in the 68000 interpreter have been addressed, and the occasional missing CPU instruction has been added. Compatibility with games should be a bit better than before, but still not great as many essential features of the Mega Drive are not emulated.
The standalone frontend has had some extra debug menus added, which allow you to view the registers of the YM2612, 68000, and Z80:
New to the emulator is a libretro core frontend, allowing the emulator to be used by libretro implementations such as RetroArch. It lacks the debug menus of the standalone frontend, but makes up for it with features that libretro cores get for free, like customisable controllers and shaders:
In theory, the libretro core should provide a simple way of getting this emulator running on a variety of platforms: just compile the core into a library (static or shared), and use it in tandem with a libretro frontend such as RetroArch.
During the development of this update, I have set up a test suite for the 68000 interpreter which allows me to check that each instruction does as it is supposed to. It was this test suite that notified me of how the word-size ADDA, SUBA, and CMPA instructions were pitifully broken. I’m surprised that this didn’t break Sonic 1, 2, or 3&K, but it did break Linux.
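One classic pitfall with the word-sized address-register instructions is sign extension: ADDA.W (like SUBA.W and CMPA.W) sign-extends its 16-bit source to 32 bits before operating on the full address register. I wouldn’t claim that this was the exact bug here – it’s just the usual suspect – but a sketch of the correct behaviour looks like this:

```c
#include <stdint.h>

/* Word-sized ADDA: the 16-bit source is sign-extended to 32 bits, then
   added to the entire address register (zero-extending instead is a
   common mistake). */
uint32_t adda_word(uint32_t address_register, uint16_t source)
{
    return address_register + (uint32_t)(int32_t)(int16_t)source;
}
```

So adding 0xFFFF (that is, -1) to an address register decrements it, rather than adding 65535.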
I also made a small benchmarking tool which measures the speed of the core emulation logic. This is useful for measuring the impact of optimisations and the difference in speed between platforms.
Overall, this has been a rather incremental update. Rather than being focussed on optimisation and refactoring, I hope that the next update will be focussed on improving compatibility and emulating more features of the Mega Drive.
You can find the standalone frontend here, and the libretro frontend here.
Years ago, I wrote an MD5 hasher. For some reason, I never gave it a proper release, instead only including a copy of it in one or two of my projects. That’s finally changed, and I figured that I’d mark the occasion by giving a recap of its history here. It’s a bit more complex than you’d expect.
I originally wrote my MD5 hasher as part of a university assignment. It was meant to be written in C#, but I preferred C, so I wrote it in that instead and converted it to C# after I had it fully tested and working.
The conversion itself was simple enough: while getting the hasher to produce the correct hashes in the first place had been a bit of a nightmare due to parts of the specification being easily glossed-over, the actual conversion to C# only had one mishap: right-shifting by 32 resulted in a right-shift by 0 instead. Shifting a 32-bit value by 32 is actually undefined behaviour in C, so I had to correct that to get consistent behaviour between the two languages.
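The kind of portable guard that resolves this can be sketched in a couple of lines: C leaves a 32-bit shift by 32 undefined, while C# masks the shift count (turning a shift by 32 into a shift by 0), so an explicit check is needed for the shift to reliably yield zero in both languages.

```c
#include <stdint.h>

/* Logical right-shift that behaves consistently for any shift amount,
   including 32 and above. */
uint32_t logical_shift_right(uint32_t value, unsigned int amount)
{
    return amount >= 32 ? 0 : value >> amount;
}
```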
It seems like this trick paid off, because I never caught any flak for the code not being written like ‘proper’ C# or anything like that.
After submitting my MD5 hasher, I forgot about it for months (or maybe years) until stumbling across it again and deciding to clean it up a little: I converted it to a single-header library (one of the first that I had ever made), and overhauled the API to be lower-level by allowing data to be streamed to it a chunk at a time instead of all at once.
Despite this, I didn’t release the new and improved hasher, and instead just placed it in a directory called ‘clownlibs’ which contained assorted small libraries of varying degrees of polish. I had considered releasing them all on GitHub in a single repository, à la stb, but I became paranoid about how it would be impossible to star a particular library, or have a submodule pull in one specific library (something that came to a head a few blog posts ago), so I ended up endlessly putting it off.
A long time later, I was overhauling the build systems of the various Sonic the Hedgehog disassemblies, converting them from Batch/Bash/Python to Lua. The disassemblies relied on being able to produce hashes of the assembled ROM image, and comparing them against a series of hashes to determine the ROM image’s accuracy. Previously, this had been done with Python, but, with Python being replaced with Lua, there was no longer a hasher built into the language’s standard library that could be relied on. I tried to source a Lua hasher online, but all of the ones that I could find were absurdly slow. In hindsight, this was probably because Lua had only recently introduced support for integers and bitwise operations like AND and OR, meaning that those hashers were instead simulating them using floating-point operations, which, frankly, blows my mind.
Not realising this at the time, I instead assumed that the problem was simply that neither Lua nor the hashers that I had tried were very fast. This made me remember my own MD5 hasher, which was optimised for performance and portability above all, and I figured that I should try porting that to Lua to see if it performed any better than the others.
The process of porting the hasher to Lua wasn’t too complicated, though Lua does have a number of syntax differences from C that had to be accounted for. Lua’s ubiquitous tables also meant that portions of the code had to be rewritten to be more natural to the language.
Before long, I had a working MD5 hasher written in Lua that performed wonderfully, annihilating the other hashers in terms of speed. This hasher would find its way into the disassemblies of Sonic 1, Sonic 2, and Sonic 3 & Knuckles.
My MD5 hasher came in handy once more as I was working on my Wii U port of Sonic Mania: that game’s built-in MD5 hasher was garbage, and I’d always wanted to test my MD5 hasher on a big-endian platform like the Wii U, so I swapped the two. My hasher integrated into the codebase pretty well, and even eliminated some thread-safety issues. As expected, it worked perfectly on the big-endian CPU, putting the prior hasher to shame. The game does a lot of hashing, so it was really putting my hasher to the test! It was also just so cool to see my software being leveraged by an actual game.
After that, my MD5 hasher returned to its slumber once more, until today: I happened to take a look in my ‘clownlibs’ directory, and noticed that my hasher was the only library in there which I hadn’t eventually released: clowncommon.h and clownresampler.h have their own GitHub repositories now, but clownmd5.h still remained hidden. At last, I figured I’d put it off long enough, and finally created a GitHub repository for my hasher, years after first writing it.
Honestly, I didn’t expect to get so much mileage out of this library: it was just some university coursework, and yet it ended up being used by a bunch of different projects. Since it was so useful to me, it will hopefully be useful to others too. It’s licensed under the 0BSD licence, so there’s no reason not to go nuts with it!