
Making My Own Game Engine

Where the heck have I been lately? Starting yet another project, of course.

Why an engine? Well, there’s only so much time I can spend poking around other people’s game engines before I start wanting to make my own… which in my case is about 10 years.

Personally, I have a real hate boner for premade game engines like Unreal and Unity: taking the programming out of game development just takes all the fun out of it. In addition, I’m a massive sucker for NIH syndrome: I love making my own versions of things. MD5 hasher? I made one of those. Graph-based LZSS compressor? That too. Mega Drive emulator? Yup. I always prefer to use things that I made over things that other people made, so why not make my own game engine that I can just play around with too?

As always, this project has some goofy programming requirements: in particular, I want it to be written in C89, and have no global state. Why C89? Because I like being able to compile my code with ancient compilers like MSVC 6. I also see it as a fun challenge, being unable to rely on a built-in boolean type or 64-bit integers. Why no global state? Because I want to allow multiple instances of the engine to be run at once, just like my Mega Drive emulator. This could be useful for things like adding a ‘race’ mode to the engine that allows two people to play a game at once and see who can beat it first.

C89’s shortcomings come down to its requirement that declarations always come before code (which leads to very ugly code), its lack of 64-bit integers, its lack of a boolean type, and its lack of many C standard library functions, macros, and constants. Most of these haven’t been an issue for me (so far I’ve had no need for 64-bit integers, for instance), and the missing things that I do need can simply be implemented from scratch. For this, I created a header called ‘clowncommon.h’ that defines such things as a boolean type and a Pi constant, as well as some bonuses like min/max/clamp macros and a degree-to-radian macro.
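A minimal sketch of what such a header might contain is shown here. This is not the real ‘clowncommon.h’: every name in it is made up for illustration, and the real header may define these things quite differently.

```c
/* Hypothetical stand-in for a 'clowncommon.h'-style header. */
/* All names here are illustrative assumptions, not the real header's. */
#ifndef COMMON_SKETCH_H
#define COMMON_SKETCH_H

/* C89 has no 'bool', so use a small integer type and two constants. */
typedef unsigned char sketch_bool;
#define SKETCH_FALSE 0
#define SKETCH_TRUE 1

/* The C89 standard does not define a Pi constant in <math.h>. */
#define SKETCH_PI 3.14159265358979323846

/* Beware: these macros evaluate their arguments more than once. */
#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))
#define SKETCH_MAX(a, b) ((a) > (b) ? (a) : (b))
#define SKETCH_CLAMP(low, high, x) SKETCH_MIN(high, SKETCH_MAX(low, x))

#define SKETCH_DEGREES_TO_RADIANS(degrees) ((degrees) * SKETCH_PI / 180.0)

#endif /* COMMON_SKETCH_H */
```

Because these are macros rather than functions, they work with any arithmetic type, at the cost of the usual double-evaluation hazard with arguments that have side effects.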

Eliminating global state from the engine was tricky due to many dependencies having unavoidable global state, particularly GLFW and OpenGL. This is understandable, especially in GLFW’s case, but the tricky part is knowing what should and shouldn’t be global. For instance, there’s no harm in having multiple instances of the game engine generate vertex buffers from the same OpenGL context, but the instances should not share the same pool of vertex buffers, as one instance exhausting the pool would cause it to be exhausted for the other instances as well, breaking the encapsulation between them. Having designed the engine around this from the start, it wasn’t hard to eventually get a working build of the engine running two instances of itself at once, rendering to a single window:

The same scene being displayed by two separate instances of the engine in horizontal split-screen.

I’m quite proud of the object system: like the 16-bit Sonic engine, objects are allocated from a pool, but unlike the Sonic engine, objects run their constructor as soon as they’re allocated, rather than during the first invocation of their ‘update’ function. Additionally, my engine’s object system has support for destructors. Finally, the pool is managed with a pair of linked lists: one for allocated objects, and one for deallocated objects. This makes allocation and deallocation ‘O(1)’, compared to the ‘O(n)’ of the Sonic engine’s allocation, which performs a slow array search. Additionally, the linked lists make iterating over all allocated objects faster, since the deallocated objects are skipped without needing to be checked first.
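The two-list idea can be sketched roughly as follows. This is an illustration under assumptions rather than the engine’s actual code: the struct layout, the function names, and the example constructor and destructor are all invented here.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 64

typedef struct Object
{
	struct Object *previous, *next; /* Links within whichever list holds the object. */
	void (*destructor)(struct Object *object);
	int data;
} Object;

typedef struct ObjectPool
{
	Object objects[POOL_SIZE];
	Object *allocated;   /* Head of the list of live objects. */
	Object *deallocated; /* Head of the list of free objects. */
} ObjectPool;

void ObjectPool_Initialise(ObjectPool *pool)
{
	size_t i;

	pool->allocated = NULL;
	pool->deallocated = NULL;

	/* Every object starts on the free list. */
	for (i = 0; i < POOL_SIZE; ++i)
	{
		pool->objects[i].previous = NULL;
		pool->objects[i].next = pool->deallocated;
		pool->deallocated = &pool->objects[i];
	}
}

Object* ObjectPool_Allocate(ObjectPool *pool, void (*constructor)(Object *object), void (*destructor)(Object *object))
{
	Object* const object = pool->deallocated;

	if (object == NULL)
		return NULL; /* The pool is exhausted. */

	/* O(1): pop from the free list, push onto the allocated list. */
	pool->deallocated = object->next;
	object->previous = NULL;
	object->next = pool->allocated;
	if (pool->allocated != NULL)
		pool->allocated->previous = object;
	pool->allocated = object;

	/* The constructor runs immediately, not on the first update. */
	object->destructor = destructor;
	if (constructor != NULL)
		constructor(object);

	return object;
}

void ObjectPool_Deallocate(ObjectPool *pool, Object *object)
{
	assert(object != NULL);

	if (object->destructor != NULL)
		object->destructor(object);

	/* O(1): unlink from the allocated list, push onto the free list. */
	if (object->previous != NULL)
		object->previous->next = object->next;
	else
		pool->allocated = object->next;
	if (object->next != NULL)
		object->next->previous = object->previous;

	object->previous = NULL;
	object->next = pool->deallocated;
	pool->deallocated = object;
}

/* Iteration only ever touches live objects. */
size_t ObjectPool_CountAllocated(const ObjectPool *pool)
{
	size_t total = 0;
	const Object *object;

	for (object = pool->allocated; object != NULL; object = object->next)
		++total;

	return total;
}

/* Example callbacks, purely for demonstration. */
static void ExampleConstructor(Object *object) { object->data = 1; }
static void ExampleDestructor(Object *object) { object->data = 0; }
```

The allocated list is doubly linked so that an arbitrary object can be unlinked without a search, while the free list only ever needs its head touched.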

I’d have more to show off, but I’ve been stuck for the past few weeks on implementing collision detection. I didn’t want to use a discrete collision algorithm, as that would allow things to phase through walls if they move fast enough, so instead I looked into continuous collision algorithms. I found this one which, while not perfect, seemed to be good enough for a first attempt. Taking into account criticisms of the algorithm such as this and this, I was able to produce an implementation of the algorithm that lacked some of its original shortcomings, such as allowing the collision ellipsoid to become infinitely close to collision triangles. Despite this, I have been stuck debugging an issue involving the collision response step overreacting and pushing the collision ellipsoid far away from the triangle that it intersected with. To tell you the truth, I wasn’t expecting this to be so difficult: I’ve made so many things over the past 10 years that I wasn’t expecting to be stumped by something as ‘simple’ as level collision. Rather than work on my engine proper, I’ve instead had to divert my effort to writing a debugging visualiser for the collision system. Bleh, at least I got to write a sphere mesh generator – that was fun.

A generated sphere, which has been scaled into an ellipsoid that represents the player’s collision shape.

Maybe I should use this debug visualiser as an excuse to develop the engine further, such as by adding a GUI or input rebinding, or perhaps even a full-blown debug mode. Who knows.

Hopefully, I’ll crack the collision bug soon, and I can start making some real progress on this engine.

clownmdemu – 68k Overhaul

Ever since adding support for the Z80, the next improvement that I wanted to make to my emulator was overhauling the 68k CPU interpreter.

As detailed in the first emulator-related blog post, the 68k CPU interpreter was one of the first parts of the emulator that I created. It was also the first CPU interpreter that I had ever written. Safe to say, there were a lot of mistakes made with it: in particular, I optimised it for size rather than speed.

Each of the 68k’s instructions follows very similar steps: read the source operand, read the destination operand, perform the instruction’s action (such as subtraction or addition), write the destination operand, and update the condition codes. Some instructions skip certain steps, or use alternate versions of steps that are also used by other instructions. Because of all this shared logic, I made each step of instruction execution a switch statement, something like this:

/* Read source operand. */
switch (instruction)
{
	case INSTRUCTION_ADD:
	case INSTRUCTION_SUB:
	case INSTRUCTION_MOVE:
		/* Read standard operand. */
		source = ReadOperand(&source_operand);
		break;

	case INSTRUCTION_ADDI:
	case INSTRUCTION_SUBI:
		/* Always a literal */
		source = ReadLiteral();
		break;

	case INSTRUCTION_NOP:
		/* Doesn't have a source operand. */
		break;
}

/* Read destination operand. */
switch (instruction)
{
	case INSTRUCTION_ADD:
	case INSTRUCTION_SUB:
	case INSTRUCTION_ADDI:
	case INSTRUCTION_SUBI:
		/* Read standard operand. */
		destination = ReadOperand(&destination_operand);
		break;

	case INSTRUCTION_MOVE:
	case INSTRUCTION_NOP:
		/* Doesn't read its destination operand. */
		break;
}

/* Perform instruction. */
switch (instruction)
{
	case INSTRUCTION_ADD:
	case INSTRUCTION_ADDI:
		destination += source;
		break;

	case INSTRUCTION_SUB:
	case INSTRUCTION_SUBI:
		destination -= source;
		break;

	case INSTRUCTION_NOP:
		/* Does nothing. */
		break;

	case INSTRUCTION_MOVE:
		destination = source;
		break;
}

/* Write destination operand. */
switch (instruction)
{
	case INSTRUCTION_ADD:
	case INSTRUCTION_SUB:
	case INSTRUCTION_ADDI:
	case INSTRUCTION_SUBI:
	case INSTRUCTION_MOVE:
		/* Write standard operand. */
		WriteOperand(&destination_operand, destination);
		break;

	case INSTRUCTION_NOP:
		/* Doesn't write its destination operand. */
		break;
}

While this does eliminate duplicate code, it adds a lot of runtime overhead. For some reason, GCC does not produce jump tables from these switch statements either, instead producing a gross if-else chain that further adds to the overhead.

This was a mistake that I made sure not to make with the Z80 CPU interpreter, which only has the one switch statement, like this:

switch (instruction)
{
	case INSTRUCTION_ADD:
		/* Read source operand. */
		/* Read standard operand. */
		source = ReadOperand(&source_operand);

		/* Read destination operand. */
		/* Read standard operand. */
		destination = ReadOperand(&destination_operand);

		/* Perform instruction. */
		destination += source;

		/* Write destination operand. */
		/* Write standard operand. */
		WriteOperand(&destination_operand, destination);

		break;

	case INSTRUCTION_SUB:
		/* Read source operand. */
		/* Read standard operand. */
		source = ReadOperand(&source_operand);

		/* Read destination operand. */
		/* Read standard operand. */
		destination = ReadOperand(&destination_operand);

		/* Perform instruction. */
		destination -= source;

		/* Write destination operand. */
		/* Write standard operand. */
		WriteOperand(&destination_operand, destination);

		break;

	case INSTRUCTION_ADDI:
		/* Read source operand. */
		/* Always a literal */
		source = ReadLiteral();

		/* Read destination operand. */
		/* Read standard operand. */
		destination = ReadOperand(&destination_operand);

		/* Perform instruction. */
		destination += source;

		/* Write destination operand. */
		/* Write standard operand. */
		WriteOperand(&destination_operand, destination);

		break;

	case INSTRUCTION_SUBI:
		/* Read source operand. */
		/* Always a literal */
		source = ReadLiteral();

		/* Read destination operand. */
		/* Read standard operand. */
		destination = ReadOperand(&destination_operand);

		/* Perform instruction. */
		destination -= source;

		/* Write destination operand. */
		/* Write standard operand. */
		WriteOperand(&destination_operand, destination);

		break;

	case INSTRUCTION_MOVE:
		/* Read source operand. */
		/* Read standard operand. */
		source = ReadOperand(&source_operand);

		/* Read destination operand. */
		/* Doesn't read its destination operand. */

		/* Perform instruction. */
		destination = source;

		/* Write destination operand. */
		/* Write standard operand. */
		WriteOperand(&destination_operand, destination);

		break;

	case INSTRUCTION_NOP:
		/* Read source operand. */
		/* Doesn't have a source operand. */

		/* Read destination operand. */
		/* Doesn't read its destination operand. */

		/* Perform instruction. */
		/* Does nothing. */

		/* Write destination operand. */
		/* Doesn't write its destination operand. */

		break;
}

While a lot more verbose, it was considerably faster: roughly 3 times as fast as the 68k CPU interpreter.

I didn’t want to work on my emulator any further until this was fixed: what was the point in improving the emulator when such a core component is fundamentally flawed? However, addressing this took much longer than planned, perhaps due to scope creep: I didn’t want to manually ‘inline’ the code for every instruction, as the 68k has around 100 instructions totalling around 2000 lines of code, and not only would doing so take forever and result in an unmaintainable mess of duplicate code, but it would be extremely prone to human error.

Inspired by floooh’s Z80 interpreter, I decided that I should automate the process. This would greatly reduce the chance for human error while simplifying future overhauls. At first, I planned on doing the same as floooh, and condensing every instruction down to metadata in a table, which could then be parsed by a generator which emits code to execute said instruction. However, I ran into issues with how aspects of instructions can only be represented by algorithms rather than data, and how the table would contain a massive amount of repeating data, which is exactly what I was trying to avoid. Eventually I settled for something much more primitive: instead of parsing a big table, I would just take the existing interpreter and modify it to emit code instead of execute it. This way I can have the best of both worlds: the generator will have next to no duplicate code, while the generated output will have no unnecessary overhead.
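To illustrate the ‘emit instead of execute’ idea, here is a small sketch built on the toy instruction set from the earlier examples. It is not the real generator: the function names are invented, and only two of the steps are shown (the remaining steps would follow the same pattern).

```c
#include <assert.h>
#include <stdio.h>

typedef enum Instruction
{
	INSTRUCTION_ADD,
	INSTRUCTION_SUB,
	INSTRUCTION_ADDI,
	INSTRUCTION_SUBI,
	INSTRUCTION_MOVE,
	INSTRUCTION_NOP,
	INSTRUCTION_TOTAL
} Instruction;

static const char* const instruction_names[INSTRUCTION_TOTAL] = {
	"INSTRUCTION_ADD",
	"INSTRUCTION_SUB",
	"INSTRUCTION_ADDI",
	"INSTRUCTION_SUBI",
	"INSTRUCTION_MOVE",
	"INSTRUCTION_NOP"
};

/* The same per-step switch as the small interpreter, */
/* except that it prints the step's code instead of running it. */
static void EmitReadSource(FILE *out, Instruction instruction)
{
	switch (instruction)
	{
		case INSTRUCTION_ADD:
		case INSTRUCTION_SUB:
		case INSTRUCTION_MOVE:
			fputs("\t\tsource = ReadOperand(&source_operand);\n", out);
			break;

		case INSTRUCTION_ADDI:
		case INSTRUCTION_SUBI:
			fputs("\t\tsource = ReadLiteral();\n", out);
			break;

		default:
			/* No source operand, so nothing is emitted. */
			break;
	}
}

static void EmitAction(FILE *out, Instruction instruction)
{
	switch (instruction)
	{
		case INSTRUCTION_ADD:
		case INSTRUCTION_ADDI:
			fputs("\t\tdestination += source;\n", out);
			break;

		case INSTRUCTION_SUB:
		case INSTRUCTION_SUBI:
			fputs("\t\tdestination -= source;\n", out);
			break;

		case INSTRUCTION_MOVE:
			fputs("\t\tdestination = source;\n", out);
			break;

		default:
			/* Does nothing, so nothing is emitted. */
			break;
	}
}

/* Writes one big switch statement, with every step pre-resolved per case. */
/* Returns the number of cases that were emitted. */
int EmitInterpreter(FILE *out)
{
	int i;

	assert(out != NULL);

	fputs("switch (instruction)\n{\n", out);

	for (i = 0; i < INSTRUCTION_TOTAL; ++i)
	{
		fprintf(out, "\tcase %s:\n", instruction_names[i]);
		EmitReadSource(out, (Instruction)i);
		EmitAction(out, (Instruction)i);
		fputs("\t\tbreak;\n\n", out);
	}

	fputs("}\n", out);

	return i;
}
```

Calling EmitInterpreter(stdout) prints a single flat switch statement. The shared logic lives once in the generator, and all of the duplication ends up in its output.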

It took some wrangling, but I eventually got this done, and the performance improvement was a very welcome 60%+: the 68k CPU interpreter now ran at around the same speed as the Z80 CPU interpreter, all from just eliminating some switch statements.

The generated interpreter, while readable and resembling hand-written C code, is rife with duplicate code, taking it from 2000 lines of code to 5000. Because of this, I still prefer to work with the original small interpreter. Maybe, at some point, the generator will become too much effort to maintain and I’ll start editing the ‘inlined’ interpreter directly, but that remains to be seen.

While the original 68k interpreter was indeed slow, it was by no means the slowest part of the emulator. In fact, it was only the third slowest, behind the VDP renderer (slightly slower) and the emulator’s cycle-ticking function (3 times slower).

The VDP renderer being slower can be understood, but why is the cycle-ticking function so slow, and 3 times slower no less? Isn’t its whole purpose to just sit in a loop calling the functions that do the actual emulation? It turns out that this was down to bad programming: the Mega Drive’s master clock runs at about 52MHz, so the emulator’s cycle-ticking function must do 52,000,000 cycles per second. How did it do this? With a plain-ass for-loop. The thing is, most of these cycles are wasted as the 68k only runs 1 out of every 7 cycles, and the Z80 only runs 1 out of every 15 cycles, resulting in most iterations of the for-loop doing nothing but creating overhead. The solution? Stop iterating the for-loop a single cycle at a time. Instead, iterate as many cycles as necessary for the 68k or Z80 to update, or for the Mega Drive to finish updating for the current frame. With this, the cycle-ticking function went from being 3 times slower than the VDP renderer to being about the same speed as it.
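The difference between the two loops can be sketched like this. It is a simplified model and not the emulator’s real code: the struct, the function names, and the use of counters in place of actually running the CPUs are all assumptions made for the example.

```c
#include <assert.h>

#define M68K_DIVIDER 7  /* The 68k runs once every 7 master cycles. */
#define Z80_DIVIDER 15  /* The Z80 runs once every 15 master cycles. */

typedef struct ClockState
{
	unsigned long m68k_countdown; /* Master cycles until the 68k next runs. */
	unsigned long z80_countdown;  /* Master cycles until the Z80 next runs. */
	unsigned long m68k_runs;      /* Stand-in for actually running the 68k. */
	unsigned long z80_runs;       /* Stand-in for actually running the Z80. */
} ClockState;

void ClockState_Initialise(ClockState *state)
{
	state->m68k_countdown = M68K_DIVIDER;
	state->z80_countdown = Z80_DIVIDER;
	state->m68k_runs = 0;
	state->z80_runs = 0;
}

static void RunDueCPUs(ClockState *state)
{
	if (state->m68k_countdown == 0)
	{
		state->m68k_countdown = M68K_DIVIDER;
		++state->m68k_runs;
	}

	if (state->z80_countdown == 0)
	{
		state->z80_countdown = Z80_DIVIDER;
		++state->z80_runs;
	}
}

/* The slow way: one loop iteration per master cycle. */
/* Returns the number of loop iterations performed. */
unsigned long TickOneCycleAtATime(ClockState *state, unsigned long cycles)
{
	unsigned long iterations = 0;
	unsigned long i;

	for (i = 0; i < cycles; ++i)
	{
		--state->m68k_countdown;
		--state->z80_countdown;
		RunDueCPUs(state);
		++iterations;
	}

	return iterations;
}

/* The fast way: jump straight to whichever event comes next. */
/* Returns the number of loop iterations performed. */
unsigned long TickSkipping(ClockState *state, unsigned long cycles)
{
	unsigned long iterations = 0;

	while (cycles != 0)
	{
		/* Advance to the next CPU update, or to the end of the requested span. */
		unsigned long step = cycles;

		if (state->m68k_countdown < step)
			step = state->m68k_countdown;
		if (state->z80_countdown < step)
			step = state->z80_countdown;

		assert(step != 0);

		cycles -= step;
		state->m68k_countdown -= step;
		state->z80_countdown -= step;
		RunDueCPUs(state);
		++iterations;
	}

	return iterations;
}
```

Over 105 master cycles, both functions run the 68k 15 times and the Z80 7 times, but the first loop iterates 105 times while the second iterates only 21 times.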

…As you may have noticed, this means that tweaking a for-loop resulted in a bigger performance gain than overhauling the 68k CPU interpreter. That hurts me.

Still, this means that my emulator is a lot faster now! It might not be running full-speed on old DOS PCs any time soon, but it’s still a win for running well on limited hardware, or in exotic environments like Emscripten.

In the future, I’d like to look into optimising my CPU interpreters further by extending the single switch statement from per-instruction to per-opcode, eliminating yet more overhead at the cost of a massive increase in code size and the resulting cache misses. Better yet, I’d like my generator to be able to produce both per-instruction and per-opcode interpreters, to suit whichever platform the emulator is being built for.

Until then, I think the next thing I work on will be adding support for the Window Plane to the VDP renderer. Maybe after that I can finally optimise the VDP renderer so that it’s not the slowest part of the emulator.

As usual, if you’re interested in checking out the emulator yourself, you can find binaries and source code on GitHub.

Porting Sonic Mania to the Wii U

I’m still quite busy these days, but I did find time in the last month to start a new project:

As you can imagine, this was pretty similar to the Sonic CD port that I did last year: not only do they run on the same engine (RSDK – Mania just uses a newer version), but they also have the same dependencies (SDL2 and libtheora).

The process of porting the game wasn’t quite as simple though: while Sonic CD just required some endianness and thread-safety fixes, Sonic Mania had major performance issues. In particular, Competition Mode, the Special Stages, the pinball table, and the Stardust Speedway Zone Act 2 boss fight would all lag severely. The reason for this is that Sonic Mania is software-rendered, and its fancier effects are simply too much for the Wii U’s CPU to handle.

But that wasn’t the only problem that I had to deal with: this port also exposed issues in the Wii U’s SDL2 port as well!

But I’m getting ahead of myself… let’s start at the beginning:

Timeline

Getting it to Boot

In the middle of August, the Sonic Mania decompilation was released, and news of it reached a small Discord server that I was in. One thing led to another, and I accidentally gave people the impression that I would port the decompilation to the Wii U, as I had done previously with Sonic CD. Eventually, word spread to GBAtemp, and I soon had people messaging me, asking about the port that I was allegedly working on. Realising that I’d kind of done this to myself, I figured that I might as well give it a shot. I had an afternoon all to myself – just me, my Wii U, a copy of the decompilation, devkitPPC, and no internet – so I got to work. Within a few hours, I had the game running, albeit very rough and laggy.

Building the game was simple enough: the build system is not complex, so I was able to quickly write my own CMake script from scratch (the decompilation only comes with a Makefile, but CMake scripts are much easier to target the Wii U with). The Wii U already had a port of SDL2, and I was able to compile my own Wii U port of libtheora, so the dependencies were quickly satisfied as well. But once I had produced a Wii U executable, I found that running it would immediately result in an endless black screen – the game was crashing upon boot.

Luckily, the game has built-in error reporting, and I found that the game was failing to locate a file. The game keeps all of its files in a single archive blob, with each one being represented by an MD5 hash of its original filename. To load a file, the game hashes the filename at runtime and then uses that to locate the file in the archive. But here’s the thing: the MD5 hasher that the game uses is garbage: it’s literally just some random code grabbed off a random website.

This hasher’s code commits every sin:

  • Tiny, incomprehensible variable names (because why would anyone want to understand what a variable does or is for?).
  • No documentation (no, really, who cares about code being readable and maintainable?).
  • Uses dynamic memory for no reason (MD5 has absolutely no need for dynamic memory: everything can be done on the stack with a small buffer).
  • Doesn’t handle malloc failure (who doesn’t like null pointer dereferences?).
  • Is dependent on implementation-defined integer type sizes (because, as we all know, ‘int’ and ‘long’ are the same size on x86-64 Linux as they were on DOS, am I right?).
  • Uses signed ints for representing the size of arrays (ah yes, an array of -0x1000 chars, please).
  • Only works on little-endian CPUs (because screw basic portability, especially in code that’s, you know, shared online and intended to be used by people in a variety of projects).

While the other points could be considered nit-picks, that last one prevents the game from running on the Wii U, which has a big-endian CPU. How is it not compatible with big-endian CPUs? It casts an array of chars to an int, instead of just bit-shifting and OR-ing them together like sane code would.
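For comparison, a portable version of that step might look like the sketch below. The function name is made up, and this is not the code from the decompilation or from the fix: it only demonstrates the shift-and-OR technique. MD5 reads its input as little-endian 32-bit words, which is why the bytes are assembled in this order.

```c
/* Assembles a 32-bit little-endian value from four bytes. */
/* This produces the same result on little-endian and big-endian CPUs, */
/* because it never reinterprets the bytes through a pointer cast. */
unsigned long ReadU32LE(const unsigned char *bytes)
{
	return (unsigned long)bytes[0]
	     | ((unsigned long)bytes[1] << 8)
	     | ((unsigned long)bytes[2] << 16)
	     | ((unsigned long)bytes[3] << 24);
}
```

Using unsigned long also sidesteps the integer-size complaint from the list above, since C guarantees that it holds at least 32 bits.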

Because of this bug, the calculated hashes would be incorrect, causing the game to fail to find vital files and crash.

Anyway, with that fixed, the game can now run, right? Nope: it still crashes on boot. It’s not just the MD5 hasher that only works on little-endian CPUs: the entire game engine is the same way! Oddly, there is some degree of explicit big-endian support in the engine’s code, but I guess due to bit-rot, other parts of the engine lack it, such as the UTF-16 string reader. After patching this all up, the game could finally boot.

RGB565

The game was now running, but there was one pretty big problem: it was very, very laggy in anything but the main levels. Special Stages, animated cutscenes, pinball, Competition Mode, they would all lag to half or even a third of the game’s usual speed.

As I explained earlier, this is largely due to the game being software rendered, and the Wii U’s CPU being a dinosaur that’s still compatible with GameCube code, but it’s also caused in part by SDL2: you see, the game outputs an RGB565 framebuffer, but the Wii U port of SDL2 doesn’t support RGB565, causing SDL2 to convert it to a variant of RGBA8888 and use that instead. This adds a massive amount of overhead to the game’s rendering. Despite this, the Wii U actually does support RGB565. Indeed, there is even code in the SDL2 port that tries to use it, but it’s dummied out with a comment explaining that it doesn’t work. The cause of this is that there’s a catch to RGB565 support: each 16-bit RGB565 pixel is little-endian, not big-endian. This isn’t too hard to work around, though: just byte-swap the bitmap as it’s passed to the GPU.
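The byte-swap itself is tiny. Here is a rough sketch of it, assuming a plain array of 16-bit pixels; the function name is invented, and the actual change to the SDL2 port may well be structured differently.

```c
#include <stddef.h>

/* Copies RGB565 pixels while swapping the two bytes of each one. */
/* Assumes that 'unsigned short' is 16 bits wide, as it is on the Wii U. */
void SwapRGB565(unsigned short *destination, const unsigned short *source, size_t total_pixels)
{
	size_t i;

	for (i = 0; i < total_pixels; ++i)
	{
		const unsigned int pixel = source[i] & 0xFFFF;

		destination[i] = (unsigned short)(((pixel << 8) | (pixel >> 8)) & 0xFFFF);
	}
}
```

This is far cheaper than a full RGB565-to-RGBA8888 conversion, since no colour channels need to be unpacked or rescaled.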

With this done, the game started to perform a lot better, but it would still drop to half speed a lot. This struck me as odd, since it would drop to half speed even if it should only be lagging slightly. Looking into SDL2’s code, I found the cause: the port uses regular V-sync instead of adaptive V-sync. For those not in the know, adaptive V-sync is a variant of V-sync that doesn’t wait until the next frame if the game is lagging. This means that if the game only barely missed the frame that it was meant to wait for, then it wouldn’t wait for the next one – it would just continue. The result is that the game only lags as much as it has to, instead of dropping to a fraction of its framerate. Making the V-sync adaptive was a simple change to make to SDL2.

With these two fixes combined, the game’s performance was massively improved: many instances of lag were eliminated, and the cases where the game still lagged at least ran at a framerate that was much closer to 60FPS.

YUV420

There was still one part that consistently lagged, however: the animated cutscenes. Unlike the rest of the game’s RGB565, the animations render in YUV420. This is a weird format that encodes colour as a single luminance channel and two chrominance channels. Unlike RGB, these channels are not interleaved, and they’re not even the same resolution: the chrominance channels are half the vertical and horizontal resolution of the luminance channel. It’s really weird.

Like with RGB565 before, SDL2 didn’t support YUV420 on the Wii U, so it was converting the YUV420 image to something else every frame. Unlike RGB565, the Wii U doesn’t support YUV420 natively… however, it can be emulated with a fragment shader.

This is a trick that I learnt from SDL2’s OpenGL renderer: if you create three textures, you can assign one to each of the YUV channels. Then, these textures can be sampled in a fragment shader, and dotted with some lookup tables to produce an RGB pixel. This works out to be much faster than SDL2’s CPU-based conversion, allowing the cutscenes to finally run at full speed.
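The per-pixel maths that the shader performs can be written out in plain C as below. This is only an illustration of the technique, not the shader from the port: the function names are made up, and the coefficients are the full-range BT.601 (JPEG-style) ones, which is an assumption, as video often uses a limited-range variant with slightly different constants.

```c
/* Rounds to the nearest integer and clamps to the range of a byte. */
static unsigned char ClampToByte(double value)
{
	if (value < 0.0)
		return 0;
	if (value > 255.0)
		return 255;

	return (unsigned char)(value + 0.5);
}

/* Converts the pixel at (x, y) of a planar YUV420 image to RGB. */
/* The three planes are separate arrays, like the three textures, */
/* and the U and V planes are half the width and height of the Y plane. */
void YUV420ToRGB(
	const unsigned char *y_plane,
	const unsigned char *u_plane,
	const unsigned char *v_plane,
	unsigned int width,
	unsigned int x,
	unsigned int y,
	unsigned char rgb[3])
{
	const unsigned int chroma_width = (width + 1) / 2;
	const unsigned int chroma_index = (y / 2) * chroma_width + (x / 2);

	const double luma = y_plane[y * width + x];
	const double u = u_plane[chroma_index] - 128.0;
	const double v = v_plane[chroma_index] - 128.0;

	/* Assumed full-range BT.601 coefficients. */
	rgb[0] = ClampToByte(luma + 1.402 * v);
	rgb[1] = ClampToByte(luma - 0.344136 * u - 0.714136 * v);
	rgb[2] = ClampToByte(luma + 1.772 * u);
}
```

Each block of 2x2 luminance samples shares a single U and V sample, which is why the chrominance index is computed from the halved coordinates.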

It’s so pretty that you wouldn’t guess it was encoded in such a gross colour format!

Controller Support

Oddly, the game could only read input from the Wii U Gamepad: the Wii U Pro Controller and Wii Remote didn’t work at all. Looking into it some more, I found that this was because SDL2 only supported those controllers in its legacy ‘joystick’ API, and not its newer ‘game controller’ API. The difference between the two is that the ‘game controller’ API binds each button to an XInput-style layout, avoiding the need for the user to configure their own button mappings like in an old PC game. To achieve this, SDL2 contains a database of known controllers and which buttons map to their equivalent XInput buttons. All I had to do was add the Wii U Pro Controller, Wii Classic Controller, and Wii Remote (with Nunchuk) to this database, and now these controllers can be used. This is great for Competition Mode!

A Gamepad, a DualSense, a Wii Remote, and a Pro Controller, all connecting to the Wii U for a four-player game of Competition Mode! The DualSense is connected to the Wii U with the amazing Bloopair homebrew!

Random Crashing

Ah yes, my worst nightmare: after releasing the first few versions of my port, I began experiencing random crashes. There was no pattern to when they’d occur, so I couldn’t reproduce them at will: all I could do was playtest the game for hours and hope that it would eventually crash. At first, I had no leads to go on, but a handy YouTube comment tipped me off that unlocking achievements would consistently result in a crash. From this, I found that the memory allocator was faulty.

Yes, Sonic Mania has its own memory allocator. It’s neat: it performs automatic garbage collection and defragmentation. However, the decompilation broke it in two key areas: the allocator would increment the total number of allocations even if the allocation failed, and the allocation duplicator would fail to increment the allocation total at all, resulting in memory being destroyed by the garbage collector while still in use. Once these were addressed, the memory allocator was back to functioning as intended.

Awesome! No more random crashes, right? Nope: there are more!

At this point, I was stumped: my only lead was a dead end, leaving me with no other ways of diagnosing what the cause of these crashes was. That was, at least, until a Wii U homebrew developer called Gary informed me that the Wii U’s operating system keeps crash logs in its /storage_slc/sys/logs directory. This was an enormous help: suddenly, I had a full stack trace and register dump of the last 100 crashes that my Wii U had experienced! From this, I found that the crashes were occurring in the audio mixer. A null pointer dereference; strange, as there was only one pointer in the audio mixer (the sample pointer), and it should never be null.

This pointer is obtained from a struct which is only read if a flag in it is set, but, whenever this flag is set, the pointer is set to a non-null value. This could only mean one thing… a thread-safety issue.

Eventually, I found the culprit: the StopSFX function. Most audio functions make sure to lock the audio thread before running, but this one neglects to. I’m not sure if this is the case in the real Sonic Mania, or if this is just a mistake in the decompilation, but either way this failure was resulting in a race condition where the audio thread was running after the flag had been set but before the pointer had been set, resulting in a null pointer being read and processed by the audio mixer, crashing the system. With this fixed, the random crashing was finally a thing of the past.

Aroma

During the development of this port, the Aroma Wii U Homebrew Environment was released. In particular, Aroma features one very interesting development: running homebrew from the Wii U Menu, bypassing the Homebrew Launcher. This meant that homebrew could use the Home button to open the Home Menu, whereas with the Homebrew Launcher it would just abruptly exit the homebrew without warning. Additionally, it meant that homebrew could include a cool splash screen that would be shown during start-up. It’s also just cool to have homebrew displayed alongside official Wii U software and games. Naturally, I updated my port to take advantage of this, making it look and feel extra official.

It’s like it was meant to be!

File IO

The Sonic Mania decompilation supports mods, but I was surprised to find that these mods would load extremely slowly on the Wii U. For instance, a level-replacement mod would take upwards of an entire minute to load. What’s going on? Is the Wii U’s SD card IO really that slow?

At first, I thought that this was an issue with devkitPPC’s implementation of fopen and related functions: perhaps it’s doing some weird slow low-level interaction with the Wii U’s SD card slot? I figured that I could try using the Wii U OS’s native file reading functions instead, only to find out that those are exactly what devkitPPC uses.

Eventually, I began trawling through devkitPPC’s code to see if it was using the native file reading functions incorrectly. On the contrary, however, I found that the code was very well written: there were already clever tricks that exploited the properties of the CPU’s cache to optimise file reading. Higher up the software stack, in the C standard library (newlib), there was even complete IO buffering, greatly reducing the number of IO accesses. What could possibly be going wrong here?

Welp, after examining the code for hours, I realised that I was looking at the wrong branch. After switching to the devkitPPC branch, I found the cause: newlib’s IO buffering was being manually disabled by a hack in the code, causing every byte of read data to result in an IO poke. No wonder it took a minute to load a couple of megabytes. This change had recently been reverted, but an update had yet to be released that contained this change. Still, the C standard library has a function for re-enabling IO buffering (setvbuf), so I made the game use that, and now mods load just as fast as the rest of the game!
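Re-enabling buffering with setvbuf looks roughly like the sketch below. The helper name and the 16KiB buffer size are arbitrary choices made for this example, not what the port actually uses.

```c
#include <stdio.h>

/* Hypothetical helper: switches a freshly-opened stream to full buffering. */
/* setvbuf must be called before any other operation is performed on the stream. */
/* Returns non-zero on success. */
int EnableFullBuffering(FILE *file)
{
	/* Passing NULL asks the C library to allocate the buffer itself. */
	return setvbuf(file, NULL, _IOFBF, 0x4000) == 0;
}
```

A caller would use it immediately after fopen succeeds, before the first read. With full buffering, a loop of single-byte reads turns into one large read per buffer’s worth of data rather than one IO access per byte.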

jubbalub’s Labyrinth Zone mod no longer takes forever to load!

Wrapping Up

After just over a month of work, the port is now quite complete, running fast and stable while providing near feature-parity with official releases. This project has not only led to improvements in the Sonic Mania decompilation, which should benefit all ports and not just this one, but also improvements to the Wii U port of SDL2, which should benefit many Wii U homebrew programs and games which depend on it, granting them support for more controllers, improved support for certain texture formats, and even some bugfixes.

If you’re interested in trying my Sonic Mania port, you can find it here.

If any fancy updates are made to the port, I’ll be sure to talk about them here. I’m still trying to think of a way to port the screen filter shaders, and it would be nice to add full hardware-accelerated rendering to eliminate the remaining lag in the Special Stages. We’ll see. In the meantime, though, I’ve taken an interest in my Mega Drive emulator once again…

Project Sand: Sonic Aftermath

I was digging through some old Sonic hacks of mine when I stumbled across these ancient relics from 2014 and 2016.

Remember that Project Sand/Sonic Aftermath hack that I have a few old videos about? Here’s the one level of it that was worked on before the project died: Sand Zone from Cave Story.

There’s not much that I can tell you about it: this one test level was made before I came up with any ideas for gimmicks and the sort. That purple mess in the middle of the stage is supposed to be spikes, but its graphics are overwritten with the level tiles.

I believe that the music is a straight port from Cave Story: the Organya music was converted to XM with the org2xm tool, then that was converted to SMPS with xm2smps, and then that was converted from binary to ASM with SMPS2ASM, allowing it to be installed in my hack’s custom sound driver. This hack was either rocking my Sonic 2 Clone Driver v2, or Flamewing’s Flamedriver. After all this time, I can’t remember which.

This footage is of an earlier build of my hack than the one seen in the other videos. The reason for this is that the hack was remade from scratch after this build, and this level was never reintroduced afterwards.

Here’s a cutscene that was one of the last things that was worked on before the project died.

This was meant to be the opening cutscene to Knuckles’ story: the hack was meant to be a ‘what if’ scenario where Sonic 2 ends differently, leading to a different series of events in what would be Sonic 3. This cutscene depicts the Death Egg landing in Angel Island’s volcano directly after the events of Sonic 2, instead of the lake like it normally does. Following this, Knuckles would have gone to investigate it. At the time, I didn’t have Knuckles ported into the game, so I used Sonic as a placeholder.

Between this cutscene, Sand Zone, and the custom title screen, this is all there ever was to Project Sand/Sonic Aftermath.

clownmdemu and clownassembler released

Sorry for the drought of blog posts lately: I’ve been busy with work and a lot of other IRL stuff. Still, if there’s one thing I can give an update on, it’s that two of my projects have finally seen a release: clownmdemu and clownassembler have been released on Sonic Retro, Sonic Stuff Research Group, and Mega Drive Developers Collective.

Here are links to the various release threads.

clownassembler was released less than an hour ago, so there’s not much that I can comment on feedback-wise. However, clownmdemu was released at the end of June, giving it plenty of time to receive feedback.

Surprisingly, people were quite enthusiastic and welcoming of the emulator despite its unfinished state. If anything, people seemed more bothered by the hardcoded key bindings than its inability to boot certain games. I can only hope that clownassembler gets an equally warm reception, considering that it is similarly unfinished.

clownmdemu – Z80 Support

With the addition of FM support, my Mega Drive emulator came much closer to being able to provide a complete experience for certain games such as Sonic 1. Unfortunately, there was still one major missing feature: drums, voice clips, and sometimes even all audio entirely, were inaudible. What gives? Sonic 1 plays most of its audio, but 2 and 3 don’t play any at all?

The reason that this is the case lies in the architecture of these games: Sonic 1 uses a sound engine that runs on the 68k CPU, while 2 and 3 use one that runs on the Z80 CPU. Up to this point, my emulator had not emulated the Z80 CPU, which is why no sound played in those two games. Additionally, Sonic 1 uses the Z80 CPU for its drums as well as the famous Sega chant, which is why those were missing as well.

However, this problem is no more: new to clownmdemu is Z80 CPU emulation. It doesn’t implement 100% of the Z80’s feature set, but it’s enough to get at least the Sonic games to output all of their audio. Heck, it’s even enough to get Sonic ROM hacks to play their audio, including the ones that use fancy custom Z80 code:

This ROM hack in particular, Sonic 2 Recreation, uses Z80 code that was written from scratch by ValleyBell and is able to apply a variety of effects to its PCM samples.

Writing another CPU emulator was nice because it gave me a chance to apply what I’ve learnt since writing the 68k CPU emulator. In particular, instead of making each step of execution its own giant switch statement, there’s only one switch statement, with one case for each instruction. This does result in some duplicate code, but the hope is that this is outweighed by avoiding the overhead of going through 6 or so switch statements.

I’ve also used this opportunity to write the machine code decoder in such a way that its job can be replaced with a lookup table. You see, the task of breaking a byte of machine code down into a struct which describes the operation to be performed has been given to a function: this function takes a byte and returns a struct. Because of this modularised design, I can execute a loop on start-up which executes this function for every possible byte, from 0 to 0xFF, and then caches the resulting structs in a big lookup table. Then, during emulation, the machine code to be executed can be used as an index into this table to retrieve its corresponding struct. By doing this, the expensive step of manually decoding the machine code is skipped entirely. According to my tests, this doubles the performance of the Z80 emulator. Though, for RAM-limited platforms, I’ve left a compile-time option to revert back to using the function instead of a lookup table. One day, I would really like to refactor the 68k CPU emulator to bring over these improvements.
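To make the idea concrete, here’s a minimal sketch in C89. It is not clownmdemu’s actual code: the ‘Instruction’ struct, the ‘Decoder’ struct, the function names, and the ‘USE_LOOKUP_TABLE’ option are all made up for illustration, and the decoder does nothing more than split the byte into the three bit-fields that Z80 decoding guides commonly call x, y, and z.

```c
#include <assert.h>

/* Hypothetical option: comment this out to trade speed for RAM. */
#define USE_LOOKUP_TABLE

/* Hypothetical decoded form of one byte of machine code. */
typedef struct Instruction
{
    unsigned char x; /* Bits 7-6. */
    unsigned char y; /* Bits 5-3. */
    unsigned char z; /* Bits 2-0. */
} Instruction;

/* The slow path: break a single byte down into a struct. */
Instruction DecodeOpcode(unsigned int opcode)
{
    Instruction instruction;

    assert(opcode <= 0xFF);

    instruction.x = (unsigned char)((opcode >> 6) & 3);
    instruction.y = (unsigned char)((opcode >> 3) & 7);
    instruction.z = (unsigned char)(opcode & 7);

    return instruction;
}

/* The table lives in a struct rather than a global, so that
   multiple instances can exist at once. */
typedef struct Decoder
{
#ifdef USE_LOOKUP_TABLE
    Instruction table[0x100];
#else
    char unused; /* C89 does not allow empty structs. */
#endif
} Decoder;

/* Run once at start-up: decode every possible byte and cache the result. */
void Decoder_Initialise(Decoder *decoder)
{
#ifdef USE_LOOKUP_TABLE
    unsigned int i;

    for (i = 0; i < 0x100; ++i)
        decoder->table[i] = DecodeOpcode(i);
#else
    decoder->unused = 0;
#endif
}

/* Run during emulation: either index the table or decode manually. */
Instruction Decoder_Decode(const Decoder *decoder, unsigned int opcode)
{
    assert(opcode <= 0xFF);

#ifdef USE_LOOKUP_TABLE
    return decoder->table[opcode];
#else
    (void)decoder;
    return DecodeOpcode(opcode);
#endif
}
```

Because both paths go through the same ‘DecodeOpcode’ function, the table can never disagree with the manual decoder; the only difference is when the work is done.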

The Zilog Z80 itself is a weird thing: it’s an 8-bit CPU with a 4-bit Arithmetic Logic Unit which mimics the instruction set of another CPU (the Intel 8080) while adding extra instructions of its own. The extra instructions were bolted-on in a very ugly way that clashes with the ‘neatness’ of the base 8080 instruction set. It also contains some outright hacks that aren’t intuitive to emulate at all: for instance, there’s a certain mode that you can make the Z80 enter where the output of an instruction can be written to both a register and memory at the same time. It essentially hijacks the operand getters and setters.

Anyway, with this implemented, Sonic 1 and 2 are now finally fully emulated: there isn’t a single feature in those games that isn’t being emulated (to my knowledge anyway). Sonic 3 still has some strange bugs, and it makes use of unimplemented features such as SRAM and the Window Plane, so that may be what I work on next.

There are still a number of things that I want to add before I give this emulator a proper release: a libretro core, controller rebinding, SRAM, the Window Plane, 68k instruction durations, 68k exceptions, YM2612 LFO, YM2612 SSG-EG, YM2612 Timers… though, maybe I should just release this anyway: it’s not like this emulator will ever play every title perfectly or anything.

As always, you can find the source code to my emulator in the usual place.

Oh, right. I should also list the resources that I used when making this Z80 emulator. First there’s this blog post which gave a general overview of the Z80, and explained why and how the Z80’s instruction set is the way it is. Then there’s this follow-up which explains the Z80’s timings. There’s also this useful table which visualises the various opcodes. Finally, there’s this webpage which explains how to effectively decode Z80 opcodes. These resources were invaluable to me, and hopefully they can be to others too.

clownmdemu – FM Audio Emulation

It finally happened! With university over, I decided to tackle what is perhaps my greatest challenge yet in writing this emulator: emulating the YM2612.

The YM2612 is the Mega Drive’s primary audio chip. Apparently, it is a cost-effective, stripped-down version of the YM2608: while the YM2608 featured FM, SSG, Rhythm, and ADPCM modules, the YM2612 is just a standalone FM module with basic DAC output slapped on it.

For the longest time, the only documentation that was available to emulator developers was the “Genesis Software Manual”, which was a document that Sega made available to developers that described the console’s hardware. Unfortunately, this document went into very little detail about how the sound hardware worked. Still, it was apparently good enough for a number of emulators to be made back in the 1990s and 2000s.

Later, in 2008, Nemesis obtained a copy of the official YM2608 manual. Unfortunately, this document was in Japanese, but he was able to produce a mostly-coherent machine-translation. This document answered many questions that emulator developers had about the YM2612, but it still failed to go into detail when it came to certain subjects that were essential for emulator developers to understand.

When I began implementing my own FM emulation, I decided that I would try to stick to this manual early on, and only seek out additional information once I get stuck. With just this YM2608 document and my own knowledge of the YM2612, I was able to produce an extremely basic emulator that produced one sine wave per channel. These sine waves could have their volume and frequency adjusted. While this wasn’t nearly enough to produce authentic audio, it did at least make it possible to hear music and sounds.

Unfortunately, there wasn’t much more that I could do than this. I knew from the manual that three core components of the YM2612 were essential to how it produced audio: the Operators, the Phase Generator, and the Envelope Generator. However, while the manual described what they are and what input they take, it did not describe what output they create from that input. To give an example, a ‘detune’ value can be supplied that offsets the frequency, but neither the Genesis Software Manual nor the YM2608 manual describes how much it offsets the frequency.

At this point, I decided to find some more information on the YM2612. I remembered that the SMPS devkit which was found a few years ago had a YM2612 manual, but unfortunately it too was in Japanese and with seemingly no translation available. It turns out that the manual for the YM2612’s CMOS equivalent (the YM3438) was also found, but yet again it was in Japanese. This wasn’t a huge loss though, as those didn’t appear to contain anything particularly useful that wasn’t already in the YM2608 manual.

What was useful, however, was a thread on SpritesMind that I’ve had bookmarked for years. It’s 58 pages of discussion and discoveries regarding YM2612 emulation, including some incredible documentation that was produced by Nemesis. In particular, he created three massive write-ups of exactly how the YM2612’s Operators, Phase Generator, and Envelope Generator all work. This information is utterly invaluable, as it provides most of the ‘missing pieces’ to the YM2612’s functionality that the manuals lack. Somehow, with nothing more than a Mega Drive, an oscilloscope, and an extensive array of tests, he was able to even figure out details as nuanced as the exact values of the chip’s internal sine wave lookup table.

It took me about four hours straight to read through that whole thread, but it was worth it! I’ll probably have to read through it again to catch any details that I missed the first time around. If you’re interested in making your own Mega Drive emulator, or are just curious about the YM2612, then I cannot recommend that thread enough.

Unfortunately, Nemesis wasn’t able to complete his documentation of the YM2612, meaning that there was still missing information on three key components: the accumulator, Operator feedback and modulation, and the Low Frequency Oscillator.

For the time being, I passed on implementing the Low Frequency Oscillator because Sonic 1 (the game that I was using to test audio) doesn’t use it. The accumulator, I had already produced my own implementation of through guesswork. This left the Operator feedback and modulation.

So what is Operator feedback and modulation? To understand that, you have to understand how the YM2612 produces sound. So, what the heck: here’s an overview of how the YM2612 works:

The YM2612 has six channels. Each channel is composed of four sine waves, dubbed ‘Operators’. Each Operator has its own Phase Generator and Envelope Generator. The Phase Generator advances the sine wave, and the Envelope Generator produces an ADSR envelope. The Phase Generator manages the frequency of the sine wave, and thus is responsible for producing notes, while the Envelope Generator is responsible for shaping the sine wave into a more complex waveform, and thus creating basic ‘instruments’ or ‘voices’.

However, the Operators allow for even more advanced ‘instruments’ to be made through Operator modulation: rather than output its waveform to the speakers, an Operator can instead feed directly into another Operator, modulating its sine wave in a process known as ‘phase modulation’.

‘Operator feedback’ is the process of the first operator of a channel feeding into itself, which is a feature unique to the first operator.
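As a rough illustration of modulation and feedback, here’s a heavily simplified sketch in C89. It is not how the YM2612 (or my emulator) actually computes things: it uses two operators instead of four, a made-up 16-entry sine table, no Envelope Generator, and an arbitrary scaling factor for the modulation. All of the names are hypothetical.

```c
#include <assert.h>

#define SINE_TABLE_LENGTH 16

/* sin(2*pi*i/16), scaled to the range -127 to 127. */
static const int sine_table[SINE_TABLE_LENGTH] = {
    0, 49, 90, 117, 127, 117, 90, 49,
    0, -49, -90, -117, -127, -117, -90, -49
};

/* Converts an operator's output into a phase offset (arbitrary scaling).
   Negative values are handled by hand because C89 leaves the rounding
   of negative division up to the implementation. */
int ScaleModulation(int output)
{
    assert(output >= -127 && output <= 127);
    return output < 0 ? -(-output / 32) : output / 32;
}

/* One operator: a sine wave whose phase is offset by 'modulation'. */
int OperatorOutput(unsigned int phase, int modulation)
{
    /* Adding as unsigned makes a negative offset wrap around correctly. */
    return sine_table[(phase + (unsigned int)modulation) % SINE_TABLE_LENGTH];
}

typedef struct Channel
{
    unsigned int phase[2];      /* Current phase of each operator. */
    unsigned int phase_step[2]; /* How far each phase advances per sample. */
    int previous_output;        /* Operator 1's last output, for feedback. */
    int feedback_enabled;
} Channel;

/* Produces one sample: operator 1 modulates operator 2, and only
   operator 2 is sent to the speakers. */
int Channel_Update(Channel *channel)
{
    int modulator, carrier;

    /* Operator 1 is optionally modulated by its own previous output. */
    modulator = OperatorOutput(channel->phase[0],
        channel->feedback_enabled ? ScaleModulation(channel->previous_output) : 0);
    channel->previous_output = modulator;

    /* Operator 2 is modulated by operator 1. */
    carrier = OperatorOutput(channel->phase[1], ScaleModulation(modulator));

    channel->phase[0] += channel->phase_step[0];
    channel->phase[1] += channel->phase_step[1];

    return carrier;
}
```

The important part is that the modulating Operator’s output is added to the phase of the next Operator, not to its output, which is what separates phase modulation from simply mixing two sine waves together.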

In the spirit of emulation development, I decided I’d find out for myself how to implement Operator modulation: with the help of the cycle-accurate Nuked OPN2 YM3438 emulator, I compared the output of my emulator to what a real Mega Drive would sound like, and tweaked my own Operator modulation implementation until it sounded correct.

Despite this, the audio still sounded far from accurate. I was able to track one source of distortion down to an incorrect implementation of the Phase Generator’s multiplier, and while that did fix the channels sounding like whistles, it still left the audio sounding like this:

Clearly the envelope generator was running too fast… and yet its code matched Nemesis’s notes exactly. I spent hours debugging this, creating custom FM instruments to test specific parts of the emulator against Nuked OPN2, but nothing made sense: the envelope generator was absolutely working as intended. I then figured that perhaps I had given the emulator’s frontend the wrong sample rate, and that it was somehow playing the audio back twice as fast.

The truth was a lot dumber: I’d accidentally given the emulated YM2612 a 7x overclock.

The YM2612’s clock is derived from the Motorola 68000’s clock, which is derived from the master clock. The 68000’s clock is the master clock divided by 7, and the YM2612 clock is the 68000’s clock divided by 6. My emulator had the YM2612 clocked at the master clock divided by 6.
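The arithmetic can be sketched like so. The dividers of 7 and 6 are the ones described above; the master clock figure of 53,693,175Hz is the commonly cited NTSC value rather than something taken from this post, and the names are made up for the example.

```c
#include <assert.h>

/* NTSC master clock in Hz (a figure brought in from outside this post). */
#define MASTER_CLOCK 53693175UL

#define M68K_CLOCK_DIVIDER 7 /* Master clock -> 68000 clock. */
#define FM_CLOCK_DIVIDER 6   /* 68000 clock -> YM2612 clock. */

/* What the YM2612 should be clocked at. */
unsigned long CorrectFMClock(void)
{
    return MASTER_CLOCK / M68K_CLOCK_DIVIDER / FM_CLOCK_DIVIDER;
}

/* The mistake: forgetting the 68000's divider entirely. */
unsigned long BuggyFMClock(void)
{
    return MASTER_CLOCK / FM_CLOCK_DIVIDER;
}

/* Confirms that the mistake amounts to a 7x overclock. */
void CheckOverclock(void)
{
    assert(BuggyFMClock() / CorrectFMClock() == M68K_CLOCK_DIVIDER);
}
```

With those numbers, the correct clock comes out at roughly 1.28MHz while the buggy one is roughly 8.95MHz, so everything driven by the chip’s clock, envelopes included, runs seven times too fast.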

Once that was corrected (and I properly implemented ADSR envelope rate-scaling, which I somehow glossed over when reading the YM2608 manual 100 times), I finally had this:

At last: it worked! And it came so suddenly too: one minute everything’s a whistle, and the next it sounds like a real Mega Drive!

And that takes us to where we are now. There are still many things left for me to implement in my YM2612 emulator, such as per-operator frequencies, Timer A and Timer B, the Low-Frequency Oscillator, SSG-EG (a second Envelope Generator), and possibly even the debug registers.

You might be asking yourself how I’m going to replicate all of those when there’s so much missing documentation. Well, the truth is that the YM2612 and YM3438 have actually been completely documented for years now. I just figured that it would be too easy to use that documentation. What is that documentation? Nuked OPN2’s source code.

You see, Nuked OPN2 isn’t just a cycle-accurate emulator: it’s a cycle-accurate emulator that’s based directly on a die-shot of a de-capped YM3438. Essentially, Nuked OPN2 is a conversion of the YM3438’s circuitry to C. With this, there are no mysteries about how the YM3438 works: everything is documented in such a way that you can verify it just by running it. What other form of documentation doesn’t just say ‘dude, trust me’, but ‘here: I’ll prove it’?

While I do plan on using Nemesis’s documentation to implement SSG-EG, Nuked OPN2 can be used to implement any details that aren’t explained elsewhere.

Surprisingly, some parts of my YM2612 emulator happen to function exactly how a real YM2612 does, contrary to how documentation suggests they should function. For example, the note octave is encoded as a number between 0 and 7, which expresses the following behaviour:

Octave  Behaviour
0       Divide note frequency by 2.
1       Leave note frequency as-is.
2       Multiply note frequency by 2.
3       Multiply note frequency by 4.
4       Multiply note frequency by 8.
5       Multiply note frequency by 16.
6       Multiply note frequency by 32.
7       Multiply note frequency by 64.

Both the manuals and Nemesis’s notes suggest that this should be implemented as a left-shift by the number of the octave minus 1, with special-case logic for octave 0 that does a right-shift by 1 instead. In my emulator, however, I just left-shift by the number of the octave, and perform a single right-shift afterwards, creating the same result with more-efficient code. According to Nuked OPN2, this is exactly what a real YM3438 does as well.
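Here are the two approaches side by side, as a small C89 sketch. The function names are made up, and it assumes that the frequency is an unsigned value of no more than 11 bits (the width of the YM2612’s frequency number), so that the left-shift can never overflow.

```c
#include <assert.h>

/* The documented approach: special-case octave 0. */
unsigned long ApplyOctave_Documented(unsigned long frequency, unsigned int octave)
{
    assert(octave <= 7);

    if (octave == 0)
        return frequency >> 1;
    else
        return frequency << (octave - 1);
}

/* The simpler approach: shift left by the octave, then right by 1. */
unsigned long ApplyOctave_Simple(unsigned long frequency, unsigned int octave)
{
    assert(octave <= 7);

    return (frequency << octave) >> 1;
}
```

For octave 0, the left-shift does nothing and the right-shift halves the frequency; for every other octave, the right-shift just cancels out one of the left-shifts. Either way, the result matches the table above.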

As much as I’d like to continue rambling about this stuff, there’s only so much to write about, and only so many hours in the day. As usual, you can find clownmdemu’s source code in its Git repo. Hopefully I’ll have more progress on my emulator to talk about soon. In the meantime, I’ll leave you with a video of a fun bug:

The Mega Drive’s Interlaced Video Output

Today, I looked into adding support for the Mega Drive’s interlaced video output to my emulator. It didn’t go how I planned, and I eventually realised that it was worthless to pursue. Still, I think this makes for a fun story.

First, I should probably go over the basics of how old CRT TVs would display. Basically, the screen is split up into 480 lines, but they are not all drawn at once. Rather, the even lines are done first, and the odd lines are done on the next frame, or vice versa. You could think of it as the TV rendering 240 lines at 60FPS or 480 lines at 30FPS.

With that in mind, we can begin to understand the Mega Drive’s interlacing. It has three modes:

  • Mode 0, which is the non-interlaced mode. This mode is plain 240p. It uses a trick to prevent the odd lines from ever being drawn, meaning that two sets of 240 even lines are drawn instead.
  • Mode 1. This mode is similar to mode 0, but it does not prevent odd lines from being drawn. The odd lines will display the exact same graphics as the even lines. The official ‘Genesis Software Manual’ developer document warns that this mode will result in severe vertical blurring.
  • Mode 2. This mode is very interesting: it is like mode 1, except the odd lines will not display the same graphics as the even lines. Basically, the Mega Drive’s vertical resolution will double, being 320×480 or 256×480. However, because only even or odd lines are displayed in a single frame, this means that the image will be downsampled back down to 320×240/256×240 when displayed.

Regardless of the interlace mode, it is always 240 lines that are output in a frame. I wanted to implement this in my emulator, to perfectly replicate the rendering of interlace mode 2, which until now has been rendering at the native internal resolution of 320×480.

However… always rendering a 320×240 image wouldn’t be correct. After all, old TVs have 480 lines, not 240. Since mode 0 disables odd lines, there should theoretically be empty black spaces between each line, creating a ‘scanline’ effect.

To recreate this, I set about making my emulator always render a 320×480 image, and having the emulated console simply skip lines. This is accurate to how a real Mega Drive displays on a real CRT TV. However, having leftover lines from a previous frame mixed in with lines from the current frame produced an ugly ‘comb’ effect:

The effect in Mode 1.

Because of this, and knowing that the lines of a CRT fade when they’re skipped, I decided to simply make the skipped scanlines black.
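A rough sketch of that black-scanline approach in C89 might look like the following. The names, the one-byte-per-pixel format, and the assumption that colour 0 is black are all made up for the example; it is not the emulator’s actual rendering code.

```c
#include <assert.h>
#include <string.h>

#define WIDTH 320
#define FIELD_HEIGHT 240
#define FRAME_HEIGHT (FIELD_HEIGHT * 2)

/* Copies one 240-line field into a 480-line frame, blacking-out the
   lines that were skipped. 'odd_field' selects whether the odd lines
   (non-zero) or the even lines (zero) are the ones being drawn.
   'frame' holds FRAME_HEIGHT * WIDTH pixels, and 'field' holds
   FIELD_HEIGHT * WIDTH pixels. */
void RenderField(unsigned char *frame, const unsigned char *field, int odd_field)
{
    unsigned int line;

    assert(frame != 0 && field != 0);

    for (line = 0; line < FIELD_HEIGHT; ++line)
    {
        unsigned char *drawn = frame + (line * 2 + (odd_field ? 1 : 0)) * WIDTH;
        unsigned char *skipped = frame + (line * 2 + (odd_field ? 0 : 1)) * WIDTH;

        memcpy(drawn, field + line * WIDTH, WIDTH);
        memset(skipped, 0, WIDTH); /* Colour 0 is assumed to be black. */
    }
}
```

For Mode 0, ‘odd_field’ would always be zero; for Mode 1 and Mode 2, the caller would flip it every frame.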

In Mode 1 and Mode 2, the lines which are black and the lines which are actually drawn alternate every frame, causing the screen to ‘jitter’. It looked pretty cool and authentic to how I remember the two modes looking on a real Mega Drive. I ended up doing some extended playtesting with this, to soak in the nice scanline effect. When I was done, I closed the emulator and- oh dear:

My monitor had severe image persistence:

A recreation, because there’s no way in hell that I’m doing it again.

It turns out that this is a terrible idea: apparently, much like the waveform of a sound, the electricity going through an LCD pixel must go up and down, positive and negative. It alternates every frame, just like how a wave alternates every sample. By rapidly flickering a pixel between colour and no colour every frame, however, the pixel never gets to go negative, always remaining positive, or vice versa. The result is that the pixel becomes a capacitor, storing charge and refusing to release it, causing it to display colour when it shouldn’t.

Thankfully, this appears to be temporary, as the built-up charge will dissipate naturally. Still, this was bloody terrifying: I thought I’d just ruined the display of my fancy new laptop.

Clearly, this interlacing emulation had to go. The last thing I need is a wave of heated bug reports from furious users who think their monitors have been destroyed.

Thinking about it, I realised that there’s no point to emulating the Mega Drive’s interlacing in the first place besides authenticity: in every use-case, interlacing is either an annoying side-effect or an irrelevant technical detail.

  • Sonic 2 uses mode 2 for its split screen multiplayer, taking advantage of the doubled vertical resolution. The interlacing does nothing but halve the game’s vertical resolution and introduce an ugly jitter effect.
  • Mode 2 could be used to display a static 320×480 image, in which case the interlacing wouldn’t be visible at all.
  • Mode 2 could be used for supporting 3D glasses, in which case the interlacing would cause each eye to see 30FPS instead of 60FPS.

As a result of this, interlacing emulation is completely pointless: my emulator has been doing ‘the right thing’ the entire time by just not emulating it. Better yet, my emulator goes above and beyond by rendering mode 2 at its native 320×480 resolution instead of halving its resolution back down to 320×240, meaning that it renders mode 2 better than a real Mega Drive.

It’s… beautiful!

I suppose this is where the story ends. Interlacing has been a strange thing to research, but I’m glad that I now understand more about how the Mega Drive delivers images to the display. I just wish that I didn’t nearly suffer heart failure in the process.

Knuckles in Sonic 2 Disassembled

With my honours project complete, I decided to put my newfound free time into a project that I’ve been meaning to get around to for almost five years: disassembling Knuckles in Sonic 2.

In case you don’t know, Knuckles in Sonic 2 (which I’m just going to call ‘KiS2’ from now on) is a version of Sonic 2 that lets you play as Knuckles instead of Sonic and Tails. Sonic hackers like to port Knuckles from this version back into regular Sonic 2, but, in the process, they effectively undo the huge number of changes that KiS2 made to Sonic 2’s codebase. This ranges from simple alterations for Knuckles, to bugfixes that have gone undiscovered to this very day.

You might be asking yourself why I want to disassemble this game, since a disassembly for it already exists. Well, the reason is that the existing disassembly is completely separate from the Sonic 2 disassembly that also already exists. Not only does this mean that it is horrifically outdated in comparison to the Sonic 2 disassembly, but this also makes it extremely difficult to compare the two games and find differences between them.

Rather than disassembling the game from scratch like the maker(s) of the other disassembly did, my approach is to take the Sonic 2 disassembly, and edit it to match KiS2. This is exactly what I did to create the disassemblies of Sonic 2’s revisions (REV00 and REV02), the game’s Mega Play arcade version, and the version of Sonic 2 found in Sonic Classics/Sonic Compilation.

As of writing, this task is finally done, and I have a modified Sonic 2 disassembly that produces a perfect copy of KiS2. With this disassembly more or less complete, I figured I should explain everything I’ve learnt about KiS2 here:

Changes

Knuckles

Obviously, Knuckles has replaced Sonic. This is actually surprisingly tacked-on: Knuckles is just a lightly-modified Sonic with all of the gliding and wall-climbing behaviour wrapped in a single function call. I suppose this isn’t surprising, but I was under the impression that the whole Knuckles object was copied from an in-development version of Sonic & Knuckles. I think I got that idea from the Sonic 3 Unlocked blog, but I could just be misremembering.

Notably, Knuckles’ graphics are loaded from the Sonic & Knuckles cartridge: the tiles are recoloured at runtime to suit Sonic 2’s palette. The sprite mappings and dynamic tile loading data are also loaded from the Sonic & Knuckles cartridge. Sonic hackers may find this surprising, since Sonic & Knuckles uses a different sprite mapping format to Sonic 2. This leads me into my next point…

Mappings

All of the game’s mappings were converted to Sonic & Knuckles’ format. This strikes me as very odd, as this means that the mappings now have to be included in the KiS2 ROM, instead of being loaded from the Sonic 2 cartridge, wasting space. Maybe it was considered too much effort to go through the whole game and split the mappings? This conversion was universal: even unused mappings were converted. Heck, even unreferenced parts of mappings were converted. This suggests that the mappings were created using assembly macros, and the macro itself was modified to convert the mappings to Sonic & Knuckles’ format.

The difference between Sonic 2’s and Sonic & Knuckles’ sprite mapping format is that Sonic 2’s has extra data for the game’s two player mode, which uses a fancy rendering mode of the Mega Drive’s VDP. This leads me onto yet another point…

Two Player Mode

Two player mode was removed, but not entirely. It appears that the developer(s?) were struggling to fit the game to the size they wanted, so they began removing code related to two player mode, and once they reached their desired size, they stopped. In the end, they scraped by with only 680 bytes to spare.

There are plenty of leftovers from two player mode in the game: the variable used to detect two player mode (dubbed ‘Two_player_mode’ in the disassembly) still exists, and is referenced frequently in the game’s code. For example, the level title card object still makes heavy use of the flag.

Being a Sonic hacker, I’ve removed two player mode from Sonic 2 before, and I’ve done it much more thoroughly than in KiS2. With that in mind, I know how complex removing two player mode is, so it doesn’t surprise me that the developers didn’t go all the way with it.

Lock-On Technology

This won’t be a surprise to most people reading this, but KiS2, despite being a version of Sonic 2, doesn’t have many of Sonic 2’s assets in it. Instead, it copies them from the attached Sonic 2 cartridge. You see, KiS2 isn’t a standalone game: it’s actually a bonus mode in Sonic & Knuckles. Sonic & Knuckles’ cartridge has a cartridge slot on top of it, allowing you to plug other cartridges into it, with KiS2 being the result of plugging in Sonic 2’s cartridge.

The way Sonic 2’s assets were removed from KiS2 is pretty basic: at the end of Sonic 2 is a massive block of assets (including the game’s music, sounds, drum samples, enemy graphics, player graphics, player sprite mappings, level graphics, level layout, level object placements, and more), and it is simply removed in KiS2. Notably, assets that aren’t part of this giant block were not removed, such as the title screen’s ‘1 PLAYER’ and ‘2 PLAYER VS’ text.

As mentioned earlier, some assets are loaded from the Sonic & Knuckles cartridge, such as Knuckles’ assets. However, those aren’t the only things that are loaded from that cartridge: KiS2 features modified level object placements, which reward the player for exploring with Knuckles’ wall-climbing. Strangely, the data for this is in the Sonic & Knuckles portion of the cartridge instead of KiS2. It’s possible that this was done to free-up space in KiS2, with Sonic & Knuckles having room to spare.

Bugfixes

KiS2 contains a surprising number of bugfixes:

Perhaps most notably, KiS2 removed the air speed cap, which appears to be a leftover from Sonic 1. This is significant because it has always been unclear whether the air speed cap was deliberately retained in Sonic 2 as a feature, or leftover as a bug. The air speed cap is responsible for at least two areas in Sonic 2 not working as intended: the red spring that leads to the ‘monkey island’ in Emerald Hill Zone Act 2, and the launcher that flings you over a large gap in the floor in Wing Fortress Zone. In both cases, the speed cap causes the player to undershoot their target if they press left or right on the D-pad while moving through the air. The removal of this speed cap in KiS2 suggests that it was indeed an unintentional leftover all along.

One of the most well-known bugfixes in KiS2 is the correction of a bug that causes the bottom two lines of the screen to appear incorrectly in Emerald Hill Zone. I wonder how this bug was discovered, since televisions were especially prone to overscan hiding the edges of the screen back then.

One type of bugfix that KiS2 contains is taking the player’s character out of their ‘roll-jumping’ state, where their controls are basically locked. Being left in this state at a bad time can result in the game soft-locking, as the player is unable to move their character. KiS2 makes the character exit their roll-jumping state when they enter a wind-tunnel and when they hover over a propeller in Wing Fortress Zone.

Sonic 2 suffers from a particularly glaring bug, where entering the cheat to gain 15 Continues causes the game to play Oil Ocean Zone’s music forever. The cause is a nonsensical sound ID being submitted to the sound driver. This is corrected in KiS2. This bug was also fixed in the version of Sonic 2 included in Sonic Mega Collection.

The title card appears to have had a bugfix applied to it which prevents odd behaviour if the graphic of the name of the zone goes too far to the left of the screen, causing its X coordinate to drop below 0. This bugfix works by replacing some unsigned conditional branches with signed conditional branches, and only drawing the sprite if it is within 48 pixels of the screen’s left side.
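To show why the choice of branch matters, here’s a small C89 sketch of the general pitfall. It does not reproduce the title card’s actual logic: the function names and the limit of 320 are made up, and the 16-bit coordinate is simulated by hand since C89 has no fixed-width integer types.

```c
#include <assert.h>

/* Interprets a 16-bit pattern as a two's-complement signed value. */
long AsSigned16(unsigned int word)
{
    assert(word <= 0xFFFF);
    return (word & 0x8000) != 0 ? (long)word - 0x10000L : (long)word;
}

/* What an unsigned branch (such as the 68k's 'bcs') effectively tests. */
int IsLeftOf_Unsigned(unsigned int x, unsigned int limit)
{
    return x < limit;
}

/* What a signed branch (such as the 68k's 'blt') effectively tests. */
int IsLeftOf_Signed(unsigned int x, unsigned int limit)
{
    return AsSigned16(x) < AsSigned16(limit);
}
```

An X coordinate of -16 is stored as 0xFFF0. An unsigned comparison sees that as 65520, which is nowhere near the left of the screen, while a signed comparison correctly sees it as being 16 pixels to the left of 0.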

The bumpers in Casino Night Zone have their own layout data. This data needs to be terminated with special byte patterns that prevent the bumper manager from reading beyond them and parsing surrounding code as data. One of these termination patterns is missing from the very start of Act 1’s layout data. In a stroke of good luck, the code before the data happens to resemble the terminating byte pattern, preventing the bumper manager from processing invalid data. In KiS2, however, this is no longer the case. A proper data terminator was added at the start of the data, fixing this problem. Fun fact: this bug appears to have not been fixed in the earliest prototype of KiS2, causing the game to crash if you go to the top left corner of the level.

There are also some modifications to the game’s collision code, which may be an attempt to fix bugs in it. Unfortunately, I haven’t figured out the point of these modifications yet, so I can’t say for sure what bugs, if any, they’re trying to fix. One bug that it appears to be trying to fix is the bug in Sonic 2 where collision with an object from below doesn’t properly push the player out, sometimes resulting in them phasing straight through the object. This fix does not work correctly, however, and cancels-out the player’s inertia when it shouldn’t. You can read more about it here.

One rather funny bug is that if you’re moving at a high speed towards a wall, and then start moving in the other direction at the last second, Sonic will impact the wall and then start moving away from it while playing his pushing animation. KiS2 appears to fix this bug as well, preventing Knuckles from entering his pushing animation if he is not facing towards the object that he pushed against.

In Sonic’s movement code, a register that holds his speed is unintentionally partially overwritten before being used later on to decide whether Sonic is moving fast enough to skid or not. This creates an asymmetry in what speed Sonic needs to be in order to skid when attempting to move in the opposite direction. This too is fixed in KiS2. You can read more about this bug here.

Another bug fixed by KiS2 is that, when the player turns Super, a ring is instantly drained. This is due to a counter never being initialised. Now, the game waits a second before draining the first ring, which is consistent with how it drains every ring afterwards.

In Mystic Cave Zone, it’s possible for the player to become ‘detached’ from a hanging vine switch, appearing suspended in the air away from the vine itself. KiS2 addresses this by forcefully updating the player’s coordinates to match the vine every frame.

Speaking of Mystic Cave Zone, the boss of that zone has a nasty bug where, apparently due to a copy-paste error, the wrong address register is used at one point, causing a random byte of memory to be overwritten. Somehow, KiS2’s developers noticed this and fixed it.

And… that’s it. That should be the last of the bugfixes that I’ve found in KiS2. So, what other changes were made in KiS2?

JmpTos

Yep, JmpTos again. They always find an excuse to crop up when I do this kind of thing. For those not in the loop, ‘JmpTo’ is the nickname given to branch extensions that are present throughout Sonic 2’s codebase. If a branch is too short to reach its destination, it instead branches to a long-range jump instruction in order to reach it. In the first two revisions of Sonic 2 (REV00 and REV01), they appear to have been generated by the assembler. In the third revision – REV02 – they changed significantly, presumably because the developers switched to using a different assembler. They’ve once again changed quite a bit in KiS2.

What’s interesting about the JmpTos in KiS2 is that they appear to be hand-made, as opposed to the obviously-automated JmpTos in Sonic 2 REV00 and REV01. You see, it appears that the developers went through much of the game’s code, ‘tidying’ the JmpTos: rather than being messily mixed into code, as they were in REV02, they were grouped and moved to the end of their respective blocks of code. Additionally, redundant branches to JmpTos were eliminated: in Sonic 2 REV02, it wasn’t uncommon to see unconditional branches that branched to JmpTos, when they could have just been jump instructions that jumped straight to the intended destination – KiS2 removed many, if not all, of these.

Further adding to the idea that REV02 and KiS2’s JmpTos were hand-made is the fact that one of the JmpTos in REV02 (‘JmpTo13_MarkObjGone’) is completely unused. It was removed in KiS2.

Restored Debug Features

Invisible objects, such as plane-switchers and invisible walls, become visible in Debug Mode in KiS2. One object in particular is made visible with code that was previously only in REV00. This suggests that the code may have existed in REV01’s and REV02’s source code in a dummied-out form that was simply un-dummied-out in KiS2. Perhaps these debug features were hidden behind a build-time flag?

Removed Development Code

In Sonic 2, after the ‘loadLevelLayout’ function is some leftover code. The first chunk of code is the level layout loading function from Sonic 1, modified to repeat the background layout. This was used in some of Sonic 2’s prototypes.

After that is a function that converts a level’s chunks from Sonic 1’s 256×256 format to Sonic 2’s 128×128 format, and after that is a function for eliminating duplicate 128×128 chunks. These were likely used to convert Green Hill Zone’s chunks to 128×128 for Sonic 2’s “Nick Arcade” prototype.
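
To give a rough idea of what those two functions do, here is a sketch in C. It assumes that a chunk is just a grid of 16×16-pixel block IDs (16×16 of them for a 256×256 chunk, 8×8 for a 128×128 chunk); the type and function names are made up, and the real 68k code surely differs in its details.

```c
#include <string.h>

typedef struct BigChunk
{
	unsigned short blocks[16][16]; /* 256x256 pixels. */
} BigChunk;

typedef struct SmallChunk
{
	unsigned short blocks[8][8]; /* 128x128 pixels. */
} SmallChunk;

/* Splits one 256x256 chunk into four 128x128 chunks, ordered */
/* top-left, top-right, bottom-left, bottom-right. */
static void SplitChunk(const BigChunk *big, SmallChunk small[4])
{
	unsigned int quadrant, y, x;

	for (quadrant = 0; quadrant < 4; ++quadrant)
	{
		const unsigned int origin_x = (quadrant % 2) * 8;
		const unsigned int origin_y = (quadrant / 2) * 8;

		for (y = 0; y < 8; ++y)
			for (x = 0; x < 8; ++x)
				small[quadrant].blocks[y][x] = big->blocks[origin_y + y][origin_x + x];
	}
}

/* Adds a chunk to the list unless an identical one already exists. */
/* Returns the index of the chunk within the list either way. */
/* The caller must make sure that the list has room for one more. */
static unsigned int AddUniqueChunk(SmallChunk *list, unsigned int *total, const SmallChunk *chunk)
{
	unsigned int i;

	for (i = 0; i < *total; ++i)
		if (memcmp(list[i].blocks, chunk->blocks, sizeof(chunk->blocks)) == 0)
			return i;

	list[*total] = *chunk;
	return (*total)++;
}
```

Running every chunk through ‘SplitChunk’ and feeding the results into ‘AddUniqueChunk’ would give both the deduplicated chunk list and the indices needed to rewrite the level layout.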

After surviving through numerous prototypes, all three revisions of the final Sonic 2, Sonic Classics, and the Mega Play arcade version, this code was finally removed in KiS2. RIP.

Demos

Also known as ‘attract mode’, the game will play some demos if you leave it on the title screen. The developers of KiS2 attempted to preserve compatibility with Sonic 2’s demos, reenabling things like the air speed cap and giving Knuckles Sonic’s jump height when a demo is playing. Unfortunately, the result is not perfect, and the demos still manage to desynchronise at points. The developers went so far in their attempts to keep the demos working that they manually edited the inputs for the Emerald Hill Zone demo.

Other

I could talk about the modified title screen, Wing Fortress Zone cutscene, ending, and logo after the credits, but honestly I can’t think of anything noteworthy about them. Maybe I’ll go over them in a follow-up post, if I can think of anything interesting to say.

Standalone

As an experiment in what is possible with this disassembly, I’ve added an option to build a ‘standalone’ version of KiS2 that doesn’t rely on Sonic 2 or Sonic & Knuckles in order to run. This is similar to the ‘Sonic 3 Complete’ mode of the Sonic & Knuckles disassembly, which produces a version of Sonic 3 & Knuckles that doesn’t rely on Sonic 3. You can find a built ROM of this standalone KiS2 here. Besides being a tech demo, the intention of this is to make it feasible to produce ROM hacks of KiS2, which is practically impossible whilst it is dependent on two other ROMs.

Conclusion

Personally, I’ve learnt a lot about KiS2 from this disassembly, and I hope others will learn a lot from it too. KiS2 has always been a mysterious black box to me: its many changes and fixes always being out of reach and beyond our understanding, with no easy way to find the new in a sea of old. Every change and every fix was a needle in a haystack… but not anymore. Maybe now we can see a *complete* port of Knuckles to Sonic 2, title screen, ending, compatibility adjustments, and all!

The disassembly can be found here.

Fun fact: I started this disassembly on the 28th of April, and it was completed on the 5th of May. It took me almost five years to get around to doing something that only took a week. Geez.

Sonic Monitor in Microsoft Office

This will never not be weird to me: for some reason, there’s a Sonic monitor lookalike in Microsoft Office. To see it, go to the ‘View’ tab and select the ‘Zoom’ option:

I don’t get it. What came first: Sonic’s monitor or Microsoft Office’s monitor? Who was copying who? Were they even copying each other to begin with?

There seem to be multiple variations of this little sprite. I have an old screenshot from 2015 that shows a version which is much closer to Sonic’s monitor sprite:

Here’s a comparison of the three:

Somebody please tell me that I’m not alone in thinking that these look uncannily similar: they’re both 30×30, they use similar greys, and even the image in the Sonic monitor perfectly matches the size and position of the image in the Office monitor. They’re so similar that I can literally put it into a ROM-hack of Sonic 1 and it works perfectly:

Somebody please help: this has been driving me nuts for the last 7 years.