
Virtual 68 Katy

At last, it’s working!

After writing the last blog post, I was able to find a test suite for the Motorola 68000, allowing me to verify the accuracy of my 68000 emulator. After addressing a number of inaccuracies, Linux could finally finish booting! The issue that was breaking kmalloc was the CMPA.W, ADDA.W, and SUBA.W instructions not setting their condition codes properly due to quirks related to sign-extension, which presumably broke a branch or two somewhere.
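For the curious, the sign-extension quirk in question: ADDA.W, SUBA.W, and CMPA.W all sign-extend their 16-bit source operand to 32 bits before using it against the full address register. A minimal sketch of that extension (the helper name is made up, not the emulator's real code):

```c
#include <assert.h>

/* Word-sized address-register instructions on the 68000 (ADDA.W, SUBA.W,
   CMPA.W) sign-extend their 16-bit source to 32 bits before operating on
   the full address register. */
static unsigned long SignExtendWordToLong(const unsigned int word)
{
	/* Propagate bit 15 into the upper 16 bits. */
	return (word & 0x8000) != 0 ? (unsigned long)word | 0xFFFF0000ul : (unsigned long)word;
}
```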

Upon booting Linux, I noticed that it wasn’t recognising serial input. This turned out to be because Linux expects a level 2 interrupt to occur when there is pending data in the serial port’s FIFO. Curiously, code in the 68 Katy’s Linux port suggests that it should also support a level 7 interrupt, which combines the serial input update with a 100Hz timer update; however, it does not appear to work. That aside, adding the level 2 interrupt was simple enough, and, with input now working, I could try out the various executables that were bundled with Linux.

The 68 Katy’s Linux port is very barebones, only sporting vi, expand, ledblink, and sash. It also includes Colossal Cave Adventure, which is an old-school text-based adventure game. Sash is neat, because, despite being a shell, it has some BusyBox-like functionality that allows it to perform commands such as mount, mkdir, touch, and ls, which is enough to do some basic file-management. The inclusion of a fun little terminal-friendly game is nice too, even if its executable does take up a lot of space (86KiB, out of the 512KiB ROM).

Running dusty old executables is fun and all, but I had something else in mind: I wanted to see if I could port my Mega Drive emulator – clownmdemu. You see, my Mega Drive emulator and my 68 Katy emulator use the exact same 68000 emulator, meaning that if I can get the former to run on the latter, then my 68000 emulator would be emulating itself!

Being restricted to a terminal means that my emulator can’t create any graphical output, but I can at least run the emulator as a benchmark and get an idea of its performance. Compiling my Mega Drive emulator to target the 68 Katy was simple enough, since it doesn’t have any dependencies beyond the C standard library, though I did have to increase the 68 Katy’s ROM and RAM to fit the files and give the emulator the memory it needs.

So, how fast is it?

This benchmark is running the game Knuckles the Echidna in Sonic the Hedgehog 2 for a single frame, which takes 0.5 seconds. That’s right: it’s running at 2 frames per second. Note that the 68 Katy’s emulated CPU is running as fast as the host platform will let it, which in my case is around 350MHz.

I would have gotten the entire 68 Katy emulator running in itself, but the awkward timer interrupt and terminal input logic meant that it has to rely on POSIX threads, which doesn’t appear to be compatible with the 68 Katy’s ancient toolchain.

I tried to port a newer version of Linux to the 68 Katy, but it seems that only the Linux 2.0.X build of uClinux supports the vanilla Motorola 68000: the closest thing that later versions of Linux support is the Motorola 68328 – a souped-up 68000 with additional features such as built-in timers and an improved interrupt mechanism. While I was able to eliminate the dependencies on these extra features from Linux 4.4 and get it to partially boot in my emulator, it would still crash before completing its boot process.

Despite that setback, I was still successful in running Linux on my 68000 emulator, even if it was just Linux 2.0.X. I think that this is a good place to leave the project for now, so I’ve cleaned up the codebase and made it available on GitHub. In contrast to my usual naming scheme, I’ve named this project ‘Virtual 68 Katy’ just because I think it sounds cool. You can find its Git repository here.


(Almost) Emulating the 68 Katy

During a recent bout of illness I became fascinated with the concept of porting the Linux kernel to minimalist environments: it began when I stumbled upon this video of someone who had gotten Linux running on their RISC-V emulator, which made me wonder if I could do the same with my 68000 emulator.

Unfortunately, I am not familiar with porting Linux at all, and with my 68000 emulator not being very mature, I wasn’t sure if it was even capable of running Linux in the first place; if I encountered a crash, how would I know if it’s an issue with the Linux port or my emulator?

So it looked like this idea wasn’t going to pan out… but wait – what if I instead emulated an existing 68000 Linux port that I know already works?

While researching how to cross-compile Linux for the 68000, I read this series of blog posts about a 68008-based computer that was originally designed on a breadboard – the 68 Katy. It was amazing to see how similar wiring an old CPU up to ROM/RAM chips and some peripheral devices was to what I’d done with a PIC microcontroller back in university; I’d assumed that a full CPU would be far more complicated to interface with. Anyway, the blog posts provided a pre-built copy of the 68 Katy’s Linux port (complete with a bootloader and filesystem) in a single flat-mapped binary blob that was ready to be placed at the start of the 68000’s address space – it couldn’t be any simpler! All I’d have to do was implement the 68 Katy’s memory map and serial communication port, and I could run this blob in my emulator!

Oh, right, there was one feature that I needed to add to my 68000 emulator first: user mode. Previously, my emulator had only ever run software that operated in supervisor mode, but Linux makes extensive use of user mode, which has its own unique stack pointer and raises exceptions if certain privileged instructions are used. Implementing this was simple enough, but it took a while to weed out the subtle bugs.

With this last feature added, I could proceed to emulating the 68 Katy!

My 68000 emulator is actually just a part of my Mega Drive emulator – clownmdemu – but each component of my Mega Drive emulator was designed to be modular and usable independently of the rest of the project. That definitely paid off in this case, as it was easy to pluck out the 68000 emulator and begin wiring it up to a new environment: all it needs is some initialisation and two callback functions for reading and writing memory.

The memory map is simple: 0x00000-0x77FFF for ROM, 0x78000-0x7FFFF for IO, and 0x80000-0xFFFFF for RAM. The serial communication port exists in the IO space, and was pretty complicated to implement: the actual device on a 68 Katy is an FT245, but all that really matters is that it’s a FIFO with a couple of status bits to say when there’s pending data to be read, or no more room in the FIFO for data to be written. Figuring out the details required reading the code of the 68 Katy’s system monitor (which is written in pure, largely-undocumented assembly), the FT245’s manual, and the FT245’s kernel driver.
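As a rough idea of what that wiring looks like, here is how the memory map above might be decoded inside the read callback. The names and signature here are illustrative, not the emulator's real interface, and the serial port is stubbed out:

```c
#include <assert.h>

/* The 68 Katy's flat memory map, decoded in a hypothetical read callback. */
static unsigned char rom[0x78000]; /* 0x00000-0x77FFF */
static unsigned char ram[0x80000]; /* 0x80000-0xFFFFF */

/* Stub standing in for the FT245 FIFO's data/status registers. */
static unsigned int ReadSerialPort(const unsigned long address)
{
	(void)address;
	return 0;
}

static unsigned int ReadCallback(const unsigned long address)
{
	if (address < 0x78000)
		return rom[address];            /* ROM */
	else if (address < 0x80000)
		return ReadSerialPort(address); /* IO (0x78000-0x7FFFF) */
	else
		return ram[address - 0x80000];  /* RAM */
}
```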

With this implemented, I was able to boot the 68 Katy binary blob and enter the system monitor. At first, my emulator only had support for serial output, so I couldn’t give any input, but I could at least see the monitor boot and print a message. Unfortunately, input is required to make the monitor boot the Linux kernel. Once I had input working, I was able to boot the kernel by entering the command ‘j003000’ (jump to address 0x003000, which is where the kernel’s code begins in the binary blob).

At first, this resulted in an immediate crash, but this turned out to just be the effect of a bug in the memory map implementation (IO was being mapped to ROM – oops). With that addressed, the kernel was able to print a few messages before hanging on ‘Calibrating delay loop..’.

This same issue was detailed in the 68 Katy’s development blog: Linux apparently needs a timer interrupt in order to do stuff. The 68 Katy has a timer wired up to the 68008’s interrupt pins, raising a level 5 interrupt every 100th of a second. Once that was recreated in my emulator, Linux was able to proceed a little bit further.
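One way to recreate such a periodic interrupt is to accumulate emulated cycles and fire whenever a tick's worth has elapsed. In this sketch, the clock speed is arbitrary and `RaiseInterrupt` stands in for the emulator's real interrupt hook:

```c
#include <assert.h>

/* Fire a level 5 interrupt 100 times per emulated second. CPU_CLOCK is an
   arbitrary illustrative value, not the 68 Katy's actual clock. */
#define CPU_CLOCK 8000000ul
#define CYCLES_PER_TICK (CPU_CLOCK / 100)

static unsigned long timer_accumulator;
static unsigned int interrupts_raised;

/* Stub for the emulator's interrupt mechanism. */
static void RaiseInterrupt(const unsigned int level)
{
	(void)level;
	++interrupts_raised;
}

static void AdvanceCycles(const unsigned long cycles)
{
	timer_accumulator += cycles;

	/* One level 5 interrupt per elapsed 100Hz tick. */
	while (timer_accumulator >= CYCLES_PER_TICK)
	{
		timer_accumulator -= CYCLES_PER_TICK;
		RaiseInterrupt(5);
	}
}
```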

After hours of debugging, I found that this hang is caused by a failure to create the kernel thread which is responsible for running the ‘init’ function, which presumably completes the rest of Linux’s initialisation process. The thread cannot be created because a call to ‘kmalloc’ fails to allocate memory. Unfortunately, this is where my debugging abilities reach their limit.

I can only imagine that my 68000 emulator has a bug in it, which somehow is not exposed when running any of the Mega Drive games and homebrew that I have on hand. I was hoping that any inaccuracies in my emulator would result in easily-debugged hard crashes rather than an insidious little state corruption like this.

The best way that I can think of to debug this is to swap out my 68000 emulator with one that I know is accurate, and then compare various ‘kmalloc’-related variables at various points throughout the boot process, but that doesn’t sound like the most fun. Admittedly, a proper test suite for my 68000 emulator would help a lot to find inaccuracies. I wonder if there’s anything like the Z80’s ‘ZEXALL’ instruction set exerciser for the 68000…

Making My Own Game Engine

Where the heck have I been lately? Starting yet another project, of course.

Why an engine? Well, there’s only so much time I can spend poking around other people’s game engines before I start wanting to make my own… which in my case is about 10 years.

Personally, I have a real hate boner for premade game engines like Unreal and Unity: taking the programming out of game development just takes all the fun out of it. In addition, I’m a massive sucker for NIH syndrome: I love making my own versions of things. MD5 hasher? I made one of those. Graph-based LZSS compressor? That too. Mega Drive emulator? Yup. I always prefer to use things that I made over things that other people made, so why not make my own game engine that I can just play around with too?

As always, this project has some goofy programming requirements: in particular, I want it to be written in C89, and have no global state. Why C89? Because I like being able to compile my code with ancient compilers like MSVC 6. I also see it as a fun challenge, being unable to rely on a built-in boolean type or 64-bit integers. Why no global state? Because I want to allow multiple instances of the engine to be run at once, just like my Mega Drive emulator. This could be useful for things like adding a ‘race’ mode to the engine that allows two people to play a game at once and see who can beat it first.
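The ‘no global state’ rule boils down to passing an explicit context struct everywhere. A toy sketch, with entirely hypothetical names:

```c
#include <assert.h>

/* All engine state lives in a context struct which every function receives
   explicitly, so two instances can run side-by-side. */
typedef struct Engine
{
	unsigned long frame_counter;
	/* ...the rest of the engine's state would live here... */
} Engine;

static void Engine_Initialise(Engine* const engine)
{
	engine->frame_counter = 0;
}

static void Engine_Update(Engine* const engine)
{
	++engine->frame_counter;
}
```

Two instances can then be initialised and updated completely independently, without interfering with each other.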

C89’s shortcomings come down to its requirement that declarations always come before code (which leads to very ugly code), its lack of 64-bit integers, its lack of a boolean type, and its lack of many C standard library functions, macros, and constants. Most of these haven’t been an issue for me (so far I’ve had no need for 64-bit integers, for instance), and the missing things that I do need can simply be implemented from scratch. For this, I created a header called ‘clowncommon.h’ that defines such things as a boolean type and a Pi constant, as well as some bonuses like min/max/clamp macros and a degree-to-radian macro.
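To give an idea of what such a header can contain, here is a hedged approximation (the real clowncommon.h will differ in detail), all in plain C89:

```c
#include <assert.h>

/* A boolean type, since C89 has none. */
typedef unsigned char cc_bool;
#define cc_false 0
#define cc_true 1

/* Pi, which the C standard library does not guarantee. */
#define CC_PI 3.14159265358979323846

/* Min/max/clamp macros and a degree-to-radian macro. */
#define CC_MIN(a, b) ((a) < (b) ? (a) : (b))
#define CC_MAX(a, b) ((a) > (b) ? (a) : (b))
#define CC_CLAMP(minimum, maximum, x) CC_MAX((minimum), CC_MIN((maximum), (x)))
#define CC_DEGREES_TO_RADIANS(degrees) ((degrees) * (CC_PI / 180.0))
```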

Eliminating global state from the engine was tricky due to many dependencies having unavoidable global state, particularly GLFW and OpenGL. This is understandable, especially in GLFW’s case, but the tricky part is knowing what should and shouldn’t be global. For instance, there’s no harm in having multiple instances of the game engine generate vertex buffers from the same OpenGL context, but the instances should not share the same pool of vertex buffers, as one instance exhausting the pool would cause it to be exhausted for the other instances as well, breaking the encapsulation between them. Having designed the engine around this from the start, it wasn’t hard to eventually get a working build of the engine running two instances of itself at once, rendering to a single window:

The same scene being displayed by two separate instances of the engine in horizontal split-screen.

I’m quite proud of the object system: like the 16-bit Sonic engine, objects are allocated from a pool, but unlike the Sonic engine, objects run their constructor as soon as they’re allocated, rather than during the first invocation of their ‘update’ function. Additionally, my engine’s object system has support for destructors. Finally, the pool is managed with a pair of linked lists: one for allocated objects, and one for deallocated objects. This makes allocation and deallocation ‘O(1)’, compared to the ‘O(n)’ of the Sonic engine’s allocation, which performs a slow array search. Additionally, the linked lists make iterating over all allocated objects faster, since the deallocated objects are skipped without needing to be checked first.
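The allocation half of that scheme can be sketched like so. The names are illustrative, and the real pool would need a second link per object so that an arbitrary object can also be unlinked from the allocated list in ‘O(1)’; this sketch only shows allocation:

```c
#include <assert.h>
#include <stddef.h>

/* An object pool managed by intrusive linked lists: allocation moves a node
   from the free list to the allocated list in O(1). */
typedef struct Object
{
	struct Object *next;
	/* ...object state would live here... */
} Object;

#define POOL_SIZE 8

typedef struct ObjectPool
{
	Object objects[POOL_SIZE];
	Object *allocated; /* Head of the list of live objects. */
	Object *free;      /* Head of the list of available slots. */
} ObjectPool;

static void Pool_Initialise(ObjectPool* const pool)
{
	size_t i;

	pool->allocated = NULL;
	pool->free = NULL;

	/* Thread every slot onto the free list. */
	for (i = 0; i < POOL_SIZE; ++i)
	{
		pool->objects[i].next = pool->free;
		pool->free = &pool->objects[i];
	}
}

static Object* Pool_Allocate(ObjectPool* const pool)
{
	Object* const object = pool->free;

	if (object != NULL)
	{
		/* O(1): unlink from the free list, link into the allocated list.
		   This is where the constructor would run. */
		pool->free = object->next;
		object->next = pool->allocated;
		pool->allocated = object;
	}

	return object;
}
```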

I’d have more to show off, but I’ve been stuck for the past few weeks on implementing collision detection. I didn’t want to use a discrete collision algorithm, as that would allow things to phase through walls if they move fast enough, so instead I looked into continuous collision algorithms. I found this one which, while not perfect, seemed to be good enough for a first attempt. Taking into account criticisms of the algorithm such as this and this, I was able to produce an implementation that lacks some of the original’s shortcomings, such as allowing the collision ellipsoid to become infinitely close to collision triangles. Despite this, I have been stuck debugging an issue involving the collision response step overreacting and pushing the collision ellipsoid far away from the triangle that it intersected with.

To tell you the truth, I wasn’t expecting this to be so difficult: I’ve made so many things over the past 10 years that I wasn’t expecting to be stumped by something as ‘simple’ as level collision. Rather than work on my engine proper, I’ve instead had to divert my effort to writing a debugging visualiser for the collision system. Bleh, at least I got to write a sphere mesh generator – that was fun.

A generated sphere, which has been scaled into an ellipsoid that represents the player’s collision shape.

Maybe I should use this debug visualiser as an excuse to develop the engine further, such as by adding a GUI or input rebinding, or perhaps even a full-blown debug mode. Who knows.

Hopefully, I’ll crack the collision bug soon, and I can start making some real progress on this engine.

clownmdemu – 68k Overhaul

Ever since adding support for the Z80, the next improvement that I wanted to make to my emulator was overhauling the 68k CPU interpreter.

As detailed in the first emulator-related blog post, the 68k CPU interpreter was one of the first parts of the emulator that I created. It was also the first CPU interpreter that I had ever written. Safe to say, there were a lot of mistakes made with it: in particular, I optimised it for size rather than speed.

Each of the 68k’s instructions follow very similar steps: read the source operand, read the destination operand, perform the instruction’s action (such as subtraction or addition), write the destination operand, and update the condition codes. Some instructions skip certain steps, or use alternate versions of steps that are also used by other instructions. Because of all this shared logic, I made each step of instruction execution a switch statement, something like this:

/* Read source operand. */
switch (instruction)
{
	case INSTRUCTION_ADD:
	case INSTRUCTION_SUB:
	case INSTRUCTION_MOVE:
		/* Read standard operand. */
		source = ReadOperand(&source_operand);
		break;

	case INSTRUCTION_ADDI:
	case INSTRUCTION_SUBI:
		/* Always a literal */
		source = ReadLiteral();
		break;

	case INSTRUCTION_NOP:
		/* Doesn't have a source operand. */
		break;
}

/* Read destination operand. */
switch (instruction)
{
	case INSTRUCTION_ADD:
	case INSTRUCTION_SUB:
	case INSTRUCTION_ADDI:
	case INSTRUCTION_SUBI:
		/* Read standard operand. */
		destination = ReadOperand(&destination_operand);
		break;

	case INSTRUCTION_MOVE:
	case INSTRUCTION_NOP:
		/* Doesn't read its destination operand. */
		break;
}

/* Perform instruction. */
switch (instruction)
{
	case INSTRUCTION_ADD:
	case INSTRUCTION_ADDI:
		destination += source;
		break;

	case INSTRUCTION_SUB:
	case INSTRUCTION_SUBI:
		destination -= source;
		break;

	case INSTRUCTION_NOP:
		/* Does nothing. */
		break;

	case INSTRUCTION_MOVE:
		destination = source;
		break;
}

/* Write destination operand. */
switch (instruction)
{
	case INSTRUCTION_ADD:
	case INSTRUCTION_SUB:
	case INSTRUCTION_ADDI:
	case INSTRUCTION_SUBI:
	case INSTRUCTION_MOVE:
		/* Write standard operand. */
		WriteOperand(&destination_operand, destination);
		break;

	case INSTRUCTION_NOP:
		/* Doesn't write its destination operand. */
		break;
}

While this does eliminate duplicate code, it also adds a lot of runtime overhead. For some reason, GCC does not produce jump tables from these switch statements either, instead producing gross if-else chains that further add to the overhead.

This was a mistake that I made sure not to make with the Z80 CPU interpreter, which only has the one switch statement, like this:

switch (instruction)
{
	case INSTRUCTION_ADD:
		/* Read source operand. */
		/* Read standard operand. */
		source = ReadOperand(&source_operand);

		/* Read destination operand. */
		/* Read standard operand. */
		destination = ReadOperand(&destination_operand);

		/* Perform instruction. */
		destination += source;

		/* Write destination operand. */
		/* Write standard operand. */
		WriteOperand(&destination_operand, destination);

		break;

	case INSTRUCTION_SUB:
		/* Read source operand. */
		/* Read standard operand. */
		source = ReadOperand(&source_operand);

		/* Read destination operand. */
		/* Read standard operand. */
		destination = ReadOperand(&destination_operand);

		/* Perform instruction. */
		destination -= source;

		/* Write destination operand. */
		/* Write standard operand. */
		WriteOperand(&destination_operand, destination);

		break;

	case INSTRUCTION_ADDI:
		/* Read source operand. */
		/* Always a literal */
		source = ReadLiteral();

		/* Read destination operand. */
		/* Read standard operand. */
		destination = ReadOperand(&destination_operand);

		/* Perform instruction. */
		destination += source;

		/* Write destination operand. */
		/* Write standard operand. */
		WriteOperand(&destination_operand, destination);

		break;

	case INSTRUCTION_SUBI:
		/* Read source operand. */
		/* Always a literal */
		source = ReadLiteral();

		/* Read destination operand. */
		/* Read standard operand. */
		destination = ReadOperand(&destination_operand);

		/* Perform instruction. */
		destination -= source;

		/* Write destination operand. */
		/* Write standard operand. */
		WriteOperand(&destination_operand, destination);

		break;

	case INSTRUCTION_MOVE:
		/* Read source operand. */
		/* Read standard operand. */
		source = ReadOperand(&source_operand);

		/* Read destination operand. */
		/* Doesn't read its destination operand. */

		/* Perform instruction. */
		destination = source;

		/* Write destination operand. */
		/* Write standard operand. */
		WriteOperand(&destination_operand, destination);

		break;

	case INSTRUCTION_NOP:
		/* Read source operand. */
		/* Doesn't have a source operand. */

		/* Read destination operand. */
		/* Doesn't read its destination operand. */

		/* Perform instruction. */
		/* Does nothing. */

		/* Write destination operand. */
		/* Doesn't write its destination operand. */

		break;
}

While a lot more verbose, it was considerably faster: roughly 3 times faster than the 68k CPU interpreter.

I didn’t want to work on my emulator any further until this was fixed: what was the point in improving the emulator when such a core component is fundamentally flawed? However, addressing this took much longer than planned, perhaps due to scope creep: I didn’t want to manually ‘inline’ the code for every instruction, as the 68k has around 100 instructions totalling around 2000 lines of code, and not only would doing so take forever and result in an unmaintainable mess of duplicate code, but it would be extremely prone to human error.

Inspired by floooh’s Z80 interpreter, I decided that I should automate the process. This would greatly reduce the chance for human error while simplifying future overhauls. At first, I planned on doing the same as floooh, and condense every instruction down to metadata in a table, which could then be parsed by a generator which emits code to execute said instruction. However, I ran into issues with how aspects of instructions can only be represented by algorithms rather than data, and how the table would contain a massive amount of repeating data, which is exactly what I was trying to avoid. Eventually I settled for something much more primitive: instead of parsing a big table, I would just take the existing interpreter and modify it to emit code instead of execute it. This way I can have the best of both worlds: the generator will have next to no duplicate code, while the generated output will have no unnecessary overhead.
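To illustrate the idea, here is a toy version of the approach: the interpreter keeps its per-step switch statements, but each case prints the C that it would have executed, so the flat per-instruction interpreter can be generated mechanically. This is purely illustrative; the real generator is part of the emulator and works on the full instruction set:

```c
#include <stdio.h>
#include <string.h>

/* A tiny slice of an execute-or-emit interpreter: the switch structure is
   the same as the interpreter's, but the body emits code instead of
   running it. */
typedef enum Instruction
{
	INSTRUCTION_ADD,
	INSTRUCTION_NOP
} Instruction;

static void EmitReadSource(FILE* const output, const Instruction instruction)
{
	switch (instruction)
	{
		case INSTRUCTION_ADD:
			/* Emit the code that the interpreter would have executed. */
			fputs("source = ReadOperand(&source_operand);\n", output);
			break;

		case INSTRUCTION_NOP:
			/* No source operand: emit nothing. */
			break;
	}
}
```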

It took some wrangling, but I eventually got this done, and the performance improvement was a very welcome 60%+: the 68k CPU interpreter now ran at around the same speed as the Z80 CPU interpreter, all from just eliminating some switch statements.

The generated interpreter, while readable and resembling hand-written C code, is rife with duplicate code, taking it from 2000 lines of code to 5000. Because of this, I still prefer to work with the original small interpreter. Maybe, at some point, the generator will become too much effort to maintain and I’ll start editing the ‘inlined’ interpreter directly, but that remains to be seen.

While the original 68k interpreter was indeed slow, it was by no means the slowest part of the emulator. In fact, it was only the third slowest: the VDP renderer was slightly slower, and the emulator’s cycle-ticking function was 3 times slower.

The VDP renderer being slower is understandable, but why is the cycle-ticking function so slow, and three times slower at that? Isn’t its whole purpose to just sit in a loop calling the functions that do the actual emulation? It turns out that this was down to bad programming: the Mega Drive’s master clock runs at about 52MHz, so the emulator’s cycle-ticking function must perform 52,000,000 cycles per second. How did it do this? With a plain-ass for-loop. The thing is, most of these cycles are wasted, as the 68k only runs 1 out of every 7 cycles, and the Z80 only runs 1 out of every 15 cycles, resulting in most iterations of the for-loop doing nothing but creating overhead. The solution? Stop iterating the for-loop a single cycle at a time. Instead, advance by as many cycles as necessary for the 68k or Z80 to update, or for the Mega Drive to finish updating the current frame. With this, the cycle-ticking function went from being 3 times slower than the VDP renderer to being about the same speed as it.
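The optimisation can be sketched like this, using the Mega Drive's real clock dividers (7 for the 68k, 15 for the Z80) but made-up function names:

```c
#include <assert.h>

/* Instead of stepping the master clock one cycle at a time, jump straight
   to the next cycle on which either CPU actually runs. */
#define M68K_DIVIDER 7
#define Z80_DIVIDER 15

static unsigned long steps_taken;

static unsigned long MinimumCyclesUntilNextEvent(const unsigned long cycle)
{
	const unsigned long until_68k = M68K_DIVIDER - cycle % M68K_DIVIDER;
	const unsigned long until_z80 = Z80_DIVIDER - cycle % Z80_DIVIDER;

	return until_68k < until_z80 ? until_68k : until_z80;
}

static void RunCycles(const unsigned long total_cycles)
{
	unsigned long cycle = 0;

	while (cycle < total_cycles)
	{
		/* Advance in one jump rather than one cycle per iteration;
		   this is where the 68k/Z80 would be told to update. */
		cycle += MinimumCyclesUntilNextEvent(cycle);
		++steps_taken;
	}
}
```

Over 105 master-clock cycles (the least common multiple of 7 and 15), the naive loop would iterate 105 times, whereas this one only iterates once per cycle on which the 68k or Z80 actually runs.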

…As you may have noticed, this means that tweaking a for-loop resulted in a bigger performance gain than overhauling the 68k CPU interpreter. That stings.

Still, this means that my emulator is a lot faster now! It might not be running full-speed on old DOS PCs any time soon, but it’s still a win for running well on limited hardware, or in exotic environments like Emscripten.

In the future, I’d like to look into optimising my CPU interpreters further by extending the single switch statement from per-instruction to per-opcode, eliminating yet more overhead at the cost of a massive increase in code size and the resulting cache misses. Better yet, I’d like my generator to be able to produce both per-instruction and per-opcode interpreters, to suit whichever platform the emulator is being built for.

Until then, I think the next thing I work on will be adding support for the Window Plane to the VDP renderer. Maybe after that I can finally optimise the VDP renderer so that it’s not the slowest part of the emulator.

As usual, if you’re interested in checking out the emulator yourself, you can find binaries and source code on GitHub.

Porting Sonic Mania to the Wii U

I’m still quite busy these days, but I did find time in the last month to start a new project:

As you can imagine, this was pretty similar to the Sonic CD port that I did last year: not only do they run on the same engine (RSDK – Mania just uses a newer version), but they also have the same dependencies (SDL2 and libtheora).

The process of porting the game wasn’t quite as simple though: while Sonic CD just required some endianness and thread-safety fixes, Sonic Mania had major performance issues. In particular, Competition Mode, the Special Stages, the pinball table, and the Stardust Speedway Zone Act 2 boss fight would all lag severely. The reason for this is that Sonic Mania is software-rendered, and its fancier effects are simply too much for the Wii U’s CPU to handle.

But that wasn’t the only problem that I had to deal with: this port exposed issues in the Wii U’s SDL2 port as well!

But I’m getting ahead of myself… let’s start at the beginning:

Timeline

Getting it to Boot

In the middle of August, the Sonic Mania decompilation was released, and news of it reached a small Discord server that I was in. One thing led to another, and I accidentally gave people the impression that I would port the decompilation to the Wii U, as I had done previously with Sonic CD. Eventually, word spread to GBAtemp, and I soon had people messaging me, asking about the port that I was allegedly working on. Realising that I’d kind of done this to myself, I figured that I might as well give it a shot. I had an afternoon all to myself – just me, my Wii U, a copy of the decompilation, devkitPPC, and no internet – so I got to work. Within a few hours, I had the game running, albeit very rough and laggy.

Building the game was simple enough: the build system is not complex, so I was able to quickly write my own CMake script from scratch (the decompilation only comes with a Makefile, but CMake scripts are much easier to target the Wii U with). The Wii U already had a port of SDL2, and I was able to compile my own Wii U port of libtheora, so the dependencies were quickly satisfied as well. But once I had produced a Wii U executable, I found that running it would immediately result in an endless black screen – the game was crashing upon boot.

Luckily, the game has built-in error reporting, and I found that the game was failing to locate a file. The game keeps all of its files in a single archive blob, with each one being represented by an MD5 hash of its original filename. To load a file, the game hashes the filename at runtime and then uses that to locate the file in the archive. But here’s the thing: the MD5 hasher that the game uses is garbage: it’s literally just some random code grabbed off a random website.

This hasher’s code commits every sin:

  • Tiny, incomprehensible variable names (because why would anyone want to understand what a variable does or is for?).
  • No documentation (no, really, who cares about code being readable and maintainable?).
  • Uses dynamic memory for no reason (MD5 has absolutely no need for dynamic memory: everything can be done on the stack with a small buffer).
  • Doesn’t handle malloc failure (who doesn’t like null pointer dereferences?).
  • Is dependent on implementation-defined integer type sizes (because, as we all know, ‘int’ and ‘long’ are the same size on x86-64 Linux as they were on DOS, am I right?).
  • Uses signed ints for representing the size of arrays (ah yes, an array of -0x1000 chars, please).
  • Only works on little-endian CPUs (because screw basic portability, especially in code that’s, you know, shared online and intended to be used by people in a variety of projects).

While the other points could be considered nit-picks, that last one prevents the game from running on the Wii U, which has a big-endian CPU. How is it not compatible with big-endian CPUs? It casts an array of chars to an int, instead of just bit-shifting and OR-ing them together like sane code would.
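The portable fix is simple in spirit: assemble the value byte-by-byte with shifts, so the result is the same on every host. For example (an illustrative helper, not the decompilation's actual code):

```c
#include <assert.h>

/* Read a 32-bit little-endian value from a byte array. Unlike casting the
   array to a wider type, this reads the bytes in a defined order on every
   CPU, big-endian or little-endian. */
static unsigned long ReadLittleEndianLong(const unsigned char* const bytes)
{
	return (unsigned long)bytes[0]
	     | (unsigned long)bytes[1] << 8
	     | (unsigned long)bytes[2] << 16
	     | (unsigned long)bytes[3] << 24;
}
```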

Because of this bug, the calculated hashes would be incorrect, causing the game to fail to find vital files and crash.

Anyway, with that fixed, the game can now run, right? Nope: it still crashes on boot. It’s not just the MD5 hasher that only works on little-endian CPUs: the entire game engine is the same way! Oddly, there is some degree of explicit big-endian support in the engine’s code, but I guess due to code-rot, other parts of the engine lack it, such as the UTF-16 string reader. After patching this all up, the game could finally boot.

RGB565

The game was now running, but there was one pretty big problem: it was very, very laggy in anything but the main levels. Special Stages, animated cutscenes, pinball, Competition Mode, they would all lag to half or even a third of the game’s usual speed.

As I explained earlier, this is largely due to the game being software rendered, and the Wii U’s CPU being a dinosaur that’s still compatible with GameCube code, but it’s also caused in part by SDL2: you see, the game outputs an RGB565 framebuffer, but the Wii U port of SDL2 doesn’t support RGB565, causing SDL2 to convert it to a variant of RGBA8888 and use that instead. This adds a massive amount of overhead to the game’s rendering. Despite this, the Wii U actually does support RGB565. Indeed, there is even code in the SDL2 port that tries to use it, but it’s dummied out with a comment explaining that it doesn’t work. The cause of this is that there’s a catch to RGB565 support: each 16-bit RGB565 pixel is little-endian, not big-endian. This isn’t too hard to work around, though: just byte-swap the bitmap as it’s passed to the GPU.
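The workaround amounts to something like this (the function name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* The GPU wants each 16-bit RGB565 pixel little-endian, so swap the bytes
   of every pixel before uploading the bitmap. */
static void ByteSwapPixels(unsigned short* const pixels, const size_t total_pixels)
{
	size_t i;

	for (i = 0; i < total_pixels; ++i)
		pixels[i] = (unsigned short)(pixels[i] >> 8 | pixels[i] << 8);
}
```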

With this done, the game started to perform a lot better, but it would still drop to half speed a lot. This struck me as odd, since it would drop to half speed even if it should only be lagging slightly. Looking into SDL2’s code, I found the cause: the port uses regular V-sync instead of adaptive V-sync. For those not in the know, adaptive V-sync is a variant of V-sync that doesn’t wait until the next frame if the game is lagging. This means that if the game only barely missed the frame that it was meant to wait for, then it wouldn’t wait for the next one – it would just continue. The result is that the game only lags as much as it has to, instead of dropping to a fraction of its framerate. Making the V-sync adaptive was a simple change to make to SDL2.

With these two fixes combined, the game’s performance was massively improved: many instances of lag were eliminated, and the cases where the game still lagged at least ran at a framerate that was much closer to 60FPS.

YUV420

There was still one part that consistently lagged, however: the animated cutscenes. Unlike the rest of the game’s RGB565, the animations render in YUV420. This is a weird format that encodes colour as a single luminance channel and two chrominance channels. Unlike RGB, these channels are not interleaved, and they’re not even the same resolution: the chrominance channels are half the vertical and horizontal resolution of the luminance channel. It’s really weird.

Like with RGB565 before, SDL2 didn’t support YUV420 on the Wii U, so it was converting the YUV420 image to something else every frame. Unlike RGB565, the Wii U doesn’t support YUV420 natively… however, it can be emulated with a fragment shader.

This is a trick that I learnt from SDL2’s OpenGL renderer: if you create three textures, you can assign one to each of the YUV channels. Then, these textures can be sampled in a fragment shader, and dotted with some lookup tables to produce an RGB pixel. This works out to be much faster than SDL2’s CPU-based conversion, allowing the cutscenes to finally run at full speed.
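For reference, the conversion that the shader performs looks like this when written out in C for a single pixel, using the common BT.601 coefficients (the actual shader's constants may differ slightly):

```c
#include <assert.h>

static unsigned char ClampToByte(const double x)
{
	return x < 0.0 ? 0 : x > 255.0 ? 255 : (unsigned char)x;
}

/* Convert one YUV pixel to RGB using the usual BT.601 coefficients. The
   chrominance channels (U and V) are centred around 128. */
static void YUVToRGB(const unsigned char y, const unsigned char u, const unsigned char v,
                     unsigned char* const r, unsigned char* const g, unsigned char* const b)
{
	const double yf = y;
	const double uf = u - 128.0;
	const double vf = v - 128.0;

	*r = ClampToByte(yf + 1.402 * vf);
	*g = ClampToByte(yf - 0.344 * uf - 0.714 * vf);
	*b = ClampToByte(yf + 1.772 * uf);
}
```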

It’s so pretty that you wouldn’t guess it was encoded in such a gross colour format!
Controller Support

Oddly, the game could only read input from the Wii U Gamepad: the Wii U Pro Controller and Wii Remote didn’t work at all. Looking into it some more, I found that this was because SDL2 only supported those controllers in its legacy ‘joystick’ API, and not its newer ‘game controller’ API. The difference between the two is that the ‘game controller’ API binds each button to an XInput-style layout, avoiding the need for the user to configure their own button mappings like in an old PC game. To achieve this, SDL2 contains a database of known controllers and which buttons map to their equivalent XInput buttons. All I had to do was add the Wii U Pro Controller, Wii Classic Controller, and Wii Remote (with Nunchuk) to this database, and now these controllers can be used. This is great for Competition Mode!

A Gamepad, a DualSense, a Wii Remote, and a Pro Controller, all connecting to the Wii U for a four-player game of Competition Mode! The DualSense is connected to the Wii U with the amazing Bloopair homebrew!
Random Crashing

Ah yes, my worst nightmare: after releasing the first few versions of my port, I began experiencing random crashes. There was no pattern to when they’d occur, so I couldn’t reproduce them at will: all I could do was playtest the game for hours and hope that it would eventually crash. At first, I had no leads to go on, but a handy YouTube comment tipped me off that unlocking achievements would consistently result in a crash. From this, I found that the memory allocator was faulty.

Yes, Sonic Mania has its own memory allocator. It’s neat: it performs automatic garbage collection and defragmentation. However, the decompilation broke it in two key areas: the allocator would increment the total number of allocations even when the allocation failed, and the allocation duplicator would fail to increment the allocation total at all, resulting in memory being destroyed by the garbage collector while still in use. Once these were addressed, the memory allocator was back to functioning as intended.
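A hedged sketch of the first bug (the names and table here are hypothetical, not the decompilation’s actual code): the fix is to only count an allocation after it has actually succeeded, otherwise the tracking table ends up with a phantom entry for the garbage collector to trip over.

```c
#include <stdlib.h>

/* A toy allocation tracker.  The table size is arbitrary for this sketch. */
void *allocations[256];
unsigned int total_allocations;

/* Allocate and track some memory.  The broken version incremented
 * 'total_allocations' unconditionally, even when malloc returned NULL. */
void *TrackedAllocate(size_t size)
{
    void *memory = malloc(size);

    if (memory != NULL)
        allocations[total_allocations++] = memory;

    return memory;
}
```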

Awesome! No more random crashes, right? Nope: there were more!

At this point, I was stumped: my only lead was a dead end, leaving me with no other ways of diagnosing what the cause of these crashes was. That was, at least, until a Wii U homebrew developer called Gary informed me that the Wii U’s operating system keeps crash logs in its /storage_slc/sys/logs directory. This was an enormous help: suddenly, I had a full stack trace and register dump of the last 100 crashes that my Wii U had experienced! From this, I found that the crashes were occurring in the audio mixer. A null pointer dereference; strange, as there was only one pointer in the audio mixer (the sample pointer), and it should never be null.

This pointer is obtained from a struct which is only read if a flag in it is set, but, whenever this flag is set, the pointer is set to a non-null value. This could only mean one thing… a thread-safety issue.

Eventually, I found the culprit: the StopSFX function. Most audio functions make sure to lock the audio thread before running, but this one neglects to. I’m not sure if this is the case in the real Sonic Mania, or if this is just a mistake in the decompilation, but either way this failure was resulting in a race condition where the audio thread was running after the flag had been set but before the pointer had been set, resulting in a null pointer being read and processed by the audio mixer, crashing the system. With this fixed, the random crashing was finally a thing of the past.
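The shape of the fix looks something like this (hypothetical names, and a pthreads mutex standing in for whatever locking the real port uses): StopSFX simply needs to take the same lock that every other audio function takes, so the mixer can never observe the flag and the pointer in an inconsistent state.

```c
#include <pthread.h>
#include <stddef.h>

pthread_mutex_t audio_lock = PTHREAD_MUTEX_INITIALIZER;

struct Channel {
    int active;          /* Flag the mixer checks...        */
    const short *sample; /* ...before reading this pointer. */
};

struct Channel channels[16];

/* Without the lock, the audio thread can run between these two writes
 * and see 'active' set while 'sample' is already (or still) null,
 * dereferencing a null pointer and crashing the system. */
void StopSFX(int channel)
{
    pthread_mutex_lock(&audio_lock);
    channels[channel].active = 0;
    channels[channel].sample = NULL;
    pthread_mutex_unlock(&audio_lock);
}
```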

Aroma

During the development of this port, the Aroma Wii U Homebrew Environment was released. In particular, Aroma sports one very interesting feature: running homebrew from the Wii U Menu, bypassing the Homebrew Launcher. This meant that homebrew could use the Home button to open the Home Menu, whereas with the Homebrew Launcher it would just abruptly exit the homebrew without warning. Additionally, it meant that homebrew could include a cool splash screen that would be shown during start-up. It’s also just cool to have homebrew displayed alongside official Wii U software and games. Naturally, I updated my port to take advantage of this, making it look and feel extra official.

It’s like it was meant to be!
File IO

The Sonic Mania decompilation supports mods, but I was surprised to find that these mods would load extremely slowly on the Wii U. For instance, a level-replacement mod would take upwards of an entire minute to load. What’s going on? Is the Wii U’s SD card IO really that slow?

At first, I thought that this was an issue with devkitPPC’s implementation of fopen and related functions: perhaps it’s doing some weird slow low-level interaction with the Wii U’s SD card slot? I figured that I could try using the Wii U OS’s native file reading functions instead, only to find out that those are exactly what devkitPPC uses.

Eventually, I began trawling through devkitPPC’s code to see if it was using the native file reading functions incorrectly. On the contrary, however, I found that the code was very well written: there were already clever tricks that exploited the properties of the CPU’s cache to optimise file reading. Higher up the software stack, in the C standard library (newlib), there was even complete IO buffering, greatly reducing the number of IO accesses. What could possibly be going wrong here?

Welp, after examining the code for hours, I realised that I was looking at the wrong branch. After switching to the devkitPPC branch, I found the cause: newlib’s IO buffering was being manually disabled by a hack in the code, causing every byte of read data to result in an IO poke. No wonder it took a minute to load a couple of megabytes. The hack had recently been reverted, but no release containing the fix had been published yet. Still, the C standard library has a function for re-enabling IO buffering (setvbuf), so I made the game use that, and now mods load just as fast as the rest of the game!
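A sketch of the workaround: re-enable full buffering immediately after opening the file, before any reads occur (setvbuf must be the first operation performed on the stream). I’m using BUFSIZ here for simplicity; a larger buffer could be passed instead.

```c
#include <stdio.h>

/* Open a file for reading and make sure stdio buffers it, even if the
 * C library has disabled buffering behind our back.  With _IOFBF and a
 * NULL buffer pointer, stdio allocates the buffer itself. */
FILE *OpenBuffered(const char *path)
{
    FILE *file = fopen(path, "rb");

    if (file != NULL)
        setvbuf(file, NULL, _IOFBF, BUFSIZ);

    return file;
}
```

With buffering restored, each fread pulls in a whole buffer’s worth of data in one IO access instead of poking the SD card for every byte.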

jubbalub’s Labyrinth Zone mod no longer takes forever to load!
Wrapping Up

After just over a month of work, the port is now quite complete, running fast and stable while providing near feature-parity with official releases. This project has not only led to improvements in the Sonic Mania decompilation, which should benefit all ports and not just this one, but also improvements to the Wii U port of SDL2, which should benefit many Wii U homebrew programs and games which depend on it, granting them support for more controllers, improved support for certain texture formats, and even some bugfixes.

If you’re interested in trying my Sonic Mania port, you can find it here.

If any fancy updates are made to the port, I’ll be sure to talk about them here. I’m still trying to think of a way to port the screen filter shaders, and it would be nice to add full hardware-accelerated rendering to eliminate the remaining lag in the Special Stages. We’ll see. In the meantime, though, I’ve taken an interest in my Mega Drive emulator once again…

Project Sand: Sonic Aftermath

I was digging through some old Sonic hacks of mine when I stumbled across these ancient relics from 2014 and 2016.

Remember that Project Sand/Sonic Aftermath hack that I have a few old videos about? Here’s the one level of it that was worked on before the project died: Sand Zone from Cave Story.

There’s not much that I can tell you about it: this one test level was made before I came up with any ideas for gimmicks and the sort. That purple mess in the middle of the stage is supposed to be spikes, but its graphics are overwritten with the level tiles.

I believe that the music is a straight port from Cave Story: the Organya music was converted to XM with the org2xm tool, then that was converted to SMPS with xm2smps, and then that was converted from binary to ASM with SMPS2ASM, allowing it to be installed in my hack’s custom sound driver. This hack was either rocking my Sonic 2 Clone Driver v2, or Flamewing’s Flamedriver. After all this time, I can’t remember which.

This footage is of an earlier build of my hack than the one seen in the other videos. The reason for this is that the hack was remade from scratch after this build, and this level was never reintroduced afterwards.

Here’s a cutscene that was one of the last things that was worked on before the project died.

This was meant to be the opening cutscene to Knuckles’ story: the hack was meant to be a ‘what if’ scenario where Sonic 2 ends differently, leading to a different series of events in what would be Sonic 3. This cutscene depicts the Death Egg landing in Angel Island’s volcano directly after the events of Sonic 2, instead of the lake like it normally does. Following this, Knuckles would have gone to investigate it. At the time, I didn’t have Knuckles ported into the game, so I used Sonic as a placeholder.

Between this cutscene, Sand Zone, and the custom title screen, this is all there ever was to Project Sand/Sonic Aftermath.

clownmdemu and clownassembler released

Sorry for the drought of blog posts lately: I’ve been busy with work and a lot of other IRL stuff. Still, if there’s one thing I can give an update on, it’s that two of my projects have finally seen a release: clownmdemu and clownassembler have been released on Sonic Retro, Sonic Stuff Research Group, and Mega Drive Developers Collective.

Here are links to the various release threads.

clownassembler was released less than an hour ago, so there’s not much that I can comment on feedback-wise. However, clownmdemu was released at the end of June, giving it plenty of time to receive feedback.

Surprisingly, people were quite enthusiastic and welcoming of the emulator despite its unfinished state. If anything, people seemed more bothered by the hardcoded key bindings than its inability to boot certain games. I can only hope that clownassembler gets an equally warm reception, considering that it is similarly unfinished.

clownmdemu – Z80 Support

With the addition of FM support, my Mega Drive emulator came much closer to being able to provide a complete experience for certain games such as Sonic 1. Unfortunately, there was still one major missing feature: drums, voice clips, and sometimes even all audio entirely were inaudible. What gives? Why does Sonic 1 play most of its audio, while 2 and 3 don’t play any at all?

The reason that this is the case lies in the architecture of these games: Sonic 1 uses a sound engine that runs on the 68k CPU, while 2 and 3 use one that runs on the Z80 CPU. Up to this point, my emulator has not emulated the Z80 CPU, which is why no sound plays in those two games. Additionally, Sonic 1 uses the Z80 CPU for its drums as well as the famous Sega chant, which is why those are missing as well.

However, this problem is no more: new to clownmdemu is Z80 CPU emulation. It doesn’t implement 100% of the Z80’s feature set, but it’s enough to get at least the Sonic games to output all of their audio. Heck, it’s even enough to get Sonic ROM hacks to play their audio, including the ones that use fancy custom Z80 code:

This ROM hack in particular, Sonic 2 Recreation, uses Z80 code that was written from scratch by ValleyBell and is able to apply a variety of effects to its PCM samples.

Writing another CPU emulator was nice because it gave me a chance to apply what I’ve learnt since writing the 68k CPU emulator. In particular, instead of spreading each step of execution across its own giant switch statement, there’s only a single switch statement, with one case per instruction. This does result in some duplicate code, but the hope is that this is outweighed by avoiding the overhead of going through 6 or so switch statements.

I’ve also used this opportunity to write the machine code decoder in such a way that its job can be replaced with a lookup table. You see, the task of breaking a byte of machine code down into a struct which describes the operation to be performed has been given to a function: this function takes a byte and returns a struct. Because of this modularised design, I can execute a loop on start-up which executes this function for every possible byte, from 0 to 0xFF, and then caches the resulting structs in a big lookup table. Then, during emulation, the machine code to be executed can be used as an index into this table to retrieve its corresponding struct. By doing this, the expensive step of manually decoding the machine code is skipped entirely. According to my tests, this doubles the performance of the Z80 emulator. Though, for RAM-limited platforms, I’ve left a compile-time option to revert back to using the function instead of a lookup table. One day, I would really like to refactor the 68k CPU emulator to bring over these improvements.
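A minimal sketch of the idea (the struct and bit layout here are made up for illustration; clownmdemu’s real decoder is far more involved):

```c
#include <stdint.h>
#include <stddef.h>

/* What one byte of machine code decodes to.  The real struct describes
 * operands, addressing modes, and so on; two fields will do here. */
typedef struct {
    uint8_t opcode;  /* The top bits select the operation...   */
    uint8_t operand; /* ...and the rest select an operand.     */
} DecodedInstruction;

/* The expensive decoding step: a pure function from byte to struct. */
DecodedInstruction DecodeInstruction(uint8_t byte)
{
    DecodedInstruction decoded;
    decoded.opcode = byte >> 6;
    decoded.operand = byte & 0x3F;
    return decoded;
}

static DecodedInstruction decode_table[0x100];

/* Run once at start-up: cache the decoder's output for every possible
 * byte, so that during emulation decoding is just an array lookup. */
void InitialiseDecodeTable(void)
{
    size_t i;

    for (i = 0; i < 0x100; ++i)
        decode_table[i] = DecodeInstruction((uint8_t)i);
}

DecodedInstruction LookUpInstruction(uint8_t byte)
{
    return decode_table[byte];
}
```

On a RAM-limited platform, calling DecodeInstruction directly instead of LookUpInstruction trades the table’s memory for decoding time, which is exactly the compile-time option mentioned above.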

The Zilog Z80 itself is a weird thing: it’s an 8-bit CPU with a 4-bit Arithmetic Logic Unit which mimics the instruction set of another CPU (the Intel 8080) while adding extra instructions of its own. The extra instructions were bolted-on in a very ugly way that clashes with the ‘neatness’ of the base 8080 instruction set. It also contains some outright hacks that aren’t intuitive to emulate at all: for instance, there’s a certain mode that you can make the Z80 enter where the output of an instruction can be written to both a register and memory at the same time. It essentially hijacks the operand getters and setters.

Anyway, with this implemented, Sonic 1 and 2 are now finally fully emulated: there isn’t a single feature in those games that isn’t being emulated (to my knowledge anyway). Sonic 3 still has some strange bugs, and it makes use of unimplemented features such as SRAM and the Window Plane, so that may be what I work on next.

There are still a number of things that I want to add before I give this emulator a proper release: a libretro core, controller rebinding, SRAM, the Window Plane, 68k instruction durations, 68k exceptions, YM2612 LFO, YM2612 SSG-EG, YM2612 Timers… though, maybe I should just release this anyway: it’s not like this emulator will ever play every title perfectly or anything.

As always, you can find the source code to my emulator in the usual place.

Oh, right. I should also list the resources that I used when making this Z80 emulator. First there’s this blog post which gave a general overview of the Z80, and explained why and how the Z80’s instruction set is the way it is. Then there’s this follow-up which explains the Z80’s timings. There’s also this useful table which visualises the various opcodes. Finally, there’s this webpage which explains how to effectively decode Z80 opcodes. These resources were invaluable to me, and hopefully they can be to others too.

clownmdemu – FM Audio Emulation

It finally happened! With university over, I decided to tackle what is perhaps my greatest challenge yet in writing this emulator: emulating the YM2612.

The YM2612 is the Mega Drive’s primary audio chip. Apparently, it is a cost-effective, stripped-down version of the YM2608: while the YM2608 featured FM, SSG, Rhythm, and ADPCM modules, the YM2612 is just a standalone FM module with basic DAC output slapped on it.

For the longest time, the only documentation that was available to emulator developers was the “Genesis Software Manual”, a document that Sega provided to developers describing the console’s hardware. Unfortunately, this document went into very little detail about how the sound hardware worked. Still, it was apparently good enough for a number of emulators to be made back in the 1990s and 2000s.

Later, in 2008, Nemesis obtained a copy of the official YM2608 manual. Unfortunately, this document was in Japanese, but he was able to produce a mostly-coherent machine-translation. This document answered many questions that emulator developers had about the YM2612, but it still failed to go into detail when it came to certain subjects that were essential for emulator developers to understand.

When I began implementing my own FM emulation, I decided that I would try to stick to this manual early on, and only seek out additional information once I got stuck. With just this YM2608 document and my own knowledge of the YM2612, I was able to produce an extremely basic emulator that generated one sine wave per channel. These sine waves could have their volume and frequency adjusted. While this wasn’t nearly enough to produce authentic audio, it did at least make it possible to hear music and sounds.

Unfortunately, there wasn’t much more that I could do than this: I knew from the manual that there were three core components of the YM2612 that were essential to how it produced audio: the Operators, the Phase Generator, and the Envelope Generator. However, while the manual described what they are and what input they take, it did not describe what output they create from that input. To give an example, a ‘detune’ value can be supplied that offsets the frequency, but neither the Genesis Software Manual nor the YM2608 manual describe how much it offsets the frequency.

At this point, I decided to find some more information on the YM2612. I remembered that the SMPS devkit which was found a few years ago had a YM2612 manual, but unfortunately it too was in Japanese and with seemingly no translation available. It turns out that the manual for the YM2612’s CMOS equivalent (the YM3438) was also found, but yet again it was in Japanese. This wasn’t a huge loss though, as those didn’t appear to contain anything particularly useful that wasn’t already in the YM2608 manual.

What was useful, however, was a thread on SpritesMind that I’ve had bookmarked for years. It’s 58 pages of discussion and discoveries regarding YM2612 emulation, including some incredible documentation that was produced by Nemesis. In particular, he created three massive write-ups of exactly how the YM2612’s Operators, Phase Generator, and Envelope Generator all work. This information is utterly invaluable, as it provides most of the ‘missing pieces’ to the YM2612’s functionality that the manuals lack. Somehow, with nothing more than a Mega Drive, an oscilloscope, and an extensive array of tests, he was even able to figure out details as nuanced as the exact values of the chip’s internal sine wave lookup table.

It took me about four hours straight to read through that whole thread, but it was worth it! I’ll probably have to read through it again to catch any details that I missed the first time around. If you’re interested in making your own Mega Drive emulator, or are just curious about the YM2612, then I cannot recommend that thread enough.

Unfortunately, Nemesis wasn’t able to complete his documentation of the YM2612, meaning that there was still missing information on three key components: the accumulator, Operator feedback and modulation, and the Low Frequency Oscillator.

For the time being, I passed on implementing the Low Frequency Oscillator because Sonic 1 (the game that I was using to test audio) doesn’t use it. As for the accumulator, I had already produced my own implementation of it through guesswork. This left the Operator feedback and modulation.

So what is Operator feedback and modulation? To understand that, you have to understand how the YM2612 produces sound. So, what the heck: here’s an overview of how the YM2612 works:

The YM2612 has six channels. Each channel is composed of four sine waves, dubbed ‘Operators’. Each Operator has its own Phase Generator and Envelope Generator. The Phase Generator advances the sine wave, and the Envelope Generator produces an ADSR envelope. The Phase Generator manages the frequency of the sine wave, and thus is responsible for producing notes, while the Envelope Generator is responsible for shaping the sine wave into a more complex waveform, and thus creating basic ‘instruments’ or ‘voices’.

However, the Operators allow for even more advanced ‘instruments’ to be made through Operator modulation: rather than output its waveform to the speakers, an Operator can instead feed directly into another Operator, modulating its sine wave in a process known as ‘phase modulation‘.

‘Operator feedback’ is the process of a channel’s first Operator feeding into itself, a feature unique to that Operator.

In the spirit of emulation development, I decided I’d find out for myself how to implement Operator modulation: with the help of the cycle-accurate Nuked OPN2 YM3438 emulator, I compared the output of my emulator to what a real Mega Drive would sound like, and tweaked my own Operator modulation implementation until it sounded correct.

Despite this, the audio still sounded far from accurate. I was able to track one source of distortion down to an incorrect implementation of the Phase Generator’s multiplier, and while that did fix the channels sounding like whistles, it still left the audio sounding like this:

Clearly the envelope generator was running too fast… and yet its code matched Nemesis’s notes exactly. I spent hours debugging this, creating custom FM instruments to test specific parts of the emulator against Nuked OPN2, but nothing made sense: the envelope generator was absolutely working as intended. I then figured that perhaps I had given the emulator’s frontend the wrong sample rate, and that it was somehow playing the audio back twice as fast.

The truth was a lot dumber: I’d accidentally given the emulated YM2612 a 7x overclock.

The YM2612’s clock is derived from the Motorola 68000’s clock, which is derived from the master clock. The 68000’s clock is the master clock divided by 7, and the YM2612 clock is the 68000’s clock divided by 6. My emulator had the YM2612 clocked at the master clock divided by 6.
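As arithmetic, using the commonly-quoted NTSC master clock of 53,693,175Hz (an assumption of mine; the divider chain is what matters here):

```c
/* Clock dividers, derived from the master clock. */
#define MASTER_CLOCK 53693175UL          /* NTSC master clock (assumed). */

#define M68K_CLOCK   (MASTER_CLOCK / 7)  /* ~7.67MHz */
#define YM2612_CLOCK (M68K_CLOCK / 6)    /* ~1.28MHz */

#define BUGGY_CLOCK  (MASTER_CLOCK / 6)  /* ~8.95MHz: 7 times too fast! */
```

Dividing the master clock by 6 instead of by 7 and then 6 skips the 68000’s divider entirely, which is exactly where the 7x overclock came from.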

Once that was corrected, (and I properly implemented ADSR envelope rate-scaling which I somehow glossed over when reading the YM2608 manual 100 times), I finally had this:

At last: it worked! And it came so suddenly too: one minute everything’s a whistle, and the next it sounds like a real Mega Drive!

And that takes us to where we are now. There are still many things left for me to implement in my YM2612 emulator, such as per-operator frequencies, Timer A and Timer B, the Low-Frequency Oscillator, SSG-EG (a second Envelope Generator), and possibly even the debug registers.

You might be asking yourself how I’m going to replicate all of those when there’s so much missing documentation. Well, the truth is that the YM2612 and YM3438 have actually been completely documented for years now. I just figured that it would be too easy to use that documentation. What is that documentation? Nuked OPN2’s source code.

You see, Nuked OPN2 isn’t just a cycle-accurate emulator: it’s a cycle-accurate emulator that’s based directly on a die-shot of a de-capped YM3438. Essentially, Nuked OPN2 is a conversion of the YM3438’s circuitry to C. With this, there are no mysteries about how the YM3438 works: everything is documented in such a way that you can verify it just by running it. What other form of documentation doesn’t just say ‘dude, trust me’, but ‘here: I’ll prove it’?

While I do plan on using Nemesis’s documentation to implement SSG-EG, Nuked OPN2 can be used to implement any details that aren’t explained elsewhere.

Surprisingly, some parts of my YM2612 emulator happen to function exactly how a real YM2612 does, contrary to how the documentation suggests they should function. For example, the note octave is encoded as a number between 0 and 7, which expresses the following behaviour:

Octave | Behaviour
0 | Divide note frequency by 2.
1 | Leave note frequency as-is.
2 | Multiply frequency by 2.
3 | Multiply frequency by 4.
4 | Multiply frequency by 8.
5 | Multiply frequency by 16.
6 | Multiply frequency by 32.
7 | Multiply frequency by 64.

Both the manuals and Nemesis’s notes suggest that this should be implemented as a left-shift by the number of the octave minus 1, with special-case logic for octave 0 that does a right-shift by 1 instead. In my emulator, however, I just left-shift by the number of the octave, and perform a single right-shift afterwards, creating the same result with more-efficient code. According to Nuked OPN2, this is exactly what a real YM3438 does as well.
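The two formulations can be checked against each other. A quick sketch (overflow is a non-issue here, since the chip’s frequency values are only 11 bits wide):

```c
#include <stdint.h>

/* Octave scaling as the documentation describes it: shift left by
 * (octave - 1), with special-case logic for octave 0. */
uint32_t scale_documented(uint32_t frequency, unsigned int octave)
{
    return octave == 0 ? frequency >> 1 : frequency << (octave - 1);
}

/* Octave scaling as the YM3438 (and my emulator) actually does it:
 * shift left by the octave, then right by 1.  No special case. */
uint32_t scale_hardware(uint32_t frequency, unsigned int octave)
{
    return (frequency << octave) >> 1;
}
```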

As much as I’d like to continue rambling about this stuff, there’s only so much to write about, and only so many hours in the day. As usual, you can find clownmdemu’s source code in its Git repo. Hopefully I’ll have more progress on my emulator to talk about soon. In the meantime, I’ll leave you with a video of a fun bug:

The Mega Drive’s Interlaced Video Output

Today, I looked into adding support for the Mega Drive’s interlaced video output to my emulator. It didn’t go how I planned, and I eventually realised that it was worthless to pursue. Still, I think this makes for a fun story.

First, I should probably go over the basics of how old CRT TVs display an image. Basically, the screen is split up into 480 lines, but they are not all drawn at once. Rather, the even lines are drawn first, and the odd lines are drawn on the next frame, or vice versa. You could think of it as the TV rendering 240 lines at 60FPS or 480 lines at 30FPS.

With that in mind, we can begin to understand the Mega Drive’s interlacing. It has three modes:

  • Mode 0, which is the non-interlaced mode. This mode is plain 240p. It uses a trick to prevent the odd lines from ever being drawn, meaning that two sets of 240 even lines are drawn instead.
  • Mode 1. This mode is similar to mode 0, but it does not prevent odd lines from being drawn. The odd lines will display the exact same graphics as the even lines. The official ‘Genesis Software Manual’ developer document warns that this mode will result in severe vertical blurring.
  • Mode 2. This mode is very interesting: it is like mode 1, except the odd lines will not display the same graphics as the even lines. Basically, the Mega Drive’s vertical resolution will double, being 320×480 or 256×480. However, because only even or odd lines are displayed in a single frame, this means that the image will be downsampled back down to 320×240/256×240 when displayed.
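The line selection of the three modes can be sketched like so. This is my own formulation of the behaviour described above, not code from the emulator, and which field comes first is an assumption (as noted, it could be the other way around):

```c
/* Returns whether a given CRT line (0-479) is lit during a given frame.
 * Mode 0's trick pins the console to one field, so only even lines are
 * ever drawn; modes 1 and 2 alternate between the even and odd fields
 * every frame, like a normal interlaced signal. */
int line_is_drawn(unsigned int mode, unsigned int frame, unsigned int line)
{
    if (mode == 0)
        return line % 2 == 0;     /* Always the even field. */

    return line % 2 == frame % 2; /* Fields alternate per frame. */
}
```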

Regardless of the interlace mode, it is always 240 lines that are output in a frame. I wanted to implement this in my emulator, to perfectly replicate the rendering of interlace mode 2, which until now has been rendering at the native internal resolution of 320×480.

However… always rendering a 320×240 image wouldn’t be correct. After all, old TVs have 480 lines, not 240. Since mode 0 disables odd lines, there should theoretically be empty black spaces between each line, creating a ‘scanline’ effect.

To recreate this, I set about making my emulator always render a 320×480 image, and having the emulated console simply skip lines. This is accurate to how a real Mega Drive displays on a real CRT TV. However, having leftover lines from a previous frame mixed in with lines from the current frame produced an ugly ‘comb’ effect:

The effect in Mode 1.

Because of this, and knowing that the lines of a CRT fade when they’re skipped, I decided to simply make the skipped scanlines black.

In Mode 1 and Mode 2, the lines which are black and the lines which are actually drawn alternate every frame, causing the screen to ‘jitter’. It looked pretty cool and authentic to how I remember the two modes looking on a real Mega Drive. I ended up doing some extended playtesting with this, to soak in the nice scanline effect. When I was done, I closed the emulator and- oh dear:

My monitor had severe image persistence:

A recreation, because there’s no way in hell that I’m doing it again.

It turns out that this is a terrible idea: apparently, much like the waveform of a sound, the voltage driving an LCD pixel must alternate between positive and negative. It flips every frame, just like how a wave alternates every sample. By rapidly flickering a pixel between colour and no colour every frame, however, the pixel never gets to go negative, always remaining positive, or vice versa. The result is that the pixel effectively becomes a capacitor, storing charge and refusing to release it, causing it to display colour when it shouldn’t.

Thankfully, this appears to be temporary, as the built-up charge will dissipate naturally. Still, this was bloody terrifying: I thought I’d just ruined the display of my fancy new laptop.

Clearly, this interlacing emulation had to go. The last thing I need is a wave of heated bug reports from furious users who think their monitors have been destroyed.

Thinking about it, I realised that there’s no point to emulating the Mega Drive’s interlacing in the first place besides authenticity: in every use-case, interlacing is either an annoying side-effect or an irrelevant technical detail.

  • Sonic 2 uses mode 2 for its split screen multiplayer, taking advantage of the doubled vertical resolution. The interlacing does nothing but halve the game’s vertical resolution and introduce an ugly jitter effect.
  • Mode 2 could be used to display a static 320×480 image, in which case the interlacing wouldn’t be visible at all.
  • Mode 2 could be used for supporting 3D glasses, in which case the interlacing would cause each eye to see 30FPS instead of 60FPS.

As a result of this, interlacing emulation is completely pointless: my emulator has been doing ‘the right thing’ the entire time by just not emulating it. Better yet, my emulator goes above and beyond by rendering mode 2 at its native 320×480 resolution instead of halving its resolution back down to 320×240, meaning that it renders mode 2 better than a real Mega Drive.

It’s… beautiful!

I suppose this is where the story ends. Interlacing has been a strange thing to research, but I’m glad that I now understand more about how the Mega Drive delivers images to the display. I just wish that I didn’t nearly suffer heart failure in the process.