Years ago, I wrote an MD5 hasher. For some reason, I never gave it a proper release, instead only including a copy of it in one or two of my projects. That’s finally changed, and I figured that I’d mark the occasion by giving a recap of its history here. It’s a bit more complex than you’d expect.
I originally wrote my MD5 hasher as part of a university assignment. It was meant to be written in C#, but I preferred C, so I wrote it in that instead and converted it to C# after I had it fully tested and working.
This was simple enough. Getting the hasher to produce the correct hashes was a bit of a nightmare, because parts of the specification are easy to gloss over, but the actual conversion to C# only had one mishap: in C#, right-shifting a 32-bit integer by 32 resulted in a right-shift by 0 instead. Shifting a 32-bit integer by 32 is actually undefined behaviour in C, so I had to correct the code to get consistent behaviour between the two languages.
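To illustrate the shift problem, here is a minimal sketch of a 32-bit rotate that never shifts by the full width of the type. It is not the code from the hasher itself: the name rotate_left_32 is hypothetical, and masking the shift count is just one common way to avoid the undefined behaviour.

```c
#include <stdint.h>

/* Hypothetical helper for illustration; not taken from the hasher itself. */
/* Rotates a 32-bit value left without ever shifting by 32. */
static uint32_t rotate_left_32(uint32_t value, unsigned int count)
{
    /* Keep the count in the range 0 to 31. */
    count &= 31;

    /* A naive 'value >> (32 - count)' would shift by 32 when count is 0,
       which is undefined behaviour in C. Masking turns that case into a
       shift by 0, so both C and C# agree on the result. */
    return (value << count) | (value >> ((32 - count) & 31));
}
```

With this form, a rotate by 0 (or by 32) simply returns the value unchanged in both languages.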
It seems like this trick paid off, because I never caught any flak for the code not being written like ‘proper’ C# or anything like that.
After submitting my MD5 hasher, I forgot about it for months (or maybe years) until stumbling across it again and deciding to clean it up a little: I converted it to a single-header library (one of the first that I had ever made), and overhauled the API to be lower-level by allowing data to be streamed to it a chunk at a time instead of all at once.
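As a rough picture of what a chunk-at-a-time API looks like, here is a small sketch. Everything in it is an assumption made for illustration: the names (StreamHasher, HashInChunks) are hypothetical rather than the library's real API, and the much simpler FNV-1a hash stands in for MD5 to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; FNV-1a stands in for MD5 to keep this short. */
typedef struct StreamHasher
{
    uint32_t state;
} StreamHasher;

static void StreamHasher_Init(StreamHasher *hasher)
{
    hasher->state = 2166136261u; /* FNV-1a 32-bit offset basis. */
}

/* Can be called any number of times, with chunks of any size. */
static void StreamHasher_Update(StreamHasher *hasher, const void *data, size_t length)
{
    const unsigned char *bytes = (const unsigned char*)data;
    size_t i;

    for (i = 0; i < length; ++i)
    {
        hasher->state ^= bytes[i];
        hasher->state *= 16777619u; /* FNV 32-bit prime. */
    }
}

static uint32_t StreamHasher_Finish(const StreamHasher *hasher)
{
    return hasher->state;
}

/* Feeds 'data' to the hasher in pieces of at most 'chunk_size' bytes.
   'chunk_size' must be non-zero. */
static uint32_t HashInChunks(const void *data, size_t length, size_t chunk_size)
{
    const unsigned char *bytes = (const unsigned char*)data;
    StreamHasher hasher;

    StreamHasher_Init(&hasher);

    while (length != 0)
    {
        const size_t amount = length < chunk_size ? length : chunk_size;

        StreamHasher_Update(&hasher, bytes, amount);
        bytes += amount;
        length -= amount;
    }

    return StreamHasher_Finish(&hasher);
}
```

The point of the design is that the caller never needs the whole input in memory at once: a file can be read and hashed one buffer at a time, and the result is the same as hashing it in one go. A real MD5 hasher additionally has to buffer partial 64-byte blocks between calls, which this sketch leaves out.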
Despite this, I didn’t release the new and improved hasher, and instead just placed it in a directory called ‘clownlibs’, which contained assorted small libraries of varying degrees of polish. I had considered releasing them all on GitHub in a single repository, à la stb, but I became paranoid about how it would be impossible to star a particular library, or have a submodule pull in one specific library (something that came to a head a few blog posts ago), so I ended up endlessly putting it off.
A long time later, I was overhauling the build systems of the various Sonic the Hedgehog disassemblies, converting them from Batch/Bash/Python to Lua. The disassemblies relied on being able to produce a hash of the assembled ROM image and compare it against a series of hashes to determine the ROM image’s accuracy. Previously, this had been done with Python, but, with Python being replaced by Lua, there was no longer a hasher built into the language’s standard library that could be relied on. I tried to source a Lua hasher online, but all of the ones that I could find were absurdly slow. In hindsight, this was probably because Lua had only recently introduced support for integers and bitwise operations like AND and OR, meaning that those hashers were instead simulating them using floating-point operations, which, frankly, blows my mind.
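To give a sense of why simulating bitwise operations is so slow, here is a sketch of a 32-bit AND built purely from floating-point comparisons and arithmetic. It is written in C for consistency with the rest of this post, the name float_and is hypothetical, and it is only a guess at the general approach: the Lua hashers in question may well have done it differently.

```c
/* Hypothetical illustration: a 32-bit AND using only floating-point maths.
   Both inputs must be whole numbers in the range 0 to 4294967295. */
static double float_and(double a, double b)
{
    double result = 0.0;
    double bit;

    /* Walk from the highest bit (2^31) down to the lowest (2^0). */
    for (bit = 2147483648.0; bit >= 1.0; bit /= 2.0)
    {
        /* A bit is set if what remains of the number is at least its value. */
        const int a_set = a >= bit;
        const int b_set = b >= bit;

        /* Strip the bit so that the next iteration sees only the lower bits. */
        if (a_set)
            a -= bit;

        if (b_set)
            b -= bit;

        if (a_set && b_set)
            result += bit;
    }

    return result;
}
```

What would be a single instruction with native integers becomes a 32-iteration loop, and a hasher performs a great many of these operations for every block of input.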
Not realising this at the time, I instead assumed that the problem was simply that neither Lua nor the hashers that I had tried were very fast. This made me remember my own MD5 hasher, which was optimised for performance and portability above all, and I figured that I should try porting that to Lua to see if it performed any better than the others.
The process of porting the hasher to Lua wasn’t too complicated, though Lua does have a number of syntax differences from C that had to be accounted for. Lua’s ubiquitous tables also meant that portions of the code had to be rewritten to be more natural to the language.
Before long, I had a working MD5 hasher written in Lua that performed wonderfully, annihilating the other hashers in terms of speed. This hasher would find its way into the disassemblies of Sonic 1, Sonic 2, and Sonic 3 & Knuckles.
My MD5 hasher came in handy once more as I was working on my Wii U port of Sonic Mania: that game’s built-in MD5 hasher was garbage, and I’d always wanted to test my MD5 hasher on a big-endian platform like the Wii U, so I swapped the two. My hasher integrated into the codebase pretty well, and even eliminated some thread-safety issues. As expected, it worked perfectly on the big-endian CPU, putting the prior hasher to shame. The game does a lot of hashing, so it was really putting my hasher to the test! It was also just so cool to see my software being leveraged by an actual game.
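For context, MD5 is defined in terms of little-endian 32-bit words, so a hasher that reinterprets raw memory as integers will break on a big-endian CPU. A common way to avoid this, sketched below, is to assemble each word from individual bytes. The function names are hypothetical, and this is not necessarily how the hasher does it.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not taken from the hasher itself. */

/* Reads a little-endian 32-bit word, regardless of the CPU's own byte order. */
static uint32_t read_u32_le(const unsigned char *bytes)
{
    return (uint32_t)bytes[0]
        | ((uint32_t)bytes[1] << 8)
        | ((uint32_t)bytes[2] << 16)
        | ((uint32_t)bytes[3] << 24);
}

/* Writes a 32-bit word as four little-endian bytes. */
static void write_u32_le(unsigned char *bytes, uint32_t value)
{
    bytes[0] = (unsigned char)(value & 0xFF);
    bytes[1] = (unsigned char)((value >> 8) & 0xFF);
    bytes[2] = (unsigned char)((value >> 16) & 0xFF);
    bytes[3] = (unsigned char)((value >> 24) & 0xFF);
}
```

Because the shifts operate on values rather than on memory layout, the same code gives the same result on little-endian and big-endian machines.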
After that, my MD5 hasher returned to its slumber once more, until today: I happened to take a look in my ‘clownlibs’ directory, and noticed that my hasher was the only library in there which I hadn’t eventually released: clowncommon.h and clownresampler.h have their own GitHub repositories now, but clownmd5.h still remained hidden. At last, I figured I’d put it off long enough, and finally created a GitHub repository for my hasher, years after first writing it.
Honestly, I didn’t expect to get so much mileage out of this library: it was just some university coursework, and yet it ended up being used by a bunch of different projects. Since it was so useful to me, it will hopefully be useful to others too. It’s licensed under the 0BSD licence, so there’s no reason not to go nuts with it!