A Succinct Dictionary.

0
542

For the past few years, I’ve been writing and rewriting user interface (UI) code over top of the same client-service logic that I wrote around five years ago.

First at coffee shops before and after work, and now full-time in-between whatever adventures come my way.

Since I’ve been full-time, I’ve coded and shipped a cross-platform 2D game engine for SpeedWords Arena and Diamond Blitz Arena and most recently, a cross-platform app engine for Sooner Or Later for Windows 10.

But UI code is not my favorite nor my forte.

It’s just a necessity.

Every once in a while, though, I get to write some interesting code.

I recently added another 100,000 words to my SpeedWords dictionary.

I’ve simply ignored the folks who like to throw an extra vowel into words like “colour” and “flavour” (apparently as some kind of antiquated homage to the French) for too long.

Time to bury the hatchet, drink a proper cup of tea, and embrace my roots.

Back when I originally wrote SpeedWords, I had planned to implement a succinct data structure that supported searching while the dictionary was compressed in memory.

It turned out to be unnecessary, since a simple hashset structure wasn’t a problem for even low-end devices to handle.

But, with an extra 100,000 words in the dictionary, I finally had an excuse to write that code!

I looked into a number of algorithms and concluded that the one I had dreamed up back then was still the best option.

Briefly, the idea is to Huffman encode the individual words, organize them into arrays by encoded byte length, and sort the encoded values within each array.

To execute a search, Huffman encode the search term and do a binary search in the array matching the byte length of the encoded search term.

The search can be done within the compressed structure without decompressing or allocating any memory, except for the one-time allocation of a reusable buffer for encoding the search term.

Yes, there are algorithms that would give me better compression and performance, but in my opinion this was the right trade-off between complexity, compression, and performance.

The code is fairly simple and maintainable, the new dictionary takes less memory than the old one, and there is no perceivable difference in game performance.

Now if I could just find someone passionate about all of the tedious UI code so I could write this kind of code more often!

-D

Leave a Reply