Recently at work I volunteered for a special project. The work has turned out to be lots of fun, like I expected. One day maybe I’ll figure out what types of programming related tasks I like the best. For now I know that this is fun.
I wanted to modify an image for a particular game. The game conveniently had an ‘images’ directory which contained all the images it used. However, the images in this directory had file extensions like: jpege, gife, pnge. I suspect that the ‘e’ stood for ‘encrypted’ or ‘encoded’ and I was right.
I opened one of the pnge files in vim. The first four bytes of the file were QOF. Since I had been looking at some other png files recently I quickly noticed that Q – 1, N + 1, F + 1 = PNG. Conveniently PNGs start with a well defined header. Not only that, but many of the values in the header have a limited range of valid values. Also, each ‘chunk’ of a PNG has a checksum. So if the data contained in a chunk is small enough, even if there are unknown values in the data and the checksum, swapping bits around for a while eventually produces a valid combination. I worked my way through the PNG header and quickly found that byte values were either -1 or +1 from what they should be. After decoding a bunch of values, I looked at what I had found so far, but didn’t see any pattern in the values.
I next moved to a second PNG, in order to see if/how the encoding differed. Interestingly, I found that the only place where the values I had decoded so far differed was the width/height bytes and the header checksum. This information proved to me that the encoding was at least based on the value of a byte and likely was unrelated to its position or other attributes. With a bit more experimentation (happily I had many files to work with) and comparisons I was able to prove that the encoding was based exclusively on the byte value. Which meant things were easy. Every time I determined the correct value for a byte I added it to a list of known transitions and as I worked, more and more values were known, making it easier to find values to mach checksums or checksums bytes to mach data.
At some point I went to bed, as I was falling asleep, I wondered to myself whether maybe the encoding was completely trivial and the differences were simply based on modulus. This speculation, it turns out, was correct. Sadly it took me a while to realize that. In the morning I plotted the values I had so far, but due to lack of data, and some erroneous data I didn’t see any pattern. So I pressed on, decoding chunks of PNGs that were small enough to easily match checksums and using all the files at my disposal. Eventually I had a basic PNG decoder and about 130 of the needed 256 translations calculated.
At this point I decided to graph my numbers again. I had since corrected erroneous numbers and gained quite a few. The data made it obvious that the pattern was simply odd even. 0=>1, 1=>0, 2=>3, 3=>2 and so on. I was able to reduce my script to about 15 lines that would encode or decode the file based on the file extension. Ta da!
Luckily there was no real encryption involved. I didn’t succeed at taking cryptography at school.