Thursday, March 1, 2018

Reverse Engineering My Head

A few years ago I got my teeth imaged at the dentist. This resulted in two things. My wisdom teeth getting removed and my possession of a medical image viewing tool from Anatomage. This tool lets you view a single dental cone beam CT file. The tool ships as a single exe file with a spiffy icon and everything.

I was very impressed with the volume visualizations that Anatomage's viewer tool could produce but I wanted more. Not being able to read a file from your own health record with an open source tool didn't strike me as right. So started hacking away at the problem. The first breakthrough is that the Invivo viewer does not memory map part of it's own file directly with the volume data. Instead it extracts that data into your C:\ProgramData\ directory as Patient.inv and then loads it from there. A simple copy paste and we have a version of that file to play with.

Now that we have some data let's figure out what we can about the format. First off I opened Patient.inv up in Visual Studio Code which is generally a bad idea for a 70MB file but this showed me that the file started with XML. In fact the entire BLOB of the volume data is included inside the XML. There are also offset values stored in the XML. Presumably the offset values are used to skip the XML parser over binary data to avoid it throwing a fit.

More good news the XML tells us about the images, their sizes and encodings. However they were all in one large BLOB not one per image. The format specified for the images was J2K. This is a stream of JPEG 2000 data without the normal file container around it. I expected that this would mean that I would have to provide metadata to a JPEG 2000 decoder myself however it turns out that the image size is included in the bytestream itself. The only operation needed after some trial and error is to look for the magic bytes at the beginning of each image and chop the file up on those boundaries. This is not a very robust approach and your mileage may vary. With that said for my file it worked. The resulting JPEG 2000 files could then be fed to a decoder. Reading through one of the image files with a hex editor found that they were produced by the JasPer toolkit. Because of this I decided to choose ImageMagick to decode the images as ImageMagick included JasPer. It turns out that this information is out of date and ImageMagick has swapped to a new library. Luckily this new library handles the JasPer produced files just fine.

I've created a node script to automate the process of extracting the layers into PNG files which you can grab from GitHub here.

With that done we get exciting images like the one below.

A raw PNG file from the stack (exciting right?!)

These images have extremely low contrast but there is data there so now on to creating a 3D model that we can use. At first I was expecting to have to voxalize the data myself. However the Brazilian government came to the rescue with InVesalius 3.1 an amazing tool for processing medical data. We can simply load our PNG files into InVesalius and use it to produce a STL file. From there we can do just about anything we want with the model.

Spooky Christmas ornament anyone?