Saturday, September 19, 2020

Zoom like it's 1988 with the Mitsubishi VisiTel

Well here we are in 2020, remote work is finally here and after decades of marketing by every major telecommunication company, my mum knows how to make a video call. Unfortunately this sudden enthusiasm caused unprecedented shortages of webcams. We can't let this slow us down as there are important calls to be made and I have the solution. Introducing the Mitsubishi VisiTel!


This definitely was The Future™, and for the low low price of $399 how could this not catch on immediately?! Popular Mechanics Feb 1988 carried a full page description with a surprising amount of technical detail. Meanwhile the hosts of Gadget Guru on WSMV asked the most profound question about video calling within seconds of being introduced to it "Can I still use the phone without this?".

Well it has taken more than 30 years and a global pandemic, but we have finally accepted the inevitable fact that we are going to have to take a shower before talking to our colleagues on the phone. Video telephony is here in force.

I am not the first person to think there is something specially kitsch about the VisiTel. People have been tinkering around with these for a while and despite not having motion video, it is one of those products which was clearly a feat of engineering and massively ahead of its time.

Japhy Riddle's video is what got me really interested in the VisiTel. He pointed out that by slowing down or speeding up the recording of an image sent to the VisiTel the image skewed on the screen. Immediately I knew this meant the device used a simple AM modulation scheme. With that knowledge in hand, I jumped on eBay and picked up a unit to play with. My goal: to make this glorious time capsule of late 80s tech work with 2020's work from home platform of choice, Zoom, so I can chat with my colleagues. Moreover I want to do this without modifying the hardware because these things are just too cool to butcher.

Step 1: Electrical interfacing

The VisiTel is extremely easy to set up. On the back of the device is one long cable with a Y joint near the end (more on this in a minute) and two plugs. One plug is a 2.1mm DC barrel jack with which the device is powered. It expects 15 volts center positive. Oddly they chose to write the number 15 on the device in a font that makes it appear to read "IS Voltage". The second plug is an RJ11 connector with two of the pins populated. This is the standard layout for a telephone handset. If you remember corded phones you might also remember that the receiver was connected to the dialy bit (technical term) by means of an RJ9 connector and the dialer to the wall with an RJ11 connector. The intent was that you could plug other accessories into your dialer, or replace the receiver cable if something went wrong. Not that the latter problem would happen often because telephone handsets used a special kind of wire which is extra flexy.


Returning to the back of the VisiTel, along with the cable there is also an RJ11 socket. After scratching my head for a while the obvious dawned on me. The VisiTel is designed to man-in-the-middle your telephone so that it can listen or transmit whilst you make calls. That leads to an obvious problem: as the VisiTel unit is sharing the wiring in parallel with the telephone, the user is going to hear images being received and transmitted. Well the engineers at Mitsubishi thought of that, and whenever you receive an image or transmit one there is a loud click from the device as a relay disconnects the handset. For our application we aren't interested in listening to the audio signal, so we aren't going to plug anything into that RJ11 socket on the back of the device. This makes the isolation relay superfluous, meaning it could be removed for clickless image sending.

So how do we communicate with this device? My computer for one doesn't have an RJ11 socket on it. No worries, apparently there is a market for adaptors to connect your RJ9 telephone handsets to smartphones. Or you can simply snip off the RJ11 connector and solder on a mono 3.5mm jack. Note: I originally thought the VisiTel had an RJ9 connector not an RJ11.

With your 3.5mm jack in place connecting couldn't be simpler. Get a USB sound adaptor with separate microphone and headphone output jacks (not TRRS) and plug the VisiTel into the appropriate socket.

I said that I'd get back to the Y joint on that cable. On my device the wires in that joint had failed, not a nice solid "it's broken" failure, but one of those nasty types of failure where everything works when you go to bed and the next morning it doesn't. If only the VisiTel team had used some of that special extra flexible wire. This intermittent failure cost me days of staring blankly at screens, fruitlessly looking for thermal issues, and triple checking my sound card settings. Eventually I discovered the failure when in frustration I picked the device up during a test and saw a crackle of static on my recording. If there is a lesson about old hardware to be learned here it might be to perform a "jostle" test once you have everything working and see if anything suddenly doesn't. 


Step 2: Understand the protocol

Dodgy wires aside, now that we have the hardware interfaced with the computer it's time to dig into what the protocol looks like. As we can tell from the videos and articles about the VisiTel, the image is encoded using amplitude modulation on an audio frequency carrier. AM stands for Amplitude Modulation and describes the process of encoding information in the amplitude (volume) of a waveform. Normally it is used to encode audio onto radio waves, however there is no reason you can't encode images onto radio waves or even images onto audio waves. @nabanita.sarkar has a great description of this process with more detail and some Python example code for people who learn by doing.
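To make the idea of amplitude modulation concrete, here is a minimal sketch using only the Python standard library. It is not the VisiTel's actual scheme, just the general principle: a list of levels controls how loud an audio tone is, and the levels can be read back from the loudness. The 1750Hz carrier, the 25 samples per level, and the function names are all illustrative assumptions.

```python
import math

SAMPLE_RATE = 44100   # samples per second (assumed sound card rate)
CARRIER_HZ = 1750.0   # illustrative audio carrier, not a measured value


def am_modulate(levels, samples_per_level=25):
    """Return audio samples whose loudness follows `levels` (each 0.0 to 1.0)."""
    out = []
    n = 0
    for level in levels:
        for _ in range(samples_per_level):
            carrier = math.sin(2 * math.pi * CARRIER_HZ * n / SAMPLE_RATE)
            out.append(level * carrier)
            n += 1
    return out


def envelope(samples, samples_per_level=25):
    """Recover each level as the peak absolute value inside its block."""
    return [
        max(abs(s) for s in samples[i:i + samples_per_level])
        for i in range(0, len(samples), samples_per_level)
    ]


signal = am_modulate([0.0, 0.5, 1.0, 0.25])
recovered = envelope(signal)   # close to the original levels, not exact
```

The recovered levels are only approximate because the tone is sampled at discrete points that rarely land exactly on a peak.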

Knowing the data was AM encoded in the audio domain, the first step was to record the transmission of some sample images with known properties. To capture these I used my go-to application for audio processing, Audacity.


As you can see above I wanted to validate my understanding of how data was encoded, so I sent some very simple sample images from the VisiTel. These were made by holding colored pieces of card in front of the camera one piece black and one piece white.

The input image for the trace labeled blackWhiteV looked roughly like this.


Let us examine the result a little closer



warning loud 1750Hz tone

As you can see, there appears to be some kind of initialization / header / preamble at the beginning of each image @17.55 to @17.80 in this example. This allows the receiving device to detect that an image is about to be sent and determine the maximum amplitude of the signal. This maximum amplitude is then used as a scale factor during image decoding so that a lossy quiet phone line doesn't make the images being transferred lose contrast or brightness. Note: the signal you can see before @17.55 is simply noise on the line and isn't important for the operation of the VisiTel protocol. 

To check that this header establishes an image that is about to be sent, I played just this small section of audio to the VisiTel and was rewarded with the sound of the relay clicking on. Clearly this triggers something. However, when sending just the 30 milliseconds of header the VisiTel seems to detect that there is no image data being sent and turns the relay back off a few milliseconds after the header ends. Playing the header and the first few lines of the image causes the VisiTel to start drawing an image on the screen. If you stop the audio while an image is being drawn the VisiTel continues receiving until the image buffer is full. This once again shows that once the VisiTel is reading pixels it doesn't rely on an external oscillator to know when to sample data. It has an internal clock source telling it when to inspect the waveform.

As the header simply establishes that a connection is being made and doesn't need to change along with the image, I didn't feel the need to delve much deeper into how it works. Knowing that the header establishes a connection and roughly what it looks like is enough for our purposes.

Moving on to the most difficult part, the pixel format. The first step was to figure out how the pixel data is modulated onto the carrier wave. My first hunch was that each complete wave represents one pixel. I checked this by counting the number of waves between repeats of my test pattern. Indeed it matches what the old advertisements say: 96x96 pixels, with a few lines drawn before the visible image starts.

With this we know the amplitude of each wave is sampled and drawn into one pixel of the digital image buffer contained in the VisiTel. We know from the marketing material that pixels have 16 shades of grey, however as I am evaluating pixels from an 'analog' waveform I felt there was no need to posterize them in my decoding or encoding. 

Interestingly the pixel brightnesses are inverted before modulation so the largest waves represent the darkest pixels, they are also mirrored left to right to give a mirror image. I'd love to know any reader theories on why the brightness inversion may have been done. My suspicion is that the human eye is more forgiving of a random black pixel than a random white pixel and line noise is something which the VisiTel surely had to contend with a lot back in 1988. 


There is however an exception to this encoding scheme which eluded me for weeks. In this inverted encoding scheme, completely white pixels should be represented as quiescence; no carrier wave should appear on the output at all. The VisiTel designers seem to have not liked this idea. Instead, to encode a completely white pixel the carrier wave is offset by 1/4 of a wavelength, putting it 90 degrees out of phase with the wave in the "default" state. This way the carrier wave can still be sent down the line. At the receiving end the device is still synchronized with the "default" waveform and will sample the amplitude right as the shifted wave passes through the 0 point, correctly giving a white pixel. As I previously mentioned, if the audio is cut off halfway through sending an image the VisiTel continues drawing white pixels until the frame buffer is full; so clearly silence can be read as white without the presence of the carrier wave. I have no idea why the designers decided to complicate their modulation scheme with this out of phase mode. It seems like a huge amount of effort to go to for little to no gain.
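The encoding rules described so far (one wave per pixel, inverted brightness, mirrored lines, and the quarter wavelength shift for white) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the device's real signal: it uses exactly 24 samples per wave for tidy arithmetic, assumes the shifted white wave is sent at full amplitude, and the function names are hypothetical.

```python
import math

STEPS = 24  # samples per carrier wave in this toy model (assumption)


def encode_pixel(brightness):
    """One carrier wave for one pixel; brightness runs 0.0 (black) to 1.0 (white)."""
    if brightness >= 1.0:
        # White: keep the carrier but shift it by 1/4 wavelength.
        # Full amplitude here is an assumption.
        amplitude, phase = 1.0, math.pi / 2
    else:
        # Inverted: the darkest pixels get the largest waves.
        amplitude, phase = 1.0 - brightness, 0.0
    return [amplitude * math.sin(2 * math.pi * k / STEPS + phase)
            for k in range(STEPS)]


def encode_line(pixels):
    """Mirror the line left to right, then emit one wave per pixel."""
    wave = []
    for brightness in reversed(pixels):
        wave.extend(encode_pixel(brightness))
    return wave


def decode_line(wave):
    """Sample where the unshifted carrier peaks, then undo inversion and mirroring."""
    peaks = [abs(wave[i + STEPS // 4]) for i in range(0, len(wave), STEPS)]
    return [1.0 - p for p in reversed(peaks)]


line = [0.0, 0.5, 1.0]
decoded = decode_line(encode_line(line))
```

The white pixel decodes correctly because the receiver samples at the moment the shifted wave crosses zero, which is the same value it would read from silence.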

Not having known about this phase shift modulation, my initial approach to decoding the images involved looking for the maxima of each wave and drawing that as a pixel. Line steps were emitted after 96 * the number of samples per wave. This led to some lines having slightly too many or too few pixels. Additionally this crude approach was very sensitive to noise, as small spikes in the wave could cause additional pixels to be output. It was however quite simple to implement, and this simple implementation was able to decode images without having to synchronize clocks. The results unfortunately just weren't usable.

In order to decode images more accurately, we have to do what the VisiTel does and synchronize with the signal during the header and then continue sampling at regular intervals after this. As you may expect, this requires extremely precise timing. For my unit at least when recording at 44100Hz there are 25.23158 samples per wave (meaning the carrier wave is at 1747.80968929Hz). So each time we read a pixel we set the index of the next target 25.23158 samples further into the audio buffer. As we only have audio samples at integer offsets, we simply round to the nearest sample and take that one. The key point is we don't allow this rounding error to accumulate as the sampling position would quickly get out of phase with the wave. Getting just a little out of phase results in some intense artifacts as shown below.
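The difference between rounding once per pixel and letting the rounding accumulate is easy to see in code. This sketch uses the 44100Hz rate and 25.23158 samples per wave quoted above; the function names and the starting offset of 100 are hypothetical, and the real decoder may track its position differently.

```python
SAMPLE_RATE = 44100
SAMPLES_PER_WAVE = 25.23158   # measured value quoted in the text


def sample_positions(first_peak, count, samples_per_wave=SAMPLES_PER_WAVE):
    """Keep the running position as a float and round only when indexing."""
    return [round(first_peak + i * samples_per_wave) for i in range(count)]


def drifting_positions(first_peak, count, samples_per_wave=SAMPLES_PER_WAVE):
    """The wrong way: round the step itself, so the error grows with every pixel."""
    step = round(samples_per_wave)
    return [first_peak + i * step for i in range(count)]


carrier_hz = SAMPLE_RATE / SAMPLES_PER_WAVE   # roughly 1747.81 Hz

good = sample_positions(100, 96 * 96)
bad = drifting_positions(100, 96 * 96)
drift = good[-1] - bad[-1]   # how far apart the two approaches end up over one image
```

With the step rounded to 25, the sampling point slips by about 0.23 samples per pixel, which is roughly 22 samples (nearly a whole wave) over a single 96 pixel line.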


Thankfully this samples per wave value appears to be very stable. It does not change as the unit warms up, which I had been concerned would present an issue. This invariance allowed me to hardcode a value into the decoding logic. Ideally the samples per wave value would be derived from the header data, but I found there are not enough samples to reach 5 decimal places of precision with my implementation, and the hardcoded value works robustly. With this more accurate implementation line steps are simply emitted after the draw pixel function has been called 96 times.

Up until now I have been working with pre recorded blocks of audio decoded from WAV files. In order to operate live and decode images as they are sent to an audio interface, the decoder needs to be able to detect the header and calculate where the image data starts. Looking at the header it has three distinct phases which are easy to detect. 


Carrier wave --> "Silence" --> Carrier wave

In order to find these I implemented a simple FFT based detector and a finite state machine. First the waveform is transformed into the frequency domain. The detector then inspects each block of audio until it finds the carrier, a strong signal at 1747Hz. Blocks are then inspected until the signal disappears and returns again. At this point the start of a transmission has been found. From here a simple static offset is used to find the beginning of the image data and the image decoding routines already discussed are utilized. Once an image has been decoded it is displayed and the finite state machine is reset in order to listen for the next transmission.
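The three phase header search can be sketched as a small state machine. To stay within the standard library this sketch swaps the FFT for a Goertzel filter, which measures the power at a single frequency; the state machine idea is the same. The block size of 256 samples, the threshold of 0.05, and all names are hypothetical choices for illustration, and the result is only accurate to the nearest block.

```python
import math

SAMPLE_RATE = 44100
CARRIER_HZ = 1747.8
BLOCK = 256   # samples inspected at a time (hypothetical)


def carrier_power(block, freq=CARRIER_HZ, rate=SAMPLE_RATE):
    """Goertzel filter: power at one frequency, normalised by block length."""
    coeff = 2 * math.cos(2 * math.pi * freq / rate)
    s1 = s2 = 0.0
    for x in block:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return (s1 * s1 + s2 * s2 - coeff * s1 * s2) / (len(block) ** 2)


def find_transmission(samples, threshold=0.05):
    """Return the start of the block where the carrier comes back, or None."""
    state = "wait_carrier"
    for start in range(0, len(samples) - BLOCK + 1, BLOCK):
        present = carrier_power(samples[start:start + BLOCK]) > threshold
        if state == "wait_carrier" and present:
            state = "wait_silence"
        elif state == "wait_silence" and not present:
            state = "wait_return"
        elif state == "wait_return" and present:
            return start
    return None


def tone(count):
    """Synthetic full volume carrier for trying the detector out."""
    return [math.sin(2 * math.pi * CARRIER_HZ * n / SAMPLE_RATE)
            for n in range(count)]


audio = [0.0] * 1024 + tone(2048) + [0.0] * 1024 + tone(2048)
start = find_transmission(audio)
```

A real recording would need a threshold tuned to the line noise, and the static offset from the detected block to the first pixel would still have to be measured from captured audio.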

Step 3: Present the device as a webcam

Now that we have managed to decode the images there is one final piece to this puzzle: presenting the decoded images as video frames to our video conferencing software. On Linux this is surprisingly easy. Video input is abstracted using the V4L2 interface; unfortunately this still happens in kernel space. To avoid the complications of making a kernel module you can use V4L2 Loopback. This module presents itself as both an input and an output device. Whatever pixels you publish to it as output it makes available as input for other programs like Zoom. There are even a few Python packages which further abstract this to play nicely with OpenCV images and NumPy. Of those packages I have used pyfakewebcam for this project. The interface couldn't be simpler. To set up the virtual webcam you do the following:

import pyfakewebcam

self.camera = pyfakewebcam.FakeWebcam(self.v4l2_device, 640, 480) 

and whenever a new frame is available:

self.camera.schedule_frame(output)

With that simple addition to the decoding program, success! Zoom can now receive images from the VisiTel and we can video chat like it's 1988.
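The virtual webcam above is created at 640 by 480 while the decoded image is 96 by 96, so something has to scale between the two. How the original program does this is not shown here; the sketch below is one plain Python possibility that enlarges each pixel five times and centres the result between black bars. The function name and the nested list frame format are assumptions, and the output would still need converting to whatever array type the webcam library expects.

```python
def to_frame(image, width=640, height=480):
    """Scale a square greyscale image (rows of 0-255 values) into an RGB frame.

    Returns `height` rows of `width` (r, g, b) tuples with the image centred.
    """
    size = len(image)
    scale = min(width, height) // size   # 480 // 96 gives 5
    scaled = size * scale
    left = (width - scaled) // 2
    top = (height - scaled) // 2
    frame = [[(0, 0, 0)] * width for _ in range(height)]
    for y in range(scaled):
        row = image[y // scale]
        for x in range(scaled):
            value = row[x // scale]
            frame[top + y][left + x] = (value, value, value)
    return frame


# A synthetic diagonal gradient standing in for a decoded VisiTel image.
test_image = [[(x + y) % 256 for x in range(96)] for y in range(96)]
frame = to_frame(test_image)
```

Integer scaling keeps the chunky 96 pixel look intact, which seems in keeping with the spirit of the project.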

Going forward I'd love to set up a Linux Direct Rendering Manager driver for this so that I can output images onto the VisiTel screen as well as receiving them. For now having the video capture set up and working in Zoom is enough for me to call this project a success.

I hope you enjoyed that write up. Check out the code on GitHub and please let me know if you set up your own VisiTel webcam https://github.com/Bostwickenator/VisiTel

Thursday, March 1, 2018

Reverse Engineering My Head

A few years ago I got my teeth imaged at the dentist. This resulted in two things. My wisdom teeth getting removed and my possession of a medical image viewing tool from Anatomage. This tool lets you view a single dental cone beam CT file. The tool ships as a single exe file with a spiffy icon and everything.

I was very impressed with the volume visualizations that Anatomage's viewer tool could produce but I wanted more. Not being able to read a file from your own health record with an open source tool didn't strike me as right. So I started hacking away at the problem. The first breakthrough was that the Invivo viewer does not memory map part of its own file directly with the volume data. Instead it extracts that data into your C:\ProgramData\ directory as Patient.inv and then loads it from there. A simple copy paste and we have a version of that file to play with.

Now that we have some data let's figure out what we can about the format. First off I opened Patient.inv up in Visual Studio Code which is generally a bad idea for a 70MB file but this showed me that the file started with XML. In fact the entire BLOB of the volume data is included inside the XML. There are also offset values stored in the XML. Presumably the offset values are used to skip the XML parser over binary data to avoid it throwing a fit.

More good news: the XML tells us about the images, their sizes and encodings. However they were all in one large BLOB, not one per image. The format specified for the images was J2K. This is a stream of JPEG 2000 data without the normal file container around it. I expected that this would mean that I would have to provide metadata to a JPEG 2000 decoder myself, however it turns out that the image size is included in the bytestream itself. The only operation needed, after some trial and error, is to look for the magic bytes at the beginning of each image and chop the file up on those boundaries. This is not a very robust approach and your mileage may vary. With that said, for my file it worked. The resulting JPEG 2000 files could then be fed to a decoder. Reading through one of the image files with a hex editor, I found that they were produced by the JasPer toolkit. Because of this I decided to choose ImageMagick to decode the images as ImageMagick included JasPer. It turns out that this information is out of date and ImageMagick has swapped to a new library. Luckily this new library handles the JasPer produced files just fine.
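The chopping step can be sketched in Python. A raw JPEG 2000 codestream begins with the start of codestream marker FF 4F followed by the size marker FF 51, so this sketch splits on that four byte sequence. It is only an illustration of the approach, not the node script itself: the function name is hypothetical, the same bytes could appear by chance inside compressed data, and whatever follows the last image (for example trailing XML) ends up attached to the final chunk.

```python
J2K_MAGIC = b"\xff\x4f\xff\x51"   # SOC marker followed by SIZ marker


def split_codestreams(blob):
    """Cut `blob` at every occurrence of the magic bytes and return the pieces."""
    starts = []
    pos = blob.find(J2K_MAGIC)
    while pos != -1:
        starts.append(pos)
        pos = blob.find(J2K_MAGIC, pos + len(J2K_MAGIC))
    ends = starts[1:] + [len(blob)]
    return [blob[s:e] for s, e in zip(starts, ends)]


# Tiny stand-in for Patient.inv: some XML followed by two fake "images".
sample = b"<xml>" + J2K_MAGIC + b"aaa" + J2K_MAGIC + b"bb"
pieces = split_codestreams(sample)
```

Each piece could then be written out with a .j2k extension and handed to a JPEG 2000 decoder.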

I've created a node script to automate the process of extracting the layers into PNG files which you can grab from GitHub here.

With that done we get exciting images like the one below.

A raw PNG file from the stack (exciting right?!)

These images have extremely low contrast but there is data there, so now on to creating a 3D model that we can use. At first I was expecting to have to voxelize the data myself. However the Brazilian government came to the rescue with InVesalius 3.1, an amazing tool for processing medical data. We can simply load our PNG files into InVesalius and use it to produce an STL file. From there we can do just about anything we want with the model.



Spooky Christmas ornament anyone?

Monday, July 3, 2017

Lomo To Gif

Lomo To Gif is a little tool that I made to process Octomat photos into little gif flipbook like animations. It's super simple and super meh code but it's available on GitHub here.

Use it in practice here.


Thursday, May 11, 2017

Sigma 150-600 Sport Comfort Handle

The Sigma 150-600 Sport lens is a beast at 2.8kg. This makes it very hard to handle handheld and indeed most people will use this lens on a monopod or tripod. I'm more of a run and gun photographer though so I took matters into my own hands, literally. Introducing the Sigma 150-600 Sport Comfort Handle. Now built for humans, you know the things with fingers.




This thing is a pretty large part which would be expensive to print completely solid. If you want to do that however you can buy one on Shapeways here.

To make a small production run I decided to use a cheap FDM print with low infill to make a mold. This part would not be strong enough to be functional but does a great job of being a positive of the shape. The mold was constructed out of silicone rubber and then a grip was cast with fiberglass reinforced epoxy. The resulting part is EXTREMELY strong and after a coat of paint looks almost exactly like the renders.


Tuesday, December 20, 2016

Shopping for Frameworks

There is nothing more nerve wracking in a software project than choosing which horse to back, which basket to put your eggs in, and which metaphors to mix. Shopping for frameworks (and it truly is shopping, you are going to spend money) is much like regular shopping. If you get in and out of the store in 10 minutes you've probably not thought through how that jacket is going to match your wardrobe, and if you got something cheap you'd better be prepared to stitch the buttons back on. So with no further ado I present my humble advice on the titular topic.

Alex's Framework Wisdom

  1. There is no free lunch. You will pay for free frameworks with ongoing development cost working around their oddities and/or contributing upstream. That is to say less CapEx means more OpEx. In the case of subscription licences it's purely OpEx as salaries or as licenses, either way you'll pay.
  2. The quality of the vendor is as important as the framework. You are piling your developer hours (your money) on top of these frameworks. The vendor has the ability to kill support at any time and transmute your beautiful codebase into a pile of technical debt with the snap of their fingers. If that worries you, get a contract that stipulates they won't.
  3. New isn't automatically better. New languages, new toolchains, new companies. On the surface these may be exciting and they may give you a market edge in some cases. Each one of them is however a double edged sword. There are a lot of companies out there using tried and true technologies because they work and they are known quantities. You should be one of these companies unless there is a pressing market reason that you cannot be. Using new things will make it harder to hire staff. The staff you do hire will create technical debt as they learn what works and what doesn't.
  4. Invest in training. When you choose a framework get your development team together to learn about it. It is crucial that this happens as a group. Received wisdom does not work well in the software industry. If all your team learns about something at once it will create homogeneity and accepted best practices. Handing a rulebook to developers will not work well; they'll either ignore it or they'll all make disparate readings of it.

How to avoid choosing poorly


Congratulations you think you've found just the right combination of lego pieces to solve your business case. Not so fast! Before committing you (and by this I mean not you, I mean your staff) need to take it for a test drive.
Pooooorly
  • Apply at minimum your own standards. What does the framework's source look like? Is it public? If it's public review it. Make sure it would pass the standards you set for your own code. If possible try fixing an open bug and getting it accepted upstream. If the framework is closed source you are especially dependent on the vendor. Look at the release notes and review the bug tracker. Make sure things are dealt with in a timely fashion.
  • Consider the vendor's bus factor (how many of their team could get hit by a bus without impacting their ability to deliver). If it's a small number keep this in mind. Will their business model keep them around at least as long as your product?
  • Check out the debugger. Write something with a bug in it and hand it to a second developer. Their job is to document how the debugging tooling works while they find the bug. Did it run right out of the box or did they have to spend an hour setting up an environment? Does it hit every breakpoint? Are values presented in a useful way? Can it attach to a running process? Plus any other metrics your team might find useful here.
  • Eyeball the UI. If it has a UI component take it to your UI and branding team. The default themes will not satisfy your business requirements. Get the UI team to specify the changes that will be required and then attempt to implement them. If this is difficult start worrying.
  • Inspect the toolchain. Does it integrate with your CI setup? Can you make repeatable builds of your code over time? If not what would you have to do to internalize any build dependencies? Does it build in 1 second or 1 hour?
  • Figure out how to test the end product. Does the framework have interfaces to test tools? Can you write a unit test, a functional test, and an integration test? Are the tests stable? If the framework has a UI component can you automate the UI testing or will you be paying people to click buttons?
  • Consider reuse. Is the code you write in the framework coupled to it? Will you be able to take code from it to other places in your business? Will you be able to bring code into it?
These are just some high level acid test questions you should be asking yourself when you are looking at a framework. Please take into consideration any other factors important to your staff and your business case.

Case study; Death by 1000 cuts (or Appcelerator Titanium)

The small issues hurt. It is not the big problems that make you regret your choices. It is the day to day grind. The subtle bug that no one suspected and no one documented. It is the lost hours in every week that your staff could have been doing something that generates revenue. See wisdom #1. Trivial hindrances also create morale issues. Monotonous and frustrating work will demoralize a group of people who typically pride themselves on doing new, interesting, and challenging things. Nothing is less rewarding than realizing you lost your day to a missing semicolon.

Below are replies to some issues I have personally raised against a particularly hodgepodge framework known as Appcelerator Titanium. They are prime examples of the kind of bugs that will decimate your productivity should they occur with regularity. Also note these aren't even bugs with the framework; these are bugs with the associated toolchain. Remember to investigate the whole package you are buying into.


https://github.com/appcelerator/titanium/issues/243

It's a little complicated. The available Titanium SDK releases depends on if you're using ti sdk vs appc ti sdk. appc ti sdk shows the latest and greatest SDKs. ti sdk shows the latest release before the switch from the Titanium CLI to the Appcelerator CLI.

The list of releases for the Titanium CLI is no longer automatically updated, but rather it's a manual process. Because we haven't determined which past releases should be available to ti sdk, we simply haven't updated the release list.

However, the list of releases that appc ti sdk reference is automatically updated and current, so I advise you just use appc ti sdk for now.

Time lost: 1 hour


https://github.com/appcelerator/titanium/issues/244

Smart quotes are not valid in XML. I wonder what the XML parser library is doing. I wouldn't be surprised if the platform was being parsed as "“iphone”". I don't think we should naively replace all smart quotes with ascii quotes and properly scrubbing the smart quotes is more effort than it's worth. In other words, I don't think we're going to improve this.

Time lost: Half day



Trivial issues like the ones above and dozens more will slow your development process and make your project management unpredictable. This has a real fiscal impact. If a developer has a high chance of running into a framework bug or a hidden issue they will not be able to provide accurate estimates of effort to management. Without accurate estimates you will either miss release dates and stress your employees trying to make back time or have long release cycles and lose agility. Neither of these is going to help you make money.

Conclusions 


You never know everything about the ship you are getting on board. With that said a few sanity checks may save you months of rowing back to shore or having to listen to an 8 piece band play you out. Spend the time up front to get an idea of what is going on, trust the people who will be doing the day to day work, and remember nothing is free.

Monday, December 19, 2016

Oven Fresh Nexus 5X

Woe and calamity. Yes, I got the dreaded Nexus 5X bootloop issue a couple of weeks back. Having thrown my phone in the freezer (inside a zip-lock bag) I got a boot out of it. That points to the issue being a bad solder joint. Maybe not in the general case but in at least mine. If your device passes a similar test you can try reflow soldering your device by following along below. Warning: this process will void any warranty you have, only do this as a method of last resort if Google or LG will not service your phone. You may also want to consider taking it to a professional electronics repair shop for reflow work.

With that warning out of the way and the freezer test having roused my suspicions I pulled the case off and took the motherboard out. iFixit has a great guide here for that process.

Time to cook!
With the motherboard out I inspected the teardown photos also provided by iFixit and determined which side of the board the packages (chips/black squares) I was interested in were on. The suspect package is the RAM, a Samsung K3QF3F30BM-QGCF, with the CPU, a Qualcomm Snapdragon 808, conveniently located directly beneath it. The constant flexing of the phone in my pocket has most likely cracked one of the tiny BGA solder connectors off the motherboard underneath one of those packages.

Now that you have your motherboard separated from the phone's chassis preheat your oven to 195 degrees Celsius or 390 Fahrenheit whichever is appropriate for your current locale setting. While your oven is heating take out some aluminum foil and crush it into a ball. Un-crumple it so that there is still a rough texture as shown. This is going to help limit the heat transmission to the underside of the motherboard which we don't want to reflow. Place the motherboard on the foil and press it down. At this point attempt to get the board lying as flat as possible. You don't want parts sliding down the board on an angle if you overcook. Place the foil and motherboard on an oven safe cooking surface. A ceramic casserole dish will do nicely so long as it has a flat bottom; I used the spill tray from a waffle iron. So pick whatever looks good to you. Season with a twist of lemon and cracked pepper.

Now that your oven is up to temperature place the dish into the oven and start a timer for 6 minutes and 30 seconds. Wait anxiously. Note: If your oven is fan forced you may need to make an adjustment to the cooking time.

Remove your motherboard and let it rest until completely cool. Inspect the board for any physical damage, if you've really messed it up be extremely careful with reconnecting the battery. Who knows what you might have shorted out.

Reassemble your phone following the guide from iFixit and give that puppy a charge. Power it up and hope for the best!

Friday, July 15, 2016

Minolta 7000 Battery Upgrade MK2

So my last post was about building a replacement battery pack for the Minolta 7000. Since then I had a much higher quality 3D print made of the housing with some lessons learnt from the first version. This is the result.


This part was ordered from Shapeways and I'm quickly learning that they are so far ahead in this game that you should just go to them straight away. Incidentally you can buy this part from there if you want to build this.

Need some help building that? Well tough luck! I present to you the world's worst build log. Seriously everything that could go wrong with this video went wrong. I don't think I even built a single thing on camera.




So I might not be a videographer but you have to admit the end product looks pretty nice.