Proper API Version Control - AKA: WTH Glowforge?


#1

Coming back from a much-needed vacation, I fired up my device only to find that OpenGlow was no longer compatible with Glowforge’s cloud/GFUI.

I tracked the issue down to a change in the protocol between the device and the cloud related to the scanning step (where the head fires its laser diode to measure the material thickness).

PREVIOUS METHOD
Previously, this was accomplished by two commands from the cloud.

The first command took a picture with the laser diode on:

{"id":######,"action_type":"head_image","machine_serial":"DMQ-793","status":"ready","settings":{"HCil":1072693248,"HCae":0,"HCex":2047,"HCag":0,"HCga":100}}

The second command took a picture with the laser diode off:

{"id":######,"action_type":"head_image","machine_serial":"DMQ-793","status":"ready","settings":{"HCil":0,"HCae":0,"HCex":2047,"HCag":0,"HCga":100}}

(The value of the HCil key is assumed to be the illumination level for the laser diode.)

Each command required an exchange between the device and cloud where the device acknowledged the command, captured the image, uploaded the image, and then notified the cloud that the action was completed.

NEW METHOD
The scanning step is now accomplished with a single command:

{"id":######,"action_type":"lidar_image","machine_serial":"DMQ-793","status":"ready","settings":[{"HCil":1072693248,"HCae":0,"HCex":2047,"HCag":0,"HCga":100},{"HCil":0,"HCae":0,"HCex":2047,"HCag":0,"HCga":100}]}

The action_type has been changed to “lidar_image” and the settings key now contains an array with the settings for the two images.

The device now acknowledges the command, takes and uploads each picture in sequence, and then informs the cloud that the process is complete.
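For anyone maintaining a compatible client, the difference between the two message shapes can be sketched like this. It is only an illustration based on the JSON shown above: the function name capture_settings is made up, the id value of 1 stands in for the redacted one, and the real firmware may handle these commands quite differently.

```python
def capture_settings(command):
    """Return one settings dict per picture the command asks for.

    Hypothetical helper: it only mirrors the two message shapes
    quoted in this post, not any published Glowforge API.
    """
    action = command["action_type"]
    settings = command["settings"]
    if action == "head_image":
        # Old style: one command per picture, "settings" is a single object.
        return [settings]
    if action == "lidar_image":
        # New style: one command, "settings" is an array (one entry per picture).
        return list(settings)
    raise ValueError(f"unsupported action_type: {action!r}")


LASER_ON = {"HCil": 1072693248, "HCae": 0, "HCex": 2047, "HCag": 0, "HCga": 100}
LASER_OFF = {"HCil": 0, "HCae": 0, "HCex": 2047, "HCag": 0, "HCga": 100}

# "id": 1 is a placeholder for the redacted id in the captured messages.
old_command = {"id": 1, "action_type": "head_image", "machine_serial": "DMQ-793",
               "status": "ready", "settings": LASER_ON}
new_command = {"id": 1, "action_type": "lidar_image", "machine_serial": "DMQ-793",
               "status": "ready", "settings": [LASER_ON, LASER_OFF]}

for cmd in (old_command, new_command):
    shots = capture_settings(cmd)
    print(cmd["action_type"], "->", len(shots), "picture(s)")
```

Normalizing both shapes into a list means the capture and upload loop only has to be written once, whichever command the cloud decides to send.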

This shaves a couple of seconds off the whole process, and is likely the improvement in autofocus speed that was touted in this announcement*.
* Whether or not this small and unremarkable change warranted its own grand announcement is left to the reader to ponder.

VERSION PROBLEMS
This brought to my attention an interesting issue.

I have been tracking the current firmware versions, and dutifully updating the Glowforge Utilities library to match. My assumption was that the cloud pays attention to the version number reported by the device, and that my library needed to report the current one to maintain compatibility.

Strict versioning allows changes to the communication protocol to be rolled out seamlessly. For instance, a device that hasn’t yet updated its firmware wouldn’t know how to process the new ‘lidar_image’ command and would tank. In a well-implemented client/server API, you wouldn’t send that command to a device running the old version; you would send it the old command instead.

However, this is Glowforge. On a hunch, I sent a much earlier firmware version string, and still got the new ‘lidar_image’ command*. The command appeared in v1.6.0-35 (the most current version when this was posted), and would not be recognized by devices running earlier versions.
* I also sent garbage instead. It still worked, and still sent me the new command. They apparently are not looking at the version string, at all.
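For contrast, here is a rough sketch of what version-gated command selection on the server side could look like. Everything in it is an assumption for illustration: the function names are invented, the version format is guessed from the one string quoted above (major.minor.patch-build, compared numerically), and nothing here reflects how Glowforge's cloud is actually built.

```python
def parse_version(text):
    """Turn a string like 'v1.6.0-35' into a comparable tuple (1, 6, 0, 35).

    Raises ValueError for anything that does not fit the assumed format.
    """
    core, _, build = text.strip().lstrip("v").partition("-")
    return tuple(int(part) for part in core.split(".")) + (int(build or 0),)


# First version said (in this post) to understand 'lidar_image'.
LIDAR_IMAGE_MIN_VERSION = parse_version("1.6.0-35")


def scan_commands(reported_version, laser_on, laser_off):
    """Pick the scan command(s) a device reporting this version can handle."""
    try:
        version = parse_version(reported_version)
    except ValueError:
        version = None  # unparseable string: fall back to the old protocol
    if version is not None and version >= LIDAR_IMAGE_MIN_VERSION:
        return [{"action_type": "lidar_image", "settings": [laser_on, laser_off]}]
    # Older (or unknown) firmware gets the two-command sequence it understands.
    return [
        {"action_type": "head_image", "settings": laser_on},
        {"action_type": "head_image", "settings": laser_off},
    ]


on, off = {"HCil": 1072693248}, {"HCil": 0}
for reported in ("v1.6.0-35", "1.5.0-10", "garbage"):
    actions = [c["action_type"] for c in scan_commands(reported, on, off)]
    print(reported, "->", actions)
```

Falling back to the old commands when the version string cannot be parsed is one possible design choice; a stricter server might reject the connection instead. Either way, the garbage string would not have been answered with a command the device might not understand.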

LID OPEN FIRMWARE UPGRADES
Recall when they started telling users who were having endless calibration problems that they should open the lid and then power the unit on?

About 3 weeks ago, I noticed that the ‘hunt’ command was operating differently. The hunt sequence homes the focal lens and is the first step in the calibration process. The command no longer included a motion url. I updated the Glowforge Utilities to reflect this.

Here’s the issue - if they changed the structure of the command they are sending to the device, this would almost always need to be reflected in a firmware update. If the device is expecting the command one way, and gets it another, it is going to tank.

This is a WAG of what was happening with the ‘lid open’ upgrade issue: The device was expecting the command to be formatted one way and, when it didn’t get it, it froze. This would prevent it from completing the firmware update, because the device puts a lock in place that blocks updates while it is executing motion commands (which the hunt sequence is).

By leaving the lid open, the device was prevented from running the motion command, hence preventing it from locking out the firmware update, therefore allowing it to update its firmware to a version that understood the new command format.
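To make the guess above concrete, here is a toy model of it. This is purely a restatement of my WAG in code: the function and parameter names are invented, and none of it is taken from the actual firmware.

```python
def firmware_update_completes(lid_open, understands_new_hunt):
    """Toy model of the guessed lock-out; returns True if the update can finish."""
    motion_lock = False
    if not lid_open:
        # Lid closed: the device starts the hunt (a motion command),
        # which takes the lock that blocks firmware updates.
        motion_lock = True
        if understands_new_hunt:
            # Hunt parses and finishes normally, so the lock is released.
            motion_lock = False
        # Otherwise the device freezes mid-command and never releases the lock.
    # Lid open: no motion command runs, so the lock is never taken.
    return not motion_lock


for lid_open in (False, True):
    for new_firmware in (False, True):
        result = firmware_update_completes(lid_open, new_firmware)
        print(f"lid_open={lid_open} understands_new_hunt={new_firmware} -> {result}")
```

In this model the only case that blocks the update is a closed lid combined with firmware that does not understand the new command, which matches the workaround users were given.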

CONCLUSIONS

  1. Setting up your API with a versioning system is critical in a production system. If you don’t, you get users bitching because your product doesn’t work when you make changes to that API.
  2. This also demonstrates the perils of developing a compatible 3rd party firmware because the API isn’t published and can/will change at any moment.

#2

Great stuff that you are discovering here. I really hope GF are reviewing this blog to improve their dev process.


#3

I hope not. :wink: I’d like to think they have a hell of a lot more insight into what is going on over there than I do.

I’m just shining what little light I can onto the odd things that happen with their service/product.