Engineering Series: Keeping Upverter Up

We’ve been pretty cagey in the past about a lot of our engineering efforts at Upverter.  Today, we want to start lifting the veil a bit and talk about some of the things we’ve done under the hood to keep the Upverter platform stable, despite huge feature pushes.

Stability starts in culture.  We enforce a pretty stringent engineering culture, augmented by a handful of software systems: every code change gets (quite brutally) code reviewed by two other engineers using our custom-modded version of Rietveld, before buildbot runs it against a battery of tests and packages everything for deployment.


We generally avoid big deployments or “release management” since they basically act as risk capacitors.  Instead, everyone on the team can deploy any code that has passed code review and the test suite at any time – and they do.  We usually deploy several times a day.

Overwhelmingly, our stability stems from these kinds of ‘best practices’.  However, we have over 120,000 lines of Javascript running client-side in people’s browsers, and that means there’s a huge surface area for client-side stability problems to arise, despite any amount of testing.  Furthermore, it can be a harrowing experience for a hardware engineer if their editor keeps running into errors.

The good news is that instead of having to wait for your software distributor to send you a new version, at Upverter we’re able to deploy fixes to our servers as soon as we see errors happen.  To keep an eye on the stability of connected clients, we have a big dashboard in the main engineering space:

[Image: the live client-error dashboard in the engineering space]

The dashboard displays all the key data for managing live errors on the site.  It shows us how many times an error has occurred (based on a hash of the stack trace), which users are affected, and what part of the code base is responsible.  We also see the times of first and last occurrence.  Since our last revision, all new errors are automatically posted to our task management tool, Asana, and the engineer tasked with the fix is synced back to the dashboard via the Asana API.
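To give a feel for the bucketing, here’s a toy version of the stack-trace hashing.  This is an illustrative sketch, not our production code; the normalization rules and the djb2 hash are stand-ins:

function errorBucketKey(stackTrace) {
  // Strip the parts of a trace that vary between occurrences of the
  // same bug (script URLs, line/column numbers) so repeats hash alike.
  var normalized = stackTrace.split('\n').map(function(frame) {
    return frame.replace(/https?:\/\/[^\s)]+/g, '').replace(/:\d+(:\d+)?/g, '');
  }).join('\n');
  // djb2 string hash; any stable hash would do for grouping.
  var hash = 5381;
  for (var i = 0; i < normalized.length; i++) {
    hash = ((hash << 5) + hash + normalized.charCodeAt(i)) | 0;
  }
  return (hash >>> 0).toString(16);
}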

[Image: error tasks synced with Asana]

In order to track down complex bugs, we send a lot of data back with every error.  Client-side, we take advantage of Google Closure’s global error handler and add a bunch of extra contextual information to the stack trace, including the entire history of the client session: what tools were used, what shapes were placed, and when.  Additionally, users are given the opportunity to submit reproduction steps after their design reloads.
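For illustration, here’s the rough shape of such a handler, using the browser’s plain window.onerror rather than Closure’s wrappers; sessionHistory and the /api/client-error endpoint are hypothetical names, not our actual internals:

// Record client activity as it happens, then ship it with any error.
var sessionHistory = [];  // e.g. {t: Date.now(), tool: 'wire', action: 'place'}

window.onerror = function(message, url, line) {
  var report = {
    message: message,
    url: url,
    line: line,
    history: sessionHistory.slice(-200)  // the most recent session events
  };
  var xhr = new XMLHttpRequest();
  xhr.open('POST', '/api/client-error', true);
  xhr.setRequestHeader('Content-Type', 'application/json');
  xhr.send(JSON.stringify(report));
  return false;  // don't suppress the error in the console
};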

Here’s what our engineers see:

[Image: an error report as our engineers see it]

Finally, we can also browse the connection history to ensure there wasn’t any kind of network problem that contributed to the error:

[Image: a client’s connection history]

We’re able to re-use the session history information to track how long into sessions errors typically occur, and whether there are significant disconnect/reconnects prior to the crash.
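A toy version of that computation, assuming history entries carry a timestamp and an event type (the event shape here is invented for illustration):

// Summarize a session at crash time: how far in the error happened,
// and how many disconnects preceded it.
function summarizeSession(history, errorTime) {
  var start = history.length ? history[0].t : errorTime;
  var disconnects = 0;
  for (var i = 0; i < history.length; i++) {
    if (history[i].type === 'disconnect' && history[i].t < errorTime) {
      disconnects++;
    }
  }
  return {
    secondsIntoSession: (errorTime - start) / 1000,
    disconnectsBeforeError: disconnects
  };
}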

Once the problem is diagnosed, the patch goes into code review, and it’s wash-rinse-repeat!

Sure beats waiting for the next version.

Technical Setup of Upverter

Who doesn’t love tech porn? And what’s better than an inside look at the architecture and tools that power a startup? That’s right, nothing. So we thought, why not put up our own little behind-the-scenes look, and share a bit about how we do what we do?

At Upverter, we’ve built the first-ever web-based, collaborative, community- and reuse-focused EDA tools. This meant re-thinking a lot of the assumptions that went into building the existing tools. For example, clients and servers weren’t an afterthought, but a core part of our architecture. Collaboration was baked in from the start, which also meant a whole new stack, borrowed heavily from projects like Google Wave and Etherpad.


[Image: Apache Wave]

On the front-end, our pride and joy is what we call the sketch tool. It’s more or less where we have spent the bulk of our development time over the last year: a large compiled Javascript application that uses long polling to communicate with the API and Design Servers. When we set out to move these tools to the web, we knew that we would be building a big Javascript app. But we didn’t quite know what the app itself would look like, and our choice of tech for it has changed quite a bit over time… more on this later!
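For the curious, the heart of a long-polling client is a loop like the one below. This is a generic sketch rather than our actual RPC layer; the /api/poll endpoint and the message format are made up:

// Hold a request open until the server has changes (or times out),
// apply them, then immediately reconnect.
function poll(sessionId, cursor, onMessages) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/api/poll?session=' + sessionId + '&cursor=' + cursor, true);
  xhr.onreadystatechange = function() {
    if (xhr.readyState !== 4) return;
    if (xhr.status === 200) {
      var resp = JSON.parse(xhr.responseText);
      onMessages(resp.messages);
      poll(sessionId, resp.cursor, onMessages);  // reconnect right away
    } else {
      // Back off briefly on errors before retrying.
      setTimeout(function() { poll(sessionId, cursor, onMessages); }, 2000);
    }
  };
  xhr.send();
}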

On the back-end, we run a slew of servers. There was a bit of a grand plan when we started, but in reality they all came about very organically: as we needed to solve new problems and fill voids, we built new servers into the architecture. As it stands right now, we have the following:

  • Front-end web servers, which serve most of our pages and community content;
  • API & Design servers, which do most of the heavy lifting and allow for collaboration;
  • DB servers, which hold the datums; and
  • Background workers, which handle our background processing and batch jobs.

So let’s talk tech…

  • We use a lot of Linux (Ubuntu and Arch), both on our development workstations and all over our servers.
  • We use Python on the server side; when we started out we took a serious look at Node.js and Javascript, but at the time neither felt ready. Things have come a tremendously long way since, and we might have made a different choice if we were beginning today.
  • We use nginx (http://nginx.org/) for our reverse proxy, load balancing and SSL termination.
  • We use Flask (http://flask.pocoo.org/) (which is like Sinatra) for our Community and Front-end web servers. We started with Django, but it was too full-blown, and we found ourselves rewriting enough of it that it made sense to step down a rung.
  • We use Tornado for our API and Design servers. We chose Tornado because it is amazingly good at serving these types of requests at breakneck speed.
  • We built our background workers on Node.js so that we can run copies of the Javascript client in the cloud, saving us a ton of code duplication.
  • We do our internal communication through ZMQ (www.zeromq.org) on top of Google Protocol Buffers.
  • Our external communication is also done through custom Javascript RPC, again mapped onto Protocol Buffers (http://code.google.com/apis/protocolbuffers/docs/overview.html).
  • Until very recently we used MySQL for both relational and KV data, through a set of abstracted custom datastore procedures; we have since switched our KV data over to Kyoto Tycoon.
  • Our primary client, the sketch tool, is built in Javascript with the Google Closure Library and Compiler.
  • The client communicates with the servers via long polling through custom-built RPC functions and server-side protocol buffers.
  • We draw the user interface with HTML5 canvas, through a custom drawing library which handles collisions and does damage-based redrawing (a rough sketch follows this list).
  • And we use Soy templates for all of our DOM UI: dialogs, prompts, pop-ups, etc.
  • We host on EC2 and handle our deployment through Puppet.
  • Monitoring is done through a collection of Opsview/Nagios, Pingdom, and collectd.
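The damage-based redrawing mentioned above works roughly like this. The sketch is heavily simplified and the names are ours for illustration; the real library also handles collision detection and layering:

// Accumulate dirty rectangles as shapes change, then repaint only
// those regions instead of clearing the whole canvas every frame.
function DamageTracker(canvas, shapes) {
  this.ctx_ = canvas.getContext('2d');
  this.shapes_ = shapes;  // each shape: {bounds: {x, y, w, h}, draw: fn}
  this.dirty_ = [];       // rectangles damaged since the last repaint
}

DamageTracker.prototype.markDamaged = function(rect) {
  this.dirty_.push(rect);
};

DamageTracker.prototype.repaint = function() {
  var ctx = this.ctx_;
  for (var i = 0; i < this.dirty_.length; i++) {
    var r = this.dirty_[i];
    ctx.save();
    ctx.beginPath();
    ctx.rect(r.x, r.y, r.w, r.h);
    ctx.clip();  // confine all drawing to the damaged region
    ctx.clearRect(r.x, r.y, r.w, r.h);
    for (var j = 0; j < this.shapes_.length; j++) {
      var b = this.shapes_[j].bounds;
      // Redraw only shapes whose bounds intersect the damaged rect.
      if (b.x < r.x + r.w && b.x + b.w > r.x &&
          b.y < r.y + r.h && b.y + b.h > r.y) {
        this.shapes_[j].draw(ctx);
      }
    }
    ctx.restore();
  }
  this.dirty_ = [];
};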

Our development environment is very much a point of pride for us. We have spent a lot of time making it possible to do the things we are trying to do from both the client and server sides, and putting together a dev environment that allows our team to work efficiently within our architecture. We value testing, and we are fascists about clean and maintainable code.

  • We use git (obviously).
  • We have a headless Javascript unit test infrastructure built on top of QUnit and Node.js.
  • We have Python unit tests built on top of nose.
  • We run Closure linting and compilation set to “CODE FASCIST” mode.
  • We run a full suite of checks within buildbot on every push to master.
  • We also do code reviews on every push, using Rietveld.
  • We are 4-3-1 Vim vs. TextEdit vs. TextMate.
  • We are 4-2-2 Linux vs. OS X vs. Windows 7.
  • We are 5-2-1 Android vs. iPhone vs. dumb phone.

If any of this sounds like we are on the right path, you should drop us a line. We are in Toronto, we’re solving very real-world, wicked problems, and we’re always hiring smart developers.


Building Better Javascript (Through Testing)

Last week I talked about our Javascript tool chain. One piece of the chain that we’ve invested quite a bit of time in so far is testing. Our reason for spending so much time: making tests easy to write, run, and manage increases the odds they’ll actually get used. We looked at a few different unit testing libraries when we were getting started. Our requirements for a unit testing library were:

  • able to be run from the browser and the command line
  • cleanly written (to make extension and modification easier)
  • not tied to a particular framework (didn’t depend on a specific JS library)
  • low on boilerplate

The best fit that we found was QUnit, the unit testing library used by jQuery. It’s a very cleanly written library that already has some hooks for integration with browser automation tools, and it has a very minimalistic interface: knowing the functions ‘test’, ‘equal’, and ‘ok’ is all you need to get started. Here’s an example:

test('point in middle of granular region', function() {
  var map = new up.common.BucketMap(10);
  var point = new up.common.Point(5, 4);
  var keys = map.locationKeys_(point);
  equal(keys.length, 1, "Only one key for point locations");
  equal(keys[0], up.common.BucketMap.BucketKey_(0, 0));
});

QUnit on its own is great. But there are some very real points of friction that we found with our process and other tools. The big ones are:

  • Managing dependencies with tests
  • Dealing with overrides and injecting mocks
  • Keeping test HTML in sync with the JS
  • Being able to run tests in one click/command

The first and second points come from our embracing Google’s Closure library. Being able to break up our application into proper namespaces and classes has been great for development. When you go to run a test, however, you need to make sure that all of the right dependencies get loaded. The third issue crops up when you have tests that need to interact with the DOM: they may need certain elements or structures to be present to effectively test a unit of code. In most examples that we found, a separate JS and HTML file get created, with the HTML file used to bootstrap the testing environment with the correct dependencies and DOM. The last issue is about lowering the friction to test. If you have to type in 5 commands to run a test, chances are you’re not going to test very often. To solve all of these issues we’ve cooked up a QUnit/Closure testing harness. Its main features are:

  • tests are a single JS file + HTML + a list of custom dependencies (see the sketch after this list)
  • dependencies get dynamically loaded, and overrides are used to allow for easy injection of mocks
  • can run multiple tests in a row, cleaning the environment and dependencies in between
  • has hooks for running from the command line as well as the browser
  • the entire set of unit tests can be kicked off with a single ‘scons jstest’
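To make that concrete, a test file for a harness like this might look something like the following. The up.test helpers are hypothetical names standing in for our harness, which isn’t public; BucketMap is the class from the example above:

// The harness loads the declared Closure dependencies, swaps in the
// mocks, runs the QUnit tests, then restores the environment so the
// next test file starts clean.
up.test.run({
  dependencies: ['up.common.BucketMap', 'up.common.Point'],
  mocks: {
    // Inject a stub so the test never touches the network.
    'up.net.RpcClient': function FakeRpcClient() { this.sent = []; }
  },
  tests: function() {
    test('bucket map maps a point to one key', function() {
      var map = new up.common.BucketMap(10);
      var keys = map.locationKeys_(new up.common.Point(5, 4));
      equal(keys.length, 1);
    });
  }
});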

It still has some issues when the tests contain syntax errors, and the command line version still blows up occasionally, but it’s helping to make the Upverter testing process smoother.

Our Javascript Toolchain

Javascript is a language that imposes very little on the developer, and the web is littered with Javascript that looks like it was lifted straight out of Enterprise Javascript. To bring some sanity to our lives we’ve assembled a set of tools to help in our development efforts. The Javascript tool chain here at Upverter comprises:

  • the Google Closure Library and Compiler
  • Closure linting (gjslint)
  • a customized QUnit and testing harness
  • SConstruct build files for linting, compiling, unit tests, and documentation generation

You’ll notice the list is quite Google heavy. When we were starting out we had a discussion about how much of the Closure Kool-Aid we wanted to drink. There were a few concerns we had:

  • It could marry us to the compiler and library; if something better came along, it could be painful to switch.
  • The overhead from annotating our code would drive us insane.

It has some pretty big positive aspects going for it too:

  • Type annotations encourage documentation (an example follows this list)
  • Static analysis lets us shake bugs out faster and helps us refactor more confidently
  • There isn’t anything remotely comparable out there for static analysis and compilation
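For anyone who hasn’t seen Closure-style annotations, here’s roughly what they look like; the method body is invented for illustration, with names borrowed from the test example earlier:

/**
 * Returns the bucket keys covering a point, so callers can look up
 * nearby shapes without scanning the whole design.
 * @param {!up.common.Point} point The location to look up.
 * @return {!Array.<string>} Keys of the buckets containing the point.
 */
up.common.BucketMap.prototype.locationKeys_ = function(point) {
  var x = Math.floor(point.x / this.granularity_);
  var y = Math.floor(point.y / this.granularity_);
  return [up.common.BucketMap.BucketKey_(x, y)];
};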

In the end the pros outweighed the cons and we dove in. It’s worked out well so far: having those tools in place is helping us write better, faster code (although gjslint’s whiny inflexibility has resulted in more threatening of an inanimate object than is probably healthy). Lastly, there are a couple of resources I’d like to share in case you haven’t run into them; Javascript can be a tough language to search for, and good starting points make it much easier to find real answers.

Javascript Development

While building Upverter we have had to slog through mountains of Javascript. This has been an eye-opening experience with what Douglas Crockford describes as “The World’s Most Misunderstood Programming Language”. Over the next couple of weeks we’re going to give you a glimpse into Javascript development at Upverter and share some of the things we’ve learned about:

  • IDEs, code style, static analysis…
  • Javascript testing
  • profiling and performance tuning
  • “building” a project
  • debugging