The real world, for real!
A good starting point when preparing your product for release is to figure out what your specific real world actually looks like. More precisely, this means evaluating the requirements of your production environment. In contrast to development systems, where speed and ease of development are important, the values to consider in production are ease of deployment, reliability, observability and performance. Where does your application sit within your infrastructure? How does this affect your app? How does it get there? What traffic is expected? How does your app perform during traffic bursts? Do you want to use Node-internal tools like clustering for scaling? How do you achieve zero-downtime deployments? Do you have internal guidelines to follow? This is just an excerpt of the questions we had to answer in close collaboration with our great siteops engineers.
The way to heaven
At mobile.de, we use our own tool called “Autodeploy” to deploy and activate software artifacts. Autodeploy has a database that serves as an inventory of applications and their mapping to individual hosts. It is able to deploy to any environment and can be used with different platforms.
As a Continuous Integration server we’re happy to use the open source project Jenkins, into which Autodeploy is seamlessly integrated. Jenkins takes care of our build as soon as we push code to GitHub. Among others, the steps include:
- Installing dependencies via npm and bower
- Linting (eslint)
- Running Tests (Mocha)
- Code coverage analysis (Istanbul)
- Code quality analysis (Sonar)
- Packaging source files
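The lint, test and coverage steps above could map to npm scripts roughly like this — a sketch using the tool names from the list, not our actual Jenkins configuration; the exact commands and flags are illustrative:

```json
{
  "scripts": {
    "lint": "eslint .",
    "test": "mocha --recursive",
    "coverage": "istanbul cover node_modules/.bin/_mocha"
  }
}
```

Jenkins would then simply invoke these scripts as individual build steps.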
Once Jenkins has successfully done its job, the package is queued for deployment. Deployments that stem from feature or development branches are automatically deployed to their postproduction/staging host. For production deployments we decided not to deploy automatically but to trigger the deployment explicitly. Currently this can happen via the command line or a button showing up in the Autodeploy UI.
We bootstrap environments by relying heavily on configuration stored in environment variables. We’re using dotenv to populate process.env via a .env file. This way we make sure each deployment environment gets its app version delivered properly configured.
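A minimal sketch of such environment-driven configuration, assuming dotenv has already populated process.env from the .env file — the keys, defaults and the function name are illustrative, not from our actual codebase:

```javascript
// Assumes `require('dotenv').config()` has already populated process.env
// from the .env file. All keys and defaults here are illustrative.
function loadConfig(env) {
  return {
    port: parseInt(env.PORT || '3000', 10),        // HTTP port the app binds to
    logLevel: env.LOG_LEVEL || 'info',             // minimum level for the logger
    graphiteHost: env.GRAPHITE_HOST || 'localhost' // metrics aggregator host
  };
}

const config = loadConfig(process.env);
```

Each environment then only differs in its .env file, while the app code stays identical.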
Drop it when it’s hot
As part of reliability we had to think about what happens in case our app crashes – intended or not. In particular, the usual way in Node.js to recover from programmer errors (bugs) is to let the app crash, as you can’t do anything about them anyway. Have you tried turning it off and on again? It’s the fastest and most reliable way to restore app state in those cases. The important thing is to log and monitor such restarts in order to fix the reasons behind them as soon as possible. To kill these birds with one stone, we found that the process manager PM2 came in handy.
It keeps our app alive, as it automatically restarts app instances in the event of a crash. We also use it to start and stop the app when it’s been deployed. The tool even provides some basic monitoring and logging features that help gather app stdout – especially useful for dependencies that write logs to process.stdout, which our own logger does not capture.
Recently we had some problems with multiple PM2 god daemons running on one host after deploying a new version of our app. This led to shadow apps serving/referencing old content that was no longer available at that point in time, resulting in 404s. Luckily our logger module let us discover this issue quickly. Observability should be one of the top priorities for apps running in production. We internally use Logstash to centralize, aggregate, parse and filter log files.
As our Logstash configuration embraces JSON as the log-file format, we’re using Bunyan as the base for our logger module. Its output is line-delimited JSON by default, which makes it easy to consume. It’s built around streams, and you can define multiple output streams at different log levels. When debugging your app on your local machine, you don’t want to log to a file but want to see output as fast as possible. On the other hand, when running in production you don’t necessarily want to log debug-level information to Logstash. Bunyan makes both possible. Here at mobile.de each app we build needs to conform to certain guidelines. For logging this includes, for instance, adding information like build_timestamp, app_revision or log_level. For production usage, we wrote a bunyan-logstash transform stream that adds these fields at runtime and pipes the output to a file. For local development we use a bunyan-debug transform stream that pipes all log levels to stdout. We are currently experimenting with this setup and constantly trying to improve it. For visualizing logs we use Kibana as a dashboard. This instantly lets us discover errors and unexpected issues.
Watch your health
One of our app requirements is to have proper monitoring set up. What does monitoring mean? Essentially, it’s about collecting numeric time-series data. For mobile.de this includes asynchronously forwarding metrics to an aggregator (push style) and also providing various endpoints to verify application health (pull style). This helps with monitoring but also with implementing reactive behaviour in a microservice landscape. Not suffering from NIH syndrome, we had a look at multiple open source and commercial products that tackle this problem, hoping to find a proper solution for our all-new mobile.de homepage running on Node.js. As most of our apps use Graphite as a real-time graphing system, we wanted to make use of it as well. We wanted to collect some default system metrics of the host, like CPU usage, memory consumption or garbage collection. In addition, the module should also provide a way to get HTTP information about incoming and outgoing connections. Unfortunately we couldn’t find anything out there that would encapsulate and fit our needs. On top of node-measured and node-graphite we built our own node-metrics module. It currently offers the following features:
- gathering VM-related metrics
  - CPU usage
  - memory consumption
  - GC stats
- custom metrics creation
- middlewares for (semi-)automated metrics collection
  - timers for all routes (inbound.routes.[route])
  - meters for all status codes (inbound.statuses.[statusCode])
  - HTTP server middleware
  - Express middleware
- option of periodically reporting to Graphite
The module so far does a solid job and there are plans to open source it.
For visualizing collected metrics we use Grafana as a dashboard.
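To illustrate the middleware part, here is a simplified sketch of how (semi-)automated collection of route timers and status-code meters could look. The metric names mirror the inbound.routes.* / inbound.statuses.* scheme above, but the code is an illustration, not the actual node-metrics implementation:

```javascript
// Express-style middleware that records a timer per route and a meter per
// status code into a plain-object registry (illustrative, not the real
// node-metrics API, which builds on node-measured).
function createMetricsMiddleware(registry) {
  return function metricsMiddleware(req, res, next) {
    const start = Date.now();
    res.on('finish', () => {
      const route = 'inbound.routes.' + (req.route ? req.route.path : req.url);
      (registry[route] = registry[route] || []).push(Date.now() - start);
      const status = 'inbound.statuses.' + res.statusCode;
      registry[status] = (registry[status] || 0) + 1;
    });
    next();
  };
}
```

A periodic reporter can then flush such a registry to Graphite, which is where the dashboards come in.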
To verify that our application could easily handle the expected traffic, we ran various load tests before launching. Everything worked well until we reached a certain number of concurrent users. The application would then respond with a 500 on every second request. With monitoring and logging in place, we figured out that the error was caused by engine-munger, a component of our rendering strategy. We decided to simplify our view/template implementation by throwing away the confusing construct of Dust, adaro and engine-munger. This instantly boosted performance and made our tests go green. Without our app being observable, this crucial bug would have made it to production. Before deciding on a dependency, we have to make sure we fully understand it and evaluate whether it’s really necessary.
Real world facts