
TL;DR

I've created the TLSnappy module, which aims to be faster than the internal TLS module in node.js. So far it's slower on some benchmarks, but it'll definitely be much snappier soon.

Preface

Many people have complained about TLS performance in node.js, which (as they said) was significantly worse than in many other popular web servers, load balancers and terminators (e.g. nginx, haproxy).

Several things were done to address this issue, including:

After all that stuff got in, the rps (requests per second) rate improved significantly, but many users were still unhappy with overall TLS performance.

TLSnappy

This time, instead of patching and tweaking tls, I decided it might be worth rewriting it from scratch as a third-party node.js addon. This has recently become possible, thanks to Nathan Rajlich and his awesome node.js native addon build tool, node-gyp.

I didn't want to offer a module that's functionally equivalent to TLS; instead, I wanted to fix some issues (as I perceived them) and improve a few things:

  • Encryption/decryption should happen asynchronously (i.e. in another thread). This could potentially speed up the initial SSL handshake and let the event loop perform more operations while encryption/decryption happens in the background.
  • The built-in TLS module passes, slices and copies buffers in javascript. All binary data operations should happen in C++ instead.

All of this was implemented in the TLSnappy module.

There have been a lot of availability and stability issues (and surely many more that I'm not yet aware of). But tlsnappy seems to be quite a bit more performant than the built-in tls module, especially taking into account that tlsnappy by default uses all available cores to encrypt/decrypt requests, while the tls module needs to be run in a cluster to balance load between cores.

Benchmarking

And I confirmed this when benchmarking it with Apache Benchmark (ab) on my Macbook Pro and on a dedicated Xeon server. Here are the results from the latter:

Xeon 16 threads (rps) - Apache Benchmark

Xeon 16 threads (ms) - Apache Benchmark

A little comment about the curve names here:

  • default - one tlsnappy process with 16 threads
  • hybrid - 4 tlsnappy processes with 4 threads each
  • cluster - 16 tlsnappy processes with 1 thread each
  • http - 16 node.js processes in cluster

As you can see, tlsnappy is faster than the tls server in almost every case, except cluster mode (which just wasn't saturating the CPU enough). Everything looked great and shiny, until Matt Ranney pointed out that ab results for https benchmarks aren't really trustworthy:

I installed siege, created a node.js script, and let it run for a while:

Xeon 16 threads (rps) - Siege

The results are much better (nginx did 5000 rps with siege versus 2500 rps with ab), but tlsnappy now seems to be slower than node.js' default tls server.

I started investigating and decided to track not only the rps rate, but CPU load too:

Xeon 16 threads (load) - Siege

Afterword

Right now, as you can see in the chart above, tlsnappy isn't saturating all CPUs well. I suspect this is the major reason for its relative slowness compared to both nginx and the https module. I'm working on making it balance and handle requests better, and will sum up the results of this investigation in the next blog post.

For those of you who are interested in more details, here is the benchmark data

Created at: Tue, 02 Oct 2012 16:01:45 GMT