Anon

Experiments in Web Services Design

More Fun Timing Data

It is, admittedly, a little unrealistic to examine only how quickly simple I/O dispatching happens in different designs if you are interested in how well an overall service will perform. So what other operations might be interesting to look into? This page looks at a couple of them.

Context Creation and Destruction

A common point at which new "execution contexts" need to be created - or at least fetched from a pool of previously constructed ones - is when a new connection request comes into the server. So, in the following sort of code:

// accept loop
void accept_loop(int listening_socket)
{
  while (true) {
    int new_connection = accept(listening_socket, 0, 0);
    
    // create a new "context" to handle this connection
    std::thread(std::bind(handle_message, new_connection)).detach();
  }
}

you can create a new thread, as shown above, or you can do something else. Regardless of what you do, you will need some kind of context-like object that keeps track of the state of things as you process the messages you are going to receive.

Independent of whether you always construct new objects or pool them, it is interesting to understand the cost of constructing new ones, since you will need to do this with some frequency. To gather this kind of data we want to minimize the work being done inside the context (thread) so we are looking at just creation and destruction time. So timing the OS thread case would mean running a piece of code something like:

// create lots of threads and wait for them to finish
void time_threads()
{
  std::vector<std::thread> threads;
  
  for (int i=0; i<10000; i++)
    threads.push_back(std::thread([]{ return 0; }));
    
  for (int i=0; i<10000; i++)
    threads[i].join();
}

Fibers support similar syntax, so we can write the fiber equivalent of this loop and time both of them. Here is the timing data when this code is run on 1-, 2-, and 4-core Linux VMs, hosted on a 2.3 GHz MacBook Pro.

Time to create and execute 10,000 fibers vs. 10,000 OS threads

              1 Core                2 Cores               4 Cores
Fibers        0.022 - 0.022 secs.   0.023 - 0.024 secs.   0.031 - 0.033 secs.
OS Threads    0.175 - 0.256 secs.   0.410 - 0.600 secs.   0.385 - 0.409 secs.

Again, this data shows some penalty for running with multiple cores. But in general the fiber case is faster than the OS thread case by about an order of magnitude.

Full-on Http API Servers

Testing fully-functional HTTP servers is both more interesting and more dangerous. It is more interesting because it gets closer to something meaningful when comparing different server designs. But the performance of these servers can be affected by many aspects of exactly what was tested. For this reason, it is probably best not to speculate too much about the causes of the differences, but just to be careful in describing exactly what is being tested.

The goal here will be to try to implement very similar servers in several different environments and test them with the same client code. The tables will just show the performance numbers for each test. To start with, here is a basic description of the test code that is used to drive each server:

void http_client()
{
  int socks[400];
  for (int i=0; i<400; i++)
    socks[i] = connect( ... to the server ... );
    
  // a simple HTTP GET string... 
  // be sure to specify keep-alive... 
  const char* http_GET_message = "GET / HTTP/1.1\r\nHost: ...";
    
  for (int msg=0; msg<2000; msg++) {
  
    // send http string to the server
    for (int i=0; i<400; i++)
      send_string(socks[i], http_GET_message);
      
    // now loop through and read all the responses
    for (int i=0; i<400; i++)
      read_http_response(socks[i]);
  }
}

Like other tests, this test code takes care to send the HTTP GET requests to lots of different sockets before attempting to read responses from any of them, to force the server to perform context switches from one connection to another. But it also tries to avoid performing operations like constructing custom GET messages for each request. While not explicitly shown in this example, it also does only minimal parsing of the response message. Also not really shown, the client uses a keep-alive strategy on the connections to avoid timing the connect process itself. This sort of test tries to ensure that it is mostly timing the server switching between connections and parsing the HTTP request prior to sending its reply.

What follows is the source for these simple "echo" servers written in the different tool sets being timed. Following that is the table listing the timings.

The Anon Example

The anon example gets to use anon-style syntax and looks like this. The http_server class is provided by anon, and its constructor takes a callable that is invoked - in a fiber - each time a new API request is sent from a client. The style of its API is inspired by node.js. In this example it simply specifies that it will be returning plain text, and then writes "echo server!" into the body of the reply.

http_server my_http(/*port*/8080,
    [](http_server::pipe_t& pipe, const http_request& request){
      http_response response;
      response.add_header("Content-Type", "text/plain");
      response << "echo server!\n";
      pipe.respond(response);
    });

The Node.js Example

The node.js example is visually very similar to the anon one. It makes use of syntax that lets it pass a function as a parameter to an HTTP server. That server calls the supplied function when new API requests arrive, and it is that function that writes the response information. But in this case the language is JavaScript, not C++.

my_http = require("http");
my_http.createServer(function(request,response){
  response.writeHead(200, {"Content-Type": "text/plain"});
  response.write("echo server!");
  response.end();
}).listen(/*port*/8080); 

The Proxygen Example

The proxygen example is too complicated to show in a short code snippet like the anon and node.js examples. However, it is just the standard "echo" server that comes with the proxygen github project and the one that Facebook talks about in public discussions. You can see the code in their EchoHandler.h and EchoHandler.cpp files. In this design, proxygen supplies a base class named RequestHandler that defines a number of virtual methods with names like onRequest and onBody. EchoHandler is a subclass of this and implements those virtual methods with simple echoing behavior.

The Spring 4 Example

Spring is a Java framework somewhat similar to Java EE. It has recently been updated to version 4.0. This version comes with a number of example projects that one can build, one of which is an example of how to build a RESTful server. Although simpler than the Proxygen example, it is still a little too complicated to show in a short code snippet. But it is simple enough to serve as a fair comparison to the "echo" servers being tested here. The source code for this Spring example can be found on GitHub. The design of this example server makes use of Java's support for "annotations" to provide much of the wiring that routes an HTTP GET request for "/greeting" to the code shown in the GreetingController.java class.

To run this Spring example we modify the http_client example code slightly so that it requests "/greeting" instead of "/" in the HTTP GET request. We also modify it so that the Accept header specifies "application/json" (which is what this particular server is designed to return). As explained above, the http_client application specifies HTTP "keep-alive" behavior, but unlike the other servers tested here, which keep the connection open during the entire test, the Spring server closes it every 100 API calls. So there is a slight penalty reported by this test because it has to reconnect every 100 calls. There is also a very significant "warm up" period when the server is first started, causing the first execution of http_client to be measurably slower than subsequent executions. In the numbers reported below we discard the first run of http_client and report only the next three.

Spray

Spray is a Scala-based HTTP server framework (Scala is a JVM-based language). Like the anon and node.js cases, Spray supports very compact expression of simple servers. This test is the "spray-template" example project found on GitHub. The server code looks like:

trait MyService extends HttpService {

  val myRoute =
    path("") {
      get {
        respondWithMediaType(`text/html`) {
          complete {
            <html>
              <body>
                <h1>Say hello to <i>spray-routing</i> on <i>spray-can</i>!</h1>
              </body>
            </html>
          }
        }
      }
    }
}

The Comparison Numbers

The table below is displayed in "api calls per second" as seen by the http_client application described above. For this, a single "api call" is sending the HTTP message to the server and then reading the response it sends back. As in the other tests, the client application runs on the same machine as the server, and so uses the loopback network address to connect to and communicate with the server. The testing code uses HTTP's "keep-alive" strategy on the sockets, so these numbers do not include connect/shutdown time on the sockets themselves (other than the Spring 4 numbers, as described above). Particularly in the 1 core case, that one core is forced to run both the client and server threads. This code is run in an Ubuntu 14.04 VM, running in VMWare on a 2.3 GHz MacBook Pro laptop. There is variability in the numbers from one run to the next; this table represents three consecutive runs, listing the slowest and fastest of those three.

API Calls Per Second

              1 Core             2 Cores              4 Cores
Anon          84,500 - 85,900    163,200 - 168,000    264,000 - 274,000
Proxygen      47,400 - 51,300    86,200 - 91,000      120,500 - 124,300
Spray         43,441 - 45,323    64,136 - 68,948      112,553 - 116,278
Spring 4      9,193 - 9,472      19,544 - 19,913      26,550 - 26,716
Node.js       4,868 - 5,317      5,150 - 5,223        5,169 - 5,182