Request Processing Capacity

Q: How many requests per second can the Web Server handle?

Short answer: It depends.

Long answer: It really depends on many factors.

Ok, ok.. silliness aside, can we make any ballpark estimates?

The Web Server can be modeled as a queue. By necessity such modeling will be a simplification at best, but it may provide a useful mental model to visualize request processing inside the server.

Let’s assume your web application has a fairly constant processing time[1], so we’ll model the Web Server as an M/D/c queue where c is the number of worker threads. In this scenario, the Web Server has a maximum sustainable throughput of c / (processing time).

To use some simple numbers, let’s say your web app takes 1 second to process a request (that’s a very slow web application!). If the Web Server has c=128 worker threads, that means it can indefinitely sustain a max request rate of:

128/1 = 128 requests per second

This makes a lot of sense if we think about it:

  • At t = 0 seconds, 128 requests come in and each one is taken by a worker thread, fully utilizing server capacity.
  • At t = 1 second, all those requests complete and the responses are sent back to the clients; at the same time 128 new requests come in and the cycle repeats.

At this request rate we don’t need a connection queue at all[2] because all requests go straight to a worker thread. This also means that at this request rate the response time experienced by the end user is always 1 second.

To expand on that, the response time experienced by the end user is:

end user response time = (connection queue wait time) + (processing time)

Since we’re not using the connection queue the end user response time is simply the same as the processing time[3].

So far so good. Now, what happens if the incoming request rate exceeds the maximum sustainable throughput?

  • At t = 10 seconds, 129 requests come in. 128 go straight to worker threads, 1 sits in wait in the connection queue.
  • At t = 11 seconds, 128 requests come in. 128 (the one which was waiting + 127 of the new ones) go straight to worker threads, 1 sits in wait in the connection queue.

The connection queue absorbs the bumps in the incoming request rate, so connections are not dropped and worker threads can remain fully utilized at all times. Notice that now out of every 128 requests, one of them will have a response time of 2 seconds.

So what happens next?

If we go back to receiving a steady 128 requests per second, there will always be one request in the connection queue.

If at some point we only receive 127 requests (or less), the server can “catch up” and the connection queue goes back to staying empty.

On the other hand, if the incoming request rate remains at 129 per second we’re in trouble! Every second the connection queue waiting list will grow longer by one. When it reaches 129 entries, one end user will experience a response time of three seconds, and so on.

And of course, the connection queue is not infinite. If the max connection queue size is 4096, then 4096 seconds later it will fill up, and from that point onwards one incoming request will simply be dropped every second since it has no place to go. At this point the server has reached a steady state: it continues processing requests at the same rate as always (128 per second), and it continues accepting 128 of the 129 new requests per second while dropping one. End users are certainly unhappy by now because they are experiencing response times of over 30 seconds (4096 / 128 = 32, so it takes 32 seconds for a new request to work its way through the queue, plus the one second of processing). Almost like going to the DMV…

If the incoming request rate drops below the maximum sustainable rate (here, 128/sec) only then can the server start to catch up and eventually clear the queue.
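
If you want to play with these numbers yourself, here’s a quick and dirty Perl sketch of this simplified model. The thread count, processing time, queue size and arrival rate are just the made-up example values from above; it doesn’t read anything from an actual server.

#!/usr/bin/perl
# A toy simulation of the simplified queue model described above.
# All the numbers are the made-up example values from this post.
use POSIX qw(ceil);

$threads = 128;    # worker threads (c)
$ptime   = 1;      # processing time per request, in seconds
$qmax    = 4096;   # max connection queue size
$rate    = 129;    # incoming requests per second
$queue   = 0;      # current connection queue length
$dropped = 0;      # total requests dropped so far

@show{1, 2, 3, 60, 600, 4096, 4200} = ();   # ticks worth printing

for $t (1 .. 4200) {
    $queue += $rate;                                  # new requests arrive
    $served = $queue < $threads ? $queue : $threads;  # threads free up and pull work
    $queue -= $served;
    if ($queue > $qmax) {                             # queue is full: drop the excess
        $dropped += $queue - $qmax;
        $queue = $qmax;
    }
    # Response time for a request at the back of the queue right now:
    # full one-second ticks spent waiting, plus its own processing time.
    $rtime = ceil($queue / $threads) * $ptime + $ptime;
    printf("t=%4d  queue=%4d  dropped=%3d  response time=%2ds\n",
           $t, $queue, $dropped, $rtime) if exists $show{$t};
}

Running it shows exactly the story told above: the queue grows by one entry per second, response times creep up, and once the queue hits 4096 entries (at the 33 second response time mark) the server starts dropping one request per second.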

In summary, while this is certainly a greatly simplified model of the request queue behavior, I hope it helps visualize what goes on as request rates go up and down.

Theory aside, what can you do to tune the web server?

  • The single best thing to do, if possible, is to make the web app respond quicker!
  • If you want to avoid dropped connections at all cost, you can increase the connection queue size. This will delay the point where the server reaches a steady state and starts dropping connections. Whether this is useful really depends on the distribution of the incoming requests. In the example above we’ve been assuming a very steady incoming rate just above the maximum throughput rate. In such a scenario increasing the connection queue isn’t going to help in practice because no matter how large you make it, it will fill up at some point. On the other hand, if the incoming request rate is very bumpy, you can damp it by using a connection queue large enough to avoid dropping connections. However… consider the response times as well. In the example above your end user is already seeing 33 second response times. Increasing the connection queue length will prevent dropped connections but will only make the response times even longer. At some point the user is simply going to give up, so increasing the connection queue any further won’t help!

  • Another option is to increase the number of worker threads. Whether this will help or hurt depends entirely on the application. If the request processing is CPU bound then it won’t help (actually, if it were truly CPU bound, which is rare, then you’d probably benefit from reducing the number of worker threads, unless your server has 128+ CPUs/cores…). If the web app spends most of its time just waiting for I/O then increasing the worker threads may help. No set answer here; you need to measure your application under load to see. The sketch below puts some rough numbers on both of these knobs.
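
Here’s the promised sketch: a few lines of Perl that compute, for a handful of hypothetical thread counts and queue sizes, the maximum sustainable throughput and the worst-case response time once the connection queue has completely filled up. It assumes the same made-up 1 second processing time as above, and that the processing time stays constant as threads are added, which, as just noted, only holds if the app isn’t CPU bound.

#!/usr/bin/perl
# Worst-case response time once the connection queue is completely full:
# the time to drain the queue ahead of you, plus your own processing time.
# Assumes processing time stays constant as threads are added, which only
# holds if the web app is not CPU bound.
use POSIX qw(ceil);

$ptime = 1;    # processing time per request, in seconds (example value)

for $threads (128, 256, 512) {
    for $qsize (1024, 4096, 16384) {
        $tput  = $threads / $ptime;
        $worst = ceil($qsize / $threads) * $ptime + $ptime;
        printf("threads=%3d  queue=%5d  max throughput=%3d req/s  worst response=%4ds\n",
               $threads, $qsize, $tput, $worst);
    }
}

Doubling the queue size doubles the worst-case wait; doubling the threads halves it, but only if the extra threads actually buy you extra throughput.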

[1] In reality the response time can’t be deterministic. At best it may be more or less constant while the server scales linearly, but beyond that point the response time is going to increase with load. On the flip side, caching might make some responses faster than expected. So M/D/c is certainly a simplification.

[2] Not true for several reasons, but it’ll do for this simplified model and it helps to visualize it that way.

[3] Plus network transmission times but since we’re modeling only the web server internals let’s ignore that.


Web Server 7 Request Limiting Revisited

Coincidentally, last week I heard a couple of related queries about check-request-limits from different customers. I haven’t covered that feature in a while, so it’s a good time to revisit it for a bit.

To review, Web Server 7 has a feature (function) called check-request-limits which can be used to monitor and limit the request rate and/or concurrency of requests which match some criteria. It can be used to address denial of service attacks, as well as simply to limit request rates to some objects or from some clients for other reasons (for example to reduce bandwidth or CPU usage).

I usually refer to ‘matching requests’ when speaking of this capability. Matching what? Probably the most common use case is to match the client IP address. This is useful when you wish to limit request rates coming from a given client machine. Here’s a basic example of that scenario:


PathCheck fn="check-request-limits" max-rps="10" monitor="$ip"

The common theme in both customer queries last week: is it possible to limit requests based on something other than the client IP?

Yes, certainly!

The monitor parameter above is set to “$ip” which expands to the client IP address, but you can set it to anything you prefer. In my introduction to check-request-limits article I gave examples of both “$ip” and “$uri” (and even both combined). You’re not restricted to only these, though; you can use any of the server variables available in WS7 as the monitor value.

You can also construct more complicated scenarios using the If expressions of Web Server 7. I gave a few examples of that in this article on check-request-limits.
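
For instance, something along these lines (the path and the limit are made up for illustration) would rate limit requests under /download per client IP, while leaving everything else alone:

<If $uri =~ "^/download/">
PathCheck fn="check-request-limits" max-rps="5" monitor="$ip"
</If>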

To give a couple more examples, let’s say your web server is behind a proxy and thus the client $ip is always the same (the proxy IP). Clearly monitoring the $ip value isn’t terribly useful in that case. Depending on how your application works you may be able to find other useful entries to monitor. For example, if the requests contain a custom header named “Usernum” which contains a unique user number, you could monitor that:

PathCheck fn="check-request-limits" max-rps="1" monitor="$headers{'usernum'}"

Or maybe there’s a cookie named customer which can serve as the monitor key:


PathCheck fn="check-request-limits" max-rps="1" monitor="$cookie{'customer'}" 

These two are made-up examples; you’ll need to pick a monitor value which is suitable for your application. But I hope these ideas will help you get started.

By the way check-request-limits can also be used to limit concurrency.
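
If I’m remembering the parameter names right, for that you use max-connections instead of (or alongside) max-rps. For example, to allow no more than two in-flight requests per client IP (again, a made-up limit):

PathCheck fn="check-request-limits" max-connections="2" monitor="$ip"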


What’s Taking So Long

While Sun’s Web Server has a very nice threading model, once a worker thread is processing a specific request it will continue working on that request even if it takes a while or blocks.

This is rarely an issue. Static content is served very quickly, and code which generates dynamic application content needs to be written so it responds promptly. If the application code takes a long time to generate response data the site has more problems than one, so the web application developers have a motivation to keep it snappy.

But what if you do have a bad application which occasionally does take a long time? As requests come in and worker threads go off to process them, each long-running request ties up another worker thread. If requests are coming in faster than the application code can process them, eventually the Web Server will have all its worker threads busy on existing connections.

As you can infer from Basant’s blog entry, the server will still continue accepting new connections because the acceptor thread(s) are separate from the worker threads. But there won’t be any spare worker threads to take those new connections from the connection queue.

If you’re the client making the request, you’ll experience the server accepting your request but it won’t answer for a (possibly long) while. Specifically, until one of the previous long-running requests finally completes and a worker thread frees up to take on your request (and of course, there may be many other pending requests piled up).

If this is happening with your application one option is to check the perfdump output
and see which requests are taking a while. But, as these things are bound to do, it’ll probably happen sporadically and never when you’re watching.

So how can we easily gather a bit more info? It’s been said countless times but always worth repeating.. dtrace really is the greatest thing since sliced bread (and I like bread). I can’t imagine attempting to maintain a system without dtrace in this day and age, it would be limiting beyond belief! One of the many key benefits is being able to gather arbitrary data right from the production machine without any prior preparation (such as producing debug builds) or downtime or even any access to the sources you’re diagnosing.

So in that spirit, I tried to gather a bit more data about the requests which appear to be taking a while using dtrace and without attempting to look at what the code is actually doing (well, also because I only had fairly limited time to dedicate to this experiment so didn’t want to go looking at the code ;-). Although, I should mention, since Sun’s
Web Server is open source you certainly could go review the source code if you wish to know more detail.

So what am I looking for? Basically I’d like to know when the worker thread starts on a request and when it is done with it. If the time between those two grows “too long”, I’d like to see what’s going on. Sounds simple enough. Searching around a bit I saw Basant’s article on dtrace and Web Server, so using his pid$1::flex_log:entry as an exit point seems like a suitable thing to try. I didn’t find (on a superficial search, anyway) a mention of an adequate entry point, so instead I took a number of pstack snapshots, looked for something useful there, and wound up selecting “pid$1::__1cLHttpRequestNHandleRequest6MpnGnetbuf_I_i_:entry” (an ugly mangled C++ function name). With that, I ran the following dtrace script on the Web Server process:

% cat log.d
#!/usr/sbin/dtrace -qs

pid$1::__1cLHttpRequestNHandleRequest6MpnGnetbuf_I_i_:entry
{
  self->begin = timestamp;
  printf("ENTER %d, %d to nn", tid, self->begin);
}

pid$1::flex_log:entry
/self->begin/
{
  self->end = timestamp;
  printf("DONE %d, %d to %dn", tid, self->begin, self->end);
  self->begin = 0;
}

This gets me entry/exit tick marks as the threads work their way through requests. On a mostly unloaded server it’s easy enough to just watch that output, but then you’re probably not experiencing this problem on an unloaded server. So we need a little bit of helper code to track things for us. Twenty minutes of perl later, I have

#!/usr/bin/perl

$PATIENCE = 9;                  # seconds - how long until complaints start

$pid = shift @ARGV;
$now = 0;
$npat = $PATIENCE * 1000000000;

open(DD, "./log.d $pid |");
while (<DD>)
{
    chomp;
    ($op, $tid, $t1, $t2) = /(\S*) (\d*), (\d*) to (.*)/;
    if ($t1 > $now) { $now = $t1; }

    # dtrace output can be out of order so include start time in hash key
    $key = "$tid:$t1";          

    if ($op eq "ENTER") {
        if ($pending{$key} != -1) {
            $pending{$key} = $t1 + $npat; # value is deadline time
        }

    } else {
        $took = (($t2 - $t1) / 1000000000);
        if (!$pending{$key}) {
            $pending{$key} = -1; # if DONE seen before ENTER, just ignore it
        } else {
            delete $pending{$key};
        }
    }

    # Once a second, review which threads have been working too long
    # and do a pstack on those.
    # 
    # Note: we only reach here after processing each line of log.d output
    # so if there isn't any more log.d output activity we'll never get here.
    # A more robust implementation is left as an exercise to the reader.
    #
    if ($now > $nextlook) {
        $c = 0;
        foreach $k (keys %pending)
        {
            if ($pending{$k} != -1 && $pending{$k} < $now) {
                ($tid, $started) = $k =~ /(\d*):(\d*)/;
                $pastdue = ($now - $started) / 1000000000;
                print "=================================================n";
                system("date");
                print "Thread $tid has been at it $pastdue secondsn";
                system("pstack $pid/$tid");
                $c++;
            }
        }
        if ($c) { print "\n"; }
        $nextlook = $now + 1000000000;
    }
    
}

The perl code keeps track of the ENTER/DONE ticks (which may occasionally arrive out of order) and, if a thread has been working on a request for too long (more than $PATIENCE seconds), runs pstack so you can see what’s going on.
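
To try it, save the D script as log.d (the perl expects to find it in the current directory), make it executable, and run the perl script with the pid of the webservd worker process as its argument, as root or with the appropriate dtrace privileges.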

I don’t actually have a suitably misbehaving application so I’ll leave it at that. If I had a real application issue, it’d be useful to fine tune the dtrace script to key off of more specific entry and exit points and it’d also be useful to trigger more app-specific data gathering instead of (or in addition to) the pstack call (for instance, checking database availability if you suspect a database response problem, or whatever is suitable for your concrete application).

dtrace is like Lego blocks; there are a thousand and one ways of coming up with something similar. Care to try an alternative or more efficient approach?
