What’s Taking So Long

While Sun’s Web Server has a very nice threading model, once a worker thread is processing a specific request it will continue working on that request even if it takes a while or blocks.

This is rarely an issue. Static content is served very quickly and code which generates dynamic application content needs to be written so it responds prompt ly. If the application code takes a long time to generate response data the site has more problems than one, so the web application developers have a motivation to keep it snappy.

But what if you do have a bad application which occasionally does take a long time? As requests come in and worker thread go off to process them, each long-running request ties up another worker thread. If requests are coming in faster than the application code can process them, eventually the Web Server will have all its worker threads busy on existing connections.

As you can infer from Basant’s blog entry, the server will still continue accepting new connections because the acceptor thread(s) are separate from the worker threads, so it is still accepting new connections. But there won’t be any spare worker threads to take that new connection from the connection queue.

If you’re the client making the request, you’ll experience the server accepting your request but it won’t answer for a (possibly long) while. Specifically, until one of the previous long-running requests finally completes and a worker thread frees up to take on your request (and of course, there may be many other pending requests piled up).

If this is happening with your application one option is to check the perfdump output
and see which requests are taking a while. But, as these things are bound to do, it’ll probably happen sporadically and never when you’re watching.

So how can we easily gather a bit more info? It’s been said countless times but always worth repeating.. dtrace really is the greatest thing since sliced bread (and I like bread). I can’t imagine attempting to maintain a system without dtrace in this day and age, it would be limiting beyond belief! One of the many key benefits is being able to gather arbitrary data right from the production machine without any prior preparation (such as producing debug builds) or downtime or even any access to the sources you’re diagnosing.

So in that spirit, I tried to gather a bit more data about the requests which appear to be taking a while using dtrace and without attempting to look at what the code is actually doing (well, also because I only had fairly limited time to dedicate to this experiment so didn’t want to go looking at the code ;-). Although, I should mention, since Sun’s
Web Server is open source you certainly could go review the source code if you wish to know more detail.

So what am I looking for? Basically I’d like to know when the worker thread starts on a request and when it is done with it. If the time between those two grows “too long”, I’d like to see what’s going on. Sounds simple enough. Searchingaround a bit I saw Basant’s article on dtrace and Web Server so using his pid$1::flex_log:entry as an exit point seems like a suitable thing to try. I didn’t find (on a superficial search, anyway) a mention of an adequate entry point so instead I took a number of pstack snapshots and looked for something useful there and wound up selecting “pid$1::__1cLHttpRequestNHandleRequest6MpnGnetbuf_I_i_:entry” (ugly mangled C++ function name). With that, ran the following dtrace script on the Web Server process:

% cat log.d
#!/usr/sbin/dtrace -qs

pid$1::__1cLHttpRequestNHandleRequest6MpnGnetbuf_I_i_:entry
{
  self->begin = timestamp;
  printf("ENTER %d, %d to nn", tid, self->begin);
}

pid$1::flex_log:entry
/self->begin/
{
  self->end = timestamp;
  printf("DONE %d, %d to %dn", tid, self->begin, self->end);
  self->begin = 0;
}

This gets me entry/exit tick marks as the threads work their way through requests. On a mostly unloaded server it’s easy enough to just watch that output, but then you’re probably not experiencing this problem on an unloaded server. So we need a little bit of helper code to track things for us. Twenty minutes of perl later, I have

#!/usr/bin/perl

$PATIENCE = 9;                  # seconds - how long until complains start

$pid = shift @ARGV;
$now = 0;
$npat = $PATIENCE * 1000000000;

open(DD, "./log.d $pid |");
while (<DD>)
{
    chomp;
    ($op, $tid, $t1, $t2) = /(S*) (d*), (d*) to (.*)/;
    if ($t1 > $now) { $now = $t1; }

    # dtrace output can be out of order so include start time in hash key
    $key = "$tid:$t1";          

    if ($op eq "ENTER") {
        if ($pending{$key} != -1) {
            $pending{$key} = $t1 + $npat; # value is deadline time
        }

    } else {
        $took = (($t2 - $t1) / 1000000000);
        if (!$pending{$key}) {
            $pending{$key} = -1; # if DONE seen before ENTER, just ignore it
        } else {
            delete $pending{$key};
        }
    }

    # Once a second, review which threads have been working too long
    # and do a pstack on those.
    # 
    # Note: we only reach here after processing each line of log.d output
    # so if there isn't any more log.d output activity we'll never get here.
    # A more robust implementation is left as an exercise to the reader.
    #
    if ($now > $nextlook) {
        $c = 0;
        foreach $k (keys %pending)
        {
            if ($pending{$k} != -1 && $pending{$k} < $now) {
                ($tid, $started) = $k =~ /(d*):(d*)/;
                $pastdue = ($now - $started) / 1000000000;
                print "=================================================n";
                system("date");
                print "Thread $tid has been at it $pastdue secondsn";
                system("pstack $pid/$tid");
                $c++;
            }
        }
        if ($c) { print "n"; }
        $nextlook = $now + 1000000000;
    }
    
}

The perl code keeps track of the ENTER/DONE ticks (which may occasionally be out of order) and if too long (more than $PATIENCE) goes by, gives you pstack output what’s going on.

I don’t actually have a suitably misbehaving application so I’ll leave it at that. If I had a real application issue, it’d be useful to fine tune the dtrace script to key off of more specific entry and exit points and it’d also be useful to trigger more app-specific data gathering instead of (or in addition to) the pstack call (for instance, checking database availability if you suspect a database response problem, or whatever is suitable for your concrete application).

dtrace is like lego blocks, there’s a thousand and one ways of coming up with something similar. Care to try an alternative or more efficient approach?