Goodbye my Sun

Goodbye my Sun.

(I don’t eat fish and the Douglas Adams quote is overused, anyway.)

I joined Sun Microsystems on March 10, 1997, so I was just about to complete thirteen years at Sun when the acquisition happened. It may have had its highs and lows but all in all it was a wonderful ride at a legendary company. Having been part of Sun was a dream fulfilled.

I’ll always remember Sun for the relentless innovation and the highest standards of technical excellence. Sun was a place where engineering reason prevailed over bureaucracy, assumptions were questioned, and critical thinking counted for more than mere paperwork. We may not have marketed it very well, but it sure was the best technology on Earth.

I worked on many projects over the years but the longest running and most special one was the Web Server [iPlanet|SunONE|JES|SJS – nobody loved continuous product renaming like Sun!]. Always a small team, but with the highest passion for producing the best code on Earth, even against all odds. Thanks for the good memories, Web Server team!

All wonderful things come to an end, I suppose; so did Sun, and so does this blog now. Hopefully the articles will continue to be here for future reference, but if not, I have also made them available on my own site at http://virkki.com/jyri/articles/.

Interesting challenges have fallen my way so it is time to pursue them.

You know where to find me. Keep in touch!


Joining the ZFS Revolution

For a long time now I’ve been meaning to migrate my home file storage over to a ZFS server but the project kept getting postponed due to other priorities. Finally it’s alive!

For the last ten years or so my home fileserver has been a general purpose debian box in the garage. It has three disks: one for the system and home directories, a larger one which gets exported over NFS, and the largest one which backs up the other two (nightly rsync). It has been an adequate solution, insofar as I’ve never lost data. But whenever a disk dies I always have several days of downtime and have to scramble to restore from backups and maybe reinstall.

There are many articles about this topic that make for good reading if you’re considering the same. My goals were:

1. Data reliability, above all.

Initially I had visions of maximizing space, mainly for the geek value of having many terabytes of home storage. But in the end, I don’t really need that much. The NFS export drive on my debian box was only 500GB, and that was used not only by the shared data (pictures, mostly, and documents) but also for MythTV storage. Since I wasn’t planning on moving the MythTV data to the ZFS pool, even 500GB would be plenty adequate for some time.

2. Low power consumption.

Since this is another server that’ll need to run 24/7, I wanted to keep an eye on the power it uses.

3. But useful for general computing.

Since this will be the only permanent (24/7) OpenSolaris box on my home network, I also wanted to be able to use it for general purpose development work and testing whenever needed. So despite the goal of low power consumption, I didn’t want to go all out with the lowest possible power setup; I needed a compromise.

Here’s the final setup:

CPU: AMD Phenom II X4 (quad core) 925. Reasonable power consumption and the quad cores give me something fun to play with.

Memory: 8GB ECC memory. Since I’m going primarily for data reliability, might as well go with ECC.

ZFS pool: 3 x 1TB drives. These are in a mirror setup, so total storage is just 1TB. That’s still about three times as much as I really need right now. With three drives, even if two fail before I get to replace them I should be ok. I got each of the three drives from a different manufacturer; hopefully that’ll make them fail at different times.

        NAME        STATE     READ WRITE CKSUM
        represa     ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c8d0    ONLINE       0     0     0
            c8d1    ONLINE       0     0     0
            c9d0    ONLINE       0     0     0
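
For reference, a three-way mirror like this can be created with a single command. A sketch using my device names (yours will differ):

% zpool create represa mirror c8d0 c8d1 c9d0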

System disk: I expected to just use one older drive I had on the shelf, but after installing it I found it was running hot. Maybe it is ok, but I decided to do a two-way mirror of the rpool as well; maybe it’ll save me some time down the road. I don’t need much space here, so I found the cheapest drive I could get ($40) to add to the rpool. At that price, might as well mirror!

        NAME         STATE     READ WRITE CKSUM
        rpool        ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c9d1s0   ONLINE       0     0     0
            c10d1s0  ONLINE       0     0     0
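
Since the second disk was an afterthought, attaching it to the existing rpool is the natural route. Roughly like this (assuming the new drive is c10d1 with a matching slice 0 laid out; on x86 the new disk also needs the boot blocks installed):

% zpool attach rpool c9d1s0 c10d1s0
% installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c10d1s0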

Total power consumption for the box hovers around 78-80W most of the time.



Sun!

So here I sit, at the very end of Sun Microsystems.

(oblink to James Gosling’s entry)

Who would’ve thought!

Close to twenty years ago the university received a shiny new SPARCserver/390. Sure, we had other hardware from HP (HP-UX, ugh) and IBM (AIX, even worse!) but that 390 with SunOS was special. I cajoled my way into being the sysadmin for that lab mainly so I could get unlimited play time with it.

Later, after finishing grad school, I ended up elsewhere but Sun was still the coolest company on Earth. I quickly “found” (not by accident) myself with a SPARCstation 10 which later became a 20 and so on… Today my ‘desktop’ is a SunFire server but since it is insanely noisy I keep it in a lab and display through a SunRay in my office.

Inevitably, I later ended up here at Sun (coincidentally, right when Bellcore got acquired) and the engineering culture was as great inside as the products were cool from a customer perspective (as to the management side of the company, the less said the better I suppose). So here we are, at the Sunset of it all. Now, a very red sunrise.

So, what’s next for Sun Web Server?


More Thoughts on Web Server 7 and TLS Vulnerability

Please read my article on Web Server 7 and the TLS vulnerability for background and recommendation on this issue.

In this entry I’ll add some random thoughts and examples to illustrate the problem. The ideas in this entry are purely experimental.

Is My Web Application Vulnerable?

You may be tempted to wonder: even if the SSL/TLS connection to your Web Server is vulnerable to the renegotiation attack, maybe your web application cannot be exploited?

While technically the answer is “not necessarily”, for most real web sites which exist today the answer is usually yes. Unless your web site is firmly entrenched in 1994 (nothing but static content and no processing of user input of any kind), a clever attacker can surely find ways to cause mischief (or worse) using this vulnerability. So I’d like to discourage attempting to talk yourself into believing your site is safe. Instead, upgrade to Web Server 7.0u7.

As one example, shortly after the vulnerability was made public, it was used to grab Twitter passwords.

As noted earlier, at a high level, the attack relies on the MITM attacker being able to interact with your Web Server to pre-establish some state and then trick the legitimate client into executing actions based off that state in their name.

What this means in practice will vary widely, depending on what your web application does and how it processes input.

To answer with certainty whether your web application can be successfully exploited requires analyzing in detail how the web application handles user input and keeps state, so it is not possible to give a universal answer. However, given the complex and often unintended access points available into most web applications, it is safest to assume there are exploitable vulnerabilities unless you can prove otherwise.

A Textbook Worst Case

As one example, consider a simple banking web site subdivided as follows:

/index.html             Welcome page, no authentication required
/marketing/*            Company info, no authentication required
/clients/info.jsp       Show customer balance info, client-cert auth needed
/clients/payment.jsp    Form to initiate payments, client-cert auth needed
/clients/do-payment.jsp Process payment, client-cert auth needed

The site expects users to enter via the home page and click on a link which takes them to the protected area under /clients, at which point the Web Server requires client cert authentication to proceed. Once authenticated, the client can view their account balances or click on the payment page (payment.jsp), which contains a form to enter payment info (dollar amount, recipient, etc). The form will do a POST to do-payment.jsp, which actually processes the payment and removes the money from the customer account.

Exploiting the renegotiation vulnerability with this site is trivially easy:

  1. Looking to check their balance, a legitimate client sends a request for /clients/info.jsp (the user probably had it bookmarked).
  2. The MITM attacker sends a POST to do-payment.jsp with the amount/recipient info of their choosing.
  3. Because do-payment.jsp requires client authentication, the Web Server triggers a renegotiation and asks for a client certificate.
  4. The attacker hands off the connection to the legitimate client.
  5. The legitimate client provides a valid client certificate and the renegotiation succeeds.
  6. The Web Server executes do-payment.jsp with the POST data sent earlier (by the attacker) and returns a transaction confirmation to the legitimate client.
  7. The user panics! Why did the bank just make a payment from my account to some unknown entity when all I requested was the balance page?!

This is a very real scenario. I have seen customer web applications doing something precisely analogous to what I describe above.

Application Layer Protection

Is it at all possible to take some precautions against the exploit at the application layer?

The renegotiation vulnerability is recent as of this writing and there have not been too many exploits in the wild yet. History teaches us that over time, vulnerabilities will be exploited in ways far more clever than anyone predicted at first. Given that we are barely in the initial stages of the adoption curve (so to speak) of this vulnerability, I’m only prepared to predict that we haven’t seen its more devious applications yet.

For completeness, I’ll share some thoughts on application layer protections. If your web application is handling anything important (and it probably is, since it is running https), I wouldn’t today recommend relying on a purely application layer protection.

Conceptually, your web application should not trust any input it received before the “hand off” renegotiation for the purpose of taking actions after it (i.e. do-payment.jsp should not process the POST data it received before the renegotiation to complete a payment after it).

Unfortunately, while that is easy to say, it is impossible to implement! That is because your web application has no way to know that a “hand off” renegotiation occurred. The Web Server itself does not necessarily know either. Remember the renegotiation may occur at any time and it happens directly at the SSL/TLS layer, invisible to both Web Server and application.

How about if we lower our goal and rephrase the guideline: the web application should not trust any input it received before successful authentication was complete for the purpose of taking actions after it. Since the web application does have access to authentication data (or the lack of it), it becomes plausible to implement some defenses based on that knowledge. Is this lowered bar sufficient to protect against all attacks using the renegotiation vulnerability?

Picture a shopping web site with a flow like this:

  1. populate cart with items
  2. click on checkout (assume the site has payment info stored already)
  3. authenticate
  4. order is processed

Here the renegotiation is triggered at step 3, so when the legitimate client logs in they suddenly get an order confirmation screen for something they didn’t order.

The flow could be restructured to be:

  1. populate cart with items
  2. click on checkout (assume the site has payment info stored already)
  3. authenticate
  4. present cart contents and order info again for review
  5. if user agrees again, order is processed

Here the legitimate user would enter the flow at step 3 and then see the unexpected order confirmation screen at which point they get a chance to hit cancel.

Do not be overconfident about such ordering. Just because the developer intended the request flow to be

info.jsp -> payment.jsp -> do-payment.jsp

nothing actually prevents the attacker from carefully crafting a request straight to do-payment.jsp. Paranoid enough yet?

Defensive web programming is a difficult problem, one that many web applications get very wrong. It is a vast topic, now made that much more difficult by the SSL/TLS renegotiation vulnerability. So I’ll leave it at that for the moment.

So in closing, I’ll just repeat that I’d like to discourage attempting to talk yourself into believing your site is safe. It probably is not. Instead, upgrade to Web Server 7.0u7.



Web Server 7 and the TLS renegotiation vulnerability

Web Server 7 and the SSL/TLS Vulnerability (CVE-2009-3555)

The recent SSL/TLS protocol vulnerability has been thoroughly covered in the press. Refer to the above link for the formal vulnerability report and refer to any one of many articles on the web for commentary on it.

While the vulnerability is at the SSL/TLS protocol level and impacts all products which support SSL/TLS renegotiation, this article covers it only from the Web Server angle.

Please keep in mind this is not a bug in the Web Server nor a bug in NSS. It is a flaw in the SSL/TLS protocol specification itself.

What is the vulnerability?

To quote from the CVE report:

[The protocol] 'does not properly associate renegotiation
handshakes with an existing connection, which allows
man-in-the-middle attackers to insert data into HTTPS sessions,
and possibly other types of sessions protected by TLS or SSL, by
sending an unauthenticated request that is processed retroactively
by a server in a post-renegotiation context, related to a
"plaintext injection" attack'

In terms of the Web Server, this means that the MITM (man-in-the-middle) attacker may interact with the web application running on the Web Server for a while and later “hand off” the same SSL/TLS session to the legitimate client in such a way that, as far as the Web Server is concerned, it was the same [legitimate] client all along.

This “hand off” occurs when a renegotiation is done on the SSL/TLS connection. Note that renegotiation may be triggered by either the client (attacker) or the Web Server. The protocol is vulnerable either way, regardless of which party triggers the renegotiation (contrary to some popular belief).

A key point is that the vulnerability is at the SSL/TLS protocol level, in other words, at a lower level than the HTTP connection layer. Even if your Web Server is not configured to ever perform renegotiation explicitly, renegotiation can still occur and thus your site can still be vulnerable. There is nothing you can do to configure Web Server (prior to 7.0u7) to disable renegotiation from happening.

This is why you must upgrade to Web Server 7.0u7 (or later).

The rest of this article goes into more detail for the curious, but the bottom line remains that it is time to upgrade to Web Server 7.0u7 (or later).

Is My Web Server Vulnerable?

If you are not using https at all, your site is not vulnerable. (Of course, if the site is sending and receiving any sensitive data in clear text it is vulnerable to plenty of other problems, just not this one!)

If your Web Server (pre-7.0u7) is configured to use https and it is not configured to require client-auth, it is open to the renegotiation attack, period.

If client-auth is ‘required’ then that server is not vulnerable. Specifically, you are safe only if the http-listener has this in its configuration:

<http-listener>
...
<ssl>
...
<client-auth>required</client-auth>
...
</ssl>
...
</http-listener>

When client-auth is ‘required’ it means that the Web Server will require the client to provide a valid certificate as part of the initial handshake when establishing the SSL/TLS connection. If no valid client certificate is provided at that point, the connection is never established. Because the HTTP-level connection is never created, there is no window of opportunity for the attacker to send data before the client authentication takes place, defeating the attack.

In short, if you are running Web Server 7.0u6 or earlier and using https the only way to remain safe from this attack is to set <client-auth>required</client-auth> on all the <http-listener> elements which use <ssl>.

Unfortunately there is a significant drawback to doing this. Now all the content on your https site requires client-cert authentication. Clients who access the site without a valid certificate not only cannot read even the home page, they also can’t even get a useful error page. Because the connection attempt is rejected before it ever gets established at the HTTP level, it is not possible for the Web Server to redirect the client to a helpful error page. Your site is safe but the user experience will most likely not be acceptable.

Web Server 7.0u7 – What’s New

Earlier I pointed out that there is nothing you can do to disable renegotiation from occurring. Even if the Web Server is never configured to trigger renegotiation, it can still happen transparently thus it remains vulnerable.

Web Server 7.0u7 includes the latest release of NSS (NSS 3.12.5). The significant change in this release is that SSL/TLS renegotiation is completely disabled. Any attempt to trigger renegotiation (whether initiated by the Web Server itself or by the remote client) will cause the connection to fail.

The good news is that by simply upgrading to Web Server 7.0u7 your site is now automatically safe from this vulnerability.

Whether there is any bad news depends on whether your site had a legitimate need for renegotiation. If it did not, there is no bad news. Your site is now safe from this vulnerability and everything continues to work as before.

On the other hand if your site did make use of renegotiation, that capability is now broken.

Does My Site Use Renegotiation?

There is no check box anywhere that says the server needs or does not need renegotiation, so until this vulnerability became public you may not have given any thought to whether your Web Server configuration is using renegotiation.

The Web Server uses renegotiation when the web application is configured to require a client certificate for some parts of the content but not for all. This permits the client to request the anonymous areas without presenting a client certificate. If the client clicks on a link to a protected area the Web Server then triggers a renegotiation to obtain the client certificate.

There are a couple of ways to configure this in Web Server 7:

  • Using get-client-cert in obj.conf
    If obj.conf contains a fn="get-client-cert" dorequest="1" line, that is going to trigger renegotiation to obtain the client certificate under some conditions (depending on where and how in obj.conf it is invoked).
  • From Java Servlets, using the CLIENT-CERT auth-method in web.xml:
    <login-config>
    <auth-method>CLIENT-CERT</auth-method>
    </login-config>

    Same as get-client-cert, this also triggers a renegotiation to obtain the client certificate only when needed. Refer to the Servlet specification for more info on web.xml.

If the server.xml <client-auth> element is not set to ‘required’ and your web application uses either of the above mechanisms to trigger the need for a client certificate for some parts of the application, then the Web Server is using renegotiation. This means this functionality will be broken after upgrading to Web Server 7.0u7.

Unfortunately there is no way around this. The current SSL/TLS renegotiation is fundamentally broken so it cannot be used safely.

But I Like It When My Web Site Is Vulnerable To Attacks!

Really?

If you absolutely must have renegotiation support, please reread this document from the top. There is no safe way to enable renegotiation, if you enable it your site is vulnerable.

If despite everything you still feel you must have the broken renegotiation support, it can be done as follows:

Environment variable: NSS_SSL_ENABLE_RENEGOTIATION

Values: "0" or "Never"     (or just "N" or "n")
                           the default setting; disables ALL renegotiation
        "1" or "Unlimited" (or just "U" or "u")
                           re-enables the old type of renegotiation and IS VULNERABLE

If you set NSS_SSL_ENABLE_RENEGOTIATION=1 in the environment from where you start the Web Server 7 instance, renegotiation will work as it did in Web Server 7.0u6 and earlier. Which is to say, you’ll be vulnerable to attacks again. Obviously, we never recommend doing this.
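
To spell it out, that means something along these lines in the shell from which the instance is started (the instance path here is only an example; adjust it to your install):

% NSS_SSL_ENABLE_RENEGOTIATION=1 /opt/SUNWwbsvr7/https-example.com/bin/startserv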

Other Possibilities

The current state is very unfortunate. Renegotiation was a useful mechanism for requesting client certificate authentication for only some parts of the web application. Now there is no way to do so safely. As noted earlier, this vulnerability is not a bug in the Web Server implementation of SSL/TLS; it is a fundamental flaw in the protocol specification. Therefore it can only be fixed at the protocol level (see the next section). Until that happens there is nothing the Web Server can do to provide a safe implementation, so it is a fact of life that renegotiation can no longer be used.

Here is one possibility which may ameliorate the limitation for some sites. It requires some site refactoring work but may offer relief (thanks to Nelson Bolyard of the NSS team for the idea):

Consider refactoring your https content into two separate http-listeners:

  • http-listener ls1: port 443 (standard SSL port), no client-auth
  • http-listener ls2: some other port (say, 2443), client-auth=required

Because you have upgraded to Web Server 7.0u7, listener ls1 is safe because renegotiation is disabled. Listener ls2 is also safe because it has client-auth=required.

Refactor your web application so that whenever a link into a protected area is accessed it is sent to https://example.com:2443/… (where example.com is your site) instead.

This allows clients to access the anonymous content on https://example.com/ and also allows requesting client certificate authentication when needed, on https://example.com:2443/, all while avoiding any use of renegotiation.
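
In server.xml the split would look roughly like this (a sketch only; your usual ssl settings, such as the server certificate nickname, go where elided):

<http-listener>
...
<name>ls1</name>
<port>443</port>
<ssl>
...
</ssl>
</http-listener>
<http-listener>
...
<name>ls2</name>
<port>2443</port>
<ssl>
...
<client-auth>required</client-auth>
</ssl>
</http-listener>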

If you decide to try this approach feel free to share your experiences on the Web Server forum. Keep in mind that if your Web Server is behind a reverse proxy or a load balancer or other such frontend, you’ll need to arrange so the proper ports are reached as needed.

The Future

Work is underway on an enhanced TLS renegotiation protocol which will not be susceptible to the vulnerability. For info refer to: http://tools.ietf.org/html/draft-ietf-tls-renegotiation-01

As soon as the work is complete and a stable implementation is released, a future update of Web Server 7 will contain support for this enhanced renegotiation. Further details on it will be documented at that time.

Keep in mind that both the server and the clients will need to be upgraded in order to communicate via the new protocol. While Web Server 7 will be upgraded as soon as possible and browsers which use NSS (such as Firefox) will likely also be upgraded promptly, there will remain a vast installed base of older browsers which will not be compatible with the enhancements for a long time. Some clients, such as those in embedded devices, may well never be upgraded. Therefore, a full transition to the new renegotiation will take considerable time.


Request Processing Capacity

Q: How many requests per second can the Web Server handle?

Short answer: It depends.

Long answer: It really depends on many factors.

Ok, ok.. silliness aside, can we make any ballpark estimates?

The Web Server can be modeled as a queue. By necessity such modeling will be a simplification at best, but it may provide a useful mental model to visualize request processing inside the server.

Let’s assume your web application has a fairly constant processing time[1], so we’ll model the Web Server as an M/D/c queue where c is the number of worker threads. In this scenario, the Web Server has a maximum sustainable throughput of c / (processing time).

To use some simple numbers, let’s say your web app takes 1 second to process a request (that’s a very slow web application!). If the Web Server has c=128 worker threads, that means it can indefinitely sustain a max request rate of:

128/1 = 128 requests per second

This makes a lot of sense if we think about it:

  • At t = 0 seconds, 128 requests come in and each one is taken by a worker thread, fully utilizing server capacity.
  • At t = 1 second, all those requests complete and responses are sent back to the clients, and at the same time 128 new requests come in and the cycle repeats.

At this request rate we don’t need a connection queue at all[2] because all requests go straight to a worker thread. This also means that at this request rate the response time experienced by the end user is always 1 second.

To expand on that, the response time experienced by the end user is:

end user response time = (connection queue wait time) + (processing time)

Since we’re not using the connection queue the end user response time is simply the same as the processing time[3].

So far so good. Now, what happens if the incoming request rate exceeds the maximum sustainable throughput?

  • At t = 10 seconds, 129 requests come in. 128 go straight to worker threads, 1 sits in wait in the connection queue.
  • At t = 11 seconds, 128 requests come in. 128 (the one which was waiting + 127 of the new ones) go straight to worker threads, 1 sits in wait in the connection queue.

The connection queue absorbs the bumps in the incoming request rate, so connections are not dropped and worker threads can remain fully utilized at all times. Notice that now out of every 128 requests, one of them will have a response time of 2 seconds.

So what happens next?

If we go back to receiving a steady 128 requests per second, there will always be one request in the connection queue.

If at some point we receive only 127 requests (or fewer), the server can “catch up” and the connection queue goes back to staying empty.

On the other hand, if the incoming request rate remains at 129 per second we’re in trouble! Every second the connection queue waiting list will grow longer by one. When it reaches 129 entries, one end user will experience a response time of three seconds, and so on.

And of course, the connection queue is not infinite. If the max connection queue size is 4096, then 4096 seconds later it will fill up and from that point onwards one incoming request will simply be dropped every second, since it has no place to go. At this point the server has reached a steady state. It continues processing requests at the same rate as always (128 per second), it continues accepting 128 of the 129 new requests per second, and it drops one. End users are certainly unhappy by now because they are experiencing response times of over 30 seconds (4096 / 128 = 32, so it takes 32 seconds for a new request to work its way through the queue). Almost like going to the DMV…

If the incoming request rate drops below the maximum sustainable rate (here, 128/sec) only then can the server start to catch up and eventually clear the queue.

In summary, while this is certainly a greatly simplified model of the request queue behavior, I hope it helps visualize what goes on as request rates go up and down.

Theory aside, what can you do to tune the web server?

  • The single best thing to do, if possible, is to make the web app respond quicker!
  • If you want to avoid dropped connections at all cost, you can increase the connection queue size (see the server.xml sketch after this list). This will delay the point where the server reaches a steady state and starts dropping connections. Whether this is useful really depends on the distribution of the incoming requests. In the example above we’ve been assuming a very steady incoming rate just above the maximum throughput rate. In such a scenario increasing the connection queue isn’t going to help in practice because no matter how large you make it, it will fill up at some point. On the other hand, if the incoming request rate is very bumpy, you can damp it by using a connection queue large enough to avoid dropping connections. However… consider the response times as well. In the example above your end user is already seeing 33 second response times. Increasing the connection queue length will prevent dropped connections but will only make the response times even longer. At some point the user is simply going to give up, so increasing the connection queue any further won’t help!

  • Another option is to increase the number of worker threads. Whether this will help or hurt depends entirely on the application. If the request processing is CPU bound then it won’t help (actually, if it were truly CPU bound, which is rare, then you’ll probably benefit from reducing the number of worker threads unless your server has 128+ CPUs/cores…). If the web app spends most of its time just waiting for I/O then increasing the worker threads may help. No set answer here; you need to measure your application under load to see.
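
For reference, both of these knobs live in the thread-pool element of server.xml. A sketch using the example numbers from above (exact element defaults vary by release, so treat this as illustrative):

  <thread-pool>
    <max-threads>128</max-threads>
    <queue-size>4096</queue-size>
  </thread-pool>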

[1] In reality the response time can’t be deterministic. At best it may be more or less constant up to the point where the server scales linearly, but after that the response time is going to increase depending on load. On the flip side, caching might make some responses faster than expected. So M/D/c is certainly a simplification.

[2] Not true for several reasons, but it’ll do for this simplified model and it helps to visualize it that way.

[3] Plus network transmission times but since we’re modeling only the web server internals let’s ignore that.


Web Server 7 Request Limiting Revisited

Coincidentally last week I heard a couple related queries about check-request-limits from different customers. I haven’t covered that feature in a while so it’s a good time to revisit it for a bit.

To review, Web Server 7 has a feature (function) called check-request-limits which can be used to monitor and limit the request rate and/or concurrency of requests which match some criteria. It can be used to address denial of service attacks, as well as to limit request rates to some objects or from some clients for other reasons (for example, to reduce bandwidth or CPU usage).

I usually refer to ‘matching requests’ when speaking of this capability. Matching what? Probably the most common use case is to match the client IP address. This is useful when you wish to limit request rates coming from a given client machine. Here’s a basic example of that scenario:


PathCheck fn="check-request-limits" max-rps="10" monitor="$ip"

The common theme to both customer requests I heard last week was whether it is possible to limit requests based on something other than the client IP?

Yes, certainly!

The monitor parameter above is set to “$ip”, which expands to the client IP address, but you can set it to anything that you prefer. In my introduction to check-request-limits article I gave examples of both “$ip” and “$uri” (and even both combined). You’re not restricted to only these, though; you can use any of the server variables available in WS7 as the monitor value.

You can also construct more complicated scenarios using the If expressions of Web Server 7. I gave a few examples of that in this article on check-request-limits.
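
For instance, a hypothetical obj.conf fragment (path and numbers made up) which rate-limits each client IP only under a /downloads area might look like this:

<If $uri =~ "^/downloads">
PathCheck fn="check-request-limits" max-rps="5" monitor="$ip"
</If>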

To give a couple more examples, let’s say your web server is behind a proxy and thus the client $ip is always the same (the proxy IP). Clearly monitoring the $ip value isn’t terribly useful in that case. Depending on how your application works you may be able to find other useful entries to monitor. For example, if the requests contain a custom header named “Usernum” which contains a unique user number, you could monitor that:

PathCheck fn="check-request-limits" max-rps="1" monitor="$headers{'usernum'}"

Or maybe there’s a cookie named customer which can serve as the monitor key:


PathCheck fn="check-request-limits" max-rps="1" monitor="$cookie{'customer'}" 

These two are made-up examples; you’ll need to pick a monitor value which is suitable for your application. But I hope these ideas will help you get started.

By the way, check-request-limits can also be used to limit concurrency.
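
That uses the same function with a different limit parameter. For example, something along these lines (made-up number) would cap matching requests at two concurrent ones per client IP:

PathCheck fn="check-request-limits" max-connections="2" monitor="$ip"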


What’s Taking So Long

While Sun’s Web Server has a very nice threading model, once a worker thread is processing a specific request it will continue working on that request even if it takes a while or blocks.

This is rarely an issue. Static content is served very quickly, and code which generates dynamic application content needs to be written so it responds promptly. If the application code takes a long time to generate response data, the site has more problems than one, so the web application developers have a motivation to keep it snappy.

But what if you do have a bad application which occasionally does take a long time? As requests come in and worker threads go off to process them, each long-running request ties up another worker thread. If requests are coming in faster than the application code can process them, eventually the Web Server will have all its worker threads busy on existing connections.

As you can infer from Basant’s blog entry, the server will still continue accepting new connections because the acceptor thread(s) are separate from the worker threads. But there won’t be any spare worker threads to take those new connections from the connection queue.

If you’re the client making the request, you’ll experience the server accepting your request but it won’t answer for a (possibly long) while. Specifically, until one of the previous long-running requests finally completes and a worker thread frees up to take on your request (and of course, there may be many other pending requests piled up).

If this is happening with your application, one option is to check the perfdump output and see which requests are taking a while. But, as these things are bound to do, it’ll probably happen sporadically and never when you’re watching.

So how can we easily gather a bit more info? It’s been said countless times but it’s always worth repeating: dtrace really is the greatest thing since sliced bread (and I like bread). I can’t imagine attempting to maintain a system without dtrace in this day and age; it would be limiting beyond belief! One of the many key benefits is being able to gather arbitrary data right from the production machine without any prior preparation (such as producing debug builds) or downtime, or even any access to the sources you’re diagnosing.

So in that spirit, I tried to gather a bit more data about the requests which appear to be taking a while, using dtrace and without attempting to look at what the code is actually doing (well, also because I only had fairly limited time to dedicate to this experiment so didn’t want to go looking at the code ;-). Although, I should mention, since Sun’s Web Server is open source you certainly could go review the source code if you wish to know more detail.

So what am I looking for? Basically I’d like to know when the worker thread starts on a request and when it is done with it. If the time between those two grows “too long”, I’d like to see what’s going on. Sounds simple enough. Searching around a bit I saw Basant’s article on dtrace and Web Server, so using his pid$1::flex_log:entry as an exit point seems like a suitable thing to try. I didn’t find (on a superficial search, anyway) a mention of an adequate entry point, so instead I took a number of pstack snapshots, looked for something useful there, and wound up selecting “pid$1::__1cLHttpRequestNHandleRequest6MpnGnetbuf_I_i_:entry” (an ugly mangled C++ function name). With that, I ran the following dtrace script on the Web Server process:

% cat log.d
#!/usr/sbin/dtrace -qs

pid$1::__1cLHttpRequestNHandleRequest6MpnGnetbuf_I_i_:entry
{
  self->begin = timestamp;
  printf("ENTER %d, %d to nn", tid, self->begin);
}

pid$1::flex_log:entry
/self->begin/
{
  self->end = timestamp;
  printf("DONE %d, %d to %dn", tid, self->begin, self->end);
  self->begin = 0;
}

This gets me entry/exit tick marks as the threads work their way through requests. On a mostly unloaded server it’s easy enough to just watch that output, but then you’re probably not experiencing this problem on an unloaded server. So we need a little bit of helper code to track things for us. Twenty minutes of perl later, I have:

#!/usr/bin/perl

$PATIENCE = 9;                  # seconds - how long until complains start

$pid = shift @ARGV;
$now = 0;
$npat = $PATIENCE * 1000000000;

open(DD, "./log.d $pid |");
while (<DD>)
{
    chomp;
    ($op, $tid, $t1, $t2) = /(\S*) (\d*), (\d*) to (.*)/;
    if ($t1 > $now) { $now = $t1; }

    # dtrace output can be out of order so include start time in hash key
    $key = "$tid:$t1";          

    if ($op eq "ENTER") {
        if ($pending{$key} != -1) {
            $pending{$key} = $t1 + $npat; # value is deadline time
        }

    } else {
        $took = (($t2 - $t1) / 1000000000);
        if (!$pending{$key}) {
            $pending{$key} = -1; # if DONE seen before ENTER, just ignore it
        } else {
            delete $pending{$key};
        }
    }

    # Once a second, review which threads have been working too long
    # and do a pstack on those.
    # 
    # Note: we only reach here after processing each line of log.d output
    # so if there isn't any more log.d output activity we'll never get here.
    # A more robust implementation is left as an exercise to the reader.
    #
    if ($now > $nextlook) {
        $c = 0;
        foreach $k (keys %pending)
        {
            if ($pending{$k} != -1 && $pending{$k} < $now) {
                ($tid, $started) = $k =~ /(\d*):(\d*)/;
                $pastdue = ($now - $started) / 1000000000;
                print "=================================================n";
                system("date");
                print "Thread $tid has been at it $pastdue secondsn";
                system("pstack $pid/$tid");
                $c++;
            }
        }
        if ($c) { print "\n"; }
        $nextlook = $now + 1000000000;
    }
    
}

The perl code keeps track of the ENTER/DONE ticks (which may occasionally be out of order) and, if too long (more than $PATIENCE) goes by, gives you pstack output showing what’s going on.

I don’t actually have a suitably misbehaving application so I’ll leave it at that. If I had a real application issue, it’d be useful to fine tune the dtrace script to key off of more specific entry and exit points and it’d also be useful to trigger more app-specific data gathering instead of (or in addition to) the pstack call (for instance, checking database availability if you suspect a database response problem, or whatever is suitable for your concrete application).

dtrace is like lego blocks; there are a thousand and one ways of coming up with something similar. Care to try an alternative or more efficient approach?


Web Server 7 Meets Slowloris

Lately there’s been some noise about slowloris, a perl script which sends HTTP requests slowly. While there’s nothing new about this technique, I’ve been asked about it a few times so I wanted to show how easy it is to protect against it if you’re lucky enough to be using Sun’s Web Server 7.

In a nutshell, the script opens a connection to the target web server and sends valid request headers and then continues to send more headers, slowly. Specifically, it first sends:

GET / HTTP/1.1
Host: $hostname
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0;
    .NET CLR 1.1.4322; .NET CLR 2.0.503l3; .NET CLR 3.0.4506.2152;
    .NET CLR 3.5.30729; MSOffice 12)
Content-Length: 42
X-a: b

Then it continues to send:

X-a: b

after every $timeout delay. It has a default $timeout of 100 seconds, but you can change this with the -timeout switch.
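
For reference, a typical run looks something like this (flag names as listed in the slowloris script’s usage; check your copy):

% perl slowloris.pl -dns target.example.com -port 80 -timeout 100 -num 500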

Let’s look at the more general cases here instead of just slowloris specifically.

The most rudimentary form of this attack is to open a connection to the web server and either not send anything, or send a partial request and nothing else after that (as described above, this is not what slowloris does).

You’ll want your web server to eventually time out and close the connection if this happens. In Web Server 7 this is controlled by the io-timeout element in server.xml. The default value is 30 (seconds). Let’s try it:

% date;telnet localhost 80;date
Mon Jun 29 19:05:43 PDT 2009
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Connection to localhost closed by foreign host.
Mon Jun 29 19:06:14 PDT 2009

As you can see, 31 seconds went by before the connection was closed. You can change io-timeout to be shorter if you wish:

  <http>
    <io-timeout>15</io-timeout>
  </http>


% date;telnet localhost 8080;date
Mon Jun 29 19:15:12 PDT 2009
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Connection to localhost closed by foreign host.
Mon Jun 29 19:15:27 PDT 2009

Above I changed the io-timeout to 15 and indeed it took 15 seconds before closing the mute connection. Let’s try the same thing but send a partial request:

% date;telnet localhost 8080;date
Mon Jun 29 19:14:38 PDT 2009
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET / HTTP/1.1
Host: localhost
HTTP/1.1 408 Request Timeout
Server: Sun-Java-System-Web-Server/7.0
Date: Tue, 30 Jun 2009 02:14:54 GMT
Content-length: 148
Content-type: text/html
Connection: close
<HTML><HEAD><TITLE>Request Timeout</TITLE></HEAD>
<BODY><H1>Request Timeout</H1>
The server timed out waiting for the client request.
</BODY></HTML>Connection to localhost closed by foreign host.
Mon Jun 29 19:14:54 PDT 2009

Ok, let’s try to make the attack more interesting. Instead of just going silent, the client can continue sending more request data, just slowly. This is what slowloris does. As long as the client sends a little bit of valid request data often enough to not get disconnected by the timeout it can hold on to the connection.

Fortunately Web Server 7 also monitors the time it takes to receive all the request headers. This can be configured using the request-header-timeout element in server.xml, which can be used to defeat a slowloris-type attack. Even though the slowloris request never actually completes (since it just keeps sending more headers forever), Web Server 7 will stop waiting and close it off after request-header-timeout seconds go by.

  <http>
    <request-header-timeout>5</request-header-timeout> 
  </http>

Of course, if you set request-header-timeout to 5s you could then run slowloris with a -timeout of less than 5 seconds. However, this quickly starts to defeat the premise of this style of attack. The idea behind a slowloris-style attack is to attempt to tie up the web server quietly without the client having to generate hundreds or more connections per second. For fun, I set my request-header-timeout to 1s and ran slowloris with a -timeout of 1s. The result is the client machine uses up all its CPU generating new connections while Web Server 7 continues to be happily responsive.

A variant of this attack is to send a POST request, send all the request headers and then start to send the body data, slowly. Note that slowloris does not implement this (the -httpready flag sends a POST instead of a GET, but it continues to send X-a: b request headers, not request body data). However, it is easy enough to write a tool to do this instead.

If you encounter that scenario you’re in luck because Web Server 7 also monitors the time for the request body to arrive and you can set a timeout on that as well, using the request-body-timeout element:

  <http>
    <request-body-timeout>5</request-body-timeout> 
  </http>

That’s all there is to protecting against slowloris and similar slow-client attacks if you’re using Sun Web Server 7! Enjoy!



Notes on Web Server Open Sourcing

Brian Aker wrote about the open sourcing of our web server and it got picked up on slashdot today.

I was reading through the comments and figured I’d throw in a few notes about what this code is and is not…

(I worked directly on the Web Server product for some years and while it is not my day job today, I’m still very closely affiliated with the group who works on the commercial version of this product inside Sun.)

  • First, the released code is not a snapshot of the Netscape Enterprise Server
    from the 90’s!
  • What it is, is a snapshot of the very latest source code for JES Web Server 7.0 (with some non-core parts removed, such as the administration infrastructure – see the full list of differences here).
  • The commercial version of this product is actively maintained and sold by Sun (note it is free to download and use, however – so feel free to download both the source and the commercial binaries and try/compare both, if you wish).
  • That said, the code is indeed a direct descendant of the Netscape Enterprise Server. The marketing name changes over the years have not marked rewrites of the core code, it’s been the same code all along.
  • While the revision history is not part of the open sourced snapshot (sorry), I can mention that in the internal repository of this code I see cvs comments dating back to 1995.
  • With over ten years of development and bug fixing a lot has changed, naturally. On the other hand, if you were involved with the original product way back then, you’ll definitely find some familiar bits and pieces here and there. As with any mature software product, there are always some parts which have not changed in ages.
  • So, while not a mummified snapshot, the code is indeed interesting as a piece of Internet history. Furthermore, it is also interesting as a modern living product.
  • Extreme scalability in multi-CPU (or multi-core) hardware is perhaps the most interesting angle from which to look at the code. (Funnily enough, with the rise of parallelism in modern hardware, maybe the code is becoming more interesting these days instead of less!)
  • As to who might be interested, or why, that doesn’t really have any one answer. If you find it interesting or useful for either reason (or some other of your own), enjoy! Being under the BSD license, there are many ways to take advantage of it.