Web Server 7 and the TLS renegotiation vulnerability

Web Server 7 and the SSL/TLS Vulnerability (CVE-2009-3555)

The recent SSL/TLS protocol vulnerability has been thoroughly covered in the press. Refer to the above link for the formal vulnerability report and refer to any one of many articles on the web for commentary on it.

While the vulnerability is at the SSL/TLS protocol level and impacts all products which support SSL/TLS renegotiation, this article covers it only from the Web Server angle.

Please keep in mind this is not a bug in Web Server nor a bug in NSS. It is a flaw in the SSL/TLS protocol specification itself.

What is the vulnerability?

To quote from the CVE report:

[The protocol] 'does not properly associate renegotiation
handshakes with an existing connection, which allows
man-in-the-middle attackers to insert data into HTTPS sessions,
and possibly other types of sessions protected by TLS or SSL, by
sending an unauthenticated request that is processed retroactively
by a server in a post-renegotiation context, related to a
"plaintext injection" attack'

In terms of the Web Server, this means that the MITM (man-in-the-middle) attacker may interact with the web application running on the Web Server for a while and later “hand off” the same SSL/TLS session to the legitimate client in such a way that, as far as the Web Server is concerned, it was the same [legitimate] client all along.

This “hand off” occurs when a renegotiation is done on the SSL/TLS connection. Note that renegotiation may be triggered by either the client (attacker) or the Web Server. The protocol is vulnerable either way, regardless of which party triggers the renegotiation (contrary to popular belief).

A key point is that the vulnerability is at the SSL/TLS protocol level, in other words, at a lower level than the HTTP connection layer. Even if your Web Server is not configured to ever perform renegotiation explicitly, renegotiation can still occur and thus your site can still be vulnerable. There is nothing you can do to configure Web Server (prior to 7.0u7) to disable renegotiation from happening.

This is why you must upgrade to Web Server 7.0u7 (or later).

The rest of this article goes into more detail for the curious, but the bottom line remains that it is time to upgrade to Web Server 7.0u7 (or later).

Is My Web Server Vulnerable?

If you are not using https at all, your site is not vulnerable (of course, if the site is sending and receiving any sensitive data in clear text it is vulnerable to plenty of other problems, just not this one!)

If your Web Server (pre-7.0u7) is configured to use https and it is not configured to require client-auth, it is open to the renegotiation attack, period.

If client-auth is ‘required’ then that server is not vulnerable. Specifically, you are safe only if the http-listener has this in its configuration:

<http-listener>
...
<ssl>
...
<client-auth>required</client-auth>
...
</ssl>
...
</http-listener>

When client-auth is ‘required’ it means that the Web Server will require the client to provide a valid certificate as part of the initial handshake when establishing the SSL/TLS connection. If no valid client certificate is provided at that point, the connection is never established. Because the HTTP-level connection is never created, there is no window of opportunity for the attacker to send data before the client authentication takes place, defeating the attack.

In short, if you are running Web Server 7.0u6 or earlier and using https the only way to remain safe from this attack is to set <client-auth>required</client-auth> on all the <http-listener> elements which use <ssl>.

Unfortunately there is a significant drawback to doing this. Now all the content on your https site requires client-cert authentication. Clients who access the site without a valid certificate cannot read even the home page, nor can they receive a useful error page. Because the connection attempt is rejected before it is ever established at the HTTP level, it is not possible for the Web Server to redirect the client to a helpful error page. Your site is safe, but the user experience will most likely not be acceptable.

Web Server 7.0u7 – What’s New

Earlier I pointed out that there is nothing you can do to disable renegotiation from occurring. Even if the Web Server is never configured to trigger renegotiation, it can still happen transparently, and thus the server remains vulnerable.

Web Server 7.0u7 includes the latest release of NSS (NSS 3.12.5). The significant change in this release is that SSL/TLS renegotiation is completely disabled. Any attempt to trigger renegotiation (whether initiated by the Web Server itself or by the remote client) will cause the connection to fail.

The good news is that by simply upgrading to Web Server 7.0u7 your site is now automatically safe from this vulnerability.

Whether there is any bad news depends on whether your site had a legitimate need for renegotiation. If it did not, there is no bad news. Your site is now safe from this vulnerability and everything continues to work as before.

On the other hand if your site did make use of renegotiation, that capability is now broken.

Does My Site Use Renegotiation?

There is no check box anywhere that says the server needs or does not need renegotiation, so until this vulnerability became public you may not have given any thought to whether your Web Server configuration uses renegotiation.

The Web Server uses renegotiation when the web application is configured to require a client certificate for some parts of the content but not for all. This permits the client to request the anonymous areas without presenting a client certificate. If the client clicks on a link to a protected area the Web Server then triggers a renegotiation to obtain the client certificate.

There are a couple of ways to configure this in Web Server 7:

  • Using get-client-cert in obj.conf
    If obj.conf contains a fn="get-client-cert" dorequest="1" line, that will trigger renegotiation to obtain the client certificate under some conditions (depending on where and how in obj.conf it is invoked). See the sketch after this list.
  • From Java Servlets, using the CLIENT-CERT auth-method in web.xml:
    <login-config>
    <auth-method>CLIENT-CERT</auth-method>
    </login-config>

    As with get-client-cert, this also triggers a renegotiation to obtain the client certificate only when needed. Refer to the Servlet specification for more information on web.xml.

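For illustration only, here is a hypothetical obj.conf fragment of the first kind; the ppath pattern and the require parameter are assumptions for this sketch, not taken from any particular configuration:

<Object ppath="*/protected/*">
PathCheck fn="get-client-cert" dorequest="1" require="1"
</Object>

With something like this in place, a request for content under /protected causes the Web Server to renegotiate in order to obtain the client certificate, which is exactly the behavior that stops working once renegotiation is disabled.
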
If the server.xml <client-auth> element is not set to ‘required’ and your web application uses either of the above mechanisms to request a client certificate for some parts of the application, then the Web Server is using renegotiation, and that functionality will be broken after upgrading to Web Server 7.0u7.

Unfortunately there is no way around this. The current SSL/TLS renegotiation is fundamentally broken so it cannot be used safely.

But I Like It When My Web Site Is Vulnerable To Attacks!

Really?

If you absolutely must have renegotiation support, please reread this document from the top. There is no safe way to enable renegotiation; if you enable it, your site is vulnerable.

If despite everything you still feel you must have the broken renegotiation support, it can be done as follows:

Environment variable: NSS_SSL_ENABLE_RENEGOTIATION

Values:
  "0" or "Never" (or just "N" or "n"): the default setting; disables ALL renegotiation
  "1" or "Unlimited" (or just "U" or "u"): re-enables the old type of renegotiation and IS VULNERABLE

If you set NSS_SSL_ENABLE_RENEGOTIATION=1 in the environment from where you start the Web Server 7 instance, renegotiation will work as it did in Web Server 7.0u6 and earlier. Which is to say, you’ll be vulnerable to attacks again. Obviously, we never recommend doing this.
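
For example (a sketch only, assuming a default instance layout where the instance is started with its bin/startserv script; and again, we never recommend this):

% env NSS_SSL_ENABLE_RENEGOTIATION=1 ./bin/startserv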

Other Possibilities

The current state is very unfortunate. Renegotiation was a useful mechanism for requesting client certificate authentication for only some parts of the web application. Now there is no way to do so safely. As noted earlier, this vulnerability is not a bug in the Web Server implementation of SSL/TLS; it is a fundamental flaw in the protocol specification. Therefore it can only be fixed at the protocol level (see next section). Until that happens there is nothing the Web Server can do to provide a safe implementation, so it is a fact of life that renegotiation can no longer be used.

Here is one possibility which may ameliorate the limitation for some sites. It requires some site refactoring work but may offer relief (thanks to Nelson Bolyard of the NSS team for the idea):

Consider refactoring your https content into two separate http-listeners:

  • http-listener ls1: port 443 (standard SSL port), no client-auth
  • http-listener ls2: some other port (say, 2443), client-auth=required

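As a sketch (elements abbreviated in the same way as the earlier example; the listener names and the 2443 port are illustrative), the relevant server.xml portions might look like this:

<http-listener>
<name>ls1</name>
<port>443</port>
...
<ssl>
...
</ssl>
...
</http-listener>
<http-listener>
<name>ls2</name>
<port>2443</port>
...
<ssl>
...
<client-auth>required</client-auth>
...
</ssl>
...
</http-listener>
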
Because you have upgraded to Web Server 7.0u7, listener ls1 is safe because renegotiation is disabled. Listener ls2 is also safe because it has client-auth=required.

Refactor your web application so that whenever a link into a protected area is accessed it is sent to https://example.com:2443/… (where example.com is your site) instead.

This allows clients to access the anonymous content on https://example.com/ and also allows requesting client certificate authentication when needed, on https://example.com:2443/, all while avoiding any use of renegotiation.

If you decide to try this approach feel free to share your experiences on the Web Server forum. Keep in mind that if your Web Server is behind a reverse proxy or a load balancer or other such frontend, you’ll need to arrange for the proper ports to be reached as needed.

The Future

Work is underway on an enhanced TLS renegotiation protocol which will not be susceptible to the vulnerability. For info refer to: http://tools.ietf.org/html/draft-ietf-tls-renegotiation-01

As soon as the work is complete and a stable implementation is released, a future update of Web Server 7 will contain support for this enhanced renegotiation. Further details on it will be documented at that time.

Keep in mind that both the server and the clients will need to be upgraded in order to communicate via the new protocol. While Web Server 7 will be upgraded as soon as possible and browsers which use NSS (such as Firefox) will likely also be upgraded promptly, there will remain a vast installed base of older browsers which will not be compatible with the enhancements for a long time. Some clients, such as those in embedded devices, may well never be upgraded. Therefore, a full transition to the new renegotiation will take considerable time.

Posted in Sun

Request Processing Capacity

Q: How many requests per second can the Web Server handle?

Short answer: It depends.

Long answer: It really depends on many factors.

OK, OK... silliness aside, can we make any ballpark estimates?

The Web Server can be modeled as a queue. By necessity such modeling will be a simplification at best, but it may provide a useful mental model to visualize request processing inside the server.

Let’s assume your web application has a fairly constant processing time[1], so we’ll model the Web Server as an M/D/c queue where c is the number of worker threads. In this scenario, the Web Server has a maximum sustainable throughput of c / (processing time).

To use some simple numbers, let’s say your web app takes 1 second to process a request (that’s a very slow web application!). If the Web Server has c=128 worker threads, that means it can indefinitely sustain a max request rate of:

128/1 = 128 requests per second

This makes a lot of sense if we think about it:

  • At t = 0 seconds, 128 requests come in and each one is taken by a worker thread, fully utilizing server capacity.
  • At t = 1 second, all those requests complete and responses are sent back to the client and at the same time 128 new requests come in and the cycle repeats.

At this request rate we don’t need a connection queue at all[2] because all requests go straight to a worker thread. This also means that at this request rate the response time experienced by the end user is always 1 second.

To expand on that, the response time experienced by the end user is:

end user response time = (connection queue wait time) + (processing time)

Since we’re not using the connection queue the end user response time is simply the same as the processing time[3].

So far so good. Now, what happens if the incoming request rate exceeds the maximum sustainable throughput?

  • At t = 10 seconds, 129 requests come in. 128 go straight to worker threads, 1 sits in wait in the connection queue.
  • At t = 11 seconds, 128 requests come in. 128 (the one which was waiting + 127 of the new ones) go straight to worker threads, 1 sits in wait in the connection queue.

The connection queue absorbs the bumps in the incoming request rate, so connections are not dropped and worker threads can remain fully utilized at all times. Notice that now out of every 128 requests, one of them will have a response time of 2 seconds.

So what happens next?

If we go back to receiving a steady 128 requests per second, there will always be one request in the connection queue.

If at some point we only receive 127 requests (or less), the server can “catch up” and the connection queue goes back to staying empty.

On the other hand, if the incoming request rate remains at 129 per second we’re in trouble! Every second the connection queue waiting list will grow longer by one. When it reaches 129 entries, one end user will experience a response time of three seconds, and so on.

And of course, the connection queue is not infinite. If the max connection queue size is 4096 then 4096 seconds later it will fill up and from that point onwards, one incoming request will simply be dropped every second since it has no place to go. At this point the server has reached a steady state. It continues processing requests at the same rate as always (128 per second), accepting 128 of the 129 new requests per second and dropping one. End users are certainly unhappy by now because they are experiencing response times of over 30 seconds (4096 / 128 = 32, so it takes 32 seconds for a new request to work its way through the queue). Almost like going to the DMV…

If the incoming request rate drops below the maximum sustainable rate (here, 128/sec) only then can the server start to catch up and eventually clear the queue.
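
To make the arithmetic concrete, here is a toy discrete-time simulation of this model (plain Python; the function name and all the values are illustrative and have nothing to do with the Web Server code itself):

from collections import deque

def simulate(arrivals_per_sec, c=128, queue_cap=4096):
    queue = deque()          # arrival times of requests waiting in the queue
    dropped = 0
    for t, arriving in enumerate(arrivals_per_sec):
        # Each second the c workers complete up to c requests: first those
        # waiting in the connection queue, then this second's arrivals.
        served_from_queue = min(len(queue), c)
        for _ in range(served_from_queue):
            queue.popleft()
        direct = min(arriving, c - served_from_queue)
        overflow = arriving - direct
        accepted = min(overflow, queue_cap - len(queue))
        queue.extend([t] * accepted)
        dropped += overflow - accepted
        print("t=%ds queued=%d dropped=%d" % (t, len(queue), dropped))

# 129 requests/second against 128 requests/second of capacity:
# the queue grows by exactly one entry every second.
simulate([129] * 10)

Running it prints a queue length of 1, 2, 3, … after each successive second; extend the arrival list past 4096 seconds and the dropped counter starts ticking once per second, matching the steady state described above.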

In summary, while this is certainly a greatly simplified model of the request queue behavior, I hope it helps visualize what goes on as request rates go up and down.

Theory aside, what can you do to tune the web server?

  • The single best thing to do, if possible, is to make the web app respond quicker!
  • If you want to avoid dropped connections at all cost, you can increase the connection queue size. This will delay the point where the server reaches a steady state and starts dropping connections. Whether this is useful really depends on the distribution of the incoming requests. In the example above we’ve been assuming a very steady incoming rate just above the maximum throughput rate. In such a scenario increasing the connection queue isn’t going to help in practice because no matter how large you make it, it will fill up at some point. On the other hand, if the incoming request rate is very bumpy, you can damp it by using a connection queue large enough to avoid dropping connections. However… consider the response times as well. In the example above your end user is already seeing 33 second response times. Increasing the connection queue length will prevent dropped connections but will only make the response times even longer. At some point the user is simply going to give up, so increasing the connection queue any further won’t help!

  • Another option is to increase the number of worker threads. Whether this will help or hurt depends entirely on the application. If the request processing is CPU bound then it won’t help (actually, if it were truly CPU bound, which is rare, then you’ll probably benefit from reducing the number of worker threads unless your server has 128+ CPUs/cores…) If the web app spends most of its time just waiting for I/O then increasing the worker threads may help. No set answer here; you need to measure your application under load to see.

[1] In reality the response time can’t be deterministic. At best it may be more or less constant up to the point where the server scales linearly, but after that the response time is going to increase depending on load. On the flip side, caching might make some responses faster than expected. So M/D/c is certainly a simplification.

[2] Not true for several reasons, but it’ll do for this simplified model and it helps to visualize it that way.

[3] Plus network transmission times but since we’re modeling only the web server internals let’s ignore that.

Posted in Sun

Web Server 7 Request Limiting Revisited

Coincidentally, last week I heard a couple of related queries about check-request-limits from different customers. I haven’t covered that feature in a while, so it’s a good time to revisit it.

To review, Web Server 7 has a feature (function) called check-request-limits which can be used to monitor and limit the request rate and/or concurrency of requests which match some criteria. It can be used to address denial of service attacks, as well as to limit request rates to some objects or from some clients for other reasons (for example, to reduce bandwidth or CPU usage).

I usually refer to ‘matching requests’ when speaking of this capability. Matching what? Probably the most common use case is to match the client IP address. This is useful when you wish to limit request rates coming from a given client machine. Here’s a basic example of that scenario:


PathCheck fn="check-request-limits" max-rps="10" monitor="$ip"

The common theme to both customer queries I heard last week was whether it is possible to limit requests based on something other than the client IP.

Yes, certainly!

The monitor parameter above is set to “$ip” which expands to the client IP address, but you can set it to anything you prefer. In my introduction to check-request-limits article I gave examples of both “$ip” and “$uri” (and even both combined). You’re not restricted to only these though; you can use any of the server variables available in WS7 as the monitor value.

You can also construct more complicated scenarios using the If expressions of Web Server 7. I gave a few examples of that in this article on check-request-limits.
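
For instance (an illustrative fragment only; the URI pattern and the rate are made up), an <If> expression can scope the limit to just one part of the site:

<If $uri =~ '^/downloads'>
PathCheck fn="check-request-limits" max-rps="5" monitor="$ip"
</If>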

To give a couple more examples, let’s say your web server is behind a proxy and thus the client $ip is always the same (the proxy IP). Clearly monitoring the $ip value isn’t terribly useful in that case. Depending on how your application works you may be able to find other useful entries to monitor. For example, if the requests contain a custom header named “Usernum” which contains a unique user number, you could monitor that:

PathCheck fn="check-request-limits" max-rps="1" monitor="$headers{'usernum'}"

Or maybe there’s a cookie named customer which can serve as the monitor key:


PathCheck fn="check-request-limits" max-rps="1" monitor="$cookie{'customer'}" 

These two are made-up examples, you’ll need to pick a monitor value which is suitable for your application. But I hope these ideas will help you get started.

By the way, check-request-limits can also be used to limit concurrency, as sketched below.
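
For example (illustrative values; max-connections limits the number of concurrently processed matching requests rather than the request rate):

PathCheck fn="check-request-limits" max-connections="4" monitor="$uri"

This would cap the number of requests being processed simultaneously for any single URI at four.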

Posted in Sun

What’s Taking So Long

While Sun’s Web Server has a very nice threading model, once a worker thread is processing a specific request it will continue working on that request even if it takes a while or blocks.

This is rarely an issue. Static content is served very quickly, and code which generates dynamic application content needs to be written so it responds promptly. If the application code takes a long time to generate response data the site has more problems than one, so web application developers have a motivation to keep it snappy.

But what if you do have a bad application which occasionally does take a long time? As requests come in and worker threads go off to process them, each long-running request ties up another worker thread. If requests are coming in faster than the application code can process them, eventually the Web Server will have all its worker threads busy on existing connections.

As you can infer from Basant’s blog entry, the server will still continue accepting new connections because the acceptor thread(s) are separate from the worker threads. But there won’t be any spare worker threads to take those new connections from the connection queue.

If you’re the client making the request, you’ll experience the server accepting your request but it won’t answer for a (possibly long) while. Specifically, until one of the previous long-running requests finally completes and a worker thread frees up to take on your request (and of course, there may be many other pending requests piled up).

If this is happening with your application one option is to check the perfdump output and see which requests are taking a while. But, as these things are bound to do, it’ll probably happen sporadically and never when you’re watching.

So how can we easily gather a bit more info? It’s been said countless times but it is always worth repeating: dtrace really is the greatest thing since sliced bread (and I like bread). I can’t imagine attempting to maintain a system without dtrace in this day and age; it would be limiting beyond belief! One of the many key benefits is being able to gather arbitrary data right from the production machine without any prior preparation (such as producing debug builds) or downtime, or even any access to the sources you’re diagnosing.

So in that spirit, I tried to gather a bit more data about the requests which appear to be taking a while using dtrace and without attempting to look at what the code is actually doing (well, also because I only had fairly limited time to dedicate to this experiment so didn’t want to go looking at the code ;-). Although, I should mention, since Sun’s Web Server is open source you certainly could go review the source code if you wish to know more detail.

So what am I looking for? Basically I’d like to know when the worker thread starts on a request and when it is done with it. If the time between those two grows “too long”, I’d like to see what’s going on. Sounds simple enough. Searching around a bit I saw Basant’s article on dtrace and Web Server, so using his pid$1::flex_log:entry as an exit point seems like a suitable thing to try. I didn’t find (on a superficial search, anyway) a mention of an adequate entry point so instead I took a number of pstack snapshots and looked for something useful there and wound up selecting “pid$1::__1cLHttpRequestNHandleRequest6MpnGnetbuf_I_i_:entry” (an ugly mangled C++ function name). With that, I ran the following dtrace script on the Web Server process:

% cat log.d
#!/usr/sbin/dtrace -qs

pid$1::__1cLHttpRequestNHandleRequest6MpnGnetbuf_I_i_:entry
{
  self->begin = timestamp;
  printf("ENTER %d, %d to nn", tid, self->begin);
}

pid$1::flex_log:entry
/self->begin/
{
  self->end = timestamp;
  printf("DONE %d, %d to %dn", tid, self->begin, self->end);
  self->begin = 0;
}

This gets me entry/exit tick marks as the threads work their way through requests. On a mostly unloaded server it’s easy enough to just watch that output, but then you’re probably not experiencing this problem on an unloaded server. So we need a little bit of helper code to track things for us. Twenty minutes of perl later, I have

#!/usr/bin/perl

$PATIENCE = 9;                  # seconds - how long until complains start

$pid = shift @ARGV;
$now = 0;
$npat = $PATIENCE * 1000000000;

open(DD, "./log.d $pid |");
while (<DD>)
{
    chomp;
    ($op, $tid, $t1, $t2) = /(\S*) (\d*), (\d*) to (.*)/;
    if ($t1 > $now) { $now = $t1; }

    # dtrace output can be out of order so include start time in hash key
    $key = "$tid:$t1";          

    if ($op eq "ENTER") {
        if ($pending{$key} != -1) {
            $pending{$key} = $t1 + $npat; # value is deadline time
        }

    } else {
        $took = (($t2 - $t1) / 1000000000);
        if (!$pending{$key}) {
            $pending{$key} = -1; # if DONE seen before ENTER, just ignore it
        } else {
            delete $pending{$key};
        }
    }

    # Once a second, review which threads have been working too long
    # and do a pstack on those.
    # 
    # Note: we only reach here after processing each line of log.d output
    # so if there isn't any more log.d output activity we'll never get here.
    # A more robust implementation is left as an exercise to the reader.
    #
    if ($now > $nextlook) {
        $c = 0;
        foreach $k (keys %pending)
        {
            if ($pending{$k} != -1 && $pending{$k} < $now) {
                ($tid, $started) = $k =~ /(\d*):(\d*)/;
                $pastdue = ($now - $started) / 1000000000;
                print "=================================================n";
                system("date");
                print "Thread $tid has been at it $pastdue secondsn";
                system("pstack $pid/$tid");
                $c++;
            }
        }
        if ($c) { print "\n"; }
        $nextlook = $now + 1000000000;
    }
    
}

The perl code keeps track of the ENTER/DONE ticks (which may occasionally be out of order) and if too long (more than $PATIENCE) goes by, gives you pstack output showing what’s going on.

I don’t actually have a suitably misbehaving application so I’ll leave it at that. If I had a real application issue, it’d be useful to fine tune the dtrace script to key off of more specific entry and exit points and it’d also be useful to trigger more app-specific data gathering instead of (or in addition to) the pstack call (for instance, checking database availability if you suspect a database response problem, or whatever is suitable for your concrete application).

dtrace is like lego blocks, there’s a thousand and one ways of coming up with something similar. Care to try an alternative or more efficient approach?

Posted in Sun

Web Server 7 Meets Slowloris

Lately there’s been some noise about slowloris, a perl script which sends HTTP requests slowly. While there’s nothing new about this technique, I’ve been asked about it a few times so I wanted to show how easy it is to protect against it if you’re lucky enough to be using Sun’s Web Server 7.

In a nutshell, the script opens a connection to the target web server and sends valid request headers and then continues to send more headers, slowly. Specifically, it first sends:

GET / HTTP/1.1
Host: $hostname
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.503l3; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; MSOffice 12)
Content-Length: 42
X-a: b

Then it continues to send:

X-a: b

after every $timeout delay. It has a default $timeout of 100 seconds, but you can change this with the -timeout switch.

Let’s look at the more general cases here instead of just slowloris specifically.

The most rudimentary form of this attack is to open a connection to the web server and either send nothing at all or send a partial request and nothing else after that (as described above, this is not what slowloris does).

You’ll want your web server to eventually time out and close the connection if this happens. In Web Server 7 this is controlled by the io-timeout element in server.xml. The default value is 30 (seconds). Let’s try it:

% date;telnet localhost 80;date
Mon Jun 29 19:05:43 PDT 2009
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Connection to localhost closed by foreign host.
Mon Jun 29 19:06:14 PDT 2009

As you can see, 31 seconds went by before the connection was closed. You can change io-timeout to be shorter if you wish:

  
<http>
    <io-timeout>15</io-timeout> 
</http>


% date;telnet localhost 8080;date
Mon Jun 29 19:15:12 PDT 2009
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Connection to localhost closed by foreign host.
Mon Jun 29 19:15:27 PDT 2009

Above I changed the io-timeout to 15 and indeed it took 15 seconds before closing the mute connection. Let’s try the same thing but send a partial request:

% date;telnet localhost 8080;date
Mon Jun 29 19:14:38 PDT 2009
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET / HTTP/1.1
Host: localhost
HTTP/1.1 408 Request Timeout
Server: Sun-Java-System-Web-Server/7.0
Date: Tue, 30 Jun 2009 02:14:54 GMT
Content-length: 148
Content-type: text/html
Connection: close
<HTML><HEAD><TITLE>Request Timeout</TITLE></HEAD>
<BODY><H1>Request Timeout</H1>
The server timed out waiting for the client request.
</BODY></HTML>Connection to localhost closed by foreign host.
Mon Jun 29 19:14:54 PDT 2009

Ok, let’s try to make the attack more interesting. Instead of just going silent, the client can continue sending more request data, just slowly. This is what slowloris does. As long as the client sends a little bit of valid request data often enough to not get disconnected by the timeout it can hold on to the connection.

Fortunately Web Server 7 also monitors the time it takes to receive all the request headers. This can be configured using the request-header-timeout element in server.xml and can be used to defeat a slowloris-type attack. Even though the slowloris request never actually completes (since it just keeps sending more headers forever), Web Server 7 will stop waiting and close it off after request-header-timeout seconds go by.

  <http>
    <request-header-timeout>5</request-header-timeout> 
  </http>

Of course, if you set request-header-timeout to 5s you could then run slowloris with a -timeout of less than 5 seconds. However, this quickly starts to defeat the premise of this style of attack. The idea behind a slowloris-style attack is to attempt to tie up the web server quietly without the client having to generate hundreds or more connections per second. For fun, I set my request-header-timeout to 1s and ran slowloris with a -timeout of 1s. The result is the client machine uses up all its CPU generating new connections while Web Server 7 continues to be happily responsive.

A variant of this attack is to send a POST request, send all the request headers and then start to send the body data, slowly. Note that slowloris does not implement this (the -httpready flag sends a POST instead of a GET, but it continues to send X-a: b request headers, not request body data). However it is easy enough to write a tool to do this instead; a sketch of such a tool appears after the configuration below.

If you encounter that scenario you’re in luck because Web Server 7 also monitors the time for the request body to arrive and you can set a timeout on that as well, using the request-body-timeout element:

  <http>
    <request-body-timeout>5</request-body-timeout> 
  </http>
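
For instance, here is a minimal sketch of such a slow-POST client (hypothetical Python, for testing your own server only; it assumes the server listens on localhost port 8080 like the examples above):

import socket, time

# Send complete POST headers, then dribble the body out one byte at a
# time, pausing less than io-timeout between writes.
s = socket.create_connection(("localhost", 8080))
s.sendall(b"POST / HTTP/1.1\r\n"
          b"Host: localhost\r\n"
          b"Content-Type: application/x-www-form-urlencoded\r\n"
          b"Content-Length: 1000\r\n"
          b"\r\n")
try:
    while True:
        s.sendall(b"a")    # one byte of body data
        time.sleep(2)      # short enough to stay under io-timeout
except (BrokenPipeError, ConnectionResetError):
    print("server closed the connection")   # the body timeout did its job
finally:
    s.close()

With request-body-timeout set as above, the server closes this connection after a few seconds no matter how diligently the client keeps trickling bytes.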

That’s all there is to it to protect against slowloris and similar slow-client attacks if you’re using Sun Web Server 7! Enjoy!


Posted in Sun

Notes on Web Server Open Sourcing

Brian Aker wrote about the open sourcing of our web server and it got picked up on slashdot today.

I was reading through the comments and figured I’d throw in a few notes about what this code is and is not…

(I worked directly on the Web Server product for some years and while it is not my day job today, I’m still very closely affiliated with the group who works on the commercial version of this product inside Sun.)

  • First, the released code is not a snapshot of the Netscape Enterprise Server
    from the 90’s!
  • What it is, is a snapshot of the very latest source code for JES Web Server 7.0 (with some non-core parts removed, such as the administration infrastructure; see the full list of differences here).
  • The commercial version of this product is actively maintained and sold by Sun (note it is free to download and use, however – so feel free to download both the source and the commercial binaries and try/compare both, if you wish).
  • That said, the code is indeed a direct descendant of the Netscape Enterprise Server. The marketing name changes over the years have not marked rewrites of the core code, it’s been the same code all along.
  • While the revision history is not part of the open sourced snapshot (sorry), I can mention that in the internal repository of this code I see cvs comments dating back to 1995.
  • With over ten years of development and bug fixing a lot has changed, naturally. On the other hand, if you were involved with the original product way back then, you’ll definitely find some familiar bits and pieces here and there. As with any mature software product, there are always some parts which have not changed in ages.
  • So, while not a mummified snapshot, the code is indeed interesting as a piece of Internet history. Furthermore, it is also interesting as a modern living product.
  • Extreme scalability in multi-CPU (or multi-core) hardware is perhaps the most interesting angle from which to look at the code. (Funnily enough, with the rise of parallelism in modern hardware, maybe the code is becoming more interesting these days instead of less!)
  • As to who might be interested, or why, there isn’t any one answer. If you find it interesting or useful for either reason (or some other of your own), enjoy! Being under the BSD license, there are many ways to take advantage of it.
Posted in Sun

Announcing Open Source Web Server

I’m happy to announce that our Web Server product (about which I’ve been writing here for a few years now) is now open sourced and available as part of the OpenSolaris Web Stack community!

Well, technically it is not exactly the Web Server product, since the open sourced code does not include some of the value-add components such as the administration framework. But it is the real deal: the massively scalable web server core used in the JES Web Server 7.0 product is now all open source!

This marks another milestone in the very long history of this web server. Back in the 90’s this was the Netscape Enterprise Server, which later morphed into the iPlanet Web Server during the Sun|Netscape Alliance. After some years it was renamed the SunONE Web Server and most recently renamed again to the JES Web Server (Sun just likes to keep you confused, thus the constant renaming of the product!)

The code is placed under the BSD license, which should allow for good cross pollination with other web tier projects.

Enjoy!

Source code is available via:

% hg clone ssh://anon@hg.opensolaris.org/hg/webstack/webserver

Build instructions are here: http://wikis.sun.com/display/wsFOSS/Build+Instructions

(The code itself is highly portable as you can see based on the supported platforms of the commercial product. Building on other platforms is a bit more involved due to dependencies so the build instructions only cover the more flexible platforms.)

(edit: adding link to top level info page)

More info here: http://wikis.sun.com/display/wsFOSS/Open+Web+Server

Posted in Sun

Unconsolidating

I’ve mentioned in the past some of the complexities introduced by the consolidation model as we attempt to make open source components available for OpenSolaris.

Tonight I’ll take a shot at listing some of the major difficulties with the SFW consolidation model when applied to the goal of making a broad set of packages available for OpenSolaris – something which will be critical as we move towards IPS.

Looking at Debian unstable I see there are close to 27000 packages available today. Looking at SFW I see about 100 components which produce 158 packages. As I mentioned before, that’s not a complete comparison since other consolidations also deliver various open source components, so the total available for OpenSolaris is a good bit higher.

To succinctly state a goal, here’s what I want to see:

% pkg status -a | wc -l
27000

So… how do we get there?

Or, to the topic of this entry, why can’t we get there via SFW consolidation model?

1. SFW source/build model cannot scale

SFW today is one single source repository (browse it here or download the tarball here). A built tree takes about 7.5GB. On a V2100z (dual Opteron) the build takes about 3 hours. To reiterate, this is about 100 components producing 158 packages. Let’s say we succeed and end up with something on the order of 20000 packages (~126x). That’s close to a terabyte for each build tree and the build would take about 16 days ;-)

Those numbers are clearly beyond silly… it can’t be done. I could just stop here since this reason alone guarantees that the current SFW consolidation model cannot be used going forward for too long. Admittedly getting to 20000 packages will take time so we can ignore this problem for a little while, but it is better to start planning now instead of waiting until a build takes more than one working day. BTW it only takes about 420 packages for the build to take a full working day (8 hours). Even if 20000 packages might be a lofty far away goal (but one I believe we must achieve), 420 packages is right around the corner.

Requirement: Individual components must be able to build & deliver without checking out or building the rest of the package universe.

2. Centralized breakage

If there is a bug in the Ruby build why should the PHP development team be prevented from making progress because they can’t build either? (Not to pick on Ruby, which works great ;-). The single tree/single build nature of SFW means that if the build is busted for any reason, it’s broken for everyone. Historically it hasn’t been a consolidation that moves very fast – a few dozen components, most of which don’t change at all, only a few changes trickling in each month. In that environment, the single point of global failure is quite manageable.

Once again, fast forward to thousands or tens of thousands of components being maintained by hundreds of dispersed teams. Conservatively, let’s say each package changes (version updates or any bug fixes) only twice a year. With 20000 packages, that’s already over a hundred changes every day of the year! It’s pretty much guaranteed that the build will be broken nearly all the time (but it’ll take you 16 days to find out why, and by then close to 2000 additional changes have gone in!)

Requirement: Bugs & build problems in some components cannot stop progress on entirely unrelated components.

3. Serialization of efforts

During the past four months (the duration of the Web Stack project) we had less than half a dozen teams actively putting back changes into SFW. Even with such tiny numbers, there have been times when one team is held up because the consolidation gate is waiting on some other completely unrelated putback.

As before, this will not work when the level of activity goes up. Once you go from half a dozen teams and a handful of checkins a week to hundreds of teams performing a hundred checkins a day, any serialization will cause the queue of pending changes to quickly grow out of control.

Requirement: Independent teams need to be able to check in code and deliver packages without contending on a single synchronization point.

4. Release early, release … eventually?

We’re all familiar with the phrase release early, release often. The consolidation model used by SFW is the opposite of this idea. The consolidation model assumes that there is one development team with the resources to do all the development and all the testing necessary to polish the component to perfection. Only after perfection has been reached does it get integrated into the consolidation, after which no further work is usually needed (at least until new features are requested). Admittedly there is elegance in this approach. Unfortunately it doesn’t really apply to the task of packaging third party open source components, where the community feedback loop embodied by “release early, release often” is a vital part of the cycle.

Here is a concrete example taken from my team’s initial PHP integration:

In early August we had PHP packages suitable for installation and experimentation. While there were known problems they wouldn’t have prevented using the packages for early testing. Unfortunately we had no convenient way of publishing them because the consolidation model needs the fully polished final version to be done before putback. So we missed out from any potential feedback on these early packages which would’ve been quite valuable.

By mid-September the work was complete and the packages were ready. By now I had set up an internal IPS repository for distributing the finished work inside Sun. But as we have no external IPS repository to publish into, we still had no convenient way to truly publish the completed work. Again we’re missing out on all potential community feedback.

In mid-October all the processes have completed and we make the checkin into the SFW consolidation. Unfortunately checking the code in doesn’t really make it available unless you have access to the SFW gate and are willing to build it yourself. It’s not until 2 weeks later that the packages show up, when snv_b76 .iso files become available. And after that, it’s roughly another week before b76 is available for download. So it’s only in early November that the general community – all of you – has easy access to readily installable packages, even though preliminary packages suitable for testing were available to us three full months earlier. The worst part is that by then it is so late in the release cycle for SXDE that even if we get suggestions or bug reports, there is little to no time to act on them. It is the kind of feedback that would’ve been most useful back in August.

Requirement: Ability for each component team to independently release early, release often.

5. Schedule synchronization

Another difficulty introduced by the consolidation model is the synchronization of the schedules of all components. Since it is a single source tree built all at once (every 2 weeks), it requires all putbacks to be synchronized around that beat. This is a lesser issue, but even at the current slow pace of change there have been some inefficiencies introduced by it. Even minor inefficiencies can become a problem once scaled up to thousands of packages and/or contributors, so it is worth keeping in mind. Particularly for packages maintained by community members who don’t necessarily work to Sun’s schedule.

Requirement: Same as #3: Independent teams need to be able to check in code and deliver packages without contending on a single synchronization point.

Well, it’s easy to see that the existing consolidation model doesn’t quite work for the goal of massively scaling up the number of packages available for OpenSolaris. But then, what’s next? In my next article I’ll explore some thoughts on ways we might move forward.

RSA 1024

At the Java One ECC BOF two weeks ago I mentioned in passing that 1024 bit RSA keys are probably not the wisest choice for too much longer. As a timely follow-up to that discussion, it looks like Lenstra has been up to his factorization games again. There are a number of articles about it; here’s one from The Register.

Posted in Sun