How not to fix a gooseneck mount!

After 8 years on the waiting list, as of this summer I have a slip in Santa Cruz Harbor and my Colgate 26 is here, home, at last.

For the last several years she has been in charter service with Club Nautique in Sausalito while I waited for the slip to come in. Now that she’s finally home, I’ve been fixing some of the damage from the charter service. Chartering is a hard life for a boat: she gets used often, mostly by novices, and for them it’s a “rental” so there’s not a lot of care. Fortunately the Colgate 26 was designed and built specifically for training, so it can take a lot of abuse. I always expected to have to do some cleanup to bring her back to top shape, but some of the things I’m discovering as I go through the boat are shocking.

The gooseneck mount is nearly torn off the mast. I can only speculate as to the cause (Club Nautique never told me about the accident), but a violent jibe seems likely. In any case, instead of fixing the damage, Club Nautique put two hose clamps around the mount and the mast to hold it in place!!! Unbelievable.

[Photo: torn gooseneck mount]

After I removed the hose clamps the full extent of the damage was visible. Of the six rivets, two were gone and the other four were completely loose. I drilled out the remaining rivets and need to decide how best to fix this properly (the existing holes are too elongated to reuse the same size rivets). The bracket itself shows some stress cracks as well.

Update: Ballenger Spars built me a nice custom bracket to replace the damaged part. It has a different shape so the rivets fall farther apart, avoiding the damaged holes in the mast.


TCP connection to local MySQL with Ruby

Here’s a small detail that took me a while to discover. Maybe it helps you, or at least it’ll probably help me in the future when I need to do this again…

I was working on a Ruby script which made some calls to MySQL. Among other things, the script takes an argument with the hostname of the database, which it later uses to open the connection. The implementation is a bit too smart for its own good though: if the hostname is “localhost” it won’t open a TCP connection. I needed to force a TCP connection even to localhost. I couldn’t find this clearly documented anywhere, but it turns out there is a way. You need to set the OPT_PROTOCOL option to 1:

require 'mysql'

h = Mysql.init
h.options(Mysql::OPT_PROTOCOL, 1) # 1 = TCP, force a TCP connection even to localhost
connection = h.real_connect(dbhost, $MYSQL_UID, $MYSQL_PWD)

That’ll do it.

As to why I needed this? The MySQL server was actually on a different machine, but I had set up an ssh tunnel to it mapped to localhost:3306. So the Ruby MySQL library’s assumption that a “localhost” connection must be to a local process was not true in this case.
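
For reference, such a tunnel can be set up along these lines (the user and hostname here are placeholders):

# forward local port 3306 to the MySQL port on the remote host;
# -N means don't run a remote command, just hold the tunnel open
ssh -N -L 3306:localhost:3306 user@dbserver.example.com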

RPM Scripts

When creating an rpm package, the spec file can specify a number of scripts to be run before and after package install and uninstall. For the simple cases of a new install or an uninstall it is obvious which script runs when. However, the documentation didn’t seem very clear on the behavior during a package upgrade (rpm -U).

Documenting this as a note to my future self, for the next time I need it… The table shows which script runs when (and in what order) and what integer parameter it is given:

Fresh install (rpm -i):
    %pre     1
    %post    1

Upgrade (rpm -U):
    %pre     2    (from the new package)
    %post    2    (from the new package)
    %preun   1    (from the old package)
    %postun  1    (from the old package)

Uninstall (rpm -e):
    %preun   0
    %postun  0

With this, the scripts can do something like:

%pre

# $1 is the number of instances of this package that will be installed
# after this operation completes (see the table above)
case "$1" in
  1)
    echo "package is about to be installed for the first time"
    ;;
  2)
    echo "package is about to be upgraded, prepare component"
    echo "for upgrade (e.g. stop daemons, etc)"
    ;;
esac
exit 0
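
The %preun script can use the same pattern to tell a true uninstall apart from the removal half of an upgrade; a sketch:

%preun

# $1 is the number of instances of this package remaining after this
# removal: 0 means a full uninstall, 1 means an upgrade replaced us
case "$1" in
  0)
    echo "package is being removed for good, stop daemons and clean up"
    ;;
  1)
    echo "old version is being removed as part of an upgrade,"
    echo "the new version is already installed, leave services alone"
    ;;
esac
exit 0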

Firewalls and database pools

Recently I had been seeing the occasional request taking a very long time to complete. It was happening very rarely, but enough to be worth investigating.

Looking at diagnostic logs I could see that when it happened, the high-level reason was that getting a database connection from the c3p0 connection pool took a long, long time (about 15 minutes, often more).

The pool already had checkoutTimeout set to 30 seconds precisely to avoid having a request sit around forever if for some reason a connection could not be acquired in reasonable time. So whatever was causing the delay was ignoring this setting. The docs describe this setting as “the number of milliseconds a client calling getConnection() will wait for a Connection to be checked-in or acquired when the pool is exhausted”. Turns out the key part here is “when the pool is exhausted”: from the pool statistics at the time of the slow request I could see that the pool was not exhausted; there were several idle connections available to be had. Which just made it stranger that it would take so long, but explains why this timeout was not relevant.

Trading some speed for more reliability, the server is also configured to testConnectionOnCheckout. This helps almost guarantee the connection will be good when the application gets it (almost, because it could still become stale in the short window between the time it is checked out and the time the application actually uses it). Since idle connections were available in the pool, it seemed the only way grabbing one could take a long time was if this test took a long time. But that didn’t make much sense either: if the database is down or unreachable, the test normally fails promptly.

I should mention that other requests which came in at about the same time as the slow request had no trouble getting their connections from the pool and those requests completed in normal time. So there was no connectivity problem to the database nor was the database responding slowly.

Long story short, I discovered there is a stateful firewall between the web server and the database.

So, it turns out that if a database connection in the pool sat around unused long enough, the firewall dropped it from its connection table. When one of these connections was later grabbed from the pool, c3p0 attempted to test it by sending a query to the database. In normal conditions this would either work or fail quickly. But here the firewall was silently dropping all packets related to this connection, so the network stack on the web server machine kept retrying for a long time before giving up. Only after that did c3p0 see the network failure, drop that connection, create a new one and hand it to the application.

This was happening very rarely because most of the time my server gets a fairly steady load of concurrent requests, so the connections in the pool get used frequently enough that the firewall never drops them.  The problem surfaced only after there was an occasional spike in concurrent requests which led c3p0 to add more connections to the pool. After the spike was over the pool now had some “extra” connections above and beyond the normal use pattern so some of these connections did now sit unused for long enough to be dropped by the firewall. Eventually a request got unlucky and got one of these long-idled connections and ran into the problem.

The easiest solution was to change maxIdleTime from its default of zero (no timeout) to a time just a bit shorter than the firewall timeout. With that, connections which are about to be dropped by the firewall get dropped by c3p0 first. It’s a bit unfortunate since it causes otherwise unnecessary churn of good connections, but it is certainly better than the alternative. Since I changed this setting, we haven’t seen any more problems.
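
For illustration, the resulting pool setup in c3p0 terms would be a sketch like this (the 25-minute idle time is a placeholder; pick a value just under your own firewall’s idle timeout):

    import com.mchange.v2.c3p0.ComboPooledDataSource;

    ComboPooledDataSource pool = new ComboPooledDataSource();
    pool.setCheckoutTimeout(30000);          // ms to wait when the pool is exhausted
    pool.setTestConnectionOnCheckout(true);  // validate each connection at checkout
    pool.setMaxIdleTime(25 * 60);            // seconds; expire idle connections
                                             // before the firewall silently does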

Performance Testing With Faban

I’ve been quiet on the blog front lately, having all kinds of fun with new challenges. Now that I got around to installing WordPress maybe it is time to write again…

One of the interesting things I’m looking into at Proofpoint is the performance and scalability of one of the web services we offer. Architecturally the server is of a fairly standard design: built with Java Servlets, it provides a number of services via REST APIs.

In between working on features, I’ve been having fun exploring the performance characteristics. After setting up a suitable lab environment, the next question was deciding which load generator to use.

Back at Sun, while working on the Web Server, I had used Faban, so I was already familiar with it, although only with running load tests, not writing them. Earlier, with the SunONE Application Server, I had also used JMeter quite a lot, so that was another candidate. In the end, I decided to try Faban first.

While there is a lot of documentation on the Faban web site, there are also gaps in the explanations that can be quite confusing. I found that it took some experimentation to get a custom benchmark driver to work. One drawback is that Faban doesn’t do a very good job of reporting the root causes of problems. Often if it doesn’t like something about the test or configuration it just doesn’t work, and finding out why involves trial and error. Oh well. Still, in the end it is fairly straightforward, so with a bit of patience I had a nice custom benchmark which exercises the primary REST APIs of the server.

One thing I found was that none of the convenience HTTP query APIs built into Faban was suitable for my needs, because they insisted on encoding the request data in ways not compatible with the server. The solution turned out to be easy in hindsight but difficult to find in the documentation at first, so documenting it is the primary reason for this article…

I ended up using the TimedSocketFactory provided by Faban. In the constructor of my benchmark class I create one instance of it:

    socketFactory = new TimedSocketFactory();

Then in each of the benchmark operation methods I do:

    Socket s = socketFactory.createSocket(myServerIP, 80);
    PrintWriter out = new PrintWriter(s.getOutputStream(), true);
    BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
    out.println(req);

Here ‘req’ is the previously constructed request buffer. Then I read and process the server response from ‘in’.
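For context, here is roughly how one of these operation methods is declared in the driver class. The operation name and the 90th percentile target below are made-up placeholders; the @BenchmarkOperation annotation and Timing enum come from the com.sun.faban.driver package:

    // Sketch of one benchmark operation; with Timing.AUTO, Faban times
    // the I/O performed through its TimedSocketFactory.
    @BenchmarkOperation(name = "StatusRequest", max90th = 2.0, timing = Timing.AUTO)
    public void doStatusRequest() throws IOException {
        Socket s = socketFactory.createSocket(myServerIP, 80);
        PrintWriter out = new PrintWriter(s.getOutputStream(), true);
        BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
        out.println(req);
        // ... read and validate the server response from 'in' ...
        s.close();
    }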

With that, Faban takes care of timing and collecting the statistics very nicely.

Overall I found Faban to be quite useful; I’ve used it to collect a wealth of data on the performance characteristics of our server under various load conditions. I now have a long list of ideas on how to scale up the performance!

First Post

First Post!

Finally got around to installing WordPress…

All the posts prior to this one have been migrated from my old blog at blogs.sun.com. Note I did not migrate all the entries, as many were obsoleted by time (product release announcements and such). I’ve kept the ones that may still be useful as a reference.

In particular, I dropped all the OpenSolaris Web Stack related posts because the project, the community and the repository have all been discontinued (including OpenSolaris itself) so no point in including those anymore.

Goodbye my Sun

Goodbye my Sun.

(I don’t eat fish and the Douglas Adams quote is overused, anyway.)

I joined Sun Microsystems on March 10, 1997 so I was just about to complete thirteen years at Sun when the acquisition happened. It may have had its highs and lows but all in all it was a wonderful ride at a legendary company. Having been part of Sun was a dream fulfilled.

I’ll always remember Sun for the relentless innovation and the highest standards for technical excellence. Sun was a place where engineering reason prevailed over bureaucrats, assumptions were questioned and critical thinking stood over mere paperwork. We may not have marketed it very well but it sure was the best technology on Earth.

I worked on many projects over the years, but the longest running and most special one was the Web Server [iPlanet|SunONE|JES|SJS – nobody loved continuous product renaming like Sun!]. Always a small team, but with the highest passion for producing the best code on Earth, even against all odds. Thanks for the good memories, Web Server team!

All wonderful things come to an end, I suppose; so did Sun, and so does this blog now. Hopefully the articles will continue to be here for future reference, but if not, I have also made them available on my own site at http://virkki.com/jyri/articles/.

Interesting challenges have fallen my way so it is time to pursue them.

You know where to find me. Keep in touch!


Joining the ZFS Revolution

For a long time now I’ve been meaning to migrate my home file storage over to a ZFS server but the project kept getting postponed due to other priorities. Finally it’s alive!

For the last ten years or so my home fileserver has been a general purpose Debian box in the garage. It has three disks: one for the system and home directories, a larger one which gets exported over NFS, and the largest one which backs up the other two (nightly rsync). It has been an adequate solution, insofar as I’ve never lost data. But whenever a disk dies I always have several days of downtime and have to scramble to restore from backups and maybe reinstall.

There are many articles about this topic that make for good reading if you’re considering the same. My goals were:

1. Data reliability, above all.

Initially I had visions of maximizing space, mainly for the geek value of having many terabytes of home storage. But in the end, I don’t really need that much. The NFS export drive on my Debian box was only 500GB, and that was used not only for the shared data (pictures, mostly, and documents) but also for MythTV storage. Since I wasn’t planning on moving the MythTV data to the ZFS pool, even 500GB would be plenty adequate for some time.

2. Low power consumption.

Since this is another server that’ll need to run 24/7, I wanted to keep an eye on the power it uses.

3. But useful for general computing.

Since this will be the only permanent (24/7) OpenSolaris box on my home network, I also wanted to be able to use it for general purpose development work and testing whenever needed. So despite the goal of low power consumption, I didn’t want to go all out with the lowest possible power setup; I needed a compromise.

Here’s the final setup:

CPU: AMD Phenom II X4 (quad core) 925. Reasonable power consumption and the quad cores give me something fun to play with.

Memory: 8GB ECC memory. Since I’m going primarily for data reliability, might as well go with ECC.

ZFS pool: 3 x 1TB drives. These are in a mirror setup, so total storage is just 1TB. That’s still about three times as much as I really need right now. With three drives, even if two fail before I get to replace them I should be ok. I got each of the three drives from a different manufacturer; hopefully that’ll make them fail at different times.

        NAME        STATE     READ WRITE CKSUM
        represa     ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c8d0    ONLINE       0     0     0
            c8d1    ONLINE       0     0     0
            c9d0    ONLINE       0     0     0
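
For reference, a three-way mirror pool like this is created with a single command (using the device names shown above):

        zpool create represa mirror c8d0 c8d1 c9d0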

System disk: I expected to just use an older drive I had on the shelf, but after installing it I found it was running hot. Maybe it is ok, but I decided to do a two-way mirror of the rpool as well; maybe it’ll save me some time down the road. I don’t need much space here, so I found the cheapest drive I could get ($40) to add to the rpool. At that price, might as well mirror!

        NAME         STATE     READ WRITE CKSUM
        rpool        ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c9d1s0   ONLINE       0     0     0
            c10d1s0  ONLINE       0     0     0
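
Mirroring an existing single-disk rpool can be done by attaching the new drive, roughly like this (on OpenSolaris the new disk also needs boot blocks so it remains bootable):

        zpool attach rpool c9d1s0 c10d1s0
        installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c10d1s0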

Total power consumption for the box hovers around 78-80W most of the time.


Sun!

So here I sit, at the very end of Sun Microsystems.

(oblink to James Gosling’s entry)

Who would’ve thought!

Close to twenty years ago the university received a shiny new SPARCserver/390. Sure we had other hardware from HP (HP-UX, ugh) and IBM (AIX, even worse!) but that 390 with SunOS was special. I cajoled my way into being the sysadmin for that lab mainly so I could get unlimited play time with it.

Later, after finishing grad school, I ended up elsewhere, but Sun was still the coolest company on Earth. I quickly “found” (not by accident) myself with a SPARCstation 10, which later became a 20, and so on… Today my ‘desktop’ is a SunFire server, but since it is insanely noisy I keep it in a lab and display through a SunRay in my office.

Inevitably, I later ended up here at Sun (coincidentally, right when Bellcore got acquired) and the engineering culture was as great inside as the products were cool from a customer perspective (as to the management side of the company, the less said the better I suppose). So here we are, at the Sunset of it all. Now, a very red sunrise.

So, what’s next for Sun Web Server?


More Thoughts on Web Server 7 and TLS Vulnerability

Please read my article on Web Server 7 and the TLS vulnerability for background and recommendation on this issue.

In this entry I’ll add some random thoughts and examples to illustrate the problem. The ideas in this entry are purely experimental.

Is My Web Application Vulnerable?

You may be tempted to wonder: even if the SSL/TLS connection to your Web Server is vulnerable to the renegotiation attack, maybe your web application cannot be exploited?

While technically the answer is “not necessarily”, for most real web sites in existence today the answer is usually yes. Unless your web site is firmly entrenched in 1994 (nothing but static content and no processing of user input of any kind), a clever attacker can surely find ways to cause mischief (or worse) using this vulnerability. So I’d like to discourage attempting to talk yourself into believing your site is safe. Instead, upgrade to Web Server 7.0u7.

As one example, shortly after the vulnerability was made public, it was used to grab Twitter passwords.

As noted earlier, at a high level the attack relies on the MITM attacker being able to interact with your Web Server to pre-establish some state, and then trick the legitimate client into executing actions based on that state in their name.

What this means in practice will vary widely, depending on what your web application does and how it processes input.

To answer with certainty whether your web application can be successfully exploited requires analyzing in detail how it handles user input and keeps state, so it is not possible to give a universal answer. However, given the complex and often unintended access points into most web applications, it is safest to assume there are exploitable vulnerabilities unless you can prove otherwise.

A Textbook Worst Case

As one example, consider a simple banking web site subdivided as follows:

/index.html             Welcome page, no authentication required
/marketing/*            Company info, no authentication required
/clients/info.jsp       Show customer balance info, client-cert auth needed
/clients/payment.jsp    Form to initiate payments, client-cert auth needed
/clients/do-payment.jsp Process payment, client-cert auth needed
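
In Servlet deployment descriptor terms, such a split might be declared along these lines (a web.xml sketch for illustration only, not any real site’s configuration):

<security-constraint>
  <web-resource-collection>
    <web-resource-name>clients</web-resource-name>
    <url-pattern>/clients/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>customer</role-name>  <!-- hypothetical role -->
  </auth-constraint>
  <user-data-constraint>
    <transport-guarantee>CONFIDENTIAL</transport-guarantee>
  </user-data-constraint>
</security-constraint>
<login-config>
  <auth-method>CLIENT-CERT</auth-method>
</login-config>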

The site expects users to enter via the home page and click on a link which takes them to the protected area under /clients, at which point the Web Server requires client cert authentication to proceed. Once authenticated, the client can view their account balances or click on the payment page (payment.jsp), which contains a form to enter payment info (dollar amount, recipient, etc). The form does a POST to do-payment.jsp, which actually processes the payment and removes the money from the customer account.

Exploiting the renegotiation vulnerability with this site is trivially easy:

  1. Looking to check their balance, a legitimate client sends a request for /clients/info.jsp (the user probably had it bookmarked)
  2. The MITM attacker sends a POST to do-payment.jsp with the amount/recipient info of their choosing
  3. Because do-payment.jsp requires client authentication, the Web Server triggers a renegotiation and asks for a client certificate
  4. The attacker hands off the connection to the legitimate client
  5. The legitimate client provides a valid client certificate and the renegotiation succeeds
  6. The Web Server executes do-payment.jsp with the POST data sent earlier (by the attacker) and returns a transaction confirmation to the legitimate client
  7. The user panics! Why did the bank just make a payment from my account to some unknown entity when all I requested was the balance page?!

This is a very real scenario. I have seen customer web applications doing something precisely analogous to what I describe above.

Application Layer Protection

Is it at all possible to take some precautions against the exploit at the application layer?

The renegotiation vulnerability is recent as of this writing and there have not been too many exploits in the wild yet. History teaches us that over time, vulnerabilities will be exploited in ways far more clever than anyone predicted at first. Given that we are barely in the initial stages of the adoption curve (so to speak) of this vulnerability, I’m only prepared to predict that we haven’t seen its more devious applications yet.

For completeness, I’ll share some thoughts on application layer protections. If your web application is handling anything important (and it probably is, since it is running https), I wouldn’t today recommend relying on a purely application layer protection.

Conceptually, your web application should not trust any input it received before the “hand off” renegotiation for the purpose of taking actions after it (i.e. do-payment.jsp should not process the POST data it received before the renegotiation to complete a payment after it).

Unfortunately, while that is easy to say, it is impossible to implement! That is because your web application has no way to know that a “hand off” renegotiation occurred. The Web Server itself does not necessarily know either. Remember, the renegotiation may occur at any time and it happens directly at the SSL/TLS layer, invisible to both the Web Server and the application.

How about if we lower our goal and rephrase the guideline: the web application should not trust any input it received before successful authentication was complete for the purpose of taking actions after it. Since the web application does have access to authentication data (or the lack of it), it becomes plausible to implement some defenses based on that knowledge. Is this lowered bar sufficient to protect against all attacks using the renegotiation vulnerability?

Picture a shopping web site with a flow like this:

  1. populate cart with items
  2. click on checkout (assume the site has payment info stored already)
  3. authenticate
  4. order is processed

Here the renegotiation is triggered at step 3, so when the legitimate client logs in they suddenly get an order confirmation screen for something they didn’t order.

The flow could be restructured to be:

  1. populate cart with items
  2. click on checkout (assume the site has payment info stored already)
  3. authenticate
  4. present cart contents and order info again for review
  5. if user agrees again, order is processed

Here the legitimate user would enter the flow at step 3 and then see the unexpected order confirmation screen at which point they get a chance to hit cancel.

Do not be overconfident in such ordering. Just because the developer intended the request flow to be

info.jsp -> payment.jsp -> do-payment.jsp

nothing actually prevents the attacker from carefully crafting a request straight to do-payment.jsp. Paranoid enough yet?

Defensive web programming is a difficult problem, one that many web applications get very wrong. It is a vast topic, now made that much more difficult by the SSL/TLS renegotiation vulnerability. So I’ll leave it at that for the moment.

So in closing, I’ll just repeat that I’d like to discourage attempting to talk yourself into believing your site is safe. It probably is not. Instead, upgrade to Web Server 7.0u7.
