Unconsolidating

I’ve mentioned in the past some of the complexities introduced by the consolidation model as we attempt to make open source components available for OpenSolaris.

Tonight I’ll take a shot at listing some of the major difficulties with the SFW consolidation model when applied to the goal of making a broad set of packages available for OpenSolaris – something which will be critical as we move towards IPS.

Looking at Debian unstable I see there are close to 27000 packages available today. Looking at SFW I see about 100 components which produce 158 packages. As I mentioned before, that’s not a complete comparison since other consolidations also deliver various open source components, so the total available for OpenSolaris is a good bit higher.

To succinctly state a goal, here’s what I want to see:

% pkg status -a | wc -l
27000

So… how do we get there?

Or, to the topic of this entry, why can’t we get there via the SFW consolidation model?

1. SFW source/build model cannot scale

SFW today is one single source repository (browse it here or download the tarball here). A built tree takes about 7.5GB. On a V2100z (dual Opteron) the build takes about 3 hours. To reiterate, this is about 100 components producing 158 packages. Let’s say we succeed and end up with something on the order of 20000 packages (~126x)? That’s close to a terabyte for each build tree and the build would take about 16 days ;-)

Those numbers are clearly beyond silly… it can’t be done. I could just stop here, since this reason alone guarantees that the current SFW consolidation model cannot be used for much longer. Admittedly, getting to 20000 packages will take time, so we can ignore this problem for a little while, but it is better to start planning now than to wait until a build takes more than one working day. BTW, it only takes about 420 packages for the build to take a full working day (8 hours). Even if 20000 packages is a lofty, far-away goal (but one I believe we must achieve), 420 packages is right around the corner.
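For anyone who wants to check the arithmetic, here is a rough sketch of the extrapolation in Python. It assumes tree size and build time scale linearly with package count, which is a simplification; the function and constant names are made up for illustration.

```python
# Figures quoted above: about 158 packages, a 7.5 GB built tree, a 3-hour build.
CURRENT_PACKAGES = 158
TREE_GB = 7.5
BUILD_HOURS = 3.0


def extrapolate(target_packages):
    """Scale tree size and build time linearly to a target package count.

    Linear scaling is an assumption, not a measurement.
    """
    factor = target_packages / CURRENT_PACKAGES
    return {
        "factor": factor,
        "tree_gb": TREE_GB * factor,
        "build_hours": BUILD_HOURS * factor,
        "build_days": BUILD_HOURS * factor / 24,
    }


def packages_for_build_hours(hours):
    """Package count at which the (linearly scaled) build takes `hours`."""
    return hours / BUILD_HOURS * CURRENT_PACKAGES


big = extrapolate(20000)
print(f"scale factor:  {big['factor']:.1f}x")
print(f"build tree:    {big['tree_gb']:.0f} GB")
print(f"build time:    {big['build_days']:.1f} days")
print(f"8-hour build at roughly {packages_for_build_hours(8):.0f} packages")
```

If anything, linear scaling is probably optimistic, since dependencies between components tend to add build work rather than remove it.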

Requirement: Individual components must be able to build & deliver without checking out or building the rest of the package universe.

2. Centralized breakage

If there is a bug in the Ruby build, why should the PHP development team be prevented from making progress because they can’t build either? (Not to pick on Ruby, which works great ;-) The single tree/single build nature of SFW means that if the build is busted for any reason, it’s broken for everyone. Historically SFW hasn’t been a consolidation that moves very fast – a few dozen components, most of which don’t change at all, with only a few changes trickling in each month. In that environment, the single point of global failure is quite manageable.

Once again, fast forward to thousands or tens of thousands of components being maintained by hundreds of dispersed teams. Conservatively, let’s say each package changes (version updates or any bug fixes) only twice a year. With 20000 packages, that’s already over a hundred changes every day of the year! It’s pretty much guaranteed that the build will be broken nearly all the time (but it’ll take you 16 days to find out why and by then close to 2000 additional changes have gone in!)
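The change-rate numbers can be sketched the same way. The per-change breakage probability below (1%) is purely a hypothetical value picked for illustration, not something measured on SFW, and the model treats changes as independent.

```python
PACKAGES = 20000
CHANGES_PER_PACKAGE_PER_YEAR = 2   # the conservative guess from the text
BUILD_DAYS = 16                    # extrapolated full-build time from section 1

changes_per_day = PACKAGES * CHANGES_PER_PACKAGE_PER_YEAR / 365
changes_during_build = changes_per_day * BUILD_DAYS


def prob_build_broken(changes, p_break):
    """Chance that at least one of `changes` independent changes breaks the build.

    `p_break` is a hypothetical per-change breakage probability.
    """
    return 1 - (1 - p_break) ** changes


print(f"changes per day:          {changes_per_day:.0f}")
print(f"changes during one build: {changes_during_build:.0f}")

# Hypothetical: 1 change in 100 breaks the single shared build.
daily = prob_build_broken(round(changes_per_day), 0.01)
print(f"chance of a broken build on any given day: {daily:.0%}")
```

Even with a low per-change failure rate, a single shared build is broken on most days once the change volume reaches this level.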

Requirement: Bugs & build problems in some components cannot stop progress on entirely unrelated components.

3. Serialization of efforts

During the past four months (the duration of the Web Stack project) we had fewer than half a dozen teams actively putting back changes into SFW. Even with such tiny numbers, there have been times when one team was held up because the consolidation gate was waiting on some other completely unrelated putback.

As before, this will not work when the level of activity goes up. Once you go from half a dozen teams and a handful of checkins a week to hundreds of teams performing a hundred checkins a day, any serialization will cause the queue of pending changes to quickly grow out of control.
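A toy model shows how quickly a serialized gate falls behind. This is only a sketch: the gate capacity of 8 putbacks per day is a made-up figure (say, one per working hour), and arrivals are treated as a constant daily rate.

```python
def backlog_after(days, arrivals_per_day, gate_capacity_per_day):
    """Pending changes after `days`, given a single serialized gate.

    Each day the gate accepts at most `gate_capacity_per_day` changes;
    anything beyond that waits in the queue.
    """
    backlog = 0
    for _ in range(days):
        backlog = max(0, backlog + arrivals_per_day - gate_capacity_per_day)
    return backlog


# Today: a handful of checkins a week, comfortably under capacity.
print(backlog_after(days=5, arrivals_per_day=1, gate_capacity_per_day=8))

# Scaled up: a hundred checkins a day against the same hypothetical gate.
print(backlog_after(days=5, arrivals_per_day=100, gate_capacity_per_day=8))
```

As long as arrivals stay below the gate's capacity the queue stays empty; once they exceed it, the backlog grows by the difference every single day and never recovers.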

Requirement: Independent teams need to be able to check in code and deliver packages without contending on a single synchronization point.

4. Release early, release … eventually?

We’re all familiar with the phrase release early, release often. The consolidation model used by SFW is the opposite of this idea. The consolidation model assumes that there is one development team with the resources to do all the development and all the testing necessary to polish the component to perfection. Only after perfection has been reached does it get integrated into the consolidation, after which no further work is usually needed (at least until new features are requested). Admittedly there is elegance in this approach. Unfortunately it doesn’t really apply to the task of packaging third party open source components, where the community feedback loop embodied by “release early, release often” is a vital part of the cycle.

Here is a concrete example taken from my team’s initial PHP integration:

In early August we had PHP packages suitable for installation and experimentation. While there were known problems, they wouldn’t have prevented using the packages for early testing. Unfortunately we had no convenient way of publishing them because the consolidation model needs the fully polished final version to be done before putback. So we missed out on any potential feedback on these early packages, which would’ve been quite valuable.

By mid-September the work was complete and the packages were ready. By now I had set up an internal IPS repository for distributing the finished work inside Sun. But as we have no external IPS repository to publish into, we still had no convenient way to truly publish the completed work. Again we’re missing out on all potential community feedback.

In mid-October all the processes have completed and we make the checkin into the SFW consolidation. Unfortunately checking the code in doesn’t really make it available unless you have access to the SFW gate and are willing to build it yourself. It’s not until 2 weeks later that the packages show up when snv_b76 .iso files become available. And after that, it’s roughly another week before b76 is available for download. So it’s only in early November that the general community – all of you – has easy access to readily installable packages, even though preliminary packages suitable for testing were available to us three full months earlier. The worst part is that by then it is so late in the release cycle for SXDE that even if we get suggestions or bug reports, there is little to no time to act on them. It is the kind of feedback that would’ve been most useful back in August.

Requirement: Ability for each component team to independently release early, release often.

5. Schedule synchronization

Another difficulty introduced by the consolidation model is the synchronization of schedules of all components. Since it is a single source tree built all at once (every 2 weeks), all putbacks must be synchronized around that beat. This is a lesser issue, but even at the current slow pace of change it has introduced some inefficiencies. Even minor inefficiencies can become a problem once scaled up to thousands of packages and/or contributors, so it is worth keeping in mind, particularly for packages maintained by community members who don’t necessarily work to Sun’s schedule.

Requirement: Same as #3: Independent teams need to be able to check in code and deliver packages without contending on a single synchronization point.

Well, it’s easy to see that the existing consolidation model doesn’t quite work for the goal of massively scaling up the number of packages available for OpenSolaris. But then, what’s next? In my next article I’ll explore some thoughts on ways we might move forward.