
Loosely Coupled weblog


Tuesday, November 23, 2004

Slashdotted

Visitors to Loosely Coupled will have found the site slow or totally unresponsive yesterday from about noon GMT because of a sudden burst of traffic hitting our server when we were slashdotted. Being slashdotted (definition here) is a bit like a denial-of-service attack, especially if, as in our case, the link is from the main page and controversial enough to generate a miniature flame war in the discussion thread (540 comments already at the time of writing, just 24 hours after the initial posting). Certainly our regular readers were denied the service they normally expect from Loosely Coupled, for which I apologize. However, the episode has given us some useful pointers as to how we should upgrade our infrastructure, which we plan to do this coming January. More on that below.

I guess I knew I was being provocative when I posted an item last week declaring J2EE: no longer required. Little did I realize quite how much controversy I was going to arouse. (Nor, to be honest, did I fully reflect on how pretentious and troll-like I was going to sound when I wrote about "a highly tuned 'text pump' to occupy the fabric/bus space in a transaction-intensive enterprise data center," a phrase that would be worthy of Pseuds Corner if it weren't so obscure). But it has been good to stimulate at least some debate and to be able to read the contributions.

Changing the threshold level to +3 reduces the number of comments to a more manageable and digestible 80. I think Eric Giguere makes an insightful comment that the comparison should not be J2EE to LAMP as a whole: "really, you're comparing the Java-based J2EE framework against similar Perl/PHP/Python frameworks ... Maybe for pure web apps the latter are better. I don't know, but you have to compare oranges to oranges." Of course, boiling it down to an oranges:oranges comparison still raises the question of whether this is an either|or decision anyway. It really does depend on what you're trying to do and, more importantly, what your starting point is in terms of what you already have installed.

The other takeaway I have from looking through the comments is that a lot of people who use J2EE and Java aren't in touch with how far PHP and related technologies have progressed (and a certain subset of them are determined to stay that way). It was even more surprising to see some otherwise well-informed posters stating they hadn't even come across the LAMP acronym before. Various contributors repeat the (incorrect) assumption that PHP can't scale, based presumably on their identification of PHP with small, personal websites, and the parallel assumption that if you have a big, important website you have to use J2EE. Fortunately, Chad Dickerson, CTO of InfoWorld, has just named "Underestimating PHP" as one of The top 20 IT mistakes to avoid, and reminds us why:

In a comment attached to a weblog post about Friendster's switch to PHP, Rasmus Lerdorf, inventor of PHP, explained the architectural secret of PHP's capability of scaling: "Scalability is gained by using a shared-nothing architecture where you can scale horizontally infinitely."

The stateless "shared-nothing" architecture of PHP means that each request is handled independently of all others, and simple horizontal scaling means adding more boxes. Any bottlenecks are limited to scaling a back-end database.
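To make the shared-nothing idea concrete, here is a minimal sketch in Python rather than PHP. The names `Box`, `make_pool` and `dispatch` are invented for illustration, and the dictionary stands in for the back-end database. The assumption is that a handler builds its response only from the request and the back-end store, so any box can answer any request and adding boxes does not change the result.

```python
from itertools import cycle


def handle_request(path, db):
    """Build a response from the request and the back-end store only.

    Nothing is remembered between calls, which is the shared-nothing property.
    """
    title = db.get(path, "Not found")
    return f"<html><body><h1>{title}</h1></body></html>"


class Box:
    """One web server (hypothetical). It keeps no session or per-user state."""

    def __init__(self, name, db):
        self.name = name
        self.db = db  # the shared back end is the only place state lives

    def serve(self, path):
        return handle_request(path, self.db)


def make_pool(size, db):
    """Horizontal scaling in this sketch is just a longer list of boxes."""
    return [Box(f"box{i}", db) for i in range(size)]


def dispatch(pool, paths):
    """Hand requests to the boxes round-robin and collect (box name, body)."""
    return [(box.name, box.serve(path)) for box, path in zip(cycle(pool), paths)]
```

Because no box holds state of its own, a pool of one and a pool of three return identical bodies for the same requests; only the back-end store remains as a shared bottleneck, as the quoted passage says.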

So if LAMP is so wonderful, why did Loosely Coupled's servers crumble under the Slashdot assault? Well, it wasn't actually any of the LAMP infrastructure that let us down. In fact, the site is architected with a big focus on performance and being able to sustain high loads (and is rated "fast" by Alexa). With the exception of a few dynamic pages, most pages are published to disk as flat HTML files (or XML in the case of our RSS feeds of course) by our PHP+XML content management system (strictly speaking then it's LAXP rather than LAMP). So the usual reason for fading under slashdotting — lack of bandwidth or server resources — didn't apply. We failed because the site runs on a virtual server configuration that strictly limits the number of concurrent connections. Here's what my hoster's technical support had to say in response to my enquiry whether I could add any upgrades to handle yesterday's spike:

Phil, the maximum number of simultaneous connections is 15 and that is non-negotiable. We do not plan to offer any plans that have more than that as this is the optimal setting. The reason for the connectivity issues is that you are being slashdot'ed.

And he very kindly supplied the referring URL. But I already knew exactly what the problem was, because Loosely Coupled, being a service-oriented kind of site, has its traffic monitored by a hosted traffic analysis service. So I can tell you exactly how many complete pages we served to real people (i.e. omitting any robots and spiders) via those 15 simultaneous connections (I say complete pages because the object that calls the traffic analysis servers is embedded near the end of the page code, which is 18.8k in size). In the first full hour of the onslaught (between noon and 1pm GMT), we only managed to serve 387 pages (to 273 visitors). Realizing that the problem was a connections issue, I tweaked the weblog page to take out some JavaScript includes and thus reduce the number of server hits per page impression, and managed to hit a peak of 839 page views (to 462 visitors) eight hours later, in the hour between noon and 1pm California time. Response times normalized shortly afterwards, implying that the contention for those connections probably became manageable during that hour of peak impressions (until then, many visitors were probably abandoning before the page had fully loaded). The server log shows that the hit rate for the site peaked at just over 9000 hits per hour, but I don't bother much with server logs, so I can't tell you which hour that was.
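A rough back-of-envelope sketch shows how a fixed connection cap turns into a ceiling on pages served. It uses Little's law (throughput = concurrency / time each request holds a slot) with the figures quoted above. Two things are assumptions rather than facts from the post: that all 15 slots were saturated during the peak hour, and that the 9000-hit peak and the 839-page peak fell in the same hour, which the server log does not confirm. The function names are illustrative only.

```python
MAX_CONNECTIONS = 15         # hoster's limit, from the support reply
PEAK_HITS_PER_HOUR = 9000    # server log figure ("just over 9000"), rounded down
PEAK_PAGES_PER_HOUR = 839    # traffic analysis figure for the best hour


def max_hits_per_hour(connections, mean_hold_seconds):
    """Little's law: a slot held for t seconds can serve 3600 / t hits an hour."""
    return connections * 3600 / mean_hold_seconds


def implied_hold_seconds(connections, hits_per_hour):
    """Mean time a hit occupied a slot, ASSUMING every slot was always busy."""
    return connections * 3600 / hits_per_hour


def pages_per_hour(connections, mean_hold_seconds, hits_per_page):
    """Complete pages per hour when each page costs several server hits."""
    return max_hits_per_hour(connections, mean_hold_seconds) / hits_per_page


# Under the saturation assumption, each hit held a connection for about 6 seconds.
hold = implied_hold_seconds(MAX_CONNECTIONS, PEAK_HITS_PER_HOUR)

# Only meaningful if both peaks were the same hour (an assumption): ~10.7 hits per page.
hits_per_page = PEAK_HITS_PER_HOUR / PEAK_PAGES_PER_HOUR
```

With the hold time fixed, the only lever left is hits per page: in this model, halving a hypothetical 20 hits per page to 10 doubles the page ceiling from 450 to 900 an hour, which is the effect the removal of the JavaScript includes was aiming for.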

The final tally for the past 24 hours is almost precisely 10,000 pageviews served to very nearly 6,000 visitors. I would guess that is lower, by a factor of around 2x to 3x, than the numbers we could have got had we been able to serve them all. That's what it means to be slashdotted.

How best to prepare for a similar occurrence in the future? I must confess I'm a bit miffed by the connections limit, because I'd reasoned that a virtual server setup would be better able to handle spikes than an individual dedicated server. But being limited to 15 connections negates those advantages, and keeps the server throttled well below any risk of coming up against any other limitations such as bandwidth, memory, disk access or processor. Since many of the connections were asking for the same file, a simple Squid box or similar on the front would have solved my problem, but that's only going to be on offer with a dedicated server setup, which brings those other potential issues into play. I'd feel much safer with a distributed solution, and after a chat yesterday with Limelight Networks, there's a very real possibility that come next January, Loosely Coupled will have graduated beyond the old-fashioned notion of server hosting to a much more twenty-first century style of in-the-network hosting. (An alternative for those who host on their own servers and want a free content delivery network is Coral, an NSF-funded project to establish a peer-to-peer CDN).
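The reasoning behind putting a cache in front of the server can be sketched in a few lines. This is not Squid's configuration or API, just an illustration of the idea: the class `CachingFront`, its `ttl_seconds` parameter and the `origin_hits` counter are all invented for the example. When most requests ask for the same file, the front answers from memory and only touches the connection-limited origin when its copy is missing or has expired.

```python
import time


class CachingFront:
    """A toy caching front end (hypothetical; real proxies do far more)."""

    def __init__(self, fetch_from_origin, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch_from_origin   # callable: path -> body, the slow origin
        self._ttl = ttl_seconds           # how long a cached copy stays fresh
        self._clock = clock               # injectable so expiry can be tested
        self._store = {}                  # path -> (expires_at, body)
        self.origin_hits = 0              # requests that reached the origin

    def get(self, path):
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]               # fresh copy: origin is not contacted
        body = self._fetch(path)          # miss or stale: one origin request
        self.origin_hits += 1
        self._store[path] = (now + self._ttl, body)
        return body
```

In this sketch a thousand requests for the same page within the freshness window cost the origin a single connection, which is why a burst aimed at one URL is the easy case for a cache.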

posted by Phil Wainewright 1:09 PM (GMT)

Assembling on-demand services to automate business, commerce, and the sharing of knowledge



Copyright © 2002-2005, Procullux Media Ltd. All Rights Reserved.