LCblog | Only transform | Feb 17th 2004 3:15pm

Only transform

Using Pro2xslt to transform Atom into RSS 2.0 neatly demonstrates the philosophy of loosely coupled systems, and illustrates the benefits compared to traditional approaches.

The need for an Atom-to-RSS utility arose out of Google's recent decision to use Atom as the default syndication format for its Blogger weblog publishing service. Both formats, as I explained previously, use XML. So the obvious solution is to use XSLT (Extensible Stylesheet Language Transformations) to convert one to the other. Well, at least, that's what you'd have thought, wouldn't you? That's what I assumed, anyway. But it turns out that forward-thinking, loosely coupled philosophers like me are still in a minority here.

The traditional response is typified by Phillip Pearson's Feed Normalizer, for which he has published the source code. It shows that his program uses regular-expression matching to parse the XML and extract the data, then uses text concatenation to wrap it back up in XML tags and spit it out. In other words, it treats the XML as if it were just a plain stream of characters, both on the way in and on the way back out.
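To make this concrete, here is a hypothetical sketch of the regular-expression-and-concatenation style in PHP. It is not Feed Normalizer's actual code; the function name naive_entry_to_item() and the sample markup are invented purely for illustration.

```php
<?php
// Hypothetical illustration of the "match and concatenate" technique;
// NOT Feed Normalizer's code. The XML is handled as a plain string.
function naive_entry_to_item($entryText)
{
    // Grab the text between <title> tags with a regular expression.
    $title = '';
    if (preg_match('/<title[^>]*>(.*?)<\/title>/s', $entryText, $m)) {
        $title = $m[1];
    }
    // Grab the first double-quoted href on any <link> tag, whatever its rel.
    $link = '';
    if (preg_match('/<link[^>]*href="([^"]*)"/', $entryText, $m)) {
        $link = $m[1];
    }
    // Wrap the pieces back up in RSS tags by string concatenation.
    return '<item><title>' . $title . '</title><link>' . $link . '</link></item>';
}

// prints <item><title>Only transform</title><link>http://example.com/1</link></item>
echo naive_entry_to_item(
    '<entry><title>Only transform</title>'
    . '<link rel="alternate" href="http://example.com/1"/></entry>'
), "\n";
```

The weakness shows as soon as the markup changes in a way XML itself treats as equivalent. Write the attribute as href='http://example.com/2' with single quotes and the pattern quietly finds nothing, producing an empty <link></link>; an XML parser accepts both forms without any extra code.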

There are sound pragmatic reasons for using this approach in some circumstances. Many XML feeds, especially in the blogging world, are not well-formed, which means a proper, standards-compliant XML processor will reject them. But you'd expect a company like Google to be capable of producing well-formed XML, wouldn't you? So in this case, using regular-expression matching and concatenation to tear the XML apart and then reassemble it is a real waste of an opportunity.
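Here is what that rejection looks like in practice. This is a minimal sketch using the expat-based XML Parser extension that has been part of PHP since version 4; is_well_formed() is a hypothetical helper name. An unescaped ampersand or a mismatched tag stops the parser outright, where a regular expression would carry on regardless.

```php
<?php
// Hypothetical helper: true only if $xml is well-formed XML.
// A conforming parser stops at the first error instead of guessing.
function is_well_formed($xml)
{
    $parser = xml_parser_create();
    // The third argument marks this as the final chunk, so elements
    // still unclosed at the end of the input also count as errors.
    return xml_parse($parser, $xml, true) === 1;
}

var_dump(is_well_formed('<feed><title>Tom &amp; Jerry</title></feed>')); // bool(true)
var_dump(is_well_formed('<feed><title>Tom & Jerry</title></feed>'));     // bool(false)
var_dump(is_well_formed('<feed><title>Unclosed</feed>'));                // bool(false)
```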

The penalty that Phillip pays is the penalty of tight coupling. He needs Atom-specific code to parse an Atom feed, and each time the Atom format changes, he has to change his program code. At least he's using code written by Mark Pilgrim, one of the instigators of Atom. "Atom's a moving target, but Mark has a pretty strong interest in tracking it, so all I need to do is periodically download new copies of his parser," writes Phillip. I suppose there are worse fates in programming than being tightly coupled to Mark Pilgrim. But in this instance, it's simply not necessary.

Using XSLT takes advantage of standardization to decouple the process of transformation from the definition of the document formats. In fact, a utility like Pro2xslt can be used with any combination of XML source document and stylesheet. Simply paste a new stylesheet URL into the second box on the form, and it will use that stylesheet to make the transformation. An instant, loosely coupled, code-free upgrade.

So here's a suggestion for a much better, service-oriented way of keeping up with changes to the Atom specification: let Atom's authors (or, alternatively, some altruistically minded third party) publish an XSL stylesheet that transforms Atom to RSS. Then anyone who wants to be sure of staying bang up to date with the format can simply feed that URL into their XSLT processor. No more waiting around for Mark Pilgrim to redo his code and then having to make time to upgrade your program with it. Let them publish the stylesheet as a service.
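To give a flavour of what such a published stylesheet might contain, here is a minimal, hypothetical Atom-to-RSS 2.0 stylesheet, applied with the XSLTProcessor class from PHP 5's XSL extension (the successor to PHP 4's xslt_*() functions; the extension must be enabled, and the nowdoc syntax needs PHP 5.3 or later). It assumes the Atom 0.3 namespace http://purl.org/atom/ns# used by feeds at the time (Atom 1.0 later moved to http://www.w3.org/2005/Atom), and it maps only titles, links, the feed tagline and entry IDs. A real stylesheet would also need to convert Atom's ISO 8601 dates into the RFC 822 dates RSS expects. The sample feed and the transform() function name are illustrative; in a live service the stylesheet would be fetched from its published URL rather than embedded.

```php
<?php
// Hypothetical minimal Atom 0.3 -> RSS 2.0 stylesheet (titles, links, IDs only).
$xsl = <<<'XSL'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:atom="http://purl.org/atom/ns#"
    exclude-result-prefixes="atom">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/atom:feed">
    <rss version="2.0">
      <channel>
        <title><xsl:value-of select="atom:title"/></title>
        <link><xsl:value-of select="atom:link[@rel='alternate']/@href"/></link>
        <description><xsl:value-of select="atom:tagline"/></description>
        <xsl:for-each select="atom:entry">
          <item>
            <title><xsl:value-of select="atom:title"/></title>
            <link><xsl:value-of select="atom:link[@rel='alternate']/@href"/></link>
            <guid isPermaLink="false"><xsl:value-of select="atom:id"/></guid>
          </item>
        </xsl:for-each>
      </channel>
    </rss>
  </xsl:template>
</xsl:stylesheet>
XSL;

// Illustrative sample feed, not a real Blogger feed.
$atom = <<<'XML'
<feed version="0.3" xmlns="http://purl.org/atom/ns#">
  <title>Example weblog</title>
  <link rel="alternate" type="text/html" href="http://example.com/"/>
  <tagline>A sample feed</tagline>
  <entry>
    <title>Only transform</title>
    <link rel="alternate" type="text/html" href="http://example.com/only-transform"/>
    <id>tag:example.com,2004:1</id>
  </entry>
</feed>
XML;

// Generic: works for any well-formed source document and any stylesheet.
function transform($xmlText, $xslText)
{
    $source = new DOMDocument();
    $stylesheet = new DOMDocument();
    if (!$source->loadXML($xmlText) || !$stylesheet->loadXML($xslText)) {
        throw new RuntimeException('Input is not well-formed XML');
    }
    $processor = new XSLTProcessor();
    $processor->importStylesheet($stylesheet);
    $result = $processor->transformToXML($source);
    if ($result === false) {
        throw new RuntimeException('XSLT transformation failed');
    }
    return $result;
}

$rss = transform($atom, $xsl);
echo $rss;
```

Note that nothing in transform() knows anything about Atom or RSS: upgrading to a newer version of the format means swapping the stylesheet, not the program.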

If those Atom people were really smart, they'd publish a whole catalogue of stylesheets: Atom-to-RSS 0.92, Atom-to-RSS 1.0, Atom-to-RSS 2.0, Atom-to-WordprocessingML ... and probably offer a free transformation utility too. Joel Spolsky explained why in "Let Me Go Back!", an essay he wrote in June 2000: "... eliminating barriers to switching is the most important thing you have to do if you want to take over an existing market ... make an honest promise that it will be easy to back out of the service if they're not happy, and suddenly you eliminate one more barrier to entry."

What's really stunning about XSLT is that it provides the same get-out-of-jail-free card to every single user of every XML-based document format on the Web. Widespread adoption of XML as a standard is eliminating barriers to switching, and opening up markets to all manner of new entrants — if only they'd take advantage of it.

It's not as if XSLT processing is difficult to do. Here to prove it is the functional code of Pro2xslt. It's just eight lines (spread out over 16 lines here for display neatness):

<?php 
    $xmlContent = file_get_contents(
        "http://example.com/document.xml"
        );
    $xslContent = file_get_contents(
        "http://example.com/stylesheet.xsl"
        );
    $th = xslt_create() or die;
    $args = array(
        "/xml" => $xmlContent,
        "/xsl" => $xslContent
        );
    $result = @xslt_process($th, 'arg:/xml', 'arg:/xsl', NULL, $args)
        or die (xslt_errno($th) .": ". xslt_error($th)); 
    xslt_free($th);
    header("Content-Type: text/xml");
    print($result);
?>

Actually, the live code has a couple more lines, because I'm running on a hosted server with PHP 4.2.3 and have to read the files with fread(): file_get_contents() wasn't added until PHP 4.3.0. But all the XSLT processing comes built in with 4.2.2 or above, even on $15-a-month shared-server hosting deals.
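For anyone stuck on a pre-4.3.0 host, a fallback along these lines does the job. This is a sketch rather than the live Pro2xslt code; the helper names fetch_contents() and fread_contents() are hypothetical, and reading a remote URL this way requires the allow_url_fopen setting to be enabled.

```php
<?php
// Hypothetical helper: read a whole file or URL with fopen()/fread(),
// for PHP versions before 4.3.0 that lack file_get_contents().
function fread_contents($url)
{
    $handle = fopen($url, 'rb');
    if ($handle === false) {
        return false;
    }
    $contents = '';
    // A single fread() may return less than requested (especially on
    // network streams), so keep reading until end-of-file.
    while (!feof($handle)) {
        $chunk = fread($handle, 8192);
        if ($chunk === false) {
            break;
        }
        $contents .= $chunk;
    }
    fclose($handle);
    return $contents;
}

// Use the built-in where it exists (PHP 4.3.0 and later), else fall back.
function fetch_contents($url)
{
    if (function_exists('file_get_contents')) {
        return file_get_contents($url);
    }
    return fread_contents($url);
}
```

Each file_get_contents() call in the script above would then become a call to fetch_contents() with the same URL.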

So this is not rocket science. It doesn't need enterprise budgets to achieve. And XSL stylesheets are well within the competence of anyone who can cope with HTML and PHP or similar languages. There's no major skills barrier here.

The only barriers are habit and custom. People just aren't used to having standardized formats and free software that allow them to reformat data without first having to study the fine grain of its structure. They're still thinking of parsing, inspecting, cleansing and reformatting as one tightly integrated software process, when in truth they need only transform.

posted by Phil Wainewright 3:15 PM (GMT)
