Building An XML Workbench

Just like the workbench in my garage, my Web development workbench is often covered with partially assembled widgets, various loose nuts and bolts, and lots of tools, at least while I’m in learning mode. But once I’ve evaluated the project and decided what materials and tools I’ll need, I need a clean space to work in. So, with a swipe of an arm I clear off the workbench and neatly place the bare necessities on the bench top. Then, and only then, am I ready to begin.

This month, I’ll show you how to set up a workbench of tools you can use to process XML on your Web server. The environment I describe is a low-end solution. That is, I’m not assuming you have high-speed network access, or that you have administrative privileges on your server. All you need is a 56K modem and support for Java servlets on your server. The tools I describe are portable, meaning you can set up a similar workbench under Mac, Unix, or Windows. When you’re done installing the tools I describe in this column, you’ll have everything you need to take advantage of XML on your Web site. The best news is that most of these tools are free.

Server-Side XML

After many months of examining XML tools, I’ve come to several conclusions. First, XML is a plate that’s better served. That is, client-side processing of XML will always be doomed to inconsistent support. Let’s face it, Web developers still jump through hoops to get many of HTML 4’s features to behave consistently in the major browsers. In fact, we can use XML and XSL on the server to solve that very problem. Imagine a system that stores its documents in XML format, then uses XSL to dynamically transform these documents into basic HTML. You can even define an XSL style sheet for each brand of browser you intend to support, and with a little browser detection serve HTML optimized for that specific browser. In fact, you’ll be able to do that by the end of this column.

Secondly, I’ve decided that Java is generally a better complement to XML than scripting. In my development scenario, I farm out to a Web hosting service that uses an Apache server on a Linux box. My development platform, on the other hand, is a Windows environment. Java allows me to write and test code on my  machine, and move it over to the Web server without recompiling.  If you’re cringing at the thought of writing Java, don’t worry. This isn’t rocket science and it’s a lot easier than writing Perl. (Sorry, Randal!) Basically, if you can write an applet, you can manage a servlet.

To understand how the tools on our workbench interoperate, it might help to look at the big picture. The foundation is the Java Development Kit and Java Servlet Development Kit from Sun, and the JRun Pro Servlet Engine from Live Software. The key components are the XML for Java Processor, LotusXSL, and XML Enabler, all freely available from IBM; see Online for the URLs to all of these tools. I’m also recommending that you grab a copy of “Java Servlet Programming” by Jason Hunter and William Crawford (O’Reilly, Sebastopol, CA, 1998).

Java Development Kit

The Java Development Kit (JDK) needs little introduction. It is the basis for all Java development, so you’ll need to install the JDK before anything else. If you’ve already installed the JDK on your system, you’ll want to make sure that you have the proper version. The IBM tools are mixed in their support of the JDK: The XML for Java processor supports JDK 1.2, but the XSL processor only works under JDK 1.1, so that will be our base platform.

Also, keep in mind that you may have already installed the JDK as part of a commercial Java development environment. For example, I’m using Symantec’s Visual Café for developing code, compiling classes, and so on. As part of the installation, Visual Café automatically installs the JDK 1.1 on my machine. The bottom line is to check whether you have a JDK installed and, if so, ensure that it is version 1.1.x. You can run both versions of the JDK, so you don’t have to give up the latest update to build this workbench. If you plan to install both versions, check the JDK documentation for configuration details.

If you’ve downloaded the JDK from Sun’s Web site (see Online), the installation process varies depending upon the platform you’re running. The Windows version comes as an executable archive. Double-clicking on the archive file invokes the JDK installer, which creates the directory structure and unpacks the tools and documentation bundles. On Solaris, you’ll have to do the unpacking manually. Once you’ve unpacked the archive, you can delete the archive file to recover the nearly 9 MB of disk space it chews up.

Next, do not indiscriminately unpack every .zip file you see in the directory tree. Recall that a feature of Java is its ability to locate class files that are stored in archives including JAR (Java ARchive) and .zip files. In particular, you’ll find a file named CLASSES.ZIP in the lib directory, which contains all of the core Java classes. Do not unzip this file.

Depending on how you’ve obtained the JDK (directly from Sun, or through third-party software), you may or may not have to set environment variables. If you’ve followed the installation procedures for the “raw” JDK you don’t need to set CLASSPATH; see the text box entitled “All About CLASSPATH” for general information on setting and using this environment variable.

To Servlet and Protect

Servlets have become increasingly popular for many reasons. First, they can be used to generate documents on the fly, thus replacing CGI scripts that often rely on server APIs. You also gain the benefits of Java including built-in support for network sockets, database connectivity, and string manipulation. More importantly, your servlets are easily portable to any Java-enabled Web server. In fact, I’m able to develop my servlets on my Windows 95 machine and copy them directly over to my Web server, which is running Apache under Linux.

The first thing you’ll need to do is ensure that your server supports Java servlets. All of the major Web servers support them, but support may not be enabled. Check with your system administrator or Web hosting service to determine if your server supports them. If your server doesn’t support servlets, then you’ll need to get one of the many servlet engines available. JRun Pro, which I’ll describe in a moment, should serve your needs quite nicely.

The servlet API is a standard Java extension and comes as part of the JDK 1.2. However, we’re using JDK 1.1, so you’ll need to download the Java Servlet Development Kit (JSDK) 2.0 from Sun’s Web site; see Online. Once you have the JSDK, unzip the archive to a directory on your hard disk. Assuming you’ve installed the JSDK in a directory called “jsdk,” you’ll need to include the path to the jsdk/lib/jsdk.jar file in your CLASSPATH. (See “All About CLASSPATH” for details on setting this variable.) That’s it.
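If you want to confirm that the jar really is visible to Java, one quick check is to ask the JVM to load a servlet class by name. Here is a minimal sketch of that idea. The class ClasspathCheck and its isAvailable helper are my own hypothetical names, not part of the JSDK, and the code assumes a current JDK rather than JDK 1.1. The only JSDK name it relies on is javax.servlet.http.HttpServlet, the base class for HTTP servlets.

```java
import java.io.File;
import java.util.regex.Pattern;

// Hypothetical helper (not part of the JSDK): reports whether a class
// can be located through the current class path.
class ClasspathCheck {

    // Returns true if the named class can be found, false otherwise.
    // The class is located but not initialized.
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // List every entry the JVM is actually using, one per line.
        String classPath = System.getProperty("java.class.path");
        for (String entry : classPath.split(Pattern.quote(File.pathSeparator))) {
            System.out.println("class path entry: " + entry);
        }

        // Base class for HTTP servlets, supplied by the servlet API jar.
        String servletClass = "javax.servlet.http.HttpServlet";
        System.out.println(servletClass
                + (isAvailable(servletClass) ? " found" : " NOT found; check CLASSPATH"));
    }
}
```

If the servlet class is reported as not found, the listing of class path entries is usually enough to spot a mistyped directory or a missing jar.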

The JSDK also provides a simple Java server, ServletRunner, for testing servlets locally before deploying. However, I’m using JRun Pro version 2.2.1 from Live Software for this purpose. JRun is a server extension that implements the Java Servlet API, and it includes a collection of Java classes that acts as an interface layer between your servlets and your Web server.  JRun isn’t required, but it has proved to be an extremely useful addition for my servlet development. JRun supports many advanced features including servlet chaining and filtering, dynamic reloading of modified servlets, <SERVLET> tag support, user session tracking, an integrated JRun Web server, and much more. JRun also improves performance through native code that interfaces directly with your web server.  I should also mention that the basic version of JRun (freely available; see Online for details) was selected by the editors of Web Techniques and Web Review as the best Java Tool at last year’s Web Tools awards. I’ll talk a lot about JRun Pro (and other Servlet engines) in future projects.

To obtain JRun, you’ll need to fill out Live Software’s online registration form. Live Software’s Web site generates an automatic email message telling you where you can download the software. The download file is another executable archive that invokes an installation program. The installer program also launches the JRun Connector Wizard, which guides you through the process of installing connectors between your Web server and the JRun Servlet Engine.

Whichever approach you take, you should test your configuration by running some sample servlets. I would do this on both your local development machine and on the server where you’ll deploy your servlets.

The XML Processor

The next part of your development platform is the XML processor. To run with our other tools, you have a choice of installing either Microsoft’s XML parser, or IBM’s XML Parser for Java (XML4J). Because it was recently updated to support both the XML 1.0 Recommendation released in February and the Document Object Model (DOM) Level 1 specification, I wanted to look at XML4J. IBM’s XML4J processor also provides support for XML namespaces and the Simple API for XML (SAX) 1.0, and it includes an XPointer package that parses XPointer expressions, can generate an XPointer based on a node in the document tree, and allows your application to search for nodes referenced by XPointers. The processor supports some 37 encodings (as specified in the <?xml encoding=…> declaration), including several variants of UTF, ISO, and EBCDIC encodings. And, XML4J supports a feature, “validating generation,” that allows an application to query a DTD and generate a document with the corresponding structure. All very cool stuff, and necessary for serious XML development.
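To give a feel for what DOM support buys you, here is a minimal sketch that parses a small document into a tree and walks it. Note the assumptions: this uses the JAXP classes (javax.xml.parsers) bundled with current JDKs, not the XML4J API itself, and it calls getTextContent, which arrived in a later DOM level than the Level 1 specification mentioned above. The class name DomSketch and the sample document are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative only: JAXP stands in here for the XML4J parser.
class DomSketch {

    // Parse an XML string into a DOM tree; throws if it is not well formed.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Count the elements with the given tag name anywhere in the tree.
    static int countElements(Document doc, String tagName) {
        return doc.getElementsByTagName(tagName).getLength();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data, not the personal.xml shipped with XML4J.
        String xml = "<personnel>"
                + "<person id=\"a1\"><name>Ann</name></person>"
                + "<person id=\"b2\"><name>Bob</name></person>"
                + "</personnel>";

        Document doc = parse(xml);
        NodeList people = doc.getElementsByTagName("person");
        for (int i = 0; i < people.getLength(); i++) {
            Element person = (Element) people.item(i);
            String name = person.getElementsByTagName("name").item(0).getTextContent();
            System.out.println(person.getAttribute("id") + ": " + name);
        }
    }
}
```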

Installation is a matter of downloading the appropriate version of xml4j.zip from IBM’s AlphaWorks Web site and unpacking it into a new directory. The appropriate version is determined by our next tool, LotusXSL, which I’ll describe in a moment. (I’m running version 1.1.1.4.) To test the installation from Windows, open up a DOS window and issue the command:

type data\personal.xml

If the personal.xml file is there, it will display on the screen. Next, run the Java Runtime Environment tool (part of the JDK) with the following command line:

jre -cp xml4j_1_1_14.jar;xml4jSamples_1_1_14.jar samples.XJParse.XJParse -d data\personal.xml

Note that the jar files must be specified in the command line. That’s because JRE ignores the CLASSPATH environment variable. If all goes well, this command line invokes XJParse, which parses personal.xml and checks the syntax. This command line also regenerates personal.xml. If no error messages are shown, the test has passed and you should be able to display the personal.xml file on the screen again using the DOS type command.

Once you have the processor working, you can start experimenting with some of the tools including a Channel Definition Format (CDF) editor, a CDF viewer, and SiteOutliner, which scans a Web site and reports its profile in CDF format. If you’ve installed the Java Swing library, you can also run the Tree Viewer, which displays a tree structure of an XML document.

Adding XSL

The XSL processor I’ve chosen for our workbench is LotusXSL from Lotus and IBM. The processor uses XML4J to parse an XML document and output a source tree. LotusXSL takes this source tree and creates a result tree, which is used to output a document. IBM is careful to note that LotusXSL is an experimental tool. That’s because XSL was still a draft specification and the official W3C XSL Recommendation was still pending at the time of this writing. Most significantly, LotusXSL does not currently support flow objects, a big part of XSL. These flow objects conceptually parallel the formatting objects in Cascading Style Sheets (CSS). However, there are still many unresolved issues related to the implementation of flow objects. So for the time being, you can transform XML documents to be output as HTML, which can include CSS style rules. I’ll show you how in a future column.
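The source-tree-to-result-tree idea is easier to see in code. What follows is a minimal sketch under some loud assumptions: it uses the javax.xml.transform classes bundled with current JDKs rather than the LotusXSL API, and the stylesheet is written in the syntax of the later XSLT 1.0 Recommendation, which differs from the draft syntax LotusXSL handled at the time. The class name XslSketch, the sample document, and the stylesheet are all made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Illustrative only: the JDK's built-in transformer stands in for LotusXSL.
class XslSketch {

    // Apply the stylesheet to the XML source and return the serialized result.
    static String transform(String xml, String xsl) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up source document.
        String xml = "<article><title>Workbench</title></article>";

        // Made-up stylesheet: turns the article title into an HTML heading.
        String xsl =
              "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"html\"/>"
            + "<xsl:template match=\"/article\">"
            + "<html><body><h1><xsl:value-of select=\"title\"/></h1></body></html>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

        System.out.println(transform(xml, xsl));
    }
}
```

The XML document never mentions HTML, and the stylesheet never mentions the specific title. That separation is what lets one document feed several stylesheets.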

To install LotusXSL, download the latest release from the AlphaWorks Web site. I’m working with version 1.1.1.6 (second release) for this article. If you’re running Windows, you’ll get a .zip file which you should unpack into a new directory. The unpacking process extracts the documentation, source files, and another .zip file containing the binaries. You’ll need to unpack this file from the root LotusXSL directory to complete the installation. You can test the installation by opening a DOS window and going to the /testsuite directory, then entering the command line:

test test1

This invokes a batch file which temporarily resets your CLASSPATH and runs the processor on a test case. Assuming you plan to use the processor on your workbench, be sure to add this path string to your permanent CLASSPATH.

During this part of the installation, I ran into a number of problems that may potentially bite you. First, I had a problem running out of environment space. The way test.bat works is that it saves your old CLASSPATH in a separate environment variable. Next, it redefines your CLASSPATH by adding two new jar files to the path, and appending the old CLASSPATH. All of this can eat up environment space, causing you to receive an “out of environment space” error. I solved this by removing the savedCLASSPATH setting and hard coding the entire CLASSPATH string; see “All About CLASSPATH” for additional details.

Once I’d solved the environment space problem, I tried running test.bat again. This time I began receiving errors that some of the classes couldn’t be found. I carefully checked the CLASSPATH, and verified that all class and jar files were correctly installed. Eventually, I concluded that the current version of the XSL processor didn’t support the most recent XML4J release. The LotusXSL documentation stated that you needed XML4J version 1.1.1.9. However, a newer version, XML4J 1.1.1.4, had just been posted. Despite the apparently lower version number, 1.1.1.4 was a newer release; the unorthodox version numbering scheme seemed to further promote confusion. When I returned to the AlphaWorks Web site to download the older version of XML4J two days later, an update to the XSL processor had been posted which supported XML4J 1.1.1.4. Once I had the proper versions installed, things worked seamlessly. I suppose there’s something to be said for shrink-wrapped software and bundled solutions.

XML Enabler

The final tool, XML Enabler, is a servlet that takes an HTTP request from a browser, and uses information in the HTTP header to determine which type of browser made the request. The servlet then selects an XSL stylesheet from a collection of stylesheets, transforms the data into HTML and sends it back in a response. By customizing different stylesheets for various browsers, you can optimize the HTML output for that specific browser. So now, you’ll be able to render XML data in virtually any browser. Once you’ve installed the tools described here, all you have to do is define a mapping between browser types you want to support and their corresponding stylesheets. Of course, you’ll have to define the stylesheets themselves. I’ll tackle this next month. First, let’s install the final component in your XML workbench.
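The browser-to-stylesheet mapping boils down to a lookup keyed on the User-Agent header. Here is a minimal sketch of that lookup in plain Java. To be clear, this is not XML Enabler’s code or its configuration format; the class StylesheetSelector, the matching rule (case-insensitive substring, first match wins), and the stylesheet file names are all my own assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: picks a stylesheet name from a User-Agent string.
class StylesheetSelector {
    // LinkedHashMap keeps insertion order, so earlier rules win.
    private final Map<String, String> rules = new LinkedHashMap<>();
    private final String fallback;

    StylesheetSelector(String fallback) {
        this.fallback = fallback;
    }

    // Register a rule: if the User-Agent contains the fragment, use the stylesheet.
    StylesheetSelector map(String userAgentFragment, String stylesheet) {
        rules.put(userAgentFragment.toLowerCase(Locale.ROOT), stylesheet);
        return this;
    }

    // Return the stylesheet of the first matching rule, or the fallback.
    String select(String userAgent) {
        if (userAgent == null) {
            return fallback;
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            if (ua.contains(rule.getKey())) {
                return rule.getValue();
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // File names are placeholders. "msie" must come before "mozilla"
        // because Internet Explorer also identifies itself as Mozilla.
        StylesheetSelector selector = new StylesheetSelector("default.xsl")
                .map("msie", "ie.xsl")
                .map("mozilla", "netscape.xsl");

        System.out.println(selector.select("Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"));
        System.out.println(selector.select("Mozilla/4.5 [en] (Win95; I)"));
        System.out.println(selector.select("Lynx/2.8"));
    }
}
```

In a servlet, the string passed to select would come from the request’s User-Agent header. Rule order matters here, which is why the sketch uses an ordered map rather than a plain hash table.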

Assuming you’ve downloaded the XMLEnabler archive (see Online) and unpacked the file to a new directory, the next step is to add support for your servlet to your Web server. There are many ways to do this depending on your server, so you’ll have to rely on your server’s documentation for the precise steps. However, most servers support a servlet.properties file where you can associate your servlet with its class. If you’re using the ServletRunner from the JSDK, or possibly the Java Web Server, you’d add this line to your servlet.properties file:

servlet.xmlenabler.code=com.ibm.XMLEnabler.XMLEnabler

You’ll also want to place the XMLEnabler package in your servlets directory. Then, assuming your servlet engine is running, you should now be able to access XMLEnabler through your browser. Currently, you must pass the name of the XML document to be parsed as a parameter in the URL:

http://localhost:8080/servlet/com.ibm.XMLEnabler.XMLEnabler?URL=http://localhost/myDocument.xml
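If you generate these links from code, and especially if the document address carries a query string of its own, it is general HTTP practice to encode the parameter value so the two URLs don’t run together. Here is a minimal sketch using java.net.URLEncoder. I haven’t verified how XML Enabler itself treats an encoded value, so take this as an assumption to test on your own setup. The class name EnablerUrl and the sample query string are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a request URL with the document address
// passed as an encoded "URL" parameter.
class EnablerUrl {

    static String build(String servletUrl, String documentUrl)
            throws UnsupportedEncodingException {
        // Encode reserved characters (":", "/", "?", "&", "=") in the value.
        return servletUrl + "?URL=" + URLEncoder.encode(documentUrl, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String servlet = "http://localhost:8080/servlet/com.ibm.XMLEnabler.XMLEnabler";
        // The query string on the document address is made up for illustration.
        System.out.println(build(servlet, "http://localhost/myDocument.xml?x=1&y=2"));
    }
}
```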

Conclusion

After examining different approaches to XML, it’s clear to me that XML will primarily be used under the covers of your Web server. Client-side processing of XML only makes sense in an intranet environment where you’re certain that designated browsers support XML. Indeed, I believe in the future there will be significant advantages to client-side XML in, say, an all Microsoft environment. For the heterogeneous world, however, server-side XML just makes more sense.

There’s a lot to chew on here. But if you’re serious about using XML on your Web site, the effort will be worth it. Next month, I’ll show you what new magic you can perform with your new tools. Until then.