Article from January, 2000.

Inclusion, Entities, XLink, and XML Evolution

By Bob DuCharme

Bob DuCharme is an assistant vice-president at Moody's Investors Service. He wrote the Prentice Hall books XML: The Annotated Specification (now available in Japanese) and SGML CD, a tutorial and user's guide to free SGML software. See http://www.snee.com/bob for more information.

Abstract
On November 23, 1999, the W3C Note "XML Inclusion Proposal (XInclude)" was published on the W3C web site at http://www.w3.org/TR/xinclude. Several XML news sites briefly mentioned the new document, but I saw no discussion of its importance on any of these sites or the XML mailing list or comp.text.xml. This month's feature looks at the note.

I think the note is very important, but first, let's look at the history leading up to it.

SGML, External Entities, and XML

Part of the point of XML was that some SGML experts developed a version of SGML that threw out the odd, rarely used parts and kept the important basics: elements, attributes, entities, notations, processing instructions, and comments. HTML people and database people (the latter of whom knew their HTML basics) already knew the comment syntax and were excited about making up their own element types and attribute lists. They typically didn't bother with processing instructions and notations, whose usefulness was even less apparent to them than it was to seasoned SGML users.

Many categories of entities left them confused. After all, there are three pairs of "two kinds of entities": internal and external, general and parameter, and parsed and unparsed. It is confusing. Again, their familiarity with HTML led them to easily accept the use of entity references such as &lt;, &amp;, and others they used in their Web pages, but they managed to accomplish a lot without any parameter entities or unparsed entities. Still, they couldn't avoid the usefulness of external parsed general entities.
Programmers are accustomed to having an instruction for their source files that tells a processing program "insert another source file here as if it were part of this one." This lets them share controlled content between multiple files, it makes systems more modular, and it makes it easier to change a common part of multiple "files" when necessary.

Not knowing how to use general external entities, many developers posted messages to XML newsgroups and mailing lists asking "how can I include an existing file inside an XML document?" Others complained that XML's requirement of a single document element enclosing all the other elements made it impractical to store, for example, log files in XML format: they insisted that a logging program must read in the entire log file, insert the new log entry just before the document element's closing end-tag, and write out the log file.

I pointed out to one of these complainers that external general entities make this easy. With a master XML file like this, he could append entries to his logfile with no need to read the whole thing into memory:

    <!DOCTYPE eventLog [
    <!ELEMENT eventLog (event+)>
    <!ELEMENT event (subElement1, subElement2)>
    <!ENTITY logfile SYSTEM "logfile.txt">
    ]>
    <eventLog>&logfile;</eventLog>

Whenever he wanted an XML document of the events, he could feed the above document to his parser. It turned out that he knew about this syntax, but considered it a "weird kludge." It was some tricky, odd syntax that didn't fit into the spirit of XML as he understood it.

Those of us who've been doing it for a while don't consider it odd, but if you think about it, it is responsible for some of XML's less elegant aspects. For example, to allow for modular documents that are well-formed but not necessarily valid, a non-validating parser must still check for ENTITY declarations.
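The append-only log idea can be sketched in Python with the standard library's expat bindings, which are non-validating yet still read the ENTITY declaration in the internal subset. This is an illustrative sketch, not the correspondent's actual setup: the helper names `append_event` and `read_events`, the simplified `event` content model, and the assumption that `logfile.txt` sits in a directory the caller supplies are all hypothetical.

```python
import os
from xml.parsers import expat
from xml.sax.saxutils import escape

# Hypothetical master document modeled on the one above. The content model
# for <event> is simplified; expat does not validate, so the ELEMENT
# declarations are informational only.
MASTER = """<!DOCTYPE eventLog [
<!ELEMENT eventLog (event+)>
<!ELEMENT event (#PCDATA)>
<!ENTITY logfile SYSTEM "logfile.txt">
]>
<eventLog>&logfile;</eventLog>"""


def append_event(log_path, message):
    """Append one <event> to the log without reading existing entries."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write("<event>%s</event>\n" % escape(message))


def read_events(master_xml, base_dir):
    """Parse the master document, pulling in the external entity on demand."""
    events = []
    text = None  # collects character data while inside an <event>

    def start_element(name, attrs):
        nonlocal text
        if name == "event":
            text = []

    def end_element(name):
        nonlocal text
        if name == "event":
            events.append("".join(text))
            text = None

    def char_data(data):
        if text is not None:
            text.append(data)

    def external_entity(context, base, system_id, public_id):
        # Called when the parser meets &logfile;. The sub-parser inherits
        # the handlers above, so included events are collected the same way.
        sub = parser.ExternalEntityParserCreate(context)
        with open(os.path.join(base_dir, system_id), "rb") as entity:
            sub.ParseFile(entity)
        return 1

    parser = expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = char_data
    parser.ExternalEntityRefHandler = external_entity
    parser.Parse(master_xml, True)
    return events
```

The logging side only ever opens the file in append mode; the single enclosing document element exists only in the small master file, which is read when someone actually wants the events as one XML document.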
In other words, a well-formed document doesn't need a DTD, but a parser that only worries about well-formedness must still check for this class of DTD declarations. One basic argument against this special syntax for external general entities echoes a key argument for using schemas instead of traditional XML 1.0 DTDs: if elements and attributes are so good at representing such a wide variety of information structures, why do we need this other syntax to describe these crucial information structures?

A Proposed Alternative

Some argue that the XLink spec offers the best way to identify documents to include as subdocuments, but this asks XLink to do a job that it intentionally avoids. Although the XInclude spec was created as part of the W3C XML Linking Working Group's activity, it's a separate spec for a reason. As it tells us, "Inclusion features [in XInclude] differ from the linking features described in XLink in that they require specific behavior from the inclusion processor." XLink offers ways to describe relationships, not specific actions to represent those relationships.

According to XInclude creator and IBM e-business architect Dave Orchard, "I created the predecessor to XInclude in April '99 because I saw the need for an XML element/attribute syntax for any XML vocabulary to specify an inclusion processing model. I've been very pleased to work with Jonathan Marsh of Microsoft to help refine the original work into a much better proposal."

The XInclude proposal achieves its goal by describing an "inclusion facility" that uses a specialized element type for including other files within that element's document. (Actually, it doesn't define itself in terms of "files" but in terms of "infosets," because a W3C spec shouldn't preclude use on operating systems such as OS/400 and NeXTSTEP that don't use the concept of "files."
Come to think of it, this is why XML and SGML used the term "external entities," but a new syntax to include entities would get tied up in much of the baggage that the Working Group is trying to shed. See http://www.w3.org/TR/xml-infoset for more on XML Information Sets.)

The specialized element type is called include. As with other XML add-on specs, it has its own namespace so that an XInclude-aware processor doesn't confuse it with an include element type that you may have defined for your own document type. The following shows one way to use an include element according to the proposal's syntax:

    <myEssay xmlns:xi="http://www.w3.org/1999/XML/xinclude">
      <xi:include href="http://www.snee.com/boilerplate/copyrightNotice.xml" parse="xml"/>
      <par>I met her in a club down in old Soho.</par>
    </myEssay>

There are several important things to notice about it:
These alternatives will make it easier to include little files used as example programming code or sample markup, because the files can be run, tested, and included in the master document as they are, with no special processing to incorporate them (for example, without converting < characters to &lt; entity references, as I had to do to include the example above in this column).

At 15 printed pages, the XInclude proposal is not very long, but it addresses many potential problems with a new method for allowing inclusion:
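To make the idea of an inclusion processor concrete, here is a rough Python sketch that expands include elements in the namespace used in the example above. It is not an implementation of the XInclude proposal, which is defined in terms of infosets. The function name `expand_includes`, the caller-supplied `loader`, the choice to treat any parse value other than "xml" as literal text, and the default of "xml" when parse is absent are all assumptions made for illustration. The sketch also does not expand includes nested inside included content, and it ignores an include used as the document element.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the example in this article (the 1999 proposal).
XI_NS = "http://www.w3.org/1999/XML/xinclude"
XI_INCLUDE = "{%s}include" % XI_NS


def expand_includes(parent, loader):
    """Replace xi:include descendants of `parent` in place.

    `loader` is a hypothetical callable mapping an href to a string.
    """
    i = 0
    while i < len(parent):
        child = parent[i]
        if child.tag != XI_INCLUDE:
            expand_includes(child, loader)  # look deeper in ordinary elements
            i += 1
            continue

        data = loader(child.get("href"))
        if child.get("parse", "xml") == "xml":  # default is an assumption
            # Parsed inclusion: the included document element takes the
            # place of the include element.
            included = ET.fromstring(data)
            included.tail = child.tail
            parent[i] = included
            i += 1
        else:
            # Unparsed inclusion: the content becomes character data, so
            # markup characters in it need no escaping by the author.
            text = data + (child.tail or "")
            if i == 0:
                parent.text = (parent.text or "") + text
            else:
                previous = parent[i - 1]
                previous.tail = (previous.tail or "") + text
            del parent[i]
```

A loader can be as simple as a dictionary lookup (`docs.__getitem__`) for experimenting, or a function that fetches a URL; keeping it as a parameter leaves the sketch independent of where the included material lives, in the same spirit as the proposal's avoidance of the word "files."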
The document is technically not even a Working Draft, which is typically the longest stage before a W3C spec becomes a Proposed Recommendation and then a Recommendation (an official W3C spec). It is a Note, meaning that it's merely a "dated, public record of an idea, comment, or document." While it may never become an official Recommendation, I'm guessing that this wouldn't be due to a lack of importance but instead because it gets subsumed into some other W3C spec such as a more broadly focused XLink or even a future revision of the XML spec itself. Whatever happens to the XInclude proposal, its contents have a real future in the XML world. Keep an eye on it. <end/>
URL for this Article:
http://architag.com/tag/Article.asp?v=14&i=1&p=1&s=2

This article was printed from the <TAG> Newsletter web site, http://www.architag.com/tag. Copyright © 2000 Architag International Corporation. All Rights Reserved. Printing, distribution, and use of this material is governed by U.S. and International copyright laws.