XML - A Closer Look
- The XML specification sets out the following goals for XML:
- It shall be straightforward to use XML over the Internet.
- XML shall support a wide variety of applications.
- XML shall be compatible with SGML.
- It shall be easy to write programs that process XML documents.
- The number of optional features in XML is to be kept to an absolute minimum, ideally zero.
- XML documents should be human-legible and reasonably clear.
- The XML design should be prepared quickly.
- The design of XML shall be formal and concise.
- XML documents shall be easy to create.
- Terseness in XML markup is of minimal importance.
- How Is XML Defined?
- XML, the Extensible Markup Language (02/10/98)
- XLL, the Extensible Linking Language (???)
- XSL, the Extensible Style Language (in progress)
- XUA, the XML User Agent (not started yet)
- Multidimensional Documents
- A plain text file is one-dimensional (contains no information about its content)
- HTML is two-dimensional (contains basic markup that describes its content)
- XML is multidimensional - capable of being processed by different programs, delivered by different methods, and displayed in different views.
- XML is "object-oriented"
- Document Object Model
- makes it possible to address all the elements of an HTML document
- able to combine the content of the document, the logic that controls it, and the styles in one package
- The Scripting Dimension
- JavaScript is an object-oriented language of sorts.
- JavaScript understands that the document is an object and that it has certain properties that you can reference or set. For instance you can reference the title or the location.
- Limitations of the JavaScript approach:
- you cannot reference the contents of a document (the third paragraph, the top level headings, the table, etc.)
- you cannot use the script to control the display of elements.
- when content is embedded in an application, it takes a programmer to create and modify the application. Instead, we should separate the content from the scripting logic that controls the application, just as we separate style from content.
- the above limitation is not in the JavaScript language but in the document model underlying the Netscape browser.
- In the Document Object Model, we can address objects in an HTML document, but we are limited to the elements that HTML provides - which are fairly generic.
- This is where XML comes in nicely.
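- As a rough illustration of addressing document content by element name, here is a minimal sketch using Python's standard xml.dom.minidom module. The recipe document, its element names, and the helper element_texts are invented for this example; they are assumptions, not part of any standard vocabulary.

```python
from xml.dom.minidom import parseString

# Hypothetical document: the element names are invented for illustration.
SOURCE = """<recipe>
  <title>Pancakes</title>
  <ingredient>flour</ingredient>
  <ingredient>milk</ingredient>
  <step>Mix.</step>
  <step>Fry.</step>
</recipe>"""


def element_texts(xml_text, tag_name):
    """Return the text of every element named tag_name, in document order."""
    doc = parseString(xml_text)
    texts = []
    for node in doc.getElementsByTagName(tag_name):
        # Join the direct text children of this element.
        texts.append("".join(child.data for child in node.childNodes
                             if child.nodeType == child.TEXT_NODE))
    return texts


ingredients = element_texts(SOURCE, "ingredient")
second_step = element_texts(SOURCE, "step")[1]
print(ingredients, second_step)
```

- Because the markup names what the content is (ingredient, step) rather than how it looks, a script can ask for "the second step" directly instead of guessing from generic HTML elements.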
- The Presentation Dimension
- use style sheets, either included in the document or external to it, to control page layout
- the current style sheet standard is CSS -- cascading style sheets
- HTML and XML will work with CSS
- CSS offers limited control of presentation, handling font changes and such.
- It lacks some important features that you'd associate with style sheets such as controlling the flow of text and positioning elements.
- There are efforts underway to replace CSS with a more substantial style sheet called XSL (DSSSL-lite).
- Presentations will need to be smart about the browser, the display device, the fonts available, and a number of other factors.
- Multidimensional documents will need to have multiple style sheets, reflecting different views or different devices.
- The Server Independent Document
- XML helps to establish documents that can function independently of the server.
- Now you can look at the document as a container of information whose logic provides the user different views.
- It might be natural to create and manage the application in a database. Even so, you might extract the information and deliver an XML document that can exist independently of the server and the database.
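- The idea of extracting information from a database and delivering it as a self-contained XML document could look roughly like the sketch below. It uses Python's standard sqlite3 and xml.etree.ElementTree modules; the product table, its columns, and the catalog element names are hypothetical, chosen only for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_catalog(connection):
    """Serialize every row of the (hypothetical) product table as XML text."""
    root = ET.Element("catalog")
    rows = connection.execute("SELECT name, price FROM product ORDER BY name")
    for name, price in rows:
        product = ET.SubElement(root, "product")
        ET.SubElement(product, "name").text = name
        ET.SubElement(product, "price").text = price
    return ET.tostring(root, encoding="unicode")


# An in-memory database stands in for the real application database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price TEXT)")
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [("widget", "4.50"), ("gadget", "12.00")])
document = export_catalog(conn)
conn.close()

# The XML text can still be read after the database connection is gone.
print(document)
```

- Once exported, the document carries its own structure, so a client can process it without any further contact with the server or the database.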
- The Primacy of Documents
- One reason for the success of the Web is that creating HTML documents is easy.
- Do we have to trade in the flexibility and control we have with documents for Java applets, plug-ins and proprietary formats that process information in mysterious ways?
- XML opens new dimensions for Web documents, ensuring that documents can be more functional.
- XML will help ensure that documents continue to have a certain kind of primacy on the Web.
- Document Type Definition (DTD)
- full SGML uses a Document Type Definition (DTD) to describe the markup (elements) available in any specific type of document.
- designing and building a DTD can be a complex task, so XML has been designed so it can be used with or without a DTD.
- DTDless operation means you can invent markup without having to define it formally.
- Well-Formed
- if there is no DTD in use, the document should start with an XML declaration that includes a Standalone Document Declaration (SDD) saying so (the declaration is written in lowercase):
<?xml version="1.0" standalone="yes"?>
<foo>
<bar> . . . <blort/> . . . </bar>
</foo>
- all tags must be balanced
- all attribute values must be in quotes
- any EMPTY element tags (e.g. those with no HTML end-tag, like HTML's <IMG>, <HR>, and <BR> and others) must either end with '/>' or you have to make them non-EMPTY by adding a real end-tag;
Example: <BR> would become either <BR/> or <BR></BR>.
- there must not be any isolated markup characters (< or &) in the text; write them as &lt; and &amp;
- elements must nest inside each other properly (no overlapping of markup, same rules as for SGML)
- Well-formed files with no DTD may use attributes on any element, but the attributes must all be of type CDATA by default.
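- The well-formedness rules can be tried out with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree, which rejects input that is not well-formed; the helper is_well_formed and the sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Each sample exercises one of the rules listed above.
samples = {
    "balanced tags": "<foo><bar>text<blort/></bar></foo>",
    "empty element left open": "<p>line<BR></p>",
    "empty element closed": "<p>line<BR/></p>",
    "unquoted attribute value": "<img src=pic.gif/>",
    "overlapping elements": "<b><i>text</b></i>",
    "isolated markup character": "<p>a < b</p>",
    "escaped markup character": "<p>a &lt; b</p>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(label, "->", "well-formed" if ok else "not well-formed")
```

- Note that this only checks well-formedness. No DTD is involved, so invented element names such as foo and blort are accepted as long as the syntax rules are followed.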
- Valid
- Valid XML files are those which contain a Document Type Definition (DTD) like all other SGML applications, and which adhere to it.
- They must also be well-formed.
- A valid file begins like any other SGML file with a DTD, but may have an optional XML Declaration prepended (written in lowercase):
<?xml version="1.0"?>
<!DOCTYPE advert SYSTEM "http://www.foo.org/ad.dtd">
<advert>
<headline> . . . <pic/> . . . </headline>
<text> . . . </text>
</advert>
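- A program can see which DTD a document claims to follow by reading its document type declaration. The sketch below uses Python's standard xml.dom.minidom, which is a non-validating parser: it checks well-formedness and exposes the DOCTYPE, but it does not fetch the DTD or check the document against it. Actual validation would need a validating parser, which is not shown here. The text inside the elements is filler invented for the example.

```python
from xml.dom.minidom import parseString

# Same shape as the advert example above; element text is invented filler.
SOURCE = """<?xml version="1.0"?>
<!DOCTYPE advert SYSTEM "http://www.foo.org/ad.dtd">
<advert>
<headline>Sale <pic/> today</headline>
<text>Everything must go.</text>
</advert>"""

doc = parseString(SOURCE)

# The document type declaration names the root element and the DTD location.
declared_root = doc.doctype.name
dtd_location = doc.doctype.systemId

# The element that actually appears at the top of the document.
actual_root = doc.documentElement.tagName

print(declared_root, dtd_location, actual_root)
```

- In a valid document the declared root and the actual root element must match; comparing the two is a cheap first check before handing the file to a validating parser.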