XML (Extensible Markup Language)

Examples of XML

The basic structure is very similar to most other applications of SGML, including HTML. XML documents can be very simple, with no document type declaration, and straightforward nested markup of your own design:

<?xml version="1.0" standalone="yes"?>
<conversation>
  <greeting>Hello, world!</greeting>
  <response>Stop the planet, I want to get off!</response>
</conversation>

Or they can be more complicated, with a DTD specified, and maybe an internal subset, and a more complex structure:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE titlepage SYSTEM "http://www.frisket.org/dtds/typo.dtd"
  [<!ENTITY % active.links "INCLUDE">]>
<titlepage>
  <white-space type="vertical" amount="36"/>
  <title font="Baskerville" size="24/30" alignment="centered">Hello, world!</title>
  <white-space type="vertical" amount="12"/>
  <image location="http://www.foo.bar/fleuron.eps" type="URL" alignment="centered"/>
  <white-space type="vertical" amount="24"/>
  <author font="Baskerville" size="18/22" style="italic">Munde Salutem</author>
</titlepage>

Figure Example

<FIGURE TYPE="PHOTO">
  <IMG>face.jpeg</IMG>
  <CAPTION>A Funny Picture</CAPTION>
</FIGURE>

The FIGURE element has an attribute named TYPE, which has a value of "PHOTO". The element CAPTION is nested inside the element FIGURE. The DTD might specify that the FIGURE element must contain the IMG element and may contain the CAPTION. Additionally, the DTD might specify that a CAPTION must appear within a FIGURE element; a CAPTION element on its own would be invalid.
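As a rough sketch of the constraints just described, the declarations below show one way such a DTD could be written. They are assumptions made for illustration, since no DTD for FIGURE is given here; in particular, treating IMG and CAPTION as plain character data and TYPE as an optional CDATA attribute is a guess.

```xml
<?xml version="1.0"?>
<!DOCTYPE FIGURE [
  <!-- Hypothetical declarations, not taken from a published DTD -->
  <!-- FIGURE must contain one IMG and may contain one CAPTION -->
  <!ELEMENT FIGURE  (IMG, CAPTION?)>
  <!ELEMENT IMG     (#PCDATA)>
  <!ELEMENT CAPTION (#PCDATA)>
  <!-- TYPE is assumed to be optional free text -->
  <!ATTLIST FIGURE TYPE CDATA #IMPLIED>
]>
<FIGURE TYPE="PHOTO">
  <IMG>face.jpeg</IMG>
  <CAPTION>A Funny Picture</CAPTION>
</FIGURE>
```

Because FIGURE is the document element and no other content model mentions CAPTION, a CAPTION outside a FIGURE cannot be valid against this sketch.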

A script associated with the FIGURE element might specify that when someone clicks on the figure, a new window opens and displays the image. A style sheet associated with the document might specify that the CAPTION should be displayed below the figure and that figures should be numbered automatically.

Because the image is wrapped in a FIGURE element, we can attach a style to FIGURE itself, treating all figures as a class of similar objects, instead of singling out which IMG tags should be associated with a style.

FAQ Example

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FAQ SYSTEM "FAQ.DTD">
<FAQ>
  <INFO>
    <SUBJECT> XML </SUBJECT>
    <AUTHOR> Lars Marius Garshol</AUTHOR>
    <EMAIL> larsga@ifi.uio.no </EMAIL>
    <VERSION> 1.0 </VERSION>
    <DATE> 20.jun.97 </DATE>
  </INFO>
  <PART NO="1">
    <Q NO="1">
      <QTEXT>What is XML?</QTEXT>
      <A>SGML light.</A>
    </Q>
    <Q NO="2">
      <QTEXT>What can I use it for?</QTEXT>
      <A>Anything.</A>
    </Q>
  </PART>
</FAQ>

In XML, the markup language shown above (let's call it FAQML) has a DTD like this:

<!ELEMENT FAQ     (INFO, PART+)>
<!ELEMENT INFO    (SUBJECT, AUTHOR, EMAIL?, VERSION?, DATE?)>
<!ELEMENT SUBJECT (#PCDATA)>
<!ELEMENT AUTHOR  (#PCDATA)>
<!ELEMENT EMAIL   (#PCDATA)>
<!ELEMENT VERSION (#PCDATA)>
<!ELEMENT DATE    (#PCDATA)>
<!ELEMENT PART    (Q+)>
<!ELEMENT Q       (QTEXT, A)>
<!ELEMENT QTEXT   (#PCDATA)>
<!ELEMENT A       (#PCDATA)>
<!ATTLIST PART NO    CDATA #IMPLIED
               TITLE CDATA #IMPLIED>
<!ATTLIST Q    NO    CDATA #IMPLIED>

Memo Example

<memo>
  <to>All staff</to>
  <from>Martin Bryan</from>
  <date>5th November</date>
  <subject>Cats and Dogs</subject>
  <para>Please remember to keep all cats and dogs indoors tonight.</para>
</memo>

Notice that at this point nothing has been said about the format of the final document. From the neutral format provided by XML, users can choose to display the memo on a screen whose size can be varied to suit user preferences, to print the text onto a pre-printed form, or to generate a completely new form, positioning each element of the document where needed.
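One way to turn the neutral markup into a screen presentation is a stylesheet. The sketch below uses XSLT 1.0, which is an assumption on our part since no stylesheet language is named here; the HTML layout it produces is likewise only illustrative.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Render the memo header fields, then the body text -->
  <xsl:template match="/memo">
    <html>
      <body>
        <h1><xsl:value-of select="subject"/></h1>
        <p>To: <xsl:value-of select="to"/></p>
        <p>From: <xsl:value-of select="from"/></p>
        <p>Date: <xsl:value-of select="date"/></p>
        <!-- Any child that is not a header field is treated as body text -->
        <xsl:for-each select="*[not(self::to or self::from or self::date or self::subject)]">
          <p><xsl:value-of select="."/></p>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

A different stylesheet applied to the same memo could place the fields on a pre-printed form instead; the memo itself would not change.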

To define tag sets, users must create a Document Type Definition (DTD) that formally identifies the relationships between the various elements that form their documents. For a simple memo, the XML DTD might take the form:

<!DOCTYPE memo [
<!ELEMENT memo    (to, from, date, subject?, para+) >
<!ELEMENT para    (#PCDATA) >
<!ELEMENT to      (#PCDATA) >
<!ELEMENT from    (#PCDATA) >
<!ELEMENT date    (#PCDATA) >
<!ELEMENT subject (#PCDATA) >
]>

Where the position of an element in the model is variable, the element can be defined as part of a repeatable choice of elements. For example, to allow references to books or figures to occur anywhere in the text of a paragraph, but not in the heading, the model definition for the <para> element could be modified as shown below. In XML, a mixed-content model of this kind must list #PCDATA first and end with the * occurrence indicator:

<!ELEMENT para (#PCDATA|citation|figref)* >
where the added elements are defined as:

<!ELEMENT citation (#PCDATA) >
<!ELEMENT figref   (#PCDATA) >
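A paragraph that conforms to this model might look like the following. The wording, the cited title, and the figure label are made up for illustration.

```xml
<!-- Text, citation and figref may be mixed freely inside para -->
<para>Guidance on kennels is given in <citation>the staff handbook</citation>
and summarised in <figref>Figure 1</figref>.</para>
```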
The memo model could be extended to allow figures to appear between paragraphs:

<!ELEMENT memo    (to, from, date, subject?, (para|figure)+) >
<!ELEMENT figure  (graphic, caption?) >
<!ELEMENT graphic EMPTY >
<!ELEMENT caption (#PCDATA) >

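Attributes are declared in an attribute list declaration. For example, to give the subject element a form attribute restricted to three values, with normal as the default, a suitable declaration might be: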

<!ATTLIST subject form (bold|italic|normal) "normal" >
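In an instance, the attribute can be given explicitly or left out, in which case a parser that has read the declaration supplies the default. A brief illustration (the two elements are alternatives, not siblings in one memo):

```xml
<!-- Explicit value chosen from the enumerated list -->
<subject form="bold">Cats and Dogs</subject>

<!-- Attribute omitted: treated as form="normal" -->
<subject>Cats and Dogs</subject>
```

A value outside the list, such as form="underlined", would make the document invalid.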

You can ensure that a unique identifier is assigned to each figure by adding an attribute list declaration of the following form to the DTD:

<!ATTLIST figure id ID #REQUIRED >

Typically a figure reference element might have its attribute declaration list defined as:

<!ATTLIST figref refid IDREF #IMPLIED >

The keyword #IMPLIED indicates that it is permissible to omit the attribute in some instances of the <figref> element. For example, this might be necessary if the reference is to a figure in another publication. (Unique identifiers apply only within the current XML document instance; they are not necessarily unique across document sets.)
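Putting the declarations from this section together, a complete memo might look like the sketch below. The file attribute on graphic, the identifier value fig1, the file name, and the added memo text are assumptions for illustration; nothing in this section says how a graphic names its image.

```xml
<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo     (to, from, date, subject?, (para|figure)+) >
  <!ELEMENT to       (#PCDATA) >
  <!ELEMENT from     (#PCDATA) >
  <!ELEMENT date     (#PCDATA) >
  <!ELEMENT subject  (#PCDATA) >
  <!ELEMENT para     (#PCDATA|citation|figref)* >
  <!ELEMENT citation (#PCDATA) >
  <!ELEMENT figref   (#PCDATA) >
  <!ELEMENT figure   (graphic, caption?) >
  <!ELEMENT graphic  EMPTY >
  <!ELEMENT caption  (#PCDATA) >
  <!ATTLIST subject form  (bold|italic|normal) "normal" >
  <!ATTLIST figure  id    ID    #REQUIRED >
  <!ATTLIST figref  refid IDREF #IMPLIED >
  <!-- Hypothetical attribute naming the image file -->
  <!ATTLIST graphic file  CDATA #REQUIRED >
]>
<memo>
  <to>All staff</to>
  <from>Martin Bryan</from>
  <date>5th November</date>
  <subject form="bold">Cats and Dogs</subject>
  <!-- refid must match the id of a figure in this document -->
  <para>Please keep all cats and dogs indoors tonight; see
    <figref refid="fig1">the kennel plan</figref>.</para>
  <figure id="fig1">
    <graphic file="kennel.eps"/>
    <caption>Kennel plan</caption>
  </figure>
  <!-- refid omitted: the figure lives in another publication -->
  <para>Details are in <figref>Figure 3 of last year's memo</figref>.</para>
</memo>
```

A validating parser would reject this document if two figures shared the same id, or if a refid pointed at an identifier that no element in the document declares.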