XML
About
- XML = Extensible Markup Language
- You can define your own tags, so the language can be extended to fit any need
- It is a W3C recommendation
- It's a software and hardware independent data exchange format
- Both human and machine readable
- XML uses a Document Type Definition (DTD) or an XML schema to describe the data
- XML is not a replacement for HTML
- XML does nothing by itself except store information: separate code must be written to interpret its data
- XML is highly persistent, i.e. the format is very stable over time
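Because XML only stores data, a program has to give it meaning. The following is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`; the friend-list document and the variable names are illustrative assumptions, not part of any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the XML itself only stores the data.
document = """<friendList>
  <friend><name>Alex</name><age>25</age></friend>
  <friend><name>Sam</name><age>31</age></friend>
</friendList>"""

root = ET.fromstring(document)

# The interpretation (here, an average age) lives in separate code.
ages = [int(friend.findtext("age")) for friend in root.findall("friend")]
average_age = sum(ages) / len(ages)
print(average_age)
```

The same document could just as well feed a completely different program; nothing in the XML says what to do with the ages.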
Other exchange formats:
- WITSML (XML-based standard for wellsite and drilling data)
- RESQML
The World Wide Web Consortium (W3C) recommends:
- XML (Extensible Markup Language)
- XPath (XML Path Language)
- XML Schema (XML Schema Definition Language, XSD)
- XSL (Extensible Stylesheet Language)
- XQuery (XML Query Language)...
XML elements
XML declaration
Tells the parser which XML version and character encoding are used.
<?xml version="1.0" encoding="UTF-8"?>
Attribute values (always quoted)
<number x="1.2"> </number>
Element (the root element is the first element to be created)
<name> toto </name>
Empty elements
<point x="1.23" y="4.56" z="7.97" />
Tags
<name> toto </name>
Attributes of an element (a way of storing data, child element is another way)
<name font="arial"> toto </name>
### Entities
```xml
<equation> a &lt; b </equation>
```
PCDATA (Parsed Character Data)
Parsed text between start-tag and end-tag
<name> toto </name>
CDATA (Character Data)
CDATA will not be parsed.
<![CDATA[*p = &q; b = (i <= 3);]]>
Creation and basic rules
- Create a file with the .xml extension
- Add an XML prolog
<?xml version="1.0" encoding="UTF-8"?>
- Create a root tag and its sub-tags
- Each tag has a start <tag> and an end </tag>
- Tags are case sensitive
- Whitespace is preserved (HTML collapses multiple spaces into a single one, XML does not)
- Names cannot start with "xml" (in any letter case)
<root>
<child>
<subchild> ... </subchild>
</child>
</root>
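These rules can be checked mechanically. The sketch below uses Python's standard `xml.etree.ElementTree` parser, which rejects documents that are not well-formed; the helper name `is_well_formed` and the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<root><child><subchild>...</subchild></child></root>"
bad_case = "<root><child></Child></root>"  # end tag differs in letter case
bad_unclosed = "<root><child></root>"      # child is never closed

print(is_well_formed(good), is_well_formed(bad_case), is_well_formed(bad_unclosed))

# Whitespace inside an element is handed to the program unchanged.
spaced = ET.fromstring("<name>  to   to  </name>")
print(repr(spaced.text))
```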
Attributes
- If necessary, create attributes as follows. Attribute values must be quoted.
<friendList>
<friend gender="male">
<name> Alex </name>
<age> 25 </age>
</friend>
</friendList>
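As a rough illustration of the two ways of storing data (attribute versus child element), here is how each is read back with Python's standard `xml.etree.ElementTree`. The `nickname` attribute is a hypothetical name used only to show what happens when an attribute is absent.

```python
import xml.etree.ElementTree as ET

document = """<friendList>
  <friend gender="male">
    <name>Alex</name>
    <age>25</age>
  </friend>
</friendList>"""

friend = ET.fromstring(document).find("friend")

gender = friend.get("gender")          # data stored as an attribute
name = friend.findtext("name")         # data stored as a child element
nickname = friend.get("nickname", "")  # absent attribute: the fallback is returned
print(gender, name, repr(nickname))
```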
Dealing with reserved symbols
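- Some symbols already have a meaning in XML; use the corresponding entity reference instead (&lt; for <, &gt; for >, &amp; for &)
Avoid writing this (a raw < or & is an error; a raw > is tolerated but is best escaped too):
<eligibility> age > 18 </eligibility>
Instead, write this:
<eligibility> age &gt; 18 </eligibility>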
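When the XML is generated by a program, the escaping does not have to be done by hand. A small sketch, assuming Python and its standard `xml.sax.saxutils.escape` helper; the sample text is made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "age > 18 & score < 5"

# escape() replaces &, < and > with their entity references.
escaped = escape(raw)
document = f"<eligibility>{escaped}</eligibility>"

# The parser turns the references back into the original characters.
parsed_text = ET.fromstring(document).text
print(escaped)
print(parsed_text)
```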
Commenting the code
<!-- This is a comment -->
<!-- This is an invalid -- comment -->
Using prefixes or namespaces
- When using prefixes in XML, a namespace must be defined for each prefix
- In the example below, the prefixes are "h" and "f"
- All child elements with the same prefix are associated with the same namespace
- « xmlns » stands for XML namespace
- The namespace is declared with xmlns, a colon « : » and the prefix name (here f and h), followed by the namespace URI
<root>
<h:table xmlns:h="https://www.w3.org/TR/html">
<h:tr>
<h:td> Apples </h:td>
<h:td> Bananas </h:td>
</h:tr>
</h:table>
<f:table xmlns:f="https://www.w3.org/furniture">
<f:name> Coffee Table </f:name>
<f:width> 10 </f:width>
<f:height> 20 </f:height>
</f:table>
</root>
Namespaces can also be declared in the xml root element
<root xmlns:h="https://www.w3.org/TR/html4/"
xmlns:f="https://www.w3.org/furniture">
<h:table>
<h:tr>
<h:td> Apples </h:td>
<h:td> Bananas </h:td>
</h:tr>
</h:table>
<f:table>
<f:name> Coffee Table </f:name>
<f:width> 10 </f:width>
<f:height> 20 </f:height>
</f:table>
</root>
A default namespace can be defined for an element; all its unprefixed child elements then use it
<table xmlns="https://www.w3.org/TR/html4/">
<tr>
<td> Apples </td>
<td> Bananas </td>
</tr>
</table>
is the same as:
<h:table xmlns:h="https://www.w3.org/TR/html4/">
<h:tr>
<h:td> Apples </h:td>
<h:td> Bananas </h:td>
</h:tr>
</h:table>
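The equivalence of the default and prefixed forms can be observed from code, provided both use the same namespace URI. The sketch below assumes Python's standard `xml.etree.ElementTree` and the URI `https://www.w3.org/TR/html4/`; the search prefix `x` is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

HTML_NS = "https://www.w3.org/TR/html4/"

default_form = (
    f'<table xmlns="{HTML_NS}"><tr><td>Apples</td><td>Bananas</td></tr></table>'
)
prefixed_form = (
    f'<h:table xmlns:h="{HTML_NS}"><h:tr><h:td>Apples</h:td>'
    f'<h:td>Bananas</h:td></h:tr></h:table>'
)

default_root = ET.fromstring(default_form)
prefixed_root = ET.fromstring(prefixed_form)

# ElementTree expands every name to "{namespace-uri}local-name",
# so the prefix itself does not matter, only the URI it maps to.
print(default_root.tag)
print(prefixed_root.tag)

# A prefix-to-URI mapping is passed when searching.
cells = [td.text for td in prefixed_root.findall("x:tr/x:td", {"x": HTML_NS})]
print(cells)
```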
CDATA sections
Instructs the parser to treat the enclosed text as plain characters rather than markup (only the closing ]]> sequence is recognized)
<![CDATA[ if( *pi < z ) ]]>
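A short sketch of the effect, assuming Python's standard `xml.etree.ElementTree`; the wrapping `code` element is a hypothetical name. The same text without the CDATA section is rejected because of the bare `<`.

```python
import xml.etree.ElementTree as ET

# Inside CDATA, the "<" is delivered as ordinary text.
with_cdata = "<code><![CDATA[ if( *pi < z ) ]]></code>"
text = ET.fromstring(with_cdata).text
print(repr(text))

# Without CDATA (and without escaping), the document is not well-formed.
try:
    ET.fromstring("<code> if( *pi < z ) </code>")
    unescaped_ok = True
except ET.ParseError:
    unescaped_ok = False
print(unescaped_ok)
```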
Document Type Definition (DTD)
- Defines the legal building blocks of an XML document
- Each XML file can carry a description of its own format with it
- Different people can agree to use a common DTD for interchanging data
Define elements (general case)
<!ELEMENT element_name category>
<!ELEMENT element_name (element_content)>
Empty elements
<!ELEMENT element_name EMPTY>
Elements with only character data
<!ELEMENT element_name (#PCDATA)>
Elements with any content
<!ELEMENT element_name ANY>
Elements with children (sequences)
<!ELEMENT element_name (child_element_name)>
<!ELEMENT element_name (child_element_name, child_element_name_1, ...)>
Number of occurrences of the same child element
Exactly one child
<!ELEMENT element_name (child_name)>
At least one child
<!ELEMENT element_name (child_name+)>
Zero or more children
<!ELEMENT element_name (child_name*)>
Zero or one child
<!ELEMENT element_name (child_name?)>
Either, or
<!ELEMENT element_name (either | or)>
Mixed content (the trailing * is required when child elements are listed)
<!ELEMENT element_name (#PCDATA | choice_1 | choice_2 | ...)*>
DTD attributes
<!ATTLIST element_name attribute_name attribute_type default_value>
<!ATTLIST book isbn CDATA #REQUIRED>
Default values
'#REQUIRED' -> the attribute must be present in the element
'#IMPLIED' -> the attribute is optional
"value" -> the default value used when the attribute is omitted
'#FIXED' "value" -> the attribute value is fixed and cannot be changed
DTD internal entities
<!ENTITY entity_name "entity_value">
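As an illustration, an internal entity declared in a document's internal DTD subset is expanded by the parser wherever it is referenced. The sketch assumes Python's standard `xml.etree.ElementTree` (which reads the internal subset but does not validate against it); the entity name `publisher` and its value are hypothetical.

```python
import xml.etree.ElementTree as ET

document = """<?xml version="1.0"?>
<!DOCTYPE book [
  <!ENTITY publisher "Example Press">
]>
<book><title>Sample</title><note>Published by &publisher;</note></book>"""

# The reference &publisher; is replaced by the declared entity value.
note = ET.fromstring(document).findtext("note")
print(note)
```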
DTD external entities
...
Example:
<!ELEMENT author (#PCDATA)>
<!ELEMENT book (title, author, character+)>
<!ATTLIST book isbn CDATA #REQUIRED>
<!ELEMENT character (name, friend-of?, since, qualification)>
<!ELEMENT friend-of (#PCDATA)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT qualification (#PCDATA)>
<!ELEMENT since (#PCDATA)>
<!ELEMENT title (#PCDATA)>
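The occurrence indicators behave much like regular-expression quantifiers. As an illustration only (this is not a DTD validator, and Python's standard parser does not validate), the content model `(title, author, character+)` can be mimicked by matching the sequence of child tag names against a regular expression. The helper name and the sample documents are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# (title, author, character+) rewritten as a regex over space-terminated tag names.
BOOK_MODEL = re.compile(r"title author (character )+")

def matches_book_model(xml_text: str) -> bool:
    """Check only the order and count of the direct children of <book>."""
    book = ET.fromstring(xml_text)
    child_names = "".join(child.tag + " " for child in book)
    return BOOK_MODEL.fullmatch(child_names) is not None

valid = "<book><title/><author/><character/><character/></book>"
missing_character = "<book><title/><author/></book>"
wrong_order = "<book><author/><title/><character/></book>"

print(matches_book_model(valid))
print(matches_book_model(missing_character))
print(matches_book_model(wrong_order))
```

A validating parser performs this kind of check for every element, along with attribute and entity checks that this sketch ignores.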
Problems of the DTD
- XML and DTDs use different syntaxes, so two different parsers are needed, which means more work.
- A solution is to use X-Schemas (XML Schema), which are themselves written in XML
X-Schemas
About
- Contains document information such as the data-type of the elements, ranges and values and how data is related to another piece of data
- 'xs:sequence' defines an ordered sequence of sub-elements
- 'xs:choice' choice between several possible particles
- 'xs:all' unordered set of elements
- 'simpleType' is for data-types holding values only
- 'complexType' is for data-types holding attributes, non text children...
Definition
<?xml version='1.0' encoding='UTF-8'?>
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
.../...
</xs:schema>
Creating personalized types
Let's create a new type derived from xs:string with a maximum length of 32 characters:
<xs:simpleType name='type_name'>
<xs:restriction base='xs:string'>
<xs:maxLength value='32'/>
</xs:restriction>
</xs:simpleType>
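Because a schema is itself XML, its facets can be read like any other data. The sketch below, in Python with the standard `xml.etree.ElementTree`, extracts the `maxLength` facet and applies it to a string. It is not a schema validator (the standard library does not include one), and the helper `fits_type` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

schema_text = f"""<xs:schema xmlns:xs="{XS}">
  <xs:simpleType name="type_name">
    <xs:restriction base="xs:string">
      <xs:maxLength value="32"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""

schema = ET.fromstring(schema_text)

# Locate the facet using a prefix-to-namespace mapping.
facet = schema.find("xs:simpleType/xs:restriction/xs:maxLength", {"xs": XS})
max_length = int(facet.get("value"))

def fits_type(value: str) -> bool:
    """Apply only the maxLength facet; a real validator checks much more."""
    return len(value) <= max_length

print(max_length, fits_type("a" * 32), fits_type("a" * 33))
```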
Groups
Containers holding a set of elements or attributes that can be used to describe complex types
<!--definition of an element group -->
<xs:group name='mainBookElements'>
<xs:sequence>
<xs:element name='title' type='nameType'/>
<xs:element name='author' type='nameType'/>
</xs:sequence>
</xs:group>
<!-- definition of an attribute group -->
<xs:attributeGroup name='bookAttributes'>
<xs:attribute name='isbn' type='isbnType' use='required'/>
<xs:attribute name='available' type='xs:string'/>
</xs:attributeGroup>
<!-- definition of a new complexType bookType -->
<xs:complexType name='bookType'>
<xs:sequence>
<xs:group ref='mainBookElements'/>
<xs:element name='character' type='characterType' minOccurs='0' maxOccurs='unbounded'/>
</xs:sequence>
</xs:complexType>
Examples:
<xs:element name='name' type='xs:string'/>
<xs:element name='friend-of' type='xs:string' minOccurs='0' maxOccurs='unbounded'/>
<xs:element name='since' type='xs:date'/>
<xs:element name='qualification' type='xs:string'/>
Real example:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="book">
<xs:complexType>
<xs:sequence>
<xs:element name="title" type="xs:string"/>
<xs:element name="author" type="xs:string"/>
<xs:element name="character" minOccurs="0" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="friend-of" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="since" type="xs:date"/>
<xs:element name="qualification" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
<xs:attribute name="isbn" type="xs:string" use="required"/>
</xs:complexType>
</xs:element>
</xs:schema>
X-Path
About
- XPath locates information in an XML document
- It represents node addresses following the document's tree structure (like a Unix directory path)
- A path starting with / is an absolute path
- A path starting with // selects all elements in the document that fulfill the criteria
- So /book/character/name selects all the name elements of all the character elements of the book element
- So //name selects all the name elements in the document
- The wildcard * can be used to select unknown XML elements
- So /book/character/* selects all child elements of all character elements of the book element
- So /book/*/name selects all name elements that are grandchild elements of the book element
- So //* selects all elements in the current document
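These paths can be tried out from code. The sketch below assumes Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath and evaluates paths relative to an element, so each absolute path is rewritten relative to the `book` root. The sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

document = """<book>
  <title>Sample</title>
  <character><name>Ann</name><since>2001-01-01</since></character>
  <character><name>Bob</name><since>2002-02-02</since></character>
</book>"""

book = ET.fromstring(document)  # the root element, i.e. /book

# /book/character/name, written relative to the root element
names = [e.text for e in book.findall("./character/name")]
# //name: every name element below the root
all_names = [e.text for e in book.findall(".//name")]
# /book/character/*: all children of every character element
character_children = [e.tag for e in book.findall("./character/*")]
# /book/*/name: name elements that are grandchildren of book
grandchild_names = [e.text for e in book.findall("./*/name")]
# //*: here only the descendants are returned, not the root itself
descendant_count = len(book.findall(".//*"))

print(names, all_names, character_children, grandchild_names, descendant_count)
```

A full XPath engine would also count the root element for `//*`; the relative form used here starts below it.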