What are Schemas?

Presented by: Sanjeev Kulshreshtha

“Schemas” is the plural of “schema.” To understand the term fully, the discussion is divided into the following subsections:

Schema and subschema

A schema is the conceptual organization of the entire database as viewed by its designers.

A subschema, by contrast, is the conceptual organization of the database as “seen” by the application program accessing it (1).

Origin of schema

The word schema derives from the Greek word σχημα, meaning form or shape. It was first popularized in the Western world by Immanuel Kant in the late 1700s. According to the 1933 edition of the Oxford English Dictionary, Kant used the word schema to mean, "Any one of certain forms or rules of the ‘productive imagination’ through which the understanding is able to apply its ‘categories’ to the manifold of sense-perception in the process of realizing knowledge or experience." The original Greek plural is σχηματα, schemata in Latin transliteration, and this is the form Kant originally used. Over time the plural shifted to a form that sounds more natural to an Anglophone ear, and schemata became “schemas.”

The word schema probably entered computer science through database theory. There, a schema originally meant any document that described the permissible content of a database. More specifically, a schema was a description of all the tables in a database and the fields in each table. A schema also described what type of data each field could contain: CHAR, INT, CHAR[32], BLOB, DATE, and so on (2).

The word schema has since grown from that original definition to a more generic meaning: any document that describes the permissible contents of other documents, especially if data typing is involved. Thus, there are different kinds of schemas from different technologies, including vocabulary schemas, RDF schemas, organizational schemas, X.500 schemas and, of course, XML schemas.

Comparison between Schema and Document Type Definition (DTD) language

Schemas are a logical development of DTDs, and they avoid several of the problems DTDs have. DTDs have no data typing and use a non-XML syntax. They are only marginally extensible and do not scale very well, and they cannot enforce the order or number of child elements in mixed content. Hence, schemas are a better choice than DTDs.
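The lack of data typing and the non-XML syntax can both be seen in a small sketch. The PRICE element below is hypothetical and chosen only for illustration. The DTD can say that PRICE contains text, but it has no way to require that the text be a number, so this document is valid:

```xml
<?xml version="1.0"?>
<!DOCTYPE PRICE [
  <!-- DTD declarations use their own syntax, not XML elements -->
  <!ELEMENT PRICE (#PCDATA)>
]>
<!-- Valid against the DTD even though the content is not numeric -->
<PRICE>not a number</PRICE>
```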

Characteristics of Schemas

Schemas are an attempt to solve the problems of DTDs by defining a new XML-based syntax for describing the permissible contents of XML documents. This syntax includes:

Powerful data typing including range checking
Namespace-aware validation based on namespace URIs rather than on prefixes
Extensibility and scalability
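
The second point can be illustrated with a minimal sketch. The namespace URI http://example.com/greeting, the GREETING element, and the prefixes g and hello are all hypothetical, chosen only for illustration. A schema names the namespace it describes in its targetNamespace attribute, and a document element matches if its namespace URI is the same, whatever prefix it uses:

```xml
<?xml version="1.0"?>
<!-- Hypothetical schema for elements in the http://example.com/greeting namespace -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://example.com/greeting">
  <xsd:element name="GREETING" type="xsd:string"/>
</xsd:schema>

<!-- Both of these root elements match the declaration above, because
     validation compares namespace URIs, not prefixes:

     <g:GREETING xmlns:g="http://example.com/greeting">Hello XML!</g:GREETING>
     <hello:GREETING xmlns:hello="http://example.com/greeting">Hello XML!</hello:GREETING>
-->
```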

Schema languages and their scopes

Schemas are written in specific languages. Because schema is such a generic term, there is more than one schema language for XML. In fact there are many, each with its own advantages and disadvantages; the cited references give further detail. They include Murata Makoto's Relax (3), Rick Jelliffe's Schematron (4), James Clark's TREX, Tree Regular Expressions for XML (5), the Document Definition Markup Language (DDML, also known as Xschema; 6), and the W3C's misleadingly generic XML Schema language. In addition, traditional XML DTDs can be considered yet another schema language. W3C schemas are complex. Relax is a simpler language that still offers extensible data types. It adopts the less controversial data types half of the W3C XML Schema recommendation, but replaces the much more complex and much less popular structures half with a much simpler language. Relax also has the advantage of being an official JIS and ISO standard.

Schema code example

The ‘greeting schema’ example: first, write the XML document.

File greeting.xml

<?xml version="1.0"?>

<GREETING>

Hello XML!

</GREETING>

Now write the schema code. By convention, a schema is stored in a file with the three-letter extension .xsd, for example greeting.xsd (see below). A schema can be written and saved in any text editor that knows how to save Unicode files. Schema documents are XML documents and have all the privileges and responsibilities of other XML documents. They can even have DTDs, DOCTYPE declarations, and style sheets.

<?xml version="1.0"?>

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="GREETING" type="xsd:string"/>

</xsd:schema>

The root element of this and every other schema is schema, which must be in the http://www.w3.org/2001/XMLSchema namespace. Normally, this namespace is bound to the prefix xsd or xs, although the prefix can change as long as the URI stays the same. Elements are declared using xsd:element elements. The schema above includes a single such element, which declares the GREETING element. The name attribute specifies which element is being declared, GREETING in this example. This xsd:element element also has a type attribute whose value is the data type of the element. In this case the type is xsd:string, a standard type for elements that can contain any amount of text in any form but no child elements (7).

Document validation against Schema

A document is checked for correctness by validating it against a defined schema. The schema specification deliberately allows a variety of means for associating documents with schemas. For instance, one possibility is that both the name of the document to validate and the name of the schema to validate it against are passed to a validator program on the command line, like this:

C:\>validator greeting.xml greeting.xsd

To attach a schema to a document, add an xsi:noNamespaceSchemaLocation attribute to the document's root element.
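
As a minimal sketch of this attachment, the greeting.xml document could be rewritten as follows. The sketch assumes that greeting.xsd sits in the same directory as the document. The xsi prefix is conventional; what matters is that it is bound to the http://www.w3.org/2001/XMLSchema-instance namespace.

```xml
<?xml version="1.0"?>
<!-- xsi:noNamespaceSchemaLocation points to the schema for elements
     that are not in any namespace, such as GREETING -->
<GREETING xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="greeting.xsd">
Hello XML!
</GREETING>
```

The attribute is only a hint: a validator may still use a schema supplied some other way, such as on the command line.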

W3C Schema language

The W3C XML Schema language was created by the W3C XML Schema Working Group. W3C XML Schemas express shared vocabularies and allow machines to carry out rules made by people. They provide a means for defining the structure, content and semantics of XML documents. The language is a very large specification designed to handle a broad range of use cases. It is an open standard, free to be implemented by any interested party (8).

The W3C XML Schema language divides elements into complex and simple types. A simple type element is one like GREETING that can only contain text and does not have any attributes. It cannot contain any child elements. It may, however, be more limited in the kind of text it can contain. For instance, a schema can say that a simple element contains an integer, a date, or a decimal value between 3.76 and 98.24. Complex elements can have attributes and can have child elements. Most documents need a mix of both complex and simple elements.
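
The distinction can be sketched in a short schema. The names BoundedDecimal, READING, VALUE, TAKEN and sensor are hypothetical, and the sketch assumes that the 3.76 and 98.24 bounds are inclusive. BoundedDecimal is a simple type that narrows xsd:decimal to that range, while READING is a complex element because it has child elements and an attribute.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Simple type: text only, limited to decimals from 3.76 to 98.24 -->
  <xsd:simpleType name="BoundedDecimal">
    <xsd:restriction base="xsd:decimal">
      <xsd:minInclusive value="3.76"/>
      <xsd:maxInclusive value="98.24"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Complex element: two child elements in a fixed order, plus an attribute -->
  <xsd:element name="READING">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="VALUE" type="BoundedDecimal"/>
        <xsd:element name="TAKEN" type="xsd:date"/>
      </xsd:sequence>
      <xsd:attribute name="sensor" type="xsd:string"/>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```

Under this sketch, a READING whose VALUE is 50.0 would be accepted, while a VALUE of 120 or of plain text would be rejected.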

Answer the following questions:

1. What is the definition of a schema?

Answer: A schema is the conceptual organization of the entire database as viewed by its designers.

2. What is a subschema?

Answer: A subschema is the conceptual organization of the database as “seen” by the application program accessing it.

3. Which of the following statements is true?

a) DTDs have extensibility and scalability
b) Schemas have extensibility and scalability
c) Both of them have extensibility and scalability
d) Neither of them has extensibility and scalability

Answer: b)

4. What are schema characteristics?

Answer: i) Powerful data typing including range checking

ii) Namespace-aware validation based on namespace URIs rather than on prefixes

iii) Extensibility and scalability

5. What is the W3C XML Schema language? Explain briefly.

Answer: The W3C XML Schema language was created by the W3C XML Schema Working Group. W3C XML Schemas express shared vocabularies and allow machines to carry out rules made by people. They provide a means for defining the structure, content and semantics of XML documents. The language is a very large specification designed to handle a broad range of use cases. It is an open standard, free to be implemented by any interested party.

REFERENCES

Lecture notes from Dr. P. Kirs, course MIT5314, Fall 2003
Schemas. In: XML Bible, Ed: Elliotte Rusty Harold. Ch 24. Second Edition, pub. Hungry Minds. http://www.ibiblio.org/xml/books/bible2/chapters/ch24.html