INRIA
Warning

Work in progress

This version may be updated without notice.

Active Schema Language

The Active Schema Language Specification

Working Draft 29 March 2006

Editor
Philippe Poulard  <Philippe.Poulard@sophia.inria.fr>

Copyright © INRIA

Abstract

Active Schema is a powerful XML schema technology built on Active Tags technologies.

Active Schema has the ability to select its content models contextually, and to refactor them dynamically. That's why Active Schemata are active and much more efficient than other schema technologies.

Moreover, Active Schema can be used to define reusable active data type libraries that can also be used in Active Tags applications.

Requirement levels

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Note that for reasons of style, these words are not capitalized in this document.

Active Tags specifications

The following specifications are part of the Active Tags technologies.

Table of contents

1 What are XML Schemata ?

1.1 Why another schema technology ?
1.2 What is Active Schema ?
2 Basics
2.1 Terminology
2.2 Use case
3 Active content models
3.1 Step processing
3.2 Primitive model processing
3.3 Occurrences boundaries
3.4 Material lists and exceptions
3.5 Attributes lists
3.6 Text content list items
3.7 Assertions lists
3.8 Interim processing
3.9 Reusability
4 Types
4.1 Data types
4.1.1 Using and defining data types
4.1.2 Internal data model representation
4.1.3 Parsing
4.1.4 Type inheritance
4.1.5 Semantic support
4.1.6 Functions binding
4.1.6.1 Comparison function binding
4.1.6.2 Counterpart function binding
4.1.7 Augmentation
4.2 Element classes
5 Building Active Schema
5.1 References to namespace URIs
5.2 Multi-schema support
5.3 Integration with Active Tags
5.3.1 Integration with EXP
5.3.2 Integration with Active Catalog
5.3.3 Relationship with Active Datatype
5.4 Documenting
5.5 Model inconsistency
5.5.1 Non deterministic content model avoidance
6 Processing Active Schema
6.1 Invoking Active Schema
6.2 Batch processing
6.3 Localized validation
6.4 Errors
7 ASL module reference
7.1 Elements
7.2 Foreign attributes
7.3 Predefined properties
7.4 Extended XPath functions
7.5 Externalisable features

Appendix

A Glossary
B Related Active Tags specifications
C Common Active Tags modules
D Lists

D.1 Examples list
D.2 Figures list
E Active Schemata for ASL
E.1 ASL definitions
E.2 General purpose messages
F Known implementations


1 What are XML Schemata ?

An XML Schema is the expression of a set of assertions expected on a class of XML documents. Assertions on XML documents ensure that applications will process them without causing faults. Expressing assertions with schemata ensures that application developers spend most of their time designing data processing and little of it checking the data.

Well known schema technologies are :

Name | Syntax style | Type | Editor | Specification location | Elem nb
Document Type Definition (DTD) | non XML syntax | model based | W3C | http://www.w3.org/TR/2004/REC-xml-20040204/ | 8
W3C XML Schema (WXS) | XML syntax | model based | W3C | http://www.w3.org/TR/2001/REC-xmlschema-0-20010502/ , http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/ , http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/ | 42
Schematron | XML syntax | rule based | ISO | http://www.ascc.net/xml/resource/schematron/Schematron2000.html | 19
Relax NG (RNG) | XML syntax, XML compact syntax | pattern based | OASIS, ISO | http://www.oasis-open.org/committees/relax-ng/spec-20011203.html | 28
Newcomer
Active Schema (ASL) | XML syntax | active | INRIA | http://disc.inria.fr/perso/philippe.poulard/xml/active-schema | 20 (*)

The general purposes of a schema technology are :

  • validation : a validation process consists of checking that an input XML document conforms to the assertions expressed in the schema. Sometimes, it is possible to limit the validation to a single element.
  • structured edition : when editing an XML document, editors need to know whether inserting an element, for example, is allowed in the context of the insertion.
Note

(*) 20 elements used in schema instances + 4 elements used in active sheets.


Note

Schematron, mentioned above, was designed for validation. Unlike other schema technologies, it is not straightforward to use for structured edition.

Other applications that use schemata are emerging, such as data binding.

1.1 Why another schema technology ?

Every schema technology is designed to cover a certain number of assertions. However, existing schema technologies can't express many constraints such as those listed below. Some technologies cover a given feature, others don't ; sometimes none does.

As shown in the picture below, ASL covers the constraint types listed above and many others that existing schema technologies can't.

Assertions covered by schema technologies

1.2 What is Active Schema ?

Active Schema is an Active Tags module, and is governed by the relevant concepts described in the Active Tags specifications.

Active Schema is a schema technology based on very simple concepts. Enhanced with Active Tags, Active Schema addresses schema issues with greater efficiency than other schema technologies.

An Active Schema may be used both for validation and for structured edition, and many other purposes.

With its simple concepts and small number of elements, Active Schema is easy to learn and easy to use, because the schema follows the structure of the document. An Active Schema is also human-readable : the content model of an element is easy to understand at first glance.

Finally, the capabilities of the Active Schema technology cover the following purposes :

Active Schema has been designed with the intention of keeping XML document classes as they are, without adjusting their structure on the pretext that a content model can't be expressed with the chosen schema technology. The design of an XML structure must not be led by any schema technology.


2 Basics

Active Schema deals with XML documents representing both Active Schemata and instances through an abstract data model. XML documents representing Active Schemata and instances must be well-formed in conformance with XML 1.0 and must conform to the constraints of XML Namespaces.

An Active Schema is a flat set of definitions. The materials defined inside an Active Schema must share the same namespace URI, but several storage units (files) may be part of the same schema if they share the same target namespace URI.

A material definition is composed of elementary steps that are processed independently. Steps may be primitive models or step containers for other steps. Each primitive model is processed in three phases; for example, when validating:

Steps that are applied on element definitions are called active content models.

Active Schema can't constrain comments, processing instructions, and namespace declarations in an obvious way; however, specific assertions may restrict their usage anyway.

2.1 Terminology

Material

The term "material" is used to represent :

Content material

The term "content material" is used to represent :

Candidate material

A candidate material is the material or content material (according to the context) to check against the schema. It may be :

Additionally, a candidate material may hold the place before the first material of a list (the child nodes of an element) or after its last material ("cap candidate").

Schema client handler

A schema client handler is a component of an application that uses Active Schemata ; it processes lists of allowed material provided by the schema at runtime.

For example, a validator handler checks if the material found in the source document matches a list computed in a given step.

A schema client handler uses callbacks to process lists because it doesn't select the step to apply ; the schema engine does. In any case, the entry point of an application that processes an Active Schema is an element or a document ; such an application should define what to do with the callbacks :

Additionally, when an element has been processed, the schema client handler may process its subelements at the user's option.

2.2 Use case

This use case illustrates that Active Schemata are context dependent.

In this scenario, two companies ACME and EMCA are exchanging XML documents. They are sharing the same base set of schemata, but both are extending it for special purpose usage:

The schema soup consists of a legacy DTD (without namespaces), a Relax NG schema, a brand new Active Schema, and other well-known schemata for XHTML and SVG.

Scenario

Active Schema in conjunction with Active Tags offers all the means to process such a case very efficiently:


3 Active content models

Content models are element content definitions that define which content material is allowed, when it is allowed, and how many times.

An Active Schema is a model based schema ; however, unlike other schema technologies, the models defined are active, that is to say that :

For this purpose, content models are divided into elementary checking steps, each of which may produce at most one of the following primitive model types :

Steps set the scopes of the model types, but can also be used as step containers (with the <asl:step> or <asl:interim> elements) ; XCL can be used advantageously to control which step or substep to use. An <asl:step> element is also a container for attributes (with the <asl:attribute> element), but the list of authorized attributes must be computed in steps separate from those used for content models.

Thus, a content model is processed step by step ; each step may be repeated or discarded in favor of the following step, or in favor of an interim step. When a step is repeated, its content model may be kept or refactored.

Finally, additional constraints may be computed to check the validity of an element or to check if an element can be inserted (with the <asl:assert> element).

3.1 Step processing

A step is an elementary unit of processing that consists of drawing up lists of available materials and assertions. A step may be a container step, which may contain substeps, or a primitive content model step, which can't contain substeps.

When the content of an element must be checked, the steps defined in the element definition are evaluated as a global sequence. During this process, after a material (text or element) found within the element has been checked with the current step, the next material to check is selected. According to its settings, the step used may be reused as is, refactored, paused, or terminated ; in the last case, the next step is then used.

In addition to content models, a step may also be used to draw up lists of attributes, lists of assertions, and lists of data type matchers. Attribute lists can't be mixed with content material lists ; assertion lists can be drawn up in any step ; data type matcher lists are only found within attribute definitions (<asl:attribute>) and data type definitions.

Steps and the material to check progress together in a synchronized reading process.

According to a given step and a given material to check, the following process is applied :

  1. if the step in use is not one of the primitive model types, its content is performed until a primitive model step is selected, or until the end of the definition (element definition, attribute definition, type definition, or class definition).
  2. the primitive model step is used to compute the list of the allowed materials
  3. if the material matches an item of the primitive model, the step is reused, refactored, or terminated according to its settings.
Note

When a content material must be checked, for example to test whether it is possible to insert an element, the host element definition must be performed step by step up to the insertion position.

Moreover :

3.2 Primitive model processing

A primitive model type is a special step used to establish a list of available material. Once selected, the model type establishes lists of allowed materials and assertions that are transmitted to the host application for candidate material checking. For example, a validator would apply these lists on the material to check (candidate material).

Once a candidate material is selected by the host application, it is checked against the material of the list :

primitive model type | application | repeating
sequence (<asl:sequence>) | the first material of the list may match the candidate material | the list is updated
selection (<asl:select>) | any material of the list may match the candidate material | the list is updated
choice (<asl:choice>) | any material of the list may match the candidate material | the list remains the same
  1. the number of times the step occurred with a matched material is checked against the min and max values given. If the number doesn't fit the boundaries, an error is raised.

To check if a candidate material matches a choice or a selection, the list is browsed sequentially ; the first item that matches the candidate material is retained. To check if a candidate material matches a sequence, the items are tested sequentially according to the occurrences boundaries.

Once a step matches a candidate material, it may be refactored on user request if it is reused, or kept as is under the repeating conditions mentioned in the table above. When a list is updated while repeating, the use counter of the material is incremented ; the material used is discarded from the list if it is no longer usable, according to the occurrences boundaries set.

Once a step ends, the step container process goes on.
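
As an illustration of the difference between the primitive model types, here is a minimal sketch that reuses only constructs shown elsewhere in this document ; the <Address>, <Street>, <City>, and <Zip> element names are hypothetical. With a selection, any item of the list may match the candidate material and the list is updated after each match, so the three subelements would be expected in any order, each of them once :

    <asl:element name="Address">
        <!-- the step must match exactly 3 times ; a matched item is
             discarded from the list once its own boundaries are reached -->
        <asl:select max-occurs="3" min-occurs="3">
            <asl:element ref-elem="Street"/>
            <asl:element ref-elem="City"/>
            <asl:element ref-elem="Zip"/>
        </asl:select>
    </asl:element>

Written as an <asl:sequence> (which takes no occurrences boundaries itself), the same list would instead impose the order <Street>, <City>, <Zip>.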

3.3 Occurrences boundaries

Occurrences can be set on steps with the attributes :

Elements that allow these attributes always use 1 as the default value for both of them. The value "unbounded" may be specified for @max-occurs ; otherwise, zero or a positive integer may be specified ; finally, an expression may also be specified to compute a dynamic value.

Note

Related occurs values

Two predefined properties have been defined to allow the min occurs value to be based on the max occurs value, or vice versa : $asl:min-occurs and $asl:max-occurs.

For example, min-occurs="{count(//foo)}" max-occurs="{$asl:min-occurs}" is correct.
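
The different kinds of values can be sketched as follows, assuming hypothetical <Title>, <Para>, and <Item> elements ; the expression is modeled on the example above, and the way its context is resolved is left to the processor :

    <asl:sequence>
        <!-- default boundaries : exactly one <Title> -->
        <asl:element ref-elem="Title"/>
        <!-- static values : any number of <Para>, possibly none -->
        <asl:element max-occurs="unbounded" min-occurs="0" ref-elem="Para"/>
        <!-- dynamic values : as many <Item> as there are <foo> elements -->
        <asl:element max-occurs="{$asl:min-occurs}" min-occurs="{count(//foo)}" ref-elem="Item"/>
    </asl:sequence>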

Occurrences can be set only on steps. Sequences can't have occurrences boundaries (occurrences are carried by the material referenced inside). Additionally, when sequences are defined... sequentially, they can be merged. A sequence is always a stable list with no occurrences boundaries.

Instead of :

    <asl:sequence>
        <asl:element ref-elem="Title"/>
    </asl:sequence>
    <asl:sequence min-occurs="0"><!-- this is invalid -->
        <asl:element ref-elem="Content"/>
    </asl:sequence>

...use the short form :

    <asl:sequence>
        <asl:element ref-elem="Title"/>
        <asl:element min-occurs="0" ref-elem="Content"/>
    </asl:sequence>

Occurrences may be used in material references inside select models, but grouping adjacent select models doesn't express the same model. In fact, sequence models are also slightly different when the subactions involve the asl:candidate() function, because the entire sequence list is evaluated with the same candidate material, whereas in the other form it is evaluated with the successive candidate materials to check in the case of validation.

Occurrences [FIXME: can't ???[shouldn't]] be used in material references inside choice models, because the list is not updated.

When a candidate element has matched a material that specifies occurrences, the numbers of occurrences are decremented for the next usage.

Element definition example

The ASL element definition below mimics the following familiar DTD declaration :

<!ELEMENT Chapter (Title, ((Content, Chapter*) | Chapter+))>

    <asl:element name="Chapter">
        <asl:sequence>
            <asl:element ref-elem="Title"/>
            <asl:element min-occurs="0" ref-elem="Content"/>
        </asl:sequence>
        <asl:choice max-occurs="unbounded"
                    min-occurs="{1 - count( asl:candidate()/preceding-sibling::Content )}"
                    repeating="stable">
            <asl:element ref-elem="Chapter"/>
        </asl:choice>
    </asl:element>

...where asl:candidate() refers to the candidate material at the position where it is expected. When the choice step is involved, a <Content> element may or may not have been found. In the first case, the @min-occurs attribute will be set to 0, which denotes that the <Chapter> is optional, and in the second to 1, which denotes that at least one <Chapter> must be found. The @repeating directive of the last step indicates that both the min occurs value and the list have to be computed only once.

When involved in a stable step, occurs values are kept unchanged ; when involved in an unstable step, occurs values are updated.
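
To make this behaviour concrete, here are instances that the definition above would be expected to accept or reject, assuming that <Title> and <Content> are defined elsewhere and accept the text shown (their definitions are not given here) :

<!-- accepted : <Content> found, so min-occurs evaluates to 0 -->
<Chapter>
    <Title>Introduction</Title>
    <Content>Some text.</Content>
</Chapter>

<!-- accepted : no <Content>, so min-occurs evaluates to 1 and a <Chapter> is found -->
<Chapter>
    <Title>Part one</Title>
    <Chapter>
        <Title>Section</Title>
        <Content>Some text.</Content>
    </Chapter>
</Chapter>

<!-- rejected : neither <Content> nor a nested <Chapter> -->
<Chapter>
    <Title>Empty</Title>
</Chapter>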

Repeating a step

If a step must be repeated, according to its occurrences boundaries, its content may be kept as is or refactored, according to the value of the @repeating attribute :

Exiting a step

Once a primitive content model has occurred the minimum number of times expected, it must exit as soon as the candidate material doesn't match the material, or as soon as the maximum number of times expected is reached.

Once a container step has occurred the minimum number of times expected, it must exit as soon as the maximum number of times expected is reached, or as soon as its substeps are no longer in use.

Steps must signal that they were used by means of a bubble message. A primitive content model was used if a match occurred. A container step was used if it received a bubble message indicating that a substep was used.

3.4 Material lists and exceptions

A list is an ordered set of material. Each list item may represent a group of material when a class or type reference is used, or when a namespace URI reference is used ; in that case, a sublist that disables a subgroup (<asl:except>) may be added. This sublist may in turn have its own sublist that enables another subgroup (an exception to an exception), and so on. A sublist is defined as content or subcontent of a material.

List and sublists usage

    <asl:element name="foo"
                 xmlns:bar="http://www.acme.org/bar">
        <asl:sequence>
            <!-- top list of enabled elements -->
            <asl:element ref-elem="oof"/><!-- <oof> enabled -->
            <asl:element ref-ns="bar">
                <!-- <bar:*> enabled except <bar:bar> -->
                <asl:except>
                    <!-- sublist of disabled elements -->
                    <asl:element ref-elem="bar:bar"/>
                </asl:except>
            </asl:element>
        </asl:sequence>
    </asl:element>

Of course, an exception must build a list compliant with its target list (in the example above, only elements are concerned).
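
For instance, given the definition above, and assuming a hypothetical <bar:baz> element that belongs to the same namespace, the first instance below would be expected to match whereas the second would not :

<foo xmlns:bar="http://www.acme.org/bar">
    <oof/>
    <bar:baz/>
</foo>

<foo xmlns:bar="http://www.acme.org/bar">
    <oof/>
    <bar:bar/> <!-- disabled by the exception sublist -->
</foo>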

3.5 Attributes lists

An element definition may refer to attributes with the <asl:attribute> element ; an attribute reference may be expressed with the @ref-attr attribute, or directly with the @name attribute for private attributes.

Thus, attribute definitions may occur at the top level of the schema (and be shared with all schemata), or directly within an element definition ; the latter may be without a namespace URI because unprefixed attributes "belong" to their host element.

Hereafter, the <person> element uses an attribute defined locally :

    <asl:element name="my:person">
        <asl:attribute name="role">
            <asl:text value="author"/>
            <asl:text value="editor"/>
            <asl:text value="reviewer"/>
        </asl:attribute>
    </asl:element>

Now, it refers to a sharable and global attribute :

    <asl:attribute name="my:role">
        <asl:text value="author"/>
        <asl:text value="editor"/>
        <asl:text value="reviewer"/>
    </asl:attribute>
    <asl:element name="my:person">
        <asl:attribute ref-attr="my:role"/>
    </asl:element>

Within an element definition, more than one attribute reference or inline definition may occur ; attribute lists are separate lists that must be computed in a step separate from content models.

As attributes are unordered inside their host element, attribute references and local definitions are allowed directly under the <asl:element> element, unlike content models that must be specified within steps. Under the <asl:element> element, if attributes are encountered outside the scope of a step, they are processed as if a step were defined above ; once an explicit step is encountered, the list is applied before running the explicit step. This is important because the attribute references can't be used anymore for disabling/enabling purposes as they have just been consumed.

Only the <asl:select> primitive step is allowed for attribute lists. Once a list of attributes is established, it is applied on the element to check, or transmitted to the host application.

When validating, once an element definition ends, all its attributes must have been matched, except namespace declarations, which are not checked. The same attribute can't be matched by several lists.

As with other materials, each list item may have a sublist, and items may be arbitrarily enabled or disabled in a top list.

Additionally, items are data typed.

Global attributes are defined with the <asl:attribute> element under the root ; local attributes are defined under element definitions. Global attributes can be used in another schema, local attributes can't. However, within a schema, local and global attributes are reusable. Global attributes should have their names bound to a namespace URI.

Attribute references or inline definitions may be specified with occurrences boundaries ; static and runtime values are allowed.

As usual, the default value for both attributes is 1, which denotes that the attribute is mandatory. Hereafter an element definition references an optional attribute :

    <asl:element name="my:person">
        <asl:attribute min-occurs="0" ref-attr="my:role"/>
    </asl:element>
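
As an illustration, both instances below would be expected to satisfy this definition, whereas a my:role value outside the list of the global definition would not ; the namespace URI bound to the my prefix is a placeholder :

<my:person xmlns:my="http://www.example.org/my"/>
<my:person xmlns:my="http://www.example.org/my" my:role="editor"/>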

3.6 Text content list items

Element content models may contain element references or text items ; Active Schema allows defining which text content is enabled, and where it is enabled, even in mixed contents.

Text content list items are very close to attribute values, except that they are unnamed items (however, a convenient way to "name" text content is to use data types) and appear exclusively in primitive content models, exactly like element references. Attribute values and text contents are text values that data types may constrain.

When processing text, comments and processing instructions are ignored, and adjacent texts are merged.

Within primitive content models, the <asl:text> element is used to introduce a text content material. When involved in a material list, a candidate material matches a text definition if and only if :

Whitespaces

A whitespace, in the sense of XML, is a text that contains only spaces, tabs, and returns (carriage return, linefeed, or both).

Whitespace candidates are discarded in the following conditions : when a content model is defined with elements and texts that can contain whitespaces, if the candidate material is a whitespace followed by an element that matches an item of the content model, then the whitespace candidate is ignored.

Text matchers

A text item uses the same matchers as those used to define data types and those used in attribute values. A text definition involves the <asl:text> element with :

However, as they may be mixed with element references, only a single matcher can be used at a time, that is to say that two text matchers can't be found side by side. When a choice of text matchers is needed, it must be enclosed within an inline type definition, or defined with a type reference. Schema designers must take care that a step that ends with a text matcher can't be followed by a step that begins with a mandatory text matcher, because the last text matched has been totally consumed by its matcher.

Here is an element that must contain one string from a predefined set :

    <asl:element name="Role">
        <asl:choice>
            <asl:text min-occurs="0" value="author"/>
            <asl:text min-occurs="0" value="editor"/>
            <asl:text min-occurs="0" value="reviewer"/>
        </asl:choice>
    </asl:element>

...and another that may contain any string :

    <asl:element name="Para">
        <asl:choice>
            <asl:text ref-type="xs:string"/>
        </asl:choice>
    </asl:element>
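
For example, assuming the two definitions above, the first two instances below would be expected to match, and the third would not since its text is not one of the listed values :

<Role>editor</Role>
<Para>Any string is allowed here.</Para>
<Role>publisher</Role> <!-- not in the predefined set -->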

A mixed content may also be defined :

    <asl:element name="p">
        <asl:choice max-occurs="unbounded">
            <asl:text min-occurs="0" ref-type="xs:string"/>
            <asl:element min-occurs="0" ref-elem="b"/>
            <asl:element min-occurs="0" ref-elem="i"/>
            <asl:element min-occurs="0" ref-elem="span"/>
            <asl:element min-occurs="0" ref-elem="tt"/>
            <asl:element min-occurs="0" ref-elem="a"/>
        </asl:choice>
    </asl:element>
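
A minimal instance that this definition would be expected to accept, assuming that the referenced elements (<b>, <i>, <a>...) are defined elsewhere and accept text :

<p>Some <b>bold</b> and <i>italic</i> text with <a>a link</a>.</p>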

A content model may precisely indicate where and which text content is allowed :

    <asl:element name="Person">
        <asl:choice>
            <asl:text min-occurs="0" value="Mrs"/>
            <asl:text min-occurs="0" value="Ms"/>
            <asl:text min-occurs="0" value="Mr"/>
        </asl:choice>
        <asl:sequence>
            <asl:element ref-elem="Name"/>
        </asl:sequence>
    </asl:element>

...that could match :

<Person>Mr<Name>Poulard</Name></Person>

...but can't match :

<Person>Mr<Name>Poulard</Name> is french.</Person>

When used directly, the <asl:text> element allows simple rules to be expressed ; for more complex text combinations, a reference to an <asl:type> offers much more flexibility. For example, both of the following text contents :

<polygon>x=6, y=10, x=37, y=61, x=37, y=16</polygon>
<polygon>6, 10, 37, 61, 37, 16</polygon>

...could be defined by the following schema that refers to a custom data type :

    <asl:element name="polygon">
        <asl:choice>
            <asl:text min-occurs="0" ref-type="my:points"/>
        </asl:choice>
    </asl:element>

The definition of this type is shown in the chapter about data types. Notice that a type may be used indifferently in a text content or in an attribute value :

    <asl:element name="polygon">
        <asl:attribute name="points" ref-type="my:points"/>
    </asl:element>

A type may also be defined anonymously (and can be used also for attributes definitions) :

    <asl:element name="polygon">
        <asl:choice>
            <asl:text>
                <asl:type>
                    <!-- insert the type definition here -->
                </asl:type>
            </asl:text>
        </asl:choice>
    </asl:element>

Please refer to the chapter about data types.

3.7 Assertions lists

Assertion lists are separate lists that can be computed at each step. Once a list of assertions is established, it is applied to the element to check, or transmitted to the host application.

Assertions are additional controls that can't be expressed by content models. Assertions are defined with the <asl:assert> element. Its @test attribute contains an expression that must return true on valid data, false otherwise. If the assertion to test can't be expressed within this single attribute, its subactions are performed, and the assertion is true if the current object evaluates to true, false otherwise.

For example, the following assertion limits the nesting depth of an element :

    <asl:element name="Chapter">
        <asl:assert test="{ count( asl:element()/ancestor::Chapter ) &lt; 4 }">
            <asl:desc>Too many nested chapters !</asl:desc>
        </asl:assert>
    </asl:element>

The asl:element() function returns a reference to the element currently tested.
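
To illustrate, the instance below nests five <Chapter> elements (titles and contents are omitted for brevity, so only the assertion is considered). The innermost one has four <Chapter> ancestors, so the count is no longer lower than 4 and the assertion would be expected to fail for that element only :

<Chapter>                         <!-- 0 ancestor -->
    <Chapter>                     <!-- 1 -->
        <Chapter>                 <!-- 2 -->
            <Chapter>             <!-- 3 : still valid -->
                <Chapter/>        <!-- 4 : the assertion fails -->
            </Chapter>
        </Chapter>
    </Chapter>
</Chapter>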

3.8 Interim processing

The <asl:interim> element defines a step that pauses the model currently in use. It allows other content models to be applied, but other special-purpose processing may also be intended. When the interim step ends, the paused model resumes.

An interim step is an unstable step launched only when its host model has matched.

    <asl:element name="foo">
        <asl:choice max-occurs="10" min-occurs="5">
            <asl:element ref-elem="bar"/>
            <asl:element ref-elem="goo">
                <asl:interim>
                    <asl:sequence>
                        <asl:element max-occurs="2" min-occurs="2" ref-elem="hoo"/>
                        <asl:element max-occurs="3" min-occurs="3" ref-elem="woo"/>
                    </asl:sequence>
                </asl:interim>
            </asl:element>
        </asl:choice>
    </asl:element>

Each time the <goo> element is matched, the sequence of <hoo> and <woo> must be applied, as shown in the instance below, which is valid with the schema.

<foo>
    <bar/> <!-- 1st choice -->
    <goo/> <!-- 2nd choice -->
    <!-- interim model starts -->
    <hoo/> <!-- 1st occur of the 1st elem -->
    <hoo/> <!-- 2nd -->
    <woo/> <!-- 1st occur of the 2nd elem -->
    <woo/> <!-- 2nd -->
    <woo/> <!-- 3rd -->
    <!-- interim model ends -->
    <bar/> <!-- 3rd choice -->
    <bar/> <!-- 4th choice -->
    <bar/> <!-- 5th choice -->
</foo>
Warning

This structure is somewhat unusual compared to other schema technologies : in those technologies, when a content model is defined within a referenced element, it means that this content model is applied to the children of the candidate element.

The <asl:interim> element denotes that the content models defined within are applied to the next sibling candidate elements.

Set of attributes

There is no structure that defines groups of attributes in Active Schema, but it is possible anyway to select one set or another with an interim step : once an attribute has matched, an additional attribute list may be provided.

Separate attributes sets

The following schema snippet expresses that either the a, b, and c attributes must be present together, or the d and e attributes must be present together.

    <asl:element name="foo">
        <asl:select>
            <asl:attribute min-occurs="0" ref-attr="a">
                <asl:interim><!-- an interim step is always unstable !!! -->
                    <asl:select max-occurs="2" min-occurs="2">
                        <asl:attribute ref-attr="b"/>
                        <asl:attribute ref-attr="c"/>
                    </asl:select>
                </asl:interim>
            </asl:attribute>
            <asl:attribute min-occurs="0" ref-attr="d">
                <asl:interim><!-- only one item : don't mind about instability -->
                    <asl:attribute ref-attr="e"/>
                </asl:interim>
            </asl:attribute>
        </asl:select>
    </asl:element>
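
For instance, the first two instances below would be expected to satisfy this definition, and the third would not because the c attribute is missing (attribute values are arbitrary, and the a to e attributes are assumed to be defined elsewhere without further constraints) :

<foo a="1" b="2" c="3"/>
<foo d="1" e="2"/>
<foo a="1" b="2"/> <!-- c is required once a has matched -->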

Mixed sets and stacked interim processes

An interim step may be advantageously used for complex combination descriptions. It is possible to define an interim step that occurs when an element has been matched, but that draws up an attribute list, or the opposite. It is also possible to define an interim step inside an element or attribute that has been involved in another interim step.

Restrictions

As the <asl:attribute> element is used both to define an attribute and to refer to one, the <asl:interim> element must be used without causing a conflict. This conflict can be avoided only by using it within attribute references ; attribute definitions can't contain the <asl:interim> element.

An <asl:interim> step can also be used within text and type definitions. In this case, its substeps must deal exclusively with text and type matchers. More generally, such an interim step must not be used to check additional constraints on attributes or elements : a text parsing is in progress and may fail without causing a fault, because another text parsing may succeed later. Dealing with additional constraint checking would be too problematic in the case where, for example, a type relies on another type that matches whereas its parent type doesn't.

Interim tuning

The @replace attribute indicates whether or not the interim model replaces the host model :

Replacement with an interim model

The ASL element definition below mimics the following familiar DTD declaration:

    <!ELEMENT Chapter (Title, ((Content, Chapter*) | Chapter+))>

    <asl:element name="Chapter">
        <asl:sequence>
            <asl:element ref-elem="Title"/>
            <asl:element min-occurs="0" ref-elem="Content">
                <asl:interim min-occurs="0" replace="all">
                    <asl:sequence>
                        <asl:element max-occurs="unbounded" min-occurs="0" ref-elem="Chapter"/>
                    </asl:sequence>
                </asl:interim>
            </asl:element>
            <asl:element max-occurs="unbounded" ref-elem="Chapter"/>
        </asl:sequence>
    </asl:element>

If the <Content> element is matched, the rest of this sequence will be ignored ; instead, the inner sequence where the <Chapter> element is optional will be applied.

If the <Content> element is not matched, the rest of this sequence will be applied as usual.

When an interim step definitively replaces an upper model, that model is discarded without any further checking of its occurrence boundaries.
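
To make the replacement concrete, the instances below have been checked by hand against the DTD declaration that the ASL definition is meant to mimic. They are an illustrative sketch: the text content of <Title> and <Content> is an assumption, since the definitions of these elements are not shown here.

    <!-- valid : <Content> is matched, so the interim model (Chapter*) replaces the rest -->
    <Chapter>
        <Title>Part one</Title>
        <Content>Some text</Content>
    </Chapter>

    <!-- valid : no <Content>, so the trailing model requires at least one <Chapter> -->
    <Chapter>
        <Title>Part two</Title>
        <Chapter>
            <Title>Section</Title>
            <Content>Some text</Content>
        </Chapter>
    </Chapter>

    <!-- invalid : neither <Content> nor a nested <Chapter> -->
    <Chapter>
        <Title>Part three</Title>
    </Chapter>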

3.9 Reusability

A number of ASL elements have an @id attribute that identifies the element with a qualified name.

Any identified element may be reused with the <asl:use> element. The @scope attribute indicates whether the target element itself must be used, or only its content (the default).

Additionally, when only a part of a definition would be convenient to reuse, the <asl:block> element can be used to set the boundaries of the reusable part. For other ASL elements, the <asl:block> element is totally neutral (it is traversed as if only its content were there). When using a group, the scope must be set to the content.

Note

ID scope

It is strongly recommended that identifiers be qualified names; the namespace URI of an identifier should be the same as the target namespace URI of the host schema.

An ID bound to a namespace URI is looked up within the set of schemata bound to the same namespace URI.

For example, the ASL schema for OASIS XML Catalog uses these elements.


4 Types

Types differ depending on whether they relate to textual data or to elements. This specification distinguishes data types (<asl:type>) and element classes (<asl:class>).

Warning
Note that the Active Datatype specification also covers object types, which are not relevant to this specification.

4.1 Data types

Data types apply both to attribute values and to text content, together designated as textual data. A textual data is a string that can be parsed into a typed data. Parsing a textual data is the operation that consists of sequentially converting its characters into the typed data according to a data type. A typed data consists of:

Data types may be composite, that is to say composed of sequences of data types. Once the first data type of the sequence has finished parsing the textual data, the second tries to parse the remainder, and so on.

From the point of view of an attribute or a text content, the parsing succeeds if and only if a typed data has been parsed successfully with no remainder. That is to say, if the data type is a composite data type, the last data type of the sequence must consume all of the remainder, otherwise the entire parsing fails.

Defining new data types

Active Schema provides means to define new data types, for example by adding constraints on an existing type, like W3C XML Schema does. It is possible for example to restrict the values of an integer to be between 1 and 365.

When defining a data type, it is possible to apply constraints during or after parsing. Constraints may be applied on the lexical value and/or the logical value and its components (see internal data model representation).

A data type can be defined with a name with the <asl:type> element and its @name attribute, or anonymously directly where it is needed. Named data types are easily reusable. Anonymous data types should be designed for single shot usage.

4.1.1 Using and defining data types

Named types are defined at the top level with the @name attribute of the <asl:type> element. Anonymous types are defined anywhere a type is expected without its @name attribute. When a type is expected, it is defined anonymously, or referred to by its name with the @ref-type attribute of the <asl:attribute> and <asl:text> elements.

Data types are defined on behalf of :

The same type definition may be referred to both in an attribute value and in a text content.

For example, the following type definition is reusable :

    <asl:type name="asl:min-occurs">
        <asl:choice>
            <asl:text ref-type="xs:nonNegativeInteger"/>
            <asl:text ref-type="adt:expression"/>
        </asl:choice>
    </asl:type>

The above definition is explicitly a choice step; the first type that matches the text value is kept.

If an attribute is defined with a single type, its definition uses the @ref-type attribute. Otherwise, this attribute is missing and the content of the attribute definition may refer to a list of types. The attribute definitions below are equivalent; the former uses a type that aggregates those that the latter uses directly:

    <asl:attribute name="min-occurs" ref-type="asl:min-occurs"/>
    <asl:attribute name="min-occurs">
        <asl:text ref-type="xs:nonNegativeInteger"/>
        <asl:text ref-type="adt:expression"/>
    </asl:attribute>

The attribute definition acts as a choice step; the first type that matches the attribute value is kept.

The last way to define a type is to extend an existing type, using the @base attribute to specify the type it is based on.

    <asl:type base="xs:integer" name="xs:nonNegativeInteger">
        <!-- type definition here -->
    </asl:type>

The definition consists of steps that use matchers.

As explained hereafter, a type may be :

A composite data is a typed data produced by a composite type, that is to say, a typed data that may contain other typed data. A non-composite data such as an xs:int is a typed data with a single value; it can't contain other typed data.

The formal type of a composite type is adt:XComponent.

4.1.2 Internal data model representation

A typed data is a cross operable object whose attributes contain characteristics of the type for the specific value (facets), and whose children contain the parsed data (values).

When parsing a text value, the engine tries to build an internal data model; the parsing fails when the target object fails to construct, or if some additional assertions (introduced with the <asl:assert> element) fail. Otherwise, the parsing succeeds.

Runtime data types are parsed as expressions, and the expected object can be retrieved only at runtime; thus, errors may be raised at runtime. Runtime data types are introduced through the adt:expression type. Notice that at runtime, an adt:expression may also return non XML-aware objects; the types of such objects, known as marker types, are outside the scope of this specification. Please refer to the Active Datatypes specification for further information.

The parsing result may be constructed with the help of other types; in this way, the data model obtained may be an arbitrarily complex structure. Active Schema provides the <asl:item> element to build a made-to-measure data model. When built with Active Schema actions, a text value is always parsed to a typed data that is a cross operable object.

For example, the fr:date type could be defined to parse a value such as 10 juin 1969, and return an object that could be accessed thanks to XPath ; in the context of its value :

Facets

Facets are attributes exposed in addition to the data model. They have a name and a value that is not necessarily a string, and they can be constrained.

For example, an xs:integer has the facet @adt:total-digits, which contains the number of digits of the integer. An assertion on this facet could be set like this:

    <asl:assert test="{ @adt:total-digits &lt; number(2) }"/>

Note

WXS datatypes are exposed in Active Tags in a slightly different manner than in the W3C XML Schema specification, because the base concepts are somewhat different, specifically on the hierarchy model. However, as the same features are covered and as they share the same semantics, they are compatible. Active Tags just provides a different view of the WXS datatypes.

The Active Datatypes specification describes how WXS datatypes can be used in Active Tags. In particular, it names the WXS facets to use as attributes in typed data.

The core facets are :

Note

The facets are bound to the http://www.inria.fr/xml/active-datatypes namespace URI for convenience: typed data may have their own (user defined) attributes, which can therefore never conflict with the facets.

The value of the object itself may be used to express constraints. For example, to constrain an integer to be less than or equal to 31:

    <asl:assert test="{ value( . ) &lt;= number( 31 ) }"/>
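
As a sketch of the restriction mentioned under "Defining new data types" (an integer between 1 and 365), the @base attribute and value assertions might be combined as below. The type name my:dayOfYear and its prefix are hypothetical. The availability of the >= operator in expressions is an assumption, since only < and <= appear in this specification. The @init and @parse attributes are left out, as in other type definitions in this section, and the < character is escaped as &lt; to keep the fragment well-formed XML.

    <asl:type base="xs:integer" name="my:dayOfYear">
        <!-- lower bound : assumes ">=" is supported in expressions -->
        <asl:assert test="{ value( . ) >= number( 1 ) }"/>
        <!-- upper bound : same form as the assertion above -->
        <asl:assert test="{ value( . ) &lt;= number( 365 ) }"/>
    </asl:type>

With such a definition, a value like 200 would be expected to satisfy both assertions, whereas 0 or 366 would cause the parsing to fail.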

4.1.3 Parsing

Text parsing is very close to content model parsing: many Active Schema elements (<asl:choice>, <asl:except>, <asl:interim>...) perform the same function for text parsing as for content model parsing. The difference is that the material used to feed the context is related to text:

Matchers and composite types

A type definition uses text matchers, which are text values, regular expressions or other type definitions, to define which character sequences are allowed. When all the matchers expected in a type definition have been applied and a character sequence remains, the type returns the result data model together with a remainder. If the host material that was using this type definition is itself a type definition, the host type goes on matching against the remainder, and so on until the host is an attribute definition or a text content model. At this stage, if the remainder is consumed by the next type or matcher, the process is repeated. When the host attribute or text content model definition definitively ends, all remainders must have been consumed; otherwise, the matching fails.

A text matcher is specified with the @value, @match, or @ref-type attributes of the <asl:text> element :

Finally, if the <asl:text> element has none of the above attributes, the type definition used will be the first one found in the context after running the element content; if none is found, it won't match anything.

Text matchers may be optional, and may be repeated. The repetition may be specified with the @min-occurs and @max-occurs attributes.

Warning

Repetitions may be impossible to process without the help of separators that are not involved in the matching process.

For example, 123456 can't match two xs:integer whereas 12,3456 can match one xs:integer, the "," separator, and another xs:integer. If the type my:twoDigits was defined to match two digits, then 123456 could match three my:twoDigits.

However, my:twoDigits could work as explained above only if it doesn't rely on an xs:integer constrained by an additional assertion set on its facets, like this:

    <asl:assert test="{ @adt:total-digits &lt; number(2) }"/>
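
One way to sketch a my:twoDigits type that does not rely on xs:integer is a regular expression matcher. The fragment below is illustrative only: the regular expression dialect accepted by @match is assumed to support character classes, the attribute name code is hypothetical, and the use of a single repeated <asl:text> inside an attribute definition is an assumption drawn from the repetition rule above.

    <asl:type name="my:twoDigits">
        <!-- exactly two digits, matched as characters rather than as a number -->
        <asl:text match="[0-9][0-9]"/>
    </asl:type>

    <!-- 123456 would be consumed as three my:twoDigits, leaving no remainder -->
    <asl:attribute name="code">
        <asl:text max-occurs="3" min-occurs="3" ref-type="my:twoDigits"/>
    </asl:attribute>

Because each occurrence consumes a fixed number of characters, no separator is needed to tell where one item ends and the next begins.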

Typed data items

The following sequence definition is used to match x=12,y=34 but not x=,y=34 :

    <asl:sequence>
        <asl:text ignore="yes" value="x="/>
        <asl:text ref-type="xs:nonNegativeInteger"/>
        <asl:text ignore="yes" value=",y="/>
        <asl:text ref-type="xs:nonNegativeInteger"/>
    </asl:sequence>

The @ignore attribute is used to specify that the matched value is not used to build the result data model. The other matched character sequences are used to build the result data model as unnamed items, which are of the type xs:nonNegativeInteger in this example.

To build the data model with a named item, or to compute a value other than those matched, the @item-value and @item-name attributes may be used :

    <asl:text item-name="x" item-value="{current()}" ref-type="xs:nonNegativeInteger"/>

As shown above, the current object is set to the matched value before item creation.[FIXME: not sure, remove it ?[ After, the previous value of the current object is restored.]]
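
Applied to the x=12,y=34 sequence above, named items could be sketched as follows. This only combines attributes already introduced (@ignore, @item-name, @item-value and @ref-type); the item names x and y are chosen for illustration.

    <asl:sequence>
        <!-- literal separators are matched but kept out of the data model -->
        <asl:text ignore="yes" value="x="/>
        <asl:text item-name="x" item-value="{current()}" ref-type="xs:nonNegativeInteger"/>
        <asl:text ignore="yes" value=",y="/>
        <asl:text item-name="y" item-value="{current()}" ref-type="xs:nonNegativeInteger"/>
    </asl:sequence>

For the input x=12,y=34, the resulting data model would then be expected to hold two named items, x and y, instead of two unnamed ones.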

Additionally, these attributes (@item-value and @item-name) may be separated; in this case, the matched value that immediately follows a matched value indicating an item name must specify an item value.

    <asl:text item-name="x" value="x="/>
    <asl:text item-value="{current()}" ref-type="xs:nonNegativeInteger"/>

In short, a matched content may be :

Finally, the result data model may be constructed with arbitrary additional items using the <asl:item> element and, optionally, its @name attribute. When encountered, this element wraps all the subitems produced in a composite data (its type is adt:XComponent); if it is empty, the item is not created.

    <asl:item name="point">
        <asl:sequence>
            ...
        </asl:sequence>
    </asl:item>

Initializing the internal data model

The @base attribute of a type definition (<asl:type>) indicates that the type is based on another type, called the base type. The base type is used to parse the input text data before using the inner type definition. The type definition may indicate how to initialize the typed data and how to parse further.

For this purpose, the @init attribute indicates how to initialize the typed data; when present, it contains an expression that is computed to initialize the typed data. Common usages are explained below:

In any case, the current object is set to the typed data produced by the base type. Additionally, the $asl:data property is set to the initialized typed data. While parsing, the initialized typed data may be updated, or its content appended to if it is a composite data; it can be referred to through the $asl:data property.

The @parse attribute is involved after the @init attribute to indicate which text data will be used for the parsing. If missing, the type will parse the remainder that has not been parsed by the base type. Otherwise, it contains an expression that will return the text to parse.

Note

When the @base attribute of a type definition is missing, it is equivalent to set the base type to xs:string, the @init attribute to "void", and the @parse attribute to "{.}" : the effect is that the entire text value is parsed with the type definition.

For example, the following type is based on an integer:

    <asl:type base="xs:int" init="{.}" name="temperature">
        <!--temperature stuff here-->
    </asl:type>

A typed data created by this type is of the xs:int type; the inline part of the definition parses the remainder, if any (the @parse attribute is missing).

The following type will remove undesirable spaces from the input text value before choosing which text has been selected :

    <asl:type init="" name="size" parse="{asl:compacted-string(.)}">
        <asl:choice>
            <asl:text value="big"/>
            <asl:text value="small"/>
        </asl:choice>
    </asl:type>

In this example, the remainder (if any) is also cleaned of trailing spaces.

[FIXME: what about an optional attribute item-name='name' to do things like for asl:text ?]

Building the internal data model

When parsing, each time a matcher has matched, the typed data matched is set as the current object, that the matcher can refer to build the data model if the @ignore attribute is not set to "yes". For example, the following text matcher builds an item with the name "x" and which value is t