Work in progress. This version may be updated without notice.
Copyright © INRIA
This specification describes an optional I/O module for Active Tags. This module provides tags, attributes, functions and predefined properties related to I/O. It is intended to deal with several I/O sources, but as it provides general-purpose I/O features, dedicated modules may be more suitable for handling a specific source. Nevertheless, this module also provides basic support for requestable I/O sources, such as querying native XML databases with XQuery.
With the I/O module, XPath can also be used to traverse various file systems.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
Note that for reasons of style, these words are not capitalized in this document.
The following specifications are part of the technologies.
1.1 Traversing the file system
1.2 URI handling
1.3 URI schemes support
1.4 Parameters
1.5 MIME types
1.6 File content and encoding
1.7 Fixing non-canonicalizable file information with catalogs
2 Examples
2.1 Batch processing
2.2 Synchronized copy to a WebDAV server
2.3 Querying XML:DB in a Web application
3 I/O module reference
3.1 Elements
3.2 Foreign attributes
3.3 Predefined properties
3.4 Extended XPath functions
3.5 Data types
A Glossary
B Related specifications
C Lists
C.1 Examples list
C.2 Figures list
D for the I/O module
This module has been designed to provide basic I/O support, including the capability of browsing file systems with XPath and querying native XML databases with XQuery. Both local and remote files may be supported, according to the protocols covered by the underlying implementation. Additionally, this module supplies consistent support for URIs and their underlying resources.
A few objects are provided by this module:
- #io:x-file
- #io:input
- #io:output
The first is a cross-operable object that provides useful information about the file; the other two are just marker types to be handled by actions.
Reaching a file by traversing the file system with XPath is not the same as designating it directly by its URI.
For example, assuming that file:///path/to/ is an empty directory, the expression io:file('file:///path/to/my/file') returns a file object corresponding to a file that doesn't exist, whereas the expression io:file('file:///path/to/')/my/file returns nothing. Indeed, the XPath expression is evaluated against the directory, and as it consists of traversing the existing children of that directory, it returns nothing.
Attempting to write to a file that doesn't exist creates it, and its parent directories when needed; this works with the first expression, but fails with the second one because no file object is held.
To create a handle to a file (or a subdirectory) that doesn't necessarily exist inside a base directory, simply use the io:resolve-uri() function: io:resolve-uri('file:///path/to/', 'my/file') is resolved to the expected path file:///path/to/my/file. Attempting to create a file on a file system where a directory of the same name already exists (or the opposite) will fail.
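The contrast between designating a file by name and reaching it by traversal can be reproduced outside this module. The following Python sketch is only an analogy and is not part of the I/O module: building a path never consults the file system (as with io:file() or io:resolve-uri()), whereas looking a path up among existing children finds nothing in an empty directory (as with XPath steps). The helper names and the temporary directory are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List


def designate(base: Path, relative: str) -> Path:
    """Build a handle by name only; the file system is not consulted."""
    return base / relative


def traverse(base: Path, relative: str) -> List[Path]:
    """Look the path up among the children that actually exist."""
    return sorted(base.glob(relative))


def write_creating_parents(target: Path, text: str) -> None:
    """Write a file, creating missing parent directories when needed."""
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)                            # an empty directory
    handle = designate(base, "my/file")         # handle to a missing file
    existed_before = handle.exists()            # the file is not there yet
    found_before = traverse(base, "my/file")    # traversal finds nothing
    write_creating_parents(handle, "hello")     # creates my/ and my/file
    found_after = traverse(base, "my/file")     # traversal now finds it
    found_after_is_handle = found_after == [handle]
```

Only the handle obtained by designation can be used to create the file; the traversal result is empty until the file exists.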
Before traversing any file system, this module is also intended to work with URIs. The string representation of a hierarchical URI must reflect whether its path represents a file or a directory: the path of a file must not end with "/", and the path of a directory must end with "/". Conversely, the last step of an XPath expression applied to an #io:x-file object doesn't take a trailing slash, since the XPath syntax doesn't allow it. As each step consists of looking up a child resource, the kind of the resource (file or directory) is supplied by the underlying file system; thus, if the last step denotes a directory, a slash is automatically appended to its path name.
This is especially important when resolving URIs: io:resolve-uri('file:///path/to', 'my/file') is resolved to the URI file:///path/my/file, whereas io:resolve-uri('file:///path/to/', 'my/file') is resolved to the URI file:///path/to/my/file. Additionally, this last URI stands for a file, whereas io:resolve-uri('file:///path/to/', 'my/file/') is resolved to the URI file:///path/to/my/file/, which stands for a directory since it ends with "/".
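The effect of the trailing "/" on resolution can be checked with any RFC 3986 resolver. As a minimal sketch, assuming Python's standard urllib.parse.urljoin as a stand-in for io:resolve-uri() (it is not part of this module), the three cases above give the following results; is_directory_uri is a hypothetical helper that encodes the trailing-slash convention.

```python
from urllib.parse import urljoin

# RFC 3986 resolution: the last segment of the base path is replaced
# unless the base path ends with "/".
no_slash = urljoin("file:///path/to", "my/file")
with_slash = urljoin("file:///path/to/", "my/file")
directory = urljoin("file:///path/to/", "my/file/")


def is_directory_uri(uri: str) -> bool:
    """Convention of this module: a path ending with "/" is a directory."""
    return uri.endswith("/")


print(no_slash)    # file:///path/my/file
print(with_slash)  # file:///path/to/my/file
print(directory)   # file:///path/to/my/file/
```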
The underlying file system is invoked only if an operation requires it, such as accessing the children of a directory with XPath, or reading its creation date. Notice that the trailing "/" is forbidden in XPath expressions, and that io:file('file:///path/to/')/my/file can lead either to a file that would have the URI file:///path/to/my/file, or to a directory that would have the URI file:///path/to/my/file/, or to nothing, according to what the underlying file system returns.
As most URI handling is performed without accessing resources, resolving a relative path against a base URI must be understood as specified in RFC 3986. Other operations that need resource access can be involved implicitly, particularly when the user browses the file system, e.g. io:file('file:///path/to/')/my/file as explained above. More generally, the #io:x-file type contains characteristics that are intrinsic to an instance (such as the components of the URI), and others that require access to the underlying file system.
If a URL is used when opening a stream, the underlying implementation may support several additional protocols such as WebDAV or FTP. How these protocols are registered with the engine is implementation-dependent.
An implementation must at least implement the file scheme; other well-known schemes are welcome, such as those used elsewhere in this document: http, https, webdav, webdavs and xmldb.
Additionally, specific URN schemes may be considered if the underlying implementation supports them.
More suitable modules could define specific scheme handling, for example when user interaction is needed. However, the <io:file> element is also convenient for passing additional parameters when accessing the resource; generally, io:file() can be substituted for <io:file> when no additional parameters are expected.
Parameters are somewhat unusual in file systems. URIs are intended to be a canonical representation of all the information needed to handle a file. However, some schemes may accept parameters, either to change the normal behaviour or because they are required to access the resource.
For example, with the HTTP POST method, it is possible to send { key, value } pairs in addition to an HTTP URL; these additional data are not part of the URL.
The XML Control Language provides the <xcl:param> element, which is useful for this purpose and can be used in this module with the <io:file> or <io:request> elements. In fact, parameters in this I/O module can be understood as "headers" for HTTP, considering that this kind of feature is extended to other file system types. Each file system type is free to understand or ignore parameters, and this module doesn't expect any of them to support parameters.
MIME types, also called content types, are additional information that indicates the file format.
Some files must carry a MIME type for their content to be stored on their file system. For example, when storing a file in a native XML database through XML:DB, the type of the resource to store must be specified:
<io:file mime-type="application/xml" name="myFile"
    uri="xmldb:provider://user:pwd@host:port/path/to/resource"/>
<io:save content="..." uri="{ $myFile }"/>
As this data is also available in the #io:x-file created, it can be set on it at a later stage:
<io:file name="myFile" uri="xmldb:provider://user:pwd@host:port/path/to/resource"/>
<!--do something-->
<xcl:set-attribute content="application/xml" name="io:mime-type" referent="{ $myFile }"/>
<io:save content="..." uri="{ $myFile }"/>
Instead of setting the MIME type on the target resource, it can also be set on the source:
<io:file mime-type="application/xml" name="source" uri="file:///path/to/resource"/>
<io:file name="target" uri="xmldb:provider://user:pwd@host:port/path/to/resource"/>
<io:copy source="{ $source }" target="{ $target }"/>
If the underlying file system is able to supply the MIME type of the resource, it can be omitted:
<!--the web host supplies the MIME type of the resource-->
<io:file name="source" uri="http://host:port/path/to/resource"/>
<io:file name="target" uri="xmldb:provider://user:pwd@host:port/path/to/resource"/>
<io:copy source="{ $source }" target="{ $target }"/>
...or overridden:
<!--the web host MIGHT supply the MIME type of the resource, but we want to be sure-->
<io:file mime-type="application/xml" name="source"
    uri="http://host:port/path/to/resource"/>
<io:file name="target" uri="xmldb:provider://user:pwd@host:port/path/to/resource"/>
<io:copy source="{ $source }" target="{ $target }"/>
Some file systems have means to bind MIME types to file extensions. This is implementation-dependent:
<!--this implementation sets the MIME type thanks to a mapping of ".xml" to "application/xml" -->
<io:file name="source" uri="file:///path/to/resource.xml"/>
<io:file name="target" uri="xmldb:provider://user:pwd@host:port/path/to/resource"/>
<io:copy source="{ $source }" target="{ $target }"/>
It is also possible to invoke the mapper explicitly thanks to the io:mime-type() function. How the MIME types are mapped to the file extensions is implementation-dependent.
<!--set the MIME type mapped to ".xml"-->
<io:file mime-type="{ io:mime-type('xml') }" name="source"
    uri="file:///path/to/resource.xml"/>
<io:file name="target" uri="xmldb:provider://user:pwd@host:port/path/to/resource"/>
<io:copy source="{ $source }" target="{ $target }"/>
The function can also be invoked after the creation of the file object:
<!--set the MIME type mapped to ".xml"-->
<io:file mime-type="{ io:mime-type('xml') }" name="source"
    uri="file:///path/to/resource.xml"/>
<xcl:set-attribute content="{ io:mime-type( $source ) }" name="io:mime-type"
    referent="{ $source }"/>
<io:file name="target" uri="xmldb:provider://user:pwd@host:port/path/to/resource"/>
<io:copy source="{ $source }" target="{ $target }"/>
This module is intended to process XML documents as well as binary resources or other (non-XML) text files. XML parsers have their own means to determine which charset is used in an XML document. By definition, binary resources are neither read nor written as text, thus no encoding must be supplied for them. Conversely, to process a text file properly, a specific encoding must be supplied.
Some file systems may provide an encoding by themselves: for example, http, https, webdav and webdavs may set the encoding of a file thanks to an HTTP header. In this case, such files can be handled safely, as the right encoding has been set on the file by the underlying file provider.
However, in many cases, the expected encoding is missing. The #io:x-file object uses the @io:encoding attribute, which may be set, overridden, or even removed (to force a text file to be read as binary data). To achieve this, an #io:x-file instance may be created with the <io:file> active tag, and with the right HTTP header as a parameter.
When opening a file, the #io:input or #io:output is read or written with the encoding supplied by the file, if any; if no encoding is provided, the content is processed as binary data.
The $sys:encoding property may be used to set the encoding of a file to that of the underlying locale.
The files:
File | Encoding |
---|---|
<io:file name="f1" uri="http://www.acme.org/boats/titanic.jpg"/> | should have no encoding. |
<io:file name="f2" uri="http://www.acme.org/boats/titanic.html"/> | should have an encoding. |
<io:file encoding="" name="f3" uri="http://www.acme.org/boats/titanic.jpg"/> | is forced to have no encoding. |
<io:file name="f4" uri="file:///path/to/titanic.jpg"/> | has no encoding. |
<io:file encoding="iso-8859-1" name="f5" uri="file:///path/to/titanic.txt"/> | has the encoding specified. |
<io:file encoding="{ $sys:encoding }" name="f6" uri="file:///path/to/titanic.txt"/> | has the locale encoding. |
The same source file may also be handled either in binary form or with different text encodings. For example, assuming that <tool:text-dump> and <tool:binary-dump> are custom actions that dump a file, the following code snippet dumps the same file as text and as binary:
<io:file name="myBinary" uri="file:///path/to/file"/>
<io:file encoding="{ $sys:encoding }" name="myText" uri="{ $myBinary }"/>
<tool:text-dump source="{ $myText }"/>
<tool:binary-dump source="{ $myBinary }"/>
Of course, a low-level implementation of these actions could also bypass this mechanism if necessary.
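The rule that the presence of an encoding decides between text and binary handling can be sketched in a few lines. The following Python function is a hypothetical illustration, not part of this module: read_content stands for whatever an action does with the bytes of a file, and a missing or empty encoding (as with encoding="") means that the content is left as binary data.

```python
from typing import Optional, Union


def read_content(raw: bytes, encoding: Optional[str]) -> Union[str, bytes]:
    """Return text when an encoding is known, raw bytes otherwise.

    A missing or empty encoding means "binary", mirroring encoding="".
    """
    if not encoding:
        return raw
    return raw.decode(encoding)


raw = "caf\u00e9".encode("iso-8859-1")           # the bytes b'caf\xe9'
as_binary = read_content(raw, None)              # no encoding: binary
forced_binary = read_content(raw, "")            # encoding removed: binary
as_text = read_content(raw, "iso-8859-1")        # encoding set: text
```

The same bytes are thus available both as binary data and as text, depending only on the encoding attached to the file object.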
As shown previously, a file is not necessarily defined entirely by a URI alone; an encoding and other parameters may be added.
A catalog may advantageously be used to map URIs to fully parameterized files. [TODO]
<io:file name="file" selector="my:selector" uri="urn:scheme:scheme-specific-part"/>
By specifying a selector, the catalog is forced to resolve the URI (which is a URN in the above example, but could be a URL as well). The catalog entry that matches the selector and the URI is used to resolve it; the catalog may supply a resolved file with its additional information (encoding and special-purpose parameters).
With XML technologies, it is often useful to publish an entire XML repository in HTML or PDF; Active Tags allows such a publishing process to be described.
Batch example | |
---|---|
Very few tags are necessary to transform the XML files of a whole directory into HTML. <?xml version="1.0" encoding="iso-8859-1"?> Here we use an iterative tag (<xcl:for-each>) that nests subactions. io:file() is an XPath function of the I/O module that produces an #io:x-file object, which behaves like an XML object: when the XPath step // is applied to such an object, the subdirectories are traversed recursively, as expected. The XPath predicate [@io:is-file] is applied to the result to keep only files, not directories, and the next predicate is applied to keep files whose name ends with ".xml". Objects that behave like XML objects, like the file objects in this example, are called cross-operable or X-operable objects. They may have attributes (like @io:is-file), and may support other XPath axes. |
Modularization and cross-operable objects are among the most powerful concepts of Active Tags.
The <io:request> element allows a data source to be queried. It can be used for querying native XML databases, and XQuery queries can be submitted to the databases that support them.
I/O | : | Input/Output module |
I/O namespace URI | : | http://ns.inria.org/active-tags/io |
Usual prefix | : | io |
Elements | Foreign attributes | Extended functions | Data types |
---|---|---|---|
<io:close> <io:copy> <io:create-dir> <io:delete> <io:file> <io:open> <io:read> <io:request> <io:save> <io:write> | @io:version | io:file() io:exists() io:root-files() io:resolve-uri() io:relativize-uri() io:mime-type() | #io:input #io:output #io:x-file |
Marker | Meaning |
---|---|
runtime | Must be an adt:expression that computes an object of the expected type. |
hard-coded | Must be a hard-coded value (literal). |
both | Can be either a hard-coded value or an adt:expression. |
optional | This material may be missing. |
default value | Denotes a value to use by default. |
read | Allows a read operation. |
write | Allows a write operation. |
rename | Allows a rename operation. |
update | Allows an update operation. |
delete | Allows a delete operation. |
<io:close>
Closes a stream.
Attributes
Name | Type | Value |
---|---|---|
stream | #io:input | The input stream to close. |
stream | #io:output | The output stream to close. |
<io:copy>
[TODO: Synchronized copy]
Copies a file. If the source to copy is a directory, all its descendants are candidates for the copy; each candidate may be filtered.
If the source to copy is a file with a MIME type and/or an encoding, and the target is a file, the MIME type and/or the encoding of the source is set on the target file. The MIME type and the encoding of the source can be either set explicitly or set automatically by the underlying file system; notice that in either case they can also be unset explicitly.
Copying with a filter
<io:copy filter="{ @io:extension='xml' }" source="file:///path/to/srcdir/"
    target="file:///path/to/targetdir/"/>
Attributes
Name | Type | Value |
---|---|---|
source | #io:input | The input stream to copy. |
source | #io:x-file | The file to copy. |
source | #xs:anyURI | The URI of the file to copy. |
target | #io:output | The output stream to copy to. |
target | #io:x-file | The target file or directory. |
target | #xs:anyURI | The URI of the file or directory to copy to. |
filter | | Sets a filter for each candidate file to copy. The current object is set to each candidate file when the expression is evaluated. |
filter | #xs:boolean | true: Reject the file. false: Accept the file. |
Using complex filters
A macro function might be useful for complex filters that can't be expressed in a single expression.
<io:create-dir>
Creates a directory. When needed, the parent directories are also created.
An #io:x-file property is added to the data set. When the @name attribute is omitted, the property is set as the current object.
Attributes
Name | Type | Value |
---|---|---|
uri | #xs:anyURI | The URI of the directory to create, or of the base directory that will contain the directory to create if the @path attribute is specified. |
uri | #io:x-file | The directory to create, or the base directory that will contain the directory to create if the @path attribute is specified. |
path | #xs:string | The relative path to the directory to create. The path is relative to the base URI given by the @uri attribute. |
name | #xs:QName | The name of the property that contains the created directory. |
<io:delete>
Deletes a file or a directory and its content if any.
Attributes
Name | Type | Value |
---|---|---|
uri | #xs:anyURI | The URI of the file to delete. |
uri | #io:x-file | The file to delete. |
<io:file>
Defines a file.
According to the scheme specified in the URI, some additional information that can't be canonicalized in the address may be accepted; parameters may be used to provide it. Other I/O actions that require parameters but don't open a parameter context at runtime should use this action beforehand.
A file that doesn't need parameters can be handled directly with the io:file() function.
This action creates a property of the #io:x-file type. If some parameters are found while performing the subactions, they are set as attributes of the property created if their name is not bound to a namespace URI, and will be used by the underlying scheme provider to access the resource. Parameters that are irrelevant for the provider are ignored.
runtime phase
- Opens a parameter context and runs its subactions. Any parameter found (with <xcl:param> or other) is set as an attribute of the property to create if its name is not bound to a namespace URI.
- An #io:x-file property is added to the data set. When the @name attribute is omitted, the property is set as the current object.
Attributes
Name | Type | Value |
---|---|---|
uri | #xs:anyURI | The URI of the file. |
uri | #io:x-file | The file given is returned with additional parameters if some were found while performing the subactions. |
name | #xs:QName | The name of the property to add to the data set. |
mime-type | #xs:string | The MIME type of that file. |
encoding | #xs:string | The encoding of the content of that file. |
<io:open>
Opens a file.
Creates a property that is either an #io:input or an #io:output, according to the mode used.
Attributes
Name | Type | Value |
---|---|---|
uri | #xs:anyURI | The URI of the file to open. |
uri | #io:x-file | The file to open. |
mode | #xs:string | The open mode. append: Opens an output stream for appending content. read: Opens an input stream for reading content. write: Opens an output stream for writing content. |
name | #xs:string | The name of the property to add to the data set. |
<io:read>
Reads an input stream.
A property that contains the data read is added to the data set. When the @name attribute is omitted, the property is set as the current object.
Attributes
Name | Type | Value |
---|---|---|
input | #io:input | The input stream to read. |
mode | #xs:string | The reading mode. line: Reads an entire line. char: Reads a single char. byte: Reads a single byte. |
name | #xs:string | The name of the property to add to the data set. |
<io:request>
Sends a request to an I/O source. This operation is relevant for XML:DB sources.
runtime phase
- Opens a parameter context and runs its subactions. Any parameter found (with <xcl:param> or other) is transmitted to the XML:DB driver.
- A property that contains the result of the request is added to the data set. When the @name attribute is omitted, the property is set as the current object.
error handling phase
If an error is encountered and <xcl:fallback> elements are children of this element, they are invoked according to their identifier:
ID | Condition of invocation |
---|---|
io:serviceNotRegistered | Denotes that the driver has not been registered. |
io:serviceNotCompatible | Denotes that the query service is not understood by the underlying implementation. |
Each of the fallback actions above is invoked with the same data set as the one used when the parsing starts. A property of the type #xml:x-error is added beforehand.
Additionally, a default <xcl:fallback> element may be supplied.
Attributes
Name | Type | Value |
---|---|---|
name | #xs:QName | The name of the property to add to the data set. |
scope | #xs:string | The scope of the property to create. local: Local. global: Global. shared: Shared. |
connect | #xs:anyURI | The connection to the I/O source, such as "xmldb:provider://user:password@host:port/path". |
connect | #io:x-file | The connection to the I/O source. |
style | | The type of the items of the result. Some providers may deliver the result as a list of items that is accessible as soon as the first item is available. |
style | missing attribute | The result is delivered as is. |
style | #xs:string | tree: XML DOM document, if relevant. stream: XML SAX events, if relevant. xml: XML character stream, if relevant. as-is: As returned by the query. |
type | #xs:string | The name of the query service; the query service establishes the request language to use for the query. May be "XQuery", "XPath", a provider-dependent request language such as "XyQL", or anything else registered as a query service with the XML:DB provider. |
version | #xs:string | The version of the query service: 1.0, or other. |
query | #xs:string | The query to submit, which depends on the query service given by the @type attribute. If missing, the query is defined either with the @source attribute or thanks to some parameters accepted by this action, which depend on the provider. |
source | | The URI of the source of the query to submit, which depends on the query service given by the @type attribute. If missing, the query is defined either with the @query attribute or thanks to some parameters accepted by this action, which depend on the provider. |
source | #xs:anyURI | The source URI. |
source | #io:x-file | The source file. |
<io:save>
Saves content to a file.
Attributes
Name | Type | Value |
---|---|---|
uri | #xs:anyURI | The file to save to. |
uri | #io:x-file | The file to save to. |
content | #xs:string | The content to write. |
<io:write>
Writes to an output stream.
Attributes
Name | Type | Value |
---|---|---|
output | #io:output | The output stream to write to. |
content | #xs:string | The content to write. |
content | #other | The content to write, taken from the string value of the object given. |
@io:version
- Priority: 0
The version of the I/O module to use. This attribute should be encountered before any I/O element, but it takes precedence over the element in which it is hosted.
No predefined properties are defined by this module.
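io:file()
Return: #io:x-file
Represents a file. If the file to be represented needs parameters, use the <io:file> element instead.
Arguments (first form)
Argument | Type | Value |
---|---|---|
1 | #xs:anyURI | The URI of the file. |
Arguments (second form)
Argument | Type | Value |
---|---|---|
1 | #io:x-file | A parent file. |
2 | #xs:anyURI | A child path. |
io:exists()
Return: #xs:boolean
Tests whether a file exists or not.
Arguments (first form)
Argument | Type | Value |
---|---|---|
1 | #io:x-file | The file to test. |
Arguments (second form)
Argument | Type | Value |
---|---|---|
1 | #xs:anyURI | The URI of the file to test. |
io:root-files()
Return: #adt:list of #io:x-file
This function returns the collection of the available file system roots.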
io:resolve-uri()
Return: #xs:anyURI
Resolves a URI against a base URI, or relocates a URI from a base URI to a target base URI.
Resolving a URI
Resolving more-path/to/file against file:///path/to/base/ gives file:///path/to/base/more-path/to/file.
Argument | Value | Accepted types |
---|---|---|
1 | The base URI. | #io:x-file (a file) or #xs:anyURI (a URI) |
2 | The URI to resolve. | #io:x-file (a file) or #xs:anyURI (a URI) |
Relocating a URI
Relocating consists of relativizing, then resolving.
Relocating the URI file:///path/to/base/more-path/to/file from the base file:///path/to/base/ to the target http://www.somehost.org/somepath/ gives http://www.somehost.org/somepath/more-path/to/file.
Argument | Value | Accepted types |
---|---|---|
1 | The base URI. | #io:x-file (a file) or #xs:anyURI (a URI) |
2 | The URI to resolve. | #io:x-file (a file) or #xs:anyURI (a URI) |
3 | The base target URI used for the resolution. | #io:x-file (a file) or #xs:anyURI (a URI) |
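Relocation, understood as relativizing and then resolving, can be sketched as follows. This is a simplified Python illustration, not the module's implementation: relativize_uri and relocate_uri are hypothetical helpers, the relativization only handles a URI located under a base that ends with "/", and resolution is delegated to the standard urllib.parse.urljoin.

```python
from urllib.parse import urljoin


def relativize_uri(base: str, uri: str) -> str:
    """Return uri relative to base, or uri unchanged if not under base.

    Simplification: base is expected to be a directory URI ending in "/".
    """
    if base.endswith("/") and uri.startswith(base):
        return uri[len(base):]
    return uri


def relocate_uri(base: str, uri: str, target: str) -> str:
    """Relativize uri against base, then resolve the result on target."""
    return urljoin(target, relativize_uri(base, uri))


relative = relativize_uri(
    "file:///path/to/base/",
    "file:///path/to/base/more-path/to/file",
)
relocated = relocate_uri(
    "file:///path/to/base/",
    "file:///path/to/base/more-path/to/file",
    "http://www.somehost.org/somepath/",
)
print(relative)   # more-path/to/file
print(relocated)  # http://www.somehost.org/somepath/more-path/to/file
```

A URI that is not located under the base is left untouched by this sketch; how an implementation handles that case (for example with ".." segments) is not specified here.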
io:relativize-uri()
Return: #xs:anyURI
Relativizes a URI against a base URI.
Argument | Value | Accepted types |
---|---|---|
1 | The base URI. | #io:x-file (a file) or #xs:anyURI (a URI) |
2 | The URI to relativize with the base. | #io:x-file (a file) or #xs:anyURI (a URI) |
io:mime-type()
Return: #xs:string
Returns the MIME type bound to the given file extension, or the empty string if none.
How the extensions are mapped to the MIME types is implementation-dependent.
Argument | Type | Value |
---|---|---|
1 | #io:x-file | A file. The MIME type returned is not the one set on the file but the one mapped to its extension, if any. Implementations are free to access the content of the resource to examine which type it is. |
2 | #xs:string | A file extension, without the dot: "xml", "gif", "jpeg", "txt", etc. |
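The lookup performed by io:mime-type() can be pictured as a simple table keyed by extension. The Python sketch below is an assumption-laden illustration: the mapping table and both helper functions are hypothetical, since the real mapping is implementation-dependent, and only the "empty string if none" behaviour is taken from the description above.

```python
from urllib.parse import urlsplit

# Hypothetical mapping table; a real engine would supply its own.
MIME_BY_EXTENSION = {
    "xml": "application/xml",
    "txt": "text/plain",
    "gif": "image/gif",
    "jpeg": "image/jpeg",
}


def mime_type_for_extension(extension: str) -> str:
    """Return the MIME type mapped to an extension (no dot), or ""."""
    return MIME_BY_EXTENSION.get(extension.lower(), "")


def mime_type_for_file(uri: str) -> str:
    """Return the MIME type mapped to the extension of a file URI, or ""."""
    name = urlsplit(uri).path.rsplit("/", 1)[-1]
    _, dot, extension = name.rpartition(".")
    return mime_type_for_extension(extension) if dot else ""


print(mime_type_for_extension("xml"))                      # application/xml
print(mime_type_for_file("file:///path/to/resource.xml"))  # application/xml
print(repr(mime_type_for_file("file:///path/to/resource")))  # ''
```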
#io:input
Represents an input stream of bytes or characters.
#io:output
Represents an output stream of bytes or characters.
#io:x-file
Represents a file or directory on the file system.
Item | Type | Value / comment |
---|---|---|
rename operation | #xs:QName | The new name of the file. |
type() | #xs:QName | #io:x-file (this type). |
name() | #xs:QName | The name of the file. |
position() | #xs:integer | The position of the file in its parent directory. |
string() | #xs:string | The path of this file. |
parent:: | #io:x-file | The parent directory of this file. |
child:: | #adt:list of #io:x-file | The files contained in this directory. |
attribute:: | | A set of attributes including those specified below (which can't be removed). Additional attributes may be set, removed or updated if they are not bound to a namespace URI (see <io:file>). |
attribute:: | #adt:map of #adt:NItem | All attributes are readable. |
attribute:: | #adt:map of #adt:NItem | Any attribute not bound to a namespace URI may be written. |
attribute:: | #adt:map of #adt:NItem | Any attribute not bound to a namespace URI may be updated. Others are updated as specified below. |
attribute:: | #adt:map of #adt:NItem | Any attribute not bound to a namespace URI may be removed. |
attribute:: | #adt:map of #adt:NItem | Any attribute not bound to a namespace URI may be renamed if the new name is not bound to a namespace URI. |
@io:can-read | #xs:boolean | Indicates whether or not this file can be read. true: This file can be read. false: This file can't be read. |
@io:can-read | #xs:boolean | true: Mark this file so that read operations are allowed. false: Mark this file so that read operations are disallowed. |
@io:can-write | #xs:boolean | Indicates whether or not this file can be written. true: This file can be written. false: This file can't be written. |
@io:can-write | #xs:boolean | true: Mark this file so that write operations are allowed. false: Mark this file so that write operations are disallowed. |
@io:exists | #xs:boolean | Indicates whether or not this file exists. true: This file exists. false: This file doesn't exist. |
@io:is-empty | #xs:boolean | Indicates whether or not this file is empty. true: This file is empty. false: This file is not empty. |
@io:extension | #xs:string | The extension of this file (the part of the file name that follows the last dot). |
@io:short-name | #xs:string | The name of this file without its extension, if any. |
@io:path | #io:x-file | The path of this file. This path is normalised, so that . and .. elements have been removed. Also, the path only contains / as its separator character. The path always starts with /. |
@io:decoded-path | #xs:string | The path where each escape sequence "%nn" is replaced by its character. |
@io:depth | #xs:string | The depth of this file path. The depth of the root of a file system is 0. The depth of any other file is 1 + the depth of its parent. |
@io:is-directory | #xs:boolean | Indicates whether or not this is a directory. true: This is a directory. false: This is not a directory. |
@io:is-file | #xs:boolean | Indicates whether or not this is a file. true: This is a file. false: This is not a file. |
@io:is-hidden | #xs:boolean | Indicates whether or not this file is hidden. true: This file is hidden. false: This file is not hidden. |
@io:last-modified | #xs:date | The last date when this file was modified. |
@io:last-modified | #xs:date | Set the date of modification of this file. |
@io:length | #xs:int | The length of this file. |
@io:encoding | #xs:string | The encoding of this file. By default, it is the one supplied by the underlying file system, if any, but it can also be the one set explicitly, if relevant. |
@io:encoding | #xs:string | Set the encoding of this file. |
@io:mime-type | #xs:string | The MIME type of this file. By default, it is the one supplied by the underlying file system, if any, but it can also be the one set explicitly, if relevant. |
@io:mime-type | #xs:string | Set the MIME type of this file. |
@io:uri | #xs:anyURI | The URI of this file. |
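Several of the attributes above are intrinsic to the URI and can be derived without any access to the file system. The following Python sketch, given only as an illustration with a hypothetical describe helper, computes values comparable to @io:path, @io:decoded-path, @io:depth, @io:extension and @io:short-name from a URI string; note that posixpath.normpath also drops a trailing "/", so this sketch does not preserve the file/directory distinction.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def describe(uri: str) -> dict:
    """Derive URI-intrinsic characteristics without touching any file."""
    # Normalise the path: "." and ".." elements are removed.
    path = posixpath.normpath(urlsplit(uri).path)
    segments = [segment for segment in path.split("/") if segment]
    name = segments[-1] if segments else ""
    # The extension is the part of the name that follows the last dot.
    short_name, dot, extension = name.rpartition(".")
    if not dot:
        short_name, extension = name, ""
    return {
        "path": path,
        "decoded-path": unquote(path),   # "%nn" escapes are replaced
        "depth": len(segments),          # the root has depth 0
        "extension": extension,
        "short-name": short_name,
    }


info = describe("file:///path/to/../my%20dir/./report.final.xml")
root = describe("file:///")
print(info["path"])          # /path/my%20dir/report.final.xml
print(info["decoded-path"])  # /path/my dir/report.final.xml
print(info["depth"])         # 3
```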
This list is not exhaustive; it is a list of common modules that implementors of an engine based on the Active Tags specifications may use. Additional modules are welcome.
<asl:active-schema asl:version="1.0" target="io" schema-version="1.0" xml:lang="en"
    xmlns:io="http://ns.inria.org/active-tags/io"
    xmlns:asl="http://ns.inria.org/active-schema"
    xmlns:adt="http://ns.inria.org/active-datatypes"
    xmlns:xs="http://www.w3.org/2001/XMLSchema-datatypes"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:at="http://ns.inria.org/active-tags/reference">
    <asl:element name="io:open">
    </asl:element>
    <!-- TODO -->
</asl:active-schema>