Q U E R Y   L A N G U A G E
We implemented a declarative query language, XAQL, for querying archives. XAQL allows retrieval of particular data versions, tracking of object history, and retrieval of timestamps representing the sequence of versions when a given condition was valid. XAQL is oriented toward OQL since archives are not arbitrary XML documents, but follow a fairly regular structure given by the key specification. XAQL considers keyed nodes as objects and an archive as a nested, timestamped object hierarchy defined by the given key specification. Besides keyed and unkeyed nodes timestamps are a first class concept in XAQL. The general syntax of a XAQL query is as follows:
SELECT [TIMESTAMP | * | expr]
FROM {CHANGES IN | UNCHANGED} name {LCP path-expression} {KEEP KEY VALUES}
{WITH variable-list}
{VERSION timestamp}
{WHERE condition {COINCIDE}}
The SELECT and FROM clause are mandatory, all other parts are optional. The SELECT clause specifies the nodes in the query result. The FROM clause specifies the archive to be queried. The WITH clause allows definition of object variables that are used in select statements and where conditions. Temporal projection is enabled by the VERSION clause. When specified, a query is evaluated only on those versions in the archive that are listed in the given timestamp. The WHERE clause filters nodes based on a given condition. XAQL currently allows conjunctions of conditions on values of text nodes or unkeyed subtrees. Coincidence queries are supported by the keyword COINCIDE in the WHERE clause. We further support predicates on the history of nodes and subtrees. Predicate HAS CHANGES is true if the subtree rooted under a node has changes in the considered versions. The WAS MODIFIED predicate is true if an element itself was changed, i.e., it does not inherit its timestamp from the parent.
Basic Queries
Given an archive of a company database (called MYCOMPANY) as outlined below:
<T t='1-5'>
 <COMPANY>
  <DEPARTMENT>
   <NAME>Marketing</NAME>
   <EMPLOYEE>
    <SSN>111</SSN>
    <NAME>John</NAME>
    <SALARY>
     <T t='1-2'>30,000</T>
     <T t='3-5'>40,000</T>
...
The following query lists the name and Social Security Number of all employees whose salary changed over the first 3 versions:
SELECT E/NAME, E/SSN
FROM MYCOMPANY
WITH /COMPANY/DEPARTMENT/EMPLOYEE E
VERSION 1-3
WHERE E/SALARY HAS CHANGES
Path expressions in XAQL may contain constraints on key values. For example, /COMPANY/DEPARTMENT('Marketing') matches the marketing department (assuming departments are keyed by their name). Queries are evaluated in a single scan over the data. We start by determining the longest common prefix (LCP) for all path expressions in the SELECT and WHERE clause. For nodes that match the LCP the WHERE clause is evaluated. If the condition evaluates to true, we output those parts of the subtree that are specified in the SELECT clause. The results are merged into a single document. Similar to OQL we allow subselects in the SELECT clause.
The following query lists the department name and employee name for employees that have (or had) a salary of 30000 or more:
SELECT D/NAME, (SELECT E/NAME, E/SALARY FROM E WHERE E/SALARY >= 30000)
FROM MYCOMPANY
WITH /COMPANY/DEPARTMENT D, /COMPANY/DEPARTMENT/EMPLOYEE E
WHERE E/SALARY >= 30000
Specifying TIMESTAMP instead of a select statement in the SELECT clause returns the union of timestamps for all nodes that satisfy the WHERE clause.
The following query returns the timestamp of all those versions in which at least one employee in the marketing department had a salary of 30000 or more:
SELECT TIMESTAMP
FROM MYCOMPANY
WHERE /COMPANY/DEPARTMENT('Marketing')/EMPLOYEE/SALARY >= 30000
WHERE Conditions
The following predicates are currently allowed in the WHERE clause of a XAQL query:
node-path [< | <= | = | != | >= | >] value Compares the value of a node with a given value. If the node value should be interpreted as a string the value has to be placed in single quotes. Otherwise, the node value and the specified value are interpreted as integers.
node-path LIKE value   Similar to the SQL LIKE predicate. The wildcard character is %.
node-path MATCHES pattern Similar to the LIKE predicate. However, the pattern is any valid regular expression supported by the java.util.regex API for pattern matching.
EXISTS node-path Evaluates to true if the specified node exists.
node-path HAS CHANGES Evaluates to true if the specified element or one of its sub-elements is timestamped.
node-path WAS MODIFIED Evaluates to true if the specified element is timestamped.
Each predicate may be preceded by the keyword NOT. The first three predicates may also be followed by the keywords FOR ALL VERSIONS meaning that the predicate has to evaluate to true for all versions of the specified node.
The following query returns the age of all employees whose name starts with 'P' and who never had a salary below 30000:
SELECT E/AGE
FROM MYCOMPANY
WITH /COMPANY/DEPARTMENT/EMPLOYEE E
WHERE E/NAME LIKE 'P%' AND E/SALARY >= 30000 FOR ALL VERSIONS
Comparing Versions
In order to compare database versions in an archive you can use the CHANGES IN and UNCHANGED keywords in the FROM clause. Specifying CHANGES IN means that only those nodes (and their sub-trees) are considered in query evaluation that have their own timestamp. Since key values in archives don't change they will never be considered (i.e., output) in queries with CHANGES IN. This problem can be avoided by using the KEEP KEY VALUES extension. The keyword UNCHANGED in general only considers noded that do not have a timestamp. The CHANGES IN and UNCHANGED keywords are best used in combination with the VERSION clause to compare database versions.
The following query compares the first to versions of the company database and outputs only the nodes that changed together with their key values:
SELECT *
FROM CHANGES IN MYCOMPANY KEEP KEY VALUES
VERSION 1-2