We implemented a declarative query language, XAQL, for querying archives.
XAQL allows retrieval of particular data versions, tracking
of object history, and retrieval of timestamps representing the sequence of
versions when a given condition was valid.
XAQL is oriented toward OQL since archives are not arbitrary
XML documents, but follow a fairly regular structure given by the key specification.
XAQL considers keyed nodes as objects and an archive as a nested, timestamped
object hierarchy defined by the given key specification. Besides keyed and unkeyed
nodes timestamps are a first class concept in XAQL. The general syntax of a
XAQL query is as follows:
|
|
SELECT [TIMESTAMP | * | expr] |
FROM {CHANGES IN | UNCHANGED} name {LCP path-expression} {KEEP KEY VALUES} |
{WITH variable-list} |
{VERSION timestamp} |
{WHERE condition {COINCIDE}} |
|
The SELECT and FROM clause are mandatory, all other parts are optional. The SELECT clause specifies the nodes in the query
result. The FROM clause specifies the archive to be queried. The WITH clause allows definition of object variables that are
used in select statements and where conditions. Temporal projection is enabled by the VERSION clause. When specified, a
query is evaluated only on those versions in the archive that are listed in the given timestamp. The WHERE clause filters
nodes based on a given condition. XAQL currently allows conjunctions of conditions on values of text nodes or unkeyed subtrees.
Coincidence queries are supported by the keyword COINCIDE in the WHERE clause. We further support predicates on the history
of nodes and subtrees. Predicate HAS CHANGES is true if the subtree rooted under a node has changes in the considered versions.
The WAS MODIFIED predicate is true if an element itself was changed, i.e., it does not inherit its timestamp from the parent.
|
|
Basic Queries |
|
Given an archive of a company database (called MYCOMPANY) as outlined below: |
|
<T t='1-5'> |
<COMPANY> |
<DEPARTMENT> |
<NAME>Marketing</NAME> |
<EMPLOYEE> |
<SSN>111</SSN> |
<NAME>John</NAME> |
<SALARY> |
<T t='1-2'>30,000</T> |
<T t='3-5'>40,000</T> |
... |
|
The following query lists the name and Social Security Number of all employees whose salary changed over the first 3 versions: |
|
SELECT E/NAME, E/SSN |
FROM MYCOMPANY |
WITH /COMPANY/DEPARTMENT/EMPLOYEE E |
VERSION 1-3 |
WHERE E/SALARY HAS CHANGES |
|
Path expressions in XAQL may contain constraints on key values. For example,
/COMPANY/DEPARTMENT('Marketing') matches the marketing department (assuming departments are keyed by their name).
Queries are evaluated in a single scan over the data. We start by determining the longest common prefix
(LCP) for all path expressions in the SELECT and WHERE clause. For nodes that match the LCP the WHERE clause is
evaluated. If the condition evaluates to true, we output those parts of the subtree that are specified in the
SELECT clause. The results are merged into a single document. Similar to OQL we allow subselects in the SELECT
clause. |
|
The following query lists the department name and employee name for employees that have (or had) a salary of 30000 or more: |
|
SELECT D/NAME, (SELECT E/NAME, E/SALARY FROM E WHERE E/SALARY >= 30000) |
FROM MYCOMPANY |
WITH /COMPANY/DEPARTMENT D, /COMPANY/DEPARTMENT/EMPLOYEE E |
WHERE E/SALARY >= 30000 |
|
Specifying TIMESTAMP instead of a select statement in the SELECT clause returns the union of
timestamps for all nodes that satisfy the WHERE clause. |
|
The following query returns the timestamp of all those versions in which at least one employee in the
marketing department had a salary of 30000 or more: |
|
SELECT TIMESTAMP |
FROM MYCOMPANY |
WHERE /COMPANY/DEPARTMENT('Marketing')/EMPLOYEE/SALARY >= 30000 |
|
WHERE Conditions |
|
The following predicates are currently allowed in the WHERE clause of a XAQL query: |
|
node-path [< | <= | = | != | >= | >] value |
Compares the value of a node with a given value. If the node value should
be interpreted as a string the value has to be placed in single quotes. Otherwise, the node value and
the specified value are interpreted as integers. |
| |
node-path LIKE value |
Similar to the SQL LIKE predicate. The wildcard character is %. |
| |
node-path MATCHES pattern |
Similar to the LIKE predicate. However, the pattern is any valid regular expression supported
by the java.util.regex API for pattern matching. |
| |
EXISTS node-path |
Evaluates to true if the specified node exists. |
| |
node-path HAS CHANGES |
Evaluates to true if the specified element or one of its sub-elements is timestamped. |
| |
node-path WAS MODIFIED |
Evaluates to true if the specified element is timestamped. |
|
|
|
|
Each predicate may be preceded by the keyword NOT. The first three predicates may also be followed by the keywords FOR ALL VERSIONS meaning that the
predicate has to evaluate to true for all versions of the specified node. |
|
The following query returns the age of all employees whose name starts with 'P' and who never had
a salary below 30000: |
|
SELECT E/AGE |
FROM MYCOMPANY |
WITH /COMPANY/DEPARTMENT/EMPLOYEE E |
WHERE E/NAME LIKE 'P%' AND E/SALARY >= 30000 FOR ALL VERSIONS |
|
Comparing Versions |
|
In order to compare database versions in an archive you can use the CHANGES IN and UNCHANGED keywords in the FROM clause.
Specifying CHANGES IN means that only those nodes (and their sub-trees) are considered in query evaluation that have their own timestamp.
Since key values in archives don't change they will never be considered (i.e., output) in queries with CHANGES IN. This problem can be
avoided by using the KEEP KEY VALUES extension. The keyword UNCHANGED in general only considers noded that do not have a timestamp.
The CHANGES IN and UNCHANGED keywords are best used in combination with the VERSION clause to compare database versions. |
|
The following query compares the first to versions of the company database and
outputs only the nodes that changed together with their key values: |
|
SELECT * |
FROM CHANGES IN MYCOMPANY KEEP KEY VALUES |
VERSION 1-2 |
|