ArangoDB v2.8 reached End of Life (EOL) and is no longer supported.

This documentation is outdated. Please see the most recent version here: Try latest

High-level operations

FOR

The FOR keyword can be to iterate over all elements of an array. The general syntax is:

FOR variable-name IN expression

There also is a special case for graph traversals:

FOR vertex-variable-name, edge-variable-name, path-variable-name IN traversal-expression

For this special case see the graph traversals chapter. For all other cases read on:

Each array element returned by expression is visited exactly once. It is required that expression returns an array in all cases. The empty array is allowed, too. The current array element is made available for further processing in the variable specified by variable-name.

FOR u IN users
  RETURN u

This will iterate over all elements from the array users (note: this array consists of all documents from the collection named “users” in this case) and make the current array element available in variable u. u is not modified in this example but simply pushed into the result using the RETURN keyword.

Note: When iterating over collection-based arrays as shown here, the order of documents is undefined unless an explicit sort order is defined using a SORT statement.

The variable introduced by FOR is available until the scope the FOR is placed in is closed.

Another example that uses a statically declared array of values to iterate over:

FOR year IN [ 2011, 2012, 2013 ]
  RETURN { "year" : year, "isLeapYear" : year % 4 == 0 && (year % 100 != 0 || year % 400 == 0) }

Nesting of multiple FOR statements is allowed, too. When FOR statements are nested, a cross product of the array elements returned by the individual FOR statements will be created.

FOR u IN users
  FOR l IN locations
    RETURN { "user" : u, "location" : l }

In this example, there are two array iterations: an outer iteration over the array users plus an inner iteration over the array locations. The inner array is traversed as many times as there are elements in the outer array. For each iteration, the current values of users and locations are made available for further processing in the variable u and l.

RETURN

The RETURN statement can be used to produce the result of a query. It is mandatory to specify a RETURN statement at the end of each block in a data-selection query, otherwise the query result would be undefined. Using RETURN on the main level in data-modification queries is optional.

The general syntax for RETURN is:

RETURN expression

The expression returned by RETURN is produced for each iteration in the block the RETURN statement is placed in. That means the result of a RETURN statement is always an array (this includes the empty array). To return all elements from the currently iterated array without modification, the following simple form can be used:

FOR variable-name IN expression
  RETURN variable-name

As RETURN allows specifying an expression, arbitrary computations can be performed to calculate the result elements. Any of the variables valid in the scope the RETURN is placed in can be used for the computations.

Note: RETURN will close the current scope and eliminate all local variables in it.

RETURN DISTINCT

Since ArangoDB 2.7, RETURN can optionally be followed by the DISTINCT keyword. The DISTINCT keyword will ensure uniqueness of the values returned by the RETURN statement:

FOR variable-name IN expression
  RETURN DISTINCT expression

If the DISTINCT is applied on an expression that itself is an array or a subquery, the DISTINCT will not make the values in each array or subquery result unique, but instead ensure that the result contains only distinct arrays or subquery results. To make the result of an array or a subquery unique, simply apply the DISTINCT for the array or the subquery.

For example, the following query will apply DISTINCT on its subquery results, but not inside the subquery:

FOR what IN 1..2
  RETURN DISTINCT (
    FOR i IN [ 1, 2, 3, 4, 1, 3 ] 
      RETURN i
  )

Here we’ll have a FOR loop with two iterations that each execute a subquery. The DISTINCT here is applied on the two subquery results. Both subqueries return the same result value (that is [ 1, 2, 3, 4, 1, 3 ]), so after DISTINCT there will only be one occurrence of the value [ 1, 2, 3, 4, 1, 3 ] left:

[
  [ 1, 2, 3, 4, 1, 3 ]
]

If the goal is to apply the DISTINCT inside the subquery, it needs to be moved there:

FOR what IN 1..2
  LET sub = (
    FOR i IN [ 1, 2, 3, 4, 1, 3 ] 
      RETURN DISTINCT i
  ) 
  RETURN sub

In the above case, the DISTINCT will make the subquery results unique, so that each subquery will return a unique array of values ([ 1, 2, 3, 4 ]). As the subquery is executed twice and there is no DISTINCT on the top-level, that array will be returned twice:

[
  [ 1, 2, 3, 4 ],
  [ 1, 2, 3, 4 ]
]

Note: the order of results is undefined for RETURN DISTINCT.

Note: RETURN DISTINCT is not allowed on the top-level of a query if there is no FOR loop in front of it.

FILTER

The FILTER statement can be used to restrict the results to elements that match an arbitrary logical condition. The general syntax is:

FILTER condition

condition must be a condition that evaluates to either false or true. If the condition result is false, the current element is skipped, so it will not be processed further and not be part of the result. If the condition is true, the current element is not skipped and can be further processed.

FOR u IN users
  FILTER u.active == true && u.age < 39
  RETURN u

In the above example, all array elements from users will be included that have an attribute active with value true and that have an attribute age with a value less than 39 (including null ones). All other elements from users will be skipped and not be included the result produced by RETURN.

It is allowed to specify multiple FILTER statements in a query, and even in the same block. If multiple FILTER statements are used, their results will be combined with a logical and, meaning all filter conditions must be true to include an element.

FOR u IN users
  FILTER u.active == true
  FILTER u.age < 39
  RETURN u

SORT

The SORT statement will force a sort of the array of already produced intermediate results in the current block. SORT allows specifying one or multiple sort criteria and directions. The general syntax is:

SORT expression direction

Specifying the direction is optional. The default (implicit) direction for a sort is the ascending order. To explicitly specify the sort direction, the keywords ASC (ascending) and DESC can be used. Multiple sort criteria can be separated using commas.

Note: when iterating over collection-based arrays, the order of documents is always undefined unless an explicit sort order is defined using SORT.

FOR u IN users
  SORT u.lastName, u.firstName, u.id DESC
  RETURN u

Note that constant SORT expressions can be used to indicate that no particular sort order is desired. Constant SORT expressions will be optimized away by the AQL optimizer during optimization, but specifying them explicitly may enable further optimizations if the optimizer does not need to take into account any particular sort order.

LIMIT

The LIMIT statement allows slicing the result array using an offset and a count. It reduces the number of elements in the result to at most the specified number. Two general forms of LIMIT are followed:

LIMIT count
LIMIT offset, count

The first form allows specifying only the count value whereas the second form allows specifying both offset and count. The first form is identical using the second form with an offset value of 0.

The offset value specifies how many elements from the result shall be discarded. It must be 0 or greater. The count value specifies how many elements should be at most included in the result.

FOR u IN users
  SORT u.firstName, u.lastName, u.id DESC
  LIMIT 0, 5
  RETURN u

LET

The LET statement can be used to assign an arbitrary value to a variable. The variable is then introduced in the scope the LET statement is placed in. The general syntax is:

LET variable-name = expression

LET statements are mostly used to declare complex computations and to avoid repeated computations of the same value at multiple parts of a query.

FOR u IN users
  LET numRecommendations = LENGTH(u.recommendations)
  RETURN { 
    "user" : u, 
    "numRecommendations" : numRecommendations, 
    "isPowerUser" : numRecommendations >= 10 
  } 

In the above example, the computation of the number of recommendations is factored out using a LET statement, thus avoiding computing the value twice in the RETURN statement.

Another use case for LET is to declare a complex computation in a subquery, making the whole query more readable.

FOR u IN users
  LET friends = (
  FOR f IN friends 
    FILTER u.id == f.userId
    RETURN f
  )
  LET memberships = (
  FOR m IN memberships
    FILTER u.id == m.userId
      RETURN m
  )
  RETURN { 
    "user" : u, 
    "friends" : friends, 
    "numFriends" : LENGTH(friends), 
    "memberShips" : memberships 
  }

COLLECT

The COLLECT keyword can be used to group an array by one or multiple group criteria.

The COLLECT statement will eliminate all local variables in the current scope. After COLLECT only the variables introduced by COLLECT itself are available.

The general syntaxes for COLLECT are:

COLLECT variable-name = expression options
COLLECT variable-name = expression INTO groups-variable options
COLLECT variable-name = expression INTO groups-variable = projection-expression options
COLLECT variable-name = expression INTO groups-variable KEEP keep-variable options
COLLECT variable-name = expression WITH COUNT INTO count-variable options
COLLECT variable-name = expression AGGREGATE variable-name = aggregate-expression options
COLLECT AGGREGATE variable-name = aggregate-expression options
COLLECT WITH COUNT INTO count-variable options

Grouping syntaxes

The first syntax form of COLLECT only groups the result by the defined group criteria specified in expression. In order to further process the results produced by COLLECT, a new variable (specified by variable-name) is introduced. This variable contains the group value.

Here’s an example query that find the distinct values in u.city and makes them available in variable city:

FOR u IN users
  COLLECT city = u.city
  RETURN { 
    "city" : city 
  }

The second form does the same as the first form, but additionally introduces a variable (specified by groups-variable) that contains all elements that fell into the group. This works as follows: The groups-variable variable is an array containing as many elements as there are in the group. Each member of that array is a JSON object in which the value of every variable that is defined in the AQL query is bound to the corresponding attribute. Note that this considers all variables that are defined before the COLLECT statement, but not those on the top level (outside of any FOR), unless the COLLECT statement is itself on the top level, in which case all variables are taken. Furthermore note that it is possible that the optimizer moves LET statements out of FOR statements to improve performance.

FOR u IN users
  COLLECT city = u.city INTO groups
  RETURN { 
    "city" : city, 
    "usersInCity" : groups 
  }

In the above example, the array users will be grouped by the attribute city. The result is a new array of documents, with one element per distinct u.city value. The elements from the original array (here: users) per city are made available in the variable groups. This is due to the INTO clause.

COLLECT also allows specifying multiple group criteria. Individual group criteria can be separated by commas:

FOR u IN users
  COLLECT country = u.country, city = u.city INTO groups
  RETURN { 
    "country" : country, 
    "city" : city, 
    "usersInCity" : groups 
  }

In the above example, the array users is grouped by country first and then by city, and for each distinct combination of country and city, the users will be returned.

Discarding obsolete variables

The third form of COLLECT allows rewriting the contents of the groups-variable using an arbitrary projection-expression:

FOR u IN users
  COLLECT country = u.country, city = u.city INTO groups = u.name
  RETURN { 
    "country" : country, 
    "city" : city, 
    "userNames" : groups 
  }

In the above example, only the projection-expression is u.name. Therefore, only this attribute is copied into the groups-variable for each document. This is probably much more efficient than copying all variables from the scope into the groups-variable as it would happen without a projection-expression.

The expression following INTO can also be used for arbitrary computations:

FOR u IN users
  COLLECT country = u.country, city = u.city INTO groups = { 
    "name" : u.name, 
    "isActive" : u.status == "active"
  }
  RETURN { 
    "country" : country, 
    "city" : city, 
    "usersInCity" : groups 
  }

COLLECT also provides an optional KEEP clause that can be used to control which variables will be copied into the variable created by INTO. If no KEEP clause is specified, all variables from the scope will be copied as sub-attributes into the groups-variable. This is safe but can have a negative impact on performance if there are many variables in scope or the variables contain massive amounts of data.

The following example limits the variables that are copied into the groups-variable to just name. The variables u and someCalculation also present in the scope will not be copied into groups-variable because they are not listed in the KEEP clause:

FOR u IN users
  LET name = u.name
  LET someCalculation = u.value1 + u.value2
  COLLECT city = u.city INTO groups KEEP name 
  RETURN { 
    "city" : city, 
    "userNames" : groups[*].name 
  }

KEEP is only valid in combination with INTO. Only valid variable names can be used in the KEEP clause. KEEP supports the specification of multiple variable names.

Group length calculation

COLLECT also provides a special WITH COUNT clause that can be used to determine the number of group members efficiently.

The simplest form just returns the number of items that made it into the COLLECT:

FOR u IN users
  COLLECT WITH COUNT INTO length
  RETURN length

The above is equivalent to, but more efficient than:

RETURN LENGTH(
  FOR u IN users
    RETURN length
)

The WITH COUNT clause can also be used to efficiently count the number of items in each group:

FOR u IN users
  COLLECT age = u.age WITH COUNT INTO length
  RETURN { 
    "age" : age, 
    "count" : length 
  }

Note: the WITH COUNT clause can only be used together with an INTO clause.

Aggregation

A COLLECT statement can be used to perform aggregation of data per group. To only determine group lengths, the WITH COUNT INTO variant of COLLECT can be used as described before.

For other aggregations, it is possible to run aggregate functions on the COLLECT results:

FOR u IN users
  COLLECT ageGroup = FLOOR(u.age / 5) * 5 INTO g
  RETURN { 
    "ageGroup" : ageGroup,
    "minAge" : MIN(g[*].u.age),
    "maxAge" : MAX(g[*].u.age)
  }

The above however requires storing all group values during the collect operation for all groups, which can be inefficient.

The special AGGREGATE variant of COLLECT allows building the aggregate values incrementally during the collect operation, and is therefore often more efficient.

With the AGGREGATE variant the above query becomes:

FOR u IN users
  COLLECT ageGroup = FLOOR(u.age / 5) * 5 
  AGGREGATE minAge = MIN(u.age), maxAge = MAX(u.age)
  RETURN {
    ageGroup, 
    minAge, 
    maxAge 
  }

The AGGREGATE keyword can only be used after the COLLECT keyword. If used, it must directly follow the declaration of the grouping keys. If no grouping keys are used, it must follow the COLLECT keyword directly:

FOR u IN users
  COLLECT AGGREGATE minAge = MIN(u.age), maxAge = MAX(u.age)
  RETURN {
    minAge, 
    maxAge 
  }

Only specific expressions are allowed on the right-hand side of each AGGREGATE assignment:

  • on the top level, an aggregate expression must be a call to one of the supported aggregation functions LENGTH, MIN, MAX, SUM, AVERAGE, STDDEV_POPULATION, STDDEV_SAMPLE, VARIANCE_POPULATION, or VARIANCE_SAMPLE

  • an aggregate expression must not refer to variables introduced by the COLLECT itself

COLLECT variants

Since ArangoDB 2.6, there are two variants of COLLECT that the optimizer can choose from: the sorted variant and the hash variant. The hash variant only becomes a candidate for COLLECT statements that do not use an INTO clause.

The optimizer will always generate a plan that employs the sorted method. The sorted method requires its input to be sorted by the group criteria specified in the COLLECT clause. To ensure correctness of the result, the AQL optimizer will automatically insert a SORT statement into the query in front of the COLLECT statement. The optimizer may be able to optimize away that SORT statement later if a sorted index is present on the group criteria.

In case a COLLECT qualifies for using the hash variant, the optimizer will create an extra plan for it at the beginning of the planning phase. In this plan, no extra SORT statement will be added in front of the COLLECT. This is because the hash variant of COLLECT does not require sorted input. Instead, a SORT statement will be added after the COLLECT to sort its output. This SORT statement may be optimized away again in later stages. If the sort order of the COLLECT is irrelevant to the user, adding the extra instruction SORT null after the COLLECT will allow the optimizer to remove the sorts altogether:

FOR u IN users
  COLLECT age = u.age
  SORT null  /* note: will be optimized away */
  RETURN age

Which COLLECT variant is used by the optimizer depends on the optimizer’s cost estimations. The created plans with the different COLLECT variants will be shipped through the regular optimization pipeline. In the end, the optimizer will pick the plan with the lowest estimated total cost as usual.

In general, the sorted variant of COLLECT should be preferred in cases when there is a sorted index present on the group criteria. In this case the optimizer can eliminate the SORT statement in front of the COLLECT, so that no SORT will be left.

If there is no sorted index available on the group criteria, the up-front sort required by the sorted variant can be expensive. In this case it is likely that the optimizer will prefer the hash variant of COLLECT, which does not require its input to be sorted.

Which variant of COLLECT was actually used can be figured out by looking into the execution plan of a query, specifically the AggregateNode and its aggregationOptions attribute.

Setting COLLECT options

options can be used in a COLLECT statement to inform the optimizer about the preferred COLLECT method. When specifying the following appendix to a COLLECT statement, the optimizer will always use the sorted variant of COLLECT and not even create a plan using the hash variant:

OPTIONS { method: "sorted" }

Note that specifying hash as method will not make the optimizer use the hash variant. This is because the hash variant is not eligible for all queries. Instead, if no options or any other method than sorted are specified in OPTIONS, the optimizer will use its regular cost estimations.

COLLECT vs. RETURN DISTINCT

In order to make a result set unique, one can either use COLLECT or RETURN DISTINCT. Behind the scenes, both variants will work by creating an AggregateNode. For both variants, the optimizer may try the sorted and the hashed variant of COLLECT. The difference is therefore mainly syntactical, with RETURN DISTINCT saving a bit of typing when compared to an equivalent COLLECT:

FOR u IN users
  RETURN DISTINCT u.age
FOR u IN users
  COLLECT age = u.age
  RETURN age

However, COLLECT is vastly more flexible than RETURN DISTINCT. Additionally, the order of results is undefined for a RETURN DISTINCT, whereas for a COLLECT the results will be sorted.

REMOVE

The REMOVE keyword can be used to remove documents from a collection. On a single server, the document removal is executed transactionally in an all-or-nothing fashion. For sharded collections, the entire remove operation is not transactional.

Each REMOVE operation is restricted to a single collection, and the collection name must not be dynamic. Only a single REMOVE statement per collection is allowed per AQL query, and it cannot be followed by read operations that access the same collection, by traversal operations, or AQL functions that can read documents.

The syntax for a remove operation is:

REMOVE key-expression IN collection options

collection must contain the name of the collection to remove the documents from. key-expression must be an expression that contains the document identification. This can either be a string (which must then contain the document key) or a document, which must contain a _key attribute.

The following queries are thus equivalent:

FOR u IN users
  REMOVE { _key: u._key } IN users

FOR u IN users
  REMOVE u._key IN users

FOR u IN users
  REMOVE u IN users

Note: A remove operation can remove arbitrary documents, and the documents do not need to be identical to the ones produced by a preceding FOR statement:

FOR i IN 1..1000
  REMOVE { _key: CONCAT('test', i) } IN users

FOR u IN users
  FILTER u.active == false
  REMOVE { _key: u._key } IN backup

Setting query options

options can be used to suppress query errors that may occur when trying to remove non-existing documents. For example, the following query will fail if one of the to-be-deleted documents does not exist:

FOR i IN 1..1000
  REMOVE { _key: CONCAT('test', i) } IN users

By specifying the ignoreErrors query option, these errors can be suppressed so the query completes:

FOR i IN 1..1000
  REMOVE { _key: CONCAT('test', i) } IN users OPTIONS { ignoreErrors: true }

To make sure data has been written to disk when a query returns, there is the waitForSync query option:

FOR i IN 1..1000
  REMOVE { _key: CONCAT('test', i) } IN users OPTIONS { waitForSync: true }

Returning the removed documents

The removed documents can also be returned by the query. In this case, the REMOVE statement must be followed by a RETURN statement (intermediate LET statements are allowed, too).REMOVE introduces the pseudo-value OLD to refer to the removed documents:

REMOVE key-expression IN collection options RETURN OLD

Following is an example using a variable named removed for capturing the removed documents. For each removed document, the document key will be returned.

FOR u IN users
  REMOVE u IN users 
  LET removed = OLD 
  RETURN removed._key

UPDATE

The UPDATE keyword can be used to partially update documents in a collection. On a single server, updates are executed transactionally in an all-or-nothing fashion. For sharded collections, the entire update operation is not transactional.

Each UPDATE operation is restricted to a single collection, and the collection name must not be dynamic. Only a single UPDATE statement per collection is allowed per AQL query, and it cannot be followed by read operations that access the same collection, by traversal operations, or AQL functions that can read documents. The system attributes of documents (_key, _id, _from, _to, _rev) cannot be updated.

The two syntaxes for an update operation are:

UPDATE document IN collection options
UPDATE key-expression WITH document IN collection options

collection must contain the name of the collection in which the documents should be updated. document must be a document that contains the attributes and values to be updated. When using the first syntax, document must also contain the _key attribute to identify the document to be updated.

FOR u IN users
  UPDATE { _key: u._key, name: CONCAT(u.firstName, u.lastName) } IN users

The following query is invalid because it does not contain a _key attribute and thus it is not possible to determine the documents to be updated:

FOR u IN users
  UPDATE { name: CONCAT(u.firstName, u.lastName) } IN users

When using the second syntax, key-expression provides the document identification. This can either be a string (which must then contain the document key) or a document, which must contain a _key attribute.

The following queries are equivalent:

FOR u IN users
  UPDATE u._key WITH { name: CONCAT(u.firstName, u.lastName) } IN users

FOR u IN users
  UPDATE { _key: u._key } WITH { name: CONCAT(u.firstName, u.lastName) } IN users

FOR u IN users
  UPDATE u WITH { name: CONCAT(u.firstName, u.lastName) } IN users

An update operation may update arbitrary documents which do not need to be identical to the ones produced by a preceding FOR statement:

FOR i IN 1..1000
  UPDATE CONCAT('test', i) WITH { foobar: true } IN users

FOR u IN users
  FILTER u.active == false
  UPDATE u WITH { status: 'inactive' } IN backup

Setting query options

options can be used to suppress query errors that may occur when trying to update non-existing documents or violating unique key constraints:

FOR i IN 1..1000
  UPDATE { _key: CONCAT('test', i) } WITH { foobar: true } IN users OPTIONS { ignoreErrors: true }

An update operation will only update the attributes specified in document and leave other attributes untouched. Internal attributes (such as _id, _key, _rev, _from and _to) cannot be updated and are ignored when specified in document. Updating a document will modify the document’s revision number with a server-generated value.

When updating an attribute with a null value, ArangoDB will not remove the attribute from the document but store a null value for it. To get rid of attributes in an update operation, set them to null and provide the keepNull option:

FOR u IN users
  UPDATE u WITH { foobar: true, notNeeded: null } IN users OPTIONS { keepNull: false }

The above query will remove the notNeeded attribute from the documents and update the foobar attribute normally.

There is also the option mergeObjects that controls whether object contents will be merged if an object attribute is present in both the UPDATE query and in the to-be-updated document.

The following query will set the updated document’s name attribute to the exact same value that is specified in the query. This is due to the mergeObjects option being set to false:

FOR u IN users
  UPDATE u WITH { name: { first: "foo", middle: "b.", last: "baz" } } IN users OPTIONS { mergeObjects: false }

Contrary, the following query will merge the contents of the name attribute in the original document with the value specified in the query:

FOR u IN users
  UPDATE u WITH { name: { first: "foo", middle: "b.", last: "baz" } } IN users OPTIONS { mergeObjects: true }

Attributes in name that are present in the to-be-updated document but not in the query will now be preserved. Attributes that are present in both will be overwritten with the values specified in the query.

Note: the default value for mergeObjects is true, so there is no need to specify it explicitly.

To make sure data are durable when an update query returns, there is the waitForSync query option:

FOR u IN users
  UPDATE u WITH { foobar: true } IN users OPTIONS { waitForSync: true }

Returning the modified documents

The modified documents can also be returned by the query. In this case, the UPDATE statement needs to be followed a RETURN statement (intermediate LET statements are allowed, too). These statements can refer to the pseudo-values OLD and NEW. The OLD pseudo-value refers to the document revisions before the update, and NEW refers to document revisions after the update.

Both OLD and NEW will contain all document attributes, even those not specified in the update expression.

UPDATE document IN collection options RETURN OLD
UPDATE document IN collection options RETURN NEW
UPDATE key-expression WITH document IN collection options RETURN OLD
UPDATE key-expression WITH document IN collection options RETURN NEW

Following is an example using a variable named previous to capture the original documents before modification. For each modified document, the document key is returned.

FOR u IN users
  UPDATE u WITH { value: "test" } 
  LET previous = OLD 
  RETURN previous._key

The following query uses the NEW pseudo-value to return the updated documents, without some of the system attributes:

FOR u IN users
  UPDATE u WITH { value: "test" } 
  LET updated = NEW 
  RETURN UNSET(updated, "_key", "_id", "_rev")

It is also possible to return both OLD and NEW:

FOR u IN users
  UPDATE u WITH { value: "test" } 
  RETURN { before: OLD, after: NEW }

REPLACE

The REPLACE keyword can be used to completely replace documents in a collection. On a single server, the replace operation is executed transactionally in an all-or-nothing fashion. For sharded collections, the entire replace operation is not transactional.

Each REPLACE operation is restricted to a single collection, and the collection name must not be dynamic. Only a single REPLACE statement per collection is allowed per AQL query, and it cannot be followed by read operations that access the same collection, by traversal operations, or AQL functions that can read documents. The system attributes of documents (_key, _id, _from, _to, _rev) cannot be replaced.

The two syntaxes for a replace operation are:

REPLACE document IN collection options
REPLACE key-expression WITH document IN collection options

collection must contain the name of the collection in which the documents should be replaced. document is the replacement document. When using the first syntax, document must also contain the _key attribute to identify the document to be replaced.

FOR u IN users
  REPLACE { _key: u._key, name: CONCAT(u.firstName, u.lastName), status: u.status } IN users

The following query is invalid because it does not contain a _key attribute and thus it is not possible to determine the documents to be replaced:

FOR u IN users
  REPLACE { name: CONCAT(u.firstName, u.lastName, status: u.status) } IN users

When using the second syntax, key-expression provides the document identification. This can either be a string (which must then contain the document key) or a document, which must contain a _key attribute.

The following queries are equivalent:

FOR u IN users
  REPLACE { _key: u._key, name: CONCAT(u.firstName, u.lastName) } IN users

FOR u IN users
  REPLACE u._key WITH { name: CONCAT(u.firstName, u.lastName) } IN users

FOR u IN users
  REPLACE { _key: u._key } WITH { name: CONCAT(u.firstName, u.lastName) } IN users

FOR u IN users
  REPLACE u WITH { name: CONCAT(u.firstName, u.lastName) } IN users

A replace will fully replace an existing document, but it will not modify the values of internal attributes (such as _id, _key, _from and _to). Replacing a document will modify a document’s revision number with a server-generated value.

A replace operation may update arbitrary documents which do not need to be identical to the ones produced by a preceding FOR statement:

FOR i IN 1..1000
  REPLACE CONCAT('test', i) WITH { foobar: true } IN users

FOR u IN users
  FILTER u.active == false
  REPLACE u WITH { status: 'inactive', name: u.name } IN backup

Setting query options

options can be used to suppress query errors that may occur when trying to replace non-existing documents or when violating unique key constraints:

FOR i IN 1..1000
  REPLACE { _key: CONCAT('test', i) } WITH { foobar: true } IN users OPTIONS { ignoreErrors: true }

To make sure data are durable when a replace query returns, there is the waitForSync query option:

FOR i IN 1..1000
  REPLACE { _key: CONCAT('test', i) } WITH { foobar: true } IN users OPTIONS { waitForSync: true }

Returning the modified documents

The modified documents can also be returned by the query. In this case, the REPLACE statement must be followed by a RETURN statement (intermediate LET statements are allowed, too). The OLD pseudo-value can be used to refer to document revisions before the replace, and NEW refers to document revisions after the replace.

Both OLD and NEW will contain all document attributes, even those not specified in the replace expression.

REPLACE document IN collection options RETURN OLD
REPLACE document IN collection options RETURN NEW
REPLACE key-expression WITH document IN collection options RETURN OLD
REPLACE key-expression WITH document IN collection options RETURN NEW

Following is an example using a variable named previous to return the original documents before modification. For each replaced document, the document key will be returned:

FOR u IN users
  REPLACE u WITH { value: "test" } 
  LET previous = OLD 
  RETURN previous._key

The following query uses the NEW pseudo-value to return the replaced documents (without some of their system attributes):

FOR u IN users
  REPLACE u WITH { value: "test" } 
  LET replaced = NEW 
  RETURN UNSET(replaced, '_key', '_id', '_rev')

INSERT

The INSERT keyword can be used to insert new documents into a collection. On a single server, an insert operation is executed transactionally in an all-or-nothing fashion. For sharded collections, the entire insert operation is not transactional.

Each INSERT operation is restricted to a single collection, and the collection name must not be dynamic. Only a single INSERT statement per collection is allowed per AQL query, and it cannot be followed by read operations that access the same collection, by traversal operations, or AQL functions that can read documents.

The syntax for an insert operation is:

INSERT document IN collection options

Note: The INTO keyword is also allowed in the place of IN.

collection must contain the name of the collection into which the documents should be inserted. document is the document to be inserted, and it may or may not contain a _key attribute. If no _key attribute is provided, ArangoDB will auto-generate a value for _key value. Inserting a document will also auto-generate a document revision number for the document.

FOR i IN 1..100
  INSERT { value: i } IN numbers

When inserting into an edge collection, it is mandatory to specify the attributes _from and _to in document:

FOR u IN users
  FOR p IN products
    FILTER u._key == p.recommendedBy
    INSERT { _from: u._id, _to: p._id } IN recommendations

Setting query options

options can be used to suppress query errors that may occur when violating unique key constraints:

FOR i IN 1..1000
  INSERT { _key: CONCAT('test', i), name: "test" } WITH { foobar: true } IN users OPTIONS { ignoreErrors: true }

To make sure data are durable when an insert query returns, there is the waitForSync query option:

FOR i IN 1..1000
  INSERT { _key: CONCAT('test', i), name: "test" } WITH { foobar: true } IN users OPTIONS { waitForSync: true }

Returning the inserted documents

The inserted documents can also be returned by the query. In this case, the INSERT statement can be a RETURN statement (intermediate LET statements are allowed, too). To refer to the inserted documents, the INSERT statement introduces a pseudo-value named NEW.

The documents contained in NEW will contain all attributes, even those auto-generated by the database (e.g. _id, _key, _rev, _from, and _to).

INSERT document IN collection options RETURN NEW

Following is an example using a variable named inserted to return the inserted documents. For each inserted document, the document key is returned:

FOR i IN 1..100
  INSERT { value: i } 
  LET inserted = NEW 
  RETURN inserted._key

UPSERT

The UPSERT keyword can be used for checking whether certain documents exist, and to update them in case they exist, or create them in case they do not exist. On a single server, upserts are executed transactionally in an all-or-nothing fashion. For sharded collections, the entire update operation is not transactional.

Each UPSERT operation is restricted to a single collection, and the collection name must not be dynamic. Only a single UPSERT statement per collection is allowed per AQL query, and it cannot be followed by read operations that access the same collection, by traversal operations, or AQL functions that can read documents.

The syntax for an upsert operation is:

UPSERT search-expression INSERT insert-expression UPDATE update-expression IN collection options
UPSERT search-expression INSERT insert-expression REPLACE update-expression IN collection options

When using the UPDATE variant of the upsert operation, the found document will be partially updated, meaning only the attributes specified in update-expression will be updated or added. When using the REPLACE variant of upsert, existing documents will be replaced with the contexts of update-expression.

Updating a document will modify the document’s revision number with a server-generated value. The system attributes of documents (_key, _id, _from, _to, _rev) cannot be updated.

The search-expression contains the document to be looked for. It must be an object literal without dynamic attribute names. In case no such document can be found in collection, a new document will be inserted into the collection as specified in the insert-expression.

In case at least one document in collection matches the search-expression, it will be updated using the update-expression. When more than one document in the collection matches the search-expression, it is undefined which of the matching documents will be updated. It is therefore often sensible to make sure by other means (such as unique indexes, application logic etc.) that at most one document matches search-expression.

The following query will look in the users collection for a document with a specific name attribute value. If the document exists, its logins attribute will be increased by one. If it does not exist, a new document will be inserted, consisting of the attributes name, logins, and dateCreated:

UPSERT { name: 'superuser' } 
INSERT { name: 'superuser', logins: 1, dateCreated: DATE_NOW() } 
UPDATE { logins: OLD.logins + 1 } IN users

Note that in the UPDATE case it is possible to refer to the previous version of the document using the OLD pseudo-value.

Setting query options

As in several above examples, the ignoreErrors option can be used to suppress query errors that may occur when trying to violate unique key constraints.

When updating or replacing an attribute with a null value, ArangoDB will not remove the attribute from the document but store a null value for it. To get rid of attributes in an upsert operation, set them to null and provide the keepNull option.

There is also the option mergeObjects that controls whether object contents will be merged if an object attribute is present in both the UPDATE query and in the to-be-updated document.

Note: the default value for mergeObjects is true, so there is no need to specify it explicitly.

To make sure data are durable when an update query returns, there is the waitForSync query option.

Returning documents

UPSERT statements can optionally return data. To do so, they need to be followed by a RETURN statement (intermediate LET statements are allowed, too). These statements can optionally perform calculations and refer to the pseudo-values OLD and NEW. In case the upsert performed an insert operation, OLD will have a value of null. In case the upsert performed an update or replace operation, OLD will contain the previous version of the document, before update/replace.

NEW will always be populated. It will contain the inserted document in case the upsert performed an insert, or the updated/replaced document in case it performed an update/replace.

This can also be used to check whether the upsert has performed an insert or an update internally:

UPSERT { name: 'superuser' } 
INSERT { name: 'superuser', logins: 1, dateCreated: DATE_NOW() } 
UPDATE { logins: OLD.logins + 1 } IN users
RETURN { doc: NEW, type: OLD ? 'update' : 'insert' }