ArangoDB v2.8 reached End of Life (EOL) and is no longer supported.
This documentation is outdated. Please see the most recent version of the documentation instead.
Administrating ArangoDB
AppendOnly/MVCC
Instead of overwriting existing documents, ArangoDB creates a new version of each modified document. This is the case even when a document is deleted. The two benefits are:
- Objects can be stored coherently and compactly in the main memory.
- Objects are preserved; isolated writing and reading transactions allow these objects to be accessed in parallel.
The system collects obsolete versions as garbage, recognizing them as no longer in use. Garbage collection is asynchronous and runs in parallel with other processes.
Mostly Memory/Durability
Database documents are stored in memory-mapped files. By default, these memory-mapped files are synced regularly but not instantly. This is often a good tradeoff between storage performance and durability. If this level of durability is too low for an application, the server can also sync all modifications to disk instantly. This gives full durability but comes with a performance penalty, as each data modification triggers a sync I/O operation.
Durability Configuration
Global Configuration
There are global configuration values for durability, which can be adjusted by specifying the following configuration options:
--database.wait-for-sync boolean
Default wait-for-sync value. Can be overwritten when creating a new
collection.
The default is false.
--database.force-sync-properties boolean
Force syncing of collection properties to disk after creating a collection
or updating its properties.
If turned off, no fsync will happen for the collection and database
properties stored in parameter.json files in the file system. Turning
off this option will speed up workloads that create and drop a lot of
collections (e.g. test suites).
The default is true.
--wal.sync-interval
The interval (in milliseconds) that ArangoDB will use to automatically
synchronize data in its write-ahead logs to disk. Automatic syncs will only
be performed for not-yet synchronized data, and only for operations that
have been executed without the waitForSync attribute.
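As an illustration, a server could be started with stricter durability settings like this (the database directory path and the values shown are only examples):
unix> arangod --database.wait-for-sync true --database.force-sync-properties true --wal.sync-interval 100 /var/lib/arangodb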
Per-collection configuration
You can also configure the durability behavior on a per-collection basis. Use the ArangoDB shell to change these properties.
collection.properties()
Returns an object containing all collection properties.
- waitForSync: If true, creating a document will only return
after the data has been synced to disk.
- journalSize: The size of the journal in bytes.
- isVolatile: If true then the collection data will be
kept in memory only and ArangoDB will not write or sync the data
to disk.
- keyOptions (optional): additional options for key generation. This is
a JSON object containing the following attributes (note: some of the
attributes are optional):
- type: the type of the key generator used for the collection.
- allowUserKeys: if set to true, it is allowed to supply own key values in the _key attribute of a document. If set to false, the key generator is solely responsible for generating keys, and supplying own key values in the _key attribute of documents is considered an error.
- increment: the increment value for the autoincrement key generator. Not used by other key generator types.
- offset: the initial offset value for the autoincrement key generator.
Not used by other key generator types.
- indexBuckets: the number of buckets into which indexes using a hash
table are split. The default is 16, and this number has to be a
power of 2 and less than or equal to 1024.
For very large collections one should increase this to avoid long pauses when the hash table has to be built initially or resized, since buckets are resized individually and can be built initially in parallel. For example, 64 might be a sensible value for a collection with 100 000 000 documents. Currently, only the edge index respects this value, but other index types might follow in future ArangoDB versions. Changes (see below) are applied when the collection is next loaded.
In a cluster setup, the result will also contain the following attributes:
- numberOfShards: the number of shards of the collection.
- shardKeys: contains the names of document attributes that are used to
determine the target shard for documents.
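Both of these attributes can only be set when the collection is created. A creation call might look like this (a sketch; the collection name and shard settings are illustrative):
arangosh> db._create("example", { numberOfShards: 4, shardKeys: [ "country" ] });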
collection.properties(properties)
Changes the collection properties. properties must be an object with one or more of the following attributes:
- waitForSync: If true, creating a document will only return
after the data has been synced to disk.
- journalSize: The size of the journal in bytes.
- indexBuckets: See above; changes are only applied when the
collection is next loaded.
Note: it is not possible to change the journal size after the journal or datafile has been created. Changing this parameter will only affect newly created journals. Also note that you cannot lower the journal size to less than the size of the largest document already stored in the collection.
Note: some other collection properties, such as type, isVolatile, or keyOptions cannot be changed once the collection is created.
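Such properties therefore need to be set when the collection is created, for example like this (a sketch; the collection name and key options are illustrative):
arangosh> db._create("example", { keyOptions: { type: "autoincrement", increment: 5, allowUserKeys: false } });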
Examples
Read all properties
arangosh> db.example.properties();
{
"doCompact" : true,
"journalSize" : 1048576,
"isSystem" : false,
"isVolatile" : false,
"waitForSync" : false,
"keyOptions" : {
"type" : "traditional",
"allowUserKeys" : true
},
"indexBuckets" : 8
}
Change a property
arangosh> db.example.properties({ waitForSync : true });
{
"doCompact" : true,
"journalSize" : 1048576,
"isSystem" : false,
"isVolatile" : false,
"waitForSync" : true,
"keyOptions" : {
"type" : "traditional",
"allowUserKeys" : true
},
"indexBuckets" : 8
}
Per-operation configuration
Many data-modification operations, as well as ArangoDB’s transactions, allow specifying a waitForSync attribute. When it is set, the operation’s data is guaranteed to have been synchronized to disk when the operation returns.
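For example, a single document write or a whole transaction can be forced to sync like this (a sketch; the collection name and document values are illustrative):
arangosh> db.example.save({ value: 1 }, { waitForSync: true });
arangosh> db._executeTransaction({
  collections: { write: "example" },
  waitForSync: true,
  action: function () {
    // all writes performed in this transaction are synced to disk
    // before _executeTransaction returns
    require("internal").db.example.save({ value: 2 });
  }
});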
Disk-Usage Configuration
The amount of disk space used by ArangoDB is determined by a few configuration options.
Global Configuration
The total amount of disk storage required by ArangoDB is determined by the size of the write-ahead logfiles plus the sizes of the collection journals and datafiles.
There are the following options for configuring the number and sizes of the write-ahead logfiles:
--wal.reserve-logfiles
The maximum number of reserve logfiles that ArangoDB will create in a
background process. Reserve logfiles are useful in the situation when an
operation needs to be written to a logfile but the reserve space in the
logfile is too low for storing the operation. In this case, a new logfile
needs to be created to store the operation. Creating new logfiles is
normally slow, so ArangoDB will try to pre-create logfiles in a background
process so there are always reserve logfiles when the active logfile gets
full. The number of reserve logfiles that ArangoDB keeps in the background
is configurable with this option.
--wal.historic-logfiles
The maximum number of historic logfiles that ArangoDB will keep after they
have been garbage-collected. If no replication is used, there is no need
to keep historic logfiles except for having a local changelog.
In a replication setup, the number of historic logfiles affects the amount
of data a slave can fetch from the master’s logs. The more historic
logfiles, the more historic data is available for a slave, which is useful
if the connection between master and slave is unstable or slow. Not having
enough historic logfiles available might lead to logfile data being deleted
on the master before a slave has fetched it.
--wal.logfile-size
Specifies the filesize (in bytes) for each write-ahead logfile. The logfile
size should be chosen so that each logfile can store a considerable amount of
documents. The bigger the logfile size, the longer it takes to fill up a
single logfile, which also influences the delay until the data in a logfile
will be garbage-collected and written to collection journals and datafiles.
It also affects how long logfile recovery will take at server start.
--wal.allow-oversize-entries
Whether or not it is allowed to store individual documents that are bigger
than would fit into a single logfile. Setting the option to false will make
such operations fail with an error. Setting the option to true will make
such operations succeed, but with a high potential performance impact.
The reason is that for each oversize operation, an individual oversize
logfile needs to be created which may also block other operations.
The option should be set to false if it is certain that documents will
always have a size smaller than a single logfile.
--wal.suppress-shape-information
Setting this variable to true will lead to no shape information being
written into the write-ahead logfiles for documents or edges. While this is
a good optimization for a single server to save memory (and disk space), it
will effectively disable using the write-ahead log as a reliable source
for replicating changes to other servers. A master server with this option
set to true will not be able to fully reproduce the structure of saved
documents after a collection has been deleted. In case a replication client
requests a document for which the collection has already been deleted, the
master will return an empty document. Note that this only affects replication
and not normal operation on the master.
Do not set this variable to true on a server that you plan to use as a
replication master.
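As an illustration, the WAL settings could be adjusted at server start like this (the values and the database directory path are only examples):
unix> arangod --wal.logfile-size 67108864 --wal.reserve-logfiles 5 --wal.historic-logfiles 10 /var/lib/arangodb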
When data gets copied from the write-ahead logfiles into the journals or datafiles of collections, files will be created on the collection level. How big these files are is determined by the following global configuration value:
--database.maximal-journal-size size
Maximal size of journal in bytes. Can be overwritten when creating a new
collection. Note that this also limits the maximal size of a single
document.
The default is 32MB.
Per-collection configuration
The journal size can also be adjusted on a per-collection level using the collection’s properties method.
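For example (a sketch; the collection name and size value are illustrative):
arangosh> db.example.properties({ journalSize: 8388608 });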