ArangoDB v2.8 reached End of Life (EOL) and is no longer supported.
This documentation is outdated. Please see the most recent version of the documentation instead.
Administrating ArangoDB
AppendOnly/MVCC
Instead of overwriting existing documents, ArangoDB creates a new version of each modified document. This is the case even when a document is deleted. The two benefits are:
- Objects can be stored coherently and compactly in the main memory.
- Objects are preserved; isolated writing and reading transactions allow these objects to be accessed in parallel.
The system collects obsolete versions as garbage, recognizing them as no longer in use. Garbage collection is asynchronous and runs in parallel with other processes.
Mostly Memory/Durability
Database documents are stored in memory-mapped files. By default, these memory-mapped files are synced regularly but not instantly. This is often a good tradeoff between storage performance and durability. If this level of durability is too low for an application, the server can also sync all modifications to disk instantly. This gives full durability but comes with a performance penalty, as each data modification triggers a sync I/O operation.
Durability Configuration
Global Configuration
There are global configuration values for durability, which can be adjusted by specifying the following configuration options:
--database.wait-for-sync boolean
Default wait-for-sync value. Can be overwritten when creating a new
collection.
The default is false.
--database.force-sync-properties boolean
Force syncing of collection properties to disk after creating a collection
or updating its properties.
If turned off, no fsync will happen for the collection and database
properties stored in parameter.json files in the file system. Turning
off this option will speed up workloads that create and drop a lot of
collections (e.g. test suites).
The default is true.
--wal.sync-interval
The interval (in milliseconds) that ArangoDB will use to automatically
synchronize data in its write-ahead logs to disk. Automatic syncs will only
be performed for not-yet synchronized data, and only for operations that
have been executed without the waitForSync attribute.
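As an illustration, a server could be started with stricter durability settings like this (the database directory path and the values shown are only examples):
unix> arangod --database.wait-for-sync true --database.force-sync-properties true --wal.sync-interval 100 /var/lib/arangodb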
Per-collection configuration
You can also configure the durability behavior on a per-collection basis. Use the ArangoDB shell to change these properties.
collection.properties()
Returns an object containing all collection properties.
- waitForSync: If true, creating a document will only return
after the data has been synced to disk.
- journalSize: The size of the journal in bytes.
- isVolatile: If true then the collection data will be
kept in memory only and ArangoDB will not write or sync the data
to disk.
- keyOptions (optional): additional options for key generation. This is
a JSON object containing the following attributes (note: some of the
attributes are optional):
- type: the type of the key generator used for the collection.
- allowUserKeys: if set to true, it is allowed to supply own key values in the _key attribute of a document. If set to false, the key generator is solely responsible for generating keys, and supplying own key values in the _key attribute of documents is considered an error.
- increment: the increment value for the autoincrement key generator. Not used by other key generator types.
- offset: the initial offset value for the autoincrement key generator.
Not used by other key generator types.
- indexBuckets: the number of buckets into which indexes using a hash
table are split. The default is 16, and this number has to be a
power of 2 and less than or equal to 1024.
For very large collections one should increase this to avoid long pauses when the hash table has to be built initially or resized, since buckets are resized individually and can be built initially in parallel. For example, 64 might be a sensible value for a collection with 100 000 000 documents. Currently, only the edge index respects this value, but other index types might follow in future ArangoDB versions. Changes (see below) are applied when the collection is next loaded.
In a cluster setup, the result will also contain the following attributes:
- numberOfShards: the number of shards of the collection.
- shardKeys: contains the names of document attributes that are used to
determine the target shard for documents.
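Both of these attributes can only be set when the collection is created. A creation call might look like this (a sketch; the collection name and shard settings are illustrative):
arangosh> db._create("example", { numberOfShards: 4, shardKeys: [ "country" ] });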
collection.properties(properties)
Changes the collection properties. properties must be an object with one or more of the following attributes:
- waitForSync: If true, creating a document will only return
after the data has been synced to disk.
- journalSize: The size of the journal in bytes.
- indexBuckets: See above; changes are only applied when the
collection is next loaded.
Note: it is not possible to change the journal size after the journal or datafile has been created. Changing this parameter will only affect newly created journals. Also note that you cannot lower the journal size to less than the size of the largest document already stored in the collection.
Note: some other collection properties, such as type, isVolatile, or keyOptions cannot be changed once the collection is created.
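Such properties therefore need to be set when the collection is created, for example like this (a sketch; the collection name and key options are illustrative):
arangosh> db._create("example", { keyOptions: { type: "autoincrement", increment: 5, allowUserKeys: false } });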
Examples
Read all properties
arangosh> db.example.properties();
{
"doCompact" : true,
"journalSize" : 1048576,
"isSystem" : false,
"isVolatile" : false,
"waitForSync" : false,
"keyOptions" : {
"type" : "traditional",
"allowUserKeys" : true
},
"indexBuckets" : 8
}
Change a property
arangosh> db.example.properties({ waitForSync : true });
{
"doCompact" : true,
"journalSize" : 1048576,
"isSystem" : false,
"isVolatile" : false,
"waitForSync" : true,
"keyOptions" : {
"type" : "traditional",
"allowUserKeys" : true
},
"indexBuckets" : 8
}
Per-operation configuration
Many data-modification operations, as well as ArangoDB’s transactions, allow specifying a waitForSync attribute. When it is set, the operation’s data is guaranteed to have been synchronized to disk when the operation returns.
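For example, a single document write or a whole transaction can be forced to sync like this (a sketch; the collection name and document values are illustrative):
arangosh> db.example.save({ value: 1 }, { waitForSync: true });
arangosh> db._executeTransaction({
  collections: { write: "example" },
  waitForSync: true,
  action: function () {
    // all writes performed in this transaction are synced to disk
    // before _executeTransaction returns
    require("internal").db.example.save({ value: 2 });
  }
});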
Disk-Usage Configuration
The amount of disk space used by ArangoDB is determined by a few configuration options.
Global Configuration
The total amount of disk storage required by ArangoDB is determined by the size of the write-ahead logfiles plus the sizes of the collection journals and datafiles.
There are the following options for configuring the number and sizes of the write-ahead logfiles:
--wal.reserve-logfiles
The maximum number of reserve logfiles that ArangoDB will create in a
background process. Reserve logfiles are useful in the situation when an
operation needs to be written to a logfile but the reserve space in the
logfile is too low for storing the operation. In this case, a new logfile
needs to be created to store the operation. Creating new logfiles is
normally slow, so ArangoDB will try to pre-create logfiles in a background
process so there are always reserve logfiles when the active logfile gets
full. The number of reserve logfiles that ArangoDB keeps in the background
is configurable with this option.
--wal.historic-logfiles
The maximum number of historic logfiles that ArangoDB will keep after they
have been garbage-collected. If no replication is used, there is no need
to keep historic logfiles except for having a local changelog.
In a replication setup, the number of historic logfiles affects the amount
of data a slave can fetch from the master’s logs. The more historic
logfiles, the more historic data is available for a slave, which is useful
if the connection between master and slave is unstable or slow. Not having
enough historic logfiles available might lead to logfile data being deleted
on the master before a slave has fetched it.
--wal.logfile-size
Specifies the filesize (in bytes) for each write-ahead logfile. The logfile
size should be chosen so that each logfile can store a considerable amount of
documents. The bigger the logfile size, the longer it takes to fill up a
single logfile, which also influences the delay until the data in a logfile
will be garbage-collected and written to collection journals and datafiles.
It also affects how long logfile recovery will take at server start.
--wal.allow-oversize-entries
Whether or not it is allowed to store individual documents that are bigger
than would fit into a single logfile. Setting the option to false will make
such operations fail with an error. Setting the option to true will make
such operations succeed, but with a high potential performance impact.
The reason is that for each oversize operation, an individual oversize
logfile needs to be created which may also block other operations.
The option should be set to false if it is certain that documents will
always have a size smaller than a single logfile.
--wal.suppress-shape-information
Setting this variable to true will lead to no shape information being
written into the write-ahead logfiles for documents or edges. While this is
a good optimization for a single server to save memory (and disk space), it
will effectively disable using the write-ahead log as a reliable source
for replicating changes to other servers. A master server with this option
set to true will not be able to fully reproduce the structure of saved
documents after a collection has been deleted. In case a replication client
requests a document for which the collection has already been deleted, the
master will return an empty document. Note that this only affects replication
and not normal operation on the master.
Do not set this variable to true on a server that you plan to use as a
replication master.
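As an illustration, the WAL settings could be adjusted at server start like this (the values and the database directory path are only examples):
unix> arangod --wal.logfile-size 67108864 --wal.reserve-logfiles 5 --wal.historic-logfiles 10 /var/lib/arangodb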
When data gets copied from the write-ahead logfiles into the journals or datafiles of collections, files will be created on the collection level. How big these files are is determined by the following global configuration value:
--database.maximal-journal-size size
Maximal size of journal in bytes. Can be overwritten when creating a new
collection. Note that this also limits the maximal size of a single
document.
The default is 32MB.
Per-collection configuration
The journal size can also be adjusted on a per-collection level using the collection’s properties method.
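For example (a sketch; the collection name and size value are illustrative):
arangosh> db.example.properties({ journalSize: 8388608 });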