NAME
cdb - Constant DataBase library
SYNOPSIS
#include <cdb.h>
cc ... -lcdb
DESCRIPTION
cdb is a library to create and access Constant DataBase files. A cdb file stores (key,value) pairs and is used to quickly find a value based on a given key. Cdb files are create-once files: once created, a file cannot be updated, only recreated from scratch, which is why the database is called constant. The cdb file format is optimized for quick access and is described in the cdb(5) manpage. This manual page corresponds to version 0.78 of the tinycdb package.
QUERY MODE
There are two query modes available. The first uses a structure that represents a cdb database, much like the FILE structure in the stdio library; the second works with a plain filedescriptor. The first mode is more sophisticated and flexible, and usually somewhat faster. It uses mmap(2) internally. This mode may look more "natural" or object-oriented compared to the second one.
unsigned cdb_unpack(buf) const unsigned char buf[4];
helper routine to convert a 32-bit integer from
internal representation to machine format. May be used to handle application
integers in a portable way. There is no error return.
Query Mode 1
All query operations in the first mode deal with a common data structure, struct cdb, associated with an open file descriptor. This structure is opaque to the application.
int cdb_init(cdbp, fd) struct cdb * cdbp; int fd;
initializes the structure pointed to by cdbp
and associates it with the open file descriptor fd. Allocating memory
for the structure itself (if needed) and opening the file are the
responsibility of the application. File fd should be opened at least
read-only, and should be seekable. Returns 0 on success or a negative value
on error.
void cdb_free(cdbp) struct cdb * cdbp;
frees internal resources held by the structure.
Note that this routine does not close the file.
int cdb_fileno(cdbp) const struct cdb * cdbp;
returns the filedescriptor associated with cdbp (the one
that was passed to cdb_init()).
int cdb_read(cdbp, buf, len, pos) int cdb_readdata(cdbp, buf, len, pos) int cdb_readkey(cdbp, buf, len, pos) const struct cdb * cdbp; void * buf; unsigned len; unsigned pos;
reads len bytes from the cdb file, starting at
position pos, and places the result in buf. This
routine may be used to get the actual value found by cdb_find() or other
routines that return the position and length of data. Returns 0 on success or
a negative value on error. Routines cdb_readdata() and
cdb_readkey() are shorthands that read the current (after e.g.
cdb_find()) data and key respectively, using cdb_read().
const void * cdb_get(cdbp, len, pos) const void * cdb_getdata(cdbp) const void * cdb_getkey(cdbp) const struct cdb * cdbp; unsigned len; unsigned pos;
Internally, the cdb library uses a memory-mapped
region to access the on-disk database. cdb_get() gives access to that
internal memory in a way similar to cdb_read() but without extra
copying and buffer allocation. Returns a pointer to the actual data on success
or NULL on error (the position points outside of the database). Routines
cdb_getdata() and cdb_getkey() are shorthands that access the current
(after e.g. cdb_find()) data and key respectively, using
cdb_get().
int cdb_find(cdbp, key, klen) unsigned cdb_datapos(cdbp) unsigned cdb_datalen(cdbp) unsigned cdb_keypos(cdbp) unsigned cdb_keylen(cdbp) struct cdb * cdbp; const void * key; unsigned klen;
attempts to find the key given by the
(key,klen) parameters. If the key exists in the database, the routine
returns 1 and stores the position and length of the value associated with this
key in internal fields of the cdbp structure, where they are accessible through
the cdb_datapos(cdbp) and cdb_datalen(cdbp) routines.
If the key is not in the database, cdb_find() returns 0. On error, a negative
value is returned. Data pointers (available via cdb_datapos() and
cdb_datalen()) are updated only after a successful search. Note
that cdb_find() can look up only the first record
with a given key.
int cdb_findinit(cdbfp, cdbp, key, klen) int cdb_findnext(cdbfp) struct cdb_find * cdbfp; const struct cdb * cdbp; const void * key; unsigned klen;
sequential-find routines that use a separate
structure. It is possible to have more than one record with the same key in a
database, and these routines allow enumerating all of them.
cdb_findinit() initializes the search structure pointed to by cdbfp.
It returns a negative value on error or a non-negative value on success.
cdb_findnext() attempts to find the next (first when called right after
cdb_findinit()) matching key, setting value position and length in
cdbfp structure. It returns a positive value if the given key was found,
0 if there are no more such keys, or a negative value on error. To access the
value position and length after a successful call to cdb_findnext() (when it
returned a positive result), use the cdb_datapos(cdbp) and
cdb_datalen(cdbp) routines. It is an error to continue using
cdb_findnext() after it has returned 0 or an error condition
(cdb_findinit() should be called again). Current data pointers
(available via cdb_datapos() and cdb_datalen()) are updated
only on a successful search.
void cdb_seqinit(cptr, cdbp) int cdb_seqnext(cptr, cdbp) unsigned * cptr; struct cdb * cdbp;
sequential enumeration of all records stored
in a cdb file. cdb_seqinit() initializes the position variable pointed to by
cptr to point before the first record in the cdb file. cdb_seqnext()
updates data pointers in cdbp to point to the next record and updates
cptr, returning a positive value on success, 0 on end of data,
and a negative value on error. After a successful operation the current record
is available using cdb_datapos(cdbp) and
cdb_datalen(cdbp) (for the data) and
cdb_keypos(cdbp) and cdb_keylen(cdbp) (for the key
of the record). Data pointers are updated only after a successful
operation.
Query Mode 2
In this mode, one needs to open a cdb file using one of the standard system calls (such as open(2)) to obtain a filedescriptor, and then pass that filedescriptor to cdb routines. Available methods to query a cdb database using only a filedescriptor include:
int cdb_seek(fd, key, klen, dlenp) int fd; const void * key; unsigned klen; unsigned * dlenp;
searches a cdb database (as pointed to by the
fd filedescriptor) for the key given by (key, klen). If the key is found, it
positions the file pointer at the start of the data associated with that key, so
that the next read operation from this filedescriptor will read that value, and
stores the length of the value, in bytes, in the variable pointed to by dlenp.
Returns a positive value if the operation was successful, 0 if the key was not
found, or a negative value on error. To read the data from a cdb file, the
cdb_bread() routine below can be used.
int cdb_bread(fd, buf, len) int fd; void * buf; int len;
reads data from a file (as pointed to by the
fd filedescriptor) and places len bytes from this file in the
buffer pointed to by buf. Returns 0 if exactly len bytes were
read, or a negative value in case of error or end-of-file. This routine
ignores interrupt errors (EINTR). Sets the errno variable to EIO in case of
an end-of-file condition (when fewer than len bytes are available to
read).
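The behaviour described for cdb_bread() can be approximated with plain read(2). The sketch below is not the library's implementation; read_exact is a hypothetical name, and the code only assumes the semantics stated above (retry on EINTR, EIO on a short read).

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper, not part of the cdb library: read exactly len
 * bytes from fd into buf.  Returns 0 on success, -1 on error or on
 * premature end-of-file (errno is set to EIO in the latter case). */
int read_exact(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* real I/O error, errno set by read() */
        }
        if (n == 0) {
            errno = EIO;        /* fewer than len bytes were available */
            return -1;
        }
        p += n;                 /* advance past the bytes already read */
        len -= (size_t)n;
    }
    return 0;
}
```

A caller would typically position the file with cdb_seek() first and then pass the reported value length as len.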
Notes
Note that the value of any given key may be updated in place by another value of the same size, by writing to the file at the position found by cdb_find() or cdb_seek(). However, one should be very careful when doing so, since the write operation may not succeed in case of e.g. power failure, leaving corrupted data behind. When a database is (re)created, one can guarantee that no incorrect data will be written to it, but no such guarantee exists for an in-place update. Note also that it is not possible to update any key or to change the length of a value.
CREATING MODE
A cdb database file should usually be created in two steps: first, a temporary file is created and written to disk, and second, that temporary file is renamed to its permanent place. The Unix rename(2) call is an atomic operation: it removes the destination file, if any, and renames the other file in one step. This guarantees that readers will not see an incomplete database. To prevent multiple simultaneous updates, locking may also be used.
int cdb_make_start(cdbmp, fd) struct cdb_make * cdbmp; int fd;
initializes the structure used to create a database.
File fd should be opened read-write and should be seekable. Returns 0
on success or a negative value on error.
int cdb_make_add(cdbmp, key, klen, val, vlen) struct cdb_make * cdbmp; const void * key, *val; unsigned klen, vlen;
adds a record with key (key,klen)
and value (val,vlen) to the database. Returns 0 on success or a
negative value on error. Note that this routine does not check whether the
given key already exists, but cdb_find() will not see a second record with the
same key. It is not possible to continue building a database if
cdb_make_add() returned an error indicator.
int cdb_make_finish(cdbmp) struct cdb_make * cdbmp;
finalizes the database file, constructing all
needed indexes, and frees memory structures. It does not close the
filedescriptor. Returns 0 on success or a negative value on error.
int cdb_make_exists(cdbmp, key, klen) struct cdb_make * cdbmp; const void * key; unsigned klen;
This routine attempts to find the key given by
(key, klen) in a not-yet-complete database. It may
significantly slow down the whole process, and currently it flushes the internal
buffer to disk on every call with a key whose hash value already exists in the db.
Returns 0 if such a key does not exist, 1 if it does, or a negative value on error.
Note that the database file should be opened read-write (not write-only) to use
this routine. If cdb_make_exists() returned an error, it may not be
possible to continue constructing the database.
int cdb_make_find(cdbmp, key, klen, mode) struct cdb_make * cdbmp; const void * key; unsigned klen; int mode;
This routine attempts to find the key given by
(key, klen) in the database being created. If the given key
already exists, the action specified by mode is performed:
- CDB_FIND
- checks whether the given record is already in the database.
- CDB_FIND_REMOVE
- removes all matching records by re-writing the database file accordingly.
- CDB_FIND_FILL0
- fills all matching records with zeros and removes them from the index so that the records in question will not be findable with cdb_find(). This is faster than CDB_FIND_REMOVE, but leaves zero "gaps" in the database. The last inserted records, if matched, are always removed.
int cdb_make_put(cdbmp, key, klen, val, vlen, mode) struct cdb_make * cdbmp; const void * key, *val; unsigned klen, vlen; int mode;
This routine is roughly a combination of the
cdb_make_exists() and cdb_make_add() routines. The mode
argument controls how repeated (already existing) keys are treated:
- CDB_PUT_ADD
- no duplicate checking is performed. This mode behaves the same as the cdb_make_add() routine.
- CDB_PUT_REPLACE
- If the key already exists, it is removed from the database before the new key,value pair is added. This requires moving data in the file, and can be quite slow if the file is large. All matching old records are removed this way. This is the same as calling cdb_make_find() with the CDB_FIND_REMOVE mode argument followed by calling cdb_make_add().
- CDB_PUT_REPLACE0
- If the key already exists and it is not the last record in the file, the old record is zeroed out before the new key,value pair is added. This is a lot faster than CDB_PUT_REPLACE, but some extra data will still be present in the file. That data (the old record) will not be accessible by normal searches, but will appear in sequential database traversal. This is the same as calling cdb_make_find() with the CDB_FIND_FILL0 mode argument followed by cdb_make_add().
- CDB_PUT_INSERT
- adds the key,value pair only if such a key does not exist in the database. Note that since a query (see query mode above) will find the first added record, this mode is somewhat useless (but it does reduce database size in case of repeated keys). This is the same as calling cdb_make_exists(), followed by cdb_make_add() if the key was not found.
- CDB_PUT_WARN
- adds the key,value pair unconditionally, but also checks whether this key already exists. This is equivalent to calling cdb_make_exists() to check for the given key, unconditionally followed by cdb_make_add().
void cdb_pack(num, buf) unsigned num; unsigned char buf[4];
helper routine that is used internally to convert the
machine integer num to the internal form stored in the datafile. A 32-bit
integer is stored in 4 bytes in little-endian byte order (least significant
byte first), as the cdb file format requires. May be used to handle
application data. There is no error return.
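The conversion performed by cdb_pack() and cdb_unpack() can be illustrated with a pair of stand-alone functions. This is a sketch, not the library source: pack_u32 and unpack_u32 are hypothetical names, and the least-significant-byte-first layout is an assumption taken from the cdb file format (see cdb(5)).

```c
#include <stdint.h>

/* Hypothetical stand-in for cdb_pack(): store num in buf using the
 * assumed on-disk layout, least significant byte first. */
void pack_u32(uint32_t num, unsigned char buf[4])
{
    buf[0] = (unsigned char)(num & 0xff);
    buf[1] = (unsigned char)((num >> 8) & 0xff);
    buf[2] = (unsigned char)((num >> 16) & 0xff);
    buf[3] = (unsigned char)((num >> 24) & 0xff);
}

/* Hypothetical stand-in for cdb_unpack(): rebuild the integer from the
 * four bytes, independent of the byte order of the host machine. */
uint32_t unpack_u32(const unsigned char buf[4])
{
    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}
```

Because both functions work byte by byte with shifts, a value packed on one machine unpacks to the same number on a machine with a different native byte order.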
unsigned cdb_hash(buf, len) const void * buf; unsigned len;
helper routine that calculates the cdb hash value
of the given bytes. The CDB hash function is
hash[n] = (hash[n-1] + (hash[n-1] << 5)) ^ buf[n]
starting with
hash[-1] = 5381
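The recurrence above translates directly into a short loop. The following is a minimal sketch rather than the library source; hash_bytes is a hypothetical name, and a 32-bit unsigned accumulator is assumed so that the result wraps the same way on every platform.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical re-implementation of the hash recurrence shown above. */
uint32_t hash_bytes(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint32_t h = 5381;                  /* hash[-1] */
    size_t i;

    for (i = 0; i < len; i++)
        h = (h + (h << 5)) ^ p[i];      /* hash[n] from hash[n-1] */
    return h;
}
```

For an empty input the loop body never runs and the function returns the starting value 5381.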
ERRORS
The cdb library may set errno to the following values on error:
- EPROTO
- database file is corrupted in some way
- EINVAL
- the same as EPROTO above if the system lacks the EPROTO constant
- EINVAL
- flag argument for cdb_make_put() is invalid
- EEXIST
- flag argument for cdb_make_put() is CDB_PUT_INSERT, and the key already exists
- ENOMEM
- not enough memory to complete the operation (cdb_make_finish() and cdb_make_add())
- EIO
- set by cdb_bread() and cdb_seek() if a cdb file is shorter than expected or corrupted in some other way.
EXAMPLES
Note: in all examples below, error checking is not shown for brevity.
Query Mode
    /* needs <cdb.h>, <fcntl.h>, <stdio.h>, <stdlib.h> and <unistd.h> */
    int fd;
    struct cdb cdb;
    struct cdb_find cdbf;
    char *key, *data;
    unsigned keylen, datalen;
    unsigned pos;
    int n;

    /* opening the database */
    fd = open(filename, O_RDONLY);
    cdb_init(&cdb, fd);
    /* initialize key and keylen here */

    /* single-record search. */
    if (cdb_find(&cdb, key, keylen) > 0) {
        datalen = cdb_datalen(&cdb);
        data = malloc(datalen + 1);
        cdb_read(&cdb, data, datalen, cdb_datapos(&cdb));
        data[datalen] = '\0';
        printf("key=%s data=%s\n", key, data);
        free(data);
    }
    else
        printf("key=%s not found\n", key);

    /* multiple record search */
    cdb_findinit(&cdbf, &cdb, key, keylen);
    n = 0;
    while (cdb_findnext(&cdbf) > 0) {
        datalen = cdb_datalen(&cdb);
        data = malloc(datalen + 1);
        cdb_read(&cdb, data, datalen, cdb_datapos(&cdb));
        data[datalen] = '\0';
        printf("key=%s data=%s\n", key, data);
        free(data);
        ++n;
    }
    printf("key=%s %d records found\n", key, n);

    /* sequential database access (note: reuses key for each record) */
    cdb_seqinit(&pos, &cdb);
    n = 0;
    while (cdb_seqnext(&pos, &cdb) > 0) {
        keylen = cdb_keylen(&cdb);
        key = malloc(keylen + 1);
        cdb_read(&cdb, key, keylen, cdb_keypos(&cdb));
        key[keylen] = '\0';
        datalen = cdb_datalen(&cdb);
        data = malloc(datalen + 1);
        cdb_read(&cdb, data, datalen, cdb_datapos(&cdb));
        data[datalen] = '\0';
        ++n;
        printf("record %d: key=%s data=%s\n", n, key, data);
        free(data);
        free(key);
    }
    printf("total records found: %d\n", n);

    /* close the database */
    cdb_free(&cdb);
    close(fd);

    /* simplistic query mode */
    /* initialize key and keylen here */
    fd = open(filename, O_RDONLY);
    if (cdb_seek(fd, key, keylen, &datalen) > 0) {
        data = malloc(datalen + 1);
        cdb_bread(fd, data, datalen);
        data[datalen] = '\0';
        printf("key=%s data=%s\n", key, data);
        free(data);
    }
    else
        printf("key=%s not found\n", key);
    close(fd);
Create Mode
    int fd;
    struct cdb_make cdbm;
    char *key, *data;
    unsigned keylen, datalen;

    /* initialize the database */
    fd = open(filename, O_RDWR|O_CREAT|O_TRUNC, 0644);
    cdb_make_start(&cdbm, fd);

    while (have_more_data()) {
        /* initialize key and data */
        if (cdb_make_exists(&cdbm, key, keylen) == 0)
            cdb_make_add(&cdbm, key, keylen, data, datalen);
        /* or use cdb_make_put() with appropriate flags */
    }

    /* finalize and close the database */
    cdb_make_finish(&cdbm);
    close(fd);
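The create example above opens the final filename directly. The two-step scheme recommended under CREATING MODE (write a temporary file, then rename it into place) could look roughly like the sketch below. It uses only POSIX calls so that it stands alone; publish_file and both path parameters are hypothetical, and the write() call marks the place where the cdb_make_start() ... cdb_make_finish() sequence would go in a real program.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to tmppath, then rename it over
 * path.  Returns 0 on success, -1 on error (the temporary file is
 * removed on failure). */
int publish_file(const char *tmppath, const char *path,
                 const void *data, size_t len)
{
    int fd = open(tmppath, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* The database would be built here; fsync() flushes it to disk
     * before it becomes visible under its permanent name. */
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmppath);
        return -1;
    }
    /* rename(2) replaces any existing destination in one step. */
    if (close(fd) != 0 || rename(tmppath, path) != 0) {
        unlink(tmppath);
        return -1;
    }
    return 0;
}
```

The temporary file should live on the same filesystem as the destination, since rename(2) cannot move a file across filesystems.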
SEE ALSO
cdb(5), cdb(1), dbm(3), db(3), open(2).
AUTHOR
The tinycdb package was written by Michael Tokarev <[email protected]>. It is based on the ideas of, and shares its file format with, the original cdb library by Dan Bernstein.
LICENSE
Public domain.
Jun 2006