# litedb

**Repository Path**: jungle/litedb

## Basic Information

- **Project Name**: litedb
- **Description**: No description available
- **Primary Language**: C
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-03-25
- **Last Updated**: 2026-04-05

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# litedb

Light-weight in-memory table library (C, C99) — example project for small embeddable table storage.

## Features

- Generated table descriptors from schema (`gen/`)
- In-memory table storage with simple structures
- Create-or-update and delete operations by key fields
- Field-level partial updates via a bitmask (`itemMap`)
- Application-facing query helpers for filtering, range scans, sorting, pagination, lightweight aggregation, and selected-column projection
- Optional bulk-load preallocation via `db_reserve(tableId, expected_rows)`
- WAL-style append log replay for lightweight crash recovery between saves
- JSON export/import helpers for easier integration and snapshot inspection
- Schema migration / versioned restore helpers for long-lived embedded products via `LITEDB_TABLE_VERSIONED(...)` and `db_schema_migration_register()`
- Subscription callbacks for add/update/delete events
- Async / cancellable subscriptions via `LITEDB_SUB_ASYNC`, `db_unsubscribe()`, and `db_async_wait_idle()`
- Readable error status API via `litedb_error_t`, `db_last_error()`, and `db_strerror()`
- Friendlier schema-definition helpers via `LITEDB_FIELD(...)`, `LITEDB_FIELD_KEY(...)`, `LITEDB_FIELD_DEFAULT(...)`, `LITEDB_FIELD_RANGE(...)`, `LITEDB_FIELD_CONSTRAINED(...)`, `LITEDB_TABLE(...)`, and `LITEDB_TABLE_VERSIONED(...)`
- Field constraints for `NOT NULL`, `UNIQUE`, and inclusive range checks on hand-written schemas
- Rich field types including DATE / MAC / IPv4 / IPv6 / INT64 / UINT32 / FLOAT / DOUBLE / BLOB / UUID / DECIMAL
- Per-table locking plus bucket-stripe isolation for better multi-threaded throughput across independent tables and keys
- In-memory snapshots with direct restore and true oplog-based replay restore

## Quick Usage

1. Include generated headers from `gen/` and public headers from `src/`.
2. Initialize runtime: `db_init()` (generated tables are registered automatically).
3. Optional for bulk load: `db_reserve(tableId, expected_rows)` to pre-size hash buckets and node storage.
4. Subscribe to changes: `db_subscribe(tableId, callback)` (events: 1=ADD, 2=UPDATE, 3=DELETE).
5. Insert/update: `db_createorset(tableId, itemMap, &row)`.
6. Delete by key: `db_delete(tableId, &keyRow)`.
7. Shutdown: `db_shutdown()` to free runtime storage.

API highlights (see `src/litedb.h`):

- `int db_init(void);`
- `int db_shutdown(void);`
- `int db_table_register(table_desc_t* desc);`
- `litedb_error_t db_last_error(void);`
- `const char* db_strerror(litedb_error_t code);`
- `int db_reserve(int32_t tableId, size_t expected_rows);`
- `int db_schema_migration_register(const litedb_schema_migration_t* migrations, size_t n);`
- `int db_createorset(int32_t tableId, uint32_t itemMap, void* data);`
- `int db_delete(int32_t tableId, void* data);`
- `int db_query_range(int32_t tableId, uint32_t itemId, const void* min_value, const void* max_value, litedb_iter_cb_t cb, void* user_data);`
- `int db_query_select(int32_t tableId, void* key, uint32_t itemMap, void* out, size_t out_size);`
- `int db_query_page(int32_t tableId, litedb_predicate_t pred, void* pred_user_data, uint32_t sort_itemId, litedb_sort_order_t sort_order, size_t offset, size_t limit, litedb_iter_cb_t cb, void* cb_user_data, size_t* out_total);`
- `int db_query_page_select(int32_t tableId, litedb_predicate_t pred, void* pred_user_data, uint32_t sort_itemId, litedb_sort_order_t sort_order, size_t offset, size_t limit, uint32_t itemMap, litedb_iter_cb_t cb, void* cb_user_data, size_t* out_total);`
- `int db_aggregate_number(int32_t tableId, uint32_t itemId, litedb_aggregate_op_t op, litedb_predicate_t pred, void* pred_user_data, double* out_result);`
- `int db_wal_enable(const char* file_path);`
- `int db_export_json(int32_t tableId, const char* file_path);`
- `int db_import_json(int32_t tableId, const char* file_path);`
- `int db_subscribe(int32_t tableId, litedb_cb_t cb);`

`itemMap` is a bitmask where bit (field_id-1) selects the field to update; use `0xFFFFFFFFu` to replace the whole row.
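
A couple of small helpers can keep the bit arithmetic readable. This is a minimal sketch. `ITEM_BIT`, `ITEM_ALL`, `item_map_has()` and the `NOTE_*` IDs are hypothetical names and are not part of `litedb.h`. The IDs mirror the hand-written `note_t` schema shown later in this README.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: field_id N maps to bit N-1 (valid IDs are 1..32). */
#define ITEM_BIT(field_id) (UINT32_C(1) << ((field_id) - 1))

/* All bits set: documented as a full-row replacement for db_createorset(). */
#define ITEM_ALL 0xFFFFFFFFu

/* Field IDs of the hand-written note_t schema (id=1, title=2, priority=3). */
enum { NOTE_ID = 1, NOTE_TITLE = 2, NOTE_PRIORITY = 3 };

/* Returns nonzero when itemMap selects the given field. */
static int item_map_has(uint32_t itemMap, uint32_t field_id) {
    assert(field_id >= 1 && field_id <= 32);
    return (itemMap & ITEM_BIT(field_id)) != 0;
}
```

Under these assumptions, `db_createorset(tableId, ITEM_BIT(NOTE_PRIORITY), &row)` would overwrite only `priority` on the row whose key matches `row.id`.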

API calls use a lightweight return convention: `0` on success, `-1` on failure, or a count where the call returns one. After a failure, `db_last_error()` reports the specific `litedb_error_t` reason, such as invalid arguments, a missing table or row, a callback failure, a constraint violation, or an I/O error. `db_strerror()` turns that code into a human-readable message.

For hand-written schemas, the helper macros reduce descriptor boilerplate:

```c
typedef struct {
    int32_t id;
    char title[32];
    int32_t priority;
} note_t;

static field_desc_t g_note_fields[] = {
    LITEDB_FIELD_KEY(note_t, id, 1, FTYPE_INT32),
    LITEDB_FIELD(note_t, title, 2, FTYPE_STRING),
    LITEDB_FIELD_DEFAULT(note_t, priority, 3, FTYPE_INT32, "5"),
};

static table_desc_t g_note_table =
    LITEDB_TABLE(note_t, 0x00090001, "Notes", g_note_fields);
```

When a table schema needs an explicit persisted version, use `LITEDB_TABLE_VERSIONED(...)` and register a migration callback for old snapshots:

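```c
#include <string.h>  /* memset, memcpy */

static table_desc_t g_note_table_v2 =
    LITEDB_TABLE_VERSIONED(note_t, 0x00090001, "Notes", 2, g_note_fields);

/* Upgrades a v1 row image to v2. This only works when v2 appends fields to
 * the end of the v1 layout: the old bytes are copied and new fields start zeroed. */
static int migrate_note_v1_to_v2(const void* old_row, size_t old_size,
                                 uint32_t from_version, void* new_row, size_t new_size,
                                 uint32_t to_version, void* user_data) {
    (void)from_version; (void)to_version; (void)user_data;
    if (old_size > new_size) return -1;   /* v2 must not be smaller than v1 */
    memset(new_row, 0, new_size);
    memcpy(new_row, old_row, old_size);
    return 0;
}
```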

Constraint helpers can be added directly to the descriptor:

```c
typedef struct {
    int32_t id;
    char code[16];     /* size chosen for illustration */
    int32_t score;
} rule_t;

static field_desc_t g_rule_fields[] = {
    LITEDB_FIELD_KEY(rule_t, id, 1, FTYPE_INT32),
    LITEDB_FIELD_CONSTRAINED(rule_t, code, 2, FTYPE_STRING, NULL,
                             LITEDB_CONSTRAINT_NOT_NULL | LITEDB_CONSTRAINT_UNIQUE,
                             NULL, NULL),
    LITEDB_FIELD_RANGE(rule_t, score, 3, FTYPE_INT32, "0", "100"),
};
```

When a constraint fails during create or update, the call returns `-1` and `db_last_error()` reports `LITEDB_ERR_CONSTRAINT_VIOLATION`.

For large row structs, projection reads avoid copying unused fields. `db_query_select()` and `db_query_page_select()` always preserve key fields and then copy only the columns requested by `itemMap`.
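
The projection rule can be sketched with a simplified descriptor table: key fields are always copied, plus any columns selected by `itemMap`. The `proj_field_t` type and `project_row()` are hypothetical stand-ins and do not reflect litedb's `field_desc_t` internals. Zeroing the unselected columns is also an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified field descriptor (not litedb's field_desc_t). */
typedef struct {
    uint32_t id;      /* 1-based field ID, maps to bit id-1 of itemMap */
    size_t offset;    /* offsetof(row, field) */
    size_t size;      /* sizeof(field) */
    int is_key;       /* nonzero for key fields */
} proj_field_t;

typedef struct {
    int32_t id;
    char title[32];
    int32_t priority;
} note_t;

static const proj_field_t k_note_fields[] = {
    {1, offsetof(note_t, id), sizeof(int32_t), 1},
    {2, offsetof(note_t, title), sizeof(((note_t*)0)->title), 0},
    {3, offsetof(note_t, priority), sizeof(int32_t), 0},
};

/* Copy key fields plus the fields selected by itemMap from `row` to `out`;
 * every other byte of `out` is left zeroed. */
static void project_row(const proj_field_t* fields, size_t n, uint32_t itemMap,
                        const void* row, void* out, size_t row_size) {
    const unsigned char* src = row;
    unsigned char* dst = out;
    memset(dst, 0, row_size);
    for (size_t i = 0; i < n; i++) {
        const proj_field_t* f = &fields[i];
        int selected = f->id >= 1 && f->id <= 32 &&
                       (itemMap & (UINT32_C(1) << (f->id - 1))) != 0;
        if (f->is_key || selected)
            memcpy(dst + f->offset, src + f->offset, f->size);
    }
}
```

Only the selected byte ranges are touched. For a row with a large `title` buffer, requesting just `priority` therefore skips that copy entirely.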

For lightweight durability between checkpoints, call `db_wal_enable("data.bin")`. Subsequent successful creates, updates, and deletes are appended to `data.bin.wal`, and `db_load("data.bin")` automatically replays that append log after restoring the last full snapshot.
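
The append-then-replay idea can be illustrated with fixed-size records. This is a minimal sketch that assumes a hypothetical `wal_rec_t` record and a small integer key/value table; litedb's real `.wal` format is not described here. Note that `fflush()` only hands data to the OS. A log that must survive power loss would also sync the file to disk.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size WAL record. */
enum { WAL_UPSERT = 1, WAL_DELETE = 2 };
typedef struct { uint8_t op; int32_t key; int32_t value; } wal_rec_t;

#define MAX_KEYS 16
typedef struct { int present[MAX_KEYS]; int32_t value[MAX_KEYS]; } kv_table_t;

/* Append one record after the in-memory change succeeded. */
static int wal_append(FILE* wal, uint8_t op, int32_t key, int32_t value) {
    wal_rec_t rec;
    memset(&rec, 0, sizeof rec);          /* deterministic padding bytes */
    rec.op = op; rec.key = key; rec.value = value;
    if (fwrite(&rec, sizeof rec, 1, wal) != 1) return -1;
    return fflush(wal) == 0 ? 0 : -1;     /* push to the OS (not an fsync) */
}

/* Replay every complete record on top of the state restored from the last
 * snapshot. A torn trailing record (short read) is ignored. Returns the number
 * of records applied, or -1 on a corrupt record. */
static int wal_replay(FILE* wal, kv_table_t* t) {
    wal_rec_t rec;
    int applied = 0;
    rewind(wal);
    while (fread(&rec, sizeof rec, 1, wal) == 1) {
        if (rec.key < 0 || rec.key >= MAX_KEYS) return -1;
        if (rec.op == WAL_UPSERT) {
            t->present[rec.key] = 1;
            t->value[rec.key] = rec.value;
        } else if (rec.op == WAL_DELETE) {
            t->present[rec.key] = 0;
        } else {
            return -1;
        }
        applied++;
    }
    return applied;
}
```

Because records are applied in append order, replaying the same log onto the same snapshot always yields the same state.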

For integration and diagnostics, `db_export_json(tableId, "rows.json")` writes a table as a JSON array of row objects, and `db_import_json(tableId, "rows.json")` upserts those rows back into memory.

For long-lived products with evolving row layouts, persist your descriptor with `LITEDB_TABLE_VERSIONED(...)` and register one or more `litedb_schema_migration_t` entries through `db_schema_migration_register()`. `db_load()` will then restore older snapshot files by migrating each legacy row image into the current in-memory schema.

## Build & Run (CMake)

From repository root (recommended):

```bash
mkdir -p build
cmake -S . -B build
cmake --build build --config Release
```

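The test binary is placed under the build tree's `test/testcase` directory, matching the sanitizer and MinGW examples below. On Unix/macOS that is `build/test/testcase/litedb_test`. Multi-config generators such as Visual Studio add a per-configuration folder, typically `build\test\testcase\Release\litedb_test.exe`.

Run the tests/example program:

```bash
./build/test/testcase/litedb_test              # Unix/macOS
build\test\testcase\Release\litedb_test.exe    # Windows (multi-config generator)
```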

## Sanitizer Builds (ASan / UBSan)

For memory and undefined-behavior checks on GCC/Clang toolchains, LiteDB supports optional sanitizer builds through CMake:

- `LITEDB_ENABLE_ASAN=ON` — enable AddressSanitizer / LeakSanitizer
- `LITEDB_ENABLE_UBSAN=ON` — enable UndefinedBehaviorSanitizer
- both options can be enabled together

Examples:

```bash
# ASan only
cmake -S . -B builds-asan -DLITEDB_ENABLE_ASAN=ON -DCMAKE_BUILD_TYPE=Debug
cmake --build builds-asan --target litedb_test -j4
cd builds-asan/test/testcase
ASAN_OPTIONS=detect_leaks=1:abort_on_error=1:strict_string_checks=1 ./litedb_test
```

```bash
# UBSan only
cmake -S . -B builds-ubsan -DLITEDB_ENABLE_UBSAN=ON -DCMAKE_BUILD_TYPE=Debug
cmake --build builds-ubsan --target litedb_test -j4
cd builds-ubsan/test/testcase
UBSAN_OPTIONS=print_stacktrace=1:halt_on_error=1 ./litedb_test
```

```bash
# ASan + UBSan together
cmake -S . -B builds-sanitize -DLITEDB_ENABLE_ASAN=ON -DLITEDB_ENABLE_UBSAN=ON -DCMAKE_BUILD_TYPE=Debug
cmake --build builds-sanitize --target litedb_test -j4
cd builds-sanitize/test/testcase
ASAN_OPTIONS=detect_leaks=1:abort_on_error=1:strict_string_checks=1 \
UBSAN_OPTIONS=print_stacktrace=1:halt_on_error=1 \
./litedb_test --gtest_color=no
```

> Note: sanitizer builds are intended for debugging and validation, not for release binaries.

## Project Layout

- `src/` — library implementation and public headers (`litedb.h`, `litedb.c`)
- `gen/` — generated table headers and registration (`litedb_gen.h`, `litedb_gen_register.c`)
- `test/` — test program and unit tests
- `CMakeLists.txt` — build configuration

## Architecture & Core Algorithms

### 1. Runtime architecture

The project is organized as a **generated schema layer + generic in-memory runtime layer**:

- **Schema/codegen layer (`gen/`)**
  - Generates C structs, table IDs, field IDs, and registration code from table definitions.
  - Lets business code operate on strongly-typed row structs such as `CLASS_DB_USER_S` and `CLASS_DB_TERMINAL_S`.

- **Runtime layer (`src/`)**
  - `litedb.c`: core CRUD, indexing, hashing, and locking.
  - `litedb_query.c`: batch operations, range/page queries, projection, and aggregation.
  - `litedb_persist.c`: save/load, transactions, and WAL integration.
  - `litedb_schema.c`: schema-version registry and migration callbacks for versioned restore.
  - `litedb_json.c`: JSON export/import helpers.
  - `litedb_error.c` + `litedb_error.h`: centralized error code state and human-readable message mapping.
  - `litedb_snapshot.c`: snapshot creation and restore logic.
  - `litedb_subscribe.c`: subscription management and ordered callbacks.
  - `litedb_oplog.c`: operation log used by rollback/snapshot replay paths.

### 2. Core in-memory data structures

For each table, LiteDB keeps several parallel runtime structures:

- `g_tables[]` — table descriptors and field metadata.
- `g_heads[]` — full-table linked list for traversal and cleanup.
- `g_buckets[]` + `g_bucket_counts[]` — hash index for fast key lookup.
- `g_row_counts[]` — cached row count, making `db_statistic(tableId, NULL)` an **O(1)** operation.

This design combines:
- **fast point lookup** through the hash buckets,
- **efficient full scan / cleanup** through the global linked list,
- **simple implementation cost** suitable for embedded or lightweight scenarios.

### 3. Key algorithms

| Operation | Main idea | Typical complexity |
|---|---|---|
| `db_query` | Hash on key fields, then walk only one bucket chain | Average **O(1)** |
| `db_createorset` | Lookup by hash, then insert or partial-update by `itemMap` | Average **O(1)** |
| `db_delete` | Remove from hash bucket and global list | Average **O(1)** |
| `db_statistic(..., NULL)` | Return cached row count | **O(1)** |
| `db_snapshot_restore` | Rebuild state directly from saved rows | **O(n)**, n = number of rows |
| `db_snapshot_restore_replay` | Undo post-snapshot operations by inverse replay of oplog entries | **O(k)**, k = operations logged since the snapshot |

#### Hashing

LiteDB computes a hash across all key fields using an FNV-style rolling hash. This keeps the implementation simple and portable while providing stable distribution for mixed key types such as MAC + IPv4.
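
The following is a minimal sketch of hashing a composite key. It assumes the 32-bit FNV-1a variant (the README only says "FNV-style", so the exact variant and width are assumptions) and uses a hypothetical MAC + IPv4 key struct. Each field is folded into one running state, so struct padding never affects the hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FNV32_OFFSET_BASIS 2166136261u   /* 0x811c9dc5 */
#define FNV32_PRIME 16777619u            /* 0x01000193 */

/* Fold `len` bytes into a running FNV-1a state: xor first, then multiply. */
static uint32_t fnv1a32_update(uint32_t h, const void* data, size_t len) {
    const unsigned char* p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= FNV32_PRIME;
    }
    return h;
}

/* Hypothetical composite key: a MAC plus an IPv4 address. */
typedef struct {
    uint8_t mac[6];
    uint8_t ipv4[4];
} terminal_key_t;

/* Hash each key field in declaration order with one running state. */
static uint32_t terminal_key_hash(const terminal_key_t* k) {
    uint32_t h = FNV32_OFFSET_BASIS;
    h = fnv1a32_update(h, k->mac, sizeof k->mac);
    h = fnv1a32_update(h, k->ipv4, sizeof k->ipv4);
    return h;
}

/* Map a hash to one bucket chain. */
static size_t bucket_index(uint32_t h, size_t bucket_count) {
    assert(bucket_count > 0);
    return (size_t)(h % bucket_count);
}
```

Hashing field by field is equivalent to hashing the concatenated key bytes. Lookups for equal keys therefore always land in the same bucket.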

#### Partial update by bitmask

`itemMap` is used as a field bitmask:
- if a bit is set, the corresponding field is copied from the caller's row,
- if all bits are set (`0xFFFFFFFFu`), the row is treated as a full replacement,
- key fields can be rehashed automatically if they change.
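
The copy step of a partial update can be sketched as below. It also reports when a key field changed, so the caller knows the row must move to a new bucket. The `upd_field_t` descriptor, `task_row_t` row, and `apply_item_map()` are hypothetical names for illustration. How litedb itself locates and re-buckets a row whose key changes is not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified descriptor: ID, location, size, key flag. */
typedef struct {
    uint32_t id;
    size_t offset;
    size_t size;
    int is_key;
} upd_field_t;

/* Hypothetical row: `id` is the key field. */
typedef struct {
    int32_t id;
    int32_t priority;
    int32_t score;
} task_row_t;

static const upd_field_t k_task_fields[] = {
    {1, offsetof(task_row_t, id), sizeof(int32_t), 1},
    {2, offsetof(task_row_t, priority), sizeof(int32_t), 0},
    {3, offsetof(task_row_t, score), sizeof(int32_t), 0},
};

/* Copy every field whose bit is set in itemMap from `src` into `stored`.
 * Returns 1 when a key field's bytes changed (the row needs rehashing), else 0. */
static int apply_item_map(const upd_field_t* f, size_t n, uint32_t itemMap,
                          void* stored, const void* src) {
    unsigned char* dst = stored;
    const unsigned char* in = src;
    int key_changed = 0;
    for (size_t i = 0; i < n; i++) {
        if (f[i].id < 1 || f[i].id > 32) continue;                  /* outside the mask */
        if (!(itemMap & (UINT32_C(1) << (f[i].id - 1)))) continue;  /* not selected */
        if (f[i].is_key && memcmp(dst + f[i].offset, in + f[i].offset, f[i].size) != 0)
            key_changed = 1;
        memcpy(dst + f[i].offset, in + f[i].offset, f[i].size);
    }
    return key_changed;
}
```

With `itemMap == 0xFFFFFFFFu`, every field is copied. An unchanged key still reports `0`, so a full replacement does not force a rehash by itself.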

#### Dynamic bucket expansion

To prevent performance degradation on large datasets, the hash index now:
- starts with a larger default bucket count,
- expands automatically when the load factor grows too high,
- rebuilds bucket chains after resizing.

This keeps large test cases such as **1,000,000 terminal rows** within a practical runtime.
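
The grow-and-rebuild step can be sketched as follows. The 0.75 load-factor threshold and the doubling policy are assumptions, since the README does not state litedb's actual values. `index_t` and `node_t` are hypothetical simplified types.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct node { uint32_t hash; int32_t key; struct node* next; } node_t;

typedef struct {
    node_t** buckets;
    size_t bucket_count;
    size_t row_count;
} index_t;

/* Assumed policy for this sketch: grow once rows exceed 3/4 of the buckets. */
static int needs_grow(const index_t* ix) {
    return ix->row_count * 4 > ix->bucket_count * 3;
}

/* Double the bucket array and relink every node into its new chain.
 * Nodes are moved, not copied, so pointers to rows stay valid. */
static int index_grow(index_t* ix) {
    size_t new_count = ix->bucket_count * 2;
    node_t** nb = calloc(new_count, sizeof *nb);
    if (nb == NULL) return -1;
    for (size_t i = 0; i < ix->bucket_count; i++) {
        node_t* n = ix->buckets[i];
        while (n != NULL) {
            node_t* next = n->next;
            size_t b = n->hash % new_count;
            n->next = nb[b];
            nb[b] = n;
            n = next;
        }
    }
    free(ix->buckets);
    ix->buckets = nb;
    ix->bucket_count = new_count;
    return 0;
}

/* Insert a node, growing first if this insert would exceed the load factor. */
static int index_insert(index_t* ix, node_t* n) {
    ix->row_count++;
    if (needs_grow(ix) && index_grow(ix) != 0) {
        ix->row_count--;
        return -1;
    }
    size_t b = n->hash % ix->bucket_count;
    n->next = ix->buckets[b];
    ix->buckets[b] = n;
    return 0;
}
```

Doubling keeps the amortized insert cost constant. Each resize is O(rows), but resizes become exponentially rarer as the table grows.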

#### Bulk-load preallocation and pooled allocation

For high-volume insert scenarios, LiteDB now exposes `db_reserve()`:

- pre-sizes the table hash buckets before the first insert,
- pre-allocates node storage through a per-table pooled allocator,
- reuses released nodes to reduce `malloc/free` overhead during repeated runs.

This optimization is especially effective for the **10,000,000-row terminal insert** benchmark used in `test/testcase/test_performance.cpp`. In the current Linux development environment, the best verified run dropped from about **29s** to about **12.6s**, with repeated runs typically around **15–16s** depending on machine state.
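
A reserve-then-reuse pool can be sketched as one slab allocation plus a free list threaded through released nodes. This is a hypothetical illustration (`node_pool_t`, `pool_reserve()`, `pool_alloc()`, `pool_free()`). It is not litedb's allocator, and it simply returns `NULL` once the reservation is exhausted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Node sizes are rounded to a multiple of this union's size, which keeps
 * every node aligned for the common scalar types (C99-compatible). */
typedef union { long double ld; long long ll; void* p; } pool_align_t;

/* Hypothetical per-table node pool: one reserved slab plus a free list. */
typedef struct {
    unsigned char* slab;
    size_t node_size;
    size_t capacity;
    size_t used;
    void* free_list;   /* released nodes, linked through their first bytes */
} node_pool_t;

/* Reserve room for expected_rows nodes in a single allocation. */
static int pool_reserve(node_pool_t* p, size_t node_size, size_t expected_rows) {
    const size_t unit = sizeof(pool_align_t);
    if (node_size == 0) node_size = 1;
    node_size = (node_size + unit - 1) / unit * unit;
    if (expected_rows == 0 || node_size > SIZE_MAX / expected_rows) return -1;
    p->slab = malloc(node_size * expected_rows);
    if (p->slab == NULL) return -1;
    p->node_size = node_size;
    p->capacity = expected_rows;
    p->used = 0;
    p->free_list = NULL;
    return 0;
}

/* Reuse a released node first, then carve from the slab; NULL when exhausted. */
static void* pool_alloc(node_pool_t* p) {
    if (p->free_list != NULL) {
        void* n = p->free_list;
        p->free_list = *(void**)n;
        return n;
    }
    if (p->used < p->capacity)
        return p->slab + p->node_size * p->used++;
    return NULL;
}

/* Push the node onto the free list instead of calling free(). */
static void pool_free(node_pool_t* p, void* n) {
    *(void**)n = p->free_list;
    p->free_list = n;
}

/* Release the whole slab at once, e.g. on table shutdown. */
static void pool_destroy(node_pool_t* p) {
    free(p->slab);
    p->slab = NULL;
    p->capacity = p->used = 0;
    p->free_list = NULL;
}
```

One `malloc` per table replaces one per row. Deletes followed by inserts recycle nodes without touching the system allocator at all.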

#### Rollback and snapshot support

- **subscription callbacks** can veto add/update/delete operations;
- **async subscriptions** run on a background worker and do not block writes; use `db_async_wait_idle()` when tests or callers need to wait for delivery;
- if a callback fails, LiteDB restores the previous in-memory state immediately;
- `db_snapshot_restore()` restores by rebuilding table state from the snapshot image;
- `db_snapshot_restore_replay()` now performs a **true inverse replay** of operations recorded after the snapshot (`ADD -> DELETE`, `UPDATE -> restore before image`, `DELETE -> ADD back`);
- oplog recording is enabled only while at least one snapshot is alive, so normal CRUD performance remains fast.
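
The inverse-replay rule can be sketched on a toy table. The `oplog_entry_t` layout (key plus before image) and the integer-valued `table_t` are hypothetical simplifications. Entries are undone newest first, so several operations on the same key unwind in the correct order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_ADD = 1, OP_UPDATE = 2, OP_DELETE = 3 };

/* Hypothetical oplog entry: the key plus the row image before the operation. */
typedef struct { int op; int32_t key; int32_t before; } oplog_entry_t;

#define NKEYS 8
typedef struct { int present[NKEYS]; int32_t value[NKEYS]; } table_t;

/* Undo the k operations logged after a snapshot, newest first:
 * ADD -> delete, UPDATE -> restore the before image, DELETE -> add back. */
static void replay_inverse(table_t* t, const oplog_entry_t* log, size_t k) {
    while (k > 0) {
        const oplog_entry_t* e = &log[--k];
        assert(e->key >= 0 && e->key < NKEYS);
        switch (e->op) {
        case OP_ADD:
            t->present[e->key] = 0;
            break;
        case OP_UPDATE:
            t->value[e->key] = e->before;
            break;
        case OP_DELETE:
            t->present[e->key] = 1;
            t->value[e->key] = e->before;
            break;
        }
    }
}
```

The cost depends only on k, the number of logged operations, not on the table size. That is why replay restore is attractive when few changes happened since the snapshot.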

### 4. Concurrency model

LiteDB now uses a **hybrid locking model**:

- a **global read-write lock** protects lifecycle and cross-table operations such as `db_init()`, `db_shutdown()`, snapshot creation/restore, and persistence,
- **per-table read-write locks** protect normal CRUD, index lookup, and traversal on each table independently,
- **bucket-stripe locks** keep slow synchronous callbacks on one key from blocking unrelated keys on the same table.

This keeps correctness and predictable behavior while allowing independent tables and independent key ranges to make progress concurrently, which fits the lightweight/embedded positioning of the project.
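
One common way to combine a per-table read-write lock with bucket-stripe mutexes is sketched below using POSIX threads. The stripe count, the `table_locks_t` type, and holding the table lock in shared mode on the keyed path are assumptions, not litedb's confirmed implementation. Structural work such as bucket resizing takes the table lock exclusively.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose pthread rwlocks in strict C modes */
#endif
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define STRIPE_COUNT 16   /* hypothetical stripe count */

typedef struct {
    pthread_rwlock_t table_lock;            /* per-table lock */
    pthread_mutex_t stripes[STRIPE_COUNT];  /* each guards a group of buckets */
} table_locks_t;

/* Returns 0 on success (cleanup on partial failure omitted for brevity). */
static int table_locks_init(table_locks_t* l) {
    if (pthread_rwlock_init(&l->table_lock, NULL) != 0) return -1;
    for (size_t i = 0; i < STRIPE_COUNT; i++)
        if (pthread_mutex_init(&l->stripes[i], NULL) != 0) return -1;
    return 0;
}

/* Keys whose hashes map to different stripes never share a mutex. */
static size_t stripe_of(uint32_t key_hash) {
    return key_hash % STRIPE_COUNT;
}

/* Keyed path: shared table lock, then this key's stripe. In this scheme,
 * table-wide counters (e.g. row count) would need atomics. */
static void keyed_begin(table_locks_t* l, uint32_t key_hash) {
    pthread_rwlock_rdlock(&l->table_lock);
    pthread_mutex_lock(&l->stripes[stripe_of(key_hash)]);
}

static void keyed_end(table_locks_t* l, uint32_t key_hash) {
    pthread_mutex_unlock(&l->stripes[stripe_of(key_hash)]);
    pthread_rwlock_unlock(&l->table_lock);
}

/* Structural path (e.g. bucket resize): the exclusive lock blocks all keyed work. */
static void structural_begin(table_locks_t* l) { pthread_rwlock_wrlock(&l->table_lock); }
static void structural_end(table_locks_t* l) { pthread_rwlock_unlock(&l->table_lock); }
```

A slow callback running while one stripe is held only delays keys in that stripe. Writers to other stripes keep going because they share the table lock.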

## Roadmap & Planned Capabilities

The next evolution of LiteDB is planned in priority order:

### Priority 1 — Core usability
- [x] More built-in field types: `FTYPE_DATE`, `FTYPE_INT64`, `FTYPE_UINT32`, `FTYPE_FLOAT`, `FTYPE_DOUBLE`, `FTYPE_BLOB`, `FTYPE_UUID`, `FTYPE_DECIMAL`
- [x] Traversal / filtered query APIs: `db_foreach()` and `db_query_if()`
- [x] Lightweight transactions: `db_tx_begin()`, `db_tx_commit()`, `db_tx_rollback()`
- [x] Simple binary persistence: `db_save()` and `db_load()`
- [x] Error status API: `litedb_error_t`, `db_last_error()`, `db_strerror()`
- [x] Friendlier schema-definition DSL / helper macros to reduce manual `table_desc_t` boilerplate

### Priority 2 — Business query capabilities
- [x] Secondary indexes for non-key fields (`db_index_create()`, `db_index_query()`, `db_index_drop()`)
- [x] Batch insert / update / delete APIs (`db_batch_createorset()`, `db_batch_delete()`)
- [x] Pagination / sorting / range-query support for application-facing reads
- [x] Field constraints (`NOT NULL`, `UNIQUE`, range checks)
- [x] Aggregation helpers such as `count / sum / min / max` via `db_aggregate_number()`
- [ ] Custom validation callbacks / richer business rules

### Priority 3 — Runtime and eventing
- [x] Finer-grained locking (per-table locking + bucket-stripe isolation for keyed operations)
- [x] Async / cancellable subscriptions (`LITEDB_SUB_ASYNC`, `db_unsubscribe()`, `db_async_wait_idle()`)
- [x] More complex types such as `BLOB`, `UUID`, `DECIMAL` (with `DECIMAL` stored as a scaled `int64_t`, 4 fractional digits)
- [ ] Business-oriented subscription filters, field-level change subscriptions, and batched notifications

### Priority 4 — More competitive differentiation
- [x] Projection / selected-column reads to reduce copy cost for large rows
- [x] WAL-style durable append log with faster crash recovery
- [x] Schema migration / versioned restore helpers for long-lived embedded products
- [x] JSON import/export and snapshot inspection tooling for easier integration

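The `DECIMAL` representation noted in the roadmap is a scaled `int64_t` with 4 fractional digits. In other words, the stored integer is the value times 10^4. The parse/format helpers below are a hypothetical sketch of that encoding (`decimal_parse()` and `decimal_format()` are not litedb APIs). Overflow checks are omitted for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define DECIMAL_SCALE 10000   /* 4 fractional digits */

/* Parse "[+|-]digits[.digits]" with at most 4 fractional digits into
 * value * 10^4. Returns 0 on success, -1 on malformed or too-precise input. */
static int decimal_parse(const char* s, int64_t* out) {
    int neg = 0, frac_digits = 0;
    int64_t whole = 0, frac = 0;
    if (*s == '-' || *s == '+') neg = (*s++ == '-');
    if (*s < '0' || *s > '9') return -1;
    while (*s >= '0' && *s <= '9') whole = whole * 10 + (*s++ - '0');
    if (*s == '.') {
        s++;
        while (*s >= '0' && *s <= '9') {
            if (++frac_digits > 4) return -1;   /* would lose precision */
            frac = frac * 10 + (*s++ - '0');
        }
    }
    if (*s != '\0') return -1;
    while (frac_digits++ < 4) frac *= 10;       /* "12.5": 5 -> 5000 */
    *out = (whole * DECIMAL_SCALE + frac) * (neg ? -1 : 1);
    return 0;
}

/* Format a scaled value with exactly 4 fractional digits. */
static void decimal_format(int64_t v, char* buf, size_t n) {
    uint64_t mag = v < 0 ? (uint64_t)0 - (uint64_t)v : (uint64_t)v;
    snprintf(buf, n, "%s%llu.%04llu", v < 0 ? "-" : "",
             (unsigned long long)(mag / DECIMAL_SCALE),
             (unsigned long long)(mag % DECIMAL_SCALE));
}
```

Storing an integer keeps addition, comparison, and hashing exact. Binary floating point cannot represent values like `0.1` exactly.
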
## Development Suggestions

- Add CI (GitHub Actions) to build and run tests automatically.
- Consider dynamic table/container sizes instead of fixed limits.
- Consider even finer-grained locking beyond the current per-table + bucket-stripe model if higher multi-thread throughput is needed.

## Contributing

1. Fork the repository
2. Create a feature branch
3. Commit changes and open a pull request

---

## New: Additional Field Types

- FTYPE_DATE: a packed 7-byte date/time format used to store timestamps as [year(2),month(1),day(1),hour(1),minute(1),second(1)].
  - Human-readable defaults for date fields accept strings like `"YYYY-MM-DD HH:MM:SS"` and are parsed at row-creation time.
- FTYPE_INT64, FTYPE_UINT32, FTYPE_FLOAT, FTYPE_DOUBLE: additional numeric types with default-value parsing and debug printing.
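
The 7-byte `FTYPE_DATE` layout and the `"YYYY-MM-DD HH:MM:SS"` default parsing can be illustrated with a small sketch. `packed_date_t`, `date_parse()` and `date_year()` are hypothetical helpers. Storing the 2-byte year little-endian is an assumption, because the README does not state the byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of [year(2), month(1), day(1), hour(1), minute(1), second(1)];
 * the year is stored little-endian here (an assumption). */
typedef struct { uint8_t b[7]; } packed_date_t;

/* Parse "YYYY-MM-DD HH:MM:SS"; returns 0 on success, -1 on bad input.
 * Only field ranges are checked, not days-per-month. */
static int date_parse(const char* s, packed_date_t* out) {
    unsigned y, mo, d, h, mi, se;
    char extra;
    /* A successful 7th conversion means trailing characters: reject. */
    if (sscanf(s, "%4u-%2u-%2u %2u:%2u:%2u%c", &y, &mo, &d, &h, &mi, &se, &extra) != 6)
        return -1;
    if (mo < 1 || mo > 12 || d < 1 || d > 31 || h > 23 || mi > 59 || se > 59)
        return -1;
    out->b[0] = (uint8_t)(y & 0xFFu);   /* year, low byte */
    out->b[1] = (uint8_t)(y >> 8);      /* year, high byte */
    out->b[2] = (uint8_t)mo;
    out->b[3] = (uint8_t)d;
    out->b[4] = (uint8_t)h;
    out->b[5] = (uint8_t)mi;
    out->b[6] = (uint8_t)se;
    return 0;
}

/* Reassemble the 2-byte year. */
static unsigned date_year(const packed_date_t* p) {
    return (unsigned)p->b[0] | ((unsigned)p->b[1] << 8);
}
```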

Generated test tables were added for exercising these types:
- `Event` (in `test/gen/litedb_gen.h`) — contains an `FTYPE_DATE` `timestamp` field (7 bytes).
- `Misc` — contains `int64/uint32/float/double` fields with default values parsed from the generated descriptors.

Files added for tests:
- `test/testcase/test_date.cpp` — validates default timestamp parsing and explicit timestamp writes.
- `test/testcase/test_misc_types.cpp` — validates defaults and updates for the new numeric types.

Runtime note (Windows / MinGW):
- The test binary links against MinGW runtime DLLs (e.g. `libgcc_s_seh-1.dll`, `libstdc++-6.dll`, `libwinpthread-1.dll`). If you see an exit code `0xC000007B` when running tests, ensure your MinGW `bin` directory is on `PATH`, for example:

```powershell
$env:Path = "D:\mingw64\bin;" + $env:Path
& "builds\test\testcase\litedb_test.exe"
```

Alternatively, to avoid depending on the MinGW runtime on target machines, you can link libgcc/libstdc++ statically by adding these linker flags to the test target in `CMakeLists.txt` (or to the global C++ linker flags):

```cmake
if(MINGW)  # target_link_options() requires CMake >= 3.13
  target_link_options(litedb_test PRIVATE -static-libgcc -static-libstdc++)
endif()
```

This makes the test binaries larger but removes the need to ship the MinGW runtime DLLs.