Background Indexing

Fluree maintains query-optimized indexes through a background indexing process. This document covers the indexing architecture, configuration, and monitoring.

Index Architecture

Fluree maintains four index permutations for efficient query execution:

SPOT (Subject-Predicate-Object-Time)

Organized by subject first:

ex:alice → schema:name → "Alice" → [t=1, t=5]
ex:alice → schema:age → 30 → [t=1]
ex:alice → schema:age → 31 → [t=10]

Optimized for: “Give me all properties of this subject”

POST (Predicate-Object-Subject-Time)

Organized by predicate first:

schema:name → "Alice" → ex:alice → [t=1, t=5]
schema:age → 30 → ex:alice → [t=1]
schema:age → 31 → ex:alice → [t=10]

Optimized for: “Find all subjects with this property/value”

OPST (Object-Predicate-Subject-Time)

Organized by object first:

"Alice" → schema:name → ex:alice → [t=1, t=5]
30 → schema:age → ex:alice → [t=1]
31 → schema:age → ex:alice → [t=10]

Optimized for: “Find subjects with this object value”

PSOT (Predicate-Subject-Object-Time)

Organized by predicate, then subject:

schema:name → ex:alice → "Alice" → [t=1, t=5]
schema:age → ex:alice → 30 → [t=1]
schema:age → ex:alice → 31 → [t=10]

Optimized for: “Get all values for this predicate”
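The four orderings above can be modeled as the same set of flakes sorted by different component orders. The following is a minimal sketch in Python, not Fluree's storage format: the `Flake` tuple, the `build_index` and `scan_prefix` helpers, and the extra `ex:bob` row are illustrative assumptions, and the prefix scan is a linear filter where a real index would seek into a sorted tree.

```python
from typing import NamedTuple

class Flake(NamedTuple):
    s: str      # subject
    p: str      # predicate
    o: object   # object value
    t: int      # transaction time

FLAKES = [
    Flake("ex:alice", "schema:name", "Alice", 1),
    Flake("ex:alice", "schema:age", 30, 1),
    Flake("ex:alice", "schema:age", 31, 10),
    Flake("ex:bob", "schema:name", "Bob", 2),   # illustrative extra subject
]

# Component order for each index permutation.
ORDERS = {
    "SPOT": ("s", "p", "o", "t"),
    "POST": ("p", "o", "s", "t"),
    "OPST": ("o", "p", "s", "t"),
    "PSOT": ("p", "s", "o", "t"),
}

def build_index(flakes, order):
    """Sort flakes by the component order of the given permutation."""
    fields = ORDERS[order]
    # Tag each component with its type name so mixed object values
    # (strings and integers) can be compared without a TypeError.
    def key(f):
        return tuple((type(getattr(f, n)).__name__, getattr(f, n)) for n in fields)
    return sorted(flakes, key=key)

def scan_prefix(index, order, *prefix):
    """Return flakes whose leading components equal the given prefix."""
    fields = ORDERS[order][:len(prefix)]
    return [f for f in index if tuple(getattr(f, n) for n in fields) == prefix]

spot = build_index(FLAKES, "SPOT")
post = build_index(FLAKES, "POST")
alice_properties = scan_prefix(spot, "SPOT", "ex:alice")
subjects_aged_31 = scan_prefix(post, "POST", "schema:age", 31)
```

Each query shape maps to the ordering whose leading components it already knows, which is why a subject lookup uses SPOT while a property/value lookup uses POST.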

Indexing Process

1. Transaction Commit

t=42: Transaction committed
  - Flakes written to append-only log
  - Commit metadata created
  - Commit published to nameservice (commit_t=42)

2. Indexer Detection

Background indexing is triggered when the ledger’s novelty exceeds the configured threshold (see Configuration below):

Indexer checks: commit_t=42, index_t=40
Indexer: Need to index t=41, t=42
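The comparison above amounts to listing every t strictly greater than index_t and up to commit_t. A small sketch, assuming a hypothetical helper name `commits_to_index` that is not part of Fluree:

```python
def commits_to_index(commit_t: int, index_t: int) -> list[int]:
    """Return the t values in the range (index_t, commit_t]."""
    if index_t > commit_t:
        raise ValueError("index_t cannot be ahead of commit_t")
    return list(range(index_t + 1, commit_t + 1))

pending = commits_to_index(commit_t=42, index_t=40)
```

With the values from the example, `pending` holds t=41 and t=42; when the two watermarks are equal the list is empty and there is nothing to index.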

3. Index Building

Background indexing builds a new index snapshot up to a specific to_t (typically the current commit_t when the job starts). During the job, new commits may arrive; those remain in novelty for the next cycle.

Incremental indexing (default path):
  - Load the existing index root (CAS CID) from nameservice
  - Resolve only commits with t in (index_t, to_t]
  - Merge resolved novelty into only the affected leaf blobs (Copy-on-Write)
  - Update dictionaries (forward packs + reverse trees)
  - Assemble a new root referencing mostly-unchanged CAS artifacts

Fallback:
  - If incremental indexing cannot safely proceed, fall back to a full rebuild
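The copy-on-write merge described above can be illustrated with a toy content-addressed store. This is a sketch under stated assumptions, not Fluree's implementation: the `cid` function (a truncated SHA-256 over JSON), the flat `root` mapping from leaf key to content ID, and the `leaf_for` partitioning callback are all hypothetical stand-ins for the real leaf/branch structure.

```python
import hashlib
import json

def cid(rows):
    """Hypothetical content ID: truncated SHA-256 of the JSON-encoded rows."""
    return hashlib.sha256(json.dumps(rows).encode()).hexdigest()[:16]

def incremental_merge(root, store, novelty, leaf_for):
    """Merge novelty rows into only the affected leaves.

    root:     dict mapping leaf key -> content ID
    store:    dict mapping content ID -> sorted list of rows (append-only)
    novelty:  rows with t in (index_t, to_t]
    leaf_for: function mapping a row to the key of the leaf that holds it
    """
    touched = {}
    for row in novelty:
        touched.setdefault(leaf_for(row), []).append(row)

    new_root = dict(root)              # untouched leaves keep their old IDs
    for leaf_key, rows in touched.items():
        old_rows = store.get(root.get(leaf_key), [])
        merged = sorted(old_rows + rows)   # builds a new list; old blob is not mutated
        new_id = cid(merged)
        store[new_id] = merged             # write a new blob beside the old one
        new_root[leaf_key] = new_id
    return new_root
```

Because old blobs are never overwritten, the previous root stays valid for queries that started before the new root was published.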

4. Index Publishing

When complete:

  - Upload new CAS blobs (leaves, branches, dict blobs) as needed
  - Upload the new index root (CAS CID)
  - Publish index_head_id to nameservice (atomic “commit point”)
  - Update index_t to to_t
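The ordering of these steps matters: blobs and the root are uploaded first, and only then does the nameservice pointer move. A minimal sketch of that sequence, assuming an in-memory `Nameservice` class and a `publish_snapshot` helper that are hypothetical and not Fluree APIs; the rejection of a stale to_t is also an assumption added for illustration.

```python
import threading
from typing import Optional

class Nameservice:
    """Toy nameservice record holding the index head and its watermark."""

    def __init__(self):
        self.index_head_id: Optional[str] = None
        self.index_t = 0
        self._lock = threading.Lock()

    def publish_index(self, head_id: str, to_t: int) -> bool:
        # Head and watermark change together under one lock, so readers
        # never observe a new root paired with an old index_t.
        with self._lock:
            if to_t <= self.index_t:
                return False  # assumed behavior: ignore stale publishes
            self.index_head_id, self.index_t = head_id, to_t
            return True

def publish_snapshot(store, ns, new_blobs, root_id, root_blob, to_t):
    store.update(new_blobs)                   # upload leaves, branches, dict blobs
    store[root_id] = root_blob                # upload the new index root
    return ns.publish_index(root_id, to_t)    # commit point: move the pointer
```

If the process fails before the final call, the uploaded blobs are simply unreferenced and the previously published index remains in effect.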

Novelty Layer

The novelty layer consists of transactions committed but not yet indexed:

Current State:
  commit_t = 150
  index_t = 145
  novelty = [t=146, t=147, t=148, t=149, t=150]

Query Execution with Novelty

Queries combine indexed data with novelty:

Query for ex:alice's properties:

1. Check SPOT index (up to t=145)
2. Apply novelty layer (t=146 to t=150)
3. Combine results
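The three steps above can be sketched as a merge of two sources split at index_t. This is a simplified model, assuming flakes are plain `(s, p, o, t)` tuples and ignoring retractions; the function name `query_subject` and the sample data are illustrative.

```python
def query_subject(indexed, novelty, subject, index_t, as_of_t):
    """Combine indexed flakes (t <= index_t) with novelty (t > index_t)."""
    from_index = [
        f for f in indexed
        if f[0] == subject and f[3] <= min(index_t, as_of_t)
    ]
    from_novelty = [
        f for f in novelty
        if f[0] == subject and index_t < f[3] <= as_of_t
    ]
    # Stable sort by t keeps index order for flakes with equal t.
    return sorted(from_index + from_novelty, key=lambda f: f[3])

indexed = [
    ("ex:alice", "schema:name", "Alice", 1),
    ("ex:alice", "schema:age", 30, 1),
]
novelty = [
    ("ex:alice", "schema:age", 31, 148),
    ("ex:bob", "schema:name", "Bob", 149),
]
current = query_subject(indexed, novelty, "ex:alice", index_t=145, as_of_t=150)
```

The novelty filter runs on every query, which is why its cost grows with the number of unindexed transactions.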

Impact of Large Novelty

Small novelty (< 10 transactions):

Minimal query overhead
Fast query execution

Large novelty (> 100 transactions):

Significant query overhead
Slower query execution
Higher memory usage

Configuration

Background indexing is on by default. Indexing is triggered based on novelty size thresholds:

Enable/disable background indexing: --indexing-enabled / FLUREE_INDEXING_ENABLED (default true; disable only when a peer/indexer process owns this storage)
Trigger threshold (soft): --reindex-min-bytes / FLUREE_REINDEX_MIN_BYTES
Backpressure threshold (hard): --reindex-max-bytes / FLUREE_REINDEX_MAX_BYTES

See Operations: Configuration for the canonical flag/env/config-file reference.
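The relationship between the soft and hard thresholds can be sketched as a simple decision function. This is an illustration only: the function name, the return labels, and the use of inclusive comparisons at the boundaries are assumptions, and what backpressure does in practice is defined by the server, not by this sketch.

```python
def indexing_action(novelty_bytes: int, min_bytes: int, max_bytes: int) -> str:
    """Classify novelty size against the soft and hard thresholds."""
    if min_bytes > max_bytes:
        raise ValueError("reindex-min-bytes must not exceed reindex-max-bytes")
    if novelty_bytes >= max_bytes:
        return "backpressure"   # hard threshold reached
    if novelty_bytes >= min_bytes:
        return "trigger"        # soft threshold reached: start background indexing
    return "none"

action = indexing_action(novelty_bytes=250_000, min_bytes=200_000, max_bytes=2_000_000)
```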

Incremental parallelism (per ledger)

Within a single incremental indexing job, Fluree can update multiple (graph, index-order) branches concurrently. This is bounded by:

IndexerConfig.incremental_max_concurrency (default: 4)

This setting is part of the Rust IndexerConfig used by the indexer pipeline; it is not a server CLI flag. Increasing it can improve throughput on multi-graph ledgers and can run the four main index orders (SPOT/PSOT/POST/OPST) in parallel, at the cost of higher peak memory.
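The effect of a concurrency bound can be illustrated with a fixed-size worker pool over (graph, index-order) pairs. This is a Python analogy, not the Rust indexer: `update_branch` is a placeholder for the real per-branch work, the graph names are hypothetical, and Python threads only demonstrate the bound rather than true CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

ORDERS = ("SPOT", "PSOT", "POST", "OPST")

def update_branch(graph: str, order: str) -> str:
    """Placeholder for merging novelty into one (graph, order) branch."""
    return f"{graph}/{order}: updated"

def run_incremental(graphs, max_concurrency: int = 4):
    jobs = list(product(graphs, ORDERS))
    # At most max_concurrency branch updates are in flight at once.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(lambda job: update_branch(*job), jobs))

results = run_incremental(["graph-a", "graph-b"])
```

With the bound at 4 and a single graph, all four index orders can run at once; with more graphs, the remaining jobs queue until a worker is free.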

Monitoring

Check Index Status

curl http://localhost:8090/v1/fluree/info/mydb:main

Response:

{
  "ledger_id": "mydb:main",
  "branch": "main",
  "commit_t": 150,
  "index_t": 145,
  "commit_id": "bafy...headCommit",
  "index_id": "bafy...indexRoot"
}

Key Metrics:

index lag (txns): commit_t - index_t
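As a small illustration, the lag can be computed directly from the response body shown above. The sketch parses a copy of that sample rather than calling the endpoint; the `index_lag` helper name is an assumption.

```python
import json

SAMPLE_RESPONSE = """
{
  "ledger_id": "mydb:main",
  "branch": "main",
  "commit_t": 150,
  "index_t": 145,
  "commit_id": "bafy...headCommit",
  "index_id": "bafy...indexRoot"
}
"""

def index_lag(info: dict) -> int:
    """Number of committed transactions not yet covered by the index."""
    return info["commit_t"] - info["index_t"]

info = json.loads(SAMPLE_RESPONSE)
lag = index_lag(info)
```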

For byte-level novelty size and indexing trigger decisions, see the indexing block returned by transaction and replication endpoints (e.g. POST /push/<ledger>), documented in API Endpoints.

Key Log Messages

At INFO, background indexing emits coarse-grained progress logs that make it easier to distinguish:

request queued vs. worker started
current wait status while trigger_index() is blocked
incremental vs. rebuild path selection
commit-chain walking progress
commit resolution progress and phase completion

When background indexing is queued by an HTTP transaction request, the worker logs also include copied request_id and trace_id fields from the triggering request. This provides log-level correlation between the foreground request and the later background build without making the index build part of the original request trace.

At DEBUG, the same wait and commit-walk paths emit more frequent progress updates for incident debugging without changing behavior.

When you trigger indexing through the Rust API with trigger_index(), the wait timeout is optional and should generally be chosen by the caller. Leave TriggerIndexOptions.timeout_ms unset to wait until completion, or set it explicitly for bounded environments such as Lambda jobs, HTTP gateways, or other workers with a fixed maximum runtime.
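The unset-versus-bounded timeout pattern can be sketched in Python with a `threading.Event` standing in for job completion. This is an analogy for the semantics described above, not the Rust API: `wait_for_index` and its `timeout_ms` parameter are hypothetical names chosen to mirror the option.

```python
import threading
from typing import Optional

def wait_for_index(done: threading.Event, timeout_ms: Optional[int] = None) -> bool:
    """Block until the job signals completion.

    timeout_ms=None waits indefinitely; otherwise returns False if the
    deadline passes before the job finishes.
    """
    timeout_s = None if timeout_ms is None else timeout_ms / 1000
    return done.wait(timeout_s)

job_done = threading.Event()
finished_in_time = wait_for_index(job_done, timeout_ms=10)   # job not finished yet
job_done.set()                                               # job completes
finished = wait_for_index(job_done)                          # returns immediately
```

A caller with a fixed runtime budget would pass a timeout below that budget and treat a `False` result as "still indexing" rather than as a failure.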

Health Indicators

Healthy:

index_lag: 0-10 transactions
index_rate > transaction_rate

Warning:

index_lag: 10-50 transactions
index_rate ≈ transaction_rate

Critical:

index_lag: > 50 transactions
index_rate < transaction_rate
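These indicators can be folded into a single status by taking the worse of the lag level and the rate level. The sketch below makes several assumptions that the thresholds above leave open: a lag of exactly 10 counts as healthy, "≈" is read as within a 5% tolerance, and the function names are illustrative.

```python
LEVELS = ("healthy", "warning", "critical")

def lag_level(index_lag: int) -> int:
    if index_lag <= 10:
        return 0
    if index_lag <= 50:
        return 1
    return 2

def rate_level(index_rate: float, transaction_rate: float, tolerance: float = 0.05) -> int:
    if index_rate > transaction_rate * (1 + tolerance):
        return 0    # indexing clearly outpaces writes
    if index_rate >= transaction_rate * (1 - tolerance):
        return 1    # roughly keeping pace
    return 2        # falling behind

def health(index_lag: int, index_rate: float, transaction_rate: float) -> str:
    """Report the worse of the two indicators."""
    return LEVELS[max(lag_level(index_lag), rate_level(index_rate, transaction_rate))]

status = health(index_lag=30, index_rate=100, transaction_rate=100)
```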

Performance Tuning

Optimize for Write-Heavy Loads

fluree-server \
  --indexing-enabled \
  --reindex-min-bytes 200000 \
  --reindex-max-bytes 2000000

Larger thresholds reduce indexing frequency (more novelty accumulation), trading some query-time overlay cost for reduced background indexing activity.

Optimize for Read-Heavy Loads

fluree-server \
  --indexing-enabled \
  --reindex-min-bytes 50000

Smaller reindex-min-bytes keeps novelty smaller (better query performance) at the cost of more frequent background indexing cycles.

Index Storage

Index Snapshots

Indexes are stored as immutable, content-addressed snapshots:

  - Leaf blobs (FLI3) and branch manifests (FBR3)
  - Dictionary blobs (forward packs, reverse tree leaves/branches)
  - An index root blob (FIR6) that references everything needed for queries

The nameservice stores the current index root CID (index_head_id) and its watermark (index_t). Peers fetch only the CAS objects they need on demand.

Index Retention

Old index snapshots are retained for time-travel safety and concurrent query safety. Cleanup is performed by the binary index garbage collector, governed by:

IndexerConfig.gc_max_old_indexes
IndexerConfig.gc_min_time_mins
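One plausible reading of these two settings is that a superseded snapshot becomes collectable only when it falls outside the newest gc_max_old_indexes old snapshots and has been superseded for at least gc_min_time_mins. The sketch below encodes that reading as an assumption; the actual garbage collector semantics may differ, and the `collectable` helper and its tuple layout are hypothetical.

```python
def collectable(snapshots, now_mins, gc_max_old_indexes, gc_min_time_mins):
    """Return old snapshots eligible for cleanup under the assumed policy.

    snapshots: list of (index_t, superseded_at_mins), excluding the current index.
    """
    newest_first = sorted(snapshots, key=lambda s: s[0], reverse=True)
    beyond_limit = newest_first[gc_max_old_indexes:]   # always keep the newest N
    return [s for s in beyond_limit if now_mins - s[1] >= gc_min_time_mins]

old = [(100, 0), (110, 20), (120, 50), (130, 58)]
to_delete = collectable(old, now_mins=60, gc_max_old_indexes=2, gc_min_time_mins=30)
```

Under this reading, the count limit protects recent time-travel targets while the age limit protects queries that may still be reading a just-superseded root.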

No standalone HTTP compaction endpoint is currently exposed. Use POST /v1/fluree/reindex when you need to force a full index refresh.

Troubleshooting

High Indexing Lag

Symptom: commit_t - index_t grows continuously

Causes:

Transaction rate exceeds indexing capacity
Large transactions
Insufficient resources

Solutions:

Reduce reindex-min-bytes so indexing triggers sooner
Increase resources for the indexer (CPU/memory and storage throughput)
Consider running a dedicated indexer process (separate from the transactor)
For incremental indexing, consider increasing IndexerConfig.incremental_max_concurrency

Slow Indexing

Symptom: index_t advances slowly (or stops advancing)