Fluree DB

A semantic graph database with time travel, branching, and verifiable data — built on W3C standards.

Fluree DB is a single binary that stores your data as an RDF knowledge graph, queryable with SPARQL or JSON-LD Query, with every commit immutably recorded so you can travel back to any prior state. It supports git-style branching and merging, signed and policy-gated transactions, SHACL validation, OWL/RDFS reasoning, and full-text and vector search — over local files, S3, or IPFS — without bolting on external services.

What you get

  • Semantic by default. Your data is RDF. IRIs, JSON-LD @context, named graphs, and typed values are first-class. Queries are SPARQL 1.1 or the equivalent JSON-LD Query, both compiling to the same execution engine.
  • Time travel. Every transaction is a commit on an immutable chain. Query the state of the graph at any past moment with a single t parameter — no snapshots to restore, no separate audit log to consult.
  • Branching and merging. Create a branch off any commit, transact against it in isolation, then merge it back. Useful for staging changes, running what-if analyses, or maintaining environment-specific overlays.
  • Verifiable data. Transactions and commits can be signed (JWS / W3C Verifiable Credentials). The commit chain is content-addressed, so any tampering is detectable. Pair it with policy enforcement to prove who changed what and when they were allowed to.
  • Policy-based access control. Policies are written as graph data, evaluated per query and per transaction, and travel with the ledger — not bolted on at the API layer.
  • Storage your way. Local filesystem for development, S3 + DynamoDB for production, IPFS for content-addressed distribution. The same ledger format works across all of them.
  • Search built in. BM25 full-text indexing and HNSW vector search live alongside SPARQL — no separate search service to operate.
  • Reasoning. OWL/RDFS inference and Datalog rules run inside the query engine, so derived facts are queryable without a materialization step.
  • Embeddable. The same engine that powers the server runs as a Rust library, generic over storage and nameservice. Use it directly in your application or run it standalone over HTTP.

Start here

Explore the docs

  • Concepts — ledgers, graph sources, IRIs, time travel, policy, verifiable data, reasoning
  • Guides (cookbooks) — search, time travel, branching, policies, SHACL — task-oriented recipes
  • CLI reference — every fluree command, flag by flag
  • HTTP API — endpoints, headers, signed requests, error model
  • Query — JSON-LD Query, SPARQL, output formats, CONSTRUCT, explain plans, reasoning
  • Transactions — insert, upsert, update, conditional updates, signed transactions
  • Security and policy — authentication, encryption, commit signing, policy model
  • Indexing and search — background indexing, BM25, vector search, geospatial
  • Graph sources and integrations — Iceberg/Parquet, R2RML, BM25 graph source
  • Operations — configuration, Docker, storage modes, telemetry, archive/restore
  • Design — internals: query execution, storage traits, index format, nameservice
  • Reference — glossary, vocabulary, OWL/RDFS support, crate map
  • Troubleshooting — common errors, debugging queries, performance tracing
  • Contributing — dev setup, tests, SPARQL compliance, releasing

The full table of contents is in SUMMARY.md.

Fluree Memory

Fluree Memory is persistent, searchable memory for AI coding assistants — built on Fluree DB and shipped in the same fluree binary. If you’re here for the memory tooling, jump straight to the Memory docs.

Fluree CLI

The fluree command-line interface provides a convenient way to manage ledgers, run queries, and perform transactions without running a server.

Installation

Build from source:

cargo build --release -p fluree-db-cli

The binary will be at target/release/fluree.

Quick Start

# Initialize a project directory
fluree init

# Create a ledger
fluree create myledger

# Insert data
fluree insert '@prefix ex: <http://example.org/> .
ex:alice a ex:Person ; ex:name "Alice" .'

# Query
fluree query 'SELECT ?name WHERE { ?s <http://example.org/name> ?name }'

Global Options

| Option | Description |
|---|---|
| -v, --verbose | Enable verbose output |
| -q, --quiet | Suppress non-essential output |
| --no-color | Disable colored output (also respects NO_COLOR env var) |
| --config <PATH> | Path to config file |
| --memory-budget-mb <MB> | Memory budget in MB for bulk import (0 = auto: 75% of system RAM). Affects chunk size, concurrency, and run budget when creating a ledger with --from. |
| --parallelism <N> | Number of parallel parse threads for bulk import (0 = auto: system cores, default cap 6). Used when creating a ledger with --from. |
| -h, --help | Print help |
| -V, --version | Print version |

Commands

Core Commands

| Command | Description |
|---|---|
| init | Initialize a new Fluree project directory |
| create | Create a new ledger |
| use | Set the active ledger |
| list | List all ledgers |
| info | Show detailed information about a ledger |
| drop | Drop (delete) a ledger |
| insert | Insert data into a ledger |
| upsert | Upsert data (insert or update existing) |
| update | Update with WHERE/DELETE/INSERT patterns |
| query | Query a ledger |
| history | Show change history for an entity |
| export | Export ledger data |
| log | Show commit log |
| show | Show decoded commit contents (flakes with resolved IRIs) |
| index | Build or update the binary index (incremental) |
| reindex | Full reindex from commit history |

Remote Sync

| Command | Description |
|---|---|
| remote | Manage remote servers |
| upstream | Manage upstream tracking configuration |
| fetch | Fetch refs from a remote |
| clone | Clone a ledger from a remote (full commit download) |
| pull | Pull commits from upstream |
| push | Push to upstream remote |
| track | Track remote-only ledgers (no local data) |

Clone and pull transfer commits and, by default, binary index data from the remote (pack protocol), so the local ledger is query-ready without a separate reindex. Use --no-indexes to skip index transfer and reduce download size; run fluree reindex afterward if you need the index. Large transfers may prompt for confirmation before streaming.

Server Management

| Command | Description |
|---|---|
| server | Manage the Fluree HTTP server (run, start, stop, status, restart, logs) |

Start a server directly from a project directory — it inherits the same .fluree/ context (config, storage) as the CLI. See server for details.

Implementers

If you’re building a custom server that must support the CLI end-to-end (for example, integrating into another app), see:

Authentication

| Command | Description |
|---|---|
| token | Create, inspect, and manage JWS tokens |
| auth | Manage bearer tokens stored on remotes (login/logout/status) |

Configuration

| Command | Description |
|---|---|
| config | Manage configuration |
| prefix | Manage IRI prefix mappings |
| completions | Generate shell completions |

Developer Memory

| Command | Description |
|---|---|
| memory | Store and recall facts, decisions, constraints, preferences, and artifact references |
| mcp | MCP server for IDE agent integration |

For background, IDE setup, team workflows, and the mem: schema, see the Memory section of the docs.

Project Structure

When you run fluree init, a .fluree/ directory is created with:

.fluree/
├── active          # Currently active ledger name
├── config.toml     # Configuration settings
├── prefixes.json   # IRI prefix mappings
└── storage/        # Ledger data storage

Input Resolution

Commands that accept data input (insert, upsert, update, query) use flexible argument resolution:

| Arguments | Behavior |
|---|---|
| (none) | Active ledger; provide input via -e, -f, or stdin |
| <arg> | Auto-detected: if it looks like a query or data, uses it inline; if it’s an existing file, reads from it; otherwise treats it as a ledger name |
| <ledger> <input> | Specified ledger + inline input |

Input is resolved in this priority order: -e flag > positional inline > -f flag > positional file > stdin.
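That priority chain can be pictured with a small sketch (illustrative Python; `resolve_input` is a hypothetical name, not the CLI's actual implementation):

```python
def resolve_input(expr=None, inline=None, file_flag=None,
                  positional_file=None, stdin=None):
    """Return the first input source present, in the documented order:
    -e flag > positional inline > -f flag > positional file > stdin."""
    for source in (expr, inline, file_flag, positional_file, stdin):
        if source is not None:
            return source
    raise ValueError("no input provided")
```

So `fluree insert -f data.ttl '<inline>'` would take the inline positional over the file, since positional inline outranks the -f flag.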

Data Format Detection

The CLI auto-detects data format based on content:

  • Lines starting with @prefix or @base → Turtle
  • Content starting with { or [ → JSON-LD
  • Files with .ttl extension → Turtle
  • Files with .json or .jsonld extension → JSON-LD

You can override with --format turtle or --format jsonld.
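The heuristic amounts to something like the following sketch (illustrative Python, assuming a hypothetical `detect_format` helper; the CLI's real implementation may differ):

```python
def detect_format(content: str, filename: str = "") -> str:
    """Sketch of the documented auto-detection rules."""
    # File extension is decisive when a file path is given.
    if filename.endswith(".ttl"):
        return "turtle"
    if filename.endswith((".json", ".jsonld")):
        return "jsonld"
    # Otherwise sniff the content itself.
    if any(line.lstrip().startswith(("@prefix", "@base"))
           for line in content.splitlines()):
        return "turtle"
    if content.lstrip().startswith(("{", "[")):
        return "jsonld"
    raise ValueError("ambiguous input; pass --format explicitly")
```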

fluree init

Initialize a new Fluree project directory.

Usage

fluree init [OPTIONS]

Options

| Option | Description |
|---|---|
| --global | Create global config and data directories instead of a local .fluree/ directory |

Description

Creates a .fluree/ directory in the current working directory (or global directories with --global). This directory stores:

  • active - The currently active ledger name
  • config.toml - Configuration settings
  • prefixes.json - IRI prefix mappings for compact IRIs
  • storage/ - Ledger data

Running init is idempotent - it won’t overwrite existing configuration.

Examples

# Initialize in current directory
fluree init

# Initialize global config
fluree init --global

Global Directory

With --global, the directories are determined by:

  1. $FLUREE_HOME environment variable (if set) — both config and data go in this single directory.
  2. Platform directories (when $FLUREE_HOME is not set):
| Content | Linux | macOS | Windows |
|---|---|---|---|
| Config (config.toml) | $XDG_CONFIG_HOME/fluree (default: ~/.config/fluree) | ~/Library/Application Support/fluree | %LOCALAPPDATA%\fluree |
| Data (storage/, active, prefixes.json) | $XDG_DATA_HOME/fluree (default: ~/.local/share/fluree) | ~/Library/Application Support/fluree | %LOCALAPPDATA%\fluree |

On macOS and Windows both resolve to the same directory (unified); on Linux config and data are separated per XDG conventions.

The generated config.toml will contain an absolute storage_path pointing to the data directory when the directories are split.
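The lookup order above can be sketched as follows (illustrative Python; `global_dirs` is a hypothetical helper, not Fluree code):

```python
import os

def global_dirs(platform: str, env: dict) -> tuple:
    """Return (config_dir, data_dir) per the documented rules:
    $FLUREE_HOME wins; otherwise platform defaults, unified on
    macOS/Windows and split per XDG conventions on Linux."""
    home = env.get("FLUREE_HOME")
    if home:
        return home, home  # config and data share one directory
    if platform == "macos":
        d = os.path.expanduser("~/Library/Application Support/fluree")
        return d, d
    if platform == "windows":
        d = os.path.join(env.get("LOCALAPPDATA", ""), "fluree")
        return d, d
    # Linux: XDG split between config and data
    cfg = env.get("XDG_CONFIG_HOME", os.path.expanduser("~/.config"))
    data = env.get("XDG_DATA_HOME", os.path.expanduser("~/.local/share"))
    return os.path.join(cfg, "fluree"), os.path.join(data, "fluree")
```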

See Also

  • create - Create a new ledger after initialization

fluree create

Create a new ledger.

Usage

fluree create <LEDGER> [OPTIONS]

Arguments

| Argument | Description |
|---|---|
| <LEDGER> | Name for the new ledger |

Options

| Option | Description |
|---|---|
| --from <PATH> | Import data from a file (Turtle or JSON-LD) |
| --memory [PATH] | Import memory history from a git-tracked .fluree-memory/ directory. Defaults to the current repo if no path is given. Mutually exclusive with --from. |
| --no-user | Exclude user-scoped memories (.local/user.ttl) from --memory import |
| --chunk-size-mb <MB> | Chunk size in MB for splitting large Turtle files (0 = derive from memory budget). Only used when --from points to a .ttl file. |
| --leaflet-rows <N> | Rows per leaflet in the binary index (default: 25000). Larger values produce fewer, bigger leaflets — less I/O per scan, more memory per read. |
| --leaflets-per-leaf <N> | Leaflets per leaf file (default: 10). Larger values produce fewer leaf files — shallower tree, bigger reads. |

Global flags that affect bulk import when using --from (see CLI README):

  • --memory-budget-mb <MB> — Memory budget in MB (0 = auto: 75% of system RAM). Drives chunk size, concurrency, and indexer run budget.
  • --parallelism <N> — Number of parallel parse threads (0 = auto: system cores, cap 6).

Description

Creates a new empty ledger with the given name and sets it as the active ledger. The ledger is stored in .fluree/storage/.

Use --from to create a ledger pre-populated with data from a Turtle or JSON-LD file. For large Turtle files, the CLI splits work into chunks and runs parallel parse threads; tune with --memory-budget-mb and --parallelism if needed.

Use --memory to import your project’s developer memory history into a time-travel-capable Fluree ledger. Each git commit that touched .fluree-memory/repo.ttl (and .local/user.ttl unless --no-user is set) becomes a Fluree transaction. The git commit message, SHA, and author date are stored as transaction metadata, so you can correlate Fluree t values with git history.

Examples

# Create an empty ledger
fluree create mydb

# Create with initial data
fluree create mydb --from seed-data.ttl

# Create from JSON-LD
fluree create mydb --from initial.jsonld

# Create with explicit memory and parallelism for a large Turtle file
fluree create mydb --from large.ttl --memory-budget-mb 4096 --parallelism 8

# Import memory history from the current repo
fluree create memories --memory

# Import memory history from another repo, excluding user memories
fluree create memories --memory /path/to/other/repo --no-user

Output

Created ledger 'mydb'
Set 'mydb' as active ledger

With --from:

Created ledger 'mydb'
Committed t=1 (42 flakes)
Set 'mydb' as active ledger

With --memory:

Created ledger 'memories' with 42 commits (t=1..43)
  Earliest: bf803255 — initial memory store
  Latest:   9865e5cd — prevent overrides of fluree txn-meta

Query with time travel:
  fluree query memories 'SELECT ?id ?content WHERE { ?id a mem:Fact ; mem:content ?content } LIMIT 5'
  fluree query memories --at-t 2 'SELECT ...'   # state at first commit

See Also

  • list - List all ledgers
  • use - Switch active ledger
  • drop - Delete a ledger

fluree use

Set the active ledger.

Usage

fluree use <LEDGER>

Arguments

| Argument | Description |
|---|---|
| <LEDGER> | Ledger name to set as active |

Description

Sets the specified ledger as the active ledger. Subsequent commands that don’t specify a ledger will use this one.

Examples

# Switch to a different ledger
fluree use production

# Verify with info
fluree info

Output

Active ledger set to 'production'

Errors

If the ledger doesn’t exist:

error: ledger 'nonexistent' not found

See Also

  • list - List all ledgers
  • create - Create a new ledger

fluree list

List all ledgers and graph sources.

Usage

fluree list

Description

Lists all ledgers and graph sources (Iceberg, R2RML, BM25, Vector, etc.) in the current Fluree directory. The active ledger is marked with an asterisk (*).

When graph sources are present, a TYPE column is shown to distinguish ledgers from graph sources.

Examples

fluree list

Output

When only ledgers exist:

   LEDGER      BRANCH  T
 * mydb        main    5
   production  main    12

When graph sources are also present:

   NAME              BRANCH  TYPE     T
 * mydb              main    Ledger   5
   production        main    Ledger   12
   warehouse-orders  main    Iceberg  -
   my-search         main    BM25     5

If nothing exists:

No ledgers found. Run 'fluree create <name>' to create one.

See Also

  • create - Create a new ledger
  • iceberg - Map Iceberg tables as graph sources
  • info - Show detailed ledger or graph source information
  • use - Switch active ledger

fluree info

Show detailed information about a ledger or graph source.

Usage

fluree info [NAME] [--remote <name>] [--graph <name|IRI>]

Arguments

| Argument | Description |
|---|---|
| [NAME] | Ledger or graph source name (defaults to active ledger) |

Options

| Option | Description |
|---|---|
| --remote <name> | Query a remote server (e.g., origin) instead of the local installation |
| --graph <name\|IRI> | Scope the stats block to a single named graph within the ledger. Accepts a well-known name (default, txn-meta) or a graph IRI. Not applicable to graph sources. |

Description

Displays detailed information about a ledger or graph source. The command first checks for a matching ledger; if none is found, it checks for a graph source with the same name.

For ledgers, displays:

  • Ledger ID, branch, and type
  • Current transaction number (t)
  • Commit and index details

For graph sources (Iceberg, R2RML, BM25, etc.), displays:

  • Name, branch, and type
  • Graph source ID
  • Index status
  • Dependencies
  • Configuration (catalog URI, table, mapping, etc.)

Examples

# Info for active ledger
fluree info

# Info for specific ledger
fluree info production

# Info for a graph source
fluree info warehouse-orders

# Query a remote server
fluree info production --remote origin

# Scope stats to the default graph
fluree info mydb --graph default

# Scope stats to the transaction-metadata graph
fluree info mydb --graph txn-meta

# Scope stats to a specific named graph by IRI
fluree info mydb --graph https://example.org/graphs/inventory

When --graph is set, the command prints the full ledger-info JSON response with the stats block scoped to the selected graph (properties, classes, flakes, size).

Output

Ledger:

Ledger:         mydb
Branch:         main
Type:           Ledger
Ledger ID:      mydb:main
Commit t:       5
Commit ID:      bafybeig...
Index t:        5
Index ID:       bafybeig...

Graph source (Iceberg):

Name:           warehouse-orders
Branch:         main
Type:           Iceberg
ID:             warehouse-orders:main
Retracted:      false
Index t:        0
Index ID:       (none)

Configuration:
{
  "catalog": {
    "type": "rest",
    "uri": "https://polaris.example.com/api/catalog"
  },
  "table": "sales.orders",
  ...
}

See Also

  • list - List all ledgers and graph sources
  • iceberg - Map Iceberg tables as graph sources
  • log - Show commit history

fluree branch

Manage branches for a ledger.

Subcommands

fluree branch create

Create a new branch.

Usage:

fluree branch create <NAME> [OPTIONS]

Arguments:

| Argument | Description |
|---|---|
| <NAME> | Name for the new branch (e.g., “dev”, “feature-x”) |

Options:

| Option | Description |
|---|---|
| --ledger <LEDGER> | Ledger name (defaults to active ledger) |
| --from <BRANCH> | Source branch to create from (defaults to “main”) |
| --at <COMMIT-REF> | Commit to branch at (defaults to source branch HEAD). Accepts t:N for a transaction number or a hex digest / full CID. |
| --remote <REMOTE> | Execute against a remote server |

Description:

Creates a new branch for a ledger. By default the branch starts at the source branch’s current HEAD, and is fully isolated — subsequent transactions on either branch are invisible to the other.

Pass --at to branch from a historical commit on the source branch instead of its HEAD. The commit must be reachable from the source HEAD; the new branch starts with no index and replays from genesis on first query. t:N and hex-prefix resolution require the source branch to be indexed (full CIDs work unconditionally).

Branches can be nested: you can create a branch from any existing branch, not just “main”.
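The three --at ref forms can be told apart roughly like this (illustrative Python; `classify_ref` is a hypothetical helper, and the length cutoff for distinguishing a hex prefix from a full CID is an assumption, not Fluree's actual rule):

```python
def classify_ref(ref: str) -> tuple:
    """Rough classification of a --at commit reference."""
    if ref.startswith("t:") and ref[2:].isdigit():
        return ("t", ref[2:])            # transaction number; needs an index
    if all(c in "0123456789abcdef" for c in ref.lower()) and len(ref) < 32:
        return ("hex-prefix", ref)       # digest prefix; needs an index
    return ("cid", ref)                  # full CID; resolves unconditionally
```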

Examples:

# Create a branch from main (default)
fluree branch create dev

# Create a branch for a specific ledger
fluree branch create dev --ledger mydb

# Create a branch from another branch
fluree branch create feature-x --from dev

# Branch at a historical point on main (transaction number)
fluree branch create rewind --at t:5

# Branch at a historical commit by hex-digest prefix
fluree branch create rewind --at 3dd028a7

# Create a branch on a remote server
fluree branch create staging --ledger mydb --remote origin

Output:

Created branch 'dev' from 'main' at t=5
Ledger ID: mydb:dev

fluree branch list

List all branches for a ledger.

Usage:

fluree branch list [LEDGER] [OPTIONS]

Arguments:

| Argument | Description |
|---|---|
| [LEDGER] | Ledger name (defaults to active ledger) |

Options:

| Option | Description |
|---|---|
| --remote <REMOTE> | List branches on a remote server |

Examples:

# List branches for the active ledger
fluree branch list

# List branches for a specific ledger
fluree branch list mydb

# List branches on a remote server
fluree branch list mydb --remote origin

Output:

 BRANCH     T   SOURCE
 main       5   -
 dev        7   main
 feature-x  8   dev

fluree branch drop

Drop a branch from a ledger.

Usage:

fluree branch drop <NAME> [OPTIONS]

Arguments:

| Argument | Description |
|---|---|
| <NAME> | Branch name to drop (e.g., “dev”, “feature-x”) |

Options:

| Option | Description |
|---|---|
| --ledger <LEDGER> | Ledger name (defaults to active ledger) |
| --remote <REMOTE> | Execute against a remote server |

Description:

Drops a branch from a ledger. The main branch cannot be dropped.

  • Leaf branches (no children) are fully deleted — storage artifacts are removed and the NsRecord is purged.
  • Branches with children are retracted (hidden from listings, reject new transactions) but storage is preserved so that child branches continue to work. When the last child is eventually dropped, the retracted parent is automatically cascade-purged.
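A toy model of those two rules and the cascade (illustrative Python only; `Branches` is a hypothetical stand-in, not Fluree's branch bookkeeping):

```python
class Branches:
    """Toy model of the documented drop semantics."""
    def __init__(self):
        self.parent = {}        # branch -> parent branch
        self.retracted = set()  # hidden branches whose storage is kept

    def create(self, name, source="main"):
        self.parent[name] = source

    def children(self, name):
        return [b for b, p in self.parent.items() if p == name]

    def drop(self, name):
        if name == "main":
            raise ValueError("main branch cannot be dropped")
        if self.children(name):
            self.retracted.add(name)       # hidden, storage preserved
            return f"retracted {name}"
        parent = self.parent.pop(name)     # leaf: fully deleted
        self.retracted.discard(name)
        # cascade: purge a retracted parent once its last child is gone
        if parent in self.retracted and not self.children(parent):
            self.retracted.discard(parent)
            self.parent.pop(parent, None)
            return f"dropped {name}, cascaded {parent}"
        return f"dropped {name}"
```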

Examples:

# Drop a branch
fluree branch drop dev

# Drop a branch for a specific ledger
fluree branch drop dev --ledger mydb

# Drop a branch on a remote server
fluree branch drop staging --ledger mydb --remote origin

Output (leaf branch):

Dropped branch 'dev'.
  Artifacts deleted: 5

Output (branch with children):

Branch 'dev' retracted (has children, storage preserved).

Output (cascade):

Dropped branch 'feature'.
  Artifacts deleted: 3
  Cascaded drops: mydb:dev

fluree branch rebase

Rebase a branch onto its source branch’s current HEAD.

Usage:

fluree branch rebase <NAME> [OPTIONS]

Arguments:

| Argument | Description |
|---|---|
| <NAME> | Branch name to rebase (e.g., “dev”, “feature-x”) |

Options:

| Option | Description |
|---|---|
| --ledger <LEDGER> | Ledger name (defaults to active ledger) |
| --strategy <STRATEGY> | Conflict resolution strategy (default: “take-both”). Options: take-both, abort, take-source, take-branch, skip |
| --remote <REMOTE> | Execute against a remote server |

Description:

Replays a branch’s unique commits on top of the source branch’s current HEAD. This brings the branch up to date with upstream changes. The main branch cannot be rebased.

If the branch has no unique commits, a fast-forward rebase is performed — the branch point is simply updated to the source’s current HEAD.

Conflicts occur when both the branch and source have modified the same (subject, predicate, graph) tuples. See conflict strategies for details.
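In set terms, a conflict key is any tuple both sides touched since the branch point. A minimal sketch (illustrative Python; `conflict_keys` and the four-tuple edit shape are assumptions for the example):

```python
def conflict_keys(source_edits, branch_edits):
    """Conflicting (subject, predicate, graph) keys: tuples that both
    the source branch and the rebasing branch modified."""
    touched = lambda edits: {(s, p, g) for (s, p, g, _value) in edits}
    return touched(source_edits) & touched(branch_edits)
```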

Examples:

# Rebase with default strategy
fluree branch rebase dev

# Rebase with abort-on-conflict strategy
fluree branch rebase dev --strategy abort

# Rebase for a specific ledger
fluree branch rebase feature-x --ledger mydb --strategy take-source

# Rebase on a remote server
fluree branch rebase dev --ledger mydb --remote origin

Output (fast-forward):

Fast-forward rebase of 'dev' to t=5.

Output (with replay):

Rebased 'dev': 3 commits replayed, 0 skipped, 1 conflicts, 0 failures.
  New branch point: t=8

fluree branch diff

Show a read-only merge preview between two branches.

Usage:

fluree branch diff <SOURCE> [OPTIONS]

Arguments:

| Argument | Description |
|---|---|
| <SOURCE> | Source branch name to preview merging from (e.g., “dev”, “feature-x”) |

Options:

| Option | Description |
|---|---|
| --target <BRANCH> | Target branch to preview merging into (defaults to source’s parent branch) |
| --max-commits <N> | Cap on per-side commit summaries shown (default: 50; pass 0 for unbounded in local mode) |
| --max-conflict-keys <N> | Cap on conflict keys shown (default: 50; pass 0 for unbounded in local mode) |
| --no-conflicts | Skip conflict computation for a cheaper preview |
| --conflict-details | Include source/target flake values for returned conflict keys |
| --strategy <STRATEGY> | Strategy used for conflict detail labels (default: take-both). Options: take-both, abort, take-source, take-branch |
| --json | Emit the raw JSON preview |
| --ledger <LEDGER> | Ledger name (defaults to active ledger) |
| --remote <REMOTE> | Execute against a remote server |

Description:

branch diff reports ahead/behind commits, fast-forward eligibility, and conflicting (subject, predicate, graph) keys without mutating state. With --conflict-details, the preview also shows the source and target values for the returned conflict keys and annotates what the selected strategy would do.

Examples:

# Preview merging dev into its parent
fluree branch diff dev

# Preview a specific target
fluree branch diff dev --target main

# Show value details and source-winning labels
fluree branch diff dev --target main --conflict-details --strategy take-source

# Emit raw JSON for UI tooling
fluree branch diff dev --conflict-details --json

fluree branch merge

Merge a source branch into a target branch.

Usage:

fluree branch merge <SOURCE> [OPTIONS]

Arguments:

| Argument | Description |
|---|---|
| <SOURCE> | Source branch name to merge from (e.g., “dev”, “feature-x”) |

Options:

| Option | Description |
|---|---|
| --ledger <LEDGER> | Ledger name (defaults to active ledger) |
| --target <BRANCH> | Target branch to merge into (defaults to source’s parent branch) |
| --strategy <STRATEGY> | Conflict resolution strategy (default: take-both). Options: take-both, abort, take-source, take-branch. |
| --remote <REMOTE> | Execute against a remote server |

Description:

Merges a source branch into a target branch. When the target hasn’t advanced since the source branched, this is a fast-forward; otherwise --strategy controls how conflicting edits are resolved (mirroring branch rebase).

When --target is omitted, the merge target is inferred from the source branch’s parent (the branch it was created from).

After a successful merge, the source branch remains intact and can continue to receive new transactions and be merged again. Only the new commits since the last merge (or branch creation) are copied.

Examples:

# Merge dev into main (inferred from branch point)
fluree branch merge dev

# Merge feature-x into dev (explicit target)
fluree branch merge feature-x --target dev

# Merge for a specific ledger
fluree branch merge dev --ledger mydb

# Merge with source-winning conflict resolution
fluree branch merge dev --target main --strategy take-source

# Merge on a remote server
fluree branch merge dev --ledger mydb --remote origin

Output:

Merged 'dev' into 'main' (fast-forward to t=8, 3 commits copied).

Output (non-fast-forward):

Merged 'dev' into 'main' (t=9, 3 commits copied, 1 conflicts).

See Also

  • create - Create a new ledger
  • list - List all ledgers
  • info - Show ledger details
  • use - Switch active ledger

fluree drop

Drop (delete) a ledger or graph source.

Usage

fluree drop <NAME> --force

Arguments

| Argument | Description |
|---|---|
| <NAME> | Ledger or graph source name to drop |

Options

| Option | Description |
|---|---|
| --force | Required flag to confirm deletion |

Description

Permanently deletes a ledger or graph source. The --force flag is required to prevent accidental deletion.

The command first tries to drop the name as a ledger. If no ledger is found, it tries to drop it as a graph source. This means fluree drop works uniformly for both ledgers and graph sources like Iceberg mappings.

Examples

# Delete a ledger
fluree drop oldledger --force

# Delete a graph source (Iceberg mapping)
fluree drop warehouse-orders --force

Output

Ledger:

Dropped ledger 'oldledger'

Graph source:

Dropped graph source 'warehouse-orders:main'

Errors

Without --force:

error: use --force to confirm deletion of 'oldledger'

See Also

  • create - Create a new ledger
  • iceberg - Map Iceberg tables as graph sources
  • list - List all ledgers and graph sources

fluree insert

Insert data into a ledger.

Usage

fluree insert [LEDGER] [DATA] [OPTIONS]

Arguments

| Arguments | Behavior |
|---|---|
| (none) | Active ledger; provide data via -e, -f, or stdin |
| <arg> | Auto-detected: if it looks like data (JSON, Turtle), uses it inline with the active ledger; if it’s an existing file, reads from it; otherwise treats it as a ledger name |
| <ledger> <data> | Specified ledger + inline data |

Options

| Option | Description |
|---|---|
| -e, --expr <EXPR> | Inline data expression (alternative to positional) |
| -f, --file <FILE> | Read data from a file |
| -m, --message <MSG> | Commit message |
| --format <FORMAT> | Data format: turtle or jsonld (auto-detected if omitted) |
| --remote <NAME> | Execute against a remote server (by remote name, e.g., origin) |

Description

Inserts RDF data into a ledger. Supports both Turtle and JSON-LD formats. Data can come from:

  • A positional argument (inline data)
  • -e flag (inline expression)
  • -f flag (file)
  • Standard input (pipe)

Examples

# Insert inline Turtle
fluree insert '@prefix ex: <http://example.org/> .
ex:alice a ex:Person ; ex:name "Alice" .'

# Insert inline JSON-LD
fluree insert '{"@id": "ex:bob", "ex:name": "Bob"}'

# Insert from file
fluree insert -f data.ttl

# Insert with commit message
fluree insert -f data.ttl -m "Added initial users"

# Insert into specific ledger
fluree insert production '<http://example.org/x> a <http://example.org/Thing> .'

# Pipe from stdin
cat data.ttl | fluree insert

Output

Committed t=1 (42 flakes)

With verbose mode:

Committed t=1 (42 flakes)
Commit ID: bafybeig...

Data Format Detection

The format is auto-detected:

  • @prefix or @base at line start → Turtle
  • Starts with { or [ → JSON-LD
  • .ttl file extension → Turtle
  • .json or .jsonld extension → JSON-LD

Override with --format turtle or --format jsonld.

See Also

  • upsert - Insert or update existing data
  • update - Full WHERE/DELETE/INSERT updates
  • query - Query the inserted data
  • export - Export all data

fluree upsert

Upsert data into a ledger (insert or update existing).

Usage

fluree upsert [LEDGER] [DATA] [OPTIONS]

Arguments

| Arguments | Behavior |
|---|---|
| (none) | Active ledger; provide data via -e, -f, or stdin |
| <arg> | Auto-detected: if it looks like data (JSON, Turtle), uses it inline with the active ledger; if it’s an existing file, reads from it; otherwise treats it as a ledger name |
| <ledger> <data> | Specified ledger + inline data |

Options

| Option | Description |
|---|---|
| -e, --expr <EXPR> | Inline data expression (alternative to positional) |
| -f, --file <FILE> | Read data from a file |
| -m, --message <MSG> | Commit message |
| --format <FORMAT> | Data format: turtle or jsonld (auto-detected if omitted) |
| --remote <NAME> | Execute against a remote server (by remote name, e.g., origin) |

Description

Upserts RDF data into a ledger. Unlike insert, upsert will:

  • Insert new entities
  • Replace existing values for entities that already exist (matched by @id)

This is useful for updating data without needing to know whether it exists.

Examples

# Update or insert a user
fluree upsert '@prefix ex: <http://example.org/> .
ex:alice ex:name "Alice Smith" ; ex:age 31 .'

# Upsert from file
fluree upsert -f updates.ttl

# Upsert with commit message
fluree upsert '{"@id": "ex:alice", "ex:status": "active"}' -m "Updated Alice status"

Output

Committed t=2 (3 flakes)

Difference from Insert

| Operation | Existing Entity | New Entity |
|---|---|---|
| insert | Adds new triples (may create duplicates) | Creates entity |
| upsert | Replaces values for given predicates | Creates entity |
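Modeling the graph as a set of (subject, predicate, object) triples makes the contrast concrete (illustrative Python; `insert`/`upsert` here are toy functions, not Fluree internals):

```python
def insert(triples, new):
    """insert: add triples as-is; an existing predicate keeps its old
    values alongside the new ones, so duplicates can accumulate."""
    return triples | set(new)

def upsert(triples, new):
    """upsert: drop existing values for each (subject, predicate)
    mentioned in the new data, then add the new triples."""
    keys = {(s, p) for (s, p, _o) in new}
    kept = {t for t in triples if (t[0], t[1]) not in keys}
    return kept | set(new)
```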

See Also

  • insert - Insert without replacement
  • update - Full WHERE/DELETE/INSERT updates
  • query - Query data
  • history - View change history

fluree update

Update data with full WHERE/DELETE/INSERT semantics.

Usage

fluree update [LEDGER] [DATA] [OPTIONS]

Arguments

| Arguments | Behavior |
|---|---|
| (none) | Active ledger; provide data via -e, -f, or stdin |
| <arg> | Auto-detected: if it looks like data (JSON or SPARQL UPDATE), uses it inline with the active ledger; if it’s an existing file, reads from it; otherwise treats it as a ledger name |
| <ledger> <data> | Specified ledger + inline data |

Options

| Option | Description |
|---|---|
| -e, --expr <EXPR> | Inline data expression (alternative to positional) |
| -f, --file <FILE> | Read data from a file |
| -m, --message <MSG> | Commit message |
| --format <FORMAT> | Data format: jsonld or sparql (auto-detected if omitted) |
| --remote <NAME> | Execute against a remote server (by remote name) |
| --direct | Bypass auto-routing through a local server (global flag; see note on SPARQL UPDATE below) |

Description

Executes a WHERE/DELETE/INSERT transaction against a ledger. Unlike insert (which only adds data) and upsert (which replaces by subject+predicate), update supports the full WHERE/DELETE/INSERT pattern, enabling:

  • Conditional deletes: delete triples matching a WHERE pattern
  • Conditional updates: delete old values and insert new ones based on WHERE matches
  • Computed updates: use variables from WHERE to derive new values via bind
  • Delete-only operations: WHERE + DELETE without INSERT
  • Insert-only operations: equivalent to insert but using the update command

Supported Formats

  • JSON-LD (default): transaction body with where, delete, and/or insert keys
  • SPARQL UPDATE: standard INSERT DATA, DELETE DATA, DELETE/INSERT WHERE syntax

SPARQL UPDATE Note

SPARQL UPDATE requires the server’s parsing pipeline. It works automatically when:

  • A local server is running (the CLI auto-routes through it by default)
  • Using --remote to target a remote server

For direct local mode (--direct), use JSON-LD format instead.

Examples

Conditional Property Update (JSON-LD)

# Update Alice's age: find old value, delete it, insert new one
fluree update '{
  "@context": {"ex": "http://example.org/"},
  "where": [{"@id": "ex:alice", "ex:age": "?oldAge"}],
  "delete": [{"@id": "ex:alice", "ex:age": "?oldAge"}],
  "insert": [{"@id": "ex:alice", "ex:age": 31}]
}'

Delete-Only

# Remove all email addresses for alice
fluree update '{
  "@context": {"ex": "http://example.org/"},
  "where": [{"@id": "ex:alice", "ex:email": "?email"}],
  "delete": [{"@id": "ex:alice", "ex:email": "?email"}]
}'

Bulk Conditional Update

# Set all "pending" users to "active"
fluree update '{
  "@context": {"ex": "http://example.org/"},
  "where": [{"@id": "?person", "ex:status": "pending"}],
  "delete": [{"@id": "?person", "ex:status": "pending"}],
  "insert": [{"@id": "?person", "ex:status": "active"}]
}'

From File

fluree update -f update.json
fluree update -f update.json -m "Updated user statuses"

SPARQL UPDATE (via server)

# Requires a running server (fluree server start)
fluree update -e 'PREFIX ex: <http://example.org/>
DELETE { ex:alice ex:age ?oldAge }
INSERT { ex:alice ex:age 31 }
WHERE { ex:alice ex:age ?oldAge }'
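
Computed Update with BIND (SPARQL, via server)

The bind-based computed update mentioned above can be expressed with standard SPARQL 1.1 BIND. This is a sketch only: ex:price and the 10% markup are illustrative, and, like any SPARQL UPDATE, it requires a running or remote server:

```sparql
PREFIX ex: <http://example.org/>

# Raise every ex:price by 10%: delete the old value, insert the computed one
DELETE { ?item ex:price ?oldPrice }
INSERT { ?item ex:price ?newPrice }
WHERE {
  ?item ex:price ?oldPrice .
  BIND (?oldPrice * 1.1 AS ?newPrice)
}
```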

Pipe from stdin

cat update.json | fluree update

Target a specific ledger

fluree update production -f migration.json

Output

Committed t=3, 4 flakes

With remote mode, the full server response is printed as JSON.

Format Detection

The format is auto-detected using this priority:

  1. Explicit flag (--format) — always wins
  2. File extension (when using -f or a positional file path):
    • .json, .jsonld → JSON-LD
    • .rq, .ru, .sparql → SPARQL UPDATE
  3. Content sniffing:
    • Valid JSON (full parse, not just first character) → JSON-LD
    • Starts with INSERT, DELETE, PREFIX, or BASE → SPARQL UPDATE

Override with --format jsonld or --format sparql.

Comparison with Insert and Upsert

Operation   WHERE clause   DELETE                         Conditional   Use case
insert      No             No                             No            Add new data
upsert      No             Auto (per subject+predicate)   No            Replace values for known entities
update      Yes            Explicit                       Yes           Targeted updates, deletes, complex transformations

fluree query

Query a ledger.

Usage

fluree query [LEDGER] [QUERY] [OPTIONS]

Arguments

Argument           Behavior
(none)             Active ledger; provide query via -e, -f, or stdin
<arg>              Auto-detected: if it looks like a query, uses it inline with the active ledger; if it’s an existing file, reads from it; otherwise treats it as a ledger name
<ledger> <query>   Specified ledger + inline query

Options

Option               Description
-e, --expr <EXPR>    Inline query expression (alternative to positional)
-f, --file <FILE>    Read query from a file
--format <FORMAT>    Output format: json, typed-json, table, csv, or tsv (default: table)
--sparql             Force SPARQL query format
--jsonld             Force JSON-LD query format
--at <TIME>          Query at a specific point in time
--normalize-arrays   Always wrap multi-value properties in arrays (graph-crawl JSON-LD queries only)
--bench              Benchmark mode: time execution only and print the first 5 rows as a table (no full-result JSON formatting)
--explain            Print the query plan without executing it
--remote <NAME>      Execute against a remote server (by remote name, e.g., origin)

Description

Executes a query against a ledger. Supports both SPARQL and JSON-LD query formats.

Query Formats

SPARQL

fluree query 'SELECT ?name WHERE { ?s <http://example.org/name> ?name }'

JSON-LD Query

fluree query '{"select": ["?name"], "where": {"http://example.org/name": "?name"}}'

Format is auto-detected if not specified:

  • Contains SELECT, CONSTRUCT, ASK, or DESCRIBE → SPARQL
  • Otherwise → JSON-LD

Output Formats

JSON (default)

fluree query 'SELECT ?name WHERE { ?s <http://example.org/name> ?name }'
{
  "head": {"vars": ["name"]},
  "results": {"bindings": [{"name": {"type": "literal", "value": "Alice"}}]}
}

Table

fluree query --format table 'SELECT ?name WHERE { ?s <http://example.org/name> ?name }'
┌───────┐
│ name  │
├───────┤
│ Alice │
│ Bob   │
└───────┘

CSV

fluree query --format csv 'SELECT ?name WHERE { ?s <http://example.org/name> ?name }'
name
Alice
Bob

Note: --format csv (and --format tsv) are only supported for local ledgers. Tracked/remote ledgers support json and table output.

Time Travel

Query historical states with --at:

# Query at transaction 5
fluree query --at 5 'SELECT * WHERE { ?s ?p ?o }'

# Query at specific commit
fluree query --at abc123def 'SELECT * WHERE { ?s ?p ?o }'

# Query at ISO-8601 timestamp
fluree query --at 2024-01-15T10:30:00Z 'SELECT * WHERE { ?s ?p ?o }'

Tracked/remote ledgers also support --at. The CLI will translate --at into the appropriate dataset/time-travel form when forwarding the query to the remote server.

SPARQL note (remote): if your SPARQL already includes FROM / FROM NAMED, the CLI will not rewrite it for --at. In that case, encode time travel directly in the FROM IRI (e.g., FROM <myledger:main@t:5>).
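
For example, a remote time-travel query that supplies its own FROM clause might look like this sketch (myledger and ex:name are illustrative):

```sparql
PREFIX ex: <http://example.org/>

# Time travel encoded in the dataset IRI: branch main at transaction 5
SELECT ?name
FROM <myledger:main@t:5>
WHERE { ?s ex:name ?name }
```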

Examples

# Inline SPARQL query (most common)
fluree query 'SELECT ?name WHERE { ?s <http://example.org/name> ?name }'

# JSON-LD query inline
fluree query '{"select": {"?s": ["*"]}, "where": {"@id": "?s"}}'

# Query specific ledger with CSV output
fluree query production --format csv 'SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10'

# SPARQL query from file
fluree query -f query.rq

# Time travel query
fluree query --at 3 'SELECT * WHERE { ?s ?p ?o }'

# Pipe from stdin
cat query.rq | fluree query

fluree history

Show change history for an entity.

Usage

fluree history <ENTITY> [OPTIONS]

Arguments

Argument   Description
<ENTITY>   Entity IRI (compact or full)

Options

Option                   Description
--ledger <LEDGER>        Ledger name (defaults to active ledger)
--from <TIME>            Start of time range (default: 1)
--to <TIME>              End of time range (default: latest)
-p, --predicate <PRED>   Filter to specific predicate
--format <FORMAT>        Output format: json, table, or csv (default: table)

Description

Shows the change history for a specific entity across transactions. Each change shows:

  • t - Transaction number
  • op - Operation: + (assert) or - (retract)
  • predicate - The property that changed (if not filtered)
  • value - The value asserted or retracted

Prefix Expansion

Entity IRIs can use stored prefixes:

# First, add a prefix
fluree prefix add ex http://example.org/

# Then use compact IRI
fluree history ex:alice

Or use the full IRI:

fluree history http://example.org/alice

Examples

# Show all changes to an entity
fluree history ex:alice

# Show changes in JSON format
fluree history ex:alice --format json

# Filter to specific predicate
fluree history ex:alice -p ex:name

# Show changes in a time range
fluree history ex:alice --from 1 --to 5

# Query specific ledger
fluree history ex:alice --ledger production

Output

Table (default)

┌───┬────┬─────────────────────────────────┬─────────────┐
│ t │ op │ predicate                       │ value       │
├───┼────┼─────────────────────────────────┼─────────────┤
│ 1 │ +  │ http://example.org/name         │ Alice       │
│ 1 │ +  │ http://example.org/age          │ 30          │
│ 2 │ -  │ http://example.org/name         │ Alice       │
│ 2 │ +  │ http://example.org/name         │ Alice Smith │
└───┴────┴─────────────────────────────────┴─────────────┘

JSON

[
  {"?t": 1, "?op": true, "?p": "http://example.org/name", "?v": "Alice"},
  {"?t": 1, "?op": true, "?p": "http://example.org/age", "?v": 30},
  {"?t": 2, "?op": false, "?p": "http://example.org/name", "?v": "Alice"},
  {"?t": 2, "?op": true, "?p": "http://example.org/name", "?v": "Alice Smith"}
]

CSV

t,op,predicate,value
1,+,http://example.org/name,Alice
1,+,http://example.org/age,30
2,-,http://example.org/name,Alice
2,+,http://example.org/name,Alice Smith

See Also

  • prefix - Manage prefix mappings
  • log - Show commit history
  • query - Run custom queries

fluree export

Export ledger data as Turtle, N-Triples, N-Quads, TriG, or JSON-LD.

Usage

fluree export [LEDGER] [OPTIONS]

Arguments

Argument   Description
[LEDGER]   Ledger name (defaults to active ledger)

Options

Option                  Description
--format <FORMAT>       Output format: turtle (or ttl), ntriples (or nt), jsonld, trig, or nquads (default: turtle)
--all-graphs            Export default + all named graphs including system graphs (dataset export). Requires --format trig or --format nquads.
--graph <IRI>           Export a specific named graph by IRI. Mutually exclusive with --all-graphs.
--context <JSON>        JSON-LD context for prefix declarations. Overrides the ledger’s default context.
--context-file <FILE>   Read context from a JSON file. Overrides the ledger’s default context.
--at <TIME>             Export data as of a specific point in time. Accepts a transaction number (5), ISO-8601 datetime (2024-01-15T10:30:00Z), or commit CID prefix (abc123def456). If omitted, exports at the latest committed time (including data committed but not yet persisted to index).

Formats

turtle / ntriples / nquads / trig / jsonld (data snapshot)

Exports a point-in-time snapshot of all triples in the ledger. Output goes to stdout.

ledger (native pack)

Exports the full native ledger — all commits, transaction blobs, indexes, and dictionaries — as a .flpack file. This format preserves the complete history and can be imported into a new Fluree instance via fluree create <name> --from <file>.flpack.

The .flpack format uses the fluree-pack-v1 binary wire protocol (the same format used by fluree clone and fluree pull for network transfers).

All formats (Turtle, N-Triples, N-Quads, TriG, JSON-LD) read directly from the binary SPOT index with a novelty overlay, so export always includes the latest committed transactions — even those not yet persisted to index. Memory usage stays constant regardless of dataset size. JSON-LD streams one subject at a time, so memory is O(largest subject), not O(dataset).

Prefixes / Context

Turtle, TriG, and JSON-LD output use prefix compaction to produce compact, readable output. The prefix map is resolved in this order:

  1. --context or --context-file (explicit override)
  2. The ledger’s default context (set via fluree context set)
  3. No prefixes (falls back to full IRIs)

The context format is a JSON object mapping prefixes to namespace IRIs:

{"ex": "http://example.org/", "schema": "http://schema.org/"}

Prerequisites

All export formats require a binary index. Ledgers that have only been created and inserted into (without an index build) cannot be exported. Run the server to trigger index building first.

Examples

# Export as Turtle (default) — uses ledger's default context for prefixes
fluree export > backup.ttl

# Export as Turtle with custom prefixes
fluree export --context '{"ex": "http://example.org/"}' > backup.ttl

# Export as Turtle with prefixes from a file
fluree export --context-file prefixes.json > backup.ttl

# Export as N-Triples (no prefixes, one triple per line)
fluree export --format ntriples > backup.nt

# Export as JSON-LD
fluree export --format jsonld > backup.jsonld

# Export all graphs as TriG
fluree export --all-graphs --format trig > backup.trig

# Export all graphs as N-Quads
fluree export --all-graphs --format nquads > backup.nq

# Export a specific named graph
fluree export --graph "http://example.org/g1" --format turtle > g1.ttl

# Export data as of a specific transaction number
fluree export --at 5 > snapshot-at-t5.ttl

# Export data as of an ISO-8601 datetime
fluree export --at "2024-06-15T12:00:00Z" > snapshot.ttl

# Export data as of a specific commit
fluree export --at abc123def456 > at-commit.ttl

# Export specific ledger
fluree export production > prod-backup.ttl

# Pipe to other tools
fluree export | grep "example.org"

Output

Turtle (default)

@prefix ex: <http://example.org/> .

ex:alice
    a ex:Person ;
    ex:name "Alice" .
ex:bob
    a ex:Person ;
    ex:name "Bob" .

N-Triples

<http://example.org/alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/Person> .
<http://example.org/alice> <http://example.org/name> "Alice" .

TriG (all graphs)

@prefix ex: <http://example.org/> .

ex:alice
    ex:name "Alice" .

GRAPH ex:g1 {
ex:bob
    ex:name "Bob" .
}

N-Quads (all graphs)

<http://example.org/alice> <http://example.org/name> "Alice" .
<http://example.org/bob> <http://example.org/name> "Bob" <http://example.org/g1> .

JSON-LD

{
  "@context": {
    "ex": "http://example.org/"
  },
  "@graph": [
    {"@id": "ex:alice", "@type": "ex:Person", "ex:name": "Alice"},
    {"@id": "ex:bob", "@type": "ex:Person", "ex:name": "Bob", "ex:age": {"@value": 25, "@type": "http://www.w3.org/2001/XMLSchema#long"}}
  ]
}

JSON-LD output uses prefix compaction from the context. Value encoding rules:

  • Plain strings (xsd:string) → JSON string (no @type)
  • Booleans → native JSON true/false
  • Integers/longs → {"@value": 42, "@type": "xsd:long"} (explicit datatype)
  • Decimals → {"@value": "3.14", "@type": "xsd:decimal"}
  • Doubles → {"@value": 3.14, "@type": "xsd:double"}
  • Language-tagged strings → {"@value": "Bonjour", "@language": "fr"}
  • References → {"@id": "ex:other"}
  • Single-cardinality properties are unwrapped (not in [])
  • Multi-cardinality properties use arrays
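
Applying these rules, a single exported subject might look like the following sketch (ex:carol and its properties are illustrative):

```json
{
  "@id": "ex:carol",
  "ex:greeting": {"@value": "Bonjour", "@language": "fr"},
  "ex:balance": {"@value": "3.14", "@type": "xsd:decimal"},
  "ex:active": true,
  "ex:nickname": ["Cee", "Caro"],
  "ex:friend": {"@id": "ex:alice"}
}
```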

API Usage

The export feature is available at the API level for upstream applications:

use fluree_db_api::export::ExportFormat;

// Turtle with default context
let stats = fluree.export("mydb")
    .format(ExportFormat::Turtle)
    .write_to(&mut writer)
    .await?;

// N-Quads with all graphs
let stats = fluree.export("mydb")
    .format(ExportFormat::NQuads)
    .all_graphs()
    .write_to(&mut writer)
    .await?;

// Turtle with custom prefixes
let stats = fluree.export("mydb")
    .format(ExportFormat::Turtle)
    .context(&json!({"ex": "http://example.org/"}))
    .write_to(&mut writer)
    .await?;

// JSON-LD with prefix compaction
let stats = fluree.export("mydb")
    .format(ExportFormat::JsonLd)
    .context(&json!({"ex": "http://example.org/"}))
    .write_to(&mut writer)
    .await?;

// Export a specific named graph
let stats = fluree.export("mydb")
    .format(ExportFormat::Turtle)
    .graph("http://example.org/g1")
    .write_to(&mut writer)
    .await?;

// Time-travel: export as of transaction t=5
let stats = fluree.export("mydb")
    .format(ExportFormat::Turtle)
    .as_of(TimeSpec::at_t(5))
    .write_to(&mut writer)
    .await?;

// Time-travel: export as of an ISO-8601 datetime
let stats = fluree.export("mydb")
    .format(ExportFormat::Turtle)
    .as_of(TimeSpec::at_time("2024-06-15T12:00:00Z"))
    .write_to(&mut writer)
    .await?;

// Convenience: write directly to stdout
let stats = fluree.export("mydb")
    .format(ExportFormat::Turtle)
    .to_stdout()
    .await?;

See Also

  • context - Manage default JSON-LD context (prefix map)
  • query - Run custom queries

fluree context

Manage the default JSON-LD context for a ledger.

Usage

fluree context <COMMAND>

Subcommands

Command        Description
get [LEDGER]   Show the default JSON-LD context
set [LEDGER]   Replace the default JSON-LD context

Description

Each ledger can have a default context — a JSON object mapping prefixes to IRIs (e.g., {"ex": "http://example.org/"}). When a JSON-LD query is sent via the CLI and omits its own @context, the ledger’s default context is injected automatically. The HTTP API requires ?default-context=true to opt in per request, and fluree-db-api requires explicit opt-in via its default-context view builders.

Default context is populated automatically during bulk import (from Turtle @prefix declarations). This command allows reading or replacing it after the fact.

The context is stored in content-addressed storage (CAS) and referenced from the nameservice config. Updates use compare-and-set semantics, so concurrent writers are safely handled.
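
For example, with a default context containing {"ex": "http://example.org/"}, a JSON-LD query sent via the CLI can omit its own @context and still use compact IRIs (ex:name is illustrative):

```json
{"select": ["?name"], "where": {"@id": "?s", "ex:name": "?name"}}
```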

context get

Show the current default context.

fluree context get [LEDGER]

Argument   Description
[LEDGER]   Ledger name (defaults to active ledger)

Examples

# Show context for active ledger
fluree context get

# Show context for a specific ledger
fluree context get mydb

Output (pretty-printed JSON):

{
  "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
  "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
  "xsd": "http://www.w3.org/2001/XMLSchema#",
  "owl": "http://www.w3.org/2002/07/owl#",
  "ex": "http://example.org/"
}

If no default context has been set, a message is printed to stderr.

context set

Replace the default context with a new JSON object.

fluree context set [LEDGER] [OPTIONS]

Argument   Description
[LEDGER]   Ledger name (defaults to active ledger)

Option              Description
-e, --expr <JSON>   Inline JSON context
-f, --file <PATH>   Read context from a JSON file

If neither -e nor -f is provided, context is read from stdin.

The body can be either a bare JSON object or wrapped in {"@context": {...}} — both forms are accepted.

Examples

# Set inline
fluree context set mydb -e '{"ex": "http://example.org/", "foaf": "http://xmlns.com/foaf/0.1/"}'

# Set from file
fluree context set mydb -f context.json

# Pipe from stdin
cat context.json | fluree context set mydb

# Wrapped form also accepted
fluree context set mydb -e '{"@context": {"ex": "http://example.org/"}}'

fluree log

Show commit log for a ledger.

Usage

fluree log [LEDGER] [OPTIONS]

Arguments

Argument   Description
[LEDGER]   Ledger name (defaults to active ledger)

Options

Option            Description
--oneline         Show one-line summary per commit
-n, --count <N>   Maximum number of commits to show

Description

Displays the commit history for a ledger, similar to git log. Shows transaction numbers, timestamps, and commit details.

Examples

# Show full commit log
fluree log

# Show last 5 commits
fluree log -n 5

# One-line format
fluree log --oneline

# Specific ledger
fluree log production --oneline -n 10

Output

Full Format (default)

commit bafybeig2k5...
t: 3
Date: 2024-01-15T10:30:00Z

    Added new users

commit bafybeig7x3...
t: 2
Date: 2024-01-14T09:15:00Z

commit bafybeig9m1...
t: 1
Date: 2024-01-13T08:00:00Z

    Initial data load

One-line Format

bafybeig2k5 t=3 Added new users
bafybeig7x3 t=2
bafybeig9m1 t=1 Initial data load

See Also

  • show - Show decoded contents of a specific commit
  • info - Show ledger details
  • history - Show entity change history

fluree show

Show the decoded contents of a commit — assertions and retractions with resolved IRIs.

Usage

fluree show <COMMIT> [OPTIONS]

Arguments

Argument   Description
<COMMIT>   Commit identifier: t:<N> transaction number, hex-digest prefix (min 6 chars), or full CID

Options

Option            Description
--ledger <NAME>   Ledger name (defaults to active ledger)
--remote <NAME>   Execute against a remote server (by remote name, e.g., “origin”)

Description

Displays the full decoded contents of a single commit, similar to git show. Each flake (assertion or retraction) is rendered with IRIs compacted using the ledger’s namespace prefix table.

The commit identifier can be:

  • A transaction number prefixed with t: (e.g., t:5) as shown in fluree log output
  • An abbreviated hex digest (minimum 6 characters) as shown in the storage directory or obtained from the txn-meta graph
  • A full CID string (e.g., bagaybqabciq...)

Policy Filtering

When executed against a remote server (--remote), the returned flakes are filtered by the server’s data-auth policy. The identity is derived from the Bearer token and the policy class from the server’s default_policy_class configuration. Flakes the caller is not permitted to read are silently omitted, and the asserts/retracts counts reflect only the visible flakes.

Unlike the query endpoints, show does not support per-request policy overrides via headers or request body — it uses only the Bearer token identity and server-configured default policy class.

When executed locally (no --remote, or with --direct), fluree show operates with full local-admin access and no policy filtering is applied. This is consistent with other local CLI operations that read directly from storage.

Output Format

The output is a JSON object containing:

Field      Description
id         Full CID of the commit
t          Transaction number
time       ISO 8601 timestamp
size       Commit blob size in bytes
previous   Previous commit CID
signer     Transaction signer (if signed)
asserts    Number of assertion flakes
retracts   Number of retraction flakes
@context   Namespace prefix table (prefix → IRI)
flakes     Array of flake tuples in SPOT order

Each flake is a tuple: [subject, predicate, object, datatype, operation]

  • operation: true = assert (added), false = retract (removed)
  • Ref objects use "@id" as the datatype
  • When metadata is present (language tag, list index, or named graph), a 6th element is appended: {"lang": "en", "i": 0, "graph": "ex:myGraph"}
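
A flake carrying metadata follows the same tuple shape with the appended map; this sketch shows a hypothetical language-tagged assertion in a named graph:

```json
["ex:alice", "ex:bio", "Bonjour", "xsd:string", true, {"lang": "fr", "graph": "ex:myGraph"}]
```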

Examples

# Show a commit by transaction number
fluree show t:5

# Show a commit by hex prefix
fluree show 3dd028

# Show a commit from a specific ledger
fluree show 0303b7 --ledger _system

# Show a commit on a remote server
fluree show t:5 --remote origin

# Show by hex prefix on remote with explicit ledger
fluree show 3dd028 --remote origin --ledger mydb

# Pipe to jq for filtering
fluree show 3dd028 | jq '.flakes[] | select(.[4] == true)'

Example Output

{
  "id": "bagaybqabciqd3ubikmk2zh6gjxngpgjja3vi5myleidf46htiybpswyy2665zra",
  "t": 40,
  "time": "2026-03-12T16:58:18.395474217+00:00",
  "size": 327,
  "previous": "bagaybqabciqc64dbbv46vrueddgqfrafgmo27u4fibkrvwdmr2g6ze4cbaeg23a",
  "asserts": 1,
  "retracts": 1,
  "@context": {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "schema": "http://schema.org/",
    "f": "https://ns.flur.ee/db#"
  },
  "flakes": [
    ["urn:fsys:dataset:zoho3", "schema:dateModified", "2026-03-12T14:15:30Z", "xsd:string", false],
    ["urn:fsys:dataset:zoho3", "schema:dateModified", "2026-03-12T16:58:16Z", "xsd:string", true]
  ]
}

In this example, one property (dateModified) was updated: the old value was retracted (false) and the new value asserted (true).

See Also

  • log - Show commit log (list of commits)
  • history - Show change history for a specific entity
  • info - Show ledger details

fluree index

Build or update the binary index for a ledger.

Usage

fluree index [LEDGER]

Arguments

Argument   Description
[LEDGER]   Ledger name (defaults to active ledger)

Description

Performs incremental indexing when possible — merges only new commits into the existing index. Falls back to a full rebuild if incremental indexing isn’t possible (e.g., no prior index exists).

Run this after transactions to clear the novelty layer and speed up queries. For routine use this is preferred over reindex, which always rebuilds from scratch.

Examples

# Index the active ledger
fluree index

# Index a specific ledger
fluree index mydb

Output

Indexed mydb to t=15 (root: bafyreig...)

When to Use

  • After bulk transactions — clears accumulated novelty so queries hit the optimized binary index instead of scanning in-memory flakes.
  • Routine maintenance — keeps query performance consistent as data grows.
  • After clone --no-indexes or pull --no-indexes — builds the local index that was skipped during transfer.

For a clean rebuild from commit history (e.g., suspected corruption), use reindex instead.

fluree reindex

Full reindex from commit history.

Usage

fluree reindex [LEDGER]

Arguments

Argument   Description
[LEDGER]   Ledger name (defaults to active ledger)

Description

Rebuilds the binary index from scratch by replaying all commits in order. This is a heavier operation than index — use it when the index is corrupted, missing, or you want a guaranteed clean rebuild.

For routine indexing after transactions, prefer index.

Examples

# Reindex the active ledger
fluree reindex

# Reindex a specific ledger
fluree reindex mydb

Output

Reindexed mydb to t=15 (root: bafyreig...)

When to Use

  • Suspected index corruption — query results seem wrong or incomplete.
  • After schema or configuration changes that affect index structure.
  • Clean slate — you want to guarantee the index matches the commit history exactly.

For incremental indexing (faster, merges only new commits), use index instead.

fluree config

Manage configuration settings.

Usage

fluree config <COMMAND>

Subcommands

Command                              Description
get <KEY>                            Get a configuration value
set <KEY> <VALUE>                    Set a configuration value
list                                 List all configuration values
set-origins <LEDGER> --file <PATH>   Set CID fetch origins for a ledger (writes a LedgerConfig to CAS and updates config_id)

Description

Manages configuration stored in .fluree/config.toml. Configuration uses dotted keys for nested values (e.g., storage.path).

Examples

Get a value

fluree config get storage.path

Output:

/custom/storage/path

Set a value

fluree config set storage.path /custom/storage/path

Output:

Set 'storage.path' = "/custom/storage/path"

List all values

fluree config list

Output:

storage.path = "/custom/storage/path"
storage.encryption = "aes256"

If no configuration is set:

(no configuration set)

Configuration File

Configuration is stored in .fluree/config.toml:

[storage]
path = "/custom/storage/path"
encryption = "aes256"

Errors

Getting a key that doesn’t exist:

error: configuration key 'nonexistent' is not set

See Also

  • init - Initialize project directory
  • prefix - Manage IRI prefixes

fluree config set-origins

Store a LedgerConfig blob in local CAS and update the ledger’s nameservice record to point to it via config_id.

This enables origin-based fluree pull (when no upstream remote is configured) and improves fluree clone --origin by allowing the remote to advertise multiple fallback origins.

Usage

fluree config set-origins <LEDGER> --file <PATH>

Arguments

ArgumentDescription
<LEDGER>Ledger ID (e.g., mydb or mydb:main)

Options

Option          Description
--file <PATH>   Path to a JSON file containing a LedgerConfig

LedgerConfig File Format

The file is canonical JSON using compact f: keys (not JSON-LD):

{
  "f:origins": [
    { "f:priority": 10, "f:enabled": true, "f:transport": "http://localhost:8090", "f:auth": { "f:mode": "none" } }
  ],
  "f:replication": { "f:preferPack": true, "f:maxPackMiB": 64 }
}

Notes:

  • f:transport is an origin base URL. The CLI normalizes it the same way as remotes: it will append /fluree if missing and will use GET /.well-known/fluree.json discovery when available.
  • Auth requirements are declarative. Credentials are not stored in the LedgerConfig.

Current Limitations

  • fluree pull via origins currently does not attach a Bearer token from any credential store, so only origins with f:auth.f:mode = "none" are usable for pull today.
  • fluree clone --origin ... --token ... can use a Bearer token for origin fetch.

fluree prefix

Manage IRI prefix mappings.

Usage

fluree prefix <COMMAND>

Subcommands

Command              Description
add <PREFIX> <IRI>   Add a prefix mapping
remove <PREFIX>      Remove a prefix mapping
list                 List all prefix mappings

Description

Manages IRI prefix mappings stored in .fluree/prefixes.json. These prefixes are used to expand compact IRIs in commands like history.

Examples

Add a prefix

fluree prefix add ex http://example.org/
fluree prefix add foaf http://xmlns.com/foaf/0.1/
fluree prefix add schema https://schema.org/

Output:

Added prefix: ex = <http://example.org/>

List prefixes

fluree prefix list

Output:

ex: <http://example.org/>
foaf: <http://xmlns.com/foaf/0.1/>
schema: <https://schema.org/>

If no prefixes are defined:

(no prefixes defined)

Add prefixes with: fluree prefix add <prefix> <iri>
Example: fluree prefix add ex http://example.org/

Remove a prefix

fluree prefix remove foaf

Output:

Removed prefix: foaf

Usage with History

Once prefixes are defined, you can use compact IRIs:

# Instead of:
fluree history http://example.org/alice

# Use:
fluree history ex:alice

IRI Best Practices

IRI namespaces should end with / or #:

# Good
fluree prefix add ex http://example.org/
fluree prefix add foaf http://xmlns.com/foaf/0.1/

# Warning (will still work but may cause issues)
fluree prefix add bad http://example.org

Storage

Prefixes are stored in .fluree/prefixes.json:

{
  "ex": "http://example.org/",
  "foaf": "http://xmlns.com/foaf/0.1/"
}

See Also

  • history - Uses prefix expansion
  • config - Manage other configuration

fluree token

Manage JWS tokens for authentication with Fluree servers.

Subcommands

Subcommand   Description
create       Create a new JWS token
keygen       Generate a new Ed25519 keypair
inspect      Decode and verify a JWS token

fluree token create

Create a new JWS token for authenticating with Fluree servers.

Usage

fluree token create --private-key <KEY> [OPTIONS]

Options

Option                     Description
--private-key <KEY>        Required. Ed25519 private key (hex, base58, @filepath, or @- for stdin)
--expires-in <DUR>         Token lifetime (default: 1h). Supports s, m, h, d, w suffixes
--subject <SUB>            Subject claim (sub) - identity of the token holder
--audience <AUD>           Audience claim (aud) - repeatable for multiple audiences
--identity <ID>            Fluree identity claim (fluree.identity) - takes precedence over sub for policy
--all                      Grant full access to all ledgers (events, storage, read, and write)
--events-ledger <ALIAS>    Grant events access to a specific ledger (repeatable)
--storage-ledger <ALIAS>   Grant storage access to a specific ledger (repeatable)
--read-all                 Grant data API read access to all ledgers (fluree.ledger.read.all=true)
--read-ledger <ALIAS>      Grant data API read access to a specific ledger (repeatable)
--write-all                Grant data API write access to all ledgers (fluree.ledger.write.all=true)
--write-ledger <ALIAS>     Grant data API write access to a specific ledger (repeatable)
--graph-source <ALIAS>     Grant access to a specific graph source (repeatable)
--output <FMT>             Output format: token, json, or curl (default: token)
--print-claims             Print decoded claims to stderr

Private Key Formats

Format   Example
Hex      0x<64 hex chars> or <64 hex chars>
Base58   z<base58 string> (multibase) or raw base58
File     @/path/to/keyfile or @~/.fluree/key (tilde expansion)
Stdin    @- (read from stdin to avoid shell history)

Examples

# Create a token with full access
fluree token create --private-key 0x1234...abcd --all

# Create a token for specific ledgers (events/storage)
fluree token create --private-key @~/.fluree/key \
  --events-ledger mydb --storage-ledger mydb

# Create a token with data API read+write for specific ledgers
fluree token create --private-key @~/.fluree/key \
  --read-ledger mydb:main --write-ledger mydb:main

# Create a token with identity and audience
fluree token create --private-key @- \
  --identity did:example:alice \
  --audience https://api.example.com \
  --expires-in 7d

# Output as curl command
fluree token create --private-key 0x... --all --output curl

# View claims while creating
fluree token create --private-key 0x... --all --print-claims

fluree token keygen

Generate a new Ed25519 keypair for signing tokens.

Usage

fluree token keygen [OPTIONS]

Options

Option                Description
--format <FMT>        Output format: hex, base58, or json (default: hex)
-o, --output <PATH>   Write private key to file (otherwise prints to stdout)

Examples

# Generate keypair in hex format
fluree token keygen

# Generate in JSON format with all representations
fluree token keygen --format json

# Save private key to file
fluree token keygen --output ~/.fluree/key

# Generate base58 format
fluree token keygen --format base58

Output

Hex format:

Private key: 0x1234567890abcdef...
Public key:  0xabcdef1234567890...
DID:         did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK

JSON format:

{
  "private_key": {
    "hex": "0x1234...",
    "base58": "z..."
  },
  "public_key": {
    "hex": "0xabcd...",
    "base58": "z..."
  },
  "did": "did:key:z6Mk..."
}

fluree token inspect

Decode and optionally verify a JWS token.

Usage

fluree token inspect <TOKEN> [OPTIONS]

Arguments

Argument   Description
<TOKEN>    JWS token string or @filepath

Options

Option           Description
--no-verify      Skip signature verification (default: verify)
--output <FMT>   Output format: pretty, json, or table (default: pretty)

Examples

# Inspect and verify a token
fluree token inspect eyJhbGciOiJFZERTQSI...

# Inspect without verification
fluree token inspect eyJ... --no-verify

# Output as JSON
fluree token inspect eyJ... --output json

# Read token from file
fluree token inspect @token.txt

Output

Pretty format:

Token Information
─────────────────────────────────────────────────────
Issuer:   did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK
Subject:  test@example.com
Issued:   2024-01-15 10:30:00 UTC
Expires:  2024-01-15 11:30:00 UTC

Permissions:
  Events:  all ledgers
  Storage: all ledgers

Signature: ✓ Valid
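
The pretty output above is rendered from the token's decoded header and claims. A JWS compact token is three base64url segments (header.payload.signature), so unverified inspection (what --no-verify does) is just decoding. A minimal Python sketch, run against an illustrative unsigned token rather than a real one:

```python
import base64
import json

def b64url_decode(seg: str) -> bytes:
    # JWS segments are base64url without padding; restore padding first
    return base64.urlsafe_b64decode(seg + "=" * (-len(seg) % 4))

def decode_jws(token: str):
    """Decode the header and claims of a compact JWS (no signature check)."""
    header_b64, payload_b64, _sig = token.split(".")
    return (json.loads(b64url_decode(header_b64)),
            json.loads(b64url_decode(payload_b64)))

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode().rstrip("=")

# Build an illustrative (unsigned) token just to demonstrate decoding
header = {"alg": "EdDSA", "typ": "JWT"}
claims = {"iss": "did:key:z6Mk...", "sub": "test@example.com"}
token = ".".join([b64url(json.dumps(header).encode()),
                  b64url(json.dumps(claims).encode()), ""])

h, c = decode_jws(token)
print(h["alg"], c["sub"])  # prints: EdDSA test@example.com
```

Verification additionally checks the signature against the embedded JWK (Ed25519 tokens) or the provider's JWKS (OIDC tokens); decoding alone proves nothing about authenticity.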

Token Scopes

Tokens can carry different permission scopes that control access to different server features:

Scope               Claim                        Controls
Events (all)        fluree.events.all            SSE event stream for all ledgers
Events (specific)   fluree.events.ledgers        SSE event stream for listed ledgers
Storage (all)       fluree.storage.all           Storage proxy read access (all); also implies data API read
Storage (specific)  fluree.storage.ledgers       Storage proxy read access (listed); also implies data API read
Read (all)          fluree.ledger.read.all       Data API query access to all ledgers
Read (specific)     fluree.ledger.read.ledgers   Data API query access to listed ledgers
Write (all)         fluree.ledger.write.all      Data API write access to all ledgers
Write (specific)    fluree.ledger.write.ledgers  Data API write access to listed ledgers

The --all flag sets events, storage, read, and write access for all ledgers.

Back-compat: fluree.storage.* claims also grant data API read access for the same ledgers.
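
The scope table and the back-compat rule combine into a simple allow check. A hedged sketch of the read-side rule, using the claim names above (the list-valued layout of the per-ledger claims is an assumption):

```python
def can_read(claims: dict, ledger: str) -> bool:
    """Illustrative: may this token query `ledger` via the data API?"""
    # All-ledger grants, including the storage back-compat rule
    if claims.get("fluree.ledger.read.all") or claims.get("fluree.storage.all"):
        return True
    # Per-ledger grants; assumed to be lists of ledger IDs like "mydb:main"
    allowed = (claims.get("fluree.ledger.read.ledgers", [])
               + claims.get("fluree.storage.ledgers", []))
    return ledger in allowed

print(can_read({"fluree.storage.ledgers": ["mydb:main"]}, "mydb:main"))  # True
```

The same shape applies to write and events checks, minus the storage back-compat, which only implies read.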

See Also

  • auth - Store/manage tokens on remotes
  • remote - Configure remote servers
  • Authentication - Auth model, modes, and token claims
  • fetch - Fetch from remotes (requires auth token)
  • push - Push to remotes (requires auth token)

fluree auth

Manage authentication tokens for remote servers. Tokens are stored in .fluree/config.toml as part of the remote configuration.

Token values are never printed to stdout. The status command shows token presence, expiry, and identity only.

Subcommands

Subcommand  Description
status      Show authentication status for a remote
login       Store a bearer token for a remote
logout      Clear the stored token for a remote

fluree auth status

Show the current authentication state for a remote, including token presence, expiry time, identity, and issuer.

Usage

fluree auth status [OPTIONS]

Options

Option           Description
--remote <NAME>  Remote name (defaults to the only configured remote)

Examples

# Show auth status (single remote)
fluree auth status

# Show auth status for a specific remote
fluree auth status --remote origin

Output

When a token is configured:

Auth Status:
  Remote: origin
  Token:  configured
  Expiry: 2026-02-15 12:00 UTC
  Identity: did:example:alice
  Issuer: did:key:z6Mk...
  Subject: alice@example.com

When no token is configured:

Auth Status:
  Remote: origin
  Token:  not configured
  hint: fluree auth login --remote origin

fluree auth login

Store a bearer token for a remote. The token is saved in .fluree/config.toml and will be sent as an Authorization: Bearer header on subsequent remote operations (fetch, pull, push, query --remote, etc.).

Usage

fluree auth login [OPTIONS]

Options

Option           Description
--remote <NAME>  Remote name (defaults to the only configured remote)
--token <VALUE>  Token value, @filepath to read from file, or @- for stdin

If --token is omitted, you will be prompted to paste the token interactively.

Token Input Methods

Method        Example
Inline value  --token eyJhbG...
File          --token @/path/to/token.jwt
File (tilde)  --token @~/.fluree/token.jwt
Stdin         --token @- (pipe or redirect)
Interactive   Omit --token to be prompted

Examples

# Store a token (prompted interactively)
fluree auth login

# Store a token from a value
fluree auth login --token eyJhbGciOiJFZERTQSI...

# Store a token from a file
fluree auth login --token @~/.fluree/my-token.jwt

# Pipe a token from another command
fluree token create --private-key @~/.fluree/key --all | fluree auth login --token @-

# Login to a specific remote
fluree auth login --remote staging --token @token.jwt

Output

Token stored for remote 'origin'
  Expiry: 2026-02-15 12:00 UTC
  Identity: did:example:alice

fluree auth logout

Clear the stored token for a remote.

Usage

fluree auth logout [OPTIONS]

Options

Option           Description
--remote <NAME>  Remote name (defaults to the only configured remote)

Examples

# Clear token for the default remote
fluree auth logout

# Clear token for a specific remote
fluree auth logout --remote staging

Output

Token cleared for remote 'origin'

Token Types

The auth command stores bearer tokens that are sent in the Authorization header. Fluree supports two types of bearer tokens:

Ed25519 JWS Tokens (did:key)

Created locally with fluree token create. These contain an embedded JWK (JSON Web Key) in the token header and are verified against the embedded public key. The issuer is a did:key identifier derived from the signing key.

# Create and store a token in one step
fluree token create --private-key @~/.fluree/key --all | fluree auth login --token @-

OIDC/JWKS Tokens (RS256)

Issued by external identity providers (OIDC). These contain a kid (Key ID) in the token header and are verified by the server against the provider’s JWKS (JSON Web Key Set) endpoint. The issuer is the provider’s URL.

The server must be configured with --jwks-issuer to trust these tokens. See Configuration.

Remote Resolution

When --remote is omitted:

  • If exactly one remote is configured, it is used automatically.
  • If no remotes are configured, an error is shown with a hint to use fluree remote add.
  • If multiple remotes are configured, an error asks you to specify --remote <name>.
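
The three bullets above amount to a small resolution function; an illustrative Python sketch:

```python
def resolve_remote(configured, explicit=None):
    """Pick the remote to use, mirroring the resolution rules above."""
    if explicit is not None:
        return explicit
    if not configured:
        raise ValueError("no remotes configured; hint: fluree remote add")
    if len(configured) > 1:
        raise ValueError("multiple remotes configured; specify --remote <name>")
    return configured[0]

print(resolve_remote(["origin"]))  # prints: origin
```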

Security Notes

  • Tokens are stored in plaintext in .fluree/config.toml. Protect this file with appropriate filesystem permissions.
  • The status command never displays the raw token value.
  • On 401 errors from remote operations, the CLI checks token expiry and suggests fluree auth login if the token appears expired.

OIDC login flow

When a remote is configured with auth.type = "oidc_device" (auto-discovered from the server’s /.well-known/fluree.json), fluree auth login runs an OIDC interactive login flow and then exchanges the IdP token for a Fluree-scoped Bearer token:

  1. Discovers OIDC endpoints from the configured issuer
  2. Chooses the flow based on IdP support:
    • If the IdP discovery document includes device_authorization_endpoint: use OAuth device-code (prints a URL + code and polls).
    • Otherwise, if it includes authorization_endpoint: use OAuth authorization-code + PKCE (opens a browser and receives a localhost callback).
  3. Exchanges the IdP token for a Fluree-scoped Bearer token via the server’s exchange_url
  4. Stores the token (and optional refresh token) in the remote config
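
Step 2's choice depends only on which endpoints the IdP's discovery document advertises. A minimal sketch (the example URL is hypothetical):

```python
def choose_flow(discovery: dict) -> str:
    """Return the OAuth flow the CLI would pick for this IdP (illustrative)."""
    if "device_authorization_endpoint" in discovery:
        return "device_code"       # print URL + code, then poll
    if "authorization_endpoint" in discovery:
        return "auth_code_pkce"    # open browser, localhost callback
    raise ValueError("IdP advertises neither flow")

# Cognito-style discovery: no device endpoint, so PKCE is chosen
print(choose_flow({"authorization_endpoint": "https://idp.example.com/authorize"}))
# prints: auth_code_pkce
```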

Cognito note (Authorization Code + PKCE)

AWS Cognito does not publish device_authorization_endpoint, so the CLI will use authorization-code + PKCE.

Cognito requires the callback URL to be pre-allowlisted (no wildcard ports). Allowlist:

  • http://127.0.0.1:8400/callback
  • http://127.0.0.1:8401/callback
  • http://127.0.0.1:8402/callback
  • http://127.0.0.1:8403/callback
  • http://127.0.0.1:8404/callback
  • http://127.0.0.1:8405/callback

If your app only allowlists one callback URL, configure a fixed port with redirect_port in /.well-known/fluree.json (or set FLUREE_AUTH_PORT locally) and allowlist that single callback URL.

On subsequent 401 errors, the CLI automatically attempts a silent refresh using the stored refresh token before prompting for re-login.

See Auth contract (CLI ↔ Server) for the full protocol specification.

See Also

  • token - Create authentication tokens
  • remote - Configure remote servers

fluree remote

Manage remote servers for syncing ledgers.

Subcommands

Subcommand  Description
add         Add a remote server
remove      Remove a remote
list        List all configured remotes
show        Show details for a remote

fluree remote add

Add a remote server configuration.

Usage

fluree remote add <NAME> <URL> [OPTIONS]

Arguments

Argument  Description
<NAME>    Remote name (e.g., origin)
<URL>     Server URL (e.g., http://localhost:8090)

Options

Option           Description
--token <TOKEN>  Authentication token (or @filepath to read from file)

Examples

# Add a remote without authentication
fluree remote add origin http://localhost:8090

# Add a remote with inline token
fluree remote add prod https://api.example.com --token eyJ...

# Add a remote with token from file
fluree remote add staging https://staging.example.com --token @~/.fluree/staging-token

fluree remote remove

Remove a remote configuration.

Usage

fluree remote remove <NAME>

Arguments

Argument  Description
<NAME>    Remote name to remove

Examples

fluree remote remove origin

fluree remote list

List all configured remotes.

Usage

fluree remote list

Output

┌─────────┬─────────────────────────────┬───────┐
│ Name    │ URL                         │ Auth  │
├─────────┼─────────────────────────────┼───────┤
│ origin  │ http://localhost:8090       │ none  │
│ prod    │ https://api.example.com     │ token │
└─────────┴─────────────────────────────┴───────┘

fluree remote show

Show detailed information about a remote.

Usage

fluree remote show <NAME>

Arguments

Argument  Description
<NAME>    Remote name

Output

Remote:
  Name: origin
  Type: HTTP
  URL:  http://localhost:8090
  Auth: token configured

See Also

  • upstream - Configure upstream tracking
  • clone - Clone a ledger from a remote
  • fetch - Fetch refs from a remote
  • token - Create authentication tokens

fluree upstream

Manage upstream tracking configuration for ledgers.

Upstream configuration links a local ledger to a remote ledger, enabling pull and push operations.

Subcommands

Subcommand  Description
set         Set upstream tracking for a ledger
remove      Remove upstream tracking
list        List all upstream configurations

fluree upstream set

Configure a local ledger to track a remote ledger.

Usage

fluree upstream set <LOCAL> <REMOTE> [OPTIONS]

Arguments

Argument  Description
<LOCAL>   Local ledger ID (e.g., mydb or mydb:main)
<REMOTE>  Remote name (e.g., origin)

Options

Option                  Description
--remote-alias <ALIAS>  Remote ledger ID (defaults to local ledger ID)
--auto-pull             Automatically pull on fetch

Examples

# Track remote ledger with same name
fluree upstream set mydb origin

# Track a differently-named remote ledger
fluree upstream set mydb origin --remote-alias production-db

# Enable auto-pull on fetch
fluree upstream set mydb origin --auto-pull

fluree upstream remove

Remove upstream tracking for a ledger.

Usage

fluree upstream remove <LOCAL>

Arguments

Argument  Description
<LOCAL>   Local ledger ID

Examples

fluree upstream remove mydb

fluree upstream list

List all configured upstream tracking relationships.

Usage

fluree upstream list

Output

┌────────────┬─────────┬────────────────┬───────────┐
│ Local      │ Remote  │ Remote Alias   │ Auto-Pull │
├────────────┼─────────┼────────────────┼───────────┤
│ mydb:main  │ origin  │ mydb           │ no        │
│ test:main  │ staging │ test-ledger    │ yes       │
└────────────┴─────────┴────────────────┴───────────┘

See Also

  • remote - Configure remote servers
  • clone - Clone a ledger from a remote
  • pull - Pull from upstream
  • push - Push to upstream

fluree fetch

Fetch refs from a remote server (similar to git fetch).

Usage

fluree fetch <REMOTE>

Arguments

Argument  Description
<REMOTE>  Remote name (e.g., origin)

Description

Fetches ledger references from a remote server and updates local tracking data. This does not modify your local ledgers; it only updates what the CLI knows about the remote’s state.

This is a replication operation. It requires a Bearer token with root / storage-proxy permissions (fluree.storage.*). If you only have permissioned/query access to a ledger, you should use fluree track (or --remote) and run queries/transactions against the remote instead.

After fetching, you can use pull to download and apply new commits to your local ledger.

Examples

# Fetch from origin
fluree fetch origin

# Typical workflow
fluree fetch origin
fluree pull mydb

Output

Fetching from 'origin'...
Updated:
  mydb -> t=42
  testdb -> t=15
Already up to date: 2 ledger(s) unchanged

If no ledgers are found:

Fetching from 'origin'...
No ledgers found on remote.

See Also

  • remote - Configure remote servers
  • clone - Clone a ledger from a remote
  • pull - Pull commits from upstream
  • push - Push to upstream

fluree pull

Pull commits from upstream and apply them to the local ledger, similar to git pull.

Usage

fluree pull [OPTIONS] [LEDGER]

Arguments

Argument  Description
[LEDGER]  Ledger name (defaults to active ledger)

Options

Option        Description
--no-indexes  Skip pulling binary index data; only transfer new commits and txn blobs (local index may lag until you run fluree reindex)

Description

Downloads new commits from the configured upstream and applies them to the local ledger:

  1. Queries the remote for its current head (t and commit ContentId)
  2. Compares with the local head; exits early if already up to date
  3. Attempts bulk download of missing commits (and by default index artifacts) via the pack protocol (single streaming request)
  4. Falls back to paginated JSON export if the server does not support pack
  5. Stores all commit and transaction blobs to local CAS
  6. When index data is requested and transferred, advances the local index head to match the remote
  7. Advances the local commit head to the remote head
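
Steps 1-2 reduce to a head comparison plus an ancestry check (the "Ancestry mismatch" error below). A hedged sketch of that decision:

```python
def plan_pull(local_t: int, remote_t: int, local_head_in_remote_history: bool) -> str:
    """Decide what a pull should do (illustrative of steps 1-2)."""
    if remote_t <= local_t:
        return "up-to-date"
    if not local_head_in_remote_history:
        # Remote chain does not descend from the local head
        return "error: ancestry mismatch (histories diverged)"
    return f"download commits t={local_t + 1}..{remote_t}"

print(plan_pull(10, 42, True))  # prints: download commits t=11..42
```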

Index transfer

As with clone, pull uses the pack protocol to request index artifacts by default when the remote has an index. Use --no-indexes to transfer only new commits and txn blobs. For large estimated transfers (~1 GiB or more), the CLI prompts for confirmation before streaming.

Transport

Pull uses the same pack protocol as clone – see clone: Transport for details.

Origin-based pull

When no upstream remote is configured, pull falls back to origin-based fetching if a LedgerConfig with origins is set on the ledger (see fluree config set-origins). This uses the same pack-first / CID-walk-fallback transport as fluree clone --origin.

This is a replication operation. It requires a Bearer token with root / storage-proxy permissions (fluree.storage.*). If you only have permissioned/query access to a ledger, you should use fluree track (or --remote) and run queries/transactions against the remote instead.

The ledger must have an upstream configured (see fluree upstream set), or a LedgerConfig with origins (see fluree config set-origins).

Restart safety: If interrupted, the local head reflects the last successful import. The next pull resumes from the local head automatically.

Examples

# Pull changes for active ledger
fluree pull

# Pull changes for specific ledger
fluree pull mydb

# Pull commits only (skip index transfer)
fluree pull --no-indexes mydb

Output

Successful pull (with index data when remote has an index):

Pulling 'mydb:main' from 'origin' (local t=10, remote t=42)...
✓ 'mydb:main' pulled 32 commit(s) via pack (new head t=42)

With --no-indexes, only commits (and referenced txn blobs) are transferred; the message does not include index artifact counts.

Already up to date:

✓ 'mydb:main' is already up to date

No upstream configured:

error: no upstream configured for 'mydb:main'
  hint: fluree upstream set mydb:main <remote>

Errors

Error                      Description
No upstream configured     Run fluree upstream set <ledger> <remote> first, or configure origins via fluree config set-origins
Ancestry mismatch          Remote chain does not descend from local head (histories diverged)
Import validation failure  Commit chain or retraction invariant violation

Limitations

  • Index head vs commit head: When you use --no-indexes, the local index head is not updated. Queries still work but may replay more novelty; run fluree reindex to bring the index up to the current commit head.
  • Graph source indexes not replicated: Graph source snapshots (BM25/vector/geo, etc.) are not replicated by fluree pull yet. Rebuild graph source indexes in the target environment as needed.

See Also

  • clone - Clone a ledger from a remote server
  • upstream - Configure upstream tracking
  • fetch - Fetch refs without modifying local ledger
  • push - Push local changes to upstream

fluree push

Push local ledger changes to upstream remote, similar to git push.

Usage

fluree push [LEDGER]

Arguments

Argument  Description
[LEDGER]  Ledger name (defaults to active ledger)

Description

Pushes local commits to the configured upstream remote by uploading the commit v2 bytes to the server.

The ledger must have an upstream configured (see fluree upstream set).

The push uses strict sequencing + CAS semantics:

  • The server rejects the push if the remote head is not in your local history (diverged) or if the remote is ahead.
  • The server also rejects the push if the first commit’s t does not match the server’s next-t.

Unlike fetch/pull, this is not a storage-proxy replication operation. It requires write permissions for the ledger (Bearer token with fluree.ledger.write.* claims) and the server validates the pushed commits like normal transactions.

If a pushed commit contains retractions, the server enforces a strict invariant: each retraction must target a fact that is currently asserted at that point in the push batch. (List retractions require exact list-index metadata match.)
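
The sequencing and CAS rules above can be sketched as a server-side acceptance check (the shape is illustrative; the real server also validates commit contents and policies):

```python
def check_push(server_head_id: str, client_history_ids: set,
               first_commit_t: int, server_next_t: int):
    """Return (HTTP status, reason) for a pushed batch, per the rules above."""
    if server_head_id not in client_history_ids:
        # Remote is ahead of the client, or the histories diverged
        return 409, "remote head not in local history; pull first"
    if first_commit_t != server_next_t:
        return 409, "first commit t must equal the server's next-t"
    return 200, "accepted"

print(check_push("h1", {"h0", "h1"}, 43, 43))  # prints: (200, 'accepted')
```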

Examples

# Push active ledger
fluree push

# Push specific ledger
fluree push mydb

Output

Successful push:

Pushing 'mydb:main' to 'origin'...
✓ 'mydb:main' pushed 3 commit(s) (new head t=42)

Push rejected (remote is ahead):

Pushing 'mydb:main' to 'origin'...
error: push rejected; remote is ahead (local t=10, remote t=42). Pull first.

No upstream configured:

error: no upstream configured for 'mydb:main'
  hint: fluree upstream set mydb:main <remote>

Errors

Error                   Description
No upstream configured  Run fluree upstream set <ledger> <remote> first
Push rejected (409)     Remote head changed, histories diverged, or first commit t does not match next-t
Push rejected (422)     Invalid commit bytes, missing required referenced blob, or retraction invariant violation

Workflow

Typical sync workflow:

# Configure remote and upstream (one time)
fluree remote add origin https://api.example.com --token @~/.fluree/token
fluree upstream set mydb origin

# Daily workflow
fluree pull mydb        # Get latest changes
# ... make local changes ...
fluree push mydb        # Push your changes

See Also

  • clone - Clone a ledger from a remote
  • upstream - Configure upstream tracking
  • pull - Pull changes from upstream
  • fetch - Fetch refs without modifying local ledger

fluree publish

Publish a local ledger to a remote server. Creates the ledger on the remote if it doesn’t exist, pushes all local commits, and configures upstream tracking for subsequent push/pull.

Usage

fluree publish <REMOTE> [LEDGER] [OPTIONS]

Arguments

Argument  Description
<REMOTE>  Remote name (e.g., “origin”)
[LEDGER]  Ledger name (defaults to active ledger)

Options

Option                Description
--remote-name <NAME>  Remote ledger name (defaults to local ledger name)

Description

fluree publish is the reverse of fluree clone. It takes a locally-created ledger and pushes it to a remote server in a single operation:

  1. Checks if the ledger exists on the remote (GET /exists)
  2. Creates it if not (POST /create)
  3. Pushes all local commits (POST /push)
  4. Configures upstream tracking so subsequent fluree push and fluree pull work

This is intended for the “create locally, deploy to server” workflow. If the remote ledger already has data (t > 0), the command will fail — use fluree push instead for incremental updates.

Examples

# Publish active ledger to origin
fluree publish origin

# Publish a specific ledger
fluree publish origin mydb

# Publish with a different name on the remote
fluree publish origin mydb --remote-name production-db

# Typical workflow: create locally, develop, then publish
fluree create mydb
fluree insert mydb -e '{"@id": "ex:test", "ex:name": "Test"}'
fluree publish origin mydb

Prerequisites

  • A remote must be configured: fluree remote add origin <url>
  • The remote must support the Fluree HTTP API (see Server implementation guide)
  • A valid auth token if the remote requires authentication: fluree auth login --remote origin

After Publishing

Once published, the ledger has upstream tracking configured. Use standard sync commands:

# Push new local commits to remote
fluree push

# Pull remote changes
fluree pull

See Also

  • push - Push incremental commits to upstream
  • pull - Pull changes from upstream
  • clone - Clone a remote ledger locally (reverse of publish)
  • remote - Manage remote server configuration
  • upstream - Manage upstream tracking
  • export - Export ledger as .flpack for file-based transfer

fluree clone

Clone a ledger from a remote server, similar to git clone.

Usage

# Named-remote clone
fluree clone [OPTIONS] <REMOTE> <LEDGER>

# Origin-based clone (no pre-configured remote)
fluree clone --origin <URI> [--token <TOKEN>] [OPTIONS] <LEDGER>

Arguments

Argument  Description
<REMOTE>  Remote name (configured via fluree remote add)
<LEDGER>  Ledger name on the remote server

Options

Option           Description
--origin <URI>   Bootstrap URI for CID-based clone (replaces <REMOTE>)
--token <TOKEN>  Auth token for origin server (with --origin only)
--no-indexes     Skip pulling binary index data; only transfer commits and txn blobs (queries will replay from commits until you run fluree reindex)
--no-txns        Skip pulling original transaction payloads. Commits still transfer (chain remains valid and verifiable), but the raw JSON-LD / SPARQL requests that produced each commit are not downloaded. Use for read-only clones of large ledgers. See Transaction transfer.

Description

Downloads all commits from a remote ledger and creates a local copy:

  1. Verifies the remote ledger exists and has commits
  2. Creates a local ledger with the same name as on the remote
  3. Attempts bulk download via the pack protocol (single streaming request)
  4. Falls back to paginated JSON export if the server does not support pack
  5. Stores all commit and transaction blobs to local CAS
  6. By default, also transfers binary index artifacts when the remote has an index (see Index transfer)
  7. Sets the local commit head (and index head when index data was transferred) to match the remote
  8. Configures the remote as upstream for future pull/push (named-remote only)

Index transfer

When using the pack protocol, the CLI requests index artifacts by default so the local ledger is query-ready without a full reindex. The server sends missing commit blobs, txn blobs, and binary index artifacts (dictionaries, branches, leaves) in one stream.

  • Use --no-indexes to transfer only commits and txn blobs. This reduces transfer size and time; afterward, run fluree reindex to build the index locally if needed.
  • For large transfers (estimated size above ~1 GiB), the CLI prompts: “Estimated transfer size: ~X. This may take several minutes. Continue? [Y/n]”. Answer n to abort or to re-request without index data (commits-only).
  • If the remote has no index yet (e.g. a fresh ledger), only commits and txns are transferred regardless of the flag.

Transaction transfer

Every commit references an original transaction blob — the raw request (JSON-LD insert/update or SPARQL Update) that produced the commit. By default, fluree clone downloads these so the local ledger has a complete audit trail of the original payloads.

  • Use --no-txns to skip transaction blobs entirely. The commit chain is still cloned and remains valid and verifiable; only the original request payloads are missing.
  • The materialized ledger state (what queries return) is reconstructable from commits + indexes alone — transactions are not needed for query answering.
  • With --no-txns, operations that need the original request payload (e.g., fluree show --flakes for transaction-level inspection, or re-running a transaction against a branch) will fail locally for those transactions. Anything that only reads materialized state is unaffected.
  • Combine with --no-indexes for the smallest possible clone (fluree clone --no-indexes --no-txns origin mydb), useful for minimal verification / auditing of the commit chain only.

Transport

The CLI uses the pack protocol (fluree-pack-v1) as the primary transport for clone and pull. Pack transfers all missing CAS objects (commits + txn blobs, and by default index artifacts) in a single streaming HTTP request, avoiding per-object round-trips.

If the remote server does not support the pack endpoint (returns 404, 405, 406, or 501), the CLI automatically falls back to:

  • Named-remote mode: paginated JSON export via GET /commits/{ledger} (500 commits per page)
  • Origin mode: CID chain walk via GET /storage/objects/{cid} (one round-trip per commit)

This fallback is transparent – no user action is required.
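
The fallback is keyed purely on the pack endpoint's HTTP status; a sketch of the selection rule:

```python
PACK_UNSUPPORTED = {404, 405, 406, 501}

def next_transport(pack_status: int, mode: str) -> str:
    """Pick the transport after a pack attempt (illustrative)."""
    if pack_status == 200:
        return "pack stream"
    if pack_status in PACK_UNSUPPORTED:
        # Server predates pack support: fall back transparently
        return ("paginated JSON export (500 commits/page)"
                if mode == "named-remote"
                else "CID chain walk (one request per commit)")
    raise RuntimeError(f"pack request failed with status {pack_status}")

print(next_transport(404, "named-remote"))
# prints: paginated JSON export (500 commits/page)
```

Any other non-200 status (auth failures, server errors) surfaces as an error rather than triggering the fallback.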

Origin-based clone

The --origin flag enables CID-based clone from a server URL without pre-configuring a named remote:

fluree clone --origin http://localhost:8090 mydb
fluree clone --origin https://api.example.com --token @~/.fluree/token mydb

This mode:

  1. Fetches the NsRecord from the origin to discover the head commit CID
  2. Optionally upgrades to a multi-origin fetcher if a LedgerConfig is advertised
  3. Downloads commits via pack (or CID chain walk as fallback)
  4. Stores the LedgerConfig locally for future origin-based pull
  5. Does not configure upstream tracking (use fluree upstream set manually)

This is a replication operation. It requires a Bearer token with root / storage-proxy permissions (fluree.storage.*). If you only have permissioned/query access to a ledger, you should use fluree track (or --remote) and run queries/transactions against the remote instead.

Idempotent CAS writes: If interrupted mid-clone, CAS blob writes are idempotent. Re-running the clone command will re-fetch all pages (duplicate writes are harmless). The local head is only set after all data is downloaded.
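
Content addressing is what makes interrupted clones safe to re-run: the same bytes always map to the same ID, so a duplicate write changes nothing. A toy sketch, using SHA-256 as a stand-in for the real ContentId scheme:

```python
import hashlib

class MemoryCAS:
    """Toy content-addressed store: key = hash of value, so puts are idempotent."""
    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        cid = hashlib.sha256(data).hexdigest()
        self.blobs[cid] = data  # re-writing identical bytes is a no-op
        return cid

cas = MemoryCAS()
a = cas.put(b"commit-1")
b = cas.put(b"commit-1")   # a "re-fetched page": same CID, harmless
print(a == b, len(cas.blobs))  # prints: True 1
```

Because the head pointer is only advanced after all blobs land, a partially-written store is never observable as a valid but truncated ledger.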

Examples

# Clone a ledger from a configured remote
fluree clone origin mydb

# Full workflow: add remote, then clone
fluree remote add production https://api.example.com --token @~/.fluree/token
fluree clone production customers

# Origin-based clone (no remote setup needed)
fluree clone --origin http://localhost:8090 mydb

# Origin-based clone with auth
fluree clone --origin https://api.example.com --token @~/.fluree/token mydb

# Clone without index data (faster; run fluree reindex afterward if needed)
fluree clone --no-indexes origin mydb

# Clone commits + indexes but skip original transaction payloads
fluree clone --no-txns origin mydb

# Smallest possible clone — commits only (no indexes, no transactions)
fluree clone --no-indexes --no-txns origin mydb

Output

Successful clone (via pack, with index data):

Cloning 'mydb:main' from 'origin' (remote t=1042)...
  fetched 2084 object(s) via pack
✓ Cloned 'mydb:main' (1042 commits, head t=1042)
  → upstream set to 'origin/mydb:main'

With --no-indexes (commits and txns only), the object count will be lower and the local index head is not set until you run fluree reindex.

Successful clone (fallback to paginated export):

Cloning 'mydb:main' from 'origin' (remote t=1042)...
  fetched 500 commits...
  fetched 1000 commits...
  fetched 1042 commits...
✓ Cloned 'mydb:main' (1042 commits, head t=1042)
  → upstream set to 'origin/mydb:main'

Origin-based clone:

Cloning 'mydb:main' from 'http://localhost:8090' (remote t=50)...
  fetched 100 object(s) via pack
✓ Cloned 'mydb:main' (50 commit(s), head t=50)

Remote ledger has no commits:

Remote ledger 'mydb:main' has no commits (t=0), nothing to clone.

Errors

Error                        Description
Remote not configured        Run fluree remote add <name> <url> first
Ledger not found on remote   Verify the ledger name matches the remote server
Auth failure                 Token missing or lacks fluree.storage.* permissions
Local ledger already exists  Drop the existing ledger first

Limitations

  • Post-clone indexing: If you used --no-indexes, run fluree reindex to build a binary index locally. Without an index, queries replay from commits and can be slow for large ledgers. When index data is transferred by default (no --no-indexes), the local index head is set and no reindex is needed for the core ledger.
  • Missing transactions: If you used --no-txns, the original transaction payloads for historical commits are permanently unavailable on the local clone (re-pull will not fetch them unless you explicitly re-clone without the flag). The ledger state remains queryable; only transaction-level inspection and replay are affected.
  • Graph source indexes not replicated: Graph source snapshots (BM25/vector/geo, etc.) are not replicated by fluree clone yet. After cloning, rebuild graph source indexes in the target environment as needed.

See Also

  • pull - Pull new commits from upstream
  • push - Push local commits to upstream
  • remote - Configure remote servers
  • upstream - Configure upstream tracking

fluree track

Track a remote ledger without storing local data. Tracked ledgers route reads and writes to the configured remote server while keeping a lightweight record locally so you can use short aliases and the active-ledger shortcut.

Usage

fluree track <SUBCOMMAND>

Subcommands

fluree track add

Start tracking a remote ledger under a local alias.

Usage:

fluree track add <LEDGER> [--remote <NAME>] [--remote-alias <NAME>]

Arguments:

Argument  Description
<LEDGER>  Local alias for the tracked ledger

Options:

Option                 Description
--remote <NAME>        Remote name (e.g., origin). Defaults to the only configured remote if unambiguous.
--remote-alias <NAME>  Alias on the remote (defaults to the local alias)

Examples:

# Track a remote ledger using the same name locally
fluree track add production --remote origin

# Use a different local alias
fluree track add prod --remote origin --remote-alias production

fluree track remove

Stop tracking a remote ledger. Local data is not affected (tracked ledgers have none).

Usage:

fluree track remove <LEDGER>

Argument  Description
<LEDGER>  Local alias to stop tracking

fluree track list

List all currently tracked ledgers and the remote each resolves to.

Usage:

fluree track list

fluree track status

Show status of tracked ledger(s) by querying the configured remote for each — commit t, index t, and head IDs.

Usage:

fluree track status [LEDGER]

Argument  Description
[LEDGER]  Local alias (shows all tracked ledgers if omitted)

Examples:

# Status of all tracked ledgers
fluree track status

# Status for a single tracked ledger
fluree track status production

Description

A tracked ledger is a local pointer to a remote ledger. Queries, transactions, and most administrative commands against a tracked alias are transparently forwarded to the remote. This lets you work against a hosted ledger using the same CLI flow as a local ledger — including the active-ledger shortcut (fluree use), without syncing commit/index data to disk.

Use fluree clone instead when you need a full local copy of a remote ledger’s data.

See Also

  • remote - Manage named remote servers
  • clone - Clone a remote ledger locally (with data)
  • use - Switch active ledger
  • list - List local and tracked ledgers

server

Manage the Fluree HTTP server from the CLI. The server inherits the same .fluree/ context (config file, storage path) as the CLI — one directory, two modes of interaction.

Subcommands

Subcommand  Description
run         Run the server in the foreground (Ctrl-C to stop)
start       Start the server as a background process
stop        Stop a backgrounded server
status      Show server status (PID, address, health)
restart     Stop and restart a backgrounded server
logs        View server logs

Common Options

These options are available on run, start, and restart:

Option                      Description
--listen-addr <ADDR>        Listen address (e.g., 0.0.0.0:8090)
--storage-path <PATH>       Storage path override (local file storage)
--connection-config <FILE>  JSON-LD connection config for S3, DynamoDB, etc.
--log-level <LEVEL>         Log level (trace, debug, info, warn, error)
--profile <NAME>            Configuration profile to activate
-- <ARGS>...                Additional server flags (passed through to server config)

--storage-path and --connection-config are mutually exclusive. Use --storage-path for local file storage or --connection-config for remote backends (S3, DynamoDB, split storage). See Configuration for details.

When no flags are provided, the server discovers its configuration using the same search as the CLI: it walks up from the current working directory looking for a .fluree/config.toml (or config.jsonld), then falls back to the global Fluree config directory ($FLUREE_HOME, or the platform config directory — see Configuration). Server settings live under the [server] section. The CLI’s --config flag is also honored.
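The walk-up search can be sketched in shell. This is illustrative only — the CLI implements the discovery internally and also honors --config and profiles; only the $FLUREE_HOME fallback is shown here:

```shell
# Sketch of config discovery: walk up from a directory looking for
# .fluree/config.toml or .fluree/config.jsonld, then fall back to the
# global config directory (represented by $FLUREE_HOME for brevity).
find_fluree_config() {
  dir=$1
  while :; do
    for f in "$dir/.fluree/config.toml" "$dir/.fluree/config.jsonld"; do
      if [ -f "$f" ]; then
        printf '%s\n' "$f"
        return 0
      fi
    done
    if [ "$dir" = "/" ]; then break; fi
    dir=$(dirname "$dir")
  done
  # Global fallback (the real CLI also checks the platform config directory)
  if [ -n "${FLUREE_HOME:-}" ] && [ -f "${FLUREE_HOME:-}/config.toml" ]; then
    printf '%s\n' "$FLUREE_HOME/config.toml"
  fi
}
```

Running the function from a nested subdirectory finds the nearest enclosing .fluree/config.toml, which is why server and CLI invoked from the same project tree resolve the same configuration.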

run

Run the server in the foreground. Logs go to stderr. Press Ctrl-C for graceful shutdown.

# Start with defaults from config.toml
fluree server run

# Override listen address
fluree server run --listen-addr 127.0.0.1:9090

# S3 + DynamoDB backend
fluree server run --connection-config /etc/fluree/connection.jsonld

# Pass through advanced server flags
fluree server run -- --cors-enabled --indexing-enabled

start

Start the server as a background daemon. Writes PID and metadata to .fluree/ and redirects output to .fluree/server.log.

# Start in background
fluree server start

# Preview resolved config without starting
fluree server start --dry-run

# Start with overrides
fluree server start --listen-addr 0.0.0.0:8090 --log-level debug

The --dry-run flag prints the fully resolved configuration (config file + env + flag overrides merged) without actually starting the server. Useful for debugging “why is it using port X?”.

stop

Stop a backgrounded server by sending SIGTERM and waiting for graceful shutdown (up to 10 seconds).

fluree server stop

# Force kill after timeout
fluree server stop --force

status

Check whether the server is running. Shows PID, listen address, uptime, storage path, and performs an HTTP health check.

fluree server status

Example output:

ok: Server is running
  pid:          12345
  listen_addr:  0.0.0.0:8090
  storage_path: /path/to/.fluree/storage
  started_at:   2026-02-16T10:30:00Z
  uptime:       2h 15m 30s
  health:       ok
  log:          /path/to/.fluree/server.log

When using --connection-config, the status shows the connection config path instead of the storage path:

ok: Server is running
  pid:          12345
  listen_addr:  0.0.0.0:8090
  connection:   /etc/fluree/connection.jsonld
  started_at:   2026-02-16T10:30:00Z
  uptime:       2h 15m 30s
  health:       ok

restart

Stop and restart a backgrounded server. Recovers the original arguments from .fluree/server.meta.json. New flag overrides can be applied on restart.

fluree server restart

# Restart with a different log level
fluree server restart --log-level debug

logs

View server log output from .fluree/server.log.

# Last 50 lines (default)
fluree server logs

# Last 100 lines
fluree server logs -n 100

# Follow (like tail -f)
fluree server logs -f

Auto-Routing

When a local server is running (started via fluree server start), CLI commands that support remote execution are automatically routed through the server’s HTTP API. This applies to:

  • fluree query
  • fluree insert
  • fluree upsert
  • fluree list
  • fluree info

The CLI detects the running server by checking .fluree/server.meta.json and verifying the PID is alive. When auto-routing is active, you’ll see a hint on stderr:

  server: routing through local server at 0.0.0.0:8090 (use --direct to bypass)

Opting out

Use the --direct global flag to bypass auto-routing and execute directly via the CLI’s file-based path:

# Route through server (default when server is running)
fluree query 'SELECT * WHERE { ?s ?p ?o } LIMIT 10'

# Bypass server, execute directly
fluree query --direct 'SELECT * WHERE { ?s ?p ?o } LIMIT 10'

Crash detection

If the server has crashed or been killed, the CLI detects the stale PID and falls back to direct execution with a notice:

  notice: local server (pid 12345) is no longer running; executing directly

Use fluree server status to check server health, or fluree server logs to view crash output.
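The liveness test behind this follows the usual Unix pidfile pattern. A minimal sketch (not the CLI's actual code, which reads server.meta.json rather than just the pidfile):

```shell
# Stale-PID detection against a pidfile like .fluree/server.pid.
# `kill -0` sends no signal; it only tests whether the PID is deliverable.
server_alive() {
  pidfile=$1
  if [ ! -f "$pidfile" ]; then return 1; fi
  kill -0 "$(cat "$pidfile")" 2>/dev/null
}
```

If the check fails, the recorded PID is stale and the CLI falls back to direct execution as described above.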

Runtime Files

When a background server is running, these files are created in the .fluree/ data directory:

File              Description
server.pid        PID of the background server process
server.log        stdout + stderr from the background server
server.meta.json  Metadata for restart and status (PID, address, args, start time)

These files are cleaned up automatically by fluree server stop.

Configuration

The server uses the same config file as the CLI (discovered via walk-up or global fallback — see above). Server-specific settings live under the [server] section:

[server]
listen_addr = "0.0.0.0:8090"
storage_path = "/var/lib/fluree"
log_level = "info"
cors_enabled = true
# cache_max_mb = 4096  # global cache budget (MB); default: tiered fraction of RAM (30% <4GB, 40% 4-8GB, 50% ≥8GB)

[server.indexing]
enabled = true
reindex_min_bytes = 100_000
# reindex_max_bytes defaults to 20% of system RAM; uncomment to override
# reindex_max_bytes = 536_870_912  # 512 MB

For S3/DynamoDB backends, use connection_config instead of storage_path:

[server]
connection_config = "/etc/fluree/connection.jsonld"
cache_max_mb = 4096

[server.indexing]
enabled = true

Indexing settings live under the [server.indexing] subsection, not directly on [server]. Authentication settings similarly use [server.auth.events], [server.auth.data], etc.

See Configuration for the full list of server options.

Feature Flags

The server subcommand requires the server Cargo feature (enabled by default). If compiled without it:

fluree server run
# error: server support not compiled. Rebuild with `--features server`.

For S3/DynamoDB support via --connection-config, the aws feature must be enabled:

cargo build -p fluree-db-cli --features aws

Without this feature, S3 storage configs in the connection config will produce a clear error at startup.

fluree memory

Developer memory — store and recall facts, decisions, and constraints.

This page is the CLI command reference. For conceptual background, IDE setup, team workflows, and the full schema, see the Memory section of the docs.

Usage

fluree memory <COMMAND>

Subcommands

Command        Description
init           Initialize the memory store (creates the __memory ledger)
add            Store a new memory
recall         Search and rank relevant memories
update <ID>    Update a memory in place
forget <ID>    Delete a memory
status         Show memory store status
export         Export all current memories as JSON
import <FILE>  Import memories from a JSON file
mcp-install    Install MCP configuration for an IDE

Description

The memory system stores project knowledge as RDF triples in a dedicated __memory Fluree ledger. Memories persist across sessions and are searchable by keyword-scored recall.

Run fluree memory init before using other memory commands. The MCP server auto-initializes on first tool call.

fluree memory init

Initialize the memory store and optionally configure MCP for detected AI coding tools. Idempotent — safe to run multiple times.

fluree memory init [OPTIONS]

Options

Option     Description
--yes, -y  Auto-confirm all MCP installations (non-interactive)
--no-mcp   Skip AI tool detection and MCP configuration entirely

What init does

  1. Creates the __memory ledger and transacts the memory schema.
  2. Creates .fluree-memory/ at the project root with repo.ttl, .gitignore, and .local/user.ttl.
  3. Migrates existing memories — if the ledger already has memories (e.g., from a pre-TTL version), they are exported to the appropriate .ttl files.
  4. Detects AI coding tools (Claude Code, Cursor, VS Code, Windsurf, Zed) and offers to install MCP config for each.

Example

$ fluree memory init

Memory store initialized at /path/to/project/.fluree-memory

Repo memories are stored in .fluree-memory/repo.ttl (git-tracked).
Commit this directory to share project knowledge with your team.

Detected AI coding tools:
  - Claude Code (already configured)
  - Cursor
  - VS Code (Copilot) (already configured)

Install MCP config for Cursor? [Y/n] Y
  Installed: .cursor/mcp.json
  Installed: .cursor/rules/fluree_rules.md

Configured 1 tool.

With --yes: auto-confirms all installations without prompting. In a non-interactive shell (piped stdin) without --yes, MCP installation is skipped with a message.

fluree memory add

Store a new memory.

fluree memory add [OPTIONS]

Options

Option                 Description
--kind <KIND>          Memory kind: fact, decision, constraint (default: fact)
--text <TEXT>          Content text (or provide via stdin)
--tags <T1,T2>         Required. Comma-separated tags for categorization (the primary recall signal)
--refs <R1,R2>         Comma-separated file/artifact references
--severity <SEV>       For constraints: must, should, prefer
--scope <SCOPE>        Scope: repo (default) or user
--rationale <TEXT>     Why this memory exists (available on any kind)
--alternatives <TEXT>  Alternatives considered (comma-separated)
--format <FMT>         Output format: text (default) or json

Examples

# Store a fact
fluree memory add --kind fact --text "Tests use cargo nextest" --tags testing,cargo

# Store a constraint with severity
fluree memory add --kind constraint --text "Never suppress dead code with underscore prefix" \
  --tags code-style --severity must

# Store from stdin
echo "The index format uses postcard encoding" | fluree memory add --kind fact --tags indexer

# Store a decision with rationale and alternatives
fluree memory add --kind decision --text "Use postcard for compact index encoding" \
  --rationale "no_std compatible, smaller output than bincode" \
  --alternatives "bincode, CBOR, MessagePack" --refs fluree-db-indexer/

# Store a fact with rationale
fluree memory add --kind fact --text "PSOT queries return supersets — post-filter required" \
  --rationale "B-tree range scan can't filter on non-key predicates" --tags query,index

Output (text):

Stored memory: mem:fact-01JDXYZ5A2B3C4D5E6F7G8H9J0

Secret detection

If the content contains secrets (API keys, passwords, tokens, connection strings), they are automatically redacted and a warning is printed:

  warning: secrets detected in content — storing redacted version.
  Original content contained sensitive data that was replaced with [REDACTED].
Stored memory: mem:fact-01JDXYZ...
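The CLI's actual detection rules aren't documented on this page; conceptually, redaction is pattern-based replacement of values that follow secret-like key names. A toy sed sketch of the idea (the patterns here are illustrative, not Fluree's):

```shell
# Toy secret redaction: replace the value after common secret key names.
# The real CLI recognizes a much broader set (API keys, tokens,
# connection strings) before storing the memory.
redact() {
  sed -E 's/(key|token|password|secret)(=|: )[^[:space:]]+/\1\2[REDACTED]/g'
}
```

For example, `echo 'deploy with api_key=sk-12345' | redact` yields `deploy with api_key=[REDACTED]`.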

fluree memory recall

Search and retrieve relevant memories ranked by score.

fluree memory recall <QUERY> [OPTIONS]

Arguments

Argument  Description
<QUERY>   Natural language search query

Options

Option           Description
-n, --limit <N>  Maximum results per page (default: 3)
--offset <N>     Skip the first N results for pagination (default: 0)
--kind <KIND>    Filter to a specific memory kind
--tags <T1,T2>   Filter to memories with these tags
--scope <SCOPE>  Filter by scope: repo or user
--format <FMT>   Output: text (default), json, or context (XML for LLM)

Examples

# Basic recall (returns top 3)
fluree memory recall "how to run tests"

# Get the next page
fluree memory recall "how to run tests" --offset 3

# Return up to 10 results
fluree memory recall "error handling" -n 10

# Filter by kind and tags
fluree memory recall "error handling" --kind constraint --tags errors

# Output as XML context (for LLM injection)
fluree memory recall "testing patterns" --format context

Output (text):

Recall: "how to run tests" (2 matches)

1. [score: 13.0] mem:fact-01JDXYZ...
   Tests use cargo nextest
   Tags: testing, cargo

2. [score: 8.0] mem:fact-01JDABC...
   Integration tests use assert_cmd + predicates
   Tags: testing

  (showing results 1–3; use --offset 3 for more)

Output (context):

<memory-context>
  <memory id="mem:fact-01JDXYZ..." kind="fact" score="13.0">
    <content>Tests use cargo nextest</content>
    <tags>testing, cargo</tags>
  </memory>
  <pagination shown="1" offset="0" total_in_store="13" />
</memory-context>

When results are cut off, the pagination element includes a hint:

  <pagination shown="3" offset="0" limit="3" total_in_store="13">Results 1–3. Use offset=3 to retrieve more.</pagination>

fluree memory update

Update a memory in place. Only the fields you provide are changed — the ID stays the same. History is tracked via git.

fluree memory update <ID> [OPTIONS]

Options

Option          Description
--text <TEXT>   New content text
--tags <T1,T2>  New tags (replaces all existing)
--refs <R1,R2>  New artifact refs (replaces all existing)
--format <FMT>  Output: text or json

Example

fluree memory update mem:fact-01JDXYZ... --text "Tests use cargo nextest with --no-fail-fast"

Output:

Updated: mem:fact-01JDXYZ...

fluree memory forget

Delete a memory by retracting all its triples.

fluree memory forget <ID>

Output:

Forgotten: mem:fact-01JDXYZ...

fluree memory status

Show a summary of the memory store.

fluree memory status

Output:

Memory Store Status
  Total memories: 12
  Total tags:     25
  By kind:
    fact: 7
    decision: 2
    constraint: 3

fluree memory export / import

Export all current (non-superseded) memories as JSON, or import from a file.

fluree memory export > memories.json
fluree memory import memories.json

fluree memory mcp-install

Install MCP configuration for an IDE so agents can use memory tools.

fluree memory mcp-install [--ide <IDE>]

Options

Option       Description
--ide <IDE>  Target IDE (auto-detected if omitted)

Supported IDE values:

Value        Config written                                 Notes
claude-code  claude mcp add (local scope → ~/.claude.json)  Also appends to CLAUDE.md
vscode       .vscode/mcp.json (key: servers)                Also installs .vscode/fluree_rules.md
cursor       .cursor/mcp.json (key: mcpServers)             Also installs .cursor/rules/fluree_rules.md
windsurf     ~/.codeium/windsurf/mcp_config.json (global)
zed          .zed/settings.json (key: context_servers)      Skips if JSONC (comments) detected

Legacy aliases: claude-vscode and github-copilot map to vscode.

When --ide is omitted, the first unconfigured detected tool is used; defaults to claude-code if none detected.

Example

fluree memory mcp-install --ide cursor

Output:

  Installed: .cursor/mcp.json
  Installed: .cursor/rules/fluree_rules.md

Cursor’s MCP configuration supports stdio servers with a type field and config interpolation like ${workspaceFolder}. A portable repo-scoped setup looks like:

{
  "mcpServers": {
    "fluree-memory": {
      "type": "stdio",
      "command": "fluree",
      "args": ["mcp", "serve", "--transport", "stdio"],
      "env": {
        "FLUREE_HOME": "${workspaceFolder}/.fluree"
      }
    }
  }
}

Setting FLUREE_HOME ensures the MCP server uses the current workspace’s .fluree/ directory even if Cursor spawns the process from a different working directory. That keeps repo memory/logs under <repo>/.fluree-memory/ instead of a global location.

Troubleshooting: repo vs global memory

  • Repo-scoped expected:
    • Memories: <repo>/.fluree-memory/repo.ttl
    • MCP log: <repo>/.fluree-memory/.local/mcp.log (should show client initialized after a full Cursor restart)
  • If it’s using global dirs on macOS:
    • Memories/log: ~/Library/Application Support/.fluree-memory/...
    • Fix: ensure your Cursor config sets env.FLUREE_HOME = "${workspaceFolder}/.fluree" and restart Cursor fully.

See Also

  • Memory overview — what it is, when to use it, how it fits into your workflow
  • Memory getting started — install, quickstart, and per-IDE setup guides
  • Memory concepts — repo vs user memory, supersession, recall ranking, secrets
  • Memory guides — team workflows, rules-file customization, migrating from plain markdown
  • Memory reference — IDE support matrix, mem: schema, TTL file format
  • mcp — MCP server for IDE agent integration

fluree mcp

Model Context Protocol (MCP) server for IDE agent integration.

Usage

fluree mcp <COMMAND>

Subcommands

Command  Description
serve    Start the MCP server

fluree mcp serve

Start an MCP server that exposes developer memory tools to IDE agents.

fluree mcp serve [--transport <TRANSPORT>]

Options

Option                   Description
--transport <TRANSPORT>  Transport protocol: stdio (default)

The stdio transport reads JSON-RPC requests from stdin and writes responses to stdout. This is the standard transport for IDE integration — the IDE spawns the process and communicates over pipes.

Available tools

The MCP server exposes 6 tools:

Tool           Description
memory_add     Store a new memory (fact, decision, constraint, preference, artifact)
memory_recall  Search and retrieve relevant memories as XML context. Accepts query,
               limit (default: 3), offset (default: 0), kind, tags, scope. Returns
               a <pagination> element indicating whether more results are available.
memory_update  Update (supersede) an existing memory
memory_forget  Delete a memory
memory_status  Show memory store summary
kg_query       Execute raw SPARQL against the memory graph

The server auto-initializes the memory store on first tool call. No separate fluree memory init is needed.

IDE configuration

The easiest way to configure your IDE is with fluree memory mcp-install:

fluree memory mcp-install --ide cursor

Or manually add to your IDE’s MCP config:

{
  "mcpServers": {
    "fluree-memory": {
      "type": "stdio",
      "command": "/path/to/fluree",
      "args": ["mcp", "serve", "--transport", "stdio"],
      "env": {
        "FLUREE_HOME": "${workspaceFolder}/.fluree"
      }
    }
  }
}

Testing with JSON-RPC

To test the server directly, pipe JSON-RPC to stdin:

printf '%s\n' \
  '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke","version":"0.0"}}}' \
  '{"jsonrpc":"2.0","method":"notifications/initialized","params":{}}' \
  '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}' \
  | fluree mcp serve --transport stdio
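
After initialization, a tool can be invoked with a standard MCP tools/call request. The arguments object below mirrors the memory_recall parameters listed above (the query string is just an example):

```json
{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"memory_recall","arguments":{"query":"how to run tests","limit":3}}}
```

Append this line to the printf pipeline above (after the tools/list request) to exercise a full recall round trip.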

Tracing

CLI tracing is disabled when running fluree mcp serve to avoid any log output on stderr that could interfere with the JSON-RPC protocol.

See Also

  • memory - Developer memory commands; fluree memory mcp-install writes the IDE config that launches this server

fluree iceberg

Manage Apache Iceberg table connections.

Subcommands

Subcommand  Description
map         Map an Iceberg table as a graph source
list        List Iceberg-family graph sources (Iceberg and R2RML)
info        Show details for an Iceberg-family graph source
drop        Drop an Iceberg-family graph source

fluree iceberg map

Map an Iceberg table as a queryable graph source.

Usage

fluree iceberg map <NAME> [OPTIONS]

Arguments

Argument  Description
<NAME>    Graph source name (e.g., “warehouse-orders”)

Options

Catalog mode:

Option         Description
--mode <MODE>  Catalog mode: rest (default) or direct

REST catalog mode options:

Option                   Description
--catalog-uri <URI>      REST catalog URI (required for rest mode)
--table <ID>             Table identifier in namespace.table format (required if not specified in the R2RML mapping)
--warehouse <NAME>       Warehouse identifier
--no-vended-credentials  Disable vended credentials (enabled by default)

Direct S3 mode options:

Option                  Description
--table-location <URI>  S3 table location (required for direct mode, e.g., s3://bucket/warehouse/ns/table)

R2RML mapping:

Option               Description
--r2rml <PATH>       R2RML mapping file (Turtle format, required). Defines how table rows become RDF triples. Table references come from the mapping’s rr:tableName entries.
--r2rml-type <TYPE>  Mapping media type (e.g., text/turtle); inferred from extension if omitted

Authentication:

Option                           Description
--auth-bearer <TOKEN>            Bearer token for REST catalog authentication
--oauth2-token-url <URL>         OAuth2 token URL for client credentials auth
--oauth2-client-id <ID>          OAuth2 client ID
--oauth2-client-secret <SECRET>  OAuth2 client secret

S3 overrides:

Option                Description
--s3-region <REGION>  S3 region override
--s3-endpoint <URL>   S3 endpoint override (for MinIO, LocalStack)
--s3-path-style       Use path-style S3 URLs

Other:

Option           Description
--remote <NAME>  Execute against a remote server (by remote name)
--branch <NAME>  Branch name (defaults to “main”)

Description

Maps an Apache Iceberg table as a graph source that can be queried using SPARQL or JSON-LD queries. The table is accessed read-only; Fluree does not modify the Iceberg table.

An R2RML mapping (--r2rml) is required to define how Iceberg table rows are transformed into RDF triples.
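
A minimal mapping might look like the following sketch, using the standard W3C R2RML vocabulary. The table, column, and class names are invented for illustration; see the R2RML reference for the full vocabulary:

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.org/> .

<#Airlines> a rr:TriplesMap ;
  rr:logicalTable [ rr:tableName "openflights.airlines" ] ;
  rr:subjectMap [
    rr:template "http://example.org/airline/{airline_id}" ;
    rr:class ex:Airline
  ] ;
  rr:predicateObjectMap [
    rr:predicate ex:name ;
    rr:objectMap [ rr:column "name" ]
  ] .
```

Each row of the table yields one subject IRI (from the template) plus one triple per predicate-object map.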

Two catalog modes are supported:

  • REST mode (default): Connects to an Iceberg REST catalog (e.g., Apache Polaris) to discover table metadata. Supports vended credentials and warehouse selection.
  • Direct S3 mode: Reads table metadata directly from S3 by resolving version-hint.text in the table’s metadata/ directory. No catalog server required.
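
Direct mode uses the Iceberg file-system-table convention: metadata/version-hint.text holds the latest version number N, which names the vN.metadata.json file. A local sketch of that resolution (the CLI performs the equivalent reads against S3):

```shell
# Resolve the current metadata file for an Iceberg table directory,
# following the version-hint.text convention used by file-system tables.
latest_metadata() {
  table=$1
  v=$(cat "$table/metadata/version-hint.text")
  printf '%s\n' "$table/metadata/v$v.metadata.json"
}
```

So a hint file containing `3` resolves to `metadata/v3.metadata.json`, which in turn points at the table's current snapshot and data files.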

Examples

# REST catalog with R2RML mapping
fluree iceberg map airlines \
  --catalog-uri https://polaris.example.com/api/catalog \
  --r2rml mappings/airlines.ttl \
  --auth-bearer $POLARIS_TOKEN

# REST catalog with explicit table and warehouse
fluree iceberg map warehouse-orders \
  --catalog-uri https://polaris.example.com/api/catalog \
  --table sales.orders \
  --r2rml mappings/orders.ttl \
  --auth-bearer $POLARIS_TOKEN \
  --warehouse my-warehouse

# Direct S3 (no catalog server)
fluree iceberg map execution-log \
  --mode direct \
  --table-location s3://my-bucket/warehouse/logs/execution_log \
  --r2rml mappings/execution_log.ttl \
  --s3-region us-east-1

# OAuth2 authentication
fluree iceberg map orders \
  --catalog-uri https://polaris.example.com/api/catalog \
  --table sales.orders \
  --r2rml mappings/orders.ttl \
  --oauth2-token-url https://auth.example.com/token \
  --oauth2-client-id my-client \
  --oauth2-client-secret $CLIENT_SECRET

# Create the graph source on a remote Fluree server
fluree iceberg map warehouse-orders \
  --remote origin \
  --catalog-uri https://polaris.example.com/api/catalog \
  --table sales.orders \
  --r2rml mappings/orders.ttl

Output

Mapped Iceberg table as R2RML graph source 'airlines:main'
  Table:       openflights.airlines
  Catalog:     https://polaris.example.com/api/catalog
  R2RML:       mappings/airlines.ttl
  TriplesMaps: 3
  Connection:  verified
  Mapping:     validated

After Mapping

Once mapped, the graph source appears in standard commands:

# Listed alongside ledgers
fluree list

# Inspect configuration
fluree info warehouse-orders

# Query via SPARQL GRAPH pattern
fluree query mydb 'SELECT ?id ?total FROM <mydb:main> WHERE { GRAPH <warehouse-orders:main> { ?o ex:id ?id ; ex:total ?total } }'

# Remove the mapping
fluree drop warehouse-orders --force

Feature Flag

Requires the iceberg feature flag. Without it, the command returns:

error: Iceberg support not compiled. Rebuild with `--features iceberg`.

See Also

  • Iceberg / Parquet - Iceberg integration details
  • R2RML - R2RML mapping reference
  • list - List ledgers and graph sources
  • info - Show graph source details
  • drop - Remove a graph source

fluree iceberg list

List Iceberg-family graph sources (Iceberg and R2RML types).

Usage

fluree iceberg list [--remote <NAME>]

Examples

# Local
fluree iceberg list

# Remote
fluree iceberg list --remote origin

fluree iceberg info

Show details for an Iceberg-family graph source.

Usage

fluree iceberg info <NAME> [--remote <NAME>]

Examples

# Local
fluree iceberg info warehouse-orders

# Remote
fluree iceberg info warehouse-orders --remote origin

fluree iceberg drop

Drop an Iceberg-family graph source. This command only targets Iceberg/R2RML graph sources; it does not fall back to dropping ledgers of the same name.

Usage

fluree iceberg drop <NAME> --force [--remote <NAME>]

Examples

# Local
fluree iceberg drop warehouse-orders --force

# Remote
fluree iceberg drop warehouse-orders --force --remote origin

fluree completions

Generate shell completions.

Usage

fluree completions <SHELL>

Arguments

Argument  Description
<SHELL>   Shell to generate completions for

Supported Shells

  • bash
  • zsh
  • fish
  • powershell
  • elvish

Description

Generates shell completion scripts that enable tab-completion for fluree commands, options, and arguments.

Installation

Bash

# Add to ~/.bashrc
eval "$(fluree completions bash)"

# Or save to a file
fluree completions bash > /etc/bash_completion.d/fluree

Zsh

# Add to ~/.zshrc
eval "$(fluree completions zsh)"

# Or save to completions directory
fluree completions zsh > ~/.zfunc/_fluree
# Then add to ~/.zshrc: fpath=(~/.zfunc $fpath)

Fish

fluree completions fish > ~/.config/fish/completions/fluree.fish

PowerShell

# Add to your PowerShell profile
fluree completions powershell | Out-String | Invoke-Expression

Examples

# Generate bash completions
fluree completions bash

# Generate zsh completions and save
fluree completions zsh > ~/.zfunc/_fluree

Usage After Installation

After installing completions, you can use tab to complete:

fluree <TAB>        # Shows all commands
fluree que<TAB>     # Completes to "query"
fluree query --<TAB> # Shows available options

Getting Started

Welcome to Fluree! This section will guide you through the essential steps to start using Fluree for your graph database needs.

Quick Navigation

Fluree for SQL Developers

Coming from PostgreSQL, MySQL, or SQL Server? This guide maps SQL concepts to Fluree equivalents, shows the same operations in both languages, and highlights where Fluree gives you capabilities that relational databases don’t have.

Quickstart: Run the Server

Get Fluree up and running in minutes. Learn how to:

  • Install and run the Fluree server
  • Configure basic settings
  • Verify the server is running
  • Access the HTTP API

Quickstart: Create a Ledger

Create your first ledger to store data. Learn how to:

  • Create a new ledger using the API
  • Understand ledger IDs and branching
  • Set up initial configuration
  • Verify ledger creation

Quickstart: Write Data

Start writing data to your ledger. Learn how to:

  • Insert new entities (basic inserts)
  • Upsert data (idempotent transactions; predicate-level replacement for supplied predicates)
  • Update existing data (WHERE/DELETE/INSERT pattern)
  • Understand JSON-LD transaction format

Quickstart: Query Data

Query your data using Fluree’s powerful query languages. Learn how to:

  • Write basic JSON-LD queries
  • Write basic SPARQL queries
  • Filter and select data
  • Understand query results

Tutorial: End-to-End

Build a knowledge base that combines Fluree’s differentiating features in one workflow:

  • Full-text search with BM25 relevance ranking
  • Time travel to compare current and historical state
  • Branching to experiment safely
  • Policies for role-based access control

Using Fluree as a Rust Library

Embed Fluree directly in your Rust applications. Learn how to:

  • Add Fluree as a dependency in Cargo.toml
  • Use the Rust API programmatically
  • Implement common patterns (insert, query, update)
  • Integrate BM25 and vector search
  • Handle errors and configuration
  • Write tests with Fluree

What is Fluree?

Fluree is a temporal graph database that stores data as RDF triples with built-in support for:

  • Time Travel: Query data as it existed at any point in time
  • Full-Text Search: Integrated BM25 indexing for powerful text search
  • Vector Search: Approximate nearest neighbor (ANN) queries
  • Policy Enforcement: Fine-grained, data-level access control
  • Verifiable Data: Cryptographically signed transactions
  • Graph Sources: Integration with external data sources (Iceberg, R2RML)

Learning Path

For HTTP API users (server-based):

  1. Bridge the gap: Fluree for SQL Developers if coming from relational databases
  2. Start with the Server: Run the Server to get Fluree running
  3. Create Your First Ledger: Create a Ledger to set up your database
  4. Add Data: Write Data to insert your first entities
  5. Query Your Data: Query Data to retrieve and explore
  6. See it all together: End-to-End Tutorial — search, time travel, branching, and policies in one workflow
  7. Core Concepts: Read Concepts to understand Fluree’s architecture
  8. Practical Guides: Explore Cookbooks for search, time travel, branching, policies, and SHACL validation
  9. Deep Dive: Explore Query, Transactions, and Security
  10. Production Ready: Review Operations for deployment guidance

For Rust developers (embedded library):

  1. Rust API Guide: Using Fluree as a Rust Library for embedding Fluree in your application
  2. Core Concepts: Concepts to understand how Fluree works
  3. Practical Guides: Cookbooks for search, time travel, branching, policies, and validation
  4. Advanced Queries: Query for complex query patterns
  5. Transactions: Transactions for data modification patterns
  6. Production Ready: Operations and Dev Setup

Prerequisites

  • Familiarity with JSON format
  • HTTP client (curl, Postman, or your programming language’s HTTP library)
  • No graph database or RDF experience required — Fluree for SQL Developers bridges the gap from relational databases

Support and Resources

  • Documentation: This documentation provides comprehensive coverage
  • API Reference: See HTTP API for endpoint details
  • Troubleshooting: Check Troubleshooting for common issues

Let’s get started!

Fluree for SQL Developers

If you’ve spent years with PostgreSQL, MySQL, or SQL Server and are encountering a graph database for the first time, this guide bridges the gap. It maps SQL concepts you already know to their Fluree equivalents, shows you the same operations in both languages, and highlights where Fluree gives you capabilities that relational databases simply don’t have.

The mental model shift

In SQL, you design tables with fixed columns, then insert rows. In Fluree, you make statements about things — and those statements can describe anything, with any properties, at any time.

SQL Concept     Fluree Equivalent              Key Difference
Database        Ledger                         Immutable; every change is preserved
Table           Type (via rdf:type)            No fixed schema required; types are just labels
Row             Entity (identified by IRI)     An entity can have any properties, not just those in a “table”
Column          Predicate (property)           Not tied to a single type; any entity can use any property
Foreign key     Reference (IRI link)           Relationships are first-class, bidirectional, and traversable
Value           Object (literal or reference)  Typed values (string, integer, date, etc.)
Row (one fact)  Flake                          A triple + provenance (graph, transaction time, assert/retract)
NULL            Absence                        Properties simply don’t exist if not set; no nulls

The flake: Fluree’s atomic unit

Every fact in Fluree is stored as a flake — an extended triple that adds provenance. At its core, a flake is a statement: subject → predicate → object, plus metadata about when it was asserted, which graph it belongs to, and whether it’s an assertion or retraction.

ex:alice  schema:name  "Alice"       (graph: default, t: 1, op: assert)
ex:alice  schema:age   30            (graph: default, t: 1, op: assert)
ex:alice  schema:knows ex:bob        (graph: default, t: 1, op: assert)

Think of it as: “Alice’s name is Alice (added in transaction 1).” The provenance is what makes time travel and immutability possible — every change is a new flake, and retractions are recorded alongside assertions.

In SQL terms, imagine a universal table with columns entity_id, attribute, value, graph, transaction, operation — that can represent any data structure without DDL and preserves complete history.
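
As a thought experiment only — Fluree does not store data this way physically — that universal table could be declared as:

```sql
-- Hypothetical SQL rendering of a flake; column names are illustrative.
CREATE TABLE flakes (
  subject   TEXT    NOT NULL,  -- entity IRI
  predicate TEXT    NOT NULL,  -- attribute IRI
  object    TEXT    NOT NULL,  -- literal value or IRI reference
  graph     TEXT    NOT NULL,  -- named graph
  t         BIGINT  NOT NULL,  -- transaction number
  op        BOOLEAN NOT NULL   -- true = assert, false = retract
);
```

"Current state" is then every assert with no later retract for the same (subject, predicate, object) — a view Fluree maintains through its indexes rather than by scanning history.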

Terminology note: In RDF standards, the core unit is called a “triple” (subject-predicate-object). Fluree’s “flake” extends the triple with temporal and provenance metadata. You’ll see both terms in the documentation — “triple” when discussing the RDF data model, “flake” when discussing Fluree’s storage and history.

Side by side: common operations

Creating structure

SQL — Define a table:

CREATE TABLE employees (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  email VARCHAR(255) UNIQUE,
  department VARCHAR(100),
  salary DECIMAL(10,2),
  manager_id INTEGER REFERENCES employees(id)
);

Fluree — Just insert data:

fluree insert '
@prefix schema: <http://schema.org/> .
@prefix ex:     <http://example.org/> .

ex:alice  a schema:Person ;
  schema:name        "Alice Smith" ;
  schema:email       "alice@example.com" ;
  ex:department      "Engineering" ;
  ex:salary          125000 ;
  ex:reportsTo       ex:bob .

ex:bob  a schema:Person ;
  schema:name        "Bob Jones" ;
  schema:email       "bob@example.com" ;
  ex:department      "Engineering" .
'

There’s no CREATE TABLE. Types and properties emerge from the data itself. You can add new properties to any entity at any time without migrations.

Inserting data

SQL:

INSERT INTO employees (name, email, department, salary)
VALUES ('Carol Davis', 'carol@example.com', 'Marketing', 95000);

Fluree (CLI):

fluree insert '
@prefix schema: <http://schema.org/> .
@prefix ex:     <http://example.org/> .

ex:carol  a schema:Person ;
  schema:name        "Carol Davis" ;
  schema:email       "carol@example.com" ;
  ex:department      "Marketing" ;
  ex:salary          95000 .
'

Fluree (HTTP API):

curl -X POST http://localhost:8090/v1/fluree/insert?ledger=mydb:main \
  -H "Content-Type: application/ld+json" \
  -d '{
    "@context": {
      "schema": "http://schema.org/",
      "ex": "http://example.org/"
    },
    "@id": "ex:carol",
    "@type": "schema:Person",
    "schema:name": "Carol Davis",
    "schema:email": "carol@example.com",
    "ex:department": "Marketing",
    "ex:salary": 95000
  }'

Basic queries

SQL:

SELECT name, email FROM employees WHERE department = 'Engineering';

Fluree (SPARQL):

PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>

SELECT ?name ?email
WHERE {
  ?person a schema:Person ;
          schema:name ?name ;
          schema:email ?email ;
          ex:department "Engineering" .
}

Fluree (JSON-LD Query):

{
  "@context": {"schema": "http://schema.org/", "ex": "http://example.org/"},
  "select": ["?name", "?email"],
  "where": [
    {
      "@id": "?person", "@type": "schema:Person",
      "schema:name": "?name",
      "schema:email": "?email",
      "ex:department": "Engineering"
    }
  ]
}

Joins

In SQL, joins are explicit operations. In Fluree, relationships are just triples — “joining” is following a link.

SQL — Find employees and their managers:

SELECT e.name AS employee, m.name AS manager
FROM employees e
JOIN employees m ON e.manager_id = m.id;

Fluree (SPARQL):

PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>

SELECT ?employee ?manager
WHERE {
  ?e schema:name ?employee ;
     ex:reportsTo ?m .
  ?m schema:name ?manager .
}

No JOIN keyword — you just follow the ex:reportsTo link from one entity to another. The database traverses relationships natively.

Multi-hop relationships

This is where graphs shine. “Find everyone in Alice’s reporting chain” requires recursive CTEs in SQL but is natural in a graph.

SQL (recursive CTE):

WITH RECURSIVE chain AS (
  SELECT id, name, manager_id FROM employees WHERE name = 'Alice Smith'
  UNION ALL
  SELECT e.id, e.name, e.manager_id
  FROM employees e JOIN chain c ON e.id = c.manager_id
)
SELECT name FROM chain;

Fluree (SPARQL — property path):

PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>

SELECT ?name
WHERE {
  ex:alice ex:reportsTo+ ?manager .
  ?manager schema:name ?name .
}

The + after ex:reportsTo means “follow this relationship one or more times.” No recursion needed.
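As a rough mental model (plain Python, not Fluree internals), the `+` operator computes the transitive closure over the relationship’s edges:

```python
def follow_plus(edges, start):
    """Follow a relationship one or more times, like SPARQL's `+` path."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for src, dst in edges:
            if src == node and dst not in seen:
                seen.add(dst)          # reachable in one or more hops
                frontier.append(dst)   # keep following from here
    return seen

# alice reports to bob, bob reports to carol
edges = [("ex:alice", "ex:bob"), ("ex:bob", "ex:carol")]
assert follow_plus(edges, "ex:alice") == {"ex:bob", "ex:carol"}
```

The `seen` check also makes the traversal terminate on cyclic reporting chains.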

Aggregation

SQL:

SELECT department, COUNT(*) as count, AVG(salary) as avg_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC;

Fluree (SPARQL):

PREFIX ex: <http://example.org/>

SELECT ?dept (COUNT(?person) AS ?count) (AVG(?salary) AS ?avg_salary)
WHERE {
  ?person ex:department ?dept ;
          ex:salary ?salary .
}
GROUP BY ?dept
ORDER BY DESC(?avg_salary)

Updates

SQL:

UPDATE employees SET salary = 130000 WHERE name = 'Alice Smith';

Fluree (SPARQL UPDATE):

PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>

DELETE { ?person ex:salary ?oldSalary }
INSERT { ?person ex:salary 130000 }
WHERE  { ?person schema:name "Alice Smith" ; ex:salary ?oldSalary }

The WHERE finds Alice, DELETE removes the old salary, and INSERT adds the new one. This is atomic.

Fluree (CLI — upsert for simpler cases):

fluree upsert '{
  "@context": {"schema": "http://schema.org/", "ex": "http://example.org/"},
  "@id": "ex:alice",
  "ex:salary": 130000
}'

Upsert replaces the salary value if Alice already exists, or creates the entity if she doesn’t.

Deletes

SQL:

DELETE FROM employees WHERE name = 'Carol Davis';

Fluree (SPARQL UPDATE):

PREFIX schema: <http://schema.org/>

DELETE { ?person ?p ?o }
WHERE  { ?person schema:name "Carol Davis" ; ?p ?o }

But here’s the key difference: in SQL, the row is gone. In Fluree, the retraction is recorded — you can still query Carol’s data at any previous point in time.

What SQL can’t do

These features have no relational equivalent:

Time travel

Query data as it existed at any point in the past:

# What was Alice's salary before the raise?
fluree query --at 1 'PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>

SELECT ?salary WHERE {
  ?person schema:name "Alice Smith" ; ex:salary ?salary .
}'
# Show the full history of salary changes
fluree history 'PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>

SELECT ?salary ?t ?op WHERE {
  ?person schema:name "Alice Smith" ; ex:salary ?salary .
}'

In SQL, you’d need audit tables, temporal extensions, or trigger-based logging. In Fluree, every change is automatically preserved.

Schema flexibility

Add new properties to any entity without ALTER TABLE:

# Alice now has a phone number — no migration needed
fluree insert '
@prefix schema: <http://schema.org/> .
@prefix ex:     <http://example.org/> .

ex:alice schema:telephone "+1-555-0100" .
'

Different entities of the same “type” can have different properties. There’s no fixed set of columns.

Branching

Fork your data to experiment without affecting production:

fluree branch create experiment
fluree use mydb:experiment

# Try risky changes on the branch
fluree update 'PREFIX ex: <http://example.org/>
DELETE { ?p ex:salary ?s }
INSERT { ?p ex:salary 200000 }
WHERE  { ?p ex:salary ?s }'

# Main branch is untouched
fluree query --ledger mydb:main 'SELECT ?name ?salary WHERE {
  ?p <http://schema.org/name> ?name ; <http://example.org/salary> ?salary
}'

Triple-level access control

SQL databases give you table-level or row-level security. Fluree policies control access to individual facts:

{
  "@id": "ex:hide-salary",
  "f:action": "query",
  "f:resource": { "f:predicate": "ex:salary" },
  "f:allow": false
}

This hides salary data from everyone unless another policy explicitly grants access. The same query returns different results for different users, automatically.

Full-text search

No need for Elasticsearch or Solr alongside your database:

fluree insert '{
  "@context": {"ex": "http://example.org/"},
  "@id": "ex:doc1",
  "ex:content": {
    "@value": "Fluree is a graph database with time travel and integrated search",
    "@type": "@fulltext"
  }
}'

fluree query '{
  "@context": {"ex": "http://example.org/"},
  "select": ["?id", "?score"],
  "where": [
    {"@id": "?id", "ex:content": "?text"},
    ["bind", "?score", "(fulltext ?text \"graph database search\")"],
    ["filter", "(> ?score 0)"]
  ],
  "orderBy": [["desc", "?score"]]
}'

Common “but in SQL I would…” questions

“How do I enforce NOT NULL?” Use SHACL shapes to define constraints like required properties, value types, and cardinality.

“How do I enforce UNIQUE?” Fluree supports unique constraints in the ledger configuration.

“How do I do transactions?” Every Fluree transaction is atomic. Multiple operations in a single request either all succeed or all fail.

“How do I create indexes?” Fluree automatically maintains four indexes (SPOT, POST, OPST, PSOT) that cover all query patterns. You don’t need to create indexes manually.
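As a simplified illustration of the four-index idea (Python, ignoring the graph and time dimensions the real indexes also carry): each index holds the same triples in a different sort order, so every access pattern becomes a prefix scan.

```python
triples = [
    ("ex:alice", "schema:name", "Alice"),
    ("ex:alice", "ex:dept", "Engineering"),
    ("ex:bob", "ex:dept", "Engineering"),
]

# The same data, sorted two ways.
spot = sorted(triples)                            # (subject, predicate, object)
post = sorted((p, o, s) for s, p, o in triples)   # (predicate, object, subject)

# "Everything about ex:alice" is a prefix scan of SPOT.
about_alice = [t for t in spot if t[0] == "ex:alice"]
assert len(about_alice) == 2

# "Which subjects have ex:dept = Engineering?" is a prefix scan of POST.
in_eng = [s for (p, o, s) in post if (p, o) == ("ex:dept", "Engineering")]
assert in_eng == ["ex:alice", "ex:bob"]
```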

“How do I paginate?” Use LIMIT and OFFSET, just like SQL:

SELECT ?name WHERE { ?p schema:name ?name }
ORDER BY ?name LIMIT 20 OFFSET 40

“How do I do subqueries?” SPARQL supports subqueries natively:

SELECT ?name ?avgSalary WHERE {
  ?person schema:name ?name ; ex:department ?dept .
  { SELECT ?dept (AVG(?s) AS ?avgSalary) WHERE { ?p ex:department ?dept ; ex:salary ?s } GROUP BY ?dept }
}

Next steps

Quickstart: Run the Server

This guide will get the Fluree server running on your machine in minutes.

Installation

Option 1: Shell Installer (macOS / Linux)

curl --proto '=https' --tlsv1.2 -LsSf https://github.com/fluree/db/releases/latest/download/fluree-db-cli-installer.sh | sh

Option 2: Homebrew (macOS / Linux)

brew install fluree/tap/fluree

Option 3: PowerShell (Windows)

Open PowerShell and run:

irm https://github.com/fluree/db/releases/latest/download/fluree-db-cli-installer.ps1 | iex

Then open a new PowerShell session and verify fluree --version. The installer adds %USERPROFILE%\bin to your PATH. The binary is unsigned, so Windows SmartScreen may prompt on first run — click More info → Run anyway.

Option 4: Download Pre-built Binary

Download the latest release for your platform from GitHub Releases:

# Linux (x86_64)
curl -L https://github.com/fluree/db/releases/latest/download/fluree-db-cli-x86_64-unknown-linux-gnu.tar.xz | tar xJ
chmod +x fluree-db-cli-x86_64-unknown-linux-gnu/fluree

# macOS (Apple Silicon)
curl -L https://github.com/fluree/db/releases/latest/download/fluree-db-cli-aarch64-apple-darwin.tar.xz | tar xJ
chmod +x fluree-db-cli-aarch64-apple-darwin/fluree

Option 5: Build from Source

If you have Rust installed:

# Clone the repository
git clone https://github.com/fluree/db.git
cd db

# Build the CLI (includes embedded server)
cargo build --release -p fluree-db-cli

# Binary will be at target/release/fluree

Option 6: Docker

# Pull the image
docker pull fluree/server:latest

# Run the container
docker run -p 8090:8090 fluree/server:latest

For configuration (mounted JSON-LD/TOML config files, env vars, persistent volumes, S3+DynamoDB, query peers, full Compose example), see Running with Docker.

Start the Server

Memory Storage (Development)

Start the server with in-memory storage (data is lost on restart):

fluree server run

You should see output like:

INFO fluree_db_server: Starting Fluree server
INFO fluree_db_server: Storage mode: memory
INFO fluree_db_server: Server listening on 0.0.0.0:8090

File Storage (Persistent)

For persistent storage, specify a storage path:

fluree server run --storage-path /var/lib/fluree

Custom Port

fluree server run --listen-addr 0.0.0.0:9090

Debug Logging

fluree server run --log-level debug

Verify Installation

Check Server Health

curl http://localhost:8090/health

Expected response:

{
  "status": "ok",
  "version": "4.0.3"
}

Create a Ledger

curl -X POST http://localhost:8090/v1/fluree/create \
  -H "Content-Type: application/json" \
  -d '{"ledger": "test:main"}'

Insert Data

curl -X POST "http://localhost:8090/v1/fluree/insert" \
  -H "Content-Type: application/json" \
  -H "fluree-ledger: test:main" \
  -d '{
    "@context": {"ex": "http://example.org/"},
    "@id": "ex:alice",
    "ex:name": "Alice"
  }'

Query Data

curl -X POST "http://localhost:8090/v1/fluree/query" \
  -H "Content-Type: application/json" \
  -d '{
    "from": "test:main",
    "select": {"?s": ["*"]},
    "where": [["?s", "ex:name", "?name"]]
  }'

Understanding the Server

Endpoints

Default server endpoints:

Endpoint              Method     Description
/health               GET        Health check
/v1/fluree/create     POST       Create a ledger
/v1/fluree/drop       POST       Drop a ledger
/v1/fluree/query      GET/POST   Execute queries
/v1/fluree/insert     POST       Insert data
/v1/fluree/update     POST       Update with WHERE/DELETE/INSERT
/v1/fluree/events     GET        SSE event stream

See the API Reference for complete endpoint documentation.

Storage Modes

Memory (default):

  • Fast, in-process storage
  • Data lost on restart
  • Best for development and testing

File (with --storage-path):

  • Persistent local file storage
  • Data survives restarts
  • Best for single-server deployments

Configuration

All options can be set via CLI flags or environment variables:

# CLI flag
fluree server run --storage-path /data --log-level debug

# Environment variables
export FLUREE_STORAGE_PATH=/data
export FLUREE_LOG_LEVEL=debug
fluree server run

See Configuration for all options.

Common Configurations

Development

fluree server run --log-level debug

Production (Single Server)

fluree server run \
  --storage-path /var/lib/fluree \
  --indexing-enabled \
  --events-auth-mode required \
  --events-auth-trusted-issuers did:key:z6Mk...

With Background Indexing

fluree server run \
  --storage-path /var/lib/fluree \
  --indexing-enabled

Docker Deployment

For the full Docker guide — image internals, configuration via env vars vs mounted JSON-LD/TOML config files, persistent volumes, LRU cache and indexing tuning, S3+DynamoDB connection configs, query peers, and a production-ready Compose example — see Running with Docker.

Minimal persistent run:

docker run -d --name fluree \
  -p 8090:8090 \
  -v fluree-data:/var/lib/fluree \
  fluree/server:latest

Troubleshooting

Port Already in Use

# Use a different port
fluree server run --listen-addr 0.0.0.0:9090

Permission Denied (File Storage)

sudo chown -R $USER:$USER /var/lib/fluree
chmod -R 755 /var/lib/fluree

Server Won’t Start

Check logs with debug level:

fluree server run --log-level debug

Connection Refused

Verify the server is running and check the listen address:

# Listen on all interfaces (not just localhost)
fluree server run --listen-addr 0.0.0.0:8090

Next Steps

Now that your server is running:

  1. Create a Ledger - Set up your first database
  2. Write Data - Insert your first records
  3. Query Data - Retrieve and explore your data

Quickstart: Create a Ledger

Ledgers are Fluree’s fundamental unit of data organization—similar to databases in traditional systems. This guide shows you how to create your first ledger.

Understanding Ledger IDs

Ledgers are identified by ledger IDs with the format ledger-name:branch:

  • mydb:main - Primary branch of the “mydb” ledger
  • customers:dev - Development branch of the “customers” ledger
  • inventory:prod - Production branch

The default branch is main, so mydb is equivalent to mydb:main.
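The parsing rule is simple: everything after the last colon is the branch, and the branch defaults to main. A sketch of that rule (a hypothetical helper, not part of the Fluree CLI):

```python
def parse_ledger_id(ledger_id: str):
    """Split 'name:branch' into (name, branch); branch defaults to 'main'."""
    name, sep, branch = ledger_id.rpartition(":")
    if not sep:                      # no ':' present: whole string is the name
        return ledger_id, "main"
    return name, branch

assert parse_ledger_id("mydb") == ("mydb", "main")
assert parse_ledger_id("mydb:dev") == ("mydb", "dev")
assert parse_ledger_id("tenant/app:main") == ("tenant/app", "main")
```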

Creating a Ledger

Rust API (Library Usage)

When using Fluree as a Rust library, create ledgers explicitly with create_ledger:

let fluree = FlureeBuilder::memory().build_memory();

// Create a new ledger (returns LedgerState at t=0)
let ledger = fluree.create_ledger("mydb").await?;

// Now insert data
let result = fluree.graph("mydb:main")
    .transact()
    .insert(&data)
    .commit()
    .await?;

create_ledger registers the ledger in the nameservice and returns a genesis LedgerState ready for transactions. It returns ApiError::LedgerExists (HTTP 409) if the ledger already exists.

To load an existing ledger, use ledger:

let ledger = fluree.ledger("mydb:main").await?;

HTTP API (Server Usage)

Via the HTTP API, create a ledger explicitly with POST /v1/fluree/create, then write data with POST /v1/fluree/insert.

Step 1: Create the Ledger

curl -X POST http://localhost:8090/v1/fluree/create \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb:main"}'

Response:

{
  "ledger_id": "mydb:main",
  "t": 0,
  "tx-id": "fluree:tx:sha256:...",
  "commit": {"hash": ""}
}

Step 2: Insert Data

curl -X POST http://localhost:8090/v1/fluree/insert \
  -H "Content-Type: application/json" \
  -H "fluree-ledger: mydb:main" \
  -d '{
    "@context": {
      "ex": "http://example.org/ns/",
      "schema": "http://schema.org/"
    },
    "@graph": [
      {
        "@id": "ex:alice",
        "@type": "schema:Person",
        "schema:name": "Alice",
        "schema:email": "alice@example.org"
      }
    ]
  }'

Response:

{
  "ledger_id": "mydb:main",
  "t": 1,
  "tx-id": "fluree:tx:sha256:...",
  "commit": {"hash": "bagaybqab..."}
}

The ledger mydb:main now has data!

Verifying Ledger Creation

Check Ledger Exists

curl http://localhost:8090/v1/fluree/exists/mydb:main

Response:

{
  "ledger_id": "mydb:main",
  "exists": true
}

Query the Ledger

Verify you can query the new ledger:

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "Content-Type: application/json" \
  -d '{
    "@context": {
      "schema": "http://schema.org/"
    },
    "from": "mydb:main",
    "select": ["?name"],
    "where": [
      { "@id": "?person", "schema:name": "?name" }
    ]
  }'

Response:

[
  { "name": "Alice" }
]

Ledger Naming Best Practices

Descriptive Names

Choose names that clearly indicate purpose:

Good examples:

  • customers:main
  • inventory:prod
  • analytics:warehouse

Bad examples:

  • db1:main
  • test:main
  • data:main

Hierarchical Organization

Use slashes for logical grouping:

tenant/app:main
tenant/app:dev
department/project:feature-x

Branch Naming

Establish consistent branch naming conventions:

mydb:main              - Production branch
mydb:dev               - Development branch
mydb:staging           - Staging branch
mydb:feature-auth      - Feature branch
mydb:bugfix-login      - Bug fix branch

Working with Branches

Creating a New Branch

Branches are independent ledgers. First create the branch, then transact data into it:

# Create the branch
curl -X POST http://localhost:8090/v1/fluree/create \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb:dev"}'

# Insert data into the branch
curl -X POST http://localhost:8090/v1/fluree/insert?ledger=mydb:dev \
  -H "Content-Type: application/json" \
  -d '{
    "@context": {
      "ex": "http://example.org/ns/",
      "schema": "http://schema.org/"
    },
    "@graph": [
      {
        "@id": "ex:bob",
        "@type": "schema:Person",
        "schema:name": "Bob"
      }
    ]
  }'

Now you have two independent ledgers:

  • mydb:main (with Alice)
  • mydb:dev (with Bob)

Understanding Branch Independence

Branches are completely independent—changes in one don’t affect the other:

# Query main branch
curl -X POST http://localhost:8090/v1/fluree/query \
  -d '{"from": "mydb:main", "select": ["?name"], "where": [{"@id": "?person", "schema:name": "?name"}]}'
# Returns: [{"name": "Alice"}]

# Query dev branch
curl -X POST http://localhost:8090/v1/fluree/query \
  -d '{"from": "mydb:dev", "select": ["?name"], "where": [{"@id": "?person", "schema:name": "?name"}]}'
# Returns: [{"name": "Bob"}]

Ledger Metadata

Each ledger maintains metadata accessible via the nameservice:

  • commit_t: Latest transaction time
  • index_t: Latest indexed transaction time
  • commit_id: ContentId (CID) of the latest commit
  • index_id: ContentId (CID) of the latest index
  • default_context: Default JSON-LD @context for the ledger

Checking Ledger Status

curl http://localhost:8090/v1/fluree/info/mydb:main

Response:

{
  "ledger_id": "mydb:main",
  "branch": "main",
  "commit_t": 1,
  "index_t": 1,
  "commit_id": "bafybeig...commitT1",
  "index_id": "bafybeig...indexT1",
  "created": "2024-01-22T10:30:00.000Z",
  "last_updated": "2024-01-22T10:30:05.000Z"
}

Understanding Commit vs Index

  • commit_t: Most recent transaction (always up-to-date)
  • index_t: Most recent indexed snapshot (may lag behind commits)
  • Gap: If commit_t > index_t, there’s a “novelty layer” being indexed
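Put differently, the queryable state is the last indexed snapshot plus any newer commits. A minimal sketch of that relationship in Python (illustrative only):

```python
def novelty(commits, index_t):
    """Commits newer than the last index form the 'novelty layer'."""
    return [c for c in commits if c["t"] > index_t]

commits = [{"t": 1}, {"t": 2}, {"t": 3}]
assert novelty(commits, index_t=1) == [{"t": 2}, {"t": 3}]  # gap being indexed
assert novelty(commits, index_t=3) == []                    # fully indexed
```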

See Ledgers and Nameservice for details.

Multi-Tenant Scenarios

For multi-tenant applications, use hierarchical naming:

tenant1/app:main
tenant1/app:dev
tenant2/app:main
tenant2/app:dev

Or use separate ledgers per tenant:

tenant1-customers:main
tenant1-orders:main
tenant2-customers:main
tenant2-orders:main

Setting Default Context

A ledger may have a stored default JSON-LD @context that the CLI and HTTP server can auto-inject into queries that omit @context / PREFIX. Two ways to set it:

  1. At import time: fluree create --from data.ttl captures @prefix declarations from the Turtle source and stores them as the default.
  2. Explicitly: fluree context set <ledger> <ctx.json>, or PUT /v1/fluree/context/{ledger...} over HTTP.

Regular JSON-LD transactions (insert/update) do not update the default context — only the two paths above do.

// One-time setup via the CLI:
// fluree context set mydb context.json
{
  "@context": {
    "ex": "http://example.org/ns/",
    "schema": "http://schema.org/",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  }
}

After this, the CLI (fluree query) and the HTTP server query endpoint will inject the stored context into queries that don’t supply their own @context / PREFIX. Direct fluree-db-api consumers do not get auto-injection — they must opt in via Fluree::db_with_default_context(...) or include @context in each query. See docs/concepts/iri-and-context.md for the full opt-in story.

Common Patterns

Development Workflow

1. Create main branch: mydb:main
2. Create dev branch: mydb:dev
3. Develop and test in dev
4. Copy desired state to main (application logic)
5. Repeat

Feature Branching

1. Create feature branch: mydb:feature-x
2. Develop feature in isolation
3. Test thoroughly
4. Merge to main (via application logic)
5. Optionally retract feature branch

Environment Separation

mydb:dev      - Development environment
mydb:staging  - Staging environment
mydb:prod     - Production environment

Troubleshooting

Ledger Not Found

If you try to query a ledger before it exists:

Error: Ledger not found: mydb:main

Solution: Create the ledger first, with fluree create or POST /v1/fluree/create.

Permission Issues (File Storage)

If using file storage, ensure the server has write permissions:

# Check data directory permissions
ls -la /path/to/data

# Fix permissions if needed
sudo chown -R fluree:fluree /path/to/data
chmod -R 755 /path/to/data

AWS Storage Issues

For AWS storage, verify credentials and bucket access:

# Test S3 access
aws s3 ls s3://your-fluree-bucket/

# Test DynamoDB access
aws dynamodb describe-table --table-name fluree-nameservice

Next Steps

Now that you have a ledger:

  1. Write Data - Learn how to insert, upsert, and update data
  2. Query Data - Explore your data with queries
  3. Concepts: Ledgers - Deep dive into ledger architecture

Quickstart: Write Data

This guide shows you how to write data to Fluree using three main patterns: insert, upsert, and update.

Prerequisites

Understanding Fluree Transactions

Fluree stores data as RDF triples (subject-predicate-object). Transactions are submitted as JSON-LD documents that get converted to triples internally.

Basic Transaction Structure

{
  "@context": {
    "ex": "http://example.org/ns/",
    "schema": "http://schema.org/"
  },
  "@graph": [
    {
      "@id": "ex:alice",
      "@type": "schema:Person",
      "schema:name": "Alice"
    }
  ]
}

This creates triples like:

ex:alice  rdf:type        schema:Person
ex:alice  schema:name     "Alice"
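The expansion from prefixed JSON-LD keys to triples can be sketched as follows. This is a Python simplification; real JSON-LD expansion also handles nesting, lists, language tags, and typed values:

```python
def to_triples(doc):
    """Expand a flat JSON-LD document into (subject, predicate, object) triples."""
    ctx = doc.get("@context", {})

    def expand(term):
        prefix, sep, local = term.partition(":")
        return ctx[prefix] + local if sep and prefix in ctx else term

    triples = []
    for node in doc.get("@graph", []):
        subject = expand(node["@id"])
        for key, value in node.items():
            if key == "@id":
                continue
            if key == "@type":
                triples.append((subject, "rdf:type", expand(value)))
            else:
                triples.append((subject, expand(key), value))
    return triples

doc = {
    "@context": {"ex": "http://example.org/ns/", "schema": "http://schema.org/"},
    "@graph": [{"@id": "ex:alice", "@type": "schema:Person", "schema:name": "Alice"}],
}
assert ("http://example.org/ns/alice", "rdf:type", "http://schema.org/Person") in to_triples(doc)
assert ("http://example.org/ns/alice", "http://schema.org/name", "Alice") in to_triples(doc)
```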

Insert: Adding New Data

The simplest operation is inserting new entities.

Insert a Single Entity

curl -X POST http://localhost:8090/v1/fluree/insert?ledger=mydb:main \
  -H "Content-Type: application/json" \
  -d '{
    "@context": {
      "ex": "http://example.org/ns/",
      "schema": "http://schema.org/"
    },
    "@graph": [
      {
        "@id": "ex:alice",
        "@type": "schema:Person",
        "schema:name": "Alice",
        "schema:email": "alice@example.org",
        "schema:age": 30
      }
    ]
  }'

Response:

{
  "t": 1,
  "timestamp": "2024-01-22T10:30:00.000Z",
  "commit_id": "bafybeig...commitT1",
  "flakes_added": 4,
  "flakes_retracted": 0
}

Insert Multiple Entities

curl -X POST http://localhost:8090/v1/fluree/insert?ledger=mydb:main \
  -H "Content-Type: application/json" \
  -d '{
    "@context": {
      "ex": "http://example.org/ns/",
      "schema": "http://schema.org/"
    },
    "@graph": [
      {
        "@id": "ex:bob",
        "@type": "schema:Person",
        "schema:name": "Bob",
        "schema:email": "bob@example.org"
      },
      {
        "@id": "ex:carol",
        "@type": "schema:Person",
        "schema:name": "Carol",
        "schema:email": "carol@example.org"
      }
    ]
  }'

Insert with Relationships

curl -X POST http://localhost:8090/v1/fluree/insert?ledger=mydb:main \
  -H "Content-Type: application/json" \
  -d '{
    "@context": {
      "ex": "http://example.org/ns/",
      "schema": "http://schema.org/"
    },
    "@graph": [
      {
        "@id": "ex:company-a",
        "@type": "schema:Organization",
        "schema:name": "Acme Corp"
      },
      {
        "@id": "ex:alice",
        "@type": "schema:Person",
        "schema:name": "Alice",
        "schema:worksFor": {"@id": "ex:company-a"}
      }
    ]
  }'

Upsert: Idempotent Transactions

Upsert (update/insert) replaces values for the predicates you supply on an entity. If the entity doesn’t exist, it’s created.

Basic Upsert

Use the dedicated /upsert endpoint:

curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
  -H "Content-Type: application/json" \
  -d '{
    "@context": {
      "ex": "http://example.org/ns/",
      "schema": "http://schema.org/"
    },
    "@graph": [
      {
        "@id": "ex:alice",
        "@type": "schema:Person",
        "schema:name": "Alice Smith",
        "schema:email": "alice.smith@example.org",
        "schema:age": 31
      }
    ]
  }'

This replaces existing values for the predicates included in the payload (for ex:alice, those are @type, schema:name, schema:email, schema:age).

Upsert Behavior

First transaction (entity doesn’t exist):

  • Creates the entity with all specified properties

Subsequent transactions (entity exists):

  • Retracts existing values for the supplied predicates
  • Asserts new values for those predicates
  • Leaves other predicates unchanged
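That retract-then-assert behavior is easy to model with a dict per entity (a conceptual sketch in Python, not Fluree’s implementation):

```python
def upsert(store, entity_id, props):
    """Replace values for the supplied predicates; leave others untouched."""
    entity = store.setdefault(entity_id, {})   # create the entity if missing
    entity.update(props)                       # retract old + assert new per key
    return store

store = {"ex:alice": {"schema:name": "Alice", "schema:age": 30}}
upsert(store, "ex:alice", {"schema:age": 31})
assert store["ex:alice"] == {"schema:name": "Alice", "schema:age": 31}

upsert(store, "ex:bob", {"schema:name": "Bob"})
assert store["ex:bob"] == {"schema:name": "Bob"}
```

Running the same upsert twice yields the same state, which is what makes it safe to retry.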

Use Cases for Upsert

Good for:

  • Idempotent transactions (can retry safely)
  • Syncing from external systems
  • Replacing values for the predicates you supply
  • Avoiding duplicate checks

Not good for:

  • Conditional/targeted changes (use UPDATE instead)

Update: Targeted Changes (WHERE/DELETE/INSERT)

For targeted changes to existing data, use the UPDATE pattern with WHERE/DELETE/INSERT.

Basic Update

curl -X POST http://localhost:8090/v1/fluree/update?ledger=mydb:main \
  -H "Content-Type: application/json" \
  -d '{
    "@context": {
      "ex": "http://example.org/ns/",
      "schema": "http://schema.org/"
    },
    "where": [
      { "@id": "ex:alice", "schema:age": "?oldAge" }
    ],
    "delete": [
      { "@id": "ex:alice", "schema:age": "?oldAge" }
    ],
    "insert": [
      { "@id": "ex:alice", "schema:age": 32 }
    ]
  }'

This pattern:

  1. WHERE: Finds matching data
  2. DELETE: Retracts specific triples
  3. INSERT: Asserts new triples
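The three steps compose into one atomic operation. A conceptual sketch in Python over a plain triple list (the helper and names are illustrative, not Fluree API):

```python
def where_delete_insert(triples, subject, predicate, new_value):
    """Bind matches (WHERE), retract them (DELETE), assert the new triple (INSERT)."""
    matched = [(s, p, o) for s, p, o in triples if (s, p) == (subject, predicate)]
    if not matched:
        return list(triples)          # WHERE matched nothing: no change
    kept = [t for t in triples if t not in matched]
    return kept + [(subject, predicate, new_value)]

triples = [("ex:alice", "schema:age", 31), ("ex:alice", "schema:name", "Alice")]
result = where_delete_insert(triples, "ex:alice", "schema:age", 32)
assert ("ex:alice", "schema:age", 32) in result
assert ("ex:alice", "schema:age", 31) not in result
assert ("ex:alice", "schema:name", "Alice") in result   # other properties untouched
```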

Update Multiple Properties

curl -X POST http://localhost:8090/v1/fluree/update?ledger=mydb:main \
  -H "Content-Type: application/json" \
  -d '{
    "@context": {
      "ex": "http://example.org/ns/",
      "schema": "http://schema.org/"
    },
    "where": [
      { "@id": "ex:alice", "schema:name": "?name", "schema:email": "?email" }
    ],
    "delete": [
      { "@id": "ex:alice", "schema:name": "?name", "schema:email": "?email" }
    ],
    "insert": [
      { "@id": "ex:alice", "schema:name": "Alice Johnson", "schema:email": "alice.j@example.org" }
    ]
  }'

Conditional Update

Only update if a condition is met:

curl -X POST http://localhost:8090/v1/fluree/update?ledger=mydb:main \
  -H "Content-Type: application/json" \
  -d '{
    "@context": {
      "ex": "http://example.org/ns/",
      "schema": "http://schema.org/"
    },
    "where": [
      { "@id": "ex:alice", "schema:age": "?age" },
      ["filter", "(< ?age 32)"]
    ],
    "delete": [
      { "@id": "ex:alice", "schema:age": "?age" }
    ],
    "insert": [
      { "@id": "ex:alice", "schema:age": 32 }
    ]
  }'

Adding Properties (Not Replacing)

To add a property without removing existing ones, use INSERT only:

curl -X POST http://localhost:8090/v1/fluree/update?ledger=mydb:main \
  -H "Content-Type: application/json" \
  -d '{
    "@context": {
      "ex": "http://example.org/ns/",
      "schema": "http://schema.org/"
    },
    "insert": [
      { "@id": "ex:alice", "schema:telephone": "+1-555-0100" }
    ]
  }'

This adds the telephone property without affecting other properties.

Data Types

Fluree supports various data types through JSON-LD typing:

Strings (Default)

{
  "@id": "ex:alice",
  "schema:name": "Alice"
}

Numbers

{
  "@id": "ex:alice",
  "schema:age": 30,
  "schema:height": 1.68
}

Booleans

{
  "@id": "ex:alice",
  "schema:active": true
}

Dates

{
  "@id": "ex:alice",
  "schema:birthDate": {
    "@value": "1994-05-15",
    "@type": "xsd:date"
  }
}

Timestamps

{
  "@id": "ex:alice",
  "schema:lastLogin": {
    "@value": "2024-01-22T10:30:00Z",
    "@type": "xsd:dateTime"
  }
}
References

Link to another entity with an @id object:

{
  "@id": "ex:alice",
  "schema:worksFor": { "@id": "ex:company-a" }
}

Transaction Receipts

Every successful transaction returns a receipt with metadata:

{
  "t": 5,
  "timestamp": "2024-01-22T10:30:00.000Z",
  "commit_id": "bafybeig...commitT5",
  "flakes_added": 3,
  "flakes_retracted": 2,
  "previous_commit_id": "bafybeig...commitT4"
}

Key fields:

  • t: Transaction time (monotonically increasing)
  • timestamp: ISO 8601 timestamp
  • commit_id: Content-addressed identifier (CID) for the commit
  • flakes_added: Number of triples added
  • flakes_retracted: Number of triples removed
  • previous_commit_id: ContentId of the previous commit (present when t > 1)

See Commit Receipts for details.

Error Handling

Transaction Errors

If a transaction fails, you’ll receive an error response:

{
  "error": "TransactionError",
  "message": "Invalid IRI: not a valid URI",
  "code": "INVALID_IRI"
}

Common errors:

  • INVALID_IRI: Malformed IRIs
  • PARSE_ERROR: Invalid JSON-LD syntax
  • TYPE_ERROR: Type mismatch
  • CONSTRAINT_VIOLATION: Data constraint violated

Validation

Transactions are validated before being applied:

  • JSON-LD syntax must be valid
  • IRIs must be well-formed
  • Types must be compatible
  • References must resolve (optional)

Best Practices

1. Use Appropriate Transaction Pattern

  • Insert: New entities, no duplication concerns
  • Upsert: Idempotent transactions, predicate-level replacement for supplied predicates
  • Update: Targeted changes, preserve other properties

2. Choose Meaningful IRIs

Good:

{"@id": "ex:user-12345"}
{"@id": "ex:product-widget-2024"}

Bad:

{"@id": "ex:1"}
{"@id": "ex:thing"}

3. Use Consistent Namespaces

Define a clear namespace strategy:

{
  "@context": {
    "app": "https://myapp.com/ns/",
    "schema": "http://schema.org/",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  }
}

4. Batch Related Changes

Include related entities in a single transaction:

{
  "@graph": [
    {"@id": "ex:order-123", "ex:customer": {"@id": "ex:alice"}},
    {"@id": "ex:order-123", "ex:product": {"@id": "ex:widget"}},
    {"@id": "ex:order-123", "ex:quantity": 5}
  ]
}

5. Use Typed Literals

Be explicit about types for dates, numbers, etc.:

{
  "@id": "ex:alice",
  "ex:birthDate": {
    "@value": "1994-05-15",
    "@type": "xsd:date"
  }
}

Transaction Size Limits

Be aware of transaction size constraints:

  • Recommended: < 1000 triples per transaction
  • Maximum: Configurable (default: 10,000 triples)
  • Large imports: Use batch processing

See Indexing Side-Effects for performance considerations.

Next Steps

Now that you can write data:

  1. Query Data - Learn how to retrieve your data
  2. Transactions Overview - Detailed transaction documentation
  3. JSON-LD Context - Understanding @context
  • Insert - Detailed insert documentation
  • Upsert - Detailed upsert documentation
  • Update - Detailed update documentation
  • Data Types - Comprehensive type system guide

Quickstart: Query Data

This guide introduces you to querying data in Fluree using both JSON-LD Query and SPARQL.

Prerequisites

  • Fluree server running with data (complete previous quickstarts)
  • Sample data from Write Data guide

Query Languages

Fluree supports two query languages:

  • JSON-LD Query: Fluree’s native JSON-based query language
  • SPARQL: W3C standard RDF query language

Both provide access to the same data and features.

JSON-LD Query

Basic SELECT Query

Retrieve all person names:

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "Content-Type: application/json" \
  -d '{
    "@context": {
      "schema": "http://schema.org/"
    },
    "from": "mydb:main",
    "select": ["?name"],
    "where": [
      { "@id": "?person", "schema:name": "?name" }
    ]
  }'

Response:

[
  { "name": "Alice" },
  { "name": "Bob" },
  { "name": "Carol" }
]

Query Multiple Properties

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "Content-Type: application/json" \
  -d '{
    "@context": {
      "schema": "http://schema.org/"
    },
    "from": "mydb:main",
    "select": ["?name", "?email"],
    "where": [
      { "@id": "?person", "schema:name": "?name" },
      { "@id": "?person", "schema:email": "?email" }
    ]
  }'

Response:

[
  { "name": "Alice", "email": "alice@example.org" },
  { "name": "Bob", "email": "bob@example.org" },
  { "name": "Carol", "email": "carol@example.org" }
]

Filter Results

Query with a specific filter:

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "Content-Type: application/json" \
  -d '{
    "@context": {
      "schema": "http://schema.org/"
    },
    "from": "mydb:main",
    "select": ["?name", "?age"],
    "where": [
      { "@id": "?person", "schema:name": "?name" },
      { "@id": "?person", "schema:age": "?age" }
    ],
    "filter": "?age > 25"
  }'

Query Specific Entity

Query a specific entity by IRI:

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "Content-Type: application/json" \
  -d '{
    "@context": {
      "ex": "http://example.org/ns/",
      "schema": "http://schema.org/"
    },
    "from": "mydb:main",
    "select": ["?name", "?email", "?age"],
    "where": [
      { "@id": "ex:alice", "schema:name": "?name" },
      { "@id": "ex:alice", "schema:email": "?email" },
      { "@id": "ex:alice", "schema:age": "?age" }
    ]
  }'

Query with Relationships

Follow links between entities:

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "Content-Type: application/json" \
  -d '{
    "@context": {
      "schema": "http://schema.org/"
    },
    "from": "mydb:main",
    "select": ["?personName", "?companyName"],
    "where": [
      { "@id": "?person", "schema:name": "?personName" },
      { "@id": "?person", "schema:worksFor": "?company" },
      { "@id": "?company", "schema:name": "?companyName" }
    ]
  }'

Response:

[
  { "personName": "Alice", "companyName": "Acme Corp" }
]

SPARQL

Basic SELECT Query

The same queries in SPARQL syntax:

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "Content-Type: application/sparql-query" \
  -d '
    PREFIX schema: <http://schema.org/>
    
    SELECT ?name
    FROM <mydb:main>
    WHERE {
      ?person schema:name ?name .
    }
  '

Query Multiple Properties

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "Content-Type: application/sparql-query" \
  -d '
    PREFIX schema: <http://schema.org/>
    
    SELECT ?name ?email
    FROM <mydb:main>
    WHERE {
      ?person schema:name ?name .
      ?person schema:email ?email .
    }
  '

Filter Results

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "Content-Type: application/sparql-query" \
  -d '
    PREFIX schema: <http://schema.org/>
    
    SELECT ?name ?age
    FROM <mydb:main>
    WHERE {
      ?person schema:name ?name .
      ?person schema:age ?age .
      FILTER (?age > 25)
    }
  '

Query with Relationships

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "Content-Type: application/sparql-query" \
  -d '
    PREFIX schema: <http://schema.org/>
    
    SELECT ?personName ?companyName
    FROM <mydb:main>
    WHERE {
      ?person schema:name ?personName .
      ?person schema:worksFor ?company .
      ?company schema:name ?companyName .
    }
  '

Time Travel Queries

Query historical data using time specifiers.

Query at Specific Transaction

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "Content-Type: application/json" \
  -d '{
    "@context": {
      "schema": "http://schema.org/"
    },
    "from": "mydb:main@t:1",
    "select": ["?name"],
    "where": [
      { "@id": "?person", "schema:name": "?name" }
    ]
  }'

This shows data as it existed at transaction 1.

Query at ISO Timestamp

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "Content-Type: application/json" \
  -d '{
    "@context": {
      "schema": "http://schema.org/"
    },
    "from": "mydb:main@iso:2024-01-22T10:00:00Z",
    "select": ["?name"],
    "where": [
      { "@id": "?person", "schema:name": "?name" }
    ]
  }'

Query at Commit ContentId

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "Content-Type: application/json" \
  -d '{
    "@context": {
      "schema": "http://schema.org/"
    },
    "from": "mydb:main@commit:bafybeig...",
    "select": ["?name"],
    "where": [
      { "@id": "?person", "schema:name": "?name" }
    ]
  }'

See Time Travel for comprehensive details.

History Queries

Track changes to entities over time by specifying a time range in the from clause.

Entity History

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "Content-Type: application/json" \
  -d '{
    "@context": {
      "ex": "http://example.org/ns/",
      "schema": "http://schema.org/"
    },
    "from": "mydb:main@t:1",
    "to": "mydb:main@t:latest",
    "select": ["?name", "?age", "?t", "?op"],
    "where": [
      { "@id": "ex:alice", "schema:name": { "@value": "?name", "@t": "?t", "@op": "?op" } },
      { "@id": "ex:alice", "schema:age": "?age" }
    ],
    "orderBy": "?t"
  }'

The @t annotation binds the transaction time, and @op binds the operation type as a boolean (true = assert, false = retract).

Response shows all changes:

[
  ["Alice", 30, 1, true],
  ["Alice", 30, 5, false],
  ["Alicia", 31, 5, true]
]
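Because each row carries a transaction number and an assert/retract flag, the stream can be replayed client-side to reconstruct a property's value at any point. A small illustrative sketch (not a Fluree API) over tuples shaped like the response above, [value, t, op]:

```python
def value_at(history, t):
    """Replay [value, t, op] rows up to transaction t; return surviving values."""
    current = set()
    for value, tx, op in sorted(history, key=lambda row: row[1]):
        if tx > t:
            break
        if op:                 # assertion
            current.add(value)
        else:                  # retraction
            current.discard(value)
    return current

history = [[30, 1, True], [30, 5, False], [31, 5, True]]
print(value_at(history, 1))  # {30}
print(value_at(history, 5))  # {31}
```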

Property History

Track changes to a specific property:

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "Content-Type: application/json" \
  -d '{
    "@context": {
      "ex": "http://example.org/ns/",
      "schema": "http://schema.org/"
    },
    "from": "mydb:main@t:1",
    "to": "mydb:main@t:latest",
    "select": ["?age", "?t", "?op"],
    "where": [
      { "@id": "ex:alice", "schema:age": { "@value": "?age", "@t": "?t", "@op": "?op" } }
    ],
    "orderBy": "?t"
  }'

Response:

[
  [30, 1, true],
  [30, 5, false],
  [31, 5, true]
]

Aggregations

Count Results

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "Content-Type: application/sparql-query" \
  -d '
    PREFIX schema: <http://schema.org/>
    
    SELECT (COUNT(?person) AS ?count)
    FROM <mydb:main>
    WHERE {
      ?person schema:name ?name .
    }
  '

Average, Min, Max

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "Content-Type: application/sparql-query" \
  -d '
    PREFIX schema: <http://schema.org/>
    
    SELECT (AVG(?age) AS ?avgAge) (MIN(?age) AS ?minAge) (MAX(?age) AS ?maxAge)
    FROM <mydb:main>
    WHERE {
      ?person schema:age ?age .
    }
  '

Limiting Results

JSON-LD Query Limit

{
  "@context": {
    "schema": "http://schema.org/"
  },
  "from": "mydb:main",
  "select": ["?name"],
  "where": [
    { "@id": "?person", "schema:name": "?name" }
  ],
  "limit": 10
}

SPARQL Limit and Offset

PREFIX schema: <http://schema.org/>

SELECT ?name
FROM <mydb:main>
WHERE {
  ?person schema:name ?name .
}
ORDER BY ?name
LIMIT 10
OFFSET 20
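LIMIT/OFFSET pagination is typically driven by a loop that advances the offset until a short page signals the end. A sketch of that arithmetic in Python, where fetch_page is a stand-in for any of the HTTP queries in this guide:

```python
def paginate(fetch_page, page_size=10):
    """Yield rows page by page until a short (or empty) page signals the end."""
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        yield from page
        if len(page) < page_size:
            break
        offset += page_size

# Simulated backend with 23 rows stands in for a real query endpoint.
rows = [{"name": f"user-{i}"} for i in range(23)]
fake_fetch = lambda limit, offset: rows[offset:offset + limit]
print(len(list(paginate(fake_fetch))))  # 23
```

Pair this with a stable ORDER BY (as in the SPARQL example above); without one, rows can shift between pages.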

Ordering Results

JSON-LD Query Order

{
  "@context": {
    "schema": "http://schema.org/"
  },
  "from": "mydb:main",
  "select": ["?name", "?age"],
  "where": [
    { "@id": "?person", "schema:name": "?name" },
    { "@id": "?person", "schema:age": "?age" }
  ],
  "orderBy": ["?age"]
}

SPARQL Order

PREFIX schema: <http://schema.org/>

SELECT ?name ?age
FROM <mydb:main>
WHERE {
  ?person schema:name ?name .
  ?person schema:age ?age .
}
ORDER BY DESC(?age)

Multi-Ledger Queries

Query across multiple ledgers:

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "Content-Type: application/json" \
  -d '{
    "@context": {
      "schema": "http://schema.org/"
    },
    "from": ["customers:main", "orders:main"],
    "select": ["?customerName", "?orderTotal"],
    "where": [
      { "@id": "?customer", "schema:name": "?customerName" },
      { "@id": "?order", "schema:customer": "?customer" },
      { "@id": "?order", "schema:totalPrice": "?orderTotal" }
    ]
  }'

See Datasets for comprehensive multi-graph query documentation.

Understanding Query Results

JSON-LD Query Results

Results are returned as an array of objects:

[
  { "name": "Alice", "age": 30 },
  { "name": "Bob", "age": 25 }
]

SPARQL Results

SPARQL returns results in SPARQL JSON format:

{
  "head": {
    "vars": ["name", "age"]
  },
  "results": {
    "bindings": [
      {
        "name": { "type": "literal", "value": "Alice" },
        "age": { "type": "literal", "value": "30", "datatype": "http://www.w3.org/2001/XMLSchema#integer" }
      },
      {
        "name": { "type": "literal", "value": "Bob" },
        "age": { "type": "literal", "value": "25", "datatype": "http://www.w3.org/2001/XMLSchema#integer" }
      }
    ]
  }
}

See Output Formats for format details.
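When consuming SPARQL JSON results programmatically, it is often convenient to flatten the bindings into plain values. A small illustrative helper (datatype handling here covers only xsd:integer; extend as needed):

```python
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"

def flatten(sparql_json):
    """Convert SPARQL JSON results into a list of plain dicts."""
    rows = []
    for binding in sparql_json["results"]["bindings"]:
        row = {}
        for var, cell in binding.items():
            value = cell["value"]
            if cell.get("datatype") == XSD_INT:
                value = int(value)   # SPARQL JSON carries literals as strings
            row[var] = value
        rows.append(row)
    return rows

result = {
    "head": {"vars": ["name", "age"]},
    "results": {"bindings": [
        {"name": {"type": "literal", "value": "Alice"},
         "age": {"type": "literal", "value": "30", "datatype": XSD_INT}},
    ]},
}
print(flatten(result))  # [{'name': 'Alice', 'age': 30}]
```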

Query Performance Tips

1. Use Specific Patterns

More specific patterns are faster:

Good:

{ "@id": "ex:alice", "schema:name": "?name" }

Less efficient:

{ "@id": "?person", "?predicate": "?value" }

2. Filter Early

Apply filters in WHERE clauses when possible:

"where": [
  { "@id": "?person", "schema:age": "?age" }
],
"filter": "?age > 25"

3. Limit Results

Always use LIMIT for large result sets:

"limit": 100

4. Use Indexes

Queries leverage automatic indexes. Structure queries to take advantage:

  • Subject-based lookups are fast
  • Predicate-based lookups are fast
  • Complex graph patterns may be slower

See Explain Plans for query optimization.

Common Query Patterns

Find All Types

SELECT DISTINCT ?type
FROM <mydb:main>
WHERE {
  ?entity a ?type .
}

Find All Predicates

SELECT DISTINCT ?predicate
FROM <mydb:main>
WHERE {
  ?subject ?predicate ?object .
}

Inverse Relationships

Find what points to an entity:

SELECT ?source ?predicate
FROM <mydb:main>
WHERE {
  ?source ?predicate <http://example.org/ns/alice> .
}

Optional Properties

Query with optional values:

PREFIX schema: <http://schema.org/>

SELECT ?name ?email ?phone
FROM <mydb:main>
WHERE {
  ?person schema:name ?name .
  ?person schema:email ?email .
  OPTIONAL { ?person schema:telephone ?phone }
}

Error Handling

Query Errors

Common query errors:

{
  "error": "QueryError",
  "message": "Ledger not found: mydb:main",
  "code": "LEDGER_NOT_FOUND"
}
{
  "error": "ParseError",
  "message": "Invalid JSON-LD: unexpected token",
  "code": "PARSE_ERROR"
}

Empty Results

Empty result set (not an error):

[]

Next Steps

Now that you can query data:

  1. Learn Advanced Queries: Explore JSON-LD Query and SPARQL documentation
  2. Understand Time Travel: Deep dive into Time Travel
  3. Optimize Queries: Read about Explain Plans
  4. Multi-Graph Queries: Learn about Datasets

Tutorial: Building a Knowledge Base with Fluree

This tutorial walks through a realistic scenario — building a team knowledge base — to show how Fluree’s differentiating features work together. You’ll use time travel, full-text search, branching, and access control in a single workflow.

Time: ~20 minutes
Prerequisites: Fluree installed and running (fluree init && fluree server run)

Step 1: Create the ledger and add data

fluree create knowledge-base
fluree use knowledge-base

Insert some articles and team members:

fluree insert '
@prefix schema: <http://schema.org/> .
@prefix ex:     <http://example.org/> .
@prefix f:      <https://ns.flur.ee/db#> .

ex:alice  a schema:Person ;
  schema:name "Alice Chen" ;
  ex:role     "engineer" ;
  ex:team     "platform" .

ex:bob  a schema:Person ;
  schema:name "Bob Martinez" ;
  ex:role     "engineer" ;
  ex:team     "platform" .

ex:carol  a schema:Person ;
  schema:name "Carol White" ;
  ex:role     "manager" ;
  ex:team     "platform" .

ex:doc1  a ex:Article ;
  schema:name    "Deployment Runbook" ;
  schema:author  ex:alice ;
  ex:team        "platform" ;
  ex:visibility  "internal" ;
  ex:content     "Step 1: Check the monitoring dashboard. Step 2: Run the database migration script. Step 3: Deploy the new container image using the CI pipeline."^^f:fullText .

ex:doc2  a ex:Article ;
  schema:name    "Onboarding Guide" ;
  schema:author  ex:bob ;
  ex:team        "platform" ;
  ex:visibility  "public" ;
  ex:content     "Welcome to the platform team. This guide covers setting up your development environment, accessing the database, and deploying your first service."^^f:fullText .

ex:doc3  a ex:Article ;
  schema:name    "Incident Response Playbook" ;
  schema:author  ex:carol ;
  ex:team        "platform" ;
  ex:visibility  "confidential" ;
  ex:content     "During a production incident, the on-call engineer should check database health, review recent deployments, and escalate if the service is not recovering within 15 minutes."^^f:fullText .
'

Verify the data is there:

fluree query --format table 'PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>

SELECT ?title ?author_name ?visibility
WHERE {
  ?doc a ex:Article ;
       schema:name ?title ;
       schema:author ?author ;
       ex:visibility ?visibility .
  ?author schema:name ?author_name .
}
ORDER BY ?title'
┌─────────────────────────────┬───────────────┬──────────────┐
│ title                       │ author_name   │ visibility   │
├─────────────────────────────┼───────────────┼──────────────┤
│ Deployment Runbook          │ Alice Chen    │ internal     │
│ Incident Response Playbook  │ Carol White   │ confidential │
│ Onboarding Guide            │ Bob Martinez  │ public       │
└─────────────────────────────┴───────────────┴──────────────┘

This is transaction t=1. Remember this — we’ll come back to it.

Step 2: Search with full-text

The article content was inserted with the f:fullText datatype, so it’s automatically indexed for BM25 relevance scoring. Search for articles about deployments:

fluree query '{
  "@context": {"schema": "http://schema.org/", "ex": "http://example.org/"},
  "select": ["?title", "?score"],
  "where": [
    {
      "@id": "?doc", "@type": "ex:Article",
      "ex:content": "?content",
      "schema:name": "?title"
    },
    ["bind", "?score", "(fulltext ?content \"database deployment\")"],
    ["filter", "(> ?score 0)"]
  ],
  "orderBy": [["desc", "?score"]],
  "limit": 10
}'

Results are ranked by relevance — the deployment runbook and incident playbook both mention deployments and databases, while the onboarding guide has a weaker match.

You can combine search with graph filters. Find only public articles matching the search:

fluree query '{
  "@context": {"schema": "http://schema.org/", "ex": "http://example.org/"},
  "select": ["?title", "?score"],
  "where": [
    {
      "@id": "?doc", "@type": "ex:Article",
      "ex:content": "?content",
      "schema:name": "?title",
      "ex:visibility": "public"
    },
    ["bind", "?score", "(fulltext ?content \"database deployment\")"],
    ["filter", "(> ?score 0)"]
  ],
  "orderBy": [["desc", "?score"]]
}'

Search results participate in standard graph joins and filters — no separate search service needed.

Step 3: Update data and use time travel

Let’s update the deployment runbook with a new version:

fluree update 'PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>
PREFIX f: <https://ns.flur.ee/db#>

DELETE { ex:doc1 ex:content ?old }
INSERT { ex:doc1 ex:content "Step 1: Check the monitoring dashboard and verify all health checks pass. Step 2: Run the database migration script with --dry-run first. Step 3: Deploy the new container image. Step 4: Verify the deployment in staging before promoting to production."^^f:fullText }
WHERE  { ex:doc1 ex:content ?old }'

Now query the current version:

fluree query 'PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>

SELECT ?content WHERE { ex:doc1 ex:content ?content }'

And query the original version using time travel:

fluree query --at 1 'PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>

SELECT ?content WHERE { ex:doc1 ex:content ?content }'

The --at 1 flag queries the data as it was after transaction 1 — before the update. Both versions coexist in the same ledger.

You can also see the full change history:

fluree history 'PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>

SELECT ?content ?t ?op WHERE { ex:doc1 ex:content ?content }'

Each result includes ?t (the transaction number) and ?op (whether it was an assertion or retraction). You see the original content retracted and the new content asserted, with exact timestamps.

Use cases this enables:

  • Audit trails — Who changed what, when?
  • Rollback — See what the data looked like before a bad change
  • Compliance — Prove what was known at a specific point in time
  • Debugging — Compare current vs. historical state to find when a problem was introduced
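The rollback and debugging use cases above reduce to diffing two query results taken at different points in time. A client-side sketch, assuming each snapshot has been collected into a dict of property to value (an illustrative shape, not a Fluree API):

```python
def diff(before, after):
    """Return (added, removed, changed) between two property snapshots."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {k: (before[k], after[k])
               for k in before.keys() & after.keys() if before[k] != after[k]}
    return added, removed, changed

# Hypothetical snapshots from `--at 1` and the current state.
at_t1 = {"name": "Alice", "age": 30}
now = {"name": "Alicia", "age": 31, "team": "platform"}
added, removed, changed = diff(at_t1, now)
print(added)    # {'team': 'platform'}
print(removed)  # {}
```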

Step 4: Branch to experiment safely

Suppose you want to reorganize the knowledge base — maybe split articles into categories, or restructure ownership. You don’t want to affect the production data while experimenting.

Create a branch:

fluree branch create reorganize
fluree use knowledge-base:reorganize

On the branch, add categories and reorganize:

fluree insert '
@prefix schema: <http://schema.org/> .
@prefix ex:     <http://example.org/> .

ex:doc1 ex:category "operations" .
ex:doc2 ex:category "onboarding" .
ex:doc3 ex:category "operations" .
'
fluree update 'PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>

DELETE { ex:doc3 ex:visibility "confidential" }
INSERT { ex:doc3 ex:visibility "internal" }
WHERE  { ex:doc3 ex:visibility "confidential" }'

Verify the branch has the changes:

fluree query --format table 'PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>

SELECT ?title ?category ?visibility
WHERE {
  ?doc a ex:Article ;
       schema:name ?title ;
       ex:category ?category ;
       ex:visibility ?visibility .
}
ORDER BY ?title'

The main branch is untouched:

fluree query --ledger knowledge-base:main 'PREFIX ex: <http://example.org/>
PREFIX schema: <http://schema.org/>

SELECT ?title ?visibility
WHERE {
  ?doc a ex:Article ; schema:name ?title ; ex:visibility ?visibility .
  OPTIONAL { ?doc ex:category ?cat }
  FILTER(!BOUND(?cat))
}
ORDER BY ?title'

No categories on main — the branch is fully isolated.

When you’re happy with the changes, merge back:

fluree branch merge reorganize
fluree use knowledge-base:main

Now main has the categories and the visibility change. The branch can continue for future experiments or be dropped:

fluree branch drop reorganize

Step 5: Add access control

Now let’s add policies so that different users see different articles based on their role and team.

Insert policies into the ledger:

fluree insert '{
  "@context": {
    "f": "https://ns.flur.ee/db#",
    "ex": "http://example.org/",
    "schema": "http://schema.org/"
  },
  "@graph": [
    {
      "@id": "ex:policy-public-read",
      "@type": "f:Policy",
      "f:action": "query",
      "f:resource": { "ex:visibility": "public" },
      "f:allow": true
    },
    {
      "@id": "ex:policy-team-internal",
      "@type": "f:Policy",
      "f:subject": "?user",
      "f:action": "query",
      "f:resource": {
        "ex:visibility": "internal",
        "ex:team": "?team"
      },
      "f:condition": [
        { "@id": "?user", "ex:team": "?team" }
      ],
      "f:allow": true
    },
    {
      "@id": "ex:policy-manager-confidential",
      "@type": "f:Policy",
      "f:subject": "?user",
      "f:action": "query",
      "f:resource": {
        "ex:visibility": "confidential",
        "ex:team": "?team"
      },
      "f:condition": [
        { "@id": "?user", "ex:team": "?team", "ex:role": "manager" }
      ],
      "f:allow": true
    }
  ]
}'

These three policies create a layered access model:

  1. Public articles — visible to everyone
  2. Internal articles — visible only to members of the same team
  3. Confidential articles — visible only to managers on the same team
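To build intuition for how the three layers compose, here is a toy evaluator in Python. It models only the intent of the policies above; in Fluree the real evaluation happens inside the database, over policies stored as graph data.

```python
def can_read(user, article):
    """Toy model of the three-layer visibility policy (illustrative only)."""
    vis = article["visibility"]
    if vis == "public":
        return True                                            # policy 1
    if vis == "internal":
        return user["team"] == article["team"]                 # policy 2
    if vis == "confidential":
        return (user["team"] == article["team"]
                and user["role"] == "manager")                 # policy 3
    return False  # deny anything unclassified

alice = {"team": "platform", "role": "engineer"}
carol = {"team": "platform", "role": "manager"}
playbook = {"team": "platform", "visibility": "confidential"}

print(can_read(alice, playbook))  # False
print(can_read(carol, playbook))  # True
```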

Query as Alice (engineer, platform team):

fluree query '{
  "@context": {"schema": "http://schema.org/", "ex": "http://example.org/"},
  "select": ["?title", "?visibility"],
  "where": [
    {"@id": "?doc", "@type": "ex:Article", "schema:name": "?title", "ex:visibility": "?visibility"}
  ],
  "opts": {"identity": "ex:alice"}
}'

Alice sees the public onboarding guide and the internal deployment runbook, but not the confidential incident playbook.

Query as Carol (manager, platform team):

fluree query '{
  "@context": {"schema": "http://schema.org/", "ex": "http://example.org/"},
  "select": ["?title", "?visibility"],
  "where": [
    {"@id": "?doc", "@type": "ex:Article", "schema:name": "?title", "ex:visibility": "?visibility"}
  ],
  "opts": {"identity": "ex:carol"}
}'

Carol sees all three articles, including the confidential one.

The same query, different results, based on who’s asking — enforced by the database, not application code.

Step 6: Combine everything

Now let’s use all features together. Carol (manager) searches for articles about “database” in the knowledge base, with policies applied, and compares what she sees now vs. what existed before the reorganization:

Current state, with policy:

fluree query '{
  "@context": {"schema": "http://schema.org/", "ex": "http://example.org/"},
  "select": ["?title", "?visibility", "?score"],
  "where": [
    {
      "@id": "?doc", "@type": "ex:Article",
      "ex:content": "?content",
      "schema:name": "?title",
      "ex:visibility": "?visibility"
    },
    ["bind", "?score", "(fulltext ?content \"database\")"],
    ["filter", "(> ?score 0)"]
  ],
  "orderBy": [["desc", "?score"]],
  "opts": {"identity": "ex:carol"}
}'

Historical state (before runbook was updated):

fluree query '{
  "@context": {"schema": "http://schema.org/", "ex": "http://example.org/"},
  "from": "knowledge-base:main@t:1",
  "select": ["?title", "?score"],
  "where": [
    {
      "@id": "?doc", "@type": "ex:Article",
      "ex:content": "?content",
      "schema:name": "?title"
    },
    ["bind", "?score", "(fulltext ?content \"database\")"],
    ["filter", "(> ?score 0)"]
  ],
  "orderBy": [["desc", "?score"]]
}'

In a single database, you’ve combined:

  • Full-text search — ranked by relevance
  • Access control — Carol sees confidential articles, others wouldn’t
  • Time travel — compare current vs. historical content
  • Branching — experimented with reorganization without risk

What you’ve learned

Feature             What it gave you
------------------  ----------------------------------------------------------------
Ledger              A single place for all knowledge base data
Full-text search    BM25-ranked article discovery, integrated in queries
Time travel         Complete audit trail, historical comparison, rollback capability
Branching           Safe experimentation without affecting production
Policies            Automatic access control based on team and role
SPARQL + JSON-LD    Two query languages accessing the same engine

Next steps

Using Fluree as a Rust Library

This guide shows how to use Fluree programmatically in your Rust applications by depending on the fluree-db-api crate.

Overview

Fluree can be embedded directly in Rust applications, giving you a powerful graph database without requiring a separate server process. This is ideal for:

  • Desktop applications
  • Edge computing
  • Embedded systems
  • Library/framework integration
  • Testing and development

Add Dependency

Add Fluree to your Cargo.toml:

[dependencies]
fluree-db-api = { path = "../fluree-db-api" }
tokio = { version = "1", features = ["full"] }

Note: Replace path with version when published to crates.io:

[dependencies]
fluree-db-api = "0.1"

Features

Available feature flags:

  • native (default) - File storage support
  • credential (default in server/CLI) - DID/JWS/VerifiableCredential support for signed queries and transactions
  • shacl (default in server/CLI) - SHACL constraint validation
  • iceberg (default in server/CLI) - Apache Iceberg/R2RML graph source support
  • aws - AWS-backed storage support (S3, storage-backed nameservice). Enables FlureeBuilder::s3() and S3-based JSON-LD configs.
  • ipfs - IPFS-backed storage via Kubo HTTP RPC
  • vector - Embedded vector similarity search (HNSW indexes via usearch)
  • search-remote-client - Remote search service client (HTTP client for remote BM25 and vector search services)
  • aws-testcontainers - Opt-in LocalStack-backed S3/DynamoDB tests (auto-start via testcontainers)
  • full - Convenience bundle: native, credential, iceberg, shacl, ipfs

Quick Start

Basic Setup

use fluree_db_api::{FlureeBuilder, Result};

#[tokio::main]
async fn main() -> Result<()> {
    // Create a memory-backed Fluree instance
    let fluree = FlureeBuilder::memory().build_memory();

    // Create a new ledger
    let ledger = fluree.create_ledger("mydb").await?;

    println!("Ledger created at t={}", ledger.t());

    Ok(())
}

With File Storage

use fluree_db_api::{FlureeBuilder, Result};

#[tokio::main]
async fn main() -> Result<()> {
    // Use file-backed storage for persistence
    let fluree = FlureeBuilder::file("./data").build()?;

    // Create a new ledger (or load an existing one)
    let ledger = fluree.create_ledger("mydb").await?;

    // Load an existing ledger by ID (`name:branch`)
    let ledger = fluree.ledger("mydb:main").await?;

    Ok(())
}

Bulk import (high throughput)

For initial ledger bootstraps (large Turtle or JSON-LD datasets), Fluree exposes a bulk import pipeline as a first-class Rust API:

use fluree_db_api::{FlureeBuilder, Result};

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = FlureeBuilder::file("./data").build()?;

    // `chunks_dir` can be:
    // - a directory containing *.ttl, *.trig, or *.jsonld files (sorted lexicographically), OR
    // - a single .ttl or .jsonld file.
    // Directories must contain a single format (no mixing Turtle and JSON-LD).
    let result = fluree
        .create("dblp:main")
        .import("./chunks_dir")
        .threads(8)          // parallel TTL parsing; commits remain serial
        .build_index(true)   // write an index root and publish it
        .publish_every(50)   // nameservice checkpoints during long imports (0 disables)
        .cleanup(true)       // delete tmp import files on success
        .execute()
        .await?;

    println!(
        "import complete: t={}, flakes={}, root={:?}",
        result.t, result.flake_count, result.root_id
    );

    // Query normally after import (loads the published V2 root from CAS).
    let view = fluree.view("dblp:main").await?;
    let qr = fluree
        .query(&view, "SELECT * WHERE { ?s ?p ?o } LIMIT 10")
        .await?;
    println!("rows={}", qr.batches.iter().map(|b| b.len()).sum::<usize>());

    Ok(())
}

Temporary files: the bulk import pipeline uses a session-scoped tmp_import/ directory and removes it only on full success (unless .cleanup(false) is set). On failure, it keeps the session directory and logs its path for debugging.

With S3 Storage

Requires fluree-db-api feature aws and standard AWS credential/region configuration.

use fluree_db_api::{FlureeBuilder, Result};

#[tokio::main]
async fn main() -> Result<()> {
    // LocalStack/MinIO: endpoint is required
    let fluree = FlureeBuilder::s3("my-bucket", "http://localhost:4566")
        .build_client()
        .await?;

    let ledger = fluree.create_ledger("mydb").await?;
    println!("Ledger created at t={}", ledger.t());
    Ok(())
}

S3 Express One Zone note: for directory buckets (--x-s3 suffix), omit s3Endpoint in JSON-LD config and let the SDK handle it.

Connection Configuration (JSON-LD)

For advanced configuration (tiered storage, address identifier routing, DynamoDB nameservice, environment variable indirection), use FlureeBuilder::from_json_ld() to parse a JSON-LD config and build from it. The typed builder methods (build(), build_memory(), build_s3()) and the type-erased build_client() all share the same underlying construction logic.

See also: JSON-LD connection configuration reference.

use fluree_db_api::{FlureeBuilder, Result};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<()> {
    let cfg = json!({
        "@context": {"@base": "https://ns.flur.ee/config/connection/", "@vocab": "https://ns.flur.ee/system#"},
        "@graph": [
            {"@id": "s3Index", "@type": "Storage", "s3Bucket": {"envVar": "INDEX_BUCKET"}, "s3Endpoint": {"envVar": "S3_ENDPOINT"}},
            {"@id": "conn", "@type": "Connection", "indexStorage": {"@id": "s3Index"}}
        ]
    });
    // from_json_ld parses the config into builder settings; build_client() constructs
    // a type-erased FlureeClient suitable for runtime-determined backends.
    let fluree = FlureeBuilder::from_json_ld(&cfg)?.build_client().await?;
    Ok(())
}

Environment variables (ConfigurationValue)

Any string/number config value can be specified directly or via a ConfigurationValue object:

{
  "s3Bucket": { "envVar": "FLUREE_S3_BUCKET", "defaultVal": "my-bucket" },
  "cacheMaxMb": { "envVar": "FLUREE_CACHE_MAX_MB", "defaultVal": "1024" }
}

Supported JSON-LD fields (Rust)

Connection node:

  • parallelism
  • cacheMaxMb
  • indexStorage, commitStorage
  • primaryPublisher (publisher node)

Storage node:

  • File: filePath, AES256Key
  • S3: s3Bucket, s3Prefix, s3Endpoint, s3ReadTimeoutMs, s3WriteTimeoutMs, s3ListTimeoutMs, s3MaxRetries, s3RetryBaseDelayMs, s3RetryMaxDelayMs

Publisher node:

  • DynamoDB nameservice: dynamodbTable, dynamodbRegion, dynamodbEndpoint, dynamodbTimeoutMs
  • Storage-backed nameservice: storage (reference to a Storage node)

Core Patterns

The Graph API

The primary API revolves around fluree.graph(graph_ref), which returns a lazy Graph handle. No I/O occurs until a terminal method (.execute(), .commit(), .load()) is called.

Use graph(...).query() when the target may be a mapped graph source as well as a native ledger. If the query body itself carries "from" / FROM, use query_from(). The lower-level fluree.db(...) + fluree.query(&view, ...) path is for materialized native ledger snapshots, not graph source aliases.

When I/O happens:

  • .execute() / .execute_formatted() / .execute_tracked() — loads the graph from storage, then runs the query (each call reloads)
  • .commit() — loads the cached ledger handle, stages, and commits
  • .stage() — loads the ledger and stages without committing
  • .load() — loads the graph once, returning a GraphSnapshot for repeated queries without reloading
#![allow(unused)]
fn main() {
// Lazy query — loads graph and executes in one step
let result = fluree.graph("mydb:main")
    .query()
    .sparql("SELECT ?name WHERE { ?s <http://schema.org/name> ?name }")
    .execute()
    .await?;

// Lazy transact + commit
let out = fluree.graph("mydb:main")
    .transact()
    .insert(&data)
    .commit()
    .await?;

// Materialize for reuse (avoids reloading on each query)
let db = fluree.graph("mydb:main").load().await?;
let r1 = db.query().sparql("SELECT ...").execute().await?;
let r2 = db.query().jsonld(&q).execute().await?;

// Time travel
let result = fluree.graph_at("mydb:main", TimeSpec::AtT(42))
    .query()
    .jsonld(&q)
    .execute()
    .await?;
}

Insert Data

use fluree_db_api::{FlureeBuilder, Result};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = FlureeBuilder::memory().build_memory();
    let ledger = fluree.create_ledger("mydb").await?;

    // Insert JSON-LD data using the Graph API
    let data = json!({
        "@context": {
            "schema": "http://schema.org/",
            "ex": "http://example.org/ns/"
        },
        "@graph": [
            {
                "@id": "ex:alice",
                "@type": "schema:Person",
                "schema:name": "Alice",
                "schema:email": "alice@example.org",
                "schema:age": 30
            },
            {
                "@id": "ex:bob",
                "@type": "schema:Person",
                "schema:name": "Bob",
                "schema:email": "bob@example.org",
                "schema:age": 25
            }
        ]
    });

    let result = fluree.graph("mydb:main")
        .transact()
        .insert(&data)
        .commit()
        .await?;

    println!("Transaction committed");

    Ok(())
}

Query Data with JSON-LD Query

use fluree_db_api::{FlureeBuilder, Result};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = FlureeBuilder::memory().build_memory();
    let ledger = fluree.create_ledger("mydb").await?;

    // Insert test data first (see Insert Data above)
    // ...

    // Query with JSON-LD using the lazy Graph API
    let query = json!({
        "select": ["?name", "?email"],
        "where": [
            { "@id": "?person", "@type": "schema:Person" },
            { "@id": "?person", "schema:name": "?name" },
            { "@id": "?person", "schema:email": "?email" },
            { "@id": "?person", "schema:age": "?age" }
        ],
        "filter": "?age > 25"
    });

    let result = fluree.graph("mydb:main")
        .query()
        .jsonld(&query)
        .execute_formatted()
        .await?;

    println!("Query results: {}",
        serde_json::to_string_pretty(&result)?);

    Ok(())
}

Query Data with SPARQL

use fluree_db_api::{FlureeBuilder, Result};

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = FlureeBuilder::memory().build_memory();
    let ledger = fluree.create_ledger("mydb").await?;

    // Insert test data first (see Insert Data above)
    // ...

    // Query with SPARQL using the lazy Graph API
    let sparql = r#"
        PREFIX schema: <http://schema.org/>

        SELECT ?name ?email
        WHERE {
            ?person a schema:Person .
            ?person schema:name ?name .
            ?person schema:email ?email .
            ?person schema:age ?age .
            FILTER (?age > 25)
        }
        ORDER BY ?name
    "#;

    let result = fluree.graph("mydb:main")
        .query()
        .sparql(sparql)
        .execute_formatted()
        .await?;

    println!("Results: {}",
        serde_json::to_string_pretty(&result)?);

    Ok(())
}

Update Data

use fluree_db_api::{FlureeBuilder, Result};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = FlureeBuilder::memory().build_memory();
    let ledger = fluree.create_ledger("mydb").await?;

    // Update using WHERE/DELETE/INSERT pattern
    let update = json!({
        "@context": { "schema": "http://schema.org/" },
        "where": [
            { "@id": "?person", "schema:name": "Alice" },
            { "@id": "?person", "schema:age": "?oldAge" }
        ],
        "delete": [
            { "@id": "?person", "schema:age": "?oldAge" }
        ],
        "insert": [
            { "@id": "?person", "schema:age": 31 }
        ]
    });

    let result = fluree.graph("mydb:main")
        .transact()
        .update(&update)
        .commit()
        .await?;

    println!("Updated successfully");

    Ok(())
}

SPARQL UPDATE

Use SPARQL UPDATE syntax for transactions:

use fluree_db_api::{
    FlureeBuilder, Result,
    parse_sparql, lower_sparql_update, NamespaceRegistry, TxnOpts,
    SparqlQueryBody,
};

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = FlureeBuilder::file("./data").build()?;

    // Get a cached ledger handle
    let handle = fluree.ledger_cached("mydb:main").await?;

    // SPARQL UPDATE string
    let sparql = r#"
        PREFIX ex: <http://example.org/ns/>

        DELETE {
            ?person ex:age ?oldAge .
        }
        INSERT {
            ?person ex:age 31 .
        }
        WHERE {
            ?person ex:name "Alice" .
            ?person ex:age ?oldAge .
        }
    "#;

    // Parse SPARQL
    let parse_output = parse_sparql(sparql);
    if parse_output.has_errors() {
        // Handle parse errors
        for diag in parse_output.diagnostics.iter().filter(|d| d.is_error()) {
            eprintln!("Parse error: {}", diag.message);
        }
        return Err(fluree_db_api::ApiError::Internal("SPARQL parse error".into()));
    }

    let ast = parse_output.ast.unwrap();

    // Extract the UPDATE operation
    let update_op = match &ast.body {
        SparqlQueryBody::Update(op) => op,
        _ => return Err(fluree_db_api::ApiError::Internal("Expected SPARQL UPDATE".into())),
    };

    // Get namespace registry from the ledger
    let snapshot = handle.snapshot().await;
    let mut ns = NamespaceRegistry::from_db(&snapshot.snapshot);

    // Lower SPARQL UPDATE to Txn IR
    let txn = lower_sparql_update(update_op, &ast.prologue, &mut ns, TxnOpts::default())?;

    // Execute the transaction
    let result = fluree.stage(&handle)
        .txn(txn)
        .execute()
        .await?;

    println!("SPARQL UPDATE committed at t={}", result.receipt.t);

    Ok(())
}

Supported SPARQL UPDATE operations:

  • INSERT DATA - Insert ground triples
  • DELETE DATA - Delete specific triples
  • DELETE WHERE - Delete matching patterns
  • DELETE/INSERT WHERE - Full update with patterns

See SPARQL UPDATE for syntax details.

Stage and Preview Changes

use fluree_db_api::{FlureeBuilder, Result};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = FlureeBuilder::memory().build_memory();
    let ledger = fluree.create_ledger("mydb").await?;

    let data = json!({
        "@context": {"ex": "http://example.org/ns/"},
        "@graph": [{"@id": "ex:alice", "ex:name": "Alice"}]
    });

    // Stage without committing
    let staged = fluree.graph("mydb:main")
        .transact()
        .insert(&data)
        .stage()
        .await?;

    // Query the staged state to preview changes
    let preview_query = json!({
        "select": ["?name"],
        "where": [{"@id": "ex:alice", "ex:name": "?name"}]
    });

    let preview = staged.query()
        .jsonld(&preview_query)
        .execute()
        .await?;

    println!("Preview: {} rows", preview.row_count());

    Ok(())
}

Note: StagedGraph currently supports querying only. Staging on top of a staged transaction and committing from a StagedGraph are not yet supported.

Export Data

Stream ledger data as Turtle, N-Triples, N-Quads, TriG, or JSON-LD using the builder API:

use fluree_db_api::{FlureeBuilder, Result};
use fluree_db_api::export::ExportFormat;
use std::io::BufWriter;
use std::fs::File;

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = FlureeBuilder::file("./data").build()?;

    // Export as Turtle to a file
    let file = File::create("backup.ttl").unwrap();
    let mut writer = BufWriter::new(file);
    let stats = fluree.export("mydb")
        .format(ExportFormat::Turtle)
        .write_to(&mut writer)
        .await?;
    println!("Exported {} triples", stats.triples_written);

    // Export as JSON-LD with custom prefixes
    let mut buf = Vec::new();
    let stats = fluree.export("mydb")
        .format(ExportFormat::JsonLd)
        .context(&serde_json::json!({"ex": "http://example.org/"}))
        .write_to(&mut buf)
        .await?;

    // Export all graphs as N-Quads (dataset export)
    let stats = fluree.export("mydb")
        .format(ExportFormat::NQuads)
        .all_graphs()
        .to_stdout()
        .await?;

    Ok(())
}

All formats stream directly from the binary SPOT index. Memory usage is O(leaflet size) for line-oriented formats and O(largest subject) for JSON-LD, regardless of dataset size.

Builder methods:

  • .format(ExportFormat) — output format (default: Turtle)
  • .all_graphs() — include all named graphs including system graphs (requires TriG or NQuads)
  • .graph("iri") — export a specific named graph by IRI
  • .as_of(TimeSpec) — time-travel export (transaction number, ISO-8601 datetime, or commit CID prefix)
  • .context(&json) — override prefix map (default: ledger’s context from nameservice)
  • .write_to(&mut writer) — stream to any Write sink
  • .to_stdout() — convenience for stdout output

See also: CLI export for command-line usage.
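
Combining .as_of with a format gives a time-travel export. The sketch below composes the builder methods listed above; the ledger name and t value are placeholders:

```rust
use fluree_db_api::{FlureeBuilder, Result, TimeSpec};
use fluree_db_api::export::ExportFormat;

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = FlureeBuilder::file("./data").build()?;

    // Export the ledger as it stood at transaction t=42
    let mut buf = Vec::new();
    let stats = fluree.export("mydb")
        .format(ExportFormat::Turtle)
        .as_of(TimeSpec::AtT(42))
        .write_to(&mut buf)
        .await?;
    println!("{} triples at t=42", stats.triples_written);

    Ok(())
}
```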

Materialize for Reuse

When you need to run multiple queries against the same snapshot, materialize a GraphSnapshot once:

use fluree_db_api::{FlureeBuilder, Result};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = FlureeBuilder::memory().build_memory();

    // Load once, query many times
    let db = fluree.graph("mydb:main").load().await?;

    let r1 = db.query()
        .sparql("SELECT ?name WHERE { ?s <http://schema.org/name> ?name }")
        .execute()
        .await?;

    let q2 = json!({
        "select": ["?email"],
        "where": [{"@id": "?s", "schema:email": "?email"}]
    });
    let r2 = db.query()
        .jsonld(&q2)
        .execute()
        .await?;

    // Access the underlying view if needed
    let view = db.view();

    Ok(())
}

Advanced Usage

Ledger Caching

Ledger caching is enabled by default on all FlureeBuilder constructors. When caching is active, fluree.ledger() returns a cached handle and subsequent calls avoid reloading from storage:

use fluree_db_api::{FlureeBuilder, Result};

#[tokio::main]
async fn main() -> Result<()> {
    // Caching is on by default — no extra call needed
    let fluree = FlureeBuilder::file("./data").build()?;

    // First call loads from storage
    let ledger = fluree.ledger("mydb:main").await?;

    // Subsequent calls return cached state (fast)
    let ledger2 = fluree.ledger("mydb:main").await?;

    Ok(())
}

To disable caching (e.g., for a CLI tool that runs once and exits):

#![allow(unused)]
fn main() {
let fluree = FlureeBuilder::file("./data")
    .without_ledger_caching()
    .build()?;
}

Disconnecting Ledgers

Use disconnect_ledger to release a ledger from the connection cache. This forces a fresh load on the next access:

use fluree_db_api::{FlureeBuilder, Result};

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = FlureeBuilder::file("./data").build()?;

    // Load and use ledger
    let ledger = fluree.ledger("mydb:main").await?;
    println!("Ledger at t={}", ledger.t());

    // Release cached state
    fluree.disconnect_ledger("mydb:main").await;

    // Next access will reload from storage
    let ledger = fluree.ledger("mydb:main").await?;

    Ok(())
}

When to use disconnect_ledger:

  • Force fresh load: After external changes to the ledger (e.g., another process wrote data)
  • Free memory: Release memory for ledgers you no longer need
  • Clean shutdown: Release resources before application exit
  • Testing: Reset state between test cases

Note: If caching is disabled (via without_ledger_caching() on builder), disconnect_ledger is a no-op.

Checking Ledger Existence

Use ledger_exists to check if a ledger is registered in the nameservice without loading it:

use fluree_db_api::{FlureeBuilder, Result};

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = FlureeBuilder::file("./data").build()?;

    // Check if ledger exists (lightweight nameservice lookup)
    if fluree.ledger_exists("mydb:main").await? {
        // Ledger exists - load it
        let ledger = fluree.ledger("mydb:main").await?;
        println!("Loaded ledger at t={}", ledger.t());
    } else {
        // Ledger doesn't exist - create it
        let ledger = fluree.create_ledger("mydb").await?;
        println!("Created new ledger");
    }

    Ok(())
}

When to use ledger_exists:

  • Conditional create-or-load: Check before deciding whether to create or load
  • Validation: Verify ledger IDs exist before operations
  • Defensive programming: Avoid NotFound errors in application logic

Performance note: This is a lightweight check that only queries the nameservice — it does NOT load the ledger data, indexes, or novelty, making it much faster than attempting a full load and catching NotFound errors.

Dropping Ledgers

Use drop_ledger to permanently remove a ledger:

use fluree_db_api::{FlureeBuilder, DropMode, DropStatus, Result};

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = FlureeBuilder::file("./data").build()?;

    // Soft drop: retract from nameservice, preserve files
    let report = fluree.drop_ledger("mydb:main", DropMode::Soft).await?;
    match report.status {
        DropStatus::Dropped => println!("Ledger dropped"),
        DropStatus::AlreadyRetracted => println!("Already dropped"),
        DropStatus::NotFound => println!("Ledger not found"),
    }

    // Hard drop: delete all files (IRREVERSIBLE)
    let report = fluree.drop_ledger("mydb:main", DropMode::Hard).await?;
    println!("Deleted {} commit files, {} index files",
        report.commit_files_deleted,
        report.index_files_deleted);

    Ok(())
}

Drop Modes:

  • DropMode::Soft (default) — retracts from the nameservice only; files remain. Reversible.
  • DropMode::Hard — retracts from the nameservice and deletes all storage artifacts. Not reversible.

Drop Sequence:

  1. Normalizes the ledger ID (ensures :main suffix)
  2. Cancels any pending background indexing
  3. Waits for in-progress indexing to complete
  4. In hard mode: deletes all commit and index files
  5. Retracts from nameservice
  6. Disconnects from ledger cache (if caching enabled)

When to use drop_ledger:

  • Cleanup: Remove test ledgers or unused data
  • Data lifecycle: Permanently delete ledgers that are no longer needed
  • Admin operations: Clean up after migrations or failures

Idempotency:

Safe to call multiple times:

  • Returns DropStatus::AlreadyRetracted if previously dropped
  • Hard mode still attempts deletion for NotFound/AlreadyRetracted (useful for admin cleanup)

Warnings:

The DropReport includes a warnings field for any non-fatal errors encountered during the operation (e.g., failed to delete a specific file). Always check this for hard drops:

#![allow(unused)]
fn main() {
let report = fluree.drop_ledger("mydb:main", DropMode::Hard).await?;
if !report.warnings.is_empty() {
    for warning in &report.warnings {
        eprintln!("Warning: {}", warning);
    }
}
}

Refreshing Cached Ledgers

Use refresh to poll-check whether a cached ledger is stale and update it if needed. refresh returns a RefreshResult containing the ledger’s t after the operation and what action was taken:

use fluree_db_api::{FlureeBuilder, NotifyResult, RefreshOpts, Result};

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = FlureeBuilder::file("./data").build()?;

    // Load ledger into cache
    let _ledger = fluree.ledger_cached("mydb:main").await?;

    // Later, check if the cached state is still fresh
    match fluree.refresh("mydb:main", RefreshOpts::default()).await? {
        Some(r) => {
            println!("Ledger at t={}, action: {:?}", r.t, r.action);
            match r.action {
                NotifyResult::Current => println!("Already up to date"),
                NotifyResult::Reloaded => println!("Reloaded from storage"),
                NotifyResult::IndexUpdated => println!("Index was updated"),
                NotifyResult::CommitsApplied { count } => {
                    println!("{count} commits applied incrementally");
                }
                NotifyResult::NotLoaded => println!("Not in cache"),
            }
        }
        None => println!("Ledger not found in nameservice"),
    }

    Ok(())
}

Key behaviors:

  • Does NOT cold-load: If the ledger isn’t already cached, returns NotLoaded (no-op)
  • Returns None: If the ledger doesn’t exist in the nameservice
  • Alias resolution: Supports short aliases (mydb resolves to mydb:main)
  • No-op without caching: If caching is disabled, returns NotLoaded
  • Returns t: The RefreshResult.t field always tells you the ledger’s current transaction time

When to use refresh:

  • Poll-based freshness: When you can’t use SSE events but need periodic freshness checks
  • Before critical reads: Ensure you have the latest state before important queries
  • Peer mode: Check if the local cache is behind the transaction server

refresh vs disconnect_ledger:

  • Checks freshness: refresh — yes; disconnect_ledger — no
  • Updates in place: refresh — yes; disconnect_ledger — no (forces a full reload on next access)
  • Handles not-cached: refresh returns NotLoaded; disconnect_ledger is a no-op
  • Typical use: refresh for poll-based updates; disconnect_ledger to force a full reload

Read-After-Write Consistency

Fluree’s query engine is eventually consistent: when one process writes data and another (or the same process on a warm cache) queries it, the query may not yet see the latest commit. The t value returned from a transaction is the key to bridging this gap.

Pass RefreshOpts { min_t: Some(t) } to refresh() to assert that the cached ledger has reached at least that transaction time. If it hasn’t after pulling the latest state from the nameservice, refresh returns ApiError::AwaitTNotReached with both the requested and current t values. Your code owns retry timing and timeout policy.

Basic usage:

use fluree_db_api::{FlureeBuilder, RefreshOpts, ApiError, Result};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = FlureeBuilder::file("./data").build()?;
    let handle = fluree.ledger_cached("mydb:main").await?;

    // Transaction returns the commit's t value
    let receipt = fluree.stage(&handle)
        .insert(&json!({"@id": "ex:item", "ex:count": 42}))
        .commit()
        .await?;
    let committed_t = receipt.t;

    // Ensure the cache reflects at least this t before querying
    let opts = RefreshOpts { min_t: Some(committed_t) };
    let result = fluree.refresh("mydb:main", opts).await?;
    // result.unwrap().t >= committed_t is guaranteed here

    Ok(())
}

Serverless / Lambda pattern (retry with backoff):

In a serverless environment, the transacting process and the querying process may be different Lambda invocations. The querying invocation receives t (e.g., via an event payload or API parameter) and must wait for that commit to be visible:

#![allow(unused)]
fn main() {
use fluree_db_api::{RefreshOpts, ApiError};
use std::time::{Duration, Instant};

async fn wait_for_t(
    fluree: &Fluree<impl Storage, impl NameService>,
    ledger_id: &str,
    min_t: i64,
    timeout: Duration,
) -> Result<i64, ApiError> {
    let deadline = Instant::now() + timeout;
    let opts = RefreshOpts { min_t: Some(min_t) };

    loop {
        match fluree.refresh(ledger_id, opts.clone()).await {
            Ok(Some(r)) => return Ok(r.t),   // reached min_t
            Ok(None) => return Err(ApiError::NotFound(
                format!("ledger {ledger_id} not in nameservice"),
            )),
            Err(ApiError::AwaitTNotReached { current, .. }) => {
                if Instant::now() >= deadline {
                    return Err(ApiError::AwaitTNotReached {
                        requested: min_t,
                        current,
                    });
                }
                // Back off before retrying
                tokio::time::sleep(Duration::from_millis(50)).await;
            }
            Err(e) => return Err(e),
        }
    }
}
}

How it works internally:

  1. Fast path: If the cached t already satisfies min_t, returns immediately without hitting the nameservice at all.
  2. Pull: Queries the nameservice for the latest commit/index pointers and applies any new commits incrementally (or reloads if the gap is large).
  3. Check: If t is still below min_t after the pull, returns ApiError::AwaitTNotReached so you can retry.

This design keeps retry/timeout policy out of the database layer. Different deployment contexts (Lambda with 100ms backoff, HTTP handler with 5s deadline, integration test with immediate assertion) each wrap the same primitive differently.

Branch Diff (Merge Preview)

Fluree::merge_preview returns the rich diff between two branches — ahead/behind commit summaries, the common ancestor, conflict keys, and fast-forward eligibility — without mutating any state. It uses the same primitives as merge_branch but skips the publish/copy steps, making it cheap enough to call on every UI render.

use fluree_db_api::{FlureeBuilder, MergePreviewOpts, Result};

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = FlureeBuilder::memory().build_memory();

    // ... create ledger, branch, transact on dev, etc.

    // Default: previewing dev → main with the spec defaults
    // (cap each commit list at 500, conflict keys at 200, run conflicts).
    let preview = fluree.merge_preview("mydb", "dev", None).await?;

    println!(
        "{} ahead, {} behind, fast-forward: {}",
        preview.ahead.count, preview.behind.count, preview.fast_forward,
    );

    if preview.fast_forward {
        println!("merge would advance {} → {}", preview.source, preview.target);
    } else {
        println!("merge has {} conflict(s)", preview.conflicts.count);
        for k in &preview.conflicts.keys {
            println!("  - s={} p={}", k.s, k.p);
        }
    }
    Ok(())
}

Tuning the preview

merge_preview_with takes a MergePreviewOpts for callers that need control over response size or want to skip the conflict computation:

use fluree_db_api::{FlureeBuilder, MergePreviewOpts, Result};

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = FlureeBuilder::memory().build_memory();

    // Cheap preview: counts only, no conflict walks.
    let counts = fluree
        .merge_preview_with(
            "mydb",
            "dev",
            Some("main"),
            MergePreviewOpts {
                max_commits: Some(0),       // counts only — no commit summaries
                max_conflict_keys: Some(0),
                include_conflicts: false,
            },
        )
        .await?;

    // Direct Rust callers can opt in to unbounded results — useful for
    // tooling that needs the full divergence. The HTTP layer always supplies
    // a bound, so this is a Rust-only escape hatch.
    let full = fluree
        .merge_preview_with(
            "mydb",
            "dev",
            None,
            MergePreviewOpts {
                max_commits: None,
                max_conflict_keys: None,
                include_conflicts: true,
            },
        )
        .await?;

    Ok(())
}

What the caps do (and don’t) control

max_commits and max_conflict_keys cap the size of the returned lists, not the cost of computing them:

  • BranchDelta::count on each side reflects the full unbounded divergence — computed by walking every commit envelope between HEAD and the common ancestor — regardless of max_commits.
  • When include_conflicts: true, both compute_delta_keys walks scan the full per-side delta regardless of max_conflict_keys.
  • When include_conflict_details: true, value details are collected only for the returned conflicts.keys after the max_conflict_keys cap is applied.
  • Set include_conflicts: false for a cheap preview on heavily diverged branches; you still get accurate ahead.count / behind.count.

Response shape

  • MergePreview — source, target, ancestor: Option<AncestorRef>, ahead, behind, fast_forward, conflicts, mergeable
  • BranchDelta — count (unbounded), commits: Vec<CommitSummary> (newest-first, capped), truncated
  • CommitSummary — t, commit_id, time, asserts, retracts, flake_count, message: Option<String> (extracted from the f:message txn_meta entry when present)
  • ConflictSummary — count (unbounded), keys: Vec<ConflictKey> (sorted, capped), truncated, strategy, details
  • ConflictDetail — key, source_values, target_values, resolution (values are the current asserted values at each branch HEAD)
  • ConflictKey — s: Sid, p: Sid, g: Option<Sid>

mergeable reflects only whether the selected strategy would abort due to conflicts detected at preview time; it is not full validation of every constraint the eventual merge commit may encounter, so mergeable=true does not guarantee a subsequent merge will succeed.

All types derive Serialize so the response is wire-stable; the HTTP endpoint at GET /v1/fluree/merge-preview/{ledger...} returns the same struct. See docs/api/endpoints.md and docs/cli/server-integration.md for the HTTP contract.

Reusable primitives in fluree-db-core

The per-commit summary types and DAG walker are factored into core for reuse outside the merge-preview flow (e.g., git-log-style commit history viewers, indexer integration). Re-exported from fluree-db-api:

  • walk_commit_summaries(store, head, stop_at_t, max) -> Result<(Vec<CommitSummary>, usize)> — newest-first walk that returns both the (capped) summary list and the unbounded total count.
  • commit_to_summary(commit) -> CommitSummary — pure function, no I/O.
  • find_common_ancestor(store, head_a, head_b) — dual-frontier BFS.
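
A git-log-style listing built on these primitives might look like the sketch below. The exact types of store and head are not shown above and are assumptions, as are the None / Some(50) values for stop_at_t / max:

```rust
#![allow(unused)]
fn main() {
// Sketch only — `store` and `head` come from your ledger's commit store;
// their concrete types are assumptions here.
let (summaries, total) = walk_commit_summaries(&store, &head, None, Some(50))?;
println!("showing {} of {} commits", summaries.len(), total);
for c in &summaries {
    // Fields per CommitSummary in the Response shape section above
    println!("t={} commit={} +{} -{}", c.t, c.commit_id, c.asserts, c.retracts);
}
}
```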

Time Travel Queries

use fluree_db_api::{FlureeBuilder, TimeSpec, Result};

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = FlureeBuilder::memory().build_memory();

    // Query at a specific point in time
    let result = fluree.graph_at("mydb:main", TimeSpec::AtT(100))
        .query()
        .sparql("SELECT * WHERE { ?s ?p ?o } LIMIT 10")
        .execute()
        .await?;

    println!("Results at t=100: {:?}", result.row_count());

    Ok(())
}

Multi-Ledger Queries

use fluree_db_api::{FlureeBuilder, DataSetDb, Result};

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = FlureeBuilder::memory().build_memory();

    // Load views from multiple ledgers
    let customers = fluree.view("customers:main").await?;
    let orders = fluree.view("orders:main").await?;

    // Compose a dataset from multiple graphs
    let dataset = DataSetDb::new()
        .with_default(customers)
        .with_named("orders:main", orders);

    // Query across ledgers using the dataset builder
    let query = r#"
        SELECT ?customerName ?orderTotal
        WHERE {
            ?customer schema:name ?customerName .
            ?customer ex:customerId ?cid .

            GRAPH <orders:main> {
                ?order ex:customerId ?cid .
                ?order ex:total ?orderTotal .
            }
        }
    "#;

    let result = dataset.query(&fluree)
        .sparql(query)
        .execute()
        .await?;

    Ok(())
}

Remote Federation

Query ledgers on remote Fluree servers using SPARQL SERVICE with the fluree:remote: scheme. Register remote connections at build time — each maps a name to a server URL and optional bearer token:

use fluree_db_api::{FlureeBuilder, Result};

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = FlureeBuilder::file("./data")
        .remote_connection(
            "acme",
            "https://acme-fluree.example.com",
            Some("eyJhbG...".to_string()),
        )
        .build()?;

    let db = fluree.view("local-ledger:main").await?;

    // Join local data with a ledger on the remote server
    let result = fluree.query(&db, r#"
        PREFIX ex: <http://example.org/ns/>
        SELECT ?name ?email
        WHERE {
          ?person ex:name ?name .
          SERVICE <fluree:remote:acme/customers:main> {
            ?person ex:email ?email .
          }
        }
    "#).await?;

    Ok(())
}

The connection name (acme) maps to the server URL. The ledger path (customers:main) is appended to form the request URL: POST https://acme-fluree.example.com/v1/fluree/query/customers:main. The bearer token is sent as Authorization: Bearer <token> on every request.

Multiple ledgers on the same remote server use the same connection name — you register the server once and can query any ledger your token is authorized for.

See Configuration: Remote connections for details and SPARQL: Remote Fluree Federation for full query syntax.

FROM-Driven Queries (Connection Queries)

When the query body itself specifies which ledgers to target (via "from" in JSON-LD or FROM in SPARQL), use query_from():

use fluree_db_api::{FlureeBuilder, Result};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = FlureeBuilder::memory().build_memory();

    // Query where the "from" is embedded in the query body
    let query = json!({
        "from": "mydb:main",
        "select": ["?name"],
        "where": { "@id": "?s", "schema:name": "?name" }
    });

    let result = fluree.query_from()
        .jsonld(&query)
        .execute_formatted()
        .await?;

    // SPARQL with FROM clause
    let result = fluree.query_from()
        .sparql("SELECT ?name FROM <mydb:main> WHERE { ?s <http://schema.org/name> ?name }")
        .execute_formatted()
        .await?;

    Ok(())
}

Background Indexing

use fluree_db_api::{FlureeBuilder, BackgroundIndexerWorker, Result};
use serde_json::json;
use std::sync::Arc;
use tokio::time::{sleep, Duration};

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = Arc::new(FlureeBuilder::file("./data").build()?);

    // Start background indexer
    let indexer = BackgroundIndexerWorker::new(
        fluree.clone(),
        Duration::from_secs(5), // Index interval
    );

    let indexer_handle = indexer.start();

    // Application logic
    let ledger = fluree.create_ledger("mydb").await?;

    // Transactions will be indexed automatically in background
    for i in 0..100 {
        let txn = json!({
            "@context": {"ex": "http://example.org/ns/"},
            "@graph": [{"@id": format!("ex:item{}", i), "ex:value": i}]
        });

        fluree.graph("mydb:main")
            .transact()
            .insert(&txn)
            .commit()
            .await?;
    }

    // Wait for indexing to complete
    sleep(Duration::from_secs(10)).await;

    // Shutdown indexer
    indexer_handle.shutdown().await?;

    Ok(())
}

Full-Text Search

use fluree_db_api::{
    FlureeBuilder, Bm25CreateConfig, Bm25FieldConfig, Result
};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = FlureeBuilder::memory().build_memory();
    let ledger = fluree.create_ledger("mydb").await?;

    // Insert searchable data and create BM25 index
    // ...

    // Query with full-text search using JSON-LD and the f:graphSource pattern
    let search_query = json!({
        "@context": {
            "schema": "http://schema.org/",
            "f": "https://ns.flur.ee/db#"
        },
        "from": "mydb:main",
        "select": ["?product", "?score", "?name"],
        "where": [
            {
                "f:graphSource": "products-search:main",
                "f:searchText": "laptop",
                "f:searchLimit": 10,
                "f:searchResult": { "f:resultId": "?product", "f:resultScore": "?score" }
            },
            { "@id": "?product", "schema:name": "?name" }
        ],
        "orderBy": [["desc", "?score"]],
        "limit": 10
    });

    let result = fluree.query_from()
        .jsonld(&search_query)
        .execute()
        .await?;

    println!("Found {} matching products", result.row_count());

    Ok(())
}

Configuration

Builder Options

use fluree_db_api::{FlureeBuilder, ConnectionConfig, IndexConfig, Result};

#[tokio::main]
async fn main() -> Result<()> {
    let config = ConnectionConfig {
        storage_path: "./data".into(),
        index_config: IndexConfig {
            interval_ms: 5000,
            batch_size: 10,
            memory_mb: 2048,
            threads: 4,
        },
        ..Default::default()
    };

    let fluree = FlureeBuilder::with_config(config).build()?;

    Ok(())
}

Custom Storage Backend

use fluree_db_api::{
    FlureeBuilder, Storage, StorageWrite, Result
};
use async_trait::async_trait;

// Implement custom storage
struct MyStorage;

#[async_trait]
impl Storage for MyStorage {
    async fn read(&self, address: &str) -> Result<Vec<u8>> {
        // Custom implementation
        todo!()
    }
}

#[async_trait]
impl StorageWrite for MyStorage {
    async fn write(&self, address: &str, data: &[u8]) -> Result<()> {
        // Custom implementation
        todo!()
    }
}

#[tokio::main]
async fn main() -> Result<()> {
    let storage = MyStorage;
    let fluree = FlureeBuilder::custom(storage).build()?;

    Ok(())
}

If you need full control over both storage and nameservice (e.g., for proxy mode or custom backends), use build_with():

#![allow(unused)]
fn main() {
let storage = MyStorage;
let nameservice = MyNameService;

let fluree = FlureeBuilder::memory()
    .build_with(storage, nameservice);
}

build_with() respects the builder’s caching configuration — caching is on by default, or call .without_ledger_caching() before build_with() to disable it.

Error Handling

use fluree_db_api::{FlureeBuilder, ApiError, Result};

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = FlureeBuilder::memory().build_memory();

    // Create a ledger — handles duplicates gracefully
    match fluree.create_ledger("mydb").await {
        Ok(ledger) => {
            println!("Ledger created at t={}", ledger.t());
        }
        Err(ApiError::LedgerExists(ledger_id)) => {
            println!("Ledger {} already exists, loading...", ledger_id);
            let ledger = fluree.ledger("mydb:main").await?;
            println!("Loaded at t={}", ledger.t());
        }
        Err(e) => {
            eprintln!("Error: {}", e);
            return Err(e);
        }
    }

    Ok(())
}

Testing

Unit Tests

#![allow(unused)]
fn main() {
#[cfg(test)]
mod tests {
    use fluree_db_api::{FlureeBuilder, Result};
    use serde_json::json;

    #[tokio::test]
    async fn test_insert_and_query() -> Result<()> {
        // Use memory storage for tests
        let fluree = FlureeBuilder::memory().build_memory();
        let ledger = fluree.create_ledger("test").await?;

        // Insert data
        let data = json!({
            "@context": {"ex": "http://example.org/ns/"},
            "@graph": [{"@id": "ex:alice", "ex:name": "Alice"}]
        });

        fluree.graph("test:main")
            .transact()
            .insert(&data)
            .commit()
            .await?;

        // Query data
        let query = json!({
            "@context": {"ex": "http://example.org/ns/"},
            "select": ["?name"],
            "where": [{"@id": "ex:alice", "ex:name": "?name"}]
        });

        let result = fluree.graph("test:main")
            .query()
            .jsonld(&query)
            .execute()
            .await?;

        assert_eq!(result.row_count(), 1);

        Ok(())
    }
}
}

Integration Tests

#![allow(unused)]
fn main() {
// tests/integration_test.rs
use fluree_db_api::{FlureeBuilder, Result};
use serde_json::json;
use tempfile::TempDir;

#[tokio::test]
async fn test_persistence() -> Result<()> {
    let temp_dir = TempDir::new()?;
    let path = temp_dir.path().to_str().unwrap();

    // Create ledger and write data
    {
        let fluree = FlureeBuilder::file(path).build()?;
        let ledger = fluree.create_ledger("test").await?;

        let data = json!({"@context": {"ex": "http://example.org/ns/"}, "@graph": [{"@id": "ex:test"}]});
        fluree.graph("test:main")
            .transact()
            .insert(&data)
            .commit()
            .await?;
    }

    // Verify persistence by reopening
    {
        let fluree = FlureeBuilder::file(path).build()?;
        let ledger = fluree.ledger("test:main").await?;

        assert!(ledger.t() > 0);
    }

    Ok(())
}
}

Performance Tips

Batch Transactions

#![allow(unused)]
fn main() {
// Good: Batch related changes
let batch_data = json!({
    "@graph": [
        {"@id": "ex:item1", "ex:value": 1},
        {"@id": "ex:item2", "ex:value": 2},
        {"@id": "ex:item3", "ex:value": 3}
    ]
});
let result = fluree.graph("mydb:main")
    .transact()
    .insert(&batch_data)
    .commit()
    .await?;

// Bad: Individual transactions (more overhead per commit)
for i in 1..=3 {
    let txn = json!({"@graph": [{"@id": format!("ex:item{}", i), "ex:value": i}]});
    fluree.graph("mydb:main")
        .transact()
        .insert(&txn)
        .commit()
        .await?;
}
}

Use Appropriate Storage

  • Memory: Fastest, no persistence (tests, temporary data)
  • File: Good balance (single server, local development)
  • AWS: Distributed, durable (production, multi-server)

Query Optimization

#![allow(unused)]
fn main() {
// Good: Specific patterns
let query = json!({
    "select": ["?name"],
    "where": [
        {"@id": "ex:alice", "schema:name": "?name"}
    ]
});

// Bad: Broad patterns
let query = json!({
    "select": ["?s", "?p", "?o"],
    "where": [
        {"@id": "?s", "?p": "?o"}
    ]
});
}

Enable Query Tracking

use fluree_db_api::{FlureeBuilder, Result};

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = FlureeBuilder::memory().build_memory();

    // Use execute_tracked() for fuel/time/policy tracking
    let tracked = fluree.graph("mydb:main")
        .query()
        .sparql("SELECT * WHERE { ?s ?p ?o }")
        .execute_tracked()
        .await?;

    println!("Query used {} fuel", tracked.fuel().unwrap_or(0));

    Ok(())
}

Graph API Reference

The Graph API follows a lazy-handle pattern: fluree.graph(graph_ref) returns a lightweight handle, and all I/O is deferred to terminal methods.

Getting a Graph Handle

#![allow(unused)]
fn main() {
// Lazy handle to the current (head) state
let graph = fluree.graph("mydb:main");

// Lazy handle at a specific point in time
let graph = fluree.graph_at("mydb:main", TimeSpec::AtT(100));
}

Querying

#![allow(unused)]
fn main() {
// JSON-LD query (lazy — loads graph at execution time)
let result = fluree.graph("mydb:main")
    .query()
    .jsonld(&query_json)
    .execute().await?;

// SPARQL query
let result = fluree.graph("mydb:main")
    .query()
    .sparql("SELECT ?s WHERE { ?s a <ex:Person> }")
    .execute().await?;

// Formatted output (JSON-LD or SPARQL JSON based on query type)
let json = fluree.graph("mydb:main")
    .query()
    .jsonld(&query_json)
    .execute_formatted().await?;

// Tracked query (fuel/time/policy metrics)
let tracked = fluree.graph("mydb:main")
    .query()
    .sparql("SELECT * WHERE { ?s ?p ?o }")
    .execute_tracked().await?;
}

Materializing a GraphSnapshot

#![allow(unused)]
fn main() {
// Load once, query many times (avoids reloading)
let db = fluree.graph("mydb:main").load().await?;

let r1 = db.query().sparql("...").execute().await?;
let r2 = db.query().jsonld(&q).execute().await?;

// Access the underlying GraphDb
let view = db.view();
}

Transacting

#![allow(unused)]
fn main() {
// Insert and commit
let result = fluree.graph("mydb:main")
    .transact()
    .insert(&data)
    .commit().await?;

// Upsert with options. f:identity is system-controlled (signed DID,
// opts.identity, or CommitOpts::identity). f:message and f:author are
// pure user claims — supply them in the transaction body just like any
// other txn-meta property.
let data = serde_json::json!({
    "@context": {
        "ex": "http://example.org/",
        "f": "https://ns.flur.ee/db#"
    },
    "@graph": [{ "@id": "ex:alice", "ex:name": "Alice" }],
    "f:message": "admin update",
    "f:author": "did:admin"
});

let result = fluree.graph("mydb:main")
    .transact()
    .upsert(&data)
    .commit_opts(CommitOpts::default().identity("did:admin"))
    .commit().await?;

// Stage without committing (preview changes)
let staged = fluree.graph("mydb:main")
    .transact()
    .insert(&data)
    .stage().await?;

// Query staged state
let preview = staged.query()
    .jsonld(&validation_query)
    .execute().await?;
}

Commit Inspection

Decode and display the contents of a commit — assertions and retractions with IRIs resolved to compact form. Similar to git show for individual commits.

#![allow(unused)]
fn main() {
// By exact CID
let detail = fluree.graph("mydb:main")
    .commit(&commit_id)
    .execute().await?;

// By transaction number
let detail = fluree.graph("mydb:main")
    .commit_t(5)
    .execute().await?;

// By hex-digest prefix (min 6 chars, like abbreviated git hashes)
let detail = fluree.graph("mydb:main")
    .commit_prefix("3dd028")
    .execute().await?;

// With a custom @context for IRI compaction
let detail = fluree.graph("mydb:main")
    .commit_prefix("3dd028")
    .context(my_parsed_context)
    .execute().await?;

// Access the result
println!("t={}, +{} -{}", detail.t, detail.asserts, detail.retracts);
for flake in &detail.flakes {
    let op = if flake.op { "+" } else { "-" };
    println!("{} {} {} {} [{}]", op, flake.s, flake.p, flake.o, flake.dt);
}
}

The returned CommitDetail contains:

  • Metadata: id, t, time, size, previous, signer, asserts, retracts
  • context: prefix → IRI map derived from the ledger’s namespace codes
  • flakes: flat list in SPOT order, each with resolved compact IRIs

CommitDetail implements Serialize — flakes serialize as [s, p, o, dt, op] tuples (with an optional 6th metadata element for language tags, list indices, or named graphs).
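The tuple layout is easy to picture with a small stand-alone sketch. The `Flake` struct below is a hypothetical stand-in for illustration, not the crate's actual type:

```rust
// Hypothetical stand-in for a commit flake; not the crate's real type.
#[derive(Clone)]
struct Flake {
    s: String,            // subject (compact IRI)
    p: String,            // predicate (compact IRI)
    o: String,            // object (compact IRI or literal)
    dt: String,           // datatype
    op: bool,             // true = assertion, false = retraction
    meta: Option<String>, // language tag, list index, or named graph
}

// Serialize one flake into the [s, p, o, dt, op] tuple shape, with the
// optional 6th metadata element appended only when present.
fn flake_tuple(f: &Flake) -> Vec<String> {
    let mut t = vec![
        f.s.clone(),
        f.p.clone(),
        f.o.clone(),
        f.dt.clone(),
        f.op.to_string(),
    ];
    if let Some(m) = &f.meta {
        t.push(m.clone());
    }
    t
}

fn main() {
    let f = Flake {
        s: "ex:alice".into(),
        p: "schema:name".into(),
        o: "Alice".into(),
        dt: "xsd:string".into(),
        op: true,
        meta: None,
    };
    // A plain literal serializes as a 5-element tuple.
    assert_eq!(flake_tuple(&f).len(), 5);
    // A language-tagged value gains a 6th metadata element.
    let g = Flake { meta: Some("en".into()), ..f };
    assert_eq!(flake_tuple(&g).len(), 6);
}
```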

Terminal Operations

| Method | Returns | Description |
|---|---|---|
| .execute() | Result<QueryResult> | Raw query result |
| .execute_formatted() | Result<JsonValue> | Formatted JSON output (JSON-LD for .jsonld(), SPARQL JSON for .sparql()) |
| .execute_tracked() | Result<TrackedQueryResponse> | Result with fuel/time/policy tracking |
| .commit() | Result<TransactResultRef> | Stage + commit transaction |
| .stage() | Result<StagedGraph> | Stage without committing |
| .load() | Result<GraphSnapshot> | Materialize snapshot for reuse |

Format Override

#![allow(unused)]
fn main() {
use fluree_db_api::FormatterConfig;

// Force JSON-LD format for a SPARQL query
let result = fluree.graph("mydb:main")
    .query()
    .sparql("SELECT ?name WHERE { ?s <schema:name> ?name }")
    .format(FormatterConfig::jsonld())
    .execute_formatted()
    .await?;
}

Multi-Ledger Queries (Dataset)

For multi-ledger queries, combine GraphDb views into a DataSetDb:

#![allow(unused)]
fn main() {
let customers = fluree.view("customers:main").await?;
let orders = fluree.view("orders:main").await?;

let dataset = DataSetDb::new()
    .with_default(customers)
    .with_named("orders:main", orders);

let result = dataset.query(&fluree)
    .sparql(query)
    .execute().await?;
}

FROM-Driven Queries (Connection Queries)

When the query itself names its source (e.g., "from": "mydb:main"), execute it from the connection with query_from() instead of a graph handle:

#![allow(unused)]
fn main() {
let result = fluree.query_from()
    .jsonld(&query_with_from)
    .execute().await?;
}

Transaction Builder API Reference

There are two transaction builder patterns, each suited to different use cases:

stage(&handle) — Server/Application Pattern

Use stage(&handle) when building servers or applications with ledger caching enabled. The handle is borrowed and updated in-place on successful commit, ensuring concurrent readers see the update.

use fluree_db_api::{FlureeBuilder, Result};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<()> {
    // Caching is on by default (required for stage)
    let fluree = FlureeBuilder::file("./data").build()?;

    // Get a cached handle
    let handle = fluree.ledger_cached("mydb:main").await?;

    // Transaction via builder — handle updated in-place
    let data = json!({"@graph": [{"@id": "ex:test", "ex:name": "Test"}]});
    let result = fluree.stage(&handle)
        .insert(&data)
        .execute()
        .await?;

    println!("Committed at t={}", result.receipt.t);

    // Handle now reflects the new state
    let snapshot = handle.snapshot().await;
    assert_eq!(snapshot.t, result.receipt.t);

    Ok(())
}

Why use stage(&handle):

  • Concurrent safety: Multiple requests share the same handle; updates are atomic
  • No ownership dance: You don’t need to track and pass around LedgerState values
  • Server-friendly: Matches how the HTTP server handles transactions internally

stage_owned(ledger) — CLI/Script/Test Pattern

Use stage_owned(ledger) when you manage your own LedgerState directly. This is typical for CLI tools, scripts, and tests where you don’t need ledger caching.

use fluree_db_api::{FlureeBuilder, Result};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = FlureeBuilder::memory().build_memory();

    // You own the ledger state
    let ledger = fluree.create_ledger("mydb").await?;

    // Transaction consumes ledger, returns updated state
    let data = json!({"@graph": [{"@id": "ex:test", "ex:name": "Test"}]});
    let result = fluree.stage_owned(ledger)
        .insert(&data)
        .execute()
        .await?;

    // Get the updated ledger from the result
    let ledger = result.ledger;
    println!("Now at t={}", ledger.t());

    Ok(())
}

Why use stage_owned(ledger):

  • Simple ownership: Good for linear workflows (load → transact → done)
  • No caching required: Works even with without_ledger_caching()
  • Test-friendly: Each test manages its own state

Choosing Between Them

| Use Case | Pattern | Why |
|---|---|---|
| HTTP server | stage(&handle) | Shared handles, atomic updates |
| Long-running app | stage(&handle) | Concurrent access to same ledger |
| CLI tool | stage_owned(ledger) | Simple, no caching needed |
| Integration test | stage_owned(ledger) | Isolated state per test |
| Script/batch job | stage_owned(ledger) | Linear workflow |

Builder Methods (Both Patterns)

Both stage(&handle) and stage_owned(ledger) return a builder with identical methods:

#![allow(unused)]
fn main() {
let result = fluree.stage(&handle)  // or stage_owned(ledger)
    .insert(&data)                   // or .upsert(&data), .update(&data)
    .commit_opts(CommitOpts::default().identity("did:admin"))
    .execute()
    .await?;
// (Include `f:message` / `f:author` directly in `data` for user-claim provenance.)
}
| Method | Description |
|---|---|
| .insert(&json) | Insert JSON-LD data |
| .upsert(&json) | Upsert JSON-LD data |
| .update(&json) | Update with WHERE/DELETE/INSERT |
| .insert_turtle(&ttl) | Insert Turtle data |
| .upsert_turtle(&ttl) | Upsert Turtle data |
| .txn_opts(opts) | Set transaction options (branch, context) |
| .commit_opts(opts) | Set commit options (identity, raw_txn) |
| .policy(ctx) | Set policy enforcement |
| .execute() | Stage + commit |
| .stage() | Stage without committing (returns Staged) |
| .validate() | Check configuration without executing |

Graph API Transactions

The Graph API (fluree.graph(graph_ref).transact()) is built on top of stage(&handle) internally:

#![allow(unused)]
fn main() {
// Graph API (convenient, uses caching internally)
let result = fluree.graph("mydb:main")
    .transact()
    .insert(&data)
    .commit()
    .await?;

// Equivalent to:
let handle = fluree.ledger_cached("mydb:main").await?;
let result = fluree.stage(&handle)
    .insert(&data)
    .execute()
    .await?;
}

Ledger Info API

Get comprehensive metadata about a ledger using the ledger_info() builder:

use fluree_db_api::{FlureeBuilder, Result};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = FlureeBuilder::file("./data").build()?;

    // Get ledger info with optional context for IRI compaction
    let context = json!({
        "schema": "http://schema.org/",
        "ex": "http://example.org/ns/"
    });

    let info = fluree
        .ledger_info("mydb:main")
        .with_context(&context)
        // Optional: include datatype breakdowns under stats.properties[*]
        // .with_property_datatypes(true)
        // Optional: make property datatype details novelty-aware (real-time)
        // .with_realtime_property_details(true)
        .execute()
        .await?;

    // Access metadata sections
    println!("Commit: {}", info["commit"]);
    println!("Nameservice: {}", info["nameservice"]);
    println!("Namespace codes: {}", info["namespace-codes"]);
    println!("Stats: {}", info["stats"]);
    println!("Index: {}", info["index"]);

    Ok(())
}

Ledger Info Response

The response includes:

| Section | Description |
|---|---|
| commit | Commit info in JSON-LD format |
| nameservice | NsRecord in JSON-LD format |
| namespace-codes | Inverted mapping (prefix → code) for IRI expansion |
| stats | Flake counts, size, property/class statistics with selectivity |
| index | Index metadata (t, ContentId, index ID) |

Stats freshness (real-time vs indexed)

The stats section is assembled from layered runtime statistics:

  • Default ledger_info() uses the full novelty-aware path, including lookup-backed class/ref enrichment.
  • with_realtime_property_details(false) downgrades to the lighter fast novelty-aware merge (Indexed + novelty deltas, no extra lookups).
  • HLL / NDV fields remain index-derived, so they are omitted by default and only included via with_property_estimates(true).

That means the payload still mixes real-time values (indexed + novelty deltas) with values that are only available as-of the last index.

  • Real-time (includes novelty):

    • stats.flakes, stats.size
    • stats.properties[*].count (but not NDV)
    • stats.properties[*].datatypes by default
    • stats.classes[*].count
    • stats.classes[*].property-list and stats.classes[*].properties (property presence)
    • stats.classes[*].properties[*].refs by default
  • As-of last index:

    • stats.indexed (the index (t))
    • stats.properties[*].ndv-values, stats.properties[*].ndv-subjects when explicitly included via with_property_estimates(true)
    • Any selectivity derived from NDV values
    • stats.classes[*].properties[*].refs only when callers explicitly disable full detail with with_realtime_property_details(false)

Nameservice Query API

Query metadata about all ledgers and graph sources using the nameservice_query() builder:

use fluree_db_api::{FlureeBuilder, Result};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<()> {
    let fluree = FlureeBuilder::file("./data").build()?;

    // Find all ledgers on main branch
    let query = json!({
        "@context": {"f": "https://ns.flur.ee/db#"},
        "select": ["?ledger", "?t"],
        "where": [{"@id": "?ns", "@type": "f:LedgerSource", "f:ledger": "?ledger", "f:branch": "main", "f:t": "?t"}],
        "orderBy": [{"var": "?t", "desc": true}]
    });

    let results = fluree.nameservice_query()
        .jsonld(&query)
        .execute_formatted()
        .await?;

    println!("Ledgers: {}", serde_json::to_string_pretty(&results)?);

    // SPARQL query
    let results = fluree.nameservice_query()
        .sparql("PREFIX f: <https://ns.flur.ee/db#>
                 SELECT ?ledger ?t WHERE { ?ns a f:LedgerSource ; f:ledger ?ledger ; f:t ?t }")
        .execute_formatted()
        .await?;

    println!("SPARQL results: {}", serde_json::to_string_pretty(&results)?);

    // Convenience method (equivalent to builder with defaults)
    let results = fluree.query_nameservice(&query).await?;

    Ok(())
}

Available Properties

Ledger Records (@type: "f:LedgerSource"):

| Property | Description |
|---|---|
| f:ledger | Ledger name (without branch suffix) |
| f:branch | Branch name |
| f:t | Current transaction number |
| f:status | Status: “ready” or “retracted” |
| f:ledgerCommit | Reference to latest commit ContentId |
| f:ledgerIndex | Index info with @id and f:t |

Graph Source Records (@type: "f:GraphSourceDatabase"):

| Property | Description |
|---|---|
| f:name | Graph source name |
| f:branch | Branch name |
| f:config | Configuration JSON |
| f:dependencies | Source ledger dependencies |
| f:indexAddress | Index ContentId |
| f:indexT | Index transaction number |

Builder Methods

| Method | Description |
|---|---|
| .jsonld(&query) | Set JSON-LD query input |
| .sparql(query) | Set SPARQL query input |
| .format(config) | Override output format |
| .execute_formatted() | Execute and return formatted JSON |
| .execute() | Execute with default formatting |
| .validate() | Validate without executing |

Example Queries

#![allow(unused)]
fn main() {
// Find ledgers with t > 100
let query = json!({
    "@context": {"f": "https://ns.flur.ee/db#"},
    "select": ["?ledger", "?t"],
    "where": [{"@id": "?ns", "f:ledger": "?ledger", "f:t": "?t"}],
    "filter": ["(> ?t 100)"]
});

// Find all BM25 graph sources
let query = json!({
    "@context": {"f": "https://ns.flur.ee/db#"},
    "select": ["?name", "?deps"],
    "where": [{"@id": "?gs", "@type": "f:Bm25Index", "f:name": "?name", "f:dependencies": "?deps"}]
});
}

Examples

See complete examples in fluree-db-api/examples/:

  • benchmark_aj_query_1.rs - Basic query patterns
  • benchmark_aj_query_2.rs - Complex queries
  • benchmark_aj_query_3.rs - Aggregations
  • benchmark_aj_query_4.rs - Time travel queries

Run examples:

cargo run --example benchmark_aj_query_1 --release

API Reference

For detailed API documentation, see:

cargo doc --open -p fluree-db-api

Concepts

Fluree is a graph database that stores and queries data using RDF (Resource Description Framework) semantics. This section explains the core concepts that make Fluree unique and powerful, with special emphasis on the features that differentiate Fluree from other graph databases.

These concepts build on each other. If you’re new to Fluree, read them in this order:

Foundations (read these first):

  1. IRIs, Namespaces, and JSON-LD @context — How Fluree identifies everything
  2. Datatypes and Typed Values — Fluree’s type system
  3. Ledgers and the Nameservice — The core unit of data storage

Core capabilities (read next):

  1. Time Travel — Query any point in history
  2. Branching — Git-like branch, merge, and rebase for your data
  3. Datasets and Named Graphs — Partition and query across graphs

Differentiating features (read as needed):

  1. Graph Sources — Integrated search and external data
  2. Policy Enforcement — Fine-grained access control
  3. Verifiable Data — Cryptographic signatures and trust
  4. Reasoning and Inference — Derive facts from ontology rules

If you’re coming from a SQL/relational background, start with Fluree for SQL Developers before diving into the concepts above.

Core Concepts

IRIs, Namespaces, and JSON-LD @context

Understand how Fluree uses Internationalized Resource Identifiers (IRIs) for all data identifiers, how namespaces provide convenient shorthand notation, and how JSON-LD @context enables compact, readable data exchange.

Datatypes and Typed Values

Explore Fluree’s type system, including support for XSD datatypes (strings, numbers, dates, booleans), RDF datatypes, and how all literal values are strongly typed.

Ledgers and the Nameservice

Learn about ledgers (Fluree’s equivalent of databases), how they’re organized with aliases like mydb:main, and how the nameservice provides discovery and metadata management across distributed deployments.

Time Travel

Differentiator: Discover Fluree’s temporal database capabilities, including transaction-time versioning, historical queries, and the ability to query data “as of” any previous transaction. Every change is preserved, enabling complete audit trails and historical analysis.

Datasets and Named Graphs

Learn about SPARQL datasets, named graphs, and how Fluree supports multi-graph queries across different data sources and time periods.

Graph Sources

Differentiator: Fluree’s graph source system enables seamless integration of specialized indexes and external data sources. Built-in BM25 full-text search, vector similarity search (ANN), Apache Iceberg integration, and R2RML relational mappings extend Fluree’s query capabilities beyond traditional graph queries.

Policy Enforcement

Differentiator: Fluree’s policy system provides fine-grained, data-level access control. Policies are enforced at query time, ensuring users only see data they’re authorized to access. This enables secure multi-tenant deployments and compliance with data privacy regulations.

Verifiable Data

Differentiator: Fluree supports cryptographically signed transactions using JWS (JSON Web Signatures) and Verifiable Credentials. Every transaction can be cryptographically verified, providing tamper-proof audit trails and enabling trustless data exchange.

Reasoning and Inference

Fluree’s built-in reasoning engine derives new facts from ontology declarations (RDFS, OWL) and user-defined Datalog rules. Query for a superclass and get all subclass instances automatically.

Architecture Overview

Fluree combines several architectural concepts:

  • Triple Store: All data is stored as RDF triples (subject-predicate-object)
  • Temporal Database: Every transaction is timestamped, enabling complete historical access
  • Multi-Graph Support: Data can be partitioned across named graphs
  • JSON-LD Integration: Native support for JSON-LD with full IRI expansion/compaction
  • SPARQL & JSON-LD Query: Support for both SPARQL and Fluree’s native JSON-LD Query language

Key Differentiators

What makes Fluree unique:

  1. Built-in Full-Text Search: BM25 indexing is integrated directly into the database, not a separate system
  2. Vector Similarity Search: Native support for approximate nearest neighbor (ANN) queries via embedded HNSW indexes or remote search service
  3. Apache Iceberg Integration: Query data lake formats directly as graph sources
  4. Complete Time Travel: Every transaction is preserved with full historical query capabilities
  5. Data-Level Policy Enforcement: Fine-grained access control enforced at query time, not application level
  6. Cryptographically Verifiable: Transactions can be signed and verified using industry-standard formats (JWS/VC)

These concepts work together to provide a powerful, standards-compliant graph database with temporal capabilities, integrated search, and enterprise-grade security features.

Ledgers and the Nameservice

Ledgers are Fluree’s fundamental unit of data organization—similar to databases in traditional RDBMS systems. The nameservice is the metadata registry that enables ledger discovery, coordination, and management across distributed deployments.

Ledgers

A ledger in Fluree is an independent, versioned graph database containing:

  • A complete graph of RDF triples
  • Complete transaction history with temporal versioning
  • Independent indexing and storage
  • Configurable permissions and policies
  • Support for multiple branches

Ledger IDs

Ledgers are identified by ledger IDs with the format ledger-name:branch.

A ledger ID serves as both a human-readable identifier and the canonical lookup key used across APIs, CLI, and caching.

Examples:

  • mydb:main - Primary branch of the “mydb” ledger
  • customers:dev - Development branch of the “customers” ledger
  • inventory:prod - Production branch of the “inventory” ledger
  • tenant/app:feature-x - Feature branch with hierarchical naming

Branch Semantics:

  • The :branch suffix allows multiple isolated versions of the same logical ledger to coexist
  • The default branch name is main when not specified (e.g., mydb is equivalent to mydb:main)
  • Branches are independent—changes in one branch don’t affect others
  • Branch names can include slashes for hierarchical organization
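The default-branch resolution described above can be sketched as a small helper (hypothetical, not the crate's actual parser):

```rust
// Hypothetical sketch: split a ledger ID into (name, branch), defaulting
// the branch to "main" when no ":branch" suffix is present.
fn parse_ledger_id(id: &str) -> (String, String) {
    match id.rsplit_once(':') {
        Some((name, branch)) => (name.to_string(), branch.to_string()),
        None => (id.to_string(), "main".to_string()),
    }
}

fn main() {
    // "mydb" is equivalent to "mydb:main".
    assert_eq!(parse_ledger_id("mydb"), ("mydb".to_string(), "main".to_string()));
    assert_eq!(
        parse_ledger_id("customers:dev"),
        ("customers".to_string(), "dev".to_string())
    );
    // Hierarchical names with slashes parse the same way.
    assert_eq!(
        parse_ledger_id("tenant/app:feature-x"),
        ("tenant/app".to_string(), "feature-x".to_string())
    );
}
```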

Ledger Lifecycle

Ledgers are created implicitly through the first transaction and persist until explicitly retracted. Each ledger maintains:

  • Transaction History: Every change is recorded as a transaction with a monotonically increasing transaction number (t)
  • Current State: The latest indexed state of all data
  • Novelty Layer: Uncommitted transactions since the last index
  • Metadata: Creation time, latest commit, indexing status

Creation Flow:

  1. First transaction to a ledger ID creates the ledger automatically
  2. Transaction is committed and assigned a transaction time (t)
  3. Commit ID is published to the nameservice
  4. Background indexing process creates queryable indexes
  5. Index ID is published to the nameservice when complete

Retraction:

Ledgers can be marked as retracted (soft delete), which:

  • Marks the ledger as inactive in the nameservice
  • Preserves all historical data
  • Prevents new transactions (but allows historical queries)
  • Can be reversed if needed

The Nameservice

The nameservice is Fluree’s metadata registry that enables ledger discovery and coordination. It acts as a directory service, tracking where ledger data is stored and what state each ledger is in.

Purpose and Role

The nameservice provides:

  • Discovery: Find ledgers by ledger ID across distributed deployments
  • Coordination: Track commit and index state for consistency
  • Metadata Management: Store ledger configuration and status
  • Multi-Process Support: Enable coordination across multiple Fluree instances

What the Nameservice Stores

For each ledger, the nameservice maintains a nameservice record (NsRecord) containing:

Core Identifiers

  • id: Canonical ledger ID with branch (e.g., "mydb:main")
  • name: Ledger name without branch suffix (e.g., "mydb")
  • branch: Branch name (e.g., "main")

Commit State

  • commit_id: ContentId (CIDv1) of the latest commit
  • commit_t: Transaction time of the latest commit

The commit represents the most recent transaction that has been persisted. Commits are published immediately after each successful transaction. The commit_id is a content-addressed identifier derived from the commit’s bytes — it is storage-agnostic and does not depend on where the commit is physically stored.
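The content-addressing property can be illustrated with a toy hasher. Here std's `DefaultHasher` stands in for the real CIDv1 derivation, which it is not:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Illustration only: derive an identifier purely from the commit bytes.
// Real commit IDs are CIDv1 content identifiers, not this hash.
fn content_id(bytes: &[u8]) -> String {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    format!("id-{:016x}", h.finish())
}

fn main() {
    let commit = b"t=1 assert ex:alice ex:name \"Alice\"";
    // The same bytes always yield the same ID, wherever they are stored.
    assert_eq!(content_id(commit), content_id(commit));
    // Any change to the bytes changes the ID, making tampering detectable.
    assert_ne!(
        content_id(commit),
        content_id(b"t=1 assert ex:alice ex:name \"Mallory\"")
    );
}
```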

Index State

  • index_id: ContentId (CIDv1) of the latest index root
  • index_t: Transaction time of the latest index

The index represents a queryable snapshot of the ledger state. Indexes are created by background processes and may lag behind commits. Like commits, the index_id is a content-addressed identifier.

Branch Metadata

  • source_branch: For branches created via create_branch, records the name of the source branch (e.g., "main"). None for the initial branch.

The divergence point (common ancestor) between a branch and its source is computed on demand by walking the commit chains rather than being stored. This avoids stale metadata and supports merge scenarios where the relationship between branches changes over time.
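The on-demand ancestor walk can be sketched over plain string IDs (a simplification: real chains are walked via each commit's previous link):

```rust
use std::collections::HashSet;

// Walk two commit chains (ordered newest-first) and return the first
// commit they share: the divergence point between branch and source.
fn common_ancestor(chain_a: &[&str], chain_b: &[&str]) -> Option<String> {
    let seen: HashSet<&str> = chain_a.iter().copied().collect();
    chain_b.iter().find(|c| seen.contains(*c)).map(|c| c.to_string())
}

fn main() {
    // main:    c3 <- c2 <- c1
    // feature: f2 <- f1 <- c2 <- c1   (branched from main at c2)
    let main_chain = ["c3", "c2", "c1"];
    let feature = ["f2", "f1", "c2", "c1"];
    assert_eq!(common_ancestor(&main_chain, &feature), Some("c2".to_string()));
    // Unrelated chains share no ancestor.
    assert_eq!(common_ancestor(&["a"], &["b"]), None);
}
```

Because the result is computed from the chains themselves, it stays correct even after merges change how the branches relate.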

Additional Metadata

  • default_context_id: ContentId of the default JSON-LD @context for the ledger
  • retracted: Whether the ledger has been marked as inactive

Commit vs Index: Understanding the Difference

This distinction is crucial for understanding Fluree’s architecture:

Commits (commit_t):

  • Created immediately after each transaction
  • Represent the transaction log (what changed)
  • Small, append-only files
  • Published synchronously
  • Always up-to-date with latest transactions

Indexes (index_t):

  • Created by background indexing processes
  • Represent queryable database snapshots (complete state)
  • Large, optimized data structures
  • Published asynchronously
  • May lag behind commits (this gap is the “novelty layer”)

Example Timeline:

t=1:  Transaction committed → commit_t=1, index_t=0
t=2:  Transaction committed → commit_t=2, index_t=0
t=3:  Transaction committed → commit_t=3, index_t=0
       [Background indexing completes] → index_t=3
t=4:  Transaction committed → commit_t=4, index_t=3
t=5:  Transaction committed → commit_t=5, index_t=3
       [Novelty layer: t=4, t=5 not yet indexed]

Queries combine the indexed state (up to index_t) with the novelty layer (transactions between index_t and commit_t) to provide real-time results.
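That merge rule can be sketched with plain numbers (illustrative only: everything up to index_t comes from the index, and transactions in (index_t, commit_t] come from novelty):

```rust
// Which transaction numbers must be served from the novelty layer?
fn novelty_range(index_t: u64, commit_t: u64) -> Vec<u64> {
    (index_t + 1..=commit_t).collect()
}

fn main() {
    // Matches the timeline above: index_t=3, commit_t=5.
    assert_eq!(novelty_range(3, 5), vec![4, 5]); // t=4, t=5 not yet indexed
    // Once background indexing catches up, the novelty layer is empty.
    assert!(novelty_range(5, 5).is_empty());
}
```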

Nameservice Operations

The nameservice supports these key operations:

Lookup

Find ledger metadata by ledger ID:

#![allow(unused)]
fn main() {
// Pseudo-code
let record = nameservice.lookup("mydb:main").await?;
// Returns: NsRecord with commit_id, index_id, timestamps, etc.
}

Publishing

Record new commits and indexes:

  • RefPublisher::compare_and_set_ref() / fast_forward_commit(): Advance the commit head with explicit CAS conflict handling
  • publish_index(ledger_id, index_id, index_t): Update index state (monotonic: only if new_t > existing_t)

Commit-head publishing is CAS-based so concurrent writers get an explicit conflict result instead of a silent no-op. Index publishing remains monotonic and only accepts updates that advance time forward.
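The two disciplines can be contrasted with a minimal in-memory sketch, using plain integers in place of the real nameservice record:

```rust
// Commit head: compare-and-set. The caller states what it believes the
// current head is; a mismatch surfaces as an explicit conflict.
fn cas_publish(head: &mut u64, expected: u64, new_t: u64) -> Result<(), String> {
    if *head != expected {
        return Err(format!("conflict: head moved to t={}", head));
    }
    *head = new_t;
    Ok(())
}

// Index: monotonic. An update is accepted only if it moves time forward;
// stale publishes are ignored rather than treated as conflicts.
fn publish_index(index_t: &mut u64, new_t: u64) -> bool {
    if new_t > *index_t {
        *index_t = new_t;
        true
    } else {
        false
    }
}

fn main() {
    let mut head = 5;
    assert!(cas_publish(&mut head, 5, 6).is_ok()); // expected head matched
    assert!(cas_publish(&mut head, 5, 7).is_err()); // another writer won the race

    let mut index_t = 3;
    assert!(publish_index(&mut index_t, 5)); // advances
    assert!(!publish_index(&mut index_t, 4)); // stale, rejected
}
```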

Branching

Create and list branches:

  • create_branch(ledger_name, new_branch, source_branch, at_commit): Create a new branch from the source. When at_commit is None, the branch starts at the source’s current HEAD; when Some((commit_id, commit_t)), the branch starts at the supplied historical commit instead (callers are expected to verify reachability from source HEAD before passing it in).
  • list_branches(ledger_name): List all non-retracted branches for a ledger

Discovery

List all available ledgers:

// Pseudo-code
let all_ledgers = nameservice.all_records().await?;
// Returns: Vec<NsRecord> for all known ledgers

Querying the Nameservice

The nameservice can be queried using standard JSON-LD query or SPARQL syntax. This enables powerful ledger discovery, filtering, and metadata analysis across all managed databases.

Rust API (Builder Pattern)

// Find all ledgers on main branch
let query = json!({
    "@context": {"f": "https://ns.flur.ee/db#"},
    "select": ["?ledger"],
    "where": [{"@id": "?ns", "f:ledger": "?ledger", "f:branch": "main"}]
});

let results = fluree.nameservice_query()
    .jsonld(&query)
    .execute_formatted()
    .await?;

// Query with SPARQL
let results = fluree.nameservice_query()
    .sparql("PREFIX f: <https://ns.flur.ee/db#>
             SELECT ?ledger ?t WHERE {
               ?ns a f:LedgerSource ;
                   f:ledger ?ledger ;
                   f:t ?t
             }")
    .execute_formatted()
    .await?;

// Convenience method (equivalent to builder with defaults)
let results = fluree.query_nameservice(&query).await?;

HTTP API

# List ledgers and graph sources from the nameservice
curl http://localhost:8090/v1/fluree/ledgers

Available Properties

Ledger Records (@type: "f:LedgerSource"):

  • f:ledger: Ledger name (without branch suffix)
  • f:branch: Branch name (e.g., “main”, “dev”)
  • f:t: Current transaction number
  • f:status: “ready” or “retracted”
  • f:ledgerCommit: Reference to latest commit ContentId
  • f:ledgerIndex: Index info object with @id (ContentId) and f:t
  • f:sourceBranch: Source branch name (e.g., "main") if this is a branched ledger
  • f:defaultContextCid: Default JSON-LD context ContentId (if set)

Graph Source Records (@type: "f:GraphSourceDatabase"):

  • f:name: Graph source name
  • f:branch: Branch name
  • f:status: “ready” or “retracted”
  • f:config: Configuration JSON
  • f:dependencies: Array of source ledger dependencies
  • f:indexId: Index ContentId
  • f:indexT: Index transaction number

Example Queries

Find all ledgers with t > 100:

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "select": ["?ledger", "?t"],
  "where": [
    {"@id": "?ns", "f:ledger": "?ledger", "f:t": "?t"}
  ],
  "filter": ["(> ?t 100)"]
}

Find ledgers by name pattern (hierarchical):

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "select": ["?ledger", "?branch"],
  "where": [
    {"@id": "?ns", "f:ledger": "?ledger", "f:branch": "?branch"}
  ],
  "filter": ["(strStarts ?ledger \"tenant1/\")"]
}

Find all BM25 graph sources:

{
  "@context": {
    "f": "https://ns.flur.ee/db#"
  },
  "select": ["?name", "?deps"],
  "where": [
    {"@id": "?gs", "@type": "f:Bm25Index", "f:name": "?name", "f:dependencies": "?deps"}
  ]
}

Retraction

Mark ledgers as inactive:

// Pseudo-code
nameservice.retract("mydb:old-branch").await?;
// Sets retracted=true, prevents new transactions

Storage Backends

The nameservice can be backed by various storage systems, each suited for different deployment scenarios:

File System (FileNameService)

  • Use Case: Single-server deployments, development, testing
  • Storage: Files in ns@v2/ directory structure
  • Format: JSON files per ledger ({ledger}/{branch}.json)
  • Characteristics: Simple, local, no external dependencies

AWS S3 (StorageNameService)

  • Use Case: Distributed deployments using S3 for both data and metadata
  • Storage: S3 objects with ETag-based compare-and-swap (CAS)
  • Characteristics: Scalable, distributed, requires AWS credentials

AWS DynamoDB (DynamoDbNameService)

  • Use Case: Distributed deployments needing low-latency metadata coordination
  • Storage: DynamoDB table with composite-key layout (one item per concern)
  • Format: Separate items for meta, head, index, config, status per ledger/graph source
  • Characteristics: Single-digit millisecond latency, per-concern write independence, conditional expressions for monotonic updates
  • See DynamoDB Nameservice Guide for setup and schema details

Memory (MemoryNameService)

  • Use Case: Testing, in-process applications
  • Storage: In-memory data structures
  • Format: No persistence
  • Characteristics: Fast, ephemeral, process-local

Graph Sources

The nameservice also tracks graph sources—specialized indexes and integrations:

  • BM25: Full-text search indexes
  • Vector: Vector similarity search
  • R2RML: Relational database mappings
  • Iceberg: Apache Iceberg table integrations

Graph sources have their own nameservice records (GraphSourceRecord) with similar metadata but different semantics. See the Graph Sources documentation for details.

Example Usage

Creating a Ledger

Ledgers are created automatically on the first transaction. Specify the ledger ID in your transaction:

POST /insert?ledger=mydb:main
Content-Type: application/json

{
  "@context": {
    "ex": "http://example.org/ns/",
    "foaf": "http://xmlns.com/foaf/0.1/"
  },
  "@graph": [
    {
      "@id": "ex:alice",
      "@type": "foaf:Person",
      "foaf:name": "Alice"
    }
  ]
}

What Happens:

  1. Transaction is processed and committed (assigned t=1)
  2. Commit is stored and its ContentId published to nameservice
  3. Nameservice record created/updated with commit_t=1
  4. Background indexing begins
  5. When indexing completes, index_t=1 is published

Querying a Ledger

Specify the ledger ID in your query:

SPARQL:

PREFIX ex: <http://example.org/ns/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?name
FROM <mydb:main>
WHERE {
  ex:alice foaf:name ?name
}

The FROM <mydb:main> clause specifies which ledger to query. The query engine:

  1. Looks up mydb:main in the nameservice
  2. Retrieves the index ContentId for efficient querying
  3. Combines indexed data with novelty layer for current results

JSON-LD Query:

{
  "@context": {
    "ex": "http://example.org/ns/",
    "foaf": "http://xmlns.com/foaf/0.1/"
  },
  "select": ["?name"],
  "from": "mydb:main",
  "where": [
    { "@id": "ex:alice", "foaf:name": "?name" }
  ]
}

Checking Ledger Status

Query the nameservice to check ledger state:

// Pseudo-code
let record = nameservice.lookup("mydb:main").await?;

if let Some(record) = record {
    println!("Latest commit: t={}", record.commit_t);
    println!("Latest index: t={}", record.index_t);

    if record.has_novelty() {
        println!("Novelty layer: {} transactions pending index",
                 record.commit_t - record.index_t);
    }

    if record.retracted {
        println!("Ledger is retracted (inactive)");
    }
}

Branching

Branches let you create isolated copies of a ledger’s state for independent development. After branching, transactions on one branch are invisible to the other.

Creating a Branch

Branches are created from a source branch (default: main). The new branch starts at the same transaction time as the source:

mydb:main (t=5)
  └── create_branch("mydb", "dev")
mydb:dev  (t=5)  # starts with same data as main at t=5

Branches can also be nested — you can branch from a branch:

mydb:main (t=5)
  └── mydb:dev (t=7)      # branched from main at t=5, then advanced
        └── mydb:feature (t=8)  # branched from dev at t=7, then advanced

Data Isolation

After branching, each branch has its own independent transaction history:

mydb:main   → t=5 (shared) → t=6: insert Bob   → t=7: insert Dave
mydb:dev    → t=5 (shared) → t=6: insert Carol

Querying main returns Alice + Bob + Dave. Querying dev returns Alice + Carol. Bob and Dave never appear on dev; Carol never appears on main.

Storage Model

Branches share storage efficiently through a BranchedContentStore — a recursive content store that reads from the branch’s own namespace first, then falls back to parent namespaces for pre-branch-point content.

  • Commits are not copied — historical commits are read from the source namespace via fallback
  • Index files are copied — protects the branch from garbage collection on the source after reindexing
  • String dictionaries are globally shared — stored in a per-ledger @shared namespace (e.g., mydb/@shared/dicts/) rather than per-branch paths, so all branches read and write to the same location without copying or fallback. The @ prefix cannot collide with branch names. See Storage Traits — Global Dictionary Storage for details.

Each branch is a fully independent LedgerState with its own snapshot, novelty layer, commit chain, storage namespace, and t sequence.
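The read-through fallback can be sketched as a recursive lookup: answer from the branch's own namespace first, then ask the parent. This is illustrative only (the real BranchedContentStore is content-addressed; the `BranchStore` type here is hypothetical):

```rust
use std::collections::HashMap;

// Sketch of branch-to-parent fallback reads. Illustrative types only.
struct BranchStore<'a> {
    namespace: HashMap<String, String>,
    parent: Option<&'a BranchStore<'a>>,
}

impl<'a> BranchStore<'a> {
    fn read(&self, key: &str) -> Option<&String> {
        self.namespace
            .get(key)
            .or_else(|| self.parent.and_then(|p| p.read(key)))
    }
}

fn main() {
    let mut main_ns = HashMap::new();
    main_ns.insert("commit-1".to_string(), "alice".to_string());
    let main = BranchStore { namespace: main_ns, parent: None };

    let mut dev_ns = HashMap::new();
    dev_ns.insert("commit-2".to_string(), "carol".to_string());
    let dev = BranchStore { namespace: dev_ns, parent: Some(&main) };

    // Pre-branch-point content is read from the parent without copying;
    // post-branch content on dev never leaks back to main.
    assert_eq!(dev.read("commit-1").map(String::as_str), Some("alice"));
    assert_eq!(dev.read("commit-2").map(String::as_str), Some("carol"));
    assert_eq!(main.read("commit-2"), None);
}
```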

Nameservice Metadata

When a branch is created, the nameservice records the source branch name on the new branch’s NsRecord (e.g., source_branch: Some("main")). The divergence point between the branch and its source is computed on demand by walking the commit chains rather than being stored as a static snapshot.

This metadata enables the system to reconstruct the BranchedContentStore tree when loading a branch. For nested branches, the ancestry chain is walked recursively via source_branch lookups.

API

Rust:

// Create a branch from main (default)
let record = fluree.create_branch("mydb", "dev", None).await?;

// Create a branch from another branch
let record = fluree.create_branch("mydb", "feature", Some("dev")).await?;

// List all branches
let branches = fluree.list_branches("mydb").await?;

HTTP:

# Create branch
curl -X POST http://localhost:8090/v1/fluree/branch \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb", "branch": "dev"}'

# List branches
curl http://localhost:8090/v1/fluree/branch/mydb

CLI:

# Create branch
fluree branch create dev --ledger mydb

# Create branch from another branch
fluree branch create feature-x --from dev --ledger mydb

# List branches
fluree branch list --ledger mydb

Dropping a Branch

Branches can be deleted with drop_branch. The main branch cannot be dropped.

Branches use reference counting (branches field on NsRecord) to track child branches. This enables safe deletion:

  • Leaf branch (no children, branches == 0): Fully dropped — storage artifacts are deleted, the NsRecord is purged, and the parent’s child count is decremented. If the parent was previously retracted and its count reaches 0, it is cascade-dropped.
  • Branch with children (branches > 0): Retracted (hidden from listings, transactions rejected) but storage is preserved so children can still read parent data via BranchedContentStore fallback. When the last child is dropped and the count reaches 0, the retracted branch is automatically cascade-purged.
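The two drop rules above can be sketched as a small state transition. Field and type names (`BranchRecord`, `DropOutcome`) mirror the description but are illustrative, not the real API:

```rust
// Sketch of reference-counted branch dropping. Illustrative names.

#[derive(Debug, PartialEq)]
struct BranchRecord {
    branches: u32,   // number of child branches
    retracted: bool,
}

#[derive(Debug, PartialEq)]
enum DropOutcome {
    Purged,    // leaf: storage and NsRecord removed
    Retracted, // children remain: hidden, storage kept for fallback reads
}

fn drop_branch(rec: &mut BranchRecord, parent: Option<&mut BranchRecord>) -> DropOutcome {
    if rec.branches > 0 {
        rec.retracted = true;
        return DropOutcome::Retracted;
    }
    // Leaf: purge, then decrement the parent's child count. A retracted
    // parent whose count reaches zero would be cascade-purged in turn.
    if let Some(p) = parent {
        p.branches -= 1;
    }
    DropOutcome::Purged
}

fn main() {
    let mut main_rec = BranchRecord { branches: 1, retracted: false };
    let mut dev = BranchRecord { branches: 1, retracted: false };
    // dev still has a child, so it is only retracted.
    assert_eq!(drop_branch(&mut dev, Some(&mut main_rec)), DropOutcome::Retracted);
    assert!(dev.retracted);

    let mut feature = BranchRecord { branches: 0, retracted: false };
    // feature is a leaf: purged, and dev's child count drops to zero.
    assert_eq!(drop_branch(&mut feature, Some(&mut dev)), DropOutcome::Purged);
    assert_eq!(dev.branches, 0);
}
```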

Rust API:

// Drop a leaf branch
let report = fluree.drop_branch("mydb", "dev").await?;

// report.deferred == false for leaf branches
// report.deferred == true for branches with children
// report.cascaded contains any ancestor branches that were cascade-dropped

HTTP API:

curl -X POST http://localhost:8090/v1/fluree/drop-branch \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb", "branch": "dev"}'

CLI:

fluree branch drop dev --ledger mydb

See POST /branch, GET /branch/{ledger-name}, and POST /drop-branch for full endpoint details.

Rebasing a Branch

After a branch diverges from its source, you can rebase it to replay its unique commits on top of the source branch’s current HEAD. This brings the branch up to date with upstream changes without merging.

Rebase detects conflicts when both the branch and source have modified the same (subject, predicate, graph) tuples. Five conflict resolution strategies are available:

  • take-both (default): Replay as-is; both values coexist (multi-cardinality)
  • abort: Fail on first conflict; no changes applied
  • take-source: Drop the branch’s conflicting flakes (source wins)
  • take-branch: Keep the branch’s flakes; retract the source’s conflicting values
  • skip: Skip the entire commit if any of its flakes conflict

If the branch has no unique commits, rebase performs a fast-forward: it simply updates the branch point to the source’s current HEAD without replaying anything.
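Conflict detection itself reduces to a set intersection over the (subject, predicate, graph) tuples each side modified since the branch point. A minimal sketch, with illustrative data (the real implementation walks commit flakes):

```rust
use std::collections::HashSet;

// Sketch: a rebase conflict is any (subject, predicate, graph) tuple
// modified on both the branch and the source. Illustrative only.
type Tuple = (&'static str, &'static str, &'static str);

fn detect_conflicts(branch: &HashSet<Tuple>, source: &HashSet<Tuple>) -> Vec<Tuple> {
    branch.intersection(source).cloned().collect()
}

fn main() {
    let branch: HashSet<Tuple> =
        [("ex:alice", "foaf:name", "default"), ("ex:bob", "foaf:name", "default")]
            .into_iter()
            .collect();
    let source: HashSet<Tuple> =
        [("ex:alice", "foaf:name", "default")].into_iter().collect();

    // Both sides touched ex:alice's name: that tuple needs a strategy.
    let conflicts = detect_conflicts(&branch, &source);
    assert_eq!(conflicts, vec![("ex:alice", "foaf:name", "default")]);
}
```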

Rust API:

use fluree_db_api::ConflictStrategy;

let report = fluree.rebase_branch("mydb", "dev", ConflictStrategy::TakeBoth).await?;
// report.replayed — number of commits successfully replayed
// report.conflicts — conflicts detected and resolved
// report.fast_forward — true if no branch commits to replay

HTTP API:

curl -X POST http://localhost:8090/v1/fluree/rebase \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb", "branch": "dev", "strategy": "take-both"}'

CLI:

fluree branch rebase dev --ledger mydb --strategy take-both

See POST /rebase for full endpoint details.

Architecture Deep Dive

Ledger State Composition

Each ledger combines two layers for query execution:

1. Indexed Database

  • What: Persisted, optimized snapshot of ledger state
  • When: Created by background indexing processes
  • Storage: Large, read-optimized data structures
  • Query Performance: Fast, efficient for historical queries
  • Update Frequency: Asynchronous, may lag behind commits

2. Novelty Overlay

  • What: In-memory representation of committed transactions not yet reflected in the index
  • When: Transactions between index_t and commit_t
  • Storage: Transaction log entries
  • Query Performance: Slower, requires transaction replay
  • Update Frequency: Real-time, always current

Query Execution Model:

Query Result = Indexed Database (up to t=index_t) 
             + Novelty Overlay (t=index_t+1 to commit_t)
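The composition rule can be sketched as a two-level lookup: the novelty overlay is consulted first (it may assert a new value or retract an indexed one), with fallback to the indexed snapshot. Illustrative names and a deliberately simplified key/value model:

```rust
use std::collections::HashMap;

// Sketch of indexed-snapshot + novelty-overlay resolution.
// `Some(None)` models a retraction recorded after index_t.
fn resolve<'a>(
    indexed: &'a HashMap<&'a str, &'a str>,
    novelty: &'a HashMap<&'a str, Option<&'a str>>,
    key: &str,
) -> Option<&'a str> {
    match novelty.get(key) {
        Some(Some(v)) => Some(v),          // asserted after index_t
        Some(None) => None,                // retracted after index_t
        None => indexed.get(key).copied(), // unchanged: use the index
    }
}

fn main() {
    let indexed: HashMap<_, _> =
        [("ex:alice/name", "Alice"), ("ex:bob/name", "Bob")].into_iter().collect();
    let novelty: HashMap<_, _> =
        [("ex:bob/name", None), ("ex:carol/name", Some("Carol"))].into_iter().collect();

    assert_eq!(resolve(&indexed, &novelty, "ex:alice/name"), Some("Alice"));
    assert_eq!(resolve(&indexed, &novelty, "ex:bob/name"), None);
    assert_eq!(resolve(&indexed, &novelty, "ex:carol/name"), Some("Carol"));
}
```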

This architecture provides:

  • Fast historical queries: Use appropriate index snapshot
  • Real-time current queries: Include latest transactions via novelty
  • Efficient background indexing: Doesn’t block new writes
  • Consistent snapshots: Each query sees a consistent state

Concurrency Control

The nameservice ensures consistency through several mechanisms:

Ref Publishing

  • Commits: RefPublisher uses compare-and-set semantics on the current head identity plus a monotonic t guard
  • Indexes: publish_index() only accepts new_index_t > existing_index_t
  • Guarantee: Writers either advance the head or receive an explicit conflict outcome

Optimistic Concurrency

  • CAS Operations: Storage-backed nameservices use compare-and-swap (ETags)
  • Conflict Handling: Retry on conflicts (expected under contention)
  • Atomic Updates: Metadata updates are atomic per ledger
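A retry loop around a conflict-prone publish typically looks like the sketch below. The closure stands in for the nameservice call; real code would sleep between attempts (the backoff delay here is computed but not slept, and all names are illustrative):

```rust
// Sketch: retry-with-backoff around a CAS publish that may conflict
// under contention. Illustrative; the real API is async.
fn publish_with_retry<F>(mut attempt: F, max_retries: u32) -> Result<u32, &'static str>
where
    F: FnMut() -> bool,
{
    let mut _delay_ms: u64 = 10;
    for tries in 0..=max_retries {
        if attempt() {
            return Ok(tries); // number of conflicts absorbed before success
        }
        _delay_ms *= 2; // exponential backoff (sleep omitted in this sketch)
    }
    Err("publish failed after retries")
}

fn main() {
    let mut calls = 0;
    let outcome = publish_with_retry(
        || {
            calls += 1;
            calls >= 3 // conflict twice, then succeed
        },
        5,
    );
    assert_eq!(outcome, Ok(2));
}
```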

Consistency Guarantees

  • Read Consistency: All readers see the same nameservice state
  • Write Consistency: Monotonic updates prevent time-travel inconsistencies
  • Eventual Consistency: In distributed deployments, updates propagate eventually

Distributed Coordination

The nameservice enables coordination across distributed deployments:

Multi-Process Coordination

  • Shared State: Nameservice provides shared view of ledger state
  • Process Discovery: Processes can discover ledgers created by other processes
  • State Synchronization: Commit/index state visible to all processes

Geographic Distribution

  • Storage Backends: S3/DynamoDB enable cross-region coordination
  • Replication: Storage backends handle replication
  • Consistency: Eventual consistency with monotonic guarantees

Scalability Patterns

  • Horizontal Scaling: Multiple Fluree instances can share nameservice
  • Load Distribution: Queries can be distributed across instances
  • Storage Distribution: Ledger data can be stored across multiple backends

Nameservice Record Lifecycle

Understanding how records evolve:

1. Initialization
   - publish_ledger_init("mydb:main")
   - Creates record with commit_t=0, index_t=0

2. First Transaction
   - Transaction committed at t=1
   - Commit head advanced via `RefPublisher` CAS to `(commit_cid_1, 1)`
   - Record: commit_t=1, index_t=0

3. Indexing Completes
   - Index created for t=1
   - publish_index("mydb:main", index_cid_1, 1)
   - Record: commit_t=1, index_t=1

4. More Transactions
   - Transactions at t=2, t=3, t=4
   - Commit head advanced via CAS for each
   - Record: commit_t=4, index_t=1 (novelty: t=2,3,4)

5. Next Index
   - Index created for t=4
   - publish_index("mydb:main", index_cid_2, 4)
   - Record: commit_t=4, index_t=4 (no novelty)
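The lifecycle above, together with the monotonic guards the nameservice enforces, can be sketched as a tiny state machine. Names are illustrative, not the real NsRecord API (and the `t <= commit_t` guard on index publishing is an assumption for this sketch):

```rust
// Sketch of the nameservice record lifecycle with monotonic guards.
struct NsRecord {
    commit_t: u64,
    index_t: u64,
}

impl NsRecord {
    fn init() -> Self {
        NsRecord { commit_t: 0, index_t: 0 }
    }
    fn publish_commit(&mut self, t: u64) -> bool {
        // In the real system this is a CAS on the head; here just the t guard.
        if t > self.commit_t { self.commit_t = t; true } else { false }
    }
    fn publish_index(&mut self, t: u64) -> bool {
        // Monotonic; assumed here that an index never runs ahead of commits.
        if t > self.index_t && t <= self.commit_t { self.index_t = t; true } else { false }
    }
    fn novelty(&self) -> u64 {
        self.commit_t - self.index_t
    }
}

fn main() {
    let mut rec = NsRecord::init();
    assert!(rec.publish_commit(1));                    // step 2: first transaction
    assert!(rec.publish_index(1));                     // step 3: indexing completes
    for t in 2..=4 { assert!(rec.publish_commit(t)); } // step 4: more transactions
    assert_eq!(rec.novelty(), 3);                      // t=2,3,4 pending
    assert!(rec.publish_index(4));                     // step 5: next index
    assert_eq!(rec.novelty(), 0);
    assert!(!rec.publish_index(3));                    // stale publish rejected
}
```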

Best Practices

Ledger Naming

  1. Use Descriptive Names: Choose names that clearly indicate purpose

    • Good: customers:main, inventory:prod, analytics:warehouse
    • Bad: db1:main, test:main, data:main
  2. Hierarchical Organization: Use slashes for logical grouping

    • Good: tenant/app:main, tenant/app:dev
    • Good: department/project:branch
  3. Branch Naming Conventions: Establish consistent branch naming

    • Good: feature/authentication, bugfix/login-error
    • Good: release/v1.2.0, hotfix/security-patch

Nameservice Configuration

  1. Choose Appropriate Backend: Match backend to deployment needs

    • Development: File system
    • Single server: File system
    • Distributed/Cloud: S3/DynamoDB
  2. Monitor Novelty Layer: Track gap between commits and indexes

    • Large gaps indicate indexing lag
    • May need to tune indexing frequency or resources
  3. Handle Retraction Carefully: Retracted ledgers preserve history

    • Use for soft deletes, not hard deletes
    • Historical queries still work on retracted ledgers

Performance Considerations

  1. Index Frequency: Balance indexing frequency with query needs

    • More frequent indexing: Better query performance, more storage
    • Less frequent indexing: Lower overhead, larger novelty layer
  2. Query Patterns: Understand your query patterns

    • Historical queries: Benefit from frequent indexing
    • Current-only queries: Can tolerate larger novelty layer
  3. Storage Planning: Plan for index storage growth

    • Each index is a complete snapshot
    • Historical indexes accumulate over time
    • Consider retention policies for old indexes

Operational Guidelines

  1. Monitor Nameservice Health: Track nameservice operations

    • Lookup latency
    • Publish success rates
    • Storage backend health
  2. Backup Strategy: Include nameservice in backup plans

    • File-based: Backup ns@v2/ directory
    • Storage-based: Use backend backup mechanisms
  3. Error Handling: Handle nameservice errors gracefully

    • Lookup failures: May indicate ledger doesn’t exist
    • Publish failures: May indicate contention (retry)
    • Storage errors: May indicate backend issues

Troubleshooting

Ledger Not Found

Symptom: Query fails with “ledger not found”

Possible Causes:

  • Ledger ID misspelled
  • Ledger not yet created (no transactions yet)
  • Ledger retracted
  • Nameservice backend misconfigured

Solutions:

  • Verify ledger ID spelling and format
  • Check if ledger exists: nameservice.lookup(ledger_id)
  • Verify nameservice backend configuration
  • Check ledger status (retracted?)

Stale Query Results

Symptom: Queries don’t see latest transactions

Possible Causes:

  • Novelty layer not being applied
  • Index lagging significantly behind commits
  • Query caching issues

Solutions:

  • Check commit_t vs index_t gap
  • Verify indexing process is running
  • Check query execution logs
  • Consider forcing index update

Nameservice Contention

Symptom: Publish operations failing with conflicts

Possible Causes:

  • Multiple processes updating same ledger
  • High transaction rate
  • Storage backend throttling

Solutions:

  • Implement retry logic with backoff
  • Reduce transaction rate if possible
  • Scale storage backend (if S3/DynamoDB)
  • Check for process coordination issues

This foundation of ledgers and the nameservice enables Fluree’s distributed, temporal graph database capabilities, providing the coordination layer needed for scalable, consistent data management.

Differentiator: Fluree’s nameservice architecture enables true distributed deployments with coordination across multiple processes and machines, unlike single-instance databases. The separation of commits and indexes, combined with the novelty layer, enables real-time queries while maintaining efficient background indexing—a unique architectural advantage.

Graph Sources

Differentiator: Graph sources are one of Fluree’s most powerful features, enabling seamless integration of specialized indexes and external data sources directly into graph queries. Unlike traditional databases that require separate systems for full-text search, vector similarity, or data lake access, Fluree makes these capabilities first-class citizens in the query language.

What Are Graph Sources?

A graph source is anything you can address by a graph name/IRI in Fluree query execution. Graph sources may be backed by:

  • Ledger graphs (default graph and named graphs stored as RDF triples)
  • Index graph sources (BM25 and vector/HNSW indexes)
  • Mapped graph sources (R2RML and Iceberg-backed mappings)

Key Characteristics

  • Query integration: Graph sources can be queried using the same SPARQL and JSON-LD Query interfaces
  • Transparent access: Applications don’t need to know whether data comes from a ledger or from a specialized graph source
  • Specialization: Each graph source type is optimized for specific query patterns
  • Time travel (type-specific): Some graph sources support time-travel queries, but support is not uniform across all types. Time-travel is implemented by each graph source type (not by the nameservice).

Graph Source Types

BM25 Full-Text Search

Differentiator: Fluree includes built-in BM25 full-text search indexing, eliminating the need for separate search systems like Elasticsearch.

Use Cases:

  • Product search with relevance ranking
  • Document search with keyword matching
  • Content discovery with fuzzy matching

Example:

{
  "@context": {
    "f": "https://ns.flur.ee/db#"
  },
  "from": "products:main",
  "select": ["?product", "?score"],
  "where": [
    {
      "f:graphSource": "products-search:main",
      "f:searchText": "laptop",
      "f:searchLimit": 10,
      "f:searchResult": { "f:resultId": "?product", "f:resultScore": "?score" }
    }
  ],
  "orderBy": [["desc", "?score"]],
  "limit": 10
}

Key Features:

  • Relevance scoring (BM25 algorithm)
  • Configurable parameters (k1, b)
  • Language-aware search
  • Optional time-travel support (BM25-owned manifest; see “Time Travel” below)

See the BM25 documentation for details.

Vector Similarity Search (ANN)

Differentiator: Native support for approximate nearest neighbor (ANN) queries via embedded HNSW indexes, enabling semantic search and similarity queries. Can run embedded (in-process) or via a dedicated remote search service.

Use Cases:

  • Semantic search (find similar documents)
  • Recommendation systems
  • Image similarity search
  • Embedding-based queries

Key Features:

  • Approximate nearest neighbor search (HNSW algorithm)
  • Configurable distance metrics (cosine, euclidean, dot product)
  • Embedded indexes (no external service required) or remote mode via fluree-search-httpd
  • Support for high-dimensional vectors
  • Snapshot-based persistence with watermarks (head-only in v1; time-travel not supported)

See the Vector Search documentation for details.

Apache Iceberg Integration

Differentiator: Query Apache Iceberg tables and Parquet files directly as graph sources, enabling seamless integration with data lake architectures.

Use Cases:

  • Query data lake formats without ETL
  • Combine graph data with tabular data
  • Analytics queries over large datasets
  • Integration with existing data pipelines

Example:

# Query Iceberg table as graph source
SELECT ?customer ?order ?amount
FROM <iceberg:sales:main>
WHERE {
  ?order ex:customer ?customer .
  ?order ex:amount ?amount .
  FILTER(?amount > 1000)
}

Key Features:

  • Direct querying of Iceberg tables
  • Parquet file support
  • R2RML mapping for tabular data (Iceberg-backed)
  • Time-travel via Iceberg snapshots
  • Direct S3 mode: bypass REST catalog servers for iceberg-rust / self-managed tables — reads version-hint.text for automatic version discovery

See the Iceberg documentation for details.

R2RML Relational Mapping

Differentiator: Map relational databases to RDF using R2RML (the W3C RDB to RDF Mapping Language), enabling graph queries over SQL databases.

Use Cases:

  • Adopt graph queries alongside SQL data sources
  • Query SQL databases using SPARQL
  • Integrate existing systems
  • Unified query interface across data sources

Example:

# Query relational database via R2RML mapping
SELECT ?customer ?order
FROM <r2rml:orders:main>
WHERE {
  ?customer ex:hasOrder ?order .
  ?order ex:status "pending" .
}

Key Features:

  • R2RML standard compliance
  • Automatic RDF mapping from SQL schemas
  • Read-only access to source databases
  • Support for complex joins and transformations

See the R2RML documentation for details.

Graph Source Lifecycle

Creation

Graph sources are created through administrative operations, specifying:

  • Type: BM25, Vector, Iceberg, or R2RML
  • Configuration: Type-specific settings
  • Dependencies: Source ledgers or data sources
  • Branch: Graph sources support branching like ledgers

Example BM25 Graph Source Creation:

{
  "@type": "f:Bm25Index",
  "f:name": "products-search",
  "f:branch": "main",
  "f:sourceLedger": "products:main",
  "f:config": {
    "k1": 1.2,
    "b": 0.75,
    "fields": ["name", "description"]
  }
}

Indexing

Graph sources maintain their own indexes:

  • BM25: Full-text indexes are built from source ledger data
  • Vector: Embeddings stored in HNSW indexes (embedded or remote)
  • Iceberg: Metadata is cached for efficient querying
  • R2RML: Mapping rules are applied to generate RDF

Querying

Graph sources are queried like regular ledgers:

# Query any graph source
SELECT ?result
FROM <graph-source-name:branch>
WHERE {
  # Query patterns specific to graph source type
}

Time Travel

Some graph sources support historical queries using the @t: syntax in the ledger reference, but the behavior is graph-source-type specific:

{
  "@context": { "f": "https://ns.flur.ee/db#" },
  "from": "products:main@t:1000",
  "select": ["?product"],
  "where": [
    {
      "f:graphSource": "products-search:main",
      "f:searchText": "laptop",
      "f:searchLimit": 20,
      "f:searchResult": { "f:resultId": "?product" }
    }
  ]
}

BM25

BM25 can support time travel by maintaining a BM25-owned manifest in storage that maps transaction watermarks (t) to index snapshot addresses. The nameservice stores only a head pointer (an opaque address to the latest BM25 manifest/root) and does not store snapshot history.
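The manifest lookup amounts to a floor search: serve the query from the newest snapshot whose watermark does not exceed the requested t. A minimal sketch with illustrative snapshot addresses:

```rust
use std::collections::BTreeMap;

// Sketch: a BM25-owned manifest mapping watermark t -> snapshot address.
// A time-travel query at t uses the newest snapshot with watermark <= t.
fn snapshot_for<'a>(manifest: &'a BTreeMap<u64, &'a str>, t: u64) -> Option<&'a str> {
    manifest.range(..=t).next_back().map(|(_, addr)| *addr)
}

fn main() {
    let manifest: BTreeMap<u64, &str> =
        [(100, "snap-a"), (500, "snap-b"), (1200, "snap-c")].into_iter().collect();

    assert_eq!(snapshot_for(&manifest, 1000), Some("snap-b"));
    assert_eq!(snapshot_for(&manifest, 1200), Some("snap-c"));
    assert_eq!(snapshot_for(&manifest, 50), None); // before the first snapshot
}
```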

Vector

Vector search is head-only in v1. If a query requests an @t: (or otherwise requests a historical view), vector search rejects the request with a clear “time-travel not supported” error.

Iceberg

Iceberg time travel (when used) is handled by Iceberg’s own snapshot/metadata model, not by nameservice-managed snapshot history.

Graph Source Architecture

Nameservice Integration

Graph sources are tracked in the nameservice alongside ledgers:

  • Discovery: List all graph sources via nameservice
  • Metadata: Configuration and status stored in nameservice
  • Coordination: Index state tracked separately from source ledgers

Important: for graph sources, the nameservice stores only configuration and a head pointer (as a ContentId) to the graph source’s latest index root/manifest. Snapshot history (if any) lives in graph-source-owned manifests in the content store.

Query Execution

When querying a graph source:

  1. Resolution: Query engine resolves graph source from nameservice
  2. Type Detection: Determines graph source type (BM25, Vector, etc.)
  3. Specialized Execution: Routes to type-specific query handler
  4. Result Integration: Results integrated with regular graph queries

Performance Characteristics

Each graph source type has different performance characteristics:

  • BM25: Fast keyword search, relevance scoring
  • Vector: Approximate similarity search, configurable accuracy/speed tradeoff
  • Iceberg: Columnar storage, efficient for analytical queries
  • R2RML: Depends on source database performance

Use Cases

Combine full-text search, vector similarity, and graph queries:

{
  "@context": {
    "ex": "http://example.org/",
    "f": "https://ns.flur.ee/db#"
  },
  "from": "products:main",
  "select": ["?product", "?textScore", "?vectorScore"],
  "values": [
    ["?queryVec"],
    [{"@value": [0.1, 0.2, 0.3], "@type": "https://ns.flur.ee/db#embeddingVector"}]
  ],
  "where": [
    { "@id": "?product", "ex:category": "electronics" },
    {
      "f:graphSource": "products-search:main",
      "f:searchText": "wireless",
      "f:searchLimit": 20,
      "f:searchResult": { "f:resultId": "?product", "f:resultScore": "?textScore" }
    },
    {
      "f:graphSource": "products-vector:main",
      "f:queryVector": "?queryVec",
      "f:searchLimit": 10,
      "f:searchResult": { "f:resultId": "?product", "f:resultScore": "?vectorScore" }
    }
  ],
  "orderBy": [["desc", "(?textScore + ?vectorScore)"]]
}

Vector/HNSW graph sources are currently queried via JSON-LD Query using f:* patterns (e.g. f:graphSource, f:queryVector, f:searchResult). SPARQL query syntax for HNSW vector indexes is not currently available.

Data Lake Integration

Query both graph and tabular data:

SELECT ?customer ?graphData ?lakeData
FROM <customers:main>           # Graph ledger
FROM <iceberg:sales:main>        # Iceberg graph source
WHERE {
  # Graph data
  ?customer ex:preferences ?graphData .
  
  # Data lake data
  GRAPH <iceberg:sales:main> {
    ?sale ex:customer ?customer .
    ?sale ex:total ?lakeData .
  }
}

Combine semantic and keyword search:

{
  "@context": {
    "f": "https://ns.flur.ee/db#"
  },
  "from": "documents:main",
  "select": ["?document"],
  "where": [
    {
      "f:graphSource": "documents-search:main",
      "f:searchText": "machine learning",
      "f:searchLimit": 20,
      "f:searchResult": { "f:resultId": "?document" }
    }
  ]
}

Semantic similarity via HNSW vector indexes is also queried via JSON-LD Query using f:* patterns. SPARQL syntax for BM25 and vector index search is not currently available.

Best Practices

Graph Source Design

  1. Choose Appropriate Type: Match graph source type to query patterns

    • Keyword search → BM25
    • Semantic search → Vector
    • Analytics → Iceberg
    • SQL integration → R2RML
  2. Configuration Tuning: Optimize graph source parameters

    • BM25: Tune k1 and b for relevance
    • Vector: Choose appropriate distance metric
    • Iceberg: Optimize partition strategy
  3. Dependency Management: Understand source data dependencies

    • BM25/Vector: Keep in sync with source ledger
    • Iceberg: Handle schema evolution
    • R2RML: Map schema changes

Performance Optimization

  1. Index Maintenance: Keep graph source indexes up-to-date

    • Monitor indexing lag
    • Tune indexing frequency
    • Handle large data volumes
  2. Query Planning: Optimize queries using graph sources

    • Use graph sources for appropriate query patterns
    • Combine with graph queries efficiently
    • Consider cost of graph source queries
  3. Caching: Cache frequently accessed graph source results

    • Cache query results when appropriate
    • Consider graph source snapshot caching
    • Balance freshness vs performance

Operational Considerations

  1. Monitoring: Track graph source health

    • Index build status
    • Query performance
    • Storage usage
  2. Backup: Include graph sources in backup strategy

    • BM25 indexes can be rebuilt (or restored from stored snapshots/manifests, depending on configuration)
    • Vector indexes are stored as head snapshots (time-travel not supported in v1)
    • Iceberg metadata in nameservice
  3. Scaling: Plan for graph source scaling

    • BM25: Scale with source ledger size
    • Vector: Scale with embedding count
    • Iceberg: Leverage Iceberg partitioning

Comparison with Traditional Approaches

Traditional Architecture

Application
    ├── Graph Database (Neo4j, etc.)
    ├── Search Engine (Elasticsearch)
    ├── Vector DB (Pinecone, etc.)
    └── Data Lake (Spark, Presto)

Challenges:

  • Multiple systems to manage
  • Data synchronization complexity
  • Different query languages
  • Separate authentication/authorization

Fluree Graph Source Architecture

Application
    └── Fluree
        ├── Graph Ledgers
        ├── BM25 Graph Sources (built-in)
        ├── Vector Graph Sources
        └── Iceberg Graph Sources

Benefits:

  • Single query interface (SPARQL/JSON-LD Query)
  • Unified access control (policy enforcement)
  • Consistent time-travel across all data
  • Simplified operations and deployment

Graph sources make Fluree a unified platform for graph, search, vector, and data lake queries, eliminating the complexity of managing multiple specialized systems.

IRIs, Namespaces, and JSON-LD @context

Internationalized Resource Identifiers (IRIs)

In Fluree, all data identifiers use Internationalized Resource Identifiers (IRIs) - the internationalized version of URIs. IRIs uniquely identify:

  • Subjects: Entities in your data (people, products, concepts)
  • Predicates: Relationships or properties
  • Objects: Values or other entities
  • Graphs: Named data partitions

IRI Examples

# Full IRIs
<http://example.org/person/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .
<http://example.org/person/alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .

# IRIs with Unicode characters
<http://例え.org/人物/アリス> <http://xmlns.com/foaf/0.1/name> "アリス" .

IRI Best Practices

  • Use stable domains: Choose domains you control or well-established standards
  • Hierarchical structure: Organize IRIs with meaningful paths
  • Avoid query parameters: IRIs should be clean identifiers, not URLs with parameters
  • Internationalization: IRIs support Unicode characters for global identifiers

Namespaces

Namespaces provide shorthand notation for IRIs, making data more readable and manageable. A namespace maps a prefix to a base IRI.

Defining Namespaces

{
  "@context": {
    "ex": "http://example.org/ns/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  }
}

Using Namespaced IRIs

With the above context, you can write compact IRIs:

{
  "@context": {
    "ex": "http://example.org/ns/",
    "foaf": "http://xmlns.com/foaf/0.1/"
  },
  "@graph": [
    {
      "@id": "ex:alice",
      "@type": "foaf:Person",
      "foaf:name": "Alice Smith"
    }
  ]
}

This expands to:

{
  "@graph": [
    {
      "@id": "http://example.org/ns/alice",
      "@type": "http://xmlns.com/foaf/0.1/Person",
      "http://xmlns.com/foaf/0.1/name": "Alice Smith"
    }
  ]
}

JSON-LD @context

The @context is a JSON-LD mechanism that defines how to interpret the data. In Fluree, @context serves multiple purposes:

IRI Expansion/Compaction

{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "Person": "http://xmlns.com/foaf/0.1/Person"
  },
  "@graph": [
    {
      "@id": "http://example.org/alice",
      "@type": "Person",
      "name": "Alice"
    }
  ]
}

The @context maps name → http://xmlns.com/foaf/0.1/name and Person → http://xmlns.com/foaf/0.1/Person.

Standard Prefixes

Fluree includes many standard prefixes by default:

{
  "@context": {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/"
  }
}

@context in Queries

@context is also used in query results for compact output:

{
  "@context": {
    "ex": "http://example.org/ns/",
    "foaf": "http://xmlns.com/foaf/0.1/"
  },
  "@graph": [
    {
      "@id": "ex:alice",
      "@type": "foaf:Person",
      "foaf:name": "Alice"
    }
  ]
}

IRI Resolution Rules

Fluree follows strict IRI resolution rules:

Absolute IRIs

These are used as-is:

  • http://example.org/person/alice
  • https://data.example.com/product/123

Prefixed IRIs

These expand using @context:

  • ex:alice → http://example.org/ns/alice (if ex maps to http://example.org/ns/)
  • foaf:name → http://xmlns.com/foaf/0.1/name

Relative IRIs

These are resolved relative to a base IRI:

  • alice → http://example.org/ns/alice (if the base is http://example.org/ns/)

Strict Compact-IRI Guard

JSON-LD parsing in Fluree (queries and transactions) is strict by default about compact IRIs. If you write a value that looks like a compact IRI — prefix:suffix — but the prefix is not defined in @context, Fluree rejects the request at parse time with a clear error:

Unresolved compact IRI 'ex:Person': prefix 'ex' is not defined in @context.
If this is intended as an absolute IRI, use a full form (e.g. http://...)
or add the prefix to @context.

Why strict by default

Without the guard, a missing or misspelled prefix passes through silently — ex:Person gets stored as the literal string "ex:Person" instead of being expanded to a real IRI like http://example.org/Person. This produces incorrect data and confusing query results that are very hard to diagnose later.

The guard catches the most common cause of these bugs: forgetting an @context.

What the guard accepts

  • IRIs that resolve through @context (the normal happy path).
  • Hierarchical absolute IRIs whose suffix starts with // — http://..., https://..., ftp://..., etc.
  • A small allowlist of well-known non-hierarchical schemes — urn:, did:, mailto:, tel:, data:, ipfs:, ipns:, geo:, blob:, magnet:, fluree:. Scheme names are matched case-insensitively per RFC 3986.
  • Variables (?x) and blank nodes (_:b0) bypass the guard entirely.
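For illustration, a transaction like the following passes the guard with no @context at all: urn: and did: are on the scheme allowlist, and the full http:// predicate IRIs are hierarchical (the specific identifiers here are hypothetical).

```json
{
  "@graph": [
    {
      "@id": "urn:isbn:0061120081",
      "http://schema.org/name": "To Kill a Mockingbird"
    },
    {
      "@id": "did:example:123456",
      "http://schema.org/name": "Credential Holder"
    }
  ]
}
```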

Where the guard applies

The guard runs at every position that semantically expects an IRI in JSON-LD:

  • @id, @type, predicates / property names
  • Datatype IRIs in @type of @value objects
  • Graph names and graph-crawl roots
  • Selection predicates (forward and reverse)
  • VALUES @id cells
  • @path aliases inside @context

It does not apply to:

  • SPARQL queries
  • Turtle / TriG transactions
  • Literal string values (only IRI positions)
  • Other consumers of the underlying JSON-LD expander (e.g. connection-config parsing)

Opting out per request

If you really need to accept unresolved compact-looking strings — for example, when migrating legacy data that uses bare prefix:suffix strings as opaque identifiers — set opts.strictCompactIri: false in the JSON-LD payload itself:

{
  "@context": {"ex": "http://example.org/ns/"},
  "opts": {"strictCompactIri": false},
  "@graph": [
    {"@id": "ex:alice", "ex:name": "Alice"},
    {"@id": "legacy:bob", "ex:name": "Bob"}
  ]
}

The same key works on both queries and transactions. The default is true. Keep it on unless you have a concrete reason to disable it.

For programmatic use from Rust, transactions can also set TxnOpts.strict_compact_iri directly; that takes precedence over opts.strictCompactIri in the JSON.

Blank Nodes and Anonymous Entities

Blank nodes represent entities without global identifiers:

{
  "@graph": [
    {
      "@id": "_:b1",
      "foaf:name": "Anonymous Person"
    }
  ]
}

Blank nodes are:

  • Local to a single transaction
  • Cannot be referenced across transactions
  • Useful for temporary or anonymous data

Best Practices

Namespace Organization

  1. Use stable prefixes: Don’t change prefix mappings once data is committed
  2. Standard vocabularies: Use well-known prefixes (foaf, dc, rdf, etc.)
  3. Custom domains: Use your own domain for application-specific terms
  4. Versioning: Consider versioning in namespace IRIs for evolution

IRI Design

  1. Descriptive paths: Use meaningful hierarchical paths
  2. Avoid special characters: Stick to URL-safe characters
  3. Consistent casing: Use consistent capitalization conventions
  4. Future-proofing: Design IRIs to accommodate future extensions

@context Management

  1. Shared contexts: Reuse @context definitions across transactions
  2. Minimal contexts: Only define prefixes you actually use
  3. Documentation: Document custom prefixes and their meanings
  4. Evolution: Plan for @context changes over time

Default Context

Each ledger can store a default context — a JSON object mapping prefixes to IRIs. This context is available for retrieval and can be injected into queries by compatibility surfaces (the Fluree HTTP server and CLI), but is not applied automatically by the core API (fluree-db-api).

How it’s populated

  • Bulk import: When importing Turtle data via fluree create --from, all @prefix declarations are captured and stored as the ledger’s default context, augmented with built-in prefixes (rdf, rdfs, xsd, owl, sh, geo).
  • Manual update: Use the CLI (fluree context set) or HTTP API (PUT /v1/fluree/context/{ledger...}) to set or replace the context at any time.

Core API behavior

When using fluree-db-api directly (e.g., embedding Fluree in a Rust application), queries must supply their own @context (JSON-LD) or PREFIX declarations (SPARQL). If a query omits context, IRIs are not compacted and compact IRIs without a matching prefix will produce an error.

To opt in to default context injection when using the API directly, fetch the stored context and use the with_default_context builder:

#![allow(unused)]
fn main() {
let ctx = fluree.get_default_context("mydb").await?;
let ledger = fluree.ledger("mydb").await?;
let view = GraphDb::from_ledger_state(&ledger)
    .with_default_context(ctx);
}

Or use the convenience method:

#![allow(unused)]
fn main() {
let view = fluree.db_with_default_context("mydb").await?;
}

Server and CLI behavior

The CLI automatically injects the ledger’s default context into queries that don’t provide their own. The HTTP API defaults this behavior off; pass ?default-context=true on a query request to opt in.
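For example, a ledger-scoped query can opt in with the query parameter. This is a sketch: the exact endpoint path under /v1/fluree is an assumption based on the ledger-scoped style described below, and the ex: prefix is assumed to already be in the ledger's stored default context.

```shell
# Opt in to default-context injection for a single query
curl -X POST "http://localhost:8090/v1/fluree/query/mydb:main?default-context=true" \
  -H "Content-Type: application/json" \
  -d '{"select": ["?name"], "where": [{"@id": "?p", "ex:name": "?name"}]}'
```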

When default context injection is enabled:

  1. Query-level @context (JSON-LD) or PREFIX declarations (SPARQL) — always win
  2. Ledger default context — applied only when the query provides no context of its own
  3. Built-in prefixes — rdf, rdfs, xsd, etc. are always available

Use with SPARQL (server/CLI)

The default context provides prefix definitions for SPARQL queries, so you don’t need to repeat PREFIX declarations in every query when injection is enabled. If the ledger’s default context includes {"ex": "http://example.org/"}, then you can write:

SELECT ?name WHERE {
  ex:alice ex:name ?name .
}

without an explicit PREFIX ex: <http://example.org/> declaration. If you declare any PREFIX in the query, the default context is not used at all — you must declare every prefix you need.
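To make the all-or-nothing rule concrete, here is a sketch of the same query once it declares any prefix of its own; at that point every prefix must be spelled out (the foaf: IRI is shown only for illustration):

```sparql
PREFIX ex: <http://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Because this query declares its own PREFIXes, the ledger default
# context is ignored entirely; every prefix used must appear here.
SELECT ?name WHERE {
  ex:alice foaf:name ?name .
}
```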

Use with JSON-LD queries (server/CLI)

Similarly, JSON-LD queries sent through an opt-in surface that omit @context receive the default context:

{
  "select": ["?name"],
  "where": [["ex:alice", "ex:name", "?name"]]
}

Viewing and updating

# View the default context
fluree context get mydb

# Replace it
fluree context set mydb -e '{"ex": "http://example.org/", "foaf": "http://xmlns.com/foaf/0.1/"}'

Via the HTTP API:

# Read
curl http://localhost:8090/v1/fluree/context/mydb:main

# Replace
curl -X PUT http://localhost:8090/v1/fluree/context/mydb:main \
  -H "Content-Type: application/json" \
  -d '{"ex": "http://example.org/"}'

See CLI context command and API endpoints for full details.

Opting out of the default context

When using a default-context-enabled surface, you may want full, unexpanded IRIs in query results — for debugging, interoperability with other RDF tools, or simply to avoid any prefix assumptions. You can opt out of the default context:

JSON-LD queries — pass an empty @context object:

{
  "@context": {},
  "select": ["?s", "?p", "?o"],
  "where": [["?s", "?p", "?o"]]
}

Results will contain full IRIs (e.g., http://example.org/ns/alice) instead of compacted forms (ex:alice).

SPARQL queries — include any PREFIX declaration. When a query declares its own prefixes, the default context is not injected. To opt out without defining any real prefix, use an empty default prefix:

PREFIX : <>
SELECT ?s ?p ?o WHERE { ?s ?p ?o }

Or simply declare the specific prefixes you need — the default context is only injected when the query has no PREFIX declarations whatsoever.

Storage

The default context is stored as a content-addressed blob in CAS, with a pointer (ContentId) in the nameservice config. Updates use compare-and-set semantics, so concurrent writers are safely handled. After an update, the server invalidates the cached ledger state so subsequent operations use the new context.

Integration with Standards

Fluree’s IRI system is fully compatible with:

  • RDF Standards: Works with RDF/XML, Turtle, N-Triples
  • SPARQL: IRIs work seamlessly in SPARQL queries
  • Linked Data: Enables publishing and consuming linked data
  • Semantic Web: Supports OWL ontologies and RDF Schema

This foundation enables Fluree to participate in the broader semantic web ecosystem while providing the convenience of JSON-LD’s compact syntax.

Datatypes and Typed Values

Fluree enforces strong typing for all literal values, ensuring data consistency and enabling efficient indexing and querying. Every literal value has an explicit datatype, following RDF and XSD standards.

Core Principle: No Untyped Literals

Unlike some databases that allow “plain” strings, Fluree requires every literal to have a datatype. This design provides:

  • Type Safety: Prevents type confusion in queries and applications
  • Consistent Comparisons: Typed values compare predictably
  • Standards Compliance: Follows RDF and SPARQL specifications
  • Query Optimization: Enables efficient indexing and query planning

XSD Datatypes

Fluree supports the core XML Schema Definition (XSD) datatypes:

String Types

{
  "@context": {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "ex": "http://example.org/ns/"
  },
  "@graph": [
    {
      "@id": "ex:book1",
      "ex:title": "The Great Gatsby",
      "ex:author": {
        "@value": "F. Scott Fitzgerald",
        "@type": "xsd:string"
      }
    }
  ]
}

xsd:string is the default for plain string literals when no type is specified.

Numeric Types

{
  "@graph": [
    {
      "@id": "ex:product1",
      "ex:price": {
        "@value": "29.99",
        "@type": "xsd:decimal"
      },
      "ex:quantity": {
        "@value": "100",
        "@type": "xsd:integer"
      },
      "ex:rating": {
        "@value": "4.5",
        "@type": "xsd:double"
      }
    }
  ]
}

Supported numeric types:

  • xsd:integer: Arbitrary-precision whole numbers (unbounded)
  • xsd:long: 64-bit integers
  • xsd:int: 32-bit integers
  • xsd:short: 16-bit integers
  • xsd:byte: 8-bit integers
  • xsd:decimal: Arbitrary precision decimals
  • xsd:double: 64-bit floating point
  • xsd:float: 32-bit floating point

Boolean Type

{
  "@graph": [
    {
      "@id": "ex:user1",
      "ex:isActive": {
        "@value": "true",
        "@type": "xsd:boolean"
      },
      "ex:hasVerifiedEmail": {
        "@value": "false",
        "@type": "xsd:boolean"
      }
    }
  ]
}

xsd:boolean accepts: true, false, 1, 0.

Date and Time Types

{
  "@graph": [
    {
      "@id": "ex:event1",
      "ex:startDate": {
        "@value": "2024-01-15",
        "@type": "xsd:date"
      },
      "ex:startTime": {
        "@value": "14:30:00Z",
        "@type": "xsd:time"
      },
      "ex:createdAt": {
        "@value": "2024-01-15T14:30:00Z",
        "@type": "xsd:dateTime"
      }
    }
  ]
}

Temporal types:

  • xsd:date: Dates without time (e.g., 2024-01-15)
  • xsd:time: Times without date (e.g., 14:30:00Z)
  • xsd:dateTime: Full timestamps (e.g., 2024-01-15T14:30:00Z)
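Typed temporal values support range filters in SPARQL. A minimal sketch, assuming an ex:createdAt predicate holding xsd:dateTime values:

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ex: <http://example.org/ns/>

# Events created during January 2024
SELECT ?event
WHERE {
  ?event ex:createdAt ?ts .
  FILTER(?ts >= "2024-01-01T00:00:00Z"^^xsd:dateTime &&
         ?ts <  "2024-02-01T00:00:00Z"^^xsd:dateTime)
}
```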

Other XSD Types

{
  "@graph": [
    {
      "@id": "ex:resource1",
      "ex:homepage": {
        "@value": "https://example.com",
        "@type": "xsd:anyURI"
      },
      "ex:duration": {
        "@value": "PT1H30M",
        "@type": "xsd:duration"
      }
    }
  ]
}

Additional types include:

  • xsd:anyURI: Web addresses and identifiers
  • xsd:duration: Time periods (ISO 8601 format)
  • xsd:gYear, xsd:gMonth, xsd:gDay: Partial date components

RDF Datatypes

Beyond XSD, Fluree supports RDF-specific datatypes:

Language-Tagged Strings

{
  "@graph": [
    {
      "@id": "ex:book1",
      "ex:title": [
        {
          "@value": "The Great Gatsby",
          "@language": "en"
        },
        {
          "@value": "Der große Gatsby",
          "@language": "de"
        }
      ]
    }
  ]
}

rdf:langString represents strings with language tags. This is distinct from plain strings and enables language-aware queries.
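Language tags can then be filtered in SPARQL with the LANG() function. A sketch, assuming titles in multiple languages stored on an ex:title predicate:

```sparql
PREFIX ex: <http://example.org/ns/>

# Select only the German-tagged titles
SELECT ?title
WHERE {
  ?book ex:title ?title .
  FILTER(LANG(?title) = "de")
}
```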

JSON Data

{
  "@graph": [
    {
      "@id": "ex:config1",
      "ex:settings": {
        "@value": "{\"theme\": \"dark\", \"notifications\": true}",
        "@type": "@json"
      }
    }
  ]
}

rdf:JSON stores JSON data as typed literals. This is useful for storing complex structured data that doesn’t fit the RDF model.

Geographic Data

{
  "@context": {
    "geo": "http://www.opengis.net/ont/geosparql#",
    "ex": "http://example.org/"
  },
  "@graph": [
    {
      "@id": "ex:location1",
      "ex:coordinates": {
        "@value": "POINT(2.3522 48.8566)",
        "@type": "geo:wktLiteral"
      }
    }
  ]
}

geo:wktLiteral stores geographic data in Well-Known Text (WKT) format. POINT geometries are automatically converted to an optimized binary encoding, while other geometry types (polygons, lines) are stored as strings.

See Geospatial for complete documentation.

Vector Data

{
  "@context": {
    "ex": "http://example.org/"
  },
  "@graph": [
    {
      "@id": "ex:doc1",
      "ex:embedding": {
        "@value": [0.1, 0.2, 0.3, 0.4],
        "@type": "@vector"
      }
    }
  ]
}

@vector (full IRI: https://ns.flur.ee/db#embeddingVector, prefix form: f:embeddingVector) stores numeric arrays as embedding vectors. Values are quantized to IEEE-754 f32 at ingest for compact storage and SIMD-accelerated similarity computation. In Turtle/SPARQL, use f:embeddingVector with the ^^ typed-literal syntax.
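A hedged Turtle sketch of the typed-literal form; the bracketed-array lexical representation shown here is an assumption, so check the Vector Search documentation for the exact literal format:

```turtle
@prefix f: <https://ns.flur.ee/db#> .
@prefix ex: <http://example.org/> .

# Assumed lexical form: a JSON-style numeric array as the literal value
ex:doc1 ex:embedding "[0.1, 0.2, 0.3, 0.4]"^^f:embeddingVector .
```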

Without this type annotation, plain JSON arrays are decomposed into individual RDF values where duplicates may be removed and ordering is lost.

See Vector Search for complete documentation.

Fulltext Data

{
  "@context": {
    "ex": "http://example.org/"
  },
  "@graph": [
    {
      "@id": "ex:article-1",
      "ex:content": {
        "@value": "Rust is a systems programming language focused on safety and performance",
        "@type": "@fulltext"
      }
    }
  ]
}

@fulltext (full IRI: https://ns.flur.ee/db#fullText, prefix form: f:fullText) marks a string value for full-text search indexing. Values annotated with @fulltext are automatically analyzed (tokenized, stemmed, stopword-filtered) and indexed into per-predicate fulltext arenas during background index builds. This enables BM25-ranked relevance scoring via the fulltext() query function.

Without this type annotation, strings are stored as plain xsd:string values and support only exact matching and prefix queries – not relevance-ranked full-text search.

See Inline Fulltext Search for complete documentation.

Type Coercion and Compatibility

Automatic Type Promotion

Fluree handles type compatibility intelligently:

# This works - integer can be used where decimal is expected
SELECT ?price
WHERE {
  ?product ex:price ?price .
  FILTER(?price > 10.0)  # decimal comparison
}

Comparisons Between Incompatible Types

When a filter compares values of incompatible types (e.g., a number and a string), the behavior depends on the operator:

  • Equality (=) returns false — values of different types are never equal
  • Inequality (!=) returns true — values of different types are never equal
  • Ordering (<, <=, >, >=) raises an error — ordering between incompatible types is undefined

Numeric types (long, double, bigint, decimal) are mutually comparable via automatic promotion, so cross-numeric comparisons work as expected. Similarly, temporal types can be compared with string representations that parse to the same temporal type.
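A sketch of the three behaviors in a filter; ex:value is a hypothetical predicate whose objects mix numbers and strings:

```sparql
PREFIX ex: <http://example.org/ns/>

SELECT ?s ?v
WHERE {
  ?s ex:value ?v .
  # != between a number and "pending" evaluates to true (never equal),
  # so numeric values pass this filter along with non-matching strings.
  FILTER(?v != "pending")
  # FILTER(?v < "pending") would raise an error when ?v is numeric,
  # because ordering across incompatible types is undefined.
}
```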

Type Casting in Queries

SPARQL provides functions for type conversion:

SELECT ?name (xsd:string(?id) AS ?idString)
WHERE {
  ?person ex:name ?name ;
          ex:id ?id .
}

Best Practices

Choosing Datatypes

  1. Be Specific: Use the most appropriate type for your data

    • Use xsd:integer for whole numbers that will be used in calculations
    • Use xsd:string for identifiers and labels
    • Use xsd:dateTime for timestamps
  2. Consider Query Patterns: Choose types that support your intended queries

    • Numeric types enable range queries and aggregations
    • Date types enable temporal queries
    • String types support text search
  3. Standards Alignment: Use standard datatypes where possible

    • Prefer XSD types over custom types
    • Use established vocabularies with well-defined ranges

Type Consistency

  1. Consistent Usage: Use the same datatype for equivalent properties across your data
  2. Change Planning: Plan for type changes as your data model evolves
  3. Validation: Validate data types at ingestion time

Performance Considerations

  1. Index Efficiency: Different types have different indexing characteristics

    • Numeric types support efficient range queries
    • String types support prefix and substring matching
    • Date types enable temporal range queries
  2. Storage Size: Some types are more storage-efficient than others

    • xsd:integer is more compact than xsd:string
    • xsd:boolean is more efficient than string representations

Type System Architecture

Internal Representation

Fluree stores all typed values with their datatype information:

  • Value Storage: The literal value as a string
  • Type Metadata: The datatype IRI
  • Comparison Logic: Type-aware comparison functions

Query Processing

The type system affects query processing:

  • Type Checking: Ensures type compatibility in filters and joins
  • Index Selection: Chooses appropriate indexes based on types
  • Result Formatting: Formats results according to datatype rules

Standards Compliance

Fluree’s type system is fully compliant with:

  • RDF 1.1 Concepts: Literal typing requirements
  • SPARQL 1.1: Type promotion and compatibility rules
  • XSD 1.1: Datatype definitions and constraints
  • JSON-LD 1.1: Typed value syntax

This strong typing foundation ensures data consistency, enables optimization, and maintains interoperability with the broader semantic web ecosystem.

Datasets and Named Graphs

Fluree supports SPARQL datasets, allowing queries to span multiple graphs simultaneously. This enables complex data integration scenarios where data from different sources or time periods needs to be queried together.

SPARQL Datasets

A dataset in SPARQL is a collection of graphs used for query execution:

  • Default Graph: The primary graph for triple patterns without GRAPH clauses
  • Named Graphs: Additional graphs identified by IRIs, accessible via GRAPH clauses

Dataset Structure

# Dataset with one default graph and two named graphs
FROM <ledger:main>           # Default graph
FROM NAMED <ledger:archive>  # Named graph
FROM NAMED <ledger:staging>  # Another named graph

Named Graphs

In SPARQL, named graphs are additional graphs (identified by IRIs) that participate in query execution and are accessed via GRAPH <iri> { ... }.

In Fluree, named graphs are used in several ways:

  • Multi-graph execution (datasets): FROM NAMED <...> identifies additional graph sources (often other ledgers or non-ledger graph sources) that you can reference with GRAPH <...> { ... }.
  • System named graphs: Fluree provides two built-in named graphs:
    • txn-meta (#txn-meta): commit/transaction metadata, queryable via the #txn-meta fragment (e.g., <mydb:main#txn-meta>)
    • config (#config): ledger-level configuration (policy, SHACL, reasoning, uniqueness constraints). See Ledger configuration.
  • User-defined named graphs: Fluree supports ingesting data into user-defined named graphs using TriG format. These graphs are identified by their IRI and can be queried using the structured from object syntax with a graph field.

HTTP endpoints and default graph behavior

Fluree exposes two query styles over HTTP:

  • Connection-scoped (POST /query): the ledger(s) and graphs are identified by from / fromNamed (JSON-LD) or FROM / FROM NAMED (SPARQL). This is the dataset path and supports multi-ledger datasets.
  • Ledger-scoped (POST /query/{ledger}): the ledger is fixed by the URL. The request may still select a named graph inside that ledger:
    • JSON-LD: "from": "default", "from": "txn-meta", or "from": "<graph IRI>"
    • SPARQL: FROM <default>, FROM <txn-meta>, FROM <graph IRI>, and FROM NAMED <graph IRI>

If the request body tries to target a different ledger than the one in the URL, the server rejects it with a “Ledger mismatch” error.

Txn metadata named graph (#txn-meta)

The txn-meta graph contains per-commit metadata stored as triples. This is useful for auditing and operational metadata (machine address, internal user id, job id, etc.).

Querying txn-meta via SPARQL:

PREFIX f: <https://ns.flur.ee/db#>
PREFIX ex: <http://example.org/ns/>

SELECT ?commit ?t ?machine
FROM <mydb:main#txn-meta>
WHERE {
  ?commit f:t ?t .
  OPTIONAL { ?commit ex:machine ?machine }
}

Notes:

  • Using FROM <mydb:main#txn-meta> makes txn-meta the default graph for the query.
  • You can also use dataset syntax (FROM NAMED + GRAPH) if you need to mix default graph and txn-meta in one query.
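A sketch of that dataset form, mixing user data from the default graph with commit metadata from txn-meta in one query (ex:name is a hypothetical predicate):

```sparql
PREFIX f: <https://ns.flur.ee/db#>
PREFIX ex: <http://example.org/ns/>

SELECT ?name ?commit ?t
FROM <mydb:main>
FROM NAMED <mydb:main#txn-meta>
WHERE {
  ?person ex:name ?name .
  GRAPH <mydb:main#txn-meta> {
    ?commit f:t ?t .
  }
}
```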

User-Defined Named Graphs

Fluree supports ingesting data into user-defined named graphs using TriG format. TriG extends Turtle by adding GRAPH blocks that assign triples to specific named graphs.

Creating named graphs via TriG:

@prefix ex: <http://example.org/ns/> .
@prefix schema: <http://schema.org/> .

# Default graph triples
ex:company a schema:Organization ;
    schema:name "Acme Corp" .

# Named graph for product data
GRAPH <http://example.org/graphs/products> {
    ex:widget a schema:Product ;
        schema:name "Widget" ;
        schema:price "29.99"^^xsd:decimal .
}

# Named graph for inventory
GRAPH <http://example.org/graphs/inventory> {
    ex:widget schema:inventory 42 ;
        schema:warehouse "main" .
}

Submit TriG data via HTTP API:

curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
  -H "Content-Type: application/trig" \
  --data-binary '@data.trig'

Querying user-defined named graphs (JSON-LD):

Use the structured from object with a graph field:

{
  "@context": { "schema": "http://schema.org/" },
  "from": {
    "@id": "mydb:main",
    "graph": "http://example.org/graphs/products"
  },
  "select": ["?name", "?price"],
  "where": [
    { "@id": "?product", "schema:name": "?name" },
    { "@id": "?product", "schema:price": "?price" }
  ]
}

System and user graphs:

  • Default graph (implicit): User data without GRAPH blocks
  • urn:fluree:{ledger_id}#txn-meta: Commit metadata
  • urn:fluree:{ledger_id}#config: Ledger configuration (see Ledger configuration)
  • User-defined named graphs: Identified by their IRI, allocated in order of first use

Notes:

  • Named graph IRIs are stored in the commit’s graph_delta field for replay
  • Queries against named graphs are scoped to the indexed data (post-indexing)
  • Maximum 256 named graphs can be introduced per transaction
  • Maximum IRI length is 8KB per graph IRI

Querying Named Graphs

# Query specific named graphs
SELECT ?name
FROM NAMED <http://example.org/ns/graph1>
WHERE {
  GRAPH <http://example.org/ns/graph1> {
    ?person ex:name ?name
  }
}

# Query across multiple graphs
SELECT ?graph ?name
FROM NAMED <http://example.org/ns/graph1>
FROM NAMED <http://example.org/ns/graph2>
WHERE {
  GRAPH ?graph {
    ?person ex:name ?name
  }
}

Default Graph Semantics

The default graph contains triples that are not in any named graph:

# Query only the default graph
SELECT ?name
FROM <ledger:main>
WHERE {
  ?person ex:name ?name
  # This matches triples in the default graph only
}

Union Default Graph

Some SPARQL implementations create a “union default graph” containing triples from all graphs. Fluree keeps them separate by default, but you can achieve union semantics:

# Manual union across graphs
SELECT ?name
FROM NAMED <ledger:main>
FROM NAMED <ledger:archive>
WHERE {
  { GRAPH <ledger:main> { ?person ex:name ?name } }
  UNION
  { GRAPH <ledger:archive> { ?person ex:name ?name } }
}

Multi-Ledger Datasets

Datasets can span multiple ledgers:

# Dataset across different ledgers
SELECT ?product ?price
FROM <inventory:main>        # Default graph from inventory ledger
FROM NAMED <pricing:main>    # Named graph from pricing ledger
WHERE {
  ?product ex:name "Widget" .
  GRAPH <pricing:main> {
    ?product ex:price ?price
  }
}

This enables federated queries across different data sources.

Time-Aware Datasets

Named graphs can represent different time periods:

# Query current and historical data
SELECT ?version ?name
FROM NAMED <ledger:main>      # Current data
FROM NAMED <ledger:archive>   # Historical data
WHERE {
  { GRAPH <ledger:main> {
      ?person ex:name ?name .
      BIND("current" AS ?version)
    }
  }
  UNION
  { GRAPH <ledger:archive> {
      ?person ex:name ?name .
      BIND("archive" AS ?version)
    }
  }
}

Graph Management

Graph Operations

Fluree supports graph-level operations:

# Insert into a specific graph
INSERT DATA {
  GRAPH <http://example.org/ns/metadata> {
    <http://example.org/data/doc1> ex:created "2024-01-15T10:00:00Z"^^xsd:dateTime .
  }
}

# Delete from a specific graph
DELETE {
  GRAPH <http://example.org/ns/temp> {
    ?s ?p ?o
  }
}
WHERE {
  GRAPH <http://example.org/ns/temp> {
    ?s ?p ?o
  }
}

Graph Metadata

For transaction-scoped metadata, Fluree uses the txn-meta named graph (see above). Transaction metadata is stored as properties on commit subjects in txn-meta, and can be queried independently of user data.

Use Cases

Data Partitioning

Separate different types of data:

FROM NAMED <urn:customers>
FROM NAMED <urn:products>
FROM NAMED <urn:orders>

SELECT ?customer ?product
WHERE {
  GRAPH <urn:customers> { ?customer foaf:name ?name }
  GRAPH <urn:orders> {
    ?order ex:customer ?customer ;
           ex:product ?product .
  }
}

Access Control

Different graphs can have different permissions:

  • Public graph: Open access
  • Private graph: Restricted access
  • Admin graph: Administrative data

Data Provenance

Track data sources and quality:

FROM NAMED <urn:sensor1>
FROM NAMED <urn:sensor2>

SELECT ?sensor ?reading ?quality
WHERE {
  GRAPH ?sensor {
    ?obs ex:reading ?reading ;
         ex:quality ?quality .
  }
  FILTER(?quality > 0.8)  # Only high-quality readings
}

Version Management

Maintain different versions of data:

FROM NAMED <urn:v1.0>
FROM NAMED <urn:v2.0>

SELECT ?feature ?version
WHERE {
  GRAPH ?version {
    ?feature ex:status "active"
  }
}

Performance Considerations

Index Optimization

Named graphs affect indexing strategy:

  • Graph-aware indexes: Indexes can be partitioned by graph
  • Cross-graph joins: May require special optimization
  • Graph statistics: Maintain statistics per graph for query planning

Query Planning

The query planner considers:

  • Graph selectivity: Which graphs contain relevant data
  • Join patterns: How graphs are connected in the query
  • Graph size: Larger graphs may need different strategies

Best Practices

  1. Logical Partitioning: Use graphs for logical data separation
  2. Size Considerations: Very large graphs may impact query performance
  3. Naming Conventions: Use consistent IRI patterns for graph names
  4. Documentation: Document the purpose and schema of each graph

Standards Compliance

Fluree’s dataset implementation follows:

  • SPARQL 1.1 Query: FROM and FROM NAMED clauses
  • SPARQL 1.1 Update: GRAPH clauses in updates
  • RDF 1.1 Datasets: Named graph semantics
  • JSON-LD 1.1: @graph syntax for named graphs

This enables seamless integration with other RDF tools and SPARQL endpoints while providing Fluree’s unique temporal and ledger capabilities.

Time Travel

Differentiator: Fluree is a temporal database that preserves the complete history of all changes. Every transaction is timestamped, enabling queries against any previous state of the data. This “time travel” capability is fundamental to Fluree’s architecture and provides capabilities that most databases cannot match.

Query Formats

Time travel is supported in both JSON-LD and SPARQL query formats. Examples in this document primarily use JSON-LD syntax with SPARQL equivalents shown where relevant.

Transaction Time

Every transaction in Fluree receives a unique transaction time (t): a monotonically increasing integer that represents the logical time of the transaction.

Transaction Ordering

Transaction 1: t=1
Transaction 2: t=2
Transaction 3: t=3
...
  • Monotonic: Each new transaction gets a higher t than all previous transactions
  • Unique: No two transactions share the same t
  • Global: Transaction times are unique across the entire Fluree instance

Current Time

The current time is the highest transaction time that has been committed. Queries without a time specifier automatically query the current state:

{
  "@context": { "ex": "http://example.org/ns/" },
  "select": ["?name"],
  "where": [
    { "@id": "?person", "ex:name": "?name" }
  ]
}

You can also explicitly specify @t:latest to query the latest state:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "ledger:main@t:latest",
  "select": ["?name"],
  "where": [
    { "@id": "?person", "ex:name": "?name" }
  ]
}

Historical Queries

Fluree supports querying data as it existed at any point in time using the @ syntax in ledger references.

Point-in-Time Queries

Query data as it existed at a specific transaction using the from field with @t::

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "ledger:main@t:100",
  "select": ["?name"],
  "where": [
    { "@id": "?person", "ex:name": "?name" }
  ]
}

Query at ISO Timestamp

Query using ISO 8601 datetime with @iso::

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "ledger:main@iso:2024-01-15T10:30:00Z",
  "select": ["?name"],
  "where": [
    { "@id": "?person", "ex:name": "?name" }
  ]
}

Query at Commit ContentId

Query at a specific commit using @commit: with a commit ContentId:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "ledger:main@commit:bafybeig...",
  "select": ["?name"],
  "where": [
    { "@id": "?person", "ex:name": "?name" }
  ]
}

Temporal Data Model

Immutable Facts

Once committed, data is immutable. Changes are represented as new facts that supersede previous ones:

t=1: Alice age 25  (assertion)
t=5: Alice age 26  (retraction of age 25, assertion of age 26)

History queries capture both the retraction and assertion with @op:

[
  [25, 1, true],
  [25, 5, false],
  [26, 5, true]
]

Each row shows [value, transaction_time, op] where op is true for assertions and false for retractions.
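The fold from history rows back to a point-in-time value can be sketched in a few lines of Python. The [value, t, op] row shape comes from this page; value_at and its single-valued-property assumption are illustrative, not a Fluree API:

```python
def value_at(history, t):
    """Fold [value, tx, op] history rows into the value visible at time t.

    Assumes a single-valued property: an assertion supersedes the current
    value and a retraction clears it. Rows are processed in transaction
    order, retractions before assertions at the same t.
    """
    current = None
    for value, tx, op in sorted(history, key=lambda row: (row[1], row[2])):
        if tx > t:
            break
        if op:
            current = value       # assertion
        elif current == value:
            current = None        # retraction of the current value
    return current

# Alice's age history from the example above:
history = [
    [25, 1, True],
    [25, 5, False],
    [26, 5, True],
]
```

Here value_at(history, 4) yields 25 while value_at(history, 5) yields 26, matching the supersession at t=5.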

Valid Time vs Transaction Time

Fluree primarily uses transaction time (when the fact was recorded in the database). For applications needing valid time (when the fact was true in the real world), this can be modeled explicitly as properties:

{
  "@context": { "ex": "http://example.org/ns/" },
  "@graph": [
    {
      "@id": "ex:alice-employment-1",
      "ex:person": "ex:alice",
      "ex:company": "ex:company-a",
      "ex:validFrom": "2020-01-01T00:00:00Z",
      "ex:validTo": "2023-12-31T23:59:59Z"
    }
  ]
}

This allows you to query by both:

  • Transaction time: When was this recorded? (using @t:, @iso:, @commit:)
  • Valid time: When was this true? (using standard WHERE clause filters on ex:validFrom/ex:validTo)
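A valid-time check is then an ordinary interval filter over those properties. A minimal Python sketch, where the record shape mirrors the JSON-LD above and valid_at is illustrative:

```python
def valid_at(records, instant):
    """Records whose valid-time interval contains the instant.
    Fixed-format UTC ISO-8601 strings compare correctly as plain strings."""
    return [r for r in records
            if r["ex:validFrom"] <= instant <= r["ex:validTo"]]

employments = [
    {"@id": "ex:alice-employment-1",
     "ex:person": "ex:alice",
     "ex:validFrom": "2020-01-01T00:00:00Z",
     "ex:validTo": "2023-12-31T23:59:59Z"},
]
```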

Snapshot and Indexing

Database Snapshots

Fluree maintains indexed snapshots at regular intervals for efficient historical access:

  • Index: A complete, optimized snapshot of the database at a specific t
  • Novelty: Committed transactions accumulated since the last index, not yet merged into one
  • Background Indexing: Continuous process that creates new indexes

Query Execution Model

Queries combine indexed data with novelty:

Query Result = Indexed Database (up to t=index) + Novelty (t=index+1 to current)

This provides:

  • Fast historical queries: Use appropriate index
  • Real-time current queries: Include latest transactions
  • Consistent snapshots: Each query sees a consistent state
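The equation above can be made concrete with a toy merge. The dict-based flake shape, snapshot function, and data here are illustrative, not Fluree internals:

```python
def snapshot(indexed, novelty, index_t, query_t):
    """Facts visible at query_t: indexed facts cover t <= index_t, and
    novelty supplies the committed facts from index_t+1 up to query_t."""
    facts = [f for f in indexed if f["t"] <= min(index_t, query_t)]
    facts += [f for f in novelty if index_t < f["t"] <= query_t]
    return facts

# Index built at t=3; two commits have landed since.
indexed = [{"s": "ex:alice", "p": "ex:name", "o": "Alice", "t": 1}]
novelty = [
    {"s": "ex:alice", "p": "ex:age", "o": 25, "t": 4},
    {"s": "ex:alice", "p": "ex:age", "o": 26, "t": 5},
]
```

Querying at t=5 merges all three facts; querying at t=4 excludes the last commit; querying at t=1 sees only the indexed fact.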

Consistency and Read-After-Write

Fluree’s query engine is eventually consistent. When a transaction commits at t=N, queries running against a different process or a warm cache may still see a state older than t=N until the cache is refreshed.

The Problem

Process A: transact → receives t=42
Process B: query    → sees t=40 (stale cache)

This is expected in architectures where the query server is a separate peer, or in serverless environments where a warm Lambda invocation holds a cached ledger state from a previous request.

The Solution: refresh() with min_t

The refresh() API accepts a min_t parameter that asserts the cached ledger has reached at least a specific transaction time. If the ledger hasn’t reached that t after pulling the latest state from the nameservice, the call returns an error so the caller can retry.

Flow:

1. Client transacts → receives t=42
2. Client calls refresh(ledger, min_t=42)
3. Fluree checks cached t:
   - If cached t >= 42 → immediate success (no I/O)
   - If cached t < 42  → pull latest from nameservice, apply commits
   - If still t < 42   → return AwaitTNotReached error
4. Client queries at t >= 42 with confidence
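The flow above is naturally wrapped in a retry-with-backoff loop. This Python sketch is conceptual: refresh_until, AwaitTNotReached, and ToyClient are illustrative stand-ins, not a real Fluree SDK:

```python
import time

class AwaitTNotReached(Exception):
    """Cached ledger is still behind min_t after pulling latest state."""

def refresh_until(client, ledger, min_t, attempts=5, base_delay=0.05):
    """Retry client.refresh(ledger, min_t=...) with exponential backoff
    until the cached ledger reaches min_t or attempts run out."""
    for attempt in range(attempts):
        try:
            return client.refresh(ledger, min_t=min_t)
        except AwaitTNotReached:
            time.sleep(base_delay * (2 ** attempt))
    raise AwaitTNotReached(f"{ledger} did not reach t={min_t}")

class ToyClient:
    """Stand-in client whose cached t advances by one commit per pull."""
    def __init__(self, start_t):
        self.t = start_t
    def refresh(self, ledger, min_t):
        self.t += 1
        if self.t < min_t:
            raise AwaitTNotReached()
        return {"ledger": ledger, "t": self.t}
```

With ToyClient(40), refresh_until(client, "mydb:main", 42) fails once, backs off, and succeeds on the second pull.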

Usage Patterns

Same-process (embedded Fluree):

In a single process where you transact and query through the same Fluree instance, the cache is updated in-place by the transaction. min_t is typically not needed, but can serve as a safety assertion.

Multi-process / Serverless:

When the transacting process and querying process are separate (e.g., a Lambda that writes and another that reads), pass the t from the transaction receipt through your event/message payload and use min_t to gate the query:

Writer Lambda:
  receipt = transact(data)
  publish_event({ t: receipt.t, ... })

Reader Lambda:
  event = receive_event()
  refresh(ledger, min_t=event.t, timeout=5s)
  query(ledger)  // guaranteed to see at least t=event.t

HTTP API:

The HTTP query endpoint does not yet expose min_t directly. For HTTP clients, use the SSE events endpoint (GET /v1/fluree/events) to receive real-time commit notifications, or poll the ledger info endpoint until the desired t is reached.

Rust API

See Using Fluree as a Rust Library — Read-After-Write Consistency for full code examples including retry-with-backoff patterns.

use fluree_db_api::RefreshOpts;

// After a transaction returns t=42:
let opts = RefreshOpts { min_t: Some(42) };
let result = fluree.refresh("mydb:main", opts).await?;
// result.t >= 42 is guaranteed if Ok

History Queries for Change Tracking

History queries let you see all changes (assertions and retractions) within a time range. Specify the range using from and to keys with time-specified endpoints.

Entity History (JSON-LD)

Track all changes to a specific entity over time by specifying a time range:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "ledger:main@t:1",
  "to": "ledger:main@t:latest",
  "select": ["?name", "?t", "?op"],
  "where": [
    { "@id": "ex:alice", "ex:name": { "@value": "?name", "@t": "?t", "@op": "?op" } }
  ],
  "orderBy": "?t"
}

The @t and @op annotations bind the transaction time and operation type:

  • @t - Transaction time (integer) when the fact was asserted or retracted.
  • @op - Boolean: true for assertions, false for retractions. Mirrors Flake.op on disk. Both literal- and IRI-valued objects carry the metadata.

Returns results showing all changes:

[
  ["Alice", 1, true],
  ["Alice", 5, false],
  ["Alicia", 5, true]
]

Entity History (SPARQL)

The same query in SPARQL uses RDF-star syntax with FROM...TO:

PREFIX ex: <http://example.org/ns/>
PREFIX f: <https://ns.flur.ee/db#>

SELECT ?name ?t ?op
FROM <ledger:main@t:1>
TO <ledger:main@t:latest>
WHERE {
  << ex:alice ex:name ?name >> f:t ?t .
  << ex:alice ex:name ?name >> f:op ?op .
}
ORDER BY ?t

Property-Specific History

Query changes for specific properties:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "ledger:main@t:1",
  "to": "ledger:main@t:100",
  "select": ["?age", "?t", "?op"],
  "where": [
    { "@id": "ex:alice", "ex:age": { "@value": "?age", "@t": "?t", "@op": "?op" } }
  ],
  "orderBy": "?t"
}

All Properties History

Query all property changes for an entity:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "ledger:main@t:1",
  "to": "ledger:main@t:latest",
  "select": ["?p", "?v", "?t", "?op"],
  "where": [
    { "@id": "ex:alice", "?p": { "@value": "?v", "@t": "?t", "@op": "?op" } }
  ],
  "orderBy": "?t"
}

Time Range with Datetime

Query history using ISO 8601 datetime strings:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "ledger:main@iso:2024-01-01T00:00:00Z",
  "to": "ledger:main@iso:2024-12-31T23:59:59Z",
  "select": ["?name", "?t", "?op"],
  "where": [
    { "@id": "ex:alice", "ex:name": { "@value": "?name", "@t": "?t", "@op": "?op" } }
  ]
}

Filter by Operation Type

Filter to show only assertions or only retractions:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "ledger:main@t:1",
  "to": "ledger:main@t:latest",
  "select": ["?name", "?t"],
  "where": [
    { "@id": "ex:alice", "ex:name": { "@value": "?name", "@t": "?t", "@op": "?op" } },
    ["filter", "(= ?op false)"]
  ]
}

Pattern History Across Subjects

Query changes for a specific property across all subjects:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "ledger:main@t:1",
  "to": "ledger:main@t:latest",
  "select": ["?person", "?status", "?t", "?op"],
  "where": [
    { "@id": "?person", "ex:status": { "@value": "?status", "@t": "?t", "@op": "?op" } }
  ],
  "orderBy": "?t"
}

Performance Characteristics

Time Resolution Performance

Different time specifiers have different performance characteristics:

  • @t:NNN (fastest): Direct transaction number, no resolution needed
  • @iso:DATETIME: O(log n) binary search through commit timestamps using POST index
  • @commit:CID: Bounded SPOT scan, O(k) where k is commits matching prefix (use longer prefixes for better performance)
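The @iso: path is an ordinary binary search over commit timestamps; Python's bisect captures the idea (the commits list and resolve_iso are illustrative, not Fluree's internal representation):

```python
import bisect

# (iso_timestamp, t) pairs in commit order; illustrative data.
commits = [
    ("2024-01-10T00:00:00Z", 90),
    ("2024-01-15T10:30:00Z", 100),
    ("2024-02-01T00:00:00Z", 140),
]

def resolve_iso(commits, iso):
    """t of the last commit at or before iso, via O(log n) binary search.
    Fixed-format UTC ISO-8601 strings sort correctly as plain strings."""
    i = bisect.bisect_right([ts for ts, _ in commits], iso)
    if i == 0:
        raise ValueError(f"no commit at or before {iso}")
    return commits[i - 1][1]
```

A timestamp between commits resolves to the commit before it, so the query sees the state that was current at that wall-clock moment.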

Index Selection

Fluree automatically selects the most appropriate index for historical queries:

  • Recent history: Uses current index + novelty (commits since the last index)
  • Historical snapshots: Uses closest index snapshot to target time
  • Point queries (@t:): Direct index lookup for specific transaction

History Query Performance

History queries scan flakes within the specified time range:

  • Entity history (specific @id): SPOT index scan on subject
  • Property history (specific predicate): Narrower SPOT scan with predicate filter
  • All properties (variable predicate ?p): Full SPOT scan for subject
  • Cross-entity (variable subject ?s): POST/PSOT index scan (can be slower for common predicates)

Optimization Strategies

  1. Use Transaction Numbers: When possible, use @t:NNN instead of @iso:DATETIME
  2. Narrow History Patterns: Use [subject, predicate] instead of [subject] when you only need specific properties
  3. Limit Time Ranges: Specify realistic from/to bounds rather than querying all history
  4. ContentId Prefix Length: Use sufficiently long ContentId prefixes to avoid ambiguity checks
  5. Index Density: More frequent indexing improves historical query performance for distant past

Storage Implications

  • Full History: All transaction history is preserved (immutable append-only)
  • Index Snapshots: Periodic snapshots enable efficient historical queries without replaying all transactions
  • Commit Metadata: Stored as queryable flakes (~8-9 flakes per commit)
  • Transaction JSON: Optionally stored for audit trails (enable with txn: true)

Practical Applications

Version Control

Treat data like code with version control:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "app:production@t:1000",
  "select": ["?config"],
  "where": [
    { "@id": "?setting", "ex:value": "?config" }
  ]
}

Regulatory Compliance

Maintain complete audit trails by querying data as it existed at the time of consent:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "users:main@iso:2024-05-25T14:30:00Z",
  "select": ["?predicate", "?data"],
  "where": [
    { "@id": "ex:alice", "?predicate": "?data" }
  ]
}

Change History Analysis

Track how data evolved over time:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "sales:main@iso:2024-01-01T00:00:00Z",
  "to": "sales:main@iso:2024-12-31T23:59:59Z",
  "select": ["?order", "?amount", "?t", "?op"],
  "where": [
    { "@id": "?order", "ex:amount": { "@value": "?amount", "@t": "?t", "@op": "?op" } }
  ],
  "orderBy": "?t"
}

Debugging and Troubleshooting

Investigate system state at time of incident:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "system:config@iso:2024-01-15T09:15:00Z",
  "select": ["?setting", "?config"],
  "where": [
    { "@id": "?setting", "ex:value": "?config" }
  ]
}

Time Travel in Multi-Ledger Scenarios

Cross-Ledger Temporal Queries

Query across ledgers at consistent time points:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": [
    "customers:main@t:1000",
    "orders:main@t:1000"
  ],
  "select": ["?customer", "?order"],
  "where": [
    { "@id": "?customer", "ex:name": "Alice" },
    { "@id": "?order", "ex:customer": "?customer" }
  ]
}

Ledger Branching

Time travel enables sophisticated branching workflows by querying historical states:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "ledger:main@t:500",
  "select": ["?entity", "?property", "?value"],
  "where": [
    { "@id": "?entity", "?property": "?value" }
  ]
}

You can then use this historical state as a basis for creating a new branch or comparing against current state.

Common Patterns

Compare Current vs Historical State

Query the same entity at two different points in time:

// Query current state
{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "ledger:main",
  "select": ["?price"],
  "where": [
    { "@id": "ex:product-123", "ex:price": "?price" }
  ]
}

// Query historical state
{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "ledger:main@t:100",
  "select": ["?price"],
  "where": [
    { "@id": "ex:product-123", "ex:price": "?price" }
  ]
}

Find When a Change Occurred

Use history queries to identify when a specific change happened:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "ledger:main@t:1",
  "to": "ledger:main@t:latest",
  "select": ["?status", "?t", "?op"],
  "where": [
    { "@id": "ex:product-123", "ex:status": { "@value": "?status", "@t": "?t", "@op": "?op" } }
  ],
  "orderBy": "?t"
}

The results show when ex:status changed, with ?op = false (retract) for the old value and ?op = true (assert) for the new value at the same transaction time.

Audit Trail for Compliance

Generate a complete audit trail for a sensitive entity:

{
  "@context": { "schema": "http://schema.org/" },
  "from": "users:main@iso:2024-01-01T00:00:00Z",
  "to": "users:main@t:latest",
  "select": ["?property", "?value", "?t", "?op"],
  "where": [
    { "@id": "schema:Person/12345", "?property": { "@value": "?value", "@t": "?t", "@op": "?op" } }
  ],
  "orderBy": "?t"
}

This returns all changes with transaction times for audit purposes. Each result row shows the property, value, when it was changed, and whether it was an assertion or retraction.

Rollback Detection

Find what changed after a specific commit:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "config:main@t:50",
  "to": "config:main@t:latest",
  "select": ["?setting", "?value", "?t", "?op"],
  "where": [
    { "@id": "?setting", "ex:config": { "@value": "?value", "@t": "?t", "@op": "?op" } }
  ],
  "orderBy": "?t"
}

This shows all configuration changes since transaction 50, useful for identifying what to roll back. To anchor the range at a known commit, first run a point-in-time query with "from": "config:main@commit:bafybeig..." to find its transaction number, then use that t in the history query.

Reproduce a Bug at Specific Time

Query the exact state of the system when a bug was reported:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": [
    "products:main@iso:2024-06-15T14:30:00Z",
    "inventory:main@iso:2024-06-15T14:30:00Z"
  ],
  "select": ["?product", "?stock", "?reserved"],
  "where": [
    { "@id": "?product", "ex:stockLevel": "?stock" },
    { "@id": "?product", "ex:reserved": "?reserved" }
  ]
}

This recreates the exact state across multiple ledgers at the time the bug occurred, making debugging much easier.

Best Practices

Time Travel Guidelines

  1. Explicit Time References: Always specify clear time references (@t:, @iso:, or @commit:) for reproducible queries
  2. Time Zone Awareness: Use UTC for ISO timestamps to avoid ambiguity
  3. ContentId Length: Use sufficiently long ContentId prefixes to avoid collisions
  4. Performance Testing: Test query performance across different time ranges and ledger sizes

History Query Patterns

  1. Narrow Your Scope: Use specific property patterns rather than wildcard ?p when you only need certain properties
  2. Limit Time Ranges: Specify realistic time ranges with from and to rather than @t:1 to @t:latest
  3. Use Filters: Filter by @op to show only assertions or retractions when you don’t need both
  4. Order Results: Use orderBy: "?t" to see changes in chronological order

Data Modeling for Time

  1. Temporal Validity: Model valid time explicitly when needed (separate from transaction time)
  2. Change Tracking: Use history queries rather than storing change logs manually
  3. Immutable Design: Design for immutability from the start - never update in place
  4. Audit Patterns: Leverage history queries for audit trails instead of separate audit tables

Operational Considerations

  1. Index Maintenance: Monitor and tune background indexing for optimal historical query performance
  2. Storage Planning: Plan storage growth for historical data (all history is preserved)
  3. Query Optimization: Use time-specific queries (@t:) rather than datetime resolution (@iso:) when transaction numbers are known
  4. Backup Strategy: Include temporal aspects in backup/recovery plans - commits and indexes are both critical

Implementation Architecture

Transaction Pipeline

  1. Transaction Reception: Assign new transaction time (t)
  2. Validation: Check against current state
  3. Commitment: Persist transaction with ISO timestamp
  4. Commit Metadata: Store commit ContentId, timestamp, and optional transaction JSON
  5. Indexing: Background process creates new indexes
  6. Publication: Update nameservice with new transaction time

Time Travel Resolution

When you query with @t:, @iso:, or @commit::

  1. @t:NNN - Direct transaction number (fastest)
  2. @iso:DATETIME - Binary search through commit timestamps using POST index
  3. @commit:CID - Bounded SPOT scan to find matching commit

Query Execution

  1. Time Resolution: Resolve time specifiers to specific t values
  2. Index Selection: Choose appropriate index for target time
  3. Novelty Application: Apply intervening transactions if needed
  4. Result Generation: Return consistent snapshot

History Query Execution

  1. Time Range Detection: Supplying both from and to keys with time-specified endpoints activates history mode
  2. Pattern Resolution: WHERE patterns are executed with history mode enabled
  3. Metadata Capture: Transaction time (@t) and operation (@op) are captured for each binding
  4. Result Generation: Results include both assertions and retractions within the time range
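The four steps above amount to a range scan with metadata capture. A toy sketch over (s, p, o, t, op) tuples, where the flake shape and entity_history are illustrative:

```python
def entity_history(flakes, subject, t_from, t_to):
    """(property, value, t, op) rows for one subject within [t_from, t_to],
    chronologically ordered, with retractions and assertions both included."""
    rows = [(p, o, t, op)
            for (s, p, o, t, op) in flakes
            if s == subject and t_from <= t <= t_to]
    return sorted(rows, key=lambda r: r[2])

flakes = [
    ("ex:alice", "ex:name", "Alice", 1, True),
    ("ex:bob",   "ex:name", "Bob",   3, True),
    ("ex:alice", "ex:name", "Alice", 5, False),
    ("ex:alice", "ex:name", "Alicia", 5, True),
]
```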

This temporal foundation makes Fluree uniquely powerful for applications requiring complete historical visibility, audit capabilities, and temporal analytics.

Policy Enforcement

Fluree enforces access control inside the database. Individual facts (flakes) are filtered against policy rules during query and transaction execution, so the same query returns different results to different identities — automatically. The application doesn’t filter; the database does.

Why triple-level

Most databases enforce access at the row, table, or schema level. That granularity is awkward for graph data, where a single subject may have facts that are public (schema:name), employee-only (ex:department), and HR-only (ex:salary). Fluree’s enforcement happens per flake (a single ?subject ?predicate ?object statement) — so policies can permit name, allow department to platform employees, and restrict salary to managers in the same department, all from one query.

The consequences:

  • No application-side filtering. Security can’t be bypassed by buggy code paths because the database never returns flakes the requester isn’t allowed to see.
  • Auditable. Policies are themselves data. They live in the ledger, are time-travelable, and can be queried — SELECT ?p WHERE { ?p a f:AccessPolicy }.
  • Multi-tenant ready. A single ledger can serve many tenants, with isolation enforced at flake level.
  • Compliance-friendly. GDPR / HIPAA-style “minimum necessary” access is the default behavior, not a check the app forgot to do.

What a policy looks like

Every policy is a JSON-LD node typed f:AccessPolicy. A policy has three orthogonal pieces:

  • Targeting — f:onProperty, f:onClass, f:onSubject (each an array of @id references). Omit them all to make a default policy that applies to every flake.
  • Action — f:action with values f:view (queries) and/or f:modify (transactions).
  • Decision — either:
    • f:allow: true — unconditional allow, or
    • f:allow: false — unconditional deny, or
    • f:query: "<JSON-encoded WHERE>" — allow when the embedded query produces at least one binding for the targeted flake.

Two further knobs:

  • f:required: true — the policy must allow for access to the targeted flake to be granted, even when default-allow is true. Use it for hard constraints (PII protection, write barriers).
  • f:exMessage — a string returned to the caller when this policy denies a transaction.

A worked example:

{
  "@id": "ex:salary-restriction",
  "@type": ["f:AccessPolicy", "ex:CorpPolicy"],
  "f:required": true,
  "f:onProperty": [{"@id": "ex:salary"}],
  "f:action": [{"@id": "f:view"}],
  "f:query": "{\"where\": {\"@id\": \"?$identity\", \"http://example.org/role\": \"manager\"}}"
}

Translation: for every flake whose property is ex:salary and that someone is trying to read, this policy must allow. The embedded f:query runs with ?$identity pre-bound to the requester; if it returns a binding (i.e. the identity has role "manager"), the flake is permitted.

Variables in f:query

Inside an f:query, two variables are pre-bound:

  • ?$this — the subject of the targeted flake (the entity being read or written).
  • ?$identity — the IRI of the requesting identity, supplied via policy-values.

Anything else is bound by the embedded WHERE just like a normal Fluree query.

How the engine combines policies

When a request hits a flake, the engine collects every policy that targets it:

  1. Required policies (with f:required: true) must all allow. If any required policy denies — including by returning no f:query bindings — the flake is denied.
  2. If no required policies target the flake, any allow is enough. Fluree uses allow-overrides across the non-required set.
  3. If no policies apply at all, the request falls back to default-allow.

default-allow: false is fail-closed and the right choice for most production deployments.
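The three combination rules can be written as one small decision function. This is a conceptual sketch: the required/allows fields stand in for f:required and for the outcome of f:allow or the f:query binding check, and the caller is assumed to have already filtered by targeting:

```python
def decide(policies, default_allow=False):
    """Combine the policies that target one flake.

    Each policy dict carries 'required' (mirrors f:required) and 'allows'
    (the outcome of its f:allow flag or its f:query binding check)."""
    if not policies:
        return default_allow                        # rule 3: fall back
    required = [p for p in policies if p["required"]]
    if required:
        return all(p["allows"] for p in required)   # rule 1: all must allow
    return any(p["allows"] for p in policies)       # rule 2: allow-overrides
```

Note that a denying required policy wins even when a non-required policy would allow, which is what makes f:required suitable for hard constraints.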

Where policies come from

Two delivery channels, often mixed:

  • Stored — write policies into the ledger as data. Tag each policy with a class (e.g. ex:CorpPolicy), and tag each identity entity with f:policyClass linking to that class. At request time, pass policy-class: ["ex:CorpPolicy"] and the engine pulls the matching policy set from the ledger automatically. Stored policies are versioned, time-travelable, and consistent across all callers — the right approach for production.
  • Inline — pass policies in opts.policy (an array of policy nodes) or via the fluree-policy HTTP header. Useful for ad-hoc queries, automated tests, and admin scripts.

The two can be combined: a query can carry a policy-class and an additional inline policy.

Identity binding

An identity entity ties a caller (DID, JWT subject, application user) to graph nodes that policies can reason about:

{
  "@id": "ex:aliceIdentity",
  "ex:user": {"@id": "ex:alice"},
  "f:policyClass": [{"@id": "ex:CorpPolicy"}]
}

Caller traffic carrying identity: "ex:aliceIdentity" causes:

  1. Fluree binds ?$identity to ex:aliceIdentity in every f:query.
  2. Stored policies tagged ex:CorpPolicy are loaded.
  3. Each policy’s f:query runs against the snapshot, with ?$identity and ?$this pre-bound, deciding flake by flake whether the request is permitted.

The ex:user link is a domain-specific convention — your f:query clauses use it to reach from the identity to the human/service the policies should reason about. Any modeling works; nothing about that link is special to Fluree.

What you control at the request boundary

Each request can supply:

  • identity — IRI of the calling identity entity. Used to pre-bind ?$identity and to discover the identity’s f:policyClass.
  • policy-class — one or more class IRIs to pull stored policies by class.
  • policy-values — an object of additional ?$var bindings injected into every policy’s f:query.
  • policy — an inline JSON-LD policy array.
  • default-allow — boolean fallback for flakes no policy targets.

Over JSON-LD, these go inside opts. Over SPARQL, they’re sent as fluree-* headers (SPARQL has no opts block). When the server is configured with a default policy class, a verified bearer token’s identity is auto-applied — see the policy cookbook for the request shapes and the server-side data_auth_default_policy_class option in Configuration.

Query enforcement vs transaction enforcement

The same policy model governs both, distinguished by f:action:

  • f:view — runs during query execution. Flakes that fail the policy are filtered from the result; the query never sees them.
  • f:modify — runs during transaction staging. The transaction is rejected (with f:exMessage if provided) if a write would touch flakes the identity isn’t allowed to modify.

A single policy can govern both ("f:action": [{"@id": "f:view"}, {"@id": "f:modify"}]). Most realistic policy sets mix view-only restrictions, modify-only restrictions, and a small number of [f:view, f:modify] defaults.

Policies are data

Because policies are flakes:

  • Time travel. Query at past t to see what was in effect.
  • Branchable. Trial policies on a branch before merging.
  • Versionable. Edit through normal transactions; full history kept.
  • Self-querying. Run reports over the policies themselves.

This makes policy management a normal Fluree workflow rather than a sidecar problem.

Performance shape

Policy evaluation has two phases — load (read the policies relevant to this request once) and apply (filter flakes during plan execution). Cost scales mostly with the apply phase: how many flakes the request touches, and how expensive each policy’s f:query is.

Two practical implications:

  • Target policies. A policy with f:onProperty or f:onClass only runs on flakes whose predicate or rdf:type matches. Default policies (no targeting) run on every flake. Prefer targeting wherever it makes sense.
  • Keep f:query cheap. Lean on identity attributes already loaded (@type, f:policyClass, role flags) rather than deep traversals.

For deeper architectural detail see Policy model and inputs, Policy in queries, and Policy in transactions.

Verifiable Data

Differentiator: Fluree supports cryptographically signed transactions using industry-standard formats (JWS and Verifiable Credentials), enabling tamper-proof audit trails and trustless data exchange. Every transaction can be cryptographically verified, providing cryptographic proof of data provenance and integrity.

Note: Requires the credential feature flag. See Compatibility and Feature Flags.

What Is Verifiable Data?

Verifiable data in Fluree refers to transactions that are cryptographically signed, providing proof of:

  • Authenticity: Who created the transaction
  • Integrity: That the data hasn’t been tampered with
  • Non-repudiation: The signer cannot deny creating the transaction
  • Provenance: The origin and history of the data

Key Characteristics

  • Cryptographic Signatures: Transactions signed using standard cryptographic algorithms
  • Industry Standards: Support for JWS (JSON Web Signatures) and Verifiable Credentials (VC)
  • Tamper-Proof: Any modification to signed data invalidates the signature
  • Verifiable: Anyone can verify signatures without special access

Why Verifiable Data Matters

Traditional Database Limitations

Most databases provide:

  • Authentication: Who can access the database
  • Authorization: What they can do
  • Audit Logs: What happened (but logs can be modified)

Problems:

  • No cryptographic proof of data origin
  • Audit logs can be tampered with
  • Difficult to prove data integrity
  • No way to verify data across systems

Fluree’s Approach

Fluree provides:

  • Cryptographic Signatures: Every transaction can be signed
  • Tamper-Proof History: Signed transactions cannot be modified
  • Verifiable Provenance: Anyone can verify data origin
  • Trustless Exchange: Data can be shared without trusting intermediaries

Benefits:

  • Audit Compliance: Cryptographic proof for compliance requirements
  • Data Integrity: Detect any tampering with data
  • Trustless Systems: Enable trustless data exchange
  • Provenance Tracking: Track data origin cryptographically

Signed Transactions

JWS (JSON Web Signatures)

Fluree supports JWS for signing transactions:

Transaction Structure:

{
  "ledger": "mydb:main",
  "tx": [
    {
      "@id": "ex:alice",
      "ex:name": "Alice"
    }
  ],
  "signature": {
    "protected": {
      "alg": "ES256",
      "kid": "key-1"
    },
    "signature": "base64-encoded-signature"
  }
}

Verification:

  • Extract signature from transaction
  • Verify signature using signer’s public key
  • Confirm transaction hasn’t been modified

Verifiable Credentials

Fluree supports Verifiable Credentials (VC) for credential-based transactions:

VC Structure:

{
  "@context": [
    "https://www.w3.org/2018/credentials/v1"
  ],
  "type": ["VerifiableCredential"],
  "credentialSubject": {
    "@id": "ex:alice",
    "ex:name": "Alice"
  },
  "proof": {
    "type": "Ed25519Signature2020",
    "created": "2024-01-15T10:00:00Z",
    "verificationMethod": "did:example:alice#key-1",
    "proofValue": "base64-encoded-signature"
  }
}

Verification:

  • Verify credential proof
  • Check credential issuer
  • Validate credential structure
  • Confirm credential hasn’t been revoked

Transaction Signing

Signing a Transaction

Step 1: Prepare Transaction

{
  "ledger": "mydb:main",
  "tx": [
    {
      "@id": "ex:alice",
      "ex:name": "Alice"
    }
  ]
}

Step 2: Create Signature

// Pseudo-code
const payload = JSON.stringify(tx);
const signature = sign(payload, privateKey);

Step 3: Add Signature

{
  "ledger": "mydb:main",
  "tx": [...],
  "signature": {
    "protected": {
      "alg": "ES256",
      "kid": "key-1"
    },
    "signature": signature
  }
}

Signature Algorithms

Fluree supports standard signature algorithms:

  • ES256: ECDSA with P-256 and SHA-256
  • ES384: ECDSA with P-384 and SHA-384
  • ES512: ECDSA with P-521 and SHA-512
  • Ed25519: EdDSA with Ed25519 curve

Key Management

Public Key Storage:

Public keys can be stored:

  • In the ledger itself (as data)
  • In a separate key registry
  • In a DID (Decentralized Identifier) document

Example Public Key in Ledger:

{
  "@id": "ex:alice",
  "ex:publicKey": {
    "kty": "EC",
    "crv": "P-256",
    "x": "base64-x",
    "y": "base64-y"
  }
}

Transaction Verification

Verifying a Signed Transaction

Step 1: Extract Signature

{
  "signature": {
    "protected": {...},
    "signature": "base64-signature"
  }
}

Step 2: Get Public Key

// Pseudo-code
const kid = signature.protected.kid;
const publicKey = getPublicKey(kid);

Step 3: Verify Signature

// Pseudo-code
const payload = JSON.stringify(tx);
const isValid = verify(payload, signature.signature, publicKey);

Verification in Fluree

Fluree automatically verifies signed transactions:

  1. Signature Extraction: Extract signature from transaction
  2. Key Resolution: Resolve public key from signature
  3. Signature Verification: Verify cryptographic signature
  4. Transaction Acceptance: Accept transaction if signature valid

If verification fails:

  • Transaction is rejected
  • Error returned to client
  • No data is committed

Use Cases

Audit Compliance

Requirement: Cryptographic proof of all data changes

Solution: Sign all transactions

{
  "ledger": "audit:main",
  "tx": [
    {
      "@id": "ex:change1",
      "ex:action": "update",
      "ex:timestamp": "2024-01-15T10:00:00Z"
    }
  ],
  "signature": {...}
}

Benefits:

  • Cryptographic proof of changes
  • Tamper-proof audit trail
  • Compliance with regulations

Trustless Data Exchange

Requirement: Share data without trusting intermediaries

Solution: Sign data at source

{
  "ledger": "shared:main",
  "tx": [
    {
      "@id": "ex:data1",
      "ex:value": "sensitive-data",
      "ex:source": "ex:system-a"
    }
  ],
  "signature": {
    "protected": {
      "kid": "ex:system-a#key-1"
    },
    "signature": "..."
  }
}

Benefits:

  • Verify data origin
  • Detect tampering
  • Trustless data sharing

Multi-Party Systems

Requirement: Multiple parties contribute data

Solution: Each party signs their transactions

{
  "ledger": "consortium:main",
  "tx": [
    {
      "@id": "ex:contribution1",
      "ex:party": "ex:party-a",
      "ex:data": "..."
    }
  ],
  "signature": {
    "protected": {
      "kid": "ex:party-a#key-1"
    },
    "signature": "..."
  }
}

Benefits:

  • Identify data contributors
  • Verify party contributions
  • Enable accountability

Regulatory Compliance

Requirement: Prove data integrity for regulations

Solution: Sign all regulated data

Examples:

  • HIPAA: Healthcare data integrity
  • GDPR: Personal data provenance
  • SOX: Financial data integrity
  • FDA: Pharmaceutical data integrity

Verifiable Credentials

Credential Structure

Verifiable Credentials follow W3C VC standard:

{
  "@context": [
    "https://www.w3.org/2018/credentials/v1"
  ],
  "id": "ex:credential-1",
  "type": ["VerifiableCredential", "ex:IdentityCredential"],
  "issuer": "did:example:issuer",
  "issuanceDate": "2024-01-15T10:00:00Z",
  "credentialSubject": {
    "@id": "ex:alice",
    "ex:name": "Alice",
    "ex:email": "alice@example.com"
  },
  "proof": {
    "type": "Ed25519Signature2020",
    "created": "2024-01-15T10:00:00Z",
    "verificationMethod": "did:example:issuer#key-1",
    "proofValue": "base64-signature"
  }
}

Credential Verification

Step 1: Verify Proof

// Pseudo-code
const proof = credential.proof;
const publicKey = resolvePublicKey(proof.verificationMethod);
const isValid = verifyProof(credential, proof, publicKey);

Step 2: Check Issuer

// Pseudo-code
const issuer = credential.issuer;
const isTrusted = checkIssuerTrust(issuer);

Step 3: Validate Credential

// Pseudo-code
const isValid = validateCredentialStructure(credential);
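The structural checks in Step 3 can be sketched directly from the W3C VC data model; the exact validations Fluree performs may differ:

```javascript
// Hedged sketch: minimal structural validation of a VC (field names follow
// the W3C VC data model; this is not Fluree's actual validator)
function validateCredentialStructure(credential) {
  const errors = [];
  const ctx = credential['@context'];
  if (!Array.isArray(ctx) || ctx[0] !== 'https://www.w3.org/2018/credentials/v1') {
    errors.push('@context must start with the VC v1 context');
  }
  if (!Array.isArray(credential.type) ||
      !credential.type.includes('VerifiableCredential')) {
    errors.push('type must include VerifiableCredential');
  }
  if (!credential.credentialSubject) errors.push('credentialSubject is required');
  if (!credential.proof) errors.push('proof is required');
  return { valid: errors.length === 0, errors };
}

const result = validateCredentialStructure({
  '@context': ['https://www.w3.org/2018/credentials/v1'],
  type: ['VerifiableCredential'],
  credentialSubject: { '@id': 'ex:alice' },
  proof: { type: 'Ed25519Signature2020' },
});
console.log(result.valid); // true
```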

Credential Revocation

Credentials can be revoked:

{
  "@id": "ex:revocation-1",
  "@type": "ex:CredentialRevocation",
  "ex:credentialId": "ex:credential-1",
  "ex:revokedAt": "2024-01-20T10:00:00Z"
}

Verification should check revocation status.

Data Provenance

Tracking Data Origin

Signed transactions enable provenance tracking:

Query Transaction History:

SELECT ?tx ?signer ?timestamp
WHERE {
  ?tx ex:signature ?sig .
  ?sig ex:signer ?signer .
  ?tx ex:timestamp ?timestamp .
}
ORDER BY DESC(?timestamp)

Verify Data Chain:

SELECT ?data ?origin ?signer
WHERE {
  ?data ex:origin ?origin .
  ?origin ex:signature ?sig .
  ?sig ex:signer ?signer .
}

Provenance Verification

Step 1: Find Data Origin

SELECT ?tx
WHERE {
  ?tx ex:created ?data .
}

Step 2: Verify Transaction Signature

// Pseudo-code
const tx = getTransaction(txId);
const isValid = verifySignature(tx);

Step 3: Trace Provenance Chain

SELECT ?chain
WHERE {
  ?data ex:provenance ?chain .
  ?chain ex:signature ?sig .
}

Best Practices

Key Management

  1. Secure Storage: Store private keys securely
  2. Key Rotation: Rotate keys regularly
  3. Key Backup: Backup keys securely
  4. Key Recovery: Plan for key recovery

Signature Practices

  1. Always Sign: Sign all important transactions
  2. Verify Before Trust: Always verify signatures
  3. Standard Algorithms: Use standard signature algorithms
  4. Key Identification: Use clear key identifiers

Credential Management

  1. Issuer Trust: Establish issuer trust relationships
  2. Credential Validation: Validate credential structure
  3. Revocation Checking: Check revocation status
  4. Credential Storage: Store credentials securely

Compliance

  1. Audit Logging: Log all signature verifications
  2. Provenance Tracking: Track data provenance
  3. Regulatory Alignment: Align with regulations
  4. Documentation: Document verification processes

Comparison with Traditional Approaches

Traditional Audit Logs

Traditional Approach:

  • Logs stored in database
  • Can be modified by admins
  • No cryptographic proof
  • Difficult to verify

Problems:

  • Logs can be tampered with
  • No proof of authenticity
  • Difficult to verify
  • Not suitable for trustless systems

Fluree Verifiable Data

Fluree Approach:

  • Transactions cryptographically signed
  • Signatures cannot be forged
  • Anyone can verify
  • Suitable for trustless systems

Benefits:

  • Tamper-proof history
  • Cryptographic proof
  • Easy verification
  • Trustless data exchange

Architecture

Signature Storage

Signatures are stored with transactions:

  • Transaction Metadata: Signature stored in transaction metadata
  • Queryable: Signatures can be queried like any data
  • Versioned: Signature history tracked over time

Verification Engine

The verification engine:

  • Automatic Verification: Verifies signatures automatically
  • Key Resolution: Resolves public keys from signatures
  • Standard Compliance: Follows JWS and VC standards

API Integration

Verification integrated with:

  • Transaction API: Verifies signatures on transaction submission
  • Query API: Can query signature information
  • Admin API: Administrative operations on signatures

Verifiable data makes Fluree uniquely suited for applications requiring cryptographic proof of data integrity, audit compliance, and trustless data exchange. By supporting industry-standard signature formats, Fluree enables integration with existing identity systems and credential ecosystems.

Reasoning and Inference

Fluree includes a built-in reasoning engine that can derive new facts from your data based on ontology declarations (RDFS and OWL) or user-defined rules (Datalog). This page introduces the core concepts; see Query-time reasoning for usage syntax, Datalog rules for custom rules, and the OWL & RDFS reference for a full list of supported constructs.

Why reasoning?

In a plain triple store every fact must be stated explicitly. If you assert that Alice is a Student and that Student is a subclass of Person, a query for all Person instances will not return Alice — unless you also assert Alice rdf:type Person.

With reasoning enabled, Fluree can infer the missing fact automatically:

Alice  rdf:type  Student       (asserted)
Student  rdfs:subClassOf  Person   (schema)
────────────────────────────────────────────
Alice  rdf:type  Person        (inferred)

This keeps your data clean (no redundant assertions) while giving your queries the full power of schema-aware retrieval.

Reasoning modes

Fluree supports four reasoning profiles that can be enabled independently or in combination. They are listed here from lightest to most powerful:

  • RDFS — Expands rdfs:subClassOf and rdfs:subPropertyOf hierarchies so that querying for a superclass or superproperty also returns instances of its subclasses/subproperties. Cost: very low — query rewriting only, no materialization.
  • OWL 2 QL — Everything RDFS does, plus owl:inverseOf expansion and rdfs:domain/rdfs:range type inference via query rewriting. Based on the OWL 2 QL profile designed for query answering. Cost: low — query rewriting only.
  • OWL 2 RL — Forward-chaining materialization of a comprehensive rule set (symmetric, transitive, and inverse properties; functional properties; property chains; class restrictions; owl:sameAs equivalence; and more). See the OWL & RDFS reference for the full list. Cost: medium — derives facts before query execution; results are cached.
  • Datalog — User-defined if/then rules expressed in a familiar JSON-LD pattern syntax. Rules run in a fixpoint loop and can chain off each other or off OWL-derived facts. See Datalog rules. Cost: depends on the rules — can be lightweight or heavy.

Combining modes

Modes can be combined freely. For example, ["rdfs", "owl2rl", "datalog"] first materializes OWL 2 RL entailments, then runs your Datalog rules over the combined base + OWL-derived data, and finally applies RDFS query rewriting on top. This layering lets you start simple (RDFS) and add more powerful inference only where you need it.

How it works

Fluree uses two complementary techniques depending on the mode:

Query rewriting (RDFS, OWL 2 QL)

The query planner rewrites your patterns at compile time. For example, a ?x rdf:type ex:Person pattern is expanded into a UNION over Person and all of its subclasses. No extra data is stored; the rewriting is transparent to the caller.
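The expansion itself is a subclass-closure computation. A sketch of the idea (not Fluree's planner internals):

```javascript
// Compute the set of classes whose instances satisfy a type pattern:
// the queried class plus every direct or indirect subclass.
function subclassClosure(cls, subClassOf) {
  const result = new Set([cls]);
  let changed = true;
  while (changed) {
    changed = false;
    for (const [sub, sup] of subClassOf) {
      if (result.has(sup) && !result.has(sub)) {
        result.add(sub);
        changed = true;
      }
    }
  }
  return result;
}

const subClassOf = [
  ['ex:Student', 'ex:Person'],
  ['ex:GradStudent', 'ex:Student'],
];

// ?x rdf:type ex:Person becomes a UNION over all three classes
console.log([...subclassClosure('ex:Person', subClassOf)]);
// [ 'ex:Person', 'ex:Student', 'ex:GradStudent' ]
```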

Forward-chaining materialization (OWL 2 RL, Datalog)

Before your query runs, the engine:

  1. Loads the ontology — extracts OWL/RDFS declarations (property types, class hierarchies, restrictions) from your data.
  2. Applies rules in a fixpoint loop — each iteration derives new facts from the combination of asserted and previously-derived facts. The loop stops when no new facts are produced (fixpoint) or a budget limit is reached.
  3. Overlays derived facts — the inferred triples are layered on top of your base data as a read-only overlay. Your original data is never modified.
  4. Caches the result — if the same database state is queried again with the same reasoning modes, the cached materialization is reused instantly.
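The loop in the steps above can be illustrated with a single rule, transitivity of a hypothetical ex:partOf property, plus a derived-fact budget:

```javascript
// Fixpoint materialization sketch: derive transitive ex:partOf facts
// until no new fact appears or the budget is exhausted.
function materialize(base, maxDerived = 1000) {
  const facts = new Set(base.map(f => f.join(' ')));
  const derived = [];
  let changed = true;
  while (changed && derived.length < maxDerived) {
    changed = false;
    const snapshot = [...facts].map(f => f.split(' '));
    for (const a of snapshot) {
      for (const b of snapshot) {
        // (x partOf y) and (y partOf z) entail (x partOf z)
        if (a[1] === 'ex:partOf' && b[1] === 'ex:partOf' && a[2] === b[0]) {
          const fact = `${a[0]} ex:partOf ${b[2]}`;
          if (!facts.has(fact)) {
            facts.add(fact);
            derived.push(fact);
            changed = true;
          }
        }
      }
    }
  }
  return derived; // the read-only overlay; base facts are untouched
}

const base = [
  ['ex:wheel', 'ex:partOf', 'ex:car'],
  ['ex:car', 'ex:partOf', 'ex:fleet'],
];
console.log(materialize(base)); // [ 'ex:wheel ex:partOf ex:fleet' ]
```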

Budget controls

To guarantee termination, materialization enforces configurable limits:

  • Time — default 30 seconds. When exceeded, materialization stops and partial results are used.
  • Derived facts — default 1,000,000. When exceeded, materialization stops and partial results are used.
  • Memory — default 100 MB. When exceeded, materialization stops and partial results are used.

When a budget is exceeded the query still runs — it simply uses whatever facts were derived before the limit was hit. Diagnostics are available via tracing spans to identify when capping occurs.

Enabling reasoning

There are two levels of control:

1. Ledger-wide defaults (configuration graph)

Set reasoning defaults so every query against a ledger uses a particular mode without having to specify it each time:

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "insert": {
    "@id": "urn:fluree:mydb:main:config:ledger",
    "@type": "f:LedgerConfig",
    "f:reasoningDefaults": {
      "f:reasoningModes": {"@id": "f:RDFS"},
      "f:overrideControl": {"@id": "f:OverrideAll"}
    }
  }
}

See Setting groups — reasoningDefaults for full configuration options.

2. Per-query override

Any query can specify or override the reasoning mode:

{
  "select": ["?s"],
  "where": {"@id": "?s", "@type": "ex:Person"},
  "reasoning": "rdfs"
}

Use "reasoning": "none" to explicitly disable reasoning for a single query, even if the ledger has defaults configured.

See Query-time reasoning for complete syntax and examples.

Key concepts

Schema as data

Unlike systems with external schema files, Fluree stores ontology declarations as regular triples in your graph. An rdfs:subClassOf assertion is just another triple — you add it via a normal transaction:

{
  "@context": {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.org/"
  },
  "insert": {
    "@id": "ex:Student",
    "rdfs:subClassOf": {"@id": "ex:Person"}
  }
}

This means your schema evolves with your data, is time-travelable, and is subject to the same policy controls as any other data.

Derived facts are virtual

Inferred triples exist only in a query-time overlay — they are never written to storage. This means:

  • No storage bloat — you don’t pay disk costs for derived facts.
  • Always consistent — derived facts are recomputed from the current state, so they can never go stale.
  • Time-travel safe — querying a historical point in time materializes based on that point’s data and schema.

owl:sameAs and identity

When OWL 2 RL is enabled, the engine tracks owl:sameAs equivalences using an efficient union-find data structure. If two resources are determined to be the same (via functional properties, inverse functional properties, or owl:hasKey), all their facts are merged under a canonical representative. Queries transparently resolve through these equivalences.
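Union-find keeps that merging cheap: each lookup follows parent pointers to the canonical representative, compressing paths as it goes. A sketch of the structure (not Fluree's implementation):

```javascript
// Minimal union-find with path compression for owl:sameAs tracking
class UnionFind {
  constructor() { this.parent = new Map(); }
  find(x) {
    if (!this.parent.has(x)) this.parent.set(x, x);
    if (this.parent.get(x) !== x) {
      this.parent.set(x, this.find(this.parent.get(x))); // path compression
    }
    return this.parent.get(x);
  }
  union(a, b) { this.parent.set(this.find(a), this.find(b)); }
}

const sameAs = new UnionFind();
sameAs.union('ex:alice', 'ex:a-smith');     // e.g. derived via owl:hasKey
sameAs.union('ex:a-smith', 'did:ex:alice'); // e.g. via an inverse-functional property

// All three identifiers resolve to one canonical representative
console.log(sameAs.find('ex:alice') === sameAs.find('did:ex:alice')); // true
```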

Where to go next:

  • Using reasoning in queries — Query-time reasoning
  • Writing custom inference rules — Datalog rules
  • Full list of supported OWL & RDFS constructs — OWL & RDFS reference
  • Configuring ledger-wide defaults — Setting groups

Guides

Practical, task-oriented cookbooks for Fluree’s key features. Each guide shows working patterns you can adapt to your use case.

If you’re new to Fluree, start with the Getting Started section first.

Cookbooks

Full-Text and Vector Search

Set up BM25 full-text search and vector similarity. Insert searchable data, write relevance-ranked queries, combine search with graph patterns, and build hybrid text+vector search.

Time Travel

Practical patterns for temporal queries: audit trails, point-in-time comparison, compliance snapshots, recovering deleted data, and transaction metadata.

Branching and Merging

Git-like workflows for data: safe experimentation, review-before-merge, multi-environment setups, feature branches, and rebase strategies.

Access Control Policies

Set up fine-grained access control: department isolation, role-based access, property redaction, multi-tenant isolation, and default-deny patterns.

SHACL Validation

Define data quality constraints: required properties, datatype validation, value ranges, string patterns, cardinality, and allowed values.

Cookbook: Full-Text and Vector Search

Fluree integrates BM25 full-text search and vector similarity directly into the query engine. Search results participate in joins, filters, and aggregations like any other graph pattern — no external search service needed.

This guide covers practical patterns for both approaches.

1. Insert searchable data

Annotate string values with @fulltext to make them searchable:

fluree insert '{
  "@context": {"ex": "http://example.org/"},
  "@graph": [
    {
      "@id": "ex:doc1",
      "@type": "ex:Article",
      "ex:title": "Introduction to Graph Databases",
      "ex:body": {
        "@value": "Graph databases model data as nodes and edges, making relationship queries fast and intuitive. Unlike relational databases, graph databases traverse relationships without expensive joins.",
        "@type": "@fulltext"
      }
    },
    {
      "@id": "ex:doc2",
      "@type": "ex:Article",
      "ex:title": "Time Series vs Graph: When to Use Which",
      "ex:body": {
        "@value": "Time series databases excel at ordered, append-only data. Graph databases shine when relationships between entities matter more than temporal ordering.",
        "@type": "@fulltext"
      }
    },
    {
      "@id": "ex:doc3",
      "@type": "ex:Article",
      "ex:title": "Building REST APIs with Rust",
      "ex:body": {
        "@value": "Rust provides memory safety without garbage collection, making it ideal for high-performance API servers. Popular frameworks include Actix and Axum.",
        "@type": "@fulltext"
      }
    }
  ]
}'

In Turtle, use ^^f:fullText:

fluree insert '
@prefix ex: <http://example.org/> .
@prefix f:  <https://ns.flur.ee/db#> .

ex:doc4 a ex:Article ;
  ex:title "SPARQL Query Optimization" ;
  ex:body  "Optimizing SPARQL queries requires understanding triple patterns, join ordering, and index selection. The query planner reorders patterns based on estimated cardinality."^^f:fullText .
'

2. Search with relevance scoring

fluree query '{
  "@context": {"ex": "http://example.org/"},
  "select": ["?title", "?score"],
  "where": [
    {"@id": "?doc", "@type": "ex:Article", "ex:body": "?body", "ex:title": "?title"},
    ["bind", "?score", "(fulltext ?body \"graph database relationships\")"],
    ["filter", "(> ?score 0)"]
  ],
  "orderBy": [["desc", "?score"]],
  "limit": 10
}'

The fulltext() function returns a BM25 relevance score. Higher scores mean better matches. Documents with none of the search terms score 0.
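For intuition, BM25 weighs term frequency against document length and term rarity across the corpus. A toy scorer over the three articles above (illustrative k1/b constants, no stemming; not Fluree's implementation):

```javascript
// Toy BM25: score a document's tokens against a query, using corpus-wide
// document frequency for the IDF term. Constants k1/b are conventional.
const k1 = 1.2, b = 0.75;

function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

function bm25Score(query, docTokens, corpus) {
  const avgLen = corpus.reduce((s, d) => s + d.length, 0) / corpus.length;
  let score = 0;
  for (const term of tokenize(query)) {
    const df = corpus.filter(d => d.includes(term)).length; // document frequency
    if (df === 0) continue;
    const idf = Math.log(1 + (corpus.length - df + 0.5) / (df + 0.5));
    const tf = docTokens.filter(t => t === term).length;    // term frequency
    score += idf * (tf * (k1 + 1)) /
      (tf + k1 * (1 - b + b * docTokens.length / avgLen));
  }
  return score;
}

const corpus = [
  tokenize('Graph databases model data as nodes and edges'),
  tokenize('Time series databases excel at ordered append-only data'),
  tokenize('Rust provides memory safety without garbage collection'),
];
const scores = corpus.map(d => bm25Score('graph database relationships', d, corpus));
console.log(scores[0] > 0, scores[2] === 0); // true true
```

Note that without stemming "database" does not match "databases"; production full-text indexes typically normalize tokens before scoring.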

3. Combine search with graph filters

Search only within a specific category:

fluree query '{
  "@context": {"ex": "http://example.org/"},
  "select": ["?title", "?score"],
  "where": [
    {
      "@id": "?doc", "@type": "ex:Article",
      "ex:body": "?body", "ex:title": "?title",
      "ex:category": "databases"
    },
    ["bind", "?score", "(fulltext ?body \"query optimization\")"],
    ["filter", "(> ?score 0)"]
  ],
  "orderBy": [["desc", "?score"]]
}'

Place graph filters before the fulltext() bind to reduce the number of documents scored.

Patterns

Search across multiple properties

If both title and body are @fulltext, score them separately and combine:

fluree query '{
  "@context": {"ex": "http://example.org/"},
  "select": ["?title", "?combined"],
  "where": [
    {"@id": "?doc", "ex:ftTitle": "?ft", "ex:body": "?body", "ex:title": "?title"},
    ["bind", "?titleScore", "(fulltext ?ft \"graph databases\")"],
    ["bind", "?bodyScore", "(fulltext ?body \"graph databases\")"],
    ["bind", "?combined", "(+ (* ?titleScore 2.0) ?bodyScore)"],
    ["filter", "(> ?combined 0)"]
  ],
  "orderBy": [["desc", "?combined"]]
}'

This weights title matches 2x higher than body matches.

Search with time travel

Search the knowledge base as it existed at a previous point in time:

fluree query '{
  "@context": {"ex": "http://example.org/"},
  "from": "mydb:main@t:5",
  "select": ["?title", "?score"],
  "where": [
    {"@id": "?doc", "ex:body": "?body", "ex:title": "?title"},
    ["bind", "?score", "(fulltext ?body \"deployment\")"],
    ["filter", "(> ?score 0)"]
  ],
  "orderBy": [["desc", "?score"]]
}'

Search with aggregation

Count matches by category:

fluree query '{
  "@context": {"ex": "http://example.org/"},
  "select": ["?category", "?count"],
  "where": [
    {"@id": "?doc", "ex:body": "?body", "ex:category": "?category"},
    ["bind", "?score", "(fulltext ?body \"database\")"],
    ["filter", "(> ?score 0)"]
  ],
  "groupBy": "?category",
  "aggregate": {"?count": ["count", "?doc"]}
}'

1. Insert vector embeddings

Annotate arrays with @vector:

fluree insert '{
  "@context": {"ex": "http://example.org/"},
  "@graph": [
    {
      "@id": "ex:product1",
      "@type": "ex:Product",
      "ex:name": "Wireless Headphones",
      "ex:embedding": {"@value": [0.82, 0.15, 0.91, 0.23], "@type": "@vector"}
    },
    {
      "@id": "ex:product2",
      "@type": "ex:Product",
      "ex:name": "Bluetooth Speaker",
      "ex:embedding": {"@value": [0.78, 0.12, 0.88, 0.31], "@type": "@vector"}
    },
    {
      "@id": "ex:product3",
      "@type": "ex:Product",
      "ex:name": "Running Shoes",
      "ex:embedding": {"@value": [0.11, 0.95, 0.05, 0.87], "@type": "@vector"}
    }
  ]
}'

Vectors are stored as f32. Values are quantized at ingest time.

2. Find similar items

Use cosineSimilarity (or dotProduct, euclideanDistance) to rank by similarity:

fluree query '{
  "@context": {"ex": "http://example.org/"},
  "select": ["?name", "?sim"],
  "where": [
    {"@id": "?product", "@type": "ex:Product", "ex:name": "?name", "ex:embedding": "?vec"},
    ["bind", "?sim", "(cosineSimilarity ?vec [0.80, 0.14, 0.90, 0.25])"],
    ["filter", "(> ?sim 0.9)"]
  ],
  "orderBy": [["desc", "?sim"]],
  "limit": 5
}'
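Cosine similarity measures the angle between two vectors, rewarding direction over magnitude. A reference implementation of what the binding computes, using the embeddings inserted above:

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] (1 = same direction)
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

const query = [0.80, 0.14, 0.90, 0.25];
const headphones = [0.82, 0.15, 0.91, 0.23]; // ex:product1's embedding
const shoes = [0.11, 0.95, 0.05, 0.87];      // ex:product3's embedding

console.log(cosineSimilarity(query, headphones) > 0.9); // true: passes the filter
console.log(cosineSimilarity(query, shoes) > 0.9);      // false: filtered out
```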

3. Combine vector search with graph patterns

Find products similar to a query vector, but only in a specific category:

fluree query '{
  "@context": {"ex": "http://example.org/"},
  "select": ["?name", "?sim"],
  "where": [
    {
      "@id": "?product", "@type": "ex:Product",
      "ex:name": "?name", "ex:embedding": "?vec",
      "ex:category": "electronics"
    },
    ["bind", "?sim", "(cosineSimilarity ?vec [0.80, 0.14, 0.90, 0.25])"]
  ],
  "orderBy": [["desc", "?sim"]],
  "limit": 10
}'

Hybrid search: text + vector

Combine BM25 keyword relevance with vector semantic similarity for the best of both:

fluree query '{
  "@context": {"ex": "http://example.org/"},
  "select": ["?name", "?hybrid"],
  "where": [
    {
      "@id": "?doc", "ex:name": "?name",
      "ex:description": "?desc", "ex:embedding": "?vec"
    },
    ["bind", "?textScore", "(fulltext ?desc \"wireless audio\")"],
    ["bind", "?vecScore", "(cosineSimilarity ?vec [0.80, 0.14, 0.90, 0.25])"],
    ["bind", "?hybrid", "(+ (* ?textScore 0.4) (* ?vecScore 0.6))"],
    ["filter", "(> ?hybrid 0)"]
  ],
  "orderBy": [["desc", "?hybrid"]],
  "limit": 10
}'

Adjust the weights (0.4 text, 0.6 vector) based on your use case. Keyword search is better for exact term matching; vector search is better for semantic similarity.

When to use which

  • Inline @fulltext — best for keyword search and document ranking; up to ~500K documents per property
  • BM25 graph source — best for large-scale text search with WAND pruning; 1M+ documents
  • Inline @vector + similarity — best for small-to-medium similarity search; up to ~100K vectors
  • HNSW index — best for large-scale approximate nearest neighbor; 100K+ vectors

Performance tips

  1. Place graph filters before search — Reduce the candidate set before scoring
  2. Use limit — BM25 and similarity scoring are per-document operations
  3. Wait for indexing — Inline @fulltext works without an index (novelty fallback) but is 7x faster with a built index
  4. Choose the right scale — Inline functions work well up to hundreds of thousands of documents. For millions, use the dedicated graph source pipeline

Cookbook: Time Travel

Every transaction in Fluree is immutable. The database preserves complete history automatically — no audit tables, no trigger-based logging, no slowly-changing dimensions. This guide covers practical patterns for using time travel.

Basics

Query by transaction number

Every transaction increments a counter (t). Query data as it was after any transaction:

# Current state
fluree query 'SELECT ?name ?salary WHERE { ?p schema:name ?name ; ex:salary ?salary }'

# State after transaction 5
fluree query --at 5 'SELECT ?name ?salary WHERE { ?p schema:name ?name ; ex:salary ?salary }'

# State after the very first transaction
fluree query --at 1 'SELECT ?s ?p ?o WHERE { ?s ?p ?o }'

Query by ISO timestamp

Use a timestamp to query the state at a specific moment:

fluree query --at 2025-01-15T00:00:00Z \
  'SELECT ?name ?email WHERE { ?p schema:name ?name ; schema:email ?email }'

Fluree finds the most recent transaction at or before the given timestamp.
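That resolution is equivalent to scanning the commit log for the last commit whose wall-clock time is at or before the target instant. A sketch over an in-memory commit list (the commit shape here is illustrative):

```javascript
// Resolve an ISO timestamp to a transaction number: pick the most recent
// commit at or before the given instant (commits sorted ascending by time).
function resolveT(commits, atIso) {
  const at = Date.parse(atIso);
  let resolved = null;
  for (const c of commits) {
    if (Date.parse(c.time) <= at) resolved = c.t;
    else break;
  }
  return resolved; // null if the timestamp predates the ledger
}

const commits = [
  { t: 1, time: '2025-01-10T09:00:00Z' },
  { t: 2, time: '2025-01-14T12:00:00Z' },
  { t: 3, time: '2025-01-16T08:00:00Z' },
];
console.log(resolveT(commits, '2025-01-15T00:00:00Z')); // 2
```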

Query by commit ID

Every commit has a content-addressed ID (CID). Query by exact commit:

fluree query --at bafyreif... \
  'SELECT ?s ?p ?o WHERE { ?s ?p ?o }'

HTTP API

# By transaction number
curl -X POST 'http://localhost:8090/v1/fluree/query?ledger=mydb:main&t=5' \
  -H "Content-Type: application/sparql-query" \
  -d 'SELECT ?s ?p ?o WHERE { ?s ?p ?o }'

# By timestamp (URL-encoded)
curl -X POST 'http://localhost:8090/v1/fluree/query?ledger=mydb:main&t=2025-01-15T00%3A00%3A00Z' \
  -H "Content-Type: application/sparql-query" \
  -d 'SELECT ?s ?p ?o WHERE { ?s ?p ?o }'

JSON-LD query with time specifier

{
  "from": "mydb:main@t:5",
  "select": ["?name"],
  "where": [{"@id": "?p", "schema:name": "?name"}]
}

Patterns

Audit trail: who changed what

View the history of changes to a specific entity:

fluree history 'PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>

SELECT ?prop ?value ?t ?op WHERE {
  ex:alice ?prop ?value .
}'

Each result includes:

  • ?t — the transaction number
  • ?op — assert (added) or retract (removed)

Point-in-time comparison

Compare an entity before and after a change:

# Before the change (t=5)
fluree query --at 5 'SELECT ?salary WHERE { ex:alice ex:salary ?salary }'

# After the change (t=6)
fluree query --at 6 'SELECT ?salary WHERE { ex:alice ex:salary ?salary }'

Find when a value changed

Track salary history:

fluree history 'SELECT ?salary ?t ?op WHERE { ex:alice ex:salary ?salary }'

Output:

?salary  ?t  ?op
85000    1   assert      ← Initial salary
85000    4   retract     ← Old value removed
95000    4   assert      ← New value added
95000    7   retract
110000   7   assert      ← Another raise

Each update produces a retract/assert pair at the same t.
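That retract/assert structure is all you need to reconstruct a value at any point: replay the operations in order and stop after the requested t. A sketch:

```javascript
// Rebuild a property's value set at transaction t by replaying history
// (entries sorted ascending by t).
function valueAt(history, t) {
  const current = new Set();
  for (const { value, t: ht, op } of history) {
    if (ht > t) break;
    if (op === 'assert') current.add(value);
    else current.delete(value);
  }
  return [...current];
}

const salaryHistory = [
  { value: 85000,  t: 1, op: 'assert'  },
  { value: 85000,  t: 4, op: 'retract' },
  { value: 95000,  t: 4, op: 'assert'  },
  { value: 95000,  t: 7, op: 'retract' },
  { value: 110000, t: 7, op: 'assert'  },
];
console.log(valueAt(salaryHistory, 5)); // [ 95000 ]
```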

Compliance snapshot

Generate a report of all data as it existed on a specific date:

fluree query --at 2025-06-30T23:59:59Z --format csv \
  'PREFIX schema: <http://schema.org/>
   PREFIX ex: <http://example.org/>

   SELECT ?name ?department ?role
   WHERE {
     ?person a schema:Person ;
             schema:name ?name ;
             ex:department ?department ;
             ex:role ?role .
   }
   ORDER BY ?department ?name' > compliance-report-2025-Q2.csv

This is a reproducible snapshot — running the same query with the same timestamp always returns the same results.

Debugging: find what changed between two points

Compare entity states across a range:

# What was added or removed between t=10 and t=15?
fluree history 'SELECT ?s ?p ?o ?t ?op WHERE {
  ?s ?p ?o .
  FILTER(?t >= 10 && ?t <= 15)
}'

Recover deleted data

Data that was retracted still exists in history:

# Carol was deleted at t=8. Recover her data from t=7:
fluree query --at 7 'SELECT ?prop ?value WHERE { ex:carol ?prop ?value }'

To restore, simply re-insert the data from the historical query.

Multi-ledger time travel

Query two ledgers at different points in time:

{
  "from": {
    "products": {"ledger": "catalog:main", "t": 10},
    "orders": {"ledger": "orders:main", "t": 25}
  },
  "select": ["?product", "?price", "?qty"],
  "where": [
    {"@id": "?order", "ex:product": "?p", "ex:quantity": "?qty", "@graph": "orders"},
    {"@id": "?p", "schema:name": "?product", "schema:price": "?price", "@graph": "products"}
  ]
}

This joins product data from t=10 with order data from t=25 — useful for price-at-time-of-purchase analysis.

Temporal aggregation

Track how a metric changed over time:

fluree history 'SELECT ?count ?t ?op WHERE {
  ex:dashboard ex:activeUsers ?count
}'

Transaction metadata

Every commit records metadata. Query it via the txn-meta graph:

PREFIX f: <https://ns.flur.ee/db#>

SELECT ?t ?timestamp ?author
FROM <urn:fluree:knowledge-base:main#txn-meta>
WHERE {
  ?commit f:t ?t ;
          f:time ?timestamp .
  OPTIONAL { ?commit f:author ?author }
}
ORDER BY DESC(?t)
LIMIT 10

Common questions

Is time travel expensive? No. Querying a historical state uses the same indexes as querying the current state. The cost is O(log n) for index lookups.

Does old data use extra storage? Yes — immutability means retracted values are preserved. Storage grows with the number of changes, not just the current state size. For most workloads this is negligible.

Can I query “between” two points? History queries return all changes with their transaction numbers. Use FILTER on ?t to scope to a range.

Can I delete history? No. Immutability is a core guarantee. If you need to remove data for compliance (e.g., GDPR right to erasure), contact the Fluree team about data compaction options.

Cookbook: Branching and Merging

Fluree lets you fork a ledger into independent branches, each with its own commit history. Experiment freely, then merge changes back when ready. Think of it like git branch for your data.

Quick start

# Create a branch from main
fluree branch create experiment

# Switch to the branch
fluree use mydb:experiment

# Make changes (only on the branch)
fluree insert '...'
fluree update '...'

# See both branches
fluree branch list

# Merge back into main
fluree branch merge experiment

# Clean up
fluree branch drop experiment

Core concepts

  • Branches are isolated — Transactions on one branch are invisible to others
  • Branches are cheap — Creating a branch doesn’t copy data; it creates a new commit pointer
  • Merge is fast-forward — The target branch must not have diverged. If it has, rebase first
  • Source branch survives merge — After merging, the branch can continue receiving transactions

Patterns

Safe experimentation

Try a risky change without affecting production:

fluree branch create try-new-schema

fluree use mydb:try-new-schema

# Restructure data
fluree update 'PREFIX ex: <http://example.org/>
DELETE { ?doc ex:category ?cat }
INSERT { ?doc ex:tags ?cat }
WHERE  { ?doc ex:category ?cat }'

# Verify the change looks right
fluree query 'SELECT ?doc ?tag WHERE { ?doc ex:tags ?tag }'

# If it works, merge back
fluree branch merge try-new-schema
fluree branch drop try-new-schema

# If it doesn't work, just drop the branch — main is untouched
fluree branch drop try-new-schema

Review before merge

Use branches as a staging area for data changes:

# Data engineer creates a branch for the weekly import
fluree branch create weekly-import

fluree use mydb:weekly-import

# Import new data
fluree insert -f new-data.ttl

# Verify: count new entities
fluree query 'SELECT (COUNT(?s) AS ?count) WHERE { ?s a ex:NewRecord }'

# Verify: no duplicates
fluree query 'SELECT ?id (COUNT(?s) AS ?count) WHERE {
  ?s ex:externalId ?id
} GROUP BY ?id HAVING(?count > 1)'

# Looks good — merge into main
fluree branch merge weekly-import

Multi-environment workflow

Use branches to model dev/staging/prod environments within a single ledger:

# Create environment branches
fluree branch create staging
fluree branch create dev --from staging

# Developers work on dev
fluree use mydb:dev
fluree insert '...'

# Promote to staging via merge
fluree branch merge dev --target staging

# Promote to main (production) after testing
fluree use mydb:staging
# ... run validation queries ...
fluree branch merge staging

Feature branches

Multiple people can work on different features simultaneously:

# Team member A: add product categories
fluree branch create feature-categories

# Team member B: update pricing
fluree branch create feature-pricing

# Each works independently on their branch
# ...

# Merge sequentially — first one is a fast-forward
fluree branch merge feature-categories

# Second one may need rebase if main advanced
fluree branch rebase feature-pricing
fluree branch merge feature-pricing

Rebase to catch up with upstream

When main has advanced since you branched:

# Main has new commits that your branch doesn't have
fluree branch rebase my-branch

This replays your branch’s commits on top of main’s current HEAD. Conflict strategies:

  • take-both (default) — keep both the source and branch changes
  • abort — stop if any conflicts occur, so you can inspect manually
  • take-source — the source (main) wins on conflict
  • take-branch — the branch wins on conflict
  • skip — skip conflicting commits entirely

# Rebase with abort on conflict for manual review
fluree branch rebase my-branch --strategy abort

# Rebase where main always wins
fluree branch rebase my-branch --strategy take-source

Compare branches

See what’s different between two branches:

# Query branch for entities not in main
fluree query --ledger mydb:my-branch 'SELECT ?s ?p ?o WHERE {
  ?s ?p ?o .
  FILTER NOT EXISTS {
    SERVICE <fluree:ledger:mydb:main> { ?s ?p ?o }
  }
}'

Time travel across branches

Each branch has its own transaction history. Query any branch at any point in time:

# Branch state after its 3rd transaction
fluree query --ledger mydb:experiment --at 3 'SELECT ?s ?p ?o WHERE { ?s ?p ?o }'

Branch at a historical point

By default, branch create starts the new branch at the source’s current HEAD. Pass --at to start it at an earlier commit on the source branch instead — useful for recovering to a known-good state, forking off an older release, or experimenting with what-if scenarios from a past point in time.

# Start a branch at transaction 5 on main
fluree branch create rewind --at t:5

# Or use a hex-digest prefix of the commit
fluree branch create rewind --at 3dd028a7

The commit must be reachable from the source branch’s HEAD (branching from an unrelated branch’s commit is rejected). The new branch starts with no index and replays from genesis on first query — acceptable for small/medium histories; if replay cost matters, transact a small no-op to force an index rebuild.

Full CIDs are also accepted (--at fluree:commit:sha256:...) and resolve without requiring the source to be indexed; t:N and hex prefixes require an indexed source.

Branch lifecycle

create ──→ transact ──→ rebase (if needed) ──→ merge ──→ drop
              ↑                                  │
              └──────── continue working ←───────┘

After merging, the branch is still alive. You can:

  • Continue transacting on it (for ongoing work)
  • Merge again later (only new commits since last merge are copied)
  • Drop it when done

HTTP API

# Create a branch
curl -X POST http://localhost:8090/v1/fluree/branch \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb", "branch": "dev", "source": "main"}'

# Branch at a historical commit
curl -X POST http://localhost:8090/v1/fluree/branch \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb", "branch": "rewind", "at": "t:5"}'

# Query a specific branch
curl -X POST 'http://localhost:8090/v1/fluree/query?ledger=mydb:dev' \
  -H "Content-Type: application/sparql-query" \
  -d 'SELECT ?s ?p ?o WHERE { ?s ?p ?o }'

# Merge
curl -X POST http://localhost:8090/v1/fluree/merge \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb", "source": "dev"}'

Best practices

  1. Name branches descriptively — weekly-import-2025-04, feature-product-tags, not test1
  2. Keep branches short-lived — Long-lived branches diverge more, making rebase harder
  3. Merge frequently — Small, frequent merges are easier than large, infrequent ones
  4. Test before merging — Run validation queries on the branch before promoting
  5. Drop after merging — Clean up branches you’re done with

Cookbook: Access Control Policies

Fluree policies enforce access control inside the database — individual facts (flakes) are filtered based on the requesting identity. The same query returns different results for different users, automatically. No application-layer filtering needed.

This cookbook walks through the common patterns. For the underlying model see Policy enforcement; for the full reference see Policy model and inputs.

How a policy is shaped

Every policy is a JSON-LD node typed f:AccessPolicy. It has three orthogonal pieces:

| Field | Purpose |
|---|---|
| What it targets | f:onProperty, f:onClass, f:onSubject (any combination, each an array of @id references). Omit all three to make a default policy that applies to every flake. |
| What it governs | f:action: f:view (queries), f:modify (transactions), or both. |
| Whether it permits | Either f:allow: true (unconditional allow), f:allow: false (deny), or f:query: "<JSON-encoded WHERE>" (allow when the embedded query returns at least one binding for the target). |

Two more knobs:

  • f:required: true — the policy must allow for access to be granted on its targets, even if default-allow is true. Use it for hard constraints.
  • f:exMessage — error message returned to the caller when the policy denies a transaction.

Inside f:query, two special variables are pre-bound: ?$this (the entity being checked) and ?$identity (the requesting identity, supplied via policy-values).

Quick start

1. Insert sample data

fluree insert '{
  "@context": {
    "schema": "http://schema.org/",
    "ex": "http://example.org/"
  },
  "@graph": [
    {
      "@id": "ex:alice",
      "@type": "schema:Person",
      "schema:name": "Alice Chen",
      "ex:role": "engineer",
      "ex:department": "platform",
      "ex:salary": 130000
    },
    {
      "@id": "ex:bob",
      "@type": "schema:Person",
      "schema:name": "Bob Martinez",
      "ex:role": "manager",
      "ex:department": "platform",
      "ex:salary": 155000
    },
    {
      "@id": "ex:carol",
      "@type": "schema:Person",
      "schema:name": "Carol White",
      "ex:role": "engineer",
      "ex:department": "marketing",
      "ex:salary": 115000
    }
  ]
}'

Add identity records that link DIDs / users to the entities they represent:

fluree insert '{
  "@context": {"ex": "http://example.org/", "f": "https://ns.flur.ee/db#"},
  "@graph": [
    { "@id": "ex:aliceIdentity", "ex:user": {"@id": "ex:alice"},
      "f:policyClass": [{"@id": "ex:CorpPolicy"}] },
    { "@id": "ex:bobIdentity",   "ex:user": {"@id": "ex:bob"},
      "f:policyClass": [{"@id": "ex:CorpPolicy"}] }
  ]
}'

f:policyClass tags an identity with the set of policy classes that apply to it — every stored policy of that class will be loaded automatically when this identity makes a request.

2. Insert policies

Policies are data — they go into the ledger like any other graph:

fluree insert '{
  "@context": {
    "f": "https://ns.flur.ee/db#",
    "ex": "http://example.org/",
    "schema": "http://schema.org/"
  },
  "@graph": [
    {
      "@id": "ex:salary-restriction",
      "@type": ["f:AccessPolicy", "ex:CorpPolicy"],
      "f:required": true,
      "f:onProperty": [{"@id": "ex:salary"}],
      "f:action": [{"@id": "f:view"}],
      "f:query": "{\"where\": {\"@id\": \"?$identity\", \"http://example.org/user\": {\"@id\": \"?$subject\"}, \"http://example.org/role\": \"manager\", \"http://example.org/department\": \"?dept\"}, \"$where\": {\"@id\": \"?$this\", \"http://example.org/department\": \"?dept\"}}"
    },
    {
      "@id": "ex:default-view",
      "@type": ["f:AccessPolicy", "ex:CorpPolicy"],
      "f:action": [{"@id": "f:view"}],
      "f:allow": true
    }
  ]
}'

What this set of two policies says:

  1. ex:salary-restriction is required for ex:salary: a request can read ex:salary only when f:query returns a binding. The query says: given the identity, find the user it represents; if that user is a manager in the same department as the entity being viewed (?$this), allow.
  2. ex:default-view allows reading everything else.

f:query is stored as a JSON string inside the policy because RDF can’t hold structured JSON natively. When loaded, the engine parses it and runs it as a subquery with ?$this and ?$identity pre-bound.
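
Hand-escaping that JSON string is fiddly and easy to get wrong. A minimal sketch (in Python; `encode_policy_query` is an illustrative helper, not a Fluree API) that builds the stored string from a plain data structure:

```python
import json

def encode_policy_query(query: dict) -> str:
    """Serialize a policy WHERE clause to the JSON string form stored
    under f:query (illustrative helper, not part of any Fluree client)."""
    return json.dumps(query)

where = {
    "where": {
        "@id": "?$identity",
        "http://example.org/user": {"@id": "?$subject"},
        "http://example.org/role": "manager",
        "http://example.org/department": "?dept",
    },
    "$where": {"@id": "?$this", "http://example.org/department": "?dept"},
}

encoded = encode_policy_query(where)
# The string can be embedded as the value of f:query and parses back
# to the original structure:
assert json.loads(encoded) == where
```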

3. Query as different identities

As Alice (engineer in platform — no manager privilege):

fluree query '{
  "@context": {"schema": "http://schema.org/", "ex": "http://example.org/"},
  "from": "mydb:main",
  "select": ["?name", "?salary"],
  "where": [
    {"@id": "?p", "schema:name": "?name"},
    ["optional", {"@id": "?p", "ex:salary": "?salary"}]
  ],
  "opts": {
    "identity": "ex:aliceIdentity",
    "policy-class": ["ex:CorpPolicy"],
    "default-allow": false
  }
}'

Alice sees every name but no salaries — the required policy denies ex:salary because she isn’t a manager.

As Bob (manager in platform):

Same query, but "identity": "ex:bobIdentity". Bob sees salaries for Alice and Bob (same department) but Carol’s salary stays hidden — different department.

Inline policies (no insert needed)

Don’t want to commit policies to the ledger yet? Pass them inline via opts.policy:

{
  "from": "mydb:main",
  "select": "?name",
  "where": [{"@id": "?p", "schema:name": "?name"}],
  "opts": {
    "policy": [
      {
        "@id": "ex:adhoc-allow",
        "@type": "f:AccessPolicy",
        "f:action": "f:view",
        "f:allow": true
      }
    ],
    "default-allow": false
  }
}

Inline policies are useful for one-off queries, automated tests, and admin scripts. Stored policies (with policy-class) are the right approach for production access control because they’re versioned, time-travelable, and consistent across all requests.

Patterns

Public read

{
  "@id": "ex:public-read",
  "@type": "f:AccessPolicy",
  "f:action": [{"@id": "f:view"}],
  "f:allow": true
}

A default-allow policy with no targeting applies to every flake.

Owner-only access

{
  "@id": "ex:owner-only",
  "@type": ["f:AccessPolicy", "ex:CorpPolicy"],
  "f:required": true,
  "f:action": [{"@id": "f:view"}, {"@id": "f:modify"}],
  "f:query": "{\"where\": {\"@id\": \"?$identity\", \"http://example.org/user\": {\"@id\": \"?$user\"}}, \"$where\": {\"@id\": \"?$this\", \"http://example.org/owner\": {\"@id\": \"?$user\"}}}"
}

The query resolves ?$identity → user, then checks that ?$this (the entity being read or written) has that user as its ex:owner.

Property redaction (hide a property unless permitted)

[
  {
    "@id": "ex:hide-ssn",
    "@type": ["f:AccessPolicy", "ex:CorpPolicy"],
    "f:required": true,
    "f:onProperty": [{"@id": "ex:ssn"}],
    "f:action": [{"@id": "f:view"}],
    "f:query": "{\"where\": {\"@id\": \"?$identity\", \"http://example.org/role\": \"hr\"}}"
  },
  {
    "@id": "ex:default-view",
    "@type": ["f:AccessPolicy", "ex:CorpPolicy"],
    "f:action": [{"@id": "f:view"}],
    "f:allow": true
  }
]

f:onProperty scopes the restriction to ex:ssn only — every other property still falls under ex:default-view. f:required: true means the SSN policy MUST allow for any SSN flake to be visible (the default allow doesn’t override it on this property).

Class-scoped restriction

{
  "@id": "ex:employee-only",
  "@type": ["f:AccessPolicy", "ex:CorpPolicy"],
  "f:required": true,
  "f:onClass": [{"@id": "ex:Employee"}],
  "f:action": [{"@id": "f:view"}],
  "f:query": "{\"where\": {\"@id\": \"?$identity\", \"@type\": \"http://example.org/Employee\"}}"
}

Anyone querying for ex:Employee instances must themselves be tagged as an employee.

Multi-tenant isolation

{
  "@id": "ex:tenant-isolation",
  "@type": ["f:AccessPolicy", "ex:CorpPolicy"],
  "f:required": true,
  "f:action": [{"@id": "f:view"}, {"@id": "f:modify"}],
  "f:query": "{\"where\": {\"@id\": \"?$identity\", \"http://example.org/tenant\": \"?tenant\"}, \"$where\": {\"@id\": \"?$this\", \"http://example.org/tenant\": \"?tenant\"}}"
}

Each tenant only sees and writes data tagged with their own ex:tenant. Required-no-targeting means it applies to every flake.

Hierarchical access (manager sees direct reports)

{
  "@id": "ex:manager-sees-reports",
  "@type": ["f:AccessPolicy", "ex:CorpPolicy"],
  "f:onClass": [{"@id": "schema:Person"}],
  "f:action": [{"@id": "f:view"}],
  "f:query": "{\"where\": {\"@id\": \"?$identity\", \"http://example.org/user\": {\"@id\": \"?$mgr\"}}, \"$where\": {\"@id\": \"?$this\", \"http://example.org/reportsTo\": {\"@id\": \"?$mgr\"}}}"
}

Write protection

{
  "@id": "ex:no-direct-writes",
  "@type": ["f:AccessPolicy", "ex:CorpPolicy"],
  "f:required": true,
  "f:onProperty": [{"@id": "ex:approved"}],
  "f:action": [{"@id": "f:modify"}],
  "f:exMessage": "ex:approved is set by the workflow service only.",
  "f:query": "{\"where\": {\"@id\": \"?$identity\", \"@type\": \"http://example.org/WorkflowService\"}}"
}

When the policy denies a transaction, f:exMessage is returned to the client.

Combining algorithm

When multiple policies match a flake:

  • A required policy must allow. If any required policy denies (or returns no f:query bindings), access is denied.
  • If no required policy applies, any allow is enough — Fluree uses allow-overrides over the non-required set.
  • If no policy applies, the request falls back to default-allow. Setting default-allow: false is the fail-closed posture recommended for production.
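
These rules can be summarized as a small decision function. A sketch, assuming the matching policies have already been evaluated to allow/deny booleans (this mirrors the prose above, not Fluree's internal code):

```python
def decide(required, others, default_allow):
    """Sketch of the combining rules: `required` and `others` hold the
    allow/deny outcomes (booleans) of the matching required and
    non-required policies for one flake."""
    if required:
        # Every required policy must allow; any denial loses.
        return all(required)
    if others:
        # Allow-overrides across the non-required set.
        return any(others)
    # No policy applies: fall back to default-allow.
    return default_allow

assert decide([True, True], [False], default_allow=False) is True
assert decide([True, False], [True], default_allow=True) is False
assert decide([], [False, True], default_allow=False) is True
assert decide([], [], default_allow=False) is False
```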

See Policy model and inputs for the full state diagram.

Invoking policies via HTTP

Policies are passed via opts on JSON-LD requests, and via headers on SPARQL requests.

JSON-LD

curl -X POST 'http://localhost:8090/v1/fluree/query?ledger=mydb:main' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $JWT" \
  -d '{
    "from": "mydb:main",
    "select": "?name",
    "where": [{"@id": "?p", "schema:name": "?name"}],
    "opts": {
      "identity": "ex:aliceIdentity",
      "policy-class": ["ex:CorpPolicy"],
      "default-allow": false
    }
  }'

SPARQL (headers — no opts block in SPARQL)

curl -X POST 'http://localhost:8090/v1/fluree/query?ledger=mydb:main' \
  -H 'Content-Type: application/sparql-query' \
  -H "Authorization: Bearer $JWT" \
  -H 'fluree-identity: ex:aliceIdentity' \
  -H 'fluree-policy-class: ex:CorpPolicy' \
  -H 'fluree-default-allow: false' \
  -d 'SELECT ?name WHERE { ?p <http://schema.org/name> ?name }'

| Header | JSON-LD opts field | Value |
|---|---|---|
| fluree-identity | identity | IRI of an identity entity |
| fluree-policy-class | policy-class | Comma-separated or repeated header; matches f:policyClass on stored policies |
| fluree-policy-values | policy-values | JSON object — extra ?$var bindings for policy queries |
| fluree-policy | policy | Inline JSON-LD policy array |
| fluree-default-allow | default-allow | true / false |

When the bearer token is verified and the server is configured with data_auth_default_policy_class, the verified identity is auto-applied to policy-values and the configured class to policy-class. See Configuration for those server-side settings.

Policies are data

Because policies live as flakes in the ledger:

  • Time-travel — query at any past t to see the policies in effect then.
  • Audit — SELECT ?p ?action WHERE { ?p a f:AccessPolicy ; f:action ?action }.
  • Versionable — change policies through normal transactions; full history kept.
  • Branchable — try new policies on a branch before merging to main.

Best practices

  1. Start with default-allow: false and required policies. Fail-closed is easier to reason about than fail-open.
  2. Tag every stored policy with a class (e.g. ex:CorpPolicy) and tag every identity with f:policyClass. Pass policy-class at query time — Fluree pulls in the matching policy set automatically.
  3. Use f:onProperty / f:onClass / f:onSubject aggressively. A targeted policy is cheaper to evaluate than a default policy, because Fluree can short-circuit during flake filtering.
  4. Keep f:query simple. It runs once per flake-target during evaluation. Lean on tagged identity properties (@type, f:policyClass, role flags) rather than deep traversals.
  5. Test with multiple identities. Verify the same query returns the right shape for each role.
  6. Document intent. Add rdfs:label and rdfs:comment to your policy nodes so audits are readable.

Cookbook: SHACL Validation

SHACL (Shapes Constraint Language) is a W3C standard for defining constraints on graph data. In Fluree, SHACL shapes are evaluated at transaction time — invalid data is rejected before it’s committed (or logged as a warning, depending on your config).

This guide covers when SHACL runs, how to enable and configure it (ledger-wide and per graph), the shape targets and constraint patterns Fluree supports, and what is not yet implemented.

When SHACL runs

Fluree decides whether to run SHACL validation on each transaction using this order:

  1. If a config graph exists with f:shaclDefaults — follow the configured settings per graph (enable/disable, mode).
  2. If no config graph section is present — fall back to the shapes-exist heuristic: if any SHACL shapes are present in the database (as regular RDF triples), validation runs in Reject mode. If no shapes are present, validation is skipped entirely (zero overhead).

This means you can start using SHACL without writing any config — just transact shapes and they’re enforced.

The shacl feature must be enabled at build time (it’s on by default for the server and CLI binaries). See Standards and feature flags.

Enabling SHACL via the config graph

Writing ledger config is done via transactions into the config graph, whose IRI is always urn:fluree:{ledger_id}#config. See Writing config data for the full pattern.

Minimal config: enable SHACL, shapes in the default graph

@prefix f: <https://ns.flur.ee/db#> .

GRAPH <urn:fluree:mydb:main#config> {
  <urn:config:main> a f:LedgerConfig ;
    f:shaclDefaults [
      f:shaclEnabled true ;
      f:validationMode f:ValidationReject
    ] .
}

Notes:

  • f:shaclEnabled defaults to false when a f:shaclDefaults section exists without it — make the enable decision explicit.
  • f:validationMode defaults to f:ValidationReject. Use f:ValidationWarn to log violations without failing the transaction.
  • With no explicit f:shapesSource, shapes are compiled from the default graph (f:defaultGraph, g_id=0). See Storing shapes in a named graph to load from elsewhere.

Defining shapes

Shapes are ordinary RDF — transact them like any other data. They can be written in Turtle, TriG, or JSON-LD.

Node shape with property constraints

@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix schema: <http://schema.org/> .
@prefix ex:     <http://example.org/> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .

ex:PersonShape a sh:NodeShape ;
  sh:targetClass schema:Person ;
  sh:property [
    sh:path schema:name ;
    sh:datatype xsd:string ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:message "Every person must have exactly one name"
  ] ;
  sh:property [
    sh:path schema:email ;
    sh:datatype xsd:string ;
    sh:pattern "^[^@]+@[^@]+\\.[^@]+$" ;
    sh:message "Email must be a valid email address"
  ] ;
  sh:property [
    sh:path ex:age ;
    sh:datatype xsd:integer ;
    sh:minInclusive 0 ;
    sh:maxInclusive 200
  ] .
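
SHACL's sh:pattern uses the XPath/XSD regular-expression dialect; for a simple pattern like the email check above, Python's re module behaves the same, so candidate values can be sanity-checked before transacting (note the Turtle escape "\\." unescapes to a single "\."):

```python
import re

# The sh:pattern from ex:PersonShape above, after Turtle unescaping.
email_pattern = re.compile(r"^[^@]+@[^@]+\.[^@]+$")

assert email_pattern.match("alice@example.org")
assert email_pattern.match("not-an-email") is None
assert email_pattern.match("two@ats@example.org") is None
```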

Target types

| Target | Effect |
|---|---|
| sh:targetClass <C> | Every subject with rdf:type <C> (including RDFS subclasses of <C> when the hierarchy is available) |
| sh:targetNode <N> | The specific subject <N> |
| sh:targetSubjectsOf <P> | Every subject that currently has predicate <P> |
| sh:targetObjectsOf <P> | Every node that currently appears as the object of <P> |

See Predicate-target shapes for notes on how the staged-path validator discovers focus nodes for sh:targetSubjectsOf / sh:targetObjectsOf.

Constraint patterns

Cardinality — required and multi-valued

ex:ArticleShape a sh:NodeShape ;
  sh:targetClass ex:Article ;
  sh:property [ sh:path ex:title ; sh:minCount 1 ; sh:maxCount 1 ] ;
  sh:property [ sh:path ex:tag   ; sh:minCount 1 ] .

Datatype

ex:ProductShape a sh:NodeShape ;
  sh:targetClass ex:Product ;
  sh:property [ sh:path ex:price   ; sh:datatype xsd:decimal ] ;
  sh:property [ sh:path ex:inStock ; sh:datatype xsd:boolean ] .

Numeric ranges

ex:OrderShape a sh:NodeShape ;
  sh:targetClass ex:Order ;
  sh:property [
    sh:path ex:quantity ;
    sh:datatype xsd:integer ;
    sh:minInclusive 1 ;
    sh:maxInclusive 10000
  ] .

Available: sh:minInclusive, sh:maxInclusive, sh:minExclusive, sh:maxExclusive.

String patterns and length

ex:UserShape a sh:NodeShape ;
  sh:targetClass ex:User ;
  sh:property [
    sh:path ex:username ;
    sh:datatype xsd:string ;
    sh:minLength 3 ;
    sh:maxLength 32 ;
    sh:pattern "^[a-zA-Z0-9_]+$"
  ] .

sh:pattern accepts an optional sh:flags string (e.g. "i" for case-insensitive).

Node kind

ex:RefShape sh:property [
  sh:path ex:owner ;
  sh:nodeKind sh:IRI
] .

Values: sh:IRI, sh:BlankNode, sh:Literal, sh:BlankNodeOrIRI, sh:BlankNodeOrLiteral, sh:IRIOrLiteral.

Enumerated values

ex:TaskShape a sh:NodeShape ;
  sh:targetClass ex:Task ;
  sh:property [
    sh:path ex:status ;
    sh:in ( "todo" "in-progress" "review" "done" )
  ] .

sh:hasValue requires a specific value to be present.

Class constraint (with RDFS subclass reasoning)

ex:OrderShape a sh:NodeShape ;
  sh:targetClass ex:Order ;
  sh:property [
    sh:path ex:customer ;
    sh:class schema:Person ;
    sh:minCount 1
  ] .

Each value of ex:customer must have rdf:type schema:Person — or rdf:type of any class that is rdfs:subClassOf* schema:Person. See RDFS subclass reasoning for sh:class.

Pair constraints — comparing two properties

ex:EventShape a sh:NodeShape ;
  sh:targetClass ex:Event ;
  sh:property [
    sh:path ex:startYear ;
    sh:lessThan ex:endYear
  ] ;
  sh:property [
    sh:path ex:primaryEmail ;
    sh:disjoint ex:secondaryEmail
  ] .

| Constraint | Semantic |
|---|---|
| sh:equals <P> | Value sets for this path and <P> must be identical |
| sh:disjoint <P> | Value sets must not overlap |
| sh:lessThan <P> | Every value on this path must be strictly less than every value of <P> |
| sh:lessThanOrEquals <P> | Every value on this path must be ≤ every value of <P> |
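
The pair-constraint semantics are comparisons over the two value sets. A small sketch of the sh:lessThan and sh:disjoint checks described above (illustrative, not Fluree's validator):

```python
def less_than(values_a, values_b):
    """sh:lessThan: every value of the constrained path must be strictly
    less than every value of the compared path (vacuously true when
    either set is empty)."""
    return all(a < b for a in values_a for b in values_b)

def disjoint(values_a, values_b):
    """sh:disjoint: the two value sets must not overlap."""
    return not (set(values_a) & set(values_b))

assert less_than([1990, 1991], [2000, 2005]) is True
assert less_than([1990, 2001], [2000]) is False      # 2001 is not < 2000
assert disjoint(["a@x.com"], ["b@x.com"]) is True
```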

Logical constraints

ex:ContactShape a sh:NodeShape ;
  sh:targetClass ex:Contact ;
  sh:or (
    [ sh:property [ sh:path schema:email     ; sh:minCount 1 ] ]
    [ sh:property [ sh:path schema:telephone ; sh:minCount 1 ] ]
  ) .

Available: sh:not, sh:and, sh:or, sh:xone.

Closed shapes

ex:StrictPersonShape a sh:NodeShape ;
  sh:targetClass ex:StrictPerson ;
  sh:closed true ;
  sh:ignoredProperties ( rdf:type ) ;
  sh:property [ sh:path schema:name ; sh:minCount 1 ] .

A closed shape forbids any property not explicitly declared (or listed in sh:ignoredProperties). rdf:type is implicitly ignored per the SHACL spec.

RDFS subclass reasoning for sh:class

sh:class honors rdfs:subClassOf. Example:

ex:Novelist rdfs:subClassOf schema:Person .
ex:pratchett rdf:type ex:Novelist .

ex:BookShape sh:property [
  sh:path ex:author ;
  sh:class schema:Person
] .

A book whose ex:author is ex:pratchett conforms — ex:pratchett is a schema:Person via rdfs:subClassOf.

Fluree resolves this in two tiers:

  1. Fast path: the ledger’s indexed schema hierarchy (SchemaHierarchy). Expanded at engine build time so same-class and descendant-class matches are O(1) hashmap hits.
  2. Live fallback: when the subclass relation was asserted in the current transaction (or any earlier unindexed commit), the fast path misses. The engine then walks rdfs:subClassOf via a BFS on the database’s SPOT index. This walk is scoped to the default graph regardless of the subject’s own graph — matching how SchemaHierarchy is built and preventing cross-graph issues.
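
The fallback amounts to an upward breadth-first search from each asserted type of the value. A sketch, assuming the hierarchy is given as a map from class to its direct superclasses (illustrative only; the real walk runs over the SPOT index):

```python
from collections import deque

def conforms_to_class(value_types, target, subclass_of):
    """Does any asserted type reach `target` via rdfs:subClassOf*?
    `subclass_of` maps a class IRI to its direct superclasses."""
    for start in value_types:
        seen, queue = {start}, deque([start])
        while queue:
            cls = queue.popleft()
            if cls == target:
                return True
            for sup in subclass_of.get(cls, []):
                if sup not in seen:
                    seen.add(sup)
                    queue.append(sup)
    return False

hierarchy = {"ex:Novelist": ["schema:Person"]}
assert conforms_to_class(["ex:Novelist"], "schema:Person", hierarchy) is True
assert conforms_to_class(["ex:Robot"], "schema:Person", hierarchy) is False
```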

Predicate-target shapes

sh:targetSubjectsOf(P) and sh:targetObjectsOf(P) depend on the current state of the database — a subject is a focus node iff it actually has (or is referenced by) predicate P in the post-transaction view.

Fluree does not precompute target hints from staged flakes. Instead, for each focus node being validated, the engine does a bounded existence check against the post-state:

  • sh:targetSubjectsOf(P) → SPOT range query (focus, P, _). Non-empty → shape applies.
  • sh:targetObjectsOf(P) → OPST range query (_, P, focus). Non-empty → shape applies.

This means:

  • A base-state (alice, ex:ssn, "123") makes sh:targetSubjectsOf(ex:ssn) fire on alice even when this transaction only retracts ex:name.
  • A retraction-only transaction that removes the last matching edge means the shape no longer applies — the post-state check returns empty.
  • The check is bounded by the number of predicate-targeted shapes in the cache, not the data size.

Ref-objects of asserted flakes are pulled into the focus set for their graph, so newly-introduced inbound edges trigger validation of the referenced node.
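
The post-state check can be modeled with plain sets. A sketch of the sh:targetSubjectsOf decision described above, treating the database as a set of triples (illustrative; the real check is a SPOT range query, not a scan):

```python
def post_state(base, asserts, retracts):
    """Post-transaction view of the triple set."""
    return (base | asserts) - retracts

def subjects_target_applies(triples, focus, prop):
    """sh:targetSubjectsOf(P): the shape applies iff the focus node
    still has predicate P in the post-state."""
    return any(s == focus and p == prop for (s, p, o) in triples)

base = {("alice", "ex:ssn", "123"), ("alice", "ex:name", "Alice")}

# Retracting only ex:name leaves the ssn edge, so the ssn-targeted
# shape still fires on alice:
view = post_state(base, set(), {("alice", "ex:name", "Alice")})
assert subjects_target_applies(view, "alice", "ex:ssn") is True

# Retracting the last ex:ssn edge means the shape no longer applies:
view2 = post_state(base, set(), {("alice", "ex:ssn", "123")})
assert subjects_target_applies(view2, "alice", "ex:ssn") is False
```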

Per-graph configuration

Each named graph can have its own f:shaclEnabled and f:validationMode via f:graphOverrides:

@prefix f: <https://ns.flur.ee/db#> .
@prefix ex: <http://example.org/> .

GRAPH <urn:fluree:mydb:main#config> {
  <urn:config:main> a f:LedgerConfig ;
    # Ledger-wide: SHACL on, reject on violation.
    f:shaclDefaults [
      f:shaclEnabled true ;
      f:validationMode f:ValidationReject ;
      f:overrideControl f:OverrideAll
    ] ;
    # Per-graph: ex:scratch has SHACL off; ex:audit uses warn mode.
    f:graphOverrides
      [ a f:GraphConfig ;
        f:targetGraph ex:scratch ;
        f:shaclDefaults [ f:shaclEnabled false ]
      ],
      [ a f:GraphConfig ;
        f:targetGraph ex:audit ;
        f:shaclDefaults [ f:validationMode f:ValidationWarn ]
      ] .
}

With this config:

  • A violating write to the default graph is rejected (ledger-wide Reject).
  • A violating write to ex:scratch passes without validation (graph disabled).
  • A violating write to ex:audit passes but emits a tracing::warn! (Warn mode).
  • A single multi-graph transaction can mix modes: reject-bucket violations fail the txn; warn-bucket violations get logged.

Monotonicity

Per-graph configs can only tighten the ledger-wide posture:

| Ledger-wide | Per-graph | Effective |
|---|---|---|
| enabled: false, OverrideNone | enabled: true | disabled (OverrideNone blocks per-graph) |
| enabled: true, OverrideAll | enabled: false | disabled for that graph |
| mode: warn, OverrideAll | mode: reject | reject for that graph |

See Override control for the full ruleset.

Storing shapes in a named graph

f:shapesSource points the shape compiler at a specific graph. Useful when you want schema / shapes isolated from data — even the config graph itself can be used as a shape source.

@prefix f: <https://ns.flur.ee/db#> .

GRAPH <urn:fluree:mydb:main#config> {
  <urn:config:main> a f:LedgerConfig ;
    f:shaclDefaults [
      f:shaclEnabled true ;
      f:shapesSource [
        a f:GraphRef ;
        f:graphSource [ f:graphSelector <http://example.org/shapes> ]
      ]
    ] .
}

Semantics:

  • f:shapesSource is authoritative, not additive: when set, shapes come exclusively from the configured graph. Shapes in the default graph are ignored.
  • f:shapesSource is non-overridable — it can only be set in the config graph, not via transaction/query-time options.
  • Use f:graphSelector f:defaultGraph to explicitly point at the default graph (same as omitting f:shapesSource).

Validation modes

  • f:ValidationReject (default): on any violation, the transaction fails with ShaclViolation(report). The formatted report lists each violation’s focus node, property path, and message.
  • f:ValidationWarn: violations are logged via tracing::warn! and the transaction proceeds. Any non-violation error from the SHACL pipeline (compile failure, range-scan failure) still propagates — Warn mode never silently admits a broken validation pipeline.

Working with shapes across write surfaces

SHACL validation runs consistently on every write surface:

  • JSON-LD / SPARQL transactions (fluree insert, fluree upsert, fluree update)
  • Turtle / TriG ingest (fluree insert-turtle, stage_turtle_insert)
  • Commit replay (push_commits_with_handle, followers applying upstream commits)

All three routes go through the same post-stage helper, so the ledger’s configured SHACL posture (enable/disable, mode, per-graph, shapes source) applies uniformly.

Not yet supported

The following SHACL constructs are parsed/compiled but currently no-ops at validation time. Shapes using them load without error but don’t constrain data:

  • sh:uniqueLang, sh:languageIn — require language-tag metadata on flakes, which isn’t yet threaded through the validation path.
  • sh:qualifiedValueShape (+ sh:qualifiedMinCount / sh:qualifiedMaxCount) — requires recursive nested-shape counting.

These are tracked in the SHACL compliance effort. Contributors: see Contributing / SHACL implementation.

Shapes are data

Because shapes live as regular RDF in your ledger:

  • Time-travelable — query a shape’s history with --at to see what validation was in effect at a given commit.
  • Versionable — delete/insert constraints through ordinary transactions.
  • Queryable — SELECT ?shape ?target WHERE { ?shape sh:targetClass ?target }.
  • Branchable — test new constraints on a branch; merge when verified.

Best practices

  1. Start with sh:minCount — missing-value bugs are the most common data quality issue.
  2. Incremental rollout — deploy shapes in f:ValidationWarn mode first. Watch the logs for a sprint, then flip to f:ValidationReject.
  3. Per-graph scratch zones — for experimentation, disable SHACL on a named graph so exploratory transactions don’t fail your CI.
  4. sh:message everywhere — custom messages are what end users see when a transaction is rejected. Invest in them early.
  5. f:shapesSource for schema hygiene — keep shapes out of user data graphs so deletes / retractions on user data can’t accidentally touch your schema.

Cookbook: owl:imports across named graphs

This walkthrough builds a small two-file ontology, links it together with owl:imports, applies it to instance data, and shows OWL 2 QL and OWL 2 RL inference firing through the import.

In Fluree, an owl:imports target must resolve to another named graph in the same ledger (or to a local graph via f:ontologyImportMap). Cross-ledger imports are not supported. This tutorial uses three named graphs in one ledger:

| Graph IRI | Role |
|---|---|
| (default graph) | Instance data |
| <http://example.org/onto/core> | Core ontology — class hierarchy + owl:imports hub |
| <http://example.org/onto/behaviors> | Imported ontology — property characteristics |
| <urn:fluree:demo:main#config> | Ledger config — wires up reasoning |

See Reasoning and inference for background and Setting groups → reasoningDefaults for the full config schema.


1. Create the ledger

fluree init
fluree create demo

demo becomes the active ledger. Its full ID is demo:main, which means the config named graph IRI is urn:fluree:demo:main#config (the #config fragment is a Fluree convention).
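
The IRI convention is mechanical, so a tiny helper can build it. A sketch (hypothetical helper, not part of any Fluree client library):

```python
def config_graph_iri(ledger: str, branch: str = "main") -> str:
    """Build the config named-graph IRI from a ledger name and branch,
    following the urn:fluree:{ledger_id}#config convention
    (illustrative helper only)."""
    return f"urn:fluree:{ledger}:{branch}#config"

assert config_graph_iri("demo") == "urn:fluree:demo:main#config"
assert config_graph_iri("mydb") == "urn:fluree:mydb:main#config"
```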


2. Insert instance data into the default graph

Save as 01-data.ttl:

@prefix ex: <http://example.org/> .

# People (typed directly, will be classified further by reasoning)
ex:alice  a ex:GradStudent .
ex:bob    a ex:Person .
ex:carol  a ex:Professor .

# Ancestor chain — exercises owl:TransitiveProperty (declared in the import)
ex:alice  ex:hasAncestor ex:eve .
ex:eve    ex:hasAncestor ex:frank .

# Living arrangement — exercises owl:SymmetricProperty
ex:alice  ex:livesWith   ex:bob .

# Parent/child — exercises owl:inverseOf
ex:carol  ex:parentOf    ex:alice .

# Teaching — exercises rdfs:domain / rdfs:range
ex:professor1 ex:teaches ex:cs101 .

Insert it:

fluree upsert -f 01-data.ttl
# → Committed t=1, 8 flakes

Use upsert (not insert) for any TriG document that contains GRAPH blocks. The CLI’s insert path parses Turtle straight to flakes and does not extract GRAPH blocks; over HTTP, /v1/fluree/insert rejects Content-Type: application/trig outright. upsert handles both Turtle and TriG.


3. Stage the ontology and reasoning config (TriG)

Save as 02-ontology.trig:

@prefix f:    <https://ns.flur.ee/db#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex:   <http://example.org/> .

# ---- Core ontology: class hierarchy + owl:imports hub -----------------
GRAPH <http://example.org/onto/core> {
  <http://example.org/onto/core>
      a owl:Ontology ;
      owl:imports <http://example.org/onto/behaviors> .

  ex:Student      rdfs:subClassOf  ex:Person .
  ex:GradStudent  rdfs:subClassOf  ex:Student .
  ex:Professor    rdfs:subClassOf  ex:Person .
}

# ---- Imported ontology: property characteristics + domain/range -------
GRAPH <http://example.org/onto/behaviors> {
  ex:hasAncestor  a              owl:TransitiveProperty .
  ex:livesWith    a              owl:SymmetricProperty .
  ex:parentOf     owl:inverseOf  ex:childOf .
  ex:teaches      rdfs:domain    ex:Professor ;
                  rdfs:range     ex:Course .
}

# ---- Reasoning configuration ------------------------------------------
# schemaSource = <onto/core>, followOwlImports = true
# → reasoner walks the import closure and projects schema triples from
#   BOTH graphs onto the default graph for inference.
GRAPH <urn:fluree:demo:main#config> {
  <urn:demo:cfg>
      a f:LedgerConfig ;
      f:reasoningDefaults <urn:demo:cfg:reasoning> .

  <urn:demo:cfg:reasoning>
      f:schemaSource      <urn:demo:cfg:schemaref> ;
      f:followOwlImports  true .

  <urn:demo:cfg:schemaref>
      a f:GraphRef ;
      f:graphSource <urn:demo:cfg:schemasrc> .

  <urn:demo:cfg:schemasrc>
      f:graphSelector <http://example.org/onto/core> .
}

Submit it:

fluree upsert -f 02-ontology.trig --format turtle
# → Committed t=2, 17 flakes

--format turtle is needed because the .trig file extension is not on the auto-detect list; the flag tells the parser to treat the contents as Turtle/TriG.


4. Verify base data

Without reasoning, only asserted facts are returned:

fluree query --format json '{
  "@context":{"ex":"http://example.org/"},
  "select":"?s",
  "where":{"@id":"?s","@type":"ex:Person"},
  "reasoning":"none"
}'
# → ["ex:bob"]

Only bob is directly typed as Person. The schema-driven classifications remain hidden until reasoning is enabled.


5. RDFS subclass expansion

rdfs:subClassOf is declared in <onto/core> (the schemaSource). With RDFS reasoning, querying for Person returns every subclass instance:

fluree query --format json '{
  "@context":{"ex":"http://example.org/"},
  "select":"?s",
  "where":{"@id":"?s","@type":"ex:Person"},
  "reasoning":"rdfs"
}'
# → ["ex:bob", "ex:carol", "ex:alice"]

alice (GradStudent → Student → Person) and carol (Professor → Person) are now classified through the hierarchy.
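As a toy illustration (not Fluree's reasoner), the subclass expansion amounts to checking whether ex:Person is reachable from each subject's asserted type via rdfs:subClassOf:

```python
# Hypothetical sketch of rdfs:subClassOf expansion; data mirrors the tutorial.
def superclasses(cls, sub_class_of):
    """All classes reachable from cls via rdfs:subClassOf (including cls)."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in sub_class_of.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

sub_class_of = {                      # axioms from <onto/core>
    "ex:Student": ["ex:Person"],
    "ex:GradStudent": ["ex:Student"],
    "ex:Professor": ["ex:Person"],
}
types = {                             # asserted @type facts
    "ex:bob": "ex:Person",
    "ex:alice": "ex:GradStudent",
    "ex:carol": "ex:Professor",
}
persons = sorted(s for s, t in types.items()
                 if "ex:Person" in superclasses(t, sub_class_of))
print(persons)  # → ['ex:alice', 'ex:bob', 'ex:carol']
```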


6. OWL 2 RL inference through the import

Everything below uses axioms declared in the imported <onto/behaviors> graph — they reach the reasoner only because owl:imports resolved correctly.

6.1 owl:TransitiveProperty: ex:hasAncestor

fluree query --format json '{
  "@context":{"ex":"http://example.org/"},
  "select":"?a",
  "where":{"@id":"ex:alice","ex:hasAncestor":"?a"},
  "reasoning":"owl2rl"
}'
# → ["ex:eve", "ex:frank"]

Asserted: alice → eve, eve → frank. Inferred via the TransitiveProperty axiom in the imported graph: alice → frank.
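The transitive rule can be sketched as a naive fixpoint over the asserted pairs (a toy illustration under the tutorial's data, not Fluree's forward chainer):

```python
# Hypothetical owl:TransitiveProperty forward chaining: keep joining pairs
# until no new (a, d) can be derived from (a, b) and (b, d).
def transitive_closure(pairs):
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

asserted = {("ex:alice", "ex:eve"), ("ex:eve", "ex:frank")}
print(sorted(b for a, b in transitive_closure(asserted) if a == "ex:alice"))
# → ['ex:eve', 'ex:frank']
```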

6.2 owl:SymmetricProperty: ex:livesWith

fluree query --format json '{
  "@context":{"ex":"http://example.org/"},
  "select":"?p",
  "where":{"@id":"ex:bob","ex:livesWith":"?p"},
  "reasoning":"owl2rl"
}'
# → ["ex:alice"]

Only alice livesWith bob was asserted; the symmetric pair is inferred.

6.3 owl:inverseOf: parentOf / childOf

fluree query --format json '{
  "@context":{"ex":"http://example.org/"},
  "select":"?p",
  "where":{"@id":"ex:alice","ex:childOf":"?p"},
  "reasoning":"owl2rl"
}'
# → ["ex:carol"]

Asserted: carol parentOf alice. Inferred: alice childOf carol.
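Both rules in §6.2 and §6.3 are single-step derivations. A toy sketch (not Fluree's reasoner):

```python
# Hypothetical one-pass application of owl:SymmetricProperty and owl:inverseOf.
def symmetric(pairs):
    """(a, livesWith, b) entails (b, livesWith, a)."""
    return pairs | {(b, a) for (a, b) in pairs}

def inverse(pairs):
    """(s, parentOf, o) entails (o, childOf, s)."""
    return {(o, s) for (s, o) in pairs}

lives_with = {("ex:alice", "ex:bob")}
print(symmetric(lives_with))   # includes ('ex:bob', 'ex:alice')

parent_of = {("ex:carol", "ex:alice")}
print(inverse(parent_of))      # {('ex:alice', 'ex:carol')}
```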

6.4 rdfs:domain / rdfs:range

fluree query --format json '{
  "@context":{"ex":"http://example.org/"},
  "select":"?p",
  "where":{"@id":"?p","@type":"ex:Professor"},
  "reasoning":"owl2rl"
}'
# → ["ex:carol", "ex:professor1"]

professor1 was never typed. The reasoner infers it from teaches rdfs:domain Professor (declared in the imported graph) plus the asserted professor1 teaches cs101.

fluree query --format json '{
  "@context":{"ex":"http://example.org/"},
  "select":"?c",
  "where":{"@id":"?c","@type":"ex:Course"},
  "reasoning":"owl2rl"
}'
# → ["ex:cs101"]

Same idea on the range side: cs101 is classified as a Course because of teaches rdfs:range Course in the import.


7. OWL 2 QL — query rewriting only

OWL 2 QL handles the same constructs as RDFS plus owl:inverseOf and rdfs:domain/range, but at query rewrite time rather than via fact materialisation. For the patterns above where you query the inferred direction directly, OWL 2 RL is the simpler choice. OWL 2 QL is best when you want zero materialisation and your queries already align with the rewriting (e.g., asking for any superclass type).

fluree query --format json '{
  "@context":{"ex":"http://example.org/"},
  "select":"?s",
  "where":{"@id":"?s","@type":"ex:Person"},
  "reasoning":"owl2ql"
}'
# → ["ex:bob", "ex:carol", "ex:alice"]

Same answer as RDFS for this pattern, with no materialisation step.


8. Full chain: combining modes

Combining rdfs + owl2rl lets schema hierarchy and forward-chained facts work together. professor1 appears as a Person via:

  1. teaches rdfs:domain Professor (imported axiom, OWL 2 RL)
  2. professor1 teaches cs101 (asserted)
  3. professor1 a Professor (derived)
  4. Professor rdfs:subClassOf Person (core ontology, RDFS)
  5. professor1 a Person (derived)
fluree query --format json '{
  "@context":{"ex":"http://example.org/"},
  "select":"?s",
  "where":{"@id":"?s","@type":"ex:Person"},
  "reasoning":["rdfs","owl2rl"]
}'
# → ["ex:bob", "ex:carol", "ex:professor1", "ex:alice"]
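The five-step chain above can be sketched in Python (a toy illustration of the two rules composing, not Fluree's implementation):

```python
# Hypothetical sketch: rdfs:domain types the subject, then rdfs:subClassOf
# lifts the derived type up the hierarchy.
teaches = {("ex:professor1", "ex:cs101")}        # step 2: asserted fact
domain_of_teaches = "ex:Professor"               # step 1: imported axiom
sub_class_of = {"ex:Professor": "ex:Person"}     # step 4: core ontology

derived = {(s, domain_of_teaches) for (s, _o) in teaches}   # step 3
changed = True
while changed:                                   # step 5: subclass lifting
    changed = False
    for (s, c) in list(derived):
        parent = sub_class_of.get(c)
        if parent and (s, parent) not in derived:
            derived.add((s, parent))
            changed = True
print(sorted(derived))
# → [('ex:professor1', 'ex:Person'), ('ex:professor1', 'ex:Professor')]
```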

Submitting TriG over the HTTP API

The CLI’s upsert command is one way to load TriG. Against a running fluree-db-server, the same payload goes through the HTTP API. Both endpoints below accept Turtle/TriG when sent with Content-Type: application/trig (or text/turtle):

# Connection-scoped (specify ledger via query string)
curl -X POST 'http://localhost:8090/v1/fluree/upsert?ledger=demo:main' \
     -H 'Content-Type: application/trig' \
     --data-binary @02-ontology.trig

# Ledger-scoped path form
curl -X POST 'http://localhost:8090/v1/fluree/upsert/demo:main' \
     -H 'Content-Type: application/trig' \
     --data-binary @02-ontology.trig

The same TriG GRAPH blocks land in the same named graphs as via the CLI; nothing else changes about the reasoning wiring.

See HTTP endpoints for the full surface area and Datasets and named graphs for how named graphs participate in queries.


What was actually proved

Each query above is a load-bearing test that the import closure is being walked correctly:

| Query | Axiom location | Without owl:imports resolution it would… |
|---|---|---|
| §6.1 transitive ancestors | imported graph (behaviors) | …only return ex:eve (no transitive closure) |
| §6.2 symmetric livesWith | imported graph | …return empty (bob livesWith alice not asserted) |
| §6.3 childOf via inverse | imported graph | …return empty (childOf is never asserted) |
| §6.4 domain/range classification | imported graph | …not classify professor1 / cs101 |

If you change f:followOwlImports to false in the config graph, every query in §6 collapses back to its base-data answer, a useful toggle for confirming that the import-closure walk is what is doing the work.

Design

Architecture and design documents for Fluree’s internal systems. These documents describe the rationale behind key design decisions, wire formats, and trait architectures.

Documents

Query execution and overlay merge

How queries run through a single preparation/execution pipeline, how scan operators select the binary-cursor path vs the range fallback, and where overlay novelty merges with indexed data (including graph scoping boundaries).

Auth Contract (CLI ↔ Server)

Wire-level contract between the Fluree CLI and any Fluree-compatible server, covering OIDC device auth, token refresh, and storage proxy authentication.

Nameservice Schema v2

Design of the nameservice schema: ledger records, graph source records, configuration payloads, and the ref/config/tracking store abstractions.

Storage-agnostic Commits and Sync

How ContentId (CIDv1) values decouple the commit chain from storage backends, enabling replication across filesystem, S3, and IPFS. Includes the pack protocol wire format for efficient bulk transfer.

ContentId and ContentStore

The content-addressed identity layer: ContentId type, ContentStore trait, multicodec content kinds, and the bridge between CID-based identity and storage-backend addressing.

Index Format

Binary columnar index format: branch/leaf/leaflet hierarchy, dictionary artifacts, SPOT/PSOT/POST/OPST/TSPO layout, and encoding details.

Namespace allocation and fallback modes

How Fluree assigns ns_code values for IRIs (prefix trie matching, fallback split modes), including bulk-import preflight mitigation and how the “host-only” fallback persists for future transactions.

Ontology imports (f:schemaSource + owl:imports)

How the reasoner consumes schema from a named f:schemaSource graph and transitively resolves owl:imports: resolution order, the SchemaBundleOverlay projection, schema-triple whitelist, and caching.

Storage Traits

Storage trait architecture: StorageRead, StorageWrite, ContentAddressedWrite, Storage, and NameService trait design with guidance for implementing new backends.

Query execution and overlay merge



This document describes the single query execution pipeline in Fluree DB and how it combines:

  • Indexed data (binary columnar indexes)
  • Overlay data (novelty + staged flakes)

It also calls out where graph scoping (g_id) is applied so named graphs remain isolated.

Pipeline overview

flowchart TD
  LedgerState -->|produces| LedgerSnapshot
  LedgerSnapshot -->|shared substrate| GraphDb
  GraphDb -->|single-ledger| QueryRunner
  GraphDb -->|member_of| DataSetDb
  DataSetDb -->|federated| QueryRunner
  QueryRunner -->|scan index + merge overlay| DatasetOperator
  DatasetOperator -->|per-graph| BinaryScanOperator
  BinaryScanOperator -->|fast path| BinaryCursor
  BinaryScanOperator -->|fallback| range_with_overlay
  BinaryCursor -->|graph-scoped decode| BinaryGraphView
  range_with_overlay -->|delegates| RangeProvider

Where this exists in code

  • API entrypoints

    • fluree-db-api/src/view/query.rs: single-ledger GraphDb queries (query)
    • fluree-db-api/src/view/dataset_query.rs: dataset queries (DataSetDb)
  • Unified query runner

    • fluree-db-query/src/execute/runner.rs
      • prepare_execution(db: GraphDbRef<'_>, query: &ExecutableQuery) builds derived facts/ontology (if enabled), rewrites patterns, and builds the operator tree.
      • execute_prepared(...) runs the operator tree using an ExecutionContext.
  • Dataset operator

    • fluree-db-query/src/dataset_operator.rs
      • DatasetOperator wraps every triple-pattern scan. In single-graph mode (the common case) it passes through to one inner BinaryScanOperator with negligible overhead. In multi-graph mode (FROM/FROM NAMED datasets) it fans out one inner operator per active graph, drives their lifecycles, and stamps ledger provenance (Binding::IriMatch) on results that span multiple ledgers.
      • DatasetBuilder trait (factory pattern): the planner constructs a ScanDatasetBuilder at plan time; DatasetOperator calls build() at execution time during open() to produce per-graph BinaryScanOperators.
      • Nested composition: inner operators can themselves be DatasetOperators — provenance stamping passes IriMatch through unchanged.
  • Scan operators

    • fluree-db-query/src/binary_scan.rs
      • BinaryScanOperator handles single-graph scanning only. Selects between binary cursor (streaming, integer-ID pipeline) and range fallback at open() time based on the ExecutionContext.
  • Range fallback

    • fluree-db-core/src/range.rs: range_with_overlay(snapshot, g_id, overlay, ...)
    • fluree-db-core/src/range_provider.rs: RangeProvider trait implemented by the binary range provider

Graph scoping (g_id)

Graph scoping is applied at two key boundaries:

  • Binary streaming path: BinaryCursor operates on a BinaryGraphView (graph-scoped decode handle), ensuring leaf/leaflet decoding, predicate dictionaries, and specialty arenas are graph-isolated.
  • Range path: range_with_overlay(snapshot, g_id, overlay, ...) passes g_id into the RangeProvider, which routes the range query to the correct per-graph index segments.

Overlay providers are graph-scoped at the trait boundary: the overlay hook receives g_id and must only return flakes for that graph. This keeps multi-tenant named graphs isolated even when overlay data is sourced externally.

Overlay merge semantics (high level)

Both scan paths implement the same logical behavior:

  • Read matching flakes from the indexed base (binary files)
  • Read matching flakes from the overlay (novelty/staged)
  • Merge them using (t, op) semantics so retractions cancel assertions as-of the query time bound

The details differ:

  • BinaryScanOperator translates overlay flakes into integer-ID space and merges them into the decoded columnar stream.
  • RangeScanOperator delegates to range_with_overlay, which combines RangeProvider output with overlay output.
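The shared merge semantics can be sketched as follows (hypothetical data shapes; Fluree's flakes carry more fields than this):

```python
# Hypothetical (t, op) overlay merge: for each fact, the latest assertion or
# retraction at or before the query time bound t_max wins.
def merge_as_of(base, overlay, t_max):
    """base/overlay: iterables of (fact, t, op); op True=assert, False=retract."""
    latest = {}
    for fact, t, op in list(base) + list(overlay):
        if t <= t_max and (fact not in latest or t > latest[fact][0]):
            latest[fact] = (t, op)
    return {fact for fact, (_t, op) in latest.items() if op}

base = [(("s", "p", "o1"), 1, True)]
overlay = [(("s", "p", "o1"), 3, False),   # retraction in novelty
           (("s", "p", "o2"), 3, True)]
print(merge_as_of(base, overlay, t_max=2))  # → {('s', 'p', 'o1')}
print(merge_as_of(base, overlay, t_max=3))  # → {('s', 'p', 'o2')}
```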

Auth contract (CLI ↔ Server)

This document defines the wire-level contract between the Fluree CLI and any Fluree-compatible server (a standalone fluree-server, an OIDC-capable application embedding Fluree, or future products). Any implementation that exposes these endpoints will get zero-configuration CLI auth.

For the overall authentication model, see Authentication.

Implementer checklist (CLI compatibility)

An implementation is considered CLI-compatible if the Fluree CLI can:

  • discover how to authenticate,
  • obtain/store a Bearer token, and
  • use that token for data-plane operations (and optionally refresh it).

Required (for “it works”)

  • Auth discovery: implement GET /.well-known/fluree.json.
    • Return at least { "version": 1 }.
    • If you support automated login, include an auth object with type="oidc_device" and required fields (issuer, client_id, exchange_url).
    • If you do not support automated login, you may omit auth (CLI will use manual token input), or return auth.type="token" to be explicit.
  • Token exchange / refresh (only for auth.type="oidc_device"): implement POST {exchange_url}:
    • grant_type="urn:ietf:params:oauth:grant-type:token-exchange" for exchanging an IdP token into a Fluree-scoped token.
    • grant_type="refresh_token" for refreshing without user interaction (optional; CLI will still work without refresh, but requires re-login when tokens expire).
  • Issue Fluree-scoped JWTs: the access_token you return MUST include the standard Fluree claims used by fluree-server:
    • identity: fluree.identity (recommended) and standard iss/sub/exp/iat
    • scopes: fluree.ledger.read.*, fluree.ledger.write.*, fluree.events.* (as applicable)
    • replication scopes (fluree.storage.*) MUST be reserved for operator/service principals only.
  • Stable error messages: keep error strings stable and human-readable. The CLI may pattern-match on substrings (e.g. "Bearer token required", "Untrusted issuer") to provide hints.
  • Anti-leak semantics: for data endpoints, return 404 for out-of-scope ledgers (do not leak existence).
  • Verified diagnostics: implement GET /v1/fluree/whoami (or an equivalent endpoint) to return token_present, verified, auth_method, identity, and scope summary.

Auth discovery

GET /.well-known/fluree.json

The CLI fetches this endpoint when a remote is added (fluree remote add) to auto-configure auth. The server MAY expose this endpoint. If absent, the CLI falls back to manual token configuration.

Response (200 OK, application/json):

{
  "version": 1,
  "api_base_url": "https://data.example.com/v1/fluree",
  "auth": {
    "type": "oidc_device",
    "issuer": "https://issuer.example.com",
    "client_id": "fluree-cli",
    "exchange_url": "https://data.example.com/v1/fluree/auth/exchange",
    "scopes": ["openid", "profile"],
    "redirect_port": 8400
  }
}

api_base_url

api_base_url tells the CLI where the Fluree HTTP API is mounted. It is specifically intended to support implementations that:

  • mount the Fluree API under a non-root prefix (e.g. /v1/fluree), and/or
  • want discovery served from a different host than the data plane (e.g. www.example.com serving discovery that points at data.example.com).

Contract:

  • api_base_url MAY be:
    • an absolute URL, e.g. https://data.example.com/v1/fluree, or
    • an absolute-path reference (relative to the discovery origin), e.g. /v1/fluree.
  • If api_base_url is an absolute-path reference, the CLI MUST resolve it against the origin (scheme + host + port) of the discovery document URL it fetched (i.e., the URL used for GET /.well-known/fluree.json).
    • Example: discovery fetched from https://abc123.cloudfront.net/.well-known/fluree.json and api_base_url="/v1/fluree" resolves to https://abc123.cloudfront.net/v1/fluree.
  • api_base_url SHOULD include the full prefix including fluree and SHOULD NOT have a trailing slash.
  • The CLI MUST use the resolved api_base_url as the base for subsequent API calls (query/insert/upsert/update/info/exists).
  • If api_base_url is absent, the CLI MUST derive it from the configured remote URL:
    • If the remote URL already ends with /fluree, use it as-is.
    • Otherwise, append /fluree.
    • If you mount a versioned API (for example /v1/fluree), you SHOULD include api_base_url in discovery to avoid ambiguity.
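The resolution rules above can be sketched as a single helper (hypothetical code, not the CLI's implementation):

```python
from urllib.parse import urlsplit, urlunsplit

def resolve_api_base(discovery_url, api_base_url, remote_url):
    """Hypothetical sketch of the api_base_url resolution contract."""
    if api_base_url:
        if api_base_url.startswith("/"):      # absolute-path reference:
            o = urlsplit(discovery_url)       # resolve against discovery origin
            return urlunsplit((o.scheme, o.netloc, api_base_url, "", "")).rstrip("/")
        return api_base_url.rstrip("/")       # absolute URL: use as-is
    base = remote_url.rstrip("/")             # absent: derive from remote URL
    return base if base.endswith("/fluree") else base + "/fluree"

print(resolve_api_base("https://abc123.cloudfront.net/.well-known/fluree.json",
                       "/v1/fluree", "https://abc123.cloudfront.net"))
# → https://abc123.cloudfront.net/v1/fluree
print(resolve_api_base("https://x.example/.well-known/fluree.json",
                       None, "https://x.example"))
# → https://x.example/fluree
```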

auth.type values

| Type | Meaning | CLI behavior |
|---|---|---|
| oidc_device | OIDC interactive login + token exchange | fluree auth login uses device-code if the IdP supports it, otherwise auth-code+PKCE |
| token | Manual Bearer token (no automated login flow) | fluree auth login --token <value> |

Field reference (oidc_device)

| Field | Required | Description |
|---|---|---|
| issuer | Yes | OIDC issuer URL (used for /.well-known/openid-configuration discovery) |
| client_id | Yes | OAuth client ID for the CLI (public client; no client secret) |
| exchange_url | Yes | Absolute URL for the Fluree token exchange endpoint |
| scopes | No | OAuth scopes to request (default: ["openid"]) |
| redirect_port | No | Port for auth-code callback listener (default: first available in 8400..8405; also overrideable via FLUREE_AUTH_PORT) |

Fallback behavior

  • Discovery endpoint absent (404 or connection error) → CLI assumes token type, prompts user to provide a token manually
  • version > 1 → CLI warns but attempts to parse known fields

Token exchange

POST {exchange_url}

After the CLI completes OIDC login with the IdP, it calls the exchange endpoint to trade the IdP token for a Fluree-scoped Bearer token. This endpoint is hosted by the application that manages authorization (e.g., an app embedding Fluree and maintaining user entitlements).

Request:

POST /v1/fluree/auth/exchange HTTP/1.1
Content-Type: application/json

{
  "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
  "subject_token": "<idp-access-token-or-id-token>",
  "subject_token_type": "urn:ietf:params:oauth:token-type:access_token"
}

Success response (200 OK):

{
  "access_token": "<fluree-bearer-token>",
  "token_type": "Bearer",
  "expires_in": 3600,
  "refresh_token": "<optional-refresh-token>"
}

Error response (401/403):

{
  "error": "invalid_grant",
  "error_description": "IdP token is invalid or user is not authorized for Fluree access"
}

Contract

  • The exchange endpoint validates the IdP token (against the IdP’s JWKS or userinfo), looks up the user’s Fluree entitlements, and mints a Fluree-scoped JWT.
  • The returned access_token MUST be a JWT that fluree-server can verify (via JWKS). It MUST include the standard Fluree claims (fluree.identity, fluree.ledger.*, and optionally fluree.storage.*). See Bearer token claim set.
  • refresh_token is OPTIONAL. If present, the CLI stores it and uses it for silent refresh.
  • subject_token_type MAY be urn:ietf:params:oauth:token-type:id_token if the CLI sends the ID token instead of the access token.

This loosely follows RFC 8693 (OAuth 2.0 Token Exchange).

Token refresh

POST {exchange_url}

If the CLI holds a refresh_token, it can request a new access token without user interaction.

Request:

{
  "grant_type": "refresh_token",
  "refresh_token": "<stored-refresh-token>"
}

Success response: Same shape as token exchange success.

Failure: CLI clears stored tokens and prompts fluree auth login.

CLI TOML config format

The CLI stores auth configuration per-remote in .fluree/config.toml:

[[remotes]]
name = "solo-prod"
type = "Http"
base_url = "https://solo.example.com"

[remotes.auth]
type = "oidc_device"
issuer = "https://cognito-idp.us-east-1.amazonaws.com/us-east-1_abc123"
client_id = "fluree-cli"
exchange_url = "https://solo.example.com/v1/fluree/auth/exchange"
scopes = ["openid", "profile"]
redirect_port = 8400
token = "eyJ..."           # cached Fluree Bearer token (written by 'fluree auth login')
refresh_token = "eyJ..."   # refresh token (written by 'fluree auth login')

[[remotes]]
name = "local"
type = "Http"
base_url = "http://localhost:8090"

[remotes.auth]
type = "token"
token = "eyJ..."           # manually provided via 'fluree auth login --token'

Backward compatibility: If type is absent, infer "token" if token is present, otherwise treat as unauthenticated.
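The backward-compatibility rule reduces to a small inference (a sketch with a hypothetical helper name, not the CLI's code):

```python
# Hypothetical sketch: infer the effective auth type from a [remotes.auth]
# table that may predate the `type` field.
def effective_auth_type(auth):
    if auth is None:
        return "unauthenticated"
    if "type" in auth:
        return auth["type"]
    return "token" if auth.get("token") else "unauthenticated"

print(effective_auth_type({"token": "eyJ..."}))      # → token
print(effective_auth_type({"type": "oidc_device"}))  # → oidc_device
print(effective_auth_type({}))                       # → unauthenticated
```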

CLI fluree auth login behavior

fluree auth login [--remote <name>]
  1. Resolve the target remote.
  2. Check auth.type:
    • oidc_device:
      1. Discover OIDC endpoints from {issuer}/.well-known/openid-configuration.
      2. If the discovery document includes device_authorization_endpoint, run OAuth device-code:
        • POST to device_authorization_endpoint to get device_code, user_code, verification_uri.
        • Print: Open {verification_uri} and enter code: {user_code}
        • Poll token_endpoint until user completes browser auth.
      3. Otherwise, if the discovery document includes authorization_endpoint, run OAuth authorization-code + PKCE:
        • Start a localhost callback listener on http://127.0.0.1:{port}/callback (port selection: redirect_port/FLUREE_AUTH_PORT, else first available in 8400..8405).
        • Open the system browser to the authorization_endpoint URL including code_challenge and requested scopes.
        • Receive the callback, then exchange the code at token_endpoint.
        • Note for Cognito: callback URLs must be pre-allowlisted (no wildcard ports); allowlist http://127.0.0.1:8400/callback through http://127.0.0.1:8405/callback (or your chosen fixed port).
      4. POST IdP token to exchange_url → get Fluree Bearer token.
      5. Store token and refresh_token in remote config.
    • token: Prompt for token (or accept --token <value|@file|@->). Store in config.
    • Unset / no discovery: Attempt discovery at {base_url}/.well-known/fluree.json. If found, configure auth type and proceed. If not found, fall back to token flow.

See CLI auth command for full command reference.

CLI auto-refresh on 401

Auto-refresh applies to data-plane commands (query, insert, upsert, info) that use RemoteLedgerClient in tracked mode or --remote mode.

When a data-plane command receives a 401 from the remote:

  1. If auth.type == "oidc_device" and refresh_token is present:
    • Attempt silent refresh via the exchange endpoint.
    • On success: update stored token and (if rotated) refresh token in .fluree/config.toml, retry the original request once.
    • On failure: clear tokens, print Token expired. Run: fluree auth login --remote <name>
  2. Otherwise: print Authentication failed. Run: fluree auth login --remote <name>
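The retry-once flow can be sketched as follows, with `send` and `refresh` as stand-ins for the CLI's HTTP client and silent-refresh call (hypothetical code, not the actual implementation):

```python
# Hypothetical sketch of auto-refresh on 401: refresh at most once, then
# retry the original request exactly once.
def request_with_auto_refresh(send, refresh, request, auth):
    resp = send(request, auth["token"])
    if resp != 401:
        return resp
    if auth.get("type") == "oidc_device" and auth.get("refresh_token"):
        new_tokens = refresh(auth["refresh_token"])
        if new_tokens:
            auth["token"] = new_tokens["access_token"]
            if "refresh_token" in new_tokens:    # refresh token was rotated
                auth["refresh_token"] = new_tokens["refresh_token"]
            return send(request, auth["token"])  # retry exactly once
        raise RuntimeError("Token expired. Run: fluree auth login --remote <name>")
    raise RuntimeError("Authentication failed. Run: fluree auth login --remote <name>")

calls = []
def fake_send(req, token):
    calls.append(token)
    return 200 if token == "fresh" else 401
def fake_refresh(_rt):
    return {"access_token": "fresh"}

auth = {"type": "oidc_device", "token": "stale", "refresh_token": "rt"}
status = request_with_auto_refresh(fake_send, fake_refresh, "GET /query", auth)
print(status)  # → 200
```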

Replication commands (fetch, pull, push)

Replication commands use HttpRemoteClient (from fluree-db-nameservice-sync) which does not perform auto-refresh. This is intentional:

  • Replication requires fluree.storage.* scopes, which are reserved for operators and service accounts.
  • Operator tokens are typically long-lived or non-expiring. If an operator token expires, the user should run fluree auth login to obtain a new one.
  • Regular users who only have query-scoped tokens should use fluree track + --remote mode instead of fetch/pull/push.

Scope rules

  • The exchange endpoint MUST NOT grant fluree.storage.* to regular users. Replication scope is for operators and service accounts only. See Replication vs query boundary.
  • If a user with only query-scoped tokens attempts fluree pull or fluree fetch, the CLI MUST fail with a clear message explaining that replication requires fluree.storage.* and suggesting fluree track instead.

Token diagnostic endpoint

GET /v1/fluree/whoami

A verified diagnostic endpoint that performs full cryptographic verification of the Bearer token (if present) using the same code path as data endpoints. This is the recommended way for the CLI or an implementing application to validate a token without side effects.

No token:

{ "token_present": false }

Valid token (verified):

{
  "token_present": true,
  "verified": true,
  "auth_method": "embedded_jwk",
  "issuer": "did:key:z6Mk...",
  "subject": "admin@example.com",
  "identity": "did:key:z6Mk...",
  "expires_at": 1739012345,
  "scopes": {
    "ledger_read_all": true,
    "ledger_write_all": true
  }
}

Invalid token (verification failed):

{
  "token_present": true,
  "verified": false,
  "error": "Token expired",
  "issuer": "did:key:z6Mk...",
  "subject": "admin@example.com",
  "expires_at": 1738900000
}

When verification fails, the response includes unverified decoded claims (base64-decoded without signature check) for debugging. These fields are explicitly untrustworthy — they help diagnose why verification failed (e.g., wrong issuer, expired token) but must never be used for authorization decisions.

The auth_method field is only present on successful verification: "embedded_jwk" for Ed25519/JWS tokens, "oidc" for JWKS/RS256 tokens.

This endpoint always returns 200 regardless of token validity — it is diagnostic, not a gate.

Error semantics

Standard error response shape

fluree-server returns errors as JSON with a consistent structure. Implementers SHOULD follow this shape so the CLI can display meaningful diagnostics.

{
  "error": "<human-readable description>",
  "status": 401,
  "@type": "err:db/Unauthorized",
  "cause": {
    "error": "<nested cause (optional)>",
    "status": 400,
    "@type": "err:db/JsonParse"
  }
}

Notes:

  • error is the primary human-readable message. The CLI may pattern-match on substrings inside this field.
  • @type is a compact error type IRI used as a stable, machine-readable code.
  • cause is optional and may be nested.
  • Implementers MAY include additional fields, but MUST keep error stable and human-readable.

Status codes

| Code | Meaning | When |
|---|---|---|
| 200 | Success | Request completed successfully |
| 400 | Bad request | Malformed body, invalid JSON, missing required fields |
| 401 | Unauthorized | Missing Bearer token, expired token, invalid signature, unknown signing key |
| 403 | Forbidden | Valid token but insufficient scope (e.g., query-only token on admin endpoint) |
| 404 | Not found or unauthorized | Ledger does not exist, or token lacks access to this ledger (anti-leak) |
| 409 | Conflict | Ledger already exists (/fluree/create), concurrent transaction conflict |
| 500 | Internal error | Server-side failure |

Anti-leak pattern: 404 for out-of-scope ledgers

Data endpoints (/fluree/query, /fluree/update, etc.) return 404 rather than 403 when a valid token lacks access to the requested ledger. This prevents authenticated users from discovering the existence of ledgers they are not authorized to access.

Implication for CLI and implementers: A 404 on a data endpoint can mean either:

  • The ledger genuinely does not exist, or
  • The token does not have scope for that ledger.

The CLI should present both possibilities in error messages. Implementers should not attempt to distinguish these cases client-side.

Token verification errors (401)

Common 401 error messages and their causes:

| Server message | Cause | CLI hint |
|---|---|---|
| Bearer token required | No Authorization: Bearer ... header | fluree auth login --remote <name> |
| Invalid token | Malformed JWT/JWS, bad signature | Re-issue token; check signing key |
| Token expired | exp claim is in the past | Refresh or re-login |
| Untrusted issuer | iss / signing key not in trusted list | Check --trusted-issuer / --jwks-issuer config |
| OIDC issuer not configured | Token has kid header but no JWKS configured | Add --jwks-issuer to server config |
| Token lacks storage proxy permissions | Valid token but missing fluree.storage.* | Use operator token or fluree track instead |

Implementer checklist

Any Fluree-compatible server that wants zero-config CLI auth must:

  1. Expose GET /.well-known/fluree.json with the discovery payload
  2. Implement POST {exchange_url} for token exchange and refresh
  3. Issue Fluree-scoped JWTs with the standard claim set
  4. Publish a JWKS endpoint so fluree-server can verify issued tokens (configured via --jwks-issuer)

Conformance checklist (status codes)

Implementers MUST return these status codes consistently so the CLI can provide good diagnostics:

| Endpoint | Success | Missing token | Bad token | Insufficient scope | Not found / no access |
|---|---|---|---|---|---|
| GET /.well-known/fluree.json | 200 | n/a | n/a | n/a | 404 (not implemented) |
| POST /v1/fluree/create | 201 | 401 | 401 | 403 | n/a |
| POST /v1/fluree/drop | 200 | 401 | 401 | 403 | 404 |
| POST /v1/fluree/query | 200 | 401 | 401 | 404 (anti-leak) | 404 (anti-leak) |
| POST /v1/fluree/update | 200 | 401 | 401 | 404 (anti-leak) | 404 (anti-leak) |
| POST /v1/fluree/auth/exchange | 200 | n/a | 401 | 403 | n/a |
| GET /v1/fluree/whoami | 200 | 200 (token_present=false) | 200 (verified=false) | n/a | n/a |

Conformance checklist (error bodies)

All error responses MUST include a JSON body. The body SHOULD include at least an error or message field. The CLI pattern-matches on specific substrings (e.g., "Bearer token required", "Untrusted issuer") to provide targeted hints, so error messages should be stable across releases.

See also

Nameservice Schema v2 Design

Schema Version: 2

Overview

This document describes the design for a unified nameservice schema that supports:

  1. Ledgers with named graphs and independent indexing
  2. Non-ledger graph sources (indexes/mappings like BM25, Iceberg/R2RML, Vector/HNSW, JDBC, etc.) with varying versioning semantics
  3. Four independent atomic concerns that can be updated without contention
  4. Watermarked updates for client subscription and push notifications
  5. Pluggable backends (DynamoDB, S3, filesystem) with consistent semantics

Terminology:

  • Prefer graph source in docs and user-facing API descriptions.
  • Non-ledger data sources (BM25, vector, Iceberg, R2RML) are called graph sources.

Design Goals

  • Stable schema: Minimize attribute changes as features evolve
  • Flexible payloads: Use JSON Maps for evolving/variable content
  • Reduced conflict probability: Logically independent concerns minimize contention
  • Client subscriptions: Watermarks enable efficient change detection
  • Coordination via status: Soft locks/leases for distributed process coordination

The Four Concerns Model

Each nameservice record has four independent concerns, each with its own watermark and payload:

| # | Concern | Watermark | Payload | Updated By |
|---|---|---|---|---|
| 1 | Head | commit_t | commit | Transactor (on commit) |
| 2 | Index | index_t | index | Indexer (on index publish) |
| 3 | Status | status_v | status | Various (state changes, metrics, locks) |
| 4 | Config | config_v | config | Admin (settings changes) |

Each concern can be pushed independently without affecting or contending with the others.


DynamoDB Schema

Table Name

fluree-nameservice (configurable)

Physical layout: item-per-concern (PK+SK)

DynamoDB serializes writes per item, not per attribute. To achieve true per-concern independence (transactor vs indexer vs admin), represent each concern as a separate item under the same address partition:

  • pk (partition key): record address in the name:branch form (e.g., "mydb:main", "products-search:main")
  • sk (sort key): concern discriminator

Recommended sk values:

  • meta
  • head (ledgers only)
  • index (ledgers + graph sources)
  • config (ledgers + graph sources)
  • status (ledgers + graph sources)

This layout aligns with the file-backed v2 pattern (.index.json separate) while also eliminating DynamoDB physical contention between writers.

Design Note: Per-Concern Independence

Each concern is logically independent:

  • No shared updated_at: Each concern’s watermark (commit_t, index_t, etc.) serves as its timestamp/version marker
  • Disjoint items: Updating one concern does not touch any attributes of another concern
  • Reduced conflict probability: Independent concerns minimize logical contention

With the item-per-concern layout, DynamoDB contention is limited to writers of the same concern.

Entity kinds and graph source types

The meta item carries the record discriminator:

  • kind: ledger | graph_source
  • source_type (graph sources only): a type string (e.g., f:Bm25Index, f:HnswIndex, f:IcebergSource, f:R2rmlSource, f:JdbcSource)

Use graph_source naming consistently in pk values and type strings.


Watermark Semantics

Watermarks are strictly monotonic per concern. This ensures:

  1. Clients can detect changes by comparing watermarks.
  2. No change is ever “invisible” to subscribers.
  3. Simple comparison logic: if remote_watermark > local_watermark then changed.

commit_t (Ledger commit watermark)

  • Value: Equals the commit t (transaction time).
  • Update rule: Strict monotonic (new_t > current_t).
  • Rationale: Commits are already strictly ordered by t, so t IS the version

index_t (Index watermark)

  • Value: Transaction time t that the published index covers.
  • Update rule: Strict monotonic (new_t > current_t).
  • Admin reindex: allow idempotent overwrite at the same t (new_t >= current_t) when rebuilding an index to the same watermark with a new address.

status_v (Status Watermark)

  • Value: Atomic incrementing integer
  • Update rule: Strict monotonic (new_v > current_v)
  • Rationale: Status has no t relation; version is just a change counter

config_v (Config Watermark)

  • Value: Atomic incrementing integer
  • Update rule: Strict monotonic (new_v > current_v)
  • Rationale: Config has no t relation; version is just a change counter

Unborn State Semantics

When a record is initialized but has no data yet for a concern:

| Concern | Unborn Watermark | Unborn Payload | Meaning |
|---------|------------------|----------------|---------|
| head | commit_t = 0 | commit = null | Ledger initialized, no commits yet |
| index | index_t = 0 | index = null | No index published yet |
| status | status_v = 1 | status = {state: "ready"} | Always has initial status |
| config | config_v = 0 | config = null | No config set yet |

Key distinction:

  • *_v = 0 with payload = null: Initialized but unborn (record exists)
  • Record not found (GetItem returns nothing): Unknown/never created

Payload Schemas

commit (Ledger)

{
  "id": "bafybeigdyr...commitCid",
  "t": 42
}
| Field | Type | Description |
|-------|------|-------------|
| id | String | ContentId (CIDv1) of the commit |
| t | Number | Transaction time (redundant with commit_t but explicit) |

See ContentId and ContentStore for details on the CID format.

index (Ledger with Named Graphs)

{
  "default": {
    "id": "bafybeig...indexRootDefault",
    "t": 42,
    "rev": 0
  },
  "txn-metadata": {
    "id": "bafybeig...indexRootTxnMeta",
    "t": 42,
    "rev": 1
  },
  "audit-log": null
}
| Field | Type | Description |
|-------|------|-------------|
| {named-graph} | Object \| null | Index state per named graph |
| .id | String | ContentId (CIDv1) of the index root |
| .t | Number | Transaction time the index covers |
| .rev | Number | Revision at that t (0, 1, 2… for reindex operations) |

Named graph = null means that graph exists but hasn’t been indexed yet.

index (Graph Source)

For graph sources with index state (e.g., BM25, vector, spatial, Iceberg, etc.), the nameservice stores a head pointer to the graph source’s latest index root/manifest. The payload is intentionally opaque to nameservice: the graph source implementation defines what the ContentId points to and how (or whether) it supports time travel.

{
  "id": "bafybeig...graphSourceIndexRoot",
  "index_t": 42
}

For graph sources with no index concept (e.g., JDBC mappings): null.

Design note: Snapshot history (if any) is stored in graph-source-owned manifests in storage, not in nameservice. See docs/design/graph-source-index-manifests.md.

status

{
  "state": "ready",
  "queue_depth": 3,
  "last_commit_ms": 45
}
| Field | Type | Description |
|-------|------|-------------|
| state | String | Current state (see State Values below) |
| * | Any | Additional metadata; varies by state and entity type |

State Values

| State | Description | Typical Metadata |
|-------|-------------|------------------|
| ready | Normal operating state (default initial state) | queue_depth, last_commit_ms |
| indexing | Background indexing in progress | index_lock |
| reindexing | Full reindex in progress | reindex_lock, progress |
| syncing | Graph source syncing from source | progress, source_t, synced_t |
| maintenance | Administrative maintenance in progress | maintenance_lock |
| retracted | Soft-deleted | retracted_at, reason |
| error | Error state | error, error_at |

status with Locks (Coordination)

{
  "state": "indexing",
  "index_lock": {
    "holder": "indexer-7f3a",
    "target_t": 45,
    "acquired_at": 1705312200,
    "expires_at": 1705316100
  }
}
| Field | Type | Description |
|-------|------|-------------|
| index_lock | Object \| null | Soft lock for indexing coordination |
| .holder | String | Identifier of the process holding the lock |
| .target_t | Number | The t being indexed |
| .acquired_at | Number | Unix epoch when lock was acquired |
| .expires_at | Number | Unix epoch when lock expires (lease timeout) |

config

{
  "default_context_id": "bafkreih...contextCid",
  "index_threshold": 1000,
  "replication": {
    "factor": 3,
    "regions": ["us-east-1", "us-west-2"]
  }
}

Config is fully flexible JSON. Common fields:

| Field | Type | Description |
|-------|------|-------------|
| default_context_id | String | ContentId (CIDv1) of default JSON-LD context |
| index_threshold | Number | Commits before auto-index |
| replication | Object | Replication settings |

For graph sources, config contains type-specific settings:

BM25:

{
  "k1": 1.2,
  "b": 0.75,
  "fields": ["title", "body", "description"]
}

JDBC:

{
  "connection_string": "jdbc:postgresql://host:5432/db",
  "schema": "public",
  "pool_size": 10
}

DynamoDB Operations

CAS Semantics (Git-like Push)

All push operations support compare-and-set (CAS) semantics with expected old values. This enables Git-like divergence detection:

  • Caller provides expected (the last-known state) and new (the desired state)
  • Backend rejects if current state doesn’t match expected
  • On rejection, backend returns actual current state for caller to reconcile

This is stronger than simple watermark monotonicity: it detects divergence, not just staleness.
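To make the accept/reject rule concrete, here is a minimal in-memory sketch of CAS push semantics. The `Concern` struct, `Push` enum, and `push_cas` function are illustrative names, not the nameservice API; the real backends enforce the same rule with DynamoDB condition expressions, S3 ETags, or file locks.

```rust
/// Minimal in-memory model of one concern: a watermark plus payload.
/// Illustrative only; not the fluree-db-nameservice API.
#[derive(Clone, Debug, PartialEq)]
struct Concern {
    v: i64,
    payload: Option<String>,
}

#[derive(Debug, PartialEq)]
enum Push {
    Updated,
    /// Rejected: carries the actual current state so the caller can reconcile.
    Conflict(Concern),
}

/// Compare-and-set: accept only if `expected` matches the current state
/// AND the new watermark is strictly greater.
fn push_cas(current: &mut Concern, expected: &Concern, new: Concern) -> Push {
    if *current == *expected && new.v > expected.v {
        *current = new;
        Push::Updated
    } else {
        Push::Conflict(current.clone())
    }
}

fn main() {
    let mut head = Concern { v: 41, payload: Some("cid41".into()) };

    // Writer A pushes with the correct expected state: accepted.
    let a = push_cas(
        &mut head,
        &Concern { v: 41, payload: Some("cid41".into()) },
        Concern { v: 42, payload: Some("cid42".into()) },
    );
    assert_eq!(a, Push::Updated);

    // Writer B still believes v=41: rejected, and receives the actual
    // state (v=42) to reconcile against. Staleness alone would only tell
    // B that something changed; CAS tells B exactly what it diverged from.
    let b = push_cas(
        &mut head,
        &Concern { v: 41, payload: Some("cid41".into()) },
        Concern { v: 42, payload: Some("cid42-b".into()) },
    );
    assert_eq!(b, Push::Conflict(Concern { v: 42, payload: Some("cid42".into()) }));
}
```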

Create (Initialize)

Operation: PutItem
ConditionExpression: attribute_not_exists(#pk)
Item: {
  pk: "mydb:main",
  sk: "meta",
  schema: 2,
  kind: "ledger",
  name: "mydb",
  branch: "main",
  dependencies: null,
  created_at: <now>,          // optional
  updated_at_ms: <now_ms>,
  retracted: false,
}

push_commit (Publish Commit)

Option A: Monotonic only (simpler, allows fast-forward by any newer commit)

Operation: UpdateItem
Key: { pk: "mydb:main", sk: "head" }
ConditionExpression: attribute_not_exists(#ct) OR #ct < :new_t
UpdateExpression: SET #ct = :new_t, #c = :commit
ExpressionAttributeNames: {
  "#ct": "commit_t",
  "#c": "commit"
}
ExpressionAttributeValues: {
  ":new_t": 42,
  ":commit": { "id": "bafybeig...commitT42", "t": 42 }
}

Option B: CAS with expected value (Git-like, detects divergence)

CAS checks both watermark equality AND payload equality. The condition is a single OR’d expression handling both existing and unborn cases:

Operation: UpdateItem
Key: { pk: "mydb:main", sk: "head" }

// Single condition: existing case OR unborn case
ConditionExpression:
  (#ct = :expected_t AND #c = :expected_commit AND :new_t > :expected_t)
  OR
  (#ct = :zero AND attribute_type(#c, :null_type) AND :new_t > :zero)

UpdateExpression: SET #ct = :new_t, #c = :commit
ExpressionAttributeNames: {
  "#ct": "commit_t",
  "#c": "commit"
}
ExpressionAttributeValues: {
  ":expected_t": 41,                                              // caller's last-known watermark
  ":expected_commit": { "id": "bafybeig...commitT41", "t": 41 },  // caller's last-known payload
  ":new_t": 42,
  ":commit": { "id": "bafybeig...commitT42", "t": 42 },
  ":zero": 0,
  ":null_type": "NULL"
}

Caller logic: Set :expected_t and :expected_commit based on last-known state:

  • If unborn: :expected_t = 0; :expected_commit can be any value (the unborn clause matches on #ct = :zero)
  • If existing: :expected_t = last-known commit_t, :expected_commit = last-known commit payload

Note: DynamoDB does support nested paths like #c.#id (with #c=commit, #id=id). However, comparing the entire map (#c = :expected_commit) is simpler and avoids partial-match edge cases.

Recommendation: Use Option B (CAS) for transactors to detect divergence. Use Option A for distributed sync where fast-forward is acceptable.

push_index (Publish Index)

CAS with expected watermark + monotonic enforcement:

Operation: UpdateItem
Key: { pk: "mydb:main", sk: "index" }
ConditionExpression: (attribute_not_exists(#it) OR #it < :new_t)
UpdateExpression: SET #it = :new_t, #i = :index
ExpressionAttributeNames: {
  "#it": "index_t",
  "#i": "index"
}
ExpressionAttributeValues: {
  ":new_t": 42,
  ":index": {
    "default": { "id": "bafybeig...indexDefault", "t": 42, "rev": 0 },
    "txn-metadata": { "id": "bafybeig...indexTxnMeta", "t": 42, "rev": 1 }
  }
}

Note: For admin rebuilds at the same watermark, allow #it <= :new_t as the condition (idempotent overwrite at equal t).

push_status (Update Status)

Operation: UpdateItem
Key: { pk: "mydb:main", sk: "status" }
ConditionExpression: (#sv = :expected_v AND :new_v > :expected_v)
                     OR
                     (attribute_not_exists(#sv) AND :expected_v = :zero)
UpdateExpression: SET #sv = :new_v, #s = :status
ExpressionAttributeNames: {
  "#sv": "status_v",
  "#s": "status"
}
ExpressionAttributeValues: {
  ":expected_v": 89,
  ":zero": 0,
  ":new_v": 90,
  ":status": { "state": "ready", "queue_depth": 0 }
}

Note: status_v starts at 1 (not 0) on creation, so attribute_not_exists(#sv) handles cases where the attribute is missing (e.g., partially-written or manually-created items). Normal updates use the first clause.

push_config (Update Config)

Operation: UpdateItem
Key: { pk: "mydb:main", sk: "config" }
ConditionExpression: (#cv = :expected_v AND :new_v > :expected_v)
                     OR
                     (#cv = :zero AND attribute_type(#c, :null_type) AND :expected_v = :zero)
UpdateExpression: SET #cv = :new_v, #c = :config
ExpressionAttributeNames: {
  "#cv": "config_v",
  "#c": "config"
}
ExpressionAttributeValues: {
  ":expected_v": 2,
  ":zero": 0,
  ":new_v": 3,
  ":config": { "default_context_id": "bafkreih...", "index_threshold": 500 },
  ":null_type": "NULL"
}

Note: Unborn clause checks both #cv = :zero AND attribute_type(#c, NULL) to prevent accepting writes against inconsistent states.

Retract

Retraction touches two concerns: the retracted flag on the meta item and the state change on the status item. Because a single UpdateItem cannot span both items, it is expressed as a transaction:

Operation: TransactWriteItems
TransactItems:
  1. Update
     Key: { pk: "mydb:main", sk: "meta" }
     UpdateExpression: SET #r = :true
     ExpressionAttributeNames: { "#r": "retracted" }
     ExpressionAttributeValues: { ":true": true }
  2. Update
     Key: { pk: "mydb:main", sk: "status" }
     UpdateExpression: SET #sv = :new_sv, #s = :status
     ExpressionAttributeNames: { "#sv": "status_v", "#s": "status" }
     ExpressionAttributeValues: {
       ":new_sv": 91,
       ":status": { "state": "retracted", "retracted_at": 1705315800 }
     }

Lookup (Read)

Operation: GetItem
Key: { pk: "mydb:main", sk: "meta" }
ConsistentRead: true

To read full state, query all items for the record address: pk = "mydb:main" and assemble meta + head + index + status + config as present.

List by Kind

Operation: Query (requires GSI on kind)
KeyConditionExpression: #kind = :kind
ExpressionAttributeNames: { "#kind": "kind" }
ExpressionAttributeValues: { ":kind": "ledger" }

To list graph sources, query kind = graph_source.

To list graph sources of a specific type (optional GSI), query source_type = f:Bm25Index, etc.


Push Result Handling

Each push operation returns one of:

| Result | Meaning | Action |
|--------|---------|--------|
| Updated | Update accepted | Proceed |
| Conflict | Expected didn’t match current | Reconcile using actual |

Rust Types (aligned with existing RefKind/CasResult vocabulary)

#![allow(unused)]
fn main() {
/// Which concern is being read or updated.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum ConcernKind {
    /// The commit head pointer (`commit_t` + `commit` payload)
    Head,
    /// The index state (`index_t` + `index` payload)
    Index,
    /// The status state (status_v + status payload)
    Status,
    /// The config state (config_v + config payload)
    Config,
}

/// Value of a concern: watermark + optional payload.
///
/// - `Some(ConcernValue { v: 0, payload: None })` — unborn (initialized, no data)
/// - `Some(ConcernValue { v: N, payload: Some(...) })` — has data
/// - `None` (at Option level) — record doesn't exist
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ConcernValue<T> {
    pub v: i64,
    pub payload: Option<T>,
}

/// Outcome of a compare-and-set push operation.
///
/// Conflicts are NOT errors — they are expected outcomes of concurrent
/// writes and must be handled by the caller (retry, report, etc.).
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum CasResult<T> {
    /// CAS succeeded — the concern was updated to the new value.
    Updated,
    /// CAS failed — `expected` did not match the current value.
    /// `actual` carries the current concern value so the caller can decide
    /// what to do next (retry, diverge, etc.).
    Conflict { actual: Option<ConcernValue<T>> },
}
}

Conflict Handling

On Conflict, the caller receives the actual current state and can:

  1. Fast-forward: If actual.v < new.v, retry with expected = actual
  2. Divergence: If actual.v >= new.v or addresses differ unexpectedly, handle merge/error
  3. Retry loop: For distributed systems, implement bounded retry with backoff
#![allow(unused)]
fn main() {
async fn push_with_retry<T>(
    ns: &impl ConcernPublisher<T>,
    address: &str,
    kind: ConcernKind,
    new: ConcernValue<T>,
    max_retries: usize,
) -> Result<CasResult<T>> {
    let mut expected = ns.get_concern(address, kind).await?;

    for _ in 0..max_retries {
        match ns.push_concern(address, kind, expected.as_ref(), &new).await? {
            CasResult::Updated => return Ok(CasResult::Updated),
            CasResult::Conflict { actual } => {
                // Check if fast-forward is still possible
                if let Some(ref act) = actual {
                    if new.v <= act.v {
                        // Diverged - can't fast-forward
                        return Ok(CasResult::Conflict { actual });
                    }
                }
                // Retry with new expected
                expected = actual;
            }
        }
    }

    // Exhausted retries
    let actual = ns.get_concern(address, kind).await?;
    Ok(CasResult::Conflict { actual })
}
}

Example Records

DynamoDB (item-per-concern) examples

This section shows the DynamoDB physical layout (multiple items per address partition). Other backends serialize the same logical concerns differently.

Ledger (typical items)

Ledger records are represented as multiple items under the same pk:

{
  "pk": "mydb:main",
  "sk": "meta",
  "schema": 2,
  "kind": "ledger",
  "name": "mydb",
  "branch": "main",
  "created_at": 1705312200,
  "updated_at_ms": 1705312200123,
  "retracted": false
}
{
  "pk": "mydb:main",
  "sk": "head",
  "schema": 2,
  "commit_t": 42,
  "commit": { "id": "bafybeig...commitT42", "t": 42 }
}
{
  "pk": "mydb:main",
  "sk": "index",
  "schema": 2,
  "index_t": 42,
  "index": {
    "default": { "id": "bafybeig...indexDefaultT42", "t": 42, "rev": 0 }
  }
}
{
  "pk": "mydb:main",
  "sk": "config",
  "schema": 2,
  "config_v": 2,
  "config": { "default_context_id": "bafkreih...contextCid", "index_threshold": 1000 }
}
{
  "pk": "mydb:main",
  "sk": "status",
  "schema": 2,
  "status_v": 89,
  "status": { "state": "ready", "queue_depth": 3, "last_commit_ms": 45 }
}

Ledger (unborn)

An “unborn” ledger has all five items (the meta item plus the four concern items) created atomically at initialization. The head and index items have watermarks set to 0 with null payloads. The status item starts at status_v=1 with state="ready". The config item starts at config_v=0 (unborn).

Graph Source (BM25)

{
  "pk": "search:main",
  "sk": "meta",
  "schema": 2,
  "kind": "graph_source",
  "source_type": "f:Bm25Index",
  "name": "search",
  "branch": "main",
  "dependencies": ["mydb:main"],
  "created_at": 1705312200,
  "updated_at_ms": 1705312200123,
  "retracted": false
}

Additional concern items for the same pk (examples):

{
  "pk": "search:main",
  "sk": "config",
  "schema": 2,
  "config_v": 1,
  "config": { "k1": 1.2, "b": 0.75, "fields": ["title", "body"] }
}
{
  "pk": "search:main",
  "sk": "index",
  "schema": 2,
  "index_t": 42,
  "index": { "id": "bafybeig...bm25IndexRoot" }
}

Graph Source (Iceberg)

{
  "pk": "analytics:main",
  "sk": "meta",
  "schema": 2,
  "kind": "graph_source",
  "source_type": "f:IcebergSource",
  "name": "analytics",
  "branch": "main",
  "dependencies": ["mydb:main"],
  "created_at": 1705312200,
  "updated_at_ms": 1705312200123,
  "retracted": false,
  "...": "see config/index items"
}

Graph Source (JDBC - No Index)

{
  "pk": "erp:main",
  "sk": "meta",
  "schema": 2,
  "kind": "graph_source",
  "source_type": "f:JdbcSource",
  "name": "erp",
  "branch": "main",
  "dependencies": null,
  "created_at": 1705312200,
  "updated_at_ms": 1705312200123,
  "retracted": false,
  "...": "see config item; index item may be absent or have index_t=0"
}

Git-like Push Model

The nameservice follows a git-like model where:

  1. Local nameservice: Each node has a local NS for reads and local writes
  2. Upstream nameservice: The “source of truth” that accepts or rejects pushes
  3. Push operations: Local changes are pushed upstream
  4. Forward operations: Requests can be forwarded upstream without local write
┌─────────────────┐        push_commit        ┌─────────────────────┐
│  Transactor     │ ────────────────────────▶ │                     │
│  (local NS)     │                           │   Upstream NS       │
└─────────────────┘                           │                     │
                                              │  - DynamoDB, or     │
┌─────────────────┐         push_index        │  - S3 + ETags, or   │
│  Indexer        │ ────────────────────────▶ │  - FS + locks, or   │
│  (local NS)     │                           │  - Service          │
└─────────────────┘                           │                     │
        ▲                                     │  Enforces:          │
        │              pull/sync              │  - Watermark rules  │
        └─────────────────────────────────────│  - Serialization    │
                                              └─────────────────────┘

Upstream NS Backend Options

| Backend | How It Enforces Rules |
|---------|-----------------------|
| DynamoDB | Conditional expressions on watermarks |
| S3 | ETags for CAS + application logic |
| Filesystem | File locks or single-writer process |
| Service | Queue + application logic |

The push interface is the same regardless of backend.


Status-based Coordination (Soft Locks)

Status can carry soft locks for coordinating distributed processes:

Lock Acquisition Flow

1. Indexer starts up
2. Read current status
3. If index_lock exists and not expired:
     → Another indexer is working, wait or skip
4. If no lock or lock expired:
     → Push status with our lock claim (status_v + 1)
     → If accepted: we own the lock, proceed
     → If rejected: someone else claimed it, back off
5. Do indexing work (periodically refresh lock by pushing status)
6. Push index update
7. Push status: clear lock, set state to ready
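The claim decision in steps 3–4 reduces to a pure function over the current lock and the clock. A sketch, with illustrative names (`IndexLock`, `may_claim`); the actual claim is then attempted via a CAS status push at status_v + 1, as described above:

```rust
#[derive(Clone, Debug, PartialEq)]
struct IndexLock {
    holder: String,
    expires_at: u64, // Unix epoch seconds (lease timeout)
}

/// Decide whether this process may attempt to claim the index lock at
/// time `now`. A lock that is absent or past its lease expiry is
/// claimable; a live lease means another indexer is working.
fn may_claim(current: Option<&IndexLock>, now: u64) -> bool {
    match current {
        None => true,
        Some(lock) => now >= lock.expires_at,
    }
}

fn main() {
    let lock = IndexLock { holder: "indexer-7f3a".into(), expires_at: 1705316100 };
    assert!(!may_claim(Some(&lock), 1705314000)); // lease still live: wait or skip
    assert!(may_claim(Some(&lock), 1705316200));  // lease expired: take over
    assert!(may_claim(None, 1705314000));         // no lock held
}
```

Note that `may_claim` returning true only grants permission to attempt the claim; the CAS push decides the actual winner when several processes race.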

Lock Expiry (Crash Recovery)

If a process crashes while holding a lock:

  • The expires_at timestamp allows other processes to take over
  • No manual intervention needed
  • Typical lease duration: 5-15 minutes depending on operation

Lock Refresh

Long-running operations should periodically refresh their lock:

{
  "state": "indexing",
  "index_lock": {
    "holder": "indexer-7f3a",
    "target_t": 45,
    "acquired_at": 1705312200,
    "expires_at": 1705316100,
    "refreshed_at": 1705314000
  },
  "progress": 0.67
}

Client Subscription Model

Clients track watermarks to detect changes:

{
  "subscriptions": {
    "mydb:main": {
      "kind": "ledger",
      "commit_t": 42,
      "index_t": 42,
      "status_v": 89,
      "config_v": 2
    },
    "search:main": {
      "kind": "graph_source",
      "source_type": "f:Bm25Index",
      "index_t": 42,
      "status_v": 12,
      "config_v": 1
    }
  }
}

Change Detection

  1. Client polls or receives notification
  2. Compare watermarks: if remote.commit_t > local.commit_t
  3. Fetch only the changed concern(s)
  4. Update local cache
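The comparison in steps 2–3 can be sketched as a function that returns only the concerns whose remote watermark has advanced, so a client fetches nothing else. The `Watermarks` struct and `changed_concerns` function are illustrative names:

```rust
/// Watermarks a client tracks per subscription.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Watermarks {
    commit_t: i64,
    index_t: i64,
    status_v: i64,
    config_v: i64,
}

/// Return the concerns whose remote watermark is strictly ahead of the
/// local one; only those payloads need to be re-fetched.
fn changed_concerns(local: Watermarks, remote: Watermarks) -> Vec<&'static str> {
    let mut out = Vec::new();
    if remote.commit_t > local.commit_t { out.push("head"); }
    if remote.index_t > local.index_t { out.push("index"); }
    if remote.status_v > local.status_v { out.push("status"); }
    if remote.config_v > local.config_v { out.push("config"); }
    out
}

fn main() {
    let local = Watermarks { commit_t: 42, index_t: 42, status_v: 89, config_v: 2 };
    let remote = Watermarks { commit_t: 43, index_t: 42, status_v: 90, config_v: 2 };
    // Only head and status advanced; index and config are untouched.
    assert_eq!(changed_concerns(local, remote), vec!["head", "status"]);
}
```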

Subscription Granularity

Clients can subscribe to:

  • All concerns for an address
  • Specific concerns (e.g., only commit_t for a query client)
  • All addresses of a kind (e.g., all ledgers)

File-backed Nameservice Considerations

The logical concerns (head/index/status/config) can be stored in different physical layouts depending on the backend.

The file-backed and storage-backed implementations in this repo use the ns@v2 JSON-LD format (see fluree-db-nameservice/src/file.rs and fluree-db-nameservice/src/storage_ns.rs):

  • Main record: ns@v2/{name}/{branch}.json (commit/head + status + config-ish fields)
  • Index record: ns@v2/{name}/{branch}.index.json (index head pointer only)

Field names differ from the DynamoDB layout, but the semantics match:

  • logical commit_t is stored as f:t
  • logical commit.id is stored as f:ledgerCommit.@id (a CID string)
  • logical index_t is stored as f:ledgerIndex.f:t (or f:indexT for graph source index files)
  • logical index.id is stored as f:ledgerIndex.@id (a CID string, or f:indexId for graph source index files)

Layout Options

Option A: Single File (Unified)

ns@v2/{name}/{branch}.json
  • Contains all four concerns in one file
  • Simplest for reads (one fetch)
  • Requires single-writer discipline or file-level CAS

Option B: Separate Head and Index Files (Current Implementation)

ns@v2/{name}/{branch}.json        # head + status + config
ns@v2/{name}/{branch}.index.json  # index only
  • Matches current implementation
  • Allows transactor and indexer to write independently
  • 2 files to read per entity for full state
  • Trade-off: Status and config updates contend with head updates at file-lock level. Acceptable if status updates are low-frequency (state changes only, not high-frequency metrics).

Option C: Fully Separate Files (Maximum Independence)

ns@v2/{name}/{branch}.head.json
ns@v2/{name}/{branch}.index.json
ns@v2/{name}/{branch}.status.json
ns@v2/{name}/{branch}.config.json
  • Each concern in its own file
  • Maximum write independence
  • 4 files to read per entity

Use Option B (separate head/index) as the default:

  • Proven in current implementation
  • Solves the main contention issue (transactor vs indexer)
  • Reasonable read overhead (2 files)
  • Constraint: Status updates should be coarse-grained (state transitions, not per-transaction metrics). If high-frequency status updates are needed, consider Option C.

Use Option C (fully separate files) when:

  • Status updates are frequent (e.g., real-time queue depth reporting)
  • Multiple independent processes update different concerns
  • Write independence is more important than read efficiency

For queryable nameservice with many entities:

  • Read files in parallel
  • Consider in-memory caching with file-change notification
  • The 2-file layout is acceptable; 4-file layout may add too much I/O

Atomicity Mechanisms

| Backend | Mechanism | Notes |
|---------|-----------|-------|
| Filesystem | Atomic rename (write to temp, rename) | POSIX guarantees |
| S3 | ETags for CAS | If-Match header |
| GCS | Generation numbers | Similar to ETags |
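The filesystem row's write-to-temp-then-rename pattern can be sketched in a few lines. The path layout and `publish_atomic` name are illustrative, not the crate's actual file handling:

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

/// Publish a nameservice record file atomically: write the full contents
/// to a temp file in the same directory, then rename over the target.
/// Readers see either the old record or the new one, never a partial write.
fn publish_atomic(target: &Path, contents: &[u8]) -> std::io::Result<()> {
    let tmp = target.with_extension("json.tmp");
    {
        let mut f = fs::File::create(&tmp)?;
        f.write_all(contents)?;
        f.sync_all()?; // flush data before the rename makes it visible
    }
    fs::rename(&tmp, target) // atomic on POSIX within one filesystem
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("ns-demo");
    fs::create_dir_all(&dir)?;
    let target = dir.join("main.json");
    publish_atomic(&target, br#"{"f:t": 42}"#)?;
    assert_eq!(fs::read(&target)?, br#"{"f:t": 42}"#.to_vec());
    Ok(())
}
```

Rename atomicity holds only within a single filesystem, which is why the temp file goes in the same directory as the target. Note this guarantees atomic visibility, not mutual exclusion; concurrent writers still need file locks or single-writer discipline, as the table above says.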

File Content Format

Each file contains JSON matching the concern’s payload plus metadata:

head file ({name}/{branch}.json):

{
  "@context": { "f": "https://ns.flur.ee/db#" },
  "@id": "mydb:main",
  "@type": ["f:Database", "f:LedgerSource"],
  "f:ledger": { "@id": "mydb" },
  "f:branch": "main",
  "f:ledgerCommit": { "@id": "bafybeig...commitT42" },
  "f:t": 42,
  "f:ledgerIndex": { "@id": "bafybeig...indexRootT42", "f:t": 42 },
  "f:status": "ready"
}

index file ({name}/{branch}.index.json):

{
  "@context": { "f": "https://ns.flur.ee/db#" },
  "f:ledgerIndex": { "@id": "bafybeig...indexRootT42", "f:t": 42 }
}

Global Secondary Indexes (GSIs)

GSI1: gsi1-kind (Implemented)

| GSI Name | Partition Key | Sort Key | Use Case |
|----------|---------------|----------|----------|
| gsi1-kind | kind | pk | List all entities of a kind (ledger, graph_source) |
  • Only meta items carry the kind attribute and project into the GSI
  • Projection: INCLUDE with name, branch, source_type, dependencies, retracted
  • Used by all_records() (kind=ledger) and all_vg_records() (kind=graph_source)
  • After GSI query returns meta items, BatchGetItem fetches remaining concern items (config, index) to assemble full records

Future GSIs

| GSI Name | Partition Key | Sort Key | Use Case |
|----------|---------------|----------|----------|
| source-type-index | source_type | pk | List all graph sources of a given type |
| state-index | status_state | pk | Find entities in specific state |

Note on state-index: DynamoDB GSIs cannot use nested map attributes as keys. To enable this GSI:

  1. Add an optional denormalized attribute status_state (String) on the status item
  2. Update status_state whenever status.state changes
  3. Only add it if you need GSI-based queries by state

Alternative: Use Scan with FilterExpression on status.state (less efficient but no schema extension needed)

Future Considerations

Streams and Events

DynamoDB Streams can be enabled to:

  • Trigger Lambda on changes
  • Build event sourcing
  • Replicate to other regions

Multi-region

For global deployments:

  • Use DynamoDB Global Tables
  • Or regional nameservices with cross-region sync

Appendix: Attribute Reference

All items share:

| Attribute | Type | Description |
|-----------|------|-------------|
| pk | String | Record address (name:branch) |
| sk | String | Concern discriminator (meta, head, index, status, config) |
| schema | Number | Schema version (always 2) |

meta item

| Attribute | Type | Description |
|-----------|------|-------------|
| kind | String | ledger \| graph_source |
| name | String | Base name |
| branch | String | Branch name |
| retracted | Boolean | Soft-delete flag |
| branches | Number | Child branch reference count (0 for leaf branches, omitted when 0 in JSON-LD) |
| dependencies | List<String> \| null | Graph-source dependencies (optional) |
| source_type | String \| null | Graph-source type (e.g., f:Bm25Index) |
| created_at | Number | Creation timestamp (epoch seconds, optional) |
| updated_at_ms | Number | Last update time (epoch millis, optional) |

meta item: Branch Attributes

For branches created via create_branch, the meta item carries an additional attribute recording the source branch:

| Attribute | Type | Description |
|-----------|------|-------------|
| bp_source | String \| null | Source branch name (e.g., "main") |

This attribute is null/absent for the original main branch. The JSON-LD format uses f:sourceBranch. The divergence point between a branch and its source is computed on demand by walking the commit chains rather than being stored.

head item (ledgers only)

| Attribute | Type | Description |
|-----------|------|-------------|
| commit_t | Number | Commit watermark (t) |
| commit | Map \| null | { id, t } (id is a ContentId CID string) |

index item (ledgers + graph sources)

| Attribute | Type | Description |
|-----------|------|-------------|
| index_t | Number | Index watermark (t) |
| index | Map \| null | Ledger index map or graph-source head pointer payload |

status item (ledgers + graph sources)

| Attribute | Type | Description |
|-----------|------|-------------|
| status_v | Number | Status change counter |
| status | Map | Status payload |
| status_state | String \| null | Optional denormalized status.state for a GSI |

config item (ledgers + graph sources)

| Attribute | Type | Description |
|-----------|------|-------------|
| config_v | Number | Config change counter |
| config | Map \| null | Config payload |

Watermark Semantics Summary

| Watermark | Semantics | Initial Value | Update Rule |
|-----------|-----------|---------------|-------------|
| commit_t | = commit t | 0 (unborn) | Strict: new > current |
| index_t | = index t | 0 (unborn) | Strict: new > current (admin may allow equal) |
| status_v | Counter | 1 (ready) | Strict: new > current |
| config_v | Counter | 0 (unborn) | Strict: new > current |

Storage-agnostic commits and sync

Fluree uses ContentId (CIDv1) values as the primary identifiers for commits, index roots, and other immutable artifacts. This decouples the commit chain and nameservice references from any specific storage backend, enabling replication across different storage systems (filesystem, S3, IPFS, etc.) without rewriting commit data.

Pack protocol (fluree-pack-v1)

The pack protocol enables efficient bulk transfer of CAS objects between Fluree instances. Instead of fetching each commit individually (one HTTP round-trip per commit), the pack protocol streams all missing objects in a single binary response.

How it works

  1. Client sends a POST /pack/{ledger} request with want (CIDs the client needs, typically the remote head) and have (CIDs the client already has, typically the local head). Optionally includes include_indexes: true with want_index_root_id / have_index_root_id to request binary index artifacts.
  2. Server walks the commit chain from each want backward until it reaches a have, collecting all missing commits and their referenced txn blobs. When indexes are requested, computes the diff of index artifact CIDs between the want and have index roots.
  3. Server streams commit + txn objects as binary data frames (oldest-first topological order), followed by a Manifest frame and index artifact data frames when indexes are included.
  4. Client decodes frames incrementally via a BytesMut buffer, verifies integrity of each object, and writes to local CAS.

The CLI uses a peek-then-ingest pattern: it reads the Header frame first (via peek_pack_header) to inspect estimated_total_bytes, then prompts for confirmation on large transfers (>1 GiB) before consuming the rest of the stream via ingest_pack_stream_with_header.

Wire format

[Preamble: FPK1 + version(1)] [Header frame] [Data frames...] [Manifest frame]? [Data frames...]? [End frame]
| Frame | Type byte | Content |
|-------|-----------|---------|
| Header | 0x00 | JSON metadata: protocol, capabilities, commit count, index artifact count, estimated_total_bytes |
| Data | 0x01 | CID binary + raw object bytes (commit, txn blob, or index artifact) |
| Error | 0x02 | UTF-8 error message (terminates stream) |
| Manifest | 0x03 | JSON metadata for phase transitions (e.g. start of index artifact phase) |
| End | 0xFF | End of stream |
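A minimal sketch of validating the stream preamble (the ASCII magic FPK1 followed by a one-byte protocol version), using the frame-type bytes from the table. This is illustrative only; it omits frame bodies, whose exact layout (length prefixes, CID encoding) is defined in fluree-db-core/src/pack.rs:

```rust
/// Frame type bytes from the fluree-pack-v1 wire format table.
const HEADER: u8 = 0x00;
const DATA: u8 = 0x01;
const ERROR: u8 = 0x02;
const MANIFEST: u8 = 0x03;
const END: u8 = 0xFF;

/// Validate the stream preamble: the ASCII magic "FPK1" followed by a
/// one-byte protocol version. Returns the version on success.
fn read_preamble(buf: &[u8]) -> Result<u8, &'static str> {
    if buf.len() < 5 || &buf[..4] != b"FPK1" {
        return Err("bad pack magic");
    }
    Ok(buf[4])
}

fn main() {
    // A well-formed start of stream: preamble followed by a Header frame.
    assert_eq!(read_preamble(&[b'F', b'P', b'K', b'1', 1, HEADER]), Ok(1));
    // Wrong magic is rejected before any frames are decoded.
    assert!(read_preamble(b"XPK1\x01").is_err());
    let _ = (DATA, ERROR, MANIFEST, END);
}
```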

Client-side verification

Each data frame is verified before writing to CAS:

  • Commit blobs (FCV2 magic): SHA-256 of full blob via verify_commit_blob()
  • All other blobs (txn, index artifacts, config): Full-bytes SHA-256 via ContentId::verify()

Integrity failure is terminal – the entire ingest is aborted.

Fallback

When the server does not support the pack endpoint (returns 404, 405, 406, or 501), CLI commands automatically fall back to:

  • Named-remote: Paginated JSON export via GET /commits/{ledger}
  • Origin-based: CID chain walk via GET /storage/objects/{cid}

Implementation

| Component | Location |
|-----------|----------|
| Wire format (encode/decode), estimation constants | fluree-db-core/src/pack.rs |
| Server-side pack generation + index artifact diff | fluree-db-api/src/pack.rs |
| Server HTTP endpoint | fluree-db-server/src/routes/pack.rs |
| Client-side streaming ingest (ingest_pack_stream, peek_pack_header, ingest_pack_stream_with_header) | fluree-db-nameservice-sync/src/pack_client.rs |
| Origin fetcher pack methods | fluree-db-nameservice-sync/src/origin.rs |
| CLI pull/clone with index transfer + size confirmation | fluree-db-cli/src/commands/sync.rs |
| set_index_head() API method | fluree-db-api/src/commit_transfer.rs |

For the full design document including graph source packing and protocol evolution, see STORAGE_AGNOSTIC_COMMITS_AND_SYNC.md (repo root).

ContentId and ContentStore

This document describes the content-addressed identity and storage layer introduced by the storage-agnostic commits design. For the full design rationale, see Storage-agnostic commits and sync.

Overview

Fluree’s storage-agnostic architecture separates identity (what something is) from location (where its bytes live). Every immutable artifact—commit, transaction payload, index root, index leaf, dictionary blob—is identified by a ContentId (a CIDv1 value) and stored/retrieved via a ContentStore trait.

Identity is a content ID; location is a local configuration detail.

ContentId

ContentId is a CIDv1 (multiformats) value that encodes three things:

  1. Version: CIDv1
  2. Multicodec: identifies the kind of the bytes (e.g., Fluree commit, index root)
  3. Multihash: identifies the hash function + digest (SHA-256)

Multicodec assignments (private-use range)

Fluree uses the multicodec private-use range for type-tagged CIDs:

| Codec value | ContentKind | Description |
|-------------|-------------|-------------|
| 0x300001 | Commit | Commit payload |
| 0x300002 | Txn | Original transaction payload |
| 0x300003 | IndexRoot | Binary index root descriptor |
| 0x300004 | IndexBranch | Index branch manifest |
| 0x300005 | IndexLeaf | Index leaf file |
| 0x300006 | DictBlob | Dictionary artifact |
| 0x300007 | DefaultContext | Default JSON-LD @context |

String representation

The canonical string form is base32-lower multibase (the familiar bafy… / bafk… prefixes from IPFS/IPLD). This is the form used in JSON APIs, logs, nameservice records, and CLI output.

bafybeigdyr...   (commit CID)
bafkreihdwd...   (index root CID)

Binary representation

The compact binary form (varint version + varint codec + multihash bytes) is used for:

  • On-wire pack streams
  • Internal caches and indexes
  • Embedded references inside commit payloads

Creating a ContentId

A ContentId is derived by hashing the canonical bytes of an artifact with SHA-256, then wrapping the digest as a CIDv1 with the appropriate multicodec:

#![allow(unused)]
fn main() {
use std::str::FromStr;
use fluree_db_core::content_id::{ContentId, ContentKind};

// Placeholder standing in for the canonical commit bytes
let bytes: &[u8] = b"canonical commit bytes";
let cid = ContentId::from_bytes(ContentKind::Commit, bytes);

// String form for JSON/logs
let s = cid.to_string(); // "bafybeig..."

// Parse back
let parsed = ContentId::from_str(&s).expect("valid CID string");
assert_eq!(cid, parsed);
}

ContentId in commit references

Commits reference parents and related artifacts by ContentId only—never by storage addresses:

{
  "t": 42,
  "previous": "bafybeigdyr...commitParent",
  "txn": "bafkreihdwd...txnBlob",
  "index": "bafybeigdyr...indexRoot"
}

ContentKind

ContentKind is an enum that maps 1:1 to multicodec values. It serves two purposes:

  1. Embedded in CIDs: the multicodec tag lets stores, caches, and validators identify what an object is without parsing its bytes.
  2. Routing: the ContentStore uses ContentKind to route objects to the appropriate storage tier (commit store vs index store).

#![allow(unused)]
fn main() {
pub enum ContentKind {
    Commit,
    Txn,
    IndexRoot,
    IndexBranch,
    IndexLeaf,
    DictBlob,
    DefaultContext,
}
}

Routing by kind (replaces URL parsing)

Previously, storage routing parsed URL path segments (e.g., looking for "/commit/" in an address string). With ContentId, routing is explicit:

  • Commit + Txn → commit-tier store(s)
  • IndexRoot + IndexBranch + IndexLeaf + DictBlob → index-tier store(s)
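
A minimal sketch of that routing as an exhaustive match (the `Tier` enum and the placement of `DefaultContext` are assumptions; the text above only specifies the commit and index tiers):

```rust
// Kind-based routing: explicit and exhaustive, unlike URL-substring parsing.
#[derive(Debug, PartialEq)]
enum Tier { Commit, Index }

enum ContentKind { Commit, Txn, IndexRoot, IndexBranch, IndexLeaf, DictBlob, DefaultContext }

fn tier_for(kind: &ContentKind) -> Tier {
    match kind {
        ContentKind::Commit | ContentKind::Txn => Tier::Commit,
        ContentKind::IndexRoot
        | ContentKind::IndexBranch
        | ContentKind::IndexLeaf
        | ContentKind::DictBlob => Tier::Index,
        // Assumption: default contexts ride with the commit tier.
        ContentKind::DefaultContext => Tier::Commit,
    }
}

fn main() {
    assert_eq!(tier_for(&ContentKind::Txn), Tier::Commit);
    assert_eq!(tier_for(&ContentKind::IndexLeaf), Tier::Index);
}
```

Because the compiler enforces exhaustiveness, adding a new ContentKind variant forces every routing site to decide its tier, which a string-based "/commit/" check could never guarantee.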

ContentStore trait

ContentStore provides content-addressed get/put operations keyed by ContentId:

#![allow(unused)]
fn main() {
#[async_trait]
pub trait ContentStore: Debug + Send + Sync {
    /// Retrieve bytes by content ID
    async fn get(&self, id: &ContentId) -> Result<Vec<u8>>;

    /// Store bytes, returning the computed ContentId
    async fn put(&self, kind: ContentKind, bytes: &[u8]) -> Result<ContentId>;

    /// Check whether an object exists
    async fn has(&self, id: &ContentId) -> Result<bool>;
}
}

Relationship to Storage trait

ContentStore is the primary abstraction for immutable object access. The Storage / StorageRead / ContentAddressedWrite traits handle address-routed I/O for the underlying storage backends (filesystem, S3, etc.), while ContentStore provides the content-addressed layer on top.

Implementations

  • MemoryContentStore: In-memory HashMap<ContentId, Vec<u8>> for testing.
  • BridgeContentStore: Adapter that wraps a Storage implementation, mapping ContentIds to physical storage addresses.
  • Filesystem / S3 / IPFS: Direct implementations that store objects keyed by CID.

Layered composition

ContentStore implementations can be layered:

Local cache (filesystem)
    ↓ miss
Shared store (S3 / IPFS / shared filesystem)

Reads fall through from cache to shared store. Writes go to both (policy-configurable).
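
The fall-through read and write-through write can be sketched with a synchronous, simplified store trait (the real ContentStore is async and keyed by ContentId; string keys and the `Store` trait here are illustrative only):

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Simplified synchronous stand-in for ContentStore.
trait Store {
    fn get(&self, id: &str) -> Option<Vec<u8>>;
    fn put(&self, id: &str, bytes: &[u8]);
}

struct MemStore { map: RefCell<HashMap<String, Vec<u8>>> }
impl MemStore { fn new() -> Self { MemStore { map: RefCell::new(HashMap::new()) } } }
impl Store for MemStore {
    fn get(&self, id: &str) -> Option<Vec<u8>> { self.map.borrow().get(id).cloned() }
    fn put(&self, id: &str, bytes: &[u8]) { self.map.borrow_mut().insert(id.into(), bytes.to_vec()); }
}

struct Layered<'a> { cache: &'a dyn Store, shared: &'a dyn Store }
impl<'a> Store for Layered<'a> {
    fn get(&self, id: &str) -> Option<Vec<u8>> {
        // Read path: cache first, fall through to shared store, then backfill the cache.
        if let Some(b) = self.cache.get(id) { return Some(b); }
        let b = self.shared.get(id)?;
        self.cache.put(id, &b);
        Some(b)
    }
    fn put(&self, id: &str, bytes: &[u8]) {
        // Write path: write-through to both tiers (policy-configurable in the real system).
        self.cache.put(id, bytes);
        self.shared.put(id, bytes);
    }
}

fn main() {
    let cache = MemStore::new();
    let shared = MemStore::new();
    shared.put("bafy-example", b"blob");
    let layered = Layered { cache: &cache, shared: &shared };
    assert_eq!(layered.get("bafy-example"), Some(b"blob".to_vec()));
    // After the miss, the cache has been backfilled.
    assert_eq!(cache.get("bafy-example"), Some(b"blob".to_vec()));
}
```

Content-addressing is what makes the backfill safe: a cached object can never go stale, because its key is its hash.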

How ContentId flows through the system

Transaction path

  1. Transactor produces commit bytes
  2. ContentId::from_bytes(ContentKind::Commit, &bytes) computes the CID
  3. content_store.put(Commit, &bytes) stores the blob
  4. Nameservice head is updated: commit_head_id = cid, commit_t = t

Index path

  1. Indexer builds binary index, producing root descriptor bytes
  2. ContentId::from_bytes(ContentKind::IndexRoot, &root_bytes) computes the CID
  3. All artifacts (branches, leaves, dicts) are stored via content_store.put()
  4. Nameservice index head is updated: index_head_id = cid, index_t = t

Query path

  1. Query engine reads nameservice to get index_head_id
  2. content_store.get(&index_head_id) fetches the index root
  3. Index root references branches/leaves/dicts by their ContentIds
  4. Each artifact is fetched via content_store.get() (with caching)

Replication path (clone/pull/push)

  1. Client fetches remote nameservice heads (ContentIds + watermarks)
  2. Client sends have[] / want[] roots to server
  3. Server walks commit chain and (optionally) index graph to compute missing objects
  4. Missing objects streamed as (ContentId, bytes) pairs
  5. Client stores objects in local ContentStore and advances local nameservice heads

No address rewriting is needed because commits contain no storage addresses.
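
The gap computation in step 3 reduces to walking parent links back from the wanted head until a commit the client already has; a hypothetical sketch (CID strings and the parent map are illustrative):

```rust
use std::collections::{HashMap, HashSet};

// Walk back from the wanted head, collecting commits the client lacks.
// parents maps child CID -> parent CID (None at the genesis commit).
fn missing_commits(
    parents: &HashMap<&str, Option<&str>>,
    want: &str,
    have: &HashSet<&str>,
) -> Vec<String> {
    let mut out = Vec::new();
    let mut cur = Some(want);
    while let Some(cid) = cur {
        if have.contains(cid) { break; } // client already has this and everything below
        out.push(cid.to_string());
        cur = parents.get(cid).copied().flatten();
    }
    out
}

fn main() {
    // Hypothetical three-commit chain c1 <- c2 <- c3.
    let parents: HashMap<&str, Option<&str>> =
        HashMap::from([("c1", None), ("c2", Some("c1")), ("c3", Some("c2"))]);
    let have: HashSet<&str> = HashSet::from(["c1"]);
    // Client has c1, wants c3: stream c3 and c2 (newest first).
    assert_eq!(missing_commits(&parents, "c3", &have), vec!["c3", "c2"]);
}
```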

Implementation status

  • ContentId type and ContentKind enum: fluree-db-core/src/content_id.rs
  • ContentStore trait + MemoryContentStore + bridge adapter: fluree-db-core/src/storage.rs
  • Commit and CommitRef use ContentId for all references (index pointers are tracked exclusively via nameservice, not embedded in commits)
  • Nameservice records use head_commit_id / index_head_id as ContentId values
  • IndexRoot (FIR6) references all artifacts by ContentId
  • Transact and indexer paths use ContentStore for all object I/O

Binary index format (leaf / leaflet / dictionaries)

This document describes the on-disk / blob-store formats used by Fluree’s binary indexes: the branch → leaf → leaflet hierarchy for fact indexes, and the dictionary artifacts used to translate between IRIs/strings and compact numeric IDs.

The intent is to make the formats easy to reason about (for debugging and tooling) and to highlight why leaf files contain multiple leaflets: it materially improves performance and cost characteristics on blob/object storage by reducing object counts and request rates while preserving fine-grained decompression and caching at the leaflet level.

Overview

A binary index build produces:

  • Per-graph, per-sort-order fact indexes:
    • a content-addressed branch manifest (FBR3, file extension .fbr)
    • a set of content-addressed leaf files (FLI3, file extension .fli)
    • each leaf contains multiple leaflets (compressed blocks with independently compressed regions)
  • Shared dictionary artifacts:
    • small dictionaries (predicates, graphs, datatypes, languages) embedded in the index root (CAS) and/or persisted as flat files in local builds
    • large dictionaries (subjects, strings) stored as CoW single-level B-tree-like trees (a branch manifest DTB1 + multiple leaf blobs DLF1/DLR1)
  • Manifests / roots that describe how to load the above either from a local directory layout or from the content store via IndexRoot (FIR6 binary format, CID-based).

Fact indexes exist in up to four sort orders (see RunSortOrder):

  • SPOT: (g, s, p, o, dt, t, op)
  • PSOT: (g, p, s, o, dt, t, op)
  • POST: (g, p, o, dt, s, t, op)
  • OPST: (g, o, dt, p, s, t, op)

Design goals

  • Blob-store efficiency: keep object counts low and object sizes in a “healthy” range for S3/GCS/Azure-like stores, avoiding “many tiny objects” request overhead.
  • Fast routing: branch manifest enables binary search routing to the relevant leaf range(s).
  • Cheap decompression: leaflets are internally structured so query paths can decompress only what they need (e.g., Region 1 to filter before paying for Region 2).
  • Content-addressed immutability: leaves/branches/dict leaves can be cached aggressively and safely, because their CAS address (or content hash filename) uniquely identifies content.
  • Simple versioning: each binary artifact begins with a magic + version and can be rejected early if incompatible.

Terminology

  • Leaflet: a compressed block of rows (default build target: leaflet_rows = 25_000).
  • Leaf: a container of multiple leaflets (default: leaflets_per_leaf = 10) plus a directory for random access to its leaflets.
  • Branch manifest: maps key ranges to leaf files; used for routing.
  • Region: a separately compressed section inside a leaflet.
  • Dictionary tree: a DTB1 branch + DLF1/DLR1 leaves for large keyspaces (subjects/strings).
  • ContentId: a CIDv1 value that uniquely identifies a content-addressed artifact by its hash and type. See ContentId and ContentStore.

Physical layout (local build output)

When built to a filesystem directory (see IndexBuildConfig), the output layout is:

index/
  index_manifest_spot.json
  index_manifest_psot.json
  index_manifest_post.json
  index_manifest_opst.json
  graph_<g_id>/
    spot/
      <branch_hash>.fbr
      <leaf_hash_0>.fli
      <leaf_hash_1>.fli
      ...
    psot/
      ...
    post/
      ...
    opst/
      ...

The .fbr and .fli files are content-addressed by SHA-256 hex of their bytes (the filename is the hash). index_manifest_<order>.json is a small routing manifest that points to the per-graph directory and branch hash.

Per-order index manifest (index_manifest_<order>.json)

The per-order manifest is JSON and summarizes all graphs for a sort order:

  • total_rows: total indexed asserted facts for that order
  • max_t: max transaction t in the indexed snapshot
  • graphs[]: g_id, leaf_count, total_rows, branch_hash, and directory (relative path)

Root descriptor (CAS): IndexRoot (FIR6)

When publishing an index to nameservice / CAS, the canonical entrypoint is the FIR6 root (IndexRoot, binary wire format, magic bytes FIR6).

Key properties:

  • CID references for all artifacts (dicts, branches, leaves).
  • Deterministic binary encoding so the root itself is suitable for content hashing to derive its own ContentId.
  • Tracks index_t (max transaction covered) and base_t (earliest time for which Region 3 history is valid).
  • Embeds predicate ID mapping and namespace prefix table inline, so query-time predicate IRI → p_id translation does not require fetching a redundant predicate dictionary blob.
  • Embeds small dictionaries (graphs, datatypes, languages) inline, so query-time graph/dt/lang resolution does not require fetching tiny dict blobs (important for S3 cold starts).
  • Default graph routing is inline: leaf entries (first/last key, row count, leaf CID) are embedded directly, avoiding an extra branch fetch for the common single-graph case.
  • Named graph routing uses branch CID pointers: larger multi-graph setups reference branch manifests by CID.
  • Optional binary sections for stats, schema, prev_index (GC chain), garbage manifest, and sketch (HLL).
  • Import-only performance hint: IndexRoot.lex_sorted_string_ids indicates whether StringId assignment preserves lexicographic UTF-8 byte order of strings (true for bulk imports). Query execution can use this to avoid materializing simple string values during ORDER BY comparisons. This flag must be cleared on the first post-import write because incremental dictionary appends break the invariant. When the flag is absent (older roots) or false, query execution must assume no lexical ordering.

At a high level the root contains:

  • Inline small dictionaries (embedded in the binary root):
    • graph_iris[] (dict_index → graph IRI; g_id = dict_index + 1)
    • datatype_iris[] (dt_id → datatype IRI)
    • language_tags[] (lang_id-1 → tag string; lang_id = index + 1, 0 = “no tag”)
  • Dictionary ContentIds (CAS artifacts):
    • tree blobs: subject/string forward & reverse (DTB1 branch + DLF1/DLR1 leaves)
    • optional per-predicate numbig arenas
    • optional per-predicate vector arenas (manifest + shards)
  • Default graph routing (inline leaf entries per sort order)
  • Named graph routing (branch CIDs per sort order per graph)

Branch manifest (FBR3, .fbr)

A branch manifest is a single-level index mapping key ranges to leaf files. It is written per graph per order and read via binary search to route a lookup/range scan.

File format

[BranchHeader: 16 bytes]
  magic: "FBR3" (4B)
  version: u8
  _pad: [u8; 3]
  leaf_count: u32
  _reserved: u32
[LeafEntries: leaf_count × 104 bytes]
  first_key: key bytes (44B, little-endian)  [1]
  last_key:  key bytes (44B, little-endian)  [1]
  row_count: u64
  path_offset: u32
  path_len: u16
  _pad: u16
[PathTable]
  Concatenated UTF-8 relative paths (typically "<leaf_hash>.fli")

Notes:

  • first_key and last_key use the same 44-byte key wire encoding produced by the index builder (see footnote [1]).
  • The path table stores relative filenames; on read, paths are resolved against the .fbr’s directory.
  • In local builds, paths are <leaf_hash>.fli to match the content-addressed leaf filenames.

[1] Key encoding note (internal): the 44-byte key is the RunRecord wire layout used by the import/index-build pipeline and stored here only for routing. It is an internal build artifact detail (not a core runtime fact type).

Leaf file (FLI3, .fli)

A leaf file groups multiple leaflets into a single blob, and includes a small directory so leaflets can be accessed without scanning the entire file.

File format

[LeafHeader: variable size]
  magic: "FLI3" (4B)
  version: u8          (currently 1)
  order: u8
  dt_width: u8         (currently 1; may widen to 2)
  p_width: u8          (2=u16, 4=u32)
  total_rows: u64
  first_key: SortKey (28B)
  last_key:  SortKey (28B)
  [LeafletDirectory: leaflet_count × 40B]    (v2: 28B, lacks first_o_*)
    offset: u64
    compressed_len: u32
    row_count: u32
    first_s_id: u64
    first_p_id: u32
    first_o_kind: u8   (v3+)
    _pad: [u8; 3]      (v3+)
    first_o_key: u64   (v3+)
[LeafletData: concatenated encoded leaflets]

The v3 leaflet directory adds first_o_kind and first_o_key to each entry. These fields enable leaflet-boundary skip-decoding: if two adjacent leaflet directory entries share the same (p_id, o_kind, o_key), the entire earlier leaflet is guaranteed to contain only that (p, o) combination. Fast-path COUNT + GROUP BY operators use this property to count rows by row_count without decompressing Region 1, which significantly reduces CPU and I/O for large predicate scans. v2 leaves (which lack these fields) are still readable but always require full leaflet decoding.
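
The boundary check itself is a field comparison between adjacent directory entries; a minimal sketch (struct fields per the v3 directory above):

```rust
// Subset of the v3 leaflet directory entry relevant to skip-decoding.
struct LeafletDirEntry {
    row_count: u32,
    first_p_id: u32,
    first_o_kind: u8,
    first_o_key: u64,
}

// If the next leaflet starts at the same (p, o) as this one, every row in
// this leaflet carries that (p, o): count via row_count, skip decompression.
fn can_skip_decode(cur: &LeafletDirEntry, next: &LeafletDirEntry) -> bool {
    cur.first_p_id == next.first_p_id
        && cur.first_o_kind == next.first_o_kind
        && cur.first_o_key == next.first_o_key
}

fn main() {
    let a = LeafletDirEntry { row_count: 25_000, first_p_id: 7, first_o_kind: 1, first_o_key: 42 };
    let b = LeafletDirEntry { row_count: 25_000, first_p_id: 7, first_o_kind: 1, first_o_key: 42 };
    let c = LeafletDirEntry { row_count: 25_000, first_p_id: 7, first_o_kind: 1, first_o_key: 43 };
    assert!(can_skip_decode(&a, &b));  // whole leaflet counted without decoding
    assert!(!can_skip_decode(&b, &c)); // boundary crosses a (p, o) change: must decode
}
```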

SortKey (leaf routing key)

SortKey is a compact 28-byte key stored in leaf headers:

g_id: u32
s_id: u64
p_id: u32
dt:  u16
o_kind: u8
_pad: u8
o_key: u64

SortKey exists to reduce leaf header overhead; the branch manifest uses full RunRecord boundaries. It also intentionally omits t, op, lang_id, and i — leaf header keys are useful for coarse metadata and diagnostics, while precise routing is done via the branch’s full RunRecord ranges.
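
Packing the fields in the documented order (little-endian per the endianness conventions below, one pad byte after o_kind) yields exactly 28 bytes; a sketch:

```rust
// 28-byte SortKey wire packing: 4 + 8 + 4 + 2 + 1 + 1(pad) + 8 = 28.
struct SortKey { g_id: u32, s_id: u64, p_id: u32, dt: u16, o_kind: u8, o_key: u64 }

impl SortKey {
    fn encode(&self) -> [u8; 28] {
        let mut buf = [0u8; 28];
        buf[0..4].copy_from_slice(&self.g_id.to_le_bytes());
        buf[4..12].copy_from_slice(&self.s_id.to_le_bytes());
        buf[12..16].copy_from_slice(&self.p_id.to_le_bytes());
        buf[16..18].copy_from_slice(&self.dt.to_le_bytes());
        buf[18] = self.o_kind;
        // buf[19] is the _pad byte, left zero
        buf[20..28].copy_from_slice(&self.o_key.to_le_bytes());
        buf
    }
}

fn main() {
    let k = SortKey { g_id: 1, s_id: 42, p_id: 7, dt: 3, o_kind: 2, o_key: 99 };
    let b = k.encode();
    assert_eq!(b.len(), 28);
    assert_eq!(b[0..4], 1u32.to_le_bytes());
    assert_eq!(b[18], 2);
    assert_eq!(b[20..28], 99u64.to_le_bytes());
}
```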

Why “leaf contains leaflets” (blob-store optimization)

If every leaflet were its own object:

  • range scans and joins would issue many more GETs (request overhead dominates)
  • caches would be pressured by object metadata overhead and higher churn

By grouping N leaflets into one leaf object:

  • we reduce object count and request rate roughly by a factor of N
  • we still keep leaflet-sized “micro-partitions” internally for:
    • selective decompression (region-by-region)
    • caching hot leaflets (decoded) independent of unrelated ones
    • future optimizations like ranged reads (leaflet offsets are explicit)

The default build targets (leaflet_rows = 25_000, leaflets_per_leaf = 10) yield a leaf that is large enough to amortize object-store overhead but still small enough to cache and move efficiently.

Leaflet format (compressed block inside a leaf)

A leaflet is a compressed block of rows containing three regions. Each region is independently zstd-compressed.

Leaflet header (fixed 61 bytes)

row_count: u32
region1_offset: u32
region1_compressed_len: u32
region1_uncompressed_len: u32
region2_offset: u32
region2_compressed_len: u32
region2_uncompressed_len: u32
region3_offset: u32
region3_compressed_len: u32
region3_uncompressed_len: u32
first_s_id: u64
first_p_id: u32
first_o_kind: u8
first_o_key: u64

Regions

  • Region 1 (core columns): order-dependent layout optimized for scan/join filtering.
    • includes an RLE-encoded “primary” column (e.g., s_id in SPOT)
    • stores the other core columns as dense arrays
    • p_id may be stored as u16 or u32 depending on dictionary cardinality (p_width)
  • Region 2 (metadata columns): values needed to reconstruct full flakes (datatype, transaction time, etc.).
    • stored in a layout that supports sparse lang_id and i without per-row overhead
    • dt is stored as u8 today (dt_width = 1) and may widen to u16
  • Region 3 (history journal): optional operation log to support time-travel semantics from base_t onward.
    • stored as a sequence of fixed-size entries in reverse chronological order (newest first)

Region 1 layouts (uncompressed)

Region 1’s uncompressed bytes vary by sort order:

  • SPOT: RLE(s_id:u64), p_id[p_width], o_kind[u8], o_key[u64]
  • PSOT: RLE(p_id:u32), s_id[u64], o_kind[u8], o_key[u64]
  • POST: RLE(p_id:u32), o_kind[u8], o_key[u64], s_id[u64]
  • OPST: RLE(o_key:u64), p_id[p_width], s_id[u64]
    • OPST leaflets are type-homogeneous (segmented by o_type), so the per-row object type column can be omitted and stored as a constant in the leaflet directory entry. When a leaflet contains mixed types in other orders, o_type is stored as a per-row column.

RLE encoding is:

run_count: u32
[(key, run_len)] × run_count

with (key=u64, run_len=u32) or (key=u32, run_len=u32) depending on the field.
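
Decoding for the (key=u64, run_len=u32) variant is a straightforward expansion loop; a sketch (little-endian per the conventions below):

```rust
// Decode an RLE column: run_count, then (key: u64 LE, run_len: u32 LE) pairs.
fn decode_rle_u64(bytes: &[u8]) -> Vec<u64> {
    let run_count = u32::from_le_bytes(bytes[0..4].try_into().unwrap()) as usize;
    let mut out = Vec::new();
    let mut pos = 4;
    for _ in 0..run_count {
        let key = u64::from_le_bytes(bytes[pos..pos + 8].try_into().unwrap());
        let run_len = u32::from_le_bytes(bytes[pos + 8..pos + 12].try_into().unwrap());
        pos += 12;
        out.extend(std::iter::repeat(key).take(run_len as usize));
    }
    out
}

fn main() {
    // Two runs: key 5 × 3 rows, key 9 × 2 rows.
    let mut buf = 2u32.to_le_bytes().to_vec();
    buf.extend(5u64.to_le_bytes());
    buf.extend(3u32.to_le_bytes());
    buf.extend(9u64.to_le_bytes());
    buf.extend(2u32.to_le_bytes());
    assert_eq!(decode_rle_u64(&buf), vec![5, 5, 5, 9, 9]);
}
```

RLE pays off on the primary column because rows are sorted: in SPOT order, all facts for one subject form a single run regardless of how many predicates it carries.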

Region 2 layout (uncompressed)

dt: [dt_width bytes] × row_count
t:  [i64] × row_count
lang_bitmap:  u8 × ceil(row_count/8)
lang_values:  u16 × popcount(lang_bitmap)
i_bitmap:     u8 × ceil(row_count/8)
i_values:     i32 × popcount(i_bitmap)

  • lang_id is 0 when absent; otherwise stored in lang_values keyed by bitmap position.
  • i uses ListIndex::none() (sentinel) when absent; otherwise stored sparsely.
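
Expanding a sparse column back to per-row values is a bitmap walk; a sketch for lang_id (the LSB-first bit order within each bitmap byte is an assumption of this illustration):

```rust
// Expand the sparse lang_id column: bitmap marks rows with a tag; tagged rows
// consume values from lang_values in order. Assumes LSB-first bits per byte.
fn sparse_lang_ids(bitmap: &[u8], values: &[u16], row_count: usize) -> Vec<u16> {
    let mut out = vec![0u16; row_count]; // 0 = "no tag"
    let mut vi = 0;
    for row in 0..row_count {
        if bitmap[row / 8] & (1 << (row % 8)) != 0 {
            out[row] = values[vi];
            vi += 1;
        }
    }
    out
}

fn main() {
    // Rows 1 and 4 carry language tags; the other four rows have none.
    let bitmap = [0b0001_0010u8];
    let values = [7u16, 9u16];
    assert_eq!(sparse_lang_ids(&bitmap, &values, 6), vec![0, 7, 0, 0, 9, 0]);
}
```

The same bitmap-plus-values scheme applies to i, so rows without language tags or list indexes cost one bit each instead of a full field.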

Region 3 layout (uncompressed)

Region 3 is an operation journal stored newest-first:

entry_count: u32
[Region3Entry; entry_count]    // 37 bytes per entry

Region3Entry wire layout (37 bytes):

s_id: u64
p_id: u32
o_kind: u8
o_key: u64
t_signed: i64      // positive = assert, negative = retract, abs() = t
dt: u16
lang_id: u16
i: i32
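
A sketch parser for the 37-byte entry, assuming fields are packed little-endian in the documented order with no padding (note how the assert/retract flag is folded into the sign of t_signed):

```rust
// Decoded Region 3 entry; is_assert is recovered from the sign of t_signed.
struct Region3Entry {
    s_id: u64, p_id: u32, o_kind: u8, o_key: u64,
    t: i64, is_assert: bool, dt: u16, lang_id: u16, i: i32,
}

fn parse_entry(b: &[u8; 37]) -> Region3Entry {
    let t_signed = i64::from_le_bytes(b[21..29].try_into().unwrap());
    Region3Entry {
        s_id: u64::from_le_bytes(b[0..8].try_into().unwrap()),
        p_id: u32::from_le_bytes(b[8..12].try_into().unwrap()),
        o_kind: b[12],
        o_key: u64::from_le_bytes(b[13..21].try_into().unwrap()),
        t: t_signed.abs(),        // abs() = transaction t
        is_assert: t_signed > 0,  // positive = assert, negative = retract
        dt: u16::from_le_bytes(b[29..31].try_into().unwrap()),
        lang_id: u16::from_le_bytes(b[31..33].try_into().unwrap()),
        i: i32::from_le_bytes(b[33..37].try_into().unwrap()),
    }
}

fn main() {
    let mut b = [0u8; 37];
    b[0..8].copy_from_slice(&7u64.to_le_bytes());       // s_id
    b[8..12].copy_from_slice(&3u32.to_le_bytes());      // p_id
    b[12] = 1;                                          // o_kind
    b[13..21].copy_from_slice(&42u64.to_le_bytes());    // o_key
    b[21..29].copy_from_slice(&(-10i64).to_le_bytes()); // retract at t = 10
    b[29..31].copy_from_slice(&2u16.to_le_bytes());     // dt
    let e = parse_entry(&b);
    assert_eq!((e.s_id, e.p_id, e.o_kind, e.o_key), (7, 3, 1, 42));
    assert_eq!((e.t, e.is_assert, e.dt), (10, false, 2));
}
```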

Dictionary artifacts

Binary indexes store facts in numeric-ID form. Dictionaries are required to:

  • translate query inputs (IRIs, strings) to numeric IDs for scans
  • decode numeric IDs back to user-visible values when returning flakes

Small flat dictionaries (FRD1)

Several dictionaries use a simple “count + length-prefixed UTF-8” format:

magic: "FRD1" (4B)
count: u32
for each entry:
  len: u32
  utf8_bytes: [u8; len]

This format is used for predicate-like dictionaries. In local builds these are written as flat files (e.g., graphs.dict, datatypes.dict, languages.dict), but in CAS publishes (FIR6 root) these small dictionaries are embedded inline in the binary root.
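
The format is simple enough to parse in a few lines; a sketch (count and length are taken as little-endian per the endianness conventions below):

```rust
// Parse an FRD1 blob: magic, count, then count length-prefixed UTF-8 entries.
// Returns None on a bad magic or malformed UTF-8.
fn parse_frd1(bytes: &[u8]) -> Option<Vec<String>> {
    if bytes.len() < 8 || &bytes[0..4] != b"FRD1" {
        return None;
    }
    let count = u32::from_le_bytes(bytes[4..8].try_into().ok()?) as usize;
    let mut entries = Vec::with_capacity(count);
    let mut pos = 8;
    for _ in 0..count {
        let len = u32::from_le_bytes(bytes.get(pos..pos + 4)?.try_into().ok()?) as usize;
        pos += 4;
        entries.push(String::from_utf8(bytes.get(pos..pos + len)?.to_vec()).ok()?);
        pos += len;
    }
    Some(entries)
}

fn main() {
    // Build a tiny two-entry dictionary (e.g. language tags).
    let mut buf = b"FRD1".to_vec();
    buf.extend(2u32.to_le_bytes());
    for s in ["en", "fr"] {
        buf.extend((s.len() as u32).to_le_bytes());
        buf.extend(s.as_bytes());
    }
    assert_eq!(parse_frd1(&buf), Some(vec!["en".to_string(), "fr".to_string()]));
    assert_eq!(parse_frd1(b"XXXX"), None); // wrong magic rejected early
}
```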

Legacy forward files + index (FSI1) (primarily build-time)

Some build paths still write a forward file (*.fwd) plus a separate index (*.idx):

FSI1 index format:

magic: "FSI1" (4B)
count: u32
offsets: [u64] × count
lens:    [u32] × count

The forward file itself is a raw concatenation of bytes; access is via (offset,len) from the index.

Large dictionaries as CoW trees (DTB1 + leaf blobs)

Subjects and strings are large enough that we represent them as single-level CoW trees:

  • Branch: DTB1 mapping key ranges to leaf ContentIds
  • Leaves:
    • forward leaf (DLF1): numeric ID → value bytes
    • reverse leaf (DLR1): key bytes → numeric ID

Dictionary branch (DTB1)

[magic: 4B "DTB1"]
[leaf_count: u32]
[offset_table: u32 × leaf_count]  // byte offset of each leaf entry
[leaf entries...]
  entry :=
    [first_key_len: u32] [first_key_bytes]
    [last_key_len: u32]  [last_key_bytes]
    [entry_count: u32]
    [content_id_len: u16]   [content_id_bytes]

Keys are treated as raw bytes and compared lexicographically. For forward trees keyed by numeric ID, the branch uses 8-byte big-endian keys (so lexical order matches numeric order).

Forward dict leaf (DLF1)

[magic: 4B "DLF1"]
[entry_count: u32]
[offset_table: u32 × entry_count]
[data section]
  entry := [id: u64 LE] [value_len: u32] [value_bytes]

Reverse dict leaf (DLR1)

[magic: 4B "DLR1"]
[entry_count: u32]
[offset_table: u32 × entry_count]
[data section]
  entry := [key_len: u32] [key_bytes] [id: u64 LE]

Subject reverse key format is:

[ns_code: u16 BE][suffix bytes]

The u16 big-endian prefix ensures that lexicographic byte comparisons match logical (ns_code, suffix) ordering.
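
A quick sketch showing why the big-endian prefix matters for byte-wise comparison:

```rust
// Build a subject reverse key: [ns_code: u16 BE][suffix bytes].
fn reverse_key(ns_code: u16, suffix: &[u8]) -> Vec<u8> {
    let mut k = ns_code.to_be_bytes().to_vec(); // BE so ns_code dominates the sort
    k.extend_from_slice(suffix);
    k
}

fn main() {
    // ns_code ordering wins over the suffix, matching logical (ns_code, suffix) order.
    assert!(reverse_key(1, b"zzz") < reverse_key(2, b"aaa"));
    // BE keeps multi-byte codes ordered: 255 = [0x00, 0xFF] sorts before 256 = [0x01, 0x00].
    assert!(reverse_key(255, b"x") < reverse_key(256, b"x"));
    // A little-endian prefix would invert that comparison (255 LE = [0xFF, 0x00]).
    assert!(255u16.to_le_bytes() > 256u16.to_le_bytes());
}
```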

Endianness and encoding conventions

  • Numeric fields in file formats are little-endian, unless explicitly stated otherwise.
  • Subject reverse keys embed ns_code in big-endian for byte-sort correctness.
  • Compression is currently zstd via independent region compression within a leaflet.
  • Fact rows are keyed by numeric IDs; ID assignment is provided by dictionary artifacts and/or the root.

Integrity, caching, and lifecycle

  • Leaf and branch filenames (local) are derived from SHA-256 content hashes; remote references use ContentId (CIDv1).
  • Content-addressed artifacts are immutable; caches can key by ContentId.
  • IndexRoot (FIR6) provides a GC chain (prev_index) and an optional garbage manifest pointer to support retention-based cleanup of replaced artifacts.

Versioning notes

  • Fact artifacts:
    • branch: magic FBR3, version 1
    • leaf: magic FLI3, version 1
  • Dictionary tree artifacts:
    • branch: magic DTB1
    • leaves: magic DLF1 / DLR1
  • Small dict blobs: magic FRD1

When adding new fields, prefer:

  • bumping the per-file version byte (when present), and
  • keeping old readers strict (fail fast on unsupported versions) to avoid silent corruption.

Namespace allocation and fallback modes

Fluree encodes IRIs as compact SIDs: a (ns_code, local) pair where:

  • ns_code is a u16 namespace code that identifies an IRI prefix
  • local is the remaining suffix (bytes) after removing the matched prefix

The database maintains a namespace table (LedgerSnapshot.namespace_codes: ns_code -> prefix string). That table is embedded in the published index root and is loaded whenever a LedgerSnapshot is opened.

This document describes how Fluree chooses a namespace prefix for an IRI, and how it mitigates datasets that would otherwise allocate an excessive number of distinct namespace prefixes.

Goals

  • Keep declared namespaces intact: if a dataset declares @prefix foo: <...>, we want IRIs in that namespace to use that exact prefix, not a derived/split prefix.
  • Stable behavior across writes: after importing an “outlier” dataset, subsequent transactions should continue using the same fallback rules for previously unseen IRIs (e.g. new hosts), avoiding regression back to finer-grained splitting.
  • Contain namespace explosion: avoid allocating one namespace code per highly-specific leaf (e.g. splitting on the last / for IRIs whose paths are effectively unique).

Core rule: declared-prefix trie match wins

Namespace resolution is trie-first:

  1. Load all known prefixes (predefined defaults + DB namespace table) into a byte-level trie.
  2. For each IRI, perform a longest-prefix match.
  3. If a match is found, emit Sid(ns_code, iri[prefix_len..]) and do not run fallback logic.

Only IRIs with no matching prefix fall through to the fallback splitter.
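
The longest-prefix rule can be sketched with a linear scan standing in for the real byte-level trie (the prefixes and ns_code assignments below are hypothetical):

```rust
// Longest-prefix match over declared namespaces; returns (ns_code, prefix_len)
// so the caller can emit Sid(ns_code, iri[prefix_len..]).
fn longest_prefix(prefixes: &[(u16, &str)], iri: &str) -> Option<(u16, usize)> {
    prefixes
        .iter()
        .filter(|(_, p)| iri.starts_with(*p))
        .max_by_key(|(_, p)| p.len())
        .map(|(code, p)| (*code, p.len()))
}

fn main() {
    let prefixes = [
        (1u16, "http://example.org/"),
        (2u16, "http://example.org/people/"),
    ];
    let iri = "http://example.org/people/alice";
    let (ns_code, prefix_len) = longest_prefix(&prefixes, iri).unwrap();
    // The longer declared prefix wins; the SID local part is the remainder.
    assert_eq!(ns_code, 2);
    assert_eq!(&iri[prefix_len..], "alice");
    // No match at all: this IRI would fall through to the fallback splitter.
    assert_eq!(longest_prefix(&prefixes, "urn:isbn:0451450523"), None);
}
```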

Implementation: fluree-db-transact/src/namespace.rs

  • NamespaceRegistry::sid_for_iri (transactions, serial paths)
  • SharedNamespaceAllocator::sid_for_iri (parallel bulk import)

Fallback split modes (only for unmatched IRIs)

Fluree uses a small set of fallback “splitters” that derive (prefix, local) for IRIs that do not match any known prefix.

The active fallback behavior is represented by NsFallbackMode:

  • LastSlashOrHash (default): split on the last / or # (prefix is inclusive)
  • CoarseHeuristic (outlier mitigation):
    • http(s): usually scheme://host/<seg1>/
    • special-case: DBLP-style .../pid/<digits>/ buckets may keep 2 segments
    • non-http(s) with : but no / or #: split at the 2nd : when present (e.g. urn:isbn:), else the 1st :
  • HostOnly (“fallback to the fallback”):
    • http(s): scheme://host/
    • non-http(s) with : but no / or #: split at the 1st :
    • else: last-slash-or-hash
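
Two of the splitters are simple enough to sketch directly; this covers LastSlashOrHash and the http(s) branch of HostOnly only (the urn/colon rules and CoarseHeuristic special cases are omitted, and the real implementations live in fluree-db-transact/src/namespace.rs):

```rust
// Returns the prefix length (inclusive of the separator) for the default split.
fn last_slash_or_hash(iri: &str) -> usize {
    iri.rfind(|c| c == '/' || c == '#').map(|i| i + 1).unwrap_or(iri.len())
}

// http(s) host-only split: scheme://host/; otherwise fall back to the default.
fn host_only(iri: &str) -> usize {
    if let Some(scheme_end) = iri.find("://") {
        let after = scheme_end + 3;
        if let Some(slash) = iri[after..].find('/') {
            return after + slash + 1;
        }
    }
    last_slash_or_hash(iri)
}

fn main() {
    let iri = "http://some-unseen-host/blah/123/456";
    // Default mode splits on the last '/': a highly specific, likely-unique prefix.
    assert_eq!(&iri[..last_slash_or_hash(iri)], "http://some-unseen-host/blah/123/");
    // HostOnly caps allocation at one namespace per host.
    assert_eq!(&iri[..host_only(iri)], "http://some-unseen-host/");
}
```

The example makes the explosion-containment trade-off concrete: under the default split every distinct path bucket allocates a namespace code, while HostOnly allocates at most one per host.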

Implementation: fluree-db-transact/src/namespace.rs

Bulk import: streaming preflight + dynamic mitigation

For large Turtle streaming imports, Fluree attempts to detect “namespace explosion” early without an extra I/O pass:

  1. StreamingTurtleReader samples bounded byte windows within the first chunk region and counts distinct prefixes under LastSlashOrHash.
  2. If the sample exceeds a budget (NS_PREFLIGHT_BUDGET, currently 255), the reader publishes a preflight result recommending mitigation.
  3. The import forwarder enables CoarseHeuristic on the shared allocator before parsing begins (so the earliest allocations are already coarse).
  4. If allocations under CoarseHeuristic still grow beyond the u8-ish threshold (>255), the shared allocator switches to HostOnly so new, unseen hosts do not allocate deeper-than-host namespaces.

Implementation:

  • Preflight detector: fluree-graph-turtle/src/splitter.rs
  • Policy application: fluree-db-api/src/import.rs
  • Runtime switch: SharedNamespaceAllocator::get_or_allocate in fluree-db-transact/src/namespace.rs

Transactions after import: preventing regression for unseen IRIs

Bulk import can upgrade fallback behavior at runtime (shared allocator). For subsequent normal transactions, we also need “outlier mode” to persist so new IRIs do not regress to LastSlashOrHash.

Fluree derives this from the DB’s namespace table at open time:

  • When a LedgerSnapshot is opened, NamespaceRegistry::from_db(db) loads db.namespace_codes.
  • If the DB has already allocated namespace codes beyond the u8-ish threshold (>255), the registry sets its fallback mode to HostOnly.

That means a new IRI like:

http://some-unseen-host/blah/123/456

will allocate (if needed) at:

http://some-unseen-host/

instead of falling back to a finer last-slash split.

Implementation: NamespaceRegistry::from_db and NamespaceRegistry::sid_for_iri in fluree-db-transact/src/namespace.rs

Notes and trade-offs

  • HostOnly can still result in many namespaces if a dataset genuinely contains many distinct hosts (one per host), but it prevents deeper fragmentation that is common in path-heavy IRIs.
  • The OVERFLOW namespace code is a sentinel used when u16 codes are exhausted; it is not a fallback mode. Overflow SIDs store the full IRI as the SID name.

Ontology imports (f:schemaSource + owl:imports)

Reasoning in Fluree needs to see a ledger’s ontology — class and property hierarchies, OWL axioms — even when those triples don’t live in the same graph as the instance data being queried. This document describes how that binding is configured, resolved, and plumbed into the reasoning pipeline.

Topics:

  • Config-layer contract (f:schemaSource, f:followOwlImports, f:ontologyImportMap).
  • Resolution algorithm for the owl:imports closure.
  • SchemaBundleOverlay — how the resolved closure is presented to the reasoner without changing reasoner internals.
  • Caching, error semantics, and the schema-triple whitelist.

Configuration

Reasoning config is declared in the ledger’s config graph (g_id=2), on the f:LedgerConfig resource’s f:reasoningDefaults. Three fields drive ontology resolution:

@prefix f:    <https://ns.flur.ee/db#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .

GRAPH <urn:fluree:myapp:main#config> {
  <urn:myapp:config> a f:LedgerConfig ;
    f:reasoningDefaults <urn:myapp:config:reasoning> .

  <urn:myapp:config:reasoning>
    f:reasoningModes ( "rdfs" "owl2-rl" ) ;
    f:schemaSource <urn:myapp:config:schema-ref> ;
    f:followOwlImports true ;
    f:ontologyImportMap <urn:myapp:config:bfo-binding> .

  <urn:myapp:config:schema-ref> a f:GraphRef ;
    f:graphSource <urn:myapp:config:schema-source> .
  <urn:myapp:config:schema-source>
    f:graphSelector <http://example.org/ontology/core> .

  <urn:myapp:config:bfo-binding>
    f:ontologyIri <http://purl.obolibrary.org/obo/bfo.owl> ;
    f:graphRef   <urn:myapp:config:bfo-ref> .
  <urn:myapp:config:bfo-ref> a f:GraphRef ;
    f:graphSource <urn:myapp:config:bfo-source> .
  <urn:myapp:config:bfo-source>
    f:graphSelector <http://example.org/ontology/local/bfo> .
}

Field reference:

  • f:schemaSource (f:GraphRef): Starting graph for schema extraction. When absent, reasoning uses the default graph directly.
  • f:followOwlImports (xsd:boolean): When true, resolve the transitive closure of owl:imports triples starting from f:schemaSource. When absent or false, the bundle contains only the starting graph.
  • f:ontologyImportMap (list of OntologyImportBinding): Mapping table from external ontology IRIs to local graphs. Consulted when an owl:imports IRI doesn’t match a named graph in the current ledger.

An OntologyImportBinding has two fields:

  • f:ontologyIri — the IRI that appears in owl:imports statements.
  • f:graphRef — a nested f:GraphRef identifying the local graph.

The GraphRef shape supported for f:schemaSource and f:ontologyImportMap.graphRef is the same-ledger shape: f:graphSelector naming a local named graph, f:defaultGraph, or a registered graph IRI. References are resolved at the query’s effective to_t — every named graph in a Fluree ledger shares the ledger’s monotonic t, so the entire closure is consistent at a single point in time without per-import bookkeeping.

Resolution algorithm

For each owl:imports <X> triple discovered while walking the closure, the resolver (fluree_db_api::ontology_imports::resolve_schema_bundle) applies this order:

  1. Named-graph match — if <X> is registered as a graph IRI in the current ledger’s GraphRegistry, resolve to that GraphId.
  2. Mapping-table fallback — if <X> appears in f:ontologyImportMap, resolve via the bound GraphSourceRef.
  3. Strict error — otherwise, fail the query with ApiError::OntologyImport. There is no silent skip.

The walk is BFS, deduplicated by resolved GraphId, and cycle-safe by construction (we only push unseen IDs onto the queue). The result is a ResolvedSchemaBundle { ledger_id, to_t, sources: Vec<GraphId> }.
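
The cycle-safe BFS can be sketched over a hypothetical import graph (numeric graph ids and the edge map are illustrative; the real resolver also performs the named-graph / mapping-table resolution per edge):

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// BFS over resolved graph ids, deduplicated by id. Only unseen ids are
// enqueued, so cycles terminate by construction.
fn resolve_closure(start: u32, imports: &HashMap<u32, Vec<u32>>) -> Vec<u32> {
    let mut seen: HashSet<u32> = HashSet::from([start]);
    let mut queue = VecDeque::from([start]);
    let mut sources = Vec::new();
    while let Some(g) = queue.pop_front() {
        sources.push(g);
        for &dep in imports.get(&g).into_iter().flatten() {
            if seen.insert(dep) {
                queue.push_back(dep);
            }
        }
    }
    sources
}

fn main() {
    // Graph 10 imports 11 and 12; 11 imports 12 (shared dep) and 10 (a cycle).
    let imports = HashMap::from([(10, vec![11, 12]), (11, vec![12, 10])]);
    // Each graph appears once; the cycle back to 10 is ignored.
    assert_eq!(resolve_closure(10, &imports), vec![10, 11, 12]);
}
```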

System graphs are off-limits

Imports resolving to CONFIG_GRAPH_ID (g_id=2) or TXN_META_GRAPH_ID (g_id=1) are rejected — those graphs are structurally reserved and would leak framework triples into reasoning. The guard sits in the single resolve_local_graph_source chokepoint, so every resolution path (direct graph-IRI match, f:ontologyImportMap entry, f:schemaSource selector) is covered.

owl:imports discovery is subject-wildcarded

Every ?s owl:imports ?o triple in a schema graph is treated as authoritative, regardless of whether ?s is typed owl:Ontology. This is broader than strict OWL 2 (which restricts owl:imports to the ontology header) and matches real-world OWL inputs that rely on file-level provenance. The resolution layer’s strictness still applies: a stray owl:imports triple that doesn’t map to a local graph fails the query rather than silently expanding the closure.

Reasoning-disabled queries don’t trigger resolution

Queries that opt out of reasoning ("reasoning": "none") skip bundle resolution entirely — a broken ontology import in the ledger’s config shouldn’t produce errors for a non-reasoning workload. The short-circuit lives in attach_schema_bundle (both the single-view and dataset paths).

Projecting the bundle into reasoning

RDFS and OWL extraction code reads schema triples out of the default graph (g_id=0). The resolver feeds that code via a SchemaBundleOverlay that projects whitelisted triples from every bundle source onto g_id=0, so the reasoner sees the full closure without being aware of it.

The projection happens in two phases:

  1. Materialize. build_schema_bundle_flakes runs targeted reads against every source graph — one PSOT scan per schema predicate and one OPST scan per schema class — and collects the matching flakes into per-index sorted arrays (SPOT / PSOT / POST / OPST). Reads go through the normal range_with_overlay path, so both committed index data and novelty are visible.
  2. Overlay. SchemaBundleOverlay::new(base_overlay, flakes) wraps the query’s base overlay. For g_id != 0 it delegates straight to the base. For g_id == 0 it emits a linear merge of base flakes and bundle flakes in index order.

The reasoner sees: base default-graph flakes ∪ projected schema flakes, presented as a single ordered stream at g_id=0. Reasoner code is unmodified.

Schema-triple whitelist

Only the following predicates are eligible for projection:

  • RDFS: rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain, rdfs:range
  • OWL: owl:inverseOf, owl:equivalentClass, owl:equivalentProperty, owl:sameAs, owl:imports

And rdf:type triples are projected only when the object is one of: owl:Class, owl:ObjectProperty, owl:DatatypeProperty, owl:SymmetricProperty, owl:TransitiveProperty, owl:FunctionalProperty, owl:InverseFunctionalProperty, owl:Ontology, rdf:Property.

Anything else in an import graph — in particular, instance data — does not surface in the reasoner’s view. See fluree_db_core::{is_schema_predicate, is_schema_class} for the canonical checks and fluree-db-api/tests/it_reasoning_imports.rs::instance_data_in_schema_graph_does_not_leak for the regression test.
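
A minimal sketch of the gating logic, using compact prefixed names as stand-ins for full IRIs (the canonical checks live in fluree_db_core, as noted above):

```rust
/// Toy versions of the canonical whitelist checks.
fn is_schema_predicate(p: &str) -> bool {
    matches!(
        p,
        "rdfs:subClassOf" | "rdfs:subPropertyOf" | "rdfs:domain" | "rdfs:range"
            | "owl:inverseOf" | "owl:equivalentClass" | "owl:equivalentProperty"
            | "owl:sameAs" | "owl:imports"
    )
}

fn is_schema_class(c: &str) -> bool {
    matches!(
        c,
        "owl:Class" | "owl:ObjectProperty" | "owl:DatatypeProperty"
            | "owl:SymmetricProperty" | "owl:TransitiveProperty"
            | "owl:FunctionalProperty" | "owl:InverseFunctionalProperty"
            | "owl:Ontology" | "rdf:Property"
    )
}

/// A triple is projected iff its predicate is whitelisted, or it is an
/// rdf:type assertion whose object is a whitelisted schema class.
fn projects(p: &str, o: &str) -> bool {
    is_schema_predicate(p) || (p == "rdf:type" && is_schema_class(o))
}

fn main() {
    assert!(projects("rdfs:subClassOf", "ex:Animal"));
    assert!(projects("rdf:type", "owl:Class"));
    assert!(!projects("rdf:type", "ex:Person")); // instance typing: filtered out
    assert!(!projects("ex:name", "\"Alice\"")); // instance data: filtered out
}
```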

Caching

global_schema_bundle_cache() is a process-wide moka::sync::Cache keyed by:

  • ledger_id: Arc<str>
  • to_t: i64
  • starting_g_id: GraphId (the resolved f:schemaSource)
  • follow_imports: bool

Because config lives in the same ledger (g_id=2) and any config change advances t, the to_t dimension is sufficient to express “config version” — there is no separate config_epoch key, and no explicit invalidation logic. Stale entries age out via LRU.

The cache stores the resolution result (Vec<GraphId>); the projected flake arrays are rebuilt per query. Materialization is cheap relative to reasoning itself, and keeping the cached value small lets many entries coexist for many ledgers without memory pressure.
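
The keying scheme can be illustrated with a plain HashMap standing in for the moka cache (no LRU eviction in this sketch; names other than the key fields are illustrative):

```rust
use std::collections::HashMap;
use std::sync::Arc;

// (ledger_id, to_t, starting_g_id, follow_imports)
type BundleKey = (Arc<str>, i64, u32, bool);

/// Simplified stand-in for global_schema_bundle_cache().
struct BundleCache {
    entries: HashMap<BundleKey, Vec<u32>>, // cached Vec<GraphId>
}

impl BundleCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    fn get_or_resolve(
        &mut self,
        key: BundleKey,
        resolve: impl FnOnce() -> Vec<u32>,
    ) -> Vec<u32> {
        self.entries.entry(key).or_insert_with(resolve).clone()
    }
}

fn main() {
    let mut cache = BundleCache::new();
    let ledger: Arc<str> = Arc::from("mydb:main");
    // First call resolves; a second call with the same key is a hit.
    let k = (ledger.clone(), 42, 5, true);
    assert_eq!(cache.get_or_resolve(k.clone(), || vec![5, 6]), vec![5, 6]);
    assert_eq!(cache.get_or_resolve(k, || unreachable!()), vec![5, 6]);
    // A config change advances t, so the key differs and a fresh
    // resolution runs; this is why no explicit invalidation is needed.
    assert_eq!(cache.get_or_resolve((ledger, 43, 5, true), || vec![5, 7]), vec![5, 7]);
}
```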

Error semantics

ApiError::OntologyImport is raised when the configured closure is invalid. Every message identifies the offending resource and suggests remediation. Queries fail rather than silently returning reduced results, so broken ontology references surface early. Sources of this error:

  • An owl:imports <X> that doesn’t match a local named graph and has no f:ontologyImportMap entry.
  • A resolution that would land on a reserved system graph (config or txn-meta), whether via direct graph-IRI match, mapping table, or f:schemaSource selector.
  • A GraphRef that targets a different ledger, uses f:atT, or carries an f:trustPolicy / f:rollbackGuard. The bundle is resolved at the query’s single to_t and in same-ledger scope only; accepting these fields silently would create a gap between declared intent and actual behavior.

Wiring at query time

Fluree::query(&db, ...) (and the dataset-query counterpart) call build_executable_for_view → attach_schema_bundle on every query. The attach step:

  1. Reads db.resolved_config().reasoning. If there is no f:schemaSource, returns immediately — the legacy default-graph path applies unchanged.
  2. Calls resolve_schema_bundle for the closure, consulting the cache.
  3. Materializes SchemaBundleFlakes via build_schema_bundle_flakes.
  4. Sets executable.options.schema_bundle so prepare_execution wraps db.overlay in a SchemaBundleOverlay for the reasoning_prep block.

Downstream, schema_hierarchy_with_overlay, reason_owl2rl, and Ontology::from_db_with_overlay all receive the same wrapped overlay and see the full closure on g_id=0 reads.

Testing

The acceptance suite lives in fluree-db-api/tests/it_reasoning_imports.rs and covers:

  • Same-ledger auto resolution of a named schema source.
  • Transitive A → B with a subclass edge in B.
  • Mapping table fallback for external IRIs.
  • Unresolved imports surface as ApiError::OntologyImport.
  • Cycle A → B → A terminates and still yields the correct closure.
  • Mapping entries that would target a reserved system graph are rejected.
  • "reasoning": "none" queries skip resolution entirely (no spurious errors from unrelated config).
  • f:atT on a GraphRef is rejected with a clear message.
  • Instance data in the schema graph does not leak into query results.
  • End-to-end OWL2-RL rule firing through a transitive import: owl:TransitiveProperty, owl:inverseOf, and rdfs:domain axioms declared in an imported graph produce the expected entailments against instance data in the default graph.

Module-level unit tests cover the cache keys, empty-bundle passthrough, and non-default-graph delegation.

Storage Traits Design

This document describes the storage trait architecture in Fluree DB, explaining the design rationale and providing guidance for implementing new storage backends.

Overview

Fluree uses a layered storage abstraction that separates:

  • Content-addressed access (fluree-db-core): The ContentStore trait provides get/put/has operations keyed by ContentId (CIDv1). This is the primary interface for all immutable artifact access (commits, index roots, leaves, dicts).
  • Physical storage traits (fluree-db-core): Runtime-agnostic storage operations (StorageRead, StorageWrite, ContentAddressedWrite) with standard Result<T> error handling. These handle the physical I/O layer beneath ContentStore.
  • Extension traits (fluree-db-nameservice): Nameservice-specific operations with StorageExtResult<T> for richer error semantics (CAS operations, pagination, etc.).

See ContentId and ContentStore for the content-addressed identity model.

Quick Start: The Prelude

For convenient imports, use the storage prelude:

#![allow(unused)]
fn main() {
use fluree_db_core::prelude::*;

// Now you have access to:
// - Storage, StorageRead, StorageWrite, ContentAddressedWrite (traits)
// - MemoryStorage, FileStorage (implementations)
// - ContentKind, ContentWriteResult, ReadHint (types)

async fn example<S: Storage>(storage: &S) -> Result<()> {
    let bytes = storage.read_bytes("some/address").await?;
    storage.write_bytes("other/address", &bytes).await?;
    Ok(())
}
}

For API consumers, fluree-db-api re-exports all storage traits:

#![allow(unused)]
fn main() {
use fluree_db_api::{Storage, StorageRead, MemoryStorage};
}

Trait Hierarchy

              ┌──────────────────────┐
              │    ContentStore      │  get(ContentId), put(ContentKind, bytes), has(ContentId)
              └──────────────────────┘
                    (primary interface for immutable artifacts)

              ┌─────────────────┐
              │   StorageRead   │  read_bytes, exists, list_prefix
              └────────┬────────┘
                       │
              ┌────────┴────────┐
              │  StorageWrite   │  write_bytes, delete
              └────────┬────────┘
                       │
        ┌──────────────┴──────────────┐
        │   ContentAddressedWrite     │  content_write_bytes[_with_hash]
        └──────────────┬──────────────┘
                       │
              ┌────────┴────────┐
              │     Storage     │  (marker trait - blanket impl)
              └─────────────────┘
                    (physical I/O layer)

ContentStore is the content-addressed layer that sits above the physical storage traits. It maps ContentId values to physical storage locations via the underlying Storage implementation.

ContentStore (fluree-db-core)

The ContentStore trait is the primary interface for accessing immutable, content-addressed artifacts (commits, index roots, leaves, dictionaries, etc.).

#![allow(unused)]
fn main() {
#[async_trait]
pub trait ContentStore: Debug + Send + Sync {
    /// Retrieve bytes by content ID
    async fn get(&self, id: &ContentId) -> Result<Vec<u8>>;

    /// Store bytes, returning the computed ContentId
    async fn put(&self, kind: ContentKind, bytes: &[u8]) -> Result<ContentId>;

    /// Check whether an object exists
    async fn has(&self, id: &ContentId) -> Result<bool>;
}
}

Design notes:

  • ContentId is a CIDv1 value encoding the hash function, digest, and content kind (multicodec). See ContentId and ContentStore.
  • ContentKind enables routing to different storage tiers (commit store vs index store) without parsing URL paths.
  • put computes the content hash and returns the derived ContentId.
  • Implementations include MemoryContentStore (for testing) and BridgeContentStore (adapts a Storage backend).

Physical Storage Traits (fluree-db-core)

The physical storage traits handle raw byte I/O against storage backends (filesystem, S3, memory). ContentStore implementations typically wrap these.

StorageRead

Read-only storage operations. Implement this for any storage that can retrieve data.

#![allow(unused)]
fn main() {
#[async_trait]
pub trait StorageRead: Debug + Send + Sync {
    /// Read raw bytes from an address
    async fn read_bytes(&self, address: &str) -> Result<Vec<u8>>;

    /// Read with a hint for content type optimization
    /// Default implementation ignores the hint
    async fn read_bytes_hint(&self, address: &str, hint: ReadHint) -> Result<Vec<u8>> {
        self.read_bytes(address).await
    }

    /// Check if an address exists
    async fn exists(&self, address: &str) -> Result<bool>;

    /// Read a `[start, end)` byte range from an address.
    /// Default implementation fetches the full object and slices; backends
    /// that support native range reads (S3, HTTP) should override.
    async fn read_byte_range(&self, address: &str, range: std::ops::Range<u64>)
        -> Result<Vec<u8>>;

    /// List all addresses with a given prefix
    async fn list_prefix(&self, prefix: &str) -> Result<Vec<String>>;

    /// List addresses under a prefix together with byte sizes.
    /// Default implementation returns an error indicating the backend does
    /// not support cheap metadata listing. Backends with native list+size
    /// (S3 `list_objects_v2`, GCS, etc.) should override.
    async fn list_prefix_with_metadata(&self, prefix: &str)
        -> Result<Vec<RemoteObject>>;

    /// Resolve a CAS address to a local filesystem path, if available.
    fn resolve_local_path(&self, address: &str) -> Option<PathBuf> { None }
}

/// `(address, size)` pair returned by `list_prefix_with_metadata`.
pub struct RemoteObject {
    pub address: String,
    pub size_bytes: u64,
}
}

Design notes:

  • read_bytes_hint enables optimizations like returning pre-encoded flakes for leaf nodes
  • read_byte_range allows partial reads against backends with native HTTP/S3 range support; the default impl is correct but does N full-object fetches for N range reads
  • list_prefix is essential for garbage collection and administrative operations
  • list_prefix_with_metadata is used by the bulk-import remote-source path so the importer can size each chunk before fetching. Backends without cheap size metadata return an error; callers can instead fall back to explicitly supplied object lists
  • resolve_local_path lets callers (e.g., import scratch staging) skip a copy when the storage already exposes data on the local filesystem (FileStorage)
  • All methods return fluree_db_core::Result<T> (alias for std::result::Result<T, Error>)

StorageWrite

Mutating storage operations. Implement alongside StorageRead for read-write storage.

#![allow(unused)]
fn main() {
#[async_trait]
pub trait StorageWrite: Debug + Send + Sync {
    /// Write raw bytes to an address
    async fn write_bytes(&self, address: &str, bytes: &[u8]) -> Result<()>;

    /// Delete data at an address
    async fn delete(&self, address: &str) -> Result<()>;
}
}

Design notes:

  • delete is part of the core write trait (not separate) because any writable storage should support deletion
  • Implementations should be idempotent: deleting a non-existent address succeeds silently

ContentAddressedWrite

Extension trait for content-addressed (hash-based) writes. Extends StorageWrite.

#![allow(unused)]
fn main() {
#[async_trait]
pub trait ContentAddressedWrite: StorageWrite {
    /// Write bytes with a pre-computed content hash
    /// Returns the canonical address and metadata
    async fn content_write_bytes_with_hash(
        &self,
        kind: ContentKind,
        ledger_id: &str,
        content_hash_hex: &str,
        bytes: &[u8],
    ) -> Result<ContentWriteResult>;

    /// Write bytes, computing the hash internally
    /// Default implementation computes SHA-256 and delegates
    async fn content_write_bytes(
        &self,
        kind: ContentKind,
        ledger_id: &str,
        bytes: &[u8],
    ) -> Result<ContentWriteResult> {
        let hash = sha256_hex(bytes);
        self.content_write_bytes_with_hash(kind, ledger_id, &hash, bytes).await
    }
}
}

Design notes:

  • ContentKind indicates whether data is a commit or index, enabling routing to different storage tiers
  • The default content_write_bytes implementation handles hash computation, so most backends only need to implement content_write_bytes_with_hash
  • Content-addressed storage enables deduplication and integrity verification

Storage (Marker Trait)

A convenience marker trait indicating full storage capability.

#![allow(unused)]
fn main() {
/// Full storage capability: read + content-addressed write
pub trait Storage: StorageRead + ContentAddressedWrite {}

/// Blanket implementation for any type implementing both traits
impl<T: StorageRead + ContentAddressedWrite> Storage for T {}
}

Usage:

#![allow(unused)]
fn main() {
// Instead of this verbose bound:
fn process<S: StorageRead + StorageWrite + ContentAddressedWrite>(storage: &S)

// Use this:
fn process<S: Storage>(storage: &S)
}

Extension Traits (fluree-db-nameservice)

The nameservice crate defines additional traits with StorageExtResult<T> for richer error handling (e.g., PreconditionFailed for CAS operations).

StorageList

Paginated listing for large-scale storage backends.

#![allow(unused)]
fn main() {
#[async_trait]
pub trait StorageList {
    async fn list_prefix(&self, prefix: &str) -> StorageExtResult<Vec<String>>;

    async fn list_prefix_paginated(
        &self,
        prefix: &str,
        continuation_token: Option<String>,
        max_keys: usize,
    ) -> StorageExtResult<ListResult>;
}
}

StorageCas

Compare-and-swap operations for consistent distributed updates.

#![allow(unused)]
fn main() {
#[async_trait]
pub trait StorageCas {
    /// Write only if the address doesn't exist
    async fn write_if_absent(&self, address: &str, bytes: &[u8]) -> StorageExtResult<bool>;

    /// Write only if the current version matches expected_etag
    async fn write_if_match(
        &self,
        address: &str,
        bytes: &[u8],
        expected_etag: &str,
    ) -> StorageExtResult<String>;

    /// Read with version/etag for subsequent CAS operations
    async fn read_with_etag(&self, address: &str) -> StorageExtResult<(Vec<u8>, String)>;
}
}
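
An in-memory sketch of these semantics, using a version counter as the etag. This is illustrative only; a real backend would map the etag onto its own conditional-write primitive.

```rust
use std::collections::HashMap;

/// In-memory stand-in for a CAS-capable backend: each address carries a
/// monotonically increasing version that serves as the etag.
struct CasStore {
    objects: HashMap<String, (Vec<u8>, u64)>,
}

impl CasStore {
    fn new() -> Self {
        Self { objects: HashMap::new() }
    }

    fn read_with_etag(&self, addr: &str) -> Option<(Vec<u8>, u64)> {
        self.objects.get(addr).cloned()
    }

    /// Succeeds only if the stored version still matches `expected`,
    /// mirroring write_if_match / PreconditionFailed semantics.
    fn write_if_match(&mut self, addr: &str, bytes: Vec<u8>, expected: u64) -> Result<u64, ()> {
        match self.objects.get_mut(addr) {
            Some((stored, version)) if *version == expected => {
                *stored = bytes;
                *version += 1;
                Ok(*version)
            }
            _ => Err(()), // precondition failed
        }
    }

    fn write_if_absent(&mut self, addr: &str, bytes: Vec<u8>) -> bool {
        if self.objects.contains_key(addr) {
            return false;
        }
        self.objects.insert(addr.to_string(), (bytes, 1));
        true
    }
}

fn main() {
    let mut s = CasStore::new();
    assert!(s.write_if_absent("ns/head", b"v1".to_vec()));
    assert!(!s.write_if_absent("ns/head", b"other".to_vec())); // already exists

    // Read-modify-write with optimistic concurrency:
    let (_, etag) = s.read_with_etag("ns/head").unwrap();
    assert!(s.write_if_match("ns/head", b"v2".to_vec(), etag).is_ok());
    // A writer holding the stale etag loses the race.
    assert!(s.write_if_match("ns/head", b"v3".to_vec(), etag).is_err());
}
```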

StorageDelete (nameservice)

Delete with nameservice error semantics.

#![allow(unused)]
fn main() {
#[async_trait]
pub trait StorageDelete {
    async fn delete(&self, address: &str) -> StorageExtResult<()>;
}
}

Why separate from core StorageWrite::delete?

  • Nameservice operations need StorageExtResult for errors like PreconditionFailed
  • Core operations use standard Result for simplicity
  • Storage backends typically implement both, with the nameservice version delegating to core
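
The delegation pattern in the last bullet might look like this sketch, with simplified error enums in place of the real types (names here are illustrative, not the production signatures):

```rust
/// Core-style error vs nameservice-style error (simplified).
#[derive(Debug, PartialEq)]
enum CoreError {
    NotFound(String),
    Storage(String),
}

#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum StorageExtError {
    NotFound(String),
    PreconditionFailed,
    Other(String),
}

/// Typical delegation: the nameservice delete calls the core delete and
/// lifts the error into the richer type.
fn ext_delete(
    core_delete: impl Fn(&str) -> Result<(), CoreError>,
    addr: &str,
) -> Result<(), StorageExtError> {
    core_delete(addr).map_err(|e| match e {
        CoreError::NotFound(a) => StorageExtError::NotFound(a),
        CoreError::Storage(msg) => StorageExtError::Other(msg),
    })
}

fn main() {
    // Idempotent core delete: missing addresses succeed silently.
    let core = |_: &str| Ok(());
    assert_eq!(ext_delete(core, "ns/record"), Ok(()));

    let failing = |_: &str| Err(CoreError::Storage("disk full".into()));
    assert_eq!(
        ext_delete(failing, "ns/record"),
        Err(StorageExtError::Other("disk full".into()))
    );
}
```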

Implementing a Storage Backend

Minimal Read-Only Backend

For a read-only backend (e.g., ProxyStorage that fetches via HTTP):

#![allow(unused)]
fn main() {
#[async_trait]
impl StorageRead for MyReadOnlyStorage {
    async fn read_bytes(&self, address: &str) -> Result<Vec<u8>> {
        // Fetch from remote
    }

    async fn exists(&self, address: &str) -> Result<bool> {
        // Check existence (can implement as try-read)
        match self.read_bytes(address).await {
            Ok(_) => Ok(true),
            Err(Error::NotFound(_)) => Ok(false),
            Err(e) => Err(e),
        }
    }

    async fn list_prefix(&self, _prefix: &str) -> Result<Vec<String>> {
        Err(Error::storage("list_prefix not supported"))
    }
}

// Must also implement StorageWrite (with error stubs) and ContentAddressedWrite
// if you want to satisfy the Storage marker trait
#[async_trait]
impl StorageWrite for MyReadOnlyStorage {
    async fn write_bytes(&self, _: &str, _: &[u8]) -> Result<()> {
        Err(Error::storage("read-only storage"))
    }
    async fn delete(&self, _: &str) -> Result<()> {
        Err(Error::storage("read-only storage"))
    }
}

#[async_trait]
impl ContentAddressedWrite for MyReadOnlyStorage {
    async fn content_write_bytes_with_hash(&self, ...) -> Result<ContentWriteResult> {
        Err(Error::storage("read-only storage"))
    }
}
}

Full Read-Write Backend

For a complete backend (e.g., S3, filesystem):

#![allow(unused)]
fn main() {
// 1. Implement core traits
#[async_trait]
impl StorageRead for MyStorage {
    async fn read_bytes(&self, address: &str) -> Result<Vec<u8>> { ... }
    async fn exists(&self, address: &str) -> Result<bool> { ... }
    async fn list_prefix(&self, prefix: &str) -> Result<Vec<String>> { ... }
}

#[async_trait]
impl StorageWrite for MyStorage {
    async fn write_bytes(&self, address: &str, bytes: &[u8]) -> Result<()> { ... }
    async fn delete(&self, address: &str) -> Result<()> { ... }
}

#[async_trait]
impl ContentAddressedWrite for MyStorage {
    async fn content_write_bytes_with_hash(
        &self,
        kind: ContentKind,
        ledger_id: &str,
        content_hash_hex: &str,
        bytes: &[u8],
    ) -> Result<ContentWriteResult> {
        // Build address from kind + alias + hash
        let address = build_content_address(kind, ledger_id, content_hash_hex);
        self.write_bytes(&address, bytes).await?;
        Ok(ContentWriteResult {
            address,
            content_hash: content_hash_hex.to_string(),
            size_bytes: bytes.len(),
        })
    }
}

// Storage marker trait is automatically satisfied via blanket impl

// 2. Optionally implement nameservice traits for advanced features
#[async_trait]
impl StorageList for MyStorage {
    async fn list_prefix(&self, prefix: &str) -> StorageExtResult<Vec<String>> {
        // Delegate to core trait, convert error
        StorageRead::list_prefix(self, prefix)
            .await
            .map_err(|e| StorageExtError::Other(e.to_string()))
    }
    // ... paginated version
}
}

BranchedContentStore (fluree-db-core)

BranchedContentStore<S> is a recursive ContentStore implementation that provides namespace-scoped fallback reads for branched ledgers. When a branch is created, it gets its own storage namespace for new writes, but needs to read pre-branch-point content (commits, dictionaries) from ancestor namespaces.

Structure

#![allow(unused)]
fn main() {
pub struct BranchedContentStore<S: Storage> {
    branch_store: StorageContentStore<S>,
    parents: Vec<BranchedContentStore<S>>,
}
}
  • branch_store — the branch’s own namespace store; all writes go here
  • parents — ancestor stores to fall back to for reads (recursive tree)

The recursive structure supports arbitrarily deep branch chains (main → dev → feature) and is designed to support future merge scenarios where a branch may have multiple parents (DAG ancestry).

Constructors

#![allow(unused)]
fn main() {
// Root branch (e.g., main) — no parents
let store = BranchedContentStore::leaf(storage, "mydb:main");

// Branch with parent fallback
let parent = BranchedContentStore::leaf(storage, "mydb:main");
let store = BranchedContentStore::with_parents(storage, "mydb:dev", vec![parent]);
}

Read Behavior

get() tries the branch’s own namespace first, then recurses into parents:

  1. Try branch_store.get(id) — if found, return immediately
  2. If NotFound and parents exist, try each parent in order
  3. If no parent finds it, return the last NotFound error
  4. Non-NotFound errors propagate immediately — only NotFound triggers fallback

has() and resolve_local_path() follow the same fallback pattern.
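
A toy version of the fallback read, simplified to return Option instead of propagating the last NotFound error:

```rust
use std::collections::HashMap;

/// Toy branched store: each node owns a namespace map and falls back to
/// its parents only on a miss, mirroring the NotFound-only fallback
/// rule described above.
struct BranchStore {
    own: HashMap<String, Vec<u8>>,
    parents: Vec<BranchStore>,
}

impl BranchStore {
    fn get(&self, id: &str) -> Option<&Vec<u8>> {
        if let Some(bytes) = self.own.get(id) {
            return Some(bytes); // the branch's own namespace wins
        }
        // Miss: recurse into ancestors in order.
        self.parents.iter().find_map(|p| p.get(id))
    }
}

fn main() {
    let main_branch = BranchStore {
        own: HashMap::from([("commit-1".to_string(), vec![1u8])]),
        parents: vec![],
    };
    let dev = BranchStore {
        own: HashMap::from([("commit-2".to_string(), vec![2u8])]),
        parents: vec![main_branch],
    };
    // Post-fork content comes from dev's own namespace...
    assert_eq!(dev.get("commit-2"), Some(&vec![2u8]));
    // ...pre-fork content falls back to the ancestor.
    assert_eq!(dev.get("commit-1"), Some(&vec![1u8]));
    assert_eq!(dev.get("missing"), None);
}
```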

Write Behavior

put() and put_with_id() always write to branch_store — never to parents. This ensures branch isolation: new content is always scoped to the branch’s own namespace.

What Is and Isn’t Copied at Branch Time

| Artifact | Copied? | Reason |
|---|---|---|
| Commits | No | Immutable chain, never deleted; read via fallback |
| Index structure files (root, leaves, branches, arenas) | Yes | Source may GC old indexes after reindexing |
| String dictionaries | No | Stored globally in the @shared namespace; all branches read from the same location |

Global Dictionary Storage (@shared Namespace)

String dictionaries (mappings between IRIs/strings and compact integer IDs) are the largest index artifact. Rather than copying them per-branch or relying on BranchedContentStore fallback reads, dictionaries are stored in a global namespace shared by all branches of a ledger.

The content_path function routes all DictBlob CIDs to a shared path:

mydb/@shared/dicts/<sha256hex>.subject    # Subject dict
mydb/@shared/dicts/<sha256hex>.string     # String dict
mydb/@shared/dicts/<sha256hex>.predicate  # Predicate dict
...

The @shared prefix uses the @ character, which is forbidden in branch names by validate_branch_name, so it cannot collide with any branch namespace. The constant is defined as SHARED_NAMESPACE in fluree-db-core::address_path.

Legacy fallback: Existing deployments may have dictionaries stored at the old per-branch path (e.g., mydb/main/index/objects/dicts/<sha>.dict). StorageContentStore automatically falls back to the legacy path when a dict CID is not found at the new @shared location. After the next index build, new writes go to the @shared path — no manual migration is needed.
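
The routing and read-side fallback can be sketched as below; shared_dict_path, legacy_dict_path, and resolve_dict are hypothetical helpers for illustration, not the real content_path signatures.

```rust
/// Illustrative helper: route a dict blob to the shared namespace.
fn shared_dict_path(ledger: &str, sha_hex: &str, kind: &str) -> String {
    // "@" is forbidden in branch names (validate_branch_name), so
    // "@shared" cannot collide with any branch namespace.
    format!("{ledger}/@shared/dicts/{sha_hex}.{kind}")
}

/// Illustrative helper: the old per-branch dict location.
fn legacy_dict_path(ledger: &str, branch: &str, sha_hex: &str) -> String {
    format!("{ledger}/{branch}/index/objects/dicts/{sha_hex}.dict")
}

/// Read-side fallback: try candidates in order (new @shared path first,
/// then the legacy per-branch path for pre-migration deployments).
fn resolve_dict<'a>(
    exists: impl Fn(&str) -> bool,
    candidates: &'a [String],
) -> Option<&'a String> {
    candidates.iter().find(|p| exists(p.as_str()))
}

fn main() {
    let new_path = shared_dict_path("mydb", "abc123", "subject");
    let old_path = legacy_dict_path("mydb", "main", "abc123");
    assert_eq!(new_path, "mydb/@shared/dicts/abc123.subject");

    // Simulated legacy deployment: only the old path exists in storage.
    let on_disk = vec![old_path.clone()];
    let candidates = [new_path, old_path.clone()];
    let found = resolve_dict(|p| on_disk.iter().any(|d| d.as_str() == p), &candidates);
    assert_eq!(found, Some(&old_path));
}
```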

Building the Store Tree

LedgerState::build_branched_store() recursively walks the branch ancestry via nameservice source_branch metadata, constructing the BranchedContentStore tree. This uses Box::pin for the recursive async calls.

The actual ancestry walk lives in fluree-db-nameservice (branched_store::build_branched_store), and LedgerState::build_branched_store is a thin wrapper that delegates there. This keeps the helper available to crates that don’t depend on fluree-db-ledger (notably fluree-db-indexer’s background worker).

When to Use BranchedContentStore

Any code path that walks the commit chain or loads index blobs for a branched ledger MUST use a branch-aware content store. Per-query reads against an already-loaded LedgerState are fine — LedgerState::load already wires the branched store up.

Use the nameservice helpers, not the flat StorageBackend::content_store(...):

| Helper | When to use |
|---|---|
| fluree_db_nameservice::branched_content_store_for_record(backend, ns, &record) | An NsRecord is in scope (no extra lookup) |
| fluree_db_nameservice::branched_content_store_for_id(backend, ns, ledger_id) | No NsRecord available — does one nameservice lookup |
| Fluree::branched_content_store(&self, ledger_id) | API / CLI callers — wraps _for_id |

Both helpers return the flat namespace store unchanged for non-branched ledgers, so adding them to non-branch code paths costs at most a single nameservice lookup.

A flat backend.content_store(ledger_id) on the commit-chain walk path will 404 the moment the walker steps past the fork point and tries to read an ancestor commit from the wrong namespace.

Type Erasure with AnyStorage

For dynamic dispatch (e.g., runtime-selected storage backends), use AnyStorage:

#![allow(unused)]
fn main() {
/// Type-erased storage wrapper
pub struct AnyStorage {
    inner: Arc<dyn Storage>,
}

impl AnyStorage {
    pub fn new<S: Storage + 'static>(storage: S) -> Self {
        Self { inner: Arc::new(storage) }
    }
}
}

When to use:

  • FlureeClient uses AnyStorage to support any backend at runtime
  • Generic code should prefer concrete types (S: Storage) for better optimization
  • Use AnyStorage when storage type is determined at runtime (e.g., from config)

Wrapper Storages

Several wrapper types add functionality to underlying storage:

TieredStorage

Routes commits and indexes to different backends:

#![allow(unused)]
fn main() {
pub struct TieredStorage<S> {
    commit_storage: S,
    index_storage: S,
}
}
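
The routing decision is a simple match on the content kind. A sketch with string labels standing in for real backends (the enum variants here are illustrative):

```rust
/// Sketch of ContentKind-based routing: commits and indexes can live in
/// different backends without any URL parsing.
#[derive(Clone, Copy)]
enum ContentKind {
    Commit,
    IndexLeaf,
    IndexRoot,
}

struct Tiered<S> {
    commit_storage: S,
    index_storage: S,
}

impl<S> Tiered<S> {
    /// Pick the backend by kind.
    fn backend_for(&self, kind: ContentKind) -> &S {
        match kind {
            ContentKind::Commit => &self.commit_storage,
            ContentKind::IndexLeaf | ContentKind::IndexRoot => &self.index_storage,
        }
    }
}

fn main() {
    let t = Tiered {
        commit_storage: "s3://commits",
        index_storage: "s3://indexes",
    };
    assert_eq!(*t.backend_for(ContentKind::Commit), "s3://commits");
    assert_eq!(*t.backend_for(ContentKind::IndexLeaf), "s3://indexes");
}
```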

EncryptedStorage

Adds transparent encryption:

#![allow(unused)]
fn main() {
pub struct EncryptedStorage<S, K> {
    inner: S,
    key_provider: K,
}
}

AddressIdentifierResolverStorage

Routes reads based on address format (e.g., different storage backends by identifier segment):

#![allow(unused)]
fn main() {
pub struct AddressIdentifierResolverStorage {
    default_storage: Arc<dyn Storage>,
    identifier_storages: HashMap<String, Arc<dyn Storage>>,
}
}

Error Handling

Core Errors (fluree_db_core::Error)

Standard errors for storage operations:

  • NotFound - Address doesn’t exist
  • Storage - Generic storage failure
  • Io - Underlying I/O error

Nameservice Errors (StorageExtError)

Extended errors for nameservice operations:

  • NotFound - Address doesn’t exist
  • PreconditionFailed - CAS condition not met
  • Other - Generic error with message

Summary

| Type | Crate | Purpose | Error Type |
|---|---|---|---|
| ContentStore (trait) | core | Content-addressed get/put/has by ContentId | Result<T> |
| BranchedContentStore (struct) | core | Recursive ContentStore with namespace fallback for branches | Result<T> |
| StorageRead (trait) | core | Physical read operations | Result<T> |
| StorageWrite (trait) | core | Physical write + delete | Result<T> |
| ContentAddressedWrite (trait) | core | Hash-based physical writes | Result<T> |
| Storage (trait) | core | Marker (full physical capability) | - |
| StorageList (trait) | nameservice | Paginated listing | StorageExtResult<T> |
| StorageCas (trait) | nameservice | Compare-and-swap | StorageExtResult<T> |
| StorageDelete (trait) | nameservice | Delete with ext errors | StorageExtResult<T> |

Application code typically interacts with ContentStore for immutable artifact access. Storage backend implementors implement the physical traits (StorageRead, StorageWrite, ContentAddressedWrite) and the Storage marker trait is automatically satisfied. For branched ledgers, BranchedContentStore wraps the physical storage with recursive namespace fallback — see BranchedContentStore above.

HTTP API

The Fluree HTTP API provides RESTful endpoints for all database operations. This section documents the complete API surface including request formats, authentication, and error handling.

Core Endpoints

Overview

High-level introduction to the Fluree HTTP API, including:

  • API design principles
  • Authentication overview
  • Rate limiting and quotas
  • API versioning

Endpoints

Complete reference for all HTTP endpoints:

  • POST /update - Submit update transactions (WHERE/DELETE/INSERT or SPARQL UPDATE)
  • POST /query - Execute queries
  • GET /v1/fluree/ledgers - List ledgers
  • GET /health - Health checks
  • GET /v1/fluree/stats - Server status
  • And more…

Headers, Content Types, and Request Sizing

HTTP headers and request format details:

  • Content-Type negotiation
  • Accept headers for response formats
  • Request size limits
  • Compression support
  • Custom headers

Signed Requests (JWS/VC)

Cryptographically signed and verifiable requests:

  • JSON Web Signature (JWS) format
  • Verifiable Credentials (VC) support
  • Public key verification
  • DID authentication
  • Signature validation

Errors and Status Codes

HTTP status codes and error responses:

  • Standard HTTP status codes
  • Fluree-specific error codes
  • Error response format
  • Troubleshooting common errors

API Characteristics

RESTful Design

The Fluree API follows REST principles:

  • Resource-oriented URLs
  • Standard HTTP methods (GET, POST)
  • Stateless requests
  • Standard status codes

Content Negotiation

Fluree supports multiple content types for requests and responses:

Request Content-Types:

  • application/json - JSON-LD transactions and queries
  • application/sparql-query - SPARQL queries
  • text/turtle - Turtle RDF format
  • application/ld+json - Explicit JSON-LD

Response Content-Types:

  • application/json - Default JSON format
  • application/ld+json - JSON-LD with context
  • application/sparql-results+json - SPARQL result format

Authentication

Fluree supports multiple authentication mechanisms:

  1. No Authentication (development only)
  2. Signed Requests (JWS/VC for production)
  3. API Keys (simple token-based auth)
  4. Bearer Tokens (JWT authentication)

See Signed Requests for cryptographic authentication details.

Quick Examples

Transaction Request

curl -X POST "http://localhost:8090/v1/fluree/insert?ledger=mydb:main" \
  -H "Content-Type: application/json" \
  -d '{
    "@context": {
      "ex": "http://example.org/ns/"
    },
    "@graph": [
      { "@id": "ex:alice", "ex:name": "Alice" }
    ]
  }'

Query Request

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "Content-Type: application/json" \
  -d '{
    "from": "mydb:main",
    "@context": { "ex": "http://example.org/ns/" },
    "select": ["?name"],
    "where": [
      { "@id": "?person", "ex:name": "?name" }
    ]

SPARQL Query

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "Content-Type: application/sparql-query" \
  -d 'PREFIX ex: <http://example.org/ns/>
SELECT ?name FROM <mydb:main> WHERE { ?person ex:name ?name }'

Health Check

curl http://localhost:8090/health

API Clients

Command Line (curl)

All examples in this documentation use curl for simplicity. Curl is available on all major platforms.

Programming Languages

Fluree’s HTTP API can be accessed from any language with HTTP client support:

JavaScript/TypeScript:

const response = await fetch('http://localhost:8090/v1/fluree/query', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    from: 'mydb:main',
    '@context': { ex: 'http://example.org/ns/' },
    select: ['?name'],
    where: [{ '@id': '?person', 'ex:name': '?name' }]
  })
});
const results = await response.json();

Python:

import requests

response = requests.post('http://localhost:8090/v1/fluree/query', json={
    'from': 'mydb:main',
    '@context': {'ex': 'http://example.org/ns/'},
    'select': ['?name'],
    'where': [{'@id': '?person', 'ex:name': '?name'}]
})
results = response.json()

Java:

HttpClient client = HttpClient.newHttpClient();
HttpRequest request = HttpRequest.newBuilder()
    .uri(URI.create("http://localhost:8090/v1/fluree/query"))
    .header("Content-Type", "application/json")
    .POST(HttpRequest.BodyPublishers.ofString(queryJson))
    .build();
HttpResponse<String> response = client.send(request, 
    HttpResponse.BodyHandlers.ofString());

Development vs Production

Development Setup

For local development, the API typically runs without authentication:

./fluree-db-server --port 8090 --storage memory

Access: http://localhost:8090

Production Setup

For production deployments, enable authentication and use HTTPS:

./fluree-db-server \
  --port 8090 \
  --storage aws \
  --require-signed-requests \
  --https-cert /path/to/cert.pem \
  --https-key /path/to/key.pem

Access: https://api.yourdomain.com

Always use:

  • HTTPS in production
  • Signed requests or API keys
  • Rate limiting
  • Request size limits

Performance Considerations

Request Size Limits

Default limits (configurable):

  • Transaction size: 10MB
  • Query size: 1MB
  • Response size: 100MB

See Headers and Request Sizing for details.

Connection Management

  • Keep-alive connections supported
  • HTTP/2 support available
  • WebSocket support for streaming (planned)

Caching

  • Query results can be cached (ETag support)
  • Immutable historical queries cache well
  • Current queries should not be cached aggressively

API Overview

The Fluree HTTP API provides a complete RESTful interface for database operations. This document provides a high-level overview of API design principles and capabilities.

API Design Principles

Resource-Oriented

The API is organized around resources:

  • Ledgers: Database instances
  • Transactions: Write operations
  • Queries: Read operations
  • Commits: Transaction history

Standard HTTP Methods

Operations use standard HTTP methods:

  • GET - Retrieve information (idempotent, cacheable)
  • POST - Submit operations (transactions, queries)
  • PUT - Update resources (planned)
  • DELETE - Remove resources (planned)

JSON-First

All request and response bodies use JSON by default:

  • Native JSON-LD support
  • Clean, readable syntax
  • Easy integration with modern applications

Stateless

All requests are stateless:

  • No session management required
  • Each request contains complete information
  • Enables horizontal scaling

Core Concepts

Ledger Identification

Ledgers are identified using aliases with branch names:

ledger-name:branch-name

Examples:

  • mydb:main - Main branch of mydb ledger
  • customers:prod - Production branch of customers ledger
  • tenant/app:dev - Development branch with hierarchical naming

Time Travel in URLs

Historical queries use time specifiers in ledger IDs:

ledger:branch@t:100           # Transaction number
ledger:branch@iso:2024-01-15  # ISO timestamp
ledger:branch@commit:bafybeig...  # Commit ID

These work in all query contexts (FROM clauses, dataset specs, etc.).
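As an illustration, the three specifier forms can be composed with a small helper. This is a hypothetical client-side function, not part of any Fluree library; only the `name:branch@specifier` syntax comes from the documentation above.

```python
def ledger_ref(name, branch="main", t=None, iso=None, commit=None):
    """Build a ledger reference, optionally pinned to a past state."""
    ref = f"{name}:{branch}"
    if t is not None:
        return f"{ref}@t:{t}"            # transaction number
    if iso is not None:
        return f"{ref}@iso:{iso}"        # ISO timestamp
    if commit is not None:
        return f"{ref}@commit:{commit}"  # commit ID
    return ref                           # current state

current = ledger_ref("mydb")                               # mydb:main
pinned = ledger_ref("customers", branch="prod", t=100)     # customers:prod@t:100
```

The resulting strings can be used anywhere a ledger ID is accepted (FROM clauses, dataset specs, `from` in JSON-LD Query).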

Content Type Negotiation

The request format is determined by the Content-Type header:

  • application/json - JSON-LD (default)
  • application/sparql-query - SPARQL
  • text/turtle - Turtle RDF

The response format is determined by the Accept header:

  • application/json - Compact JSON (default)
  • application/ld+json - Full JSON-LD with context
  • application/sparql-results+json - SPARQL result format

API Endpoints

Except for root diagnostics such as /health and /.well-known/fluree.json, HTTP API paths are under the discovered API base URL. The standalone server defaults to /v1/fluree.

Transaction Endpoints

POST /update

  • Submit update transactions (WHERE/DELETE/INSERT JSON-LD or SPARQL UPDATE)
  • Parameters: ledger, context
  • Returns: Transaction receipt with commit info

POST /insert / POST /upsert

  • Insert or upsert data (JSON-LD and Turtle; TriG on upsert)

Query Endpoints

POST /query

  • Execute queries (JSON-LD Query or SPARQL)
  • Parameters: None (ledger specified in query body)
  • Returns: Query results
  • Supports history queries via time range in from clause (see Time Travel)

Ledger Management

GET /ledgers

  • List all ledgers
  • Parameters: None
  • Returns: Array of ledger metadata

GET /info/:ledger-id

  • Get specific ledger metadata
  • Parameters: ledger-id (ledger:branch)
  • Returns: Ledger details (commit_t, index_t, etc.)

POST /create

  • Create a new ledger explicitly
  • Parameters: ledger
  • Returns: Ledger metadata

System Endpoints

GET /health

  • Health check endpoint
  • Parameters: None
  • Returns: Server health status

GET /stats

  • Server status and statistics
  • Parameters: None
  • Returns: Detailed server state

Request Format

URL Structure

https://[host]:[port]/[endpoint]?[parameters]

Example:

http://localhost:8090/v1/fluree/update?ledger=mydb:main

Query Parameters

Common parameters:

  • ledger - Target ledger (format: name:branch)
  • context - Default context URL
  • format - Response format override

Request Headers

Essential headers:

Content-Type: application/json
Accept: application/json
Authorization: Bearer [token]

See Headers for complete list.

Request Body

JSON-LD format for transactions:

{
  "@context": {
    "ex": "http://example.org/ns/"
  },
  "@graph": [
    { "@id": "ex:alice", "ex:name": "Alice" }
  ]
}

JSON-LD Query format:

{
  "@context": {
    "ex": "http://example.org/ns/"
  },
  "from": "mydb:main",
  "select": ["?name"],
  "where": [
    { "@id": "?person", "ex:name": "?name" }
  ]
}

Response Format

Success Response

Successful operations return appropriate status codes with JSON bodies.

Transaction Response:

{
  "t": 5,
  "timestamp": "2024-01-22T10:30:00.000Z",
  "commit_id": "bafybeig...commitT5",
  "flakes_added": 3,
  "flakes_retracted": 1
}

Query Response:

[
  { "name": "Alice" },
  { "name": "Bob" }
]

Error Response

Errors return appropriate HTTP status codes with structured error objects:

{
  "error": "Invalid IRI: not a valid URI",
  "status": 400,
  "@type": "err:db/BadRequest"
}

See Errors and Status Codes for complete error reference.

Authentication

Fluree supports multiple authentication mechanisms, configured per endpoint group (data, events, admin, storage proxy). Each can be set to none, optional, or required. See Configuration for full details.

Development Mode

No authentication required (default):

curl http://localhost:8090/v1/fluree/query/mydb:main \
  -H "Content-Type: application/json" \
  -d '{"select": ["?s"], "where": [{"@id": "?s"}]}'

Bearer Token Authentication

Bearer tokens are passed in the Authorization header. Fluree supports two token types with automatic dual-path dispatch:

Ed25519 JWS (did:key) - Locally minted tokens with an embedded JWK. Created with fluree token create:

TOKEN=$(fluree token create --private-key @~/.fluree/key --read-all --write-all)

curl http://localhost:8090/v1/fluree/query/mydb:main \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"select": ["?s"], "where": [{"@id": "?s"}]}'

OIDC/JWKS (RS256) - Tokens from external identity providers, verified against the provider’s JWKS endpoint. Requires the oidc feature and --jwks-issuer server configuration:

curl http://localhost:8090/v1/fluree/query/mydb:main \
  -H "Authorization: Bearer <oidc-token>" \
  -H "Content-Type: application/json" \
  -d '{"select": ["?s"], "where": [{"@id": "?s"}]}'

The server inspects the token header to determine the verification path:

  • Embedded JWK (Ed25519): Verifies against the embedded public key; issuer is a did:key
  • kid header (RS256): Verifies against the issuer’s JWKS endpoint

Token Scopes

Bearer tokens carry permission scopes that control access:

  • Read: fluree.ledger.read.all=true or fluree.ledger.read.ledgers=[...]
  • Write: fluree.ledger.write.all=true or fluree.ledger.write.ledgers=[...]
  • Back-compat: fluree.storage.* claims also imply read access for data endpoints

Connection-Scoped SPARQL

When a bearer token is present for connection-scoped SPARQL queries (/v1/fluree/query with Content-Type: application/sparql-query), FROM/FROM NAMED clauses are checked against the token’s read scope (fluree.ledger.read.all or fluree.ledger.read.ledgers). Out-of-scope ledgers return 404 (no existence leak).
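The scope check can be sketched as follows. The function name and claim-dict shape are illustrative; the claim keys and the 404-for-out-of-scope behavior are the ones documented above.

```python
def ledger_visible(claims, ledger):
    """True when the token's read scope covers the given ledger."""
    if claims.get("fluree.ledger.read.all") is True:
        return True
    return ledger in claims.get("fluree.ledger.read.ledgers", [])

claims = {"fluree.ledger.read.ledgers": ["mydb:main"]}
# An out-of-scope ledger behaves as if it does not exist (404),
# so existence is not leaked to unauthorized callers.
status = 200 if ledger_visible(claims, "secret:main") else 404
```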

Signed Requests (JWS/VC)

Cryptographically signed request bodies using Ed25519 JWS or Verifiable Credentials. The signed payload carries the request itself plus the signer’s identity for policy evaluation.

curl http://localhost:8090/v1/fluree/query/mydb:main \
  -H "Content-Type: application/jose" \
  -d '<compact-jws-string>'

See Signed Requests for detailed documentation.

Rate Limiting

Default Limits

Production deployments should implement rate limiting. Suggested defaults:

  • Queries: 100 requests per minute
  • Transactions: 10 requests per minute
  • History: 50 requests per minute

Rate Limit Headers

Responses include rate limit information:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1642857600

Exceeding Limits

When limits are exceeded:

  • Status code: 429 Too Many Requests
  • Response body includes retry information
  • Retry-After header indicates wait time
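Client-side handling of these headers can be sketched as below. The helper name is hypothetical; the header names follow the examples above, and the seconds form of Retry-After is assumed.

```python
import time

def wait_seconds(status, headers):
    """Seconds to wait before the next request, based on rate-limit headers."""
    if status == 429:
        # Retry-After gives an explicit wait (integer-seconds form assumed).
        return int(headers.get("Retry-After", "1"))
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining == 0:
        # Quota exhausted: wait until the reset epoch (clamped at zero).
        return max(0, int(headers.get("X-RateLimit-Reset", "0")) - int(time.time()))
    return 0  # headroom left, no need to back off
```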

API Versioning

Current Version

The current API is version 1 (v1).

Version in URL (Future)

Future versions may use URL-based versioning:

https://api.example.com/v2/query

Common Patterns

Idempotent Transactions

Use the upsert endpoint for idempotent transactions:

curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
  -H "Content-Type: application/json" \
  -d '{...}'

Batch Operations

Submit multiple entities in a single transaction:

{
  "@graph": [
    { "@id": "ex:alice", "ex:name": "Alice" },
    { "@id": "ex:bob", "ex:name": "Bob" },
    { "@id": "ex:carol", "ex:name": "Carol" }
  ]
}

Conditional Updates

Use WHERE/DELETE/INSERT for conditional changes:

{
  "where": [
    { "@id": "ex:alice", "ex:age": "?oldAge" }
  ],
  "delete": [
    { "@id": "ex:alice", "ex:age": "?oldAge" }
  ],
  "insert": [
    { "@id": "ex:alice", "ex:age": 31 }
  ]
}

Historical Queries

Query past states using time specifiers:

{
  "from": "mydb:main@t:100",
  "select": ["?name"],
  "where": [
    { "@id": "?person", "ex:name": "?name" }
  ]
}

Best Practices

1. Use Appropriate HTTP Methods

  • GET for read-only operations (health, status)
  • POST for write and query operations

2. Set Correct Content-Type

Always specify the request format:

Content-Type: application/json

3. Handle Errors Gracefully

Check status codes and parse error responses:

if (response.status !== 200) {
  const error = await response.json();
  // Error objects carry `error`, `status`, and `@type` fields (see Errors).
  console.error(`Error ${error.status}: ${error.error}`);
}

4. Use Connection Pooling

Reuse HTTP connections for better performance:

const https = require('node:https');
const agent = new https.Agent({ keepAlive: true });

5. Implement Retry Logic

Retry failed requests with exponential backoff:

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function retryRequest(fn, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i === maxRetries - 1) throw err;
      await sleep(Math.pow(2, i) * 1000); // 1s, 2s, 4s, ...
    }
  }
}

6. Monitor Rate Limits

Track rate limit headers and back off when approaching limits.

7. Use Compression

Enable compression for large payloads:

Accept-Encoding: gzip, deflate

Security Considerations

HTTPS in Production

Always use HTTPS in production:

  • Prevents eavesdropping
  • Protects credentials
  • Enables trust

Validate Input

Validate all user input before sending to API:

  • Check IRI formats
  • Validate JSON structure
  • Sanitize user data

Secure Credentials

Never expose credentials in code or logs:

  • Use environment variables
  • Rotate keys regularly
  • Use signed requests for highest security

Implement CORS Carefully

If exposing API to web applications, configure CORS appropriately:

Access-Control-Allow-Origin: https://your-app.com
Access-Control-Allow-Methods: POST, GET
Access-Control-Allow-Headers: Content-Type, Authorization

Performance Tips

1. Batch Transactions

Combine related entities in single transactions for better performance.

2. Use Appropriate Time Specifiers

  • @t:NNN is fastest (direct lookup)
  • @iso:DATETIME requires binary search
  • @commit:CID requires scan

3. Limit Result Sets

Always use LIMIT for potentially large result sets:

{
  "select": ["?name"],
  "where": [...],
  "limit": 100
}

4. Cache Historical Queries

Historical queries (with time specifiers) are immutable and cache well.
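A minimal client-side cacheability check, assuming the time-specifier syntax shown earlier (the helper name is illustrative):

```python
def is_cacheable(from_ref):
    """Queries pinned to a past state are immutable and safe to cache."""
    return any(marker in from_ref for marker in ("@t:", "@iso:", "@commit:"))

assert is_cacheable("mydb:main@t:100")       # historical: cache freely
assert not is_cacheable("mydb:main")         # current state: avoid aggressive caching
```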

5. Use Streaming for Large Results

For very large result sets, consider streaming responses (when supported).

API Endpoints

Complete reference for all Fluree HTTP API endpoints.

Base URL / versioning

All endpoints listed below are under the server’s API base URL (api_base_url from GET /.well-known/fluree.json).

  • Standalone fluree-server default: api_base_url = "/v1/fluree"
  • All curl examples in this document use the full URL including the base path (e.g., http://localhost:8090/v1/fluree/query/<ledger...>)

Discovery and diagnostics

GET /.well-known/fluree.json

CLI auth discovery endpoint. Used by fluree remote add and fluree auth login to auto-configure authentication for a remote.

See Auth contract (CLI ↔ Server) for the full schema.

Standalone fluree-server returns:

  • {"version":1,"api_base_url":"/v1/fluree"} when no server auth is enabled
  • {"version":1,"api_base_url":"/v1/fluree","auth":{"type":"token"}} when any server auth mode is enabled (data/events/admin)

OIDC-capable implementations should return auth.type="oidc_device" plus issuer, client_id, and exchange_url. The CLI treats oidc_device as “OIDC interactive login”: it uses device-code when the IdP supports it, otherwise authorization-code + PKCE.

Implementations MAY also return api_base_url to tell the CLI where the Fluree API is mounted (for example, when the API is hosted under /v1/fluree or on a separate data subdomain).

GET {api_base_url}/whoami

Diagnostic endpoint for Bearer tokens. Returns a summary of the principal:

  • token_present: whether a Bearer token was present
  • verified: whether cryptographic verification succeeded
  • auth_method: "embedded_jwk" (Ed25519) or "oidc" (JWKS/RS256)
  • identity + scope summary (when verified)

This endpoint is intended for debugging and operator support. See also Admin, health, and stats.

Transaction Endpoints

POST /update

Submit an update transaction (WHERE/DELETE/INSERT JSON-LD or SPARQL UPDATE) to write data to a ledger.

URL:

POST /update?ledger={ledger-id}
POST /update/{ledger-id}

Query Parameters:

  • ledger (required for /update): Target ledger (format: name:branch)
  • context (optional): URL to default JSON-LD context

Request Headers:

For JSON-LD transactions:

Content-Type: application/json
Accept: application/json

For SPARQL UPDATE:

Content-Type: application/sparql-update
Accept: application/json

Note: Turtle/TriG are not accepted on /update. Use /insert (Turtle) or /upsert (Turtle/TriG).

Request Body (JSON-LD):

JSON-LD transaction document:

{
  "@context": {
    "ex": "http://example.org/ns/"
  },
  "@graph": [
    { "@id": "ex:alice", "ex:name": "Alice" }
  ]
}

Or WHERE/DELETE/INSERT update:

{
  "@context": {
    "ex": "http://example.org/ns/"
  },
  "where": [
    { "@id": "ex:alice", "ex:age": "?oldAge" }
  ],
  "delete": [
    { "@id": "ex:alice", "ex:age": "?oldAge" }
  ],
  "insert": [
    { "@id": "ex:alice", "ex:age": 31 }
  ]
}

Request Body (SPARQL UPDATE):

PREFIX ex: <http://example.org/ns/>

INSERT DATA {
  ex:alice ex:name "Alice" .
  ex:alice ex:age 30 .
}

Or with DELETE/INSERT:

PREFIX ex: <http://example.org/ns/>

DELETE {
  ?person ex:age ?oldAge .
}
INSERT {
  ?person ex:age 31 .
}
WHERE {
  ?person ex:name "Alice" .
  ?person ex:age ?oldAge .
}

Response:

{
  "t": 5,
  "timestamp": "2024-01-22T10:30:00.000Z",
  "commit_id": "bafybeig...commitT5",
  "flakes_added": 3,
  "flakes_retracted": 1,
  "previous_commit_id": "bafybeig...commitT4"
}

Status Codes:

  • 200 OK - Transaction successful
  • 400 Bad Request - Invalid transaction syntax
  • 401 Unauthorized - Authentication required
  • 403 Forbidden - Not authorized for this ledger
  • 404 Not Found - Ledger not found
  • 413 Payload Too Large - Transaction exceeds size limit
  • 500 Internal Server Error - Server error

Examples:

JSON-LD transaction:

curl -X POST "http://localhost:8090/v1/fluree/update?ledger=mydb:main" \
  -H "Content-Type: application/json" \
  -d '{
    "@context": { "ex": "http://example.org/ns/" },
    "@graph": [{ "@id": "ex:alice", "ex:name": "Alice" }]
  }'

SPARQL UPDATE (ledger-scoped endpoint):

curl -X POST http://localhost:8090/v1/fluree/update/mydb:main \
  -H "Content-Type: application/sparql-update" \
  -d 'PREFIX ex: <http://example.org/ns/>
      INSERT DATA { ex:alice ex:name "Alice" }'

SPARQL UPDATE (connection-scoped with header):

curl -X POST http://localhost:8090/v1/fluree/update \
  -H "Content-Type: application/sparql-update" \
  -H "Fluree-Ledger: mydb:main" \
  -d 'PREFIX ex: <http://example.org/ns/>
      DELETE { ?s ex:age ?old } INSERT { ?s ex:age 31 }
      WHERE { ?s ex:name "Alice" . ?s ex:age ?old }'


POST /insert

Insert new data into a ledger. Data must not conflict with existing data.

URL:

POST /insert?ledger={ledger-id}
POST /insert/{ledger-id}

Supported Content Types:

  • application/json - JSON-LD
  • text/turtle - Turtle (fast direct flake path)

Note: TriG (application/trig) is not supported on the insert endpoint. Named graph ingestion via GRAPH blocks requires the upsert path. Use /upsert for TriG data.

Example (JSON-LD):

curl -X POST "http://localhost:8090/v1/fluree/insert?ledger=mydb:main" \
  -H "Content-Type: application/json" \
  -d '{
    "@context": { "ex": "http://example.org/ns/" },
    "@graph": [{ "@id": "ex:alice", "ex:name": "Alice" }]
  }'

Example (Turtle):

curl -X POST "http://localhost:8090/v1/fluree/insert?ledger=mydb:main" \
  -H "Content-Type: text/turtle" \
  -d '@prefix ex: <http://example.org/ns/> .
      ex:alice ex:name "Alice" ; ex:age 30 .'

POST /upsert

Upsert data into a ledger. For each (subject, predicate) pair, existing values are retracted before new values are asserted.

URL:

POST /upsert?ledger={ledger-id}
POST /upsert/{ledger-id}

Supported Content Types:

  • application/json - JSON-LD
  • text/turtle - Turtle
  • application/trig - TriG with named graphs

Example (JSON-LD):

curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
  -H "Content-Type: application/json" \
  -d '{
    "@context": { "ex": "http://example.org/ns/" },
    "@id": "ex:alice",
    "ex:age": 31
  }'

Example (TriG with named graphs):

curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
  -H "Content-Type: application/trig" \
  -d '@prefix ex: <http://example.org/ns/> .
      @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

      # Default graph
      ex:company ex:name "Acme Corp" .

      # Named graph for products
      GRAPH <http://example.org/graphs/products> {
          ex:widget ex:name "Widget" ;
                    ex:price "29.99"^^xsd:decimal .
      }'
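The retract-then-assert rule stated above can be modeled in a few lines. This is a conceptual sketch of the semantics, not the server's implementation; `store` maps a (subject, predicate) pair to its set of object values.

```python
def apply_upsert(store, triples):
    """For each (subject, predicate) in the payload, retract existing
    values, then assert the new ones. Untouched pairs are left alone."""
    touched = {(s, p) for s, p, _ in triples}
    for key in touched:
        store[key] = set()            # retract existing values
    for s, p, o in triples:
        store[(s, p)].add(o)          # assert new values
    return store

store = {("ex:alice", "ex:age"): {30}, ("ex:alice", "ex:name"): {"Alice"}}
apply_upsert(store, [("ex:alice", "ex:age", 31)])
# ex:age is now {31}; ex:name is untouched
```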

POST /push/*ledger

Push precomputed commit v2 blobs to the server.

This endpoint is intended for Git-like workflows (fluree push) where a client has written commits locally and wants the server to validate and commit them.

URL:

POST /push/<ledger...>

Request Headers:

Content-Type: application/json
Accept: application/json
Authorization: Bearer <token>
Idempotency-Key: <string>   (optional; recommended)

If Idempotency-Key is provided, servers MAY treat POST /push/*ledger as idempotent for that key (same request body + key should yield the same response), returning the prior success response instead of 409 on client retry after timeouts.
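A server-side sketch of that contract (shown only to make it concrete; class and field names are hypothetical, and treating a reused key with a different body as a conflict is an assumption, since the spec above only defines the same-body retry case):

```python
import hashlib

class IdempotencyCache:
    def __init__(self):
        self._seen = {}  # key -> (body digest, stored response)

    def handle(self, key, body, process):
        digest = hashlib.sha256(body).hexdigest()
        if key in self._seen:
            prev_digest, prev_response = self._seen[key]
            if prev_digest == digest:
                return prev_response   # same key + same body: replay prior success
            return {"status": 409}     # key reuse with a different body (assumed)
        response = process(body)
        self._seen[key] = (digest, response)
        return response

cache = IdempotencyCache()
first = cache.handle("push-1", b'{"commits":[]}', lambda b: {"status": 200, "accepted": 0})
retry = cache.handle("push-1", b'{"commits":[]}', lambda b: {"status": 200, "accepted": 0})
# `retry` is the stored response, not a re-processed push
```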

Request Body:

JSON object:

  • commits: array of base64-encoded commit v2 blobs (oldest → newest)
  • blobs (optional): map of { cid: base64Bytes } for referenced blobs (currently: commit.txn when present)

Response Body (200 OK):

{
  "ledger": "mydb:main",
  "accepted": 3,
  "head": {
    "t": 42,
    "commit_id": "bafy...headCommit"
  },
  "indexing": {
    "enabled": false,
    "needed": true,
    "novelty_size": 524288,
    "index_t": 30,
    "commit_t": 42
  }
}

  • indexing.enabled - Whether background indexing is active on this server.
  • indexing.needed - Whether novelty has exceeded reindex_min_bytes and indexing should be triggered.
  • indexing.novelty_size - Current novelty size in bytes after the push.
  • indexing.index_t - Transaction time of the last indexed state.
  • indexing.commit_t - Transaction time of the latest committed data (after push).

When enabled is false (external indexer mode), the caller should use needed and related fields to decide whether to trigger indexing through its own mechanism.
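That decision reduces to a one-line check on the push response (function name illustrative; field names come from the response example above):

```python
def should_trigger_indexing(indexing):
    """External-indexer mode: trigger only when the server says it's needed."""
    return (not indexing["enabled"]) and indexing["needed"]

resp = {"enabled": False, "needed": True, "novelty_size": 524288,
        "index_t": 30, "commit_t": 42}
assert should_trigger_indexing(resp)
```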

Error Responses:

  • 409 Conflict: head changed / diverged / first commit t did not match next-t
  • 422 Unprocessable Entity: invalid commit bytes, missing referenced blob, or retraction invariant violation

GET /show/*ledger

Fetch and decode a single commit’s contents with resolved IRIs. This is the server-side equivalent of fluree show — it returns assertions, retractions, and flake tuples with IRIs compacted using the ledger’s namespace prefix table.

URL:

GET /show/<ledger...>?commit=<ref>

Query Parameters:

  • commit (required): Commit identifier — t:<N> for transaction number, hex-digest prefix (min 6 chars), or full CID

Request Headers:

Authorization: Bearer <token>   (when data auth is enabled)

Response Body (200 OK):

{
  "id": "bagaybqabciq...",
  "t": 5,
  "time": "2026-03-12T16:58:18.395474217+00:00",
  "size": 327,
  "previous": "bagaybqabciq...",
  "asserts": 1,
  "retracts": 1,
  "@context": {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "schema": "http://schema.org/"
  },
  "flakes": [
    ["urn:fsys:dataset:zoho3", "schema:dateModified", "2026-03-12T14:15:30Z", "xsd:string", false],
    ["urn:fsys:dataset:zoho3", "schema:dateModified", "2026-03-12T16:58:16Z", "xsd:string", true]
  ]
}

Each flake is a tuple: [subject, predicate, object, datatype, operation]. Operation true = assert (added), false = retract (removed). When metadata is present (language tag, list index, or named graph), a 6th element is appended.
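Reading those tuples in client code is straightforward; position 4 is the operation flag (helper name illustrative, tuples taken from the response example above):

```python
def split_flakes(flakes):
    """Partition flake tuples into asserts (op True) and retracts (op False)."""
    asserts = [f for f in flakes if f[4] is True]
    retracts = [f for f in flakes if f[4] is False]
    return asserts, retracts

flakes = [
    ["urn:fsys:dataset:zoho3", "schema:dateModified", "2026-03-12T14:15:30Z", "xsd:string", False],
    ["urn:fsys:dataset:zoho3", "schema:dateModified", "2026-03-12T16:58:16Z", "xsd:string", True],
]
asserts, retracts = split_flakes(flakes)   # 1 assert, 1 retract
```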

Policy filtering: Flakes are filtered by the caller’s data-auth identity (extracted from the Bearer token) and the server’s configured default_policy_class. When neither is present, all flakes are returned (root/admin access). Flakes the caller cannot read are silently omitted — the asserts and retracts counts reflect only the visible flakes. Unlike the query endpoints, show does not accept per-request policy overrides via headers or request body.

Responses:

  • 200 OK: Decoded commit returned
  • 400 Bad Request: Missing or invalid commit parameter
  • 401 Unauthorized: Bearer token required but missing
  • 404 Not Found: Ledger or commit not found
  • 501 Not Implemented: Proxy storage mode (no local index available)

Peer mode: Forwards to the transactor.

GET /commits/*ledger

Export commit blobs from a ledger using stable cursors. Pages walk backward via each commit’s parents — O(limit) per page regardless of ledger size. Used by fluree pull and fluree clone.

Requires replication-grade permissions (fluree.storage.*). The storage proxy must be enabled on the server.

URL:

GET /commits/<ledger...>?limit=100&cursor_id=<cid>

Query Parameters:

  • limit (optional): Max commits per page (default 100, server clamps to max 500)
  • cursor_id (optional): Commit CID cursor for pagination. Omit for first page (starts from head). Use next_cursor_id from the previous response for subsequent pages.

Request Headers:

Authorization: Bearer <token>   (requires fluree.storage.* claims)

Response Body (200 OK):

{
  "ledger": "mydb:main",
  "head_commit_id": "bafy...headCommit",
  "head_t": 42,
  "commits": ["<base64>", "<base64>"],
  "blobs": { "bafy...txnBlob": "<base64>" },
  "newest_t": 42,
  "oldest_t": 41,
  "next_cursor_id": "bafy...prevCommit",
  "count": 2,
  "effective_limit": 100
}
  • commits: Raw commit v2 blobs, newest → oldest within each page.
  • blobs: Referenced txn blobs keyed by CID string.
  • next_cursor_id: CID cursor for the next page; null when genesis is reached.
  • effective_limit: Actual limit used (after server clamping).

Responses:

  • 200 OK: Page of commits returned
  • 401 Unauthorized: Missing or invalid storage token
  • 404 Not Found: Storage proxy not enabled, ledger not found, or not authorized for this ledger

Pagination:

Commit CIDs in the immutable chain are stable cursors. New commits appended to the head do not affect backward pointers, so cursors remain valid across pages even when new commits arrive between requests.
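A pagination loop over this endpoint looks like the sketch below. `fetch_page` stands in for the HTTP call; the `commits` and `next_cursor_id` fields are the ones documented above.

```python
def all_commits(fetch_page):
    """Walk backward from head, following next_cursor_id until genesis."""
    commits, cursor = [], None
    while True:
        page = fetch_page(cursor)       # cursor=None means start from head
        commits.extend(page["commits"])
        cursor = page["next_cursor_id"]
        if cursor is None:              # genesis reached
            return commits

# Fake two-page ledger for illustration:
pages = {None:    {"commits": ["c42", "c41"], "next_cursor_id": "cid41"},
         "cid41": {"commits": ["c40"],        "next_cursor_id": None}}
result = all_commits(lambda cur: pages[cur])   # ["c42", "c41", "c40"]
```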

POST /pack/*ledger

Stream all missing CAS objects for a ledger in a single binary response. This is the primary transport for fluree clone and fluree pull, replacing multiple paginated GET /commits requests or per-object GET /storage/objects fetches with a single streaming request.

Requires replication-grade permissions (fluree.storage.*). The storage proxy must be enabled on the server.

URL:

POST /pack/<ledger...>

Request Headers:

Content-Type: application/json
Accept: application/x-fluree-pack
Authorization: Bearer <token>   (requires fluree.storage.* claims)

Request Body:

{
  "protocol": "fluree-pack-v1",
  "want": ["bafy...remoteHead"],
  "have": ["bafy...localHead"],
  "include_indexes": true,
  "include_txns": true,
  "want_index_root_id": "bafy...indexRoot",
  "have_index_root_id": "bafy...localIndexRoot"
}

  • protocol (string, required) - Must be "fluree-pack-v1"
  • want (string[], required) - ContentId CIDs the client wants (typically the remote commit head)
  • have (string[], optional) - ContentId CIDs the client already has (typically the local commit head). Server stops walking the commit chain when it reaches a have CID. Empty for full clone.
  • want_index_root_id (string, optional) - Index root CID the client wants (typically remote nameservice index_head_id). Required when include_indexes=true.
  • have_index_root_id (string, optional) - Index root CID the client already has (typically local nameservice index_head_id). Used for index artifact diff.
  • include_indexes (bool, required) - Include index artifacts in the stream. When true, the stream contains commit + txn objects plus index root/branch/leaf/dict artifacts.
  • include_txns (bool, required) - Include original transaction blobs referenced by each commit. When false, only commits (and optionally index artifacts) are streamed — commit envelopes still reference their txn CIDs, but the client will not have the transaction payloads locally. The ledger state is fully reconstructable from commits + indexes; transactions are the original request payloads (e.g., JSON-LD insert/update requests).

Response:

Binary stream using the fluree-pack-v1 wire format (Content-Type: application/x-fluree-pack):

[Preamble: FPK1 + version(1)] [Header frame] [Data frames...] [End frame]
  • Header (0x00) - JSON metadata: protocol version, capabilities, commit_count, index_artifact_count, estimated_total_bytes
  • Data (0x01) - CID binary + raw object bytes (commit, txn blob, or index artifact)
  • Error (0x02) - UTF-8 error message (terminates stream)
  • Manifest (0x03) - JSON metadata for phase transitions (e.g. start of index phase)
  • End (0xFF) - End of stream (no payload)

Data frames are streamed in oldest-first topological order (parents before children), so the client can write objects to CAS as they arrive without buffering the entire stream.

The Header frame includes an estimated_total_bytes field that the CLI uses to warn users before large transfers (~1 GiB or more). The estimate is ratio-based (derived from commit count) and may differ from actual transfer size. Set to 0 for commits-only requests.

Status Codes:

  • 200 OK: Binary pack stream
  • 401 Unauthorized: Missing or invalid storage token
  • 404 Not Found: Storage proxy not enabled, ledger not found, or not authorized for this ledger

Example:

# Download all commits for a ledger (full clone)
curl -X POST "http://localhost:8090/v1/fluree/pack/mydb:main" \
  -H "Content-Type: application/json" \
  -H "Accept: application/x-fluree-pack" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"protocol":"fluree-pack-v1","want":["bafy...head"],"have":[],"include_indexes":false,"include_txns":true}' \
  --output pack.bin

# Download commits without transaction payloads (smaller clone, read-only use)
curl -X POST "http://localhost:8090/v1/fluree/pack/mydb:main" \
  -H "Content-Type: application/json" \
  -H "Accept: application/x-fluree-pack" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"protocol":"fluree-pack-v1","want":["bafy...head"],"have":[],"include_indexes":true,"include_txns":false,"want_index_root_id":"bafy...indexRoot"}' \
  --output pack.bin

# Download only missing commits (incremental pull)
curl -X POST "http://localhost:8090/v1/fluree/pack/mydb:main" \
  -H "Content-Type: application/json" \
  -H "Accept: application/x-fluree-pack" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"protocol":"fluree-pack-v1","want":["bafy...remoteHead"],"have":["bafy...localHead"],"include_indexes":false,"include_txns":true}' \
  --output pack.bin

# Download commits + index artifacts (default for CLI pull/clone)
curl -X POST "http://localhost:8090/v1/fluree/pack/mydb:main" \
  -H "Content-Type: application/json" \
  -H "Accept: application/x-fluree-pack" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"protocol":"fluree-pack-v1","want":["bafy...head"],"have":[],"include_indexes":true,"include_txns":true,"want_index_root_id":"bafy...indexRoot"}' \
  --output pack.bin

Storage Proxy Endpoints

These endpoints are intended for peer mode and fluree clone/pull workflows. They require the storage proxy to be enabled on the server and use replication-grade Bearer tokens (fluree.storage.* claims).

GET /storage/ns/:ledger-id

Fetch the nameservice record for a ledger.

URL:

GET /storage/ns/{ledger-id}

Request Headers:

Authorization: Bearer <token>   (requires fluree.storage.* claims)

Response (200 OK):

{
  "ledger_id": "mydb:main",
  "name": "mydb",
  "branch": "main",
  "commit_head_id": "bafy...commitCid",
  "commit_t": 42,
  "index_head_id": "bafy...indexCid",
  "index_t": 40,
  "default_context": null,
  "retracted": false,
  "config_id": "bafy...configCid"
}

  • ledger_id - Canonical ledger ID (e.g., “mydb:main”)
  • name - Ledger name without branch (e.g., “mydb”)
  • config_id - CID of the LedgerConfig object (origin discovery), if set

Status Codes:

  • 200 OK: Record found
  • 404 Not Found: Storage proxy disabled, ledger not found, or not authorized

POST /storage/block

Fetch a storage block (index branch or leaf) by CID. The server derives the storage address internally. Leaf blocks are always policy-filtered before return.

Only replication-relevant content kinds are allowed (commits, txns, config, index roots/branches/leaves, dict blobs). Internal metadata kinds (GC records, stats sketches, graph source snapshots) are rejected with 404.

URL:

POST /storage/block

Request Headers:

Content-Type: application/json
Authorization: Bearer <token>
Accept: application/octet-stream | application/x-fluree-flakes | application/x-fluree-flakes+json

Request Body:

Both fields are required:

{
  "cid": "bafy...branchOrLeafCid",
  "ledger": "mydb:main"
}

Responses:

  • 200 OK: Block bytes (branches) or encoded flakes (leaves)
  • 400 Bad Request: Invalid CID string
  • 404 Not Found: Block not found, disallowed kind, or not authorized

GET /storage/objects/:cid

Fetch a CAS (content-addressed storage) object by its content identifier. Returns the raw bytes of the stored object after verifying integrity.

This is a replication-grade endpoint for fluree clone/pull workflows. The client knows the CID (from the nameservice record or the commit chain) and wants the raw bytes.

URL:

GET /storage/objects/{cid}?ledger={ledger-id}

Path Parameters:

  • cid: CIDv1 string (base32-lower multibase, e.g., "bafybeig...")

Query Parameters:

  • ledger (required): Ledger ID (e.g., "mydb:main"). Required because storage paths are ledger-scoped.

Request Headers:

Authorization: Bearer <token>   (requires fluree.storage.* claims)

Kind Allowlist:

All replication-relevant content kinds are served:

  • commit - Commit chain blobs
  • txn - Transaction data blobs
  • config - LedgerConfig origin discovery objects
  • index-root - Binary index root (FIR6)
  • index-branch - Index branch manifests
  • index-leaf - Index leaf files
  • dict - Dictionary artifacts (predicates, subjects, strings, etc.)

Only GarbageRecord (internal GC metadata) returns 404.

Response Headers:

  • Content-Type: application/octet-stream
  • X-Fluree-Content-Kind: Content kind label (commit, txn, config, index-root, index-branch, index-leaf, dict)

Response Body:

Raw bytes of the stored object.

Integrity Verification:

The server verifies the hash of the stored bytes against the CID before returning. Commit blobs are format-sniffed:

  • Commit-v2 blobs (FCV2 magic): Uses the canonical sub-range hash (SHA-256 over the payload excluding the trailing hash + signature block).
  • All other blobs (txn, config, future commit formats): Full-bytes SHA-256.

If verification fails, the server returns 500 Internal Server Error — this indicates storage corruption.
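The full-bytes path is ordinary SHA-256 comparison, sketched below. CID parsing is elided; `expected_digest` is assumed to be the raw digest already extracted from the CID's multihash.

```python
import hashlib

def verify_full_bytes(stored_bytes, expected_digest):
    """Recompute SHA-256 over the stored bytes and compare with the
    digest carried in the CID (full-bytes path: txn, config, etc.)."""
    return hashlib.sha256(stored_bytes).digest() == expected_digest

blob = b"example object bytes"
digest = hashlib.sha256(blob).digest()
assert verify_full_bytes(blob, digest)
assert not verify_full_bytes(b"corrupted", digest)  # server would return 500
```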

Status Codes:

  • 200 OK: Object found and integrity verified
  • 400 Bad Request: Invalid CID string
  • 404 Not Found: Object not found, disallowed kind, not authorized, or storage proxy disabled
  • 500 Internal Server Error: Hash verification failed (storage corruption)

Example:

# Fetch a commit blob by CID
curl -H "Authorization: Bearer $TOKEN" \
  "http://localhost:8090/v1/fluree/storage/objects/bafybeig...commitCid?ledger=mydb:main"

# Fetch a config blob by CID
curl -H "Authorization: Bearer $TOKEN" \
  "http://localhost:8090/v1/fluree/storage/objects/bafybeig...configCid?ledger=mydb:main"

# Fetch an index leaf by CID
curl -H "Authorization: Bearer $TOKEN" \
  "http://localhost:8090/v1/fluree/storage/objects/bafybeig...leafCid?ledger=mydb:main"

Nameservice Sync Endpoints

Used by replication clients and peer instances to push ref updates, initialize ledgers, and fetch snapshots of all nameservice records. These are the server-side counterpart to the fluree-db-nameservice-sync crate.

Authorization: All endpoints require a Bearer token with storage-proxy permissions. Per-alias endpoints verify the principal is authorized for that ledger. /snapshot filters results to the principal’s authorized scope (storage_all returns everything; otherwise results are filtered to storage_ledgers and graph sources are excluded).

Availability: These endpoints are only available on transaction servers (direct storage mode). Proxy-mode instances return 404 Not Found.

POST /nameservice/refs/{alias}/commit

Compare-and-set push for a ledger’s commit-head ref.

Request Body:

{
  "expected": { /* RefValue or null for initial creation */ },
  "new":      { /* RefValue */ }
}

Response (200 OK — updated):

{ "status": "updated", "ref": { /* new RefValue */ } }

Response (409 Conflict — CAS failed):

{ "status": "conflict", "actual": { /* current server-side RefValue */ } }

POST /nameservice/refs/{alias}/index

Compare-and-set push for a ledger’s index-head ref. Same request/response shape as /commit above.
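
The compare-and-set loop a replication client runs against these endpoints can be sketched as follows. `push_ref` is a hypothetical stand-in for the HTTP POST (it returns the parsed JSON response body), not part of any Fluree client library.

```python
def cas_push(push_ref, new_ref, expected=None, max_retries=3):
    """Push a ref with compare-and-set semantics.

    push_ref(expected, new) stands in for POSTing
    {"expected": ..., "new": ...} to /nameservice/refs/{alias}/commit
    and returns the parsed body: {"status": "updated", "ref": ...} or
    {"status": "conflict", "actual": ...}. On conflict we adopt the
    server's current ref as the new `expected` and retry.
    """
    for _ in range(max_retries):
        resp = push_ref(expected, new_ref)
        if resp["status"] == "updated":
            return resp["ref"]
        expected = resp["actual"]  # server's current view; retry from there
    raise RuntimeError("CAS push failed after retries")

# Fake server: accepts the push only when `expected` matches its state.
state = {"ref": {"t": 4}}
def fake_push(expected, new):
    if expected == state["ref"]:
        state["ref"] = new
        return {"status": "updated", "ref": new}
    return {"status": "conflict", "actual": state["ref"]}

print(cas_push(fake_push, {"t": 5}))  # first attempt conflicts, retry succeeds
```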

POST /nameservice/refs/{alias}/init

Create a ledger entry in the nameservice if it does not already exist. Idempotent.

Response:

{ "created": true }   // new ledger entry was registered
{ "created": false }  // already existed; no change

GET /nameservice/snapshot

Return a full snapshot of all ledger (NsRecord) and graph-source (GraphSourceRecord) records visible to the caller.

Response:

{
  "ledgers":       [ /* NsRecord, … */ ],
  "graph_sources": [ /* GraphSourceRecord, … */ ]
}

Status Codes:

  • 200 OK — snapshot returned
  • 401 Unauthorized — missing/invalid storage-proxy token
  • 404 Not Found — endpoint disabled (proxy mode)

Query Endpoints

POST /query

Execute a query against one or more ledgers.

URL:

POST /query
GET  /query?query={urlencoded-sparql}   # SPARQL Protocol GET form

The GET form is provided for W3C SPARQL Protocol compliance. It accepts a SPARQL query via the query URL parameter; the body forms below are preferred for larger queries and for JSON-LD. The same form is available on the ledger-scoped /query/{ledger} route.

Optional Query Parameters:

  • default-context (boolean, default: false): When true, use the ledger’s stored default JSON-LD context if the request omits its own @context (JSON-LD) or PREFIX declarations (ledger-scoped SPARQL).

Request Headers:

Content-Type: application/json
Accept: application/json

Or for SPARQL:

Content-Type: application/sparql-query
Accept: application/sparql-results+json

Request Body (JSON-LD Query):

{
  "@context": {
    "ex": "http://example.org/ns/"
  },
  "from": "mydb:main",
  "select": ["?name", "?age"],
  "where": [
    { "@id": "?person", "ex:name": "?name" },
    { "@id": "?person", "ex:age": "?age" }
  ],
  "orderBy": ["?name"],
  "limit": 100
}

Request Body (SPARQL):

PREFIX ex: <http://example.org/ns/>

SELECT ?name ?age
FROM <mydb:main>
WHERE {
  ?person ex:name ?name .
  ?person ex:age ?age .
}
ORDER BY ?name
LIMIT 100

Response (JSON-LD Query):

[
  { "name": "Alice", "age": 30 },
  { "name": "Bob", "age": 25 }
]

Response (SPARQL):

{
  "head": {
    "vars": ["name", "age"]
  },
  "results": {
    "bindings": [
      {
        "name": { "type": "literal", "value": "Alice" },
        "age": { "type": "literal", "value": "30", "datatype": "http://www.w3.org/2001/XMLSchema#integer" }
      }
    ]
  }
}
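
Consuming the SPARQL results+json shape above usually means flattening `bindings` into plain rows. A hedged Python sketch (only xsd:integer conversion is shown; real consumers handle more datatypes and term types):

```python
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"

def bindings_to_rows(results: dict) -> list[dict]:
    """Flatten a SPARQL results+json document into plain dicts,
    converting xsd:integer literals to Python ints. Other datatypes
    are left as strings in this sketch; unbound variables (e.g. from
    OPTIONAL) become None."""
    rows = []
    for binding in results["results"]["bindings"]:
        row = {}
        for var in results["head"]["vars"]:
            if var not in binding:
                row[var] = None
                continue
            term = binding[var]
            value = term["value"]
            if term.get("datatype") == XSD_INT:
                value = int(value)
            row[var] = value
        rows.append(row)
    return rows

doc = {
    "head": {"vars": ["name", "age"]},
    "results": {"bindings": [
        {"name": {"type": "literal", "value": "Alice"},
         "age": {"type": "literal", "value": "30", "datatype": XSD_INT}},
    ]},
}
print(bindings_to_rows(doc))  # [{'name': 'Alice', 'age': 30}]
```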

Status Codes:

  • 200 OK - Query successful
  • 400 Bad Request - Invalid query syntax
  • 401 Unauthorized - Authentication required
  • 404 Not Found - Ledger not found
  • 413 Payload Too Large - Query exceeds size limit
  • 500 Internal Server Error - Server error
  • 503 Service Unavailable - Query timeout or resource limit

Example:

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "Content-Type: application/json" \
  -d '{
    "from": "mydb:main",
    "select": ["?name"],
    "where": [{ "@id": "?person", "ex:name": "?name" }]
  }'

POST /query/

Execute a query against a specific ledger (ledger-scoped).

This endpoint is designed for single-ledger queries, but supports selecting named graphs inside the ledger.

URL:

POST /query/{ledger}

Default graph semantics:

  • If the request does not specify a graph selector, the query runs against the ledger’s default graph.
  • The built-in txn-meta graph can be selected as either:
    • JSON-LD: "from": "txn-meta", or
    • SPARQL: FROM <txn-meta>

Named graph selection (within the same ledger):

  • JSON-LD: you can use "from" to pick a graph in this ledger:

    • "from": "default" → default graph
    • "from": "txn-meta" → txn-meta graph
    • "from": "<graph IRI>" → a user-defined named graph IRI within this ledger
    • Structured form: "from": { "@id": "<ledger>", "graph": "<graph selector>" }
  • SPARQL: if the query includes FROM / FROM NAMED, the server interprets those IRIs as graphs within this ledger (not other ledgers):

    • FROM <default> / FROM <txn-meta> / FROM <graph IRI> selects the default graph for triple patterns outside GRAPH {}.
    • FROM NAMED <graph IRI> makes that named graph available via GRAPH <graph IRI> { ... }.

Ledger mismatch protection:

If the body includes a ledger reference that targets a different ledger than {ledger}, the server returns 400 Bad Request with a “Ledger mismatch” error.

Examples:

JSON-LD (query txn-meta):

curl -X POST "http://localhost:8090/v1/fluree/query/mydb:main" \
  -H "Content-Type: application/json" \
  -d '{
    "from": "txn-meta",
    "select": ["?commit", "?t"],
    "where": [{ "@id": "?commit", "https://ns.flur.ee/db#t": "?t" }]
  }'

JSON-LD (query a user-defined named graph by IRI):

curl -X POST "http://localhost:8090/v1/fluree/query/mydb:main" \
  -H "Content-Type: application/json" \
  -d '{
    "from": "http://example.org/graphs/products",
    "select": ["?name"],
    "where": [{ "@id": "?p", "http://example.org/ns/name": "?name" }]
  }'

SPARQL (select txn-meta as default graph):

curl -X POST "http://localhost:8090/v1/fluree/query/mydb:main" \
  -H "Content-Type: application/sparql-query" \
  -d 'PREFIX f: <https://ns.flur.ee/db#>
SELECT ?commit ?t
FROM <txn-meta>
WHERE { ?commit f:t ?t }'

History Queries via POST /query

Query the history of entities using the standard /query endpoint with from and to keys specifying the time range.

Request Body:

{
  "@context": {
    "ex": "http://example.org/ns/"
  },
  "from": "mydb:main@t:1",
  "to": "mydb:main@t:latest",
  "select": ["?name", "?age", "?t", "?op"],
  "where": [
    { "@id": "ex:alice", "ex:name": { "@value": "?name", "@t": "?t", "@op": "?op" } },
    { "@id": "ex:alice", "ex:age": "?age" }
  ],
  "orderBy": "?t"
}

The @t and @op annotations capture transaction metadata:

  • @t - Transaction time (integer) when the fact was asserted or retracted.
  • @op - Operation type as a boolean: true for assertions, false for retractions. (Mirrors Flake.op on disk; constants "assert" / "retract" are not accepted.)

Both annotations work uniformly for literal-valued and IRI-valued objects.

Response:

[
  ["Alice", 30, 1, true],
  ["Alice", 30, 5, false],
  ["Alicia", 31, 5, true]
]
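
The assert/retract booleans are enough to replay an entity's state at any point in time. An illustrative Python sketch over rows shaped like the response above:

```python
def value_as_of(history, t):
    """Replay (value, t, op) history rows up to and including time t,
    returning the set of values asserted at that point. op=True
    asserts a value, op=False retracts it (mirroring @op above)."""
    current = set()
    for value, row_t, op in sorted(history, key=lambda r: r[1]):
        if row_t > t:
            break
        (current.add if op else current.discard)(value)
    return current

# The example response: Alice asserted at t=1, renamed to Alicia at t=5.
history = [("Alice", 1, True), ("Alice", 5, False), ("Alicia", 5, True)]
print(value_as_of(history, 1))  # {'Alice'}
print(value_as_of(history, 5))  # {'Alicia'}
```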

Example:

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "Content-Type: application/json" \
  -d '{
    "@context": { "ex": "http://example.org/ns/" },
    "from": "mydb:main@t:1",
    "to": "mydb:main@t:latest",
    "select": ["?name", "?t", "?op"],
    "where": [
      { "@id": "ex:alice", "ex:name": { "@value": "?name", "@t": "?t", "@op": "?op" } }
    ],
    "orderBy": "?t"
  }'

SPARQL History Query:

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "Content-Type: application/sparql-query" \
  -d 'PREFIX ex: <http://example.org/ns/>
PREFIX f: <https://ns.flur.ee/db#>

SELECT ?name ?t ?op
FROM <mydb:main@t:1>
TO <mydb:main@t:latest>
WHERE {
  << ex:alice ex:name ?name >> f:t ?t .
  << ex:alice ex:name ?name >> f:op ?op .
}
ORDER BY ?t'

GET/POST /explain

Return a query plan without executing the query. Accepts the same body formats and authentication as /query (JSON-LD, SPARQL via application/sparql-query or ?query=, and JWS/VC signed requests).

URL:

GET  /explain[/{ledger...}]
POST /explain[/{ledger...}]

Behavior:

  • JSON-LD body: returns the logical plan for the parsed query.
  • SPARQL body: returns the plan for the parsed SPARQL query. The ledger-scoped endpoint (/explain/{ledger}) rejects queries containing FROM / FROM NAMED — strip dataset clauses to explain the core plan.
  • SPARQL UPDATE is rejected (HTTP 400) — use /update for updates.
  • Same ledger-scope enforcement for Bearer tokens as /query.

Response:

A JSON object describing the logical / physical plan. The shape mirrors the query engine’s internal plan representation; treat it as informational and not stable across releases.

Status Codes:

  • 200 OK — plan returned
  • 400 Bad Request — SPARQL UPDATE sent, or FROM clauses on the ledger-scoped explain
  • 401 Unauthorized — authentication required and missing
  • 404 Not Found — ledger not found or not authorized

Examples:

# Explain a SPARQL query
curl -X POST http://localhost:8090/v1/fluree/explain/mydb \
  -H "Content-Type: application/sparql-query" \
  --data 'SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10'

# Explain a JSON-LD query
curl -X POST http://localhost:8090/v1/fluree/explain/mydb \
  -H "Content-Type: application/json" \
  -d '{"select":["?s"],"where":{"@id":"?s"}}'

Nameservice Metadata

The standalone server does not expose a general-purpose POST /nameservice/query endpoint. Use GET /ledgers to list ledgers and graph sources, GET /info/{ledger-id} for metadata about a single ledger or graph source, and GET /nameservice/snapshot for authenticated remote-sync snapshots.

Ledger Management Endpoints

GET /ledgers

List all ledgers and graph sources.

URL:

GET /ledgers

Response:

{
  "ledgers": [
    {
      "ledger_id": "mydb:main",
      "branch": "main",
      "commit_t": 5,
      "index_t": 5,
      "created": "2024-01-22T10:00:00.000Z",
      "last_updated": "2024-01-22T10:30:00.000Z"
    },
    {
      "ledger_id": "mydb:dev",
      "branch": "dev",
      "commit_t": 3,
      "index_t": 2,
      "created": "2024-01-22T11:00:00.000Z",
      "last_updated": "2024-01-22T11:15:00.000Z"
    }
  ]
}

Example:

curl http://localhost:8090/v1/fluree/ledgers

For metadata about a specific ledger or graph source, use GET /info/{ledger-id}. To create a ledger, use POST /create.

POST /create

Create a new ledger.

URL:

POST /create

Authentication: When admin auth is enabled (--admin-auth-mode=required), requires Bearer token from a trusted issuer. See Admin Authentication.

Request Body:

{
  "ledger": "mydb:main"
}
  • ledger (string, required): Ledger ID (e.g., “mydb” or “mydb:main”)

Response:

{
  "ledger": "mydb:main",
  "t": 0,
  "commit_id": "bafybeig...commitT0"
}
  • ledger: Normalized ledger ID
  • t: Transaction time (0 for new ledger)
  • commit_id: ContentId of the initial commit

Status Codes:

  • 201 Created - Ledger created successfully
  • 400 Bad Request - Invalid request body
  • 401 Unauthorized - Bearer token required (when admin auth enabled)
  • 409 Conflict - Ledger already exists
  • 500 Internal Server Error - Server error

Examples:

# Create ledger (no auth required in default mode)
curl -X POST http://localhost:8090/v1/fluree/create \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb:main"}'

# Create ledger with auth token (when admin auth enabled)
curl -X POST http://localhost:8090/v1/fluree/create \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJ..." \
  -d '{"ledger": "mydb:main"}'

# Create with short ledger ID (auto-resolves to :main)
curl -X POST http://localhost:8090/v1/fluree/create \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb"}'

POST /drop

Drop (delete) a ledger.

URL:

POST /drop

Authentication: When admin auth is enabled (--admin-auth-mode=required), requires Bearer token from a trusted issuer. See Admin Authentication.

Request Body:

{
  "ledger": "mydb:main",
  "hard": false
}
  • ledger (string, required): Ledger ID (e.g., “mydb” or “mydb:main”)
  • hard (boolean, optional): If true, permanently delete all storage files. Default: false (soft drop)

Drop Modes:

  • Soft drop (hard: false, default): Retracts the ledger from the nameservice but preserves all data files. The ledger can potentially be recovered.
  • Hard drop (hard: true): Permanently deletes all commit and index files. This is irreversible.

Response:

{
  "ledger": "mydb:main",
  "status": "dropped",
  "files_deleted": {
    "commit": 15,
    "index": 8
  }
}
  • ledger: Normalized ledger ID
  • status: One of: "dropped", "already_retracted", "not_found"
  • files_deleted: File counts (only populated for hard drop)

Status Codes:

  • 200 OK - Drop successful (or already dropped/not found)
  • 400 Bad Request - Invalid request body
  • 401 Unauthorized - Bearer token required (when admin auth enabled)
  • 500 Internal Server Error - Server error

Drop Sequence:

  1. Normalizes the ledger ID (ensures branch suffix like :main)
  2. Cancels any pending background indexing
  3. Waits for in-progress indexing to complete
  4. In hard mode: deletes all storage artifacts (commits + indexes)
  5. Retracts from nameservice
  6. Disconnects from ledger cache

Idempotency:

Safe to call multiple times:

  • Returns "already_retracted" if the ledger was previously dropped
  • Hard mode still attempts file deletion even for already-retracted ledgers (useful for cleanup)

Examples:

# Soft drop (retract only, preserve files)
curl -X POST http://localhost:8090/v1/fluree/drop \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb:main"}'

# Hard drop (delete all files - IRREVERSIBLE)
curl -X POST http://localhost:8090/v1/fluree/drop \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb:main", "hard": true}'

# Drop with auth token (when admin auth enabled)
curl -X POST http://localhost:8090/v1/fluree/drop \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJ..." \
  -d '{"ledger": "mydb:main", "hard": true}'

# Drop with short ledger ID (auto-resolves to :main)
curl -X POST http://localhost:8090/v1/fluree/drop \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb"}'

GET /context/

Get the default JSON-LD context for a ledger.

URL:

GET /context/{ledger-id}

Path Parameters:

  • ledger-id: Ledger identifier (e.g., mydb or mydb:main)

Response:

{
  "@context": {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "ex": "http://example.org/"
  }
}

If no default context has been set, "@context" is null.

Status Codes:

  • 200 OK - Context returned (may be null)
  • 404 Not Found - Ledger does not exist

Example:

curl http://localhost:8090/v1/fluree/context/mydb:main

PUT /context/

Replace the default JSON-LD context for a ledger.

URL:

PUT /context/{ledger-id}

Path Parameters:

  • ledger-id: Ledger identifier (e.g., mydb or mydb:main)

Request Body:

A JSON object mapping prefixes to IRIs. Either a bare object or wrapped in {"@context": {...}}:

{
  "ex": "http://example.org/",
  "foaf": "http://xmlns.com/foaf/0.1/",
  "schema": "http://schema.org/"
}

Response (success):

{
  "status": "updated"
}

Status Codes:

  • 200 OK - Context replaced successfully
  • 400 Bad Request - Body is not a valid JSON object, or the server is running in peer mode (writes not available)
  • 404 Not Found - Ledger does not exist
  • 409 Conflict - Concurrent update conflict (retry the request)

Concurrency: The update uses compare-and-set semantics internally (up to 3 retries). A 409 means all retries were exhausted — this is rare and indicates heavy concurrent updates.

Cache invalidation: After a successful update, the server invalidates the cached ledger state. Subsequent queries will use the new context.

Examples:

# Set context
curl -X PUT http://localhost:8090/v1/fluree/context/mydb:main \
  -H "Content-Type: application/json" \
  -d '{"ex": "http://example.org/", "foaf": "http://xmlns.com/foaf/0.1/"}'

# Wrapped form also accepted
curl -X PUT http://localhost:8090/v1/fluree/context/mydb:main \
  -H "Content-Type: application/json" \
  -d '{"@context": {"ex": "http://example.org/"}}'

POST /branch

Create a new branch for a ledger.

URL:

POST /branch

Authentication: When admin auth is enabled (--admin-auth-mode=required), requires Bearer token from a trusted issuer. See Admin Authentication.

Request Body:

{
  "ledger": "mydb",
  "branch": "feature-x",
  "source": "main",
  "at": "t:5"
}
  • ledger (string, required): Ledger name without branch suffix (e.g., “mydb”)
  • branch (string, required): New branch name to create (e.g., “feature-x”)
  • source (string, optional): Source branch to create from. Default: "main"
  • at (string, optional): Commit on the source branch to start from. "t:N" for a transaction number, or a hex digest / full CID for prefix resolution. When omitted, the branch starts at the source’s current HEAD. t: / prefix resolution requires the source to be indexed.
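
Client code often wants to validate an at value before sending it. An illustrative parser (this is not the server's implementation, and its exact rules for prefix resolution may differ):

```python
import re

def parse_at(at: str):
    """Classify an `at` value: "t:N" -> transaction number, otherwise
    treat it as a hex digest / CID prefix for prefix resolution.
    Illustrative only; the server's own validation may be stricter."""
    if at.startswith("t:"):
        n = at[2:]
        if not n.isdigit():
            raise ValueError(f"malformed t: value: {at!r}")
        return ("t", int(n))
    if re.fullmatch(r"[0-9a-zA-Z]+", at):
        return ("prefix", at)
    raise ValueError(f"malformed at value: {at!r}")

print(parse_at("t:5"))       # ('t', 5)
print(parse_at("bafybeig"))  # ('prefix', 'bafybeig')
```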

Response:

{
  "ledger_id": "mydb:feature-x",
  "branch": "feature-x",
  "source": "main",
  "t": 5
}
  • ledger_id: Full ledger:branch identifier for the new branch
  • branch: Branch name
  • source: Source branch this was created from
  • t: Transaction time of the commit at the branch point

Status Codes:

  • 201 Created - Branch created successfully
  • 400 Bad Request - Invalid request body (including malformed at value)
  • 401 Unauthorized - Bearer token required (when admin auth enabled)
  • 404 Not Found - Source branch does not exist, or at commit is not reachable from source HEAD
  • 409 Conflict - Branch already exists
  • 500 Internal Server Error - Server error

Examples:

# Create branch from main (default source)
curl -X POST http://localhost:8090/v1/fluree/branch \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb", "branch": "feature-x"}'

# Create branch from a specific source branch
curl -X POST http://localhost:8090/v1/fluree/branch \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb", "branch": "staging", "source": "dev"}'

# Branch at a historical commit on main
curl -X POST http://localhost:8090/v1/fluree/branch \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb", "branch": "rewind", "at": "t:5"}'

GET /branch/

List all non-retracted branches for a ledger.

URL:

GET /branch/{ledger-name}

Response:

[
  {
    "branch": "main",
    "ledger_id": "mydb:main",
    "t": 5
  },
  {
    "branch": "feature-x",
    "ledger_id": "mydb:feature-x",
    "t": 5,
    "source": "main"
  }
]
  • branch: Branch name
  • ledger_id: Full ledger:branch identifier
  • t: Current transaction time on this branch
  • source: Source branch (only present for branches created via /branch)

Examples:

curl http://localhost:8090/v1/fluree/branch/mydb

POST /drop-branch

Drop a branch from a ledger. Admin-protected.

URL:

POST /drop-branch

Request body:

{
  "ledger": "mydb",
  "branch": "feature-x"
}
  • ledger (string, required): Ledger name without branch suffix (e.g., “mydb”)
  • branch (string, required): Branch name to drop (e.g., “feature-x”)

Response body (200 OK):

{
  "ledger_id": "mydb:feature-x",
  "status": "Dropped",
  "deferred": false,
  "artifacts_deleted": 5,
  "cascaded": [],
  "warnings": []
}
  • ledger_id: Full ledger:branch identifier of the dropped branch
  • status: Drop status ("Dropped", "AlreadyRetracted", "NotFound")
  • deferred: true if the branch has children — retracted but storage preserved
  • artifacts_deleted: Number of storage artifacts removed
  • cascaded: List of ancestor branch ledger_ids that were cascade-dropped
  • warnings: Any non-fatal warnings during the drop

Behavior:

  • Cannot drop main: Returns 400 Bad Request.
  • Leaf branch (no children): Fully drops — deletes storage artifacts, purges NsRecord, decrements parent’s child count. If the parent was previously retracted and its child count reaches 0, the parent is cascade-dropped too.
  • Branch with children (branches > 0): Retracted (hidden from listings, rejects new transactions) but storage is preserved for children. When the last child is eventually dropped, the retracted parent is cascade-purged automatically.

Status codes:

  • 200 OK - Branch dropped (or deferred) successfully
  • 400 Bad Request - Cannot drop the main branch
  • 404 Not Found - Ledger or branch does not exist
  • 500 Internal Server Error - Server error

Examples:

# Drop a leaf branch
curl -X POST http://localhost:8090/v1/fluree/drop-branch \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb", "branch": "feature-x"}'

# Drop a branch with children (will be deferred)
curl -X POST http://localhost:8090/v1/fluree/drop-branch \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb", "branch": "dev"}'

POST /rebase

Rebase a branch onto its source branch’s current HEAD. Admin-protected.

URL:

POST /rebase

Request body:

{
  "ledger": "mydb",
  "branch": "feature-x",
  "strategy": "take-both"
}
  • ledger (string, required): Ledger name without branch suffix (e.g., “mydb”)
  • branch (string, required): Branch name to rebase (e.g., “feature-x”)
  • strategy (string, optional): Conflict resolution strategy (default: “take-both”). Options: take-both, abort, take-source, take-branch, skip

Response body (200 OK):

{
  "ledger_id": "mydb:feature-x",
  "branch": "feature-x",
  "fast_forward": false,
  "replayed": 3,
  "skipped": 0,
  "conflicts": 1,
  "failures": 0,
  "total_commits": 3,
  "source_head_t": 8
}
  • ledger_id (string): Full ledger:branch identifier
  • branch (string): Branch name
  • fast_forward (bool): true if the branch had no unique commits
  • replayed (number): Number of commits successfully replayed
  • skipped (number): Number of commits skipped (Skip strategy)
  • conflicts (number): Number of conflicts detected
  • failures (number): Number of commits that failed validation
  • total_commits (number): Total branch commits considered
  • source_head_t (number): Transaction time of the source branch HEAD

Conflict strategies:

  • take-both: Replay as-is, both values coexist (multi-cardinality)
  • abort: Fail on first conflict, no changes applied
  • take-source: Drop branch’s conflicting flakes (source wins)
  • take-branch: Keep branch’s flakes, retract source’s conflicting values
  • skip: Skip entire commit if any flakes conflict

Status codes:

  • 200 OK - Rebase completed successfully
  • 400 Bad Request - Cannot rebase main, invalid strategy, or missing branch point
  • 404 Not Found - Ledger or branch does not exist
  • 409 Conflict - Rebase aborted due to conflict (abort strategy)
  • 500 Internal Server Error - Server error

Examples:

# Rebase with default strategy (take-both)
curl -X POST http://localhost:8090/v1/fluree/rebase \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb", "branch": "feature-x"}'

# Rebase with abort strategy (fail on conflicts)
curl -X POST http://localhost:8090/v1/fluree/rebase \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb", "branch": "feature-x", "strategy": "abort"}'

POST /merge

Merge a source branch into a target branch. Admin-protected.

Fast-forward merges copy the source commit chain into the target namespace and advance the target HEAD. When the target has diverged, Fluree performs a general merge: it computes the source and target deltas since their common ancestor, resolves overlapping (s, p, g) conflicts according to the requested strategy, and creates a merge commit on the target branch.

URL:

POST /merge

Request body:

{
  "ledger": "mydb",
  "source": "feature-x",
  "target": "dev",
  "strategy": "take-both"
}
  • ledger (string, required): Ledger name without branch suffix (e.g., “mydb”)
  • source (string, required): Source branch to merge from (e.g., “feature-x”)
  • target (string, optional): Target branch to merge into (defaults to source’s parent branch)
  • strategy (string, optional): Conflict resolution strategy for non-fast-forward merges. Defaults to take-both. Options: take-both, abort, take-source, take-branch

Conflict strategies:

  • take-both: Keep source flakes as-is, so both source and target values can coexist
  • abort: Fail if conflicts are detected; no merge commit is created
  • take-source: Source wins: keep source flakes and retract target’s conflicting values
  • take-branch: Target wins: drop source flakes for conflicting keys

skip is a rebase-only strategy and is not supported for non-fast-forward merges.
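
The strategy behavior above can be sketched as a resolution function over one overlapping (s, p, g) key. This is illustrative only, operating on bare value sets rather than full flakes:

```python
def resolve_conflict(strategy, source_values, target_values):
    """Resolve one overlapping (s, p, g) key between source and target.

    Returns the set of values that survive on the target branch after
    the merge commit. A sketch of the strategy table above, not the
    engine's implementation.
    """
    if strategy == "take-both":
        return source_values | target_values
    if strategy == "take-source":
        return source_values   # target's conflicting values retracted
    if strategy == "take-branch":
        return target_values   # source flakes dropped for this key
    if strategy == "abort":
        raise RuntimeError("merge aborted: conflicting key")
    raise ValueError(f"unknown strategy: {strategy}")

src, tgt = {"active"}, {"archived"}
print(sorted(resolve_conflict("take-both", src, tgt)))    # ['active', 'archived']
print(sorted(resolve_conflict("take-source", src, tgt)))  # ['active']
```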

Response body (200 OK):

{
  "ledger_id": "mydb:dev",
  "target": "dev",
  "source": "feature-x",
  "fast_forward": false,
  "new_head_t": 8,
  "commits_copied": 3,
  "conflict_count": 1,
  "strategy": "take-both"
}
  • ledger_id (string): Full ledger:branch identifier of the target
  • target (string): Target branch name
  • source (string): Source branch name
  • fast_forward (bool): Whether this merge advanced the target directly to the source HEAD
  • new_head_t (number): New commit HEAD transaction time of the target
  • commits_copied (number): Number of commit blobs copied to the target namespace
  • conflict_count (number): Number of overlapping (s, p, g) keys detected during a non-fast-forward merge
  • strategy (string): Conflict strategy used for a non-fast-forward merge. Omitted for fast-forward merges

Status codes:

  • 200 OK - Merge completed successfully
  • 400 Bad Request - Source has no branch point (e.g., main), self-merge, unknown strategy, or unsupported merge strategy
  • 404 Not Found - Ledger or branch does not exist
  • 409 Conflict - Merge aborted due to conflicts when using the abort strategy, or the target HEAD changed during commit publishing
  • 500 Internal Server Error - Server error

Examples:

# Merge feature-x into its parent (inferred from branch point)
curl -X POST http://localhost:8090/v1/fluree/merge \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb", "source": "feature-x"}'

# Merge dev into main (explicit target)
curl -X POST http://localhost:8090/v1/fluree/merge \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb", "source": "dev", "target": "main"}'

# Non-fast-forward merge with source-winning conflict resolution
curl -X POST http://localhost:8090/v1/fluree/merge \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb", "source": "dev", "target": "main", "strategy": "take-source"}'

GET /merge-preview/

Read-only preview of merging a source branch into a target branch. Returns the rich diff — ahead/behind commit summaries, conflict keys, and fast-forward eligibility — without mutating any nameservice or content store state.

Bearer token required when data_auth.mode = required; reads are gated on bearer.can_read(ledger).

URL:

GET /merge-preview/{ledger-name}?source={source}&target={target}&max_commits={n}&max_conflict_keys={n}&include_conflicts={bool}&include_conflict_details={bool}&strategy={strategy}

Path / Query Parameters:

  • ledger (path, string, required): Ledger name (e.g., “mydb”)
  • source (string, required): Source branch to merge from (e.g., “feature-x”)
  • target (string, optional): Target branch (defaults to the source’s parent branch)
  • max_commits (number, optional): Cap on per-side commit summaries returned (default 500). Server clamps to a hard maximum of 5,000 — values above are silently lowered. Bounds response size, not divergence-walk cost (the unbounded count is still computed).
  • max_conflict_keys (number, optional): Cap on conflict keys returned (default 200). Server clamps to a hard maximum of 5,000. Bounds response size, not the conflict-delta walks.
  • include_conflicts (bool, optional): When false, skips the conflict computation (default true). Use this to make the preview cheap on diverged branches.
  • include_conflict_details (bool, optional): When true, includes source/target flake values for the returned conflict keys. Defaults to false. Details are computed after max_conflict_keys is applied.
  • strategy (string, optional): Strategy used to annotate conflict details. Defaults to take-both. Options: take-both, abort, take-source, take-branch.

Response body (200 OK):

{
  "source": "feature-x",
  "target": "main",
  "ancestor": { "commit_id": "bafy...", "t": 5 },
  "ahead": {
    "count": 3,
    "commits": [
      { "t": 8, "commit_id": "bafy...", "time": "2026-04-25T12:00:00Z",
        "asserts": 2, "retracts": 0, "flake_count": 2, "message": null }
    ],
    "truncated": false
  },
  "behind": { "count": 1, "commits": [...], "truncated": false },
  "fast_forward": false,
  "mergeable": true,
  "conflicts": {
    "count": 1,
    "keys": [{ "s": [100, "alice"], "p": [100, "status"], "g": null }],
    "truncated": false,
    "strategy": "take-source",
    "details": [
      {
        "key": { "s": [100, "alice"], "p": [100, "status"], "g": null },
        "source_values": [["ex:alice", "ex:status", "active", "xsd:string", true]],
        "target_values": [["ex:alice", "ex:status", "archived", "xsd:string", true]],
        "resolution": {
          "source_action": "kept",
          "target_action": "retracted",
          "outcome": "source-wins"
        }
      }
    ]
  }
}
  • source (string): Source branch name
  • target (string): Target branch name (resolved from default when not supplied)
  • ancestor (object | null): Common ancestor {commit_id, t}. null when both heads are absent
  • ahead (object): Commits on source not on target (count, commits, truncated)
  • behind (object): Commits on target not on source
  • fast_forward (bool): True when target HEAD == ancestor (or both heads absent)
  • mergeable (bool): False only when the selected preview strategy would abort, e.g. strategy=abort with conflicts. This is a strategy/conflict signal, not full transaction validation. mergeable=true does not guarantee a subsequent POST /merge will succeed; it only reflects the conflict/strategy interaction at preview time.
  • conflicts (object): Overlapping (s, p, g) keys touched on both sides since the ancestor. Empty when fast_forward or include_conflicts=false

Per-commit summaries (ahead.commits[] / behind.commits[]) are newest-first and include assert/retract counts plus an optional message extracted from txn_meta when an f:message string entry is present.

When include_conflict_details=true, conflicts.details[] contains one entry for each returned conflict key. source_values and target_values are the current asserted values for that key at each branch HEAD, using the same resolved flake tuple format as /show: [subject, predicate, object, datatype, operation], with an optional metadata object as the 6th tuple item. The resolution object is an annotation only; preview does not apply the strategy or mutate state.

Status codes:

  • 200 OK — Preview computed successfully
  • 400 Bad Request — Source has no branch point (e.g., main), source == target, unknown strategy, unsupported preview strategy, include_conflict_details=true with include_conflicts=false, or strategy=abort with include_conflicts=false
  • 401 Unauthorized — Bearer token required
  • 404 Not Found — Ledger or branch does not exist (or bearer cannot read it)

Examples:

# Default target (source's parent), defaults for caps and conflict computation
curl "http://localhost:8090/v1/fluree/merge-preview/mydb?source=feature-x"

# Counts only — skip the conflict walks for a faster response
curl "http://localhost:8090/v1/fluree/merge-preview/mydb?source=dev&target=main&include_conflicts=false"

# Cap commit lists at 50 per side
curl "http://localhost:8090/v1/fluree/merge-preview/mydb?source=dev&max_commits=50"

# Include value details and labels for a source-winning merge
curl "http://localhost:8090/v1/fluree/merge-preview/mydb?source=dev&target=main&include_conflict_details=true&strategy=take-source"
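
A typical preview-then-merge workflow gates the POST /merge call on the preview response. A hedged Python sketch: the field names match the response shape above, but the conflict tolerance threshold is an application choice, not part of the API.

```python
def should_merge(preview: dict, max_conflicts: int = 0) -> bool:
    """Decide whether to proceed with POST /merge from a /merge-preview
    response. Fast-forward merges always proceed; otherwise require
    mergeable=true and no more conflicts than the caller tolerates.
    Remember that mergeable=true is only a strategy/conflict signal,
    not a guarantee the merge will succeed."""
    if preview["fast_forward"]:
        return True
    if not preview["mergeable"]:
        return False
    return preview["conflicts"]["count"] <= max_conflicts

preview = {"fast_forward": False, "mergeable": True, "conflicts": {"count": 1}}
print(should_merge(preview))                   # False: one conflict, zero tolerated
print(should_merge(preview, max_conflicts=1))  # True
```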

GET /info/

Get ledger metadata. Used by the CLI for info, push, pull, and clone.

URL:

GET /info/{ledger-id}

Path Parameters:

  • ledger-id: Ledger ID (e.g., “mydb” or “mydb:main”)

Response (non-proxy mode):

Returns comprehensive ledger metadata including namespace codes, property stats, and class counts. Always includes:

{
  "ledger_id": "mydb:main",
  "t": 42,
  "commitId": "bafybeig...headCommitCid",
  "indexId": "bafybeig...indexRootCid",
  "namespaces": { ... },
  "properties": { ... },
  "classes": [ ... ]
}

Response (proxy storage mode):

Returns simplified nameservice-only metadata:

{
  "ledger_id": "mydb:main",
  "t": 42,
  "commit_head_id": "bafybeig...commitCid",
  "index_head_id": "bafybeig...indexCid"
}

  • ledger_id (string, required): Canonical ledger ID
  • t (integer, required): Current transaction time. Used by push/pull for head comparison.
  • commitId (string, optional): Head commit CID (non-proxy mode)
  • commit_head_id (string, optional): Head commit CID (proxy mode)

Important: The t field is required by the CLI for push/pull/clone operations. See CLI-Server API Contract for details.

Optional query parameters:

  • realtime_property_details (boolean, default true): When false, use the lighter, novelty-aware fast stats path instead of the default full lookup-backed path
  • include_property_datatypes (boolean, default true): Include datatype info for properties
  • include_property_estimates (boolean, default false): Include index-derived NDV/selectivity estimates for properties

Status Codes:

  • 200 OK - Ledger found
  • 401 Unauthorized - Authentication required
  • 404 Not Found - Ledger not found

Examples:

# Get ledger info
curl "http://localhost:8090/v1/fluree/info/mydb:main"

# With auth token
curl "http://localhost:8090/v1/fluree/info/mydb:main" \
  -H "Authorization: Bearer eyJ..."

GET /exists/

Check if a ledger exists in the nameservice.

URL:

GET /exists/{ledger-id}

Path Parameters:

  • ledger-id: Ledger ID (e.g., “mydb” or “mydb:main”)

Response:

{
  "ledger": "mydb:main",
  "exists": true
}

  • ledger (string): Ledger ID (echoed back)
  • exists (boolean): Whether the ledger is registered in the nameservice

Status Codes:

  • 200 OK - Check completed successfully (regardless of whether the ledger exists)
  • 500 Internal Server Error - Server error

Usage Notes:

This is a lightweight check that only queries the nameservice without loading the ledger data. Use this to:

  • Check if a ledger exists before attempting to load it
  • Implement conditional create-or-load logic
  • Validate ledger IDs in application code

Examples:

# Check a ledger ID
curl "http://localhost:8090/v1/fluree/exists/mydb:main"

# Conditional create-or-load in shell
if curl -s "http://localhost:8090/v1/fluree/exists/mydb" | jq -e '.exists == false' > /dev/null; then
  curl -X POST http://localhost:8090/v1/fluree/create \
    -H "Content-Type: application/json" \
    -d '{"ledger": "mydb"}'
fi

System Endpoints

GET /health

Health check endpoint for monitoring.

URL:

GET /health

Response:

{
  "status": "healthy",
  "version": "0.1.0",
  "storage": "memory",
  "uptime_ms": 123456
}

Status Codes:

  • 200 OK - System healthy
  • 503 Service Unavailable - System unhealthy

Example:

curl http://localhost:8090/health

GET /stats

Detailed server statistics.

URL:

GET /stats

Response:

{
  "version": "0.1.0",
  "uptime_ms": 123456789,
  "storage": {
    "mode": "memory",
    "total_bytes": 12345678,
    "ledgers": 5
  },
  "queries": {
    "total": 1234,
    "active": 3,
    "average_duration_ms": 45
  },
  "transactions": {
    "total": 567,
    "average_duration_ms": 89
  },
  "indexing": {
    "active": true,
    "pending_ledgers": 2
  }
}

Example:

curl http://localhost:8090/v1/fluree/stats

Events Endpoint

GET /events

Server-Sent Events (SSE) stream of nameservice changes for ledgers and graph sources. Available on transaction servers only (not peers).

Query parameters:

  • all=true: Subscribe to all ledgers and graph sources
  • ledger=<id>: Subscribe to a specific ledger (repeatable)
  • graph-source=<id>: Subscribe to a specific graph source (repeatable)

Event types:

  • ns-record: A ledger or graph source was published/updated
  • ns-retracted: A ledger or graph source was deleted
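
A minimal consumer of this stream only needs to parse standard SSE frames. The sketch below shows that parsing in Python; the data payload shown is illustrative only (the actual event body is documented under Query peers and replication):

```python
import json

def parse_sse(stream_lines):
    """Yield (event, data) pairs from an iterable of SSE text lines.

    A blank line terminates each frame; data: lines within one frame
    are joined with newlines per the SSE spec."""
    event, data = None, []
    for line in stream_lines:
        line = line.rstrip("\n")
        if line == "":  # frame boundary
            if data:
                yield (event or "message", "\n".join(data))
            event, data = None, []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())

# One frame as it might arrive from GET /events?ledger=mydb
# (payload shape is illustrative, not the server's exact format)
frames = [
    "event: ns-record",
    'data: {"ledger": "mydb:main", "t": 43}',
    "",
]
for name, payload in parse_sse(frames):
    record = json.loads(payload)
    print(name, record["t"])  # prints: ns-record 43
```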

Authentication: Configurable via --events-auth-mode none|optional|required. See Query peers and replication for full details including auth configuration, event payloads, and peer subscription setup.

Graph Source Endpoints

Note: HTTP endpoints for BM25 and vector index lifecycle management (create, sync, drop) are not yet implemented in the server. BM25 and vector indexes are currently managed via the Rust API (Bm25CreateConfig, create_full_text_index, sync_bm25_index, drop_full_text_index). See BM25 Full-Text Search and Vector Search for API usage.

BM25 search is available in queries via the f:graphSource / f:searchText pattern in where clauses — see the query documentation for details.

Graph source metadata can be discovered via GET /ledgers or GET /info/{graph-source-id}.

POST {api_base_url}/iceberg/map

Map an Iceberg table (or R2RML-mapped relational source backed by Iceberg) as a graph source. Admin-protected — requires the admin Bearer token when an admin token is configured. Available only when the server is built with the iceberg feature.

URL:

POST {api_base_url}/iceberg/map

For the standalone server and Docker image defaults, this is:

POST http://localhost:8090/v1/fluree/iceberg/map

Request Body:

{
  "name": "warehouse-orders",
  "mode": "rest",
  "catalog_uri": "https://polaris.example.com/api/catalog",
  "table": "sales.orders",
  "branch": "main",
  "r2rml": "@prefix rr: <http://www.w3.org/ns/r2rml#> . ...",
  "r2rml_type": "text/turtle",
  "warehouse": "prod",
  "auth_bearer": "…",
  "oauth2_token_url": "https://idp.example.com/token",
  "oauth2_client_id": "…",
  "oauth2_client_secret": "…",
  "no_vended_credentials": false,
  "s3_region": "us-east-1",
  "s3_endpoint": "https://s3.example.com",
  "s3_path_style": false,
  "table_location": "s3://bucket/warehouse/sales/orders"
}

  • name (string): Graph source name (required)
  • mode (string): rest (default) or direct
  • catalog_uri (string): REST catalog URI (required in rest mode)
  • table (string): Table identifier namespace.table (required in rest mode)
  • table_location (string): S3 table location (required in direct mode)
  • r2rml (string): Inline R2RML mapping (Turtle/JSON-LD). Omit to auto-generate a direct mapping.
  • r2rml_type (string): Media type of r2rml (text/turtle, application/ld+json)
  • branch (string): Branch name (default: main)
  • auth_bearer (string): Bearer token for catalog auth
  • oauth2_* (string): OAuth2 client-credentials flow for the catalog
  • warehouse (string): Warehouse identifier
  • no_vended_credentials (bool): Disable vended credentials
  • s3_region, s3_endpoint, s3_path_style: S3 overrides for direct mode

Response:

{
  "graph_source_id": "warehouse-orders:main",
  "table_identifier": "sales.orders",
  "catalog_uri": "https://polaris.example.com/api/catalog",
  "connection_tested": true,
  "mapping_source": "r2rml-inline",
  "triples_map_count": 3,
  "mapping_validated": true
}

Status Codes:

  • 201 Created — graph source created
  • 400 Bad Request — missing required fields or invalid R2RML
  • 401/403 — admin auth required
  • 500 Internal Server Error — catalog connection or mapping failure

See also the CLI wrapper: fluree iceberg map.

Admin Endpoints

POST /reindex

Trigger a full manual reindex for a ledger. Walks the entire commit chain and rebuilds the binary index from scratch using the server’s configured indexer settings. Admin-protected — requires the admin Bearer token when admin auth is enabled.

This endpoint runs the reindex synchronously and returns when the new root is committed. For large ledgers it may run for many minutes; configure your HTTP client timeout accordingly. In peer mode, the request is forwarded to the transaction server.

URL:

POST /reindex

Request Body:

{
  "ledger": "mydb:main"
}

  • ledger (string): Ledger alias (name or name:branch). Required.
  • opts (object): Reserved for future per-request indexer overrides. Currently accepted but ignored.

Example:

curl -X POST http://localhost:8090/v1/fluree/reindex \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <admin-token>' \
  -d '{"ledger": "mydb:main"}'

Response:

{
  "ledger_id": "mydb:main",
  "index_t": 42,
  "root_id": "fluree:cid:bafy…",
  "stats": {
    "flake_count": 184273,
    "leaf_count": 614,
    "branch_count": 23,
    "total_bytes": 47185920
  }
}

  • ledger_id: Ledger alias the reindex was run against
  • index_t: Transaction time the new index was built at (matches the head commit)
  • root_id: ContentId of the newly written index root
  • stats.flake_count: Total flakes in the rebuilt index
  • stats.leaf_count: Number of leaf nodes written
  • stats.branch_count: Number of branch nodes written
  • stats.total_bytes: Bytes written to storage during the reindex

Status Codes:

  • 200 OK — reindex complete
  • 400 Bad Request — missing/invalid ledger
  • 401/403 — admin auth required
  • 404 Not Found — ledger does not exist
  • 500 Internal Server Error — reindex failed

When triggering indexing through the Rust API instead, see Fluree::reindex and ReindexOptions. For background incremental indexing (which runs automatically as commits are made), see Background indexing.

Admin Authentication

Administrative endpoints (/create, /drop, /reindex, branch operations, and Iceberg mapping when enabled) can be protected with Bearer token authentication.

Configuration

Enable admin authentication with CLI flags:

# Production: require trusted tokens
fluree-server \
  --admin-auth-mode=required \
  --admin-auth-trusted-issuer=did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK

# Development: no authentication (default)
fluree-server --admin-auth-mode=none

Environment Variables:

  • FLUREE_ADMIN_AUTH_MODE: none (default) or required
  • FLUREE_ADMIN_AUTH_TRUSTED_ISSUERS: Comma-separated list of trusted did:key identifiers

Token Format

Admin tokens use the same JWS format as other Fluree tokens. Required claims:

{
  "iss": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK",
  "exp": 1705932000,
  "sub": "admin@example.com"
}

  • iss (required): Issuer did:key (must be in trusted issuers list)
  • exp (required): Expiration timestamp (Unix seconds)
  • sub (optional): Subject identifier
  • fluree.identity (optional): Identity for audit logging
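
A client can sanity-check these claims before sending a token. The sketch below mirrors the rules for fail-fast client behavior; the server performs the authoritative signature and issuer verification:

```python
import base64
import json
import time

def decode_claims(jws_compact):
    """Decode the payload of a compact JWS without verifying the signature
    (signature and issuer verification is the server's job)."""
    payload_b64 = jws_compact.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

def check_admin_claims(claims, trusted_issuers, now=None):
    """Mirror the claim rules above: iss must be trusted, exp unexpired."""
    now = time.time() if now is None else now
    if claims.get("iss") not in trusted_issuers:
        return "untrusted issuer"
    if "exp" not in claims or claims["exp"] <= now:
        return "token expired"
    return "ok"

claims = {"iss": "did:key:z6Mkh...", "exp": 1705932000, "sub": "admin@example.com"}
print(check_admin_claims(claims, {"did:key:z6Mkh..."}, now=1705930000))  # ok
```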

Making Authenticated Requests

Include the token in the Authorization header:

curl -X POST http://localhost:8090/v1/fluree/create \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOiJFZERTQSIsImp3ayI6ey..." \
  -d '{"ledger": "mydb:main"}'

Issuer Trust

Tokens must be signed by a trusted issuer. Configure trusted issuers:

# Single issuer
--admin-auth-trusted-issuer=did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK

# Multiple issuers
--admin-auth-trusted-issuer=did:key:z6Mk... \
--admin-auth-trusted-issuer=did:key:z6Mn...

# Fallback to events auth issuers
--events-auth-trusted-issuer=did:key:z6Mk...

If no admin-specific issuers are configured, admin auth falls back to --events-auth-trusted-issuer.

Response Codes

  • 401 Unauthorized: Missing or invalid Bearer token
  • 401 Unauthorized: Token expired
  • 401 Unauthorized: Untrusted issuer

Error Responses

All endpoints may return error responses in this format (and should return Content-Type: application/json):

{
  "error": "Human-readable error message",
  "status": 409,
  "@type": "err:db/Conflict",
  "cause": {
    "error": "Optional nested error detail",
    "status": 409,
    "@type": "err:db/SomeInnerError"
  }
}

See Errors and Status Codes for complete error reference.

CLI Compatibility Requirements

This section summarizes the contract that third-party server implementations (e.g., Solo) must follow to be compatible with the Fluree CLI (fluree-db-cli). The CLI discovers the API base URL via fluree remote add and constructs endpoint URLs as {base_url}/{operation}/{ledger}.

Required endpoints

  • GET /info/{ledger}: info, push, pull, clone
  • GET /show/{ledger}?commit=<ref>: show --remote
  • POST /query/{ledger}: query (JSON-LD and SPARQL)
  • POST /insert/{ledger}: insert
  • POST /upsert/{ledger}: upsert
  • GET /exists/{ledger}: clone (pre-create check)
  • GET /context/{ledger}: context get
  • PUT /context/{ledger}: context set
  • GET /ledgers: list --remote

For sync workflows (clone/push/pull), these additional endpoints are needed:

  • POST /push/{ledger} (push): Required for push
  • GET /commits/{ledger} (clone, pull): Paginated export fallback
  • POST /pack/{ledger} (clone, pull): Preferred bulk transport; the CLI falls back to /commits on 404/405/501
  • GET /storage/ns/{ledger} (clone, pull): Pack preflight (head CID discovery)
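
The /pack-to-/commits fallback reduces to a status-code check; a sketch of the client-side decision:

```python
def choose_transport(pack_status):
    """Pick the bulk-sync transport the way the CLI does: prefer POST /pack,
    falling back to the paginated GET /commits export when the server does
    not implement pack (404 / 405 / 501)."""
    return "commits" if pack_status in (404, 405, 501) else "pack"

print(choose_transport(200))  # pack
print(choose_transport(501))  # commits
```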

Critical response field: t

The GET /info/{ledger} response must include a t field (integer) representing the current transaction time. This field is used by the CLI for:

  • push: Comparing local_t vs remote_t to determine what commits to send and detect divergence
  • pull: Comparing remote_t vs local_t to determine if new commits are available
  • clone: Guarding against cloning empty ledgers (t == 0) and displaying progress

Omitting t from the info response will cause push and pull to fail with "remote ledger-info response missing 't'".
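
The comparisons above can be sketched as follows; this is a simplified view of the decision logic, and the actual CLI derives concrete commit ranges from these values as well:

```python
def sync_action(local_t, remote_t):
    """Decide what the t comparison implies for push/pull."""
    if remote_t is None:
        raise ValueError("remote ledger-info response missing 't'")
    if local_t == remote_t:
        return "up-to-date"
    if local_t > remote_t:
        return "push commits after remote_t"
    return "pull commits after local_t"

print(sync_action(10, 7))  # push commits after remote_t
```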

Transaction response format

The /insert and /upsert endpoints should return a JSON object. The CLI displays the full response as pretty-printed JSON. Common fields include t, tx-id, and commit.hash, but the exact shape is not prescribed — the CLI does not parse individual fields from transaction responses.

Authentication

All endpoints accept Authorization: Bearer <token>. On 401, the CLI attempts a single token refresh (if OIDC is configured) and retries. See Auth contract for the full authentication lifecycle.

Error responses

Error bodies should be JSON with an error or message field. The CLI extracts the first available string from message or error for display. Plain-text error bodies are also accepted.

Headers, Content Types, and Request Sizing

This document covers HTTP headers, content type negotiation, request size limits, and related considerations for the Fluree HTTP API.

Request Headers

Content-Type

Specifies the format of the request body.

Supported Values:

JSON-LD Transactions and Queries:

Content-Type: application/json

Default for JSON-LD transactions and JSON-LD queries.

Content-Type: application/ld+json

Explicit JSON-LD content type.

SPARQL Queries:

Content-Type: application/sparql-query

For SPARQL SELECT, ASK, CONSTRUCT queries.

Content-Type: application/sparql-update

For SPARQL UPDATE operations. See SPARQL Transactions for supported operations.

RDF Formats:

Content-Type: text/turtle

For Turtle RDF format transactions. Supported on /insert (fast direct path) and /upsert.

Content-Type: application/trig

For TriG format transactions with named graphs (GRAPH blocks). Supported only on /upsert; /insert returns a 400 error because named-graph ingestion requires the upsert path.

Content-Type: application/n-triples

For N-Triples format (future support).

Content-Type: application/rdf+xml

For RDF/XML format (future support).

Accept

Specifies the desired response format.

Supported Values:

Accept: application/json

Compact JSON format (default).

Accept: application/ld+json

Full JSON-LD with @context.

Accept: application/sparql-results+json

SPARQL JSON Results format (for SPARQL queries).

Accept: application/sparql-results+xml

SPARQL XML Results format (for SPARQL SELECT/ASK queries).

Accept: text/turtle

Turtle RDF format (for CONSTRUCT queries).

Accept: application/rdf+xml

RDF/XML graph format (for CONSTRUCT/DESCRIBE queries).

Accept: application/vnd.fluree.agent+json

Agent JSON format — optimized for LLM/agent consumption. Returns a self-describing envelope with schema, compact rows, and pagination support. See Output Formats for details.

Use the Fluree-Max-Bytes header to set a byte budget for response truncation:

Fluree-Max-Bytes: 32768

Accept: application/n-triples

N-Triples format (future support).

Multiple Accept Values:

You can specify multiple formats with quality values:

Accept: application/ld+json; q=1.0, application/json; q=0.8

The server will choose the best match based on quality values and support.
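
Quality-value selection can be sketched as below; this is a simplified, client-agnostic version of content negotiation without wildcard handling:

```python
def negotiate(accept_header, supported):
    """Return the best supported media type for an Accept header,
    honoring q-values (simplified: no wildcard handling)."""
    offers = []
    for part in accept_header.split(","):
        params = part.strip().split(";")
        media_type, q = params[0].strip(), 1.0
        for p in params[1:]:
            key, _, value = p.strip().partition("=")
            if key == "q":
                q = float(value)
        offers.append((q, media_type))
    # Try candidates from highest to lowest preference
    for q, media_type in sorted(offers, key=lambda o: -o[0]):
        if media_type in supported:
            return media_type
    return None

print(negotiate("application/ld+json; q=1.0, application/json; q=0.8",
                {"application/json", "application/ld+json"}))
# prints: application/ld+json
```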

Authorization

Authentication credentials. Only required when the server has authentication enabled for the relevant endpoint group (see Configuration).

Bearer Token (Ed25519 JWS or OIDC):

Authorization: Bearer eyJhbGciOiJFZERTQSIsImp3ayI6eyJrdHkiOiJPS1AiLCJjcnYiOiJFZDI1NTE5IiwieCI6Ii4uLiJ9fQ...

The server automatically dispatches to the correct verification path based on the token header:

  • Tokens with an embedded jwk field use the Ed25519 verification path
  • Tokens with a kid field use the OIDC/JWKS verification path (requires oidc feature)

Signed Requests:

For JWS/VC signed request bodies, set Content-Type to application/jose:

Content-Type: application/jose

See Signed Requests for details.

Content-Length

The server requires Content-Length for all POST requests:

Content-Length: 1234

Most HTTP clients set this automatically.

Accept-Encoding

Request compressed responses:

Accept-Encoding: gzip, deflate

The server will compress responses when appropriate, reducing bandwidth usage.

Response Header:

Content-Encoding: gzip

User-Agent

Identify your client application:

User-Agent: MyApp/1.0.0 (https://example.com)

Helpful for server logs and troubleshooting.

X-Request-ID

Client-supplied request ID for tracing:

X-Request-ID: abc-123-def-456

The server will include this in logs and response headers for correlation. When a request queues background indexing work, the same X-Request-ID is propagated to the background indexer's worker logs, so a plain log search can correlate the foreground request with the later indexing activity.

Response Headers

Content-Type

Indicates the format of the response body:

Content-Type: application/json; charset=utf-8

Content-Length

Size of the response body in bytes:

Content-Length: 5678

X-Fluree-T

The transaction time of the data returned (for queries):

X-Fluree-T: 42

Useful for tracking which version of data was queried.

X-Fluree-Commit

The commit ContentId of the data returned:

X-Fluree-Commit: abc123def456789...

ETag

Entity tag for caching:

ETag: "abc123def456"

Can be used with If-None-Match for conditional requests.

Cache-Control

Caching directives:

For current queries:

Cache-Control: no-cache

For historical queries:

Cache-Control: public, max-age=31536000, immutable

Historical queries are immutable and cache indefinitely.

X-RateLimit Headers

Rate limit information (if enabled):

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1642857600

X-Request-ID

Echo of client-supplied request ID or server-generated ID:

X-Request-ID: abc-123-def-456

X-Response-Time

Server processing time in milliseconds:

X-Response-Time: 45

Content Type Details

JSON-LD (application/json, application/ld+json)

Request Example:

{
  "@context": {
    "ex": "http://example.org/ns/",
    "schema": "http://schema.org/"
  },
  "@graph": [
    {
      "@id": "ex:alice",
      "@type": "schema:Person",
      "schema:name": "Alice"
    }
  ]
}

Compact vs Expanded:

application/json returns compact JSON:

[
  { "name": "Alice" }
]

application/ld+json returns with full context:

{
  "@context": {
    "name": "http://schema.org/name"
  },
  "@graph": [
    { "name": "Alice" }
  ]
}

SPARQL Query (application/sparql-query)

Request Example:

PREFIX ex: <http://example.org/ns/>
PREFIX schema: <http://schema.org/>

SELECT ?name
FROM <mydb:main>
WHERE {
  ?person a schema:Person .
  ?person schema:name ?name .
}

Plain text SPARQL query in the request body.

SPARQL Results JSON (application/sparql-results+json)

Response Example:

{
  "head": {
    "vars": ["name"]
  },
  "results": {
    "bindings": [
      {
        "name": {
          "type": "literal",
          "value": "Alice",
          "datatype": "http://www.w3.org/2001/XMLSchema#string"
        }
      }
    ]
  }
}

Follows W3C SPARQL 1.1 Query Results JSON Format specification.

Turtle (text/turtle)

Transaction Request:

@prefix ex: <http://example.org/ns/> .
@prefix schema: <http://schema.org/> .

ex:alice a schema:Person ;
  schema:name "Alice" ;
  schema:age 30 .

CONSTRUCT Response:

@prefix ex: <http://example.org/ns/> .
@prefix schema: <http://schema.org/> .

ex:alice a schema:Person .
ex:alice schema:name "Alice" .

Request Size Limits

Default Limits

The server enforces size limits to prevent resource exhaustion:

Transaction Requests:

  • Default limit: 10 MB
  • Configurable: --max-transaction-size

Query Requests:

  • Default limit: 1 MB
  • Configurable: --max-query-size

History Requests:

  • Default limit: 1 MB
  • Configurable: --max-history-size

Exceeding Limits

If a request exceeds size limits:

Status Code: 413 Payload Too Large

Response:

{
  "error": "Request body exceeds maximum size of 10485760 bytes",
  "status": 413,
  "@type": "err:http/PayloadTooLarge"
}

Configuration

Set custom limits when starting the server:

# 20 MB transactions, 2 MB queries, 100 MB responses
./fluree-db-server \
  --max-transaction-size 20971520 \
  --max-query-size 2097152 \
  --max-response-size 104857600

Response Size Limits

The server also limits response sizes:

Default limit: 100 MB

If a query result exceeds the limit:

Status Code: 413 Payload Too Large

Response:

{
  "error": "Query result exceeds maximum response size",
  "status": 413,
  "@type": "err:http/ResponseTooLarge"
}

Solution: Use LIMIT and pagination:

{
  "select": ["?name"],
  "where": [...],
  "limit": 1000,
  "offset": 0
}
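
A pagination loop then walks the result set page by page. The sketch below only constructs the successive query bodies; in practice you would POST each one and stop when a page returns fewer than limit rows:

```python
def paged_queries(base_query, page_size=1000, max_pages=3):
    """Yield successive query bodies that walk a large result set with
    limit/offset instead of risking one oversized response."""
    for page in range(max_pages):
        query = dict(base_query)
        query["limit"] = page_size
        query["offset"] = page * page_size
        yield query

base = {"select": ["?name"], "where": [{"@id": "?s", "ex:name": "?name"}]}
for query in paged_queries(base, page_size=1000, max_pages=2):
    print(query["offset"], query["limit"])
```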

Compression

Request Compression

Send compressed requests (for large transactions):

Content-Encoding: gzip
Content-Type: application/json

The request body should be gzip-compressed JSON.
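
Preparing such a request with Python's standard library:

```python
import gzip
import json

def gzip_json_body(doc):
    """Gzip-compress a JSON document and build the matching request headers."""
    body = gzip.compress(json.dumps(doc).encode("utf-8"))
    headers = {
        "Content-Encoding": "gzip",
        "Content-Type": "application/json",
    }
    return body, headers

body, headers = gzip_json_body({"@graph": [{"@id": "ex:alice"}]})
# `body` is now ready to send as the POST body along with `headers`.
```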

Response Compression

Request compressed responses:

Accept-Encoding: gzip, deflate

The server will compress responses when:

  • Client accepts compression
  • Response is larger than threshold (typically 1 KB)
  • Content-Type is compressible

Response Headers:

Content-Encoding: gzip
Vary: Accept-Encoding

Compression Benefits:

  • Reduced bandwidth usage (typically 70-90% for JSON)
  • Faster response times on slower connections
  • Lower costs for cloud deployments

Character Encoding

All text content uses UTF-8 encoding.

Request:

Content-Type: application/json; charset=utf-8

Response:

Content-Type: application/json; charset=utf-8

Unicode characters are supported in:

  • IRIs
  • Literal values
  • Property names
  • Comments

CORS Headers

For web browser access, the server supports Cross-Origin Resource Sharing (CORS).

CORS Request Headers

Preflight Request:

OPTIONS /query HTTP/1.1
Origin: https://example.com
Access-Control-Request-Method: POST
Access-Control-Request-Headers: Content-Type

CORS Response Headers

Preflight Response:

Access-Control-Allow-Origin: https://example.com
Access-Control-Allow-Methods: GET, POST, OPTIONS
Access-Control-Allow-Headers: Content-Type, Authorization
Access-Control-Max-Age: 86400

Actual Response:

Access-Control-Allow-Origin: https://example.com
Access-Control-Allow-Credentials: true

CORS Configuration

Configure CORS when starting the server:

./fluree-db-server \
  --cors-origin "https://example.com" \
  --cors-methods "GET,POST,OPTIONS" \
  --cors-headers "Content-Type,Authorization"

Allow all origins (development only):

./fluree-db-server --cors-origin "*"

Never use --cors-origin "*" in production with credentials.

Caching Headers

ETag and Conditional Requests

The server supports ETags for efficient caching.

Initial Request:

GET /ledgers/mydb:main HTTP/1.1

Response:

HTTP/1.1 200 OK
ETag: "abc123def456"
Cache-Control: no-cache

Conditional Request:

GET /ledgers/mydb:main HTTP/1.1
If-None-Match: "abc123def456"

Not Modified Response:

HTTP/1.1 304 Not Modified
ETag: "abc123def456"

Immutable Historical Data

Historical queries with time specifiers are immutable:

Query:

POST /query HTTP/1.1
{"from": "mydb:main@t:100", ...}

Response:

HTTP/1.1 200 OK
Cache-Control: public, max-age=31536000, immutable
ETag: "mydb:main@t:100:query-hash"

Clients can cache these responses indefinitely.
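
Client-side handling of this ETag flow can be sketched as a small cache that replays If-None-Match and reuses the stored body on 304:

```python
class ETagCache:
    """Client-side cache keyed by URL: replay If-None-Match on each
    request and keep the stored body when the server answers 304."""

    def __init__(self):
        self.store = {}  # url -> (etag, body)

    def request_headers(self, url):
        cached = self.store.get(url)
        return {"If-None-Match": cached[0]} if cached else {}

    def handle_response(self, url, status, etag, body):
        if status == 304:           # Not Modified: reuse cached body
            return self.store[url][1]
        if status == 200 and etag:  # fresh body: remember it
            self.store[url] = (etag, body)
        return body

cache = ETagCache()
cache.handle_response("/ledgers/mydb:main", 200, '"abc123def456"', '{"t": 42}')
print(cache.request_headers("/ledgers/mydb:main"))
# prints: {'If-None-Match': '"abc123def456"'}
```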

Custom Headers

X-Fluree-Fuel-Limit

Set query fuel limit to prevent runaway queries:

X-Fluree-Fuel-Limit: 1000000

See Tracking and Fuel Limits for details.

X-Fluree-Timeout

Set query timeout in milliseconds:

X-Fluree-Timeout: 30000

X-Fluree-Policy

Specify a policy to apply (if authorized):

X-Fluree-Policy: ex:restrictive-policy

Best Practices

1. Always Set Content-Type

Explicitly set Content-Type for all requests:

Content-Type: application/json

2. Accept Compression

Always request compression for better performance:

Accept-Encoding: gzip, deflate

3. Use Appropriate Accept Headers

Request the format you need:

Accept: application/json

4. Include User-Agent

Identify your application:

User-Agent: MyApp/1.0.0

5. Handle ETags

Implement ETag caching for frequently accessed resources:

const etag = localStorage.getItem('ledger-etag');
if (etag) {
  headers['If-None-Match'] = etag;
}

6. Monitor Rate Limits

Check rate limit headers and back off when needed:

const remaining = response.headers.get('X-RateLimit-Remaining');
if (remaining < 10) {
  // Slow down requests
}

7. Use Request IDs

Include request IDs for tracing:

X-Request-ID: uuid-v4-here

Signed Requests (JWS/VC)

Fluree supports cryptographically signed requests using JSON Web Signatures (JWS) and Verifiable Credentials (VC). This provides tamper-proof authentication and enables trustless data exchange.

Note: Requires the credential feature flag. See Compatibility and Feature Flags.

Why Sign Requests?

Signed requests provide:

  • Authentication: Prove the identity of the request sender
  • Integrity: Ensure the request hasn’t been tampered with
  • Non-repudiation: Sender cannot deny sending the request
  • Authorization: Cryptographically link requests to specific identities
  • Auditability: Complete audit trail of who did what

JSON Web Signatures (JWS)

JWS is an IETF standard (RFC 7515) for representing digitally signed content as JSON.

JWS Structure

A JWS consists of three parts:

  1. Protected Header: Metadata about the signature (base64url-encoded)
  2. Payload: The actual content being signed (base64url-encoded)
  3. Signature: Cryptographic signature (base64url-encoded)

Compact Serialization:

eyJhbGciOiJFZERTQSJ9.eyJmcm9tIjoibXlkYjptYWluIn0.c2lnbmF0dXJl
|______header______|.|_________payload_________|.|signature_|

JSON Serialization:

{
  "payload": "eyJmcm9tIjoibXlkYjptYWluIn0",
  "signatures": [
    {
      "protected": "eyJhbGciOiJFZDI1NTE5In0",
      "signature": "c2lnbmF0dXJl"
    }
  ]
}
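
The three-part structure can be taken apart by reversing the base64url encoding. This is inspection only; it does not verify anything:

```python
import base64
import json

def b64url_decode(s):
    """base64url-decode with the stripped padding restored."""
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def split_compact_jws(jws):
    """Return (header, payload, raw signature) from a compact JWS."""
    header_b64, payload_b64, sig_b64 = jws.split(".")
    return (json.loads(b64url_decode(header_b64)),
            json.loads(b64url_decode(payload_b64)),
            b64url_decode(sig_b64))

token = "eyJhbGciOiJFZERTQSJ9.eyJmcm9tIjoibXlkYjptYWluIn0.c2lnbmF0dXJl"
header, payload, sig = split_compact_jws(token)
print(header, payload)  # {'alg': 'EdDSA'} {'from': 'mydb:main'}
```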

Supported Algorithm

Fluree uses EdDSA (Ed25519) for JWS verification. All signed requests must use "alg": "EdDSA" in the protected header.

Creating Signed Requests

Step 1: Prepare the Payload

Create your query or transaction as usual:

{
  "@context": {
    "ex": "http://example.org/ns/"
  },
  "from": "mydb:main",
  "select": ["?name"],
  "where": [
    { "@id": "?person", "ex:name": "?name" }
  ]
}

Step 2: Encode the Payload

Base64url-encode the JSON payload:

const payload = JSON.stringify(query);
const encodedPayload = base64url.encode(payload);

Step 3: Create the Protected Header

Create a header specifying the algorithm and key ID:

{
  "alg": "EdDSA",
  "kid": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK"
}

Base64url-encode the header:

const header = JSON.stringify({ alg: "EdDSA", kid: keyId });
const encodedHeader = base64url.encode(header);

Step 4: Sign

Create the signing input and sign it:

const signingInput = encodedHeader + "." + encodedPayload;
const signature = sign(signingInput, privateKey);
const encodedSignature = base64url.encode(signature);

Step 5: Construct the JWS

Create the complete JWS:

Compact Format:

const jws = encodedHeader + "." + encodedPayload + "." + encodedSignature;

JSON Format:

{
  "payload": "<encodedPayload>",
  "signatures": [
    {
      "protected": "<encodedHeader>",
      "signature": "<encodedSignature>"
    }
  ]
}

Step 6: Send the Request

Send the JWS to Fluree:

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "Content-Type: application/jose" \
  -d '{
    "payload": "eyJmcm9tIjoibXlkYjptYWluIn0...",
    "signatures": [{
      "protected": "eyJhbGciOiJFZDI1NTE5In0...",
      "signature": "c2lnbmF0dXJl..."
    }]
  }'

Verifiable Credentials (VC)

Verifiable Credentials are a W3C standard for cryptographically verifiable digital credentials.

VC Structure

A Verifiable Credential includes:

{
  "@context": [
    "https://www.w3.org/2018/credentials/v1"
  ],
  "type": ["VerifiableCredential"],
  "issuer": "did:key:z6Mkh...",
  "issuanceDate": "2024-01-22T10:00:00Z",
  "credentialSubject": {
    "id": "did:key:z6Mkh...",
    "flureeAction": {
      "query": {
        "from": "mydb:main",
        "select": ["?name"],
        "where": [...]
      }
    }
  },
  "proof": {
    "type": "Ed25519Signature2020",
    "created": "2024-01-22T10:00:00Z",
    "verificationMethod": "did:key:z6Mkh...#z6Mkh...",
    "proofPurpose": "authentication",
    "proofValue": "z58DAdFfa9SkqZMVP..."
  }
}

Creating a Verifiable Credential

Use a VC library to create signed credentials:

import { issue } from '@digitalbazaar/vc';

const credential = {
  '@context': ['https://www.w3.org/2018/credentials/v1'],
  type: ['VerifiableCredential'],
  issuer: didKey,
  issuanceDate: new Date().toISOString(),
  credentialSubject: {
    id: didKey,
    flureeAction: {
      query: queryObject
    }
  }
};

const verifiableCredential = await issue({
  credential,
  suite: new Ed25519Signature2020({ key: keyPair }),
  documentLoader
});

Sending a VC

Send the VC to Fluree:

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "Content-Type: application/vc+ld+json" \
  -d '{
    "@context": ["https://www.w3.org/2018/credentials/v1"],
    "type": ["VerifiableCredential"],
    ...
  }'

Decentralized Identifiers (DIDs)

Fluree uses DIDs to identify public keys.

Supported DID Methods

did:key - Public key embedded in the DID (recommended):

did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK

did:web - Web-based DID resolution:

did:web:example.com:users:alice

did:ion - ION network DIDs (future support):

did:ion:EiClkZMDxPKqC9c-umQfTkR8vvZ9JPhl_xLDI9Nfk38w5w

DID Resolution

Fluree resolves DIDs to public keys:

  1. did:key: Public key extracted directly from DID
  2. did:web: Fetched from https://example.com/.well-known/did.json
  3. did:ion: Resolved via ION network
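
For did:web, the document URL can be derived mechanically from the identifier. A sketch following the did:web method specification (bare domains resolve to /.well-known/did.json, identifiers with path segments to /<path>/did.json):

```python
from urllib.parse import unquote

def did_web_url(did):
    """Derive the DID document URL for a did:web identifier: a bare domain
    maps to /.well-known/did.json, path segments to /<path>/did.json."""
    assert did.startswith("did:web:")
    parts = [unquote(p) for p in did[len("did:web:"):].split(":")]
    host, path = parts[0], parts[1:]
    if path:
        return f"https://{host}/{'/'.join(path)}/did.json"
    return f"https://{host}/.well-known/did.json"

print(did_web_url("did:web:example.com"))
# prints: https://example.com/.well-known/did.json
print(did_web_url("did:web:example.com:users:alice"))
# prints: https://example.com/users/alice/did.json
```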

Public Key Resolution

For the standalone server, signed requests are verified using Ed25519 JWS material carried in the request itself (for example an embedded JWK or did:key) or against configured OIDC/JWKS issuers. There is no /admin/keys registration endpoint.

Request Verification

Verification Process

When Fluree receives a signed request:

  1. Extract the signature and header
  2. Resolve the key ID (kid) to a public key
  3. Verify the signature using the public key
  4. Check expiration (if exp claim present)
  5. Validate issuer (if required)
  6. Apply authorization policies based on DID

Verification Failure

If verification fails:

Status Code: 401 Unauthorized

Response:

{
  "error": "Invalid signature",
  "status": 401,
  "@type": "err:auth/InvalidSignature"
}

Key Management

Generating Keys

Ed25519 (EdDSA):

import { generateKeyPair } from '@stablelib/ed25519';

const keyPair = generateKeyPair();
// keyPair.publicKey - 32 bytes
// keyPair.secretKey - 64 bytes

Storing Keys

Secure Storage:

  • Hardware Security Modules (HSM)
  • Key Management Services (AWS KMS, Azure Key Vault)
  • Encrypted files with strong passphrases
  • Hardware wallets for blockchain-based DIDs

Never:

  • Store private keys in code
  • Commit keys to version control
  • Send keys over insecure channels
  • Share keys between applications

Key Rotation

Rotate keys regularly:

  1. Generate a new key pair
  2. Add the new key's DID to the trusted issuer configuration
  3. Update clients to use the new key
  4. Revoke the old key after a transition period
  5. Remove the old key's DID from the configuration

Authorization with Signed Requests

Identity-Based Policies

Fluree policies can use the signer’s DID for authorization:

{
  "@context": {
    "ex": "http://example.org/ns/",
    "f": "https://ns.flur.ee/db#"
  },
  "@id": "ex:admin-policy",
  "f:policy": [
    {
      "f:subject": "did:key:z6Mkh...",
      "f:action": ["query", "transact"],
      "f:allow": true
    }
  ]
}

Role-Based Access

Link DIDs to roles:

{
  "@id": "did:key:z6Mkh...",
  "@type": "ex:User",
  "ex:role": "ex:Administrator"
}

Policy checks the role:

{
  "f:policy": [
    {
      "f:subject": { "ex:role": "ex:Administrator" },
      "f:action": "*",
      "f:allow": true
    }
  ]
}

Code Examples

JavaScript/TypeScript

import { SignJWT } from 'jose';

async function signQuery(query: object, privateKey: CryptoKey) {
  const jws = await new SignJWT(query)
    .setProtectedHeader({ alg: 'EdDSA', kid: 'did:key:z6Mkh...' })
    .setIssuedAt()
    .setExpirationTime('5m')
    .sign(privateKey);
  
  return jws;
}

// Send signed request
const signedQuery = await signQuery(query, privateKey);
const response = await fetch('http://localhost:8090/v1/fluree/query', {
  method: 'POST',
  headers: { 'Content-Type': 'application/jose' },
  body: signedQuery
});

Python

from jwcrypto import jwk, jws
import json
import requests

def sign_query(query, private_key):
    # Create JWK from private key
    key = jwk.JWK.from_json(private_key)
    
    # Create JWS
    payload = json.dumps(query).encode('utf-8')
    jws_token = jws.JWS(payload)
    jws_token.add_signature(key, alg='EdDSA', 
                           protected=json.dumps({"kid": "did:key:z6Mkh..."}))
    
    return jws_token.serialize()

# Send signed request
signed_query = sign_query(query, private_key)
response = requests.post('http://localhost:8090/v1/fluree/query',
                        headers={'Content-Type': 'application/jose'},
                        data=signed_query)

Best Practices

1. Use EdDSA (Ed25519)

EdDSA provides:

  • Excellent security (128-bit security level)
  • Fast signing and verification
  • Small signatures (64 bytes)
  • Deterministic (no random number generation needed)

2. Include Expiration

Always set an expiration time:

{
  "alg": "EdDSA",
  "exp": 1642857600
}

3. Use Short Expiration Times

  • Interactive requests: 5-15 minutes
  • Batch processes: 1-24 hours
  • Never: no expiration at all
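The exp claim is expressed in seconds since the Unix epoch, so the recommended windows translate to simple arithmetic:

```javascript
// Computing exp values (seconds since epoch) for the recommended windows.
const now = Math.floor(Date.now() / 1000);

const interactiveExp = now + 5 * 60;   // 5 minutes for interactive requests
const batchExp = now + 60 * 60;        // 1 hour for a batch process

console.log(interactiveExp - now); // 300
console.log(batchExp - now);       // 3600
```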

4. Rotate Keys Regularly

Rotate signing keys every 90-180 days.

5. Secure Key Storage

Use proper key management:

  • Development: Encrypted local storage
  • Production: HSM or KMS

6. Validate on Server

Never trust client-side validation alone. Fluree always validates signatures server-side.

7. Use HTTPS

Always use HTTPS with signed requests to prevent interception and tampering in transit. (Replay protection additionally requires a nonce or jti; see below.)

8. Implement Nonce/JTI

Include a unique identifier to prevent replay:

{
  "alg": "EdDSA",
  "jti": "unique-request-id-12345"
}
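For jti to prevent replay, the verifier must remember identifiers it has already accepted. A minimal sketch of such a replay guard (illustrative only; not Fluree's implementation):

```javascript
// Remember accepted jti values until their exp passes; reject any repeat.
const seen = new Map(); // jti -> exp (unix seconds)

function checkJti(jti, exp) {
  const now = Date.now() / 1000;
  for (const [id, e] of seen) {
    if (e < now) seen.delete(id); // evict expired entries
  }
  if (seen.has(jti)) return false; // replayed request
  seen.set(jti, exp);
  return true;
}

const exp = Date.now() / 1000 + 300;
console.log(checkJti('unique-request-id-12345', exp)); // true
console.log(checkJti('unique-request-id-12345', exp)); // false: replayed
```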

Troubleshooting

“Invalid Signature” Error

Causes:

  • Wrong private key used
  • Payload modified after signing
  • Incorrect base64url encoding
  • Algorithm mismatch

Solution: Verify the signing process end-to-end.

“Key Not Found” Error

Causes:

  • DID not registered with Fluree
  • Incorrect key ID (kid) in header
  • DID resolution failed

Solution: Register public key or check DID format.

“Signature Expired” Error

Causes:

  • Request sent after expiration time
  • Clock skew between client and server

Solution: Sync clocks with NTP, or increase the expiration time.

Errors and Status Codes

This document provides a complete reference for HTTP status codes and error responses in the Fluree API.

Error Response Format

fluree-server errors return a consistent JSON structure:

{
  "error": "Human-readable error description",
  "status": 400,
  "@type": "err:db/BadRequest",
  "cause": {
    "error": "Optional nested cause",
    "status": 400,
    "@type": "err:db/JsonParse"
  }
}

Fields:

  • error: Human-readable error message (primary diagnostic text)
  • status: HTTP status code (numeric)
  • @type: Compact error type IRI (stable, machine-readable category)
  • cause: Optional nested cause chain (only present for select errors)

Stability note: clients (including the Fluree CLI) may pattern-match on substrings within the error field for targeted hints, so error messages should be stable across releases.

HTTP Status Codes

Success Codes (2xx)

200 OK

The request succeeded.

Used for:

  • Successful queries
  • Successful transactions
  • Successful GET requests

Example:

{
  "t": 5,
  "timestamp": "2024-01-22T10:30:00.000Z",
  "commit_id": "bafybeig...commitT5"
}

201 Created

A new resource was created.

Used for:

  • Ledger creation
  • Index creation

Example:

{
  "ledger_id": "mydb:main",
  "created": "2024-01-22T10:00:00.000Z"
}

204 No Content

Request succeeded with no response body.

Used for:

  • DELETE operations
  • Administrative commands

Client Error Codes (4xx)

400 Bad Request

The request is malformed or contains invalid data.

Common Causes:

  • Invalid JSON syntax
  • Invalid JSON-LD structure
  • Invalid SPARQL syntax
  • Invalid IRI format
  • Type mismatch

Error typing:

The server includes a compact error type IRI in the @type field. This is the preferred stable, machine-readable category for programmatic handling.

Example:

{
  "error": "Invalid JSON: expected value at line 5, column 12",
  "status": 400,
  "@type": "err:db/JsonParse"
}

How to Fix:

  • Validate JSON syntax
  • Check IRI formats
  • Verify JSON-LD structure
  • Review the error message and optional cause

401 Unauthorized

Authentication is required but not provided or invalid.

Common Causes:

  • Missing authentication credentials
  • Invalid API key
  • Expired JWT token
  • Invalid signature (for signed requests)

Example:

{
  "error": "Bearer token required",
  "status": 401,
  "@type": "err:db/Unauthorized"
}

How to Fix:

  • Provide valid authentication credentials
  • Check API key or token
  • Renew expired tokens
  • Verify signature process for signed requests

403 Forbidden

Authentication succeeded but authorization failed.

Common Causes:

  • Insufficient permissions for operation
  • Policy denies access
  • Ledger access restricted

Example:

{
  "error": "access denied (403)",
  "status": 403,
  "@type": "err:db/Forbidden"
}

How to Fix:

  • Verify user has required permissions
  • Check policy configuration
  • Contact administrator for access

404 Not Found

The requested resource doesn’t exist.

Common Causes:

  • Ledger doesn’t exist
  • Entity not found
  • Endpoint doesn’t exist

Example:

{
  "error": "Ledger not found: mydb:main",
  "status": 404,
  "@type": "err:db/LedgerNotFound"
}

How to Fix:

  • Verify ledger name spelling
  • Check if ledger was created
  • Verify entity IRI

408 Request Timeout

The request took too long to process.

Common Causes:

  • Query timeout exceeded
  • Complex query taking too long
  • Database under heavy load

Example:

{
  "error": "Query execution exceeded timeout",
  "status": 408,
  "@type": "err:db/Timeout"
}

How to Fix:

  • Simplify query
  • Add more specific filters
  • Use LIMIT clause
  • Increase timeout setting
  • Check server load

409 Conflict

The request conflicts with current server state.

Common Causes:

  • Concurrent modification conflict
  • Ledger already exists
  • Resource state conflict

Example:

{
  "error": "Ledger already exists: mydb:main",
  "status": 409,
  "@type": "err:db/LedgerExists"
}

How to Fix:

  • Use different ledger name
  • Handle concurrent modifications with retry logic
  • Check resource state before modifying

413 Payload Too Large

The request or response exceeds size limits.

Common Causes:

  • Transaction too large
  • Query result too large
  • Request body exceeds limit

Example:

{
  "error": "request body exceeds configured limit",
  "status": 413,
  "@type": "err:db/PayloadTooLarge"
}

How to Fix:

  • Split large transactions into batches
  • Use LIMIT clause for queries
  • Use pagination for large result sets
  • Increase size limits (if appropriate)

415 Unsupported Media Type

The Content-Type is not supported.

Common Causes:

  • Wrong Content-Type header
  • Unsupported format
  • Missing Content-Type header

Example:

{
  "error": "Content-Type not supported: text/plain",
  "status": 415,
  "@type": "err:db/UnsupportedMediaType"
}

How to Fix:

  • Set correct Content-Type header
  • Use supported format
  • Check API documentation for supported types

422 Unprocessable Entity

The request is well-formed but semantically invalid.

Common Causes:

  • Invalid data values
  • Business rule violation
  • Semantic constraint violation

Example:

{
  "error": "semantic constraint violation",
  "status": 422,
  "@type": "err:db/ConstraintViolation"
}

How to Fix:

  • Validate data before submitting
  • Check business rules
  • Review constraint requirements

429 Too Many Requests

Rate limit exceeded.

Common Causes:

  • Too many requests in time window
  • Exceeded quota

Example:

{
  "error": "rate limit exceeded",
  "status": 429,
  "@type": "err:db/RateLimited"
}

Response Headers:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1642857645
Retry-After: 45

How to Fix:

  • Wait before retrying (check Retry-After header)
  • Implement exponential backoff
  • Reduce request rate
  • Request higher rate limit
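The headers shown above can be turned into a concrete wait time. A minimal sketch, using the header names from the example response (any Headers-like object with a .get() method works):

```javascript
// Derive a backoff delay (ms) from rate-limit response headers.
function backoffMs(headers) {
  const retryAfter = headers.get('Retry-After'); // seconds to wait
  if (retryAfter !== null && retryAfter !== undefined) {
    return parseInt(retryAfter, 10) * 1000;
  }
  const reset = headers.get('X-RateLimit-Reset'); // unix seconds
  if (reset !== null && reset !== undefined) {
    return Math.max(0, Number(reset) * 1000 - Date.now());
  }
  return 1000; // assumed default when no header is present
}

console.log(backoffMs(new Map([['Retry-After', '45']]))); // 45000
```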

Server Error Codes (5xx)

500 Internal Server Error

An unexpected error occurred on the server.

Common Causes:

  • Unhandled exception
  • Database error
  • Internal logic error

Example:

{
  "error": "internal error",
  "status": 500,
  "@type": "err:db/Internal"
}

How to Fix:

  • Check server logs
  • Report to system administrator
  • Retry request
  • Contact support if persists

502 Bad Gateway

Error communicating with upstream service.

Common Causes:

  • Storage backend unavailable
  • Nameservice unavailable
  • Network error

Example:

{
  "error": "upstream service error",
  "status": 502,
  "@type": "err:db/BadGateway"
}

How to Fix:

  • Check storage backend status
  • Verify network connectivity
  • Check AWS/cloud service status
  • Retry with backoff

503 Service Unavailable

The server is temporarily unavailable.

Common Causes:

  • Server overloaded
  • Maintenance mode
  • Resource exhaustion

Example:

{
  "error": "service unavailable",
  "status": 503,
  "@type": "err:db/ServiceUnavailable"
}

Response Headers:

Retry-After: 300

How to Fix:

  • Wait and retry (check Retry-After header)
  • Implement retry logic with exponential backoff
  • Check service status page

504 Gateway Timeout

Upstream service didn’t respond in time.

Common Causes:

  • Storage backend timeout
  • Long-running query
  • Network latency

Example:

{
  "error": "gateway timeout",
  "status": 504,
  "@type": "err:db/GatewayTimeout"
}

How to Fix:

  • Retry request
  • Check storage backend performance
  • Simplify query
  • Increase timeout settings

Error Handling Best Practices

1. Always Check Status Codes

Check HTTP status before parsing response:

const response = await fetch(url, options);
if (!response.ok) {
  const err = await response.json();
  // err.error is the primary human-readable message, err["@type"] is the stable category.
  throw new Error(`${err["@type"] || "err:unknown"}: ${err.error}`);
}

2. Implement Retry Logic

Retry transient errors with exponential backoff:

async function retryRequest(fn, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (err) {
      if (!isRetryable(err) || i === maxRetries - 1) {
        throw err;
      }
      await sleep(Math.pow(2, i) * 1000);
    }
  }
}

function isRetryable(err) {
  return [408, 429, 502, 503, 504].includes(err.status);
}

3. Handle Rate Limits

Respect rate limit headers:

if (response.status === 429) {
  const retryAfter = response.headers.get('Retry-After');
  await sleep(parseInt(retryAfter, 10) * 1000);
  return retryRequest(fn);
}

4. Log Error Details

Log complete error context for debugging:

console.error({
  status: response.status,
  error: errorData.error,
  error_type: errorData["@type"],
  cause: errorData.cause,
  requestId: response.headers.get('X-Request-ID')
});

5. User-Friendly Messages

Show appropriate messages to users:

function getUserMessage(error) {
  switch (error["@type"]) {
    case 'err:db/LedgerNotFound':
      return 'Database not found. Please check the name.';
    case 'err:db/Timeout':
      return 'Query took too long. Please try a simpler query.';
    case 'err:db/RateLimited':
      return 'Too many requests. Please wait a moment.';
    default:
      return 'An error occurred. Please try again.';
  }
}

6. Graceful Degradation

Handle errors gracefully:

try {
  const data = await query(ledger);
  return data;
} catch (err) {
  if (err["@type"] === 'err:db/LedgerNotFound') {
    // Create ledger and retry
    await createLedger(ledger);
    return await query(ledger);
  }
  throw err;
}

7. Circuit Breaker Pattern

Prevent cascading failures:

class CircuitBreaker {
  constructor(threshold = 5, timeout = 60000) {
    this.failures = 0;
    this.threshold = threshold;
    this.timeout = timeout;
    this.state = 'CLOSED';
  }
  
  async execute(fn) {
    if (this.state === 'OPEN') {
      throw new Error('Circuit breaker is OPEN');
    }
    
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (err) {
      this.onFailure();
      throw err;
    }
  }
  
  onSuccess() {
    this.failures = 0;
    this.state = 'CLOSED';
  }
  
  onFailure() {
    this.failures++;
    if (this.failures >= this.threshold) {
      this.state = 'OPEN';
      setTimeout(() => {
        this.state = 'HALF_OPEN';
        this.failures = 0;
      }, this.timeout);
    }
  }
}

Query

Fluree supports two query languages for graph data: JSON-LD Query (Fluree’s native query language) and SPARQL (the W3C standard). Both provide access to Fluree’s unique features, including time travel, graph sources, and policy enforcement.

Query Languages

JSON-LD Query

Fluree’s native query language that uses JSON-LD syntax. JSON-LD Query provides a natural, JSON-based interface for querying graph data, making it easy to integrate with modern applications.

Key Features:

  • JSON-based syntax (no string parsing)
  • Full support for time travel (@t:, @iso:, @commit:)
  • Graph source integration
  • Policy enforcement
  • History queries

SPARQL

Industry-standard SPARQL 1.1 query language. Fluree provides full SPARQL support, enabling compatibility with existing RDF tools and knowledge graphs.

Key Features:

  • W3C SPARQL 1.1 compliant
  • FROM and FROM NAMED clauses
  • CONSTRUCT queries
  • Time travel support (planned)
  • Standard SPARQL functions

Query Features

Output Formats

Fluree supports multiple output formats for query results:

  • JSON-LD: Compact, context-aware JSON with IRI expansion/compaction
  • SPARQL JSON: Standard SPARQL result format
  • Typed JSON: Type-preserving JSON with datatype information

Datasets and Multi-Graph Execution

Query across multiple graphs and ledgers:

  • FROM clauses: Specify default graphs
  • FROM NAMED: Query named graphs
  • Multi-ledger queries: Query across different ledgers
  • Time-aware datasets: Query graphs at different time points

CONSTRUCT Queries

Generate RDF graphs from query results:

  • Transform query results into RDF
  • Create new graph structures
  • Extract subgraphs

Graph Crawl

Traverse graph relationships:

  • Follow links between entities
  • Recursive graph traversal
  • Depth-limited crawling

Explain Plans

Understand query execution:

  • View query plans
  • Analyze index usage
  • Optimize query performance

Tracking and Fuel Limits

Monitor and control query execution:

  • Query tracking and debugging
  • Fuel limits for resource control
  • Performance monitoring

Nameservice Queries

Query metadata about all ledgers and graph sources in the system. The nameservice stores information about every database including commit state, index state, and configuration.

JSON-LD Query:

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "select": ["?ledger", "?t"],
  "where": [
    { "@id": "?ns", "@type": "f:LedgerSource", "f:ledger": "?ledger", "f:t": "?t" }
  ]
}

SPARQL:

PREFIX f: <https://ns.flur.ee/db#>
SELECT ?ledger ?t WHERE { ?ns a f:LedgerSource ; f:ledger ?ledger ; f:t ?t }

See the Ledgers and Nameservice concept documentation for details.

Time Travel in Queries

Fluree supports querying historical data using time specifiers in ledger references:

Transaction Number:

ledger:main@t:100

ISO 8601 Timestamp:

ledger:main@iso:2024-01-15T10:30:00Z

Commit ContentId:

ledger:main@commit:bafybeig...

See the Time Travel concept documentation for details.

Graph Source Queries

Query graph sources (BM25, Vector, Iceberg, R2RML) using the same syntax as regular ledgers:

{
  "@context": {
    "f": "https://ns.flur.ee/db#"
  },
  "from": "products:main",
  "select": ["?product"],
  "where": [
    {
      "f:graphSource": "products-search:main",
      "f:searchText": "laptop",
      "f:searchLimit": 20,
      "f:searchResult": { "f:resultId": "?product" }
    }
  ]
}

See the Graph Sources concept documentation for details.

Policy Enforcement

Policies are automatically enforced during query execution, ensuring users only see data they’re authorized to access. No special syntax is required—policies are applied transparently.

See the Policy Enforcement concept documentation for details.

Getting Started

Basic JSON-LD Query

{
  "@context": {
    "ex": "http://example.org/ns/"
  },
  "select": ["?name"],
  "where": [
    { "@id": "?person", "ex:name": "?name" }
  ]
}

Basic SPARQL Query

PREFIX ex: <http://example.org/ns/>

SELECT ?name
WHERE {
  ?person ex:name ?name .
}

Query with Time Travel

{
  "@context": {
    "ex": "http://example.org/ns/"
  },
  "from": "ledger:main@t:100",
  "select": ["?name"],
  "where": [
    { "@id": "?person", "ex:name": "?name" }
  ]
}

Query Performance

Fluree’s query engine is optimized for:

  • Automatic Join Ordering: The planner reorders all WHERE-clause patterns (triples, UNION, OPTIONAL, MINUS, search patterns, and more) using statistics-driven cardinality estimates. When database statistics are available, it uses HLL-derived property counts; otherwise it falls back to heuristic constants. Estimates are context-aware — the planner tracks which variables are already bound and adjusts costs accordingly, so a triple whose subject is bound from an earlier pattern is scored as a cheap per-subject lookup rather than a full scan.
  • Index Selection: Automatically chooses optimal indexes (SPOT, POST, OPST, PSOT) based on which triple components are bound.
  • Filter Optimization: Filters are automatically applied as soon as their required variables are bound, regardless of where they appear in the query. Range-safe filters are pushed down to index scans, and filters are evaluated inline during joins when possible.
  • Streaming Execution: Results stream as they’re computed
  • Parallel Processing: Parallel execution where possible

Best Practices

  1. Use Appropriate Indexes: Structure queries to leverage indexes
  2. Limit Result Sets: Use LIMIT clauses for large result sets
  3. Time Travel Efficiency: Use @t: when transaction numbers are known
  4. Graph Source Selection: Choose appropriate graph sources for query patterns
  5. Policy Awareness: Understand how policies affect query results

JSON-LD Query

JSON-LD Query is Fluree’s native query language, providing a JSON-based interface for querying graph data. It combines the expressiveness of SPARQL with the convenience of JSON, making it easy to integrate with modern applications.

Overview

JSON-LD Query uses JSON-LD syntax to express queries, leveraging @context for IRI expansion and compaction. Queries are structured as JSON objects with familiar clauses such as select, where, and from.

Basic Query Structure

{
  "@context": {
    "ex": "http://example.org/ns/"
  },
  "select": ["?name", "?age"],
  "where": [
    { "@id": "?person", "ex:name": "?name", "ex:age": "?age" }
  ]
}

Query Clauses

@context

The @context defines namespace mappings for IRI expansion/compaction:

{
  "@context": {
    "ex": "http://example.org/ns/",
    "schema": "http://schema.org/",
    "foaf": "http://xmlns.com/foaf/0.1/"
  }
}

When querying via the CLI, omitting @context causes the ledger’s default context to be injected automatically. The HTTP API disables this behavior by default; pass ?default-context=true to opt in for a request. To opt out explicitly, pass an empty object: "@context": {}. See opting out of the default context.

Note: When using fluree-db-api directly (embedded), @context is not injected automatically. Queries must supply their own context or use full IRIs. Use db_with_default_context() or GraphDb::with_default_context() to opt in.

select

Specifies what to return in results. The shape of select determines the shape of each output row.

Bare variable — one column, each row is the bound value (not wrapped in an array):

{
  "select": "?name"
}

Variable list — each row is [v1, v2, ...]:

{
  "select": ["?name", "?age"]
}

Wildcard — every variable bound in the WHERE clause:

{
  "select": "*"
}

Subject expansion — return a nested JSON-LD object instead of a flat row. The key is either a variable (the WHERE clause binds it to subjects) or an IRI constant (the named subject is expanded directly, no WHERE needed):

{
  "select": { "?person": ["*", { "schema:knows": ["@id", "schema:name"] }] },
  "where": { "@id": "?person", "@type": "schema:Person" }
}
{
  "select": { "ex:alice": ["*"] }
}

The array value is the selection spec — "*" for all forward properties, individual property names ("schema:name"), or nested object forms for sub-selections. Add "depth": N at the query top level to bound auto-expansion of unselected references.

Mixed array — combine flat variables and subject expansions in one row, in any order. Each object is an independent expansion with its own root and selection spec:

{
  "select": [
    "?age",
    { "?person": ["@id", "schema:name"] },
    { "?org": ["@id", "schema:name"] }
  ],
  "where": {
    "@id": "?person",
    "ex:age": "?age",
    "ex:worksFor": "?org"
  }
}

Each row is [age, expanded_person, expanded_org]. When every column is an IRI-constant expansion (no variable dependency anywhere in select), the output is independent of the WHERE solution count: the formatter emits one row regardless of how many solutions the WHERE produced.

S-expression columns — a select item that is a string starting with ( is an S-expression, in two flavors:

Aggregates. Auto-aliased (?count, ?sum, etc.) or with an explicit alias via (as ...):

{
  "select": ["?category", "(count ?product)"],
  "groupBy": ["?category"]
}
{
  "select": ["?category", "(as (count ?product) ?total)"],
  "groupBy": ["?category"]
}

Scalar expressions (COALESCE, IF, arithmetic, string/hash/date functions, …). Always require an explicit alias via (as <expr> ?alias). Mirrors SPARQL SELECT (expr AS ?alias):

{
  "select": ["?p", "(as (coalesce ?titleFr ?titleEn \"untitled\") ?title)"]
}
{
  "select": [
    "?name",
    "(as (coalesce ?email \"no-email\") ?contact)",
    "(as (count ?favNums) ?count)"
  ],
  "groupBy": ["?name", "?contact"]
}

Scalar select expressions desugar to a bind in the WHERE pattern list. If the expression references an aggregate’s output variable (e.g. (as (+ ?count 1) ?adjusted)) the bind runs after aggregation; otherwise it runs before, so the alias is also a valid groupBy key.

The same expression language is shared with bind and filter. The one exception is in / not-in, which require the bracketed-list form and are not accepted in select expressions — rewrite as (or (= ?x 1) (= ?x 2) …) instead.

ask

Tests whether a set of patterns has any solution, returning true or false. No variables are projected. Equivalent to SPARQL ASK. The value of ask is the where clause itself — an array or object of the same patterns accepted by where:

{
  "@context": { "ex": "http://example.org/ns/" },
  "ask": [
    { "@id": "?person", "ex:name": "Alice" }
  ]
}

Single-pattern shorthand (object instead of array):

{
  "@context": { "ex": "http://example.org/ns/" },
  "ask": { "@id": "?person", "ex:name": "Alice" }
}

Returns true if at least one solution exists, false otherwise. Internally, LIMIT 1 is applied for efficiency.

from

Specifies which ledger(s) to query:

Single Ledger:

{
  "from": "mydb:main"
}

Multiple Ledgers:

{
  "from": ["mydb:main", "otherdb:main"]
}

Time Travel:

{
  "from": "mydb:main@t:100"
}
{
  "from": "mydb:main@iso:2024-01-15T10:30:00Z"
}
{
  "from": "mydb:main@commit:bafybeig..."
}

where

The where clause contains query patterns:

Basic Pattern:

{
  "where": [
    { "@id": "?person", "ex:name": "?name" }
  ]
}

Multiple Patterns:

{
  "where": [
    { "@id": "?person", "ex:name": "?name" },
    { "@id": "?person", "ex:age": "?age" }
  ]
}

Type Pattern:

{
  "where": [
    { "@id": "?person", "@type": "ex:User", "ex:name": "?name" }
  ]
}

Pattern Types

Object Patterns

Match triples where subject, predicate, and object are specified:

{
  "@id": "ex:alice",
  "ex:name": "Alice"
}

Variable Patterns

Use variables (starting with ?) to match unknown values:

{
  "@id": "?person",
  "ex:name": "?name"
}

Type Patterns

Match entities by type:

{
  "@id": "?person",
  "@type": "ex:User",
  "ex:name": "?name"
}

Property Join Patterns

Match multiple properties of the same subject:

{
  "@id": "?person",
  "ex:name": "?name",
  "ex:age": "?age",
  "ex:email": "?email"
}

Advanced Patterns

Optional Patterns

Match optional data that may not exist:

{
  "where": [
    { "@id": "?person", "ex:name": "?name" },
    ["optional", { "@id": "?person", "ex:email": "?email" }]
  ]
}

Sibling vs. grouped OPTIONAL — semantics

The two forms below are not equivalent. Each ["optional", ...] array is a single OPTIONAL block in SPARQL terms — every item inside is part of the same conjunctive group, and a row is null-extended only when the group as a whole fails to match. To express two independent left joins, write two sibling arrays.

Sibling OPTIONALs — two independent left joins:

{
  "where": [
    { "@id": "?person", "ex:name": "?name" },
    ["optional", { "@id": "?person", "ex:email": "?email" }],
    ["optional", { "@id": "?person", "ex:phone": "?phone" }]
  ]
}

Equivalent SPARQL:

?person ex:name ?name .
OPTIONAL { ?person ex:email ?email }
OPTIONAL { ?person ex:phone ?phone }

?email and ?phone are independent — a person with only an email keeps ?email bound and gets null for ?phone, and vice versa.

Grouped OPTIONAL — one conjunctive left join:

{
  "where": [
    { "@id": "?person", "ex:name": "?name" },
    ["optional",
     { "@id": "?person", "ex:email": "?email" },
     { "@id": "?person", "ex:phone": "?phone" }
    ]
  ]
}

Equivalent SPARQL:

?person ex:name ?name .
OPTIONAL { ?person ex:email ?email . ?person ex:phone ?phone }

?email and ?phone are bound together — a person who has an email but no phone is null-extended on both variables, because the inner conjunctive group did not match as a whole.

Filters and binds inside OPTIONAL

filter and bind constrain or compute from existing bindings, so they need something to anchor to inside the OPTIONAL block. Any binding-producing pattern qualifies as an anchor — a node-map, values, an earlier bind, a nested optional, or a sub-query. A filter or bind as the very first item in an OPTIONAL array is rejected.

["optional",
  { "@id": "?person", "ex:age": "?age" },
  ["filter", "(> ?age 18)"]
]
["optional",
  ["values", ["?x", [1, 2, 3]]],
  ["filter", "(> ?x 0)"]
]

Union Patterns

Match data from multiple alternative patterns:

{
  "where": [
    ["union",
     { "@id": "?person", "ex:name": "?name" },
     { "@id": "?person", "ex:alias": "?name" }
    ]
  ]
}

Graph Patterns

Scope patterns to a named graph:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "mydb:main",
  "fromNamed": {
    "products": {
      "@id": "mydb:main",
      "@graph": "http://example.org/graphs/products"
    }
  },
  "select": ["?product", "?name"],
  "where": [
    ["graph", "products", { "@id": "?product", "ex:name": "?name" }]
  ]
}

Notes:

  • fromNamed is an object whose keys are dataset-local aliases. Each value is an object with @id (ledger reference) and optional @graph (graph selector IRI).
  • The second element of ["graph", ...] can be a dataset-local alias (recommended) or a graph IRI.
  • The legacy "from-named": [...] array format is still accepted for backward compatibility.
  • For dataset and named-graph configuration details, see docs/query/datasets.md.

Filter Patterns

Apply conditions to filter results:

Single Filter:

{
  "where": [
    { "@id": "?person", "ex:age": "?age" },
    ["filter", "(> ?age 18)"]
  ]
}

Multiple Filters:

{
  "where": [
    { "@id": "?person", "ex:age": "?age", "ex:name": "?name" },
    ["filter", "(> ?age 18)", "(strStarts ?name \"A\")"]
  ]
}

Complex Filters:

{
  "where": [
    { "@id": "?person", "ex:age": "?age", "ex:last": "?last" },
    ["filter", "(and (> ?age 45) (strEnds ?last \"ith\"))"]
  ]
}

Bind Patterns

Compute values and bind to variables:

{
  "where": [
    { "@id": "?person", "ex:age": "?age" },
    ["bind", "?nextAge", "(+ ?age 1)"]
  ]
}

Values Patterns

Provide initial bindings:

{
  "where": [
    ["values", "?name", ["Alice", "Bob", "Carol"]],
    { "@id": "?person", "ex:name": "?name" }
  ]
}

Property Paths

Property paths enable transitive traversal of predicates, following chains of relationships across multiple hops. Define a path alias in @context using @path, then use the alias as a key in WHERE node-maps.

Defining a Path Alias:

Add a term definition with @path to your @context. The value of @path can be a string (SPARQL property path syntax) or an array (S-expression form).

String Form (SPARQL syntax):

{
  "@context": {
    "ex": "http://example.org/",
    "knowsPlus": { "@path": "ex:knows+" }
  },
  "select": ["?who"],
  "where": [
    { "@id": "ex:alice", "knowsPlus": "?who" }
  ]
}

This returns all entities reachable from ex:alice by following one or more ex:knows edges transitively.

Array Form (S-expression):

{
  "@context": {
    "ex": "http://example.org/",
    "knowsPlus": { "@path": ["+", "ex:knows"] }
  },
  "select": ["?who"],
  "where": [
    { "@id": "ex:alice", "knowsPlus": "?who" }
  ]
}

The array form uses the operator as the first element followed by its operands.

Supported Operators:

Operator       String syntax   Array syntax            Description
One or more    ex:p+           ["+", "ex:p"]           Transitive closure (1+ hops)
Zero or more   ex:p*           ["*", "ex:p"]           Reflexive transitive closure (0+ hops)
Inverse        ^ex:p           ["^", "ex:p"]           Traverse predicate in reverse direction
Alternative    ex:a|ex:b       ["|", "ex:a", "ex:b"]   Match any of several predicates
Sequence       ex:a/ex:b       ["/", "ex:a", "ex:b"]   Follow a chain of predicates (property chain)

Zero-or-more (*) includes the starting node itself in the results (zero hops).

Sequence (/) compiles into a chain of triple patterns joined by internal intermediate variables. Each step must be a simple predicate or an inverse simple predicate (^ex:p). For example, "ex:friend/ex:name" matches paths where subject has a ex:friend whose ex:name is the result.
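The compilation described above can be sketched as a transformation from a sequence path into chained triple patterns. This is a simplified illustration of the idea, not Fluree's internal representation; the `?_seqN` variable naming is invented here:

```javascript
// Illustrative only: expand "?s step1/step2/... ?o" into a chain of triple
// patterns joined by fresh intermediate variables.
let counter = 0;

function expandSequence(subject, steps, object) {
  const patterns = [];
  let current = subject;
  steps.forEach((pred, i) => {
    // Last step binds the object; earlier steps bind a fresh internal variable.
    const next = i === steps.length - 1 ? object : `?_seq${counter++}`;
    patterns.push({ '@id': current, [pred]: next });
    current = next;
  });
  return patterns;
}

console.log(JSON.stringify(expandSequence('?x', ['ex:friend', 'ex:name'], '?name')));
// [{"@id":"?x","ex:friend":"?_seq0"},{"@id":"?_seq0","ex:name":"?name"}]
```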

Parsed but Not Yet Supported:

The following operators are recognized by the parser but currently rejected (not yet supported for execution):

Operator      String syntax   Array syntax
Zero or one   ex:p?           ["?", "ex:p"]

Subject and Object Variables:

Path aliases work with variables on either side:

{
  "@context": {
    "ex": "http://example.org/",
    "knowsPlus": { "@path": "ex:knows+" }
  },
  "select": ["?x", "?y"],
  "where": [
    { "@id": "?x", "knowsPlus": "?y" }
  ]
}

This returns all pairs (?x, ?y) where ?y is transitively reachable from ?x via ex:knows.

Fixed Subject or Object:

You can also fix one end to an IRI:

{
  "@context": {
    "ex": "http://example.org/",
    "knowsPlus": { "@path": "ex:knows+" }
  },
  "select": ["?who"],
  "where": [
    { "@id": "?who", "knowsPlus": { "@id": "ex:bob" } }
  ]
}

This finds all entities that can reach ex:bob through one or more ex:knows hops.

Inverse Example:

Find entities that know ex:bob (traverse ex:knows in reverse):

{
  "@context": {
    "ex": "http://example.org/",
    "knownBy": { "@path": "^ex:knows" }
  },
  "select": ["?who"],
  "where": [
    { "@id": "ex:bob", "knownBy": "?who" }
  ]
}

Alternative Example:

Match entities connected by either ex:knows or ex:likes:

{
  "@context": {
    "ex": "http://example.org/",
    "connected": { "@path": "ex:knows|ex:likes" }
  },
  "select": ["?who"],
  "where": [
    { "@id": "ex:alice", "connected": "?who" }
  ]
}

Inverse can also be applied to complex paths (sequences and alternatives):

  • ^(ex:friend/ex:name) — inverse of a sequence: reverses the step order and inverts each step, producing (^ex:name)/(^ex:friend)
  • ^(ex:name|ex:nick) — inverse of an alternative: distributes the inverse into each branch, producing (^ex:name)|(^ex:nick)
  • Double inverse cancels: ^(^ex:p) simplifies to ex:p

Array form examples:

{ "@path": ["^", ["/", "ex:friend", "ex:name"]] }
{ "@path": ["^", ["|", "ex:name", "ex:nick"]] }

Inverse is supported inside alternative branches (e.g. ex:knows|^ex:knows matches both directions of the ex:knows predicate).

Alternative branches can also be sequence chains. For example, ex:friend/ex:name|ex:colleague/ex:name returns the name of a friend OR the name of a colleague:

{
  "@context": {
    "ex": "http://example.org/",
    "contactName": { "@path": "ex:friend/ex:name|ex:colleague/ex:name" }
  },
  "select": ["?name"],
  "where": [
    { "@id": "ex:alice", "contactName": "?name" }
  ]
}

Branches can freely mix simple predicates, inverse predicates, and sequence chains (e.g. ex:name|ex:friend/ex:name|^ex:colleague).

Alternative uses UNION semantics (bag, not set): when multiple branches match the same (subject, object) pair, duplicate solutions are produced. Use selectDistinct if set semantics are needed.
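
For example, switching the earlier alternative query to set semantics (a sketch; the connected alias mirrors the example above):

```json
{
  "@context": {
    "ex": "http://example.org/",
    "connected": { "@path": "ex:knows|ex:likes" }
  },
  "selectDistinct": ["?who"],
  "where": [
    { "@id": "ex:alice", "connected": "?who" }
  ]
}
```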

Sequence (Property Chain) Example:

Follow a chain of predicates. The string form uses / to separate steps:

{
  "@context": {
    "ex": "http://example.org/",
    "friendName": { "@path": "ex:friend/ex:name" }
  },
  "select": ["?person", "?name"],
  "where": [
    { "@id": "?person", "friendName": "?name" }
  ]
}

The array form uses "/" as the operator:

{ "@path": ["/", "ex:friend", "ex:name"] }

Sequence steps can include inverse predicates. For example, "^ex:parent/ex:name" traverses the ex:parent link backwards, then follows ex:name:

{ "@path": "^ex:parent/ex:name" }

Longer chains are supported: "ex:friend/ex:address/ex:city" follows three hops.

Sequence steps can also be alternatives. For example, "ex:friend/(ex:name|ex:nick)" distributes the alternative into a union of chains (ex:friend/ex:name and ex:friend/ex:nick):

{ "@path": "ex:friend/(ex:name|ex:nick)" }

Array form:

{ "@path": ["/", "ex:friend", ["|", "ex:name", "ex:nick"]] }

Multiple alternative steps are supported: "(ex:a|ex:b)/(ex:c|ex:d)" expands to 4 chains. A safety limit of 64 expanded chains is enforced to prevent combinatorial explosion.

Each step must be a simple predicate (ex:p), inverse simple predicate (^ex:p), or an alternative of simple predicates ((ex:a|ex:b)). Transitive (+/*) and nested sequence modifiers are not allowed inside sequence steps.

Rules:

  • @path and @reverse are mutually exclusive on the same term definition (produces an error).
  • @path and @id may coexist on the same term definition; when the alias key appears in a WHERE node-map, the @path definition is used.
  • Cycle detection is built in: transitive traversal terminates when it encounters a node already visited.
  • Variable names starting with ?__ are reserved for internal use (e.g., intermediate join variables generated by sequence paths). These variables will not appear in wildcard (select: "*") output.

Filter Functions

Comparison Functions

Comparison operators accept two or more arguments. With multiple arguments, they chain pairwise: (< ?a ?b ?c) means ?a < ?b AND ?b < ?c.

  • (= ?x ?y ...) - Equality
  • (!= ?x ?y ...) - Inequality
  • (> ?x ?y ...) - Greater than
  • (>= ?x ?y ...) - Greater than or equal
  • (< ?x ?y ...) - Less than
  • (<= ?x ?y ...) - Less than or equal

When comparing incomparable types (e.g., a number and a string):

  • = yields false — values of different types are not equal
  • != yields true — values of different types are not equal
  • <, <=, >, >= raise an error — ordering between incompatible types is undefined
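
A sketch of pairwise chaining in a filter; ex:age is an assumed predicate, and the three-argument form is equivalent to 18 < ?age AND ?age < 65:

```json
{
  "@context": { "ex": "http://example.org/ns/" },
  "select": ["?person", "?age"],
  "where": [
    { "@id": "?person", "ex:age": "?age" },
    ["filter", "(< 18 ?age 65)"]
  ]
}
```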

Logical Functions

  • (and ...) - Logical AND
  • (or ...) - Logical OR
  • (not ...) - Logical NOT

String Functions

  • (strStarts ?str ?prefix) - String starts with
  • (strEnds ?str ?suffix) - String ends with
  • (contains ?str ?substr) - String contains
  • (regex ?str ?pattern) - Regular expression match

Numeric Functions

Arithmetic operators accept two or more arguments. With multiple arguments, they fold left: (+ ?x ?y ?z) evaluates as (?x + ?y) + ?z. A single argument returns the value unchanged.

  • (+ ?x ?y ...) - Addition
  • (- ?x ?y ...) - Subtraction
  • (* ?x ?y ...) - Multiplication
  • (/ ?x ?y ...) - Division
  • (- ?x) - Unary negation (single argument)
  • (abs ?x) - Absolute value
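
A sketch combining arithmetic with a filter (ex:price and ex:tax are assumed predicates); by the fold-left rule, (+ ?price ?tax) here is plain two-argument addition:

```json
{
  "@context": { "ex": "http://example.org/ns/" },
  "select": ["?item"],
  "where": [
    { "@id": "?item", "ex:price": "?price", "ex:tax": "?tax" },
    ["filter", "(<= (+ ?price ?tax) 100)"]
  ]
}
```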

Vector Similarity Functions

Used with bind to compute similarity scores between @vector values:

  • (dotProduct ?vec1 ?vec2) - Dot product (inner product)
  • (cosineSimilarity ?vec1 ?vec2) - Cosine similarity (-1 to 1)
  • (euclideanDistance ?vec1 ?vec2) - Euclidean (L2) distance

Function names are case-insensitive. See Vector Search for usage examples.

Type Functions

  • (bound ?var) - Variable is bound
  • (isIRI ?x) - Is an IRI
  • (isBlank ?x) - Is a blank node
  • (isLiteral ?x) - Is a literal

Query Modifiers

orderBy

Sort results:

{
  "orderBy": ["?name"]
}

Descending Order:

{
  "orderBy": [["desc", "?age"]]
}

Multiple Sort Keys:

{
  "orderBy": ["?last", ["desc", "?age"]]
}

limit

Limit number of results:

{
  "limit": 10
}

offset

Skip results:

{
  "offset": 20,
  "limit": 10
}

groupBy

Group results:

{
  "select": ["?category", "(count ?product)"],
  "groupBy": ["?category"],
  "where": [
    { "@id": "?product", "ex:category": "?category" }
  ]
}

having

Filter grouped results:

{
  "select": ["?category", "(count ?product)"],
  "groupBy": ["?category"],
  "having": [["filter", "(> (count ?product) 10)"]],
  "where": [
    { "@id": "?product", "ex:category": "?category" }
  ]
}

Aggregation Functions

  • (count ?var) / (count *) — count non-null values; * counts solution rows
  • (count-distinct ?var) — count distinct non-null values
  • (sum ?var) — sum numeric values
  • (avg ?var) — average numeric values
  • (min ?var) / (max ?var) — extremum
  • (median ?var) — median
  • (variance ?var) / (stddev ?var) — population variance / standard deviation
  • (sample ?var) — implementation-defined sample value
  • (groupconcat ?var) / (groupconcat ?var ", ") — concatenate string values, optional separator (defaults to a single space)

Each aggregate auto-aliases to ?<fn-name> (?count, ?sum, …). Use (as (<fn> ?var) ?alias) to choose an explicit alias.
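
As a sketch, the auto-alias ?count can then be referenced by later clauses such as orderBy (this assumes the auto-alias is visible to modifiers; ex:category is an assumed predicate):

```json
{
  "@context": { "ex": "http://example.org/ns/" },
  "select": ["?category", "(count ?product)"],
  "groupBy": ["?category"],
  "where": [
    { "@id": "?product", "ex:category": "?category" }
  ],
  "orderBy": [["desc", "?count"]]
}
```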

Time Travel Queries

Query historical data using time specifiers in from:

Transaction Number:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "ledger:main@t:100",
  "select": ["?name"],
  "where": [
    { "@id": "?person", "ex:name": "?name" }
  ]
}

ISO Timestamp:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "ledger:main@iso:2024-01-15T10:30:00Z",
  "select": ["?name"],
  "where": [
    { "@id": "?person", "ex:name": "?name" }
  ]
}

Commit ContentId:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "ledger:main@commit:bafybeig...",
  "select": ["?name"],
  "where": [
    { "@id": "?person", "ex:name": "?name" }
  ]
}

Multiple Ledgers at Different Times:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": ["ledger1:main@t:100", "ledger2:main@t:200"],
  "select": ["?data"],
  "where": [
    { "@id": "?entity", "ex:data": "?data" }
  ]
}

History Queries

History queries let you see all changes (assertions and retractions) within a time range. Specify the range using from and to keys whose endpoints carry time specifiers:

Time Range Syntax

{
  "from": "ledger:main@t:1",
  "to": "ledger:main@t:latest"
}

Binding Transaction Metadata

Use @t and @op annotations on value objects to capture metadata:

  • @t - Binds the transaction time (integer) when the fact was asserted/retracted.
  • @op - Binds the operation type as a boolean: true for assertions, false for retractions. (Mirrors Flake.op on disk; constants "assert" / "retract" are not accepted — use true / false.)

Both annotations work uniformly for literal-valued and IRI-valued objects.

Entity History:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "ledger:main@t:1",
  "to": "ledger:main@t:latest",
  "select": ["?name", "?age", "?t", "?op"],
  "where": [
    { "@id": "ex:alice", "ex:name": { "@value": "?name", "@t": "?t", "@op": "?op" } },
    { "@id": "ex:alice", "ex:age": "?age" }
  ],
  "orderBy": "?t"
}

Property-Specific History:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "ledger:main@t:1",
  "to": "ledger:main@t:100",
  "select": ["?age", "?t", "?op"],
  "where": [
    { "@id": "ex:alice", "ex:age": { "@value": "?age", "@t": "?t", "@op": "?op" } }
  ],
  "orderBy": "?t"
}

Time Range with ISO:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "ledger:main@iso:2024-01-01T00:00:00Z",
  "to": "ledger:main@iso:2024-12-31T23:59:59Z",
  "select": ["?name", "?t", "?op"],
  "where": [
    { "@id": "ex:alice", "ex:name": { "@value": "?name", "@t": "?t", "@op": "?op" } }
  ]
}

Filter by Operation:

You can either use a constant @op shorthand (preferred) or filter on the bound variable:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "ledger:main@t:1",
  "to": "ledger:main@t:latest",
  "select": ["?name", "?t"],
  "where": [
    { "@id": "ex:alice", "ex:name": { "@value": "?name", "@t": "?t", "@op": false } }
  ]
}

The shorthand "@op": false lowers to FILTER(op(?name) = false). Equivalent long form using a bound variable: "@op": "?op" plus ["filter", "(= ?op false)"].
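
Spelled out, the equivalent long form binds ?op and filters on it:

```json
{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "ledger:main@t:1",
  "to": "ledger:main@t:latest",
  "select": ["?name", "?t"],
  "where": [
    { "@id": "ex:alice", "ex:name": { "@value": "?name", "@t": "?t", "@op": "?op" } },
    ["filter", "(= ?op false)"]
  ]
}
```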

All Properties History:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "ledger:main@t:1",
  "to": "ledger:main@t:latest",
  "select": ["?property", "?value", "?t", "?op"],
  "where": [
    { "@id": "ex:alice", "?property": { "@value": "?value", "@t": "?t", "@op": "?op" } }
  ],
  "orderBy": "?t"
}

Graph Source Queries

Query graph sources using the same syntax:

BM25 Search:

{
  "@context": {
    "f": "https://ns.flur.ee/db#"
  },
  "from": "products:main@t:1000",
  "select": ["?product", "?score"],
  "where": [
    {
      "f:graphSource": "products-search:main",
      "f:searchText": "laptop",
      "f:searchLimit": 10,
      "f:searchResult": { "f:resultId": "?product", "f:resultScore": "?score" }
    }
  ],
  "orderBy": [["desc", "?score"]],
  "limit": 10
}

Vector Similarity:

{
  "@context": {
    "ex": "http://example.org/",
    "f": "https://ns.flur.ee/db#"
  },
  "from": "documents:main",
  "select": ["?document", "?similarity"],
  "values": [
    ["?queryVec"],
    [{"@value": [0.1, 0.2, 0.3], "@type": "https://ns.flur.ee/db#embeddingVector"}]
  ],
  "where": [
    {
      "f:graphSource": "documents-vector:main",
      "f:queryVector": "?queryVec",
      "f:searchLimit": 5,
      "f:searchResult": { "f:resultId": "?document", "f:resultScore": "?similarity" }
    }
  ],
  "orderBy": [["desc", "?similarity"]],
  "limit": 5
}

Note: f:* keys used for graph source queries should be defined in your @context for clarity.

Complete Examples

Simple Select Query

{
  "@context": {
    "ex": "http://example.org/ns/"
  },
  "select": ["?name", "?age"],
  "where": [
    {
      "@id": "?person",
      "@type": "ex:User",
      "ex:name": "?name",
      "ex:age": "?age"
    },
    ["filter", "(> ?age 18)"]
  ],
  "orderBy": ["?name"],
  "limit": 10
}

Complex Query with Joins

{
  "@context": {
    "ex": "http://example.org/ns/"
  },
  "select": ["?person", "?friend", "?friendName"],
  "where": [
    { "@id": "?person", "ex:name": "?name" },
    { "@id": "?person", "ex:friend": "?friend" },
    { "@id": "?friend", "ex:name": "?friendName" },
    ["filter", "(= ?name \"Alice\")"]
  ]
}

Aggregation Query

{
  "@context": {
    "ex": "http://example.org/ns/"
  },
  "select": [
    "?category",
    "(as (count ?product) ?count)",
    "(as (avg ?price) ?avgPrice)"
  ],
  "groupBy": ["?category"],
  "having": [["filter", "(> (count ?product) 5)"]],
  "where": [
    { "@id": "?product", "ex:category": "?category", "ex:price": "?price" }
  ],
  "orderBy": [["desc", "?count"]]
}

Parse Options

JSON-LD queries accept parse-time options under a top-level opts object. These control how the query is parsed (not what it returns).

strictCompactIri

By default, JSON-LD queries reject unresolved compact-looking IRIs (prefix:suffix where the prefix is not in @context) at parse time. To opt out:

{
  "@context": {"ex": "http://example.org/ns/"},
  "opts": {"strictCompactIri": false},
  "select": ["?id", "?name"],
  "where": {"@id": "?id", "ex:name": "?name"}
}

The default is true. Disable only when you are intentionally working with bare prefix:suffix strings as opaque identifiers. See IRIs and @context — Strict Compact-IRI Guard for the full policy.

Best Practices

  1. Always Provide @context: Makes queries readable and maintainable
  2. Use Specific Patterns: More specific patterns are more efficient
  3. Limit Result Sets: Use limit for large result sets
  4. Flexible Filter Placement: Filters can be placed anywhere in where clauses - the query engine automatically applies each filter as soon as all its required variables are bound
  5. Use Time Specifiers: Prefer @t: when transaction numbers are known (fastest)
  6. Graph Source Selection: Choose appropriate graph sources for query patterns

SPARQL

Fluree provides full support for SPARQL 1.1, the W3C standard query language for RDF. SPARQL enables compatibility with existing RDF tools, knowledge graphs, and semantic web applications.

Overview

SPARQL (SPARQL Protocol and RDF Query Language) is the industry standard for querying RDF data. Fluree implements SPARQL 1.1, providing full compatibility with SPARQL endpoints and tools.

Basic SPARQL Query

PREFIX ex: <http://example.org/ns/>

SELECT ?name ?age
WHERE {
  ?person ex:name ?name .
  ?person ex:age ?age .
}

Default Prefixes

When querying via the CLI, a ledger’s default context prefix mappings are injected into SPARQL queries that have no explicit PREFIX declarations. In the HTTP API this behavior is off by default; pass ?default-context=true on ledger-scoped query requests to opt in. For example, if the default context includes {"ex": "http://example.org/ns/"}, this query works without a PREFIX line when default-context injection is enabled:

SELECT ?name ?age
WHERE {
  ?person ex:name ?name .
  ?person ex:age ?age .
}

If a query includes any PREFIX declarations, the default context is not used — you must declare every prefix you need. To explicitly opt out of the default context without defining any real prefix, use PREFIX : <>. See opting out of the default context for details.

Note: When using fluree-db-api directly (embedded), queries must include their own PREFIX declarations. The default context is not injected automatically by the core API. Use db_with_default_context() or GraphDb::with_default_context() to opt in. See Default Context for details.

You can view and manage the default context with fluree context get/set or GET/PUT /v1/fluree/context/{ledger...}.

Query Forms

SELECT Queries

Return variable bindings:

PREFIX ex: <http://example.org/ns/>

SELECT ?name ?email
WHERE {
  ?person ex:name ?name .
  ?person ex:email ?email .
}

DISTINCT Results:

SELECT DISTINCT ?name
WHERE {
  ?person ex:name ?name .
}

Reduced Results:

SELECT REDUCED ?name
WHERE {
  ?person ex:name ?name .
}

CONSTRUCT Queries

Generate RDF graphs from query results:

PREFIX ex: <http://example.org/ns/>

CONSTRUCT {
  ?person ex:displayName ?name .
}
WHERE {
  ?person ex:name ?name .
}

See CONSTRUCT Queries for details.

ASK Queries

Return boolean indicating if query matches:

PREFIX ex: <http://example.org/ns/>

ASK {
  ?person ex:name "Alice" .
}

DESCRIBE Queries

Return RDF description of resources:

PREFIX ex: <http://example.org/ns/>

DESCRIBE ex:alice

Fluree’s DESCRIBE returns outgoing triples for each described resource (equivalent to CONSTRUCT { ?r ?p ?o } WHERE { VALUES ?r { ... } . ?r ?p ?o }).

Basic Graph Patterns

Triple Patterns

Match RDF triples:

?person ex:name ?name .

Multiple Patterns

Combine patterns with AND semantics:

?person ex:name ?name .
?person ex:age ?age .
?person ex:email ?email .

Property Paths

SPARQL property paths allow complex traversal patterns in the predicate position of a triple pattern.

Supported Operators

Syntax   Name           Description
p+       One or more    Transitive closure (follows p one or more hops)
p*       Zero or more   Reflexive transitive closure (includes self)
^p       Inverse        Traverses p in reverse direction
p|q      Alternative    Matches either p or q (UNION semantics)
p/q      Sequence       Follows p then q (property chain)

One or More (+):

?person ex:parent+ ?ancestor .

Zero or More (*):

?person ex:parent* ?ancestorOrSelf .

Inverse (^):

?child ^ex:parent ?parent .

This is equivalent to ?parent ex:parent ?child — it reverses the traversal direction.

Inverse can also be applied to complex paths (sequences and alternatives):

?s ^(ex:friend/ex:name) ?o .   # inverse of a sequence
?s ^(ex:name|ex:nick) ?o .     # inverse of an alternative

  • ^(ex:friend/ex:name) reverses the step order and inverts each step: (^ex:name)/(^ex:friend)
  • ^(ex:name|ex:nick) distributes inverse into each branch: (^ex:name)|(^ex:nick)
  • Double inverse cancels: ^(^ex:p) simplifies to ex:p

Alternative (|):

?person ex:friend|ex:colleague ?related .

This produces UNION semantics: results from both ex:friend and ex:colleague are combined (bag semantics, so duplicates are preserved).
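
To collapse those duplicates, project with DISTINCT (a sketch using the same predicates):

```sparql
SELECT DISTINCT ?related
WHERE {
  ?person ex:friend|ex:colleague ?related .
}
```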

Three-way and inverse alternatives are supported:

?s ex:a|ex:b|ex:c ?o .
?s ex:friend|^ex:colleague ?related .

Alternative branches can also be sequence chains. For example, to get the name via either the friend or colleague path:

?s (ex:friend/ex:name)|(ex:colleague/ex:name) ?name .

Branches can freely mix simple predicates, inverse predicates, and sequence chains:

?s ex:name|(ex:friend/ex:name)|^ex:colleague ?val .

Sequence (/) — Property Chains:

?person ex:friend/ex:name ?friendName .

This follows ex:friend then ex:name, expanding into a chain of triple patterns joined by internal variables. Multi-step chains are supported:

?person ex:friend/ex:friend/ex:name ?fofName .

Sequence steps can include inverse predicates:

?person ^ex:friend/ex:name ?name .

This traverses ex:friend backwards (finding who links to ?person), then follows ex:name forward.

Sequence steps can also be alternatives. For example, ex:friend/(ex:name|ex:nick) distributes the alternative into a union of chains (ex:friend/ex:name and ex:friend/ex:nick):

?person ex:friend/(ex:name|ex:nick) ?label .

Multiple alternative steps are supported: (ex:a|ex:b)/(ex:c|ex:d) expands to 4 chains. A safety limit of 64 expanded chains is enforced to prevent combinatorial explosion.

Rules:

  • Transitive paths (+, *) require at least one variable (both subject and object cannot be constants).
  • Sequence (/) steps must be simple predicates (ex:p), inverse simple predicates (^ex:p), or alternatives of simple predicates ((ex:a|ex:b)). Transitive (+/*) and nested sequence modifiers are not allowed inside sequence steps.
  • Variable names starting with ?__ are reserved for internal use and will not appear in SELECT * (wildcard) output.

Not Yet Supported

The following operators are parsed but not yet supported for execution:

Syntax         Name
p?             Zero or one (optional step)
!p or !(p|q)   Negated property set

Query Modifiers

FILTER

Filter results with conditions:

SELECT ?name ?age
WHERE {
  ?person ex:name ?name .
  ?person ex:age ?age .
  FILTER (?age > 18)
}

Multiple Filters:

FILTER (?age > 18 && ?age < 65)
FILTER (regex(?name, "^A"))

OPTIONAL

Match optional patterns:

SELECT ?name ?email
WHERE {
  ?person ex:name ?name .
  OPTIONAL { ?person ex:email ?email . }
}

Multiple Optionals:

SELECT ?name ?email ?phone
WHERE {
  ?person ex:name ?name .
  OPTIONAL { ?person ex:email ?email . }
  OPTIONAL { ?person ex:phone ?phone . }
}

UNION

Match alternative patterns:

SELECT ?name
WHERE {
  { ?person ex:name ?name . }
  UNION
  { ?person ex:alias ?name . }
}

MINUS

Exclude matching patterns:

SELECT ?person
WHERE {
  ?person ex:type ex:User .
  MINUS { ?person ex:status ex:Inactive . }
}

BIND

Compute values:

SELECT ?name ?nextAge
WHERE {
  ?person ex:name ?name .
  ?person ex:age ?age .
  BIND (?age + 1 AS ?nextAge)
}

VALUES

Provide initial bindings:

SELECT ?person ?name
WHERE {
  VALUES ?name { "Alice" "Bob" "Carol" }
  ?person ex:name ?name .
}

Aggregation

GROUP BY

Group results by variable:

SELECT ?category (COUNT(?product) AS ?count)
WHERE {
  ?product ex:category ?category .
}
GROUP BY ?category

Expression-based GROUP BY:

Group by a computed expression using (expr AS ?alias) syntax:

SELECT ?initial (COUNT(?name) AS ?count)
WHERE {
  ?person ex:name ?name .
}
GROUP BY (SUBSTR(?name, 1, 1) AS ?initial)

The expression is evaluated per row and bound to the alias variable before grouping. Any SPARQL expression is supported, including function calls, arithmetic, and type casts.

HAVING

Filter grouped results:

SELECT ?category (COUNT(?product) AS ?count)
WHERE {
  ?product ex:category ?category .
}
GROUP BY ?category
HAVING (COUNT(?product) > 10)

Aggregation Functions

  • COUNT(?var) - Count non-null values
  • SUM(?var) - Sum numeric values
  • AVG(?var) - Average numeric values
  • MIN(?var) - Minimum value
  • MAX(?var) - Maximum value
  • SAMPLE(?var) - Arbitrary value from group
  • GROUP_CONCAT(?var; separator=",") - Concatenate values

All aggregate functions support the DISTINCT modifier, which eliminates duplicate values before aggregation:

SELECT ?category (COUNT(DISTINCT ?customer) AS ?unique_buyers)
                 (SUM(DISTINCT ?price) AS ?unique_price_total)
WHERE {
  ?order ex:category ?category .
  ?order ex:customer ?customer .
  ?order ex:price ?price .
}
GROUP BY ?category

Aggregate result types: COUNT and SUM of integers return xsd:integer. SUM of mixed numeric types and AVG return xsd:double.

Sorting and Limiting

ORDER BY

Sort results:

SELECT ?name ?age
WHERE {
  ?person ex:name ?name .
  ?person ex:age ?age .
}
ORDER BY ?name

Descending:

ORDER BY DESC(?age)

Multiple Sort Keys:

ORDER BY ?last ASC(?first) DESC(?age)

LIMIT

Limit number of results:

SELECT ?name
WHERE {
  ?person ex:name ?name .
}
LIMIT 10

OFFSET

Skip results:

SELECT ?name
WHERE {
  ?person ex:name ?name .
}
OFFSET 20
LIMIT 10

Datasets

FROM

Specify default graph:

PREFIX ex: <http://example.org/ns/>

SELECT ?name
FROM <mydb:main>
WHERE {
  ?person ex:name ?name .
}

Multiple Default Graphs:

SELECT ?name
FROM <mydb:main>
FROM <otherdb:main>
WHERE {
  ?person ex:name ?name .
}

FROM NAMED

Specify named graphs:

PREFIX ex: <http://example.org/ns/>

SELECT ?graph ?name
FROM NAMED <mydb:main>
FROM NAMED <otherdb:main>
WHERE {
  GRAPH ?graph {
    ?person ex:name ?name .
  }
}

Fluree also exposes a built-in named graph inside each ledger for transaction / commit metadata:

  • FROM <mydb:main#txn-meta> (txn-meta as the default graph), or
  • FROM NAMED <mydb:main#txn-meta> and GRAPH <mydb:main#txn-meta> { ... }

See Datasets for details.

SPARQL Functions

String Functions

  • STR(?x) - String value
  • LANG(?x) - Language tag
  • LANGMATCHES(?lang, ?pattern) - Language match
  • REGEX(?str, ?pattern) - Regular expression
  • REPLACE(?str, ?pattern, ?replacement) - Replace
  • SUBSTR(?str, ?start, ?length) - Substring
  • STRLEN(?str) - String length
  • UCASE(?str) - Uppercase
  • LCASE(?str) - Lowercase
  • ENCODE_FOR_URI(?str) - URI encode
  • CONCAT(?str1, ?str2, ...) - Concatenate

Numeric Functions

  • ABS(?x) - Absolute value
  • ROUND(?x) - Round
  • CEIL(?x) - Ceiling
  • FLOOR(?x) - Floor
  • RAND() - Random number

Date/Time Functions

  • NOW() - Current timestamp
  • YEAR(?date) - Year
  • MONTH(?date) - Month
  • DAY(?date) - Day
  • HOURS(?time) - Hours
  • MINUTES(?time) - Minutes
  • SECONDS(?time) - Seconds

Type Conversion

  • STRDT(?str, ?datatype) - String to typed literal
  • STRLANG(?str, ?lang) - String with language
  • DATATYPE(?literal) - Datatype
  • IRI(?str) - IRI from string
  • URI(?str) - URI from string
  • BNODE(?str) - Blank node

XSD Datatype Constructors (Casts)

Per W3C SPARQL 1.1 §17.5, XSD constructor functions cast values between datatypes. Invalid casts produce unbound (no binding), not errors.

  • xsd:boolean(?x) - Cast to boolean ("true", "1" → true; "false", "0" → false; numeric 0 → false, non-zero → true)
  • xsd:integer(?x) - Cast to integer (truncates doubles, parses strings)
  • xsd:float(?x) - Cast to single-precision float
  • xsd:double(?x) - Cast to double-precision float
  • xsd:decimal(?x) - Cast to decimal (rejects scientific notation strings)
  • xsd:string(?x) - Cast to string (canonical form for decimals)
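
A sketch of the unbound-on-failure behavior: because an invalid cast yields no binding, COALESCE can supply a fallback (ex:code is an assumed string-valued predicate):

```sparql
PREFIX ex: <http://example.org/ns/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?item ?n
WHERE {
  ?item ex:code ?code .
  BIND (COALESCE(xsd:integer(?code), 0) AS ?n)
}
```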

Logical Functions

  • BOUND(?var) - Variable is bound
  • IF(?condition, ?then, ?else) - Conditional
  • COALESCE(?x, ?y, ...) - First non-null value
  • ISIRI(?x) - Is IRI
  • ISURI(?x) - Is URI
  • ISBLANK(?x) - Is blank node
  • ISLITERAL(?x) - Is literal
  • ISNUMERIC(?x) - Is numeric

Subqueries

Nest queries:

SELECT ?person ?name
WHERE {
  ?person ex:name ?name .
  {
    SELECT ?person
    WHERE {
      ?person ex:age ?age .
      FILTER (?age > 18)
    }
  }
}

Service Queries

SERVICE enables cross-ledger queries within Fluree: you can execute patterns against different ledgers in a single query using the fluree:ledger: URI scheme.

Basic Cross-Ledger Query

Query data from another ledger in your dataset:

PREFIX ex: <http://example.org/ns/>

SELECT ?customer ?name ?total
FROM <customers:main>
FROM NAMED <orders:main>
WHERE {
  ?customer ex:name ?name .
  SERVICE <fluree:ledger:orders:main> {
    ?order ex:customer ?customer ;
           ex:total ?total .
  }
}

Endpoint URI Format

For local Fluree ledger queries, use the fluree:ledger: scheme:

Format                          Description                               Matches dataset ledger ID
fluree:ledger:<name>            Query ledger with default branch (main)   <name>:main
fluree:ledger:<name>:<branch>   Query specific branch                     <name>:<branch>

Where:

  • <name> is the ledger name without the branch (e.g., orders, acme/people)
  • <branch> is the branch name (e.g., main, dev)
  • The full dataset ledger ID is always <name>:<branch> (e.g., orders:main, acme/people:dev)

The endpoint is resolved by matching against the full ledger_id in the dataset.

Examples:

SERVICE <fluree:ledger:orders> { ... }         # matches orders:main
SERVICE <fluree:ledger:orders:main> { ... }    # matches orders:main (explicit)
SERVICE <fluree:ledger:orders:dev> { ... }     # matches orders:dev

SERVICE SILENT

Use SERVICE SILENT to return empty results instead of failing if the service errors or is unavailable:

PREFIX ex: <http://example.org/ns/>

SELECT ?name ?order
WHERE {
  ?person ex:name ?name .
  SERVICE SILENT <fluree:ledger:orders:main> {
    ?order ex:customer ?person .
  }
}

If the orders ledger is not in the dataset or encounters an error, the query returns results with unbound ?order values instead of failing.

Variable Endpoints

SERVICE supports variable endpoints that iterate over available ledgers:

PREFIX ex: <http://example.org/ns/>

SELECT ?ledger ?person ?name
FROM NAMED <db1:main>
FROM NAMED <db2:main>
WHERE {
  SERVICE ?ledger {
    ?person ex:name ?name .
  }
}

This queries all named ledgers in the dataset.

Cross-Ledger Join Example

Join customer data from one ledger with their orders from another:

PREFIX ex: <http://example.org/ns/>

SELECT ?customerName ?productName ?quantity
FROM <customers:main>
FROM NAMED <orders:main>
FROM NAMED <products:main>
WHERE {
  # Get customer from default graph (customers ledger)
  ?customer ex:name ?customerName .

  # Get orders for this customer from orders ledger
  SERVICE <fluree:ledger:orders:main> {
    ?order ex:customer ?customer ;
           ex:product ?product ;
           ex:quantity ?quantity .
  }

  # Get product details from products ledger
  SERVICE <fluree:ledger:products:main> {
    ?product ex:name ?productName .
  }
}

Requirements

  • The target ledger must be included in the dataset (via FROM or FROM NAMED clauses)
  • Results are joined with the outer query on shared variables
  • SERVICE patterns are executed as correlated subqueries (like EXISTS)

Remote Fluree Federation

SERVICE supports querying ledgers on remote Fluree instances using the fluree:remote: scheme. This enables cross-server federation — a single SPARQL query can join data from local ledgers with data from ledgers on other Fluree servers.

Remote Endpoint Format

Format                                Description
fluree:remote:<connection>/<ledger>   Query a ledger on a registered remote server

Where:

  • <connection> is a named remote connection registered at build time (maps to a server URL + bearer token)
  • <ledger> is the ledger ID on the remote server (e.g., customers:main, acme/people:main)

Example: Cross-Server Join

PREFIX ex: <http://example.org/ns/>

SELECT ?localName ?remoteEmail
WHERE {
  ?person ex:name ?localName .
  SERVICE <fluree:remote:acme/customers:main> {
    ?person ex:email ?remoteEmail .
  }
}

This queries ?person ex:name from the local ledger and joins with ?person ex:email from the customers:main ledger on the remote server named acme.

Multiple Ledgers on the Same Remote Server

A single remote connection gives access to any ledger the bearer token is authorized for:

PREFIX ex: <http://example.org/ns/>

SELECT ?customer ?orderId ?productName
WHERE {
  SERVICE <fluree:remote:acme/customers:main> {
    ?customer ex:name ?name .
    ?customer ex:id ?customerId .
  }
  SERVICE <fluree:remote:acme/orders:main> {
    ?order ex:customerId ?customerId .
    ?order ex:orderId ?orderId .
    ?order ex:product ?product .
  }
  SERVICE <fluree:remote:acme/products:main> {
    ?product ex:name ?productName .
  }
}

SILENT with Remote Endpoints

SERVICE SILENT works with remote endpoints. If the remote server is unreachable, the connection is not registered, or the bearer token is rejected, the SERVICE block returns empty results instead of failing the query:

SERVICE SILENT <fluree:remote:partner/inventory:main> {
  ?item ex:sku ?sku .
}

Registering Remote Connections

Remote connections are registered at connection build time via the Rust API or server configuration. See Configuration: Remote connections and Rust API: Remote federation for setup details.

Datatype Handling

Remote query results preserve their original datatypes. Values returned from a remote server are parsed into the same rich type system used for local data — xsd:dateTime, xsd:date, xsd:decimal, xsd:integer, etc. are all stored with their proper typed representations. Custom datatypes (e.g., http://example.org/myType) are also preserved: the value is kept as a string with the original datatype IRI retained, so round-tripping and downstream FILTER comparisons on shared custom types work correctly.

Limitations (v1)

  • Uncorrelated execution only. The SERVICE body is sent to the remote server as a standalone query. Parent-row bindings are not injected as VALUES (bound-join). This means a SERVICE block that references variables bound in the outer query will not push those constraints to the remote server — the remote returns all matching rows, and the join happens locally.
  • SPARQL queries only. Remote SERVICE is available in SPARQL queries. JSON-LD queries do not currently support the fluree:remote: scheme.
  • No query cancellation propagation. If the local query is cancelled, in-flight remote HTTP requests are not aborted.
  • Policy is local only. The remote server enforces its own policy based on the bearer token. The local server’s policy engine does not filter rows returned from a remote SERVICE.
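
The uncorrelated model amounts to a local hash join: the SERVICE body runs standalone on the remote server, and the returned rows are joined against the outer rows locally. A minimal Python sketch of that semantics (all function names and row data here are invented for illustration):

```python
# Sketch of v1 uncorrelated SERVICE execution: the remote query runs
# without any parent-row bindings, so it returns ALL matching rows,
# and the join with the outer query happens locally.

def execute_service_uncorrelated(outer_rows, remote_execute):
    remote_rows = remote_execute()  # standalone remote query, no VALUES injection

    # Local hash-style join on the variables shared between the two sides.
    joined = []
    for outer in outer_rows:
        for remote in remote_rows:
            shared = set(outer) & set(remote)
            if all(outer[v] == remote[v] for v in shared):
                joined.append({**outer, **remote})
    return joined

# Hypothetical rows: the outer query bound ?customerId locally ...
outer = [{"customerId": "c1", "name": "Alice"}]
# ... but the remote still ships every order, including c2's.
remote = lambda: [
    {"customerId": "c1", "orderId": "o1"},
    {"customerId": "c2", "orderId": "o2"},
]
rows = execute_service_uncorrelated(outer, remote)  # only the c1 order survives
```

This is why a highly selective outer pattern does not reduce remote transfer in v1: the filtering happens only after the remote rows arrive.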

External SPARQL Endpoints

Federated queries to non-Fluree SPARQL endpoints (e.g., Wikidata, DBpedia) are not yet supported. Only the fluree:ledger: (local) and fluree:remote: (remote Fluree) schemes are currently available.

Time Travel

Point-in-Time Queries

Query data as it existed at a specific time using time specifiers in the FROM clause:

PREFIX ex: <http://example.org/ns/>

SELECT ?name ?age
FROM <ledger:main@t:100>
WHERE {
  ?person ex:name ?name ;
          ex:age ?age .
}

Time specifiers:

  • @t:100 - Transaction number
  • @iso:2024-01-15T10:30:00Z - ISO 8601 datetime
  • @commit:bafybeig... - Commit ContentId
  • @t:latest - Current/latest state
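
A FROM reference is just the ledger name with an @-suffixed specifier appended, so it is easy to build programmatically. An illustrative helper (the function name is invented):

```python
# Illustrative builder for time-travel FROM references.
def from_ref(ledger, t=None, iso=None, commit=None):
    if t is not None:
        return f"<{ledger}@t:{t}>"            # transaction number, or "latest"
    if iso is not None:
        return f"<{ledger}@iso:{iso}>"        # ISO 8601 datetime
    if commit is not None:
        return f"<{ledger}@commit:{commit}>"  # commit ContentId
    return f"<{ledger}>"                      # current state

query = "SELECT ?name FROM " + from_ref("ledger:main", t=100) + " WHERE { ?p ex:name ?name }"
```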

History Queries

Query all changes (assertions and retractions) within a time range using FROM...TO with RDF-star syntax:

PREFIX ex: <http://example.org/ns/>
PREFIX f: <https://ns.flur.ee/db#>

SELECT ?age ?t ?op
FROM <ledger:main@t:1>
TO <ledger:main@t:latest>
WHERE {
  << ex:alice ex:age ?age >> f:t ?t .
  << ex:alice ex:age ?age >> f:op ?op .
}
ORDER BY ?t

The << subject predicate object >> syntax (RDF-star) treats the triple as an entity that can have metadata:

  • f:t - Transaction time (integer) when the fact was asserted or retracted.
  • f:op - Operation type as a boolean: true for assertions, false for retractions. Mirrors Flake.op on disk.
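
The (f:t, f:op) pairs carry enough information to replay a property's history. A sketch that folds assertion/retraction events into the value current at each transaction (the event data is hypothetical; within one t, retractions are assumed to be listed before assertions):

```python
# Replay assertion (op=True) / retraction (op=False) events for a
# single-cardinality property, recording the value current at each t.
def replay(events):
    state = {}
    current = None
    for value, t, op in sorted(events, key=lambda e: e[1]):  # stable sort by t
        current = value if op else None
        state[t] = current
    return state

# Hypothetical history of ex:alice ex:age:
events = [
    (30, 5, True),   # age 30 asserted at t=5
    (30, 9, False),  # retracted at t=9 ...
    (31, 9, True),   # ... and replaced in the same transaction
]
timeline = replay(events)  # → {5: 30, 9: 31}
```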

Filter by operation type:

PREFIX ex: <http://example.org/ns/>
PREFIX f: <https://ns.flur.ee/db#>

SELECT ?age ?t
FROM <ledger:main@t:1>
TO <ledger:main@t:latest>
WHERE {
  << ex:alice ex:age ?age >> f:t ?t .
  << ex:alice ex:age ?age >> f:op ?op .
  FILTER(?op = false)
}

History with ISO datetime range:

PREFIX ex: <http://example.org/ns/>
PREFIX f: <https://ns.flur.ee/db#>

SELECT ?name ?t ?op
FROM <ledger:main@iso:2024-01-01T00:00:00Z>
TO <ledger:main@iso:2024-12-31T23:59:59Z>
WHERE {
  << ex:alice ex:name ?name >> f:t ?t .
  << ex:alice ex:name ?name >> f:op ?op .
}

SPARQL UPDATE

Fluree supports SPARQL 1.1 Update for modifying data using standard SPARQL syntax. SPARQL UPDATE requests use the application/sparql-update content type and are sent to the update endpoints.

INSERT DATA

Insert ground triples (no variables):

PREFIX ex: <http://example.org/ns/>

INSERT DATA {
  ex:alice ex:name "Alice" .
  ex:alice ex:age 30 .
  ex:alice ex:email "alice@example.org" .
}

HTTP Request:

curl -X POST http://localhost:8090/v1/fluree/update/mydb:main \
  -H "Content-Type: application/sparql-update" \
  -d 'PREFIX ex: <http://example.org/ns/>
      INSERT DATA { ex:alice ex:name "Alice" }'

DELETE DATA

Delete specific ground triples:

PREFIX ex: <http://example.org/ns/>

DELETE DATA {
  ex:alice ex:email "alice@example.org" .
}

DELETE WHERE

Delete triples matching a pattern:

PREFIX ex: <http://example.org/ns/>

DELETE WHERE {
  ex:alice ex:age ?age .
}

This finds all ex:age values for ex:alice and deletes them.

DELETE/INSERT (Modify)

The most powerful form combines WHERE, DELETE, and INSERT clauses:

PREFIX ex: <http://example.org/ns/>

DELETE {
  ?person ex:age ?oldAge .
}
INSERT {
  ?person ex:age ?newAge .
}
WHERE {
  ?person ex:name "Alice" .
  ?person ex:age ?oldAge .
  BIND(?oldAge + 1 AS ?newAge)
}
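
Conceptually this runs in three phases: evaluate WHERE to produce bindings, instantiate the DELETE template for each binding, then instantiate the INSERT template. A toy Python simulation of the age-increment update above (not Fluree's execution engine):

```python
# Toy simulation of the DELETE/INSERT age-increment, over a set of triples.
triples = {
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:age", 30),
}

# Phase 1 - WHERE: bind ?person/?oldAge for entities named "Alice",
# plus BIND(?oldAge + 1 AS ?newAge).
bindings = [
    {"person": s, "oldAge": o, "newAge": o + 1}
    for (s, p, o) in triples
    if p == "ex:age" and (s, "ex:name", "Alice") in triples
]

# Phase 2 - DELETE template, then Phase 3 - INSERT template, per binding.
for b in bindings:
    triples.discard((b["person"], "ex:age", b["oldAge"]))
    triples.add((b["person"], "ex:age", b["newAge"]))
```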

Update multiple properties:

PREFIX ex: <http://example.org/ns/>

DELETE {
  ?person ex:name ?oldName .
  ?person ex:status ?oldStatus .
}
INSERT {
  ?person ex:name "Alicia" .
  ?person ex:status ex:Active .
}
WHERE {
  ?person ex:name "Alice" .
  OPTIONAL { ?person ex:name ?oldName }
  OPTIONAL { ?person ex:status ?oldStatus }
}

Dataset scoping for MODIFY (WITH / USING / USING NAMED)

SPARQL UPDATE MODIFY supports dataset scoping for named graphs:

  • WITH <iri>: sets the default graph for INSERT/DELETE templates that don’t use an explicit GRAPH <iri> { ... } block.
  • USING <iri>: scopes the default graph(s) for WHERE evaluation. Repeated USING clauses are evaluated as a merged default graph.
  • USING NAMED <iri>: scopes which named graphs are visible to WHERE GRAPH <iri> { ... } patterns. Repeated USING NAMED clauses allow multiple named graphs.

Blank Nodes in INSERT

Blank nodes can be used in INSERT templates to create new entities:

PREFIX ex: <http://example.org/ns/>

INSERT DATA {
  _:newPerson ex:name "Bob" .
  _:newPerson ex:age 25 .
}

Typed Literals

Specify datatypes explicitly:

PREFIX ex: <http://example.org/ns/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

INSERT DATA {
  ex:alice ex:birthDate "1990-05-15"^^xsd:date .
  ex:alice ex:salary "75000.00"^^xsd:decimal .
  ex:alice ex:active "true"^^xsd:boolean .
}

Language-Tagged Strings

Insert strings with language tags:

PREFIX ex: <http://example.org/ns/>

INSERT DATA {
  ex:alice ex:name "Alice"@en .
  ex:alice ex:name "Alicia"@es .
  ex:alice ex:name "アリス"@ja .
}

SPARQL UPDATE Restrictions

Current restrictions / boundaries:

  • Graph management operations: LOAD, CLEAR, DROP, CREATE, ADD, MOVE, COPY are not yet supported.
  • Template graph variables: INSERT/DELETE templates support GRAPH <iri> { ... } blocks, but GRAPH ?g { ... } is not yet supported.
  • DELETE WHERE + GRAPH blocks: GRAPH <iri> { ... } blocks are not yet supported inside DELETE WHERE { ... }.
  • SERVICE: Only local-ledger endpoints of the form fluree:ledger:<name>[:<branch>] are supported; arbitrary remote HTTP SERVICE endpoints are not supported.
  • Property paths: Supported in WHERE (subject to Fluree capability settings).

Endpoint Usage

SPARQL UPDATE uses the update endpoints with Content-Type: application/sparql-update:

  • POST /v1/fluree/update - Connection-scoped; requires the Fluree-Ledger header
  • POST /v1/fluree/update/<ledger...> - Ledger-scoped; the ledger comes from the URL path

Examples:

# Ledger-scoped (recommended)
curl -X POST http://localhost:8090/v1/fluree/update/mydb:main \
  -H "Content-Type: application/sparql-update" \
  -d 'PREFIX ex: <http://example.org/ns/>
      INSERT DATA { ex:alice ex:name "Alice" }'

# Connection-scoped with header
curl -X POST http://localhost:8090/v1/fluree/update \
  -H "Content-Type: application/sparql-update" \
  -H "Fluree-Ledger: mydb:main" \
  -d 'PREFIX ex: <http://example.org/ns/>
      INSERT DATA { ex:alice ex:name "Alice" }'

Best Practices

  1. Use PREFIX Declarations: Makes queries readable
  2. Automatic Pattern Optimization: The query planner automatically reorders patterns for efficient execution using statistics-driven cardinality estimates
  3. Flexible FILTER Placement: Filters can be placed anywhere in the WHERE clause — the query engine automatically applies each filter as soon as all its required variables are bound
  4. Limit Results: Use LIMIT for large result sets
  5. Avoid Cartesian Products: Structure queries to avoid large joins

Output Formats

Fluree supports multiple output formats for query results, each suited to a different use case.

Supported Formats

JSON-LD Format

Default format for JSON-LD Query. Provides compact, context-aware JSON with IRI expansion/compaction.

Characteristics:

  • Uses @context for IRI compaction
  • Compact IRIs (e.g., ex:alice instead of full IRIs)
  • Inferable datatypes (string, long, double, boolean) rendered as bare values
  • Language tags preserved

Example (graph crawl):

[
  {
    "@id": "ex:alice",
    "schema:name": "Alice",
    "schema:age": 30,
    "schema:knows": {"@id": "ex:bob"}
  }
]

Example (tabular SELECT):

[
  ["Alice", 30],
  ["Bob", 25]
]

SPARQL JSON Format

Standard SPARQL 1.1 result format for SPARQL queries.

Characteristics:

  • W3C SPARQL 1.1 compliant
  • Standard results and bindings structure
  • Datatype information included
  • Language tags included

Example:

{
  "head": {
    "vars": ["name", "age"]
  },
  "results": {
    "bindings": [
      {
        "name": {
          "type": "literal",
          "value": "Alice"
        },
        "age": {
          "type": "literal",
          "value": "30",
          "datatype": "http://www.w3.org/2001/XMLSchema#integer"
        }
      }
    ]
  }
}
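
Consuming this format means walking head.vars and results.bindings and converting typed literals. A small illustrative client-side parser (only a few XSD datatypes handled, for brevity):

```python
import json

XSD = "http://www.w3.org/2001/XMLSchema#"

def parse_sparql_json(payload):
    """Convert SPARQL 1.1 JSON results into rows of native Python values."""
    doc = json.loads(payload)
    rows = []
    for binding in doc["results"]["bindings"]:
        row = {}
        for var, cell in binding.items():
            value = cell["value"]
            dt = cell.get("datatype", "")
            if dt == XSD + "integer":
                value = int(value)
            elif dt in (XSD + "decimal", XSD + "double"):
                value = float(value)
            row[var] = value
        rows.append(row)
    return rows

payload = """{"head": {"vars": ["name", "age"]},
 "results": {"bindings": [
   {"name": {"type": "literal", "value": "Alice"},
    "age": {"type": "literal", "value": "30",
            "datatype": "http://www.w3.org/2001/XMLSchema#integer"}}]}}"""
rows = parse_sparql_json(payload)  # → [{"name": "Alice", "age": 30}]
```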

Typed JSON Format

Type-preserving JSON format with explicit datatype information on every value. Works with both tabular SELECT queries and graph crawl (entity-centric) queries.

Characteristics:

  • Every literal includes {"@value": ..., "@type": "..."} — even inferable types
  • References use {"@id": "..."}
  • Language-tagged strings use {"@value": ..., "@language": "..."}
  • @json values use {"@value": <parsed>, "@type": "@json"}
  • Nested entities in graph crawl results are also fully typed
  • IRIs compacted via @context

Example (tabular SELECT):

[
  {
    "?name": {"@value": "Alice", "@type": "xsd:string"},
    "?age": {"@value": 30, "@type": "xsd:long"}
  }
]

Example (graph crawl):

[
  {
    "@id": "ex:alice",
    "@type": ["schema:Person"],
    "schema:name": {"@value": "Alice", "@type": "xsd:string"},
    "schema:age": {"@value": 30, "@type": "xsd:long"},
    "schema:knows": {
      "@id": "ex:bob",
      "schema:name": {"@value": "Bob", "@type": "xsd:string"}
    },
    "ex:data": {"@value": {"key": "val"}, "@type": "@json"}
  }
]
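
Typed JSON is mechanical to deserialize: unwrap {"@value": ...} objects and recurse into nested entities. An illustrative converter back to plain values (datatype information is deliberately dropped here):

```python
def unwrap(value):
    """Recursively strip typed-JSON wrappers down to plain Python values."""
    if isinstance(value, dict):
        if "@value" in value:
            return value["@value"]  # drop @type / @language annotations
        return {k: unwrap(v) for k, v in value.items()}
    if isinstance(value, list):
        return [unwrap(v) for v in value]
    return value

row = {"?name": {"@value": "Alice", "@type": "xsd:string"},
       "?age": {"@value": 30, "@type": "xsd:long"}}
plain = unwrap(row)  # → {"?name": "Alice", "?age": 30}
```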

Agent JSON Format

Optimized for LLM/agent consumption. Returns a self-describing envelope with a schema header, compact object rows using native JSON types, and built-in pagination support.

Request via HTTP:

Accept: application/vnd.fluree.agent+json
Fluree-Max-Bytes: 32768

Characteristics:

  • Schema-once header: datatypes declared per variable, not repeated per value
  • Native JSON types for values (strings, numbers, booleans — no wrappers for inferable types)
  • Non-inferable datatypes annotated inline only where needed ({"@value": ..., "@type": "..."})
  • Byte-budget truncation with hasMore flag and resume query
  • Time-pinning metadata (t for single-ledger, iso wallclock timestamp for cross-ledger)

Example (single-ledger, no truncation):

{
  "schema": {
    "?name": "xsd:string",
    "?age": "xsd:integer",
    "?s": "uri"
  },
  "rows": [
    {"?name": "Alice", "?age": 30, "?s": "ex:alice"},
    {"?name": "Bob", "?age": 25, "?s": "ex:bob"}
  ],
  "rowCount": 2,
  "t": 5,
  "iso": "2026-03-26T14:30:00Z",
  "hasMore": false
}

Example (truncated, with resume query):

{
  "schema": {
    "?name": "xsd:string",
    "?age": "xsd:integer"
  },
  "rows": [
    {"?name": "Alice", "?age": 30},
    {"?name": "Bob", "?age": 25}
  ],
  "rowCount": 2,
  "t": 5,
  "iso": "2026-03-26T14:30:00Z",
  "hasMore": true,
  "message": "Response truncated due to size limit of 32768 bytes. Use the query below to retrieve the next batch.",
  "resume": "SELECT ?name ?age FROM <mydb:main@t:5> WHERE { ?s ex:name ?name ; ex:age ?age } OFFSET 2 LIMIT 100"
}

Schema types:

  • Single type → string: "?name": "xsd:string"
  • Mixed types → array: "?value": ["xsd:string", "xsd:integer"]
  • IRI references → "uri"

Envelope fields:

  • schema (always) - Per-variable datatype map
  • rows (always) - Array of {variable: value} objects
  • rowCount (always) - Number of rows included
  • t (single-ledger only) - Transaction number used for the query
  • iso (always) - ISO-8601 wallclock timestamp at query time
  • hasMore (always) - Whether more rows exist beyond the byte budget
  • message (when truncated) - Human-readable truncation explanation
  • resume (when truncated, single-FROM only) - Ready-to-execute SPARQL with @t: pinning and OFFSET

Multi-ledger queries: The t field is omitted (each ledger has its own timeline). The resume field is also omitted; instead, the message instructs the caller to use @iso: on each FROM clause for time-pinning.

Byte budget: Set via the Fluree-Max-Bytes header. When the cumulative serialized size of rows exceeds this limit, the formatter stops adding rows and sets hasMore: true. The budget applies to row data only (schema and envelope overhead are excluded from the count).
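
The budget check can be pictured as: serialize rows one at a time and stop before the cumulative size exceeds the limit. An illustrative sketch, not the actual formatter (which counts bytes in its own serialized representation):

```python
import json

def truncate_rows(rows, max_bytes):
    """Keep rows while their cumulative serialized size fits max_bytes;
    report hasMore when rows were cut. The budget covers row data only."""
    kept, used = [], 0
    for row in rows:
        size = len(json.dumps(row).encode("utf-8"))
        if kept and used + size > max_bytes:
            return kept, True  # hasMore
        kept.append(row)
        used += size
    return kept, False

rows = [{"?name": "Alice", "?age": 30}, {"?name": "Bob", "?age": 25}]
kept, has_more = truncate_rows(rows, max_bytes=30)  # only the first row fits
```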

Array Normalization

By default, graph crawl results return single-valued properties as bare scalars and multi-valued properties as arrays:

{"schema:name": "Alice", "ex:tags": ["rust", "wasm"]}

This can be problematic for typed struct deserialization (e.g., a Vec<String> field that receives a bare string when only one value exists).

normalize_arrays forces all property values into arrays regardless of cardinality:

{"schema:name": ["Alice"], "ex:tags": ["rust", "wasm"]}

This is orthogonal to typed JSON and can be combined with any format:

#![allow(unused)]
fn main() {
// Typed + normalized — most predictable for struct deserialization
let config = FormatterConfig::typed_json().with_normalize_arrays();

// JSON-LD + normalized — compact values but predictable shapes
let config = FormatterConfig::jsonld().with_normalize_arrays();
}

The @container: @set context annotation still forces arrays per-property and works regardless of the normalize_arrays setting.
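
The normalization itself is a shallow rewrite: wrap every non-list property value in a single-element array. A Python illustration (treating @-keywords as pass-through is an assumption here):

```python
def normalize_arrays(entity):
    """Force every property value into an array, regardless of cardinality."""
    out = {}
    for key, value in entity.items():
        if key.startswith("@"):           # assumption: keywords left untouched
            out[key] = value
        elif isinstance(value, list):
            out[key] = value              # already an array
        else:
            out[key] = [value]            # wrap the bare scalar
    return out

result = normalize_arrays({"schema:name": "Alice", "ex:tags": ["rust", "wasm"]})
# → {"schema:name": ["Alice"], "ex:tags": ["rust", "wasm"]}
```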

Format Selection

JSON-LD Query

JSON-LD Query defaults to JSON-LD format. You can specify the format explicitly:

{
  "@context": { "ex": "http://example.org/ns/" },
  "select": ["?name", "?age"],
  "where": [
    { "@id": "?person", "ex:name": "?name", "ex:age": "?age" }
  ],
  "format": "jsonld"
}

SPARQL

SPARQL queries return SPARQL JSON format by default:

PREFIX ex: <http://example.org/ns/>

SELECT ?name ?age
WHERE {
  ?person ex:name ?name .
  ?person ex:age ?age .
}

Datatype Handling

String Types

JSON-LD:

"Hello"

Typed JSON:

{"@value": "Hello", "@type": "xsd:string"}

SPARQL JSON:

{"type": "literal", "value": "Hello"}

Numeric Types

JSON-LD:

42

Typed JSON:

{"@value": 42, "@type": "xsd:long"}

SPARQL JSON:

{"type": "literal", "value": "42", "datatype": "http://www.w3.org/2001/XMLSchema#integer"}

Language-Tagged Strings

All formats use the same representation:

{"@value": "Hello", "@language": "en"}

IRIs

JSON-LD / Typed JSON:

{"@id": "ex:alice"}

SPARQL JSON:

{"type": "uri", "value": "http://example.org/ns/alice"}

Rust API

Use FormatterConfig to control output format via the query builder API:

#![allow(unused)]
fn main() {
use fluree_db_api::FormatterConfig;

// Single-ledger query with explicit format
let db = fluree.db("mydb:main").await?;
let result = db.query(&fluree)
    .sparql("SELECT ?name WHERE { ?s <schema:name> ?name }")
    .format(FormatterConfig::typed_json())
    .execute_formatted()
    .await?;

// Dataset query with format
let result = dataset.query(&fluree)
    .sparql("SELECT * WHERE { ?s ?p ?o }")
    .format(FormatterConfig::sparql_json())
    .execute_formatted()
    .await?;

// Connection-level query with format
let result = fluree.query_from()
    .jsonld(&query_with_from)
    .format(FormatterConfig::jsonld())
    .execute_formatted()
    .await?;

// AgentJson with byte budget and resume support
use fluree_db_api::AgentJsonContext;

let config = FormatterConfig::agent_json()
    .with_max_bytes(32768)
    .with_agent_json_context(AgentJsonContext {
        sparql_text: Some(sparql.to_string()),
        from_count: 1,
        iso_timestamp: Some(chrono::Utc::now().to_rfc3339()),
    });
let result = db.query(&fluree)
    .sparql("SELECT ?name ?age WHERE { ?s ex:name ?name ; ex:age ?age }")
    .format(config)
    .execute_formatted()
    .await?;

// Or directly on QueryResult:
let json = result.to_agent_json(&snapshot)?;                       // no budget
let json = result.to_agent_json_with_config(&snapshot, &config)?;  // with budget
}

Available format constructors:

  • FormatterConfig::jsonld() — JSON-LD (default for JSON-LD queries)
  • FormatterConfig::sparql_json() — SPARQL 1.1 JSON Results (default for SPARQL queries)
  • FormatterConfig::typed_json() — Typed JSON with explicit datatypes on every value
  • FormatterConfig::agent_json() — Agent JSON envelope for LLM/agent consumers

Builder methods:

  • .with_normalize_arrays() — Force array wrapping for all graph crawl properties
  • .with_pretty() — Pretty-print JSON output
  • .with_max_bytes(n) — Set byte budget for AgentJson truncation
  • .with_agent_json_context(ctx) — Set SPARQL text, FROM count, and ISO timestamp for AgentJson resume queries

All three query paths (db.query(), dataset.query(), fluree.query_from()) support .format().

Direct formatting on QueryResult

For graph crawl queries (which require async DB access):

#![allow(unused)]
fn main() {
// Typed JSON with graph crawl support
let json = result.to_typed_json_async(db.as_graph_db_ref()).await?;

// Custom config (e.g., typed + normalize_arrays)
let config = FormatterConfig::typed_json().with_normalize_arrays();
let json = result.format_async(db.as_graph_db_ref(), &config).await?;
}

When no .format() is set:

  • JSON-LD queries default to JSON-LD format
  • SPARQL queries default to SPARQL JSON format

CLI Usage

The fluree query command supports format selection via --format:

# Default table output
fluree query "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"

# JSON output
fluree query --format json '{"select": {"ex:alice": ["*"]}, "from": "mydb:main"}'

# Typed JSON output (explicit types on every value)
fluree query --format typed-json '{"select": {"ex:alice": ["*"]}, "from": "mydb:main"}'

# Normalize arrays (force all properties to arrays)
fluree query --format json --normalize-arrays '{"select": {"ex:alice": ["*"]}, "from": "mydb:main"}'

# Typed JSON + normalize arrays (most predictable for programmatic use)
fluree query --format typed-json --normalize-arrays '{"select": {"ex:alice": ["*"]}, "from": "mydb:main"}'

Performance Considerations

  • JSON-LD is the most efficient format — inferable types skip the @value/@type wrapper
  • Typed JSON adds a constant-factor overhead per literal value (one extra JSON object allocation). Query execution is unaffected — only the formatting phase is slower.
  • normalize_arrays adds zero overhead when disabled (default). When enabled, it skips the len() == 1 check — no additional allocations beyond the array wrapper.
  • TSV/CSV bypass JSON DOM construction entirely for maximum throughput

Best Practices

  1. Use JSON-LD for human-facing apps: Compact and readable
  2. Use Typed JSON for struct deserialization: Unambiguous types prevent parsing surprises
  3. Use normalize_arrays for typed consumers: Ensures Vec<T> fields always get arrays
  4. Use SPARQL JSON for standard tooling: Interoperable with SPARQL clients
  5. Use TSV/CSV for bulk export: Highest throughput, smallest memory footprint
  6. Use Agent JSON for LLM/agent integrations: Schema-once + pagination prevents context window overflow

Datasets and Multi-Graph Execution

Fluree supports SPARQL datasets, enabling a single query to span multiple graphs and ledgers.

SPARQL Datasets

A dataset in SPARQL is a collection of graphs used for query execution:

  • Default Graph: The primary graph for triple patterns without GRAPH clauses
  • Named Graphs: Additional graphs identified by IRIs, accessible via GRAPH clauses

FROM Clauses

Single Default Graph

Specify a single default graph:

JSON-LD Query:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "mydb:main",
  "select": ["?name"],
  "where": [
    { "@id": "?person", "ex:name": "?name" }
  ]
}

SPARQL:

PREFIX ex: <http://example.org/ns/>

SELECT ?name
FROM <mydb:main>
WHERE {
  ?person ex:name ?name .
}

Multiple Default Graphs

Specify multiple default graphs (union semantics):

JSON-LD Query:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": ["mydb:main", "otherdb:main"],
  "select": ["?name"],
  "where": [
    { "@id": "?person", "ex:name": "?name" }
  ]
}

SPARQL:

PREFIX ex: <http://example.org/ns/>

SELECT ?name
FROM <mydb:main>
FROM <otherdb:main>
WHERE {
  ?person ex:name ?name .
}

FROM NAMED Clauses

Named graph sources (datasets)

In SPARQL, FROM NAMED identifies named graphs in the dataset. In Fluree, these are often graph sources such as:

  • another ledger (federation / multi-ledger queries), or
  • a graph source (search, tabular mapping, etc.).

Note: On the ledger-scoped HTTP query endpoint (POST /query/{ledger}), FROM / FROM NAMED is also supported, but is interpreted as selecting named graphs inside that same ledger. Use the connection-scoped endpoint (POST /query) when you want a dataset that spans multiple ledgers.

Query across multiple named graph sources:

JSON-LD Query:

{
  "@context": { "ex": "http://example.org/ns/" },
  "fromNamed": {
    "mydb": { "@id": "mydb:main" },
    "otherdb": { "@id": "otherdb:main" }
  },
  "select": ["?graph", "?name"],
  "where": [
    ["graph", "?graph", { "@id": "?person", "ex:name": "?name" }]
  ]
}

SPARQL:

PREFIX ex: <http://example.org/ns/>

SELECT ?graph ?name
FROM NAMED <mydb:main>
FROM NAMED <otherdb:main>
WHERE {
  GRAPH ?graph {
    ?person ex:name ?name .
  }
}

Specific Named Graph

Query a specific named graph:

SPARQL:

PREFIX ex: <http://example.org/ns/>

SELECT ?name
FROM NAMED <mydb:main>
WHERE {
  GRAPH <mydb:main> {
    ?person ex:name ?name .
  }
}

Ledger named graph: txn-meta

Fluree provides a built-in named graph inside each ledger for transactional / commit metadata: txn-meta.

Use the #txn-meta fragment on a ledger reference:

  • mydb:main#txn-meta
  • mydb:main@t:100#txn-meta (time pinned)

JSON-LD Query (txn-meta as the default graph):

{
  "@context": {
    "f": "https://ns.flur.ee/db#",
    "ex": "http://example.org/ns/"
  },
  "from": "mydb:main#txn-meta",
  "select": ["?commit", "?t", "?machine"],
  "where": [
    { "@id": "?commit", "f:t": "?t" },
    { "@id": "?commit", "ex:machine": "?machine" }
  ]
}

SPARQL Query:

PREFIX f: <https://ns.flur.ee/db#>
PREFIX ex: <http://example.org/ns/>

SELECT ?commit ?t ?machine
FROM <mydb:main#txn-meta>
WHERE {
  ?commit f:t ?t .
  OPTIONAL { ?commit ex:machine ?machine }
}

User-Defined Named Graphs

Fluree supports user-defined named graphs ingested via TriG format. These graphs are queryable using the structured from object syntax with a graph field.

For the ledger-scoped HTTP endpoint (POST /query/{ledger}), the server also accepts a convenient shorthand:

  • "from": "txn-meta" / "from": "default" / "from": "<graph IRI>" to select a graph within the ledger in the URL.

Ingesting data with named graphs (TriG):

curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
  -H "Content-Type: application/trig" \
  -d '@prefix ex: <http://example.org/ns/> .

      GRAPH <http://example.org/graphs/products> {
          ex:widget ex:name "Widget" ;
                    ex:price "29.99"^^xsd:decimal .
      }'

Querying the named graph (JSON-LD):

Use the structured from object with a graph field specifying the graph IRI:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": {
    "@id": "mydb:main",
    "graph": "http://example.org/graphs/products"
  },
  "select": ["?name", "?price"],
  "where": [
    { "@id": "?product", "ex:name": "?name" },
    { "@id": "?product", "ex:price": "?price" }
  ]
}

With time-travel:

{
  "from": {
    "@id": "mydb:main",
    "t": 100,
    "graph": "http://example.org/graphs/products"
  },
  "select": ["?name", "?price"],
  "where": [...]
}

Combining multiple graphs (JSON-LD):

Query across the default graph and user-defined named graphs:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": "mydb:main",
  "fromNamed": {
    "products": {
      "@id": "mydb:main",
      "@graph": "http://example.org/graphs/products"
    }
  },
  "select": ["?company", "?product", "?price"],
  "where": [
    { "@id": "?company", "@type": "ex:Company" },
    ["graph", "products", { "@id": "?product", "ex:name": "?productName", "ex:price": "?price" }]
  ]
}

Notes:

  • Named graphs are queryable after indexing completes
  • The @graph field accepts the full graph IRI (no URL-encoding required)
  • Time-travel is specified via the t, iso, or sha field in the object form
  • Object keys in fromNamed serve as dataset-local aliases for use in GRAPH patterns

Graph Source Object Schema

fromNamed (named graphs) — preferred format

fromNamed is an object whose keys are dataset-local aliases. Each value has:

  • @id (string, required) - Ledger reference (e.g., mydb:main, mydb:main@t:100)
  • @graph (string, optional) - Graph selector: "default", "txn-meta", or a full IRI
  • t (integer, optional) - Time-travel: specific transaction number
  • at (string, optional) - Time-travel: ISO-8601 timestamp or commit:<hash>
  • policy (object, optional) - Per-source policy override (see below)

from (default graphs) — object syntax

When using object syntax for from, the following fields are available:

  • @id (string, required) - Ledger reference (e.g., mydb:main, mydb:main@t:100)
  • alias (string, optional) - Dataset-local alias for GRAPH pattern reference
  • graph (string, optional) - Graph selector: "default", "txn-meta", or a full IRI
  • t (integer, optional) - Time-travel: specific transaction number
  • iso (string, optional) - Time-travel: ISO-8601 timestamp
  • commit_id (string, optional) - Time-travel: commit ContentId
  • policy (object, optional) - Per-source policy override (see below)

Legacy format: The array format "from-named": [...] with "alias" and "graph" fields is still accepted for backward compatibility. The "fromNamed" object format is preferred.

Dataset-Local Aliases

Aliases provide short names for referencing graphs in query patterns. They are especially useful when:

  1. Same graph IRI exists in multiple ledgers - Use distinct aliases to disambiguate
  2. Complex IRIs - Use short aliases instead of repeating long IRIs

Example: Disambiguating same graph IRI across ledgers

{
  "@context": { "ex": "http://example.org/ns/" },
  "fromNamed": {
    "salesProducts": {
      "@id": "sales:main",
      "@graph": "http://example.org/vocab#products"
    },
    "inventoryProducts": {
      "@id": "inventory:main",
      "@graph": "http://example.org/vocab#products"
    }
  },
  "select": ["?g", "?sku", "?data"],
  "where": [
    ["graph", "?g", { "@id": "?sku", "ex:data": "?data" }]
  ]
}

In this example, both ledgers have a graph with the same IRI (http://example.org/vocab#products). The aliases salesProducts and inventoryProducts (the object keys) allow you to reference them distinctly.

Validation Rules:

  • Aliases must be unique across the entire dataset (both from and fromNamed)
  • Aliases cannot collide with identifiers (the @id values)
  • Duplicate aliases will cause an error
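
These rules amount to a uniqueness check over one shared namespace of aliases and identifiers. An illustrative checker (not Fluree's actual validation code):

```python
def validate_aliases(from_ids, from_named):
    """Reject aliases that repeat or collide with @id values.
    from_ids: ledger @id strings; from_named: {alias: source object}."""
    seen = set(from_ids)
    for alias in from_named:
        if alias in seen:
            raise ValueError(f"Duplicate dataset-local alias: '{alias}'")
        seen.add(alias)
    return True

ok = validate_aliases(
    ["mydb:main"],
    {"salesProducts": {"@id": "sales:main"},
     "inventoryProducts": {"@id": "inventory:main"}},
)
```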

Graph Selector Values

The graph field accepts three types of values:

  • "default" - Explicitly select the ledger’s default graph
  • "txn-meta" - Select the built-in transaction metadata graph (urn:fluree:{ledger_id}#txn-meta)
  • "<full-iri>" - Select a user-defined named graph by its full IRI

Note: If using #txn-meta fragment syntax in @id, do not also specify graph: "txn-meta". This is considered ambiguous and will return an error.

Per-Source Policy Override

Each graph source can have its own policy, enabling fine-grained access control where different graphs in the same query use different policies.

Policy object fields:

  • identity: Identity IRI string
  • policy-class: Policy class IRI or array of IRIs
  • policy: Inline policy JSON
  • policy-values: Policy parameter values
  • default-allow: Boolean (default: false). Governs access when no policies match. Ignored (forced false) if identity is specified but has no subject node in the ledger.

Example:

{
  "from": [
    {
      "@id": "public:main",
      "policy": {
        "default-allow": true
      }
    },
    {
      "@id": "sensitive:main",
      "policy": {
        "identity": "did:fluree:alice",
        "policy-class": ["ex:EmployeePolicy"],
        "default-allow": false
      }
    }
  ],
  "select": ["?data"],
  "where": [{ "@id": "?s", "ex:data": "?data" }]
}

Policy Precedence:

  • Per-source policy takes precedence over global opts policy
  • If a source has no policy field, the global policy (if any) applies

Multi-Ledger Queries

Query across different ledgers:

JSON-LD Query:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": ["customers:main", "orders:main"],
  "select": ["?customer", "?order"],
  "where": [
    { "@id": "?customer", "ex:name": "Alice" },
    { "@id": "?order", "ex:customer": "?customer" }
  ]
}

SPARQL:

PREFIX ex: <http://example.org/ns/>

SELECT ?customer ?order
FROM <customers:main>
FROM <orders:main>
WHERE {
  ?customer ex:name "Alice" .
  ?order ex:customer ?customer .
}

Time-Aware Datasets

Query graphs at different time points:

JSON-LD Query:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": ["ledger1:main@t:100", "ledger2:main@t:200"],
  "select": ["?data"],
  "where": [
    { "@id": "?entity", "ex:data": "?data" }
  ]
}

SPARQL:

PREFIX ex: <http://example.org/ns/>

SELECT ?data
FROM <ledger1:main@t:100>
FROM <ledger2:main@t:200>
WHERE {
  ?entity ex:data ?data .
}

Graph Patterns

Default Graph Only

Query only the default graph:

SPARQL:

SELECT ?name
FROM <mydb:main>
WHERE {
  ?person ex:name ?name .
  # Matches triples in default graph only
}

Named Graph Only

Query only named graphs:

SPARQL:

SELECT ?name
FROM NAMED <mydb:main>
WHERE {
  GRAPH <mydb:main> {
    ?person ex:name ?name .
  }
}

Mixed Patterns

Combine default and named graph patterns:

SPARQL:

PREFIX f: <https://ns.flur.ee/db#>
PREFIX ex: <http://example.org/ns/>

SELECT ?name ?commit ?t
FROM <mydb:main>
FROM NAMED <mydb:main#txn-meta>
WHERE {
  ?person ex:name ?name .
  GRAPH <mydb:main#txn-meta> {
    ?commit f:t ?t .
  }
}

Use Cases

Data Integration

Combine data from multiple sources:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": ["customers:main", "products:main", "orders:main"],
  "select": ["?customer", "?product", "?order"],
  "where": [
    { "@id": "?customer", "ex:name": "Alice" },
    { "@id": "?order", "ex:customer": "?customer" },
    { "@id": "?order", "ex:product": "?product" }
  ]
}

Cross-Ledger Joins

Join data across different ledgers:

PREFIX ex: <http://example.org/ns/>

SELECT ?customer ?order ?product
FROM <customers:main>
FROM <orders:main>
FROM <products:main>
WHERE {
  ?customer ex:name "Alice" .
  ?order ex:customer ?customer .
  ?order ex:product ?product .
}

SERVICE for Cross-Ledger Queries

Use SPARQL SERVICE to explicitly target specific ledgers within a query:

PREFIX ex: <http://example.org/ns/>

SELECT ?customerName ?productName ?quantity
FROM <customers:main>
FROM NAMED <orders:main>
FROM NAMED <products:main>
WHERE {
  # Get customer from default graph
  ?customer ex:name ?customerName .

  # Get orders from orders ledger
  SERVICE <fluree:ledger:orders:main> {
    ?order ex:customer ?customer ;
           ex:product ?product ;
           ex:quantity ?quantity .
  }

  # Get product details from products ledger
  SERVICE <fluree:ledger:products:main> {
    ?product ex:name ?productName .
  }
}

SERVICE provides explicit control over which ledger each pattern executes against, enabling complex cross-ledger joins with clear data provenance.

See SPARQL Service Queries for full documentation.

Time-Consistent Queries

Query multiple ledgers at the same point in time:

{
  "@context": { "ex": "http://example.org/ns/" },
  "from": [
    "products:main@t:1000",
    "inventory:main@t:1000",
    "pricing:main@t:1000"
  ],
  "select": ["?product", "?stock", "?price"],
  "where": [
    { "@id": "?product", "ex:stockLevel": "?stock" },
    { "@id": "?product", "ex:price": "?price" }
  ]
}

Error Handling

Common Dataset Errors

  • Duplicate alias - Cause: the same alias is used twice in the dataset spec. Resolution: use unique aliases for each source.
  • Alias collision - Cause: an alias matches an existing @id. Resolution: choose a different alias name.
  • Ambiguous graph selector - Cause: both a #txn-meta fragment and a graph field are specified. Resolution: use only one method.
  • Unknown ledger - Cause: ledger reference not found. Resolution: verify the ledger exists and is accessible.
  • Unknown graph IRI - Cause: graph IRI not found in the ledger. Resolution: verify the graph was ingested and indexed.
  • Binary index required - Cause: the named graph query requires a binary index. Resolution: ensure the ledger has been indexed.

Example Error Messages

Duplicate alias:

{
  "error": "Duplicate dataset-local alias: 'products' appears multiple times"
}

Ambiguous graph selector:

{
  "error": "Ambiguous graph selector: cannot specify both #txn-meta fragment and graph field"
}

SPARQL Execution Modes

Fluree supports two SPARQL execution modes:

Ledger-Bound Mode

When a query targets a single ledger (via endpoint or single FROM clause), GRAPH patterns reference named graphs within that ledger:

# Ledger-bound: GRAPH references graphs inside mydb:main
PREFIX ex: <http://example.org/ns/>

SELECT ?name ?price
FROM <mydb:main>
WHERE {
  GRAPH <http://example.org/graphs/products> {
    ?product ex:name ?name ;
             ex:price ?price .
  }
}

Connection-Bound Mode

When querying across multiple ledgers, use SERVICE to select which ledger each pattern executes against:

# Connection-bound: SERVICE selects the target ledger
PREFIX ex: <http://example.org/ns/>

SELECT ?name ?stock
WHERE {
  SERVICE <fluree:ledger:sales:main> {
    GRAPH <http://example.org/graphs/products> {
      ?product ex:name ?name .
    }
  }
  SERVICE <fluree:ledger:inventory:main> {
    ?product ex:stock ?stock .
  }
}

When to use each mode:

  • Ledger-bound: Single ledger queries, standard SPARQL datasets within one ledger
  • Connection-bound: Multi-ledger queries, explicit control over data provenance

Best Practices

  1. Consistent Time Points: Use the same time specifier for all graphs in a query
  2. Graph Selection: Use FROM NAMED when you need to identify the source graph
  3. Use Aliases: Create meaningful aliases for complex graph IRIs or disambiguation
  4. Performance: Queries across multiple ledgers may be slower
  5. Data Locality: Consider data locality when designing multi-ledger queries
  6. Policy Granularity: Use per-source policy when different graphs need different access control

CONSTRUCT Queries

CONSTRUCT queries generate RDF graphs from query results, enabling you to transform and reshape data into new graph structures.

Overview

CONSTRUCT queries return RDF graphs instead of variable bindings. They’re useful for:

  • Extracting subgraphs
  • Transforming data structures
  • Creating new graph views
  • Generating RDF for export

Basic CONSTRUCT

SPARQL CONSTRUCT

PREFIX ex: <http://example.org/ns/>

CONSTRUCT {
  ?person ex:displayName ?name .
}
WHERE {
  ?person ex:name ?name .
}

This generates a new graph with ex:displayName properties from ex:name values.
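Conceptually, CONSTRUCT matches the WHERE patterns and then instantiates the template once per solution, deduplicating the resulting triples. A minimal Python sketch of that idea (illustrative only — the data, pattern encoding, and function names here are made up, not Fluree's implementation):

```python
# Sketch: CONSTRUCT = match WHERE patterns, then instantiate the template
# once per solution, deduplicating the resulting triples.
# Data and pattern encoding are illustrative.

data = [
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:name", "Bob"),
]

def match(pattern, triples):
    """Yield bindings for a single (?s, predicate, ?o) pattern."""
    s_var, pred, o_var = pattern
    for s, p, o in triples:
        if p == pred:
            yield {s_var: s, o_var: o}

def construct(template, pattern, triples):
    """Instantiate the template for every solution; a set deduplicates."""
    out = set()
    for binding in match(pattern, triples):
        s_var, pred, o_var = template
        out.add((binding[s_var], pred, binding[o_var]))
    return out

result = construct(("?person", "ex:displayName", "?name"),
                   ("?person", "ex:name", "?name"),
                   data)
# Two new triples using ex:displayName instead of ex:name
```

The returned value is a graph (a set of triples), not a table of variable bindings — which is exactly the difference between CONSTRUCT and SELECT.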

Multiple Triples

Construct multiple triples per solution:

PREFIX ex: <http://example.org/ns/>

CONSTRUCT {
  ?person ex:displayName ?name .
  ?person ex:hasAge ?age .
}
WHERE {
  ?person ex:name ?name .
  ?person ex:age ?age .
}

Complex Patterns

Conditional Construction

Use filters to conditionally construct triples:

PREFIX ex: <http://example.org/ns/>

CONSTRUCT {
  ?person ex:status ex:Adult .
}
WHERE {
  ?person ex:age ?age .
  FILTER (?age >= 18)
}

Transitive Relationships

Construct inferred relationships:

PREFIX ex: <http://example.org/ns/>

CONSTRUCT {
  ?person ex:knows ?friendOfFriend .
}
WHERE {
  ?person ex:friend ?friend .
  ?friend ex:friend ?friendOfFriend .
}

CONSTRUCT with Aggregation

Construct triples from aggregated data:

PREFIX ex: <http://example.org/ns/>

CONSTRUCT {
  ?category ex:productCount ?count .
}
WHERE {
  {
    SELECT ?category (COUNT(?product) AS ?count)
    WHERE {
      ?product ex:category ?category .
    }
    GROUP BY ?category
  }
}

Use Cases

Extract Subgraph

Extract a subgraph for a specific entity:

PREFIX ex: <http://example.org/ns/>

CONSTRUCT {
  ?s ?p ?o .
}
WHERE {
  ex:alice ?p ?o .
  BIND (ex:alice AS ?s)
}

Transform Data Structure

Transform data into a different structure:

PREFIX ex: <http://example.org/ns/>

CONSTRUCT {
  ?order ex:hasItem [
    ex:product ?product ;
    ex:quantity ?quantity
  ] .
}
WHERE {
  ?order ex:item ?item .
  ?item ex:product ?product .
  ?item ex:quantity ?quantity .
}

Generate Inferred Facts

Generate inferred relationships:

PREFIX ex: <http://example.org/ns/>

CONSTRUCT {
  ?person ex:ancestor ?ancestor .
}
WHERE {
  ?person ex:parent+ ?ancestor .
}

Best Practices

  1. Specific Patterns: Construct specific patterns rather than wildcards
  2. Filter Early: Apply filters in WHERE clause, not CONSTRUCT
  3. Avoid Duplicates: Use DISTINCT if needed
  4. Performance: CONSTRUCT can be expensive for large result sets

Graph Crawl

Graph crawl enables recursive traversal of relationships — following links between entities to discover connected data. This is built on property paths, which provide operators for transitive, inverse, and multi-predicate traversal.

Overview

Graph crawl queries traverse relationships in the graph, following links from one entity to another. Common use cases:

  • Social networks — Find friends-of-friends, influence chains
  • Organizational hierarchies — Traverse reporting structures
  • Knowledge graphs — Follow related concepts across multiple hops
  • Dependency graphs — Trace transitive dependencies
  • Bill of materials — Recursive part containment

Property path operators

Property paths are the foundation of graph crawl. They let you follow relationships beyond a single hop.

| Operator | Syntax | Description | Example |
|---|---|---|---|
| One or more (+) | ex:knows+ | Follow 1+ times (transitive closure) | Friends of friends |
| Zero or more (*) | ex:knows* | Follow 0+ times (includes self) | Self and all reachable |
| Inverse (^) | ^ex:reportsTo | Follow in reverse direction | Who reports to me? |
| Alternative (\|) | ex:knows\|ex:colleague | Match any of several predicates | Social or professional connections |
| Sequence (/) | ex:knows/ex:name | Chain of predicates | Names of friends |

JSON-LD Query syntax

Property paths are defined using @path in the @context:

{
  "@context": {
    "ex": "http://example.org/",
    "allReports": { "@path": "^ex:reportsTo+" }
  },
  "select": ["?name"],
  "where": [
    { "@id": "ex:ceo", "allReports": "?person" },
    { "@id": "?person", "ex:name": "?name" }
  ]
}

Two syntax forms are available:

String form (SPARQL-style):

"knowsTransitive": { "@path": "ex:knows+" }

Array form (S-expression):

"knowsTransitive": { "@path": ["+", "ex:knows"] }

SPARQL syntax

Property paths are native SPARQL syntax:

PREFIX ex: <http://example.org/>

# All people reachable through ex:knows (1+ hops)
SELECT ?person WHERE {
  ex:alice ex:knows+ ?person .
}

Patterns

Friend-of-friend network

Find everyone Alice knows, directly or transitively:

SPARQL:

PREFIX ex: <http://example.org/>
PREFIX schema: <http://schema.org/>

SELECT ?name WHERE {
  ex:alice ex:knows+ ?person .
  ?person schema:name ?name .
}

JSON-LD:

{
  "@context": {
    "ex": "http://example.org/",
    "schema": "http://schema.org/",
    "knowsTransitive": { "@path": "ex:knows+" }
  },
  "select": ["?name"],
  "where": [
    { "@id": "ex:alice", "knowsTransitive": "?person" },
    { "@id": "?person", "schema:name": "?name" }
  ]
}

Organizational hierarchy

Find all people who report to a manager (at any level):

PREFIX ex: <http://example.org/>
PREFIX schema: <http://schema.org/>

SELECT ?name WHERE {
  ?person ex:reportsTo+ ex:vp-engineering .
  ?person schema:name ?name .
}

Or use inverse path to start from the top:

SELECT ?name WHERE {
  ex:vp-engineering ^ex:reportsTo+ ?person .
  ?person schema:name ?name .
}

Class hierarchy (RDFS)

Find all subclasses of a class:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/>

SELECT ?subclass WHERE {
  ?subclass rdfs:subClassOf+ ex:Vehicle .
}

Path chaining (sequence)

Follow a chain of different predicates:

PREFIX ex: <http://example.org/>
PREFIX schema: <http://schema.org/>

# Names of Alice's friends' managers
SELECT ?managerName WHERE {
  ex:alice ex:knows/ex:reportsTo ?manager .
  ?manager schema:name ?managerName .
}

In JSON-LD:

{
  "@context": {
    "ex": "http://example.org/",
    "schema": "http://schema.org/",
    "friendManager": { "@path": "ex:knows/ex:reportsTo" }
  },
  "select": ["?managerName"],
  "where": [
    { "@id": "ex:alice", "friendManager": "?manager" },
    { "@id": "?manager", "schema:name": "?managerName" }
  ]
}

Multi-relationship traversal

Follow any of several relationship types:

PREFIX ex: <http://example.org/>
PREFIX schema: <http://schema.org/>

# People connected by friendship OR professional relationship
SELECT ?name WHERE {
  ex:alice (ex:knows|ex:colleague)+ ?person .
  ?person schema:name ?name .
}

Self-inclusive traversal (zero or more)

Use * to include the starting node:

PREFIX ex: <http://example.org/>
PREFIX schema: <http://schema.org/>

# Alice and everyone she knows (transitively)
SELECT ?name WHERE {
  ex:alice ex:knows* ?person .
  ?person schema:name ?name .
}

With *, Alice herself is included in results (zero hops). With +, only her connections are returned.

Inverse relationships

Find who links to a given entity:

PREFIX ex: <http://example.org/>
PREFIX schema: <http://schema.org/>

# Who has Alice as a friend?
SELECT ?name WHERE {
  ?person ex:knows ex:alice .
  ?person schema:name ?name .
}

# Same thing using inverse path syntax
SELECT ?name WHERE {
  ex:alice ^ex:knows ?person .
  ?person schema:name ?name .
}

Inverse paths are especially useful in transitive queries:

PREFIX ex: <http://example.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# All ancestors in a taxonomy
SELECT ?ancestor WHERE {
  ex:goldRetriever rdfs:subClassOf+ ?ancestor .
}

# All descendants (inverse)
SELECT ?descendant WHERE {
  ex:animal ^rdfs:subClassOf+ ?descendant .
}

Performance considerations

Property path cost

| Operator | Cost | Notes |
|---|---|---|
| Simple predicate | O(log n) | Single index lookup |
| Sequence (/) | O(k * log n) | k joins, each indexed |
| One-or-more (+) | O(reachable * log n) | Breadth-first expansion |
| Zero-or-more (*) | O(reachable * log n) | Same as + plus start node |
| Alternative (\|) | O(sum of alternatives) | Each alternative evaluated |
| Inverse (^) | O(log n) | Uses OPST index |

Transitive operators (+, *) expand breadth-first and track visited nodes to detect cycles. The cost is proportional to the number of reachable nodes, not the total graph size.
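The breadth-first expansion with a visited set can be sketched as follows (illustrative Python; the edge data is made up, and Fluree's actual traversal works over index scans, not in-memory dicts):

```python
from collections import deque

# Sketch of transitive (+) expansion: breadth-first from a start node,
# tracking visited nodes so cycles terminate and cost stays proportional
# to the reachable set, not the whole graph.
edges = {
    "alice": ["bob"],
    "bob": ["carol"],
    "carol": ["alice"],   # cycle back to alice
    "dave": ["eve"],      # unreachable from alice
}

def one_or_more(start, edges):
    """All nodes reachable in 1+ hops (the + operator)."""
    visited, queue = set(), deque(edges.get(start, []))
    while queue:
        node = queue.popleft()
        if node in visited:
            continue          # already expanded: cycle or diamond
        visited.add(node)
        queue.extend(edges.get(node, []))
    return visited

print(one_or_more("alice", edges))  # {'bob', 'carol', 'alice'}
```

A `*` traversal is the same expansion plus the start node included unconditionally (zero hops). Note that "alice" appears in the `+` result here only because the cycle makes her reachable from herself.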

Optimizing traversals

  1. Start from the specific side — If you know one endpoint, start there. ex:alice ex:knows+ ?person is faster than ?person ex:knows+ ex:alice because it anchors the traversal.

  2. Add filters after traversal — Filter the results of a traversal rather than trying to filter during:

    SELECT ?name WHERE {
      ex:alice ex:knows+ ?person .
      ?person schema:name ?name .
      ?person ex:department "Engineering" .
    }
    
  3. Use + over * when possible — * includes the start node and typically has one more step to evaluate.

  4. Prefer sequence over transitive for known depth — If you know the relationship is exactly 2 hops, use a sequence (ex:a/ex:b) or two explicit patterns instead of ex:a+.

  5. Combine with LIMIT — For exploration, limit results to avoid materializing the full reachable set:

    SELECT ?person WHERE {
      ex:alice ex:knows+ ?person .
    } LIMIT 100
    

Cycle handling

Fluree’s property path engine tracks visited nodes during transitive expansion. If a cycle is encountered (e.g., A knows B knows C knows A), the traversal stops at the already-visited node. This prevents infinite loops without requiring user intervention.

Property paths vs. explicit patterns

For fixed-depth queries, explicit patterns are equivalent and sometimes clearer:

Property path (2 hops):

SELECT ?fof WHERE {
  ex:alice ex:knows/ex:knows ?fof .
}

Explicit patterns (2 hops):

SELECT ?fof WHERE {
  ex:alice ex:knows ?friend .
  ?friend ex:knows ?fof .
}

Both produce the same results. Use property paths when:

  • The depth is variable or unknown (transitive closure)
  • You want compact syntax for chains
  • You need alternative or inverse traversal

Use explicit patterns when:

  • The depth is fixed and small
  • You need to bind intermediate variables (e.g., ?friend above)
  • You want maximum clarity

Explain Plans

Explain plans provide insight into how the query planner reorders WHERE-clause patterns, helping you understand optimization decisions and diagnose performance issues.

Overview

Explain plans show:

  • Whether patterns were reordered and why
  • Whether database statistics were available for optimization
  • The cardinality category and cost estimate assigned to each pattern
  • The original vs. optimized pattern order
  • Execution strategy hints for special fast paths such as fused property-join stars

Requesting Explain Plans

JSON-LD Query

Use the /fluree/explain endpoint (or the CLI fluree query --explain ...) to get a plan without executing. For JSON-LD, the explain request body is the same as a normal JSON-LD query body.

{
  "@context": { "ex": "http://example.org/ns/" },
  "select": ["?name"],
  "where": [
    { "@id": "?person", "ex:name": "?name" }
  ],
  "from": "mydb:main"
}

SPARQL

Use the explain endpoint with SPARQL content type:

PREFIX ex: <http://example.org/ns/>

SELECT ?name
WHERE {
  ?person ex:name ?name .
}

How the Query Planner Works

The query planner reorders WHERE-clause patterns to minimize the number of intermediate rows flowing through the execution pipeline. It uses a greedy algorithm that places patterns one at a time, choosing the cheapest eligible pattern at each step.

Pattern Categories

Every pattern is classified into one of four cardinality categories:

| Category | Meaning | Patterns |
|---|---|---|
| Source | Produces rows (estimated row count) | Triple, VALUES, UNION, Subquery, IndexSearch, VectorSearch, GeoSearch, S2Search, Graph, PropertyPath, R2rml, Service |
| Reducer | Shrinks the stream (multiplier < 1.0) | MINUS, EXISTS, NOT EXISTS |
| Expander | Grows the stream (multiplier >= 1.0) | OPTIONAL |
| Deferred | No cardinality effect | FILTER, BIND |

Placement Priority

The greedy loop places patterns in this priority order:

  1. Eligible reducers (lowest multiplier first) — shrink the stream as early as possible.
  2. Sources (lowest row count first, preferring patterns that join on already-bound variables) — most selective first.
  3. Eligible expanders (lowest multiplier first) — defer row expansion until prerequisite variables are bound.

A reducer or expander is “eligible” when at least one of its variables is already bound by a previously placed pattern.

FILTER and BIND patterns are integrated into the greedy loop: after each source, reducer, or expander is placed, any deferred patterns whose input variables are now satisfied are drained in original-position order. For BIND patterns, only the expression’s input variables must be bound — the target variable is an output that feeds back into bound_vars, potentially enabling further deferred patterns to be placed immediately (cascading placement).
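The greedy loop with deferred-pattern draining can be sketched in a few lines of Python (a toy model — the pattern shapes, row counts, and field names are illustrative, not Fluree's internal representation):

```python
# Toy sketch of the greedy placement loop described above: place the
# cheapest source first, then drain any deferred FILTER/BIND whose input
# variables are now bound. BIND outputs feed back into the bound set.

patterns = [
    {"kind": "source", "vars": {"?s", "?name"}, "rows": 10000, "text": "?s :name ?name"},
    {"kind": "source", "vars": {"?s", "?age"}, "rows": 5000, "text": "?s :age ?age"},
    {"kind": "deferred", "inputs": {"?age"}, "outputs": set(), "text": "FILTER(?age > 25)"},
]

def plan(patterns):
    placed, bound = [], set()
    sources = sorted((p for p in patterns if p["kind"] == "source"),
                     key=lambda p: p["rows"])          # most selective first
    deferred = [p for p in patterns if p["kind"] == "deferred"]
    for src in sources:
        placed.append(src["text"])
        bound |= src["vars"]
        # Drain deferred patterns whose inputs are now satisfied.
        for d in [d for d in deferred if d["inputs"] <= bound]:
            placed.append(d["text"])
            bound |= d["outputs"]   # BIND outputs cascade
            deferred.remove(d)
    return placed

print(plan(patterns))
# ['?s :age ?age', 'FILTER(?age > 25)', '?s :name ?name']
```

This reproduces the reordering shown in the explain output later in this page: the cheaper `:age` scan is placed first, the filter drains immediately after its input variable binds, and the broader `:name` scan runs last.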

Compound Pattern Nesting

When a deferred pattern (FILTER or BIND) becomes ready and the last placed pattern is a compound pattern (UNION, Graph, or Service), the planner nests the deferred pattern into the compound pattern’s inner lists instead of appending it after. This enables the deferred pattern to participate in the compound pattern’s inner reorder_patterns pipeline, unlocking:

  • Optimal placement after the specific triple that binds its variable
  • Range-safe filter pushdown to index scans
  • Inline evaluation during joins

Nesting occurs only when the compound pattern guarantees the deferred pattern’s required variable is bound:

| Compound | Nest? | Guarantee |
|---|---|---|
| UNION | Yes | Variable must appear in the intersection of all branches |
| Graph | Yes | Variable is in inner patterns or is the graph name variable |
| Service | Yes | Variable is in inner patterns or is the endpoint variable |
| OPTIONAL | No | Left-join: inner vars may be Unbound |
| MINUS | No | Anti-join: inner vars not exported to outer scope |
| EXISTS | No | Filter-only: inner vars not exported |
| NOT EXISTS | No | Filter-only: inner vars not exported |

For UNION, the deferred pattern is cloned into every branch. For Graph and Service, it is appended to the inner pattern list. Recursion is handled naturally: when a nested filter lands inside a branch containing another compound pattern, the branch’s reorder_patterns call applies the same logic.

Bound-Variable-Aware Estimation

The planner tracks which variables become bound as each pattern is placed. This significantly affects estimates for subsequent patterns:

  • A triple ?s :name ?name with ?s unbound is a property scan — estimated at the full property count (or a 1000-row fallback).
  • The same triple with ?s already bound from an earlier pattern is a per-subject lookup — estimated at count / ndv_subjects (typically ~10 rows).

This context-aware scoring also applies inside compound patterns: UNION branches and subqueries receive database statistics and use the same selectivity model for their inner patterns.

Statistics-Based vs. Fallback Scoring

When a StatsView is available (after at least one indexing cycle), the planner uses HLL-derived property statistics:

  • count: total number of triples for this predicate
  • ndv_subjects: number of distinct subjects
  • ndv_values: number of distinct objects

Without statistics, the planner falls back to heuristic constants:

| Pattern Type | Fallback Estimate |
|---|---|
| ExactMatch | 1 |
| BoundSubject | 10 |
| BoundObject | 1,000 |
| PropertyScan | 1,000 |
| FullScan | 1e12 |
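The two-tier estimation can be sketched as a small function (illustrative — the shape names and function signature are made up for this example; only the constants and the count / ndv_subjects formula come from the text above):

```python
# Sketch of the two-tier estimate: use HLL-derived statistics when a
# StatsView is available, otherwise fall back to heuristic constants.

FALLBACK = {"exact": 1, "bound_subject": 10, "bound_object": 1_000,
            "property_scan": 1_000, "full_scan": 1e12}

def estimate(shape, stats=None):
    if stats is None:
        return FALLBACK[shape]
    if shape == "property_scan":
        return stats["count"]                          # full property count
    if shape == "bound_subject":
        return stats["count"] / stats["ndv_subjects"]  # per-subject lookup
    return FALLBACK[shape]

# With stats: 100k triples over 10k distinct subjects -> ~10 rows/subject
stats = {"count": 100_000, "ndv_subjects": 10_000}
print(estimate("bound_subject", stats))   # 10.0
print(estimate("full_scan"))              # heuristic fallback: 1e12
```

The key point: the same triple pattern gets a very different estimate depending on whether its subject is already bound, which is why placement order and statistics interact.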

Search and Graph Source Estimates

Search patterns (IndexSearch, VectorSearch, GeoSearch, S2Search) use their limit field when present. Without an explicit limit, the planner assumes a default of 100 rows. Graph patterns recursively estimate their inner patterns. Service patterns use a very high estimate (1e12) so they are placed last among sources, minimizing data sent to the remote endpoint.

Reading Explain Output

The explain plan shows two sections: the original pattern order and the optimized order. Each pattern is annotated with its category and estimate.

Example output for a multi-pattern query:

=== Query Optimization Explain (Generalized) ===

Statistics available: yes
Optimization: patterns reordered

--- Original Pattern Order ---
  [1] ?s :age ?age | category=Source row_count=5000
  [2] ?s :name ?name | category=Source row_count=10000
  [3] FILTER((> ?age 25)) | category=Deferred
  [4] OPTIONAL { ?s :email ?email } | category=Expander multiplier=1.00

--- Optimized Pattern Order ---
  [1] ?s :age ?age | category=Source row_count=5000
  [2] FILTER((> ?age 25)) | category=Deferred
  [3] ?s :name ?name | category=Source row_count=10000
  [4] OPTIONAL { ?s :email ?email } | category=Expander multiplier=1.00

Key things to look for:

  • Source row_count: Lower values are placed first. If a pattern with a high row count appears early, it may indicate missing statistics or an inherently broad pattern.
  • Reducer multiplier: Values below 1.0 indicate the fraction of rows that survive. A MINUS with multiplier 0.90 removes ~10% of rows.
  • Deferred placement: FILTERs and BINDs appear immediately after all of their input variables become bound. BIND outputs cascade — a BIND placed early can enable subsequent FILTERs or BINDs that depend on its target variable. If a FILTER appears late, check whether its variables could be bound sooner.
  • Statistics available: no: Without statistics, the planner uses conservative heuristics. Run at least one indexing cycle to enable statistics-based optimization.

Execution Hints

Explain responses may also include an execution-hints array. These are not generic cardinality estimates; they describe when the executor expects to use a specialized path after planning.

For the star-join work, look for:

  • property_join: the planner chose the same-subject property-join path
  • property_join_fused_star: the planner chose property join and also fused trailing same-subject single-triple OPTIONALs plus eligible trailing FILTER/BIND patterns into the same star operator

Typical fields include:

  • required_triples: number of required star predicates
  • fused_optional_triples: number of fused trailing OPTIONAL triples
  • fused_filters: number of trailing filters evaluated inside the star path
  • fused_binds: number of trailing binds evaluated inside the star path
  • width_score: weighted star width used by the property-join gate
  • optional_bonus: how much of the width score came from trailing optionals

This is the clearest signal that a query like:

?deal a crm:Deal ;
      crm:name ?name ;
      crm:amount ?amount ;
      crm:stage ?stage .
OPTIONAL { ?deal crm:probability ?probability }
OPTIONAL { ?deal crm:closedAt ?closedAt }
FILTER (!STRSTARTS(STR(?stage), "Closed"))

is using the fused two-pass star path rather than falling back to separate OPTIONAL and FILTER operators.

Indexes

Scan operations use one of four index permutations depending on which components of the triple pattern are bound:

  • SPOT: Subject-Predicate-Object-Time — used when the subject is bound
  • POST: Predicate-Object-Subject-Time — used for predicate+object lookups
  • OPST: Object-Predicate-Subject-Time — used for object-based lookups
  • PSOT: Predicate-Subject-Object-Time — used for full predicate scans

Filter Optimization

Filters are automatically optimized by the query engine in three ways:

  • Dependency-based placement: Filters and BINDs are placed as soon as all their input variables are bound, as part of the greedy reordering loop. BIND target variables feed back into the bound set, enabling cascading placement of dependent patterns.
  • Index pushdown: Range-safe filters (comparisons like >, <, >=, <= on indexed properties) are pushed down to the index scan, reducing the number of rows read.
  • Inline evaluation: Filters whose variables are all bound by a join are evaluated inside the join operator itself, avoiding the overhead of a separate filter pass.
  • BIND filter fusion: When a FILTER’s last required variable is the output of a BIND, the filter is fused into the BindOperator and evaluated inline after computing each row’s BIND value. Failing rows are dropped before materialization, eliminating a separate FilterOperator pass.
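
The BIND filter fusion can be pictured as a single pass that computes the bound value per row and applies the dependent filter inline (an illustrative Python sketch — the row shapes and function names are invented for this example):

```python
# Sketch of BIND/FILTER fusion: instead of a BindOperator pass followed
# by a separate FilterOperator pass, compute the BIND value per row and
# apply the dependent filter inline, dropping failing rows before they
# are ever materialized.

rows = [{"?price": 100, "?qty": 3}, {"?price": 5, "?qty": 2}]

def fused_bind_filter(rows, bind_var, bind_expr, predicate):
    out = []
    for row in rows:
        value = bind_expr(row)          # e.g. BIND(?price * ?qty AS ?total)
        if predicate(value):            # fused e.g. FILTER(?total > 50)
            out.append({**row, bind_var: value})
        # failing rows are dropped here, never materialized
    return out

result = fused_bind_filter(
    rows, "?total",
    lambda r: r["?price"] * r["?qty"],
    lambda total: total > 50,
)
print(result)  # [{'?price': 100, '?qty': 3, '?total': 300}]
```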

Best Practices

  1. Review plans for new queries: Use explain to verify that the planner chose a reasonable order, especially for queries with many patterns.
  2. Ensure statistics are available: Statistics enable much better estimates. If explain shows “Statistics available: no”, check that at least one indexing cycle has completed.
  3. Check for high row counts early in the plan: A source with a very high row count placed first can indicate a missing join variable or an overly broad pattern.
  4. Use LIMIT on search patterns: IndexSearch, VectorSearch, GeoSearch, and S2Search patterns use their limit field for cost estimation. Providing an explicit limit helps the planner place them more accurately.

Tracking and Fuel Limits

Fluree provides query tracking and fuel limits to monitor and control query execution, ensuring system stability and performance.

Query Tracking

Query tracking provides visibility into query execution, helping you understand query behavior and performance.

Enable Tracking

Enable tracking via the opts object. Use "meta": true to enable all tracking, or selectively enable specific metrics:

{
  "@context": { "ex": "http://example.org/ns/" },
  "select": ["?name"],
  "where": [
    { "@id": "?person", "ex:name": "?name" }
  ],
  "opts": { "meta": true }
}

Or enable specific metrics:

{
  "opts": {
    "meta": {
      "time": true,
      "fuel": true,
      "policy": true
    }
  }
}

Tracked Information

Tracking provides:

  • time: Query execution duration (formatted as “12.34ms”)
  • fuel: Total cost as a decimal value (rounded to 3 places)
  • policy: Policy evaluation statistics ({policy-id: {executed: N, allowed: M}})

Fuel Limits

Fuel limits control resource consumption, preventing runaway queries from consuming excessive resources.

What Is Fuel?

Fuel is a decimal measure of query/transaction cost. Internally it is accumulated as micro-fuel (1 fuel = 1000 micro-fuel) and reported back rounded to 3 decimal places. Costs reflect actual work — primarily I/O — rather than output cardinality.

Cost ladder (per event):

| Event | Cost (fuel) |
|---|---|
| Index leaflet touched (per scan batch, regardless of cache state) | 1.000 |
| Forward-dict touch (per dict-backed value resolved during result materialization) | 1.000 |
| Flake returned from a db.range call (e.g. SHACL graph reads, graph crawl) | 0.001 |
| Overlay/novelty row materialized | 0.001 |
| R2RML row emitted (Iceberg/Parquet) | 0.001 |
| Transaction commit baseline (once per commit) | 100.000 |
| Staged flake (per non-schema flake in a transaction) | 0.001 |
| REGEX / REPLACE evaluation | 0.001 |
| Hash function (MD5, SHA1, SHA256, SHA384, SHA512) | 0.001 |
| UUID / STRUUID | 0.001 |
| geof:distance | 0.001 |
| Vector similarity (DotProduct, CosineSimilarity, EuclideanDistance) | 0.002 |
| Fulltext (per-row BM25 scoring) | 0.005 |

Cheap operations (comparisons, arithmetic, type checks, simple string ops, datetime extraction, etc.) cost zero — instrumentation overhead would dwarf the actual cost.
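The accounting model can be sketched as a small meter (illustrative Python, assuming the 1 fuel = 1000 micro-fuel relationship above; the class and event names are invented, and only a few ladder entries are modeled):

```python
# Sketch of fuel accounting: costs accumulate as integer micro-fuel
# (1 fuel = 1000 micro-fuel) and are reported as a decimal rounded to
# 3 places. A limit check aborts execution when exceeded.

MICRO_COSTS = {
    "leaflet_touch": 1000,    # 1.000 fuel per index leaflet
    "range_flake": 1,         # 0.001 fuel per db.range flake
    "bm25_row": 5,            # 0.005 fuel per BM25-scored row
}

class FuelMeter:
    def __init__(self, max_fuel=None):
        self.micro = 0
        self.max_micro = None if max_fuel is None else int(max_fuel * 1000)

    def charge(self, event, n=1):
        self.micro += MICRO_COSTS[event] * n
        if self.max_micro is not None and self.micro > self.max_micro:
            raise RuntimeError("fuel limit exceeded")

    @property
    def fuel(self):
        return round(self.micro / 1000, 3)

m = FuelMeter(max_fuel=10)
m.charge("leaflet_touch", 3)   # 3 index leaflets touched
m.charge("range_flake", 42)    # 42 flakes from a db.range call
print(m.fuel)                  # 3.042
```

Integer micro-fuel avoids floating-point drift while still letting cheap events (0.001 fuel) accumulate meaningfully across large scans.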

Setting Fuel Limits

Set fuel limits via opts.max-fuel (decimal allowed). Setting a fuel limit implicitly enables fuel tracking:

{
  "@context": { "ex": "http://example.org/ns/" },
  "select": ["?name"],
  "where": [
    { "@id": "?person", "ex:name": "?name" }
  ],
  "opts": { "max-fuel": 10000 }
}

You can also use "maxFuel" or "max_fuel" as alternative key names. The HTTP equivalent is the fluree-max-fuel header.

Fuel Limit Behavior

When fuel limit is exceeded:

  • Query execution stops
  • Error returned to client
  • Partial results not returned

Response Format

When tracking is enabled, the response includes tracking information as top-level siblings:

{
  "status": 200,
  "result": [...],
  "time": "12.34ms",
  "fuel": 42.317,
  "policy": {
    "http://example.org/myPolicy": {
      "executed": 10,
      "allowed": 8
    }
  }
}

The fuel value is decimal with up to 3 places of precision. The HTTP x-fdb-fuel response header carries the same value.

Best Practices

Tracking

  1. Enable for Debugging: Use "opts": {"meta": true} to debug slow queries
  2. Monitor Performance: Track query performance over time
  3. Identify Bottlenecks: Use tracking to identify performance bottlenecks

Fuel Limits

  1. Set Appropriate Limits: Set fuel limits based on expected query complexity
  2. Monitor Fuel Usage: Track fuel usage to optimize queries
  3. Prevent Runaway Queries: Use fuel limits to prevent resource exhaustion

Query-Time Reasoning

This page covers how to enable and use reasoning in your queries. For background concepts see Reasoning and inference; for the full list of supported OWL/RDFS constructs see the OWL & RDFS reference.

The reasoning parameter

Add a "reasoning" key to any JSON-LD query to control which inference modes are active:

Single mode

{
  "@context": {"ex": "http://example.org/"},
  "select": ["?s"],
  "where": {"@id": "?s", "@type": "ex:Person"},
  "reasoning": "rdfs"
}

Multiple modes

{
  "select": ["?s"],
  "where": {"@id": "?s", "@type": "ex:Person"},
  "reasoning": ["rdfs", "owl2rl"]
}

Disable reasoning

{
  "select": ["?s"],
  "where": {"@id": "?s", "@type": "ex:Person"},
  "reasoning": "none"
}

Use "none" to suppress auto-enabled RDFS and any ledger-wide defaults.

Valid mode strings

| String | Aliases | Mode |
|---|---|---|
| "rdfs" | | RDFS subclass/subproperty expansion |
| "owl2ql" | "owl-ql", "owlql" | OWL 2 QL query rewriting (includes RDFS) |
| "owl2rl" | "owl-rl", "owlrl" | OWL 2 RL forward-chaining materialization |
| "datalog" | | Datalog rule execution |
| "none" | | Disable all reasoning |

Default behavior

When the reasoning key is absent from a query:

  • RDFS auto-enables if your data contains rdfs:subClassOf or rdfs:subPropertyOf hierarchies. This is lightweight (query rewriting only) and usually desirable.
  • OWL 2 QL, OWL 2 RL, and Datalog remain disabled unless enabled via ledger-wide configuration.

To override ledger defaults for a single query, use "reasoning": "none".

Examples

The examples below assume this schema and data have been transacted:

{
  "@context": {
    "ex": "http://example.org/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#"
  },
  "insert": [
    {"@id": "ex:Student", "rdfs:subClassOf": {"@id": "ex:Person"}},
    {"@id": "ex:GradStudent", "rdfs:subClassOf": {"@id": "ex:Student"}},
    {"@id": "ex:alice", "@type": "ex:GradStudent", "ex:name": "Alice"},
    {"@id": "ex:bob", "@type": "ex:Person", "ex:name": "Bob"},

    {"@id": "ex:livesWith", "@type": "owl:SymmetricProperty"},
    {"@id": "ex:alice", "ex:livesWith": {"@id": "ex:bob"}},

    {"@id": "ex:hasAncestor", "@type": "owl:TransitiveProperty"},
    {"@id": "ex:carol", "ex:hasAncestor": {"@id": "ex:dave"}},
    {"@id": "ex:dave", "ex:hasAncestor": {"@id": "ex:eve"}},

    {"@id": "ex:hasMother", "owl:inverseOf": {"@id": "ex:childOf"}},
    {"@id": "ex:frank", "ex:hasMother": {"@id": "ex:grace"}}
  ]
}

RDFS: subclass expansion

Query for all ex:Person instances — Alice is returned even though she was only typed as ex:GradStudent:

{
  "@context": {"ex": "http://example.org/"},
  "select": ["?name"],
  "where": {
    "@id": "?s", "@type": "ex:Person",
    "ex:name": "?name"
  },
  "reasoning": "rdfs"
}

Result: ["Alice", "Bob"]

Without reasoning (or with "reasoning": "none"), only "Bob" is returned because Alice’s explicit type is GradStudent, not Person.
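Since RDFS mode works by query rewriting, the expansion can be pictured as: replace the queried class with itself plus every transitive subclass, then match instances of any class in that set. An illustrative Python sketch mirroring the schema above (not Fluree's internal representation):

```python
# Sketch of RDFS query rewriting: expand the queried class to its
# transitive subclasses, then match instances of any expanded class.

subclass_of = {
    "ex:Student": "ex:Person",
    "ex:GradStudent": "ex:Student",
}
types = {"ex:alice": "ex:GradStudent", "ex:bob": "ex:Person"}

def expand(cls):
    """cls plus every class that transitively subclasses it."""
    out, changed = {cls}, True
    while changed:
        changed = False
        for sub, sup in subclass_of.items():
            if sup in out and sub not in out:
                out.add(sub)
                changed = True
    return out

def instances_of(cls):
    classes = expand(cls)
    return {s for s, t in types.items() if t in classes}

print(sorted(instances_of("ex:Person")))  # ['ex:alice', 'ex:bob']
```

No derived facts are stored: the data still only says Alice is a GradStudent, but the rewritten query matches her anyway. That is why RDFS mode has negligible overhead.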

OWL 2 RL: symmetric properties

Query who lives with Bob — Alice is inferred even though only alice livesWith bob was asserted:

{
  "@context": {"ex": "http://example.org/"},
  "select": ["?who"],
  "where": {"@id": "ex:bob", "ex:livesWith": "?who"},
  "reasoning": "owl2rl"
}

Result: ["ex:alice"]

OWL 2 RL: transitive properties

Query for all ancestors of Carol — Eve is inferred through transitivity:

{
  "@context": {"ex": "http://example.org/"},
  "select": ["?ancestor"],
  "where": {"@id": "ex:carol", "ex:hasAncestor": "?ancestor"},
  "reasoning": "owl2rl"
}

Result: ["ex:dave", "ex:eve"]

OWL 2 QL: inverse properties

Query childOf — inferred from the hasMother / inverseOf declaration:

{
  "@context": {"ex": "http://example.org/"},
  "select": ["?child"],
  "where": {"@id": "ex:grace", "ex:childOf": "?child"},
  "reasoning": "owl2ql"
}

Result: ["ex:frank"]

OWL 2 RL: domain and range inference

If your schema declares rdfs:domain and rdfs:range:

{
  "insert": [
    {"@id": "ex:teaches", "rdfs:domain": {"@id": "ex:Professor"},
                          "rdfs:range": {"@id": "ex:Course"}},
    {"@id": "ex:alice", "ex:teaches": {"@id": "ex:cs101"}}
  ]
}

Then with "reasoning": "owl2rl":

  • ex:alice rdf:type ex:Professor is inferred (from domain)
  • ex:cs101 rdf:type ex:Course is inferred (from range)

Combined modes

Enable RDFS + OWL 2 RL + Datalog together:

{
  "select": ["?s"],
  "where": {"@id": "?s", "@type": "ex:Person"},
  "reasoning": ["rdfs", "owl2rl", "datalog"],
  "rules": [
    {
      "@context": {"ex": "http://example.org/"},
      "where": {"@id": "?p", "ex:parent": {"ex:parent": "?gp"}},
      "insert": {"@id": "?p", "ex:grandparent": {"@id": "?gp"}}
    }
  ]
}

OWL 2 RL facts are materialized first, then Datalog rules run over the combined base + OWL data, and finally RDFS query rewriting is applied.

SPARQL

In SPARQL queries, reasoning is controlled via the Fluree-specific PRAGMA reasoning directive. Property paths (+, *, ^) provide a complementary mechanism for navigating transitive and inverse relationships directly in the query pattern — see SPARQL for details.

Interaction with ledger configuration

If f:reasoningDefaults is set in the ledger configuration graph (see Setting groups), those modes are the baseline for every query. The per-query reasoning parameter can:

  • Add modes — the query modes are merged with the defaults.
  • Disable all — "reasoning": "none" overrides the defaults entirely.

The f:overrideControl setting on the ledger config determines whether query-time overrides are allowed. See Override control for details.

Performance considerations

| Mode | Overhead | Caching |
|---|---|---|
| RDFS | Negligible — query rewriting only | N/A |
| OWL 2 QL | Negligible — query rewriting only | N/A |
| OWL 2 RL | First query materializes derived facts; subsequent queries use cache | LRU cache (16 entries), keyed on database state + reasoning modes |
| Datalog | Each unique rule set + database state combination is cached | Same LRU cache as OWL 2 RL |

Tips:

  • Start with RDFS if you only need class/property hierarchies — it has virtually zero overhead.
  • Use OWL 2 QL when you also need inverse properties and domain/range inference but want to stay in the query-rewriting approach.
  • Use OWL 2 RL when you need the full rule set (transitive, symmetric, functional properties, owl:sameAs, restrictions, property chains).
  • The materialization cache is invalidated when the underlying data changes (new transactions), so the first query after a write will re-materialize.

Related pages:

  • Conceptual introduction: Reasoning and inference
  • Custom inference rules: Datalog rules
  • Supported OWL & RDFS constructs: OWL & RDFS reference
  • Ledger-wide reasoning config: Setting groups

Datalog Rules

Datalog rules let you define custom inference logic that goes beyond what OWL and RDFS provide. Rules are expressed in a familiar JSON-LD pattern syntax with where (conditions) and insert (conclusions) clauses, and execute in a fixpoint loop that can chain rules together.

For background concepts see Reasoning and inference; for enabling reasoning in queries see Query-time reasoning.

Quick example

Infer a grandparent relationship from two parent hops:

{
  "@context": {"ex": "http://example.org/"},
  "select": ["?gp"],
  "where": {"@id": "ex:alice", "ex:grandparent": "?gp"},
  "reasoning": "datalog",
  "rules": [
    {
      "@context": {"ex": "http://example.org/"},
      "where": {"@id": "?person", "ex:parent": {"ex:parent": "?gp"}},
      "insert": {"@id": "?person", "ex:grandparent": {"@id": "?gp"}}
    }
  ]
}

The rule says: “For any ?person whose parent has a parent ?gp, insert that ?person has a grandparent ?gp.” The query then finds Alice’s grandparents using the inferred facts.

Rule format

Each rule is a JSON object with three required keys and one optional key:

  • @context (required): JSON-LD context for expanding compact IRIs
  • where (required): pattern(s) that must match for the rule to fire
  • insert (required): pattern(s) of new facts to derive when the rule fires
  • @id (optional): name/IRI for the rule, for documentation and debugging

Where clause

The where clause defines the conditions under which the rule fires. It follows the same pattern syntax as JSON-LD queries.

Single pattern:

"where": {"@id": "?person", "ex:parent": "?parent"}

Multiple patterns (implicit join on shared variables):

"where": [
  {"@id": "?person", "ex:parent": "?parent"},
  {"@id": "?parent", "ex:name": "?parentName"}
]

Nested patterns (shorthand for multi-hop traversal):

"where": {"@id": "?person", "ex:parent": {"ex:parent": "?gp"}}

This is equivalent to two patterns joined on an intermediate variable.

With filters:

"where": [
  {"@id": "?person", "ex:age": "?age"},
  ["filter", "(>= ?age 65)"]
]

Insert clause

The insert clause defines what facts to produce for each set of matching variable bindings.

"insert": {"@id": "?person", "ex:grandparent": {"@id": "?gp"}}
  • Variables (?person, ?gp) are replaced with the bound values from where.
  • Use {"@id": "?var"} for IRI/entity values; use "?var" directly for literal values.
  • Multiple triples can be generated from a single insert pattern.

Providing rules

Rules can be provided in two ways:

1. Query-time rules

Pass rules directly in the query via the rules array. This is the simplest approach and doesn’t require any prior setup:

{
  "select": ["?result"],
  "where": {"@id": "?s", "ex:derived": "?result"},
  "reasoning": "datalog",
  "rules": [ ... ]
}

Note: Providing a rules array automatically enables datalog reasoning — you don’t strictly need "reasoning": "datalog", though including it is recommended for clarity.

2. Database-stored rules

Rules can be stored in the database as f:rule assertions and referenced via ledger configuration. This is useful for rules that should apply consistently across all queries.

Store a rule:

{
  "@context": {
    "f": "https://ns.flur.ee/db#",
    "ex": "http://example.org/"
  },
  "insert": {
    "@id": "ex:grandparentRule",
    "f:rule": {
      "@context": {"ex": "http://example.org/"},
      "where": {"@id": "?person", "ex:parent": {"ex:parent": "?gp"}},
      "insert": {"@id": "?person", "ex:grandparent": {"@id": "?gp"}}
    }
  }
}

Configure the ledger to use stored rules:

{
  "insert": {
    "@id": "urn:fluree:mydb:main:config:ledger",
    "@type": "f:LedgerConfig",
    "f:datalogDefaults": {
      "f:datalogEnabled": true,
      "f:rulesSource": {
        "@type": "f:GraphRef",
        "f:graphSource": {"f:graphSelector": {"@id": "f:defaultGraph"}}
      },
      "f:allowQueryTimeRules": true
    }
  }
}

See Setting groups — datalogDefaults for full configuration options.

When both stored and query-time rules are present, they are merged and execute together in the same fixpoint loop.

Examples

Sibling inference

Infer siblings from shared parents:

{
  "@context": {"ex": "http://example.org/"},
  "select": ["?sibling"],
  "where": {"@id": "ex:alice", "ex:sibling": "?sibling"},
  "reasoning": "datalog",
  "rules": [
    {
      "@context": {"ex": "http://example.org/"},
      "where": [
        {"@id": "?person", "ex:parent": "?parent"},
        {"@id": "?sibling", "ex:parent": "?parent"}
      ],
      "insert": {"@id": "?person", "ex:sibling": {"@id": "?sibling"}}
    }
  ]
}

Note: This rule will also infer that a person is their own sibling. You could add a filter ["filter", "(!= ?person ?sibling)"] to exclude self-references.
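
Adding that filter turns the rule into one that excludes self-references; a sketch of the adjusted rule body:

```json
{
  "@context": {"ex": "http://example.org/"},
  "where": [
    {"@id": "?person", "ex:parent": "?parent"},
    {"@id": "?sibling", "ex:parent": "?parent"},
    ["filter", "(!= ?person ?sibling)"]
  ],
  "insert": {"@id": "?person", "ex:sibling": {"@id": "?sibling"}}
}
```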

Chained rules (uncle + aunt)

Multiple rules that build on each other:

{
  "@context": {"ex": "http://example.org/"},
  "select": ["?aunt"],
  "where": {"@id": "ex:alice", "ex:aunt": "?aunt"},
  "reasoning": "datalog",
  "rules": [
    {
      "@context": {"ex": "http://example.org/"},
      "where": {"@id": "?person", "ex:parent": {"ex:brother": "?uncle"}},
      "insert": {"@id": "?person", "ex:uncle": {"@id": "?uncle"}}
    },
    {
      "@context": {"ex": "http://example.org/"},
      "where": {
        "@id": "?person",
        "ex:uncle": {
          "ex:spouse": {"@id": "?aunt", "ex:gender": {"@id": "ex:Female"}}
        }
      },
      "insert": {"@id": "?person", "ex:aunt": {"@id": "?aunt"}}
    }
  ]
}

The second rule (aunt) depends on facts derived by the first rule (uncle). The fixpoint loop handles this automatically — it keeps iterating until no new facts are produced.

Rules with filters

Classify people by age:

{
  "@context": {"ex": "http://example.org/"},
  "select": ["?person"],
  "where": {"@id": "?person", "ex:status": "senior"},
  "reasoning": "datalog",
  "rules": [
    {
      "@context": {"ex": "http://example.org/"},
      "where": [
        {"@id": "?person", "ex:age": "?age"},
        ["filter", "(>= ?age 65)"]
      ],
      "insert": {"@id": "?person", "ex:status": "senior"}
    }
  ]
}

Combining with OWL reasoning

Datalog rules can build on OWL-derived facts. For example, use OWL 2 RL to materialize transitive and symmetric properties, then use Datalog for custom business logic:

{
  "select": ["?recommendation"],
  "where": {"@id": "ex:alice", "ex:recommended": "?recommendation"},
  "reasoning": ["owl2rl", "datalog"],
  "rules": [
    {
      "@context": {"ex": "http://example.org/"},
      "where": [
        {"@id": "?person", "ex:friend": "?friend"},
        {"@id": "?friend", "ex:likes": "?item"},
        {"@id": "?person", "ex:likes": "?item"}
      ],
      "insert": {"@id": "?person", "ex:recommended": {"@id": "?item"}}
    }
  ]
}

If ex:friend is declared as an owl:SymmetricProperty, OWL 2 RL materializes the reverse friendship links, and the Datalog rule can then find items liked by mutual friends.

Execution model

Fixpoint evaluation

Rules execute in a fixpoint loop:

  1. All rules are applied against the current data (base + previously derived facts).
  2. New facts produced in this iteration are collected.
  3. If any new facts were produced, go back to step 1 with the expanded fact set.
  4. When no new facts are produced (fixpoint reached), the loop terminates.

This means:

  • Recursive rules work. A rule can produce facts that trigger itself again.
  • Rule chaining works. Rule A can produce facts that trigger Rule B, and vice versa.
  • Termination is guaranteed by the budget controls (max iterations, max facts, max time, max memory).
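
As an illustration only (not Fluree's internal API), the loop above can be modeled in a few lines of Python, with each rule expressed as a function from a fact set to derived facts:

```python
def fixpoint(base_facts, rules, max_iterations=100):
    """Apply rules until no new facts are produced (or the iteration budget runs out)."""
    facts = set(base_facts)
    for _ in range(max_iterations):  # budget control: max iterations
        derived = set()
        for rule in rules:
            derived |= rule(facts)
        new = derived - facts
        if not new:        # fixpoint reached: nothing new this iteration
            return facts
        facts |= new
    return facts           # budget exhausted

def grandparent_rule(facts):
    """ex:parent of ex:parent => ex:grandparent (cf. the Datalog rule above)."""
    parents = {(s, o) for (s, p, o) in facts if p == "ex:parent"}
    return {(a, "ex:grandparent", c)
            for (a, b) in parents for (b2, c) in parents if b == b2}

base = {("ex:alice", "ex:parent", "ex:bob"),
        ("ex:bob", "ex:parent", "ex:carol")}
result = fixpoint(base, [grandparent_rule])
print(("ex:alice", "ex:grandparent", "ex:carol") in result)  # True
```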

Execution order

Rules are topologically sorted by their predicate dependencies: a rule that generates ex:uncle triples runs before a rule that consumes ex:uncle in its where clause. This minimizes the number of fixpoint iterations needed.

Interaction with OWL 2 RL

When both OWL 2 RL and Datalog are enabled:

  1. OWL 2 RL materialization runs first.
  2. Datalog rules run over the combined base data + OWL-derived facts.
  3. Both result sets are merged into a single overlay for query execution.

Filter expressions

Filters use S-expression syntax within the where array:

["filter", "(expression)"]

Available operators

  • Comparison: =, !=, <, >, <=, >=
  • Logical: and, or, not
  • Arithmetic: +, -, *, /
  • String: str, strlen, contains, strstarts, strends
  • Type checking: isIRI, isBlank, isLiteral, bound

Examples

["filter", "(> ?age 21)"]
["filter", "(and (>= ?age 18) (< ?age 65))"]
["filter", "(contains ?name \"Smith\")"]
["filter", "(!= ?person ?other)"]
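
To make the semantics concrete, here is a hypothetical evaluator for this S-expression subset in Python. The real parser is internal to Fluree; this sketch covers only the comparison and logical operators listed above:

```python
import operator

OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       ">": operator.gt, "<=": operator.le, ">=": operator.ge}

def tokenize(text):
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    token = tokens.pop(0)
    if token == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return node
    return token

def evaluate(ast, bindings):
    if isinstance(ast, list):
        op, *args = ast
        values = [evaluate(a, bindings) for a in args]
        if op == "and":
            return all(values)
        if op == "or":
            return any(values)
        if op == "not":
            return not values[0]
        return OPS[op](*values)
    if ast.startswith("?"):      # variable lookup
        return bindings[ast]
    try:
        return int(ast)          # numeric literal
    except ValueError:
        return ast.strip('"')    # string literal

expr = parse(tokenize("(and (>= ?age 18) (< ?age 65))"))
print(evaluate(expr, {"?age": 30}))  # True
```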

Performance considerations

  • Keep rules focused. Broad rules that match many patterns produce more derived facts and require more iterations.
  • Budget limits apply. The same time/fact/memory budgets as OWL 2 RL materialization apply to Datalog execution (default: 30s, 1M facts, 100MB).
  • Results are cached. The same rule set + database state returns instantly from cache on subsequent queries.
  • Query-time rules disable caching across queries with different rule sets, since the cache key includes a hash of the rules.
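
The caching behavior described above can be pictured as a small LRU map whose key combines database state and reasoning inputs. A hypothetical sketch — field names and key construction are illustrative, not Fluree internals:

```python
import hashlib
import json
from collections import OrderedDict

class ReasoningCache:
    """Tiny LRU cache keyed on (database t, modes, rules) — illustrative only."""

    def __init__(self, capacity=16):
        self.capacity = capacity
        self.entries = OrderedDict()

    @staticmethod
    def key(db_t, modes, rules):
        # A new transaction changes db_t, producing a different key (cache miss).
        blob = json.dumps({"t": db_t, "modes": sorted(modes), "rules": rules},
                          sort_keys=True)
        return hashlib.sha256(blob.encode()).hexdigest()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)   # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

cache = ReasoningCache()
k = ReasoningCache.key(42, ["owl2rl"], [])
cache.put(k, {"derived_facts": 120})
print(cache.get(k))  # {'derived_facts': 120}
```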

Related pages:

  • Conceptual introduction: Reasoning and inference
  • Enabling reasoning in queries: Query-time reasoning
  • OWL & RDFS constructs: OWL & RDFS reference
  • Ledger-wide config: Setting groups

Transactions

Transactions are how you write data to Fluree. This section covers all transaction patterns, formats, and behaviors.

Transaction Patterns

Overview

High-level introduction to Fluree transactions:

  • Transaction lifecycle
  • Commit process
  • Indexing pipeline
  • Transaction semantics

Insert

Adding new data to the database:

  • Basic inserts
  • Batch inserts
  • Entity creation
  • Relationship creation

Upsert

Idempotent transactions that replace values for supplied predicates:

  • Upsert semantics
  • Use cases for upsert
  • Idempotent operations
  • Synchronization patterns

Update (WHERE/DELETE/INSERT)

Targeted updates to existing data:

  • WHERE clause patterns
  • DELETE operations
  • INSERT operations
  • Conditional updates
  • Partial updates

Retractions

Removing data from the database:

  • Retract specific triples
  • Retract entire entities
  • Retraction semantics
  • Time travel and retractions

Transaction Formats

Turtle Ingest

Import RDF data in Turtle format:

  • Turtle syntax
  • Bulk imports
  • File uploads
  • Format conversion

Signed / Credentialed Transactions

Cryptographically signed transactions:

  • JWS signed transactions
  • Verifiable Credentials
  • Identity-based transactions
  • Audit trails

Transaction Metadata

Commit Receipts and tx-id

Understanding transaction receipts:

  • Receipt structure
  • Transaction ID (t)
  • Commit ID
  • Timestamps
  • Flake counts

Indexing Side-Effects

How transactions affect indexing:

  • Background indexing
  • Novelty layer
  • Index triggers
  • Performance considerations

Transaction Concepts

Immutability

Once committed, transactions are immutable:

  • Changes are represented as new assertions and retractions
  • Historical data is never modified
  • Complete audit trail preserved
  • Time travel enabled by immutability

Atomicity

Transactions are atomic:

  • All changes succeed or all fail
  • No partial commits
  • Consistent state guaranteed
  • Validation before commit

Transaction Time

Every transaction receives a unique transaction time:

  • Monotonically increasing integer (t)
  • Unique across all ledgers in instance
  • Used for time travel queries
  • Basis for temporal ordering

Assertions and Retractions

Transactions consist of two operations:

  • Assertions: Add new triples
  • Retractions: Remove existing triples

Updates are represented as retraction + assertion pairs.

Common Transaction Patterns

Create Entity

{
  "@context": {
    "ex": "http://example.org/ns/",
    "schema": "http://schema.org/"
  },
  "@graph": [
    {
      "@id": "ex:alice",
      "@type": "schema:Person",
      "schema:name": "Alice",
      "schema:email": "alice@example.org"
    }
  ]
}

Update Property

{
  "@context": {
    "ex": "http://example.org/ns/",
    "schema": "http://schema.org/"
  },
  "where": [
    { "@id": "ex:alice", "schema:age": "?oldAge" }
  ],
  "delete": [
    { "@id": "ex:alice", "schema:age": "?oldAge" }
  ],
  "insert": [
    { "@id": "ex:alice", "schema:age": 31 }
  ]
}

Add Relationship

{
  "@graph": [
    {
      "@id": "ex:alice",
      "schema:worksFor": { "@id": "ex:company-a" }
    }
  ]
}

Remove Property

{
  "delete": [
    { "@id": "ex:alice", "schema:telephone": "?phone" }
  ],
  "where": [
    { "@id": "ex:alice", "schema:telephone": "?phone" }
  ]
}

Replace Entity (Upsert)

POST /upsert?ledger=mydb:main
{
  "@graph": [
    {
      "@id": "ex:alice",
      "@type": "schema:Person",
      "schema:name": "Alice Smith",
      "schema:email": "alice.smith@example.org"
    }
  ]
}

Transaction Types

  • Insert (POST /insert) — add triples (JSON-LD or Turtle)
  • Update (POST /update) — WHERE/DELETE/INSERT (JSON-LD) or SPARQL UPDATE
  • Upsert (POST /upsert) — replace values for the predicates you supply (JSON-LD, Turtle, TriG)

Transaction Validation

Before commit, transactions are validated:

Syntax Validation:

  • Valid JSON/JSON-LD syntax
  • Well-formed IRIs
  • Correct datatype formats

Semantic Validation:

  • Type compatibility
  • Constraint adherence
  • Reference integrity (optional)

Policy Validation:

  • Authorization checks
  • Access control enforcement
  • Data-level permissions

Validation failures result in transaction rejection with detailed error messages.

Transaction Size Limits

Default Limits:

  • Transaction size: 10 MB
  • Triple count: 10,000 triples
  • Configurable per deployment

Large Transactions:

  • Split into batches for large imports
  • Use streaming for bulk data
  • Monitor indexing lag

See Indexing Side-Effects for performance considerations.

Error Handling

Transaction Errors

Common errors:

  • PARSE_ERROR - Invalid JSON-LD
  • INVALID_IRI - Malformed IRI
  • TYPE_ERROR - Type mismatch
  • CONSTRAINT_VIOLATION - Constraint violated
  • POLICY_DENIED - Not authorized

Retry Logic

Implement retry for transient errors:

  • Network errors: Retry with backoff
  • Conflicts: Retry with updated data
  • Timeouts: Retry after delay
  • Server errors: Retry with backoff
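
A minimal retry-with-backoff wrapper can make these cases concrete. This is a sketch, not a Fluree client API; TransientError and flaky_transact are hypothetical stand-ins:

```python
import time

class TransientError(Exception):
    """Stand-in for a retryable failure (network error, timeout, conflict)."""

def with_retry(operation, attempts=4, base_delay=0.5):
    """Retry with exponential backoff: base_delay, 2x, 4x, ... between attempts."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise                          # out of attempts: surface the error
            time.sleep(base_delay * 2 ** attempt)

calls = {"count": 0}

def flaky_transact():
    """Simulated transaction that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("simulated timeout")
    return {"t": 42}

print(with_retry(flaky_transact, base_delay=0.01))  # {'t': 42}
```

For conflicts, re-read the current data before retrying rather than resubmitting the stale transaction.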

Idempotency

For idempotent transactions:

  • Use replace mode
  • Include unique identifiers
  • Design for retry safety
  • Use deterministic IRIs

Best Practices

1. Use Meaningful IRIs

Good:

{"@id": "ex:user-alice-123"}

Bad:

{"@id": "ex:1"}

2. Group Related Changes

Combine related entities in a single transaction:

{
  "@graph": [
    { "@id": "ex:order-123", "ex:customer": { "@id": "ex:alice" } },
    { "@id": "ex:order-123", "ex:product": { "@id": "ex:widget" } },
    { "@id": "ex:order-123", "ex:total": 99.99 }
  ]
}

3. Use Appropriate Mode

  • Default mode: For additive operations
  • Replace mode: For complete replacements, sync operations

4. Include Types

Always specify entity types:

{
  "@id": "ex:alice",
  "@type": "schema:Person"
}

5. Use Typed Literals

Be explicit about types:

{
  "schema:birthDate": {
    "@value": "1990-05-15",
    "@type": "xsd:date"
  }
}

6. Design for History

Consider how data will look in historical queries:

  • Use descriptive property names
  • Include relevant metadata
  • Design for temporal queries

7. Monitor Performance

Track transaction metrics:

  • Commit time
  • Indexing lag
  • Error rates
  • Transaction size

Transaction Overview

This document provides a comprehensive overview of how transactions work in Fluree, from submission to final indexing.

What is a Transaction?

A transaction in Fluree is a set of changes to the database, represented as RDF triple assertions and retractions. Each transaction is:

  • Atomic: All changes succeed or all fail
  • Immutable: Once committed, never modified
  • Timestamped: Assigned a unique transaction time (t)
  • Auditable: Complete metadata preserved

Transaction Lifecycle

1. Submission

Client submits transaction to Fluree using either JSON-LD or SPARQL UPDATE:

JSON-LD Transaction:

POST /update?ledger=mydb:main
Content-Type: application/json

{
  "@context": { "ex": "http://example.org/ns/" },
  "@graph": [{ "@id": "ex:alice", "ex:name": "Alice" }]
}

SPARQL UPDATE:

POST /update/mydb:main
Content-Type: application/sparql-update

PREFIX ex: <http://example.org/ns/>
INSERT DATA { ex:alice ex:name "Alice" }

2. Parsing

Fluree parses the transaction:

  • Parse JSON/JSON-LD structure
  • Expand compact IRIs using @context
  • Convert to internal representation

3. Validation

Transaction is validated:

  • Syntax validation: Well-formed IRIs, valid datatypes
  • Semantic validation: Type compatibility, constraints
  • Policy validation: Authorization checks

If validation fails, transaction is rejected with error details.

4. Conversion to Flakes

Transaction is converted to flakes (Fluree’s internal triple format):

Subject    Predicate           Object                    Operation
------------------------------------------------------------------------
ex:alice   rdf:type           schema:Person             assert
ex:alice   schema:name        "Alice"^^xsd:string       assert

Each flake is a tuple: (subject, predicate, object, transaction-time, operation, metadata)
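
As a mental model (not the on-disk format), a flake can be pictured as a plain tuple:

```python
from collections import namedtuple

# Illustrative only — field names mirror the tuple described above.
Flake = namedtuple("Flake", ["subject", "predicate", "object", "t", "op", "meta"])

flakes = [
    Flake("ex:alice", "rdf:type", "schema:Person", 42, "assert", {}),
    Flake("ex:alice", "schema:name", '"Alice"^^xsd:string', 42, "assert", {}),
]

# An update at a later t would pair a retraction with a new assertion:
update = [
    Flake("ex:alice", "schema:name", '"Alice"^^xsd:string', 43, "retract", {}),
    Flake("ex:alice", "schema:name", '"Alicia"^^xsd:string', 43, "assert", {}),
]

print(all(f.op == "assert" for f in flakes))  # True
```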

5. Assignment of Transaction Time

Fluree assigns a unique transaction time (t):

  • Monotonically increasing integer
  • Unique across all transactions
  • Used for temporal queries

Example: t=42

6. Commit

Transaction is committed to storage:

  • Flakes written to transaction log
  • Commit metadata created (ContentId, timestamp, etc.)
  • Commit ID published to nameservice

Commit Data:

{
  "t": 42,
  "timestamp": "2024-01-22T10:30:00.000Z",
  "commit_id": "bafybeig...commitT42",
  "flakes_added": 2,
  "flakes_retracted": 0
}

7. Nameservice Update

Nameservice is updated with new commit:

  • commit_t updated to 42
  • commit_id updated
  • Other processes can see new commit

8. Indexing (Asynchronous)

Background process indexes the transaction:

  • Flakes added to index structures (SPOT, POST, OPST, PSOT)
  • Query-optimized data structures built
  • Graph sources updated (if applicable)

9. Index Publication

When indexing completes:

  • index_t updated to 42
  • index_id published
  • Novelty layer reduced

Transaction Components

@context

Defines namespace mappings:

{
  "@context": {
    "ex": "http://example.org/ns/",
    "schema": "http://schema.org/",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  }
}

The @context can be:

  • Inline (as above)
  • External URL: "@context": "http://example.org/context.jsonld"
  • Array of contexts: "@context": [url1, {...}]

@graph

Contains the entities being asserted:

{
  "@graph": [
    {
      "@id": "ex:alice",
      "@type": "schema:Person",
      "schema:name": "Alice"
    },
    {
      "@id": "ex:bob",
      "@type": "schema:Person",
      "schema:name": "Bob"
    }
  ]
}

opts

Top-level parse-time options. These control how the transaction is parsed (not what it writes).

{
  "@context": {"ex": "http://example.org/ns/"},
  "opts": {"strictCompactIri": false},
  "@graph": [{"@id": "legacy:bob", "ex:name": "Bob"}]
}

Currently supported keys:

  • strictCompactIri (bool, default true): Reject unresolved compact-looking IRIs (prefix:suffix where the prefix is missing from @context). Disable only for legacy data where bare prefix:suffix strings are intentional. See IRIs and @context — Strict Compact-IRI Guard.

Programmatic Rust callers can override strictCompactIri via TxnOpts.strict_compact_iri, which takes precedence over the JSON opts value.

WHERE/DELETE/INSERT

For updates, specify what to match, delete, and insert:

{
  "where": [
    { "@id": "ex:alice", "schema:age": "?oldAge" }
  ],
  "delete": [
    { "@id": "ex:alice", "schema:age": "?oldAge" }
  ],
  "insert": [
    { "@id": "ex:alice", "schema:age": 31 }
  ]
}

SPARQL UPDATE

Alternatively, use SPARQL UPDATE syntax with Content-Type: application/sparql-update:

PREFIX ex: <http://example.org/ns/>

DELETE {
  ?person ex:age ?oldAge .
}
INSERT {
  ?person ex:age 31 .
}
WHERE {
  ?person ex:name "Alice" .
  ?person ex:age ?oldAge .
}

SPARQL UPDATE supports:

  • INSERT DATA - Insert ground triples
  • DELETE DATA - Delete specific triples
  • DELETE WHERE - Delete matching patterns
  • DELETE/INSERT WHERE - Full update with patterns

See SPARQL UPDATE for complete documentation.

Transaction Endpoints

Fluree exposes three transaction endpoints (all under /v1/fluree/):

  • POST /insert — add triples (JSON-LD or Turtle)
  • POST /update — WHERE/DELETE/INSERT (JSON-LD) and SPARQL UPDATE
  • POST /upsert — replace values for the predicates you supply (JSON-LD, Turtle, TriG)

See Insert, Update, and Upsert for details.

Transaction Semantics

Assertions

Assertions add new triples to the database:

{
  "@id": "ex:alice",
  "schema:name": "Alice"
}

Creates triple:

ex:alice schema:name "Alice"

Retractions

Retractions remove existing triples:

{
  "delete": [
    { "@id": "ex:alice", "schema:age": "?age" }
  ],
  "where": [
    { "@id": "ex:alice", "schema:age": "?age" }
  ]
}

Removes matching triples.

Updates

Updates are retraction + assertion:

t=10: ex:alice schema:age 30 (assert)
t=20: ex:alice schema:age 30 (retract), ex:alice schema:age 31 (assert)

Historical queries can see both states.

Commit Metadata

Each commit includes rich metadata:

Core Fields:

  • t: Transaction time
  • timestamp: ISO 8601 timestamp
  • commit_id: Content-addressed identifier (CIDv1)

Counts:

  • flakes_added: Number of assertions
  • flakes_retracted: Number of retractions

Provenance (in txn-meta graph, under the commit subject):

  • f:identity: Authenticated identity acting on the transaction. System-controlled — verified DID for signed requests, otherwise from opts.identity / CommitOpts::identity. Any user-supplied f:identity in the body is overridden.
  • f:author: Optional author claim. Pure user txn-meta — supply f:author as a top-level property in the envelope-form transaction body.
  • f:message: Optional commit message. Pure user txn-meta — supply f:message as a top-level property in the envelope-form transaction body.
  • previous_commit_id: ContentId of previous commit (in the commit envelope).

See Commit Receipts for details.

Indexing Pipeline

Commit vs Index

Commit (immediate):

  • Transaction written to log
  • Available for time travel queries
  • Small, append-only files

Index (asynchronous):

  • Query-optimized data structures
  • Background process
  • May lag behind commits

Novelty Layer

The novelty layer is uncommitted data between index and commit:

index_t = 40
commit_t = 45
novelty layer = transactions 41, 42, 43, 44, 45

Queries combine:

  • Indexed data (up to t=40)
  • Novelty layer (t=41 to t=45)
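
The set of transactions a query must overlay on top of the index follows directly from the two counters:

```python
def novelty_range(index_t, commit_t):
    """Transactions that are committed but not yet indexed."""
    return list(range(index_t + 1, commit_t + 1))

# Matches the example above: index_t=40, commit_t=45.
print(novelty_range(40, 45))  # [41, 42, 43, 44, 45]
```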

Index Structures

Fluree maintains four index permutations (SPOT, POST, OPST, PSOT):

SPOT (Subject-Predicate-Object-Time):

ex:alice → schema:name → "Alice" → t=10

POST (Predicate-Object-Subject-Time):

schema:name → "Alice" → ex:alice → t=10

OPST (Object-Predicate-Subject-Time):

"Alice" → schema:name → ex:alice → t=10

PSOT (Predicate-Subject-Object-Time):

schema:name → ex:alice → "Alice" → t=10

Different query patterns use different indexes for optimal performance.
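
A simplified way to think about index selection: whichever positions of the triple pattern are bound determine which sort order lets the engine scan a contiguous range. This is a heuristic sketch, not Fluree's actual query planner:

```python
def choose_index(s_bound, p_bound, o_bound):
    """Pick an index whose leading components match the bound positions."""
    if s_bound:                # subject known: SPOT scans one subject's triples
        return "SPOT"
    if p_bound and o_bound:    # predicate + object known: POST narrows both
        return "POST"
    if p_bound:                # only predicate known: PSOT scans one predicate
        return "PSOT"
    if o_bound:                # only object known: OPST scans one object
        return "OPST"
    return "SPOT"              # nothing bound: full scan of any index

# { ex:alice ?p ?o }          → subject bound
print(choose_index(True, False, False))   # SPOT
# { ?s schema:name "Alice" }  → predicate and object bound
print(choose_index(False, True, True))    # POST
```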

Transaction Properties

Atomicity

All-or-nothing execution:

  • Validation failure rejects entire transaction
  • Parse error rejects entire transaction
  • No partial commits

Consistency

Database remains consistent:

  • Constraints enforced
  • Types validated
  • References checked (optionally)

Isolation

Transactions are isolated:

  • Each sees consistent snapshot
  • No dirty reads
  • Serializable execution

Durability

Committed data is durable:

  • Written to persistent storage
  • Replicated (if configured)
  • Immutable

Error Handling

Validation Errors

{
  "error": "ValidationError",
  "message": "Invalid IRI format",
  "code": "INVALID_IRI",
  "details": {
    "iri": "not a uri",
    "line": 3
  }
}

Conflict Errors

{
  "error": "ConflictError",
  "message": "Concurrent modification detected",
  "code": "CONCURRENT_MODIFICATION"
}

Policy Errors

{
  "error": "Forbidden",
  "message": "Policy denies transact on mydb:main",
  "code": "POLICY_DENIED"
}

Performance Considerations

Transaction Size

  • Recommended: < 1,000 triples per transaction
  • Maximum: Configurable (default 10,000)
  • Large transactions increase commit time

Indexing Lag

  • Background indexing may lag behind commits
  • Monitor commit_t - index_t gap
  • Tune indexing frequency if needed

Batch Operations

For bulk imports:

  • Batch into reasonably-sized transactions
  • Monitor memory usage
  • Allow time for indexing between batches

For initial ledger bootstraps (large Turtle datasets), prefer the Rust bulk import API, which streams commits and builds multi-order binary indexes.

See Indexing Side-Effects for details.

Best Practices

1. Meaningful Transaction Units

Group related changes in single transaction:

Good:

{
  "@graph": [
    { "@id": "ex:order-123", "ex:customer": { "@id": "ex:alice" } },
    { "@id": "ex:order-123", "ex:items": [...] },
    { "@id": "ex:order-123", "ex:total": 99.99 }
  ]
}

2. Include Metadata

Add provenance information:

{
  "@graph": [
    {
      "@id": "ex:alice",
      "schema:name": "Alice",
      "ex:created": "2024-01-22T10:00:00Z",
      "ex:createdBy": "user-123"
    }
  ]
}

3. Use Descriptive IRIs

Good: ex:user-alice-123 Bad: ex:1

4. Test Transactions

Test transactions before production:

  • Validate JSON-LD syntax
  • Check IRI formats
  • Verify types and constraints

5. Monitor Performance

Track metrics:

  • Average commit time
  • Indexing lag
  • Transaction size
  • Error rate

6. Handle Errors Gracefully

Implement retry logic for transient errors:

  • Network errors
  • Timeout errors
  • Conflict errors (with updated data)

7. Design for Time Travel

Remember data is immutable:

  • Changes create new versions
  • Historical queries see all versions
  • Design with temporal access in mind

Insert

Insert operations add new data to Fluree. This is the most common transaction type for creating new entities and relationships.

Basic Insert

Single Entity

Insert a single entity with properties:

curl -X POST "http://localhost:8090/v1/fluree/insert?ledger=mydb:main" \
  -H "Content-Type: application/json" \
  -d '{
    "@context": {
      "ex": "http://example.org/ns/",
      "schema": "http://schema.org/"
    },
    "@graph": [
      {
        "@id": "ex:alice",
        "@type": "schema:Person",
        "schema:name": "Alice",
        "schema:email": "alice@example.org",
        "schema:age": 30
      }
    ]
  }'

Result:

{
  "t": 1,
  "timestamp": "2024-01-22T10:30:00.000Z",
  "commit_id": "bafybeig...commitT1",
  "flakes_added": 4,
  "flakes_retracted": 0
}

This creates 4 triples:

ex:alice rdf:type schema:Person
ex:alice schema:name "Alice"
ex:alice schema:email "alice@example.org"
ex:alice schema:age 30

Multiple Entities

Insert multiple entities in one transaction:

{
  "@context": {
    "ex": "http://example.org/ns/",
    "schema": "http://schema.org/"
  },
  "@graph": [
    {
      "@id": "ex:alice",
      "@type": "schema:Person",
      "schema:name": "Alice"
    },
    {
      "@id": "ex:bob",
      "@type": "schema:Person",
      "schema:name": "Bob"
    },
    {
      "@id": "ex:carol",
      "@type": "schema:Person",
      "schema:name": "Carol"
    }
  ]
}

Benefits:

  • Atomic: All entities created or none
  • Efficient: Single commit, single index update
  • Consistent: All entities at same transaction time

Insert with Relationships

Create entities with relationships:

{
  "@context": {
    "ex": "http://example.org/ns/",
    "schema": "http://schema.org/"
  },
  "@graph": [
    {
      "@id": "ex:company-a",
      "@type": "schema:Organization",
      "schema:name": "Acme Corp"
    },
    {
      "@id": "ex:alice",
      "@type": "schema:Person",
      "schema:name": "Alice",
      "schema:worksFor": { "@id": "ex:company-a" }
    }
  ]
}

This creates:

ex:company-a rdf:type schema:Organization
ex:company-a schema:name "Acme Corp"
ex:alice rdf:type schema:Person
ex:alice schema:name "Alice"
ex:alice schema:worksFor ex:company-a

Nested Objects

Create nested structures:

{
  "@context": {
    "ex": "http://example.org/ns/",
    "schema": "http://schema.org/"
  },
  "@graph": [
    {
      "@id": "ex:alice",
      "@type": "schema:Person",
      "schema:name": "Alice",
      "schema:address": {
        "@id": "ex:alice-address",
        "@type": "schema:PostalAddress",
        "schema:streetAddress": "123 Main St",
        "schema:addressLocality": "Springfield",
        "schema:postalCode": "12345"
      }
    }
  ]
}

This creates two entities (alice and alice-address) linked by schema:address.

Multi-Valued Properties

Add multiple values for a property:

{
  "@context": {
    "ex": "http://example.org/ns/",
    "schema": "http://schema.org/"
  },
  "@graph": [
    {
      "@id": "ex:alice",
      "@type": "schema:Person",
      "schema:name": "Alice",
      "schema:email": ["alice@example.org", "alice@work.com"],
      "schema:telephone": ["+1-555-0100", "+1-555-0101"]
    }
  ]
}

Creates separate triples for each value:

ex:alice schema:email "alice@example.org"
ex:alice schema:email "alice@work.com"
ex:alice schema:telephone "+1-555-0100"
ex:alice schema:telephone "+1-555-0101"

Typed Literals

Dates

{
  "@id": "ex:alice",
  "schema:birthDate": {
    "@value": "1994-05-15",
    "@type": "xsd:date"
  }
}

Timestamps

{
  "@id": "ex:event",
  "schema:startDate": {
    "@value": "2024-01-22T10:30:00Z",
    "@type": "xsd:dateTime"
  }
}

Numbers

{
  "@id": "ex:product",
  "schema:price": {
    "@value": "29.99",
    "@type": "xsd:decimal"
  }
}

Booleans

{
  "@id": "ex:alice",
  "schema:active": {
    "@value": "true",
    "@type": "xsd:boolean"
  }
}

Or use native JSON boolean:

{
  "@id": "ex:alice",
  "schema:active": true
}

Language Tags

Add language-tagged strings:

{
  "@id": "ex:alice",
  "schema:name": {
    "@value": "Alice",
    "@language": "en"
  },
  "schema:description": [
    { "@value": "Software engineer", "@language": "en" },
    { "@value": "Ingénieure logicielle", "@language": "fr" },
    { "@value": "Softwareingenieurin", "@language": "de" }
  ]
}

Blank Nodes

Create entities without explicit IRIs:

{
  "@context": {
    "ex": "http://example.org/ns/",
    "schema": "http://schema.org/"
  },
  "@graph": [
    {
      "@id": "ex:alice",
      "schema:address": {
        "@type": "schema:PostalAddress",
        "schema:streetAddress": "123 Main St"
      }
    }
  ]
}

Fluree generates a unique IRI for the blank node address.

Adding to Existing Entities

Add properties to existing entities:

Initial Insert (t=1):

{
  "@graph": [
    {
      "@id": "ex:alice",
      "schema:name": "Alice"
    }
  ]
}

Add Email (t=2):

{
  "@graph": [
    {
      "@id": "ex:alice",
      "schema:email": "alice@example.org"
    }
  ]
}

After t=2, ex:alice has both name and email.

Insert Semantics

Additive by Default

Inserts are additive—they don’t remove existing data:

t=1: INSERT { ex:alice schema:name "Alice" }
     Result: ex:alice has name "Alice"

t=2: INSERT { ex:alice schema:age 30 }
     Result: ex:alice has name "Alice" AND age 30

Duplicate Prevention

Inserting the same triple again is a no-op:

t=1: INSERT { ex:alice schema:name "Alice" }
t=2: INSERT { ex:alice schema:name "Alice" }
     (No change—triple already exists)

Multi-Value Handling

Multiple values create multiple triples:

t=1: INSERT { ex:alice schema:email "alice@example.org" }
t=2: INSERT { ex:alice schema:email "alice@work.com" }
     Result: ex:alice has TWO email values

IRI Generation

Explicit IRIs

Specify IRIs explicitly:

{
  "@id": "ex:user-12345",
  "schema:name": "Alice"
}

UUID-Based IRIs

Generate UUIDs for unique IRIs:

const uuid = crypto.randomUUID();
const entity = {
  "@id": `ex:user-${uuid}`,
  "schema:name": "Alice"
};

Content-Addressable IRIs

Use content hashing for deterministic IRIs:

const { createHash } = require('crypto');

// Note: JSON.stringify is key-order sensitive; canonicalize the
// serialization if the same data may arrive with different key orders.
const hash = createHash('sha256').update(JSON.stringify(data)).digest('hex');
const entity = {
  "@id": `ex:entity-${hash}`,
  ...data
};

Batch Inserts

{
  "@graph": [
    { "@id": "ex:user-1", "schema:name": "Alice" },
    { "@id": "ex:user-2", "schema:name": "Bob" },
    { "@id": "ex:user-3", "schema:name": "Carol" }
    // ... 100-1000 entities
  ]
}

Large Imports

For very large imports:

const batchSize = 1000;
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

for (let i = 0; i < entities.length; i += batchSize) {
  const batch = entities.slice(i, i + batchSize);
  await transact({ "@graph": batch }); // transact: your Fluree client helper

  // Optional: pause so indexing can keep up between batches
  await sleep(1000);
}

Error Handling

Common Insert Errors

Invalid IRI:

{
  "error": "ValidationError",
  "message": "Invalid IRI format",
  "code": "INVALID_IRI"
}

Type Mismatch:

{
  "error": "TypeError",
  "message": "Expected number, got string",
  "code": "TYPE_ERROR"
}

Constraint Violation:

{
  "error": "ConstraintViolation",
  "message": "Unique constraint violated",
  "code": "CONSTRAINT_VIOLATION"
}

Validation Before Insert

Validate data before inserting:

function validateEntity(entity) {
  if (!entity['@id']) {
    throw new Error('Entity must have @id');
  }
  if (!isValidIRI(entity['@id'])) { // isValidIRI: application-defined check
    throw new Error('Invalid IRI format');
  }
  // Additional validation...
}
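The `isValidIRI` check is left to the application. One minimal sketch, accepting absolute IRIs and compact IRIs like `ex:alice` (this is illustrative, not Fluree's own validation):

```javascript
// Minimal IRI check: a leading scheme-or-prefix, a colon, and a
// non-whitespace remainder. Accepts both "ex:alice" and full IRIs.
function isValidIRI(iri) {
  return typeof iri === 'string' &&
    /^[A-Za-z][A-Za-z0-9+.\-_]*:\S+$/.test(iri);
}
```

A stricter implementation could validate against RFC 3987, but for transaction pre-checks a cheap shape test like this usually suffices.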

Best Practices

1. Use Meaningful IRIs

Good:

{ "@id": "ex:user-alice-12345" }

Bad:

{ "@id": "ex:1" }

2. Always Include Type

{
  "@id": "ex:alice",
  "@type": "schema:Person"
}

3. Use Appropriate Datatypes

{
  "schema:age": 30,
  "schema:price": 29.99,
  "schema:active": true,
  "schema:birthDate": { "@value": "1994-05-15", "@type": "xsd:date" }
}

4. Group Related Entities

Insert related entities in the same transaction:

{
  "@graph": [
    { "@id": "ex:order-123", ... },
    { "@id": "ex:order-item-1", ... },
    { "@id": "ex:order-item-2", ... }
  ]
}

5. Use Consistent Namespaces

Define and use consistent namespace prefixes:

{
  "@context": {
    "app": "https://myapp.com/ns/",
    "schema": "http://schema.org/"
  }
}

6. Include Metadata

Add creation metadata:

{
  "@id": "ex:alice",
  "schema:name": "Alice",
  "app:createdAt": "2024-01-22T10:00:00Z",
  "app:createdBy": "user-admin"
}

7. Validate Before Insert

Always validate:

  • JSON-LD syntax
  • IRI formats
  • Required fields
  • Type constraints

Performance Tips

1. Batch Appropriately

  • Recommended: 100-1000 entities per batch
  • Too small: Many commits, slow
  • Too large: Memory pressure, long commits

2. Monitor Indexing

Track indexing lag after large inserts:

curl http://localhost:8090/v1/fluree/info/mydb:main
# Check: t - index.t
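This check can be automated. A sketch of an indexing-lag poller, assuming the info endpoint returns `t` and `index.t` as illustrated above:

```javascript
// Poll ledger info until the index catches up with the latest commit t.
// Assumes the response shape { t, index: { t } } shown above.
async function waitForIndex(base, ledger, { intervalMs = 1000, fetchImpl = fetch } = {}) {
  for (;;) {
    const res = await fetchImpl(`${base}/v1/fluree/info/${ledger}`);
    const info = await res.json();
    if (info.index && info.index.t >= info.t) return info;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Call it after a large import, e.g. `await waitForIndex('http://localhost:8090', 'mydb:main')`, before issuing queries that depend on the new data being indexed.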

3. Use Efficient IRIs

Short IRIs are more efficient:

Good: ex:user-123
Less efficient: https://example.org/very/long/path/user-123

4. Minimize Context Size

Use compact contexts:

{
  "@context": {
    "ex": "http://example.org/ns/",
    "schema": "http://schema.org/"
  }
}

Upsert

Upsert operations provide idempotent transactions by replacing the values of the predicates you supply for an entity (matched by @id).

What is Upsert?

Upsert = Update or Insert:

  • If the entity exists: for each predicate present in your payload, retract existing values for that predicate and assert the new value(s)
  • If the entity doesn’t exist: create it with the supplied triples

This makes upserts safe to retry: sending the same upsert repeatedly produces the same current-state values for those predicates.

HTTP Endpoint

Use the dedicated upsert endpoint:

curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
  -H "Content-Type: application/json" \
  -d '{
    "@context": {
      "ex": "http://example.org/ns/",
      "schema": "http://schema.org/"
    },
    "@graph": [
      {
        "@id": "ex:alice",
        "@type": "schema:Person",
        "schema:name": "Alice Smith",
        "schema:email": "alice.smith@example.org"
      }
    ]
  }'

Upsert Behavior

First Transaction (Entity Doesn’t Exist)

{
  "@graph": [
    {
      "@id": "ex:alice",
      "@type": "schema:Person",
      "schema:name": "Alice",
      "schema:email": "alice@example.org"
    }
  ]
}

Result: Entity created with specified properties.

Triples After t=1:

ex:alice rdf:type schema:Person
ex:alice schema:name "Alice"
ex:alice schema:email "alice@example.org"

Second Transaction (Entity Exists)

{
  "@graph": [
    {
      "@id": "ex:alice",
      "@type": "schema:Person",
      "schema:name": "Alice Smith",
      "schema:email": "alice.smith@example.org",
      "schema:age": 30
    }
  ]
}

Operations:

  1. For each predicate present in the payload, retract its existing values for ex:alice
  2. Assert the new values

Flakes:

# Retractions (t=2)
ex:alice schema:name "Alice" (retract)
ex:alice schema:email "alice@example.org" (retract)

# Assertions (t=2)
ex:alice rdf:type schema:Person (assert)
ex:alice schema:name "Alice Smith" (assert)
ex:alice schema:email "alice.smith@example.org" (assert)
ex:alice schema:age 30 (assert)

Triples After t=2:

ex:alice rdf:type schema:Person
ex:alice schema:name "Alice Smith"
ex:alice schema:email "alice.smith@example.org"
ex:alice schema:age 30

Note: Because @type appears in the payload, rdf:type is re-asserted like any other supplied predicate.

Idempotency

Upsert is idempotent: repeated submissions produce the same result:

First Submission (t=1):

{"@id": "ex:alice", "schema:name": "Alice", "schema:age": 30}

Result: Entity created.

Second Submission (t=2):

{"@id": "ex:alice", "schema:name": "Alice", "schema:age": 30}

Result: No net change; the supplied values already match the current state.

Third Submission (t=3):

{"@id": "ex:alice", "schema:name": "Alice", "schema:age": 30}

Result: No actual changes.

This makes upserts safe to retry.
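Because retries are safe, a retry wrapper can be layered on top. A sketch using the upsert endpoint shown earlier; the retry count and error handling are illustrative:

```javascript
// Retry an upsert on transient failure. Safe because repeating the same
// upsert converges to the same state for the supplied predicates.
async function upsertWithRetry(payload, { retries = 3, fetchImpl = fetch } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      const res = await fetchImpl('http://localhost:8090/v1/fluree/upsert?ledger=mydb:main', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(payload),
      });
      if (res.ok) return res;
      lastError = new Error(`HTTP ${res.status}`);
    } catch (err) {
      lastError = err; // network error: retry
    }
  }
  throw lastError;
}
```

Note that the same wrapper would NOT be safe around a WHERE/DELETE/INSERT update whose effect depends on the current state (such as an increment), since a retry after a partially observed failure could apply the change twice.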

Comparison: Insert vs Update vs Upsert

Insert

POST /insert?ledger=mydb:main

Behavior:

  • Additive: asserts the triples you submit
  • Does not retract existing values automatically

Example:

t=1: INSERT { ex:alice schema:name "Alice", schema:age 30 }
t=2: INSERT { ex:alice schema:email "alice@example.org" }

Result: ex:alice has name, age, AND email (all three)

Update (WHERE/DELETE/INSERT)

POST /update?ledger=mydb:main

Behavior:

  • Explicit: you retract exactly what you match in where/delete, then assert insert
  • Most flexible (conditional updates, partial updates, computed values)

Example:

t=1: INSERT { ex:alice schema:name "Alice", schema:age 30 }
t=2: UPDATE { DELETE { ex:alice schema:age 30 } INSERT { ex:alice schema:age 31 } WHERE { ex:alice schema:age 30 } }

Result: ex:alice has name "Alice", age 31

Upsert

POST /upsert?ledger=mydb:main

Behavior:

  • Replaces values for the predicates you supply (per subject)
  • Leaves other predicates unchanged
  • Retry-safe/idempotent for the supplied predicates

Use Cases

1. Synchronization from External Systems

Sync data from external database:

async function syncUser(externalUser) {
  await fetch('http://localhost:8090/v1/fluree/upsert?ledger=mydb:main', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      "@graph": [{
        "@id": `ex:user-${externalUser.id}`,
        "@type": "schema:Person",
        "schema:name": externalUser.name,
        "schema:email": externalUser.email,
        "schema:telephone": externalUser.phone
      }]
    })
  })
}

// Safe to call repeatedly; always converges to the external state
await syncUser(await fetchUserFromDB(123));

2. Idempotent API Operations

Make API operations retry-safe:

// Safe to retry on failure
async function updateProduct(productId, productData) {
  return await fetch('http://localhost:8090/v1/fluree/upsert?ledger=mydb:main', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      "@graph": [{
        "@id": `ex:product-${productId}`,
        ...productData
      }]
    })
  })
}

3. Configuration Management

Update configuration atomically:

{
  "@graph": [
    {
      "@id": "ex:config",
      "@type": "ex:Configuration",
      "ex:apiEndpoint": "https://api.example.com",
      "ex:timeout": 30000,
      "ex:retries": 3,
      "ex:enabled": true
    }
  ]
}

Each upsert atomically replaces the values of every supplied setting. Settings you stop supplying are not removed automatically; retract them explicitly if they should go away.

4. State Machine Transitions

Model state machines where an entity has a well-defined state:

{
  "@graph": [
    {
      "@id": "ex:order-123",
      "@type": "ex:Order",
      "ex:status": "shipped",
      "ex:shippedAt": "2024-01-22T10:30:00Z",
      "ex:carrier": "FedEx",
      "ex:trackingNumber": "123456789"
    }
  ]
}

Batch Upserts

Upsert multiple entities:

POST /upsert?ledger=mydb:main
{
  "@context": {
    "ex": "http://example.org/ns/",
    "schema": "http://schema.org/"
  },
  "@graph": [
    {
      "@id": "ex:user-1",
      "@type": "schema:Person",
      "schema:name": "Alice"
    },
    {
      "@id": "ex:user-2",
      "@type": "schema:Person",
      "schema:name": "Bob"
    },
    {
      "@id": "ex:user-3",
      "@type": "schema:Person",
      "schema:name": "Carol"
    }
  ]
}

Each entity's supplied predicates are replaced independently.
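A small helper can map plain records into a batch upsert payload. The context and property names below are illustrative:

```javascript
// Build a batch upsert payload from plain user records.
function toBatchUpsert(users) {
  return {
    "@context": {
      "ex": "http://example.org/ns/",
      "schema": "http://schema.org/"
    },
    "@graph": users.map((u) => ({
      "@id": `ex:user-${u.id}`,
      "@type": "schema:Person",
      "schema:name": u.name
    }))
  };
}
```

The resulting object can be POSTed to the upsert endpoint exactly as in the curl example above.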

Type Handling

Types are Preserved

Upsert preserves existing @type values unless you explicitly include @type in the upsert payload (in which case rdf:type is treated like any other predicate and its values are replaced for that subject).

{
  "@graph": [
    {
      "@id": "ex:alice",
      "@type": "schema:Person",
      "schema:name": "Alice"
    }
  ]
}

Because @type is included in this payload, rdf:type is asserted, replacing any prior type values.

Multiple Types

Entities can have multiple types:

{
  "@graph": [
    {
      "@id": "ex:alice",
      "@type": ["schema:Person", "ex:Employee"],
      "schema:name": "Alice"
    }
  ]
}

All types are replaced together.

Edge Cases

Partial Payloads

Upserting a subset of predicates leaves the others untouched:

Before (t=1):

{
  "@id": "ex:alice",
  "schema:name": "Alice",
  "schema:email": "alice@example.org",
  "schema:age": 30,
  "schema:telephone": "+1-555-0100"
}

Upsert (t=2):

{
  "@id": "ex:alice",
  "@type": "schema:Person",
  "schema:name": "Alice"
}

After t=2:

{
  "@id": "ex:alice",
  "@type": "schema:Person",
  "schema:name": "Alice",
  "schema:email": "alice@example.org",
  "schema:age": 30,
  "schema:telephone": "+1-555-0100"
}

Email, age, and telephone are preserved; only the supplied predicates (rdf:type and schema:name) have their values replaced.

Whole-Predicate Replacement

Within a supplied predicate, upsert replaces ALL existing values. You cannot append a single value to a multi-valued predicate with upsert: either supply the complete value set, use insert to add a value, or use WHERE/DELETE/INSERT for targeted changes.

Error Handling

Same Errors as Insert

Upsert has the same validation errors:

{
  "error": "ValidationError",
  "message": "Invalid IRI format",
  "code": "INVALID_IRI"
}

No Special Errors

Upsert doesn't introduce new error types; it applies different semantics to the same operations.

Performance Considerations

Retraction Overhead

An upsert may retract many triples:

Entity with 50 properties:
- 50 retractions
- 50 assertions
= 100 flakes per entity

For entities with many properties, this can be expensive.

Indexing Impact

Each retraction and assertion updates indexes:

  • More work for indexing process
  • May increase indexing lag
  • Consider batch size for large replacements

Best Practices

1. Use for Idempotent Operations

Good use:

// Idempotent sync
await upsertUser(userId, userData);
await upsertUser(userId, userData); // Safe to retry

2. Include All Required Properties

Always include all properties the entity should have:

Good:

{
  "@id": "ex:user-123",
  "@type": "schema:Person",
  "schema:name": "Alice",
  "schema:email": "alice@example.org",
  "ex:status": "active"
}

Bad (incomplete):

{
  "@id": "ex:user-123",
  "schema:name": "Alice"
}

3. Use Consistent Schema

Define an entity schema and always include all fields:

function createUserTransaction(user) {
  return {
    "@id": `ex:user-${user.id}`,
    "@type": "schema:Person",
    "schema:name": user.name || null,
    "schema:email": user.email || null,
    "schema:telephone": user.phone || null,
    "ex:status": user.status || "active"
  };
}

4. Document Upsert Usage

Comment when using upsert for idempotent sync:

// Upsert for idempotent sync with external API
await fetch('http://localhost:8090/v1/fluree/upsert?ledger=users:main', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(userPayload),
});

5. Test Idempotency

Verify operations are truly idempotent:

const result1 = await upsert(data);
const result2 = await upsert(data);
// Should produce same final state

6. Monitor Performance

Track metrics for replace operations:

  • Flakes retracted
  • Flakes asserted
  • Commit time
  • Indexing lag

7. Consider Alternatives

For partial updates, use WHERE/DELETE/INSERT:

{
  "where": [{ "@id": "ex:alice", "schema:age": "?oldAge" }],
  "delete": [{ "@id": "ex:alice", "schema:age": "?oldAge" }],
  "insert": [{ "@id": "ex:alice", "schema:age": 31 }]
}

Comparison Table

Feature             | Insert (default)               | Upsert
--------------------|--------------------------------|-------------------------------------------
Behavior            | Additive                       | Replaces supplied predicates
Existing properties | Preserved                      | Replaced if supplied; otherwise preserved
Idempotent          | No                             | Yes
Partial updates     | Yes (with WHERE/DELETE/INSERT) | Per supplied predicate only
Use case            | Adding data                    | Synchronization
Retry safety        | Requires care                  | Safe by default
Performance         | Fewer operations               | More operations (retract + assert)

Update (WHERE/DELETE/INSERT)

The WHERE/DELETE/INSERT pattern enables targeted updates to existing data in Fluree. This is the most flexible update mechanism, allowing conditional modifications, partial updates, and complex transformations.

Basic Pattern

The WHERE/DELETE/INSERT pattern has three clauses:

  1. WHERE: Pattern to match existing data
  2. DELETE: Triples to retract (using variables from WHERE)
  3. INSERT: Triples to assert (using variables from WHERE)
{
  "@context": {
    "ex": "http://example.org/ns/",
    "schema": "http://schema.org/"
  },
  "where": [
    { "@id": "ex:alice", "schema:age": "?oldAge" }
  ],
  "delete": [
    { "@id": "ex:alice", "schema:age": "?oldAge" }
  ],
  "insert": [
    { "@id": "ex:alice", "schema:age": 31 }
  ]
}

This:

  1. Finds the current age of ex:alice
  2. Deletes that age value
  3. Inserts the new age value
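The three-clause pattern lends itself to a small builder. A sketch for single-property replacement; the helper name is illustrative:

```javascript
// Build a WHERE/DELETE/INSERT transaction that replaces one property value.
function replacePropertyTxn(subject, predicate, newValue) {
  const pattern = { "@id": subject, [predicate]: "?old" };
  return {
    "@context": {
      "ex": "http://example.org/ns/",
      "schema": "http://schema.org/"
    },
    "where": [pattern],
    "delete": [pattern],
    "insert": [{ "@id": subject, [predicate]: newValue }]
  };
}
```

For example, `replacePropertyTxn("ex:alice", "schema:age", 31)` produces a transaction equivalent to the age-update example above.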

WHERE clause capabilities

The update transaction where clause uses the same pattern grammar as JSON-LD queries, so you can use rich patterns like OPTIONAL, UNION, FILTER, VALUES, and subqueries.

Two common forms:

  • Node-map: a single object (simple triple patterns)
  • Array: a sequence of node-maps plus special forms (recommended for anything beyond basic matching)

Supported special forms inside the where array:

  • ["filter", <expr>]
  • ["bind", "?var", <expr>] (may include multiple var/expr pairs)
  • ["optional", <pattern>]
  • ["union", <pattern>, <pattern>, ...]
  • ["minus", <pattern>]
  • ["exists", <pattern>] / ["not-exists", <pattern>]
  • ["values", <values-clause>]
  • ["query", <subquery>] (subquery can use select, groupBy, aggregates like (max ?x), etc.)
  • ["graph", <graph-name>, <pattern>]

Expression format for filter/bind supports either:

  • Data expressions like ["+", "?x", 1], ["and", [">=", "?age", 18], ["=", "?status", "pending"]]
  • S-expressions like "(+ ?x 1)"

Graph scoping (named graphs)

JSON-LD update supports writing into user-defined named graphs (ingested via TriG or JSON-LD @graph) and scoping the update to a named graph.

Default graph for WHERE/DELETE/INSERT

Use a top-level graph key to scope the update to a named graph as the default graph:

{
  "@context": { "ex": "http://example.org/ns/", "schema": "http://schema.org/" },
  "graph": "http://example.org/graphs/audit",
  "where":  { "@id": "ex:event1", "schema:description": "?old" },
  "delete": { "@id": "ex:event1", "schema:description": "?old" },
  "insert": { "@id": "ex:event1", "schema:description": "new" }
}

This is the JSON-LD UPDATE analog of SPARQL UPDATE WITH <iri>:

  • WHERE patterns are evaluated against the named graph
  • DELETE/INSERT templates without an explicit graph are written to that named graph

Writing templates to specific graphs

There are two ways to target graphs in insert / delete templates:

  • Per-node @graph: attach a graph IRI to a node object (overrides the transaction-level graph)
{
  "insert": [
    { "@id": "ex:event1", "@graph": "http://example.org/graphs/audit", "schema:description": "v" }
  ]
}
  • Template sugar: inside insert / delete arrays, use ["graph", "<graph IRI>", <pattern>]
{
  "insert": [
    ["graph", "http://example.org/graphs/audit", { "@id": "ex:event1", "schema:description": "v" }]
  ]
}

Notes:

  • graph is a graph IRI (a string like "http://example.org/graphs/audit")
  • Named-graph reads are available after indexing completes (see docs/query/datasets.md)

Dataset scoping for WHERE (from / fromNamed)

JSON-LD update reuses the same dataset keys as JSON-LD query to control where the where clause reads from:

  • from: scopes the default graph used for where evaluation (equivalent to SPARQL UPDATE USING <iri>)
  • fromNamed: restricts which named graphs are visible to where ["graph", ...] patterns (equivalent to SPARQL UPDATE USING NAMED <iri>)

This is why JSON-LD update uses from rather than introducing new keywords: it matches the existing JSON-LD query language vocabulary and keeps dataset configuration consistent across read-only queries and updates.

from (WHERE default graph)

When from is present, it scopes the where clause evaluation without changing where templates write:

  • graph (if present) controls the default graph for DELETE/INSERT templates (SPARQL UPDATE WITH)
  • from controls the default graph(s) for where evaluation (SPARQL UPDATE USING)

Notes:

  • from can be:
    • a string graph IRI (shorthand for {"graph": "<iri>"})
    • an object with {"graph": "<iri>"} (or {"graph": ["<iri1>", "<iri2>"]})
    • an array of graph IRIs/selectors (multiple graphs are evaluated as a merged default graph)
  • If your insert / delete templates write into the same graph as the top-level graph, you can omit per-template graph selection. The top-level graph becomes the default target for templates that don’t specify @graph (or ["graph", ...] sugar).
  • If you want to write to multiple graphs in one update, keep a top-level graph as the default (optional) and use per-template ["graph", ...] for the exceptions.
{
  "@context": { "ex": "http://example.org/ns/", "schema": "http://schema.org/" },
  "graph": "http://example.org/g2",
  "from": { "graph": "http://example.org/g1" },
  "where": { "@id": "ex:s", "schema:description": "?d" },
  "insert": [{ "@id": "ex:s", "schema:copyFromG1": "?d" }]
}

Example: read from one graph, write to two graphs

{
  "@context": { "ex": "http://example.org/ns/", "schema": "http://schema.org/" },
  "graph": "http://example.org/g2",
  "from": { "graph": "http://example.org/g1" },
  "where": { "@id": "ex:s", "schema:description": "?d" },
  "insert": [
    { "@id": "ex:s", "schema:copyFromG1": "?d" },
    ["graph", "http://example.org/audit", { "@id": "ex:event1", "schema:description": "copied description" }]
  ]
}

fromNamed (WHERE named graphs allowlist)

Use fromNamed to allow (and optionally alias) named graphs for where ["graph", ...] patterns:

Notes:

  • In where GRAPH patterns, you can reference the graph by alias (e.g. "g2") or by the graph IRI (e.g. "http://example.org/g2"). Aliases are just convenience names for matching.
  • In insert / delete templates, graph selection is a write target. You can use:
    • the full graph IRI ("http://example.org/g2")
    • a compact IRI/term that expands via @context (e.g. "ex:g2")
    • the fromNamed alias (e.g. "g2") for consistency within the same update transaction
{
  "@context": { "ex": "http://example.org/ns/" },
  "fromNamed": [
    { "alias": "g2", "graph": "http://example.org/g2" }
  ],
  "where": [
    ["graph", "g2", { "@id": "ex:s", "ex:p": "?o" }]
  ],
  "insert": [["graph", "g2", { "@id": "ex:s", "ex:q": "touched" }]]
}

Same example, but with a compacted graph IRI via @context:

{
  "@context": { "ex": "http://example.org/ns/" },
  "fromNamed": [{ "alias": "g2", "graph": "ex:g2" }],
  "where": [["graph", "g2", { "@id": "ex:s", "ex:p": "?o" }]],
  "insert": [["graph", "ex:g2", { "@id": "ex:s", "ex:q": "touched" }]]
}

Same idea without an explicit alias (the fromNamed string acts as its own identifier):

{
  "@context": { "ex": "http://example.org/ns/" },
  "fromNamed": ["ex:g2"],
  "where": [["graph", "ex:g2", { "@id": "ex:s", "ex:p": "?o" }]],
  "insert": [["graph", "ex:g2", { "@id": "ex:s", "ex:q": "touched" }]]
}

Simple Property Update

Update a single property value:

curl -X POST "http://localhost:8090/v1/fluree/update?ledger=mydb:main" \
  -H "Content-Type: application/json" \
  -d '{
    "@context": {
      "ex": "http://example.org/ns/",
      "schema": "http://schema.org/"
    },
    "where": [
      { "@id": "ex:alice", "schema:email": "?oldEmail" }
    ],
    "delete": [
      { "@id": "ex:alice", "schema:email": "?oldEmail" }
    ],
    "insert": [
      { "@id": "ex:alice", "schema:email": "alice.new@example.org" }
    ]
  }'

Multiple Property Updates

Update several properties at once:

{
  "where": [
    { "@id": "ex:alice", "schema:name": "?oldName" },
    { "@id": "ex:alice", "schema:email": "?oldEmail" }
  ],
  "delete": [
    { "@id": "ex:alice", "schema:name": "?oldName" },
    { "@id": "ex:alice", "schema:email": "?oldEmail" }
  ],
  "insert": [
    { "@id": "ex:alice", "schema:name": "Alice Johnson" },
    { "@id": "ex:alice", "schema:email": "alice.j@example.org" }
  ]
}

Conditional Updates

Only update if condition is met:

{
  "where": [
    { "@id": "ex:alice", "schema:age": "?age" },
    { "@id": "ex:alice", "ex:status": "?status" },
    ["filter", ["and", [">=", "?age", 18], ["=", "?status", "pending"]]]
  ],
  "delete": [
    { "@id": "ex:alice", "ex:status": "?status" }
  ],
  "insert": [
    { "@id": "ex:alice", "ex:status": "approved" }
  ]
}

The update only happens if Alice is 18+ and status is “pending”.

Pattern Matching

Find and Update

Find entities matching a pattern and update them:

{
  "where": [
    { "@id": "?person", "@type": "schema:Person" },
    { "@id": "?person", "ex:status": "pending" }
  ],
  "delete": [
    { "@id": "?person", "ex:status": "pending" }
  ],
  "insert": [
    { "@id": "?person", "ex:status": "active" }
  ]
}

This updates ALL people with status=“pending” to status=“active”.

Relationship-Based Updates

Update based on relationships:

{
  "where": [
    { "@id": "?employee", "schema:worksFor": "ex:company-a" },
    { "@id": "?employee", "ex:salary": "?oldSalary" },
    ["bind", "?newSalary", ["*", "?oldSalary", 1.1]]
  ],
  "delete": [
    { "@id": "?employee", "ex:salary": "?oldSalary" }
  ],
  "insert": [
    { "@id": "?employee", "ex:salary": "?newSalary" }
  ]
}

Gives all company-a employees a 10% raise.

Variable Transformation

Use variables from WHERE in INSERT with transformations:

{
  "where": [
    { "@id": "ex:product-123", "ex:price": "?currentPrice" },
    ["bind", "?newPrice", ["*", "?currentPrice", 0.9]]
  ],
  "delete": [
    { "@id": "ex:product-123", "ex:price": "?currentPrice" }
  ],
  "insert": [
    { "@id": "ex:product-123", "ex:price": "?newPrice" },
    { "@id": "ex:product-123", "ex:previousPrice": "?currentPrice" }
  ]
}

Applies 10% discount and saves previous price.

Partial Updates

Update only specific properties, leaving others unchanged:

Current State:

ex:alice schema:name "Alice"
ex:alice schema:email "alice@example.org"
ex:alice schema:age 30
ex:alice schema:telephone "+1-555-0100"

Update Only Age:

{
  "where": [
    { "@id": "ex:alice", "schema:age": "?oldAge" }
  ],
  "delete": [
    { "@id": "ex:alice", "schema:age": "?oldAge" }
  ],
  "insert": [
    { "@id": "ex:alice", "schema:age": 31 }
  ]
}

Result:

ex:alice schema:name "Alice"              (unchanged)
ex:alice schema:email "alice@example.org" (unchanged)
ex:alice schema:age 31                     (updated)
ex:alice schema:telephone "+1-555-0100"   (unchanged)

Adding Properties

Add a property without WHERE (when it might not exist):

{
  "insert": [
    { "@id": "ex:alice", "schema:telephone": "+1-555-0100" }
  ]
}

Or conditionally add if missing:

{
  "where": [
    { "@id": "ex:alice", "schema:name": "?name" },
    ["optional", { "@id": "ex:alice", "schema:telephone": "?existingPhone" }],
    ["filter", ["not", ["bound", "?existingPhone"]]]
  ],
  "insert": [
    { "@id": "ex:alice", "schema:telephone": "+1-555-0100" }
  ]
}

Removing Properties

Remove a property entirely:

{
  "where": [
    { "@id": "ex:alice", "schema:telephone": "?phone" }
  ],
  "delete": [
    { "@id": "ex:alice", "schema:telephone": "?phone" }
  ]
}

No INSERT clause—just deletes.

Multi-Value Properties

Replace One Value

{
  "where": [
    { "@id": "ex:alice", "schema:email": "alice.old@example.org" }
  ],
  "delete": [
    { "@id": "ex:alice", "schema:email": "alice.old@example.org" }
  ],
  "insert": [
    { "@id": "ex:alice", "schema:email": "alice.new@example.org" }
  ]
}

Add Value

{
  "insert": [
    { "@id": "ex:alice", "schema:email": "alice.work@example.org" }
  ]
}

Remove One Value

{
  "where": [
    { "@id": "ex:alice", "schema:email": "alice.old@example.org" }
  ],
  "delete": [
    { "@id": "ex:alice", "schema:email": "alice.old@example.org" }
  ]
}

Remove All Values

{
  "where": [
    { "@id": "ex:alice", "schema:email": "?email" }
  ],
  "delete": [
    { "@id": "ex:alice", "schema:email": "?email" }
  ]
}

Relationship Updates

Change Relationship

{
  "where": [
    { "@id": "ex:alice", "schema:worksFor": "?oldCompany" }
  ],
  "delete": [
    { "@id": "ex:alice", "schema:worksFor": "?oldCompany" }
  ],
  "insert": [
    { "@id": "ex:alice", "schema:worksFor": "ex:company-b" }
  ]
}

Add Relationship

{
  "insert": [
    { "@id": "ex:alice", "schema:knows": "ex:bob" }
  ]
}

Remove Relationship

{
  "where": [
    { "@id": "ex:alice", "schema:knows": "ex:bob" }
  ],
  "delete": [
    { "@id": "ex:alice", "schema:knows": "ex:bob" }
  ]
}

Complex Updates

Cascading Updates

Update related entities:

{
  "where": [
    { "@id": "ex:order-123", "ex:status": "?oldStatus" },
    { "@id": "ex:order-123", "ex:items": "?item" },
    { "@id": "?item", "ex:status": "?itemStatus" }
  ],
  "delete": [
    { "@id": "ex:order-123", "ex:status": "?oldStatus" },
    { "@id": "?item", "ex:status": "?itemStatus" }
  ],
  "insert": [
    { "@id": "ex:order-123", "ex:status": "shipped" },
    { "@id": "?item", "ex:status": "shipped" }
  ]
}

Computed Values

Calculate new values based on old:

{
  "where": [
    { "@id": "ex:product-123", "ex:inventory": "?current" },
    { "@id": "ex:product-123", "ex:sold": "?sold" },
    ["bind", "?newInventory", ["-", "?current", "?sold"]]
  ],
  "delete": [
    { "@id": "ex:product-123", "ex:inventory": "?current" }
  ],
  "insert": [
    { "@id": "ex:product-123", "ex:inventory": "?newInventory" }
  ]
}

Error Handling

No Match

If WHERE doesn’t match, nothing happens (not an error):

{
  "where": [
    { "@id": "ex:nonexistent", "schema:name": "?name" }
  ],
  "delete": [...],
  "insert": [...]
}

Result: No changes, no error.

Multiple Matches

If WHERE matches multiple entities, all are updated:

{
  "where": [
    { "@id": "?person", "ex:status": "pending" }
  ],
  "delete": [
    { "@id": "?person", "ex:status": "pending" }
  ],
  "insert": [
    { "@id": "?person", "ex:status": "approved" }
  ]
}

Updates ALL entities with status=“pending”.

Comparison: WHERE/DELETE/INSERT vs Replace Mode

Feature          | WHERE/DELETE/INSERT | Upsert
-----------------|---------------------|----------------------------------
Granularity      | Triple-level        | Predicate-level (per subject)
Other properties | Preserved           | Preserved (unsupplied predicates)
Conditional      | Yes (with filters)  | No
Pattern matching | Yes                 | No
Idempotent       | Depends on logic    | Yes
Use case         | Partial updates     | Whole-value replacement / sync

Best Practices

1. Be Specific in WHERE

Good (specific):

{
  "where": [
    { "@id": "ex:alice", "schema:age": "?oldAge" }
  ]
}

Risky (might match many):

{
  "where": [
    { "@id": "?person", "schema:age": "?age" }
  ]
}

2. Always Use Variables

Use variables from WHERE in DELETE:

Good:

{
  "where": [{ "@id": "ex:alice", "schema:age": "?oldAge" }],
  "delete": [{ "@id": "ex:alice", "schema:age": "?oldAge" }]
}

Bad (deletes all ages):

{
  "where": [{ "@id": "ex:alice", "schema:age": "?oldAge" }],
  "delete": [{ "@id": "ex:alice", "schema:age": "?age" }]
}

3. Test Updates

Test on development data first:

// Test update logic
const result = await transact(updateQuery);
console.log(`Updated ${result.flakes_retracted} values`);

4. Use Filters for Safety

Add filters to prevent unintended updates:

{
  "where": [
    "...",
    ["filter", ["and", [">=", "?age", 0], ["<=", "?age", 150]]]
  ],
  "delete": [...],
  "insert": [...]
}

5. Handle No Matches

Decide if no matches should be an error in your application:

const result = await transact(updateQuery);
if (result.flakes_retracted === 0) {
  console.warn('Update matched no entities');
}

6. Document Complex Updates

Comment complex update logic:

// Update inventory after sale completion
// - Decrement stock by sold quantity
// - Update last-sold timestamp
// - Mark as low-stock if below threshold
const updateInventory = { ... };

Performance Considerations

Index Usage

WHERE clauses use indexes:

  • Subject-based: Fast
  • Predicate-based: Fast
  • Pattern-based: May be slower

Batch Updates

For many updates, consider batching:

const updates = entities.map(e => createUpdateQuery(e));
for (const update of updates) {
  await transact(update);
}

Conditional Updates (Atomic / Compare-and-Swap Patterns)

Fluree’s WHERE/DELETE/INSERT transaction model supports powerful conditional update patterns that depend on the current database state. Every operation runs atomically within a single transaction — the WHERE clause reads current state, and the DELETE/INSERT templates modify it, all as one unit.

This guide covers common patterns for state-dependent updates with both JSON-LD and SPARQL UPDATE syntax.

Key Concept: How Conditional Updates Work

┌──────────────────────────────────────────────────────┐
│  1. WHERE   — query current state, bind variables    │
│  2. FILTER  — guard: eliminate rows that don't pass  │
│  3. BIND    — compute new values from bound vars     │
│  4. DELETE  — retract matched triples                │
│  5. INSERT  — assert new triples                     │
│                                                      │
│  All steps execute atomically in one transaction.    │
│  If WHERE returns zero rows, nothing happens (no-op).│
└──────────────────────────────────────────────────────┘

The WHERE clause runs against the current database state. If it matches, the bound variables flow into DELETE (to retract old values) and INSERT (to assert new ones). If WHERE returns zero rows — because a FILTER eliminated them or a pattern didn’t match — DELETE is skipped entirely (nothing to retract) and INSERT templates with unbound variables produce zero flakes.

Two INSERT behaviors

  • INSERT with variables from WHERE (e.g., "@id": "?s") — conditional. When WHERE returns zero rows, the variable is unbound and the INSERT produces nothing. Use this for CAS, state machines, and guards.
  • All-literal INSERT (e.g., "@id": "ex:alice") — unconditional. Fires even when WHERE returns zero rows. Use this for “delete-if-exists, always insert” patterns.

1. Atomic Increment / Decrement

Read the current value, compute a new one, write it back — all in one transaction. Classic use cases: counters, inventory quantities, vote tallies, loyalty points.

JSON-LD

{
  "@context": { "ex": "http://example.org/ns/" },
  "where": [
    { "@id": "ex:counter", "ex:count": "?old" },
    ["bind", "?new", "(+ ?old 1)"]
  ],
  "delete": { "@id": "ex:counter", "ex:count": "?old" },
  "insert": { "@id": "ex:counter", "ex:count": "?new" }
}

SPARQL UPDATE

PREFIX ex: <http://example.org/ns/>

DELETE { ex:counter ex:count ?old }
INSERT { ex:counter ex:count ?new }
WHERE {
  ex:counter ex:count ?old .
  BIND (?old + 1 AS ?new)
}

Variations

  • Decrement: ["bind", "?new", "(- ?old 1)"]
  • Increment by N: ["bind", "?new", "(+ ?old 50)"]
  • Multiply: ["bind", "?new", "(* ?old 2)"]

2. Compare-and-Swap (Optimistic Concurrency)

Only update if the current value matches what the client last read. If another transaction changed the data since the read, the WHERE won’t match and the update is a no-op. This is the foundation of optimistic concurrency control.

JSON-LD

{
  "@context": { "ex": "http://example.org/ns/" },
  "where":  { "@id": "?s", "ex:version": 1, "ex:price": "?oldPrice" },
  "delete": { "@id": "?s", "ex:version": 1, "ex:price": "?oldPrice" },
  "insert": { "@id": "?s", "ex:version": 2, "ex:price": 24.99 }
}

How it works:

  1. Client reads ex:item and sees version: 1, price: 19.99
  2. Client submits update pinning version: 1 in WHERE
  3. If version is still 1 → match → update succeeds, version bumps to 2
  4. If another client already changed version to 2 → no match → no-op

SPARQL UPDATE

PREFIX ex: <http://example.org/ns/>

DELETE { ?s ex:version 1 . ?s ex:price ?oldPrice }
INSERT { ?s ex:version 2 . ?s ex:price 24.99 }
WHERE {
  ?s ex:version 1 ;
     ex:price ?oldPrice .
}

Application-Level Handling

When a CAS update is a no-op (stale read), the client can detect this by checking whether t advanced:

response.t == request.t_before  →  stale read, retry with fresh data
response.t  > request.t_before  →  update succeeded
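In application code, the stale-read branch usually feeds a retry loop. The following Python sketch models the semantics with an in-memory store; `submit_cas` and `update_with_retry` are hypothetical stand-ins for your HTTP calls to Fluree:

```python
# Optimistic-concurrency retry around a compare-and-swap update.
# `store` stands in for the ledger; submit_cas models a transaction whose
# WHERE clause pins the version the client last read.

def submit_cas(store, key, expected_version, new_value):
    """Apply the update only if the pinned version still matches."""
    if store[key]["version"] == expected_version:  # WHERE matches
        store[key] = {"version": expected_version + 1, "value": new_value}
        return True
    return False  # no match: the transaction is a no-op

def update_with_retry(store, key, compute, max_attempts=3):
    for _ in range(max_attempts):
        snapshot = store[key]  # fresh read before each attempt
        new_value = compute(snapshot["value"])
        if submit_cas(store, key, snapshot["version"], new_value):
            return True
    return False

store = {"ex:item": {"version": 1, "value": 100}}
assert update_with_retry(store, "ex:item", lambda v: v + 50)
assert store["ex:item"] == {"version": 2, "value": 150}
```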

3. State Machine Transitions

Only allow transitions from a valid source state. Invalid transitions (e.g., trying shipped → delivered when the current state is pending) are silently rejected as no-ops.

JSON-LD

{
  "@context": { "ex": "http://example.org/ns/" },
  "where":  { "@id": "?order", "ex:status": "pending" },
  "delete": { "@id": "?order", "ex:status": "pending" },
  "insert": { "@id": "?order", "ex:status": "approved" }
}

This only fires when the order’s current status is exactly "pending". If the status is anything else, the WHERE returns zero rows and nothing changes.

Multi-Step Chain

Chain transitions across sequential transactions:

pending  →  approved  →  shipped  →  delivered

Each step is its own transaction, each guarded by the expected source state. If any step finds the state has already moved (e.g., another process approved it), that step is a no-op.

SPARQL UPDATE

PREFIX ex: <http://example.org/ns/>

DELETE { ?order ex:status "pending" }
INSERT { ?order ex:status "approved" }
WHERE  { ?order ex:status "pending" }
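The guard semantics can be sketched as a tiny in-memory state machine in Python. This is illustrative only; the `transition` function is a hypothetical stand-in for a guarded Fluree transaction:

```python
# Each transition pins its expected source state, mirroring a WHERE clause
# that matches the current status. Invalid transitions are silent no-ops.

TRANSITIONS = {"pending": "approved", "approved": "shipped", "shipped": "delivered"}

def transition(order, expected):
    """Move expected -> TRANSITIONS[expected]; no-op if the state differs."""
    if order["status"] == expected:  # WHERE matches current state
        order["status"] = TRANSITIONS[expected]
        return True
    return False  # zero rows: nothing changes

order = {"status": "pending"}
assert transition(order, "shipped") is False  # invalid: silently rejected
assert order["status"] == "pending"
assert transition(order, "pending") is True   # valid: pending -> approved
assert order["status"] == "approved"
```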

4. Guarded Update (Threshold / Precondition)

Only apply a change when a numeric (or other) precondition is met. Classic use case: prevent overdrafts by checking balance before deducting.

JSON-LD

{
  "@context": { "ex": "http://example.org/ns/" },
  "where": [
    { "@id": "ex:account", "ex:balance": "?bal" },
    ["filter", "(>= ?bal 100)"],
    ["bind", "?newBal", "(- ?bal 100)"]
  ],
  "delete": { "@id": "ex:account", "ex:balance": "?bal" },
  "insert": { "@id": "ex:account", "ex:balance": "?newBal" }
}

How it works:

  • If balance >= 100 → FILTER passes → deduction applied
  • If balance < 100 → FILTER eliminates the row → no-op (overdraft prevented)

SPARQL UPDATE

PREFIX ex: <http://example.org/ns/>

DELETE { ex:account ex:balance ?bal }
INSERT { ex:account ex:balance ?newBal }
WHERE {
  ex:account ex:balance ?bal .
  FILTER (?bal >= 100)
  BIND (?bal - 100 AS ?newBal)
}
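In application terms the guard reduces to a simple conditional. A minimal Python model (the `deduct` function is hypothetical, not a Fluree API):

```python
# FILTER-style guard: the deduction applies only when the precondition
# holds; otherwise the balance is left untouched.

def deduct(account, amount):
    if account["balance"] >= amount:  # FILTER (?bal >= amount)
        account["balance"] -= amount  # BIND new value, DELETE old, INSERT new
        return True
    return False  # row eliminated by FILTER: no-op

acct = {"balance": 120}
assert deduct(acct, 100) is True and acct["balance"] == 20
assert deduct(acct, 100) is False and acct["balance"] == 20  # overdraft prevented
```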

5. Atomic Transfer (Double-Entry)

Move a value between two entities atomically in a single transaction. Both the debit and credit happen together — if the guard fails, neither side is modified. Classic use cases: balance transfers, inventory moves between warehouses.

JSON-LD

{
  "@context": { "ex": "http://example.org/ns/" },
  "where": [
    { "@id": "ex:alice-acct", "ex:balance": "?aliceBal" },
    { "@id": "ex:bob-acct",   "ex:balance": "?bobBal" },
    ["filter", "(>= ?aliceBal 150)"],
    ["bind", "?newAlice", "(- ?aliceBal 150)",
             "?newBob",   "(+ ?bobBal 150)"]
  ],
  "delete": [
    { "@id": "ex:alice-acct", "ex:balance": "?aliceBal" },
    { "@id": "ex:bob-acct",   "ex:balance": "?bobBal" }
  ],
  "insert": [
    { "@id": "ex:alice-acct", "ex:balance": "?newAlice" },
    { "@id": "ex:bob-acct",   "ex:balance": "?newBob" }
  ]
}

SPARQL UPDATE

PREFIX ex: <http://example.org/ns/>

DELETE {
  ex:alice-acct ex:balance ?aliceBal .
  ex:bob-acct   ex:balance ?bobBal .
}
INSERT {
  ex:alice-acct ex:balance ?newAlice .
  ex:bob-acct   ex:balance ?newBob .
}
WHERE {
  ex:alice-acct ex:balance ?aliceBal .
  ex:bob-acct   ex:balance ?bobBal .
  FILTER (?aliceBal >= 150)
  BIND (?aliceBal - 150 AS ?newAlice)
  BIND (?bobBal + 150 AS ?newBob)
}
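The all-or-nothing behavior can be modeled in Python; `transfer` here is a hypothetical stand-in for the single Fluree transaction above:

```python
# Debit and credit as one atomic unit: if the guard fails, neither
# account is modified.

def transfer(accounts, src, dst, amount):
    if accounts[src] < amount:  # FILTER fails: the whole transaction no-ops
        return False
    accounts[src] -= amount  # both sides applied together
    accounts[dst] += amount
    return True

accts = {"alice": 200, "bob": 50}
assert transfer(accts, "alice", "bob", 150) is True
assert accts == {"alice": 50, "bob": 200}
assert transfer(accts, "alice", "bob", 150) is False  # insufficient: no-op
assert accts == {"alice": 50, "bob": 200}
```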

6. Insert-If-Not-Exists (Conditional Create)

Create an entity only if it doesn’t already exist. Useful for preventing duplicate records.

This pattern uses OPTIONAL + FILTER to check for absence.

JSON-LD

{
  "@context": {
    "ex": "http://example.org/ns/",
    "schema": "http://schema.org/"
  },
  "where": [
    ["optional", { "@id": "ex:bob", "schema:name": "?existing" }],
    ["filter", "(not (bound ?existing))"]
  ],
  "insert": {
    "@id": "ex:bob",
    "schema:name": "Bob",
    "schema:age": 25
  }
}

How it works:

  • If ex:bob does not exist: OPTIONAL leaves ?existing unbound → (not (bound ?existing)) is true → INSERT fires
  • If ex:bob exists: OPTIONAL binds ?existing → (not (bound ?existing)) is false → FILTER eliminates the row → INSERT is skipped (zero solution rows = zero template instantiations)

SPARQL UPDATE

PREFIX ex: <http://example.org/ns/>
PREFIX schema: <http://schema.org/>

INSERT { ex:bob schema:name "Bob" ; schema:age 25 }
WHERE {
  OPTIONAL { ex:bob schema:name ?existing }
  FILTER (!BOUND(?existing))
}
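In application terms, the OPTIONAL + FILTER probe is an existence check. A minimal Python model (`insert_if_absent` is hypothetical):

```python
# OPTIONAL + FILTER (!BOUND(...)) modeled as a membership check:
# a successful bind means the FILTER eliminates the row and the insert skips.

def insert_if_absent(store, key, value):
    if key in store:  # OPTIONAL binds ?existing, FILTER fails
        return False  # row eliminated: INSERT skipped
    store[key] = value  # ?existing unbound: INSERT fires
    return True

db = {}
assert insert_if_absent(db, "ex:bob", {"name": "Bob", "age": 25}) is True
assert insert_if_absent(db, "ex:bob", {"name": "Bobby"}) is False  # duplicate
assert db["ex:bob"]["name"] == "Bob"  # original record untouched
```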

7. Capped Accumulator (Increment with Ceiling)

Increment a value but never exceed a maximum. Useful for loyalty points, rate limits, or any bounded counter.

JSON-LD

{
  "@context": { "ex": "http://example.org/ns/" },
  "where": [
    { "@id": "ex:user", "ex:points": "?pts" },
    ["filter", "(< ?pts 1000)"],
    ["bind", "?new", "(if (> (+ ?pts 150) 1000) 1000 (+ ?pts 150))"]
  ],
  "delete": { "@id": "ex:user", "ex:points": "?pts" },
  "insert": { "@id": "ex:user", "ex:points": "?new" }
}

How it works:

  • If pts < 1000 → FILTER passes → BIND computes min(pts + 150, 1000) → update applied
  • If pts >= 1000 → FILTER eliminates → no-op (already at cap)

SPARQL UPDATE

PREFIX ex: <http://example.org/ns/>

DELETE { ex:user ex:points ?pts }
INSERT { ex:user ex:points ?new }
WHERE {
  ex:user ex:points ?pts .
  FILTER (?pts < 1000)
  BIND (IF(?pts + 150 > 1000, 1000, ?pts + 150) AS ?new)
}
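The BIND's IF expression is equivalent to a clamped add. A quick Python check (the `add_points` helper is hypothetical):

```python
# min(pts + n, cap) reproduces IF(pts + n > cap, cap, pts + n),
# with the FILTER short-circuiting updates once the cap is reached.

def add_points(pts, n, cap=1000):
    if pts >= cap:  # FILTER (?pts < cap) fails: no-op
        return pts
    return min(pts + n, cap)  # IF(?pts + n > cap, cap, ?pts + n)

assert add_points(400, 150) == 550    # normal increment
assert add_points(900, 150) == 1000   # clamped at the cap
assert add_points(1000, 150) == 1000  # already at cap: unchanged
```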

8. Cascading / Dependent Update (Graph Traversal)

Update one entity based on values from a related entity. The WHERE clause traverses the graph to gather data from multiple nodes.

JSON-LD

{
  "@context": { "ex": "http://example.org/ns/" },
  "where": [
    { "@id": "ex:order1", "ex:customer": "?cust", "ex:total": "?orderTotal" },
    { "@id": "?cust", "ex:lifetimeSpend": "?ls" },
    ["bind", "?newLs", "(+ ?ls ?orderTotal)"]
  ],
  "delete": { "@id": "?cust", "ex:lifetimeSpend": "?ls" },
  "insert": { "@id": "?cust", "ex:lifetimeSpend": "?newLs" }
}

This traverses order → customer and accumulates the order total into the customer’s lifetime spend — all atomically.

SPARQL UPDATE

PREFIX ex: <http://example.org/ns/>

DELETE { ?cust ex:lifetimeSpend ?ls }
INSERT { ?cust ex:lifetimeSpend ?newLs }
WHERE {
  ex:order1 ex:customer ?cust ;
            ex:total ?orderTotal .
  ?cust ex:lifetimeSpend ?ls .
  BIND (?ls + ?orderTotal AS ?newLs)
}

9. Batch Conditional Update (Multi-Entity)

Apply the same transformation to every entity matching a pattern. The WHERE clause acts as a filter across the dataset.

Give All Engineers a 10% Raise

JSON-LD

{
  "@context": { "ex": "http://example.org/ns/" },
  "where": [
    { "@id": "?emp", "ex:dept": "engineering", "ex:salary": "?sal" },
    ["bind", "?newSal", "(+ ?sal (/ ?sal 10))"]
  ],
  "delete": { "@id": "?emp", "ex:salary": "?sal" },
  "insert": { "@id": "?emp", "ex:salary": "?newSal" }
}

SPARQL UPDATE

PREFIX ex: <http://example.org/ns/>

DELETE { ?emp ex:salary ?sal }
INSERT { ?emp ex:salary ?newSal }
WHERE {
  ?emp ex:dept "engineering" ;
       ex:salary ?sal .
  BIND (?sal + ?sal / 10 AS ?newSal)
}
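Note that the BIND relies on standard operator precedence: division binds tighter than addition, so ?sal + ?sal / 10 is sal + (sal / 10), i.e. a 10% raise. The arithmetic, checked in Python over illustrative data:

```python
# The WHERE pattern restricts the update to engineering; the BIND computes
# sal + sal / 10 = 1.1 * sal for each matched row.

employees = [
    {"dept": "engineering", "salary": 90000},
    {"dept": "sales",       "salary": 80000},
    {"dept": "engineering", "salary": 100000},
]
for emp in employees:
    if emp["dept"] == "engineering":  # pattern match acts as the filter
        emp["salary"] = emp["salary"] + emp["salary"] / 10

assert [e["salary"] for e in employees] == [99000.0, 80000, 110000.0]
```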

Batch Status Change

Approve all pending tasks in one transaction:

{
  "@context": { "ex": "http://example.org/ns/" },
  "where":  { "@id": "?task", "ex:status": "pending" },
  "delete": { "@id": "?task", "ex:status": "pending" },
  "insert": { "@id": "?task", "ex:status": "approved" }
}

Only entities with status: "pending" are affected; all others remain untouched.


10. Update with Audit Trail

Change a value and simultaneously record the old value for auditing — in a single atomic transaction.

JSON-LD

{
  "@context": { "ex": "http://example.org/ns/" },
  "where": [
    { "@id": "ex:product", "ex:price": "?oldPrice" },
    ["bind", "?newPrice", "(- ?oldPrice 10)"]
  ],
  "delete": { "@id": "ex:product", "ex:price": "?oldPrice" },
  "insert": {
    "@id": "ex:product",
    "ex:price": "?newPrice",
    "ex:previousPrice": "?oldPrice"
  }
}

After the update, the product has both its new price and a record of the previous price.

Note: Fluree’s immutable ledger also preserves full history via time travel, so you can always query any prior state. This pattern is useful when you want the previous value accessible without time-travel queries.


Pattern Summary

| Pattern | WHERE Matches | FILTER | BIND | Effect on No-Match |
|---|---|---|---|---|
| Atomic increment | Current value | | Compute new value | No-op |
| Compare-and-swap | Expected value | | | No-op (stale read) |
| State machine | Expected state | | | No-op (invalid transition) |
| Guarded update | Current value | Threshold check | Compute new value | No-op (guard failed) |
| Atomic transfer | Both accounts | Sender balance | Both new balances | No-op (insufficient) |
| Insert-if-not-exists | OPTIONAL probe | not bound | | No-op (already exists) |
| Capped accumulator | Current value | Below cap | Min(new, cap) | No-op (at cap) |
| Cascading update | Graph traversal | | Derived value | No-op (path broken) |
| Batch update | All matching | | Per-entity transform | Only matching entities |
| Audit trail | Current value | | New value | No-op |

Best Practices

  1. Prefer pattern matching over FILTER for equality. Pinning a value in the WHERE pattern (e.g., "ex:status": "pending") is simpler and more efficient than ["filter", "(= ?st \"pending\")"].

  2. Check t to detect no-ops. When your application needs to distinguish between “update succeeded” and “condition not met,” compare t before and after the transaction.

  3. Use BIND for all computed values. The ["bind", "?var", "(expression)"] form keeps computation inside the transaction, ensuring atomicity.

  4. Use OPTIONAL + FILTER for absence checks. The ["optional", ...], ["filter", "(not (bound ?var))"] pattern is the idiomatic way to test for non-existence.

  5. Leverage Fluree’s immutability. Every transaction creates an immutable commit. Even without explicit audit trail patterns, you can always query previous states using time travel. Use the audit trail pattern when you want the old value readily accessible in the current state.

Retractions

Retractions remove data from Fluree. While data is never truly deleted (it remains in history), retractions mark triples as no longer current.

What is a Retraction?

A retraction removes a triple from the current state:

  • The triple existed at some point (was asserted)
  • The retraction marks it as no longer true
  • Historical queries can still see the triple
  • Current queries don’t see the triple

Basic Retraction

Remove a specific triple:

{
  "@context": {
    "ex": "http://example.org/ns/",
    "schema": "http://schema.org/"
  },
  "where": [
    { "@id": "ex:alice", "schema:age": "?age" }
  ],
  "delete": [
    { "@id": "ex:alice", "schema:age": "?age" }
  ]
}

This removes the age property from ex:alice.

Retract Specific Property

Remove a specific property value:

curl -X POST "http://localhost:8090/v1/fluree/update?ledger=mydb:main" \
  -H "Content-Type: application/json" \
  -d '{
    "@context": {
      "ex": "http://example.org/ns/",
      "schema": "http://schema.org/"
    },
    "where": [
      { "@id": "ex:alice", "schema:email": "alice.old@example.org" }
    ],
    "delete": [
      { "@id": "ex:alice", "schema:email": "alice.old@example.org" }
    ]
  }'

Retract All Values of a Property

Remove all values for a property:

{
  "where": [
    { "@id": "ex:alice", "schema:telephone": "?phone" }
  ],
  "delete": [
    { "@id": "ex:alice", "schema:telephone": "?phone" }
  ]
}

If ex:alice has multiple phone numbers, this removes them all.

Retract Multiple Properties

Remove several properties at once:

{
  "where": [
    { "@id": "ex:alice", "schema:email": "?email" },
    { "@id": "ex:alice", "schema:telephone": "?phone" },
    { "@id": "ex:alice", "ex:preferences": "?prefs" }
  ],
  "delete": [
    { "@id": "ex:alice", "schema:email": "?email" },
    { "@id": "ex:alice", "schema:telephone": "?phone" },
    { "@id": "ex:alice", "ex:preferences": "?prefs" }
  ]
}

Retract Entire Entity

Remove all triples for an entity:

{
  "where": [
    { "@id": "ex:alice", "?predicate": "?value" }
  ],
  "delete": [
    { "@id": "ex:alice", "?predicate": "?value" }
  ]
}

This finds all triples where ex:alice is the subject and retracts them all.

Result: Entity is “deleted” from current state (but remains in history).

Conditional Retractions

Retract only if conditions are met:

{
  "where": [
    { "@id": "?user", "@type": "schema:Person" },
    { "@id": "?user", "ex:lastLogin": "?lastLogin" },
    { "@id": "?user", "ex:status": "inactive" },
    ["filter", "(< ?lastLogin \"2023-01-01\")"],
    { "@id": "?user", "?predicate": "?value" }
  ],
  "delete": [
    { "@id": "?user", "?predicate": "?value" }
  ]
}

Removes all inactive users who haven’t logged in since 2023.

Retract Relationships

Remove Single Relationship

{
  "where": [
    { "@id": "ex:alice", "schema:knows": "ex:bob" }
  ],
  "delete": [
    { "@id": "ex:alice", "schema:knows": "ex:bob" }
  ]
}

Remove All Relationships of a Type

{
  "where": [
    { "@id": "ex:alice", "schema:knows": "?person" }
  ],
  "delete": [
    { "@id": "ex:alice", "schema:knows": "?person" }
  ]
}

Bidirectional Relationship Removal

Remove relationship in both directions:

{
  "where": [
    { "@id": "ex:alice", "schema:knows": "ex:bob" },
    { "@id": "ex:bob", "schema:knows": "ex:alice" }
  ],
  "delete": [
    { "@id": "ex:alice", "schema:knows": "ex:bob" },
    { "@id": "ex:bob", "schema:knows": "ex:alice" }
  ]
}

Cascading Retractions

Retract an entity and all related entities:

{
  "where": [
    { "@id": "ex:order-123", "ex:items": "?item" },
    { "@id": "?item", "?itemPred": "?itemVal" },
    { "@id": "ex:order-123", "?orderPred": "?orderVal" }
  ],
  "delete": [
    { "@id": "?item", "?itemPred": "?itemVal" },
    { "@id": "ex:order-123", "?orderPred": "?orderVal" }
  ]
}

Deletes order and all its items.

Soft Delete vs Hard Retraction

Mark as deleted without retracting:

{
  "where": [
    { "@id": "ex:alice", "ex:status": "?status" }
  ],
  "delete": [
    { "@id": "ex:alice", "ex:status": "?status" }
  ],
  "insert": [
    { "@id": "ex:alice", "ex:status": "deleted" },
    { "@id": "ex:alice", "ex:deletedAt": "2024-01-22T10:30:00Z" }
  ]
}

Benefits:

  • Easy to “undelete”
  • Audit trail of deletion
  • Can query deleted entities
  • Less impact on indexes

Hard Retraction

Retract all data:

{
  "where": [
    { "@id": "ex:alice", "?predicate": "?value" }
  ],
  "delete": [
    { "@id": "ex:alice", "?predicate": "?value" }
  ]
}

When to use:

  • Legal requirement to remove data
  • Sensitive data that must be removed
  • Test data cleanup

Note: Data still exists in history. For true deletion, see data purging operations.

Pattern-Based Retractions

Retract by Type

Remove all entities of a type:

{
  "where": [
    { "@id": "?entity", "@type": "ex:TempData" },
    { "@id": "?entity", "?predicate": "?value" }
  ],
  "delete": [
    { "@id": "?entity", "?predicate": "?value" }
  ]
}

Retract by Property Value

Remove entities with specific property:

{
  "where": [
    { "@id": "?entity", "ex:expired": true },
    { "@id": "?entity", "?predicate": "?value" }
  ],
  "delete": [
    { "@id": "?entity", "?predicate": "?value" }
  ]
}

Retraction Semantics

Idempotent

Retracting a non-existent triple is a no-op:

t=1: No triple exists
t=2: DELETE { ex:alice schema:age 30 }
     Result: No change (triple didn't exist)

No Cascading by Default

Retracting an entity doesn’t automatically retract references to it:

t=1: ex:alice schema:worksFor ex:company-a
     ex:company-a schema:name "Acme"

t=2: DELETE all triples for ex:company-a

Result:
- ex:company-a properties are gone
- ex:alice schema:worksFor ex:company-a REMAINS
- Reference is now "dangling"

To cascade, explicitly match and delete references.
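A cleanup pass therefore has to match incoming references explicitly. A minimal Python sketch over an in-memory triple list (`retract_entity` is hypothetical and purely illustrative):

```python
# Retract every triple where the entity appears as subject OR object,
# so no dangling references survive the delete.

def retract_entity(triples, entity):
    return [(s, p, o) for (s, p, o) in triples
            if s != entity and o != entity]

triples = [
    ("ex:alice", "schema:worksFor", "ex:company-a"),
    ("ex:alice", "schema:name", "Alice"),
    ("ex:company-a", "schema:name", "Acme"),
]
remaining = retract_entity(triples, "ex:company-a")
assert remaining == [("ex:alice", "schema:name", "Alice")]  # no dangling link
```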

Time Travel and Retractions

Historical Queries See Retracted Data

# Current query (after retraction at t=5)
curl -X POST http://localhost:8090/v1/fluree/query \
  -d '{"from": "mydb:main", "select": ["?name"], ...}'
# Returns: [] (no results)

# Historical query (before retraction)
curl -X POST http://localhost:8090/v1/fluree/query \
  -d '{"from": "mydb:main@t:3", "select": ["?name"], ...}'
# Returns: [{"name": "Alice"}] (data visible)

History Shows Retractions

Query the history to see both assertions and retractions:

curl -X POST http://localhost:8090/v1/fluree/query \
  -d '{
    "@context": { "schema": "http://schema.org/" },
    "from": "mydb:main@t:1",
    "to": "mydb:main@t:latest",
    "select": ["?name", "?t", "?op"],
    "where": [
      { "@id": "ex:alice", "schema:name": { "@value": "?name", "@t": "?t", "@op": "?op" } }
    ],
    "orderBy": "?t"
  }'

Response:

[
  ["Alice", 1, true],
  ["Alice", 5, false]
]

The @t annotation captures the transaction time and @op binds a boolean — true for assertions, false for retractions (mirroring Flake.op on disk).

Error Handling

Common Errors

No Match (Not an Error):

{
  "where": [{ "@id": "ex:nonexistent", "schema:name": "?name" }],
  "delete": [{ "@id": "ex:nonexistent", "schema:name": "?name" }]
}

Result: No changes, no error.

Invalid Pattern:

{
  "error": "QueryError",
  "message": "Invalid WHERE pattern",
  "code": "INVALID_PATTERN"
}

Performance Considerations

Index Updates

Retractions update all indexes:

  • Each retracted triple updates SPOT, POST, OPST, PSOT
  • Large retractions can impact performance
  • Consider batch size for bulk deletions

Indexing Lag

Large retractions may increase indexing lag:

  • Monitor commit_t - index_t
  • Allow time for indexing between large retractions
  • Consider scheduling during low-traffic periods

Vacuum/Compaction

Eventually, consider compaction to reclaim space from retracted data (implementation-specific).

Best Practices

1. Use Soft Deletes

Prefer marking as deleted:

Good:

{
  "insert": [{ "@id": "ex:alice", "ex:status": "deleted" }]
}

Over:

{
  "delete": [{ "@id": "ex:alice", "?pred": "?val" }]
}

2. Add Audit Metadata

Include deletion metadata:

{
  "insert": [
    { "@id": "ex:alice", "ex:status": "deleted" },
    { "@id": "ex:alice", "ex:deletedAt": "2024-01-22T10:30:00Z" },
    { "@id": "ex:alice", "ex:deletedBy": "user-admin" },
    { "@id": "ex:alice", "ex:deleteReason": "User request" }
  ]
}

3. Be Specific in WHERE

Avoid accidentally retracting too much:

Good:

{
  "where": [{ "@id": "ex:alice", "schema:age": "?age" }],
  "delete": [{ "@id": "ex:alice", "schema:age": "?age" }]
}

Dangerous:

{
  "where": [{ "@id": "?entity", "schema:age": "?age" }],
  "delete": [{ "@id": "?entity", "?pred": "?val" }]
}

4. Test Retractions

Test on development data:

// Count before (assumes a query() helper that returns an array of bindings)
const before = await query('SELECT (COUNT(?e) AS ?count) WHERE { ... }');
const countBefore = Number(before[0]?.count ?? 0);

// Retract
await transact(retractionQuery);

// Count after
const after = await query('SELECT (COUNT(?e) AS ?count) WHERE { ... }');
const countAfter = Number(after[0]?.count ?? 0);

console.log(`Retracted ${countBefore - countAfter} entities`);

5. Handle Cascading Explicitly

Don’t rely on cascading—make it explicit:

{
  "where": [
    { "@id": "ex:order-123", "?pred": "?val" },
    { "@id": "?item", "ex:orderId": "ex:order-123" },
    { "@id": "?item", "?itemPred": "?itemVal" }
  ],
  "delete": [
    { "@id": "ex:order-123", "?pred": "?val" },
    { "@id": "?item", "?itemPred": "?itemVal" }
  ]
}

6. Document Deletion Logic

Comment deletion logic:

// Hard delete expired sessions older than 30 days
// - Finds all sessions with expired=true and oldDate
// - Retracts all properties
// - Logs count of deleted sessions
await retractExpiredSessions();

7. Monitor Impact

Track retraction metrics:

  • Count of retractions
  • Entities affected
  • Indexing lag after large retractions
  • Query performance impact

Data Privacy Compliance

GDPR “Right to be Forgotten”

For compliance, consider:

  1. Soft delete first (marks as deleted)
  2. Schedule purge (actual removal from history)
  3. Anonymize references (replace with pseudonymous ID)

Example:

{
  "where": [{ "@id": "ex:user-123", "?pred": "?val" }],
  "delete": [{ "@id": "ex:user-123", "?pred": "?val" }],
  "insert": [{
    "@id": "ex:user-123",
    "ex:anonymized": true,
    "ex:anonymizedAt": "2024-01-22T10:30:00Z"
  }]
}

Note: True purging from history requires administrative operations beyond standard retractions.

Turtle and TriG Ingest

Fluree supports ingesting RDF data in Turtle (Terse RDF Triple Language) and TriG formats. Turtle is a compact, human-readable format for RDF triples, while TriG extends Turtle to support named graphs.

What is Turtle?

Turtle is a W3C standard format for writing RDF triples. It’s more readable than XML-based formats and commonly used in the Semantic Web community.

Example Turtle:

@prefix ex: <http://example.org/ns/> .
@prefix schema: <http://schema.org/> .

ex:alice a schema:Person ;
  schema:name "Alice" ;
  schema:email "alice@example.org" ;
  schema:age 30 .

ex:bob a schema:Person ;
  schema:name "Bob" ;
  schema:email "bob@example.org" .

Transaction Endpoints

Fluree supports Turtle and TriG on different endpoints with different semantics:

| Endpoint | Turtle (text/turtle) | TriG (application/trig) |
|---|---|---|
| /insert | Supported (fast direct path) | Not supported (400 error) |
| /upsert | Supported | Supported |

  • Insert (/insert): Pure insert semantics. Uses fast direct flake parsing. Will fail if subjects already exist with conflicting data. TriG is not supported because named graphs require the upsert path for GRAPH block extraction.
  • Upsert (/upsert): For each (subject, predicate) pair, existing values are retracted before new values are asserted. Supports TriG with GRAPH blocks for named graph ingestion.

Basic Turtle Transaction

Submit Turtle data via HTTP API:

# Insert (pure insert, fast path)
curl -X POST "http://localhost:8090/v1/fluree/insert?ledger=mydb:main" \
  -H "Content-Type: text/turtle" \
  --data-binary '@data.ttl'

# Or upsert (replace existing values)
curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
  -H "Content-Type: text/turtle" \
  --data-binary '@data.ttl'

File: data.ttl

@prefix ex: <http://example.org/ns/> .
@prefix schema: <http://schema.org/> .

ex:alice a schema:Person ;
  schema:name "Alice" ;
  schema:email "alice@example.org" .

Turtle Syntax

Prefixes

Define namespace prefixes:

@prefix ex: <http://example.org/ns/> .
@prefix schema: <http://schema.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

Basic Triples

ex:alice schema:name "Alice" .
ex:alice schema:age 30 .
ex:alice schema:email "alice@example.org" .

Semicolon Shorthand

Share subject across predicates:

ex:alice schema:name "Alice" ;
         schema:age 30 ;
         schema:email "alice@example.org" .

Equivalent to three separate triples.

Comma Shorthand

Share subject and predicate:

ex:alice schema:email "alice@example.org" ,
                      "alice@work.com" ,
                      "alice@personal.net" .

Creates three triples with the same subject and predicate.

Type Shorthand

ex:alice a schema:Person .

Equivalent to:

ex:alice rdf:type schema:Person .

Literals

Plain String:

ex:alice schema:name "Alice" .

Typed Literal:

ex:alice schema:age "30"^^xsd:integer .
ex:alice schema:price "29.99"^^xsd:decimal .
ex:alice schema:birthDate "1994-05-15"^^xsd:date .

Language-Tagged:

ex:alice schema:name "Alice"@en .
ex:alice schema:name "アリス"@ja .

Boolean:

ex:alice schema:active true .

Numbers:

ex:alice schema:age 30 .
ex:alice schema:height 1.68 .

IRIs

Full IRI:

<http://example.org/ns/alice> schema:name "Alice" .

Prefixed IRI:

ex:alice schema:name "Alice" .

Blank Nodes

Anonymous:

ex:alice schema:address [
  a schema:PostalAddress ;
  schema:streetAddress "123 Main St" ;
  schema:addressLocality "Springfield"
] .

Labeled:

ex:alice schema:address _:addr1 .

_:addr1 a schema:PostalAddress ;
  schema:streetAddress "123 Main St" .

Collections

RDF Lists:

ex:alice schema:favoriteColors ( "red" "blue" "green" ) .

Equivalent to an rdf:first/rdf:rest linked-list structure in RDF.

Bulk Import

From File

curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
  -H "Content-Type: text/turtle" \
  --data-binary '@large-dataset.ttl'

From URL

curl's @ syntax only reads local files, so fetch the remote document first and pipe the body in:

curl -s https://example.org/data.ttl | \
  curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
    -H "Content-Type: text/turtle" \
    --data-binary @-

Streaming Large Files

For very large files, split into batches. Be aware that split -l counts raw lines: it can cut a multi-line Turtle statement in half, and batches after the first lose the @prefix header, so this approach is only safe when every statement occupies a single line and each batch carries its own prefixes:

# Split large file
split -l 10000 large-dataset.ttl batch-

# Import batches
for file in batch-*; do
  curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
    -H "Content-Type: text/turtle" \
    --data-binary "@$file"
  sleep 1  # Allow indexing time
done
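A naive line-based split can break a multi-line Turtle statement in half and drops the @prefix header from later batches. A safer, statement-aware splitter, sketched in Python under the assumption that statement blocks are separated by blank lines (`split_turtle` is a hypothetical helper, not part of Fluree):

```python
# Split a Turtle document on blank-line boundaries and repeat the
# @prefix header in every batch, so each batch parses on its own.

def split_turtle(text, statements_per_batch=2):
    lines = text.splitlines()
    header = [ln for ln in lines if ln.startswith("@prefix")]
    body = "\n".join(ln for ln in lines if not ln.startswith("@prefix"))
    blocks = [b.strip() for b in body.split("\n\n") if b.strip()]
    batches = []
    for i in range(0, len(blocks), statements_per_batch):
        chunk = blocks[i:i + statements_per_batch]
        batches.append("\n".join(header) + "\n\n" + "\n\n".join(chunk) + "\n")
    return batches

ttl = """@prefix ex: <http://example.org/ns/> .

ex:a ex:name "A" .

ex:b ex:name "B" .

ex:c ex:name "C" .
"""
batches = split_turtle(ttl)
assert len(batches) == 2
assert all(b.startswith("@prefix") for b in batches)  # every batch is self-contained
```

Each resulting batch can then be posted with the same curl upsert command as above.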

Complete Example

@prefix ex: <http://example.org/ns/> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Company
ex:company-a a schema:Organization ;
  schema:name "Acme Corp" ;
  schema:url <https://acme.example.com> ;
  schema:foundingDate "2000-01-15"^^xsd:date .

# People
ex:alice a schema:Person ;
  schema:name "Alice" ;
  schema:email "alice@example.org" , "alice@work.com" ;
  schema:age 30 ;
  schema:worksFor ex:company-a ;
  schema:address [
    a schema:PostalAddress ;
    schema:streetAddress "123 Main St" ;
    schema:addressLocality "Springfield" ;
    schema:postalCode "12345"
  ] .

ex:bob a schema:Person ;
  schema:name "Bob" ;
  schema:email "bob@example.org" ;
  schema:age 25 ;
  schema:worksFor ex:company-a ;
  schema:knows ex:alice .

ex:carol a schema:Person ;
  schema:name "Carol" ;
  schema:email "carol@example.org" ;
  schema:knows ex:alice , ex:bob .

Format Conversion

From JSON-LD to Turtle

Many tools can convert between formats:

# Using rapper (from Redland)
rapper -i json-ld -o turtle data.jsonld > data.ttl

# Using riot (from Apache Jena)
riot --output=turtle data.jsonld > data.ttl

From RDF/XML to Turtle

rapper -i rdfxml -o turtle data.rdf > data.ttl

From N-Triples to Turtle

rapper -i ntriples -o turtle data.nt > data.ttl

Validation

Validate Turtle syntax before importing:

# Using rapper
rapper -i turtle -c data.ttl

# Using riot
riot --validate data.ttl

Error Handling

Syntax Errors

{
  "error": "ParseError",
  "message": "Invalid Turtle syntax at line 5",
  "code": "TURTLE_PARSE_ERROR",
  "details": {
    "line": 5,
    "column": 12,
    "token": "unexpected EOF"
  }
}

Invalid IRIs

{
  "error": "ValidationError",
  "message": "Invalid IRI: not a valid URI",
  "code": "INVALID_IRI",
  "details": {
    "iri": "not a uri",
    "line": 8
  }
}

Performance Tips

1. Use Batch Import

Import large datasets in batches of 10,000-100,000 triples.

2. Optimize Prefixes

Use short prefixes; they keep files readable and payloads smaller:

Good:

@prefix ex: <http://example.org/ns/> .
ex:alice ex:name "Alice" .

Verbose:

<http://example.org/ns/alice> <http://example.org/ns/name> "Alice" .

3. Monitor Memory

Large Turtle files consume memory during parsing. Split very large files.

4. Allow Indexing Time

After large imports, wait for indexing:

# Import
curl -X POST ... --data-binary '@batch.ttl'

# Wait for indexing
sleep 5

# Import next batch
curl -X POST ... --data-binary '@batch2.ttl'

Best Practices

1. Use Standard Vocabularies

Prefer well-known vocabularies:

@prefix schema: <http://schema.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .

2. Include Types

Always specify entity types:

ex:alice a schema:Person ;
  schema:name "Alice" .

3. Use Typed Literals

Be explicit about datatypes:

ex:alice schema:birthDate "1994-05-15"^^xsd:date ;
         schema:age "30"^^xsd:integer ;
         schema:height "1.68"^^xsd:decimal .

4. Document Namespaces

Comment your prefixes:

# Schema.org vocabulary for general entities
@prefix schema: <http://schema.org/> .

# Application-specific namespace
@prefix ex: <http://example.org/ns/> .

# Standard XSD datatypes
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

5. Validate Before Import

Always validate Turtle syntax:

rapper -i turtle -c data.ttl

6. Split Large Files

For files > 100MB, split into smaller batches.

7. Include Provenance

Add metadata about the import:

ex:dataset-import-2024-01-22 a ex:DatasetImport ;
  schema:dateCreated "2024-01-22T10:00:00Z"^^xsd:dateTime ;
  schema:author <https://example.org/users/admin> ;
  ex:sourceFile "data-2024-01.ttl" ;
  ex:recordCount 1234567 .

Comparing Formats

JSON-LD vs Turtle

JSON-LD:

  • Native to Fluree
  • Easy for JavaScript applications
  • Verbose for large datasets

Turtle:

  • More compact
  • Standard in RDF community
  • Better for bulk imports
  • Requires conversion for JavaScript apps

When to Use Turtle

Use Turtle for:

  • Large bulk imports
  • Integration with RDF tools
  • Data from Semantic Web sources
  • Data exchange with RDF systems

Use JSON-LD for:

  • Application integration
  • Real-time transactions
  • JavaScript/TypeScript apps
  • REST API interactions

TriG Format (Named Graphs)

TriG extends Turtle to support named graphs. Each named graph groups triples under a graph IRI.

What is TriG?

TriG is a W3C standard format that adds named graph support to Turtle syntax. It allows you to partition data into logical groups that can be queried independently.

Basic TriG Syntax

@prefix ex: <http://example.org/ns/> .
@prefix schema: <http://schema.org/> .

# Default graph triples (no GRAPH block)
ex:company a schema:Organization ;
    schema:name "Acme Corp" .

# Named graph for products
GRAPH <http://example.org/graphs/products> {
    ex:widget a schema:Product ;
        schema:name "Widget" ;
        schema:price "29.99"^^xsd:decimal .

    ex:gadget a schema:Product ;
        schema:name "Gadget" ;
        schema:price "49.99"^^xsd:decimal .
}

# Named graph for inventory
GRAPH <http://example.org/graphs/inventory> {
    ex:widget schema:inventory 42 ;
        schema:warehouse "main" .

    ex:gadget schema:inventory 15 ;
        schema:warehouse "secondary" .
}

Submitting TriG Data

TriG is only supported on the upsert endpoint (or transact). Use the application/trig content type:

# TriG requires upsert (for named graph support)
curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
  -H "Content-Type: application/trig" \
  --data-binary '@data.trig'

TriG on the /insert endpoint will return a 400 error because named graph extraction requires the upsert path.

Querying Named Graphs

After ingesting TriG data, query specific graphs using JSON-LD with the structured from object:

{
  "@context": { "schema": "http://schema.org/" },
  "from": {
    "@id": "mydb:main",
    "graph": "http://example.org/graphs/products"
  },
  "select": ["?name", "?price"],
  "where": [
    { "@id": "?product", "schema:name": "?name" },
    { "@id": "?product", "schema:price": "?price" }
  ]
}

For cross-graph queries, use fromNamed with aliases:

{
  "@context": { "schema": "http://schema.org/" },
  "from": "mydb:main",
  "fromNamed": [
    { "@id": "mydb:main", "alias": "products", "graph": "http://example.org/graphs/products" },
    { "@id": "mydb:main", "alias": "inventory", "graph": "http://example.org/graphs/inventory" }
  ],
  "select": ["?name", "?inventory", "?warehouse"],
  "where": [
    ["graph", "products", { "@id": "?product", "schema:name": "?name" }],
    ["graph", "inventory", { "@id": "?product", "schema:inventory": "?inventory", "schema:warehouse": "?warehouse" }]
  ]
}

Graph IDs

Fluree assigns internal graph IDs to named graphs:

Graph ID | Purpose
---------|--------------------------------------------
0        | Default graph (triples without GRAPH block)
1        | txn-meta (commit metadata)
2+       | User-defined named graphs

TriG with Transaction Metadata

You can combine named graphs with transaction metadata using the special #txn-meta graph fragment:

@prefix ex: <http://example.org/ns/> .
@prefix f: <https://ns.flur.ee/db#> .

# Transaction metadata (stored in txn-meta graph)
GRAPH <#txn-meta> {
    <fluree:commit:this> ex:jobId "batch-import-001" ;
        ex:source "warehouse-export" ;
        ex:operator "system-admin" .
}

# User data in named graph
GRAPH <http://example.org/graphs/products> {
    ex:widget a ex:Product ;
        ex:name "Widget" .
}

Limits

  • Maximum 256 named graphs per transaction
  • Maximum 8KB per graph IRI
  • Named graphs are queryable after indexing completes

When to Use TriG

Use TriG when you need to:

  • Partition data into logical groups
  • Separate data by source, tenant, or domain
  • Maintain provenance at the graph level
  • Integrate with RDF quad stores

Use plain Turtle when:

  • All data belongs in the default graph
  • Graph partitioning isn’t needed
  • Working with simpler data models

Bulk import (Rust API)

For high-throughput ingest of large Turtle datasets into a fresh ledger, prefer the bulk import pipeline exposed by fluree-db-api.

This pipeline:

  • Parses Turtle in parallel, but writes commits serially (hash-linked commit chain).
  • Streams run generation during import and builds multi-order binary indexes (SPOT/PSOT/POST/OPST).
  • Writes an index root to CAS and publishes it to the nameservice so queries can use the normal db() / query() path.

Temporary tmp_import/ session files are cleaned up on success (configurable).

Tools and Libraries

Command-Line Tools

Rapper (Redland):

# Install on macOS
brew install redland

# Parse Turtle
rapper -i turtle data.ttl

Riot (Apache Jena):

# Install
# Download from https://jena.apache.org/

# Validate
riot --validate data.ttl

Programming Libraries

JavaScript/TypeScript:

import { Parser } from 'n3';

const parser = new Parser();
const quads = parser.parse(turtleString);

Python:

from rdflib import Graph

g = Graph()
g.parse('data.ttl', format='turtle')

Java:

import org.apache.jena.rdf.model.*;

Model model = ModelFactory.createDefaultModel();
model.read("data.ttl", "TURTLE");

Signed / Credentialed Transactions

Fluree supports cryptographically signed transactions using JSON Web Signatures (JWS) and Verifiable Credentials (VC). Signed transactions provide authentication, integrity, and non-repudiation for all transaction operations.

Why Sign Transactions?

Signed transactions provide:

  • Authentication: Prove who submitted the transaction
  • Integrity: Ensure transaction hasn’t been tampered with
  • Non-repudiation: Transaction author cannot deny authorship
  • Authorization: Link transaction to specific identity for policy enforcement
  • Audit Trail: Complete provenance of all data changes

Basic Signed Transaction

Step 1: Create Transaction

Create your transaction as normal:

{
  "@context": {
    "ex": "http://example.org/ns/",
    "schema": "http://schema.org/"
  },
  "@graph": [
    {
      "@id": "ex:alice",
      "@type": "schema:Person",
      "schema:name": "Alice"
    }
  ]
}

Step 2: Sign with JWS

Sign the transaction using JWS:

import * as jose from 'jose';

const privateKey = await loadPrivateKey(); // hypothetical helper: load your Ed25519 private key from secure storage

const jws = await new jose.SignJWT(transaction)
  .setProtectedHeader({
    alg: 'EdDSA',
    kid: 'did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK'
  })
  .setIssuedAt()
  .setExpirationTime('15m')
  .sign(privateKey);

Step 3: Submit

Submit the signed transaction:

curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
  -H "Content-Type: application/jose" \
  -d "$jws"

JWS Format

Compact Serialization

eyJhbGciOiJFZDI1NTE5IiwidHlwIjoiSldUIn0.eyJAY29udGV4dCI6eyJleCI6Imh0...

Three base64url-encoded parts separated by dots:

  1. Header (algorithm, key ID)
  2. Payload (transaction)
  3. Signature
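As a sketch of that structure, the parts of a compact JWS can be split and base64url-decoded without verifying anything (Node shown, no library needed; the token is built inline for illustration):

```javascript
// Build a throwaway compact JWS and decode (not verify) its three parts.
const header = { alg: 'EdDSA', kid: 'did:key:z6Mkh...' };
const payload = { '@graph': [{ '@id': 'ex:alice' }] };

const b64url = (obj) => Buffer.from(JSON.stringify(obj)).toString('base64url');
const compact = `${b64url(header)}.${b64url(payload)}.c2lnbmF0dXJl`;

const [h, p, s] = compact.split('.');
const decodedHeader = JSON.parse(Buffer.from(h, 'base64url').toString());
const decodedPayload = JSON.parse(Buffer.from(p, 'base64url').toString());
// decodedHeader carries alg and kid; the third part is the raw signature bytes.
```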

JSON Serialization

{
  "payload": "eyJAY29udGV4dCI6eyJleCI6Imh0...",
  "signatures": [
    {
      "protected": "eyJhbGciOiJFZDI1NTE5In0",
      "signature": "c2lnbmF0dXJl..."
    }
  ]
}

Verifiable Credentials

Use W3C Verifiable Credentials for transactions:

{
  "@context": [
    "https://www.w3.org/2018/credentials/v1"
  ],
  "type": ["VerifiableCredential"],
  "issuer": "did:key:z6Mkh...",
  "issuanceDate": "2024-01-22T10:00:00Z",
  "credentialSubject": {
    "id": "did:key:z6Mkh...",
    "flureeTransaction": {
      "@context": {
        "ex": "http://example.org/ns/"
      },
      "@graph": [
        { "@id": "ex:alice", "schema:name": "Alice" }
      ]
    }
  },
  "proof": {
    "type": "Ed25519Signature2020",
    "created": "2024-01-22T10:00:00Z",
    "verificationMethod": "did:key:z6Mkh...#z6Mkh...",
    "proofPurpose": "authentication",
    "proofValue": "z58DAdFfa9SkqZMVP..."
  }
}

Submit with:

curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
  -H "Content-Type: application/vc+ld+json" \
  -d @credential.json

Supported Algorithm

EdDSA (Ed25519):

  • Fast, secure, deterministic
  • 64-byte signatures
  • 128-bit security level

Identity Management

Decentralized Identifiers (DIDs)

Use DIDs to identify transaction authors:

did:key (simplest):

did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK

did:web (organization-managed):

did:web:example.com:users:alice

did:ion (blockchain-based):

did:ion:EiClkZMDxPKqC9c-umQfTkR8vvZ9JPhl_xLDI9Nfk38w5w

Key Resolution

In standalone server mode, signed requests are verified using Ed25519 JWS material taken from the request itself (for example an embedded JWK or did:key) or from configured OIDC/JWKS issuers. There is no /admin/keys registration endpoint.

Transaction Provenance

Signed transactions include author information in commit metadata:

{
  "t": 42,
  "timestamp": "2024-01-22T10:30:00Z",
  "commit_id": "bafybeig...commitT42",
  "author": "did:key:z6Mkh...",
  "signature": "z58DAdFfa9...",
  "flakes_added": 3,
  "flakes_retracted": 0
}

Query provenance:

PREFIX f: <https://ns.flur.ee/db#>

SELECT ?t ?author ?timestamp
WHERE {
  ?commit f:t ?t ;
          f:author ?author ;
          f:timestamp ?timestamp .
}
ORDER BY DESC(?t)

Policy-Based Authorization

Use signed transaction author for authorization:

{
  "@context": {
    "ex": "http://example.org/ns/",
    "f": "https://ns.flur.ee/db#"
  },
  "@id": "ex:admin-policy",
  "f:policy": [
    {
      "f:subject": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK",
      "f:action": "transact",
      "f:allow": true
    }
  ]
}

Only transactions signed by this DID will be accepted.

Code Examples

JavaScript/TypeScript

import * as jose from 'jose';

async function signTransaction(transaction: object, privateKey: CryptoKey) {
  const jws = await new jose.SignJWT(transaction)
    .setProtectedHeader({
      alg: 'EdDSA',
      kid: 'did:key:z6Mkh...'
    })
    .setIssuedAt()
    .setExpirationTime('15m')
    .sign(privateKey);
  
  return jws;
}

async function submitSignedTransaction(ledger: string, transaction: object, privateKey: CryptoKey) {
  const signed = await signTransaction(transaction, privateKey);
  
  const response = await fetch(`http://localhost:8090/v1/fluree/upsert?ledger=${ledger}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/jose' },
    body: signed
  });
  
  return await response.json();
}

Python

from jwcrypto import jwk, jws
import json
import requests

def sign_transaction(transaction, private_key):
    # Create JWK from private key
    key = jwk.JWK.from_json(private_key)
    
    # Create JWS
    payload = json.dumps(transaction).encode('utf-8')
    jws_token = jws.JWS(payload)
    jws_token.add_signature(
        key,
        alg='EdDSA',
        protected=json.dumps({"kid": "did:key:z6Mkh..."})
    )
    
    return jws_token.serialize()

def submit_signed_transaction(ledger, transaction, private_key):
    signed = sign_transaction(transaction, private_key)
    
    response = requests.post(
        f'http://localhost:8090/v1/fluree/upsert?ledger={ledger}',
        headers={'Content-Type': 'application/jose'},
        data=signed
    )
    
    return response.json()

Verification Process

When Fluree receives a signed transaction:

  1. Extract signature and header
  2. Resolve key ID (kid) to public key
  3. Verify signature using public key
  4. Check expiration (if exp claim present)
  5. Validate issuer (if required by policy)
  6. Apply authorization policies based on DID
  7. Process transaction if verification succeeds

Error Handling

Invalid Signature

{
  "error": "SignatureVerificationFailed",
  "message": "Invalid signature",
  "code": "INVALID_SIGNATURE",
  "details": {
    "kid": "did:key:z6Mkh...",
    "reason": "Signature does not match"
  }
}

Expired Transaction

{
  "error": "TokenExpired",
  "message": "Transaction signature expired",
  "code": "TOKEN_EXPIRED",
  "details": {
    "exp": 1642857600,
    "now": 1642858000
  }
}

Key Not Found

{
  "error": "KeyNotFound",
  "message": "Public key not registered",
  "code": "KEY_NOT_FOUND",
  "details": {
    "kid": "did:key:z6Mkh..."
  }
}

Unauthorized

{
  "error": "Forbidden",
  "message": "Policy denies transact permission",
  "code": "POLICY_DENIED",
  "details": {
    "subject": "did:key:z6Mkh...",
    "action": "transact",
    "ledger": "mydb:main"
  }
}

Best Practices

1. Use EdDSA (Ed25519)

Best security and performance:

{
  "alg": "EdDSA",
  "kid": "did:key:z6Mkh..."
}

2. Set Expiration

Always include expiration:

.setExpirationTime('15m')  // 15 minutes

3. Secure Key Storage

Never hardcode private keys:

Good:

const privateKey = await loadKeyFromSecureStorage();

Bad:

const privateKey = "hardcoded-key-here";

4. Use did:key for Simplicity

For simple deployments:

did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK

5. Implement Key Rotation

Rotate keys every 90-180 days:

// Sketch only: registerKey/revokeKey are hypothetical helpers for however
// your deployment distributes public keys (e.g. publishing to a JWKS
// endpoint); Fluree itself has no key-registration API.
async function rotateKey() {
  const newKey = generateKeyPair();
  await registerKey(newKey.publicKey);
  await revokeKey(oldKey.kid);
  updateApplicationKey(newKey);
}

6. Include Request ID

Add unique ID to prevent replay:

.setJti(crypto.randomUUID())  // sets the 'jti' claim to a unique ID

7. Use HTTPS

Always submit signed transactions over HTTPS so tokens cannot be intercepted in transit and replayed.

Compliance and Auditing

Complete Audit Trail

Signed transactions provide complete audit trail:

PREFIX f: <https://ns.flur.ee/db#>

SELECT ?t ?author ?timestamp
WHERE {
  ?commit f:t ?t ;
          f:author ?author ;
          f:timestamp ?timestamp .
  ?commit f:assert ?assertion .
  ?assertion ?predicate ?object .
}
ORDER BY DESC(?t)

Regulatory Compliance

Signed transactions support:

  • SOC 2 (audit trails)
  • HIPAA (data provenance)
  • GDPR (data processing records)
  • PCI DSS (transaction logs)

Non-Repudiation

Cryptographic signatures provide non-repudiation:

  • Author cannot deny submitting transaction
  • Tampering is detectable
  • Legal admissibility in disputes

Commit Receipts and tx-id

Every successful transaction returns a commit receipt containing metadata about the transaction. This receipt provides important information for tracking, auditing, and referencing transactions.

Commit Receipt Structure

Basic commit receipt:

{
  "t": 42,
  "timestamp": "2024-01-22T10:30:00.000Z",
  "commit_id": "bafybeig...commitT42",
  "flakes_added": 15,
  "flakes_retracted": 3,
  "previous_commit_id": "bafybeig...commitT41"
}

Receipt Fields

Transaction Time (t)

The transaction time is a monotonically increasing integer uniquely identifying this transaction:

{
  "t": 42
}

Properties:

  • Unique within a ledger
  • Monotonically increasing (never decreases)
  • Used for time travel queries
  • Basis for temporal ordering

Usage:

# Query at specific transaction
curl -X POST http://localhost:8090/v1/fluree/query \
  -d '{"from": "mydb:main@t:42", ...}'

Read-after-write consistency: The t value is the key to ensuring queries see freshly committed data. Pass it as min_t to refresh() to gate queries on a minimum transaction time. See Time Travel — Consistency and Read-After-Write for details.

Timestamp

ISO 8601 formatted timestamp of when the transaction was committed:

{
  "timestamp": "2024-01-22T10:30:00.000Z"
}

Properties:

  • UTC timezone
  • Millisecond precision
  • Server-assigned (not client-provided)
  • Monotonically non-decreasing, consistent with transaction-time (t) ordering

Usage:

# Query at specific time
curl -X POST http://localhost:8090/v1/fluree/query \
  -d '{"from": "mydb:main@iso:2024-01-22T10:30:00Z", ...}'

Commit ID

Content-addressed identifier for the commit:

{
  "commit_id": "bafybeig...commitT42"
}

Properties:

  • CIDv1 value (base32-lower multibase string)
  • Derived from the commit’s canonical bytes via SHA-256
  • Storage-agnostic – does not depend on where the commit is stored
  • Can be used to fetch the commit from any content store

Usage:

# Query at specific commit
curl -X POST http://localhost:8090/v1/fluree/query \
  -d '{"from": "mydb:main@commit:bafybeig...commitT42", ...}'

Flake Counts

Number of triples added and retracted:

{
  "flakes_added": 15,
  "flakes_retracted": 3
}

flakes_added: number of new triples asserted
flakes_retracted: number of existing triples removed

Net change: flakes_added - flakes_retracted

Previous Commit

ContentId of the previous commit (forms a chain):

{
  "previous_commit_id": "bafybeig...commitT41"
}

Properties:

  • Links to parent commit by ContentId
  • Forms immutable commit chain
  • Enables commit history traversal
  • null for first transaction (t=1)

Extended Receipt Fields

Author (Signed Transactions)

For signed transactions, includes author DID:

{
  "t": 42,
  "author": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK",
  "signature": "z58DAdFfa9SkqZMVP...",
  ...
}

Message

Optional commit message (if provided):

{
  "t": 42,
  "message": "Add new customer records for Q1 2024",
  ...
}

Ledger

Ledger ID:

{
  "t": 42,
  "ledger": "mydb:main",
  ...
}

Duration

Transaction processing time in milliseconds:

{
  "t": 42,
  "duration_ms": 45,
  ...
}

Using Transaction IDs

Referencing Transactions

Store transaction ID for later reference:

const receipt = await transact({
  "@graph": [{ "@id": "ex:alice", "schema:name": "Alice" }]
});

// Store for audit trail
await logTransaction({
  entity: "ex:alice",
  operation: "create",
  transactionId: receipt.t,
  timestamp: receipt.timestamp
});

Historical Queries

Query data at specific transaction:

// Get data as it was at transaction 42
const historicalData = await query({
  from: `mydb:main@t:${receipt.t}`,
  select: ["?name"],
  where: [{ "@id": "ex:alice", "schema:name": "?name" }]
});

Commit Verification

Verify commit integrity by re-deriving the ContentId from fetched bytes:

async function verifyCommit(receipt) {
  const bytes = await contentStore.get(receipt.commit_id);
  const derivedCid = computeContentId("Commit", bytes);

  if (derivedCid !== receipt.commit_id) {
    throw new Error('Commit integrity violation!');
  }
}

Commit Chain

Commits form an immutable chain:

t=1 (cid:aaa) ← t=2 (cid:bbb) ← t=3 (cid:ccc) ← t=4 (cid:ddd)
  ↑                ↑                ↑                ↑
  |                |                |                |
previous=null   previous=aaa    previous=bbb    previous=ccc
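The back-links in the diagram can be checked mechanically. A minimal sketch using the receipt fields from this chapter:

```javascript
// Model the four commits from the diagram and verify that each
// previous_commit_id points at the preceding commit's commit_id.
const commits = [
  { t: 1, commit_id: 'aaa', previous_commit_id: null },
  { t: 2, commit_id: 'bbb', previous_commit_id: 'aaa' },
  { t: 3, commit_id: 'ccc', previous_commit_id: 'bbb' },
  { t: 4, commit_id: 'ddd', previous_commit_id: 'ccc' },
];

function chainIsIntact(cs) {
  return cs.every((c, i) =>
    i === 0 ? c.previous_commit_id === null
            : c.previous_commit_id === cs[i - 1].commit_id);
}
```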

Traversing History

Walk the commit chain:

// Sketch: getCommit() is a hypothetical helper returning commit metadata for
// a transaction time; we assume it exposes the parent's t as previous_t
// (commit receipts expose the parent as previous_commit_id).
async function getCommitHistory(ledger, fromT, toT) {
  const history = [];
  let currentT = fromT;

  while (currentT !== null && currentT >= toT) {
    const commit = await getCommit(ledger, currentT);
    history.push(commit);
    currentT = commit.previous_t;
  }

  return history;
}

Querying Commit Metadata

SPARQL Query for Commits

PREFIX f: <https://ns.flur.ee/db#>

SELECT ?t ?timestamp ?commitId ?author
WHERE {
  ?commit a f:Commit ;
          f:t ?t ;
          f:timestamp ?timestamp ;
          f:commitId ?commitId .
  OPTIONAL { ?commit f:author ?author }
}
ORDER BY DESC(?t)
LIMIT 10

JSON-LD Query for Recent Commits

{
  "@context": {
    "f": "https://ns.flur.ee/db#"
  },
  "select": ["?t", "?timestamp", "?commitId"],
  "where": [
    { "@id": "?commit", "@type": "f:Commit" },
    { "@id": "?commit", "f:t": "?t" },
    { "@id": "?commit", "f:timestamp": "?timestamp" },
    { "@id": "?commit", "f:commitId": "?commitId" }
  ],
  "orderBy": ["-?t"],
  "limit": 10
}

Receipt Storage

Application Database

Store receipts in your application database:

CREATE TABLE transaction_receipts (
  id SERIAL PRIMARY KEY,
  ledger VARCHAR(255),
  transaction_t INTEGER,
  commit_id TEXT,
  timestamp TIMESTAMP,
  flakes_added INTEGER,
  flakes_retracted INTEGER,
  author VARCHAR(255),
  created_at TIMESTAMP DEFAULT NOW()
);

Document Store

Store as JSON documents:

await mongodb.collection('receipts').insertOne({
  ledger: receipt.ledger,
  t: receipt.t,
  commit_id: receipt.commit_id,
  timestamp: receipt.timestamp,
  flakes: {
    added: receipt.flakes_added,
    retracted: receipt.flakes_retracted
  },
  metadata: {
    author: receipt.author,
    duration_ms: receipt.duration_ms
  }
});

Time-Series Database

For analytics:

await influxdb.writePoint({
  measurement: 'transactions',
  tags: { ledger: receipt.ledger },
  fields: {
    t: receipt.t,
    flakes_added: receipt.flakes_added,
    flakes_retracted: receipt.flakes_retracted,
    duration_ms: receipt.duration_ms
  },
  timestamp: new Date(receipt.timestamp)
});

Audit Trail

Transaction Log

Build complete audit log from receipts:

async function buildAuditLog(ledger, startDate, endDate) {
  const receipts = await fetchReceipts(ledger, startDate, endDate);
  
  return receipts.map(r => ({
    time: r.timestamp,
    transactionId: r.t,
    author: r.author || 'anonymous',
    changes: {
      added: r.flakes_added,
      removed: r.flakes_retracted
    },
    commit: r.commit_id,
    verifiable: true
  }));
}

Compliance Reports

Generate compliance reports:

async function generateComplianceReport(ledger, period) {
  const receipts = await fetchReceipts(ledger, period.start, period.end);
  
  return {
    period: period,
    totalTransactions: receipts.length,
    totalChanges: receipts.reduce((sum, r) => sum + r.flakes_added, 0),
    authors: [...new Set(receipts.map(r => r.author))],
    verifiedChain: verifyCommitChain(receipts)
  };
}

Performance Monitoring

Transaction Metrics

Track transaction performance:

function analyzeReceipts(receipts, periodHours) {
  const durations = receipts.map(r => r.duration_ms);
  const sizes = receipts.map(r => r.flakes_added + r.flakes_retracted);

  return {
    avgDuration: average(durations),
    maxDuration: Math.max(...durations),
    avgSize: average(sizes),
    maxSize: Math.max(...sizes),
    throughput: receipts.length / periodHours
  };
}

Alert on Anomalies

function checkForAnomalies(receipt) {
  if (receipt.duration_ms > 1000) {
    alert(`Slow transaction: ${receipt.t} took ${receipt.duration_ms}ms`);
  }
  
  if (receipt.flakes_added > 10000) {
    alert(`Large transaction: ${receipt.t} added ${receipt.flakes_added} flakes`);
  }
}

Best Practices

1. Always Store Receipts

Store transaction receipts for audit trail:

const receipt = await transact(transaction);
await storeReceipt(receipt);

2. Verify Commit Chain

Periodically verify commit chain integrity:

async function verifyChainIntegrity(ledger) {
  const receipts = await fetchAllReceipts(ledger);
  
  for (let i = 1; i < receipts.length; i++) {
    if (receipts[i].previous_commit_id !== receipts[i-1].commit_id) {
      throw new Error(`Chain broken at t=${receipts[i].t}`);
    }
  }
}

3. Use Transaction IDs for References

Store transaction IDs rather than timestamps:

Good:

{ entity: "ex:alice", createdAt_t: 42 }

Less reliable:

{ entity: "ex:alice", createdAt: "2024-01-22T10:30:00Z" }

4. Monitor Performance

Track receipt metadata for performance insights:

const avgDuration = receipts.reduce((sum, r) => sum + r.duration_ms, 0) / receipts.length;

5. Include in Error Handling

Log receipt info on errors:

try {
  const receipt = await transact(transaction);
  logger.info(`Transaction successful: t=${receipt.t}`);
} catch (err) {
  logger.error(`Transaction failed`, {
    error: err.message,
    transaction: transaction
  });
}

Indexing Side-Effects

Transactions in Fluree trigger background indexing processes that build query-optimized data structures. Understanding these side-effects is crucial for performance tuning and capacity planning.

What is Indexing?

Indexing is the process of building query-optimized data structures from transaction data. Fluree maintains four index permutations (SPOT, POST, OPST, PSOT) that enable efficient query execution.

Commit vs Index

Commit (immediate):

  • Transaction written to log
  • Small, append-only files
  • Published to nameservice immediately
  • Available for time travel queries

Index (asynchronous):

  • Query-optimized structures built
  • Background process
  • Published to nameservice when complete
  • May lag behind commits

Index Structure

Fluree maintains four index permutations:

SPOT (Subject-Predicate-Object-Time)

ex:alice → schema:name → "Alice" → [t=1, t=5, t=10]
ex:alice → schema:age → 30 → [t=1]
ex:alice → schema:age → 31 → [t=10]

Optimized for: “What are all properties of this subject?”

POST (Predicate-Object-Subject-Time)

schema:name → "Alice" → ex:alice → [t=1, t=5, t=10]
schema:age → 30 → ex:alice → [t=1]
schema:age → 31 → ex:alice → [t=10]

Optimized for: “What subjects have this property/value?”

OPST (Object-Predicate-Subject-Time)

"Alice" → schema:name → ex:alice → [t=1, t=5, t=10]
30 → schema:age → ex:alice → [t=1]
31 → schema:age → ex:alice → [t=10]

Optimized for: “What subjects have this value?”

PSOT (Predicate-Subject-Object-Time)

schema:name → ex:alice → "Alice" → [t=1, t=5, t=10]
schema:age → ex:alice → 30 → [t=1]
schema:age → ex:alice → 31 → [t=10]

Optimized for: “What are all values for this predicate?”
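A toy illustration (not Fluree internals) of why multiple orderings matter: the same set of flakes serves different query shapes depending on which component leads the lookup.

```javascript
// The same flakes, queried two ways: subject-first (SPOT-shaped) and
// predicate-object-first (POST-shaped).
const flakes = [
  { s: 'ex:alice', p: 'schema:name', o: 'Alice', t: 1 },
  { s: 'ex:alice', p: 'schema:age', o: 31, t: 10 },
  { s: 'ex:bob', p: 'schema:name', o: 'Alice', t: 5 },
];

// SPOT-shaped lookup: "what are all properties of ex:alice?"
const aboutAlice = flakes.filter(f => f.s === 'ex:alice');

// POST-shaped lookup: "which subjects have schema:name = 'Alice'?"
const namedAlice = flakes
  .filter(f => f.p === 'schema:name' && f.o === 'Alice')
  .map(f => f.s);
```

In a real index each ordering is a sorted structure, so these lookups are range scans rather than full filters.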

Indexing Pipeline

1. Transaction Commit

t=42: Transaction committed
  - Flakes written to transaction log
  - Commit published to nameservice
  - commit_t updated to 42

2. Index Trigger

Background indexing process detects new commits:

Indexer: commit_t=42, index_t=40
Indexer: Need to index t=41, t=42

3. Index Building

Process transactions to build indexes:

For each flake in t=41, t=42:
  - Update SPOT index
  - Update POST index
  - Update OPST index
  - Update PSOT index

4. Index Publication

When complete, publish new index:

  - Write index snapshot to storage
  - Publish index_id to nameservice
  - Update index_t to 42

Novelty Layer

The novelty layer is the gap between indexed and committed data:

commit_t = 45
index_t = 40
novelty layer = [t=41, t=42, t=43, t=44, t=45]

Query Execution with Novelty

Queries combine index + novelty:

Query Result = Indexed Data (t ≤ 40) + Novelty Layer (41 ≤ t ≤ 45)

Performance Impact:

  • Small novelty: Fast queries (mostly indexed)
  • Large novelty: Slower queries (more transaction replay)
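The index + novelty combination can be sketched as follows. This is an assumed simplification (it ignores retractions): indexed facts cover t up to index_t, novelty facts are replayed on top, and the latest fact wins.

```javascript
// Facts with t <= indexT come from the index; newer facts are replayed
// from the novelty layer and win because they are later in time.
const indexT = 40;
const indexed = [{ s: 'ex:alice', p: 'schema:age', o: 30, t: 1 }];
const novelty = [{ s: 'ex:alice', p: 'schema:age', o: 31, t: 42 }];

function currentValue(s, p) {
  const facts = indexed
    .concat(novelty.filter(f => f.t > indexT))
    .filter(f => f.s === s && f.p === p)
    .sort((a, b) => a.t - b.t);
  return facts.length ? facts[facts.length - 1].o : undefined;
}
```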

Indexing Performance

Transaction Size Impact

Larger transactions take longer to index:

Transaction with 10 flakes:
  - 10 flakes × 4 indexes = 40 index updates
  - Indexing time: ~1ms

Transaction with 10,000 flakes:
  - 10,000 flakes × 4 indexes = 40,000 index updates
  - Indexing time: ~100ms

Indexing Rate

Typical indexing rates:

Light load:
  - 1,000 flakes/second
  - ~10 moderate transactions/second

Heavy load:
  - 10,000 flakes/second
  - ~100 moderate transactions/second

Actual rates depend on:

  • Hardware (CPU, disk I/O)
  • Storage backend (memory, file, AWS)
  • Transaction patterns
  • System load

Monitoring Indexing

Check Indexing Status

curl http://localhost:8090/v1/fluree/info/mydb:main

Response:

{
  "ledger_id": "mydb:main",
  "commit_t": 150,
  "index_t": 140
}

Indexing lag (txns): commit_t - index_t = number of unindexed transactions

Healthy vs Unhealthy

Healthy:

commit_t = 1000
index_t = 998
novelty = 2 transactions (good!)

Unhealthy:

commit_t = 1000
index_t = 850
novelty = 150 transactions (indexing lag!)

Indexing Lag

Indexing lag occurs when indexing can’t keep up with transaction rate.

Causes

  1. High Transaction Rate

    • More transactions than indexing can handle
    • Sustained write load
  2. Large Transactions

    • Individual transactions with many flakes
    • Bulk imports
  3. Resource Constraints

    • CPU bottleneck
    • Disk I/O bottleneck
    • Memory pressure
  4. Storage Backend Latency

    • Slow storage (network attached)
    • AWS S3 latency

Impact

Large indexing lag affects:

Query Performance:

  • More novelty to replay
  • Slower query execution
  • Higher CPU usage for queries

Memory Usage:

  • Novelty layer held in memory
  • Larger memory footprint

Backup/Recovery:

  • Larger gap to replay
  • Longer recovery times

Tuning Indexing

Background indexing is controlled primarily by:

  • Enabling/disabling background indexing (--indexing-enabled / FLUREE_INDEXING_ENABLED)
  • Novelty thresholds that trigger indexing / apply backpressure (--reindex-min-bytes, --reindex-max-bytes)

See Operations: Configuration and Background Indexing for the canonical settings and tuning guidance.

Dedicated Indexing Process

For high-load deployments, run dedicated indexer:

# Main server (transact only; background indexing disabled)
fluree-server --indexing-enabled=false

# Indexing server
./fluree-db-indexer --ledgers mydb:main,mydb:dev

Transaction Patterns and Indexing

Batch Transactions

Good pattern:

// Batch into reasonable sizes
const batchSize = 1000;
for (let i = 0; i < entities.length; i += batchSize) {
  const batch = entities.slice(i, i + batchSize);
  await transact({ "@graph": batch });
  
  // Allow indexing time
  if (i % (batchSize * 10) === 0) {
    await sleep(1000);
  }
}

Bad pattern:

// Single giant transaction
await transact({ "@graph": allEntities });  // 1 million entities!

Continuous Transactions

For continuous transaction load:

async function writeWithBackpressure(data) {
  const status = await checkIndexingStatus();
  
  const lag = status.commit_t - status.index_t;
  if (lag > 100) {
    // Too much lag, slow down
    await sleep(1000);
  }
  
  await transact(data);
}

Bulk Imports

For large imports:

async function bulkImport(entities) {
  const batchSize = 1000;
  
  for (let i = 0; i < entities.length; i += batchSize) {
    const batch = entities.slice(i, i + batchSize);
    await transact({ "@graph": batch });
    
    // Wait for indexing to catch up every 10 batches
    if ((i / batchSize) % 10 === 0) {
      await waitForIndexing();
    }
    
    console.log(`Imported ${i + batch.length} / ${entities.length}`);
  }
}

async function waitForIndexing() {
  while (true) {
    const status = await checkIndexingStatus();
    const lag = status.commit_t - status.index_t;
    if (lag < 5) break;
    await sleep(1000);
  }
}

Graph Source Indexing

Graph sources have their own indexing processes:

BM25 Indexing

Full-text search indexes built asynchronously:

t=100: Transaction with new documents
  - Main index updated
  - BM25 indexer triggered
  - Documents added to BM25 index

Vector Search Indexing

Vector embeddings can be indexed separately for approximate nearest-neighbor (ANN) search via HNSW vector indexes (implemented with usearch, feature-gated behind the vector feature).

Inline similarity functions (dotProduct, cosineSimilarity, euclideanDistance) do not require a separate graph-source index; they compute scores directly during query execution.

t=100: Transaction with embeddings
  - Main index updated
  - Vector indexer triggered
  - Vectors added to vector index

See Vector Search for details on HNSW vector indexes and query syntax.

Best Practices

1. Monitor Novelty Layer

Track indexing lag:

setInterval(async () => {
  const status = await checkIndexingStatus();
  const lag = status.commit_t - status.index_t;
  metrics.gauge('index_lag_txns', lag);
  
  if (lag > 100) {
    logger.warn(`High indexing lag: ${lag} transactions`);
  }
}, 10000);  // Check every 10 seconds

2. Batch Appropriately

Keep transactions reasonable size:

  • Recommended: 100-1000 entities per transaction
  • Maximum: 10,000 entities per transaction

3. Rate Limiting

Implement rate limiting for heavy write loads:

const rateLimiter = new RateLimiter({
  tokensPerInterval: 100,
  interval: "minute"
});

await rateLimiter.removeTokens(1);
await transact(data);

4. Scheduled Imports

Run large imports during off-hours:

if (isOffPeakHours()) {
  await runBulkImport();
} else {
  logger.info('Deferring bulk import to off-peak hours');
}

5. Alert on Lag

Set up alerts for indexing lag:

const lag = status.commit_t - status.index_t;
if (lag > 200) {
  alert('Critical: Indexing lag > 200 transactions');
}

6. Capacity Planning

Plan capacity based on write load:

Expected load: 10,000 transactions/day
Average size: 100 flakes/transaction
Total: 1,000,000 flakes/day

Indexing capacity needed: ~12 flakes/second
With 4× safety margin: ~50 flakes/second
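The same arithmetic as code, using the example figures from the text; substitute your own load numbers.

```javascript
// Capacity planning: daily flake volume divided over a day, with margin.
const txPerDay = 10_000;
const flakesPerTx = 100;

const flakesPerDay = txPerDay * flakesPerTx;  // 1,000,000 flakes/day
const perSecond = flakesPerDay / 86_400;      // ≈ 11.6 flakes/second
const withMargin = perSecond * 4;             // ≈ 46.3; provision ~50 flakes/second
```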

Troubleshooting

High indexing lag

Symptom: commit_t - index_t growing continuously

Causes:

  • Transaction rate exceeds indexing capacity
  • Large transactions
  • Resource constraints

Solutions:

  • Reduce transaction rate
  • Split large transactions
  • Increase indexing resources
  • Tune indexing parameters

Slow Queries

Symptom: Queries slower than expected

Possible Cause: Large novelty layer

Check:

curl http://localhost:8090/v1/fluree/info/mydb:main | jq '.commit_t - .index_t'

Solution: Wait for indexing or reduce write rate

Index Memory Usage

Symptom: High memory usage

Cause: Large indexes or large novelty layer

Solutions:

  • Increase system memory
  • Reduce novelty layer
  • Compact indexes (if supported)

Ledger Configuration (Config Graph)

Fluree stores ledger-level configuration as data inside each ledger, in a dedicated system graph called the config graph. This is distinct from server configuration (TOML files, environment variables) which controls how the Fluree process runs.

The config graph holds RDF triples that define operational defaults for the ledger: which policy rules apply, whether SHACL validation runs, what reasoning modes are active, which properties enforce uniqueness, and more. Because config lives inside the ledger, it is:

  • Immutable and time-travelable — config at any historical t is recoverable
  • Auditable — every config change is a signed, committed transaction
  • Replicable — config travels with the ledger across nodes and forks
  • Replay-safe — deterministic interpretation without runtime environment state

Graph layout

Every ledger reserves system named graphs:

Graph         | IRI pattern                     | Purpose
--------------|---------------------------------|---------------------
Default graph | (implicit)                      | Application data
Txn-meta      | urn:fluree:{ledger_id}#txn-meta | Commit metadata
Config graph  | urn:fluree:{ledger_id}#config   | Ledger configuration

User-defined named graphs (created via TriG) are identified by their IRI and allocated after the system graphs.

The config graph IRI is deterministic — derived from the ledger identifier. For a ledger mydb:main, the config graph is urn:fluree:mydb:main#config.
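Because the IRI is derived mechanically from the ledger identifier, clients can compute it without asking the server; a minimal sketch of the pattern above:

```python
def config_graph_iri(ledger_id: str) -> str:
    # urn:fluree:{ledger_id}#config, per the deterministic pattern above
    return f"urn:fluree:{ledger_id}#config"

print(config_graph_iri("mydb:main"))  # urn:fluree:mydb:main#config
```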

Core concepts

f:LedgerConfig

A single f:LedgerConfig resource in the config graph defines ledger-wide defaults. If multiple exist, the one with the lexicographically smallest @id wins (with a logged warning).
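The tie-break is a plain lexicographic minimum over subject IRIs. A sketch of the selection rule (a model of the behavior described above, not Fluree internals):

```python
import warnings

def select_ledger_config(configs: list[dict]) -> dict:
    # Lexicographically smallest @id wins; more than one is a misconfiguration.
    if len(configs) > 1:
        warnings.warn("multiple f:LedgerConfig resources; using smallest @id")
    return min(configs, key=lambda c: c["@id"])

chosen = select_ledger_config([
    {"@id": "urn:fluree:mydb:main:config:zz"},
    {"@id": "urn:fluree:mydb:main:config:ledger"},
])
print(chosen["@id"])  # urn:fluree:mydb:main:config:ledger
```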

Setting groups

Configuration is organized into independent setting groups, each governing a different subsystem:

| Setting group | Subsystem | Key fields |
|---|---|---|
| f:policyDefaults | Policy enforcement | f:defaultAllow, f:policySource, f:policyClass |
| f:shaclDefaults | SHACL validation | f:shaclEnabled, f:shapesSource, f:validationMode |
| f:reasoningDefaults | OWL/RDFS reasoning | f:reasoningModes, f:schemaSource |
| f:datalogDefaults | Datalog rules | f:datalogEnabled, f:rulesSource |
| f:transactDefaults | Transaction constraints | f:uniqueEnabled, f:constraintsSource |
| f:fullTextDefaults | Full-text indexing | f:defaultLanguage, f:property |

Each group is resolved independently — locking down policy does not affect whether reasoning can be overridden.

Per-graph overrides

Ledger-wide defaults apply to all graphs. For finer control, f:graphOverrides on the f:LedgerConfig contains f:GraphConfig entries that override settings for specific named graphs. See Override control for the full resolution model.

Privileged system read

Config is read via a privileged system read that bypasses policy enforcement. This is necessary because config defines the policy — reading it through the policy-enforced path would create a circular dependency. User queries against the config graph still go through normal policy enforcement.

Lagging config

Config changes take effect on the next transaction, not the current one. The transaction pipeline reads config from the pre-transaction state. This prevents a transaction from “authorizing itself” by changing config within its own payload.

Common patterns

These recipes cover typical scenarios. Each assumes the ledger mydb:main — substitute your own ledger ID.

Lock down a production ledger

Deny all access by default and require policy rules for every operation. Use f:OverrideNone so no query can bypass:

@prefix f: <https://ns.flur.ee/db#> .

GRAPH <urn:fluree:mydb:main#config> {
  <urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
    f:policyDefaults [
      f:defaultAllow false ;
      f:policySource [
        a f:GraphRef ;
        f:graphSource [ f:graphSelector f:defaultGraph ]
      ] ;
      f:overrideControl f:OverrideNone
    ] .
}

After this transaction, the next transaction and all subsequent queries will require matching policy rules in the default graph. Make sure policy rules are already in place before enabling this — see Config mutation governance.

Enable SHACL validation in development (warn mode)

Validate data shapes but log warnings instead of rejecting — useful during development:

@prefix f: <https://ns.flur.ee/db#> .

GRAPH <urn:fluree:mydb:main#config> {
  <urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
    f:shaclDefaults [
      f:shaclEnabled true ;
      f:shapesSource [
        a f:GraphRef ;
        f:graphSource [ f:graphSelector f:defaultGraph ]
      ] ;
      f:validationMode f:ValidationWarn
    ] .
}

Switch to f:ValidationReject when ready for production.

Enforce unique emails

Two-step setup: annotate the property, then enable enforcement:

@prefix f:  <https://ns.flur.ee/db#> .
@prefix ex: <http://example.org/ns/> .

# Step 1: Annotate the property (in the default graph)
ex:email f:enforceUnique true .

# Step 2: Enable enforcement (in the config graph)
GRAPH <urn:fluree:mydb:main#config> {
  <urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
    f:transactDefaults [
      f:uniqueEnabled true
    ] .
}

See Unique constraints for full details including per-graph scoping and edge cases.

Enable RDFS reasoning by default

Automatically expand rdfs:subClassOf and rdfs:subPropertyOf in all queries:

@prefix f: <https://ns.flur.ee/db#> .

GRAPH <urn:fluree:mydb:main#config> {
  <urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
    f:reasoningDefaults [
      f:reasoningModes f:RDFS ;
      f:schemaSource [
        a f:GraphRef ;
        f:graphSource [ f:graphSelector f:defaultGraph ]
      ]
    ] .
}

With f:OverrideAll (the default), individual queries can still opt out by passing "reasoning": "none".

Different policy per graph

Allow open access to most graphs but lock down a sensitive one:

@prefix f: <https://ns.flur.ee/db#> .

GRAPH <urn:fluree:mydb:main#config> {
  <urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
    f:policyDefaults [
      f:defaultAllow true ;
      f:overrideControl f:OverrideAll
    ] ;
    f:graphOverrides (
      [ a f:GraphConfig ;
        f:targetGraph <http://example.org/sensitive> ;
        f:policyDefaults [
          f:defaultAllow false ;
          f:policySource [
            a f:GraphRef ;
            f:graphSource [ f:graphSelector f:defaultGraph ]
          ] ;
          f:overrideControl f:OverrideNone
        ]
      ]
    ) .
}

The sensitive graph requires policy rules and cannot be overridden at query time. All other graphs remain open.

Troubleshooting

Config changes seem to have no effect

Config uses lagging semantics — changes take effect on the next transaction, not the current one. If you enable SHACL and insert invalid data in the same transaction, the data will be accepted. The next transaction will enforce the new config.

Ledger became unmodifiable after policy misconfiguration

If you set f:defaultAllow false with f:OverrideNone before granting write access to the config graph, the ledger becomes locked — no transaction can modify it (including config changes). Recovery requires a ledger fork/restore. To prevent this:

  1. Always write policy rules first, then enable restrictive policy in a subsequent transaction
  2. Test with f:OverrideAll before switching to f:OverrideNone
  3. Ensure at least one identity has write access to the config graph before locking down

Multiple f:LedgerConfig resources

If the config graph contains more than one f:LedgerConfig resource, the system uses the one with the lexicographically smallest @id and logs a warning. Use the recommended subject IRI convention (urn:fluree:{ledger_id}:config:ledger) to avoid this.

Config graph query returns empty results

User queries against the config graph go through policy enforcement. If f:defaultAllow is false and no policy explicitly grants read access to the config graph, queries will return empty results even though config is active. The system’s internal privileged read is unaffected.

CLI usage

The config graph is written and queried through normal CLI transaction and query commands:

# Write config via TriG
fluree insert --ledger mydb:main --format trig config.trig

# Query the config graph via SPARQL
fluree query --ledger mydb:main --format sparql \
  'PREFIX f: <https://ns.flur.ee/db#>
   SELECT ?s ?p ?o
   FROM <urn:fluree:mydb:main#config>
   WHERE { ?s ?p ?o }'

No special CLI commands are needed — config is data, written and queried like any other named graph.

Writing Config Data

The config graph is mutated using normal ledger transactions — config writes are signed, versioned, and replicable like any other write. The only difference is that the triples target the config graph IRI.

Config graph IRI

Each ledger’s config graph has a deterministic IRI:

urn:fluree:{ledger_id}#config

For a ledger named mydb:main, the config graph is urn:fluree:mydb:main#config.

Writing via TriG

TriG is the most natural format for writing to named graphs. Wrap your config triples in a GRAPH block targeting the config graph IRI:

@prefix f: <https://ns.flur.ee/db#> .

GRAPH <urn:fluree:mydb:main#config> {
  <urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
    f:policyDefaults [
      f:defaultAllow false
    ] ;
    f:shaclDefaults [
      f:shaclEnabled true ;
      f:validationMode f:ValidationReject
    ] .
}

Writing via SPARQL UPDATE

Use INSERT DATA with a GRAPH clause:

PREFIX f: <https://ns.flur.ee/db#>

INSERT DATA {
  GRAPH <urn:fluree:mydb:main#config> {
    <urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
      f:reasoningDefaults [
        f:reasoningModes f:RDFS ;
        f:schemaSource [
          a f:GraphRef ;
          f:graphSource [ f:graphSelector f:defaultGraph ]
        ]
      ] .
  }
}

Writing via JSON-LD

Use the @graph key with a named graph wrapper:

{
  "@context": { "f": "https://ns.flur.ee/db#" },
  "@graph": [
    {
      "@id": "urn:fluree:mydb:main:config:ledger",
      "@type": "f:LedgerConfig",
      "@graph": "urn:fluree:mydb:main#config",
      "f:shaclDefaults": {
        "f:shaclEnabled": true,
        "f:validationMode": { "@id": "f:ValidationReject" }
      }
    }
  ]
}

Updating config

Config changes are normal ledger operations. To change a setting, use a DELETE/INSERT WHERE pattern that binds the existing blank node:

PREFIX f: <https://ns.flur.ee/db#>

DELETE {
  GRAPH <urn:fluree:mydb:main#config> {
    ?policy f:defaultAllow false .
  }
}
INSERT {
  GRAPH <urn:fluree:mydb:main#config> {
    ?policy f:defaultAllow true .
  }
}
WHERE {
  GRAPH <urn:fluree:mydb:main#config> {
    <urn:fluree:mydb:main:config:ledger> f:policyDefaults ?policy .
    ?policy f:defaultAllow false .
  }
}

This pattern binds ?policy to the existing setting-group blank node, retracts the old value, and asserts the new one. It avoids the problem of DELETE DATA with blank nodes (which cannot match stored blank node identities).

Alternatively, give setting-group nodes explicit IRIs so they can be addressed directly:

@prefix f: <https://ns.flur.ee/db#> .

GRAPH <urn:fluree:mydb:main#config> {
  <urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
    f:policyDefaults <urn:fluree:mydb:main:config:policy> .

  <urn:fluree:mydb:main:config:policy>
    f:defaultAllow false ;
    f:overrideControl f:OverrideAll .
}

With explicit IRIs, individual fields can be retracted by subject IRI without binding.

Retracting a field returns the ledger to the system default for that setting (as if the field were absent).

Config mutation governance

Config writes go through the normal policy-enforced transaction path. This means:

  • Reading config is privileged (system read, bypasses policy) — necessary to bootstrap.
  • Writing config is not privileged — policy enforcement applies.

A defaultAllow: false config is self-protecting: the policy it defines must explicitly grant write access to the config graph for any changes to be possible.

If a ledger becomes unmodifiable due to a policy misconfiguration (no authorized config writers), recovery requires a ledger fork/restore — there is no superuser bypass.

For operational simplicity, use a stable, conventional subject IRI:

urn:fluree:{ledger_id}:config:ledger

Colons (not a second # fragment) keep the IRI well-formed: the graph IRI already uses a fragment (#config), and RFC 3986 allows only one fragment per IRI. Using colons produces a valid URN (RFC 8141) that stays scoped to the ledger and avoids accidental multiple-config instances.

Querying the config graph

The config graph is a named graph like any other — you can query it with SPARQL or JSON-LD to inspect the current configuration.

SPARQL

PREFIX f: <https://ns.flur.ee/db#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?setting ?pred ?val
FROM <urn:fluree:mydb:main#config>
WHERE {
  ?config rdf:type f:LedgerConfig ;
          ?setting ?group .
  ?group ?pred ?val .
  FILTER(?setting IN (
    f:policyDefaults, f:shaclDefaults, f:reasoningDefaults,
    f:datalogDefaults, f:transactDefaults
  ))
}

JSON-LD query

{
  "@context": { "f": "https://ns.flur.ee/db#" },
  "from": {
    "@id": "mydb:main",
    "graph": "urn:fluree:mydb:main#config"
  },
  "select": ["?config", "?pred", "?val"],
  "where": [
    { "@id": "?config", "@type": "f:LedgerConfig", "?pred": "?val" }
  ]
}

Ledger-scoped endpoint

curl -X POST "http://localhost:8090/v1/fluree/query/mydb:main" \
  -H "Content-Type: application/sparql-query" \
  -d 'PREFIX f: <https://ns.flur.ee/db#>
      SELECT ?s ?p ?o
      FROM <urn:fluree:mydb:main#config>
      WHERE { ?s ?p ?o }'

Policy applies to reads

User queries against the config graph go through normal policy enforcement. If f:defaultAllow is false and no policy grants read access to the config graph, user queries will return empty results. The system still reads config via a privileged path (bypassing policy), so config always takes effect regardless of policy.

Time-travel

Config is part of the ledger’s immutable commit chain. You can query config at any historical point:

PREFIX f: <https://ns.flur.ee/db#>

SELECT ?setting ?val
FROM <urn:fluree:mydb:main@t:5#config>
WHERE {
  ?config a f:LedgerConfig ;
          f:policyDefaults ?policy .
  ?policy ?setting ?val .
}

Lagging semantics

Config changes take effect on the next transaction. The transaction pipeline reads config from the pre-transaction state (t - 1). This prevents a transaction from changing the rules it is validated against.

This means:

  • Enabling SHACL in the same transaction as invalid data will not reject that data
  • Enabling f:uniqueEnabled in the same transaction as duplicate values will not reject those duplicates
  • The next transaction after the config change will be validated against the new config
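A toy model makes the one-transaction lag concrete (this is an illustration of the semantics above, not Fluree internals):

```python
# Toy model: each transaction is validated against the config recorded at t - 1.
chain = [{"t": 0, "config": {"shaclEnabled": False}}]

def transact(new_config=None):
    head = chain[-1]
    effective = head["config"]  # pre-transaction state (t - 1)
    chain.append({"t": head["t"] + 1,
                  "config": new_config or head["config"]})
    return effective

# Enabling SHACL and writing data in the same transaction: the old config applies.
assert transact({"shaclEnabled": True}) == {"shaclEnabled": False}
# The next transaction is validated against the new config.
assert transact() == {"shaclEnabled": True}
```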

Setting Groups

Each setting group configures a different subsystem. Groups are resolved independently — locking down one group does not affect others.

All setting groups can appear on both f:LedgerConfig (ledger-wide defaults) and f:GraphConfig (per-graph overrides), except where noted.

System defaults

When no config graph is present (or a setting group is absent), the system defaults apply:

| Setting group | System default |
|---|---|
| Policy | f:defaultAllow true — all queries and transactions are permitted |
| SHACL | Disabled — no shape validation |
| Reasoning | Disabled — no OWL/RDFS inference |
| Datalog | Disabled — no rule evaluation |
| Transact constraints | Disabled — no uniqueness enforcement |
| Override control | f:OverrideAll — any request can override any setting |

In other words, an unconfigured ledger is fully open: no policy, no validation, no reasoning. This matches the behavior of a fresh ledger and ensures backward compatibility.


Policy defaults

Group predicate: f:policyDefaults

Controls default policy enforcement behavior.

| Field | Type | Default | Description |
|---|---|---|---|
| f:defaultAllow | boolean | true | Allow (true) or deny (false) when no policy rule matches |
| f:policySource | f:GraphRef | (none) | Graph containing policy rules (f:Allow, f:Modify, etc.) |
| f:policyClass | IRI or list | (none) | Default policy classes to apply |
| f:overrideControl | IRI or object | f:OverrideAll | Override gating (see Override control) |

f:policySource is non-overridable — it can only be changed by writing to the config graph, not at query time. f:defaultAllow and f:policyClass are overridable (subject to override control).

When f:policySource is set, the policy loader scans the specified graph for policy rules instead of the default graph. This keeps policy rules separate from end-user data. If f:policySource is not set, policies are loaded from the default graph (backward compatible).

Current limitations: f:policySource only supports same-ledger graphs. Cross-ledger references (f:ledger), temporal pinning (f:atT), trust policy, and rollback guard fields are parsed but will produce an error if configured.

Example: policies in the default graph

@prefix f: <https://ns.flur.ee/db#> .

GRAPH <urn:fluree:mydb:main#config> {
  <urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
    f:policyDefaults [
      f:defaultAllow false ;
      f:policySource [
        a f:GraphRef ;
        f:graphSource [ f:graphSelector f:defaultGraph ]
      ] ;
      f:overrideControl f:OverrideAll
    ] .
}

Example: policies in a named graph

Storing policy rules in a dedicated named graph keeps them out of the default data graph. The identity’s f:policyClass triples must also be in the policy graph.

@prefix f: <https://ns.flur.ee/db#> .

GRAPH <urn:fluree:mydb:main#config> {
  <urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
    f:policyDefaults [
      f:defaultAllow false ;
      f:policySource [
        a f:GraphRef ;
        f:graphSource [ f:graphSelector <urn:fluree:mydb:main/policy> ]
      ]
    ] .
}

SHACL defaults

Group predicate: f:shaclDefaults

Controls SHACL shape validation at transaction time.

| Field | Type | Default | Description |
|---|---|---|---|
| f:shaclEnabled | boolean | false | Enable or disable SHACL validation |
| f:shapesSource | f:GraphRef | (none) | Graph containing SHACL shapes |
| f:validationMode | IRI | f:ValidationReject | f:ValidationReject (reject invalid data) or f:ValidationWarn (log warning, allow) |
| f:overrideControl | IRI or object | f:OverrideAll | Override gating |

f:shapesSource is non-overridable. f:shaclEnabled and f:validationMode are overridable.

Example

@prefix f: <https://ns.flur.ee/db#> .

GRAPH <urn:fluree:mydb:main#config> {
  <urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
    f:shaclDefaults [
      f:shaclEnabled true ;
      f:shapesSource [
        a f:GraphRef ;
        f:graphSource [ f:graphSelector f:defaultGraph ]
      ] ;
      f:validationMode f:ValidationReject ;
      f:overrideControl f:OverrideNone
    ] .
}

Reasoning defaults

Group predicate: f:reasoningDefaults

Controls OWL/RDFS reasoning applied at query time.

| Field | Type | Default | Description |
|---|---|---|---|
| f:reasoningModes | IRI or list | (none) | Reasoning modes: f:RDFS, f:OWL2QL, f:OWL2RL, f:Datalog |
| f:schemaSource | f:GraphRef | (none) | Graph containing schema triples (rdfs:subClassOf, etc.) |
| f:overrideControl | IRI or object | f:OverrideAll | Override gating |

f:schemaSource is non-overridable. f:reasoningModes is overridable.

Example

@prefix f: <https://ns.flur.ee/db#> .

GRAPH <urn:fluree:mydb:main#config> {
  <urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
    f:reasoningDefaults [
      f:reasoningModes f:RDFS ;
      f:schemaSource [
        a f:GraphRef ;
        f:graphSource [ f:graphSelector f:defaultGraph ]
      ] ;
      f:overrideControl f:OverrideAll
    ] .
}

Datalog defaults

Group predicate: f:datalogDefaults

Controls Fluree’s stored datalog rules (f:rule).

| Field | Type | Default | Description |
|---|---|---|---|
| f:datalogEnabled | boolean | false | Enable or disable datalog rule evaluation |
| f:rulesSource | f:GraphRef | (none) | Graph containing f:rule definitions |
| f:allowQueryTimeRules | boolean | true | Allow queries to supply ad-hoc rules |
| f:overrideControl | IRI or object | f:OverrideAll | Override gating |

f:rulesSource is non-overridable. f:datalogEnabled and f:allowQueryTimeRules are overridable.

Example

@prefix f: <https://ns.flur.ee/db#> .

GRAPH <urn:fluree:mydb:main#config> {
  <urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
    f:datalogDefaults [
      f:datalogEnabled true ;
      f:rulesSource [
        a f:GraphRef ;
        f:graphSource [ f:graphSelector f:defaultGraph ]
      ] ;
      f:allowQueryTimeRules false ;
      f:overrideControl f:OverrideNone
    ] .
}

Transact defaults

Group predicate: f:transactDefaults

Controls transaction-time constraint enforcement, such as property value uniqueness.

| Field | Type | Default | Description |
|---|---|---|---|
| f:uniqueEnabled | boolean | false | Enable unique constraint enforcement |
| f:constraintsSource | f:GraphRef or list | default graph | Graph(s) containing constraint annotations (e.g., f:enforceUnique) |
| f:overrideControl | IRI or object | f:OverrideAll | Override gating |

When f:uniqueEnabled is true and f:constraintsSource is omitted, the default graph is used as the constraint source.

Additive merge semantics

Unlike other setting groups where per-graph values replace ledger-wide values field-by-field, transact defaults use additive merge semantics:

  • f:uniqueEnabled: Once enabled at the ledger level, it stays enabled for all graphs. Per-graph configs cannot disable it.
  • f:constraintsSource: Per-graph sources are added to ledger-wide sources, not substituted. A graph checks annotations from all sources (ledger-wide + graph-specific).

This prevents a per-graph override from accidentally disabling enforcement or dropping constraint sources.

Note: additive merge is still subject to override control. If the ledger-wide f:overrideControl for f:transactDefaults is f:OverrideNone, per-graph additions are blocked entirely — the ledger-wide settings are final.

Example

@prefix f: <https://ns.flur.ee/db#> .
@prefix ex: <http://example.org/ns/> .

# Define constraint annotations in the default graph
ex:email f:enforceUnique true .
ex:ssn   f:enforceUnique true .

# Enable enforcement via config
GRAPH <urn:fluree:mydb:main#config> {
  <urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
    f:transactDefaults [
      f:uniqueEnabled true ;
      f:constraintsSource [
        a f:GraphRef ;
        f:graphSource [ f:graphSelector f:defaultGraph ]
      ]
    ] .
}

See Unique constraints for full details on f:enforceUnique.


Full-text defaults

Group predicate: f:fullTextDefaults

Declares properties whose string values should be indexed for BM25 full-text scoring without requiring the @fulltext datatype per value, and sets the default analyzer language for untagged plain strings.

| Field | Type | Default | Description |
|---|---|---|---|
| f:defaultLanguage | BCP-47 string | "en" | Analyzer language for plain (xsd:string) values on configured properties |
| f:property | f:FullTextProperty list | empty | One node per property to full-text index |
| f:overrideControl | IRI or object | f:OverrideAll | Override gating |

Each f:property entry is an f:FullTextProperty node carrying f:target — the IRI of the property being indexed. Additional optional knobs (per-property language, tokenizer, etc.) can be added to f:FullTextProperty in the future without breaking the schema.

The @fulltext datatype retains its zero-config shortcut semantics: any value tagged @fulltext always indexes as English, regardless of what f:fullTextDefaults declares. Configured plain-string paths and @fulltext-datatype English content share the same per-property English arena — no duplication.

rdf:langString values auto-route to per-language arenas by their tag. An unrecognized BCP-47 tag tokenizes + lowercases only (no stopwords, no stemming) — consistent on both indexing and query sides.

Additive merge semantics

Like f:transactDefaults, f:fullTextDefaults uses additive merge. Per-graph f:property entries are appended to the ledger-wide list (deduping by target IRI — per-graph wins on a collision). Per-graph f:defaultLanguage shadows the ledger-wide value. Ledger-wide f:OverrideNone blocks per-graph overrides entirely.

Config changes require a manual reindex

Editing f:fullTextDefaults never triggers any indexing automatically. Arenas reflect the config that was in effect at their build time; to pick up a changed property list or default language, run a full reindex (fluree reindex … or equivalent). Until then, existing arenas stay authoritative and novelty written after the config change is scored with whatever language the current effective config resolves to — which may produce temporarily mismatched scoring until the reindex completes.

An in-flight reindex operates on a point-in-time snapshot and will not see a config change committed during its run. Wait for the reindex to finish, then trigger a new one against the post-change state.

Example

@prefix f: <https://ns.flur.ee/db#> .
@prefix ex: <http://example.org/ns/> .

GRAPH <urn:fluree:mydb:main#config> {
  <urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
    f:fullTextDefaults [
      a f:FullTextDefaults ;
      f:defaultLanguage "en" ;
      f:property [ a f:FullTextProperty ; f:target ex:title ] ,
                 [ a f:FullTextProperty ; f:target ex:body ]
    ] .
}

Per-graph override example

GRAPH <urn:fluree:mydb:main#config> {
  <urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
    f:fullTextDefaults [
      a f:FullTextDefaults ;
      f:defaultLanguage "en" ;
      f:property [ a f:FullTextProperty ; f:target ex:title ]
    ] ;
    f:graphOverrides [
      a f:GraphConfig ;
      f:targetGraph <urn:example:productCatalog> ;
      f:fullTextDefaults [
        a f:FullTextDefaults ;
        f:defaultLanguage "es" ;
        f:property [ a f:FullTextProperty ; f:target ex:productName ]
      ]
    ] .
}

Under this config, queries touching the productCatalog graph analyze untagged plain strings as Spanish ("es"); other graphs keep English. ex:title is full-text indexed everywhere (ledger-wide); ex:productName is indexed only in the productCatalog graph.

See Inline fulltext search for the end-user guide — when to pick this path over the @fulltext datatype, supported languages, per-graph multilingual setups, the reindex workflow, and how configured properties interact with @fulltext-datatype values.


Ledger-scoped settings

Some settings are structurally tied to the ledger as a whole and are not meaningful per-graph. They live exclusively on f:LedgerConfig and are ignored if present on f:GraphConfig.

Override control does not apply to ledger-scoped settings — they are changed only by writing to the config graph.

Note: f:authzSource (an identity/relationship graph used by policy evaluation) is planned as a ledger-scoped setting but is not yet implemented. When available, it will let the config graph specify which graph contains identity data (e.g., DID→role mappings) for policy resolution.


f:GraphRef: referencing source graphs

Several fields (f:policySource, f:shapesSource, f:schemaSource, f:rulesSource, f:constraintsSource) use f:GraphRef to point at graphs containing rules, shapes, schema, or constraints.

An f:GraphRef has two levels: the outer node carries the type and optional trust/rollback settings, and a nested f:graphSource object carries the source coordinates:

| Field | Level | Type | Description |
|---|---|---|---|
| f:graphSource | f:GraphRef | object | Nested source coordinates (required) |
| f:trustPolicy | f:GraphRef | object | How to verify the referenced graph (future) |
| f:rollbackGuard | f:GraphRef | object | Freshness constraints (future) |
| f:graphSelector | f:graphSource | IRI | Target graph: f:defaultGraph, f:txnMetaGraph, or a named graph IRI |
| f:ledger | f:graphSource | IRI | Ledger identifier (for cross-ledger references; not yet supported for constraint sources) |
| f:atT | f:graphSource | integer | Pin to a specific transaction time (optional) |

For the common case of referencing a graph within the same ledger, only f:graphSelector is needed inside f:graphSource:

f:shapesSource [
  a f:GraphRef ;
  f:graphSource [ f:graphSelector f:defaultGraph ]
] .

For referencing the config graph itself (co-resident rules/shapes):

f:policySource [
  a f:GraphRef ;
  f:graphSource [ f:graphSelector <urn:fluree:mydb:main#config> ]
] .

Cross-ledger f:GraphRef (using f:ledger to reference another ledger) is defined in the schema but not yet supported for constraint source resolution. Currently, only local graph references are resolved.

Override Control

Fluree’s config resolution follows a three-tier precedence model. Each setting group is resolved independently, and an override control mechanism governs whether higher-priority sources can change values set at lower tiers.

Resolution precedence

Settings are resolved from lowest to highest priority:

| Priority | Source | When it applies |
|---|---|---|
| 4 (lowest) | System defaults | No config present (allow-all, no SHACL, no reasoning) |
| 3 | Ledger-wide config (f:LedgerConfig) | Fallback for any setting not overridden at higher tiers |
| 2 | Per-graph config (f:GraphConfig) | Only if ledger-wide override control permits |
| 1 (highest) | Query/transaction-time opts | Only if effective override control permits + identity check passes |

Override control modes

Each setting group may include an f:overrideControl field controlling whether higher-priority sources can override the value.

| Mode | Value | Behavior |
|---|---|---|
| No overrides | f:OverrideNone | Config values are final. No per-graph or query-time overrides permitted. |
| All overrides | f:OverrideAll | Any request can override. Default when f:overrideControl is absent. |
| Identity-gated | Object with f:controlMode: f:IdentityRestricted | Only requests with a server-verified identity matching f:allowedIdentities can override. |

Identity-gated example

{
  "f:overrideControl": {
    "f:controlMode": { "@id": "f:IdentityRestricted" },
    "f:allowedIdentities": [
      { "@id": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK" }
    ]
  }
}

Identity verification

Override identity is the server-verified request identity (canonical DID string), not a user-supplied query parameter. Specifically:

  • With the credential feature: the DID from the verified JWS kid header
  • With server auth middleware: the DID mapped from an OAuth token
  • A caller cannot become an allowed identity by setting "opts": {"identity": "..."} in query JSON — that field is for policy evaluation context, not override authorization
  • Anonymous requests (no verified identity) are always denied by f:IdentityRestricted

Query-time vs transact-time overrides

  • Query-time overrides (reasoning modes, policy opts): identity is the query caller
  • Transact-time overrides (SHACL mode, validation settings): identity is the transaction signer

Monotonicity: per-graph can only tighten

Ledger-wide f:overrideControl sets the maximum permissiveness. Per-graph configs may only restrict further, never loosen.

Permissiveness ordering: f:OverrideNone < f:IdentityRestricted < f:OverrideAll

The effective per-graph override control is min(ledger-wide, per-graph):

| Ledger-wide | Per-graph | Effective | Why |
|---|---|---|---|
| OverrideNone | OverrideAll | OverrideNone | Per-graph cannot loosen (warning logged) |
| IdentityRestricted({alice}) | OverrideAll | IdentityRestricted({alice}) | Per-graph cannot loosen |
| IdentityRestricted({alice, bob}) | IdentityRestricted({alice}) | IdentityRestricted({alice}) | Intersection: per-graph tightens |
| OverrideAll | OverrideNone | OverrideNone | Per-graph tightens (valid) |
| OverrideAll | IdentityRestricted({alice}) | IdentityRestricted({alice}) | Per-graph tightens (valid) |
| OverrideAll | (absent) | OverrideAll | Inherits ledger-wide |

When both are IdentityRestricted, the effective allowedIdentities is the intersection of the two lists.
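The min-and-intersection rule can be sketched directly — a model of the table above (representation and names are illustrative, not Fluree internals):

```python
# Permissiveness ordering: OverrideNone < IdentityRestricted < OverrideAll
RANK = {"OverrideNone": 0, "IdentityRestricted": 1, "OverrideAll": 2}

def effective_override(ledger_wide, per_graph):
    """Each control is (mode, allowed_identities). Per-graph can only tighten."""
    if per_graph is None:
        return ledger_wide  # absent per-graph inherits ledger-wide
    lw_mode, lw_ids = ledger_wide
    pg_mode, pg_ids = per_graph
    mode = min(lw_mode, pg_mode, key=RANK.__getitem__)
    if lw_mode == pg_mode == "IdentityRestricted":
        return (mode, lw_ids & pg_ids)  # intersection of allowed identities
    # The stricter side wins; carry its identity set (if any).
    return (mode, lw_ids if RANK[lw_mode] <= RANK[pg_mode] else pg_ids)

assert effective_override(("OverrideNone", None), ("OverrideAll", None))[0] == "OverrideNone"
assert effective_override(("IdentityRestricted", {"alice", "bob"}),
                          ("IdentityRestricted", {"alice"})) == ("IdentityRestricted", {"alice"})
assert effective_override(("OverrideAll", None), None) == ("OverrideAll", None)
```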

Resolution algorithm

For each setting group independently:

1. Start with system defaults
2. Apply ledger-wide config for this group (if present)
3. Get ledger-wide overrideControl (default: OverrideAll)
4. If ledger-wide overrideControl is OverrideNone:
     → this group is final. Skip to step 8.
5. Apply per-graph config for this group (if present)
6. Compute effective overrideControl:
     = min(ledgerWide, perGraph)
     If both IdentityRestricted: allowedIdentities = intersection
7. Check effective overrideControl against query/txn-time opts:
     OverrideNone         → config values are final
     OverrideAll          → apply query-time opts
     IdentityRestricted   → apply only if request identity matches
8. Result is the effective setting for this group.

Per-group truth tables

Policy (f:policyDefaults)

| Ledger-wide | Per-graph | Query (identity) | Effective | Why |
|---|---|---|---|---|
| defaultAllow: false, OverrideNone | (none) | defaultAllow: true (any) | deny | No overrides allowed |
| defaultAllow: false, OverrideAll | (none) | defaultAllow: true (any) | allow | All overrides allowed |
| defaultAllow: false, IdentityRestricted({alice}) | (none) | defaultAllow: true (alice) | allow | Alice is authorized |
| defaultAllow: false, IdentityRestricted({alice}) | (none) | defaultAllow: true (bob) | deny | Bob not authorized |
| defaultAllow: false, IdentityRestricted({alice}) | (none) | defaultAllow: true (anon) | deny | No identity = no override |
| defaultAllow: false, OverrideNone | defaultAllow: true | (none) | deny | OverrideNone blocks per-graph |
| defaultAllow: false, OverrideAll | defaultAllow: true | (none) | allow | Per-graph overrides ledger-wide |
| defaultAllow: true, OverrideAll | defaultAllow: false, OverrideNone | defaultAllow: true (any) | deny | Per-graph OverrideNone blocks query |
| (none) | (none) | (none) | allow | System default (allow-all) |

Reasoning (f:reasoningDefaults)

| Ledger-wide | Per-graph | Query (identity) | Effective | Why |
|---|---|---|---|---|
| modes: [rdfs], OverrideNone | (none) | reasoning: [owl2-rl] (any) | rdfs | No overrides |
| modes: [rdfs], OverrideAll | (none) | reasoning: [owl2-rl] (any) | owl2-rl | Override allowed |
| modes: [rdfs], IdentityRestricted({alice}) | (none) | reasoning: [owl2-rl] (alice) | owl2-rl | Alice authorized |
| modes: [rdfs], IdentityRestricted({alice}) | (none) | reasoning: [owl2-rl] (bob) | rdfs | Bob not authorized |
| modes: [rdfs], OverrideAll | modes: [owl2-rl] | (none) | owl2-rl | Per-graph overrides |
| modes: [rdfs], OverrideNone | modes: [owl2-rl] | (none) | rdfs | OverrideNone blocks per-graph |

SHACL (f:shaclDefaults)

| Ledger-wide | Per-graph | Effective | Why |
|---|---|---|---|
| enabled: false, OverrideNone | enabled: true | disabled | OverrideNone blocks per-graph |
| enabled: true, OverrideAll | enabled: false | disabled | Per-graph disables for its graph |
| mode: warn, OverrideAll | mode: reject | reject | Per-graph overrides |

Transact (f:transactDefaults)

Transact defaults use additive merge semantics, unlike other groups. However, the general override control rule still applies: if the ledger-wide f:overrideControl is f:OverrideNone, per-graph transact defaults are blocked entirely.

| Ledger-wide | Per-graph | Effective | Why |
|---|---|---|---|
| uniqueEnabled: true | uniqueEnabled: false | enabled | Monotonic OR — cannot disable |
| uniqueEnabled: true, sources: [default] | sources: [schemaGraph] | sources: [default, schemaGraph] | Additive — sources accumulate |
| uniqueEnabled: false | uniqueEnabled: true | enabled | Per-graph can enable |
| uniqueEnabled: true, OverrideNone | sources: [schemaGraph] | sources: [default] only | OverrideNone blocks per-graph additions |
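The additive merge semantics (monotonic OR plus accumulating sources) can be sketched as a small function. The names here are illustrative assumptions, not Fluree's internal API.

```rust
// Hypothetical sketch of the additive merge for f:transactDefaults.
fn merge_transact(
    ledger_enabled: bool,
    per_graph_enabled: bool,
    ledger_sources: &[&str],
    per_graph_sources: &[&str],
) -> (bool, Vec<String>) {
    // Monotonic OR: per-graph config can enable, but never disable
    let enabled = ledger_enabled || per_graph_enabled;

    // Additive sources: per-graph entries extend, never replace, ledger-wide ones
    let mut sources: Vec<String> = ledger_sources.iter().map(|s| s.to_string()).collect();
    for s in per_graph_sources {
        if !sources.iter().any(|have| have == s) {
            sources.push(s.to_string());
        }
    }
    (enabled, sources)
}

fn main() {
    // Row 2 of the table above: sources accumulate, enablement cannot be revoked
    let (enabled, sources) = merge_transact(true, false, &["default"], &["schemaGraph"]);
    assert!(enabled);
    assert_eq!(sources, vec!["default", "schemaGraph"]);
}
```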

Overridable vs non-overridable fields

Not all fields in a setting group are overridable. Source pointers (where rules/shapes/schema come from) are always config-only:

| Subsystem | Overridable fields | Non-overridable (config-only) |
|---|---|---|
| f:policyDefaults | f:defaultAllow, f:policyClass | f:policySource |
| f:shaclDefaults | f:validationMode, f:shaclEnabled | f:shapesSource |
| f:reasoningDefaults | f:reasoningModes | f:schemaSource |
| f:datalogDefaults | f:datalogEnabled, f:allowQueryTimeRules | f:rulesSource |

Non-overridable fields can only be changed by writing to the config graph. This prevents a query from redirecting the engine to read rules or schema from an arbitrary graph.

Per-graph overrides

Per-graph overrides target specific named graphs by IRI:

@prefix f: <https://ns.flur.ee/db#> .

GRAPH <urn:fluree:mydb:main#config> {
  <urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
    f:policyDefaults [
      f:defaultAllow true ;
      f:overrideControl f:OverrideAll
    ] ;
    f:graphOverrides (
      [ a f:GraphConfig ;
        f:targetGraph <http://example.org/sensitive> ;
        f:policyDefaults [
          f:defaultAllow false ;
          f:overrideControl f:OverrideNone
        ]
      ]
    ) .
}

In this example:

  • All graphs default to defaultAllow: true with OverrideAll
  • http://example.org/sensitive overrides to defaultAllow: false with OverrideNone — no query can override policy for this graph
  • To target the default graph, set f:targetGraph to f:defaultGraph

Unique Constraints (f:enforceUnique)

Fluree supports transaction-time enforcement of property-value uniqueness via f:enforceUnique. It complements SHACL and runs independently of SHACL validation.

How it works

Unique constraint enforcement has two parts:

  1. Annotation: Mark properties as unique by asserting f:enforceUnique true on their IRIs in any graph
  2. Activation: Enable enforcement in the config graph via f:transactDefaults

This separation follows the same pattern as SHACL (shapes + config activation) and reasoning (schema + config activation). Annotations alone do nothing — enforcement must be explicitly enabled.

Step 1: Define unique properties

Assert f:enforceUnique true on any property IRI that should enforce uniqueness. These annotations can live in the default graph or any named graph:

@prefix f: <https://ns.flur.ee/db#> .
@prefix ex: <http://example.org/ns/> .

# In the default graph
ex:email f:enforceUnique true .
ex:ssn   f:enforceUnique true .

Step 2: Enable enforcement

Enable unique constraint checking in the config graph:

@prefix f: <https://ns.flur.ee/db#> .

GRAPH <urn:fluree:mydb:main#config> {
  <urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
    f:transactDefaults [
      f:uniqueEnabled true
    ] .
}

When f:constraintsSource is omitted, the default graph is used as the annotation source.

Explicit constraint source

To read annotations from a specific graph:

@prefix f: <https://ns.flur.ee/db#> .

GRAPH <urn:fluree:mydb:main#config> {
  <urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
    f:transactDefaults [
      f:uniqueEnabled true ;
      f:constraintsSource [
        a f:GraphRef ;
        f:graphSource [ f:graphSelector f:defaultGraph ]
      ]
    ] .
}

Multiple constraint sources

Multiple sources can be specified — all are checked:

f:transactDefaults [
  f:uniqueEnabled true ;
  f:constraintsSource [
    a f:GraphRef ;
    f:graphSource [ f:graphSelector f:defaultGraph ]
  ] , [
    a f:GraphRef ;
    f:graphSource [ f:graphSelector <http://example.org/schema> ]
  ]
] .

What gets enforced

Once enabled, any transaction that would result in two or more distinct subjects holding the same value for a unique property within the same graph is rejected.

Scoping: per-graph

Uniqueness is enforced per graph. The same value on the same property is allowed across different named graphs:

# Graph A: ex:alice ex:email "alice@example.com" — OK
# Graph B: ex:bob   ex:email "alice@example.com" — OK (different graph)
# Graph A: ex:carol ex:email "alice@example.com" — REJECTED (same graph as alice)

Value identity

Uniqueness is determined by the storage-layer value representation, not by strict RDF term equality (which would compare datatype IRIs and language tags). The uniqueness key is:

(graph, predicate, value)

where “value” is the internal storage representation (type discriminant + payload).

The enforcement query matches on (predicate, object) in the POST index without constraining by datatype or language tag. This means:

  • Two values with different datatype IRIs but the same internal representation are treated as the same value. For example, "hello"^^xsd:string and "hello"^^ex:customType both store as the same string value internally, so they conflict.
  • Two values with different language tags but the same string content conflict, because the language tag is metadata, not part of the value key.
  • Two values with different internal representations are naturally distinct. For example, "42" (stored as a string) and 42 (stored as an integer) do not conflict because they are different value types at the storage layer.

This design matches the intuitive notion of value identity and prevents circumventing uniqueness checks by attaching a different datatype annotation or language tag.
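The bullet points above can be made concrete with a toy model of the (graph, predicate, value) key. The types here are hypothetical and far simpler than Fluree's storage layer; they exist only to show which distinctions participate in the key.

```rust
// Illustrative model of the uniqueness key. Datatype IRIs and language tags
// are metadata, so they are deliberately absent from the key.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum StoredValue {
    Str(String), // "hello"^^xsd:string and "hello"^^ex:customType both land here
    Int(i64),    // a genuinely different storage type
}

type UniqueKey = (String /* graph */, String /* predicate */, StoredValue);

fn main() {
    // Same internal string payload, regardless of datatype annotation → conflict
    let a: UniqueKey = ("g".into(), "ex:email".into(), StoredValue::Str("hello".into()));
    let b: UniqueKey = ("g".into(), "ex:email".into(), StoredValue::Str("hello".into()));
    assert_eq!(a, b);

    // "42" stored as a string vs 42 stored as an integer → distinct, no conflict
    let c: UniqueKey = ("g".into(), "ex:n".into(), StoredValue::Str("42".into()));
    let d: UniqueKey = ("g".into(), "ex:n".into(), StoredValue::Int(42));
    assert_ne!(c, d);

    // Same value in a different named graph → different key, allowed
    let e: UniqueKey = ("h".into(), "ex:email".into(), StoredValue::Str("hello".into()));
    assert_ne!(a, e);
}
```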

Intra-transaction enforcement

Uniqueness is checked after staging, so conflicts within a single transaction are caught:

{
  "@context": { "ex": "http://example.org/ns/" },
  "@graph": [
    { "@id": "ex:alice", "ex:email": "same@example.com" },
    { "@id": "ex:bob",   "ex:email": "same@example.com" }
  ]
}

This transaction is rejected because two subjects assert the same value for a unique property.

Upsert safety

Upserts that change a value are handled correctly. When an upsert retracts the old value and asserts a new one in the same transaction, the old value is no longer active — no false positive.

Idempotent re-insert

Re-asserting the same (subject, property, value) triple that already exists is allowed. One subject still holds the value — no violation.

Error message

When a uniqueness violation is detected, the transaction fails with an error like:

Unique constraint violation: property <http://example.org/ns/email>
  value "alice@example.com" already exists for subject
  <http://example.org/ns/alice> in graph default
  (conflicting subject: <http://example.org/ns/bob>)

Lagging config

Config is read from the pre-transaction state. This means:

  • Enabling f:uniqueEnabled and inserting duplicate values in the same transaction will not reject the duplicates
  • The next transaction will enforce the constraint

This is intentional and consistent with all other config graph features.

Per-graph overrides

Transact defaults use additive merge semantics:

  • f:uniqueEnabled uses monotonic OR — once enabled at the ledger level, per-graph configs cannot disable it
  • f:constraintsSource is additive — per-graph sources are added to (not replace) ledger-wide sources

Note: additive merge is still subject to override control. If the ledger-wide f:overrideControl for f:transactDefaults is f:OverrideNone, per-graph additions are blocked entirely.

This means a per-graph override can add additional constraint sources but cannot remove ledger-wide ones:

GRAPH <urn:fluree:mydb:main#config> {
  <urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
    f:transactDefaults [
      f:uniqueEnabled true ;
      f:constraintsSource [
        a f:GraphRef ;
        f:graphSource [ f:graphSelector f:defaultGraph ]
      ]
    ] ;
    f:graphOverrides (
      [ a f:GraphConfig ;
        f:targetGraph <http://example.org/graphX> ;
        f:transactDefaults [
          f:constraintsSource [
            a f:GraphRef ;
            f:graphSource [ f:graphSelector <http://example.org/schema> ]
          ]
        ]
      ]
    ) .
}

In this example, graphX checks unique annotations from both the default graph (ledger-wide) and http://example.org/schema (per-graph addition).

Zero cost when not configured

When f:uniqueEnabled is not set or is false, uniqueness checking is completely skipped — no property scan, no index queries, no overhead. The enforcement code fast-paths out immediately.

Complete example

@prefix f:  <https://ns.flur.ee/db#> .
@prefix ex: <http://example.org/ns/> .

# 1. Define unique annotations in the default graph
ex:email f:enforceUnique true .

# 2. Enable enforcement in the config graph
GRAPH <urn:fluree:mydb:main#config> {
  <urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
    f:transactDefaults [
      f:uniqueEnabled true
    ] .
}

After this transaction, the next transaction that attempts to give two subjects the same ex:email value (within the same graph) will be rejected.

Security and Policy

Fluree secures data through authentication, fine-grained access control via policies, and transparent encryption of data at rest.

Authentication

Fluree’s authentication model, covering:

  • Identity vs transport (DIDs, signed requests, Bearer tokens)
  • Three auth modes: decentralized did:key, standalone server tokens, OIDC/OAuth2
  • Bearer token claim set and scope definitions
  • Replication vs query access boundary
  • Token verification paths (Ed25519 + OIDC/JWKS)

Data Encryption

Storage Encryption

Protect data at rest with AES-256-GCM encryption:

  • Transparent encryption/decryption
  • Environment variable key configuration
  • Portable ciphertext format
  • Key rotation support

Commit Integrity

Commit Signing and Attestation

Cryptographic proof of which node wrote a commit:

  • Ed25519 signatures over domain-separated commit digests
  • Embedded signature blocks in commit files
  • did:key signer identities
  • Future: detached attestations and consensus policies

Policy System

Policy Model and Inputs

Understanding Fluree’s policy architecture:

  • Policy structure and syntax
  • Subject, action, resource model
  • Policy evaluation order
  • Input data for policy decisions
  • Default allow vs default deny

Policy in Queries

How policies affect query execution:

  • Query-time filtering
  • Result set restrictions
  • Pattern-based filtering
  • Performance considerations
  • Policy debugging for queries

Policy in Transactions

How policies affect transaction operations:

  • Transaction validation
  • Authorization checks
  • Entity-level permissions
  • Property-level permissions
  • Policy-based retractions

Programmatic Policy API (Rust)

Using policies in Rust applications:

  • wrap_identity_policy_view - Identity-based policy lookup via f:policyClass
  • wrap_policy_view - Inline policies with QueryConnectionOptions
  • Policy precedence rules
  • Transaction-side policy enforcement
  • Historical views with policy

Key Concepts

Data-Level Security

Fluree enforces security at the data level, not just the application level:

  • Users see only authorized data
  • Policies applied during query execution
  • No unauthorized data leakage
  • Transparent to applications

Policy as Data

Policies are stored as RDF triples in the database:

  • Version controlled with data
  • Query policies like any data
  • Time travel for policy history
  • Policies can reference other data

Identity-Based Access

Policies use decentralized identifiers (DIDs):

  • did:key for cryptographic identity
  • did:web for organization identity
  • Signed requests link to DID
  • Policies grant/deny based on DID

Policy Structure

Basic policy format:

{
  "@context": {
    "f": "https://ns.flur.ee/db#",
    "ex": "http://example.org/ns/"
  },
  "@id": "ex:read-policy",
  "@type": "f:Policy",
  "f:subject": "did:key:z6Mkh...",
  "f:action": "query",
  "f:resource": {
    "@type": "schema:Person"
  },
  "f:allow": true
}

  • Subject: who (DID, role, group)
  • Action: what operation (query, transact)
  • Resource: which data (type, predicate, specific entities)
  • Allow/Deny: grant or deny access

Policy Enforcement Points

Query Time

Policies filter query results:

PREFIX schema: <http://schema.org/>

SELECT ?name
WHERE {
  ?person schema:name ?name .
}

Policy filters results to only show authorized people.

Transaction Time

Policies validate transaction operations:

{
  "@graph": [
    { "@id": "ex:alice", "schema:age": 31 }
  ]
}

Policy checks if user can modify ex:alice.

Common Policy Patterns

Allow All (Development)

{
  "@id": "ex:allow-all",
  "f:subject": "*",
  "f:action": "*",
  "f:allow": true
}

Role-Based Access

{
  "@id": "ex:admin-policy",
  "f:subject": { "ex:role": "admin" },
  "f:action": "*",
  "f:allow": true
}

Resource-Type Based

{
  "@id": "ex:public-data-policy",
  "f:subject": "*",
  "f:action": "query",
  "f:resource": { "@type": "ex:PublicData" },
  "f:allow": true
}

Property-Level Access

{
  "@id": "ex:sensitive-property-policy",
  "f:subject": { "ex:role": "hr" },
  "f:action": "query",
  "f:resource": {
    "f:predicate": "ex:salary"
  },
  "f:allow": true
}

Owner-Based Access

{
  "@id": "ex:owner-policy",
  "f:subject": "?user",
  "f:action": ["query", "transact"],
  "f:resource": {
    "ex:owner": "?user"
  },
  "f:allow": true
}

Policy Evaluation

Evaluation Order

  1. Collect applicable policies based on subject, action, resource
  2. Evaluate each policy against request context
  3. Combine results using policy combining algorithm
  4. Apply default if no policies match

Combining Algorithms

Deny Overrides (default):

  • If any policy denies, access denied
  • Otherwise, allow if any policy allows
  • Default: deny if no matches

Allow Overrides:

  • If any policy allows, access granted
  • Otherwise, deny if any policy denies
  • Default: deny if no matches
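The two combining algorithms can be sketched as follows. The Decision type and function names are illustrative only, not Fluree's policy engine API.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Decision {
    Allow,
    Deny,
}

/// Deny overrides (default): any deny wins, then any allow, else deny.
fn deny_overrides(matched: &[Decision]) -> Decision {
    if matched.contains(&Decision::Deny) {
        Decision::Deny
    } else if matched.contains(&Decision::Allow) {
        Decision::Allow
    } else {
        Decision::Deny // default: deny if no policies match
    }
}

/// Allow overrides: any allow wins, else deny.
fn allow_overrides(matched: &[Decision]) -> Decision {
    if matched.contains(&Decision::Allow) {
        Decision::Allow
    } else {
        Decision::Deny // covers both "any deny" and "no matches"
    }
}

fn main() {
    use Decision::*;
    assert_eq!(deny_overrides(&[Allow, Deny]), Deny); // one deny blocks access
    assert_eq!(deny_overrides(&[Allow]), Allow);
    assert_eq!(deny_overrides(&[]), Deny); // default deny
    assert_eq!(allow_overrides(&[Allow, Deny]), Allow); // one allow grants access
    assert_eq!(allow_overrides(&[]), Deny);
}
```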

Policy Context

Policies have access to runtime context:

Request Context:

  • Subject DID
  • Action being performed
  • Target resource/entity
  • Timestamp

Data Context:

  • Entity properties
  • Related entities
  • Graph structure
  • Historical data

Example using context:

{
  "f:subject": "?user",
  "f:resource": {
    "ex:department": "?dept"
  },
  "f:condition": "?user ex:department ?dept",
  "f:allow": true
}

Allows access if user is in same department as resource.

Multi-Tenant Policies

Isolate data by tenant:

{
  "@id": "ex:tenant-isolation-policy",
  "f:subject": "?user",
  "f:action": "*",
  "f:resource": {
    "ex:tenant": "?tenant"
  },
  "f:condition": "?user ex:tenant ?tenant",
  "f:allow": true
}

Users can only access data from their tenant.

Policy Performance

Efficient Policies

Good (specific):

{
  "f:resource": { "@type": "ex:PublicData" },
  "f:allow": true
}

Less efficient (broad):

{
  "f:resource": { "?pred": "?value" },
  "f:condition": "complex graph pattern",
  "f:allow": true
}

Query Optimization

Policies are optimized during query planning:

  • Type-based filters pushed down
  • Property filters optimized
  • Complex patterns may impact performance

Policy Management

Creating Policies

Policies are created via transactions:

curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=policies:main" \
  -H "Content-Type: application/json" \
  -d '{
    "@graph": [
      {
        "@id": "ex:new-policy",
        "@type": "f:Policy",
        "f:subject": "did:key:z6Mkh...",
        "f:action": "query",
        "f:allow": true
      }
    ]
  }'

Updating Policies

Update using WHERE/DELETE/INSERT:

{
  "where": [
    { "@id": "ex:policy-1", "f:allow": "?oldValue" }
  ],
  "delete": [
    { "@id": "ex:policy-1", "f:allow": "?oldValue" }
  ],
  "insert": [
    { "@id": "ex:policy-1", "f:allow": false }
  ]
}

Policy Versioning

Policies are versioned with data:

  • Time travel to see historical policies
  • Audit who changed policies when
  • Rollback policies if needed

Security Best Practices

1. Principle of Least Privilege

Grant minimum necessary permissions:

// Good: Specific permissions
{
  "f:subject": "did:key:z6Mkh...",
  "f:action": "query",
  "f:resource": { "@type": "ex:PublicData" },
  "f:allow": true
}

// Bad: Overly broad
{
  "f:subject": "did:key:z6Mkh...",
  "f:action": "*",
  "f:allow": true
}

2. Default Deny

Start with deny-all, add specific allows:

// Default policy
{
  "@id": "ex:default",
  "f:subject": "*",
  "f:action": "*",
  "f:allow": false
}

// Specific allows
{
  "@id": "ex:public-read",
  "f:subject": "*",
  "f:action": "query",
  "f:resource": { "@type": "ex:PublicData" },
  "f:allow": true
}

3. Use Roles

Define roles, not individual permissions:

{
  "@id": "ex:admin-role",
  "@type": "ex:Role",
  "ex:permissions": ["read", "write", "admin"]
}

{
  "@id": "ex:role-policy",
  "f:subject": { "ex:hasRole": "ex:admin-role" },
  "f:action": "*",
  "f:allow": true
}

4. Audit Policy Changes

Track who changes policies:

{
  "@id": "ex:policy-audit",
  "ex:changedBy": "did:key:z6Mkh...",
  "ex:changedAt": "2024-01-22T10:00:00Z",
  "ex:reason": "Added read access for contractors"
}

5. Test Policies

Test policies before deploying:

const assert = require("node:assert");

// evaluatePolicy stands in for your application's policy-evaluation harness
async function testPolicy(policy, testCases) {
  for (const testCase of testCases) {
    const result = await evaluatePolicy(policy, testCase);
    assert.equal(result.allowed, testCase.expected);
  }
}

Authentication

Fluree supports multiple authentication mechanisms to cover different deployment scenarios — from standalone servers with no external identity provider to managed platforms using OIDC.

This document describes the authentication model, the supported modes, the bearer token claim set, and the access boundary between replication and query operations.

Identity vs transport

Identity (who)

Fluree policy enforcement is based on an identity, ideally a DID:

  • Preferred: did:key:... — portable across environments, no central identity server required
  • Also possible: other DIDs or IRIs mapped into Fluree policy (e.g. ex:alice)

Policies are stored as RDF triples in the ledger and evaluated at query/transaction time against the requesting identity. See Policy model for details.

Transport (how requests authenticate)

Two “on-the-wire” mechanisms carry the identity:

| Mechanism | Format | When to use |
|---|---|---|
| Signed requests | JWS/VC envelope containing the DID | Proof-of-possession; trustless environments |
| Bearer tokens | Authorization: Bearer <JWT> | Session-based; OIDC/OAuth2 flows |

Bearer tokens are a UX and deployment convenience — they do not replace the identity model. The server extracts the identity from the token claims and enforces the same dataset policies as signed requests.

Three supported auth modes

Mode 1 — Decentralized: did:key signed requests (no IdP)

  • The client holds an Ed25519 keypair and derives a did:key:...
  • Requests are signed using JWS or Verifiable Credential format
  • The server verifies the signature and uses the DID as the principal
  • Dataset policies decide allow/deny

This preserves Fluree’s core value: no central identity server required.

See Signed requests (JWS/VC) for the wire format.

Mode 2 — Standalone server with offline-minted tokens

Designed for: “stand up a server somewhere” (local dev, single-node EC2, etc.).

  • An admin generates an Ed25519 keypair with fluree token keygen
  • The admin mints a scoped Bearer token with fluree token create
  • The admin provides the token to CLI users or stores it in a secret manager
  • The server validates the token’s embedded JWK signature and enforces scopes + policy

The policy identity remains DID-based (fluree.identity claim), so authorization stays dataset/policy driven even though the transport is a Bearer token.

See CLI token command for minting instructions.

Mode 3 — OIDC/OAuth2 with an external identity provider

Designed for: managed platforms (e.g., any application using an OIDC provider).

  • The IdP authenticates the user (device flow, PKCE, etc.)
  • The application knows the user’s Fluree dataset entitlements
  • The application issues (or exchanges for) a Fluree-scoped token carrying:
    • identity (fluree.identity — ideally a DID)
    • ledger read/write scopes
    • optional policy class
  • The server verifies the token against the provider’s JWKS endpoint

This preserves separation of concerns:

  • IdP: authentication (who logged in)
  • Application: authorization (what they can access in Fluree)

The server must be configured with --jwks-issuer to trust OIDC tokens. See Configuration — OIDC.

Bearer token claim set

All Fluree Bearer tokens (Mode 2 and Mode 3) share the same claim set. The server extracts identity and scopes from these claims regardless of how the token was signed.

Standard JWT claims

| Claim | Required | Description |
|---|---|---|
| iss | Yes | Issuer — did:key:... for Ed25519 tokens, URL for OIDC tokens |
| sub | No | Subject — human-readable identity of the token holder |
| aud | No | Audience — target service (e.g. server URL) |
| exp | Yes | Expiration time (Unix timestamp) |
| iat | Yes | Issued-at time (Unix timestamp) |

Fluree-specific claims

| Claim | Type | Description |
|---|---|---|
| fluree.identity | String (IRI/DID) | Identity for policy enforcement — takes precedence over sub |
| fluree.policy.class | String (IRI) | Optional policy class for identity-based policy lookup |

Scope claims

Scopes control which endpoints and ledgers a token can access.

Query scopes (fluree.ledger.*)

| Claim | Type | Description |
|---|---|---|
| fluree.ledger.read.all | Boolean | Read access to all ledgers via data API |
| fluree.ledger.read.ledgers | Array of strings | Read access to specific ledgers |
| fluree.ledger.write.all | Boolean | Write access to all ledgers via data API |
| fluree.ledger.write.ledgers | Array of strings | Write access to specific ledgers |

Replication scopes (fluree.storage.*)

| Claim | Type | Description |
|---|---|---|
| fluree.storage.all | Boolean | Storage/replication access to all ledgers |
| fluree.storage.ledgers | Array of strings | Storage/replication access to specific ledgers |

Back-compat: fluree.storage.* claims also imply data API read access for the same ledgers.

Populating fluree.storage.ledgers (multi-tenant hint)

If you run an IdP or a request-router that exchanges IdP tokens for Fluree-scoped tokens, prefer populating fluree.storage.ledgers rather than granting fluree.storage.all.

Recommended conventions for mapping IdP group/role claims to ledger scopes:

  • Treat group values like fluree:storage:<ledger-id> (example: fluree:storage:books:main) as permission to replicate that ledger.
  • Optionally support wildcards at the router boundary (example: fluree:storage:books:* expands to the set of ledgers your router knows about under books:).
  • Reserve fluree.storage.all=true for admin/service accounts.

Event scopes (fluree.events.*)

| Claim | Type | Description |
|---|---|---|
| fluree.events.all | Boolean | SSE event stream for all ledgers |
| fluree.events.ledgers | Array of strings | SSE event stream for specific ledgers |

Example token payload

{
  "iss": "https://solo.example.com",
  "sub": "alice@example.com",
  "aud": "https://fluree.example.com",
  "exp": 1700000000,
  "iat": 1699996400,
  "fluree.identity": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK",
  "fluree.ledger.read.all": true,
  "fluree.ledger.write.ledgers": ["mydb:main", "mydb:staging"]
}
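Enforcing these scope claims amounts to a membership check per ledger. The struct and function names below are hypothetical, not the server's actual code; they only illustrate how the claims above gate access.

```rust
// Illustrative model of the fluree.ledger.* scope claims.
struct LedgerScopes {
    read_all: bool,             // fluree.ledger.read.all
    read_ledgers: Vec<String>,  // fluree.ledger.read.ledgers
    write_all: bool,            // fluree.ledger.write.all
    write_ledgers: Vec<String>, // fluree.ledger.write.ledgers
}

fn can_read(s: &LedgerScopes, ledger: &str) -> bool {
    s.read_all || s.read_ledgers.iter().any(|l| l == ledger)
}

fn can_write(s: &LedgerScopes, ledger: &str) -> bool {
    s.write_all || s.write_ledgers.iter().any(|l| l == ledger)
}

fn main() {
    // Mirrors the example payload: read.all = true, writes limited to two ledgers
    let scopes = LedgerScopes {
        read_all: true,
        read_ledgers: vec![],
        write_all: false,
        write_ledgers: vec!["mydb:main".into(), "mydb:staging".into()],
    };
    assert!(can_read(&scopes, "anydb:main"));      // read.all grants every ledger
    assert!(can_write(&scopes, "mydb:main"));      // listed ledger
    assert!(!can_write(&scopes, "otherdb:main"));  // unlisted ledger
}
```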

Token verification paths

The server supports two verification paths, selected automatically based on the JWT header:

| JWT header | Path | Algorithm | Trust model |
|---|---|---|---|
| Contains jwk (embedded key) | Ed25519 / did:key | EdDSA | Issuer trust checked against --events-auth-trusted-issuer (or admin/storage equivalents) |
| Contains kid (key ID) | OIDC / JWKS | RS256 | Issuer must match a --jwks-issuer; key fetched from JWKS endpoint |

This dual-path dispatch is transparent to callers — the same Authorization: Bearer <token> header works for both paths. The server applies the same scope and identity enforcement regardless of which path verified the signature.

Replication vs query access boundary

Fluree draws a hard boundary between replication-scoped and query-scoped access.

Replication access (fluree.storage.*)

Replication operations — nameservice sync, storage proxy reads, and CLI fetch/pull/push — require root-level fluree.storage.* claims. These operations transfer raw commit data and index blocks; they bypass dataset policy because the data must be bit-identical to what the transaction server wrote.

Replication tokens are intended for operator and service-account use (e.g. a peer server’s storage-proxy token, or an admin’s CLI pull/push workflow). They should never be issued to end users.

Query access (fluree.ledger.read/write.*)

Query operations — /v1/fluree/query/{ledger...}, /v1/fluree/insert/{ledger...}, connection-scoped SPARQL, etc. — use fluree.ledger.read/write.* claims. These go through the full query engine and dataset policy enforcement. The server never exposes raw storage bytes through query endpoints.

Query tokens are appropriate for end users and application service accounts. Combined with a fluree.identity claim and dataset policies, the server enforces fine-grained row- and property-level access control.

CLI consequence: track vs pull

| Command | Access type | Required scope | What happens |
|---|---|---|---|
| fluree pull | Replication | fluree.storage.* | Downloads raw commits and indexes into local storage |
| fluree track | Query | fluree.ledger.read/write.* | Registers a remote ledger; queries forwarded to server |

If a user holds only query-scoped tokens, they cannot clone or pull a ledger. They can only track it and issue queries/transactions against the remote.

Identity precedence

When multiple identity signals are present, the server uses this precedence (highest first):

  1. Signed request DID — proof-of-possession from JWS/VC signature
  2. Bearer token fluree.identity — identity claim in the token
  3. Client-provided headers/body — only honored when the server is in unauthenticated mode

When auth is present, the server forces opts.identity (and optional policy class) from the token, ignoring any client-provided identity in headers or request bodies. This prevents identity spoofing.
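A minimal sketch of that precedence rule follows; the struct and field names are hypothetical stand-ins for the server's request state.

```rust
// Illustrative model of identity precedence (highest first).
struct Request {
    signed_did: Option<String>,      // 1. proof-of-possession from JWS/VC signature
    token_identity: Option<String>,  // 2. fluree.identity claim in a Bearer token
    client_identity: Option<String>, // 3. header/body-supplied identity
    unauthenticated_mode: bool,      // server running without auth enforcement
}

fn effective_identity(req: &Request) -> Option<String> {
    req.signed_did
        .clone()
        .or_else(|| req.token_identity.clone())
        .or_else(|| {
            // Client-provided identity is honored only in unauthenticated mode,
            // which is what prevents identity spoofing when auth is present.
            if req.unauthenticated_mode {
                req.client_identity.clone()
            } else {
                None
            }
        })
}

fn main() {
    // A signed request DID wins over everything else
    let req = Request {
        signed_did: Some("did:key:signer".into()),
        token_identity: Some("did:key:token".into()),
        client_identity: Some("ex:spoofed".into()),
        unauthenticated_mode: false,
    };
    assert_eq!(effective_identity(&req).as_deref(), Some("did:key:signer"));

    // With auth present and no signature/token identity, headers are ignored
    let req2 = Request {
        signed_did: None,
        token_identity: None,
        client_identity: Some("ex:spoofed".into()),
        unauthenticated_mode: false,
    };
    assert_eq!(effective_identity(&req2), None);
}
```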

Endpoint coverage

All Bearer-token-authenticated endpoints support both Ed25519 and OIDC verification paths:

| Endpoint group | Extractor | Scopes checked |
|---|---|---|
| Data API (query/update/info/exists) | MaybeDataBearer | fluree.ledger.read/write.* |
| Admin (create/drop) | require_admin_token | Issuer trust |
| Events (SSE) | MaybeBearer | fluree.events.* |
| Storage proxy | StorageProxyBearer | fluree.storage.* |
| Nameservice refs | StorageProxyBearer | fluree.storage.* |

MCP endpoints currently use the Ed25519 path only.

Security notes

  • Tokens are validated server-side on every request; client-side validation is never trusted
  • Out-of-scope ledgers return 404 (not 403) to avoid existence leaks
  • fluree.storage.* tokens grant raw data access — issue only to trusted operators
  • Connection-scoped SPARQL (FROM/FROM NAMED) requires all referenced ledgers to be within the token’s read scope

Storage Encryption

Fluree supports transparent encryption of data at rest using AES-256-GCM authenticated encryption. When enabled, all data written to storage is automatically encrypted, and data is decrypted transparently when read.

Overview

Key Features:

  • AES-256-GCM: Industry-standard authenticated encryption with integrity protection
  • Transparent Operation: Encryption/decryption happens automatically on read/write
  • All Storage Backends: Works natively with file, S3, and memory storage
  • Portable Ciphertext: Encrypted data can be moved between storage backends (file ↔ S3)
  • Environment Variable Support: Keys can be loaded from environment variables
  • Secure Key Handling: Key material in EncryptionKey is zeroized on drop

Quick Start

Rust API

use fluree_db_api::FlureeBuilder;

// Option 1: Direct key (for testing)
let key: [u8; 32] = /* your 32-byte key */;
let fluree = FlureeBuilder::file("/data/fluree")
    .build_encrypted(key)?;

// Option 2: Base64-encoded key
let fluree = FlureeBuilder::file("/data/fluree")
    .with_encryption_key_base64("your-base64-encoded-32-byte-key")?
    .build_encrypted_from_config()?;

// Option 3: From JSON-LD config with env var
let config = serde_json::json!({
    "@context": {"@vocab": "https://ns.flur.ee/system#"},
    "@graph": [{
        "@type": "Connection",
        "indexStorage": {
            "@type": "Storage",
            "filePath": "/data/fluree",
            "AES256Key": {"envVar": "FLUREE_ENCRYPTION_KEY"}
        }
    }]
});
let fluree = FlureeBuilder::from_json_ld(&config)?
    .build_encrypted_from_config()?;

Server Configuration

Set the encryption key via environment variable:

# Generate a secure 32-byte key and base64 encode it
export FLUREE_ENCRYPTION_KEY=$(openssl rand -base64 32)

# Start the server with JSON-LD config
./fluree-db-server --config config.jsonld

Configuration

JSON-LD Configuration

The encryption key is specified in the storage configuration using AES256Key:

{
  "@context": {
    "@base": "https://example.org/config/",
    "@vocab": "https://ns.flur.ee/system#"
  },
  "@graph": [
    {
      "@id": "indexStorage",
      "@type": "Storage",
      "filePath": "/var/lib/fluree/data",
      "AES256Key": {
        "envVar": "FLUREE_ENCRYPTION_KEY"
      }
    },
    {
      "@id": "mainConnection",
      "@type": "Connection",
      "indexStorage": {"@id": "indexStorage"},
      "cacheMaxMb": 2000
    }
  ]
}

Configuration Options

| Field | Type | Description |
|---|---|---|
| AES256Key | string or object | Base64-encoded 32-byte encryption key |
| AES256Key.envVar | string | Environment variable containing the key |
| AES256Key.defaultVal | string | Fallback key if env var is not set |

Environment Variable Indirection

You can load the encryption key from an environment variable:

{
  "AES256Key": {
    "envVar": "FLUREE_ENCRYPTION_KEY"
  }
}

Or with a fallback default (not recommended for production):

{
  "AES256Key": {
    "envVar": "FLUREE_ENCRYPTION_KEY",
    "defaultVal": "fallback-base64-key-for-dev-only"
  }
}

Key Management

Generating Keys

Generate a cryptographically secure 32-byte key:

# Using OpenSSL (recommended)
openssl rand -base64 32

# Using /dev/urandom
head -c 32 /dev/urandom | base64

# Example output: "K7gNU3sdo+OL0wNhqoVWhr3g6s1xYv72ol/pe/Unols="

Key Storage Best Practices

  1. Never commit keys to version control
  2. Use environment variables or secret managers
  3. Rotate keys periodically (see Key Rotation below)
  4. Limit access to key material

Recommended secret management solutions:

  • HashiCorp Vault
  • AWS Secrets Manager
  • Kubernetes Secrets
  • Docker secrets

Key Rotation

The encryption envelope format includes a key_id field to support key rotation:

  1. Existing data continues to be readable with the old key
  2. New writes use the new key
  3. Re-encrypt on read (optional): Decrypt with old key, re-encrypt with new key

Note: Full key rotation support with KeyProvider trait is planned for a future release. Currently, a single static key is used.

Encryption Details

Algorithm

  • Cipher: AES-256-GCM (Galois/Counter Mode)
  • Key Size: 256 bits (32 bytes)
  • Nonce Size: 96 bits (12 bytes), randomly generated per write
  • Tag Size: 128 bits (16 bytes)

Ciphertext Envelope Format

All encrypted data uses a portable envelope format:

┌──────────────────────────────────────────────────────────────┐
│ Header (22 bytes)                                            │
├──────────┬─────────┬─────────┬──────────┬───────────────────┤
│ Magic    │ Version │ Alg     │ Key ID   │ Nonce             │
│ 4 bytes  │ 1 byte  │ 1 byte  │ 4 bytes  │ 12 bytes          │
│ "FLU\0"  │ 0x01    │ 0x01    │ uint32   │ random            │
├──────────┴─────────┴─────────┴──────────┴───────────────────┤
│ Ciphertext (variable length)                                 │
├──────────────────────────────────────────────────────────────┤
│ Authentication Tag (16 bytes)                                │
└──────────────────────────────────────────────────────────────┘
  • Magic bytes: FLU\0 (0x46 0x4C 0x55 0x00) for format detection
  • Version: Format version (currently 0x01)
  • Algorithm: 0x01 = AES-256-GCM
  • Key ID: Identifier for key rotation support
  • Nonce: Randomly generated per encryption operation
  • Authentication Tag: GCM integrity tag (authenticates header + ciphertext)
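To make the layout concrete, here is a minimal sketch of splitting an encrypted object into its fields with Python's struct module. This is illustrative only: the field order follows the diagram above, and little-endian encoding of the key ID is an assumption, not something the format specification states.

```python
import struct

MAGIC = b"FLU\x00"          # format-detection magic bytes
HEADER_FMT = "<4sBBI12s"    # magic, version, algorithm, key ID, nonce
HEADER_LEN = struct.calcsize(HEADER_FMT)  # 4 + 1 + 1 + 4 + 12 = 22 bytes

def parse_envelope_header(data: bytes):
    """Split an encrypted object into header fields, ciphertext, and tag."""
    magic, version, algo, key_id, nonce = struct.unpack_from(HEADER_FMT, data)
    if magic != MAGIC:
        raise ValueError("Invalid encryption format")   # wrong magic bytes
    if algo != 0x01:
        raise ValueError("Unsupported algorithm")       # only AES-256-GCM
    # Everything between the 22-byte header and the trailing 16-byte
    # GCM authentication tag is ciphertext.
    ciphertext, tag = data[HEADER_LEN:-16], data[-16:]
    return version, key_id, nonce, ciphertext, tag
```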

Security Properties

  1. Confidentiality: AES-256 encryption protects data content
  2. Integrity: GCM authentication tag detects tampering
  3. Authenticity: Header is included in AAD (Additional Authenticated Data)
  4. Non-deterministic: Random nonces mean same plaintext → different ciphertext

Portability

Encrypted data is portable between storage backends:

# Encrypted files can be copied from local storage to S3
aws s3 sync /var/lib/fluree/data s3://my-bucket/fluree/

# And back again
aws s3 sync s3://my-bucket/fluree/ /var/lib/fluree/data

The same encryption key will decrypt data regardless of where it’s stored.

Performance Considerations

  • CPU overhead: ~5-15% for encryption/decryption (depends on hardware AES support)
  • Storage overhead: 22 bytes header + 16 bytes tag per object
  • Memory: Keys are kept in memory while the connection is open

Modern CPUs with AES-NI instructions provide hardware acceleration, minimizing the performance impact.

Troubleshooting

Common Errors

“Invalid encryption format”

  • The data doesn’t have the expected magic bytes
  • Possible causes: trying to read unencrypted data with encryption enabled, or corrupted data

“Unknown encryption key ID”

  • The data was encrypted with a different key than what’s configured
  • Check that the correct key is being used

“Decryption failed”

  • The encryption key doesn’t match
  • The data may be corrupted
  • The authentication tag verification failed (data was tampered with)

“Encryption key must be 32 bytes”

  • The provided key is the wrong length
  • Base64-decode your key and verify it’s exactly 32 bytes

Verifying Encryption

Check if a file is encrypted by looking for the magic bytes:

# Check first 4 bytes of a file
xxd -l 4 /var/lib/fluree/data/some-file
# Encrypted: 00000000: 464c 5500  FLU.
# Unencrypted: will show different bytes (likely JSON or Avro magic)

Changing Encryption Settings

Enabling Encryption on Existing Data

To encrypt existing unencrypted data:

  1. Export all ledgers to JSON-LD
  2. Delete the old unencrypted data directory
  3. Configure encryption with a new key
  4. Import the JSON-LD data

# 1. Export (while running without encryption)
fluree export mydb:main --format json-ld > mydb-export.jsonld

# 2. Stop server and backup/delete old data
mv /var/lib/fluree/data /var/lib/fluree/data-unencrypted-backup

# 3. Configure encryption key
export FLUREE_ENCRYPTION_KEY=$(openssl rand -base64 32)
echo "Save this key securely: $FLUREE_ENCRYPTION_KEY"

# 4. Start server with encryption config and import
./fluree-db-server --config encrypted-config.jsonld
fluree create mydb --from mydb-export.jsonld

Disabling Encryption

Warning: This exposes your data. Only do this if absolutely necessary.

Follow the same export/import process, but configure without an encryption key.

Commit Signing and Attestation

Fluree supports cryptographic signing at two levels:

  1. Transaction signatures prove who submitted a transaction (user-facing). See Signed Transactions.
  2. Commit signatures prove which node wrote a commit (infrastructure-facing). This page covers commit signatures.

Both use did:key identifiers with Ed25519 signatures, aligning with the credential infrastructure in fluree-db-credential.

Note: Requires the credential feature flag. See Compatibility and Feature Flags.

Transaction Signatures vs Commit Signatures

These two signature types serve different purposes:

|  | Transaction Signature | Commit Signature |
|---|---|---|
| Proves | Who submitted the transaction | Which node wrote the commit |
| Signed by | End user (client-side) | Fluree node (server-side) |
| Trust model | User authentication | Infrastructure integrity |
| Format | JWS / Verifiable Credential | Domain-separated Ed25519 over commit hash |
| Stored in | Commit envelope (txn_signature) | Trailing signature block after commit hash |

A single commit can have both: a transaction signature from the user who submitted it, and a commit signature from the node that wrote it.

How Commit Signing Works

Commit Digest

When a commit is written, its content is hashed with SHA-256 to produce a commit_hash. The signing digest is then computed with domain separation to prevent cross-protocol and cross-ledger replay:

to_sign = SHA-256("fluree/commit/v1" || varint(ledger_id.len()) || ledger_id || commit_hash)

Where:

  • "fluree/commit/v1" is a domain separator (16 bytes ASCII)
  • ledger_id is the ledger ID (name:branch, length-prefixed)
  • commit_hash is the 32-byte SHA-256 of the commit content
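The digest construction can be reproduced in a few lines of Python using hashlib. One assumption to flag: the varint flavor is not specified above, so this sketch uses unsigned LEB128, which is the common choice.

```python
import hashlib

def varint(n: int) -> bytes:
    """Unsigned LEB128 varint (assumed encoding; short IDs fit in one byte)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | (0x80 if n else 0))
        if not n:
            return bytes(out)

def commit_signing_digest(ledger_id: str, commit_hash: bytes) -> bytes:
    """to_sign = SHA-256(domain_sep || varint(len) || ledger_id || commit_hash)."""
    assert len(commit_hash) == 32
    lid = ledger_id.encode("utf-8")
    data = b"fluree/commit/v1" + varint(len(lid)) + lid + commit_hash
    return hashlib.sha256(data).digest()
```

Because the ledger ID is inside the hashed data, the same commit hash produces a different digest on every ledger, which is what blocks cross-ledger replay.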

Signature Block Layout

The signature block is appended after the commit hash and is not covered by it:

+-------------------------------------+
| Header (32 bytes)                   |
|   flags: includes HAS_COMMIT_SIG    |
+-------------------------------------+
| Envelope + Ops + Dictionaries       |
+-------------------------------------+
| Footer (64 bytes)                   |
+-------------------------------------+
| commit_hash (32 bytes)              |
+-------------------------------------+
| Signature Block (optional)          |  <-- after hash boundary
|   sig_count: u16                    |
|   signatures: [CommitSignature]     |
+-------------------------------------+

This design means:

  • commit_hash is stable regardless of signatures
  • Signatures can be added without changing the commit’s content address
  • Existing verification (hash check) works unchanged

Signature Entry Format

Each signature entry contains:

| Field | Type | Description |
|---|---|---|
| signer | String | Signer identity (did:key:z6Mk...) |
| algo | u8 | Signing algorithm (0x01 = Ed25519) |
| signature | [u8; 64] | Ed25519 signature bytes |
| timestamp | i64 | Signing time (epoch millis, informational only) |
| metadata | Option<Vec<u8>> | Optional metadata (node_id, region, role for consensus) |

The algo byte provides forward compatibility for new signature algorithms. Unknown algo values are rejected on decode (not silently skipped).

The timestamp is informational only and is not part of the signed digest. Ordering is determined by the commit chain, not by signature timestamps.

The metadata field is reserved for future consensus features (multi-node signing, quorum sets). It allows nodes to include identifying information like node ID, region, or role. Currently unused but present in the format to avoid future versioning.

Enabling Commit Signing (Rust API)

Commit signing is opt-in via CommitOpts when using the Rust API:

#![allow(unused)]
fn main() {
use std::sync::Arc;
use fluree_db_novelty::SigningKey;

// Load or generate an Ed25519 signing key
// (key_bytes: a 32-byte seed loaded from secure storage)
let signing_key = Arc::new(SigningKey::from_bytes(&key_bytes));

// Attach to commit options
let opts = CommitOpts::default()
    .with_signing_key(signing_key);
}

When a signing key is present, the commit writer:

  1. Computes the domain-separated digest from the commit hash and ledger ID
  2. Signs the digest with Ed25519
  3. Appends the signature block after the commit hash
  4. Sets the FLAG_HAS_COMMIT_SIG bit in the header

Verifying Commit Signatures

Verification recomputes the domain-separated digest and checks the Ed25519 signature:

#![allow(unused)]
fn main() {
use fluree_db_credential::verify_commit_digest;

verify_commit_digest(
    &signer_did,       // "did:key:z6Mk..."
    &signature_bytes,  // [u8; 64]
    &commit_hash,      // [u8; 32]
    ledger_id,         // "mydb:main"
)?;
}

The verifier:

  1. Extracts the Ed25519 public key from the did:key identifier
  2. Recomputes to_sign = SHA-256("fluree/commit/v1" || varint(ledger_id.len()) || ledger_id || commit_hash)
  3. Verifies the signature over to_sign

No external key registry is needed for did:key identifiers — the public key is embedded in the DID itself.

Wire Format

Each CommitSignature is encoded as:

signer_len:   u16 (LE)          - length of signer string
signer:       [u8; signer_len]  - UTF-8 did:key identifier
algo:         u8                - signature algorithm (0x01 = Ed25519)
signature:    [u8; 64]          - Ed25519 signature bytes
timestamp:    i64 (LE)          - signing timestamp (epoch millis)
meta_len:     u16 (LE)          - metadata length (0 if none)
metadata:     [u8; meta_len]    - optional metadata bytes

The signature block is prefixed with sig_count: u16 (LE) containing the number of signatures.
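A sketch of encoding entries per this layout, using Python's struct module. This is illustrative, not an official encoder; it simply follows the field order and little-endian integer widths listed above.

```python
import struct

def encode_commit_signature(signer: str, algo: int, signature: bytes,
                            timestamp: int, metadata: bytes = b"") -> bytes:
    """Encode one signature entry per the wire layout above (LE integers)."""
    s = signer.encode("utf-8")
    assert len(signature) == 64          # Ed25519 signatures are 64 bytes
    return (struct.pack("<H", len(s)) + s    # signer_len + signer
            + struct.pack("<B", algo)        # algo (0x01 = Ed25519)
            + signature                      # signature bytes
            + struct.pack("<q", timestamp)   # epoch millis, informational
            + struct.pack("<H", len(metadata)) + metadata)  # meta_len + metadata

def encode_signature_block(entries: list) -> bytes:
    """Prefix the entries with sig_count: u16 (LE)."""
    return struct.pack("<H", len(entries)) + b"".join(entries)
```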

Security Properties

Replay Prevention

  • Cross-ledger: The ledger ID is part of the signed digest, so a signature from ledger A cannot be replayed on ledger B
  • Cross-protocol: The domain separator "fluree/commit/v1" prevents signatures meant for other systems from being accepted
  • Version upgrade: Changing the domain separator (e.g., v1 to v2) invalidates old signatures

What Commit Signatures Do Not Provide

  • Transaction authorization: Use transaction signatures and policies for user-level access control
  • Consensus: A single commit signature proves one node wrote it. Multi-node consensus requires attestation policies (see below)
  • Encryption: Commit signing provides integrity and authenticity, not confidentiality. See Storage Encryption for data-at-rest protection

Future: Attestations and Consensus Policy

The following capabilities are designed but not yet implemented.

Detached Attestations

For multi-node deployments, signatures can be collected as separate attestation objects rather than embedded in the commit:

  • Commit file remains immutable and content-addressed
  • Signatures collected asynchronously from multiple nodes
  • No coordination needed during commit write
  • Attestations from different nodes can arrive at different times

Consensus Policy

Consensus policy will define how many signatures are required for a commit to be accepted:

  • None: No signatures required (default)
  • Single signer: One designated writer must sign
  • Threshold (K-of-N): At least K signatures from an allowlist of N signers
  • Quorum set: At least one signature from each required group

Policy validation runs after commit hash integrity check, before accepting the commit.

Policy Model and Inputs

This is the reference for Fluree’s access-control policy model. For a conceptual introduction, see Policy enforcement. For worked examples, see the policy cookbook. For Rust-side wiring (building a PolicyContext, wrap_identity_policy_view, transaction helpers), see Programmatic policy API.

Policy node shape

Every policy is a JSON-LD node. Required @type: f:AccessPolicy (the IRI is https://ns.flur.ee/db#AccessPolicy). A second class IRI (e.g. ex:CorpPolicy) is conventional and allows the policy to be loaded by policy-class.

{
  "@id": "ex:somePolicy",
  "@type": ["f:AccessPolicy", "ex:CorpPolicy"],
  "f:required": true,
  "f:onProperty": [{"@id": "ex:salary"}],
  "f:onClass":    [{"@id": "ex:Employee"}],
  "f:onSubject":  [{"@id": "ex:alice"}],
  "f:action": [{"@id": "f:view"}, {"@id": "f:modify"}],
  "f:query": "<JSON-encoded WHERE>",
  "f:allow": true,
  "f:exMessage": "Reason returned to caller on denial"
}

Predicate reference

| Predicate | Type | Required? | Description |
|---|---|---|---|
| f:action | array of IRIs (or single IRI string) | yes | Which operations the policy governs. Values: f:view (queries), f:modify (transactions). |
| f:allow | boolean | one of f:allow / f:query | Static decision. true permits, false denies. Takes precedence over f:query if both are present. |
| f:query | string (JSON-encoded JSON-LD WHERE) | one of f:allow / f:query | Dynamic decision. The targeted flake is permitted when the query returns at least one row. ?$this and ?$identity are pre-bound. |
| f:onProperty | array of @id references | no | Restrict the policy to flakes whose predicate is one of these IRIs. |
| f:onClass | array of @id references | no | Restrict the policy to flakes whose subject has one of these rdf:types. |
| f:onSubject | array of @id references | no | Restrict the policy to flakes whose subject IRI is one of these. |
| f:required | boolean | no, defaults to false | When true, the policy MUST allow for access to its targets to be granted, regardless of default-allow. |
| f:exMessage | string | no | User-facing error message returned when this policy denies a transaction. |

If neither f:allow nor f:query is present, the policy is deny by default.

If multiple targeting predicates are present, they intersect: the policy applies only to flakes that match the property AND the class AND the subject sets.

If all targeting predicates are omitted, the policy is a default policy that applies to every flake for the operations in its f:action.

Action values

f:action carries IRIs in the f: namespace:

  • "f:view" (or {"@id": "f:view"}) — queries.
  • "f:modify" (or {"@id": "f:modify"}) — transactions.
  • Both: [{"@id": "f:view"}, {"@id": "f:modify"}].

A policy with no f:action defaults to applying to both view and modify.

f:query syntax

f:query is a string containing a JSON-encoded JSON-LD query. The engine parses the string and runs the query as a subquery for each candidate flake, with two pre-bound variables:

| Variable | Binding |
|---|---|
| ?$this | The IRI of the subject being read or written. |
| ?$identity | The IRI of the requesting identity (resolved from opts.identity, policy_values["?$identity"], or the verified bearer-token subject). |

Anything else binds via the embedded WHERE just like a normal Fluree query.

Because RDF can’t carry structured JSON values natively, stored policies must JSON-encode the query (serde_json::to_string). For inline policies passed via opts.policy, you can also use the JSON-LD typed-literal form {"@type": "@json", "@value": {...}} to avoid manually escaping.

Example (string form, suitable for storing in a transaction):

"f:query": "{\"where\": {\"@id\": \"?$identity\", \"http://example.org/role\": \"hr\"}}"

Example (typed-literal form, suitable for inline policies):

"f:query": {
  "@type": "@json",
  "@value": {
    "where": {"@id": "?$identity", "http://example.org/role": "hr"}
  }
}
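Rather than hand-escaping the string form, generate it by JSON-encoding the query object. The Rust side uses serde_json::to_string; the Python equivalent below is an illustrative sketch:

```python
import json

# The WHERE clause as a plain data structure.
where = {"where": {"@id": "?$identity", "http://example.org/role": "hr"}}

# JSON-encode it into the single string that f:query stores.
f_query = json.dumps(where)

# The string round-trips back to the original structure.
assert json.loads(f_query) == where
```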

Inline policies must use full IRIs. Compact IRIs (schema:ssn) inside an inline policy passed through opts.policy are not expanded against the request @context. Use full IRIs (http://schema.org/ssn).

Combining algorithm

When more than one policy targets the same flake, the engine combines them as follows:

  1. If any required policy (f:required: true) targets the flake and does not allow it (either f:allow: false, missing f:allow, or f:query returning no rows), access is denied for that flake. Required policies are gates: they cannot be overridden by other allows or by default-allow.
  2. If at least one targeted (but not required) policy allows the flake, access is granted. Non-required allows combine with allow-overrides semantics.
  3. If a targeted policy’s f:query returns false (no rows), that policy applied but did not permit — the flake is denied even if default-allow is true. Default-allow only applies when no policy targets the flake.
  4. If no policies target the flake, default-allow decides. false denies; true permits.

f:allow always takes precedence over f:query: if both are set on the same policy, f:allow wins.
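The four steps can be sketched as a pure function over per-policy outcomes. This is illustrative pseudologic, not the engine's actual types; here `allows` stands for each policy's resolved decision after f:allow / f:query evaluation.

```python
def combine(targeting_policies, default_allow: bool) -> bool:
    """Decide access for one flake.

    targeting_policies holds one (required, allows) pair for every policy
    whose targeting predicates match the flake.
    """
    # Step 1: a failing required policy is a hard gate.
    if any(req and not ok for req, ok in targeting_policies):
        return False
    # Steps 2-3: if any policy targeted the flake, at least one must allow
    # it; default-allow never applies to targeted flakes.
    if targeting_policies:
        return any(ok for _, ok in targeting_policies)
    # Step 4: no targeting policy, so default-allow decides.
    return default_allow
```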

For a deeper treatment, including the three-state identity resolution semantics (FoundWithPolicies / FoundNoPolicies / NotFound), see the Policy combining algorithm section in the programmatic policy API reference.

Default-allow

default-allow is the fallback decision for flakes that no policy targets:

| Setting | Behavior |
|---|---|
| default-allow: false | Fail-closed. A flake with no targeting policies is denied. Recommended for production. |
| default-allow: true | Fail-open. A flake with no targeting policies is allowed. Useful in development or in deployments where an application layer handles authorization and Fluree is recording signed transactions for provenance. |

Important: default-allow: true does not override required policies that fail. It only governs the no-policy case.

Identity resolution

When opts.identity is set, Fluree resolves it to a ?$identity SID and applies the identity’s f:policyClass automatically — every stored policy of that class is loaded into the request’s policy set.

The resolution path:

opts.identity  →  policy_class               →  policy             →  policy_values["?$identity"]
   (highest)                                                                  (lowest)

If multiple are set, the higher-priority binding wins. policy_values["?$identity"] is a manual escape hatch — useful when you want to test a specific identity SID without going through the full resolution path.

A request with no identity supplied uses an “anonymous” context: only inline policies, no class-based discovery, no ?$identity binding.
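A reduced sketch of the priority chain, covering the two sources that directly bind ?$identity (opts.identity at the top, the policy_values escape hatch at the bottom). The field names mirror the request options above; the function itself is illustrative, not an engine API.

```python
def resolve_identity_binding(opts: dict):
    """Return the ?$identity binding from the highest-priority source.

    opts["identity"] wins over a manual policy_values["?$identity"] binding;
    None means the request runs in the anonymous context.
    """
    if opts.get("identity"):
        return opts["identity"]
    return opts.get("policy-values", {}).get("?$identity")
```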

Where policies come from

Two delivery paths, often combined:

Stored policies

Persist policies as data in the ledger. The policy node carries the class type alongside f:AccessPolicy:

{
  "@id": "ex:salary-restriction",
  "@type": ["f:AccessPolicy", "ex:CorpPolicy"],
  ...
}

Identities tag themselves with f:policyClass:

{
  "@id": "ex:aliceIdentity",
  "ex:user": {"@id": "ex:alice"},
  "f:policyClass": [{"@id": "ex:CorpPolicy"}]
}

When opts.identity = "ex:aliceIdentity", every f:AccessPolicy whose @type includes ex:CorpPolicy is loaded for the request — no per-request policy listing needed. Stored policies are versioned, time-travelable, branchable, and consistent across all callers.

Inline policies

Pass policies in opts.policy (an array of policy nodes) for ad-hoc requests:

{
  "from": "mydb:main",
  "select": "?x",
  "where": [...],
  "opts": {
    "policy": [
      {"@id": "ex:adhoc", "@type": "f:AccessPolicy", "f:action": "f:view", "f:allow": true}
    ],
    "default-allow": false
  }
}

Useful for tests, admin scripts, and migration tooling. Inline policies and stored policies can coexist in a single request.

Request-time options

Each request can supply these opts fields (JSON-LD form). Over SPARQL, the equivalent fluree-* HTTP headers carry the same values.

| opts field | HTTP header | Description |
|---|---|---|
| identity | fluree-identity | IRI of an identity entity. Drives f:policyClass discovery and binds ?$identity. |
| policy-class | fluree-policy-class | Class IRI(s) to load stored policies by. Repeated header or comma-separated. |
| policy-values | fluree-policy-values | JSON object of additional ?$var bindings injected into every policy's f:query. |
| policy | fluree-policy | Inline policy array (full JSON-LD). |
| default-allow | fluree-default-allow | true / false. Fallback decision for flakes that no policy targets. |

When the server is configured with data_auth_default_policy_class, a verified bearer token’s identity claim is auto-applied to policy-values and the configured class to policy-class — no client-side opts needed. See Configuration and Authentication for the bearer-token flow.

Read enforcement vs write enforcement

The same model governs both, distinguished by f:action:

  • f:view — applied during query execution. Flakes that fail the policy are filtered before the query plan emits results. The query never sees them.
  • f:modify — applied during transaction staging. The transaction is rejected — with f:exMessage if provided — when a write would touch flakes the identity isn’t allowed to modify.

A single policy can govern both. See Policy in queries and Policy in transactions for path-specific details.

Performance notes

Two phases:

  • Load. The relevant policies for a request are gathered once (from policy-class lookups + inline policy). Cost is small and proportional to the size of the policy set.
  • Apply. During plan execution, each candidate flake is checked against the matching subset of the policy set. Cost is proportional to the number of touched flakes × the average per-flake check cost.

Two practical implications:

  1. Target every policy you can. A policy with f:onProperty or f:onClass only runs on flakes whose predicate or rdf:type matches. Default policies (no targeting) run on every flake.
  2. Keep f:query cheap. It runs once per targeted flake. Lean on identity-side properties already loaded (@type, f:policyClass, role flags) rather than deep traversals.

Policies are queryable data

Because each policy is just a JSON-LD node, you can query the policies themselves:

PREFIX f: <https://ns.flur.ee/db#>
PREFIX ex: <http://example.org/>

SELECT ?policy ?action ?onProperty
WHERE {
  ?policy a f:AccessPolicy ;
          a ex:CorpPolicy ;
          f:action ?action ;
          f:onProperty ?onProperty .
}

History queries against the same shape produce a complete audit trail of policy changes over time. See Time travel for query-at-t syntax.

Policy in Queries

Query-time enforcement uses Fluree’s policy model to filter individual flakes during query execution. The query plan is the same regardless of policy — what changes is which flakes the engine returns. The application sees a query result; the policy filtering is invisible.

This page documents how query-time enforcement works, how patterns interact with the plan, and how to test policies from the CLI. For the policy node shape and combining algorithm, see the policy model reference. For the underlying concept, see Policy enforcement.

How query-time filtering works

When a query is executed against a PolicyContext:

  1. The engine resolves the request’s policy set: identity-driven f:policyClass lookups + any inline opts.policy array.
  2. The plan executes normally — same join order, same indices.
  3. Each flake the plan would emit is checked against the policies whose target matches it (f:onProperty, f:onClass, f:onSubject, or default for untargeted policies).
  4. A flake survives only if the combining algorithm approves it.
  5. Surviving flakes flow through the rest of the plan (joins, filters, aggregates) as normal.

Filtering is at the flake level — a single subject can appear in the result with some properties visible and others elided.

Worked example

Two users in a mydb:main ledger:

fluree insert '{
  "@context": {"schema": "http://schema.org/", "ex": "http://example.org/"},
  "@graph": [
    {"@id": "ex:alice", "schema:name": "Alice", "ex:role": "engineer", "ex:salary": 130000},
    {"@id": "ex:bob",   "schema:name": "Bob",   "ex:role": "manager",  "ex:salary": 155000}
  ]
}'

A required policy that hides ex:salary unless the requester is a manager:

fluree insert '{
  "@context": {"f": "https://ns.flur.ee/db#", "ex": "http://example.org/"},
  "@graph": [
    {
      "@id": "ex:salary-restriction",
      "@type": ["f:AccessPolicy", "ex:CorpPolicy"],
      "f:required": true,
      "f:onProperty": [{"@id": "ex:salary"}],
      "f:action": [{"@id": "f:view"}],
      "f:query": "{\"where\": {\"@id\": \"?$identity\", \"http://example.org/role\": \"manager\"}}"
    },
    {
      "@id": "ex:default-view",
      "@type": ["f:AccessPolicy", "ex:CorpPolicy"],
      "f:action": [{"@id": "f:view"}],
      "f:allow": true
    },
    {"@id": "ex:aliceIdentity", "f:policyClass": [{"@id": "ex:CorpPolicy"}], "ex:role": "engineer"},
    {"@id": "ex:bobIdentity",   "f:policyClass": [{"@id": "ex:CorpPolicy"}], "ex:role": "manager"}
  ]
}'

The same query, executed as different identities:

# As Bob (manager) — sees salaries
fluree query --as ex:bobIdentity --policy-class ex:CorpPolicy \
  'SELECT ?name ?salary WHERE { ?p <http://schema.org/name> ?name ; <http://example.org/salary> ?salary }'
# → Alice 130000, Bob 155000

# As Alice (engineer) — salary flakes filtered out
fluree query --as ex:aliceIdentity --policy-class ex:CorpPolicy \
  'SELECT ?name ?salary WHERE { ?p <http://schema.org/name> ?name ; <http://example.org/salary> ?salary }'
# → no results: the join requires ?salary which is filtered for Alice

To get Alice’s name back without the salary join, use OPTIONAL:

SELECT ?name ?salary WHERE {
  ?p <http://schema.org/name> ?name .
  OPTIONAL { ?p <http://example.org/salary> ?salary }
}

Now Alice sees both names, with ?salary unbound — exactly the behavior an application expects when a property is suppressed by policy.

Targeting patterns

Property-level (f:onProperty)

Restricts a flake whose predicate matches:

{
  "@id": "ex:hide-ssn",
  "@type": ["f:AccessPolicy", "ex:CorpPolicy"],
  "f:required": true,
  "f:onProperty": [{"@id": "http://schema.org/ssn"}],
  "f:action": [{"@id": "f:view"}],
  "f:query": "{\"where\": {\"@id\": \"?$identity\", \"http://example.org/role\": \"hr\"}}"
}

Flakes whose predicate is not schema:ssn are unaffected by this policy.

Class-level (f:onClass)

Restricts flakes whose subject has one of the listed rdf:types:

{
  "@id": "ex:employee-data-only",
  "@type": ["f:AccessPolicy", "ex:CorpPolicy"],
  "f:required": true,
  "f:onClass": [{"@id": "http://example.org/Employee"}],
  "f:action": [{"@id": "f:view"}],
  "f:query": "{\"where\": {\"@id\": \"?$identity\", \"@type\": \"http://example.org/Employee\"}}"
}

Flakes about non-Employee subjects fall through to other policies.

Subject-level (f:onSubject)

Restricts flakes about specific subjects:

{
  "@id": "ex:hide-internal-doc",
  "@type": ["f:AccessPolicy", "ex:CorpPolicy"],
  "f:required": true,
  "f:onSubject": [{"@id": "http://example.org/secret-doc"}],
  "f:action": [{"@id": "f:view"}],
  "f:allow": false
}

Default (no targeting)

A policy with no f:onProperty / f:onClass / f:onSubject applies to every flake. Use sparingly — default policies are evaluated against every emitted flake, which is more expensive than targeted policies.

SPARQL queries

SPARQL queries have no opts block, so policy is delivered via headers:

curl -X POST 'http://localhost:8090/v1/fluree/query?ledger=mydb:main' \
  -H 'Content-Type: application/sparql-query' \
  -H "Authorization: Bearer $JWT" \
  -H 'fluree-identity: ex:aliceIdentity' \
  -H 'fluree-policy-class: ex:CorpPolicy' \
  -H 'fluree-default-allow: false' \
  -d 'SELECT ?name WHERE { ?p <http://schema.org/name> ?name }'

The full header set is documented in the policy model.

JSON-LD queries

JSON-LD queries put policy in opts:

{
  "from": "mydb:main",
  "select": ["?name", "?salary"],
  "where": [
    {"@id": "?p", "schema:name": "?name"},
    ["optional", {"@id": "?p", "ex:salary": "?salary"}]
  ],
  "opts": {
    "identity": "ex:aliceIdentity",
    "policy-class": ["ex:CorpPolicy"],
    "default-allow": false
  }
}

Inline policies, additional policy-values, and multiple policy-class entries all live under opts. The full vocabulary is in the policy model reference.

Multi-graph queries

Policies apply per-flake, regardless of which named graph the flake came from. A query that pulls from multiple from-named graphs sees a uniformly filtered result — there’s no per-graph policy override.

If different graphs need different policy regimes, use targeted policies (f:onClass for type-scoped restrictions, f:onSubject for explicit subject lists). For wholly separate access regimes, use separate ledgers.

Time-travel queries

Policy evaluation honors the query’s t. When you query --at a past t:

  • The policy set itself is resolved at that t (so retired policies still apply when you time-travel back to when they were live).
  • Identity attributes used in f:query are evaluated at that t.

This makes audit-style queries — “What could Alice see on 2024-06-15?” — directly expressible:

fluree query --as ex:aliceIdentity --policy-class ex:CorpPolicy --at 2024-06-15T00:00:00Z \
  'SELECT ?p ?o WHERE { <http://example.org/financial-report> ?p ?o }'

Performance considerations

Two phases: load the policy set once per request; apply it to each touched flake.

  • Target policies whenever possible. A policy with f:onProperty only runs against flakes whose predicate matches. Default policies (no targeting) run against every flake.
  • Keep f:query cheap. It runs once per flake-target. Lean on identity-side properties already loaded (@type, f:policyClass, role flags) rather than deep traversals.
  • Avoid deep recursion in f:query. Each level of indirection multiplies the per-flake cost.
  • Required policies short-circuit. If a required policy denies, no further required policies are checked for that flake.

For complex deployments, the explain plan shows whether a query is dominated by policy filtering and which policies contribute.

Testing policies from the CLI

The fluree CLI supports policy-enforced queries so you can verify that the policies you’ve configured filter results as expected — without writing any client code.

Flags

Available on fluree query (and on fluree insert, upsert, update for write-time enforcement):

| Flag | Purpose |
|---|---|
| --as <IRI> | Execute as this identity. Resolves f:policyClass on the identity subject to collect applicable policies, and binds ?$identity. |
| --policy-class <IRI> | Apply stored policies of the given class IRI. Repeatable. Narrows to the intersection with the identity's policies, or applies directly without --as. |
| --default-allow | Allow when no matching policy exists for the operation. Defaults to false (deny-by-default). |

Workflow

  1. Transact your policy rules (and the identities with their f:policyClass assignments) into the ledger, using any of the normal insert / upsert / update commands.
  2. Re-run the same query as different identities to confirm results differ as the policies prescribe:
# Full result set (no policy enforcement)
fluree query 'SELECT ?name ?salary WHERE { ?p <http://schema.org/name> ?name ; <http://example.org/salary> ?salary }'

# As an HR user — should see all salaries
fluree query --as ex:hrIdentity --policy-class ex:CorpPolicy \
  'SELECT ?name ?salary WHERE { ?p <http://schema.org/name> ?name ; <http://example.org/salary> ?salary }'

# As a regular employee — policies should hide salary field
fluree query --as ex:engineerIdentity --policy-class ex:CorpPolicy \
  'SELECT ?name ?salary WHERE { ?p <http://schema.org/name> ?name ; <http://example.org/salary> ?salary }'

Local vs remote

The flags work in both modes:

  • Local (default, or with --direct): the CLI loads the ledger directly and applies policy via the in-process query engine.
  • Remote (with --remote <name>, or auto-routed through a running local server): the CLI sends the flags to the server as HTTP headers (fluree-identity, fluree-policy-class, fluree-default-allow) and, for JSON-LD bodies, also injects them into opts. Multi-value --policy-class rides through the body opts only; SPARQL transport is single-valued via the header.

Remote impersonation: how it’s authorized

When you run against a remote server with --as <iri>, the server treats the request as impersonation and gates it as follows:

  1. Your bearer token’s identity is resolved on the target ledger.
  2. If that identity has no f:policyClass assignments (the FoundNoPolicies outcome — your service account is unrestricted on this ledger), the server honors --as and runs the query as the target identity.
  3. If your bearer identity is itself policy-constrained (FoundWithPolicies) or unknown to this ledger (NotFound), the server force-overrides --as with your bearer identity. You see your own filtered view, not the target’s.

Each successful impersonation is logged at info level on the server:

policy impersonation: bearer=<svc-id> target=<as-iri> ledger=<name>

This is the standard service-account pattern: register your CLI/app-server identity in the ledger with no f:policyClass, and it gains the right to delegate to any end-user identity for testing or per-request enforcement. Assigning a policy class to that identity revokes the delegation right with no config change.

Limitations

  • Inline policy rules (opts.policy) and policy variable bindings (opts.policy-values) are not yet exposed as CLI flags — use a JSON-LD query body with an "opts" block when you need those.
  • For SPARQL queries against a remote, only --as, single-value --policy-class, and --default-allow are wired (via headers). Multi-value --policy-class works on JSON-LD only.
  • Proxy-mode servers fall back to the legacy non-impersonation behavior — the upstream server performs the impersonation check.

Policy in Transactions

Transaction-time enforcement uses the same policy model as queries, switched on by f:action: f:modify. Where query-time enforcement filters flakes from results, transaction-time enforcement rejects the transaction when a write would touch flakes the identity isn’t allowed to modify.

This page documents how write-time enforcement integrates with the transaction lifecycle, the failure shape, and the patterns that come up most often. For the policy node shape and combining algorithm, see the policy model reference. For the conceptual frame, see Policy enforcement.

How transaction-time enforcement works

When a transaction is staged against a PolicyContext:

  1. The engine resolves the request’s policy set: identity-driven f:policyClass lookups + any inline opts.policy array, restricted to policies whose f:action includes f:modify.
  2. The transaction is staged into novelty (assertions and retractions are computed from insert / delete / where clauses).
  3. Each staged flake is checked against the matching policies.
  4. If any required policy denies a flake, or no policy allows a flake where the default is deny, the entire transaction is rejected. Transactions are atomic — a partial write is never persisted.
  5. On rejection, the response carries the policy’s f:exMessage (when supplied), the offending flake, and the policy’s @id.

The result: the requester gets a clear authorization failure rather than a silently incomplete write.

Worked example

fluree insert '{
  "@context": {"f": "https://ns.flur.ee/db#", "ex": "http://example.org/"},
  "@graph": [
    {
      "@id": "ex:email-restriction",
      "@type": ["f:AccessPolicy", "ex:CorpPolicy"],
      "f:required": true,
      "f:onProperty": [{"@id": "http://schema.org/email"}],
      "f:action": [{"@id": "f:modify"}],
      "f:exMessage": "Users can only update their own email.",
      "f:query": "{\"where\": {\"@id\": \"?$identity\", \"http://example.org/user\": {\"@id\": \"?$this\"}}}"
    },
    {
      "@id": "ex:default-rw",
      "@type": ["f:AccessPolicy", "ex:CorpPolicy"],
      "f:action": [{"@id": "f:view"}, {"@id": "f:modify"}],
      "f:allow": true
    },
    {"@id": "ex:johnIdentity",  "ex:user": {"@id": "ex:john"},  "f:policyClass": [{"@id": "ex:CorpPolicy"}]},
    {"@id": "ex:janeIdentity",  "ex:user": {"@id": "ex:jane"},  "f:policyClass": [{"@id": "ex:CorpPolicy"}]}
  ]
}'

Now John attempts to update his own email — succeeds:

fluree update --as ex:johnIdentity --policy-class ex:CorpPolicy '
  PREFIX ex: <http://example.org/>
  PREFIX schema: <http://schema.org/>
  WHERE  { ex:john schema:email ?email }
  DELETE { ex:john schema:email ?email }
  INSERT { ex:john schema:email "new-john@flur.ee" }
'

John attempts to update Jane’s email — rejected:

fluree update --as ex:johnIdentity --policy-class ex:CorpPolicy '
  PREFIX ex: <http://example.org/>
  PREFIX schema: <http://schema.org/>
  WHERE  { ex:jane schema:email ?email }
  DELETE { ex:jane schema:email ?email }
  INSERT { ex:jane schema:email "hacked@flur.ee" }
'
# Error: policy denied: Users can only update their own email. (ex:email-restriction)

What gets enforced

Every modification path runs the same f:modify policy check on its staged flakes:

| Operation | Flakes checked |
|---|---|
| Insert | All asserted flakes. |
| Upsert | Asserted flakes + retractions for any pre-existing values being replaced. |
| Update (WHERE/DELETE/INSERT) | Both retracted flakes (DELETE) and asserted flakes (INSERT). |
| Retraction (@type: f:Retraction) | Retracted flakes. |

Crucially, the policy is checked against the flakes, not the operation type. A transaction that retracts a flake the identity can’t modify is rejected just like an insert that asserts one.
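
A minimal sketch of that flake-level decision, with staged flakes reduced to (property, is-assertion) pairs. The shapes are hypothetical — the real engine evaluates full policy sets per flake — but the point stands: the assertion/retraction direction plays no role in the check.

```rust
/// Every staged flake — asserted *or* retracted — must pass the same
/// f:modify check. `staged` is a list of (property IRI, is_assertion).
fn transaction_allowed(staged: &[(&str, bool)], protected: &[&str], can_modify: bool) -> bool {
    // The direction flag is deliberately ignored: policy applies to flakes,
    // not to the operation type that produced them.
    staged
        .iter()
        .all(|(prop, _is_assertion)| !protected.contains(prop) || can_modify)
}
```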

Targeting patterns

Whitelist a property to a role

{
  "@id": "ex:salary-write",
  "@type": ["f:AccessPolicy", "ex:CorpPolicy"],
  "f:required": true,
  "f:onProperty": [{"@id": "http://example.org/salary"}],
  "f:action": [{"@id": "f:modify"}],
  "f:exMessage": "Only HR may write salary.",
  "f:query": "{\"where\": {\"@id\": \"?$identity\", \"http://example.org/role\": \"hr\"}}"
}

Combined with default-allow: true (or a permissive default f:modify policy), every other property remains writable.

Owner-only edits

{
  "@id": "ex:owner-edit",
  "@type": ["f:AccessPolicy", "ex:CorpPolicy"],
  "f:required": true,
  "f:action": [{"@id": "f:modify"}],
  "f:query": "{\"where\": {\"@id\": \"?$identity\", \"http://example.org/user\": {\"@id\": \"?$user\"}}, \"$where\": {\"@id\": \"?$this\", \"http://example.org/owner\": {\"@id\": \"?$user\"}}}"
}

The f:query resolves the identity’s user and verifies that ?$this (the entity being modified) has that user as its owner.

Status-based gates

Prevent edits to records past a workflow gate:

{
  "@id": "ex:no-edit-after-approval",
  "@type": ["f:AccessPolicy", "ex:CorpPolicy"],
  "f:required": true,
  "f:onClass": [{"@id": "http://example.org/Order"}],
  "f:action": [{"@id": "f:modify"}],
  "f:exMessage": "Approved orders cannot be modified.",
  "f:query": "{\"where\": [{\"@id\": \"?$this\", \"http://example.org/status\": \"?status\"}, [\"filter\", \"(!= ?status \\\"approved\\\")\"]]}"
}

Approved orders fail the gate — their flakes can’t be retracted or modified.

Workflow service exception

Combine targeting + identity-typed checks to limit a write to a single service:

{
  "@id": "ex:approved-by-workflow-only",
  "@type": ["f:AccessPolicy", "ex:CorpPolicy"],
  "f:required": true,
  "f:onProperty": [{"@id": "http://example.org/approved"}],
  "f:action": [{"@id": "f:modify"}],
  "f:exMessage": "ex:approved is set by the workflow service only.",
  "f:query": "{\"where\": {\"@id\": \"?$identity\", \"@type\": \"http://example.org/WorkflowService\"}}"
}

End-user identities can read ex:approved, but only the workflow service can write it.

Immutable records

{
  "@id": "ex:audit-log-immutable",
  "@type": ["f:AccessPolicy", "ex:CorpPolicy"],
  "f:required": true,
  "f:onClass": [{"@id": "http://example.org/AuditEvent"}],
  "f:action": [{"@id": "f:modify"}],
  "f:exMessage": "Audit events are immutable.",
  "f:allow": false
}

Notice the absence of f:query: f:allow: false is a flat deny, applied to every modification of ex:AuditEvent instances. New events can still be inserted because the policy targets only existing-instance flakes; a fresh @type: ex:AuditEvent insertion creates a new subject and a new rdf:type flake, neither of which the targeting matches.

(For a hard “append-only” guarantee that forbids anything but new insertions, model the constraint with a SHACL shape that requires the property to be unset on prior commits — SHACL is a better fit for that pattern than policy.)

Failure shape

When a transaction is rejected, the API returns:

{
  "error": "policy_denied",
  "message": "Users can only update their own email.",
  "policy": "http://example.org/email-restriction",
  "subject": "http://example.org/jane",
  "property": "http://schema.org/email"
}

f:exMessage is the user-visible string. The policy @id, the offending subject, and the property are reported for diagnostics.

When no f:exMessage is set, a generic message is returned ("policy denied"); the structured fields are still present so a client can surface the right error to a user.

WHERE/DELETE/INSERT semantics with policy

A WHERE/DELETE/INSERT transaction proceeds in three phases — match → retract → assert. Policy enforcement is on the staged flakes from phases 2 and 3:

PREFIX ex:     <http://example.org/>
PREFIX schema: <http://schema.org/>

WHERE  { ?u schema:email ?old . FILTER(?u = ex:jane) }
DELETE { ?u schema:email ?old }
INSERT { ?u schema:email "new@flur.ee" }

When run by an identity that lacks modify rights on ?u’s email:

  • The WHERE pattern still binds normally — policy doesn’t filter the match phase.
  • The DELETE retraction stages a flake the identity can’t modify — rejected.

To prevent accidental no-op rejections (the WHERE matches but the DELETE/INSERT can’t proceed), pair transaction-time f:modify policies with the same shape f:view policies, so the WHERE itself sees a filtered view.

Signed transactions and impersonation

When a transaction is signed (JWS or VC-wrapped), the signing key’s identity replaces the bearer identity for policy purposes. The signed credential becomes the source of truth: the server verifies the signature, resolves the signer’s identity entity, and applies that identity’s f:policyClass policies.

For the impersonation rules — when --as <iri> is honored vs force-overridden — see Policy in queries → Remote impersonation. The same gate applies to transactions.

See Signed / credentialed transactions for the wire format.

Provenance

Every committed transaction carries the asserting identity in its commit metadata. Combined with policy enforcement, this gives a clean audit trail:

  • The identity is recorded on the commit.
  • The policies in effect at commit time are themselves time-travelable.
  • Replay-from-commit produces the same policy decisions.

Performance considerations

  • Stage cost dominates. Most of the work is staging the transaction (computing assertions/retractions, building the novelty layer). Policy checks add a small per-flake cost on top.
  • Required policies short-circuit. A failure rejects the transaction immediately without checking remaining flakes.
  • Batch transactions amortize loading. Loading the policy set is per-transaction, not per-flake — large batched transactions pay the load cost once.
  • Cache identity properties. The identity’s @type, f:policyClass, and any role tags used in f:query are loaded once per transaction.

Testing policies from the CLI

The same --as, --policy-class, and --default-allow flags used on fluree query are available on fluree insert, fluree upsert, and fluree update so you can verify write-time enforcement without any client code:

# Attempt a write as an identity that lacks the f:modify policy — expect failure
fluree insert --as ex:readOnlyIdentity --policy-class ex:CorpPolicy -f new-data.ttl

# Same write as an authorized identity — expect success
fluree insert --as ex:writerIdentity --policy-class ex:CorpPolicy -f new-data.ttl

The flags work locally and against remote servers. On remote, the CLI sends the policy options as HTTP headers (fluree-identity, fluree-policy-class, fluree-default-allow) and, for JSON-LD bodies, also injects them into opts. The server applies the root-impersonation gate: your bearer identity may delegate to --as <iri> only when the bearer identity itself has no f:policyClass on the target ledger. Restricted bearers have --as force-overridden back to their own identity, and their writes are limited to what their own policies permit.

This is the standard service-account pattern — see Policy in queries → Remote impersonation for the full authorization rules and audit-log format.

Transaction enforcement is end-to-end

Unsigned bearer-authenticated transactions build a PolicyContext from the (post-header-merge) opts and route through the policy-enforcing transact_tracked_with_policy path. A non-root bearer’s f:modify constraints apply to their writes, matching the long-standing query-side behavior. SPARQL UPDATE inherits the same enforcement, with identity sourced from either the bearer or the fluree-identity header (impersonation-gated).

Programmatic Policy API (Rust)

This guide covers how to use Fluree’s policy system programmatically in Rust applications.

Overview

There are two main approaches to applying policies programmatically:

  1. Identity-based policies (wrap_identity_policy_view): Policies stored in the database and loaded via f:policyClass on an identity subject
  2. Inline policies (wrap_policy_view with opts.policy): Policies provided directly in the query/transaction options

Identity-Based Policy Lookup

The recommended approach for production systems. Policies are stored in the ledger and loaded dynamically based on the identity’s f:policyClass property.

Storing Policies in the Database

First, insert policies with types that will be referenced by identities:

#![allow(unused)]
fn main() {
let policies = json!({
    "@context": {
        "f": "https://ns.flur.ee/db#",
        "ex": "http://example.org/ns/",
        "schema": "http://schema.org/"
    },
    "@graph": [
        // Identity with policy class assignment
        {
            "@id": "http://example.org/identity/alice",
            "f:policyClass": [{"@id": "ex:EmployeePolicy"}],
            "ex:user": {"@id": "ex:alice"}
        },

        // SSN restriction policy - only see your own SSN
        {
            "@id": "ex:ssnRestriction",
            "@type": ["f:AccessPolicy", "ex:EmployeePolicy"],
            "f:required": true,
            "f:onProperty": [{"@id": "schema:ssn"}],
            "f:action": {"@id": "f:view"},
            "f:query": serde_json::to_string(&json!({
                "where": {
                    "@id": "?$identity",
                    "http://example.org/ns/user": {"@id": "?$this"}
                }
            })).unwrap()
        },

        // Default allow policy for other properties
        {
            "@id": "ex:defaultAllowView",
            "@type": ["f:AccessPolicy", "ex:EmployeePolicy"],
            "f:action": {"@id": "f:view"},
            "f:allow": true
        }
    ]
});

// Prefer the lazy Graph API for transactions
fluree.graph("mydb:main")
    .transact()
    .insert(&policies)
    .commit()
    .await?;
}

Using wrap_identity_policy_view

Create a policy-wrapped view using an identity IRI:

#![allow(unused)]
fn main() {
use fluree_db_api::{wrap_identity_policy_view, FlureeBuilder, GraphDb};

let fluree = FlureeBuilder::memory().build_memory();
let ledger = fluree.ledger("mydb:main").await?;

// Wrap the ledger with identity-based policy
let wrapped = wrap_identity_policy_view(
    &ledger,
    "http://example.org/identity/alice",  // identity IRI
    true  // default_allow: allow access when no policy matches
).await?;

// Check policy properties
assert!(!wrapped.is_root(), "Should not be root/unrestricted");

// Create a view with the policy applied, then query using the builder
let view = GraphDb::from_ledger_state(&ledger)
    .with_policy(std::sync::Arc::new(wrapped.policy().clone()));

let query = json!({
    "select": ["?s", "?ssn"],
    "where": {
        "@id": "?s",
        "@type": "ex:User",
        "schema:ssn": "?ssn"
    }
});

let result = view.query(&fluree)
    .jsonld(&query)
    .execute()
    .await?;
}

How Identity Lookup Works

When you call wrap_identity_policy_view:

  1. Fluree queries for policies via the identity’s f:policyClass:

    SELECT ?policy WHERE {
        <identity-iri> f:policyClass ?class .
        ?policy a ?class .
        ?policy a f:AccessPolicy .
    }
    
  2. Each matching policy’s properties are loaded (f:action, f:allow, f:query, f:onProperty, etc.)

  3. The ?$identity variable is automatically bound to the identity IRI for use in f:query policies

Inline Policies with policy-values

For cases where policies should not be stored in the database, use inline policies with explicit ?$identity binding.

QueryConnectionOptions Pattern

#![allow(unused)]
fn main() {
use fluree_db_api::{QueryConnectionOptions, wrap_policy_view};
use std::collections::HashMap;

let policy = json!([{
    "@id": "ex:inlineSsnPolicy",
    "f:required": true,
    "f:onProperty": [{"@id": "http://schema.org/ssn"}],
    "f:action": "f:view",
    "f:query": serde_json::to_string(&json!({
        "where": {
            "@id": "?$identity",
            "http://example.org/ns/user": {"@id": "?$this"}
        }
    })).unwrap()
}]);

let opts = QueryConnectionOptions {
    policy: Some(policy),
    policy_values: Some(HashMap::from([(
        "?$identity".to_string(),
        json!({"@id": "http://example.org/identity/alice"}),
    )])),
    default_allow: true,
    ..Default::default()
};

let wrapped = wrap_policy_view(&ledger, &opts).await?;
}

Using query_from with Inline Policy

For FROM-driven queries where policy options are embedded in the query body, use query_from():

#![allow(unused)]
fn main() {
let query = json!({
    "@context": {
        "ex": "http://example.org/ns/",
        "schema": "http://schema.org/"
    },
    "from": "mydb:main",
    "opts": {
        "default-allow": true,
        "policy": [{
            "@id": "inline-ssn-policy",
            "f:required": true,
            "f:onProperty": [{"@id": "http://schema.org/ssn"}],
            "f:action": "f:view",
            "f:query": serde_json::to_string(&json!({
                "where": {
                    "@id": "?$identity",
                    "http://example.org/ns/user": {"@id": "?$this"}
                }
            })).unwrap()
        }],
        "policy-values": {
            "?$identity": {"@id": "http://example.org/identity/alice"}
        }
    },
    "select": ["?s", "?ssn"],
    "where": {
        "@id": "?s",
        "@type": "ex:User",
        "schema:ssn": "?ssn"
    }
});

let result = fluree.query_from()
    .jsonld(&query)
    .execute()
    .await?;
}

Policy Options Precedence

When multiple policy options are provided, they follow this precedence:

| Priority | Option | Behavior |
|---|---|---|
| 1 (highest) | opts.identity | Query f:policyClass policies, auto-bind ?$identity |
| 2 | opts.policy_class | Query policies of specified types |
| 3 (lowest) | opts.policy | Use inline policy JSON directly |

Important: If opts.identity is set, inline opts.policy is ignored.
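
The precedence can be sketched as a simple first-match resolver. The types and function below are hypothetical illustrations, not the actual option-handling code:

```rust
/// Exactly one policy source is chosen per request.
#[derive(Debug, PartialEq)]
enum PolicySource {
    Identity(String),         // f:policyClass lookup via identity
    PolicyClass(Vec<String>), // explicit policy class IRIs
    Inline(String),           // raw inline policy JSON
    None,
}

fn resolve_policy_source(
    identity: Option<&str>,
    policy_class: Option<&[&str]>,
    inline_policy: Option<&str>,
) -> PolicySource {
    if let Some(id) = identity {
        // Highest priority: identity wins, inline policy is ignored.
        PolicySource::Identity(id.to_string())
    } else if let Some(classes) = policy_class {
        PolicySource::PolicyClass(classes.iter().map(|c| c.to_string()).collect())
    } else if let Some(p) = inline_policy {
        // Lowest priority: only used when nothing else is set.
        PolicySource::Inline(p.to_string())
    } else {
        PolicySource::None
    }
}
```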

Policy Structure Reference

f:allow (Static Allow/Deny)

{
    "@id": "ex:allowAll",
    "@type": ["f:AccessPolicy", "ex:MyPolicyClass"],
    "f:action": {"@id": "f:view"},
    "f:allow": true
}

f:query (Dynamic Evaluation)

{
    "@id": "ex:ownerOnly",
    "@type": ["f:AccessPolicy", "ex:MyPolicyClass"],
    "f:action": {"@id": "f:view"},
    "f:onProperty": [{"@id": "schema:ssn"}],
    "f:required": true,
    "f:query": "{\"where\": {\"@id\": \"?$identity\", \"ex:user\": {\"@id\": \"?$this\"}}}"
}

Policy Properties

| Property | Type | Description |
|---|---|---|
| f:action | f:view / f:modify | What action this policy applies to |
| f:allow | boolean | Static allow (true) or deny (false) |
| f:query | string (JSON) | Query that must return results for access to be granted |
| f:onProperty | IRI(s) | Restrict policy to specific properties |
| f:onSubject | IRI(s) | Restrict policy to specific subjects |
| f:onClass | IRI(s) | Restrict policy to instances of specific classes |
| f:required | boolean | If true, this policy MUST allow for access to be granted |
| f:exMessage | string | Custom error message when policy denies access |

Special Variables

| Variable | Binding |
|---|---|
| ?$identity | The identity IRI (from opts.identity or policy_values["?$identity"]) |
| ?$this | The subject being accessed (for property-level policies) |

Policy Combining Algorithm

When multiple policies match a flake, they are combined using Deny Overrides:

  1. If any matching policy explicitly denies (f:allow: false), access is denied
  2. If a targeted policy’s f:query returns false, access is denied (doesn’t fall through to Default policies)
  3. If any policy allows (f:allow: true or f:query returns true), access is granted
  4. If no policies match and default_allow is true → access is granted
  5. Otherwise, access is denied
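
The combining steps above can be sketched as a pure function. This is an illustrative model, not the engine's implementation: PolicyOutcome flattens one matching policy's evaluation into a static allow/deny or the boolean result of its f:query.

```rust
#[derive(Clone, Copy)]
enum PolicyOutcome {
    StaticAllow,
    StaticDeny,
    QueryResult { targeted: bool, passed: bool },
}

/// Deny Overrides: combine all matching policies into a single decision.
fn combine(matching: &[PolicyOutcome], default_allow: bool) -> bool {
    // Step 1: any explicit deny wins.
    if matching.iter().any(|p| matches!(p, PolicyOutcome::StaticDeny)) {
        return false;
    }
    // Step 2: a targeted policy whose query fails denies outright
    // (no fall-through to default policies).
    if matching.iter().any(|p| {
        matches!(p, PolicyOutcome::QueryResult { targeted: true, passed: false })
    }) {
        return false;
    }
    // Step 3: any allow grants access.
    if matching.iter().any(|p| {
        matches!(p, PolicyOutcome::StaticAllow)
            || matches!(p, PolicyOutcome::QueryResult { passed: true, .. })
    }) {
        return true;
    }
    // Steps 4–5: nothing matched — fall back to the default.
    default_allow
}
```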

Identity resolution is three-state: FoundWithPolicies (restrictions apply) → FoundNoPolicies (subject exists, no restrictions) → NotFound (subject absent, no restrictions). The three-state split determines whether a concrete identity SID is available to bind ?$identity in policy queries; it does not gate default_allow. An unknown identity with default_allow: true is granted access — this is the intended behavior for deployments where an application layer handles authorization and Fluree records signed transactions for provenance. Set default_allow: false for fail-closed behavior.

Important: Inline policies must use full IRIs (e.g., "http://schema.org/ssn"), not compact IRIs (e.g., "schema:ssn"). Compact IRIs in inline policies are not expanded.

Transactions with Policy

Policies can also be applied to transactions using the builder API:

#![allow(unused)]
fn main() {
use fluree_db_api::policy_builder;

let policy_ctx = policy_builder::build_policy_context_from_opts(
    &ledger.snapshot,
    ledger.novelty.as_ref(),
    Some(ledger.novelty.as_ref()),
    ledger.t(),
    &qc_opts,
    &[0], // default graph; use resolve_policy_source_g_ids() for config-driven graphs
).await?;

let txn = json!({
    "@context": {"ex": "http://example.org/ns/"},
    "insert": [
        {"@id": "ex:alice", "ex:data": "secret"}
    ]
});

// Use the transaction builder with policy
let result = fluree.graph("mydb:main")
    .transact()
    .update(&txn)
    .policy(policy_ctx)
    .commit()
    .await;

match result {
    Ok(txn_result) => println!("Transaction succeeded at t={}", txn_result.ledger.t()),
    Err(e) => println!("Policy denied: {}", e),
}
}

Historical Views with Policy

For time-travel queries with policy, load a historical graph and apply policy as a view overlay:

#![allow(unused)]
fn main() {
use fluree_db_api::{GraphDb, QueryConnectionOptions};

// Load a historical view
let graph = fluree.view_at_t("mydb:main", 100).await?;

// Apply policy to create a view; `ledger` and `opts` are the loaded
// ledger state and policy options from the earlier examples
let policy_ctx = policy_builder::build_policy_context_from_opts(
    &ledger.snapshot,
    ledger.novelty.as_ref(),
    Some(ledger.novelty.as_ref()),
    ledger.t(),
    &opts,
    &[0],
).await?;

let view = graph.with_policy(std::sync::Arc::new(policy_ctx));

// Query the historical view with policy applied
let result = view.query(&fluree)
    .jsonld(&query)
    .execute()
    .await?;
}

API Reference

wrap_identity_policy_view

#![allow(unused)]
fn main() {
pub async fn wrap_identity_policy_view<'a>(
    ledger: &'a LedgerState,
    identity_iri: &str,
    default_allow: bool,
) -> Result<PolicyWrappedView<'a>>
}

Creates a policy-wrapped view using identity-based f:policyClass lookup.

Parameters:

  • ledger: The ledger state to wrap
  • identity_iri: IRI of the identity subject (will query f:policyClass)
  • default_allow: Whether to allow access when no policies match. Ignored (forced false) if the identity IRI has no subject node in the ledger — see combining algorithm step 5

wrap_policy_view

#![allow(unused)]
fn main() {
pub async fn wrap_policy_view<'a>(
    ledger: &'a LedgerState,
    opts: &QueryConnectionOptions,
) -> Result<PolicyWrappedView<'a>>
}

Creates a policy-wrapped view from query connection options.

QueryConnectionOptions fields:

  • identity: Identity IRI for f:policyClass lookup
  • policy: Inline policy JSON
  • policy_class: Policy class IRIs to query
  • policy_values: Variable bindings for policy queries
  • default_allow: Default access when no policies match

PolicyWrappedView

#![allow(unused)]
fn main() {
impl PolicyWrappedView {
    /// Check if this is a root/unrestricted policy
    pub fn is_root(&self) -> bool;

    /// Get the underlying policy context
    pub fn policy(&self) -> &PolicyContext;

    /// Get the policy enforcer for query execution
    pub fn enforcer(&self) -> &Arc<QueryPolicyEnforcer>;
}
}

Best Practices

1. Prefer Identity-Based Policies

Store policies in the database for:

  • Version control with data
  • Audit trail of policy changes
  • Dynamic policy updates without code changes
  • Time-travel to historical policy states

2. Use HTTP IRIs for Identities

HTTP IRIs are more portable than DIDs for identity subjects:

#![allow(unused)]
fn main() {
// Recommended
let identity = "http://example.org/identity/alice";

// Also works but may have encoding issues
let identity = "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK";
}

3. Always Set default_allow Explicitly

#![allow(unused)]
fn main() {
// Be explicit about default behavior
let wrapped = wrap_identity_policy_view(&ledger, identity, false).await?;
//                                                          ^^^^^ explicit deny
}

4. Handle Policy Errors

#![allow(unused)]
fn main() {
let graph = GraphDb::from_ledger_state(&ledger)
    .with_policy(std::sync::Arc::new(policy_ctx));

match graph.query(&fluree).jsonld(&query).execute().await {
    Ok(result) => process_results(result),
    Err(ApiError::PolicyDenied { message, policy_id }) => {
        log::warn!("Access denied by {}: {}", policy_id, message);
        // Return empty or error to user
    }
    Err(e) => return Err(e),
}
}

Indexing and Search

Fluree provides powerful indexing and search capabilities beyond standard graph queries. This section covers background indexing, full-text search, and vector similarity search.

Index Types

Background Indexing

Core database indexing for query performance:

  • SPOT, POST, OPST, PSOT indexes
  • Automatic index maintenance
  • Indexing configuration
  • Performance tuning
  • Monitoring and metrics

Reindex API

Manual index rebuilding for recovery and maintenance:

  • Memory-bounded batched processing
  • Checkpointing for resumable operations
  • Progress monitoring with callbacks
  • Resume after interruption
  • Index configuration options

Inline Fulltext Search

Inline BM25-ranked text scoring. Two entry points, same query surface:

  • @fulltext datatype — per-value annotation (analogous to @vector), always English, zero config
  • f:fullTextDefaults config — declare properties + language once at the ledger level; supports 18 languages with Snowball stemming and per-graph overrides for multilingual setups
  • fulltext(?var, "query") scoring function in bind expressions (same for both paths)
  • Automatic per-(graph, property, language) fulltext arena construction during background indexing
  • Unified scoring across indexed and novelty documents
  • Works immediately (no-index fallback) with optimal performance after indexing

BM25 Full-Text Search

Dedicated full-text search indexes using BM25 ranking (for large-scale corpora):

  • Creating BM25 indexes via Rust API
  • Query-based field selection (indexing query defines what to index)
  • BM25 scoring with configurable k1/b parameters
  • Block-Max WAND for efficient top-k queries
  • Incremental index updates via property-dependency tracking
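
For reference, the k1/b knobs tune the standard BM25 term-score formula. The sketch below uses the common Okapi defaults (k1 = 1.2, b = 0.75) in its example values — an assumption for illustration, not necessarily Fluree's defaults:

```rust
/// BM25 score of one query term against one document.
/// tf: term frequency in the document; df: number of documents containing
/// the term; k1 controls term-frequency saturation; b controls length
/// normalization.
fn bm25_term_score(
    tf: f64,
    doc_len: f64,
    avg_doc_len: f64,
    n_docs: f64,
    df: f64,
    k1: f64,
    b: f64,
) -> f64 {
    // Smoothed inverse document frequency.
    let idf = ((n_docs - df + 0.5) / (df + 0.5) + 1.0).ln();
    // Saturating, length-normalized term frequency.
    let norm = tf * (k1 + 1.0) / (tf + k1 * (1.0 - b + b * doc_len / avg_doc_len));
    idf * norm
}
```

Raising k1 lets repeated terms keep adding score; raising b penalizes long documents more aggressively.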

Vector Search

Approximate nearest neighbor (ANN) search for embeddings:

  • Vector index configuration
  • Embedded HNSW indexes (in-process) or remote via dedicated search service
  • Embedding storage with @vector datatype (resolves to https://ns.flur.ee/db#embeddingVector)
  • Similarity queries via f:* syntax
  • Deployment modes (embedded / remote)
  • Use cases (semantic search, recommendations)

Geospatial

Geographic point data with native binary encoding:

  • geo:wktLiteral datatype support (OGC GeoSPARQL)
  • Automatic POINT geometry detection and optimization
  • Packed 60-bit lat/lng encoding (~0.3mm precision)
  • Foundation for proximity queries (latitude-band index scans)
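
The quantize-and-pack idea behind the binary point encoding can be illustrated with a 30-bit-per-coordinate split into a single u64. This split is an assumption for illustration only — Fluree's actual bit layout is not specified here:

```rust
const BITS: u32 = 30;

/// Quantize a lat/lng pair into [0, 2^30) per coordinate and pack both
/// into the low 60 bits of a u64.
fn pack_point(lat: f64, lng: f64) -> u64 {
    let scale = (1u64 << BITS) as f64;
    let lat_q = (((lat + 90.0) / 180.0) * scale).min(scale - 1.0).max(0.0) as u64;
    let lng_q = (((lng + 180.0) / 360.0) * scale).min(scale - 1.0).max(0.0) as u64;
    (lat_q << BITS) | lng_q
}

/// Recover the (lat, lng) cell corner from a packed value.
fn unpack_point(packed: u64) -> (f64, f64) {
    let scale = (1u64 << BITS) as f64;
    let mask = (1u64 << BITS) - 1;
    let lat = ((packed >> BITS) & mask) as f64 / scale * 180.0 - 90.0;
    let lng = (packed & mask) as f64 / scale * 360.0 - 180.0;
    (lat, lng)
}
```

Because latitude occupies the high bits, sorting packed values groups points into latitude bands — the property the proximity scans mentioned above rely on.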

Indexing Architecture

Fluree maintains multiple index types for different query patterns:

Core Indexes (automatic):

  • SPOT: Subject-Predicate-Object-Time
  • POST: Predicate-Object-Subject-Time
  • OPST: Object-Predicate-Subject-Time
  • PSOT: Predicate-Subject-Object-Time

Graph Source Indexes (explicit):

  • BM25: Full-text search indexes
  • Vector: Embedding similarity indexes
  • R2RML: Relational database views
  • Iceberg: Data lake integrations

Background Indexing

Core database indexing happens automatically:

Transaction → Commit → Background Indexer → Index Published

Process:

  1. Transaction committed (t assigned)
  2. Commit published to nameservice
  3. Background indexer detects new commit
  4. Indexes updated (SPOT, POST, OPST, PSOT)
  5. Index snapshot published

Novelty Layer:

  • Gap between latest commit and latest index
  • Queries combine indexed data + novelty
  • Monitored via commit_t - index_t

See Background Indexing for details.

Inline Fulltext Search

For small-to-medium corpora (up to hundreds of thousands of documents per predicate), inline fulltext search provides BM25-ranked scoring with zero configuration:

Annotate data:

{
  "@id": "ex:article-1",
  "ex:content": {
    "@value": "Rust is a systems programming language focused on safety",
    "@type": "@fulltext"
  }
}

Query with scoring:

{
  "select": ["?title", "?score"],
  "where": [
    { "@id": "?doc", "ex:content": "?content", "ex:title": "?title" },
    ["bind", "?score", "(fulltext ?content \"Rust programming\")"],
    ["filter", "(> ?score 0)"]
  ],
  "orderBy": [["desc", "?score"]],
  "limit": 10
}

See Inline Fulltext Search for details.

Full-Text Search (BM25 Graph Source)

For larger corpora (1M+ documents) with strict latency requirements, the BM25 graph source pipeline provides WAND-based top-k pruning, chunked posting lists, and incremental updates:

BM25 provides ranked full-text search:

Creating Index (Rust API):

#![allow(unused)]
fn main() {
use fluree_db_api::Bm25CreateConfig;
use serde_json::json;

let query = json!({
    "@context": { "schema": "http://schema.org/" },
    "where": [{ "@id": "?x", "@type": "schema:Product", "schema:name": "?name" }],
    "select": { "?x": ["@id", "schema:name", "schema:description"] }
});
let config = Bm25CreateConfig::new("products-search", "mydb:main", query);
let result = fluree.create_full_text_index(config).await?;
}

There are no HTTP endpoints for index management yet — indexes are managed via the Rust API.

Searching:

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "from": "mydb:main",
  "select": ["?product", "?score"],
  "where": [
    {
      "f:graphSource": "products-search:main",
      "f:searchText": "laptop computer",
      "f:searchLimit": 20,
      "f:searchResult": { "f:resultId": "?product", "f:resultScore": "?score" }
    }
  ],
  "orderBy": ["-?score"]
}

See BM25 for details.

Vector Search

Similarity search using vector embeddings via HNSW indexes (embedded or remote).

Important: Embeddings must be stored with the vector datatype (@type: "@vector", @type: "f:embeddingVector", or full IRI https://ns.flur.ee/db#embeddingVector) to preserve array structure.

Creating Index (Rust API):

#![allow(unused)]
fn main() {
let config = VectorCreateConfig::new(
    "products-vector", "mydb:main", query, "ex:embedding", 384
);
fluree.create_vector_index(config).await?;
}

Searching:

{
  "from": "mydb:main",
  "select": ["?product", "?score"],
  "where": [
    {
      "f:graphSource": "products-vector:main",
      "f:queryVector": [0.1, 0.2, ..., 0.9],
      "f:searchLimit": 10,
      "f:searchResult": {
        "f:resultId": "?product",
        "f:resultScore": "?score"
      }
    }
  ]
}

See Vector Search for details.

Index as Graph Sources

Search indexes are exposed as graph sources:

Graph Source Names:

  • products-search:main - BM25 index
  • products-vector:main - Vector index

Query Like Regular Ledgers:

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "from": "mydb:main",
  "select": ["?product", "?name", "?score"],
  "where": [
    {
      "f:graphSource": "products-search:main",
      "f:searchText": "laptop",
      "f:searchLimit": 20,
      "f:searchResult": { "f:resultId": "?product", "f:resultScore": "?score" }
    },
    { "@id": "?product", "schema:name": "?name" }
  ]
}

Combines structured data with search results via the f:graphSource pattern.

Index Management

Creating Indexes

BM25 and vector indexes are created via the Rust API. See BM25 and Vector Search for details.

Updating Indexes

BM25 indexes are not automatically updated when the source ledger changes. They must be explicitly synced:

#![allow(unused)]
fn main() {
// Incremental sync (detects changes since last watermark)
let result = fluree.sync_bm25_index("products-search:main").await?;

// Or use the Bm25MaintenanceWorker for automatic background syncing
}

The Bm25MaintenanceWorker can be configured to watch for ledger commits and sync automatically.

Deleting Indexes

#![allow(unused)]
fn main() {
let result = fluree.drop_full_text_index("products-search:main").await?;
}

Performance Characteristics

Inline Fulltext Search

  • Indexed throughput: ~625,000 docs/sec (50K paragraph-length docs in 80ms)
  • Novelty throughput: ~85,000 docs/sec (50K docs in ~600ms, no index required)
  • Indexed speedup: 7-7.5x faster than novelty-only
  • Scaling: Near-linear; ~625K docs within a 1-second query budget
  • Arena build: Adds minimal overhead to the normal binary index build
  • Index Build Time: O(n) for n documents
  • Top-k Query Time: Sub-linear via Block-Max WAND — skips posting list segments that cannot contribute to the top-k, with early termination. Falls back to O(total matching postings) when k approaches corpus size.
  • Space: ~2-3x document size
  • Updates: Incremental via property-dependency tracking, O(changed docs)

Vector Search

  • Flat scan (inline functions): O(n) brute-force, viable up to ~100K vectors with binary indexing; binary index provides ~6x speedup over novelty-only scans and ~25x for filtered queries
  • HNSW index: O(log n) approximate nearest neighbor, recommended for 100K+ vectors or strict latency requirements
  • Space: ~1.5x embedding size
  • Updates: Incremental, O(1) per vector
  • See Vector Search – Performance and Scaling for benchmark data and guidance on when to adopt HNSW

Combined Queries

Combine search with graph queries:

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "from": "mydb:main",
  "select": ["?product", "?category"],
  "where": [
    {
      "f:graphSource": "products-search:main",
      "f:searchText": "laptop",
      "f:searchLimit": 20,
      "f:searchResult": { "f:resultId": "?product" }
    },
    { "@id": "?product", "schema:category": "?category" }
  ]
}

The query optimizer handles joins between the search graph source and structured data efficiently.

Use Cases

E-commerce Product Search:

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "from": "products:main",
  "select": ["?product", "?score"],
  "where": [
    {
      "f:graphSource": "products-search:main",
      "f:searchText": "wireless headphones",
      "f:searchLimit": 20,
      "f:searchResult": { "f:resultId": "?product", "f:resultScore": "?score" }
    }
  ],
  "orderBy": ["-?score"]
}

Document Management:

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "from": "documents:main",
  "where": [
    {
      "f:graphSource": "documents-search:main",
      "f:searchText": "quarterly report 2024",
      "f:searchLimit": 20,
      "f:searchResult": { "f:resultId": "?doc" }
    },
    { "@id": "?doc", "ex:department": "finance" }
  ]
}

Vector Similarity

Semantic Search:

{
  "from": "articles:main",
  "values": [
    ["?queryVec"],
    [{"@value": [0.1, 0.2, 0.3], "@type": "https://ns.flur.ee/db#embeddingVector"}]
  ],
  "where": [
    {
      "f:graphSource": "articles-vector:main",
      "f:queryVector": "?queryVec",
      "f:searchLimit": 10,
      "f:searchResult": {
        "f:resultId": "?article",
        "f:resultScore": "?vecScore"
      }
    }
  ],
  "select": ["?article", "?vecScore"],
  "orderBy": [["desc", "?vecScore"]]
}

Recommendation Engine:

{
  "from": "products:main",
  "where": [
    {
      "@id": "ex:product-123",
      "ex:embedding": "?queryVec"
    },
    {
      "f:graphSource": "products-vector:main",
      "f:queryVector": "?queryVec",
      "f:searchLimit": 5,
      "f:searchResult": { "f:resultId": "?similar", "f:resultScore": "?vecScore" }
    }
  ],
  "select": ["?similar", "?vecScore"],
  "orderBy": [["desc", "?vecScore"]]
}

Hybrid Search

Combine text and vector search:

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "from": "products:main",
  "values": [
    ["?queryVec"],
    [{"@value": [0.1, 0.2, 0.3], "@type": "https://ns.flur.ee/db#embeddingVector"}]
  ],
  "where": [
    {
      "f:graphSource": "products-search:main",
      "f:searchText": "laptop",
      "f:searchLimit": 100,
      "f:searchResult": { "f:resultId": "?product", "f:resultScore": "?textScore" }
    },
    {
      "f:graphSource": "products-vector:main",
      "f:queryVector": "?queryVec",
      "f:searchLimit": 100,
      "f:searchResult": { "f:resultId": "?product", "f:resultScore": "?vecScore" }
    }
  ],
  "bind": {
    "?finalScore": "(?textScore * 0.6) + (?vecScore * 0.4)"
  },
  "orderBy": ["-?finalScore"]
}
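One practical caveat for weighted fusion like the bind above: BM25 text scores and vector similarity scores live on different scales, so normalizing each result set before weighting often gives more predictable rankings. A client-side post-processing sketch (generic code, not a Fluree API; the result shape and helper names are illustrative):

```javascript
// Min-max normalize a list of {id, score} results to the [0, 1] range.
function normalize(results) {
  const scores = results.map(r => r.score);
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  const span = (max - min) || 1; // uniform scores all normalize to 0
  return new Map(results.map(r => [r.id, (r.score - min) / span]));
}

// Weighted fusion of text and vector results (0.6 / 0.4, as in the query above).
// Ids missing from one result set contribute 0 for that component.
function hybridScores(textResults, vecResults, wText = 0.6, wVec = 0.4) {
  const t = normalize(textResults);
  const v = normalize(vecResults);
  const ids = new Set([...t.keys(), ...v.keys()]);
  return [...ids]
    .map(id => ({ id, score: wText * (t.get(id) ?? 0) + wVec * (v.get(id) ?? 0) }))
    .sort((a, b) => b.score - a.score);
}
```

Min-max normalization is one common choice; reciprocal rank fusion is another when raw scores are unreliable.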

Monitoring

Check BM25 Staleness

Check whether a BM25 index is behind its source ledger:

#![allow(unused)]
fn main() {
let check = fluree.check_bm25_staleness("products-search:main").await?;
println!("Index at t={}, ledger at t={}, stale: {}, lag: {}",
    check.index_t, check.ledger_t, check.is_stale, check.lag);
}

Background Maintenance

The Bm25MaintenanceWorker watches for source ledger commits and syncs indexes automatically:

  • Debounces rapid commits (configurable interval)
  • Runs sync operations with bounded concurrency
  • Registers/unregisters graph sources dynamically

Best Practices

1. Choose Appropriate Index Type

  • Structured queries: Use core graph indexes
  • Keyword search (< 500K docs): Use inline @fulltext for zero-config BM25 scoring
  • Keyword search (1M+ docs): Use the BM25 graph source for WAND-optimized top-k retrieval
  • Semantic similarity: Use vector search
  • Hybrid: Combine multiple indexes

2. Tune BM25 Parameters

Adjust k1 and b for your corpus:

#![allow(unused)]
fn main() {
let config = Bm25CreateConfig::new("search", "docs:main", query)
    .with_k1(1.5)  // Higher = more weight to term frequency (default: 1.2)
    .with_b(0.5);   // Lower = less document length normalization (default: 0.75)
}

The indexing query controls which properties are indexed — all selected text properties contribute to the document’s searchable content.
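To build intuition for these two knobs, here is the standard BM25 term-score formula as a self-contained sketch (generic BM25, not Fluree's internal scoring code):

```javascript
// Standard BM25 score contribution of one query term in one document.
//   idf   - inverse document frequency of the term
//   tf    - term frequency in the document
//   dl    - document length (tokens); avgdl - corpus average length
function bm25Term(idf, tf, dl, avgdl, k1 = 1.2, b = 0.75) {
  const lengthNorm = 1 - b + b * (dl / avgdl);            // b: length penalty strength
  return (idf * tf * (k1 + 1)) / (tf + k1 * lengthNorm);  // k1: tf saturation point
}

const defaultK1 = bm25Term(2.0, 5, 100, 100, 1.2);   // default k1
const higherK1 = bm25Term(2.0, 5, 100, 100, 2.0);    // repeated terms weigh more
const noNorm = bm25Term(2.0, 3, 200, 100, 1.2, 0);   // b = 0: length ignored
```

Raising k1 delays term-frequency saturation; lowering b reduces the penalty on long documents.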

3. Monitor Index Staleness

Check staleness after bulk operations:

#![allow(unused)]
fn main() {
let check = fluree.check_bm25_staleness("search:main").await?;
if check.is_stale {
    fluree.sync_bm25_index("search:main").await?;
}
}

4. Sync After Bulk Updates

BM25 indexes require explicit sync. After bulk inserts, sync once at the end:

#![allow(unused)]
fn main() {
// Insert many documents...
for batch in batches {
    fluree.insert(ledger.clone(), &batch).await?;
}
// Sync the BM25 index once after all inserts
fluree.sync_bm25_index("products-search:main").await?;
}

5. Use Appropriate Limits

Limit results for performance:

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "from": "docs:main",
  "where": [
    {
      "f:graphSource": "docs-search:main",
      "f:searchText": "search query",
      "f:searchLimit": 100,
      "f:searchResult": { "f:resultId": "?doc" }
    }
  ]
}

Background Indexing

Fluree maintains query-optimized indexes through a background indexing process. This document covers the indexing architecture, configuration, and monitoring.

Index Architecture

Fluree maintains four index permutations for efficient query execution:

SPOT (Subject-Predicate-Object-Time)

Organized by subject first:

ex:alice → schema:name → "Alice" → [t=1, t=5]
ex:alice → schema:age → 30 → [t=1]
ex:alice → schema:age → 31 → [t=10]

Optimized for: “Give me all properties of this subject”

POST (Predicate-Object-Subject-Time)

Organized by predicate first:

schema:name → "Alice" → ex:alice → [t=1, t=5]
schema:age → 30 → ex:alice → [t=1]
schema:age → 31 → ex:alice → [t=10]

Optimized for: “Find all subjects with this property/value”

OPST (Object-Predicate-Subject-Time)

Organized by object first:

"Alice" → schema:name → ex:alice → [t=1, t=5]
30 → schema:age → ex:alice → [t=1]
31 → schema:age → ex:alice → [t=10]

Optimized for: “Find subjects with this object value”

PSOT (Predicate-Subject-Object-Time)

Organized by predicate, then subject:

schema:name → ex:alice → "Alice" → [t=1, t=5]
schema:age → ex:alice → 30 → [t=1]
schema:age → ex:alice → 31 → [t=10]

Optimized for: “Get all values for this predicate”
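The four permutations map to query shapes in a predictable way: whichever pattern components are bound select the index whose sort order puts those components first. A simplified dispatch heuristic (illustrative only, not Fluree's actual query planner):

```javascript
// Pick an index permutation from which components of an (s, p, o)
// triple pattern are bound (non-null). Illustrative heuristic.
function pickIndex({ s = null, p = null, o = null }) {
  if (s != null) return 'SPOT';              // subject known: all its properties
  if (p != null && o != null) return 'POST'; // property + value: find subjects
  if (o != null) return 'OPST';              // only value known
  if (p != null) return 'PSOT';              // only property known
  return 'SPOT';                             // nothing bound: full scan
}
```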

Indexing Process

1. Transaction Commit

t=42: Transaction committed
  - Flakes written to append-only log
  - Commit metadata created
  - Commit published to nameservice (commit_t=42)

2. Indexer Detection

Background indexing is triggered when the ledger’s novelty exceeds the configured threshold (see Configuration below):

Indexer checks: commit_t=42, index_t=40
Indexer: Need to index t=41, t=42

3. Index Building

Background indexing builds a new index snapshot up to a specific to_t (typically the current commit_t when the job starts). During the job, new commits may arrive; those remain in novelty for the next cycle.

Incremental indexing (default path):
  - Load the existing index root (CAS CID) from nameservice
  - Resolve only commits with t in (index_t, to_t]
  - Merge resolved novelty into only the affected leaf blobs (Copy-on-Write)
  - Update dictionaries (forward packs + reverse trees)
  - Assemble a new root referencing mostly-unchanged CAS artifacts

Fallback:
  - If incremental indexing cannot safely proceed, fall back to a full rebuild

4. Index Publishing

When complete:

  - Upload new CAS blobs (leaves, branches, dict blobs) as needed
  - Upload the new index root (CAS CID)
  - Publish index_head_id to nameservice (atomic “commit point”)
  - Update index_t to to_t

Novelty Layer

The novelty layer consists of transactions committed but not yet indexed:

Current State:
  commit_t = 150
  index_t = 145
  novelty = [t=146, t=147, t=148, t=149, t=150]

Query Execution with Novelty

Queries combine indexed data with novelty:

Query for ex:alice's properties:

1. Check SPOT index (up to t=145)
2. Apply novelty layer (t=146 to t=150)
3. Combine results
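The three steps above can be sketched as a merge of indexed facts with the novelty overlay (a conceptual model with a simplified flake shape, not Fluree's internal representation):

```javascript
// Conceptual query-time novelty overlay. Each flake is { s, p, o, t, assert };
// assert=false retracts a previously asserted fact. Indexed facts cover
// t <= index_t; novelty covers index_t < t <= commit_t.
function currentFacts(indexedFacts, noveltyFlakes) {
  const key = f => `${f.s}|${f.p}|${f.o}`;
  const facts = new Map(indexedFacts.map(f => [key(f), f]));
  for (const f of [...noveltyFlakes].sort((a, b) => a.t - b.t)) {
    if (f.assert) facts.set(key(f), f);
    else facts.delete(key(f)); // retraction removes the fact
  }
  return [...facts.values()];
}
```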

Impact of Large Novelty

Small novelty (< 10 transactions):

  • Minimal query overhead
  • Fast query execution

Large novelty (> 100 transactions):

  • Significant query overhead
  • Slower query execution
  • Higher memory usage

Configuration

Background indexing is on by default. Indexing is triggered based on novelty size thresholds:

  • Enable/disable background indexing: --indexing-enabled / FLUREE_INDEXING_ENABLED (default true; disable only when a peer/indexer process owns this storage)
  • Trigger threshold (soft): --reindex-min-bytes / FLUREE_REINDEX_MIN_BYTES
  • Backpressure threshold (hard): --reindex-max-bytes / FLUREE_REINDEX_MAX_BYTES

See Operations: Configuration for the canonical flag/env/config-file reference.
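Conceptually, the two thresholds divide novelty growth into three regimes; a sketch of the decision logic (illustrative only; the exact comparison semantics and backpressure mechanism are server internals):

```javascript
// Below min: accumulate novelty, do nothing.
// At or above min (soft): schedule a background index build.
// At or above max (hard): additionally apply write backpressure.
function indexingDecision(noveltyBytes, minBytes, maxBytes) {
  return {
    triggerIndexing: noveltyBytes >= minBytes,
    applyBackpressure: noveltyBytes >= maxBytes,
  };
}
```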

Incremental parallelism (per ledger)

Within a single incremental indexing job, Fluree can update multiple (graph, index-order) branches concurrently. This is bounded by:

  • IndexerConfig.incremental_max_concurrency (default: 4)

This setting is part of the Rust IndexerConfig used by the indexer pipeline; it is not a server CLI flag. Increasing it can improve throughput on multi-graph ledgers and can run the four main index orders (SPOT/PSOT/POST/OPST) in parallel, at the cost of higher peak memory.

Monitoring

Check Index Status

curl http://localhost:8090/v1/fluree/info/mydb:main

Response:

{
  "ledger_id": "mydb:main",
  "branch": "main",
  "commit_t": 150,
  "index_t": 145,
  "commit_id": "bafy...headCommit",
  "index_id": "bafy...indexRoot"
}

Key Metrics:

  • index lag (txns): commit_t - index_t

For byte-level novelty size and indexing trigger decisions, see the indexing block returned by transaction and replication endpoints (e.g. POST /push/<ledger>), documented in API Endpoints.

Key Log Messages

At INFO, background indexing now emits coarse-grained progress logs that make it easier to distinguish:

  • request queued vs. worker started
  • current wait status while trigger_index() is blocked
  • incremental vs. rebuild path selection
  • commit-chain walking progress
  • commit resolution progress and phase completion

When background indexing is queued by an HTTP transaction request, the worker logs also include copied request_id and trace_id fields from the triggering request. This provides log-level correlation between the foreground request and the later background build without making the index build part of the original request trace.

At DEBUG, the same wait and commit-walk paths emit more frequent progress updates for incident debugging without changing behavior.

When you call indexing through the Rust API with trigger_index(), wait timeout is optional and should generally be chosen by the caller. Leave TriggerIndexOptions.timeout_ms unset to wait until completion, or set it explicitly for bounded environments such as Lambda jobs, HTTP gateways, or other workers with a fixed maximum runtime.

Health Indicators

Healthy:

index_lag: 0-10 transactions
index_rate > transaction_rate

Warning:

index_lag: 10-50 transactions
index_rate ≈ transaction_rate

Critical:

index_lag: > 50 transactions
index_rate < transaction_rate
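The lag thresholds above can be folded into a small classifier for monitoring scripts (helper name is mine):

```javascript
// Classify indexing health from lag (transactions behind),
// using the thresholds listed above.
function indexHealth(lagTxns) {
  if (lagTxns <= 10) return 'healthy';
  if (lagTxns <= 50) return 'warning';
  return 'critical';
}
```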

Performance Tuning

Optimize for Write-Heavy Loads

fluree-server \
  --indexing-enabled \
  --reindex-min-bytes 200000 \
  --reindex-max-bytes 2000000

Larger thresholds reduce indexing frequency (more novelty accumulation), trading some query-time overlay cost for reduced background indexing activity.

Optimize for Read-Heavy Loads

fluree-server \
  --indexing-enabled \
  --reindex-min-bytes 50000

Smaller reindex-min-bytes keeps novelty smaller (better query performance) at the cost of more frequent background indexing cycles.

Index Storage

Index Snapshots

Indexes are stored as immutable, content-addressed snapshots:

  - Leaf blobs (FLI3) and branch manifests (FBR3)
  - Dictionary blobs (forward packs, reverse tree leaves/branches)
  - An index root blob (FIR6) that references everything needed for queries

The nameservice stores the current index root CID (index_head_id) and its watermark (index_t). Peers fetch only the CAS objects they need on demand.

Index Retention

Old index snapshots are retained for time-travel safety and concurrent query safety. Cleanup is performed by the binary index garbage collector, governed by:

  • IndexerConfig.gc_max_old_indexes
  • IndexerConfig.gc_min_time_mins

No standalone HTTP compaction endpoint is currently exposed. Use POST /v1/fluree/reindex when you need to force a full index refresh.

Troubleshooting

High indexing lag

Symptom: commit_t - index_t grows continuously

Causes:

  • Transaction rate exceeds indexing capacity
  • Large transactions
  • Insufficient resources

Solutions:

  1. Reduce reindex-min-bytes so indexing triggers sooner
  2. Increase resources for the indexer (CPU/memory and storage throughput)
  3. Consider running a dedicated indexer process (separate from the transactor)
  4. For incremental indexing, consider increasing IndexerConfig.incremental_max_concurrency

Slow Indexing

Symptom: index_t advances slowly (or stops advancing)

Causes:

  • Disk I/O bottleneck
  • CPU bottleneck
  • Large index size
  • Storage backend latency

Solutions:

  1. Use faster storage (SSD)
  2. Increase CPU allocation
  3. Optimize transaction patterns
  4. Use local storage vs network storage

Index Corruption

Symptom: Query errors, unexpected results

Recovery: Use the Reindex API to rebuild indexes from scratch if you suspect corruption or need to change index structure parameters.

Best Practices

1. Monitor Novelty

setInterval(async () => {
  const status = await fetch('http://localhost:8090/v1/fluree/info/mydb:main')
    .then(r => r.json());

  const lag = status.commit_t - status.index_t;
  if (lag > 50) {
    console.warn(`High indexing lag: ${lag} transactions`);
  }
}, 30000);  // Check every 30 seconds

2. Tune for Workload

Match configuration to workload pattern:

  • Write-heavy: Larger reindex-min-bytes (fewer indexing cycles)
  • Read-heavy: Smaller reindex-min-bytes (less novelty overlay)
  • Balanced: Default settings

3. Capacity Planning

Estimate indexing capacity:

Transaction rate: 10 txn/second
Avg flakes per txn: 100
Total flakes: 1,000 flakes/second

Indexing capacity: 2,000 flakes/second (2× margin)
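The same estimate as a helper, with the 2× margin from the example as a default (function name is mine):

```javascript
// Flakes/second the indexer must sustain, with a safety margin
// over the steady-state flake production rate.
function requiredIndexingCapacity(txnPerSec, avgFlakesPerTxn, margin = 2) {
  return txnPerSec * avgFlakesPerTxn * margin;
}

requiredIndexingCapacity(10, 100); // 2000 flakes/second
```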

4. Alert on Lag

Set up alerting:

const lag = status.commit_t - status.index_t;
if (lag > 100) {
  alertOps('Critical: Indexing lag > 100 transactions');
}

5. Scheduled Reindex

Run a full reindex during off-peak hours when you need to rebuild indexes:

# Cron job
0 2 * * * curl -X POST http://localhost:8090/v1/fluree/reindex -H "Content-Type: application/json" -d '{"ledger":"mydb:main"}'

Reindex API

The Reindex API provides full rebuilds of ledger indexes from the commit chain. Use this when you need to rebuild indexes from scratch, such as after suspected corruption or index configuration changes.

Overview

Unlike background indexing which incrementally updates indexes as transactions commit, reindexing rebuilds the entire binary columnar index from the commit history.

Reindex publishes the new index root via publish_index_allow_equal, which means a reindex can produce a new index root CID even when index_t stays the same (same logical snapshot, different physical layout/config).

When to Reindex

Common Use Cases

  1. Index corruption - Query errors or unexpected results suggest corrupted indexes
  2. Configuration changes - Changing index parameters (leaf size, branch size)
  3. Storage backend changes - Moving a deployment between storage backends or adopting a new index strategy/type

Before You Reindex

Consider these factors:

  • Duration: Full reindex scales with ledger size; large ledgers may take hours
  • Resources: Ensure adequate memory and storage during the operation
  • Availability: Queries remain available during reindex, but may be slower
  • Backup: Be sure to back up data before major reindex operations

Rust API

The reindex API is exposed through the Fluree type in fluree-db-api. Fluree owns the storage backend, node cache, nameservice, and provides all ledger operations including queries, transactions, and admin functions like reindex.

Basic Reindex

#![allow(unused)]
fn main() {
use fluree_db_api::{FlureeBuilder, ReindexOptions, ReindexResult};

// Create Fluree instance
let fluree = FlureeBuilder::file("/path/to/data")
    .build()
    .await?;

// Reindex with default options
let result: ReindexResult = fluree.reindex("mydb:main", ReindexOptions::default()).await?;

println!("Reindexed to t={}", result.index_t);
println!("Root ID: {}", result.root_id);
}

Reindex with Custom Options

#![allow(unused)]
fn main() {
use fluree_db_api::{FlureeBuilder, ReindexOptions};
use fluree_db_indexer::IndexerConfig;

let fluree = FlureeBuilder::file("/path/to/data").build().await?;

let result = fluree.reindex("mydb:main", ReindexOptions::default()
    // Use custom index node sizes
    .with_indexer_config(IndexerConfig::large())
).await?;
}

ReindexOptions Reference

  • indexer_config (default: IndexerConfig::default()) - Controls output index structure (leaf/branch sizes, GC settings, memory budget)

indexer_config

Controls the output index structure and rebuild resources:

#![allow(unused)]
fn main() {
use fluree_db_indexer::IndexerConfig;

// For small datasets (< 100k flakes)
ReindexOptions::default()
    .with_indexer_config(IndexerConfig::small())

// For large datasets (> 10M flakes)
ReindexOptions::default()
    .with_indexer_config(IndexerConfig::large())

// Custom configuration
let config = IndexerConfig::default()
    .with_gc_max_old_indexes(10)       // Keep more old index versions
    .with_gc_min_time_mins(60)         // Retain for at least 60 minutes
    .with_run_budget_bytes(1 << 30)    // 1 GB memory budget for sort buffers
    .with_data_dir("/data/fluree");    // Directory for index artifacts

ReindexOptions::default()
    .with_indexer_config(config)
}

Key IndexerConfig fields:

  • leaf_target_bytes (default: 187,500) - Target bytes per leaf node
  • leaf_max_bytes (default: 375,000) - Maximum bytes per leaf node (triggers split)
  • branch_target_children (default: 100) - Target children per branch node
  • branch_max_children (default: 200) - Maximum children per branch node
  • gc_max_old_indexes (default: 5) - Old index versions to retain before GC
  • gc_min_time_mins (default: 30) - Minimum age (minutes) before an index can be GC’d
  • run_budget_bytes (default: 256 MB) - Memory budget for sort buffers (split across all sort orders)
  • data_dir (default: system temp dir) - Base directory for index artifacts
  • incremental_enabled (default: true) - Background indexing: attempt incremental updates before full rebuild
  • incremental_max_commits (default: 10,000) - Background indexing: max commit window for incremental indexing
  • incremental_max_concurrency (default: 4) - Background indexing: max concurrent (graph, order) branch updates

Note: Reindex is a full rebuild. The incremental_* fields are used by background indexing and are not relevant to the semantics of a reindex operation.

ReindexResult

The reindex operation returns:

#![allow(unused)]
fn main() {
pub struct ReindexResult {
    /// Ledger ID
    pub ledger_id: String,
    /// Transaction time the index was built to
    pub index_t: i64,
    /// ContentId of the new index root
    pub root_id: ContentId,
    /// Index build statistics
    pub stats: IndexStats,
}
}

Error Handling

Common Errors

#![allow(unused)]
fn main() {
use fluree_db_api::ApiError;

match fluree.reindex("mydb:main", opts).await {
    Ok(result) => println!("Success: t={}", result.index_t),
    Err(ApiError::NotFound(msg)) => {
        // Ledger doesn't exist or has no commits
        println!("Ledger not found: {}", msg);
    }
    Err(ApiError::ReindexConflict { expected, found }) => {
        // Ledger advanced during reindex (new commits arrived)
        println!("Conflict: expected t={}, found t={}", expected, found);
    }
    Err(e) => {
        // Storage, indexing, or other errors
        println!("Reindex failed: {}", e);
    }
}
}

How It Works

The reindex operation:

  1. Looks up the current ledger state and captures commit_t for conflict detection
  2. Cancels any active background indexing for the ledger
  3. Rebuilds a fresh binary columnar index from the full commit chain using rebuild_index_from_commits:
    • Phase A: Walks the commit DAG once, reading only the envelope header of each commit via byte-range requests (ContentStore::get_range). Returns the chronological CID list plus the genesis-most NsSplitMode in a single pass, so per-commit bandwidth on remote storage is ~128 KiB rather than the full commit blob.
    • Phase B: Resolves commits into batched chunks with chunk-local dictionaries (subjects, strings) and shared global dictionaries (predicates, datatypes, graphs, languages, numbigs, vectors). Commit blobs are pre-fetched concurrently (buffered(K), default K=3, env-tunable via FLUREE_REBUILD_FETCH_CONCURRENCY) so S3 round-trip latency overlaps with local decode cost.
    • Phase C: Merges per-chunk dictionaries into global dictionaries with remap tables
    • Phase D: Builds SPOT indexes from sorted commit files via k-way merge with graph-aware partitioning
    • Phase E: Builds secondary indexes (PSOT, POST, OPST) per-graph from partitioned run files
    • Phase F: Uploads dictionaries and index artifacts to CAS, creates IndexRoot (FIR6)
  4. Validates that no new commits arrived during the build (conflict detection)
  5. Publishes the new index root via publish_index_allow_equal
  6. Spawns async garbage collection to clean up old index versions

The rebuilt index preserves full time-travel history: retract-winner events and their preceding asserts are stored in Region 3 (history) of leaf nodes, enabling as-of queries at any past transaction time.

Best Practices

1. Schedule During Low-Traffic Periods

While queries continue to work during reindex, performance may be impacted. Schedule large reindex operations during maintenance windows when possible.

2. Tune Memory Budget for Large Ledgers

For ledgers with millions of flakes, increasing run_budget_bytes reduces the number of spill files and speeds up the merge phase:

#![allow(unused)]
fn main() {
let config = IndexerConfig::default()
    .with_run_budget_bytes(2 * 1024 * 1024 * 1024); // 2 GB
}

3. Tune Phase B Fetch Concurrency for Remote Storage

When reindexing from remote storage (S3) on latency-bound platforms like AWS Lambda, Phase B benefits from fetching several commit blobs in parallel so S3 round-trip latency (25–50 ms) overlaps with local decode cost.

# Default: 3. Increase for high-latency links; pin to 1 for strict serial behavior.
export FLUREE_REBUILD_FETCH_CONCURRENCY=4

In-flight memory is bounded by K × avg_commit_blob_size. For typical commits (< 1 MB) and K=3, the overhead is negligible against the run_budget_bytes pool. Pathologically large commits (hundreds of MB) should set K=1 to avoid transient memory spikes.
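The memory bound stated above is simple arithmetic; as a helper for sizing K (function name is mine):

```javascript
// In-flight prefetch memory for Phase B: K concurrent fetches × average
// commit blob size. Helps decide whether to raise K or pin it to 1.
function phaseBInflightBytes(k, avgCommitBlobBytes) {
  return k * avgCommitBlobBytes;
}

// Typical case from the text: K=3 with commits under 1 MiB keeps
// roughly 3 MiB in flight.
phaseBInflightBytes(3, 1024 * 1024);
```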

4. Verify After Reindex

After reindex, verify the results:

#![allow(unused)]
fn main() {
// Get ledger info to check state
let info = fluree.ledger_info(ledger_id).execute().await?;
println!("Index rebuilt to t={}", info["index"]["t"]);

// Run a sample query to verify correctness
let db = fluree_db_api::GraphDb::from_ledger_state(&ledger);
let query_result = fluree.query(&db, &sample_query).await?;
}

5. Concurrent Operations

During reindex:

  • Queries continue to work (using old index + novelty)
  • Transactions continue to work (writes to novelty)
  • Background indexing is paused for this ledger

Inline Fulltext Search

Inline fulltext search enables BM25-ranked text scoring directly in queries, using the @fulltext datatype (or a ledger-level f:fullTextDefaults config) and the fulltext() scoring function. This follows the same pattern as @vector and inline similarity functions: declare what to index, persist as normal commits, and query with a scoring function in bind expressions. No external services, no separate ingestion pipeline.

Two ways to enable fulltext scoring on a property:

  • Per-value annotation (@fulltext datatype) — zero-config, always English. Tag individual literal values at insert time. Good for a handful of obviously-fulltext fields where English is fine.
  • Property-level configuration (f:fullTextDefaults) — declare once in the ledger’s config graph which properties should be full-text indexed, and optionally which language to analyze them in. Plain-string values on those properties get indexed automatically — no @type annotation needed at insert time. Required when you want non-English stemming/stopwords, or when you want every value of a property indexed by default.

Both paths produce the same on-disk BM25 arenas and are queried with the same fulltext(?var, "query") function.

Use cases:

  • Document ranking: Score and rank articles, product descriptions, or knowledge base entries by keyword relevance
  • Content discovery: Find the most relevant documents for a natural language query
  • Faceted search: Combine fulltext scoring with graph pattern filters (e.g., score only documents in a specific category)
  • Multilingual catalogs: Index product descriptions in Spanish on one graph and English on another, with the right stemmer picked automatically per-language

The @fulltext Datatype

Why a dedicated datatype?

Plain strings in Fluree are stored as xsd:string values. They are indexed for exact matching and prefix queries, but not for full-text search. The @fulltext datatype tells Fluree that a string value should be analyzed (tokenized, stemmed, stopword-filtered) and indexed for relevance scoring.

@fulltext is a JSON-LD shorthand that resolves to the full IRI https://ns.flur.ee/db#fullText, which can also be written as f:fullText when the Fluree namespace prefix is declared in your @context.

Inserting fulltext values (JSON-LD)

Use "@type": "@fulltext" to annotate a string as fulltext-searchable:

{
  "@context": {
    "ex": "http://example.org/"
  },
  "@graph": [
    {
      "@id": "ex:article-1",
      "@type": "ex:Article",
      "ex:title": "Rust Programming",
      "ex:content": {
        "@value": "Rust is a systems programming language focused on safety and performance",
        "@type": "@fulltext"
      }
    }
  ]
}

You can also use the full IRI or f: prefix form:

{
  "@context": {
    "ex": "http://example.org/",
    "f": "https://ns.flur.ee/db#"
  },
  "@graph": [
    {
      "@id": "ex:article-1",
      "ex:content": {
        "@value": "Rust is a systems programming language...",
        "@type": "f:fullText"
      }
    }
  ]
}

Inserting fulltext values (Turtle / SPARQL UPDATE)

In Turtle and SPARQL UPDATE, the @fulltext shorthand is not available. Use the f:fullText datatype IRI with the standard ^^ typed-literal syntax.

Turtle data file:

@prefix ex: <http://example.org/> .
@prefix f: <https://ns.flur.ee/db#> .

ex:article-1
  a ex:Article ;
  ex:title "Introduction to Rust" ;
  ex:content "Rust is a systems programming language focused on safety and performance"^^f:fullText .

ex:article-2
  a ex:Article ;
  ex:title "Database Design Patterns" ;
  ex:content "Modern database systems use columnar storage and immutable ledgers"^^f:fullText .

SPARQL UPDATE:

PREFIX ex: <http://example.org/>
PREFIX f: <https://ns.flur.ee/db#>

INSERT DATA {
  ex:article-1 a ex:Article ;
    ex:title "Introduction to Rust" ;
    ex:content "Rust is a systems programming language focused on safety"^^f:fullText .
}

The ^^f:fullText annotation is the Turtle/SPARQL equivalent of "@type": "@fulltext" in JSON-LD. Without it, the string is stored as a plain xsd:string.

Multiple fulltext properties per entity

An entity can have @fulltext on multiple different properties:

{
  "@id": "ex:article-1",
  "ex:title": {
    "@value": "Rust Programming Guide",
    "@type": "@fulltext"
  },
  "ex:content": {
    "@value": "Rust is a systems programming language focused on safety...",
    "@type": "@fulltext"
  }
}

Each property produces an independent fulltext index (arena). When you query with fulltext(), the function automatically uses the arena for the property bound to the variable.

Portability

@fulltext annotations are fully portable across Fluree’s data distribution pipeline. Import, export, push, and pull all preserve @fulltext type annotations, and indexes are rebuilt transparently on the receiving side.

Configured Full-Text Properties (f:fullTextDefaults)

The @fulltext datatype is a per-value shortcut — you decide at insert time, one triple at a time, whether a string gets full-text indexed, and English is the only supported language. For many real-world workloads that’s not what you want. You want to say once, at the ledger level, “index every value of ex:title”, or “index ex:productName in the product catalog graph in Spanish.” That’s what f:fullTextDefaults gives you.

When a property is declared in f:fullTextDefaults, any plain xsd:string or rdf:langString value on that property gets full-text indexed — no @type: @fulltext needed on individual values. Language-tagged (rdf:langString) values automatically route to a per-language arena (French stemmer for "fr", Spanish stopwords for "es", and so on). Untagged plain strings fall back to the configured default language.

The @fulltext datatype continues to work exactly as before: any value tagged @fulltext is always indexed as English, regardless of what f:fullTextDefaults says about its property. You can mix both paths on the same property; English content from either path lands in a single shared arena.
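The routing rules in the last two paragraphs can be summarized as a small decision function (a conceptual model, not Fluree code; the value shape is illustrative):

```javascript
// Which language arena a value on a configured full-text property lands in:
// - values tagged with the @fulltext datatype are always indexed as English
// - rdf:langString values route to the arena for their language tag
// - untagged plain strings fall back to the configured default language
function arenaFor(value, defaultLanguage = 'en') {
  if (value.datatype === 'f:fullText') return 'en';
  if (value.language) return value.language;
  return defaultLanguage;
}
```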

When to use which

  • English-only, a few obviously-fulltext fields, want the choice per-value: @fulltext datatype
  • Non-English (or mixed languages): f:fullTextDefaults with f:defaultLanguage
  • Every value of a property should be searchable, no per-value opt-in: f:fullTextDefaults
  • Different languages per graph (e.g. multilingual catalog): f:fullTextDefaults with per-graph overrides
  • Zero config, just works: @fulltext datatype

Setting it up

Write configuration into the ledger’s #config named graph, alongside any other config groups (policy, SHACL, reasoning, etc.). The config is itself a transaction — it’s versioned and auditable like any other data.

Minimal — index ex:title and ex:body, English by default:

@prefix f: <https://ns.flur.ee/db#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex: <http://example.org/> .

GRAPH <urn:fluree:mydb:main#config> {
  <urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
    f:fullTextDefaults [
      a f:FullTextDefaults ;
      f:defaultLanguage "en" ;
      f:property [ a f:FullTextProperty ; f:target ex:title ] ,
                 [ a f:FullTextProperty ; f:target ex:body ]
    ] .
}

Or as JSON-LD:

{
  "@context": {
    "f": "https://ns.flur.ee/db#",
    "ex": "http://example.org/"
  },
  "@graph": [
    {
      "@id": "urn:fluree:mydb:main:config:ledger",
      "@type": "f:LedgerConfig",
      "@graph": "urn:fluree:mydb:main#config",
      "f:fullTextDefaults": {
        "@type": "f:FullTextDefaults",
        "f:defaultLanguage": "en",
        "f:property": [
          { "@type": "f:FullTextProperty", "f:target": { "@id": "ex:title" } },
          { "@type": "f:FullTextProperty", "f:target": { "@id": "ex:body" } }
        ]
      }
    }
  ]
}

HTTP / Docker: the same JSON-LD config goes into a regular /update transaction. Wrap it in @graph and POST to the ledger:

curl -X POST 'http://localhost:8090/v1/fluree/update?ledger=mydb:main' \
  -H 'Content-Type: application/json' \
  -d @- <<'JSON'
{
  "@context": {
    "f": "https://ns.flur.ee/db#",
    "ex": "http://example.org/"
  },
  "@graph": [
    {
      "@id": "urn:fluree:mydb:main:config:ledger",
      "@type": "f:LedgerConfig",
      "@graph": "urn:fluree:mydb:main#config",
      "f:fullTextDefaults": {
        "@type": "f:FullTextDefaults",
        "f:defaultLanguage": "en",
        "f:property": [
          { "@type": "f:FullTextProperty", "f:target": { "@id": "ex:title" } },
          { "@type": "f:FullTextProperty", "f:target": { "@id": "ex:body" } }
        ]
      }
    }
  ]
}
JSON

The config is stored in the ledger’s #config named graph (note the "@graph": "urn:fluree:mydb:main#config" placement directive on the resource). To verify, query the config graph:

curl -X POST http://localhost:8090/v1/fluree/query \
  -H 'Content-Type: application/json' \
  -d '{
    "@context": { "f": "https://ns.flur.ee/db#" },
    "from": "mydb:main",
    "from-named": ["urn:fluree:mydb:main#config"],
    "where": [{ "@graph": "urn:fluree:mydb:main#config",
                "@id": "?cfg", "f:fullTextDefaults": "?defaults" }],
    "select": ["?cfg", "?defaults"]
  }'

After writing config, trigger a reindex so existing values on ex:title and ex:body get indexed. See Reindexing after a config change below.

Data writes don’t change. Once config is in place and the reindex has run, just insert plain strings the way you always would:

{
  "@id": "ex:doc1",
  "ex:title": "Rust programming language guide",
  "ex:body": "Rust is a systems programming language..."
}

Both values flow into BM25 arenas automatically.

Multiple languages

Fluree ships Snowball stemmers and curated stopwords for 18 languages. Pick one as your ledger default via f:defaultLanguage; any BCP-47 tag in the list below works.

| Tag | Language |
| --- | --- |
| ar | Arabic |
| da | Danish |
| de | German |
| el | Greek |
| en | English |
| es | Spanish |
| fi | Finnish |
| fr | French |
| hu | Hungarian |
| it | Italian |
| nl | Dutch |
| no (or nb, nn) | Norwegian |
| pt | Portuguese |
| ro | Romanian |
| ru | Russian |
| sv | Swedish |
| ta | Tamil |
| tr | Turkish |

A BCP-47 tag that isn’t on this list still works — it just skips stemming and stopword removal (tokenize + lowercase only). Index and query sides agree on that behavior so scores remain consistent.

Per-value language tagging via rdf:langString. If a single property holds values in different languages, tag them with @language in JSON-LD, or with a trailing language tag (e.g. "Programmation Rust"@fr) in Turtle:

{
  "@id": "ex:doc1",
  "ex:title": [
    { "@value": "Rust programming", "@language": "en" },
    { "@value": "Programmation Rust", "@language": "fr" }
  ]
}

Fluree automatically builds per-language arenas (ex:title in English, ex:title in French) and queries against the arena whose language matches the row’s tag. Untagged values fall back to the ledger’s f:defaultLanguage.

Per-graph overrides

Different graphs can have different full-text configuration. For example, a product catalog graph might index ex:productName in Spanish while the rest of the ledger uses English:

@prefix f: <https://ns.flur.ee/db#> .
@prefix ex: <http://example.org/> .

GRAPH <urn:fluree:mydb:main#config> {
  <urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
    # Ledger-wide: English, index ex:title everywhere.
    f:fullTextDefaults [
      a f:FullTextDefaults ;
      f:defaultLanguage "en" ;
      f:property [ a f:FullTextProperty ; f:target ex:title ]
    ] ;
    # Catalog graph: also index ex:productName, default Spanish.
    f:graphOverrides [
      a f:GraphConfig ;
      f:targetGraph <urn:example:productCatalog> ;
      f:fullTextDefaults [
        a f:FullTextDefaults ;
        f:defaultLanguage "es" ;
        f:property [ a f:FullTextProperty ; f:target ex:productName ]
      ]
    ] .
}

The merge is additive: every property in the ledger-wide list applies to every graph (including productCatalog), and the per-graph override adds ex:productName on top of ex:title. The override’s f:defaultLanguage shadows the ledger-wide language only for untagged plain strings on that specific graph.
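The resolution logic can be sketched with hypothetical types — none of these struct or field names are Fluree internals, only the additive/shadowing semantics come from the text above:

```rust
use std::collections::BTreeSet;

// Hypothetical model of effective full-text config resolution:
// ledger-wide properties apply to every graph; a per-graph override
// adds properties on top and shadows the default language for that graph.
struct FtDefaults {
    language: String,
    properties: BTreeSet<String>,
}

fn effective(ledger: &FtDefaults, graph_override: Option<&FtDefaults>) -> FtDefaults {
    let mut properties = ledger.properties.clone();
    let mut language = ledger.language.clone();
    if let Some(ov) = graph_override {
        properties.extend(ov.properties.iter().cloned()); // additive merge
        language = ov.language.clone();                    // shadows ledger default
    }
    FtDefaults { language, properties }
}

fn main() {
    let ledger = FtDefaults {
        language: "en".into(),
        properties: ["ex:title".to_string()].into_iter().collect(),
    };
    let catalog = FtDefaults {
        language: "es".into(),
        properties: ["ex:productName".to_string()].into_iter().collect(),
    };
    let eff = effective(&ledger, Some(&catalog));
    assert_eq!(eff.language, "es");
    assert!(eff.properties.contains("ex:title"));       // inherited ledger-wide
    assert!(eff.properties.contains("ex:productName")); // added by the override
}
```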

Targeting the default graph or txn-meta explicitly. Use the f:defaultGraph sentinel to target only the default graph (g_id = 0), or f:txnMetaGraph for the ledger’s txn-meta graph:

f:graphOverrides [
  a f:GraphConfig ;
  f:targetGraph f:defaultGraph ;
  f:fullTextDefaults [
    a f:FullTextDefaults ;
    f:property [ a f:FullTextProperty ; f:target ex:note ]
  ]
]

Locking config (f:overrideControl)

If you want to prevent per-graph overrides from modifying the ledger-wide full-text defaults, set f:overrideControl to f:OverrideNone on the ledger-wide group:

<urn:fluree:mydb:main:config:ledger> f:fullTextDefaults [
  a f:FullTextDefaults ;
  f:defaultLanguage "en" ;
  f:overrideControl f:OverrideNone ;
  f:property [ a f:FullTextProperty ; f:target ex:title ]
] .

With f:OverrideNone, any f:graphOverrides entry targeting f:fullTextDefaults is ignored at resolution time — the ledger-wide group is final. See Override control for the full model.

Reindexing after a config change

Writing or editing f:fullTextDefaults does not automatically rebuild any arenas. You control when reindexing happens.

What you need to know:

  1. New commits after the config change pick up the new config automatically during the next incremental index build — newly inserted values on configured properties flow into arenas as expected.
  2. Existing values that were committed before the config change are not retroactively indexed until you run a full reindex.
  3. Removing or renaming a property from f:fullTextDefaults drops it from the configured set for new commits, but the existing arena stays until you reindex.
  4. Changing f:defaultLanguage doesn’t rewrite existing arenas — they keep whatever language they were built with. New values get the new language; scores may be temporarily inconsistent across the old/new boundary until a reindex.

To apply the new configuration to all existing data, run a manual reindex:

# CLI
fluree reindex mydb:main

# Or via the admin API
curl -X POST https://<fluree-server>/v1/fluree/reindex \
  -H 'Content-Type: application/json' \
  -d '{"ledger": "mydb:main"}'

The reindex reads the current f:fullTextDefaults, walks the entire commit chain, and rebuilds arenas with the new configuration applied consistently.

Note on concurrent reindex + config write. A reindex already in progress operates on a point-in-time snapshot and will NOT pick up a config change committed during its run. If you change config during a reindex, wait for it to finish, then trigger another reindex. See Reindex for full semantics.

How config-path and @fulltext-datatype coexist

If a value’s datatype is @fulltext, the datatype wins: that value is indexed as English, even if the property is listed in f:fullTextDefaults with a different f:defaultLanguage. This keeps the @fulltext contract stable (“I tagged this value English, index it now”) and guarantees no double-indexing.

In practice, a single property can mix:

  • @fulltext-datatype values → English arena
  • rdf:langString values tagged "fr" → French arena
  • Plain xsd:string values → arena for the configured f:defaultLanguage

Each language becomes its own arena; queries automatically look up the right one based on the row’s language tag (with English as the fallback). Ledger-wide English content from both paths shares a single arena — no wasted duplication.
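The routing rules above amount to a small decision function. A sketch with illustrative types (not Fluree's internal representation):

```rust
// Illustrative arena routing: a value's language tag picks its arena;
// @fulltext-datatype values always land in the English arena; untagged
// plain strings fall back to the configured default language.
enum Value<'a> {
    FulltextDatatype,    // "@type": "@fulltext"
    LangString(&'a str), // rdf:langString with a BCP-47 tag
    PlainString,         // untagged xsd:string
}

fn arena_language<'a>(v: &Value<'a>, default_language: &'a str) -> &'a str {
    match *v {
        Value::FulltextDatatype => "en",
        Value::LangString(tag) => tag,
        Value::PlainString => default_language,
    }
}

fn main() {
    assert_eq!(arena_language(&Value::FulltextDatatype, "fr"), "en");
    assert_eq!(arena_language(&Value::LangString("fr"), "en"), "fr");
    assert_eq!(arena_language(&Value::PlainString, "es"), "es");
}
```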

The fulltext() Scoring Function

The fulltext() function computes a BM25 relevance score for a bound text value against a query string. Use it in bind expressions within JSON-LD queries.

Basic usage

{
  "@context": {
    "ex": "http://example.org/"
  },
  "select": ["?title", "?score"],
  "where": [
    { "@id": "?doc", "ex:content": "?content", "ex:title": "?title" },
    ["bind", "?score", "(fulltext ?content \"Rust programming\")"],
    ["filter", "(> ?score 0)"]
  ],
  "orderBy": [["desc", "?score"]],
  "limit": 10
}

Arguments:

  • First argument: a variable bound to a @fulltext-typed value
  • Second argument: the search query string (natural language)

Returns: A numeric score (xsd:double). Higher scores indicate greater relevance. Returns 0.0 when the document contains none of the query terms.

Alternative array syntax

The function also accepts array form:

["bind", "?score", ["fulltext", "?content", "Rust programming"]]

This is equivalent to the S-expression string form.

Filtering by score

Combine bind with filter to exclude non-matching documents:

["bind", "?score", "(fulltext ?content \"search terms\")"],
["filter", "(> ?score 0)"]

Combining with graph patterns

Fulltext scoring works naturally with standard graph patterns. Filter by type, category, or relationships before or after scoring:

{
  "@context": {
    "ex": "http://example.org/"
  },
  "select": ["?title", "?score"],
  "where": [
    {
      "@id": "?doc",
      "@type": "ex:Article",
      "ex:content": "?content",
      "ex:title": "?title",
      "ex:category": "?cat"
    },
    ["filter", "(= ?cat \"technology\")"],
    ["bind", "?score", "(fulltext ?content \"distributed database systems\")"],
    ["filter", "(> ?score 0)"]
  ],
  "orderBy": [["desc", "?score"]],
  "limit": 10
}

Placing the category filter before the fulltext() bind reduces the number of documents scored, improving query performance.

How Scoring Works

The fulltext() function uses BM25 (Best Match 25), the standard information retrieval scoring algorithm used by search engines.

BM25 formula

For each query term t in document d:

IDF(t)     = ln((N - df(t) + 0.5) / (df(t) + 0.5) + 1)
TF_norm(t) = tf(t,d) * (k1 + 1) / (tf(t,d) + k1 * (1 - b + b * |d| / avgdl))
score(q,d) = SUM( IDF(t) * TF_norm(t) )  for each query term t
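The formula transcribes directly into code. A sketch using the default parameters k1 = 1.2 and b = 0.75, with corpus statistics (N, df, avgdl) assumed precomputed:

```rust
// Direct transcription of the BM25 formula above.
fn idf(n_docs: f64, df: f64) -> f64 {
    ((n_docs - df + 0.5) / (df + 0.5) + 1.0).ln()
}

fn tf_norm(tf: f64, doc_len: f64, avgdl: f64, k1: f64, b: f64) -> f64 {
    tf * (k1 + 1.0) / (tf + k1 * (1.0 - b + b * doc_len / avgdl))
}

/// Score one document: `terms` holds (tf in this doc, df in corpus) per query term.
fn bm25(terms: &[(f64, f64)], n_docs: f64, doc_len: f64, avgdl: f64) -> f64 {
    let (k1, b) = (1.2, 0.75); // default parameters
    terms
        .iter()
        .map(|&(tf, df)| idf(n_docs, df) * tf_norm(tf, doc_len, avgdl, k1, b))
        .sum()
}

fn main() {
    // 1,000-doc corpus, avgdl 40: a rare term (df = 30) contributes more
    // than a common one (df = 400) at the same term frequency.
    let rare = bm25(&[(1.0, 30.0)], 1000.0, 40.0, 40.0);
    let common = bm25(&[(1.0, 400.0)], 1000.0, 40.0, 40.0);
    assert!(rare > common);
}
```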

What makes the scoring effective

  • IDF (Inverse Document Frequency) – Downweights common terms (“the”, “is”) and boosts rare, discriminative terms. A query for “distributed database” gives more weight to “distributed” (rarer) than “database” (common in a tech corpus).

  • Document length normalization – Prevents long documents from dominating purely due to having more words. Controlled by parameter b (default 0.75). A 50-word abstract mentioning “database” twice scores comparably to a 500-word article mentioning it twice.

  • Term frequency saturation – Diminishing returns for repeated terms, controlled by parameter k1 (default 1.2). The 5th occurrence of “database” in a document contributes less than the 1st.

  • Corpus-wide average document length (avgdl) – Anchors the length normalization across the entire collection.

Text analysis pipeline

Both documents and queries go through the same analysis pipeline, and the index and query sides always use the same analyzer for a given arena — so query stems match document stems:

  1. Tokenization – Split text on whitespace and punctuation (Unicode-aware)
  2. Lowercasing – Normalize to lowercase
  3. Stopword removal – Remove common stopwords for the bucket’s language (“the”, “is”, “and” in English; “le”, “la”, “et” in French; etc.)
  4. Stemming – Reduce words to stems using the Snowball stemmer for the bucket’s language

This means a query for “programming” against an English arena matches documents containing “programmed”, “programs”, or “programmer”. A French-language arena stems French word forms instead (“chantait” → “chant”, matching “chanter”, “chantons”, and so on).

For the @fulltext datatype, the analyzer is always English. For properties declared in f:fullTextDefaults, the analyzer matches the arena’s language (row’s rdf:langString tag, or the configured f:defaultLanguage). An unrecognized BCP-47 tag skips steps 3 and 4 — tokenize + lowercase only — consistently on both sides.
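A simplified sketch of steps 1–3 of the pipeline (the real analyzers also apply Snowball stemming as step 4, which this illustration omits):

```rust
// Simplified analyzer sketch: split on non-alphanumeric boundaries,
// lowercase, drop stopwords. Stemming (Snowball, per-language) is
// intentionally omitted here for brevity.
fn analyze(text: &str, stopwords: &[&str]) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .filter(|t| !stopwords.contains(&t.as_str()))
        .collect()
}

fn main() {
    let en_stop = ["the", "is", "and", "a"];
    let tokens = analyze("Rust is a systems programming language", &en_stop);
    assert_eq!(tokens, ["rust", "systems", "programming", "language"]);
}
```

Because the same function runs on both documents and queries, a stopword dropped at index time is also dropped at query time, keeping scores consistent.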

Indexing

Automatic arena construction

During background binary index builds, Fluree automatically constructs a FulltextArena (FTA1 format) for each (graph, predicate) combination that has @fulltext values. Each arena stores:

  • A sorted term dictionary of stemmed tokens
  • Per-document bag-of-words (BoW) entries: (term_id, tf) pairs sorted by term ID
  • Corpus-level statistics: document count (N), sum of document lengths (sum_dl), and per-term document frequency (df)

This precomputed representation enables fast scoring at query time – the indexed path avoids per-row text analysis entirely, reading precomputed BoW entries via binary search.

No-index fallback

If no binary index has been built yet (e.g., immediately after ledger creation), fulltext() still works using an on-the-fly analysis fallback. Documents are tokenized and scored using TF-saturation (a simplified scoring model). This is slower but ensures the feature works before background indexing catches up.

Novelty overlay

Documents committed after the last index build (in the “novelty” layer) are automatically included in query results with consistent BM25 scores. Fluree computes effective corpus statistics by merging the persisted arena stats with a novelty delta:

  • N' = N_arena + delta_N_novelty
  • avgdl' = (sum_dl_arena + delta_sum_dl_novelty) / N'
  • df'(t) = df_arena(t) + delta_df_novelty(t)

This ensures that indexed documents and novelty documents produce comparable, consistent scores in the same query.
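The merged statistics can be computed as follows (struct and field names are illustrative, not Fluree's internal API):

```rust
// Illustrative merge of persisted arena stats with a novelty delta,
// following N' = N + ΔN and avgdl' = (sum_dl + Δsum_dl) / N'.
struct CorpusStats {
    n: u64,      // document count
    sum_dl: u64, // sum of document lengths
}

fn merged_avgdl(arena: &CorpusStats, novelty: &CorpusStats) -> f64 {
    let n = arena.n + novelty.n;
    (arena.sum_dl + novelty.sum_dl) as f64 / n as f64
}

fn merged_df(df_arena: u64, delta_df_novelty: u64) -> u64 {
    df_arena + delta_df_novelty
}

fn main() {
    let arena = CorpusStats { n: 1000, sum_dl: 40_000 }; // avgdl 40
    let novelty = CorpusStats { n: 100, sum_dl: 6_000 }; // avgdl 60
    let avgdl = merged_avgdl(&arena, &novelty);
    assert!((avgdl - 46_000.0 / 1100.0).abs() < 1e-9);
    assert_eq!(merged_df(30, 5), 35);
}
```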

Retraction handling

When a @fulltext value is retracted, it is removed from the arena at the next index build. The retracted document no longer appears in fulltext query results and its statistics are excluded from corpus-level calculations.

Performance

Query-time benchmarks

All benchmarks measure the full end-to-end query path: JSON-LD parse, query plan, scan, BM25 score, sort, and limit 10. Documents are paragraph-length (~30-60 words), representative of article abstracts, product descriptions, or knowledge base entries.

| Documents | Novelty (no index) | Indexed (arena BM25) | Speedup |
| --- | --- | --- | --- |
| 1,000 | 11.6 ms | 1.7 ms | 6.7x |
| 5,000 | 57.0 ms | 7.9 ms | 7.2x |
| 10,000 | 115.8 ms | 15.5 ms | 7.5x |
| 50,000 | 601.9 ms | 80.2 ms | 7.5x |

Indexed throughput: ~625,000 docs/sec – 50K documents scored and ranked in 80ms.

Novelty throughput: ~85,000 docs/sec – 50K documents in ~600ms (no index required).

The indexed path is 7-7.5x faster because it reads precomputed BoW entries via binary search on sorted (term_id, tf) arrays, avoiding per-row text analysis and HashMap allocation.

Scaling is near-linear. Extrapolating, the indexed path handles approximately 625K documents within a 1-second query budget.

When to consider the BM25 graph source pipeline

Inline @fulltext works well for tens to hundreds of thousands of documents per predicate. For larger corpora (1M+ documents), consider the dedicated BM25 graph source pipeline, which provides:

  • WAND (Weak AND) top-k pruning – Skips documents that provably cannot enter the top-k results, critical for large corpora where scanning every document is prohibitive
  • Chunked posting list storage – Compressed, seekable posting lists with skip pointers for efficient I/O at scale
  • Incremental index updates – Updates posting lists in place without rebuilding the full index
  • Cross-property dependency tracking – BM25 scores can depend on fields from other properties
  • Configurable analyzers per property – Language-specific tokenizers, stemmers, and stopword lists
  • Multi-term query optimization – Term-at-a-time vs document-at-a-time evaluation strategies

| Corpus size | Recommendation |
| --- | --- |
| < 100K docs | Inline @fulltext works well, especially with binary indexing |
| 100K - 500K | Inline @fulltext remains viable; query times scale linearly |
| 500K - 1M | Evaluate based on latency requirements; WAND pruning may help |
| 1M+ | Use the BM25 graph source for production workloads |

Comparison with @vector

Both @fulltext and @vector follow the same architectural pattern: annotate, commit, index, query.

| | @vector | @fulltext |
| --- | --- | --- |
| Annotation | "@type": "@vector" | "@type": "@fulltext" |
| Index artifact | VAS1 arena (raw vectors) | FTA1 arena (BoW + corpus stats) |
| Scoring function | dotProduct, cosineSimilarity, euclideanDistance | fulltext(?var, "query") |
| Query input | Vector literal | Natural language string |
| Per-row cost | O(dims) float math | O(query_terms) integer lookups |
| Portability | Push/pull/import/export preserves @vector | Push/pull/import/export preserves @fulltext |

Complete Example

1. Insert documents with fulltext content:

{
  "@context": {
    "ex": "http://example.org/"
  },
  "@graph": [
    {
      "@id": "ex:article-1",
      "@type": "ex:Article",
      "ex:title": "Introduction to Rust",
      "ex:content": {
        "@value": "Rust is a systems programming language focused on safety, speed, and concurrency. It prevents segfaults and guarantees thread safety.",
        "@type": "@fulltext"
      }
    },
    {
      "@id": "ex:article-2",
      "@type": "ex:Article",
      "ex:title": "Database Design Patterns",
      "ex:content": {
        "@value": "Modern database systems use columnar storage and immutable ledgers. Graph databases model relationships as first-class citizens.",
        "@type": "@fulltext"
      }
    },
    {
      "@id": "ex:article-3",
      "@type": "ex:Article",
      "ex:title": "Rust for Systems Programming",
      "ex:content": {
        "@value": "Building high-performance systems in Rust requires understanding ownership, borrowing, and lifetime semantics. Rust's type system catches bugs at compile time.",
        "@type": "@fulltext"
      }
    }
  ]
}

2. Query – find articles about “Rust systems programming”, ranked by relevance:

{
  "@context": {
    "ex": "http://example.org/"
  },
  "select": ["?title", "?score"],
  "where": [
    {
      "@id": "?doc",
      "@type": "ex:Article",
      "ex:content": "?content",
      "ex:title": "?title"
    },
    ["bind", "?score", "(fulltext ?content \"Rust systems programming\")"],
    ["filter", "(> ?score 0)"]
  ],
  "orderBy": [["desc", "?score"]],
  "limit": 10
}

Expected results (ordered by relevance):

  1. “Rust for Systems Programming” – highest score (most query terms, multiple occurrences)
  2. “Introduction to Rust” – mentions Rust and systems programming
  3. “Database Design Patterns” – excluded by > 0 filter (no matching terms)

SPARQL Support

Inserting data

Fulltext annotation works in SPARQL UPDATE today using the ^^f:fullText typed literal syntax (see the Turtle/SPARQL insertion examples above).

Querying

The fulltext() scoring function is currently available in JSON-LD Query only. SPARQL query support is planned for a future release, with anticipated syntax like:

PREFIX ex: <http://example.org/>
PREFIX f: <https://ns.flur.ee/db#>

SELECT ?title ?score
WHERE {
  ?doc a ex:Article ;
       ex:content ?content ;
       ex:title ?title .
  BIND(f:fulltext(?content, "Rust programming") AS ?score)
  FILTER(?score > 0)
}
ORDER BY DESC(?score)
LIMIT 10

This mirrors the pattern established by inline vector similarity functions (dotProduct, cosineSimilarity, euclideanDistance), which also support JSON-LD Query today with SPARQL planned.

BM25 Full-Text Search

Fluree provides integrated full-text search using the BM25 (Best Matching 25) ranking algorithm. BM25 indexes are implemented as graph sources: they index text content from a source ledger and expose search results that can be joined with structured graph queries.

What is BM25?

BM25 is a probabilistic ranking function that scores documents based on query term frequency and document length normalization. It’s widely used in search engines and information retrieval systems.

Key features:

  • Term frequency with saturation (controlled by k1)
  • Inverse document frequency weighting
  • Document length normalization (controlled by b)
  • English stemming and stopword filtering (default analyzer)
  • Block-Max WAND for efficient top-k queries (early termination)
  • Incremental index updates
  • Time-travel: query the index as of any past transaction

Creating a BM25 Index

BM25 indexes are created via the Rust API using Bm25CreateConfig. There are no HTTP endpoints for index management yet — indexes are managed programmatically.

Basic Index

use fluree_db_api::{Bm25CreateConfig, FlureeBuilder};
use serde_json::json;

let fluree = FlureeBuilder::file("/path/to/data").build()?;

// Create a ledger and insert some data
let ledger = fluree.create_ledger("docs:main").await?;
let tx = json!({
    "@context": { "ex": "http://example.org/" },
    "@graph": [
        { "@id": "ex:doc1", "@type": "ex:Article", "ex:title": "Rust programming guide" },
        { "@id": "ex:doc2", "@type": "ex:Article", "ex:title": "Python for beginners" },
        { "@id": "ex:doc3", "@type": "ex:Article", "ex:title": "Systems programming in Rust" }
    ]
});
let ledger = fluree.insert(ledger, &tx).await?.ledger;

// Define the indexing query
let query = json!({
    "@context": { "ex": "http://example.org/" },
    "where": [{ "@id": "?x", "@type": "ex:Article", "ex:title": "?title" }],
    "select": { "?x": ["@id", "ex:title"] }
});

// Create the BM25 index
let config = Bm25CreateConfig::new("article-search", "docs:main", query);
let result = fluree.create_full_text_index(config).await?;

println!("Indexed {} documents", result.doc_count);
println!("Graph source: {}", result.graph_source_id); // "article-search:main"

The graph source ID is {name}:{branch} — for example, article-search:main.

Indexing Query

The indexing query defines what to index. It’s a standard Fluree JSON-LD query with these requirements:

  • Must include @id in the select (to identify documents)
  • Must use select with a map form: {"?x": ["@id", "ex:prop1", "ex:prop2"]}
  • All selected text properties are extracted and tokenized for search

The query can filter by type, filter by property values, or use any valid Fluree where clause:

{
    "@context": { "ex": "http://example.org/" },
    "where": [
        { "@id": "?x", "@type": "ex:Article", "ex:title": "?title" },
        { "@id": "?x", "ex:status": "published" }
    ],
    "select": { "?x": ["@id", "ex:title", "ex:content", "ex:tags"] }
}

Configuration Options

| Parameter | Default | Description |
| --- | --- | --- |
| name | (required) | Graph source name. Cannot contain :. |
| ledger | (required) | Source ledger alias (e.g., "docs:main") |
| query | (required) | Indexing query (JSON-LD, must have select) |
| branch | "main" | Branch name for the graph source |
| k1 | 1.2 | Term frequency saturation. Higher = more weight to term frequency. Must be > 0. Typical range: 1.2-2.0. |
| b | 0.75 | Document length normalization. 0 = no normalization, 1 = full normalization. Must be 0.0-1.0. |

let config = Bm25CreateConfig::new("search", "docs:main", query)
    .with_branch("dev")
    .with_k1(1.5)
    .with_b(0.5);

Text Analysis

Fluree uses a default English analyzer that applies:

  1. Tokenization: Unicode-aware word boundary splitting
  2. Lowercasing: All tokens converted to lowercase
  3. Stopword filtering: Common English words removed (the, a, an, is, etc.)
  4. Stemming: Snowball English stemmer reduces words to root forms (e.g., “programming” -> “program”)

The analyzer is not configurable — it always uses the English pipeline for consistency.

Querying BM25 Indexes

JSON-LD Query Syntax

BM25 search is integrated into Fluree’s query system via the f: namespace predicates:

{
    "@context": {
        "ex": "http://example.org/",
        "f": "https://ns.flur.ee/db#"
    },
    "from": "docs:main",
    "where": [
        {
            "f:graphSource": "article-search:main",
            "f:searchText": "rust programming",
            "f:searchLimit": 10,
            "f:searchResult": {
                "f:resultId": "?doc",
                "f:resultScore": "?score"
            }
        },
        { "@id": "?doc", "ex:author": "?author" }
    ],
    "select": ["?doc", "?score", "?author"]
}

Pattern fields:

| Field | Description |
| --- | --- |
| f:graphSource | Graph source ID (e.g., "article-search:main") |
| f:searchText | Query text (analyzed with same pipeline as indexing) |
| f:searchLimit | Maximum number of search results |
| f:searchResult | Binding object for results |
| f:resultId | Variable binding for the document IRI |
| f:resultScore | Variable binding for the BM25 relevance score |
| f:resultLedger | (Optional) Variable binding for ledger provenance |

Combining Search with Structured Queries

The search pattern produces ?doc and ?score bindings. These can be joined with ledger data using normal where clauses:

{
    "@context": {
        "ex": "http://example.org/",
        "f": "https://ns.flur.ee/db#"
    },
    "from": "docs:main",
    "where": [
        {
            "f:graphSource": "article-search:main",
            "f:searchText": "rust",
            "f:searchLimit": 20,
            "f:searchResult": { "f:resultId": "?doc", "f:resultScore": "?score" }
        },
        { "@id": "?doc", "ex:title": "?title" },
        { "@id": "?doc", "ex:author": "?author" }
    ],
    "select": ["?doc", "?title", "?author", "?score"]
}

The BM25 search runs first and produces candidate bindings. The subsequent where clauses join those candidates with the source ledger to retrieve additional properties.

You can also use the Rust API directly for programmatic search without the query engine:

use fluree_db_query::bm25::{Analyzer, Bm25Scorer};

// Load the index
let index = fluree.load_bm25_index("article-search:main").await?;

// Analyze query terms (same pipeline as indexing)
let analyzer = Analyzer::english_default();
let terms = analyzer.analyze_to_strings("rust programming");
let term_refs: Vec<&str> = terms.iter().map(|s| s.as_str()).collect();

// Score and rank
let scorer = Bm25Scorer::new(&index, &term_refs);
let results = scorer.top_k(10);

for (doc_key, score) in &results {
    println!("{}: {:.2}", doc_key.subject_iri, score);
}

Rust API: Query with BM25

Use query_connection_with_bm25 for integrated queries:

let query = json!({
    "@context": { "ex": "http://example.org/", "f": "https://ns.flur.ee/db#" },
    "from": "docs:main",
    "where": [
        {
            "f:graphSource": "article-search:main",
            "f:searchText": "rust",
            "f:searchLimit": 10,
            "f:searchResult": { "f:resultId": "?doc", "f:resultScore": "?score" }
        },
        { "@id": "?doc", "ex:author": "?author" }
    ],
    "select": ["?doc", "?score", "?author"]
});

let result = fluree.query_connection_with_bm25(&query).await?;

Index Maintenance

Syncing

BM25 indexes are not automatically updated when the source ledger changes. You must explicitly sync them:

// Incremental sync (detects changes since last watermark)
let sync_result = fluree.sync_bm25_index("article-search:main").await?;
println!("Upserted: {}, Removed: {}", sync_result.upserted, sync_result.removed);

// Force full resync (rebuilds the entire index)
let sync_result = fluree.resync_bm25_index("article-search:main").await?;

Incremental sync uses property dependency tracking to identify which subjects changed since the last indexed commit. Only affected documents are re-queried and re-indexed. If no affected subjects are detected, it falls back to a full resync.

Background Maintenance Worker

For production use, the Bm25MaintenanceWorker can be configured to automatically sync indexes when source ledgers change:

  • Watches for commit events on source ledgers
  • Debounces rapid commits (configurable interval)
  • Bounded concurrency for concurrent sync operations
  • Registers/unregisters graph sources dynamically

Staleness Checking

Check whether an index is behind its source ledger:

let check = fluree.check_bm25_staleness("article-search:main").await?;
println!("Index at t={}, ledger at t={}, stale: {}, lag: {}",
    check.index_t, check.ledger_t, check.is_stale, check.lag);

Time-Travel

Load an index at a specific historical transaction time:

// Load index as of transaction t=5
let (index, actual_t) = fluree.load_bm25_index_at("article-search:main", 5).await?;
println!("Loaded snapshot at t={}, docs: {}", actual_t, index.num_docs());

BM25 maintains a manifest of historical snapshots. The manifest is stored in content-addressed storage and tracks all snapshot versions. load_bm25_index_at selects the snapshot with the largest index_t <= as_of_t.

Dropping an Index

let drop_result = fluree.drop_full_text_index("article-search:main").await?;
println!("Deleted {} snapshots", drop_result.deleted_snapshots);

// Drop is idempotent
let drop_again = fluree.drop_full_text_index("article-search:main").await?;
assert!(drop_again.was_already_retracted);

Dropping marks the graph source as retracted in the nameservice and deletes all snapshot blobs from storage. The index can be recreated with the same name afterward.

Scoring and Top-K Optimization

For top-k queries (the typical case via f:searchLimit), BM25 uses Block-Max WAND (Weak AND) to avoid scoring every matching document. Posting lists are divided into fixed-size blocks (128 postings each) with per-block metadata (maximum term frequency). WAND uses these to compute score upper bounds, skipping entire blocks that cannot contribute to the current top-k results.

This makes top_k(10) on a 100K-document index significantly faster than scoring all matches — the algorithm terminates early once it can prove no remaining document can displace the current top results.

When block metadata is unavailable (e.g., during index building before the first snapshot), scoring falls back to dense accumulation over all postings.
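The pruning idea can be illustrated with a single-term sketch. Real Block-Max WAND coordinates multiple posting lists and uses per-block max_doc_id for navigation; here, length normalization is folded into a constant for brevity, and all names are illustrative:

```rust
// Simplified single-term block pruning: each block stores max_tf, which
// yields a per-block score upper bound. Blocks whose bound cannot beat
// the current threshold are skipped without scoring any posting inside.
struct Block {
    max_tf: f64,
    postings: Vec<(u64, f64)>, // (doc_id, term frequency)
}

fn score_tf(idf: f64, tf: f64, k1: f64) -> f64 {
    // Length normalization folded into the constant 0.25 for brevity.
    idf * tf * (k1 + 1.0) / (tf + k1 * 0.25)
}

/// Returns (best score found above `threshold`, postings actually scored).
fn top_score(blocks: &[Block], idf: f64, threshold: f64) -> (f64, usize) {
    let k1 = 1.2;
    let mut best = threshold;
    let mut scored = 0;
    for block in blocks {
        // score_tf is monotonic in tf, so max_tf bounds every posting's score.
        if score_tf(idf, block.max_tf, k1) <= best {
            continue; // prune the whole block
        }
        for &(_doc, tf) in &block.postings {
            scored += 1;
            best = best.max(score_tf(idf, tf, k1));
        }
    }
    (best, scored)
}

fn main() {
    let blocks = vec![
        Block { max_tf: 1.0, postings: vec![(1, 1.0), (2, 1.0)] },
        Block { max_tf: 5.0, postings: vec![(3, 5.0), (4, 2.0)] },
    ];
    // With threshold 3.5, the first block's bound (~3.38) prunes it;
    // only the second block's two postings are ever scored.
    let (best, scored) = top_score(&blocks, 2.0, 3.5);
    assert_eq!(scored, 2);
    assert!(best > 4.0 && best < 4.3);
}
```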

Storage Format

V4 Chunked Format

Large BM25 indexes use a chunked storage format (v4) that splits the index into:

  • Root blob: Terms dictionary, document metadata, BM25 statistics, routing table
  • Posting leaflet blobs: Compressed posting lists (~2MB each), stored as separate content-addressed objects. Each posting list includes block metadata (128 postings per block with max_doc_id and max_tf) used for WAND score upper bounds and block-level navigation.

This enables selective loading: queries only fetch the leaflets containing terms that match the search query, rather than loading the entire index.

Leaflet Caching

Posting leaflets are cached in the global LeafletCache (shared with core index leaflets). Cache entries are keyed by content ID hash and are immutable (content-addressed data never changes). The cache uses moka’s TinyLFU eviction and is governed by the global cache budget (--cache-max-mb / FLUREE_CACHE_MAX_MB, default: tiered fraction of RAM — 30% <4GB, 40% 4-8GB, 50% ≥8GB).
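The tiered default can be expressed as a small function. Only the percentages and thresholds come from the text above; the truncating rounding is an assumption:

```rust
// Default cache budget per the documented tiers: 30% of RAM below 4 GB,
// 40% from 4 GB up to 8 GB, 50% at 8 GB and above. Illustrative sketch;
// truncation toward zero is an assumption, not Fluree's exact behavior.
fn default_cache_budget_mb(ram_mb: u64) -> u64 {
    let fraction = if ram_mb < 4 * 1024 {
        0.30
    } else if ram_mb < 8 * 1024 {
        0.40
    } else {
        0.50
    };
    (ram_mb as f64 * fraction) as u64
}

fn main() {
    assert_eq!(default_cache_budget_mb(2 * 1024), 614);   // 30% of 2 GB
    assert_eq!(default_cache_budget_mb(6 * 1024), 2457);  // 40% of 6 GB
    assert_eq!(default_cache_budget_mb(16 * 1024), 8192); // 50% of 16 GB
}
```

Setting --cache-max-mb (or FLUREE_CACHE_MAX_MB) overrides this default outright.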

Parallel I/O

Both reads and writes use bounded-concurrency parallel I/O (buffer_unordered(32)) for leaflet operations. This caps socket pressure when working with object stores like S3 while still providing significant throughput improvement over sequential access.

Format Selection

The storage format is selected automatically based on the storage backend:

  • File storage: V3 single-blob format (optimized for local filesystem)
  • Memory / S3 / object store: V4 chunked format (enables selective loading and caching)

Deployment Modes

Embedded Mode (Default)

In embedded mode, the BM25 index is loaded and searched within the same process as Fluree. This is the default behavior.

Remote Mode

In remote mode, search queries are delegated to a dedicated search service (fluree-search-httpd):

fluree-search-httpd \
  --storage-root file:///var/fluree/data \
  --nameservice-path file:///var/fluree/ns \
  --listen 0.0.0.0:9090

Both modes use identical analyzer configuration, BM25 scoring algorithm, and time-travel semantics — queries return identical results regardless of deployment mode.

See BM25 Graph Source for details on the remote search protocol.

Vector Search

Vector search enables similarity search using embedding vectors, supporting use cases like:

  • Semantic search: Find similar meanings, not just keywords
  • Recommendations: Find similar products, content, users
  • Image search: Find similar images by visual features
  • Anomaly detection: Find unusual patterns

Fluree supports two complementary approaches:

  1. Inline similarity functions – compute dotProduct, cosineSimilarity, or euclideanDistance directly in queries using bind. No external index required.
  2. HNSW vector indexes – build dedicated approximate-nearest-neighbor (ANN) indexes for large-scale similarity search using the f:* query pattern.

The @vector Datatype

Why a dedicated datatype?

In RDF, a plain JSON array like [0.5, 0.5, 0.0] is decomposed into individual values: duplicate elements may be collapsed, and ordering is not guaranteed. This breaks embedding vectors. The @vector datatype tells Fluree to store the array as a single, ordered, fixed-length vector.

@vector is a shorthand for the full IRI https://ns.flur.ee/db#embeddingVector, which can also be written as f:embeddingVector when the Fluree namespace prefix is declared in your @context.

Storage: f32 precision contract

All @vector values are stored as IEEE-754 binary32 (f32) arrays. This means:

  • Each element in your JSON array is quantized to f32 at ingest time
  • Values that are not representable as finite f32 (NaN, Infinity, values exceeding f32 range) are rejected
  • Round-trip reads return the f32-quantized values (e.g., 0.1 in JSON becomes 0.10000000149011612 after f32 quantization)
  • This provides a compact, cache-friendly representation optimized for SIMD similarity computation

If you need higher precision (f64) or different vector formats (sparse, integer), store them as a custom RDF datatype string.
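The quantization contract is observable with plain Rust casts; this reproduces the 0.1 example above:

```rust
fn main() {
    // JSON numbers arrive as f64; each element is quantized to f32 at ingest.
    let ingested: f64 = 0.1;
    let stored = ingested as f32;
    // Round-trip reads report the f32-quantized value, widened for display.
    println!("{}", stored as f64); // 0.10000000149011612
}
```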

Inserting vectors (JSON-LD)

Use "@type": "@vector" to annotate a numeric array as a vector:

{
  "@context": {
    "ex": "http://example.org/"
  },
  "@graph": [
    {
      "@id": "ex:doc1",
      "@type": "ex:Document",
      "ex:embedding": {
        "@value": [0.1, 0.2, 0.3, 0.4],
        "@type": "@vector"
      }
    }
  ]
}

You can also use the full IRI or the f: prefix form, which is equivalent:

{
  "@context": {
    "ex": "http://example.org/",
    "f": "https://ns.flur.ee/db#"
  },
  "@graph": [
    {
      "@id": "ex:doc1",
      "ex:embedding": {
        "@value": [0.1, 0.2, 0.3, 0.4],
        "@type": "f:embeddingVector"
      }
    }
  ]
}

Incorrect – plain array (will not work for similarity):

{
  "@id": "ex:doc1",
  "ex:embedding": [0.1, 0.2, 0.3, 0.4]
}

Plain arrays are decomposed into individual RDF values where duplicates may be removed and order is lost.

Inserting vectors (Turtle / SPARQL UPDATE)

In Turtle and SPARQL UPDATE, the @vector shorthand is not available. Use the f:embeddingVector datatype IRI with the standard ^^ typed-literal syntax:

PREFIX ex: <http://example.org/>
PREFIX f: <https://ns.flur.ee/db#>

INSERT DATA {
  ex:doc1 ex:embedding "[0.1, 0.2, 0.3, 0.4]"^^f:embeddingVector .
}

The vector is represented as a JSON array string with the ^^f:embeddingVector datatype annotation.

Multiple vectors per entity

An entity can have multiple vectors on the same property:

{
  "@id": "ex:doc1",
  "ex:embedding": [
    {"@value": [0.1, 0.9], "@type": "@vector"},
    {"@value": [0.2, 0.8], "@type": "@vector"}
  ]
}

Each vector produces separate rows in query results.

Vector literals in query VALUES clauses

When passing a vector literal in a query values clause, use the full IRI or the f: prefix form – the @vector shorthand is only resolved in the transaction parser:

"values": [
  ["?queryVec"],
  [{"@value": [0.7, 0.6], "@type": "f:embeddingVector"}]
]

Or with the full IRI:

"values": [
  ["?queryVec"],
  [{"@value": [0.7, 0.6], "@type": "https://ns.flur.ee/db#embeddingVector"}]
]

Inline Similarity Functions (JSON-LD Query)

Fluree provides three vector similarity functions that can be used in bind expressions within JSON-LD queries. These compute similarity scores directly during query execution without requiring a pre-built index.

Function names are case-insensitive; dotProduct, dotproduct, and dot_product are all equivalent.

dotProduct

Computes the dot product (inner product) of two vectors. Higher scores indicate greater similarity when vectors represent aligned directions.

{
  "@context": {
    "ex": "http://example.org/ns/",
    "f": "https://ns.flur.ee/db#"
  },
  "select": ["?doc", "?score"],
  "values": [
    ["?queryVec"],
    [{"@value": [0.7, 0.6], "@type": "f:embeddingVector"}]
  ],
  "where": [
    {"@id": "?doc", "ex:embedding": "?vec"},
    ["bind", "?score", "(dotProduct ?vec ?queryVec)"]
  ],
  "orderBy": [["desc", "?score"]],
  "limit": 10
}

Score range: (-inf, +inf). Best when vector magnitude encodes importance.

cosineSimilarity

Computes the cosine of the angle between two vectors. Ignores magnitude, focusing purely on directional similarity.

{
  "@context": {
    "ex": "http://example.org/ns/",
    "f": "https://ns.flur.ee/db#"
  },
  "select": ["?doc", "?score"],
  "values": [
    ["?queryVec"],
    [{"@value": [0.7, 0.6], "@type": "f:embeddingVector"}]
  ],
  "where": [
    {"@id": "?doc", "ex:embedding": "?vec"},
    ["bind", "?score", "(cosineSimilarity ?vec ?queryVec)"]
  ],
  "orderBy": [["desc", "?score"]],
  "limit": 10
}

Score range: [-1, 1] (1 = identical direction, 0 = orthogonal, -1 = opposite). Returns null if either vector has zero magnitude. Best for text embeddings and normalized vectors.

euclideanDistance

Computes the L2 (straight-line) distance between two vectors. Lower scores indicate greater similarity.

{
  "@context": {
    "ex": "http://example.org/ns/",
    "f": "https://ns.flur.ee/db#"
  },
  "select": ["?doc", "?distance"],
  "values": [
    ["?queryVec"],
    [{"@value": [0.7, 0.6], "@type": "f:embeddingVector"}]
  ],
  "where": [
    {"@id": "?doc", "ex:embedding": "?vec"},
    ["bind", "?distance", "(euclideanDistance ?vec ?queryVec)"]
  ],
  "orderBy": "?distance",
  "limit": 10
}

Score range: [0, +inf) (0 = identical). Best for geometric similarity and when absolute position matters.
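The three functions follow the standard definitions. A reference sketch over f32 slices (the stored element type), including the null result cosineSimilarity produces for zero-magnitude vectors:

```rust
fn dot_product(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

// Returns None (surfaced as null in query results) when either vector
// has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    let (na, nb) = (dot_product(a, a).sqrt(), dot_product(b, b).sqrt());
    if na == 0.0 || nb == 0.0 {
        None
    } else {
        Some(dot_product(a, b) / (na * nb))
    }
}

fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

fn main() {
    let q = [0.7f32, 0.6];
    let v = [0.1f32, 0.9];
    println!("dot       = {}", dot_product(&q, &v));
    println!("cosine    = {:?}", cosine_similarity(&q, &v));
    println!("euclidean = {}", euclidean_distance(&q, &v));
}
```

Fluree's built-ins use SIMD kernels rather than this scalar loop, but the scores are the same in semantics.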

Alternative array syntax

The similarity functions also accept array form instead of the S-expression string:

["bind", "?score", ["dotProduct", "?vec", "?queryVec"]]

This is equivalent to:

["bind", "?score", "(dotProduct ?vec ?queryVec)"]

Filtering by score threshold

Combine bind with filter to return only results above a similarity threshold:

{
  "@context": {
    "ex": "http://example.org/ns/",
    "f": "https://ns.flur.ee/db#"
  },
  "select": ["?doc", "?score"],
  "values": [
    ["?queryVec"],
    [{"@value": [0.7, 0.6], "@type": "f:embeddingVector"}]
  ],
  "where": [
    {"@id": "?doc", "ex:embedding": "?vec"},
    ["bind", "?score", "(dotProduct ?vec ?queryVec)"],
    ["filter", "(> ?score 0.7)"]
  ]
}

Combining with graph patterns

Vector similarity can be combined with standard graph patterns to filter by type, property values, or relationships:

{
  "@context": {
    "ex": "http://example.org/ns/",
    "f": "https://ns.flur.ee/db#"
  },
  "select": ["?doc", "?title", "?score"],
  "values": [
    ["?queryVec"],
    [{"@value": [0.9, 0.1, 0.05], "@type": "f:embeddingVector"}]
  ],
  "where": [
    {"@id": "?doc", "@type": "ex:Article", "ex:title": "?title", "ex:embedding": "?vec"},
    ["bind", "?score", "(cosineSimilarity ?vec ?queryVec)"],
    ["filter", "(> ?score 0.5)"]
  ],
  "orderBy": [["desc", "?score"]],
  "limit": 5
}

Using a stored vector as the query vector

Instead of providing a literal vector, you can use a stored entity’s vector:

{
  "@context": {
    "ex": "http://example.org/ns/"
  },
  "select": ["?similar", "?score"],
  "where": [
    {"@id": "ex:reference-doc", "ex:embedding": "?queryVec"},
    {"@id": "?similar", "ex:embedding": "?vec"},
    ["filter", "(!= ?similar ex:reference-doc)"],
    ["bind", "?score", "(cosineSimilarity ?vec ?queryVec)"]
  ],
  "orderBy": [["desc", "?score"]],
  "limit": 10
}

Mixed datatypes

If a property contains both vector and non-vector values, the similarity functions return null for non-vector bindings:

{
  "@graph": [
    {"@id": "ex:a", "ex:data": {"@value": [0.6, 0.5], "@type": "@vector"}},
    {"@id": "ex:b", "ex:data": "Not a vector"}
  ]
}

Querying with dotProduct on ?data will return a numeric score for ex:a and null for ex:b.

SPARQL support

Inline vector similarity functions (dotProduct, cosineSimilarity, euclideanDistance) are available in both JSON-LD Query and SPARQL. In SPARQL, use them as built-in function calls within BIND expressions:

dotProduct (SPARQL)

PREFIX ex: <http://example.org/ns/>
PREFIX f: <https://ns.flur.ee/db#>

SELECT ?doc ?score
WHERE {
  VALUES ?queryVec { "[0.7, 0.6]"^^f:embeddingVector }
  ?doc ex:embedding ?vec ;
       ex:title ?title .
  BIND(dotProduct(?vec, ?queryVec) AS ?score)
}
ORDER BY DESC(?score)
LIMIT 10

cosineSimilarity (SPARQL)

PREFIX ex: <http://example.org/ns/>
PREFIX f: <https://ns.flur.ee/db#>

SELECT ?doc ?score
WHERE {
  VALUES ?queryVec { "[0.88, 0.12, 0.08]"^^f:embeddingVector }
  ?doc a ex:Article ;
       ex:embedding ?vec ;
       ex:title ?title .
  BIND(cosineSimilarity(?vec, ?queryVec) AS ?score)
  FILTER(?score > 0.5)
}
ORDER BY DESC(?score)
LIMIT 5

euclideanDistance (SPARQL)

PREFIX ex: <http://example.org/ns/>
PREFIX f: <https://ns.flur.ee/db#>

SELECT ?doc ?distance
WHERE {
  VALUES ?queryVec { "[0.7, 0.6]"^^f:embeddingVector }
  ?doc ex:embedding ?vec .
  BIND(euclideanDistance(?vec, ?queryVec) AS ?distance)
}
ORDER BY ?distance
LIMIT 10

Vector literals in SPARQL

In SPARQL, vectors are passed as JSON array strings with the ^^f:embeddingVector typed literal syntax:

VALUES ?queryVec { "[0.1, 0.2, 0.3]"^^f:embeddingVector }

Or with the full IRI:

VALUES ?queryVec { "[0.1, 0.2, 0.3]"^^<https://ns.flur.ee/db#embeddingVector> }

Function name variants

Function names are case-insensitive in SPARQL. All of these are equivalent:

  • dotProduct, DOTPRODUCT, dot_product
  • cosineSimilarity, COSINESIMILARITY, cosine_similarity
  • euclideanDistance, EUCLIDEANDISTANCE, euclidean_distance

HNSW Vector Indexes

For large-scale similarity search, Fluree provides dedicated HNSW (Hierarchical Navigable Small World) vector indexes. These are approximate nearest-neighbor (ANN) indexes that trade exact results for dramatically faster query times on large datasets.

Vector indexes are implemented using embedded usearch following the same architecture as BM25:

  • Embedded in-process HNSW indexes (no external service required)
  • Remote mode via dedicated search service (fluree-search-httpd)
  • Snapshot-based persistence with watermarks
  • Incremental sync for efficient updates
  • Feature-gated via vector feature flag

v1 limitation: HNSW vector search is head-only. Time-travel queries (e.g. @t:) are not supported.

Creating Vector Indexes

HTTP/Docker users: there is no HTTP endpoint for creating vector indexes today. Index creation is Rust-API-only. To use HNSW vector search from an HTTP-only deployment, create the index using a Rust program (or the Rust API embedded in your application) against the same storage path your Fluree server reads, then run queries normally via POST /v1/fluree/query.

Rust API

#![allow(unused)]
fn main() {
use fluree_db_api::{FlureeBuilder, VectorCreateConfig};
use fluree_db_query::vector::DistanceMetric;
use serde_json::json;

let fluree = FlureeBuilder::memory().build_memory();

// Create indexing query to select documents with embeddings
let indexing_query = json!({
    "@context": { "ex": "http://example.org/" },
    "where": [{ "@id": "?x", "@type": "ex:Document" }],
    "select": { "?x": ["@id", "ex:embedding"] }
});

// Create vector index
let config = VectorCreateConfig::new(
    "doc-embeddings",           // index name
    "mydb:main",                // source ledger
    indexing_query,             // what to index
    "ex:embedding",             // embedding property
    768                         // dimensions
)
.with_metric(DistanceMetric::Cosine);

let result = fluree.create_vector_index(config).await?;
println!("Indexed {} vectors", result.vector_count);
}

Configuration Options

| Option | Description | Default |
|---|---|---|
| name | Index name (creates graph source ID name:branch) | Required |
| ledger | Source ledger ID (name:branch) | Required |
| query | JSON-LD query selecting documents | Required |
| embedding_property | Property containing embeddings | Required |
| dimensions | Vector dimensions | Required |
| metric | Distance metric (Cosine, Dot, Euclidean) | Cosine |
| connectivity | HNSW M parameter | 16 |
| expansion_add | efConstruction parameter | 128 |
| expansion_search | efSearch parameter | 64 |

Query Syntax

Vector index search uses the f:* pattern syntax in WHERE clauses:

{
  "@context": {
    "ex": "http://example.org/",
    "f": "https://ns.flur.ee/db#"
  },
  "from": "mydb:main",
  "where": [
    {
      "f:graphSource": "doc-embeddings:main",
      "f:queryVector": [0.1, 0.2, 0.3],
      "f:distanceMetric": "cosine",
      "f:searchLimit": 10,
      "f:searchResult": {
        "f:resultId": "?doc",
        "f:resultScore": "?score"
      }
    }
  ],
  "select": ["?doc", "?score"]
}

Query Parameters

| Parameter | Description | Required |
|---|---|---|
| f:graphSource | Vector index alias | Yes |
| f:queryVector | Query vector (array or variable) | Yes |
| f:distanceMetric | Distance metric (“cosine”, “dot”, “euclidean”) | No (uses index default) |
| f:searchLimit | Maximum results | No |
| f:searchResult | Result binding (variable or object) | Yes |
| f:syncBeforeQuery | Wait for index sync before query | No (default: false) |
| f:timeoutMs | Query timeout in ms | No |

Result Binding

Simple variable binding:

"f:searchResult": "?doc"

Structured binding with score and ledger:

"f:searchResult": {
  "f:resultId": "?doc",
  "f:resultScore": "?similarity",
  "f:resultLedger": "?source"
}

Variable Query Vectors

Query vector can be a variable bound earlier:

{
  "where": [
    { "@id": "ex:reference-doc", "ex:embedding": "?queryVec" },
    {
      "f:graphSource": "embeddings:main",
      "f:queryVector": "?queryVec",
      "f:searchLimit": 5,
      "f:searchResult": "?similar"
    }
  ]
}

Index Maintenance

Sync Updates

After committing new data, sync the vector index:

#![allow(unused)]
fn main() {
let sync_result = fluree.sync_vector_index("doc-embeddings:main").await?;
println!("Upserted: {}, Removed: {}", sync_result.upserted, sync_result.removed);
}

Full Resync

Rebuild the entire index from scratch:

#![allow(unused)]
fn main() {
let resync_result = fluree.resync_vector_index("doc-embeddings:main").await?;
}

Check Staleness

#![allow(unused)]
fn main() {
let check = fluree.check_vector_staleness("doc-embeddings:main").await?;
if check.is_stale {
    println!("Index is {} commits behind", check.commits_behind);
}
}

Drop Index

#![allow(unused)]
fn main() {
fluree.drop_vector_index("doc-embeddings:main").await?;
}

Distance Metrics

Cosine (Default)

Measures angle between vectors. Best for:

  • Text embeddings (e.g., sentence transformers)
  • Normalized vectors
  • When magnitude doesn’t matter

Score range: [-1, 1] (1 = identical, 0 = orthogonal, -1 = opposite)

For unit-normalized vectors, cosine similarity equals dot product. Fluree’s SIMD kernels exploit this for faster computation when vectors are pre-normalized.

Dot Product

Measures alignment and magnitude. Best for:

  • Maximum inner product search (MIPS)
  • When vector magnitude encodes importance

Score range: (-inf, +inf)

Euclidean (L2)

Measures straight-line distance. Best for:

  • Geometric similarity
  • Image feature vectors
  • When absolute position matters

Raw score range: [0, +inf). In HNSW index results, normalized to (0, 1] via 1 / (1 + distance).

Note: In HNSW index results (f:* queries), all metrics are normalized to “higher is better”. In inline similarity functions, euclideanDistance returns the raw L2 distance (lower = more similar).
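The normalization applied to Euclidean results in HNSW queries is exactly the formula above:

```rust
// Euclidean distances from HNSW queries are mapped to "higher is better".
fn normalized_score(l2_distance: f64) -> f64 {
    1.0 / (1.0 + l2_distance)
}

fn main() {
    println!("d=0 -> {}", normalized_score(0.0)); // identical vectors -> 1.0
    println!("d=1 -> {}", normalized_score(1.0)); // -> 0.5
}
```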

Deployment Modes

Vector indexes support two deployment topologies: searching in-process (embedded) or via a dedicated fluree-search-httpd service that mounts the same storage. Both topologies use identical distance-metric computation, score normalization, and snapshot serialization, so results are identical.

Embedded Mode (Default)

The vector index is loaded and searched within the same process as the Fluree server. No additional services. This is the default and is appropriate for most deployments.

Dedicated Search Service

For large indexes or when you want search traffic isolated from the main Fluree process, run the standalone fluree-search-httpd binary on the same storage volume and have your application send vector requests directly to it.

Note: Today, vector search is invoked from a Fluree query (the f:graphSource / f:queryVector pattern) using the embedded path — the main Fluree server does not yet route those queries to a remote service. The dedicated service is reachable directly via its own POST /v1/search API (the same protocol BM25 uses), which is suitable for applications that issue vector queries outside of a Fluree query context. Transparent delegation from inside a Fluree query is a planned follow-up; the wiring is in place but the deployment config is not yet persisted by create_vector_index.

See Remote Search Service for fluree-search-httpd configuration, env vars, the request/response protocol (vector and vector_similar_to query kinds), and Docker deployment.

Performance and Scaling

The importance of binary indexing

Fluree’s binary columnar index dramatically accelerates vector queries. Queries against novelty-only (unindexed) data perform a linear scan through the in-memory commit log, while indexed queries read pre-sorted, cache-friendly columnar data. Ensure background indexing is running for production workloads – the difference is substantial.

The following benchmarks use 768-dimensional vectors (typical for transformer embeddings like sentence-transformers or OpenAI text-embedding-3-small) on Apple M-series hardware:

Novelty-only (no binary index)

| Scenario | Vectors | Query time | Throughput |
|---|---|---|---|
| Scan all | 1,000 | 9.9 ms | ~101K vec/s |
| Scan all | 5,000 | 45.1 ms | ~111K vec/s |
| Filtered + score | 1,000 (75 pass filter) | 13.5 ms | ~5.5K vec/s |
| Filtered + score | 5,000 (402 pass filter) | 62.1 ms | ~6.5K vec/s |

With binary index

| Scenario | Vectors | Query time | Throughput | Speedup vs novelty |
|---|---|---|---|---|
| Scan all | 1,000 | 1.68 ms | ~595K vec/s | 5.9x |
| Scan all | 5,000 | 7.69 ms | ~650K vec/s | 5.9x |
| Filtered + score | 1,000 (75 pass filter) | 533 us | ~141K vec/s | 25x |
| Filtered + score | 5,000 (402 pass filter) | 2.40 ms | ~168K vec/s | 26x |

Key takeaways:

  • Unfiltered scans are ~6x faster with the binary index
  • Filtered queries (where graph patterns reduce the candidate set before scoring) are ~25x faster – the index enables efficient predicate-first access that avoids loading irrelevant vectors entirely
  • At 5,000 vectors, a filtered indexed query completes in 2.4 ms – well within interactive latency budgets

Inline similarity functions (flat scan)

  • Best for: Small to medium datasets, ad-hoc similarity queries, prototyping
  • Complexity: O(n) linear scan – computes similarity against every matching vector
  • Advantage: No index setup required, works immediately after insert
  • SIMD acceleration: Fluree uses runtime-detected SIMD kernels (SSE2/AVX on x86_64, NEON on ARM) for vectorized dot/cosine/L2 computation
  • Normalized embedding optimization: For unit-normalized vectors (most transformer embeddings), cosine similarity reduces to a dot product, avoiding magnitude computation entirely

When to consider HNSW

Inline similarity functions perform a brute-force scan over all candidate vectors. This scales linearly and remains fast for moderate datasets, but at larger scales an HNSW index provides O(log n) approximate nearest-neighbor search.

Rule of thumb:

| Vector count (per property) | Recommendation |
|---|---|
| < 100K | Flat scan works well, especially with binary indexing. Sub-100ms queries typical. |
| 100K – 1M | Start evaluating HNSW. Flat scan may still be acceptable depending on latency target and hardware, but HNSW will provide more consistent low-latency results. |
| 1M – 10M | HNSW strongly recommended for interactive latency. Flat scan can work if vectors are memory-resident and you can tolerate ~1-2 second queries. |
| > 10M | HNSW (or other ANN index) is the default recommendation. Flat scan becomes I/O- and cache-bound for low-latency use cases. |

Factors that shift the crossover:

  • Hardware: Fast NVMe / large RAM pushes the threshold higher; object storage (S3) pulls it lower
  • Latency target: A 50 ms budget favors HNSW earlier than a 2-second budget
  • Filter selectivity: If graph patterns reduce candidates to a small fraction before scoring, flat scan remains viable at higher counts
  • Normalized embeddings: Cosine-as-dot-product is faster, pushing the threshold higher
  • Binary indexing: An indexed dataset scans ~6x faster than novelty-only, effectively raising the flat-scan ceiling

HNSW vector indexes

  • Best for: Large datasets (100K+ vectors), production similarity search with strict latency requirements
  • Complexity: O(log n) approximate nearest neighbor
  • Space: ~1.5x embedding size + IRI mapping overhead
  • Updates: Incremental via affected-subject tracking

Tuning parameters

| Parameter | Effect | Trade-off |
|---|---|---|
| connectivity (M) | Graph connectivity | Higher = better recall, more memory |
| expansion_add (efConstruction) | Build-time search width | Higher = better index quality, slower build |
| expansion_search (efSearch) | Query-time search width | Higher = better recall, slower queries |

Feature Flag

The HNSW vector index functionality requires the vector feature:

[dependencies]
fluree-db-api = { version = "0.1", features = ["vector"] }

Inline similarity functions (dotProduct, cosineSimilarity, euclideanDistance) and the @vector datatype are available without feature flags.

End-to-End Example

1. Insert documents with embeddings:

{
  "@context": {
    "ex": "http://example.org/",
    "f": "https://ns.flur.ee/db#"
  },
  "@graph": [
    {
      "@id": "ex:doc1",
      "@type": "ex:Article",
      "ex:title": "Introduction to Machine Learning",
      "ex:embedding": {"@value": [0.9, 0.1, 0.05], "@type": "@vector"}
    },
    {
      "@id": "ex:doc2",
      "@type": "ex:Article",
      "ex:title": "Database Design Patterns",
      "ex:embedding": {"@value": [0.1, 0.8, 0.1], "@type": "@vector"}
    },
    {
      "@id": "ex:doc3",
      "@type": "ex:Article",
      "ex:title": "Neural Network Architectures",
      "ex:embedding": {"@value": [0.85, 0.15, 0.1], "@type": "@vector"}
    }
  ]
}

2. Query – find articles similar to a “machine learning” embedding:

{
  "@context": {
    "ex": "http://example.org/",
    "f": "https://ns.flur.ee/db#"
  },
  "select": ["?title", "?score"],
  "values": [
    ["?queryVec"],
    [{"@value": [0.88, 0.12, 0.08], "@type": "f:embeddingVector"}]
  ],
  "where": [
    {"@id": "?doc", "@type": "ex:Article", "ex:title": "?title", "ex:embedding": "?vec"},
    ["bind", "?score", "(cosineSimilarity ?vec ?queryVec)"]
  ],
  "orderBy": [["desc", "?score"]],
  "limit": 5
}

Expected results (ordered by similarity):

  1. “Introduction to Machine Learning” – highest cosine similarity
  2. “Neural Network Architectures” – similar domain
  3. “Database Design Patterns” – different domain, lower score

Geospatial Data

Fluree provides native support for geographic point data using the OGC GeoSPARQL standard. POINT geometries from geo:wktLiteral values are stored in an optimized binary format enabling efficient storage and index-accelerated proximity queries.

Status

Geospatial support is implemented with:

  • Inline GeoPoint encoding: POINT geometries stored as packed 60-bit lat/lng values
  • Automatic detection: geo:wktLiteral POINT values automatically converted to native format
  • Full round-trip: GeoPoints preserved through commit, index, and query paths
  • ~0.3mm precision: 30-bit encoding per coordinate provides sub-millimeter accuracy
  • Index-accelerated proximity queries: POST latitude-band scans with haversine post-filtering
  • Time travel support: Point-in-time geo queries via from: "<ledger>@t:<t>" (see examples below)

Non-POINT geometries (polygons, linestrings, multipolygons, etc.) are indexed using a separate S2 cell-based spatial index that enables efficient containment and intersection queries.

Storing Geographic Data

WKT Literal Format

Geographic data uses the Well-Known Text (WKT) format with the geo:wktLiteral datatype:

{
  "@context": {
    "ex": "http://example.org/",
    "geo": "http://www.opengis.net/ont/geosparql#"
  },
  "@graph": [
    {
      "@id": "ex:eiffel-tower",
      "@type": "ex:Landmark",
      "ex:name": "Eiffel Tower",
      "ex:location": {
        "@value": "POINT(2.2945 48.8584)",
        "@type": "geo:wktLiteral"
      }
    }
  ]
}

Important: WKT uses POINT(longitude latitude) order (X, Y), which is the opposite of common lat/lng conventions.

Coordinate Order

| Format | Order | Example |
|---|---|---|
| WKT | longitude, latitude | POINT(2.2945 48.8584) |
| Common conventions | latitude, longitude | 48.8584, 2.2945 |

Fluree handles the conversion internally, storing coordinates in latitude-primary order for efficient latitude-band index scans.

Valid POINT Syntax

Fluree recognizes these POINT formats:

POINT(2.2945 48.8584)           # Standard 2D point
POINT( 2.2945  48.8584 )        # Whitespace is flexible
POINT(-122.4194 37.7749)        # Negative coordinates (San Francisco)

The following are not supported for native GeoPoint storage (stored as strings instead):

POINT EMPTY                      # Empty point
POINT Z(2.2945 48.8584 100)     # 3D point with altitude
POINT M(2.2945 48.8584 1.0)     # Point with measure
POINT ZM(2.2945 48.8584 100 1)  # 3D point with measure
<http://...>POINT(...)          # SRID prefix
point(2.2945 48.8584)           # Lowercase (case-sensitive)

Coordinate Validation

Coordinates must be within valid ranges:

  • Latitude: -90.0 to 90.0 (degrees)
  • Longitude: -180.0 to 180.0 (degrees)
  • Finite values only: NaN and infinity are rejected

Invalid coordinates cause the value to be stored as a plain string rather than a native GeoPoint.

Querying Geographic Data

Basic Retrieval

GeoPoints are returned in WKT format in query results:

{
  "@context": {
    "ex": "http://example.org/",
    "geo": "http://www.opengis.net/ont/geosparql#"
  },
  "from": "places:main",
  "where": [
    { "@id": "?place", "@type": "ex:Landmark" },
    { "@id": "?place", "ex:location": "?loc" }
  ],
  "select": ["?place", "?loc"]
}

Result:

[
  ["ex:eiffel-tower", "POINT(2.2945 48.8584)"]
]

SPARQL Queries

PREFIX ex: <http://example.org/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>

SELECT ?place ?location
WHERE {
  ?place a ex:Landmark ;
         ex:location ?location .
}

Output Formats

GeoPoints appear differently based on output format:

JSON-LD (default):

{
  "@id": "ex:eiffel-tower",
  "ex:location": {
    "@value": "POINT(2.2945 48.8584)",
    "@type": "geo:wktLiteral"
  }
}

SPARQL JSON:

{
  "type": "literal",
  "value": "POINT(2.2945 48.8584)",
  "datatype": "http://www.opengis.net/ont/geosparql#wktLiteral"
}

Typed JSON:

{
  "@value": "POINT(2.2945 48.8584)",
  "@type": "geo:wktLiteral"
}

Storage Encoding

Binary Format

GeoPoints are stored using a compact 60-bit encoding:

  • Upper 30 bits: Latitude scaled from [-90, 90] to [0, 2^30-1]
  • Lower 30 bits: Longitude scaled from [-180, 180] to [0, 2^30-1]

This provides:

  • 8 bytes total storage per point (vs ~25+ bytes for WKT string)
  • ~0.3mm precision at the equator
  • Ordered encoding enabling efficient range scans by latitude band
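A sketch of the packing under the stated scaling. The exact rounding and bit layout are implementation details that may differ; `pack_point` and `unpack_point` are illustrative names, not Fluree APIs:

```rust
// 30 bits per coordinate, scaled onto [0, 2^30 - 1].
const SCALE: f64 = ((1u64 << 30) - 1) as f64;

// Pack latitude into the upper 30 bits and longitude into the lower 30 bits.
fn pack_point(lat: f64, lng: f64) -> u64 {
    let lat_bits = (((lat + 90.0) / 180.0) * SCALE).round() as u64;
    let lng_bits = (((lng + 180.0) / 360.0) * SCALE).round() as u64;
    (lat_bits << 30) | lng_bits
}

fn unpack_point(packed: u64) -> (f64, f64) {
    let lat = ((packed >> 30) as f64 / SCALE) * 180.0 - 90.0;
    let lng = ((packed & ((1u64 << 30) - 1)) as f64 / SCALE) * 360.0 - 180.0;
    (lat, lng)
}

fn main() {
    let (lat, lng) = unpack_point(pack_point(48.8584, 2.2945));
    println!("round-trip: POINT({:.4} {:.4})", lng, lat);
    // Latitude occupies the upper bits, so packed values sort by latitude
    // first; this is what makes latitude-band range scans possible.
    assert!(pack_point(10.0, 170.0) < pack_point(11.0, -170.0));
}
```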

Index Structure

GeoPoints use ObjKind::GEO_POINT (0x14) in the binary index:

| Component | Encoding |
|---|---|
| Object kind | 1 byte (0x14) |
| Object key | 8 bytes (packed lat/lng) |

The latitude-primary encoding enables POST index scans that efficiently retrieve all points within a latitude band.

Distance Queries

Fluree supports the geof:distance function (OGC GeoSPARQL) for calculating haversine distances between geographic points.

geof:distance Function

Calculate the distance between two points in meters:

JSON-LD Query (bind + filter):

{
  "@context": {
    "ex": "http://example.org/",
    "geo": "http://www.opengis.net/ont/geosparql#"
  },
  "from": "places:main",
  "where": [
    { "@id": "?place", "ex:location": "?loc" },
    { "@id": "ex:paris", "ex:location": "?parisLoc" },
    ["bind", "?distance", "(geof:distance ?loc ?parisLoc)"],
    ["filter", "(< ?distance 500000)"]
  ],
  "select": ["?place", "?distance"]
}

SPARQL:

PREFIX ex: <http://example.org/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?place ?distance
WHERE {
  ?place ex:location ?loc .
  ex:paris ex:location ?parisLoc .
  BIND(geof:distance(?loc, ?parisLoc) AS ?distance)
  FILTER(?distance < 500000)
}
ORDER BY ?distance

Function aliases: geof:distance, geo_distance, geodistance

Arguments:

  • Two GeoPoint values (stored as geo:wktLiteral POINT)
  • Or two WKT POINT strings

Returns: Distance in meters (Double)

Calculation: Uses the haversine formula with Earth’s mean radius (6,371 km), accurate to within 0.3% for typical distances.
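The computation can be reproduced directly from the description above; this sketch uses the stated mean radius of 6,371 km:

```rust
const EARTH_RADIUS_M: f64 = 6_371_000.0; // mean radius used by geof:distance

// Haversine distance in meters between two (lat, lng) points in degrees.
fn haversine_m(lat1: f64, lng1: f64, lat2: f64, lng2: f64) -> f64 {
    let (phi1, phi2) = (lat1.to_radians(), lat2.to_radians());
    let dphi = (lat2 - lat1).to_radians();
    let dlambda = (lng2 - lng1).to_radians();
    let a = (dphi / 2.0).sin().powi(2)
        + phi1.cos() * phi2.cos() * (dlambda / 2.0).sin().powi(2);
    2.0 * EARTH_RADIUS_M * a.sqrt().asin()
}

fn main() {
    // One degree of latitude is ~111.2 km anywhere on the sphere.
    let d = haversine_m(48.0, 2.0, 49.0, 2.0);
    println!("1 degree of latitude = {:.0} m", d);
}
```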

Fluree supports index-accelerated proximity queries that find points within a given distance of a center point.

Index-Accelerated Point Proximity

Use a geof:distance bind + filter pattern to run an accelerated proximity search over inline GeoPoints. This pattern works identically in both JSON-LD and SPARQL queries — the query optimizer detects the Triple + Bind(geof:distance) + Filter combination and rewrites it into an index-accelerated scan.

JSON-LD Query (find restaurants within 5km, include distance, limit to 10):

{
  "@context": {
    "ex": "http://example.org/",
    "geo": "http://www.opengis.net/ont/geosparql#"
  },
  "from": "places:main",
  "where": [
    { "@id": "?place", "@type": "ex:Restaurant" },
    { "@id": "?place", "ex:location": "?loc" },
    ["bind", "?distance", "(geof:distance ?loc \"POINT(2.35 48.85)\")"],
    ["filter", "(<= ?distance 5000)"]
  ],
  "select": ["?place", "?distance"],
  "orderBy": ["?distance"],
  "limit": 10
}

SPARQL (same pattern, same acceleration):

PREFIX ex: <http://example.org/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?station ?distance
WHERE {
  ?station a ex:GasStation ;
           ex:location ?loc .
  BIND(geof:distance(?loc, "POINT(2.35 48.85)"^^geo:wktLiteral) AS ?distance)
  FILTER(?distance < 10000)
}
ORDER BY ?distance
LIMIT 10

How Index Acceleration Works

  1. Latitude-band scan: The query planner converts the radius to latitude bounds and scans only points in [lat - δ, lat + δ]
  2. Haversine post-filter: Results are filtered by exact haversine distance to eliminate false positives
  3. Distance sorting: Results can be sorted by distance for k-nearest-neighbor queries

Performance characteristics:

  • Uses POST index with latitude-primary encoding
  • Scans only relevant latitude band (not full table scan)
  • False positive rate: 22-70% depending on latitude and radius (eliminated by post-filter)
  • Handles antimeridian crossing with multiple range scans
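The scan-then-filter pipeline above can be sketched against a toy in-memory point set. This is an illustration of the algorithm only; the function names and data layout are hypothetical, and the real implementation scans the POST index rather than a Python dict:

```python
import math

EARTH_RADIUS_M = 6_371_000
M_PER_DEG_LAT = math.pi * EARTH_RADIUS_M / 180  # ~111 km per degree of latitude

def haversine_m(lon1, lat1, lon2, lat2):
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def within_radius(points, center_lon, center_lat, radius_m):
    """Toy latitude-band scan: cheap band prefilter, exact haversine post-filter."""
    delta = radius_m / M_PER_DEG_LAT          # radius converted to degrees of latitude
    lo, hi = center_lat - delta, center_lat + delta
    hits = []
    for pid, (lon, lat) in points.items():
        if not (lo <= lat <= hi):             # band scan: skip points outside [lat-d, lat+d]
            continue
        d = haversine_m(lon, lat, center_lon, center_lat)
        if d <= radius_m:                     # post-filter eliminates band false positives
            hits.append((pid, d))
    return sorted(hits, key=lambda h: h[1])   # distance sort enables k-NN via LIMIT

places = {"eiffel": (2.2945, 48.8584), "louvre": (2.3376, 48.8606), "london": (-0.1278, 51.5074)}
print(within_radius(places, 2.35, 48.85, 5000))
```

London is discarded by the band check alone; the two Paris landmarks survive the band and are confirmed by the exact distance test.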

Time Travel Support

Point proximity queries support time travel via the from ledger selector.

JSON-LD with time travel:

{
  "@context": {
    "ex": "http://example.org/",
    "geo": "http://www.opengis.net/ont/geosparql#"
  },
  "from": "places:main@t:100",
  "where": [
    { "@id": "?place", "ex:location": "?loc" },
    ["bind", "?dist", "(geof:distance ?loc \"POINT(2.35 48.85)\")"],
    ["filter", "(<= ?dist 5000)"]
  ],
  "select": ["?place"]
}

SPARQL with time travel:

PREFIX ex: <http://example.org/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?place ?dist
FROM <ledger:places:main?t=100>
WHERE {
  ?place ex:location ?loc .
  BIND(geof:distance(?loc, "POINT(2.35 48.85)"^^geo:wktLiteral) AS ?dist)
  FILTER(?dist <= 5000)
}

Time travel correctly handles:

  • Points that existed at time t but were later retracted
  • Points added after time t (excluded from results)
  • Overlay novelty merging for recent uncommitted data

Graph Scoping

Point proximity queries respect graph context. When used inside a GRAPH pattern, the query scans only the specified named graph:

{
  "@context": {
    "ex": "http://example.org/",
    "geo": "http://www.opengis.net/ont/geosparql#"
  },
  "from": "world:main",
  "where": [
    ["graph", "http://example.org/france", [
      { "@id": "?city", "ex:location": "?loc" },
      ["bind", "?dist", "(geof:distance ?loc \"POINT(2.35 48.85)\")"],
      ["filter", "(<= ?dist 50000)"]
    ]]
  ],
  "select": ["?city"]
}

This returns only cities from the France graph within 50km of Paris, not cities from other named graphs.

S2 Spatial Index (Complex Geometries)

Fluree provides an S2 cell-based spatial index for complex geometries (polygons, linestrings, multipolygons). This index enables efficient spatial predicate queries like “find all places within this region” or “find all regions that contain this point.”

Supported Operations

| Operation | Description | Use Case |
|-----------|-------------|----------|
| within | Find geometries that are completely inside a query geometry | “Find all buildings within this city boundary” |
| contains | Find geometries that completely contain a query geometry | “Find the district that contains this point” |
| intersects | Find geometries that overlap with a query geometry | “Find all parcels that touch this proposed road” |
| nearby | Find geometries within a radius (with distances) | “Find polygons within 10km of this point” |

Query Syntax

JSON-LD Query (find places within a polygon):

{
  "@context": {
    "ex": "http://example.org/",
    "geo": "http://www.opengis.net/ont/geosparql#",
    "idx": "https://ns.flur.ee/index#"
  },
  "from": "places:main",
  "where": [
    {
      "idx:spatial": "within",
      "idx:property": "ex:boundary",
      "idx:geometry": "POLYGON((2.0 48.0, 3.0 48.0, 3.0 49.0, 2.0 49.0, 2.0 48.0))",
      "idx:result": "?place"
    }
  ],
  "select": ["?place"]
}

Find regions containing a point:

{
  "where": [
    {
      "idx:spatial": "contains",
      "idx:property": "ex:boundary",
      "idx:geometry": "POINT(2.35 48.85)",
      "idx:result": "?district"
    }
  ],
  "select": ["?district"]
}

Find intersecting parcels:

{
  "where": [
    {
      "idx:spatial": "intersects",
      "idx:property": "ex:parcel",
      "idx:geometry": "LINESTRING(2.0 48.0, 3.0 49.0)",
      "idx:result": "?parcel"
    }
  ],
  "select": ["?parcel"]
}

Find polygons near a point (with distances):

{
  "@context": {
    "ex": "http://example.org/",
    "idx": "https://ns.flur.ee/index#"
  },
  "from": "places:main",
  "where": [
    {
      "idx:spatial": "nearby",
      "idx:property": "ex:boundary",
      "idx:geometry": "POINT(2.35 48.85)",
      "idx:radius": 10000,
      "idx:result": {
        "idx:id": "?region",
        "idx:distance": "?dist"
      }
    }
  ],
  "select": ["?region", "?dist"],
  "orderBy": ["?dist"]
}

How It Works

The S2 spatial index uses Google’s S2 geometry library to map geometries to hierarchical cells on a sphere:

  1. Ingestion: When a geo:wktLiteral polygon/linestring is committed, the indexer generates an S2 cell covering and stores cell entries in the spatial index.

  2. Query: When you query with a spatial predicate, the system:

    • Generates an S2 covering for your query geometry
    • Scans the index for matching cell ranges
    • Applies bounding-box prefiltering
    • Performs exact geometry tests on candidates
  3. Time-Travel: The index supports full time-travel semantics, so you can query spatial data at any historical point in time.

Index Configuration

The S2 index is automatically created for predicates with geo:wktLiteral values. Configuration options:

| Parameter | Default | Description |
|-----------|---------|-------------|
| min_level | 4 | Minimum S2 cell level (coarser = faster build) |
| max_level | 16 | Maximum S2 cell level (finer = tighter coverage) |
| max_cells | 8 | Maximum cells per geometry covering |

Higher max_cells values produce tighter coverings (fewer false positives) but increase index size and build time.
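To build intuition for how min_level, max_level, and max_cells interact, here is a toy covering algorithm over a flat quadtree. This is a deliberate simplification standing in for S2 (real S2 cells live on a sphere and use a Hilbert-curve cell ID scheme), but the budget tradeoff is the same: refinement stops once a finer covering would exceed max_cells:

```python
def covering(bbox, min_level=1, max_level=4, max_cells=8):
    """Toy quadtree covering: refine cells over a [0,1)x[0,1) world until the
    covering is tight (max_level) or the cell budget (max_cells) is exhausted."""
    def intersects(cell, box):
        cx, cy, size = cell  # cell covers [cx, cx+size) x [cy, cy+size)
        return not (cx + size <= box[0] or box[2] <= cx or
                    cy + size <= box[1] or box[3] <= cy)

    def children(cell):
        cx, cy, size = cell
        h = size / 2
        return [(cx, cy, h), (cx + h, cy, h), (cx, cy + h, h), (cx + h, cy + h, h)]

    cells, level = [(0.0, 0.0, 1.0)], 0
    while level < min_level:                  # subdivide down to the coarsest level
        cells = [ch for c in cells for ch in children(c)]
        level += 1
    cells = [c for c in cells if intersects(c, bbox)]
    while level < max_level:
        refined = [ch for c in cells for ch in children(c) if intersects(ch, bbox)]
        if len(refined) > max_cells:          # budget exceeded: keep the coarser covering
            break
        cells, level = refined, level + 1
    return cells

cov = covering((0.1, 0.1, 0.3, 0.2), max_cells=8)
print(len(cov), "cells at size", cov[0][2])
```

Raising max_cells lets the loop refine one level further, trading a larger covering for fewer false-positive candidates at query time.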

Performance Characteristics

Performance depends on data distribution, covering configuration, and result selectivity. See Spatial Index Design for design rationale; a benchmark suite is recommended for deployment-specific measurements.

Supported Geometry Types

| Geometry Type | S2 Index | Notes |
|---------------|----------|-------|
| POLYGON | ✅ Yes | Most common for region queries |
| MULTIPOLYGON | ✅ Yes | Multiple disjoint regions |
| LINESTRING | ✅ Yes | Routes, boundaries |
| MULTILINESTRING | ✅ Yes | Multiple line segments |
| POINT | ⚠️ Optional | Use inline GeoPoint for proximity; S2 available with index_points=true |
| GEOMETRYCOLLECTION | ✅ Yes | Mixed geometry types |

Graph Scoping

Spatial indexes are scoped by named graph. Each graph has its own spatial index, and queries automatically use the correct index based on the graph context.

Default graph query:

{
  "from": "mydb:main",
  "where": [
    {
      "idx:spatial": "within",
      "idx:property": "ex:boundary",
      "idx:geometry": "POLYGON(...)",
      "idx:result": "?region"
    }
  ]
}

Named graph query (using GRAPH pattern):

{
  "from": "mydb:main",
  "where": [
    ["graph", "http://example.org/regions",
     {
       "idx:spatial": "within",
       "idx:property": "ex:boundary",
       "idx:geometry": "POLYGON(...)",
       "idx:result": "?region"
     }
    ]
  ]
}

When you enter a GRAPH pattern, the spatial query automatically switches to that graph’s index. This ensures results are correctly scoped—a spatial query inside GRAPH <http://example.org/france> only searches geometries in the France graph, not geometries from other named graphs.

Multiple named graphs:

If you have data across multiple named graphs (e.g., countries), you can query each independently:

{
  "from": "world:main",
  "where": [
    ["graph", "http://example.org/germany",
     {
       "idx:spatial": "within",
       "idx:property": "ex:boundary",
       "idx:geometry": "POLYGON(...)",
       "idx:result": "?germanCity"
     }
    ]
  ]
}

The same idx:property (e.g., ex:boundary) in different named graphs will query separate spatial indexes.

Time-Travel Support

Spatial queries support time travel via the from ledger selector:

{
  "from": "places:main@t:100",
  "where": [
    {
      "idx:spatial": "within",
      "idx:property": "ex:boundary",
      "idx:geometry": "POLYGON(...)",
      "idx:result": "?place"
    }
  ],
  "select": ["?place"]
}

This returns places as they existed at transaction time 100, correctly handling:

  • Geometries added after t=100 (excluded)
  • Geometries retracted before t=100 (excluded)
  • Geometries modified between t=100 and now

Note: Time travel requires t >= index.base_t. Queries for times before the index was built will return an error.

Note (v1): The historical-view API (query_historical) does not execute spatial index patterns. Use a time-pinned from selector (as above) against the current ledger state for spatial time travel.

Choosing Between Point Proximity and S2 Spatial Queries

Fluree provides two spatial query paths. Use this guide to pick the right one:

| Use Case | Approach | Reason |
|----------|----------|--------|
| “Find restaurants near me” | geof:distance bind+filter | POINT proximity with distance ranking |
| “Find cities within 100km” | geof:distance bind+filter | POINT data with radius filter |
| “Find buildings in this district” | idx:spatial (within) | POLYGONs inside a boundary |
| “Which zone contains this address?” | idx:spatial (contains) | POLYGON containment test |
| “Find parcels crossing this road” | idx:spatial (intersects) | LINESTRING intersection |
| “Find regions near this location” | idx:spatial (nearby) | POLYGONs with distance from point |

Quick rule: Use geof:distance bind+filter for POINT locations with radius queries. Use idx:spatial for polygon/linestring containment, intersection, or region-based queries.

End-to-End Example: Points and Polygons

This example shows storing both POINT locations and POLYGON boundaries, then querying each appropriately.

1. Insert data with both geometry types:

{
  "@context": {
    "ex": "http://example.org/",
    "geo": "http://www.opengis.net/ont/geosparql#"
  },
  "@graph": [
    {
      "@id": "ex:central-paris",
      "@type": "ex:District",
      "ex:name": "Central Paris",
      "ex:boundary": {
        "@value": "POLYGON((2.3 48.8, 2.4 48.8, 2.4 48.9, 2.3 48.9, 2.3 48.8))",
        "@type": "geo:wktLiteral"
      }
    },
    {
      "@id": "ex:eiffel-tower",
      "@type": "ex:Landmark",
      "ex:name": "Eiffel Tower",
      "ex:location": {
        "@value": "POINT(2.2945 48.8584)",
        "@type": "geo:wktLiteral"
      }
    },
    {
      "@id": "ex:louvre",
      "@type": "ex:Landmark",
      "ex:name": "Louvre Museum",
      "ex:location": {
        "@value": "POINT(2.3376 48.8606)",
        "@type": "geo:wktLiteral"
      }
    }
  ]
}

2. Find landmarks near Eiffel Tower (POINT proximity):

{
  "@context": {
    "ex": "http://example.org/",
    "geo": "http://www.opengis.net/ont/geosparql#"
  },
  "from": "places:main",
  "where": [
    { "@id": "?place", "ex:location": "?loc" },
    ["bind", "?dist", "(geof:distance ?loc \"POINT(2.2945 48.8584)\")"],
    ["filter", "(<= ?dist 5000)"],
    { "@id": "?place", "ex:name": "?name" }
  ],
  "select": ["?name", "?dist"],
  "orderBy": ["?dist"]
}

3. Find which district contains the Louvre (POLYGON containment):

{
  "@context": {
    "ex": "http://example.org/",
    "idx": "https://ns.flur.ee/index#"
  },
  "from": "places:main",
  "where": [
    {
      "idx:spatial": "contains",
      "idx:property": "ex:boundary",
      "idx:geometry": "POINT(2.3376 48.8606)",
      "idx:result": "?district"
    },
    { "@id": "?district", "ex:name": "?name" }
  ],
  "select": ["?name"]
}

MULTIPOLYGON Example

Store regions with multiple disjoint areas (e.g., archipelagos, non-contiguous territories):

{
  "@context": {
    "ex": "http://example.org/",
    "geo": "http://www.opengis.net/ont/geosparql#"
  },
  "@id": "ex:hawaii",
  "@type": "ex:State",
  "ex:name": "Hawaii",
  "ex:territory": {
    "@value": "MULTIPOLYGON(((-160 22, -159 22, -159 21, -160 21, -160 22)), ((-156 20, -155 20, -155 19, -156 19, -156 20)))",
    "@type": "geo:wktLiteral"
  }
}

Query: “Find states that contain this coordinate”

{
  "where": [
    {
      "idx:spatial": "contains",
      "idx:property": "ex:territory",
      "idx:geometry": "POINT(-155.5 19.5)",
      "idx:result": "?state"
    }
  ],
  "select": ["?state"]
}

LINESTRING Example

Store routes, roads, or boundaries:

{
  "@context": {
    "ex": "http://example.org/",
    "geo": "http://www.opengis.net/ont/geosparql#"
  },
  "@id": "ex:route-66",
  "@type": "ex:Highway",
  "ex:name": "Route 66",
  "ex:path": {
    "@value": "LINESTRING(-118.2 34.1, -112.0 35.2, -106.6 35.1, -97.5 35.5, -90.2 38.6, -87.6 41.9)",
    "@type": "geo:wktLiteral"
  }
}

Query: “Find highways that cross this region”

{
  "where": [
    {
      "idx:spatial": "intersects",
      "idx:property": "ex:path",
      "idx:geometry": "POLYGON((-100 34, -95 34, -95 37, -100 37, -100 34))",
      "idx:result": "?highway"
    }
  ],
  "select": ["?highway"]
}

Planned Capabilities

R-tree Index

An ephemeral R-tree is planned for:

  • Spatial joins between datasets
  • Range queries across multiple properties

Examples

Storing Multiple Locations

{
  "@context": {
    "ex": "http://example.org/",
    "geo": "http://www.opengis.net/ont/geosparql#"
  },
  "@graph": [
    {
      "@id": "ex:paris",
      "@type": "ex:City",
      "ex:name": "Paris",
      "ex:center": { "@value": "POINT(2.3522 48.8566)", "@type": "geo:wktLiteral" }
    },
    {
      "@id": "ex:london",
      "@type": "ex:City",
      "ex:name": "London",
      "ex:center": { "@value": "POINT(-0.1278 51.5074)", "@type": "geo:wktLiteral" }
    },
    {
      "@id": "ex:tokyo",
      "@type": "ex:City",
      "ex:name": "Tokyo",
      "ex:center": { "@value": "POINT(139.6917 35.6895)", "@type": "geo:wktLiteral" }
    }
  ]
}

Turtle Format

@prefix ex: <http://example.org/> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .

ex:sensor-1 a ex:WeatherStation ;
    ex:name "Central Park Station" ;
    ex:location "POINT(-73.9654 40.7829)"^^geo:wktLiteral .

ex:sensor-2 a ex:WeatherStation ;
    ex:name "Times Square Station" ;
    ex:location "POINT(-73.9855 40.7580)"^^geo:wktLiteral .

Mixed Geometry Types

Non-POINT geometries are stored as strings:

{
  "@context": {
    "ex": "http://example.org/",
    "geo": "http://www.opengis.net/ont/geosparql#"
  },
  "@graph": [
    {
      "@id": "ex:central-park",
      "@type": "ex:Park",
      "ex:name": "Central Park",
      "ex:entrance": {
        "@value": "POINT(-73.9654 40.7829)",
        "@type": "geo:wktLiteral"
      },
      "ex:boundary": {
        "@value": "POLYGON((-73.9819 40.7681, -73.9580 40.8006, -73.9493 40.7969, -73.9732 40.7644, -73.9819 40.7681))",
        "@type": "geo:wktLiteral"
      }
    }
  ]
}

The ex:entrance POINT is stored as a native GeoPoint, while the ex:boundary POLYGON is stored as a string.

Fluree supports the GeoSPARQL geo:wktLiteral datatype and geof:distance function. Point proximity queries use a unified geof:distance bind+filter pattern in both JSON-LD and SPARQL. For complex geometry queries (within/contains/intersects/nearby), use the JSON-LD idx:spatial pattern described above.

| Feature | Status |
|---------|--------|
| geo:wktLiteral datatype | ✅ Supported |
| POINT geometry | ✅ Native encoding (60-bit packed) |
| LINESTRING geometry | ✅ S2 spatial index |
| POLYGON geometry | ✅ S2 spatial index |
| MULTIPOLYGON geometry | ✅ S2 spatial index |
| geo:asWKT property | ✅ Use any property with wktLiteral type |
| geof:distance function | ✅ Supported (haversine, ~0.3% accuracy) |
| Proximity queries (radius) | ✅ Index-accelerated via geof:distance bind+filter |
| Time travel | ✅ Supported via from: "<ledger>@t:<t>" |
| k-NN queries (nearest K) | ✅ Via ORDER BY distance + LIMIT |
| within spatial predicate | ✅ Via JSON-LD idx:spatial |
| contains spatial predicate | ✅ Via JSON-LD idx:spatial |
| intersects spatial predicate | ✅ Via JSON-LD idx:spatial |
| Spatial join (two variables) | 🔜 Planned (R-tree) |

Best Practices

Use geo:wktLiteral for All Geometry

Always declare the datatype explicitly:

// Correct
{ "@value": "POINT(2.3522 48.8566)", "@type": "geo:wktLiteral" }

// Incorrect - stored as plain string
{ "@value": "POINT(2.3522 48.8566)" }

Coordinate Precision

While Fluree stores coordinates with ~0.3 mm precision, match the precision you record to the accuracy of your source data:

// Excessive precision (GPS typically ±3-5m)
"POINT(2.352219834765 48.856614892341)"

// Appropriate precision for most applications
"POINT(2.3522 48.8566)"
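A quick way to pick a sensible number of decimal places is to compute the worst-case position error that rounding introduces. This sketch (illustrative helper, not part of Fluree) uses the common approximation of ~111,320 m per degree of latitude:

```python
import math

M_PER_DEG_LAT = 111_320  # approximate meters per degree of latitude

def rounding_error_m(lat_deg, decimals):
    """Worst-case position error (meters) from rounding both coordinates
    to `decimals` decimal places."""
    step = 10 ** -decimals / 2                              # max rounding error in degrees
    dlat = step * M_PER_DEG_LAT
    dlon = step * M_PER_DEG_LAT * math.cos(math.radians(lat_deg))
    return math.hypot(dlat, dlon)

for d in (4, 6, 9):
    print(d, "decimals ->", round(rounding_error_m(48.86, d), 5), "m")
```

Four decimals already bounds the error under ~7 m at Paris latitudes, comparable to consumer GPS accuracy, which is why `POINT(2.3522 48.8566)` is sufficient for most applications.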

Coordinate Validation

Validate coordinates before insertion:

  • Latitude: -90 to 90
  • Longitude: -180 to 180
  • No NaN or infinity values

Invalid coordinates are stored as strings and won’t benefit from native GeoPoint indexing.
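A minimal pre-insertion validator covering these three checks might look like the following (a hypothetical client-side helper, not a Fluree API):

```python
import math
import re

_POINT_RE = re.compile(r"^POINT\(\s*(\S+)\s+(\S+)\s*\)$")

def validate_wkt_point(wkt):
    """Return (lon, lat) if the WKT POINT has valid coordinates, else raise ValueError."""
    m = _POINT_RE.match(wkt.strip())
    if not m:
        raise ValueError(f"not a WKT POINT: {wkt!r}")
    lon, lat = float(m.group(1)), float(m.group(2))
    if not (math.isfinite(lon) and math.isfinite(lat)):
        raise ValueError("coordinates must be finite (no NaN/inf)")
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude out of range: {lon}")
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat}")
    return lon, lat

print(validate_wkt_point("POINT(2.35 48.86)"))  # (2.35, 48.86)
```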

Troubleshooting

Query returns no results

Check the coordinate order. WKT uses POINT(longitude latitude), not POINT(latitude longitude):

// Correct: Paris (lng=2.35, lat=48.86)
"POINT(2.35 48.86)"

// Wrong: coordinates swapped
"POINT(48.86 2.35)"

Check the datatype. Geometry values must use geo:wktLiteral:

// Correct
{ "@value": "POINT(2.35 48.86)", "@type": "geo:wktLiteral" }

// Wrong - no datatype, stored as plain string
{ "@value": "POINT(2.35 48.86)" }

Check the predicate. The property in the triple pattern must match the data exactly:

// If data uses ex:location, the triple must use ex:location
{ "@id": "?place", "ex:location": "?loc" }    // Correct
{ "@id": "?place", "ex:geo": "?loc" }         // Wrong - different predicate

For S2 spatial queries, idx:property must also match:

"idx:property": "ex:boundary"    // Correct
"idx:property": "ex:geo"         // Wrong - different predicate

“No spatial index available” error

The spatial index is built asynchronously after commits. If querying immediately after insert:

  • Wait for background indexing to complete, or
  • Use from: "<ledger>@t:<t>" to query up to the indexed t

Large polygons cause slow queries

Polygons crossing the antimeridian (±180° longitude) generate many S2 cells. Consider:

  • Splitting the polygon at the antimeridian
  • Using a simpler bounding region for initial filtering
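Splitting at the antimeridian amounts to turning one wrapped longitude interval into two ordinary ones that each stay inside [-180, 180]. An illustrative sketch of that bookkeeping:

```python
def split_lon_range(center_lon, half_width_deg):
    """Split a longitude interval that crosses the antimeridian into
    plain [-180, 180] ranges that can each be scanned independently."""
    lo, hi = center_lon - half_width_deg, center_lon + half_width_deg
    if lo < -180:                        # wrapped past -180: two ranges
        return [(-180.0, hi), (lo + 360, 180.0)]
    if hi > 180:                         # wrapped past +180: two ranges
        return [(lo, 180.0), (-180.0, hi - 360)]
    return [(lo, hi)]                    # no crossing: a single range

print(split_lon_range(2.0, 1.0))        # [(1.0, 3.0)]
print(split_lon_range(179.5, 1.0))      # [(178.5, 180.0), (-180.0, -179.5)]
```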

SPARQL spatial predicates not accelerated

In v1, SPARQL geof:* spatial predicates (like geof:sfWithin) evaluate as filters, not index operators. For accelerated spatial queries on complex geometries, use the JSON-LD idx:spatial pattern instead. Note: geof:distance bind+filter patterns are automatically accelerated in both SPARQL and JSON-LD.

Graph Sources and Integrations

Graph sources extend Fluree’s query capabilities by integrating specialized indexes and external data sources. Graph sources appear as queryable ledgers but are backed by different storage and indexing systems.

Graph Source Types

Overview

Introduction to graph sources:

  • What are graph sources
  • Architecture and design
  • Use cases
  • Performance characteristics
  • Creating and managing graph sources

Iceberg / Parquet

Apache Iceberg data lake integration:

  • Querying Iceberg tables
  • Parquet file support
  • Schema mapping
  • Partition pruning
  • Performance optimization

R2RML

Relational database mapping:

  • R2RML standard
  • Mapping relational data to RDF
  • SQL query generation
  • Join optimization
  • Supported databases (PostgreSQL, MySQL, etc.)

BM25 Graph Source

Full-text search as graph source:

  • BM25 index as queryable ledger
  • Search predicates
  • Combining with structured queries
  • Real-time index updates

What are Graph Sources?

Graph sources are queryable data sources that appear as Fluree ledgers but are backed by specialized storage:

Standard Ledger:

mydb:main → RDF triple store → SPOT/POST/OPST/PSOT indexes

Graph Source:

products-search:main → BM25 index → Inverted text index
products-vector:main → HNSW → Vector similarity index
warehouse-data:main → Iceberg → Parquet files
sql-db:main → R2RML → PostgreSQL tables

Query Transparency

Graph sources are queried like regular ledgers:

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "from": "products:main",
  "select": ["?product", "?score"],
  "where": [
    {
      "f:graphSource": "products-search:main",
      "f:searchText": "laptop",
      "f:searchLimit": 20,
      "f:searchResult": { "f:resultId": "?product", "f:resultScore": "?score" }
    }
  ]
}

Note: SPARQL queries use the same f: namespace pattern (f:graphSource, f:searchText, etc.) within JSON-LD query syntax.

Multi-Graph Queries

Combine regular ledgers with graph sources:

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "from": "products:main",
  "select": ["?product", "?name", "?price", "?score"],
  "where": [
    {
      "f:graphSource": "products-search:main",
      "f:searchText": "laptop",
      "f:searchLimit": 20,
      "f:searchResult": { "f:resultId": "?product", "f:resultScore": "?score" }
    },
    { "@id": "?product", "schema:name": "?name" },
    { "@id": "?product", "schema:price": "?price" }
  ],
  "orderBy": ["-?score"]
}

This joins structured data from products:main with search results from the products-search:main graph source.

Graph Source Lifecycle

1. Create Graph Source

Define mapping/configuration:

curl -X POST http://localhost:8090/index/bm25?ledger=mydb:main \
  -d '{"name": "products-search", "fields": [...]}'

2. Initial Indexing

Build index from source data:

  • Load data from source ledger
  • Transform to target format
  • Build specialized index
  • Publish to nameservice

3. Incremental Updates

Keep synchronized with source:

  • Monitor source ledger for changes
  • Update graph source incrementally
  • Maintain consistency

4. Query Execution

Execute queries against graph source:

  • Parse query
  • Route to appropriate backend
  • Execute specialized query
  • Return results

Supported Graph Sources

BM25 Full-Text Search

Purpose: Keyword search with relevance ranking

Backend: Inverted index

Use Cases:

  • E-commerce product search
  • Document search
  • Knowledge base search

Example:

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "from": "docs:main",
  "where": [
    {
      "f:graphSource": "docs-search:main",
      "f:searchText": "quarterly report",
      "f:searchLimit": 20,
      "f:searchResult": { "f:resultId": "?doc" }
    }
  ]
}

See BM25 Graph Source and BM25 Indexing.

Vector Search

Purpose: Semantic search using embeddings

Backend: HNSW index (embedded or remote)

Use Cases:

  • Semantic search
  • Recommendations
  • Image similarity
  • Clustering

See Vector Search for details.

Apache Iceberg

Purpose: Query data lake tables

Backend: Apache Iceberg / Parquet files

Use Cases:

  • Analytics on historical data
  • Data warehouse integration
  • Large-scale batch data

Example:

{
  "from": "warehouse-sales:main",
  "select": ["?date", "?revenue"],
  "where": [
    { "@id": "?sale", "warehouse:date": "?date" },
    { "@id": "?sale", "warehouse:revenue": "?revenue" }
  ],
  "filter": "?date >= '2024-01-01'"
}

See Iceberg / Parquet.

R2RML (Relational Databases)

Purpose: Query relational databases as RDF

Backend: SQL databases (PostgreSQL, MySQL, etc.)

Use Cases:

  • Existing database integration
  • Incremental adoption of graph queries
  • Unified queries across systems

Example:

{
  "from": "sql-customers:main",
  "select": ["?name", "?email"],
  "where": [
    { "@id": "?customer", "schema:name": "?name" },
    { "@id": "?customer", "schema:email": "?email" }
  ]
}

See R2RML.

Architecture

Graph Source Registry

Graph sources registered in nameservice:

{
  "graph_source_id": "products-search:main",
  "type": "bm25",
  "source": "products:main",
  "backend": "inverted_index",
  "status": "ready"
}

Query Routing

Query engine routes to appropriate backend:

Query: FROM <products-search:main>
  ↓
Nameservice lookup: type=bm25
  ↓
Route to BM25 query engine
  ↓
Execute against inverted index
  ↓
Return results
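The routing flow above reduces to a nameservice lookup followed by a dispatch on the registered type. A sketch (registry entries and engine names here are illustrative):

```python
# Hypothetical registry records, shaped like nameservice entries
REGISTRY = {
    "products-search:main": {"type": "bm25"},
    "products-vector:main": {"type": "vector"},
    "warehouse-data:main": {"type": "iceberg"},
    "mydb:main": {"type": "ledger"},
}

ENGINES = {
    "bm25": "BM25 query engine (inverted index)",
    "vector": "HNSW query engine",
    "iceberg": "Iceberg/Parquet scanner",
    "ledger": "RDF triple store (SPOT/POST/OPST/PSOT)",
}

def route(graph_name):
    """Nameservice lookup, then dispatch to the matching backend engine."""
    entry = REGISTRY.get(graph_name)
    if entry is None:
        raise LookupError(f"Graph source not found: {graph_name}")
    return ENGINES[entry["type"]]

print(route("products-search:main"))  # BM25 query engine (inverted index)
```

A missing registry entry surfaces as the GraphSourceNotFound error described in Troubleshooting.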

Result Integration

Results from graph sources join with regular graphs:

FROM <products:main>, <products-search:main>
  ↓
Execute subquery on products:main → Results A
Execute subquery on products-search:main → Results B
  ↓
Join Results A + B on ?product
  ↓
Return combined results
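The final join step is an ordinary hash join on the shared variable. A sketch over two toy solution sets (bindings shaped like the examples above):

```python
def hash_join(results_a, results_b, key):
    """Join two solution lists on a shared variable (e.g. "?product")."""
    index = {}
    for row in results_a:                       # build side: hash Results A by the key
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in results_b:                       # probe side: look up each Results B row
        for match in index.get(row[key], []):
            joined.append({**match, **row})     # merge bindings from both sides
    return joined

# Results A: structured data from products:main
a = [{"?product": "ex:p1", "?name": "Laptop Pro"},
     {"?product": "ex:p2", "?name": "Desk Lamp"}]
# Results B: BM25 hits from products-search:main
b = [{"?product": "ex:p1", "?score": 7.2}]

print(hash_join(a, b, "?product"))
# [{'?product': 'ex:p1', '?name': 'Laptop Pro', '?score': 7.2}]
```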

Performance Considerations

Query Planning

Graph sources affect query optimization:

  • Specialized indexes enable efficient filtering
  • Push filters down to graph source when possible
  • Minimize data transfer between graphs

Data Transfer

Minimize data movement:

  • Filter in graph source before joining
  • Use selective projections
  • Leverage graph source’s native capabilities

Caching

Some graph source backends support caching:

  • BM25: Results cacheable
  • Vector: Similar queries share computation
  • Iceberg: Parquet file caching
  • R2RML: SQL query plan caching

Best Practices

1. Choose Appropriate Graph Source Type

Match graph source to use case:

  • Keyword search → BM25
  • Semantic search → Vector
  • Analytics → Iceberg
  • Relational database integration → R2RML

2. Filter Early

Push filters to graph sources:

Good:

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "from": "products:main",
  "where": [
    {
      "f:graphSource": "products-search:main",
      "f:searchText": "laptop",
      "f:searchLimit": 50,
      "f:searchResult": { "f:resultId": "?p" }
    },
    { "@id": "?p", "schema:price": "?price" }
  ],
  "filter": "?price < 1000"
}

3. Monitor Graph Source Lag

Check synchronization status:

curl http://localhost:8090/index/status/products-search:main

4. Use Appropriate Limits

Limit results from graph sources:

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "from": "products:main",
  "where": [
    {
      "f:graphSource": "products-search:main",
      "f:searchText": "query",
      "f:searchLimit": 100,
      "f:searchResult": { "f:resultId": "?p" }
    }
  ]
}

5. Test Performance

Profile queries combining graph sources:

curl -X POST http://localhost:8090/v1/fluree/explain \
  -d '{...}'

Troubleshooting

Graph Source Not Found

{
  "error": "GraphSourceNotFound",
  "message": "Graph source not found: products-search:main"
}

Solution: Create graph source or check name spelling.

Synchronization Lag

Graph source out of sync with source:

# Check status
curl http://localhost:8090/index/status/products-search:main

# Trigger rebuild
curl -X POST http://localhost:8090/index/rebuild/products-search:main

Poor Performance

Query combining graph sources is slow:

  1. Check explain plan
  2. Add filters to reduce result set
  3. Ensure indexes are up-to-date
  4. Consider query rewrite

Graph Sources Overview

Graph sources enable querying specialized indexes and external data sources using the same query interface as regular Fluree ledgers. This document provides a comprehensive overview of graph source architecture and capabilities.

Concept

A graph source is anything you can address by a graph name/IRI and query as part of a single execution. Some graph sources are ledger-backed RDF graphs; others are backed by different systems optimized for specific query patterns.

Regular Ledger:

  • Stored as RDF triples
  • Indexed with SPOT, POST, OPST, PSOT
  • Optimized for graph traversal

Non-ledger Graph Source:

  • Stored in specialized format
  • Custom indexing for specific queries
  • Optimized for particular use cases

Both are queried using the same SPARQL or JSON-LD Query syntax.

Architecture

Components

┌─────────────────────────────────────────┐
│         Fluree Query Engine             │
└─────────────────┬───────────────────────┘
                  │
      ┌───────────┴──────────┐
      │                      │
┌─────▼──────┐      ┌───────▼────────┐
│  Regular   │      │    Graph       │
│  Ledgers   │      │    Sources     │
└─────┬──────┘      └───────┬────────┘
      │                     │
      │             ┌───────┴────────┐
      │             │                │
┌─────▼──────┐ ┌───▼───┐     ┌─────▼──────┐
│ RDF Triple │ │ BM25  │     │  usearch   │
│   Store    │ │ Index │     │  Vector    │
└────────────┘ └───────┘     └────────────┘

Graph Source Registry (Nameservice)

Non-ledger graph sources are registered in nameservice:

{
  "graph_source_id": "products-search:main",
  "type": "graph-source",
  "backend": "bm25",
  "source": "products:main",
  "config": {
    "fields": [...]
  },
  "status": "ready",
  "last_sync": "2024-01-22T10:30:00Z"
}

Graph Source Types

1. BM25 Full-Text Search

Backend: Inverted text index

Purpose: Keyword search with relevance ranking

Configuration:

{
  "type": "bm25",
  "source": "products:main",
  "fields": [
    { "predicate": "schema:name", "weight": 2.0 },
    { "predicate": "schema:description", "weight": 1.0 }
  ]
}

Query:

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "from": "products:main",
  "where": [
    {
      "f:graphSource": "products-search:main",
      "f:searchText": "laptop",
      "f:searchLimit": 20,
      "f:searchResult": { "f:resultId": "?product", "f:resultScore": "?score" }
    }
  ],
  "select": ["?product", "?score"]
}
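To see what the per-field weights do, here is a minimal weighted BM25 scorer where field weights scale term frequency (a BM25F-style simplification, not Fluree's actual implementation):

```python
import math
from collections import Counter

def bm25_scores(docs, query_terms, weights, k1=1.2, b=0.75):
    """Minimal weighted BM25: each doc is {field: text}; weights scale term counts."""
    tfs, lengths = [], []
    for doc in docs:
        tf = Counter()
        for field, text in doc.items():
            for term in text.lower().split():
                tf[term] += weights.get(field, 1.0)   # field weight boosts its terms
        tfs.append(tf)
        lengths.append(sum(tf.values()))
    avgdl = sum(lengths) / len(lengths)
    n = len(docs)
    scores = []
    for tf, dl in zip(tfs, lengths):
        s = 0.0
        for q in query_terms:
            df = sum(1 for t in tfs if q in t)        # document frequency of the term
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            f = tf[q]
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))
        scores.append(s)
    return scores

docs = [{"name": "gaming laptop", "description": "a fast laptop"},
        {"name": "desk lamp", "description": "warm light"}]
print(bm25_scores(docs, ["laptop"], {"name": 2.0, "description": 1.0}))
```

With `schema:name` weighted 2.0, a match in the name field contributes twice as much term frequency as the same match in the description.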

2. Vector Similarity

Backend: HNSW index (embedded or remote)

Purpose: Semantic search using embeddings

Configuration:

{
  "type": "vector",
  "source": "products:main",
  "embedding_property": "ex:embedding",
  "dimensions": 384,
  "metric": "cosine"
}

Query:

{
  "from": "mydb:main",
  "where": [
    {
      "f:graphSource": "products-vector:main",
      "f:queryVector": [0.1, 0.2, ...],
      "f:distanceMetric": "cosine",
      "f:searchLimit": 10,
      "f:searchResult": {
        "f:resultId": "?product",
        "f:resultScore": "?score"
      }
    }
  ],
  "select": ["?product", "?score"]
}

3. Apache Iceberg

Backend: Iceberg tables / Parquet files via R2RML mapping

Purpose: Analytics on data lake

Iceberg graph sources require an R2RML mapping that defines how table rows become RDF triples. Two catalog modes select how Iceberg metadata is discovered:

  • REST catalog: connects to an Iceberg REST catalog API (e.g., Polaris)
  • Direct S3: reads metadata/version-hint.text from the table’s S3 location (no catalog server required)

See Iceberg / Parquet for full configuration details and examples.

Query:

{
  "from": "warehouse-orders:main",
  "select": ["?orderId", "?total"],
  "where": [
    { "@id": "?order", "ex:orderId": "?orderId" },
    { "@id": "?order", "ex:total": "?total" }
  ]
}

Creating Graph Sources

Via Rust API

Graph sources are created and registered via the fluree-db-api Rust API, which publishes the graph source record into the nameservice.

use fluree_db_api::{FlureeBuilder, R2rmlCreateConfig};

let fluree = FlureeBuilder::default().build().await?;

let config = R2rmlCreateConfig::new_direct(
    "execution-log",
    "s3://bucket/warehouse/logs/execution_log",
    "fluree:file://mappings/execution_log.ttl",
)
.with_s3_region("us-east-1");

fluree.create_r2rml_graph_source(config).await?;

Querying Graph Sources

Graph sources come in two flavors with different query models:

  • Iceberg sources — queried transparently using standard SPARQL/JSON-LD patterns (FROM, GRAPH, or as a direct query target)
  • Search indexes (BM25, Vector) — queried using the f:graphSource / f:searchText pattern

Iceberg (Transparent)

Iceberg graph sources are queried just like ledgers. No special syntax is needed:

As a direct target:

# Query the graph source directly
SELECT ?s ?p ?o FROM <execution-log:main> WHERE { ?s ?p ?o } LIMIT 10

Via GRAPH pattern (joining with ledger data):

{
  "from": "mydb:main",
  "select": ["?customer", "?orderId", "?total"],
  "where": [
    { "@id": "?customer", "schema:name": "?name" },
    { "@id": "?customer", "ex:customerId": "?custId" },
    {
      "graph": "warehouse-orders:main",
      "where": [
        { "@id": "?order", "ex:customerId": "?custId" },
        { "@id": "?order", "ex:orderId": "?orderId" },
        { "@id": "?order", "ex:total": "?total" }
      ]
    }
  ]
}

Iceberg graph sources use R2RML mappings to define how table rows become RDF triples. See Iceberg / Parquet and R2RML for details.

Search Indexes (BM25, Vector)

Search indexes use the f:graphSource pattern:

Single Graph Source

Query one graph source:

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "from": "products:main",
  "select": ["?product", "?score"],
  "where": [
    {
      "f:graphSource": "products-search:main",
      "f:searchText": "laptop",
      "f:searchLimit": 20,
      "f:searchResult": { "f:resultId": "?product", "f:resultScore": "?score" }
    }
  ]
}

Multiple Graph Sources

Combine multiple graph sources:

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "from": "products:main",
  "select": ["?product", "?textScore", "?vecScore"],
  "values": [
    ["?queryVec"],
    [{"@value": [0.1, 0.2, 0.3], "@type": "https://ns.flur.ee/db#embeddingVector"}]
  ],
  "where": [
    {
      "f:graphSource": "products-search:main",
      "f:searchText": "laptop",
      "f:searchLimit": 100,
      "f:searchResult": { "f:resultId": "?product", "f:resultScore": "?textScore" }
    },
    {
      "f:graphSource": "products-vector:main",
      "f:queryVector": "?queryVec",
      "f:searchLimit": 100,
      "f:searchResult": { "f:resultId": "?product", "f:resultScore": "?vecScore" }
    }
  ]
}

Graph Sources + Regular Graphs

Combine graph sources and regular ledgers:

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "from": "products:main",
  "select": ["?product", "?name", "?price", "?score"],
  "where": [
    {
      "f:graphSource": "products-search:main",
      "f:searchText": "laptop",
      "f:searchLimit": 20,
      "f:searchResult": { "f:resultId": "?product", "f:resultScore": "?score" }
    },
    { "@id": "?product", "schema:name": "?name" },
    { "@id": "?product", "schema:price": "?price" }
  ]
}

Synchronization

Source Tracking

Each graph source tracks the source ledger and the transaction time (t) it has indexed up to:

Source: products:main @ t=150
Graph Source: products-search:main @ source_t=150

Update Modes

Real-Time:

  • Updates immediately as source changes
  • Low latency
  • Higher overhead

Batch:

  • Updates periodically
  • Higher latency
  • Lower overhead

Manual:

  • Updates on demand
  • Full control
  • Requires manual triggering

Checking Sync Status

curl http://localhost:8090/graph-source/products-search:main/status

Response:

{
  "name": "products-search:main",
  "source": "products:main",
  "source_t": 150,
  "index_t": 148,
  "lag": 2,
  "last_sync": "2024-01-22T10:30:00Z",
  "status": "syncing"
}
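The reported lag is simply the difference between the source ledger's transaction time and the last transaction the index has incorporated. A minimal sketch of that calculation, using the values from the sample response above (the `sync_lag` helper name is illustrative, not part of Fluree's API):

```rust
// Compute sync lag from a graph-source status response.
// Field names (source_t, index_t) match the sample response above.
fn sync_lag(source_t: u64, index_t: u64) -> u64 {
    // saturating_sub guards against a momentarily stale source_t reading
    source_t.saturating_sub(index_t)
}

fn main() {
    // From the sample response: source_t = 150, index_t = 148 → lag = 2
    assert_eq!(sync_lag(150, 148), 2);
    println!("lag = {}", sync_lag(150, 148));
}
```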

Query Execution

Query Planning

The query planner handles graph sources in five steps:

  1. Parse Query: Extract graph patterns
  2. Route Subqueries: Identify which graphs handle which patterns
  3. Execute Subqueries: Run against appropriate backends
  4. Join Results: Combine results from multiple graphs
  5. Apply Filters: Final filtering and sorting

Example Execution

Query:

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "from": "products:main",
  "select": ["?p", "?price"],
  "where": [
    {
      "f:graphSource": "products-search:main",
      "f:searchText": "laptop",
      "f:searchLimit": 50,
      "f:searchResult": { "f:resultId": "?p" }
    },
    { "@id": "?p", "schema:price": "?price" }
  ],
  "filter": "?price < 1000"
}

Execution Plan:

1. Execute BM25 search on products-search:main:
   f:searchText "laptop", f:searchLimit 50
   → Result: ?p = [ex:p1, ex:p2, ex:p3, ...]

2. Execute on products:main:
   SELECT ?p ?price WHERE {
     VALUES ?p { ex:p1 ex:p2 ex:p3 ... }
     ?p schema:price ?price
   }
   → Result: [(ex:p1, 899), (ex:p2, 1200), ...]

3. Join and filter:
   ?price < 1000
   → Result: [(ex:p1, 899)]

Performance Characteristics

BM25 Graph Sources

  • Index Build: O(n × avg_doc_length)
  • Query: O(log n) with inverted index
  • Space: 2-3× source data
  • Update: Incremental, O(doc_size)

Vector Graph Sources

  • Index Build: O(n log n) for HNSW
  • Query: O(log n) approximate
  • Space: 1.5× embedding size
  • Update: Incremental, O(1)

Iceberg Graph Sources

  • Index Build: No index (direct file access)
  • Query: O(partitions scanned)
  • Space: Zero overhead (uses Parquet files)
  • Update: Batch-oriented

Best Practices

1. Choose Appropriate Type

Match graph source type to use case:

  • Keyword search → BM25
  • Semantic search → Vector
  • Analytics / data lake → Iceberg (with R2RML mapping)

2. Monitor Synchronization

Check sync lag regularly:

// getGraphSourceStatus is your own helper wrapping
// GET /graph-source/<name>/status (see "Checking Sync Status" above)
setInterval(async () => {
  const status = await getGraphSourceStatus('products-search:main');
  if (status.lag > 10) {
    console.warn(`Graph source lag: ${status.lag} transactions`);
  }
}, 60000);

3. Filter in Graph Sources

Push filters to graph sources when possible:

Good (graph source pattern first narrows results before graph traversal):

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "from": "products:main",
  "where": [
    {
      "f:graphSource": "products-search:main",
      "f:searchText": "laptop",
      "f:searchLimit": 20,
      "f:searchResult": { "f:resultId": "?p" }
    },
    { "@id": "?p", "schema:name": "?name" }
  ]
}

Bad (graph traversal before graph source means scanning all products first):

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "from": "products:main",
  "where": [
    { "@id": "?p", "schema:name": "?name" },
    {
      "f:graphSource": "products-search:main",
      "f:searchText": "laptop",
      "f:searchLimit": 20,
      "f:searchResult": { "f:resultId": "?p" }
    }
  ]
}

4. Use Explain Plans

Understand query execution:

curl -X POST http://localhost:8090/v1/fluree/explain \
  -d '{...}'

5. Limit Results

Always use LIMIT with graph sources:

{
  "where": [...],
  "limit": 100
}

Troubleshooting

High Sync Lag

Symptom: lag increasing

Causes:

  • Source ledger write rate too high
  • Graph source indexing too slow
  • Resource constraints

Solutions:

  • Increase indexing resources
  • Batch updates
  • Use manual sync mode

Query Performance Issues

Symptom: Slow queries combining graph sources

Solutions:

  1. Check explain plan
  2. Add filters to reduce intermediate results
  3. Ensure graph source is synced
  4. Consider query rewrite

Missing Results

Symptom: Expected results not returned

Causes:

  • Graph source not synced
  • Mapping misconfiguration
  • Filter too restrictive

Solutions:

  • Check sync status
  • Verify mapping configuration
  • Test subqueries independently

Iceberg / Parquet

Fluree integrates with Apache Iceberg to query data lake tables as graph sources. An R2RML mapping defines how Iceberg table rows are materialized into RDF triples, enabling you to query large-scale analytical data stored in Parquet format using the same SPARQL / JSON-LD query interface as regular ledgers.

Note: Requires the iceberg feature flag. See Compatibility and Feature Flags.

What is Apache Iceberg?

Apache Iceberg is an open table format for huge analytical datasets. It provides:

  • ACID transactions on data lakes
  • Time travel and versioning
  • Schema evolution
  • Partition management
  • Optimized file organization (Parquet)

Configuration

Catalog Modes

Fluree supports two ways to discover Iceberg metadata:

  • REST catalog: discover table metadata via an Iceberg REST catalog API (e.g., Polaris).
  • Direct S3 (no catalog server): bypass REST discovery and read version-hint.text from the table’s metadata/ directory to resolve the current metadata file.

CLI

The fluree iceberg map command creates Iceberg graph sources from the command line. An R2RML mapping is required to define how table rows become RDF triples.

# REST catalog with R2RML mapping
fluree iceberg map warehouse-orders \
  --catalog-uri https://polaris.example.com/api/catalog \
  --r2rml mappings/orders.ttl \
  --auth-bearer $POLARIS_TOKEN

# Direct S3 (no catalog server) with R2RML mapping
fluree iceberg map execution-log \
  --mode direct \
  --table-location s3://bucket/warehouse/logs/execution_log \
  --r2rml mappings/execution_log.ttl

Once mapped, graph sources appear in fluree list, can be inspected with fluree info, and removed with fluree drop. See CLI iceberg reference for all options.

HTTP API

When running the Fluree server (or Docker image) with the iceberg feature enabled, map a table by POSTing to {api_base_url}/iceberg/map (default: /v1/fluree/iceberg/map). The endpoint is admin-protected — include the admin Bearer token if admin auth is configured.

# REST catalog with R2RML mapping (mapping passed inline)
curl -X POST http://localhost:8090/v1/fluree/iceberg/map \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -d @- <<'JSON'
{
  "name": "warehouse-orders",
  "mode": "rest",
  "catalog_uri": "https://polaris.example.com/api/catalog",
  "table": "sales.orders",
  "warehouse": "my-warehouse",
  "auth_bearer": "polaris-token-here",
  "r2rml": "@prefix rr: <http://www.w3.org/ns/r2rml#> . ...",
  "r2rml_type": "text/turtle"
}
JSON
# Direct S3 mode (no catalog server)
curl -X POST http://localhost:8090/v1/fluree/iceberg/map \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -d '{
    "name": "execution-log",
    "mode": "direct",
    "table_location": "s3://bucket/warehouse/logs/execution_log",
    "r2rml": "...",
    "r2rml_type": "text/turtle",
    "s3_region": "us-east-1",
    "s3_path_style": true
  }'

R2RML can be omitted to auto-generate a direct mapping. AWS credentials for direct mode are read from the server’s environment (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION, or an attached instance role). See the Graph Source Endpoints section in the API reference for the complete request/response schema.

Rust API

R2rmlCreateConfig::new and new_direct take the R2RML mapping as a content string (Turtle or JSON-LD), not a file path — read the file yourself first. To reference an already-stored mapping by address instead, build the config directly with R2rmlMappingInput::Address(...).

REST catalog mode (Polaris-style):

#![allow(unused)]
fn main() {
use fluree_db_api::R2rmlCreateConfig;

let mapping = std::fs::read_to_string("mappings/orders.ttl")?;

let config = R2rmlCreateConfig::new(
    "warehouse-orders",
    "https://polaris.example.com/api/catalog",
    "sales.orders",
    mapping,
)
.with_warehouse("my-warehouse")
.with_auth_bearer("my-token")
.with_vended_credentials(true);

fluree.create_r2rml_graph_source(config).await?;
}

Direct S3 mode (no REST catalog):

#![allow(unused)]
fn main() {
use fluree_db_api::R2rmlCreateConfig;

let mapping = std::fs::read_to_string("mappings/execution_log.ttl")?;

let config = R2rmlCreateConfig::new_direct(
    "execution-log",
    "s3://bucket/warehouse/logs/execution_log",
    mapping,
)
.with_s3_region("us-east-1")
.with_s3_path_style(true);

fluree.create_r2rml_graph_source(config).await?;
}

Stored Configuration Format (Nameservice)

Iceberg graph sources are persisted as an IcebergGsConfig JSON document in the nameservice record’s config field.

Note the nesting: the graph source is “Iceberg” (this page), and catalog.type selects the catalog mode (rest vs direct) used to discover Iceberg metadata.

REST catalog config:

{
  "catalog": {
    "type": "rest",
    "uri": "https://polaris.example.com/api/catalog",
    "warehouse": "my-warehouse",
    "auth": { "type": "bearer", "token": { "env_var": "POLARIS_TOKEN" } }
  },
  "table": "sales.orders",
  "io": {
    "vended_credentials": true,
    "s3_region": "us-east-1",
    "s3_endpoint": null,
    "s3_path_style": false
  }
}

Direct S3 config:

{
  "catalog": {
    "type": "direct",
    "table_location": "s3://bucket/warehouse/logs/execution_log"
  },
  "table": "",
  "io": {
    "vended_credentials": false,
    "s3_region": "us-east-1",
    "s3_endpoint": null,
    "s3_path_style": true
  }
}

Direct mode requirements:

  • catalog.table_location must be an S3 URI (s3:// or s3a://) pointing to the table root directory.
  • The table must contain a metadata/ subdirectory with:
    • version-hint.text (containing the current metadata filename, e.g., 00001-abc-def.metadata.json)
    • The referenced .metadata.json file
  • Direct mode uses ambient AWS credentials (IAM roles, env vars, ~/.aws/credentials). It does not support vended credentials.

How Direct metadata resolution works:

  • Fluree does not require you to provide a path to version-hint.text in the config. You provide the table root (table_location), and Fluree reads:
    • "{table_location}/metadata/version-hint.text" to get the current metadata filename
    • "{table_location}/metadata/<filename from the hint>" as the table’s current metadata
  • version-hint.text may contain a bare filename (e.g., 00001-abc.metadata.json) or a full absolute path (s3://...).
  • If version-hint.text is missing or empty, Direct mode fails with an error mentioning version-hint.text.
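The resolution steps above can be sketched as a small function. This mirrors the described behavior (bare filename vs. absolute path in `version-hint.text`) and is illustrative only, not Fluree's implementation:

```rust
// Sketch of Direct-mode metadata resolution, per the steps above.
// Given the table root and the contents of metadata/version-hint.text,
// produce the location of the current metadata file.
fn resolve_metadata(table_location: &str, hint: &str) -> String {
    let hint = hint.trim(); // hint files often end with a newline
    if hint.starts_with("s3://") || hint.starts_with("s3a://") {
        // Hint may already be a full absolute path
        hint.to_string()
    } else {
        // Otherwise it is a bare filename under the table's metadata/ directory
        format!("{}/metadata/{}", table_location.trim_end_matches('/'), hint)
    }
}

fn main() {
    let loc = resolve_metadata(
        "s3://bucket/warehouse/logs/execution_log",
        "00001-abc.metadata.json\n",
    );
    assert_eq!(
        loc,
        "s3://bucket/warehouse/logs/execution_log/metadata/00001-abc.metadata.json"
    );
    // An absolute hint is used as-is
    let abs = resolve_metadata("s3://bucket/t", "s3://other/metadata/00002.metadata.json");
    assert_eq!(abs, "s3://other/metadata/00002.metadata.json");
}
```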

Iceberg table setup must already exist:

Direct mode assumes table_location points at a valid Iceberg table layout (created by iceberg-rust, Spark, etc.), including the metadata/ directory and referenced metadata/manifest files. Fluree does not create or “bootstrap” Iceberg tables; it only reads them.

When to use Direct vs REST:

| Scenario | Recommended |
|---|---|
| Shared catalog (multiple consumers) | REST |
| Writer and reader are the same system | Direct |
| iceberg-rust / Spark appending to known S3 path | Direct |
| Need catalog-managed credentials (vended) | REST |
| Minimizing infrastructure (no catalog server) | Direct |

RDF Mapping (R2RML)

Every Iceberg graph source requires an R2RML mapping (Turtle format) that defines how table rows become RDF triples — specifying subject IRI templates, predicate mappings, and type conversions. See R2RML for the full mapping reference.

Type Mapping

Iceberg types map to XSD types:

| Iceberg Type | RDF Type |
|---|---|
| int, long | xsd:integer |
| float, double | xsd:decimal |
| string | xsd:string |
| boolean | xsd:boolean |
| date | xsd:date |
| timestamp | xsd:dateTime |
| uuid | xsd:string |
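The type mapping above can be expressed as a simple lookup. This is a sketch of the conversion rule, not Fluree's internal code:

```rust
// Map an Iceberg column type name to its XSD datatype, per the table above.
fn xsd_type(iceberg_type: &str) -> &'static str {
    match iceberg_type {
        "int" | "long" => "xsd:integer",
        "float" | "double" => "xsd:decimal",
        "boolean" => "xsd:boolean",
        "date" => "xsd:date",
        "timestamp" => "xsd:dateTime",
        // string and uuid both map to xsd:string
        _ => "xsd:string",
    }
}

fn main() {
    assert_eq!(xsd_type("long"), "xsd:integer");
    assert_eq!(xsd_type("double"), "xsd:decimal");
    assert_eq!(xsd_type("uuid"), "xsd:string");
    assert_eq!(xsd_type("timestamp"), "xsd:dateTime");
}
```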

Querying Iceberg Tables

Iceberg graph sources are queried using standard SPARQL and JSON-LD syntax. In the Rust API, mapped sources resolve transparently through the lazy query builders:

  • fluree.graph("warehouse-orders:main").query() for a single target that may be either a native ledger or a mapped graph source
  • fluree.query_from() when the query body itself carries the dataset ("from" / FROM) or when composing multiple sources

The lower-level materialized snapshot path (let view = fluree.db(...).await?; fluree.query(&view, ...)) is still native-ledger-oriented and should not be used for graph source aliases.

#![allow(unused)]
fn main() {
// Single-target lazy query
let result = fluree.graph("warehouse-orders:main")
    .query()
    .sparql("SELECT * WHERE { ?s ?p ?o } LIMIT 10")
    .execute()
    .await?;

// FROM-driven query
let result = fluree.query_from()
    .sparql("SELECT * FROM <warehouse-orders:main> WHERE { ?s ?p ?o } LIMIT 10")
    .execute()
    .await?;
}

Basic Query

{
  "@context": {
    "ex": "http://example.org/ns/"
  },
  "from": "warehouse-orders:main",
  "select": ["?orderId", "?total"],
  "where": [
    { "@id": "?order", "ex:orderId": "?orderId" },
    { "@id": "?order", "ex:total": "?total" }
  ],
  "limit": 100
}

SPARQL Query

PREFIX ex: <http://example.org/ns/>

SELECT ?orderId ?total ?date
FROM <warehouse-orders:main>
WHERE {
  ?order ex:orderId ?orderId .
  ?order ex:total ?total .
  ?order ex:orderDate ?date .
  FILTER (?date >= "2024-01-01"^^xsd:date)
}
ORDER BY DESC(?date)
LIMIT 100

Partition Pruning

Iceberg’s partition pruning optimizes queries:

{
  "from": "warehouse-orders:main",
  "select": ["?orderId", "?total"],
  "where": [
    { "@id": "?order", "ex:orderId": "?orderId" },
    { "@id": "?order", "ex:total": "?total" },
    { "@id": "?order", "ex:orderDate": "?date" }
  ],
  "filter": "?date >= '2024-01-01' && ?date < '2024-02-01'"
}

If orderDate is a partition column, Iceberg only scans January 2024 partitions.
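The pruning decision comes down to a range-overlap test: a partition is scanned only if its min/max values for the partition column overlap the filter range. A conceptual sketch (not Iceberg's actual pruning code):

```rust
// Does a partition's [part_min, part_max] date range overlap the filter
// range [lo, hi)? Lexicographic comparison is valid for ISO-8601 dates.
fn partition_overlaps(part_min: &str, part_max: &str, lo: &str, hi: &str) -> bool {
    part_max >= lo && part_min < hi
}

fn main() {
    // January 2024 partition overlaps filter [2024-01-01, 2024-02-01) → scanned
    assert!(partition_overlaps("2024-01-01", "2024-01-31", "2024-01-01", "2024-02-01"));
    // December 2023 partition does not overlap → skipped entirely
    assert!(!partition_overlaps("2023-12-01", "2023-12-31", "2024-01-01", "2024-02-01"));
}
```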

Combining with Fluree Data

Join Iceberg data with Fluree ledgers:

{
  "from": ["customers:main", "warehouse-orders:main"],
  "select": ["?customerName", "?orderTotal", "?orderDate"],
  "where": [
    { "@id": "?customer", "schema:name": "?customerName" },
    { "@id": "?customer", "ex:customerId": "?customerId" },
    { "@id": "?order", "ex:customerId": "?customerId" },
    { "@id": "?order", "ex:total": "?orderTotal" },
    { "@id": "?order", "ex:orderDate": "?orderDate" }
  ],
  "filter": "?orderDate >= '2024-01-01'",
  "orderBy": ["-?orderDate"]
}

Combines customer data from Fluree with order data from Iceberg.

Time Travel

Query historical Iceberg snapshots:

{
  "from": "warehouse-orders:main@snapshot:12345",
  "select": ["?orderId", "?total"],
  "where": [
    { "@id": "?order", "ex:orderId": "?orderId" },
    { "@id": "?order", "ex:total": "?total" }
  ]
}

Or by timestamp:

{
  "from": "warehouse-orders:main@timestamp:2024-01-01T00:00:00Z",
  "select": ["?orderId", "?total"],
  "where": [...]
}

Aggregations

Aggregate Iceberg data:

PREFIX ex: <http://example.org/ns/>

SELECT ?date (SUM(?total) AS ?dailyRevenue) (COUNT(?order) AS ?orderCount)
FROM <warehouse-orders:main>
WHERE {
  ?order ex:orderDate ?date .
  ?order ex:total ?total .
  FILTER (?date >= "2024-01-01"^^xsd:date)
}
GROUP BY ?date
ORDER BY ?date

Performance

Query Planning

Fluree pushes filters to Iceberg:

Query: SELECT ?id WHERE { ?order ex:orderDate ?date } FILTER (?date > "2024-01-01")
  ↓
Pushed to Iceberg:
  SELECT order_id FROM sales.orders WHERE order_date > '2024-01-01'
  ↓
Iceberg optimizations:
  - Partition pruning (only scan 2024 partitions)
  - File skipping (skip files outside date range)
  - Column pruning (only read order_id, order_date)

Best Practices

  1. Partition by Common Filters:

    -- Partition Iceberg table by date
    PARTITIONED BY (YEAR(order_date), MONTH(order_date))
    
  2. Use Filters:

    {
      "where": [...],
      "filter": "?date >= '2024-01-01'"  // Enables partition pruning
    }
    
  3. Limit Results:

    {
      "where": [...],
      "limit": 1000
    }
    
  4. Project Only Needed Columns:

    {
      "select": ["?orderId", "?total"],  // Only these columns read from Parquet
      "where": [...]
    }
    

Schema Evolution

Iceberg supports schema evolution via metadata updates. If a schema change renames/removes columns used by your R2RML mapping, update the mapping accordingly.

Configuration Options

AWS Credentials

For S3-backed Iceberg (both REST and Direct modes):

export AWS_ACCESS_KEY_ID=your-key
export AWS_SECRET_ACCESS_KEY=your-secret
export AWS_REGION=us-east-1

REST catalog mode also supports vended credentials (credentials issued by the catalog). Direct mode uses only ambient AWS credentials (env vars, IAM roles, ~/.aws/credentials).

Use Cases

Analytics on Historical Data

Query years of historical data:

SELECT ?year (SUM(?revenue) AS ?totalRevenue)
FROM <warehouse-sales:main>
WHERE {
  ?sale ex:year ?year .
  ?sale ex:revenue ?revenue .
  FILTER (?year >= 2020 && ?year <= 2023)
}
GROUP BY ?year
ORDER BY ?year

Data Warehouse Integration

Combine real-time Fluree data with warehouse analytics:

{
  "from": ["products:main", "warehouse-sales:main"],
  "select": ["?productName", "?totalSold"],
  "where": [
    { "@id": "?product", "schema:name": "?productName" },
    { "@id": "?product", "ex:productId": "?pid" },
    { "@id": "?sale", "ex:productId": "?pid" }
  ]
}

Large-Scale Reporting

Generate reports from petabyte-scale data:

SELECT ?region ?category (SUM(?amount) AS ?total)
FROM <warehouse-transactions:main>
WHERE {
  ?txn ex:region ?region .
  ?txn ex:category ?category .
  ?txn ex:amount ?amount .
  FILTER (?year = 2024)
}
GROUP BY ?region ?category
ORDER BY DESC(?total)

Limitations

  1. Read-Only: Iceberg graph sources are read-only (no writes via Fluree)
  2. Complex Joins: Large joins between Fluree and Iceberg may be slow
  3. No Full-Text Search: Use Fluree’s BM25 for text search

Troubleshooting

Connection Issues

{
  "error": "IcebergConnectionError",
  "message": "Cannot connect to Glue catalog"
}

Solutions:

  • Check AWS credentials
  • Verify IAM permissions
  • Check network connectivity

Schema Mismatch

{
  "error": "SchemaMismatchError",
  "message": "Column 'order_date' not found in Iceberg table"
}

Solutions:

  • Update R2RML mapping configuration (if the mapping references missing columns)
  • Verify table name and catalog

Slow Queries

Causes:

  • Large result sets
  • No partition pruning
  • Scanning many files

Solutions:

  • Add date filters to enable partition pruning
  • Use LIMIT clause
  • Optimize Iceberg table partitioning
  • Use Iceberg file compaction

R2RML (Relational to RDF Mapping)

R2RML (RDB to RDF Mapping Language) is a W3C standard for mapping tabular data into RDF triples. In Fluree, R2RML mappings are used to expose Iceberg tables as RDF graph sources, enabling you to query data lake tables using SPARQL or JSON-LD Query.

What is R2RML?

R2RML defines how to map:

  • Database tables to RDF classes
  • Table columns to RDF properties
  • Rows to RDF resources
  • Foreign keys to RDF relationships

In Fluree, this enables querying Iceberg tables as if they were RDF graphs.

Configuration

Create R2RML Graph Source (Iceberg-backed)

Use R2rmlCreateConfig to register a graph source that combines:

  • an Iceberg table (REST catalog or Direct S3), and
  • an R2RML mapping (Turtle) that materializes table rows into RDF triples.

If you use Direct S3 mode, Fluree resolves the current Iceberg metadata by reading metadata/version-hint.text under the configured table_location, then loading the metadata file referenced by the hint. The Iceberg table layout must already exist at that location.

#![allow(unused)]
fn main() {
use fluree_db_api::{FlureeBuilder, R2rmlCreateConfig};

let fluree = FlureeBuilder::default().build().await?;

// The mapping is passed as content (Turtle), not a file path — read it first
let mapping = std::fs::read_to_string("mappings/airlines.ttl")?;

let config = R2rmlCreateConfig::new_direct(
    "airlines-rdf",
    "s3://bucket/warehouse/openflights/airlines",
    mapping,
)
.with_s3_region("us-east-1")
.with_s3_path_style(true)
.with_mapping_media_type("text/turtle");

fluree.create_r2rml_graph_source(config).await?;
}

R2RML Mapping

Basic Mapping

Map a table to RDF class:

@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.org/ns/> .
@prefix schema: <http://schema.org/> .

<#CustomerMapping>
  a rr:TriplesMap ;
  
  rr:logicalTable [
    rr:tableName "customers"
  ] ;
  
  rr:subjectMap [
    rr:template "http://example.org/customer/{id}" ;
    rr:class schema:Person
  ] ;
  
  rr:predicateObjectMap [
    rr:predicate schema:name ;
    rr:objectMap [ rr:column "name" ]
  ] ;
  
  rr:predicateObjectMap [
    rr:predicate schema:email ;
    rr:objectMap [ rr:column "email" ]
  ] ;
  
  rr:predicateObjectMap [
    rr:predicate ex:customerId ;
    rr:objectMap [ rr:column "id" ]
  ] .

This maps the customers table:

CREATE TABLE customers (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255),
  email VARCHAR(255)
);

To RDF triples:

<http://example.org/customer/1>
  a schema:Person ;
  schema:name "Alice" ;
  schema:email "alice@example.org" ;
  ex:customerId "1" .
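The subject IRI in the output comes from expanding the `rr:template` against each row. A conceptual sketch of that expansion (illustrative only, not Fluree's R2RML engine):

```rust
// Expand an rr:template of the form "...{column}..." using a row's
// column values. Placeholders are column names in curly braces.
fn expand_template(template: &str, columns: &[(&str, &str)]) -> String {
    let mut out = template.to_string();
    for (name, value) in columns {
        out = out.replace(&format!("{{{}}}", name), value);
    }
    out
}

fn main() {
    // Row (id = 1) from the customers table above
    let iri = expand_template("http://example.org/customer/{id}", &[("id", "1")]);
    assert_eq!(iri, "http://example.org/customer/1");
}
```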

Foreign Key Mapping

Map relationships:

<#OrderMapping>
  a rr:TriplesMap ;
  
  rr:logicalTable [
    rr:tableName "orders"
  ] ;
  
  rr:subjectMap [
    rr:template "http://example.org/order/{id}" ;
    rr:class ex:Order
  ] ;
  
  rr:predicateObjectMap [
    rr:predicate ex:orderId ;
    rr:objectMap [ rr:column "id" ]
  ] ;
  
  rr:predicateObjectMap [
    rr:predicate ex:customer ;
    rr:objectMap [
      rr:parentTriplesMap <#CustomerMapping> ;
      rr:joinCondition [
        rr:child "customer_id" ;
        rr:parent "id"
      ]
    ]
  ] ;
  
  rr:predicateObjectMap [
    rr:predicate ex:total ;
    rr:objectMap [ rr:column "total" ]
  ] .

Maps foreign key customer_id to RDF object property linking to customer resource.

Complex Queries

Use SQL views for complex mappings:

<#SalesReportMapping>
  a rr:TriplesMap ;
  
  rr:logicalTable [
    rr:sqlQuery """
      SELECT
        c.id as customer_id,
        c.name as customer_name,
        SUM(o.total) as total_spent,
        COUNT(o.id) as order_count
      FROM customers c
      JOIN orders o ON o.customer_id = c.id
      WHERE o.order_date >= '2024-01-01'
      GROUP BY c.id, c.name
    """
  ] ;
  
  rr:subjectMap [
    rr:template "http://example.org/customer/{customer_id}" ;
    rr:class ex:Customer
  ] ;
  
  rr:predicateObjectMap [
    rr:predicate schema:name ;
    rr:objectMap [ rr:column "customer_name" ]
  ] ;
  
  rr:predicateObjectMap [
    rr:predicate ex:totalSpent ;
    rr:objectMap [ rr:column "total_spent" ; rr:datatype xsd:decimal ]
  ] ;
  
  rr:predicateObjectMap [
    rr:predicate ex:orderCount ;
    rr:objectMap [ rr:column "order_count" ; rr:datatype xsd:integer ]
  ] .

Querying R2RML Graph Sources

R2RML graph sources are queried using standard SPARQL and JSON-LD query syntax — no special query language is needed. In the Rust API, graph source resolution is wired into the lazy query builders:

  • fluree.graph("my-gs:main").query() for a single target that may be either a native ledger or a mapped graph source
  • fluree.query_from() when the query body specifies the dataset ("from" / FROM) or combines multiple sources

The raw materialized snapshot path (let view = fluree.db(&alias).await?; fluree.query(&view, ...)) is still the wrong abstraction for graph source aliases because it assumes a native ledger snapshot has already been loaded.

Graph sources can be:

  • Queried directly as the target: fluree query my-gs 'SELECT * WHERE { ?s ?p ?o }'
  • Referenced in FROM clauses: SELECT * FROM <my-gs:main> WHERE { ... }
  • Referenced in GRAPH patterns: SELECT * WHERE { GRAPH <my-gs:main> { ... } } (useful for joining with ledger data)

Basic Query

{
  "@context": {
    "schema": "http://schema.org/",
    "ex": "http://example.org/ns/"
  },
  "from": "warehouse-customers:main",
  "select": ["?name", "?email"],
  "where": [
    { "@id": "?customer", "@type": "schema:Person" },
    { "@id": "?customer", "schema:name": "?name" },
    { "@id": "?customer", "schema:email": "?email" }
  ]
}

The mapping controls how subjects and predicate/object values are produced from the scanned table columns.

SPARQL Query

PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/ns/>

SELECT ?name ?email
FROM <warehouse-customers:main>
WHERE {
  ?customer a schema:Person .
  ?customer schema:name ?name .
  ?customer schema:email ?email .
}

Filters

{
  "from": "warehouse-customers:main",
  "select": ["?name", "?email"],
  "where": [
    { "@id": "?customer", "schema:name": "?name" },
    { "@id": "?customer", "schema:email": "?email" },
    { "@id": "?customer", "ex:status": "?status" }
  ],
  "filter": "?status == 'active'"
}

Joins

{
  "from": "warehouse-orders:main",
  "select": ["?customerName", "?orderTotal"],
  "where": [
    { "@id": "?customer", "schema:name": "?customerName" },
    { "@id": "?order", "ex:customer": "?customer" },
    { "@id": "?order", "ex:total": "?orderTotal" }
  ]
}

Combining with Fluree Data

Join Iceberg data with Fluree ledgers:

{
  "from": ["products:main", "warehouse-inventory:main"],
  "select": ["?productName", "?stockLevel"],
  "where": [
    { "@id": "?product", "schema:name": "?productName" },
    { "@id": "?product", "ex:sku": "?sku" },
    { "@id": "?inventory", "ex:sku": "?sku" },
    { "@id": "?inventory", "ex:stockLevel": "?stockLevel" }
  ]
}

Combines product data from Fluree with inventory from an Iceberg-backed R2RML graph source.

Performance

R2RML graph sources execute by scanning the underlying Iceberg table and materializing RDF terms according to the mapping.
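Conceptually, the scan-and-materialize step turns each row into one or more triples according to the mapping. A toy sketch of that loop, assuming the customer mapping shown earlier (not Fluree's executor):

```rust
// Materialize (subject, predicate, object) triples from scanned rows,
// following the customer mapping above: subject IRI from the id column,
// schema:name from the name column.
fn materialize(rows: &[(&str, &str)]) -> Vec<(String, String, String)> {
    rows.iter()
        .map(|(id, name)| {
            (
                format!("http://example.org/customer/{}", id),
                "schema:name".to_string(),
                name.to_string(),
            )
        })
        .collect()
}

fn main() {
    let triples = materialize(&[("1", "Alice"), ("2", "Bob")]);
    assert_eq!(triples.len(), 2);
    assert_eq!(triples[0].0, "http://example.org/customer/1");
    assert_eq!(triples[0].2, "Alice");
}
```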

Best Practices

  1. Filter Early: Filters are pushed down to Iceberg for partition pruning.

    {
      "where": [...],
      "filter": "?date >= '2024-01-01'"
    }
    
  2. Limit Results:

    {
      "where": [...],
      "limit": 100
    }
    
  3. Project Only Needed Columns: Only columns referenced in the query and mapping are read from Parquet files.

  4. Partition by Common Filters: Partition your Iceberg tables by columns frequently used in filters (e.g., date).

Use Cases

Data Lake Analytics

Query Iceberg tables containing large-scale analytical data alongside Fluree ledgers:

{
  "from": ["products:main", "warehouse-sales:main"],
  "select": ["?productName", "?totalSold"],
  "where": [
    { "@id": "?product", "schema:name": "?productName" },
    { "@id": "?product", "ex:productId": "?pid" },
    { "@id": "?sale", "ex:productId": "?pid" },
    { "@id": "?sale", "ex:quantity": "?totalSold" }
  ]
}

Multi-Table Mapping

A single R2RML mapping file can define multiple TriplesMap entries, each targeting a different Iceberg table or logical view. This enables querying across related tables through a single graph source.

Limitations

  1. Read-Only: R2RML graph sources are read-only (no writes via Fluree)
  2. Performance: Complex joins across Fluree + Iceberg may be slow
  3. Schema Changes: Requires mapping updates when referenced columns change

Troubleshooting

Connection Errors

{
  "error": "IcebergConnectionError",
  "message": "Cannot load table metadata"
}

Solutions:

  • Check catalog configuration (REST vs Direct)
  • Verify AWS credentials and S3 access
  • Verify version-hint.text is present for Direct mode

Mapping Errors

{
  "error": "R2RMLMappingError",
  "message": "Invalid R2RML mapping: table 'customers' not found"
}

Solutions:

  • Verify table name / location
  • Check referenced column names in the mapping
  • Validate R2RML syntax (Turtle)

Slow Queries

Causes:

  • Large result sets (many Parquet files scanned)
  • No partition pruning
  • Complex joins across Fluree + Iceberg

Solutions:

  • Add date/partition filters to enable Iceberg partition pruning
  • Use LIMIT clause
  • Optimize R2RML mapping to project only needed columns
  • Partition Iceberg tables by common filter columns

BM25 Graph Source

BM25 indexes in Fluree are implemented as graph sources, allowing full-text search to be seamlessly integrated with structured graph queries through the standard query interface.

Overview

A BM25 graph source:

  • Indexes text content from a source ledger using a configurable query
  • Provides relevance-ranked search results via BM25 scoring
  • Integrates with JSON-LD queries through f: namespace predicates
  • Supports time-travel (query the index at any historical point)
  • Maintains a manifest of snapshots for incremental sync

For index creation, configuration, and lifecycle management, see BM25 Full-Text Search.

Querying BM25 Graph Sources

JSON-LD Search Pattern

BM25 search uses the f: (Fluree) namespace predicates in where clauses:

{
    "@context": {
        "ex": "http://example.org/",
        "f": "https://ns.flur.ee/db#"
    },
    "from": "docs:main",
    "where": [
        {
            "f:graphSource": "article-search:main",
            "f:searchText": "rust programming",
            "f:searchLimit": 10,
            "f:searchResult": {
                "f:resultId": "?doc",
                "f:resultScore": "?score"
            }
        },
        { "@id": "?doc", "ex:title": "?title" }
    ],
    "select": ["?doc", "?title", "?score"]
}

Pattern Fields

| Field | Required | Description |
|---|---|---|
| f:graphSource | Yes | Graph source ID (e.g., "article-search:main") |
| f:searchText | Yes | Query text. Analyzed with the same tokenizer/stemmer as indexing. |
| f:searchLimit | Yes | Maximum number of search results to return |
| f:searchResult | Yes | Object with variable bindings for results |
| f:resultId | Yes | Variable for the matched document IRI (e.g., "?doc") |
| f:resultScore | No | Variable for the BM25 relevance score (e.g., "?score") |
| f:resultLedger | No | Variable for the source ledger alias (for multi-ledger provenance) |

How It Works

  1. The search pattern is parsed and turned into a Bm25SearchOperator
  2. The operator loads the BM25 index from storage (using the leaflet cache when available)
  3. Query text is analyzed (tokenized, lowercased, stopwords removed, stemmed)
  4. The top-k results are computed using Block-Max WAND, which skips posting list segments whose upper-bound scores cannot enter the result set, then returns the highest-scoring documents
  5. Results produce variable bindings (?doc, ?score) that flow into subsequent where clauses
  6. Subsequent patterns join against the source ledger to retrieve additional properties
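The relevance score computed in step 4 follows the standard BM25 formula. A self-contained sketch with conventional parameters (k1 = 1.2, b = 0.75 — Fluree's exact tuning may differ):

```rust
// Standard BM25 per-term score: idf(term) × saturated term frequency,
// normalized by document length relative to the average.
fn bm25_term_score(tf: f64, df: f64, n_docs: f64, doc_len: f64, avg_len: f64) -> f64 {
    let k1 = 1.2;
    let b = 0.75;
    let idf = ((n_docs - df + 0.5) / (df + 0.5) + 1.0).ln();
    idf * (tf * (k1 + 1.0)) / (tf + k1 * (1.0 - b + b * doc_len / avg_len))
}

fn main() {
    // A rare term (in 5 of 1000 docs) scores higher than a common one
    // (in 800 of 1000 docs), given the same term frequency and doc length.
    let rare = bm25_term_score(2.0, 5.0, 1000.0, 100.0, 120.0);
    let common = bm25_term_score(2.0, 800.0, 1000.0, 100.0, 120.0);
    assert!(rare > common);
}
```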

Joining with Ledger Data

The primary use case is combining search results with structured graph data:

{
    "@context": {
        "ex": "http://example.org/",
        "f": "https://ns.flur.ee/db#"
    },
    "from": "docs:main",
    "where": [
        {
            "f:graphSource": "article-search:main",
            "f:searchText": "database design",
            "f:searchLimit": 20,
            "f:searchResult": { "f:resultId": "?doc", "f:resultScore": "?score" }
        },
        { "@id": "?doc", "ex:title": "?title" },
        { "@id": "?doc", "ex:author": "?author" },
        { "@id": "?doc", "ex:year": "?year" }
    ],
    "select": ["?doc", "?title", "?author", "?year", "?score"]
}

The BM25 search runs first, producing a set of (?doc, ?score) bindings. The remaining where clauses join those bindings against the source ledger to enrich results with structured data.
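Conceptually, that join behaves like a hash lookup from each bound ?doc into the ledger's triples. A minimal sketch, with illustrative names and shapes rather than Fluree's actual execution engine:

```rust
use std::collections::HashMap;

// Illustrative sketch of the join step. BM25 produces (doc, score)
// bindings; subsequent patterns look up each bound doc in the ledger's
// triples to extend the binding with more fields.
fn join_search_with_ledger(
    hits: &[(&str, f64)],         // (?doc, ?score) bindings from BM25
    titles: &HashMap<&str, &str>, // ex:title triples, doc -> title
) -> Vec<(String, String, f64)> {
    hits.iter()
        .filter_map(|(doc, score)| {
            // A doc with no ex:title drops out of the result set,
            // just like an unmatched triple pattern in the query.
            titles.get(doc).map(|t| (doc.to_string(), t.to_string(), *score))
        })
        .collect()
}

fn main() {
    let titles = HashMap::from([("ex:doc1", "Rust guide"), ("ex:doc2", "Python intro")]);
    let hits = [("ex:doc1", 8.75), ("ex:doc3", 2.0)]; // doc3 has no title triple
    println!("{:?}", join_search_with_ledger(&hits, &titles));
}
```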

Rust API

Creating and Querying

#![allow(unused)]
fn main() {
use fluree_db_api::{Bm25CreateConfig, FlureeBuilder};
use serde_json::json;

let fluree = FlureeBuilder::memory().build_memory();

// Seed ledger
let ledger0 = fluree.create_ledger("docs:main").await?;
let tx = json!({
    "@context": { "ex": "http://example.org/" },
    "@graph": [
        { "@id": "ex:doc1", "@type": "ex:Doc", "ex:title": "Rust guide", "ex:author": "Alice" },
        { "@id": "ex:doc2", "@type": "ex:Doc", "ex:title": "Python intro", "ex:author": "Bob" }
    ]
});
let ledger = fluree.insert(ledger0, &tx).await?.ledger;

// Create index
let query = json!({
    "@context": { "ex": "http://example.org/" },
    "where": [{ "@id": "?x", "@type": "ex:Doc", "ex:title": "?title" }],
    "select": { "?x": ["@id", "ex:title"] }
});
let config = Bm25CreateConfig::new("search", "docs:main", query);
let created = fluree.create_full_text_index(config).await?;

// Query with BM25 search + ledger join
let search_query = json!({
    "@context": { "ex": "http://example.org/", "f": "https://ns.flur.ee/db#" },
    "from": "docs:main",
    "where": [
        {
            "f:graphSource": &created.graph_source_id,
            "f:searchText": "rust",
            "f:searchLimit": 10,
            "f:searchResult": { "f:resultId": "?doc", "f:resultScore": "?score" }
        },
        { "@id": "?doc", "ex:author": "?author" }
    ],
    "select": ["?doc", "?score", "?author"]
});

let result = fluree.query_connection_with_bm25(&search_query).await?;
}

Using FlureeIndexProvider

The FlureeIndexProvider implements the Bm25IndexProvider and Bm25SearchProvider traits, used by the query engine for graph source resolution:

#![allow(unused)]
fn main() {
use fluree_db_api::FlureeIndexProvider;
use fluree_db_query::bm25::{Bm25IndexProvider, Bm25Scorer, Analyzer};

let provider = FlureeIndexProvider::new(&fluree);

// Load index through the provider (with optional sync and time-travel)
let index = provider
    .bm25_index("search:main", Some(ledger.t()), false, None)
    .await?;

// Direct search
let analyzer = Analyzer::english_default();
let terms = analyzer.analyze_to_strings("rust");
let term_refs: Vec<&str> = terms.iter().map(|s| s.as_str()).collect();
let scorer = Bm25Scorer::new(&index, &term_refs);
let results = scorer.top_k(10);
}

Remote Search Service

For large indexes or multi-instance deployments, BM25 (and vector) search can be delegated to a standalone search service: the fluree-search-httpd binary.

Important: the search service is a separate process with its own listen port and its own HTTP API. It is not mounted under the main Fluree server’s api_base_url (/v1/fluree/...). It needs read access to the same storage and nameservice paths the main server writes to, so the typical deployment is to share a storage volume.

Prerequisite: the index must already exist

fluree-search-httpd only serves queries against existing indexes; it does not create them. Today, BM25 and vector graph-source indexes are created via the Rust API (Bm25CreateConfig + create_full_text_index, or VectorCreateConfig + create_vector_index). HTTP endpoints for index creation are not yet available — see the note in API endpoints.

The recommended workflow is:

  1. Run the Fluree server (or use the Rust API directly) to create the BM25 / vector index on a shared storage path.
  2. Run fluree-search-httpd against the same --storage-root and --nameservice-path.
  3. Point clients (or the main Fluree server’s SearchDeploymentConfig) at the search service’s /v1/search endpoint.

Running the Search Service

fluree-search-httpd \
  --storage-root file:///var/fluree/data \
  --nameservice-path file:///var/fluree/ns \
  --listen 0.0.0.0:9090

Configuration options (CLI flag / env var):

Flag                  Env var                           Default       Description
--storage-root        FLUREE_STORAGE_ROOT               (required)    Path to Fluree storage (where indexes are persisted). file:// prefix optional.
--nameservice-path    FLUREE_NAMESERVICE_PATH           (required)    Path to nameservice data.
--listen              FLUREE_SEARCH_LISTEN              0.0.0.0:9090  Address and port to bind.
--cache-max-entries   FLUREE_SEARCH_CACHE_MAX_ENTRIES   100           Maximum cached indexes.
--cache-ttl-secs      FLUREE_SEARCH_CACHE_TTL_SECS      300           Cache TTL in seconds.
--max-limit           FLUREE_SEARCH_MAX_LIMIT           1000          Maximum results per query.
--default-timeout-ms  FLUREE_SEARCH_DEFAULT_TIMEOUT_MS  30000         Default request timeout.
--max-timeout-ms      FLUREE_SEARCH_MAX_TIMEOUT_MS      300000        Maximum allowed request timeout.

Vector search is feature-gated: build/run a binary that includes the vector feature to enable the vector backend. When enabled, GET /v1/capabilities reports "vector" in supported_query_kinds.

Docker Deployment

Run the search service in Docker against a shared volume that the main Fluree server also mounts:

docker run -d --name fluree-search \
  -p 9090:9090 \
  -v fluree-data:/var/lib/fluree \
  -e FLUREE_STORAGE_ROOT=/var/lib/fluree/storage \
  -e FLUREE_NAMESERVICE_PATH=/var/lib/fluree/ns \
  fluree/search-httpd:latest

For a full Compose example showing the main server + search service sharing a volume, see Running with Docker › Search service.

Search Protocol

The remote search service uses a JSON-based protocol on POST /v1/search. The request is the same shape regardless of backend; the query.kind discriminator selects BM25 vs. vector.

BM25 request:

{
  "protocol_version": "1.0",
  "graph_source_id": "article-search:main",
  "query": { "kind": "bm25", "text": "rust programming" },
  "limit": 20,
  "as_of_t": 150,
  "sync": false,
  "timeout_ms": 5000
}

Vector request (requires the vector feature):

{
  "protocol_version": "1.0",
  "graph_source_id": "doc-embeddings:main",
  "query": { "kind": "vector", "vector": [0.12, -0.34, ...], "metric": "cosine" },
  "limit": 10
}

A vector_similar_to variant takes a to_iri instead of an explicit vector — the server resolves the entity’s embedding from the source ledger.

Response:

{
  "protocol_version": "1.0",
  "index_t": 150,
  "hits": [
    { "iri": "http://example.org/doc1", "ledger_id": "docs:main", "score": 8.75 },
    { "iri": "http://example.org/doc2", "ledger_id": "docs:main", "score": 7.32 }
  ],
  "took_ms": 12
}

Endpoints:

  • POST /v1/search — execute a search query (BM25 or vector)
  • GET /v1/capabilities — protocol version, supported query kinds, max limit/timeout
  • GET /v1/health — health check

Time-travel: BM25 supports as_of_t (the service walks the manifest to find the newest snapshot ≤ t). Vector indexes are head-only and reject as_of_t.
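The manifest walk amounts to picking the largest snapshot t that does not exceed the requested as_of_t. A hedged sketch, assuming the manifest is simply a list of snapshot t values (the real manifest format is opaque here):

```rust
// Illustrative sketch of as_of_t snapshot selection — assumes the
// manifest is just a list of snapshot t values.
fn snapshot_for(manifest: &[u64], as_of_t: u64) -> Option<u64> {
    // Newest snapshot whose t is <= the requested as_of_t;
    // None means no snapshot is old enough to serve the query.
    manifest.iter().copied().filter(|&t| t <= as_of_t).max()
}

fn main() {
    let manifest = [10, 50, 150, 300];
    // A query at t = 200 is served from the t = 150 snapshot.
    println!("{:?}", snapshot_for(&manifest, 200));
}
```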

Auth: the standalone service does not enforce auth itself — front it with a reverse proxy (or a network policy) if it shouldn’t be publicly reachable. The auth_token field on the main server’s SearchDeploymentConfig is sent as a Bearer token, so any proxy you put in front can validate it.

Where this fits in your architecture

Two ways to use the search service today:

  1. Direct client → search service. Your application sends BM25 / vector requests straight to fluree-search-httpd and joins the resulting IRIs back to the main Fluree server’s query API on the application side. This is the path that works end-to-end today and is appropriate when search traffic dominates and you want it isolated from your main Fluree process.
  2. Main Fluree server → search service (transparent delegation). The query path inside the main server has the plumbing to consult a per-graph-source SearchDeploymentConfig and forward to a remote endpoint. This wiring is not yet exposed end-to-end through the create APIs — Bm25CreateConfig has no deployment builder, and the deployment field is not persisted to the nameservice config record by today’s create flow. Track this as a near-term gap; until then, query the search service directly.

Parity Guarantee

Both embedded and remote modes use identical:

  • Analyzer configuration (tokenization, stemming, stopwords)
  • BM25 scoring algorithm and parameters
  • Time-travel and sync semantics

Queries return identical results regardless of deployment mode.

Time-travel note: BM25 time-travel selection is implemented by BM25 itself via a manifest/root in storage. The nameservice stores only a head pointer to the latest BM25 manifest (an opaque address) and does not store BM25 snapshot history.

Graph Source Identity

BM25 graph sources are registered in the nameservice as @type: "f:GraphSourceDatabase" records:

  • ID format: {name}:{branch} (e.g., article-search:main)
  • Name: Cannot contain : (reserved for ID formatting)
  • Branch: Defaults to "main"
  • Dependencies: Tracked for the source ledger(s) the index draws from
  • Config: Stores the indexing query and BM25 parameters (k1, b)
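The {name}:{branch} convention above can be sketched as a small parser. Illustrative only — this is not Fluree's actual ID-parsing code:

```rust
// Illustrative parse of a graph source ID: "{name}:{branch}", where the
// name may not contain ':' and the branch defaults to "main".
fn parse_graph_source_id(id: &str) -> Option<(&str, &str)> {
    match id.split_once(':') {
        Some((name, branch))
            if !name.is_empty() && !branch.is_empty() && !branch.contains(':') =>
        {
            Some((name, branch))
        }
        None if !id.is_empty() => Some((id, "main")), // branch defaults to "main"
        _ => None, // empty parts or a stray ':' make the ID invalid
    }
}

fn main() {
    println!("{:?}", parse_graph_source_id("article-search:main"));
    println!("{:?}", parse_graph_source_id("article-search"));
}
```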

List ledgers and graph sources to discover BM25 graph sources:

curl http://localhost:8090/v1/fluree/ledgers

Fluree Memory

Persistent, searchable memory for AI coding assistants — built for real work.

Fluree Memory gives tools like Claude Code, Cursor, and VS Code Copilot a long-term project brain. Facts, decisions, and constraints are captured as structured memories, stored in a local Fluree ledger you control, and retrieved via ranked recall — either by the agent through MCP or directly from the CLI.

Because memories live in plain-text TTL files under your project (.fluree-memory/repo.ttl for the team, .fluree-memory/.local/user.ttl for you), they can be committed to git and shared across the team the same way code is. No cloud service, no opaque database, no data leaving your machine. Open the file, read it, grep it, diff it, review it in a PR.

Design philosophy

We initially built Fluree Memory for ourselves, with three goals: increase the velocity of development with LLMs, work seamlessly in a git workflow, and reduce token usage – in that order. We ended up with a simple knowledge organization model (it started out more complex) and leaned into the speed and power of our knowledge graph database. We found that most memory systems are designed for benchmarks or demos: they optimize for recall scores on synthetic tasks, ship your data to a hosted service, or bury context in a format only the tool can read, often running LLMs over git hooks or conversation turns that can burn more tokens than your actual coding session.

Fluree Memory has been refined by running it daily across real repositories — a 37-crate Rust workspace, multi-service TypeScript apps, real teams — and iterating on what actually gets used. The schema started with five memory kinds, four sensitivity levels, six sub-type fields, and bi-temporal validity. Usage data showed that 85% of memories were facts, “architecture” covered 81% of sub-types, and most optional fields were never set. So we simplified. Three kinds. Tags instead of sub-type taxonomies. Scope instead of a redundant sensitivity axis. Fewer decisions for the agent to make on every save means more saves actually happen.

The principles that came out of this:

  • Your repo, your data. Memories are local Turtle (TTL) files. They live alongside your code, flow through your existing review and version control, and never leave your infrastructure. There is no hosted component, no account, no telemetry.
  • Visible and auditable. Every memory is a block of Turtle you can read in any text editor. git diff shows exactly what changed. git blame shows who (or what) added it. No black boxes.
  • Simple enough to actually use. Three kinds — fact, decision, constraint — cover the real-world space. If a model has to deliberate over a five-way kind taxonomy plus sub-types on every save, it won’t save. A system that gets used at 80% fidelity beats one that’s theoretically perfect but sits idle.
  • Recalled, not regurgitated. Search with metadata re-ranking (tags, branch affinity, recency) pulls what’s relevant to the current task. The agent gets a handful of targeted memories, not a dump of everything that was ever stored.
  • Optimized for context tokens. Terse output, scoring thresholds, and explicit pagination instructions tell the LLM what’s next, with enough context to decide whether fetching more is worth it.
  • Iterated from production. The schema, the recall ranking, the tool descriptions — all of it has been refined based on real agent behavior across real codebases. Features that earned usage stay. Features that didn’t get cut.

Why

Every AI coding session starts from zero. The model doesn’t remember what was tried last week, which library the team chose and why, or the ten subtle gotchas that live in someone’s head. You either re-explain each time, stuff it all into a CLAUDE.md / AGENTS.md that bloats context, or ship agents that repeat mistakes.

Fluree Memory is:

  • Structured, not a wall of markdown. Memories have a kind (fact, decision, constraint), tags, scope, optional severity, rationale, and artifact references.
  • Recalled on demand via BM25 keyword-scored search over memory content, with metadata-based re-ranking (tags, refs, kind, branch, recency). The agent pulls only what’s relevant to the current task, keeping context small.
  • Versioned via git — update modifies in place (same ID, only changed fields); git log -p shows the full history. Use fluree create <name> --memory to import git history into a time-travel-capable Fluree ledger.
  • Scoped per-repo or per-user, so team knowledge stays shareable and personal preferences stay yours.
  • Local-first, stored in .fluree-memory/ as TTL — no cloud dependency, you own the data.
  • Secret-aware — content is scanned on write against a set of known credential patterns, and matches are redacted automatically.

Start here

How it fits

Fluree Memory is a feature of Fluree DB — installing the fluree CLI gives you both. If you only care about the memory tooling, you can still install and use Fluree as a single binary and never touch the rest of the database features.

Getting started

Three steps and you’re running:

  1. Install the fluree CLI — one binary, single command.
  2. Run the quickstart — initialize the memory store, add your first memory, recall it. 2 minutes.
  3. Wire it into your AI tool — pick yours:

Once the MCP server is configured, the AI tool gets memory_add and memory_recall tools and will start saving and retrieving memories without you having to reach for the CLI.

Install Fluree

Fluree Memory ships as part of the fluree CLI. Install the binary once and you have both the database and the memory tooling.

macOS / Linux (installer script)

curl --proto '=https' --tlsv1.2 -LsSf \
  https://github.com/fluree/db/releases/latest/download/fluree-db-cli-installer.sh | sh

Homebrew (macOS / Linux)

brew install fluree/tap/fluree

PowerShell (Windows)

Open PowerShell and run:

irm https://github.com/fluree/db/releases/latest/download/fluree-db-cli-installer.ps1 | iex

Open a new PowerShell session and verify with fluree --version. The binary is unsigned, so Windows SmartScreen may prompt on first run — click More info → Run anyway.

Pre-built binary

# Linux x86_64
curl -L https://github.com/fluree/db/releases/latest/download/fluree-db-cli-x86_64-unknown-linux-gnu.tar.xz | tar xJ

# macOS aarch64
curl -L https://github.com/fluree/db/releases/latest/download/fluree-db-cli-aarch64-apple-darwin.tar.xz | tar xJ

Build from source

If you have Rust installed:

git clone https://github.com/fluree/db
cd db
cargo install --path fluree-db-cli

Verify

fluree --version
fluree memory --help

You should see a list of memory subcommands: init, add, recall, update, forget, status, export, import, mcp-install.

Next: quickstart.

Quickstart

2 minutes to your first memory.

1. Initialize the memory store

From the root of a project you’d like to give memory to:

cd my-project
fluree memory init

This creates:

  • .fluree-memory/repo.ttl — team memories, meant to be committed to git
  • .fluree-memory/.local/user.ttl — your personal memories, gitignored
  • .fluree-memory/.gitignore — pre-configured to ignore .local/ (which holds your user scope plus the MCP log)
  • The __memory ledger inside your project’s .fluree/ store

init is idempotent; running it again is safe.

It will also detect any installed AI coding tools (Claude Code, Cursor, VS Code, Windsurf, Zed) and offer to wire up MCP. You can say no here and run fluree memory mcp-install later.

2. Add a memory

fluree memory add --kind fact \
  --text "Tests use cargo nextest, not cargo test" \
  --tags testing

Output:

Stored memory: mem:fact-01JDXYZ5A2B3C4D5E6F7G8H9J0

The ID is a ULID — sortable by creation time and unique across the store.
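The sortability follows from the ULID layout: the first 10 characters encode a 48-bit millisecond timestamp in Crockford base32, so lexicographic order on the string tracks creation order. A sketch of just that timestamp prefix (real ULIDs append 16 further characters of randomness):

```rust
// Crockford base32 alphabet used by ULIDs (no I, L, O, U).
const CROCKFORD: &[u8] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Encode a millisecond timestamp as the 10-character ULID prefix.
// 10 chars * 5 bits = 50 bits, enough to hold the 48-bit timestamp.
fn ulid_timestamp_prefix(ms: u64) -> String {
    (0..10)
        .rev()
        .map(|i| CROCKFORD[((ms >> (i * 5)) & 0x1F) as usize] as char)
        .collect()
}

fn main() {
    let earlier = ulid_timestamp_prefix(1_700_000_000_000);
    let later = ulid_timestamp_prefix(1_700_000_000_001);
    assert!(earlier < later); // string order == time order
    println!("{earlier} < {later}");
}
```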

3. Recall it

fluree memory recall "how do I run tests"

Output:

Recall: "how do I run tests" (1 match)

1. [score: 13.0] mem:fact-01JDXYZ5A2B3C4D5E6F7G8H9J0
   Tests use cargo nextest, not cargo test
   Tags: testing

Recall is BM25-ranked over the memory content and tags. No embeddings, no network — fast and deterministic.

4. Check status

fluree memory status
Memory Store Status
  Total memories: 1
  Total tags:     1
  By kind:
    fact: 1

That’s the loop

Add memories as you learn things. Recall them when you need them. Commit .fluree-memory/repo.ttl to share team knowledge.

Next

Set up Claude Code

Wire Fluree Memory into Claude Code so it saves and recalls memories for you.

Automatic setup

Easiest path: run init from your project root and accept the Claude Code prompt.

cd my-project
fluree memory init

When you see:

Detected AI coding tools:
  - Claude Code

Install MCP config for Claude Code? [Y/n]

…press Y. This runs claude mcp add under the hood to register the Fluree Memory MCP server at local (user) scope, and appends a short section to your CLAUDE.md telling Claude when to use it.

If you already ran init and skipped it:

fluree memory mcp-install --ide claude-code

What gets added

  • MCP server registered in ~/.claude.json — scope local
    • Command: fluree mcp serve --transport stdio
  • Project instructions in <repo>/CLAUDE.md — a short block explaining the memory tools

Verify

Restart Claude Code and start a session in the project. Ask:

What project memories do you have?

Claude should call memory_recall and return whatever you’ve added (initially nothing).

Try:

Remember: we use cargo nextest for tests, not cargo test.

Claude should call memory_add and report the stored ID.

Troubleshooting

The tool doesn’t appear. Confirm Claude Code sees the MCP server:

claude mcp list

You should see a fluree-memory entry. If not, re-run fluree memory mcp-install --ide claude-code.

Memories aren’t scoped to the repo. The Claude Code MCP entry doesn’t set FLUREE_HOME — the server walks up from its spawn CWD looking for a .fluree/ directory. In normal use this matches the workspace, but if Claude Code launched the server from outside your repo, memories can land in a global store. Fix by editing ~/.claude.json and adding an env block to the fluree-memory server entry:

"env": { "FLUREE_HOME": "/absolute/path/to/your/repo/.fluree" }

Then restart Claude Code.

The MCP log. The MCP server logs to <repo>/.fluree-memory/.local/mcp.log (the file is truncated on each server start). Tail it if something’s off:

tail -f .fluree-memory/.local/mcp.log

Set up Cursor

Wire Fluree Memory into Cursor so its agent mode saves and recalls memories for you.

Automatic setup

From your project root:

cd my-project
fluree memory init

Accept the Cursor prompt:

Install MCP config for Cursor? [Y/n]

Or, at any time:

fluree memory mcp-install --ide cursor

What gets written

  • <repo>/.cursor/mcp.json — repo-scoped MCP server config
  • <repo>/.cursor/rules/fluree_rules.md — a short rules file telling Cursor when to reach for memory_recall
{
  "mcpServers": {
    "fluree-memory": {
      "type": "stdio",
      "command": "fluree",
      "args": ["mcp", "serve", "--transport", "stdio"],
      "env": {
        "FLUREE_HOME": "${workspaceFolder}/.fluree"
      }
    }
  }
}

${workspaceFolder} is a Cursor config-interpolation token — the MCP server is always launched with FLUREE_HOME pointing at the current project, so memories stay scoped to the repo even if Cursor spawns the process from a different working directory.

Verify

Fully restart Cursor (Cmd-Q on macOS, not just reload window). Open the project and ask the agent:

Recall project memories for testing.

The agent should call memory_recall with the tag testing and return what’s in .fluree-memory/repo.ttl.

Troubleshooting

MCP isn’t connecting. Tail the MCP log:

tail -f .fluree-memory/.local/mcp.log

You should see a client initialized line within a few seconds of Cursor startup. If not, check .cursor/mcp.json exists and is valid JSON, then restart Cursor.

Memories going to a global store on macOS. If you see memories landing in ~/Library/Application Support/.fluree-memory/ instead of <repo>/.fluree-memory/, FLUREE_HOME isn’t being honored. Re-run fluree memory mcp-install --ide cursor from inside the repo and restart Cursor fully.

Rules file ignored. Cursor picks up .cursor/rules/*.md on project open. After editing, reload the window.

Set up VS Code (Copilot)

Wire Fluree Memory into VS Code with GitHub Copilot Chat so it can save and recall memories through MCP.

Automatic setup

From your project root:

fluree memory init

Accept the VS Code prompt, or run:

fluree memory mcp-install --ide vscode

What gets written

  • <repo>/.vscode/mcp.json — repo-scoped MCP server config (key: servers)
  • <repo>/.vscode/fluree_rules.md — rules file you can reference from your prompts
{
  "servers": {
    "fluree-memory": {
      "type": "stdio",
      "command": "fluree",
      "args": ["mcp", "serve", "--transport", "stdio"]
    }
  }
}

Unlike the Cursor config, this entry does not set FLUREE_HOME — VS Code normally spawns the server from the workspace root, so the walk-up logic in fluree mcp serve finds .fluree/ on its own. If you need to pin the location explicitly (e.g. the server is ending up in a global store), add an env block pointing at the absolute path to <repo>/.fluree/.

Verify

Open the project in VS Code with Copilot Chat enabled. In chat (agent mode), ask:

Call memory_recall for “testing”.

Copilot should invoke the tool and return matching memories. On first use VS Code may prompt to allow the MCP server — approve it.

Troubleshooting

Tail .fluree-memory/.local/mcp.log and fully restart VS Code if something’s off. If memory is landing in a global store rather than the repo, add an explicit env.FLUREE_HOME pointing at <repo>/.fluree/ in .vscode/mcp.json and restart.

Set up Windsurf

Wire Fluree Memory into Windsurf (Codeium’s IDE).

Automatic setup

fluree memory init

Accept the Windsurf prompt, or run:

fluree memory mcp-install --ide windsurf

What gets written

Windsurf uses a global MCP config:

  • ~/.codeium/windsurf/mcp_config.json — a fluree-memory entry is merged under mcpServers
{
  "mcpServers": {
    "fluree-memory": {
      "command": "fluree",
      "args": ["mcp", "serve", "--transport", "stdio"]
    }
  }
}

Because the config is global, it’s wired once and every Windsurf project can use it. The MCP server figures out which repo it’s serving by walking up from its spawn CWD until it finds a .fluree/ directory; in normal use Windsurf spawns it from the workspace root so this works without extra configuration. No FLUREE_HOME is set by default.

Verify

Restart Windsurf and open your project. In Cascade (Windsurf’s agent chat):

Use memory_recall to find testing patterns.

The agent should invoke the tool.

Troubleshooting

If memories end up in a global store instead of <repo>/.fluree-memory/, Windsurf is likely spawning the server from outside the workspace. Edit ~/.codeium/windsurf/mcp_config.json and add an explicit absolute path:

"env": { "FLUREE_HOME": "/absolute/path/to/repo/.fluree" }

${workspaceFolder} interpolation is not guaranteed in all Windsurf versions — when in doubt, use an absolute path and switch it per project.

Set up Zed

Wire Fluree Memory into Zed’s agent via MCP.

Automatic setup

fluree memory init

Accept the Zed prompt, or run:

fluree memory mcp-install --ide zed

What gets written

  • <repo>/.zed/settings.json — the context_servers key gets a fluree-memory entry
{
  "context_servers": {
    "fluree-memory": {
      "command": "fluree",
      "args": ["mcp", "serve", "--transport", "stdio"]
    }
  }
}

No FLUREE_HOME is set by default — the MCP server walks up from Zed’s spawn CWD to find the workspace’s .fluree/. If you need to pin it explicitly, add an env block alongside command/args with an absolute path.

Caveat: JSONC

Zed’s settings.json often contains // comments (JSONC). mcp-install detects this and will skip the automatic write rather than risk corrupting your settings — it prints a hint telling you to add the block by hand.

If you’d like to pre-empt that, strip comments from .zed/settings.json before running mcp-install, or paste the block yourself.

Verify

Restart Zed. In the agent panel:

Recall project memories about testing.

The agent should call memory_recall via the fluree-memory context server.

Concepts

Short, self-contained explanations of the ideas behind Fluree Memory. Read these once; they’ll save you time when you’re reading CLI reference or wiring a new IDE.

What is a memory?

A memory is a single structured record of something worth remembering about a project. Every memory has:

  • Content — the text itself (“Tests use cargo nextest, not cargo test”)
  • Kind — what sort of thing it is
  • Tags — free-form keywords for filtering
  • Scope — repo (shared) or user (yours)
  • Refs — optional file or artifact pointers
  • Timestamps — when it was created

Everything else (severity, rationale, alternatives) is optional metadata that can appear on any kind.

The three kinds

Memories are typed. The kind tells future-you (and future-agents) how to interpret the content.

fact

Something that is objectively true about the project.

“The indexer uses postcard encoding for on-disk format.” “We run PostgreSQL 16 in production.” “The BM25 code lives in fluree-db-indexer/src/bm25.rs.” “Error pattern defined here -> fluree-db-core/src/error.rs”

Use facts liberally. They’re the default and make up the bulk of a typical memory store. Use tags to categorize them (e.g. architecture, dependency, configuration). Facts can carry --rationale and --alternatives when you want to explain why something is the way it is.

decision

A choice the team made, ideally with why and what was considered.

“Use postcard for compact index encoding. Why: no_std compatible, smaller than bincode. Alternatives: bincode, CBOR, MessagePack.”

Decisions are what distinguishes a project with institutional knowledge from one where people keep re-litigating settled choices. Capture them with --rationale and --alternatives:

fluree memory add --kind decision \
  --text "Use postcard for compact index encoding" \
  --rationale "no_std compatible, smaller output than bincode" \
  --alternatives "bincode, CBOR, MessagePack" \
  --refs fluree-db-indexer/

constraint

A rule — something that must, should, or is preferred to be followed. Constraints carry a severity.

must — “Never commit secrets; use environment variables.”
should — “Integration tests run in a real Postgres, not SQLite.”
prefer — “Name errors with the module prefix (QueryError, not Error).”

fluree memory add --kind constraint \
  --text "Never suppress dead code with _underscore prefix; delete it" \
  --severity must \
  --tags code-style \
  --rationale "Underscore-prefixed names hide code from future discovery"

When an agent is about to do something, constraints are the first thing it should recall. Like facts and decisions, constraints can carry --rationale and --alternatives to explain the reasoning behind the rule.

Which kind should I use?

You have…                              Use kind
A verifiable truth                     fact
A choice and its reasoning             decision
A rule that must/should be followed    constraint
A pointer to code / a file             fact (with --refs)
A soft taste or convention             fact or constraint --severity prefer

When in doubt: fact. The kind can always be refined later via update. All three kinds support --rationale and --alternatives for capturing the why.

Repo vs user memory

Fluree Memory has two scopes, and they live in separate files:

Scope  File                            Git            Visible to
repo   .fluree-memory/repo.ttl         ✅ commit it    the whole team
user   .fluree-memory/.local/user.ttl  ❌ gitignored   just you

Scope is set at write time (--scope repo or --scope user) and defaults to repo. Once set, it determines which TTL file the memory is written to.

Layout

After fluree memory init inside a project:

my-project/
├── .fluree/                      # Fluree DB storage for the __memory ledger
├── .fluree-memory/
│   ├── .gitignore                # contents: ".local/"
│   ├── repo.ttl                  # team memories — COMMIT THIS
│   └── .local/                   # ignored by the .gitignore above
│       ├── user.ttl              # your personal memories
│       ├── mcp.log               # MCP server log
│       └── build-hash            # content hash used to detect external TTL edits
└── (your code)

The .fluree-memory/.gitignore is written by init and handles the split for you. Commit the whole .fluree-memory/ directory; git will skip .local/ automatically.

When to use which

Repo scope (default):

  • Facts about the codebase (“tests use cargo nextest”)
  • Team decisions with rationale
  • Constraints everyone must follow
  • File/symbol pointers via --refs (“X lives at Y”)

User scope:

  • Your IDE quirks
  • Personal conventions the team hasn’t agreed on
  • Scratch notes while you’re exploring
  • Anything you’d be embarrassed to commit

Changing scope after the fact

You can’t move a memory between scopes directly. If you stored something as repo that should be user-only:

fluree memory forget <id>
fluree memory add --scope user --kind <kind> --text "..."

Recall sees both

By default, fluree memory recall and the memory_recall MCP tool return matches from both scopes — your personal notes and the team’s are merged in the result set. Filter with --scope repo or --scope user if you need to isolate one.

Sharing with the team

Memory becomes a shared asset as soon as you commit .fluree-memory/repo.ttl. A teammate who clones the repo and runs fluree memory init gets the ledger populated from the committed TTL automatically — no manual import step.

Conflicts on repo.ttl resolve like any other text file. TTL is line-oriented per-triple, so most merges are clean; occasionally you’ll see a conflict marker in the middle of a memory’s fields and need to pick one side.

See Team workflows for the full story.

Updates and forgetting

Memories are updated in place. When you update a memory, the same ID is kept and only the changed fields are modified. History is tracked via git, not via internal versioning.

update modifies in place

fluree memory update mem:fact-01JDXYZ... --text "Tests use cargo nextest with --no-fail-fast"

Output:

Updated: mem:fact-01JDXYZ...

The memory keeps its original ID. The TTL file is rewritten with the new content, and git records what changed:

git diff .fluree-memory/repo.ttl
 mem:fact-01JDXYZ a mem:Fact ;
-    mem:content "Tests use cargo nextest" ;
+    mem:content "Tests use cargo nextest with --no-fail-fast" ;
     mem:tag "cargo" ;

forget retracts

forget is different from update. It retracts the memory’s triples — the memory stops existing entirely.

fluree memory forget mem:fact-01JDXYZ...
Forgotten: mem:fact-01JDXYZ...

Rule of thumb:

You think…                                 Use
“This was wrong from the start”            forget
“This was right but the world changed”     update
“I never want anyone to see this again”    forget

History via git

Both update and forget rewrite the TTL file, and git tracks the full history. To see how a memory evolved:

git log -p .fluree-memory/repo.ttl

This shows every change — what was added, updated, or forgotten, and when.

Time-travel over memory history

If you want to query memory history with Fluree’s time-travel capabilities, you can import your git-tracked memory history into a Fluree ledger:

fluree create my-memory-ledger --memory

This replays each git commit to .fluree-memory/repo.ttl as a Fluree transaction, giving you a full time-travel-capable ledger over your memory history. Use --no-user to exclude user.ttl from the import.

Recall and ranking

recall is how you get memories out. It’s a keyword query against an inverted index with BM25 scoring — fast, local, and deterministic.

The basics

fluree memory recall "how do I run tests"

The query string is tokenized and matched against each memory’s content via a BM25-scored fulltext index. Tags, artifact refs, kind, branch, and recency contribute as re-rank bonuses on top of the BM25 score — they’re not part of the fulltext match itself. Results are sorted by combined score (higher = better) and capped at --limit (default: 3).

Recall: "how do I run tests" (2 matches)

1. [score: 13.0] mem:fact-01JDXYZ...
   Tests use cargo nextest, not cargo test
   Tags: testing

2. [score: 8.0] mem:fact-01JDABC...
   Integration tests use assert_cmd + predicates
   Tags: testing

What BM25 rewards

BM25 scores a memory’s content higher when:

  • Query terms appear in the content.
  • Those terms are rare in the overall store — a match on “postcard” beats a match on “the”.
  • The matched terms are in a shorter memory — density matters.
  • Multiple distinct terms from the query match (not the same term repeated).

There are no embeddings, no semantic matching — just lexical overlap with smart weighting. If you mean “tests” but phrase it as “unit tests” or “testing”, BM25 catches that because the stems overlap; it won’t catch “QA” unless the content mentions it.
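To make the reward structure concrete, here is a minimal, illustrative re-implementation of classic BM25 in Python. This is not Fluree Memory’s actual code — the k1/b parameters and whitespace tokenization are assumptions — but it exhibits the same behaviors listed above:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Classic BM25 over whitespace-tokenized docs (illustrative only)."""
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n_docs = len(tokenized)
    df = Counter()                      # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in set(query.lower().split()):
            if term not in tf:
                continue
            # Rare terms get a larger idf; dense matches in short docs win.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

A query like "cargo tests" scores a memory mentioning cargo and tests well above one that mentions neither, which scores zero — lexical overlap is all there is.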

Re-rank bonuses

After BM25 produces content scores, Fluree Memory adds small bonuses:

  • Tag hit: +10 per tag that contains a query word.
  • Artifact ref hit: +8 per ref path that contains a query word.
  • Kind word in query: +6 if the query mentions the memory’s kind (“constraint”, “decision”, etc.).
  • Branch match: +3 if the memory was captured on the current git branch.
  • Recency: +2 for memories <7 days old, +1 for <30 days.

If BM25 returns no hits, recall falls back to metadata-only scoring using these same bonuses so a well-tagged memory can still surface on a content miss.
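As a sketch, the bonus layer can be modeled like this. The weights come from the list above; how matching works in detail (substring vs. token, case handling) is an assumption:

```python
from datetime import timedelta

def rerank_bonus(memory, query_words, current_branch, now):
    """Metadata bonuses layered on the BM25 content score.
    Weights mirror the documented values; the exact matching rules
    (substring vs. token) are assumptions."""
    bonus = 0.0
    # +10 per tag containing a query word, +8 per matching ref path.
    bonus += 10 * sum(1 for tag in memory["tags"]
                      if any(w in tag for w in query_words))
    bonus += 8 * sum(1 for ref in memory["refs"]
                     if any(w in ref for w in query_words))
    if memory["kind"] in query_words:   # e.g. "constraint" in the query
        bonus += 6
    if memory["branch"] == current_branch:
        bonus += 3
    age = now - memory["created_at"]    # recency bonus
    if age < timedelta(days=7):
        bonus += 2
    elif age < timedelta(days=30):
        bonus += 1
    return bonus
```

A day-old memory tagged testing, captured on the current branch, picks up 10 + 3 + 2 = 15 on top of whatever BM25 gave its content.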

Filters

Filters narrow the candidate set before scoring:

# Only constraints tagged "errors"
fluree memory recall "handling" --kind constraint --tags errors

# Only repo-scoped memories
fluree memory recall "deployment" --scope repo

# Page through results
fluree memory recall "tests" --limit 10 --offset 10

Common filter recipes:

| You want… | Flags |
|---|---|
| Team-only (ignore personal) | --scope repo |
| Just the hard rules | --kind constraint |
| Just the decisions with reasoning | --kind decision |
| Pointers to code | --kind fact --tags <domain> (with --refs) |

Output formats

fluree memory recall "tests"                 # text — for humans
fluree memory recall "tests" --format json   # JSON — for scripts
fluree memory recall "tests" --format context  # XML — for LLM injection

The context format produces a compact XML block designed to be pasted into an agent’s context window:

<memory-context>
  <memory id="mem:fact-01JDXYZ..." kind="fact" score="13.0">
    <content>Tests use cargo nextest, not cargo test</content>
    <tags>testing</tags>
  </memory>
  <pagination shown="1" offset="0" total_in_store="13" />
</memory-context>

When results are cut off, the pagination element embeds a human-readable hint telling the agent how to get more:

<pagination shown="3" offset="0" limit="3" total_in_store="13">
  Results 1–3. Use offset=3 to retrieve more.
</pagination>

This pattern is why Fluree Memory is practical to use with an agent: a small, ranked slice goes into context, and the agent can ask for more if the top hits aren’t enough.
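An agent harness can consume the block with any XML parser. A minimal Python sketch over a sample block (the ID and counts are illustrative):

```python
import xml.etree.ElementTree as ET

# Sample block in the documented shape (ID and counts are illustrative).
context = """<memory-context>
  <memory id="mem:fact-01JDXYZ" kind="fact" score="13.0">
    <content>Tests use cargo nextest, not cargo test</content>
    <tags>testing</tags>
  </memory>
  <pagination shown="3" offset="0" limit="3" total_in_store="13">
    Results 1-3. Use offset=3 to retrieve more.
  </pagination>
</memory-context>"""

root = ET.fromstring(context)
memories = [(m.get("id"), float(m.get("score")), m.findtext("content"))
            for m in root.findall("memory")]
page = root.find("pagination")
# Another page exists when offset + shown < total_in_store.
has_more = (int(page.get("offset")) + int(page.get("shown"))
            < int(page.get("total_in_store")))
```

In practice the agent doesn’t need to parse at all — the block is designed to be pasted into context verbatim — but the attributes are there if a harness wants structured access.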

How this compares to other approaches

| Approach | Cost | Quality | Works offline |
|---|---|---|---|
| BM25 (Fluree Memory) | free, instant | high for keyword overlap | yes |
| Embedding search | paid + latency | high for paraphrase | usually no |
| Stuff-it-all-in-CLAUDE.md | free | context blow-up | yes |

For developer memory — where the agent knows the words for what it’s looking for — BM25 is a very good fit. If you later want semantic recall, Fluree DB itself ships a vector search feature that the memory store could layer on.

MCP server

Fluree Memory exposes its functionality over Model Context Protocol so AI coding agents can use it natively. The MCP server is bundled with the fluree CLI — no separate install.

Start it manually

fluree mcp serve --transport stdio

In practice you never start it manually — your IDE launches it. fluree memory mcp-install writes the IDE-specific config that does the spawning. See mcp-install for the per-IDE details.

Tools exposed

The server exposes these tools to the agent:

memory_recall

Search for relevant memories.

{
  "name": "memory_recall",
  "arguments": {
    "query": "how do I run tests",
    "limit": 5,
    "offset": 0,
    "kind": "fact",
    "tags": ["testing"],
    "scope": "repo"
  }
}

Returns XML context-formatted output (see Recall and ranking).

memory_add

Store a new memory. The content field is named content (not text).

{
  "name": "memory_add",
  "arguments": {
    "kind": "fact",
    "content": "Tests use cargo nextest, not cargo test",
    "tags": ["testing"],
    "scope": "repo"
  }
}

Other optional arguments: refs, severity, rationale, alternatives. Returns the new memory ID.

memory_update

Patch an existing memory in place. The memory keeps its ID; only the fields you pass are changed. Use content (not text) for the new body.

{
  "name": "memory_update",
  "arguments": {
    "id": "mem:fact-01JDXYZ...",
    "content": "Tests use cargo nextest with --no-fail-fast"
  }
}

Also accepts tags, refs, rationale, alternatives.

memory_forget

Retract a memory permanently.

{
  "name": "memory_forget",
  "arguments": { "id": "mem:fact-01JDXYZ..." }
}

memory_status

Return a summary of the store — totals by kind and a preview of recent memories. Agents are encouraged to call this first to discover what topics to query.

kg_query

Run a raw SPARQL SELECT against the __memory ledger. Advanced escape hatch — prefer memory_recall for ranked search.

{
  "name": "kg_query",
  "arguments": {
    "query": "PREFIX mem: <https://ns.flur.ee/memory#> SELECT ?id ?content WHERE { ?id a mem:Constraint ; mem:content ?content } LIMIT 20"
  }
}

Where the store lives

When the MCP server starts, it picks its Fluree directory the same way the CLI does:

  1. If $FLUREE_HOME is set, that directory is used (unified mode).
  2. Otherwise it walks up from the spawn CWD looking for an existing .fluree/.
  3. If neither is found, it falls back to the platform’s global config/data directories.

When the server is in unified mode (cases 1 and 2), the memory store lives in <dir>/../.fluree-memory/ and is shared with the CLI. In global mode, file-based sync is disabled and memories live only in the global ledger.

This matters for IDE integrations: the Cursor config that mcp-install writes explicitly sets FLUREE_HOME=${workspaceFolder}/.fluree so memory stays scoped to the current repo regardless of Cursor’s CWD. The other supported IDEs (Claude Code, VS Code, Windsurf, Zed) rely on the spawn CWD plus the walk-up behavior — which normally works, but can land in a global store if the IDE spawns the MCP server from outside the repo. If you see that, set FLUREE_HOME manually in the MCP config or re-run mcp-install from inside the repo root.
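The resolution order can be sketched in a few lines (illustrative — the actual global fallback path is platform-specific):

```python
from pathlib import Path

def resolve_fluree_dir(cwd, env):
    """Sketch of the documented resolution order; the real global
    fallback path varies by platform."""
    # 1. Explicit override wins (unified mode).
    if "FLUREE_HOME" in env:
        return Path(env["FLUREE_HOME"]), "unified"
    # 2. Walk up from the spawn CWD looking for an existing .fluree/.
    for parent in [Path(cwd), *Path(cwd).parents]:
        candidate = parent / ".fluree"
        if candidate.is_dir():
            return candidate, "unified"
    # 3. Fall back to the platform's global config/data directory.
    return Path.home() / ".fluree", "global"
```

The practical takeaway: if the walk-up never finds a .fluree/, you silently land in global mode — which is exactly the failure mode the FLUREE_HOME override exists to prevent.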

The rules file

Alongside the MCP server, mcp-install writes (or appends to) a short rules file for IDEs that support one:

| IDE | Rules file |
|---|---|
| Claude Code | Short section appended to <repo>/CLAUDE.md |
| Cursor | <repo>/.cursor/rules/fluree_rules.md |
| VS Code | <repo>/.vscode/fluree_rules.md |
| Windsurf, Zed | None written — you can add your own guidance manually |

The file tells the agent when to reach for memory tools — e.g. at the start of a task (memory_recall first), after capturing something reusable (memory_add), and not to re-ask the user for things already memorized. You can edit it to customize the agent’s instincts; see Customizing the rules file.

Secrets and sensitivity

Memory is meant to be written freely and committed to git. That only works if secrets never land in there.

Automatic redaction

Every memory_add / fluree memory add runs the input through a secret detector before storage. If the content matches patterns for API keys, passwords, tokens, or connection strings, the sensitive substrings are replaced with [REDACTED] and a warning is printed:

  warning: secrets detected in content — storing redacted version.
  Original content contained sensitive data that was replaced with [REDACTED].
Stored memory: mem:fact-01JDXYZ...

Patterns covered include:

  • AWS access key IDs (AKIA…)
  • GitHub personal access tokens (ghp_…, gho_…, ghu_…, ghs_…, ghr_…)
  • OpenAI keys (sk-…) and Anthropic keys (sk-ant-…)
  • Fluree API keys (flk_…)
  • Generic api_key=… / apikey: … assignments
  • password=… / passwd: … assignments
  • Connection strings with inline credentials (postgres://, mysql://, mongodb://, redis://, amqp:// containing user:pass@host)
  • PEM private keys (-----BEGIN … PRIVATE KEY-----)
  • Bearer tokens (Bearer eyJ…)
  • JWT tokens (three base64 segments separated by dots)

Redaction preserves enough context that the memory still makes sense (e.g. “Use the API key [REDACTED] from 1Password”) while the actual value never reaches the TTL file.

The detector is pattern-based, not entropy-based — well-disguised secrets outside these patterns can still slip through. Treat redaction as a safety net, not a guarantee.
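A sketch of how pattern-based redaction works, with rough approximations of a few of the documented patterns (the real detector’s regexes differ and cover more cases):

```python
import re

# Rough approximations of a few documented patterns — illustrative only.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                 # AWS access key ID
    re.compile(r"gh[pousr]_[A-Za-z0-9]{36,}"),       # GitHub tokens
    re.compile(r"sk-(?:ant-)?[A-Za-z0-9\-_]{20,}"),  # OpenAI / Anthropic keys
    # Connection strings with inline user:pass@host credentials.
    re.compile(r"(?:postgres|mysql|mongodb|redis|amqp)://[^:\s]+:[^@\s]+@\S+"),
]

def redact(text):
    """Replace matched substrings with [REDACTED]; report if any hit."""
    found = False
    for pat in SECRET_PATTERNS:
        text, n = pat.subn("[REDACTED]", text)
        found = found or n > 0
    return text, found
```

Note the failure mode this implies: anything outside the pattern list passes through untouched, which is why redaction is a safety net rather than a guarantee.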

Scope as the privacy boundary

Memory visibility is controlled by scope (repo or user), not by a separate sensitivity level. Repo-scoped memories live in .fluree-memory/repo.ttl and are committed to git, so they’re visible to anyone with repo access. User-scoped memories live in .fluree-memory/.local/user.ttl, which is gitignored.

If something is client-specific or team-internal, put it in user scope or use a private sub-repo. The scope mechanism plus secret-detection on ingest handles what a separate sensitivity field used to.

What if I slip?

If something slipped past the detector and into repo.ttl before you noticed:

  1. fluree memory forget <id> — retracts the memory.
  2. Run git log -p .fluree-memory/repo.ttl and use git filter-repo (or the BFG) to scrub the history if the value leaked there too.
  3. Rotate the credential at the source. Redaction in memory doesn’t rotate keys.

Treat this the same way you’d treat accidentally committing .env — the git history is the hard part, the file is the easy part.

Guides

Task-oriented walkthroughs for common situations.

Looking for end-to-end setup instead? See Getting started.

Team workflows: sharing memory via git

The whole point of repo.ttl is that memory becomes a team asset — captured once by whoever learns it, available to every teammate and every AI agent forever.

The happy path

  1. Someone runs fluree memory init in the repo and commits .fluree-memory/ (minus the gitignored .local/).
  2. Teammates pull and run fluree memory init once. The init picks up the committed repo.ttl and populates the ledger from it. No manual import.
  3. As people add memories, .fluree-memory/repo.ttl changes in the working tree. Commit it like any other file.
  4. Pulls bring in new memories automatically — fluree memory recall (and the MCP server) read the ledger, which stays in sync with the TTL file.

That’s it. No server, no sync daemon, no API tokens. Git is the sync mechanism.

What to commit

✅ Commit:

  • .fluree-memory/repo.ttl
  • .fluree-memory/.gitignore
  • Any IDE config MCP-install created: .cursor/mcp.json, .cursor/rules/fluree_rules.md, .vscode/mcp.json, .vscode/fluree_rules.md, .zed/settings.json

❌ Don’t commit:

  • .fluree-memory/.local/user.ttl — your personal memories (handled by .fluree-memory/.gitignore)
  • .fluree-memory/.local/mcp.log — noisy and personal (handled by .fluree-memory/.gitignore)
  • .fluree/ — the Fluree storage dir, can be re-hydrated from repo.ttl (add this to your project’s root .gitignore — .fluree-memory/.gitignore only covers its own subtree)

Reviewing memory in PRs

Treat repo.ttl changes like documentation changes in code review:

  • New memory? Is the kind right? Is the wording accurate? Are the tags useful?
  • Updated memory? Is the new content better (not just different)?
  • Forgotten memory? Was it really wrong, or should it have been updated instead?

Memories are serialized as subject blocks with one predicate per line, so most diffs are readable.

Merge conflicts

Memories in repo.ttl are sorted by (branch, id) — memories from the same git branch cluster together, and different branches land in different regions of the file. This means two feature branches that each add memories will almost never conflict, because their blocks insert at different positions in the file.

The branch name is captured automatically when a memory is created, so memories from feature/auth sort separately from memories created on feature/indexer. Within each branch group, memories are ordered chronologically (ULID encodes creation time).
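A toy illustration of why this ordering keeps branches apart (the IDs here are made up; real ULIDs are 26 characters):

```python
# Hypothetical memories (real IDs are 26-character ULIDs).
memories = [
    {"id": "mem:fact-01JDXYZAAA", "branch": "main"},
    {"id": "mem:fact-01JDXYZCCC", "branch": "feature/auth"},
    {"id": "mem:fact-01JDXYZBBB", "branch": "feature/auth"},
    {"id": "mem:fact-01JDXYZ000", "branch": "feature/indexer"},
]
# Serialization order: branch first, then ULID (chronological, because
# a ULID's leading characters encode its creation timestamp).
ordered = sorted(memories, key=lambda m: (m["branch"], m["id"]))
# Each branch's memories form a contiguous region of the file, so
# additions on different branches rarely touch the same lines.
```

Two branches that each append memories therefore modify disjoint regions of repo.ttl, and git merges them without conflict.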

When conflicts do occur, they’re usually because two branches modified the same existing memory (via update) or both worked on the same branch. These are typically clean to resolve:

<<<<<<< HEAD
mem:fact-01JD... a mem:Fact ;
    mem:content "Tests use cargo nextest" ;
    mem:tag "cargo" ;
    mem:tag "testing" ;
    ...
=======
mem:fact-01JD... a mem:Fact ;
    mem:content "Tests use cargo nextest with --no-fail-fast" ;
    mem:tag "testing" ;
    ...
>>>>>>> their-branch

Pick the version you want or combine them, then re-run fluree memory status to make sure the store parses cleanly. If the merged file is genuinely messy, a cleaner path is to accept one side wholesale and then apply the other side’s changes via fluree memory add / update on top.

Onboarding a new teammate

When someone new clones the repo:

git clone git@github.com:team/project
cd project
fluree memory init

After init, they can immediately:

fluree memory recall "testing" -n 10

…and get everything the team has captured. No setup beyond installing fluree.

Going further

  • Keep PR review short by tagging memories with their domain (auth, indexer, docs, etc.) so reviewers can filter.
  • Use constraint --severity must sparingly — must constraints are the “policy layer” of memory. Use should or prefer for matters of taste.
  • Periodically fluree memory status and prune stale memories with forget. The store should feel curated, not a dumping ground.

Customizing the rules file

When you run fluree memory mcp-install, a short “rules file” gets written alongside the MCP server config. This file tells your AI tool when and how to use the memory tools — things the tool definitions alone don’t express.

Where it lives

| IDE | Rules file |
|---|---|
| Claude Code | Section appended to <repo>/CLAUDE.md |
| Cursor | <repo>/.cursor/rules/fluree_rules.md |
| VS Code | <repo>/.vscode/fluree_rules.md |
| Windsurf | Not written — add your own guidance to Windsurf’s memory / rules UI |
| Zed | Not written — add your own guidance via Zed’s assistant settings |

The canonical source for the default text lives in fluree-db-memory/rules/fluree_rules.md in the repo; the Cursor and VS Code installers copy it verbatim. The Claude Code installer appends a short variant directly to CLAUDE.md. Windsurf and Zed don’t have a conventional per-project rules-file slot that mcp-install targets automatically — the paragraph below is a reasonable starting point if you want to paste one in yourself.

What the default says

A minimal set of instructions along these lines:

Before starting a task: call memory_recall with a query describing what you’re about to do. Review the top matches for constraints, decisions, and relevant facts.

After learning something reusable: call memory_add with the appropriate kind:

  • fact — verifiable truths about the codebase (use --refs for file pointers)
  • decision — choices with rationale (use --rationale)
  • constraint — rules with severity (use --severity must/should/prefer)

Don’t re-ask the user for things that are already in memory.

Customizing

Edit the file freely. Common tweaks:

  • Add domain-specific guidance: “When working on the indexer, always recall with the indexer tag first.”
  • Tighten the defaults: “Only call memory_add for memories that will apply in future sessions — not for task-specific scratch.”
  • Shape the kinds: “Use fact with --refs when the memory is really a pointer to a file or symbol.”

Reloading

  • Cursor / VS Code: reload the window after editing.
  • Claude Code: appending to CLAUDE.md takes effect on the next session.
  • Zed: the agent reads settings on connection — reconnect after editing.

Keeping team customizations shared

If you edit the rules file and like what you got, commit it. Teammates get your tuning automatically on their next pull. The rules file is just markdown — treat it like any other piece of team guidance.

Migrating from plain-markdown memory

Many teams start with one big markdown file that their AI tool reads on every session — CLAUDE.md, AGENTS.md, .cursorrules, .windsurfrules, or a section in README.md. These files work until they don’t: they bloat context, mix levels (architectural rules next to “the CI flag is --all-features”), and rot silently.

Here’s a pragmatic migration from that world to structured memory.

Phase 1: leave the markdown alone

You don’t have to delete anything to start using Fluree Memory. Add memories for new things you learn while keeping the old file around. After a week or two of active use, you’ll have a sense of which things belong where.

fluree memory init
# ...work, capture things as they come up...
fluree memory add --kind constraint --severity must \
  --text "All public fns must have doc comments" --tags code-style

Phase 2: categorize the markdown file

Open the old file and go paragraph by paragraph. For each chunk, ask:

| Chunk type | Where it goes |
|---|---|
| High-level overview / architecture prose | Stays in markdown (README, ARCHITECTURE.md) |
| Rules (“do this”, “don’t do that”) | constraint memories with --severity |
| Choices + reasoning | decision memories with --rationale |
| Named quirks / gotchas | fact memories |
| “Look here for X” | fact memories with --refs |
| Personal preferences | fact memories (--scope user usually) |

The markdown file that’s left after this should be genuinely about framing — the 30-second project tour — not a knowledge base.

Phase 3: move the categorized chunks

Turn each chunk into a memory add call. Tag consistently so things group later:

fluree memory add --kind constraint --severity must \
  --text "Never commit secrets; use environment variables" \
  --tags security,secrets

fluree memory add --kind decision \
  --text "Use postcard for index encoding" \
  --rationale "no_std compatible, smaller than bincode" \
  --alternatives "bincode, CBOR, MessagePack" \
  --refs fluree-db-indexer/ \
  --tags indexer,encoding

fluree memory add --kind fact \
  --text "Error pattern defined here" \
  --refs fluree-db-core/src/error.rs \
  --tags errors

If you want to script it, pipe content into fluree memory add on stdin (with --kind / --tags set per-line). add reads stdin when --text is omitted:

echo "The index format uses postcard encoding" \
  | fluree memory add --kind fact --tags indexer

Phase 4: trim the old file

Once the chunks are in memory, delete them from the markdown. What’s left is your high-level orientation doc, which is fine.

Leave a pointer at the top:

> Detailed conventions, rules, and decisions are in Fluree Memory.
> Use `memory_recall` from an MCP-enabled IDE, or `fluree memory recall "..."` from the shell.

Phase 5: review

Run fluree memory status and fluree memory recall "" -n 50 to eyeball everything. Look for:

  • Duplicates — memories that say nearly the same thing with different wording.
  • Mis-categorized kinds — a “decision” with no rationale is really a fact.
  • Over-long content — memories should be paragraphs at most, not pages. Break up if needed.

Why this is worth doing

| Plain markdown | Structured memory |
|---|---|
| Entire file loaded every session | Only relevant matches loaded |
| No filtering | Filter by kind, tag, scope |
| No history | Full history via git log -p |
| Hard to share a slice | export + jq / curated recall |
| Drifts silently | status visibility + curation flow |

You get a knowledge base the team can actually maintain — and that costs fewer tokens per session than the markdown file it replaces.

CLI reference

The fluree memory subcommands, in rough workflow order.

| Command | Purpose |
|---|---|
| init | Create the memory store and optionally configure MCP for detected AI tools |
| add | Store a new memory |
| recall | Search and rank relevant memories |
| update | Update an existing memory in place |
| forget | Retract a memory permanently |
| status | Summary of the store (totals, tags, kinds) |
| export / import | Round-trip memories as JSON |
| mcp-install | Install MCP config for an IDE |

Several subcommands take a --format flag (text for humans, json for scripts, and context on recall for XML intended for LLM injection). The default is always text.

The common options

A few flags show up across many subcommands:

| Flag | Default | Where |
|---|---|---|
| --scope <repo\|user> | repo | add; filter on recall |
| --tags <t1,t2> | none | add, update; filter on recall |
| --kind <kind> | fact on add | add; filter on recall |
| --format <text\|json> | text | add, update |
| --format <text\|json\|context> | text | recall (XML context is for LLM injection) |

See What is a memory? for the kind taxonomy.

Environment

| Variable | Effect |
|---|---|
| FLUREE_HOME | When set, the CLI and MCP server use this path as the unified Fluree directory. If unset, both walk up from CWD looking for an existing .fluree/; if none is found, they fall back to a platform-global config/data directory. |

Set FLUREE_HOME=<repo>/.fluree if you need to force repo-scoped operation from a shell that starts elsewhere. Among the IDE integrations, only the Cursor MCP config sets this automatically via ${workspaceFolder}; the others rely on the walk-up behavior from spawn CWD.

fluree memory init

Initialize the memory store and optionally configure MCP for detected AI coding tools. Idempotent — safe to run repeatedly.

fluree memory init [OPTIONS]

Options

| Option | Description |
|---|---|
| --yes, -y | Auto-confirm all MCP installations (non-interactive) |
| --no-mcp | Skip AI tool detection and MCP configuration entirely |

What init does

  1. Creates the __memory ledger inside <repo>/.fluree/ and transacts the memory schema.
  2. Creates .fluree-memory/ at the project root:
    • repo.ttl — team memories (empty to start; meant to be committed)
    • .local/user.ttl — your personal memories (gitignored)
    • .gitignore — pre-configured with .local/ (which holds your user scope and the MCP log)
  3. Migrates existing memories — if the ledger already has memories (e.g. from before the TTL file layout), they’re exported into the appropriate .ttl file.
  4. Detects AI coding tools (Claude Code, Cursor, VS Code, Windsurf, Zed) and offers to install MCP for each.

Example

$ fluree memory init

Memory store initialized at /path/to/project/.fluree-memory

Repo memories are stored in .fluree-memory/repo.ttl (git-tracked).
Commit this directory to share project knowledge with your team.

Detected AI coding tools:
  - Claude Code (already configured)
  - Cursor
  - VS Code (Copilot) (already configured)

Install MCP config for Cursor? [Y/n] Y
  Installed: .cursor/mcp.json
  Installed: .cursor/rules/fluree_rules.md

Configured 1 tool.

With --yes: auto-confirms all installations without prompting. In a non-interactive shell (piped stdin) without --yes, MCP installation is skipped with a message.

Re-running

init is safe to run again. It won’t re-create or overwrite files that already exist; it just:

  • Checks that the ledger and schema are current (migrating if not).
  • Detects IDEs you’ve since installed and offers to configure them.
  • Leaves existing memories untouched.

Run it again after:

  • Installing a new AI tool you want to wire up.
  • Cloning a repo someone else set up — init will pick up the committed repo.ttl into the ledger automatically.

fluree memory add

Store a new memory.

fluree memory add [OPTIONS]

Options

| Option | Description |
|---|---|
| --kind <KIND> | fact (default), decision, constraint |
| --text <TEXT> | Content text (or provide via stdin) |
| --tags <T1,T2> | Required. Comma-separated tags — the primary recall signal |
| --refs <R1,R2> | Comma-separated file/artifact references |
| --severity <SEV> | For constraints: must, should, prefer |
| --scope <SCOPE> | repo (default) or user |
| --rationale <TEXT> | Why — the reasoning behind this memory (any kind) |
| --alternatives <TEXT> | Alternatives considered (any kind) |
| --format <FMT> | text (default) or json |

Examples

# A simple fact
fluree memory add --kind fact \
  --text "Tests use cargo nextest" \
  --tags testing,cargo

# A hard constraint with rationale
fluree memory add --kind constraint \
  --text "Never suppress dead code with an underscore prefix" \
  --tags code-style \
  --severity must \
  --rationale "Underscore-prefixed names hide code from future discovery"

# From stdin (useful for piping from other tools)
echo "The index format uses postcard encoding" \
  | fluree memory add --kind fact --tags indexer

# A decision with full context
fluree memory add --kind decision \
  --text "Use postcard for compact index encoding" \
  --rationale "no_std compatible, smaller output than bincode" \
  --alternatives "bincode, CBOR, MessagePack" \
  --refs fluree-db-indexer/ \
  --tags indexer,encoding

# A fact pointing to a file (use --refs for artifact pointers)
fluree memory add --kind fact \
  --text "Error pattern defined here" \
  --refs fluree-db-core/src/error.rs \
  --tags errors

# A personal convention, user-scoped
fluree memory add --kind fact \
  --text "Always run clippy with --all-features" \
  --scope user \
  --tags code-style

Output

Default (text):

Stored memory: mem:fact-01JDXYZ5A2B3C4D5E6F7G8H9J0

json:

{
  "id": "mem:fact-01JDXYZ5A2B3C4D5E6F7G8H9J0",
  "kind": "fact",
  "scope": "repo",
  "created_at": "2026-04-14T16:45:12Z"
}

Secret detection

If the content matches a known secret pattern (AWS keys, GitHub tokens, password-bearing URLs, etc.), the sensitive portions are replaced with [REDACTED] before storage and a warning is printed. See Secrets and sensitivity.

Scope and file placement

| Scope | File placement |
|---|---|
| --scope repo (default) | Writes to .fluree-memory/repo.ttl — committable |
| --scope user | Writes to .fluree-memory/.local/user.ttl — gitignored |

See Repo vs user memory.

See also

fluree memory recall

Search and retrieve relevant memories ranked by BM25 score.

fluree memory recall <QUERY> [OPTIONS]

Arguments

| Argument | Description |
|---|---|
| <QUERY> | Natural-language search query (keyword-matched, not semantic) |

Options

| Option | Description |
|---|---|
| -n, --limit <N> | Max results per page (default: 3) |
| --offset <N> | Skip the first N results — use for pagination (default: 0) |
| --kind <KIND> | Filter to a specific memory kind |
| --tags <T1,T2> | Filter to memories with these tags |
| --scope <SCOPE> | Filter by repo or user |
| --format <FMT> | text (default), json, or context (XML for LLM) |

Examples

# Basic recall — returns top 3
fluree memory recall "how to run tests"

# Page through a longer result set
fluree memory recall "how to run tests" --offset 3
fluree memory recall "error handling" -n 10

# Narrow with filters
fluree memory recall "error handling" --kind constraint --tags errors
fluree memory recall "deployment" --scope repo

# XML output designed for LLM context injection
fluree memory recall "testing patterns" --format context

Output

text

Recall: "how to run tests" (2 matches)

1. [score: 13.0] mem:fact-01JDXYZ...
   Tests use cargo nextest
   Tags: testing, cargo

2. [score: 8.0] mem:fact-01JDABC...
   Integration tests use assert_cmd + predicates
   Tags: testing

  (showing results 1–3; use --offset 3 for more)

json

{
  "query": "how to run tests",
  "memories": [
    {
      "memory": {
        "id": "mem:fact-01JDXYZ...",
        "kind": "fact",
        "content": "Tests use cargo nextest",
        "tags": ["testing", "cargo"],
        "scope": "repo",
        "created_at": "2026-02-22T14:00:00Z"
      },
      "score": 13.0
    }
  ],
  "total_count": 13
}

total_count is the total number of memories in the store, not the number of matches — useful for UI context but not for pagination math.
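Because total_count can’t drive pagination math, a script can instead page by advancing --offset until a short page comes back. A sketch, where fetch is a hypothetical stand-in for one recall --format json call:

```python
def paginate(fetch, limit=3):
    """Collect all matches by advancing the offset until a short page
    comes back. `fetch(offset, limit)` stands in for one
    `fluree memory recall ... --format json` invocation; total_count
    in the response counts the whole store, so it is not used here."""
    results, offset = [], 0
    while True:
        page = fetch(offset, limit)
        results.extend(page)
        if len(page) < limit:   # a short page means we've seen everything
            return results
        offset += limit
```

The stopping condition is "fewer than limit results returned", which works regardless of how many memories the store holds.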

context (XML for LLM injection)

<memory-context>
  <memory id="mem:fact-01JDXYZ..." kind="fact" score="13.0">
    <content>Tests use cargo nextest</content>
    <tags>testing, cargo</tags>
  </memory>
  <pagination shown="1" offset="0" total_in_store="13" />
</memory-context>

When results are cut off, the pagination element embeds a hint:

<pagination shown="3" offset="0" limit="3" total_in_store="13">
  Results 1–3. Use offset=3 to retrieve more.
</pagination>

How ranking works

See Recall and ranking for the full story — BM25 over content plus metadata bonuses for tag, ref, kind, branch, and recency matches. Filters (--kind, --tags, --scope) narrow the candidate set. All local, deterministic, and offline.

See also

fluree memory update

Update an existing memory in place. The memory keeps the same ID — only the changed fields are modified. History is tracked via git.

fluree memory update <ID> [OPTIONS]

Options

| Option | Description |
|---|---|
| --text <TEXT> | New content text |
| --tags <T1,T2> | New tags (replaces all existing) |
| --refs <R1,R2> | New artifact refs (replaces all existing) |
| --format <FMT> | text (default) or json |

Example

fluree memory update mem:fact-01JDXYZ... \
  --text "Tests use cargo nextest with --no-fail-fast"

Output:

Updated: mem:fact-01JDXYZ...

The TTL file is rewritten with the updated content. Use git diff to see what changed, or git log -p .fluree-memory/repo.ttl to review the full history.

See also

fluree memory forget

Retract a memory permanently. Unlike update, forget removes the memory entirely — it stops existing.

fluree memory forget <ID>

Output:

Forgotten: mem:fact-01JDXYZ...

When to forget vs. update

| You think… | Use |
|---|---|
| “This was wrong from the start” | forget |
| “This was right but the world changed” | update |
| “I never want anyone to see this again” | forget |

See Updates and forgetting for more detail.

Forgetting accidentally-committed secrets

Forgetting removes the memory from the ledger and rewrites repo.ttl (or .local/user.ttl) immediately, so the deletion shows up in your next git diff. If a secret value also ended up in git history, you need to scrub the history separately — see Secrets and sensitivity.

Memory history via git

The fluree memory explain command has been removed. Memory history is now tracked via git.

Viewing history

Since updates modify memories in place and the TTL file is rewritten on each change, git log shows the full history:

# Full history of all memory changes
git log -p .fluree-memory/repo.ttl

# Search for changes to a specific memory ID
git log -p -S "mem:fact-01JDXYZ" .fluree-memory/repo.ttl

# Compact one-line summary
git log --oneline .fluree-memory/repo.ttl

Time-travel via Fluree

For richer querying over memory history, import your git history into a Fluree ledger:

fluree create my-memory-ledger --memory

Each git commit becomes a Fluree transaction, enabling time-travel queries over the full evolution of your project’s memory.

See Updates and forgetting for the update model.

fluree memory status

Show a summary of the memory store.

fluree memory status

Output:

  Directory: /path/to/project/.fluree-memory
Memory Store: 12 memories, 25 tags
  Kinds: 7 fact, 2 decision, 3 constraint

Recent memories:
  - [fact] Tests use cargo nextest, not cargo test [cargo, testing]
    ID: mem:fact-01JDXYZ...
  - [decision] Use postcard for compact index encoding [encoding, indexer]
    ID: mem:decision-01JDABC...

Use memory_recall with specific keywords from above to search.

status counts all memories in the store. The “Recent memories” list is included to help agents (or you) pick good keywords for memory_recall.

When it’s useful

  • Confirming init worked and the store is live.
  • Sanity-checking after an import.
  • Quick “how much does this project remember?” check.

For per-memory detail, use recall with a broad query (e.g. fluree memory recall "" -n 100) or export.

fluree memory export / import

Round-trip memories as JSON.

export

Write all memories to stdout as a JSON array.

fluree memory export > memories.json

export takes no options — it emits every memory, both scopes included. To get a single scope, filter with jq or use recall with --scope and a permissive limit.

Output is a flat array of full memory objects:

[
  {
    "id": "mem:fact-01JDXYZ...",
    "kind": "fact",
    "content": "Tests use cargo nextest",
    "tags": ["testing", "cargo"],
    "scope": "repo",
    "severity": null,
    "artifact_refs": [],
    "branch": "main",
    "created_at": "2026-02-22T14:00:00Z"
  }
]
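Since export has no scope flag, filtering happens after the fact. A minimal post-filter over the array shape shown above (a sketch; `jq 'map(select(.scope == "repo"))' memories.json` does the same from the shell):

```python
import json

def filter_scope(memories: list[dict], scope: str) -> list[dict]:
    """Keep only memories whose "scope" field matches ("repo" or "user")."""
    return [m for m in memories if m.get("scope") == scope]

exported = json.loads('[{"id": "mem:fact-01A", "scope": "repo"},'
                      ' {"id": "mem:fact-01B", "scope": "user"}]')
repo_only = filter_scope(exported, "repo")
```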

import

Load memories from a JSON file produced by export (or a hand-crafted array of the same shape).

fluree memory import memories.json

Import is additive — every entry in the file is re-transacted into the ledger, with secret-detection applied to content, rationale, and alternatives. IDs and timestamps from the source file are preserved. There is no dedup step, so importing the same file twice will double-insert; forget the existing entries first (or import into a freshly-initialized store) if that’s not what you want.
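Because import double-inserts, a defensive pattern is to diff the incoming file against a fresh export before importing. A sketch (not a built-in command; it assumes the export shape shown above):

```python
import json

def new_entries(incoming: list[dict], existing: list[dict]) -> list[dict]:
    """Drop incoming memories whose "id" already exists in the store,
    so re-running an import stays idempotent."""
    seen = {m["id"] for m in existing}
    return [m for m in incoming if m["id"] not in seen]

existing = [{"id": "mem:fact-01A"}]
incoming = [{"id": "mem:fact-01A"}, {"id": "mem:decision-01B"}]
to_import = new_entries(incoming, existing)
```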

When to use

  • Backup / portability — export before a risky refactor.
  • Bootstrapping a new repo from another project’s knowledge.
  • Sharing a slice of memory out-of-band (e.g. into an issue or wiki).

For normal team sharing, you don’t need export/import — .fluree-memory/repo.ttl is committed to git and everyone who clones + runs fluree memory init picks it up automatically. See Repo vs user memory.

fluree memory mcp-install

Install MCP configuration for an IDE so its agent can use memory tools.

fluree memory mcp-install [--ide <IDE>]

Options

Option        Description
--ide <IDE>   Target IDE (auto-detected if omitted)

Supported IDEs

Value         Config written                                     Extras
claude-code   ~/.claude.json via claude mcp add (local scope)    Appends to CLAUDE.md
vscode        <repo>/.vscode/mcp.json (key: servers)             .vscode/fluree_rules.md
cursor        <repo>/.cursor/mcp.json (key: mcpServers)          .cursor/rules/fluree_rules.md
windsurf      ~/.codeium/windsurf/mcp_config.json (global)       (none)
zed           <repo>/.zed/settings.json (key: context_servers)   Skips if JSONC detected

Legacy aliases: claude-vscode and github-copilot both map to vscode.

When --ide is omitted, the first detected tool that isn’t already configured is used; if nothing is detected, claude-code is the default.

Example

fluree memory mcp-install --ide cursor

Output:

  Installed: .cursor/mcp.json
  Installed: .cursor/rules/fluree_rules.md

Per-IDE config shape

The JSON mcp-install writes differs per IDE:

Cursor (.cursor/mcp.json) is the only target that sets FLUREE_HOME by default. It uses ${workspaceFolder} interpolation to pin the memory store to the current workspace regardless of where Cursor spawns the process from:

{
  "mcpServers": {
    "fluree-memory": {
      "type": "stdio",
      "command": "fluree",
      "args": ["mcp", "serve", "--transport", "stdio"],
      "env": { "FLUREE_HOME": "${workspaceFolder}/.fluree" }
    }
  }
}

VS Code, Windsurf, Zed, Claude Code get a simpler entry with no env:

{
  "command": "fluree",
  "args": ["mcp", "serve", "--transport", "stdio"]
}

(The top-level wrapper key differs — servers for VS Code, mcpServers for Windsurf, context_servers for Zed. Claude Code’s entry is registered globally via claude mcp add.)

These rely on the MCP server’s walk-up behavior: on start, it looks for .fluree/ beginning at its spawn CWD. That’s usually the workspace, but if the IDE starts it elsewhere, memory may land in a global store. See the troubleshooting section below.

Troubleshooting: repo vs global memory

Repo-scoped (the goal):

  • Memories: <repo>/.fluree-memory/repo.ttl
  • MCP log: <repo>/.fluree-memory/.local/mcp.log (truncated on each server start — tail it while reproducing the issue)

Global (something’s wrong):

  • Memories under the platform default, e.g. ~/Library/Application Support/fluree/ on macOS
  • Fix: add an explicit absolute FLUREE_HOME to the MCP config entry, pointing at your repo’s .fluree/, and fully restart (not just reload) the IDE. For Cursor, the ${workspaceFolder}-based default should already be in place — re-run mcp-install from inside the repo if it’s missing.

See also

Reference

Lookups — the things you open once to check a specific detail.

IDE support matrix

Where each supported AI coding tool stores its MCP config and its rules file, and whether the config is scoped per-repo or global.

IDE                 MCP config                            Config scope   FLUREE_HOME set?                   Rules file                             mcp-install value
Claude Code         ~/.claude.json (via claude mcp add)   user (local)   no                                 section appended to <repo>/CLAUDE.md   claude-code
Cursor              <repo>/.cursor/mcp.json               repo           yes — ${workspaceFolder}/.fluree   <repo>/.cursor/rules/fluree_rules.md   cursor
VS Code (Copilot)   <repo>/.vscode/mcp.json               repo           no                                 <repo>/.vscode/fluree_rules.md         vscode
Windsurf            ~/.codeium/windsurf/mcp_config.json   global         no                                 none                                   windsurf
Zed                 <repo>/.zed/settings.json             repo           no                                 none (skipped if JSONC)                zed

Legacy aliases:

  • claude-vscode → vscode
  • github-copilot → vscode

FLUREE_HOME and repo scoping

Only the Cursor config sets FLUREE_HOME automatically. For the other IDEs, the MCP server figures out which repo it’s serving by walking up from its spawn CWD until it finds a .fluree/ directory. In normal use the IDE spawns the server from the workspace root, so this works without extra configuration.

If memory ends up in a platform-global store instead of <repo>/.fluree-memory/, the fix is to add FLUREE_HOME manually to the relevant MCP config, pointing at an absolute path (or a variable the IDE interpolates — Cursor supports ${workspaceFolder}; other IDEs’ support varies). Then restart the IDE.

Known gotchas

  • Zed + JSONC: If .zed/settings.json contains // comments, mcp-install refuses to write to avoid corrupting your settings. Paste the snippet yourself or strip comments first.
  • Windsurf globals: Windsurf’s MCP config is user-global, not per-repo. If you work across multiple repos, you likely need to leave FLUREE_HOME unset and rely on walk-up — or switch the env var per project manually.
  • Cursor restarts: Cursor caches MCP servers aggressively. If a change to .cursor/mcp.json doesn’t take effect, fully quit Cursor (Cmd-Q on macOS) rather than just reloading the window.
  • Claude Code CLAUDE.md: The rules section is appended at the end of CLAUDE.md (only if the file doesn’t already mention fluree memory or memory_recall). If you have a large existing CLAUDE.md, make sure the agent is actually reading to the end.

Schema (mem: vocabulary)

Every memory is a set of RDF triples. The mem: vocabulary defines the classes and predicates.

Namespace

@prefix mem: <https://ns.flur.ee/memory#> .

Classes

A memory’s kind is expressed via rdf:type (a in Turtle) — there is no mem:kind predicate.

Class            Kind
mem:Fact         fact
mem:Decision     decision
mem:Constraint   constraint

mem:repo and mem:user are additional IRIs used as the range of mem:scope (see below).

Core predicates

Predicate         Range                               Required   Meaning
mem:content       xsd:string (indexed as @fulltext)   yes        The textual content; BM25-searchable
mem:scope         IRI — mem:repo or mem:user          yes        Which TTL file it lives in
mem:createdAt     xsd:dateTime                        yes        Insertion timestamp
mem:tag           xsd:string (multi-valued)           optional   Free-form tags
mem:artifactRef   xsd:string (multi-valued)           optional   File / symbol / URL references
mem:branch        xsd:string                          optional   Git branch captured at write time

Optional predicates (any kind)

These predicates can appear on any memory kind. All values are stored as plain string literals (not IRIs).

Predicate          Range                                        Meaning
mem:rationale      xsd:string (indexed as @fulltext)            Why — the reasoning behind this memory
mem:alternatives   xsd:string                                   What else was considered
mem:severity       xsd:string ("must", "should", or "prefer")   How hard a constraint is (constraints only)

ID format

Memory IRIs take the shape:

mem:<kind>-<ULID>

Examples:

mem:fact-01JDXYZ5A2B3C4D5E6F7G8H9J0
mem:decision-01JDABC6D7E8F9G0H1I2J3K4L5
mem:constraint-01JDLMN7O8P9Q0R1S2T3U4V5W6

ULIDs are sortable by creation time, which is why memories display nicely in chronological order without an explicit index.
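Because the ULID’s leading characters encode the millisecond timestamp in Crockford base32 (whose alphabet is ascending in ASCII), a plain lexicographic sort of the IDs is chronological:

```python
# The doc's example IDs: the 01JDABC... ULID precedes 01JDXYZ... in time.
ids = [
    "mem:fact-01JDXYZ5A2B3C4D5E6F7G8H9J0",
    "mem:fact-01JDABC6D7E8F9G0H1I2J3K4L5",
]
# Sorting on the ULID suffix orders memories by creation time.
ordered = sorted(ids, key=lambda i: i.rsplit("-", 1)[-1])
```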

Full example

@prefix mem: <https://ns.flur.ee/memory#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

mem:decision-01JDABC a mem:Decision ;
    mem:content "Use postcard for compact index encoding" ;
    mem:tag "encoding" ;
    mem:tag "indexer" ;
    mem:scope mem:repo ;
    mem:artifactRef "fluree-db-indexer/" ;
    mem:createdAt "2026-02-22T14:00:00Z"^^xsd:dateTime ;
    mem:rationale "no_std compatible, smaller output than bincode" ;
    mem:alternatives "bincode, CBOR, MessagePack" .

See also: TTL file format for how this shows up on disk.

TTL file format

The .fluree-memory/repo.ttl and .fluree-memory/.local/user.ttl files hold the serialized form of every memory in their respective scope. Each memory is a block of Turtle triples.

Structure

Each memory is a Turtle subject block: the IRI, followed by a mem:<Kind> (RDF type), then a predicate list in a canonical order. Multi-valued predicates (mem:tag, mem:artifactRef) repeat once per value.

# Fluree Memory — repo-scoped
# Auto-managed by `fluree memory`. Manual edits are supported.
@prefix mem: <https://ns.flur.ee/memory#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

mem:fact-01JDXYZ a mem:Fact ;
    mem:content "Tests use cargo nextest" ;
    mem:tag "cargo" ;
    mem:tag "testing" ;
    mem:scope mem:repo ;
    mem:createdAt "2026-02-22T14:00:00Z"^^xsd:dateTime .

mem:decision-01JDABC a mem:Decision ;
    mem:content "Use postcard for compact index encoding" ;
    mem:tag "encoding" ;
    mem:tag "indexer" ;
    mem:scope mem:repo ;
    mem:artifactRef "fluree-db-indexer/" ;
    mem:createdAt "2026-02-22T14:05:00Z"^^xsd:dateTime ;
    mem:rationale "no_std compatible, smaller output than bincode" ;
    mem:alternatives "bincode, CBOR, MessagePack" .

Tags and artifact refs are sorted alphabetically within a memory for deterministic diffs. When a memory is updated, the TTL file is rewritten with the changes in place and git tracks the history.

Why TTL and not JSON

Three reasons:

  • Diff-friendly — predicates are one per line within a subject block, so git diffs are readable. Memories are sorted by (branch, id), which groups memories from the same branch together and reduces merge conflicts across feature branches.
  • Merge-friendly — because the sort distributes memories by originating branch, two feature branches adding memories will insert into different regions of the file and won’t conflict on merge.
  • Semantically exact — Turtle is RDF, so there’s no impedance mismatch between what’s in the file and what’s in the __memory ledger.
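The (branch, id) ordering those properties depend on is just a two-key sort; a sketch:

```python
def ttl_order(memories: list[dict]) -> list[dict]:
    """Sort by (branch, id): memories written on the same branch cluster
    into one region of the file, so parallel feature branches edit
    disjoint regions and merge cleanly."""
    return sorted(memories, key=lambda m: (m["branch"], m["id"]))

mems = [
    {"branch": "main", "id": "mem:fact-01B"},
    {"branch": "feature-x", "id": "mem:fact-01C"},
    {"branch": "main", "id": "mem:fact-01A"},
]
ordered = ttl_order(mems)
```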

Sync direction

The TTL file is the canonical store for a given scope. The __memory ledger is a derived cache rebuilt from the TTL files when they change.

When you memory add, the CLI / MCP server:

  1. Rewrites the TTL file with the new memory inserted in sorted position (authoritative).
  2. Transacts the new triples into the __memory ledger (so recall is fast).
  3. Writes a content-hash watermark to .fluree-memory/.local/build-hash.

If the ledger write fails, the hash is left stale and the next ensure_synced call rebuilds the ledger from the files. When git pulls in a new version of repo.ttl, the hash mismatch triggers the same rebuild. In practice this is invisible.
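The watermark mechanism can be pictured as a content-hash comparison (a sketch; the actual hash format stored in .fluree-memory/.local/build-hash is internal):

```python
import hashlib
from pathlib import Path

def content_hash(ttl_paths: list[Path]) -> str:
    """Hash the concatenated TTL file contents; any change (an update,
    a failed ledger write, a git pull) produces a different digest."""
    h = hashlib.sha256()
    for p in ttl_paths:
        if p.exists():
            h.update(p.read_bytes())
    return h.hexdigest()

def needs_rebuild(ttl_paths: list[Path], watermark: Path) -> bool:
    """Rebuild the derived ledger whenever the stored hash is stale."""
    stored = watermark.read_text().strip() if watermark.exists() else ""
    return content_hash(ttl_paths) != stored
```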

Editing by hand

You can edit repo.ttl or user.ttl directly if you need to — fix a typo, reorder, batch-retag. After editing:

fluree memory status

…to verify the store parses cleanly. If there’s a syntax error, status will point at it.

For most fixes, though, prefer update / forget — they’ll produce cleaner git history than hand-edits.

File size

TTL is compact. A project with ~200 memories typically lands under 50 KB. At that size, repo.ttl stays pleasant to review in a PR.

If a file grows past that, consider whether you’re memorizing task state instead of durable knowledge — a fluree memory status + skim + cleanup pass is usually all it takes.

Operations

This section covers operational aspects of running Fluree in production, including configuration, storage backends, monitoring, and administrative operations.

Operation Guides

Configuration

Server configuration options:

  • Command-line flags
  • Configuration files
  • Environment variables
  • Runtime settings
  • Tuning parameters

Running with Docker

Configuring the official fluree/server image:

  • Image internals (entrypoint, volumes, runtime user)
  • Three configuration approaches: env vars, mounted JSON-LD/TOML config, CLI flags
  • Common recipes: LRU cache sizing, background indexing, auth, S3+DynamoDB, query peers
  • Full annotated Docker Compose example
  • Troubleshooting (volume permissions, RUST_LOG vs FLUREE_LOG_LEVEL, cache auto-sizing under cgroup limits)

Storage Modes

Storage backend options:

  • Memory storage (development)
  • File system storage (single server)
  • AWS S3/DynamoDB (distributed)
  • IPFS / Kubo (decentralized)
  • Storage selection criteria
  • Switching between storage modes

IPFS Storage

IPFS-specific setup and configuration:

  • Kubo node installation and setup
  • JSON-LD configuration fields
  • Content addressing and CID mapping
  • Pinning strategies
  • Operational considerations

DynamoDB Nameservice

DynamoDB-specific setup and configuration:

  • Table creation (CLI, CloudFormation, Terraform)
  • Schema reference (v2 attributes)
  • AWS credentials and permissions
  • Local development with LocalStack
  • Production considerations

Telemetry and Logging

Monitoring and observability:

  • Logging configuration
  • Metrics collection
  • Tracing
  • Health monitoring
  • Performance metrics
  • Integration with monitoring systems

Admin, Health, and Stats

Administrative operations:

  • Health check endpoints
  • Server statistics
  • Manual indexing triggers
  • Backup and restore
  • Maintenance operations

Query peers and replication

Run fluree-server as a read-only query peer:

  • SSE nameservice events (GET /v1/fluree/events)
  • Peer mode (refresh on stale + write forwarding)
  • Storage proxy endpoints (/v1/fluree/storage/*) for private-storage deployments

Deployment Patterns

Development

Single-process, memory storage:

./fluree-db-server --storage memory --log-level debug

Single Server Production

File-based storage:

./fluree-db-server \
  --storage file \
  --data-dir /var/lib/fluree \
  --port 8090 \
  --log-level info

Distributed Production

AWS-backed distributed deployment:

./fluree-db-server \
  --storage aws \
  --s3-bucket fluree-prod-data \
  --s3-region us-east-1 \
  --dynamodb-table fluree-nameservice \
  --port 8090

Key Configuration Areas

Server Settings

  • Port and host binding
  • TLS/SSL certificates
  • Request size limits
  • Timeout values
  • CORS configuration

Storage Configuration

  • Storage mode selection
  • Data directory (file mode)
  • AWS credentials (S3 mode)
  • IPFS / Kubo connection (IPFS mode)
  • Connection pooling
  • Cache settings

Indexing Configuration

  • Index interval
  • Batch size
  • Memory allocation
  • Number of threads
  • Index retention

Security Configuration

  • Authentication mode
  • API key requirements
  • Signed request validation
  • Policy enforcement
  • Rate limiting

Monitoring

Health Checks

curl http://localhost:8090/health

Response:

{
  "status": "healthy",
  "version": "0.1.0",
  "storage": "file",
  "uptime_ms": 3600000
}

Server Statistics

curl http://localhost:8090/v1/fluree/stats

Response:

{
  "version": "0.1.0",
  "uptime_ms": 3600000,
  "ledgers": 5,
  "queries": {
    "total": 12345,
    "active": 3,
    "avg_duration_ms": 45
  },
  "transactions": {
    "total": 567,
    "avg_duration_ms": 89
  },
  "indexing": {
    "active": true,
    "pending_ledgers": 1,
    "avg_lag_ms": 1500
  }
}

Metrics Collection

Use GET /v1/fluree/stats for built-in server statistics. Prometheus-style /metrics export is not currently part of the standalone server API.

Operational Tasks

Backup

File storage backup:

# Backup data directory
tar -czf fluree-backup-$(date +%Y%m%d).tar.gz /var/lib/fluree/

AWS storage backup:

# S3 versioning enabled - automatic backups
aws s3 ls s3://fluree-prod-data/ --recursive

# Point-in-time recovery via S3 versions

Restore

File storage restore:

# Stop server
systemctl stop fluree

# Restore backup
tar -xzf fluree-backup-20240122.tar.gz -C /

# Start server
systemctl start fluree

Manual Indexing

Trigger indexing manually:

curl -X POST http://localhost:8090/v1/fluree/reindex \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb:main"}'

Compaction

There is no standalone HTTP compaction endpoint. Reindexing rebuilds index artifacts when you need to force a full refresh.

Performance Tuning

Memory Settings

./fluree-db-server \
  --query-memory-mb 2048 \
  --cache-size-mb 1024

Indexing Tuning

fluree-server \
  --indexing-enabled \
  --reindex-min-bytes 100000 \
  --reindex-max-bytes 1000000

Query Tuning

./fluree-db-server \
  --query-timeout-ms 30000 \
  --max-query-size 1048576 \
  --query-threads 8

High Availability

Load Balancing

Run multiple Fluree instances behind load balancer:

          ┌─────────────┐
          │   Clients   │
          └──────┬──────┘
                 │
          ┌──────▼──────┐
          │    Load     │
          │  Balancer   │
          └──────┬──────┘
                 │
    ┌────────────┼────────────┐
    │            │            │
┌───▼────┐  ┌───▼────┐  ┌───▼────┐
│Fluree 1│  │Fluree 2│  │Fluree 3│
└───┬────┘  └───┬────┘  └───┬────┘
    │           │           │
    └───────────┼───────────┘
                │
         ┌──────▼──────┐
         │  S3/Dynamo  │
         │  Nameservice│
         └─────────────┘

Failover

Configure health checks in load balancer:

health_check:
  path: /health
  interval: 10s
  timeout: 5s
  healthy_threshold: 2
  unhealthy_threshold: 3

Security Hardening

TLS/SSL

./fluree-db-server \
  --tls-cert /path/to/cert.pem \
  --tls-key /path/to/key.pem \
  --tls-ca /path/to/ca.pem

Require Authentication

./fluree-db-server \
  --require-auth \
  --require-signed-requests

Rate Limiting

./fluree-db-server \
  --rate-limit-queries 100 \
  --rate-limit-transactions 10 \
  --rate-limit-window 60

Best Practices

1. Use Appropriate Storage Mode

  • Development: memory
  • Single server: file
  • Production/Distributed: AWS
  • Decentralized: IPFS

2. Enable Monitoring

Set up monitoring for:

  • Health status
  • Query latency
  • Transaction rate
  • Indexing lag
  • Error rates

3. Regular Backups

Automate backups:

# Daily backup cron
0 2 * * * /usr/local/bin/backup-fluree.sh

4. Capacity Planning

Monitor growth:

  • Storage usage
  • Query volume
  • Transaction rate
  • Index sizes

5. Security Best Practices

  • Use TLS in production
  • Require authentication
  • Enable rate limiting
  • Regular security audits

6. Log Management

  • Rotate logs regularly
  • Ship logs to centralized system
  • Set appropriate log levels
  • Monitor error rates

Configuration

Fluree server is configured via a configuration file, command-line flags, and environment variables.

Configuration Methods

Configuration File (TOML, JSON, or JSON-LD)

The server reads configuration from .fluree/config.toml (or .fluree/config.jsonld) — the same file used by the Fluree CLI. Server settings live under the [server] section (or "server" key in JSON/JSON-LD). The server walks up from the current working directory looking for .fluree/config.toml or .fluree/config.jsonld, falling back to the global Fluree config directory ($FLUREE_HOME, or the platform config directory — see table below).

Global Directory Layout

When $FLUREE_HOME is set, both config and data share that single directory. When it is not set, the platform’s config and data directories are used:

Content                   Linux                   macOS                                  Windows
Config (config.toml)      ~/.config/fluree        ~/Library/Application Support/fluree   %LOCALAPPDATA%\fluree
Data (storage/, active)   ~/.local/share/fluree   ~/Library/Application Support/fluree   %LOCALAPPDATA%\fluree

On Linux, config and data directories are separated per the XDG Base Directory specification. On macOS and Windows both resolve to the same directory. When directories are split, fluree init --global writes an absolute storage_path into config.toml so the server can locate the data directory regardless of working directory.

# Use default config file discovery
fluree-server

# Override config file path
fluree-server --config /etc/fluree/config.toml

# Activate a profile
fluree-server --profile prod

Example config.toml:

[server]
listen_addr = "0.0.0.0:8090"
storage_path = "/var/lib/fluree"
log_level = "info"
# cache_max_mb = 4096  # global cache budget (MB); default: tiered fraction of RAM (30% <4GB, 40% 4-8GB, 50% ≥8GB)

[server.indexing]
enabled = true
reindex_min_bytes = 100000
# reindex_max_bytes defaults to 20% of system RAM; override only if needed:
# reindex_max_bytes = 536870912  # 512 MB

[server.auth.data]
mode = "required"
trusted_issuers = ["did:key:z6Mk..."]

JSON is also supported (detected by .json file extension):

{
  "server": {
    "listen_addr": "0.0.0.0:8090",
    "storage_path": "/var/lib/fluree",
    "indexing": { "enabled": true }
  }
}

JSON-LD Format

JSON-LD config files (.jsonld extension) add a @context that maps config keys to the Fluree config vocabulary (https://ns.flur.ee/config#), making the file valid JSON-LD. Generate one with:

fluree init --format jsonld

Example .fluree/config.jsonld:

{
  "@context": {
    "@vocab": "https://ns.flur.ee/config#"
  },
  "_comment": "Fluree Configuration — JSON-LD format.",
  "server": {
    "listen_addr": "0.0.0.0:8090",
    "storage_path": ".fluree/storage",
    "log_level": "info",
    "indexing": {
      "enabled": true,
      "reindex_min_bytes": 100000
    }
  },
  "profiles": {
    "prod": {
      "server": {
        "log_level": "warn"
      }
    }
  }
}

The @context is validated at load time (using the JSON-LD parser) but does not affect config value resolution — serde ignores unknown keys like @context and _comment. If both config.toml and config.jsonld exist in the same directory, TOML takes precedence and a warning is logged.

Profiles

Profiles allow environment-specific overrides. Define them in [profiles.<name>.server] and activate with --profile <name>:

[server]
log_level = "info"

[profiles.dev.server]
log_level = "debug"

[profiles.prod.server]
log_level = "warn"
[profiles.prod.server.indexing]
enabled = true
[profiles.prod.server.auth.data]
mode = "required"

Profile values are deep-merged onto [server] — only the fields present in the profile are overridden.
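The deep merge can be sketched as follows (an illustration, assuming nested tables merge recursively while scalar profile values win):

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Overlay profile values onto [server]: nested tables merge
    recursively; only keys present in the profile are replaced."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

server = {"log_level": "info",
          "indexing": {"enabled": True, "reindex_min_bytes": 100000}}
prod = {"log_level": "warn", "indexing": {"enabled": True}}
effective = deep_merge(server, prod)
```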

Command-Line Flags

fluree-server \
  --listen-addr 0.0.0.0:8090 \
  --storage-path /var/lib/fluree \
  --log-level info

Environment Variables

All CLI flags have corresponding environment variables with FLUREE_ prefix:

export FLUREE_LISTEN_ADDR=0.0.0.0:8090
export FLUREE_STORAGE_PATH=/var/lib/fluree
export FLUREE_LOG_LEVEL=info

fluree-server

Precedence

Configuration precedence (highest to lowest):

  1. Command-line flags
  2. Environment variables
  3. Profile overrides ([profiles.<name>.server])
  4. Config file ([server])
  5. Built-in defaults
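Per setting, the chain above reduces to “first defined source wins”; a sketch:

```python
def resolve(*sources):
    """Return the first non-None value, scanning highest precedence first:
    CLI flag, env var, profile override, config file, built-in default."""
    return next((v for v in sources if v is not None), None)

# --log-level not passed, FLUREE_LOG_LEVEL=debug set, profile says warn:
log_level = resolve(None, "debug", "warn", "info", "info")
```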

Error Handling

If --config or --profile is specified and the configuration cannot be loaded (file not found, parse error, missing profile), the server exits with an error. This prevents silent misconfiguration in production.

If the config file is auto-discovered (no explicit --config) and cannot be parsed, the server logs a warning and continues with CLI/env/default values only.

Server Configuration

Listen Address

Address and port to bind to:

Flag            Env Var              Default
--listen-addr   FLUREE_LISTEN_ADDR   0.0.0.0:8090
fluree-server --listen-addr 0.0.0.0:9090

Storage Path

Path for file-based storage. If not specified, defaults to .fluree/storage relative to the working directory (the same location used by fluree init):

Flag             Env Var               Default
--storage-path   FLUREE_STORAGE_PATH   .fluree/storage
# Explicit storage path (e.g. production)
fluree-server --storage-path /var/lib/fluree

# Default: uses .fluree/storage in the working directory
fluree-server

Connection Configuration (S3, DynamoDB, etc.)

For storage backends beyond local files — S3, DynamoDB nameservice, split commit/index storage, encryption — use a JSON-LD connection config file:

Flag                  Env Var                    Default
--connection-config   FLUREE_CONNECTION_CONFIG   None

When set, the server builds its storage and nameservice from the connection config file instead of using --storage-path. The file uses the same JSON-LD format as the Fluree API connection config.

# S3 + DynamoDB via connection config
fluree server run --connection-config /etc/fluree/connection.jsonld

# Or via environment variable
FLUREE_CONNECTION_CONFIG=/etc/fluree/connection.jsonld fluree server run

Example connection config (connection.jsonld):

{
  "@context": {
    "@base": "https://ns.flur.ee/config/connection/",
    "@vocab": "https://ns.flur.ee/system#"
  },
  "@graph": [
    {
      "@id": "commitStorage",
      "@type": "Storage",
      "s3Bucket": "fluree-commits",
      "s3Prefix": "fluree-data/"
    },
    {
      "@id": "indexStorage",
      "@type": "Storage",
      "s3Bucket": "fluree-indexes--use1-az4--x-s3"
    },
    {
      "@id": "publisher",
      "@type": "Publisher",
      "dynamodbTable": "fluree-nameservice",
      "dynamodbRegion": "us-east-1"
    },
    {
      "@id": "conn",
      "@type": "Connection",
      "commitStorage": { "@id": "commitStorage" },
      "indexStorage": { "@id": "indexStorage" },
      "primaryPublisher": { "@id": "publisher" }
    }
  ]
}

Behavior notes:

  • --connection-config and --storage-path are mutually exclusive. If both are set, --connection-config takes precedence (a warning is logged).
  • Server-level settings (--cache-max-mb, --indexing-enabled, --reindex-min-bytes, --reindex-max-bytes) override any equivalent values from the connection config.
  • --indexing-enabled defaults to true. Pass --indexing-enabled=false only when a separate peer/indexer process owns index maintenance for the same storage.
  • AWS credentials and region are resolved via the standard AWS SDK chain (env vars, instance profile, ~/.aws/config, etc.) — they are not part of the connection config.
  • The connection config can use envVar indirection for sensitive fields like S3 bucket names or encryption keys (see ConfigurationValue).

Config file equivalent:

[server]
connection_config = "/etc/fluree/connection.jsonld"

Capabilities by Backend

Not all nameservice backends support all features. The server checks capabilities at runtime:

Feature                   File (local)   DynamoDB   Storage-backed
Query / transact          Yes            Yes        Yes
Event subscriptions       Yes            No         No
Default context (read)    Yes            Yes        Yes
Default context (write)   Yes            Yes        No

If a capability is not available, the server returns an appropriate error (e.g., 501 for event subscriptions with DynamoDB).

CORS

Enable Cross-Origin Resource Sharing:

Flag             Env Var               Default
--cors-enabled   FLUREE_CORS_ENABLED   true

When enabled, allows requests from any origin.

Body Limit

Maximum request body size in bytes:

Flag           Env Var             Default
--body-limit   FLUREE_BODY_LIMIT   52428800 (50MB)

Log Level

Logging verbosity:

Flag          Env Var            Default
--log-level   FLUREE_LOG_LEVEL   info

Options: trace, debug, info, warn, error

Cache Size

Global cache budget (MB):

Flag             Env Var               Default
--cache-max-mb   FLUREE_CACHE_MAX_MB   30/40/50% of RAM (tiered: <4GB / 4-8GB / ≥8GB)
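The tiered default works out as follows (an arithmetic sketch of the documented fractions, not the server’s sizing code):

```python
def default_cache_mb(ram_gb: float) -> int:
    """30% of RAM below 4 GB, 40% from 4 to 8 GB, 50% at 8 GB and up."""
    if ram_gb < 4:
        fraction = 0.30
    elif ram_gb < 8:
        fraction = 0.40
    else:
        fraction = 0.50
    return int(ram_gb * 1024 * fraction)
```

So a 16 GB host gets an 8192 MB cache budget by default; pass --cache-max-mb (or set cache_max_mb) to override.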

Background Indexing

Enable background indexing and configure novelty backpressure thresholds:

Flag                  Env Var                    Default                               Description
--indexing-enabled    FLUREE_INDEXING_ENABLED    true                                  Enable background indexing (set false only when an external indexer process owns this storage)
--reindex-min-bytes   FLUREE_REINDEX_MIN_BYTES   100000                                Soft threshold (triggers background indexing)
--reindex-max-bytes   FLUREE_REINDEX_MAX_BYTES   20% of system RAM (256 MB fallback)   Hard threshold (blocks commits until reindexed)

Config file equivalent:

[server.indexing]
enabled = true
reindex_min_bytes = 100000         # 100 KB — soft trigger
# reindex_max_bytes = 536870912    # 512 MB — defaults to 20% of system RAM if omitted
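Together the two thresholds form a soft/hard backpressure scheme; a sketch of the decision (the exact comparison semantics here are an assumption):

```python
def novelty_action(novelty_bytes: int,
                   soft: int = 100_000,
                   hard: int = 536_870_912) -> str:
    """Below soft: keep accumulating novelty. Past soft: kick off a
    background reindex. Past hard: block commits until indexing catches up."""
    if novelty_bytes >= hard:
        return "block-commits"
    if novelty_bytes >= soft:
        return "background-reindex"
    return "accumulate"
```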

Server Role Configuration

Server Role

Operating mode: transaction server or query peer:

Flag            Env Var              Default
--server-role   FLUREE_SERVER_ROLE   transaction

Options:

  • transaction: Write-enabled, produces events stream
  • peer: Read-only, subscribes to transaction server

Transaction Server URL (Peer Mode)

Base URL of the transaction server (required in peer mode):

Flag              Env Var
--tx-server-url   FLUREE_TX_SERVER_URL
fluree-server \
  --server-role peer \
  --tx-server-url http://tx.internal:8090

Authentication Configuration

Replication vs Query Access

Fluree enforces a hard boundary between replication-scoped and query-scoped access:

  • Replication (fluree.storage.*): Raw commit and index block transfer for peer sync and CLI fetch/pull/push. These operations bypass dataset policy (data must be bit-identical). Replication tokens are operator/service-account credentials — never issue them to end users.
  • Query (fluree.ledger.read/write.*): Application-level data access through the query engine with full dataset policy enforcement. Query tokens are appropriate for end users and application service accounts.

A user holding only query-scoped tokens cannot clone or pull a ledger. They can fluree track a remote ledger (forwarding queries/transactions to the server) but cannot replicate its storage locally.

Events Endpoint Authentication

Protect the /v1/fluree/events SSE endpoint:

Flag                           Env Var                              Default
--events-auth-mode             FLUREE_EVENTS_AUTH_MODE              none
--events-auth-audience         FLUREE_EVENTS_AUTH_AUDIENCE          None
--events-auth-trusted-issuer   FLUREE_EVENTS_AUTH_TRUSTED_ISSUERS   None

Modes:

  • none: No authentication
  • optional: Accept tokens but don’t require them
  • required: Require valid Bearer token

Supports both Ed25519 (embedded JWK) and OIDC/JWKS (RS256) tokens when the oidc feature is enabled and --jwks-issuer is configured. For OIDC tokens, issuer trust is implicit — only tokens signed by keys from configured JWKS endpoints will verify. For Ed25519 tokens, the issuer must appear in --events-auth-trusted-issuer.

# Ed25519 tokens only
fluree-server \
  --events-auth-mode required \
  --events-auth-trusted-issuer did:key:z6Mk...

# OIDC + Ed25519 (both work simultaneously)
fluree-server \
  --events-auth-mode required \
  --jwks-issuer "https://auth.example.com=https://auth.example.com/.well-known/jwks.json" \
  --events-auth-trusted-issuer did:key:z6Mk...

Data API Authentication

Protect query/transaction endpoints (including /v1/fluree/query/{ledger...}, /v1/fluree/insert/{ledger...}, /v1/fluree/upsert/{ledger...}, /v1/fluree/update/{ledger...}, /v1/fluree/info/{ledger...}, and /v1/fluree/exists/{ledger...}):

Flag                               Env Var                                 Default
--data-auth-mode                   FLUREE_DATA_AUTH_MODE                   none
--data-auth-audience               FLUREE_DATA_AUTH_AUDIENCE               None
--data-auth-trusted-issuer         FLUREE_DATA_AUTH_TRUSTED_ISSUERS        None
--data-auth-default-policy-class   FLUREE_DATA_AUTH_DEFAULT_POLICY_CLASS   None

Modes:

  • none: No authentication (default)
  • optional: Accept tokens but don’t require them (development only)
  • required: Require either a valid Bearer token or a signed request (JWS/VC)

Bearer token scopes:

  • Read: fluree.ledger.read.all=true or fluree.ledger.read.ledgers=[...]
  • Write: fluree.ledger.write.all=true or fluree.ledger.write.ledgers=[...]

Back-compat: fluree.storage.* claims imply read scope for data endpoints.
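A sketch of how a read check over these claims might look (the claim names are from this section; the evaluation logic is illustrative, not the server’s implementation):

```python
def can_read(claims: dict, ledger: str) -> bool:
    """Read access via read.all, an explicit ledger list, or
    (back-compat) any fluree.storage.* claim."""
    if claims.get("fluree.ledger.read.all") is True:
        return True
    if ledger in claims.get("fluree.ledger.read.ledgers", []):
        return True
    # Back-compat: storage-scoped claims imply read scope on data endpoints.
    return any(key.startswith("fluree.storage.") for key in claims)
```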

fluree-server \
  --data-auth-mode required \
  --data-auth-trusted-issuer did:key:z6Mk...

OIDC / JWKS Token Verification

When the oidc feature is enabled, the server can verify JWT tokens signed by external identity providers (e.g., Fluree Cloud Service) using JWKS (JSON Web Key Set) endpoints. This is in addition to the existing embedded-JWK (Ed25519 did:key) verification path.

Dual-path dispatch: The server inspects each Bearer token’s header:

  • Embedded JWK (Ed25519): Uses the existing verify_jws() path — no JWKS needed.
  • kid header (RS256): Uses OIDC/JWKS path — fetches the signing key from the issuer’s JWKS endpoint.

Both paths coexist; no configuration change is needed for existing Ed25519 tokens.

| Flag | Env Var | Default | Description |
|------|---------|---------|-------------|
| --jwks-issuer | FLUREE_JWKS_ISSUERS | None | OIDC issuer to trust (repeatable) |
| --jwks-cache-ttl | FLUREE_JWKS_CACHE_TTL | 300 | JWKS cache TTL in seconds |

The --jwks-issuer flag takes the format <issuer_url>=<jwks_url>:

fluree-server \
  --data-auth-mode required \
  --jwks-issuer "https://solo.example.com=https://solo.example.com/.well-known/jwks.json"

For multiple issuers, repeat the flag or use comma separation in the env var:

# CLI flags (repeatable)
fluree-server \
  --jwks-issuer "https://issuer1.example.com=https://issuer1.example.com/.well-known/jwks.json" \
  --jwks-issuer "https://issuer2.example.com=https://issuer2.example.com/.well-known/jwks.json"

# Environment variable (comma-separated)
export FLUREE_JWKS_ISSUERS="https://issuer1.example.com=https://issuer1.example.com/.well-known/jwks.json,https://issuer2.example.com=https://issuer2.example.com/.well-known/jwks.json"
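Each comma-separated entry must itself be an <issuer_url>=<jwks_url> pair. A quick shell sketch (illustrative only, using the example URLs above) of how such a list splits into pairs:

```shell
# Sanity-check a comma-separated issuer list: every entry should be an
# <issuer_url>=<jwks_url> pair. URLs are the illustrative ones above.
FLUREE_JWKS_ISSUERS="https://issuer1.example.com=https://issuer1.example.com/.well-known/jwks.json,https://issuer2.example.com=https://issuer2.example.com/.well-known/jwks.json"

echo "$FLUREE_JWKS_ISSUERS" | tr ',' '\n' | while IFS='=' read -r issuer jwks; do
  echo "issuer: $issuer"
  echo "jwks:   $jwks"
done
```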

Behavior details:

  • JWKS endpoints are fetched at startup (warm()) but the server starts even if they’re unreachable.
  • Keys are cached and refreshed when a kid miss occurs (rate-limited to one refresh per issuer every 10 seconds).
  • The token’s iss claim must exactly match a configured issuer URL — unconfigured issuers are rejected immediately with a clear error.
  • Data API, events, admin, and storage proxy endpoints all support JWKS verification. A single --jwks-issuer flag enables OIDC tokens across all endpoint groups. MCP auth continues to use the existing Ed25519 path only.

Connection-Scoped SPARQL Scope Enforcement

When a Bearer token is present for connection-scoped SPARQL queries (/v1/fluree/query with Content-Type: application/sparql-query), the server enforces ledger scope:

  • FROM / FROM NAMED clauses are parsed to extract ledger IDs (name:branch).
  • Each ledger ID is checked against the token’s read scope (fluree.ledger.read.all or fluree.ledger.read.ledgers).
  • Out-of-scope ledgers return 404 (no existence leak).
  • If no FROM clause is present, the query proceeds normally (the engine handles missing dataset errors).
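The scope check above can be pictured with a small sketch: pull the IRIs out of the FROM clauses and compare each to the token's ledger list. The bare name:branch IRI form used here is an assumption for illustration, not the documented FROM syntax:

```shell
# Illustrative only: extract ledger IDs from FROM / FROM NAMED clauses,
# mirroring the server's scope check. The <books:main> IRI shape is an
# assumption for this sketch.
query='SELECT ?s FROM <books:main> FROM NAMED <users:main> WHERE { ?s ?p ?o }'

ledgers=$(echo "$query" | grep -o '<[^>]*>' | tr -d '<>')
echo "$ledgers"
```

Each extracted ID would be checked against fluree.ledger.read.all or fluree.ledger.read.ledgers; any ID outside the scope turns the request into a 404.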

Admin Endpoint Authentication

Protect /v1/fluree/create, /v1/fluree/drop, /v1/fluree/reindex, branch administration, and Iceberg mapping endpoints:

| Flag | Env Var | Default |
|------|---------|---------|
| --admin-auth-mode | FLUREE_ADMIN_AUTH_MODE | none |
| --admin-auth-trusted-issuer | FLUREE_ADMIN_AUTH_TRUSTED_ISSUERS | None |

Modes:

  • none: No authentication (development)
  • required: Require valid Bearer token (production)

Supports both Ed25519 (embedded JWK) and OIDC/JWKS (RS256) tokens when the oidc feature is enabled and --jwks-issuer is configured. For OIDC tokens, issuer trust is implicit — only tokens signed by keys from configured JWKS endpoints will verify. For Ed25519 tokens, the issuer must appear in --admin-auth-trusted-issuer or the fallback --events-auth-trusted-issuer.

# Ed25519 tokens only
fluree-server \
  --admin-auth-mode required \
  --admin-auth-trusted-issuer did:key:z6Mk...

# OIDC (trust comes from --jwks-issuer, no did:key issuers needed)
fluree-server \
  --admin-auth-mode required \
  --jwks-issuer "https://auth.example.com=https://auth.example.com/.well-known/jwks.json"

If no admin-specific issuers are configured, the server falls back to --events-auth-trusted-issuer.

MCP Endpoint Authentication

Protect the /mcp Model Context Protocol endpoint:

| Flag | Env Var | Default |
|------|---------|---------|
| --mcp-enabled | FLUREE_MCP_ENABLED | false |
| --mcp-auth-trusted-issuer | FLUREE_MCP_AUTH_TRUSTED_ISSUERS | None |

fluree-server \
  --mcp-enabled \
  --mcp-auth-trusted-issuer did:key:z6Mk...

Peer Mode Configuration

Peer Subscription

Configure what the peer subscribes to:

| Flag | Description |
|------|-------------|
| --peer-subscribe-all | Subscribe to all ledgers and graph sources |
| --peer-ledger <ledger-id> | Subscribe to specific ledger (repeatable) |
| --peer-graph-source <ledger-id> | Subscribe to specific graph source (repeatable) |

fluree-server \
  --server-role peer \
  --tx-server-url http://tx:8090 \
  --peer-subscribe-all

Or subscribe to specific resources:

fluree-server \
  --server-role peer \
  --tx-server-url http://tx:8090 \
  --peer-ledger books:main \
  --peer-ledger users:main

Peer Events Configuration

| Flag | Env Var | Description |
|------|---------|-------------|
| --peer-events-url | FLUREE_PEER_EVENTS_URL | Custom events URL (default: {tx_server_url}/v1/fluree/events) |
| --peer-events-token | FLUREE_PEER_EVENTS_TOKEN | Bearer token for events (supports @filepath) |

Peer Reconnection

| Flag | Default | Description |
|------|---------|-------------|
| --peer-reconnect-initial-ms | 1000 | Initial reconnect delay |
| --peer-reconnect-max-ms | 30000 | Maximum reconnect delay |
| --peer-reconnect-multiplier | 2.0 | Backoff multiplier |
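With these defaults the delay doubles after each failed attempt until it hits the cap; a quick sketch of the resulting schedule:

```shell
# Reconnect delay schedule for the defaults: start at 1000 ms, double
# each attempt (multiplier 2.0), cap at 30000 ms.
delay=1000
max=30000
for attempt in 1 2 3 4 5 6 7; do
  echo "attempt $attempt: ${delay}ms"
  delay=$((delay * 2))
  [ "$delay" -gt "$max" ] && delay=$max
done
```

The peer waits 1s, 2s, 4s, 8s, 16s, then holds at 30s between attempts.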

Peer Storage Access

| Flag | Env Var | Default |
|------|---------|---------|
| --storage-access-mode | FLUREE_STORAGE_ACCESS_MODE | shared |

Options:

  • shared: Direct storage access (requires --storage-path or --connection-config)
  • proxy: Proxy reads through transaction server

For proxy mode:

| Flag | Env Var |
|------|---------|
| --storage-proxy-token | FLUREE_STORAGE_PROXY_TOKEN |
| --storage-proxy-token-file | FLUREE_STORAGE_PROXY_TOKEN_FILE |

Storage Proxy Configuration (Transaction Server)

Storage proxy provides replication-scoped access to raw storage for peer servers and CLI replication commands (fetch/pull/push). Tokens must carry fluree.storage.* claims — query-scoped tokens (fluree.ledger.read/write.*) are not sufficient. See Replication vs Query Access above.

Enable storage proxy endpoints for peers without direct storage access:

| Flag | Env Var | Default |
|------|---------|---------|
| --storage-proxy-enabled | FLUREE_STORAGE_PROXY_ENABLED | false |
| --storage-proxy-trusted-issuer | FLUREE_STORAGE_PROXY_TRUSTED_ISSUERS | None |
| --storage-proxy-default-identity | FLUREE_STORAGE_PROXY_DEFAULT_IDENTITY | None |
| --storage-proxy-default-policy-class | FLUREE_STORAGE_PROXY_DEFAULT_POLICY_CLASS | None |
| --storage-proxy-debug-headers | FLUREE_STORAGE_PROXY_DEBUG_HEADERS | false |

# Ed25519 trust (did:key):
fluree-server \
  --storage-proxy-enabled \
  --storage-proxy-trusted-issuer did:key:z6Mk...

# OIDC/JWKS trust (same --jwks-issuer flag used by other endpoints):
fluree-server \
  --storage-proxy-enabled \
  --jwks-issuer "https://solo.example.com=https://solo.example.com/.well-known/jwks.json"

JWKS support: When --jwks-issuer is configured, storage proxy endpoints accept RS256 OIDC tokens in addition to Ed25519 JWS tokens. The --jwks-issuer flag is shared with data, admin, and events endpoints — a single flag enables OIDC across all endpoint groups.

Complete Configuration Examples

Development (Memory Storage)

fluree-server \
  --log-level debug

Single Server (File Storage)

fluree-server \
  --storage-path /var/lib/fluree \
  --indexing-enabled \
  --log-level info

Production with Admin Auth

fluree-server \
  --storage-path /var/lib/fluree \
  --indexing-enabled \
  --admin-auth-mode required \
  --admin-auth-trusted-issuer did:key:z6Mk... \
  --log-level info

Transaction Server with Events Auth

fluree-server \
  --storage-path /var/lib/fluree \
  --events-auth-mode required \
  --events-auth-trusted-issuer did:key:z6Mk... \
  --storage-proxy-enabled \
  --admin-auth-mode required

Production with OIDC (All Endpoints)

fluree-server \
  --storage-path /var/lib/fluree \
  --indexing-enabled \
  --jwks-issuer "https://auth.example.com=https://auth.example.com/.well-known/jwks.json" \
  --data-auth-mode required \
  --events-auth-mode required \
  --admin-auth-mode required \
  --storage-proxy-enabled

Query Peer (Shared Storage)

fluree-server \
  --server-role peer \
  --tx-server-url http://tx.internal:8090 \
  --storage-path /var/lib/fluree \
  --peer-subscribe-all \
  --peer-events-token @/etc/fluree/peer-token.jwt

Query Peer (Proxy Storage)

fluree-server \
  --server-role peer \
  --tx-server-url http://tx.internal:8090 \
  --storage-access-mode proxy \
  --storage-proxy-token @/etc/fluree/storage-proxy.jwt \
  --peer-subscribe-all \
  --peer-events-token @/etc/fluree/peer-token.jwt

S3 + DynamoDB (Connection Config)

fluree server run \
  --connection-config /etc/fluree/connection.jsonld \
  --indexing-enabled \
  --reindex-min-bytes 100000 \
  --reindex-max-bytes 5000000 \
  --cache-max-mb 4096

With a config file:

[server]
connection_config = "/etc/fluree/connection.jsonld"
cache_max_mb = 4096

[server.indexing]
enabled = true
reindex_min_bytes = 100000
reindex_max_bytes = 5000000

[server.auth.data]
mode = "required"
trusted_issuers = ["did:key:z6Mk..."]

S3 Peer (Shared Storage via Connection Config)

fluree server run \
  --server-role peer \
  --tx-server-url http://tx.internal:8090 \
  --connection-config /etc/fluree/connection.jsonld \
  --peer-subscribe-all \
  --peer-events-token @/etc/fluree/peer-token.jwt

Environment Variables Reference

| Variable | Description | Default |
|----------|-------------|---------|
| FLUREE_HOME | Global Fluree directory (unified config + data) | Platform dirs (see Global Directory Layout) |
| FLUREE_CONFIG | Config file path | .fluree/config.{toml,jsonld} (auto-discovered) |
| FLUREE_PROFILE | Configuration profile name | None |
| FLUREE_LISTEN_ADDR | Server address:port | 0.0.0.0:8090 |
| FLUREE_STORAGE_PATH | File storage path | .fluree/storage |
| FLUREE_CONNECTION_CONFIG | JSON-LD connection config file path | None |
| FLUREE_CORS_ENABLED | Enable CORS | true |
| FLUREE_INDEXING_ENABLED | Enable background indexing | true |
| FLUREE_REINDEX_MIN_BYTES | Soft reindex threshold (bytes) | 100000 |
| FLUREE_REINDEX_MAX_BYTES | Hard reindex threshold (bytes) | 20% of system RAM (256 MB fallback) |
| FLUREE_CACHE_MAX_MB | Global cache budget (MB) | 30/40/50% of RAM (tiered: <4GB / 4-8GB / ≥8GB) |
| FLUREE_BODY_LIMIT | Max request body bytes | 52428800 |
| FLUREE_LOG_LEVEL | Log level | info |
| FLUREE_SERVER_ROLE | Server role | transaction |
| FLUREE_TX_SERVER_URL | Transaction server URL | None |
| FLUREE_EVENTS_AUTH_MODE | Events auth mode | none |
| FLUREE_EVENTS_AUTH_TRUSTED_ISSUERS | Events trusted issuers | None |
| FLUREE_DATA_AUTH_MODE | Data API auth mode | none |
| FLUREE_DATA_AUTH_AUDIENCE | Data API expected audience | None |
| FLUREE_DATA_AUTH_TRUSTED_ISSUERS | Data API trusted issuers | None |
| FLUREE_DATA_AUTH_DEFAULT_POLICY_CLASS | Data API default policy class | None |
| FLUREE_ADMIN_AUTH_MODE | Admin auth mode | none |
| FLUREE_ADMIN_AUTH_TRUSTED_ISSUERS | Admin trusted issuers | None |
| FLUREE_MCP_ENABLED | Enable MCP endpoint | false |
| FLUREE_MCP_AUTH_TRUSTED_ISSUERS | MCP trusted issuers | None |
| FLUREE_STORAGE_ACCESS_MODE | Peer storage mode | shared |
| FLUREE_STORAGE_PROXY_ENABLED | Enable storage proxy | false |

Command-Line Reference

fluree-server --help

Best Practices

1. Keep Secrets Out of Config Files

Tokens and credentials should not be stored as plaintext in config files (which may be committed to version control or readable by other processes). Three options, in order of preference:

Environment variables (recommended for production):

export FLUREE_PEER_EVENTS_TOKEN=$(cat /etc/fluree/token.jwt)
export FLUREE_STORAGE_PROXY_TOKEN=$(cat /etc/fluree/proxy-token.jwt)

@filepath references in config files or CLI flags (reads the file at startup):

[server.peer]
events_token = "@/etc/fluree/peer-token.jwt"
storage_proxy_token = "@/etc/fluree/proxy-token.jwt"
--peer-events-token @/etc/fluree/token.jwt

Direct values (development only): If a secret-bearing field contains a literal token in the config file, the server logs a warning at startup recommending @filepath or env vars.

The following config file fields support @filepath resolution:

| Config file key | Env var alternative |
|-----------------|---------------------|
| peer.events_token | FLUREE_PEER_EVENTS_TOKEN |
| peer.storage_proxy_token | FLUREE_STORAGE_PROXY_TOKEN |
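Whichever mechanism you choose, the token file itself should be readable only by the server user; a minimal sketch (the directory and token value are placeholders):

```shell
# Write a placeholder token to a file readable only by its owner, then
# reference it with @filepath or an env var.
tokendir=$(mktemp -d)
printf '%s' "eyJhbGciOi..." > "$tokendir/peer-token.jwt"
chmod 600 "$tokendir/peer-token.jwt"
ls -l "$tokendir/peer-token.jwt"
```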

2. Enable Admin Auth in Production

Always protect admin endpoints in production:

fluree-server \
  --admin-auth-mode required \
  --admin-auth-trusted-issuer did:key:z6Mk...

3. Use File Storage for Persistence

Memory storage is lost on restart:

# Development only
fluree-server

# Production
fluree-server --storage-path /var/lib/fluree

4. Monitor Logs

Use structured logging for production:

fluree-server --log-level info 2>&1 | jq .
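Assuming the log lines are JSON objects carrying a level field (the level/msg field names here are assumptions, not the server's documented schema), errors can be isolated with a jq filter:

```shell
# Filter structured log lines by level. The two sample lines stand in
# for server output; the level/msg field names are assumptions.
logs='{"level":"INFO","msg":"server started"}
{"level":"ERROR","msg":"reindex failed"}'

echo "$logs" | jq -c 'select(.level == "ERROR")'
```

The same filter works on a live stream: fluree-server 2>&1 | jq -c 'select(.level == "ERROR")'.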

Remote Connections

Remote connections enable SPARQL SERVICE federation against other Fluree instances. A remote connection maps a name to a server URL and bearer token. Once registered, queries can reference any ledger on that server using SERVICE <fluree:remote:<name>/<ledger>> { ... }.

Rust API

Register remote connections on the FlureeBuilder:

let fluree = FlureeBuilder::file("./data")
    .remote_connection("acme", "https://acme-fluree.example.com", Some(token))
    .remote_connection("partner", "https://partner.example.com", None)
    .build()?;

Each call registers a named connection. The name is used in SPARQL queries:

SERVICE <fluree:remote:acme/customers:main> { ?s ?p ?o }
SERVICE <fluree:remote:partner/inventory:main> { ?item ex:sku ?sku }

Connection Parameters

| Parameter | Description |
|-----------|-------------|
| name | Alias used in fluree:remote:<name>/... URIs |
| base_url | Server URL (e.g., https://acme-fluree.example.com). The query path /v1/fluree/query/{ledger} is appended automatically. |
| token | Optional bearer token for authentication. Sent as Authorization: Bearer <token> on every request. |

The default per-request timeout is 30 seconds. Requests that exceed this produce a query error (or empty results with SERVICE SILENT).
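Putting the pieces together, a federated query can be composed and sent to the local server's connection-scoped query endpoint. The remote alias and ledger name are the illustrative ones from above, and the curl line (shown commented out) assumes a server on localhost:8090:

```shell
# Compose a federated query against the "acme" remote connection
# (alias and ledger from the examples above).
query='SELECT ?s ?p ?o WHERE {
  SERVICE <fluree:remote:acme/customers:main> { ?s ?p ?o }
}'
echo "$query"

# Send it to the local server (assumes one on localhost:8090):
# curl -s -X POST http://localhost:8090/v1/fluree/query \
#   -H "Content-Type: application/sparql-query" \
#   --data "$query"
```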

Security

Bearer tokens are stored in memory on the Fluree instance. They are never serialized to storage, included in nameservice records, or exposed through info/admin endpoints. If the token needs rotation, rebuild the Fluree instance with an updated token, or use set_remote_service() to inject a custom executor with token refresh logic.

Feature Flag

The HTTP transport for remote SERVICE requires the search-remote-client Cargo feature (which enables reqwest). Without this feature, remote connections can be registered but queries against them will fail at runtime. The feature is enabled by default in the server binary.

See SPARQL: Remote Fluree Federation for query syntax and examples.

Running Fluree with Docker

The official image (fluree/server) ships the fluree binary on a slim Debian base. This guide covers what’s inside the image, how to configure it (env vars, mounted config files, CLI flags), and worked recipes for the common production patterns.

What’s in the Image

| Aspect | Value |
|--------|-------|
| Base | debian:trixie-slim |
| Entrypoint | /usr/local/bin/fluree-entrypoint.sh |
| Default command | fluree server run |
| WORKDIR | /var/lib/fluree |
| VOLUME | /var/lib/fluree |
| Exposed port | 8090 |
| Runtime user | fluree (UID 1000, GID 1000) |
| Healthcheck | GET /health every 30s |
| Default log filter | RUST_LOG=info |

Entrypoint behavior: on first start, if /var/lib/fluree/.fluree/ does not exist, the entrypoint runs fluree init to create a default .fluree/config.toml and .fluree/storage/ directory. Subsequent starts skip init. Any arguments passed to docker run after the image name are forwarded to fluree server run, so you can append CLI flags (e.g. --log-level debug) directly.

Quick Start

docker run --rm -p 8090:8090 fluree/server:latest

Verify:

curl http://localhost:8090/health

Data lives inside the container’s writable layer here — fine for trying things out, lost when the container is removed. For anything beyond a smoke test, mount a volume.

Persisting Data

The image declares VOLUME /var/lib/fluree. Mount a host directory or named volume there:

# Named volume (recommended)
docker run -d --name fluree \
  -p 8090:8090 \
  -v fluree-data:/var/lib/fluree \
  fluree/server:latest

# Host bind mount — make sure the directory is writable by UID 1000
mkdir -p ./fluree-data && sudo chown 1000:1000 ./fluree-data
docker run -d --name fluree \
  -p 8090:8090 \
  -v "$PWD/fluree-data:/var/lib/fluree" \
  fluree/server:latest

The volume holds both .fluree/config.toml (config) and .fluree/storage/ (ledger data) by default.

Three Ways to Configure

Fluree resolves configuration with this precedence (highest wins):

  1. CLI flags appended after the image name
  2. Environment variables (FLUREE_*) set with -e or environment:
  3. Profile overrides ([profiles.<name>.server]) when you pass --profile
  4. Config file at .fluree/config.toml or .fluree/config.jsonld
  5. Built-in defaults

You can use any one of these — or, more typically, layer them: bake a base config file into a volume, then tweak per-environment with env vars or compose overrides.

Heads up — log level: The Dockerfile sets ENV RUST_LOG=info. The console log filter uses RUST_LOG if it is non-empty and only falls back to FLUREE_LOG_LEVEL when RUST_LOG is unset. Inside this image you must override RUST_LOG to change console verbosity:

docker run -e RUST_LOG=debug fluree/server:latest

1. Environment Variables Only

Every CLI flag has a FLUREE_* env var equivalent (see Configuration). For simple deployments this is the lowest-friction path:

docker run -d --name fluree \
  -p 8090:8090 \
  -v fluree-data:/var/lib/fluree \
  -e FLUREE_LISTEN_ADDR=0.0.0.0:8090 \
  -e FLUREE_STORAGE_PATH=/var/lib/fluree/.fluree/storage \
  -e FLUREE_INDEXING_ENABLED=true \
  -e FLUREE_REINDEX_MIN_BYTES=1000000 \
  -e FLUREE_REINDEX_MAX_BYTES=10000000 \
  -e FLUREE_CACHE_MAX_MB=2048 \
  -e RUST_LOG=info \
  fluree/server:latest

2. Mounted Config File (JSON-LD or TOML)

Author a config file on the host, then mount it at /var/lib/fluree/.fluree/config.jsonld (or .toml). The server walks up from WORKDIR=/var/lib/fluree and picks it up automatically.

./fluree-config/config.jsonld:

{
  "@context": { "@vocab": "https://ns.flur.ee/config#" },
  "server": {
    "listen_addr": "0.0.0.0:8090",
    "storage_path": "/var/lib/fluree/.fluree/storage",
    "log_level": "info",
    "cache_max_mb": 2048,
    "indexing": {
      "enabled": true,
      "reindex_min_bytes": 1000000,
      "reindex_max_bytes": 10000000
    }
  },
  "profiles": {
    "prod": {
      "server": {
        "log_level": "warn",
        "cache_max_mb": 8192
      }
    }
  }
}
docker run -d --name fluree \
  -p 8090:8090 \
  -v fluree-data:/var/lib/fluree \
  -v "$PWD/fluree-config/config.jsonld:/var/lib/fluree/.fluree/config.jsonld:ro" \
  fluree/server:latest --profile prod

If both config.toml and config.jsonld exist in the same directory, TOML wins and the server logs a warning. Pick one format.

The TOML equivalent (./fluree-config/config.toml):

[server]
listen_addr = "0.0.0.0:8090"
storage_path = "/var/lib/fluree/.fluree/storage"
log_level = "info"
cache_max_mb = 2048

[server.indexing]
enabled = true
reindex_min_bytes = 1000000
reindex_max_bytes = 10000000

[profiles.prod.server]
log_level = "warn"
cache_max_mb = 8192

You can also stash the config outside WORKDIR and point at it explicitly:

docker run -d --name fluree \
  -p 8090:8090 \
  -v fluree-data:/var/lib/fluree \
  -v "$PWD/fluree-config:/etc/fluree:ro" \
  fluree/server:latest --config /etc/fluree/config.jsonld

3. Layered: File + Env Var Overrides

The common production shape: bake the base config into the image or volume, then let the orchestrator override per-environment with FLUREE_* env vars. Env vars beat the file — no file edit needed to bump cache size in staging vs. prod.

docker run -d --name fluree \
  -p 8090:8090 \
  -v fluree-data:/var/lib/fluree \
  -v "$PWD/fluree-config/config.jsonld:/var/lib/fluree/.fluree/config.jsonld:ro" \
  -e FLUREE_CACHE_MAX_MB=4096 \
  -e RUST_LOG=warn \
  fluree/server:latest

Common Configuration Recipes

Tuning the LRU Cache

cache_max_mb is the global budget for the in-memory index/flake cache. The default is a tiered fraction of system RAM (30%/40%/50% for <4GB/4–8GB/≥8GB hosts). On a container with a hard memory limit, set this explicitly — the auto-tier reads host RAM, not the cgroup limit, and can over-allocate.

# docker-compose.yml fragment
services:
  fluree:
    image: fluree/server:latest
    mem_limit: 6g
    environment:
      FLUREE_CACHE_MAX_MB: 3072    # ~50% of the cgroup limit

Or in JSON-LD:

{
  "@context": { "@vocab": "https://ns.flur.ee/config#" },
  "server": { "cache_max_mb": 3072 }
}
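One way to size the budget from the container's own limit rather than host RAM (a sketch; assumes cgroup v2 and falls back to a fixed value when no limit is set):

```shell
# Derive FLUREE_CACHE_MAX_MB as ~50% of the container memory limit.
# Assumes cgroup v2; falls back to a fixed value when no limit is set.
limit_file=/sys/fs/cgroup/memory.max
if [ -r "$limit_file" ] && [ "$(cat "$limit_file")" != "max" ]; then
  limit_bytes=$(cat "$limit_file")
else
  limit_bytes=$((6 * 1024 * 1024 * 1024))   # fallback: pretend 6 GiB
fi
cache_mb=$(( limit_bytes / 1024 / 1024 / 2 ))
echo "FLUREE_CACHE_MAX_MB=$cache_mb"
```

Run at container start (e.g. in an entrypoint wrapper), this keeps the cache budget tied to the cgroup limit across environments.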

Background Indexing

Indexing is off by default. Enable it for production write workloads — without it, every commit writes to novelty and queries get slower as novelty grows.

| Setting | Meaning |
|---------|---------|
| indexing.enabled | Turn the background indexer on |
| reindex_min_bytes | Soft threshold — novelty above this triggers a background reindex |
| reindex_max_bytes | Hard threshold — commits block above this until reindexing catches up |

Tune min/max based on commit volume. Defaults (100 KB / 1 MB) are conservative; busy ledgers should raise both:

[server.indexing]
enabled = true
reindex_min_bytes = 5000000     # 5 MB — start indexing in the background
reindex_max_bytes = 50000000    # 50 MB — block commits at this point
docker run -d \
  -e FLUREE_INDEXING_ENABLED=true \
  -e FLUREE_REINDEX_MIN_BYTES=5000000 \
  -e FLUREE_REINDEX_MAX_BYTES=50000000 \
  fluree/server:latest

CORS and Request Body Size

[server]
cors_enabled = true
body_limit = 104857600    # 100 MB — raise for bulk imports

Authentication (Production)

Require a Bearer token on data and admin endpoints. The trusted issuer is the did:key of your token signer.

[server.auth.data]
mode = "required"
trusted_issuers = ["did:key:z6Mk..."]

[server.auth.admin]
mode = "required"
trusted_issuers = ["did:key:z6Mk..."]

For OIDC/JWKS (e.g. an external IdP), set --jwks-issuer or FLUREE_JWKS_ISSUERS:

docker run -d \
  -e FLUREE_DATA_AUTH_MODE=required \
  -e FLUREE_JWKS_ISSUERS="https://auth.example.com=https://auth.example.com/.well-known/jwks.json" \
  fluree/server:latest

See Configuration → Authentication for the full matrix.

S3 + DynamoDB (Distributed Storage)

For multi-node or cloud deployments, point the server at a JSON-LD connection config describing your storage and nameservice. AWS credentials come from the standard SDK chain (env vars, IAM role, etc.) — they are not part of the connection config.

./fluree-config/connection.jsonld:

{
  "@context": {
    "@base": "https://ns.flur.ee/config/connection/",
    "@vocab": "https://ns.flur.ee/system#"
  },
  "@graph": [
    { "@id": "commitStorage", "@type": "Storage",
      "s3Bucket": "fluree-prod-commits", "s3Prefix": "data/" },
    { "@id": "indexStorage", "@type": "Storage",
      "s3Bucket": "fluree-prod-indexes" },
    { "@id": "publisher", "@type": "Publisher",
      "dynamodbTable": "fluree-nameservice", "dynamodbRegion": "us-east-1" },
    { "@id": "conn", "@type": "Connection",
      "commitStorage": { "@id": "commitStorage" },
      "indexStorage":  { "@id": "indexStorage" },
      "primaryPublisher": { "@id": "publisher" } }
  ]
}
docker run -d --name fluree \
  -p 8090:8090 \
  -v "$PWD/fluree-config:/etc/fluree:ro" \
  -e AWS_REGION=us-east-1 \
  -e AWS_ACCESS_KEY_ID=... \
  -e AWS_SECRET_ACCESS_KEY=... \
  -e FLUREE_CONNECTION_CONFIG=/etc/fluree/connection.jsonld \
  -e FLUREE_INDEXING_ENABLED=true \
  fluree/server:latest

--connection-config and --storage-path are mutually exclusive. See Configuration → Connection Configuration and the DynamoDB guide for backend-specific setup.

Search Service (fluree-search-httpd)

Run a dedicated BM25 / vector search service alongside the main server when search traffic is heavy enough that you want it isolated from the transactional path. The service is a separate binary with its own listen port — it is not mounted under the main server’s api_base_url. It needs read access to the same storage and nameservice paths the main server writes to.

docker run -d --name fluree-search \
  -p 9090:9090 \
  -v fluree-data:/var/lib/fluree \
  -e FLUREE_STORAGE_ROOT=/var/lib/fluree/storage \
  -e FLUREE_NAMESERVICE_PATH=/var/lib/fluree/ns \
  fluree/search-httpd:latest

| Env var | Default | Purpose |
|---------|---------|---------|
| FLUREE_STORAGE_ROOT | (required) | Storage path (file:// optional) |
| FLUREE_NAMESERVICE_PATH | (required) | Nameservice path |
| FLUREE_SEARCH_LISTEN | 0.0.0.0:9090 | Listen address |
| FLUREE_SEARCH_CACHE_MAX_ENTRIES | 100 | Max cached indexes |
| FLUREE_SEARCH_CACHE_TTL_SECS | 300 | Cache TTL |
| FLUREE_SEARCH_MAX_LIMIT | 1000 | Max results per query |
| FLUREE_SEARCH_DEFAULT_TIMEOUT_MS | 30000 | Default request timeout |
| FLUREE_SEARCH_MAX_TIMEOUT_MS | 300000 | Maximum allowed timeout |

Prerequisites. The service only serves queries against indexes that already exist on the shared volume. BM25 / vector graph-source indexes are created via the Rust API today (Bm25CreateConfig + create_full_text_index, or VectorCreateConfig + create_vector_index). The @fulltext datatype and the f:fullTextDefaults config-graph paths are managed entirely through the main server’s HTTP API and don’t require this dedicated service.

Compose example with both services sharing a volume:

services:
  fluree:
    image: fluree/server:latest
    ports:
      - "8090:8090"
    volumes:
      - fluree-data:/var/lib/fluree
    environment:
      RUST_LOG: info
      FLUREE_INDEXING_ENABLED: "true"

  fluree-search:
    image: fluree/search-httpd:latest
    depends_on:
      - fluree
    ports:
      - "9090:9090"
    volumes:
      - fluree-data:/var/lib/fluree:ro     # read-only is sufficient
    environment:
      RUST_LOG: info
      FLUREE_STORAGE_ROOT: /var/lib/fluree/storage
      FLUREE_NAMESERVICE_PATH: /var/lib/fluree/ns

volumes:
  fluree-data:

Clients send search requests to POST http://fluree-search:9090/v1/search. See BM25 → Remote Search Service for the request/response protocol.

Query Peer

Run as a read-only peer that subscribes to a transaction server’s event stream:

docker run -d --name fluree-peer \
  -p 8090:8090 \
  -v fluree-peer-data:/var/lib/fluree \
  -e FLUREE_SERVER_ROLE=peer \
  -e FLUREE_TX_SERVER_URL=http://tx.internal:8090 \
  fluree/server:latest --peer-subscribe-all

See Query peers and replication for the proxy-mode and auth options.

Docker Compose: Full Example

A production-leaning single-node setup with a mounted JSON-LD config, env-var overrides, named data volume, and resource limits:

services:
  fluree:
    image: fluree/server:latest
    container_name: fluree
    restart: unless-stopped
    ports:
      - "8090:8090"
    volumes:
      - fluree-data:/var/lib/fluree
      - ./fluree-config/config.jsonld:/var/lib/fluree/.fluree/config.jsonld:ro
    environment:
      RUST_LOG: info
      FLUREE_CACHE_MAX_MB: 4096
      FLUREE_INDEXING_ENABLED: "true"
      FLUREE_REINDEX_MIN_BYTES: "5000000"
      FLUREE_REINDEX_MAX_BYTES: "50000000"
      # Auth — point at your trusted did:key signer
      FLUREE_DATA_AUTH_MODE: required
      FLUREE_DATA_AUTH_TRUSTED_ISSUERS: did:key:z6Mk...
      FLUREE_ADMIN_AUTH_MODE: required
      FLUREE_ADMIN_AUTH_TRUSTED_ISSUERS: did:key:z6Mk...
    mem_limit: 8g
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://127.0.0.1:8090/health"]
      interval: 30s
      timeout: 3s
      start_period: 15s
      retries: 3
    command: ["--profile", "prod"]

volumes:
  fluree-data:
docker compose up -d
docker compose logs -f fluree

Troubleshooting

Container restarts after fluree init. First-run init only runs when /var/lib/fluree/.fluree/ is missing. If the volume is owned by a non-1000 UID, init fails. Fix with sudo chown -R 1000:1000 ./fluree-data on the host.

Mounted config file is ignored. Confirm the mount path and the file extension. The server only auto-discovers .fluree/config.toml or .fluree/config.jsonld under the working directory. Anything else needs --config <path> (or FLUREE_CONFIG=<path>). If both formats are present in the same directory, TOML wins — check the startup logs for the warning.

Setting FLUREE_LOG_LEVEL doesn’t change console output. The image’s ENV RUST_LOG=info shadows it. Override with -e RUST_LOG=debug instead.

cache_max_mb auto-default is too large under a memory limit. The auto-tier reads host RAM, not the cgroup. Set FLUREE_CACHE_MAX_MB (or cache_max_mb in the file) to a value sized to the container limit.

Health check failing. curl http://localhost:8090/health from your host. If the server is up but the healthcheck fails, the listen address is probably bound to 127.0.0.1 inside the container — set FLUREE_LISTEN_ADDR=0.0.0.0:8090.

Storage Modes

Fluree supports four storage modes, each optimized for different deployment scenarios. This document provides detailed information about each storage mode and guidance for choosing the right one.

Storage Modes

Memory Storage

In-memory storage for development and testing:

./fluree-db-server --storage memory

Characteristics:

  • Data stored in RAM only
  • No persistence (data lost on restart)
  • Fastest performance
  • No external dependencies

Use Cases:

  • Local development
  • Unit testing
  • Temporary/ephemeral databases
  • Prototyping

Limitations:

  • No durability (data lost on crash/restart)
  • Limited by available RAM
  • Single process only

File Storage

Local file system storage:

./fluree-db-server \
  --storage file \
  --data-dir /var/lib/fluree

Characteristics:

  • Data persisted to local disk
  • Survives server restarts
  • Good performance (SSD recommended)
  • Simple setup

Use Cases:

  • Single-server production
  • Development with persistence
  • Edge deployments
  • Small to medium scale

Limitations:

  • Single machine only
  • No built-in replication
  • Limited by disk capacity
  • No cross-region support

AWS Storage

Distributed storage using S3 and DynamoDB:

./fluree-db-server \
  --storage aws \
  --s3-bucket fluree-prod-data \
  --s3-region us-east-1 \
  --dynamodb-table fluree-nameservice \
  --dynamodb-region us-east-1

Characteristics:

  • Distributed, scalable storage
  • Multi-process coordination
  • Cross-region replication
  • High durability (99.999999999%)

Use Cases:

  • Multi-server production
  • High availability requirements
  • Geographic distribution
  • Cloud-native applications

Limitations:

  • Requires AWS account
  • Higher latency than local storage
  • Usage costs
  • More complex setup

IPFS Storage

Decentralized content-addressed storage via a local Kubo node:

{
  "@context": {"@vocab": "https://ns.flur.ee/system#"},
  "@graph": [{
    "@type": "Connection",
    "indexStorage": {
      "@type": "Storage",
      "ipfsApiUrl": "http://127.0.0.1:5001",
      "ipfsPinOnPut": true
    }
  }]
}

Characteristics:

  • Content-addressed (every blob identified by SHA-256 hash)
  • Immutable, tamper-evident storage
  • Decentralized replication via IPFS network
  • Fluree’s native CIDs work directly with IPFS

Use Cases:

  • Decentralized / censorship-resistant deployments
  • Content integrity verification
  • Cross-organization data sharing
  • Foundation for IPNS/ENS-based ledger discovery

Limitations:

  • Requires a running Kubo node
  • No prefix listing (manifest-based tracking needed)
  • No native deletion (unpin + GC)
  • Higher write latency than local file I/O

See IPFS Storage Guide for complete setup and configuration.

Storage Architecture

Memory Storage

┌──────────────────────┐
│   Fluree Process     │
│  ┌────────────────┐  │
│  │  Hash Map      │  │
│  │  (In Memory)   │  │
│  └────────────────┘  │
└──────────────────────┘

All data in process memory.

File Storage

┌──────────────────────┐
│   Fluree Process     │
│  ┌────────────────┐  │
│  │   File I/O     │  │
│  └────────┬───────┘  │
└───────────┼──────────┘
            │
     ┌──────▼──────┐
     │ File System │
     │  /var/lib/  │
     │   fluree/   │
     └─────────────┘

Data persisted to local files.

AWS Storage

┌──────────────────────┐  ┌──────────────────────┐
│   Fluree Process 1   │  │   Fluree Process 2   │
│  ┌────────────────┐  │  │  ┌────────────────┐  │
│  │  AWS SDK       │  │  │  │  AWS SDK       │  │
│  └────────┬───────┘  │  │  └────────┬───────┘  │
└───────────┼──────────┘  └───────────┼──────────┘
            │                         │
            └────────┬────────────────┘
                     │
          ┌──────────▼──────────┐
          │     AWS Cloud       │
          │  ┌──────┐  ┌──────┐│
          │  │  S3  │  │Dynamo││
          │  └──────┘  └──────┘│
          └─────────────────────┘

Multiple processes coordinate via AWS.

IPFS Storage

┌──────────────────────┐
│   Fluree Process     │
│  ┌────────────────┐  │
│  │  IpfsStorage   │  │
│  │  (HTTP client) │  │
│  └────────┬───────┘  │
└───────────┼──────────┘
            │ HTTP RPC
     ┌──────▼──────┐
     │  Kubo Node  │
     │  (IPFS)     │
     └──────┬──────┘
            │ libp2p
     ┌──────▼──────┐
     │  IPFS P2P   │
     │  Network    │
     └─────────────┘

Data stored as content-addressed blocks in IPFS via Kubo.

Storage Encryption

Fluree supports transparent AES-256-GCM encryption for data at rest. When enabled, all data is automatically encrypted before being written to storage.

Enabling Encryption

# Generate a 32-byte encryption key
export FLUREE_ENCRYPTION_KEY=$(openssl rand -base64 32)

Configure via JSON-LD (file storage):

{
  "@context": {"@vocab": "https://ns.flur.ee/system#"},
  "@graph": [{
    "@type": "Connection",
    "indexStorage": {
      "@type": "Storage",
      "filePath": "/var/lib/fluree",
      "AES256Key": {"envVar": "FLUREE_ENCRYPTION_KEY"}
    }
  }]
}

For S3 storage with encryption:

{
  "@context": {"@vocab": "https://ns.flur.ee/system#"},
  "@graph": [{
    "@type": "Connection",
    "indexStorage": {
      "@type": "Storage",
      "s3Bucket": "my-fluree-bucket",
      "s3Endpoint": "https://s3.us-east-1.amazonaws.com",
      "AES256Key": {"envVar": "FLUREE_ENCRYPTION_KEY"}
    }
  }]
}

Key Features:

  • AES-256-GCM authenticated encryption
  • Works natively with all storage backends (memory, file, S3)
  • Transparent encryption/decryption on read/write
  • Portable ciphertext format (encrypted data can be moved between backends)
  • Environment variable support for key configuration
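Since the key must decode to exactly 32 bytes for AES-256, a startup sanity check is easy to add. The sketch below is illustrative Python, not part of Fluree; it validates a key in the `FLUREE_ENCRYPTION_KEY` format generated by `openssl rand -base64 32` above:

```python
import base64
import os

def load_key(var="FLUREE_ENCRYPTION_KEY", environ=None) -> bytes:
    """Decode the base64 key from the environment and insist on exactly
    32 bytes, as AES-256 requires. (Illustrative startup check only.)"""
    environ = os.environ if environ is None else environ
    key = base64.b64decode(environ[var])
    if len(key) != 32:
        raise ValueError(f"expected a 32-byte key, got {len(key)} bytes")
    return key

# Simulate a key produced by `openssl rand -base64 32`:
fake_env = {"FLUREE_ENCRYPTION_KEY": base64.b64encode(os.urandom(32)).decode()}
print(len(load_key(environ=fake_env)))  # → 32
```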

See Storage Encryption for full documentation.

File Storage Details

Directory Structure

/var/lib/fluree/
├── ns@v2/                    # Nameservice records
│   ├── mydb/
│   │   ├── main.json        # Ledger metadata
│   │   └── dev.json
│   └── customers/
│       └── main.json
├── commit/                   # Transaction commits
│   ├── abc123def456.commit
│   └── def456abc789.commit
├── index/                    # Index snapshots
│   ├── mydb-main-t100.idx
│   └── mydb-main-t150.idx
└── graph-sources/            # Graph sources
    └── products-search/
        └── main/
            └── bm25/
                ├── manifest.json
                └── t150/
                    └── snapshot.bin

File Formats

Nameservice (JSON):

{
  "ledger_id": "mydb:main",
  "commit_t": 150,
  "index_t": 145,
  "commit_id": "bafybeig...commitT150",
  "index_id": "bafybeig...indexRootT145"
}
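The two watermarks in this record can be compared directly: a `commit_t` ahead of `index_t` means the most recent commits are not yet covered by a persisted index. A small illustrative check, using a hypothetical record with shortened CIDs:

```python
import json

# Hypothetical nameservice record (CIDs shortened for illustration):
record = json.loads("""
{
  "ledger_id": "mydb:main",
  "commit_t": 150,
  "index_t": 145,
  "commit_id": "bafy-commit-150",
  "index_id": "bafy-index-145"
}
""")

# index_t trailing commit_t means the persisted index does not yet cover
# the most recent commits.
lag = record["commit_t"] - record["index_t"]
print(lag)  # → 5
```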

Commits (Binary):

  • Compressed flake data
  • Transaction metadata
  • Cryptographic signatures

Indexes (Binary):

  • SPOT, POST, OPST, PSOT trees
  • Optimized for query performance

File System Requirements

Minimum:

  • 10 GB free space
  • SSD recommended (HDD acceptable)
  • Sufficient IOPS for workload

Recommended:

  • 100 GB+ free space
  • NVMe SSD
  • High IOPS capability
  • Regular backups

AWS Storage Details

S3 Structure

s3://fluree-prod-data/
├── commit/
│   ├── abc123def456.commit
│   └── def456abc789.commit
├── index/
│   ├── mydb-main-t100.idx
│   └── mydb-main-t150.idx
└── graph-sources/
    └── products-search/
        └── main/
            └── bm25/
                ├── manifest.json
                └── t150/
                    └── snapshot.bin

DynamoDB Schema

The nameservice uses a DynamoDB table with a composite primary key (pk + sk) for ledger and graph source metadata coordination. Each ledger or graph source is stored as multiple items (one per concern) under the same partition key.

See DynamoDB Nameservice Guide for:

  • Complete table schema with composite-key layout
  • Table creation scripts (AWS CLI, CloudFormation, Terraform)
  • GSI setup for listing by kind
  • Local development setup with LocalStack
  • Production considerations and troubleshooting

Quick Reference:

Table: fluree-nameservice
Primary Key: pk (String, ledger-id) + sk (String, concern)
Sort Key Values: meta, head, index, config, status
GSI1 (gsi1-kind): kind (HASH) + pk (RANGE)
Items per ledger: 5 (meta, head, index, config, status)
Items per graph source: 4 (meta, config, index, status)
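As a sketch of this item-per-concern layout (hypothetical attribute values, not the exact item shapes), the five ledger items can be pictured as dicts keyed by the composite key `(pk, sk)`:

```python
# Hypothetical item shapes for one ledger, keyed by (pk, sk). Attribute
# values are illustrative, not the exact schema.
LEDGER_ITEMS = {
    ("mydb:main", "meta"):   {"kind": "ledger", "name": "mydb",
                              "branch": "main", "retracted": False},
    ("mydb:main", "head"):   {"commit_id": None, "commit_t": 0},
    ("mydb:main", "index"):  {"index_id": None, "index_t": 0},
    ("mydb:main", "config"): {"default_context_id": None, "config_v": 0},
    ("mydb:main", "status"): {"status": "ready", "status_v": 0},
}

# A transactor and an indexer write to different items, so their updates
# never land on the same DynamoDB item:
LEDGER_ITEMS[("mydb:main", "head")]["commit_t"] = 151
LEDGER_ITEMS[("mydb:main", "index")]["index_t"] = 150
print(len(LEDGER_ITEMS))  # → 5
```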

AWS Permissions

Required IAM permissions:

S3:

{
  "Effect": "Allow",
  "Action": [
    "s3:GetObject",
    "s3:PutObject",
    "s3:ListBucket"
  ],
  "Resource": [
    "arn:aws:s3:::fluree-prod-data",
    "arn:aws:s3:::fluree-prod-data/*"
  ]
}

DynamoDB:

{
  "Effect": "Allow",
  "Action": [
    "dynamodb:GetItem",
    "dynamodb:PutItem",
    "dynamodb:UpdateItem",
    "dynamodb:Query",
    "dynamodb:BatchGetItem"
  ],
  "Resource": [
    "arn:aws:dynamodb:us-east-1:*:table/fluree-nameservice",
    "arn:aws:dynamodb:us-east-1:*:table/fluree-nameservice/index/gsi1-kind"
  ]
}

Cost Considerations

S3 Costs:

  • Storage: ~$0.023/GB/month (Standard)
  • PUT requests: ~$0.005/1000 requests
  • GET requests: ~$0.0004/1000 requests

DynamoDB Costs:

  • Provisioned: ~$0.25/WCU/month + $0.05/RCU/month
  • On-Demand: ~$1.25/million writes + $0.25/million reads

Typical Monthly Costs (medium deployment):

  • S3: $50-200 (depending on data size)
  • DynamoDB: $10-50 (depending on traffic)
  • Total: $60-250/month
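As a rough back-of-envelope check using the S3 list prices above (illustrative arithmetic only; consult current AWS pricing):

```python
def s3_monthly_cost(storage_gb: float, puts: int, gets: int) -> float:
    """Rough S3 Standard estimate from the list prices quoted above."""
    return (storage_gb * 0.023
            + (puts / 1000) * 0.005
            + (gets / 1000) * 0.0004)

# e.g. 500 GB stored, 2M PUTs and 20M GETs in a month:
cost = s3_monthly_cost(500, 2_000_000, 20_000_000)
print(f"${cost:.2f}/month")  # → $29.50/month
```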

Choosing a Storage Mode

Decision Matrix

| Requirement | Memory | File | AWS | IPFS |
|---|---|---|---|---|
| Development | Best | Good | Overkill | Overkill |
| Single server | No | Best | Overkill | Good |
| Multi-server | No | No | Best | Good |
| Persistence | No | Yes | Yes | Yes |
| Cloud-native | No | No | Yes | No |
| Decentralized | No | No | No | Best |
| Content integrity | No | No | No | Best |
| Cost | Free | Free | Monthly | Free |
| Setup complexity | Trivial | Simple | Complex | Moderate |
| Performance | Fastest | Fast | Good | Good |
| Durability | None | Local | 11 9’s | Network-wide |

Recommendations

Use Memory when:

  • Developing locally
  • Running tests
  • Data is temporary
  • Maximum performance needed

Use File when:

  • Single server deployment
  • Local persistence needed
  • Simple setup preferred
  • Predictable costs important

Use AWS when:

  • Multiple servers needed
  • High availability required
  • Geographic distribution needed
  • Cloud-native architecture

Use IPFS when:

  • Decentralized storage required
  • Content integrity verification is critical
  • Cross-organization data sharing
  • Building toward IPNS/ENS-based ledger discovery
  • Censorship resistance is a requirement

Switching Storage Modes

Memory to File

Export from the running system and import into the new one:

# Export from memory
curl -X POST "http://localhost:8090/export?ledger=mydb:main" > mydb-export.jsonld

# Stop memory server, start file server
./fluree-db-server --storage file --data-dir /var/lib/fluree

# Import to file storage
curl -X POST "http://localhost:8090/v1/fluree/insert?ledger=mydb:main" \
  --data-binary @mydb-export.jsonld

File to AWS

Copy files to S3 and create the nameservice table:

# Copy data directory to S3
aws s3 sync /var/lib/fluree/ s3://fluree-prod-data/

# Create DynamoDB table (see docs/operations/dynamodb-guide.md for full schema)
aws dynamodb create-table \
  --table-name fluree-nameservice \
  --attribute-definitions \
    AttributeName=pk,AttributeType=S \
    AttributeName=sk,AttributeType=S \
    AttributeName=kind,AttributeType=S \
  --key-schema \
    AttributeName=pk,KeyType=HASH \
    AttributeName=sk,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST

# Start AWS-backed server
./fluree-db-server --storage aws --s3-bucket fluree-prod-data

AWS to File

Download from S3:

# Download data from S3
aws s3 sync s3://fluree-prod-data/ /var/lib/fluree/

# Start file-backed server
./fluree-db-server --storage file --data-dir /var/lib/fluree

Backup and Recovery

Memory Storage

No native backup (data is ephemeral):

# Export ledger
curl -X POST "http://localhost:8090/export?ledger=mydb:main" > backup.jsonld

File Storage

Backup data directory:

# Stop server (recommended)
systemctl stop fluree

# Backup
tar -czf fluree-backup-$(date +%Y%m%d).tar.gz /var/lib/fluree/

# Start server
systemctl start fluree

For online backups, prefer storage-level snapshots or object-store versioning. The standalone server does not currently expose HTTP read-only toggle endpoints.

AWS Storage

Use S3 versioning and lifecycle policies:

# Enable versioning
aws s3api put-bucket-versioning \
  --bucket fluree-prod-data \
  --versioning-configuration Status=Enabled

# Configure lifecycle
aws s3api put-bucket-lifecycle-configuration \
  --bucket fluree-prod-data \
  --lifecycle-configuration file://lifecycle.json

DynamoDB backups:

# Enable point-in-time recovery
aws dynamodb update-continuous-backups \
  --table-name fluree-nameservice \
  --point-in-time-recovery-specification PointInTimeRecoveryEnabled=true

Troubleshooting

File Storage

Permission Errors:

sudo chown -R fluree:fluree /var/lib/fluree
chmod -R 755 /var/lib/fluree

Disk Full:

# Check space
df -h /var/lib/fluree

# Force a full index refresh
curl -X POST http://localhost:8090/v1/fluree/reindex \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb:main"}'

AWS Storage

Connection Errors:

  • Verify AWS credentials
  • Check IAM permissions
  • Verify S3 bucket exists
  • Check DynamoDB table exists

Throttling:

  • Increase DynamoDB capacity
  • Use provisioned capacity mode
  • Implement retry logic

IPFS Storage

Fluree can use IPFS as a content-addressed storage backend via the Kubo HTTP RPC API. This enables decentralized, content-addressed data storage where every piece of data is identified by its cryptographic hash.

Feature flag: Requires the ipfs feature to be enabled at compile time. Build with: cargo build --features ipfs

Overview

IPFS storage maps naturally to Fluree’s content-addressed architecture. Fluree already identifies every blob (commits, transactions, index nodes) with a CIDv1 content identifier using SHA-256 hashing and Fluree-specific multicodec values. When IPFS is used as the storage backend, these CIDs are stored directly into IPFS via a local Kubo node.

Key properties:

  • Content-addressed: data is identified by its SHA-256 hash, providing built-in integrity verification
  • Immutable: once written, data cannot be modified or deleted (only unpinned for garbage collection)
  • Decentralized: data can be replicated across IPFS nodes without centralized coordination
  • Compatible: Fluree’s native CIDs work directly with IPFS (no translation layer needed)

Kubo Setup

Kubo (formerly go-ipfs) is the reference IPFS implementation. Fluree communicates with Kubo via its HTTP RPC API (default port 5001).

Install Kubo

macOS (Homebrew):

brew install ipfs

Linux (official binary):

wget https://dist.ipfs.tech/kubo/v0.32.1/kubo_v0.32.1_linux-amd64.tar.gz
tar xvfz kubo_v0.32.1_linux-amd64.tar.gz
cd kubo
sudo ./install.sh

Docker:

docker run -d \
  --name ipfs \
  -p 4001:4001 \
  -p 5001:5001 \
  -p 8080:8080 \
  -v ipfs_data:/data/ipfs \
  ipfs/kubo:latest

Initialize and Start

# Initialize IPFS (first time only)
ipfs init

# Start the daemon
ipfs daemon

Verify the node is running:

# Check node identity
curl -s -X POST http://127.0.0.1:5001/api/v0/id | jq .ID

Security Note

The Kubo HTTP RPC API (port 5001) provides full administrative access to the IPFS node. By default, it listens only on 127.0.0.1. Do not expose port 5001 to the public internet. If Fluree and Kubo run on different hosts, use SSH tunneling, a VPN, or a reverse proxy with authentication.

The IPFS gateway (port 8080) is read-only and can be exposed publicly if desired.

Configuration

JSON-LD Configuration

{
  "@context": {
    "@base": "https://ns.flur.ee/config/connection/",
    "@vocab": "https://ns.flur.ee/system#"
  },
  "@graph": [
    {
      "@id": "ipfsStorage",
      "@type": "Storage",
      "ipfsApiUrl": "http://127.0.0.1:5001",
      "ipfsPinOnPut": true
    },
    {
      "@id": "connection",
      "@type": "Connection",
      "indexStorage": { "@id": "ipfsStorage" }
    }
  ]
}

Flat JSON Configuration

{
  "indexStorage": {
    "@type": "IpfsStorage",
    "ipfsApiUrl": "http://127.0.0.1:5001",
    "ipfsPinOnPut": true
  }
}

Configuration Fields

| Field | Type | Default | Description |
|---|---|---|---|
| ipfsApiUrl | string | http://127.0.0.1:5001 | Kubo HTTP RPC API base URL |
| ipfsPinOnPut | boolean | true | Pin blocks after writing (prevents garbage collection) |

Both fields support ConfigurationValue indirection (env vars):

{
  "ipfsApiUrl": { "envVar": "FLUREE_IPFS_API_URL", "defaultVal": "http://127.0.0.1:5001" },
  "ipfsPinOnPut": true
}
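The indirection can be modeled in a few lines. This is a sketch of the envVar/defaultVal fallback described above, not Fluree's actual resolver:

```python
import os

def resolve(value, env=None):
    """Resolve a ConfigurationValue-style field: a literal passes through,
    while an {'envVar': ..., 'defaultVal': ...} map reads the environment
    and falls back to the default. (Sketch, not Fluree's resolver.)"""
    env = os.environ if env is None else env
    if isinstance(value, dict) and "envVar" in value:
        return env.get(value["envVar"], value.get("defaultVal"))
    return value

cfg = {
    "ipfsApiUrl": {"envVar": "FLUREE_IPFS_API_URL",
                   "defaultVal": "http://127.0.0.1:5001"},
    "ipfsPinOnPut": True,
}
print(resolve(cfg["ipfsApiUrl"]))   # env var if set, otherwise the default
print(resolve(cfg["ipfsPinOnPut"]))
```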

Architecture

┌──────────────────────┐
│   Fluree Process     │
│  ┌────────────────┐  │
│  │  IpfsStorage   │  │
│  │  (HTTP client) │  │
│  └────────┬───────┘  │
└───────────┼──────────┘
            │ HTTP RPC
     ┌──────▼──────┐
     │  Kubo Node  │
     │  (port 5001)│
     └──────┬──────┘
            │ libp2p
     ┌──────▼──────┐
     │  IPFS P2P   │
     │  Network    │
     └─────────────┘

Fluree communicates with a local Kubo node via the HTTP RPC API. The Kubo node handles peer-to-peer networking, block storage, and replication with the broader IPFS network.

API Endpoints Used

| Kubo Endpoint | Purpose |
|---|---|
| POST /api/v0/block/put | Store a block with optional codec and hash type |
| POST /api/v0/block/get | Retrieve a block by CID |
| POST /api/v0/block/stat | Check if a block exists (metadata only) |
| POST /api/v0/pin/add | Pin a block to prevent garbage collection |
| POST /api/v0/id | Health check (verify node is reachable) |

Content Addressing

How Fluree CIDs Map to IPFS

Fluree uses CIDv1 with SHA-256 multihash and private-use multicodec values:

| Content Kind | Multicodec | Hex | Example |
|---|---|---|---|
| Commit | fluree-commit | 0x300001 | bafybeig... |
| Transaction | fluree-txn | 0x300002 | bafybeig... |
| Index Root | fluree-index-root | 0x300003 | bafybeig... |
| Index Branch | fluree-index-branch | 0x300004 | bafybeig... |
| Index Leaf | fluree-index-leaf | 0x300005 | bafybeig... |
| Dict Blob | fluree-dict-blob | 0x300006 | bafybeig... |
| Garbage Record | fluree-garbage | 0x300007 | bafybeig... |
| Ledger Config | fluree-ledger-config | 0x300008 | bafybeig... |
| Stats Sketch | fluree-stats-sketch | 0x300009 | bafybeig... |
| Graph Source Snapshot | fluree-graph-source-snapshot | 0x30000A | bafybeig... |
| Spatial Index | fluree-spatial-index | 0x30000B | bafybeig... |

These are in the multicodec private-use range (0x300000+). Kubo accepts them via the cid-codec parameter and resolves blocks by multihash regardless of codec. This means Fluree’s native CIDs work directly with IPFS without any translation layer.

Cross-Codec Retrieval

IPFS block storage is keyed by multihash internally. A block stored with codec 0x300001 (Fluree commit) can be retrieved using a CID with codec 0x55 (raw) as long as the SHA-256 digest is the same. This simplifies the address-based StorageRead implementation: given a Fluree address containing a hash, we can construct any CID with that hash to fetch the block.
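The CID layout described here (CIDv1, sha2-256 multihash, codec varint) can be assembled with nothing but the standard library. This is an illustrative sketch following the multiformats specs, not Fluree's own encoder, and its byte-for-byte output is not guaranteed to match Fluree's:

```python
import base64
import hashlib

def varint(n: int) -> bytes:
    """Unsigned LEB128 varint, as used throughout multiformats."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | (0x80 if n else 0))
        if n == 0:
            return bytes(out)

def cid_v1(codec: int, payload: bytes) -> str:
    """CIDv1 = varint(version=1) + varint(codec) + multihash(sha2-256),
    rendered in base32-lower with the 'b' multibase prefix."""
    digest = hashlib.sha256(payload).digest()
    multihash = bytes([0x12, 0x20]) + digest  # sha2-256 code, 32-byte length
    raw = varint(1) + varint(codec) + multihash
    return "b" + base64.b32encode(raw).decode().lower().rstrip("=")

FLUREE_COMMIT = 0x300001  # private-use codec from the table above
RAW = 0x55                # standard 'raw' codec

blob = b"example commit bytes"
print(cid_v1(FLUREE_COMMIT, blob))
print(cid_v1(RAW, blob))
# Both CIDs carry the identical sha2-256 multihash, which is why a store
# keyed by multihash can serve the block under either codec.
```

Cross-codec retrieval relies on exactly this property: between the two printed CIDs, only the codec varint differs, while the trailing multihash bytes are the same.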

Pinning

What is Pinning?

IPFS nodes periodically garbage-collect unpinned blocks to free disk space. Pinning tells the node to keep specific blocks permanently. Without pinning, blocks may be removed from the local node (though they remain available on other nodes that have them).

Default Behavior

Fluree pins every block on write when ipfsPinOnPut is true (the default). This ensures that:

  • All committed data survives Kubo garbage collection
  • The local node serves as a reliable storage backend
  • Blocks remain available even if no other node has them

When to Disable Pinning

Set ipfsPinOnPut: false when:

  • Running integration tests (faster, less disk usage)
  • Using a separate pinning service (Pinata, web3.storage, etc.)
  • The Kubo node is configured with --enable-gc=false

Pinning Services

For production deployments, consider using a remote pinning service for redundancy:

# Add a remote pinning service
ipfs pin remote service add pinata https://api.pinata.cloud/psa YOUR_JWT

# Pin a CID to the remote service
ipfs pin remote add --service=pinata bafybeig...

Limitations

No Prefix Listing

IPFS is a content-addressed store with no concept of directory listing or prefix enumeration. The list_prefix() operation returns an error. Operations that require listing (e.g., ledger discovery, GC scans) must use an alternative strategy such as manifest-based tracking.
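One such strategy, sketched below in plain Python (a toy model, not Fluree's implementation), is to record every written address in a manifest and answer listing queries from the manifest rather than the store:

```python
import json

class ManifestTracker:
    """Toy model: a content-addressed store cannot list keys by prefix,
    so every written address is also recorded in a manifest, and listing
    is answered from the manifest instead of the store."""

    def __init__(self) -> None:
        self.blocks = {}    # stand-in for the IPFS node: address -> bytes
        self.manifest = []  # ordered record of every address written

    def put(self, address: str, data: bytes) -> None:
        self.blocks[address] = data
        self.manifest.append(address)

    def list_prefix(self, prefix: str) -> list:
        return [a for a in self.manifest if a.startswith(prefix)]

    def manifest_blob(self) -> bytes:
        # The manifest itself can be persisted as one more block.
        return json.dumps(self.manifest).encode()

store = ManifestTracker()
store.put("mydb/main/commit/aaa.fcv2", b"commit bytes")
store.put("mydb/main/index/spot/bbb.fli", b"index bytes")
print(store.list_prefix("mydb/main/commit/"))  # → ['mydb/main/commit/aaa.fcv2']
```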

No Deletion

IPFS content is immutable. The delete() operation is a no-op. Data removal is handled through:

  1. Unpinning the block on the local node
  2. Waiting for Kubo’s garbage collector to reclaim space

Note that even after local garbage collection, the block may still exist on other IPFS nodes that have fetched it.

Nameservice

IPFS storage currently requires a separate nameservice (file-based or DynamoDB) for ledger metadata. A future phase will add IPNS and/or ENS-based decentralized nameservices.

Latency

Writes go through the Kubo HTTP RPC API, adding HTTP overhead compared to direct file I/O. For latency-sensitive workloads, ensure Kubo runs on the same host as Fluree (localhost communication).

No Encryption

The IPFS storage backend does not currently support Fluree’s AES256Key encryption. Blocks are stored unencrypted in IPFS. If encryption is needed, use a separate encryption layer or a private IPFS network.

Storage Addresses

Fluree addresses for IPFS storage follow the standard format:

fluree:ipfs://{ledger_id}/{kind_dir}/{hash_hex}.{ext}

Examples:

fluree:ipfs://mydb/main/commit/a1b2c3...f6a1b2.fcv2
fluree:ipfs://mydb/main/index/roots/d4e5f6...c3d4e5.json
fluree:ipfs://mydb/main/index/spot/abc123...def456.fli

The hash hex in the filename is extracted and used to construct a CID for retrieval from IPFS.
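Extracting that hash is a matter of splitting off the last path segment of the address. A hypothetical parser (illustrative, with a shortened digest; not Fluree's own code):

```python
def parse_ipfs_address(address: str):
    """Split a fluree:ipfs:// address into (directory path, hash hex,
    extension). Illustrative parser, not Fluree's own code."""
    prefix = "fluree:ipfs://"
    if not address.startswith(prefix):
        raise ValueError(f"not an IPFS address: {address}")
    path = address[len(prefix):]
    dirs, _, filename = path.rpartition("/")
    hash_hex, _, ext = filename.rpartition(".")
    return dirs, hash_hex, ext

# Hypothetical address with a shortened digest:
print(parse_ipfs_address("fluree:ipfs://mydb/main/commit/a1b2c3.fcv2"))
# → ('mydb/main/commit', 'a1b2c3', 'fcv2')
```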

Operational Considerations

Disk Usage

Kubo stores blocks in a local datastore (by default, a flatfs block store under ~/.ipfs/blocks/). Monitor disk usage:

# Check IPFS repo size
ipfs repo stat

# Run garbage collection (removes unpinned blocks)
ipfs repo gc

Network Bandwidth

By default, Kubo participates in the IPFS DHT and may serve blocks to other nodes. For a private deployment:

# Disable DHT (private node)
ipfs config Routing.Type none

# Or use a private IPFS network with a swarm key
# See: https://github.com/ipfs/kubo/blob/master/docs/experimental-features.md#private-networks

Performance Tuning

# Increase concurrent connections
ipfs config Swarm.ConnMgr.HighWater 300

# Adjust datastore cache
ipfs config Datastore.BloomFilterSize 1048576

# Disable automatic GC (if using external pinning)
ipfs config --json Datastore.GCPeriod '"0"'

Monitoring

Check Kubo node health:

# Node identity and version
ipfs id

# Connected peers
ipfs swarm peers | wc -l

# Repo statistics
ipfs repo stat

# Bandwidth usage
ipfs stats bw

Troubleshooting

Connection Refused

IPFS node connection failed: http://127.0.0.1:5001

Causes:

  • Kubo daemon is not running
  • Kubo is listening on a different address/port
  • Firewall blocking the connection

Fix:

# Start the daemon
ipfs daemon

# Or check what address it's listening on
ipfs config Addresses.API

Block Not Found

IPFS block not found: bafybeig...

Causes:

  • Block was never stored on this node
  • Block was unpinned and garbage collected
  • CID format mismatch

Fix:

# Check if block exists locally
ipfs block stat bafybeig...

# Try fetching from the network
ipfs block get bafybeig... > /dev/null

Slow Writes

Causes:

  • Kubo node under heavy load
  • Network latency (if Kubo is remote)
  • Disk I/O bottleneck

Fix:

  • Run Kubo on the same host as Fluree
  • Use SSD storage for the IPFS datastore
  • Consider disabling DHT for private deployments

Future Roadmap

Phase 2: Decentralized Nameservice

The IPFS storage backend is designed as the foundation for decentralized Fluree deployments. Planned additions:

  • IPNS: Publish mutable pointers to ledger state (commit head, index root)
  • ENS / L2 chain: On-chain CID pointers for trustless ledger discovery
  • Two-tier nameservice: Local nameservice for fast reads with async push to decentralized upstream (similar to git push)

Content Pinning Strategy

Future versions may support:

  • Automatic pinning profiles (pin commits only, pin everything, pin nothing)
  • Integration with remote pinning services (Pinata, web3.storage)
  • Manifest-based tracking for GC and prefix listing

DynamoDB Nameservice Guide

Overview

Fluree supports Amazon DynamoDB as a nameservice backend for storing ledger and graph source metadata. The DynamoDB nameservice provides:

  • Item-per-concern independence: Each concern (commit head, index, status, config) is a separate DynamoDB item, eliminating physical write contention between transactors and indexers
  • Atomic conditional updates: Reduced logical contention via conditional expressions
  • Strong consistency reads: Always see the latest data
  • High availability: DynamoDB’s built-in redundancy and durability
  • Unified ledger + graph source support: Both ledgers and graph sources (BM25, Vector, Iceberg, etc.) share the same table with a composite key

Why DynamoDB for Nameservice?

The nameservice stores metadata about ledgers and graph sources: commit IDs, index state, status, and configuration. In high-throughput scenarios, transactors and indexers may update this metadata concurrently.

DynamoDB solves this because:

  1. Item-per-concern layout: Each concern (head, index, status, config) is a separate DynamoDB item under the same partition key, so writes to different concerns never contend at the physical level
  2. Conditional updates: Each update only proceeds if the new watermark advances monotonically
  3. No read-modify-write cycles (for the write itself): Updates are atomic; callers should still expect occasional conditional-update conflicts under contention and retry where appropriate

Graph Sources (non-ledger)

Graph sources (BM25, Vector, Iceberg, etc.) are stored in the same nameservice table as ledgers. Under the graph-source-owned manifest design, the nameservice does not store snapshot history for graph sources.

  • For ledgers, index_id points to a ledger index root.
  • For graph sources, index_id points to a graph-source-owned root/manifest in storage (opaque to nameservice).
  • Snapshot history (if any) is stored in storage and managed by the graph source implementation.

This keeps DynamoDB schema stable: no unbounded “snapshot history” list is stored in the DynamoDB item.

Table Setup

Schema Overview

The table uses a composite primary key (pk + sk) with a Global Secondary Index (GSI) for listing by kind.

  • pk (Partition Key, String): Alias in name:branch form (e.g., mydb:main)
  • sk (Sort Key, String): Concern discriminator (meta, head, index, config, status)
  • GSI1 (gsi1-kind): Enables efficient listing of all ledgers or all graph sources

AWS CLI

aws dynamodb create-table \
  --table-name fluree-nameservice \
  --attribute-definitions \
    AttributeName=pk,AttributeType=S \
    AttributeName=sk,AttributeType=S \
    AttributeName=kind,AttributeType=S \
  --key-schema \
    AttributeName=pk,KeyType=HASH \
    AttributeName=sk,KeyType=RANGE \
  --global-secondary-indexes '[
    {
      "IndexName": "gsi1-kind",
      "KeySchema": [
        {"AttributeName": "kind", "KeyType": "HASH"},
        {"AttributeName": "pk", "KeyType": "RANGE"}
      ],
      "Projection": {
        "ProjectionType": "INCLUDE",
        "NonKeyAttributes": ["name", "branch", "source_type", "dependencies", "retracted"]
      }
    }
  ]' \
  --billing-mode PAY_PER_REQUEST

CloudFormation

Resources:
  FlureeNameserviceTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: fluree-nameservice
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: pk
          AttributeType: S
        - AttributeName: sk
          AttributeType: S
        - AttributeName: kind
          AttributeType: S
      KeySchema:
        - AttributeName: pk
          KeyType: HASH
        - AttributeName: sk
          KeyType: RANGE
      GlobalSecondaryIndexes:
        - IndexName: gsi1-kind
          KeySchema:
            - AttributeName: kind
              KeyType: HASH
            - AttributeName: pk
              KeyType: RANGE
          Projection:
            ProjectionType: INCLUDE
            NonKeyAttributes:
              - name
              - branch
              - source_type
              - dependencies
              - retracted
      PointInTimeRecoverySpecification:
        PointInTimeRecoveryEnabled: true
      Tags:
        - Key: Application
          Value: Fluree

Terraform

resource "aws_dynamodb_table" "fluree_nameservice" {
  name         = "fluree-nameservice"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "pk"
  range_key    = "sk"

  attribute {
    name = "pk"
    type = "S"
  }

  attribute {
    name = "sk"
    type = "S"
  }

  attribute {
    name = "kind"
    type = "S"
  }

  global_secondary_index {
    name            = "gsi1-kind"
    hash_key        = "kind"
    range_key       = "pk"
    projection_type = "INCLUDE"
    non_key_attributes = [
      "name",
      "branch",
      "source_type",
      "dependencies",
      "retracted",
    ]
  }

  point_in_time_recovery {
    enabled = true
  }

  tags = {
    Application = "Fluree"
  }
}

Programmatic Table Creation

Fluree’s DynamoDbNameService also provides an ensure_table() method that creates the table with the correct schema if it doesn’t already exist:

use fluree_db_storage_aws::dynamodb::DynamoDbNameService;

let ns = DynamoDbNameService::from_client(dynamodb_client, "fluree-nameservice".to_string());
ns.ensure_table().await?;

This is used by integration tests and can be used for bootstrapping development environments.

Table Schema

Primary Key

| Attribute | Type | Description |
|---|---|---|
| pk | String (Partition Key) | Alias in name:branch form (e.g., mydb:main) |
| sk | String (Sort Key) | Concern discriminator: meta, head, index, config, status |

Items per Alias

Each ledger or graph source is represented as multiple items under the same pk:

Ledger (5 items):

| Sort Key (sk) | Description | Key Attributes |
|---|---|---|
| meta | Identity and metadata | kind, name, branch, retracted, schema |
| head | Commit head pointer | commit_id, commit_t |
| index | Index head pointer | index_id, index_t |
| config | Ledger configuration | default_context_id, config_v, config_meta |
| status | Operational status | status, status_v, status_meta |

Graph Source (4 items):

| Sort Key (sk) | Description | Key Attributes |
|---|---|---|
| meta | Identity and metadata | kind, source_type, name, branch, dependencies, retracted, schema |
| config | Source configuration | config_json, config_v |
| index | Index head pointer | index_id, index_t |
| status | Operational status | status, status_v, status_meta |

Attribute Reference

All items share these common attributes:

| Attribute | Type | Description |
|---|---|---|
| pk | String | Record address (name:branch) |
| sk | String | Concern discriminator |
| schema | Number | Schema version (always 2) |
| updated_at_ms | Number | Last update timestamp (epoch milliseconds) |

meta item:

| Attribute | Type | Description |
|---|---|---|
| kind | String | ledger or graph_source |
| name | String | Base name (reserved word — use #name in expressions) |
| branch | String | Branch name |
| retracted | Boolean | Soft-delete flag |
| source_type | String (graph source only) | Graph-source type (e.g., f:Bm25Index) |
| dependencies | List&lt;String&gt; (graph source only) | Dependent ledger IDs |

head item (ledgers only):

| Attribute | Type | Description |
|---|---|---|
| commit_id | String or null | Latest commit ContentId (CIDv1) |
| commit_t | Number | Commit watermark (t); 0 = unborn |

index item (ledgers + graph sources):

| Attribute | Type | Description |
|---|---|---|
| index_id | String or null | Latest index ContentId (CIDv1) |
| index_t | Number | Index watermark (t); 0 = unborn |

config item:

| Attribute | Type | Description |
|---|---|---|
| default_context_id | String or null | Default JSON-LD context ContentId (ledger) |
| config_json | String or null | Opaque JSON config string (graph source) |
| config_v | Number | Config version watermark |
| config_meta | Map or null | Extensible config metadata (ledger) |

status item:

| Attribute | Type | Description |
|---|---|---|
| status | String | Current state (reserved word — use #st in expressions) |
| status_v | Number | Status version watermark |
| status_meta | Map or null | Extensible status metadata |

GSI1: gsi1-kind

Enables listing all entities of a given kind (ledger or graph source).

| GSI Attribute | Source Attribute | Description |
|---|---|---|
| Partition Key | kind | ledger or graph_source |
| Sort Key | pk | Record address |
| Projected | name, branch, source_type, dependencies, retracted | Meta fields for listing without additional reads |

Only meta items carry the kind attribute and project into the GSI.

Initialization Semantics

All concern items are created atomically at initialization time. This is a key structural decision:

  • publish_ledger_init creates all 5 items (meta, head, index, config, status) via TransactWriteItems
  • publish_graph_source creates all 4 items (meta, config, index, status) via TransactWriteItems

Subsequent writes usually use UpdateItem operations (compare_and_set_ref, publish_index, push_status, push_config). The one exception is commit-head CAS on an unknown ledger ID with expected=None, where the backend bootstraps the ledger atomically via TransactWriteItems.

How Updates Work

Commit updates (transactor):

UpdateItem Key: { pk: "mydb:main", sk: "head" }
UpdateExpression: SET commit_id = :cid, commit_t = :t, updated_at_ms = :now
ConditionExpression: attribute_exists(pk) AND commit_t < :t

Index updates (indexer):

UpdateItem Key: { pk: "mydb:main", sk: "index" }
UpdateExpression: SET index_id = :cid, index_t = :t, updated_at_ms = :now
ConditionExpression: attribute_exists(pk) AND index_t < :t

Since commit and index updates target different items (different sk), they never contend at the DynamoDB physical level.

Status updates (CAS):

UpdateItem Key: { pk: "mydb:main", sk: "status" }
UpdateExpression: SET #st = :new_state, status_v = :new_v, updated_at_ms = :now
ConditionExpression: status_v = :expected_v AND #st = :expected_state

Config updates (CAS):

UpdateItem Key: { pk: "mydb:main", sk: "config" }
UpdateExpression: SET default_context_id = :ctx, config_v = :new_v, updated_at_ms = :now
ConditionExpression: config_v = :expected_v

RefPublisher updates (compare-and-set refs):

  • CommitHead uses strict monotonic guard: new.t > current.t
  • IndexHead allows same-watermark overwrite: new.t >= current.t (reindex at same t)

When a caller attempts compare_and_set_ref(expected=None) on an unknown ledger ID, the DynamoDB backend bootstraps the ledger by creating all 5 ledger concern items via TransactWriteItems and pre-setting the target ref to the requested value.
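The two guards can be modeled in memory to see the difference. This is a sketch of the semantics described above, not the DynamoDB code itself:

```python
class RefGuards:
    """In-memory model of the guards above: the commit head advances only
    strictly (new t > current t), while the index head also accepts a
    rewrite at the same watermark (new t >= current t)."""

    def __init__(self) -> None:
        self.commit_t = 0
        self.index_t = 0

    def publish_commit(self, t: int) -> bool:
        if t > self.commit_t:      # strict monotonic guard
            self.commit_t = t
            return True
        return False               # condition failed; caller may retry

    def publish_index(self, t: int) -> bool:
        if t >= self.index_t:      # reindex at the same t is allowed
            self.index_t = t
            return True
        return False

ns = RefGuards()
print(ns.publish_commit(151))  # True: head advances
print(ns.publish_commit(151))  # False: duplicate watermark rejected
print(ns.publish_index(151))   # True
print(ns.publish_index(151))   # True: same-t overwrite permitted
```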

Retract:

UpdateItem Key: { pk: "mydb:main", sk: "meta" }
UpdateExpression: SET retracted = :true, updated_at_ms = :now

DynamoDB Reserved Words

The attributes name and status are DynamoDB reserved words. All expressions (reads, updates, projections) must use ExpressionAttributeNames:

ExpressionAttributeNames: { "#name": "name", "#st": "status" }

Trait Implementations

The DynamoDB nameservice implements all seven nameservice traits:

| Trait | Description |
|---|---|
| NameService | Lookup, ledger ID resolution, list all records |
| Publisher | Initialize ledgers, publish indexes, retract |
| AdminPublisher | Admin index publishing (allows equal-t overwrites) |
| RefPublisher | Compare-and-set on commit/index refs |
| StatusPublisher | CAS-based status updates |
| ConfigPublisher | CAS-based config updates (ledgers only) |
| GraphSourceLookup | Read-only graph source discovery: lookup, list all records |
| GraphSourcePublisher | Graph source lifecycle (extends GraphSourceLookup): create, index, retract |

Note: ConfigPublisher is scoped to ledgers only. Graph source configuration is managed through GraphSourcePublisher, which stores config as an opaque JSON string (config_json). GraphSourceLookup is a supertrait of NameService, so all nameservice implementations automatically support graph source discovery. GraphSourcePublisher adds write operations and is required only by APIs that create or drop graph sources.

Configuration

JSON-LD Connection Configuration

{
  "@context": {
    "@vocab": "https://ns.flur.ee/system#"
  },
  "@graph": [
    {
      "@id": "s3Storage",
      "@type": "Storage",
      "s3Bucket": "fluree-production-data",
      "s3Endpoint": "https://s3.us-east-1.amazonaws.com",
      "s3Prefix": "ledgers",
      "addressIdentifier": "prod-s3"
    },
    {
      "@id": "dynamodbNs",
      "@type": "Publisher",
      "dynamodbTable": "fluree-nameservice",
      "dynamodbRegion": "us-east-1"
    },
    {
      "@id": "connection",
      "@type": "Connection",
      "parallelism": 4,
      "cacheMaxMb": 1000,
      "commitStorage": {"@id": "s3Storage"},
      "indexStorage": {"@id": "s3Storage"},
      "primaryPublisher": {"@id": "dynamodbNs"}
    }
  ]
}

Configuration Options

| Field | Required | Description | Default |
|---|---|---|---|
| dynamodbTable | Yes | DynamoDB table name | - |
| dynamodbRegion | No | AWS region | us-east-1 |
| dynamodbEndpoint | No | Custom endpoint URL (for LocalStack) | AWS default |
| dynamodbTimeoutMs | No | Request timeout in milliseconds | 5000 |

AWS Credentials

Authentication Methods

The DynamoDB nameservice uses the standard AWS SDK credential chain:

  1. Environment Variables

    export AWS_ACCESS_KEY_ID=your_access_key
    export AWS_SECRET_ACCESS_KEY=your_secret_key
    export AWS_REGION=us-east-1
    
  2. AWS Credentials File (~/.aws/credentials)

    [default]
    aws_access_key_id = your_access_key
    aws_secret_access_key = your_secret_key
    region = us-east-1
    
  3. IAM Roles (when running on EC2/ECS/Lambda)

    • Automatically uses instance/task role credentials
  4. Session Tokens (for temporary credentials)

    export AWS_SESSION_TOKEN=your_session_token
    

Required IAM Permissions

Full permissions (recommended):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:UpdateItem",
        "dynamodb:Query",
        "dynamodb:BatchGetItem"
      ],
      "Resource": [
        "arn:aws:dynamodb:*:*:table/fluree-nameservice",
        "arn:aws:dynamodb:*:*:table/fluree-nameservice/index/gsi1-kind"
      ]
    }
  ]
}

If you also use ensure_table() for automated table creation (development/testing):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:UpdateItem",
        "dynamodb:Query",
        "dynamodb:BatchGetItem",
        "dynamodb:CreateTable",
        "dynamodb:DescribeTable"
      ],
      "Resource": [
        "arn:aws:dynamodb:*:*:table/fluree-nameservice",
        "arn:aws:dynamodb:*:*:table/fluree-nameservice/index/gsi1-kind"
      ]
    }
  ]
}

Minimal permissions (if not using all_records, all_graph_source_records, or graph sources):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:UpdateItem",
        "dynamodb:Query"
      ],
      "Resource": "arn:aws:dynamodb:*:*:table/fluree-nameservice"
    }
  ]
}
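
To sanity-check a policy document against the actions the nameservice needs, a quick script can diff the granted actions. A sketch (the `REQUIRED_ACTIONS` set reflects the core operations listed above; the helper itself is illustrative):

```python
import json

REQUIRED_ACTIONS = {  # core operations per the policies above
    "dynamodb:GetItem", "dynamodb:PutItem",
    "dynamodb:UpdateItem", "dynamodb:Query",
}

def missing_actions(policy_json: str) -> set:
    """Return required actions not granted by any Allow statement."""
    policy = json.loads(policy_json)
    granted = set()
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") == "Allow":
            actions = stmt.get("Action", [])
            granted.update([actions] if isinstance(actions, str) else actions)
    return REQUIRED_ACTIONS - granted

minimal = """{
  "Version": "2012-10-17",
  "Statement": [{"Effect": "Allow",
    "Action": ["dynamodb:GetItem", "dynamodb:PutItem",
               "dynamodb:UpdateItem", "dynamodb:Query"],
    "Resource": "arn:aws:dynamodb:*:*:table/fluree-nameservice"}]
}"""
print(missing_actions(minimal))  # set()
```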

Local Development

Using LocalStack

  1. Start LocalStack

    docker run -d --name localstack \
      -p 4566:4566 \
      -e SERVICES=dynamodb \
      localstack/localstack
    
  2. Create Test Table

    AWS_ACCESS_KEY_ID=test AWS_SECRET_ACCESS_KEY=test \
    aws --endpoint-url=http://localhost:4566 dynamodb create-table \
      --table-name fluree-nameservice \
      --attribute-definitions \
        AttributeName=pk,AttributeType=S \
        AttributeName=sk,AttributeType=S \
        AttributeName=kind,AttributeType=S \
      --key-schema \
        AttributeName=pk,KeyType=HASH \
        AttributeName=sk,KeyType=RANGE \
      --global-secondary-indexes '[
        {
          "IndexName": "gsi1-kind",
          "KeySchema": [
            {"AttributeName": "kind", "KeyType": "HASH"},
            {"AttributeName": "pk", "KeyType": "RANGE"}
          ],
          "Projection": {
            "ProjectionType": "INCLUDE",
            "NonKeyAttributes": ["name", "branch", "source_type", "dependencies", "retracted"]
          }
        }
      ]' \
      --billing-mode PAY_PER_REQUEST
    
  3. Configure Fluree

    {
      "@id": "dynamodbNs",
      "@type": "Publisher",
      "dynamodbTable": "fluree-nameservice",
      "dynamodbEndpoint": "http://localhost:4566",
      "dynamodbRegion": "us-east-1"
    }
    
  4. Set Environment Variables

    export AWS_ACCESS_KEY_ID=test
    export AWS_SECRET_ACCESS_KEY=test
    

Using DynamoDB Local

  1. Start DynamoDB Local

    docker run -d --name dynamodb-local \
      -p 8000:8000 \
      amazon/dynamodb-local
    
  2. Create Test Table (same command as LocalStack, change --endpoint-url to http://localhost:8000)

Production Considerations

Performance

  • DynamoDB provides single-digit millisecond latency
  • The item-per-concern layout eliminates physical contention between transactors and indexers
  • Use on-demand (PAY_PER_REQUEST) billing for variable workloads
  • Consider provisioned capacity for predictable high-throughput scenarios
  • Enable DynamoDB Accelerator (DAX) if sub-millisecond reads are needed

Security

  • Use IAM roles instead of access keys when possible
  • Enable encryption at rest (default for new tables)
  • Use VPC endpoints for private DynamoDB access
  • Enable CloudTrail for audit logging

Monitoring

Set up CloudWatch alarms for:

  • ConditionalCheckFailedRequests - indicates contention (usually normal)
  • ThrottledRequests - capacity issues
  • SystemErrors - service issues
  • SuccessfulRequestLatency - track latency

Backup and Recovery

# Enable Point-in-Time Recovery
aws dynamodb update-continuous-backups \
  --table-name fluree-nameservice \
  --point-in-time-recovery-specification PointInTimeRecoveryEnabled=true

# Create on-demand backup
aws dynamodb create-backup \
  --table-name fluree-nameservice \
  --backup-name fluree-ns-backup-$(date +%Y%m%d)

Cost Optimization

  • On-demand pricing is cost-effective for variable workloads
  • Table data is small (5 items per ledger, 4 per graph source), so costs are minimal
  • Typical costs: $1-10/month for small deployments
  • GSI storage adds minimal cost (only meta items project into it)

Troubleshooting

Authentication Failures

Symptoms: Access denied, credential errors

Solutions:

  • Verify AWS credentials are configured
  • Check IAM permissions for the table and GSI
  • Test with AWS CLI:
    aws dynamodb describe-table --table-name fluree-nameservice
    

Table Not Found

Symptoms: ResourceNotFoundException

Solutions:

  • Verify table name is correct
  • Check table is in the correct region
  • Ensure table has finished creating (including GSI)

Timeout Errors

Symptoms: Request timeout

Solutions:

  • Increase dynamodbTimeoutMs configuration
  • Check network connectivity to DynamoDB
  • Verify endpoint URL is correct (especially for LocalStack)

Conditional Check Failures

Symptoms: High rate of ConditionalCheckFailedException in logs

Note: This is usually normal and indicates the system is working correctly. The conditional check prevents overwriting newer data with older data. publish_index stale writes are silently ignored (the newer value is preserved). CAS operations (compare_and_set_ref, push_status, push_config) return the current value so the caller can retry or report a conflict.
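
The CAS retry pattern can be sketched with an in-memory stand-in (the class and function names here are illustrative, not Fluree's API; the real register is a DynamoDB item guarded by a ConditionExpression):

```python
import threading

class CasRegister:
    """In-memory stand-in for a CAS-guarded nameservice ref (sketch only)."""
    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def compare_and_set(self, expected, new):
        """Return (True, new) on success, (False, current_value) on conflict."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True, new
            return False, self._value

def publish_head(reg, new_head, retries=3):
    """Publish a commit head: bootstrap with expected=None, then retry
    against whatever current value each conflict reports."""
    ok, current = reg.compare_and_set(None, new_head)
    for _ in range(retries):
        if ok:
            return current
        # Conflict: the register returned the current head; retry against it.
        ok, current = reg.compare_and_set(current, new_head)
    if ok:
        return current
    raise RuntimeError(f"CAS conflict persists; head is {current}")

reg = CasRegister()
print(publish_head(reg, "commit-1"))   # commit-1 (bootstrap, expected=None)
print(publish_head(reg, "commit-2"))   # commit-2 (retried after one conflict)
```

Because the failed CAS returns the current value, the caller never needs a separate read to discover what it lost to.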

Unprocessed Keys (BatchGetItem)

Symptoms: Listing graph sources intermittently returns fewer results under load, or logs show throttling.

Cause: DynamoDB may return UnprocessedKeys in BatchGetItem responses under throttling.

Behavior: Fluree retries UnprocessedKeys with exponential backoff (bounded retries). If retries are exhausted, it returns an error rather than silently dropping items.
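
The retry shape looks roughly like the following (a sketch with an injected client, not Fluree's implementation; `FlakyClient` simulates throttling by leaving half of each batch unprocessed):

```python
import time

def batch_get_all(client, keys, max_retries=5, base_delay=0.05):
    """Fetch all keys, retrying UnprocessedKeys with bounded exponential backoff."""
    items, pending = [], list(keys)
    for attempt in range(max_retries + 1):
        resp = client.batch_get(pending)
        items.extend(resp["Items"])
        pending = resp.get("UnprocessedKeys", [])
        if not pending:
            return items
        time.sleep(base_delay * (2 ** attempt))  # back off before retrying
    raise RuntimeError(f"{len(pending)} keys still unprocessed after retries")

class FlakyClient:
    """Simulates throttling: serves half of each batch, rest unprocessed."""
    def batch_get(self, keys):
        n = max(1, len(keys) // 2)
        return {"Items": [{"pk": k} for k in keys[:n]],
                "UnprocessedKeys": keys[n:]}

got = batch_get_all(FlakyClient(), ["a", "b", "c", "d"])
print(sorted(i["pk"] for i in got))  # ['a', 'b', 'c', 'd']
```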

Uninitialized Alias Errors

Symptoms: Publish operations fail with “not found” or storage errors

Cause: Attempting to publish_index or other non-bootstrap writes on a ledger ID that was never initialized with publish_ledger_init.

Solution: Ensure ledger initialization happens before index/status/config writes. Normal Fluree transaction commit-head publication uses RefPublisher CAS and can bootstrap an unknown ledger ID when expected=None.

Query peers and replication

This document describes how to run fluree-server in transaction mode (event source + transactions) and peer mode (read replica). It also documents the events stream (/v1/fluree/events) and storage proxy endpoints (/v1/fluree/storage/*) used to keep peers up to date and/or to proxy storage reads.

This guide is written from an operator / end-user standpoint: what to deploy, how to configure it, and what to expect from each mode.

Server roles

fluree-server supports two roles:

  • Transaction server (--server-role transaction)
    • Write-enabled.
    • Produces the nameservice events stream at GET /v1/fluree/events.
    • Optionally exposes storage proxy endpoints at /v1/fluree/storage/*.
  • Query peer (--server-role peer)
    • Read-only API surface for clients (queries, history, etc.).
    • Subscribes to GET /v1/fluree/events from a transaction server to learn about nameservice updates.
    • Reads ledger data from storage (shared-storage deployments), and refreshes on staleness based on the events stream.
    • Forwards write/admin operations to the configured transaction server.

Events stream (SSE): GET /v1/fluree/events

The transaction server exposes a Server-Sent Events (SSE) stream that emits nameservice changes for ledgers and graph sources. Query peers use this stream to stay up to date.

Query parameters

  • all=true: subscribe to all ledgers and graph sources
  • ledger=<ledger_id>: subscribe to a ledger ID (name:branch, repeatable)
  • graph-source=<graph_source_id>: subscribe to a graph source ID (name:branch, repeatable)
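
SSE events arrive as `event:`/`data:` field lines terminated by a blank line. A minimal parser sketch (the `ns-update` event name and payload shape are illustrative assumptions, not the documented event schema):

```python
def parse_sse(lines):
    """Yield (event, data) tuples from Server-Sent Events field lines."""
    event, data = "message", []
    for line in lines:
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "":                     # blank line ends one event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []

stream = [
    "event: ns-update",                      # illustrative event name
    'data: {"ledger": "books:main", "t": 42}',
    "",
]
print(list(parse_sse(stream)))
```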

Authentication and authorization

The /v1/fluree/events endpoint can be configured to require Bearer tokens:

  • --events-auth-mode none|optional|required
  • --events-auth-audience <aud> (optional)
  • --events-auth-trusted-issuer <did:key:...> (repeatable)

When authentication is enabled, the token can restrict what the client may subscribe to. Requests that ask for resources not covered by the token are silently filtered to the allowed scope.

The repo includes a token generator binary for operator workflows:

  • fluree-events-token: generates Bearer tokens suitable for GET /v1/fluree/events

Peer mode behavior

In peer mode:

  • Write forwarding: write and admin endpoints are forwarded to the transaction server configured by --tx-server-url.
  • Read serving: query endpoints are served locally, using ledger/index data obtained either from shared storage or via storage proxy reads (see below). History queries are executed via the standard /query endpoint with time range specifiers.

Peer configuration (SSE subscription)

  • --server-role peer
  • --tx-server-url <base-url> (required)
  • --peer-events-url <url> (optional; default is {tx_server_url}/v1/fluree/events)
  • --peer-events-token <token-or-@file> (optional; Bearer token for /v1/fluree/events)
  • Subscribe scope:
    • --peer-subscribe-all or
    • --peer-ledger <ledger_id> (repeatable) and/or --peer-graph-source <graph_source_id> (repeatable)

Peer storage access modes

Peer servers support two storage access modes:

  • Shared storage (--storage-access-mode shared, default)
    • The peer reads the same storage backend as the transaction server (shared filesystem, shared bucket credentials, etc.).
    • Requires --storage-path.
  • Proxy storage (--storage-access-mode proxy)
    • The peer does not need direct storage credentials.
    • The peer proxies all storage reads through the transaction server’s /v1/fluree/storage/* endpoints.
    • Requires --tx-server-url and a storage proxy token via --storage-proxy-token or --storage-proxy-token-file.
    • --storage-path is ignored in this mode.

Storage proxy endpoints (transaction server): /v1/fluree/storage/*

Storage proxy endpoints allow a peer to read storage through the transaction server, rather than holding storage credentials directly. This is intended for environments where storage is private and peers cannot access it.

Storage proxy supports two kinds of reads:

  • Raw bytes reads (Accept: application/octet-stream) for any block type (commit blobs, branch nodes, leaf nodes).
  • Policy-filtered leaf flakes reads (Accept: application/x-fluree-flakes) for ledger leaf nodes only.

Enablement

Storage proxy endpoints are disabled by default. Enable them on the transaction server:

  • --storage-proxy-enabled
  • --storage-proxy-trusted-issuer <did:key:...> (repeatable; optional if you reuse --events-auth-trusted-issuer)
  • --storage-proxy-default-identity <iri> (optional; used when token has no fluree.identity)
  • --storage-proxy-default-policy-class <class-iri> (optional; applies policy in addition to identity-based policy)
  • --storage-proxy-debug-headers (optional; debug only—can leak information)

AuthZ claims (Bearer token)

Storage proxy endpoints require a Bearer token that grants storage proxy permissions:

  • fluree.storage.all: true: access all ledgers (graph source artifacts are denied in v1)
  • fluree.storage.ledgers: ["books:main", ...]: access specific ledgers
  • fluree.identity: "ex:PeerServiceAccount" (optional): identity used for policy evaluation in policy-filtered read mode

Unauthorized requests return 404 (no existence leak).
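
The authorization decision described above can be summarized as a small function (a sketch of the documented semantics, not the server's code):

```python
def authorize_storage(claims: dict, ledger: str, is_graph_source=False) -> int:
    """Return an HTTP status for a storage proxy read, per the claims above.

    Unauthorized and nonexistent resources both yield 404 so the response
    does not reveal which ledgers exist.
    """
    if is_graph_source:
        return 404                      # graph-source artifacts denied in v1
    if claims.get("fluree.storage.all") is True:
        return 200
    if ledger in claims.get("fluree.storage.ledgers", []):
        return 200
    return 404

token = {"fluree.storage.ledgers": ["books:main"]}
print(authorize_storage(token, "books:main"))   # 200
print(authorize_storage(token, "secret:main"))  # 404
```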

Endpoints

GET /v1/fluree/storage/ns/{ledger-id}

Fetch a nameservice record for a ledger ID. Requires storage proxy authorization for that ledger.

POST /v1/fluree/storage/block

Fetch a block/blob by CID. The request includes the ledger ID so the server can authorize the request and derive the physical storage address internally. Currently supports:

  • Accept: application/octet-stream (raw bytes; always available)
  • Accept: application/x-fluree-flakes (binary “FLKB” transport of policy-filtered leaf flakes only)
  • Accept: application/x-fluree-flakes+json (debug-only JSON flake transport; leaf flakes only)

If the client requests a flakes format for a non-leaf block, the server returns 406 Not Acceptable. Clients (and peers in proxy mode) should retry with Accept: application/octet-stream in that case.

Example request body:

{
  "cid": "bafy...leafOrBranchCid",
  "ledger": "mydb:main"
}
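
The 406 fallback a client should implement can be sketched as follows (the `send` callable is injected so the logic runs without a live server; in practice it would POST the cid/ledger body shown above with the given Accept header):

```python
def fetch_block(send, want_flakes=True):
    """Fetch a block, preferring policy-filtered flakes when available."""
    if want_flakes:
        status, body = send("application/x-fluree-flakes")
        if status != 406:                # 406 => block is not a ledger leaf
            return "flakes", body
    # Fall back to raw bytes, which are always available.
    status, body = send("application/octet-stream")
    return "raw", body

def fake_send(accept):
    """Stand-in transport: pretend the CID names a branch node."""
    if accept == "application/x-fluree-flakes":
        return 406, None                 # flakes only exist for leaf nodes
    return 200, b"branch-node-bytes"

print(fetch_block(fake_send))            # ('raw', b'branch-node-bytes')
```
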

Policy filtering semantics (leaf flakes)

When a flakes format is requested and the block is a ledger leaf:

  • The transaction server loads policy restrictions using the effective identity and effective policy class:
    • effective identity: token fluree.identity if present, otherwise --storage-proxy-default-identity (if configured)
    • effective policy class: --storage-proxy-default-policy-class (if configured; token-driven policy class selection may be added later)
  • If the resolved policy is root/unrestricted, the server returns all leaf flakes (still encoded as FLKB in application/x-fluree-flakes mode).
  • If the resolved policy is non-root, the server filters leaf flakes before encoding them for transport.

Note: the peer can still apply additional client-facing policy enforcement on top of this. Client-side policy can only further restrict results; it cannot “recover” facts filtered out upstream.

Security notes and limitations

  • Branch/commit leakage (v1 limitation): filtering leaves without rewriting branches/commits can leak structure/existence information to the peer identity. This is currently an accepted v1 limitation.
  • Graph source artifacts (v1): storage proxy denies graph-source artifacts by returning 404 even when fluree.storage.all is present.

Deployment examples

Transaction server (events + storage proxy)

fluree-server \
  --listen-addr 0.0.0.0:8090 \
  --server-role transaction \
  --storage-path /var/lib/fluree \
  --events-auth-mode required \
  --events-auth-trusted-issuer did:key:z6Mk... \
  --storage-proxy-enabled

Query peer (shared storage)

fluree-server \
  --listen-addr 0.0.0.0:8091 \
  --server-role peer \
  --tx-server-url http://tx.internal:8090 \
  --storage-path /var/lib/fluree \
  --peer-subscribe-all \
  --peer-events-token @/etc/fluree/peer-events.jwt

Query peer (proxy storage mode)

In proxy storage mode, the peer does not need --storage-path and instead needs a storage proxy token:

fluree-server \
  --listen-addr 0.0.0.0:8091 \
  --server-role peer \
  --tx-server-url http://tx.internal:8090 \
  --storage-access-mode proxy \
  --storage-proxy-token @/etc/fluree/storage-proxy.jwt \
  --peer-subscribe-all \
  --peer-events-token @/etc/fluree/peer-events.jwt

Telemetry and Logging

Fluree provides comprehensive logging, metrics, and tracing capabilities for monitoring and debugging production deployments.

Logging

Log Levels

Configure log verbosity:

--log-level error|warn|info|debug|trace

  • error: Critical errors only
  • warn: Warnings and errors
  • info: Informational messages (default)
  • debug: Detailed debugging information
  • trace: Very detailed tracing

Log Formats

JSON Format

--log-format json

Output:

{
  "timestamp": "2024-01-22T10:30:00.123Z",
  "level": "INFO",
  "target": "fluree_db_server",
  "message": "Transaction committed",
  "fields": {
    "ledger": "mydb:main",
    "t": 42,
    "duration_ms": 45,
    "flakes_added": 3
  }
}

Benefits:

  • Machine-parseable
  • Easy to index (Elasticsearch, etc.)
  • Structured fields
  • JSON query tools work
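
For example, a structured log line like the one above can be filtered programmatically (a sketch; the field names match the JSON output example):

```python
import json

log_line = ('{"timestamp": "2024-01-22T10:30:00.123Z", "level": "INFO", '
            '"message": "Transaction committed", '
            '"fields": {"ledger": "mydb:main", "t": 42, "duration_ms": 45}}')

record = json.loads(log_line)
if record["fields"]["duration_ms"] > 40:          # e.g. flag slow transactions
    print(record["fields"]["ledger"], record["fields"]["t"])  # mydb:main 42
```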

Text Format

--log-format text

Output:

2024-01-22T10:30:00.123Z INFO  fluree_db_server: Transaction committed ledger=mydb:main t=42 duration_ms=45

Benefits:

  • Human-readable
  • Compact
  • Easy to grep

Log Output

Standard Output (Default)

./fluree-db-server

Logs to stdout/stderr.

Log File

--log-file /var/log/fluree/server.log

Or via the config file:

[logging]
file = "/var/log/fluree/server.log"

Log Rotation

Use logrotate:

# /etc/logrotate.d/fluree
/var/log/fluree/*.log {
    daily
    rotate 14
    compress
    delaycompress
    notifempty
    create 0644 fluree fluree
    sharedscripts
    postrotate
        systemctl reload fluree
    endscript
}

Structured Logging

Add context to logs:

// Rust code (for reference)
info!(
    ledger = %ledger,
    t = transaction_time,
    duration_ms = duration.as_millis(),
    "Transaction committed"
);

Output:

{
  "message": "Transaction committed",
  "ledger": "mydb:main",
  "t": 42,
  "duration_ms": 45
}

Metrics

Planned — not yet implemented. The metrics below are a design target for a future PR. Prometheus metrics are not currently exposed by the server. The tracing/OTEL instrumentation described in the rest of this document is the current observability mechanism.

Prometheus Metrics (planned)

curl http://localhost:8090/metrics

Planned metrics:

  • fluree_transactions_total - Total transactions (counter)
  • fluree_transaction_duration_seconds - Transaction latency (histogram)
  • fluree_queries_total - Total queries (counter)
  • fluree_query_duration_seconds - Query latency (histogram)
  • fluree_query_errors_total - Query errors (counter)
  • fluree_indexing_lag_transactions - Novelty count (gauge)
  • fluree_index_duration_seconds - Indexing time (histogram)
  • fluree_uptime_seconds - Server uptime (gauge)

Prometheus Integration (planned)

Configure Prometheus to scrape Fluree:

# prometheus.yml
scrape_configs:
  - job_name: 'fluree'
    static_configs:
      - targets: ['localhost:8090']
    metrics_path: '/metrics'
    scrape_interval: 15s

Distributed Tracing (OpenTelemetry)

Fluree supports OpenTelemetry (OTEL) distributed tracing, providing deep visibility into query, transaction, and indexing performance. Traces are exported to any OTLP-compatible backend (Jaeger, Grafana Tempo, AWS X-Ray, Datadog, etc.).

Integrating your application’s traces with Fluree? See Distributed Tracing Integration for how to correlate your spans with Fluree’s – both for the Rust library (fluree-db-api) and the HTTP server (fluree-db-server with W3C traceparent).

Enabling OTEL

Build the server with the otel feature flag:

cargo build -p fluree-db-server --features otel --release

Then set environment variables to configure the OTLP exporter:

OTEL_SERVICE_NAME=fluree-server \
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 \
OTEL_EXPORTER_OTLP_PROTOCOL=grpc \
RUST_LOG=info,fluree_db_query=debug,fluree_db_transact=debug \
./target/release/fluree-db-server --data-dir ./data

| Environment Variable | Default | Description |
|---|---|---|
| OTEL_SERVICE_NAME | fluree-db-server | Service name in traces |
| OTEL_EXPORTER_OTLP_ENDPOINT | http://localhost:4317 | OTLP receiver endpoint |
| OTEL_EXPORTER_OTLP_PROTOCOL | grpc | Protocol: grpc or http/protobuf |

Quick Start with Jaeger

The repository includes a self-contained test harness in the otel/ directory:

cd otel/
make all    # starts Jaeger, builds with --features otel, starts server, runs tests
make ui     # opens Jaeger UI at http://localhost:16686

See Performance Investigation with Distributed Tracing for detailed usage.

Dual-Layer Subscriber Architecture

The OTEL exporter uses its own Targets filter independent of RUST_LOG. This is a critical design choice: without it, enabling RUST_LOG=debug causes third-party crate spans (hyper, tonic, h2, tower-http) to flood the OTEL batch processor, which overwhelms the exporter and causes parent spans to be dropped.

┌───────────────────────────────────────────────────┐
│            tracing-subscriber registry            │
│                                                   │
│  ┌─────────────────────┐  ┌─────────────────────┐ │
│  │  Console fmt layer  │  │  OTEL trace layer   │ │
│  │  (EnvFilter from    │  │  (Targets filter:   │ │
│  │   RUST_LOG)         │  │   fluree_* only)    │ │
│  └─────────────────────┘  └─────────────────────┘ │
└───────────────────────────────────────────────────┘

  • Console layer: Respects RUST_LOG as-is (all crates)
  • OTEL layer: Exports only fluree_* crate targets at DEBUG level. Per-leaf-node TRACE spans (binary_cursor_next_leaf, scan) are excluded to prevent flooding the batch processor queue on large queries

This means RUST_LOG=debug produces verbose console output, but the OTEL exporter only receives Fluree spans – no hyper/tonic/tower noise.

Batch processor queue size: The OTEL batch span processor queue is set to 1,000,000 spans. At ~200 bytes per span, this represents ~200MB of potential memory usage under sustained debug-level traffic. This is intentional to prevent span loss during investigation. At RUST_LOG=info without OTEL, no debug spans are created at all (true zero overhead). With OTEL enabled, the queue rarely exceeds a few thousand entries under normal operation.

Shutdown

On server shutdown, the OTEL SdkTracerProvider is flushed and shut down to ensure all pending spans are exported. This is handled automatically by the server’s shutdown hook.

Dynamic Span Naming (otel.name)

Each HTTP request span is named dynamically via the otel.name field so that traces in Jaeger/Tempo show descriptive names instead of a generic request:

| Operation | otel.name examples |
|---|---|
| Query | query:json-ld, query:sparql, query:explain |
| Transact | transact:json-ld, transact:sparql-update, transact:turtle |
| Insert | insert:json-ld, insert:turtle |
| Upsert | upsert:json-ld, upsert:turtle, upsert:trig |
| Ledger mgmt | ledger:create, ledger:drop, ledger:info, ledger:exists |

The operation span attribute retains the handler-specific name for precise filtering when needed.

Span Hierarchy

Fluree instruments queries, transactions, and indexing with structured tracing spans at two tiers. The only info_span! in the codebase is request (the HTTP request span). All operation spans use debug_span!, guaranteeing true zero overhead when OTEL is not compiled and RUST_LOG is at info.

Tier 1: DEBUG (operation and phase level)

All operation, phase, and operator spans. Visible when OTEL is enabled or when RUST_LOG includes debug:

RUST_LOG=info,fluree_db_query=debug,fluree_db_transact=debug,fluree_db_indexer=debug

Spans: query_execute, query_prepare, query_run, txn_stage, txn_commit, commit_* sub-spans, index_build, build_all_indexes, build_index, sort_blocking, groupby_blocking, core operators (scan, join, filter, project, sort), format, policy_enforce, etc.

Tier 2: TRACE (maximum detail)

Per-operator detail for deep performance analysis:

RUST_LOG=info,fluree_db_query=trace

Additional spans: binary_cursor_next_leaf, property_join, group_by, aggregate, group_aggregate, distinct, limit, offset, union, optional, subquery, having

Span Tree (Query)

query_execute (debug)
├── query_prepare (debug)
│   ├── reasoning_prep (debug)
│   ├── pattern_rewrite (debug, patterns_before, patterns_after)
│   └── plan (debug, pattern_count)
├── query_run (debug)
│   ├── scan (debug)
│   ├── join (debug)
│   │   └── join_next_batch (debug, per iteration)
│   ├── filter (debug)
│   ├── project (debug)
│   ├── sort (debug)
│   ├── sort_blocking (debug, cross-thread via spawn_blocking)
│   └── ...
└── format (debug)

Span Tree (Transaction)

transact_execute (debug)
├── txn_stage (debug, insert_count, delete_count)
│   ├── where_exec (debug, pattern_count, binding_rows, retraction_count, assertion_count)
│   │   ├── delete_gen (debug, template_count, retraction_count)  ← per streaming-WHERE batch
│   │   └── insert_gen (debug, template_count, assertion_count)   ← per batch (mixed DELETE+INSERT only)
│   ├── cancellation (debug)        ← mixed DELETE+INSERT path
│   ├── dedup_retractions (debug)   ← pure-DELETE path (no INSERT templates, not Upsert)
│   └── policy_enforce (debug)
└── txn_commit (debug, flake_count, delta_bytes)
    ├── commit_nameservice_lookup (debug)
    ├── commit_verify_sequencing (debug)
    ├── commit_namespace_delta (debug)
    ├── commit_write_raw_txn (debug)  ← await of upload task spawned at pipeline entry
    ├── commit_build_record (debug)
    ├── commit_write_commit_blob (debug)
    ├── commit_publish_nameservice (debug)
    ├── commit_generate_metadata_flakes (debug)
    ├── commit_populate_dict_novelty (debug)
    └── commit_apply_to_novelty (debug)

Span Tree (Indexing)

Indexing runs as a separate top-level trace (not nested under an HTTP request). Each index refresh cycle starts its own trace root:

index_build (debug, ledger_id)
├── commit_chain_walk (debug)
├── commit_resolve (debug, per commit)
├── dict_merge_and_remap (debug)
├── build_all_indexes (debug)
│   └── build_index (debug, per order: SPOT, PSOT, POST, OPST) [cross-thread]
├── secondary_partition (debug)
├── upload_dicts (debug)
├── upload_indexes (debug)
├── build_index_root (debug)
└── BinaryIndexStore::load (debug) [cross-thread]

index_gc is a separate top-level trace (fire-and-forget tokio::spawn):

index_gc (debug, separate trace)
├── gc_walk_chain (debug)
└── gc_delete_entries (debug)

Span Tree (Bulk Import / fluree-ingest)

Bulk import runs as a standalone top-level trace under the fluree-cli service (no HTTP server involved). The import pipeline instruments all major phases:

bulk_import (debug, alias)
├── import_chunks (debug, total_chunks, parse_threads)
│   ├── [resolver thread: inherits parent context]
│   ├── [ttl-parser-N threads: inherit parent context]
│   └── commit + run generation log events
├── import_index_build (debug)
│   ├── build_all_indexes (debug)
│   │   └── build_index (debug, per order: SPOT, PSOT, POST, OPST) [cross-thread]
│   ├── import_cas_upload (debug)
│   └── import_publish (debug)
└── cleanup log events

The import_chunks span covers the parse+commit loop. Spawned threads (resolver, parse workers) and async tasks (dict upload, index build) inherit the parent span context so their work appears nested in the trace waterfall.

Tracker-to-Span Bridge

When tracked queries or transactions are executed (via the /query or /update endpoints with tracking enabled), the tracker_time and tracker_fuel fields are recorded as deferred attributes on the query_execute and transact_execute spans. These values appear as span attributes in OTEL backends (Jaeger, Tempo, etc.), enabling correlation between the Tracker’s fuel accounting and the span waterfall.

RUST_LOG Quick Reference

| Goal | Pattern | What you see |
|---|---|---|
| Production default | info | HTTP request spans only (zero operation spans) |
| Debug slow queries | info,fluree_db_query=debug | + query_execute, query_prepare, query_run, operators |
| Debug slow transactions | info,fluree_db_transact=debug | + txn_stage, txn_commit, commit sub-spans |
| Full phase decomposition | info,fluree_db_query=debug,fluree_db_transact=debug,fluree_db_indexer=debug | All debug spans |
| Per-operator detail | info,fluree_db_query=trace | + per-leaf: binary_cursor_next_leaf, etc. |
| Console firehose | debug | Everything (OTEL still filters to fluree_*) |

Note: When OTEL is enabled, the OTEL Targets filter always captures fluree_* spans at DEBUG regardless of RUST_LOG. The table above describes console output visibility only.

Monitoring Integration

Grafana Dashboards

Import Fluree dashboard:

{
  "dashboard": {
    "title": "Fluree Monitoring",
    "panels": [
      {
        "title": "Query Rate",
        "targets": [
          {
            "expr": "rate(fluree_queries_total[5m])"
          }
        ]
      },
      {
        "title": "Query Latency (p95)",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, fluree_query_duration_seconds)"
          }
        ]
      },
      {
        "title": "Indexing Lag",
        "targets": [
          {
            "expr": "fluree_indexing_lag_transactions"
          }
        ]
      }
    ]
  }
}

Datadog Integration

Send logs to Datadog:

./fluree-db-server \
  --log-format json | \
  datadog-agent stream --service=fluree

New Relic Integration

Use New Relic agent:

export NEW_RELIC_LICENSE_KEY=your-key
export NEW_RELIC_APP_NAME=fluree-prod

./fluree-db-server

Elasticsearch/Kibana

Ship logs to Elasticsearch:

./fluree-db-server \
  --log-format json | \
  filebeat -e -c filebeat.yml

Filebeat config:

filebeat.inputs:
  - type: stdin
    json.keys_under_root: true

output.elasticsearch:
  hosts: ["localhost:9200"]
  index: "fluree-logs-%{+yyyy.MM.dd}"

Health Monitoring

Health Check Endpoint

curl http://localhost:8090/health

Response (healthy):

{
  "status": "healthy",
  "version": "0.1.0",
  "storage": "file",
  "uptime_ms": 3600000,
  "checks": {
    "storage": "healthy",
    "indexing": "healthy",
    "nameservice": "healthy"
  }
}

Response (unhealthy):

{
  "status": "unhealthy",
  "checks": {
    "storage": "healthy",
    "indexing": "unhealthy",
    "nameservice": "healthy"
  },
  "errors": [
    {
      "component": "indexing",
      "message": "Indexing lag exceeds threshold"
    }
  ]
}

Liveness Probe

For Kubernetes:

livenessProbe:
  httpGet:
    path: /health
    port: 8090
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3

Readiness Probe

readinessProbe:
  httpGet:
    path: /ready
    port: 8090
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 3

Alerting

Alert Rules

Prometheus alert rules:

groups:
  - name: fluree
    rules:
      - alert: HighQueryLatency
        expr: histogram_quantile(0.95, fluree_query_duration_seconds) > 1
        for: 5m
        annotations:
          summary: "High query latency"
          description: "95th percentile query latency is {{ $value }}s"
      
      - alert: HighIndexingLag
        expr: fluree_indexing_lag_transactions > 100
        for: 10m
        annotations:
          summary: "High indexing lag"
          description: "Indexing lag is {{ $value }} transactions"
      
      - alert: HighErrorRate
        expr: rate(fluree_query_errors_total[5m]) > 10
        for: 5m
        annotations:
          summary: "High query error rate"
          description: "Error rate is {{ $value }}/s"

Alert Destinations

Configure alert routing:

route:
  receiver: 'team-ops'
  group_by: ['alertname', 'ledger']
  routes:
    - match:
        severity: critical
      receiver: 'pagerduty'
    - match:
        severity: warning
      receiver: 'slack'

receivers:
  - name: 'pagerduty'
    pagerduty_configs:
      - service_key: 'your-key'
  
  - name: 'slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/...'
        channel: '#alerts'

Performance Monitoring

Key Metrics to Track

  1. Query Performance:

    • p50, p95, p99 latency
    • Queries per second
    • Error rate
  2. Transaction Performance:

    • Commit time
    • Transactions per second
    • Error rate
  3. Indexing:

    • Novelty count
    • Index time
    • Indexing lag
  4. Resource Usage:

    • CPU utilization
    • Memory usage
    • Disk I/O
    • Network I/O
  5. Storage:

    • Storage used
    • Storage growth rate
    • S3 request rate (if AWS)

Dashboards

Create operational dashboards:

Overview Dashboard:

  • Request rate
  • Error rate
  • Response times
  • Active connections

Performance Dashboard:

  • Query latency percentiles
  • Transaction latency
  • Indexing performance
  • Resource utilization

Capacity Dashboard:

  • Storage usage and growth
  • Memory usage trends
  • Indexing lag trends
  • Projection to capacity limits

Logging Best Practices

1. Use Structured Logging

JSON format with consistent fields:

{
  "timestamp": "2024-01-22T10:30:00Z",
  "level": "INFO",
  "ledger": "mydb:main",
  "operation": "query",
  "duration_ms": 45
}
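
Because each line is a self-contained JSON object, ad-hoc analysis needs no log parser. A minimal sketch (field names follow the example above) that flags slow operations:

```python
import json

log_lines = [
    '{"timestamp": "2024-01-22T10:30:00Z", "level": "INFO", "ledger": "mydb:main", "operation": "query", "duration_ms": 45}',
    '{"timestamp": "2024-01-22T10:30:01Z", "level": "INFO", "ledger": "mydb:main", "operation": "query", "duration_ms": 820}',
]

def slow_operations(lines, threshold_ms=500):
    """Yield parsed log records whose duration exceeds the threshold."""
    for line in lines:
        record = json.loads(line)
        if record.get("duration_ms", 0) > threshold_ms:
            yield record

for record in slow_operations(log_lines):
    print(record["timestamp"], record["operation"], record["duration_ms"])
```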

2. Log Request IDs

Include request IDs for tracing:

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "X-Request-ID: abc-123-def-456" \
  -d '{...}'

3. Appropriate Log Levels

  • Production: info
  • Debugging: debug
  • Development: debug or trace

4. Sample High-Volume Logs

For high-traffic deployments, sample logs:

[logging]
sample_rate = 0.1  # Log 10% of requests
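
Sampling can also be done in your application before the request is even logged. Hashing the request ID keeps the decision deterministic, so every service that sees the same request makes the same choice (a sketch, not a Fluree feature):

```python
import hashlib

def should_log(request_id: str, sample_rate: float = 0.1) -> bool:
    """Deterministic sampling: the same request_id always gets the same answer."""
    digest = hashlib.sha256(request_id.encode()).digest()
    # Map the first 8 bytes of the hash onto [0, 1)
    value = int.from_bytes(digest[:8], "big") / 2**64
    return value < sample_rate

# The decision is stable for a given ID across processes and restarts
print(should_log("abc-123-def-456"))
```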

5. Sensitive Data

Never log sensitive data:

  • API keys
  • Passwords
  • Personal information
  • Financial data

Distributed Tracing Integration

This guide explains how to correlate your application’s traces and logs with Fluree’s internal instrumentation, whether you use Fluree as an embedded Rust library (fluree-db-api) or as an HTTP server (fluree-db-server).

Overview

Fluree instruments queries, transactions, and indexing with tracing spans. These spans can participate in your application’s distributed traces so that a single trace shows the full picture: your application code, the Fluree call, and every internal phase (parsing, planning, execution, commit, etc.).

There are two integration paths depending on how you use Fluree:

| Integration mode | Mechanism | What you get |
|---|---|---|
| Rust library (fluree-db-api) | Shared tracing subscriber | Fluree spans automatically nest under your application spans |
| HTTP server (fluree-db-server) | W3C Trace Context (traceparent header) | Fluree’s request span becomes a child of your distributed trace |

Rust Library Integration (fluree-db-api)

When you embed Fluree via fluree-db-api, trace correlation works automatically through the tracing crate’s context propagation – no special Fluree configuration required.

How it works

The tracing crate uses task-local storage to track the “current span.” When your code creates a span and then calls a Fluree API method, any spans Fluree creates internally become children of your span. This happens automatically as long as both your code and Fluree share the same tracing subscriber (which they do by default – there’s one global subscriber per process).

Basic setup

use fluree_db_api::{FlureeBuilder, Result};
use tracing::Instrument;
use tracing_subscriber::EnvFilter;

#[tokio::main]
async fn main() -> Result<()> {
    // Initialize tracing -- Fluree's spans will appear here too
    tracing_subscriber::fmt()
        .with_env_filter(EnvFilter::from_default_env())
        .init();

    let fluree = FlureeBuilder::new()
        .with_storage_path("./data")
        .build()
        .await?;

    let user_id = 42;

    // Your application span wraps the Fluree call
    let span = tracing::info_span!("handle_request", user_id = %user_id);
    async {
        let db = fluree.db("my-ledger", None).await?;
        let _result = fluree.query(&db, my_query).await?; // `my_query` is your query value
        Ok(())
    }
    .instrument(span)
    .await
}

At the default RUST_LOG=info, Fluree’s info-level log events appear within your span’s context:

INFO handle_request{user_id=42}: fluree_db_api::view::query: parse_ms=0.12 plan_ms=0.45 exec_ms=3.21 query phases

With RUST_LOG=info,fluree_db_query=debug, you additionally see Fluree’s operation spans nested under yours:

INFO  handle_request{user_id=42}: my_app: handling request
DEBUG handle_request{user_id=42}:query_execute: fluree_db_query: ...
DEBUG handle_request{user_id=42}:query_execute:query_prepare: fluree_db_query: ...
DEBUG handle_request{user_id=42}:query_execute:query_run: fluree_db_query: ...
INFO  handle_request{user_id=42}:query_execute: fluree_db_api: parse_ms=0.12 plan_ms=0.45 exec_ms=3.21 query phases

With OpenTelemetry export

If your application exports traces to an OTEL backend (Jaeger, Tempo, Datadog, etc.), Fluree’s spans appear in the same trace waterfall:

use opentelemetry::global;
use opentelemetry_otlp::WithExportConfig;
use tracing_opentelemetry::OpenTelemetryLayer;
use tracing_subscriber::{layer::SubscriberExt, EnvFilter, Registry};

fn init_tracing() {
    let exporter = opentelemetry_otlp::SpanExporter::builder()
        .with_tonic()
        .with_endpoint("http://localhost:4317")
        .build()
        .expect("OTLP exporter");

    let provider = opentelemetry_sdk::trace::SdkTracerProvider::builder()
        .with_simple_exporter(exporter)
        .build();

    global::set_tracer_provider(provider);

    let otel_layer = OpenTelemetryLayer::new(global::tracer("my-app"));

    let subscriber = Registry::default()
        .with(otel_layer)
        .with(EnvFilter::from_default_env())
        .with(tracing_subscriber::fmt::layer());

    tracing::subscriber::set_global_default(subscriber).unwrap();
}

In Jaeger/Tempo, you’ll see a single trace containing both your application spans and Fluree’s internal spans (query_execute, query_prepare, query_run, scan, join, etc.).

Three tiers of visibility

Fluree uses a tiered logging strategy. At every tier, events and spans are correlated to your application’s active span.

| Tier | RUST_LOG pattern | What you see from Fluree |
|---|---|---|
| Logs | info (default) | Info-level log events: phase timings (parse_ms, plan_ms, exec_ms), commit summaries, errors. Zero span overhead. |
| Operation spans | info,fluree_db_query=debug | + query_execute, query_prepare, query_run, operator spans (timing waterfall in Jaeger/Tempo) |
| Deep tracing | info,fluree_db_query=trace | + per-leaf, per-iteration detail (binary_cursor_next_leaf, group_by, etc.) |

At the default INFO level, you get Fluree’s summary log events (timings, counts, errors) correlated inside your spans. This is sufficient for most production correlation needs.

At DEBUG, you additionally get the structured span hierarchy that produces the timing waterfall in OTEL backends. This is useful for performance investigation.

Useful RUST_LOG patterns:

| Pattern | Use case |
|---|---|
| info | Production: correlatable log events, zero span overhead |
| info,fluree_db_query=debug | Investigate slow queries |
| info,fluree_db_transact=debug | Investigate slow transactions |
| info,fluree_db_query=debug,fluree_db_transact=debug | Full operation visibility |
| debug | Everything, but includes third-party crate noise |

See Telemetry and Logging for the full span hierarchy.

Key span names and fields

These are the most useful spans and fields for application-level correlation:

| Span | Level | Key fields | When it appears |
|---|---|---|---|
| query_execute | DEBUG | ledger_id | Every query |
| query_prepare | DEBUG | pattern_count | Query planning phase |
| query_run | DEBUG | | Query execution phase |
| transact_execute | DEBUG | ledger_id | Every transaction |
| txn_stage | DEBUG | insert_count, delete_count | Transaction staging |
| txn_commit | DEBUG | flake_count, delta_bytes | Commit to storage |
| format | DEBUG | output_format, result_count | Result serialization |

Adding your own context to Fluree spans

Since spans nest automatically, the simplest approach is to wrap Fluree calls with your own spans containing the context you need:

let span = tracing::info_span!(
    "api_query",
    user_id = %user_id,
    endpoint = %path,
    ledger = %ledger_alias,
);

let result = async {
    fluree.query(&db, query).await
}
.instrument(span)
.await?;

All of Fluree’s internal spans inherit the user_id, endpoint, and ledger fields from the parent span in trace backends that support field inheritance.

HTTP Server Integration (fluree-db-server)

When Fluree runs as a standalone HTTP server, your application connects over HTTP. Distributed trace correlation uses the W3C Trace Context standard.

W3C traceparent header

When your application sends a traceparent header with an HTTP request, fluree-db-server automatically makes its request span a child of your trace. This requires the otel feature to be enabled on the server.

traceparent: 00-{trace-id}-{parent-span-id}-{trace-flags}

Example request:

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "Content-Type: application/json" \
  -H "traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01" \
  -d '{"from": "my-ledger", "select": {"?s": ["*"]}, "where": [["?s", "rdf:type", "schema:Person"]]}'

The resulting trace in Jaeger/Tempo:

your-service: handle_request          ─────────────────────────────
  fluree-server: request (query:json-ld) ──────────────────────────
    query_execute                           ─────────────────────
      query_prepare                         ────
      query_run                                 ───────────────
        scan                                    ─────
        join                                         ─────────
      format                                                   ──
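
If a client lacks an OpenTelemetry SDK, the header can be built by hand. Per the W3C Trace Context format shown above, it is version 00, a 16-byte trace ID, an 8-byte parent span ID, and a flags byte, all lowercase hex (a minimal sketch):

```python
import secrets

def make_traceparent(sampled: bool = True) -> str:
    """Build a W3C traceparent header value."""
    trace_id = secrets.token_hex(16)   # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)     # 8 random bytes  -> 16 hex chars
    flags = "01" if sampled else "00"  # 01 = sampled
    return f"00-{trace_id}-{span_id}-{flags}"

headers = {"traceparent": make_traceparent()}
print(headers["traceparent"])
```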

Server requirements

W3C trace context propagation requires:

  1. otel feature enabled at build time:

    cargo build -p fluree-db-server --features otel --release
    
  2. OTEL environment variables set at runtime:

    OTEL_SERVICE_NAME=fluree-server \
    OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 \
    ./fluree-server
    

Without the otel feature, the traceparent header is still parsed and the trace ID is recorded as a log field for text-based correlation, but the span is not linked as a child in the OTEL trace.

For background indexing triggered by a transaction request, note the distinction between logs and traces:

  • The later indexing work still runs in its own background task and appears as a separate trace/span tree.
  • Fluree copies the triggering request’s request_id and trace_id into the queued indexing job, so the background worker’s log lines can still be correlated back to the originating request.
  • If multiple requests coalesce onto one queued indexing job, the worker’s log lines retain the metadata from the most recently queued request.

X-Request-ID header (non-OTEL correlation)

For simpler log correlation without full distributed tracing, send an X-Request-ID header:

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "X-Request-ID: abc-123-def-456" \
  -d '...'

The server logs and echoes back this ID in the response headers. All log lines for the request include the request_id field, so you can correlate with:

# In JSON log output:
grep '"request_id":"abc-123-def-456"' /var/log/fluree/server.log

This works without the otel feature and is useful for text-based log correlation. The same request_id is also copied onto background indexing logs when that request queues an index build, which helps connect the foreground transaction and later worker activity in plain log search.
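
Because the same request_id appears on both the request’s own log lines and any background indexing logs it triggers, correlation is a simple group-by. A sketch over JSON logs shaped like the examples above (the "index_build" operation name here is illustrative):

```python
import json
from collections import defaultdict

log_lines = [
    '{"request_id": "abc-123", "operation": "transact", "duration_ms": 12}',
    '{"request_id": "xyz-789", "operation": "query", "duration_ms": 3}',
    '{"request_id": "abc-123", "operation": "index_build", "duration_ms": 950}',
]

def group_by_request(lines):
    """Collect every log record's operation under its request_id."""
    grouped = defaultdict(list)
    for line in lines:
        record = json.loads(line)
        grouped[record["request_id"]].append(record["operation"])
    return dict(grouped)

print(group_by_request(log_lines))
```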

Client examples

Python (OpenTelemetry)

from opentelemetry import trace
from opentelemetry.propagate import inject
import requests

tracer = trace.get_tracer("my-app")

with tracer.start_as_current_span("fluree_query") as span:
    headers = {"Content-Type": "application/json"}
    inject(headers)  # adds traceparent header automatically

    response = requests.post(
        "http://localhost:8090/v1/fluree/query",
        headers=headers,
        json={
            "from": "my-ledger",
            "select": {"?s": ["*"]},
            "where": [["?s", "rdf:type", "schema:Person"]],
        },
    )

JavaScript / TypeScript (OpenTelemetry)

import { trace, context, propagation } from "@opentelemetry/api";

const tracer = trace.getTracer("my-app");

await tracer.startActiveSpan("fluree_query", async (span) => {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
  };
  propagation.inject(context.active(), headers);

  const response = await fetch("http://localhost:8090/v1/fluree/query", {
    method: "POST",
    headers,
    body: JSON.stringify({
      from: "my-ledger",
      select: { "?s": ["*"] },
      where: [["?s", "rdf:type", "schema:Person"]],
    }),
  });

  span.end();
  return response;
});

Rust (reqwest + tracing-opentelemetry)

use opentelemetry::global;
use opentelemetry::propagation::Injector;
use reqwest::header::HeaderMap;

struct HeaderInjector<'a>(&'a mut HeaderMap);
impl Injector for HeaderInjector<'_> {
    fn set(&mut self, key: &str, value: String) {
        if let Ok(name) = key.parse() {
            if let Ok(val) = value.parse() {
                self.0.insert(name, val);
            }
        }
    }
}

let span = tracing::info_span!("fluree_query", ledger = "my-ledger");
let _guard = span.enter();

let mut headers = HeaderMap::new();
global::get_text_map_propagator(|propagator| {
    propagator.inject(&mut HeaderInjector(&mut headers));
});

let response = reqwest::Client::new()
    .post("http://localhost:8090/v1/fluree/query")
    .headers(headers)
    .json(&query)
    .send()
    .await?;

Correlation Strategy Summary

| Scenario | Mechanism | Setup required |
|---|---|---|
| Rust app embedding fluree-db-api | Shared tracing subscriber | None – automatic |
| Rust app embedding fluree-db-api with OTEL | Shared subscriber + OTEL layer | Add OpenTelemetryLayer to subscriber |
| HTTP client → fluree-db-server (OTEL) | traceparent header | Server built with otel feature + OTEL env vars |
| HTTP client → fluree-db-server (log only) | X-Request-ID header | None – works out of the box |

Pack format: archive and restore

Fluree’s .flpack format is a self-contained binary snapshot of an entire ledger – commits, transaction payloads, and (optionally) binary index artifacts. It enables ledger portability: archive a ledger to cold storage, restore it later under the same or a different name, or move it between environments.

Overview

The pack protocol (fluree-pack-v1) was designed for efficient bulk transfer between Fluree instances. The same format works equally well for file-based archive/restore workflows. Because all objects inside a pack are content-addressed (identified by ContentId / CIDv1), the ledger name only matters at the nameservice layer – making rename-on-restore straightforward.

What’s in a .flpack file?

A .flpack file is a binary stream of frames:

[Preamble: FPK1 + version(1)]
[Header frame]        -- JSON metadata (commit count, estimated size, etc.)
[Data frames...]      -- commits + txn blobs (oldest-first, topological order)
[Manifest frame]?     -- marks start of index artifact phase (if included)
[Data frames...]?     -- index branches, leaves, dict blobs, roots
[End frame]

Each data frame contains a CID (content identity) and the raw bytes of the object. On ingest, every frame is integrity-verified before being written to storage.
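
As a quick sanity check before attempting a full restore, a tool can validate the preamble directly. This sketch assumes only what the layout above states: the 4-byte FPK1 magic followed by a single version byte.

```python
MAGIC = b"FPK1"

def check_preamble(data: bytes) -> int:
    """Return the pack version byte, or raise if the preamble is malformed."""
    if data[:4] != MAGIC:
        raise ValueError("not a .flpack file (bad magic)")
    if len(data) < 5:
        raise ValueError("truncated preamble")
    return data[4]

print(check_preamble(b"FPK1\x01" + b"rest-of-stream"))
```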

With or without indexes

A pack can include just commits + txn blobs (compact, sufficient for full restore – queries replay from commits), or it can also include binary index artifacts (larger, but the restored ledger is immediately queryable without reindexing).

CLI usage

Archive (export to .flpack)

The CLI does not yet have a dedicated fluree export --format flpack command. To produce a .flpack file today, use the pack HTTP endpoint directly or the Rust API (see below).

From the CLI, the closest equivalent is fluree clone, which uses the pack protocol internally for transfer, then writes objects to local CAS.

Restore (import from .flpack)

fluree create my-restored-ledger --from /path/to/archive.flpack

This reads the .flpack file, ingests all CAS objects, and creates a new ledger pointing at the imported commit chain. The ledger name (my-restored-ledger) is independent of whatever the original ledger was called.

Rust API usage

All building blocks for archive/restore live in the API and core crates – no CLI dependency required.

Dependencies

[dependencies]
fluree-db-api = { version = "0.1", features = ["native"] }
fluree-db-core = "0.1"
fluree-db-nameservice-sync = "0.1"
tokio = { version = "1", features = ["full"] }

Archive: generate a .flpack file

Use stream_pack() from fluree-db-api to generate pack frames, then write them to a file (or S3, GCS, etc.).

use fluree_db_api::{Fluree, FlureeBuilder};
use fluree_db_api::pack::{full_ledger_pack_request, stream_pack};
use tokio::sync::mpsc;
use tokio::io::AsyncWriteExt;

async fn archive_ledger(
    fluree: &Fluree<impl Storage + Clone + Send + Sync + 'static, impl NameService + RefPublisher + Send + Sync>,
    ledger_id: &str,
    output_path: &std::path::Path,
) -> Result<(), Box<dyn std::error::Error>> {
    let handle = fluree.ledger(ledger_id).await?;

    // Build a request that captures the current head commit (and index
    // root, if present). `include_indexes = true` gives the restored
    // ledger instant queryability; pass `false` for a smaller archive
    // that reindexes on import. Empty `want` is always rejected by
    // `stream_pack`, so always build via this helper.
    //
    // `full_ledger_pack_request` sets `include_txns = true` by default.
    // To produce an even smaller archive without original transaction
    // payloads (verifiable but not replayable), mutate the returned
    // request: `request.include_txns = false;`.
    let request = full_ledger_pack_request(&handle, /* include_indexes */ true).await?;

    let (tx, mut rx) = mpsc::channel(64);

    // Spawn the pack generator
    let fluree_clone = fluree.clone();
    let handle_clone = handle.clone();
    let req_clone = request.clone();
    tokio::spawn(async move {
        let _ = stream_pack(&fluree_clone, &handle_clone, &req_clone, tx).await;
    });

    // Write frames to file
    let mut file = tokio::fs::File::create(output_path).await?;
    while let Some(chunk) = rx.recv().await {
        file.write_all(&chunk.bytes).await?;
    }
    file.flush().await?;

    Ok(())
}

To archive to S3 instead of a local file, replace the file writer with your S3 upload (e.g., aws_sdk_s3 multipart upload consuming chunks from rx).

Restore: ingest a .flpack file

Use ingest_pack_frame() from fluree-db-nameservice-sync to write each object, then finalize the nameservice pointers with set_commit_head() / set_index_head().

Streaming vs. memory-mapped reads

Pack files can be very large for production ledgers. There are two approaches to reading them:

  • Memory-mapped (mmap): The CLI uses memmap2::Mmap to map the entire file into virtual address space. This avoids heap allocation but still requires the OS to page the entire file through virtual memory. Suitable for files that fit comfortably in available address space.
  • Streaming: For very large archives or when reading from a non-seekable source (S3 GetObject, HTTP response, pipe), decode frames incrementally from a buffered reader. The network ingestion path (ingest_pack_stream) already works this way – it processes one frame at a time and never holds more than a single frame in memory.

For API consumers building archive/restore on large datasets, the streaming approach is recommended. The example below shows the mmap approach for simplicity; see fluree-db-nameservice-sync::pack_client::ingest_pack_stream for the streaming pattern using BytesMut + decode_frame in a loop.

use fluree_db_api::{Fluree, FlureeBuilder};
use fluree_db_core::pack::{decode_frame, read_stream_preamble, PackFrame, DEFAULT_MAX_PAYLOAD};
use fluree_db_core::{ContentKind, ContentStore};
use fluree_db_nameservice_sync::pack_client::ingest_pack_frame;

async fn restore_ledger(
    fluree: &Fluree<impl Storage + Clone + Send + Sync + 'static, impl NameService + RefPublisher + Send + Sync>,
    new_ledger_id: &str,
    flpack_bytes: &[u8],
) -> Result<(), Box<dyn std::error::Error>> {
    // 1. Create the target ledger (empty)
    fluree.create(new_ledger_id).await?;
    let handle = fluree.ledger(new_ledger_id).await?;

    // 2. Parse preamble
    let mut pos = read_stream_preamble(flpack_bytes)?;

    // 3. Decode frames and ingest each CAS object
    let storage = fluree.storage();
    let mut ns_manifest: Option<serde_json::Value> = None;

    loop {
        let (frame, consumed) = decode_frame(&flpack_bytes[pos..], DEFAULT_MAX_PAYLOAD)?;
        pos += consumed;

        match frame {
            PackFrame::Header(_header) => {
                // Metadata -- log or inspect as needed
            }
            PackFrame::Data { cid, payload } => {
                ingest_pack_frame(&cid, &payload, storage, new_ledger_id).await?;
            }
            PackFrame::Manifest(json) => {
                // The nameservice manifest contains commit/index head CIDs and t values
                if json.get("phase").and_then(|v| v.as_str()) == Some("nameservice") {
                    ns_manifest = Some(json);
                }
            }
            PackFrame::End => break,
            PackFrame::Error(msg) => {
                return Err(format!("pack error: {msg}").into());
            }
        }
    }

    // 4. Finalize nameservice pointers from the manifest
    let manifest = ns_manifest.ok_or("missing nameservice manifest in .flpack")?;

    if let Some(cid_str) = manifest.get("commit_head_id").and_then(|v| v.as_str()) {
        let commit_cid: fluree_db_core::ContentId = cid_str.parse()?;
        let commit_t = manifest.get("commit_t").and_then(|v| v.as_i64()).unwrap_or(0);
        fluree.set_commit_head(&handle, &commit_cid, commit_t).await?;
    }
    if let Some(cid_str) = manifest.get("index_head_id").and_then(|v| v.as_str()) {
        let index_cid: fluree_db_core::ContentId = cid_str.parse()?;
        let index_t = manifest.get("index_t").and_then(|v| v.as_i64()).unwrap_or(0);
        fluree.set_index_head(&handle, &index_cid, index_t).await?;
    }

    Ok(())
}

Key points

  • Rename on restore: The new_ledger_id parameter controls the ledger name. CAS objects are content-addressed and name-agnostic; only the nameservice pointer uses the name.
  • Integrity: Every data frame is verified (SHA-256) before writing. A corrupted archive is detected immediately.
  • Indexes are optional: Without indexes, the restored ledger is functional but will need to reindex (or replay from commits) before queries are efficient. With indexes, it’s ready immediately.
  • Storage-agnostic: The same .flpack file can be restored to file storage, S3, or any backend that implements the Storage trait. Archive from file, restore to S3 (or vice versa).

Wire format reference

For full protocol details including frame encoding, see:

Architecture

| Concern | Crate | Key file |
|---|---|---|
| Wire format (FPK1 frames, encode/decode) | fluree-db-core | src/pack.rs |
| Pack stream generation (export) | fluree-db-api | src/pack.rs |
| HTTP endpoint (POST /v1/fluree/pack/*) | fluree-db-server | src/routes/pack.rs |
| Stream ingestion (import) | fluree-db-nameservice-sync | src/pack_client.rs |
| Commit/index head finalization | fluree-db-api | src/commit_transfer.rs |
| CLI .flpack file import | fluree-db-cli | src/commands/create.rs |

Admin, Health, and Stats

This document covers administrative operations, health monitoring, and server statistics for Fluree deployments.

Health Endpoints

GET /health

Basic health check:

curl http://localhost:8090/health

Response (200 OK):

{
  "status": "ok",
  "version": "0.1.0"
}

Use this endpoint for:

  • Load balancer health checks
  • Container orchestration (Kubernetes liveness/readiness probes)
  • Monitoring systems

Kubernetes Example:

livenessProbe:
  httpGet:
    path: /health
    port: 8090
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health
    port: 8090
  initialDelaySeconds: 5
  periodSeconds: 5

Statistics Endpoints

GET /v1/fluree/stats

Server statistics:

curl http://localhost:8090/v1/fluree/stats

Response:

{
  "uptime_secs": 3600,
  "storage_type": "file",
  "indexing_enabled": true,
  "cached_ledgers": 3,
  "version": "0.1.0"
}
| Field | Description |
|---|---|
| uptime_secs | Server uptime in seconds |
| storage_type | Storage mode (memory or file) |
| indexing_enabled | Whether background indexing is enabled |
| cached_ledgers | Number of ledgers currently cached |
| version | Server version |

Diagnostic endpoints

GET /v1/fluree/whoami

Diagnostic endpoint for debugging Bearer tokens.

  • If no token is present, returns token_present=false.
  • If a token is present, attempts to cryptographically verify it using the same verification logic as authenticated endpoints (embedded-JWK Ed25519 and JWKS/OIDC when enabled/configured).
  • On verification failure, returns verified=false and includes an error string. Some unverified decoded fields may be included for debugging.
curl http://localhost:8090/v1/fluree/whoami \
  -H "Authorization: Bearer eyJ..."

CLI discovery

GET /.well-known/fluree.json

Discovery document used by the CLI when adding a remote (fluree remote add) or when running fluree auth login with no configured auth type.

Standalone fluree-server returns:

  • {"version":1,"api_base_url":"/v1/fluree"} when no auth is enabled
  • {"version":1,"api_base_url":"/v1/fluree","auth":{"type":"token"}} when any server auth mode is enabled (data/events/admin)

OIDC-capable implementations can return auth.type="oidc_device" plus issuer, client_id, and exchange_url. The CLI treats oidc_device as “OIDC interactive login”: it uses device-code when the IdP supports it, otherwise authorization-code + PKCE (localhost callback).

Implementations MAY also return api_base_url to tell the CLI where the Fluree API is mounted (for example, when the API is hosted under /v1/fluree or on a separate data subdomain).

See Auth contract (CLI ↔ Server) for the full schema and behavior.

GET /v1/fluree/info/<ledger…>

Get detailed ledger metadata:

curl "http://localhost:8090/v1/fluree/info/mydb:main"

Minimum fields used by the Fluree CLI:

  • t (required)
  • commitId (required for fluree push when t > 0)

Optional query params:

  • By default, ledger-info returns the full novelty-aware stats view, including real-time datatype details and class ref edges.
  • realtime_property_details=false: switch ledger-info to the lighter fast novelty-aware stats layer that keeps counts current but skips lookup-backed class/ref enrichment.
  • include_property_datatypes=false: omit stats.properties[*].datatypes when you want a smaller payload.
  • include_property_estimates=true: include index-derived ndv-values, ndv-subjects, and selectivity fields under stats.properties[*].

Example:

curl "http://localhost:8090/v1/fluree/info/mydb:main"

Response:

{
  "ledger": "mydb:main",
  "t": 150,
  "commitId": "bafybeig...commitT150",
  "indexId": "bafybeig...indexRootT145",
  "commit": {
    "commit_id": "bafybeig...commitT150",
    "t": 150
  },
  "index": {
    "id": "bafybeig...indexRootT145",
    "t": 145
  },
  "stats": {
    "flakes": 12345,
    "size": 1048576,
    "indexed": 145,
    "properties": {
      "ex:name": {
        "count": 3,
        "last-modified-t": 150
      }
    },
    "classes": {
      "ex:Person": {
        "count": 2,
        "properties": {
          "ex:worksFor": {
            "count": 2,
            "refs": { "ex:Organization": 2 },
            "ref-classes": { "ex:Organization": 2 }
          },
          "ex:name": {}
        },
        "property-list": ["ex:name", "ex:worksFor"]
      }
    }
  }
}

Stats freshness (real-time vs indexed)

  • Real-time (includes novelty):

    • commit and top-level t reflect the latest committed head.
    • stats.flakes and stats.size are derived from the current ledger stats view (indexed + novelty deltas).
    • stats.classes[*].properties / property-list will include properties introduced in novelty, even when the update does not restate @type.
    • stats.properties[*].datatypes is real-time by default.
    • stats.classes[*].properties[*].refs is real-time by default.
  • As-of last index:

    • stats.indexed is the last index (t). If commit.t > indexed, the index is behind the head.
    • NDV-related fields in stats.properties[*] (ndv-values, ndv-subjects) and selectivity derived from them are only as current as the last index refresh, so they are omitted by default and only included when include_property_estimates=true.
    • stats.properties[*].datatypes are omitted only when include_property_datatypes=false is requested.
    • Class property ref-edge counts (stats.classes[*].properties[*].refs) fall back to the lighter indexed/fast path only when realtime_property_details=false is requested.

GET /v1/fluree/exists/<ledger…>

Check if a ledger exists:

curl "http://localhost:8090/v1/fluree/exists/mydb:main"

Response:

{
  "ledger": "mydb:main",
  "exists": true
}

This is a lightweight check that only queries the nameservice without loading the ledger.

Administrative Operations

POST /v1/fluree/create

Create a new ledger:

curl -X POST http://localhost:8090/v1/fluree/create \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb:main"}'

Response (201 Created):

{
  "ledger": "mydb:main",
  "t": 0,
  "tx-id": "fluree:tx:sha256:abc123...",
  "commit": {
    "commit_id": "bafybeig...commitT0"
  }
}

Authentication: When --admin-auth-mode=required, requires Bearer token from a trusted issuer.

See Admin Authentication for details.

POST /v1/fluree/drop

Drop (delete) a ledger:

# Soft drop (retract from nameservice, preserve files)
curl -X POST http://localhost:8090/v1/fluree/drop \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb:main"}'

# Hard drop (delete all files - IRREVERSIBLE)
curl -X POST http://localhost:8090/v1/fluree/drop \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb:main", "hard": true}'

Response:

{
  "ledger": "mydb:main",
  "status": "dropped",
  "files_deleted": 23
}
| Status | Description |
|---|---|
| dropped | Successfully dropped |
| already_retracted | Was previously dropped |
| not_found | Ledger doesn’t exist |

Authentication: When --admin-auth-mode=required, requires Bearer token from a trusted issuer.

Drop Modes:

  • Soft (default): Retracts from nameservice, files remain (recoverable)
  • Hard: Deletes all files (irreversible)

See Dropping Ledgers for more details.

API Specification

GET /swagger.json

OpenAPI specification:

curl http://localhost:8090/swagger.json

Returns the OpenAPI 3.0 specification for the server API.

Monitoring Best Practices

1. Use Health Checks

Configure your infrastructure to poll /health:

# Simple monitoring script
while true; do
  curl -sf http://localhost:8090/health > /dev/null || echo "ALERT: Server unhealthy"
  sleep 10
done

2. Track Server Stats

Periodically collect statistics:

curl http://localhost:8090/v1/fluree/stats | jq .

Key metrics to track:

  • uptime_secs: Detect restarts
  • cached_ledgers: Cache efficiency
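
Since uptime_secs only moves backwards when the process restarts, restart detection is a comparison between successive samples (a sketch; how you poll and store samples is up to you):

```python
def detect_restart(prev_uptime_secs, curr_uptime_secs):
    """A restart occurred if uptime went backwards between samples."""
    return curr_uptime_secs < prev_uptime_secs

# e.g. uptime_secs values from four successive /v1/fluree/stats polls
samples = [3600, 3660, 45, 105]
restarts = [detect_restart(a, b) for a, b in zip(samples, samples[1:])]
print(restarts)
```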

3. Monitor Ledger Health

For each critical ledger:

curl "http://localhost:8090/v1/fluree/info/mydb:main" | jq .

Watch for:

  • Index lag (commit.t vs index.t)
  • Unexpected state changes
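
The lag check can be scripted against the info response. Using the example payload shown earlier, lag is the difference between the head t and the index t (the alert threshold below is illustrative, matching the Prometheus rule earlier in this chapter):

```python
# A trimmed /v1/fluree/info response, as in the example earlier
info = {
    "ledger": "mydb:main",
    "t": 150,
    "index": {"id": "bafybeig...indexRootT145", "t": 145},
}

def index_lag(info: dict) -> int:
    """Transactions committed since the last index refresh."""
    return info["t"] - info["index"]["t"]

lag = index_lag(info)
if lag > 100:  # illustrative alert threshold
    print(f"ALERT: index lag is {lag} transactions")
print(lag)
```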

4. Set Up Alerts

Alert conditions:

  • Health check failures
  • Server restarts (low uptime)
  • High index lag

5. Log Analysis

Enable structured logging:

fluree-server --log-level info 2>&1 | jq .

Search for:

  • level: "error" - Errors
  • level: "warn" - Warnings
  • Slow query patterns

Security Considerations

Protect Admin Endpoints

In production, enable admin authentication:

fluree-server \
  --admin-auth-mode required \
  --admin-auth-trusted-issuer did:key:z6Mk...

This protects /v1/fluree/create, /v1/fluree/drop, and other admin-protected API routes from unauthorized access.

Limit Endpoint Exposure

Consider network-level restrictions:

  • Health endpoint: Available to load balancers
  • Stats endpoint: Internal monitoring only
  • Admin endpoints: Restricted access

Audit Logging

Admin operations are logged. Monitor for:

  • Ledger creation
  • Ledger drops
  • Authentication failures

Troubleshooting

This section helps you diagnose and resolve common issues with Fluree deployments.

Troubleshooting Guides

Common Errors

Reference for frequently encountered errors:

  • Ledger not found
  • Invalid IRI errors
  • Transaction failures
  • Query timeouts
  • Permission errors
  • Storage issues
  • Indexing problems

Debugging Queries

Tools and techniques for query debugging:

  • Using EXPLAIN plans
  • Query tracing
  • Performance profiling
  • Identifying slow queries
  • Optimizing query patterns

Quick Diagnostics

Health Check

First step for any issue:

curl http://localhost:8090/health

Check for unhealthy components.

Server Status

Check overall server state:

curl http://localhost:8090/v1/fluree/stats

Look for:

  • High error counts
  • Stuck active queries or transactions
  • High indexing lag
  • Memory issues

Logs

Check server logs:

# Recent errors
tail -f /var/log/fluree/server.log | grep ERROR

# Recent warnings
tail -f /var/log/fluree/server.log | grep WARN

Common Issue Categories

Connection Issues

Symptoms:

  • Cannot connect to server
  • Connection refused
  • Connection timeout

Common Causes:

  • Server not running
  • Wrong port
  • Firewall blocking
  • Network issues

Quick Checks:

# Is server running?
ps aux | grep fluree-db-server

# Is port listening?
netstat -an | grep 8090

# Can you reach it?
curl http://localhost:8090/health

Query Issues

Symptoms:

  • Queries return no results
  • Queries timeout
  • Unexpected results
  • Query errors

Quick Checks:

# Enable explain
curl -X POST http://localhost:8090/v1/fluree/explain \
  -d '{...}'

# Check server stats
curl http://localhost:8090/v1/fluree/stats

See Debugging Queries.

Transaction Issues

Symptoms:

  • Transactions fail
  • Validation errors
  • Policy denials
  • Slow commits

Quick Checks:

# Validate JSON-LD
# Use online validator: json-ld.org/playground

# Check permissions
curl -X POST http://localhost:8090/v1/fluree/update?dryRun=true \
  -d '{...}'

# Check server stats
curl http://localhost:8090/v1/fluree/stats

Performance Issues

Symptoms:

  • Slow queries
  • Slow transactions
  • High latency
  • Timeouts

Quick Checks:

# Check indexing lag
curl http://localhost:8090/v1/fluree/info/mydb:main | jq '.commit_t - .index_t'

# Check resource usage and active operations
curl http://localhost:8090/v1/fluree/stats

Storage Issues

Symptoms:

  • Cannot write data
  • Storage errors
  • Disk full
  • AWS errors

Quick Checks:

# Check disk space
df -h /var/lib/fluree

# Check AWS connectivity
aws s3 ls s3://fluree-prod-data/

# Check server stats
curl http://localhost:8090/v1/fluree/stats

Error Code Reference

See Common Errors for complete error code reference.

Most Common:

  • LEDGER_NOT_FOUND - Ledger doesn’t exist
  • PARSE_ERROR - Invalid JSON-LD or SPARQL
  • INVALID_IRI - Malformed IRI
  • QUERY_TIMEOUT - Query took too long
  • POLICY_DENIED - Not authorized

Diagnostic Tools

Enable Debug Logging

./fluree-db-server --log-level debug

Runtime log-level changes are not currently exposed through the standalone HTTP API; restart with the desired --log-level or RUST_LOG.

Enable Query Tracing

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "X-Fluree-Trace: true" \
  -d '{...}'

Enable Policy Tracing

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "X-Fluree-Policy-Trace: true" \
  -d '{...}'

Get Query Plan

curl -X POST http://localhost:8090/v1/fluree/explain \
  -d '{...}'

Getting Help

Diagnostic Information to Collect

When reporting issues, include:

  1. Server version:

    curl http://localhost:8090/health
    
  2. Configuration:

    ./fluree-db-server --help
    # Include relevant config values
    
  3. Error messages:

    • Complete error response
    • Relevant log entries
  4. Reproduction steps:

    • Minimal example to reproduce
    • Sample data if needed
  5. Environment:

    • OS and version
    • Storage mode
    • Available resources (RAM, disk)

Log Collection

Collect diagnostic logs:

# Last 1000 lines
tail -n 1000 /var/log/fluree/server.log > fluree-diagnostic.log

# Specific time range
grep "2024-01-22T10:" /var/log/fluree/server.log > issue-logs.log

Best Practices

1. Check Logs First

Always check logs before deeper investigation:

tail -f /var/log/fluree/server.log

2. Start with Health Check

curl http://localhost:8090/health

3. Isolate the Issue

Test components independently:

  • Can you connect?
  • Can you query?
  • Can you transact?

4. Use Debug Mode Carefully

Debug logging is verbose:

  • Use temporarily
  • Disable in production
  • May impact performance

5. Test on Development

Reproduce the issue in a development environment before investigating production.

6. Keep Logs

Retain logs for historical analysis:

# Logrotate config
/var/log/fluree/*.log {
    daily
    rotate 30
    compress
}

Common Errors

This document provides solutions for the most frequently encountered Fluree errors.

LEDGER_NOT_FOUND

{
  "error": "NotFound",
  "message": "Ledger not found: mydb:main",
  "code": "LEDGER_NOT_FOUND"
}

Causes

  1. Ledger doesn’t exist
  2. Typo in ledger name
  3. Wrong branch name
  4. Nameservice not initialized

Solutions

Check ledger exists:

curl http://localhost:8090/v1/fluree/ledgers

Create ledger:

curl -X POST "http://localhost:8090/v1/fluree/create" \
  -H "Content-Type: application/json" \
  -d '{"ledger": "mydb:main"}'

Verify spelling:

  • Check for typos in ledger name
  • Verify branch name (default is main)
  • Check case sensitivity
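
A quick way to catch name and branch mistakes client-side is to normalize the reference before use. A sketch, assuming the `name:branch` convention used throughout this guide:

```javascript
// Split "mydb:main" into name and branch, defaulting the branch to "main".
function parseLedgerRef(ref) {
  const idx = ref.indexOf(":");
  if (idx === -1) return { name: ref, branch: "main" };
  return { name: ref.slice(0, idx), branch: ref.slice(idx + 1) };
}
```

Logging the parsed form makes typos and unexpected branch names obvious.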

PARSE_ERROR

{
  "error": "ParseError",
  "message": "Invalid JSON-LD: unexpected token at line 5",
  "code": "PARSE_ERROR",
  "details": {
    "line": 5,
    "column": 12
  }
}

Causes

  1. Invalid JSON syntax
  2. Invalid JSON-LD structure
  3. Invalid SPARQL syntax
  4. Missing required fields

Solutions

Validate JSON:

# Use jq to validate
cat query.json | jq .

Check JSON-LD:

  • Validate @context format
  • Check @id and @type values
  • Verify array vs object usage

Check SPARQL:

  • Validate syntax online
  • Check PREFIX declarations
  • Verify quote matching

Common JSON Mistakes:

// Bad: trailing comma
{
  "select": ["?name"],
  "where": [...],
}

// Good: no trailing comma
{
  "select": ["?name"],
  "where": [...]
}
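
Because JSON.parse rejects trailing commas, a local parse attempt catches this class of mistake before the payload ever reaches the server. A minimal pre-flight check:

```javascript
// Validate a JSON payload string locally before sending it.
function checkJson(text) {
  try {
    JSON.parse(text);
    return { valid: true };
  } catch (err) {
    return { valid: false, reason: err.message };
  }
}
```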

INVALID_IRI

{
  "error": "ValidationError",
  "message": "Invalid IRI: not a valid URI",
  "code": "INVALID_IRI",
  "details": {
    "iri": "not a uri"
  }
}

Causes

  1. Malformed IRI
  2. Missing namespace prefix
  3. Invalid characters
  4. Spaces in IRI

Solutions

Use valid IRIs:

// Good
{"@id": "http://example.org/alice"}
{"@id": "ex:alice"}

// Bad
{"@id": "not a uri"}
{"@id": "alice"}  // Missing namespace
{"@id": "ex:alice smith"}  // Space

Define namespace:

{
  "@context": {
    "ex": "http://example.org/ns/"
  },
  "@graph": [
    {"@id": "ex:alice"}  // Now valid
  ]
}

URL encode spaces:

{"@id": "ex:alice%20smith"}
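
In JavaScript, encodeURIComponent produces the percent-encoded local name shown above:

```javascript
// Percent-encode a local name before embedding it in a compact IRI.
const localName = encodeURIComponent("alice smith"); // "alice%20smith"
const iri = "ex:" + localName;
```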

UNRESOLVED_COMPACT_IRI

Unresolved compact IRI 'ex:Person': prefix 'ex' is not defined in @context.
If this is intended as an absolute IRI, use a full form (e.g. http://...)
or add the prefix to @context.

This error comes from the JSON-LD strict compact-IRI guard: a value shaped like a compact IRI (prefix:suffix) appeared in an IRI position, but the prefix is not defined in @context and is not a recognized absolute scheme.

Causes

  1. Forgotten @context on a query or transaction
  2. Misspelled or missing prefix in @context
  3. Intentionally using a bare prefix:suffix string as an opaque identifier

Solutions

Add the missing prefix to @context (most common fix):

{
  "@context": {"ex": "http://example.org/ns/"},
  "@graph": [{"@id": "ex:alice", "ex:name": "Alice"}]
}

Use a full absolute IRI instead of the compact form:

{
  "@graph": [
    {"@id": "http://example.org/ns/alice", "http://example.org/ns/name": "Alice"}
  ]
}

Opt out of the guard for legacy data where bare prefix:suffix strings are intentional:

{
  "@context": {"ex": "http://example.org/ns/"},
  "opts": {"strictCompactIri": false},
  "@graph": [{"@id": "legacy:alice", "ex:name": "Alice"}]
}

The opt-out applies to both queries and transactions. See IRIs and @context — Strict Compact-IRI Guard for the full policy.
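
The guard's core check can be sketched in a few lines (this is an illustration, not Fluree's actual implementation; the real guard also recognizes absolute IRI schemes such as http):

```javascript
// Does a prefix:suffix-shaped value resolve against the given @context?
function compactIriResolves(value, context) {
  const match = /^([A-Za-z][A-Za-z0-9+.-]*):(.+)$/.exec(value);
  if (!match) return false; // not shaped like a compact IRI
  return Object.prototype.hasOwnProperty.call(context, match[1]);
}
```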

QUERY_TIMEOUT

{
  "error": "Timeout",
  "message": "Query execution exceeded timeout of 30000ms",
  "code": "QUERY_TIMEOUT",
  "details": {
    "timeout_ms": 30000,
    "elapsed_ms": 31245
  }
}

Causes

  1. Complex query
  2. Large result set
  3. High indexing lag
  4. Insufficient resources

Solutions

Add LIMIT:

{
  "select": ["?name"],
  "where": [...],
  "limit": 100  // Add limit
}

Add filters:

{
  "where": [...],
  "filter": "?age > 18"  // Reduce result set
}

Check indexing lag:

curl http://localhost:8090/v1/fluree/info/mydb:main
# If (commit_t - index_t) is large, wait for indexing (or reduce write rate)

Simplify query:

  • Break into smaller queries
  • Remove unnecessary joins
  • Use more specific patterns

Increase timeout:

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "X-Fluree-Timeout: 60000" \
  -d '{...}'

POLICY_DENIED

{
  "error": "Forbidden",
  "message": "Policy denies access to ledger mydb:main",
  "code": "POLICY_DENIED",
  "details": {
    "subject": "did:key:z6Mkh...",
    "action": "query",
    "ledger": "mydb:main"
  }
}

Causes

  1. No permission for operation
  2. Missing authentication
  3. Policy misconfiguration
  4. Wrong DID/identity

Solutions

Check authentication:

# Are you sending credentials?
curl -H "Authorization: Bearer token" ...

Verify policy:

# Query policies
SELECT ?policy ?subject ?action ?allow
WHERE {
  ?policy a f:Policy .
  ?policy f:subject ?subject .
  ?policy f:action ?action .
  ?policy f:allow ?allow .
}

Test with policy trace:

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "X-Fluree-Policy-Trace: true" \
  -d '{...}'

Check DID:

  • Verify DID in signed request
  • Check DID is registered
  • Verify public key

TYPE_ERROR

{
  "error": "TypeError",
  "message": "Expected integer, got string",
  "code": "TYPE_ERROR",
  "details": {
    "expected": "xsd:integer",
    "actual": "xsd:string",
    "value": "not a number"
  }
}

Causes

  1. Wrong datatype
  2. Type mismatch in comparison
  3. Invalid type conversion

Solutions

Use correct types:

// Good
{"ex:age": 30}
{"ex:age": {"@value": "30", "@type": "xsd:integer"}}

// Bad
{"ex:age": "30"}  // String, not integer

Check type constraints:

  • Verify expected types
  • Use explicit @type
  • Validate before submitting
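
Validation can be as simple as coercing to the explicit typed form before submitting. A hypothetical helper for the xsd:integer case shown above:

```javascript
// Coerce a value to the explicit typed form, failing fast on non-integers.
function asXsdInteger(value) {
  const n = Number(value);
  if (!Number.isInteger(n)) {
    throw new TypeError(`expected an integer, got: ${JSON.stringify(value)}`);
  }
  return { "@value": String(n), "@type": "xsd:integer" };
}
```

Failing in the client surfaces a clear stack trace instead of a server-side TYPE_ERROR.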

PAYLOAD_TOO_LARGE

{
  "error": "PayloadTooLarge",
  "message": "Transaction exceeds maximum size of 10485760 bytes",
  "code": "PAYLOAD_TOO_LARGE",
  "details": {
    "max_size": 10485760,
    "actual_size": 15000000
  }
}

Causes

  1. Transaction too large
  2. Query result too large
  3. Large embedded data

Solutions

Batch large transactions:

const batchSize = 1000;
for (let i = 0; i < entities.length; i += batchSize) {
  const batch = entities.slice(i, i + batchSize);
  await transact({"@graph": batch});
}

Use LIMIT for queries:

{
  "select": ["?name"],
  "where": [...],
  "limit": 1000  // Paginate
}

Increase limits (if appropriate):

./fluree-db-server --max-transaction-size 20971520

STORAGE_ERROR

{
  "error": "StorageError",
  "message": "Cannot write to storage",
  "code": "STORAGE_ERROR"
}

Causes

  1. Disk full (file storage)
  2. Permission errors
  3. AWS connectivity (AWS storage)
  4. Storage backend down

Solutions

File Storage:

# Check disk space
df -h /var/lib/fluree

# Check permissions
ls -la /var/lib/fluree
sudo chown -R fluree:fluree /var/lib/fluree

AWS Storage:

# Check AWS credentials
aws sts get-caller-identity

# Check S3 access
aws s3 ls s3://fluree-prod-data/

# Check DynamoDB
aws dynamodb describe-table --table-name fluree-nameservice

HIGH_INDEXING_LAG

Not an error, but a warning condition.

Symptoms

curl http://localhost:8090/v1/fluree/info/mydb:main
{
  "commit_t": 150,
  "index_t": 0
}

Causes

  1. Transaction rate exceeds indexing capacity
  2. Large transactions
  3. Insufficient resources
  4. Storage bottleneck

Solutions

Tune indexing:

fluree-server \
  --indexing-enabled \
  --reindex-min-bytes 100000 \
  --reindex-max-bytes 1000000

Reduce transaction rate:

// Add delay between transactions
await transact(data);
await sleep(100);

Wait for catchup:

async function waitForIndexing() {
  while (true) {
    const status = await getStatus();
    const lag = status.commit_t - status.index_t;
    if (lag < 10) break;
    await sleep(1000);
  }
}

Add resources:

  • More CPU
  • More memory
  • Faster disk

CONCURRENT_MODIFICATION

{
  "error": "Conflict",
  "message": "Concurrent modification detected",
  "code": "CONCURRENT_MODIFICATION"
}

Causes

  1. Multiple processes updating same entity
  2. Nameservice contention
  3. Race condition

Solutions

Implement retry:

async function transactWithRetry(data, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await transact(data);
    } catch (err) {
      if (err.code === 'CONCURRENT_MODIFICATION' && i < maxRetries - 1) {
        await sleep(Math.pow(2, i) * 100);
        continue;
      }
      throw err;
    }
  }
}

Use upsert for retry-friendly transactions:

# Upsert is more retry-friendly for idempotent entity transactions
POST /v1/fluree/upsert?ledger=mydb:main

SIGNATURE_VERIFICATION_FAILED

{
  "error": "SignatureVerificationFailed",
  "message": "Invalid signature",
  "code": "INVALID_SIGNATURE"
}

Causes

  1. Wrong private key
  2. Payload modified after signing
  3. Incorrect algorithm
  4. Key not registered

Solutions

Verify signing process:

// Ensure payload not modified
const payload = JSON.stringify(transaction);
const jws = await sign(payload, privateKey);
// Don't modify payload after signing

Check algorithm:

{
  "alg": "EdDSA",  // Must match key type
  "kid": "did:key:z6Mkh..."
}

Verify public-key material: for signed requests, the standalone server uses the key material embedded in supported JWS/JWT headers (or the configured OIDC JWKS). There is no /admin/keys registration endpoint.

Memory Issues

Symptoms

  • Out of memory errors
  • Server crashes
  • Slow performance
  • Swap usage

Solutions

Check memory:

curl http://localhost:8090/v1/fluree/stats

Reduce memory usage:

# See docs/operations/configuration.md for current memory-related flags.
# In general: reduce write/query load, reduce indexing lag, and provision more RAM.

Add more RAM:

  • Upgrade server
  • Use cloud instance with more memory

Reduce novelty:

  • Index more frequently
  • Reduce transaction size

Troubleshooting Checklist

When encountering issues, check:

  1. Server is running
  2. Can connect to server
  3. Health endpoint returns healthy
  4. Logs show no errors
  5. Ledger exists
  6. Correct ledger name/branch
  7. Valid JSON-LD/SPARQL syntax
  8. Sufficient resources (disk, memory)
  9. No network issues
  10. Authentication working (if required)

Debugging Queries

This guide provides tools and techniques for debugging query performance and correctness issues in Fluree.

Query Explain Plans

Enable Explain

Get query execution plan:

curl -X POST http://localhost:8090/v1/fluree/explain \
  -H "Content-Type: application/json" \
  -d '{
    "from": "mydb:main",
    "select": ["?name", "?age"],
    "where": [
      { "@id": "?person", "schema:name": "?name" },
      { "@id": "?person", "schema:age": "?age" }
    ],
    "filter": "?age > 25"
  }'

Response:

{
  "plan": {
    "type": "join",
    "left": {
      "type": "scan",
      "index": "POST",
      "predicate": "schema:name",
      "estimated_rows": 1000
    },
    "right": {
      "type": "scan",
      "index": "POST",
      "predicate": "schema:age",
      "estimated_rows": 1000
    },
    "join_variable": "?person",
    "filter": {
      "expression": "?age > 25",
      "selectivity": 0.6
    },
    "estimated_result_rows": 600
  },
  "execution": {
    "duration_ms": 45,
    "rows_scanned": 2000,
    "rows_returned": 573,
    "index_hits": 2000,
    "filter_applications": 1000
  }
}

Understanding Explain Output

Scan Operations:

  • Which index used (SPOT, POST, OPST, PSOT)
  • Estimated rows
  • Actual rows scanned

Join Operations:

  • Join type (hash, merge, nested loop)
  • Join variable
  • Join order

Filter Operations:

  • Filter expression
  • Estimated selectivity
  • Rows filtered

Execution Stats:

  • Total duration
  • Rows scanned vs returned
  • Index efficiency
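
One useful derived number is the ratio of rows returned to rows scanned. A sketch over the execution stats in the response above:

```javascript
// Rough scan efficiency: near 1 means selective scans, near 0 means wasted work.
function scanEfficiency(execution) {
  if (!execution.rows_scanned) return 1; // nothing scanned, nothing wasted
  return execution.rows_returned / execution.rows_scanned;
}
```

For the example response (573 returned of 2000 scanned) this is about 0.29, suggesting more selective patterns or filters pushed closer to the scans.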

Query Tracing

Enable Tracing

Get detailed execution trace:

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "X-Fluree-Trace: true" \
  -d '{...}'

Response:

{
  "results": [...],
  "trace": {
    "total_duration_ms": 45,
    "phases": [
      {
        "phase": "parse",
        "duration_ms": 2
      },
      {
        "phase": "plan",
        "duration_ms": 3
      },
      {
        "phase": "execute",
        "duration_ms": 38,
        "steps": [
          {
            "step": "scan_POST_schema:name",
            "duration_ms": 12,
            "rows": 1000
          },
          {
            "step": "scan_POST_schema:age",
            "duration_ms": 15,
            "rows": 1000
          },
          {
            "step": "join",
            "duration_ms": 8,
            "rows": 1000
          },
          {
            "step": "filter",
            "duration_ms": 3,
            "rows_in": 1000,
            "rows_out": 573
          }
        ]
      },
      {
        "phase": "serialize",
        "duration_ms": 2
      }
    ]
  }
}

Trace Analysis

Look for:

  • Slow phases: Which phase takes longest?
  • Excessive scans: Too many rows scanned?
  • Inefficient joins: Large intermediate results?
  • Ineffective filters: Filters applied too late?

Common Query Issues

No Results

Symptom: Query returns empty array

Debugging Steps:

  1. Check data exists:

    SELECT (COUNT(*) as ?count)
    WHERE { ?s ?p ?o }
    
  2. Test each pattern separately:

    // Test pattern 1
    {"select": ["?person"], "where": [{"@id": "?person", "schema:name": "?name"}]}
    
    // Test pattern 2
    {"select": ["?person"], "where": [{"@id": "?person", "schema:age": "?age"}]}
    
  3. Check IRI matching:

    // Query with full IRI
    {"@id": "http://example.org/ns/alice"}
    
    // Or with prefix
    {"@id": "ex:alice"}
    
  4. Verify time specifier:

    # Current data
    "from": "mydb:main"
    
    # Historical might be empty
    "from": "mydb:main@t:1"
    

Unexpected Results

Symptom: Results don’t match expectations

Debugging Steps:

  1. Check each variable:

    {
      "select": ["?person", "?name", "?age"],  // See all bindings
      "where": [...]
    }
    
  2. Verify types:

    SELECT ?person ?name (DATATYPE(?name) as ?nameType)
    WHERE {
      ?person schema:name ?name
    }
    
  3. Check for duplicates:

    SELECT ?person (COUNT(?name) as ?count)
    WHERE {
      ?person schema:name ?name
    }
    GROUP BY ?person
    HAVING (?count > 1)
    
  4. Test without filters:

    // Remove filter temporarily
    {"where": [...] }  // No filter
    

Slow Queries

Symptom: Query takes too long

Debugging Steps:

  1. Check explain plan:

    curl -X POST http://localhost:8090/v1/fluree/explain -d '{...}'
    
  2. Check indexing lag:

    curl http://localhost:8090/v1/fluree/info/mydb:main
    # High indexing lag (commit_t - index_t) can slow queries
    
  3. Add LIMIT:

    {"where": [...], "limit": 100}
    
  4. Check pattern specificity:

    // Specific (fast)
    {"@id": "ex:alice", "schema:name": "?name"}
    
    // General (slow)
    {"@id": "?entity", "?pred": "?value"}
    
  5. Verify index usage:

    • Subject-based patterns use SPOT (fast)
    • Broad patterns may scan many triples (slow)

Query Optimization

Automatic Pattern Reordering

The query planner automatically reorders WHERE-clause patterns for optimal join order. You do not need to manually order patterns from most to least selective — the planner does this for you using a greedy algorithm that considers cardinality estimates and which variables are already bound at each step.

When database statistics are available (after at least one indexing cycle), estimates use HLL-derived property counts and distinct-value counts. Without statistics, the planner falls back to heuristic constants. You can verify the planner’s decisions using explain plans (see Explain Plans).

Both of these queries produce the same execution plan:

{
  "where": [
    {"@id": "?company", "schema:name": "?companyName"},
    {"@id": "?person", "schema:worksFor": "?company"},
    {"@id": "ex:alice", "schema:name": "?name"}
  ]
}
{
  "where": [
    {"@id": "ex:alice", "schema:name": "?name"},
    {"@id": "ex:alice", "schema:worksFor": "?company"},
    {"@id": "?company", "schema:name": "?companyName"}
  ]
}

The planner recognizes that ex:alice patterns are highly selective (bound subject), and that ?company becomes bound after those patterns execute, making the final pattern a cheap per-subject lookup rather than a full scan.

Filter and BIND Placement

Filters and BINDs are placed during the greedy reordering loop, as soon as all their input variables are bound. You do not need to manually position them for efficiency. For BIND patterns, only the expression’s input variables must be bound — the target variable is an output that feeds back into the bound set, enabling cascading placement of dependent patterns.

When a filter or BIND becomes ready immediately after a compound pattern (UNION, Graph, or Service), the planner pushes it into the compound pattern’s inner lists rather than placing it after. For UNION, the filter is cloned into every branch. This means filters execute within each branch, benefiting from optimal placement, range pushdown, and inline evaluation — the same optimizations available to top-level filters.

Additionally, filters whose variables are all bound by a join operator are evaluated inline during the join itself, avoiding the overhead of a separate filter pass. Filters that depend on a BIND’s output variable are fused into the BindOperator and evaluated inline after computing each row’s BIND value, similarly eliminating a separate filter pass. Range-safe filters (comparisons like >, < on indexed properties) are pushed down to the index scan.

Use LIMIT

Always limit large result sets:

{
  "where": [...],
  "orderBy": ["?name"],
  "limit": 100,
  "offset": 0
}

Implement pagination for UI.

Avoid Cartesian Products

Ensure patterns are connected:

Bad (Cartesian product):

{
  "where": [
    {"@id": "?person", "schema:name": "?name"},
    {"@id": "?company", "schema:name": "?companyName"}
    // Not connected! Returns person × company combinations
  ]
}

Good (connected):

{
  "where": [
    {"@id": "?person", "schema:name": "?name"},
    {"@id": "?person", "schema:worksFor": "?company"},
    {"@id": "?company", "schema:name": "?companyName"}
  ]
}

Policy Debugging

Enable Policy Trace

See which policies apply:

curl -X POST http://localhost:8090/v1/fluree/query \
  -H "X-Fluree-Policy-Trace: true" \
  -d '{...}'

Response:

{
  "results": [...],
  "policy_trace": [
    {
      "policy": "ex:department-policy",
      "matched": true,
      "condition_met": true,
      "decision": "allow",
      "patterns_added": [
        {"@id": "?person", "ex:department": "engineering"}
      ]
    },
    {
      "policy": "ex:role-policy",
      "matched": false,
      "reason": "subject_mismatch"
    }
  ],
  "final_decision": "allow"
}

Policy Impact on Query

Compare query with and without policies:

// With policies (authenticated)
const authResult = await queryWithAuth(query);

// Without policies (admin override)
const fullResult = await queryAsAdmin(query);

console.log(`Auth sees ${authResult.length} rows`);
console.log(`Admin sees ${fullResult.length} rows`);
console.log(`Policy filtered ${fullResult.length - authResult.length} rows`);

Testing Queries

Isolate Components

Test query components separately:

// Test each WHERE pattern
for (const pattern of wherePatterns) {
  const result = await query({
    select: ["?s"],
    where: [pattern]
  });
  console.log(`Pattern ${JSON.stringify(pattern)}: ${result.length} results`);
}

Use Smaller Datasets

Test on small dataset first:

# Create test ledger
curl -X POST "http://localhost:8090/v1/fluree/insert?ledger=test:main" \
  -d '{"@graph": [small test data]}'

# Test query
curl -X POST http://localhost:8090/v1/fluree/query \
  -d '{"from": "test:main", ...}'

Compare with Expected Results

const expected = [
  { name: "Alice", age: 30 },
  { name: "Bob", age: 25 }
];

const actual = await query({...});

assert.deepEqual(actual, expected);

Diagnostic Queries

Count Total Triples

# Total triples in the ledger
SELECT (COUNT(*) as ?count)
WHERE { ?s ?p ?o }

Find Large Entities

SELECT ?entity (COUNT(*) as ?tripleCount)
WHERE {
  ?entity ?p ?o
}
GROUP BY ?entity
ORDER BY DESC(?tripleCount)
LIMIT 10

Find Common Predicates

SELECT ?predicate (COUNT(*) as ?count)
WHERE {
  ?s ?predicate ?o
}
GROUP BY ?predicate
ORDER BY DESC(?count)

Check Data Types

SELECT ?type (COUNT(*) as ?count)
WHERE {
  ?entity a ?type
}
GROUP BY ?type
ORDER BY DESC(?count)

Performance Profiling

Measure Query Time

const start = Date.now();
const result = await query({...});
const duration = Date.now() - start;

console.log(`Query returned ${result.length} rows in ${duration}ms`);

Identify Bottlenecks

Use trace to find slow operations:

const response = await queryWithTrace({...});
const trace = response.trace;

const slowSteps = trace.phases
  .flatMap(p => p.steps || [])
  .filter(s => s.duration_ms > 100)
  .sort((a, b) => b.duration_ms - a.duration_ms);

console.log('Slow steps:', slowSteps);

Compare Approaches

Test different query formulations:

// Approach 1
const start1 = Date.now();
const result1 = await query(approach1);
const time1 = Date.now() - start1;

// Approach 2
const start2 = Date.now();
const result2 = await query(approach2);
const time2 = Date.now() - start2;

console.log(`Approach 1: ${time1}ms, Approach 2: ${time2}ms`);

Best Practices

1. Use Explain Early

Run explain on new queries:

curl -X POST http://localhost:8090/v1/fluree/explain -d '{...}'

2. Test with Representative Data

Test queries with production-like data volume:

// Load realistic test data
await loadTestData(10000);  // Similar to production size

// Test query performance
const result = await query({...});

3. Monitor Query Patterns

Track slow queries:

if (duration > 1000) {
  logger.warn(`Slow query: ${duration}ms`, {
    query: queryText,
    resultCount: result.length
  });
}

4. Profile Before Optimizing

Measure before optimizing:

console.time('query');
const result = await query({...});
console.timeEnd('query');

5. Use Query Logs

Enable query logging:

[logging]
level = "debug"
log_queries = true

Common Query Antipatterns

Antipattern 1: Overly Broad Patterns

Bad:

{"@id": "?entity", "?predicate": "?value"}

Good:

{"@id": "?person", "@type": "schema:Person"},
{"@id": "?person", "schema:name": "?name"}

Antipattern 2: Disconnected Patterns (Cartesian Products)

Ensure all patterns share at least one variable with the rest of the query. Disconnected patterns produce a Cartesian product:

Bad:

{
  "where": [
    {"@id": "?person", "schema:name": "?name"},
    {"@id": "?dept", "schema:budget": "?budget"}
  ]
}

Good:

{
  "where": [
    {"@id": "?person", "schema:name": "?name"},
    {"@id": "?person", "schema:department": "?dept"},
    {"@id": "?dept", "schema:budget": "?budget"}
  ]
}

Note: filter placement is handled automatically by the planner. Filters are applied as soon as all their referenced variables are bound, regardless of where they appear in the query.

Antipattern 3: Missing LIMIT

Bad:

{
  "select": ["?name"],
  "where": [...]  // Could return millions
}

Good:

{
  "select": ["?name"],
  "where": [...],
  "limit": 1000  // Always limit
}

Antipattern 4: Redundant Patterns

Bad:

{
  "where": [
    {"@id": "ex:alice", "schema:name": "?name"},
    {"@id": "ex:alice", "schema:name": "Alice"}  // Redundant
  ]
}

Good:

{
  "where": [
    {"@id": "ex:alice", "schema:name": "Alice"}
  ]
}

Tools

Query Validation

Validate before sending:

function validateQuery(query) {
  if (!query.select) {
    throw new Error('Missing select clause');
  }
  if (!query.where || query.where.length === 0) {
    throw new Error('Missing where clause');
  }
  if (!query.limit && estimateResultSize(query) > 1000) {
    console.warn('Query missing LIMIT clause');
  }
}

Query Builder

Use query builder for complex queries:

const query = new QueryBuilder()
  .from('mydb:main')
  .select('?name', '?age')
  .where('?person', 'schema:name', '?name')
  .where('?person', 'schema:age', '?age')
  .filter('?age > 25')
  .limit(100)
  .build();

Query Templates

Create reusable templates:

function findPersonByName(name) {
  return {
    from: 'mydb:main',
    select: ['?person', '?email'],
    where: [
      { '@id': '?person', 'schema:name': name },
      { '@id': '?person', 'schema:email': '?email' }
    ]
  };
}

Performance Investigation with Distributed Tracing

Fluree includes deep instrumentation that decomposes every query, transaction, and indexing operation into a span waterfall visible in Jaeger, Grafana Tempo, AWS X-Ray, or any OpenTelemetry-compatible backend. This guide explains how to use that instrumentation to find and fix performance bottlenecks.

When to Use Deep Tracing

| Symptom | Start with | Escalate to |
| --- | --- | --- |
| Single slow query | /v1/fluree/explain plan | Deep tracing at debug level |
| Slow queries in general, unclear which phase | Deep tracing at debug level | trace level for operator detail |
| Slow transactions / commits | Deep tracing at debug level | Check txn_commit sub-spans |
| Indexing taking too long | Deep tracing at debug level | Check build_index per-order timing |
| Intermittent latency spikes | Sustained tracing + Jaeger search by duration | Correlate with indexing traces |
| Production regression | Compare Jaeger traces before/after deploy | Filter by tracker_time span attribute |

Deep tracing is complementary to explain plans, not a replacement. Explain plans show the shape of a query plan; tracing shows where wall-clock time actually went.

Quick Start: Local Investigation

The otel/ directory at the repository root provides a self-contained Makefile-driven harness for local trace investigation.

Prerequisites

  • Docker (for Jaeger)
  • Rust toolchain
  • curl, bash

One-liner setup

cd otel/
make all    # starts Jaeger, builds with --features otel, starts server, runs smoke tests
make ui     # opens Jaeger UI in browser

This gives you a running Fluree server exporting traces to a local Jaeger instance with pre-loaded test data.

Investigate a specific query

Once the server is running (via make server or make all):

# Run your problematic query against the server
curl -s -X POST http://localhost:8090/v1/fluree/query/otel-test:main \
  -H 'Content-Type: application/json' \
  -d '{
    "select": ["?name", "?price"],
    "where": [
      {"@id": "?p", "@type": "ex:Product"},
      {"@id": "?p", "ex:name": "?name"},
      {"@id": "?p", "ex:price": "?price"}
    ],
    "orderBy": [{"desc": "?price"}],
    "limit": 100
  }'

Then open Jaeger (make ui or http://localhost:16686), select service fluree-server, and find the trace. The waterfall shows exactly where time was spent.

Teardown

make clean-all   # stops server, stops Jaeger, removes data

Writing Custom Scenario Scripts

The otel/scripts/ directory contains scenario scripts you can use as templates. To investigate a specific performance issue:

1. Create a scenario script

#!/usr/bin/env bash
# otel/scripts/my-investigation.sh
set -euo pipefail

PORT="${PORT:-8090}"
LEDGER="${LEDGER:-otel-test:main}"
BASE="http://localhost:${PORT}"

echo "=== My investigation scenario ==="

# Step 1: Insert data that triggers the problem
curl -sf -X POST "${BASE}/${LEDGER}/insert" \
  -H 'Content-Type: application/json' \
  -d '{
    "@context": {"ex": "http://example.org/ns/"},
    "@graph": [
      ... your test data ...
    ]
  }' > /dev/null

sleep 0.5  # let OTEL batch exporter flush

# Step 2: Run the problematic query multiple times
for i in $(seq 1 5); do
  echo "  Query iteration $i..."
  curl -sf -X POST "${BASE}/${LEDGER}/query" \
    -H 'Content-Type: application/json' \
    -d '{ ... your query ... }' > /dev/null
  sleep 0.3
done

echo "=== Done. Check Jaeger for traces. ==="

2. Run it

cd otel/
make up build server          # ensure infrastructure is running
bash scripts/my-investigation.sh
make ui                        # inspect traces

3. Add a Makefile target (optional)

# In otel/Makefile
my-investigation: _data/storage
	bash scripts/my-investigation.sh

Tips for effective scenario scripts

  • Pause between requests (sleep 0.3-0.5) to let the OTEL batch exporter flush. Without this, spans from adjacent requests may interleave in Jaeger, making waterfall analysis harder.
  • Run the query multiple times to see variance. Sort by duration in Jaeger to find the worst case.
  • Use different RUST_LOG levels for different investigations. Override when starting the server: make server RUST_LOG=info,fluree_db_query=trace
  • Isolate variables: test with and without indexing (INDEXING=false), with different data volumes, or with different query patterns.

Reading Jaeger Waterfalls

Anatomy of a query trace

request (info)                              ─────────────────────────── 834ms
  query_execute (debug)                     ─────────────────────────── 832ms
    query_prepare (debug)                   ──── 12ms
      reasoning_prep (debug)                ── 3ms
      pattern_rewrite (debug)               ── 2ms
      plan (debug)                          ── 5ms
    query_run (debug)                       ──────────────────────── 818ms
      scan (debug)                          ── 4ms
      join (debug)                          ─────────────────── 780ms
        join_flush_scan_spot (debug)        ────────────────── 775ms
      filter (debug)                        ── 2ms
      sort (debug)                          ── 15ms
        sort_blocking (debug)               ── 14ms
      project (debug)                       ── 1ms
    format (debug)                          ── 2ms

In this example, the bottleneck is immediately visible: the join_flush_scan_spot span accounts for 775ms of the 834ms total. This tells you the query is doing a large range scan during the join phase.

Key span attributes to check

| Span | Attribute | What it tells you |
|---|---|---|
| query_execute | tracker_time, tracker_fuel | Total tracked time and fuel consumption |
| pattern_rewrite | patterns_before, patterns_after | Whether pattern rewriting is effective |
| plan | pattern_count | Complexity of the query plan |
| scan | (trace level) | How long individual scans take |
| join_flush_scan_spot | unique_subjects, total_leaves | Join scan size — large values indicate broad scans |
| sort_blocking | input_rows, sort_ms | Sort cost — are you sorting a huge result set? |
| txn_stage | insert_count, delete_count | Transaction size |
| txn_commit | flake_count, delta_bytes | Commit I/O volume |

Note: Span attributes like tracker_time, tracker_fuel, patterns_before, assertion_count, template_count, and pattern_count are verified by acceptance tests (it_tracing_spans.rs). Other attributes in this table (unique_subjects, total_leaves, sort_ms, flake_count, delta_bytes) are documented from code inspection but not programmatically verified — they may drift if span instrumentation is refactored.

Anatomy of a transaction trace

request (info)                              ─────────────── 245ms
  transact_execute (debug)                  ─────────────── 243ms
    txn_stage (debug)                       ────── 45ms
      where_exec (debug)                    ── 8ms
      delete_gen (debug)                    ── 3ms
      insert_gen (debug)                    ── 12ms
      cancellation (debug)                  ── 5ms
      policy_enforce (debug)                ── 2ms
    txn_commit (debug)                      ──────────── 195ms
      commit_nameservice_lookup (debug)     ── 2ms
      commit_verify_sequencing (debug)      ── 1ms
      commit_write_raw_txn (debug)          ── 5ms  (await of spawned upload)
      commit_build_record (debug)           ── 3ms
      commit_write_commit_blob (debug)      ────── 65ms
      commit_publish_nameservice (debug)    ────── 35ms

When store_raw_txn is opted in, the raw-transaction bytes are uploaded on a Tokio task spawned at the top of the pipeline (see PendingRawTxnUpload). commit_write_raw_txn then measures only the await of that task — usually a few ms, even on S3, because the upload overlapped with staging CPU work. If commit_write_raw_txn approaches the upload’s intrinsic latency (50-100ms on S3), staging finished before the upload did and the remainder lands on the critical path; otherwise the overlap has fully absorbed it. On S3, the bottleneck is now typically commit_write_commit_blob alone.
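The effect of that overlap can be sketched with a small asyncio model (an illustration only, not Fluree's Tokio implementation; the constants are invented):

```python
import asyncio
import time

UPLOAD_S = 0.10   # simulated S3 PutObject latency for the raw txn
STAGING_S = 0.06  # simulated CPU time spent staging the transaction

async def upload_raw_txn() -> None:
    # Stand-in for the spawned raw-txn upload (the PendingRawTxnUpload role).
    await asyncio.sleep(UPLOAD_S)

async def commit_pipeline() -> float:
    # Spawn the upload at the top of the pipeline...
    pending = asyncio.create_task(upload_raw_txn())
    # ...do staging work while the upload is in flight...
    await asyncio.sleep(STAGING_S)
    # ...then measure only the residual await (what commit_write_raw_txn times).
    start = time.monotonic()
    await pending
    return time.monotonic() - start

residual = asyncio.run(commit_pipeline())
# The residual await is roughly UPLOAD_S - STAGING_S, not the full upload cost.
print(f"residual await: {residual*1000:.0f}ms")
```

With these numbers the awaited remainder is around 40ms rather than the full 100ms; when staging takes longer than the upload, the remainder approaches zero.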

Anatomy of an indexing trace

Indexing runs as a separate trace (not nested under an HTTP request). If the index build was queued by an HTTP transaction request, use logs to bridge the two views: the background worker copies the triggering request_id and trace_id onto its log lines, but the OTEL/Jaeger indexing trace remains separate by design.

Search Jaeger for the operation name index_build:

index_build (debug)                         ─────────────────── 12.5s
  commit_chain_walk (debug)                 ── 50ms
  commit_resolve (debug, commits=2)         ── 27ms
  build_all_indexes (debug)                 ─────────────────── 12.4s
    build_index (debug, order=SPOT)         ──────── 3.1s
    build_index (debug, order=PSOT)         ──────── 3.2s
    build_index (debug, order=POST)         ──────── 3.0s
    build_index (debug, order=OPST)         ──────── 3.1s

Common Bottleneck Patterns

1. Join scan too broad

Symptom: join_flush_scan_spot has unique_subjects in the thousands and dominates the waterfall.

Cause: A join pattern that matches too many subjects, forcing a large range scan.

Fix: Add more selective patterns or filters to narrow the join. Check the explain plan for join order.

2. Sort on large result set

Symptom: sort_blocking shows input_rows > 10,000 and sort_ms dominates.

Cause: Sorting happens after all joins/filters, on the full result set.

Fix: Add LIMIT if possible, or ensure filters run before the sort by placing restrictive patterns first.

3. Commit I/O on S3

Symptom: commit_write_commit_blob takes 50-200ms. commit_write_raw_txn may also show time if staging completed before the parallel upload finished.

Cause: S3 PutObject latency (~50-100ms per call). The raw-txn upload is parallelized with staging, so its cost is usually absorbed, but the commit-blob write is serial on the critical path.

Fix: S3 latency is inherent. Batch multiple small transactions into fewer larger ones. Consider file storage for latency-sensitive workloads. If commit_write_raw_txn is non-trivial, it indicates staging finished faster than the raw-txn upload — the overlap helped but couldn’t fully hide it.

4. Indexing backlog

Symptom: Multiple index_build traces in quick succession, each taking 10+ seconds.

Cause: Transaction volume exceeds indexing throughput, building up novelty.

Fix: Increase the novelty reindex threshold, or reduce transaction frequency. Check build_index sub-spans to see which index order is slowest.

5. Policy evaluation overhead

Symptom: policy_eval or policy_enforce takes a significant fraction of query/transaction time.

Cause: Complex policy rules that require additional queries to evaluate.

Fix: Simplify policy rules, or pre-compute policy decisions where possible.

Controlling Trace Verbosity

RUST_LOG patterns

| Goal | Pattern | Visible spans |
|---|---|---|
| Production default | info | HTTP request spans only (zero operation spans) |
| Query investigation | info,fluree_db_query=debug | + query_execute, query_prepare, query_run, operators |
| Transaction investigation | info,fluree_db_transact=debug | + txn_stage, txn_commit, commit sub-spans |
| Full debug | info,fluree_db_query=debug,fluree_db_transact=debug,fluree_db_indexer=debug | All debug spans |
| Operator-level detail | info,fluree_db_query=trace | + per-leaf: binary_cursor_next_leaf, etc. |
| Everything | debug | Console firehose (OTEL layer still filters to fluree_* only) |

Note: When OTEL is enabled, all fluree_* debug spans flow to the OTEL collector regardless of RUST_LOG. The table above describes console output only.

With the otel/ harness

# Override RUST_LOG when starting the server
make server RUST_LOG='info,fluree_db_query=trace'

# Then run your scenario
make query

In production

Set RUST_LOG via your container orchestrator’s environment variables. Start at info and increase selectively:

# ECS task definition (environment section)
RUST_LOG=info,fluree_db_query=debug,fluree_db_transact=debug

Production Tracing: AWS Deployments

Architecture: fluree-db-server on ECS/Fargate

┌─────────────┐     OTLP/gRPC (4317)     ┌───────────────────┐
│  ECS Task   │ ─────────────────────────▶│  OTEL Collector   │
│  fluree-srv │                           │  (sidecar or      │
│  --features │                           │   Daemon/Service) │
│    otel     │                           └────────┬──────────┘
└─────────────┘                                    │
                                         ┌─────────▼──────────┐
                                         │  Grafana Tempo /    │
                                         │  AWS X-Ray /        │
                                         │  Jaeger             │
                                         └─────────────────────┘

ECS task definition snippet:

{
  "containerDefinitions": [
    {
      "name": "fluree-server",
      "image": "your-ecr-repo/fluree-db-server:latest",
      "environment": [
        {"name": "RUST_LOG", "value": "info,fluree_db_query=debug,fluree_db_transact=debug"},
        {"name": "OTEL_SERVICE_NAME", "value": "fluree-server"},
        {"name": "OTEL_EXPORTER_OTLP_ENDPOINT", "value": "http://localhost:4317"},
        {"name": "OTEL_EXPORTER_OTLP_PROTOCOL", "value": "grpc"}
      ]
    },
    {
      "name": "otel-collector",
      "image": "amazon/aws-otel-collector:latest",
      "essential": true,
      "command": ["--config=/etc/otel-config.yaml"]
    }
  ]
}

OTEL Collector config (for X-Ray export):

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  awsxray:
    region: us-east-1
  # Or for Grafana Tempo:
  # otlp:
  #   endpoint: tempo.internal:4317
  #   tls:
  #     insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [awsxray]

Architecture: fluree-db-api as a Rust crate in AWS Lambda

When using fluree-db-api directly (not through fluree-db-server), you initialize OTEL yourself. The key pattern is the same dual-layer subscriber with a Targets filter on the OTEL layer.

use opentelemetry_otlp::SpanExporter;
use opentelemetry_sdk::trace::SdkTracerProvider;
use tracing_subscriber::{filter::Targets, layer::SubscriberExt, Registry};

fn init_tracing() {
    // OTEL exporter — Lambda uses HTTP/protobuf to the collector sidecar
    let exporter = SpanExporter::builder()
        .with_http()
        .with_endpoint("http://localhost:4318")  // collector sidecar
        .build()
        .expect("Failed to create OTLP exporter");

    let resource = opentelemetry_sdk::Resource::builder()
        .with_service_name("my-lambda-fn")
        .build();

    let provider = SdkTracerProvider::builder()
        .with_batch_exporter(exporter)
        .with_resource(resource)
        .build();

    let tracer = provider.tracer("fluree-db");
    let otel_layer = tracing_opentelemetry::layer().with_tracer(tracer);

    // Critical: filter OTEL layer to fluree_* crates only
    let otel_filter = Targets::new()
        .with_target("fluree_db_api", tracing::Level::DEBUG)
        .with_target("fluree_db_query", tracing::Level::DEBUG)
        .with_target("fluree_db_transact", tracing::Level::DEBUG)
        .with_target("fluree_db_indexer", tracing::Level::DEBUG)
        .with_target("fluree_db_core", tracing::Level::DEBUG);

    let subscriber = Registry::default()
        .with(tracing_subscriber::fmt::layer())
        .with(otel_layer.with_filter(otel_filter));

    tracing::subscriber::set_global_default(subscriber).ok();
}

Lambda deployment with ADOT (AWS Distro for OpenTelemetry):

Add the ADOT Lambda layer and set:

AWS_LAMBDA_EXEC_WRAPPER=/opt/otel-handler
OTEL_SERVICE_NAME=my-fluree-lambda
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf

The ADOT layer runs a collector sidecar that receives OTLP spans and exports them to X-Ray, Tempo, or any configured backend.

Grafana Tempo + Grafana UI

For production trace exploration, Grafana Tempo with the Grafana UI provides the best experience:

  1. Search by attributes: Find all queries with tracker_time > 500ms
  2. Service graph: Visualize call patterns between services
  3. Trace-to-logs: Jump from a slow span to the corresponding log lines
  4. Trace-to-metrics: Correlate latency spikes with metric dashboards

Tempo query examples (TraceQL):

# Find slow queries
{ resource.service.name = "fluree-server" && name = "query_execute" && duration > 500ms }

# Find large commits
{ name = "txn_commit" && span.flake_count > 1000 }

# Find indexing operations
{ name = "index_build" }

AWS X-Ray

X-Ray works with OTEL traces exported via the AWS OTEL Collector. Key differences from Jaeger/Tempo:

  • X-Ray automatically creates a service map showing request flow
  • Subsegment annotations map to OTEL span attributes
  • X-Ray sampling rules can be configured server-side (no code changes)
  • Use X-Ray Insights for anomaly detection on latency patterns

Using the otel/ Harness for Regression Testing

The otel/ directory is designed for reproducible trace validation. Use it to verify that tracing instrumentation works correctly after code changes:

cd otel/

# Clean slate
make fresh

# Run the full scenario suite
make smoke

# After all scenarios complete, check Jaeger:
# 1. Transaction traces should show txn_stage > txn_commit with sub-spans
# 2. Query traces should show query_prepare > query_run with operator spans
# 3. Index traces should appear as separate traces (not under a request)

Specific test scenarios

| Scenario | Command | What to verify in Jaeger |
|---|---|---|
| All transaction types | make transact | 5 traces, each with txn_stage + txn_commit |
| All query types | make query | 7 traces with query_prepare + query_run |
| Background indexing | make index | Separate index_build trace (not under a request) |
| Bulk import | make import | Many commit traces, possibly indexing traces |
| Full end-to-end | make smoke | All of the above |
| Multi-cycle stress | make cycle | 3 full cycles, multiple index_build traces |

Analyzing Exported Traces

Jaeger allows exporting traces as JSON files (click the download icon on any trace or search result). These exports are useful for offline analysis, sharing with teammates, and archiving evidence of performance issues.

Exporting from Jaeger

  1. Open Jaeger UI (http://localhost:16686 or your deployment)
  2. Search for traces (by service, operation, duration, tags)
  3. Click the download/export icon to save as JSON

What’s in the export

The JSON file contains data[].spans[] with full span details: operation names, tags (key-value attributes), parent-child references, durations (in microseconds), and timestamps. Files range from ~100KB for a few traces to 50MB+ for large search exports.

Analyzing with Claude Code

The repository includes Claude Code skills for trace analysis:

/trace-inspect /path/to/traces.json    # Drill into a single trace's span tree and timing
/trace-overview /path/to/traces.json   # Aggregate stats and anomaly detection across all traces

These skills analyze the export file using targeted Python scripts (to avoid loading the full JSON into context) and cross-reference the results against the expected span hierarchy to produce a diagnosis with concrete code-level fix recommendations.

Manual analysis with Python

For quick one-off checks without Claude Code, the Jaeger JSON structure is straightforward:

import json

with open("traces.json") as f:
    data = json.load(f)

for trace in data["data"]:
    for span in trace["spans"]:
        tags = {t["key"]: t["value"] for t in span.get("tags", [])}
        print(f"{span['operationName']}  dur={span['duration']/1000:.1f}ms  {tags}")

Key fields: operationName (span name), duration (microseconds), tags (span attributes), references (parent-child links with refType: "CHILD_OF").
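Going one step further, computing each span's self-time (its own duration minus its children's) pinpoints the hot span even when parent spans envelop their children. A self-contained sketch using inline sample data shaped like a Jaeger export (durations in microseconds):

```python
# Sample mirroring Jaeger's data[].spans[] structure; references carry the
# parent's spanID with refType CHILD_OF.
sample = {"data": [{"spans": [
    {"spanID": "a", "operationName": "query_execute", "duration": 832_000,
     "references": []},
    {"spanID": "b", "operationName": "query_run", "duration": 818_000,
     "references": [{"refType": "CHILD_OF", "spanID": "a"}]},
    {"spanID": "c", "operationName": "join_flush_scan_spot", "duration": 775_000,
     "references": [{"refType": "CHILD_OF", "spanID": "b"}]},
]}]}

def self_times(trace):
    # Sum each span's children into its parent, then subtract.
    spans = {s["spanID"]: s for s in trace["spans"]}
    child_total = {sid: 0 for sid in spans}
    for s in trace["spans"]:
        for ref in s.get("references", []):
            if ref["refType"] == "CHILD_OF" and ref["spanID"] in child_total:
                child_total[ref["spanID"]] += s["duration"]
    return {s["operationName"]: s["duration"] - child_total[s["spanID"]]
            for s in trace["spans"]}

for trace in sample["data"]:
    st = self_times(trace)
    worst = max(st, key=st.get)
    print(f"hottest span: {worst}  self-time={st[worst]/1000:.0f}ms")
```

Here query_execute's self-time is only 14ms even though its total duration is 832ms; the join scan carries the real cost, matching the waterfall reading above.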

Reference

Reference materials for Fluree developers and operators.

Reference Guides

Glossary

Definitions of key terms and concepts:

  • RDF terminology
  • Fluree-specific terms
  • Database concepts
  • Query terminology
  • Index terminology

Fluree System Vocabulary

Complete reference for Fluree’s system vocabulary under https://ns.flur.ee/db#:

  • Commit metadata predicates (f:t, f:address, f:time, f:previous, etc.)
  • Search query vocabulary (BM25 and vector search patterns)
  • Nameservice record fields and type taxonomy
  • Policy vocabulary
  • Namespace codes

Standards and Feature Flags

Standards and feature-flag reference:

  • SPARQL 1.1 compliance
  • JSON-LD specifications
  • W3C standards support
  • Feature flags
  • Deprecated features

Graph Identities and Naming

Naming conventions for graphs, ledgers, and identifiers:

  • User-facing terminology (ledger, graph IRI, graph source, graph snapshot)
  • Time pinning syntax (@t:, @iso:, @commit:)
  • Named graphs within a ledger
  • Base resolution for graph references

Crate Map

Overview of Fluree’s Rust crate architecture:

  • Core crates
  • API crates
  • Query engine crates
  • Storage crates
  • Dependency relationships

Quick Reference

Common Namespaces

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix schema: <http://schema.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .

Fluree Namespaces

@prefix f: <https://ns.flur.ee/db#> .

Time Specifiers

ledger:branch@t:123             # Transaction number
ledger:branch@iso:2024-01-22    # ISO timestamp
ledger:branch@commit:bafybeig...  # Commit ContentId
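The specifier syntax is regular enough to split mechanically. A hypothetical helper (not a Fluree API) that separates a ledger reference from its time pin:

```python
# Hypothetical helper: split "ledger:branch@t:123" into (ledger, kind, value).
# The @t:/@iso:/@commit: markers come from the documented time-pinning syntax.
TIME_PREFIXES = ("@t:", "@iso:", "@commit:")

def split_time_specifier(ref: str):
    for prefix in TIME_PREFIXES:
        idx = ref.find(prefix)
        if idx != -1:
            return ref[:idx], prefix[1:-1], ref[idx + len(prefix):]
    return ref, None, None  # no pin: query the latest state

print(split_time_specifier("ledger:branch@t:123"))       # ('ledger:branch', 't', '123')
print(split_time_specifier("mydb:main@iso:2024-01-22"))  # ('mydb:main', 'iso', '2024-01-22')
print(split_time_specifier("mydb:main"))                 # ('mydb:main', None, None)
```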

HTTP Status Codes

| Code | Meaning | Common Cause |
|---|---|---|
| 200 | OK | Success |
| 400 | Bad Request | Invalid syntax |
| 401 | Unauthorized | Missing auth |
| 403 | Forbidden | Policy denied |
| 404 | Not Found | Ledger not found |
| 408 | Request Timeout | Query too slow |
| 413 | Payload Too Large | Request too big |
| 429 | Too Many Requests | Rate limited |
| 500 | Internal Server Error | Server error |
| 503 | Service Unavailable | Overloaded |

Index Types

| Index | Order | Optimized For |
|---|---|---|
| SPOT | Subject-Predicate-Object-Time | Entity properties |
| POST | Predicate-Object-Subject-Time | Property values |
| OPST | Object-Predicate-Subject-Time | Value lookups |
| PSOT | Predicate-Subject-Object-Time | Predicate scans |

Standards Compliance

RDF Standards

  • RDF 1.1: Full compliance
  • Turtle: Full support
  • JSON-LD 1.1: Full compliance
  • N-Triples: Planned

Query Standards

  • SPARQL 1.1 Query: Full compliance
  • SPARQL 1.1 Update: Partial support
  • GeoSPARQL: Planned

Security Standards

  • JWS (RFC 7515): Full support
  • JWT (RFC 7519): Full support
  • Verifiable Credentials: W3C compliant
  • DIDs: did:key, did:web support

Performance Benchmarks

Typical performance characteristics:

Query Performance

| Query Type | Small DB | Medium DB | Large DB |
|---|---|---|---|
| Simple lookup | < 1ms | < 5ms | < 10ms |
| Pattern match | < 10ms | < 50ms | < 100ms |
| Complex join | < 50ms | < 200ms | < 500ms |
| Aggregation | < 100ms | < 500ms | < 2s |

Transaction Performance

| Operation | Typical Time |
|---|---|
| Small insert (< 10 triples) | < 10ms |
| Medium insert (< 100 triples) | < 50ms |
| Large insert (< 1000 triples) | < 200ms |
| Update | < 20ms |
| Upsert | < 30ms |

Indexing Performance

| Workload | Rate |
|---|---|
| Light | 1,000 flakes/sec |
| Medium | 5,000 flakes/sec |
| Heavy | 10,000 flakes/sec |

Glossary

Definitions of key terms and concepts used throughout Fluree documentation.

Core Concepts

Ledger

A versioned graph database instance in Fluree, equivalent to a database in traditional systems. Ledgers are identified by ledger IDs like mydb:main.

Example: customers:main, inventory:prod

Branch

A variant of a ledger, allowing multiple independent versions of the same logical database. Branches are part of the ledger ID after the colon.

Example: In mydb:dev, “dev” is the branch.

Transaction Time (t)

A monotonically increasing integer assigned to each transaction, representing the logical time of the transaction.

Example: t=42 is transaction number 42.

Flake

Fluree’s internal representation of an RDF triple with temporal information. A flake is a tuple: (subject, predicate, object, transaction-time, operation, metadata).

Novelty Layer

The set of transactions that have been committed but not yet indexed. The gap between commit_t and index_t.

Example: If commit_t=150 and index_t=145, the novelty layer contains transactions 146-150.

Nameservice

Fluree’s metadata registry that tracks ledger state, including commit and index locations. Enables discovery and coordination across distributed deployments.

RDF Terminology

IRI (Internationalized Resource Identifier)

A globally unique identifier for resources, predicates, and graphs. The internationalized version of URI supporting Unicode.

Example: http://example.org/alice, http://例え.jp/人物/アリス

Triple

The fundamental unit of RDF data: a subject-predicate-object statement.

Example: ex:alice schema:name "Alice"

Subject

The entity being described in a triple (first position).

Example: In ex:alice schema:name "Alice", ex:alice is the subject.

Predicate

The property or relationship in a triple (second position).

Example: In ex:alice schema:name "Alice", schema:name is the predicate.

Object

The value or target entity in a triple (third position).

Example: In ex:alice schema:name "Alice", "Alice" is the object.

Literal

A data value in a triple (string, number, date, etc.), as opposed to an IRI reference.

Example: "Alice", 30, "2024-01-22"^^xsd:date

Blank Node

An anonymous resource without an explicit IRI.

Example: [ schema:streetAddress "123 Main St" ]

Named Graph

A set of triples identified by an IRI, allowing data partitioning within a ledger.

Example: ex:graph1 containing specific triples.

Dataset

A collection of graphs (one default graph and zero or more named graphs) used for query execution.

Transaction Terms

Assertion

Adding a new triple to the database.

Example: Asserting ex:alice schema:age 30 adds this triple.

Retraction

Removing an existing triple from the current database state.

Example: Retracting ex:alice schema:age 30 removes this triple.

Commit

A persisted transaction with assigned transaction time and cryptographic signature.

Commit ContentId

Content-addressed identifier (CIDv1) for a commit, providing storage-agnostic identity and integrity verification. The SHA-256 digest is embedded in the CID.

Example: bafybeig...commitT42

Replace Mode

Transaction mode where all properties of an entity are replaced, enabling idempotent writes.

Also called: Upsert mode

WHERE/DELETE/INSERT

Update pattern for targeted modifications: match data (WHERE), remove old data (DELETE), add new data (INSERT).

Index Terms

SPOT Index

Subject-Predicate-Object-Time index, optimized for retrieving all properties of a subject.

POST Index

Predicate-Object-Subject-Time index, optimized for finding subjects with specific property values.

OPST Index

Object-Predicate-Subject-Time index, optimized for finding subjects that reference specific objects.

PSOT Index

Predicate-Subject-Object-Time index, optimized for scanning all values of a predicate.

Index Snapshot

A complete, query-optimized snapshot of the database at a specific transaction time.

Background Indexing

Asynchronous process that builds index snapshots from committed transactions.

Query Terms

Variable

A placeholder in a query pattern that matches actual values in the data, prefixed with ?.

Example: ?person, ?name, ?age

Binding

The association of a variable with a specific value during query execution.

Example: ?name binds to "Alice"

Pattern

A triple template with variables that matches actual triples in the database.

Example: { "@id": "?person", "schema:name": "?name" }

Filter

A condition that restricts which variable bindings are included in query results.

Example: "filter": "?age > 25"

CONSTRUCT

A SPARQL query form that generates RDF triples rather than variable bindings.

Graph Crawl

Following relationships recursively to explore connected entities.

Graph Source Terms

Graph Source

An addressable query source that participates in execution and can be named in SPARQL via FROM, FROM NAMED, and GRAPH <…>.

Graph sources include:

  • Ledger graph sources (default graph and named graphs stored in a ledger)
  • Index graph sources (BM25 and vector/HNSW indexes)
  • Mapped graph sources (R2RML and Iceberg-backed graph mappings)

Graph Source (Non-Ledger)

A non-ledger graph source is a queryable data source that appears in graph queries but is backed by specialized storage (BM25 index, vector index, Iceberg table, SQL database).

Example: products-search:main, products-vector:main

BM25

Best Matching 25, a ranking algorithm for full-text search. Scores documents by relevance to query terms.

Vector Embedding

A numerical representation of data (text, images, etc.) as a high-dimensional vector, enabling similarity search.

Example: 384-dimensional vector for text embeddings

HNSW

Hierarchical Navigable Small World, a graph-based algorithm for approximate nearest neighbor search in high-dimensional spaces.

R2RML

RDB to RDF Mapping Language, a W3C standard for mapping relational databases to RDF.

Iceberg

Apache Iceberg, an open table format for huge analytical datasets with ACID guarantees.

Security Terms

Policy

A rule specifying who can perform what operations on which data.

DID (Decentralized Identifier)

A globally unique identifier that doesn’t require a central authority, used for cryptographic identity.

Example: did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK

JWS (JSON Web Signature)

An IETF standard (RFC 7515) for representing digitally signed content as JSON.

Verifiable Credential (VC)

A W3C standard for cryptographically verifiable digital credentials.

Public Key

Cryptographic key used to verify signatures, shared publicly.

Private Key

Cryptographic key used to create signatures, kept secret.

Storage Terms

ContentId

A CIDv1 (multiformats) value that uniquely identifies any immutable artifact in Fluree. Encodes the content kind (multicodec) and a SHA-256 digest. The canonical string form is base32-lower multibase (e.g., bafybeig...).

See ContentId and ContentStore for details.

ContentKind

An enum identifying the type of content a ContentId refers to: Commit, Txn, IndexRoot, IndexBranch, IndexLeaf, DictBlob, or DefaultContext. Encoded as a multicodec tag within the CID.

ContentStore

The content-addressed storage trait providing get(ContentId), put(ContentKind, bytes), and has(ContentId) operations. All immutable artifacts are stored and retrieved via ContentStore.

Commit ID

A ContentId identifying a committed transaction. Derived by hashing the canonical commit bytes with SHA-256.

Example: bafybeig...commitT42

Index ID

A ContentId identifying an index root snapshot. Derived by hashing the index root descriptor bytes with SHA-256.

Example: bafybeig...indexRootT145

Storage Backend

The underlying system storing Fluree data (memory, file system, AWS S3/DynamoDB).

Nameservice Record

Metadata about a ledger stored in the nameservice, including commit and index ContentIds.

Time Travel Terms

Time Specifier

A suffix on a ledger reference indicating which point in time to query.

Examples: @t:100, @iso:2024-01-22, @commit:bafybeig...

Point-in-Time Query

A query executed against database state at a specific transaction time.

History Query

A query that returns changes to entities over a time range, showing assertions and retractions.

Temporal Database

A database that maintains complete history of all changes, enabling queries at any past state.

JSON-LD Terms

@context

JSON-LD mechanism for defining namespace prefixes and term mappings.

Example:

{
  "@context": {
    "ex": "http://example.org/ns/",
    "schema": "http://schema.org/"
  }
}

@id

JSON-LD property for specifying the IRI of a resource.

Example: "@id": "ex:alice"

@type

JSON-LD property for specifying the type(s) of a resource.

Example: "@type": "schema:Person"

@graph

JSON-LD property containing an array of entities.

Example:

{
  "@graph": [
    { "@id": "ex:alice", "schema:name": "Alice" }
  ]
}

@value

JSON-LD property for specifying a literal value explicitly.

Example: {"@value": "30", "@type": "xsd:integer"}

Compact IRI

A shortened IRI using namespace prefix.

Example: ex:alice (compact) vs http://example.org/ns/alice (full)

IRI Expansion

Converting compact IRIs to full IRIs using @context mappings.

Example: ex:alice expands to http://example.org/ns/alice

IRI Compaction

Converting full IRIs to compact form using @context.

Example: http://schema.org/name compacts to schema:name

Query Execution Terms

Fuel

A measure of query/transaction execution cost. One unit of fuel is consumed for each item processed (flakes matched, items expanded during graph crawl, etc.). Used to prevent runaway queries from consuming excessive resources.

Example: "opts": {"max-fuel": 10000} limits query to 10,000 fuel units.

Tracking

Query/transaction execution monitoring that provides visibility into performance metrics. When enabled, returns time (execution duration), fuel (items processed), and policy statistics.

Example: "opts": {"meta": true} enables all tracking metrics.

TrackingTally

The result of tracking, containing time (formatted as “12.34ms”), fuel (total count), and policy stats ({policy-id: {executed, allowed}}).
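A tally with that shape can be summarized mechanically. A sketch assuming the field names described above (the exact wire format may differ):

```python
# Summarize a TrackingTally-shaped result: time string, fuel count, and
# per-policy {executed, allowed} stats. Policy IDs here are invented.
tally = {
    "time": "12.34ms",
    "fuel": 4_200,
    "policy": {
        "ex:employeePolicy": {"executed": 10, "allowed": 8},
        "ex:adminPolicy": {"executed": 2, "allowed": 2},
    },
}

def summarize(t):
    ms = float(t["time"].rstrip("ms"))  # "12.34ms" -> 12.34
    executed = sum(p["executed"] for p in t["policy"].values())
    allowed = sum(p["allowed"] for p in t["policy"].values())
    return {"ms": ms, "fuel": t["fuel"], "denied": executed - allowed}

print(summarize(tally))
```

The executed/allowed gap is a quick signal that policies are actively filtering results, which is worth knowing before blaming query latency on the engine.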

Acronyms

  • ANN: Approximate Nearest Neighbor
  • API: Application Programming Interface
  • CORS: Cross-Origin Resource Sharing
  • CAS: Compare-And-Swap
  • CID: Content Identifier (multiformats)
  • DID: Decentralized Identifier
  • HTTP: Hypertext Transfer Protocol
  • HNSW: Hierarchical Navigable Small World
  • IRI: Internationalized Resource Identifier
  • JSON: JavaScript Object Notation
  • JSON-LD: JSON for Linked Data
  • JWT: JSON Web Token
  • JWS: JSON Web Signature
  • RDF: Resource Description Framework
  • REST: Representational State Transfer
  • SHA: Secure Hash Algorithm
  • SPARQL: SPARQL Protocol and RDF Query Language
  • SSL/TLS: Secure Sockets Layer / Transport Layer Security
  • URI: Uniform Resource Identifier
  • URL: Uniform Resource Locator
  • VC: Verifiable Credential
  • W3C: World Wide Web Consortium
  • XSD: XML Schema Definition

Fluree System Vocabulary Reference

All Fluree system vocabulary lives under a single canonical namespace:

https://ns.flur.ee/db#

Users declare a prefix in their JSON-LD @context to use compact forms:

{ "@context": { "f": "https://ns.flur.ee/db#" } }

Any prefix works (f:, db:, fluree:, etc.) as long as it expands to the canonical IRI. Internally, Fluree always compares on fully expanded IRIs.
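Prefix-independence falls out of expansion. A minimal sketch of @context-based expansion (a simplification of full JSON-LD processing):

```python
# Expand compact IRIs against a JSON-LD-style @context; comparison then
# happens on fully expanded IRIs, so the chosen prefix is irrelevant.
def expand(term: str, context: dict) -> str:
    prefix, sep, local = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + local
    return term  # already absolute, or no matching prefix

ctx_f = {"f": "https://ns.flur.ee/db#"}
ctx_db = {"db": "https://ns.flur.ee/db#"}

assert expand("f:t", ctx_f) == expand("db:t", ctx_db)
print(expand("f:previous", ctx_f))  # https://ns.flur.ee/db#previous
```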

The @vector and @fulltext shorthands are exceptions: they are JSON-LD convenience aliases that resolve to f:embeddingVector and f:fullText respectively without requiring a prefix declaration.

Source of truth: All constants are defined in the fluree-vocab crate. This document is the user-facing reference.


Commit metadata predicates

These predicates appear on commit subjects in the txn-meta graph. Each commit produces 7-10 flakes describing the commit.

| Predicate | Full IRI | Datatype | Description |
|---|---|---|---|
| f:address | https://ns.flur.ee/db#address | xsd:string | Commit ContentId (CID string) |
| f:alias | https://ns.flur.ee/db#alias | xsd:string | Ledger ID (e.g. mydb:main) |
| f:v | https://ns.flur.ee/db#v | xsd:int | Commit format version |
| f:time | https://ns.flur.ee/db#time | xsd:long | Commit timestamp (epoch milliseconds) |
| f:t | https://ns.flur.ee/db#t | xsd:int | Transaction number (watermark) |
| f:size | https://ns.flur.ee/db#size | xsd:long | Cumulative data size in bytes |
| f:flakes | https://ns.flur.ee/db#flakes | xsd:long | Cumulative flake count |
| f:previous | https://ns.flur.ee/db#previous | @id (ref) | Reference to previous commit (optional) |
| f:identity | https://ns.flur.ee/db#identity | xsd:string | Authenticated identity acting on the transaction (system-controlled — verified DID for signed requests, otherwise opts.identity / CommitOpts.identity) |
| f:author | https://ns.flur.ee/db#author | xsd:string | Author claim — user-supplied via f:author in the transaction body (optional); distinct from f:identity |
| f:txn | https://ns.flur.ee/db#txn | xsd:string | Transaction ContentId (CID string, optional) |
| f:message | https://ns.flur.ee/db#message | xsd:string | Commit message — user-supplied via f:message in the transaction body (optional) |
| f:asserts | https://ns.flur.ee/db#asserts | xsd:long | Assertion count in this commit |
| f:retracts | https://ns.flur.ee/db#retracts | xsd:long | Retraction count in this commit |

Querying commit metadata

Commit metadata lives in the #txn-meta named graph within each ledger. To query it:

{
  "@context": { "f": "https://ns.flur.ee/db#" },
  "select": ["?t", "?time", "?author"],
  "where": {
    "@graph": "mydb:main#txn-meta",
    "f:t": "?t",
    "f:time": "?time",
    "f:author": "?author"
  }
}

Commit subject identifiers

Commit subjects use the scheme fluree:commit:<content-id> (e.g. fluree:commit:bafybeig...). This is a subject identifier scheme, not part of the db# predicate vocabulary.


Datalog rules

| Predicate | Full IRI | Description |
|---|---|---|
| f:rule | https://ns.flur.ee/db#rule | Datalog rule definition predicate |

Vector datatype

| Term | IRI | Description |
|---|---|---|
| f:embeddingVector | https://ns.flur.ee/db#embeddingVector | f32-precision embedding vector datatype |
| @vector | (shorthand) | JSON-LD alias that resolves to f:embeddingVector |

Example usage in a transaction:

{
  "@context": { "f": "https://ns.flur.ee/db#", "ex": "http://example.org/" },
  "insert": {
    "@id": "ex:doc1",
    "ex:embedding": { "@value": [0.1, 0.2, 0.3], "@type": "f:embeddingVector" }
  }
}

Or with the @vector shorthand:

{
  "insert": {
    "@id": "ex:doc1",
    "ex:embedding": { "@value": [0.1, 0.2, 0.3], "@type": "@vector" }
  }
}

A property declared with @type: "@vector" (or @type: "f:embeddingVector") in the @context may also use a bare JSON array as its value — equivalent to the explicit @value form above:

{
  "@context": {
    "ex": "http://example.org/",
    "ex:embedding": { "@type": "@vector" }
  },
  "insert": {
    "@id": "ex:doc1",
    "ex:embedding": [0.1, 0.2, 0.3]
  }
}

Validation rules

  • Element type: every element must be a JSON number; non-numeric elements are rejected.
  • Element range: values are quantized to f32 at ingest. Non-finite values (NaN, ±Infinity) and values outside the representable f32 range are rejected.
  • Non-empty: vectors must have at least one element. The empty vector ([]) is reserved as an internal max-bound sentinel and is rejected by both the coercion layer and the write-path guard.
  • Scalar values are rejected: a single number paired with the f:embeddingVector datatype (e.g. {"@value": 0.1, "@type": "@vector"}) is rejected; the value must be an array.

The same rules apply to the SPARQL typed-literal form "[0.1, 0.2, 0.3]"^^f:embeddingVector and to Turtle.
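
In Turtle, for example, the same vector value can be written as a typed literal (ex: is an illustrative namespace):

```turtle
@prefix f: <https://ns.flur.ee/db#> .
@prefix ex: <http://example.org/> .

ex:doc1 ex:embedding "[0.1, 0.2, 0.3]"^^f:embeddingVector .
```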


Fulltext datatype

| Term | IRI | Description |
|---|---|---|
| f:fullText | https://ns.flur.ee/db#fullText | Inline full-text search datatype |
| @fulltext | (shorthand) | JSON-LD alias that resolves to f:fullText |

Example usage in a transaction:

{
  "@context": { "f": "https://ns.flur.ee/db#", "ex": "http://example.org/" },
  "insert": {
    "@id": "ex:article-1",
    "ex:content": { "@value": "Rust is a systems programming language", "@type": "f:fullText" }
  }
}

Or with the @fulltext shorthand:

{
  "insert": {
    "@id": "ex:article-1",
    "ex:content": { "@value": "Rust is a systems programming language", "@type": "@fulltext" }
  }
}

Values annotated with @fulltext are analyzed (tokenized, stemmed) and indexed into per-predicate fulltext arenas during background index builds. Query with the fulltext() function in bind expressions for BM25 relevance scoring.

See Inline Fulltext Search for details.

Fulltext configuration predicates

These predicates live in the ledger’s #config named graph and declare which properties to full-text index (no per-value @fulltext annotation needed). See Configured full-text properties for the end-user guide and Setting groups for the full schema reference.

| Term | IRI | Description |
|---|---|---|
| f:fullTextDefaults | https://ns.flur.ee/db#fullTextDefaults | Setting group on f:LedgerConfig / f:GraphConfig |
| f:FullTextDefaults | https://ns.flur.ee/db#FullTextDefaults | Class (type) of the setting-group node |
| f:defaultLanguage | https://ns.flur.ee/db#defaultLanguage | BCP-47 tag used for untagged plain strings on configured properties |
| f:property | https://ns.flur.ee/db#property | One entry per full-text-indexed property (cardinality 0..n) |
| f:FullTextProperty | https://ns.flur.ee/db#FullTextProperty | Class of each f:property entry |
| f:target | https://ns.flur.ee/db#target | IRI of the property being indexed (on f:FullTextProperty) |
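
As an illustrative sketch of how these predicates fit together (the authoritative shape is in the Setting groups reference; ex:content is a hypothetical property), a config-graph node might look like:

```json
{
  "@context": { "f": "https://ns.flur.ee/db#", "ex": "http://example.org/" },
  "@type": "f:LedgerConfig",
  "f:fullTextDefaults": {
    "@type": "f:FullTextDefaults",
    "f:defaultLanguage": "en",
    "f:property": [
      { "@type": "f:FullTextProperty", "f:target": { "@id": "ex:content" } }
    ]
  }
}
```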

Search query vocabulary

These predicates are used in WHERE clause patterns for BM25 and vector search. Users write compact forms like "f:searchText" (with "f" in their @context) or full IRIs.

BM25 search

| Predicate | Full IRI | Required | Description |
|---|---|---|---|
| f:graphSource | https://ns.flur.ee/db#graphSource | Yes | Graph source ID (name:branch, e.g. "my-search:main") |
| f:searchText | https://ns.flur.ee/db#searchText | Yes | Search query text (string or variable) |
| f:searchResult | https://ns.flur.ee/db#searchResult | Yes | Result binding (variable or nested object) |
| f:searchLimit | https://ns.flur.ee/db#searchLimit | No | Maximum results |
| f:syncBeforeQuery | https://ns.flur.ee/db#syncBeforeQuery | No | Wait for index sync before querying (boolean) |
| f:timeoutMs | https://ns.flur.ee/db#timeoutMs | No | Query timeout in milliseconds |

Vector search

| Predicate | Full IRI | Required | Description |
|---|---|---|---|
| f:graphSource | https://ns.flur.ee/db#graphSource | Yes | Graph source ID (name:branch) |
| f:queryVector | https://ns.flur.ee/db#queryVector | Yes | Query vector (array of numbers or variable) |
| f:searchResult | https://ns.flur.ee/db#searchResult | Yes | Result binding |
| f:distanceMetric | https://ns.flur.ee/db#distanceMetric | No | Distance metric: "cosine", "dot", "euclidean" (default: "cosine") |
| f:searchLimit | https://ns.flur.ee/db#searchLimit | No | Maximum results |
| f:syncBeforeQuery | https://ns.flur.ee/db#syncBeforeQuery | No | Wait for index sync (boolean) |
| f:timeoutMs | https://ns.flur.ee/db#timeoutMs | No | Query timeout in milliseconds |

Nested result objects

Both BM25 and vector search support nested result bindings:

| Predicate | Full IRI | Description |
|---|---|---|
| f:resultId | https://ns.flur.ee/db#resultId | Document/subject ID binding |
| f:resultScore | https://ns.flur.ee/db#resultScore | Search score binding |
| f:resultLedger | https://ns.flur.ee/db#resultLedger | Source ledger ID (multi-ledger disambiguation) |

Example BM25 search with nested result:

{
  "@context": { "f": "https://ns.flur.ee/db#" },
  "select": ["?doc", "?score"],
  "where": {
    "f:graphSource": "my-search:main",
    "f:searchText": "software engineer",
    "f:searchLimit": 10,
    "f:searchResult": {
      "f:resultId": "?doc",
      "f:resultScore": "?score"
    }
  }
}
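
A vector search query follows the same shape, swapping f:searchText for f:queryVector (my-vectors:main is an illustrative graph source ID):

```json
{
  "@context": { "f": "https://ns.flur.ee/db#" },
  "select": ["?doc", "?score"],
  "where": {
    "f:graphSource": "my-vectors:main",
    "f:queryVector": [0.1, 0.2, 0.3],
    "f:distanceMetric": "cosine",
    "f:searchLimit": 5,
    "f:searchResult": {
      "f:resultId": "?doc",
      "f:resultScore": "?score"
    }
  }
}
```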

Nameservice record vocabulary

Ledger record fields

These predicates appear on ledger nameservice records (the metadata Fluree stores about each ledger).

| Predicate | Full IRI | Description |
|---|---|---|
| f:ledger | https://ns.flur.ee/db#ledger | Ledger name/identifier |
| f:branch | https://ns.flur.ee/db#branch | Branch name (e.g. main) |
| f:t | https://ns.flur.ee/db#t | Current transaction watermark |
| f:ledgerCommit | https://ns.flur.ee/db#ledgerCommit | Pointer to latest commit ContentId |
| f:ledgerIndex | https://ns.flur.ee/db#ledgerIndex | Pointer to latest index root |
| f:status | https://ns.flur.ee/db#status | Record status (ready, etc.) |
| f:defaultContextCid | https://ns.flur.ee/db#defaultContextCid | Default JSON-LD context ContentId |

Graph source record fields

| Predicate | Full IRI | Description |
|---|---|---|
| f:name | https://ns.flur.ee/db#name | Graph source base name |
| f:branch | https://ns.flur.ee/db#branch | Branch |
| f:status | https://ns.flur.ee/db#status | Status |
| f:graphSourceConfig | https://ns.flur.ee/db#graphSourceConfig | Configuration JSON |
| f:graphSourceDependencies | https://ns.flur.ee/db#graphSourceDependencies | Dependent ledger IDs |
| f:graphSourceIndex | https://ns.flur.ee/db#graphSourceIndex | Index ContentId reference |
| f:graphSourceIndexT | https://ns.flur.ee/db#graphSourceIndexT | Index watermark (commit t) |
| f:graphSourceIndexAddress | https://ns.flur.ee/db#graphSourceIndexAddress | Index ContentId (string form) |

Record type taxonomy

Nameservice records use @type to classify what kind of graph source a record represents.

Required kind types (exactly one per record):

| Type | Full IRI | Description |
|---|---|---|
| f:LedgerSource | https://ns.flur.ee/db#LedgerSource | Ledger-backed knowledge graph |
| f:IndexSource | https://ns.flur.ee/db#IndexSource | Index-backed graph source (BM25/HNSW/GEO) |
| f:MappedSource | https://ns.flur.ee/db#MappedSource | Mapped database (Iceberg, R2RML) |

Optional subtype @type values (further classify the record):

| Type | Full IRI | Description |
|---|---|---|
| f:Bm25Index | https://ns.flur.ee/db#Bm25Index | BM25 full-text search index |
| f:HnswIndex | https://ns.flur.ee/db#HnswIndex | HNSW vector similarity search index |
| f:GeoIndex | https://ns.flur.ee/db#GeoIndex | Geospatial index |
| f:IcebergMapping | https://ns.flur.ee/db#IcebergMapping | Iceberg-mapped database |
| f:R2rmlMapping | https://ns.flur.ee/db#R2rmlMapping | R2RML relational mapping |

Policy vocabulary

These predicates are used to define access control policies.

| Predicate | Full IRI | Description |
|---|---|---|
| f:policyClass | https://ns.flur.ee/db#policyClass | Marks a class as policy-governed |
| f:allow | https://ns.flur.ee/db#allow | Allow/deny flag on a policy rule |
| f:action | https://ns.flur.ee/db#action | Action this rule governs (view or modify) |
| f:view | https://ns.flur.ee/db#view | View action IRI |
| f:modify | https://ns.flur.ee/db#modify | Modify action IRI |
| f:onProperty | https://ns.flur.ee/db#onProperty | Property-level policy targeting |
| f:onSubject | https://ns.flur.ee/db#onSubject | Subject-level policy targeting |
| f:onClass | https://ns.flur.ee/db#onClass | Class-level policy targeting |
| f:query | https://ns.flur.ee/db#query | Policy query (determines applicability) |
| f:required | https://ns.flur.ee/db#required | Whether the policy is required (boolean) |
| f:exMessage | https://ns.flur.ee/db#exMessage | Error message when policy denies access |
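
As an illustrative sketch only (see Policy model and inputs for the authoritative shape; ex:salary is a hypothetical property), a property-level rule might combine these predicates like so:

```json
{
  "@context": { "f": "https://ns.flur.ee/db#", "ex": "http://example.org/" },
  "@id": "ex:salaryPolicy",
  "f:action": { "@id": "f:view" },
  "f:onProperty": { "@id": "ex:salary" },
  "f:allow": false,
  "f:exMessage": "Salary values are not viewable by this identity"
}
```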

See Policy model and inputs for usage details.


Config graph vocabulary

These predicates define ledger-level configuration stored in the config graph. See Ledger configuration for full documentation.

Core types

| Type | Full IRI | Description |
|---|---|---|
| f:LedgerConfig | https://ns.flur.ee/db#LedgerConfig | Ledger-wide configuration resource |
| f:GraphConfig | https://ns.flur.ee/db#GraphConfig | Per-graph configuration override |
| f:GraphRef | https://ns.flur.ee/db#GraphRef | Reference to a graph source |

Setting group predicates

| Predicate | Full IRI | Description |
|---|---|---|
| f:policyDefaults | https://ns.flur.ee/db#policyDefaults | Policy enforcement defaults |
| f:shaclDefaults | https://ns.flur.ee/db#shaclDefaults | SHACL validation defaults |
| f:reasoningDefaults | https://ns.flur.ee/db#reasoningDefaults | OWL/RDFS reasoning defaults |
| f:datalogDefaults | https://ns.flur.ee/db#datalogDefaults | Datalog rule defaults |
| f:transactDefaults | https://ns.flur.ee/db#transactDefaults | Transaction constraint defaults |

Policy fields

| Predicate | Full IRI | Description |
|---|---|---|
| f:defaultAllow | https://ns.flur.ee/db#defaultAllow | Default allow/deny when no policy matches (boolean) |
| f:policySource | https://ns.flur.ee/db#policySource | Graph containing policy rules (GraphRef) |
| f:policyClass | https://ns.flur.ee/db#policyClass | Default policy classes to apply |

SHACL fields

| Predicate | Full IRI | Description |
|---|---|---|
| f:shaclEnabled | https://ns.flur.ee/db#shaclEnabled | Enable/disable SHACL validation (boolean) |
| f:shapesSource | https://ns.flur.ee/db#shapesSource | Graph containing SHACL shapes (GraphRef) |
| f:validationMode | https://ns.flur.ee/db#validationMode | f:ValidationReject or f:ValidationWarn |

Reasoning fields

| Predicate | Full IRI | Description |
|---|---|---|
| f:reasoningModes | https://ns.flur.ee/db#reasoningModes | Reasoning modes: f:RDFS, f:OWL2QL, f:OWL2RL, f:Datalog |
| f:schemaSource | https://ns.flur.ee/db#schemaSource | Graph containing schema triples (GraphRef) |

Datalog fields

| Predicate | Full IRI | Description |
|---|---|---|
| f:datalogEnabled | https://ns.flur.ee/db#datalogEnabled | Enable/disable datalog rules (boolean) |
| f:rulesSource | https://ns.flur.ee/db#rulesSource | Graph containing f:rule definitions (GraphRef) |
| f:allowQueryTimeRules | https://ns.flur.ee/db#allowQueryTimeRules | Allow ad-hoc query-time rules (boolean) |

Transact / uniqueness fields

| Predicate | Full IRI | Description |
|---|---|---|
| f:uniqueEnabled | https://ns.flur.ee/db#uniqueEnabled | Enable unique constraint enforcement (boolean) |
| f:constraintsSource | https://ns.flur.ee/db#constraintsSource | Graph(s) containing constraint annotations (GraphRef) |
| f:enforceUnique | https://ns.flur.ee/db#enforceUnique | Annotation on property IRIs: enforce value uniqueness (boolean) |
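
Putting several setting groups together, a minimal f:LedgerConfig sketch might look like the following (illustrative values; see Ledger configuration for the authoritative schema):

```json
{
  "@context": { "f": "https://ns.flur.ee/db#" },
  "@type": "f:LedgerConfig",
  "f:policyDefaults": { "f:defaultAllow": true },
  "f:shaclDefaults": {
    "f:shaclEnabled": true,
    "f:validationMode": { "@id": "f:ValidationReject" }
  },
  "f:datalogDefaults": { "f:datalogEnabled": false }
}
```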

Override control

| Term | Full IRI | Description |
|---|---|---|
| f:overrideControl | https://ns.flur.ee/db#overrideControl | Override gating on a setting group |
| f:OverrideNone | https://ns.flur.ee/db#OverrideNone | No overrides permitted |
| f:OverrideAll | https://ns.flur.ee/db#OverrideAll | Any request can override (default) |
| f:IdentityRestricted | https://ns.flur.ee/db#IdentityRestricted | Only verified identities can override |
| f:controlMode | https://ns.flur.ee/db#controlMode | Control mode (for identity-restricted objects) |
| f:allowedIdentities | https://ns.flur.ee/db#allowedIdentities | List of DIDs authorized to override |

Graph targeting

| Predicate | Full IRI | Description |
|---|---|---|
| f:graphOverrides | https://ns.flur.ee/db#graphOverrides | List of f:GraphConfig per-graph overrides |
| f:targetGraph | https://ns.flur.ee/db#targetGraph | Target graph IRI for an f:GraphConfig |
| f:graphSelector | https://ns.flur.ee/db#graphSelector | Graph selector within an f:GraphRef |
| f:defaultGraph | https://ns.flur.ee/db#defaultGraph | Sentinel IRI for the default graph |
| f:txnMetaGraph | https://ns.flur.ee/db#txnMetaGraph | Sentinel IRI for the txn-meta graph |

See Ledger configuration for usage details.


RDF-Star annotation predicates

Fluree supports RDF-Star annotations for transaction metadata. These predicates can appear in annotation triples:

| Predicate | Full IRI | Description |
|---|---|---|
| f:t | https://ns.flur.ee/db#t | Transaction number on an annotated triple |
| f:op | https://ns.flur.ee/db#op | Operation type (assert/retract) |

Namespace codes (internal)

Fluree encodes namespace IRIs as integer codes for compact storage. These are internal implementation details but useful for contributors working on the core.

| Code | Namespace | IRI |
|---|---|---|
| 0 | (empty) | "" |
| 1 | JSON-LD | @ |
| 2 | XSD | http://www.w3.org/2001/XMLSchema# |
| 3 | RDF | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
| 4 | RDFS | http://www.w3.org/2000/01/rdf-schema# |
| 5 | SHACL | http://www.w3.org/ns/shacl# |
| 6 | OWL | http://www.w3.org/2002/07/owl# |
| 7 | Fluree DB | https://ns.flur.ee/db# |
| 8 | DID Key | did:key: |
| 9 | Fluree Commit | fluree:commit: |
| 10 | Blank Node | _: |
| 11 | OGC GeoSPARQL | http://www.opengis.net/ont/geosparql# |
| 100+ | User-defined | (allocated at first use) |

Standard W3C namespaces

Fluree also recognizes these standard W3C namespaces:

| Prefix | IRI | Common predicates |
|---|---|---|
| rdf: | http://www.w3.org/1999/02/22-rdf-syntax-ns# | rdf:type, rdf:first, rdf:rest |
| rdfs: | http://www.w3.org/2000/01/rdf-schema# | rdfs:label, rdfs:subClassOf, rdfs:range |
| xsd: | http://www.w3.org/2001/XMLSchema# | xsd:string, xsd:int, xsd:dateTime |
| owl: | http://www.w3.org/2002/07/owl# | owl:sameAs, owl:inverseOf |
| sh: | http://www.w3.org/ns/shacl# | sh:path, sh:datatype, sh:minCount |

See IRIs, namespaces, and JSON-LD @context for details on prefix declarations and IRI resolution.

JSON-LD Connection Configuration (Rust)

This page documents the JSON-LD connection config supported by the Rust implementation.

This config uses the same @context + @graph model as other Fluree JSON-LD config surfaces.

Using with the Fluree server

The server accepts a connection config file via --connection-config:

fluree server run --connection-config /path/to/connection.jsonld

This replaces --storage-path for S3, DynamoDB, and other non-filesystem backends. The server builds its storage and nameservice from the config file at startup. Server-level settings (--cache-max-mb, --indexing-enabled, etc.) override connection config defaults. See Configuration for full details and examples.

Entry points (Rust API)

All construction flows through FlureeBuilder:

  • FlureeBuilder::from_json_ld(&json)? — parses JSON-LD config into builder settings
    • Then call .build_client().await for a type-erased FlureeClient
    • Or use typed terminal methods (.build(), .build_memory(), .build_s3()) for compile-time type safety

JSON-LD shape

At minimum, your document contains:

  • @context with @base and @vocab
  • @graph with:
    • one Connection node
    • one or more Storage nodes
    • optional Publisher nodes (nameservice backends)
{
  "@context": {
    "@base": "https://ns.flur.ee/config/connection/",
    "@vocab": "https://ns.flur.ee/system#"
  },
  "@graph": [
    { "@id": "storage1", "@type": "Storage", "filePath": "./data" },
    {
      "@id": "connection",
      "@type": "Connection",
      "indexStorage": { "@id": "storage1" }
    }
  ]
}

ConfigurationValue (env var indirection)

Many fields can be provided as direct literals or as a ConfigurationValue object:

{
  "s3Bucket": { "envVar": "FLUREE_S3_BUCKET", "defaultVal": "my-bucket" },
  "cacheMaxMb": { "envVar": "FLUREE_CACHE_MAX_MB", "defaultVal": "1024" }
}

Notes:

  • envVar: reads from environment (non-wasm targets)
  • defaultVal: fallback string value
  • javaProp: accepted for compatibility; Rust treats it like another env var key (best-effort)

Connection node fields

Supported:

  • parallelism (default 4)
  • cacheMaxMb (supports ConfigurationValue)
  • indexStorage (required): reference to a Storage node
  • commitStorage (optional): reference to a Storage node
  • primaryPublisher (optional): reference to a Publisher node
  • addressIdentifiers (read routing): map of identifier → storage reference
  • defaults (partial):
    • defaults.indexing.reindexMinBytes / reindexMaxBytes are applied as the default IndexConfig for writes
    • defaults.indexing.indexingEnabled=false suppresses background index triggers
    • defaults.indexing.maxOldIndexes sets the maximum number of old index versions to retain before GC (default: 5)
    • defaults.indexing.gcMinTimeMins sets the minimum age in minutes before an index can be garbage collected (default: 30)
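
For example, a Connection node that sets these indexing defaults might look like the following (illustrative values):

```json
{
  "@id": "connection",
  "@type": "Connection",
  "indexStorage": { "@id": "storage1" },
  "cacheMaxMb": 1024,
  "defaults": {
    "indexing": {
      "reindexMinBytes": 1000000,
      "reindexMaxBytes": 10000000,
      "indexingEnabled": true,
      "maxOldIndexes": 5,
      "gcMinTimeMins": 30
    }
  }
}
```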

addressIdentifiers (read routing)

The addressIdentifiers field maps identifier strings to storage backends, enabling read routing based on the identifier segment in Fluree addresses.

{
  "@id": "connection",
  "@type": "Connection",
  "indexStorage": {"@id": "indexS3"},
  "commitStorage": {"@id": "commitS3"},
  "addressIdentifiers": {
    "commit-storage": {"@id": "commitS3"},
    "index-storage": {"@id": "indexS3"}
  }
}

Routing behavior:

  • fluree:commit-storage:s3://db/commit/abc.fcv2 → routes to commitS3
  • fluree:index-storage:s3://db/index/xyz.json → routes to indexS3
  • fluree:s3://db/index/xyz.json (no identifier) → routes to default storage
  • fluree:unknown-id:s3://db/file.json (unknown identifier) → fallback to default storage

Notes:

  • Writes always go to the default storage (TieredStorage or indexStorage), regardless of identifier
  • This is a read-only routing mechanism for addresses that already contain identifiers
  • Use addressIdentifier (singular) on storage nodes to write addresses with identifier segments

Not yet supported (parsed/ignored or absent):

  • remoteSystems

Storage node fields

Memory storage

{ "@id": "mem", "@type": "Storage" }

File storage (requires native)

Supported:

  • filePath
  • AES256Key (supports ConfigurationValue)

Notes:

  • Rust expects AES256Key to be base64-encoded and decode to exactly 32 bytes.
  • This encrypts the index/commit blobs written via the storage layer. The file-based nameservice remains plaintext, matching the existing builder behavior.
{
  "@id": "fileStorage",
  "@type": "Storage",
  "filePath": "/var/lib/fluree",
  "AES256Key": { "envVar": "FLUREE_ENCRYPTION_KEY" }
}

S3 storage (requires aws)

Supported fields (parsed and applied by Rust):

  • s3Bucket
  • s3Prefix
  • s3Endpoint (optional; recommended only for LocalStack/MinIO/custom endpoints)
  • s3ReadTimeoutMs, s3WriteTimeoutMs, s3ListTimeoutMs
    • Rust applies a single operation timeout of max(read, write, list)
  • s3MaxRetries, s3RetryBaseDelayMs, s3RetryMaxDelayMs
    • Rust maps s3MaxRetries to AWS SDK max_attempts = max_retries + 1

Standard S3 (AWS)

{
  "@id": "s3",
  "@type": "Storage",
  "s3Bucket": "fluree-prod-data",
  "s3Prefix": "fluree/"
}

LocalStack / MinIO (custom endpoint)

{
  "@id": "s3",
  "@type": "Storage",
  "s3Bucket": "fluree-test",
  "s3Endpoint": "http://localhost:4566",
  "s3Prefix": "fluree/"
}

S3 Express One Zone

Rust relies on the AWS SDK’s native support for directory buckets. Bucket names following the directory-bucket pattern (an availability-zone segment plus the --x-s3 suffix, e.g. my-index--use1-az1--x-s3) are also detected for diagnostics.

{
  "@id": "s3Express",
  "@type": "Storage",
  "s3Bucket": "my-index--use1-az1--x-s3",
  "s3Prefix": "indexes/"
}

Note: omit s3Endpoint for Express directory buckets and let the AWS SDK handle endpoint resolution. FlureeBuilder::s3() is designed for standard and LocalStack endpoints; for Express buckets, use FlureeBuilder::from_json_ld() with a config that omits s3Endpoint.

Guidance:

  • Standard S3 in AWS: omit s3Endpoint (let the SDK pick defaults)
  • Express One Zone: omit s3Endpoint
  • LocalStack/MinIO/custom: set s3Endpoint

addressIdentifier

Rust parses addressIdentifier on storage nodes and uses it to rewrite published commit/index ContentIds so they include the identifier segment, e.g.: fluree:{addressIdentifier}:s3://....

This is mainly useful when you have multiple storage backends and want addresses to carry an explicit storage identifier.
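
For example, a storage node that stamps its published addresses with the commit-storage identifier used in the routing table above might be declared as:

```json
{
  "@id": "commitS3",
  "@type": "Storage",
  "s3Bucket": "commits-bucket",
  "s3Prefix": "fluree/",
  "addressIdentifier": "commit-storage"
}
```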

Split commit vs index storage (tiered S3)

Rust supports the tiered commitStorage + indexStorage format via FlureeBuilder::from_json_ld() / build_client(). Internally, Rust routes:

  • .../commit/... and .../txn/... → commit storage
  • everything else → index storage
{
  "@context": {"@base": "https://ns.flur.ee/config/connection/", "@vocab": "https://ns.flur.ee/system#"},
  "@graph": [
    { "@id": "commitStorage", "@type": "Storage", "s3Bucket": "commits-bucket", "s3Prefix": "fluree-data/" },
    { "@id": "indexStorage",  "@type": "Storage", "s3Bucket": "index--use1-az1--x-s3" },
    { "@id": "publisher", "@type": "Publisher", "dynamodbTable": "fluree-nameservice", "dynamodbRegion": "us-east-1" },
    {
      "@id": "connection",
      "@type": "Connection",
      "commitStorage": {"@id": "commitStorage"},
      "indexStorage":  {"@id": "indexStorage"},
      "primaryPublisher": {"@id": "publisher"}
    }
  ]
}

IPFS storage (requires ipfs)

Supported:

  • ipfsApiUrl (default http://127.0.0.1:5001): Kubo HTTP RPC API base URL
  • ipfsPinOnPut (default true): pin blocks after writing
{
  "@id": "ipfsStorage",
  "@type": "Storage",
  "ipfsApiUrl": "http://127.0.0.1:5001",
  "ipfsPinOnPut": true
}

With env var indirection:

{
  "@id": "ipfsStorage",
  "@type": "Storage",
  "ipfsApiUrl": { "envVar": "FLUREE_IPFS_API_URL", "defaultVal": "http://127.0.0.1:5001" },
  "ipfsPinOnPut": true
}

Notes:

  • Requires a running Kubo node at the specified URL
  • Fluree’s CIDs (SHA-256 + private-use multicodec) are stored directly into IPFS
  • No encryption support (AES256Key is not applicable)
  • See IPFS Storage Guide for Kubo setup and operational details

Publisher (nameservice) node fields

Storage-backed nameservice

Supported:

  • storage (reference to a Storage node)
{
  "@id": "publisher",
  "@type": "Publisher",
  "storage": { "@id": "s3" }
}

DynamoDB nameservice (requires aws)

Supported (and applied):

  • dynamodbTable
  • dynamodbRegion
  • dynamodbEndpoint
  • dynamodbTimeoutMs

Compatibility notes

This Rust JSON-LD model is intended to stay aligned with existing Fluree docs:

  • ../db/docs/S3_STORAGE_GUIDE.md
  • ../db/docs/FILE_STORAGE_GUIDE.md
  • ../db/docs/DYNAMODB_NAMESERVICE_GUIDE.md

Current intentional gaps in Rust:

  • remoteSystems not supported
  • defaults.identity is parsed but not currently applied
  • defaults.indexing.trackClassStats is parsed but not currently applied

Standards and Feature Flags

This document covers Fluree’s compliance with standards and feature flags.

Standards Compliance

RDF 1.1

Status: Fully compliant

Fluree implements the W3C RDF 1.1 specification:

  • RDF triples (subject-predicate-object)
  • IRI identifiers
  • Typed literals
  • Language tags
  • Blank nodes
  • RDF datasets

Specification: https://www.w3.org/TR/rdf11-concepts/

JSON-LD 1.1

Status: Fully compliant

Fluree supports JSON-LD 1.1:

  • @context for namespace mappings
  • @id for resource identification
  • @type for type specification
  • @graph for multiple entities
  • @value and @type for literals
  • @language for language tags
  • Nested objects
  • Arrays

Specification: https://www.w3.org/TR/json-ld11/

SPARQL 1.1 Query

Status: In progress toward full compliance

Supported SPARQL features:

  • SELECT queries
  • CONSTRUCT queries
  • ASK queries
  • DESCRIBE queries
  • FROM and FROM NAMED clauses
  • GRAPH patterns
  • OPTIONAL patterns
  • UNION patterns
  • FILTER expressions
  • BIND expressions
  • Aggregations (COUNT, SUM, AVG, MIN, MAX, SAMPLE, GROUP_CONCAT) with DISTINCT modifier
  • GROUP BY (variables and expressions)
  • ORDER BY
  • LIMIT and OFFSET
  • Subqueries
  • Property paths (partial: +, *, ^, |, /; see SPARQL docs)

Aggregate result types: COUNT and SUM of integers return xsd:integer (per W3C spec), not xsd:long. SUM of mixed types and AVG return xsd:double.
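
For instance, a grouped aggregate over hypothetical ex: data, using only the features listed above, returns an xsd:integer count per the rule just stated:

```sparql
PREFIX ex: <http://example.org/>

SELECT ?dept (COUNT(DISTINCT ?person) AS ?headcount)
WHERE { ?person ex:dept ?dept }
GROUP BY ?dept
ORDER BY DESC(?headcount)
LIMIT 10
```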

W3C Compliance Testing: Fluree runs the official W3C SPARQL test suite via the testsuite-sparql crate. The suite automatically discovers and runs 700+ test cases from W3C manifest files. See the compliance test guide for details.

Specification: https://www.w3.org/TR/sparql11-query/

SPARQL 1.1 Update

Status: Partial support

Supported:

  • INSERT DATA (via JSON-LD transactions)
  • DELETE/INSERT WHERE (via WHERE/DELETE/INSERT)
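
A supported DELETE/INSERT WHERE update, over hypothetical ex: data, looks like:

```sparql
PREFIX ex: <http://example.org/>

DELETE { ?person ex:status "active" }
INSERT { ?person ex:status "inactive" }
WHERE  { ?person ex:status "active" }
```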

Not yet supported:

  • DELETE DATA
  • LOAD
  • CLEAR
  • DROP
  • COPY, MOVE, ADD

Use JSON-LD transactions for update operations that SPARQL Update does not yet cover.

Specification: https://www.w3.org/TR/sparql11-update/

Turtle

Status: Fully supported

Fluree parses Turtle 1.1:

  • @prefix declarations
  • Base IRIs
  • Abbreviated syntax (a, ;, ,)
  • Literals with datatypes and language tags
  • Collections
  • Blank nodes

Specification: https://www.w3.org/TR/turtle/

JSON Web Signature (JWS)

Status: Partial (EdDSA only)

Supported algorithms:

  • EdDSA (Ed25519) - Only supported algorithm

Not yet supported:

  • ES256, ES384, ES512 (ECDSA)
  • RS256 (RSA)
  • HS256, HS384, HS512 (HMAC)

Specification: RFC 7515

Note: Requires the credential feature flag.

Verifiable Credentials

Status: Planned (not yet implemented)

The credential module currently supports JWS verification only. Full VC support (proof verification, JSON-LD canonicalization) is planned but not yet available.

Specification: https://www.w3.org/TR/vc-data-model/

Decentralized Identifiers (DIDs)

Status: Partial support

Supported DID methods:

  • did:key (Ed25519 keys only)

Not yet supported:

  • did:web
  • did:ion
  • did:ethr

Specification: https://www.w3.org/TR/did-core/

Note: Requires the credential feature flag.

Compile-Time Feature Flags (Cargo)

These features are controlled at compile time via Cargo:

fluree-db-api Features

| Feature | Default | Description |
|---|---|---|
| native | Yes | File storage support |
| aws | No | AWS-backed storage support (S3, storage-backed nameservice). Enables FlureeBuilder::s3() and S3-based JSON-LD configs. |
| credential | No | DID/JWS/VerifiableCredential support for signed queries/transactions. Pulls in crypto dependencies (ed25519-dalek, bs58). |
| iceberg | No | Apache Iceberg/R2RML graph source support |
| shacl | No | SHACL constraint validation (requires fluree-db-transact + fluree-db-shacl). Default in server/CLI. |
| vector | No | Embedded vector similarity search (HNSW indexes via usearch) |
| ipfs | No | IPFS-backed storage via Kubo HTTP RPC |
| search-remote-client | No | HTTP client for remote BM25 and vector search services |
| aws-testcontainers | No | Opt-in LocalStack-backed S3/DynamoDB tests (auto-start via testcontainers) |
| full | No | Convenience bundle: native, credential, iceberg, shacl, ipfs |

Example:

[dependencies]
fluree-db-api = { path = "../fluree-db-api", features = ["native", "credential"] }

fluree-db-server Features

| Feature | Default | Description |
|---|---|---|
| native | Yes | File storage support (forwards to fluree-db-api/native) |
| credential | Yes | Signed request verification (forwards to fluree-db-api/credential) |
| shacl | Yes | SHACL constraint validation (forwards to fluree-db-api/shacl) |
| iceberg | Yes | Apache Iceberg/R2RML graph source support (forwards to fluree-db-api/iceberg) |
| aws | No | AWS S3 storage + DynamoDB nameservice (forwards to fluree-db-api/aws) |
| oidc | No | OIDC JWT verification via JWKS (RS256 tokens from external IdPs) |
| swagger-ui | No | Swagger UI endpoint |
| otel | No | OpenTelemetry tracing |

To build the server without credential support (faster compile):

cargo build -p fluree-db-server --no-default-features --features native

Runtime Behavior

Reasoning, SPARQL property paths, and GeoSPARQL functions are always available in any build that links the corresponding crate features (see the build-time feature tables above). They are not gated behind a runtime flag.

Reasoning is opted into per query (via the reasoning parameter or the SPARQL PRAGMA reasoning directive) or per ledger (via f:reasoningDefaults in the ledger configuration graph). See Query-time reasoning and Setting groups.

Parsing Modes

Strict Mode (Default)

Enforces strict compliance with standards:

  • Invalid IRIs rejected
  • Type mismatches rejected
  • Strict JSON-LD parsing
./fluree-db-server --strict-mode true

Lenient Mode

More permissive parsing:

  • Attempts to fix malformed IRIs
  • Coerces types when possible
  • Accepts non-standard syntax
./fluree-db-server --strict-mode false

Use lenient mode only when you fully control inputs and explicitly want permissive parsing behavior.

API Versioning

Current API version: v1

Version Header:

X-Fluree-API-Version: 1

Supported Data Formats

JSON-LD

Supported JSON-LD versions:

  • JSON-LD 1.0: Yes
  • JSON-LD 1.1: Yes

SPARQL

Supported SPARQL versions:

  • SPARQL 1.0: Yes
  • SPARQL 1.1: Yes

RDF Formats

| Format | Read | Write |
|---|---|---|
| JSON-LD | Yes | Yes |
| Turtle | Yes | Yes |
| N-Triples | Planned | Planned |
| N-Quads | Planned | Planned |
| RDF/XML | Planned | No |
| TriG | Planned | Planned |

Protocol Support

HTTP Versions

  • HTTP/1.1: Fully supported
  • HTTP/2: Supported
  • HTTP/3: Planned

TLS Versions

  • TLS 1.2: Supported
  • TLS 1.3: Supported
  • SSL 3.0: Not supported (deprecated)
  • TLS 1.0/1.1: Not supported (deprecated)

Client Support

Fluree works with:

HTTP Clients:

  • curl
  • Postman
  • Insomnia
  • Any HTTP client library

RDF Libraries:

  • Apache Jena (Java)
  • RDF4J (Java)
  • rdflib (Python)
  • N3.js (JavaScript)

SPARQL Clients:

  • Apache Jena ARQ
  • RDF4J SPARQLRepository
  • Any SPARQL 1.1 client

Platform Support

Operating Systems

Server:

  • Linux (x86_64, aarch64)
  • macOS (Intel, Apple Silicon)
  • Windows (x86_64)

Clients:

  • Any OS with HTTP support

Cloud Platforms

  • AWS (native support)
  • Google Cloud Platform (via file storage)
  • Azure (via file storage)
  • Self-hosted / on-premises

Container Support

  • Docker: Full support
  • Kubernetes: Full support
  • Podman: Supported
  • Docker Compose: Full support

Database Support

Import Sources

Fluree can import from:

RDF Databases:

  • Apache Jena TDB
  • Virtuoso
  • Stardog
  • GraphDB
  • Any RDF export

Graph Databases:

  • Neo4j (via RDF export)
  • Amazon Neptune (via RDF export)

Relational Databases:

  • Via R2RML mapping
  • Direct SQL query

Export Formats

Export Fluree data to:

  • Turtle files
  • JSON-LD documents
  • SPARQL CONSTRUCT results
  • Any RDF format

Feature Roadmap

Planned Features

Query:

  • SPARQL property paths: remaining operators (? zero-or-one, ! negated set)
  • GeoSPARQL
  • SPARQL 1.1 Federation
  • Full SPARQL UPDATE

Storage:

  • Additional cloud providers (GCP, Azure)
  • Hybrid storage modes

Security:

  • OAuth 2.0 integration
  • SAML support
  • Additional DID methods

Graph Sources:

  • BigQuery integration
  • Snowflake integration
  • Elasticsearch integration

Feature Discovery

Feature availability is documented in this compatibility matrix and by crate feature flags; the standalone server does not expose a /features HTTP endpoint.

Browser Support

For web applications using Fluree API:

Supported Browsers:

  • Chrome/Edge 90+
  • Firefox 88+
  • Safari 14+

Requirements:

  • Fetch API support
  • CORS support
  • WebSocket support (for future streaming)

Tool Support

RDF Tools

Compatible with standard RDF tools:

  • Protégé (ontology editor)
  • TopBraid Composer
  • RDF validators
  • SPARQL editors

Data Tools

Works with data engineering tools:

  • Apache Airflow (via HTTP operators)
  • dbt (via SQL proxy with R2RML)
  • Apache Spark (via Iceberg)
  • Pandas (via query API)

Version Requirements

Rust Version

Building from source requires:

  • Rust 1.75.0 or later
  • Cargo 1.75.0 or later

Dependencies

Runtime dependencies:

  • None (statically linked binary)

Optional dependencies:

  • AWS SDK (for AWS storage)

Graph Identities and Naming

This document defines the mapping from names to things, plus recommended naming conventions for Fluree. It is split into:

  • User-facing naming: what we say in docs, examples, and APIs.
  • Internal naming: how we name types/components in Rust so the implementation stays clear.

The goal is to make these simultaneously true:

  • Users can think in the familiar model: “database as a value” (immutable, time-travelable).
  • SPARQL semantics remain correct: GRAPH <…> identifies a graph by IRI.
  • Fluree can seamlessly query across:
    • graphs inside a ledger (default + named graphs),
    • ledgers (federation),
    • and non-ledger sources (BM25, vector, Iceberg/R2RML, etc.).

Summary: the model in one paragraph

In Fluree, you query graphs and often load a graph snapshot (an immutable point-in-time view you can query repeatedly). In SPARQL, graph scoping uses GRAPH <iri> { … }, where <iri> is a graph identifier (a graph IRI). Fluree supports multiple kinds of graph sources (ledger graphs and non-ledger sources like BM25/vector indexes and tabular mappings). Users may refer to graphs with short, friendly aliases that Fluree resolves against a configured base into canonical graph IRIs. Not all graph sources support the same time-travel semantics — time pinning and “as-of” behavior is a graph-source capability, not a universal guarantee.

Under the hood, this “graph snapshot” corresponds to the same semantic idea many temporal systems describe as “database as a value”: immutable, time-travelable, and safe to pass around.


Core terms

  • Ledger: A durable data product with a commit chain, identified by a ledger ID like mydb:main.
    • A ledger is what users create/manage.
    • A ledger can contain multiple graphs (default graph + named graphs).
  • Graph Source ID: A canonical name:branch identifier used in APIs/CLI/config to refer to a graph source, e.g. products-search:main.
    • This is an alias-style identifier (not a full IRI).
    • In SPARQL contexts it may appear inside <…> and can be resolved against a configured base into a canonical Graph IRI.
  • Graph: A query scope (SPARQL term).
    • In a query, a “graph” is identified by an IRI and used to scope patterns (GRAPH <iri> { … }).
  • Graph Snapshot: An immutable point-in-time view of a graph that can be queried repeatedly.
  • LedgerSnapshot (database value): The underlying semantic model: an immutable value at a point in time.
    • In product/docs we usually say “graph snapshot” because it aligns with SPARQL and the Rust API.
    • Internally the type is LedgerSnapshot in fluree-db-core.
  • Graph IRI: The canonical identity of a graph. This is what SPARQL uses.
  • Graph reference (GraphRef): What a user types (often an alias-like string), which Fluree resolves to a Graph IRI.
  • Graph Source: Anything addressable by a Graph IRI that can participate in query execution.
  • Federation (preferred) / Dataset (SPARQL term): A query executed over a set of graphs.
    • We prefer “federation” when describing the product feature to non-SPARQL users.

“Graph snapshot” vs “Graph IRI” (how to talk about it)

  • Graph Snapshot (value) answers: “What immutable point-in-time graph am I querying?”
  • Graph IRI (identifier) answers: “Which graph does this part of the query run against?”

In practice, you query a graph snapshot by naming its graph:

  • When you write FROM <…> or GRAPH <…>, you are naming a graph IRI.
  • That graph IRI resolves to a graph snapshot (an immutable value) at execution time.

Time pinning syntax (“the part after @ pins the snapshot”)

Fluree supports time pinning in graph references.

Current syntax (implemented today):

  • <ledger>:<branch>@t:<t> — pin to transaction time
  • <ledger>:<branch>@iso:<rfc3339> — pin to ISO datetime
  • <ledger>:<branch>@commit:<commit-content-id> — pin to commit ContentId (prefix allowed)

Note: you may see an = form in older design notes (@t=100, etc.). That form is not the supported user-facing syntax today; use the @t: / @iso: / @commit: forms in docs and examples.

From a user perspective:

  • The @… portion selects which snapshot value you mean for that ledger graph.

Important nuance:

  • For ledger graph sources, @… selects a pinned point-in-time view.
  • For non-ledger graph sources, @… support is capability-specific:
    • Some sources may support time pinning by selecting an appropriate snapshot/root.
    • Some sources are head-only and should reject time-pinned requests with a clear error.
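The pinned forms above can be sketched as a small parser. This is an illustrative helper, not Fluree's actual parsing code; the `TimePin` enum and `parse_pin` function are hypothetical names:

```rust
// Hypothetical sketch of time-pin parsing for graph references.
// Supports the documented @t: / @iso: / @commit: forms and rejects
// the legacy '=' form.

#[derive(Debug, PartialEq)]
enum TimePin {
    T(u64),         // @t:<t> — transaction time
    Iso(String),    // @iso:<rfc3339> — ISO datetime
    Commit(String), // @commit:<commit-content-id> — prefix allowed
}

fn parse_pin(graph_ref: &str) -> Option<(&str, Option<TimePin>)> {
    // Strip an optional #fragment first so the '@' split only sees
    // the ledger:branch@spec portion.
    let base = match graph_ref.split_once('#') {
        Some((b, _frag)) => b,
        None => graph_ref,
    };
    match base.split_once('@') {
        None => Some((base, None)),
        Some((ledger, spec)) => {
            let pin = if let Some(t) = spec.strip_prefix("t:") {
                TimePin::T(t.parse().ok()?)
            } else if let Some(iso) = spec.strip_prefix("iso:") {
                TimePin::Iso(iso.to_string())
            } else if let Some(c) = spec.strip_prefix("commit:") {
                TimePin::Commit(c.to_string())
            } else {
                return None; // e.g. the legacy '@t=' form is rejected
            };
            Some((ledger, Some(pin)))
        }
    }
}

fn main() {
    assert_eq!(
        parse_pin("acme/people:main@t:1000"),
        Some(("acme/people:main", Some(TimePin::T(1000))))
    );
    assert_eq!(parse_pin("acme/people:main"), Some(("acme/people:main", None)));
    assert_eq!(parse_pin("acme/people:main@t=100"), None); // legacy '=' form
    println!("ok");
}
```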

Named graphs within a ledger

We support multiple named graphs inside a single ledger (shared commit chain, distinct graph identities/indexes).

System graphs

Every ledger reserves system named graphs for internal use:

Graph          IRI pattern                      Purpose
Default graph  (implicit)                       Application data
Txn-meta       urn:fluree:{ledger_id}#txn-meta  Commit metadata
Config graph   urn:fluree:{ledger_id}#config    Ledger configuration

User-defined named graphs (created via TriG) are identified by their IRI and allocated after the system graphs.

The config graph stores ledger-level operational defaults (policy, SHACL, reasoning, uniqueness constraints) as RDF triples. See Ledger configuration for details.

Naming convention

Recommended user-facing convention (alias-friendly, URL-friendly, avoids / as a delimiter inside the ledger namespace):

<ledger>:<branch>[ @time-spec ] [ #<named-graph-alias> ]

Examples:

  • Default graph, latest: <acme/people:main>
  • Default graph at t=1000: <acme/people:main@t:1000>
  • Txn metadata graph, latest: <acme/people:main#txn-meta>
  • Txn metadata graph at t=1000: <acme/people:main@t:1000#txn-meta>

Important note about # fragments

Using #<named-graph-alias> is idiomatic RDF identity, but HTTP clients do not send fragments to servers. That’s fine for graph identity and query semantics, but if you want a dereferenceable HTTP endpoint for a named graph, plan to expose a server-visible selector (e.g., ?graph=txn-meta) in addition to the canonical identity.

Full IRIs are always allowed

Semantic web users may prefer full IRIs:

  • https://data.flur.ee/acme/people:main@t:1000#txn-meta

These should be used as-is (no resolution needed).

Base resolution (“make aliases globally identifiable”)

Many users prefer short names like people:main or acme/people:main. To make them globally identifiable:

  • Allow a configured base (SPARQL BASE <…> or a connection/query base configuration).
  • Treat alias-style graph references as relative IRI references resolved against that base.

Example:

  • Base: https://data.flur.ee/
  • Ref: <acme/people:main@t:1000#txn-meta>
  • Graph IRI: https://data.flur.ee/acme/people:main@t:1000#txn-meta
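The resolution rule above can be sketched as follows. This is a deliberately simplified illustration (real resolution follows RFC 3986/3987 relative-reference semantics); `resolve` is a hypothetical helper, not a Fluree API:

```rust
// Sketch of alias resolution against a configured base (illustrative only).
fn resolve(base: &str, graph_ref: &str) -> String {
    if graph_ref.contains("://") {
        // Already a full IRI: use as-is, no resolution needed.
        graph_ref.to_string()
    } else {
        // Alias-style ref: treat as a relative reference against the base.
        format!("{}/{}", base.trim_end_matches('/'), graph_ref)
    }
}

fn main() {
    assert_eq!(
        resolve("https://data.flur.ee/", "acme/people:main@t:1000#txn-meta"),
        "https://data.flur.ee/acme/people:main@t:1000#txn-meta"
    );
    // Full IRIs pass through unchanged.
    assert_eq!(
        resolve("https://data.flur.ee/", "https://example.org/g"),
        "https://example.org/g"
    );
    println!("ok");
}
```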

Character and encoding rules (user-facing)

To avoid ambiguity and URL pitfalls:

  • Reserved delimiters:
    • @ separates the time specifier
    • # separates a named-graph alias (fragment)
    • : is used inside the ledger ID as ledger:branch
  • Do not use raw @ or # inside ledger names, branch names, or named-graph aliases.
    • If needed, percent-encode them.
  • RFC3339 / ISO timestamps must be URL-safe:
    • Prefer UTC with Z (e.g., 2026-02-03T17:02:11Z).
    • If offsets are used (+05:00), they should be percent-encoded in IRI contexts.
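A minimal sketch of the encoding rule, assuming only the three reserved delimiters need escaping inside a component; `encode_component` is a hypothetical helper, not part of Fluree's API:

```rust
// Sketch: percent-encode reserved delimiters inside name components so raw
// '@', '#', and '+' never collide with the graph-reference syntax.
fn encode_component(s: &str) -> String {
    s.chars()
        .map(|c| match c {
            '@' => "%40".to_string(),
            '#' => "%23".to_string(),
            '+' => "%2B".to_string(),
            other => other.to_string(),
        })
        .collect()
}

fn main() {
    // A timestamp with a numeric offset becomes URL-safe:
    assert_eq!(
        encode_component("2026-02-03T17:02:11+05:00"),
        "2026-02-03T17:02:11%2B05:00"
    );
    // A '#' inside a name would otherwise look like a fragment separator:
    assert_eq!(encode_component("reports#2026"), "reports%232026");
    println!("ok");
}
```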

Graph Sources (user-facing)

We use Graph Source as the umbrella term for anything you can name in FROM, FROM NAMED, or GRAPH <…> and query as part of a single execution.

Graph sources differ in capabilities:

  • Some behave like RDF triple stores (ledger graphs, some mappings).
  • Some provide specialized operators/patterns (BM25 and vector search).
  • Some support time pinning / time travel; others are head-only.

Categories

  • Ledger Graph Sources: RDF graphs stored in a ledger (default graph or named graph).
  • Index Graph Sources: persisted indexes queried through graph-integrated patterns (BM25, Vector/HNSW).
  • Mapped Graph Sources: non-ledger data mapped into an RDF-shaped graph (R2RML, Iceberg).

Conventions and examples

SPARQL: base + pinned graphs

BASE <https://data.flur.ee/>

SELECT ?s ?p ?o
FROM <acme/people:main@t:1000>
WHERE {
  ?s ?p ?o .
}

SPARQL: txn metadata named graph

BASE <https://data.flur.ee/>

SELECT ?commit ?t
FROM NAMED <acme/people:main@t:1000#txn-meta>
WHERE {
  GRAPH <acme/people:main@t:1000#txn-meta> {
    ?commit <https://ns.flur.ee/db#t> ?t .
  }
}

JSON-LD Query: pinned graph reference

{
  "from": "acme/people:main@t:1000",
  "select": ["?name"],
  "where": [{ "@id": "?p", "http://schema.org/name": "?name" }]
}

Internal naming conventions (Rust code)

These conventions govern how we name types and variables in Rust code related to graphs, ledgers, and identifiers.

Canonical identities: prefer explicit types, not raw String

Internally we should distinguish:

  • LedgerId: the durable ledger identifier (e.g., acme/people:main)
  • GraphIri: canonical graph identity used by SPARQL (Arc<str> or validated URL/IRI type)
  • GraphRef: user input token, resolved to a GraphIri using base rules

Even if these are initially just struct LedgerId(Arc<str>) newtypes, they prevent accidental mixing and make APIs self-documenting.
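A minimal sketch of the newtype pattern described above (illustrative; not the exact definitions in fluree-db-core):

```rust
use std::sync::Arc;

// Newtype wrappers in the spirit described above. The derives make them
// usable as cache keys; the distinct types prevent accidental mixing.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct LedgerId(Arc<str>);

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct GraphIri(Arc<str>);

impl LedgerId {
    fn new(s: &str) -> Self {
        LedgerId(Arc::from(s))
    }
    fn as_str(&self) -> &str {
        &self.0
    }
}

// The signature documents exactly which identifier it expects; passing a
// GraphIri here is a compile-time error, not a runtime surprise.
fn cache_key(id: &LedgerId) -> String {
    format!("ledger/{}", id.as_str())
}

fn main() {
    let id = LedgerId::new("acme/people:main");
    assert_eq!(cache_key(&id), "ledger/acme/people:main");
    let _iri = GraphIri(Arc::from("https://data.flur.ee/acme/people:main"));
    // cache_key(&_iri) would not compile — that is the point.
    println!("ok");
}
```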

Naming rules (make each word mean one thing)

This repo uses the _id suffix both for content identifiers (e.g. commit_id, index_id, default_context_id) and for identity/lookup keys (name:branch tokens); the rules below distinguish the two. The recommended rule set:

  • id: canonical identifier used as a cache key / stable identity.
    • For ledgers this is the full name:branch form (e.g., people:main).
    • For graph sources this is the full name:branch form (e.g., products-search:main).
  • _id (for content references): a content identifier (ContentId) used by the storage layer to fetch immutable artifacts.
    • Examples: commit_id, index_id, default_context_id.
  • name: a base name without branch (e.g., people).
  • branch: the branch name (e.g., main).
  • alias: a human-friendly label, and only that. For ledger identifiers, prefer ledger_id.

Practical guidelines:

  • If a string is used to load/cache/lookup a ledger, call it ledger_id (not ledger_alias, not ledger_address).
  • If a string is used to load/cache/lookup a graph source by name:branch, call it graph_source_id (not gs_alias, not graph_source_alias).
  • If a value identifies a content-addressed artifact (commit, index, context), call it *_id (e.g., commit_id, index_id, default_context_id).
  • If a string is used to identify a graph in SPARQL (FROM, GRAPH), call it graph_iri (canonical) or graph_ref (user input).
  • Avoid having two different meanings for the same field name across crates.

Ledger vs Db vs Graph (internal meaning)

  • Ledger: the durable identity + commit chain + publication state (nameservice record).
  • LedgerSnapshot: the indexed snapshot value used for range/scan (hot path).
  • Graph: query scoping mechanism (active graph, dataset graph selection, GRAPH operator).

Avoid using “graph” as a synonym for “db” in code comments. Prefer:

  • “ledger graph” (RDF graph inside a ledger)
  • “graph IRI” (identifier)
  • “graph source” (registry/resolution concept)

“Dataset” naming internally

SPARQL calls the set-of-graphs a “dataset”, and the code already models a DataSet. For product-facing APIs and docs, prefer “federation”, but internally keep DataSet as the SPARQL-aligned term.


See also:

  • docs/concepts/time-travel.md (time pinning syntax)
  • docs/concepts/datasets-and-named-graphs.md (SPARQL dataset semantics)
  • docs/graph-sources/overview.md and docs/concepts/graph-sources.md (graph source overview)
  • docs/reference/glossary.md (terms glossary)

OWL & RDFS Support Reference

This page lists every OWL and RDFS construct that Fluree’s reasoning engine supports. For conceptual background see Reasoning and inference; for query syntax see Query-time reasoning.

Quick orientation

Fluree implements reasoning via two techniques:

  • Query rewriting (RDFS and OWL 2 QL modes) — patterns are expanded at compile time; no facts are materialized.
  • Forward-chaining materialization (OWL 2 RL mode) — derived facts are computed before query execution using the OWL 2 RL rule set.

The tables below indicate which technique handles each construct.


RDFS constructs

These constructs are handled by query rewriting in RDFS mode (and also by materialization in OWL 2 RL mode).

rdfs:subClassOf

Declares that every instance of one class is also an instance of another.

ex:Student  rdfs:subClassOf  ex:Person .

Effect: A query for ?x rdf:type ex:Person also returns instances typed as ex:Student (and any subclass of Student, transitively).

JSON-LD transaction:

{"@id": "ex:Student", "rdfs:subClassOf": {"@id": "ex:Person"}}

rdfs:subPropertyOf

Declares that one property is a specialization of another.

ex:hasMother  rdfs:subPropertyOf  ex:hasParent .

Effect: A query for ?x ex:hasParent ?y also returns triples asserted with ex:hasMother.

JSON-LD transaction:

{"@id": "ex:hasMother", "rdfs:subPropertyOf": {"@id": "ex:hasParent"}}

rdfs:domain

Declares that the subject of a property is an instance of a class.

ex:teaches  rdfs:domain  ex:Professor .

Effect (OWL 2 QL / OWL 2 RL): If ex:alice ex:teaches ex:cs101, then ex:alice rdf:type ex:Professor is inferred.

JSON-LD transaction:

{"@id": "ex:teaches", "rdfs:domain": {"@id": "ex:Professor"}}

rdfs:range

Declares that the object of a property is an instance of a class.

ex:teaches  rdfs:range  ex:Course .

Effect (OWL 2 QL / OWL 2 RL): If ex:alice ex:teaches ex:cs101, then ex:cs101 rdf:type ex:Course is inferred.

JSON-LD transaction:

{"@id": "ex:teaches", "rdfs:range": {"@id": "ex:Course"}}

Note: Range inference applies to IRI-valued objects only. Literal values (strings, numbers, etc.) are not assigned a type via rdfs:range.


OWL property constructs

These are handled by materialization in OWL 2 RL mode (some also by query rewriting in OWL 2 QL mode, as noted).

owl:inverseOf

Declares that two properties are inverses of each other.

ex:hasMother  owl:inverseOf  ex:motherOf .

Effect: If ex:alice ex:hasMother ex:carol, then ex:carol ex:motherOf ex:alice is inferred (and vice versa).

Handled by: OWL 2 QL (query rewriting) and OWL 2 RL (materialization).

OWL 2 RL rule: prp-inv

JSON-LD transaction:

{"@id": "ex:hasMother", "owl:inverseOf": {"@id": "ex:motherOf"}}

owl:SymmetricProperty

Declares that a property holds in both directions.

ex:livesWith  a  owl:SymmetricProperty .

Effect: If ex:alice ex:livesWith ex:bob, then ex:bob ex:livesWith ex:alice is inferred.

OWL 2 RL rule: prp-symp

JSON-LD transaction:

{"@id": "ex:livesWith", "@type": "owl:SymmetricProperty"}

owl:TransitiveProperty

Declares that a property chains through intermediate nodes.

ex:hasAncestor  a  owl:TransitiveProperty .

Effect: If ex:a ex:hasAncestor ex:b and ex:b ex:hasAncestor ex:c, then ex:a ex:hasAncestor ex:c is inferred.

OWL 2 RL rule: prp-trp

JSON-LD transaction:

{"@id": "ex:hasAncestor", "@type": "owl:TransitiveProperty"}

owl:FunctionalProperty

Declares that a property can have at most one value per subject.

ex:hasBirthDate  a  owl:FunctionalProperty .

Effect: If ex:alice ex:hasBirthDate ex:d1 and ex:alice ex:hasBirthDate ex:d2, then ex:d1 owl:sameAs ex:d2 is inferred.

OWL 2 RL rule: prp-fp

JSON-LD transaction:

{"@id": "ex:hasBirthDate", "@type": "owl:FunctionalProperty"}

owl:InverseFunctionalProperty

Declares that a property’s object uniquely identifies the subject.

ex:hasSSN  a  owl:InverseFunctionalProperty .

Effect: If ex:alice ex:hasSSN "123" and ex:bob ex:hasSSN "123", then ex:alice owl:sameAs ex:bob is inferred.

OWL 2 RL rule: prp-ifp

JSON-LD transaction:

{"@id": "ex:hasSSN", "@type": "owl:InverseFunctionalProperty"}

owl:equivalentProperty

Declares that two properties have identical extensions.

ex:author  owl:equivalentProperty  ex:writtenBy .

Effect: Treated as mutual rdfs:subPropertyOf — queries and rules see both properties’ triples when either is used.

owl:propertyChainAxiom

Declares that a chain of properties implies another property.

ex:hasUncle  owl:propertyChainAxiom  ( ex:hasParent  ex:hasBrother ) .

Effect: If ex:alice ex:hasParent ex:bob and ex:bob ex:hasBrother ex:charlie, then ex:alice ex:hasUncle ex:charlie is inferred.

OWL 2 RL rule: prp-spo2

Chains can be of arbitrary length (2 or more properties) and can include inverse elements:

ex:hasNephew  owl:propertyChainAxiom  (
    [ owl:inverseOf ex:hasBrother ]
    ex:hasChild
) .

JSON-LD transaction:

{
  "@id": "ex:hasUncle",
  "owl:propertyChainAxiom": {
    "@list": [{"@id": "ex:hasParent"}, {"@id": "ex:hasBrother"}]
  }
}

OWL class constructs

owl:equivalentClass

Declares that two classes have identical extensions.

ex:Pupil  owl:equivalentClass  ex:Student .

Effect: Instances of either class are inferred to be instances of both.

OWL 2 RL rule: cax-eqc

owl:hasKey

Declares a set of properties that uniquely identify instances of a class.

ex:Person  owl:hasKey  ( ex:hasSSN ) .

Effect: If two ex:Person instances share the same ex:hasSSN value, they are inferred to be owl:sameAs.

OWL 2 RL rule: prp-key


OWL restrictions (class expressions)

OWL restrictions define classes based on property constraints. They are used with OWL 2 RL materialization.

owl:hasValue

Defines a class of all subjects that have a specific value for a property.

ex:RedThings  a  owl:Restriction ;
    owl:onProperty  ex:color ;
    owl:hasValue     ex:Red .

Effect (forward — cls-hv1): If ?x rdf:type ex:RedThings, then ?x ex:color ex:Red is inferred.

Effect (backward — cls-hv2): If ?x ex:color ex:Red, then ?x rdf:type ex:RedThings is inferred.

Limitation: Currently supports IRI-valued hasValue only. Literal values (strings, numbers) are not yet supported.

owl:someValuesFrom

Defines a class of subjects that have at least one value of a given type for a property.

ex:Parent  a  owl:Restriction ;
    owl:onProperty      ex:hasChild ;
    owl:someValuesFrom  ex:Person .

Effect (cls-svf1): If ?x ex:hasChild ?y and ?y rdf:type ex:Person, then ?x rdf:type ex:Parent is inferred.

owl:allValuesFrom

Defines a class where all values of a property belong to a given type.

ex:VeganRestaurant  a  owl:Restriction ;
    owl:onProperty     ex:servesFood ;
    owl:allValuesFrom  ex:VeganDish .

Effect (cls-avf): If ?x rdf:type ex:VeganRestaurant and ?x ex:servesFood ?y, then ?y rdf:type ex:VeganDish is inferred.

owl:maxCardinality (= 1)

When a restriction specifies maxCardinality of 1, it acts like a context-specific functional property.

ex:SingleChild  a  owl:Restriction ;
    owl:onProperty      ex:hasChild ;
    owl:maxCardinality  1 .

Effect (cls-maxc2): If ?x rdf:type ex:SingleChild, ?x ex:hasChild ?y1, and ?x ex:hasChild ?y2, then ?y1 owl:sameAs ?y2 is inferred.

owl:maxQualifiedCardinality (= 1)

Like maxCardinality but restricted to objects of a specific class.

ex:MonogamousPerson  a  owl:Restriction ;
    owl:onProperty                  ex:marriedTo ;
    owl:maxQualifiedCardinality     1 ;
    owl:onClass                     ex:Person .

Effect (cls-maxqc3/4): If ?x is typed as this restriction class, has two ex:marriedTo values, and both are ex:Person, they are inferred to be owl:sameAs.


OWL class operations

owl:intersectionOf

Defines a class as the intersection of member classes.

ex:WorkingStudent  owl:intersectionOf  ( ex:Student  ex:Employee ) .

Effect (forward — cls-int1): If ?x rdf:type ex:Student and ?x rdf:type ex:Employee, then ?x rdf:type ex:WorkingStudent is inferred.

Effect (backward — cls-int2): If ?x rdf:type ex:WorkingStudent, then both ?x rdf:type ex:Student and ?x rdf:type ex:Employee are inferred.

owl:unionOf

Defines a class as the union of member classes.

ex:PersonOrOrg  owl:unionOf  ( ex:Person  ex:Organization ) .

Effect (cls-uni): If ?x rdf:type ex:Person (or ex:Organization), then ?x rdf:type ex:PersonOrOrg is inferred.

owl:oneOf

Defines an enumerated class — a fixed set of individuals.

ex:PrimaryColor  owl:oneOf  ( ex:Red  ex:Blue  ex:Yellow ) .

Effect (cls-oo): ex:Red, ex:Blue, and ex:Yellow are each inferred to be of type ex:PrimaryColor.


owl:sameAs

owl:sameAs declares that two IRIs refer to the same real-world entity.

ex:alice  owl:sameAs  ex:aliceSmith .

Effect: All facts about ex:alice and ex:aliceSmith are merged. Queries for either IRI return the combined set of properties.

How sameAs is produced

owl:sameAs can be asserted explicitly or inferred by these rules:

Rule          Trigger
prp-fp        Functional property with multiple objects
prp-ifp       Inverse functional property with multiple subjects
prp-key       owl:hasKey match across instances
cls-maxc2     maxCardinality = 1 violation
cls-maxqc3/4  maxQualifiedCardinality = 1 violation

Equivalence properties

owl:sameAs is handled as an equivalence relation:

  • Symmetric: if a sameAs b then b sameAs a
  • Transitive: if a sameAs b and b sameAs c then a sameAs c
  • Reflexive: every resource is same-as itself (implicit)

The engine uses a union-find data structure to efficiently track equivalence classes and select a canonical representative for each.
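A minimal union-find sketch showing how sameAs pairs collapse into equivalence classes with a canonical representative (illustrative; not the engine's actual implementation):

```rust
// Minimal union-find with path compression for owl:sameAs canonicalization.
// Resources are represented by indices; find() returns the canonical
// representative of a resource's equivalence class.
struct UnionFind {
    parent: Vec<usize>,
}

impl UnionFind {
    fn new(n: usize) -> Self {
        UnionFind { parent: (0..n).collect() }
    }

    fn find(&mut self, x: usize) -> usize {
        if self.parent[x] != x {
            let p = self.parent[x];
            let root = self.find(p);
            self.parent[x] = root; // path compression
        }
        self.parent[x]
    }

    // Record `a owl:sameAs b`, merging their equivalence classes.
    fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra != rb {
            // Keep the smaller index as the canonical representative.
            let (keep, drop) = if ra < rb { (ra, rb) } else { (rb, ra) };
            self.parent[drop] = keep;
        }
    }
}

fn main() {
    // alice = 0, aliceSmith = 1, bob = 2
    let mut uf = UnionFind::new(3);
    uf.union(0, 1); // ex:alice owl:sameAs ex:aliceSmith
    assert_eq!(uf.find(1), uf.find(0)); // same equivalence class
    assert_ne!(uf.find(2), uf.find(0)); // bob stays separate
    println!("ok");
}
```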


OWL 2 RL rule index

For reference, the complete set of OWL 2 RL rules implemented by Fluree:

Identity-producing rules (Phase B)

These rules produce owl:sameAs facts and run before other rules to ensure proper canonicalization.

Rule        Construct                        Description
prp-fp      owl:FunctionalProperty           Same subject + different objects → sameAs
prp-ifp     owl:InverseFunctionalProperty    Same object + different subjects → sameAs
prp-key     owl:hasKey                       Same class + matching key values → sameAs
cls-maxc2   owl:maxCardinality = 1           Over-cardinality → sameAs
cls-maxqc3  owl:maxQualifiedCardinality = 1  Qualified over-cardinality → sameAs
cls-maxqc4  owl:maxQualifiedCardinality = 1  Variant for owl:Thing

Non-identity rules (Phase C)

Rule      Construct                     Description
prp-symp  owl:SymmetricProperty         P(x,y) → P(y,x)
prp-trp   owl:TransitiveProperty        P(x,y) ∧ P(y,z) → P(x,z)
prp-inv   owl:inverseOf                 P(x,y) → Q(y,x)
prp-dom   rdfs:domain                   P(x,y) → type(x,C)
prp-rng   rdfs:range                    P(x,y) → type(y,C)
prp-spo1  rdfs:subPropertyOf            P1(x,y) → P2(x,y)
prp-spo2  owl:propertyChainAxiom        Chain match → P(first,last)
cax-sco   rdfs:subClassOf               type(x,C1) → type(x,C2)
cax-eqc   owl:equivalentClass           type(x,C1) ↔ type(x,C2)
cls-hv1   owl:hasValue (forward)        type(x,C) → P(x,v)
cls-hv2   owl:hasValue (backward)       P(x,v) → type(x,C)
cls-svf1  owl:someValuesFrom            P(x,y) ∧ type(y,D) → type(x,C)
cls-avf   owl:allValuesFrom             type(x,C) ∧ P(x,y) → type(y,D)
cls-int1  owl:intersectionOf (forward)  All member types → intersection type
cls-int2  owl:intersectionOf (backward) Intersection type → all member types
cls-uni   owl:unionOf                   Any member type → union type
cls-oo    owl:oneOf                     Listed individual → enumerated type

Known limitations

Area                Limitation
Literal hasValue    owl:hasValue with literal values (strings, numbers) is not yet supported; only IRI-valued restrictions work.
Negation            owl:complementOf and negation-as-failure are not supported. OWL 2 RL is a positive-only fragment.
Disjointness        owl:disjointWith and owl:AllDisjointClasses do not trigger inconsistency detection.
Cardinality > 1     Only maxCardinality = 1 and maxQualifiedCardinality = 1 are implemented (these are the only identity-producing cardinalities in OWL 2 RL).
Datatype reasoning  No inference over datatypes (e.g., xsd:integer subtype of xsd:decimal).

Namespaces

For reference, the standard namespace prefixes:

Prefix  URI
rdf     http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs    http://www.w3.org/2000/01/rdf-schema#
owl     http://www.w3.org/2002/07/owl#
xsd     http://www.w3.org/2001/XMLSchema#

Crate Map

Fluree is organized into multiple Rust crates, each with a specific purpose. This document provides an overview of the crate architecture and dependencies.

Crate Organization

fluree-db/
├── Core
│   ├── fluree-vocab/              # RDF vocabulary constants and namespace codes
│   ├── fluree-db-core/            # Runtime-agnostic core types and queries
│   └── fluree-db-novelty/         # Novelty overlay and commit types
│
├── Graph Processing
│   ├── fluree-graph-ir/           # Format-agnostic RDF intermediate representation
│   ├── fluree-graph-json-ld/      # JSON-LD processing
│   ├── fluree-graph-turtle/       # Turtle parser
│   └── fluree-graph-format/       # RDF formatters (JSON-LD, Turtle, etc.)
│
├── Query & Transaction
│   ├── fluree-db-query/           # Query engine (JSON-LD Query)
│   ├── fluree-db-sparql/          # SPARQL parser and lowering
│   └── fluree-db-transact/        # Transaction processing
│
├── Storage & Connection
│   ├── fluree-db-connection/      # Storage backends and connection management
│   ├── fluree-db-storage-aws/     # AWS storage (S3, S3 Express, DynamoDB)
│   ├── fluree-db-nameservice/     # Nameservice implementations
│   └── fluree-db-nameservice-sync/# Git-like remote sync for nameservice
│
├── Indexing
│   ├── fluree-db-binary-index/    # Binary index formats + read-side runtime
│   ├── fluree-db-indexer/         # Index building
│   └── fluree-db-ledger/          # Ledger state (indexed DB + novelty)
│
├── Security & Validation
│   ├── fluree-db-policy/          # Policy enforcement
│   ├── fluree-db-credential/      # JWS/VerifiableCredential verification
│   ├── fluree-db-crypto/          # Storage encryption (AES-256-GCM)
│   └── fluree-db-shacl/           # SHACL validation engine
│
├── Reasoning
│   └── fluree-db-reasoner/        # OWL2-RL reasoning engine
│
├── Graph Sources
│   ├── fluree-db-tabular/         # Tabular column batch types
│   ├── fluree-db-iceberg/         # Apache Iceberg integration
│   └── fluree-db-r2rml/           # R2RML mapping support
│
├── Search
│   ├── fluree-search-protocol/    # Search service protocol types
│   ├── fluree-search-service/     # Search backend implementations
│   └── fluree-search-httpd/       # Standalone HTTP search server
│
├── Networking
│   ├── fluree-sse/                # Server-Sent Events parser
│   └── fluree-db-peer/            # SSE protocol for peer mode
│
└── Top-Level
    ├── fluree-db-api/             # Public API and high-level operations
    └── fluree-db-server/          # HTTP server (binary)

Foundation Crates

fluree-vocab

Purpose: RDF vocabulary constants and namespace codes

Responsibilities:

  • Standard RDF namespace definitions (rdf:, rdfs:, xsd:, owl:, etc.)
  • Fluree-specific namespace codes
  • IRI constants for common predicates

Dependencies: None (foundation crate)

fluree-db-core

Purpose: Runtime-agnostic core library for Fluree DB

Responsibilities:

  • Core types (Flake, Sid, IndexType, etc.)
  • Index structures (SPOT, POST, OPST, PSOT)
  • Range query operations
  • Database snapshot representation
  • Statistics and cardinality tracking
  • Content-addressed identity (ContentId, ContentKind)
  • Content store trait (ContentStore)

Key Types:

  • Flake - Indexed triple representation
  • Sid - Subject identifier
  • LedgerSnapshot - Database snapshot at a point in time
  • IndexType - Index selection enum
  • StatsView - Query statistics
  • ContentId - CIDv1 content-addressed identifier
  • ContentKind - Content type enum (Commit, Txn, IndexRoot, etc.)
  • ContentStore - Content-addressed storage trait
  • BranchedContentStore - Recursive content store with namespace fallback for branches

Dependencies:

  • fluree-vocab

fluree-db-novelty

Purpose: Novelty overlay and commit types

Responsibilities:

  • In-memory novelty (uncommitted/unindexed flakes)
  • Commit metadata and structure
  • Novelty application and slicing

Key Types:

  • Novelty - In-memory flake overlay
  • Commit - Commit metadata
  • FlakeId - Novelty flake identifier

Dependencies:

  • fluree-db-core
  • fluree-db-binary-index
  • fluree-vocab

Graph Processing Crates

fluree-graph-ir

Purpose: Format-agnostic RDF intermediate representation

Responsibilities:

  • Generic graph IR for RDF data
  • Triple/quad representation
  • Format-independent graph operations

Dependencies:

  • fluree-vocab

fluree-graph-json-ld

Purpose: Minimal JSON-LD processing

Responsibilities:

  • JSON-LD expansion
  • JSON-LD compaction
  • @context handling
  • IRI resolution

Dependencies:

  • fluree-graph-ir
  • fluree-vocab

fluree-graph-turtle

Purpose: Turtle (TTL) parser

Responsibilities:

  • Turtle syntax parsing
  • Triple generation from Turtle

Dependencies:

  • fluree-graph-ir
  • fluree-vocab

fluree-graph-format

Purpose: RDF graph formatters

Responsibilities:

  • Output formatting (JSON-LD, Turtle, N-Triples)
  • Serialization utilities

Dependencies:

  • fluree-graph-ir

Query & Transaction Crates

fluree-db-query

Purpose: Query engine for JSON-LD Query

Responsibilities:

  • Query parsing and planning
  • Statistics-driven pattern reordering across all WHERE-clause pattern types (triples, UNION, OPTIONAL, MINUS, search patterns, Graph, Service, etc.)
  • Bound-variable-aware selectivity estimation using HLL-derived property statistics (with heuristic fallbacks)
  • Query execution
  • Filter pushdown (index-level range filters, inline join/BIND evaluation, dependency-based placement, compound pattern nesting)
  • Aggregations
  • BM25 and vector search integration
  • Explain plan generation for optimization debugging

Key Types:

  • Query - Parsed query
  • VarRegistry - Variable management
  • Pattern - Query patterns
  • TriplePattern - Subject–predicate–object pattern with optional DatatypeConstraint
  • Ref - Variable or constant in subject/predicate position (no literals)
  • Term - Variable or constant in object position (includes literals)
  • DatatypeConstraint - Explicit datatype (Explicit(Sid)) or language tag (LangTag; implies rdf:langString datatype)
  • PatternEstimate - Cardinality classification (Source, Reducer, Expander, Deferred)

Dependencies:

  • fluree-db-core

fluree-db-sparql

Purpose: SPARQL parsing and execution

Responsibilities:

  • SPARQL lexing and parsing
  • AST construction
  • Lowering to internal IR
  • Diagnostic reporting

Key Types:

  • Query - SPARQL query AST
  • Pattern - Graph pattern
  • Diagnostic - Parse/validation errors

Dependencies:

  • fluree-db-query
  • fluree-db-core

fluree-db-transact

Purpose: Transaction processing

Responsibilities:

  • JSON-LD transaction parsing
  • RDF triple generation
  • Flake generation
  • Commit creation

Dependencies:

  • fluree-graph-json-ld
  • fluree-db-core

Storage & Connection Crates

fluree-db-connection

Purpose: Storage backends and connection management

Responsibilities:

  • Storage abstraction trait
  • Memory, file, and cloud storage
  • Address resolution
  • Commit storage and retrieval

Key Types:

  • Storage trait
  • MemoryStorage
  • FileStorage

Dependencies:

  • fluree-db-core
  • fluree-graph-json-ld
  • fluree-db-storage-aws (optional)
  • fluree-db-nameservice

fluree-db-storage-aws

Purpose: AWS storage backends

Responsibilities:

  • S3 storage implementation
  • S3 Express One Zone support
  • DynamoDB integration

Dependencies:

  • fluree-db-core
  • fluree-db-nameservice

fluree-db-nameservice

Purpose: Nameservice implementations

Responsibilities:

  • Nameservice abstraction
  • Ledger metadata management
  • Publish/lookup operations
  • Branch creation and listing
  • File and DynamoDB backends

Key Types:

  • NameService trait (includes list_branches, create_branch, drop_branch)
  • Publisher trait (commit/index publishing)
  • NsRecord - Nameservice record (includes source_branch for ancestry and branches child count for reference counting)
  • FileNameService

Dependencies:

  • fluree-db-core

fluree-db-nameservice-sync

Purpose: Git-like remote sync for nameservice

Responsibilities:

  • Remote nameservice synchronization (fetch/push refs)
  • Multi-origin CAS object fetching with integrity verification
  • Pack protocol client (streaming binary transport for clone/pull)
  • SSE-based change streaming
  • Sync driver (fetch/pull/push orchestration)

Key Types:

  • MultiOriginFetcher - Priority-ordered HTTP origin fallback
  • HttpOriginFetcher - Single-origin CAS object + pack fetcher
  • SyncDriver - Orchestrates fetch/pull/push with remote clients
  • PackIngestResult - Result of streaming pack import
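
The priority-ordered fallback behavior can be sketched generically. This illustrates the idea only — the function and its signature are hypothetical, not the `MultiOriginFetcher` API:

```rust
// Illustrative priority-ordered fallback: try each origin in order and return
// the first successful result, collecting errors along the way.
fn fetch_with_fallback<T, E, F>(origins: &[&str], mut fetch: F) -> Result<T, Vec<E>>
where
    F: FnMut(&str) -> Result<T, E>,
{
    let mut errors = Vec::new();
    for origin in origins {
        match fetch(origin) {
            Ok(value) => return Ok(value),
            Err(e) => errors.push(e), // fall through to the next origin
        }
    }
    Err(errors)
}
```

The real fetcher additionally verifies content-addressed integrity of each fetched object before accepting it.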

Dependencies:

  • fluree-db-core
  • fluree-db-nameservice
  • fluree-db-novelty
  • fluree-sse

Indexing Crates

fluree-db-binary-index

Purpose: Binary index wire formats and read-side runtime

Responsibilities:

  • Binary index format codecs (FIR6 root, FBR3 branch, FLI3 leaf, leaflet layout)
  • Dictionary artifacts and readers (inline dicts, dict trees, arenas)
  • Query-time read types (BinaryIndexStore, BinaryGraphView, cursors)

Dependencies:

  • fluree-db-core

fluree-db-indexer

Purpose: Index building for Fluree DB

Responsibilities:

  • Incremental index updates
  • Full reindexing
  • Index refresh orchestration

Dependencies:

  • fluree-db-core
  • fluree-db-binary-index
  • fluree-db-novelty
  • fluree-db-nameservice
  • fluree-vocab

fluree-db-ledger

Purpose: Ledger state management

Responsibilities:

  • Combining indexed DB with novelty overlay
  • Ledger snapshot creation
  • State transitions
  • Building BranchedContentStore trees from branch ancestry

Key Types:

  • LedgerState - Complete ledger snapshot
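
Combining an indexed DB with a novelty overlay follows a familiar pattern: consult recent, unindexed changes first and fall back to the index. A schematic sketch — all names here are illustrative, and strings stand in for real flakes:

```rust
use std::collections::HashMap;

// Schematic ledger state: an immutable indexed baseline plus a "novelty"
// overlay of changes committed since the last index run.
struct SketchLedgerState {
    indexed: HashMap<String, String>,         // baseline from the index
    novelty: HashMap<String, Option<String>>, // Some = assert, None = retract
}

impl SketchLedgerState {
    /// Resolve a value at the current state: novelty wins over the index.
    fn resolve(&self, key: &str) -> Option<&String> {
        match self.novelty.get(key) {
            Some(Some(v)) => Some(v),      // asserted in novelty
            Some(None) => None,            // retracted in novelty
            None => self.indexed.get(key), // fall back to the index
        }
    }
}
```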

Dependencies:

  • fluree-db-core
  • fluree-db-novelty
  • fluree-db-nameservice

Security & Validation Crates

fluree-db-policy

Purpose: Policy enforcement

Responsibilities:

  • Policy parsing and evaluation
  • Query augmentation for policy
  • Transaction authorization

Dependencies:

  • fluree-db-query
  • fluree-db-core

fluree-db-credential

Purpose: Credential verification

Responsibilities:

  • JWS signature verification
  • VerifiableCredential processing
  • DID resolution

Dependencies: None (standalone)

fluree-db-crypto

Purpose: Storage encryption

Responsibilities:

  • AES-256-GCM encryption/decryption
  • Key management
  • Encrypted storage layer

Dependencies:

  • fluree-db-core

fluree-db-shacl

Purpose: SHACL validation engine

Responsibilities:

  • SHACL shapes parsing
  • Constraint validation
  • Validation reports

Dependencies:

  • fluree-db-core
  • fluree-db-query
  • fluree-vocab

Reasoning

fluree-db-reasoner

Purpose: OWL2-RL reasoning engine

Responsibilities:

  • OWL2-RL rule application
  • Inference generation
  • Materialization

Dependencies:

  • fluree-db-core
  • fluree-vocab

Graph Source Crates

fluree-db-tabular

Purpose: Tabular column batch types

Responsibilities:

  • Arrow-compatible column batches
  • Graph source data abstraction

Dependencies: None (foundation for graph sources)

fluree-db-iceberg

Purpose: Apache Iceberg integration

Responsibilities:

  • Iceberg REST catalog support
  • Iceberg table scanning
  • Parquet file reading

Dependencies:

  • fluree-db-core
  • fluree-db-tabular

fluree-db-r2rml

Purpose: R2RML mapping support

Responsibilities:

  • R2RML mapping parsing
  • Relational-to-RDF mapping
  • Graph source generation

Dependencies:

  • fluree-graph-ir
  • fluree-graph-turtle (optional)
  • fluree-db-tabular
  • fluree-vocab

Search Crates

fluree-search-protocol

Purpose: Search service protocol types

Responsibilities:

  • Request/response structs
  • Error model and codes
  • Protocol version constants
  • BM25 and vector query definitions
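
Protocol types of this kind are typically plain data structs. A hypothetical sketch of a BM25 request/response pair — field names are illustrative, and the real crate derives serde's `Serialize`/`Deserialize` on them:

```rust
// Hypothetical shapes for search protocol types (not the crate's actual API).
#[derive(Debug, Clone, PartialEq)]
struct Bm25Request {
    query: String,
    limit: usize,
}

#[derive(Debug, Clone, PartialEq)]
struct SearchHit {
    id: String,
    score: f64,
}

#[derive(Debug, Clone, PartialEq)]
struct Bm25Response {
    hits: Vec<SearchHit>,
}

/// Truncate results to the requested limit, highest scores first.
fn apply_limit(mut hits: Vec<SearchHit>, limit: usize) -> Bm25Response {
    hits.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap());
    hits.truncate(limit);
    Bm25Response { hits }
}
```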

Dependencies: serde, thiserror

fluree-search-service

Purpose: Search backend implementations

Responsibilities:

  • SearchBackend trait
  • BM25 backend (tantivy)
  • Vector backend (usearch, feature-gated)
  • Index caching with TTL

Dependencies:

  • fluree-search-protocol
  • fluree-db-query
  • fluree-db-core

fluree-search-httpd

Purpose: Standalone HTTP search server

Responsibilities:

  • HTTP API for search queries
  • Index loading from storage
  • Health and capabilities endpoints

Dependencies:

  • fluree-search-protocol
  • fluree-search-service
  • axum, tokio

Networking Crates

fluree-sse

Purpose: Lightweight SSE parser

Responsibilities:

  • Server-Sent Events parsing
  • Event stream handling
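
The core of SSE parsing is small: split the stream on blank lines into events, then read `field: value` lines within each event. A minimal sketch of the idea — not the fluree-sse API, and covering only a subset of the spec:

```rust
/// One parsed Server-Sent Event (minimal subset of the spec).
#[derive(Debug, Default, PartialEq)]
struct SseEvent {
    event: String,
    data: String,
}

/// Parse a complete SSE stream: events are separated by blank lines, and
/// each line is "field: value". Multiple data lines join with '\n'.
fn parse_sse(input: &str) -> Vec<SseEvent> {
    let mut events = Vec::new();
    let mut current = SseEvent::default();
    for line in input.lines() {
        if line.is_empty() {
            // A blank line dispatches the accumulated event, if any.
            if !current.data.is_empty() || !current.event.is_empty() {
                events.push(std::mem::take(&mut current));
            }
        } else if let Some(value) = line.strip_prefix("data:") {
            if !current.data.is_empty() {
                current.data.push('\n');
            }
            current.data.push_str(value.trim_start());
        } else if let Some(value) = line.strip_prefix("event:") {
            current.event = value.trim_start().to_string();
        }
        // Comments (": ...") and unknown fields are ignored, per the spec.
    }
    events
}
```

A streaming parser does the same thing incrementally over network chunks rather than over a complete string.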

Dependencies: None (foundation)

fluree-db-peer

Purpose: SSE protocol for peer mode

Responsibilities:

  • Peer protocol types
  • SSE client for peer communication

Dependencies:

  • fluree-sse

Top-Level Crates

fluree-db-api

Purpose: Public API and orchestration

Responsibilities:

  • Ledger lifecycle (create, load, drop, branch)
  • Query execution coordination
  • Transaction execution
  • Time travel resolution
  • Policy application
  • Dataset and view composition
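
Time travel resolution amounts to mapping a requested time `t` onto the latest commit at or before it. Sketched below under the assumption (for illustration) that commit times are available as a sorted slice:

```rust
/// Given commit times sorted ascending, return the index of the latest
/// commit whose time is <= t, or None if t precedes the first commit.
fn resolve_t(commit_times: &[u64], t: u64) -> Option<usize> {
    // partition_point returns how many commits have time <= t.
    let count = commit_times.partition_point(|&ct| ct <= t);
    count.checked_sub(1)
}
```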

Key Types:

  • Fluree - Main entry point
  • Graph - Lazy handle for chaining
  • GraphSnapshot - Materialized snapshot
  • LedgerState - Loaded ledger state
  • QueryResult - Query results
  • TransactResult - Commit receipt

Dependencies:

  • fluree-db-query
  • fluree-db-sparql
  • fluree-db-transact
  • fluree-db-connection
  • fluree-db-nameservice
  • fluree-db-policy
  • fluree-db-reasoner
  • fluree-db-shacl

fluree-db-server

Purpose: HTTP server (binary)

Responsibilities:

  • HTTP API endpoints
  • Request routing
  • Response formatting
  • TLS/SSL, CORS handling

Dependencies:

  • fluree-db-api
  • axum

Dependency Layers

Layer 5 (Top)        fluree-db-server
                            │
                     fluree-db-api
                            │
Layer 4 (Features)   ┌──────┼──────┬──────────┬───────────┐
                     │      │      │          │           │
                  policy  shacl reasoner  credential  crypto
                     │      │      │
Layer 3 (Query)      └──────┴──────┴──────────┐
                                              │
                     fluree-db-query ←── fluree-db-sparql
                            │
Layer 2 (Data)       ledger, binary-index, indexer, novelty, connection
                            │
Layer 1 (Core)       fluree-db-core
                            │
Layer 0 (Foundation) fluree-vocab, fluree-sse, fluree-db-tabular

External Dependencies

Key External Crates

Web Framework:

  • axum - HTTP server framework
  • tokio - Async runtime
  • tower - Service abstractions

Serialization:

  • serde - Serialization framework
  • serde_json - JSON support

RDF:

  • oxiri - IRI parsing and validation

Storage:

  • aws-sdk-s3 - AWS S3 client
  • aws-sdk-dynamodb - AWS DynamoDB client

Search:

  • tantivy - BM25 full-text search
  • usearch - Vector similarity search (HNSW indexes)

Analytics:

  • iceberg-rust - Apache Iceberg support
  • parquet - Parquet file reading

Cryptography:

  • ed25519-dalek - Ed25519 signatures
  • ring - Cryptographic operations

Building

Build All

cargo build --release

Build Server Only

cargo build --release --bin fluree-db-server

Run Tests

cargo test

Build with Features

cargo build --features native,vector

Crate Versions

All crates use synchronized versioning and are updated together.

Check versions:

cargo tree | grep fluree

Contributing

Welcome to the Fluree contributor documentation! This section provides everything you need to contribute to Fluree.

Getting Started

Dev Setup

Set up your development environment:

  • Install dependencies
  • Clone repository
  • Build from source
  • Run development server
  • IDE configuration

Tests

Testing guide:

  • Running tests
  • Writing tests
  • Test organization
  • Integration tests
  • Benchmarking
  • Continuous integration

Adding Tracing Spans

How to instrument new code paths with tracing spans:

  • The two-tier span strategy (info / debug / trace)
  • Code patterns for sync and async spans
  • Deferred field recording
  • Testing spans with SpanCaptureLayer
  • Common gotchas (!Send guards, OTEL floods, etc.)

W3C SPARQL Compliance Suite

Guide to the manifest-driven W3C compliance test suite:

  • Running and interpreting results
  • Debugging failures
  • From failure to issue/PR workflow
  • Using Claude Code for compliance work
  • Architecture overview

SHACL Implementation

How SHACL validation is wired into Fluree, for contributors adding constraints or fixing bugs:

  • Pipeline: compile → cache → validate
  • Crate layout (fluree-db-shacl / -transact / -api)
  • Shared post-stage helper and its call sites
  • Per-graph config, f:shapesSource, target-type resolution
  • Adding a new constraint (walkthrough)
  • Testing patterns (unit + integration + temp-revert regression trick)
  • Known gaps (sh:uniqueLang, sh:qualifiedValueShape, cross-txn cache)

How to Contribute

Ways to Contribute

  1. Report Bugs: File issues with reproduction steps
  2. Suggest Features: Propose enhancements with use cases
  3. Fix Bugs: Submit pull requests for bug fixes
  4. Add Features: Implement new capabilities
  5. Improve Documentation: Fix typos, clarify explanations, add examples
  6. Review Pull Requests: Help review others’ contributions
  7. Answer Questions: Help users in discussions

Before Contributing

  1. Check existing issues: Search for duplicate issues
  2. Read documentation: Understand the feature area
  3. Discuss major changes: Open issue before large PRs
  4. Follow style guide: Match existing code style
  5. Add tests: Include tests for new features
  6. Update docs: Document new features

Contribution Workflow

1. Fork Repository

# Fork on GitHub, then clone
git clone https://github.com/YOUR-USERNAME/db.git
cd db

2. Create Branch

git checkout -b feature/my-feature

Branch naming:

  • feature/ - New features
  • fix/ - Bug fixes
  • docs/ - Documentation
  • refactor/ - Code refactoring
  • test/ - Test additions

3. Make Changes

Edit code, following style guidelines.

4. Add Tests

# Run existing tests
cargo test

# Add new tests
# Edit tests/test_my_feature.rs

5. Run Checks

# Format code
cargo fmt

# Lint code
cargo clippy

# Run all tests
cargo test --all

6. Commit Changes

git add .
git commit -m "Add feature: description"

Commit message format:

Short summary (50 chars or less)

More detailed explanation if needed. Wrap at 72 characters.

- Key point 1
- Key point 2

Fixes #123

7. Push and Create PR

git push origin feature/my-feature

Create pull request on GitHub.

8. Address Review Comments

Respond to reviewer feedback, make requested changes.

Code Style

Rust Style

Follow Rust standard style:

# Format all code
cargo fmt

# Check style
cargo clippy

Naming Conventions

Types: PascalCase

struct Dataset { ... }
enum QueryResult { ... }

Functions: snake_case

fn execute_query() { ... }
fn parse_json_ld() { ... }

Constants: SCREAMING_SNAKE_CASE

const MAX_QUERY_SIZE: usize = 1_048_576;

Modules: snake_case

mod query_engine;
mod storage_backend;

Documentation

Document public APIs:

/// Executes a query against the dataset.
///
/// # Arguments
///
/// * `query` - The query to execute
/// * `context` - Execution context
///
/// # Returns
///
/// Query results or error
///
/// # Examples
///
/// ```
/// let results = dataset.query(&query, &context)?;
/// ```
pub fn query(&self, query: &Query, context: &Context) -> Result<Vec<Solution>> {
    // Implementation
}

Error Handling

Use Result types:

// Good
pub fn parse_query(input: &str) -> Result<Query, ParseError> {
    // ...
}

// Bad
pub fn parse_query(input: &str) -> Query {
    // No error handling
}

Testing

Write tests for new code:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_query_execution() {
        let query = Query::parse("...").unwrap();
        let result = execute(&query).unwrap();
        assert_eq!(result.len(), 2);
    }
}

Pull Request Guidelines

PR Title

Format: category: short description

Examples:

  • feat: Add SPARQL property paths support
  • fix: Correct transaction time ordering
  • docs: Update query examples
  • test: Add integration tests for time travel
  • refactor: Simplify index scan logic

PR Description

Include:

  1. Summary: What does this PR do?
  2. Motivation: Why is this needed?
  3. Changes: What changed?
  4. Testing: How was it tested?
  5. Breaking Changes: Any breaking changes?

Example:

## Summary
Adds support for SPARQL property paths, enabling recursive graph traversal.

## Motivation
Many users need to query hierarchical data structures. Property paths are a standard SPARQL feature.

## Changes
- Added property path parser to fluree-db-sparql
- Implemented path evaluation in query engine
- Added tests for various path patterns

## Testing
- Unit tests for parser
- Integration tests for path queries
- Benchmarks show acceptable performance

## Breaking Changes
None

PR Checklist

  • Code follows style guidelines
  • Tests added/updated
  • Documentation updated
  • All tests pass
  • No clippy warnings
  • Commit messages clear
  • PR description complete

Review Process

What Reviewers Look For

  1. Correctness: Does it work as intended?
  2. Tests: Adequate test coverage?
  3. Style: Follows conventions?
  4. Documentation: Properly documented?
  5. Performance: No obvious performance issues?
  6. Breaking Changes: Backward compatible?

Responding to Reviews

  • Be receptive to feedback
  • Ask questions if unclear
  • Make requested changes promptly
  • Explain your reasoning when appropriate
  • Say thanks for helpful reviews

Community Guidelines

Code of Conduct

  • Be respectful and inclusive
  • Assume good intentions
  • Give constructive feedback
  • Welcome newcomers
  • No harassment or discrimination

Communication

  • GitHub Issues: Bug reports, feature requests
  • Pull Requests: Code contributions
  • Discussions: Questions, ideas, help

Getting Help

  • Read documentation first
  • Search existing issues
  • Ask specific questions
  • Provide reproduction steps
  • Be patient and respectful

License

Contributions licensed under Apache 2.0.

By contributing, you agree to license your contributions under the same license.

Recognition

Contributors are recognized in:

  • CONTRIBUTORS.md file
  • Release notes
  • GitHub contributors page

Thank you for contributing to Fluree!

Development Setup

This guide walks through setting up a development environment for contributing to Fluree.

Prerequisites

Required

Rust:

# Install rustup
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Verify installation
rustc --version  # Should be 1.75.0 or later
cargo --version

Git:

git --version  # Should be 2.0 or later

IDE/Editor:

  • Visual Studio Code with rust-analyzer
  • IntelliJ IDEA with Rust plugin
  • Vim/Neovim with rust-analyzer LSP

Tools:

  • cargo-watch - Auto-rebuild on changes
  • cargo-nextest - Faster test runner
  • cargo-flamegraph - Performance profiling

cargo install cargo-watch cargo-nextest cargo-flamegraph

Clone Repository

# Clone main repository
git clone https://github.com/fluree/db.git
cd db

# Or clone your fork
git clone https://github.com/YOUR-USERNAME/db.git
cd db

Build from Source

Development Build

# Build all crates
cargo build

# Build specific crate
cd fluree-db-query
cargo build

Release Build

# Optimized build
cargo build --release

# Server binary at: target/release/fluree-db-server

Build Server Only

cargo build --release --bin fluree-db-server

Run Development Server

Quick Start

# Run with default settings (memory storage)
cargo run --bin fluree-db-server

Server starts on http://localhost:8090

With Custom Settings

cargo run --bin fluree-db-server -- \
  --storage file \
  --data-dir ./dev-data \
  --log-level debug

Watch Mode

Auto-rebuild and restart on changes:

cargo watch -x 'run --bin fluree-db-server'

Run Tests

All Tests

cargo test --all

Specific Crate Tests

cd fluree-db-query
cargo test

Specific Test

cargo test test_query_execution

With Output

cargo test -- --nocapture

Integration Tests

cargo test --test integration_tests

With Nextest (Faster)

cargo nextest run

IDE Setup

Visual Studio Code

Install Extensions:

  • rust-analyzer
  • CodeLLDB (debugging)
  • Even Better TOML

Settings (.vscode/settings.json):

{
  "rust-analyzer.cargo.features": "all",
  "rust-analyzer.checkOnSave.command": "clippy",
  "rust-analyzer.inlayHints.enable": true
}

Launch Config (.vscode/launch.json):

{
  "version": "0.2.0",
  "configurations": [
    {
      "type": "lldb",
      "request": "launch",
      "name": "Debug server",
      "cargo": {
        "args": ["build", "--bin=fluree-db-server"],
        "filter": {
          "name": "fluree-db-server",
          "kind": "bin"
        }
      },
      "args": ["--storage", "memory", "--log-level", "debug"],
      "cwd": "${workspaceFolder}"
    }
  ]
}

IntelliJ IDEA

Install Plugin:

  • Rust plugin (official)

Configure:

  • File → Settings → Languages & Frameworks → Rust
  • Set toolchain location
  • Enable external linter (clippy)

Vim/Neovim

Install rust-analyzer:

For Neovim with built-in LSP:

-- init.lua
require'lspconfig'.rust_analyzer.setup{}

For Vim with CoC:

" Install coc-rust-analyzer
:CocInstall coc-rust-analyzer

Development Workflow

Make Changes

# Create branch
git checkout -b feature/my-feature

# Edit code
vim fluree-db-query/src/execute.rs

# Format
cargo fmt

# Check
cargo clippy

Test Changes

# Run affected tests
cargo test -p fluree-db-query

# Run all tests
cargo test --all

Verify Build

# Development build
cargo build

# Release build
cargo build --release

# Check all features compile
cargo build --all-features

Run Server Locally

cargo run --bin fluree-db-server -- \
  --storage memory \
  --log-level debug

Test your changes:

# In another terminal
curl http://localhost:8090/health

curl -X POST http://localhost:8090/v1/fluree/query -d '{...}'

Debugging

With rust-lldb

# Build with debug symbols
cargo build

# Run with lldb
rust-lldb target/debug/fluree-db-server

# Set breakpoint
(lldb) b fluree_db_query::execute::execute_query
(lldb) run --storage memory

# Debug commands
(lldb) continue
(lldb) step
(lldb) print variable_name

With VS Code

Use launch.json configuration from above, then F5 to debug.

// Quick debugging
println!("Debug: value = {:?}", value);

// Better: use tracing
tracing::debug!(?value, "Processing query");

Logging

Enable debug logs:

RUST_LOG=debug cargo run --bin fluree-db-server

Or trace specific module:

RUST_LOG=fluree_db_query=trace cargo run --bin fluree-db-server

Performance Profiling

Criterion Benchmarks

Run benchmarks:

cargo bench

View results: target/criterion/report/index.html

Flamegraphs

Generate flamegraph:

# Install tools (Linux)
sudo apt install linux-tools-common linux-tools-generic

# Generate flamegraph
cargo flamegraph --bin fluree-db-server

# Open flamegraph.svg in browser

perf (Linux)

# Record
cargo build --release
perf record -g target/release/fluree-db-server

# Report
perf report

Common Development Tasks

Add New Query Feature

  1. Add to query parser (fluree-db-query/src/parse/)
  2. Add to query executor (fluree-db-query/src/execute/)
  3. Add tests (fluree-db-query/tests/)
  4. Update documentation (docs/query/)

Add New Transaction Feature

  1. Add to transaction parser (fluree-db-transact/src/parse/)
  2. Add to staging logic (fluree-db-transact/src/stage.rs)
  3. Add tests (fluree-db-transact/tests/)
  4. Update documentation (docs/transactions/)

Add New Storage Backend

  1. Implement Storage trait (fluree-db-connection/src/)
  2. Add backend-specific logic
  3. Add tests
  4. Update configuration options
  5. Document in docs/operations/storage.md

Code Organization

Module Structure

fluree-db-query/
├── src/
│   ├── lib.rs           # Public API and re-exports
│   ├── triple.rs        # TriplePattern, Ref, Term, DatatypeConstraint
│   ├── parse/           # Query parsing
│   │   ├── mod.rs
│   │   ├── ast.rs       # Unresolved AST (before IRI resolution)
│   │   ├── lower.rs     # AST → IR lowering
│   │   └── node_map.rs  # JSON-LD node-map → AST
│   ├── execute/         # Query execution
│   │   ├── mod.rs
│   │   ├── runner.rs
│   │   ├── operator_tree.rs
│   │   └── where_plan.rs  # WHERE-clause planning (pattern types, reordering)
│   ├── bind.rs          # Variable binding
│   └── filter.rs        # Filter evaluation
├── tests/               # Integration tests
└── benches/             # Benchmarks

Import Organization

// Standard library
use std::collections::HashMap;

// External crates
use serde::{Deserialize, Serialize};

// Internal crates
use fluree_db_common::{Iri, Literal};

// Current crate
use crate::parse::Query;

Documentation

Code Documentation

Use Rustdoc:

/// Executes a query against a dataset.
///
/// This function parses the query, generates an execution plan,
/// and runs the plan against the dataset's indexes.
///
/// # Arguments
///
/// * `dataset` - The dataset to query
/// * `query` - The query to execute
///
/// # Returns
///
/// A vector of solutions (variable bindings)
///
/// # Errors
///
/// Returns error if query is invalid or execution fails
///
/// # Examples
///
/// ```
/// use fluree_db_api::query;
///
/// let results = query(&dataset, &query)?;
/// assert_eq!(results.len(), 10);
/// ```
pub fn query(dataset: &Dataset, query: &Query) -> Result<Vec<Solution>> {
    // Implementation
}

Generate docs:

cargo doc --open

User Documentation

Update relevant docs in docs/ directory when adding user-facing features.

Dependencies

Adding Dependencies

Add to Cargo.toml:

[dependencies]
serde = { version = "1.0", features = ["derive"] }
tokio = { version = "1.35", features = ["full"] }

Updating Dependencies

# Update all dependencies
cargo update

# Update specific dependency
cargo update -p serde

Checking for Outdated

cargo install cargo-outdated
cargo outdated

Troubleshooting Development Issues

Build Fails

# Clean and rebuild
cargo clean
cargo build

Tests Fail

# Run with output
cargo test -- --nocapture

# Run specific test
cargo test test_name -- --nocapture

Clippy Warnings

# Fix automatically where possible
cargo clippy --fix

rustfmt Issues

# Format all code
cargo fmt

Development Tools

Cargo Commands

cargo build          # Build
cargo test           # Test
cargo run            # Run
cargo bench          # Benchmark
cargo doc            # Documentation
cargo clean          # Clean
cargo check          # Quick check (no binary)
cargo clippy         # Lint
cargo fmt            # Format

Useful Cargo Plugins

# Install useful plugins
cargo install cargo-watch      # Auto-rebuild
cargo install cargo-nextest    # Faster tests
cargo install cargo-outdated   # Check deps
cargo install cargo-audit      # Security audit
cargo install cargo-expand     # Expand macros

Performance Tips

Development Builds

Use development builds during development:

  • Faster compilation
  • Slower execution
  • Debug symbols included

Release Builds

Use release builds for testing performance:

cargo build --release
cargo test --release

For maximum performance:

[profile.release]
lto = true
codegen-units = 1

Warning: Significantly slower compile times.

Tests

This guide covers testing practices, test organization, and how to run tests in the Fluree codebase.

Test Organization

Unit Tests

Tests in the same file as code:

// src/query.rs
pub fn execute_query(query: &Query) -> Result<Vec<Solution>> {
    // Implementation
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_execute_query() {
        let query = Query::parse("SELECT ?s WHERE { ?s ?p ?o }").unwrap();
        let results = execute_query(&query).unwrap();
        assert!(!results.is_empty());
    }
}

Integration Tests

Tests in tests/ directory:

// tests/integration_test.rs
use fluree_db_api::{Dataset, query};

#[test]
fn test_query_workflow() {
    let dataset = Dataset::new_memory();

    // Insert data
    dataset.transact(test_data()).unwrap();

    // Query data
    let results = query(&dataset, test_query()).unwrap();

    // Verify
    assert_eq!(results.len(), 5);
}

Example Tests

Tests in examples/:

// examples/basic_query.rs
fn main() -> Result<()> {
    let dataset = Dataset::new_memory();
    dataset.transact(sample_data())?;
    let results = dataset.query(sample_query())?;
    println!("Results: {:?}", results);
    Ok(())
}

Run with:

cargo run --example basic_query

Running Tests

All Tests

cargo test --all

Opt-in LocalStack (S3/DynamoDB) tests

Some AWS/S3 tests are intentionally opt-in and will not run during typical cargo test runs. They require Docker and start LocalStack automatically.

cargo test -p fluree-db-connection --features aws-testcontainers --test aws_testcontainers_test -- --nocapture

Specific Crate

cargo test -p fluree-db-query

Specific Test

cargo test test_query_execution

With Output

cargo test -- --nocapture

Integration Tests Only

cargo test --test '*'

Doc Tests

cargo test --doc

With Nextest (Faster)

cargo nextest run

Writing Tests

Unit Test Example

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_parse_simple_query() {
        let input = r#"{"select": ["?s"], "where": [{"@id": "?s"}]}"#;
        let query = parse_query(input).unwrap();

        assert_eq!(query.select_vars.len(), 1);
        assert_eq!(query.where_patterns.len(), 1);
    }

    #[test]
    fn test_parse_invalid_query() {
        let input = "invalid json";
        let result = parse_query(input);

        assert!(result.is_err());
        assert!(matches!(result.unwrap_err(), ParseError::InvalidJson));
    }
}

Integration Test Example

// tests/it_query.rs
use fluree_db_api::*;

#[tokio::test]
async fn test_basic_query() {
    // Setup
    let dataset = Dataset::new_memory().await.unwrap();

    // Insert test data
    let txn = r#"{
        "@context": {"ex": "http://example.org/ns/"},
        "@graph": [{"@id": "ex:alice", "ex:name": "Alice"}]
    }"#;
    dataset.transact(txn).await.unwrap();

    // Execute query (the ex: prefix must be declared here too)
    let query = r#"{
        "@context": {"ex": "http://example.org/ns/"},
        "from": "test:main",
        "select": ["?name"],
        "where": [{"@id": "?s", "ex:name": "?name"}]
    }"#;
    let results = dataset.query(query).await.unwrap();

    // Verify
    assert_eq!(results.len(), 1);
    assert_eq!(results[0]["name"], "Alice");
}

Async Tests

Use tokio test runtime:

#[tokio::test]
async fn test_async_operation() {
    let result = async_function().await.unwrap();
    assert_eq!(result, expected);
}

Property-Based Tests

Use proptest for property-based testing:

use proptest::prelude::*;

proptest! {
    #[test]
    fn test_parse_roundtrip(s in "\\PC*") {
        let iri = Iri::parse(&s)?;
        let serialized = iri.to_string();
        let reparsed = Iri::parse(&serialized)?;
        prop_assert_eq!(iri, reparsed);
    }
}

Test Fixtures

Test Data

Create reusable test data:

// tests/fixtures/mod.rs
pub fn sample_person_data() -> &'static str {
    r#"{
        "@context": {"ex": "http://example.org/", "schema": "http://schema.org/"},
        "@graph": [
            {"@id": "ex:alice", "@type": "schema:Person", "schema:name": "Alice"},
            {"@id": "ex:bob", "@type": "schema:Person", "schema:name": "Bob"}
        ]
    }"#
}

pub fn sample_query() -> &'static str {
    r#"{
        "@context": {"schema": "http://schema.org/"},
        "select": ["?name"],
        "where": [{"@id": "?p", "schema:name": "?name"}]
    }"#
}

Use in tests:

#[test]
fn test_with_fixtures() {
    let dataset = Dataset::new_memory();
    dataset.transact(fixtures::sample_person_data()).unwrap();
    let results = dataset.query(fixtures::sample_query()).unwrap();
    assert_eq!(results.len(), 2);
}

Test Helpers

// tests/helpers/mod.rs
pub async fn setup_test_dataset() -> Dataset {
    let dataset = Dataset::new_memory().await.unwrap();
    dataset.transact(sample_data()).await.unwrap();
    dataset
}

pub fn assert_query_results(results: &[Solution], expected: &[(&str, &str)]) {
    assert_eq!(results.len(), expected.len());
    for (result, (var, value)) in results.iter().zip(expected) {
        assert_eq!(result.get(var).unwrap().to_string(), *value);
    }
}

Test Categories

Fast Tests

Quick unit tests:

#[test]
fn test_fast_operation() {
    // < 1ms execution
}

Slow Tests

Tests that take longer:

#[test]
#[ignore]  // Ignored by default
fn test_slow_operation() {
    // > 1s execution
}

Run slow tests:

cargo test -- --ignored

Integration Tests

End-to-end workflows:

// tests/it_full_workflow.rs
#[tokio::test]
async fn test_complete_workflow() {
    let dataset = setup_test_dataset().await;

    // Multiple operations
    transact_initial_data(&dataset).await;
    query_and_verify(&dataset).await;
    update_data(&dataset).await;
    query_history(&dataset).await;
}

Benchmarking

Criterion Benchmarks

Create benchmarks:

// benches/query_bench.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use fluree_db_query::*;

fn benchmark_query_execution(c: &mut Criterion) {
    let dataset = setup_benchmark_dataset();
    let query = parse_query(QUERY).unwrap();

    c.bench_function("query execution", |b| {
        b.iter(|| {
            execute_query(black_box(&dataset), black_box(&query))
        });
    });
}

criterion_group!(benches, benchmark_query_execution);
criterion_main!(benches);

Run benchmarks:

cargo bench

Comparison Benchmarks

Compare different approaches:

fn benchmark_approaches(c: &mut Criterion) {
    let mut group = c.benchmark_group("approach_comparison");

    group.bench_function("approach_1", |b| {
        b.iter(|| approach_1(black_box(&data)))
    });

    group.bench_function("approach_2", |b| {
        b.iter(|| approach_2(black_box(&data)))
    });

    group.finish();
}

Continuous Integration

GitHub Actions

Tests run automatically on:

  • Pull requests
  • Commits to main
  • Scheduled (nightly)

Workflow: .github/workflows/test.yml

name: Tests

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - run: cargo test --all
      - run: cargo clippy -- -D warnings
      - run: cargo fmt -- --check

Pre-commit Checks

Run before committing:

#!/bin/bash
# .git/hooks/pre-commit

cargo fmt --check || exit 1
cargo clippy -- -D warnings || exit 1
cargo test --all || exit 1

Make executable:

chmod +x .git/hooks/pre-commit

Test Best Practices

1. Test One Thing

Each test should verify one behavior:

Good:

#![allow(unused)]
fn main() {
#[test]
fn test_query_returns_correct_count() {
    let results = query(&dataset, &query).unwrap();
    assert_eq!(results.len(), 5);
}

#[test]
fn test_query_returns_correct_values() {
    let results = query(&dataset, &query).unwrap();
    assert_eq!(results[0]["name"], "Alice");
}
}

Bad:

#![allow(unused)]
fn main() {
#[test]
fn test_query() {
    let results = query(&dataset, &query).unwrap();
    assert_eq!(results.len(), 5);
    assert_eq!(results[0]["name"], "Alice");
    assert_eq!(results[1]["name"], "Bob");
    // Too many assertions
}
}

2. Use Descriptive Names

#![allow(unused)]
fn main() {
#[test]
fn test_query_with_filter_returns_only_matching_results() {
    // Clear what's being tested
}
}

3. Arrange-Act-Assert

Structure tests clearly:

#![allow(unused)]
fn main() {
#[test]
fn test_example() {
    // Arrange: Setup
    let dataset = setup_test_dataset();
    let query = parse_query(TEST_QUERY).unwrap();
    
    // Act: Execute
    let results = execute_query(&dataset, &query).unwrap();
    
    // Assert: Verify
    assert_eq!(results.len(), 3);
}
}

4. Test Error Cases

#![allow(unused)]
fn main() {
#[test]
fn test_invalid_query_returns_error() {
    let result = parse_query("invalid");
    assert!(result.is_err());
}

#[tokio::test]
async fn test_missing_ledger_returns_ledger_not_found() {
    let result = fluree.ledger("nonexistent:main").await;
    assert!(matches!(result.unwrap_err(), Error::LedgerNotFound(_)));
}
}

5. Avoid Flaky Tests

Don’t depend on:

  • Timing
  • Random values (use seeded RNG)
  • External services
  • File system state
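
Seeding deserves a concrete illustration. Below is a dependency-free sketch using a hand-rolled xorshift generator; in a real test you would more likely seed rand's StdRng (e.g. StdRng::seed_from_u64), but the principle is identical: the same seed in gives the same sequence out, so a failure reproduces exactly.

```rust
/// Tiny xorshift64 PRNG — deterministic for a given (nonzero) seed,
/// no external crates. Illustrative only; a real test would typically
/// use a seeded rand::rngs::StdRng instead.
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

fn main() {
    // Same seed → same sequence → reproducible test data.
    let mut a = XorShift64(42);
    let mut b = XorShift64(42);
    let run_a: Vec<u64> = (0..5).map(|_| a.next()).collect();
    let run_b: Vec<u64> = (0..5).map(|_| b.next()).collect();
    assert_eq!(run_a, run_b, "seeded RNG must be deterministic");
    println!("{run_a:?}");
}
```

If a seeded test fails in CI, the seed in the test source is all you need to replay the exact same inputs locally.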

6. Clean Up Resources

#![allow(unused)]
fn main() {
#[test]
fn test_with_temp_file() {
    let temp_dir = tempfile::tempdir().unwrap();
    let file_path = temp_dir.path().join("test.db");
    
    // Test with file_path
    
    // temp_dir automatically cleaned up
}
}

7. Use Test Utilities

#![allow(unused)]
fn main() {
// tests/common/mod.rs
pub fn assert_solution_contains(solutions: &[Solution], var: &str, value: &str) {
    let found = solutions.iter().any(|s| {
        s.get(var).map(|v| v.to_string() == value).unwrap_or(false)
    });
    assert!(found, "Expected to find {}={} in results", var, value);
}
}

W3C SPARQL Compliance Tests

The testsuite-sparql crate runs official W3C SPARQL test cases against Fluree’s parser and query engine. Tests are discovered automatically from W3C manifest files — zero hand-written test cases.

# Run all W3C SPARQL tests (the crate is excluded from the
# Cargo workspace, so run from inside testsuite-sparql/)
cd testsuite-sparql
cargo test

# Run with verbose output
cargo test -- --nocapture 2>&1

The suite covers SPARQL 1.0 and 1.1 syntax tests (293 tests) plus query evaluation tests across 12 categories (327 tests). Eval tests are #[ignore]’d by default — run with --include-ignored or via make test-eval in testsuite-sparql/.

For the full guide on interpreting results, debugging failures, and contributing fixes, see the W3C SPARQL Compliance Suite guide.

Test Coverage

Generate Coverage Report

Using tarpaulin:

cargo install cargo-tarpaulin

cargo tarpaulin --out Html --output-dir coverage/

View: coverage/index.html

Coverage Goals

  • Core functionality: 90%+ coverage
  • Edge cases: Tested
  • Error paths: Tested
  • Public APIs: 100% covered

W3C SPARQL Compliance Test Suite

The testsuite-sparql crate runs official W3C SPARQL test cases against Fluree’s parser and query engine. Every test is discovered automatically from W3C manifest files — there are zero hand-written test cases.

This guide covers how to run the suite, interpret results, and turn failures into fixes.

Why This Exists

The W3C publishes its SPARQL test suite as RDF data. Each manifest.ttl file declares test entries: a query file, optional input data, and expected results. Every serious SPARQL implementation (Oxigraph, Apache Jena, Eclipse RDF4J) runs these manifests programmatically. We do the same.

The ratio is extraordinary: ~700 lines of Rust infrastructure drive 700+ W3C test cases. Each failure is a spec-backed bug report with built-in test data and expected results.

Philosophy: failures are features. When a test fails, the default response is to fix Fluree, not skip the test. Skip entries are reserved for documented, deliberate design divergences reviewed by the team.

Quick Start

Important: The testsuite-sparql crate is excluded from the Cargo workspace (see root Cargo.toml). You must cd testsuite-sparql/ before running any cargo or make commands. Using cargo test -p testsuite-sparql from the workspace root will fail.

All commands below assume you are already in testsuite-sparql/.

Run All Tests

cd testsuite-sparql
cargo test

This runs all non-ignored W3C test suites. Currently that includes SPARQL 1.0 and 1.1 syntax tests. Query evaluation tests (12 categories, 327 tests) are registered but #[ignore]’d — run them with --include-ignored or via the Makefile.

Run a Specific Suite

# SPARQL 1.1 syntax only
cargo test sparql11_syntax_query_tests

# SPARQL 1.0 syntax only
cargo test sparql10_syntax_tests

# Full query evaluation (~5 min, includes all 12 categories)
cargo test sparql11_query_w3c_testsuite -- --include-ignored

# Single evaluation category
cargo test sparql11_functions -- --include-ignored

Run With Verbose Output

cargo test -- --nocapture 2>&1

The suite writes progress to stderr (Running test N: <test_id> ...) and a summary at the end.

Using the Makefile

The testsuite-sparql/Makefile provides convenience targets:

# --- Running tests ---
make test              # Run syntax tests (live output)
make test-syntax11     # SPARQL 1.1 syntax tests only
make test-syntax10     # SPARQL 1.0 syntax tests only
make test-eval         # Full eval suite, all 12 categories
make test-eval-cat CAT=functions
                       # Run one eval category
make test-eval10       # Run SPARQL 1.0 eval tests

# --- Reports ---
make count-eval        # Quick pass/fail counts for eval tests
make report-eval-json  # JSON report for 1.1 eval → report-eval.json
make report-10-json    # JSON report for 1.0 eval → report-10.json
make cat-json CAT=functions
                       # JSON report for a single category

# --- Analysis (requires report-eval.json) ---
make summary           # Per-category pass/fail breakdown
make classify          # Group failures by error type
make failures-eval     # List all eval failures with type
make failures-eval CAT=functions
                       # Filter failures to one category

# --- Investigating specific tests ---
make investigate-eval TEST=substring01
                       # Search eval report for a test
make show-query TEST=syntax-select-expr-04.rq
                       # Print the .rq file for a test
make clean             # Remove generated report files

Understanding the Output

Test Summary

After running, the suite prints:

=== Test Summary ===
Total:   94
Passed:  79
Ignored: 0
Failed:  15

  • Total: Number of W3C test cases discovered from manifest files
  • Passed: Tests where Fluree’s behavior matched the W3C expectation
  • Ignored: Tests in the skip list (should be near zero)
  • Failed: Tests where Fluree diverged from the spec — these are bugs or gaps

Failure Messages

Each failure includes the test ID, type, and error details:

https://w3c.github.io/rdf-tests/sparql/sparql11/syntax-query/manifest.ttl#test_34:
  Positive syntax test failed — parser rejected valid query.
  Test: ...#test_34
  File: .../syntax-query/syntax-select-expr-04.rq

For syntax tests, failures fall into three categories:

| Failure Type | What It Means | Example |
|---|---|---|
| Positive test fails | Parser rejects valid SPARQL | Missing feature (subqueries, property path \|) |
| Negative test fails | Parser accepts invalid SPARQL | Missing validation (BIND scope, GROUP BY scope) |
| Parser timeout | Parser enters infinite loop | Bug in grammar handling (mitigated by safety-net forward-progress check) |

Test IDs

Every test has a unique IRI like:

https://w3c.github.io/rdf-tests/sparql/sparql11/syntax-query/manifest.ttl#test_34

The fragment (#test_34) identifies the specific test within that manifest. The path tells you the W3C category (syntax-query, aggregates, bind, etc.).
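
For scripting or log triage it can help to split an ID into those two parts. A small illustrative helper (not part of the harness):

```rust
/// Split a W3C test IRI into (manifest URL, test name).
/// Illustrative helper — not part of the testsuite-sparql API.
fn split_test_id(iri: &str) -> (&str, &str) {
    match iri.split_once('#') {
        Some((manifest, fragment)) => (manifest, fragment),
        None => (iri, ""),
    }
}

fn main() {
    let iri = "https://w3c.github.io/rdf-tests/sparql/sparql11/syntax-query/manifest.ttl#test_34";
    let (manifest, test) = split_test_id(iri);
    // The directory containing manifest.ttl is the W3C category.
    let category = manifest.rsplit('/').nth(1).unwrap_or("");
    assert_eq!(test, "test_34");
    assert_eq!(category, "syntax-query");
}
```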

Analyzing Results

Per-Category Breakdown

Use make summary to see pass/fail rates by W3C category:

make summary

This requires report-eval.json (generated automatically if missing). Output looks like:

Category                  Pass  Fail Total    Rate
----------------------------------------------------
syntax-query                80    14    94     85%
subquery                     8     6    14     57%
functions                   27    48    75     36%
...
----------------------------------------------------
TOTAL                      167   160   327   51.1%

Error Classification

Use make classify to group failures by root cause:

make classify

Error types:

  • RESULT MISMATCH — Query runs but returns wrong values
  • INTERNAL ERROR — Execution fails with an internal error
  • PARSE/LOWERING — SPARQL parsing or IR lowering fails
  • NEGATIVE SYNTAX — Parser accepts a query it should reject
  • POSITIVE SYNTAX — Parser rejects a query it should accept
  • EMPTY RESULTS — Query returns no results when some were expected
  • NOT IMPLEMENTED — Feature not yet implemented
  • PANIC — Subprocess crashed (usually an index/unwrap bug)
  • TIMEOUT — Test exceeded 5s (syntax) or 10s (eval) timeout

Listing Failures

Use make failures-eval to list all failures with their type and first error line:

make failures-eval               # All failures
make failures-eval CAT=functions # Just one category

JSON Reports

For programmatic analysis, generate a JSON report:

make report-eval-json    # → report-eval.json
make report-10-json      # → report-10.json
make cat-json CAT=bind   # → report-bind.json

Report format:

{
  "total": 327, "passed": 167, "failed": 160, "pass_rate": "51.1%",
  "tests": [
    { "test_id": "http://...#agg01", "status": "pass", "error": null, "timeout": false },
    { "test_id": "http://...#agg02", "status": "fail", "error": "Results not isomorphic...", "timeout": false }
  ]
}

The analysis script at scripts/analyze_report.py can also be used directly:

python3 scripts/analyze_report.py summary report-eval.json
python3 scripts/analyze_report.py classify report-eval.json
python3 scripts/analyze_report.py failures report-eval.json --category functions

From Failure to Fix: The Workflow

Step 1: Identify the Failure Category

Run the suite and look at the failure message:

cargo test sparql11_syntax_query_tests -- --nocapture 2>&1 | tail -40

Determine which category:

  • Parser timeout → Bug in fluree-db-sparql grammar rules causing infinite loop (mitigated by safety-net forward-progress check in parse_group_graph_pattern(), but can still occur in other parse entry points)
  • Positive syntax rejected → Missing parser feature or incorrect grammar
  • Negative syntax accepted → Missing semantic validation pass
  • Query evaluation mismatch → Bug in query engine, data loading, or result formatting

Step 2: Find the Test Query

Every W3C test references a .rq (query) or .ru (update) file. The failure message includes the file URL. Map it to a local path:

URL:   https://w3c.github.io/rdf-tests/sparql/sparql11/syntax-query/syntax-select-expr-04.rq
Local: testsuite-sparql/rdf-tests/sparql/sparql11/syntax-query/syntax-select-expr-04.rq

The pattern: strip https://w3c.github.io/rdf-tests/ and prepend testsuite-sparql/rdf-tests/.
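
That rewrite rule can be sketched in a few lines. This is an illustrative helper only; the harness's actual URL-to-path mapping lives in testsuite-sparql/src/files.rs:

```rust
/// Map a W3C test file URL to its local path in the rdf-tests submodule.
/// Illustrative sketch of the rule described above, not harness code.
fn url_to_local(url: &str) -> Option<String> {
    let rest = url.strip_prefix("https://w3c.github.io/rdf-tests/")?;
    Some(format!("testsuite-sparql/rdf-tests/{rest}"))
}

fn main() {
    let url = "https://w3c.github.io/rdf-tests/sparql/sparql11/syntax-query/syntax-select-expr-04.rq";
    assert_eq!(
        url_to_local(url).as_deref(),
        Some("testsuite-sparql/rdf-tests/sparql/sparql11/syntax-query/syntax-select-expr-04.rq")
    );
}
```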

Read the query to understand what SPARQL feature is being tested:

cat testsuite-sparql/rdf-tests/sparql/sparql11/syntax-query/syntax-select-expr-04.rq

Step 3: Reproduce in Isolation

Try parsing the query directly to see the exact error:

#![allow(unused)]
fn main() {
// Quick test in fluree-db-sparql
let output = fluree_db_sparql::parse_sparql("SELECT (1 + ?x AS ?y) WHERE { ?x ?p ?o }");
println!("has_errors: {}", output.has_errors());
for err in output.errors() {
    println!("  error: {err:?}");
}
}

If you suspect an infinite loop, the subprocess timeout will catch it automatically when run via the harness.

Step 4: Investigate the Root Cause

For parser issues, the relevant code is in fluree-db-sparql/. Start with:

  • src/parser/ — Grammar rules and parser combinators
  • src/ast/ — AST types the parser emits

For query evaluation issues, the chain is:

  1. fluree-db-sparql → parses to SparqlAst
  2. fluree-db-query → evaluates the AST against a ledger
  3. fluree-db-api → orchestrates ledger creation and query execution

Step 5: Create an Issue

Use this template:

## W3C SPARQL Compliance: [short description]

**Test ID:** `https://w3c.github.io/rdf-tests/sparql/sparql11/[category]/manifest.ttl#[test_name]`
**Category:** [syntax-query | aggregates | bind | etc.]
**Failure type:** [parser timeout | positive syntax rejected | negative syntax accepted | evaluation mismatch]

### Test Query

\`\`\`sparql
[paste the .rq file contents]
\`\`\`

### Expected Behavior

[For positive syntax: should parse successfully]
[For negative syntax: should be rejected]
[For evaluation: expected results from the .srx/.srj file]

### Actual Behavior

[Error message or incorrect output]

### Root Cause Analysis

[What part of the code needs to change and why]

### W3C Spec Reference

[Link to relevant section of https://www.w3.org/TR/sparql11-query/]

Step 6: Fix and Verify

After making code changes:

# Verify the specific test passes (from testsuite-sparql/)
cargo test sparql11_syntax_query_tests -- --nocapture 2>&1 | grep "test_34"

# Verify you haven't regressed other tests
make count-eval

# Run the parser's own tests (from workspace root)
cd .. && cargo test -p fluree-db-sparql

# Full CI parity check
cargo clippy -p fluree-db-sparql --all-features -- -D warnings

Using Claude Code for Debugging

Claude Code is particularly effective for SPARQL compliance work because each failure is self-contained: a query file, an expected behavior, and a specific error. Here’s how to give a session full context.

Prompt Template for Parser Failures

I'm working on W3C SPARQL compliance in Fluree. The following test is failing:

Test ID: https://w3c.github.io/rdf-tests/sparql/sparql11/syntax-query/manifest.ttl#test_34
Category: Positive syntax test (parser should accept this query but rejects it)

The query file is at: testsuite-sparql/rdf-tests/sparql/sparql11/syntax-query/syntax-select-expr-04.rq

The SPARQL parser is in fluree-db-sparql/. The parse entry point is
`parse_sparql()` which returns `ParseOutput<SparqlAst>` — check `has_errors()`.

Please:
1. Read the failing query file
2. Understand what SPARQL feature it tests
3. Find the relevant parser grammar in fluree-db-sparql/src/parser/
4. Identify why the parser rejects this input
5. Propose a fix

Prompt Template for Query Evaluation Failures

I'm working on W3C SPARQL compliance. This query evaluation test is failing:

Test ID: https://w3c.github.io/rdf-tests/sparql/sparql11/aggregates/manifest.ttl#agg01
Test data: testsuite-sparql/rdf-tests/sparql/sparql11/aggregates/agg01.ttl
Query file: testsuite-sparql/rdf-tests/sparql/sparql11/aggregates/agg01.rq
Expected results: testsuite-sparql/rdf-tests/sparql/sparql11/aggregates/agg01.srx

The test harness creates an in-memory Fluree ledger, loads the data via
stage_owned().insert_turtle(), executes the query via query_sparql(), and
compares results.

Actual output: [paste actual output]
Expected output: [paste expected from .srx file]

Please investigate why the results differ and propose a fix.

Key Files to Reference

When asking Claude Code for help, these files provide essential context:

| Context Needed | File(s) |
|---|---|
| Test harness architecture | testsuite-sparql/src/lib.rs, src/evaluator.rs |
| Subprocess timeout isolation | testsuite-sparql/src/subprocess.rs |
| Subprocess worker binary | testsuite-sparql/src/bin/run_w3c_test.rs |
| How manifests are parsed | testsuite-sparql/src/manifest.rs |
| Syntax test handlers | testsuite-sparql/src/sparql_handlers.rs |
| Eval test handler (data load + query + compare) | testsuite-sparql/src/query_handler.rs |
| Expected result parsing (.srx/.srj) | testsuite-sparql/src/result_format.rs |
| Isomorphic result comparison | testsuite-sparql/src/result_comparison.rs |
| SPARQL parser entry point | fluree-db-sparql/src/lib.rs (parse_sparql()) |
| Parser grammar rules | fluree-db-sparql/src/parser/ |
| SPARQL AST types | fluree-db-sparql/src/ast/ |
| Query engine | fluree-db-query/src/ |
| API orchestration | fluree-db-api/src/ |
| W3C SPARQL test categories | testsuite-sparql/tests/w3c_sparql.rs |

Batch Processing Tips

When multiple tests fail for the same root cause (e.g., “all BIND tests timeout”), group them:

These 3 tests all timeout in the parser on BIND expressions:
- test_34: SELECT (1 + ?x AS ?y)
- test_40: SELECT (CONCAT(?x, "!") AS ?label)
- test_65: subquery with SELECT expression

All are in testsuite-sparql/rdf-tests/sparql/sparql11/syntax-query/.

The parser code for BIND is in fluree-db-sparql/src/parser/. Please find the
common root cause and fix all three.

JSON-LD Query Parity

SPARQL and JSON-LD queries in Fluree compile to the same intermediate representation (fluree-db-query/src/ir.rs) and share the entire execution engine. This means:

  1. Shared code changes affect both languages. If you add a new Expression variant, Pattern variant, or AggregateFn for SPARQL, it automatically becomes available to JSON-LD query as well. Ensure JSON-LD tests still pass.

  2. New SPARQL features may need JSON-LD test coverage. If a feature you’re implementing for SPARQL compliance (e.g., a new built-in function, a new filter operator) is also expressible in JSON-LD query syntax, add corresponding JSON-LD integration tests.

  3. Some features are SPARQL-only. Property paths, RDF-star, ASK query form, and SPARQL Update don’t have JSON-LD equivalents. These don’t require parity testing.

Where to add parity tests

| Language | Test files |
|---|---|
| SPARQL | fluree-db-api/tests/it_query_sparql.rs |
| JSON-LD | fluree-db-api/tests/it_query.rs, it_query_analytical.rs, it_query_grouping.rs |
| Shared | Unit tests in fluree-db-query/src/ modules |

Validation after shared-code changes

# SPARQL W3C tests (from testsuite-sparql/)
make test-eval-cat CAT=<category>

# JSON-LD query tests (from workspace root)
cargo test -p fluree-db-api --test it_query
cargo test -p fluree-db-api --test it_query_analytical

Architecture Overview

Crate Structure

testsuite-sparql/
├── Cargo.toml                      # Excluded from workspace, publish = false
├── Makefile                        # Developer convenience targets
├── scripts/
│   └── analyze_report.py           # JSON report analysis (summary, classify, failures)
├── src/
│   ├── lib.rs                      # check_testsuite() entry point
│   ├── vocab.rs                    # W3C namespace constants (mf:, qt:, etc.)
│   ├── files.rs                    # URL → local file path mapping
│   ├── manifest.rs                 # TestManifest: Iterator<Item=Test>
│   ├── evaluator.rs                # TestEvaluator: type → handler dispatch
│   ├── sparql_handlers.rs          # Handler registration (syntax + eval)
│   ├── query_handler.rs            # QueryEvaluationTest: load data, run query, compare
│   ├── subprocess.rs               # Subprocess isolation for timeout enforcement
│   ├── result_format.rs            # Parse .srx/.srj expected result files
│   ├── result_comparison.rs        # Isomorphic result comparison (blank node mapping)
│   ├── report.rs                   # JSON report generation
│   └── bin/
│       └── run_w3c_test.rs         # Subprocess worker binary
├── tests/
│   └── w3c_sparql.rs               # Test entry points (syntax + 12 eval categories)
└── rdf-tests/                      # Git submodule → github.com/w3c/rdf-tests

How It Works

1. Manifest Parsing (manifest.rs): TestManifest implements Iterator<Item = Result<Test>>. It loads manifest.ttl files using Fluree’s own Turtle parser, follows mf:include links recursively, and extracts per-test metadata: type, query file, data file, expected results.

2. Handler Dispatch (evaluator.rs): TestEvaluator maps test type URIs (e.g., mf:PositiveSyntaxTest11) to handler functions. For each test, it finds the matching handler and invokes it.

3. SPARQL Handlers (sparql_handlers.rs + query_handler.rs): The Fluree-specific logic. Both syntax and evaluation tests run in isolated subprocesses via the run-w3c-test binary (subprocess.rs). For syntax tests, the subprocess calls parse_sparql() + validate() and reports whether errors were found (5-second timeout). For evaluation tests, the subprocess creates an in-memory Fluree ledger, loads Turtle test data, executes the SPARQL query, and compares results against expected .srx/.srj files using isomorphic matching (10-second timeout). If a test exceeds its timeout, the parent kills the child process — no zombie threads.

4. Test Entry Points (tests/w3c_sparql.rs): Each test function is ~5 lines — just a manifest URL and a skip list. The harness does the rest.
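
The isomorphic comparison mentioned in step 3 is worth a sketch: two result sets match if some consistent renaming of blank nodes makes them equal. Below is a deliberately simplified toy version for a single aligned row; the real comparator in src/result_comparison.rs matches whole solution multisets and can backtrack across candidate row pairings, which this sketch does not.

```rust
use std::collections::HashMap;

/// Check whether `actual` matches `expected` under a growing blank-node
/// mapping. Simplified: compares two aligned rows only — the real
/// comparator matches entire solution multisets with backtracking.
fn row_matches(
    expected: &[(&str, &str)],
    actual: &[(&str, &str)],
    map: &mut HashMap<String, String>,
) -> bool {
    if expected.len() != actual.len() {
        return false;
    }
    for ((ev, et), (av, at)) in expected.iter().zip(actual) {
        if ev != av {
            return false;
        }
        if et.starts_with("_:") && at.starts_with("_:") {
            // Blank nodes must map consistently, not match literally.
            match map.get(*et) {
                Some(mapped) if mapped != at => return false,
                Some(_) => {}
                None => {
                    map.insert(et.to_string(), at.to_string());
                }
            }
        } else if et != at {
            return false;
        }
    }
    true
}

fn main() {
    let expected = [("x", "_:b0"), ("name", "\"Alice\"")];
    let actual = [("x", "_:gen42"), ("name", "\"Alice\"")];
    let mut map = HashMap::new();
    assert!(row_matches(&expected, &actual, &mut map));
    assert_eq!(map.get("_:b0").map(String::as_str), Some("_:gen42"));
}
```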

Key Design Decisions

  • Subprocess isolation for all test execution. Each syntax and eval test runs in a child process (run-w3c-test binary) that can be killed on timeout. This prevents zombie threads from parser infinite loops or runaway queries.
  • Syntax timeout: 5 seconds, eval timeout: 10 seconds. If a test exceeds its limit, the subprocess is killed and the test is marked as a timeout failure.
  • Uses Fluree’s own Turtle parser for manifest files. If our parser can’t handle well-formed W3C manifests, that’s a bug worth knowing about.
  • Fluree’s list_index approach (instead of rdf:first/rdf:rest) simplifies manifest list handling.
  • @base prepended to manifest files since they use <> (empty relative IRI) which requires a base.
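
The kill-on-timeout behavior can be sketched with only the standard library. This illustrates the pattern, not the code in src/subprocess.rs, which also captures the child's output and exit status:

```rust
use std::process::{Child, Command, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Poll a child process, killing it if it outlives `timeout`.
/// Returns Some(status) on normal exit, None on timeout.
/// Simplified sketch — the real harness also captures stdout/stderr.
fn run_with_timeout(mut child: Child, timeout: Duration) -> Option<ExitStatus> {
    let start = Instant::now();
    loop {
        if let Some(status) = child.try_wait().expect("wait failed") {
            return Some(status);
        }
        if start.elapsed() >= timeout {
            let _ = child.kill();
            let _ = child.wait(); // reap — no zombie processes
            return None;
        }
        thread::sleep(Duration::from_millis(25));
    }
}

fn main() {
    // A process that outlives its budget gets killed…
    let slow = Command::new("sleep").arg("5").spawn().expect("spawn");
    assert!(run_with_timeout(slow, Duration::from_millis(200)).is_none());

    // …while a fast one reports its exit status normally.
    let fast = Command::new("true").spawn().expect("spawn");
    assert!(run_with_timeout(fast, Duration::from_secs(5)).is_some());
}
```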

Test Categories

Syntax Tests (Phase 1)

| Suite | What It Tests | Manifest |
|---|---|---|
| SPARQL 1.1 syntax | Parser correctness for SPARQL 1.1 grammar | syntax-query/manifest.ttl |
| SPARQL 1.0 syntax | Backward compatibility with SPARQL 1.0 | manifest-syntax.ttl |

Query Evaluation Tests (Phase 2)

Each test creates an in-memory Fluree ledger, loads RDF data, executes a SPARQL query, and compares results against W3C expected outputs. Run with make test-eval-cat CAT=<name>.

| Suite | What It Tests | Manifest |
|---|---|---|
| Aggregates | COUNT, SUM, AVG, MIN, MAX, GROUP_CONCAT, SAMPLE | aggregates/manifest.ttl |
| BIND | BIND expressions, variable assignment | bind/manifest.ttl |
| Bindings | VALUES inline data | bindings/manifest.ttl |
| Cast | xsd:integer(), xsd:double(), xsd:string() | cast/manifest.ttl |
| Construct | CONSTRUCT query form | construct/manifest.ttl |
| Exists | FILTER EXISTS, FILTER NOT EXISTS | exists/manifest.ttl |
| Functions | String, numeric, date/time, hash, IRI functions | functions/manifest.ttl |
| Grouping | GROUP BY semantics, error handling | grouping/manifest.ttl |
| Negation | MINUS, NOT EXISTS | negation/manifest.ttl |
| Project-Expression | SELECT expressions, AS aliases | project-expression/manifest.ttl |
| Property-Path | /, \|, ^, +, *, ? operators | property-path/manifest.ttl |
| Subquery | Nested SELECT within WHERE | subquery/manifest.ttl |

BIND / VALUES Compliance Notes

BIND (10/10 — 100%):

  • Fixed lexer to tokenize +/- as separate operators per the SPARQL spec (INTEGER is unsigned; INTEGER_POSITIVE/INTEGER_NEGATIVE are grammar-level). This fixed ?o+10 being mis-tokenized as Var, Integer(10) instead of Var, Plus, Integer(10).
  • BIND input variable liveness is handled by precompute_suffix_vars (cross-block) and pending_binds.expr.variables() (within-block) in the WHERE planner — no special handling needed in compute_variable_deps.
  • Explicitly nested { } blocks inside WHERE are lowered as anonymous subqueries (SubqueryPattern) to preserve SPARQL scope boundaries (bind10).

VALUES / Bindings (10/11 — 91%):

  • Post-query VALUES (WHERE { ... } VALUES ?x { ... }) is now parsed and lowered. Added values field on SelectQuery AST and post_values field on ParsedQuery to prevent the planner from reordering it relative to OPTIONAL/UNION.
  • NestedLoopJoinOperator::combine_rows fixed to handle Unbound/Poisoned left-side shared variables by falling back to right-side values. This fixes VALUES with UNDEF (values4, values5, values8).
  • ValuesOperator updated to treat Poisoned (from failed OPTIONAL) as wildcard in is_compatible and merge_rows, fixing values7 (OPTIONAL + VALUES).
  • Remaining failure: graph test requires named graph support (GRAPH keyword) — tracked separately.

Managing the Skip List

Skip entries are the ignored_tests parameter in check_testsuite() calls:

#![allow(unused)]
fn main() {
check_testsuite(
    "https://w3c.github.io/rdf-tests/sparql/sparql11/syntax-query/manifest.ttl",
    &[
        // Deliberately accept bare `1` as integer literal (RDF 1.1 vs 1.0)
        // Spec: https://www.w3.org/TR/sparql11-query/#rNumericLiteral
        // Reviewed: 2025-02-15 by @ajohnson, @bsmith
        "https://...#test_99",
    ],
)
}

Rules:

  1. Start with an empty skip list. Expect full compliance.
  2. Only add entries after investigation confirms a deliberate design choice, not a bug.
  3. Every skip entry must have a comment explaining why, linking to the relevant spec section.
  4. Skip entries require review by 2+ team members.
  5. The total skip list should be <5% of tests (Oxigraph skips ~25 out of 700+).
  6. Review skip entries periodically — remove them as features are added.

Updating the rdf-tests Submodule

The W3C test data lives in a git submodule at testsuite-sparql/rdf-tests/. To update to the latest W3C tests:

cd testsuite-sparql/rdf-tests
git pull origin main
cd ../..
git add testsuite-sparql/rdf-tests
git commit -m "chore: update W3C rdf-tests submodule"

After updating, run the full suite to check for new tests or changed expectations:

cd testsuite-sparql
cargo test

SHACL Implementation

This is the contributor-facing guide to how SHACL validation is wired into Fluree. It covers the pipeline, the crate layout, and the places you’ll want to touch when fixing a bug or adding a constraint.

User-facing docs: Cookbook: SHACL Validation and Setting Groups — SHACL.

Pipeline at a glance

Transaction flakes
        │
        ▼
┌─────────────────────────────────────────────────────────────────┐
│ fluree-db-transact :: stage()                                   │
│   stages flakes into a StagedLedger (novelty overlay)           │
└─────────────────────────────────────────────────────────────────┘
        │
        ▼
┌─────────────────────────────────────────────────────────────────┐
│ fluree-db-api :: apply_shacl_policy_to_staged_view()            │
│   (shared post-stage helper — called from every write surface)  │
│                                                                 │
│  1. load_transaction_config(ledger)                             │
│  2. build_per_graph_shacl_policy(config, graph_delta)           │
│     → HashMap<GraphId, ShaclGraphPolicy>                        │
│  3. resolve_shapes_source_g_ids(config, snapshot)               │
│     → Vec<GraphId>  (where to compile shapes from)              │
│  4. ShaclEngine::from_dbs_with_overlay(&[GraphDbRef], ledger)   │
│  5. validate_view_with_shacl(view, cache, ..., per_graph_policy)│
│     → ShaclValidationOutcome { reject, warn }                   │
│  6. log warn bucket; propagate ShaclViolation for reject bucket │
└─────────────────────────────────────────────────────────────────┘

Crate layout

| Crate | Role |
|---|---|
| fluree-db-shacl | SHACL engine: shape compilation, cache, per-node validation, constraint evaluators. No transaction-layer concerns. |
| fluree-db-transact | Staged-validation plumbing: validate_view_with_shacl, validate_staged_nodes. Knows about StagedLedger, staged flakes, and graph routing. Defines the per-graph policy types. |
| fluree-db-api | Config resolution, policy building, and the shared helper that every write surface (JSON-LD, Turtle, commit replay) calls through. |

SHACL is feature-gated (shacl). See Standards and feature flags.

The shared post-stage helper

All SHACL-enforced write surfaces route through apply_shacl_policy_to_staged_view in fluree-db-api/src/tx.rs:

#![allow(unused)]
fn main() {
pub(crate) async fn apply_shacl_policy_to_staged_view(
    view: &StagedLedger,
    ctx: StagedShaclContext<'_>,
) -> Result<(), TransactError>
}

StagedShaclContext carries everything that varies between call sites:

| Field | Populated by JSON-LD txn | Populated by Turtle insert | Populated by commit replay |
|---|---|---|---|
| graph_delta | Some(&txn.graph_delta) (IRIs) | None | Some(&routing.graph_iris) |
| graph_sids | Some(&graph_sids) | None | Some(&routing.graph_sids) |
| tracker | options.tracker | None | None |

Why not fold this into fluree-db-transact? Config resolution (three-tier merge, override control, per-graph lookup) is API-layer policy, not a staging primitive. Keeping the helper in tx.rs lets fluree-db-transact stay focused on staging mechanics.

Call sites:

  • fluree-db-api/src/tx.rs::stage_with_config_shacl (JSON-LD / SPARQL UPDATE txns)
  • fluree-db-api/src/tx.rs::stage_turtle_insert (plain Turtle)
  • fluree-db-api/src/commit_transfer.rs (push / replay)

Config resolution

Ledger-wide and per-graph policy

build_per_graph_shacl_policy(config, graph_delta) returns Option<HashMap<GraphId, ShaclGraphPolicy>>:

  • Graphs absent from the map are disabled — their staged subjects are skipped by the validator.
  • ShaclGraphPolicy { mode: ValidationMode } controls warn vs reject for that graph.
  • The default graph (g_id=0) always gets the ledger-wide resolved policy when SHACL is enabled.
  • Every graph in graph_delta is resolved independently via config_resolver::resolve_effective_config(config, Some(graph_iri)), which applies the three-tier merge (query-time → per-graph → ledger-wide) under override-control rules.
  • Returns None when every graph resolves to disabled → the helper short-circuits before building the SHACL engine.
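
The precedence itself reduces to option chaining. Here is a toy sketch with hypothetical field and type names; the real resolve_effective_config also enforces override-control rules that decide whether a lower tier may be overridden at all:

```rust
/// Hypothetical, simplified config tiers — field and type names are
/// illustrative, not the actual fluree-db-api types.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ValidationMode { Warn, Reject }

#[derive(Clone, Copy)]
struct Tier { shacl_mode: Option<ValidationMode> }

/// Highest-priority tier that sets a value wins:
/// query-time → per-graph → ledger-wide → hard default (assumed Reject here).
fn resolve_mode(query_time: &Tier, per_graph: &Tier, ledger_wide: &Tier) -> ValidationMode {
    query_time.shacl_mode
        .or(per_graph.shacl_mode)
        .or(ledger_wide.shacl_mode)
        .unwrap_or(ValidationMode::Reject)
}

fn main() {
    let ledger = Tier { shacl_mode: Some(ValidationMode::Reject) };
    let graph = Tier { shacl_mode: Some(ValidationMode::Warn) };
    let unset = Tier { shacl_mode: None };

    // Per-graph overrides ledger-wide when query-time is silent.
    assert_eq!(resolve_mode(&unset, &graph, &ledger), ValidationMode::Warn);
    // With nothing set anywhere, fall back to the default.
    assert_eq!(resolve_mode(&unset, &unset, &unset), ValidationMode::Reject);
}
```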

The transact layer’s validate_view_with_shacl signature:

#![allow(unused)]
fn main() {
pub async fn validate_view_with_shacl(
    view: &StagedLedger,
    shacl_cache: &ShaclCache,
    graph_sids: Option<&HashMap<GraphId, Sid>>,
    tracker: Option<&Tracker>,
    per_graph_policy: Option<&HashMap<GraphId, ShaclGraphPolicy>>,
) -> Result<ShaclValidationOutcome>
}

  • per_graph_policy = None: treat every graph with staged flakes as Reject (legacy / shapes-exist-heuristic path).
  • per_graph_policy = Some(map): only graphs in the map participate; their mode drives the warn/reject split.

Output:

#![allow(unused)]
fn main() {
pub struct ShaclValidationOutcome {
    pub reject_violations: Vec<ValidationResult>,
    pub warn_violations: Vec<ValidationResult>,
}
}

The API helper logs the warn bucket and returns TransactError::ShaclViolation for the reject bucket.
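
The warn/reject split is a straightforward partition by each graph's mode. A sketch with hypothetical simplified types; the real outcome carries ValidationResult values routed through ShaclGraphPolicy:

```rust
/// Hypothetical simplified types for illustration only.
#[derive(Clone, Copy, PartialEq)]
enum Mode { Warn, Reject }

struct Violation { graph: u32, msg: &'static str }

/// Split violations into (reject, warn) buckets by the owning
/// graph's resolved validation mode.
fn split(violations: Vec<Violation>, mode_of: impl Fn(u32) -> Mode)
    -> (Vec<Violation>, Vec<Violation>)
{
    violations.into_iter().partition(|v| mode_of(v.graph) == Mode::Reject)
}

fn main() {
    let vs = vec![
        Violation { graph: 0, msg: "missing sh:name" },
        Violation { graph: 7, msg: "bad datatype" },
    ];
    // Graph 0 rejects, graph 7 only warns.
    let (reject, warn) = split(vs, |g| if g == 0 { Mode::Reject } else { Mode::Warn });
    assert_eq!(reject.len(), 1);
    assert_eq!(warn.len(), 1);
    assert_eq!(warn[0].msg, "bad datatype");
}
```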

f:shapesSource resolution

resolve_shapes_source_g_ids(config, snapshot) in tx.rs is the sibling of policy_builder::resolve_policy_source_g_ids — identical shape, different namespace. Both:

  1. Start with [0] (default graph) when the source field is unset.
  2. Map f:defaultGraph → [0].
  3. Map a named graph IRI to its registered GraphId via snapshot.graph_registry.graph_id_for_iri.
  4. Reject unsupported dimensions: f:atT, f:trustPolicy, f:rollbackGuard, cross-ledger f:ledger (these surface as TransactError::Parse).

f:shapesSource is authoritative, not additive — when set, shapes come exclusively from the configured graph(s). It’s intentionally non-overridable at query/txn time; it can only be changed via a config-graph transaction.

Shape compilation from multiple graphs

ShapeCompiler::compile_from_dbs(&[GraphDbRef]) in fluree-db-shacl/src/compile.rs scans each input graph for every SHACL predicate (see the shacl_predicates list), accumulates into a single ShapeCompiler, then finalizes. Cross-graph sh:and / sh:or / sh:xone / sh:in list references still resolve because finalization runs once after all graphs are consumed.
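
The accumulate-then-finalize shape is what makes those cross-graph references work: resolution is deferred until every graph has been scanned. In miniature, with hypothetical types rather than the real ShapeCompiler API:

```rust
use std::collections::HashMap;

/// Miniature accumulate-then-finalize builder mirroring the pattern
/// described above. Types are illustrative, not the real ShapeCompiler.
#[derive(Default)]
struct Builder {
    defs: HashMap<String, String>,
    refs: Vec<(String, String)>, // (shape, referenced shape)
}

impl Builder {
    /// Scan one input graph, accumulating its shape definitions.
    fn scan(&mut self, graph: &[(&str, &str)]) {
        for (name, body) in graph {
            self.defs.insert(name.to_string(), body.to_string());
        }
    }
    fn add_ref(&mut self, from: &str, to: &str) {
        self.refs.push((from.to_string(), to.to_string()));
    }
    /// Finalize runs once, after *all* graphs were scanned, so a reference
    /// recorded while scanning graph A can resolve to a shape in graph B.
    fn finalize(self) -> Result<usize, String> {
        for (from, to) in &self.refs {
            if !self.defs.contains_key(to) {
                return Err(format!("{from} references unknown shape {to}"));
            }
        }
        Ok(self.defs.len())
    }
}

fn main() {
    let mut b = Builder::default();
    b.scan(&[("ShapeA", "...")]);    // graph 1
    b.add_ref("ShapeA", "ShapeB");   // cross-graph reference
    b.scan(&[("ShapeB", "...")]);    // graph 2 supplies the target
    assert_eq!(b.finalize(), Ok(2)); // resolves because finalize runs last
}
```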

ShaclEngine::from_dbs_with_overlay(&[GraphDbRef], ledger_id) is the corresponding engine constructor. from_db_with_overlay(db, ledger_id) is a single-graph convenience that delegates to the multi-graph path via slice::from_ref(&db).

The engine’s SchemaHierarchy is taken from the first graph’s snapshot — hierarchy is schema-level and not graph-scoped.

Target-type resolution

The cache (fluree-db-shacl/src/cache.rs) holds four indexes:

| Field | Keyed by | Used for |
|---|---|---|
| by_target_class | class Sid (with rdfs:subClassOf* expansion) | sh:targetClass |
| by_target_node | subject Sid | sh:targetNode |
| by_target_subjects_of | predicate Sid | sh:targetSubjectsOf |
| by_target_objects_of | predicate Sid | sh:targetObjectsOf |

ShaclEngine::validate_node assembles applicable shapes for a focus node by:

  1. shapes_for_node(focus) — O(1) hashmap hit.
  2. shapes_for_class(type) for each of the focus’s rdf:type values — O(1) per type.
  3. For each key p in by_target_subjects_of: existence check db.range(SPOT, s=focus, p=p) — if non-empty, shape applies.
  4. For each key p in by_target_objects_of: existence check db.range(OPST, p=p, o=focus) — if non-empty, shape applies.
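A toy version of that assembly (`ShapeCache`, `Db`, and the linear scans below are stand-ins for the real target indexes and the SPOT/OPST range reads):

```rust
use std::collections::{HashMap, HashSet};

type Sid = u64; // subject / predicate / class id
type ShapeId = u32;

// Stand-in for the four target indexes in the shape cache.
struct ShapeCache {
    by_target_node: HashMap<Sid, Vec<ShapeId>>,
    by_target_class: HashMap<Sid, Vec<ShapeId>>,
    by_target_subjects_of: HashMap<Sid, Vec<ShapeId>>, // predicate -> shapes
    by_target_objects_of: HashMap<Sid, Vec<ShapeId>>,  // predicate -> shapes
}

// Toy post-transaction graph view; the real checks are db.range() scans.
struct Db {
    triples: HashSet<(Sid, Sid, Sid)>, // (s, p, o)
}

impl Db {
    fn subject_has_pred(&self, s: Sid, p: Sid) -> bool {
        self.triples.iter().any(|t| t.0 == s && t.1 == p)
    }
    fn object_of_pred(&self, p: Sid, o: Sid) -> bool {
        self.triples.iter().any(|t| t.1 == p && t.2 == o)
    }
}

fn applicable_shapes(cache: &ShapeCache, db: &Db, focus: Sid, types: &[Sid]) -> Vec<ShapeId> {
    let mut shapes = Vec::new();
    // 1. sh:targetNode: direct hashmap hit on the focus node.
    if let Some(s) = cache.by_target_node.get(&focus) {
        shapes.extend(s);
    }
    // 2. sh:targetClass: one hit per rdf:type of the focus node.
    for t in types {
        if let Some(s) = cache.by_target_class.get(t) {
            shapes.extend(s);
        }
    }
    // 3. sh:targetSubjectsOf: live existence check in the post-txn view.
    for (p, s) in &cache.by_target_subjects_of {
        if db.subject_has_pred(focus, *p) {
            shapes.extend(s);
        }
    }
    // 4. sh:targetObjectsOf: same, with the focus as object.
    for (p, s) in &cache.by_target_objects_of {
        if db.object_of_pred(*p, focus) {
            shapes.extend(s);
        }
    }
    shapes
}

fn main() {
    let mut cache = ShapeCache {
        by_target_node: HashMap::new(),
        by_target_class: HashMap::new(),
        by_target_subjects_of: HashMap::new(),
        by_target_objects_of: HashMap::new(),
    };
    cache.by_target_class.insert(100, vec![1]);       // shape 1 targets class 100
    cache.by_target_subjects_of.insert(200, vec![2]); // shape 2 targets subjects of 200
    let db = Db { triples: HashSet::from([(10, 200, 11)]) };
    let shapes = applicable_shapes(&cache, &db, 10, &[100]);
    assert!(shapes.contains(&1) && shapes.contains(&2));
}
```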

Why the live db check for steps 3/4 instead of precomputed staged-flake hints? Three scenarios a hint-only approach can’t cover:

  • Base-state edge: the triggering edge is already indexed; the current txn only touches another property.
  • Retraction-only: the staged flake set for a focus contains retractions that don’t remove the last matching edge.
  • Cross-graph routing: a subject’s edge exists in graph A but we’re validating the subject in graph B — the per-graph db ref sees only B.

db.range() returns only post-state assertions (retractions are filtered in the range pipeline — see fluree-db-core/src/range.rs), so the check is exactly “is this edge present in the post-txn view of this graph”.

Cost is bounded by the number of predicate-targeted shapes in the cache, not by data size — typically 0–10 per ledger.

Staged validation loop

validate_staged_nodes in fluree-db-transact/src/stage.rs:

  1. Partition staged flakes into subjects_by_graph: HashMap<GraphId, HashSet<Sid>>.
    • Every flake’s subject is added (including retractions — class/node targets still need to see them).
    • Every assert flake’s Ref-object is also added to the graph’s focus set (ensures sh:targetObjectsOf shapes fire on newly-referenced nodes).
  2. For each (g_id, subjects):
    • If enabled_graphs is Some and g_id is not in it: skip.
    • Build a per-graph GraphDbRef with view as overlay and view.staged_t() as t.
    • Attach the tracker (if any) — fuel accounting works for SHACL range scans too.
    • For each subject: fetch rdf:type flakes, then call engine.validate_node(db, subject, &types).
    • Tag each returned ValidationResult with graph_id = Some(g_id) so the caller can partition reject vs warn.
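Step 1, the per-graph partition, can be sketched as follows (`Flake` here is a much-simplified stand-in for the real flake type):

```rust
use std::collections::{HashMap, HashSet};

type GraphId = u32;
type Sid = u64;

// Simplified staged flake: graph, subject, Ref-object (if any), assert flag.
struct Flake {
    g_id: GraphId,
    s: Sid,
    obj_ref: Option<Sid>,
    assert: bool,
}

// Partition staged flakes into per-graph focus-node sets.
fn subjects_by_graph(flakes: &[Flake]) -> HashMap<GraphId, HashSet<Sid>> {
    let mut map: HashMap<GraphId, HashSet<Sid>> = HashMap::new();
    for f in flakes {
        let set = map.entry(f.g_id).or_default();
        // Every flake's subject participates, retractions included.
        set.insert(f.s);
        // Assert flakes also contribute their Ref-object as a focus node,
        // so sh:targetObjectsOf shapes fire on newly-referenced nodes.
        if f.assert {
            if let Some(o) = f.obj_ref {
                set.insert(o);
            }
        }
    }
    map
}

fn main() {
    let flakes = vec![
        Flake { g_id: 0, s: 1, obj_ref: Some(2), assert: true },
        Flake { g_id: 0, s: 3, obj_ref: None, assert: false },    // retraction: subject still added
        Flake { g_id: 5, s: 9, obj_ref: Some(4), assert: false }, // retraction: object NOT added
    ];
    let map = subjects_by_graph(&flakes);
    assert_eq!(map[&0], HashSet::from([1, 2, 3]));
    assert_eq!(map[&5], HashSet::from([9]));
}
```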

RDFS subclass fallback (is_subclass_of)

When the indexed SchemaHierarchy doesn’t know about a rdfs:subClassOf edge (e.g. asserted in the same or a recent unindexed transaction), validate_class_constraint calls is_subclass_of(db, start, target) which walks rdfs:subClassOf upward via BFS.

Two invariants in that walk:

  • Always scope to g_id=0 via rescope_to_schema_graph(db) — schema lives in the default graph, matching how SchemaHierarchy::from_db_root_schema is built. Subject may be in graph G but the subClassOf edge must be looked up in the schema graph.
  • Preserve tracker and other GraphDbRef fields: rescope_to_schema_graph copies the db and mutates g_id to 0 rather than calling GraphDbRef::new(..), which would reset tracker, runtime_small_dicts, and eager. A unit test (rescope_to_schema_graph_preserves_tracker_and_other_fields) pins this behavior.
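The upward walk can be sketched over an in-memory child-to-parents map (the real code discovers rdfs:subClassOf edges via db.range on the schema graph rather than a prebuilt map):

```rust
use std::collections::{HashMap, HashSet, VecDeque};

type Sid = u64;

// BFS upward over rdfs:subClassOf edges (child -> direct parents).
fn is_subclass_of(superclasses: &HashMap<Sid, Vec<Sid>>, start: Sid, target: Sid) -> bool {
    if start == target {
        return true;
    }
    let mut seen = HashSet::from([start]);
    let mut queue = VecDeque::from([start]);
    while let Some(class) = queue.pop_front() {
        for &parent in superclasses.get(&class).into_iter().flatten() {
            if parent == target {
                return true;
            }
            // `seen` guards against cycles in a malformed hierarchy.
            if seen.insert(parent) {
                queue.push_back(parent);
            }
        }
    }
    false
}

fn main() {
    let mut sup: HashMap<Sid, Vec<Sid>> = HashMap::new();
    sup.insert(1, vec![2]); // Dog subClassOf Mammal
    sup.insert(2, vec![3]); // Mammal subClassOf Animal
    assert!(is_subclass_of(&sup, 1, 3));
    assert!(!is_subclass_of(&sup, 3, 1));
}
```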

Adding a new constraint

1. Compile

In fluree-db-shacl/src/compile.rs:

  • Add a variant to the Constraint enum (or NodeConstraint for node-level).
  • Add the predicate name to the shacl_predicates array in ShapeCompiler::compile_from_dbs.
  • Handle the predicate in process_flake (sets the right field on the intermediate shape builder).
  • If the constraint takes arguments via an RDF list, extend expand_rdf_lists.

2. Validate

Pure per-value constraints (no db access) go in fluree-db-shacl/src/constraints/:

  • Add a validate_<name>(values, ..) -> Option<ConstraintViolation> helper next to the similar ones in cardinality.rs / value.rs / etc.
  • Wire it into the big match in validate_constraint in fluree-db-shacl/src/validate.rs.

Constraints that need database access (sh:class, pair constraints) are handled before the pure dispatch, inside validate_property_shape. Pattern:

Constraint::MyConstraint(target) => {
    let helper_violations = validate_my_constraint(db, &values, target).await?;
    for v in helper_violations {
        results.push(ValidationResult {
            focus_node: focus_node.clone(),
            result_path: Some(prop_shape.path.clone()),
            source_shape: parent_shape.id.clone(),
            source_constraint: Some(prop_shape.id.clone()),
            severity: prop_shape.severity,
            message: v.message,
            value: v.value,
            graph_id: None, // tagged later in validate_staged_nodes
        });
    }
}

3. Advertise

Update fluree-db-shacl/src/lib.rs:

  • Add the constraint to the Supported Constraints list.
  • Remove from the Not Yet Supported section if it was listed.

4. Test

  • Add a unit test next to your validate_<name> helper for the pure logic.
  • Add an integration test in fluree-db-api/src/shacl_tests.rs that transacts a shape + violating data + valid data.
  • For a bug fix: temp-revert the fix, confirm the test fails, restore, confirm it passes. This pins the regression into the test.

Testing patterns

Integration tests

Most SHACL integration tests live in fluree-db-api/src/shacl_tests.rs and use the assert_shacl_violation(err, "substring") helper. Pattern:

let shape = json!({ /* sh:NodeShape with the constraint under test */ });
let ledger = fluree.create_ledger("shacl/foo:main").await.unwrap();
let ledger = fluree.upsert(ledger, &shape).await.unwrap().ledger;

// Negative case
let err = fluree.upsert(ledger, &violating_data).await.unwrap_err();
assert_shacl_violation(err, "expected message fragment");

// Positive case
fluree.upsert(ledger, &valid_data).await.expect("must pass");

Cross-graph / per-graph tests

See fluree-db-api/tests/it_config_graph.rs for patterns that write config via TriG into the config graph, then stage transactions across multiple graphs. Examples:

  • shacl_shapes_source_points_to_named_graph — f:shapesSource wiring.
  • shacl_per_graph_disable_honored — per-graph shaclEnabled: false.
  • shacl_per_graph_mode_warn_vs_reject — mixed modes across graphs.
  • shacl_target_subjects_of_fires_on_base_state_edge — base-state predicate-target discovery.

The temp-revert trick

For every correctness-fix PR, confirm the regression test actually covers the bug:

  1. Apply the minimum temp-revert in the production code (comment out the fix with a // TEMP REVERT: marker).
  2. Run the new test — it should fail with the expected symptom.
  3. Restore the fix — test passes.
  4. Commit the fix + the test together.

This is how we guard against tests that pass trivially but don’t actually exercise the fix.

Known gaps

  • sh:uniqueLang, sh:languageIn — parsed but not evaluated. Needs language-tag metadata on flakes, which isn’t yet threaded through the validation path.
  • sh:qualifiedValueShape (+ sh:qualifiedMinCount / sh:qualifiedMaxCount) — parsed but not evaluated. Needs recursive nested-shape counting.
  • Cross-transaction shape cache — every call to from_dbs_with_overlay recompiles from scratch. ShaclCacheKey has a schema_epoch field that’s ready to drive a shared Arc<ShaclCache> cache on the connection, but nothing populates it yet. Low priority until perf regressions are observed.

Where to look in the code

| What | File |
|---|---|
| Shape compilation (Turtle/JSON-LD → CompiledShape) | fluree-db-shacl/src/compile.rs |
| Shape cache with target indexes | fluree-db-shacl/src/cache.rs |
| Per-focus validation engine | fluree-db-shacl/src/validate.rs |
| Per-constraint validators (pure values) | fluree-db-shacl/src/constraints/ |
| Staged-validation loop (per-graph) | fluree-db-transact/src/stage.rs::validate_staged_nodes |
| Public transact entry + outcome split | fluree-db-transact/src/stage.rs::validate_view_with_shacl |
| Policy types (ShaclGraphPolicy, ShaclValidationOutcome) | fluree-db-transact/src/stage.rs |
| Shared post-stage helper | fluree-db-api/src/tx.rs::apply_shacl_policy_to_staged_view |
| Per-graph policy builder | fluree-db-api/src/tx.rs::build_per_graph_shacl_policy |
| f:shapesSource resolver | fluree-db-api/src/tx.rs::resolve_shapes_source_g_ids |
| JSON-LD / SPARQL txn call site | fluree-db-api/src/tx.rs::stage_with_config_shacl |
| Turtle insert call site | fluree-db-api/src/tx.rs::stage_turtle_insert |
| Commit replay call site | fluree-db-api/src/commit_transfer.rs |
| Config field definition | fluree-db-core/src/ledger_config.rs::ShaclDefaults |
| Config graph parser | fluree-db-api/src/config_resolver.rs::read_shacl_defaults |
| Effective-config merge | fluree-db-api/src/config_resolver.rs::merge_shacl_opts |

Adding Tracing Spans to New Code

When you add or modify code paths in Fluree, you should instrument them with tracing spans so that performance investigations can decompose wall-clock time into meaningful phases. This guide explains the conventions, patterns, and gotchas.

The Two-Tier Span Strategy

Fluree uses a tiered approach so that tracing is zero-overhead by default but deeply informative on demand.

The request span: info_span! (the one exception)

The HTTP request span in telemetry.rs::create_request_span() is the only info_span! in the codebase. It provides operators with HTTP request visibility at the production default RUST_LOG=info. All other operation spans are debug_span! — this guarantees true zero overhead when the otel feature is not compiled and RUST_LOG is at info.

Tier 1: debug_span! – operation and phase level

All operation spans (query_execute, transact_execute, txn_stage, txn_commit, index_build, sort_blocking, etc.) and their phases use debug_span!. They are visible when OTEL is enabled (the OTEL Targets filter registers interest at DEBUG for fluree_* crates) or when a developer sets RUST_LOG=debug or RUST_LOG=info,fluree_db_query=debug. Without either, debug_span! short-circuits to a single atomic load (~1-2ns, unmeasurable).

Tier 2: trace_span! – maximum detail

Per-operator, per-item, or per-iteration spans. Visible at RUST_LOG=info,fluree_db_query=trace. Use for fine-grained instrumentation in hot paths where you only want visibility during deep investigation. The OTEL Targets filter intentionally excludes TRACE to prevent flooding the batch processor.

Decision guide

| You’re adding… | Span level | Example |
|---|---|---|
| New top-level operation, phase, or operator | debug_span! | query_execute, reasoning_prep, join, binary_open_leaf |
| Detail or per-iteration instrumentation | trace_span! | group_by, distinct, binary_cursor_next_leaf |

Do not use info_span! for new operation spans. The request span is the sole exception.

Code Patterns

Sync phases (no .await)

Use span.enter() which creates a guard dropped at end of scope:

let span = tracing::debug_span!(
    "pattern_rewrite",
    patterns_before = patterns.len() as u64,
    patterns_after = tracing::field::Empty,  // recorded later
);
let _guard = span.enter();

// ... do the rewriting ...

span.record("patterns_after", rewritten.len() as u64);
// _guard dropped here, span ends

Async phases (contains .await)

Never hold a span.enter() guard across an .await point. In tokio’s multi-threaded runtime, span.enter() enters the span on the current thread. When the task yields at .await, the span remains “entered” on that thread. Other tasks polled on the same thread will then inherit this span as their parent, causing cross-request trace contamination — completely unrelated operations become nested under each other in Jaeger. This was the root cause of a critical trace corruption bug in the HTTP route handlers.

Symptoms in Jaeger: If you see sequential, independent requests nested as children of an earlier request (especially where child spans outlive their parents), the cause is almost certainly span.enter() held across .await.

Instead, use .instrument(span):

let span = tracing::debug_span!(
    "format",
    output_format = %format_name,
    result_count = total_rows as u64,
);
format_results(batch, format).instrument(span).await

If you need to record deferred fields on a span that wraps an async block, use Span::current() inside the instrumented block:

let span = tracing::debug_span!(
    "txn_stage",
    insert_count = tracing::field::Empty,
    delete_count = tracing::field::Empty,
);
async {
    // ... do staging work ...
    let current = tracing::Span::current();
    current.record("insert_count", inserts as u64);
    current.record("delete_count", deletes as u64);
    Ok(result)
}.instrument(span).await

HTTP route handlers (axum)

Route handlers are async and must use .instrument(). The standard pattern wraps the entire handler body in an async move block instrumented with the request span, then uses Span::current() inside:

pub async fn query(
    State(state): State<Arc<AppState>>,
    headers: FlureeHeaders,
) -> Result<impl IntoResponse> {
    let span = create_request_span("query", request_id.as_deref(), ...);

    async move {
        let span = tracing::Span::current(); // Same span, safe to .record() on
        tracing::info!(status = "start", "query request received");

        let alias = get_ledger_alias(...)?;
        span.record("ledger_alias", alias.as_str());

        execute_query(&state, &alias, &query_json).await
    }
    .instrument(span)
    .await
}

Why async move + Span::current() instead of just .instrument(): Route handlers need to record deferred fields (like ledger_alias, error_code) on the span after creation. By obtaining Span::current() inside the instrumented block, you get a handle to the same span that .instrument() entered, letting you call .record() and pass it to set_span_error_code().

spawn_blocking

For tokio::task::spawn_blocking, enter the span inside the closure:

let span = tracing::debug_span!("heavy_compute");
tokio::task::spawn_blocking(move || {
    let _guard = span.enter();
    // ... sync work ...
}).await

std::thread::scope (parallel OS threads)

std::thread::scope spawned threads do NOT inherit tracing span context from the parent thread. Capture the current span before spawning and enter it inside each closure:

let parent_span = tracing::Span::current();

std::thread::scope(|s| {
    for item in &work_items {
        let thread_span = parent_span.clone();
        s.spawn(move || {
            let _guard = thread_span.enter();
            // ... work that creates child spans ...
        });
    }
});

This is safe because scoped threads are pure sync (no .await). The same pattern applies to any OS thread spawning (std::thread::spawn, rayon, etc.).

Lightweight operators (hot path)

For simple operators that just need a span marker, use the terse .entered() pattern:

fn open(&mut self, ctx: &mut Context<S, C>) -> Result<()> {
    let _span = tracing::trace_span!("filter").entered();
    self.child.open(ctx)?;
    // ...
    Ok(())
}

Deferred Fields

Declare fields as tracing::field::Empty at span creation, then record values later. This is essential for fields whose values aren’t known until the operation completes.

let span = tracing::debug_span!(
    "plan",
    pattern_count = tracing::field::Empty,
);
let _guard = span.enter();

let plan = build_plan(&patterns)?;
span.record("pattern_count", plan.patterns.len() as u64);

Gotcha: tracing::Span::current().record(...) records on the current innermost span. If you’ve entered a child span, .record() targets the child, not the parent. Get a handle to the parent span before entering children:

let parent_span = tracing::debug_span!("outer", total = tracing::field::Empty);
let _parent_guard = parent_span.enter();

{
    let _child = tracing::trace_span!("inner").entered();
    // Span::current() is now "inner", NOT "outer"
}

// Back to "outer" scope -- safe to record on parent
parent_span.record("total", count as u64);

#[tracing::instrument] vs Manual Spans

Use #[tracing::instrument] for simple functions where:

  • You want span entry/exit to match the function boundary
  • The function name is a good span name
  • You don’t need deferred field recording

Always use skip_all and explicitly list fields:

#[tracing::instrument(level = "debug", name = "parse", skip_all, fields(input_format, input_bytes))]
fn parse_query(input: &[u8], format: &str) -> Result<Query> {
    // ...
}

Use manual spans when:

  • The span covers only part of a function
  • You need a different name than the function
  • You need deferred fields
  • The function is a hot path (the #[instrument] macro captures all arguments by default unless you skip_all)

Where to Add Spans

New query feature

If you add a new phase to query execution (e.g., a new optimization pass):

  1. Add a debug_span! in the code path
  2. Add the span name to the hierarchy in docs/operations/telemetry.md
  3. Add a test in fluree-db-api/tests/it_tracing_spans.rs verifying the span emits

New operator

If you add a new query operator:

  1. For core structural operators (scan, join, filter, project, sort), use debug_span! in open()
  2. For detail operators (group_by, distinct, limit, offset, etc.), use trace_span! in open()
  3. If it’s a blocking/buffering operator (like sort), add a debug_span! timing span in next_batch()
  4. Add a test verifying the span emits at the correct level

For lower-level remote storage diagnostics on the binary path, prefer short-lived debug_span! blocks around:

  • leaf-open strategy selection (binary_open_leaf)
  • remote leaf metadata reads (binary_fetch_header_dir)
  • individual range reads (binary_range_fetch)
  • leaflet cache hit/miss points (binary_load_leaflet)

These spans are intended for investigation of repeated remote I/O and cache effectiveness under query load.

New transaction phase

If you add a new phase to transaction processing:

  1. Add a debug_span! in the phase code
  2. Record relevant counts/sizes as deferred fields
  3. Add the span to the hierarchy in docs/operations/telemetry.md

New background task

If you add a new background task (like indexing, garbage collection, compaction):

  1. Add a debug_span! as the trace root (these are independent traces, not children of HTTP requests)
  2. Add debug sub-spans for phases within the task
  3. Ensure the crate target is listed in the OTEL Targets filter in telemetry.rs

Testing Spans

All new spans should have at least one test verifying they emit with expected fields at the right level.

Test utilities

The test infrastructure lives in fluree-db-api/tests/support/span_capture.rs:

mod support;
use support::span_capture;

#[tokio::test]
async fn my_new_span_emits_at_debug_level() {
    let (store, _guard) = span_capture::init_test_tracing(); // captures ALL levels

    // ... run the code that emits the span ...

    assert!(store.has_span("my_new_phase"));
    let span = store.find_span("my_new_phase").unwrap();
    assert_eq!(span.level, tracing::Level::DEBUG);
    assert!(span.fields.contains_key("some_field"));
}

#[tokio::test]
async fn my_new_span_not_visible_at_info() {
    let (store, _guard) = span_capture::init_info_only_tracing(); // captures only INFO+

    // ... run the code ...

    assert!(!store.has_span("my_new_phase")); // zero noise at info
}

Test helpers available

  • span_capture::init_test_tracing() – captures all spans regardless of level (for verifying span existence)
  • span_capture::init_info_only_tracing() – captures only INFO+ (for verifying zero-noise at default level)
  • SpanStore::has_span(name) – check if a span was emitted
  • SpanStore::find_span(name) – get span details (level, fields, parent)
  • SpanStore::find_spans(name) – find all spans with a given name
  • SpanStore::span_names() – list all captured span names

Where to put tests

  • Tracing integration tests go in fluree-db-api/tests/it_tracing_spans.rs
  • The test utilities are in fluree-db-api/tests/support/span_capture.rs

OTEL Layer Configuration

If you add a new crate that emits spans that should be exported via OTEL, add it to the Targets filter in fluree-db-server/src/telemetry.rs:

let otel_filter = Targets::new()
    .with_target("fluree_db_server", Level::DEBUG)
    .with_target("fluree_db_api", Level::DEBUG)
    // ... existing targets ...
    .with_target("my_new_crate", Level::DEBUG);  // ADD THIS

Without this, spans from the new crate will appear in console logs but not in Jaeger/Tempo.

Important: All OTEL targets are set to DEBUG level. Do not set any target to TRACE in the OTEL filter — TRACE-level spans (e.g., binary_cursor_next_leaf, per-scan spans) can generate thousands of spans per query, overwhelming the BatchSpanProcessor queue and causing parent spans to be dropped. Users who need TRACE-level detail should use RUST_LOG for console output; the OTEL exporter intentionally excludes TRACE spans.

Checklist for New Instrumentation

  • Used debug_span! (not info_span!) for all new operation spans
  • Used span.enter() only in sync code, .instrument(span) for async
  • Propagated span context into spawned threads (spawn_blocking, std::thread::scope, etc.)
  • Added deferred fields for values computed after span creation
  • Tested span emission with SpanCaptureLayer
  • Verified zero overhead at INFO level (no debug/trace spans appear without OTEL or RUST_LOG=debug)
  • Updated span hierarchy in docs/operations/telemetry.md if adding spans
  • Updated .claude/skills/*/references/span-hierarchy.md (both copies)
  • Added new crate to OTEL Targets filter if applicable

Common Gotchas

  1. span.enter() across .await causes cross-request contamination – This is the most dangerous tracing bug. In tokio’s multi-threaded runtime, span.enter() sets the span on the current thread. When the task suspends at .await, the span stays “entered” on that thread. Other tasks polled on the same thread inherit it as their parent. Result: unrelated requests cascade into each other’s traces in Jaeger, with child spans that outlive their parents. Always use .instrument(span) in async code. This was a real bug in the HTTP route handlers and took Jaeger analysis to identify.
  2. Span::current().record() targets the innermost span – not necessarily the one you intend. Hold a reference to the span you want to record on.
  3. OTEL exporter floods – if you set RUST_LOG=debug globally, third-party crates (hyper, tonic, h2) emit debug spans that overwhelm the OTEL batch processor. The Targets filter on the OTEL layer prevents this.
  4. Tower-HTTP TraceLayer removed – tower-http’s TraceLayer was removed entirely because it created a duplicate request span that collided with Fluree’s own request span in create_request_span(). If you re-add tower-http tracing, ensure it does not conflict.
  5. set_global_default in tests – can only be called once per process. Use set_default() which returns a guard scoped to the test.
  6. Compiler won’t catch span.enter() across .await – Unlike what the tracing docs suggest, Entered may actually be Send (since &Span is Send when Span: Sync). The code compiles fine but produces incorrect traces at runtime. The only way to detect this is visual inspection in Jaeger. Grep for span.enter() in async functions as part of code review.
  7. std::thread::scope / std::thread::spawn drops span context – New OS threads start with empty thread-local span context, so any spans created on them become orphaned root traces. You must capture Span::current() and .enter() it inside the thread closure. This same issue applies to tokio::task::spawn_blocking, rayon, and any other thread-spawning API.

Claude Code Trace Analysis Skills

Two Claude Code skills are available for analyzing Jaeger trace exports:

/trace-inspect

Drills into a single trace: span tree visualization, timing breakdown, structural health checks. Use when you have a specific slow request and want to understand where time went.

/trace-inspect path/to/traces.json

/trace-overview

Analyzes all traces in an export: aggregate statistics, anomaly detection across the corpus, comparison of query vs transaction patterns. Use when you want a high-level understanding of system behavior.

/trace-overview path/to/traces.json

Exporting traces from Jaeger

  1. Open Jaeger UI (default: http://localhost:16686)
  2. Search for traces of interest (by service name, operation, duration, etc.)
  3. Click the JSON download button on a trace or search result
  4. Save to a file and pass to either skill

See the OTEL dev harness for running a local Jaeger instance.

Releasing

Cutting a release is two phases: a release PR that bumps the version, then a tag pushed after the PR merges. The tag triggers cargo-dist, which builds binaries and publishes the GitHub Release. Release notes are auto-generated by GitHub from merged PR titles since the previous tag.

The flow

# 1. Branch off main and bump the workspace version.
git checkout main && git pull
git checkout -b release/v4.0.3
$EDITOR Cargo.toml                      # update [workspace.package].version
cargo update --workspace                # refresh Cargo.lock
git commit -am "release v4.0.3"
git push -u origin release/v4.0.3
gh pr create --title "release v4.0.3"

# 2. After the PR is reviewed and merged to main, tag the merge commit.
git checkout main && git pull
git tag v4.0.3
git push origin v4.0.3                  # ← triggers .github/workflows/release.yml

Watch the Actions tab. cargo-dist builds platform binaries, creates the GitHub Release with auto-generated notes, publishes the Homebrew formula, and pushes the multi-arch Docker image.

Why the two-phase split

cargo-dist’s release workflow triggers on any pushed vX.Y.Z tag regardless of which branch the tag points at. Holding the tag step until after merge ensures every release goes through PR review.

How release notes are generated

.github/workflows/release.yml calls gh release create --generate-notes. GitHub builds the release body automatically from PRs merged since the previous tag, categorized per .github/release.yml:

| Label | Section |
|---|---|
| breaking-change, semver:major | Breaking Changes |
| feature, enhancement, feat | Features |
| bug, fix | Bug Fixes |
| performance, perf | Performance |
| documentation, docs | Documentation |
| (anything else) | Other Changes |

Apply one of these labels to each PR before merging. Unlabeled PRs still appear under “Other Changes” with their full title and author credit, so categorization is nice-to-have, not required.

PR titles already follow the feat: / fix: / docs: convention from CLAUDE.md, which makes the unlabeled “Other Changes” list readable on its own.

Rolling back

During Phase 1, before pushing:

git checkout main
git branch -D release/vX.Y.Z

After the release PR is opened but before merge:

Close the PR and delete the branch on GitHub. Nothing has shipped.

After the tag is pushed but cargo-dist still running:

git push origin :refs/tags/vX.Y.Z       # delete the remote tag
git tag -d vX.Y.Z                       # delete the local tag

Cancel the in-progress Release workflow run from the Actions tab.

After cargo-dist created the GitHub Release:

Delete the GitHub Release from the UI, then delete the tag (commands above). The merge commit on main stays in place — re-tag it once the issue is fixed, or supersede it with another release PR.

Configuration files

  • .github/release.yml — categorization for GitHub’s auto-generated release notes.
  • dist-workspace.toml — cargo-dist’s distribution targets and installers.
  • .github/workflows/release.yml — autogenerated by cargo-dist (with allow-dirty = ["ci"] set so our --generate-notes edit survives dist init).