Fluree DB
A semantic graph database with time travel, branching, and verifiable data — built on W3C standards.
Fluree DB is a single binary that stores your data as an RDF knowledge graph, queryable with SPARQL or JSON-LD Query, with every commit immutably recorded so you can travel back to any prior state. It supports git-style branching and merging, signed and policy-gated transactions, SHACL validation, OWL/RDFS reasoning, and full-text and vector search — over local files, S3, or IPFS — without bolting on external services.
What you get
- Semantic by default. Your data is RDF. IRIs, JSON-LD @context, named graphs, and typed values are first-class. Queries are SPARQL 1.1 or the equivalent JSON-LD Query, both compiling to the same execution engine.
- Time travel. Every transaction is a commit on an immutable chain. Query the state of the graph at any past moment with a single t parameter — no snapshots to restore, no separate audit log to consult.
- Branching and merging. Create a branch off any commit, transact against it in isolation, then merge it back. Useful for staging changes, running what-if analyses, or maintaining environment-specific overlays.
- Verifiable data. Transactions and commits can be signed (JWS / W3C Verifiable Credentials). The commit chain is content-addressed, so any tampering is detectable. Pair it with policy enforcement to prove who changed what and when they were allowed to.
- Policy-based access control. Policies are written as graph data, evaluated per query and per transaction, and travel with the ledger — not bolted on at the API layer.
- Storage your way. Local filesystem for development, S3 + DynamoDB for production, IPFS for content-addressed distribution. The same ledger format works across all of them.
- Search built in. BM25 full-text indexing and HNSW vector search live alongside SPARQL — no separate search service to operate.
- Reasoning. OWL/RDFS inference and Datalog rules run inside the query engine, so derived facts are queryable without a materialization step.
- Embeddable. The same engine that powers the server runs as a Rust library, generic over storage and nameservice. Use it directly in your application or run it standalone over HTTP.
Start here
- New to Fluree? → Getting started
- Run the server → Quickstart: run the server
- Create a ledger and write data → Quickstart: create a ledger → Quickstart: write data
- Query data → Quickstart: query (JSON-LD + SPARQL)
- End-to-end walkthrough → Tutorial: search, time travel, branching, policies
- Coming from SQL? → Fluree for SQL developers
- Embedding in Rust? → Using Fluree as a Rust library
Explore the docs
- Concepts — ledgers, graph sources, IRIs, time travel, policy, verifiable data, reasoning
- Guides (cookbooks) — search, time travel, branching, policies, SHACL — task-oriented recipes
- CLI reference — every fluree command, flag by flag
- HTTP API — endpoints, headers, signed requests, error model
- Query — JSON-LD Query, SPARQL, output formats, CONSTRUCT, explain plans, reasoning
- Transactions — insert, upsert, update, conditional updates, signed transactions
- Security and policy — authentication, encryption, commit signing, policy model
- Indexing and search — background indexing, BM25, vector search, geospatial
- Graph sources and integrations — Iceberg/Parquet, R2RML, BM25 graph source
- Operations — configuration, Docker, storage modes, telemetry, archive/restore
- Design — internals: query execution, storage traits, index format, nameservice
- Reference — glossary, vocabulary, OWL/RDFS support, crate map
- Troubleshooting — common errors, debugging queries, performance tracing
- Contributing — dev setup, tests, SPARQL compliance, releasing
The full table of contents is in SUMMARY.md.
Fluree Memory
Fluree Memory is persistent, searchable memory for AI coding assistants — built on Fluree DB and shipped in the same fluree binary. If you’re here for the memory tooling, jump straight to the Memory docs.
Fluree CLI
The fluree command-line interface provides a convenient way to manage ledgers, run queries, and perform transactions without running a server.
Installation
Build from source:
cargo build --release -p fluree-db-cli
The binary will be at target/release/fluree.
Quick Start
# Initialize a project directory
fluree init
# Create a ledger
fluree create myledger
# Insert data
fluree insert '@prefix ex: <http://example.org/> .
ex:alice a ex:Person ; ex:name "Alice" .'
# Query
fluree query 'SELECT ?name WHERE { ?s <http://example.org/name> ?name }'
Global Options
| Option | Description |
|---|---|
-v, --verbose | Enable verbose output |
-q, --quiet | Suppress non-essential output |
--no-color | Disable colored output (also respects NO_COLOR env var) |
--config <PATH> | Path to config file |
--memory-budget-mb <MB> | Memory budget in MB for bulk import (0 = auto: 75% of system RAM). Affects chunk size, concurrency, and run budget when creating a ledger with --from. |
--parallelism <N> | Number of parallel parse threads for bulk import (0 = auto: system cores, default cap 6). Used when creating a ledger with --from. |
-h, --help | Print help |
-V, --version | Print version |
Commands
Core Commands
| Command | Description |
|---|---|
init | Initialize a new Fluree project directory |
create | Create a new ledger |
use | Set the active ledger |
list | List all ledgers |
info | Show detailed information about a ledger |
drop | Drop (delete) a ledger |
insert | Insert data into a ledger |
upsert | Upsert data (insert or update existing) |
update | Update with WHERE/DELETE/INSERT patterns |
query | Query a ledger |
history | Show change history for an entity |
export | Export ledger data |
log | Show commit log |
show | Show decoded commit contents (flakes with resolved IRIs) |
index | Build or update the binary index (incremental) |
reindex | Full reindex from commit history |
Remote Sync
| Command | Description |
|---|---|
remote | Manage remote servers |
upstream | Manage upstream tracking configuration |
fetch | Fetch refs from a remote |
clone | Clone a ledger from a remote (full commit download) |
pull | Pull commits from upstream |
push | Push to upstream remote |
track | Track remote-only ledgers (no local data) |
Clone and pull transfer commits and, by default, binary index data from the remote (pack protocol), so the local ledger is query-ready without a separate reindex. Use --no-indexes to skip index transfer and reduce download size; run fluree reindex afterward if you need the index. Large transfers may prompt for confirmation before streaming.
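For example, a minimal sketch of a lighter clone followed by a local index build (the ledger argument form here is an assumption; check fluree clone --help for your version):
# Clone commits only, skipping binary index transfer (smaller download)
fluree clone myledger --no-indexes
# Build the index locally once you need query performance
fluree reindex myledger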
Server Management
| Command | Description |
|---|---|
server | Manage the Fluree HTTP server (run, start, stop, status, restart, logs) |
Start a server directly from a project directory — it inherits the same .fluree/ context (config, storage) as the CLI. See server for details.
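For example, a quick sketch using the subcommands listed above (exact flags may vary; see the server page):
# Start the HTTP server from this project directory
fluree server start
# Confirm it is running, then stop it when done
fluree server status
fluree server stop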
Implementers
If you’re building a custom server that must support the CLI end-to-end (for example, integrating into another app), see:
- server-integration - endpoints and auth contract required by the CLI
Authentication
| Command | Description |
|---|---|
token | Create, inspect, and manage JWS tokens |
auth | Manage bearer tokens stored on remotes (login/logout/status) |
Configuration
| Command | Description |
|---|---|
config | Manage configuration |
prefix | Manage IRI prefix mappings |
completions | Generate shell completions |
Developer Memory
| Command | Description |
|---|---|
memory | Store and recall facts, decisions, constraints, preferences, and artifact references |
mcp | MCP server for IDE agent integration |
For background, IDE setup, team workflows, and the mem: schema, see the Memory section of the docs.
Project Structure
When you run fluree init, a .fluree/ directory is created with:
.fluree/
├── active # Currently active ledger name
├── config.toml # Configuration settings
├── prefixes.json # IRI prefix mappings
└── storage/ # Ledger data storage
Input Resolution
Commands that accept data input (insert, upsert, update, query) use flexible argument resolution:
| Arguments | Behavior |
|---|---|
| (none) | Active ledger; provide input via -e, -f, or stdin |
<arg> | Auto-detected: if it looks like a query/data, uses it inline; if it’s an existing file, reads from it; otherwise treats it as a ledger name |
<ledger> <input> | Specified ledger + inline input |
Input is resolved in this priority order: -e flag > positional inline > -f flag > positional file > stdin.
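For instance, here are a few equivalent ways the resolver can receive input for insert (assuming an active ledger and a local data.ttl file):
# Inline positional data against the active ledger
fluree insert '<http://example.org/x> a <http://example.org/Thing> .'
# Explicit file flag
fluree insert -f data.ttl
# Explicit ledger name plus inline data
fluree insert mydb '{"@id": "ex:x", "ex:name": "X"}'
# Stdin has the lowest priority
cat data.ttl | fluree insert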
Data Format Detection
The CLI auto-detects data format based on content:
- Lines starting with @prefix or @base → Turtle
- Content starting with { or [ → JSON-LD
- Files with .ttl extension → Turtle
- Files with .json or .jsonld extension → JSON-LD
You can override with --format turtle or --format jsonld.
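For example, a sketch that forces Turtle parsing when the file extension gives no hint:
# Content in data.txt is Turtle, so override detection explicitly
fluree insert --format turtle -f data.txt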
fluree init
Initialize a new Fluree project directory.
Usage
fluree init [OPTIONS]
Options
| Option | Description |
|---|---|
--global | Create global config and data directories instead of a local .fluree/ directory |
Description
Creates a .fluree/ directory in the current working directory (or global directories with --global). This directory stores:
- active - The currently active ledger name
- config.toml - Configuration settings
- prefixes.json - IRI prefix mappings for compact IRIs
- storage/ - Ledger data
Running init is idempotent - it won’t overwrite existing configuration.
Examples
# Initialize in current directory
fluree init
# Initialize global config
fluree init --global
Global Directory
With --global, the directories are determined by:
1. $FLUREE_HOME environment variable (if set) — both config and data go in this single directory.
2. Platform directories (when $FLUREE_HOME is not set):
| Content | Linux | macOS | Windows |
|---|---|---|---|
Config (config.toml) | $XDG_CONFIG_HOME/fluree (default: ~/.config/fluree) | ~/Library/Application Support/fluree | %LOCALAPPDATA%\fluree |
Data (storage/, active, prefixes.json) | $XDG_DATA_HOME/fluree (default: ~/.local/share/fluree) | ~/Library/Application Support/fluree | %LOCALAPPDATA%\fluree |
On macOS and Windows both resolve to the same directory (unified); on Linux config and data are separated per XDG conventions.
The generated config.toml will contain an absolute storage_path pointing to the data directory when the directories are split.
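For example, a sketch that pins both config and data to one directory via $FLUREE_HOME (the path is illustrative):
# Global init with a single, explicit home directory
FLUREE_HOME=~/.fluree-global fluree init --global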
See Also
- create - Create a new ledger after initialization
fluree create
Create a new ledger.
Usage
fluree create <LEDGER> [OPTIONS]
Arguments
| Argument | Description |
|---|---|
<LEDGER> | Name for the new ledger |
Options
| Option | Description |
|---|---|
--from <PATH> | Import data from a file (Turtle or JSON-LD) |
--memory [PATH] | Import memory history from a git-tracked .fluree-memory/ directory. Defaults to the current repo if no path is given. Mutually exclusive with --from. |
--no-user | Exclude user-scoped memories (.local/user.ttl) from --memory import |
--chunk-size-mb <MB> | Chunk size in MB for splitting large Turtle files (0 = derive from memory budget). Only used when --from points to a .ttl file. |
--leaflet-rows <N> | Rows per leaflet in the binary index (default: 25000). Larger values produce fewer, bigger leaflets — less I/O per scan, more memory per read. |
--leaflets-per-leaf <N> | Leaflets per leaf file (default: 10). Larger values produce fewer leaf files — shallower tree, bigger reads. |
Global flags that affect bulk import when using --from (see CLI README):
- --memory-budget-mb <MB> — Memory budget in MB (0 = auto: 75% of system RAM). Drives chunk size, concurrency, and indexer run budget.
- --parallelism <N> — Number of parallel parse threads (0 = auto: system cores, cap 6).
Description
Creates a new empty ledger with the given name and sets it as the active ledger. The ledger is stored in .fluree/storage/.
Use --from to create a ledger pre-populated with data from a Turtle or JSON-LD file. For large Turtle files, the CLI splits work into chunks and runs parallel parse threads; tune with --memory-budget-mb and --parallelism if needed.
Use --memory to import your project’s developer memory history into a time-travel-capable Fluree ledger. Each git commit that touched .fluree-memory/repo.ttl (and .local/user.ttl unless --no-user is set) becomes a Fluree transaction. The git commit message, SHA, and author date are stored as transaction metadata, so you can correlate Fluree t values with git history.
Examples
# Create an empty ledger
fluree create mydb
# Create with initial data
fluree create mydb --from seed-data.ttl
# Create from JSON-LD
fluree create mydb --from initial.jsonld
# Create with explicit memory and parallelism for a large Turtle file
fluree create mydb --from large.ttl --memory-budget-mb 4096 --parallelism 8
# Import memory history from the current repo
fluree create memories --memory
# Import memory history from another repo, excluding user memories
fluree create memories --memory /path/to/other/repo --no-user
Output
Created ledger 'mydb'
Set 'mydb' as active ledger
With --from:
Created ledger 'mydb'
Committed t=1 (42 flakes)
Set 'mydb' as active ledger
With --memory:
Created ledger 'memories' with 42 commits (t=1..43)
Earliest: bf803255 — initial memory store
Latest: 9865e5cd — prevent overrides of fluree txn-meta
Query with time travel:
fluree query memories 'SELECT ?id ?content WHERE { ?id a mem:Fact ; mem:content ?content } LIMIT 5'
fluree query memories --at-t 2 'SELECT ...' # state at first commit
See Also
fluree use
Set the active ledger.
Usage
fluree use <LEDGER>
Arguments
| Argument | Description |
|---|---|
<LEDGER> | Ledger name to set as active |
Description
Sets the specified ledger as the active ledger. Subsequent commands that don’t specify a ledger will use this one.
Examples
# Switch to a different ledger
fluree use production
# Verify with info
fluree info
Output
Active ledger set to 'production'
Errors
If the ledger doesn’t exist:
error: ledger 'nonexistent' not found
See Also
fluree list
List all ledgers and graph sources.
Usage
fluree list
Description
Lists all ledgers and graph sources (Iceberg, R2RML, BM25, Vector, etc.) in the current Fluree directory. The active ledger is marked with an asterisk (*).
When graph sources are present, a TYPE column is shown to distinguish ledgers from graph sources.
Examples
fluree list
Output
When only ledgers exist:
LEDGER BRANCH T
* mydb main 5
production main 12
When graph sources are also present:
NAME BRANCH TYPE T
* mydb main Ledger 5
production main Ledger 12
warehouse-orders main Iceberg -
my-search main BM25 5
If nothing exists:
No ledgers found. Run 'fluree create <name>' to create one.
See Also
- create - Create a new ledger
- iceberg - Map Iceberg tables as graph sources
- info - Show detailed ledger or graph source information
- use - Switch active ledger
fluree info
Show detailed information about a ledger or graph source.
Usage
fluree info [NAME] [--remote <name>] [--graph <name|IRI>]
Arguments
| Argument | Description |
|---|---|
[NAME] | Ledger or graph source name (defaults to active ledger) |
Options
| Option | Description |
|---|---|
--remote <name> | Query a remote server (e.g., origin) instead of the local installation. |
--graph <name|IRI> | Scope the stats block to a single named graph within the ledger. Accepts a well-known name (default, txn-meta) or a graph IRI. Not applicable to graph sources. |
Description
Displays detailed information about a ledger or graph source. The command first checks for a matching ledger; if none is found, it checks for a graph source with the same name.
For ledgers, displays:
- Ledger ID, branch, and type
- Current transaction number (t)
- Commit and index details
For graph sources (Iceberg, R2RML, BM25, etc.), displays:
- Name, branch, and type
- Graph source ID
- Index status
- Dependencies
- Configuration (catalog URI, table, mapping, etc.)
Examples
# Info for active ledger
fluree info
# Info for specific ledger
fluree info production
# Info for a graph source
fluree info warehouse-orders
# Query a remote server
fluree info production --remote origin
# Scope stats to the default graph
fluree info mydb --graph default
# Scope stats to the transaction-metadata graph
fluree info mydb --graph txn-meta
# Scope stats to a specific named graph by IRI
fluree info mydb --graph https://example.org/graphs/inventory
When --graph is set, the command prints the full ledger-info JSON response
with the stats block scoped to the selected graph (properties, classes,
flakes, size).
Output
Ledger:
Ledger: mydb
Branch: main
Type: Ledger
Ledger ID: mydb:main
Commit t: 5
Commit ID: bafybeig...
Index t: 5
Index ID: bafybeig...
Graph source (Iceberg):
Name: warehouse-orders
Branch: main
Type: Iceberg
ID: warehouse-orders:main
Retracted: false
Index t: 0
Index ID: (none)
Configuration:
{
"catalog": {
"type": "rest",
"uri": "https://polaris.example.com/api/catalog"
},
"table": "sales.orders",
...
}
See Also
- list - List all ledgers and graph sources
- iceberg - Map Iceberg tables as graph sources
- log - Show commit history
fluree branch
Manage branches for a ledger.
Subcommands
fluree branch create
Create a new branch.
Usage:
fluree branch create <NAME> [OPTIONS]
Arguments:
| Argument | Description |
|---|---|
<NAME> | Name for the new branch (e.g., “dev”, “feature-x”) |
Options:
| Option | Description |
|---|---|
--ledger <LEDGER> | Ledger name (defaults to active ledger) |
--from <BRANCH> | Source branch to create from (defaults to “main”) |
--at <COMMIT-REF> | Commit to branch at (defaults to source branch HEAD). Accepts t:N for a transaction number or a hex digest / full CID. |
--remote <REMOTE> | Execute against a remote server |
Description:
Creates a new branch for a ledger. By default the branch starts at the source branch’s current HEAD, and is fully isolated — subsequent transactions on either branch are invisible to the other.
Pass --at to branch from a historical commit on the source branch instead of its HEAD. The commit must be reachable from the source HEAD; the new branch starts with no index and replays from genesis on first query. t:N and hex-prefix resolution require the source branch to be indexed (full CIDs work unconditionally).
Branches can be nested: you can create a branch from any existing branch, not just “main”.
Examples:
# Create a branch from main (default)
fluree branch create dev
# Create a branch for a specific ledger
fluree branch create dev --ledger mydb
# Create a branch from another branch
fluree branch create feature-x --from dev
# Branch at a historical point on main (transaction number)
fluree branch create rewind --at t:5
# Branch at a historical commit by hex-digest prefix
fluree branch create rewind --at 3dd028a7
# Create a branch on a remote server
fluree branch create staging --ledger mydb --remote origin
Output:
Created branch 'dev' from 'main' at t=5
Ledger ID: mydb:dev
fluree branch list
List all branches for a ledger.
Usage:
fluree branch list [LEDGER] [OPTIONS]
Arguments:
| Argument | Description |
|---|---|
[LEDGER] | Ledger name (defaults to active ledger) |
Options:
| Option | Description |
|---|---|
--remote <REMOTE> | List branches on a remote server |
Examples:
# List branches for the active ledger
fluree branch list
# List branches for a specific ledger
fluree branch list mydb
# List branches on a remote server
fluree branch list mydb --remote origin
Output:
BRANCH T SOURCE
main 5 -
dev 7 main
feature-x 8 dev
fluree branch drop
Drop a branch from a ledger.
Usage:
fluree branch drop <NAME> [OPTIONS]
Arguments:
| Argument | Description |
|---|---|
<NAME> | Branch name to drop (e.g., “dev”, “feature-x”) |
Options:
| Option | Description |
|---|---|
--ledger <LEDGER> | Ledger name (defaults to active ledger) |
--remote <REMOTE> | Execute against a remote server |
Description:
Drops a branch from a ledger. The main branch cannot be dropped.
- Leaf branches (no children) are fully deleted — storage artifacts are removed and the NsRecord is purged.
- Branches with children are retracted (hidden from listings, reject new transactions) but storage is preserved so that child branches continue to work. When the last child is eventually dropped, the retracted parent is automatically cascade-purged.
Examples:
# Drop a branch
fluree branch drop dev
# Drop a branch for a specific ledger
fluree branch drop dev --ledger mydb
# Drop a branch on a remote server
fluree branch drop staging --ledger mydb --remote origin
Output (leaf branch):
Dropped branch 'dev'.
Artifacts deleted: 5
Output (branch with children):
Branch 'dev' retracted (has children, storage preserved).
Output (cascade):
Dropped branch 'feature'.
Artifacts deleted: 3
Cascaded drops: mydb:dev
fluree branch rebase
Rebase a branch onto its source branch’s current HEAD.
Usage:
fluree branch rebase <NAME> [OPTIONS]
Arguments:
| Argument | Description |
|---|---|
<NAME> | Branch name to rebase (e.g., “dev”, “feature-x”) |
Options:
| Option | Description |
|---|---|
--ledger <LEDGER> | Ledger name (defaults to active ledger) |
--strategy <STRATEGY> | Conflict resolution strategy (default: “take-both”). Options: take-both, abort, take-source, take-branch, skip |
--remote <REMOTE> | Execute against a remote server |
Description:
Replays a branch’s unique commits on top of the source branch’s current HEAD. This brings the branch up to date with upstream changes. The main branch cannot be rebased.
If the branch has no unique commits, a fast-forward rebase is performed — the branch point is simply updated to the source’s current HEAD.
Conflicts occur when both the branch and source have modified the same (subject, predicate, graph) tuples. See conflict strategies for details.
Examples:
# Rebase with default strategy
fluree branch rebase dev
# Rebase with abort-on-conflict strategy
fluree branch rebase dev --strategy abort
# Rebase for a specific ledger
fluree branch rebase feature-x --ledger mydb --strategy take-source
# Rebase on a remote server
fluree branch rebase dev --ledger mydb --remote origin
Output (fast-forward):
Fast-forward rebase of 'dev' to t=5.
Output (with replay):
Rebased 'dev': 3 commits replayed, 0 skipped, 1 conflicts, 0 failures.
New branch point: t=8
fluree branch diff
Show a read-only merge preview between two branches.
Usage:
fluree branch diff <SOURCE> [OPTIONS]
Arguments:
| Argument | Description |
|---|---|
<SOURCE> | Source branch name to preview merging from (e.g., “dev”, “feature-x”) |
Options:
| Option | Description |
|---|---|
--target <BRANCH> | Target branch to preview merging into (defaults to source’s parent branch) |
--max-commits <N> | Cap on per-side commit summaries shown (default: 50; pass 0 for unbounded in local mode) |
--max-conflict-keys <N> | Cap on conflict keys shown (default: 50; pass 0 for unbounded in local mode) |
--no-conflicts | Skip conflict computation for a cheaper preview |
--conflict-details | Include source/target flake values for returned conflict keys |
--strategy <STRATEGY> | Strategy used for conflict detail labels (default: take-both). Options: take-both, abort, take-source, take-branch |
--json | Emit the raw JSON preview |
--ledger <LEDGER> | Ledger name (defaults to active ledger) |
--remote <REMOTE> | Execute against a remote server |
Description:
branch diff reports ahead/behind commits, fast-forward eligibility, and conflicting (subject, predicate, graph) keys without mutating state. With --conflict-details, the preview also shows the source and target values for the returned conflict keys and annotates what the selected strategy would do.
Examples:
# Preview merging dev into its parent
fluree branch diff dev
# Preview a specific target
fluree branch diff dev --target main
# Show value details and source-winning labels
fluree branch diff dev --target main --conflict-details --strategy take-source
# Emit raw JSON for UI tooling
fluree branch diff dev --conflict-details --json
fluree branch merge
Merge a source branch into a target branch.
Usage:
fluree branch merge <SOURCE> [OPTIONS]
Arguments:
| Argument | Description |
|---|---|
<SOURCE> | Source branch name to merge from (e.g., “dev”, “feature-x”) |
Options:
| Option | Description |
|---|---|
--ledger <LEDGER> | Ledger name (defaults to active ledger) |
--target <BRANCH> | Target branch to merge into (defaults to source’s parent branch) |
--strategy <STRATEGY> | Conflict resolution strategy (default: take-both). Options: take-both, abort, take-source, take-branch. |
--remote <REMOTE> | Execute against a remote server |
Description:
Merges a source branch into a target branch. When the target hasn’t advanced since the source branched, this is a fast-forward; otherwise --strategy controls how conflicting edits are resolved (mirroring branch rebase).
When --target is omitted, the merge target is inferred from the source branch’s parent (the branch it was created from).
After a successful merge, the source branch remains intact and can continue to receive new transactions and be merged again. Only the new commits since the last merge (or branch creation) are copied.
Examples:
# Merge dev into main (inferred from branch point)
fluree branch merge dev
# Merge feature-x into dev (explicit target)
fluree branch merge feature-x --target dev
# Merge for a specific ledger
fluree branch merge dev --ledger mydb
# Merge with source-winning conflict resolution
fluree branch merge dev --target main --strategy take-source
# Merge on a remote server
fluree branch merge dev --ledger mydb --remote origin
Output:
Merged 'dev' into 'main' (fast-forward to t=8, 3 commits copied).
Output (non-fast-forward):
Merged 'dev' into 'main' (t=9, 3 commits copied, 1 conflicts).
See Also
- create - Create a new ledger
- list - List all ledgers
- info - Show ledger details
- use - Switch active ledger
fluree drop
Drop (delete) a ledger or graph source.
Usage
fluree drop <NAME> --force
Arguments
| Argument | Description |
|---|---|
<NAME> | Ledger or graph source name to drop |
Options
| Option | Description |
|---|---|
--force | Required flag to confirm deletion |
Description
Permanently deletes a ledger or graph source. The --force flag is required to prevent accidental deletion.
The command first tries to drop the name as a ledger. If no ledger is found, it tries to drop it as a graph source. This means fluree drop works uniformly for both ledgers and graph sources like Iceberg mappings.
Examples
# Delete a ledger
fluree drop oldledger --force
# Delete a graph source (Iceberg mapping)
fluree drop warehouse-orders --force
Output
Ledger:
Dropped ledger 'oldledger'
Graph source:
Dropped graph source 'warehouse-orders:main'
Errors
Without --force:
error: use --force to confirm deletion of 'oldledger'
See Also
- create - Create a new ledger
- iceberg - Map Iceberg tables as graph sources
- list - List all ledgers and graph sources
fluree insert
Insert data into a ledger.
Usage
fluree insert [LEDGER] [DATA] [OPTIONS]
Arguments
| Arguments | Behavior |
|---|---|
| (none) | Active ledger; provide data via -e, -f, or stdin |
<arg> | Auto-detected: if it looks like data (JSON, Turtle), uses it inline with the active ledger; if it’s an existing file, reads from it; otherwise treats it as a ledger name |
<ledger> <data> | Specified ledger + inline data |
Options
| Option | Description |
|---|---|
-e, --expr <EXPR> | Inline data expression (alternative to positional) |
-f, --file <FILE> | Read data from a file |
-m, --message <MSG> | Commit message |
--format <FORMAT> | Data format: turtle or jsonld (auto-detected if omitted) |
--remote <NAME> | Execute against a remote server (by remote name, e.g., origin) |
Description
Inserts RDF data into a ledger. Supports both Turtle and JSON-LD formats. Data can come from:
- A positional argument (inline data)
- -e flag (inline expression)
- -f flag (file)
- Standard input (pipe)
Examples
# Insert inline Turtle
fluree insert '@prefix ex: <http://example.org/> .
ex:alice a ex:Person ; ex:name "Alice" .'
# Insert inline JSON-LD
fluree insert '{"@id": "ex:bob", "ex:name": "Bob"}'
# Insert from file
fluree insert -f data.ttl
# Insert with commit message
fluree insert -f data.ttl -m "Added initial users"
# Insert into specific ledger
fluree insert production '<http://example.org/x> a <http://example.org/Thing> .'
# Pipe from stdin
cat data.ttl | fluree insert
Output
Committed t=1 (42 flakes)
With verbose mode:
Committed t=1 (42 flakes)
Commit ID: bafybeig...
Data Format Detection
The format is auto-detected:
- @prefix or @base at line start → Turtle
- Starts with { or [ → JSON-LD
- .ttl file extension → Turtle
- .json or .jsonld extension → JSON-LD
Override with --format turtle or --format jsonld.
See Also
- upsert - Insert or update existing data
- update - Full WHERE/DELETE/INSERT updates
- query - Query the inserted data
- export - Export all data
fluree upsert
Upsert data into a ledger (insert or update existing).
Usage
fluree upsert [LEDGER] [DATA] [OPTIONS]
Arguments
| Arguments | Behavior |
|---|---|
| (none) | Active ledger; provide data via -e, -f, or stdin |
<arg> | Auto-detected: if it looks like data (JSON, Turtle), uses it inline with the active ledger; if it’s an existing file, reads from it; otherwise treats it as a ledger name |
<ledger> <data> | Specified ledger + inline data |
Options
| Option | Description |
|---|---|
-e, --expr <EXPR> | Inline data expression (alternative to positional) |
-f, --file <FILE> | Read data from a file |
-m, --message <MSG> | Commit message |
--format <FORMAT> | Data format: turtle or jsonld (auto-detected if omitted) |
--remote <NAME> | Execute against a remote server (by remote name, e.g., origin) |
Description
Upserts RDF data into a ledger. Unlike insert, upsert will:
- Insert new entities
- Replace existing values for entities that already exist (matched by @id)
This is useful for updating data without needing to know whether it exists.
Examples
# Update or insert a user
fluree upsert '@prefix ex: <http://example.org/> .
ex:alice ex:name "Alice Smith" ; ex:age 31 .'
# Upsert from file
fluree upsert -f updates.ttl
# Upsert with commit message
fluree upsert '{"@id": "ex:alice", "ex:status": "active"}' -m "Updated Alice status"
Output
Committed t=2 (3 flakes)
Difference from Insert
| Operation | Existing Entity | New Entity |
|---|---|---|
insert | Adds new triples (may create duplicates) | Creates entity |
upsert | Replaces values for given predicates | Creates entity |
See Also
- insert - Insert without replacement
- update - Full WHERE/DELETE/INSERT updates
- query - Query data
- history - View change history
fluree update
Update data with full WHERE/DELETE/INSERT semantics.
Usage
fluree update [LEDGER] [DATA] [OPTIONS]
Arguments
| Arguments | Behavior |
|---|---|
| (none) | Active ledger; provide data via -e, -f, or stdin |
<arg> | Auto-detected: if it looks like data (JSON or SPARQL UPDATE), uses it inline with the active ledger; if it’s an existing file, reads from it; otherwise treats it as a ledger name |
<ledger> <data> | Specified ledger + inline data |
Options
| Option | Description |
|---|---|
-e, --expr <EXPR> | Inline data expression (alternative to positional) |
-f, --file <FILE> | Read data from a file |
-m, --message <MSG> | Commit message |
--format <FORMAT> | Data format: jsonld or sparql (auto-detected if omitted) |
--remote <NAME> | Execute against a remote server (by remote name) |
--direct | Bypass auto-routing through a local server (global flag; see note on SPARQL UPDATE below) |
Description
Executes a WHERE/DELETE/INSERT transaction against a ledger. Unlike insert (which only adds data) and upsert (which replaces by subject+predicate), update supports the full WHERE/DELETE/INSERT pattern, enabling:
- Conditional deletes: delete triples matching a WHERE pattern
- Conditional updates: delete old values and insert new ones based on WHERE matches
- Computed updates: use variables from WHERE to derive new values via bind
- Delete-only operations: WHERE + DELETE without INSERT
- Insert-only operations: equivalent to insert but using the update command
Supported Formats
- JSON-LD (default): transaction body with where, delete, and/or insert keys
- SPARQL UPDATE: standard INSERT DATA, DELETE DATA, and DELETE/INSERT WHERE syntax
SPARQL UPDATE Note
SPARQL UPDATE requires the server’s parsing pipeline. It works automatically when:
- A local server is running (the CLI auto-routes through it by default)
- Using --remote to target a remote server
For direct local mode (--direct), use JSON-LD format instead.
Examples
Conditional Property Update (JSON-LD)
# Update Alice's age: find old value, delete it, insert new one
fluree update '{
"@context": {"ex": "http://example.org/"},
"where": [{"@id": "ex:alice", "ex:age": "?oldAge"}],
"delete": [{"@id": "ex:alice", "ex:age": "?oldAge"}],
"insert": [{"@id": "ex:alice", "ex:age": 31}]
}'
Delete-Only
# Remove all email addresses for alice
fluree update '{
"@context": {"ex": "http://example.org/"},
"where": [{"@id": "ex:alice", "ex:email": "?email"}],
"delete": [{"@id": "ex:alice", "ex:email": "?email"}]
}'
Bulk Conditional Update
# Set all "pending" users to "active"
fluree update '{
"@context": {"ex": "http://example.org/"},
"where": [{"@id": "?person", "ex:status": "pending"}],
"delete": [{"@id": "?person", "ex:status": "pending"}],
"insert": [{"@id": "?person", "ex:status": "active"}]
}'
From File
fluree update -f update.json
fluree update -f update.json -m "Updated user statuses"
SPARQL UPDATE (via server)
# Requires a running server (fluree server start)
fluree update -e 'PREFIX ex: <http://example.org/>
DELETE { ex:alice ex:age ?oldAge }
INSERT { ex:alice ex:age 31 }
WHERE { ex:alice ex:age ?oldAge }'
Pipe from stdin
cat update.json | fluree update
Target a specific ledger
fluree update production -f migration.json
Output
Committed t=3, 4 flakes
With remote mode, the full server response is printed as JSON.
Format Detection
The format is auto-detected using this priority:
1. Explicit flag (--format) — always wins
2. File extension (when using -f or a positional file path):
   - .json, .jsonld → JSON-LD
   - .rq, .ru, .sparql → SPARQL UPDATE
3. Content sniffing:
   - Valid JSON (full parse, not just first character) → JSON-LD
   - Starts with INSERT, DELETE, PREFIX, or BASE → SPARQL UPDATE
Override with --format jsonld or --format sparql.
Comparison with Insert and Upsert
| Operation | WHERE clause | DELETE | Conditional | Use case |
|---|---|---|---|---|
insert | No | No | No | Add new data |
upsert | No | Auto (per subject+predicate) | No | Replace values for known entities |
update | Yes | Explicit | Yes | Targeted updates, deletes, complex transformations |
See Also
- insert - Insert new data
- upsert - Insert or replace existing data
- Update (WHERE/DELETE/INSERT) - Full transaction syntax guide
- query - Query data
fluree query
Query a ledger.
Usage
fluree query [LEDGER] [QUERY] [OPTIONS]
Arguments
| Arguments | Behavior |
|---|---|
| (none) | Active ledger; provide query via -e, -f, or stdin |
<arg> | Auto-detected: if it looks like a query, uses it inline with the active ledger; if it’s an existing file, reads from it; otherwise treats it as a ledger name |
<ledger> <query> | Specified ledger + inline query |
Options
| Option | Description |
|---|---|
-e, --expr <EXPR> | Inline query expression (alternative to positional) |
-f, --file <FILE> | Read query from a file |
--format <FORMAT> | Output format: json, typed-json, table, csv, or tsv (default: table) |
--sparql | Force SPARQL query format |
--jsonld | Force JSON-LD query format |
--at <TIME> | Query at a specific point in time |
--normalize-arrays | Always wrap multi-value properties in arrays (graph-crawl JSON-LD queries only) |
--bench | Benchmark mode: time execution only and print the first 5 rows as a table (no full-result JSON formatting) |
--explain | Print the query plan without executing it |
--remote <NAME> | Execute against a remote server (by remote name, e.g., origin) |
Description
Executes a query against a ledger. Supports both SPARQL and JSON-LD query formats.
Query Formats
SPARQL
fluree query 'SELECT ?name WHERE { ?s <http://example.org/name> ?name }'
JSON-LD Query
fluree query '{"select": ["?name"], "where": {"http://example.org/name": "?name"}}'
Format is auto-detected if not specified:
- Contains SELECT, CONSTRUCT, ASK, or DESCRIBE → SPARQL
- Otherwise → JSON-LD
Output Formats
JSON (default)
fluree query 'SELECT ?name WHERE { ?s <http://example.org/name> ?name }'
{
"head": {"vars": ["name"]},
"results": {"bindings": [{"name": {"type": "literal", "value": "Alice"}}]}
}
Table
fluree query --format table 'SELECT ?name WHERE { ?s <http://example.org/name> ?name }'
┌───────┐
│ name │
├───────┤
│ Alice │
│ Bob │
└───────┘
CSV
fluree query --format csv 'SELECT ?name WHERE { ?s <http://example.org/name> ?name }'
name
Alice
Bob
Note: --format csv (and --format tsv) are only supported for local ledgers. Tracked/remote ledgers support json and table output.
Time Travel
Query historical states with --at:
# Query at transaction 5
fluree query --at 5 'SELECT * WHERE { ?s ?p ?o }'
# Query at specific commit
fluree query --at abc123def 'SELECT * WHERE { ?s ?p ?o }'
# Query at ISO-8601 timestamp
fluree query --at 2024-01-15T10:30:00Z 'SELECT * WHERE { ?s ?p ?o }'
Tracked/remote ledgers also support --at. The CLI will translate --at into the appropriate dataset/time-travel form when forwarding the query to the remote server.
SPARQL note (remote): if your SPARQL already includes FROM / FROM NAMED, the CLI will not rewrite it for --at. In that case, encode time travel directly in the FROM IRI (e.g., FROM <myledger:main@t:5>).
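For example, a sketch of remote time travel with an explicit FROM (assumes a remote named origin and a ledger named myledger):
# Time-travel to t=5 is encoded in the FROM IRI, so the CLI leaves it untouched
fluree query --remote origin 'SELECT ?s ?p ?o FROM <myledger:main@t:5> WHERE { ?s ?p ?o } LIMIT 10'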
Examples
# Inline SPARQL query (most common)
fluree query 'SELECT ?name WHERE { ?s <http://example.org/name> ?name }'
# JSON-LD query inline
fluree query '{"select": {"?s": ["*"]}, "where": {"@id": "?s"}}'
# Query specific ledger with CSV output
fluree query production --format csv 'SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10'
# SPARQL query from file
fluree query -f query.rq
# Time travel query
fluree query --at 3 'SELECT * WHERE { ?s ?p ?o }'
# Pipe from stdin
cat query.rq | fluree query
See Also
fluree history
Show change history for an entity.
Usage
fluree history <ENTITY> [OPTIONS]
Arguments
| Argument | Description |
|---|---|
<ENTITY> | Entity IRI (compact or full) |
Options
| Option | Description |
|---|---|
--ledger <LEDGER> | Ledger name (defaults to active ledger) |
--from <TIME> | Start of time range (default: 1) |
--to <TIME> | End of time range (default: latest) |
-p, --predicate <PRED> | Filter to specific predicate |
--format <FORMAT> | Output format: json, table, or csv (default: table) |
Description
Shows the change history for a specific entity across transactions. Each change shows:
- t - Transaction number
- op - Operation: + (assert) or - (retract)
- predicate - The property that changed (if not filtered)
- value - The value asserted or retracted
Prefix Expansion
Entity IRIs can use stored prefixes:
# First, add a prefix
fluree prefix add ex http://example.org/
# Then use compact IRI
fluree history ex:alice
Or use the full IRI:
fluree history http://example.org/alice
Examples
# Show all changes to an entity
fluree history ex:alice
# Show changes in JSON format
fluree history ex:alice --format json
# Filter to specific predicate
fluree history ex:alice -p ex:name
# Show changes in a time range
fluree history ex:alice --from 1 --to 5
# Query specific ledger
fluree history ex:alice --ledger production
Output
Table (default)
┌───┬────┬─────────────────────────────────┬─────────────┐
│ t │ op │ predicate │ value │
├───┼────┼─────────────────────────────────┼─────────────┤
│ 1 │ + │ http://example.org/name │ Alice │
│ 1 │ + │ http://example.org/age │ 30 │
│ 2 │ - │ http://example.org/name │ Alice │
│ 2 │ + │ http://example.org/name │ Alice Smith │
└───┴────┴─────────────────────────────────┴─────────────┘
JSON
[
{"?t": 1, "?op": true, "?p": "http://example.org/name", "?v": "Alice"},
{"?t": 1, "?op": true, "?p": "http://example.org/age", "?v": 30},
{"?t": 2, "?op": false, "?p": "http://example.org/name", "?v": "Alice"},
{"?t": 2, "?op": true, "?p": "http://example.org/name", "?v": "Alice Smith"}
]
CSV
t,op,predicate,value
1,+,http://example.org/name,Alice
1,+,http://example.org/age,30
2,-,http://example.org/name,Alice
2,+,http://example.org/name,Alice Smith
See Also
fluree export
Export ledger data as Turtle, N-Triples, N-Quads, TriG, or JSON-LD.
Usage
fluree export [LEDGER] [OPTIONS]
Arguments
| Argument | Description |
|---|---|
[LEDGER] | Ledger name (defaults to active ledger) |
Options
| Option | Description |
|---|---|
--format <FORMAT> | Output format: turtle (or ttl), ntriples (or nt), jsonld, trig, or nquads (default: turtle) |
--all-graphs | Export default + all named graphs including system graphs (dataset export). Requires --format trig or --format nquads. |
--graph <IRI> | Export a specific named graph by IRI. Mutually exclusive with --all-graphs. |
--context <JSON> | JSON-LD context for prefix declarations. Overrides the ledger’s default context. |
--context-file <FILE> | Read context from a JSON file. Overrides the ledger’s default context. |
--at <TIME> | Export data as of a specific point in time. Accepts a transaction number (5), ISO-8601 datetime (2024-01-15T10:30:00Z), or commit CID prefix (abc123def456). If omitted, exports at the latest committed time (including data committed but not yet persisted to index). |
Formats
turtle / jsonld (data snapshot)
Exports a point-in-time snapshot of all triples in the ledger. Output goes to stdout.
ledger (native pack)
Exports the full native ledger — all commits, transaction blobs, indexes, and dictionaries — as a .flpack file. This format preserves the complete history and can be imported into a new Fluree instance via fluree create <name> --from <file>.flpack.
The .flpack format uses the fluree-pack-v1 binary wire protocol (the same format used by fluree clone and fluree pull for network transfers).
All formats (Turtle, N-Triples, N-Quads, TriG, JSON-LD) read directly from the binary SPOT index with a novelty overlay, so export always includes the latest committed transactions — even those not yet persisted to index. Memory usage stays constant regardless of dataset size. JSON-LD streams one subject at a time, so memory is O(largest subject), not O(dataset).
Prefixes / Context
Turtle, TriG, and JSON-LD output use prefix compaction to produce compact, readable output. The prefix map is resolved in this order:
1. --context or --context-file (explicit override)
2. The ledger's default context (set via fluree context set)
3. No prefixes (falls back to full IRIs)
The context format is a JSON object mapping prefixes to namespace IRIs:
{"ex": "http://example.org/", "schema": "http://schema.org/"}
Prerequisites
All export formats require a binary index. Ledgers that have only been created and inserted into (without an index build) cannot be exported. Run the server to trigger index building first.
Examples
# Export as Turtle (default) — uses ledger's default context for prefixes
fluree export > backup.ttl
# Export as Turtle with custom prefixes
fluree export --context '{"ex": "http://example.org/"}' > backup.ttl
# Export as Turtle with prefixes from a file
fluree export --context-file prefixes.json > backup.ttl
# Export as N-Triples (no prefixes, one triple per line)
fluree export --format ntriples > backup.nt
# Export as JSON-LD
fluree export --format jsonld > backup.jsonld
# Export all graphs as TriG
fluree export --all-graphs --format trig > backup.trig
# Export all graphs as N-Quads
fluree export --all-graphs --format nquads > backup.nq
# Export a specific named graph
fluree export --graph "http://example.org/g1" --format turtle > g1.ttl
# Export data as of a specific transaction number
fluree export --at 5 > snapshot-at-t5.ttl
# Export data as of an ISO-8601 datetime
fluree export --at "2024-06-15T12:00:00Z" > snapshot.ttl
# Export data as of a specific commit
fluree export --at abc123def456 > at-commit.ttl
# Export specific ledger
fluree export production > prod-backup.ttl
# Pipe to other tools
fluree export | grep "example.org"
Output
Turtle (default)
@prefix ex: <http://example.org/> .
ex:alice
a ex:Person ;
ex:name "Alice" .
ex:bob
a ex:Person ;
ex:name "Bob" .
N-Triples
<http://example.org/alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/Person> .
<http://example.org/alice> <http://example.org/name> "Alice" .
TriG (all graphs)
@prefix ex: <http://example.org/> .
ex:alice
ex:name "Alice" .
GRAPH ex:g1 {
ex:bob
ex:name "Bob" .
}
N-Quads (all graphs)
<http://example.org/alice> <http://example.org/name> "Alice" .
<http://example.org/bob> <http://example.org/name> "Bob" <http://example.org/g1> .
JSON-LD
{
"@context": {
"ex": "http://example.org/"
},
"@graph": [
{"@id": "ex:alice", "@type": "ex:Person", "ex:name": "Alice"},
{"@id": "ex:bob", "@type": "ex:Person", "ex:name": "Bob", "ex:age": {"@value": 25, "@type": "http://www.w3.org/2001/XMLSchema#long"}}
]
}
JSON-LD output uses prefix compaction from the context. Value encoding rules:
- Plain strings (xsd:string) → JSON string (no @type)
- Booleans → native JSON true/false
- Integers/longs → {"@value": 42, "@type": "xsd:long"} (explicit datatype)
- Decimals → {"@value": "3.14", "@type": "xsd:decimal"}
- Doubles → {"@value": 3.14, "@type": "xsd:double"}
- Language-tagged strings → {"@value": "Bonjour", "@language": "fr"}
- References → {"@id": "ex:other"}
- Single-cardinality properties are unwrapped (not in [])
- Multi-cardinality properties use arrays
API Usage
The export feature is available at the API level for upstream applications:
use fluree_db_api::export::ExportFormat;
// Turtle with default context
let stats = fluree.export("mydb")
.format(ExportFormat::Turtle)
.write_to(&mut writer)
.await?;
// N-Quads with all graphs
let stats = fluree.export("mydb")
.format(ExportFormat::NQuads)
.all_graphs()
.write_to(&mut writer)
.await?;
// Turtle with custom prefixes
let stats = fluree.export("mydb")
.format(ExportFormat::Turtle)
.context(&json!({"ex": "http://example.org/"}))
.write_to(&mut writer)
.await?;
// JSON-LD with prefix compaction
let stats = fluree.export("mydb")
.format(ExportFormat::JsonLd)
.context(&json!({"ex": "http://example.org/"}))
.write_to(&mut writer)
.await?;
// Export a specific named graph
let stats = fluree.export("mydb")
.format(ExportFormat::Turtle)
.graph("http://example.org/g1")
.write_to(&mut writer)
.await?;
// Time-travel: export as of transaction t=5
let stats = fluree.export("mydb")
.format(ExportFormat::Turtle)
.as_of(TimeSpec::at_t(5))
.write_to(&mut writer)
.await?;
// Time-travel: export as of an ISO-8601 datetime
let stats = fluree.export("mydb")
.format(ExportFormat::Turtle)
.as_of(TimeSpec::at_time("2024-06-15T12:00:00Z"))
.write_to(&mut writer)
.await?;
// Convenience: write directly to stdout
let stats = fluree.export("mydb")
.format(ExportFormat::Turtle)
.to_stdout()
.await?;
See Also
fluree context
Manage the default JSON-LD context for a ledger.
Usage
fluree context <COMMAND>
Subcommands
| Command | Description |
|---|---|
get [LEDGER] | Show the default JSON-LD context |
set [LEDGER] | Replace the default JSON-LD context |
Description
Each ledger can have a default context — a JSON object mapping prefixes to IRIs (e.g., {"ex": "http://example.org/"}). When a JSON-LD query is sent via the CLI and omits its own @context, the ledger’s default context is injected automatically. The HTTP API requires ?default-context=true to opt in per request, and fluree-db-api requires explicit opt-in via its default-context view builders.
Default context is populated automatically during bulk import (from Turtle @prefix declarations). This command allows reading or replacing it after the fact.
The context is stored in content-addressed storage (CAS) and referenced from the nameservice config. Updates use compare-and-set semantics, so concurrent writers are safely handled.
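For instance, a sketch assuming the default context maps ex to http://example.org/; the CLI injects it, so the query body can use the compact prefix without its own @context:
# No @context in the query body; "ex:name" expands via the ledger's default context
fluree query '{"select": ["?name"], "where": {"@id": "?s", "ex:name": "?name"}}'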
context get
Show the current default context.
fluree context get [LEDGER]
| Argument | Description |
|---|---|
[LEDGER] | Ledger name (defaults to active ledger) |
Examples
# Show context for active ledger
fluree context get
# Show context for a specific ledger
fluree context get mydb
Output (pretty-printed JSON):
{
"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"xsd": "http://www.w3.org/2001/XMLSchema#",
"owl": "http://www.w3.org/2002/07/owl#",
"ex": "http://example.org/"
}
If no default context has been set, a message is printed to stderr.
context set
Replace the default context with a new JSON object.
fluree context set [LEDGER] [OPTIONS]
| Argument | Description |
|---|---|
[LEDGER] | Ledger name (defaults to active ledger) |
| Option | Description |
|---|---|
-e, --expr <JSON> | Inline JSON context |
-f, --file <PATH> | Read context from a JSON file |
If neither -e nor -f is provided, context is read from stdin.
The body can be either a bare JSON object or wrapped in {"@context": {...}} — both forms are accepted.
Examples
# Set inline
fluree context set mydb -e '{"ex": "http://example.org/", "foaf": "http://xmlns.com/foaf/0.1/"}'
# Set from file
fluree context set mydb -f context.json
# Pipe from stdin
cat context.json | fluree context set mydb
# Wrapped form also accepted
fluree context set mydb -e '{"@context": {"ex": "http://example.org/"}}'
See Also
- prefix — Manage CLI-local prefix mappings (stored in project config, not the ledger)
- export — Export ledger data (the default context drives prefix output)
- IRIs, namespaces, and JSON-LD @context — Conceptual overview
fluree log
Show commit log for a ledger.
Usage
fluree log [LEDGER] [OPTIONS]
Arguments
| Argument | Description |
|---|---|
[LEDGER] | Ledger name (defaults to active ledger) |
Options
| Option | Description |
|---|---|
--oneline | Show one-line summary per commit |
-n, --count <N> | Maximum number of commits to show |
Description
Displays the commit history for a ledger, similar to git log. Shows transaction numbers, timestamps, and commit details.
Examples
# Show full commit log
fluree log
# Show last 5 commits
fluree log -n 5
# One-line format
fluree log --oneline
# Specific ledger
fluree log production --oneline -n 10
Output
Full Format (default)
commit bafybeig2k5...
t: 3
Date: 2024-01-15T10:30:00Z
Added new users
commit bafybeig7x3...
t: 2
Date: 2024-01-14T09:15:00Z
commit bafybeig9m1...
t: 1
Date: 2024-01-13T08:00:00Z
Initial data load
One-line Format
bafybeig2k5 t=3 Added new users
bafybeig7x3 t=2
bafybeig9m1 t=1 Initial data load
See Also
- show - Show decoded contents of a specific commit
- info - Show ledger details
- history - Show entity change history
fluree show
Show the decoded contents of a commit — assertions and retractions with resolved IRIs.
Usage
fluree show <COMMIT> [OPTIONS]
Arguments
| Argument | Description |
|---|---|
<COMMIT> | Commit identifier: t:<N> transaction number, hex-digest prefix (min 6 chars), or full CID |
Options
| Option | Description |
|---|---|
--ledger <NAME> | Ledger name (defaults to active ledger) |
--remote <NAME> | Execute against a remote server (by remote name, e.g., “origin”) |
Description
Displays the full decoded contents of a single commit, similar to git show. Each flake (assertion or retraction) is rendered with IRIs compacted using the ledger’s namespace prefix table.
The commit identifier can be:
- A transaction number prefixed with t: (e.g., t:5), as shown in fluree log output
- An abbreviated hex digest (minimum 6 characters), as shown in the storage directory or obtained from the txn-meta graph
- A full CID string (e.g., bagaybqabciq...)
Policy Filtering
When executed against a remote server (--remote), the returned flakes are filtered by the server’s data-auth policy. The identity is derived from the Bearer token and the policy class from the server’s default_policy_class configuration. Flakes the caller is not permitted to read are silently omitted, and the asserts/retracts counts reflect only the visible flakes.
Unlike the query endpoints, show does not support per-request policy overrides via headers or request body — it uses only the Bearer token identity and server-configured default policy class.
When executed locally (no --remote, or with --direct), fluree show operates with full local-admin access and no policy filtering is applied. This is consistent with other local CLI operations that read directly from storage.
Output Format
The output is a JSON object containing:
| Field | Description |
|---|---|
id | Full CID of the commit |
t | Transaction number |
time | ISO 8601 timestamp |
size | Commit blob size in bytes |
previous | Previous commit CID |
signer | Transaction signer (if signed) |
asserts | Number of assertion flakes |
retracts | Number of retraction flakes |
@context | Namespace prefix table (prefix → IRI) |
flakes | Array of flake tuples in SPOT order |
Each flake is a tuple: [subject, predicate, object, datatype, operation]
- operation: true = assert (added), false = retract (removed)
- Ref objects use "@id" as the datatype
- When metadata is present (language tag, list index, or named graph), a 6th element is appended: {"lang": "en", "i": 0, "graph": "ex:myGraph"}
Examples
# Show a commit by transaction number
fluree show t:5
# Show a commit by hex prefix
fluree show 3dd028
# Show a commit from a specific ledger
fluree show 0303b7 --ledger _system
# Show a commit on a remote server
fluree show t:5 --remote origin
# Show by hex prefix on remote with explicit ledger
fluree show 3dd028 --remote origin --ledger mydb
# Pipe to jq for filtering
fluree show 3dd028 | jq '.flakes[] | select(.[4] == true)'
Example Output
{
"id": "bagaybqabciqd3ubikmk2zh6gjxngpgjja3vi5myleidf46htiybpswyy2665zra",
"t": 40,
"time": "2026-03-12T16:58:18.395474217+00:00",
"size": 327,
"previous": "bagaybqabciqc64dbbv46vrueddgqfrafgmo27u4fibkrvwdmr2g6ze4cbaeg23a",
"asserts": 1,
"retracts": 1,
"@context": {
"xsd": "http://www.w3.org/2001/XMLSchema#",
"schema": "http://schema.org/",
"f": "https://ns.flur.ee/db#"
},
"flakes": [
["urn:fsys:dataset:zoho3", "schema:dateModified", "2026-03-12T14:15:30Z", "xsd:string", false],
["urn:fsys:dataset:zoho3", "schema:dateModified", "2026-03-12T16:58:16Z", "xsd:string", true]
]
}
In this example, one property (dateModified) was updated: the old value was retracted (false) and the new value asserted (true).
See Also
- log - Show commit log (list of commits)
- history - Show change history for a specific entity
- info - Show ledger details
fluree index
Build or update the binary index for a ledger.
Usage
fluree index [LEDGER]
Arguments
| Argument | Description |
|---|---|
[LEDGER] | Ledger name (defaults to active ledger) |
Description
Performs incremental indexing when possible — merges only new commits into the existing index. Falls back to a full rebuild if incremental indexing isn’t possible (e.g., no prior index exists).
Run this after transactions to clear the novelty layer and speed up queries. For routine use this is preferred over reindex, which always rebuilds from scratch.
Examples
# Index the active ledger
fluree index
# Index a specific ledger
fluree index mydb
Output
Indexed mydb to t=15 (root: bafyreig...)
When to Use
- After bulk transactions — clears accumulated novelty so queries hit the optimized binary index instead of scanning in-memory flakes.
- Routine maintenance — keeps query performance consistent as data grows.
- After
clone --no-indexesorpull --no-indexes— builds the local index that was skipped during transfer.
For a clean rebuild from commit history (e.g., suspected corruption), use reindex instead.
See Also
- reindex - Full rebuild from commit history
- Background indexing - Automatic indexing in the server
- Reindex API - Rust API reference
fluree reindex
Full reindex from commit history.
Usage
fluree reindex [LEDGER]
Arguments
| Argument | Description |
|---|---|
[LEDGER] | Ledger name (defaults to active ledger) |
Description
Rebuilds the binary index from scratch by replaying all commits in order. This is a heavier operation than index — use it when the index is corrupted, missing, or you want a guaranteed clean rebuild.
For routine indexing after transactions, prefer index.
Examples
# Reindex the active ledger
fluree reindex
# Reindex a specific ledger
fluree reindex mydb
Output
Reindexed mydb to t=15 (root: bafyreig...)
When to Use
- Suspected index corruption — query results seem wrong or incomplete.
- After schema or configuration changes that affect index structure.
- Clean slate — you want to guarantee the index matches the commit history exactly.
For incremental indexing (faster, merges only new commits), use index instead.
See Also
- index - Incremental index build
- Background indexing - Automatic indexing in the server
- Reindex API - Rust API reference
fluree config
Manage configuration settings.
Usage
fluree config <COMMAND>
Subcommands
| Command | Description |
|---|---|
get <KEY> | Get a configuration value |
set <KEY> <VALUE> | Set a configuration value |
list | List all configuration values |
set-origins <LEDGER> --file <PATH> | Set CID fetch origins for a ledger (writes a LedgerConfig to CAS and updates config_id) |
Description
Manages configuration stored in .fluree/config.toml. Configuration uses dotted keys for nested values (e.g., storage.path).
Examples
Get a value
fluree config get storage.path
Output:
/custom/storage/path
Set a value
fluree config set storage.path /custom/storage/path
Output:
Set 'storage.path' = "/custom/storage/path"
List all values
fluree config list
Output:
storage.path = "/custom/storage/path"
storage.encryption = "aes256"
If no configuration is set:
(no configuration set)
Configuration File
Configuration is stored in .fluree/config.toml:
[storage]
path = "/custom/storage/path"
encryption = "aes256"
Errors
Getting a key that doesn’t exist:
error: configuration key 'nonexistent' is not set
See Also
fluree config set-origins
Store a LedgerConfig blob in local CAS and update the ledger’s nameservice record to point to it via config_id.
This enables origin-based fluree pull (when no upstream remote is configured) and improves fluree clone --origin by allowing the remote to advertise multiple fallback origins.
Usage
fluree config set-origins <LEDGER> --file <PATH>
Arguments
| Argument | Description |
|---|---|
<LEDGER> | Ledger ID (e.g., mydb or mydb:main) |
Options
| Option | Description |
|---|---|
--file <PATH> | Path to a JSON file containing a LedgerConfig |
LedgerConfig File Format
The file is canonical JSON using compact f: keys (not JSON-LD):
{
"f:origins": [
{ "f:priority": 10, "f:enabled": true, "f:transport": "http://localhost:8090", "f:auth": { "f:mode": "none" } }
],
"f:replication": { "f:preferPack": true, "f:maxPackMiB": 64 }
}
Notes:
- f:transport is an origin base URL. The CLI normalizes it the same way as remotes: it will append /fluree if missing and will use GET /.well-known/fluree.json discovery when available.
- Auth requirements are declarative. Credentials are not stored in the LedgerConfig.
Current Limitations
- fluree pull via origins currently does not attach a Bearer token from any credential store, so only origins with f:auth.f:mode = "none" are usable for pull today.
- fluree clone --origin ... --token ... can use a Bearer token for origin fetch.
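A minimal end-to-end sketch of attaching origins and pulling through them, assuming a ledger named mydb and a working file named origins.json (both placeholders); the JSON follows the format shown above:
# Write a LedgerConfig with one open (no-auth) HTTP origin
cat > origins.json <<'EOF'
{
  "f:origins": [
    { "f:priority": 10, "f:enabled": true, "f:transport": "http://localhost:8090", "f:auth": { "f:mode": "none" } }
  ]
}
EOF
# Attach it to the ledger's nameservice record
fluree config set-origins mydb --file origins.json
# Origin-based pull now works without a configured upstream remote
fluree pull mydb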
fluree prefix
Manage IRI prefix mappings.
Usage
fluree prefix <COMMAND>
Subcommands
| Command | Description |
|---|---|
add <PREFIX> <IRI> | Add a prefix mapping |
remove <PREFIX> | Remove a prefix mapping |
list | List all prefix mappings |
Description
Manages IRI prefix mappings stored in .fluree/prefixes.json. These prefixes are used to expand compact IRIs in commands like history.
Examples
Add a prefix
fluree prefix add ex http://example.org/
fluree prefix add foaf http://xmlns.com/foaf/0.1/
fluree prefix add schema https://schema.org/
Output:
Added prefix: ex = <http://example.org/>
List prefixes
fluree prefix list
Output:
ex: <http://example.org/>
foaf: <http://xmlns.com/foaf/0.1/>
schema: <https://schema.org/>
If no prefixes are defined:
(no prefixes defined)
Add prefixes with: fluree prefix add <prefix> <iri>
Example: fluree prefix add ex http://example.org/
Remove a prefix
fluree prefix remove foaf
Output:
Removed prefix: foaf
Usage with History
Once prefixes are defined, you can use compact IRIs:
# Instead of:
fluree history http://example.org/alice
# Use:
fluree history ex:alice
IRI Best Practices
IRI namespaces should end with / or #:
# Good
fluree prefix add ex http://example.org/
fluree prefix add foaf http://xmlns.com/foaf/0.1/
# Warning (will still work but may cause issues)
fluree prefix add bad http://example.org
Storage
Prefixes are stored in .fluree/prefixes.json:
{
"ex": "http://example.org/",
"foaf": "http://xmlns.com/foaf/0.1/"
}
See Also
fluree token
Manage JWS tokens for authentication with Fluree servers.
Subcommands
| Subcommand | Description |
|---|---|
create | Create a new JWS token |
keygen | Generate a new Ed25519 keypair |
inspect | Decode and verify a JWS token |
fluree token create
Create a new JWS token for authenticating with Fluree servers.
Usage
fluree token create --private-key <KEY> [OPTIONS]
Options
| Option | Description |
|---|---|
--private-key <KEY> | Required. Ed25519 private key (hex, base58, @filepath, or @- for stdin) |
--expires-in <DUR> | Token lifetime (default: 1h). Supports s, m, h, d, w suffixes |
--subject <SUB> | Subject claim (sub) - identity of the token holder |
--audience <AUD> | Audience claim (aud) - repeatable for multiple audiences |
--identity <ID> | Fluree identity claim (fluree.identity) - takes precedence over sub for policy |
--all | Grant full access to all ledgers (events, storage, read, and write) |
--events-ledger <ALIAS> | Grant events access to specific ledger (repeatable) |
--storage-ledger <ALIAS> | Grant storage access to specific ledger (repeatable) |
--read-all | Grant data API read access to all ledgers (fluree.ledger.read.all=true) |
--read-ledger <ALIAS> | Grant data API read access to specific ledger (repeatable) |
--write-all | Grant data API write access to all ledgers (fluree.ledger.write.all=true) |
--write-ledger <ALIAS> | Grant data API write access to specific ledger (repeatable) |
--graph-source <ALIAS> | Grant access to specific graph source (repeatable) |
--output <FMT> | Output format: token, json, or curl (default: token) |
--print-claims | Print decoded claims to stderr |
Private Key Formats
| Format | Example |
|---|---|
| Hex | 0x<64 hex chars> or <64 hex chars> |
| Base58 | z<base58 string> (multibase) or raw base58 |
| File | @/path/to/keyfile or @~/.fluree/key (tilde expansion) |
| Stdin | @- (read from stdin to avoid shell history) |
Examples
# Create a token with full access
fluree token create --private-key 0x1234...abcd --all
# Create a token for specific ledgers (events/storage)
fluree token create --private-key @~/.fluree/key \
--events-ledger mydb --storage-ledger mydb
# Create a token with data API read+write for specific ledgers
fluree token create --private-key @~/.fluree/key \
--read-ledger mydb:main --write-ledger mydb:main
# Create a token with identity and audience
fluree token create --private-key @- \
--identity did:example:alice \
--audience https://api.example.com \
--expires-in 7d
# Output as curl command
fluree token create --private-key 0x... --all --output curl
# View claims while creating
fluree token create --private-key 0x... --all --print-claims
fluree token keygen
Generate a new Ed25519 keypair for signing tokens.
Usage
fluree token keygen [OPTIONS]
Options
| Option | Description |
|---|---|
--format <FMT> | Output format: hex, base58, or json (default: hex) |
-o, --output <PATH> | Write private key to file (otherwise prints to stdout) |
Examples
# Generate keypair in hex format
fluree token keygen
# Generate in JSON format with all representations
fluree token keygen --format json
# Save private key to file
fluree token keygen --output ~/.fluree/key
# Generate base58 format
fluree token keygen --format base58
Output
Hex format:
Private key: 0x1234567890abcdef...
Public key: 0xabcdef1234567890...
DID: did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK
JSON format:
{
"private_key": {
"hex": "0x1234...",
"base58": "z..."
},
"public_key": {
"hex": "0xabcd...",
"base58": "z..."
},
"did": "did:key:z6Mk..."
}
fluree token inspect
Decode and optionally verify a JWS token.
Usage
fluree token inspect <TOKEN> [OPTIONS]
Arguments
| Argument | Description |
|---|---|
<TOKEN> | JWS token string or @filepath |
Options
| Option | Description |
|---|---|
--no-verify | Skip signature verification (default: verify) |
--output <FMT> | Output format: pretty, json, or table (default: pretty) |
Examples
# Inspect and verify a token
fluree token inspect eyJhbGciOiJFZERTQSI...
# Inspect without verification
fluree token inspect eyJ... --no-verify
# Output as JSON
fluree token inspect eyJ... --output json
# Read token from file
fluree token inspect @token.txt
Output
Pretty format:
Token Information
─────────────────────────────────────────────────────
Issuer: did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK
Subject: test@example.com
Issued: 2024-01-15 10:30:00 UTC
Expires: 2024-01-15 11:30:00 UTC
Permissions:
Events: all ledgers
Storage: all ledgers
Signature: ✓ Valid
Token Scopes
Tokens can carry different permission scopes that control access to different server features:
| Scope | Claim | Controls |
|---|---|---|
| Events (all) | fluree.events.all | SSE event stream for all ledgers |
| Events (specific) | fluree.events.ledgers | SSE event stream for listed ledgers |
| Storage (all) | fluree.storage.all | Storage proxy read access (all); also implies data API read |
| Storage (specific) | fluree.storage.ledgers | Storage proxy read access (listed); also implies data API read |
| Read (all) | fluree.ledger.read.all | Data API query access to all ledgers |
| Read (specific) | fluree.ledger.read.ledgers | Data API query access to listed ledgers |
| Write (all) | fluree.ledger.write.all | Data API write access to all ledgers |
| Write (specific) | fluree.ledger.write.ledgers | Data API write access to listed ledgers |
The --all flag sets events, storage, read, and write access for all ledgers.
Back-compat: fluree.storage.* claims also grant data API read access for the same ledgers.
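A minimal sketch tying the scopes above to the flags documented earlier: mint a read-only token for a single ledger and store it for a remote named origin (the ledger alias mydb:main is a placeholder):
# Create a data-API read-only token and store it for 'origin' in one pipeline
fluree token create --private-key @~/.fluree/key --read-ledger mydb:main \
  | fluree auth login --remote origin --token @-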
See Also
- auth - Store/manage tokens on remotes
- remote - Configure remote servers
- Authentication - Auth model, modes, and token claims
- fetch - Fetch from remotes (requires auth token)
- push - Push to remotes (requires auth token)
fluree auth
Manage authentication tokens for remote servers. Tokens are stored in .fluree/config.toml as part of the remote configuration.
Token values are never printed to stdout. The status command shows token presence, expiry, and identity only.
Subcommands
| Subcommand | Description |
|---|---|
status | Show authentication status for a remote |
login | Store a bearer token for a remote |
logout | Clear the stored token for a remote |
fluree auth status
Show the current authentication state for a remote, including token presence, expiry time, identity, and issuer.
Usage
fluree auth status [OPTIONS]
Options
| Option | Description |
|---|---|
--remote <NAME> | Remote name (defaults to the only configured remote) |
Examples
# Show auth status (single remote)
fluree auth status
# Show auth status for a specific remote
fluree auth status --remote origin
Output
When a token is configured:
Auth Status:
Remote: origin
Token: configured
Expiry: 2026-02-15 12:00 UTC
Identity: did:example:alice
Issuer: did:key:z6Mk...
Subject: alice@example.com
When no token is configured:
Auth Status:
Remote: origin
Token: not configured
hint: fluree auth login --remote origin
fluree auth login
Store a bearer token for a remote. The token is saved in .fluree/config.toml and will be sent as an Authorization: Bearer header on subsequent remote operations (fetch, pull, push, query --remote, etc.).
Usage
fluree auth login [OPTIONS]
Options
| Option | Description |
|---|---|
--remote <NAME> | Remote name (defaults to the only configured remote) |
--token <VALUE> | Token value, @filepath to read from file, or @- for stdin |
If --token is omitted, you will be prompted to paste the token interactively.
Token Input Methods
| Method | Example |
|---|---|
| Inline value | --token eyJhbG... |
| File | --token @/path/to/token.jwt |
| File (tilde) | --token @~/.fluree/token.jwt |
| Stdin | --token @- (pipe or redirect) |
| Interactive | Omit --token to be prompted |
Examples
# Store a token (prompted interactively)
fluree auth login
# Store a token from a value
fluree auth login --token eyJhbGciOiJFZERTQSI...
# Store a token from a file
fluree auth login --token @~/.fluree/my-token.jwt
# Pipe a token from another command
fluree token create --private-key @~/.fluree/key --all | fluree auth login --token @-
# Login to a specific remote
fluree auth login --remote staging --token @token.jwt
Output
Token stored for remote 'origin'
Expiry: 2026-02-15 12:00 UTC
Identity: did:example:alice
fluree auth logout
Clear the stored token for a remote.
Usage
fluree auth logout [OPTIONS]
Options
| Option | Description |
|---|---|
--remote <NAME> | Remote name (defaults to the only configured remote) |
Examples
# Clear token for the default remote
fluree auth logout
# Clear token for a specific remote
fluree auth logout --remote staging
Output
Token cleared for remote 'origin'
Token Types
The auth command stores bearer tokens that are sent in the Authorization header. Fluree supports two types of bearer tokens:
Ed25519 JWS Tokens (did:key)
Created locally with fluree token create. These contain an embedded JWK (JSON Web Key) in the token header and are verified against the embedded public key. The issuer is a did:key identifier derived from the signing key.
# Create and store a token in one step
fluree token create --private-key @~/.fluree/key --all | fluree auth login --token @-
OIDC/JWKS Tokens (RS256)
Issued by external identity providers (OIDC). These contain a kid (Key ID) in the token header and are verified by the server against the provider’s JWKS (JSON Web Key Set) endpoint. The issuer is the provider’s URL.
The server must be configured with --jwks-issuer to trust these tokens. See Configuration.
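A minimal sketch, assuming --jwks-issuer can be supplied through fluree server run's -- pass-through and that the issuer URL is a placeholder; see Configuration for the canonical way to set it:
# Trust RS256 tokens issued by an external provider (URL is an example value)
fluree server run -- --jwks-issuer https://auth.example.com/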
Remote Resolution
When --remote is omitted:
- If exactly one remote is configured, it is used automatically.
- If no remotes are configured, an error is shown with a hint to use fluree remote add.
- If multiple remotes are configured, an error asks you to specify --remote <name>.
Security Notes
- Tokens are stored in plaintext in .fluree/config.toml. Protect this file with appropriate filesystem permissions.
- The status command never displays the raw token value.
- On 401 errors from remote operations, the CLI checks token expiry and suggests fluree auth login if the token appears expired.
OIDC login flow
When a remote is configured with auth.type = "oidc_device" (auto-discovered from the server’s /.well-known/fluree.json), fluree auth login runs an OIDC interactive login flow and then exchanges the IdP token for a Fluree-scoped Bearer token:
- Discovers OIDC endpoints from the configured issuer
- Chooses the flow based on IdP support:
- If the IdP discovery document includes device_authorization_endpoint: use OAuth device-code (prints a URL + code and polls).
- Otherwise, if it includes authorization_endpoint: use OAuth authorization-code + PKCE (opens a browser and receives a localhost callback).
- If the IdP discovery document includes
- Exchanges the IdP token for a Fluree-scoped Bearer token via the server’s
exchange_url - Stores the token (and optional refresh token) in the remote config
Cognito note (Authorization Code + PKCE)
AWS Cognito does not publish device_authorization_endpoint, so the CLI will use authorization-code + PKCE.
Cognito requires the callback URL to be pre-allowlisted (no wildcard ports). Allowlist:
- http://127.0.0.1:8400/callback
- http://127.0.0.1:8401/callback
- http://127.0.0.1:8402/callback
- http://127.0.0.1:8403/callback
- http://127.0.0.1:8404/callback
- http://127.0.0.1:8405/callback
If your app only allowlists one callback URL, configure a fixed port with redirect_port in /.well-known/fluree.json (or set FLUREE_AUTH_PORT locally) and allowlist that single callback URL.
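A minimal sketch of the single-port option mentioned above (the port value is an example):
# Pin the local OIDC callback to one port so only a single URL needs allowlisting
export FLUREE_AUTH_PORT=8400
fluree auth login --remote origin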
On subsequent 401 errors, the CLI automatically attempts a silent refresh using the stored refresh token before prompting for re-login.
See Auth contract (CLI ↔ Server) for the full protocol specification.
See Also
- token - Create and inspect JWS tokens
- remote - Manage remote servers
- Authentication - Auth model, modes, and token claims
- Auth contract (CLI ↔ Server) - Discovery, exchange, and refresh protocol
- Configuration - Server authentication configuration
fluree remote
Manage remote servers for syncing ledgers.
Subcommands
| Subcommand | Description |
|---|---|
add | Add a remote server |
remove | Remove a remote |
list | List all configured remotes |
show | Show details for a remote |
fluree remote add
Add a remote server configuration.
Usage
fluree remote add <NAME> <URL> [OPTIONS]
Arguments
| Argument | Description |
|---|---|
<NAME> | Remote name (e.g., origin) |
<URL> | Server URL (e.g., http://localhost:8090) |
Options
| Option | Description |
|---|---|
--token <TOKEN> | Authentication token (or @filepath to read from file) |
Examples
# Add a remote without authentication
fluree remote add origin http://localhost:8090
# Add a remote with inline token
fluree remote add prod https://api.example.com --token eyJ...
# Add a remote with token from file
fluree remote add staging https://staging.example.com --token @~/.fluree/staging-token
fluree remote remove
Remove a remote configuration.
Usage
fluree remote remove <NAME>
Arguments
| Argument | Description |
|---|---|
<NAME> | Remote name to remove |
Examples
fluree remote remove origin
fluree remote list
List all configured remotes.
Usage
fluree remote list
Output
┌─────────┬─────────────────────────────┬───────┐
│ Name │ URL │ Auth │
├─────────┼─────────────────────────────┼───────┤
│ origin │ http://localhost:8090 │ none │
│ prod │ https://api.example.com │ token │
└─────────┴─────────────────────────────┴───────┘
fluree remote show
Show detailed information about a remote.
Usage
fluree remote show <NAME>
Arguments
| Argument | Description |
|---|---|
<NAME> | Remote name |
Output
Remote:
Name: origin
Type: HTTP
URL: http://localhost:8090
Auth: token configured
See Also
- upstream - Configure upstream tracking
- clone - Clone a ledger from a remote
- fetch - Fetch refs from a remote
- token - Create authentication tokens
fluree upstream
Manage upstream tracking configuration for ledgers.
Upstream configuration links a local ledger to a remote ledger, enabling pull and push operations.
Subcommands
| Subcommand | Description |
|---|---|
set | Set upstream tracking for a ledger |
remove | Remove upstream tracking |
list | List all upstream configurations |
fluree upstream set
Configure a local ledger to track a remote ledger.
Usage
fluree upstream set <LOCAL> <REMOTE> [OPTIONS]
Arguments
| Argument | Description |
|---|---|
<LOCAL> | Local ledger ID (e.g., mydb or mydb:main) |
<REMOTE> | Remote name (e.g., origin) |
Options
| Option | Description |
|---|---|
--remote-alias <ALIAS> | Remote ledger ID (defaults to local ledger ID) |
--auto-pull | Automatically pull on fetch |
Examples
# Track remote ledger with same name
fluree upstream set mydb origin
# Track a differently-named remote ledger
fluree upstream set mydb origin --remote-alias production-db
# Enable auto-pull on fetch
fluree upstream set mydb origin --auto-pull
fluree upstream remove
Remove upstream tracking for a ledger.
Usage
fluree upstream remove <LOCAL>
Arguments
| Argument | Description |
|---|---|
<LOCAL> | Local ledger ID |
Examples
fluree upstream remove mydb
fluree upstream list
List all configured upstream tracking relationships.
Usage
fluree upstream list
Output
┌────────────┬─────────┬────────────────┬───────────┐
│ Local │ Remote │ Remote Alias │ Auto-Pull │
├────────────┼─────────┼────────────────┼───────────┤
│ mydb:main │ origin │ mydb │ no │
│ test:main │ staging │ test-ledger │ yes │
└────────────┴─────────┴────────────────┴───────────┘
See Also
- remote - Configure remote servers
- clone - Clone a ledger from a remote
- pull - Pull from upstream
- push - Push to upstream
fluree fetch
Fetch refs from a remote server (similar to git fetch).
Usage
fluree fetch <REMOTE>
Arguments
| Argument | Description |
|---|---|
<REMOTE> | Remote name (e.g., origin) |
Description
Fetches ledger references from a remote server and updates local tracking data. This does not modify your local ledgers - it only updates what the CLI knows about the remote’s state.
This is a replication operation. It requires a Bearer token with root / storage-proxy permissions (fluree.storage.*). If you only have permissioned/query access to a ledger, you should use fluree track (or --remote) and run queries/transactions against the remote instead.
After fetching, you can use pull to download and apply new commits to your local ledger.
Examples
# Fetch from origin
fluree fetch origin
# Typical workflow
fluree fetch origin
fluree pull mydb
Output
Fetching from 'origin'...
Updated:
mydb -> t=42
testdb -> t=15
Already up to date: 2 ledger(s) unchanged
If no ledgers are found:
Fetching from 'origin'...
No ledgers found on remote.
See Also
- remote - Configure remote servers
- clone - Clone a ledger from a remote
- pull - Pull commits from upstream
- push - Push to upstream
fluree pull
Pull commits from upstream and apply them to the local ledger, similar to git pull.
Usage
fluree pull [OPTIONS] [LEDGER]
Arguments
| Argument | Description |
|---|---|
[LEDGER] | Ledger name (defaults to active ledger) |
Options
| Option | Description |
|---|---|
--no-indexes | Skip pulling binary index data; only transfer new commits and txn blobs (local index may lag until you run fluree reindex) |
Description
Downloads new commits from the configured upstream and applies them to the local ledger:
- Queries the remote for its current head (t and commit ContentId)
- Compares with the local head; exits early if already up to date
- Attempts bulk download of missing commits (and by default index artifacts) via the pack protocol (single streaming request)
- Falls back to paginated JSON export if the server does not support pack
- Stores all commit and transaction blobs to local CAS
- When index data is requested and transferred, advances the local index head to match the remote
- Advances the local commit head to the remote head
Index transfer
As with clone, pull uses the pack protocol to request index artifacts by default when the remote has an index. Use --no-indexes to transfer only new commits and txn blobs. For large estimated transfers (~1 GiB or more), the CLI prompts for confirmation before streaming.
Transport
Pull uses the same pack protocol as clone – see clone: Transport for details.
Origin-based pull
When no upstream remote is configured, pull falls back to origin-based fetching if a LedgerConfig with origins is set on the ledger (see fluree config set-origins). This uses the same pack-first / CID-walk-fallback transport as fluree clone --origin.
This is a replication operation. It requires a Bearer token with root / storage-proxy permissions (fluree.storage.*). If you only have permissioned/query access to a ledger, you should use fluree track (or --remote) and run queries/transactions against the remote instead.
The ledger must have an upstream configured (see fluree upstream set), or a LedgerConfig with origins (see fluree config set-origins).
Restart safety: If interrupted, the local head reflects the last successful import. The next pull resumes from the local head automatically.
Examples
# Pull changes for active ledger
fluree pull
# Pull changes for specific ledger
fluree pull mydb
# Pull commits only (skip index transfer)
fluree pull --no-indexes mydb
Output
Successful pull (with index data when remote has an index):
Pulling 'mydb:main' from 'origin' (local t=10, remote t=42)...
✓ 'mydb:main' pulled 32 commit(s) via pack (new head t=42)
With --no-indexes, only commits (and referenced txn blobs) are transferred; the message does not include index artifact counts.
Already up to date:
✓ 'mydb:main' is already up to date
No upstream configured:
error: no upstream configured for 'mydb:main'
hint: fluree upstream set mydb:main <remote>
Errors
| Error | Description |
|---|---|
| No upstream configured | Run fluree upstream set <ledger> <remote> first, or configure origins via fluree config set-origins |
| Ancestry mismatch | Remote chain does not descend from local head (histories diverged) |
| Import validation failure | Commit chain or retraction invariant violation |
Limitations
- Index head vs commit head: When you use --no-indexes, the local index head is not updated. Queries still work but may replay more novelty; run fluree reindex to bring the index up to the current commit head.
- Graph source indexes not replicated: Graph source snapshots (BM25/vector/geo, etc.) are not replicated by fluree pull yet. Rebuild graph source indexes in the target environment as needed.
See Also
- clone - Clone a ledger from a remote server
- upstream - Configure upstream tracking
- fetch - Fetch refs without modifying local ledger
- push - Push local changes to upstream
fluree push
Push local ledger changes to upstream remote, similar to git push.
Usage
fluree push [LEDGER]
Arguments
| Argument | Description |
|---|---|
[LEDGER] | Ledger name (defaults to active ledger) |
Description
Pushes local commits to the configured upstream remote by uploading the commit v2 bytes to the server.
The ledger must have an upstream configured (see fluree upstream set).
The push uses strict sequencing + CAS semantics:
- The server rejects the push if the remote head is not in your local history (diverged) or if the remote is ahead.
- The server also rejects the push if the first commit’s t does not match the server’s next-t.
Unlike fetch/pull, this is not a storage-proxy replication operation. It requires write permissions for the ledger (Bearer token with fluree.ledger.write.* claims) and the server validates the pushed commits like normal transactions.
If a pushed commit contains retractions, the server enforces a strict invariant: each retraction must target a fact that is currently asserted at that point in the push batch. (List retractions require exact list-index metadata match.)
Examples
# Push active ledger
fluree push
# Push specific ledger
fluree push mydb
Output
Successful push:
Pushing 'mydb:main' to 'origin'...
✓ 'mydb:main' pushed 3 commit(s) (new head t=42)
Push rejected (remote is ahead):
Pushing 'mydb:main' to 'origin'...
error: push rejected; remote is ahead (local t=10, remote t=42). Pull first.
No upstream configured:
error: no upstream configured for 'mydb:main'
hint: fluree upstream set mydb:main <remote>
Errors
| Error | Description |
|---|---|
| No upstream configured | Run fluree upstream set <ledger> <remote> first |
| Push rejected (409) | Remote head changed, histories diverged, or first commit t does not match next-t |
| Push rejected (422) | Invalid commit bytes, missing required referenced blob, or retraction invariant violation |
Workflow
Typical sync workflow:
# Configure remote and upstream (one time)
fluree remote add origin https://api.example.com --token @~/.fluree/token
fluree upstream set mydb origin
# Daily workflow
fluree pull mydb # Get latest changes
# ... make local changes ...
fluree push mydb # Push your changes
See Also
- clone - Clone a ledger from a remote
- upstream - Configure upstream tracking
- pull - Pull changes from upstream
- fetch - Fetch refs without modifying local ledger
fluree publish
Publish a local ledger to a remote server. Creates the ledger on the remote if it doesn’t exist, pushes all local commits, and configures upstream tracking for subsequent push/pull.
Usage
fluree publish <REMOTE> [LEDGER] [OPTIONS]
Arguments
| Argument | Description |
|---|---|
<REMOTE> | Remote name (e.g., “origin”) |
[LEDGER] | Ledger name (defaults to active ledger) |
Options
| Option | Description |
|---|---|
--remote-name <NAME> | Remote ledger name (defaults to local ledger name) |
Description
fluree publish is the reverse of fluree clone. It takes a locally-created ledger and pushes it to a remote server in a single operation:
- Checks if the ledger exists on the remote (GET /exists)
- Creates it if not (POST /create)
- Pushes all local commits (POST /push)
- Configures upstream tracking so subsequent fluree push and fluree pull work
This is intended for the “create locally, deploy to server” workflow. If the remote ledger already has data (t > 0), the command will fail — use fluree push instead for incremental updates.
Examples
# Publish active ledger to origin
fluree publish origin
# Publish a specific ledger
fluree publish origin mydb
# Publish with a different name on the remote
fluree publish origin mydb --remote-name production-db
# Typical workflow: create locally, develop, then publish
fluree create mydb
fluree insert mydb -e '{"@id": "ex:test", "ex:name": "Test"}'
fluree publish origin mydb
Prerequisites
- A remote must be configured: fluree remote add origin <url>
- The remote must support the Fluree HTTP API (see Server implementation guide)
- A valid auth token if the remote requires authentication: fluree auth login --remote origin
After Publishing
Once published, the ledger has upstream tracking configured. Use standard sync commands:
# Push new local commits to remote
fluree push
# Pull remote changes
fluree pull
See Also
- push - Push incremental commits to upstream
- pull - Pull changes from upstream
- clone - Clone a remote ledger locally (reverse of publish)
- remote - Manage remote server configuration
- upstream - Manage upstream tracking
- export - Export ledger as .flpack for file-based transfer
fluree clone
Clone a ledger from a remote server, similar to git clone.
Usage
# Named-remote clone
fluree clone [OPTIONS] <REMOTE> <LEDGER>
# Origin-based clone (no pre-configured remote)
fluree clone --origin <URI> [--token <TOKEN>] [OPTIONS] <LEDGER>
Arguments
| Argument | Description |
|---|---|
<REMOTE> | Remote name (configured via fluree remote add) |
<LEDGER> | Ledger name on the remote server |
Options
| Option | Description |
|---|---|
--origin <URI> | Bootstrap URI for CID-based clone (replaces <REMOTE>) |
--token <TOKEN> | Auth token for origin server (with --origin only) |
--no-indexes | Skip pulling binary index data; only transfer commits and txn blobs (queries will replay from commits until you run fluree reindex) |
--no-txns | Skip pulling original transaction payloads. Commits still transfer (chain remains valid and verifiable), but the raw JSON-LD / SPARQL requests that produced each commit are not downloaded. Use for read-only clones of large ledgers. See Transaction transfer. |
Description
Downloads all commits from a remote ledger and creates a local copy:
- Verifies the remote ledger exists and has commits
- Creates a local ledger with the same name as on the remote
- Attempts bulk download via the pack protocol (single streaming request)
- Falls back to paginated JSON export if the server does not support pack
- Stores all commit and transaction blobs to local CAS
- By default, also transfers binary index artifacts when the remote has an index (see Index transfer)
- Sets the local commit head (and index head when index data was transferred) to match the remote
- Configures the remote as upstream for future pull/push (named-remote only)
Index transfer
When using the pack protocol, the CLI requests index artifacts by default so the local ledger is query-ready without a full reindex. The server sends missing commit blobs, txn blobs, and binary index artifacts (dictionaries, branches, leaves) in one stream.
- Use --no-indexes to transfer only commits and txn blobs. This reduces transfer size and time; afterward, run fluree reindex to build the index locally if needed.
- For large transfers (estimated size above ~1 GiB), the CLI prompts: “Estimated transfer size: ~X. This may take several minutes. Continue? [Y/n]”. Answer n to abort or to re-request without index data (commits-only).
- If the remote has no index yet (e.g. a fresh ledger), only commits and txns are transferred regardless of the flag.
Transaction transfer
Every commit references an original transaction blob — the raw request (JSON-LD insert/update or SPARQL Update) that produced the commit. By default, fluree clone downloads these so the local ledger has a complete audit trail of the original payloads.
- Use --no-txns to skip transaction blobs entirely. The commit chain is still cloned and remains valid and verifiable; only the original request payloads are missing.
- The materialized ledger state (what queries return) is reconstructable from commits + indexes alone — transactions are not needed for query answering.
- With --no-txns, operations that need the original request payload (e.g., fluree show --flakes for transaction-level inspection, or re-running a transaction against a branch) will fail locally for those transactions. Anything that only reads materialized state is unaffected.
- Combine with --no-indexes for the smallest possible clone (fluree clone --no-indexes --no-txns origin mydb), useful for minimal verification / auditing of the commit chain only.
Transport
The CLI uses the pack protocol (fluree-pack-v1) as the primary transport for clone and pull. Pack transfers all missing CAS objects (commits + txn blobs, and by default index artifacts) in a single streaming HTTP request, avoiding per-object round-trips.
If the remote server does not support the pack endpoint (returns 404, 405, 406, or 501), the CLI automatically falls back to:
- Named-remote mode: paginated JSON export via GET /commits/{ledger} (500 commits per page)
- Origin mode: CID chain walk via GET /storage/objects/{cid} (one round-trip per commit)
This fallback is transparent – no user action is required.
Origin-based clone
The --origin flag enables CID-based clone from a server URL without pre-configuring a named remote:
fluree clone --origin http://localhost:8090 mydb
fluree clone --origin https://api.example.com --token @~/.fluree/token mydb
This mode:
- Fetches the NsRecord from the origin to discover the head commit CID
- Optionally upgrades to a multi-origin fetcher if a LedgerConfig is advertised
- Downloads commits via pack (or CID chain walk as fallback)
- Stores the LedgerConfig locally for future origin-based pull
- Does not configure upstream tracking (use fluree upstream set manually)
This is a replication operation. It requires a Bearer token with root / storage-proxy permissions (fluree.storage.*). If you only have permissioned/query access to a ledger, you should use fluree track (or --remote) and run queries/transactions against the remote instead.
Idempotent CAS writes: If interrupted mid-clone, CAS blob writes are idempotent. Re-running the clone command will re-fetch all pages (duplicate writes are harmless). The local head is only set after all data is downloaded.
Examples
# Clone a ledger from a configured remote
fluree clone origin mydb
# Full workflow: add remote, then clone
fluree remote add production https://api.example.com --token @~/.fluree/token
fluree clone production customers
# Origin-based clone (no remote setup needed)
fluree clone --origin http://localhost:8090 mydb
# Origin-based clone with auth
fluree clone --origin https://api.example.com --token @~/.fluree/token mydb
# Clone without index data (faster; run fluree reindex afterward if needed)
fluree clone --no-indexes origin mydb
# Clone commits + indexes but skip original transaction payloads
fluree clone --no-txns origin mydb
# Smallest possible clone — commits only (no indexes, no transactions)
fluree clone --no-indexes --no-txns origin mydb
Output
Successful clone (via pack, with index data):
Cloning 'mydb:main' from 'origin' (remote t=1042)...
fetched 2084 object(s) via pack
✓ Cloned 'mydb:main' (1042 commits, head t=1042)
→ upstream set to 'origin/mydb:main'
With --no-indexes (commits and txns only), the object count will be lower and the local index head is not set until you run fluree reindex.
Successful clone (fallback to paginated export):
Cloning 'mydb:main' from 'origin' (remote t=1042)...
fetched 500 commits...
fetched 1000 commits...
fetched 1042 commits...
✓ Cloned 'mydb:main' (1042 commits, head t=1042)
→ upstream set to 'origin/mydb:main'
Origin-based clone:
Cloning 'mydb:main' from 'http://localhost:8090' (remote t=50)...
fetched 100 object(s) via pack
✓ Cloned 'mydb:main' (50 commit(s), head t=50)
Remote ledger has no commits:
Remote ledger 'mydb:main' has no commits (t=0), nothing to clone.
Errors
| Error | Description |
|---|---|
| Remote not configured | Run fluree remote add <name> <url> first |
| Ledger not found on remote | Verify the ledger name matches the remote server |
| Auth failure | Token missing or lacks fluree.storage.* permissions |
| Local ledger already exists | Drop the existing ledger first |
Limitations
- Post-clone indexing: If you used --no-indexes, run fluree reindex to build a binary index locally. Without an index, queries replay from commits and can be slow for large ledgers. When index data is transferred by default (no --no-indexes), the local index head is set and no reindex is needed for the core ledger.
- Missing transactions: If you used --no-txns, the original transaction payloads for historical commits are permanently unavailable on the local clone (re-pull will not fetch them unless you explicitly re-clone without the flag). The ledger state remains queryable; only transaction-level inspection and replay are affected.
- Graph source indexes not replicated: Graph source snapshots (BM25/vector/geo, etc.) are not replicated by fluree clone yet. After cloning, rebuild graph source indexes in the target environment as needed.
See Also
- pull - Pull new commits from upstream
- push - Push local commits to upstream
- remote - Configure remote servers
- upstream - Configure upstream tracking
fluree track
Track a remote ledger without storing local data. Tracked ledgers route reads and writes to the configured remote server while keeping a lightweight record locally so you can use short aliases and the active-ledger shortcut.
Usage
fluree track <SUBCOMMAND>
Subcommands
fluree track add
Start tracking a remote ledger under a local alias.
Usage:
fluree track add <LEDGER> [--remote <NAME>] [--remote-alias <NAME>]
Arguments:
| Argument | Description |
|---|---|
<LEDGER> | Local alias for the tracked ledger |
Options:
| Option | Description |
|---|---|
--remote <NAME> | Remote name (e.g., origin). Defaults to the only configured remote if unambiguous. |
--remote-alias <NAME> | Alias on the remote (defaults to the local alias) |
Examples:
# Track a remote ledger using the same name locally
fluree track add production --remote origin
# Use a different local alias
fluree track add prod --remote origin --remote-alias production
fluree track remove
Stop tracking a remote ledger. Local data is not affected (tracked ledgers have none).
Usage:
fluree track remove <LEDGER>
| Argument | Description |
|---|---|
<LEDGER> | Local alias to stop tracking |
fluree track list
List all currently tracked ledgers and the remote each resolves to.
Usage:
fluree track list
fluree track status
Show status of tracked ledger(s) by querying the configured remote for each — commit t, index t, and head IDs.
Usage:
fluree track status [LEDGER]
| Argument | Description |
|---|---|
[LEDGER] | Local alias (shows all tracked ledgers if omitted) |
Examples:
# Status of all tracked ledgers
fluree track status
# Status for a single tracked ledger
fluree track status production
Description
A tracked ledger is a local pointer to a remote ledger. Queries, transactions, and most administrative commands against a tracked alias are transparently forwarded to the remote. This lets you work against a hosted ledger using the same CLI flow as a local ledger — including the active-ledger shortcut (fluree use), without syncing commit/index data to disk.
Use fluree clone instead when you need a full local copy of a remote ledger’s data.
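A minimal sketch of that flow, assuming a remote named origin and a remote ledger named production as in the examples above (the query text is a placeholder):
# Track the hosted ledger, make it the active ledger, then query it
fluree track add production --remote origin
fluree use production
fluree query 'SELECT * WHERE { ?s ?p ?o } LIMIT 10'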
See Also
- remote - Manage named remote servers
- clone - Clone a remote ledger locally (with data)
- use - Switch active ledger
- list - List local and tracked ledgers
server
Manage the Fluree HTTP server from the CLI. The server inherits the same .fluree/ context (config file, storage path) as the CLI — one directory, two modes of interaction.
Subcommands
| Subcommand | Description |
|---|---|
run | Run the server in the foreground (Ctrl-C to stop) |
start | Start the server as a background process |
stop | Stop a backgrounded server |
status | Show server status (PID, address, health) |
restart | Stop and restart a backgrounded server |
logs | View server logs |
Common Options
These options are available on run, start, and restart:
| Option | Description |
|---|---|
--listen-addr <ADDR> | Listen address (e.g., 0.0.0.0:8090) |
--storage-path <PATH> | Storage path override (local file storage) |
--connection-config <FILE> | JSON-LD connection config for S3, DynamoDB, etc. |
--log-level <LEVEL> | Log level (trace, debug, info, warn, error) |
--profile <NAME> | Configuration profile to activate |
-- <ARGS>... | Additional server flags (passed through to server config) |
--storage-path and --connection-config are mutually exclusive. Use --storage-path for local file storage or --connection-config for remote backends (S3, DynamoDB, split storage). See Configuration for details.
When no flags are provided, the server discovers its configuration using the same search as the CLI: it walks up from the current working directory looking for a .fluree/config.toml (or config.jsonld), then falls back to the global Fluree config directory ($FLUREE_HOME, or the platform config directory — see Configuration). Server settings live under the [server] section. The CLI’s --config flag is also honored.
run
Run the server in the foreground. Logs go to stderr. Press Ctrl-C for graceful shutdown.
# Start with defaults from config.toml
fluree server run
# Override listen address
fluree server run --listen-addr 127.0.0.1:9090
# S3 + DynamoDB backend
fluree server run --connection-config /etc/fluree/connection.jsonld
# Pass through advanced server flags
fluree server run -- --cors-enabled --indexing-enabled
start
Start the server as a background daemon. Writes PID and metadata to .fluree/ and redirects output to .fluree/server.log.
# Start in background
fluree server start
# Preview resolved config without starting
fluree server start --dry-run
# Start with overrides
fluree server start --listen-addr 0.0.0.0:8090 --log-level debug
The --dry-run flag prints the fully resolved configuration (config file + env + flag overrides merged) without actually starting the server. Useful for debugging “why is it using port X?”.
stop
Stop a backgrounded server by sending SIGTERM and waiting for graceful shutdown (up to 10 seconds).
fluree server stop
# Force kill after timeout
fluree server stop --force
status
Check whether the server is running. Shows PID, listen address, uptime, storage path, and performs an HTTP health check.
fluree server status
Example output:
ok: Server is running
pid: 12345
listen_addr: 0.0.0.0:8090
storage_path: /path/to/.fluree/storage
started_at: 2026-02-16T10:30:00Z
uptime: 2h 15m 30s
health: ok
log: /path/to/.fluree/server.log
When using --connection-config, the status shows the connection config path instead of the storage path:
ok: Server is running
pid: 12345
listen_addr: 0.0.0.0:8090
connection: /etc/fluree/connection.jsonld
started_at: 2026-02-16T10:30:00Z
uptime: 2h 15m 30s
health: ok
restart
Stop and restart a backgrounded server. Recovers the original arguments from .fluree/server.meta.json. New flag overrides can be applied on restart.
fluree server restart
# Restart with a different log level
fluree server restart --log-level debug
logs
View server log output from .fluree/server.log.
# Last 50 lines (default)
fluree server logs
# Last 100 lines
fluree server logs -n 100
# Follow (like tail -f)
fluree server logs -f
Auto-Routing
When a local server is running (started via fluree server start), CLI commands that support remote execution are automatically routed through the server’s HTTP API. This applies to:
- fluree query
- fluree insert
- fluree upsert
- fluree list
- fluree info
The CLI detects the running server by checking .fluree/server.meta.json and verifying the PID is alive. When auto-routing is active, you’ll see a hint on stderr:
server: routing through local server at 0.0.0.0:8090 (use --direct to bypass)
Opting out
Use the --direct global flag to bypass auto-routing and execute directly via the CLI’s file-based path:
# Route through server (default when server is running)
fluree query 'SELECT * WHERE { ?s ?p ?o } LIMIT 10'
# Bypass server, execute directly
fluree query --direct 'SELECT * WHERE { ?s ?p ?o } LIMIT 10'
Crash detection
If the server has crashed or been killed, the CLI detects the stale PID and falls back to direct execution with a notice:
notice: local server (pid 12345) is no longer running; executing directly
Use fluree server status to check server health, or fluree server logs to view crash output.
Runtime Files
When a background server is running, these files are created in the .fluree/ data directory:
| File | Description |
|---|---|
server.pid | PID of the background server process |
server.log | stdout + stderr from the background server |
server.meta.json | Metadata for restart and status (PID, address, args, start time) |
These files are cleaned up automatically by fluree server stop.
Configuration
The server uses the same config file as the CLI (discovered via walk-up or global fallback — see above). Server-specific settings live under the [server] section:
[server]
listen_addr = "0.0.0.0:8090"
storage_path = "/var/lib/fluree"
log_level = "info"
cors_enabled = true
# cache_max_mb = 4096 # global cache budget (MB); default: tiered fraction of RAM (30% <4GB, 40% 4-8GB, 50% ≥8GB)
[server.indexing]
enabled = true
reindex_min_bytes = 100_000
# reindex_max_bytes defaults to 20% of system RAM; uncomment to override
# reindex_max_bytes = 536_870_912 # 512 MB
For S3/DynamoDB backends, use connection_config instead of storage_path:
[server]
connection_config = "/etc/fluree/connection.jsonld"
cache_max_mb = 4096
[server.indexing]
enabled = true
Indexing settings live under the [server.indexing] subsection, not directly on [server]. Authentication settings similarly use [server.auth.events], [server.auth.data], etc.
See Configuration for the full list of server options.
Feature Flags
The server subcommand requires the server Cargo feature (enabled by default). If compiled without it:
fluree server run
# error: server support not compiled. Rebuild with `--features server`.
For S3/DynamoDB support via --connection-config, the aws feature must be enabled:
cargo build -p fluree-db-cli --features aws
Without this feature, S3 storage configs in the connection config will produce a clear error at startup.
fluree memory
Developer memory — store and recall facts, decisions, and constraints.
This page is the CLI command reference. For conceptual background, IDE setup, team workflows, and the full schema, see the Memory section of the docs.
Usage
fluree memory <COMMAND>
Subcommands
| Command | Description |
|---|---|
init | Initialize the memory store (creates __memory ledger) |
add | Store a new memory |
recall | Search and rank relevant memories |
update <ID> | Update a memory in place |
forget <ID> | Delete a memory |
status | Show memory store status |
export | Export all current memories as JSON |
import <FILE> | Import memories from a JSON file |
mcp-install | Install MCP configuration for an IDE |
Description
The memory system stores project knowledge as RDF triples in a dedicated __memory Fluree ledger. Memories persist across sessions and are searchable by keyword-scored recall.
Run fluree memory init before using other memory commands. The MCP server auto-initializes on first tool call.
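A minimal first-session sketch combining the subcommands documented below (the memory text and tags are placeholder values):
# Initialize the store, record one fact, then recall it
fluree memory init --no-mcp
fluree memory add --kind fact --text "Tests use cargo nextest" --tags testing
fluree memory recall "how to run tests"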
fluree memory init
Initialize the memory store and optionally configure MCP for detected AI coding tools. Idempotent — safe to run multiple times.
fluree memory init [OPTIONS]
Options
| Option | Description |
|---|---|
--yes, -y | Auto-confirm all MCP installations (non-interactive) |
--no-mcp | Skip AI tool detection and MCP configuration entirely |
What init does
- Creates the __memory ledger and transacts the memory schema.
- Creates .fluree-memory/ at the project root with repo.ttl, .gitignore, and .local/user.ttl.
- Migrates existing memories — if the ledger already has memories (e.g., from a pre-TTL version), they are exported to the appropriate .ttl files.
- Detects AI coding tools (Claude Code, Cursor, VS Code, Windsurf, Zed) and offers to install MCP config for each.
Example
$ fluree memory init
Memory store initialized at /path/to/project/.fluree-memory
Repo memories are stored in .fluree-memory/repo.ttl (git-tracked).
Commit this directory to share project knowledge with your team.
Detected AI coding tools:
- Claude Code (already configured)
- Cursor
- VS Code (Copilot) (already configured)
Install MCP config for Cursor? [Y/n] Y
Installed: .cursor/mcp.json
Installed: .cursor/rules/fluree_rules.md
Configured 1 tool.
With --yes: auto-confirms all installations without prompting. In a non-interactive shell (piped stdin) without --yes, MCP installation is skipped with a message.
fluree memory add
Store a new memory.
fluree memory add [OPTIONS]
Options
| Option | Description |
|---|---|
--kind <KIND> | Memory kind: fact, decision, constraint (default: fact) |
--text <TEXT> | Content text (or provide via stdin) |
--tags <T1,T2> | Required. Comma-separated tags for categorization — the primary recall signal |
--refs <R1,R2> | Comma-separated file/artifact references |
--severity <SEV> | For constraints: must, should, prefer |
--scope <SCOPE> | Scope: repo (default) or user |
--rationale <TEXT> | Why this memory exists (available on any kind) |
--alternatives <TEXT> | Alternatives considered (comma-separated) |
--format <FMT> | Output format: text (default) or json |
Examples
# Store a fact
fluree memory add --kind fact --text "Tests use cargo nextest" --tags testing,cargo
# Store a constraint with severity
fluree memory add --kind constraint --text "Never suppress dead code with underscore prefix" \
--tags code-style --severity must
# Store from stdin
echo "The index format uses postcard encoding" | fluree memory add --kind fact --tags indexer
# Store a decision with rationale and alternatives
fluree memory add --kind decision --text "Use postcard for compact index encoding" \
--rationale "no_std compatible, smaller output than bincode" \
--alternatives "bincode, CBOR, MessagePack" --refs fluree-db-indexer/
# Store a fact with rationale
fluree memory add --kind fact --text "PSOT queries return supersets — post-filter required" \
--rationale "B-tree range scan can't filter on non-key predicates" --tags query,index
Output (text):
Stored memory: mem:fact-01JDXYZ5A2B3C4D5E6F7G8H9J0
Secret detection
If the content contains secrets (API keys, passwords, tokens, connection strings), they are automatically redacted and a warning is printed:
warning: secrets detected in content — storing redacted version.
Original content contained sensitive data that was replaced with [REDACTED].
Stored memory: mem:fact-01JDXYZ...
fluree memory recall
Search and retrieve relevant memories ranked by score.
fluree memory recall <QUERY> [OPTIONS]
Arguments
| Argument | Description |
|---|---|
<QUERY> | Natural language search query |
Options
| Option | Description |
|---|---|
-n, --limit <N> | Maximum results per page (default: 3) |
--offset <N> | Skip the first N results — use for pagination (default: 0) |
--kind <KIND> | Filter to a specific memory kind |
--tags <T1,T2> | Filter to memories with these tags |
--scope <SCOPE> | Filter by scope: repo or user |
--format <FMT> | Output: text (default), json, or context (XML for LLM) |
Examples
# Basic recall (returns top 3)
fluree memory recall "how to run tests"
# Get the next page
fluree memory recall "how to run tests" --offset 3
# Return up to 10 results
fluree memory recall "error handling" -n 10
# Filter by kind and tags
fluree memory recall "error handling" --kind constraint --tags errors
# Output as XML context (for LLM injection)
fluree memory recall "testing patterns" --format context
Output (text):
Recall: "how to run tests" (2 matches)
1. [score: 13.0] mem:fact-01JDXYZ...
Tests use cargo nextest
Tags: testing, cargo
2. [score: 8.0] mem:fact-01JDABC...
Integration tests use assert_cmd + predicates
Tags: testing
(showing results 1–3; use --offset 3 for more)
Output (context):
<memory-context>
<memory id="mem:fact-01JDXYZ..." kind="fact" score="13.0">
<content>Tests use cargo nextest</content>
<tags>testing, cargo</tags>
</memory>
<pagination shown="1" offset="0" total_in_store="13" />
</memory-context>
When results are cut off, the pagination element includes a hint:
<pagination shown="3" offset="0" limit="3" total_in_store="13">Results 1–3. Use offset=3 to retrieve more.</pagination>
fluree memory update
Update a memory in place. Only the fields you provide are changed — the ID stays the same. History is tracked via git.
fluree memory update <ID> [OPTIONS]
Options
| Option | Description |
|---|---|
--text <TEXT> | New content text |
--tags <T1,T2> | New tags (replaces all existing) |
--refs <R1,R2> | New artifact refs (replaces all existing) |
--format <FMT> | Output: text or json |
Example
fluree memory update mem:fact-01JDXYZ... --text "Tests use cargo nextest with --no-fail-fast"
Output:
Updated: mem:fact-01JDXYZ...
fluree memory forget
Delete a memory by retracting all its triples.
fluree memory forget <ID>
Output:
Forgotten: mem:fact-01JDXYZ...
fluree memory status
Show a summary of the memory store.
fluree memory status
Output:
Memory Store Status
Total memories: 12
Total tags: 25
By kind:
fact: 7
decision: 2
constraint: 3
fluree memory export / import
Export all current (non-superseded) memories as JSON, or import from a file.
fluree memory export > memories.json
fluree memory import memories.json
fluree memory mcp-install
Install MCP configuration for an IDE so agents can use memory tools.
fluree memory mcp-install [--ide <IDE>]
Options
| Option | Description |
|---|---|
--ide <IDE> | Target IDE (auto-detected if omitted) |
Supported IDE values:
| Value | Config written | Notes |
|---|---|---|
claude-code | claude mcp add (local scope → ~/.claude.json) | Also appends to CLAUDE.md |
vscode | .vscode/mcp.json (key: servers) | Also installs .vscode/fluree_rules.md |
cursor | .cursor/mcp.json (key: mcpServers) | Also installs .cursor/rules/fluree_rules.md |
windsurf | ~/.codeium/windsurf/mcp_config.json (global) | — |
zed | .zed/settings.json (key: context_servers) | Skips if JSONC (comments) detected |
Legacy aliases: claude-vscode and github-copilot map to vscode.
When --ide is omitted, the first unconfigured detected tool is used; defaults to claude-code if none detected.
Example
fluree memory mcp-install --ide cursor
Output:
Installed: .cursor/mcp.json
Installed: .cursor/rules/fluree_rules.md
Cursor notes (recommended config)
Cursor’s MCP configuration supports stdio servers with a type field and config interpolation like ${workspaceFolder}. A portable repo-scoped setup looks like:
{
"mcpServers": {
"fluree-memory": {
"type": "stdio",
"command": "fluree",
"args": ["mcp", "serve", "--transport", "stdio"],
"env": {
"FLUREE_HOME": "${workspaceFolder}/.fluree"
}
}
}
}
Setting FLUREE_HOME ensures the MCP server uses the current workspace’s .fluree/ directory even if Cursor spawns the process from a different working directory. That keeps repo memory/logs under <repo>/.fluree-memory/ instead of a global location.
Troubleshooting: repo vs global memory
- Repo-scoped (expected):
  - Memories: <repo>/.fluree-memory/repo.ttl
  - MCP log: <repo>/.fluree-memory/.local/mcp.log (should show client initialized after a full Cursor restart)
- If it’s using global dirs on macOS:
  - Memories/log: ~/Library/Application Support/.fluree-memory/...
  - Fix: ensure your Cursor config sets env.FLUREE_HOME = "${workspaceFolder}/.fluree" and restart Cursor fully.
See Also
- Memory overview — what it is, when to use it, how it fits into your workflow
- Memory getting started — install, quickstart, and per-IDE setup guides
- Memory concepts — repo vs user memory, supersession, recall ranking, secrets
- Memory guides — team workflows, rules-file customization, migrating from plain markdown
- Memory reference — IDE support matrix, mem: schema, TTL file format
- mcp — MCP server for IDE agent integration
fluree mcp
Model Context Protocol (MCP) server for IDE agent integration.
Usage
fluree mcp <COMMAND>
Subcommands
| Command | Description |
|---|---|
serve | Start the MCP server |
fluree mcp serve
Start an MCP server that exposes developer memory tools to IDE agents.
fluree mcp serve [--transport <TRANSPORT>]
Options
| Option | Description |
|---|---|
--transport <TRANSPORT> | Transport protocol: stdio (default) |
The stdio transport reads JSON-RPC requests from stdin and writes responses to stdout. This is the standard transport for IDE integration — the IDE spawns the process and communicates over pipes.
Available tools
The MCP server exposes 6 tools:
| Tool | Description |
|---|---|
memory_add | Store a new memory (fact, decision, constraint, preference, artifact) |
memory_recall | Search and retrieve relevant memories as XML context. Accepts query, limit (default: 3), offset (default: 0), kind, tags, scope. Returns a <pagination> element indicating whether more results are available. |
memory_update | Update (supersede) an existing memory |
memory_forget | Delete a memory |
memory_status | Show memory store summary |
kg_query | Execute raw SPARQL against the memory graph |
The server auto-initializes the memory store on first tool call. No separate fluree memory init is needed.
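For reference, a tools/call request for memory_recall over the stdio transport might look like the sketch below. The argument names mirror the table above; the authoritative input schema is whatever tools/list reports, and the query text is illustrative.
{
"jsonrpc": "2.0",
"id": 3,
"method": "tools/call",
"params": {
"name": "memory_recall",
"arguments": { "query": "database migration conventions", "limit": 3 }
}
}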
IDE configuration
The easiest way to configure your IDE is with fluree memory mcp-install:
fluree memory mcp-install --ide cursor
Or manually add to your IDE’s MCP config:
{
"mcpServers": {
"fluree-memory": {
"type": "stdio",
"command": "/path/to/fluree",
"args": ["mcp", "serve", "--transport", "stdio"],
"env": {
"FLUREE_HOME": "${workspaceFolder}/.fluree"
}
}
}
}
Testing with JSON-RPC
To test the server directly, pipe JSON-RPC to stdin:
printf '%s\n' \
'{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke","version":"0.0"}}}' \
'{"jsonrpc":"2.0","method":"notifications/initialized","params":{}}' \
'{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}' \
| fluree mcp serve --transport stdio
Tracing
CLI tracing is disabled when running fluree mcp serve to avoid any log output on stderr that could interfere with the JSON-RPC protocol.
See Also
- memory — CLI commands for memory management
- Memory: MCP server — what the MCP server exposes and how agents use it
- Memory getting started — per-IDE setup (Claude Code, Cursor, VS Code, Windsurf, Zed)
- Memory IDE support matrix — config paths and supported features per IDE
fluree iceberg
Manage Apache Iceberg table connections.
Subcommands
| Subcommand | Description |
|---|---|
map | Map an Iceberg table as a graph source |
list | List Iceberg-family graph sources (Iceberg and R2RML) |
info | Show details for an Iceberg-family graph source |
drop | Drop an Iceberg-family graph source |
fluree iceberg map
Map an Iceberg table as a queryable graph source.
Usage
fluree iceberg map <NAME> [OPTIONS]
Arguments
| Argument | Description |
|---|---|
<NAME> | Graph source name (e.g., “warehouse-orders”) |
Options
Catalog mode:
| Option | Description |
|---|---|
--mode <MODE> | Catalog mode: rest (default) or direct |
REST catalog mode options:
| Option | Description |
|---|---|
--catalog-uri <URI> | REST catalog URI (required for rest mode) |
--table <ID> | Table identifier in namespace.table format (required if not specified in R2RML mapping) |
--warehouse <NAME> | Warehouse identifier |
--no-vended-credentials | Disable vended credentials (enabled by default) |
Direct S3 mode options:
| Option | Description |
|---|---|
--table-location <URI> | S3 table location (required for direct mode, e.g., s3://bucket/warehouse/ns/table) |
R2RML mapping:
| Option | Description |
|---|---|
--r2rml <PATH> | R2RML mapping file (Turtle format, required). Defines how table rows become RDF triples. Table references come from the mapping’s rr:tableName entries. |
--r2rml-type <TYPE> | Mapping media type (e.g., text/turtle); inferred from extension if omitted |
Authentication:
| Option | Description |
|---|---|
--auth-bearer <TOKEN> | Bearer token for REST catalog authentication |
--oauth2-token-url <URL> | OAuth2 token URL for client credentials auth |
--oauth2-client-id <ID> | OAuth2 client ID |
--oauth2-client-secret <SECRET> | OAuth2 client secret |
S3 overrides:
| Option | Description |
|---|---|
--s3-region <REGION> | S3 region override |
--s3-endpoint <URL> | S3 endpoint override (for MinIO, LocalStack) |
--s3-path-style | Use path-style S3 URLs |
Other:
| Option | Description |
|---|---|
--remote <NAME> | Execute against a remote server (by remote name) |
--branch <NAME> | Branch name (defaults to “main”) |
Description
Maps an Apache Iceberg table as a graph source that can be queried using SPARQL or JSON-LD queries. The table is accessed read-only; Fluree does not modify the Iceberg table.
An R2RML mapping (--r2rml) is required to define how Iceberg table rows are transformed into RDF triples.
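For orientation, a minimal mapping could look like the following sketch; the table, class, and column names are illustrative, not taken from a real catalog.
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.org/> .
<#OrdersMap> a rr:TriplesMap ;
  rr:logicalTable [ rr:tableName "sales.orders" ] ;
  rr:subjectMap [ rr:template "http://example.org/order/{order_id}" ; rr:class ex:Order ] ;
  rr:predicateObjectMap [ rr:predicate ex:total ; rr:objectMap [ rr:column "total" ] ] .
Each rr:TriplesMap produces one set of triples per table row; the TriplesMaps count in the command output reflects how many such maps the mapping file defines.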
Two catalog modes are supported:
- REST mode (default): Connects to an Iceberg REST catalog (e.g., Apache Polaris) to discover table metadata. Supports vended credentials and warehouse selection.
- Direct S3 mode: Reads table metadata directly from S3 by resolving
version-hint.textin the table’smetadata/directory. No catalog server required.
Examples
# REST catalog with R2RML mapping
fluree iceberg map airlines \
--catalog-uri https://polaris.example.com/api/catalog \
--r2rml mappings/airlines.ttl \
--auth-bearer $POLARIS_TOKEN
# REST catalog with explicit table and warehouse
fluree iceberg map warehouse-orders \
--catalog-uri https://polaris.example.com/api/catalog \
--table sales.orders \
--r2rml mappings/orders.ttl \
--auth-bearer $POLARIS_TOKEN \
--warehouse my-warehouse
# Direct S3 (no catalog server)
fluree iceberg map execution-log \
--mode direct \
--table-location s3://my-bucket/warehouse/logs/execution_log \
--r2rml mappings/execution_log.ttl \
--s3-region us-east-1
# OAuth2 authentication
fluree iceberg map orders \
--catalog-uri https://polaris.example.com/api/catalog \
--table sales.orders \
--r2rml mappings/orders.ttl \
--oauth2-token-url https://auth.example.com/token \
--oauth2-client-id my-client \
--oauth2-client-secret $CLIENT_SECRET
# Create the graph source on a remote Fluree server
fluree iceberg map warehouse-orders \
--remote origin \
--catalog-uri https://polaris.example.com/api/catalog \
--table sales.orders \
--r2rml mappings/orders.ttl
Output
Mapped Iceberg table as R2RML graph source 'airlines:main'
Table: openflights.airlines
Catalog: https://polaris.example.com/api/catalog
R2RML: mappings/airlines.ttl
TriplesMaps: 3
Connection: verified
Mapping: validated
After Mapping
Once mapped, the graph source appears in standard commands:
# Listed alongside ledgers
fluree list
# Inspect configuration
fluree info warehouse-orders
# Query via SPARQL GRAPH pattern
fluree query mydb 'SELECT ?id ?total FROM <mydb:main> WHERE { GRAPH <warehouse-orders:main> { ?o ex:id ?id ; ex:total ?total } }'
# Remove the mapping
fluree drop warehouse-orders --force
Feature Flag
Requires the iceberg feature flag. Without it, the command returns:
error: Iceberg support not compiled. Rebuild with `--features iceberg`.
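If you build the CLI from source (see Quickstart: Run the Server), the flag is enabled at build time. A sketch, assuming the same cargo invocation as the quickstart:
cargo build --release -p fluree-db-cli --features iceberg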
See Also
- Iceberg / Parquet - Iceberg integration details
- R2RML - R2RML mapping reference
- list - List ledgers and graph sources
- info - Show graph source details
- drop - Remove a graph source
fluree iceberg list
List Iceberg-family graph sources (Iceberg and R2RML types).
Usage
fluree iceberg list [--remote <NAME>]
Examples
# Local
fluree iceberg list
# Remote
fluree iceberg list --remote origin
fluree iceberg info
Show details for an Iceberg-family graph source.
Usage
fluree iceberg info <NAME> [--remote <NAME>]
Examples
# Local
fluree iceberg info warehouse-orders
# Remote
fluree iceberg info warehouse-orders --remote origin
fluree iceberg drop
Drop an Iceberg-family graph source. This command only targets Iceberg/R2RML graph sources; it does not fall back to dropping ledgers of the same name.
Usage
fluree iceberg drop <NAME> --force [--remote <NAME>]
Examples
# Local
fluree iceberg drop warehouse-orders --force
# Remote
fluree iceberg drop warehouse-orders --force --remote origin
fluree completions
Generate shell completions.
Usage
fluree completions <SHELL>
Arguments
| Argument | Description |
|---|---|
<SHELL> | Shell to generate completions for |
Supported Shells
- bash
- zsh
- fish
- powershell
- elvish
Description
Generates shell completion scripts that enable tab-completion for fluree commands, options, and arguments.
Installation
Bash
# Add to ~/.bashrc
eval "$(fluree completions bash)"
# Or save to a file
fluree completions bash > /etc/bash_completion.d/fluree
Zsh
# Add to ~/.zshrc
eval "$(fluree completions zsh)"
# Or save to completions directory
fluree completions zsh > ~/.zfunc/_fluree
# Then add to ~/.zshrc: fpath=(~/.zfunc $fpath)
Fish
fluree completions fish > ~/.config/fish/completions/fluree.fish
PowerShell
# Add to your PowerShell profile
fluree completions powershell | Out-String | Invoke-Expression
Examples
# Generate bash completions
fluree completions bash
# Generate zsh completions and save
fluree completions zsh > ~/.zfunc/_fluree
Usage After Installation
After installing completions, you can use tab to complete:
fluree <TAB> # Shows all commands
fluree que<TAB> # Completes to "query"
fluree query --<TAB> # Shows available options
Getting Started
Welcome to Fluree! This section will guide you through the essential steps to start using Fluree for your graph database needs.
Quick Navigation
Fluree for SQL Developers
Coming from PostgreSQL, MySQL, or SQL Server? This guide maps SQL concepts to Fluree equivalents, shows the same operations in both languages, and highlights where Fluree gives you capabilities that relational databases don’t have.
Quickstart: Run the Server
Get Fluree up and running in minutes. Learn how to:
- Install and run the Fluree server
- Configure basic settings
- Verify the server is running
- Access the HTTP API
Quickstart: Create a Ledger
Create your first ledger to store data. Learn how to:
- Create a new ledger using the API
- Understand ledger IDs and branching
- Set up initial configuration
- Verify ledger creation
Quickstart: Write Data
Start writing data to your ledger. Learn how to:
- Insert new entities (basic inserts)
- Upsert data (idempotent transactions; predicate-level replacement for supplied predicates)
- Update existing data (WHERE/DELETE/INSERT pattern)
- Understand JSON-LD transaction format
Quickstart: Query Data
Query your data using Fluree’s powerful query languages. Learn how to:
- Write basic JSON-LD queries
- Write basic SPARQL queries
- Filter and select data
- Understand query results
Tutorial: End-to-End
Build a knowledge base that combines Fluree’s differentiating features in one workflow:
- Full-text search with BM25 relevance ranking
- Time travel to compare current and historical state
- Branching to experiment safely
- Policies for role-based access control
Using Fluree as a Rust Library
Embed Fluree directly in your Rust applications. Learn how to:
- Add Fluree as a dependency in Cargo.toml
- Use the Rust API programmatically
- Implement common patterns (insert, query, update)
- Integrate BM25 and vector search
- Handle errors and configuration
- Write tests with Fluree
What is Fluree?
Fluree is a temporal graph database that stores data as RDF triples with built-in support for:
- Time Travel: Query data as it existed at any point in time
- Full-Text Search: Integrated BM25 indexing for powerful text search
- Vector Search: Approximate nearest neighbor (ANN) queries
- Policy Enforcement: Fine-grained, data-level access control
- Verifiable Data: Cryptographically signed transactions
- Graph Sources: Integration with external data sources (Iceberg, R2RML)
Learning Path
For HTTP API users (server-based):
- Bridge the gap: Fluree for SQL Developers if coming from relational databases
- Start with the Server: Run the Server to get Fluree running
- Create Your First Ledger: Create a Ledger to set up your database
- Add Data: Write Data to insert your first entities
- Query Your Data: Query Data to retrieve and explore
- See it all together: End-to-End Tutorial — search, time travel, branching, and policies in one workflow
- Core Concepts: Read Concepts to understand Fluree’s architecture
- Practical Guides: Explore Cookbooks for search, time travel, branching, policies, and SHACL validation
- Deep Dive: Explore Query, Transactions, and Security
- Production Ready: Review Operations for deployment guidance
For Rust developers (embedded library):
- Rust API Guide: Using Fluree as a Rust Library for embedding Fluree in your application
- Core Concepts: Concepts to understand how Fluree works
- Practical Guides: Cookbooks for search, time travel, branching, policies, and validation
- Advanced Queries: Query for complex query patterns
- Transactions: Transactions for data modification patterns
- Production Ready: Operations and Dev Setup
Prerequisites
- Familiarity with JSON format
- HTTP client (curl, Postman, or your programming language’s HTTP library)
- No graph database or RDF experience required — Fluree for SQL Developers bridges the gap from relational databases
Support and Resources
- Documentation: This documentation provides comprehensive coverage
- API Reference: See HTTP API for endpoint details
- Troubleshooting: Check Troubleshooting for common issues
Let’s get started!
Fluree for SQL Developers
If you’ve spent years with PostgreSQL, MySQL, or SQL Server and are encountering a graph database for the first time, this guide bridges the gap. It maps SQL concepts you already know to their Fluree equivalents, shows you the same operations in both languages, and highlights where Fluree gives you capabilities that relational databases simply don’t have.
The mental model shift
In SQL, you design tables with fixed columns, then insert rows. In Fluree, you make statements about things — and those statements can describe anything, with any properties, at any time.
| SQL Concept | Fluree Equivalent | Key Difference |
|---|---|---|
| Database | Ledger | Immutable — every change is preserved |
| Table | Type (via rdf:type) | No fixed schema required; types are just labels |
| Row | Entity (identified by IRI) | An entity can have any properties, not just those in a “table” |
| Column | Predicate (property) | Not tied to a single type; any entity can use any property |
| Foreign key | Reference (IRI link) | Relationships are first-class, bidirectional, and traversable |
| Value | Object (literal or reference) | Typed values (string, integer, date, etc.) |
| Row (one fact) | Flake | A triple + provenance (graph, transaction time, assert/retract) |
| NULL | Absence | Properties simply don’t exist if not set — no nulls |
The flake: Fluree’s atomic unit
Every fact in Fluree is stored as a flake — an extended triple that adds provenance. At its core, a flake is a statement: subject → predicate → object, plus metadata about when it was asserted, which graph it belongs to, and whether it’s an assertion or retraction.
ex:alice schema:name "Alice" (graph: default, t: 1, op: assert)
ex:alice schema:age 30 (graph: default, t: 1, op: assert)
ex:alice schema:knows ex:bob (graph: default, t: 1, op: assert)
Think of it as: “Alice’s name is Alice (added in transaction 1).” The provenance is what makes time travel and immutability possible — every change is a new flake, and retractions are recorded alongside assertions.
In SQL terms, imagine a universal table with columns entity_id, attribute, value, graph, transaction, operation — that can represent any data structure without DDL and preserves complete history.
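As a purely illustrative sketch (this is not how Fluree physically stores anything), that universal table could be declared like this:
CREATE TABLE facts (
  entity_id   TEXT    NOT NULL,  -- subject IRI
  attribute   TEXT    NOT NULL,  -- predicate IRI
  value       TEXT,              -- object: literal or IRI
  graph       TEXT    NOT NULL,  -- named graph
  transaction BIGINT  NOT NULL,  -- t value of the commit
  operation   BOOLEAN NOT NULL   -- true = assert, false = retract
);
In this picture, every UPDATE or DELETE becomes new rows (retractions plus assertions) rather than a destructive write, which is the relational intuition behind time travel.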
Terminology note: In RDF standards, the core unit is called a “triple” (subject-predicate-object). Fluree’s “flake” extends the triple with temporal and provenance metadata. You’ll see both terms in the documentation — “triple” when discussing the RDF data model, “flake” when discussing Fluree’s storage and history.
Side by side: common operations
Creating structure
SQL — Define a table:
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
email VARCHAR(255) UNIQUE,
department VARCHAR(100),
salary DECIMAL(10,2),
manager_id INTEGER REFERENCES employees(id)
);
Fluree — Just insert data:
fluree insert '
@prefix schema: <http://schema.org/> .
@prefix ex: <http://example.org/> .
ex:alice a schema:Person ;
schema:name "Alice Smith" ;
schema:email "alice@example.com" ;
ex:department "Engineering" ;
ex:salary 125000 ;
ex:reportsTo ex:bob .
ex:bob a schema:Person ;
schema:name "Bob Jones" ;
schema:email "bob@example.com" ;
ex:department "Engineering" .
'
There’s no CREATE TABLE. Types and properties emerge from the data itself. You can add new properties to any entity at any time without migrations.
Inserting data
SQL:
INSERT INTO employees (name, email, department, salary)
VALUES ('Carol Davis', 'carol@example.com', 'Marketing', 95000);
Fluree (CLI):
fluree insert '
@prefix schema: <http://schema.org/> .
@prefix ex: <http://example.org/> .
ex:carol a schema:Person ;
schema:name "Carol Davis" ;
schema:email "carol@example.com" ;
ex:department "Marketing" ;
ex:salary 95000 .
'
Fluree (HTTP API):
curl -X POST http://localhost:8090/v1/fluree/insert?ledger=mydb:main \
-H "Content-Type: application/ld+json" \
-d '{
"@context": {
"schema": "http://schema.org/",
"ex": "http://example.org/"
},
"@id": "ex:carol",
"@type": "schema:Person",
"schema:name": "Carol Davis",
"schema:email": "carol@example.com",
"ex:department": "Marketing",
"ex:salary": 95000
}'
Basic queries
SQL:
SELECT name, email FROM employees WHERE department = 'Engineering';
Fluree (SPARQL):
PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>
SELECT ?name ?email
WHERE {
?person a schema:Person ;
schema:name ?name ;
schema:email ?email ;
ex:department "Engineering" .
}
Fluree (JSON-LD Query):
{
"@context": {"schema": "http://schema.org/", "ex": "http://example.org/"},
"select": ["?name", "?email"],
"where": [
{
"@id": "?person", "@type": "schema:Person",
"schema:name": "?name",
"schema:email": "?email",
"ex:department": "Engineering"
}
]
}
Joins
In SQL, joins are explicit operations. In Fluree, relationships are just triples — “joining” is following a link.
SQL — Find employees and their managers:
SELECT e.name AS employee, m.name AS manager
FROM employees e
JOIN employees m ON e.manager_id = m.id;
Fluree (SPARQL):
PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>
SELECT ?employee ?manager
WHERE {
?e schema:name ?employee ;
ex:reportsTo ?m .
?m schema:name ?manager .
}
No JOIN keyword — you just follow the ex:reportsTo link from one entity to another. The database traverses relationships natively.
Multi-hop relationships
This is where graphs shine. “Find everyone in Alice’s reporting chain” requires recursive CTEs in SQL but is natural in a graph.
SQL (recursive CTE):
WITH RECURSIVE chain AS (
SELECT id, name, manager_id FROM employees WHERE name = 'Alice Smith'
UNION ALL
SELECT e.id, e.name, e.manager_id
FROM employees e JOIN chain c ON e.id = c.manager_id
)
SELECT name FROM chain;
Fluree (SPARQL — property path):
PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>
SELECT ?name
WHERE {
ex:alice ex:reportsTo+ ?manager .
?manager schema:name ?name .
}
The + after ex:reportsTo means “follow this relationship one or more times.” No recursion needed.
Aggregation
SQL:
SELECT department, COUNT(*) as count, AVG(salary) as avg_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC;
Fluree (SPARQL):
PREFIX ex: <http://example.org/>
SELECT ?dept (COUNT(?person) AS ?count) (AVG(?salary) AS ?avg_salary)
WHERE {
?person ex:department ?dept ;
ex:salary ?salary .
}
GROUP BY ?dept
ORDER BY DESC(?avg_salary)
Updates
SQL:
UPDATE employees SET salary = 130000 WHERE name = 'Alice Smith';
Fluree (SPARQL UPDATE):
PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>
DELETE { ?person ex:salary ?oldSalary }
INSERT { ?person ex:salary 130000 }
WHERE { ?person schema:name "Alice Smith" ; ex:salary ?oldSalary }
The WHERE finds Alice, DELETE removes the old salary, and INSERT adds the new one. This is atomic.
Fluree (CLI — upsert for simpler cases):
fluree upsert '{
"@context": {"schema": "http://schema.org/", "ex": "http://example.org/"},
"@id": "ex:alice",
"ex:salary": 130000
}'
Upsert replaces the salary value if Alice already exists, or creates the entity if she doesn’t.
Deletes
SQL:
DELETE FROM employees WHERE name = 'Carol Davis';
Fluree (SPARQL UPDATE):
PREFIX schema: <http://schema.org/>
DELETE { ?person ?p ?o }
WHERE { ?person schema:name "Carol Davis" ; ?p ?o }
But here’s the key difference: in SQL, the row is gone. In Fluree, the retraction is recorded — you can still query Carol’s data at any previous point in time.
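For example, after the delete you can still pin a query to an earlier transaction and see Carol’s data (the t value here is illustrative; use whatever t the earlier insert returned):
# Carol is gone from the current state, but visible at the earlier t
fluree query --at 2 'PREFIX schema: <http://schema.org/>
SELECT ?email WHERE { ?person schema:name "Carol Davis" ; schema:email ?email }'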
What SQL can’t do
These features have no relational equivalent:
Time travel
Query data as it existed at any point in the past:
# What was Alice's salary before the raise?
fluree query --at 1 'PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>
SELECT ?salary WHERE {
?person schema:name "Alice Smith" ; ex:salary ?salary .
}'
# Show the full history of salary changes
fluree history 'PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>
SELECT ?salary ?t ?op WHERE {
?person schema:name "Alice Smith" ; ex:salary ?salary .
}'
In SQL, you’d need audit tables, temporal extensions, or trigger-based logging. In Fluree, every change is automatically preserved.
Schema flexibility
Add new properties to any entity without ALTER TABLE:
# Alice now has a phone number — no migration needed
fluree insert '
@prefix schema: <http://schema.org/> .
@prefix ex: <http://example.org/> .
ex:alice schema:telephone "+1-555-0100" .
'
Different entities of the same “type” can have different properties. There’s no fixed set of columns.
Branching
Fork your data to experiment without affecting production:
fluree branch create experiment
fluree use mydb:experiment
# Try risky changes on the branch
fluree update 'PREFIX ex: <http://example.org/>
DELETE { ?p ex:salary ?s }
INSERT { ?p ex:salary 200000 }
WHERE { ?p ex:salary ?s }'
# Main branch is untouched
fluree query --ledger mydb:main 'SELECT ?name ?salary WHERE {
?p <http://schema.org/name> ?name ; <http://example.org/salary> ?salary
}'
Triple-level access control
SQL databases give you table-level or row-level security. Fluree policies control access to individual facts:
{
"@id": "ex:hide-salary",
"f:action": "query",
"f:resource": { "f:predicate": "ex:salary" },
"f:allow": false
}
This hides salary data from everyone unless another policy explicitly grants access. The same query returns different results for different users, automatically.
Integrated full-text search
No need for Elasticsearch or Solr alongside your database:
fluree insert '{
"@context": {"ex": "http://example.org/"},
"@id": "ex:doc1",
"ex:content": {
"@value": "Fluree is a graph database with time travel and integrated search",
"@type": "@fulltext"
}
}'
fluree query '{
"@context": {"ex": "http://example.org/"},
"select": ["?id", "?score"],
"where": [
{"@id": "?id", "ex:content": "?text"},
["bind", "?score", "(fulltext ?text \"graph database search\")"],
["filter", "(> ?score 0)"]
],
"orderBy": [["desc", "?score"]]
}'
Common “but in SQL I would…” questions
“How do I enforce NOT NULL?” Use SHACL shapes to define constraints like required properties, value types, and cardinality.
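As a sketch, a shape that makes schema:name required (and single-valued) for every schema:Person could look like this; the shape name is illustrative:
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix schema: <http://schema.org/> .
@prefix ex: <http://example.org/> .
ex:PersonShape a sh:NodeShape ;
  sh:targetClass schema:Person ;
  sh:property [
    sh:path schema:name ;
    sh:minCount 1 ;   # required: the NOT NULL analogue
    sh:maxCount 1 ;   # at most one value
    sh:datatype xsd:string
  ] .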
“How do I enforce UNIQUE?” Fluree supports unique constraints in the ledger configuration.
“How do I do transactions?” Every Fluree transaction is atomic. Multiple operations in a single request either all succeed or all fail.
“How do I create indexes?” Fluree automatically maintains four indexes (SPOT, POST, OPST, PSOT) that cover all query patterns. You don’t need to create indexes manually.
“How do I paginate?”
Use LIMIT and OFFSET, just like SQL:
SELECT ?name WHERE { ?p schema:name ?name }
ORDER BY ?name LIMIT 20 OFFSET 40
“How do I do subqueries?” SPARQL supports subqueries natively:
SELECT ?name ?avgSalary WHERE {
?person schema:name ?name ; ex:department ?dept .
{ SELECT ?dept (AVG(?s) AS ?avgSalary) WHERE { ?p ex:department ?dept ; ex:salary ?s } GROUP BY ?dept }
}
Next steps
- Quickstart: Write Data — Start writing data with the HTTP API
- SPARQL Reference — Full SPARQL 1.1 query reference
- JSON-LD Query — Fluree’s JSON-native query language
- Concepts — Deeper understanding of Fluree’s architecture
- Time Travel — Full guide to temporal queries
Quickstart: Run the Server
This guide will get the Fluree server running on your machine in minutes.
Installation
Option 1: Shell Installer (macOS / Linux)
curl --proto '=https' --tlsv1.2 -LsSf https://github.com/fluree/db/releases/latest/download/fluree-db-cli-installer.sh | sh
Option 2: Homebrew (macOS / Linux)
brew install fluree/tap/fluree
Option 3: PowerShell (Windows)
Open PowerShell and run:
irm https://github.com/fluree/db/releases/latest/download/fluree-db-cli-installer.ps1 | iex
Then open a new PowerShell session and verify fluree --version. The installer adds %USERPROFILE%\bin to your PATH. The binary is unsigned, so Windows SmartScreen may prompt on first run — click More info → Run anyway.
Option 4: Download Pre-built Binary
Download the latest release for your platform from GitHub Releases:
# Linux (x86_64)
curl -L https://github.com/fluree/db/releases/latest/download/fluree-db-cli-x86_64-unknown-linux-gnu.tar.xz | tar xJ
chmod +x fluree-db-cli-x86_64-unknown-linux-gnu/fluree
# macOS (Apple Silicon)
curl -L https://github.com/fluree/db/releases/latest/download/fluree-db-cli-aarch64-apple-darwin.tar.xz | tar xJ
chmod +x fluree-db-cli-aarch64-apple-darwin/fluree
Option 5: Build from Source
If you have Rust installed:
# Clone the repository
git clone https://github.com/fluree/db.git
cd db
# Build the CLI (includes embedded server)
cargo build --release -p fluree-db-cli
# Binary will be at target/release/fluree
Option 6: Docker
# Pull the image
docker pull fluree/server:latest
# Run the container
docker run -p 8090:8090 fluree/server:latest
For configuration (mounted JSON-LD/TOML config files, env vars, persistent volumes, S3+DynamoDB, query peers, full Compose example), see Running with Docker.
Start the Server
Memory Storage (Development)
Start the server with in-memory storage (data is lost on restart):
fluree server run
You should see output like:
INFO fluree_db_server: Starting Fluree server
INFO fluree_db_server: Storage mode: memory
INFO fluree_db_server: Server listening on 0.0.0.0:8090
File Storage (Persistent)
For persistent storage, specify a storage path:
fluree server run --storage-path /var/lib/fluree
Custom Port
fluree server run --listen-addr 0.0.0.0:9090
Debug Logging
fluree server run --log-level debug
Verify Installation
Check Server Health
curl http://localhost:8090/health
Expected response:
{
"status": "ok",
"version": "4.0.3"
}
Create a Ledger
curl -X POST http://localhost:8090/v1/fluree/create \
-H "Content-Type: application/json" \
-d '{"ledger": "test:main"}'
Insert Data
curl -X POST "http://localhost:8090/v1/fluree/insert" \
-H "Content-Type: application/json" \
-H "fluree-ledger: test:main" \
-d '{
"@context": {"ex": "http://example.org/"},
"@id": "ex:alice",
"ex:name": "Alice"
}'
Query Data
curl -X POST "http://localhost:8090/v1/fluree/query" \
-H "Content-Type: application/json" \
-d '{
"from": "test:main",
"select": {"?s": ["*"]},
"where": [["?s", "ex:name", "?name"]]
}'
Understanding the Server
Endpoints
Default server endpoints:
| Endpoint | Method | Description |
|---|---|---|
/health | GET | Health check |
/v1/fluree/create | POST | Create a ledger |
/v1/fluree/drop | POST | Drop a ledger |
/v1/fluree/query | GET/POST | Execute queries |
/v1/fluree/insert | POST | Insert data |
/v1/fluree/update | POST | Update with WHERE/DELETE/INSERT |
/v1/fluree/events | GET | SSE event stream |
See the API Reference for complete endpoint documentation.
Storage Modes
Memory (default):
- Fast, in-process storage
- Data lost on restart
- Best for development and testing
File (with --storage-path):
- Persistent local file storage
- Data survives restarts
- Best for single-server deployments
Configuration
All options can be set via CLI flags or environment variables:
# CLI flag
fluree server run --storage-path /data --log-level debug
# Environment variables
export FLUREE_STORAGE_PATH=/data
export FLUREE_LOG_LEVEL=debug
fluree server run
See Configuration for all options.
Common Configurations
Development
fluree server run --log-level debug
Production (Single Server)
fluree server run \
--storage-path /var/lib/fluree \
--indexing-enabled \
--events-auth-mode required \
--events-auth-trusted-issuers did:key:z6Mk...
With Background Indexing
fluree server run \
--storage-path /var/lib/fluree \
--indexing-enabled
Docker Deployment
For the full Docker guide — image internals, configuration via env vars vs mounted JSON-LD/TOML config files, persistent volumes, LRU cache and indexing tuning, S3+DynamoDB connection configs, query peers, and a production-ready Compose example — see Running with Docker.
Minimal persistent run:
docker run -d --name fluree \
-p 8090:8090 \
-v fluree-data:/var/lib/fluree \
fluree/server:latest
Troubleshooting
Port Already in Use
# Use a different port
fluree server run --listen-addr 0.0.0.0:9090
Permission Denied (File Storage)
sudo chown -R $USER:$USER /var/lib/fluree
chmod -R 755 /var/lib/fluree
Server Won’t Start
Check logs with debug level:
fluree server run --log-level debug
Connection Refused
Verify the server is running and check the listen address:
# Listen on all interfaces (not just localhost)
fluree server run --listen-addr 0.0.0.0:8090
Next Steps
Now that your server is running:
- Create a Ledger - Set up your first database
- Write Data - Insert your first records
- Query Data - Retrieve and explore your data
For production deployments:
- Configuration - All server options
- Query Peers - Horizontal scaling
- Admin Authentication - Protect admin endpoints
Quickstart: Create a Ledger
Ledgers are Fluree’s fundamental unit of data organization—similar to databases in traditional systems. This guide shows you how to create your first ledger.
Understanding Ledger IDs
Ledgers are identified by ledger IDs with the format ledger-name:branch:
- mydb:main - Primary branch of the “mydb” ledger
- customers:dev - Development branch of the “customers” ledger
- inventory:prod - Production branch
The default branch is main, so mydb is equivalent to mydb:main.
Creating a Ledger
Rust API (Library Usage)
When using Fluree as a Rust library, create ledgers explicitly with create_ledger:
#![allow(unused)]
fn main() {
let fluree = FlureeBuilder::memory().build_memory();
// Create a new ledger (returns LedgerState at t=0)
let ledger = fluree.create_ledger("mydb").await?;
// Now insert data
let result = fluree.graph("mydb:main")
.transact()
.insert(&data)
.commit()
.await?;
}
create_ledger registers the ledger in the nameservice and returns a genesis LedgerState ready for transactions. It returns ApiError::LedgerExists (HTTP 409) if the ledger already exists.
To load an existing ledger, use ledger:
#![allow(unused)]
fn main() {
let ledger = fluree.ledger("mydb:main").await?;
}
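Putting the two together, a create-or-load pattern might look like the sketch below. Only create_ledger, ledger, and the LedgerExists/409 behavior come from the description above; the exact shape of the error variant is illustrative and may differ in your version.
#![allow(unused)]
fn main() {
// Create the ledger if it is new; if it already exists, load it instead.
let ledger = match fluree.create_ledger("mydb").await {
    Ok(new_ledger) => new_ledger, // genesis state at t = 0
    Err(ApiError::LedgerExists { .. }) => fluree.ledger("mydb:main").await?, // variant shape is illustrative
    Err(other) => return Err(other.into()),
};
}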
HTTP API (Server Usage)
Via the HTTP API, create a ledger explicitly with POST /v1/fluree/create, then write data with POST /v1/fluree/insert.
Step 1: Create the Ledger
curl -X POST http://localhost:8090/v1/fluree/create \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb:main"}'
Response:
{
"ledger_id": "mydb:main",
"t": 0,
"tx-id": "fluree:tx:sha256:...",
"commit": {"hash": ""}
}
Step 2: Insert Data
curl -X POST http://localhost:8090/v1/fluree/insert \
-H "Content-Type: application/json" \
-H "fluree-ledger: mydb:main" \
-d '{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"@graph": [
{
"@id": "ex:alice",
"@type": "schema:Person",
"schema:name": "Alice",
"schema:email": "alice@example.org"
}
]
}'
Response:
{
"ledger_id": "mydb:main",
"t": 1,
"tx-id": "fluree:tx:sha256:...",
"commit": {"hash": "bagaybqab..."}
}
The ledger mydb:main now has data!
Verifying Ledger Creation
Check Ledger Exists
curl http://localhost:8090/v1/fluree/exists/mydb:main
Response:
{
"ledger_id": "mydb:main",
"exists": true
}
Query the Ledger
Verify you can query the new ledger:
curl -X POST http://localhost:8090/v1/fluree/query \
-H "Content-Type: application/json" \
-d '{
"@context": {
"schema": "http://schema.org/"
},
"from": "mydb:main",
"select": ["?name"],
"where": [
{ "@id": "?person", "schema:name": "?name" }
]
}'
Response:
[
{ "name": "Alice" }
]
Ledger Naming Best Practices
Descriptive Names
Choose names that clearly indicate purpose:
Good examples:
- customers:main
- inventory:prod
- analytics:warehouse
Bad examples:
- db1:main
- test:main
- data:main
Hierarchical Organization
Use slashes for logical grouping:
tenant/app:main
tenant/app:dev
department/project:feature-x
Branch Naming
Establish consistent branch naming conventions:
mydb:main - Production branch
mydb:dev - Development branch
mydb:staging - Staging branch
mydb:feature-auth - Feature branch
mydb:bugfix-login - Bug fix branch
Working with Branches
Creating a New Branch
Branches are independent ledgers. First create the branch, then transact data into it:
# Create the branch
curl -X POST http://localhost:8090/v1/fluree/create \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb:dev"}'
# Insert data into the branch
curl -X POST http://localhost:8090/v1/fluree/insert?ledger=mydb:dev \
-H "Content-Type: application/json" \
-d '{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"@graph": [
{
"@id": "ex:bob",
"@type": "schema:Person",
"schema:name": "Bob"
}
]
}'
Now you have two independent ledgers:
- mydb:main (with Alice)
- mydb:dev (with Bob)
Understanding Branch Independence
Branches are completely independent—changes in one don’t affect the other:
# Query main branch
curl -X POST http://localhost:8090/v1/fluree/query \
-d '{"from": "mydb:main", "select": ["?name"], "where": [{"@id": "?person", "schema:name": "?name"}]}'
# Returns: [{"name": "Alice"}]
# Query dev branch
curl -X POST http://localhost:8090/v1/fluree/query \
-d '{"from": "mydb:dev", "select": ["?name"], "where": [{"@id": "?person", "schema:name": "?name"}]}'
# Returns: [{"name": "Bob"}]
Ledger Metadata
Each ledger maintains metadata accessible via the nameservice:
- commit_t: Latest transaction time
- index_t: Latest indexed transaction time
- commit_id: ContentId (CID) of the latest commit
- index_id: ContentId (CID) of the latest index
- default_context: Default JSON-LD @context for the ledger
Checking Ledger Status
curl http://localhost:8090/v1/fluree/info/mydb:main
Response:
{
"ledger_id": "mydb:main",
"branch": "main",
"commit_t": 1,
"index_t": 1,
"commit_id": "bafybeig...commitT1",
"index_id": "bafybeig...indexT1",
"created": "2024-01-22T10:30:00.000Z",
"last_updated": "2024-01-22T10:30:05.000Z"
}
Understanding Commit vs Index
- commit_t: Most recent transaction (always up-to-date)
- index_t: Most recent indexed snapshot (may lag behind commits)
- Gap: If commit_t > index_t, there’s a “novelty layer” still being indexed
See Ledgers and Nameservice for details.
Multi-Tenant Scenarios
For multi-tenant applications, use hierarchical naming:
tenant1/app:main
tenant1/app:dev
tenant2/app:main
tenant2/app:dev
Or use separate ledgers per tenant:
tenant1-customers:main
tenant1-orders:main
tenant2-customers:main
tenant2-orders:main
Setting Default Context
A ledger may have a stored default JSON-LD @context that the CLI and HTTP server can auto-inject into queries that omit @context / PREFIX. Two ways to set it:
- At import time: fluree create --from data.ttl captures @prefix declarations from the Turtle source and stores them as the default.
- Explicitly: fluree context set <ledger> <ctx.json>, or PUT /v1/fluree/context/{ledger...} over HTTP.
Regular JSON-LD transactions (insert/update) do not update the default context — only the two paths above do.
// One-time setup via the CLI:
// fluree context set mydb context.json
{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/",
"xsd": "http://www.w3.org/2001/XMLSchema#"
}
}
After this, the CLI (fluree query) and the HTTP server query endpoint will inject the stored context into queries that don’t supply their own @context / PREFIX. Direct fluree-db-api consumers do not get auto-injection — they must opt in via Fluree::db_with_default_context(...) or include @context in each query. See docs/concepts/iri-and-context.md for the full opt-in story.
Common Patterns
Development Workflow
1. Create main branch: mydb:main
2. Create dev branch: mydb:dev
3. Develop and test in dev
4. Copy desired state to main (application logic)
5. Repeat
Feature Branching
1. Create feature branch: mydb:feature-x
2. Develop feature in isolation
3. Test thoroughly
4. Merge to main (via application logic)
5. Optionally retract feature branch
Environment Separation
mydb:dev - Development environment
mydb:staging - Staging environment
mydb:prod - Production environment
Troubleshooting
Ledger Not Found
If you try to query a ledger before it exists:
Error: Ledger not found: mydb:main
Solution: Create the ledger first (POST /v1/fluree/create or fluree create), then write data to it.
Permission Issues (File Storage)
If using file storage, ensure the server has write permissions:
# Check data directory permissions
ls -la /path/to/data
# Fix permissions if needed
sudo chown -R fluree:fluree /path/to/data
chmod -R 755 /path/to/data
AWS Storage Issues
For AWS storage, verify credentials and bucket access:
# Test S3 access
aws s3 ls s3://your-fluree-bucket/
# Test DynamoDB access
aws dynamodb describe-table --table-name fluree-nameservice
Next Steps
Now that you have a ledger:
- Write Data - Learn how to insert, upsert, and update data
- Query Data - Explore your data with queries
- Concepts: Ledgers - Deep dive into ledger architecture
Related Documentation
- Ledgers and Nameservice - Architectural details
- Transactions - Writing data to ledgers
- Storage Modes - Storage backend options
Quickstart: Write Data
This guide shows you how to write data to Fluree using three main patterns: insert, upsert, and update.
Prerequisites
- Fluree server running (see Run the Server)
- A ledger created (see Create a Ledger)
Understanding Fluree Transactions
Fluree stores data as RDF triples (subject-predicate-object). Transactions are submitted as JSON-LD documents that get converted to triples internally.
Basic Transaction Structure
{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"@graph": [
{
"@id": "ex:alice",
"@type": "schema:Person",
"schema:name": "Alice"
}
]
}
This creates triples like:
ex:alice rdf:type schema:Person
ex:alice schema:name "Alice"
Insert: Adding New Data
The simplest operation is inserting new entities.
Insert a Single Entity
curl -X POST http://localhost:8090/v1/fluree/insert?ledger=mydb:main \
-H "Content-Type: application/json" \
-d '{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"@graph": [
{
"@id": "ex:alice",
"@type": "schema:Person",
"schema:name": "Alice",
"schema:email": "alice@example.org",
"schema:age": 30
}
]
}'
Response:
{
"t": 1,
"timestamp": "2024-01-22T10:30:00.000Z",
"commit_id": "bafybeig...commitT1",
"flakes_added": 4,
"flakes_retracted": 0
}
Insert Multiple Entities
curl -X POST http://localhost:8090/v1/fluree/insert?ledger=mydb:main \
-H "Content-Type: application/json" \
-d '{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"@graph": [
{
"@id": "ex:bob",
"@type": "schema:Person",
"schema:name": "Bob",
"schema:email": "bob@example.org"
},
{
"@id": "ex:carol",
"@type": "schema:Person",
"schema:name": "Carol",
"schema:email": "carol@example.org"
}
]
}'
Insert with Relationships
curl -X POST http://localhost:8090/v1/fluree/insert?ledger=mydb:main \
-H "Content-Type: application/json" \
-d '{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"@graph": [
{
"@id": "ex:company-a",
"@type": "schema:Organization",
"schema:name": "Acme Corp"
},
{
"@id": "ex:alice",
"@type": "schema:Person",
"schema:name": "Alice",
"schema:worksFor": {"@id": "ex:company-a"}
}
]
}'
Upsert: Idempotent Transactions
Upsert (update/insert) replaces values for the predicates you supply on an entity. If the entity doesn’t exist, it’s created.
Basic Upsert
Use the dedicated /upsert endpoint:
curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
-H "Content-Type: application/json" \
-d '{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"@graph": [
{
"@id": "ex:alice",
"@type": "schema:Person",
"schema:name": "Alice Smith",
"schema:email": "alice.smith@example.org",
"schema:age": 31
}
]
}'
This replaces existing values for the predicates included in the payload (for ex:alice, those are @type, schema:name, schema:email, schema:age).
Upsert Behavior
First transaction (entity doesn’t exist):
- Creates the entity with all specified properties
Subsequent transactions (entity exists):
- Retracts existing values for the supplied predicates
- Asserts new values for those predicates
- Leaves other predicates unchanged
Use Cases for Upsert
Good for:
- Idempotent transactions (can retry safely)
- Syncing from external systems
- Replacing values for the predicates you supply
- Avoiding duplicate checks
Not good for:
- Conditional/targeted changes (use UPDATE instead)
Update: Targeted Changes (WHERE/DELETE/INSERT)
For targeted changes to existing data, use the UPDATE pattern with WHERE/DELETE/INSERT.
Basic Update
curl -X POST http://localhost:8090/v1/fluree/update?ledger=mydb:main \
-H "Content-Type: application/json" \
-d '{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"where": [
{ "@id": "ex:alice", "schema:age": "?oldAge" }
],
"delete": [
{ "@id": "ex:alice", "schema:age": "?oldAge" }
],
"insert": [
{ "@id": "ex:alice", "schema:age": 32 }
]
}'
This pattern:
- WHERE: Finds matching data
- DELETE: Retracts specific triples
- INSERT: Asserts new triples
Update Multiple Properties
curl -X POST http://localhost:8090/v1/fluree/update?ledger=mydb:main \
-H "Content-Type: application/json" \
-d '{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"where": [
{ "@id": "ex:alice", "schema:name": "?name", "schema:email": "?email" }
],
"delete": [
{ "@id": "ex:alice", "schema:name": "?name", "schema:email": "?email" }
],
"insert": [
{ "@id": "ex:alice", "schema:name": "Alice Johnson", "schema:email": "alice.j@example.org" }
]
}'
Conditional Update
Only update if condition is met:
curl -X POST http://localhost:8090/v1/fluree/update?ledger=mydb:main \
-H "Content-Type: application/json" \
-d '{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"where": [
{ "@id": "ex:alice", "schema:age": "?age" },
{ "@id": "?age", "@type": "xsd:integer" }
],
"delete": [
{ "@id": "ex:alice", "schema:age": "?age" }
],
"insert": [
{ "@id": "ex:alice", "schema:age": { "@value": "32", "@type": "xsd:integer" } }
]
}'
Adding Properties (Not Replacing)
To add a property without removing existing ones, use INSERT only:
curl -X POST http://localhost:8090/v1/fluree/update?ledger=mydb:main \
-H "Content-Type: application/json" \
-d '{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"insert": [
{ "@id": "ex:alice", "schema:telephone": "+1-555-0100" }
]
}'
This adds the telephone property without affecting other properties.
Data Types
Fluree supports various data types through JSON-LD typing:
Strings (Default)
{
"@id": "ex:alice",
"schema:name": "Alice"
}
Numbers
{
"@id": "ex:alice",
"schema:age": 30,
"schema:height": 1.68
}
Booleans
{
"@id": "ex:alice",
"schema:active": true
}
Dates
{
"@id": "ex:alice",
"schema:birthDate": {
"@value": "1994-05-15",
"@type": "xsd:date"
}
}
Timestamps
{
"@id": "ex:alice",
"schema:lastLogin": {
"@value": "2024-01-22T10:30:00Z",
"@type": "xsd:dateTime"
}
}
References (Links to Other Entities)
{
"@id": "ex:alice",
"schema:worksFor": { "@id": "ex:company-a" }
}
Transaction Receipts
Every successful transaction returns a receipt with metadata:
{
"t": 5,
"timestamp": "2024-01-22T10:30:00.000Z",
"commit_id": "bafybeig...commitT5",
"flakes_added": 3,
"flakes_retracted": 2,
"previous_commit_id": "bafybeig...commitT4"
}
Key fields:
- t: Transaction time (monotonically increasing)
- timestamp: ISO 8601 timestamp
- commit_id: Content-addressed identifier (CID) for the commit
- flakes_added: Number of triples added
- flakes_retracted: Number of triples removed
- previous_commit_id: ContentId of the previous commit (present when t > 1)
See Commit Receipts for details.
Error Handling
Transaction Errors
If a transaction fails, you’ll receive an error response:
{
"error": "TransactionError",
"message": "Invalid IRI: not a valid URI",
"code": "INVALID_IRI"
}
Common errors:
- INVALID_IRI: Malformed IRIs
- PARSE_ERROR: Invalid JSON-LD syntax
- TYPE_ERROR: Type mismatch
- CONSTRAINT_VIOLATION: Data constraint violated
Validation
Transactions are validated before being applied:
- JSON-LD syntax must be valid
- IRIs must be well-formed
- Types must be compatible
- References must resolve (optional)
Best Practices
1. Use Appropriate Transaction Pattern
- Insert: New entities, no duplication concerns
- Upsert: Idempotent transactions, predicate-level replacement for supplied predicates
- Update: Targeted changes, preserve other properties
2. Choose Meaningful IRIs
Good:
{"@id": "ex:user-12345"}
{"@id": "ex:product-widget-2024"}
Bad:
{"@id": "ex:1"}
{"@id": "ex:thing"}
3. Use Consistent Namespaces
Define a clear namespace strategy:
{
"@context": {
"app": "https://myapp.com/ns/",
"schema": "http://schema.org/",
"xsd": "http://www.w3.org/2001/XMLSchema#"
}
}
4. Batch Related Changes
Include related entities in a single transaction:
{
"@graph": [
{"@id": "ex:order-123", "ex:customer": {"@id": "ex:alice"}},
{"@id": "ex:order-123", "ex:product": {"@id": "ex:widget"}},
{"@id": "ex:order-123", "ex:quantity": 5}
]
}
5. Use Typed Literals
Be explicit about types for dates, numbers, etc.:
{
"@id": "ex:alice",
"ex:birthDate": {
"@value": "1994-05-15",
"@type": "xsd:date"
}
}
Transaction Size Limits
Be aware of transaction size constraints:
- Recommended: < 1000 triples per transaction
- Maximum: Configurable (default: 10,000 triples)
- Large imports: Use batch processing
See Indexing Side-Effects for performance considerations.
Next Steps
Now that you can write data:
- Query Data - Learn how to retrieve your data
- Transactions Overview - Detailed transaction documentation
- JSON-LD Context - Understanding @context
Related Documentation
- Insert - Detailed insert documentation
- Upsert - Detailed upsert documentation
- Update - Detailed update documentation
- Data Types - Comprehensive type system guide
Quickstart: Query Data
This guide introduces you to querying data in Fluree using both JSON-LD Query and SPARQL.
Prerequisites
- Fluree server running with data (complete previous quickstarts)
- Sample data from Write Data guide
Query Languages
Fluree supports two query languages:
- JSON-LD Query: Fluree’s native JSON-based query language
- SPARQL: W3C standard RDF query language
Both provide access to the same data and features.
JSON-LD Query
Basic SELECT Query
Retrieve all person names:
curl -X POST http://localhost:8090/v1/fluree/query \
-H "Content-Type: application/json" \
-d '{
"@context": {
"schema": "http://schema.org/"
},
"from": "mydb:main",
"select": ["?name"],
"where": [
{ "@id": "?person", "schema:name": "?name" }
]
}'
Response:
[
{ "name": "Alice" },
{ "name": "Bob" },
{ "name": "Carol" }
]
Query Multiple Properties
curl -X POST http://localhost:8090/v1/fluree/query \
-H "Content-Type: application/json" \
-d '{
"@context": {
"schema": "http://schema.org/"
},
"from": "mydb:main",
"select": ["?name", "?email"],
"where": [
{ "@id": "?person", "schema:name": "?name" },
{ "@id": "?person", "schema:email": "?email" }
]
}'
Response:
[
{ "name": "Alice", "email": "alice@example.org" },
{ "name": "Bob", "email": "bob@example.org" },
{ "name": "Carol", "email": "carol@example.org" }
]
Filter Results
Query with a specific filter:
curl -X POST http://localhost:8090/v1/fluree/query \
-H "Content-Type: application/json" \
-d '{
"@context": {
"schema": "http://schema.org/"
},
"from": "mydb:main",
"select": ["?name", "?age"],
"where": [
{ "@id": "?person", "schema:name": "?name" },
{ "@id": "?person", "schema:age": "?age" }
],
"filter": "?age > 25"
}'
Query Specific Entity
Query a specific entity by IRI:
curl -X POST http://localhost:8090/v1/fluree/query \
-H "Content-Type: application/json" \
-d '{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"from": "mydb:main",
"select": ["?name", "?email", "?age"],
"where": [
{ "@id": "ex:alice", "schema:name": "?name" },
{ "@id": "ex:alice", "schema:email": "?email" },
{ "@id": "ex:alice", "schema:age": "?age" }
]
}'
Query with Relationships
Follow links between entities:
curl -X POST http://localhost:8090/v1/fluree/query \
-H "Content-Type: application/json" \
-d '{
"@context": {
"schema": "http://schema.org/"
},
"from": "mydb:main",
"select": ["?personName", "?companyName"],
"where": [
{ "@id": "?person", "schema:name": "?personName" },
{ "@id": "?person", "schema:worksFor": "?company" },
{ "@id": "?company", "schema:name": "?companyName" }
]
}'
Response:
[
{ "personName": "Alice", "companyName": "Acme Corp" }
]
SPARQL
Basic SELECT Query
The same queries in SPARQL syntax:
curl -X POST http://localhost:8090/v1/fluree/query \
-H "Content-Type: application/sparql-query" \
-d '
PREFIX schema: <http://schema.org/>
SELECT ?name
FROM <mydb:main>
WHERE {
?person schema:name ?name .
}
'
Query Multiple Properties
curl -X POST http://localhost:8090/v1/fluree/query \
-H "Content-Type: application/sparql-query" \
-d '
PREFIX schema: <http://schema.org/>
SELECT ?name ?email
FROM <mydb:main>
WHERE {
?person schema:name ?name .
?person schema:email ?email .
}
'
Filter Results
curl -X POST http://localhost:8090/v1/fluree/query \
-H "Content-Type: application/sparql-query" \
-d '
PREFIX schema: <http://schema.org/>
SELECT ?name ?age
FROM <mydb:main>
WHERE {
?person schema:name ?name .
?person schema:age ?age .
FILTER (?age > 25)
}
'
Query with Relationships
curl -X POST http://localhost:8090/v1/fluree/query \
-H "Content-Type: application/sparql-query" \
-d '
PREFIX schema: <http://schema.org/>
SELECT ?personName ?companyName
FROM <mydb:main>
WHERE {
?person schema:name ?personName .
?person schema:worksFor ?company .
?company schema:name ?companyName .
}
'
Time Travel Queries
Query historical data using time specifiers.
Query at Specific Transaction
curl -X POST http://localhost:8090/v1/fluree/query \
-H "Content-Type: application/json" \
-d '{
"@context": {
"schema": "http://schema.org/"
},
"from": "mydb:main@t:1",
"select": ["?name"],
"where": [
{ "@id": "?person", "schema:name": "?name" }
]
}'
This shows data as it existed at transaction 1.
Query at ISO Timestamp
curl -X POST http://localhost:8090/v1/fluree/query \
-H "Content-Type: application/json" \
-d '{
"@context": {
"schema": "http://schema.org/"
},
"from": "mydb:main@iso:2024-01-22T10:00:00Z",
"select": ["?name"],
"where": [
{ "@id": "?person", "schema:name": "?name" }
]
}'
Query at Commit ContentId
curl -X POST http://localhost:8090/v1/fluree/query \
-H "Content-Type: application/json" \
-d '{
"@context": {
"schema": "http://schema.org/"
},
"from": "mydb:main@commit:bafybeig...",
"select": ["?name"],
"where": [
{ "@id": "?person", "schema:name": "?name" }
]
}'
See Time Travel for comprehensive details.
History Queries
Track changes to entities over time by specifying a time range in the from clause.
Entity History
curl -X POST http://localhost:8090/v1/fluree/query \
-H "Content-Type: application/json" \
-d '{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"from": "mydb:main@t:1",
"to": "mydb:main@t:latest",
"select": ["?name", "?age", "?t", "?op"],
"where": [
{ "@id": "ex:alice", "schema:name": { "@value": "?name", "@t": "?t", "@op": "?op" } },
{ "@id": "ex:alice", "schema:age": "?age" }
],
"orderBy": "?t"
}'
The @t annotation binds the transaction time, and @op binds the operation type as a boolean (true = assert, false = retract).
Response shows all changes:
[
["Alice", 30, 1, true],
["Alice", 30, 5, false],
["Alicia", 31, 5, true]
]
Property History
Track changes to a specific property:
curl -X POST http://localhost:8090/v1/fluree/query \
-H "Content-Type: application/json" \
-d '{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"from": "mydb:main@t:1",
"to": "mydb:main@t:latest",
"select": ["?age", "?t", "?op"],
"where": [
{ "@id": "ex:alice", "schema:age": { "@value": "?age", "@t": "?t", "@op": "?op" } }
],
"orderBy": "?t"
}'
Response:
[
[30, 1, true],
[30, 5, false],
[31, 5, true]
]
Aggregations
Count Results
curl -X POST http://localhost:8090/v1/fluree/query \
-H "Content-Type: application/sparql-query" \
-d '
PREFIX schema: <http://schema.org/>
SELECT (COUNT(?person) AS ?count)
FROM <mydb:main>
WHERE {
?person schema:name ?name .
}
'
Average, Min, Max
curl -X POST http://localhost:8090/v1/fluree/query \
-H "Content-Type: application/sparql-query" \
-d '
PREFIX schema: <http://schema.org/>
SELECT (AVG(?age) AS ?avgAge) (MIN(?age) AS ?minAge) (MAX(?age) AS ?maxAge)
FROM <mydb:main>
WHERE {
?person schema:age ?age .
}
'
Limiting Results
JSON-LD Query Limit
{
"@context": {
"schema": "http://schema.org/"
},
"from": "mydb:main",
"select": ["?name"],
"where": [
{ "@id": "?person", "schema:name": "?name" }
],
"limit": 10
}
SPARQL Limit and Offset
PREFIX schema: <http://schema.org/>
SELECT ?name
FROM <mydb:main>
WHERE {
?person schema:name ?name .
}
ORDER BY ?name
LIMIT 10
OFFSET 20
Ordering Results
JSON-LD Query Order
{
"@context": {
"schema": "http://schema.org/"
},
"from": "mydb:main",
"select": ["?name", "?age"],
"where": [
{ "@id": "?person", "schema:name": "?name" },
{ "@id": "?person", "schema:age": "?age" }
],
"orderBy": ["?age"]
}
SPARQL Order
PREFIX schema: <http://schema.org/>
SELECT ?name ?age
FROM <mydb:main>
WHERE {
?person schema:name ?name .
?person schema:age ?age .
}
ORDER BY DESC(?age)
Multi-Ledger Queries
Query across multiple ledgers:
curl -X POST http://localhost:8090/v1/fluree/query \
-H "Content-Type: application/json" \
-d '{
"@context": {
"schema": "http://schema.org/"
},
"from": ["customers:main", "orders:main"],
"select": ["?customerName", "?orderTotal"],
"where": [
{ "@id": "?customer", "schema:name": "?customerName" },
{ "@id": "?order", "schema:customer": "?customer" },
{ "@id": "?order", "schema:totalPrice": "?orderTotal" }
]
}'
See Datasets for comprehensive multi-graph query documentation.
Understanding Query Results
JSON-LD Query Results
Results are returned as an array of objects:
[
{ "name": "Alice", "age": 30 },
{ "name": "Bob", "age": 25 }
]
SPARQL Results
SPARQL returns results in SPARQL JSON format:
{
"head": {
"vars": ["name", "age"]
},
"results": {
"bindings": [
{
"name": { "type": "literal", "value": "Alice" },
"age": { "type": "literal", "value": "30", "datatype": "http://www.w3.org/2001/XMLSchema#integer" }
},
{
"name": { "type": "literal", "value": "Bob" },
"age": { "type": "literal", "value": "25", "datatype": "http://www.w3.org/2001/XMLSchema#integer" }
}
]
}
}
See Output Formats for format details.
Query Performance Tips
1. Use Specific Patterns
More specific patterns are faster:
Good:
{ "@id": "ex:alice", "schema:name": "?name" }
Less efficient:
{ "@id": "?person", "?predicate": "?value" }
2. Filter Early
Apply filters in WHERE clauses when possible:
"where": [
{ "@id": "?person", "schema:age": "?age" }
],
"filter": "?age > 25"
3. Limit Results
Always use LIMIT for large result sets:
"limit": 100
4. Use Indexes
Queries leverage automatic indexes. Structure queries to take advantage:
- Subject-based lookups are fast
- Predicate-based lookups are fast
- Complex graph patterns may be slower
See Explain Plans for query optimization.
Common Query Patterns
Find All Types
SELECT DISTINCT ?type
FROM <mydb:main>
WHERE {
?entity a ?type .
}
Find All Predicates
SELECT DISTINCT ?predicate
FROM <mydb:main>
WHERE {
?subject ?predicate ?object .
}
Inverse Relationships
Find what points to an entity:
SELECT ?source ?predicate
FROM <mydb:main>
WHERE {
?source ?predicate <http://example.org/ns/alice> .
}
Optional Properties
Query with optional values:
PREFIX schema: <http://schema.org/>
SELECT ?name ?email ?phone
FROM <mydb:main>
WHERE {
?person schema:name ?name .
?person schema:email ?email .
OPTIONAL { ?person schema:telephone ?phone }
}
Error Handling
Query Errors
Common query errors:
{
"error": "QueryError",
"message": "Ledger not found: mydb:main",
"code": "LEDGER_NOT_FOUND"
}
{
"error": "ParseError",
"message": "Invalid JSON-LD: unexpected token",
"code": "PARSE_ERROR"
}
Empty Results
Empty result set (not an error):
[]
Next Steps
Now that you can query data:
- Learn Advanced Queries: Explore JSON-LD Query and SPARQL documentation
- Understand Time Travel: Deep dive into Time Travel
- Optimize Queries: Read about Explain Plans
- Multi-Graph Queries: Learn about Datasets
Related Documentation
- JSON-LD Query - Complete JSON-LD query reference
- SPARQL - Complete SPARQL reference
- Output Formats - Result format options
- Time Travel - Historical queries
- Graph Crawl - Graph traversal
Tutorial: Building a Knowledge Base with Fluree
This tutorial walks through a realistic scenario — building a team knowledge base — to show how Fluree’s differentiating features work together. You’ll use time travel, full-text search, branching, and access control in a single workflow.
Time: ~20 minutes
Prerequisites: Fluree installed and running (fluree init && fluree server run)
Step 1: Create the ledger and add data
fluree create knowledge-base
fluree use knowledge-base
Insert some articles and team members:
fluree insert '
@prefix schema: <http://schema.org/> .
@prefix ex: <http://example.org/> .
@prefix f: <https://ns.flur.ee/db#> .
ex:alice a schema:Person ;
schema:name "Alice Chen" ;
ex:role "engineer" ;
ex:team "platform" .
ex:bob a schema:Person ;
schema:name "Bob Martinez" ;
ex:role "engineer" ;
ex:team "platform" .
ex:carol a schema:Person ;
schema:name "Carol White" ;
ex:role "manager" ;
ex:team "platform" .
ex:doc1 a ex:Article ;
schema:name "Deployment Runbook" ;
schema:author ex:alice ;
ex:team "platform" ;
ex:visibility "internal" ;
ex:content "Step 1: Check the monitoring dashboard. Step 2: Run the database migration script. Step 3: Deploy the new container image using the CI pipeline."^^f:fullText .
ex:doc2 a ex:Article ;
schema:name "Onboarding Guide" ;
schema:author ex:bob ;
ex:team "platform" ;
ex:visibility "public" ;
ex:content "Welcome to the platform team. This guide covers setting up your development environment, accessing the database, and deploying your first service."^^f:fullText .
ex:doc3 a ex:Article ;
schema:name "Incident Response Playbook" ;
schema:author ex:carol ;
ex:team "platform" ;
ex:visibility "confidential" ;
ex:content "During a production incident, the on-call engineer should check database health, review recent deployments, and escalate if the service is not recovering within 15 minutes."^^f:fullText .
'
Verify the data is there:
fluree query --format table 'PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>
SELECT ?title ?author_name ?visibility
WHERE {
?doc a ex:Article ;
schema:name ?title ;
schema:author ?author ;
ex:visibility ?visibility .
?author schema:name ?author_name .
}
ORDER BY ?title'
┌─────────────────────────────┬───────────────┬──────────────┐
│ title │ author_name │ visibility │
├─────────────────────────────┼───────────────┼──────────────┤
│ Deployment Runbook │ Alice Chen │ internal │
│ Incident Response Playbook │ Carol White │ confidential │
│ Onboarding Guide │ Bob Martinez │ public │
└─────────────────────────────┴───────────────┴──────────────┘
This is transaction t=1. Remember this — we’ll come back to it.
Step 2: Full-text search
The article content was inserted with the `f:fullText` datatype, so it's automatically indexed for BM25 relevance scoring. Search for articles about deployments:
fluree query '{
"@context": {"schema": "http://schema.org/", "ex": "http://example.org/"},
"select": ["?title", "?score"],
"where": [
{
"@id": "?doc", "@type": "ex:Article",
"ex:content": "?content",
"schema:name": "?title"
},
["bind", "?score", "(fulltext ?content \"database deployment\")"],
["filter", "(> ?score 0)"]
],
"orderBy": [["desc", "?score"]],
"limit": 10
}'
Results are ranked by relevance — the deployment runbook and incident playbook both mention deployments and databases, while the onboarding guide has a weaker match.
You can combine search with graph filters. Find only public articles matching the search:
fluree query '{
"@context": {"schema": "http://schema.org/", "ex": "http://example.org/"},
"select": ["?title", "?score"],
"where": [
{
"@id": "?doc", "@type": "ex:Article",
"ex:content": "?content",
"schema:name": "?title",
"ex:visibility": "public"
},
["bind", "?score", "(fulltext ?content \"database deployment\")"],
["filter", "(> ?score 0)"]
],
"orderBy": [["desc", "?score"]]
}'
Search results participate in standard graph joins and filters — no separate search service needed.
Step 3: Update data and use time travel
Let’s update the deployment runbook with a new version:
fluree update 'PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>
PREFIX f: <https://ns.flur.ee/db#>
DELETE { ex:doc1 ex:content ?old }
INSERT { ex:doc1 ex:content "Step 1: Check the monitoring dashboard and verify all health checks pass. Step 2: Run the database migration script with --dry-run first. Step 3: Deploy the new container image. Step 4: Verify the deployment in staging before promoting to production."^^f:fullText }
WHERE { ex:doc1 ex:content ?old }'
Now query the current version:
fluree query 'PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>
SELECT ?content WHERE { ex:doc1 ex:content ?content }'
And query the original version using time travel:
fluree query --at 1 'PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>
SELECT ?content WHERE { ex:doc1 ex:content ?content }'
The --at 1 flag queries the data as it was after transaction 1 — before the update. Both versions coexist in the same ledger.
You can also see the full change history:
fluree history 'PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>
SELECT ?content ?t ?op WHERE { ex:doc1 ex:content ?content }'
Each result includes ?t (the transaction number) and ?op (whether it was an assertion or retraction). You see the original content retracted and the new content asserted, with exact timestamps.
Use cases this enables:
- Audit trails — Who changed what, when?
- Rollback — See what the data looked like before a bad change
- Compliance — Prove what was known at a specific point in time
- Debugging — Compare current vs. historical state to find when a problem was introduced
Step 4: Branch to experiment safely
Suppose you want to reorganize the knowledge base — maybe split articles into categories, or restructure ownership. You don’t want to affect the production data while experimenting.
Create a branch:
fluree branch create reorganize
fluree use knowledge-base:reorganize
On the branch, add categories and reorganize:
fluree insert '
@prefix schema: <http://schema.org/> .
@prefix ex: <http://example.org/> .
ex:doc1 ex:category "operations" .
ex:doc2 ex:category "onboarding" .
ex:doc3 ex:category "operations" .
'
fluree update 'PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>
DELETE { ex:doc3 ex:visibility "confidential" }
INSERT { ex:doc3 ex:visibility "internal" }
WHERE { ex:doc3 ex:visibility "confidential" }'
Verify the branch has the changes:
fluree query --format table 'PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>
SELECT ?title ?category ?visibility
WHERE {
?doc a ex:Article ;
schema:name ?title ;
ex:category ?category ;
ex:visibility ?visibility .
}
ORDER BY ?title'
The main branch is untouched:
fluree query --ledger knowledge-base:main 'PREFIX ex: <http://example.org/>
PREFIX schema: <http://schema.org/>
SELECT ?title ?visibility
WHERE {
?doc a ex:Article ; schema:name ?title ; ex:visibility ?visibility .
OPTIONAL { ?doc ex:category ?cat }
FILTER(!BOUND(?cat))
}
ORDER BY ?title'
No categories on main — the branch is fully isolated.
When you’re happy with the changes, merge back:
fluree branch merge reorganize
fluree use knowledge-base:main
Now main has the categories and the visibility change. The branch can continue for future experiments or be dropped:
fluree branch drop reorganize
Step 5: Add access control
Now let’s add policies so that different users see different articles based on their role and team.
Insert policies into the ledger:
fluree insert '{
"@context": {
"f": "https://ns.flur.ee/db#",
"ex": "http://example.org/",
"schema": "http://schema.org/"
},
"@graph": [
{
"@id": "ex:policy-public-read",
"@type": "f:Policy",
"f:action": "query",
"f:resource": { "ex:visibility": "public" },
"f:allow": true
},
{
"@id": "ex:policy-team-internal",
"@type": "f:Policy",
"f:subject": "?user",
"f:action": "query",
"f:resource": {
"ex:visibility": "internal",
"ex:team": "?team"
},
"f:condition": [
{ "@id": "?user", "ex:team": "?team" }
],
"f:allow": true
},
{
"@id": "ex:policy-manager-confidential",
"@type": "f:Policy",
"f:subject": "?user",
"f:action": "query",
"f:resource": {
"ex:visibility": "confidential",
"ex:team": "?team"
},
"f:condition": [
{ "@id": "?user", "ex:team": "?team", "ex:role": "manager" }
],
"f:allow": true
}
]
}'
These three policies create a layered access model:
- Public articles — visible to everyone
- Internal articles — visible only to members of the same team
- Confidential articles — visible only to managers on the same team
Query as Alice (engineer, platform team):
fluree query '{
"@context": {"schema": "http://schema.org/", "ex": "http://example.org/"},
"select": ["?title", "?visibility"],
"where": [
{"@id": "?doc", "@type": "ex:Article", "schema:name": "?title", "ex:visibility": "?visibility"}
],
"opts": {"identity": "ex:alice"}
}'
Alice sees the public onboarding guide and the internal deployment runbook, but not the confidential incident playbook.
Query as Carol (manager, platform team):
fluree query '{
"@context": {"schema": "http://schema.org/", "ex": "http://example.org/"},
"select": ["?title", "?visibility"],
"where": [
{"@id": "?doc", "@type": "ex:Article", "schema:name": "?title", "ex:visibility": "?visibility"}
],
"opts": {"identity": "ex:carol"}
}'
Carol sees all three articles, including the confidential one.
The same query, different results, based on who’s asking — enforced by the database, not application code.
Step 6: Combine everything
Now let’s use all features together. Carol (manager) searches for articles about “database” in the knowledge base, with policies applied, and compares what she sees now vs. what existed before the reorganization:
Current state, with policy:
fluree query '{
"@context": {"schema": "http://schema.org/", "ex": "http://example.org/"},
"select": ["?title", "?visibility", "?score"],
"where": [
{
"@id": "?doc", "@type": "ex:Article",
"ex:content": "?content",
"schema:name": "?title",
"ex:visibility": "?visibility"
},
["bind", "?score", "(fulltext ?content \"database\")"],
["filter", "(> ?score 0)"]
],
"orderBy": [["desc", "?score"]],
"opts": {"identity": "ex:carol"}
}'
Historical state (before runbook was updated):
fluree query '{
"@context": {"schema": "http://schema.org/", "ex": "http://example.org/"},
"from": "knowledge-base:main@t:1",
"select": ["?title", "?score"],
"where": [
{
"@id": "?doc", "@type": "ex:Article",
"ex:content": "?content",
"schema:name": "?title"
},
["bind", "?score", "(fulltext ?content \"database\")"],
["filter", "(> ?score 0)"]
],
"orderBy": [["desc", "?score"]]
}'
In a single database, you’ve combined:
- Full-text search — ranked by relevance
- Access control — Carol sees confidential articles, others wouldn’t
- Time travel — compare current vs. historical content
- Branching — experimented with reorganization without risk
What you’ve learned
| Feature | What it gave you |
|---|---|
| Ledger | A single place for all knowledge base data |
| Full-text search | BM25-ranked article discovery, integrated in queries |
| Time travel | Complete audit trail, historical comparison, rollback capability |
| Branching | Safe experimentation without affecting production |
| Policies | Automatic access control based on team and role |
| SPARQL + JSON-LD | Two query languages accessing the same engine |
Next steps
- Search Cookbook — Deeper guide to BM25 and vector search
- Time Travel Cookbook — Practical time-travel patterns
- Branching Cookbook — Branch/merge workflows
- Policies Cookbook — Access control patterns
- SPARQL Reference — Full SPARQL 1.1 reference
- JSON-LD Query — Fluree’s native query language
Using Fluree as a Rust Library
This guide shows how to use Fluree programmatically in your Rust applications by depending on the fluree-db-api crate.
Overview
Fluree can be embedded directly in Rust applications, giving you a powerful graph database without requiring a separate server process. This is ideal for:
- Desktop applications
- Edge computing
- Embedded systems
- Library/framework integration
- Testing and development
Add Dependency
Add Fluree to your Cargo.toml:
[dependencies]
fluree-db-api = { path = "../fluree-db-api" }
tokio = { version = "1", features = ["full"] }
Note: Replace path with version when published to crates.io:
[dependencies]
fluree-db-api = "0.1"
Features
Available feature flags:
- `native` (default) - File storage support
- `credential` (default in server/CLI) - DID/JWS/VerifiableCredential support for signed queries and transactions
- `shacl` (default in server/CLI) - SHACL constraint validation
- `iceberg` (default in server/CLI) - Apache Iceberg/R2RML graph source support
- `aws` - AWS-backed storage support (S3, storage-backed nameservice). Enables `FlureeBuilder::s3()` and S3-based JSON-LD configs.
- `ipfs` - IPFS-backed storage via Kubo HTTP RPC
- `vector` - Embedded vector similarity search (HNSW indexes via usearch)
- `search-remote-client` - Remote search service client (HTTP client for remote BM25 and vector search services)
- `aws-testcontainers` - Opt-in LocalStack-backed S3/DynamoDB tests (auto-start via testcontainers)
- `full` - Convenience bundle: `native`, `credential`, `iceberg`, `shacl`, `ipfs`
Quick Start
Basic Setup
use fluree_db_api::{FlureeBuilder, Result};
#[tokio::main]
async fn main() -> Result<()> {
// Create a memory-backed Fluree instance
let fluree = FlureeBuilder::memory().build_memory();
// Create a new ledger
let ledger = fluree.create_ledger("mydb").await?;
println!("Ledger created at t={}", ledger.t());
Ok(())
}
With File Storage
use fluree_db_api::{FlureeBuilder, Result};
#[tokio::main]
async fn main() -> Result<()> {
// Use file-backed storage for persistence
let fluree = FlureeBuilder::file("./data").build()?;
// Create a new ledger (or load an existing one)
let ledger = fluree.create_ledger("mydb").await?;
// Load an existing ledger by ID (`name:branch`)
let ledger = fluree.ledger("mydb:main").await?;
Ok(())
}
Bulk import (high throughput)
For initial ledger bootstraps (large Turtle or JSON-LD datasets), Fluree exposes a bulk import pipeline as a first-class Rust API:
use fluree_db_api::{FlureeBuilder, Result};
#[tokio::main]
async fn main() -> Result<()> {
let fluree = FlureeBuilder::file("./data").build()?;
// `chunks_dir` can be:
// - a directory containing *.ttl, *.trig, or *.jsonld files (sorted lexicographically), OR
// - a single .ttl or .jsonld file.
// Directories must contain a single format (no mixing Turtle and JSON-LD).
let result = fluree
.create("dblp:main")
.import("./chunks_dir")
.threads(8) // parallel TTL parsing; commits remain serial
.build_index(true) // write an index root and publish it
.publish_every(50) // nameservice checkpoints during long imports (0 disables)
.cleanup(true) // delete tmp import files on success
.execute()
.await?;
println!(
"import complete: t={}, flakes={}, root={:?}",
result.t, result.flake_count, result.root_id
);
// Query normally after import (loads the published V2 root from CAS).
let view = fluree.view("dblp:main").await?;
let qr = fluree
.query(&view, "SELECT * WHERE { ?s ?p ?o } LIMIT 10")
.await?;
println!("rows={}", qr.batches.iter().map(|b| b.len()).sum::<usize>());
Ok(())
}
Temporary files: the bulk import pipeline uses a session-scoped tmp_import/ directory and
removes it only on full success (unless .cleanup(false) is set). On failure, it keeps the
session directory and logs its path for debugging.
With S3 Storage
Requires fluree-db-api feature aws and standard AWS credential/region configuration.
use fluree_db_api::{FlureeBuilder, Result};
#[tokio::main]
async fn main() -> Result<()> {
// LocalStack/MinIO: endpoint is required
let fluree = FlureeBuilder::s3("my-bucket", "http://localhost:4566")
.build_client()
.await?;
let ledger = fluree.create_ledger("mydb").await?;
println!("Ledger created at t={}", ledger.t());
Ok(())
}
S3 Express One Zone note: for directory buckets (--x-s3 suffix), omit s3Endpoint in JSON-LD config and let the SDK handle it.
Connection Configuration (JSON-LD)
For advanced configuration (tiered storage, address identifier routing, DynamoDB nameservice,
environment variable indirection), use FlureeBuilder::from_json_ld() to parse a JSON-LD config
and build from it. The typed builder methods (build(), build_memory(), build_s3()) and
the type-erased build_client() all share the same underlying construction logic.
See also: JSON-LD connection configuration reference.
use fluree_db_api::{FlureeBuilder, Result};
use serde_json::json;
#[tokio::main]
async fn main() -> Result<()> {
let cfg = json!({
"@context": {"@base": "https://ns.flur.ee/config/connection/", "@vocab": "https://ns.flur.ee/system#"},
"@graph": [
{"@id": "s3Index", "@type": "Storage", "s3Bucket": {"envVar": "INDEX_BUCKET"}, "s3Endpoint": {"envVar": "S3_ENDPOINT"}},
{"@id": "conn", "@type": "Connection", "indexStorage": {"@id": "s3Index"}}
]
});
// from_json_ld parses the config into builder settings; build_client() constructs
// a type-erased FlureeClient suitable for runtime-determined backends.
let fluree = FlureeBuilder::from_json_ld(&cfg)?.build_client().await?;
Ok(())
}
Environment variables (ConfigurationValue)
Any string/number config value can be specified directly or via a ConfigurationValue object:
{
"s3Bucket": { "envVar": "FLUREE_S3_BUCKET", "defaultVal": "my-bucket" },
"cacheMaxMb": { "envVar": "FLUREE_CACHE_MAX_MB", "defaultVal": "1024" }
}
Supported JSON-LD fields (Rust)
Connection node:
- `parallelism`
- `cacheMaxMb`
- `indexStorage`, `commitStorage`
- `primaryPublisher` (publisher node)

Storage node:
- File: `filePath`, `AES256Key`
- S3: `s3Bucket`, `s3Prefix`, `s3Endpoint`, `s3ReadTimeoutMs`, `s3WriteTimeoutMs`, `s3ListTimeoutMs`, `s3MaxRetries`, `s3RetryBaseDelayMs`, `s3RetryMaxDelayMs`

Publisher node:
- DynamoDB nameservice: `dynamodbTable`, `dynamodbRegion`, `dynamodbEndpoint`, `dynamodbTimeoutMs`
- Storage-backed nameservice: `storage` (reference to a Storage node)
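As a sketch of how these fields compose (assuming the same node and reference shape as the S3 example above; the path, cache size, and parallelism values are illustrative), a file-backed connection might be configured like this:

```rust
use fluree_db_api::{FlureeBuilder, Result};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<()> {
    // Illustrative only: composes the Connection and Storage fields listed
    // above using the same node shape as the S3 config example.
    let cfg = json!({
        "@context": {"@base": "https://ns.flur.ee/config/connection/", "@vocab": "https://ns.flur.ee/system#"},
        "@graph": [
            {"@id": "fileStore", "@type": "Storage", "filePath": "./data"},
            {
                "@id": "conn",
                "@type": "Connection",
                "indexStorage": {"@id": "fileStore"},
                "commitStorage": {"@id": "fileStore"},
                "cacheMaxMb": 512,
                "parallelism": 4
            }
        ]
    });
    let fluree = FlureeBuilder::from_json_ld(&cfg)?.build_client().await?;
    let _ = fluree;
    Ok(())
}
```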
Core Patterns
The Graph API
The primary API revolves around fluree.graph(graph_ref), which returns a lazy Graph handle.
No I/O occurs until a terminal method (.execute(), .commit(), .load()) is called.
Use graph(...).query() when the target may be a mapped graph source as well as a native ledger. If the query body itself carries "from" / FROM, use query_from(). The lower-level fluree.db(...) + fluree.query(&view, ...) path is for materialized native ledger snapshots, not graph source aliases.
When I/O happens:
- `.execute()` / `.execute_formatted()` / `.execute_tracked()` — loads the graph from storage, then runs the query (each call reloads)
- `.commit()` — loads the cached ledger handle, stages, and commits
- `.stage()` — loads the ledger and stages without committing
- `.load()` — loads the graph once, returning a `GraphSnapshot` for repeated queries without reloading
#![allow(unused)]
fn main() {
// Lazy query — loads graph and executes in one step
let result = fluree.graph("mydb:main")
.query()
.sparql("SELECT ?name WHERE { ?s <http://schema.org/name> ?name }")
.execute()
.await?;
// Lazy transact + commit
let out = fluree.graph("mydb:main")
.transact()
.insert(&data)
.commit()
.await?;
// Materialize for reuse (avoids reloading on each query)
let db = fluree.graph("mydb:main").load().await?;
let r1 = db.query().sparql("SELECT ...").execute().await?;
let r2 = db.query().jsonld(&q).execute().await?;
// Time travel
let result = fluree.graph_at("mydb:main", TimeSpec::AtT(42))
.query()
.jsonld(&q)
.execute()
.await?;
}
Insert Data
use fluree_db_api::{FlureeBuilder, Result};
use serde_json::json;
#[tokio::main]
async fn main() -> Result<()> {
let fluree = FlureeBuilder::memory().build_memory();
let ledger = fluree.create_ledger("mydb").await?;
// Insert JSON-LD data using the Graph API
let data = json!({
"@context": {
"schema": "http://schema.org/",
"ex": "http://example.org/ns/"
},
"@graph": [
{
"@id": "ex:alice",
"@type": "schema:Person",
"schema:name": "Alice",
"schema:email": "alice@example.org",
"schema:age": 30
},
{
"@id": "ex:bob",
"@type": "schema:Person",
"schema:name": "Bob",
"schema:email": "bob@example.org",
"schema:age": 25
}
]
});
let result = fluree.graph("mydb:main")
.transact()
.insert(&data)
.commit()
.await?;
println!("Transaction committed");
Ok(())
}
Query Data with JSON-LD Query
use fluree_db_api::{FlureeBuilder, Result};
use serde_json::json;
#[tokio::main]
async fn main() -> Result<()> {
let fluree = FlureeBuilder::memory().build_memory();
let ledger = fluree.create_ledger("mydb").await?;
// Insert test data first (see Insert Data above)
// ...
// Query with JSON-LD using the lazy Graph API
let query = json!({
"select": ["?name", "?email"],
"where": [
{ "@id": "?person", "@type": "schema:Person" },
{ "@id": "?person", "schema:name": "?name" },
{ "@id": "?person", "schema:email": "?email" },
{ "@id": "?person", "schema:age": "?age" }
],
"filter": "?age > 25"
});
let result = fluree.graph("mydb:main")
.query()
.jsonld(&query)
.execute_formatted()
.await?;
println!("Query results: {}",
serde_json::to_string_pretty(&result)?);
Ok(())
}
Query Data with SPARQL
use fluree_db_api::{FlureeBuilder, Result};
#[tokio::main]
async fn main() -> Result<()> {
let fluree = FlureeBuilder::memory().build_memory();
let ledger = fluree.create_ledger("mydb").await?;
// Insert test data first (see Insert Data above)
// ...
// Query with SPARQL using the lazy Graph API
let sparql = r#"
PREFIX schema: <http://schema.org/>
SELECT ?name ?email
WHERE {
?person a schema:Person .
?person schema:name ?name .
?person schema:email ?email .
?person schema:age ?age .
FILTER (?age > 25)
}
ORDER BY ?name
"#;
let result = fluree.graph("mydb:main")
.query()
.sparql(sparql)
.execute_formatted()
.await?;
println!("Results: {}",
serde_json::to_string_pretty(&result)?);
Ok(())
}
Update Data
use fluree_db_api::{FlureeBuilder, Result};
use serde_json::json;
#[tokio::main]
async fn main() -> Result<()> {
let fluree = FlureeBuilder::memory().build_memory();
let ledger = fluree.create_ledger("mydb").await?;
// Update using WHERE/DELETE/INSERT pattern
let update = json!({
"@context": { "schema": "http://schema.org/" },
"where": [
{ "@id": "?person", "schema:name": "Alice" },
{ "@id": "?person", "schema:age": "?oldAge" }
],
"delete": [
{ "@id": "?person", "schema:age": "?oldAge" }
],
"insert": [
{ "@id": "?person", "schema:age": 31 }
]
});
let result = fluree.graph("mydb:main")
.transact()
.update(&update)
.commit()
.await?;
println!("Updated successfully");
Ok(())
}
SPARQL UPDATE
Use SPARQL UPDATE syntax for transactions:
use fluree_db_api::{
FlureeBuilder, Result,
parse_sparql, lower_sparql_update, NamespaceRegistry, TxnOpts,
SparqlQueryBody,
};
#[tokio::main]
async fn main() -> Result<()> {
let fluree = FlureeBuilder::file("./data").build()?;
// Get a cached ledger handle
let handle = fluree.ledger_cached("mydb:main").await?;
// SPARQL UPDATE string
let sparql = r#"
PREFIX ex: <http://example.org/ns/>
DELETE {
?person ex:age ?oldAge .
}
INSERT {
?person ex:age 31 .
}
WHERE {
?person ex:name "Alice" .
?person ex:age ?oldAge .
}
"#;
// Parse SPARQL
let parse_output = parse_sparql(sparql);
if parse_output.has_errors() {
// Handle parse errors
for diag in parse_output.diagnostics.iter().filter(|d| d.is_error()) {
eprintln!("Parse error: {}", diag.message);
}
return Err(fluree_db_api::ApiError::Internal("SPARQL parse error".into()));
}
let ast = parse_output.ast.unwrap();
// Extract the UPDATE operation
let update_op = match &ast.body {
SparqlQueryBody::Update(op) => op,
_ => return Err(fluree_db_api::ApiError::Internal("Expected SPARQL UPDATE".into())),
};
// Get namespace registry from the ledger
let snapshot = handle.snapshot().await;
let mut ns = NamespaceRegistry::from_db(&snapshot.snapshot);
// Lower SPARQL UPDATE to Txn IR
let txn = lower_sparql_update(update_op, &ast.prologue, &mut ns, TxnOpts::default())?;
// Execute the transaction
let result = fluree.stage(&handle)
.txn(txn)
.execute()
.await?;
println!("SPARQL UPDATE committed at t={}", result.receipt.t);
Ok(())
}
Supported SPARQL UPDATE operations:
- `INSERT DATA` - Insert ground triples
- `DELETE DATA` - Delete specific triples
- `DELETE WHERE` - Delete matching patterns
- `DELETE/INSERT WHERE` - Full update with patterns
See SPARQL UPDATE for syntax details.
Stage and Preview Changes
use fluree_db_api::{FlureeBuilder, Result};
use serde_json::json;
#[tokio::main]
async fn main() -> Result<()> {
let fluree = FlureeBuilder::memory().build_memory();
let ledger = fluree.create_ledger("mydb").await?;
let data = json!({
"@context": {"ex": "http://example.org/ns/"},
"@graph": [{"@id": "ex:alice", "ex:name": "Alice"}]
});
// Stage without committing
let staged = fluree.graph("mydb:main")
.transact()
.insert(&data)
.stage()
.await?;
// Query the staged state to preview changes
let preview_query = json!({
"select": ["?name"],
"where": [{"@id": "ex:alice", "ex:name": "?name"}]
});
let preview = staged.query()
.jsonld(&preview_query)
.execute()
.await?;
println!("Preview: {} rows", preview.row_count());
Ok(())
}
Note: StagedGraph currently supports querying only. Staging on top of a staged transaction and committing from a StagedGraph are not yet supported.
Export Data
Stream ledger data as Turtle, N-Triples, N-Quads, TriG, or JSON-LD using the builder API:
use fluree_db_api::{FlureeBuilder, Result};
use fluree_db_api::export::ExportFormat;
use std::io::BufWriter;
use std::fs::File;
#[tokio::main]
async fn main() -> Result<()> {
let fluree = FlureeBuilder::file("./data").build()?;
// Export as Turtle to a file
let file = File::create("backup.ttl").unwrap();
let mut writer = BufWriter::new(file);
let stats = fluree.export("mydb")
.format(ExportFormat::Turtle)
.write_to(&mut writer)
.await?;
println!("Exported {} triples", stats.triples_written);
// Export as JSON-LD with custom prefixes
let mut buf = Vec::new();
let stats = fluree.export("mydb")
.format(ExportFormat::JsonLd)
.context(&serde_json::json!({"ex": "http://example.org/"}))
.write_to(&mut buf)
.await?;
// Export all graphs as N-Quads (dataset export)
let stats = fluree.export("mydb")
.format(ExportFormat::NQuads)
.all_graphs()
.to_stdout()
.await?;
Ok(())
}
All formats stream directly from the binary SPOT index. Memory usage is O(leaflet size) for line-oriented formats and O(largest subject) for JSON-LD, regardless of dataset size.
Builder methods:
- `.format(ExportFormat)` — output format (default: Turtle)
- `.all_graphs()` — include all named graphs including system graphs (requires TriG or NQuads)
- `.graph("iri")` — export a specific named graph by IRI
- `.as_of(TimeSpec)` — time-travel export (transaction number, ISO-8601 datetime, or commit CID prefix)
- `.context(&json)` — override prefix map (default: ledger’s context from nameservice)
- `.write_to(&mut writer)` — stream to any `Write` sink
- `.to_stdout()` — convenience for stdout output
See also: CLI export for command-line usage.
Materialize for Reuse
When you need to run multiple queries against the same snapshot, materialize a GraphSnapshot once:
use fluree_db_api::{FlureeBuilder, Result};
use serde_json::json;
#[tokio::main]
async fn main() -> Result<()> {
let fluree = FlureeBuilder::memory().build_memory();
// Load once, query many times
let db = fluree.graph("mydb:main").load().await?;
let r1 = db.query()
.sparql("SELECT ?name WHERE { ?s <http://schema.org/name> ?name }")
.execute()
.await?;
let q2 = json!({
"select": ["?email"],
"where": [{"@id": "?s", "schema:email": "?email"}]
});
let r2 = db.query()
.jsonld(&q2)
.execute()
.await?;
// Access the underlying view if needed
let view = db.view();
Ok(())
}
Advanced Usage
Ledger Caching
Ledger caching is enabled by default on all FlureeBuilder constructors. When caching is active, fluree.ledger() returns a cached handle and subsequent calls avoid reloading from storage:
use fluree_db_api::{FlureeBuilder, Result};
#[tokio::main]
async fn main() -> Result<()> {
// Caching is on by default — no extra call needed
let fluree = FlureeBuilder::file("./data").build()?;
// First call loads from storage
let ledger = fluree.ledger("mydb:main").await?;
// Subsequent calls return cached state (fast)
let ledger2 = fluree.ledger("mydb:main").await?;
Ok(())
}
To disable caching (e.g., for a CLI tool that runs once and exits):
#![allow(unused)]
fn main() {
let fluree = FlureeBuilder::file("./data")
.without_ledger_caching()
.build()?;
}
Disconnecting Ledgers
Use disconnect_ledger to release a ledger from the connection cache. This forces a fresh load on the next access:
use fluree_db_api::{FlureeBuilder, Result};
#[tokio::main]
async fn main() -> Result<()> {
let fluree = FlureeBuilder::file("./data").build()?;
// Load and use ledger
let ledger = fluree.ledger("mydb:main").await?;
println!("Ledger at t={}", ledger.t());
// Release cached state
fluree.disconnect_ledger("mydb:main").await;
// Next access will reload from storage
let ledger = fluree.ledger("mydb:main").await?;
Ok(())
}
When to use disconnect_ledger:
- Force fresh load: After external changes to the ledger (e.g., another process wrote data)
- Free memory: Release memory for ledgers you no longer need
- Clean shutdown: Release resources before application exit
- Testing: Reset state between test cases
Note: If caching is disabled (via without_ledger_caching() on builder), disconnect_ledger is a no-op.
Checking Ledger Existence
Use ledger_exists to check if a ledger is registered in the nameservice without loading it:
use fluree_db_api::{FlureeBuilder, Result};
#[tokio::main]
async fn main() -> Result<()> {
let fluree = FlureeBuilder::file("./data").build()?;
// Check if ledger exists (lightweight nameservice lookup)
if fluree.ledger_exists("mydb:main").await? {
// Ledger exists - load it
let ledger = fluree.ledger("mydb:main").await?;
println!("Loaded ledger at t={}", ledger.t());
} else {
// Ledger doesn't exist - create it
let ledger = fluree.create_ledger("mydb").await?;
println!("Created new ledger");
}
Ok(())
}
When to use ledger_exists:
- Conditional create-or-load: Check before deciding whether to create or load
- Validation: Verify ledger IDs exist before operations
- Defensive programming: Avoid `NotFound` errors in application logic
Performance note: This is a lightweight check that only queries the nameservice - it does NOT load the ledger data, indexes, or novelty. Much faster than attempting to load and catching NotFound errors.
Dropping Ledgers
Use drop_ledger to permanently remove a ledger:
use fluree_db_api::{FlureeBuilder, DropMode, DropStatus, Result};
#[tokio::main]
async fn main() -> Result<()> {
let fluree = FlureeBuilder::file("./data").build()?;
// Soft drop: retract from nameservice, preserve files
let report = fluree.drop_ledger("mydb:main", DropMode::Soft).await?;
match report.status {
DropStatus::Dropped => println!("Ledger dropped"),
DropStatus::AlreadyRetracted => println!("Already dropped"),
DropStatus::NotFound => println!("Ledger not found"),
}
// Hard drop: delete all files (IRREVERSIBLE)
let report = fluree.drop_ledger("mydb:main", DropMode::Hard).await?;
println!("Deleted {} commit files, {} index files",
report.commit_files_deleted,
report.index_files_deleted);
Ok(())
}
Drop Modes:
| Mode | Behavior | Reversible |
|---|---|---|
| `DropMode::Soft` (default) | Retracts from nameservice only, files remain | Yes |
| `DropMode::Hard` | Retracts + deletes all storage artifacts | No |
Drop Sequence:
- Normalizes the ledger ID (ensures `:main` suffix)
- Cancels any pending background indexing
- Waits for in-progress indexing to complete
- In hard mode: deletes all commit and index files
- Retracts from nameservice
- Disconnects from ledger cache (if caching enabled)
When to use drop_ledger:
- Cleanup: Remove test ledgers or unused data
- Data lifecycle: Permanently delete ledgers that are no longer needed
- Admin operations: Clean up after migrations or failures
Idempotency:
Safe to call multiple times:
- Returns `DropStatus::AlreadyRetracted` if previously dropped
- Hard mode still attempts deletion for `NotFound`/`AlreadyRetracted` (useful for admin cleanup)
Warnings:
The DropReport includes a warnings field for any non-fatal errors encountered during the operation (e.g., failed to delete a specific file). Always check this for hard drops:
#![allow(unused)]
fn main() {
let report = fluree.drop_ledger("mydb:main", DropMode::Hard).await?;
if !report.warnings.is_empty() {
for warning in &report.warnings {
eprintln!("Warning: {}", warning);
}
}
}
Refreshing Cached Ledgers
Use refresh to poll-check whether a cached ledger is stale and update it if needed.
refresh returns a RefreshResult containing the ledger’s t after the operation
and what action was taken:
use fluree_db_api::{FlureeBuilder, NotifyResult, RefreshOpts, Result};
#[tokio::main]
async fn main() -> Result<()> {
let fluree = FlureeBuilder::file("./data").build()?;
// Load ledger into cache
let _ledger = fluree.ledger_cached("mydb:main").await?;
// Later, check if the cached state is still fresh
match fluree.refresh("mydb:main", Default::default()).await? {
Some(r) => {
println!("Ledger at t={}, action: {:?}", r.t, r.action);
match r.action {
NotifyResult::Current => println!("Already up to date"),
NotifyResult::Reloaded => println!("Reloaded from storage"),
NotifyResult::IndexUpdated => println!("Index was updated"),
NotifyResult::CommitsApplied { count } => {
println!("{count} commits applied incrementally");
}
NotifyResult::NotLoaded => println!("Not in cache"),
}
}
None => println!("Ledger not found in nameservice"),
}
Ok(())
}
Key behaviors:
- Does NOT cold-load: If the ledger isn’t already cached, returns `NotLoaded` (no-op)
- Returns `None`: If the ledger doesn’t exist in the nameservice
- Alias resolution: Supports short aliases (`mydb` resolves to `mydb:main`)
- No-op without caching: If caching is disabled, returns `NotLoaded`
- Returns `t`: The `RefreshResult.t` field always tells you the ledger’s current transaction time
When to use refresh:
- Poll-based freshness: When you can’t use SSE events but need periodic freshness checks
- Before critical reads: Ensure you have the latest state before important queries
- Peer mode: Check if the local cache is behind the transaction server
refresh vs disconnect_ledger:
| Behavior | refresh | disconnect_ledger |
|---|---|---|
| Checks freshness | Yes | No |
| Updates in place | Yes | No (forces full reload on next access) |
| Handles not-cached | Returns NotLoaded | No-op |
| Use case | Poll-based updates | Force full reload |
Read-After-Write Consistency
Fluree’s query engine is eventually consistent: when one process writes data and
another (or the same process on a warm cache) queries it, the query may not yet see
the latest commit. The t value returned from a transaction is the key to bridging
this gap.
Pass RefreshOpts { min_t: Some(t) } to refresh() to assert that the cached
ledger has reached at least that transaction time. If it hasn’t after pulling the
latest state from the nameservice, refresh returns ApiError::AwaitTNotReached
with both the requested and current t values. Your code owns retry timing
and timeout policy.
Basic usage:
use fluree_db_api::{FlureeBuilder, RefreshOpts, ApiError, Result};
use serde_json::json;
#[tokio::main]
async fn main() -> Result<()> {
let fluree = FlureeBuilder::file("./data").build()?;
let handle = fluree.ledger_cached("mydb:main").await?;
// Transaction returns the commit's t value
let receipt = fluree.stage(&handle)
.insert(&json!({"@id": "ex:item", "ex:count": 42}))
.commit()
.await?;
let committed_t = receipt.t;
// Ensure the cache reflects at least this t before querying
let opts = RefreshOpts { min_t: Some(committed_t) };
let result = fluree.refresh("mydb:main", opts).await?;
// result.unwrap().t >= committed_t is guaranteed here
Ok(())
}
Serverless / Lambda pattern (retry with backoff):
In a serverless environment, the transacting process and the querying process may be
different Lambda invocations. The querying invocation receives t (e.g., via an
event payload or API parameter) and must wait for that commit to be visible:
#![allow(unused)]
fn main() {
use fluree_db_api::{ApiError, Fluree, NameService, RefreshOpts, Storage};
use std::time::{Duration, Instant};
async fn wait_for_t(
fluree: &Fluree<impl Storage, impl NameService>,
ledger_id: &str,
min_t: i64,
timeout: Duration,
) -> Result<i64, ApiError> {
let deadline = Instant::now() + timeout;
let opts = RefreshOpts { min_t: Some(min_t) };
loop {
match fluree.refresh(ledger_id, opts.clone()).await {
Ok(Some(r)) => return Ok(r.t), // reached min_t
Ok(None) => return Err(ApiError::NotFound(
format!("ledger {ledger_id} not in nameservice"),
)),
Err(ApiError::AwaitTNotReached { current, .. }) => {
if Instant::now() >= deadline {
return Err(ApiError::AwaitTNotReached {
requested: min_t,
current,
});
}
// Back off before retrying
tokio::time::sleep(Duration::from_millis(50)).await;
}
Err(e) => return Err(e),
}
}
}
}
How it works internally:
- Fast path: If the cached `t` already satisfies `min_t`, returns immediately without hitting the nameservice at all.
- Pull: Queries the nameservice for the latest commit/index pointers and applies any new commits incrementally (or reloads if the gap is large).
- Check: If `t` is still below `min_t` after the pull, returns `ApiError::AwaitTNotReached` so you can retry.
This design keeps retry/timeout policy out of the database layer. Different deployment contexts (Lambda with 100ms backoff, HTTP handler with 5s deadline, integration test with immediate assertion) each wrap the same primitive differently.
Branch Diff (Merge Preview)
Fluree::merge_preview returns the rich diff between two branches —
ahead/behind commit summaries, the common ancestor, conflict keys, and
fast-forward eligibility — without mutating any state. It uses the
same primitives as merge_branch but skips the publish/copy steps,
making it cheap enough to call on every UI render.
use fluree_db_api::{FlureeBuilder, MergePreviewOpts, Result};
#[tokio::main]
async fn main() -> Result<()> {
let fluree = FlureeBuilder::memory().build_memory();
// ... create ledger, branch, transact on dev, etc.
// Default: previewing dev → main with the spec defaults
// (cap each commit list at 500, conflict keys at 200, run conflicts).
let preview = fluree.merge_preview("mydb", "dev", None).await?;
println!(
"{} ahead, {} behind, fast-forward: {}",
preview.ahead.count, preview.behind.count, preview.fast_forward,
);
if preview.fast_forward {
println!("merge would advance {} → {}", preview.source, preview.target);
} else {
println!("merge has {} conflict(s)", preview.conflicts.count);
for k in &preview.conflicts.keys {
println!(" - s={} p={}", k.s, k.p);
}
}
Ok(())
}
Tuning the preview
merge_preview_with takes a MergePreviewOpts for callers that need
control over response size or want to skip the conflict computation:
use fluree_db_api::{FlureeBuilder, MergePreviewOpts, Result};
#[tokio::main]
async fn main() -> Result<()> {
let fluree = FlureeBuilder::memory().build_memory();
// Cheap preview: counts only, no conflict walks.
let counts = fluree
.merge_preview_with(
"mydb",
"dev",
Some("main"),
MergePreviewOpts {
max_commits: Some(0), // counts only — no commit summaries
max_conflict_keys: Some(0),
include_conflicts: false,
},
)
.await?;
// Direct Rust callers can opt in to **unbounded** results — useful for
// tooling that needs the full divergence. The HTTP layer always supplies
// a bound, so this is a Rust-only escape hatch.
let full = fluree
.merge_preview_with(
"mydb",
"dev",
None,
MergePreviewOpts {
max_commits: None,
max_conflict_keys: None,
include_conflicts: true,
},
)
.await?;
Ok(())
}
What the caps do (and don’t) control
max_commits and max_conflict_keys cap the size of the returned
lists, not the cost of computing them:
- `BranchDelta::count` on each side reflects the full unbounded divergence — computed by walking every commit envelope between HEAD and the common ancestor — regardless of `max_commits`.
- When `include_conflicts: true`, both `compute_delta_keys` walks scan the full per-side delta regardless of `max_conflict_keys`.
- When `include_conflict_details: true`, value details are collected only for the returned `conflicts.keys` after the `max_conflict_keys` cap is applied.
- Set `include_conflicts: false` for a cheap preview on heavily diverged branches; you still get accurate `ahead.count`/`behind.count`.
Response shape
| Type | Notable fields |
|---|---|
| `MergePreview` | `source`, `target`, `ancestor: Option<AncestorRef>`, `ahead`, `behind`, `fast_forward`, `conflicts`, `mergeable` |
| `BranchDelta` | `count` (unbounded), `commits: Vec<CommitSummary>` (newest-first, capped), `truncated` |
| `CommitSummary` | `t`, `commit_id`, `time`, `asserts`, `retracts`, `flake_count`, `message: Option<String>` (extracted from the `f:message` txn_meta entry when present) |
| `ConflictSummary` | `count` (unbounded), `keys: Vec<ConflictKey>` (sorted, capped), `truncated`, `strategy`, `details` |
| `ConflictDetail` | `key`, `source_values`, `target_values`, `resolution` (values are the current asserted values at each branch HEAD) |
| `ConflictKey` | `s: Sid`, `p: Sid`, `g: Option<Sid>` |
mergeable only reflects whether the selected strategy would abort due to
detected conflicts; it is not full validation of every constraint the eventual
merge commit may encounter. mergeable=true does not guarantee a subsequent
merge will succeed; it only reflects the conflict/strategy interaction at
preview time.
All types derive Serialize so the response is wire-stable; the HTTP
endpoint at GET /v1/fluree/merge-preview/{ledger...} returns the same struct.
See docs/api/endpoints.md and docs/cli/server-integration.md for the
HTTP contract.
Reusable primitives in fluree-db-core
The per-commit summary types and DAG walker are factored into core for
reuse outside the merge-preview flow (e.g., git-log-style commit history
viewers, indexer integration). Re-exported from fluree-db-api:
- `walk_commit_summaries(store, head, stop_at_t, max) -> Result<(Vec<CommitSummary>, usize)>` — newest-first walk that returns both the (capped) summary list and the unbounded total count.
- `commit_to_summary(commit) -> CommitSummary` — pure function, no I/O.
- `find_common_ancestor(store, head_a, head_b)` — dual-frontier BFS.
Time Travel Queries
use fluree_db_api::{FlureeBuilder, TimeSpec, Result};
#[tokio::main]
async fn main() -> Result<()> {
let fluree = FlureeBuilder::memory().build_memory();
// Query at a specific point in time
let result = fluree.graph_at("mydb:main", TimeSpec::AtT(100))
.query()
.sparql("SELECT * WHERE { ?s ?p ?o } LIMIT 10")
.execute()
.await?;
println!("Results at t=100: {:?}", result.row_count());
Ok(())
}
Multi-Ledger Queries
use fluree_db_api::{FlureeBuilder, DataSetDb, Result};
#[tokio::main]
async fn main() -> Result<()> {
let fluree = FlureeBuilder::memory().build_memory();
// Load views from multiple ledgers
let customers = fluree.view("customers:main").await?;
let orders = fluree.view("orders:main").await?;
// Compose a dataset from multiple graphs
let dataset = DataSetDb::new()
.with_default(customers)
.with_named("orders:main", orders);
// Query across ledgers using the dataset builder
let query = r#"
SELECT ?customerName ?orderTotal
WHERE {
?customer schema:name ?customerName .
?customer ex:customerId ?cid .
GRAPH <orders:main> {
?order ex:customerId ?cid .
?order ex:total ?orderTotal .
}
}
"#;
let result = dataset.query(&fluree)
.sparql(query)
.execute()
.await?;
Ok(())
}
Remote Federation
Query ledgers on remote Fluree servers using SPARQL SERVICE with the fluree:remote: scheme. Register remote connections at build time — each maps a name to a server URL and optional bearer token:
use fluree_db_api::{FlureeBuilder, Result};
#[tokio::main]
async fn main() -> Result<()> {
let fluree = FlureeBuilder::file("./data")
.remote_connection(
"acme",
"https://acme-fluree.example.com",
Some("eyJhbG...".to_string()),
)
.build()?;
let db = fluree.view("local-ledger:main").await?;
// Join local data with a ledger on the remote server
let result = fluree.query(&db, r#"
PREFIX ex: <http://example.org/ns/>
SELECT ?name ?email
WHERE {
?person ex:name ?name .
SERVICE <fluree:remote:acme/customers:main> {
?person ex:email ?email .
}
}
"#).await?;
Ok(())
}
The connection name (acme) maps to the server URL. The ledger path (customers:main) is appended to form the request URL: POST https://acme-fluree.example.com/v1/fluree/query/customers:main. The bearer token is sent as Authorization: Bearer <token> on every request.
Multiple ledgers on the same remote server use the same connection name — you register the server once and can query any ledger your token is authorized for.
See Configuration: Remote connections for details and SPARQL: Remote Fluree Federation for full query syntax.
FROM-Driven Queries (Connection Queries)
When the query body itself specifies which ledgers to target (via "from" in JSON-LD or FROM in SPARQL), use query_from():
use fluree_db_api::{FlureeBuilder, Result};
use serde_json::json;
#[tokio::main]
async fn main() -> Result<()> {
let fluree = FlureeBuilder::memory().build_memory();
// Query where the "from" is embedded in the query body
let query = json!({
"from": "mydb:main",
"select": ["?name"],
"where": { "@id": "?s", "schema:name": "?name" }
});
let result = fluree.query_from()
.jsonld(&query)
.execute_formatted()
.await?;
// SPARQL with FROM clause
let result = fluree.query_from()
.sparql("SELECT ?name FROM <mydb:main> WHERE { ?s <http://schema.org/name> ?name }")
.execute_formatted()
.await?;
Ok(())
}
Background Indexing
use fluree_db_api::{FlureeBuilder, BackgroundIndexerWorker, Result};
use std::sync::Arc;
use tokio::time::{sleep, Duration};
use serde_json::json;
#[tokio::main]
async fn main() -> Result<()> {
let fluree = Arc::new(FlureeBuilder::file("./data").build()?);
// Start background indexer
let indexer = BackgroundIndexerWorker::new(
fluree.clone(),
Duration::from_secs(5), // Index interval
);
let indexer_handle = indexer.start();
// Application logic
let ledger = fluree.create_ledger("mydb").await?;
// Transactions will be indexed automatically in background
for i in 0..100 {
let txn = json!({
"@context": {"ex": "http://example.org/ns/"},
"@graph": [{"@id": format!("ex:item{}", i), "ex:value": i}]
});
fluree.graph("mydb:main")
.transact()
.insert(&txn)
.commit()
.await?;
}
// Wait for indexing to complete
sleep(Duration::from_secs(10)).await;
// Shutdown indexer
indexer_handle.shutdown().await?;
Ok(())
}
BM25 Full-Text Search
use fluree_db_api::{
FlureeBuilder, Bm25CreateConfig, Bm25FieldConfig, Result
};
use serde_json::json;
#[tokio::main]
async fn main() -> Result<()> {
let fluree = FlureeBuilder::memory().build_memory();
let ledger = fluree.create_ledger("mydb").await?;
// Insert searchable data and create BM25 index
// ...
// Query with full-text search using JSON-LD and the f:graphSource pattern
let search_query = json!({
"@context": {
"schema": "http://schema.org/",
"f": "https://ns.flur.ee/db#"
},
"from": "mydb:main",
"select": ["?product", "?score", "?name"],
"where": [
{
"f:graphSource": "products-search:main",
"f:searchText": "laptop",
"f:searchLimit": 10,
"f:searchResult": { "f:resultId": "?product", "f:resultScore": "?score" }
},
{ "@id": "?product", "schema:name": "?name" }
],
"orderBy": [["desc", "?score"]],
"limit": 10
});
let result = fluree.query_from()
.jsonld(&search_query)
.execute()
.await?;
println!("Found {} matching products", result.row_count());
Ok(())
}
Configuration
Builder Options
use fluree_db_api::{FlureeBuilder, ConnectionConfig, IndexConfig, Result};
#[tokio::main]
async fn main() -> Result<()> {
let config = ConnectionConfig {
storage_path: "./data".into(),
index_config: IndexConfig {
interval_ms: 5000,
batch_size: 10,
memory_mb: 2048,
threads: 4,
},
..Default::default()
};
let fluree = FlureeBuilder::with_config(config).build()?;
Ok(())
}
Custom Storage Backend
use fluree_db_api::{
FlureeBuilder, Storage, StorageWrite, Result
};
use async_trait::async_trait;
// Implement custom storage
struct MyStorage;
#[async_trait]
impl Storage for MyStorage {
async fn read(&self, address: &str) -> Result<Vec<u8>> {
// Custom implementation
todo!()
}
}
#[async_trait]
impl StorageWrite for MyStorage {
async fn write(&self, address: &str, data: &[u8]) -> Result<()> {
// Custom implementation
todo!()
}
}
#[tokio::main]
async fn main() -> Result<()> {
let storage = MyStorage;
let fluree = FlureeBuilder::custom(storage).build()?;
Ok(())
}
If you need full control over both storage and nameservice (e.g., for proxy mode or custom backends), use build_with():
#![allow(unused)]
fn main() {
let storage = MyStorage;
let nameservice = MyNameService;
let fluree = FlureeBuilder::memory()
.build_with(storage, nameservice);
}
build_with() respects the builder’s caching configuration — caching is on by default, or call .without_ledger_caching() before build_with() to disable it.
Error Handling
use fluree_db_api::{FlureeBuilder, ApiError, Result};
#[tokio::main]
async fn main() -> Result<()> {
let fluree = FlureeBuilder::memory().build_memory();
// Create a ledger — handles duplicates gracefully
match fluree.create_ledger("mydb").await {
Ok(ledger) => {
println!("Ledger created at t={}", ledger.t());
}
Err(ApiError::LedgerExists(ledger_id)) => {
println!("Ledger {} already exists, loading...", ledger_id);
let ledger = fluree.ledger("mydb:main").await?;
println!("Loaded at t={}", ledger.t());
}
Err(e) => {
eprintln!("Error: {}", e);
return Err(e);
}
}
Ok(())
}
Testing
Unit Tests
#![allow(unused)]
fn main() {
#[cfg(test)]
mod tests {
use fluree_db_api::{FlureeBuilder, Result};
use serde_json::json;
#[tokio::test]
async fn test_insert_and_query() -> Result<()> {
// Use memory storage for tests
let fluree = FlureeBuilder::memory().build_memory();
let ledger = fluree.create_ledger("test").await?;
// Insert data
let data = json!({
"@context": {"ex": "http://example.org/ns/"},
"@graph": [{"@id": "ex:alice", "ex:name": "Alice"}]
});
fluree.graph("test:main")
.transact()
.insert(&data)
.commit()
.await?;
// Query data
let query = json!({
"select": ["?name"],
"where": [{"@id": "ex:alice", "ex:name": "?name"}]
});
let result = fluree.graph("test:main")
.query()
.jsonld(&query)
.execute()
.await?;
assert_eq!(result.row_count(), 1);
Ok(())
}
}
}
Integration Tests
#![allow(unused)]
fn main() {
// tests/integration_test.rs
use fluree_db_api::{FlureeBuilder, Result};
use serde_json::json;
use tempfile::TempDir;
#[tokio::test]
async fn test_persistence() -> Result<()> {
let temp_dir = TempDir::new()?;
let path = temp_dir.path().to_str().unwrap();
// Create ledger and write data
{
let fluree = FlureeBuilder::file(path).build()?;
let ledger = fluree.create_ledger("test").await?;
let data = json!({"@context": {}, "@graph": [{"@id": "ex:test"}]});
fluree.graph("test:main")
.transact()
.insert(&data)
.commit()
.await?;
}
// Verify persistence by reopening
{
let fluree = FlureeBuilder::file(path).build()?;
let ledger = fluree.ledger("test:main").await?;
assert!(ledger.t() > 0);
}
Ok(())
}
}
Performance Tips
Batch Transactions
#![allow(unused)]
fn main() {
// Good: Batch related changes
let batch_data = json!({
"@graph": [
{"@id": "ex:item1", "ex:value": 1},
{"@id": "ex:item2", "ex:value": 2},
{"@id": "ex:item3", "ex:value": 3}
]
});
let result = fluree.graph("mydb:main")
.transact()
.insert(&batch_data)
.commit()
.await?;
// Bad: Individual transactions (more overhead per commit)
for i in 1..=3 {
let txn = json!({"@graph": [{"@id": format!("ex:item{}", i), "ex:value": i}]});
fluree.graph("mydb:main")
.transact()
.insert(&txn)
.commit()
.await?;
}
}
Use Appropriate Storage
- Memory: Fastest, no persistence (tests, temporary data)
- File: Good balance (single server, local development)
- AWS: Distributed, durable (production, multi-server)
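For instance, the same application code can run over any of these backends by swapping only the builder (a minimal sketch; the data path, bucket name, and endpoint below are placeholders):

```rust
use fluree_db_api::{FlureeBuilder, Result};

#[tokio::main]
async fn main() -> Result<()> {
    // Tests and throwaway data: in-memory, nothing persisted
    let _mem = FlureeBuilder::memory().build_memory();

    // Local development / single server: file-backed persistence
    let _local = FlureeBuilder::file("./data").build()?;

    // Production (requires the `aws` feature): S3-backed storage
    // let _prod = FlureeBuilder::s3("my-bucket", "http://localhost:4566")
    //     .build_client()
    //     .await?;

    Ok(())
}
```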
Query Optimization
#![allow(unused)]
fn main() {
// Good: Specific patterns
let query = json!({
"select": ["?name"],
"where": [
{"@id": "ex:alice", "schema:name": "?name"}
]
});
// Bad: Broad patterns
let query = json!({
"select": ["?s", "?p", "?o"],
"where": [
{"@id": "?s", "?p": "?o"}
]
});
}
Enable Query Tracking
use fluree_db_api::{FlureeBuilder, Result};
#[tokio::main]
async fn main() -> Result<()> {
let fluree = FlureeBuilder::memory().build_memory();
// Use execute_tracked() for fuel/time/policy tracking
let tracked = fluree.graph("mydb:main")
.query()
.sparql("SELECT * WHERE { ?s ?p ?o }")
.execute_tracked()
.await?;
println!("Query used {} fuel", tracked.fuel().unwrap_or(0));
Ok(())
}
Graph API Reference
The Graph API follows a lazy-handle pattern: fluree.graph(graph_ref) returns a lightweight handle, and all I/O is deferred to terminal methods.
Getting a Graph Handle
#![allow(unused)]
fn main() {
// Lazy handle to the current (head) state
let graph = fluree.graph("mydb:main");
// Lazy handle at a specific point in time
let graph = fluree.graph_at("mydb:main", TimeSpec::AtT(100));
}
Querying
#![allow(unused)]
fn main() {
// JSON-LD query (lazy — loads graph at execution time)
let result = fluree.graph("mydb:main")
.query()
.jsonld(&query_json)
.execute().await?;
// SPARQL query
let result = fluree.graph("mydb:main")
.query()
.sparql("SELECT ?s WHERE { ?s a <ex:Person> }")
.execute().await?;
// Formatted output (JSON-LD or SPARQL JSON based on query type)
let json = fluree.graph("mydb:main")
.query()
.jsonld(&query_json)
.execute_formatted().await?;
// Tracked query (fuel/time/policy metrics)
let tracked = fluree.graph("mydb:main")
.query()
.sparql("SELECT * WHERE { ?s ?p ?o }")
.execute_tracked().await?;
}
Materializing a GraphSnapshot
#![allow(unused)]
fn main() {
// Load once, query many times (avoids reloading)
let db = fluree.graph("mydb:main").load().await?;
let r1 = db.query().sparql("...").execute().await?;
let r2 = db.query().jsonld(&q).execute().await?;
// Access the underlying GraphDb
let view = db.view();
}
Transacting
#![allow(unused)]
fn main() {
// Insert and commit
let result = fluree.graph("mydb:main")
.transact()
.insert(&data)
.commit().await?;
// Upsert with options. f:identity is system-controlled (signed DID,
// opts.identity, or CommitOpts::identity). f:message and f:author are
// pure user claims — supply them in the transaction body just like any
// other txn-meta property.
let data = serde_json::json!({
"@context": {
"ex": "http://example.org/",
"f": "https://ns.flur.ee/db#"
},
"@graph": [{ "@id": "ex:alice", "ex:name": "Alice" }],
"f:message": "admin update",
"f:author": "did:admin"
});
let result = fluree.graph("mydb:main")
.transact()
.upsert(&data)
.commit_opts(CommitOpts::default().identity("did:admin"))
.commit().await?;
// Stage without committing (preview changes)
let staged = fluree.graph("mydb:main")
.transact()
.insert(&data)
.stage().await?;
// Query staged state
let preview = staged.query()
.jsonld(&validation_query)
.execute().await?;
}
Commit Inspection
Decode and display the contents of a commit — assertions and retractions with IRIs resolved to compact form. Similar to git show for individual commits.
#![allow(unused)]
fn main() {
// By exact CID
let detail = fluree.graph("mydb:main")
.commit(&commit_id)
.execute().await?;
// By transaction number
let detail = fluree.graph("mydb:main")
.commit_t(5)
.execute().await?;
// By hex-digest prefix (min 6 chars, like abbreviated git hashes)
let detail = fluree.graph("mydb:main")
.commit_prefix("3dd028")
.execute().await?;
// With a custom @context for IRI compaction
let detail = fluree.graph("mydb:main")
.commit_prefix("3dd028")
.context(my_parsed_context)
.execute().await?;
// Access the result
println!("t={}, +{} -{}", detail.t, detail.asserts, detail.retracts);
for flake in &detail.flakes {
let op = if flake.op { "+" } else { "-" };
println!("{} {} {} {} [{}]", op, flake.s, flake.p, flake.o, flake.dt);
}
}
The returned CommitDetail contains:
- Metadata: `id`, `t`, `time`, `size`, `previous`, `signer`, `asserts`, `retracts`
- `context`: prefix → IRI map derived from the ledger’s namespace codes
- `flakes`: flat list in SPOT order, each with resolved compact IRIs
CommitDetail implements Serialize — flakes serialize as [s, p, o, dt, op] tuples (with an optional 6th metadata element for language tags, list indices, or named graphs).
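Since `CommitDetail` implements `Serialize`, a commit can be dumped as JSON directly. A minimal sketch, reusing the `detail` value from the calls above:

```rust
#![allow(unused)]
fn main() {
// `detail` is a CommitDetail from one of the commit-inspection calls above;
// serde_json renders it directly, with flakes as [s, p, o, dt, op] tuples.
let json = serde_json::to_string_pretty(&detail).expect("serialize CommitDetail");
println!("{json}");
}
```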
Terminal Operations
| Method | Returns | Description |
|---|---|---|
| `.execute()` | `Result<QueryResult>` | Raw query result |
| `.execute_formatted()` | `Result<JsonValue>` | Formatted JSON output (JSON-LD for `.jsonld()`, SPARQL JSON for `.sparql()`) |
| `.execute_tracked()` | `Result<TrackedQueryResponse>` | Result with fuel/time/policy tracking |
| `.commit()` | `Result<TransactResultRef>` | Stage + commit transaction |
| `.stage()` | `Result<StagedGraph>` | Stage without committing |
| `.load()` | `Result<GraphSnapshot>` | Materialize snapshot for reuse |
Format Override
#![allow(unused)]
fn main() {
use fluree_db_api::FormatterConfig;
// Force JSON-LD format for a SPARQL query
let result = fluree.graph("mydb:main")
.query()
.sparql("SELECT ?name WHERE { ?s <schema:name> ?name }")
.format(FormatterConfig::jsonld())
.execute_formatted()
.await?;
}
Multi-Ledger Queries (Dataset)
For multi-ledger queries, use GraphDb directly:
#![allow(unused)]
fn main() {
let customers = fluree.view("customers:main").await?;
let orders = fluree.view("orders:main").await?;
let dataset = DataSetDb::new()
.with_default(customers)
.with_named("orders:main", orders);
let result = dataset.query(&fluree)
.sparql(query)
.execute().await?;
}
FROM-Driven Queries (Connection Queries)
#![allow(unused)]
fn main() {
let result = fluree.query_from()
.jsonld(&query_with_from)
.execute().await?;
}
Transaction Builder API Reference
There are two transaction builder patterns, each suited for different use cases:
stage(&handle) — Server/Application Pattern (Recommended)
Use stage(&handle) when building servers or applications with ledger caching enabled. The handle is borrowed and updated in-place on successful commit, ensuring concurrent readers see the update.
use fluree_db_api::{FlureeBuilder, Result};
use serde_json::json;
#[tokio::main]
async fn main() -> Result<()> {
// Caching is on by default (required for stage)
let fluree = FlureeBuilder::file("./data").build()?;
// Get a cached handle
let handle = fluree.ledger_cached("mydb:main").await?;
// Transaction via builder — handle updated in-place
let data = json!({"@graph": [{"@id": "ex:test", "ex:name": "Test"}]});
let result = fluree.stage(&handle)
.insert(&data)
.execute()
.await?;
println!("Committed at t={}", result.receipt.t);
// Handle now reflects the new state
let snapshot = handle.snapshot().await;
assert_eq!(snapshot.t, result.receipt.t);
Ok(())
}
Why use stage(&handle):
- Concurrent safety: Multiple requests share the same handle; updates are atomic
- No ownership dance: You don't need to track and pass around `LedgerState` values
- Server-friendly: Matches how the HTTP server handles transactions internally
stage_owned(ledger) — CLI/Script/Test Pattern
Use stage_owned(ledger) when you manage your own LedgerState directly. This is typical for CLI tools, scripts, and tests where you don’t need ledger caching.
use fluree_db_api::{FlureeBuilder, Result};
use serde_json::json;
#[tokio::main]
async fn main() -> Result<()> {
let fluree = FlureeBuilder::memory().build_memory();
// You own the ledger state
let ledger = fluree.create_ledger("mydb").await?;
// Transaction consumes ledger, returns updated state
let data = json!({"@graph": [{"@id": "ex:test", "ex:name": "Test"}]});
let result = fluree.stage_owned(ledger)
.insert(&data)
.execute()
.await?;
// Get the updated ledger from the result
let ledger = result.ledger;
println!("Now at t={}", ledger.t());
Ok(())
}
Why use stage_owned(ledger):
- Simple ownership: Good for linear workflows (load → transact → done)
- No caching required: Works even with `without_ledger_caching()`
- Test-friendly: Each test manages its own state
Choosing Between Them
| Use Case | Pattern | Why |
|---|---|---|
| HTTP server | stage(&handle) | Shared handles, atomic updates |
| Long-running app | stage(&handle) | Concurrent access to same ledger |
| CLI tool | stage_owned(ledger) | Simple, no caching needed |
| Integration test | stage_owned(ledger) | Isolated state per test |
| Script/batch job | stage_owned(ledger) | Linear workflow |
Builder Methods (Both Patterns)
Both stage(&handle) and stage_owned(ledger) return a builder with identical methods:
#![allow(unused)]
fn main() {
let result = fluree.stage(&handle) // or stage_owned(ledger)
.insert(&data) // or .upsert(&data), .update(&data)
.commit_opts(CommitOpts::default().identity("did:admin"))
.execute()
.await?;
// (Include `f:message` / `f:author` directly in `data` for user-claim provenance.)
}
| Method | Description |
|---|---|
.insert(&json) | Insert JSON-LD data |
.upsert(&json) | Upsert JSON-LD data |
.update(&json) | Update with WHERE/DELETE/INSERT (see the sketch after this table) |
.insert_turtle(&ttl) | Insert Turtle data |
.upsert_turtle(&ttl) | Upsert Turtle data |
.txn_opts(opts) | Set transaction options (branch, context) |
.commit_opts(opts) | Set commit options (identity, raw_txn) |
.policy(ctx) | Set policy enforcement |
.execute() | Stage + commit |
.stage() | Stage without committing (returns Staged) |
.validate() | Check configuration without executing |
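The `.update(&json)` method takes a JSON-LD update document with `where` / `delete` / `insert` clauses. A hedged sketch of the shape (treat the payload below as illustrative; the Transactions documentation covers the exact update format):
#![allow(unused)]
fn main() {
use serde_json::json;
// Illustrative update payload: replace ex:alice's name wherever one is set.
// The where/delete/insert keys mirror the WHERE/DELETE/INSERT clauses named in
// the table above; field details may differ from your Fluree version.
let update = json!({
    "@context": { "ex": "http://example.org/ns/" },
    "where":  [{ "@id": "ex:alice", "ex:name": "?old" }],
    "delete": [{ "@id": "ex:alice", "ex:name": "?old" }],
    "insert": [{ "@id": "ex:alice", "ex:name": "Alice Smith" }]
});
let result = fluree.stage(&handle)
    .update(&update)
    .execute()
    .await?;
}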
Graph API Transactions
The Graph API (fluree.graph(graph_ref).transact()) is built on top of stage(&handle) internally:
#![allow(unused)]
fn main() {
// Graph API (convenient, uses caching internally)
let result = fluree.graph("mydb:main")
.transact()
.insert(&data)
.commit()
.await?;
// Equivalent to:
let handle = fluree.ledger_cached("mydb:main").await?;
let result = fluree.stage(&handle)
.insert(&data)
.execute()
.await?;
}
Ledger Info API
Get comprehensive metadata about a ledger using the ledger_info() builder:
use fluree_db_api::{FlureeBuilder, Result};
use serde_json::json;
#[tokio::main]
async fn main() -> Result<()> {
let fluree = FlureeBuilder::file("./data").build()?;
// Get ledger info with optional context for IRI compaction
let context = json!({
"schema": "http://schema.org/",
"ex": "http://example.org/ns/"
});
let info = fluree
.ledger_info("mydb:main")
.with_context(&context)
// Optional: include datatype breakdowns under stats.properties[*]
// .with_property_datatypes(true)
// Optional: make property datatype details novelty-aware (real-time)
// .with_realtime_property_details(true)
.execute()
.await?;
// Access metadata sections
println!("Commit: {}", info["commit"]);
println!("Nameservice: {}", info["nameservice"]);
println!("Namespace codes: {}", info["namespace-codes"]);
println!("Stats: {}", info["stats"]);
println!("Index: {}", info["index"]);
Ok(())
}
Ledger Info Response
The response includes:
| Section | Description |
|---|---|
commit | Commit info in JSON-LD format |
nameservice | NsRecord in JSON-LD format |
namespace-codes | Inverted mapping (prefix → code) for IRI expansion |
stats | Flake counts, size, property/class statistics with selectivity |
index | Index metadata (t, ContentId, index ID) |
Stats freshness (real-time vs indexed)
The `stats` section uses layered runtime stats assembly:
- Default `ledger_info()` uses the full novelty-aware path, including lookup-backed class/ref enrichment.
- `with_realtime_property_details(false)` downgrades to the lighter fast novelty-aware merge (indexed + novelty deltas, no extra lookups).
- HLL / NDV fields remain index-derived, so they are omitted by default and only included via `with_property_estimates(true)` (see the example below).
That means the payload still mixes real-time values (indexed + novelty deltas) with values that are only available as-of the last index.
- Real-time (includes novelty):
  - `stats.flakes`, `stats.size`
  - `stats.properties[*].count` (but not NDV)
  - `stats.properties[*].datatypes` by default
  - `stats.classes[*].count`
  - `stats.classes[*].property-list` and `stats.classes[*].properties` (property presence)
  - `stats.classes[*].properties[*].refs` by default
- As-of last index:
  - `stats.indexed` (the index `t`)
  - `stats.properties[*].ndv-values` and `stats.properties[*].ndv-subjects`, when explicitly included via `with_property_estimates(true)`
  - Any selectivity derived from NDV values
  - `stats.classes[*].properties[*].refs` only when callers explicitly disable full detail with `with_realtime_property_details(false)`
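For example, to pull the fuller, estimate-bearing payload in one call, the builder options above can be combined (a short sketch; the accessed keys are the `stats` fields listed here):
#![allow(unused)]
fn main() {
// Sketch: request datatype breakdowns plus index-derived NDV estimates.
let info = fluree
    .ledger_info("mydb:main")
    .with_property_datatypes(true)
    .with_property_estimates(true)   // adds ndv-values / ndv-subjects (as-of last index)
    .execute()
    .await?;
// Real-time value (indexed + novelty) vs. the index-time watermark:
println!("flakes: {}", info["stats"]["flakes"]);
println!("indexed at t: {}", info["stats"]["indexed"]);
}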
Nameservice Query API
Query metadata about all ledgers and graph sources using the nameservice_query() builder:
use fluree_db_api::{FlureeBuilder, Result};
use serde_json::json;
#[tokio::main]
async fn main() -> Result<()> {
let fluree = FlureeBuilder::file("./data").build()?;
// Find all ledgers on main branch
let query = json!({
"@context": {"f": "https://ns.flur.ee/db#"},
"select": ["?ledger", "?t"],
"where": [{"@id": "?ns", "@type": "f:LedgerSource", "f:ledger": "?ledger", "f:branch": "main", "f:t": "?t"}],
"orderBy": [{"var": "?t", "desc": true}]
});
let results = fluree.nameservice_query()
.jsonld(&query)
.execute_formatted()
.await?;
println!("Ledgers: {}", serde_json::to_string_pretty(&results)?);
// SPARQL query
let results = fluree.nameservice_query()
.sparql("PREFIX f: <https://ns.flur.ee/db#>
SELECT ?ledger ?t WHERE { ?ns a f:LedgerSource ; f:ledger ?ledger ; f:t ?t }")
.execute_formatted()
.await?;
println!("SPARQL results: {}", serde_json::to_string_pretty(&results)?);
// Convenience method (equivalent to builder with defaults)
let results = fluree.query_nameservice(&query).await?;
Ok(())
}
Available Properties
Ledger Records (@type: "f:LedgerSource"):
| Property | Description |
|---|---|
f:ledger | Ledger name (without branch suffix) |
f:branch | Branch name |
f:t | Current transaction number |
f:status | Status: “ready” or “retracted” |
f:ledgerCommit | Reference to latest commit ContentId |
f:ledgerIndex | Index info with @id and f:t |
Graph Source Records (@type: "f:GraphSourceDatabase"):
| Property | Description |
|---|---|
f:name | Graph source name |
f:branch | Branch name |
f:config | Configuration JSON |
f:dependencies | Source ledger dependencies |
f:indexAddress | Index ContentId |
f:indexT | Index transaction number |
Builder Methods
| Method | Description |
|---|---|
.jsonld(&query) | Set JSON-LD query input |
.sparql(query) | Set SPARQL query input |
.format(config) | Override output format |
.execute_formatted() | Execute and return formatted JSON |
.execute() | Execute with default formatting |
.validate() | Validate without executing |
Example Queries
#![allow(unused)]
fn main() {
// Find ledgers with t > 100
let query = json!({
"@context": {"f": "https://ns.flur.ee/db#"},
"select": ["?ledger", "?t"],
"where": [{"@id": "?ns", "f:ledger": "?ledger", "f:t": "?t"}],
"filter": ["(> ?t 100)"]
});
// Find all BM25 graph sources
let query = json!({
"@context": {"f": "https://ns.flur.ee/db#"},
"select": ["?name", "?deps"],
"where": [{"@id": "?gs", "@type": "f:Bm25Index", "f:name": "?name", "f:dependencies": "?deps"}]
});
}
Examples
See complete examples in fluree-db-api/examples/:
- `benchmark_aj_query_1.rs` - Basic query patterns
- `benchmark_aj_query_2.rs` - Complex queries
- `benchmark_aj_query_3.rs` - Aggregations
- `benchmark_aj_query_4.rs` - Time travel queries
Run examples:
cargo run --example benchmark_aj_query_1 --release
API Reference
For detailed API documentation, see:
cargo doc --open -p fluree-db-api
Related Documentation
- Getting Started - Overview
- HTTP API - Server-based usage
- Distributed Tracing Integration - Correlating your app’s traces with Fluree
- Query - Query documentation
- Transactions - Write operations
- Crate Map - Architecture overview
- Dev Setup - Development guide
Concepts
Fluree is a graph database that stores and queries data using RDF (Resource Description Framework) semantics. This section explains the core concepts that make Fluree unique and powerful, with special emphasis on the features that differentiate Fluree from other graph databases.
Recommended Reading Order
These concepts build on each other. If you’re new to Fluree, read them in this order:
Foundations (read these first):
- IRIs, Namespaces, and JSON-LD @context — How Fluree identifies everything
- Datatypes and Typed Values — Fluree’s type system
- Ledgers and the Nameservice — The core unit of data storage
Core capabilities (read next):
- Time Travel — Query any point in history
- Branching — Git-like branch, merge, and rebase for your data
- Datasets and Named Graphs — Partition and query across graphs
Differentiating features (read as needed):
- Graph Sources — Integrated search and external data
- Policy Enforcement — Fine-grained access control
- Verifiable Data — Cryptographic signatures and trust
- Reasoning and Inference — Derive facts from ontology rules
If you’re coming from a SQL/relational background, start with Fluree for SQL Developers before diving into the concepts above.
Core Concepts
IRIs, Namespaces, and JSON-LD @context
Understand how Fluree uses Internationalized Resource Identifiers (IRIs) for all data identifiers, how namespaces provide convenient shorthand notation, and how JSON-LD @context enables compact, readable data exchange.
Datatypes and Typed Values
Explore Fluree’s type system, including support for XSD datatypes (strings, numbers, dates, booleans), RDF datatypes, and how all literal values are strongly typed.
Ledgers and the Nameservice
Learn about ledgers (Fluree’s equivalent of databases), how they’re organized with aliases like mydb:main, and how the nameservice provides discovery and metadata management across distributed deployments.
Time Travel
Differentiator: Discover Fluree’s temporal database capabilities, including transaction-time versioning, historical queries, and the ability to query data “as of” any previous transaction. Every change is preserved, enabling complete audit trails and historical analysis.
Datasets and Named Graphs
Learn about SPARQL datasets, named graphs, and how Fluree supports multi-graph queries across different data sources and time periods.
Graph Sources
Differentiator: Fluree’s graph source system enables seamless integration of specialized indexes and external data sources. Built-in BM25 full-text search, vector similarity search (ANN), Apache Iceberg integration, and R2RML relational mappings extend Fluree’s query capabilities beyond traditional graph queries.
Policy Enforcement
Differentiator: Fluree’s policy system provides fine-grained, data-level access control. Policies are enforced at query time, ensuring users only see data they’re authorized to access. This enables secure multi-tenant deployments and compliance with data privacy regulations.
Verifiable Data
Differentiator: Fluree supports cryptographically signed transactions using JWS (JSON Web Signatures) and Verifiable Credentials. Every transaction can be cryptographically verified, providing tamper-proof audit trails and enabling trustless data exchange.
Reasoning and Inference
Fluree’s built-in reasoning engine derives new facts from ontology declarations (RDFS, OWL) and user-defined Datalog rules. Query for a superclass and get all subclass instances automatically.
Architecture Overview
Fluree combines several architectural concepts:
- Triple Store: All data is stored as RDF triples (subject-predicate-object)
- Temporal Database: Every transaction is timestamped, enabling complete historical access
- Multi-Graph Support: Data can be partitioned across named graphs
- JSON-LD Integration: Native support for JSON-LD with full IRI expansion/compaction
- SPARQL & JSON-LD Query: Support for both SPARQL and Fluree’s native JSON-LD Query language
Key Differentiators
What makes Fluree unique:
- Built-in Full-Text Search: BM25 indexing is integrated directly into the database, not a separate system
- Vector Similarity Search: Native support for approximate nearest neighbor (ANN) queries via embedded HNSW indexes or remote search service
- Apache Iceberg Integration: Query data lake formats directly as graph sources
- Complete Time Travel: Every transaction is preserved with full historical query capabilities
- Data-Level Policy Enforcement: Fine-grained access control enforced at query time, not application level
- Cryptographically Verifiable: Transactions can be signed and verified using industry-standard formats (JWS/VC)
These concepts work together to provide a powerful, standards-compliant graph database with temporal capabilities, integrated search, and enterprise-grade security features.
Ledgers and the Nameservice
Ledgers are Fluree’s fundamental unit of data organization—similar to databases in traditional RDBMS systems. The nameservice is the metadata registry that enables ledger discovery, coordination, and management across distributed deployments.
Ledgers
A ledger in Fluree is an independent, versioned graph database containing:
- A complete graph of RDF triples
- Complete transaction history with temporal versioning
- Independent indexing and storage
- Configurable permissions and policies
- Support for multiple branches
Ledger IDs
Ledgers are identified by ledger IDs with the format ledger-name:branch.
A ledger ID serves as both a human-readable identifier and the canonical lookup key used across APIs, CLI, and caching.
Examples:
- `mydb:main` - Primary branch of the "mydb" ledger
- `customers:dev` - Development branch of the "customers" ledger
- `inventory:prod` - Production branch of the "inventory" ledger
- `tenant/app:feature-x` - Feature branch with hierarchical naming
Branch Semantics:
- The `:branch` suffix allows multiple isolated versions of the same logical ledger to coexist
- The default branch name is `main` when not specified (e.g., `mydb` is equivalent to `mydb:main`; see the sketch after this list)
- Branches are independent—changes in one branch don't affect others
- Branch names can include slashes for hierarchical organization
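That default-branch equivalence means the branch-less ID and the explicit `:main` form address the same data. A small sketch, assuming `fluree.graph()` accepts the branch-less ID as stated above:
#![allow(unused)]
fn main() {
// "mydb" and "mydb:main" resolve to the same ledger and branch,
// so these two queries run against identical data.
let via_shorthand = fluree.graph("mydb").query()
    .sparql("SELECT ?s WHERE { ?s ?p ?o } LIMIT 1")
    .execute().await?;
let via_explicit = fluree.graph("mydb:main").query()
    .sparql("SELECT ?s WHERE { ?s ?p ?o } LIMIT 1")
    .execute().await?;
}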
Ledger Lifecycle
Ledgers are created implicitly through the first transaction and persist until explicitly retracted. Each ledger maintains:
- Transaction History: Every change is recorded as a transaction with a unique timestamp (`t`)
- Current State: The latest indexed state of all data
- Novelty Layer: Uncommitted transactions since the last index
- Metadata: Creation time, latest commit, indexing status
Creation Flow:
- First transaction to a ledger ID creates the ledger automatically
- Transaction is committed and assigned a transaction time (`t`)
- Commit ID is published to the nameservice
- Background indexing process creates queryable indexes
- Index ID is published to the nameservice when complete
Retraction:
Ledgers can be marked as retracted (soft delete), which:
- Marks the ledger as inactive in the nameservice
- Preserves all historical data
- Prevents new transactions (but allows historical queries)
- Can be reversed if needed
The Nameservice
The nameservice is Fluree’s metadata registry that enables ledger discovery and coordination. It acts as a directory service, tracking where ledger data is stored and what state each ledger is in.
Purpose and Role
The nameservice provides:
- Discovery: Find ledgers by ledger ID across distributed deployments
- Coordination: Track commit and index state for consistency
- Metadata Management: Store ledger configuration and status
- Multi-Process Support: Enable coordination across multiple Fluree instances
What the Nameservice Stores
For each ledger, the nameservice maintains a nameservice record (NsRecord) containing:
Core Identifiers
- `id`: Canonical ledger ID with branch (e.g., `"mydb:main"`)
- `name`: Ledger name without branch suffix (e.g., `"mydb"`)
- `branch`: Branch name (e.g., `"main"`)
Commit State
- `commit_id`: ContentId (CIDv1) of the latest commit
- `commit_t`: Transaction time of the latest commit
The commit represents the most recent transaction that has been persisted. Commits are published immediately after each successful transaction. The commit_id is a content-addressed identifier derived from the commit’s bytes — it is storage-agnostic and does not depend on where the commit is physically stored.
Index State
- `index_id`: ContentId (CIDv1) of the latest index root
- `index_t`: Transaction time of the latest index
The index represents a queryable snapshot of the ledger state. Indexes are created by background processes and may lag behind commits. Like commits, the index_id is a content-addressed identifier.
Branch Metadata
- `source_branch`: For branches created via `create_branch`, records the name of the source branch (e.g., `"main"`). `None` for the initial branch.
The divergence point (common ancestor) between a branch and its source is computed on demand by walking the commit chains rather than being stored. This avoids stale metadata and supports merge scenarios where the relationship between branches changes over time.
Additional Metadata
- `default_context_id`: ContentId of the default JSON-LD @context for the ledger
- `retracted`: Whether the ledger has been marked as inactive
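Taken together, a record carries roughly the following shape (a simplified illustration of the fields listed above; the actual `NsRecord` type and its field types may differ):
#![allow(unused)]
fn main() {
// Simplified illustration only; not the exact NsRecord definition.
struct NsRecordSketch {
    id: String,                         // "mydb:main"
    name: String,                       // "mydb"
    branch: String,                     // "main"
    commit_id: String,                  // ContentId (CIDv1) of the latest commit
    commit_t: u64,                      // t of the latest commit
    index_id: Option<String>,           // ContentId of the latest index root
    index_t: u64,                       // t of the latest index
    source_branch: Option<String>,      // Some("main") for branched ledgers
    default_context_id: Option<String>, // default JSON-LD @context, if set
    retracted: bool,                    // soft-delete flag
}
}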
Commit vs Index: Understanding the Difference
This distinction is crucial for understanding Fluree’s architecture:
Commits (commit_t):
- Created immediately after each transaction
- Represent the transaction log (what changed)
- Small, append-only files
- Published synchronously
- Always up-to-date with latest transactions
Indexes (index_t):
- Created by background indexing processes
- Represent queryable database snapshots (complete state)
- Large, optimized data structures
- Published asynchronously
- May lag behind commits (this gap is the “novelty layer”)
Example Timeline:
t=1: Transaction committed → commit_t=1, index_t=0
t=2: Transaction committed → commit_t=2, index_t=0
t=3: Transaction committed → commit_t=3, index_t=0
[Background indexing completes] → index_t=3
t=4: Transaction committed → commit_t=4, index_t=3
t=5: Transaction committed → commit_t=5, index_t=3
[Novelty layer: t=4, t=5 not yet indexed]
Queries combine the indexed state (up to index_t) with the novelty layer (transactions between index_t and commit_t) to provide real-time results.
Nameservice Operations
The nameservice supports these key operations:
Lookup
Find ledger metadata by ledger ID:
#![allow(unused)]
fn main() {
// Pseudo-code
let record = nameservice.lookup("mydb:main").await?;
// Returns: NsRecord with commit_id, index_id, timestamps, etc.
}
Publishing
Record new commits and indexes:
- `RefPublisher::compare_and_set_ref()` / `fast_forward_commit()`: Advance the commit head with explicit CAS conflict handling
- `publish_index(ledger_id, index_id, index_t)`: Update index state (monotonic: only if `new_t > existing_t`)
Commit-head publishing is CAS-based so concurrent writers get an explicit conflict result instead of a silent no-op. Index publishing remains monotonic and only accepts updates that advance time forward.
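In the same pseudo-code style used elsewhere in this section, the publish flow after a transaction looks roughly like this (call shapes are illustrative, not the exact trait signatures):
#![allow(unused)]
fn main() {
// Pseudo-code: publishing after a transaction lands at t=6.
// Commit head: CAS against the currently published head; a losing writer
// receives an explicit conflict outcome and can re-read the head and retry.
let head = nameservice.lookup("mydb:main").await?;
ref_publisher.compare_and_set_ref("mydb:main", &head, &new_commit_ref).await?;
// Index head: monotonic, only accepted when the new index_t moves time forward.
nameservice.publish_index("mydb:main", new_index_id, 6).await?;
}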
Branching
Create and list branches:
- `create_branch(ledger_name, new_branch, source_branch, at_commit)`: Create a new branch from the source. When `at_commit` is `None`, the branch starts at the source's current HEAD; when `Some((commit_id, commit_t))`, the branch starts at the supplied historical commit instead (callers are expected to verify reachability from source HEAD before passing it in).
- `list_branches(ledger_name)`: List all non-retracted branches for a ledger
Discovery
List all available ledgers:
#![allow(unused)]
fn main() {
// Pseudo-code
let all_ledgers = nameservice.all_records().await?;
// Returns: Vec<NsRecord> for all known ledgers
}
Querying the Nameservice
The nameservice can be queried using standard JSON-LD query or SPARQL syntax. This enables powerful ledger discovery, filtering, and metadata analysis across all managed databases.
Rust API (Builder Pattern)
#![allow(unused)]
fn main() {
// Find all ledgers on main branch
let query = json!({
"@context": {"f": "https://ns.flur.ee/db#"},
"select": ["?ledger"],
"where": [{"@id": "?ns", "f:ledger": "?ledger", "f:branch": "main"}]
});
let results = fluree.nameservice_query()
.jsonld(&query)
.execute_formatted()
.await?;
// Query with SPARQL
let results = fluree.nameservice_query()
.sparql("PREFIX f: <https://ns.flur.ee/db#>
SELECT ?ledger ?t WHERE {
?ns a f:LedgerSource ;
f:ledger ?ledger ;
f:t ?t
}")
.execute_formatted()
.await?;
// Convenience method (equivalent to builder with defaults)
let results = fluree.query_nameservice(&query).await?;
}
HTTP API
# List ledgers and graph sources from the nameservice
curl http://localhost:8090/v1/fluree/ledgers
Available Properties
Ledger Records (@type: "f:LedgerSource"):
| Property | Description |
|---|---|
f:ledger | Ledger name (without branch suffix) |
f:branch | Branch name (e.g., “main”, “dev”) |
f:t | Current transaction number |
f:status | Status: “ready” or “retracted” |
f:ledgerCommit | Reference to latest commit ContentId |
f:ledgerIndex | Index info object with @id (ContentId) and f:t |
f:sourceBranch | Source branch name (e.g., "main") if this is a branched ledger |
f:defaultContextCid | Default JSON-LD context ContentId (if set) |
Graph Source Records (@type: "f:GraphSourceDatabase"):
| Property | Description |
|---|---|
f:name | Graph source name |
f:branch | Branch name |
f:status | Status: “ready” or “retracted” |
f:config | Configuration JSON |
f:dependencies | Array of source ledger dependencies |
f:indexId | Index ContentId |
f:indexT | Index transaction number |
Example Queries
Find all ledgers with t > 100:
{
"@context": {"f": "https://ns.flur.ee/db#"},
"select": ["?ledger", "?t"],
"where": [
{"@id": "?ns", "f:ledger": "?ledger", "f:t": "?t"}
],
"filter": ["(> ?t 100)"]
}
Find ledgers by name pattern (hierarchical):
{
"@context": {"f": "https://ns.flur.ee/db#"},
"select": ["?ledger", "?branch"],
"where": [
{"@id": "?ns", "f:ledger": "?ledger", "f:branch": "?branch"}
],
"filter": ["(strStarts ?ledger \"tenant1/\")"]
}
Find all BM25 graph sources:
{
"@context": {
"f": "https://ns.flur.ee/db#"
},
"select": ["?name", "?deps"],
"where": [
{"@id": "?gs", "@type": "f:Bm25Index", "f:name": "?name", "f:dependencies": "?deps"}
]
}
Retraction
Mark ledgers as inactive:
#![allow(unused)]
fn main() {
// Pseudo-code
nameservice.retract("mydb:old-branch").await?;
// Sets retracted=true, prevents new transactions
}
Storage Backends
The nameservice can be backed by various storage systems, each suited for different deployment scenarios:
File System (FileNameService)
- Use Case: Single-server deployments, development, testing
- Storage: Files in an `ns@v2/` directory structure
- Format: JSON files per ledger (`{ledger}/{branch}.json`)
- Characteristics: Simple, local, no external dependencies
AWS S3 (StorageNameService)
- Use Case: Distributed deployments using S3 for both data and metadata
- Storage: S3 objects with ETag-based compare-and-swap (CAS)
- Characteristics: Scalable, distributed, requires AWS credentials
AWS DynamoDB (DynamoDbNameService)
- Use Case: Distributed deployments needing low-latency metadata coordination
- Storage: DynamoDB table with composite-key layout (one item per concern)
- Format: Separate items for `meta`, `head`, `index`, `config`, `status` per ledger/graph source
- Characteristics: Single-digit millisecond latency, per-concern write independence, conditional expressions for monotonic updates
- See DynamoDB Nameservice Guide for setup and schema details
Memory (MemoryNameService)
- Use Case: Testing, in-process applications
- Storage: In-memory data structures
- Format: No persistence
- Characteristics: Fast, ephemeral, process-local
Graph Sources
The nameservice also tracks graph sources—specialized indexes and integrations:
- BM25: Full-text search indexes
- Vector: Vector similarity search
- R2RML: Relational database mappings
- Iceberg: Apache Iceberg table integrations
Graph sources have their own nameservice records (GraphSourceRecord) with similar metadata but different semantics. See the Graph Sources documentation for details.
Example Usage
Creating a Ledger
Ledgers are created automatically on the first transaction. Specify the ledger ID in your transaction:
POST /insert?ledger=mydb:main
Content-Type: application/json
{
"@context": {
"ex": "http://example.org/ns/",
"foaf": "http://xmlns.com/foaf/0.1/"
},
"@graph": [
{
"@id": "ex:alice",
"@type": "foaf:Person",
"foaf:name": "Alice"
}
]
}
What Happens:
- Transaction is processed and committed (assigned `t=1`)
- Commit is stored and its ContentId published to nameservice
- Nameservice record created/updated with `commit_t=1`
- Background indexing begins
- When indexing completes, `index_t=1` is published
Querying a Ledger
Specify the ledger ID in your query:
SPARQL:
PREFIX ex: <http://example.org/ns/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
FROM <mydb:main>
WHERE {
ex:alice foaf:name ?name
}
The FROM <mydb:main> clause specifies which ledger to query. The query engine:
- Looks up `mydb:main` in the nameservice
- Retrieves the index ContentId for efficient querying
- Combines indexed data with novelty layer for current results
JSON-LD Query:
{
"@context": {
"ex": "http://example.org/ns/",
"foaf": "http://xmlns.com/foaf/0.1/"
},
"select": ["?name"],
"from": "mydb:main",
"where": [
{ "@id": "ex:alice", "foaf:name": "?name" }
]
}
Checking Ledger Status
Query the nameservice to check ledger state:
#![allow(unused)]
fn main() {
// Pseudo-code
let record = nameservice.lookup("mydb:main").await?;
if let Some(record) = record {
println!("Latest commit: t={}", record.commit_t);
println!("Latest index: t={}", record.index_t);
if record.has_novelty() {
println!("Novelty layer: {} transactions pending index",
record.commit_t - record.index_t);
}
if record.retracted {
println!("Ledger is retracted (inactive)");
}
}
}
Branching
Branches let you create isolated copies of a ledger’s state for independent development. After branching, transactions on one branch are invisible to the other.
Creating a Branch
Branches are created from a source branch (default: main). The new branch starts at the same transaction time as the source:
mydb:main (t=5)
└── create_branch("mydb", "dev")
mydb:dev (t=5) # starts with same data as main at t=5
Branches can also be nested — you can branch from a branch:
mydb:main (t=5)
└── mydb:dev (t=7) # branched from main at t=5, then advanced
└── mydb:feature (t=8) # branched from dev at t=7, then advanced
Data Isolation
After branching, each branch has its own independent transaction history:
mydb:main → t=5 (shared) → t=6: insert Bob → t=7: insert Dave
mydb:dev → t=5 (shared) → t=6: insert Carol
Querying main returns Alice + Bob + Dave. Querying dev returns Alice + Carol. Bob and Dave never appear on dev; Carol never appears on main.
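A short end-to-end sketch of that isolation, using the branch and graph APIs covered in this documentation (the entity IRIs are illustrative):
#![allow(unused)]
fn main() {
use serde_json::json;
// Branch dev off main, write only to dev, then query both branches.
fluree.create_branch("mydb", "dev", None).await?;
let carol = json!({
    "@context": { "ex": "http://example.org/ns/" },
    "@graph": [{ "@id": "ex:carol", "ex:name": "Carol" }]
});
fluree.graph("mydb:dev").transact().insert(&carol).commit().await?;
// Carol is visible on dev...
let on_dev = fluree.graph("mydb:dev").query()
    .sparql("PREFIX ex: <http://example.org/ns/> SELECT ?n WHERE { ex:carol ex:name ?n }")
    .execute().await?;
// ...but not on main, whose history is unaffected by the dev commit.
let on_main = fluree.graph("mydb:main").query()
    .sparql("PREFIX ex: <http://example.org/ns/> SELECT ?n WHERE { ex:carol ex:name ?n }")
    .execute().await?;
}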
Storage Model
Branches share storage efficiently through a BranchedContentStore — a recursive content store that reads from the branch’s own namespace first, then falls back to parent namespaces for pre-branch-point content.
- Commits are not copied — historical commits are read from the source namespace via fallback
- Index files are copied — protects the branch from garbage collection on the source after reindexing
- String dictionaries are globally shared — stored in a per-ledger `@shared` namespace (e.g., `mydb/@shared/dicts/`) rather than per-branch paths, so all branches read and write to the same location without copying or fallback. The `@` prefix cannot collide with branch names. See Storage Traits — Global Dictionary Storage for details.
Each branch is a fully independent LedgerState with its own snapshot, novelty layer, commit chain, storage namespace, and t sequence.
Nameservice Metadata
When a branch is created, the nameservice records the source branch name on the new branch’s NsRecord (e.g., source_branch: Some("main")). The divergence point between the branch and its source is computed on demand by walking the commit chains rather than being stored as a static snapshot.
This metadata enables the system to reconstruct the BranchedContentStore tree when loading a branch. For nested branches, the ancestry chain is walked recursively via source_branch lookups.
API
Rust:
#![allow(unused)]
fn main() {
// Create a branch from main (default)
let record = fluree.create_branch("mydb", "dev", None).await?;
// Create a branch from another branch
let record = fluree.create_branch("mydb", "feature", Some("dev")).await?;
// List all branches
let branches = fluree.list_branches("mydb").await?;
}
HTTP:
# Create branch
curl -X POST http://localhost:8090/v1/fluree/branch \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb", "branch": "dev"}'
# List branches
curl http://localhost:8090/v1/fluree/branch/mydb
CLI:
# Create branch
fluree branch create dev --ledger mydb
# Create branch from another branch
fluree branch create feature-x --from dev --ledger mydb
# List branches
fluree branch list --ledger mydb
Dropping a Branch
Branches can be deleted with drop_branch. The main branch cannot be dropped.
Branches use reference counting (branches field on NsRecord) to track child branches. This enables safe deletion:
- Leaf branch (no children, `branches == 0`): Fully dropped — storage artifacts are deleted, the NsRecord is purged, and the parent's child count is decremented. If the parent was previously retracted and its count reaches 0, it is cascade-dropped.
- Branch with children (`branches > 0`): Retracted (hidden from listings, transactions rejected) but storage is preserved so children can still read parent data via `BranchedContentStore` fallback. When the last child is dropped and the count reaches 0, the retracted branch is automatically cascade-purged.
Rust API:
#![allow(unused)]
fn main() {
// Drop a leaf branch
let report = fluree.drop_branch("mydb", "dev").await?;
// report.deferred == false for leaf branches
// report.deferred == true for branches with children
// report.cascaded contains any ancestor branches that were cascade-dropped
}
HTTP API:
curl -X POST http://localhost:8090/v1/fluree/drop-branch \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb", "branch": "dev"}'
CLI:
fluree branch drop dev --ledger mydb
See POST /branch, GET /branch/{ledger-name}, and POST /drop-branch for full endpoint details.
Rebasing a Branch
After a branch diverges from its source, you can rebase it to replay its unique commits on top of the source branch’s current HEAD. This brings the branch up to date with upstream changes without merging.
Rebase detects conflicts when both the branch and source have modified the same (subject, predicate, graph) tuples. Five conflict resolution strategies are available:
| Strategy | Behavior |
|---|---|
take-both (default) | Replay as-is, both values coexist (multi-cardinality) |
abort | Fail on first conflict, no changes applied |
take-source | Drop branch’s conflicting flakes (source wins) |
take-branch | Keep branch’s flakes, retract source’s conflicting values |
skip | Skip entire commit if any flakes conflict |
If the branch has no unique commits, rebase performs a fast-forward: it simply updates the branch point to the source’s current HEAD without replaying anything.
Rust API:
#![allow(unused)]
fn main() {
use fluree_db_api::ConflictStrategy;
let report = fluree.rebase_branch("mydb", "dev", ConflictStrategy::TakeBoth).await?;
// report.replayed — number of commits successfully replayed
// report.conflicts — conflicts detected and resolved
// report.fast_forward — true if no branch commits to replay
}
HTTP API:
curl -X POST http://localhost:8090/v1/fluree/rebase \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb", "branch": "dev", "strategy": "take-both"}'
CLI:
fluree branch rebase dev --ledger mydb --strategy take-both
See POST /rebase for full endpoint details.
Architecture Deep Dive
Ledger State Composition
Each ledger combines two layers for query execution:
1. Indexed Database
- What: Persisted, optimized snapshot of ledger state
- When: Created by background indexing processes
- Storage: Large, read-optimized data structures
- Query Performance: Fast, efficient for historical queries
- Update Frequency: Asynchronous, may lag behind commits
2. Novelty Overlay
- What: In-memory representation of uncommitted transactions
- When: Transactions between `index_t` and `commit_t`
- Storage: Transaction log entries
- Query Performance: Slower, requires transaction replay
- Update Frequency: Real-time, always current
Query Execution Model:
Query Result = Indexed Database (up to t=index_t)
+ Novelty Overlay (t=index_t+1 to commit_t)
This architecture provides:
- Fast historical queries: Use appropriate index snapshot
- Real-time current queries: Include latest transactions via novelty
- Efficient background indexing: Doesn’t block new writes
- Consistent snapshots: Each query sees a consistent state
Concurrency Control
The nameservice ensures consistency through several mechanisms:
Ref Publishing
- Commits: `RefPublisher` uses compare-and-set semantics on the current head identity plus a monotonic `t` guard
- Indexes: `publish_index()` only accepts `new_index_t > existing_index_t`
- Guarantee: Writers either advance the head or receive an explicit conflict outcome
Optimistic Concurrency
- CAS Operations: Storage-backed nameservices use compare-and-swap (ETags)
- Conflict Handling: Retry on conflicts (expected under contention)
- Atomic Updates: Metadata updates are atomic per ledger
Consistency Guarantees
- Read Consistency: All readers see the same nameservice state
- Write Consistency: Monotonic updates prevent time-travel inconsistencies
- Eventual Consistency: In distributed deployments, updates propagate eventually
Distributed Coordination
The nameservice enables coordination across distributed deployments:
Multi-Process Coordination
- Shared State: Nameservice provides shared view of ledger state
- Process Discovery: Processes can discover ledgers created by other processes
- State Synchronization: Commit/index state visible to all processes
Geographic Distribution
- Storage Backends: S3/DynamoDB enable cross-region coordination
- Replication: Storage backends handle replication
- Consistency: Eventual consistency with monotonic guarantees
Scalability Patterns
- Horizontal Scaling: Multiple Fluree instances can share nameservice
- Load Distribution: Queries can be distributed across instances
- Storage Distribution: Ledger data can be stored across multiple backends
Nameservice Record Lifecycle
Understanding how records evolve:
1. Initialization
- publish_ledger_init("mydb:main")
- Creates record with commit_t=0, index_t=0
2. First Transaction
- Transaction committed at t=1
- Commit head advanced via `RefPublisher` CAS to `(commit_cid_1, 1)`
- Record: commit_t=1, index_t=0
3. Indexing Completes
- Index created for t=1
- publish_index("mydb:main", index_cid_1, 1)
- Record: commit_t=1, index_t=1
4. More Transactions
- Transactions at t=2, t=3, t=4
- Commit head advanced via CAS for each
- Record: commit_t=4, index_t=1 (novelty: t=2,3,4)
5. Next Index
- Index created for t=4
- publish_index("mydb:main", index_cid_2, 4)
- Record: commit_t=4, index_t=4 (no novelty)
Best Practices
Ledger Naming
- Use Descriptive Names: Choose names that clearly indicate purpose
  - Good: `customers:main`, `inventory:prod`, `analytics:warehouse`
  - Bad: `db1:main`, `test:main`, `data:main`
- Hierarchical Organization: Use slashes for logical grouping
  - Good: `tenant/app:main`, `tenant/app:dev`
  - Good: `department/project:branch`
- Branch Naming Conventions: Establish consistent branch naming
  - Good: `feature/authentication`, `bugfix/login-error`
  - Good: `release/v1.2.0`, `hotfix/security-patch`
Nameservice Configuration
- Choose Appropriate Backend: Match backend to deployment needs
  - Development: File system
  - Single server: File system
  - Distributed/Cloud: S3/DynamoDB
- Monitor Novelty Layer: Track gap between commits and indexes
  - Large gaps indicate indexing lag
  - May need to tune indexing frequency or resources
- Handle Retraction Carefully: Retracted ledgers preserve history
  - Use for soft deletes, not hard deletes
  - Historical queries still work on retracted ledgers
Performance Considerations
- Index Frequency: Balance indexing frequency with query needs
  - More frequent indexing: Better query performance, more storage
  - Less frequent indexing: Lower overhead, larger novelty layer
- Query Patterns: Understand your query patterns
  - Historical queries: Benefit from frequent indexing
  - Current-only queries: Can tolerate larger novelty layer
- Storage Planning: Plan for index storage growth
  - Each index is a complete snapshot
  - Historical indexes accumulate over time
  - Consider retention policies for old indexes
Operational Guidelines
- Monitor Nameservice Health: Track nameservice operations
  - Lookup latency
  - Publish success rates
  - Storage backend health
- Backup Strategy: Include nameservice in backup plans
  - File-based: Back up the `ns@v2/` directory
  - Storage-based: Use backend backup mechanisms
- Error Handling: Handle nameservice errors gracefully
  - Lookup failures: May indicate ledger doesn't exist
  - Publish failures: May indicate contention (retry)
  - Storage errors: May indicate backend issues
Troubleshooting
Ledger Not Found
Symptom: Query fails with “ledger not found”
Possible Causes:
- Ledger ID misspelled
- Ledger not yet created (no transactions yet)
- Ledger retracted
- Nameservice backend misconfigured
Solutions:
- Verify ledger ID spelling and format
- Check if ledger exists:
nameservice.lookup(ledger_id) - Verify nameservice backend configuration
- Check ledger status (retracted?)
Stale Query Results
Symptom: Queries don’t see latest transactions
Possible Causes:
- Novelty layer not being applied
- Index lagging significantly behind commits
- Query caching issues
Solutions:
- Check
commit_tvsindex_tgap - Verify indexing process is running
- Check query execution logs
- Consider forcing index update
Nameservice Contention
Symptom: Publish operations failing with conflicts
Possible Causes:
- Multiple processes updating same ledger
- High transaction rate
- Storage backend throttling
Solutions:
- Implement retry logic with backoff
- Reduce transaction rate if possible
- Scale storage backend (if S3/DynamoDB)
- Check for process coordination issues
This foundation of ledgers and the nameservice enables Fluree’s distributed, temporal graph database capabilities, providing the coordination layer needed for scalable, consistent data management.
Differentiator: Fluree’s nameservice architecture enables true distributed deployments with coordination across multiple processes and machines, unlike single-instance databases. The separation of commits and indexes, combined with the novelty layer, enables real-time queries while maintaining efficient background indexing—a unique architectural advantage.
Graph Sources
Differentiator: Graph sources are one of Fluree’s most powerful features, enabling seamless integration of specialized indexes and external data sources directly into graph queries. Unlike traditional databases that require separate systems for full-text search, vector similarity, or data lake access, Fluree makes these capabilities first-class citizens in the query language.
What Are Graph Sources?
A graph source is anything you can address by a graph name/IRI in Fluree query execution. Graph sources may be backed by:
- Ledger graphs (default graph and named graphs stored as RDF triples)
- Index graph sources (BM25 and vector/HNSW indexes)
- Mapped graph sources (R2RML and Iceberg-backed mappings)
Key Characteristics
- Query integration: Graph sources can be queried using the same SPARQL and JSON-LD Query interfaces
- Transparent access: Applications don’t need to know whether data comes from a ledger graph source or a non-ledger graph source
- Specialization: Each graph source type is optimized for specific query patterns
- Time travel (type-specific): Some graph sources support time-travel queries, but support is not uniform across all types. Time-travel is implemented by each graph source type (not by the nameservice).
Graph Source Types
BM25 Full-Text Search
Differentiator: Fluree includes built-in BM25 full-text search indexing, eliminating the need for separate search systems like Elasticsearch.
Use Cases:
- Product search with relevance ranking
- Document search with keyword matching
- Content discovery with fuzzy matching
Example:
{
"@context": {
"f": "https://ns.flur.ee/db#"
},
"from": "products:main",
"select": ["?product", "?score"],
"where": [
{
"f:graphSource": "products-search:main",
"f:searchText": "laptop",
"f:searchLimit": 10,
"f:searchResult": { "f:resultId": "?product", "f:resultScore": "?score" }
}
],
"orderBy": [["desc", "?score"]],
"limit": 10
}
Key Features:
- Relevance scoring (BM25 algorithm)
- Configurable parameters (k1, b)
- Language-aware search
- Optional time-travel support (BM25-owned manifest; see “Time Travel” below)
See the BM25 documentation for details.
Vector Similarity Search (ANN)
Differentiator: Native support for approximate nearest neighbor (ANN) queries via embedded HNSW indexes, enabling semantic search and similarity queries. Can run embedded (in-process) or via a dedicated remote search service.
Use Cases:
- Semantic search (find similar documents)
- Recommendation systems
- Image similarity search
- Embedding-based queries
Key Features:
- Approximate nearest neighbor search (HNSW algorithm)
- Configurable distance metrics (cosine, euclidean, dot product)
- Embedded indexes (no external service required) or remote mode via `fluree-search-httpd`
- Support for high-dimensional vectors
- Snapshot-based persistence with watermarks (head-only in v1; time-travel not supported)
See the Vector Search documentation for details.
Apache Iceberg Integration
Differentiator: Query Apache Iceberg tables and Parquet files directly as graph sources, enabling seamless integration with data lake architectures.
Use Cases:
- Query data lake formats without ETL
- Combine graph data with tabular data
- Analytics queries over large datasets
- Integration with existing data pipelines
Example:
# Query Iceberg table as graph source
SELECT ?customer ?order ?amount
FROM <iceberg:sales:main>
WHERE {
?order ex:customer ?customer .
?order ex:amount ?amount .
FILTER(?amount > 1000)
}
Key Features:
- Direct querying of Iceberg tables
- Parquet file support
- R2RML mapping for tabular data (Iceberg-backed)
- Time-travel via Iceberg snapshots
- Direct S3 mode: bypass REST catalog servers for `iceberg-rust` / self-managed tables — reads `version-hint.text` for automatic version discovery
See the Iceberg documentation for details.
R2RML Relational Mapping
Differentiator: Map relational databases to RDF using R2RML (the W3C RDB to RDF Mapping Language), enabling graph queries over SQL databases.
Use Cases:
- Adopt graph queries alongside SQL data sources
- Query SQL databases using SPARQL
- Integrate existing systems
- Unified query interface across data sources
Example:
# Query relational database via R2RML mapping
SELECT ?customer ?order
FROM <r2rml:orders:main>
WHERE {
?customer ex:hasOrder ?order .
?order ex:status "pending" .
}
Key Features:
- R2RML standard compliance
- Automatic RDF mapping from SQL schemas
- Read-only access to source databases
- Support for complex joins and transformations
See the R2RML documentation for details.
Graph Source Lifecycle
Creation
Graph sources are created through administrative operations, specifying:
- Type: BM25, Vector, Iceberg, or R2RML
- Configuration: Type-specific settings
- Dependencies: Source ledgers or data sources
- Branch: Graph sources support branching like ledgers
Example BM25 Graph Source Creation:
{
"@type": "f:Bm25Index",
"f:name": "products-search",
"f:branch": "main",
"f:sourceLedger": "products:main",
"f:config": {
"k1": 1.2,
"b": 0.75,
"fields": ["name", "description"]
}
}
Indexing
Graph sources maintain their own indexes:
- BM25: Full-text indexes are built from source ledger data
- Vector: Embeddings stored in HNSW indexes (embedded or remote)
- Iceberg: Metadata is cached for efficient querying
- R2RML: Mapping rules are applied to generate RDF
Querying
Graph sources are queried like regular ledgers:
# Query any graph source
SELECT ?result
FROM <graph-source-name:branch>
WHERE {
# Query patterns specific to graph source type
}
Time Travel
Some graph sources support historical queries using the @t: syntax in the ledger reference, but the behavior is graph-source-type specific:
{
"@context": { "f": "https://ns.flur.ee/db#" },
"from": "products:main@t:1000",
"select": ["?product"],
"where": [
{
"f:graphSource": "products-search:main",
"f:searchText": "laptop",
"f:searchLimit": 20,
"f:searchResult": { "f:resultId": "?product" }
}
]
}
BM25
BM25 can support time travel by maintaining a BM25-owned manifest in storage that maps transaction watermarks (t) to index snapshot addresses. The nameservice stores only a head pointer (an opaque address to the latest BM25 manifest/root) and does not store snapshot history.
Vector
Vector search is head-only in v1. If a query requests an @t: (or otherwise requests an historical view), vector search rejects the request with a clear “time-travel not supported” error.
Iceberg
Iceberg time travel (when used) is handled by Iceberg’s own snapshot/metadata model, not by nameservice-managed snapshot history.
Graph Source Architecture
Nameservice Integration
Graph sources are tracked in the nameservice alongside ledgers:
- Discovery: List all graph sources via nameservice
- Metadata: Configuration and status stored in nameservice
- Coordination: Index state tracked separately from source ledgers
Important: for graph sources, the nameservice stores only configuration and a head pointer (as a ContentId) to the graph source’s latest index root/manifest. Snapshot history (if any) lives in graph-source-owned manifests in the content store.
Query Execution
When querying a graph source:
- Resolution: Query engine resolves graph source from nameservice
- Type Detection: Determines graph source type (BM25, Vector, etc.)
- Specialized Execution: Routes to type-specific query handler
- Result Integration: Results integrated with regular graph queries
Performance Characteristics
Each graph source type has different performance characteristics:
- BM25: Fast keyword search, relevance scoring
- Vector: Approximate similarity search, configurable accuracy/speed tradeoff
- Iceberg: Columnar storage, efficient for analytical queries
- R2RML: Depends on source database performance
Use Cases
Multi-Modal Search
Combine full-text search, vector similarity, and graph queries:
{
"@context": {
"ex": "http://example.org/",
"f": "https://ns.flur.ee/db#"
},
"from": "products:main",
"select": ["?product", "?textScore", "?vectorScore"],
"values": [
["?queryVec"],
[{"@value": [0.1, 0.2, 0.3], "@type": "https://ns.flur.ee/db#embeddingVector"}]
],
"where": [
{ "@id": "?product", "ex:category": "electronics" },
{
"f:graphSource": "products-search:main",
"f:searchText": "wireless",
"f:searchLimit": 20,
"f:searchResult": { "f:resultId": "?product", "f:resultScore": "?textScore" }
},
{
"f:graphSource": "products-vector:main",
"f:queryVector": "?queryVec",
"f:searchLimit": 10,
"f:searchResult": { "f:resultId": "?product", "f:resultScore": "?vectorScore" }
}
],
"orderBy": [["desc", "(?textScore + ?vectorScore)"]]
}
Vector/HNSW graph sources are currently queried via JSON-LD Query using f:* patterns (e.g. f:graphSource, f:queryVector, f:searchResult). SPARQL query syntax for HNSW vector indexes is not currently available.
Data Lake Integration
Query both graph and tabular data:
SELECT ?customer ?graphData ?lakeData
FROM <customers:main> # Graph ledger
FROM <iceberg:sales:main> # Iceberg graph source
WHERE {
# Graph data
?customer ex:preferences ?graphData .
# Data lake data
GRAPH <iceberg:sales:main> {
?sale ex:customer ?customer .
?sale ex:total ?lakeData .
}
}
Hybrid Search
Combine semantic and keyword search:
{
"@context": {
"f": "https://ns.flur.ee/db#"
},
"from": "documents:main",
"select": ["?document"],
"where": [
{
"f:graphSource": "documents-search:main",
"f:searchText": "machine learning",
"f:searchLimit": 20,
"f:searchResult": { "f:resultId": "?document" }
}
]
}
Semantic similarity via HNSW vector indexes is also queried via JSON-LD Query using f:* patterns. SPARQL syntax for BM25 and vector index search is not currently available.
Best Practices
Graph Source Design
- Choose Appropriate Type: Match graph source type to query patterns
  - Keyword search → BM25
  - Semantic search → Vector
  - Analytics → Iceberg
  - SQL integration → R2RML
- Configuration Tuning: Optimize graph source parameters
  - BM25: Tune k1 and b for relevance
  - Vector: Choose appropriate distance metric
  - Iceberg: Optimize partition strategy
- Dependency Management: Understand source data dependencies
  - BM25/Vector: Keep in sync with source ledger
  - Iceberg: Handle schema evolution
  - R2RML: Map schema changes
Performance Optimization
- Index Maintenance: Keep graph source indexes up-to-date
  - Monitor indexing lag
  - Tune indexing frequency
  - Handle large data volumes
- Query Planning: Optimize queries using graph sources
  - Use graph sources for appropriate query patterns
  - Combine with graph queries efficiently
  - Consider cost of graph source queries
- Caching: Cache frequently accessed graph source results
  - Cache query results when appropriate
  - Consider graph source snapshot caching
  - Balance freshness vs performance
Operational Considerations
- Monitoring: Track graph source health
  - Index build status
  - Query performance
  - Storage usage
- Backup: Include graph sources in backup strategy
  - BM25 indexes can be rebuilt (or restored from stored snapshots/manifests, depending on configuration)
  - Vector indexes are stored as head snapshots (time-travel not supported in v1)
  - Iceberg metadata in nameservice
- Scaling: Plan for graph source scaling
  - BM25: Scale with source ledger size
  - Vector: Scale with embedding count
  - Iceberg: Leverage Iceberg partitioning
Comparison with Traditional Approaches
Traditional Architecture
Application
├── Graph Database (Neo4j, etc.)
├── Search Engine (Elasticsearch)
├── Vector DB (Pinecone, etc.)
└── Data Lake (Spark, Presto)
Challenges:
- Multiple systems to manage
- Data synchronization complexity
- Different query languages
- Separate authentication/authorization
Fluree Graph Source Architecture
Application
└── Fluree
├── Graph Ledgers
├── BM25 Graph Sources (built-in)
├── Vector Graph Sources
└── Iceberg Graph Sources
Benefits:
- Single query interface (SPARQL/JSON-LD Query)
- Unified access control (policy enforcement)
- Consistent time-travel across all data
- Simplified operations and deployment
Graph sources make Fluree a unified platform for graph, search, vector, and data lake queries, eliminating the complexity of managing multiple specialized systems.
IRIs, Namespaces, and JSON-LD @context
Internationalized Resource Identifiers (IRIs)
In Fluree, all data identifiers use Internationalized Resource Identifiers (IRIs) - the internationalized version of URIs. IRIs uniquely identify:
- Subjects: Entities in your data (people, products, concepts)
- Predicates: Relationships or properties
- Objects: Values or other entities
- Graphs: Named data partitions
IRI Examples
# Full IRIs
<http://example.org/person/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .
<http://example.org/person/alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
# IRIs with Unicode characters
<http://例え.org/人物/アリス> <http://xmlns.com/foaf/0.1/name> "アリス" .
IRI Best Practices
- Use stable domains: Choose domains you control or well-established standards
- Hierarchical structure: Organize IRIs with meaningful paths
- Avoid query parameters: IRIs should be clean identifiers, not URLs with parameters
- Internationalization: IRIs support Unicode characters for global identifiers
Namespaces
Namespaces provide shorthand notation for IRIs, making data more readable and manageable. A namespace maps a prefix to a base IRI.
Defining Namespaces
{
"@context": {
"ex": "http://example.org/ns/",
"foaf": "http://xmlns.com/foaf/0.1/",
"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"xsd": "http://www.w3.org/2001/XMLSchema#"
}
}
Using Namespaced IRIs
With the above context, you can write compact IRIs:
{
"@context": {
"ex": "http://example.org/ns/",
"foaf": "http://xmlns.com/foaf/0.1/"
},
"@graph": [
{
"@id": "ex:alice",
"@type": "foaf:Person",
"foaf:name": "Alice Smith"
}
]
}
This expands to:
{
"@graph": [
{
"@id": "http://example.org/ns/alice",
"@type": "http://xmlns.com/foaf/0.1/Person",
"http://xmlns.com/foaf/0.1/name": "Alice Smith"
}
]
}
JSON-LD @context
The @context is a JSON-LD mechanism that defines how to interpret the data. In Fluree, @context serves multiple purposes:
IRI Expansion/Compaction
{
"@context": {
"name": "http://xmlns.com/foaf/0.1/name",
"Person": "http://xmlns.com/foaf/0.1/Person"
},
"@graph": [
{
"@id": "http://example.org/alice",
"@type": "Person",
"name": "Alice"
}
]
}
The @context maps name → http://xmlns.com/foaf/0.1/name and Person → http://xmlns.com/foaf/0.1/Person.
Standard Prefixes
Fluree includes many standard prefixes by default:
{
"@context": {
"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"xsd": "http://www.w3.org/2001/XMLSchema#",
"owl": "http://www.w3.org/2002/07/owl#",
"foaf": "http://xmlns.com/foaf/0.1/",
"dc": "http://purl.org/dc/elements/1.1/"
}
}
@context in Queries
@context is also used in query results for compact output:
{
"@context": {
"ex": "http://example.org/ns/",
"foaf": "http://xmlns.com/foaf/0.1/"
},
"@graph": [
{
"@id": "ex:alice",
"@type": "foaf:Person",
"foaf:name": "Alice"
}
]
}
IRI Resolution Rules
Fluree follows strict IRI resolution rules:
Absolute IRIs
These are used as-is:
- `http://example.org/person/alice`
- `https://data.example.com/product/123`
Prefixed IRIs
These expand using @context:
- `ex:alice` → `http://example.org/ns/alice` (if `ex` maps to `http://example.org/ns/`)
- `foaf:name` → `http://xmlns.com/foaf/0.1/name`
Relative IRIs
These are resolved relative to a base IRI:
- `alice` → `http://example.org/ns/alice` (if base is `http://example.org/ns/`)
Strict Compact-IRI Guard
JSON-LD parsing in Fluree (queries and transactions) is strict by default about compact IRIs. If you write a value that looks like a compact IRI — prefix:suffix — but the prefix is not defined in @context, Fluree rejects the request at parse time with a clear error:
Unresolved compact IRI 'ex:Person': prefix 'ex' is not defined in @context.
If this is intended as an absolute IRI, use a full form (e.g. http://...)
or add the prefix to @context.
Why strict by default
Without the guard, a missing or misspelled prefix passes through silently — ex:Person gets stored as the literal string "ex:Person" instead of being expanded to a real IRI like http://example.org/Person. This produces incorrect data and confusing query results that are very hard to diagnose later.
The guard catches the most common cause of these bugs: forgetting an @context.
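For example, inserting data that uses an `ex:` prefix without declaring it fails at parse time instead of storing the literal string (a sketch using the Rust graph API from earlier sections; the error text matches the message shown above):
#![allow(unused)]
fn main() {
use serde_json::json;
// No @context, so "ex" is an undefined prefix and the guard rejects the write.
let bad = json!({
    "@graph": [{ "@id": "ex:alice", "@type": "ex:Person", "ex:name": "Alice" }]
});
let result = fluree.graph("mydb:main")
    .transact()
    .insert(&bad)
    .commit()
    .await;
// Fails with: Unresolved compact IRI 'ex:alice': prefix 'ex' is not defined in @context.
assert!(result.is_err());
}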
What the guard accepts
- IRIs that resolve through @context (the normal happy path).
- Hierarchical absolute IRIs whose suffix starts with // — http://..., https://..., ftp://..., etc.
- A small allowlist of well-known non-hierarchical schemes — urn:, did:, mailto:, tel:, data:, ipfs:, ipns:, geo:, blob:, magnet:, fluree:. Scheme names are matched case-insensitively per RFC 3986.
- Variables (?x) and blank nodes (_:b0) bypass the guard entirely.
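For example, the following insert passes the guard even without any @context, because every IRI position uses either an allowlisted scheme or a full hierarchical IRI (the specific identifiers are illustrative):
{
  "@graph": [
    {
      "@id": "did:example:alice",
      "http://xmlns.com/foaf/0.1/name": "Alice",
      "http://xmlns.com/foaf/0.1/account": {"@id": "urn:uuid:1f3a7c3e-0000-0000-0000-000000000000"}
    }
  ]
}
Changing "did:example:alice" to "ex:alice" without adding an "ex" mapping to @context would trigger the unresolved compact IRI error shown above.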
Where the guard applies
The guard runs at every position that semantically expects an IRI in JSON-LD:
- @id, @type, predicates / property names
- Datatype IRIs in @type of @value objects
- Graph names and graph-crawl roots
- Selection predicates (forward and reverse)
- VALUES @id cells
- @path aliases inside @context
It does not apply to:
- SPARQL queries
- Turtle / TriG transactions
- Literal string values (only IRI positions)
- Other consumers of the underlying JSON-LD expander (e.g. connection-config parsing)
Opting out per request
If you really need to accept unresolved compact-looking strings — for example, when migrating legacy data that uses bare prefix:suffix strings as opaque identifiers — set opts.strictCompactIri: false in the JSON-LD payload itself:
{
"@context": {"ex": "http://example.org/ns/"},
"opts": {"strictCompactIri": false},
"@graph": [
{"@id": "ex:alice", "ex:name": "Alice"},
{"@id": "legacy:bob", "ex:name": "Bob"}
]
}
The same key works on both queries and transactions. The default is true. Keep it on unless you have a concrete reason to disable it.
For programmatic use from Rust, transactions can also set TxnOpts.strict_compact_iri directly; that takes precedence over opts.strictCompactIri in the JSON.
Blank Nodes and Anonymous Entities
Blank nodes represent entities without global identifiers:
{
"@graph": [
{
"@id": "_:b1",
"foaf:name": "Anonymous Person"
}
]
}
Blank nodes are:
- Local to a single transaction
- Not referenceable across transactions
- Useful for temporary or anonymous data
Best Practices
Namespace Organization
- Use stable prefixes: Don’t change prefix mappings once data is committed
- Standard vocabularies: Use well-known prefixes (foaf, dc, rdf, etc.)
- Custom domains: Use your own domain for application-specific terms
- Versioning: Consider versioning in namespace IRIs for evolution
IRI Design
- Descriptive paths: Use meaningful hierarchical paths
- Avoid special characters: Stick to URL-safe characters
- Consistent casing: Use consistent capitalization conventions
- Future-proofing: Design IRIs to accommodate future extensions
@context Management
- Shared contexts: Reuse @context definitions across transactions
- Minimal contexts: Only define prefixes you actually use
- Documentation: Document custom prefixes and their meanings
- Evolution: Plan for @context changes over time
Default Context
Each ledger can store a default context — a JSON object mapping prefixes to IRIs. This context is available for retrieval and can be injected into queries by compatibility surfaces (the Fluree HTTP server and CLI), but is not applied automatically by the core API (fluree-db-api).
How it’s populated
- Bulk import: When importing Turtle data via fluree create --from, all @prefix declarations are captured and stored as the ledger’s default context, augmented with built-in prefixes (rdf, rdfs, xsd, owl, sh, geo).
- Manual update: Use the CLI (fluree context set) or HTTP API (PUT /v1/fluree/context/{ledger...}) to set or replace the context at any time.
Core API behavior
When using fluree-db-api directly (e.g., embedding Fluree in a Rust application), queries must supply their own @context (JSON-LD) or PREFIX declarations (SPARQL). If a query omits context, IRIs are not compacted and compact IRIs without a matching prefix will produce an error.
To opt in to default context injection when using the API directly, fetch the stored context and use the with_default_context builder:
#![allow(unused)]
fn main() {
let ctx = fluree.get_default_context("mydb").await?;
let ledger = fluree.ledger("mydb").await?;
let view = GraphDb::from_ledger_state(&ledger)
.with_default_context(ctx);
}
Or use the convenience method:
#![allow(unused)]
fn main() {
let view = fluree.db_with_default_context("mydb").await?;
}
Server and CLI behavior
The CLI automatically injects the ledger’s default context into queries that don’t provide their own. The HTTP API defaults this behavior off; pass ?default-context=true on a query request to opt in.
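For example, the HTTP opt-in looks like this (a sketch; it assumes the ledger-scoped query endpoint is mounted at /v1/fluree/query/{ledger} alongside the other /v1/fluree endpoints shown in this document):
# Ask the server to inject the ledger's default context for this request
curl -X POST "http://localhost:8090/v1/fluree/query/mydb:main?default-context=true" \
  -H "Content-Type: application/json" \
  -d '{"select": ["?name"], "where": [["ex:alice", "ex:name", "?name"]]}'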
When default context injection is enabled:
- Query-level @context (JSON-LD) or PREFIX declarations (SPARQL) — always win
- Ledger default context — applied only when the query provides no context of its own
- Built-in prefixes — rdf, rdfs, xsd, etc. are always available
Use with SPARQL (server/CLI)
The default context provides prefix definitions for SPARQL queries, so you don’t need to repeat PREFIX declarations in every query when injection is enabled. If the ledger’s default context includes {"ex": "http://example.org/"}, then you can write:
SELECT ?name WHERE {
ex:alice ex:name ?name .
}
without an explicit PREFIX ex: <http://example.org/> declaration. If you declare any PREFIX in the query, the default context is not used at all — you must declare every prefix you need.
Use with JSON-LD queries (server/CLI)
Similarly, JSON-LD queries sent through an opt-in surface that omit @context receive the default context:
{
"select": ["?name"],
"where": [["ex:alice", "ex:name", "?name"]]
}
Viewing and updating
# View the default context
fluree context get mydb
# Replace it
fluree context set mydb -e '{"ex": "http://example.org/", "foaf": "http://xmlns.com/foaf/0.1/"}'
Via the HTTP API:
# Read
curl http://localhost:8090/v1/fluree/context/mydb:main
# Replace
curl -X PUT http://localhost:8090/v1/fluree/context/mydb:main \
-H "Content-Type: application/json" \
-d '{"ex": "http://example.org/"}'
See CLI context command and API endpoints for full details.
Opting out of the default context
When using a default-context-enabled surface, you may want full, unexpanded IRIs in query results — for debugging, interoperability with other RDF tools, or simply to avoid any prefix assumptions. You can opt out of the default context:
JSON-LD queries — pass an empty @context object:
{
"@context": {},
"select": ["?s", "?p", "?o"],
"where": [["?s", "?p", "?o"]]
}
Results will contain full IRIs (e.g., http://example.org/ns/alice) instead of compacted forms (ex:alice).
SPARQL queries — include any PREFIX declaration. When a query declares its own prefixes, the default context is not injected. To opt out without defining any real prefix, use an empty default prefix:
PREFIX : <>
SELECT ?s ?p ?o WHERE { ?s ?p ?o }
Or simply declare the specific prefixes you need — the default context is only injected when the query has no PREFIX declarations whatsoever.
Storage
The default context is stored as a content-addressed blob in CAS, with a pointer (ContentId) in the nameservice config. Updates use compare-and-set semantics, so concurrent writers are safely handled. After an update, the server invalidates the cached ledger state so subsequent operations use the new context.
Integration with Standards
Fluree’s IRI system is fully compatible with:
- RDF Standards: Works with RDF/XML, Turtle, N-Triples
- SPARQL: IRIs work seamlessly in SPARQL queries
- Linked Data: Enables publishing and consuming linked data
- Semantic Web: Supports OWL ontologies and RDF Schema
This foundation enables Fluree to participate in the broader semantic web ecosystem while providing the convenience of JSON-LD’s compact syntax.
Datatypes and Typed Values
Fluree enforces strong typing for all literal values, ensuring data consistency and enabling efficient indexing and querying. Every literal value has an explicit datatype, following RDF and XSD standards.
Core Principle: No Untyped Literals
Unlike some databases that allow “plain” strings, Fluree requires every literal to have a datatype. This design provides:
- Type Safety: Prevents type confusion in queries and applications
- Consistent Comparisons: Typed values compare predictably
- Standards Compliance: Follows RDF and SPARQL specifications
- Query Optimization: Enables efficient indexing and query planning
XSD Datatypes
Fluree supports the core XML Schema Definition (XSD) datatypes:
String Types
{
"@context": {
"xsd": "http://www.w3.org/2001/XMLSchema#",
"ex": "http://example.org/ns/"
},
"@graph": [
{
"@id": "ex:book1",
"ex:title": "The Great Gatsby",
"ex:author": {
"@value": "F. Scott Fitzgerald",
"@type": "xsd:string"
}
}
]
}
xsd:string is the default for plain string literals when no type is specified.
Numeric Types
{
"@graph": [
{
"@id": "ex:product1",
"ex:price": {
"@value": "29.99",
"@type": "xsd:decimal"
},
"ex:quantity": {
"@value": "100",
"@type": "xsd:integer"
},
"ex:rating": {
"@value": "4.5",
"@type": "xsd:double"
}
}
]
}
Supported numeric types:
- xsd:integer: Whole numbers (-∞, ∞)
- xsd:long: 64-bit integers
- xsd:int: 32-bit integers
- xsd:short: 16-bit integers
- xsd:byte: 8-bit integers
- xsd:decimal: Arbitrary precision decimals
- xsd:double: 64-bit floating point
- xsd:float: 32-bit floating point
Boolean Type
{
"@graph": [
{
"@id": "ex:user1",
"ex:isActive": {
"@value": "true",
"@type": "xsd:boolean"
},
"ex:hasVerifiedEmail": {
"@value": "false",
"@type": "xsd:boolean"
}
}
]
}
xsd:boolean accepts: true, false, 1, 0.
Date and Time Types
{
"@graph": [
{
"@id": "ex:event1",
"ex:startDate": {
"@value": "2024-01-15",
"@type": "xsd:date"
},
"ex:startTime": {
"@value": "14:30:00Z",
"@type": "xsd:time"
},
"ex:createdAt": {
"@value": "2024-01-15T14:30:00Z",
"@type": "xsd:dateTime"
}
}
]
}
Temporal types:
- xsd:date: Dates without time (e.g., 2024-01-15)
- xsd:time: Times without date (e.g., 14:30:00Z)
- xsd:dateTime: Full timestamps (e.g., 2024-01-15T14:30:00Z)
Other XSD Types
{
"@graph": [
{
"@id": "ex:resource1",
"ex:homepage": {
"@value": "https://example.com",
"@type": "xsd:anyURI"
},
"ex:duration": {
"@value": "PT1H30M",
"@type": "xsd:duration"
}
}
]
}
Additional types include:
- xsd:anyURI: Web addresses and identifiers
- xsd:duration: Time periods (ISO 8601 format)
- xsd:gYear, xsd:gMonth, xsd:gDay: Partial date components
RDF Datatypes
Beyond XSD, Fluree supports RDF-specific datatypes:
Language-Tagged Strings
{
"@graph": [
{
"@id": "ex:book1",
"ex:title": {
"@value": "The Great Gatsby",
"@language": "en"
},
"ex:titel": {
"@value": "Der große Gatsby",
"@language": "de"
}
}
]
}
rdf:langString represents strings with language tags. This is distinct from plain strings and enables language-aware queries.
JSON Data
{
"@graph": [
{
"@id": "ex:config1",
"ex:settings": {
"@value": "{\"theme\": \"dark\", \"notifications\": true}",
"@type": "@json"
}
}
]
}
rdf:JSON stores JSON data as typed literals. This is useful for storing complex structured data that doesn’t fit the RDF model.
Geographic Data
{
"@context": {
"geo": "http://www.opengis.net/ont/geosparql#",
"ex": "http://example.org/"
},
"@graph": [
{
"@id": "ex:location1",
"ex:coordinates": {
"@value": "POINT(2.3522 48.8566)",
"@type": "geo:wktLiteral"
}
}
]
}
geo:wktLiteral stores geographic data in Well-Known Text (WKT) format. POINT geometries are automatically converted to an optimized binary encoding, while other geometry types (polygons, lines) are stored as strings.
See Geospatial for complete documentation.
Vector Data
{
"@context": {
"ex": "http://example.org/"
},
"@graph": [
{
"@id": "ex:doc1",
"ex:embedding": {
"@value": [0.1, 0.2, 0.3, 0.4],
"@type": "@vector"
}
}
]
}
@vector (full IRI: https://ns.flur.ee/db#embeddingVector, prefix form: f:embeddingVector) stores numeric arrays as embedding vectors. Values are quantized to IEEE-754 f32 at ingest for compact storage and SIMD-accelerated similarity computation. In Turtle/SPARQL, use f:embeddingVector with the ^^ typed-literal syntax.
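A minimal Turtle sketch of the same value follows; the exact lexical form of the vector literal (written here as a JSON-style array string) is an assumption, so check Vector Search for the authoritative syntax:
@prefix ex: <http://example.org/> .
@prefix f: <https://ns.flur.ee/db#> .

ex:doc1 ex:embedding "[0.1, 0.2, 0.3, 0.4]"^^f:embeddingVector .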
Without this type annotation, plain JSON arrays are decomposed into individual RDF values where duplicates may be removed and ordering is lost.
See Vector Search for complete documentation.
Fulltext Data
{
"@context": {
"ex": "http://example.org/"
},
"@graph": [
{
"@id": "ex:article-1",
"ex:content": {
"@value": "Rust is a systems programming language focused on safety and performance",
"@type": "@fulltext"
}
}
]
}
@fulltext (full IRI: https://ns.flur.ee/db#fullText, prefix form: f:fullText) marks a string value for full-text search indexing. Values annotated with @fulltext are automatically analyzed (tokenized, stemmed, stopword-filtered) and indexed into per-predicate fulltext arenas during background index builds. This enables BM25-ranked relevance scoring via the fulltext() query function.
Without this type annotation, strings are stored as plain xsd:string values and support only exact matching and prefix queries – not relevance-ranked full-text search.
See Inline Fulltext Search for complete documentation.
Type Coercion and Compatibility
Automatic Type Promotion
Fluree handles type compatibility intelligently:
# This works - integer can be used where decimal is expected
SELECT ?price
WHERE {
?product ex:price ?price .
FILTER(?price > 10.0) # decimal comparison
}
Comparisons Between Incompatible Types
When a filter compares values of incompatible types (e.g., a number and a string), the behavior depends on the operator:
- Equality (=) returns false — values of different types are never equal
- Inequality (!=) returns true — values of different types are never equal
- Ordering (<, <=, >, >=) raises an error — ordering between incompatible types is undefined
Numeric types (long, double, bigint, decimal) are mutually comparable via automatic promotion, so cross-numeric comparisons work as expected. Similarly, temporal types can be compared with string representations that parse to the same temporal type.
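A sketch of how this plays out (property names follow the earlier examples): the equality filter below simply excludes rows where ?price is numeric, while the ordering filter raises an error for those rows because ordering a number against a string is undefined:
PREFIX ex: <http://example.org/ns/>

# Equality across incompatible types evaluates to false: rows are filtered out, no error
SELECT ?product WHERE {
  ?product ex:price ?price .
  FILTER(?price = "not-a-number")
}

# Ordering across incompatible types raises an error
SELECT ?product WHERE {
  ?product ex:price ?price .
  FILTER(?price > "not-a-number")
}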
Type Casting in Queries
SPARQL provides functions for type conversion:
SELECT ?name (xsd:string(?id) AS ?idString)
WHERE {
?person ex:name ?name ;
ex:id ?id .
}
Best Practices
Choosing Datatypes
- Be Specific: Use the most appropriate type for your data
  - Use xsd:integer for whole numbers that will be used in calculations
  - Use xsd:string for identifiers and labels
  - Use xsd:dateTime for timestamps
- Consider Query Patterns: Choose types that support your intended queries
  - Numeric types enable range queries and aggregations
  - Date types enable temporal queries
  - String types support text search
- Standards Alignment: Use standard datatypes where possible
  - Prefer XSD types over custom types
  - Use established vocabularies with well-defined ranges
Type Consistency
- Consistent Usage: Use the same datatype for equivalent properties across your data
- Change Planning: Plan for type changes as your data model evolves
- Validation: Validate data types at ingestion time
Performance Considerations
- Index Efficiency: Different types have different indexing characteristics
  - Numeric types support efficient range queries
  - String types support prefix and substring matching
  - Date types enable temporal range queries
- Storage Size: Some types are more storage-efficient than others
  - xsd:integer is more compact than xsd:string
  - xsd:boolean is more efficient than string representations
Type System Architecture
Internal Representation
Fluree stores all typed values with their datatype information:
- Value Storage: The literal value as a string
- Type Metadata: The datatype IRI
- Comparison Logic: Type-aware comparison functions
Query Processing
The type system affects query processing:
- Type Checking: Ensures type compatibility in filters and joins
- Index Selection: Chooses appropriate indexes based on types
- Result Formatting: Formats results according to datatype rules
Standards Compliance
Fluree’s type system is fully compliant with:
- RDF 1.1 Concepts: Literal typing requirements
- SPARQL 1.1: Type promotion and compatibility rules
- XSD 1.1: Datatype definitions and constraints
- JSON-LD 1.1: Typed value syntax
This strong typing foundation ensures data consistency, enables optimization, and maintains interoperability with the broader semantic web ecosystem.
Datasets and Named Graphs
Fluree supports SPARQL datasets, allowing queries to span multiple graphs simultaneously. This enables complex data integration scenarios where data from different sources or time periods needs to be queried together.
SPARQL Datasets
A dataset in SPARQL is a collection of graphs used for query execution:
- Default Graph: The primary graph for triple patterns without GRAPH clauses
- Named Graphs: Additional graphs identified by IRIs, accessible via GRAPH clauses
Dataset Structure
# Dataset with one default graph and two named graphs
FROM <ledger:main> # Default graph
FROM NAMED <ledger:archive> # Named graph
FROM NAMED <ledger:staging> # Another named graph
Named Graphs
In SPARQL, named graphs are additional graphs (identified by IRIs) that participate in query execution and are accessed via GRAPH <iri> { ... }.
In Fluree, named graphs are used in several ways:
- Multi-graph execution (datasets): FROM NAMED <...> identifies additional graph sources (often other ledgers or non-ledger graph sources) that you can reference with GRAPH <...> { ... }.
- System named graphs: Fluree provides two built-in named graphs:
  - txn-meta (#txn-meta): commit/transaction metadata, queryable via the #txn-meta fragment (e.g., <mydb:main#txn-meta>)
  - config (#config): ledger-level configuration (policy, SHACL, reasoning, uniqueness constraints). See Ledger configuration.
- User-defined named graphs: Fluree supports ingesting data into user-defined named graphs using TriG format. These graphs are identified by their IRI and can be queried using the structured from object syntax with a graph field.
HTTP endpoints and default graph behavior
Fluree exposes two query styles over HTTP:
- Connection-scoped (POST /query): the ledger(s) and graphs are identified by from/fromNamed (JSON-LD) or FROM/FROM NAMED (SPARQL). This is the dataset path and supports multi-ledger datasets.
- Ledger-scoped (POST /query/{ledger}): the ledger is fixed by the URL. The request may still select a named graph inside that ledger:
  - JSON-LD: "from": "default", "from": "txn-meta", or "from": "<graph IRI>"
  - SPARQL: FROM <default>, FROM <txn-meta>, FROM <graph IRI>, and FROM NAMED <graph IRI>
If the request body tries to target a different ledger than the one in the URL, the server rejects it with a “Ledger mismatch” error.
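For instance, a ledger-scoped request can target the txn-meta graph of the ledger named in the URL (a sketch; it assumes the endpoint is mounted at /v1/fluree/query/{ledger} like the other /v1/fluree endpoints shown in this document):
# Query commit metadata through the ledger-scoped endpoint
curl -X POST "http://localhost:8090/v1/fluree/query/mydb:main" \
  -H "Content-Type: application/json" \
  -d '{
    "@context": {"f": "https://ns.flur.ee/db#"},
    "from": "txn-meta",
    "select": ["?commit", "?t"],
    "where": [{"@id": "?commit", "f:t": "?t"}]
  }'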
Txn metadata named graph (#txn-meta)
The txn-meta graph contains per-commit metadata stored as triples. This is useful for auditing and operational metadata (machine address, internal user id, job id, etc.).
Querying txn-meta via SPARQL:
PREFIX f: <https://ns.flur.ee/db#>
PREFIX ex: <http://example.org/ns/>
SELECT ?commit ?t ?machine
FROM <mydb:main#txn-meta>
WHERE {
?commit f:t ?t .
OPTIONAL { ?commit ex:machine ?machine }
}
Notes:
- Using FROM <mydb:main#txn-meta> makes txn-meta the default graph for the query.
- You can also use dataset syntax (FROM NAMED + GRAPH) if you need to mix default graph and txn-meta in one query.
User-Defined Named Graphs
Fluree supports ingesting data into user-defined named graphs using TriG format. TriG extends Turtle by adding GRAPH blocks that assign triples to specific named graphs.
Creating named graphs via TriG:
@prefix ex: <http://example.org/ns/> .
@prefix schema: <http://schema.org/> .
# Default graph triples
ex:company a schema:Organization ;
schema:name "Acme Corp" .
# Named graph for product data
GRAPH <http://example.org/graphs/products> {
ex:widget a schema:Product ;
schema:name "Widget" ;
schema:price "29.99"^^xsd:decimal .
}
# Named graph for inventory
GRAPH <http://example.org/graphs/inventory> {
ex:widget schema:inventory 42 ;
schema:warehouse "main" .
}
Submit TriG data via HTTP API:
curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
-H "Content-Type: application/trig" \
--data-binary '@data.trig'
Querying user-defined named graphs (JSON-LD):
Use the structured from object with a graph field:
{
"@context": { "schema": "http://schema.org/" },
"from": {
"@id": "mydb:main",
"graph": "http://example.org/graphs/products"
},
"select": ["?name", "?price"],
"where": [
{ "@id": "?product", "schema:name": "?name" },
{ "@id": "?product", "schema:price": "?price" }
]
}
System and user graphs:
- Default graph (implicit): User data without GRAPH blocks
- urn:fluree:{ledger_id}#txn-meta: Commit metadata
- urn:fluree:{ledger_id}#config: Ledger configuration (see Ledger configuration)
- User-defined named graphs: Identified by their IRI, allocated in order of first use
Notes:
- Named graph IRIs are stored in the commit’s graph_delta field for replay
- Maximum 256 named graphs can be introduced per transaction
- Maximum IRI length is 8KB per graph IRI
Querying Named Graphs
# Query specific named graphs
SELECT ?name
FROM NAMED <http://example.org/ns/graph1>
WHERE {
GRAPH <http://example.org/ns/graph1> {
?person ex:name ?name
}
}
# Query across multiple graphs
SELECT ?graph ?name
FROM NAMED <http://example.org/ns/graph1>
FROM NAMED <http://example.org/ns/graph2>
WHERE {
GRAPH ?graph {
?person ex:name ?name
}
}
Default Graph Semantics
The default graph contains triples that are not in any named graph:
# Query only the default graph
SELECT ?name
FROM <ledger:main>
WHERE {
?person ex:name ?name
# This matches triples in the default graph only
}
Union Default Graph
Some SPARQL implementations create a “union default graph” containing triples from all graphs. Fluree keeps them separate by default, but you can achieve union semantics:
# Manual union across graphs
SELECT ?name
FROM NAMED <ledger:main>
FROM NAMED <ledger:archive>
WHERE {
{ GRAPH <ledger:main> { ?person ex:name ?name } }
UNION
{ GRAPH <ledger:archive> { ?person ex:name ?name } }
}
Multi-Ledger Datasets
Datasets can span multiple ledgers:
# Dataset across different ledgers
SELECT ?product ?price
FROM <inventory:main> # Default graph from inventory ledger
FROM NAMED <pricing:main> # Named graph from pricing ledger
WHERE {
?product ex:name "Widget" .
GRAPH <pricing:main> {
?product ex:price ?price
}
}
This enables federated queries across different data sources.
Time-Aware Datasets
Named graphs can represent different time periods:
# Query current and historical data
SELECT ?version ?name
FROM NAMED <ledger:main> # Current data
FROM NAMED <ledger:archive> # Historical data
WHERE {
{ GRAPH <ledger:main> {
?person ex:name ?name .
BIND("current" AS ?version)
}
}
UNION
{ GRAPH <ledger:archive> {
?person ex:name ?name .
BIND("archive" AS ?version)
}
}
}
Graph Management
Graph Operations
Fluree supports graph-level operations:
# Insert into a specific graph
INSERT DATA {
GRAPH <http://example.org/ns/metadata> {
<http://example.org/data/doc1> ex:created "2024-01-15T10:00:00Z"^^xsd:dateTime .
}
}
# Delete from a specific graph
DELETE {
GRAPH <http://example.org/ns/temp> {
?s ?p ?o
}
}
WHERE {
GRAPH <http://example.org/ns/temp> {
?s ?p ?o
}
}
Graph Metadata
For transaction-scoped metadata, Fluree uses the txn-meta named graph (see above). Transaction metadata is stored as properties on commit subjects in txn-meta, and can be queried independently of user data.
Use Cases
Data Partitioning
Separate different types of data:
FROM NAMED <urn:customers>
FROM NAMED <urn:products>
FROM NAMED <urn:orders>
SELECT ?customer ?product
WHERE {
GRAPH <urn:customers> { ?customer foaf:name ?name }
GRAPH <urn:orders> {
?order ex:customer ?customer ;
ex:product ?product .
}
}
Access Control
Different graphs can have different permissions:
- Public graph: Open access
- Private graph: Restricted access
- Admin graph: Administrative data
Data Provenance
Track data sources and quality:
FROM NAMED <urn:sensor1>
FROM NAMED <urn:sensor2>
SELECT ?sensor ?reading ?quality
WHERE {
GRAPH ?sensor {
?obs ex:reading ?reading ;
ex:quality ?quality .
}
FILTER(?quality > 0.8) # Only high-quality readings
}
Version Management
Maintain different versions of data:
FROM NAMED <urn:v1.0>
FROM NAMED <urn:v2.0>
SELECT ?feature ?version
WHERE {
GRAPH ?version {
?feature ex:status "active"
}
}
Performance Considerations
Index Optimization
Named graphs affect indexing strategy:
- Graph-aware indexes: Indexes can be partitioned by graph
- Cross-graph joins: May require special optimization
- Graph statistics: Maintain statistics per graph for query planning
Query Planning
The query planner considers:
- Graph selectivity: Which graphs contain relevant data
- Join patterns: How graphs are connected in the query
- Graph size: Larger graphs may need different strategies
Best Practices
- Logical Partitioning: Use graphs for logical data separation
- Size Considerations: Very large graphs may impact query performance
- Naming Conventions: Use consistent IRI patterns for graph names
- Documentation: Document the purpose and schema of each graph
Standards Compliance
Fluree’s dataset implementation follows:
- SPARQL 1.1 Query: FROM and FROM NAMED clauses
- SPARQL 1.1 Update: GRAPH clauses in updates
- RDF 1.1 Datasets: Named graph semantics
- JSON-LD 1.1: @graph syntax for named graphs
This enables seamless integration with other RDF tools and SPARQL endpoints while providing Fluree’s unique temporal and ledger capabilities.
Time Travel
Differentiator: Fluree is a temporal database that preserves the complete history of all changes. Every transaction is timestamped, enabling queries against any previous state of the data. This “time travel” capability is fundamental to Fluree’s architecture and supports use cases that most databases cannot match.
Query Formats
Time travel is supported in both JSON-LD and SPARQL query formats. Examples in this document primarily use JSON-LD syntax with SPARQL equivalents shown where relevant.
Transaction Time
Every transaction in Fluree receives a unique transaction time (t) - a monotonically increasing integer that represents the logical time of the transaction.
Transaction Ordering
Transaction 1: t=1
Transaction 2: t=2
Transaction 3: t=3
...
- Monotonic: Each new transaction gets a higher t than all previous transactions
- Unique: No two transactions share the same t
- Global: Transaction times are unique across the entire Fluree instance
Current Time
The current time is the highest transaction time that has been committed. Queries without a time specifier automatically query the current state:
{
"@context": { "ex": "http://example.org/ns/" },
"select": ["?name"],
"where": [
{ "@id": "?person", "ex:name": "?name" }
]
}
You can also explicitly specify @t:latest to query the latest state:
{
"@context": { "ex": "http://example.org/ns/" },
"from": "ledger:main@t:latest",
"select": ["?name"],
"where": [
{ "@id": "?person", "ex:name": "?name" }
]
}
Historical Queries
Fluree supports querying data as it existed at any point in time using the @ syntax in ledger references.
Point-in-Time Queries
Query data as it existed at a specific transaction using the from field with @t::
{
"@context": { "ex": "http://example.org/ns/" },
"from": "ledger:main@t:100",
"select": ["?name"],
"where": [
{ "@id": "?person", "ex:name": "?name" }
]
}
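The SPARQL equivalent uses the same time-specced ledger reference in a FROM clause, mirroring the FROM <ledger:main@t:...> form used in the history-query examples later in this document:
PREFIX ex: <http://example.org/ns/>

SELECT ?name
FROM <ledger:main@t:100>
WHERE {
  ?person ex:name ?name .
}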
Query at ISO Timestamp
Query using ISO 8601 datetime with @iso::
{
"@context": { "ex": "http://example.org/ns/" },
"from": "ledger:main@iso:2024-01-15T10:30:00Z",
"select": ["?name"],
"where": [
{ "@id": "?person", "ex:name": "?name" }
]
}
Query at Commit ContentId
Query at a specific commit using @commit: with a commit ContentId:
{
"@context": { "ex": "http://example.org/ns/" },
"from": "ledger:main@commit:bafybeig...",
"select": ["?name"],
"where": [
{ "@id": "?person", "ex:name": "?name" }
]
}
Temporal Data Model
Immutable Facts
Once committed, data is immutable. Changes are represented as new facts that supersede previous ones:
t=1: Alice age 25 (assertion)
t=5: Alice age 26 (retraction of age 25, assertion of age 26)
History queries capture both the retraction and assertion with @op:
[
[25, 1, true],
[25, 5, false],
[26, 5, true]
]
Each row shows [value, transaction_time, op] where op is true for assertions and false for retractions.
Valid Time vs Transaction Time
Fluree primarily uses transaction time (when the fact was recorded in the database). For applications needing valid time (when the fact was true in the real world), this can be modeled explicitly as properties:
{
"@context": { "ex": "http://example.org/ns/" },
"@graph": [
{
"@id": "ex:alice-employment-1",
"ex:person": "ex:alice",
"ex:company": "ex:company-a",
"ex:validFrom": "2020-01-01T00:00:00Z",
"ex:validTo": "2023-12-31T23:59:59Z"
}
]
}
This allows you to query by both:
- Transaction time: When was this recorded? (using @t:, @iso:, @commit:)
- Valid time: When was this true? (using standard WHERE clause filters on ex:validFrom/ex:validTo)
Snapshot and Indexing
Database Snapshots
Fluree maintains indexed snapshots at regular intervals for efficient historical access:
- Index: A complete, optimized snapshot of the database at a specific t
- Novelty: Transactions committed since the last index, not yet folded into it
- Background Indexing: Continuous process that creates new indexes
Query Execution Model
Queries combine indexed data with novelty:
Query Result = Indexed Database (up to t=index) + Novelty (t=index+1 to current)
This provides:
- Fast historical queries: Use appropriate index
- Real-time current queries: Include latest transactions
- Consistent snapshots: Each query sees a consistent state
Consistency and Read-After-Write
Fluree’s query engine is eventually consistent. When a transaction commits at t=N, queries running against a different process or a warm cache may still see a state older than t=N until the cache is refreshed.
The Problem
Process A: transact → receives t=42
Process B: query → sees t=40 (stale cache)
This is expected in architectures where the query server is a separate peer, or in serverless environments where a warm Lambda invocation holds a cached ledger state from a previous request.
The Solution: refresh() with min_t
The refresh() API accepts a min_t parameter that asserts the cached ledger has reached at least a specific transaction time. If the ledger hasn’t reached that t after pulling the latest state from the nameservice, the call returns an error so the caller can retry.
Flow:
1. Client transacts → receives t=42
2. Client calls refresh(ledger, min_t=42)
3. Fluree checks cached t:
- If cached t >= 42 → immediate success (no I/O)
- If cached t < 42 → pull latest from nameservice, apply commits
- If still t < 42 → return AwaitTNotReached error
4. Client queries at t >= 42 with confidence
Usage Patterns
Same-process (embedded Fluree):
In a single process where you transact and query through the same Fluree instance, the cache is updated in-place by the transaction. min_t is typically not needed, but can serve as a safety assertion.
Multi-process / Serverless:
When the transacting process and querying process are separate (e.g., a Lambda that writes and another that reads), pass the t from the transaction receipt through your event/message payload and use min_t to gate the query:
Writer Lambda:
receipt = transact(data)
publish_event({ t: receipt.t, ... })
Reader Lambda:
event = receive_event()
refresh(ledger, min_t=event.t, timeout=5s)
query(ledger) // guaranteed to see at least t=event.t
HTTP API:
The HTTP query endpoint does not yet expose min_t directly. For HTTP clients, use the SSE events endpoint (GET /v1/fluree/events) to receive real-time commit notifications, or poll the ledger info endpoint until the desired t is reached.
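A minimal sketch of the SSE approach (the shape of each event payload is not shown here, so treat it as implementation-defined):
# Stream commit notifications and watch for the ledger reaching the desired t
curl -N "http://localhost:8090/v1/fluree/events"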
Rust API
See Using Fluree as a Rust Library — Read-After-Write Consistency for full code examples including retry-with-backoff patterns.
#![allow(unused)]
fn main() {
use fluree_db_api::RefreshOpts;
// After a transaction returns t=42:
let opts = RefreshOpts { min_t: Some(42) };
let result = fluree.refresh("mydb:main", opts).await?;
// result.t >= 42 is guaranteed if Ok
}
History Queries for Change Tracking
History queries let you see all changes (assertions and retractions) within a time range. Specify the range using from and to keys with time-specced endpoints.
Entity History (JSON-LD)
Track all changes to a specific entity over time by specifying a time range:
{
"@context": { "ex": "http://example.org/ns/" },
"from": "ledger:main@t:1",
"to": "ledger:main@t:latest",
"select": ["?name", "?t", "?op"],
"where": [
{ "@id": "ex:alice", "ex:name": { "@value": "?name", "@t": "?t", "@op": "?op" } }
],
"orderBy": "?t"
}
The @t and @op annotations bind the transaction time and operation type:
- @t - Transaction time (integer) when the fact was asserted or retracted.
- @op - Boolean: true for assertions, false for retractions. Mirrors Flake.op on disk. Both literal- and IRI-valued objects carry the metadata.
Returns results showing all changes:
[
["Alice", 1, true],
["Alice", 5, false],
["Alicia", 5, true]
]
Entity History (SPARQL)
The same query in SPARQL uses RDF-star syntax with FROM...TO:
PREFIX ex: <http://example.org/ns/>
PREFIX f: <https://ns.flur.ee/db#>
SELECT ?name ?t ?op
FROM <ledger:main@t:1>
TO <ledger:main@t:latest>
WHERE {
<< ex:alice ex:name ?name >> f:t ?t .
<< ex:alice ex:name ?name >> f:op ?op .
}
ORDER BY ?t
Property-Specific History
Query changes for specific properties:
{
"@context": { "ex": "http://example.org/ns/" },
"from": "ledger:main@t:1",
"to": "ledger:main@t:100",
"select": ["?age", "?t", "?op"],
"where": [
{ "@id": "ex:alice", "ex:age": { "@value": "?age", "@t": "?t", "@op": "?op" } }
],
"orderBy": "?t"
}
All Properties History
Query all property changes for an entity:
{
"@context": { "ex": "http://example.org/ns/" },
"from": "ledger:main@t:1",
"to": "ledger:main@t:latest",
"select": ["?p", "?v", "?t", "?op"],
"where": [
{ "@id": "ex:alice", "?p": { "@value": "?v", "@t": "?t", "@op": "?op" } }
],
"orderBy": "?t"
}
Time Range with Datetime
Query history using ISO 8601 datetime strings:
{
"@context": { "ex": "http://example.org/ns/" },
"from": "ledger:main@iso:2024-01-01T00:00:00Z",
"to": "ledger:main@iso:2024-12-31T23:59:59Z",
"select": ["?name", "?t", "?op"],
"where": [
{ "@id": "ex:alice", "ex:name": { "@value": "?name", "@t": "?t", "@op": "?op" } }
]
}
Filter by Operation Type
Filter to show only assertions or only retractions:
{
"@context": { "ex": "http://example.org/ns/" },
"from": "ledger:main@t:1",
"to": "ledger:main@t:latest",
"select": ["?name", "?t"],
"where": [
{ "@id": "ex:alice", "ex:name": { "@value": "?name", "@t": "?t", "@op": "?op" } },
["filter", "(= ?op \"retract\")"]
]
}
Pattern History Across Subjects
Query changes for a specific property across all subjects:
{
"@context": { "ex": "http://example.org/ns/" },
"from": "ledger:main@t:1",
"to": "ledger:main@t:latest",
"select": ["?person", "?status", "?t", "?op"],
"where": [
{ "@id": "?person", "ex:status": { "@value": "?status", "@t": "?t", "@op": "?op" } }
],
"orderBy": "?t"
}
Performance Characteristics
Time Resolution Performance
Different time specifiers have different performance characteristics:
- @t:NNN (fastest): Direct transaction number, no resolution needed
- @iso:DATETIME: O(log n) binary search through commit timestamps using POST index
- @commit:CID: Bounded SPOT scan, O(k) where k is commits matching prefix (use longer prefixes for better performance)
Index Selection
Fluree automatically selects the most appropriate index for historical queries:
- Recent history: Uses current index + novelty (transactions committed since the last index)
- Historical snapshots: Uses closest index snapshot to target time
- Point queries (@t:): Direct index lookup for specific transaction
History Query Performance
History queries scan flakes within the specified time range:
- Entity history (specific @id): SPOT index scan on subject
- Property history (specific predicate): Narrower SPOT scan with predicate filter
- All properties (variable predicate ?p): Full SPOT scan for subject
- Cross-entity (variable subject ?s): POST/PSOT index scan (can be slower for common predicates)
Optimization Strategies
- Use Transaction Numbers: When possible, use @t:NNN instead of @iso:DATETIME
- Narrow History Patterns: Use [subject, predicate] instead of [subject] when you only need specific properties
- Limit Time Ranges: Specify realistic from/to bounds rather than querying all history
- ContentId Prefix Length: Use sufficiently long ContentId prefixes to avoid ambiguity checks
- Index Density: More frequent indexing improves historical query performance for distant past
Storage Implications
- Full History: All transaction history is preserved (immutable append-only)
- Index Snapshots: Periodic snapshots enable efficient historical queries without replaying all transactions
- Commit Metadata: Stored as queryable flakes (~8-9 flakes per commit)
- Transaction JSON: Optionally stored for audit trails (enable with txn: true)
Practical Applications
Version Control
Treat data like code with version control:
{
"@context": { "ex": "http://example.org/ns/" },
"from": "app:production@t:1000",
"select": ["?config"],
"where": [
{ "@id": "?setting", "ex:value": "?config" }
]
}
Regulatory Compliance
Maintain complete audit trails - query data as it existed at time of consent:
{
"@context": { "ex": "http://example.org/ns/" },
"from": "users:main@iso:2024-05-25T14:30:00Z",
"select": ["?predicate", "?data"],
"where": [
{ "@id": "ex:alice", "?predicate": "?data" }
]
}
Change History Analysis
Track how data evolved over time:
{
"@context": { "ex": "http://example.org/ns/" },
"from": "sales:main@iso:2024-01-01T00:00:00Z",
"to": "sales:main@iso:2024-12-31T23:59:59Z",
"select": ["?order", "?amount", "?t", "?op"],
"where": [
{ "@id": "?order", "ex:amount": { "@value": "?amount", "@t": "?t", "@op": "?op" } }
],
"orderBy": "?t"
}
Debugging and Troubleshooting
Investigate system state at time of incident:
{
"@context": { "ex": "http://example.org/ns/" },
"from": "system:config@iso:2024-01-15T09:15:00Z",
"select": ["?setting", "?config"],
"where": [
{ "@id": "?setting", "ex:value": "?config" }
]
}
Time Travel in Multi-Ledger Scenarios
Cross-Ledger Temporal Queries
Query across ledgers at consistent time points:
{
"@context": { "ex": "http://example.org/ns/" },
"from": [
"customers:main@t:1000",
"orders:main@t:1000"
],
"select": ["?customer", "?order"],
"where": [
{ "@id": "?customer", "ex:name": "Alice" },
{ "@id": "?order", "ex:customer": "?customer" }
]
}
Ledger Branching
Time travel enables sophisticated branching workflows by querying historical states:
{
"@context": { "ex": "http://example.org/ns/" },
"from": "ledger:main@t:500",
"select": ["?entity", "?property", "?value"],
"where": [
{ "@id": "?entity", "?property": "?value" }
]
}
You can then use this historical state as a basis for creating a new branch or comparing against current state.
Common Patterns
Compare Current vs Historical State
Query the same entity at two different points in time:
// Query current state
{
"@context": { "ex": "http://example.org/ns/" },
"from": "ledger:main",
"select": ["?price"],
"where": [
{ "@id": "ex:product-123", "ex:price": "?price" }
]
}
// Query historical state
{
"@context": { "ex": "http://example.org/ns/" },
"from": "ledger:main@t:100",
"select": ["?price"],
"where": [
{ "@id": "ex:product-123", "ex:price": "?price" }
]
}
Find When a Change Occurred
Use history queries to identify when a specific change happened:
{
"@context": { "ex": "http://example.org/ns/" },
"from": "ledger:main@t:1",
"to": "ledger:main@t:latest",
"select": ["?status", "?t", "?op"],
"where": [
{ "@id": "ex:product-123", "ex:status": { "@value": "?status", "@t": "?t", "@op": "?op" } }
],
"orderBy": "?t"
}
The results show when ex:status changed, with ?op = false (retract) for the old value and ?op = true (assert) for the new value at the same transaction time.
Audit Trail for Compliance
Generate a complete audit trail for a sensitive entity:
{
"@context": { "schema": "http://schema.org/" },
"from": "users:main@iso:2024-01-01T00:00:00Z",
"to": "users:main@t:latest",
"select": ["?property", "?value", "?t", "?op"],
"where": [
{ "@id": "schema:Person/12345", "?property": { "@value": "?value", "@t": "?t", "@op": "?op" } }
],
"orderBy": "?t"
}
This returns all changes with transaction times for audit purposes. Each result row shows the property, value, when it was changed, and whether it was an assertion or retraction.
Rollback Detection
Find what changed after a specific commit:
{
"@context": { "ex": "http://example.org/ns/" },
"from": "config:main@t:50",
"to": "config:main@t:latest",
"select": ["?setting", "?value", "?t", "?op"],
"where": [
{ "@id": "?setting", "ex:config": { "@value": "?value", "@t": "?t", "@op": "?op" } }
],
"orderBy": "?t"
}
This shows all configuration changes since transaction 50, useful for identifying what to rollback. You can first query "from": "config:main@commit:bafybeig..." to find the transaction number (using point-in-time queries), then use that in the history query.
Reproduce a Bug at Specific Time
Query the exact state of the system when a bug was reported:
{
"@context": { "ex": "http://example.org/ns/" },
"from": [
"products:main@iso:2024-06-15T14:30:00Z",
"inventory:main@iso:2024-06-15T14:30:00Z"
],
"select": ["?product", "?stock", "?reserved"],
"where": [
{ "@id": "?product", "ex:stockLevel": "?stock" },
{ "@id": "?product", "ex:reserved": "?reserved" }
]
}
This recreates the exact state across multiple ledgers at the time the bug occurred, making debugging much easier.
Best Practices
Time Travel Guidelines
- Explicit Time References: Always specify clear time references (@t:, @iso:, or @commit:) for reproducible queries
- Time Zone Awareness: Use UTC for ISO timestamps to avoid ambiguity
- ContentId Length: Use sufficiently long ContentId prefixes to avoid collisions
- Performance Testing: Test query performance across different time ranges and ledger sizes
History Query Patterns
- Narrow Your Scope: Use specific property patterns rather than wildcard ?p when you only need certain properties
- Limit Time Ranges: Specify realistic time ranges with from and to rather than @t:1 to @t:latest
- Use Filters: Filter by @op to show only assertions or retractions when you don’t need both
- Order Results: Use orderBy: "?t" to see changes in chronological order
Data Modeling for Time
- Temporal Validity: Model valid time explicitly when needed (separate from transaction time)
- Change Tracking: Use history queries rather than storing change logs manually
- Immutable Design: Design for immutability from the start - never update in place
- Audit Patterns: Leverage history queries for audit trails instead of separate audit tables
Operational Considerations
- Index Maintenance: Monitor and tune background indexing for optimal historical query performance
- Storage Planning: Plan storage growth for historical data (all history is preserved)
- Query Optimization: Use time-specific queries (@t:) rather than datetime resolution (@iso:) when transaction numbers are known
- Backup Strategy: Include temporal aspects in backup/recovery plans - commits and indexes are both critical
Implementation Architecture
Transaction Pipeline
- Transaction Reception: Assign new transaction time (t)
- Validation: Check against current state
- Commitment: Persist transaction with ISO timestamp
- Commit Metadata: Store commit ContentId, timestamp, and optional transaction JSON
- Indexing: Background process creates new indexes
- Publication: Update nameservice with new transaction time
Time Travel Resolution
When you query with @t:, @iso:, or @commit::
- @t:NNN - Direct transaction number (fastest)
- @iso:DATETIME - Binary search through commit timestamps using POST index
- @commit:CID - Bounded SPOT scan to find matching commit
Query Execution
- Time Resolution: Resolve time specifiers to specific t values
- Index Selection: Choose appropriate index for target time
- Novelty Application: Apply intervening transactions if needed
- Result Generation: Return consistent snapshot
History Query Execution
- Time Range Detection: The from and to keys with time-specced endpoints activate history mode
- Pattern Resolution: WHERE patterns are executed with history mode enabled
- Metadata Capture: Transaction time (@t) and operation (@op) are captured for each binding
- Result Generation: Results include both assertions and retractions within the time range
This temporal foundation makes Fluree uniquely powerful for applications requiring complete historical visibility, audit capabilities, and temporal analytics.
Policy Enforcement
Fluree enforces access control inside the database. Individual facts (flakes) are filtered against policy rules during query and transaction execution, so the same query returns different results to different identities — automatically. The application doesn’t filter; the database does.
Why triple-level
Most databases enforce access at the row, table, or schema level. That granularity is awkward for graph data, where a single subject may have facts that are public (schema:name), employee-only (ex:department), and HR-only (ex:salary). Fluree’s enforcement happens per flake — ?subject ?predicate ?object — so policies can permit name, allow department to platform employees, and restrict salary to managers in the same department, all from one query.
The consequences:
- No application-side filtering. Security can’t be bypassed by buggy code paths because the database never returns flakes the requester isn’t allowed to see.
- Auditable. Policies are themselves data. They live in the ledger, are time-travelable, and can be queried — SELECT ?p WHERE { ?p a f:AccessPolicy }.
- Multi-tenant ready. A single ledger can serve many tenants, with isolation enforced at flake level.
- Compliance-friendly. GDPR / HIPAA-style “minimum necessary” access is the default behavior, not a check the app forgot to do.
What a policy looks like
Every policy is a JSON-LD node typed f:AccessPolicy. A policy has three orthogonal pieces:
- Targeting — f:onProperty, f:onClass, f:onSubject (each an array of @id references). Omit them all to make a default policy that applies to every flake.
- Action — f:action with values f:view (queries) and/or f:modify (transactions).
- Decision — either:
  - f:allow: true — unconditional allow, or
  - f:allow: false — unconditional deny, or
  - f:query: "<JSON-encoded WHERE>" — allow when the embedded query produces at least one binding for the targeted flake.
Two further knobs:
- f:required: true — the policy must allow for access to the targeted flake to be granted, even when default-allow is true. Use it for hard constraints (PII protection, write barriers).
- f:exMessage — a string returned to the caller when this policy denies a transaction.
A worked example:
{
"@id": "ex:salary-restriction",
"@type": ["f:AccessPolicy", "ex:CorpPolicy"],
"f:required": true,
"f:onProperty": [{"@id": "ex:salary"}],
"f:action": [{"@id": "f:view"}],
"f:query": "{\"where\": {\"@id\": \"?$identity\", \"http://example.org/role\": \"manager\"}}"
}
Translation: for every flake whose property is ex:salary and that someone is trying to read, this policy must allow. The embedded f:query runs with ?$identity pre-bound to the requester; if it returns a binding (i.e. the identity has role "manager"), the flake is permitted.
Variables in f:query
Inside an f:query, two variables are pre-bound:
| Variable | Meaning |
|---|---|
| ?$this | The subject of the targeted flake (the entity being read or written). |
| ?$identity | The IRI of the requesting identity, supplied via policy-values. |
Anything else is bound by the embedded WHERE just like a normal Fluree query.
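For example, an f:query can relate ?$this back to ?$identity through the ex:user link used in the identity-binding example below, so a person can only read entities tied to their own identity (a sketch with illustrative property IRIs):
{
  "@id": "ex:self-access",
  "@type": ["f:AccessPolicy", "ex:CorpPolicy"],
  "f:onClass": [{"@id": "ex:Person"}],
  "f:action": [{"@id": "f:view"}],
  "f:query": "{\"where\": {\"@id\": \"?$identity\", \"http://example.org/ns/user\": \"?$this\"}}"
}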
How the engine combines policies
When a request hits a flake, the engine collects every policy that targets it:
- Required policies (with f:required: true) must all allow. If any required policy denies — including by returning no f:query bindings — the flake is denied.
- If no required policies target the flake, any allow is enough. Fluree uses allow-overrides across the non-required set.
- If no policies apply at all, the request falls back to default-allow.
default-allow: false is fail-closed and the right choice for most production deployments.
Where policies come from
Two delivery channels, often mixed:
- Stored — write policies into the ledger as data. Tag each policy with a class (e.g. ex:CorpPolicy), and tag each identity entity with f:policyClass linking to that class. At request time, pass policy-class: ["ex:CorpPolicy"] and the engine pulls the matching policy set from the ledger automatically. Stored policies are versioned, time-travelable, and consistent across all callers — the right approach for production.
- Inline — pass policies in opts.policy (an array of policy nodes) or via the fluree-policy HTTP header. Useful for ad-hoc queries, automated tests, and admin scripts.
The two can be combined: a query can carry a policy-class and an additional inline policy.
Identity binding
An identity entity ties a caller (DID, JWT subject, application user) to graph nodes that policies can reason about:
{
"@id": "ex:aliceIdentity",
"ex:user": {"@id": "ex:alice"},
"f:policyClass": [{"@id": "ex:CorpPolicy"}]
}
A request carrying identity: "ex:aliceIdentity" causes the following:
- Fluree binds ?$identity to ex:aliceIdentity in every f:query.
- Stored policies tagged ex:CorpPolicy are loaded.
- Each policy’s f:query runs against the snapshot, with ?$identity and ?$this pre-bound, deciding flake by flake whether the request is permitted.
The ex:user link is a domain-specific convention — your f:query patterns use it to reach from the identity to the human or service the policies should reason about. Any modeling works; nothing about that link is special to Fluree.
What you control at the request boundary
Each request can supply:
- identity — IRI of the calling identity entity. Used to pre-bind ?$identity and to discover the identity’s f:policyClass.
- policy-class — one or more class IRIs to pull stored policies by class.
- policy-values — an object of additional ?$var bindings injected into every policy’s f:query.
- policy — an inline JSON-LD policy array.
- default-allow — boolean fallback for flakes no policy targets.
Over JSON-LD, these go inside opts. Over SPARQL, they’re sent as fluree-* headers (SPARQL has no opts block). When the server is configured with a default policy class, a verified bearer token’s identity is auto-applied — see the policy cookbook for the request shapes and the server-side data_auth_default_policy_class option in Configuration.
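A sketch of a JSON-LD query carrying these options in opts (key names as listed above; the ledger and data are illustrative):
{
  "@context": {"ex": "http://example.org/ns/"},
  "from": "mydb:main",
  "select": ["?name", "?dept"],
  "where": [
    {"@id": "?person", "ex:name": "?name", "ex:department": "?dept"}
  ],
  "opts": {
    "identity": "ex:aliceIdentity",
    "policy-class": ["ex:CorpPolicy"],
    "default-allow": false
  }
}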
Query enforcement vs transaction enforcement
The same policy model governs both, distinguished by f:action:
- f:view — runs during query execution. Flakes that fail the policy are filtered from the result; the query never sees them.
- f:modify — runs during transaction staging. The transaction is rejected (with f:exMessage if provided) if a write would touch flakes the identity isn’t allowed to modify.
A single policy can govern both ("f:action": [{"@id": "f:view"}, {"@id": "f:modify"}]). Most realistic policy sets mix view-only restrictions, modify-only restrictions, and a small number of [f:view, f:modify] defaults.
Policies are data
Because policies are flakes:
- Time travel. Query at past t to see what was in effect.
- Versionable. Edit through normal transactions; full history kept.
- Self-querying. Run reports over the policies themselves.
This makes policy management a normal Fluree workflow rather than a sidecar problem.
Performance shape
Policy evaluation has two phases — load (read the policies relevant to this request once) and apply (filter flakes during plan execution). Cost scales mostly with the apply phase: how many flakes the request touches, and how expensive each policy’s f:query is.
Two practical implications:
- Target policies. A policy with f:onProperty or f:onClass only runs on flakes whose predicate or rdf:type matches. Default policies (no targeting) run on every flake. Prefer targeting wherever it makes sense.
- Keep f:query cheap. Lean on identity attributes already loaded (@type, f:policyClass, role flags) rather than deep traversals.
For deeper architectural detail see Policy model and inputs, Policy in queries, and Policy in transactions.
Related documentation
- Cookbook: Access control policies — worked examples for common patterns
- Policy model and inputs — full reference
- Policy in queries — query-time behavior
- Policy in transactions — transaction-time behavior
- Programmatic policy API (Rust) — building policy contexts in code
- Authentication — identity, JWTs, and bearer tokens
- Configuration — server-side policy defaults (data_auth_default_policy_class, etc.)
Verifiable Data
Differentiator: Fluree supports cryptographically signed transactions using industry-standard formats (JWS and Verifiable Credentials), enabling tamper-proof audit trails and trustless data exchange. Every transaction can be cryptographically verified, providing cryptographic proof of data provenance and integrity.
Note: Requires the credential feature flag. See Compatibility and Feature Flags.
What Is Verifiable Data?
Verifiable data in Fluree refers to transactions that are cryptographically signed, providing proof of:
- Authenticity: Who created the transaction
- Integrity: That the data hasn’t been tampered with
- Non-repudiation: The signer cannot deny creating the transaction
- Provenance: The origin and history of the data
Key Characteristics
- Cryptographic Signatures: Transactions signed using standard cryptographic algorithms
- Industry Standards: Support for JWS (JSON Web Signatures) and Verifiable Credentials (VC)
- Tamper-Proof: Any modification to signed data invalidates the signature
- Verifiable: Anyone can verify signatures without special access
Why Verifiable Data Matters
Traditional Database Limitations
Most databases provide:
- Authentication: Who can access the database
- Authorization: What they can do
- Audit Logs: What happened (but logs can be modified)
Problems:
- No cryptographic proof of data origin
- Audit logs can be tampered with
- Difficult to prove data integrity
- No way to verify data across systems
Fluree’s Approach
Fluree provides:
- Cryptographic Signatures: Every transaction can be signed
- Tamper-Proof History: Signed transactions cannot be modified
- Verifiable Provenance: Anyone can verify data origin
- Trustless Exchange: Data can be shared without trusting intermediaries
Benefits:
- Audit Compliance: Cryptographic proof for compliance requirements
- Data Integrity: Detect any tampering with data
- Trustless Systems: Enable trustless data exchange
- Provenance Tracking: Track data origin cryptographically
Signed Transactions
JWS (JSON Web Signatures)
Fluree supports JWS for signing transactions:
Transaction Structure:
{
"ledger": "mydb:main",
"tx": [
{
"@id": "ex:alice",
"ex:name": "Alice"
}
],
"signature": {
"protected": {
"alg": "ES256",
"kid": "key-1"
},
"signature": "base64-encoded-signature"
}
}
Verification:
- Extract signature from transaction
- Verify signature using signer’s public key
- Confirm transaction hasn’t been modified
Verifiable Credentials
Fluree supports Verifiable Credentials (VC) for credential-based transactions:
VC Structure:
{
"@context": [
"https://www.w3.org/2018/credentials/v1"
],
"type": ["VerifiableCredential"],
"credentialSubject": {
"@id": "ex:alice",
"ex:name": "Alice"
},
"proof": {
"type": "Ed25519Signature2020",
"created": "2024-01-15T10:00:00Z",
"verificationMethod": "did:example:alice#key-1",
"proofValue": "base64-encoded-signature"
}
}
Verification:
- Verify credential proof
- Check credential issuer
- Validate credential structure
- Confirm credential hasn’t been revoked
Transaction Signing
Signing a Transaction
Step 1: Prepare Transaction
{
"ledger": "mydb:main",
"tx": [
{
"@id": "ex:alice",
"ex:name": "Alice"
}
]
}
Step 2: Create Signature
// Pseudo-code
const payload = JSON.stringify(tx);
const signature = sign(payload, privateKey);
Step 3: Add Signature
{
"ledger": "mydb:main",
"tx": [...],
"signature": {
"protected": {
"alg": "ES256",
"kid": "key-1"
},
"signature": signature
}
}
Signature Algorithms
Fluree supports standard signature algorithms:
- ES256: ECDSA with P-256 and SHA-256
- ES384: ECDSA with P-384 and SHA-384
- ES512: ECDSA with P-521 and SHA-512
- Ed25519: EdDSA with Ed25519 curve
Key Management
Public Key Storage:
Public keys can be stored:
- In the ledger itself (as data)
- In a separate key registry
- In a DID (Decentralized Identifier) document
Example Public Key in Ledger:
{
"@id": "ex:alice",
"ex:publicKey": {
"kty": "EC",
"crv": "P-256",
"x": "base64-x",
"y": "base64-y"
}
}
Transaction Verification
Verifying a Signed Transaction
Step 1: Extract Signature
{
"signature": {
"protected": {...},
"signature": "base64-signature"
}
}
Step 2: Get Public Key
// Pseudo-code
const kid = signature.protected.kid;
const publicKey = getPublicKey(kid);
Step 3: Verify Signature
// Pseudo-code
const payload = JSON.stringify(tx);
const isValid = verify(payload, signature.signature, publicKey);
Verification in Fluree
Fluree automatically verifies signed transactions:
- Signature Extraction: Extract signature from transaction
- Key Resolution: Resolve public key from signature
- Signature Verification: Verify cryptographic signature
- Transaction Acceptance: Accept transaction if signature valid
If verification fails:
- Transaction is rejected
- Error returned to client
- No data is committed
Use Cases
Audit Compliance
Requirement: Cryptographic proof of all data changes
Solution: Sign all transactions
{
"ledger": "audit:main",
"tx": [
{
"@id": "ex:change1",
"ex:action": "update",
"ex:timestamp": "2024-01-15T10:00:00Z"
}
],
"signature": {...}
}
Benefits:
- Cryptographic proof of changes
- Tamper-proof audit trail
- Compliance with regulations
Trustless Data Exchange
Requirement: Share data without trusting intermediaries
Solution: Sign data at source
{
"ledger": "shared:main",
"tx": [
{
"@id": "ex:data1",
"ex:value": "sensitive-data",
"ex:source": "ex:system-a"
}
],
"signature": {
"protected": {
"kid": "ex:system-a#key-1"
},
"signature": "..."
}
}
Benefits:
- Verify data origin
- Detect tampering
- Trustless data sharing
Multi-Party Systems
Requirement: Multiple parties contribute data
Solution: Each party signs their transactions
{
"ledger": "consortium:main",
"tx": [
{
"@id": "ex:contribution1",
"ex:party": "ex:party-a",
"ex:data": "..."
}
],
"signature": {
"protected": {
"kid": "ex:party-a#key-1"
},
"signature": "..."
}
}
Benefits:
- Identify data contributors
- Verify party contributions
- Enable accountability
Regulatory Compliance
Requirement: Prove data integrity for regulations
Solution: Sign all regulated data
Examples:
- HIPAA: Healthcare data integrity
- GDPR: Personal data provenance
- SOX: Financial data integrity
- FDA: Pharmaceutical data integrity
Verifiable Credentials
Credential Structure
Verifiable Credentials follow W3C VC standard:
{
"@context": [
"https://www.w3.org/2018/credentials/v1"
],
"id": "ex:credential-1",
"type": ["VerifiableCredential", "ex:IdentityCredential"],
"issuer": "did:example:issuer",
"issuanceDate": "2024-01-15T10:00:00Z",
"credentialSubject": {
"@id": "ex:alice",
"ex:name": "Alice",
"ex:email": "alice@example.com"
},
"proof": {
"type": "Ed25519Signature2020",
"created": "2024-01-15T10:00:00Z",
"verificationMethod": "did:example:issuer#key-1",
"proofValue": "base64-signature"
}
}
Credential Verification
Step 1: Verify Proof
// Pseudo-code
const proof = credential.proof;
const publicKey = resolvePublicKey(proof.verificationMethod);
const isValid = verifyProof(credential, proof, publicKey);
Step 2: Check Issuer
// Pseudo-code
const issuer = credential.issuer;
const isTrusted = checkIssuerTrust(issuer);
Step 3: Validate Credential
// Pseudo-code
const isValid = validateCredentialStructure(credential);
Credential Revocation
Credentials can be revoked:
{
"@id": "ex:revocation-1",
"@type": "ex:CredentialRevocation",
"ex:credentialId": "ex:credential-1",
"ex:revokedAt": "2024-01-20T10:00:00Z"
}
Verification should check revocation status.
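With a revocation record modeled as above, the status check is an ordinary query. A sketch using the example vocabulary (ex:CredentialRevocation and ex:credentialId come from the record above; it assumes the credential ID is stored as a reference, so adjust the match if it is stored as a string):
PREFIX ex: <http://example.org/>
# Returns a row only if a revocation record exists for the credential
SELECT ?revocation ?revokedAt
WHERE {
  ?revocation a ex:CredentialRevocation ;
              ex:credentialId ex:credential-1 ;
              ex:revokedAt ?revokedAt .
}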
Data Provenance
Tracking Data Origin
Signed transactions enable provenance tracking:
Query Transaction History:
SELECT ?tx ?signer ?timestamp
WHERE {
?tx ex:signature ?sig .
?sig ex:signer ?signer .
?tx ex:timestamp ?timestamp .
}
ORDER BY DESC(?timestamp)
Verify Data Chain:
SELECT ?data ?origin ?signer
WHERE {
?data ex:origin ?origin .
?origin ex:signature ?sig .
?sig ex:signer ?signer .
}
Provenance Verification
Step 1: Find Data Origin
SELECT ?tx
WHERE {
?tx ex:created ?data .
}
Step 2: Verify Transaction Signature
// Pseudo-code
const tx = getTransaction(txId);
const isValid = verifySignature(tx);
Step 3: Trace Provenance Chain
SELECT ?chain
WHERE {
?data ex:provenance ?chain .
?chain ex:signature ?sig .
}
Best Practices
Key Management
- Secure Storage: Store private keys securely
- Key Rotation: Rotate keys regularly
- Key Backup: Backup keys securely
- Key Recovery: Plan for key recovery
Signature Practices
- Always Sign: Sign all important transactions
- Verify Before Trust: Always verify signatures
- Standard Algorithms: Use standard signature algorithms
- Key Identification: Use clear key identifiers
Credential Management
- Issuer Trust: Establish issuer trust relationships
- Credential Validation: Validate credential structure
- Revocation Checking: Check revocation status
- Credential Storage: Store credentials securely
Compliance
- Audit Logging: Log all signature verifications
- Provenance Tracking: Track data provenance
- Regulatory Alignment: Align with regulations
- Documentation: Document verification processes
Comparison with Traditional Approaches
Traditional Audit Logs
Traditional Approach:
- Logs stored in database
- Can be modified by admins
- No cryptographic proof
- Difficult to verify
Problems:
- Logs can be tampered with
- No proof of authenticity
- Difficult to verify
- Not suitable for trustless systems
Fluree Verifiable Data
Fluree Approach:
- Transactions cryptographically signed
- Signatures cannot be forged
- Anyone can verify
- Suitable for trustless systems
Benefits:
- Tamper-proof history
- Cryptographic proof
- Easy verification
- Trustless data exchange
Architecture
Signature Storage
Signatures are stored with transactions:
- Transaction Metadata: Signature stored in transaction metadata
- Queryable: Signatures can be queried like any data
- Versioned: Signature history tracked over time
Verification Engine
The verification engine:
- Automatic Verification: Verifies signatures automatically
- Key Resolution: Resolves public keys from signatures
- Standard Compliance: Follows JWS and VC standards
API Integration
Verification integrated with:
- Transaction API: Verifies signatures on transaction submission
- Query API: Can query signature information
- Admin API: Administrative operations on signatures
Verifiable data makes Fluree uniquely suited for applications requiring cryptographic proof of data integrity, audit compliance, and trustless data exchange. By supporting industry-standard signature formats, Fluree enables integration with existing identity systems and credential ecosystems.
Reasoning and Inference
Fluree includes a built-in reasoning engine that can derive new facts from your data based on ontology declarations (RDFS and OWL) or user-defined rules (Datalog). This page introduces the core concepts; see Query-time reasoning for usage syntax, Datalog rules for custom rules, and the OWL & RDFS reference for a full list of supported constructs.
Why reasoning?
In a plain triple store every fact must be stated explicitly. If you assert that
Alice is a Student and that Student is a subclass of Person, a query for
all Person instances will not return Alice — unless you also assert
Alice rdf:type Person.
With reasoning enabled, Fluree can infer the missing fact automatically:
Alice rdf:type Student (asserted)
Student rdfs:subClassOf Person (schema)
────────────────────────────────────────────
Alice rdf:type Person (inferred)
This keeps your data clean (no redundant assertions) while giving your queries the full power of schema-aware retrieval.
Reasoning modes
Fluree supports four reasoning profiles that can be enabled independently or in combination. They are listed here from lightest to most powerful:
| Mode | What it does | Cost |
|---|---|---|
| RDFS | Expands rdfs:subClassOf and rdfs:subPropertyOf hierarchies so that querying for a superclass or superproperty also returns instances of its subclasses/subproperties. | Very low — query rewriting only, no materialization. |
| OWL 2 QL | Everything RDFS does, plus owl:inverseOf expansion and rdfs:domain/rdfs:range type inference via query rewriting. Based on the OWL 2 QL profile designed for query answering. | Low — query rewriting only. |
| OWL 2 RL | Forward-chaining materialization of a comprehensive rule set (symmetric, transitive, and inverse properties; functional properties; property chains; class restrictions; owl:sameAs equivalence; and more). See the OWL & RDFS reference for the full list. | Medium — derives facts before query execution; results are cached. |
| Datalog | User-defined if/then rules expressed in a familiar JSON-LD pattern syntax. Rules run in a fixpoint loop and can chain off each other or off OWL-derived facts. See Datalog rules. | Depends on the rules — can be lightweight or heavy. |
Combining modes
Modes can be combined freely. For example, ["rdfs", "owl2rl", "datalog"]
first materializes OWL 2 RL entailments, then runs your Datalog rules over the
combined base + OWL-derived data, and finally applies RDFS query rewriting on
top. This layering lets you start simple (RDFS) and add more powerful inference
only where you need it.
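For instance, if the per-query reasoning key (described under "Per-query override" below) accepts the same list form, a layered request might look like this sketch:
{
  "select": ["?s"],
  "where": {"@id": "?s", "@type": "ex:Person"},
  "reasoning": ["rdfs", "owl2rl", "datalog"]
}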
How it works
Fluree uses two complementary techniques depending on the mode:
Query rewriting (RDFS, OWL 2 QL)
The query planner rewrites your patterns at compile time. For example, a
?x rdf:type ex:Person pattern is expanded into a UNION over Person and all
of its subclasses. No extra data is stored; the rewriting is transparent to the
caller.
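Conceptually, with ex:Student and ex:GradStudent declared as subclasses of ex:Person, the pattern behaves as if the query had been written with an explicit UNION (a sketch of the idea, not the literal plan output):
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex: <http://example.org/>
SELECT ?x WHERE {
  { ?x rdf:type ex:Person }
  UNION { ?x rdf:type ex:Student }
  UNION { ?x rdf:type ex:GradStudent }
}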
Forward-chaining materialization (OWL 2 RL, Datalog)
Before your query runs, the engine:
- Loads the ontology — extracts OWL/RDFS declarations (property types, class hierarchies, restrictions) from your data.
- Applies rules in a fixpoint loop — each iteration derives new facts from the combination of asserted and previously-derived facts. The loop stops when no new facts are produced (fixpoint) or a budget limit is reached.
- Overlays derived facts — the inferred triples are layered on top of your base data as a read-only overlay. Your original data is never modified.
- Caches the result — if the same database state is queried again with the same reasoning modes, the cached materialization is reused instantly.
Budget controls
To guarantee termination, materialization enforces configurable limits:
| Limit | Default | What happens when exceeded |
|---|---|---|
| Time | 30 seconds | Materialization stops; partial results used |
| Derived facts | 1,000,000 | Materialization stops; partial results used |
| Memory | 100 MB | Materialization stops; partial results used |
When a budget is exceeded the query still runs — it simply uses whatever facts were derived before the limit was hit. Diagnostics are available via tracing spans to identify when capping occurs.
Enabling reasoning
There are two levels of control:
1. Ledger-wide defaults (configuration graph)
Set reasoning defaults so every query against a ledger uses a particular mode without having to specify it each time:
{
"@context": {"f": "https://ns.flur.ee/db#"},
"insert": {
"@id": "urn:fluree:mydb:main:config:ledger",
"@type": "f:LedgerConfig",
"f:reasoningDefaults": {
"f:reasoningModes": {"@id": "f:RDFS"},
"f:overrideControl": {"@id": "f:OverrideAll"}
}
}
}
See Setting groups — reasoningDefaults for full configuration options.
2. Per-query override
Any query can specify or override the reasoning mode:
{
"select": ["?s"],
"where": {"@id": "?s", "@type": "ex:Person"},
"reasoning": "rdfs"
}
Use "reasoning": "none" to explicitly disable reasoning for a single query,
even if the ledger has defaults configured.
See Query-time reasoning for complete syntax and examples.
Key concepts
Schema as data
Unlike systems with external schema files, Fluree stores ontology declarations
as regular triples in your graph. An rdfs:subClassOf assertion is just another
triple — you add it via a normal transaction:
{
"@context": {
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"ex": "http://example.org/"
},
"insert": {
"@id": "ex:Student",
"rdfs:subClassOf": {"@id": "ex:Person"}
}
}
This means your schema evolves with your data, is time-travelable, and is subject to the same policy controls as any other data.
Derived facts are virtual
Inferred triples exist only in a query-time overlay — they are never written to storage. This means:
- No storage bloat — you don’t pay disk costs for derived facts.
- Always consistent — derived facts are recomputed from the current state, so they can never go stale.
- Time-travel safe — querying a historical point in time materializes based on that point’s data and schema.
owl:sameAs and identity
When OWL 2 RL is enabled, the engine tracks owl:sameAs equivalences using an
efficient union-find data structure. If two resources are determined to be the
same (via functional properties, inverse functional properties, or owl:hasKey),
all their facts are merged under a canonical representative. Queries
transparently resolve through these equivalences.
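As a sketch of one way an equivalence can arise (ex:alice2 and the inverse-functional ex:email property are illustrative, not part of the earlier examples):
@prefix ex: <http://example.org/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
# Declaring ex:email inverse functional means two subjects sharing a value denote the same individual
ex:email a owl:InverseFunctionalProperty .
ex:alice  ex:email "alice@example.com" .
ex:alice2 ex:email "alice@example.com" .
# With OWL 2 RL enabled, ex:alice owl:sameAs ex:alice2 is derived, and queries against
# either IRI see the merged set of facts.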
What to read next
| Topic | Page |
|---|---|
| Using reasoning in queries | Query-time reasoning |
| Writing custom inference rules | Datalog rules |
| Full list of supported OWL & RDFS constructs | OWL & RDFS reference |
| Configuring ledger-wide defaults | Setting groups |
Guides
Practical, task-oriented cookbooks for Fluree’s key features. Each guide shows working patterns you can adapt to your use case.
If you’re new to Fluree, start with the Getting Started section first.
Cookbooks
Full-Text and Vector Search
Set up BM25 full-text search and vector similarity. Insert searchable data, write relevance-ranked queries, combine search with graph patterns, and build hybrid text+vector search.
Time Travel
Practical patterns for temporal queries: audit trails, point-in-time comparison, compliance snapshots, recovering deleted data, and transaction metadata.
Branching and Merging
Git-like workflows for data: safe experimentation, review-before-merge, multi-environment setups, feature branches, and rebase strategies.
Access Control Policies
Set up fine-grained access control: department isolation, role-based access, property redaction, multi-tenant isolation, and default-deny patterns.
SHACL Validation
Define data quality constraints: required properties, datatype validation, value ranges, string patterns, cardinality, and allowed values.
Cookbook: Full-Text and Vector Search
Fluree integrates BM25 full-text search and vector similarity directly into the query engine. Search results participate in joins, filters, and aggregations like any other graph pattern — no external search service needed.
This guide covers practical patterns for both approaches.
Quick start: full-text search
1. Insert searchable data
Annotate string values with @fulltext to make them searchable:
fluree insert '{
"@context": {"ex": "http://example.org/"},
"@graph": [
{
"@id": "ex:doc1",
"@type": "ex:Article",
"ex:title": "Introduction to Graph Databases",
"ex:body": {
"@value": "Graph databases model data as nodes and edges, making relationship queries fast and intuitive. Unlike relational databases, graph databases traverse relationships without expensive joins.",
"@type": "@fulltext"
}
},
{
"@id": "ex:doc2",
"@type": "ex:Article",
"ex:title": "Time Series vs Graph: When to Use Which",
"ex:body": {
"@value": "Time series databases excel at ordered, append-only data. Graph databases shine when relationships between entities matter more than temporal ordering.",
"@type": "@fulltext"
}
},
{
"@id": "ex:doc3",
"@type": "ex:Article",
"ex:title": "Building REST APIs with Rust",
"ex:body": {
"@value": "Rust provides memory safety without garbage collection, making it ideal for high-performance API servers. Popular frameworks include Actix and Axum.",
"@type": "@fulltext"
}
}
]
}'
In Turtle, use ^^f:fullText:
fluree insert '
@prefix ex: <http://example.org/> .
@prefix f: <https://ns.flur.ee/db#> .
ex:doc4 a ex:Article ;
ex:title "SPARQL Query Optimization" ;
ex:body "Optimizing SPARQL queries requires understanding triple patterns, join ordering, and index selection. The query planner reorders patterns based on estimated cardinality."^^f:fullText .
'
2. Search with relevance scoring
fluree query '{
"@context": {"ex": "http://example.org/"},
"select": ["?title", "?score"],
"where": [
{"@id": "?doc", "@type": "ex:Article", "ex:body": "?body", "ex:title": "?title"},
["bind", "?score", "(fulltext ?body \"graph database relationships\")"],
["filter", "(> ?score 0)"]
],
"orderBy": [["desc", "?score"]],
"limit": 10
}'
The fulltext() function returns a BM25 relevance score. Higher scores mean better matches. Documents with none of the search terms score 0.
3. Combine search with graph filters
Search only within a specific category:
fluree query '{
"@context": {"ex": "http://example.org/"},
"select": ["?title", "?score"],
"where": [
{
"@id": "?doc", "@type": "ex:Article",
"ex:body": "?body", "ex:title": "?title",
"ex:category": "databases"
},
["bind", "?score", "(fulltext ?body \"query optimization\")"],
["filter", "(> ?score 0)"]
],
"orderBy": [["desc", "?score"]]
}'
Place graph filters before the fulltext() bind to reduce the number of documents scored.
Patterns
Search across multiple properties
If both title and body are @fulltext, score them separately and combine:
fluree query '{
"@context": {"ex": "http://example.org/"},
"select": ["?title", "?combined"],
"where": [
{"@id": "?doc", "ex:ftTitle": "?ft", "ex:body": "?body", "ex:title": "?title"},
["bind", "?titleScore", "(fulltext ?ft \"graph databases\")"],
["bind", "?bodyScore", "(fulltext ?body \"graph databases\")"],
["bind", "?combined", "(+ (* ?titleScore 2.0) ?bodyScore)"],
["filter", "(> ?combined 0)"]
],
"orderBy": [["desc", "?combined"]]
}'
This weights title matches 2x higher than body matches.
Search with time travel
Search the knowledge base as it existed at a previous point in time:
fluree query '{
"@context": {"ex": "http://example.org/"},
"from": "mydb:main@t:5",
"select": ["?title", "?score"],
"where": [
{"@id": "?doc", "ex:body": "?body", "ex:title": "?title"},
["bind", "?score", "(fulltext ?body \"deployment\")"],
["filter", "(> ?score 0)"]
],
"orderBy": [["desc", "?score"]]
}'
Search with aggregation
Count matches by category:
fluree query '{
"@context": {"ex": "http://example.org/"},
"select": ["?category", "?count"],
"where": [
{"@id": "?doc", "ex:body": "?body", "ex:category": "?category"},
["bind", "?score", "(fulltext ?body \"database\")"],
["filter", "(> ?score 0)"]
],
"groupBy": "?category",
"aggregate": {"?count": ["count", "?doc"]}
}'
Quick start: vector search
1. Insert vector embeddings
Annotate arrays with @vector:
fluree insert '{
"@context": {"ex": "http://example.org/"},
"@graph": [
{
"@id": "ex:product1",
"@type": "ex:Product",
"ex:name": "Wireless Headphones",
"ex:embedding": {"@value": [0.82, 0.15, 0.91, 0.23], "@type": "@vector"}
},
{
"@id": "ex:product2",
"@type": "ex:Product",
"ex:name": "Bluetooth Speaker",
"ex:embedding": {"@value": [0.78, 0.12, 0.88, 0.31], "@type": "@vector"}
},
{
"@id": "ex:product3",
"@type": "ex:Product",
"ex:name": "Running Shoes",
"ex:embedding": {"@value": [0.11, 0.95, 0.05, 0.87], "@type": "@vector"}
}
]
}'
Vectors are stored as f32. Values are quantized at ingest time.
2. Find similar items
Use cosineSimilarity (or dotProduct, euclideanDistance) to rank by similarity:
fluree query '{
"@context": {"ex": "http://example.org/"},
"select": ["?name", "?sim"],
"where": [
{"@id": "?product", "@type": "ex:Product", "ex:name": "?name", "ex:embedding": "?vec"},
["bind", "?sim", "(cosineSimilarity ?vec [0.80, 0.14, 0.90, 0.25])"],
["filter", "(> ?sim 0.9)"]
],
"orderBy": [["desc", "?sim"]],
"limit": 5
}'
3. Combine vector search with graph patterns
Find products similar to a query vector, but only in a specific category:
fluree query '{
"@context": {"ex": "http://example.org/"},
"select": ["?name", "?sim"],
"where": [
{
"@id": "?product", "@type": "ex:Product",
"ex:name": "?name", "ex:embedding": "?vec",
"ex:category": "electronics"
},
["bind", "?sim", "(cosineSimilarity ?vec [0.80, 0.14, 0.90, 0.25])"]
],
"orderBy": [["desc", "?sim"]],
"limit": 10
}'
Hybrid search: text + vector
Combine BM25 keyword relevance with vector semantic similarity for the best of both:
fluree query '{
"@context": {"ex": "http://example.org/"},
"select": ["?name", "?hybrid"],
"where": [
{
"@id": "?doc", "ex:name": "?name",
"ex:description": "?desc", "ex:embedding": "?vec"
},
["bind", "?textScore", "(fulltext ?desc \"wireless audio\")"],
["bind", "?vecScore", "(cosineSimilarity ?vec [0.80, 0.14, 0.90, 0.25])"],
["bind", "?hybrid", "(+ (* ?textScore 0.4) (* ?vecScore 0.6))"],
["filter", "(> ?hybrid 0)"]
],
"orderBy": [["desc", "?hybrid"]],
"limit": 10
}'
Adjust the weights (0.4 text, 0.6 vector) based on your use case. Keyword search is better for exact term matching; vector search is better for semantic similarity.
When to use which
| Approach | Best for | Scale |
|---|---|---|
| Inline @fulltext | Keyword search, document ranking | Up to ~500K documents per property |
| BM25 graph source | Large-scale text search with WAND pruning | 1M+ documents |
| Inline @vector + similarity | Small-to-medium similarity search | Up to ~100K vectors |
| HNSW index | Large-scale approximate nearest neighbor | 100K+ vectors |
Performance tips
- Place graph filters before search — Reduce the candidate set before scoring
- Use limit — BM25 and similarity scoring are per-document operations
- Wait for indexing — Inline @fulltext works without an index (novelty fallback) but is 7x faster with a built index
- Choose the right scale — Inline functions work well up to hundreds of thousands of documents. For millions, use the dedicated graph source pipeline
Related documentation
- Inline Fulltext Search — @fulltext datatype reference
- BM25 Graph Source — Large-scale text search pipeline
- Vector Search — @vector datatype and HNSW indexes
- JSON-LD Query — Full query language reference
Cookbook: Time Travel
Every transaction in Fluree is immutable. The database preserves complete history automatically — no audit tables, no trigger-based logging, no slowly-changing dimensions. This guide covers practical patterns for using time travel.
Basics
Query by transaction number
Every transaction increments a counter (t). Query data as it was after any transaction:
# Current state
fluree query 'SELECT ?name ?salary WHERE { ?p schema:name ?name ; ex:salary ?salary }'
# State after transaction 5
fluree query --at 5 'SELECT ?name ?salary WHERE { ?p schema:name ?name ; ex:salary ?salary }'
# State after the very first transaction
fluree query --at 1 'SELECT ?s ?p ?o WHERE { ?s ?p ?o }'
Query by ISO timestamp
Use a timestamp to query the state at a specific moment:
fluree query --at 2025-01-15T00:00:00Z \
'SELECT ?name ?email WHERE { ?p schema:name ?name ; schema:email ?email }'
Fluree finds the most recent transaction at or before the given timestamp.
Query by commit ID
Every commit has a content-addressed ID (CID). Query by exact commit:
fluree query --at bafyreif... \
'SELECT ?s ?p ?o WHERE { ?s ?p ?o }'
HTTP API
# By transaction number
curl -X POST 'http://localhost:8090/v1/fluree/query?ledger=mydb:main&t=5' \
-H "Content-Type: application/sparql-query" \
-d 'SELECT ?s ?p ?o WHERE { ?s ?p ?o }'
# By timestamp (URL-encoded)
curl -X POST 'http://localhost:8090/v1/fluree/query?ledger=mydb:main&t=2025-01-15T00%3A00%3A00Z' \
-H "Content-Type: application/sparql-query" \
-d 'SELECT ?s ?p ?o WHERE { ?s ?p ?o }'
JSON-LD query with time specifier
{
"from": "mydb:main@t:5",
"select": ["?name"],
"where": [{"@id": "?p", "schema:name": "?name"}]
}
Patterns
Audit trail: who changed what
View the history of changes to a specific entity:
fluree history 'PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>
SELECT ?prop ?value ?t ?op WHERE {
ex:alice ?prop ?value .
}'
Each result includes:
- ?t — the transaction number
- ?op — assert (added) or retract (removed)
Point-in-time comparison
Compare an entity before and after a change:
# Before the change (t=5)
fluree query --at 5 'SELECT ?salary WHERE { ex:alice ex:salary ?salary }'
# After the change (t=6)
fluree query --at 6 'SELECT ?salary WHERE { ex:alice ex:salary ?salary }'
Find when a value changed
Track salary history:
fluree history 'SELECT ?salary ?t ?op WHERE { ex:alice ex:salary ?salary }'
Output:
?salary ?t ?op
85000 1 assert ← Initial salary
85000 4 retract ← Old value removed
95000 4 assert ← New value added
95000 7 retract
110000 7 assert ← Another raise
Each update produces a retract/assert pair at the same t.
Compliance snapshot
Generate a report of all data as it existed on a specific date:
fluree query --at 2025-06-30T23:59:59Z --format csv \
'PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>
SELECT ?name ?department ?role
WHERE {
?person a schema:Person ;
schema:name ?name ;
ex:department ?department ;
ex:role ?role .
}
ORDER BY ?department ?name' > compliance-report-2025-Q2.csv
This is a reproducible snapshot — running the same query with the same timestamp always returns the same results.
Debugging: find what changed between two points
Compare entity states across a range:
# What was added or removed between t=10 and t=15?
fluree history 'SELECT ?s ?p ?o ?t ?op WHERE {
?s ?p ?o .
FILTER(?t >= 10 && ?t <= 15)
}'
Recover deleted data
Data that was retracted still exists in history:
# Carol was deleted at t=8. Recover her data from t=7:
fluree query --at 7 'SELECT ?prop ?value WHERE { ex:carol ?prop ?value }'
To restore, simply re-insert the data from the historical query.
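For example, if the t=7 query showed Carol's name and department, re-inserting them is a normal transaction (the property values here are illustrative):
fluree insert '{
  "@context": {"ex": "http://example.org/", "schema": "http://schema.org/"},
  "@graph": [
    {
      "@id": "ex:carol",
      "schema:name": "Carol White",
      "ex:department": "marketing"
    }
  ]
}'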
Multi-ledger time travel
Query two ledgers at different points in time:
{
"from": {
"products": {"ledger": "catalog:main", "t": 10},
"orders": {"ledger": "orders:main", "t": 25}
},
"select": ["?product", "?price", "?qty"],
"where": [
{"@id": "?order", "ex:product": "?p", "ex:quantity": "?qty", "@graph": "orders"},
{"@id": "?p", "schema:name": "?product", "schema:price": "?price", "@graph": "products"}
]
}
This joins product data from t=10 with order data from t=25 — useful for price-at-time-of-purchase analysis.
Temporal aggregation
Track how a metric changed over time:
fluree history 'SELECT ?count ?t ?op WHERE {
ex:dashboard ex:activeUsers ?count
}'
Transaction metadata
Every commit records metadata. Query it via the txn-meta graph:
PREFIX f: <https://ns.flur.ee/db#>
SELECT ?t ?timestamp ?author
FROM <urn:fluree:knowledge-base:main#txn-meta>
WHERE {
?commit f:t ?t ;
f:time ?timestamp .
OPTIONAL { ?commit f:author ?author }
}
ORDER BY DESC(?t)
LIMIT 10
Common questions
Is time travel expensive? No. Querying a historical state uses the same indexes as querying the current state. The cost is O(log n) for index lookups.
Does old data use extra storage? Yes — immutability means retracted values are preserved. Storage grows with the number of changes, not just the current state size. For most workloads this is negligible.
Can I query “between” two points?
History queries return all changes with their transaction numbers. Use FILTER on ?t to scope to a range.
Can I delete history? No. Immutability is a core guarantee. If you need to remove data for compliance (e.g., GDPR right to erasure), contact the Fluree team about data compaction options.
Related documentation
- Time Travel Concepts — Architecture and design
- SPARQL Reference — History query syntax
- JSON-LD Query — Time specifiers in JSON-LD
- Commit Receipts — Transaction metadata
Cookbook: Branching and Merging
Fluree lets you fork a ledger into independent branches, each with its own commit history. Experiment freely, then merge changes back when ready. Think of it like git branch for your data.
Quick start
# Create a branch from main
fluree branch create experiment
# Switch to the branch
fluree use mydb:experiment
# Make changes (only on the branch)
fluree insert '...'
fluree update '...'
# See both branches
fluree branch list
# Merge back into main
fluree branch merge experiment
# Clean up
fluree branch drop experiment
Core concepts
- Branches are isolated — Transactions on one branch are invisible to others
- Branches are cheap — Creating a branch doesn’t copy data; it creates a new commit pointer
- Merge is fast-forward — The target branch must not have diverged. If it has, rebase first
- Source branch survives merge — After merging, the branch can continue receiving transactions
Patterns
Safe experimentation
Try a risky change without affecting production:
fluree branch create try-new-schema
fluree use mydb:try-new-schema
# Restructure data
fluree update 'PREFIX ex: <http://example.org/>
DELETE { ?doc ex:category ?cat }
INSERT { ?doc ex:tags ?cat }
WHERE { ?doc ex:category ?cat }'
# Verify the change looks right
fluree query 'SELECT ?doc ?tag WHERE { ?doc ex:tags ?tag }'
# If it works, merge back
fluree branch merge try-new-schema
fluree branch drop try-new-schema
# If it doesn't work, just drop the branch — main is untouched
fluree branch drop try-new-schema
Review before merge
Use branches as a staging area for data changes:
# Data engineer creates a branch for the weekly import
fluree branch create weekly-import
fluree use mydb:weekly-import
# Import new data
fluree insert -f new-data.ttl
# Verify: count new entities
fluree query 'SELECT (COUNT(?s) AS ?count) WHERE { ?s a ex:NewRecord }'
# Verify: no duplicates
fluree query 'SELECT ?id (COUNT(?s) AS ?count) WHERE {
?s ex:externalId ?id
} GROUP BY ?id HAVING(?count > 1)'
# Looks good — merge into main
fluree branch merge weekly-import
Multi-environment workflow
Use branches to model dev/staging/prod environments within a single ledger:
# Create environment branches
fluree branch create staging
fluree branch create dev --from staging
# Developers work on dev
fluree use mydb:dev
fluree insert '...'
# Promote to staging via merge
fluree branch merge dev --target staging
# Promote to main (production) after testing
fluree use mydb:staging
# ... run validation queries ...
fluree branch merge staging
Feature branches
Multiple people can work on different features simultaneously:
# Team member A: add product categories
fluree branch create feature-categories
# Team member B: update pricing
fluree branch create feature-pricing
# Each works independently on their branch
# ...
# Merge sequentially — first one is a fast-forward
fluree branch merge feature-categories
# Second one may need rebase if main advanced
fluree branch rebase feature-pricing
fluree branch merge feature-pricing
Rebase to catch up with upstream
When main has advanced since you branched:
# Main has new commits that your branch doesn't have
fluree branch rebase my-branch
This replays your branch’s commits on top of main’s current HEAD. Conflict strategies:
| Strategy | Behavior |
|---|---|
| take-both (default) | Keep both the source and branch changes |
| abort | Stop if any conflicts — let you inspect |
| take-source | Source (main) wins on conflict |
| take-branch | Branch wins on conflict |
| skip | Skip conflicting commits entirely |
# Rebase with abort on conflict for manual review
fluree branch rebase my-branch --strategy abort
# Rebase where main always wins
fluree branch rebase my-branch --strategy take-source
Compare branches
See what’s different between two branches:
# Query branch for entities not in main
fluree query --ledger mydb:my-branch 'SELECT ?s ?p ?o WHERE {
?s ?p ?o .
FILTER NOT EXISTS {
SERVICE <fluree:ledger:mydb:main> { ?s ?p ?o }
}
}'
Time travel across branches
Each branch has its own transaction history. Query any branch at any point in time:
# Branch state after its 3rd transaction
fluree query --ledger mydb:experiment --at 3 'SELECT ?s ?p ?o WHERE { ?s ?p ?o }'
Branch at a historical point
By default, branch create starts the new branch at the source’s current HEAD. Pass --at to start it at an earlier commit on the source branch instead — useful for recovering to a known-good state, forking off an older release, or experimenting with what-if scenarios from a past point in time.
# Start a branch at transaction 5 on main
fluree branch create rewind --at t:5
# Or use a hex-digest prefix of the commit
fluree branch create rewind --at 3dd028a7
The commit must be reachable from the source branch’s HEAD (branching from an unrelated branch’s commit is rejected). The new branch starts with no index and replays from genesis on first query — acceptable for small/medium histories; if replay cost matters, transact a small no-op to force an index rebuild.
Full CIDs are also accepted (--at fluree:commit:sha256:...) and resolve without requiring the source to be indexed; t:N and hex prefixes require an indexed source.
Branch lifecycle
create ──→ transact ──→ rebase (if needed) ──→ merge ──→ drop
↑ │
└──────── continue working ←───────┘
After merging, the branch is still alive. You can:
- Continue transacting on it (for ongoing work)
- Merge again later (only new commits since last merge are copied)
- Drop it when done
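Putting that together, a continued-work cycle on an already-merged branch might look like this sketch (the branch and file names are illustrative; the commands are the same ones used earlier in this guide):
# Keep transacting on the merged branch
fluree use mydb:weekly-import
fluree insert -f more-data.ttl
# Merge only the new commits into main when ready
fluree branch merge weekly-import
# Drop the branch once the work is finished
fluree branch drop weekly-import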
HTTP API
# Create a branch
curl -X POST http://localhost:8090/v1/fluree/branch \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb", "branch": "dev", "source": "main"}'
# Branch at a historical commit
curl -X POST http://localhost:8090/v1/fluree/branch \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb", "branch": "rewind", "at": "t:5"}'
# Query a specific branch
curl -X POST 'http://localhost:8090/v1/fluree/query?ledger=mydb:dev' \
-H "Content-Type: application/sparql-query" \
-d 'SELECT ?s ?p ?o WHERE { ?s ?p ?o }'
# Merge
curl -X POST http://localhost:8090/v1/fluree/merge \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb", "source": "dev"}'
Best practices
- Name branches descriptively — weekly-import-2025-04, feature-product-tags, not test1
- Keep branches short-lived — Long-lived branches diverge more, making rebase harder
- Merge frequently — Small, frequent merges are easier than large, infrequent ones
- Test before merging — Run validation queries on the branch before promoting
- Drop after merging — Clean up branches you’re done with
Related documentation
- CLI: branch — Full command reference
- Ledgers and Nameservice — Branch architecture
- Time Travel — Temporal queries on branches
Cookbook: Access Control Policies
Fluree policies enforce access control inside the database — individual facts (flakes) are filtered based on the requesting identity. The same query returns different results for different users, automatically. No application-layer filtering needed.
This cookbook walks through the common patterns. For the underlying model see Policy enforcement; for the full reference see Policy model and inputs.
How a policy is shaped
Every policy is a JSON-LD node typed f:AccessPolicy. It has three orthogonal pieces:
| Field | Purpose |
|---|---|
| What it targets | f:onProperty, f:onClass, f:onSubject (any combination, each an array of @id references). Omit all three to make a default policy that applies to every flake. |
| What it governs | f:action — f:view (queries), f:modify (transactions), or both. |
| Whether it permits | Either f:allow: true (unconditional allow), f:allow: false (deny), or f:query: "<JSON-encoded WHERE>" (allow when the embedded query returns at least one binding for the target). |
Two more knobs:
- f:required: true — the policy must allow for access to be granted on its targets, even if default-allow is true. Use it for hard constraints.
- f:exMessage — error message returned to the caller when the policy denies a transaction.
Inside f:query, two special variables are pre-bound: ?$this (the entity being checked) and ?$identity (the requesting identity, supplied via policy-values).
Quick start
1. Insert sample data
fluree insert '{
"@context": {
"schema": "http://schema.org/",
"ex": "http://example.org/"
},
"@graph": [
{
"@id": "ex:alice",
"@type": "schema:Person",
"schema:name": "Alice Chen",
"ex:role": "engineer",
"ex:department": "platform",
"ex:salary": 130000
},
{
"@id": "ex:bob",
"@type": "schema:Person",
"schema:name": "Bob Martinez",
"ex:role": "manager",
"ex:department": "platform",
"ex:salary": 155000
},
{
"@id": "ex:carol",
"@type": "schema:Person",
"schema:name": "Carol White",
"ex:role": "engineer",
"ex:department": "marketing",
"ex:salary": 115000
}
]
}'
Add identity records that link DIDs / users to the entities they represent:
fluree insert '{
"@context": {"ex": "http://example.org/", "f": "https://ns.flur.ee/db#"},
"@graph": [
{ "@id": "ex:aliceIdentity", "ex:user": {"@id": "ex:alice"},
"f:policyClass": [{"@id": "ex:CorpPolicy"}] },
{ "@id": "ex:bobIdentity", "ex:user": {"@id": "ex:bob"},
"f:policyClass": [{"@id": "ex:CorpPolicy"}] }
]
}'
f:policyClass tags an identity with the set of policy classes that apply to it — every stored policy of that class will be loaded automatically when this identity makes a request.
2. Insert policies
Policies are data — they go into the ledger like any other graph:
fluree insert '{
"@context": {
"f": "https://ns.flur.ee/db#",
"ex": "http://example.org/",
"schema": "http://schema.org/"
},
"@graph": [
{
"@id": "ex:salary-restriction",
"@type": ["f:AccessPolicy", "ex:CorpPolicy"],
"f:required": true,
"f:onProperty": [{"@id": "ex:salary"}],
"f:action": [{"@id": "f:view"}],
"f:query": "{\"where\": {\"@id\": \"?$identity\", \"http://example.org/user\": {\"@id\": \"?$subject\"}, \"http://example.org/role\": \"manager\", \"http://example.org/department\": \"?dept\"}, \"$where\": {\"@id\": \"?$this\", \"http://example.org/department\": \"?dept\"}}"
},
{
"@id": "ex:default-view",
"@type": ["f:AccessPolicy", "ex:CorpPolicy"],
"f:action": [{"@id": "f:view"}],
"f:allow": true
}
]
}'
What this set of two policies says:
- ex:salary-restriction is required for ex:salary: a request can read ex:salary only when f:query returns a binding. The query says: given the identity, find the user it represents; if that user is a manager in the same department as the entity being viewed (?$this), allow.
- ex:default-view allows reading everything else.
f:query is stored as a JSON string inside the policy because RDF can’t hold structured JSON natively. When loaded, the engine parses it and runs it as a subquery with ?$this and ?$identity pre-bound.
3. Query as different identities
As Alice (engineer in platform — no manager privilege):
fluree query '{
"@context": {"schema": "http://schema.org/", "ex": "http://example.org/"},
"from": "mydb:main",
"select": ["?name", "?salary"],
"where": [
{"@id": "?p", "schema:name": "?name"},
["optional", {"@id": "?p", "ex:salary": "?salary"}]
],
"opts": {
"identity": "ex:aliceIdentity",
"policy-class": ["ex:CorpPolicy"],
"default-allow": false
}
}'
Alice sees every name but no salaries — the required policy denies ex:salary because she isn’t a manager.
As Bob (manager in platform):
Same query, but "identity": "ex:bobIdentity". Bob sees salaries for Alice and Bob (same department) but Carol’s salary stays hidden — different department.
Inline policies (no insert needed)
Don’t want to commit policies to the ledger yet? Pass them inline via opts.policy:
{
"from": "mydb:main",
"select": "?name",
"where": [{"@id": "?p", "schema:name": "?name"}],
"opts": {
"policy": [
{
"@id": "ex:adhoc-allow",
"@type": "f:AccessPolicy",
"f:action": "f:view",
"f:allow": true
}
],
"default-allow": false
}
}
Inline policies are useful for one-off queries, automated tests, and admin scripts. Stored policies (with policy-class) are the right approach for production access control because they’re versioned, time-travelable, and consistent across all requests.
Patterns
Public read
{
"@id": "ex:public-read",
"@type": "f:AccessPolicy",
"f:action": [{"@id": "f:view"}],
"f:allow": true
}
A default-allow policy with no targeting applies to every flake.
Owner-only access
{
"@id": "ex:owner-only",
"@type": ["f:AccessPolicy", "ex:CorpPolicy"],
"f:required": true,
"f:action": [{"@id": "f:view"}, {"@id": "f:modify"}],
"f:query": "{\"where\": {\"@id\": \"?$identity\", \"http://example.org/user\": {\"@id\": \"?$user\"}}, \"$where\": {\"@id\": \"?$this\", \"http://example.org/owner\": {\"@id\": \"?$user\"}}}"
}
The query resolves ?$identity → user, then checks that ?$this (the entity being read or written) has that user as its ex:owner.
Property redaction (hide a property unless permitted)
[
{
"@id": "ex:hide-ssn",
"@type": ["f:AccessPolicy", "ex:CorpPolicy"],
"f:required": true,
"f:onProperty": [{"@id": "ex:ssn"}],
"f:action": [{"@id": "f:view"}],
"f:query": "{\"where\": {\"@id\": \"?$identity\", \"http://example.org/role\": \"hr\"}}"
},
{
"@id": "ex:default-view",
"@type": ["f:AccessPolicy", "ex:CorpPolicy"],
"f:action": [{"@id": "f:view"}],
"f:allow": true
}
]
f:onProperty scopes the restriction to ex:ssn only — every other property still falls under ex:default-view. f:required: true means the SSN policy MUST allow for any SSN flake to be visible (the default allow doesn’t override it on this property).
Class-scoped restriction
{
"@id": "ex:employee-only",
"@type": ["f:AccessPolicy", "ex:CorpPolicy"],
"f:required": true,
"f:onClass": [{"@id": "ex:Employee"}],
"f:action": [{"@id": "f:view"}],
"f:query": "{\"where\": {\"@id\": \"?$identity\", \"@type\": \"http://example.org/Employee\"}}"
}
Anyone querying for ex:Employee instances must themselves be tagged as an employee.
Multi-tenant isolation
{
"@id": "ex:tenant-isolation",
"@type": ["f:AccessPolicy", "ex:CorpPolicy"],
"f:required": true,
"f:action": [{"@id": "f:view"}, {"@id": "f:modify"}],
"f:query": "{\"where\": {\"@id\": \"?$identity\", \"http://example.org/tenant\": \"?tenant\"}, \"$where\": {\"@id\": \"?$this\", \"http://example.org/tenant\": \"?tenant\"}}"
}
Each tenant only sees and writes data tagged with their own ex:tenant. Because the policy is required and has no targeting, it applies to every flake.
Hierarchical access (manager sees direct reports)
{
"@id": "ex:manager-sees-reports",
"@type": ["f:AccessPolicy", "ex:CorpPolicy"],
"f:onClass": [{"@id": "schema:Person"}],
"f:action": [{"@id": "f:view"}],
"f:query": "{\"where\": {\"@id\": \"?$identity\", \"http://example.org/user\": {\"@id\": \"?$mgr\"}}, \"$where\": {\"@id\": \"?$this\", \"http://example.org/reportsTo\": {\"@id\": \"?$mgr\"}}}"
}
Write protection
{
"@id": "ex:no-direct-writes",
"@type": ["f:AccessPolicy", "ex:CorpPolicy"],
"f:required": true,
"f:onProperty": [{"@id": "ex:approved"}],
"f:action": [{"@id": "f:modify"}],
"f:exMessage": "ex:approved is set by the workflow service only.",
"f:query": "{\"where\": {\"@id\": \"?$identity\", \"@type\": \"http://example.org/WorkflowService\"}}"
}
When the policy denies a transaction, f:exMessage is returned to the client.
Combining algorithm
When multiple policies match a flake:
- A required policy must allow. If any required policy denies (or returns no f:query bindings), access is denied.
- If no required policy applies, any allow is enough — Fluree uses allow-overrides over the non-required set.
- If no policy applies, the request falls back to default-allow. Setting default-allow: false is the fail-closed default for production.
See Policy model and inputs for the full state diagram.
Invoking policies via HTTP
Policies are passed via opts on JSON-LD requests, and via headers on SPARQL requests.
JSON-LD
curl -X POST 'http://localhost:8090/v1/fluree/query?ledger=mydb:main' \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer $JWT" \
-d '{
"from": "mydb:main",
"select": "?name",
"where": [{"@id": "?p", "schema:name": "?name"}],
"opts": {
"identity": "ex:aliceIdentity",
"policy-class": ["ex:CorpPolicy"],
"default-allow": false
}
}'
SPARQL (headers — no opts block in SPARQL)
curl -X POST 'http://localhost:8090/v1/fluree/query?ledger=mydb:main' \
-H 'Content-Type: application/sparql-query' \
-H "Authorization: Bearer $JWT" \
-H 'fluree-identity: ex:aliceIdentity' \
-H 'fluree-policy-class: ex:CorpPolicy' \
-H 'fluree-default-allow: false' \
-d 'SELECT ?name WHERE { ?p <http://schema.org/name> ?name }'
| Header | JSON-LD opts field | Value |
|---|---|---|
| fluree-identity | identity | IRI of an identity entity |
| fluree-policy-class | policy-class | Comma-separated or repeated header; matches f:policyClass on stored policies |
| fluree-policy-values | policy-values | JSON object — extra ?$var bindings for policy queries |
| fluree-policy | policy | Inline JSON-LD policy array |
| fluree-default-allow | default-allow | true / false |
When the bearer token is verified and the server is configured with data_auth_default_policy_class, the verified identity is auto-applied to policy-values and the configured class to policy-class. See Configuration for those server-side settings.
Policies are data
Because policies live as flakes in the ledger:
- Time-travel — query at any past t to see the policies in effect then.
- Audit — SELECT ?p ?action WHERE { ?p a f:AccessPolicy ; f:action ?action }.
- Versionable — change policies through normal transactions; full history kept.
- Branchable — try new policies on a branch before merging to main.
Best practices
- Start with default-allow: false and required policies. Fail-closed is easier to reason about than fail-open.
- Tag every stored policy with a class (e.g. ex:CorpPolicy) and tag every identity with f:policyClass. Pass policy-class at query time — Fluree pulls in the matching policy set automatically.
- Use f:onProperty / f:onClass / f:onSubject aggressively. A targeted policy is cheaper to evaluate than a default policy, because Fluree can short-circuit during flake filtering.
- Keep f:query simple. It runs once per flake-target during evaluation. Lean on tagged identity properties (@type, f:policyClass, role flags) rather than deep traversals.
- Test with multiple identities. Verify the same query returns the right shape for each role.
- Document intent. Add rdfs:label and rdfs:comment to your policy nodes so audits are readable.
Related documentation
- Policy enforcement (concepts) — model and architecture
- Policy model and inputs — full reference
- Policy in queries — query-time enforcement details
- Policy in transactions — transaction-time enforcement
- Programmatic policy API (Rust) — building policy contexts in code
- Authentication — identity, JWTs, and bearer tokens
Cookbook: SHACL Validation
SHACL (Shapes Constraint Language) is a W3C standard for defining constraints on graph data. In Fluree, SHACL shapes are evaluated at transaction time — invalid data is rejected before it’s committed (or logged as a warning, depending on your config).
This guide covers:
- When SHACL runs — with and without a config graph
- Enabling SHACL via the config graph
- Defining shapes — node shapes, property shapes, targets
- Constraint patterns — cardinality, datatype, ranges, patterns, values, class, pair, logical
- Subclass reasoning for
sh:class - Predicate-target shapes —
sh:targetSubjectsOf/sh:targetObjectsOf - Per-graph enable/disable and warn vs reject modes
- Storing shapes in a named graph with
f:shapesSource - What isn’t enforced yet
When SHACL runs
Fluree decides whether to run SHACL validation on each transaction using this order:
- If a config graph exists with
f:shaclDefaults— follow the configured settings per graph (enable/disable, mode). - If no config graph section is present — fall back to the shapes-exist heuristic: if any SHACL shapes are present in the database (as regular RDF triples), validation runs in
Rejectmode. If no shapes are present, validation is skipped entirely (zero overhead).
This means you can start using SHACL without writing any config — just transact shapes and they’re enforced.
The shacl feature must be enabled at build time (it’s on by default for the server and CLI binaries). See Standards and feature flags.
Enabling SHACL via the config graph
Writing ledger config is done via transactions into the config graph, whose IRI is always urn:fluree:{ledger_id}#config. See Writing config data for the full pattern.
Minimal config: enable SHACL, shapes in the default graph
@prefix f: <https://ns.flur.ee/db#> .
GRAPH <urn:fluree:mydb:main#config> {
<urn:config:main> a f:LedgerConfig ;
f:shaclDefaults [
f:shaclEnabled true ;
f:validationMode f:ValidationReject
] .
}
Notes:
f:shaclEnableddefaults tofalsewhen af:shaclDefaultssection exists without it — make the enable decision explicit.f:validationModedefaults tof:ValidationReject. Usef:ValidationWarnto log violations without failing the transaction.- With no explicit
f:shapesSource, shapes are compiled from the default graph (f:defaultGraph, g_id=0). See Storing shapes in a named graph to load from elsewhere.
Defining shapes
Shapes are ordinary RDF — transact them like any other data. They can be written in Turtle, TriG, or JSON-LD.
Node shape with property constraints
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix schema: <http://schema.org/> .
@prefix ex: <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:PersonShape a sh:NodeShape ;
sh:targetClass schema:Person ;
sh:property [
sh:path schema:name ;
sh:datatype xsd:string ;
sh:minCount 1 ;
sh:maxCount 1 ;
sh:message "Every person must have exactly one name"
] ;
sh:property [
sh:path schema:email ;
sh:datatype xsd:string ;
sh:pattern "^[^@]+@[^@]+\\.[^@]+$" ;
sh:message "Email must be a valid email address"
] ;
sh:property [
sh:path ex:age ;
sh:datatype xsd:integer ;
sh:minInclusive 0 ;
sh:maxInclusive 200
] .
Target types
| Target | Effect |
|---|---|
sh:targetClass <C> | Every subject with rdf:type <C> (including RDFS subclasses of <C> when the hierarchy is available) |
sh:targetNode <N> | The specific subject <N> |
sh:targetSubjectsOf <P> | Every subject that currently has predicate <P> |
sh:targetObjectsOf <P> | Every node that currently appears as the object of <P> |
See Predicate-target shapes for notes on how the staged-path validator discovers focus nodes for sh:targetSubjectsOf / sh:targetObjectsOf.
Constraint patterns
Cardinality — required and multi-valued
ex:ArticleShape a sh:NodeShape ;
sh:targetClass ex:Article ;
sh:property [ sh:path ex:title ; sh:minCount 1 ; sh:maxCount 1 ] ;
sh:property [ sh:path ex:tag ; sh:minCount 1 ] .
Datatype
ex:ProductShape a sh:NodeShape ;
sh:targetClass ex:Product ;
sh:property [ sh:path ex:price ; sh:datatype xsd:decimal ] ;
sh:property [ sh:path ex:inStock ; sh:datatype xsd:boolean ] .
Numeric ranges
ex:OrderShape a sh:NodeShape ;
sh:targetClass ex:Order ;
sh:property [
sh:path ex:quantity ;
sh:datatype xsd:integer ;
sh:minInclusive 1 ;
sh:maxInclusive 10000
] .
Available: sh:minInclusive, sh:maxInclusive, sh:minExclusive, sh:maxExclusive.
String patterns and length
ex:UserShape a sh:NodeShape ;
sh:targetClass ex:User ;
sh:property [
sh:path ex:username ;
sh:datatype xsd:string ;
sh:minLength 3 ;
sh:maxLength 32 ;
sh:pattern "^[a-zA-Z0-9_]+$"
] .
sh:pattern accepts an optional sh:flags string (e.g. "i" for case-insensitive).
Node kind
ex:RefShape sh:property [
sh:path ex:owner ;
sh:nodeKind sh:IRI
] .
Values: sh:IRI, sh:BlankNode, sh:Literal, sh:BlankNodeOrIRI, sh:BlankNodeOrLiteral, sh:IRIOrLiteral.
Enumerated values
ex:TaskShape a sh:NodeShape ;
sh:targetClass ex:Task ;
sh:property [
sh:path ex:status ;
sh:in ( "todo" "in-progress" "review" "done" )
] .
sh:hasValue requires a specific value to be present.
Class constraint (with RDFS subclass reasoning)
ex:OrderShape a sh:NodeShape ;
sh:targetClass ex:Order ;
sh:property [
sh:path ex:customer ;
sh:class schema:Person ;
sh:minCount 1
] .
Each value of ex:customer must have rdf:type schema:Person — or rdf:type of any class that is rdfs:subClassOf* schema:Person. See RDFS subclass reasoning for sh:class.
Pair constraints — comparing two properties
ex:EventShape a sh:NodeShape ;
sh:targetClass ex:Event ;
sh:property [
sh:path ex:startYear ;
sh:lessThan ex:endYear
] ;
sh:property [
sh:path ex:primaryEmail ;
sh:disjoint ex:secondaryEmail
] .
| Constraint | Semantic |
|---|---|
sh:equals <P> | Value sets for this path and <P> must be identical |
sh:disjoint <P> | Value sets must not overlap |
sh:lessThan <P> | Every value on this path must be strictly less than every value of <P> |
sh:lessThanOrEquals <P> | Every value on this path must be ≤ every value of <P> |
Logical constraints
ex:ContactShape a sh:NodeShape ;
sh:targetClass ex:Contact ;
sh:or (
[ sh:property [ sh:path schema:email ; sh:minCount 1 ] ]
[ sh:property [ sh:path schema:telephone ; sh:minCount 1 ] ]
) .
Available: sh:not, sh:and, sh:or, sh:xone.
Closed shapes
ex:StrictPersonShape a sh:NodeShape ;
sh:targetClass ex:StrictPerson ;
sh:closed true ;
sh:ignoredProperties ( rdf:type ) ;
sh:property [ sh:path schema:name ; sh:minCount 1 ] .
A closed shape forbids any property not explicitly declared (or listed in sh:ignoredProperties). rdf:type is implicitly ignored per the SHACL spec.
RDFS subclass reasoning for sh:class
sh:class honors rdfs:subClassOf. Example:
ex:Novelist rdfs:subClassOf schema:Person .
ex:pratchett rdf:type ex:Novelist .
ex:BookShape sh:property [
sh:path ex:author ;
sh:class schema:Person
] .
A book whose ex:author is ex:pratchett conforms — ex:pratchett is a schema:Person via rdfs:subClassOf.
Fluree resolves this in two tiers:
- Fast path: the ledger’s indexed schema hierarchy (
SchemaHierarchy). Expanded at engine build time so same-class and descendant-class matches are O(1) hashmap hits. - Live fallback: when the subclass relation was asserted in the current transaction (or any earlier unindexed commit), the fast path misses. The engine then walks
rdfs:subClassOfvia a BFS on the database’s SPOT index. This walk is scoped to the default graph regardless of the subject’s own graph — matching howSchemaHierarchyis built and preventing cross-graph issues.
Predicate-target shapes
sh:targetSubjectsOf(P) and sh:targetObjectsOf(P) depend on the current state of the database — a subject is a focus node iff it actually has (or is referenced by) predicate P in the post-transaction view.
Fluree does not precompute target hints from staged flakes. Instead, for each focus node being validated, the engine does a bounded existence check against the post-state:
sh:targetSubjectsOf(P)→ SPOT range query(focus, P, _). Non-empty → shape applies.sh:targetObjectsOf(P)→ OPST range query(_, P, focus). Non-empty → shape applies.
This means:
- A base-state
(alice, ex:ssn, "123")makessh:targetSubjectsOf(ex:ssn)fire on alice even when this transaction only retractsex:name. - A retraction-only transaction that removes the last matching edge means the shape no longer applies — the post-state check returns empty.
- The check is bounded by the number of predicate-targeted shapes in the cache, not the data size.
Ref-objects of asserted flakes are pulled into the focus set for their graph, so newly-introduced inbound edges trigger validation of the referenced node.
Per-graph configuration
Each named graph can have its own f:shaclEnabled and f:validationMode via f:graphOverrides:
@prefix f: <https://ns.flur.ee/db#> .
@prefix ex: <http://example.org/> .
GRAPH <urn:fluree:mydb:main#config> {
<urn:config:main> a f:LedgerConfig ;
# Ledger-wide: SHACL on, reject on violation.
f:shaclDefaults [
f:shaclEnabled true ;
f:validationMode f:ValidationReject ;
f:overrideControl f:OverrideAll
] ;
# Per-graph: ex:scratch has SHACL off; ex:audit uses warn mode.
f:graphOverrides
[ a f:GraphConfig ;
f:targetGraph ex:scratch ;
f:shaclDefaults [ f:shaclEnabled false ]
],
[ a f:GraphConfig ;
f:targetGraph ex:audit ;
f:shaclDefaults [ f:validationMode f:ValidationWarn ]
] .
}
With this config:
- A violating write to the default graph is rejected (ledger-wide
Reject). - A violating write to
ex:scratchpasses without validation (graph disabled). - A violating write to
ex:auditpasses but emits atracing::warn!(Warnmode). - A single multi-graph transaction can mix modes: reject-bucket violations fail the txn; warn-bucket violations get logged.
Monotonicity
Per-graph configs can only tighten the ledger-wide posture:
| Ledger-wide | Per-graph | Effective |
|---|---|---|
enabled: false, OverrideNone | enabled: true | disabled (OverrideNone blocks per-graph) |
enabled: true, OverrideAll | enabled: false | disabled for that graph |
mode: warn, OverrideAll | mode: reject | reject for that graph |
See Override control for the full ruleset.
Storing shapes in a named graph
f:shapesSource points the shape compiler at a specific graph. Useful when you want schema / shapes isolated from data — even the config graph itself can be used as a shape source.
@prefix f: <https://ns.flur.ee/db#> .
GRAPH <urn:fluree:mydb:main#config> {
<urn:config:main> a f:LedgerConfig ;
f:shaclDefaults [
f:shaclEnabled true ;
f:shapesSource [
a f:GraphRef ;
f:graphSource [ f:graphSelector <http://example.org/shapes> ]
]
] .
}
Semantics:
f:shapesSourceis authoritative, not additive: when set, shapes come exclusively from the configured graph. Shapes in the default graph are ignored.f:shapesSourceis non-overridable — it can only be set in the config graph, not via transaction/query-time options.- Use
f:graphSelector f:defaultGraphto explicitly point at the default graph (same as omittingf:shapesSource).
Validation modes
f:ValidationReject(default): on any violation, the transaction fails withShaclViolation(report). The formatted report lists each violation’s focus node, property path, and message.f:ValidationWarn: violations are logged viatracing::warn!and the transaction proceeds. Any non-violation error from the SHACL pipeline (compile failure, range-scan failure) still propagates — Warn mode never silently admits a broken validation pipeline.
Working with shapes across write surfaces
SHACL validation runs consistently on every write surface:
- JSON-LD / SPARQL transactions (
fluree insert,fluree upsert,fluree update) - Turtle / TriG ingest (
fluree insert-turtle,stage_turtle_insert) - Commit replay (
push_commits_with_handle, followers applying upstream commits)
All three routes go through the same post-stage helper, so the ledger’s configured SHACL posture (enable/disable, mode, per-graph, shapes source) applies uniformly.
Not yet supported
The following SHACL constructs are parsed/compiled but currently no-ops at validation time. Shapes using them load without error but don’t constrain data:
- sh:uniqueLang, sh:languageIn — require language-tag metadata on flakes, which isn’t yet threaded through the validation path.
- sh:qualifiedValueShape (+ sh:qualifiedMinCount / sh:qualifiedMaxCount) — requires recursive nested-shape counting.
These are tracked in the SHACL compliance effort. Contributors: see Contributing / SHACL implementation.
Shapes are data
Because shapes live as regular RDF in your ledger:
- Time-travelable — query any shape’s history with @atT to see what validation was in effect at a given commit.
- Versionable — delete/insert constraints through ordinary transactions.
- Queryable — SELECT ?shape ?target WHERE { ?shape sh:targetClass ?target }.
- Branchable — test new constraints on a branch; merge when verified.
Best practices
- Start with sh:minCount — missing-value bugs are the most common data quality issue.
- Incremental rollout — deploy shapes in f:ValidationWarn mode first. Watch the logs for a sprint, then flip to f:ValidationReject.
- Per-graph scratch zones — for experimentation, disable SHACL on a named graph so exploratory transactions don’t fail your CI.
- sh:message everywhere — custom messages are what end users see when a transaction is rejected. Invest in them early.
- f:shapesSource for schema hygiene — keep shapes out of user data graphs so deletes / retractions on user data can’t accidentally touch your schema.
Related documentation
- Setting Groups — SHACL — Configuration reference for f:shaclDefaults
- Override Control — Per-graph / query-time override rules
- Writing Config Data — How to transact into the config graph
- Contributing / SHACL implementation — How the pipeline works internally (for contributors)
Cookbook: owl:imports across named graphs
This walkthrough builds a small two-file ontology, links it together with
owl:imports, applies it to instance data, and shows OWL 2 QL and OWL 2 RL
inference firing through the import.
In Fluree, an owl:imports target must resolve to another named graph in the
same ledger (or to a local graph via f:ontologyImportMap). Cross-ledger
imports are not supported. This tutorial uses three named graphs in one ledger:
| Graph IRI | Role |
|---|---|
| (default graph) | Instance data |
| <http://example.org/onto/core> | Core ontology — class hierarchy + owl:imports hub |
| <http://example.org/onto/behaviors> | Imported ontology — property characteristics |
| <urn:fluree:demo:main#config> | Ledger config — wires up reasoning |
See Reasoning and inference for background and Setting groups → reasoningDefaults for the full config schema.
1. Create the ledger
fluree init
fluree create demo
demo becomes the active ledger. Its full ID is demo:main, which means the
config named graph IRI is urn:fluree:demo:main#config (the #config
fragment is a Fluree convention).
2. Insert instance data into the default graph
Save as 01-data.ttl:
@prefix ex: <http://example.org/> .
# People (typed directly, will be classified further by reasoning)
ex:alice a ex:GradStudent .
ex:bob a ex:Person .
ex:carol a ex:Professor .
# Ancestor chain — exercises owl:TransitiveProperty (declared in the import)
ex:alice ex:hasAncestor ex:eve .
ex:eve ex:hasAncestor ex:frank .
# Living arrangement — exercises owl:SymmetricProperty
ex:alice ex:livesWith ex:bob .
# Parent/child — exercises owl:inverseOf
ex:carol ex:parentOf ex:alice .
# Teaching — exercises rdfs:domain / rdfs:range
ex:professor1 ex:teaches ex:cs101 .
Insert it:
fluree upsert -f 01-data.ttl
# → Committed t=1, 8 flakes
Use upsert (not insert) for any TriG document that contains GRAPH blocks. The CLI’s insert path parses Turtle straight to flakes and does not extract GRAPH blocks; over HTTP, /v1/fluree/insert rejects Content-Type: application/trig outright. upsert handles both Turtle and TriG.
3. Stage the ontology and reasoning config (TriG)
Save as 02-ontology.trig:
@prefix f: <https://ns.flur.ee/db#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex: <http://example.org/> .
# ---- Core ontology: class hierarchy + owl:imports hub -----------------
GRAPH <http://example.org/onto/core> {
<http://example.org/onto/core>
a owl:Ontology ;
owl:imports <http://example.org/onto/behaviors> .
ex:Student rdfs:subClassOf ex:Person .
ex:GradStudent rdfs:subClassOf ex:Student .
ex:Professor rdfs:subClassOf ex:Person .
}
# ---- Imported ontology: property characteristics + domain/range -------
GRAPH <http://example.org/onto/behaviors> {
ex:hasAncestor a owl:TransitiveProperty .
ex:livesWith a owl:SymmetricProperty .
ex:parentOf owl:inverseOf ex:childOf .
ex:teaches rdfs:domain ex:Professor ;
rdfs:range ex:Course .
}
# ---- Reasoning configuration ------------------------------------------
# schemaSource = <onto/core>, followOwlImports = true
# → reasoner walks the import closure and projects schema triples from
# BOTH graphs onto the default graph for inference.
GRAPH <urn:fluree:demo:main#config> {
<urn:demo:cfg>
a f:LedgerConfig ;
f:reasoningDefaults <urn:demo:cfg:reasoning> .
<urn:demo:cfg:reasoning>
f:schemaSource <urn:demo:cfg:schemaref> ;
f:followOwlImports true .
<urn:demo:cfg:schemaref>
a f:GraphRef ;
f:graphSource <urn:demo:cfg:schemasrc> .
<urn:demo:cfg:schemasrc>
f:graphSelector <http://example.org/onto/core> .
}
Submit it:
fluree upsert -f 02-ontology.trig --format turtle
# → Committed t=2, 17 flakes
--format turtle is needed because the file extension .trig is not on the
auto-detect list; the parser treats the contents as Turtle/TriG.
4. Verify base data
Without reasoning, only asserted facts are returned:
fluree query --format json '{
"@context":{"ex":"http://example.org/"},
"select":"?s",
"where":{"@id":"?s","@type":"ex:Person"},
"reasoning":"none"
}'
# → ["ex:bob"]
Only bob is directly typed Person. The schema and the rest of the
classifications are still hidden behind reasoning.
5. RDFS subclass expansion
rdfs:subClassOf is declared in <onto/core> (the schemaSource).
With RDFS reasoning, querying for Person returns every subclass instance:
fluree query --format json '{
"@context":{"ex":"http://example.org/"},
"select":"?s",
"where":{"@id":"?s","@type":"ex:Person"},
"reasoning":"rdfs"
}'
# → ["ex:bob", "ex:carol", "ex:alice"]
alice (GradStudent → Student → Person) and carol (Professor → Person)
are now classified through the hierarchy.
6. OWL 2 RL inference through the import
Everything below uses axioms declared in the imported <onto/behaviors>
graph — they reach the reasoner only because owl:imports resolved
correctly.
6.1 owl:TransitiveProperty — ex:hasAncestor
fluree query --format json '{
"@context":{"ex":"http://example.org/"},
"select":"?a",
"where":{"@id":"ex:alice","ex:hasAncestor":"?a"},
"reasoning":"owl2rl"
}'
# → ["ex:eve", "ex:frank"]
Asserted: alice → eve, eve → frank. Inferred via the
TransitiveProperty axiom in the imported graph: alice → frank.
6.2 owl:SymmetricProperty — ex:livesWith
fluree query --format json '{
"@context":{"ex":"http://example.org/"},
"select":"?p",
"where":{"@id":"ex:bob","ex:livesWith":"?p"},
"reasoning":"owl2rl"
}'
# → ["ex:alice"]
Only alice livesWith bob was asserted; the symmetric pair is inferred.
6.3 owl:inverseOf — parentOf / childOf
fluree query --format json '{
"@context":{"ex":"http://example.org/"},
"select":"?p",
"where":{"@id":"ex:alice","ex:childOf":"?p"},
"reasoning":"owl2rl"
}'
# → ["ex:carol"]
Asserted: carol parentOf alice. Inferred: alice childOf carol.
6.4 rdfs:domain / rdfs:range
fluree query --format json '{
"@context":{"ex":"http://example.org/"},
"select":"?p",
"where":{"@id":"?p","@type":"ex:Professor"},
"reasoning":"owl2rl"
}'
# → ["ex:carol", "ex:professor1"]
professor1 was never typed. The reasoner infers it from
teaches rdfs:domain Professor (declared in the imported graph) plus the
asserted professor1 teaches cs101.
fluree query --format json '{
"@context":{"ex":"http://example.org/"},
"select":"?c",
"where":{"@id":"?c","@type":"ex:Course"},
"reasoning":"owl2rl"
}'
# → ["ex:cs101"]
Same idea on the range side: cs101 is classified as a Course because of
teaches rdfs:range Course in the import.
7. OWL 2 QL — query rewriting only
OWL 2 QL handles the same constructs as RDFS plus owl:inverseOf and
rdfs:domain/range, but at query rewrite time rather than via fact
materialisation. For the patterns above where you query the inferred
direction directly, OWL 2 RL is the simpler choice. OWL 2 QL is best when
you want zero materialisation and your queries already align with the
rewriting (e.g., asking for any superclass type).
fluree query --format json '{
"@context":{"ex":"http://example.org/"},
"select":"?s",
"where":{"@id":"?s","@type":"ex:Person"},
"reasoning":"owl2ql"
}'
# → ["ex:bob", "ex:carol", "ex:alice"]
Same answer as RDFS for this pattern, with no materialisation step.
8. Full chain: combining modes
Combining rdfs + owl2rl lets schema hierarchy and forward-chained facts
work together. professor1 appears as a Person via:
- teaches rdfs:domain Professor (imported axiom, OWL 2 RL)
- professor1 teaches cs101 (asserted)
- ⇒ professor1 a Professor (derived)
- Professor rdfs:subClassOf Person (core ontology, RDFS)
- ⇒ professor1 a Person (derived)
fluree query --format json '{
"@context":{"ex":"http://example.org/"},
"select":"?s",
"where":{"@id":"?s","@type":"ex:Person"},
"reasoning":["rdfs","owl2rl"]
}'
# → ["ex:bob", "ex:carol", "ex:professor1", "ex:alice"]
Submitting TriG over the HTTP API
The CLI’s upsert command is one way to load TriG. Against a running
fluree-db-server, the same payload goes through the HTTP API. Both
endpoints below accept Turtle/TriG when sent with Content-Type: application/trig (or text/turtle):
# Connection-scoped (specify ledger via query string)
curl -X POST 'http://localhost:8090/v1/fluree/upsert?ledger=demo:main' \
-H 'Content-Type: application/trig' \
--data-binary @02-ontology.trig
# Ledger-scoped path form
curl -X POST 'http://localhost:8090/v1/fluree/upsert/demo:main' \
-H 'Content-Type: application/trig' \
--data-binary @02-ontology.trig
The same TriG GRAPH blocks land in the same named graphs as via the CLI;
nothing else changes about the reasoning wiring.
See HTTP endpoints for the full surface area and Datasets and named graphs for how named graphs participate in queries.
What was actually proved
Each query above is a load-bearing test that the import closure is being walked correctly:
| Query | Axiom location | Without owl:imports resolution it would… |
|---|---|---|
| §6.1 transitive ancestors | imported graph (behaviors) | …only return ex:eve (no transitive closure) |
| §6.2 symmetric livesWith | imported graph | …return empty (bob livesWith alice not asserted) |
| §6.3 childOf via inverse | imported graph | …return empty (childOf is never asserted) |
| §6.4 domain/range classification | imported graph | …not classify professor1 / cs101 |
If you change f:followOwlImports to false in the config graph, every
query in §6 except bob livesWith collapses back to base data — a useful
toggle for confirming the closure walk is what’s doing the work.
Related references
- Concepts: Reasoning and inference
- Query-time reasoning syntax
- Setting groups → reasoningDefaults
- Design: ontology imports
- Concepts: Datasets and named graphs
Design
Architecture and design documents for Fluree’s internal systems. These documents describe the rationale behind key design decisions, wire formats, and trait architectures.
Documents
Query execution and overlay merge
How queries run through a single preparation/execution pipeline, how scan operators select the binary-cursor path vs the range fallback, and where overlay novelty merges with indexed data (including graph scoping boundaries).
Auth Contract (CLI ↔ Server)
Wire-level contract between the Fluree CLI and any Fluree-compatible server, covering OIDC device auth, token refresh, and storage proxy authentication.
Nameservice Schema v2
Design of the nameservice schema: ledger records, graph source records, configuration payloads, and the ref/config/tracking store abstractions.
Storage-agnostic Commits and Sync
How ContentId (CIDv1) values decouple the commit chain from storage backends, enabling replication across filesystem, S3, and IPFS. Includes the pack protocol wire format for efficient bulk transfer.
ContentId and ContentStore
The content-addressed identity layer: ContentId type, ContentStore trait, multicodec content kinds, and the bridge between CID-based identity and storage-backend addressing.
Index Format
Binary columnar index format: branch/leaf/leaflet hierarchy, dictionary artifacts, SPOT/PSOT/POST/OPST/TSPO layout, and encoding details.
Namespace allocation and fallback modes
How Fluree assigns ns_code values for IRIs (prefix trie matching, fallback split modes), including bulk-import preflight mitigation and how the “host-only” fallback persists for future transactions.
Ontology imports (f:schemaSource + owl:imports)
How the reasoner consumes schema from a named f:schemaSource graph and transitively resolves owl:imports: resolution order, the SchemaBundleOverlay projection, schema-triple whitelist, and caching.
Storage Traits
Storage trait architecture: StorageRead, StorageWrite, ContentAddressedWrite, Storage, and NameService trait design with guidance for implementing new backends.
Related Documentation
- Crate Map - Workspace architecture
- Contributing - Development guidelines
- Graph Identities and Naming - Naming conventions (user-facing and internal)
Query execution and overlay merge
This document describes the single query execution pipeline in Fluree DB and how it combines:
- Indexed data (binary columnar indexes)
- Overlay data (novelty + staged flakes)
It also calls out where graph scoping (g_id) is applied so named graphs remain isolated.
Pipeline overview
flowchart TD
LedgerState -->|produces| LedgerSnapshot
LedgerSnapshot -->|shared substrate| GraphDb
GraphDb -->|single-ledger| QueryRunner
GraphDb -->|member_of| DataSetDb
DataSetDb -->|federated| QueryRunner
QueryRunner -->|scan index + merge overlay| DatasetOperator
DatasetOperator -->|per-graph| BinaryScanOperator
BinaryScanOperator -->|fast path| BinaryCursor
BinaryScanOperator -->|fallback| range_with_overlay
BinaryCursor -->|graph-scoped decode| BinaryGraphView
range_with_overlay -->|delegates| RangeProvider
Where this exists in code
- API entrypoints
  - fluree-db-api/src/view/query.rs: single-ledger GraphDb queries (query)
  - fluree-db-api/src/view/dataset_query.rs: dataset queries (DataSetDb)
- Unified query runner
  - fluree-db-query/src/execute/runner.rs
  - prepare_execution(db: GraphDbRef<'_>, query: &ExecutableQuery) builds derived facts/ontology (if enabled), rewrites patterns, and builds the operator tree.
  - execute_prepared(...) runs the operator tree using an ExecutionContext.
- Dataset operator
  - fluree-db-query/src/dataset_operator.rs
  - DatasetOperator wraps every triple-pattern scan. In single-graph mode (the common case) it passes through to one inner BinaryScanOperator with negligible overhead. In multi-graph mode (FROM/FROM NAMED datasets) it fans out one inner operator per active graph, drives their lifecycles, and stamps ledger provenance (Binding::IriMatch) on results that span multiple ledgers.
  - DatasetBuilder trait (factory pattern): the planner constructs a ScanDatasetBuilder at plan time; DatasetOperator calls build() at execution time during open() to produce per-graph BinaryScanOperators.
  - Nested composition: inner operators can themselves be DatasetOperators — provenance stamping passes IriMatch through unchanged.
- Scan operators
  - fluree-db-query/src/binary_scan.rs
  - BinaryScanOperator handles single-graph scanning only. Selects between binary cursor (streaming, integer-ID pipeline) and range fallback at open() time based on the ExecutionContext.
- Range fallback
  - fluree-db-core/src/range.rs: range_with_overlay(snapshot, g_id, overlay, ...)
  - fluree-db-core/src/range_provider.rs: RangeProvider trait implemented by the binary range provider
Graph scoping (g_id)
Graph scoping is applied at two key boundaries:
- Binary streaming path: BinaryCursor operates on a BinaryGraphView (graph-scoped decode handle), ensuring leaf/leaflet decoding, predicate dictionaries, and specialty arenas are graph-isolated.
- Range path: range_with_overlay(snapshot, g_id, overlay, ...) passes g_id into the RangeProvider, which routes the range query to the correct per-graph index segments.
Overlay providers are graph-scoped at the trait boundary: the overlay hook receives g_id and must only return flakes for that graph. This keeps multi-tenant named graphs isolated even when overlay data is sourced externally.
Overlay merge semantics (high level)
Both scan paths implement the same logical behavior:
- Read matching flakes from the indexed base (binary files)
- Read matching flakes from the overlay (novelty/staged)
- Merge them using (t, op) semantics so retractions cancel assertions as-of the query time bound
The details differ:
- BinaryScanOperator translates overlay flakes into integer-ID space and merges them into the decoded columnar stream.
- RangeScanOperator delegates to range_with_overlay, which combines RangeProvider output with overlay output.
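As a rough, self-contained illustration of the (t, op) rule — not the actual operator code, and using a simplified stand-in for the real flake type — the merge can be thought of like this:

use std::collections::HashMap;

/// Simplified stand-in for a flake; the real type lives in fluree-db-core.
#[derive(Clone, Debug)]
struct Flake {
    s: u64,       // subject id
    p: u64,       // predicate id
    o: u64,       // object id
    t: i64,       // transaction time
    assert: bool, // true = assertion, false = retraction
}

/// Merge indexed and overlay flakes as of `as_of_t`: for each triple, keep it
/// only if its latest op at or before the time bound is an assertion.
fn merge_as_of(indexed: &[Flake], overlay: &[Flake], as_of_t: i64) -> Vec<Flake> {
    let mut latest: HashMap<(u64, u64, u64), Flake> = HashMap::new();
    for f in indexed.iter().chain(overlay) {
        if f.t > as_of_t {
            continue; // outside the query's time bound
        }
        match latest.get(&(f.s, f.p, f.o)) {
            Some(cur) if cur.t >= f.t => {} // already have a newer op for this triple
            _ => {
                latest.insert((f.s, f.p, f.o), f.clone());
            }
        }
    }
    // Retractions cancel earlier assertions; only surviving assertions remain.
    latest.into_values().filter(|f| f.assert).collect()
}

The real operators work incrementally over sorted, graph-scoped streams rather than a hash map, but the as-of filtering is the same idea.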
Auth contract (CLI ↔ Server)
This document defines the wire-level contract between the Fluree CLI and any Fluree-compatible server (a standalone fluree-server, an OIDC-capable application embedding Fluree, or future products). Any implementation that exposes these endpoints will get zero-configuration CLI auth.
For the overall authentication model, see Authentication.
Implementer checklist (CLI compatibility)
An implementation is considered CLI-compatible if the Fluree CLI can:
- discover how to authenticate,
- obtain/store a Bearer token, and
- use that token for data-plane operations (and optionally refresh it).
Required (for “it works”)
- Auth discovery: implement GET /.well-known/fluree.json.
  - Return at least { "version": 1 }.
  - If you support automated login, include an auth object with type="oidc_device" and required fields (issuer, client_id, exchange_url).
  - If you do not support automated login, you may omit auth (CLI will use manual token input), or return auth.type="token" to be explicit.
- Token exchange / refresh (only for auth.type="oidc_device"): implement POST {exchange_url}:
  - grant_type="urn:ietf:params:oauth:grant-type:token-exchange" for exchanging an IdP token into a Fluree-scoped token.
  - grant_type="refresh_token" for refreshing without user interaction (optional; CLI will still work without refresh, but requires re-login when tokens expire).
- Issue Fluree-scoped JWTs: the access_token you return MUST include the standard Fluree claims used by fluree-server:
  - identity: fluree.identity (recommended) and standard iss/sub/exp/iat
  - scopes: fluree.ledger.read.*, fluree.ledger.write.*, fluree.events.* (as applicable)
  - replication scopes (fluree.storage.*) MUST be reserved for operator/service principals only.
Recommended (for good UX and supportability)
- Stable error messages: keep error strings stable and human-readable. The CLI may pattern-match on substrings (e.g. "Bearer token required", "Untrusted issuer") to provide hints.
- Anti-leak semantics: for data endpoints, return 404 for out-of-scope ledgers (do not leak existence).
- Verified diagnostics: implement GET /v1/fluree/whoami (or an equivalent endpoint) to return token_present, verified, auth_method, identity, and scope summary.
Auth discovery
GET /.well-known/fluree.json
The CLI fetches this endpoint when a remote is added (fluree remote add) to auto-configure auth. The server MAY expose this endpoint. If absent, the CLI falls back to manual token configuration.
Response (200 OK, application/json):
{
"version": 1,
"api_base_url": "https://data.example.com/v1/fluree",
"auth": {
"type": "oidc_device",
"issuer": "https://issuer.example.com",
"client_id": "fluree-cli",
"exchange_url": "https://data.example.com/v1/fluree/auth/exchange",
"scopes": ["openid", "profile"],
"redirect_port": 8400
}
}
api_base_url
api_base_url tells the CLI where the Fluree HTTP API is mounted.
It is specifically intended to support implementations that:
- mount the Fluree API under a non-root prefix (e.g. /v1/fluree), and/or
- want discovery served from a different host than the data plane (e.g. www.example.com serving discovery that points at data.example.com).
Contract:
- api_base_url MAY be:
  - an absolute URL, e.g. https://data.example.com/v1/fluree, or
  - an absolute-path reference (relative to the discovery origin), e.g. /v1/fluree.
- If api_base_url is an absolute-path reference, the CLI MUST resolve it against the origin (scheme + host + port) of the discovery document URL it fetched (i.e., the URL used for GET /.well-known/fluree.json).
  - Example: discovery fetched from https://abc123.cloudfront.net/.well-known/fluree.json and api_base_url="/v1/fluree" resolves to https://abc123.cloudfront.net/v1/fluree.
- api_base_url SHOULD include the full prefix including fluree and SHOULD NOT have a trailing slash.
- The CLI MUST use the resolved api_base_url as the base for subsequent API calls (query/insert/upsert/update/info/exists).
- If api_base_url is absent, the CLI MUST derive it from the configured remote URL:
  - If the remote URL already ends with /fluree, use it as-is.
  - Otherwise, append /fluree.
  - If you mount a versioned API (for example /v1/fluree), you SHOULD include api_base_url in discovery to avoid ambiguity.
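A minimal sketch of that resolution rule — illustrative only, using the url crate, not the CLI's actual code:

use url::Url;

/// Resolve the API base. `discovery_url` is where /.well-known/fluree.json was
/// fetched from; `api_base_url` is the field from that document (if any);
/// `remote_url` is the configured remote URL used as a fallback.
fn resolve_api_base(
    discovery_url: &Url,
    api_base_url: Option<&str>,
    remote_url: &str,
) -> Result<Url, url::ParseError> {
    match api_base_url {
        // Absolute URLs pass through; absolute-path references resolve against
        // the discovery document's origin (Url::join handles both cases).
        Some(value) => discovery_url.join(value),
        // Absent: derive from the remote URL — use as-is if it already ends
        // with /fluree, otherwise append /fluree.
        None => {
            let trimmed = remote_url.trim_end_matches('/');
            if trimmed.ends_with("/fluree") {
                Url::parse(trimmed)
            } else {
                Url::parse(&format!("{trimmed}/fluree"))
            }
        }
    }
}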
auth.type values
| Type | Meaning | CLI behavior |
|---|---|---|
| oidc_device | OIDC interactive login + token exchange | fluree auth login uses device-code if the IdP supports it, otherwise auth-code+PKCE |
| token | Manual Bearer token (no automated login flow) | fluree auth login --token <value> |
Field reference (oidc_device)
| Field | Required | Description |
|---|---|---|
| issuer | Yes | OIDC issuer URL (used for /.well-known/openid-configuration discovery) |
| client_id | Yes | OAuth client ID for the CLI (public client; no client secret) |
| exchange_url | Yes | Absolute URL for the Fluree token exchange endpoint |
| scopes | No | OAuth scopes to request (default: ["openid"]) |
| redirect_port | No | Port for auth-code callback listener (default: first available in 8400..8405; also overrideable via FLUREE_AUTH_PORT) |
Fallback behavior
- Discovery endpoint absent (404 or connection error) → CLI assumes token type, prompts user to provide a token manually
- version > 1 → CLI warns but attempts to parse known fields
Token exchange
POST {exchange_url}
After the CLI completes OIDC login with the IdP, it calls the exchange endpoint to trade the IdP token for a Fluree-scoped Bearer token. This endpoint is hosted by the application that manages authorization (e.g., an app embedding Fluree and maintaining user entitlements).
Request:
POST /v1/fluree/auth/exchange HTTP/1.1
Content-Type: application/json
{
"grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
"subject_token": "<idp-access-token-or-id-token>",
"subject_token_type": "urn:ietf:params:oauth:token-type:access_token"
}
Success response (200 OK):
{
"access_token": "<fluree-bearer-token>",
"token_type": "Bearer",
"expires_in": 3600,
"refresh_token": "<optional-refresh-token>"
}
Error response (401/403):
{
"error": "invalid_grant",
"error_description": "IdP token is invalid or user is not authorized for Fluree access"
}
Contract
- The exchange endpoint validates the IdP token (against the IdP’s JWKS or userinfo), looks up the user’s Fluree entitlements, and mints a Fluree-scoped JWT.
- The returned access_token MUST be a JWT that fluree-server can verify (via JWKS). It MUST include the standard Fluree claims (fluree.identity, fluree.ledger.*, and optionally fluree.storage.*). See Bearer token claim set.
- refresh_token is OPTIONAL. If present, the CLI stores it and uses it for silent refresh.
- subject_token_type MAY be urn:ietf:params:oauth:token-type:id_token if the CLI sends the ID token instead of the access token.
This loosely follows RFC 8693 (OAuth 2.0 Token Exchange).
Token refresh
POST {exchange_url}
If the CLI holds a refresh_token, it can request a new access token without user interaction.
Request:
{
"grant_type": "refresh_token",
"refresh_token": "<stored-refresh-token>"
}
Success response: Same shape as token exchange success.
Failure: CLI clears stored tokens and prompts fluree auth login.
CLI TOML config format
The CLI stores auth configuration per-remote in .fluree/config.toml:
[[remotes]]
name = "solo-prod"
type = "Http"
base_url = "https://solo.example.com"
[remotes.auth]
type = "oidc_device"
issuer = "https://cognito-idp.us-east-1.amazonaws.com/us-east-1_abc123"
client_id = "fluree-cli"
exchange_url = "https://solo.example.com/v1/fluree/auth/exchange"
scopes = ["openid", "profile"]
redirect_port = 8400
token = "eyJ..." # cached Fluree Bearer token (written by 'fluree auth login')
refresh_token = "eyJ..." # refresh token (written by 'fluree auth login')
[[remotes]]
name = "local"
type = "Http"
base_url = "http://localhost:8090"
[remotes.auth]
type = "token"
token = "eyJ..." # manually provided via 'fluree auth login --token'
Backward compatibility: If type is absent, infer "token" if token is present, otherwise treat as unauthenticated.
CLI fluree auth login behavior
fluree auth login [--remote <name>]
- Resolve the target remote.
- Check auth.type:
  - oidc_device:
    - Discover OIDC endpoints from {issuer}/.well-known/openid-configuration.
    - If the discovery document includes device_authorization_endpoint, run OAuth device-code:
      - POST to device_authorization_endpoint to get device_code, user_code, verification_uri.
      - Print: Open {verification_uri} and enter code: {user_code}
      - Poll token_endpoint until user completes browser auth.
    - Otherwise, if the discovery document includes authorization_endpoint, run OAuth authorization-code + PKCE:
      - Start a localhost callback listener on http://127.0.0.1:{port}/callback (port selection: redirect_port / FLUREE_AUTH_PORT, else first available in 8400..8405).
      - Open the system browser to the authorization_endpoint URL including code_challenge and requested scopes.
      - Receive the callback, then exchange the code at token_endpoint.
      - Note for Cognito: callback URLs must be pre-allowlisted (no wildcard ports); allowlist http://127.0.0.1:8400/callback through http://127.0.0.1:8405/callback (or your chosen fixed port).
    - POST IdP token to exchange_url → get Fluree Bearer token.
    - Store token and refresh_token in remote config.
  - token: Prompt for token (or accept --token <value|@file|@->). Store in config.
  - Unset / no discovery: Attempt discovery at {base_url}/.well-known/fluree.json. If found, configure auth type and proceed. If not found, fall back to token flow.
See CLI auth command for full command reference.
CLI auto-refresh on 401
Auto-refresh applies to data-plane commands (query, insert, upsert, info) that use RemoteLedgerClient in tracked mode or --remote mode.
When a data-plane command receives a 401 from the remote:
- If auth.type == "oidc_device" and refresh_token is present:
  - Attempt silent refresh via the exchange endpoint.
  - On success: update stored token and (if rotated) refresh token in .fluree/config.toml, retry the original request once.
  - On failure: clear tokens, print Token expired. Run: fluree auth login --remote <name>
- Otherwise: print Authentication failed. Run: fluree auth login --remote <name>
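The retry-once rule looks roughly like this hypothetical sketch (not the CLI's real client code; send_request and silent_refresh stand in for the actual HTTP and token-exchange plumbing):

/// Minimal stand-in for the per-remote auth state kept in .fluree/config.toml.
struct RemoteAuth {
    auth_type: &'static str,       // "oidc_device" or "token"
    token: String,                 // cached Fluree Bearer token
    refresh_token: Option<String>, // present only for oidc_device logins
}

/// Returns the final HTTP status after applying the retry-once rule.
fn with_auto_refresh(
    remote: &mut RemoteAuth,
    send_request: impl Fn(&str) -> u16,                                // request with a given Bearer token
    silent_refresh: impl Fn(&str) -> Option<(String, Option<String>)>, // (new token, rotated refresh token)
) -> u16 {
    let status = send_request(&remote.token);
    if status != 401 || remote.auth_type != "oidc_device" {
        return status; // no refresh possible; caller prints the login hint on 401
    }
    let Some(rt) = remote.refresh_token.clone() else {
        return status;
    };
    match silent_refresh(&rt) {
        Some((token, rotated)) => {
            remote.token = token; // the real CLI persists both to .fluree/config.toml
            if let Some(new_rt) = rotated {
                remote.refresh_token = Some(new_rt);
            }
            send_request(&remote.token) // retry the original request once
        }
        None => {
            remote.token.clear(); // refresh failed: clear tokens, prompt for re-login
            remote.refresh_token = None;
            401
        }
    }
}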
Replication commands (fetch, pull, push)
Replication commands use HttpRemoteClient (from fluree-db-nameservice-sync) which does not perform auto-refresh. This is intentional:
- Replication requires fluree.storage.* scopes, which are reserved for operators and service accounts.
- Operator tokens are typically long-lived or non-expiring. If an operator token expires, the user should run fluree auth login to obtain a new one.
- Regular users who only have query-scoped tokens should use fluree track + --remote mode instead of fetch/pull/push.
Scope rules
- The exchange endpoint MUST NOT grant fluree.storage.* to regular users. Replication scope is for operators and service accounts only. See Replication vs query boundary.
- If a user with only query-scoped tokens attempts fluree pull or fluree fetch, the CLI MUST fail with a clear message explaining that replication requires fluree.storage.* and suggesting fluree track instead.
Token diagnostic endpoint
GET /v1/fluree/whoami
A verified diagnostic endpoint that performs full cryptographic verification of the Bearer token (if present) using the same code path as data endpoints. This is the recommended way for the CLI or an implementing application to validate a token without side effects.
No token:
{ "token_present": false }
Valid token (verified):
{
"token_present": true,
"verified": true,
"auth_method": "embedded_jwk",
"issuer": "did:key:z6Mk...",
"subject": "admin@example.com",
"identity": "did:key:z6Mk...",
"expires_at": 1739012345,
"scopes": {
"ledger_read_all": true,
"ledger_write_all": true
}
}
Invalid token (verification failed):
{
"token_present": true,
"verified": false,
"error": "Token expired",
"issuer": "did:key:z6Mk...",
"subject": "admin@example.com",
"expires_at": 1738900000
}
When verification fails, the response includes unverified decoded claims (base64-decoded without signature check) for debugging. These fields are explicitly untrustworthy — they help diagnose why verification failed (e.g., wrong issuer, expired token) but must never be used for authorization decisions.
The auth_method field is only present on successful verification: "embedded_jwk" for Ed25519/JWS tokens, "oidc" for JWKS/RS256 tokens.
This endpoint always returns 200 regardless of token validity — it is diagnostic, not a gate.
Error semantics
Standard error response shape
fluree-server returns errors as JSON with a consistent structure. Implementers
SHOULD follow this shape so the CLI can display meaningful diagnostics.
{
"error": "<human-readable description>",
"status": 401,
"@type": "err:db/Unauthorized",
"cause": {
"error": "<nested cause (optional)>",
"status": 400,
"@type": "err:db/JsonParse"
}
}
Notes:
- error is the primary human-readable message. The CLI may pattern-match on substrings inside this field.
- @type is a compact error type IRI used as a stable, machine-readable code.
- cause is optional and may be nested.
- Implementers MAY include additional fields, but MUST keep error stable and human-readable.
Status codes
| Code | Meaning | When |
|---|---|---|
| 200 | Success | Request completed successfully |
| 400 | Bad request | Malformed body, invalid JSON, missing required fields |
| 401 | Unauthorized | Missing Bearer token, expired token, invalid signature, unknown signing key |
| 403 | Forbidden | Valid token but insufficient scope (e.g., query-only token on admin endpoint) |
| 404 | Not found or unauthorized | Ledger does not exist, or token lacks access to this ledger (anti-leak) |
| 409 | Conflict | Ledger already exists (/fluree/create), concurrent transaction conflict |
| 500 | Internal error | Server-side failure |
Anti-leak pattern: 404 for out-of-scope ledgers
Data endpoints (/fluree/query, /fluree/update, etc.) return 404 rather than 403 when a valid token lacks access to the requested ledger. This prevents authenticated users from discovering the existence of ledgers they are not authorized to access.
Implication for CLI and implementers: A 404 on a data endpoint can mean either:
- The ledger genuinely does not exist, or
- The token does not have scope for that ledger.
The CLI should present both possibilities in error messages. Implementers should not attempt to distinguish these cases client-side.
Token verification errors (401)
Common 401 error messages and their causes:
| Server message | Cause | CLI hint |
|---|---|---|
| Bearer token required | No Authorization: Bearer ... header | fluree auth login --remote <name> |
| Invalid token | Malformed JWT/JWS, bad signature | Re-issue token; check signing key |
| Token expired | exp claim is in the past | Refresh or re-login |
| Untrusted issuer | iss / signing key not in trusted list | Check --trusted-issuer / --jwks-issuer config |
| OIDC issuer not configured | Token has kid header but no JWKS configured | Add --jwks-issuer to server config |
| Token lacks storage proxy permissions | Valid token but missing fluree.storage.* | Use operator token or fluree track instead |
Implementor checklist
Any Fluree-compatible server that wants zero-config CLI auth must:
- Expose GET /.well-known/fluree.json with the discovery payload
- Implement POST {exchange_url} for token exchange and refresh
- Issue Fluree-scoped JWTs with the standard claim set
- Publish a JWKS endpoint so fluree-server can verify issued tokens (configured via --jwks-issuer)
Conformance checklist (status codes)
Implementors MUST return these status codes consistently so the CLI can provide good diagnostics:
| Endpoint | Success | Missing token | Bad token | Insufficient scope | Not found / no access |
|---|---|---|---|---|---|
| GET /.well-known/fluree.json | 200 | n/a | n/a | n/a | 404 (not implemented) |
| POST /v1/fluree/create | 201 | 401 | 401 | 403 | n/a |
| POST /v1/fluree/drop | 200 | 401 | 401 | 403 | 404 |
| POST /v1/fluree/query | 200 | 401 | 401 | 404 (anti-leak) | 404 (anti-leak) |
| POST /v1/fluree/update | 200 | 401 | 401 | 404 (anti-leak) | 404 (anti-leak) |
| POST /v1/fluree/auth/exchange | 200 | n/a | 401 | 403 | n/a |
| GET /v1/fluree/whoami | 200 | 200 (token_present=false) | 200 (verified=false) | n/a | n/a |
Conformance checklist (error bodies)
All error responses MUST include a JSON body. The body SHOULD include at least an error or message field. The CLI pattern-matches on specific substrings (e.g., "Bearer token required", "Untrusted issuer") to provide targeted hints, so error messages should be stable across releases.
See also
- Authentication — Auth model, modes, claim set, and access boundaries
- Configuration — OIDC — Server --jwks-issuer setup
- CLI auth command — auth login, auth status, auth logout
- CLI token command — Ed25519 token minting (Mode 2)
Nameservice Schema v2 Design
Schema Version: 2
Overview
This document describes the design for a unified nameservice schema that supports:
- Ledgers with named graphs and independent indexing
- Non-ledger graph sources (indexes/mappings like BM25, Iceberg/R2RML, Vector/HNSW, JDBC, etc.) with varying versioning semantics
- Four independent atomic concerns that can be updated without contention
- Watermarked updates for client subscription and push notifications
- Pluggable backends (DynamoDB, S3, filesystem) with consistent semantics
Terminology:
- Prefer graph source in docs and user-facing API descriptions.
- Non-ledger data sources (BM25, vector, Iceberg, R2RML) are called graph sources.
Design Goals
- Stable schema: Minimize attribute changes as features evolve
- Flexible payloads: Use JSON Maps for evolving/variable content
- Reduced conflict probability: Logically independent concerns minimize contention
- Client subscriptions: Watermarks enable efficient change detection
- Coordination via status: Soft locks/leases for distributed process coordination
The Four Concerns Model
Each nameservice record has four independent concerns, each with its own watermark and payload:
| # | Concern | Watermark | Payload | Updated By |
|---|---|---|---|---|
| 1 | Head | commit_t | commit | Transactor (on commit) |
| 2 | Index | index_t | index | Indexer (on index publish) |
| 3 | Status | status_v | status | Various (state changes, metrics, locks) |
| 4 | Config | config_v | config | Admin (settings changes) |
Each concern can be pushed independently without affecting or contending with the others.
DynamoDB Schema
Table Name
fluree-nameservice (configurable)
Physical layout: item-per-concern (PK+SK)
DynamoDB serializes writes per item, not per attribute. To achieve true per-concern independence (transactor vs indexer vs admin), represent each concern as a separate item under the same address partition:
- pk (partition key): record address in the name:branch form (e.g., "mydb:main", "products-search:main")
- sk (sort key): concern discriminator
Recommended sk values:
- meta
- head (ledgers only)
- index (ledgers + graph sources)
- config (ledgers + graph sources)
- status (ledgers + graph sources)
This layout aligns with the file-backed v2 pattern (.index.json separate) while also eliminating DynamoDB physical contention between writers.
Design Note: Per-Concern Independence
Each concern is logically independent:
- No shared updated_at: Each concern’s watermark (commit_t, index_t, etc.) serves as its timestamp/version marker
- Disjoint items: Updating one concern does not touch any attributes of another concern
- Reduced conflict probability: Independent concerns minimize logical contention
With the item-per-concern layout, DynamoDB contention is limited to writers of the same concern.
Entity kinds and graph source types
The meta item carries the record discriminator:
- kind: ledger | graph_source
- source_type (graph sources only): a type string (e.g., f:Bm25Index, f:HnswIndex, f:IcebergSource, f:R2rmlSource, f:JdbcSource)
Use graph_source naming consistently in pk values and type strings.
Watermark Semantics
Watermarks are strict monotonic per concern. This ensures:
- Clients can detect changes by comparing watermarks.
- No change is ever “invisible” to subscribers.
- Simple comparison logic:
if remote_watermark > local_watermark then changed.
commit_t (Ledger commit watermark)
- Value: Equals the commit t (transaction time).
- Update rule: Strict monotonic (new_t > current_t).
- Rationale: Commits are already strictly ordered by t, so t IS the version.
index_t (Index watermark)
- Value: Transaction time t that the published index covers.
- Update rule: Strict monotonic (new_t > current_t).
- Admin reindex: allow idempotent overwrite at the same t (new_t >= current_t) when rebuilding an index to the same watermark with a new address.
status_v (Status Watermark)
- Value: Atomic incrementing integer
- Update rule: Strict monotonic (new_v > current_v)
- Rationale: Status has no t relation; version is just a change counter
config_v (Config Watermark)
- Value: Atomic incrementing integer
- Update rule: Strict monotonic (new_v > current_v)
- Rationale: Config has no t relation; version is just a change counter
Unborn State Semantics
When a record is initialized but has no data yet for a concern:
| Concern | Unborn Watermark | Unborn Payload | Meaning |
|---|---|---|---|
| head | commit_t = 0 | commit = null | Ledger initialized, no commits yet |
| index | index_t = 0 | index = null | No index published yet |
| status | status_v = 1 | status = {state: "ready"} | Always has initial status |
| config | config_v = 0 | config = null | No config set yet |
Key distinction:
- *_v = 0 with payload = null: Initialized but unborn (record exists)
- Record not found (GetItem returns nothing): Unknown/never created
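Illustratively (mirroring the ConcernValue shape defined in the Rust Types section further down), a reader distinguishes the three cases like this:

/// Mirrors the ConcernValue type shown in the Rust Types section below.
pub struct ConcernValue<T> {
    pub v: i64,
    pub payload: Option<T>,
}

/// Map a concern read onto the unborn-state table above.
/// (status starts at status_v = 1, so it never reports as unborn here.)
fn describe<T>(read: Option<&ConcernValue<T>>) -> &'static str {
    match read {
        None => "record not found (never created)",
        Some(cv) if cv.v == 0 && cv.payload.is_none() => "unborn (initialized, no data yet)",
        Some(_) => "has data",
    }
}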
Payload Schemas
commit (Ledger)
{
"id": "bafybeigdyr...commitCid",
"t": 42
}
| Field | Type | Description |
|---|---|---|
| id | String | ContentId (CIDv1) of the commit |
| t | Number | Transaction time (redundant with commit_t but explicit) |
See ContentId and ContentStore for details on the CID format.
index (Ledger with Named Graphs)
{
"default": {
"id": "bafybeig...indexRootDefault",
"t": 42,
"rev": 0
},
"txn-metadata": {
"id": "bafybeig...indexRootTxnMeta",
"t": 42,
"rev": 1
},
"audit-log": null
}
| Field | Type | Description |
|---|---|---|
| {named-graph} | Object \| null | Index state per named graph |
| .id | String | ContentId (CIDv1) of the index root |
| .t | Number | Transaction time the index covers |
| .rev | Number | Revision at that t (0, 1, 2… for reindex operations) |
Named graph = null means that graph exists but hasn’t been indexed yet.
index (Graph Source)
For graph sources with index state (e.g., BM25, vector, spatial, Iceberg, etc.), the nameservice stores a head pointer to the graph source’s latest index root/manifest. The payload is intentionally opaque to nameservice: the graph source implementation defines what the ContentId points to and how (or whether) it supports time travel.
{
"id": "bafybeig...graphSourceIndexRoot",
"index_t": 42
}
For graph sources with no index concept (e.g., JDBC mappings): null.
Design note: Snapshot history (if any) is stored in graph-source-owned manifests in storage, not in nameservice. See docs/design/graph-source-index-manifests.md.
status
{
"state": "ready",
"queue_depth": 3,
"last_commit_ms": 45
}
| Field | Type | Description |
|---|---|---|
| state | String | Current state (see State Values below) |
| * | Any | Additional metadata varies by state and entity type |
State Values
| State | Description | Typical Metadata |
|---|---|---|
| ready | Normal operating state (default initial state) | queue_depth, last_commit_ms |
| indexing | Background indexing in progress | index_lock |
| reindexing | Full reindex in progress | reindex_lock, progress |
| syncing | Graph source syncing from source | progress, source_t, synced_t |
| maintenance | Administrative maintenance in progress | maintenance_lock |
| retracted | Soft-deleted | retracted_at, reason |
| error | Error state | error, error_at |
status with Locks (Coordination)
{
"state": "indexing",
"index_lock": {
"holder": "indexer-7f3a",
"target_t": 45,
"acquired_at": 1705312200,
"expires_at": 1705316100
}
}
| Field | Type | Description |
|---|---|---|
| index_lock | Object \| null | Soft lock for indexing coordination |
| .holder | String | Identifier of the process holding the lock |
| .target_t | Number | The t being indexed |
| .acquired_at | Number | Unix epoch when lock was acquired |
| .expires_at | Number | Unix epoch when lock expires (lease timeout) |
config
{
"default_context_id": "bafkreih...contextCid",
"index_threshold": 1000,
"replication": {
"factor": 3,
"regions": ["us-east-1", "us-west-2"]
}
}
Config is fully flexible JSON. Common fields:
| Field | Type | Description |
|---|---|---|
| default_context_id | String | ContentId (CIDv1) of default JSON-LD context |
| index_threshold | Number | Commits before auto-index |
| replication | Object | Replication settings |
For graph sources, config contains type-specific settings:
BM25:
{
"k1": 1.2,
"b": 0.75,
"fields": ["title", "body", "description"]
}
JDBC:
{
"connection_string": "jdbc:postgresql://host:5432/db",
"schema": "public",
"pool_size": 10
}
DynamoDB Operations
CAS Semantics (Git-like Push)
All push operations support compare-and-set (CAS) semantics with expected old values. This enables Git-like divergence detection:
- Caller provides expected (the last-known state) and new (the desired state)
- Backend rejects if current state doesn’t match expected
- On rejection, backend returns actual current state for caller to reconcile
This is stronger than simple watermark monotonicity: it detects divergence, not just staleness.
Create (Initialize)
Operation: PutItem
ConditionExpression: attribute_not_exists(#pk)
Item: {
pk: "mydb:main",
sk: "meta",
schema: 2,
kind: "ledger",
name: "mydb",
branch: "main",
dependencies: null,
created_at: <now>, // optional
updated_at_ms: <now_ms>,
retracted: false,
}
push_commit (Publish Commit)
Option A: Monotonic only (simpler, allows fast-forward by any newer commit)
Operation: UpdateItem
Key: { pk: "mydb:main", sk: "head" }
ConditionExpression: attribute_not_exists(#ct) OR #ct < :new_t
UpdateExpression: SET #ct = :new_t, #c = :commit
ExpressionAttributeNames: {
"#ct": "commit_t",
"#c": "commit"
}
ExpressionAttributeValues: {
":new_t": 42,
":commit": { "id": "bafybeig...commitT42", "t": 42 }
}
Option B: CAS with expected value (Git-like, detects divergence)
CAS checks both watermark equality AND payload equality. The condition is a single OR’d expression handling both existing and unborn cases:
Operation: UpdateItem
Key: { pk: "mydb:main", sk: "head" }
// Single condition: existing case OR unborn case
ConditionExpression:
(#ct = :expected_t AND #c = :expected_commit AND :new_t > :expected_t)
OR
(#ct = :zero AND attribute_type(#c, :null_type) AND :new_t > :zero)
UpdateExpression: SET #ct = :new_t, #c = :commit
ExpressionAttributeNames: {
"#ct": "commit_t",
"#c": "commit"
}
ExpressionAttributeValues: {
":expected_t": 41, // caller's last-known watermark
":expected_commit": { "id": "bafybeig...commitT41", "t": 41 }, // caller's last-known payload
":new_t": 42,
":commit": { "id": "bafybeig...commitT42", "t": 42 },
":zero": 0,
":null_type": "NULL"
}
Caller logic: Set :expected_t and :expected_commit based on last-known state:
- If unborn: :expected_t = 0; :expected_commit can be any value (the unborn clause matches on #ct = :zero)
- If existing: :expected_t = the last-known commit_t, :expected_commit = the last-known commit payload
Note: DynamoDB does support nested paths like #c.#id (with #c=commit, #id=id). However, comparing the entire map (#c = :expected_commit) is simpler and avoids partial-match edge cases.
Recommendation: Use Option B (CAS) for transactors to detect divergence. Use Option A for distributed sync where fast-forward is acceptable.
push_index (Publish Index)
Monotonic enforcement on the index watermark (an expected-value CAS, as in push_commit Option B, can be layered on top):
Operation: UpdateItem
Key: { pk: "mydb:main", sk: "index" }
ConditionExpression: (attribute_not_exists(#it) OR #it < :new_t)
UpdateExpression: SET #it = :new_t, #i = :index
ExpressionAttributeNames: {
"#it": "index_t",
"#i": "index"
}
ExpressionAttributeValues: {
":new_t": 42,
":index": {
"default": { "id": "bafybeig...indexDefault", "t": 42, "rev": 0 },
"txn-metadata": { "id": "bafybeig...indexTxnMeta", "t": 42, "rev": 1 }
},
}
Note: For admin rebuilds at the same watermark, allow #it <= :new_t as the condition (idempotent overwrite at equal t).
push_status (Update Status)
Operation: UpdateItem
Key: { pk: "mydb:main", sk: "status" }
ConditionExpression: (#sv = :expected_v AND :new_v > :expected_v)
OR
(attribute_not_exists(#sv) AND :expected_v = :zero)
UpdateExpression: SET #sv = :new_v, #s = :status
ExpressionAttributeNames: {
"#sv": "status_v",
"#s": "status"
}
ExpressionAttributeValues: {
":expected_v": 89,
":zero": 0,
":new_v": 90,
":status": { "state": "ready", "queue_depth": 0 }
}
Note: status_v starts at 1 (not 0) on creation, so attribute_not_exists(#sv) handles cases where the attribute is missing (e.g., partially-written or manually-created items). Normal updates use the first clause.
push_config (Update Config)
Operation: UpdateItem
Key: { pk: "mydb:main", sk: "config" }
ConditionExpression: (#cv = :expected_v AND :new_v > :expected_v)
OR
(#cv = :zero AND attribute_type(#c, :null_type) AND :expected_v = :zero)
UpdateExpression: SET #cv = :new_v, #c = :config
ExpressionAttributeNames: {
"#cv": "config_v",
"#c": "config"
}
ExpressionAttributeValues: {
":expected_v": 2,
":zero": 0,
":new_v": 3,
":config": { "default_context_id": "bafkreih...", "index_threshold": 500 },
":null_type": "NULL"
}
Note: Unborn clause checks both #cv = :zero AND attribute_type(#c, NULL) to prevent accepting writes against inconsistent states.
Retract
Operation: UpdateItem
Key: { pk: "mydb:main", sk: "meta" }
UpdateExpression: SET #r = :true, #sv = :new_sv, #s = :status
ExpressionAttributeNames: {
"#r": "retracted",
"#sv": "status_v",
"#s": "status"
}
ExpressionAttributeValues: {
":true": true,
":new_sv": 91,
":status": { "state": "retracted", "retracted_at": 1705315800 }
}
Lookup (Read)
Operation: GetItem
Key: { pk: "mydb:main", sk: "meta" }
ConsistentRead: true
To read full state, query all items for the record address: pk = "mydb:main" and assemble meta + head + index + status + config as present.
List by Kind
Operation: Query (requires GSI on kind)
KeyConditionExpression: #kind = :kind
ExpressionAttributeNames: { "#kind": "kind" }
ExpressionAttributeValues: { ":kind": "ledger" }
To list graph sources, query kind = graph_source.
To list graph sources of a specific type (optional GSI), query source_type = f:Bm25Index, etc.
Push Result Handling
Each push operation returns one of:
| Result | Meaning | Action |
|---|---|---|
| Updated | Update accepted | Proceed |
| Conflict | Expected didn’t match current | Reconcile using actual |
Rust Types (aligned with existing RefKind/CasResult vocabulary)
#![allow(unused)]
fn main() {
/// Which concern is being read or updated.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum ConcernKind {
/// The commit head pointer (`commit_t` + `commit` payload)
Head,
/// The index state (`index_t` + `index` payload)
Index,
/// The status state (status_v + status payload)
Status,
/// The config state (config_v + config payload)
Config,
}
/// Value of a concern: watermark + optional payload.
///
/// - `Some(ConcernValue { v: 0, payload: None })` — unborn (initialized, no data)
/// - `Some(ConcernValue { v: N, payload: Some(...) })` — has data
/// - `None` (at Option level) — record doesn't exist
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ConcernValue<T> {
pub v: i64,
pub payload: Option<T>,
}
/// Outcome of a compare-and-set push operation.
///
/// Conflicts are NOT errors — they are expected outcomes of concurrent
/// writes and must be handled by the caller (retry, report, etc.).
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum CasResult<T> {
/// CAS succeeded — the concern was updated to the new value.
Updated,
/// CAS failed — `expected` did not match the current value.
/// `actual` carries the current concern value so the caller can decide
/// what to do next (retry, diverge, etc.).
Conflict { actual: Option<ConcernValue<T>> },
}
}
Conflict Handling
On Conflict, the caller receives the actual current state and can:
- Fast-forward: If actual.v < new.v, retry with expected = actual
- Divergence: If actual.v >= new.v or addresses differ unexpectedly, handle merge/error
- Retry loop: For distributed systems, implement bounded retry with backoff
#![allow(unused)]
fn main() {
async fn push_with_retry<T>(
ns: &impl ConcernPublisher<T>,
address: &str,
kind: ConcernKind,
new: ConcernValue<T>,
max_retries: usize,
) -> Result<CasResult<T>> {
let mut expected = ns.get_concern(address, kind).await?;
for _ in 0..max_retries {
match ns.push_concern(address, kind, expected.as_ref(), &new).await? {
CasResult::Updated => return Ok(CasResult::Updated),
CasResult::Conflict { actual } => {
// Check if fast-forward is still possible
if let Some(ref act) = actual {
if new.v <= act.v {
// Diverged - can't fast-forward
return Ok(CasResult::Conflict { actual });
}
}
// Retry with new expected
expected = actual;
}
}
}
// Exhausted retries
let actual = ns.get_concern(address, kind).await?;
Ok(CasResult::Conflict { actual })
}
}
Example Records
DynamoDB (item-per-concern) examples
This section shows the DynamoDB physical layout (multiple items per address partition). Other backends serialize the same logical concerns differently.
Ledger (typical items)
Ledger records are represented as multiple items under the same pk:
{
"pk": "mydb:main",
"sk": "meta",
"schema": 2,
"kind": "ledger",
"name": "mydb",
"branch": "main",
"created_at": 1705312200,
"updated_at_ms": 1705312200123,
"retracted": false
}
{
"pk": "mydb:main",
"sk": "head",
"schema": 2,
"commit_t": 42,
"commit": { "id": "bafybeig...commitT42", "t": 42 }
}
{
"pk": "mydb:main",
"sk": "index",
"schema": 2,
"index_t": 42,
"index": {
"default": { "id": "bafybeig...indexDefaultT42", "t": 42, "rev": 0 }
}
}
{
"pk": "mydb:main",
"sk": "config",
"schema": 2,
"config_v": 2,
"config": { "default_context_id": "bafkreih...contextCid", "index_threshold": 1000 }
}
{
"pk": "mydb:main",
"sk": "status",
"schema": 2,
"status_v": 89,
"status": { "state": "ready", "queue_depth": 3, "last_commit_ms": 45 }
}
Ledger (unborn)
An “unborn” ledger has all 5 concern items created atomically at initialization. The head and index items have watermarks set to 0 with null payloads. The status item starts at status_v=1 with state="ready". The config item starts at config_v=0 (unborn).
Graph Source (BM25)
{
"pk": "search:main",
"sk": "meta",
"schema": 2,
"kind": "graph_source",
"source_type": "f:Bm25Index",
"name": "search",
"branch": "main",
"dependencies": ["mydb:main"],
"created_at": 1705312200,
"updated_at_ms": 1705312200123,
"retracted": false
}
Additional concern items for the same pk (examples):
{
"pk": "search:main",
"sk": "config",
"schema": 2,
"config_v": 1,
"config": { "k1": 1.2, "b": 0.75, "fields": ["title", "body"] }
}
{
"pk": "search:main",
"sk": "index",
"schema": 2,
"index_t": 42,
"index": { "id": "bafybeig...bm25IndexRoot" }
}
Graph Source (Iceberg)
{
"pk": "analytics:main",
"sk": "meta",
"schema": 2,
"kind": "graph_source",
"source_type": "f:IcebergSource",
"name": "analytics",
"branch": "main",
"dependencies": ["mydb:main"],
"created_at": 1705312200,
"updated_at_ms": 1705312200123,
"retracted": false,
"...": "see config/index items"
}
Graph Source (JDBC - No Index)
{
"pk": "erp:main",
"sk": "meta",
"schema": 2,
"kind": "graph_source",
"source_type": "f:JdbcSource",
"name": "erp",
"branch": "main",
"dependencies": null,
"created_at": 1705312200,
"updated_at_ms": 1705312200123,
"retracted": false,
"...": "see config item; index item may be absent or have index_t=0"
}
Git-like Push Model
The nameservice follows a git-like model where:
- Local nameservice: Each node has a local NS for reads and local writes
- Upstream nameservice: The “source of truth” that accepts or rejects pushes
- Push operations: Local changes are pushed upstream
- Forward operations: Requests can be forwarded upstream without local write
┌─────────────────┐ push_head ┌─────────────────────┐
│ Transactor │ ────────────────────────▶ │ │
│ (local NS) │ │ Upstream NS │
└─────────────────┘ │ │
│ - DynamoDB, or │
┌─────────────────┐ push_index │ - S3 + ETags, or │
│ Indexer │ ────────────────────────▶ │ - FS + locks, or │
│ (local NS) │ │ - Service │
└─────────────────┘ │ │
▲ │ Enforces: │
│ pull/sync │ - Watermark rules │
└─────────────────────────────────────│ - Serialization │
└─────────────────────┘
Upstream NS Backend Options
| Backend | How It Enforces Rules |
|---|---|
| DynamoDB | Conditional expressions on watermarks |
| S3 | ETags for CAS + application logic |
| Filesystem | File locks or single-writer process |
| Service | Queue + application logic |
The push interface is the same regardless of backend.
Status-based Coordination (Soft Locks)
Status can carry soft locks for coordinating distributed processes:
Lock Acquisition Flow
1. Indexer starts up
2. Read current status
3. If index_lock exists and not expired:
→ Another indexer is working, wait or skip
4. If no lock or lock expired:
→ Push status with our lock claim (status_v + 1)
→ If accepted: we own the lock, proceed
→ If rejected: someone else claimed it, back off
5. Do indexing work (periodically refresh lock by pushing status)
6. Push index update
7. Push status: clear lock, set state to ready
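A small sketch of the claim decision in steps 3-4 (illustrative only; the field names mirror the index_lock payload shown earlier, and the actual claim is a push_status CAS at status_v + 1 where a Conflict result means another process claimed it first):

use std::time::{SystemTime, UNIX_EPOCH};

/// Mirrors the index_lock payload fields shown earlier in this document.
struct IndexLock {
    holder: String,
    expires_at: u64, // Unix epoch seconds
}

/// Claim only if no lock is held, the lease has expired, or we already hold it.
fn may_claim(current: Option<&IndexLock>, me: &str) -> bool {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("clock before Unix epoch")
        .as_secs();
    match current {
        None => true,
        Some(lock) if lock.expires_at <= now => true, // expired lease: crash recovery
        Some(lock) => lock.holder == me,              // refresh our own lock
    }
}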
Lock Expiry (Crash Recovery)
If a process crashes while holding a lock:
- The expires_at timestamp allows other processes to take over
- No manual intervention needed
- Typical lease duration: 5-15 minutes depending on operation
Lock Refresh
Long-running operations should periodically refresh their lock:
{
"state": "indexing",
"index_lock": {
"holder": "indexer-7f3a",
"target_t": 45,
"acquired_at": 1705312200,
"expires_at": 1705316100,
"refreshed_at": 1705314000
},
"progress": 0.67
}
Client Subscription Model
Clients track watermarks to detect changes:
{
"subscriptions": {
"mydb:main": {
"kind": "ledger",
"commit_t": 42,
"index_t": 42,
"status_v": 89,
"config_v": 2
},
"search:main": {
"kind": "graph_source",
"source_type": "f:Bm25Index",
"index_t": 42,
"status_v": 12,
"config_v": 1
}
}
}
Change Detection
- Client polls or receives notification
- Compare watermarks:
if remote.commit_t > local.commit_t - Fetch only the changed concern(s)
- Update local cache
Subscription Granularity
Clients can subscribe to:
- All concerns for an address
- Specific concerns (e.g., only commit_t for a query client)
- All addresses of a kind (e.g., all ledgers)
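A subscribed client only refetches the concerns whose watermarks moved. A sketch of that comparison (illustrative, not library code):

/// Watermarks a client last saw for one record (only the fields it cares about).
#[derive(Clone, Copy, Default)]
struct Watermarks {
    commit_t: i64,
    index_t: i64,
    status_v: i64,
    config_v: i64,
}

/// Apply the strict "remote > local" rule per concern and report what to refetch.
fn changed_concerns(local: Watermarks, remote: Watermarks) -> Vec<&'static str> {
    let mut changed = Vec::new();
    if remote.commit_t > local.commit_t { changed.push("head"); }
    if remote.index_t > local.index_t { changed.push("index"); }
    if remote.status_v > local.status_v { changed.push("status"); }
    if remote.config_v > local.config_v { changed.push("config"); }
    changed
}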
File-backed Nameservice Considerations
The logical concerns (head/index/status/config) can be stored in different physical layouts depending on the backend.
The file-backed and storage-backed implementations in this repo use the ns@v2 JSON-LD format (see fluree-db-nameservice/src/file.rs and fluree-db-nameservice/src/storage_ns.rs):
- Main record: ns@v2/{name}/{branch}.json (commit/head + status + config-ish fields)
- Index record: ns@v2/{name}/{branch}.index.json (index head pointer only)
Field names differ from the DynamoDB layout, but the semantics match:
- logical commit_t is stored as f:t
- logical commit.id is stored as f:ledgerCommit.@id (a CID string)
- logical index_t is stored as f:ledgerIndex.f:t (or f:indexT for graph source index files)
- logical index.id is stored as f:ledgerIndex.@id (a CID string, or f:indexId for graph source index files)
Layout Options
Option A: Single File (Unified)
ns@v2/{name}/{branch}.json
- Contains all four concerns in one file
- Simplest for reads (one fetch)
- Requires single-writer discipline or file-level CAS
Option B: Separate Head and Index Files (Current Implementation)
ns@v2/{name}/{branch}.json # head + status + config
ns@v2/{name}/{branch}.index.json # index only
- Matches current implementation
- Allows transactor and indexer to write independently
- 2 files to read per entity for full state
- Trade-off: Status and config updates contend with head updates at file-lock level. Acceptable if status updates are low-frequency (state changes only, not high-frequency metrics).
Option C: Fully Separate Files (Maximum Independence)
ns@v2/{name}/{branch}.head.json
ns@v2/{name}/{branch}.index.json
ns@v2/{name}/{branch}.status.json
ns@v2/{name}/{branch}.config.json
- Each concern in its own file
- Maximum write independence
- 4 files to read per entity
Recommended Approach
Use Option B (separate head/index) as the default:
- Proven in current implementation
- Solves the main contention issue (transactor vs indexer)
- Reasonable read overhead (2 files)
- Constraint: Status updates should be coarse-grained (state transitions, not per-transaction metrics). If high-frequency status updates are needed, consider Option C.
Use Option C (fully separate files) when:
- Status updates are frequent (e.g., real-time queue depth reporting)
- Multiple independent processes update different concerns
- Write independence is more important than read efficiency
For queryable nameservice with many entities:
- Read files in parallel
- Consider in-memory caching with file-change notification
- The 2-file layout is acceptable; 4-file layout may add too much I/O
Atomicity Mechanisms
| Backend | Mechanism | Notes |
|---|---|---|
| Filesystem | Atomic rename (write to temp, rename) | POSIX guarantees |
| S3 | ETags for CAS | If-Match header |
| GCS | Generation numbers | Similar to ETags |
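For the filesystem row, a minimal sketch of the write-temp-then-rename pattern using only the standard library; the temp-file naming is illustrative:

```rust
use std::fs::{self, File};
use std::io::Write;
use std::path::Path;

/// Write `bytes` to `path` atomically on a POSIX filesystem: write to a temp
/// file in the same directory, flush, then rename. Readers see either the old
/// file or the complete new file, never a partial write.
fn atomic_write(path: &Path, bytes: &[u8]) -> std::io::Result<()> {
    let tmp = path.with_extension("tmp");
    {
        let mut f = File::create(&tmp)?;
        f.write_all(bytes)?;
        f.sync_all()?; // flush data before the rename makes it visible
    }
    fs::rename(&tmp, path) // atomic replace on the same filesystem
}
```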
File Content Format
Each file contains JSON matching the concern’s payload plus metadata:
head file ({name}/{branch}.json):
{
"@context": { "f": "https://ns.flur.ee/db#" },
"@id": "mydb:main",
"@type": ["f:Database", "f:LedgerSource"],
"f:ledger": { "@id": "mydb" },
"f:branch": "main",
"f:ledgerCommit": { "@id": "bafybeig...commitT42" },
"f:t": 42,
"f:ledgerIndex": { "@id": "bafybeig...indexRootT42", "f:t": 42 },
"f:status": "ready"
}
index file ({name}/{branch}.index.json):
{
"@context": { "f": "https://ns.flur.ee/db#" },
"f:ledgerIndex": { "@id": "bafybeig...indexRootT42", "f:t": 42 }
}
Global Secondary Indexes (GSIs)
GSI1: gsi1-kind (Implemented)
| GSI Name | Partition Key | Sort Key | Use Case |
|---|---|---|---|
| gsi1-kind | kind | pk | List all entities of a kind (ledger, graph_source) |
- Only `meta` items carry the `kind` attribute and project into the GSI
- Projection: `INCLUDE` with `name`, `branch`, `source_type`, `dependencies`, `retracted`
- Used by `all_records()` (kind=ledger) and `all_vg_records()` (kind=graph_source)
- After GSI query returns meta items, `BatchGetItem` fetches remaining concern items (`config`, `index`) to assemble full records
Future GSIs
| GSI Name | Partition Key | Sort Key | Use Case |
|---|---|---|---|
| source-type-index | source_type | pk | List all graph sources of a given type |
| state-index | status_state | pk | Find entities in a specific state |
Note on state-index: DynamoDB GSIs cannot use nested map attributes as keys. To enable this GSI:
- Add an optional denormalized attribute `status_state` (String) on the `status` item
- Update `status_state` whenever `status.state` changes
- Only add it if you need GSI-based queries by state
Alternative: Use Scan with FilterExpression on status.state (less efficient but no schema extension needed)
Future Considerations
Streams and Events
DynamoDB Streams can be enabled to:
- Trigger Lambda on changes
- Build event sourcing
- Replicate to other regions
Multi-region
For global deployments:
- Use DynamoDB Global Tables
- Or regional nameservices with cross-region sync
Appendix: Attribute Reference
All items share:
| Attribute | Type | Description |
|---|---|---|
| pk | String | Record address (name:branch) |
| sk | String | Concern discriminator (meta, head, index, status, config) |
| schema | Number | Schema version (always 2) |
meta item
| Attribute | Type | Description |
|---|---|---|
| kind | String | ledger or graph_source |
| name | String | Base name |
| branch | String | Branch name |
| retracted | Boolean | Soft-delete flag |
| branches | Number | Child branch reference count (0 for leaf branches, omitted when 0 in JSON-LD) |
| dependencies | List&lt;String&gt; (nullable) | Graph-source dependencies (optional) |
| source_type | String (nullable) | Graph-source type (e.g., f:Bm25Index) |
| created_at | Number | Creation timestamp (epoch seconds, optional) |
| updated_at_ms | Number | Last update time (epoch millis, optional) |
meta item: Branch Attributes
For branches created via create_branch, the meta item carries an additional attribute recording the source branch:
| Attribute | Type | Description |
|---|---|---|
| bp_source | String (nullable) | Source branch name (e.g., "main") |
This attribute is null/absent for the original main branch. The JSON-LD format uses f:sourceBranch. The divergence point between a branch and its source is computed on demand by walking the commit chains rather than being stored.
head item (ledgers only)
| Attribute | Type | Description |
|---|---|---|
| commit_t | Number | Commit watermark (t) |
| commit | Map (nullable) | { id, t } (id is a ContentId CID string) |
index item (ledgers + graph sources)
| Attribute | Type | Description |
|---|---|---|
| index_t | Number | Index watermark (t) |
| index | Map (nullable) | Ledger index map or graph-source head pointer payload |
status item (ledgers + graph sources)
| Attribute | Type | Description |
|---|---|---|
| status_v | Number | Status change counter |
| status | Map | Status payload |
| status_state | String (nullable) | Optional denormalized status.state for a GSI |
config item (ledgers + graph sources)
| Attribute | Type | Description |
|---|---|---|
| config_v | Number | Config change counter |
| config | Map (nullable) | Config payload |
Watermark Semantics Summary
| Watermark | Semantics | Initial Value | Update Rule |
|---|---|---|---|
| commit_t | = commit t | 0 (unborn) | Strict: new > current |
| index_t | = index t | 0 (unborn) | Strict: new > current (admin may allow equal) |
| status_v | Counter | 1 (ready) | Strict: new > current |
| config_v | Counter | 0 (unborn) | Strict: new > current |
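A compact illustration of the update rule shared by all four watermarks; `allow_equal` models the admin override noted for `index_t`:

```rust
/// Generic watermark update rule: accept only strictly increasing values.
fn accept_watermark(current: i64, proposed: i64, allow_equal: bool) -> bool {
    if allow_equal {
        proposed >= current
    } else {
        proposed > current
    }
}

fn main() {
    assert!(accept_watermark(42, 43, false));  // normal advance
    assert!(!accept_watermark(42, 42, false)); // duplicate push rejected
    assert!(accept_watermark(42, 42, true));   // admin re-push of the same index_t
}
```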
Storage-agnostic commits and sync
Fluree uses ContentId (CIDv1) values as the primary identifiers for commits, index roots, and other immutable artifacts. This decouples the commit chain and nameservice references from any specific storage backend, enabling replication across different storage systems (filesystem, S3, IPFS, etc.) without rewriting commit data.
Pack protocol (fluree-pack-v1)
The pack protocol enables efficient bulk transfer of CAS objects between Fluree instances. Instead of fetching each commit individually (one HTTP round-trip per commit), the pack protocol streams all missing objects in a single binary response.
How it works
- Client sends a `POST /pack/{ledger}` request with `want` (CIDs the client needs, typically the remote head) and `have` (CIDs the client already has, typically the local head). Optionally includes `include_indexes: true` with `want_index_root_id` / `have_index_root_id` to request binary index artifacts.
- Server walks the commit chain from each `want` backward until it reaches a `have`, collecting all missing commits and their referenced txn blobs. When indexes are requested, it computes the diff of index artifact CIDs between the want and have index roots.
- Server streams commit + txn objects as binary data frames (oldest-first topological order), followed by a Manifest frame and index artifact data frames when indexes are included.
- Client decodes frames incrementally via a `BytesMut` buffer, verifies integrity of each object, and writes to local CAS.
The CLI uses a peek-then-ingest pattern: it reads the Header frame first (via peek_pack_header) to inspect estimated_total_bytes, then prompts for confirmation on large transfers (>1 GiB) before consuming the rest of the stream via ingest_pack_stream_with_header.
Wire format
[Preamble: FPK1 + version(1)] [Header frame] [Data frames...] [Manifest frame]? [Data frames...]? [End frame]
| Frame | Type byte | Content |
|---|---|---|
| Header | 0x00 | JSON metadata: protocol, capabilities, commit count, index artifact count, estimated_total_bytes |
| Data | 0x01 | CID binary + raw object bytes (commit, txn blob, or index artifact) |
| Error | 0x02 | UTF-8 error message (terminates stream) |
| Manifest | 0x03 | JSON metadata for phase transitions (e.g. start of index artifact phase) |
| End | 0xFF | End of stream |
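For orientation, a sketch that maps the type bytes above to an enum; the payload framing itself (lengths, CID encoding) is handled by the real codec in `fluree-db-core/src/pack.rs` and is not reproduced here:

```rust
/// Frame type bytes from the fluree-pack-v1 wire format (enum names are illustrative).
#[derive(Debug, PartialEq)]
enum FrameType {
    Header,   // 0x00: JSON metadata
    Data,     // 0x01: CID binary + raw object bytes
    Error,    // 0x02: UTF-8 error message, terminates the stream
    Manifest, // 0x03: JSON metadata for phase transitions
    End,      // 0xFF: end of stream
}

fn frame_type(byte: u8) -> Option<FrameType> {
    match byte {
        0x00 => Some(FrameType::Header),
        0x01 => Some(FrameType::Data),
        0x02 => Some(FrameType::Error),
        0x03 => Some(FrameType::Manifest),
        0xFF => Some(FrameType::End),
        _ => None, // unknown frame type: abort the ingest
    }
}
```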
Client-side verification
Each data frame is verified before writing to CAS:
- Commit blobs (`FCV2` magic): SHA-256 of full blob via `verify_commit_blob()`
- All other blobs (txn, index artifacts, config): Full-bytes SHA-256 via `ContentId::verify()`
Integrity failure is terminal – the entire ingest is aborted.
Fallback
When the server does not support the pack endpoint (returns 404, 405, 406, or 501), CLI commands automatically fall back to:
- Named-remote: Paginated JSON export via `GET /commits/{ledger}`
- Origin-based: CID chain walk via `GET /storage/objects/{cid}`
Implementation
| Component | Location |
|---|---|
| Wire format (encode/decode), estimation constants | fluree-db-core/src/pack.rs |
| Server-side pack generation + index artifact diff | fluree-db-api/src/pack.rs |
| Server HTTP endpoint | fluree-db-server/src/routes/pack.rs |
| Client-side streaming ingest (ingest_pack_stream, peek_pack_header, ingest_pack_stream_with_header) | fluree-db-nameservice-sync/src/pack_client.rs |
| Origin fetcher pack methods | fluree-db-nameservice-sync/src/origin.rs |
| CLI pull/clone with index transfer + size confirmation | fluree-db-cli/src/commands/sync.rs |
| set_index_head() API method | fluree-db-api/src/commit_transfer.rs |
For the full design document including graph source packing and protocol evolution, see STORAGE_AGNOSTIC_COMMITS_AND_SYNC.md (repo root).
ContentId and ContentStore
This document describes the content-addressed identity and storage layer introduced by the storage-agnostic commits design. For the full design rationale, see Storage-agnostic commits and sync.
Overview
Fluree’s storage-agnostic architecture separates identity (what something is) from location (where its bytes live). Every immutable artifact—commit, transaction payload, index root, index leaf, dictionary blob—is identified by a ContentId (a CIDv1 value) and stored/retrieved via a ContentStore trait.
Identity is a content ID; location is a local configuration detail.
ContentId
ContentId is a CIDv1 (multiformats) value that encodes three things:
- Version: CIDv1
- Multicodec: identifies the kind of the bytes (e.g., Fluree commit, index root)
- Multihash: identifies the hash function + digest (SHA-256)
Multicodec assignments (private-use range)
Fluree uses the multicodec private-use range for type-tagged CIDs:
| Codec value | ContentKind | Description |
|---|---|---|
| 0x300001 | Commit | Commit payload |
| 0x300002 | Txn | Original transaction payload |
| 0x300003 | IndexRoot | Binary index root descriptor |
| 0x300004 | IndexBranch | Index branch manifest |
| 0x300005 | IndexLeaf | Index leaf file |
| 0x300006 | DictBlob | Dictionary artifact |
| 0x300007 | DefaultContext | Default JSON-LD @context |
String representation
The canonical string form is base32-lower multibase (the familiar bafy… / bafk… prefixes from IPFS/IPLD). This is the form used in JSON APIs, logs, nameservice records, and CLI output.
bafybeigdyr... (commit CID)
bafkreihdwd... (index root CID)
Binary representation
The compact binary form (varint version + varint codec + multihash bytes) is used for:
- On-wire pack streams
- Internal caches and indexes
- Embedded references inside commit payloads
Creating a ContentId
A ContentId is derived by hashing the canonical bytes of an artifact with SHA-256, then wrapping the digest as a CIDv1 with the appropriate multicodec:
#![allow(unused)]
fn main() {
use fluree_db_core::content_id::{ContentId, ContentKind};
let bytes: &[u8] = /* canonical commit bytes */;
let cid = ContentId::from_bytes(ContentKind::Commit, bytes);
// String form for JSON/logs
let s = cid.to_string(); // "bafybeig..."
// Parse back
let parsed = ContentId::from_str(&s)?;
assert_eq!(cid, parsed);
}
ContentId in commit references
Commits reference parents and related artifacts by ContentId only—never by storage addresses:
{
"t": 42,
"previous": "bafybeigdyr...commitParent",
"txn": "bafkreihdwd...txnBlob",
"index": "bafybeigdyr...indexRoot"
}
ContentKind
ContentKind is an enum that maps 1:1 to multicodec values. It serves two purposes:
- Embedded in CIDs: the multicodec tag lets stores, caches, and validators identify what an object is without parsing its bytes.
- Routing: the ContentStore uses `ContentKind` to route objects to the appropriate storage tier (commit store vs index store).
#![allow(unused)]
fn main() {
pub enum ContentKind {
Commit,
Txn,
IndexRoot,
IndexBranch,
IndexLeaf,
DictBlob,
DefaultContext,
}
}
Routing by kind (replaces URL parsing)
Previously, storage routing parsed URL path segments (e.g., looking for "/commit/" in an address string). With ContentId, routing is explicit:
- `Commit` + `Txn` → commit-tier store(s)
- `IndexRoot` + `IndexBranch` + `IndexLeaf` + `DictBlob` → index-tier store(s)
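A sketch of that routing decision as a plain match over `ContentKind`; where `DefaultContext` lands is not specified above, so its placement here is illustrative only:

```rust
use fluree_db_core::content_id::ContentKind;

/// Illustrative storage tiers.
enum Tier {
    Commit,
    Index,
}

/// Route purely on the kind embedded in the CID; no URL or path parsing.
fn route(kind: ContentKind) -> Tier {
    match kind {
        ContentKind::Commit | ContentKind::Txn => Tier::Commit,
        ContentKind::IndexRoot
        | ContentKind::IndexBranch
        | ContentKind::IndexLeaf
        | ContentKind::DictBlob
        // DefaultContext placement is an assumption for this sketch.
        | ContentKind::DefaultContext => Tier::Index,
    }
}
```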
ContentStore trait
ContentStore provides content-addressed get/put operations keyed by ContentId:
#![allow(unused)]
fn main() {
#[async_trait]
pub trait ContentStore: Debug + Send + Sync {
/// Retrieve bytes by content ID
async fn get(&self, id: &ContentId) -> Result<Vec<u8>>;
/// Store bytes, returning the computed ContentId
async fn put(&self, kind: ContentKind, bytes: &[u8]) -> Result<ContentId>;
/// Check whether an object exists
async fn has(&self, id: &ContentId) -> Result<bool>;
}
}
Relationship to Storage trait
ContentStore is the primary abstraction for immutable object access. The Storage / StorageRead / ContentAddressedWrite traits handle address-routed I/O for the underlying storage backends (filesystem, S3, etc.), while ContentStore provides the content-addressed layer on top.
Implementations
- `MemoryContentStore`: In-memory `HashMap<ContentId, Vec<u8>>` for testing.
- `BridgeContentStore`: Adapter that wraps a `Storage` implementation, mapping ContentIds to physical storage addresses.
- Filesystem / S3 / IPFS: Direct implementations that store objects keyed by CID.
Layered composition
ContentStore implementations can be layered:
Local cache (filesystem)
↓ miss
Shared store (S3 / IPFS / shared filesystem)
Reads fall through from cache to shared store. Writes go to both (policy-configurable).
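A sketch of such a layered store written against the `ContentStore` trait shown earlier; import paths and the write-through policy are assumptions, not the shipped implementation:

```rust
use async_trait::async_trait;
use fluree_db_core::content_id::{ContentId, ContentKind};
use fluree_db_core::{ContentStore, Result};

/// Illustrative layered store: try the local cache first, fall through to the
/// shared store on miss.
#[derive(Debug)]
struct LayeredStore<C: ContentStore, S: ContentStore> {
    cache: C,
    shared: S,
}

#[async_trait]
impl<C: ContentStore, S: ContentStore> ContentStore for LayeredStore<C, S> {
    async fn get(&self, id: &ContentId) -> Result<Vec<u8>> {
        match self.cache.get(id).await {
            Ok(bytes) => Ok(bytes),
            // Cache miss (or cache error): fall through to the shared store.
            // A real implementation might also backfill the cache here.
            Err(_) => self.shared.get(id).await,
        }
    }

    async fn put(&self, kind: ContentKind, bytes: &[u8]) -> Result<ContentId> {
        // Write-through: both tiers receive the object (policy-configurable).
        let id = self.shared.put(kind, bytes).await?;
        let _ = self.cache.put(kind, bytes).await;
        Ok(id)
    }

    async fn has(&self, id: &ContentId) -> Result<bool> {
        Ok(self.cache.has(id).await? || self.shared.has(id).await?)
    }
}
```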
How ContentId flows through the system
Transaction path
- Transactor produces commit bytes
- `ContentId::from_bytes(ContentKind::Commit, &bytes)` computes the CID
- `content_store.put(Commit, &bytes)` stores the blob
- Nameservice head is updated: `commit_head_id = cid`, `commit_t = t`
Index path
- Indexer builds binary index, producing root descriptor bytes
- `ContentId::from_bytes(ContentKind::IndexRoot, &root_bytes)` computes the CID
- All artifacts (branches, leaves, dicts) are stored via `content_store.put()`
- Nameservice index head is updated: `index_head_id = cid`, `index_t = t`
Query path
- Query engine reads nameservice to get `index_head_id`
- `content_store.get(&index_head_id)` fetches the index root
- Index root references branches/leaves/dicts by their ContentIds
- Each artifact is fetched via `content_store.get()` (with caching)
Replication path (clone/pull/push)
- Client fetches remote nameservice heads (ContentIds + watermarks)
- Client sends `have[]` / `want[]` roots to server
- Server walks commit chain and (optionally) index graph to compute missing objects
- Missing objects streamed as `(ContentId, bytes)` pairs
- Client stores objects in local ContentStore and advances local nameservice heads
No address rewriting is needed because commits contain no storage addresses.
Implementation status
- `ContentId` type and `ContentKind` enum: `fluree-db-core/src/content_id.rs`
- `ContentStore` trait + `MemoryContentStore` + bridge adapter: `fluree-db-core/src/storage.rs`
- `Commit` and `CommitRef` use `ContentId` for all references (index pointers are tracked exclusively via nameservice, not embedded in commits)
- Nameservice records use `head_commit_id` / `index_head_id` as ContentId values
- `IndexRoot` (FIR6) references all artifacts by ContentId
- Transact and indexer paths use `ContentStore` for all object I/O
Related documentation
- Storage-agnostic commits and sync — full design rationale
- Storage traits — existing storage trait hierarchy
- Index format — binary index format (IndexRoot / FIR6)
- Nameservice schema v2 — nameservice record schema
Binary index format (leaf / leaflet / dictionaries)
This document describes the on-disk / blob-store formats used by Fluree’s binary indexes: the branch → leaf → leaflet hierarchy for fact indexes, and the dictionary artifacts used to translate between IRIs/strings and compact numeric IDs.
The intent is to make the formats easy to reason about (for debugging and tooling) and to highlight why leaf files contain multiple leaflets: it materially improves performance and cost characteristics on blob/object storage by reducing object counts and request rates while preserving fine-grained decompression and caching at the leaflet level.
Overview
A binary index build produces:
- Per-graph, per-sort-order fact indexes:
  - a content-addressed branch manifest (`FBR3`, file extension `.fbr`)
  - a set of content-addressed leaf files (`FLI3`, file extension `.fli`)
  - each leaf contains multiple leaflets (compressed blocks with independently compressed regions)
- Shared dictionary artifacts:
  - small dictionaries (predicates, graphs, datatypes, languages) embedded in the index root (CAS) and/or persisted as flat files in local builds
  - large dictionaries (subjects, strings) stored as CoW single-level B-tree-like trees (a branch manifest `DTB1` + multiple leaf blobs `DLF1`/`DLR1`)
- Manifests / roots that describe how to load the above either from a local directory layout or from the content store via `IndexRoot` (FIR6 binary format, CID-based).
Fact indexes exist in up to four sort orders (see RunSortOrder):
- SPOT: `(g, s, p, o, dt, t, op)`
- PSOT: `(g, p, s, o, dt, t, op)`
- POST: `(g, p, o, dt, s, t, op)`
- OPST: `(g, o, dt, p, s, t, op)`
Design goals
- Blob-store efficiency: keep object counts low and object sizes in a “healthy” range for S3/GCS/Azure-like stores, avoiding “many tiny objects” request overhead.
- Fast routing: branch manifest enables binary search routing to the relevant leaf range(s).
- Cheap decompression: leaflets are internally structured so query paths can decompress only what they need (e.g., Region 1 to filter before paying for Region 2).
- Content-addressed immutability: leaves/branches/dict leaves can be cached aggressively and safely, because their CAS address (or content hash filename) uniquely identifies content.
- Simple versioning: each binary artifact begins with a magic + version and can be rejected early if incompatible.
Terminology
- Leaflet: a compressed block of rows (default build target: `leaflet_rows = 25_000`).
- Leaf: a container of multiple leaflets (default: `leaflets_per_leaf = 10`) plus a directory for random access to its leaflets.
- Branch manifest: maps key ranges to leaf files; used for routing.
- Region: a separately compressed section inside a leaflet.
- Dictionary tree: a `DTB1` branch + `DLF1`/`DLR1` leaves for large keyspaces (subjects/strings).
- ContentId: a CIDv1 value that uniquely identifies a content-addressed artifact by its hash and type. See ContentId and ContentStore.
Physical layout (local build output)
When built to a filesystem directory (see IndexBuildConfig), the output layout is:
index/
index_manifest_spot.json
index_manifest_psot.json
index_manifest_post.json
index_manifest_opst.json
graph_<g_id>/
spot/
<branch_hash>.fbr
<leaf_hash_0>.fli
<leaf_hash_1>.fli
...
psot/
...
post/
...
opst/
...
The .fbr and .fli files are content-addressed by SHA-256 hex of their bytes (the filename is the hash).
index_manifest_<order>.json is a small routing manifest that points to the per-graph directory and branch hash.
Per-order index manifest (index_manifest_<order>.json)
The per-order manifest is JSON and summarizes all graphs for a sort order:
- `total_rows`: total indexed asserted facts for that order
- `max_t`: max transaction `t` in the indexed snapshot
- `graphs[]`: `g_id`, `leaf_count`, `total_rows`, `branch_hash`, and `directory` (relative path)
Root descriptor (CAS): IndexRoot (FIR6)
When publishing an index to nameservice / CAS, the canonical entrypoint is the FIR6 root
(IndexRoot, binary wire format, magic bytes FIR6).
Key properties:
- CID references for all artifacts (dicts, branches, leaves).
- Deterministic binary encoding so the root itself is suitable for content hashing to derive its own ContentId.
- Tracks `index_t` (max transaction covered) and `base_t` (earliest time for which Region 3 history is valid).
- Embeds predicate ID mapping and namespace prefix table inline, so query-time predicate IRI → `p_id` translation does not require fetching a redundant predicate dictionary blob.
- Default graph routing is inline: leaf entries (first/last key, row count, leaf CID) are embedded directly, avoiding an extra branch fetch for the common single-graph case.
- Named graph routing uses branch CID pointers: larger multi-graph setups reference branch manifests by CID.
- Optional binary sections for stats, schema, prev_index (GC chain), garbage manifest, and sketch (HLL).
- Import-only performance hint: `IndexRoot.lex_sorted_string_ids` indicates whether `StringId` assignment preserves lexicographic UTF-8 byte order of strings (true for bulk imports). Query execution can use this to avoid materializing simple string values during `ORDER BY` comparisons. This flag must be cleared on the first post-import write because incremental dictionary appends break the invariant. When the flag is absent (older roots) or false, query execution must assume no lexical ordering.
At a high level the root contains:
- Inline small dictionaries (embedded in the binary root):
  - `graph_iris[]` (dict_index → graph IRI; `g_id = dict_index + 1`)
  - `datatype_iris[]` (dt_id → datatype IRI)
  - `language_tags[]` (lang_id-1 → tag string; `lang_id = index + 1`, 0 = "no tag")
- Dictionary ContentIds (CAS artifacts):
  - tree blobs: subject/string forward & reverse (`DTB1` branch + `DLF1`/`DLR1` leaves)
  - optional per-predicate numbig arenas
  - optional per-predicate vector arenas (manifest + shards)
- Default graph routing (inline leaf entries per sort order)
- Named graph routing (branch CIDs per sort order per graph)
Branch manifest (FBR3, .fbr)
A branch manifest is a single-level index mapping key ranges to leaf files. It is written per graph per order and read via binary search to route a lookup/range scan.
File format
[BranchHeader: 16 bytes]
magic: "FBR3" (4B)
version: u8
_pad: [u8; 3]
leaf_count: u32
_reserved: u32
[LeafEntries: leaf_count × 104 bytes]
first_key: key bytes (44B, little-endian) [1]
last_key: key bytes (44B, little-endian) [1]
row_count: u64
path_offset: u32
path_len: u16
_pad: u16
[PathTable]
Concatenated UTF-8 relative paths (typically "<leaf_hash>.fli")
Notes:
- `first_key` and `last_key` use the same 44-byte key wire encoding produced by the index builder (see footnote [1]).
- The path table stores relative filenames; on read, paths are resolved against the `.fbr`'s directory.
- In local builds, paths are `<leaf_hash>.fli` to match the content-addressed leaf filenames.
[1] Key encoding note (internal): the 44-byte key is the RunRecord wire layout used by the import/index-build
pipeline and stored here only for routing. It is an internal build artifact detail (not a core runtime fact type).
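A sketch of how a reader might route over parsed branch entries, treating the 44-byte keys as opaque, already-comparable byte strings; the `LeafEntry` struct is a hypothetical decoded form of the wire layout above:

```rust
/// Decoded branch-manifest entry (illustrative, after parsing FBR3 bytes).
struct LeafEntry {
    first_key: [u8; 44],
    last_key: [u8; 44],
    path: String, // "<leaf_hash>.fli", relative to the .fbr directory
}

/// Return the leaf paths whose [first_key, last_key] ranges may overlap the
/// scan range [lo, hi]. Entries are sorted by first_key, so a binary search
/// finds the starting leaf and a forward walk collects the rest.
fn route<'a>(entries: &'a [LeafEntry], lo: &[u8; 44], hi: &[u8; 44]) -> Vec<&'a str> {
    // First leaf whose last_key >= lo (earlier leaves end before the range).
    let start = entries.partition_point(|e| e.last_key.as_slice() < lo.as_slice());
    entries[start..]
        .iter()
        .take_while(|e| e.first_key.as_slice() <= hi.as_slice()) // stop once leaves begin past hi
        .map(|e| e.path.as_str())
        .collect()
}
```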
Leaf file (FLI3, .fli)
A leaf file groups multiple leaflets into a single blob, and includes a small directory so leaflets can be accessed without scanning the entire file.
File format
[LeafHeader: variable size]
magic: "FLI3" (4B)
version: u8 (currently 1)
order: u8
dt_width: u8 (currently 1; may widen to 2)
p_width: u8 (2=u16, 4=u32)
total_rows: u64
first_key: SortKey (28B)
last_key: SortKey (28B)
[LeafletDirectory: leaflet_count × 40B] (v2: 28B, lacks first_o_*)
offset: u64
compressed_len: u32
row_count: u32
first_s_id: u64
first_p_id: u32
first_o_kind: u8 (v3+)
_pad: [u8; 3] (v3+)
first_o_key: u64 (v3+)
[LeafletData: concatenated encoded leaflets]
The v3 leaflet directory adds first_o_kind and first_o_key to each entry.
These fields enable leaflet-boundary skip-decoding: if two adjacent leaflet
directory entries share the same (p_id, o_kind, o_key), the entire earlier
leaflet is guaranteed to contain only that (p, o) combination. Fast-path
COUNT + GROUP BY operators use this property to count rows by row_count
without decompressing Region 1, which significantly reduces CPU and I/O for
large predicate scans. v2 leaves (which lack these fields) are still readable
but always require full leaflet decoding.
SortKey (leaf routing key)
SortKey is a compact 28-byte key stored in leaf headers:
g_id: u32
s_id: u64
p_id: u32
dt: u16
o_kind: u8
_pad: u8
o_key: u64
SortKey exists to reduce leaf header overhead; the branch manifest uses full RunRecord boundaries.
It also intentionally omits t, op, lang_id, and i — leaf header keys are useful for coarse
metadata and diagnostics, while precise routing is done via the branch’s full RunRecord ranges.
Why “leaf contains leaflets” (blob-store optimization)
If every leaflet were its own object:
- range scans and joins would issue many more GETs (request overhead dominates)
- caches would be pressured by object metadata overhead and higher churn
By grouping N leaflets into one leaf object:
- we reduce object count and request rate roughly by a factor of N
- we still keep leaflet-sized “micro-partitions” internally for:
- selective decompression (region-by-region)
- caching hot leaflets (decoded) independent of unrelated ones
- future optimizations like ranged reads (leaflet offsets are explicit)
The default build targets (leaflet_rows = 25_000, leaflets_per_leaf = 10) yield a leaf that is
large enough to amortize object-store overhead but still small enough to cache and move efficiently.
Leaflet format (compressed block inside a leaf)
A leaflet is a compressed block of rows containing three regions. Each region is independently zstd-compressed.
Leaflet header (fixed 61 bytes)
row_count: u32
region1_offset: u32
region1_compressed_len: u32
region1_uncompressed_len: u32
region2_offset: u32
region2_compressed_len: u32
region2_uncompressed_len: u32
region3_offset: u32
region3_compressed_len: u32
region3_uncompressed_len: u32
first_s_id: u64
first_p_id: u32
first_o_kind: u8
first_o_key: u64
Regions
- Region 1 (core columns): order-dependent layout optimized for scan/join filtering.
  - includes an RLE-encoded "primary" column (e.g., `s_id` in SPOT)
  - stores the other core columns as dense arrays
  - `p_id` may be stored as `u16` or `u32` depending on dictionary cardinality (`p_width`)
- Region 2 (metadata columns): values needed to reconstruct full flakes (datatype, transaction time, etc.).
  - stored in a layout that supports sparse `lang_id` and `i` without per-row overhead
  - `dt` is stored as `u8` today (`dt_width = 1`) and may widen to `u16`
- Region 3 (history journal): optional operation log to support time-travel semantics from `base_t` onward.
  - stored as a sequence of fixed-size entries in reverse chronological order (newest first)
Region 1 layouts (uncompressed)
Region 1’s uncompressed bytes vary by sort order:
- SPOT: `RLE(s_id:u64)`, `p_id[p_width]`, `o_kind[u8]`, `o_key[u64]`
- PSOT: `RLE(p_id:u32)`, `s_id[u64]`, `o_kind[u8]`, `o_key[u64]`
- POST: `RLE(p_id:u32)`, `o_kind[u8]`, `o_key[u64]`, `s_id[u64]`
- OPST: `RLE(o_key:u64)`, `p_id[p_width]`, `s_id[u64]`
  - OPST leaflets are type-homogeneous (segmented by `o_type`), so the per-row object type column can be omitted and stored as a constant in the leaflet directory entry. When a leaflet contains mixed types in other orders, `o_type` is stored as a per-row column.
RLE encoding is:
run_count: u32
[(key, run_len)] × run_count
with (key=u64, run_len=u32) or (key=u32, run_len=u32) depending on the field.
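A minimal decoder sketch for the u64-keyed variant, assuming little-endian integers per the conventions later in this document:

```rust
/// Decode the RLE layout described above into a dense u64 column:
/// run_count: u32, then (key: u64, run_len: u32) pairs, all little-endian.
fn decode_rle_u64(buf: &[u8]) -> Option<Vec<u64>> {
    let run_count = u32::from_le_bytes(buf.get(0..4)?.try_into().ok()?) as usize;
    let mut out = Vec::new();
    let mut pos = 4;
    for _ in 0..run_count {
        let key = u64::from_le_bytes(buf.get(pos..pos + 8)?.try_into().ok()?);
        let run_len = u32::from_le_bytes(buf.get(pos + 8..pos + 12)?.try_into().ok()?) as usize;
        out.extend(std::iter::repeat(key).take(run_len)); // expand the run
        pos += 12;
    }
    Some(out)
}
```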
Region 2 layout (uncompressed)
dt: [dt_width bytes] × row_count
t: [i64] × row_count
lang_bitmap: u8 × ceil(row_count/8)
lang_values: u16 × popcount(lang_bitmap)
i_bitmap: u8 × ceil(row_count/8)
i_values: i32 × popcount(i_bitmap)
- `lang_id` is 0 when absent; otherwise stored in `lang_values` keyed by bitmap position.
- `i` uses `ListIndex::none()` (sentinel) when absent; otherwise stored sparsely.
Region 3 layout (uncompressed)
Region 3 is an operation journal stored newest-first:
entry_count: u32
[Region3Entry; entry_count] // 37 bytes per entry
Region3Entry wire layout (37 bytes):
s_id: u64
p_id: u32
o_kind: u8
o_key: u64
t_signed: i64 // positive = assert, negative = retract, abs() = t
dt: u16
lang_id: u16
i: i32
Dictionary artifacts
Binary indexes store facts in numeric-ID form. Dictionaries are required to:
- translate query inputs (IRIs, strings) to numeric IDs for scans
- decode numeric IDs back to user-visible values when returning flakes
Small flat dictionaries (FRD1)
Several dictionaries use a simple “count + length-prefixed UTF-8” format:
magic: "FRD1" (4B)
count: u32
for each entry:
len: u32
utf8_bytes: [u8; len]
This format is used for predicate-like dictionaries. In local builds these are written
as flat files (e.g., graphs.dict, datatypes.dict, languages.dict), but in CAS
publishes (FIR6 root) these small dictionaries are embedded inline in the binary root.
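A minimal FRD1 reader sketch, assuming little-endian integers per the document's conventions; real readers should also bound-check total sizes:

```rust
/// Parse an FRD1 blob ("count + length-prefixed UTF-8") into a Vec<String>.
fn read_frd1(buf: &[u8]) -> Option<Vec<String>> {
    if buf.get(0..4)? != b"FRD1" {
        return None; // wrong magic
    }
    let count = u32::from_le_bytes(buf.get(4..8)?.try_into().ok()?) as usize;
    let mut entries = Vec::with_capacity(count);
    let mut pos = 8;
    for _ in 0..count {
        let len = u32::from_le_bytes(buf.get(pos..pos + 4)?.try_into().ok()?) as usize;
        pos += 4;
        let bytes = buf.get(pos..pos + len)?;
        entries.push(String::from_utf8(bytes.to_vec()).ok()?);
        pos += len;
    }
    Some(entries)
}
```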
Legacy forward files + index (FSI1) (primarily build-time)
Some build paths still write a forward file (*.fwd) plus a separate index (*.idx):
FSI1 index format:
magic: "FSI1" (4B)
count: u32
offsets: [u64] × count
lens: [u32] × count
The forward file itself is a raw concatenation of bytes; access is via (offset,len) from the index.
Large dictionaries as CoW trees (DTB1 + leaf blobs)
Subjects and strings are large enough that we represent them as single-level CoW trees:
- Branch: `DTB1` mapping key ranges to leaf ContentIds
- Leaves:
  - forward leaf (`DLF1`): numeric ID → value bytes
  - reverse leaf (`DLR1`): key bytes → numeric ID
Dictionary branch (DTB1)
[magic: 4B "DTB1"]
[leaf_count: u32]
[offset_table: u32 × leaf_count] // byte offset of each leaf entry
[leaf entries...]
entry :=
[first_key_len: u32] [first_key_bytes]
[last_key_len: u32] [last_key_bytes]
[entry_count: u32]
[content_id_len: u16] [content_id_bytes]
Keys are treated as raw bytes and compared lexicographically. For forward trees keyed by numeric ID, the branch uses 8-byte big-endian keys (so lexical order matches numeric order).
Forward dict leaf (DLF1)
[magic: 4B "DLF1"]
[entry_count: u32]
[offset_table: u32 × entry_count]
[data section]
entry := [id: u64 LE] [value_len: u32] [value_bytes]
Reverse dict leaf (DLR1)
[magic: 4B "DLR1"]
[entry_count: u32]
[offset_table: u32 × entry_count]
[data section]
entry := [key_len: u32] [key_bytes] [id: u64 LE]
Subject reverse key format is:
[ns_code: u16 BE][suffix bytes]
The u16 big-endian prefix ensures that lexicographic byte comparisons match logical (ns_code, suffix) ordering.
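A tiny sketch of building that key:

```rust
/// Build a subject reverse-dictionary key: u16 namespace code in big-endian,
/// followed by the raw suffix bytes, so byte-wise comparison matches
/// (ns_code, suffix) ordering.
fn subject_reverse_key(ns_code: u16, suffix: &[u8]) -> Vec<u8> {
    let mut key = Vec::with_capacity(2 + suffix.len());
    key.extend_from_slice(&ns_code.to_be_bytes());
    key.extend_from_slice(suffix);
    key
}
```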
Endianness and encoding conventions
- Numeric fields in file formats are little-endian, unless explicitly stated otherwise.
- Subject reverse keys embed `ns_code` in big-endian for byte-sort correctness.
- Compression is currently zstd via independent region compression within a leaflet.
- Fact keys are keyed by numeric IDs; ID assignment is provided by dictionary artifacts and/or the root.
Integrity, caching, and lifecycle
- Leaf and branch filenames (local) are derived from SHA-256 content hashes; remote references use ContentId (CIDv1).
- Content-addressed artifacts are immutable; caches can key by ContentId.
- `IndexRoot` (FIR6) provides a GC chain (prev_index) and an optional garbage manifest pointer to support retention-based cleanup of replaced artifacts.
Versioning notes
- Fact artifacts:
  - branch: magic `FBR3`, version `1`
  - leaf: magic `FLI3`, version `1`
- Dictionary tree artifacts:
  - branch: magic `DTB1`
  - leaves: magic `DLF1` / `DLR1`
- Small dict blobs: magic `FRD1`
When adding new fields, prefer:
- bumping the per-file `version` byte (when present), and
- keeping old readers strict (fail fast on unsupported versions) to avoid silent corruption.
Namespace allocation and fallback modes
Fluree encodes IRIs as compact SIDs: a (ns_code, local) pair where:
- `ns_code` is a `u16` namespace code that identifies an IRI prefix
- `local` is the remaining suffix (bytes) after removing the matched prefix
The database maintains a namespace table (LedgerSnapshot.namespace_codes: ns_code -> prefix string).
That table is embedded in the published index root and is loaded whenever a LedgerSnapshot is opened.
This document describes how Fluree chooses a namespace prefix for an IRI, and how it mitigates datasets that would otherwise allocate an excessive number of distinct namespace prefixes.
Goals
- Keep declared namespaces intact: if a dataset declares `@prefix foo: <...>`, we want IRIs in that namespace to use that exact prefix, not a derived/split prefix.
- Stable behavior across writes: after importing an "outlier" dataset, subsequent transactions should continue using the same fallback rules for previously unseen IRIs (e.g. new hosts), avoiding regression back to finer-grained splitting.
- Contain namespace explosion: avoid allocating one namespace code per highly-specific leaf (e.g. splitting on the last `/` for IRIs whose paths are effectively unique).
Core rule: declared-prefix trie match wins
Namespace resolution is trie-first:
- Load all known prefixes (predefined defaults + DB namespace table) into a byte-level trie.
- For each IRI, perform a longest-prefix match.
- If a match is found, emit `Sid(ns_code, iri[prefix_len..])` and do not run fallback logic.
Only IRIs with no matching prefix fall through to the fallback splitter.
Implementation: fluree-db-transact/src/namespace.rs
- `NamespaceRegistry::sid_for_iri` (transactions, serial paths)
- `SharedNamespaceAllocator::sid_for_iri` (parallel bulk import)
Fallback split modes (only for unmatched IRIs)
Fluree uses a small set of fallback “splitters” that derive (prefix, local) for IRIs that do not
match any known prefix.
The active fallback behavior is represented by NsFallbackMode:
- `LastSlashOrHash` (default): split on the last `/` or `#` (prefix is inclusive)
- `CoarseHeuristic` (outlier mitigation):
  - http(s): usually `scheme://host/<seg1>/`
  - special-case: DBLP-style `.../pid/<digits>/` buckets may keep 2 segments
  - non-http(s) with `:` but no `/` or `#`: split at the 2nd `:` when present (e.g. `urn:isbn:`), else the 1st `:`
- `HostOnly` ("fallback to the fallback"; see the sketch after this list):
  - http(s): `scheme://host/`
  - non-http(s) with `:` but no `/` or `#`: split at the 1st `:`
  - else: last-slash-or-hash
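A sketch of the http(s) branch of `HostOnly`, using plain string operations; the non-http(s) branches follow the other rules listed above:

```rust
/// Illustrative HostOnly split for http(s) IRIs: the prefix is
/// "scheme://host/" (inclusive) and the local part is everything after it.
fn host_only_split(iri: &str) -> Option<(&str, &str)> {
    let scheme_end = iri.find("://")?;
    let host_start = scheme_end + 3;
    // The first '/' after the host terminates the prefix.
    match iri[host_start..].find('/') {
        Some(rel) => {
            let split = host_start + rel + 1;
            Some((&iri[..split], &iri[split..]))
        }
        None => None, // no path: fall back to the other split rules
    }
}

fn main() {
    assert_eq!(
        host_only_split("http://some-unseen-host/blah/123/456"),
        Some(("http://some-unseen-host/", "blah/123/456"))
    );
}
```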
Implementation: fluree-db-transact/src/namespace.rs
Bulk import: streaming preflight + dynamic mitigation
For large Turtle streaming imports, Fluree attempts to detect “namespace explosion” early without an extra I/O pass:
- `StreamingTurtleReader` samples bounded byte windows within the first chunk region and counts distinct prefixes under `LastSlashOrHash`.
- If the sample exceeds a budget (`NS_PREFLIGHT_BUDGET`, currently 255), the reader publishes a preflight result recommending mitigation.
- The import forwarder enables `CoarseHeuristic` on the shared allocator before parsing begins (so the earliest allocations are already coarse).
- If allocations under `CoarseHeuristic` still grow beyond the u8-ish threshold (>255), the shared allocator switches to `HostOnly` so new, unseen hosts do not allocate deeper-than-host namespaces.
Implementation:
- Preflight detector: `fluree-graph-turtle/src/splitter.rs`
- Policy application: `fluree-db-api/src/import.rs`
- Runtime switch: `SharedNamespaceAllocator::get_or_allocate` in `fluree-db-transact/src/namespace.rs`
Transactions after import: preventing regression for unseen IRIs
Bulk import can upgrade fallback behavior at runtime (shared allocator). For subsequent normal
transactions, we also need “outlier mode” to persist so new IRIs do not regress to LastSlashOrHash.
Fluree derives this from the DB’s namespace table at open time:
- When a `LedgerSnapshot` is opened, `NamespaceRegistry::from_db(db)` loads `db.namespace_codes`.
- If the DB has already allocated namespace codes beyond the u8-ish threshold (>255), the registry sets its fallback mode to `HostOnly`.
That means a new IRI like:
http://some-unseen-host/blah/123/456
will allocate (if needed) at:
http://some-unseen-host/
instead of falling back to a finer last-slash split.
Implementation: NamespaceRegistry::from_db and NamespaceRegistry::sid_for_iri in
fluree-db-transact/src/namespace.rs
Notes and trade-offs
- `HostOnly` can still result in many namespaces if a dataset genuinely contains many distinct hosts (one per host), but it prevents the deeper fragmentation that is common in path-heavy IRIs.
- The `OVERFLOW` namespace code is a sentinel used when `u16` codes are exhausted; it is not a fallback mode. Overflow SIDs store the full IRI as the SID name.
Ontology imports (f:schemaSource + owl:imports)
Reasoning in Fluree needs to see a ledger’s ontology — class and property hierarchies, OWL axioms — even when those triples don’t live in the same graph as the instance data being queried. This document describes how that binding is configured, resolved, and plumbed into the reasoning pipeline.
Topics:
- Config-layer contract (`f:schemaSource`, `f:followOwlImports`, `f:ontologyImportMap`).
- Resolution algorithm for the `owl:imports` closure.
- `SchemaBundleOverlay`: how the resolved closure is presented to the reasoner without changing reasoner internals.
- Caching, error semantics, and the schema-triple whitelist.
Related docs:
Configuration
Reasoning config is declared in the ledger’s config graph (g_id=2), on the
f:LedgerConfig resource’s f:reasoningDefaults. Three fields drive
ontology resolution:
@prefix f: <https://ns.flur.ee/db#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
GRAPH <urn:fluree:myapp:main#config> {
<urn:myapp:config> a f:LedgerConfig ;
f:reasoningDefaults <urn:myapp:config:reasoning> .
<urn:myapp:config:reasoning>
f:reasoningModes ( "rdfs" "owl2-rl" ) ;
f:schemaSource <urn:myapp:config:schema-ref> ;
f:followOwlImports true ;
f:ontologyImportMap <urn:myapp:config:bfo-binding> .
<urn:myapp:config:schema-ref> a f:GraphRef ;
f:graphSource <urn:myapp:config:schema-source> .
<urn:myapp:config:schema-source>
f:graphSelector <http://example.org/ontology/core> .
<urn:myapp:config:bfo-binding>
f:ontologyIri <http://purl.obolibrary.org/obo/bfo.owl> ;
f:graphRef <urn:myapp:config:bfo-ref> .
<urn:myapp:config:bfo-ref> a f:GraphRef ;
f:graphSource <urn:myapp:config:bfo-source> .
<urn:myapp:config:bfo-source>
f:graphSelector <http://example.org/ontology/local/bfo> .
}
Field reference:
| Field | Type | Meaning |
|---|---|---|
| f:schemaSource | f:GraphRef | Starting graph for schema extraction. When absent, reasoning uses the default graph directly. |
| f:followOwlImports | xsd:boolean | When true, resolve the transitive closure of owl:imports triples starting from f:schemaSource. When absent or false, the bundle contains only the starting graph. |
| f:ontologyImportMap | list of OntologyImportBinding | Mapping table from external ontology IRIs to local graphs. Consulted when an owl:imports IRI doesn't match a named graph in the current ledger. |
An OntologyImportBinding has two fields:
- `f:ontologyIri`: the IRI that appears in `owl:imports` statements.
- `f:graphRef`: a nested `f:GraphRef` identifying the local graph.
The GraphRef shape supported for f:schemaSource and
f:ontologyImportMap.graphRef is the same-ledger shape:
f:graphSelector naming a local named graph, f:defaultGraph, or a
registered graph IRI. References are resolved at the query’s effective
to_t — every named graph in a Fluree ledger shares the ledger’s
monotonic t, so the entire closure is consistent at a single point in
time without per-import bookkeeping.
Resolution algorithm
For each owl:imports <X> triple discovered while walking the closure, the
resolver (fluree_db_api::ontology_imports::resolve_schema_bundle) applies
this order:
- Named-graph match: if `<X>` is registered as a graph IRI in the current ledger's GraphRegistry, resolve to that `GraphId`.
- Mapping-table fallback: if `<X>` appears in `f:ontologyImportMap`, resolve via the bound `GraphSourceRef`.
- Strict error: otherwise, fail the query with `ApiError::OntologyImport`. There is no silent skip.
The walk is BFS, deduplicated by resolved GraphId, and cycle-safe by
construction (we only push unseen IDs onto the queue). The result is a
ResolvedSchemaBundle { ledger_id, to_t, sources: Vec<GraphId> }.
System graphs are off-limits
Imports resolving to CONFIG_GRAPH_ID (g_id=2) or TXN_META_GRAPH_ID
(g_id=1) are rejected — those graphs are structurally reserved and would
leak framework triples into reasoning. The guard sits in the single
resolve_local_graph_source chokepoint, so every resolution path
(direct graph-IRI match, f:ontologyImportMap entry, f:schemaSource
selector) is covered.
owl:imports discovery is subject-wildcarded
Every ?s owl:imports ?o triple in a schema graph is treated as
authoritative, regardless of whether ?s is typed owl:Ontology. This is
broader than strict OWL 2 (which restricts owl:imports to the ontology
header) and matches real-world OWL inputs that rely on file-level
provenance. The resolution layer’s strictness still applies: a stray
owl:imports triple that doesn’t map to a local graph fails the query
rather than silently expanding the closure.
Reasoning-disabled queries don’t trigger resolution
Queries that opt out of reasoning ("reasoning": "none") skip bundle
resolution entirely — a broken ontology import in the ledger’s config
shouldn’t produce errors for a non-reasoning workload. The short-circuit
lives in attach_schema_bundle (both the single-view and dataset paths).
Projecting the bundle into reasoning
RDFS and OWL extraction code reads schema triples out of the default graph
(g_id=0). The resolver feeds that code via a
SchemaBundleOverlay that
projects whitelisted triples from every bundle source onto g_id=0,
so the reasoner sees the full closure without being aware of it.
The projection happens in two phases:
- Materialize. `build_schema_bundle_flakes` runs targeted reads against every source graph (one PSOT scan per schema predicate and one OPST scan per schema class) and collects the matching flakes into per-index sorted arrays (SPOT / PSOT / POST / OPST). Reads go through the normal `range_with_overlay` path, so both committed index data and novelty are visible.
- Overlay. `SchemaBundleOverlay::new(base_overlay, flakes)` wraps the query's base overlay. For `g_id != 0` it delegates straight to the base. For `g_id == 0` it emits a linear merge of base flakes and bundle flakes in index order.
The reasoner sees: base default-graph flakes ∪ projected schema flakes,
presented as a single ordered stream at g_id=0. Reasoner code is
unmodified.
Schema-triple whitelist
Only the following predicates are eligible for projection:
- RDFS: `rdfs:subClassOf`, `rdfs:subPropertyOf`, `rdfs:domain`, `rdfs:range`
- OWL: `owl:inverseOf`, `owl:equivalentClass`, `owl:equivalentProperty`, `owl:sameAs`, `owl:imports`
And rdf:type triples are projected only when the object is one of:
owl:Class, owl:ObjectProperty, owl:DatatypeProperty,
owl:SymmetricProperty, owl:TransitiveProperty, owl:FunctionalProperty,
owl:InverseFunctionalProperty, owl:Ontology, rdf:Property.
Anything else in an import graph — in particular, instance data —
does not surface in the reasoner’s view. See
fluree_db_core::{is_schema_predicate, is_schema_class} for the canonical
checks and
fluree-db-api/tests/it_reasoning_imports.rs::instance_data_in_schema_graph_does_not_leak
for the regression test.
Caching
global_schema_bundle_cache() is a process-wide moka::sync::Cache keyed
by:
- `ledger_id: Arc<str>`
- `to_t: i64`
- `starting_g_id: GraphId` (the resolved `f:schemaSource`)
- `follow_imports: bool`
Because config lives in the same ledger (g_id=2) and any config change
advances t, the to_t dimension is sufficient to express “config
version” — there is no separate config_epoch key, and no explicit
invalidation logic. Stale entries age out via LRU.
The cache stores the resolution result (Vec<GraphId>); the projected
flake arrays are rebuilt per query. Materialization is cheap relative to
reasoning itself, and keeping the cached value small lets many entries
coexist for many ledgers without memory pressure.
Error semantics
ApiError::OntologyImport is raised when the configured closure is
invalid. Every message identifies the offending resource and suggests
remediation. Queries fail rather than silently returning reduced results,
so broken ontology references surface early. Sources of this error:
- An `owl:imports <X>` that doesn't match a local named graph and has no `f:ontologyImportMap` entry.
- A resolution that would land on a reserved system graph (config or txn-meta), whether via direct graph-IRI match, mapping table, or `f:schemaSource` selector.
- A `GraphRef` that targets a different ledger, uses `f:atT`, or carries a `f:trustPolicy` / `f:rollbackGuard`. The bundle is resolved at the query's single `to_t`, same-ledger scope only, and accepting these fields silently would create a gap between declared intent and actual behavior.
Wiring at query time
Fluree::query(&db, ...) (and the dataset-query counterpart) call
build_executable_for_view → attach_schema_bundle on every query. The
attach step:
- Reads `db.resolved_config().reasoning`. If there is no `f:schemaSource`, returns immediately; the legacy default-graph path applies unchanged.
- Calls `resolve_schema_bundle` for the closure, consulting the cache.
- Materializes `SchemaBundleFlakes` via `build_schema_bundle_flakes`.
- Sets `executable.options.schema_bundle` so `prepare_execution` wraps `db.overlay` in a `SchemaBundleOverlay` for the reasoning_prep block.
Downstream, schema_hierarchy_with_overlay, reason_owl2rl, and
Ontology::from_db_with_overlay all receive the same wrapped overlay and
see the full closure on g_id=0 reads.
Testing
The acceptance suite lives in
fluree-db-api/tests/it_reasoning_imports.rs and covers:
- Same-ledger auto resolution of a named schema source.
- Transitive `A → B` with a subclass edge in `B`.
- Mapping table fallback for external IRIs.
- Unresolved imports surface as `ApiError::OntologyImport`.
- Cycle `A → B → A` terminates and still yields the correct closure.
- Mapping entries that would target a reserved system graph are rejected.
- `"reasoning": "none"` queries skip resolution entirely (no spurious errors from unrelated config).
- `f:atT` on a `GraphRef` is rejected with a clear message.
- Instance data in the schema graph does not leak into query results.
- End-to-end OWL2-RL rule firing through a transitive import: `owl:TransitiveProperty`, `owl:inverseOf`, and `rdfs:domain` axioms declared in an imported graph produce the expected entailments against instance data in the default graph.
Module-level unit tests cover the cache keys, empty-bundle passthrough, and non-default-graph delegation.
Storage Traits Design
This document describes the storage trait architecture in Fluree DB, explaining the design rationale and providing guidance for implementing new storage backends.
Overview
Fluree uses a layered storage abstraction that separates:
- Content-addressed access (`fluree-db-core`): The `ContentStore` trait provides get/put/has operations keyed by `ContentId` (CIDv1). This is the primary interface for all immutable artifact access (commits, index roots, leaves, dicts).
- Physical storage traits (`fluree-db-core`): Runtime-agnostic storage operations (`StorageRead`, `StorageWrite`, `ContentAddressedWrite`) with standard `Result<T>` error handling. These handle the physical I/O layer beneath ContentStore.
- Extension traits (`fluree-db-nameservice`): Nameservice-specific operations with `StorageExtResult<T>` for richer error semantics (CAS operations, pagination, etc.).
See ContentId and ContentStore for the content-addressed identity model.
Quick Start: The Prelude
For convenient imports, use the storage prelude:
#![allow(unused)]
fn main() {
use fluree_db_core::prelude::*;
// Now you have access to:
// - Storage, StorageRead, StorageWrite, ContentAddressedWrite (traits)
// - MemoryStorage, FileStorage (implementations)
// - ContentKind, ContentWriteResult, ReadHint (types)
async fn example<S: Storage>(storage: &S) -> Result<()> {
let bytes = storage.read_bytes("some/address").await?;
storage.write_bytes("other/address", &bytes).await?;
Ok(())
}
}
For API consumers, fluree-db-api re-exports all storage traits:
#![allow(unused)]
fn main() {
use fluree_db_api::{Storage, StorageRead, MemoryStorage};
}
Trait Hierarchy
┌──────────────────────┐
│ ContentStore │ get(ContentId), put(ContentKind, bytes), has(ContentId)
└──────────────────────┘
(primary interface for immutable artifacts)
┌─────────────────┐
│ StorageRead │ read_bytes, exists, list_prefix
└────────┬────────┘
│
┌────────┴────────┐
│ StorageWrite │ write_bytes, delete
└────────┬────────┘
│
┌──────────────┴──────────────┐
│ ContentAddressedWrite │ content_write_bytes[_with_hash]
└──────────────┬──────────────┘
│
┌────────┴────────┐
│ Storage │ (marker trait - blanket impl)
└─────────────────┘
(physical I/O layer)
ContentStore is the content-addressed layer that sits above the physical storage traits. It maps ContentId values to physical storage locations via the underlying Storage implementation.
ContentStore (fluree-db-core)
The ContentStore trait is the primary interface for accessing immutable, content-addressed artifacts (commits, index roots, leaves, dictionaries, etc.).
#![allow(unused)]
fn main() {
#[async_trait]
pub trait ContentStore: Debug + Send + Sync {
/// Retrieve bytes by content ID
async fn get(&self, id: &ContentId) -> Result<Vec<u8>>;
/// Store bytes, returning the computed ContentId
async fn put(&self, kind: ContentKind, bytes: &[u8]) -> Result<ContentId>;
/// Check whether an object exists
async fn has(&self, id: &ContentId) -> Result<bool>;
}
}
Design notes:
- `ContentId` is a CIDv1 value encoding the hash function, digest, and content kind (multicodec). See ContentId and ContentStore.
- `ContentKind` enables routing to different storage tiers (commit store vs index store) without parsing URL paths.
- `put` computes the content hash and returns the derived `ContentId`.
- Implementations include `MemoryContentStore` (for testing) and `BridgeContentStore` (adapts a `Storage` backend).
Physical Storage Traits (fluree-db-core)
The physical storage traits handle raw byte I/O against storage backends (filesystem, S3, memory). ContentStore implementations typically wrap these.
StorageRead
Read-only storage operations. Implement this for any storage that can retrieve data.
#![allow(unused)]
fn main() {
#[async_trait]
pub trait StorageRead: Debug + Send + Sync {
/// Read raw bytes from an address
async fn read_bytes(&self, address: &str) -> Result<Vec<u8>>;
/// Read with a hint for content type optimization
/// Default implementation ignores the hint
async fn read_bytes_hint(&self, address: &str, hint: ReadHint) -> Result<Vec<u8>> {
self.read_bytes(address).await
}
/// Check if an address exists
async fn exists(&self, address: &str) -> Result<bool>;
/// Read a `[start, end)` byte range from an address.
/// Default implementation fetches the full object and slices; backends
/// that support native range reads (S3, HTTP) should override.
async fn read_byte_range(&self, address: &str, range: std::ops::Range<u64>)
-> Result<Vec<u8>>;
/// List all addresses with a given prefix
async fn list_prefix(&self, prefix: &str) -> Result<Vec<String>>;
/// List addresses under a prefix together with byte sizes.
/// Default implementation returns an error indicating the backend does
/// not support cheap metadata listing. Backends with native list+size
/// (S3 `list_objects_v2`, GCS, etc.) should override.
async fn list_prefix_with_metadata(&self, prefix: &str)
-> Result<Vec<RemoteObject>>;
/// Resolve a CAS address to a local filesystem path, if available.
fn resolve_local_path(&self, address: &str) -> Option<PathBuf> { None }
}
/// `(address, size)` pair returned by `list_prefix_with_metadata`.
pub struct RemoteObject {
pub address: String,
pub size_bytes: u64,
}
}
Design notes:
- `read_bytes_hint` enables optimizations like returning pre-encoded flakes for leaf nodes
- `read_byte_range` allows partial reads against backends with native HTTP/S3 range support; the default impl is correct but does N full-object fetches for N range reads
- `list_prefix` is essential for garbage collection and administrative operations
- `list_prefix_with_metadata` is used by the bulk-import remote-source path so the importer can size each chunk before fetching. Backends without cheap size metadata return an error; callers can fall back to caller-supplied object lists
- `resolve_local_path` lets callers (e.g., import scratch staging) skip a copy when the storage already exposes data on the local filesystem (FileStorage)
- All methods return `fluree_db_core::Result<T>` (alias for `std::result::Result<T, Error>`)
StorageWrite
Mutating storage operations. Implement alongside StorageRead for read-write storage.
#![allow(unused)]
fn main() {
#[async_trait]
pub trait StorageWrite: Debug + Send + Sync {
/// Write raw bytes to an address
async fn write_bytes(&self, address: &str, bytes: &[u8]) -> Result<()>;
/// Delete data at an address
async fn delete(&self, address: &str) -> Result<()>;
}
}
Design notes:
- `delete` is part of the core write trait (not separate) because any writable storage should support deletion
- Implementations should be idempotent: deleting a non-existent address succeeds silently
ContentAddressedWrite
Extension trait for content-addressed (hash-based) writes. Extends StorageWrite.
#![allow(unused)]
fn main() {
#[async_trait]
pub trait ContentAddressedWrite: StorageWrite {
/// Write bytes with a pre-computed content hash
/// Returns the canonical address and metadata
async fn content_write_bytes_with_hash(
&self,
kind: ContentKind,
ledger_id: &str,
content_hash_hex: &str,
bytes: &[u8],
) -> Result<ContentWriteResult>;
/// Write bytes, computing the hash internally
/// Default implementation computes SHA-256 and delegates
async fn content_write_bytes(
&self,
kind: ContentKind,
ledger_id: &str,
bytes: &[u8],
) -> Result<ContentWriteResult> {
let hash = sha256_hex(bytes);
self.content_write_bytes_with_hash(kind, ledger_id, &hash, bytes).await
}
}
}
Design notes:
- `ContentKind` indicates whether data is a commit or index, enabling routing to different storage tiers
- The default `content_write_bytes` implementation handles hash computation, so most backends only need to implement `content_write_bytes_with_hash`
- Content-addressed storage enables deduplication and integrity verification
Storage (Marker Trait)
A convenience marker trait indicating full storage capability.
#![allow(unused)]
fn main() {
/// Full storage capability: read + content-addressed write
pub trait Storage: StorageRead + ContentAddressedWrite {}
/// Blanket implementation for any type implementing both traits
impl<T: StorageRead + ContentAddressedWrite> Storage for T {}
}
Usage:
#![allow(unused)]
fn main() {
// Instead of this verbose bound:
fn process<S: StorageRead + StorageWrite + ContentAddressedWrite>(storage: &S)
// Use this:
fn process<S: Storage>(storage: &S)
}
Extension Traits (fluree-db-nameservice)
The nameservice crate defines additional traits with StorageExtResult<T> for richer error handling (e.g., PreconditionFailed for CAS operations).
StorageList
Paginated listing for large-scale storage backends.
#![allow(unused)]
fn main() {
#[async_trait]
pub trait StorageList {
async fn list_prefix(&self, prefix: &str) -> StorageExtResult<Vec<String>>;
async fn list_prefix_paginated(
&self,
prefix: &str,
continuation_token: Option<String>,
max_keys: usize,
) -> StorageExtResult<ListResult>;
}
}
StorageCas
Compare-and-swap operations for consistent distributed updates.
#![allow(unused)]
fn main() {
#[async_trait]
pub trait StorageCas {
/// Write only if the address doesn't exist
async fn write_if_absent(&self, address: &str, bytes: &[u8]) -> StorageExtResult<bool>;
/// Write only if the current version matches expected_etag
async fn write_if_match(
&self,
address: &str,
bytes: &[u8],
expected_etag: &str,
) -> StorageExtResult<String>;
/// Read with version/etag for subsequent CAS operations
async fn read_with_etag(&self, address: &str) -> StorageExtResult<(Vec<u8>, String)>;
}
}
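A sketch of the typical read-modify-write loop these methods enable; import paths are assumed, and the conflict test is passed in as a closure because the exact `StorageExtError` variant shape is backend-specific:

```rust
use fluree_db_nameservice::{StorageCas, StorageExtError, StorageExtResult};

/// Illustrative optimistic update over the CAS trait: read the current value
/// with its etag, apply an update function, and retry a bounded number of
/// times when the conditional write is rejected by a concurrent writer
/// (the PreconditionFailed case).
async fn update_with_cas<S>(
    store: &S,
    address: &str,
    update: impl Fn(&[u8]) -> Vec<u8>,
    is_conflict: impl Fn(&StorageExtError) -> bool,
    max_retries: usize,
) -> StorageExtResult<String>
where
    S: StorageCas + Sync,
{
    let mut attempts = 0;
    loop {
        let (bytes, etag) = store.read_with_etag(address).await?;
        match store.write_if_match(address, &update(&bytes), &etag).await {
            Ok(new_etag) => return Ok(new_etag),
            // Lost the race: re-read and retry with the fresh etag.
            Err(e) if is_conflict(&e) && attempts < max_retries => attempts += 1,
            Err(e) => return Err(e),
        }
    }
}
```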
StorageDelete (nameservice)
Delete with nameservice error semantics.
#![allow(unused)]
fn main() {
#[async_trait]
pub trait StorageDelete {
async fn delete(&self, address: &str) -> StorageExtResult<()>;
}
}
Why separate from core StorageWrite::delete?
- Nameservice operations need StorageExtResult for errors like PreconditionFailed
- Core operations use standard Result for simplicity
- Storage backends typically implement both, with the nameservice version delegating to core (see the sketch below)
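A minimal sketch of that delegation, following the same error-conversion pattern shown for StorageList later in this chapter (the exact StorageExtError conversion is an assumption):
// Hypothetical backend: the nameservice-facing delete simply calls the core
// StorageWrite::delete and lifts the error into the extended error type.
#[async_trait]
impl StorageDelete for MyStorage {
    async fn delete(&self, address: &str) -> StorageExtResult<()> {
        StorageWrite::delete(self, address)
            .await
            .map_err(|e| StorageExtError::Other(e.to_string()))
    }
}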
Implementing a Storage Backend
Minimal Read-Only Backend
For a read-only backend (e.g., ProxyStorage that fetches via HTTP):
#![allow(unused)]
fn main() {
#[async_trait]
impl StorageRead for MyReadOnlyStorage {
async fn read_bytes(&self, address: &str) -> Result<Vec<u8>> {
// Fetch from remote
}
async fn exists(&self, address: &str) -> Result<bool> {
// Check existence (can implement as try-read)
match self.read_bytes(address).await {
Ok(_) => Ok(true),
Err(Error::NotFound(_)) => Ok(false),
Err(e) => Err(e),
}
}
async fn list_prefix(&self, _prefix: &str) -> Result<Vec<String>> {
Err(Error::storage("list_prefix not supported"))
}
}
// Must also implement StorageWrite (with error stubs) and ContentAddressedWrite
// if you want to satisfy the Storage marker trait
#[async_trait]
impl StorageWrite for MyReadOnlyStorage {
async fn write_bytes(&self, _: &str, _: &[u8]) -> Result<()> {
Err(Error::storage("read-only storage"))
}
async fn delete(&self, _: &str) -> Result<()> {
Err(Error::storage("read-only storage"))
}
}
#[async_trait]
impl ContentAddressedWrite for MyReadOnlyStorage {
async fn content_write_bytes_with_hash(&self, ...) -> Result<ContentWriteResult> {
Err(Error::storage("read-only storage"))
}
}
}
Full Read-Write Backend
For a complete backend (e.g., S3, filesystem):
#![allow(unused)]
fn main() {
// 1. Implement core traits
#[async_trait]
impl StorageRead for MyStorage {
async fn read_bytes(&self, address: &str) -> Result<Vec<u8>> { ... }
async fn exists(&self, address: &str) -> Result<bool> { ... }
async fn list_prefix(&self, prefix: &str) -> Result<Vec<String>> { ... }
}
#[async_trait]
impl StorageWrite for MyStorage {
async fn write_bytes(&self, address: &str, bytes: &[u8]) -> Result<()> { ... }
async fn delete(&self, address: &str) -> Result<()> { ... }
}
#[async_trait]
impl ContentAddressedWrite for MyStorage {
async fn content_write_bytes_with_hash(
&self,
kind: ContentKind,
ledger_id: &str,
content_hash_hex: &str,
bytes: &[u8],
) -> Result<ContentWriteResult> {
// Build address from kind + alias + hash
let address = build_content_address(kind, ledger_id, content_hash_hex);
self.write_bytes(&address, bytes).await?;
Ok(ContentWriteResult {
address,
content_hash: content_hash_hex.to_string(),
size_bytes: bytes.len(),
})
}
}
// Storage marker trait is automatically satisfied via blanket impl
// 2. Optionally implement nameservice traits for advanced features
#[async_trait]
impl StorageList for MyStorage {
async fn list_prefix(&self, prefix: &str) -> StorageExtResult<Vec<String>> {
// Delegate to core trait, convert error
StorageRead::list_prefix(self, prefix)
.await
.map_err(|e| StorageExtError::Other(e.to_string()))
}
// ... paginated version
}
}
BranchedContentStore (fluree-db-core)
BranchedContentStore<S> is a recursive ContentStore implementation that provides namespace-scoped fallback reads for branched ledgers. When a branch is created, it gets its own storage namespace for new writes, but needs to read pre-branch-point content (commits, dictionaries) from ancestor namespaces.
Structure
#![allow(unused)]
fn main() {
pub struct BranchedContentStore<S: Storage> {
branch_store: StorageContentStore<S>,
parents: Vec<BranchedContentStore<S>>,
}
}
- branch_store — the branch’s own namespace store; all writes go here
- parents — ancestor stores to fall back to for reads (recursive tree)
The recursive structure supports arbitrarily deep branch chains (main → dev → feature) and is designed to support future merge scenarios where a branch may have multiple parents (DAG ancestry).
Constructors
#![allow(unused)]
fn main() {
// Root branch (e.g., main) — no parents
let store = BranchedContentStore::leaf(storage, "mydb:main");
// Branch with parent fallback
let parent = BranchedContentStore::leaf(storage, "mydb:main");
let store = BranchedContentStore::with_parents(storage, "mydb:dev", vec![parent]);
}
Read Behavior
get() tries the branch’s own namespace first, then recurses into parents:
- Try branch_store.get(id) — if found, return immediately
- If NotFound and parents exist, try each parent in order
- If no parent finds it, return the last NotFound error
- Non-NotFound errors propagate immediately — only NotFound triggers fallback
has() and resolve_local_path() follow the same fallback pattern.
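As a deliberately simplified, synchronous toy model of that fallback order (not the real implementation):
use std::collections::HashMap;
// Simplified stand-ins for the real store and error types.
#[allow(dead_code)]
enum StoreError { NotFound(String), Other(String) }
struct Node {
    own: HashMap<String, Vec<u8>>, // this branch's namespace
    parents: Vec<Node>,            // ancestor namespaces, in order
}
impl Node {
    fn get(&self, id: &str) -> Result<Vec<u8>, StoreError> {
        if let Some(bytes) = self.own.get(id) {
            return Ok(bytes.clone()); // found in our own namespace: return immediately
        }
        let mut last = StoreError::NotFound(id.to_string());
        for parent in &self.parents {
            match parent.get(id) {
                Ok(bytes) => return Ok(bytes),                   // first ancestor hit wins
                Err(StoreError::NotFound(_)) => {
                    last = StoreError::NotFound(id.to_string()); // keep trying later parents
                }
                Err(e) => return Err(e),                         // non-NotFound propagates immediately
            }
        }
        Err(last) // nothing found in this branch or any ancestor
    }
}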
Write Behavior
put() and put_with_id() always write to branch_store — never to parents. This ensures branch isolation: new content is always scoped to the branch’s own namespace.
What Is and Isn’t Copied at Branch Time
| Artifact | Copied? | Reason |
|---|---|---|
| Commits | No | Immutable chain, never deleted; read via fallback |
| Index structure files (root, leaves, branches, arenas) | Yes | Source may GC old indexes after reindexing |
| String dictionaries | No | Stored globally in the @shared namespace; all branches read from the same location |
Global Dictionary Storage (@shared Namespace)
String dictionaries (mappings between IRIs/strings and compact integer IDs) are the largest index artifact. Rather than copying them per-branch or relying on BranchedContentStore fallback reads, dictionaries are stored in a global namespace shared by all branches of a ledger.
The content_path function routes all DictBlob CIDs to a shared path:
mydb/@shared/dicts/<sha256hex>.subject # Subject dict
mydb/@shared/dicts/<sha256hex>.string # String dict
mydb/@shared/dicts/<sha256hex>.predicate # Predicate dict
...
The @shared prefix uses the @ character, which is forbidden in branch names by validate_branch_name, so it cannot collide with any branch namespace. The constant is defined as SHARED_NAMESPACE in fluree-db-core::address_path.
Legacy fallback: Existing deployments may have dictionaries stored at the old per-branch path (e.g., mydb/main/index/objects/dicts/<sha>.dict). StorageContentStore automatically falls back to the legacy path when a dict CID is not found at the new @shared location. After the next index build, new writes go to the @shared path — no manual migration is needed.
Building the Store Tree
LedgerState::build_branched_store() recursively walks the branch ancestry via nameservice source_branch metadata, constructing the BranchedContentStore tree. This uses Box::pin for the recursive async calls.
The actual ancestry walk lives in fluree-db-nameservice (branched_store::build_branched_store), and LedgerState::build_branched_store is a thin wrapper that delegates there. This keeps the helper available to crates that don’t depend on fluree-db-ledger (notably fluree-db-indexer’s background worker).
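To illustrate why Box::pin appears here, a self-contained sketch of a recursive async ancestry walk (the BranchMeta type and field names are invented for the example; this is not the real build_branched_store code):
use std::future::Future;
use std::pin::Pin;
// Invented stand-in for the nameservice metadata consulted during the walk.
struct BranchMeta {
    namespace: String,
    source_branch: Option<Box<BranchMeta>>,
}
// A recursive async fn cannot name its own future type, so the future is boxed.
// This mirrors the Box::pin pattern used for the recursive async calls.
fn ancestry_chain(meta: &BranchMeta) -> Pin<Box<dyn Future<Output = Vec<String>> + '_>> {
    Box::pin(async move {
        let mut chain = vec![meta.namespace.clone()];
        if let Some(parent) = &meta.source_branch {
            chain.extend(ancestry_chain(parent).await); // recurse into the ancestor branch
        }
        chain
    })
}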
When to Use BranchedContentStore
Any code path that walks the commit chain or loads index blobs for a branched ledger MUST use a branch-aware content store. Per-query reads against an already-loaded LedgerState are fine — LedgerState::load already wires the branched store up.
Use the nameservice helpers, not the flat StorageBackend::content_store(...):
| Helper | When to use |
|---|---|
fluree_db_nameservice::branched_content_store_for_record(backend, ns, &record) | An NsRecord is in scope (no extra lookup) |
fluree_db_nameservice::branched_content_store_for_id(backend, ns, ledger_id) | No NsRecord available — does one nameservice lookup |
Fluree::branched_content_store(&self, ledger_id) | API / CLI callers — wraps _for_id |
Both helpers return the flat namespace store unchanged for non-branched ledgers, so adding them to non-branch code paths costs at most a single nameservice lookup.
A flat backend.content_store(ledger_id) on the commit-chain walk path will 404 the moment the walker steps past the fork point and tries to read an ancestor commit from the wrong namespace.
Type Erasure with AnyStorage
For dynamic dispatch (e.g., runtime-selected storage backends), use AnyStorage:
#![allow(unused)]
fn main() {
/// Type-erased storage wrapper
pub struct AnyStorage {
inner: Arc<dyn Storage>,
}
impl AnyStorage {
pub fn new<S: Storage + 'static>(storage: S) -> Self {
Self { inner: Arc::new(storage) }
}
}
}
When to use:
- FlureeClient uses AnyStorage to support any backend at runtime
- Generic code should prefer concrete types (S: Storage) for better optimization
- Use AnyStorage when the storage type is determined at runtime (e.g., from config), as sketched below
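A hypothetical sketch of that runtime selection (the StorageConfig type and the FileStorage/MemoryStorage constructors here are placeholders, not the crate's actual API):
// Placeholder config type for the example.
struct StorageConfig {
    kind: String,     // e.g., "file" or "memory"
    data_dir: String,
}
// Choose a concrete backend at startup, then erase the type so the rest of the
// application can hold a single AnyStorage regardless of what was configured.
fn storage_from_config(cfg: &StorageConfig) -> AnyStorage {
    match cfg.kind.as_str() {
        "file" => AnyStorage::new(FileStorage::new(&cfg.data_dir)), // hypothetical constructor
        _ => AnyStorage::new(MemoryStorage::new()),                 // hypothetical constructor
    }
}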
Wrapper Storages
Several wrapper types add functionality to underlying storage:
TieredStorage
Routes commits and indexes to different backends:
#![allow(unused)]
fn main() {
pub struct TieredStorage<S> {
commit_storage: S,
index_storage: S,
}
}
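As a toy model of that routing idea (the enum and method below are invented for illustration, not the actual TieredStorage implementation):
// Toy model only: route by a commit/index distinction in the spirit of ContentKind.
enum Kind {
    Commit,
    Index,
}
struct Tiered<S> {
    commit_storage: S,
    index_storage: S,
}
impl<S> Tiered<S> {
    fn backend_for(&self, kind: Kind) -> &S {
        match kind {
            Kind::Commit => &self.commit_storage, // commit chain blobs
            Kind::Index => &self.index_storage,   // index roots, branches, leaves, dicts
        }
    }
}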
EncryptedStorage
Adds transparent encryption:
#![allow(unused)]
fn main() {
pub struct EncryptedStorage<S, K> {
inner: S,
key_provider: K,
}
}
AddressIdentifierResolverStorage
Routes reads based on address format (e.g., different storage backends by identifier segment):
#![allow(unused)]
fn main() {
pub struct AddressIdentifierResolverStorage {
default_storage: Arc<dyn Storage>,
identifier_storages: HashMap<String, Arc<dyn Storage>>,
}
}
Error Handling
Core Errors (fluree_db_core::Error)
Standard errors for storage operations:
- NotFound - Address doesn’t exist
- Storage - Generic storage failure
- Io - Underlying I/O error
Nameservice Errors (StorageExtError)
Extended errors for nameservice operations:
- NotFound - Address doesn’t exist
- PreconditionFailed - CAS condition not met
- Other - Generic error with message
Summary
| Type | Crate | Purpose | Error Type |
|---|---|---|---|
ContentStore (trait) | core | Content-addressed get/put/has by ContentId | Result<T> |
BranchedContentStore (struct) | core | Recursive ContentStore with namespace fallback for branches | Result<T> |
StorageRead (trait) | core | Physical read operations | Result<T> |
StorageWrite (trait) | core | Physical write + delete | Result<T> |
ContentAddressedWrite (trait) | core | Hash-based physical writes | Result<T> |
Storage (trait) | core | Marker (full physical capability) | - |
StorageList (trait) | nameservice | Paginated listing | StorageExtResult<T> |
StorageCas (trait) | nameservice | Compare-and-swap | StorageExtResult<T> |
StorageDelete (trait) | nameservice | Delete with ext errors | StorageExtResult<T> |
Application code typically interacts with ContentStore for immutable artifact access. Storage backend implementors implement the physical traits (StorageRead, StorageWrite, ContentAddressedWrite) and the Storage marker trait is automatically satisfied. For branched ledgers, BranchedContentStore wraps the physical storage with recursive namespace fallback — see BranchedContentStore above.
HTTP API
The Fluree HTTP API provides RESTful endpoints for all database operations. This section documents the complete API surface including request formats, authentication, and error handling.
Core Endpoints
Overview
High-level introduction to the Fluree HTTP API, including:
- API design principles
- Authentication overview
- Rate limiting and quotas
- API versioning
Endpoints
Complete reference for all HTTP endpoints:
- POST /update - Submit update transactions (WHERE/DELETE/INSERT or SPARQL UPDATE)
- POST /query - Execute queries
- GET /v1/fluree/ledgers - List ledgers
- GET /health - Health checks
- GET /v1/fluree/stats - Server status
- And more…
Headers, Content Types, and Request Sizing
HTTP headers and request format details:
- Content-Type negotiation
- Accept headers for response formats
- Request size limits
- Compression support
- Custom headers
Signed Requests (JWS/VC)
Cryptographically signed and verifiable requests:
- JSON Web Signature (JWS) format
- Verifiable Credentials (VC) support
- Public key verification
- DID authentication
- Signature validation
Errors and Status Codes
HTTP status codes and error responses:
- Standard HTTP status codes
- Fluree-specific error codes
- Error response format
- Troubleshooting common errors
API Characteristics
RESTful Design
The Fluree API follows REST principles:
- Resource-oriented URLs
- Standard HTTP methods (GET, POST)
- Stateless requests
- Standard status codes
Content Negotiation
Fluree supports multiple content types for requests and responses:
Request Content-Types:
- application/json - JSON-LD transactions and queries
- application/sparql-query - SPARQL queries
- text/turtle - Turtle RDF format
- application/ld+json - Explicit JSON-LD
Response Content-Types:
- application/json - Default JSON format
- application/ld+json - JSON-LD with context
- application/sparql-results+json - SPARQL result format
Authentication
Fluree supports multiple authentication mechanisms:
- No Authentication (development only)
- Signed Requests (JWS/VC for production)
- API Keys (simple token-based auth)
- Bearer Tokens (JWT authentication)
See Signed Requests for cryptographic authentication details.
Quick Examples
Transaction Request
curl -X POST http://localhost:8090/v1/fluree/insert?ledger=mydb:main \
-H "Content-Type: application/json" \
-d '{
"@context": {
"ex": "http://example.org/ns/"
},
"@graph": [
{ "@id": "ex:alice", "ex:name": "Alice" }
]
}'
Query Request
curl -X POST http://localhost:8090/v1/fluree/query \
-H "Content-Type: application/json" \
-d '{
"from": "mydb:main",
"select": ["?name"],
"where": [
{ "@id": "?person", "ex:name": "?name" }
]
}'
SPARQL Query
curl -X POST http://localhost:8090/v1/fluree/query \
-H "Content-Type: application/sparql-query" \
-d 'SELECT ?name FROM <mydb:main> WHERE { ?person ex:name ?name }'
Health Check
curl http://localhost:8090/health
API Clients
Command Line (curl)
All examples in this documentation use curl for simplicity. Curl is available on all major platforms.
Programming Languages
Fluree’s HTTP API can be accessed from any language with HTTP client support:
JavaScript/TypeScript:
const response = await fetch('http://localhost:8090/v1/fluree/query', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
from: 'mydb:main',
select: ['?name'],
where: [{ '@id': '?person', 'ex:name': '?name' }]
})
});
const results = await response.json();
Python:
import requests
response = requests.post('http://localhost:8090/v1/fluree/query', json={
'from': 'mydb:main',
'select': ['?name'],
'where': [{'@id': '?person', 'ex:name': '?name'}]
})
results = response.json()
Java:
HttpClient client = HttpClient.newHttpClient();
HttpRequest request = HttpRequest.newBuilder()
.uri(URI.create("http://localhost:8090/v1/fluree/query"))
.header("Content-Type", "application/json")
.POST(HttpRequest.BodyPublishers.ofString(queryJson))
.build();
HttpResponse<String> response = client.send(request,
HttpResponse.BodyHandlers.ofString());
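Rust (a sketch using the commonly paired reqwest, serde_json, and tokio crates, with reqwest’s json feature enabled; the crate choice is an assumption, not an official client):
use serde_json::json;
#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let client = reqwest::Client::new();
    let response = client
        .post("http://localhost:8090/v1/fluree/query")
        .json(&json!({
            "from": "mydb:main",
            "select": ["?name"],
            "where": [{ "@id": "?person", "ex:name": "?name" }]
        }))
        .send()
        .await?;
    let results: serde_json::Value = response.json().await?;
    println!("{results}");
    Ok(())
}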
Development vs Production
Development Setup
For local development, the API typically runs without authentication:
./fluree-db-server --port 8090 --storage memory
Access: http://localhost:8090
Production Setup
For production deployments, enable authentication and use HTTPS:
./fluree-db-server \
--port 8090 \
--storage aws \
--require-signed-requests \
--https-cert /path/to/cert.pem \
--https-key /path/to/key.pem
Access: https://api.yourdomain.com
Always use:
- HTTPS in production
- Signed requests or API keys
- Rate limiting
- Request size limits
Performance Considerations
Request Size Limits
Default limits (configurable):
- Transaction size: 10MB
- Query size: 1MB
- Response size: 100MB
See Headers and Request Sizing for details.
Connection Management
- Keep-alive connections supported
- HTTP/2 support available
- WebSocket support for streaming (planned)
Caching
- Query results can be cached (ETag support)
- Immutable historical queries cache well
- Current queries should not be cached aggressively
Related Documentation
- Getting Started - Quickstart guides
- Transactions - Transaction details
- Query - Query language documentation
- Security - Policy and access control
- Operations - Configuration and deployment
API Overview
The Fluree HTTP API provides a complete RESTful interface for database operations. This document provides a high-level overview of API design principles and capabilities.
API Design Principles
Resource-Oriented
The API is organized around resources:
- Ledgers: Database instances
- Transactions: Write operations
- Queries: Read operations
- Commits: Transaction history
Standard HTTP Methods
Operations use standard HTTP methods:
- GET - Retrieve information (idempotent, cacheable)
- POST - Submit operations (transactions, queries)
- PUT - Update resources (planned)
- DELETE - Remove resources (planned)
JSON-First
All request and response bodies use JSON by default:
- Native JSON-LD support
- Clean, readable syntax
- Easy integration with modern applications
Stateless
All requests are stateless:
- No session management required
- Each request contains complete information
- Enables horizontal scaling
Core Concepts
Ledger Identification
Ledgers are identified using aliases with branch names:
ledger-name:branch-name
Examples:
- mydb:main - Main branch of the mydb ledger
- customers:prod - Production branch of the customers ledger
- tenant/app:dev - Development branch with hierarchical naming
Time Travel in URLs
Historical queries use time specifiers in ledger IDs:
ledger:branch@t:100 # Transaction number
ledger:branch@iso:2024-01-15 # ISO timestamp
ledger:branch@commit:bafybeig... # Commit ID
These work in all query contexts (FROM clauses, dataset specs, etc.).
Content Type Negotiation
Request format determined by Content-Type header:
- application/json - JSON-LD (default)
- application/sparql-query - SPARQL
- text/turtle - Turtle RDF
Response format determined by Accept header:
- application/json - Compact JSON (default)
- application/ld+json - Full JSON-LD with context
- application/sparql-results+json - SPARQL result format
API Endpoints
Except for root diagnostics such as /health and /.well-known/fluree.json,
HTTP API paths are under the discovered API base URL. The standalone server
defaults to /v1/fluree.
Transaction Endpoints
POST /update
- Submit update transactions (WHERE/DELETE/INSERT JSON-LD or SPARQL UPDATE)
- Parameters: ledger, context
- Returns: Transaction receipt with commit info
POST /insert / POST /upsert
- Insert or upsert data (JSON-LD and Turtle; TriG on upsert)
Query Endpoints
POST /query
- Execute queries (JSON-LD Query or SPARQL)
- Parameters: None (ledger specified in query body)
- Returns: Query results
- Supports history queries via a time range in the from clause (see Time Travel)
Ledger Management
GET /ledgers
- List all ledgers
- Parameters: None
- Returns: Array of ledger metadata
GET /info/:ledger-id
- Get specific ledger metadata
- Parameters: ledger-id (ledger:branch)
- Returns: Ledger details (commit_t, index_t, etc.)
POST /create
- Create a new ledger explicitly
- Parameters: ledger
- Returns: Ledger metadata
System Endpoints
GET /health
- Health check endpoint
- Parameters: None
- Returns: Server health status
GET /stats
- Server status and statistics
- Parameters: None
- Returns: Detailed server state
Request Format
URL Structure
https://[host]:[port]/[endpoint]?[parameters]
Example:
http://localhost:8090/v1/fluree/update?ledger=mydb:main
Query Parameters
Common parameters:
- ledger - Target ledger (format: name:branch)
- context - Default context URL
- format - Response format override
Request Headers
Essential headers:
Content-Type: application/json
Accept: application/json
Authorization: Bearer [token]
See Headers for complete list.
Request Body
JSON-LD format for transactions:
{
"@context": {
"ex": "http://example.org/ns/"
},
"@graph": [
{ "@id": "ex:alice", "ex:name": "Alice" }
]
}
JSON-LD Query format:
{
"@context": {
"ex": "http://example.org/ns/"
},
"from": "mydb:main",
"select": ["?name"],
"where": [
{ "@id": "?person", "ex:name": "?name" }
]
}
Response Format
Success Response
Successful operations return appropriate status codes with JSON bodies.
Transaction Response:
{
"t": 5,
"timestamp": "2024-01-22T10:30:00.000Z",
"commit_id": "bafybeig...commitT5",
"flakes_added": 3,
"flakes_retracted": 1
}
Query Response:
[
{ "name": "Alice" },
{ "name": "Bob" }
]
Error Response
Errors return appropriate HTTP status codes with structured error objects:
{
"error": "Invalid IRI: not a valid URI",
"status": 400,
"@type": "err:db/BadRequest"
}
See Errors and Status Codes for complete error reference.
Authentication
Fluree supports multiple authentication mechanisms, configured per endpoint group (data, events, admin, storage proxy). Each can be set to none, optional, or required. See Configuration for full details.
Development Mode
No authentication required (default):
curl http://localhost:8090/v1/fluree/query/mydb:main \
-H "Content-Type: application/json" \
-d '{"select": ["?s"], "where": [{"@id": "?s"}]}'
Bearer Token Authentication
Bearer tokens in the Authorization header. Fluree supports two token types with automatic dual-path dispatch:
Ed25519 JWS (did:key) - Locally minted tokens with an embedded JWK. Created with fluree token create:
TOKEN=$(fluree token create --private-key @~/.fluree/key --read-all --write-all)
curl http://localhost:8090/v1/fluree/query/mydb:main \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"select": ["?s"], "where": [{"@id": "?s"}]}'
OIDC/JWKS (RS256) - Tokens from external identity providers, verified against the provider’s JWKS endpoint. Requires the oidc feature and --jwks-issuer server configuration:
curl http://localhost:8090/v1/fluree/query/mydb:main \
-H "Authorization: Bearer <oidc-token>" \
-H "Content-Type: application/json" \
-d '{"select": ["?s"], "where": [{"@id": "?s"}]}'
The server inspects the token header to determine the verification path:
- Embedded JWK (Ed25519): Verifies against the embedded public key; issuer is a did:key
- kid header (RS256): Verifies against the issuer’s JWKS endpoint
Token Scopes
Bearer tokens carry permission scopes that control access:
- Read: fluree.ledger.read.all=true or fluree.ledger.read.ledgers=[...]
- Write: fluree.ledger.write.all=true or fluree.ledger.write.ledgers=[...]
- Back-compat: fluree.storage.* claims also imply read access for data endpoints
Connection-Scoped SPARQL
When a bearer token is present for connection-scoped SPARQL queries (/v1/fluree/query with Content-Type: application/sparql-query), FROM/FROM NAMED clauses are checked against the token’s read scope (fluree.ledger.read.all or fluree.ledger.read.ledgers). Out-of-scope ledgers return 404 (no existence leak).
Signed Requests (JWS/VC)
Cryptographically signed request bodies using Ed25519 JWS or Verifiable Credentials. The signed payload carries the request itself plus the signer’s identity for policy evaluation.
curl http://localhost:8090/v1/fluree/query/mydb:main \
-H "Content-Type: application/jose" \
-d '<compact-jws-string>'
See Signed Requests for detailed documentation.
Rate Limiting
Default Limits
Production deployments should implement rate limiting:
- Queries: 100 requests per minute
- Transactions: 10 requests per minute
- History: 50 requests per minute
Rate Limit Headers
Responses include rate limit information:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1642857600
Exceeding Limits
When limits are exceeded:
- Status code: 429 Too Many Requests
- Response body includes retry information
- Retry-After header indicates wait time
API Versioning
Current Version
The current API is version 1 (v1).
Version in URL (Future)
Future versions may use URL-based versioning:
https://api.example.com/v2/query
Common Patterns
Idempotent Transactions
Use the upsert endpoint for idempotent transactions:
curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
-H "Content-Type: application/json" \
-d '{...}'
Batch Operations
Submit multiple entities in a single transaction:
{
"@graph": [
{ "@id": "ex:alice", "ex:name": "Alice" },
{ "@id": "ex:bob", "ex:name": "Bob" },
{ "@id": "ex:carol", "ex:name": "Carol" }
]
}
Conditional Updates
Use WHERE/DELETE/INSERT for conditional changes:
{
"where": [
{ "@id": "ex:alice", "ex:age": "?oldAge" }
],
"delete": [
{ "@id": "ex:alice", "ex:age": "?oldAge" }
],
"insert": [
{ "@id": "ex:alice", "ex:age": 31 }
]
}
Historical Queries
Query past states using time specifiers:
{
"from": "mydb:main@t:100",
"select": ["?name"],
"where": [
{ "@id": "?person", "ex:name": "?name" }
]
}
Best Practices
1. Use Appropriate HTTP Methods
- GET for read-only operations (health, status)
- POST for write and query operations
2. Set Correct Content-Type
Always specify the request format:
Content-Type: application/json
3. Handle Errors Gracefully
Check status codes and parse error responses:
if (response.status !== 200) {
  const error = await response.json();
  console.error(`Error ${error.status}: ${error.error}`);
}
4. Use Connection Pooling
Reuse HTTP connections for better performance:
const agent = new https.Agent({ keepAlive: true });
5. Implement Retry Logic
Retry failed requests with exponential backoff:
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function retryRequest(fn, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i === maxRetries - 1) throw err;
      await sleep(Math.pow(2, i) * 1000); // exponential backoff: 1s, 2s, 4s, ...
    }
  }
}
6. Monitor Rate Limits
Track rate limit headers and back off when approaching limits.
7. Use Compression
Enable compression for large payloads:
Accept-Encoding: gzip, deflate
Security Considerations
HTTPS in Production
Always use HTTPS in production:
- Prevents eavesdropping
- Protects credentials
- Enables trust
Validate Input
Validate all user input before sending to API:
- Check IRI formats
- Validate JSON structure
- Sanitize user data
Secure Credentials
Never expose credentials in code or logs:
- Use environment variables
- Rotate keys regularly
- Use signed requests for highest security
Implement CORS Carefully
If exposing API to web applications, configure CORS appropriately:
Access-Control-Allow-Origin: https://your-app.com
Access-Control-Allow-Methods: POST, GET
Access-Control-Allow-Headers: Content-Type, Authorization
Performance Tips
1. Batch Related Operations
Combine related entities in single transactions for better performance.
2. Use Appropriate Time Specifiers
- @t:NNN is fastest (direct lookup)
- @iso:DATETIME requires a binary search
- @commit:CID requires a scan
3. Limit Result Sets
Always use LIMIT for potentially large result sets:
{
"select": ["?name"],
"where": [...],
"limit": 100
}
4. Cache Historical Queries
Historical queries (with time specifiers) are immutable and cache well.
5. Use Streaming for Large Results
For very large result sets, consider streaming responses (when supported).
Related Documentation
- Endpoints - Complete endpoint reference
- Headers - HTTP headers and content types
- Signed Requests - Cryptographic authentication
- Errors - Error codes and troubleshooting
API Endpoints
Complete reference for all Fluree HTTP API endpoints.
Base URL / versioning
All endpoints listed below are under the server’s API base URL (api_base_url from GET /.well-known/fluree.json).
- Standalone fluree-server default: api_base_url = "/v1/fluree"
- All curl examples in this document use the full URL including the base path (e.g., http://localhost:8090/v1/fluree/query/<ledger...>)
Discovery and diagnostics
GET /.well-known/fluree.json
CLI auth discovery endpoint. Used by fluree remote add and fluree auth login to auto-configure authentication for a remote.
See Auth contract (CLI ↔ Server) for the full schema.
Standalone fluree-server returns:
- {"version":1,"api_base_url":"/v1/fluree"} when no server auth is enabled
- {"version":1,"api_base_url":"/v1/fluree","auth":{"type":"token"}} when any server auth mode is enabled (data/events/admin)
OIDC-capable implementations should return auth.type="oidc_device" plus issuer, client_id, and exchange_url.
The CLI treats oidc_device as “OIDC interactive login”: it uses device-code when the IdP supports it, otherwise authorization-code + PKCE.
Implementations MAY also return api_base_url to tell the CLI where the Fluree API is mounted (for example,
when the API is hosted under /v1/fluree or on a separate data subdomain).
GET {api_base_url}/whoami
Diagnostic endpoint for Bearer tokens. Returns a summary of the principal:
- token_present: whether a Bearer token was present
- verified: whether cryptographic verification succeeded
- auth_method: "embedded_jwk" (Ed25519) or "oidc" (JWKS/RS256)
- identity + scope summary (when verified)
This endpoint is intended for debugging and operator support. See also Admin, health, and stats.
Transaction Endpoints
POST /update
Submit an update transaction (WHERE/DELETE/INSERT JSON-LD or SPARQL UPDATE) to write data to a ledger.
URL:
POST /update?ledger={ledger-id}
POST /update/{ledger-id}
Query Parameters:
- ledger (required for /update): Target ledger (format: name:branch)
- context (optional): URL to a default JSON-LD context
Request Headers:
For JSON-LD transactions:
Content-Type: application/json
Accept: application/json
For SPARQL UPDATE:
Content-Type: application/sparql-update
Accept: application/json
Note: Turtle/TriG are not accepted on /update. Use /insert (Turtle) or /upsert (Turtle/TriG).
Request Body (JSON-LD):
JSON-LD transaction document:
{
"@context": {
"ex": "http://example.org/ns/"
},
"@graph": [
{ "@id": "ex:alice", "ex:name": "Alice" }
]
}
Or WHERE/DELETE/INSERT update:
{
"@context": {
"ex": "http://example.org/ns/"
},
"where": [
{ "@id": "ex:alice", "ex:age": "?oldAge" }
],
"delete": [
{ "@id": "ex:alice", "ex:age": "?oldAge" }
],
"insert": [
{ "@id": "ex:alice", "ex:age": 31 }
]
}
Request Body (SPARQL UPDATE):
PREFIX ex: <http://example.org/ns/>
INSERT DATA {
ex:alice ex:name "Alice" .
ex:alice ex:age 30 .
}
Or with DELETE/INSERT:
PREFIX ex: <http://example.org/ns/>
DELETE {
?person ex:age ?oldAge .
}
INSERT {
?person ex:age 31 .
}
WHERE {
?person ex:name "Alice" .
?person ex:age ?oldAge .
}
Response:
{
"t": 5,
"timestamp": "2024-01-22T10:30:00.000Z",
"commit_id": "bafybeig...commitT5",
"flakes_added": 3,
"flakes_retracted": 1,
"previous_commit_id": "bafybeig...commitT4"
}
Status Codes:
- 200 OK - Transaction successful
- 400 Bad Request - Invalid transaction syntax
- 401 Unauthorized - Authentication required
- 403 Forbidden - Not authorized for this ledger
- 404 Not Found - Ledger not found
- 413 Payload Too Large - Transaction exceeds size limit
- 500 Internal Server Error - Server error
Examples:
JSON-LD transaction:
curl -X POST "http://localhost:8090/v1/fluree/update?ledger=mydb:main" \
-H "Content-Type: application/json" \
-d '{
"@context": { "ex": "http://example.org/ns/" },
"@graph": [{ "@id": "ex:alice", "ex:name": "Alice" }]
}'
SPARQL UPDATE (ledger-scoped endpoint):
curl -X POST http://localhost:8090/v1/fluree/update/mydb:main \
-H "Content-Type: application/sparql-update" \
-d 'PREFIX ex: <http://example.org/ns/>
INSERT DATA { ex:alice ex:name "Alice" }'
SPARQL UPDATE (connection-scoped with header):
curl -X POST http://localhost:8090/v1/fluree/update \
-H "Content-Type: application/sparql-update" \
-H "Fluree-Ledger: mydb:main" \
-d 'PREFIX ex: <http://example.org/ns/>
DELETE { ?s ex:age ?old } INSERT { ?s ex:age 31 }
WHERE { ?s ex:name "Alice" . ?s ex:age ?old }'
POST /insert
Insert new data into a ledger. Data must not conflict with existing data.
URL:
POST /insert?ledger={ledger-id}
POST /insert/{ledger-id}
Supported Content Types:
- application/json - JSON-LD
- text/turtle - Turtle (fast direct flake path)
Note: TriG (application/trig) is not supported on the insert endpoint. Named graph ingestion via GRAPH blocks requires the upsert path. Use /upsert for TriG data.
Example (JSON-LD):
curl -X POST "http://localhost:8090/v1/fluree/insert?ledger=mydb:main" \
-H "Content-Type: application/json" \
-d '{
"@context": { "ex": "http://example.org/ns/" },
"@graph": [{ "@id": "ex:alice", "ex:name": "Alice" }]
}'
Example (Turtle):
curl -X POST "http://localhost:8090/v1/fluree/insert?ledger=mydb:main" \
-H "Content-Type: text/turtle" \
-d '@prefix ex: <http://example.org/ns/> .
ex:alice ex:name "Alice" ; ex:age 30 .'
POST /upsert
Upsert data into a ledger. For each (subject, predicate) pair, existing values are retracted before new values are asserted.
URL:
POST /upsert?ledger={ledger-id}
POST /upsert/{ledger-id}
Supported Content Types:
- application/json - JSON-LD
- text/turtle - Turtle
- application/trig - TriG with named graphs
Example (JSON-LD):
curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
-H "Content-Type: application/json" \
-d '{
"@context": { "ex": "http://example.org/ns/" },
"@id": "ex:alice",
"ex:age": 31
}'
Example (TriG with named graphs):
curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
-H "Content-Type: application/trig" \
-d '@prefix ex: <http://example.org/ns/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
# Default graph
ex:company ex:name "Acme Corp" .
# Named graph for products
GRAPH <http://example.org/graphs/products> {
ex:widget ex:name "Widget" ;
ex:price "29.99"^^xsd:decimal .
}'
POST /push/*ledger
Push precomputed commit v2 blobs to the server.
This endpoint is intended for Git-like workflows (fluree push) where a client has written commits locally and wants the server to validate and commit them.
URL:
POST /push/<ledger...>
Request Headers:
Content-Type: application/json
Accept: application/json
Authorization: Bearer <token>
Idempotency-Key: <string> (optional; recommended)
If Idempotency-Key is provided, servers MAY treat POST /push/*ledger as idempotent for that key (same request body + key should yield the same response), returning the prior success response instead of 409 on client retry after timeouts.
Request Body:
JSON object:
- commits: array of base64-encoded commit v2 blobs (oldest → newest)
- blobs (optional): map of { cid: base64Bytes } for referenced blobs (currently: commit.txn when present)
Response Body (200 OK):
{
"ledger": "mydb:main",
"accepted": 3,
"head": {
"t": 42,
"commit_id": "bafy...headCommit"
},
"indexing": {
"enabled": false,
"needed": true,
"novelty_size": 524288,
"index_t": 30,
"commit_t": 42
}
}
| Field | Description |
|---|---|
indexing.enabled | Whether background indexing is active on this server. |
indexing.needed | Whether novelty has exceeded reindex_min_bytes and indexing should be triggered. |
indexing.novelty_size | Current novelty size in bytes after the push. |
indexing.index_t | Transaction time of the last indexed state. |
indexing.commit_t | Transaction time of the latest committed data (after push). |
When enabled is false (external indexer mode), the caller should use needed and related fields to decide whether to trigger indexing through its own mechanism.
Error Responses:
- 409 Conflict: head changed / diverged / first commit t did not match next-t
- 422 Unprocessable Entity: invalid commit bytes, missing referenced blob, or retraction invariant violation
GET /show/*ledger
Fetch and decode a single commit’s contents with resolved IRIs. This is the server-side equivalent of fluree show — it returns assertions, retractions, and flake tuples with IRIs compacted using the ledger’s namespace prefix table.
URL:
GET /show/<ledger...>?commit=<ref>
Query Parameters:
- commit (required): Commit identifier — t:<N> for transaction number, hex-digest prefix (min 6 chars), or full CID
Request Headers:
Authorization: Bearer <token> (when data auth is enabled)
Response Body (200 OK):
{
"id": "bagaybqabciq...",
"t": 5,
"time": "2026-03-12T16:58:18.395474217+00:00",
"size": 327,
"previous": "bagaybqabciq...",
"asserts": 1,
"retracts": 1,
"@context": {
"xsd": "http://www.w3.org/2001/XMLSchema#",
"schema": "http://schema.org/"
},
"flakes": [
["urn:fsys:dataset:zoho3", "schema:dateModified", "2026-03-12T14:15:30Z", "xsd:string", false],
["urn:fsys:dataset:zoho3", "schema:dateModified", "2026-03-12T16:58:16Z", "xsd:string", true]
]
}
Each flake is a tuple: [subject, predicate, object, datatype, operation]. Operation true = assert (added), false = retract (removed). When metadata is present (language tag, list index, or named graph), a 6th element is appended.
Policy filtering: Flakes are filtered by the caller’s data-auth identity (extracted from the Bearer token) and the server’s configured default_policy_class. When neither is present, all flakes are returned (root/admin access). Flakes the caller cannot read are silently omitted — the asserts and retracts counts reflect only the visible flakes. Unlike the query endpoints, show does not accept per-request policy overrides via headers or request body.
Responses:
- 200 OK: Decoded commit returned
- 400 Bad Request: Missing or invalid commit parameter
- 401 Unauthorized: Bearer token required but missing
- 404 Not Found: Ledger or commit not found
- 501 Not Implemented: Proxy storage mode (no local index available)
Peer mode: Forwards to the transactor.
GET /commits/*ledger
Export commit blobs from a ledger using stable cursors. Pages walk backward via each commit’s parents — O(limit) per page regardless of ledger size. Used by fluree pull and fluree clone.
Requires replication-grade permissions (fluree.storage.*). The storage proxy must be enabled on the server.
URL:
GET /commits/<ledger...>?limit=100&cursor_id=<cid>
Query Parameters:
- limit (optional): Max commits per page (default 100, server clamps to max 500)
- cursor_id (optional): Commit CID cursor for pagination. Omit for the first page (starts from head). Use next_cursor_id from the previous response for subsequent pages.
Request Headers:
Authorization: Bearer <token> (requires fluree.storage.* claims)
Response Body (200 OK):
{
"ledger": "mydb:main",
"head_commit_id": "bafy...headCommit",
"head_t": 42,
"commits": ["<base64>", "<base64>"],
"blobs": { "bafy...txnBlob": "<base64>" },
"newest_t": 42,
"oldest_t": 41,
"next_cursor_id": "bafy...prevCommit",
"count": 2,
"effective_limit": 100
}
- commits: Raw commit v2 blobs, newest → oldest within each page.
- blobs: Referenced txn blobs keyed by CID string.
- next_cursor_id: CID cursor for the next page; null when genesis is reached.
- effective_limit: Actual limit used (after server clamping).
Responses:
- 200 OK: Page of commits returned
- 401 Unauthorized: Missing or invalid storage token
- 404 Not Found: Storage proxy not enabled, ledger not found, or not authorized for this ledger
Pagination:
Commit CIDs in the immutable chain are stable cursors. New commits appended to the head do not affect backward pointers, so cursors remain valid across pages even when new commits arrive between requests.
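A sketch of a client walking every page (field names follow the response shown above; the use of the reqwest and serde_json crates is an assumption for illustration):
use serde_json::Value;
// Walk GET /commits pages backward from the head until next_cursor_id is null,
// collecting the base64-encoded commit blobs from each page.
async fn fetch_all_commits(base: &str, ledger: &str, token: &str) -> Result<Vec<String>, reqwest::Error> {
    let client = reqwest::Client::new();
    let mut cursor: Option<String> = None;
    let mut commits = Vec::new();
    loop {
        let mut url = format!("{base}/commits/{ledger}?limit=100");
        if let Some(c) = &cursor {
            url.push_str(&format!("&cursor_id={c}"));
        }
        let page: Value = client
            .get(&url)
            .bearer_auth(token)
            .send()
            .await?
            .json()
            .await?;
        for blob in page["commits"].as_array().into_iter().flatten() {
            commits.push(blob.as_str().unwrap_or_default().to_string());
        }
        match page["next_cursor_id"].as_str() {
            Some(next) => cursor = Some(next.to_string()), // more pages remain
            None => break,                                 // genesis reached
        }
    }
    Ok(commits)
}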
POST /pack/*ledger
Stream all missing CAS objects for a ledger in a single binary response. This is the primary transport for fluree clone and fluree pull, replacing multiple paginated GET /commits requests or per-object GET /storage/objects fetches with a single streaming request.
Requires replication-grade permissions (fluree.storage.*). The storage proxy must be enabled on the server.
URL:
POST /pack/<ledger...>
Request Headers:
Content-Type: application/json
Accept: application/x-fluree-pack
Authorization: Bearer <token> (requires fluree.storage.* claims)
Request Body:
{
"protocol": "fluree-pack-v1",
"want": ["bafy...remoteHead"],
"have": ["bafy...localHead"],
"include_indexes": true,
"include_txns": true,
"want_index_root_id": "bafy...indexRoot",
"have_index_root_id": "bafy...localIndexRoot"
}
| Field | Type | Required | Description |
|---|---|---|---|
protocol | string | Yes | Must be "fluree-pack-v1" |
want | string[] | Yes | ContentId CIDs the client wants (typically the remote commit head) |
have | string[] | No | ContentId CIDs the client already has (typically the local commit head). Server stops walking the commit chain when it reaches a have CID. Empty for full clone. |
want_index_root_id | string | No | Index root CID the client wants (typically remote nameservice index_head_id). Required when include_indexes=true. |
have_index_root_id | string | No | Index root CID the client already has (typically local nameservice index_head_id). Used for index artifact diff. |
include_indexes | bool | Yes | Include index artifacts in the stream. When true, the stream contains commit + txn objects plus index root/branch/leaf/dict artifacts. |
include_txns | bool | Yes | Include original transaction blobs referenced by each commit. When false, only commits (and optionally index artifacts) are streamed — commit envelopes still reference their txn CIDs, but the client will not have the transaction payloads locally. The ledger state is fully reconstructable from commits + indexes; transactions are the original request payloads (e.g., JSON-LD insert/update requests). |
Response:
Binary stream using the fluree-pack-v1 wire format (Content-Type: application/x-fluree-pack):
[Preamble: FPK1 + version(1)] [Header frame] [Data frames...] [End frame]
| Frame | Type byte | Content |
|---|---|---|
| Header | 0x00 | JSON metadata: protocol version, capabilities, commit_count, index_artifact_count, estimated_total_bytes |
| Data | 0x01 | CID binary + raw object bytes (commit, txn blob, or index artifact) |
| Error | 0x02 | UTF-8 error message (terminates stream) |
| Manifest | 0x03 | JSON metadata for phase transitions (e.g. start of index phase) |
| End | 0xFF | End of stream (no payload) |
Data frames are streamed in oldest-first topological order (parents before children), so the client can write objects to CAS as they arrive without buffering the entire stream.
The Header frame includes an estimated_total_bytes field that the CLI uses to warn users before large transfers (~1 GiB or more). The estimate is ratio-based (derived from commit count) and may differ from actual transfer size. Set to 0 for commits-only requests.
Status Codes:
- 200 OK: Binary pack stream
- 401 Unauthorized: Missing or invalid storage token
- 404 Not Found: Storage proxy not enabled, ledger not found, or not authorized for this ledger
Example:
# Download all commits for a ledger (full clone)
curl -X POST "http://localhost:8090/v1/fluree/pack/mydb:main" \
-H "Content-Type: application/json" \
-H "Accept: application/x-fluree-pack" \
-H "Authorization: Bearer $TOKEN" \
-d '{"protocol":"fluree-pack-v1","want":["bafy...head"],"have":[],"include_indexes":false,"include_txns":true}' \
--output pack.bin
# Download commits without transaction payloads (smaller clone, read-only use)
curl -X POST "http://localhost:8090/v1/fluree/pack/mydb:main" \
-H "Content-Type: application/json" \
-H "Accept: application/x-fluree-pack" \
-H "Authorization: Bearer $TOKEN" \
-d '{"protocol":"fluree-pack-v1","want":["bafy...head"],"have":[],"include_indexes":true,"include_txns":false,"want_index_root_id":"bafy...indexRoot"}' \
--output pack.bin
# Download only missing commits (incremental pull)
curl -X POST "http://localhost:8090/v1/fluree/pack/mydb:main" \
-H "Content-Type: application/json" \
-H "Accept: application/x-fluree-pack" \
-H "Authorization: Bearer $TOKEN" \
-d '{"protocol":"fluree-pack-v1","want":["bafy...remoteHead"],"have":["bafy...localHead"],"include_indexes":false,"include_txns":true}' \
--output pack.bin
# Download commits + index artifacts (default for CLI pull/clone)
curl -X POST "http://localhost:8090/v1/fluree/pack/mydb:main" \
-H "Content-Type: application/json" \
-H "Accept: application/x-fluree-pack" \
-H "Authorization: Bearer $TOKEN" \
-d '{"protocol":"fluree-pack-v1","want":["bafy...head"],"have":[],"include_indexes":true,"include_txns":true,"want_index_root_id":"bafy...indexRoot"}' \
--output pack.bin
Storage Proxy Endpoints
These endpoints are intended for peer mode and fluree clone/pull workflows. They require the storage proxy to be enabled on the server and use replication-grade Bearer tokens (fluree.storage.* claims).
GET /storage/ns/:ledger-id
Fetch the nameservice record for a ledger.
URL:
GET /storage/ns/{ledger-id}
Request Headers:
Authorization: Bearer <token> (requires fluree.storage.* claims)
Response (200 OK):
{
"ledger_id": "mydb:main",
"name": "mydb",
"branch": "main",
"commit_head_id": "bafy...commitCid",
"commit_t": 42,
"index_head_id": "bafy...indexCid",
"index_t": 40,
"default_context": null,
"retracted": false,
"config_id": "bafy...configCid"
}
| Field | Description |
|---|---|
ledger_id | Canonical ledger ID (e.g., “mydb:main”) |
name | Ledger name without branch (e.g., “mydb”) |
config_id | CID of the LedgerConfig object (origin discovery), if set |
Status Codes:
- 200 OK: Record found
- 404 Not Found: Storage proxy disabled, ledger not found, or not authorized
POST /storage/block
Fetch a storage block (index branch or leaf) by CID. The server derives the storage address internally. Leaf blocks are always policy-filtered before return.
Only replication-relevant content kinds are allowed (commits, txns, config, index roots/branches/leaves, dict blobs). Internal metadata kinds (GC records, stats sketches, graph source snapshots) are rejected with 404.
URL:
POST /storage/block
Request Headers:
Content-Type: application/json
Authorization: Bearer <token>
Accept: application/octet-stream | application/x-fluree-flakes | application/x-fluree-flakes+json
Request Body:
Both fields are required:
{
"cid": "bafy...branchOrLeafCid",
"ledger": "mydb:main"
}
Responses:
- 200 OK: Block bytes (branches) or encoded flakes (leaves)
- 400 Bad Request: Invalid CID string
- 404 Not Found: Block not found, disallowed kind, or not authorized
GET /storage/objects/:cid
Fetch a CAS (content-addressed storage) object by its content identifier. Returns the raw bytes of the stored object after verifying integrity.
This is a replication-grade endpoint for fluree clone/pull workflows. The client knows the CID (from the nameservice record or the commit chain) and wants the raw bytes.
URL:
GET /storage/objects/{cid}?ledger={ledger-id}
Path Parameters:
- cid: CIDv1 string (base32-lower multibase, e.g., "bafybeig...")
Query Parameters:
- ledger (required): Ledger ID (e.g., "mydb:main"). Required because storage paths are ledger-scoped.
Request Headers:
Authorization: Bearer <token> (requires fluree.storage.* claims)
Kind Allowlist:
All replication-relevant content kinds are served:
| Kind | Description |
|---|---|
commit | Commit chain blobs |
txn | Transaction data blobs |
config | LedgerConfig origin discovery objects |
index-root | Binary index root (FIR6) |
index-branch | Index branch manifests |
index-leaf | Index leaf files |
dict | Dictionary artifacts (predicates, subjects, strings, etc.) |
Only GarbageRecord (internal GC metadata) returns 404.
Response Headers:
- Content-Type: application/octet-stream
- X-Fluree-Content-Kind: Content kind label (commit, txn, config, index-root, index-branch, index-leaf, dict)
Response Body:
Raw bytes of the stored object.
Integrity Verification:
The server verifies the hash of the stored bytes against the CID before returning. Commit blobs are format-sniffed:
- Commit-v2 blobs (FCV2 magic): Uses the canonical sub-range hash (SHA-256 over the payload excluding the trailing hash + signature block).
- All other blobs (txn, config, future commit formats): Full-bytes SHA-256.
If verification fails, the server returns 500 Internal Server Error — this indicates storage corruption.
Status Codes:
- 200 OK: Object found and integrity verified
- 400 Bad Request: Invalid CID string
- 404 Not Found: Object not found, disallowed kind, not authorized, or storage proxy disabled
- 500 Internal Server Error: Hash verification failed (storage corruption)
Example:
# Fetch a commit blob by CID
curl -H "Authorization: Bearer $TOKEN" \
"http://localhost:8090/v1/fluree/storage/objects/bafybeig...commitCid?ledger=mydb:main"
# Fetch a config blob by CID
curl -H "Authorization: Bearer $TOKEN" \
"http://localhost:8090/v1/fluree/storage/objects/bafybeig...configCid?ledger=mydb:main"
# Fetch an index leaf by CID
curl -H "Authorization: Bearer $TOKEN" \
"http://localhost:8090/v1/fluree/storage/objects/bafybeig...leafCid?ledger=mydb:main"
Nameservice Sync Endpoints
Used by replication clients and peer instances to push ref updates, initialize
ledgers, and fetch snapshots of all nameservice records. These are the
server-side counterpart to the fluree-db-nameservice-sync crate.
Authorization: All endpoints require a Bearer token with storage-proxy
permissions. Per-alias endpoints verify the principal is authorized for that
ledger. /snapshot filters results to the principal’s authorized scope
(storage_all returns everything; otherwise results are filtered to
storage_ledgers and graph sources are excluded).
Availability: These endpoints are only available on transaction servers
(direct storage mode). Proxy-mode instances return 404 Not Found.
POST /nameservice/refs/{alias}/commit
Compare-and-set push for a ledger’s commit-head ref.
Request Body:
{
"expected": { /* RefValue or null for initial creation */ },
"new": { /* RefValue */ }
}
Response (200 OK — updated):
{ "status": "updated", "ref": { /* new RefValue */ } }
Response (409 Conflict — CAS failed):
{ "status": "conflict", "actual": { /* current server-side RefValue */ } }
POST /nameservice/refs/{alias}/index
Compare-and-set push for a ledger’s index-head ref. Same request/response shape
as /commit above.
POST /nameservice/refs/{alias}/init
Create a ledger entry in the nameservice if it does not already exist. Idempotent.
Response:
{ "created": true } // new ledger entry was registered
{ "created": false } // already existed; no change
GET /nameservice/snapshot
Return a full snapshot of all ledger (NsRecord) and graph-source
(GraphSourceRecord) records visible to the caller.
Response:
{
"ledgers": [ /* NsRecord, … */ ],
"graph_sources": [ /* GraphSourceRecord, … */ ]
}
Status Codes:
- 200 OK — snapshot returned
- 401 Unauthorized — missing/invalid storage-proxy token
- 404 Not Found — endpoint disabled (proxy mode)
Query Endpoints
POST /query
Execute a query against one or more ledgers.
URL:
POST /query
GET /query?query={urlencoded-sparql} # SPARQL Protocol GET form
The GET form is provided for W3C SPARQL Protocol compliance. It accepts SPARQL queries via the query query parameter; the body forms below are preferred for larger queries and for JSON-LD. The same form is available on the ledger-scoped /query/{ledger} route.
Optional Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
default-context | boolean | false | When true, use the ledger’s stored default JSON-LD context if the request omits its own @context (JSON-LD) or PREFIX declarations (ledger-scoped SPARQL). |
Request Headers:
Content-Type: application/json
Accept: application/json
Or for SPARQL:
Content-Type: application/sparql-query
Accept: application/sparql-results+json
Request Body (JSON-LD Query):
{
"@context": {
"ex": "http://example.org/ns/"
},
"from": "mydb:main",
"select": ["?name", "?age"],
"where": [
{ "@id": "?person", "ex:name": "?name" },
{ "@id": "?person", "ex:age": "?age" }
],
"orderBy": ["?name"],
"limit": 100
}
Request Body (SPARQL):
PREFIX ex: <http://example.org/ns/>
SELECT ?name ?age
FROM <mydb:main>
WHERE {
?person ex:name ?name .
?person ex:age ?age .
}
ORDER BY ?name
LIMIT 100
Response (JSON-LD Query):
[
{ "name": "Alice", "age": 30 },
{ "name": "Bob", "age": 25 }
]
Response (SPARQL):
{
"head": {
"vars": ["name", "age"]
},
"results": {
"bindings": [
{
"name": { "type": "literal", "value": "Alice" },
"age": { "type": "literal", "value": "30", "datatype": "http://www.w3.org/2001/XMLSchema#integer" }
}
]
}
}
Status Codes:
- 200 OK - Query successful
- 400 Bad Request - Invalid query syntax
- 401 Unauthorized - Authentication required
- 404 Not Found - Ledger not found
- 413 Payload Too Large - Query exceeds size limit
- 500 Internal Server Error - Server error
- 503 Service Unavailable - Query timeout or resource limit
Example:
curl -X POST http://localhost:8090/v1/fluree/query \
-H "Content-Type: application/json" \
-d '{
"from": "mydb:main",
"select": ["?name"],
"where": [{ "@id": "?person", "ex:name": "?name" }]
}'
POST /query/{ledger}
Execute a query against a specific ledger (ledger-scoped).
This endpoint is designed for single-ledger queries, but supports selecting named graphs inside the ledger.
URL:
POST /query/{ledger}
Default graph semantics:
- If the request does not specify a graph selector, the query runs against the ledger’s default graph.
- The built-in txn-meta graph can be selected as either:
  - JSON-LD: "from": "txn-meta", or
  - SPARQL: FROM <txn-meta>
Named graph selection (within the same ledger):
- JSON-LD: you can use "from" to pick a graph in this ledger:
  - "from": "default" → default graph
  - "from": "txn-meta" → txn-meta graph
  - "from": "<graph IRI>" → a user-defined named graph IRI within this ledger
  - Structured form: "from": { "@id": "<ledger>", "graph": "<graph selector>" }
- SPARQL: if the query includes FROM / FROM NAMED, the server interprets those IRIs as graphs within this ledger (not other ledgers):
  - FROM <default> / FROM <txn-meta> / FROM <graph IRI> selects the default graph for triple patterns outside GRAPH {}.
  - FROM NAMED <graph IRI> makes that named graph available via GRAPH <graph IRI> { ... }.
Ledger mismatch protection:
If the body includes a ledger reference that targets a different ledger than {ledger}, the server returns 400 Bad Request with a “Ledger mismatch” error.
Examples:
JSON-LD (query txn-meta):
curl -X POST "http://localhost:8090/v1/fluree/query/mydb:main" \
-H "Content-Type: application/json" \
-d '{
"from": "txn-meta",
"select": ["?commit", "?t"],
"where": [{ "@id": "?commit", "https://ns.flur.ee/db#t": "?t" }]
}'
JSON-LD (query a user-defined named graph by IRI):
curl -X POST "http://localhost:8090/v1/fluree/query/mydb:main" \
-H "Content-Type: application/json" \
-d '{
"from": "http://example.org/graphs/products",
"select": ["?name"],
"where": [{ "@id": "?p", "http://example.org/ns/name": "?name" }]
}'
SPARQL (select txn-meta as default graph):
curl -X POST "http://localhost:8090/v1/fluree/query/mydb:main" \
-H "Content-Type: application/sparql-query" \
-d 'PREFIX f: <https://ns.flur.ee/db#>
SELECT ?commit ?t
FROM <txn-meta>
WHERE { ?commit f:t ?t }'
History Queries via POST /query
Query the history of entities using the standard /query endpoint with from and to keys specifying the time range.
Request Body:
{
"@context": {
"ex": "http://example.org/ns/"
},
"from": "mydb:main@t:1",
"to": "mydb:main@t:latest",
"select": ["?name", "?age", "?t", "?op"],
"where": [
{ "@id": "ex:alice", "ex:name": { "@value": "?name", "@t": "?t", "@op": "?op" } },
{ "@id": "ex:alice", "ex:age": "?age" }
],
"orderBy": "?t"
}
The @t and @op annotations capture transaction metadata:
- @t - Transaction time (integer) when the fact was asserted or retracted.
- @op - Operation type as a boolean: true for assertions, false for retractions. (Mirrors Flake.op on disk; constants "assert"/"retract" are not accepted.)
Both annotations work uniformly for literal-valued and IRI-valued objects.
Response:
[
["Alice", 30, 1, true],
["Alice", 30, 5, false],
["Alicia", 31, 5, true]
]
Example:
curl -X POST http://localhost:8090/v1/fluree/query \
-H "Content-Type: application/json" \
-d '{
"@context": { "ex": "http://example.org/ns/" },
"from": "mydb:main@t:1",
"to": "mydb:main@t:latest",
"select": ["?name", "?t", "?op"],
"where": [
{ "@id": "ex:alice", "ex:name": { "@value": "?name", "@t": "?t", "@op": "?op" } }
],
"orderBy": "?t"
}'
SPARQL History Query:
curl -X POST http://localhost:8090/v1/fluree/query \
-H "Content-Type: application/sparql-query" \
-d 'PREFIX ex: <http://example.org/ns/>
PREFIX f: <https://ns.flur.ee/db#>
SELECT ?name ?t ?op
FROM <mydb:main@t:1>
TO <mydb:main@t:latest>
WHERE {
<< ex:alice ex:name ?name >> f:t ?t .
<< ex:alice ex:name ?name >> f:op ?op .
}
ORDER BY ?t'
GET/POST /explain
Return a query plan without executing the query. Accepts the same body formats and authentication as /query (JSON-LD, SPARQL via application/sparql-query or ?query=, and JWS/VC signed requests).
URL:
GET /explain[/{ledger...}]
POST /explain[/{ledger...}]
Behavior:
- JSON-LD body: returns the logical plan for the parsed query.
- SPARQL body: returns the plan for the parsed SPARQL query. The ledger-scoped endpoint (/explain/{ledger}) rejects queries containing FROM/FROM NAMED — strip dataset clauses to explain the core plan.
- SPARQL UPDATE is rejected (HTTP 400) — use /update for updates.
- Same ledger-scope enforcement for Bearer tokens as /query.
Response:
A JSON object describing the logical / physical plan. Shape mirrors the query engine’s internal plan representation; treat it as informational and non-stable across releases.
Status Codes:
- 200 OK — plan returned
- 400 Bad Request — SPARQL UPDATE sent, or FROM clauses on the ledger-scoped explain
- 401 Unauthorized — authentication required and missing
- 404 Not Found — ledger not found or not authorized
Examples:
# Explain a SPARQL query
curl -X POST http://localhost:8090/v1/fluree/explain/mydb \
-H "Content-Type: application/sparql-query" \
--data 'SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10'
# Explain a JSON-LD query
curl -X POST http://localhost:8090/v1/fluree/explain/mydb \
-H "Content-Type: application/json" \
-d '{"select":["?s"],"where":{"@id":"?s"}}'
Nameservice Metadata
The standalone server does not expose a general-purpose POST /nameservice/query
endpoint. Use GET /ledgers to list ledgers and graph sources,
GET /info/{ledger-id} for metadata about a single ledger or graph source, and
GET /nameservice/snapshot for authenticated remote-sync snapshots.
Ledger Management Endpoints
GET /ledgers
List all ledgers and graph sources.
URL:
GET /ledgers
Response:
{
"ledgers": [
{
"ledger_id": "mydb:main",
"branch": "main",
"commit_t": 5,
"index_t": 5,
"created": "2024-01-22T10:00:00.000Z",
"last_updated": "2024-01-22T10:30:00.000Z"
},
{
"ledger_id": "mydb:dev",
"branch": "dev",
"commit_t": 3,
"index_t": 2,
"created": "2024-01-22T11:00:00.000Z",
"last_updated": "2024-01-22T11:15:00.000Z"
}
]
}
Example:
curl http://localhost:8090/v1/fluree/ledgers
For metadata about a specific ledger or graph source, use GET /info/{ledger-id}.
To create a ledger, use POST /create.
POST /create
Create a new ledger.
URL:
POST /create
Authentication: When admin auth is enabled (--admin-auth-mode=required), requires Bearer token from a trusted issuer. See Admin Authentication.
Request Body:
{
"ledger": "mydb:main"
}
| Field | Type | Required | Description |
|---|---|---|---|
ledger | string | Yes | Ledger ID (e.g., “mydb” or “mydb:main”) |
Response:
{
"ledger": "mydb:main",
"t": 0,
"commit_id": "bafybeig...commitT0"
}
| Field | Description |
|---|---|
ledger | Normalized ledger ID |
t | Transaction time (0 for new ledger) |
commit_id | ContentId of the initial commit |
Status Codes:
- 201 Created - Ledger created successfully
- 400 Bad Request - Invalid request body
- 401 Unauthorized - Bearer token required (when admin auth enabled)
- 409 Conflict - Ledger already exists
- 500 Internal Server Error - Server error
Examples:
# Create ledger (no auth required in default mode)
curl -X POST http://localhost:8090/v1/fluree/create \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb:main"}'
# Create ledger with auth token (when admin auth enabled)
curl -X POST http://localhost:8090/v1/fluree/create \
-H "Content-Type: application/json" \
-H "Authorization: Bearer eyJ..." \
-d '{"ledger": "mydb:main"}'
# Create with short ledger ID (auto-resolves to :main)
curl -X POST http://localhost:8090/v1/fluree/create \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb"}'
POST /drop
Drop (delete) a ledger.
URL:
POST /drop
Authentication: When admin auth is enabled (--admin-auth-mode=required), requires Bearer token from a trusted issuer. See Admin Authentication.
Request Body:
{
"ledger": "mydb:main",
"hard": false
}
| Field | Type | Required | Description |
|---|---|---|---|
ledger | string | Yes | Ledger ID (e.g., “mydb” or “mydb:main”) |
hard | boolean | No | If true, permanently delete all storage files. Default: false (soft drop) |
Drop Modes:
- Soft drop (hard: false, default): Retracts the ledger from the nameservice but preserves all data files. The ledger can potentially be recovered.
- Hard drop (hard: true): Permanently deletes all commit and index files. This is irreversible.
Response:
{
"ledger": "mydb:main",
"status": "dropped",
"files_deleted": {
"commit": 15,
"index": 8
}
}
| Field | Description |
|---|---|
ledger | Normalized ledger ID |
status | One of: "dropped", "already_retracted", "not_found" |
files_deleted | File counts (only populated for hard drop) |
Status Codes:
- 200 OK - Drop successful (or already dropped/not found)
- 400 Bad Request - Invalid request body
- 401 Unauthorized - Bearer token required (when admin auth enabled)
- 500 Internal Server Error - Server error
Drop Sequence:
- Normalizes the ledger ID (ensures branch suffix like :main)
- Cancels any pending background indexing
- Waits for in-progress indexing to complete
- In hard mode: deletes all storage artifacts (commits + indexes)
- Retracts from nameservice
- Disconnects from ledger cache
Idempotency:
Safe to call multiple times:
- Returns "already_retracted" if the ledger was previously dropped
- Hard mode still attempts file deletion even for already-retracted ledgers (useful for cleanup)
Examples:
# Soft drop (retract only, preserve files)
curl -X POST http://localhost:8090/v1/fluree/drop \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb:main"}'
# Hard drop (delete all files - IRREVERSIBLE)
curl -X POST http://localhost:8090/v1/fluree/drop \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb:main", "hard": true}'
# Drop with auth token (when admin auth enabled)
curl -X POST http://localhost:8090/v1/fluree/drop \
-H "Content-Type: application/json" \
-H "Authorization: Bearer eyJ..." \
-d '{"ledger": "mydb:main", "hard": true}'
# Drop with short ledger ID (auto-resolves to :main)
curl -X POST http://localhost:8090/v1/fluree/drop \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb"}'
GET /context/
Get the default JSON-LD context for a ledger.
URL:
GET /context/{ledger-id}
Path Parameters:
ledger-id: Ledger identifier (e.g., mydb or mydb:main)
Response:
{
"@context": {
"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"xsd": "http://www.w3.org/2001/XMLSchema#",
"owl": "http://www.w3.org/2002/07/owl#",
"ex": "http://example.org/"
}
}
If no default context has been set, "@context" is null.
Status Codes:
- 200 OK - Context returned (may be null)
- 404 Not Found - Ledger does not exist
Example:
curl http://localhost:8090/v1/fluree/context/mydb:main
PUT /context/
Replace the default JSON-LD context for a ledger.
URL:
PUT /context/{ledger-id}
Path Parameters:
ledger-id: Ledger identifier (e.g., mydb or mydb:main)
Request Body:
A JSON object mapping prefixes to IRIs. Either a bare object or wrapped in {"@context": {...}}:
{
"ex": "http://example.org/",
"foaf": "http://xmlns.com/foaf/0.1/",
"schema": "http://schema.org/"
}
Response (success):
{
"status": "updated"
}
Status Codes:
- 200 OK - Context replaced successfully
- 400 Bad Request - Body is not a valid JSON object; or peer mode (writes not available)
- 404 Not Found - Ledger does not exist
- 409 Conflict - Concurrent update conflict (retry the request)
Concurrency: The update uses compare-and-set semantics internally (up to 3 retries). A 409 means all retries were exhausted — this is rare and indicates heavy concurrent updates.
Cache invalidation: After a successful update, the server invalidates the cached ledger state. Subsequent queries will use the new context.
Examples:
# Set context
curl -X PUT http://localhost:8090/v1/fluree/context/mydb:main \
-H "Content-Type: application/json" \
-d '{"ex": "http://example.org/", "foaf": "http://xmlns.com/foaf/0.1/"}'
# Wrapped form also accepted
curl -X PUT http://localhost:8090/v1/fluree/context/mydb:main \
-H "Content-Type: application/json" \
-d '{"@context": {"ex": "http://example.org/"}}'
POST /branch
Create a new branch for a ledger.
URL:
POST /branch
Authentication: When admin auth is enabled (--admin-auth-mode=required), requires Bearer token from a trusted issuer. See Admin Authentication.
Request Body:
{
"ledger": "mydb",
"branch": "feature-x",
"source": "main",
"at": "t:5"
}
| Field | Type | Required | Description |
|---|---|---|---|
ledger | string | Yes | Ledger name without branch suffix (e.g., “mydb”) |
branch | string | Yes | New branch name to create (e.g., “feature-x”) |
source | string | No | Source branch to create from. Default: "main" |
at | string | No | Commit on the source branch to start from. "t:N" for a transaction number, or a hex digest / full CID for prefix resolution. When omitted, the branch starts at the source’s current HEAD. t: / prefix resolution requires the source to be indexed. |
Response:
{
"ledger_id": "mydb:feature-x",
"branch": "feature-x",
"source": "main",
"t": 5
}
| Field | Description |
|---|---|
ledger_id | Full ledger:branch identifier for the new branch |
branch | Branch name |
source | Source branch this was created from |
t | Transaction time of the commit at the branch point |
Status Codes:
- 201 Created - Branch created successfully
- 400 Bad Request - Invalid request body (including malformed at value)
- 401 Unauthorized - Bearer token required (when admin auth enabled)
- 404 Not Found - Source branch does not exist, or at commit is not reachable from source HEAD
- 409 Conflict - Branch already exists
- 500 Internal Server Error - Server error
Examples:
# Create branch from main (default source)
curl -X POST http://localhost:8090/v1/fluree/branch \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb", "branch": "feature-x"}'
# Create branch from a specific source branch
curl -X POST http://localhost:8090/v1/fluree/branch \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb", "branch": "staging", "source": "dev"}'
# Branch at a historical commit on main
curl -X POST http://localhost:8090/v1/fluree/branch \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb", "branch": "rewind", "at": "t:5"}'
GET /branch/
List all non-retracted branches for a ledger.
URL:
GET /branch/{ledger-name}
Response:
[
{
"branch": "main",
"ledger_id": "mydb:main",
"t": 5
},
{
"branch": "feature-x",
"ledger_id": "mydb:feature-x",
"t": 5,
"source": "main"
}
]
| Field | Description |
|---|---|
branch | Branch name |
ledger_id | Full ledger:branch identifier |
t | Current transaction time on this branch |
source | Source branch (only present for branches created via /branch) |
Examples:
curl http://localhost:8090/v1/fluree/branch/mydb
POST /drop-branch
Drop a branch from a ledger. Admin-protected.
URL:
POST /drop-branch
Request body:
{
"ledger": "mydb",
"branch": "feature-x"
}
| Field | Type | Required | Description |
|---|---|---|---|
ledger | string | Yes | Ledger name without branch suffix (e.g., “mydb”) |
branch | string | Yes | Branch name to drop (e.g., “feature-x”) |
Response body (200 OK):
{
"ledger_id": "mydb:feature-x",
"status": "Dropped",
"deferred": false,
"artifacts_deleted": 5,
"cascaded": [],
"warnings": []
}
| Field | Type | Description |
|---|---|---|
ledger_id | string | Full ledger:branch identifier of the dropped branch |
status | string | Drop status ("Dropped", "AlreadyRetracted", "NotFound") |
deferred | bool | true if the branch has children — retracted but storage preserved |
artifacts_deleted | number | Number of storage artifacts removed |
cascaded | array | List of ancestor branch ledger_ids that were cascade-dropped |
warnings | array | Any non-fatal warnings during the drop |
Behavior:
- Cannot drop main: Returns 400 Bad Request.
- Leaf branch (no children): Fully drops — deletes storage artifacts, purges NsRecord, decrements parent’s child count. If the parent was previously retracted and its child count reaches 0, the parent is cascade-dropped too.
- Branch with children (branches > 0): Retracted (hidden from listings, rejects new transactions) but storage is preserved for children. When the last child is eventually dropped, the retracted parent is cascade-purged automatically.
Status codes:
- 200 OK - Branch dropped (or deferred) successfully
- 400 Bad Request - Cannot drop the main branch
- 404 Not Found - Ledger or branch does not exist
- 500 Internal Server Error - Server error
Examples:
# Drop a leaf branch
curl -X POST http://localhost:8090/v1/fluree/drop-branch \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb", "branch": "feature-x"}'
# Drop a branch with children (will be deferred)
curl -X POST http://localhost:8090/v1/fluree/drop-branch \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb", "branch": "dev"}'
POST /rebase
Rebase a branch onto its source branch’s current HEAD. Admin-protected.
URL:
POST /rebase
Request body:
{
"ledger": "mydb",
"branch": "feature-x",
"strategy": "take-both"
}
| Field | Type | Required | Description |
|---|---|---|---|
ledger | string | Yes | Ledger name without branch suffix (e.g., “mydb”) |
branch | string | Yes | Branch name to rebase (e.g., “feature-x”) |
strategy | string | No | Conflict resolution strategy (default: “take-both”). Options: take-both, abort, take-source, take-branch, skip |
Response body (200 OK):
{
"ledger_id": "mydb:feature-x",
"branch": "feature-x",
"fast_forward": false,
"replayed": 3,
"skipped": 0,
"conflicts": 1,
"failures": 0,
"total_commits": 3,
"source_head_t": 8
}
| Field | Type | Description |
|---|---|---|
ledger_id | string | Full ledger:branch identifier |
branch | string | Branch name |
fast_forward | bool | true if the branch had no unique commits |
replayed | number | Number of commits successfully replayed |
skipped | number | Number of commits skipped (Skip strategy) |
conflicts | number | Number of conflicts detected |
failures | number | Number of commits that failed validation |
total_commits | number | Total branch commits considered |
source_head_t | number | Transaction time of the source branch HEAD |
Conflict strategies:
| Strategy | Behavior |
|---|---|
take-both | Replay as-is, both values coexist (multi-cardinality) |
abort | Fail on first conflict, no changes applied |
take-source | Drop branch’s conflicting flakes (source wins) |
take-branch | Keep branch’s flakes, retract source’s conflicting values |
skip | Skip entire commit if any flakes conflict |
Status codes:
- 200 OK - Rebase completed successfully
- 400 Bad Request - Cannot rebase main, invalid strategy, or missing branch point
- 404 Not Found - Ledger or branch does not exist
- 409 Conflict - Rebase aborted due to conflict (abort strategy)
- 500 Internal Server Error - Server error
Examples:
# Rebase with default strategy (take-both)
curl -X POST http://localhost:8090/v1/fluree/rebase \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb", "branch": "feature-x"}'
# Rebase with abort strategy (fail on conflicts)
curl -X POST http://localhost:8090/v1/fluree/rebase \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb", "branch": "feature-x", "strategy": "abort"}'
POST /merge
Merge a source branch into a target branch. Admin-protected.
Fast-forward merges copy the source commit chain into the target namespace and advance the target HEAD. When the target has diverged, Fluree performs a general merge: it computes the source and target deltas since their common ancestor, resolves overlapping (s, p, g) conflicts according to the requested strategy, and creates a merge commit on the target branch.
URL:
POST /merge
Request body:
{
"ledger": "mydb",
"source": "feature-x",
"target": "dev",
"strategy": "take-both"
}
| Field | Type | Required | Description |
|---|---|---|---|
ledger | string | Yes | Ledger name without branch suffix (e.g., “mydb”) |
source | string | Yes | Source branch to merge from (e.g., “feature-x”) |
target | string | No | Target branch to merge into (defaults to source’s parent branch) |
strategy | string | No | Conflict resolution strategy for non-fast-forward merges. Defaults to take-both. Options: take-both, abort, take-source, take-branch |
Conflict strategies:
| Strategy | Behavior |
|---|---|
take-both | Keep source flakes as-is, so both source and target values can coexist |
abort | Fail if conflicts are detected; no merge commit is created |
take-source | Source wins: keep source flakes and retract target’s conflicting values |
take-branch | Target wins: drop source flakes for conflicting keys |
skip is a rebase-only strategy and is not supported for non-fast-forward merges.
Response body (200 OK):
{
"ledger_id": "mydb:dev",
"target": "dev",
"source": "feature-x",
"fast_forward": false,
"new_head_t": 8,
"commits_copied": 3,
"conflict_count": 1,
"strategy": "take-both"
}
| Field | Type | Description |
|---|---|---|
ledger_id | string | Full ledger:branch identifier of the target |
target | string | Target branch name |
source | string | Source branch name |
fast_forward | bool | Whether this merge advanced the target directly to the source HEAD |
new_head_t | number | New commit HEAD transaction time of the target |
commits_copied | number | Number of commit blobs copied to the target namespace |
conflict_count | number | Number of overlapping (s, p, g) keys detected during a non-fast-forward merge |
strategy | string | Conflict strategy used for a non-fast-forward merge. Omitted for fast-forward merges |
Status codes:
- 200 OK - Merge completed successfully
- 400 Bad Request - Source has no branch point (e.g., main), self-merge, unknown strategy, or unsupported merge strategy
- 404 Not Found - Ledger or branch does not exist
- 409 Conflict - Merge aborted due to conflicts when using the abort strategy, or the target HEAD changed during commit publishing
- 500 Internal Server Error - Server error
Examples:
# Merge feature-x into its parent (inferred from branch point)
curl -X POST http://localhost:8090/v1/fluree/merge \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb", "source": "feature-x"}'
# Merge dev into main (explicit target)
curl -X POST http://localhost:8090/v1/fluree/merge \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb", "source": "dev", "target": "main"}'
# Non-fast-forward merge with source-winning conflict resolution
curl -X POST http://localhost:8090/v1/fluree/merge \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb", "source": "dev", "target": "main", "strategy": "take-source"}'
GET /merge-preview/
Read-only preview of merging a source branch into a target branch. Returns the rich diff — ahead/behind commit summaries, conflict keys, and fast-forward eligibility — without mutating any nameservice or content store state.
Bearer token required when data_auth.mode = required; reads are gated on bearer.can_read(ledger).
URL:
GET /merge-preview/{ledger-name}?source={source}&target={target}&max_commits={n}&max_conflict_keys={n}&include_conflicts={bool}&include_conflict_details={bool}&strategy={strategy}
Path / Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
ledger (path) | string | Yes | Ledger name (e.g., “mydb”) |
source | string | Yes | Source branch to merge from (e.g., “feature-x”) |
target | string | No | Target branch (defaults to the source’s parent branch) |
max_commits | number | No | Cap on per-side commit summaries returned (default 500). Server clamps to a hard maximum of 5,000 — values above are silently lowered. Bounds response size, not divergence-walk cost (the unbounded count is still computed). |
max_conflict_keys | number | No | Cap on conflict keys returned (default 200). Server clamps to a hard maximum of 5,000. Bounds response size, not the conflict-delta walks. |
include_conflicts | bool | No | When false, skips the conflict computation (default true). Use this to make the preview cheap on diverged branches. |
include_conflict_details | bool | No | When true, includes source/target flake values for the returned conflict keys. Defaults to false. Details are computed after max_conflict_keys is applied. |
strategy | string | No | Strategy used to annotate conflict details. Defaults to take-both. Options: take-both, abort, take-source, take-branch. |
Response body (200 OK):
{
"source": "feature-x",
"target": "main",
"ancestor": { "commit_id": "bafy...", "t": 5 },
"ahead": {
"count": 3,
"commits": [
{ "t": 8, "commit_id": "bafy...", "time": "2026-04-25T12:00:00Z",
"asserts": 2, "retracts": 0, "flake_count": 2, "message": null }
],
"truncated": false
},
"behind": { "count": 1, "commits": [...], "truncated": false },
"fast_forward": false,
"mergeable": true,
"conflicts": {
"count": 1,
"keys": [{ "s": [100, "alice"], "p": [100, "status"], "g": null }],
"truncated": false,
"strategy": "take-source",
"details": [
{
"key": { "s": [100, "alice"], "p": [100, "status"], "g": null },
"source_values": [["ex:alice", "ex:status", "active", "xsd:string", true]],
"target_values": [["ex:alice", "ex:status", "archived", "xsd:string", true]],
"resolution": {
"source_action": "kept",
"target_action": "retracted",
"outcome": "source-wins"
}
}
]
}
}
| Field | Type | Description |
|---|---|---|
source | string | Source branch name |
target | string | Target branch name (resolved from default when not supplied) |
ancestor | object | null | Common ancestor {commit_id, t}. null when both heads are absent |
ahead | object | Commits on source not on target (count, commits, truncated) |
behind | object | Commits on target not on source |
fast_forward | bool | True when target HEAD == ancestor (or both heads absent) |
mergeable | bool | False only when the selected preview strategy would abort, e.g. strategy=abort with conflicts. This is a strategy/conflict signal, not full transaction validation. mergeable=true does not guarantee a subsequent POST /merge will succeed; it only reflects the conflict/strategy interaction at preview time. |
conflicts | object | Overlapping (s, p, g) keys touched on both sides since the ancestor. Empty when fast_forward or include_conflicts=false |
Per-commit summaries (ahead.commits[] / behind.commits[]) are newest-first and include assert/retract counts plus an optional message extracted from txn_meta when an f:message string entry is present.
When include_conflict_details=true, conflicts.details[] contains one entry for each returned conflict key. source_values and target_values are the current asserted values for that key at each branch HEAD, using the same resolved flake tuple format as /show: [subject, predicate, object, datatype, operation], with an optional metadata object as the 6th tuple item. The resolution object is an annotation only; preview does not apply the strategy or mutate state.
Status codes:
- 200 OK — Preview computed successfully
- 400 Bad Request — Source has no branch point (e.g., main), source == target, unknown strategy, unsupported preview strategy, include_conflict_details=true with include_conflicts=false, or strategy=abort with include_conflicts=false
- 401 Unauthorized — Bearer token required
- 404 Not Found — Ledger or branch does not exist (or bearer cannot read it)
Examples:
# Default target (source's parent), defaults for caps and conflict computation
curl "http://localhost:8090/v1/fluree/merge-preview/mydb?source=feature-x"
# Counts only — skip the conflict walks for a faster response
curl "http://localhost:8090/v1/fluree/merge-preview/mydb?source=dev&target=main&include_conflicts=false"
# Cap commit lists at 50 per side
curl "http://localhost:8090/v1/fluree/merge-preview/mydb?source=dev&max_commits=50"
# Include value details and labels for a source-winning merge
curl "http://localhost:8090/v1/fluree/merge-preview/mydb?source=dev&target=main&include_conflict_details=true&strategy=take-source"
GET /info/
Get ledger metadata. Used by the CLI for info, push, pull, and clone.
URL:
GET /info/{ledger-id}
Path Parameters:
ledger-id: Ledger ID (e.g., “mydb” or “mydb:main”)
Response (non-proxy mode):
Returns comprehensive ledger metadata including namespace codes, property stats, and class counts. Always includes:
{
"ledger_id": "mydb:main",
"t": 42,
"commitId": "bafybeig...headCommitCid",
"indexId": "bafybeig...indexRootCid",
"namespaces": { ... },
"properties": { ... },
"classes": [ ... ]
}
Response (proxy storage mode):
Returns simplified nameservice-only metadata:
{
"ledger_id": "mydb:main",
"t": 42,
"commit_head_id": "bafybeig...commitCid",
"index_head_id": "bafybeig...indexCid"
}
| Field | Type | Required | Description |
|---|---|---|---|
ledger_id | string | Yes | Canonical ledger ID |
t | integer | Yes | Current transaction time. Used by push/pull for head comparison. |
commitId | string | No | Head commit CID (non-proxy mode) |
commit_head_id | string | No | Head commit CID (proxy mode) |
Important: The t field is required by the CLI for push/pull/clone operations. See CLI-Server API Contract for details.
Optional query parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
realtime_property_details | boolean | true | When false, use the lighter fast novelty-aware stats path instead of the default full lookup-backed path |
include_property_datatypes | boolean | true | Include datatype info for properties |
include_property_estimates | boolean | false | Include index-derived NDV/selectivity estimates for properties |
Status Codes:
- 200 OK - Ledger found
- 401 Unauthorized - Authentication required
- 404 Not Found - Ledger not found
Examples:
# Get ledger info
curl "http://localhost:8090/v1/fluree/info/mydb:main"
# With auth token
curl "http://localhost:8090/v1/fluree/info/mydb:main" \
-H "Authorization: Bearer eyJ..."
GET /exists/
Check if a ledger exists in the nameservice.
URL:
GET /exists/{ledger-id}
Path Parameters:
ledger-id: Ledger ID (e.g., “mydb” or “mydb:main”)
Response:
{
"ledger": "mydb:main",
"exists": true
}
| Field | Type | Description |
|---|---|---|
ledger | string | Ledger ID (echoed back) |
exists | boolean | Whether the ledger is registered in the nameservice |
Status Codes:
- 200 OK - Check completed successfully (regardless of whether ledger exists)
- 500 Internal Server Error - Server error
Usage Notes:
This is a lightweight check that only queries the nameservice without loading the ledger data. Use this to:
- Check if a ledger exists before attempting to load it
- Implement conditional create-or-load logic
- Validate ledger IDs in application code
Examples:
# Check a ledger ID
curl "http://localhost:8090/v1/fluree/exists/mydb:main"
# Conditional create-or-load in shell
if curl -s "http://localhost:8090/v1/fluree/exists/mydb" | jq -e '.exists == false' > /dev/null; then
curl -X POST http://localhost:8090/v1/fluree/create \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb"}'
fi
System Endpoints
GET /health
Health check endpoint for monitoring.
URL:
GET /health
Response:
{
"status": "healthy",
"version": "0.1.0",
"storage": "memory",
"uptime_ms": 123456
}
Status Codes:
- 200 OK - System healthy
- 503 Service Unavailable - System unhealthy
Example:
curl http://localhost:8090/health
GET /stats
Detailed server statistics.
URL:
GET /stats
Response:
{
"version": "0.1.0",
"uptime_ms": 123456789,
"storage": {
"mode": "memory",
"total_bytes": 12345678,
"ledgers": 5
},
"queries": {
"total": 1234,
"active": 3,
"average_duration_ms": 45
},
"transactions": {
"total": 567,
"average_duration_ms": 89
},
"indexing": {
"active": true,
"pending_ledgers": 2
}
}
Example:
curl http://localhost:8090/v1/fluree/stats
Events Endpoint
GET /events
Server-Sent Events (SSE) stream of nameservice changes for ledgers and graph sources. Available on transaction servers only (not peers).
Query parameters:
| Parameter | Description |
|---|---|
all=true | Subscribe to all ledgers and graph sources |
ledger=<id> | Subscribe to a specific ledger (repeatable) |
graph-source=<id> | Subscribe to a specific graph source (repeatable) |
Event types:
| Event | Description |
|---|---|
ns-record | A ledger or graph source was published/updated |
ns-retracted | A ledger or graph source was deleted |
Authentication: Configurable via --events-auth-mode none|optional|required. See Query peers and replication for full details including auth configuration, event payloads, and peer subscription setup.
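A minimal subscription with curl, assuming the events route is mounted under the same /v1/fluree base path as the data endpoints (adjust the prefix if your deployment differs); -N disables buffering so events print as they arrive:
# Stream nameservice events for one ledger
curl -N "http://localhost:8090/v1/fluree/events?ledger=mydb:main"
# Subscribe to all ledgers and graph sources
curl -N "http://localhost:8090/v1/fluree/events?all=true"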
Graph Source Endpoints
Note: HTTP endpoints for BM25 and vector index lifecycle management (create, sync, drop) are not yet implemented in the server. BM25 and vector indexes are currently managed via the Rust API (Bm25CreateConfig, create_full_text_index, sync_bm25_index, drop_full_text_index). See BM25 Full-Text Search and Vector Search for API usage.
BM25 search is available in queries via the f:graphSource / f:searchText pattern in where clauses — see the query documentation for details.
Graph source metadata can be discovered via GET /ledgers or GET /info/{graph-source-id}.
POST {api_base_url}/iceberg/map
Map an Iceberg table (or R2RML-mapped relational source backed by Iceberg) as a graph source. Admin-protected — requires the admin Bearer token when an admin token is configured. Available only when the server is built with the iceberg feature.
URL:
POST {api_base_url}/iceberg/map
For the standalone server and Docker image defaults, this is:
POST http://localhost:8090/v1/fluree/iceberg/map
Request Body:
{
"name": "warehouse-orders",
"mode": "rest",
"catalog_uri": "https://polaris.example.com/api/catalog",
"table": "sales.orders",
"branch": "main",
"r2rml": "@prefix rr: <http://www.w3.org/ns/r2rml#> . ...",
"r2rml_type": "text/turtle",
"warehouse": "prod",
"auth_bearer": "…",
"oauth2_token_url": "https://idp.example.com/token",
"oauth2_client_id": "…",
"oauth2_client_secret": "…",
"no_vended_credentials": false,
"s3_region": "us-east-1",
"s3_endpoint": "https://s3.example.com",
"s3_path_style": false,
"table_location": "s3://bucket/warehouse/sales/orders"
}
| Field | Type | Description |
|---|---|---|
name | string | Graph source name (required) |
mode | string | rest (default) or direct |
catalog_uri | string | REST catalog URI (required in rest mode) |
table | string | Table identifier namespace.table (required in rest mode) |
table_location | string | S3 table location (required in direct mode) |
r2rml | string | Inline R2RML mapping (Turtle/JSON-LD). Omit to auto-generate a direct mapping. |
r2rml_type | string | Media type of r2rml (text/turtle, application/ld+json) |
branch | string | Branch name (default: main) |
auth_bearer | string | Bearer token for catalog auth |
oauth2_* | string | OAuth2 client-credentials flow for the catalog |
warehouse | string | Warehouse identifier |
no_vended_credentials | bool | Disable vended credentials |
s3_region, s3_endpoint, s3_path_style | string / bool | S3 overrides for direct mode |
Response:
{
"graph_source_id": "warehouse-orders:main",
"table_identifier": "sales.orders",
"catalog_uri": "https://polaris.example.com/api/catalog",
"connection_tested": true,
"mapping_source": "r2rml-inline",
"triples_map_count": 3,
"mapping_validated": true
}
Status Codes:
- 201 Created — graph source created
- 400 Bad Request — missing required fields or invalid R2RML
- 401/403 — admin auth required
- 500 Internal Server Error — catalog connection or mapping failure
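Example (rest mode with only the required fields; the catalog URI, table, and admin token are placeholders):
curl -X POST http://localhost:8090/v1/fluree/iceberg/map \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <admin-token>" \
  -d '{
    "name": "warehouse-orders",
    "mode": "rest",
    "catalog_uri": "https://polaris.example.com/api/catalog",
    "table": "sales.orders"
  }'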
See also the CLI wrapper: fluree iceberg map.
Admin Endpoints
POST /reindex
Trigger a full manual reindex for a ledger. Walks the entire commit chain and rebuilds the binary index from scratch using the server’s configured indexer settings. Admin-protected — requires the admin Bearer token when admin auth is enabled.
This endpoint runs the reindex synchronously and returns when the new root is committed. For large ledgers it may run for many minutes; configure your HTTP client timeout accordingly. In peer mode, the request is forwarded to the transaction server.
URL:
POST /reindex
Request Body:
{
"ledger": "mydb:main"
}
| Field | Type | Description |
|---|---|---|
ledger | string | Ledger alias (name or name:branch). Required. |
opts | object | Reserved for future per-request indexer overrides. Currently accepted but ignored. |
Example:
curl -X POST http://localhost:8090/v1/fluree/reindex \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer <admin-token>' \
-d '{"ledger": "mydb:main"}'
Response:
{
"ledger_id": "mydb:main",
"index_t": 42,
"root_id": "fluree:cid:bafy…",
"stats": {
"flake_count": 184273,
"leaf_count": 614,
"branch_count": 23,
"total_bytes": 47185920
}
}
| Field | Description |
|---|---|
ledger_id | Ledger alias the reindex was run against |
index_t | Transaction time the new index was built at (matches the head commit) |
root_id | ContentId of the newly written index root |
stats.flake_count | Total flakes in the rebuilt index |
stats.leaf_count | Number of leaf nodes written |
stats.branch_count | Number of branch nodes written |
stats.total_bytes | Bytes written to storage during the reindex |
Status Codes:
- 200 OK — reindex complete
- 400 Bad Request — missing/invalid ledger
- 401/403 — admin auth required
- 404 Not Found — ledger does not exist
- 500 Internal Server Error — reindex failed
When triggering indexing through the Rust API instead, see Fluree::reindex and ReindexOptions. For background incremental indexing (which runs automatically as commits are made), see Background indexing.
Admin Authentication
Administrative endpoints (/create, /drop, /reindex, branch operations, and Iceberg mapping when enabled) can be protected with Bearer token authentication.
Configuration
Enable admin authentication with CLI flags:
# Production: require trusted tokens
fluree-server \
--admin-auth-mode=required \
--admin-auth-trusted-issuer=did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK
# Development: no authentication (default)
fluree-server --admin-auth-mode=none
Environment Variables:
- FLUREE_ADMIN_AUTH_MODE: none (default) or required
- FLUREE_ADMIN_AUTH_TRUSTED_ISSUERS: Comma-separated list of trusted did:key identifiers
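The same configuration expressed as environment variables, which is convenient for Docker or systemd deployments:
export FLUREE_ADMIN_AUTH_MODE=required
export FLUREE_ADMIN_AUTH_TRUSTED_ISSUERS="did:key:z6Mk...,did:key:z6Mn..."
fluree-server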
Token Format
Admin tokens use the same JWS format as other Fluree tokens. Required claims:
{
"iss": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK",
"exp": 1705932000,
"sub": "admin@example.com"
}
| Claim | Required | Description |
|---|---|---|
iss | Yes | Issuer did:key (must be in trusted issuers list) |
exp | Yes | Expiration timestamp (Unix seconds) |
sub | No | Subject identifier |
fluree.identity | No | Identity for audit logging |
Making Authenticated Requests
Include the token in the Authorization header:
curl -X POST http://localhost:8090/v1/fluree/create \
-H "Content-Type: application/json" \
-H "Authorization: Bearer eyJhbGciOiJFZERTQSIsImp3ayI6ey..." \
-d '{"ledger": "mydb:main"}'
Issuer Trust
Tokens must be signed by a trusted issuer. Configure trusted issuers:
# Single issuer
--admin-auth-trusted-issuer=did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK
# Multiple issuers
--admin-auth-trusted-issuer=did:key:z6Mk... \
--admin-auth-trusted-issuer=did:key:z6Mn...
# Fallback to events auth issuers
--events-auth-trusted-issuer=did:key:z6Mk...
If no admin-specific issuers are configured, admin auth falls back to --events-auth-trusted-issuer.
Response Codes
- 401 Unauthorized: Missing or invalid Bearer token
- 401 Unauthorized: Token expired
- 401 Unauthorized: Untrusted issuer
Error Responses
All endpoints may return error responses in this format (and should return Content-Type: application/json):
{
"error": "Human-readable error message",
"status": 409,
"@type": "err:db/Conflict",
"cause": {
"error": "Optional nested error detail",
"status": 409,
"@type": "err:db/SomeInnerError"
}
}
See Errors and Status Codes for complete error reference.
CLI Compatibility Requirements
This section summarizes the contract that third-party server implementations (e.g., Solo) must follow to be compatible with the Fluree CLI (fluree-db-cli). The CLI discovers the API base URL via fluree remote add and constructs endpoint URLs as {base_url}/{operation}/{ledger}.
Required endpoints
| Endpoint | CLI commands |
|---|---|
GET /info/{ledger} | info, push, pull, clone |
GET /show/{ledger}?commit=<ref> | show --remote |
POST /query/{ledger} | query (JSON-LD and SPARQL) |
POST /insert/{ledger} | insert |
POST /upsert/{ledger} | upsert |
GET /exists/{ledger} | clone (pre-create check) |
GET /context/{ledger} | context get |
PUT /context/{ledger} | context set |
GET /ledgers | list --remote |
For sync workflows (clone/push/pull), these additional endpoints are needed:
| Endpoint | CLI commands | Notes |
|---|---|---|
POST /push/{ledger} | push | Required for push |
GET /commits/{ledger} | clone, pull | Paginated export fallback |
POST /pack/{ledger} | clone, pull | Preferred bulk transport; CLI falls back to /commits on 404/405/501 |
GET /storage/ns/{ledger} | clone, pull | Pack preflight (head CID discovery) |
Critical response field: t
The GET /info/{ledger} response must include a t field (integer) representing the current transaction time. This field is used by the CLI for:
- push: Comparing local_t vs remote_t to determine what commits to send and detect divergence
- pull: Comparing remote_t vs local_t to determine if new commits are available
- clone: Guarding against cloning empty ledgers (t == 0) and displaying progress
Omitting t from the info response will cause push and pull to fail with "remote ledger-info response missing 't'".
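A sketch of the comparison a client can make before deciding whether to push or pull; local_t here is a hypothetical value that your tooling tracks locally:
local_t=40   # hypothetical local head transaction time
remote_t=$(curl -s "http://localhost:8090/v1/fluree/info/mydb:main" | jq -r '.t')
if [ "$remote_t" -gt "$local_t" ]; then
  echo "remote is ahead: pull needed"
elif [ "$remote_t" -lt "$local_t" ]; then
  echo "local is ahead: push needed"
else
  echo "in sync"
fi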
Transaction response format
The /insert and /upsert endpoints should return a JSON object. The CLI displays the full response as pretty-printed JSON. Common fields include t, tx-id, and commit.hash, but the exact shape is not prescribed — the CLI does not parse individual fields from transaction responses.
Authentication
All endpoints accept Authorization: Bearer <token>. On 401, the CLI attempts a single token refresh (if OIDC is configured) and retries. See Auth contract for the full authentication lifecycle.
Error responses
Error bodies should be JSON with an error or message field. The CLI extracts the first available string from message or error for display. Plain-text error bodies are also accepted.
Related Documentation
- Overview - API overview and principles
- Headers - HTTP headers and content types
- Signed Requests - Authentication
- Errors - Error codes and troubleshooting
Headers, Content Types, and Request Sizing
This document covers HTTP headers, content type negotiation, request size limits, and related considerations for the Fluree HTTP API.
Request Headers
Content-Type
Specifies the format of the request body.
Supported Values:
JSON-LD Transactions and Queries:
Content-Type: application/json
Default for JSON-LD transactions and JSON-LD queries.
Content-Type: application/ld+json
Explicit JSON-LD content type.
SPARQL Queries:
Content-Type: application/sparql-query
For SPARQL SELECT, ASK, CONSTRUCT queries.
Content-Type: application/sparql-update
For SPARQL UPDATE operations. See SPARQL Transactions for supported operations.
RDF Formats:
Content-Type: text/turtle
For Turtle RDF format transactions. Supported on /insert (fast direct path) and /upsert.
Content-Type: application/trig
For TriG format transactions with named graphs (GRAPH blocks). Only supported on /upsert - returns 400 error on /insert because named graph ingestion requires the upsert path.
Content-Type: application/n-triples
For N-Triples format (future support).
Content-Type: application/rdf+xml
For RDF/XML format (future support).
Accept
Specifies the desired response format.
Supported Values:
Accept: application/json
Compact JSON format (default).
Accept: application/ld+json
Full JSON-LD with @context.
Accept: application/sparql-results+json
SPARQL JSON Results format (for SPARQL queries).
Accept: application/sparql-results+xml
SPARQL XML Results format (for SPARQL SELECT/ASK queries).
Accept: text/turtle
Turtle RDF format (for CONSTRUCT queries).
Accept: application/rdf+xml
RDF/XML graph format (for CONSTRUCT/DESCRIBE queries).
Accept: application/vnd.fluree.agent+json
Agent JSON format — optimized for LLM/agent consumption. Returns a self-describing envelope with schema, compact rows, and pagination support. See Output Formats for details.
Use the Fluree-Max-Bytes header to set a byte budget for response truncation:
Fluree-Max-Bytes: 32768
Accept: application/n-triples
N-Triples format (future support).
Multiple Accept Values:
You can specify multiple formats with quality values:
Accept: application/ld+json; q=1.0, application/json; q=0.8
The server will choose the best match based on quality values and support.
Authorization
Authentication credentials. Only required when the server has authentication enabled for the relevant endpoint group (see Configuration).
Bearer Token (Ed25519 JWS or OIDC):
Authorization: Bearer eyJhbGciOiJFZERTQSIsImp3ayI6eyJrdHkiOiJPS1AiLCJjcnYiOiJFZDI1NTE5IiwieCI6Ii4uLiJ9fQ...
The server automatically dispatches to the correct verification path based on the token header:
- Tokens with an embedded jwk field use the Ed25519 verification path
- Tokens with a kid field use the OIDC/JWKS verification path (requires the oidc feature)
Signed Requests:
For JWS/VC signed request bodies, set Content-Type to application/jose:
Content-Type: application/jose
See Signed Requests for details.
Content-Length
The server requires Content-Length for all POST requests:
Content-Length: 1234
Most HTTP clients set this automatically.
Accept-Encoding
Request compressed responses:
Accept-Encoding: gzip, deflate
The server will compress responses when appropriate, reducing bandwidth usage.
Response Header:
Content-Encoding: gzip
User-Agent
Identify your client application:
User-Agent: MyApp/1.0.0 (https://example.com)
Helpful for server logs and troubleshooting.
X-Request-ID
Client-supplied request ID for tracing:
X-Request-ID: abc-123-def-456
The server will include this in logs and response headers for correlation. When a request queues background indexing work, the copied X-Request-ID also appears in the background indexer worker logs, so you can correlate the foreground request with the later indexing activity in a plain log search.
Response Headers
Content-Type
Indicates the format of the response body:
Content-Type: application/json; charset=utf-8
Content-Length
Size of the response body in bytes:
Content-Length: 5678
X-Fluree-T
The transaction time of the data returned (for queries):
X-Fluree-T: 42
Useful for tracking which version of data was queried.
X-Fluree-Commit
The commit ContentId of the data returned:
X-Fluree-Commit: abc123def456789...
ETag
Entity tag for caching:
ETag: "abc123def456"
Can be used with If-None-Match for conditional requests.
Cache-Control
Caching directives:
For current queries:
Cache-Control: no-cache
For historical queries:
Cache-Control: public, max-age=31536000, immutable
Historical query results are immutable and can be cached indefinitely.
X-RateLimit Headers
Rate limit information (if enabled):
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1642857600
X-Request-ID
Echo of client-supplied request ID or server-generated ID:
X-Request-ID: abc-123-def-456
X-Response-Time
Server processing time in milliseconds:
X-Response-Time: 45
Content Type Details
JSON-LD (application/json, application/ld+json)
Request Example:
{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"@graph": [
{
"@id": "ex:alice",
"@type": "schema:Person",
"schema:name": "Alice"
}
]
}
Compact vs Expanded:
application/json returns compact JSON:
[
{ "name": "Alice" }
]
application/ld+json returns full JSON-LD with @context:
{
"@context": {
"name": "http://schema.org/name"
},
"@graph": [
{ "name": "Alice" }
]
}
SPARQL Query (application/sparql-query)
Request Example:
PREFIX ex: <http://example.org/ns/>
PREFIX schema: <http://schema.org/>
SELECT ?name
FROM <mydb:main>
WHERE {
?person a schema:Person .
?person schema:name ?name .
}
Plain text SPARQL query in the request body.
SPARQL Results JSON (application/sparql-results+json)
Response Example:
{
"head": {
"vars": ["name"]
},
"results": {
"bindings": [
{
"name": {
"type": "literal",
"value": "Alice",
"datatype": "http://www.w3.org/2001/XMLSchema#string"
}
}
]
}
}
Follows W3C SPARQL 1.1 Query Results JSON Format specification.
Turtle (text/turtle)
Transaction Request:
@prefix ex: <http://example.org/ns/> .
@prefix schema: <http://schema.org/> .
ex:alice a schema:Person ;
schema:name "Alice" ;
schema:age 30 .
CONSTRUCT Response:
@prefix ex: <http://example.org/ns/> .
@prefix schema: <http://schema.org/> .
ex:alice a schema:Person .
ex:alice schema:name "Alice" .
Request Size Limits
Default Limits
The server enforces size limits to prevent resource exhaustion:
Transaction Requests:
- Default limit: 10 MB
- Configurable: --max-transaction-size
Query Requests:
- Default limit: 1 MB
- Configurable: --max-query-size
History Requests:
- Default limit: 1 MB
- Configurable: --max-history-size
Exceeding Limits
If a request exceeds size limits:
Status Code: 413 Payload Too Large
Response:
{
"error": "Request body exceeds maximum size of 10485760 bytes",
"status": 413,
"@type": "err:http/PayloadTooLarge"
}
Configuration
Set custom limits when starting the server:
# 20 MB transactions, 2 MB queries, 100 MB responses
./fluree-db-server \
  --max-transaction-size 20971520 \
  --max-query-size 2097152 \
  --max-response-size 104857600
Response Size Limits
The server also limits response sizes:
Default limit: 100 MB
If a query result exceeds the limit:
Status Code: 413 Payload Too Large
Response:
{
"error": "Query result exceeds maximum response size",
"status": 413,
"@type": "err:http/ResponseTooLarge"
}
Solution: Use LIMIT and pagination:
{
"select": ["?name"],
"where": [...],
"limit": 1000,
"offset": 0
}
Compression
Request Compression
Send compressed requests (for large transactions):
Content-Encoding: gzip
Content-Type: application/json
The request body should be gzip-compressed JSON.
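For example, a large JSON transaction can be compressed first and posted with Content-Encoding: gzip (shown against /insert; the file name is a placeholder, and any POST endpoint accepts the same headers):
gzip -c big-transaction.json > big-transaction.json.gz
curl -X POST http://localhost:8090/v1/fluree/insert/mydb:main \
  -H "Content-Type: application/json" \
  -H "Content-Encoding: gzip" \
  --data-binary @big-transaction.json.gz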
Response Compression
Request compressed responses:
Accept-Encoding: gzip, deflate
The server will compress responses when:
- Client accepts compression
- Response is larger than threshold (typically 1 KB)
- Content-Type is compressible
Response Headers:
Content-Encoding: gzip
Vary: Accept-Encoding
Compression Benefits:
- Reduced bandwidth usage (typically 70-90% for JSON)
- Faster response times on slower connections
- Lower costs for cloud deployments
Character Encoding
All text content uses UTF-8 encoding.
Request:
Content-Type: application/json; charset=utf-8
Response:
Content-Type: application/json; charset=utf-8
Unicode characters are supported in:
- IRIs
- Literal values
- Property names
- Comments
CORS Headers
For web browser access, the server supports Cross-Origin Resource Sharing (CORS).
CORS Request Headers
Preflight Request:
OPTIONS /query HTTP/1.1
Origin: https://example.com
Access-Control-Request-Method: POST
Access-Control-Request-Headers: Content-Type
CORS Response Headers
Preflight Response:
Access-Control-Allow-Origin: https://example.com
Access-Control-Allow-Methods: GET, POST, OPTIONS
Access-Control-Allow-Headers: Content-Type, Authorization
Access-Control-Max-Age: 86400
Actual Response:
Access-Control-Allow-Origin: https://example.com
Access-Control-Allow-Credentials: true
CORS Configuration
Configure CORS when starting the server:
./fluree-db-server \
--cors-origin "https://example.com" \
--cors-methods "GET,POST,OPTIONS" \
--cors-headers "Content-Type,Authorization"
Allow all origins (development only):
./fluree-db-server --cors-origin "*"
Never use --cors-origin "*" in production with credentials.
Caching Headers
ETag and Conditional Requests
The server supports ETags for efficient caching.
Initial Request:
GET /ledgers/mydb:main HTTP/1.1
Response:
HTTP/1.1 200 OK
ETag: "abc123def456"
Cache-Control: no-cache
Conditional Request:
GET /ledgers/mydb:main HTTP/1.1
If-None-Match: "abc123def456"
Not Modified Response:
HTTP/1.1 304 Not Modified
ETag: "abc123def456"
Immutable Historical Data
Historical queries with time specifiers are immutable:
Query:
POST /query HTTP/1.1
{"from": "mydb:main@t:100", ...}
Response:
HTTP/1.1 200 OK
Cache-Control: public, max-age=31536000, immutable
ETag: "mydb:main@t:100:query-hash"
Clients can cache these responses indefinitely.
Custom Headers
X-Fluree-Fuel-Limit
Set query fuel limit to prevent runaway queries:
X-Fluree-Fuel-Limit: 1000000
See Tracking and Fuel Limits for details.
X-Fluree-Timeout
Set query timeout in milliseconds:
X-Fluree-Timeout: 30000
X-Fluree-Policy
Specify a policy to apply (if authorized):
X-Fluree-Policy: ex:restrictive-policy
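These custom headers ride along on an ordinary query request, for example:
curl -X POST http://localhost:8090/v1/fluree/query/mydb:main \
  -H "Content-Type: application/sparql-query" \
  -H "X-Fluree-Fuel-Limit: 1000000" \
  -H "X-Fluree-Timeout: 30000" \
  --data 'SELECT ?s WHERE { ?s ?p ?o } LIMIT 10'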
Best Practices
1. Always Set Content-Type
Explicitly set Content-Type for all requests:
Content-Type: application/json
2. Accept Compression
Always request compression for better performance:
Accept-Encoding: gzip, deflate
3. Use Appropriate Accept Headers
Request the format you need:
Accept: application/json
4. Include User-Agent
Identify your application:
User-Agent: MyApp/1.0.0
5. Handle ETags
Implement ETag caching for frequently accessed resources:
const etag = localStorage.getItem('ledger-etag');
if (etag) {
headers['If-None-Match'] = etag;
}
6. Monitor Rate Limits
Check rate limit headers and back off when needed:
const remaining = response.headers.get('X-RateLimit-Remaining');
if (remaining < 10) {
// Slow down requests
}
7. Use Request IDs
Include request IDs for tracing:
X-Request-ID: uuid-v4-here
Related Documentation
- Overview - API overview
- Endpoints - Endpoint reference
- Signed Requests - Authentication
- Errors - Error handling
Signed Requests (JWS/VC)
Fluree supports cryptographically signed requests using JSON Web Signatures (JWS) and Verifiable Credentials (VC). This provides tamper-proof authentication and enables trustless data exchange.
Note: Requires the credential feature flag. See Compatibility and Feature Flags.
Why Sign Requests?
Signed requests provide:
- Authentication: Prove the identity of the request sender
- Integrity: Ensure the request hasn’t been tampered with
- Non-repudiation: Sender cannot deny sending the request
- Authorization: Cryptographically link requests to specific identities
- Auditability: Complete audit trail of who did what
JSON Web Signatures (JWS)
JWS is an IETF standard (RFC 7515) for representing digitally signed content as JSON.
JWS Structure
A JWS consists of three parts:
- Protected Header: Metadata about the signature (base64url-encoded)
- Payload: The actual content being signed (base64url-encoded)
- Signature: Cryptographic signature (base64url-encoded)
Compact Serialization:
eyJhbGciOiJFZERTQSJ9.eyJmcm9tIjoibXlkYjptYWluIn0.c2lnbmF0dXJl
|______header______|.|________payload_________|.|_signature_|
JSON Serialization:
{
"payload": "eyJmcm9tIjoibXlkYjptYWluIn0",
"signatures": [
{
"protected": "eyJhbGciOiJFZDI1NTE5In0",
"signature": "c2lnbmF0dXJl"
}
]
}
Supported Algorithm
Fluree uses EdDSA (Ed25519) for JWS verification. All signed requests must use "alg": "EdDSA" in the protected header.
Creating Signed Requests
Step 1: Prepare the Payload
Create your query or transaction as usual:
{
"@context": {
"ex": "http://example.org/ns/"
},
"from": "mydb:main",
"select": ["?name"],
"where": [
{ "@id": "?person", "ex:name": "?name" }
]
}
Step 2: Encode the Payload
Base64url-encode the JSON payload:
const payload = JSON.stringify(query);
const encodedPayload = base64url.encode(payload);
Step 3: Create the Protected Header
Create a header specifying the algorithm and key ID:
{
"alg": "EdDSA",
"kid": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK"
}
Base64url-encode the header:
const header = JSON.stringify({ alg: "EdDSA", kid: keyId });
const encodedHeader = base64url.encode(header);
Step 4: Sign
Create the signing input and sign it:
const signingInput = encodedHeader + "." + encodedPayload;
const signature = sign(signingInput, privateKey);
const encodedSignature = base64url.encode(signature);
Step 5: Construct the JWS
Create the complete JWS:
Compact Format:
const jws = encodedHeader + "." + encodedPayload + "." + encodedSignature;
JSON Format:
{
"payload": "<encodedPayload>",
"signatures": [
{
"protected": "<encodedHeader>",
"signature": "<encodedSignature>"
}
]
}
Step 6: Send the Request
Send the JWS to Fluree:
curl -X POST http://localhost:8090/v1/fluree/query \
-H "Content-Type: application/jose" \
-d '{
"payload": "eyJmcm9tIjoibXlkYjptYWluIn0...",
"signatures": [{
"protected": "eyJhbGciOiJFZDI1NTE5In0...",
"signature": "c2lnbmF0dXJl..."
}]
}'
Verifiable Credentials (VC)
Verifiable Credentials are a W3C standard for cryptographically verifiable digital credentials.
VC Structure
A Verifiable Credential includes:
{
"@context": [
"https://www.w3.org/2018/credentials/v1"
],
"type": ["VerifiableCredential"],
"issuer": "did:key:z6Mkh...",
"issuanceDate": "2024-01-22T10:00:00Z",
"credentialSubject": {
"id": "did:key:z6Mkh...",
"flureeAction": {
"query": {
"from": "mydb:main",
"select": ["?name"],
"where": [...]
}
}
},
"proof": {
"type": "Ed25519Signature2020",
"created": "2024-01-22T10:00:00Z",
"verificationMethod": "did:key:z6Mkh...#z6Mkh...",
"proofPurpose": "authentication",
"proofValue": "z58DAdFfa9SkqZMVP..."
}
}
Creating a Verifiable Credential
Use a VC library to create signed credentials:
import { issue } from '@digitalbazaar/vc';
const credential = {
'@context': ['https://www.w3.org/2018/credentials/v1'],
type: ['VerifiableCredential'],
issuer: didKey,
issuanceDate: new Date().toISOString(),
credentialSubject: {
id: didKey,
flureeAction: {
query: queryObject
}
}
};
const verifiableCredential = await issue({
credential,
suite: new Ed25519Signature2020({ key: keyPair }),
documentLoader
});
Sending a VC
Send the VC to Fluree:
curl -X POST http://localhost:8090/v1/fluree/query \
-H "Content-Type: application/vc+ld+json" \
-d '{
"@context": ["https://www.w3.org/2018/credentials/v1"],
"type": ["VerifiableCredential"],
...
}'
Decentralized Identifiers (DIDs)
Fluree uses DIDs to identify public keys.
Supported DID Methods
did:key - Public key embedded in the DID (recommended):
did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK
did:web - Web-based DID resolution:
did:web:example.com:users:alice
did:ion - ION network DIDs (future support):
did:ion:EiClkZMDxPKqC9c-umQfTkR8vvZ9JPhl_xLDI9Nfk38w5w
DID Resolution
Fluree resolves DIDs to public keys:
- did:key: Public key extracted directly from DID
- did:web: Fetched from https://example.com/.well-known/did.json
- did:ion: Resolved via ION network
Public Key Resolution
Standalone server signed requests verify Ed25519 JWS material from the request
itself (for example embedded JWK / did:key) or configured OIDC/JWKS issuers.
There is no /admin/keys registration endpoint.
Request Verification
Verification Process
When Fluree receives a signed request:
- Extract the signature and header
- Resolve the key ID (kid) to a public key
- Verify the signature using the public key
- Check expiration (if exp claim present)
- Validate issuer (if required)
- Apply authorization policies based on DID
Verification Failure
If verification fails:
Status Code: 401 Unauthorized
Response:
{
"error": "Invalid signature",
"status": 401,
"@type": "err:auth/InvalidSignature"
}
Key Management
Generating Keys
Ed25519 (EdDSA):
import { generateKeyPair } from '@stablelib/ed25519';
const keyPair = generateKeyPair();
// keyPair.publicKey - 32 bytes
// keyPair.secretKey - 64 bytes
Storing Keys
Secure Storage:
- Hardware Security Modules (HSM)
- Key Management Services (AWS KMS, Azure Key Vault)
- Encrypted files with strong passphrases
- Hardware wallets for blockchain-based DIDs
Never:
- Store private keys in code
- Commit keys to version control
- Send keys over insecure channels
- Share keys between applications
Key Rotation
Rotate keys regularly:
- Generate new key pair
- Register new public key with Fluree
- Update client to use new key
- Revoke old key after transition period
- Remove old key from Fluree
Authorization with Signed Requests
Identity-Based Policies
Fluree policies can use the signer’s DID for authorization:
{
"@context": {
"ex": "http://example.org/ns/",
"f": "https://ns.flur.ee/db#"
},
"@id": "ex:admin-policy",
"f:policy": [
{
"f:subject": "did:key:z6Mkh...",
"f:action": ["query", "transact"],
"f:allow": true
}
]
}
Role-Based Access
Link DIDs to roles:
{
"@id": "did:key:z6Mkh...",
"@type": "ex:User",
"ex:role": "ex:Administrator"
}
Policy checks the role:
{
"f:policy": [
{
"f:subject": { "ex:role": "ex:Administrator" },
"f:action": "*",
"f:allow": true
}
]
}
Code Examples
JavaScript/TypeScript
import * as jose from 'jose';

// Sign a query payload as a compact JWS using an Ed25519 private key
async function signQuery(query: object, privateKey: jose.KeyLike) {
  return new jose.SignJWT(query as jose.JWTPayload)
    .setProtectedHeader({ alg: 'EdDSA', kid: 'did:key:z6Mkh...' })
    .setIssuedAt()
    .setExpirationTime('5m')
    .sign(privateKey);
}
// Send signed request
const signedQuery = await signQuery(query, privateKey);
const response = await fetch('http://localhost:8090/v1/fluree/query', {
method: 'POST',
headers: { 'Content-Type': 'application/jose' },
body: signedQuery
});
Python
from jwcrypto import jwk, jws
import json
import requests
def sign_query(query, private_key):
    # Create JWK from the private key (a JWK JSON string)
    key = jwk.JWK.from_json(private_key)
    # Create a JWS over the serialized query
    payload = json.dumps(query).encode('utf-8')
    jws_token = jws.JWS(payload)
    jws_token.add_signature(key, alg='EdDSA',
        protected=json.dumps({"alg": "EdDSA", "kid": "did:key:z6Mkh..."}))
    return jws_token.serialize()
# Send signed request
signed_query = sign_query(query, private_key)
response = requests.post('http://localhost:8090/v1/fluree/query',
headers={'Content-Type': 'application/jose'},
data=signed_query)
Best Practices
1. Use EdDSA (Ed25519)
EdDSA provides:
- Excellent security (128-bit security level)
- Fast signing and verification
- Small signatures (64 bytes)
- Deterministic (no random number generation needed)
2. Include Expiration
Always set an expiration time:
{
"alg": "EdDSA",
"exp": 1642857600
}
3. Use Short Expiration Times
- For interactive requests: 5-15 minutes
- For batch processes: 1-24 hours
- Never: no expiration
4. Rotate Keys Regularly
Rotate signing keys every 90-180 days.
5. Secure Key Storage
Use proper key management:
- Development: Encrypted local storage
- Production: HSM or KMS
6. Validate on Server
Never trust client-side validation alone. Fluree always validates signatures server-side.
7. Use HTTPS
Always use HTTPS with signed requests to prevent replay attacks.
8. Implement Nonce/JTI
Include a unique identifier to prevent replay:
{
"alg": "EdDSA",
"jti": "unique-request-id-12345"
}
Troubleshooting
“Invalid Signature” Error
Causes:
- Wrong private key used
- Payload modified after signing
- Incorrect base64url encoding
- Algorithm mismatch
Solution: Verify the signing process end-to-end.
“Key Not Found” Error
Causes:
- DID not registered with Fluree
- Incorrect key ID (kid) in header
- DID resolution failed
Solution: Register public key or check DID format.
“Signature Expired” Error
Causes:
- Request sent after expiration time
- Clock skew between client and server
Solution: Use NTP to sync clocks, increase expiration time.
Related Documentation
- Overview - API overview
- Endpoints - API endpoints
- Headers - HTTP headers
- Security - Policy and access control
- Verifiable Data - Verifiable credentials concepts
Errors and Status Codes
This document provides a complete reference for HTTP status codes and error responses in the Fluree API.
Error Response Format
fluree-server errors return a consistent JSON structure:
{
"error": "Human-readable error description",
"status": 400,
"@type": "err:db/BadRequest",
"cause": {
"error": "Optional nested cause",
"status": 400,
"@type": "err:db/JsonParse"
}
}
Fields:
- error: Human-readable error message (primary diagnostic text)
- status: HTTP status code (numeric)
- @type: Compact error type IRI (stable, machine-readable category)
- cause: Optional nested cause chain (only present for select errors)
Stability note: clients (including the Fluree CLI) may pattern-match on substrings within the error field for targeted hints, so error messages should be stable across releases.
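For illustration, a client-side hint keyed off a stable message substring might look like this (a sketch; the substring and hint text are illustrative, not part of the API contract):
const err = await response.json();
// Match on a stable substring of the error message for a targeted hint
if (err.status === 404 && err.error.includes("Ledger not found")) {
  console.warn("Hint: check the ledger name or create the ledger before querying.");
}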
HTTP Status Codes
Success Codes (2xx)
200 OK
The request succeeded.
Used for:
- Successful queries
- Successful transactions
- Successful GET requests
Example:
{
"t": 5,
"timestamp": "2024-01-22T10:30:00.000Z",
"commit_id": "bafybeig...commitT5"
}
201 Created
A new resource was created.
Used for:
- Ledger creation
- Index creation
Example:
{
"ledger_id": "mydb:main",
"created": "2024-01-22T10:00:00.000Z"
}
204 No Content
Request succeeded with no response body.
Used for:
- DELETE operations
- Administrative commands
Client Error Codes (4xx)
400 Bad Request
The request is malformed or contains invalid data.
Common Causes:
- Invalid JSON syntax
- Invalid JSON-LD structure
- Invalid SPARQL syntax
- Invalid IRI format
- Type mismatch
Error typing:
The server includes a compact error type IRI in the @type field. This is the
preferred stable, machine-readable category for programmatic handling.
Example:
{
"error": "Invalid JSON: expected value at line 5, column 12",
"status": 400,
"@type": "err:db/JsonParse"
}
How to Fix:
- Validate JSON syntax
- Check IRI formats
- Verify JSON-LD structure
- Review the error message and optional cause
401 Unauthorized
Authentication is required but not provided or invalid.
Common Causes:
- Missing authentication credentials
- Invalid API key
- Expired JWT token
- Invalid signature (for signed requests)
Example:
{
"error": "Bearer token required",
"status": 401,
"@type": "err:db/Unauthorized"
}
How to Fix:
- Provide valid authentication credentials
- Check API key or token
- Renew expired tokens
- Verify signature process for signed requests
403 Forbidden
Authentication succeeded but authorization failed.
Common Causes:
- Insufficient permissions for operation
- Policy denies access
- Ledger access restricted
Example:
{
"error": "access denied (403)",
"status": 403,
"@type": "err:db/Forbidden"
}
How to Fix:
- Verify user has required permissions
- Check policy configuration
- Contact administrator for access
404 Not Found
The requested resource doesn’t exist.
Common Causes:
- Ledger doesn’t exist
- Entity not found
- Endpoint doesn’t exist
Example:
{
"error": "Ledger not found: mydb:main",
"status": 404,
"@type": "err:db/LedgerNotFound"
}
How to Fix:
- Verify ledger name spelling
- Check if ledger was created
- Verify entity IRI
408 Request Timeout
The request took too long to process.
Common Causes:
- Query timeout exceeded
- Complex query taking too long
- Database under heavy load
Example:
{
"error": "Query execution exceeded timeout",
"status": 408,
"@type": "err:db/Timeout"
}
How to Fix:
- Simplify query
- Add more specific filters
- Use LIMIT clause
- Increase timeout setting
- Check server load
409 Conflict
The request conflicts with current server state.
Common Causes:
- Concurrent modification conflict
- Ledger already exists
- Resource state conflict
Example:
{
"error": "Ledger already exists: mydb:main",
"status": 409,
"@type": "err:db/LedgerExists"
}
How to Fix:
- Use different ledger name
- Handle concurrent modifications with retry logic
- Check resource state before modifying
413 Payload Too Large
The request or response exceeds size limits.
Common Causes:
- Transaction too large
- Query result too large
- Request body exceeds limit
Example:
{
"error": "request body exceeds configured limit",
"status": 413,
"@type": "err:db/PayloadTooLarge"
}
How to Fix:
- Split large transactions into batches
- Use LIMIT clause for queries
- Use pagination for large result sets
- Increase size limits (if appropriate)
415 Unsupported Media Type
The Content-Type is not supported.
Common Causes:
- Wrong Content-Type header
- Unsupported format
- Missing Content-Type header
Example:
{
"error": "Content-Type not supported: text/plain",
"status": 415,
"@type": "err:db/UnsupportedMediaType"
}
How to Fix:
- Set correct Content-Type header
- Use supported format
- Check API documentation for supported types
422 Unprocessable Entity
The request is well-formed but semantically invalid.
Common Causes:
- Invalid data values
- Business rule violation
- Semantic constraint violation
Example:
{
"error": "semantic constraint violation",
"status": 422,
"@type": "err:db/ConstraintViolation"
}
How to Fix:
- Validate data before submitting
- Check business rules
- Review constraint requirements
429 Too Many Requests
Rate limit exceeded.
Common Causes:
- Too many requests in time window
- Exceeded quota
Example:
{
"error": "rate limit exceeded",
"status": 429,
"@type": "err:db/RateLimited"
}
Response Headers:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1642857645
Retry-After: 45
How to Fix:
- Wait before retrying (check Retry-After header)
- Implement exponential backoff
- Reduce request rate
- Request higher rate limit
Server Error Codes (5xx)
500 Internal Server Error
An unexpected error occurred on the server.
Common Causes:
- Unhandled exception
- Database error
- Internal logic error
Example:
{
"error": "internal error",
"status": 500,
"@type": "err:db/Internal"
}
How to Fix:
- Check server logs
- Report to system administrator
- Retry request
- Contact support if the error persists
502 Bad Gateway
Error communicating with upstream service.
Common Causes:
- Storage backend unavailable
- Nameservice unavailable
- Network error
Example:
{
"error": "upstream service error",
"status": 502,
"@type": "err:db/BadGateway"
}
How to Fix:
- Check storage backend status
- Verify network connectivity
- Check AWS/cloud service status
- Retry with backoff
503 Service Unavailable
The server is temporarily unavailable.
Common Causes:
- Server overloaded
- Maintenance mode
- Resource exhaustion
Example:
{
"error": "service unavailable",
"status": 503,
"@type": "err:db/ServiceUnavailable"
}
Response Headers:
Retry-After: 300
How to Fix:
- Wait and retry (check Retry-After header)
- Implement retry logic with exponential backoff
- Check service status page
504 Gateway Timeout
Upstream service didn’t respond in time.
Common Causes:
- Storage backend timeout
- Long-running query
- Network latency
Example:
{
"error": "gateway timeout",
"status": 504,
"@type": "err:db/GatewayTimeout"
}
How to Fix:
- Retry request
- Check storage backend performance
- Simplify query
- Increase timeout settings
Error Handling Best Practices
1. Always Check Status Codes
Check HTTP status before parsing response:
const response = await fetch(url, options);
if (!response.ok) {
const err = await response.json();
// err.error is the primary human-readable message, err["@type"] is the stable category.
throw new Error(`${err["@type"] || "err:unknown"}: ${err.error}`);
}
2. Implement Retry Logic
Retry transient errors with exponential backoff:
// Simple promise-based delay helper used by the retry examples below
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function retryRequest(fn, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (err) {
      if (!isRetryable(err) || i === maxRetries - 1) {
        throw err;
      }
      await sleep(Math.pow(2, i) * 1000); // exponential backoff: 1s, 2s, 4s, ...
    }
  }
}

function isRetryable(err) {
  // err.status is assumed to carry the HTTP status code of the failed request
  return [408, 429, 502, 503, 504].includes(err.status);
}
3. Handle Rate Limits
Respect rate limit headers:
if (response.status === 429) {
  const retryAfter = Number(response.headers.get('Retry-After') || 1);
  await sleep(retryAfter * 1000);
  return retryRequest(fn);
}
4. Log Error Details
Log complete error context for debugging:
console.error({
status: response.status,
error: errorData.error,
error_type: errorData["@type"],
cause: errorData.cause,
requestId: response.headers.get('X-Request-ID')
});
5. User-Friendly Messages
Show appropriate messages to users:
function getUserMessage(error) {
switch (error["@type"]) {
case 'err:db/LedgerNotFound':
return 'Database not found. Please check the name.';
case 'err:db/Timeout':
return 'Query took too long. Please try a simpler query.';
case 'err:db/RateLimited':
return 'Too many requests. Please wait a moment.';
default:
return 'An error occurred. Please try again.';
}
}
6. Graceful Degradation
Handle errors gracefully:
try {
const data = await query(ledger);
return data;
} catch (err) {
if (err["@type"] === 'err:db/LedgerNotFound') {
// Create ledger and retry
await createLedger(ledger);
return await query(ledger);
}
throw err;
}
7. Circuit Breaker Pattern
Prevent cascading failures:
class CircuitBreaker {
constructor(threshold = 5, timeout = 60000) {
this.failures = 0;
this.threshold = threshold;
this.timeout = timeout;
this.state = 'CLOSED';
}
async execute(fn) {
if (this.state === 'OPEN') {
throw new Error('Circuit breaker is OPEN');
}
try {
const result = await fn();
this.onSuccess();
return result;
} catch (err) {
this.onFailure();
throw err;
}
}
onSuccess() {
this.failures = 0;
this.state = 'CLOSED';
}
onFailure() {
this.failures++;
if (this.failures >= this.threshold) {
this.state = 'OPEN';
setTimeout(() => {
this.state = 'HALF_OPEN';
this.failures = 0;
}, this.timeout);
}
}
}
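Usage, wrapping the fetch call from earlier (a sketch):
const breaker = new CircuitBreaker(5, 60000);
// Repeated failures open the breaker; subsequent calls fail fast until the timeout elapses
const result = await breaker.execute(async () => {
  const response = await fetch(url, options);
  if (!response.ok) throw await response.json();
  return response.json();
});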
Related Documentation
- Overview - API overview
- Endpoints - API endpoints
- Signed Requests - Authentication
- Troubleshooting - General troubleshooting
- Common Errors - Common error solutions
Query
Fluree supports two powerful query languages for querying graph data: JSON-LD Query (Fluree’s native query language) and SPARQL (the W3C standard). Both languages provide access to Fluree’s unique features including time travel, graph sources, and policy enforcement.
Query Languages
JSON-LD Query
Fluree’s native query language that uses JSON-LD syntax. JSON-LD Query provides a natural, JSON-based interface for querying graph data, making it easy to integrate with modern applications.
Key Features:
- JSON-based syntax (no string parsing)
- Full support for time travel (@t:, @iso:, @commit:)
- Graph source integration
- Policy enforcement
- History queries
SPARQL
Industry-standard SPARQL 1.1 query language. Fluree provides full SPARQL support, enabling compatibility with existing RDF tools and knowledge graphs.
Key Features:
- W3C SPARQL 1.1 compliant
- FROM and FROM NAMED clauses
- CONSTRUCT queries
- Time travel support via FROM-clause time specifiers
- Standard SPARQL functions
Query Features
Output Formats
Fluree supports multiple output formats for query results:
- JSON-LD: Compact, context-aware JSON with IRI expansion/compaction
- SPARQL JSON: Standard SPARQL result format
- Typed JSON: Type-preserving JSON with datatype information
Datasets and Multi-Graph Execution
Query across multiple graphs and ledgers:
- FROM clauses: Specify default graphs
- FROM NAMED: Query named graphs
- Multi-ledger queries: Query across different ledgers
- Time-aware datasets: Query graphs at different time points
CONSTRUCT Queries
Generate RDF graphs from query results:
- Transform query results into RDF
- Create new graph structures
- Extract subgraphs
Graph Crawl
Traverse graph relationships:
- Follow links between entities
- Recursive graph traversal
- Depth-limited crawling
Explain Plans
Understand query execution:
- View query plans
- Analyze index usage
- Optimize query performance
Tracking and Fuel Limits
Monitor and control query execution:
- Query tracking and debugging
- Fuel limits for resource control
- Performance monitoring
Nameservice Queries
Query metadata about all ledgers and graph sources in the system. The nameservice stores information about every database including commit state, index state, and configuration.
JSON-LD Query:
{
"@context": {"f": "https://ns.flur.ee/db#"},
"select": ["?ledger", "?t"],
"where": [
{ "@id": "?ns", "@type": "f:LedgerSource", "f:ledger": "?ledger", "f:t": "?t" }
]
}
SPARQL:
PREFIX f: <https://ns.flur.ee/db#>
SELECT ?ledger ?t WHERE { ?ns a f:LedgerSource ; f:ledger ?ledger ; f:t ?t }
See the Ledgers and Nameservice concept documentation for details.
Time Travel in Queries
Fluree supports querying historical data using time specifiers in ledger references:
Transaction Number:
ledger:main@t:100
ISO 8601 Timestamp:
ledger:main@iso:2024-01-15T10:30:00Z
Commit ContentId:
ledger:main@commit:bafybeig...
See the Time Travel concept documentation for details.
Graph Source Queries
Query graph sources (BM25, Vector, Iceberg, R2RML) using the same syntax as regular ledgers:
{
"@context": {
"f": "https://ns.flur.ee/db#"
},
"from": "products:main",
"select": ["?product"],
"where": [
{
"f:graphSource": "products-search:main",
"f:searchText": "laptop",
"f:searchLimit": 20,
"f:searchResult": { "f:resultId": "?product" }
}
]
}
See the Graph Sources concept documentation for details.
Policy Enforcement
Policies are automatically enforced during query execution, ensuring users only see data they’re authorized to access. No special syntax is required—policies are applied transparently.
See the Policy Enforcement concept documentation for details.
Getting Started
Basic JSON-LD Query
{
"@context": {
"ex": "http://example.org/ns/"
},
"select": ["?name"],
"where": [
{ "@id": "?person", "ex:name": "?name" }
]
}
Basic SPARQL Query
PREFIX ex: <http://example.org/ns/>
SELECT ?name
WHERE {
?person ex:name ?name .
}
Query with Time Travel
{
"@context": {
"ex": "http://example.org/ns/"
},
"from": "ledger:main@t:100",
"select": ["?name"],
"where": [
{ "@id": "?person", "ex:name": "?name" }
]
}
Query Performance
Fluree’s query engine is optimized for:
- Automatic Join Ordering: The planner reorders all WHERE-clause patterns (triples, UNION, OPTIONAL, MINUS, search patterns, and more) using statistics-driven cardinality estimates. When database statistics are available, it uses HLL-derived property counts; otherwise it falls back to heuristic constants. Estimates are context-aware — the planner tracks which variables are already bound and adjusts costs accordingly, so a triple whose subject is bound from an earlier pattern is scored as a cheap per-subject lookup rather than a full scan.
- Index Selection: Automatically chooses optimal indexes (SPOT, POST, OPST, PSOT) based on which triple components are bound.
- Filter Optimization: Filters are automatically applied as soon as their required variables are bound, regardless of where they appear in the query. Range-safe filters are pushed down to index scans, and filters are evaluated inline during joins when possible.
- Streaming Execution: Results stream as they’re computed
- Parallel Processing: Parallel execution where possible
Best Practices
- Use Appropriate Indexes: Structure queries to leverage indexes
- Limit Result Sets: Use LIMIT clauses for large result sets
- Time Travel Efficiency: Use @t: when transaction numbers are known
- Graph Source Selection: Choose appropriate graph sources for query patterns
- Policy Awareness: Understand how policies affect query results
Related Documentation
- Concepts: Core concepts including time travel, graph sources, and policy
- Transactions: Writing data to Fluree
- Security and Policy: Policy configuration and management
JSON-LD Query
JSON-LD Query is Fluree’s native query language, providing a JSON-based interface for querying graph data. It combines the expressiveness of SPARQL with the convenience of JSON, making it easy to integrate with modern applications.
Overview
JSON-LD Query uses JSON-LD syntax to express queries, leveraging @context for IRI expansion and compaction. Queries are structured as JSON objects with familiar clauses like select, where, from, etc.
Basic Query Structure
{
"@context": {
"ex": "http://example.org/ns/"
},
"select": ["?name", "?age"],
"where": [
{ "@id": "?person", "ex:name": "?name", "ex:age": "?age" }
]
}
Query Clauses
@context
The @context defines namespace mappings for IRI expansion/compaction:
{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/",
"foaf": "http://xmlns.com/foaf/0.1/"
}
}
When querying via the CLI, omitting @context causes the ledger’s default context to be injected automatically. The HTTP API defaults this behavior off; pass ?default-context=true to opt in for a request. To opt out explicitly, pass an empty object: "@context": {}. See opting out of the default context.
Note: When using fluree-db-api directly (embedded), @context is not injected automatically. Queries must supply their own context or use full IRIs. Use db_with_default_context() or GraphDb::with_default_context() to opt in.
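For example, opting out of any default context and using full IRIs directly (a sketch):
{
  "@context": {},
  "select": ["?name"],
  "where": [
    { "@id": "?person", "http://example.org/ns/name": "?name" }
  ]
}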
select
Specifies what to return in results. The shape of select determines the shape of each output row.
Bare variable — one column, each row is the bound value (not wrapped in an array):
{
"select": "?name"
}
Variable list — each row is [v1, v2, ...]:
{
"select": ["?name", "?age"]
}
Wildcard — every variable bound in the WHERE clause:
{
"select": "*"
}
Subject expansion — return a nested JSON-LD object instead of a flat row. The key is either a variable (the WHERE clause binds it to subjects) or an IRI constant (the named subject is expanded directly, no WHERE needed):
{
"select": { "?person": ["*", { "schema:knows": ["@id", "schema:name"] }] },
"where": { "@id": "?person", "@type": "schema:Person" }
}
{
"select": { "ex:alice": ["*"] }
}
The array value is the selection spec — "*" for all forward properties, individual property names ("schema:name"), or nested object forms for sub-selections. Add "depth": N at the query top level to bound auto-expansion of unselected references.
Mixed array — combine flat variables and subject expansions in one row, in any order. Each object is an independent expansion with its own root and selection spec:
{
"select": [
"?age",
{ "?person": ["@id", "schema:name"] },
{ "?org": ["@id", "schema:name"] }
],
"where": {
"@id": "?person",
"ex:age": "?age",
"ex:worksFor": "?org"
}
}
Each row is [age, expanded_person, expanded_org]. When every column is an IRI-constant expansion (no variable dependency anywhere in select), the output is independent of the WHERE solution count: the formatter emits one row regardless of how many solutions the WHERE produced.
S-expression columns — a select item that is a string starting with ( is an S-expression, in two flavors:
Aggregates. Auto-aliased (?count, ?sum, etc.) or with an explicit alias via (as ...):
{
"select": ["?category", "(count ?product)"],
"groupBy": ["?category"]
}
{
"select": ["?category", "(as (count ?product) ?total)"],
"groupBy": ["?category"]
}
Scalar expressions (COALESCE, IF, arithmetic, string/hash/date functions, …). Always require an explicit alias via (as <expr> ?alias). Mirrors SPARQL SELECT (expr AS ?alias):
{
"select": ["?p", "(as (coalesce ?titleFr ?titleEn \"untitled\") ?title)"]
}
{
"select": [
"?name",
"(as (coalesce ?email \"no-email\") ?contact)",
"(as (count ?favNums) ?count)"
],
"groupBy": ["?name", "?contact"]
}
Scalar select expressions desugar to a bind in the WHERE pattern list. If the expression references an aggregate’s output variable (e.g. (as (+ ?count 1) ?adjusted)) the bind runs after aggregation; otherwise it runs before, so the alias is also a valid groupBy key.
The same expression language is shared with bind and filter. The one exception is in / not-in, which require the bracketed-list form and are not accepted in select expressions — rewrite as (or (= ?x 1) (= ?x 2) …) instead.
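For example, a membership test that would use in elsewhere can be written in a select expression with or (a sketch using the expression syntax above; ex:status is an assumed property):
{
  "@context": { "ex": "http://example.org/ns/" },
  "select": ["?name", "(as (or (= ?status \"active\") (= ?status \"pending\")) ?isOpen)"],
  "where": { "@id": "?ticket", "ex:name": "?name", "ex:status": "?status" }
}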
ask
Tests whether a set of patterns has any solution, returning true or false. No variables are projected. Equivalent to SPARQL ASK. The value of ask is the where clause itself — an array or object of the same patterns accepted by where:
{
"@context": { "ex": "http://example.org/ns/" },
"ask": [
{ "@id": "?person", "ex:name": "Alice" }
]
}
Single-pattern shorthand (object instead of array):
{
"@context": { "ex": "http://example.org/ns/" },
"ask": { "@id": "?person", "ex:name": "Alice" }
}
Returns true if at least one solution exists, false otherwise. Internally, LIMIT 1 is applied for efficiency.
from
Specifies which ledger(s) to query:
Single Ledger:
{
"from": "mydb:main"
}
Multiple Ledgers:
{
"from": ["mydb:main", "otherdb:main"]
}
Time Travel:
{
"from": "mydb:main@t:100"
}
{
"from": "mydb:main@iso:2024-01-15T10:30:00Z"
}
{
"from": "mydb:main@commit:bafybeig..."
}
where
The where clause contains query patterns:
Basic Pattern:
{
"where": [
{ "@id": "?person", "ex:name": "?name" }
]
}
Multiple Patterns:
{
"where": [
{ "@id": "?person", "ex:name": "?name" },
{ "@id": "?person", "ex:age": "?age" }
]
}
Type Pattern:
{
"where": [
{ "@id": "?person", "@type": "ex:User", "ex:name": "?name" }
]
}
Pattern Types
Object Patterns
Match triples where subject, predicate, and object are specified:
{
"@id": "ex:alice",
"ex:name": "Alice"
}
Variable Patterns
Use variables (starting with ?) to match unknown values:
{
"@id": "?person",
"ex:name": "?name"
}
Type Patterns
Match entities by type:
{
"@id": "?person",
"@type": "ex:User",
"ex:name": "?name"
}
Property Join Patterns
Match multiple properties of the same subject:
{
"@id": "?person",
"ex:name": "?name",
"ex:age": "?age",
"ex:email": "?email"
}
Advanced Patterns
Optional Patterns
Match optional data that may not exist:
{
"where": [
{ "@id": "?person", "ex:name": "?name" },
["optional", { "@id": "?person", "ex:email": "?email" }]
]
}
Sibling vs. grouped OPTIONAL — semantics
The two forms below are not equivalent. Each ["optional", ...] array is a
single OPTIONAL block in SPARQL terms — every item inside is part of the same
conjunctive group, and a row is null-extended only when the group as a whole
fails to match. To express two independent left joins, write two sibling
arrays.
Sibling OPTIONALs — two independent left joins:
{
"where": [
{ "@id": "?person", "ex:name": "?name" },
["optional", { "@id": "?person", "ex:email": "?email" }],
["optional", { "@id": "?person", "ex:phone": "?phone" }]
]
}
Equivalent SPARQL:
?person ex:name ?name .
OPTIONAL { ?person ex:email ?email }
OPTIONAL { ?person ex:phone ?phone }
?email and ?phone are independent — a person with only an email keeps
?email bound and gets null for ?phone, and vice versa.
Grouped OPTIONAL — one conjunctive left join:
{
"where": [
{ "@id": "?person", "ex:name": "?name" },
["optional",
{ "@id": "?person", "ex:email": "?email" },
{ "@id": "?person", "ex:phone": "?phone" }
]
]
}
Equivalent SPARQL:
?person ex:name ?name .
OPTIONAL { ?person ex:email ?email . ?person ex:phone ?phone }
?email and ?phone are bound together — a person who has an email but no
phone is null-extended on both variables, because the inner conjunctive
group did not match as a whole.
Filters and binds inside OPTIONAL
filter and bind constrain or compute from existing bindings, so they need
something to anchor to inside the OPTIONAL block. Any binding-producing
pattern qualifies as an anchor — a node-map, values, an earlier bind, a
nested optional, or a sub-query. A filter or bind as the very first
item in an OPTIONAL array is rejected.
["optional",
{ "@id": "?person", "ex:age": "?age" },
["filter", "(> ?age 18)"]
]
["optional",
["values", ["?x", [1, 2, 3]]],
["filter", "(> ?x 0)"]
]
Union Patterns
Match data from multiple alternative patterns:
{
"where": [
["union",
{ "@id": "?person", "ex:name": "?name" },
{ "@id": "?person", "ex:alias": "?name" }
]
]
}
Graph Patterns
Scope patterns to a named graph:
{
"@context": { "ex": "http://example.org/ns/" },
"from": "mydb:main",
"fromNamed": {
"products": {
"@id": "mydb:main",
"@graph": "http://example.org/graphs/products"
}
},
"select": ["?product", "?name"],
"where": [
["graph", "products", { "@id": "?product", "ex:name": "?name" }]
]
}
Notes:
- fromNamed is an object whose keys are dataset-local aliases. Each value is an object with @id (ledger reference) and optional @graph (graph selector IRI).
- The second element of ["graph", ...] can be a dataset-local alias (recommended) or a graph IRI.
- The legacy "from-named": [...] array format is still accepted for backward compatibility.
- For dataset and named-graph configuration details, see docs/query/datasets.md.
Filter Patterns
Apply conditions to filter results:
Single Filter:
{
"where": [
{ "@id": "?person", "ex:age": "?age" },
["filter", "(> ?age 18)"]
]
}
Multiple Filters:
{
"where": [
{ "@id": "?person", "ex:age": "?age", "ex:name": "?name" },
["filter", "(> ?age 18)", "(strStarts ?name \"A\")"]
]
}
Complex Filters:
{
"where": [
{ "@id": "?person", "ex:age": "?age", "ex:last": "?last" },
["filter", "(and (> ?age 45) (strEnds ?last \"ith\"))"]
]
}
Bind Patterns
Compute values and bind to variables:
{
"where": [
{ "@id": "?person", "ex:age": "?age" },
["bind", "?nextAge", "(+ ?age 1)"]
]
}
Values Patterns
Provide initial bindings:
{
"where": [
["values", "?name", ["Alice", "Bob", "Carol"]],
{ "@id": "?person", "ex:name": "?name" }
]
}
Property Paths
Property paths enable transitive traversal of predicates, following chains of relationships across multiple hops. Define a path alias in @context using @path, then use the alias as a key in WHERE node-maps.
Defining a Path Alias:
Add a term definition with @path to your @context. The value of @path can be a string (SPARQL property path syntax) or an array (S-expression form).
String Form (SPARQL syntax):
{
"@context": {
"ex": "http://example.org/",
"knowsPlus": { "@path": "ex:knows+" }
},
"select": ["?who"],
"where": [
{ "@id": "ex:alice", "knowsPlus": "?who" }
]
}
This returns all entities reachable from ex:alice by following one or more ex:knows edges transitively.
Array Form (S-expression):
{
"@context": {
"ex": "http://example.org/",
"knowsPlus": { "@path": ["+", "ex:knows"] }
},
"select": ["?who"],
"where": [
{ "@id": "ex:alice", "knowsPlus": "?who" }
]
}
The array form uses the operator as the first element followed by its operands.
Supported Operators:
| Operator | String syntax | Array syntax | Description |
|---|---|---|---|
| One or more | ex:p+ | ["+", "ex:p"] | Transitive closure (1+ hops) |
| Zero or more | ex:p* | ["*", "ex:p"] | Reflexive transitive closure (0+ hops) |
| Inverse | ^ex:p | ["^", "ex:p"] | Traverse predicate in reverse direction |
| Alternative | ex:a|ex:b | ["|", "ex:a", "ex:b"] | Match any of several predicates |
| Sequence | ex:a/ex:b | ["/", "ex:a", "ex:b"] | Follow a chain of predicates (property chain) |
Zero-or-more (*) includes the starting node itself in the results (zero hops).
Sequence (/) compiles into a chain of triple patterns joined by internal
intermediate variables. Each step must be a simple predicate or an inverse simple
predicate (^ex:p). For example, "ex:friend/ex:name" matches paths where
subject has a ex:friend whose ex:name is the result.
Parsed but Not Yet Supported:
The following operators are recognized by the parser but currently rejected (not yet supported for execution):
| Operator | String syntax | Array syntax |
|---|---|---|
| Zero or one | ex:p? | ["?", "ex:p"] |
Subject and Object Variables:
Path aliases work with variables on either side:
{
"@context": {
"ex": "http://example.org/",
"knowsPlus": { "@path": "ex:knows+" }
},
"select": ["?x", "?y"],
"where": [
{ "@id": "?x", "knowsPlus": "?y" }
]
}
This returns all pairs (?x, ?y) where ?y is transitively reachable from ?x via ex:knows.
Fixed Subject or Object:
You can also fix one end to an IRI:
{
"@context": {
"ex": "http://example.org/",
"knowsPlus": { "@path": "ex:knows+" }
},
"select": ["?who"],
"where": [
{ "@id": "?who", "knowsPlus": { "@id": "ex:bob" } }
]
}
This finds all entities that can reach ex:bob through one or more ex:knows hops.
Inverse Example:
Find entities that know ex:bob (traverse ex:knows in reverse):
{
"@context": {
"ex": "http://example.org/",
"knownBy": { "@path": "^ex:knows" }
},
"select": ["?who"],
"where": [
{ "@id": "ex:bob", "knownBy": "?who" }
]
}
Alternative Example:
Match entities connected by either ex:knows or ex:likes:
{
"@context": {
"ex": "http://example.org/",
"connected": { "@path": "ex:knows|ex:likes" }
},
"select": ["?who"],
"where": [
{ "@id": "ex:alice", "connected": "?who" }
]
}
Inverse can also be applied to complex paths (sequences and alternatives):
- ^(ex:friend/ex:name) — inverse of a sequence: reverses the step order and inverts each step, producing (^ex:name)/(^ex:friend)
- ^(ex:name|ex:nick) — inverse of an alternative: distributes the inverse into each branch, producing (^ex:name)|(^ex:nick)
- Double inverse cancels: ^(^ex:p) simplifies to ex:p
Array form examples:
{ "@path": ["^", ["/", "ex:friend", "ex:name"]] }
{ "@path": ["^", ["|", "ex:name", "ex:nick"]] }
Inverse is supported inside alternative branches (e.g. ex:knows|^ex:knows matches both directions of the ex:knows predicate).
Alternative branches can also be sequence chains. For example, ex:friend/ex:name|ex:colleague/ex:name returns the name of a friend OR the name of a colleague:
{
"@context": {
"ex": "http://example.org/",
"contactName": { "@path": "ex:friend/ex:name|ex:colleague/ex:name" }
},
"select": ["?name"],
"where": [
{ "@id": "ex:alice", "contactName": "?name" }
]
}
Branches can freely mix simple predicates, inverse predicates, and sequence chains (e.g. ex:name|ex:friend/ex:name|^ex:colleague).
Alternative uses UNION semantics (bag, not set): when multiple branches match the same (subject, object) pair, duplicate solutions are produced. Use selectDistinct if set semantics are needed.
Sequence (Property Chain) Example:
Follow a chain of predicates. The string form uses / to separate steps:
{
"@context": {
"ex": "http://example.org/",
"friendName": { "@path": "ex:friend/ex:name" }
},
"select": ["?person", "?name"],
"where": [
{ "@id": "?person", "friendName": "?name" }
]
}
The array form uses "/" as the operator:
{ "@path": ["/", "ex:friend", "ex:name"] }
Sequence steps can include inverse predicates. For example, "^ex:parent/ex:name" traverses the ex:parent link backwards, then follows ex:name:
{ "@path": "^ex:parent/ex:name" }
Longer chains are supported: "ex:friend/ex:address/ex:city" follows three hops.
Sequence steps can also be alternatives. For example, "ex:friend/(ex:name|ex:nick)" distributes the alternative into a union of chains (ex:friend/ex:name and ex:friend/ex:nick):
{ "@path": "ex:friend/(ex:name|ex:nick)" }
Array form:
{ "@path": ["/", "ex:friend", ["|", "ex:name", "ex:nick"]] }
Multiple alternative steps are supported: "(ex:a|ex:b)/(ex:c|ex:d)" expands to 4 chains. A safety limit of 64 expanded chains is enforced to prevent combinatorial explosion.
Each step must be a simple predicate (ex:p), inverse simple predicate (^ex:p), or an alternative of simple predicates ((ex:a|ex:b)). Transitive (+/*) and nested sequence modifiers are not allowed inside sequence steps.
Rules:
- @path and @reverse are mutually exclusive on the same term definition (produces an error).
- @path and @id may coexist on the same term definition; when the alias key appears in a WHERE node-map, the @path definition is used.
- Cycle detection is built in: transitive traversal terminates when it encounters a node already visited.
- Variable names starting with ?__ are reserved for internal use (e.g., intermediate join variables generated by sequence paths). These variables will not appear in wildcard (select: "*") output.
Filter Functions
Comparison Functions
Comparison operators accept two or more arguments. With multiple arguments, they chain pairwise: (< ?a ?b ?c) means ?a < ?b AND ?b < ?c.
- (= ?x ?y ...) - Equality
- (!= ?x ?y ...) - Inequality
- (> ?x ?y ...) - Greater than
- (>= ?x ?y ...) - Greater than or equal
- (< ?x ?y ...) - Less than
- (<= ?x ?y ...) - Less than or equal
When comparing incomparable types (e.g., a number and a string):
- = yields false — values of different types are not equal
- != yields true — values of different types are not equal
- <, <=, >, >= raise an error — ordering between incompatible types is undefined
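For example, a chained comparison expresses a range check in a single filter (a sketch):
{
  "@context": { "ex": "http://example.org/ns/" },
  "select": ["?name"],
  "where": [
    { "@id": "?person", "ex:name": "?name", "ex:age": "?age" },
    ["filter", "(< 18 ?age 65)"]
  ]
}
Here (< 18 ?age 65) is equivalent to (and (< 18 ?age) (< ?age 65)).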
Logical Functions
- (and ...) - Logical AND
- (or ...) - Logical OR
- (not ...) - Logical NOT
String Functions
- (strStarts ?str ?prefix) - String starts with
- (strEnds ?str ?suffix) - String ends with
- (contains ?str ?substr) - String contains
- (regex ?str ?pattern) - Regular expression match
Numeric Functions
Arithmetic operators accept two or more arguments. With multiple arguments, they fold left: (+ ?x ?y ?z) evaluates as (?x + ?y) + ?z. A single argument returns the value unchanged.
- (+ ?x ?y ...) - Addition
- (- ?x ?y ...) - Subtraction
- (* ?x ?y ...) - Multiplication
- (/ ?x ?y ...) - Division
- (- ?x) - Unary negation (single argument)
- (abs ?x) - Absolute value
Vector Similarity Functions
Used with bind to compute similarity scores between @vector values:
- (dotProduct ?vec1 ?vec2) - Dot product (inner product)
- (cosineSimilarity ?vec1 ?vec2) - Cosine similarity (-1 to 1)
- (euclideanDistance ?vec1 ?vec2) - Euclidean (L2) distance
Function names are case-insensitive. See Vector Search for usage examples.
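For example, a bind can score entities against a query vector supplied via values (a sketch; ex:embedding is an assumed property holding @vector values):
{
  "@context": { "ex": "http://example.org/", "f": "https://ns.flur.ee/db#" },
  "select": ["?doc", "?score"],
  "values": [
    ["?queryVec"],
    [{ "@value": [0.1, 0.2, 0.3], "@type": "https://ns.flur.ee/db#embeddingVector" }]
  ],
  "where": [
    { "@id": "?doc", "ex:embedding": "?vec" },
    ["bind", "?score", "(cosineSimilarity ?vec ?queryVec)"]
  ],
  "orderBy": [["desc", "?score"]]
}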
Type Functions
- (bound ?var) - Variable is bound
- (isIRI ?x) - Is an IRI
- (isBlank ?x) - Is a blank node
- (isLiteral ?x) - Is a literal
Query Modifiers
orderBy
Sort results:
{
"orderBy": ["?name"]
}
Descending Order:
{
"orderBy": [["desc", "?age"]]
}
Multiple Sort Keys:
{
"orderBy": ["?last", ["desc", "?age"]]
}
limit
Limit number of results:
{
"limit": 10
}
offset
Skip results:
{
"offset": 20,
"limit": 10
}
groupBy
Group results:
{
"select": ["?category", "(count ?product)"],
"groupBy": ["?category"],
"where": [
{ "@id": "?product", "ex:category": "?category" }
]
}
having
Filter grouped results:
{
"select": ["?category", "(count ?product)"],
"groupBy": ["?category"],
"having": [["filter", "(> (count ?product) 10)"]],
"where": [
{ "@id": "?product", "ex:category": "?category" }
]
}
Aggregation Functions
- (count ?var) / (count *) — count non-null values; * counts solution rows
- (count-distinct ?var) — count distinct non-null values
- (sum ?var) — sum numeric values
- (avg ?var) — average numeric values
- (min ?var) / (max ?var) — extremum
- (median ?var) — median
- (variance ?var) / (stddev ?var) — population variance / standard deviation
- (sample ?var) — implementation-defined sample value
- (groupconcat ?var) / (groupconcat ?var ", ") — concatenate string values, optional separator (defaults to a single space)
Each aggregate auto-aliases to ?<fn-name> (?count, ?sum, …). Use (as (<fn> ?var) ?alias) to choose an explicit alias.
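For example, combining explicit aliases with count-distinct and groupconcat (a sketch; ex:category and ex:name are assumed properties):
{
  "@context": { "ex": "http://example.org/ns/" },
  "select": [
    "?category",
    "(as (count-distinct ?product) ?distinctProducts)",
    "(as (groupconcat ?name \", \") ?productNames)"
  ],
  "groupBy": ["?category"],
  "where": [
    { "@id": "?product", "ex:category": "?category", "ex:name": "?name" }
  ]
}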
Time Travel Queries
Query historical data using time specifiers in from:
Transaction Number:
{
"@context": { "ex": "http://example.org/ns/" },
"from": "ledger:main@t:100",
"select": ["?name"],
"where": [
{ "@id": "?person", "ex:name": "?name" }
]
}
ISO Timestamp:
{
"@context": { "ex": "http://example.org/ns/" },
"from": "ledger:main@iso:2024-01-15T10:30:00Z",
"select": ["?name"],
"where": [
{ "@id": "?person", "ex:name": "?name" }
]
}
Commit ContentId:
{
"@context": { "ex": "http://example.org/ns/" },
"from": "ledger:main@commit:bafybeig...",
"select": ["?name"],
"where": [
{ "@id": "?person", "ex:name": "?name" }
]
}
Multiple Ledgers at Different Times:
{
"@context": { "ex": "http://example.org/ns/" },
"from": ["ledger1:main@t:100", "ledger2:main@t:200"],
"select": ["?data"],
"where": [
{ "@id": "?entity", "ex:data": "?data" }
]
}
History Queries
History queries let you see all changes (assertions and retractions) within a time range. Specify the range using from and to keys with time-specced endpoints:
Time Range Syntax
{
"from": "ledger:main@t:1",
"to": "ledger:main@t:latest"
}
Binding Transaction Metadata
Use @t and @op annotations on value objects to capture metadata:
- @t - Binds the transaction time (integer) when the fact was asserted/retracted.
- @op - Binds the operation type as a boolean: true for assertions, false for retractions. (Mirrors Flake.op on disk; constants "assert"/"retract" are not accepted — use true/false.)
Both annotations work uniformly for literal-valued and IRI-valued objects.
Entity History:
{
"@context": { "ex": "http://example.org/ns/" },
"from": "ledger:main@t:1",
"to": "ledger:main@t:latest",
"select": ["?name", "?age", "?t", "?op"],
"where": [
{ "@id": "ex:alice", "ex:name": { "@value": "?name", "@t": "?t", "@op": "?op" } },
{ "@id": "ex:alice", "ex:age": "?age" }
],
"orderBy": "?t"
}
Property-Specific History:
{
"@context": { "ex": "http://example.org/ns/" },
"from": "ledger:main@t:1",
"to": "ledger:main@t:100",
"select": ["?age", "?t", "?op"],
"where": [
{ "@id": "ex:alice", "ex:age": { "@value": "?age", "@t": "?t", "@op": "?op" } }
],
"orderBy": "?t"
}
Time Range with ISO:
{
"@context": { "ex": "http://example.org/ns/" },
"from": "ledger:main@iso:2024-01-01T00:00:00Z",
"to": "ledger:main@iso:2024-12-31T23:59:59Z",
"select": ["?name", "?t", "?op"],
"where": [
{ "@id": "ex:alice", "ex:name": { "@value": "?name", "@t": "?t", "@op": "?op" } }
]
}
Filter by Operation:
You can either use a constant @op shorthand (preferred) or filter on the bound variable:
{
"@context": { "ex": "http://example.org/ns/" },
"from": "ledger:main@t:1",
"to": "ledger:main@t:latest",
"select": ["?name", "?t"],
"where": [
{ "@id": "ex:alice", "ex:name": { "@value": "?name", "@t": "?t", "@op": false } }
]
}
The shorthand "@op": false lowers to FILTER(op(?name) = false). Equivalent long form using a bound variable: "@op": "?op" plus ["filter", "(= ?op false)"].
All Properties History:
{
"@context": { "ex": "http://example.org/ns/" },
"from": "ledger:main@t:1",
"to": "ledger:main@t:latest",
"select": ["?property", "?value", "?t", "?op"],
"where": [
{ "@id": "ex:alice", "?property": { "@value": "?value", "@t": "?t", "@op": "?op" } }
],
"orderBy": "?t"
}
Graph Source Queries
Query graph sources using the same syntax:
BM25 Search:
{
"@context": {
"f": "https://ns.flur.ee/db#"
},
"from": "products:main@t:1000",
"select": ["?product", "?score"],
"where": [
{
"f:graphSource": "products-search:main",
"f:searchText": "laptop",
"f:searchLimit": 10,
"f:searchResult": { "f:resultId": "?product", "f:resultScore": "?score" }
}
],
"orderBy": [["desc", "?score"]],
"limit": 10
}
Vector Similarity:
{
"@context": {
"ex": "http://example.org/",
"f": "https://ns.flur.ee/db#"
},
"from": "documents:main",
"select": ["?document", "?similarity"],
"values": [
["?queryVec"],
[{"@value": [0.1, 0.2, 0.3], "@type": "https://ns.flur.ee/db#embeddingVector"}]
],
"where": [
{
"f:graphSource": "documents-vector:main",
"f:queryVector": "?queryVec",
"f:searchLimit": 5,
"f:searchResult": { "f:resultId": "?document", "f:resultScore": "?similarity" }
}
],
"orderBy": [["desc", "?similarity"]],
"limit": 5
}
Note: f:* keys used for graph source queries should be defined in your @context for clarity.
Complete Examples
Simple Select Query
{
"@context": {
"ex": "http://example.org/ns/"
},
"select": ["?name", "?age"],
"where": [
{
"@id": "?person",
"@type": "ex:User",
"ex:name": "?name",
"ex:age": "?age"
},
["filter", "(> ?age 18)"]
],
"orderBy": ["?name"],
"limit": 10
}
Complex Query with Joins
{
"@context": {
"ex": "http://example.org/ns/"
},
"select": ["?person", "?friend", "?friendName"],
"where": [
{ "@id": "?person", "ex:name": "?name" },
{ "@id": "?person", "ex:friend": "?friend" },
{ "@id": "?friend", "ex:name": "?friendName" },
["filter", "(= ?name \"Alice\")"]
]
}
Aggregation Query
{
"@context": {
"ex": "http://example.org/ns/"
},
"select": [
"?category",
"(as (count ?product) ?count)",
"(as (avg ?price) ?avgPrice)"
],
"groupBy": ["?category"],
"having": [["filter", "(> (count ?product) 5)"]],
"where": [
{ "@id": "?product", "ex:category": "?category", "ex:price": "?price" }
],
"orderBy": [["desc", "?count"]]
}
Parse Options
JSON-LD queries accept parse-time options under a top-level opts object. These control how the query is parsed (not what it returns).
strictCompactIri
By default, JSON-LD queries reject unresolved compact-looking IRIs (prefix:suffix where the prefix is not in @context) at parse time. To opt out:
{
"@context": {"ex": "http://example.org/ns/"},
"opts": {"strictCompactIri": false},
"select": ["?id", "?name"],
"where": {"@id": "?id", "ex:name": "?name"}
}
The default is true. Disable only when you are intentionally working with bare prefix:suffix strings as opaque identifiers. See IRIs and @context — Strict Compact-IRI Guard for the full policy.
Best Practices
- Always Provide @context: Makes queries readable and maintainable
- Use Specific Patterns: More specific patterns are more efficient
- Limit Result Sets: Use limit for large result sets
- Flexible Filter Placement: Filters can be placed anywhere in where clauses - the query engine automatically applies each filter as soon as all its required variables are bound
- Use Time Specifiers: Use @t: when transaction numbers are known (fastest)
- Graph Source Selection: Choose appropriate graph sources for query patterns
Related Documentation
- SPARQL: SPARQL query language
- Time Travel: Historical queries
- Graph Sources: Graph source queries
- Output Formats: Query result formats
- IRIs and @context: IRI resolution and the strict compact-IRI guard
SPARQL
Fluree provides full support for SPARQL 1.1, the W3C standard query language for RDF. SPARQL enables compatibility with existing RDF tools, knowledge graphs, and semantic web applications.
Overview
SPARQL (SPARQL Protocol and RDF Query Language) is the industry standard for querying RDF data. Fluree implements SPARQL 1.1, providing full compatibility with SPARQL endpoints and tools.
Basic SPARQL Query
PREFIX ex: <http://example.org/ns/>
SELECT ?name ?age
WHERE {
?person ex:name ?name .
?person ex:age ?age .
}
Default Prefixes
When querying via the CLI, a ledger’s default context prefix mappings are injected into SPARQL queries that have no explicit PREFIX declarations. The HTTP API defaults this behavior off; pass ?default-context=true on ledger-scoped query requests to opt in. For example, if the default context includes {"ex": "http://example.org/ns/"}, this query works without a PREFIX line when default-context injection is enabled:
SELECT ?name ?age
WHERE {
?person ex:name ?name .
?person ex:age ?age .
}
If a query includes any PREFIX declarations, the default context is not used — you must declare every prefix you need. To explicitly opt out of the default context without defining any real prefix, use PREFIX : <>. See opting out of the default context for details.
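For example, opting out of the default context and writing full IRIs directly (a sketch):
PREFIX : <>
SELECT ?name
WHERE {
  ?person <http://example.org/ns/name> ?name .
}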
Note: When using fluree-db-api directly (embedded), queries must declare their own PREFIX declarations. The default context is not injected automatically by the core API. Use db_with_default_context() or GraphDb::with_default_context() to opt in. See Default Context for details.
You can view and manage the default context with fluree context get/set or GET/PUT /v1/fluree/context/{ledger...}.
Query Forms
SELECT Queries
Return variable bindings:
PREFIX ex: <http://example.org/ns/>
SELECT ?name ?email
WHERE {
?person ex:name ?name .
?person ex:email ?email .
}
DISTINCT Results:
SELECT DISTINCT ?name
WHERE {
?person ex:name ?name .
}
Reduced Results:
SELECT REDUCED ?name
WHERE {
?person ex:name ?name .
}
CONSTRUCT Queries
Generate RDF graphs from query results:
PREFIX ex: <http://example.org/ns/>
CONSTRUCT {
?person ex:displayName ?name .
}
WHERE {
?person ex:name ?name .
}
See CONSTRUCT Queries for details.
ASK Queries
Return boolean indicating if query matches:
PREFIX ex: <http://example.org/ns/>
ASK {
?person ex:name "Alice" .
}
DESCRIBE Queries
Return RDF description of resources:
PREFIX ex: <http://example.org/ns/>
DESCRIBE ex:alice
Fluree’s DESCRIBE returns outgoing triples for each described resource (equivalent to CONSTRUCT { ?r ?p ?o } WHERE { VALUES ?r { ... } . ?r ?p ?o }).
Basic Graph Patterns
Triple Patterns
Match RDF triples:
?person ex:name ?name .
Multiple Patterns
Combine patterns with AND semantics:
?person ex:name ?name .
?person ex:age ?age .
?person ex:email ?email .
Property Paths
SPARQL property paths allow complex traversal patterns in the predicate position of a triple pattern.
Supported Operators
| Syntax | Name | Description |
|---|---|---|
| p+ | One or more | Transitive closure (follows p one or more hops) |
| p* | Zero or more | Reflexive transitive closure (includes self) |
| ^p | Inverse | Traverses p in reverse direction |
| p|q | Alternative | Matches either p or q (UNION semantics) |
| p/q | Sequence | Follows p then q (property chain) |
One or More (+):
?person ex:parent+ ?ancestor .
Zero or More (*):
?person ex:parent* ?ancestorOrSelf .
Inverse (^):
?child ^ex:parent ?parent .
This is equivalent to ?parent ex:parent ?child — it reverses the traversal direction.
Inverse can also be applied to complex paths (sequences and alternatives):
?s ^(ex:friend/ex:name) ?o . -- inverse of a sequence
?s ^(ex:name|ex:nick) ?o . -- inverse of an alternative
- ^(ex:friend/ex:name) reverses the step order and inverts each step: (^ex:name)/(^ex:friend)
- ^(ex:name|ex:nick) distributes inverse into each branch: (^ex:name)|(^ex:nick)
- Double inverse cancels: ^(^ex:p) simplifies to ex:p
Alternative (|):
?person ex:friend|ex:colleague ?related .
This produces UNION semantics: results from both ex:friend and ex:colleague are combined (bag semantics, so duplicates are preserved).
Three-way and inverse alternatives are supported:
?s ex:a|ex:b|ex:c ?o .
?s ex:friend|^ex:colleague ?related .
Alternative branches can also be sequence chains. For example, to get the name via either the friend or colleague path:
?s (ex:friend/ex:name)|(ex:colleague/ex:name) ?name .
Branches can freely mix simple predicates, inverse predicates, and sequence chains:
?s ex:name|(ex:friend/ex:name)|^ex:colleague ?val .
Sequence (/) — Property Chains:
?person ex:friend/ex:name ?friendName .
This follows ex:friend then ex:name, expanding into a chain of triple patterns joined by internal variables. Multi-step chains are supported:
?person ex:friend/ex:friend/ex:name ?fofName .
Sequence steps can include inverse predicates:
?person ^ex:friend/ex:name ?name .
This traverses ex:friend backwards (finding who links to ?person), then follows ex:name forward.
Sequence steps can also be alternatives. For example, ex:friend/(ex:name|ex:nick) distributes the alternative into a union of chains (ex:friend/ex:name and ex:friend/ex:nick):
?person ex:friend/(ex:name|ex:nick) ?label .
Multiple alternative steps are supported: (ex:a|ex:b)/(ex:c|ex:d) expands to 4 chains. A safety limit of 64 expanded chains is enforced to prevent combinatorial explosion.
Rules:
- Transitive paths (+, *) require at least one variable (both subject and object cannot be constants).
- Sequence (/) steps must be simple predicates (ex:p), inverse simple predicates (^ex:p), or alternatives of simple predicates ((ex:a|ex:b)). Transitive (+/*) and nested sequence modifiers are not allowed inside sequence steps.
- Variable names starting with ?__ are reserved for internal use and will not appear in SELECT * (wildcard) output.
Not Yet Supported
The following operators are parsed but not yet supported for execution:
| Syntax | Name |
|---|---|
| p? | Zero or one (optional step) |
| !p or !(p|q) | Negated property set |
Query Modifiers
FILTER
Filter results with conditions:
SELECT ?name ?age
WHERE {
?person ex:name ?name .
?person ex:age ?age .
FILTER (?age > 18)
}
Multiple Filters:
FILTER (?age > 18 && ?age < 65)
FILTER (regex(?name, "^A"))
OPTIONAL
Match optional patterns:
SELECT ?name ?email
WHERE {
?person ex:name ?name .
OPTIONAL { ?person ex:email ?email . }
}
Multiple Optionals:
SELECT ?name ?email ?phone
WHERE {
?person ex:name ?name .
OPTIONAL { ?person ex:email ?email . }
OPTIONAL { ?person ex:phone ?phone . }
}
UNION
Match alternative patterns:
SELECT ?name
WHERE {
{ ?person ex:name ?name . }
UNION
{ ?person ex:alias ?name . }
}
MINUS
Exclude matching patterns:
SELECT ?person
WHERE {
?person ex:type ex:User .
MINUS { ?person ex:status ex:Inactive . }
}
BIND
Compute values:
SELECT ?name ?nextAge
WHERE {
?person ex:name ?name .
?person ex:age ?age .
BIND (?age + 1 AS ?nextAge)
}
VALUES
Provide initial bindings:
SELECT ?person ?name
WHERE {
VALUES ?name { "Alice" "Bob" "Carol" }
?person ex:name ?name .
}
Aggregation
GROUP BY
Group results by variable:
SELECT ?category (COUNT(?product) AS ?count)
WHERE {
?product ex:category ?category .
}
GROUP BY ?category
Expression-based GROUP BY:
Group by a computed expression using (expr AS ?alias) syntax:
SELECT ?initial (COUNT(?name) AS ?count)
WHERE {
?person ex:name ?name .
}
GROUP BY (SUBSTR(?name, 1, 1) AS ?initial)
The expression is evaluated per row and bound to the alias variable before grouping. Any SPARQL expression is supported, including function calls, arithmetic, and type casts.
HAVING
Filter grouped results:
SELECT ?category (COUNT(?product) AS ?count)
WHERE {
?product ex:category ?category .
}
GROUP BY ?category
HAVING (COUNT(?product) > 10)
Aggregation Functions
- COUNT(?var) - Count non-null values
- SUM(?var) - Sum numeric values
- AVG(?var) - Average numeric values
- MIN(?var) - Minimum value
- MAX(?var) - Maximum value
- SAMPLE(?var) - Arbitrary value from group
- GROUP_CONCAT(?var; separator=",") - Concatenate values
All aggregate functions support the DISTINCT modifier, which eliminates duplicate values before aggregation:
SELECT ?category (COUNT(DISTINCT ?customer) AS ?unique_buyers)
(SUM(DISTINCT ?price) AS ?unique_price_total)
WHERE {
?order ex:category ?category .
?order ex:customer ?customer .
?order ex:price ?price .
}
GROUP BY ?category
Aggregate result types: COUNT and SUM of integers return xsd:integer. SUM of mixed numeric types and AVG return xsd:double.
Sorting and Limiting
ORDER BY
Sort results:
SELECT ?name ?age
WHERE {
?person ex:name ?name .
?person ex:age ?age .
}
ORDER BY ?name
Descending:
ORDER BY DESC(?age)
Multiple Sort Keys:
ORDER BY ?last ASC(?first) DESC(?age)
LIMIT
Limit number of results:
SELECT ?name
WHERE {
?person ex:name ?name .
}
LIMIT 10
OFFSET
Skip results:
SELECT ?name
WHERE {
?person ex:name ?name .
}
OFFSET 20
LIMIT 10
Datasets
FROM
Specify default graph:
PREFIX ex: <http://example.org/ns/>
SELECT ?name
FROM <mydb:main>
WHERE {
?person ex:name ?name .
}
Multiple Default Graphs:
SELECT ?name
FROM <mydb:main>
FROM <otherdb:main>
WHERE {
?person ex:name ?name .
}
FROM NAMED
Specify named graphs:
PREFIX ex: <http://example.org/ns/>
SELECT ?graph ?name
FROM NAMED <mydb:main>
FROM NAMED <otherdb:main>
WHERE {
GRAPH ?graph {
?person ex:name ?name .
}
}
Fluree also exposes a built-in named graph inside each ledger for transaction / commit metadata:
- FROM <mydb:main#txn-meta> (txn-meta as the default graph), or
- FROM NAMED <mydb:main#txn-meta> and GRAPH <mydb:main#txn-meta> { ... }
See Datasets for details.
SPARQL Functions
String Functions
- STR(?x) - String value
- LANG(?x) - Language tag
- LANGMATCHES(?lang, ?pattern) - Language match
- REGEX(?str, ?pattern) - Regular expression
- REPLACE(?str, ?pattern, ?replacement) - Replace
- SUBSTR(?str, ?start, ?length) - Substring
- STRLEN(?str) - String length
- UCASE(?str) - Uppercase
- LCASE(?str) - Lowercase
- ENCODE_FOR_URI(?str) - URI encode
- CONCAT(?str1, ?str2, ...) - Concatenate
Numeric Functions
- ABS(?x) - Absolute value
- ROUND(?x) - Round
- CEIL(?x) - Ceiling
- FLOOR(?x) - Floor
- RAND() - Random number
Date/Time Functions
- NOW() - Current timestamp
- YEAR(?date) - Year
- MONTH(?date) - Month
- DAY(?date) - Day
- HOURS(?time) - Hours
- MINUTES(?time) - Minutes
- SECONDS(?time) - Seconds
Type Conversion
- STRDT(?str, ?datatype) - String to typed literal
- STRLANG(?str, ?lang) - String with language
- DATATYPE(?literal) - Datatype
- IRI(?str) - IRI from string
- URI(?str) - URI from string
- BNODE(?str) - Blank node
XSD Datatype Constructors (Casts)
Per W3C SPARQL 1.1 §17.5, XSD constructor functions cast values between datatypes. Invalid casts produce unbound (no binding), not errors.
- xsd:boolean(?x) - Cast to boolean ("true", "1" → true; "false", "0" → false; numeric 0 → false, non-zero → true)
- xsd:integer(?x) - Cast to integer (truncates doubles, parses strings)
- xsd:float(?x) - Cast to single-precision float
- xsd:double(?x) - Cast to double-precision float
- xsd:decimal(?x) - Cast to decimal (rejects scientific notation strings)
- xsd:string(?x) - Cast to string (canonical form for decimals)
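For example, casting a string-valued property to an integer before a numeric comparison (a sketch; ex:age stored as a string is an assumption). Rows where the cast fails leave ?ageNum unbound, so the FILTER drops them:
PREFIX ex: <http://example.org/ns/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?person ?ageNum
WHERE {
  ?person ex:age ?age .
  BIND (xsd:integer(?age) AS ?ageNum)
  FILTER (?ageNum >= 18)
}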
Logical Functions
- BOUND(?var) - Variable is bound
- IF(?condition, ?then, ?else) - Conditional
- COALESCE(?x, ?y, ...) - First non-null value
- ISIRI(?x) - Is IRI
- ISURI(?x) - Is URI
- ISBLANK(?x) - Is blank node
- ISLITERAL(?x) - Is literal
- ISNUMERIC(?x) - Is numeric
Subqueries
Nest queries:
SELECT ?person ?name
WHERE {
?person ex:name ?name .
{
SELECT ?person
WHERE {
?person ex:age ?age .
FILTER (?age > 18)
}
}
}
Service Queries
SERVICE enables cross-ledger queries within Fluree. You can execute patterns against different ledgers within the same query using the fluree:ledger: URI scheme.
Basic Cross-Ledger Query
Query data from another ledger in your dataset:
PREFIX ex: <http://example.org/ns/>
SELECT ?customer ?name ?total
FROM <customers:main>
FROM NAMED <orders:main>
WHERE {
?customer ex:name ?name .
SERVICE <fluree:ledger:orders:main> {
?order ex:customer ?customer ;
ex:total ?total .
}
}
Endpoint URI Format
For local Fluree ledger queries, use the fluree:ledger: scheme:
| Format | Description | Matches dataset ledger ID |
|---|---|---|
| fluree:ledger:<name> | Query ledger with default branch (main) | <name>:main |
| fluree:ledger:<name>:<branch> | Query specific branch | <name>:<branch> |
Where:
- <name> is the ledger name without the branch (e.g., orders, acme/people)
- <branch> is the branch name (e.g., main, dev)
- The full dataset ledger ID is always <name>:<branch> (e.g., orders:main, acme/people:dev)
The endpoint is resolved by matching against the full ledger_id in the dataset.
Examples:
SERVICE <fluree:ledger:orders> { ... } -- matches orders:main
SERVICE <fluree:ledger:orders:main> { ... } -- matches orders:main (explicit)
SERVICE <fluree:ledger:orders:dev> { ... } -- matches orders:dev
SERVICE SILENT
Use SERVICE SILENT to return empty results instead of failing if the service errors or is unavailable:
PREFIX ex: <http://example.org/ns/>
SELECT ?name ?order
WHERE {
?person ex:name ?name .
SERVICE SILENT <fluree:ledger:orders:main> {
?order ex:customer ?person .
}
}
If the orders ledger is not in the dataset or encounters an error, the query returns results with unbound ?order values instead of failing.
Variable Endpoints
SERVICE supports variable endpoints that iterate over available ledgers:
PREFIX ex: <http://example.org/ns/>
SELECT ?ledger ?person ?name
FROM NAMED <db1:main>
FROM NAMED <db2:main>
WHERE {
SERVICE ?ledger {
?person ex:name ?name .
}
}
This queries all named ledgers in the dataset.
Cross-Ledger Join Example
Join customer data from one ledger with their orders from another:
PREFIX ex: <http://example.org/ns/>
SELECT ?customerName ?productName ?quantity
FROM <customers:main>
FROM NAMED <orders:main>
FROM NAMED <products:main>
WHERE {
# Get customer from default graph (customers ledger)
?customer ex:name ?customerName .
# Get orders for this customer from orders ledger
SERVICE <fluree:ledger:orders:main> {
?order ex:customer ?customer ;
ex:product ?product ;
ex:quantity ?quantity .
}
# Get product details from products ledger
SERVICE <fluree:ledger:products:main> {
?product ex:name ?productName .
}
}
Requirements
- The target ledger must be included in the dataset (via FROM or FROM NAMED clauses)
- Results are joined with the outer query on shared variables
- SERVICE patterns are executed as correlated subqueries (like EXISTS)
Remote Fluree Federation
SERVICE supports querying ledgers on remote Fluree instances using the fluree:remote: scheme. This enables cross-server federation — a single SPARQL query can join data from local ledgers with data from ledgers on other Fluree servers.
Remote Endpoint Format
| Format | Description |
|---|---|
| fluree:remote:<connection>/<ledger> | Query a ledger on a registered remote server |
Where:
- <connection> is a named remote connection registered at build time (maps to a server URL + bearer token)
- <ledger> is the ledger ID on the remote server (e.g., customers:main, acme/people:main)
Example: Cross-Server Join
PREFIX ex: <http://example.org/ns/>
SELECT ?localName ?remoteEmail
WHERE {
?person ex:name ?localName .
SERVICE <fluree:remote:acme/customers:main> {
?person ex:email ?remoteEmail .
}
}
This queries ?person ex:name from the local ledger and joins with ?person ex:email from the customers:main ledger on the remote server named acme.
Multiple Ledgers on the Same Remote Server
A single remote connection gives access to any ledger the bearer token is authorized for:
PREFIX ex: <http://example.org/ns/>
SELECT ?customer ?orderId ?productName
WHERE {
SERVICE <fluree:remote:acme/customers:main> {
?customer ex:name ?name .
?customer ex:id ?customerId .
}
SERVICE <fluree:remote:acme/orders:main> {
?order ex:customerId ?customerId .
?order ex:orderId ?orderId .
?order ex:product ?product .
}
SERVICE <fluree:remote:acme/products:main> {
?product ex:name ?productName .
}
}
SILENT with Remote Endpoints
SERVICE SILENT works with remote endpoints. If the remote server is unreachable, the connection is not registered, or the bearer token is rejected, the SERVICE block returns empty results instead of failing the query:
SERVICE SILENT <fluree:remote:partner/inventory:main> {
?item ex:sku ?sku .
}
Registering Remote Connections
Remote connections are registered at connection build time via the Rust API or server configuration. See Configuration: Remote connections and Rust API: Remote federation for setup details.
Datatype Handling
Remote query results preserve their original datatypes. Values returned from a remote server are parsed into the same rich type system used for local data — xsd:dateTime, xsd:date, xsd:decimal, xsd:integer, etc. are all stored with their proper typed representations. Custom datatypes (e.g., http://example.org/myType) are also preserved: the value is kept as a string with the original datatype IRI retained, so round-tripping and downstream FILTER comparisons on shared custom types work correctly.
Limitations (v1)
- Uncorrelated execution only. The SERVICE body is sent to the remote server as a standalone query. Parent-row bindings are not injected as VALUES (bound-join). This means a SERVICE block that references variables bound in the outer query will not push those constraints to the remote server — the remote returns all matching rows, and the join happens locally. See the sketch after this list for one way to constrain the remote query explicitly.
- SPARQL queries only. Remote SERVICE is available in SPARQL queries. JSON-LD queries do not currently support the fluree:remote: scheme.
- No query cancellation propagation. If the local query is cancelled, in-flight remote HTTP requests are not aborted.
- Policy is local only. The remote server enforces its own policy based on the bearer token. The local server’s policy engine does not filter rows returned from a remote SERVICE.
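Because outer bindings are not pushed down in v1, one workaround is to constrain the SERVICE body yourself. The following is a minimal sketch reusing the acme connection and customers:main ledger from the examples above; it assumes the remote server accepts a VALUES clause inside the SERVICE body, and the listed IRIs are hypothetical placeholders for the subjects you expect to join on:
PREFIX ex: <http://example.org/ns/>
SELECT ?person ?remoteEmail
WHERE {
  ?person ex:name ?localName .
  SERVICE <fluree:remote:acme/customers:main> {
    # Explicit constraint: without it, the remote returns every ex:email row
    VALUES ?person { ex:alice ex:bob }
    ?person ex:email ?remoteEmail .
  }
}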
External SPARQL Endpoints
Federated queries to non-Fluree SPARQL endpoints (e.g., Wikidata, DBpedia) are not yet supported. Only the fluree:ledger: (local) and fluree:remote: (remote Fluree) schemes are currently available.
Time Travel
Point-in-Time Queries
Query data as it existed at a specific time using time specifiers in the FROM clause:
PREFIX ex: <http://example.org/ns/>
SELECT ?name ?age
FROM <ledger:main@t:100>
WHERE {
?person ex:name ?name ;
ex:age ?age .
}
Time specifiers:
- @t:100 - Transaction number
- @iso:2024-01-15T10:30:00Z - ISO 8601 datetime (example below)
- @commit:bafybeig... - Commit ContentId
- @t:latest - Current/latest state
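For example, the same query pinned to an ISO 8601 timestamp instead of a transaction number (same ledger name as above):
PREFIX ex: <http://example.org/ns/>
SELECT ?name ?age
FROM <ledger:main@iso:2024-01-15T10:30:00Z>
WHERE {
  ?person ex:name ?name ;
          ex:age ?age .
}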
History Queries
Query all changes (assertions and retractions) within a time range using FROM...TO with RDF-star syntax:
PREFIX ex: <http://example.org/ns/>
PREFIX f: <https://ns.flur.ee/db#>
SELECT ?age ?t ?op
FROM <ledger:main@t:1>
TO <ledger:main@t:latest>
WHERE {
<< ex:alice ex:age ?age >> f:t ?t .
<< ex:alice ex:age ?age >> f:op ?op .
}
ORDER BY ?t
The << subject predicate object >> syntax (RDF-star) treats the triple as an entity that can have metadata:
- f:t - Transaction time (integer) when the fact was asserted or retracted.
- f:op - Operation type as a boolean: true for assertions, false for retractions. Mirrors Flake.op on disk.
Filter by operation type:
PREFIX ex: <http://example.org/ns/>
PREFIX f: <https://ns.flur.ee/db#>
SELECT ?age ?t
FROM <ledger:main@t:1>
TO <ledger:main@t:latest>
WHERE {
<< ex:alice ex:age ?age >> f:t ?t .
<< ex:alice ex:age ?age >> f:op ?op .
FILTER(?op = false)
}
History with ISO datetime range:
PREFIX ex: <http://example.org/ns/>
PREFIX f: <https://ns.flur.ee/db#>
SELECT ?name ?t ?op
FROM <ledger:main@iso:2024-01-01T00:00:00Z>
TO <ledger:main@iso:2024-12-31T23:59:59Z>
WHERE {
<< ex:alice ex:name ?name >> f:t ?t .
<< ex:alice ex:name ?name >> f:op ?op .
}
SPARQL UPDATE
Fluree supports SPARQL 1.1 Update for modifying data using standard SPARQL syntax. SPARQL UPDATE requests use the application/sparql-update content type and are sent to the update endpoints.
INSERT DATA
Insert ground triples (no variables):
PREFIX ex: <http://example.org/ns/>
INSERT DATA {
ex:alice ex:name "Alice" .
ex:alice ex:age 30 .
ex:alice ex:email "alice@example.org" .
}
HTTP Request:
curl -X POST http://localhost:8090/v1/fluree/update/mydb:main \
-H "Content-Type: application/sparql-update" \
-d 'PREFIX ex: <http://example.org/ns/>
INSERT DATA { ex:alice ex:name "Alice" }'
DELETE DATA
Delete specific ground triples:
PREFIX ex: <http://example.org/ns/>
DELETE DATA {
ex:alice ex:email "alice@example.org" .
}
DELETE WHERE
Delete triples matching a pattern:
PREFIX ex: <http://example.org/ns/>
DELETE WHERE {
ex:alice ex:age ?age .
}
This finds all ex:age values for ex:alice and deletes them.
DELETE/INSERT (Modify)
The most powerful form combines WHERE, DELETE, and INSERT clauses:
PREFIX ex: <http://example.org/ns/>
DELETE {
?person ex:age ?oldAge .
}
INSERT {
?person ex:age ?newAge .
}
WHERE {
?person ex:name "Alice" .
?person ex:age ?oldAge .
BIND(?oldAge + 1 AS ?newAge)
}
Update multiple properties:
PREFIX ex: <http://example.org/ns/>
DELETE {
?person ex:name ?oldName .
?person ex:status ?oldStatus .
}
INSERT {
?person ex:name "Alicia" .
?person ex:status ex:Active .
}
WHERE {
?person ex:name "Alice" .
OPTIONAL { ?person ex:name ?oldName }
OPTIONAL { ?person ex:status ?oldStatus }
}
Dataset scoping for MODIFY (WITH / USING / USING NAMED)
SPARQL UPDATE MODIFY supports dataset scoping for named graphs:
- WITH <iri>: sets the default graph for INSERT/DELETE templates that don’t use an explicit GRAPH <iri> { ... } block (example below).
- USING <iri>: scopes the default graph(s) for WHERE evaluation. Repeated USING clauses are evaluated as a merged default graph.
- USING NAMED <iri>: scopes which named graphs are visible to WHERE GRAPH <iri> { ... } patterns. Repeated USING NAMED clauses allow multiple named graphs.
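A minimal sketch of WITH scoping a MODIFY to a named graph; the graph IRI http://example.org/graphs/products is a hypothetical user-defined named graph, and the predicates are illustrative:
PREFIX ex: <http://example.org/ns/>
WITH <http://example.org/graphs/products>
DELETE { ?product ex:status ?oldStatus }
INSERT { ?product ex:status ex:Discontinued }
WHERE {
  ?product ex:status ?oldStatus .
}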
Blank Nodes in INSERT
Blank nodes can be used in INSERT templates to create new entities:
PREFIX ex: <http://example.org/ns/>
INSERT DATA {
_:newPerson ex:name "Bob" .
_:newPerson ex:age 25 .
}
Typed Literals
Specify datatypes explicitly:
PREFIX ex: <http://example.org/ns/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
INSERT DATA {
ex:alice ex:birthDate "1990-05-15"^^xsd:date .
ex:alice ex:salary "75000.00"^^xsd:decimal .
ex:alice ex:active "true"^^xsd:boolean .
}
Language-Tagged Strings
Insert strings with language tags:
PREFIX ex: <http://example.org/ns/>
INSERT DATA {
ex:alice ex:name "Alice"@en .
ex:alice ex:name "Alicia"@es .
ex:alice ex:name "アリス"@ja .
}
SPARQL UPDATE Restrictions
Current restrictions / boundaries:
- Graph management operations: LOAD, CLEAR, DROP, CREATE, ADD, MOVE, COPY are not yet supported.
- Template graph variables: INSERT/DELETE templates support GRAPH <iri> { ... } blocks, but GRAPH ?g { ... } is not yet supported.
- DELETE WHERE + GRAPH blocks: GRAPH <iri> { ... } blocks are not yet supported inside DELETE WHERE { ... }.
- SERVICE: Only local-ledger endpoints of the form fluree:ledger:<name>[:<branch>] are supported; arbitrary remote HTTP SERVICE endpoints are not supported.
- Property paths: Supported in WHERE (subject to Fluree capability settings); see the sketch after this list.
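As a hedged illustration of the last point, a property path used in the WHERE clause of a DELETE/INSERT update; the predicates and IRIs are illustrative, not a fixed vocabulary:
PREFIX ex: <http://example.org/ns/>
DELETE { ?person ex:division ?oldDivision }
INSERT { ?person ex:division ex:Engineering }
WHERE {
  # ex:reportsTo+ walks the reporting chain transitively
  ?person ex:reportsTo+ ex:vpEngineering .
  ?person ex:division ?oldDivision .
}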
Endpoint Usage
SPARQL UPDATE uses the update endpoints with Content-Type: application/sparql-update:
| Endpoint | Description |
|---|---|
| POST /v1/fluree/update | Connection-scoped, requires Fluree-Ledger header |
| POST /v1/fluree/update/<ledger...> | Ledger-scoped, ledger from URL path |
Examples:
# Ledger-scoped (recommended)
curl -X POST http://localhost:8090/v1/fluree/update/mydb:main \
-H "Content-Type: application/sparql-update" \
-d 'PREFIX ex: <http://example.org/ns/>
INSERT DATA { ex:alice ex:name "Alice" }'
# Connection-scoped with header
curl -X POST http://localhost:8090/v1/fluree/update \
-H "Content-Type: application/sparql-update" \
-H "Fluree-Ledger: mydb:main" \
-d 'PREFIX ex: <http://example.org/ns/>
INSERT DATA { ex:alice ex:name "Alice" }'
Best Practices
- Use PREFIX Declarations: Makes queries readable
- Automatic Pattern Optimization: The query planner automatically reorders patterns for efficient execution using statistics-driven cardinality estimates
- Flexible FILTER Placement: Filters can be placed anywhere in the WHERE clause — the query engine automatically applies each filter as soon as all its required variables are bound
- Limit Results: Use LIMIT for large result sets
- Avoid Cartesian Products: Structure queries to avoid large joins
Related Documentation
- JSON-LD Query: Fluree’s native query language
- CONSTRUCT Queries: Generating RDF graphs
- Datasets: Multi-graph queries
- Output Formats: Query result formats
- Transactions: JSON-LD transaction format
Output Formats
Fluree supports multiple output formats for query results, each optimized for different use cases. You can choose the format that best fits your application’s needs.
Supported Formats
JSON-LD Format
Default format for JSON-LD Query. Provides compact, context-aware JSON with IRI expansion/compaction.
Characteristics:
- Uses @context for IRI compaction
- Compact IRIs (e.g., ex:alice instead of full IRIs)
- Inferable datatypes (string, long, double, boolean) rendered as bare values
- Language tags preserved
Example (graph crawl):
[
{
"@id": "ex:alice",
"schema:name": "Alice",
"schema:age": 30,
"schema:knows": {"@id": "ex:bob"}
}
]
Example (tabular SELECT):
[
["Alice", 30],
["Bob", 25]
]
SPARQL JSON Format
Standard SPARQL 1.1 result format for SPARQL queries.
Characteristics:
- W3C SPARQL 1.1 compliant
- Standard results and bindings structure
- Datatype information included
- Language tags included
Example:
{
"head": {
"vars": ["name", "age"]
},
"results": {
"bindings": [
{
"name": {
"type": "literal",
"value": "Alice"
},
"age": {
"type": "literal",
"value": "30",
"datatype": "http://www.w3.org/2001/XMLSchema#integer"
}
}
]
}
}
Typed JSON Format
Type-preserving JSON format with explicit datatype information on every value. Works with both tabular SELECT queries and graph crawl (entity-centric) queries.
Characteristics:
- Every literal includes {"@value": ..., "@type": "..."} — even inferable types
- References use {"@id": "..."}
- Language-tagged strings use {"@value": ..., "@language": "..."}
- @json values use {"@value": <parsed>, "@type": "@json"}
- Nested entities in graph crawl results are also fully typed
- IRIs compacted via @context
Example (tabular SELECT):
[
{
"?name": {"@value": "Alice", "@type": "xsd:string"},
"?age": {"@value": 30, "@type": "xsd:long"}
}
]
Example (graph crawl):
[
{
"@id": "ex:alice",
"@type": ["schema:Person"],
"schema:name": {"@value": "Alice", "@type": "xsd:string"},
"schema:age": {"@value": 30, "@type": "xsd:long"},
"schema:knows": {
"@id": "ex:bob",
"schema:name": {"@value": "Bob", "@type": "xsd:string"}
},
"ex:data": {"@value": {"key": "val"}, "@type": "@json"}
}
]
Agent JSON Format
Optimized for LLM/agent consumption. Returns a self-describing envelope with a schema header, compact object rows using native JSON types, and built-in pagination support.
Request via HTTP:
Accept: application/vnd.fluree.agent+json
Fluree-Max-Bytes: 32768
Characteristics:
- Schema-once header: datatypes declared per variable, not repeated per value
- Native JSON types for values (strings, numbers, booleans — no wrappers for inferable types)
- Non-inferable datatypes annotated inline only where needed ({"@value": ..., "@type": "..."})
- Byte-budget truncation with hasMore flag and resume query
- Time-pinning metadata (t for single-ledger, iso wallclock timestamp for cross-ledger)
Example (single-ledger, no truncation):
{
"schema": {
"?name": "xsd:string",
"?age": "xsd:integer",
"?s": "uri"
},
"rows": [
{"?name": "Alice", "?age": 30, "?s": "ex:alice"},
{"?name": "Bob", "?age": 25, "?s": "ex:bob"}
],
"rowCount": 2,
"t": 5,
"iso": "2026-03-26T14:30:00Z",
"hasMore": false
}
Example (truncated, with resume query):
{
"schema": {
"?name": "xsd:string",
"?age": "xsd:integer"
},
"rows": [
{"?name": "Alice", "?age": 30},
{"?name": "Bob", "?age": 25}
],
"rowCount": 2,
"t": 5,
"iso": "2026-03-26T14:30:00Z",
"hasMore": true,
"message": "Response truncated due to size limit of 32768 bytes. Use the query below to retrieve the next batch.",
"resume": "SELECT ?name ?age FROM <mydb:main@t:5> WHERE { ?s ex:name ?name ; ex:age ?age } OFFSET 2 LIMIT 100"
}
Schema types:
- Single type → string: "?name": "xsd:string"
- Mixed types → array: "?value": ["xsd:string", "xsd:integer"]
- IRI references → "uri"
Envelope fields:
| Field | Present | Description |
|---|---|---|
| schema | Always | Per-variable datatype map |
| rows | Always | Array of {variable: value} objects |
| rowCount | Always | Number of rows included |
| t | Single-ledger only | Transaction number used for the query |
| iso | Always | ISO-8601 wallclock timestamp at query time |
| hasMore | Always | Whether more rows exist beyond the byte budget |
| message | When truncated | Human-readable truncation explanation |
| resume | When truncated, single-FROM only | Ready-to-execute SPARQL with @t: pinning and OFFSET |
Multi-ledger queries: The t field is omitted (each ledger has its own timeline). The resume field is also omitted; instead, the message instructs the caller to use @iso: on each FROM clause for time-pinning.
Byte budget: Set via the Fluree-Max-Bytes header. When the cumulative serialized size of rows exceeds this limit, the formatter stops adding rows and sets hasMore: true. The budget applies to row data only (schema and envelope overhead are excluded from the count).
Array Normalization
By default, graph crawl results return single-valued properties as bare scalars and multi-valued properties as arrays:
{"schema:name": "Alice", "ex:tags": ["rust", "wasm"]}
This can be problematic for typed struct deserialization (e.g., a Vec<String> field that receives a bare string when only one value exists).
normalize_arrays forces all property values into arrays regardless of cardinality:
{"schema:name": ["Alice"], "ex:tags": ["rust", "wasm"]}
This is orthogonal to typed JSON and can be combined with any format:
#![allow(unused)]
fn main() {
// Typed + normalized — most predictable for struct deserialization
let config = FormatterConfig::typed_json().with_normalize_arrays();
// JSON-LD + normalized — compact values but predictable shapes
let config = FormatterConfig::jsonld().with_normalize_arrays();
}
The @container: @set context annotation still forces arrays per-property and works regardless of the normalize_arrays setting.
Format Selection
JSON-LD Query
JSON-LD Query defaults to JSON-LD format. You can specify the format explicitly:
{
"@context": { "ex": "http://example.org/ns/" },
"select": ["?name", "?age"],
"where": [
{ "@id": "?person", "ex:name": "?name", "ex:age": "?age" }
],
"format": "jsonld"
}
SPARQL
SPARQL queries return SPARQL JSON format by default:
PREFIX ex: <http://example.org/ns/>
SELECT ?name ?age
WHERE {
?person ex:name ?name .
?person ex:age ?age .
}
Datatype Handling
String Types
JSON-LD:
"Hello"
Typed JSON:
{"@value": "Hello", "@type": "xsd:string"}
SPARQL JSON:
{"type": "literal", "value": "Hello"}
Numeric Types
JSON-LD:
42
Typed JSON:
{"@value": 42, "@type": "xsd:long"}
SPARQL JSON:
{"type": "literal", "value": "42", "datatype": "http://www.w3.org/2001/XMLSchema#integer"}
Language-Tagged Strings
All formats use the same representation:
{"@value": "Hello", "@language": "en"}
IRIs
JSON-LD / Typed JSON:
{"@id": "ex:alice"}
SPARQL JSON:
{"type": "uri", "value": "http://example.org/ns/alice"}
Rust API
Use FormatterConfig to control output format via the query builder API:
#![allow(unused)]
fn main() {
use fluree_db_api::FormatterConfig;
// Single-ledger query with explicit format
let db = fluree.db("mydb:main").await?;
let result = db.query(&fluree)
.sparql("SELECT ?name WHERE { ?s <schema:name> ?name }")
.format(FormatterConfig::typed_json())
.execute_formatted()
.await?;
// Dataset query with format
let result = dataset.query(&fluree)
.sparql("SELECT * WHERE { ?s ?p ?o }")
.format(FormatterConfig::sparql_json())
.execute_formatted()
.await?;
// Connection-level query with format
let result = fluree.query_from()
.jsonld(&query_with_from)
.format(FormatterConfig::jsonld())
.execute_formatted()
.await?;
// AgentJson with byte budget and resume support
use fluree_db_api::AgentJsonContext;
let config = FormatterConfig::agent_json()
.with_max_bytes(32768)
.with_agent_json_context(AgentJsonContext {
sparql_text: Some(sparql.to_string()),
from_count: 1,
iso_timestamp: Some(chrono::Utc::now().to_rfc3339()),
});
let result = db.query(&fluree)
.sparql("SELECT ?name ?age WHERE { ?s ex:name ?name ; ex:age ?age }")
.format(config)
.execute_formatted()
.await?;
// Or directly on QueryResult:
let json = result.to_agent_json(&snapshot)?; // no budget
let json = result.to_agent_json_with_config(&snapshot, &config)?; // with budget
}
Available format constructors:
- FormatterConfig::jsonld() — JSON-LD (default for JSON-LD queries)
- FormatterConfig::sparql_json() — SPARQL 1.1 JSON Results (default for SPARQL queries)
- FormatterConfig::typed_json() — Typed JSON with explicit datatypes on every value
- FormatterConfig::agent_json() — Agent JSON envelope for LLM/agent consumers
Builder methods:
- .with_normalize_arrays() — Force array wrapping for all graph crawl properties
- .with_pretty() — Pretty-print JSON output
- .with_max_bytes(n) — Set byte budget for AgentJson truncation
- .with_agent_json_context(ctx) — Set SPARQL text, FROM count, and ISO timestamp for AgentJson resume queries
All three query paths (db.query(), dataset.query(), fluree.query_from()) support .format().
Direct formatting on QueryResult
For graph crawl queries (which require async DB access):
#![allow(unused)]
fn main() {
// Typed JSON with graph crawl support
let json = result.to_typed_json_async(db.as_graph_db_ref()).await?;
// Custom config (e.g., typed + normalize_arrays)
let config = FormatterConfig::typed_json().with_normalize_arrays();
let json = result.format_async(db.as_graph_db_ref(), &config).await?;
}
When no .format() is set:
- JSON-LD queries default to JSON-LD format
- SPARQL queries default to SPARQL JSON format
CLI Usage
The fluree query command supports format selection via --format:
# Default table output
fluree query "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"
# JSON output
fluree query --format json '{"select": {"ex:alice": ["*"]}, "from": "mydb:main"}'
# Typed JSON output (explicit types on every value)
fluree query --format typed-json '{"select": {"ex:alice": ["*"]}, "from": "mydb:main"}'
# Normalize arrays (force all properties to arrays)
fluree query --format json --normalize-arrays '{"select": {"ex:alice": ["*"]}, "from": "mydb:main"}'
# Typed JSON + normalize arrays (most predictable for programmatic use)
fluree query --format typed-json --normalize-arrays '{"select": {"ex:alice": ["*"]}, "from": "mydb:main"}'
Performance Considerations
- JSON-LD is the most efficient format — inferable types skip the @value/@type wrapper
- Typed JSON adds a constant-factor overhead per literal value (one extra JSON object allocation). Query execution is unaffected — only the formatting phase is slower.
- normalize_arrays adds zero overhead when disabled (default). When enabled, it skips the len() == 1 check — no additional allocations beyond the array wrapper.
- TSV/CSV bypass JSON DOM construction entirely for maximum throughput
Best Practices
- Use JSON-LD for human-facing apps: Compact and readable
- Use Typed JSON for struct deserialization: Unambiguous types prevent parsing surprises
- Use normalize_arrays for typed consumers: Ensures Vec<T> fields always get arrays
- Use SPARQL JSON for standard tooling: Interoperable with SPARQL clients
- Use TSV/CSV for bulk export: Highest throughput, smallest memory footprint
- Use Agent JSON for LLM/agent integrations: Schema-once + pagination prevents context window overflow
Related Documentation
- JSON-LD Query: JSON-LD Query language
- SPARQL: SPARQL query language
- Datatypes: Type system details
Datasets and Multi-Graph Execution
Fluree supports SPARQL datasets, enabling queries across multiple graphs and ledgers simultaneously. This provides powerful data integration capabilities for complex applications.
SPARQL Datasets
A dataset in SPARQL is a collection of graphs used for query execution:
- Default Graph: The primary graph for triple patterns without GRAPH clauses
- Named Graphs: Additional graphs identified by IRIs, accessible via GRAPH clauses
FROM Clauses
Single Default Graph
Specify a single default graph:
JSON-LD Query:
{
"@context": { "ex": "http://example.org/ns/" },
"from": "mydb:main",
"select": ["?name"],
"where": [
{ "@id": "?person", "ex:name": "?name" }
]
}
SPARQL:
PREFIX ex: <http://example.org/ns/>
SELECT ?name
FROM <mydb:main>
WHERE {
?person ex:name ?name .
}
Multiple Default Graphs
Specify multiple default graphs (union semantics):
JSON-LD Query:
{
"@context": { "ex": "http://example.org/ns/" },
"from": ["mydb:main", "otherdb:main"],
"select": ["?name"],
"where": [
{ "@id": "?person", "ex:name": "?name" }
]
}
SPARQL:
PREFIX ex: <http://example.org/ns/>
SELECT ?name
FROM <mydb:main>
FROM <otherdb:main>
WHERE {
?person ex:name ?name .
}
FROM NAMED Clauses
Named graph sources (datasets)
In SPARQL, FROM NAMED identifies named graphs in the dataset. In Fluree, these are often graph sources such as:
- another ledger (federation / multi-ledger queries), or
- a graph source (search, tabular mapping, etc.).
Note: On the ledger-scoped HTTP query endpoint (POST /query/{ledger}), FROM / FROM NAMED is also supported, but is interpreted as selecting named graphs inside that same ledger. Use the connection-scoped endpoint (POST /query) when you want a dataset that spans multiple ledgers.
Query across multiple named graph sources:
JSON-LD Query:
{
"@context": { "ex": "http://example.org/ns/" },
"fromNamed": {
"mydb": { "@id": "mydb:main" },
"otherdb": { "@id": "otherdb:main" }
},
"select": ["?graph", "?name"],
"where": [
["graph", "?graph", { "@id": "?person", "ex:name": "?name" }]
]
}
SPARQL:
PREFIX ex: <http://example.org/ns/>
SELECT ?graph ?name
FROM NAMED <mydb:main>
FROM NAMED <otherdb:main>
WHERE {
GRAPH ?graph {
?person ex:name ?name .
}
}
Specific Named Graph
Query a specific named graph:
SPARQL:
PREFIX ex: <http://example.org/ns/>
SELECT ?name
FROM NAMED <mydb:main>
WHERE {
GRAPH <mydb:main> {
?person ex:name ?name .
}
}
Ledger named graph: txn-meta
Fluree provides a built-in named graph inside each ledger for transactional / commit metadata: txn-meta.
Use the #txn-meta fragment on a ledger reference:
- mydb:main#txn-meta
- mydb:main@t:100#txn-meta (time pinned)
JSON-LD Query (txn-meta as the default graph):
{
"@context": {
"f": "https://ns.flur.ee/db#",
"ex": "http://example.org/ns/"
},
"from": "mydb:main#txn-meta",
"select": ["?commit", "?t", "?machine"],
"where": [
{ "@id": "?commit", "f:t": "?t" },
{ "@id": "?commit", "ex:machine": "?machine" }
]
}
SPARQL Query:
PREFIX f: <https://ns.flur.ee/db#>
PREFIX ex: <http://example.org/ns/>
SELECT ?commit ?t ?machine
FROM <mydb:main#txn-meta>
WHERE {
?commit f:t ?t .
OPTIONAL { ?commit ex:machine ?machine }
}
User-Defined Named Graphs
Fluree supports user-defined named graphs ingested via TriG format. These graphs are queryable using the structured from object syntax with a graph field.
For the ledger-scoped HTTP endpoint (POST /query/{ledger}), the server also accepts a convenient shorthand:
"from": "txn-meta"/"from": "default"/"from": "<graph IRI>"to select a graph within the ledger in the URL.
Ingesting data with named graphs (TriG):
curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
-H "Content-Type: application/trig" \
-d '@prefix ex: <http://example.org/ns/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
GRAPH <http://example.org/graphs/products> {
ex:widget ex:name "Widget" ;
ex:price "29.99"^^xsd:decimal .
}'
Querying the named graph (JSON-LD):
Use the structured from object with a graph field specifying the graph IRI:
{
"@context": { "ex": "http://example.org/ns/" },
"from": {
"@id": "mydb:main",
"graph": "http://example.org/graphs/products"
},
"select": ["?name", "?price"],
"where": [
{ "@id": "?product", "ex:name": "?name" },
{ "@id": "?product", "ex:price": "?price" }
]
}
With time-travel:
{
"from": {
"@id": "mydb:main",
"t": 100,
"graph": "http://example.org/graphs/products"
},
"select": ["?name", "?price"],
"where": [...]
}
Combining multiple graphs (JSON-LD):
Query across the default graph and user-defined named graphs:
{
"@context": { "ex": "http://example.org/ns/" },
"from": "mydb:main",
"fromNamed": {
"products": {
"@id": "mydb:main",
"@graph": "http://example.org/graphs/products"
}
},
"select": ["?company", "?product", "?price"],
"where": [
{ "@id": "?company", "@type": "ex:Company" },
["graph", "products", { "@id": "?product", "ex:name": "?productName", "ex:price": "?price" }]
]
}
Notes:
- Named graphs are queryable after indexing completes
- The @graph field accepts the full graph IRI (no URL-encoding required)
- Time-travel is specified via the t, iso, or sha field in the object form
- Object keys in fromNamed serve as dataset-local aliases for use in GRAPH patterns
Graph Source Object Schema
fromNamed (named graphs) — preferred format
fromNamed is an object whose keys are dataset-local aliases. Each value has:
| Field | Type | Required | Description |
|---|---|---|---|
| @id | string | Yes | Ledger reference (e.g., mydb:main, mydb:main@t:100) |
| @graph | string | No | Graph selector: "default", "txn-meta", or full IRI |
| t | integer | No | Time-travel: specific transaction number |
| at | string | No | Time-travel: ISO-8601 timestamp or commit:<hash> |
| policy | object | No | Per-source policy override (see below) |
from (default graphs) — object syntax
When using object syntax for from, the following fields are available:
| Field | Type | Required | Description |
|---|---|---|---|
| @id | string | Yes | Ledger reference (e.g., mydb:main, mydb:main@t:100) |
| alias | string | No | Dataset-local alias for GRAPH pattern reference |
| graph | string | No | Graph selector: "default", "txn-meta", or full IRI |
| t | integer | No | Time-travel: specific transaction number |
| iso | string | No | Time-travel: ISO-8601 timestamp |
| commit_id | string | No | Time-travel: commit ContentId |
| policy | object | No | Per-source policy override (see below) |
Legacy format: The array format "from-named": [...] with "alias" and "graph" fields is still accepted for backward compatibility. The "fromNamed" object format is preferred.
Dataset-Local Aliases
Aliases provide short names for referencing graphs in query patterns. They are especially useful when:
- Same graph IRI exists in multiple ledgers - Use distinct aliases to disambiguate
- Complex IRIs - Use short aliases instead of repeating long IRIs
Example: Disambiguating same graph IRI across ledgers
{
"@context": { "ex": "http://example.org/ns/" },
"fromNamed": {
"salesProducts": {
"@id": "sales:main",
"@graph": "http://example.org/vocab#products"
},
"inventoryProducts": {
"@id": "inventory:main",
"@graph": "http://example.org/vocab#products"
}
},
"select": ["?g", "?sku", "?data"],
"where": [
["graph", "?g", { "@id": "?sku", "ex:data": "?data" }]
]
}
In this example, both ledgers have a graph with the same IRI (http://example.org/vocab#products). The aliases salesProducts and inventoryProducts (the object keys) allow you to reference them distinctly.
Validation Rules:
- Aliases must be unique across the entire dataset (both from and fromNamed)
- Aliases cannot collide with identifiers (the @id values)
- Duplicate aliases will cause an error
Graph Selector Values
The graph field accepts three types of values:
| Value | Meaning |
|---|---|
"default" | Explicitly select the ledger’s default graph |
"txn-meta" | Select the built-in transaction metadata graph (urn:fluree:{ledger_id}#txn-meta) |
"<full-iri>" | Select a user-defined named graph by its full IRI |
Note: If using #txn-meta fragment syntax in @id, do not also specify graph: "txn-meta". This is considered ambiguous and will return an error.
Per-Source Policy Override
Each graph source can have its own policy, enabling fine-grained access control where different graphs in the same query use different policies.
Policy object fields:
- identity: Identity IRI string
- policy-class: Policy class IRI or array of IRIs
- policy: Inline policy JSON
- policy-values: Policy parameter values
- default-allow: Boolean (default: false). Governs access when no policies match. Ignored (forced false) if identity is specified but has no subject node in the ledger.
Example:
{
"from": [
{
"@id": "public:main",
"policy": {
"default-allow": true
}
},
{
"@id": "sensitive:main",
"policy": {
"identity": "did:fluree:alice",
"policy-class": ["ex:EmployeePolicy"],
"default-allow": false
}
}
],
"select": ["?data"],
"where": [{ "@id": "?s", "ex:data": "?data" }]
}
Policy Precedence:
- Per-source policy takes precedence over global opts policy
- If a source has no policy field, the global policy (if any) applies
Multi-Ledger Queries
Query across different ledgers:
JSON-LD Query:
{
"@context": { "ex": "http://example.org/ns/" },
"from": ["customers:main", "orders:main"],
"select": ["?customer", "?order"],
"where": [
{ "@id": "?customer", "ex:name": "Alice" },
{ "@id": "?order", "ex:customer": "?customer" }
]
}
SPARQL:
PREFIX ex: <http://example.org/ns/>
SELECT ?customer ?order
FROM <customers:main>
FROM <orders:main>
WHERE {
?customer ex:name "Alice" .
?order ex:customer ?customer .
}
Time-Aware Datasets
Query graphs at different time points:
JSON-LD Query:
{
"@context": { "ex": "http://example.org/ns/" },
"from": ["ledger1:main@t:100", "ledger2:main@t:200"],
"select": ["?data"],
"where": [
{ "@id": "?entity", "ex:data": "?data" }
]
}
SPARQL:
PREFIX ex: <http://example.org/ns/>
SELECT ?data
FROM <ledger1:main@t:100>
FROM <ledger2:main@t:200>
WHERE {
?entity ex:data ?data .
}
Graph Patterns
Default Graph Only
Query only the default graph:
SPARQL:
SELECT ?name
FROM <mydb:main>
WHERE {
?person ex:name ?name .
# Matches triples in default graph only
}
Named Graph Only
Query only named graphs:
SPARQL:
SELECT ?name
FROM NAMED <mydb:main>
WHERE {
GRAPH <mydb:main> {
?person ex:name ?name .
}
}
Mixed Patterns
Combine default and named graph patterns:
SPARQL:
PREFIX f: <https://ns.flur.ee/db#>
PREFIX ex: <http://example.org/ns/>
SELECT ?name ?commit ?t
FROM <mydb:main>
FROM NAMED <mydb:main#txn-meta>
WHERE {
?person ex:name ?name .
GRAPH <mydb:main#txn-meta> {
?commit f:t ?t .
}
}
Use Cases
Data Integration
Combine data from multiple sources:
{
"@context": { "ex": "http://example.org/ns/" },
"from": ["customers:main", "products:main", "orders:main"],
"select": ["?customer", "?product", "?order"],
"where": [
{ "@id": "?customer", "ex:name": "Alice" },
{ "@id": "?order", "ex:customer": "?customer" },
{ "@id": "?order", "ex:product": "?product" }
]
}
Cross-Ledger Joins
Join data across different ledgers:
PREFIX ex: <http://example.org/ns/>
SELECT ?customer ?order ?product
FROM <customers:main>
FROM <orders:main>
FROM <products:main>
WHERE {
?customer ex:name "Alice" .
?order ex:customer ?customer .
?order ex:product ?product .
}
SERVICE for Cross-Ledger Queries
Use SPARQL SERVICE to explicitly target specific ledgers within a query:
PREFIX ex: <http://example.org/ns/>
SELECT ?customerName ?productName ?quantity
FROM <customers:main>
FROM NAMED <orders:main>
FROM NAMED <products:main>
WHERE {
# Get customer from default graph
?customer ex:name ?customerName .
# Get orders from orders ledger
SERVICE <fluree:ledger:orders:main> {
?order ex:customer ?customer ;
ex:product ?product ;
ex:quantity ?quantity .
}
# Get product details from products ledger
SERVICE <fluree:ledger:products:main> {
?product ex:name ?productName .
}
}
SERVICE provides explicit control over which ledger each pattern executes against, enabling complex cross-ledger joins with clear data provenance.
See SPARQL Service Queries for full documentation.
Time-Consistent Queries
Query multiple ledgers at the same point in time:
{
"@context": { "ex": "http://example.org/ns/" },
"from": [
"products:main@t:1000",
"inventory:main@t:1000",
"pricing:main@t:1000"
],
"select": ["?product", "?stock", "?price"],
"where": [
{ "@id": "?product", "ex:stockLevel": "?stock" },
{ "@id": "?product", "ex:price": "?price" }
]
}
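The SPARQL equivalent pins each FROM clause to the same transaction number:
PREFIX ex: <http://example.org/ns/>
SELECT ?product ?stock ?price
FROM <products:main@t:1000>
FROM <inventory:main@t:1000>
FROM <pricing:main@t:1000>
WHERE {
  ?product ex:stockLevel ?stock .
  ?product ex:price ?price .
}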
Error Handling
Common Dataset Errors
| Error | Cause | Resolution |
|---|---|---|
| Duplicate alias | Same alias used twice in dataset spec | Use unique aliases for each source |
| Alias collision | alias matches an existing @id | Choose a different alias name |
| Ambiguous graph selector | Both #txn-meta fragment AND graph field specified | Use only one method |
| Unknown ledger | Ledger reference not found | Verify ledger exists and is accessible |
| Unknown graph IRI | Graph IRI not found in ledger | Verify graph was ingested and indexed |
| Binary index required | Named graph query requires binary index | Ensure ledger has been indexed |
Example Error Messages
Duplicate alias:
{
"error": "Duplicate dataset-local alias: 'products' appears multiple times"
}
Ambiguous graph selector:
{
"error": "Ambiguous graph selector: cannot specify both #txn-meta fragment and graph field"
}
SPARQL Execution Modes
Fluree supports two SPARQL execution modes:
Ledger-Bound Mode
When a query targets a single ledger (via endpoint or single FROM clause), GRAPH patterns reference named graphs within that ledger:
# Ledger-bound: GRAPH references graphs inside mydb:main
PREFIX ex: <http://example.org/ns/>
SELECT ?name ?price
FROM <mydb:main>
WHERE {
GRAPH <http://example.org/graphs/products> {
?product ex:name ?name ;
ex:price ?price .
}
}
Connection-Bound Mode
When querying across multiple ledgers, use SERVICE to select which ledger each pattern executes against:
# Connection-bound: SERVICE selects the target ledger
PREFIX ex: <http://example.org/ns/>
SELECT ?name ?stock
WHERE {
SERVICE <fluree:ledger:sales:main> {
GRAPH <http://example.org/graphs/products> {
?product ex:name ?name .
}
}
SERVICE <fluree:ledger:inventory:main> {
?product ex:stock ?stock .
}
}
When to use each mode:
- Ledger-bound: Single ledger queries, standard SPARQL datasets within one ledger
- Connection-bound: Multi-ledger queries, explicit control over data provenance
Best Practices
- Consistent Time Points: Use the same time specifier for all graphs in a query
- Graph Selection: Use FROM NAMED when you need to identify the source graph
- Use Aliases: Create meaningful aliases for complex graph IRIs or disambiguation
- Performance: Queries across multiple ledgers may be slower
- Data Locality: Consider data locality when designing multi-ledger queries
- Policy Granularity: Use per-source policy when different graphs need different access control
Related Documentation
- JSON-LD Query: JSON-LD Query syntax
- SPARQL: SPARQL syntax
- Time Travel: Historical queries
- Datasets and Named Graphs: Concept documentation
CONSTRUCT Queries
CONSTRUCT queries generate RDF graphs from query results, enabling you to transform and reshape data into new graph structures.
Overview
CONSTRUCT queries return RDF graphs instead of variable bindings. They’re useful for:
- Extracting subgraphs
- Transforming data structures
- Creating new graph views
- Generating RDF for export
Basic CONSTRUCT
SPARQL CONSTRUCT
PREFIX ex: <http://example.org/ns/>
CONSTRUCT {
?person ex:displayName ?name .
}
WHERE {
?person ex:name ?name .
}
This generates a new graph with ex:displayName properties from ex:name values.
Multiple Triples
Construct multiple triples per solution:
PREFIX ex: <http://example.org/ns/>
CONSTRUCT {
?person ex:displayName ?name .
?person ex:hasAge ?age .
}
WHERE {
?person ex:name ?name .
?person ex:age ?age .
}
Complex Patterns
Conditional Construction
Use filters to conditionally construct triples:
PREFIX ex: <http://example.org/ns/>
CONSTRUCT {
?person ex:status ex:Adult .
}
WHERE {
?person ex:age ?age .
FILTER (?age >= 18)
}
Transitive Relationships
Construct inferred relationships:
PREFIX ex: <http://example.org/ns/>
CONSTRUCT {
?person ex:knows ?friendOfFriend .
}
WHERE {
?person ex:friend ?friend .
?friend ex:friend ?friendOfFriend .
}
CONSTRUCT with Aggregation
Construct triples from aggregated data:
PREFIX ex: <http://example.org/ns/>
CONSTRUCT {
?category ex:productCount ?count .
}
WHERE {
{
SELECT ?category (COUNT(?product) AS ?count)
WHERE {
?product ex:category ?category .
}
GROUP BY ?category
}
}
Use Cases
Extract Subgraph
Extract a subgraph for a specific entity:
PREFIX ex: <http://example.org/ns/>
CONSTRUCT {
?s ?p ?o .
}
WHERE {
ex:alice ?p ?o .
BIND (ex:alice AS ?s)
}
Transform Data Structure
Transform data into a different structure:
PREFIX ex: <http://example.org/ns/>
CONSTRUCT {
?order ex:hasItem [
ex:product ?product ;
ex:quantity ?quantity
] .
}
WHERE {
?order ex:item ?item .
?item ex:product ?product .
?item ex:quantity ?quantity .
}
Generate Inferred Facts
Generate inferred relationships:
PREFIX ex: <http://example.org/ns/>
CONSTRUCT {
?person ex:ancestor ?ancestor .
}
WHERE {
?person ex:parent+ ?ancestor .
}
Best Practices
- Specific Patterns: Construct specific patterns rather than wildcards
- Filter Early: Apply filters in WHERE clause, not CONSTRUCT
- Avoid Duplicates: Use DISTINCT if needed
- Performance: CONSTRUCT can be expensive for large result sets
Related Documentation
- SPARQL: SPARQL query language
- JSON-LD Query: JSON-LD Query language
- Output Formats: Result formats
Graph Crawl
Graph crawl enables recursive traversal of relationships — following links between entities to discover connected data. This is built on property paths, which provide operators for transitive, inverse, and multi-predicate traversal.
Overview
Graph crawl queries traverse relationships in the graph, following links from one entity to another. Common use cases:
- Social networks — Find friends-of-friends, influence chains
- Organizational hierarchies — Traverse reporting structures
- Knowledge graphs — Follow related concepts across multiple hops
- Dependency graphs — Trace transitive dependencies
- Bill of materials — Recursive part containment
Property path operators
Property paths are the foundation of graph crawl. They let you follow relationships beyond a single hop.
| Operator | Syntax | Description | Example |
|---|---|---|---|
| One or more (+) | ex:knows+ | Follow 1+ times (transitive closure) | Friends of friends |
| Zero or more (*) | ex:knows* | Follow 0+ times (includes self) | Self and all reachable |
| Inverse (^) | ^ex:reportsTo | Follow in reverse direction | Who reports to me? |
| Alternative (\|) | ex:knows\|ex:colleague | Match any of several predicates | Social or professional connections |
| Sequence (/) | ex:knows/ex:name | Chain of predicates | Names of friends |
JSON-LD Query syntax
Property paths are defined using @path in the @context:
{
"@context": {
"ex": "http://example.org/",
"allReports": { "@path": "^ex:reportsTo+" }
},
"select": ["?name"],
"where": [
{ "@id": "ex:ceo", "allReports": "?person" },
{ "@id": "?person", "ex:name": "?name" }
]
}
Two syntax forms are available:
String form (SPARQL-style):
"knowsTransitive": { "@path": "ex:knows+" }
Array form (S-expression):
"knowsTransitive": { "@path": ["+", "ex:knows"] }
SPARQL syntax
Property paths are native SPARQL syntax:
PREFIX ex: <http://example.org/>
# All people reachable through ex:knows (1+ hops)
SELECT ?person WHERE {
ex:alice ex:knows+ ?person .
}
Patterns
Friend-of-friend network
Find everyone Alice knows, directly or transitively:
SPARQL:
PREFIX ex: <http://example.org/>
PREFIX schema: <http://schema.org/>
SELECT ?name WHERE {
ex:alice ex:knows+ ?person .
?person schema:name ?name .
}
JSON-LD:
{
"@context": {
"ex": "http://example.org/",
"schema": "http://schema.org/",
"knowsTransitive": { "@path": "ex:knows+" }
},
"select": ["?name"],
"where": [
{ "@id": "ex:alice", "knowsTransitive": "?person" },
{ "@id": "?person", "schema:name": "?name" }
]
}
Organizational hierarchy
Find all people who report to a manager (at any level):
PREFIX ex: <http://example.org/>
PREFIX schema: <http://schema.org/>
SELECT ?name WHERE {
?person ex:reportsTo+ ex:vp-engineering .
?person schema:name ?name .
}
Or use inverse path to start from the top:
SELECT ?name WHERE {
ex:vp-engineering ^ex:reportsTo+ ?person .
?person schema:name ?name .
}
Class hierarchy (RDFS)
Find all subclasses of a class:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/>
SELECT ?subclass WHERE {
  ?subclass rdfs:subClassOf+ ex:Vehicle .
}
Path chaining (sequence)
Follow a chain of different predicates:
PREFIX ex: <http://example.org/>
PREFIX schema: <http://schema.org/>
# Names of Alice's friends' managers
SELECT ?managerName WHERE {
ex:alice ex:knows/ex:reportsTo ?manager .
?manager schema:name ?managerName .
}
In JSON-LD:
{
"@context": {
"ex": "http://example.org/",
"schema": "http://schema.org/",
"friendManager": { "@path": "ex:knows/ex:reportsTo" }
},
"select": ["?managerName"],
"where": [
{ "@id": "ex:alice", "friendManager": "?manager" },
{ "@id": "?manager", "schema:name": "?managerName" }
]
}
Multi-relationship traversal
Follow any of several relationship types:
PREFIX ex: <http://example.org/>
PREFIX schema: <http://schema.org/>
# People connected by friendship OR professional relationship
SELECT ?name WHERE {
ex:alice (ex:knows|ex:colleague)+ ?person .
?person schema:name ?name .
}
Self-inclusive traversal (zero or more)
Use * to include the starting node:
PREFIX ex: <http://example.org/>
PREFIX schema: <http://schema.org/>
# Alice and everyone she knows (transitively)
SELECT ?name WHERE {
ex:alice ex:knows* ?person .
?person schema:name ?name .
}
With *, Alice herself is included in results (zero hops). With +, only her connections are returned.
Inverse relationships
Find who links to a given entity:
PREFIX ex: <http://example.org/>
PREFIX schema: <http://schema.org/>
# Who has Alice as a friend?
SELECT ?name WHERE {
?person ex:knows ex:alice .
?person schema:name ?name .
}
# Same thing using inverse path syntax
SELECT ?name WHERE {
ex:alice ^ex:knows ?person .
?person schema:name ?name .
}
Inverse paths are especially useful in transitive queries:
PREFIX ex: <http://example.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# All ancestors in a taxonomy
SELECT ?ancestor WHERE {
  ex:goldRetriever rdfs:subClassOf+ ?ancestor .
}
# All descendants (inverse)
SELECT ?descendant WHERE {
  ex:animal ^rdfs:subClassOf+ ?descendant .
}
Performance considerations
Property path cost
| Operator | Cost | Notes |
|---|---|---|
| Simple predicate | O(log n) | Single index lookup |
| Sequence (/) | O(k * log n) | k joins, each indexed |
| One-or-more (+) | O(reachable * log n) | Breadth-first expansion |
| Zero-or-more (*) | O(reachable * log n) | Same as + plus start node |
| Alternative (\|) | O(sum of alternatives) | Each alternative evaluated |
| Inverse (^) | O(log n) | Uses OPST index |
Transitive operators (+, *) expand breadth-first and track visited nodes to detect cycles. The cost is proportional to the number of reachable nodes, not the total graph size.
Optimizing traversals
- Start from the specific side — If you know one endpoint, start there. ex:alice ex:knows+ ?person is faster than ?person ex:knows+ ex:alice because it anchors the traversal.
- Add filters after traversal — Filter the results of a traversal rather than trying to filter during:
  SELECT ?name WHERE { ex:alice ex:knows+ ?person . ?person schema:name ?name . ?person ex:department "Engineering" . }
- Use + over * when possible — * includes the start node and typically has one more step to evaluate.
- Prefer sequence over transitive for known depth — If you know the relationship is exactly 2 hops, use a sequence (ex:a/ex:b) or two explicit patterns instead of ex:a+.
- Combine with LIMIT — For exploration, limit results to avoid materializing the full reachable set:
  SELECT ?person WHERE { ex:alice ex:knows+ ?person . } LIMIT 100
Cycle handling
Fluree’s property path engine tracks visited nodes during transitive expansion. If a cycle is encountered (e.g., A knows B knows C knows A), the traversal stops at the already-visited node. This prevents infinite loops without requiring user intervention.
Property paths vs. explicit patterns
For fixed-depth queries, explicit patterns are equivalent and sometimes clearer:
Property path (2 hops):
SELECT ?fof WHERE {
ex:alice ex:knows/ex:knows ?fof .
}
Explicit patterns (2 hops):
SELECT ?fof WHERE {
ex:alice ex:knows ?friend .
?friend ex:knows ?fof .
}
Both produce the same results. Use property paths when:
- The depth is variable or unknown (transitive closure)
- You want compact syntax for chains
- You need alternative or inverse traversal
Use explicit patterns when:
- The depth is fixed and small
- You need to bind intermediate variables (e.g., ?friend above)
- You want maximum clarity
Related documentation
- JSON-LD Query — Property Paths — Full JSON-LD property path syntax
- SPARQL — Property Paths — SPARQL property path reference
- Datasets — Multi-graph traversal
- Explain Plans — Understand query execution
Explain Plans
Explain plans provide insight into how the query planner reorders WHERE-clause patterns, helping you understand optimization decisions and diagnose performance issues.
Overview
Explain plans show:
- Whether patterns were reordered and why
- Whether database statistics were available for optimization
- The cardinality category and cost estimate assigned to each pattern
- The original vs. optimized pattern order
- Execution strategy hints for special fast paths such as fused property-join stars
Requesting Explain Plans
JSON-LD Query
Use the /fluree/explain endpoint (or the CLI fluree query --explain ...) to get a plan without executing.
For JSON-LD, the explain request body is the same as a normal JSON-LD query body.
{
"@context": { "ex": "http://example.org/ns/" },
"select": ["?name"],
"where": [
{ "@id": "?person", "ex:name": "?name" }
],
"from": "mydb:main"
}
SPARQL
Use the explain endpoint with SPARQL content type:
PREFIX ex: <http://example.org/ns/>
SELECT ?name
WHERE {
?person ex:name ?name .
}
How the Query Planner Works
The query planner reorders WHERE-clause patterns to minimize the number of intermediate rows flowing through the execution pipeline. It uses a greedy algorithm that places patterns one at a time, choosing the cheapest eligible pattern at each step.
Pattern Categories
Every pattern is classified into one of four cardinality categories:
| Category | Meaning | Patterns |
|---|---|---|
| Source | Produces rows (estimated row count) | Triple, VALUES, UNION, Subquery, IndexSearch, VectorSearch, GeoSearch, S2Search, Graph, PropertyPath, R2rml, Service |
| Reducer | Shrinks the stream (multiplier < 1.0) | MINUS, EXISTS, NOT EXISTS |
| Expander | Grows the stream (multiplier >= 1.0) | OPTIONAL |
| Deferred | No cardinality effect | FILTER, BIND |
Placement Priority
The greedy loop places patterns in this priority order:
- Eligible reducers (lowest multiplier first) — shrink the stream as early as possible.
- Sources (lowest row count first, preferring patterns that join on already-bound variables) — most selective first.
- Eligible expanders (lowest multiplier first) — defer row expansion until prerequisite variables are bound.
A reducer or expander is “eligible” when at least one of its variables is already bound by a previously placed pattern.
FILTER and BIND patterns are integrated into the greedy loop: after each
source, reducer, or expander is placed, any deferred patterns whose input
variables are now satisfied are drained in original-position order. For BIND
patterns, only the expression’s input variables must be bound — the target
variable is an output that feeds back into bound_vars, potentially enabling
further deferred patterns to be placed immediately (cascading placement).
Compound Pattern Nesting
When a deferred pattern (FILTER or BIND) becomes ready and the last placed
pattern is a compound pattern (UNION, Graph, or Service), the planner nests
the deferred pattern into the compound pattern’s inner lists instead of
appending it after. This enables the deferred pattern to participate in the
compound pattern’s inner reorder_patterns pipeline, unlocking:
- Optimal placement after the specific triple that binds its variable
- Range-safe filter pushdown to index scans
- Inline evaluation during joins
Nesting occurs only when the compound pattern guarantees the deferred pattern’s required variable is bound:
| Compound | Nest? | Guarantee |
|---|---|---|
| UNION | Yes | Variable must appear in the intersection of all branches |
| Graph | Yes | Variable is in inner patterns or is the graph name variable |
| Service | Yes | Variable is in inner patterns or is the endpoint variable |
| OPTIONAL | No | Left-join: inner vars may be Unbound |
| MINUS | No | Anti-join: inner vars not exported to outer scope |
| EXISTS | No | Filter-only: inner vars not exported |
| NOT EXISTS | No | Filter-only: inner vars not exported |
For UNION, the deferred pattern is cloned into every branch. For Graph and
Service, it is appended to the inner pattern list. Recursion is handled
naturally: when a nested filter lands inside a branch containing another
compound pattern, the branch’s reorder_patterns call applies the same logic.
Bound-Variable-Aware Estimation
The planner tracks which variables become bound as each pattern is placed. This significantly affects estimates for subsequent patterns:
- A triple ?s :name ?name with ?s unbound is a property scan — estimated at the full property count (or a 1000-row fallback).
- The same triple with ?s already bound from an earlier pattern is a per-subject lookup — estimated at count / ndv_subjects (typically ~10 rows).
This context-aware scoring also applies inside compound patterns: UNION branches and subqueries receive database statistics and use the same selectivity model for their inner patterns.
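As a minimal sketch of how binding order changes these estimates (the :email lookup and the default prefix are hypothetical):
PREFIX : <http://example.org/ns/>
SELECT ?name WHERE {
  # Highly selective pattern: binds ?s first
  ?s :email "alice@example.org" .
  # With ?s bound, this is a per-subject lookup (~count / ndv_subjects rows)
  # rather than a full property scan
  ?s :name ?name .
}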
Statistics-Based vs. Fallback Scoring
When a StatsView is available (after at least one indexing cycle), the
planner uses HLL-derived property statistics:
- count: total number of triples for this predicate
- ndv_subjects: number of distinct subjects
- ndv_values: number of distinct objects
Without statistics, the planner falls back to heuristic constants:
| Pattern Type | Fallback Estimate |
|---|---|
| ExactMatch | 1 |
| BoundSubject | 10 |
| BoundObject | 1,000 |
| PropertyScan | 1,000 |
| FullScan | 1e12 |
Search and Graph Source Estimates
Search patterns (IndexSearch, VectorSearch, GeoSearch, S2Search) use their
limit field when present. Without an explicit limit, the planner assumes a
default of 100 rows. Graph patterns recursively estimate their inner
patterns. Service patterns use a very high estimate (1e12) so they are placed
last among sources, minimizing data sent to the remote endpoint.
Reading Explain Output
The explain plan shows two sections: the original pattern order and the optimized order. Each pattern is annotated with its category and estimate.
Example output for a multi-pattern query:
=== Query Optimization Explain (Generalized) ===
Statistics available: yes
Optimization: patterns reordered
--- Original Pattern Order ---
[1] ?s :age ?age | category=Source row_count=5000
[2] ?s :name ?name | category=Source row_count=10000
[3] FILTER((> ?age 25)) | category=Deferred
[4] OPTIONAL { ?s :email ?email } | category=Expander multiplier=1.00
--- Optimized Pattern Order ---
[1] ?s :age ?age | category=Source row_count=5000
[2] FILTER((> ?age 25)) | category=Deferred
[3] ?s :name ?name | category=Source row_count=10000
[4] OPTIONAL { ?s :email ?email } | category=Expander multiplier=1.00
Key things to look for:
- Source row_count: Lower values are placed first. If a pattern with a high row count appears early, it may indicate missing statistics or an inherently broad pattern.
- Reducer multiplier: Values below 1.0 indicate the fraction of rows that survive. A MINUS with multiplier 0.90 removes ~10% of rows.
- Deferred placement: FILTERs and BINDs appear immediately after all of their input variables become bound. BIND outputs cascade — a BIND placed early can enable subsequent FILTERs or BINDs that depend on its target variable. If a FILTER appears late, check whether its variables could be bound sooner.
- Statistics available: no: Without statistics, the planner uses conservative heuristics. Run at least one indexing cycle to enable statistics-based optimization.
Execution Hints
Explain responses may also include an execution-hints array. These are not
generic cardinality estimates; they describe when the executor expects to use a
specialized path after planning.
For the star-join work, look for:
- property_join: the planner chose the same-subject property-join path
- property_join_fused_star: the planner chose property join and also fused trailing same-subject single-triple OPTIONALs plus eligible trailing FILTER/BIND patterns into the same star operator
Typical fields include:
- required_triples: number of required star predicates
- fused_optional_triples: number of fused trailing OPTIONAL triples
- fused_filters: number of trailing filters evaluated inside the star path
- fused_binds: number of trailing binds evaluated inside the star path
- width_score: weighted star width used by the property-join gate
- optional_bonus: how much of the width score came from trailing optionals
This is the clearest signal that a query like:
?deal a crm:Deal ;
crm:name ?name ;
crm:amount ?amount ;
crm:stage ?stage .
OPTIONAL { ?deal crm:probability ?probability }
OPTIONAL { ?deal crm:closedAt ?closedAt }
FILTER (!STRSTARTS(STR(?stage), "Closed"))
is using the fused two-pass star path rather than falling back to separate OPTIONAL and FILTER operators.
Indexes
Scan operations use one of four index permutations depending on which components of the triple pattern are bound:
- SPOT: Subject-Predicate-Object-Time — used when the subject is bound
- POST: Predicate-Object-Subject-Time — used for predicate+object lookups
- OPST: Object-Predicate-Subject-Time — used for object-based lookups
- PSOT: Predicate-Subject-Object-Time — used for full predicate scans
Filter Optimization
Filters are automatically optimized by the query engine in four ways:
- Dependency-based placement: Filters and BINDs are placed as soon as all their input variables are bound, as part of the greedy reordering loop. BIND target variables feed back into the bound set, enabling cascading placement of dependent patterns.
- Index pushdown: Range-safe filters (comparisons like >, <, >=, <= on indexed properties) are pushed down to the index scan, reducing the number of rows read (see the example after this list).
- BIND filter fusion: When a FILTER’s last required variable is the output of a BIND, the filter is fused into the BindOperator and evaluated inline after computing each row’s BIND value. Failing rows are dropped before materialization, eliminating a separate FilterOperator pass.
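For example, the comparison below is range-safe, so it is eligible for pushdown into the index scan on ex:age and rows failing the test need not flow through the pipeline (the property names are illustrative):
PREFIX ex: <http://example.org/ns/>
SELECT ?name ?age
WHERE {
  ?person ex:age ?age .
  FILTER(?age > 25)
  ?person ex:name ?name .
}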
Best Practices
- Review plans for new queries: Use explain to verify that the planner chose a reasonable order, especially for queries with many patterns.
- Ensure statistics are available: Statistics enable much better estimates. If explain shows “Statistics available: no”, check that at least one indexing cycle has completed.
- Check for high row counts early in the plan: A source with a very high row count placed first can indicate a missing join variable or an overly broad pattern.
- Use LIMIT on search patterns: IndexSearch, VectorSearch, GeoSearch, and S2Search patterns use their limit field for cost estimation. Providing an explicit limit helps the planner place them more accurately.
Related Documentation
- JSON-LD Query: JSON-LD Query syntax
- SPARQL: SPARQL syntax
- Indexing and Search: Index details
- Debugging Queries: Troubleshooting guide
Tracking and Fuel Limits
Fluree provides query tracking and fuel limits to monitor and control query execution, ensuring system stability and performance.
Query Tracking
Query tracking provides visibility into query execution, helping you understand query behavior and performance.
Enable Tracking
Enable tracking via the opts object. Use "meta": true to enable all tracking, or selectively enable specific metrics:
{
"@context": { "ex": "http://example.org/ns/" },
"select": ["?name"],
"where": [
{ "@id": "?person", "ex:name": "?name" }
],
"opts": { "meta": true }
}
Or enable specific metrics:
{
"opts": {
"meta": {
"time": true,
"fuel": true,
"policy": true
}
}
}
Tracked Information
Tracking provides:
- time: Query execution duration (formatted as “12.34ms”)
- fuel: Total cost as a decimal value (rounded to 3 places)
- policy: Policy evaluation statistics ({policy-id: {executed: N, allowed: M}})
Fuel Limits
Fuel limits control resource consumption, preventing runaway queries from consuming excessive resources.
What Is Fuel?
Fuel is a decimal measure of query/transaction cost. Internally it is accumulated as micro-fuel (1 fuel = 1000 micro-fuel) and reported back rounded to 3 decimal places. Costs reflect actual work — primarily I/O — rather than output cardinality.
Cost ladder (per event):
| Event | Cost (fuel) |
|---|---|
| Index leaflet touched (per scan batch, regardless of cache state) | 1.000 |
| Forward-dict touch (per dict-backed value resolved during result materialization) | 1.000 |
| Flake returned from a db.range call (e.g. SHACL graph reads, graph crawl) | 0.001 |
| Overlay/novelty row materialized | 0.001 |
| R2RML row emitted (Iceberg/Parquet) | 0.001 |
| Transaction commit baseline (once per commit) | 100.000 |
| Staged flake (per non-schema flake in a transaction) | 0.001 |
| REGEX / REPLACE evaluation | 0.001 |
| Hash function (MD5, SHA1, SHA256, SHA384, SHA512) | 0.001 |
| UUID / STRUUID | 0.001 |
| geof:distance | 0.001 |
| Vector similarity (DotProduct, CosineSimilarity, EuclideanDistance) | 0.002 |
| Fulltext (per-row BM25 scoring) | 0.005 |
Cheap operations (comparisons, arithmetic, type checks, simple string ops, datetime extraction, etc.) cost zero — instrumentation overhead would dwarf the actual cost.
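As a rough worked example of how the ladder adds up (numbers purely illustrative): a query that touches 10 index leaflets, resolves 50 dictionary-backed values, and materializes 200 overlay rows costs:
10 leaflets × 1.000              = 10.000
50 forward-dict touches × 1.000  = 50.000
200 overlay rows × 0.001         =  0.200
total                            = 60.200 fuel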
Setting Fuel Limits
Set fuel limits via opts.max-fuel (decimal allowed). Setting a fuel limit implicitly enables fuel tracking:
{
"@context": { "ex": "http://example.org/ns/" },
"select": ["?name"],
"where": [
{ "@id": "?person", "ex:name": "?name" }
],
"opts": { "max-fuel": 10000 }
}
You can also use "maxFuel" or "max_fuel" as alternative key names. The HTTP equivalent is the fluree-max-fuel header.
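For example, the header form on a transaction (a sketch reusing the update endpoint and body shown later on this page):
curl -X POST "http://localhost:8090/v1/fluree/update?ledger=mydb:main" \
  -H "Content-Type: application/json" \
  -H "fluree-max-fuel: 10000" \
  -d '{
    "@context": { "ex": "http://example.org/ns/" },
    "@graph": [{ "@id": "ex:alice", "ex:name": "Alice" }]
  }'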
Fuel Limit Behavior
When fuel limit is exceeded:
- Query execution stops
- Error returned to client
- Partial results not returned
Response Format
When tracking is enabled, the response includes tracking information as top-level siblings:
{
"status": 200,
"result": [...],
"time": "12.34ms",
"fuel": 42.317,
"policy": {
"http://example.org/myPolicy": {
"executed": 10,
"allowed": 8
}
}
}
The fuel value is decimal with up to 3 places of precision. The HTTP x-fdb-fuel response header carries the same value.
Best Practices
Tracking
- Enable for Debugging: Use "opts": {"meta": true} to debug slow queries
- Monitor Performance: Track query performance over time
- Identify Bottlenecks: Use tracking to identify performance bottlenecks
Fuel Limits
- Set Appropriate Limits: Set fuel limits based on expected query complexity
- Monitor Fuel Usage: Track fuel usage to optimize queries
- Prevent Runaway Queries: Use fuel limits to prevent resource exhaustion
Related Documentation
- JSON-LD Query: JSON-LD Query syntax
- SPARQL: SPARQL syntax
- Explain Plans: Query execution plans
Query-Time Reasoning
This page covers how to enable and use reasoning in your queries. For background concepts see Reasoning and inference; for the full list of supported OWL/RDFS constructs see the OWL & RDFS reference.
The reasoning parameter
Add a "reasoning" key to any JSON-LD query to control which inference modes
are active:
Single mode
{
"@context": {"ex": "http://example.org/"},
"select": ["?s"],
"where": {"@id": "?s", "@type": "ex:Person"},
"reasoning": "rdfs"
}
Multiple modes
{
"select": ["?s"],
"where": {"@id": "?s", "@type": "ex:Person"},
"reasoning": ["rdfs", "owl2rl"]
}
Disable reasoning
{
"select": ["?s"],
"where": {"@id": "?s", "@type": "ex:Person"},
"reasoning": "none"
}
Use "none" to suppress auto-enabled RDFS and any ledger-wide defaults.
Valid mode strings
| String | Aliases | Mode |
|---|---|---|
"rdfs" | — | RDFS subclass/subproperty expansion |
"owl2ql" | "owl-ql", "owlql" | OWL 2 QL query rewriting (includes RDFS) |
"owl2rl" | "owl-rl", "owlrl" | OWL 2 RL forward-chaining materialization |
"datalog" | — | Datalog rule execution |
"none" | — | Disable all reasoning |
Default behavior
When the reasoning key is absent from a query:
- RDFS auto-enables if your data contains rdfs:subClassOf or rdfs:subPropertyOf hierarchies. This is lightweight (query rewriting only) and usually desirable.
- OWL 2 QL, OWL 2 RL, and Datalog remain disabled unless enabled via ledger-wide configuration.
To override ledger defaults for a single query, use "reasoning": "none".
Examples
The examples below assume this schema and data have been transacted:
{
"@context": {
"ex": "http://example.org/",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"owl": "http://www.w3.org/2002/07/owl#"
},
"insert": [
{"@id": "ex:Student", "rdfs:subClassOf": {"@id": "ex:Person"}},
{"@id": "ex:GradStudent", "rdfs:subClassOf": {"@id": "ex:Student"}},
{"@id": "ex:alice", "@type": "ex:GradStudent", "ex:name": "Alice"},
{"@id": "ex:bob", "@type": "ex:Person", "ex:name": "Bob"},
{"@id": "ex:livesWith", "@type": "owl:SymmetricProperty"},
{"@id": "ex:alice", "ex:livesWith": {"@id": "ex:bob"}},
{"@id": "ex:hasAncestor", "@type": "owl:TransitiveProperty"},
{"@id": "ex:carol", "ex:hasAncestor": {"@id": "ex:dave"}},
{"@id": "ex:dave", "ex:hasAncestor": {"@id": "ex:eve"}},
{"@id": "ex:hasMother", "owl:inverseOf": {"@id": "ex:childOf"}},
{"@id": "ex:frank", "ex:hasMother": {"@id": "ex:grace"}}
]
}
RDFS: subclass expansion
Query for all ex:Person instances — Alice is returned even though she was
only typed as ex:GradStudent:
{
"@context": {"ex": "http://example.org/"},
"select": ["?name"],
"where": {
"@id": "?s", "@type": "ex:Person",
"ex:name": "?name"
},
"reasoning": "rdfs"
}
Result: ["Alice", "Bob"]
Without reasoning (or with "reasoning": "none"), only "Bob" is returned
because Alice’s explicit type is GradStudent, not Person.
OWL 2 RL: symmetric properties
Query who lives with Bob — Alice is inferred even though only
alice livesWith bob was asserted:
{
"@context": {"ex": "http://example.org/"},
"select": ["?who"],
"where": {"@id": "ex:bob", "ex:livesWith": "?who"},
"reasoning": "owl2rl"
}
Result: ["ex:alice"]
OWL 2 RL: transitive properties
Query for all ancestors of Carol — Eve is inferred through transitivity:
{
"@context": {"ex": "http://example.org/"},
"select": ["?ancestor"],
"where": {"@id": "ex:carol", "ex:hasAncestor": "?ancestor"},
"reasoning": "owl2rl"
}
Result: ["ex:dave", "ex:eve"]
OWL 2 QL: inverse properties
Query childOf — inferred from the hasMother / inverseOf declaration:
{
"@context": {"ex": "http://example.org/"},
"select": ["?child"],
"where": {"@id": "ex:grace", "ex:childOf": "?child"},
"reasoning": "owl2ql"
}
Result: ["ex:frank"]
OWL 2 RL: domain and range inference
If your schema declares rdfs:domain and rdfs:range:
{
"insert": [
{"@id": "ex:teaches", "rdfs:domain": {"@id": "ex:Professor"},
"rdfs:range": {"@id": "ex:Course"}},
{"@id": "ex:alice", "ex:teaches": {"@id": "ex:cs101"}}
]
}
Then with "reasoning": "owl2rl":
- ex:alice rdf:type ex:Professor is inferred (from domain)
- ex:cs101 rdf:type ex:Course is inferred (from range)
Combined modes
Enable RDFS + OWL 2 RL + Datalog together:
{
"select": ["?s"],
"where": {"@id": "?s", "@type": "ex:Person"},
"reasoning": ["rdfs", "owl2rl", "datalog"],
"rules": [
{
"@context": {"ex": "http://example.org/"},
"where": {"@id": "?p", "ex:parent": {"ex:parent": "?gp"}},
"insert": {"@id": "?p", "ex:grandparent": {"@id": "?gp"}}
}
]
}
OWL 2 RL facts are materialized first, then Datalog rules run over the combined base + OWL data, and finally RDFS query rewriting is applied.
SPARQL
In SPARQL queries, reasoning is controlled via the Fluree-specific
PRAGMA reasoning directive. Property paths (+, *, ^) provide a
complementary mechanism for navigating transitive and inverse relationships
directly in the query pattern — see SPARQL for details.
Interaction with ledger configuration
If f:reasoningDefaults is set in the ledger configuration graph (see
Setting groups), those modes are the
baseline for every query. The per-query reasoning parameter can:
- Add modes — the query modes are merged with the defaults.
- Disable all — "reasoning": "none" overrides the defaults entirely.
The f:overrideControl setting on the ledger config determines whether
query-time overrides are allowed. See
Override control for details.
Performance considerations
| Mode | Overhead | Caching |
|---|---|---|
| RDFS | Negligible — query rewriting only | N/A |
| OWL 2 QL | Negligible — query rewriting only | N/A |
| OWL 2 RL | First query materializes derived facts; subsequent queries use cache | LRU cache (16 entries), keyed on database state + reasoning modes |
| Datalog | Each unique rule set + database state combination is cached | Same LRU cache as OWL 2 RL |
Tips:
- Start with RDFS if you only need class/property hierarchies — it has virtually zero overhead.
- Use OWL 2 QL when you also need inverse properties and domain/range inference but want to stay in the query-rewriting approach.
- Use OWL 2 RL when you need the full rule set (transitive, symmetric, functional properties, owl:sameAs, restrictions, property chains).
- The materialization cache is invalidated when the underlying data changes (new transactions), so the first query after a write will re-materialize.
Related pages
| Topic | Page |
|---|---|
| Conceptual introduction | Reasoning and inference |
| Custom inference rules | Datalog rules |
| Supported OWL & RDFS constructs | OWL & RDFS reference |
| Ledger-wide reasoning config | Setting groups |
Datalog Rules
Datalog rules let you define custom inference logic that goes beyond what
OWL and RDFS provide. Rules are expressed in a familiar JSON-LD pattern syntax
with where (conditions) and insert (conclusions) clauses, and execute in a
fixpoint loop that can chain rules together.
For background concepts see Reasoning and inference; for enabling reasoning in queries see Query-time reasoning.
Quick example
Infer a grandparent relationship from two parent hops:
{
"@context": {"ex": "http://example.org/"},
"select": ["?gp"],
"where": {"@id": "ex:alice", "ex:grandparent": "?gp"},
"reasoning": "datalog",
"rules": [
{
"@context": {"ex": "http://example.org/"},
"where": {"@id": "?person", "ex:parent": {"ex:parent": "?gp"}},
"insert": {"@id": "?person", "ex:grandparent": {"@id": "?gp"}}
}
]
}
The rule says: “For any ?person whose parent has a parent ?gp, insert
that ?person has a grandparent ?gp.” The query then finds Alice’s
grandparents using the inferred facts.
Rule format
Each rule is a JSON object with three parts:
| Key | Required | Description |
|---|---|---|
| @context | Yes | JSON-LD context for expanding compact IRIs |
| where | Yes | Pattern(s) that must match for the rule to fire |
| insert | Yes | Pattern(s) of new facts to derive when the rule fires |
| @id | No | Optional name/IRI for the rule (for documentation/debugging) |
Where clause
The where clause defines the conditions under which the rule fires. It
follows the same pattern syntax as JSON-LD queries.
Single pattern:
"where": {"@id": "?person", "ex:parent": "?parent"}
Multiple patterns (implicit join on shared variables):
"where": [
{"@id": "?person", "ex:parent": "?parent"},
{"@id": "?parent", "ex:name": "?parentName"}
]
Nested patterns (shorthand for multi-hop traversal):
"where": {"@id": "?person", "ex:parent": {"ex:parent": "?gp"}}
This is equivalent to two patterns joined on an intermediate variable.
With filters:
"where": [
{"@id": "?person", "ex:age": "?age"},
["filter", "(>= ?age 65)"]
]
Insert clause
The insert clause defines what facts to produce for each set of matching
variable bindings.
"insert": {"@id": "?person", "ex:grandparent": {"@id": "?gp"}}
- Variables (?person, ?gp) are replaced with the bound values from where.
- Use {"@id": "?var"} for IRI/entity values; use "?var" directly for literal values.
- Multiple triples can be generated from a single insert pattern.
Providing rules
Rules can be provided in two ways:
1. Query-time rules
Pass rules directly in the query via the rules array. This is the simplest
approach and doesn’t require any prior setup:
{
"select": ["?result"],
"where": {"@id": "?s", "ex:derived": "?result"},
"reasoning": "datalog",
"rules": [ ... ]
}
Note: Providing a rules array automatically enables datalog reasoning — you don’t strictly need "reasoning": "datalog", though including it is recommended for clarity.
2. Database-stored rules
Rules can be stored in the database as f:rule assertions and referenced via
ledger configuration. This is useful for rules that should apply consistently
across all queries.
Store a rule:
{
"@context": {
"f": "https://ns.flur.ee/db#",
"ex": "http://example.org/"
},
"insert": {
"@id": "ex:grandparentRule",
"f:rule": {
"@context": {"ex": "http://example.org/"},
"where": {"@id": "?person", "ex:parent": {"ex:parent": "?gp"}},
"insert": {"@id": "?person", "ex:grandparent": {"@id": "?gp"}}
}
}
}
Configure the ledger to use stored rules:
{
"insert": {
"@id": "urn:fluree:mydb:main:config:ledger",
"@type": "f:LedgerConfig",
"f:datalogDefaults": {
"f:datalogEnabled": true,
"f:rulesSource": {
"@type": "f:GraphRef",
"f:graphSource": {"f:graphSelector": {"@id": "f:defaultGraph"}}
},
"f:allowQueryTimeRules": true
}
}
}
See Setting groups — datalogDefaults for full configuration options.
When both stored and query-time rules are present, they are merged and execute together in the same fixpoint loop.
Examples
Sibling inference
Infer siblings from shared parents:
{
"@context": {"ex": "http://example.org/"},
"select": ["?sibling"],
"where": {"@id": "ex:alice", "ex:sibling": "?sibling"},
"reasoning": "datalog",
"rules": [
{
"@context": {"ex": "http://example.org/"},
"where": [
{"@id": "?person", "ex:parent": "?parent"},
{"@id": "?sibling", "ex:parent": "?parent"}
],
"insert": {"@id": "?person", "ex:sibling": {"@id": "?sibling"}}
}
]
}
Note: This rule will also infer that a person is their own sibling. You could add a filter ["filter", "(!= ?person ?sibling)"] to exclude self-references.
Chained rules (uncle + aunt)
Multiple rules that build on each other:
{
"@context": {"ex": "http://example.org/"},
"select": ["?aunt"],
"where": {"@id": "ex:alice", "ex:aunt": "?aunt"},
"reasoning": "datalog",
"rules": [
{
"@context": {"ex": "http://example.org/"},
"where": {"@id": "?person", "ex:parent": {"ex:brother": "?uncle"}},
"insert": {"@id": "?person", "ex:uncle": {"@id": "?uncle"}}
},
{
"@context": {"ex": "http://example.org/"},
"where": {
"@id": "?person",
"ex:uncle": {
"ex:spouse": {"@id": "?aunt", "ex:gender": {"@id": "ex:Female"}}
}
},
"insert": {"@id": "?person", "ex:aunt": {"@id": "?aunt"}}
}
]
}
The second rule (aunt) depends on facts derived by the first rule (uncle). The fixpoint loop handles this automatically — it keeps iterating until no new facts are produced.
Rules with filters
Classify people by age:
{
"@context": {"ex": "http://example.org/"},
"select": ["?person"],
"where": {"@id": "?person", "ex:status": "senior"},
"reasoning": "datalog",
"rules": [
{
"@context": {"ex": "http://example.org/"},
"where": [
{"@id": "?person", "ex:age": "?age"},
["filter", "(>= ?age 65)"]
],
"insert": {"@id": "?person", "ex:status": "senior"}
}
]
}
Combining with OWL reasoning
Datalog rules can build on OWL-derived facts. For example, use OWL 2 RL to materialize transitive and symmetric properties, then use Datalog for custom business logic:
{
"select": ["?recommendation"],
"where": {"@id": "ex:alice", "ex:recommended": "?recommendation"},
"reasoning": ["owl2rl", "datalog"],
"rules": [
{
"@context": {"ex": "http://example.org/"},
"where": [
{"@id": "?person", "ex:friend": "?friend"},
{"@id": "?friend", "ex:likes": "?item"},
{"@id": "?person", "ex:likes": "?item"}
],
"insert": {"@id": "?person", "ex:recommended": {"@id": "?item"}}
}
]
}
If ex:friend is declared as an owl:SymmetricProperty, OWL 2 RL
materializes the reverse friendship links, and then the Datalog rule can
find items liked by mutual friends.
Execution model
Fixpoint evaluation
Rules execute in a fixpoint loop:
- All rules are applied against the current data (base + previously derived facts).
- New facts produced in this iteration are collected.
- If any new facts were produced, go back to step 1 with the expanded fact set.
- When no new facts are produced (fixpoint reached), the loop terminates.
This means:
- Recursive rules work. A rule can produce facts that trigger itself again.
- Rule chaining works. Rule A can produce facts that trigger Rule B, and vice versa.
- Termination is guaranteed by the budget controls (max iterations, max facts, max time, max memory).
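For instance, with the grandparent rule from the quick example and base facts ex:alice ex:parent ex:bob and ex:bob ex:parent ex:carol, the loop runs like this:
Iteration 1: rule matches (?person = ex:alice, ?gp = ex:carol)
             → derives ex:alice ex:grandparent ex:carol
Iteration 2: rule re-runs over base + derived facts
             → no new facts produced
Fixpoint reached; the loop terminates after two iterations.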
Execution order
Rules are topologically sorted by their predicate dependencies: a rule that
generates ex:uncle triples runs before a rule that consumes ex:uncle in its
where clause. This minimizes the number of fixpoint iterations needed.
Interaction with OWL 2 RL
When both OWL 2 RL and Datalog are enabled:
- OWL 2 RL materialization runs first.
- Datalog rules run over the combined base data + OWL-derived facts.
- Both result sets are merged into a single overlay for query execution.
Filter expressions
Filters use S-expression syntax within the where array:
["filter", "(expression)"]
Available operators
| Category | Operators |
|---|---|
| Comparison | =, !=, <, >, <=, >= |
| Logical | and, or, not |
| Arithmetic | +, -, *, / |
| String | str, strlen, contains, strstarts, strends |
| Type checking | isIRI, isBlank, isLiteral, bound |
Examples
["filter", "(> ?age 21)"]
["filter", "(and (>= ?age 18) (< ?age 65))"]
["filter", "(contains ?name \"Smith\")"]
["filter", "(!= ?person ?other)"]
Performance considerations
- Keep rules focused. Broad rules that match many patterns produce more derived facts and require more iterations.
- Budget limits apply. The same time/fact/memory budgets as OWL 2 RL materialization apply to Datalog execution (default: 30s, 1M facts, 100MB).
- Results are cached. The same rule set + database state returns instantly from cache on subsequent queries.
- Query-time rules disable caching across queries with different rule sets, since the cache key includes a hash of the rules.
Related pages
| Topic | Page |
|---|---|
| Conceptual introduction | Reasoning and inference |
| Enabling reasoning in queries | Query-time reasoning |
| OWL & RDFS constructs | OWL & RDFS reference |
| Ledger-wide config | Setting groups |
Transactions
Transactions are how you write data to Fluree. This section covers all transaction patterns, formats, and behaviors.
Transaction Patterns
Overview
High-level introduction to Fluree transactions:
- Transaction lifecycle
- Commit process
- Indexing pipeline
- Transaction semantics
Insert
Adding new data to the database:
- Basic inserts
- Batch inserts
- Entity creation
- Relationship creation
Upsert
Idempotent transactions that replace values for supplied predicates:
- Upsert semantics
- Use cases for upsert
- Idempotent operations
- Synchronization patterns
Update (WHERE/DELETE/INSERT)
Targeted updates to existing data:
- WHERE clause patterns
- DELETE operations
- INSERT operations
- Conditional updates
- Partial updates
Retractions
Removing data from the database:
- Retract specific triples
- Retract entire entities
- Retraction semantics
- Time travel and retractions
Transaction Formats
Turtle Ingest
Import RDF data in Turtle format:
- Turtle syntax
- Bulk imports
- File uploads
- Format conversion
Signed / Credentialed Transactions
Cryptographically signed transactions:
- JWS signed transactions
- Verifiable Credentials
- Identity-based transactions
- Audit trails
Transaction Metadata
Commit Receipts and tx-id
Understanding transaction receipts:
- Receipt structure
- Transaction ID (t)
- Commit ID
- Timestamps
- Flake counts
Indexing Side-Effects
How transactions affect indexing:
- Background indexing
- Novelty layer
- Index triggers
- Performance considerations
Transaction Concepts
Immutability
Once committed, transactions are immutable:
- Changes are represented as new assertions and retractions
- Historical data is never modified
- Complete audit trail preserved
- Time travel enabled by immutability
Atomicity
Transactions are atomic:
- All changes succeed or all fail
- No partial commits
- Consistent state guaranteed
- Validation before commit
Transaction Time
Every transaction receives a unique transaction time:
- Monotonically increasing integer (t)
- Unique across all ledgers in instance
- Used for time travel queries
- Basis for temporal ordering
Assertions and Retractions
Transactions consist of two operations:
- Assertions: Add new triples
- Retractions: Remove existing triples
Updates are represented as retraction + assertion pairs.
Common Transaction Patterns
Create Entity
{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"@graph": [
{
"@id": "ex:alice",
"@type": "schema:Person",
"schema:name": "Alice",
"schema:email": "alice@example.org"
}
]
}
Update Property
{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"where": [
{ "@id": "ex:alice", "schema:age": "?oldAge" }
],
"delete": [
{ "@id": "ex:alice", "schema:age": "?oldAge" }
],
"insert": [
{ "@id": "ex:alice", "schema:age": 31 }
]
}
Add Relationship
{
"@graph": [
{
"@id": "ex:alice",
"schema:worksFor": { "@id": "ex:company-a" }
}
]
}
Remove Property
{
"delete": [
{ "@id": "ex:alice", "schema:telephone": "?phone" }
],
"where": [
{ "@id": "ex:alice", "schema:telephone": "?phone" }
]
}
Replace Entity (Upsert)
POST /upsert?ledger=mydb:main
{
"@graph": [
{
"@id": "ex:alice",
"@type": "schema:Person",
"schema:name": "Alice Smith",
"schema:email": "alice.smith@example.org"
}
]
}
Transaction Types
- Insert (POST /insert) — add triples (JSON-LD or Turtle)
- Update (POST /update) — WHERE/DELETE/INSERT (JSON-LD) or SPARQL UPDATE
- Upsert (POST /upsert) — replace values for the predicates you supply (JSON-LD, Turtle, TriG)
Transaction Validation
Before commit, transactions are validated:
Syntax Validation:
- Valid JSON/JSON-LD syntax
- Well-formed IRIs
- Correct datatype formats
Semantic Validation:
- Type compatibility
- Constraint adherence
- Reference integrity (optional)
Policy Validation:
- Authorization checks
- Access control enforcement
- Data-level permissions
Validation failures result in transaction rejection with detailed error messages.
Transaction Size Limits
Default Limits:
- Transaction size: 10 MB
- Triple count: 10,000 triples
- Configurable per deployment
Large Transactions:
- Split into batches for large imports
- Use streaming for bulk data
- Monitor indexing lag
See Indexing Side-Effects for performance considerations.
Error Handling
Transaction Errors
Common errors:
- PARSE_ERROR - Invalid JSON-LD
- INVALID_IRI - Malformed IRI
- TYPE_ERROR - Type mismatch
- CONSTRAINT_VIOLATION - Constraint violated
- POLICY_DENIED - Not authorized
Retry Logic
Implement retry for transient errors:
- Network errors: Retry with backoff
- Conflicts: Retry with updated data
- Timeouts: Retry after delay
- Server errors: Retry with backoff
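A minimal sketch of backoff-based retries (illustrative; it reuses the insert endpoint shown later on this page and retries only 5xx responses):
async function transactWithRetry(payload, attempts = 3) {
  for (let i = 0; i < attempts; i++) {
    const res = await fetch('http://localhost:8090/v1/fluree/insert?ledger=mydb:main', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload)
    });
    if (res.ok) return res.json();
    // Non-transient failures (4xx validation/policy errors) should not be retried
    if (res.status < 500) throw new Error(`Transaction rejected: ${res.status}`);
    // Exponential backoff before the next attempt
    await new Promise((resolve) => setTimeout(resolve, 500 * 2 ** i));
  }
  throw new Error('Transaction failed after retries');
}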
Idempotency
For idempotent transactions:
- Use replace mode
- Include unique identifiers
- Design for retry safety
- Use deterministic IRIs
Best Practices
1. Use Meaningful IRIs
Good:
{"@id": "ex:user-alice-123"}
Bad:
{"@id": "ex:1"}
2. Batch Related Changes
Combine related entities in single transaction:
{
"@graph": [
{ "@id": "ex:order-123", "ex:customer": { "@id": "ex:alice" } },
{ "@id": "ex:order-123", "ex:product": { "@id": "ex:widget" } },
{ "@id": "ex:order-123", "ex:total": 99.99 }
]
}
3. Use Appropriate Mode
- Default mode: For additive operations
- Replace mode: For complete replacements, sync operations
4. Include Types
Always specify entity types:
{
"@id": "ex:alice",
"@type": "schema:Person"
}
5. Use Typed Literals
Be explicit about types:
{
"schema:birthDate": {
"@value": "1990-05-15",
"@type": "xsd:date"
}
}
6. Design for History
Consider how data will look in historical queries:
- Use descriptive property names
- Include relevant metadata
- Design for temporal queries
7. Monitor Performance
Track transaction metrics:
- Commit time
- Indexing lag
- Error rates
- Transaction size
Related Documentation
- Getting Started: Write Data - Quickstart guide
- Concepts: Time Travel - Temporal semantics
- API: POST /update - HTTP endpoint details
- Indexing - Indexing and search
Transaction Overview
This document provides a comprehensive overview of how transactions work in Fluree, from submission to final indexing.
What is a Transaction?
A transaction in Fluree is a set of changes to the database, represented as RDF triple assertions and retractions. Each transaction is:
- Atomic: All changes succeed or all fail
- Immutable: Once committed, never modified
- Timestamped: Assigned a unique transaction time (t)
- Auditable: Complete metadata preserved
Transaction Lifecycle
1. Submission
Client submits transaction to Fluree using either JSON-LD or SPARQL UPDATE:
JSON-LD Transaction:
POST /update?ledger=mydb:main
Content-Type: application/json
{
"@context": { "ex": "http://example.org/ns/" },
"@graph": [{ "@id": "ex:alice", "ex:name": "Alice" }]
}
SPARQL UPDATE:
POST /update/mydb:main
Content-Type: application/sparql-update
PREFIX ex: <http://example.org/ns/>
INSERT DATA { ex:alice ex:name "Alice" }
2. Parsing
Fluree parses the transaction:
- Parse JSON/JSON-LD structure
- Expand compact IRIs using @context
- Convert to internal representation
3. Validation
Transaction is validated:
- Syntax validation: Well-formed IRIs, valid datatypes
- Semantic validation: Type compatibility, constraints
- Policy validation: Authorization checks
If validation fails, transaction is rejected with error details.
4. Conversion to Flakes
Transaction is converted to flakes (Fluree’s internal triple format):
Subject Predicate Object Operation
------------------------------------------------------------------------
ex:alice rdf:type schema:Person assert
ex:alice schema:name "Alice"^^xsd:string assert
Each flake is a tuple: (subject, predicate, object, transaction-time, operation, metadata)
5. Assignment of Transaction Time
Fluree assigns a unique transaction time (t):
- Monotonically increasing integer
- Unique across all transactions
- Used for temporal queries
Example: t=42
6. Commit
Transaction is committed to storage:
- Flakes written to transaction log
- Commit metadata created (ContentId, timestamp, etc.)
- Commit ID published to nameservice
Commit Data:
{
"t": 42,
"timestamp": "2024-01-22T10:30:00.000Z",
"commit_id": "bafybeig...commitT42",
"flakes_added": 2,
"flakes_retracted": 0
}
7. Nameservice Update
Nameservice is updated with new commit:
- commit_t updated to 42
- commit_id updated
- Other processes can see new commit
8. Indexing (Asynchronous)
Background process indexes the transaction:
- Flakes added to index structures (SPOT, POST, OPST, PSOT)
- Query-optimized data structures built
- Graph sources updated (if applicable)
9. Index Publication
When indexing completes:
- index_t updated to 42
- index_id published
- Novelty layer reduced
Transaction Components
@context
Defines namespace mappings:
{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/",
"xsd": "http://www.w3.org/2001/XMLSchema#"
}
}
The @context can be:
- Inline (as above)
- External URL: "@context": "http://example.org/context.jsonld"
- Array of contexts: "@context": [url1, {...}]
@graph
Contains the entities being asserted:
{
"@graph": [
{
"@id": "ex:alice",
"@type": "schema:Person",
"schema:name": "Alice"
},
{
"@id": "ex:bob",
"@type": "schema:Person",
"schema:name": "Bob"
}
]
}
opts
Top-level parse-time options. These control how the transaction is parsed (not what it writes).
{
"@context": {"ex": "http://example.org/ns/"},
"opts": {"strictCompactIri": false},
"@graph": [{"@id": "legacy:bob", "ex:name": "Bob"}]
}
Currently supported keys:
- strictCompactIri (bool, default true): Reject unresolved compact-looking IRIs (prefix:suffix where the prefix is missing from @context). Disable only for legacy data where bare prefix:suffix strings are intentional. See IRIs and @context — Strict Compact-IRI Guard.
Programmatic Rust callers can override strictCompactIri via TxnOpts.strict_compact_iri, which takes precedence over the JSON opts value.
WHERE/DELETE/INSERT
For updates, specify what to match, delete, and insert:
{
"where": [
{ "@id": "ex:alice", "schema:age": "?oldAge" }
],
"delete": [
{ "@id": "ex:alice", "schema:age": "?oldAge" }
],
"insert": [
{ "@id": "ex:alice", "schema:age": 31 }
]
}
SPARQL UPDATE
Alternatively, use SPARQL UPDATE syntax with Content-Type: application/sparql-update:
PREFIX ex: <http://example.org/ns/>
DELETE {
?person ex:age ?oldAge .
}
INSERT {
?person ex:age 31 .
}
WHERE {
?person ex:name "Alice" .
?person ex:age ?oldAge .
}
SPARQL UPDATE supports:
- INSERT DATA - Insert ground triples
- DELETE DATA - Delete specific triples
- DELETE WHERE - Delete matching patterns
- DELETE/INSERT WHERE - Full update with patterns
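For instance, DELETE WHERE retracts every triple matching its pattern (standard SPARQL 1.1 syntax; the property is illustrative):
PREFIX ex: <http://example.org/ns/>
DELETE WHERE { ex:alice ex:age ?age }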
See SPARQL UPDATE for complete documentation.
Transaction Endpoints
Fluree exposes three transaction endpoints (all under /v1/fluree/):
- POST /insert — add triples (JSON-LD or Turtle)
- POST /update — WHERE/DELETE/INSERT (JSON-LD) and SPARQL UPDATE
- POST /upsert — replace values for the predicates you supply (JSON-LD, Turtle, TriG)
See Insert, Update, and Upsert for details.
Transaction Semantics
Assertions
Assertions add new triples to the database:
{
"@id": "ex:alice",
"schema:name": "Alice"
}
Creates triple:
ex:alice schema:name "Alice"
Retractions
Retractions remove existing triples:
{
"delete": [
{ "@id": "ex:alice", "schema:age": "?age" }
],
"where": [
{ "@id": "ex:alice", "schema:age": "?age" }
]
}
Removes matching triples.
Updates
Updates are retraction + assertion:
t=10: ex:alice schema:age 30 (assert)
t=20: ex:alice schema:age 30 (retract), ex:alice schema:age 31 (assert)
Historical queries can see both states.
Commit Metadata
Each commit includes rich metadata:
Core Fields:
- t: Transaction time
- timestamp: ISO 8601 timestamp
- commit_id: Content-addressed identifier (CIDv1)
Counts:
- flakes_added: Number of assertions
- flakes_retracted: Number of retractions
Provenance (in txn-meta graph, under the commit subject):
- f:identity: Authenticated identity acting on the transaction. System-controlled — verified DID for signed requests, otherwise from opts.identity / CommitOpts::identity. Any user-supplied f:identity in the body is overridden.
- f:author: Optional author claim. Pure user txn-meta — supply f:author as a top-level property in the envelope-form transaction body.
- f:message: Optional commit message. Pure user txn-meta — supply f:message as a top-level property in the envelope-form transaction body.
- previous_commit_id: ContentId of previous commit (in the commit envelope).
See Commit Receipts for details.
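For example, a sketch of supplying user txn-meta as top-level properties in an envelope-form transaction (values illustrative):
{
  "@context": { "ex": "http://example.org/ns/", "f": "https://ns.flur.ee/db#" },
  "f:author": "ex:data-team",
  "f:message": "Correct Alice's email address",
  "insert": { "@id": "ex:alice", "ex:email": "alice@example.org" }
}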
Indexing Pipeline
Commit vs Index
Commit (immediate):
- Transaction written to log
- Available for time travel queries
- Small, append-only files
Index (asynchronous):
- Query-optimized data structures
- Background process
- May lag behind commits
Novelty Layer
The novelty layer is uncommitted data between index and commit:
index_t = 40
commit_t = 45
novelty layer = transactions 41, 42, 43, 44, 45
Queries combine:
- Indexed data (up to t=40)
- Novelty layer (t=41 to t=45)
Index Structures
Fluree maintains four index permutations (SPOT, POST, OPST, PSOT):
SPOT (Subject-Predicate-Object-Time):
ex:alice → schema:name → "Alice" → t=10
POST (Predicate-Object-Subject-Time):
schema:name → "Alice" → ex:alice → t=10
OPST (Object-Predicate-Subject-Time):
"Alice" → schema:name → ex:alice → t=10
PSOT (Predicate-Subject-Object-Time):
schema:name → ex:alice → "Alice" → t=10
Different query patterns use different indexes for optimal performance.
Transaction Properties
Atomicity
All-or-nothing execution:
- Validation failure rejects entire transaction
- Parse error rejects entire transaction
- No partial commits
Consistency
Database remains consistent:
- Constraints enforced
- Types validated
- References checked (optionally)
Isolation
Transactions are isolated:
- Each sees consistent snapshot
- No dirty reads
- Serializable execution
Durability
Committed data is durable:
- Written to persistent storage
- Replicated (if configured)
- Immutable
Error Handling
Validation Errors
{
"error": "ValidationError",
"message": "Invalid IRI format",
"code": "INVALID_IRI",
"details": {
"iri": "not a uri",
"line": 3
}
}
Conflict Errors
{
"error": "ConflictError",
"message": "Concurrent modification detected",
"code": "CONCURRENT_MODIFICATION"
}
Policy Errors
{
"error": "Forbidden",
"message": "Policy denies transact on mydb:main",
"code": "POLICY_DENIED"
}
Performance Considerations
Transaction Size
- Recommended: < 1,000 triples per transaction
- Maximum: Configurable (default 10,000)
- Large transactions increase commit time
Indexing Lag
- Background indexing may lag behind commits
- Monitor the commit_t - index_t gap
- Tune indexing frequency if needed
Batch Operations
For bulk imports:
- Batch into reasonably-sized transactions
- Monitor memory usage
- Allow time for indexing between batches
For initial ledger bootstraps (large Turtle datasets), prefer the Rust bulk import API, which streams commits and builds multi-order binary indexes.
See Indexing Side-Effects for details.
Best Practices
1. Meaningful Transaction Units
Group related changes in single transaction:
Good:
{
"@graph": [
{ "@id": "ex:order-123", "ex:customer": { "@id": "ex:alice" } },
{ "@id": "ex:order-123", "ex:items": [...] },
{ "@id": "ex:order-123", "ex:total": 99.99 }
]
}
2. Include Metadata
Add provenance information:
{
"@graph": [
{
"@id": "ex:alice",
"schema:name": "Alice",
"ex:created": "2024-01-22T10:00:00Z",
"ex:createdBy": "user-123"
}
]
}
3. Use Descriptive IRIs
Good: ex:user-alice-123
Bad: ex:1
4. Test Transactions
Test transactions before production:
- Validate JSON-LD syntax
- Check IRI formats
- Verify types and constraints
5. Monitor Performance
Track metrics:
- Average commit time
- Indexing lag
- Transaction size
- Error rate
6. Handle Errors Gracefully
Implement retry logic for transient errors:
- Network errors
- Timeout errors
- Conflict errors (with updated data)
7. Design for Time Travel
Remember data is immutable:
- Changes create new versions
- Historical queries see all versions
- Design with temporal access in mind
Related Documentation
- Insert - Adding new data
- Upsert - Replace mode
- Update - Targeted updates
- Commit Receipts - Receipt details
- Indexing Side-Effects - Indexing behavior
Insert
Insert operations add new data to Fluree. This is the most common transaction type for creating new entities and relationships.
Basic Insert
Single Entity
Insert a single entity with properties:
curl -X POST "http://localhost:8090/v1/fluree/insert?ledger=mydb:main" \
-H "Content-Type: application/json" \
-d '{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"@graph": [
{
"@id": "ex:alice",
"@type": "schema:Person",
"schema:name": "Alice",
"schema:email": "alice@example.org",
"schema:age": 30
}
]
}'
Result:
{
"t": 1,
"timestamp": "2024-01-22T10:30:00.000Z",
"commit_id": "bafybeig...commitT1",
"flakes_added": 4,
"flakes_retracted": 0
}
This creates 4 triples:
ex:alice rdf:type schema:Person
ex:alice schema:name "Alice"
ex:alice schema:email "alice@example.org"
ex:alice schema:age 30
Multiple Entities
Insert multiple entities in one transaction:
{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"@graph": [
{
"@id": "ex:alice",
"@type": "schema:Person",
"schema:name": "Alice"
},
{
"@id": "ex:bob",
"@type": "schema:Person",
"schema:name": "Bob"
},
{
"@id": "ex:carol",
"@type": "schema:Person",
"schema:name": "Carol"
}
]
}
Benefits:
- Atomic: All entities created or none
- Efficient: Single commit, single index update
- Consistent: All entities at same transaction time
Insert with Relationships
Create entities with relationships:
{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"@graph": [
{
"@id": "ex:company-a",
"@type": "schema:Organization",
"schema:name": "Acme Corp"
},
{
"@id": "ex:alice",
"@type": "schema:Person",
"schema:name": "Alice",
"schema:worksFor": { "@id": "ex:company-a" }
}
]
}
This creates:
ex:company-a rdf:type schema:Organization
ex:company-a schema:name "Acme Corp"
ex:alice rdf:type schema:Person
ex:alice schema:name "Alice"
ex:alice schema:worksFor ex:company-a
Nested Objects
Create nested structures:
{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"@graph": [
{
"@id": "ex:alice",
"@type": "schema:Person",
"schema:name": "Alice",
"schema:address": {
"@id": "ex:alice-address",
"@type": "schema:PostalAddress",
"schema:streetAddress": "123 Main St",
"schema:addressLocality": "Springfield",
"schema:postalCode": "12345"
}
}
]
}
This creates two entities (alice and alice-address) linked by schema:address.
Multi-Valued Properties
Add multiple values for a property:
{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"@graph": [
{
"@id": "ex:alice",
"@type": "schema:Person",
"schema:name": "Alice",
"schema:email": ["alice@example.org", "alice@work.com"],
"schema:telephone": ["+1-555-0100", "+1-555-0101"]
}
]
}
Creates separate triples for each value:
ex:alice schema:email "alice@example.org"
ex:alice schema:email "alice@work.com"
ex:alice schema:telephone "+1-555-0100"
ex:alice schema:telephone "+1-555-0101"
Typed Literals
Dates
{
"@id": "ex:alice",
"schema:birthDate": {
"@value": "1994-05-15",
"@type": "xsd:date"
}
}
Timestamps
{
"@id": "ex:event",
"schema:startDate": {
"@value": "2024-01-22T10:30:00Z",
"@type": "xsd:dateTime"
}
}
Numbers
{
"@id": "ex:product",
"schema:price": {
"@value": "29.99",
"@type": "xsd:decimal"
}
}
Booleans
{
"@id": "ex:alice",
"schema:active": {
"@value": "true",
"@type": "xsd:boolean"
}
}
Or use native JSON boolean:
{
"@id": "ex:alice",
"schema:active": true
}
Language Tags
Add language-tagged strings:
{
"@id": "ex:alice",
"schema:name": {
"@value": "Alice",
"@language": "en"
},
"schema:description": [
{ "@value": "Software engineer", "@language": "en" },
{ "@value": "Ingénieure logicielle", "@language": "fr" },
{ "@value": "Softwareingenieurin", "@language": "de" }
]
}
Blank Nodes
Create entities without explicit IRIs:
{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"@graph": [
{
"@id": "ex:alice",
"schema:address": {
"@type": "schema:PostalAddress",
"schema:streetAddress": "123 Main St"
}
}
]
}
Fluree generates a unique IRI for the blank node address.
Adding to Existing Entities
Add properties to existing entities:
Initial Insert (t=1):
{
"@graph": [
{
"@id": "ex:alice",
"schema:name": "Alice"
}
]
}
Add Email (t=2):
{
"@graph": [
{
"@id": "ex:alice",
"schema:email": "alice@example.org"
}
]
}
After t=2, ex:alice has both name and email.
Insert Semantics
Additive by Default
Inserts are additive—they don’t remove existing data:
t=1: INSERT { ex:alice schema:name "Alice" }
Result: ex:alice has name "Alice"
t=2: INSERT { ex:alice schema:age 30 }
Result: ex:alice has name "Alice" AND age 30
Duplicate Prevention
Inserting the same triple again is a no-op:
t=1: INSERT { ex:alice schema:name "Alice" }
t=2: INSERT { ex:alice schema:name "Alice" }
(No change—triple already exists)
Multi-Value Handling
Multiple values create multiple triples:
t=1: INSERT { ex:alice schema:email "alice@example.org" }
t=2: INSERT { ex:alice schema:email "alice@work.com" }
Result: ex:alice has TWO email values
IRI Generation
Explicit IRIs
Specify IRIs explicitly:
{
"@id": "ex:user-12345",
"schema:name": "Alice"
}
UUID-Based IRIs
Generate UUIDs for unique IRIs:
const uuid = crypto.randomUUID();
const entity = {
"@id": `ex:user-${uuid}`,
"schema:name": "Alice"
};
Content-Addressable IRIs
Use content hashing for deterministic IRIs:
const { createHash } = require('node:crypto');

// Deterministic IRI derived from a content hash of the entity data
const hash = createHash('sha256').update(JSON.stringify(data)).digest('hex');
const entity = {
  "@id": `ex:entity-${hash}`,
  ...data
};
Batch Inserts
Small Batches (Recommended)
{
"@graph": [
{ "@id": "ex:user-1", "schema:name": "Alice" },
{ "@id": "ex:user-2", "schema:name": "Bob" },
{ "@id": "ex:user-3", "schema:name": "Carol" }
// ... 100-1000 entities
]
}
Large Imports
For very large imports:
const batchSize = 1000;
for (let i = 0; i < entities.length; i += batchSize) {
  const batch = entities.slice(i, i + batchSize);
  // transact(): your HTTP wrapper around POST /insert
  await transact({ "@graph": batch });
  // Optional: pause so background indexing can catch up
  // sleep(): e.g. (ms) => new Promise((resolve) => setTimeout(resolve, ms))
  await sleep(1000);
}
Error Handling
Common Insert Errors
Invalid IRI:
{
"error": "ValidationError",
"message": "Invalid IRI format",
"code": "INVALID_IRI"
}
Type Mismatch:
{
"error": "TypeError",
"message": "Expected number, got string",
"code": "TYPE_ERROR"
}
Constraint Violation:
{
"error": "ConstraintViolation",
"message": "Unique constraint violated",
"code": "CONSTRAINT_VIOLATION"
}
Validation Before Insert
Validate data before inserting:
function validateEntity(entity) {
if (!entity['@id']) {
throw new Error('Entity must have @id');
}
if (!isValidIRI(entity['@id'])) {
throw new Error('Invalid IRI format');
}
// Additional validation...
}
Best Practices
1. Use Meaningful IRIs
Good:
{ "@id": "ex:user-alice-12345" }
Bad:
{ "@id": "ex:1" }
2. Always Include Type
{
"@id": "ex:alice",
"@type": "schema:Person"
}
3. Use Appropriate Datatypes
{
"schema:age": 30,
"schema:price": 29.99,
"schema:active": true,
"schema:birthDate": { "@value": "1994-05-15", "@type": "xsd:date" }
}
4. Batch Related Entities
Insert related entities in same transaction:
{
"@graph": [
{ "@id": "ex:order-123", ... },
{ "@id": "ex:order-item-1", ... },
{ "@id": "ex:order-item-2", ... }
]
}
5. Use Consistent Namespaces
Define and use consistent namespace prefixes:
{
"@context": {
"app": "https://myapp.com/ns/",
"schema": "http://schema.org/"
}
}
6. Include Metadata
Add creation metadata:
{
"@id": "ex:alice",
"schema:name": "Alice",
"app:createdAt": "2024-01-22T10:00:00Z",
"app:createdBy": "user-admin"
}
7. Validate Before Insert
Always validate:
- JSON-LD syntax
- IRI formats
- Required fields
- Type constraints
Performance Tips
1. Batch Appropriately
- Recommended: 100-1000 entities per batch
- Too small: Many commits, slow
- Too large: Memory pressure, long commits
2. Monitor Indexing
Track indexing lag after large inserts:
curl http://localhost:8090/v1/fluree/info/mydb:main
# Check: t - index.t
3. Use Efficient IRIs
Short IRIs are more efficient:
Good: ex:user-123
Less efficient: https://example.org/very/long/path/user-123
4. Minimize Context Size
Use compact contexts:
{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
}
}
Related Documentation
- Overview - Transaction overview
- Upsert - Replace mode inserts
- Update - Updating existing data
- Data Types - Supported datatypes
- API Endpoints - HTTP API details
Upsert
Upsert operations provide idempotent transactions by replacing the values of the predicates you supply for an entity (matched by @id).
What is Upsert?
Upsert = Update or Insert:
- If the entity exists: for each predicate present in your payload, retract existing values for that predicate and assert the new value(s)
- If the entity doesn’t exist: create it with the supplied triples
This makes upserts safe to retry: sending the same upsert repeatedly produces the same current-state values for those predicates.
HTTP Endpoint
Use the dedicated upsert endpoint:
curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
-H "Content-Type: application/json" \
-d '{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"@graph": [
{
"@id": "ex:alice",
"@type": "schema:Person",
"schema:name": "Alice Smith",
"schema:email": "alice.smith@example.org"
}
]
}'
Upsert Behavior
First Transaction (Entity Doesn’t Exist)
{
"@graph": [
{
"@id": "ex:alice",
"@type": "schema:Person",
"schema:name": "Alice",
"schema:email": "alice@example.org"
}
]
}
Result: Entity created with specified properties.
Triples After t=1:
ex:alice rdf:type schema:Person
ex:alice schema:name "Alice"
ex:alice schema:email "alice@example.org"
Second Transaction (Entity Exists)
{
"@graph": [
{
"@id": "ex:alice",
"@type": "schema:Person",
"schema:name": "Alice Smith",
"schema:email": "alice.smith@example.org",
"schema:age": 30
}
]
}
Operations:
- Retract the existing values of each predicate supplied in the payload (schema:name, schema:email)
- Assert the new values
Flakes:
# Retractions (t=2)
ex:alice schema:name "Alice" (retract)
ex:alice schema:email "alice@example.org" (retract)
# Assertions (t=2)
ex:alice rdf:type schema:Person (assert)
ex:alice schema:name "Alice Smith" (assert)
ex:alice schema:email "alice.smith@example.org" (assert)
ex:alice schema:age 30 (assert)
Triples After t=2:
ex:alice rdf:type schema:Person
ex:alice schema:name "Alice Smith"
ex:alice schema:email "alice.smith@example.org"
ex:alice schema:age 30
Note: The @type is re-asserted because it is included in the payload.
Idempotency
Replace mode is idempotent—repeated submissions produce the same result:
First Submission (t=1):
{"@id": "ex:alice", "schema:name": "Alice", "schema:age": 30}
Result: Entity created.
Second Submission (t=2):
{"@id": "ex:alice", "schema:name": "Alice", "schema:age": 30}
Result: No actual changes (retracts and re-asserts same values).
Third Submission (t=3):
{"@id": "ex:alice", "schema:name": "Alice", "schema:age": 30}
Result: No actual changes.
This makes upserts safe to retry.
Comparison: Insert vs Update vs Upsert
Insert
POST /insert?ledger=mydb:main
Behavior:
- Additive: asserts the triples you submit
- Does not retract existing values automatically
Example:
t=1: INSERT { ex:alice schema:name "Alice", schema:age 30 }
t=2: INSERT { ex:alice schema:email "alice@example.org" }
Result: ex:alice has name, age, AND email (all three)
Update (WHERE/DELETE/INSERT)
POST /update?ledger=mydb:main
Behavior:
- Explicit: you retract exactly what you match in where/delete, then assert insert
- Most flexible (conditional updates, partial updates, computed values)
Example:
t=1: INSERT { ex:alice schema:name "Alice", schema:age 30 }
t=2: UPDATE { DELETE { ex:alice schema:age 30 } INSERT { ex:alice schema:age 31 } WHERE { ex:alice schema:age 30 } }
Result: ex:alice has name "Alice", age 31
Upsert
POST /upsert?ledger=mydb:main
Behavior:
- Replaces values for the predicates you supply (per subject)
- Leaves other predicates unchanged
- Retry-safe/idempotent for the supplied predicates
Use Cases
1. Synchronization from External Systems
Sync data from external database:
async function syncUser(externalUser) {
await fetch('http://localhost:8090/v1/fluree/upsert?ledger=mydb:main', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
"@graph": [{
"@id": `ex:user-${externalUser.id}`,
"@type": "schema:Person",
"schema:name": externalUser.name,
"schema:email": externalUser.email,
"schema:telephone": externalUser.phone
}]
})
})
}
// Safe to call repeatedly—always matches external state
await syncUser(fetchUserFromDB(123));
2. Idempotent API Operations
Make API operations retry-safe:
// Safe to retry on failure
async function updateProduct(productId, productData) {
return await fetch('http://localhost:8090/v1/fluree/upsert?ledger=mydb:main', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
"@graph": [{
"@id": `ex:product-${productId}`,
...productData
}]
})
})
}
3. Configuration Management
Update configuration atomically:
{
"@graph": [
{
"@id": "ex:config",
"@type": "ex:Configuration",
"ex:apiEndpoint": "https://api.example.com",
"ex:timeout": 30000,
"ex:retries": 3,
"ex:enabled": true
}
]
}
Each upsert replaces the values for every setting it supplies, so repeated syncs never leave stale values behind for those keys.
4. State Machine Transitions
Model state machines where entity has well-defined state:
{
"@graph": [
{
"@id": "ex:order-123",
"@type": "ex:Order",
"ex:status": "shipped",
"ex:shippedAt": "2024-01-22T10:30:00Z",
"ex:carrier": "FedEx",
"ex:trackingNumber": "123456789"
}
]
}
Batch Upserts
Upsert multiple entities:
POST /upsert?ledger=mydb:main
{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"@graph": [
{
"@id": "ex:user-1",
"@type": "schema:Person",
"schema:name": "Alice"
},
{
"@id": "ex:user-2",
"@type": "schema:Person",
"schema:name": "Bob"
},
{
"@id": "ex:user-3",
"@type": "schema:Person",
"schema:name": "Carol"
}
]
}
Each entity is replaced independently.
Type Handling
Types are Preserved
Upsert preserves existing @type values unless you explicitly include @type in the upsert payload (in which case rdf:type is treated like any other predicate and its values are replaced for that subject).
{
"@graph": [
{
"@id": "ex:alice",
"@type": "schema:Person",
"schema:name": "Alice"
}
]
}
Here the @type is re-asserted because it is included in the payload, even though it existed before.
Multiple Types
Entities can have multiple types:
{
"@graph": [
{
"@id": "ex:alice",
"@type": ["schema:Person", "ex:Employee"],
"schema:name": "Alice"
}
]
}
All types are replaced together.
Edge Cases
Omitted Predicates Are Untouched
An upsert only touches the predicates present in its payload; everything else on the subject is left as-is:
Before (t=1):
{
  "@id": "ex:alice",
  "schema:name": "Alice",
  "schema:email": "alice@example.org",
  "schema:age": 30,
  "schema:telephone": "+1-555-0100"
}
Upsert (t=2):
{
  "@id": "ex:alice",
  "@type": "schema:Person",
  "schema:name": "Alice"
}
After t=2:
{
  "@id": "ex:alice",
  "@type": "schema:Person",
  "schema:name": "Alice",
  "schema:email": "alice@example.org",
  "schema:age": 30,
  "schema:telephone": "+1-555-0100"
}
Only @type and schema:name were (re)written; email, age, and telephone are unchanged.
Removing Properties
Because omitted predicates are untouched, an upsert cannot remove a property. To retract values, or to make conditional or computed changes, use WHERE/DELETE/INSERT.
Error Handling
Same Errors as Default Mode
Replace mode has same validation errors:
{
"error": "ValidationError",
"message": "Invalid IRI format",
"code": "INVALID_IRI"
}
No Special Errors
Replace mode doesn’t introduce new error types—it’s just different semantics for the same operations.
Performance Considerations
Retraction Overhead
Replace mode may retract many triples:
Entity with 50 properties:
- 50 retractions
- 50 assertions
= 100 flakes per entity
For entities with many properties, this can be expensive.
Indexing Impact
Each retraction and assertion updates indexes:
- More work for indexing process
- May increase indexing lag
- Consider batch size for large replacements
Best Practices
1. Use for Idempotent Operations
Good use:
// Idempotent sync
await upsertUser(userId, userData);
await upsertUser(userId, userData); // Safe to retry
2. Include All Required Properties
Always include all properties entity should have:
Good:
{
"@id": "ex:user-123",
"@type": "schema:Person",
"schema:name": "Alice",
"schema:email": "alice@example.org",
"ex:status": "active"
}
Bad (incomplete):
{
"@id": "ex:user-123",
"schema:name": "Alice"
}
3. Use Consistent Schema
Define entity schema and always include all fields:
function createUserTransaction(user) {
return {
"@id": `ex:user-${user.id}`,
"@type": "schema:Person",
"schema:name": user.name || null,
"schema:email": user.email || null,
"schema:telephone": user.phone || null,
"ex:status": user.status || "active"
};
}
4. Document Upsert Usage
Comment when using upsert for idempotent sync:
// Upsert for idempotent sync with external API
await fetch('http://localhost:8090/v1/fluree/upsert?ledger=users:main', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(userPayload),
});
5. Test Idempotency
Verify operations are truly idempotent:
const result1 = await upsert(data);
const result2 = await upsert(data);
// Should produce same final state
6. Monitor Performance
Track metrics for replace operations:
- Flakes retracted
- Flakes asserted
- Commit time
- Indexing lag
7. Consider Alternatives
For partial updates, use WHERE/DELETE/INSERT:
{
"where": [{ "@id": "ex:alice", "schema:age": "?oldAge" }],
"delete": [{ "@id": "ex:alice", "schema:age": "?oldAge" }],
"insert": [{ "@id": "ex:alice", "schema:age": 31 }]
}
Comparison Table
| Feature | Insert (default) | Upsert (replace) |
|---|---|---|
| Behavior | Additive | Replaces values for supplied predicates |
| Existing properties | Preserved | Replaced if the predicate is supplied; otherwise preserved |
| Idempotent | No | Yes |
| Partial updates | Use WHERE/DELETE/INSERT | Per supplied predicate |
| Use case | Adding data | Synchronization |
| Retry safety | Requires care | Safe by default |
| Performance | Fewer operations | More operations (retraction + assertion per supplied predicate) |
Related Documentation
- Insert - Adding new data
- Update - Partial updates
- Overview - Transaction overview
- API Endpoints - HTTP API details
Update (WHERE/DELETE/INSERT)
The WHERE/DELETE/INSERT pattern enables targeted updates to existing data in Fluree. This is the most flexible update mechanism, allowing conditional modifications, partial updates, and complex transformations.
Basic Pattern
The WHERE/DELETE/INSERT pattern has three clauses:
- WHERE: Pattern to match existing data
- DELETE: Triples to retract (using variables from WHERE)
- INSERT: Triples to assert (using variables from WHERE)
{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"where": [
{ "@id": "ex:alice", "schema:age": "?oldAge" }
],
"delete": [
{ "@id": "ex:alice", "schema:age": "?oldAge" }
],
"insert": [
{ "@id": "ex:alice", "schema:age": 31 }
]
}
This:
- Finds the current age of ex:alice
- Deletes that age value
- Inserts the new age value
WHERE clause capabilities
The update transaction where clause uses the same pattern grammar as JSON-LD queries, so you can use rich patterns like OPTIONAL, UNION, FILTER, VALUES, and subqueries.
Two common forms:
- Node-map: a single object (simple triple patterns)
- Array: a sequence of node-maps plus special forms (recommended for anything beyond basic matching)
Supported special forms inside the where array:
["filter", <expr>]["bind", "?var", <expr>](may include multiple var/expr pairs)["optional", <pattern>]["union", <pattern>, <pattern>, ...]["minus", <pattern>]["exists", <pattern>]/["not-exists", <pattern>]["values", <values-clause>]["query", <subquery>](subquery can useselect,groupBy, aggregates like(max ?x), etc.)["graph", <graph-name>, <pattern>]
Expression format for filter/bind supports either:
- Data expressions like ["+", "?x", 1] or ["and", [">=", "?age", 18], ["=", "?status", "pending"]]
- S-expressions like "(+ ?x 1)"
Graph scoping (named graphs)
JSON-LD update supports writing into user-defined named graphs (ingested via TriG or JSON-LD @graph) and scoping the update to a named graph.
Default graph for WHERE/DELETE/INSERT
Use a top-level graph key to scope the update to a named graph as the default graph:
{
"@context": { "ex": "http://example.org/ns/", "schema": "http://schema.org/" },
"graph": "http://example.org/graphs/audit",
"where": { "@id": "ex:event1", "schema:description": "?old" },
"delete": { "@id": "ex:event1", "schema:description": "?old" },
"insert": { "@id": "ex:event1", "schema:description": "new" }
}
This is the JSON-LD UPDATE analog of SPARQL UPDATE WITH <iri>:
- WHERE patterns are evaluated against the named graph
- DELETE/INSERT templates without an explicit graph are written to that named graph
Writing templates to specific graphs
There are two ways to target graphs in insert / delete templates:
- Per-node @graph: attach a graph IRI to a node object (overrides the transaction-level graph)
{
"insert": [
{ "@id": "ex:event1", "@graph": "http://example.org/graphs/audit", "schema:description": "v" }
]
}
- Template sugar: inside insert/delete arrays, use ["graph", "<graph IRI>", <pattern>]
{
"insert": [
["graph", "http://example.org/graphs/audit", { "@id": "ex:event1", "schema:description": "v" }]
]
}
Notes:
- graph is a graph IRI (a string like "http://example.org/graphs/audit")
- Named-graph reads are available after indexing completes (see docs/query/datasets.md)
Dataset scoping for WHERE (from / fromNamed)
JSON-LD update reuses the same dataset keys as JSON-LD query to control where the where clause reads from:
- from: scopes the default graph used for where evaluation (equivalent to SPARQL UPDATE USING <iri>)
- fromNamed: restricts which named graphs are visible to where ["graph", ...] patterns (equivalent to SPARQL UPDATE USING NAMED <iri>)
This is why JSON-LD update uses from rather than introducing new keywords: it matches the existing JSON-LD query language vocabulary and keeps dataset configuration consistent across read-only queries and updates.
from (WHERE default graph)
When from is present, it scopes the where clause evaluation without changing where templates write:
- graph (if present) controls the default graph for DELETE/INSERT templates (SPARQL UPDATE WITH)
- from controls the default graph(s) for where evaluation (SPARQL UPDATE USING)
Notes:
- from can be:
  - a string graph IRI (shorthand for {"graph": "<iri>"})
  - an object with {"graph": "<iri>"} (or {"graph": ["<iri1>", "<iri2>"]})
  - an array of graph IRIs/selectors (multiple graphs are evaluated as a merged default graph)
- If your insert/delete templates write into the same graph as the top-level graph, you can omit per-template graph selection. The top-level graph becomes the default target for templates that don’t specify @graph (or ["graph", ...] sugar).
- If you want to write to multiple graphs in one update, keep a top-level graph as the default (optional) and use per-template ["graph", ...] for the exceptions.
{
"@context": { "ex": "http://example.org/ns/", "schema": "http://schema.org/" },
"graph": "http://example.org/g2",
"from": { "graph": "http://example.org/g1" },
"where": { "@id": "ex:s", "schema:description": "?d" },
"insert": [{ "@id": "ex:s", "schema:copyFromG1": "?d" }]
}
Example: read from one graph, write to two graphs
{
"@context": { "ex": "http://example.org/ns/", "schema": "http://schema.org/" },
"graph": "http://example.org/g2",
"from": { "graph": "http://example.org/g1" },
"where": { "@id": "ex:s", "schema:description": "?d" },
"insert": [
{ "@id": "ex:s", "schema:copyFromG1": "?d" },
["graph", "http://example.org/audit", { "@id": "ex:event1", "schema:description": "copied description" }]
]
}
fromNamed (WHERE named graphs allowlist)
Use `fromNamed` to allow (and optionally alias) named graphs for `where` `["graph", ...]` patterns.
Notes:
- In `where` GRAPH patterns, you can reference the graph by alias (e.g. `"g2"`) or by the graph IRI (e.g. `"http://example.org/g2"`). Aliases are just convenience names for matching.
- In `insert`/`delete` templates, graph selection is a write target. You can use:
  - the full graph IRI (`"http://example.org/g2"`)
  - a compact IRI/term that expands via `@context` (e.g. `"ex:g2"`)
  - the `fromNamed` alias (e.g. `"g2"`) for consistency within the same update transaction
{
"@context": { "ex": "http://example.org/ns/" },
"fromNamed": [
{ "alias": "g2", "graph": "http://example.org/g2" }
],
"where": [
["graph", "g2", { "@id": "ex:s", "ex:p": "?o" }]
],
"insert": [["graph", "g2", { "@id": "ex:s", "ex:q": "touched" }]]
}
Same example, but with a compacted graph IRI via @context:
{
"@context": { "ex": "http://example.org/ns/" },
"fromNamed": [{ "alias": "g2", "graph": "ex:g2" }],
"where": [["graph", "g2", { "@id": "ex:s", "ex:p": "?o" }]],
"insert": [["graph", "ex:g2", { "@id": "ex:s", "ex:q": "touched" }]]
}
Same idea without an explicit alias (the fromNamed string acts as its own identifier):
{
"@context": { "ex": "http://example.org/ns/" },
"fromNamed": ["ex:g2"],
"where": [["graph", "ex:g2", { "@id": "ex:s", "ex:p": "?o" }]],
"insert": [["graph", "ex:g2", { "@id": "ex:s", "ex:q": "touched" }]]
}
Simple Property Update
Update a single property value:
curl -X POST "http://localhost:8090/v1/fluree/update?ledger=mydb:main" \
-H "Content-Type: application/json" \
-d '{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"where": [
{ "@id": "ex:alice", "schema:email": "?oldEmail" }
],
"delete": [
{ "@id": "ex:alice", "schema:email": "?oldEmail" }
],
"insert": [
{ "@id": "ex:alice", "schema:email": "alice.new@example.org" }
]
}'
Multiple Property Updates
Update several properties at once:
{
"where": [
{ "@id": "ex:alice", "schema:name": "?oldName" },
{ "@id": "ex:alice", "schema:email": "?oldEmail" }
],
"delete": [
{ "@id": "ex:alice", "schema:name": "?oldName" },
{ "@id": "ex:alice", "schema:email": "?oldEmail" }
],
"insert": [
{ "@id": "ex:alice", "schema:name": "Alice Johnson" },
{ "@id": "ex:alice", "schema:email": "alice.j@example.org" }
]
}
Conditional Updates
Only update if condition is met:
{
"where": [
{ "@id": "ex:alice", "schema:age": "?age" },
{ "@id": "ex:alice", "ex:status": "?status" },
["filter", ["and", [">=", "?age", 18], ["=", "?status", "pending"]]]
],
"delete": [
{ "@id": "ex:alice", "ex:status": "?status" }
],
"insert": [
{ "@id": "ex:alice", "ex:status": "approved" }
]
}
The update only happens if Alice is 18+ and status is “pending”.
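Submitting this from application code is just a POST to the same update endpoint used in the curl examples above. A minimal sketch in JavaScript, assuming a fetch-capable runtime; the response fields are the commit-receipt fields described under Commit Receipts and tx-id:
const payload = {
  "@context": { ex: "http://example.org/ns/", schema: "http://schema.org/" },
  where: [
    { "@id": "ex:alice", "schema:age": "?age" },
    { "@id": "ex:alice", "ex:status": "?status" },
    ["filter", ["and", [">=", "?age", 18], ["=", "?status", "pending"]]]
  ],
  delete: [{ "@id": "ex:alice", "ex:status": "?status" }],
  insert: [{ "@id": "ex:alice", "ex:status": "approved" }]
};
const res = await fetch("http://localhost:8090/v1/fluree/update?ledger=mydb:main", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(payload)
});
console.log(await res.json()); // commit receipt: t, flakes_added, flakes_retracted, ...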
Pattern Matching
Find and Update
Find entities matching a pattern and update them:
{
"where": [
{ "@id": "?person", "@type": "schema:Person" },
{ "@id": "?person", "ex:status": "pending" }
],
"delete": [
{ "@id": "?person", "ex:status": "pending" }
],
"insert": [
{ "@id": "?person", "ex:status": "active" }
]
}
This updates ALL people with status=“pending” to status=“active”.
Relationship-Based Updates
Update based on relationships:
{
"where": [
{ "@id": "?employee", "schema:worksFor": "ex:company-a" },
{ "@id": "?employee", "ex:salary": "?oldSalary" },
["bind", "?newSalary", ["*", "?oldSalary", 1.1]]
],
"delete": [
{ "@id": "?employee", "ex:salary": "?oldSalary" }
],
"insert": [
{ "@id": "?employee", "ex:salary": "?newSalary" }
]
}
Gives all company-a employees a 10% raise.
Variable Transformation
Use variables from WHERE in INSERT with transformations:
{
"where": [
{ "@id": "ex:product-123", "ex:price": "?currentPrice" },
["bind", "?newPrice", ["*", "?currentPrice", 0.9]]
],
"delete": [
{ "@id": "ex:product-123", "ex:price": "?currentPrice" }
],
"insert": [
{ "@id": "ex:product-123", "ex:price": "?newPrice" },
{ "@id": "ex:product-123", "ex:previousPrice": "?currentPrice" }
]
}
Applies 10% discount and saves previous price.
Partial Updates
Update only specific properties, leaving others unchanged:
Current State:
ex:alice schema:name "Alice"
ex:alice schema:email "alice@example.org"
ex:alice schema:age 30
ex:alice schema:telephone "+1-555-0100"
Update Only Age:
{
"where": [
{ "@id": "ex:alice", "schema:age": "?oldAge" }
],
"delete": [
{ "@id": "ex:alice", "schema:age": "?oldAge" }
],
"insert": [
{ "@id": "ex:alice", "schema:age": 31 }
]
}
Result:
ex:alice schema:name "Alice" (unchanged)
ex:alice schema:email "alice@example.org" (unchanged)
ex:alice schema:age 31 (updated)
ex:alice schema:telephone "+1-555-0100" (unchanged)
Adding Properties
Add a property without WHERE (when it might not exist):
{
"insert": [
{ "@id": "ex:alice", "schema:telephone": "+1-555-0100" }
]
}
Or conditionally add if missing:
{
"where": [
{ "@id": "ex:alice", "schema:name": "?name" },
["optional", { "@id": "ex:alice", "schema:telephone": "?existingPhone" }],
["filter", ["not", ["bound", "?existingPhone"]]]
],
"insert": [
{ "@id": "ex:alice", "schema:telephone": "+1-555-0100" }
]
}
Removing Properties
Remove a property entirely:
{
"where": [
{ "@id": "ex:alice", "schema:telephone": "?phone" }
],
"delete": [
{ "@id": "ex:alice", "schema:telephone": "?phone" }
]
}
No INSERT clause—just deletes.
Multi-Value Properties
Replace One Value
{
"where": [
{ "@id": "ex:alice", "schema:email": "alice.old@example.org" }
],
"delete": [
{ "@id": "ex:alice", "schema:email": "alice.old@example.org" }
],
"insert": [
{ "@id": "ex:alice", "schema:email": "alice.new@example.org" }
]
}
Add Value
{
"insert": [
{ "@id": "ex:alice", "schema:email": "alice.work@example.org" }
]
}
Remove One Value
{
"where": [
{ "@id": "ex:alice", "schema:email": "alice.old@example.org" }
],
"delete": [
{ "@id": "ex:alice", "schema:email": "alice.old@example.org" }
]
}
Remove All Values
{
"where": [
{ "@id": "ex:alice", "schema:email": "?email" }
],
"delete": [
{ "@id": "ex:alice", "schema:email": "?email" }
]
}
Relationship Updates
Change Relationship
{
"where": [
{ "@id": "ex:alice", "schema:worksFor": "?oldCompany" }
],
"delete": [
{ "@id": "ex:alice", "schema:worksFor": "?oldCompany" }
],
"insert": [
{ "@id": "ex:alice", "schema:worksFor": "ex:company-b" }
]
}
Add Relationship
{
"insert": [
{ "@id": "ex:alice", "schema:knows": "ex:bob" }
]
}
Remove Relationship
{
"where": [
{ "@id": "ex:alice", "schema:knows": "ex:bob" }
],
"delete": [
{ "@id": "ex:alice", "schema:knows": "ex:bob" }
]
}
Complex Updates
Cascading Updates
Update related entities:
{
"where": [
{ "@id": "ex:order-123", "ex:status": "?oldStatus" },
{ "@id": "ex:order-123", "ex:items": "?item" },
{ "@id": "?item", "ex:status": "?itemStatus" }
],
"delete": [
{ "@id": "ex:order-123", "ex:status": "?oldStatus" },
{ "@id": "?item", "ex:status": "?itemStatus" }
],
"insert": [
{ "@id": "ex:order-123", "ex:status": "shipped" },
{ "@id": "?item", "ex:status": "shipped" }
]
}
Computed Values
Calculate new values based on old:
{
"where": [
{ "@id": "ex:product-123", "ex:inventory": "?current" },
{ "@id": "ex:product-123", "ex:sold": "?sold" },
["bind", "?newInventory", ["-", "?current", "?sold"]]
],
"delete": [
{ "@id": "ex:product-123", "ex:inventory": "?current" }
],
"insert": [
{ "@id": "ex:product-123", "ex:inventory": "?newInventory" }
]
}
Error Handling
No Match
If WHERE doesn’t match, nothing happens (not an error):
{
"where": [
{ "@id": "ex:nonexistent", "schema:name": "?name" }
],
"delete": [...],
"insert": [...]
}
Result: No changes, no error.
Multiple Matches
If WHERE matches multiple entities, all are updated:
{
"where": [
{ "@id": "?person", "ex:status": "pending" }
],
"delete": [
{ "@id": "?person", "ex:status": "pending" }
],
"insert": [
{ "@id": "?person", "ex:status": "approved" }
]
}
Updates ALL entities with status=“pending”.
Comparison: WHERE/DELETE/INSERT vs Replace Mode
| Feature | WHERE/DELETE/INSERT | Replace Mode |
|---|---|---|
| Granularity | Property-level | Entity-level |
| Other properties | Preserved | Removed |
| Conditional | Yes (with filters) | No |
| Pattern matching | Yes | No |
| Idempotent | Depends on logic | Yes |
| Use case | Partial updates | Complete replacement |
Best Practices
1. Be Specific in WHERE
Good (specific):
{
"where": [
{ "@id": "ex:alice", "schema:age": "?oldAge" }
]
}
Risky (might match many):
{
"where": [
{ "@id": "?person", "schema:age": "?age" }
]
}
2. Always Use Variables
Use variables from WHERE in DELETE:
Good:
{
"where": [{ "@id": "ex:alice", "schema:age": "?oldAge" }],
"delete": [{ "@id": "ex:alice", "schema:age": "?oldAge" }]
}
Bad (`?age` is never bound in WHERE, so the delete does not target the matched value):
{
"where": [{ "@id": "ex:alice", "schema:age": "?oldAge" }],
"delete": [{ "@id": "ex:alice", "schema:age": "?age" }]
}
3. Test Updates
Test on development data first:
// Test update logic
const result = await transact(updateQuery);
console.log(`Updated ${result.flakes_retracted} values`);
4. Use Filters for Safety
Add filters to prevent unintended updates:
{
"where": [
"...",
["filter", ["and", [">=", "?age", 0], ["<=", "?age", 150]]]
],
"delete": [...],
"insert": [...]
}
5. Handle No Matches
Decide if no matches should be an error in your application:
const result = await transact(updateQuery);
if (result.flakes_retracted === 0) {
console.warn('Update matched no entities');
}
6. Document Complex Updates
Comment complex update logic:
// Update inventory after sale completion
// - Decrement stock by sold quantity
// - Update last-sold timestamp
// - Mark as low-stock if below threshold
const updateInventory = { ... };
Performance Considerations
Index Usage
WHERE clauses use indexes:
- Subject-based: Fast
- Predicate-based: Fast
- Pattern-based: May be slower
Batch Updates
For many updates, consider batching:
// One transaction per entity; each loop iteration creates its own commit.
// To reduce commit overhead, you can instead merge the payloads into a single update transaction.
const updates = entities.map(e => createUpdateQuery(e));
for (const update of updates) {
  await transact(update);
}
Related Documentation
- Conditional updates (atomic / CAS patterns) - Increment, compare-and-swap, state machines, transfers
- Insert - Adding new data
- Upsert - Replace mode
- Retractions - Removing data
- Overview - Transaction overview
- Query WHERE Clauses - WHERE pattern syntax
Conditional Updates (Atomic / Compare-and-Swap Patterns)
Fluree’s WHERE/DELETE/INSERT transaction model supports powerful conditional update patterns that depend on the current database state. Every operation runs atomically within a single transaction — the WHERE clause reads current state, and the DELETE/INSERT templates modify it, all as one unit.
This guide covers common patterns for state-dependent updates with both JSON-LD and SPARQL UPDATE syntax.
Key Concept: How Conditional Updates Work
┌──────────────────────────────────────────────────────┐
│ 1. WHERE — query current state, bind variables │
│ 2. FILTER — guard: eliminate rows that don't pass │
│ 3. BIND — compute new values from bound vars │
│ 4. DELETE — retract matched triples │
│ 5. INSERT — assert new triples │
│ │
│ All steps execute atomically in one transaction. │
│ If WHERE returns zero rows, nothing happens (no-op).│
└──────────────────────────────────────────────────────┘
The WHERE clause runs against the current database state. If it matches, the bound variables flow into DELETE (to retract old values) and INSERT (to assert new ones). If WHERE returns zero rows — because a FILTER eliminated them or a pattern didn’t match — DELETE is skipped entirely (nothing to retract) and INSERT templates with unbound variables produce zero flakes.
Two INSERT behaviors
- INSERT with variables from WHERE (e.g., `"@id": "?s"`) — conditional. When WHERE returns zero rows, the variable is unbound and the INSERT produces nothing. Use this for CAS, state machines, and guards.
- All-literal INSERT (e.g., `"@id": "ex:alice"`) — unconditional. Fires even when WHERE returns zero rows. Use this for “delete-if-exists, always insert” patterns.
1. Atomic Increment / Decrement
Read the current value, compute a new one, write it back — all in one transaction. Classic use cases: counters, inventory quantities, vote tallies, loyalty points.
JSON-LD
{
"@context": { "ex": "http://example.org/ns/" },
"where": [
{ "@id": "ex:counter", "ex:count": "?old" },
["bind", "?new", "(+ ?old 1)"]
],
"delete": { "@id": "ex:counter", "ex:count": "?old" },
"insert": { "@id": "ex:counter", "ex:count": "?new" }
}
SPARQL UPDATE
PREFIX ex: <http://example.org/ns/>
DELETE { ex:counter ex:count ?old }
INSERT { ex:counter ex:count ?new }
WHERE {
ex:counter ex:count ?old .
BIND (?old + 1 AS ?new)
}
Variations
- Decrement: `["bind", "?new", "(- ?old 1)"]`
- Increment by N: `["bind", "?new", "(+ ?old 50)"]`
- Multiply: `["bind", "?new", "(* ?old 2)"]`
2. Compare-and-Swap (Optimistic Concurrency)
Only update if the current value matches what the client last read. If another transaction changed the data since the read, the WHERE won’t match and the update is a no-op. This is the foundation of optimistic concurrency control.
JSON-LD
{
"@context": { "ex": "http://example.org/ns/" },
"where": { "@id": "?s", "ex:version": 1, "ex:price": "?oldPrice" },
"delete": { "@id": "?s", "ex:version": 1, "ex:price": "?oldPrice" },
"insert": { "@id": "?s", "ex:version": 2, "ex:price": 24.99 }
}
How it works:
- Client reads `ex:item` and sees `version: 1, price: 19.99`
- Client submits update pinning `version: 1` in WHERE
- If version is still 1 → match → update succeeds, version bumps to 2
- If another client already changed version to 2 → no match → no-op
SPARQL UPDATE
PREFIX ex: <http://example.org/ns/>
DELETE { ?s ex:version 1 . ?s ex:price ?oldPrice }
INSERT { ?s ex:version 2 . ?s ex:price 24.99 }
WHERE {
?s ex:version 1 ;
ex:price ?oldPrice .
}
Application-Level Handling
When a CAS update is a no-op (stale read), the client can detect this by checking whether t advanced:
- `response.t == request.t_before` → stale read, retry with fresh data
- `response.t > request.t_before` → update succeeded
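A minimal sketch of this check in JavaScript, assuming the receipt fields described under Commit Receipts and tx-id and that a failed guard leaves the flake counts at zero (exact no-op behavior may vary by server version):
// lastSeenT is the t your application recorded when it last read the data
async function casUpdate(payload, lastSeenT) {
  const res = await fetch("http://localhost:8090/v1/fluree/update?ledger=mydb:main", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload)
  });
  const receipt = await res.json();
  const changed = (receipt.flakes_added ?? 0) + (receipt.flakes_retracted ?? 0) > 0;
  if (!changed || receipt.t === lastSeenT) {
    return { ok: false, reason: "stale read, retry with fresh data" };
  }
  return { ok: true, t: receipt.t };
}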
3. State Machine Transitions
Only allow transitions from a valid source state. Invalid transitions (e.g., trying shipped → delivered when the current state is pending) are silently rejected as no-ops.
JSON-LD
{
"@context": { "ex": "http://example.org/ns/" },
"where": { "@id": "?order", "ex:status": "pending" },
"delete": { "@id": "?order", "ex:status": "pending" },
"insert": { "@id": "?order", "ex:status": "approved" }
}
This only fires when the order’s current status is exactly "pending". If the status is anything else, the WHERE returns zero rows and nothing changes.
Multi-Step Chain
Chain transitions across sequential transactions:
pending → approved → shipped → delivered
Each step is its own transaction, each guarded by the expected source state. If any step finds the state has already moved (e.g., another process approved it), that step is a no-op.
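A sketch of driving the chain from client code, reusing the transact() placeholder helper from the earlier examples (the payload shape is the same guarded update shown above):
const transition = (from, to) => ({
  "@context": { ex: "http://example.org/ns/" },
  where:  { "@id": "?order", "ex:status": from },
  delete: { "@id": "?order", "ex:status": from },
  insert: { "@id": "?order", "ex:status": to }
});
// Each call is its own transaction; a step whose source state has already
// moved on simply matches nothing and becomes a no-op.
await transact(transition("pending", "approved"));
await transact(transition("approved", "shipped"));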
SPARQL UPDATE
PREFIX ex: <http://example.org/ns/>
DELETE { ?order ex:status "pending" }
INSERT { ?order ex:status "approved" }
WHERE { ?order ex:status "pending" }
4. Guarded Update (Threshold / Precondition)
Only apply a change when a numeric (or other) precondition is met. Classic use case: prevent overdrafts by checking balance before deducting.
JSON-LD
{
"@context": { "ex": "http://example.org/ns/" },
"where": [
{ "@id": "ex:account", "ex:balance": "?bal" },
["filter", "(>= ?bal 100)"],
["bind", "?newBal", "(- ?bal 100)"]
],
"delete": { "@id": "ex:account", "ex:balance": "?bal" },
"insert": { "@id": "ex:account", "ex:balance": "?newBal" }
}
How it works:
- If `balance >= 100` → FILTER passes → deduction applied
- If `balance < 100` → FILTER eliminates the row → no-op (overdraft prevented)
SPARQL UPDATE
PREFIX ex: <http://example.org/ns/>
DELETE { ex:account ex:balance ?bal }
INSERT { ex:account ex:balance ?newBal }
WHERE {
ex:account ex:balance ?bal .
FILTER (?bal >= 100)
BIND (?bal - 100 AS ?newBal)
}
5. Atomic Transfer (Double-Entry)
Move a value between two entities atomically in a single transaction. Both the debit and credit happen together — if the guard fails, neither side is modified. Classic use cases: balance transfers, inventory moves between warehouses.
JSON-LD
{
"@context": { "ex": "http://example.org/ns/" },
"where": [
{ "@id": "ex:alice-acct", "ex:balance": "?aliceBal" },
{ "@id": "ex:bob-acct", "ex:balance": "?bobBal" },
["filter", "(>= ?aliceBal 150)"],
["bind", "?newAlice", "(- ?aliceBal 150)",
"?newBob", "(+ ?bobBal 150)"]
],
"delete": [
{ "@id": "ex:alice-acct", "ex:balance": "?aliceBal" },
{ "@id": "ex:bob-acct", "ex:balance": "?bobBal" }
],
"insert": [
{ "@id": "ex:alice-acct", "ex:balance": "?newAlice" },
{ "@id": "ex:bob-acct", "ex:balance": "?newBob" }
]
}
SPARQL UPDATE
PREFIX ex: <http://example.org/ns/>
DELETE {
ex:alice-acct ex:balance ?aliceBal .
ex:bob-acct ex:balance ?bobBal .
}
INSERT {
ex:alice-acct ex:balance ?newAlice .
ex:bob-acct ex:balance ?newBob .
}
WHERE {
ex:alice-acct ex:balance ?aliceBal .
ex:bob-acct ex:balance ?bobBal .
FILTER (?aliceBal >= 150)
BIND (?aliceBal - 150 AS ?newAlice)
BIND (?bobBal + 150 AS ?newBob)
}
6. Insert-If-Not-Exists (Conditional Create)
Create an entity only if it doesn’t already exist. Useful for preventing duplicate records.
This pattern uses OPTIONAL + FILTER to check for absence.
JSON-LD
{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"where": [
["optional", { "@id": "ex:bob", "schema:name": "?existing" }],
["filter", "(not (bound ?existing))"]
],
"insert": {
"@id": "ex:bob",
"schema:name": "Bob",
"schema:age": 25
}
}
How it works:
- If `ex:bob` does not exist: OPTIONAL leaves `?existing` unbound → `(not (bound ?existing))` is true → INSERT fires
- If `ex:bob` exists: OPTIONAL binds `?existing` → `(not (bound ?existing))` is false → FILTER eliminates the row → INSERT is skipped (zero solution rows = zero template instantiations)
SPARQL UPDATE
PREFIX ex: <http://example.org/ns/>
PREFIX schema: <http://schema.org/>
INSERT { ex:bob schema:name "Bob" ; schema:age 25 }
WHERE {
OPTIONAL { ex:bob schema:name ?existing }
FILTER (!BOUND(?existing))
}
7. Capped Accumulator (Increment with Ceiling)
Increment a value but never exceed a maximum. Useful for loyalty points, rate limits, or any bounded counter.
JSON-LD
{
"@context": { "ex": "http://example.org/ns/" },
"where": [
{ "@id": "ex:user", "ex:points": "?pts" },
["filter", "(< ?pts 1000)"],
["bind", "?new", "(if (> (+ ?pts 150) 1000) 1000 (+ ?pts 150))"]
],
"delete": { "@id": "ex:user", "ex:points": "?pts" },
"insert": { "@id": "ex:user", "ex:points": "?new" }
}
How it works:
- If `pts < 1000` → FILTER passes → BIND computes `min(pts + 150, 1000)` → update applied
- If `pts >= 1000` → FILTER eliminates → no-op (already at cap)
SPARQL UPDATE
PREFIX ex: <http://example.org/ns/>
DELETE { ex:user ex:points ?pts }
INSERT { ex:user ex:points ?new }
WHERE {
ex:user ex:points ?pts .
FILTER (?pts < 1000)
BIND (IF(?pts + 150 > 1000, 1000, ?pts + 150) AS ?new)
}
8. Cascading / Dependent Update (Graph Traversal)
Update one entity based on values from a related entity. The WHERE clause traverses the graph to gather data from multiple nodes.
JSON-LD
{
"@context": { "ex": "http://example.org/ns/" },
"where": [
{ "@id": "ex:order1", "ex:customer": "?cust", "ex:total": "?orderTotal" },
{ "@id": "?cust", "ex:lifetimeSpend": "?ls" },
["bind", "?newLs", "(+ ?ls ?orderTotal)"]
],
"delete": { "@id": "?cust", "ex:lifetimeSpend": "?ls" },
"insert": { "@id": "?cust", "ex:lifetimeSpend": "?newLs" }
}
This traverses order → customer and accumulates the order total into the customer’s lifetime spend — all atomically.
SPARQL UPDATE
PREFIX ex: <http://example.org/ns/>
DELETE { ?cust ex:lifetimeSpend ?ls }
INSERT { ?cust ex:lifetimeSpend ?newLs }
WHERE {
ex:order1 ex:customer ?cust ;
ex:total ?orderTotal .
?cust ex:lifetimeSpend ?ls .
BIND (?ls + ?orderTotal AS ?newLs)
}
9. Batch Conditional Update (Multi-Entity)
Apply the same transformation to every entity matching a pattern. The WHERE clause acts as a filter across the dataset.
Give All Engineers a 10% Raise
JSON-LD
{
"@context": { "ex": "http://example.org/ns/" },
"where": [
{ "@id": "?emp", "ex:dept": "engineering", "ex:salary": "?sal" },
["bind", "?newSal", "(+ ?sal (/ ?sal 10))"]
],
"delete": { "@id": "?emp", "ex:salary": "?sal" },
"insert": { "@id": "?emp", "ex:salary": "?newSal" }
}
SPARQL UPDATE
PREFIX ex: <http://example.org/ns/>
DELETE { ?emp ex:salary ?sal }
INSERT { ?emp ex:salary ?newSal }
WHERE {
?emp ex:dept "engineering" ;
ex:salary ?sal .
BIND (?sal + ?sal / 10 AS ?newSal)
}
Batch Status Change
Approve all pending tasks in one transaction:
{
"@context": { "ex": "http://example.org/ns/" },
"where": { "@id": "?task", "ex:status": "pending" },
"delete": { "@id": "?task", "ex:status": "pending" },
"insert": { "@id": "?task", "ex:status": "approved" }
}
Only entities with status: "pending" are affected; all others remain untouched.
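If you want to know how many tasks the batch touched, the commit receipt's flake counts give a rough answer, assuming each task carries exactly one ex:status value. A sketch using the update endpoint shown earlier:
const res = await fetch("http://localhost:8090/v1/fluree/update?ledger=mydb:main", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    "@context": { ex: "http://example.org/ns/" },
    where:  { "@id": "?task", "ex:status": "pending" },
    delete: { "@id": "?task", "ex:status": "pending" },
    insert: { "@id": "?task", "ex:status": "approved" }
  })
});
const receipt = await res.json();
console.log(`Approved roughly ${receipt.flakes_retracted} pending tasks`);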
10. Update with Audit Trail
Change a value and simultaneously record the old value for auditing — in a single atomic transaction.
JSON-LD
{
"@context": { "ex": "http://example.org/ns/" },
"where": [
{ "@id": "ex:product", "ex:price": "?oldPrice" },
["bind", "?newPrice", "(- ?oldPrice 10)"]
],
"delete": { "@id": "ex:product", "ex:price": "?oldPrice" },
"insert": {
"@id": "ex:product",
"ex:price": "?newPrice",
"ex:previousPrice": "?oldPrice"
}
}
After the update, the product has both its new price and a record of the previous price.
Note: Fluree’s immutable ledger also preserves full history via time travel, so you can always query any prior state. This pattern is useful when you want the previous value accessible without time-travel queries.
Pattern Summary
| Pattern | WHERE Matches | FILTER | BIND | Effect on No-Match |
|---|---|---|---|---|
| Atomic increment | Current value | — | Compute new value | No-op |
| Compare-and-swap | Expected value | — | — | No-op (stale read) |
| State machine | Expected state | — | — | No-op (invalid transition) |
| Guarded update | Current value | Threshold check | Compute new value | No-op (guard failed) |
| Atomic transfer | Both accounts | Sender balance | Both new balances | No-op (insufficient) |
| Insert-if-not-exists | OPTIONAL probe | not bound | — | No-op (already exists) |
| Capped accumulator | Current value | Below cap | Min(new, cap) | No-op (at cap) |
| Cascading update | Graph traversal | — | Derived value | No-op (path broken) |
| Batch update | All matching | — | Per-entity transform | Only matching entities |
| Audit trail | Current value | — | New value | No-op |
Best Practices
- Prefer pattern matching over FILTER for equality. Pinning a value in the WHERE pattern (e.g., `"ex:status": "pending"`) is simpler and more efficient than `["filter", "(= ?st \"pending\")"]`.
- Check `t` to detect no-ops. When your application needs to distinguish between “update succeeded” and “condition not met,” compare `t` before and after the transaction.
- Use BIND for all computed values. The `["bind", "?var", "(expression)"]` form keeps computation inside the transaction, ensuring atomicity.
- Use OPTIONAL + FILTER for absence checks. The `["optional", ...], ["filter", "(not (bound ?var))"]` pattern is the idiomatic way to test for non-existence.
- Leverage Fluree’s immutability. Every transaction creates an immutable commit. Even without explicit audit trail patterns, you can always query previous states using time travel. Use the audit trail pattern when you want the old value readily accessible in the current state.
Related Documentation
- Update (WHERE/DELETE/INSERT) — Core syntax reference
- Insert — Adding new data
- Upsert — Replace mode
- Time travel — Querying historical states
Retractions
Retractions remove data from Fluree. While data is never truly deleted (it remains in history), retractions mark triples as no longer current.
What is a Retraction?
A retraction removes a triple from the current state:
- The triple existed at some point (was asserted)
- The retraction marks it as no longer true
- Historical queries can still see the triple
- Current queries don’t see the triple
Basic Retraction
Remove a specific triple:
{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"where": [
{ "@id": "ex:alice", "schema:age": "?age" }
],
"delete": [
{ "@id": "ex:alice", "schema:age": "?age" }
]
}
This removes the age property from ex:alice.
Retract Specific Property
Remove a specific property value:
curl -X POST "http://localhost:8090/v1/fluree/update?ledger=mydb:main" \
-H "Content-Type: application/json" \
-d '{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"where": [
{ "@id": "ex:alice", "schema:email": "alice.old@example.org" }
],
"delete": [
{ "@id": "ex:alice", "schema:email": "alice.old@example.org" }
]
}'
Retract All Values of a Property
Remove all values for a property:
{
"where": [
{ "@id": "ex:alice", "schema:telephone": "?phone" }
],
"delete": [
{ "@id": "ex:alice", "schema:telephone": "?phone" }
]
}
If ex:alice has multiple phone numbers, this removes them all.
Retract Multiple Properties
Remove several properties at once:
{
"where": [
{ "@id": "ex:alice", "schema:email": "?email" },
{ "@id": "ex:alice", "schema:telephone": "?phone" },
{ "@id": "ex:alice", "ex:preferences": "?prefs" }
],
"delete": [
{ "@id": "ex:alice", "schema:email": "?email" },
{ "@id": "ex:alice", "schema:telephone": "?phone" },
{ "@id": "ex:alice", "ex:preferences": "?prefs" }
]
}
Retract Entire Entity
Remove all triples for an entity:
{
"where": [
{ "@id": "ex:alice", "?predicate": "?value" }
],
"delete": [
{ "@id": "ex:alice", "?predicate": "?value" }
]
}
This finds all triples where ex:alice is the subject and retracts them all.
Result: Entity is “deleted” from current state (but remains in history).
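Because the pattern is the same for any entity, it is easy to wrap in a helper. A sketch using the same transact() placeholder helper as the other examples in this guide:
// Retract every current triple whose subject is `id` (history is preserved)
function retractEntity(id) {
  return transact({
    "@context": { ex: "http://example.org/ns/" },
    where: [{ "@id": id, "?predicate": "?value" }],
    delete: [{ "@id": id, "?predicate": "?value" }]
  });
}
await retractEntity("ex:alice");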
Conditional Retractions
Retract only if conditions are met:
{
  "where": [
    { "@id": "?user", "@type": "schema:Person" },
    { "@id": "?user", "ex:lastLogin": "?lastLogin" },
    { "@id": "?user", "ex:status": "inactive" },
    ["filter", ["<", "?lastLogin", "2023-01-01"]],
    { "@id": "?user", "?predicate": "?value" }
  ],
  "delete": [
    { "@id": "?user", "?predicate": "?value" }
  ]
}
Removes all inactive users who haven’t logged in since 2023.
Retract Relationships
Remove Single Relationship
{
"where": [
{ "@id": "ex:alice", "schema:knows": "ex:bob" }
],
"delete": [
{ "@id": "ex:alice", "schema:knows": "ex:bob" }
]
}
Remove All Relationships of a Type
{
"where": [
{ "@id": "ex:alice", "schema:knows": "?person" }
],
"delete": [
{ "@id": "ex:alice", "schema:knows": "?person" }
]
}
Bidirectional Relationship Removal
Remove relationship in both directions:
{
"where": [
{ "@id": "ex:alice", "schema:knows": "ex:bob" },
{ "@id": "ex:bob", "schema:knows": "ex:alice" }
],
"delete": [
{ "@id": "ex:alice", "schema:knows": "ex:bob" },
{ "@id": "ex:bob", "schema:knows": "ex:alice" }
]
}
Cascading Retractions
Retract an entity and all related entities:
{
"where": [
{ "@id": "ex:order-123", "ex:items": "?item" },
{ "@id": "?item", "?itemPred": "?itemVal" },
{ "@id": "ex:order-123", "?orderPred": "?orderVal" }
],
"delete": [
{ "@id": "?item", "?itemPred": "?itemVal" },
{ "@id": "ex:order-123", "?orderPred": "?orderVal" }
]
}
Deletes order and all its items.
Soft Delete vs Hard Retraction
Soft Delete (Recommended)
Mark as deleted without retracting:
{
"where": [
{ "@id": "ex:alice", "ex:status": "?status" }
],
"delete": [
{ "@id": "ex:alice", "ex:status": "?status" }
],
"insert": [
{ "@id": "ex:alice", "ex:status": "deleted" },
{ "@id": "ex:alice", "ex:deletedAt": "2024-01-22T10:30:00Z" }
]
}
Benefits:
- Easy to “undelete” (see the sketch below)
- Audit trail of deletion
- Can query deleted entities
- Less impact on indexes
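Undeleting is then just another update. A sketch using the same transact() placeholder helper, with the property names from the example above:
await transact({
  "@context": { ex: "http://example.org/ns/" },
  where: [
    { "@id": "ex:alice", "ex:status": "deleted" },
    { "@id": "ex:alice", "ex:deletedAt": "?when" }
  ],
  delete: [
    { "@id": "ex:alice", "ex:status": "deleted" },
    { "@id": "ex:alice", "ex:deletedAt": "?when" }
  ],
  insert: [{ "@id": "ex:alice", "ex:status": "active" }]
});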
Hard Retraction
Retract all data:
{
"where": [
{ "@id": "ex:alice", "?predicate": "?value" }
],
"delete": [
{ "@id": "ex:alice", "?predicate": "?value" }
]
}
When to use:
- Legal requirement to remove data
- Sensitive data that must be removed
- Test data cleanup
Note: Data still exists in history. For true deletion, see data purging operations.
Pattern-Based Retractions
Retract by Type
Remove all entities of a type:
{
"where": [
{ "@id": "?entity", "@type": "ex:TempData" },
{ "@id": "?entity", "?predicate": "?value" }
],
"delete": [
{ "@id": "?entity", "?predicate": "?value" }
]
}
Retract by Property Value
Remove entities with specific property:
{
"where": [
{ "@id": "?entity", "ex:expired": true },
{ "@id": "?entity", "?predicate": "?value" }
],
"delete": [
{ "@id": "?entity", "?predicate": "?value" }
]
}
Retraction Semantics
Idempotent
Retracting a non-existent triple is a no-op:
t=1: No triple exists
t=2: DELETE { ex:alice schema:age 30 }
Result: No change (triple didn't exist)
No Cascading by Default
Retracting an entity doesn’t automatically retract references to it:
t=1: ex:alice schema:worksFor ex:company-a
ex:company-a schema:name "Acme"
t=2: DELETE all triples for ex:company-a
Result:
- ex:company-a properties are gone
- ex:alice schema:worksFor ex:company-a REMAINS
- Reference is now "dangling"
To cascade, explicitly match and delete references.
Time Travel and Retractions
Historical Queries See Retracted Data
# Current query (after retraction at t=5)
curl -X POST http://localhost:8090/v1/fluree/query \
-d '{"from": "mydb:main", "select": ["?name"], ...}'
# Returns: [] (no results)
# Historical query (before retraction)
curl -X POST http://localhost:8090/v1/fluree/query \
-d '{"from": "mydb:main@t:3", "select": ["?name"], ...}'
# Returns: [{"name": "Alice"}] (data visible)
History Shows Retractions
Query the history to see both assertions and retractions:
curl -X POST http://localhost:8090/v1/fluree/query \
-d '{
"@context": { "schema": "http://schema.org/" },
"from": "mydb:main@t:1",
"to": "mydb:main@t:latest",
"select": ["?name", "?t", "?op"],
"where": [
{ "@id": "ex:alice", "schema:name": { "@value": "?name", "@t": "?t", "@op": "?op" } }
],
"orderBy": "?t"
}'
Response:
[
["Alice", 1, true],
["Alice", 5, false]
]
The @t annotation captures the transaction time and @op binds a boolean — true for assertions, false for retractions (mirroring Flake.op on disk).
Error Handling
Common Errors
No Match (Not an Error):
{
"where": [{ "@id": "ex:nonexistent", "schema:name": "?name" }],
"delete": [{ "@id": "ex:nonexistent", "schema:name": "?name" }]
}
Result: No changes, no error.
Invalid Pattern:
{
"error": "QueryError",
"message": "Invalid WHERE pattern",
"code": "INVALID_PATTERN"
}
Performance Considerations
Index Updates
Retractions update all indexes:
- Each retracted triple updates SPOT, POST, OPST, PSOT
- Large retractions can impact performance
- Consider batch size for bulk deletions
Indexing Lag
Large retractions may increase indexing lag:
- Monitor `commit_t - index_t`
- Allow time for indexing between large retractions
- Consider scheduling during low-traffic periods
Vacuum/Compaction
Eventually, consider compaction to reclaim space from retracted data (implementation-specific).
Best Practices
1. Use Soft Deletes
Prefer marking as deleted:
Good:
{
"insert": [{ "@id": "ex:alice", "ex:status": "deleted" }]
}
Over:
{
"delete": [{ "@id": "ex:alice", "?pred": "?val" }]
}
2. Add Audit Metadata
Include deletion metadata:
{
"insert": [
{ "@id": "ex:alice", "ex:status": "deleted" },
{ "@id": "ex:alice", "ex:deletedAt": "2024-01-22T10:30:00Z" },
{ "@id": "ex:alice", "ex:deletedBy": "user-admin" },
{ "@id": "ex:alice", "ex:deleteReason": "User request" }
]
}
3. Be Specific in WHERE
Avoid accidentally retracting too much:
Good:
{
"where": [{ "@id": "ex:alice", "schema:age": "?age" }],
"delete": [{ "@id": "ex:alice", "schema:age": "?age" }]
}
Dangerous:
{
"where": [{ "@id": "?entity", "schema:age": "?age" }],
"delete": [{ "@id": "?entity", "?pred": "?val" }]
}
4. Test Retractions
Test on development data:
// Count matching entities before the retraction
// (assumes the query helper returns rows with a numeric ?count binding)
const before = await query('SELECT (COUNT(?e) AS ?count) WHERE { ... }');
// Retract
await transact(retractionQuery);
// Count again and compare
const after = await query('SELECT (COUNT(?e) AS ?count) WHERE { ... }');
console.log(`Retracted ${before[0].count - after[0].count} entities`);
5. Handle Cascading Explicitly
Don’t rely on cascading—make it explicit:
{
"where": [
{ "@id": "ex:order-123", "?pred": "?val" },
{ "@id": "?item", "ex:orderId": "ex:order-123" },
{ "@id": "?item", "?itemPred": "?itemVal" }
],
"delete": [
{ "@id": "ex:order-123", "?pred": "?val" },
{ "@id": "?item", "?itemPred": "?itemVal" }
]
}
6. Document Deletion Logic
Comment deletion logic:
// Hard delete expired sessions older than 30 days
// - Finds all sessions with expired=true and oldDate
// - Retracts all properties
// - Logs count of deleted sessions
await retractExpiredSessions();
7. Monitor Impact
Track retraction metrics:
- Count of retractions
- Entities affected
- Indexing lag after large retractions
- Query performance impact
Data Privacy Compliance
GDPR “Right to be Forgotten”
For compliance, consider:
- Soft delete first (marks as deleted)
- Schedule purge (actual removal from history)
- Anonymize references (replace with pseudonymous ID)
Example:
{
"where": [{ "@id": "ex:user-123", "?pred": "?val" }],
"delete": [{ "@id": "ex:user-123", "?pred": "?val" }],
"insert": [{
"@id": "ex:user-123",
"ex:anonymized": true,
"ex:anonymizedAt": "2024-01-22T10:30:00Z"
}]
}
Note: True purging from history requires administrative operations beyond standard retractions.
Related Documentation
- Insert - Adding data
- Update - Updating data
- Time Travel - Historical queries
- History Queries - Viewing changes over time
Turtle and TriG Ingest
Fluree supports ingesting RDF data in Turtle (Terse RDF Triple Language) and TriG formats. Turtle is a compact, human-readable format for RDF triples, while TriG extends Turtle to support named graphs.
What is Turtle?
Turtle is a W3C standard format for writing RDF triples. It’s more readable than XML-based formats and commonly used in the Semantic Web community.
Example Turtle:
@prefix ex: <http://example.org/ns/> .
@prefix schema: <http://schema.org/> .
ex:alice a schema:Person ;
schema:name "Alice" ;
schema:email "alice@example.org" ;
schema:age 30 .
ex:bob a schema:Person ;
schema:name "Bob" ;
schema:email "bob@example.org" .
Transaction Endpoints
Fluree supports Turtle and TriG on different endpoints with different semantics:
| Endpoint | Turtle (text/turtle) | TriG (application/trig) |
|---|---|---|
/insert | Supported (fast direct path) | Not supported (400 error) |
/upsert | Supported | Supported |
- Insert (`/insert`): Pure insert semantics. Uses fast direct flake parsing. Will fail if subjects already exist with conflicting data. TriG is not supported because named graphs require the upsert path for GRAPH block extraction.
- Upsert (`/upsert`): For each (subject, predicate) pair, existing values are retracted before new values are asserted. Supports TriG with GRAPH blocks for named graph ingestion.
Basic Turtle Transaction
Submit Turtle data via HTTP API:
# Insert (pure insert, fast path)
curl -X POST "http://localhost:8090/v1/fluree/insert?ledger=mydb:main" \
-H "Content-Type: text/turtle" \
--data-binary '@data.ttl'
# Or upsert (replace existing values)
curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
-H "Content-Type: text/turtle" \
--data-binary '@data.ttl'
File: data.ttl
@prefix ex: <http://example.org/ns/> .
@prefix schema: <http://schema.org/> .
ex:alice a schema:Person ;
schema:name "Alice" ;
schema:email "alice@example.org" .
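The same upsert can be sent from a JavaScript application without converting the Turtle. A minimal Node.js sketch (assumes Node 18+ for the global fetch):
import { readFile } from 'node:fs/promises';

const turtle = await readFile('data.ttl', 'utf8');
const res = await fetch('http://localhost:8090/v1/fluree/upsert?ledger=mydb:main', {
  method: 'POST',
  headers: { 'Content-Type': 'text/turtle' },
  body: turtle
});
console.log(await res.json());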
Turtle Syntax
Prefixes
Define namespace prefixes:
@prefix ex: <http://example.org/ns/> .
@prefix schema: <http://schema.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
Basic Triples
ex:alice schema:name "Alice" .
ex:alice schema:age 30 .
ex:alice schema:email "alice@example.org" .
Semicolon Shorthand
Share subject across predicates:
ex:alice schema:name "Alice" ;
schema:age 30 ;
schema:email "alice@example.org" .
Equivalent to three separate triples.
Comma Shorthand
Share subject and predicate:
ex:alice schema:email "alice@example.org" ,
"alice@work.com" ,
"alice@personal.net" .
Creates three triples with same subject and predicate.
Type Shorthand
ex:alice a schema:Person .
Equivalent to:
ex:alice rdf:type schema:Person .
Literals
Plain String:
ex:alice schema:name "Alice" .
Typed Literal:
ex:alice schema:age "30"^^xsd:integer .
ex:alice schema:price "29.99"^^xsd:decimal .
ex:alice schema:birthDate "1994-05-15"^^xsd:date .
Language-Tagged:
ex:alice schema:name "Alice"@en .
ex:alice schema:name "アリス"@ja .
Boolean:
ex:alice schema:active true .
Numbers:
ex:alice schema:age 30 .
ex:alice schema:height 1.68 .
IRIs
Full IRI:
<http://example.org/ns/alice> schema:name "Alice" .
Prefixed IRI:
ex:alice schema:name "Alice" .
Blank Nodes
Anonymous:
ex:alice schema:address [
a schema:PostalAddress ;
schema:streetAddress "123 Main St" ;
schema:addressLocality "Springfield"
] .
Labeled:
ex:alice schema:address _:addr1 .
_:addr1 a schema:PostalAddress ;
schema:streetAddress "123 Main St" .
Collections
RDF Lists:
ex:alice schema:favoriteColors ( "red" "blue" "green" ) .
Equivalent to linked list structure in RDF.
Bulk Import
From File
curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
-H "Content-Type: text/turtle" \
--data-binary '@large-dataset.ttl'
From URL
# Fetch the remote Turtle document and pipe it into the upsert endpoint
curl -s "https://example.org/data.ttl" | \
  curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
  -H "Content-Type: text/turtle" \
  --data-binary @-
Streaming Large Files
For very large files, split into batches:
# Split large file
split -l 10000 large-dataset.ttl batch-
# Import batches
for file in batch-*; do
curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
-H "Content-Type: text/turtle" \
--data-binary "@$file"
sleep 1 # Allow indexing time
done
Complete Example
@prefix ex: <http://example.org/ns/> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
# Company
ex:company-a a schema:Organization ;
schema:name "Acme Corp" ;
schema:url <https://acme.example.com> ;
schema:foundingDate "2000-01-15"^^xsd:date .
# People
ex:alice a schema:Person ;
schema:name "Alice" ;
schema:email "alice@example.org" , "alice@work.com" ;
schema:age 30 ;
schema:worksFor ex:company-a ;
schema:address [
a schema:PostalAddress ;
schema:streetAddress "123 Main St" ;
schema:addressLocality "Springfield" ;
schema:postalCode "12345"
] .
ex:bob a schema:Person ;
schema:name "Bob" ;
schema:email "bob@example.org" ;
schema:age 25 ;
schema:worksFor ex:company-a ;
schema:knows ex:alice .
ex:carol a schema:Person ;
schema:name "Carol" ;
schema:email "carol@example.org" ;
schema:knows ex:alice , ex:bob .
Format Conversion
From JSON-LD to Turtle
Many tools can convert between formats:
# Using rapper (from Redland)
rapper -i json-ld -o turtle data.jsonld > data.ttl
# Using riot (from Apache Jena)
riot --output=turtle data.jsonld > data.ttl
From RDF/XML to Turtle
rapper -i rdfxml -o turtle data.rdf > data.ttl
From N-Triples to Turtle
rapper -i ntriples -o turtle data.nt > data.ttl
Validation
Validate Turtle syntax before importing:
# Using rapper
rapper -i turtle -c data.ttl
# Using riot
riot --validate data.ttl
Error Handling
Syntax Errors
{
"error": "ParseError",
"message": "Invalid Turtle syntax at line 5",
"code": "TURTLE_PARSE_ERROR",
"details": {
"line": 5,
"column": 12,
"token": "unexpected EOF"
}
}
Invalid IRIs
{
"error": "ValidationError",
"message": "Invalid IRI: not a valid URI",
"code": "INVALID_IRI",
"details": {
"iri": "not a uri",
"line": 8
}
}
Performance Tips
1. Use Batch Import
Import large datasets in batches of 10,000-100,000 triples.
2. Optimize Prefixes
Use short prefixes for efficiency:
Good:
@prefix ex: <http://example.org/ns/> .
ex:alice ex:name "Alice" .
Less efficient:
<http://example.org/ns/alice> <http://example.org/ns/name> "Alice" .
3. Monitor Memory
Large Turtle files consume memory during parsing. Split very large files.
4. Allow Indexing Time
After large imports, wait for indexing:
# Import
curl -X POST ... --data-binary '@batch.ttl'
# Wait for indexing
sleep 5
# Import next batch
curl -X POST ... --data-binary '@batch2.ttl'
Best Practices
1. Use Standard Vocabularies
Prefer well-known vocabularies:
@prefix schema: <http://schema.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
2. Include Types
Always specify entity types:
ex:alice a schema:Person ;
schema:name "Alice" .
3. Use Typed Literals
Be explicit about datatypes:
ex:alice schema:birthDate "1994-05-15"^^xsd:date ;
schema:age "30"^^xsd:integer ;
schema:height "1.68"^^xsd:decimal .
4. Document Namespaces
Comment your prefixes:
# Schema.org vocabulary for general entities
@prefix schema: <http://schema.org/> .
# Application-specific namespace
@prefix ex: <http://example.org/ns/> .
# Standard XSD datatypes
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
5. Validate Before Import
Always validate Turtle syntax:
rapper -i turtle -c data.ttl
6. Split Large Files
For files > 100MB, split into smaller batches.
7. Include Provenance
Add metadata about the import:
ex:dataset-import-2024-01-22 a ex:DatasetImport ;
schema:dateCreated "2024-01-22T10:00:00Z"^^xsd:dateTime ;
schema:author <https://example.org/users/admin> ;
ex:sourceFile "data-2024-01.ttl" ;
ex:recordCount 1234567 .
Comparing Formats
JSON-LD vs Turtle
JSON-LD:
- Native to Fluree
- Easy for JavaScript applications
- Verbose for large datasets
Turtle:
- More compact
- Standard in RDF community
- Better for bulk imports
- Requires conversion for JavaScript apps
When to Use Turtle
Use Turtle for:
- Large bulk imports
- Integration with RDF tools
- Data from Semantic Web sources
- Data exchange with RDF systems
Use JSON-LD for:
- Application integration
- Real-time transactions
- JavaScript/TypeScript apps
- REST API interactions
TriG Format (Named Graphs)
TriG extends Turtle to support named graphs. Each named graph groups triples under a graph IRI.
What is TriG?
TriG is a W3C standard format that adds named graph support to Turtle syntax. It allows you to partition data into logical groups that can be queried independently.
Basic TriG Syntax
@prefix ex: <http://example.org/ns/> .
@prefix schema: <http://schema.org/> .
# Default graph triples (no GRAPH block)
ex:company a schema:Organization ;
schema:name "Acme Corp" .
# Named graph for products
GRAPH <http://example.org/graphs/products> {
ex:widget a schema:Product ;
schema:name "Widget" ;
schema:price "29.99"^^xsd:decimal .
ex:gadget a schema:Product ;
schema:name "Gadget" ;
schema:price "49.99"^^xsd:decimal .
}
# Named graph for inventory
GRAPH <http://example.org/graphs/inventory> {
ex:widget schema:inventory 42 ;
schema:warehouse "main" .
ex:gadget schema:inventory 15 ;
schema:warehouse "secondary" .
}
Submitting TriG Data
TriG is only supported on the upsert endpoint (or transact). Use the application/trig content type:
# TriG requires upsert (for named graph support)
curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
-H "Content-Type: application/trig" \
--data-binary '@data.trig'
TriG on the /insert endpoint will return a 400 error because named graph extraction requires the upsert path.
Querying Named Graphs
After ingesting TriG data, query specific graphs using JSON-LD with the structured from object:
{
"@context": { "schema": "http://schema.org/" },
"from": {
"@id": "mydb:main",
"graph": "http://example.org/graphs/products"
},
"select": ["?name", "?price"],
"where": [
{ "@id": "?product", "schema:name": "?name" },
{ "@id": "?product", "schema:price": "?price" }
]
}
For cross-graph queries, use fromNamed with aliases:
{
"@context": { "schema": "http://schema.org/" },
"from": "mydb:main",
"fromNamed": [
{ "@id": "mydb:main", "alias": "products", "graph": "http://example.org/graphs/products" },
{ "@id": "mydb:main", "alias": "inventory", "graph": "http://example.org/graphs/inventory" }
],
"select": ["?name", "?inventory", "?warehouse"],
"where": [
["graph", "products", { "@id": "?product", "schema:name": "?name" }],
["graph", "inventory", { "@id": "?product", "schema:inventory": "?inventory", "schema:warehouse": "?warehouse" }]
]
}
Graph IDs
Fluree assigns internal graph IDs to named graphs:
| Graph ID | Purpose |
|---|---|
| 0 | Default graph (triples without GRAPH block) |
| 1 | txn-meta (commit metadata) |
| 2+ | User-defined named graphs |
TriG with Transaction Metadata
You can combine named graphs with transaction metadata using the special #txn-meta graph fragment:
@prefix ex: <http://example.org/ns/> .
@prefix f: <https://ns.flur.ee/db#> .
# Transaction metadata (stored in txn-meta graph)
GRAPH <#txn-meta> {
fluree:commit:this ex:jobId "batch-import-001" ;
ex:source "warehouse-export" ;
ex:operator "system-admin" .
}
# User data in named graph
GRAPH <http://example.org/graphs/products> {
ex:widget a ex:Product ;
ex:name "Widget" .
}
Limits
- Maximum 256 named graphs per transaction
- Maximum 8KB per graph IRI
- Named graphs are queryable after indexing completes
When to Use TriG
Use TriG when you need to:
- Partition data into logical groups
- Separate data by source, tenant, or domain
- Maintain provenance at the graph level
- Integrate with RDF quad stores
Use plain Turtle when:
- All data belongs in the default graph
- Graph partitioning isn’t needed
- Working with simpler data models
Bulk import (Rust API)
For high-throughput ingest of large Turtle datasets into a fresh ledger, prefer the bulk import pipeline exposed by fluree-db-api. This pipeline:
- Parses Turtle in parallel, but writes commits serially (hash-linked commit chain).
- Streams run generation during import and builds multi-order binary indexes (SPOT/PSOT/POST/OPST).
- Writes an index root to CAS and publishes it to the nameservice so queries can use the normal `db()`/`query()` path.
Temporary tmp_import/ session files are cleaned up on success (configurable).
Tools and Libraries
Command-Line Tools
Rapper (Redland):
# Install on macOS
brew install redland
# Parse Turtle
rapper -i turtle data.ttl
Riot (Apache Jena):
# Install
# Download from https://jena.apache.org/
# Validate
riot --validate data.ttl
Programming Libraries
JavaScript/TypeScript:
import { Parser } from 'n3';
const parser = new Parser();
const quads = parser.parse(turtleString);
Python:
from rdflib import Graph
g = Graph()
g.parse('data.ttl', format='turtle')
Java:
import org.apache.jena.rdf.model.*;
Model model = ModelFactory.createDefaultModel();
model.read("data.ttl", "TURTLE");
Related Documentation
- Insert - Adding data via JSON-LD
- Overview - Transaction overview
- Datasets and Named Graphs - Named graph concepts
- Data Types - Supported datatypes
- API Headers - Content-Type specifications
Signed / Credentialed Transactions
Fluree supports cryptographically signed transactions using JSON Web Signatures (JWS) and Verifiable Credentials (VC). Signed transactions provide authentication, integrity, and non-repudiation for all transaction operations.
Why Sign Transactions?
Signed transactions provide:
- Authentication: Prove who submitted the transaction
- Integrity: Ensure transaction hasn’t been tampered with
- Non-repudiation: Transaction author cannot deny authorship
- Authorization: Link transaction to specific identity for policy enforcement
- Audit Trail: Complete provenance of all data changes
Basic Signed Transaction
Step 1: Create Transaction
Create your transaction as normal:
{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"@graph": [
{
"@id": "ex:alice",
"@type": "schema:Person",
"schema:name": "Alice"
}
]
}
Step 2: Sign with JWS
Sign the transaction using JWS:
import * as jose from 'jose';
const privateKey = ... // Your Ed25519 private key
const jws = await new jose.SignJWT(transaction)
.setProtectedHeader({
alg: 'EdDSA',
kid: 'did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK'
})
.setIssuedAt()
.setExpirationTime('15m')
.sign(privateKey);
Step 3: Submit
Submit the signed transaction:
curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
-H "Content-Type: application/jose" \
-d "$jws"
JWS Format
Compact Serialization
eyJhbGciOiJFZDI1NTE5IiwidHlwIjoiSldUIn0.eyJAY29udGV4dCI6eyJleCI6Imh0...
Three base64url-encoded parts separated by dots:
- Header (algorithm, key ID)
- Payload (transaction)
- Signature
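A small Node.js sketch of pulling a compact JWS apart to inspect the protected header and payload before submitting or while debugging:
function inspectJws(compactJws) {
  const [header, payload, signature] = compactJws.split('.');
  const decode = (part) => JSON.parse(Buffer.from(part, 'base64url').toString('utf8'));
  return {
    header: decode(header),     // e.g. { alg: 'EdDSA', kid: 'did:key:...' }
    payload: decode(payload),   // the original transaction
    signatureLength: Buffer.from(signature, 'base64url').length
  };
}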
JSON Serialization
{
"payload": "eyJAY29udGV4dCI6eyJleCI6Imh0...",
"signatures": [
{
"protected": "eyJhbGciOiJFZDI1NTE5In0",
"signature": "c2lnbmF0dXJl..."
}
]
}
Verifiable Credentials
Use W3C Verifiable Credentials for transactions:
{
"@context": [
"https://www.w3.org/2018/credentials/v1"
],
"type": ["VerifiableCredential"],
"issuer": "did:key:z6Mkh...",
"issuanceDate": "2024-01-22T10:00:00Z",
"credentialSubject": {
"id": "did:key:z6Mkh...",
"flureeTransaction": {
"@context": {
"ex": "http://example.org/ns/"
},
"@graph": [
{ "@id": "ex:alice", "schema:name": "Alice" }
]
}
},
"proof": {
"type": "Ed25519Signature2020",
"created": "2024-01-22T10:00:00Z",
"verificationMethod": "did:key:z6Mkh...#z6Mkh...",
"proofPurpose": "authentication",
"proofValue": "z58DAdFfa9SkqZMVP..."
}
}
Submit with:
curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=mydb:main" \
-H "Content-Type: application/vc+ld+json" \
-d @credential.json
Supported Algorithm
EdDSA (Ed25519):
- Fast, secure, deterministic
- 64-byte signatures
- 128-bit security level
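A minimal sketch of generating a suitable Ed25519 key pair with the jose library used in the signing examples above (shown for a Node.js runtime):
import { generateKeyPair, exportJWK } from 'jose';

const { publicKey, privateKey } = await generateKeyPair('EdDSA', { crv: 'Ed25519' });
// Export the public key as a JWK so it can be embedded in signed requests or
// published wherever your verifiers resolve keys (DID document, JWKS, etc.).
console.log(await exportJWK(publicKey)); // { kty: 'OKP', crv: 'Ed25519', x: '...' }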
Identity Management
Decentralized Identifiers (DIDs)
Use DIDs to identify transaction authors:
did:key (simplest):
did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK
did:web (organization-managed):
did:web:example.com:users:alice
did:ion (blockchain-based):
did:ion:EiClkZMDxPKqC9c-umQfTkR8vvZ9JPhl_xLDI9Nfk38w5w
Key Resolution
Standalone server signed requests verify Ed25519 JWS material from the request
itself (for example embedded JWK / did:key) or configured OIDC/JWKS issuers.
There is no /admin/keys registration endpoint.
Transaction Provenance
Signed transactions include author information in commit metadata:
{
"t": 42,
"timestamp": "2024-01-22T10:30:00Z",
"commit_id": "bafybeig...commitT42",
"author": "did:key:z6Mkh...",
"signature": "z58DAdFfa9...",
"flakes_added": 3,
"flakes_retracted": 0
}
Query provenance:
PREFIX f: <https://ns.flur.ee/db#>
SELECT ?t ?author ?timestamp
WHERE {
?commit f:t ?t ;
f:author ?author ;
f:timestamp ?timestamp .
}
ORDER BY DESC(?t)
Policy-Based Authorization
Use signed transaction author for authorization:
{
"@context": {
"ex": "http://example.org/ns/",
"f": "https://ns.flur.ee/db#"
},
"@id": "ex:admin-policy",
"f:policy": [
{
"f:subject": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK",
"f:action": "transact",
"f:allow": true
}
]
}
Only transactions signed by this DID will be accepted.
Code Examples
JavaScript/TypeScript
import * as jose from 'jose';
import { Ed25519VerificationKey2020 } from '@digitalbazaar/ed25519-verification-key-2020';
async function signTransaction(transaction: object, privateKey: Uint8Array) {
const jws = await new jose.SignJWT(transaction)
.setProtectedHeader({
alg: 'EdDSA',
kid: 'did:key:z6Mkh...'
})
.setIssuedAt()
.setExpirationTime('15m')
.sign(privateKey);
return jws;
}
async function submitSignedTransaction(ledger: string, transaction: object) {
const signed = await signTransaction(transaction, privateKey);
const response = await fetch(`http://localhost:8090/v1/fluree/upsert?ledger=${ledger}`, {
method: 'POST',
headers: { 'Content-Type': 'application/jose' },
body: signed
});
return await response.json();
}
Python
from jwcrypto import jwk, jws
import json
def sign_transaction(transaction, private_key):
# Create JWK from private key
key = jwk.JWK.from_json(private_key)
# Create JWS
payload = json.dumps(transaction).encode('utf-8')
jws_token = jws.JWS(payload)
jws_token.add_signature(
key,
alg='EdDSA',
protected=json.dumps({"kid": "did:key:z6Mkh..."})
)
return jws_token.serialize()
def submit_signed_transaction(ledger, transaction, private_key):
signed = sign_transaction(transaction, private_key)
response = requests.post(
f'http://localhost:8090/v1/fluree/upsert?ledger={ledger}',
headers={'Content-Type': 'application/jose'},
data=signed
)
return response.json()
Verification Process
When Fluree receives a signed transaction:
- Extract signature and header
- Resolve key ID (kid) to public key
- Verify signature using public key
- Check expiration (if exp claim present)
- Validate issuer (if required by policy)
- Apply authorization policies based on DID
- Process transaction if verification succeeds
Error Handling
Invalid Signature
{
"error": "SignatureVerificationFailed",
"message": "Invalid signature",
"code": "INVALID_SIGNATURE",
"details": {
"kid": "did:key:z6Mkh...",
"reason": "Signature does not match"
}
}
Expired Transaction
{
"error": "TokenExpired",
"message": "Transaction signature expired",
"code": "TOKEN_EXPIRED",
"details": {
"exp": 1642857600,
"now": 1642858000
}
}
Key Not Found
{
"error": "KeyNotFound",
"message": "Public key not registered",
"code": "KEY_NOT_FOUND",
"details": {
"kid": "did:key:z6Mkh..."
}
}
Unauthorized
{
"error": "Forbidden",
"message": "Policy denies transact permission",
"code": "POLICY_DENIED",
"details": {
"subject": "did:key:z6Mkh...",
"action": "transact",
"ledger": "mydb:main"
}
}
Best Practices
1. Use EdDSA (Ed25519)
Best security and performance:
{
"alg": "EdDSA",
"kid": "did:key:z6Mkh..."
}
2. Set Expiration
Always include expiration:
.setExpirationTime('15m') // 15 minutes
3. Secure Key Storage
Never hardcode private keys:
Good:
const privateKey = await loadKeyFromSecureStorage();
Bad:
const privateKey = "hardcoded-key-here";
4. Use did:key for Simplicity
For simple deployments:
did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK
5. Implement Key Rotation
Rotate keys every 90-180 days:
// Application-level sketch: generateKeyPair / registerKey / revokeKey are
// placeholders for your own key-management helpers (e.g. updating a DID
// document or JWKS); Fluree itself has no key registration endpoint.
async function rotateKey() {
  const newKey = await generateKeyPair();
  await registerKey(newKey.publicKey);
  await revokeKey(oldKey.kid);
  updateApplicationKey(newKey);
}
6. Include Request ID
Add unique ID to prevent replay:
.setJti(crypto.randomUUID())
7. Use HTTPS
Always use HTTPS with signed transactions to prevent replay attacks.
Compliance and Auditing
Complete Audit Trail
Signed transactions provide complete audit trail:
PREFIX f: <https://ns.flur.ee/db#>
SELECT ?t ?author ?timestamp ?predicate ?object
WHERE {
?commit f:t ?t ;
f:author ?author ;
f:timestamp ?timestamp .
?commit f:assert ?assertion .
?assertion ?predicate ?object .
}
ORDER BY DESC(?t)
Regulatory Compliance
Signed transactions support:
- SOC 2 (audit trails)
- HIPAA (data provenance)
- GDPR (data processing records)
- PCI DSS (transaction logs)
Non-Repudiation
Cryptographic signatures provide non-repudiation:
- Author cannot deny submitting transaction
- Tampering is detectable
- Legal admissibility in disputes
Related Documentation
- API: Signed Requests - HTTP API details
- Commit Signing and Attestation - Infrastructure-level commit signatures
- Security: Policy Model - Authorization policies
- Verifiable Data - Cryptographic verification concepts
- Commit Receipts - Transaction metadata
Commit Receipts and tx-id
Every successful transaction returns a commit receipt containing metadata about the transaction. This receipt provides important information for tracking, auditing, and referencing transactions.
Commit Receipt Structure
Basic commit receipt:
{
"t": 42,
"timestamp": "2024-01-22T10:30:00.000Z",
"commit_id": "bafybeig...commitT42",
"flakes_added": 15,
"flakes_retracted": 3,
"previous_commit_id": "bafybeig...commitT41"
}
Receipt Fields
Transaction Time (t)
The transaction time is a monotonically increasing integer uniquely identifying this transaction:
{
"t": 42
}
Properties:
- Unique across all ledgers in the Fluree instance
- Monotonically increasing (never decreases)
- Used for time travel queries
- Basis for temporal ordering
Usage:
# Query at specific transaction
curl -X POST http://localhost:8090/v1/fluree/query \
-d '{"from": "mydb:main@t:42", ...}'
Read-after-write consistency: The t value is the key to ensuring queries
see freshly committed data. Pass it as min_t to refresh() to gate queries
on a minimum transaction time. See Time Travel — Consistency and Read-After-Write for details.
Timestamp
ISO 8601 formatted timestamp of when the transaction was committed:
{
"timestamp": "2024-01-22T10:30:00.000Z"
}
Properties:
- UTC timezone
- Millisecond precision
- Server-assigned (not client-provided)
- Monotonic (within same transaction time ordering)
Usage:
# Query at specific time
curl -X POST http://localhost:8090/v1/fluree/query \
-d '{"from": "mydb:main@iso:2024-01-22T10:30:00Z", ...}'
Commit ID
Content-addressed identifier for the commit:
{
"commit_id": "bafybeig...commitT42"
}
Properties:
- CIDv1 value (base32-lower multibase string)
- Derived from the commit’s canonical bytes via SHA-256
- Storage-agnostic – does not depend on where the commit is stored
- Can be used to fetch the commit from any content store
Usage:
# Query at specific commit
curl -X POST http://localhost:8090/v1/fluree/query \
-d '{"from": "mydb:main@commit:bafybeig...commitT42", ...}'
Flake Counts
Number of triples added and retracted:
{
"flakes_added": 15,
"flakes_retracted": 3
}
- flakes_added: number of new triples asserted
- flakes_retracted: number of existing triples removed
Net change: flakes_added - flakes_retracted
Previous Commit
ContentId of the previous commit (forms a chain):
{
"previous_commit_id": "bafybeig...commitT41"
}
Properties:
- Links to parent commit by ContentId
- Forms immutable commit chain
- Enables commit history traversal
- null for the first transaction (t=1)
Extended Receipt Fields
Author (Signed Transactions)
For signed transactions, includes author DID:
{
"t": 42,
"author": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK",
"signature": "z58DAdFfa9SkqZMVP...",
...
}
Message
Optional commit message (if provided):
{
"t": 42,
"message": "Add new customer records for Q1 2024",
...
}
Ledger
Ledger ID:
{
"t": 42,
"ledger": "mydb:main",
...
}
Duration
Transaction processing time in milliseconds:
{
"t": 42,
"duration_ms": 45,
...
}
Using Transaction IDs
Referencing Transactions
Store transaction ID for later reference:
const receipt = await transact({
"@graph": [{ "@id": "ex:alice", "schema:name": "Alice" }]
});
// Store for audit trail
await logTransaction({
entity: "ex:alice",
operation: "create",
transactionId: receipt.t,
timestamp: receipt.timestamp
});
Historical Queries
Query data at specific transaction:
// Get data as it was at transaction 42
const historicalData = await query({
from: `mydb:main@t:${receipt.t}`,
select: ["?name"],
where: [{ "@id": "ex:alice", "schema:name": "?name" }]
});
Commit Verification
Verify commit integrity by re-deriving the ContentId from fetched bytes:
async function verifyCommit(receipt) {
const bytes = await contentStore.get(receipt.commit_id);
const derivedCid = computeContentId("Commit", bytes);
if (derivedCid !== receipt.commit_id) {
throw new Error('Commit integrity violation!');
}
}
Commit Chain
Commits form an immutable chain:
t=1 (cid:aaa) ← t=2 (cid:bbb) ← t=3 (cid:ccc) ← t=4 (cid:ddd)
↑ ↑ ↑ ↑
| | | |
previous=null previous=aaa previous=bbb previous=ccc
Traversing History
Walk the commit chain:
async function getCommitHistory(ledger, fromT, toT) {
  const history = [];
  // Walk backward one transaction at a time (getCommit is an illustrative helper
  // that fetches the commit at a given t).
  for (let currentT = fromT; currentT >= toT; currentT--) {
    const commit = await getCommit(ledger, currentT);
    history.push(commit);
  }
  return history;
}
Querying Commit Metadata
SPARQL Query for Commits
PREFIX f: <https://ns.flur.ee/db#>
SELECT ?t ?timestamp ?commitId ?author
WHERE {
?commit a f:Commit ;
f:t ?t ;
f:timestamp ?timestamp ;
f:commitId ?commitId .
OPTIONAL { ?commit f:author ?author }
}
ORDER BY DESC(?t)
LIMIT 10
JSON-LD Query for Recent Commits
{
"@context": {
"f": "https://ns.flur.ee/db#"
},
"select": ["?t", "?timestamp", "?commitId"],
"where": [
{ "@id": "?commit", "@type": "f:Commit" },
{ "@id": "?commit", "f:t": "?t" },
{ "@id": "?commit", "f:timestamp": "?timestamp" },
{ "@id": "?commit", "f:commitId": "?commitId" }
],
"orderBy": ["-?t"],
"limit": 10
}
Receipt Storage
Application Database
Store receipts in your application database:
CREATE TABLE transaction_receipts (
id SERIAL PRIMARY KEY,
ledger VARCHAR(255),
transaction_t INTEGER,
commit_id TEXT,
timestamp TIMESTAMP,
flakes_added INTEGER,
flakes_retracted INTEGER,
author VARCHAR(255),
created_at TIMESTAMP DEFAULT NOW()
);
Document Store
Store as JSON documents:
await mongodb.collection('receipts').insertOne({
ledger: receipt.ledger,
t: receipt.t,
commit_id: receipt.commit_id,
timestamp: receipt.timestamp,
flakes: {
added: receipt.flakes_added,
retracted: receipt.flakes_retracted
},
metadata: {
author: receipt.author,
duration_ms: receipt.duration_ms
}
});
Time-Series Database
For analytics:
await influxdb.writePoint({
measurement: 'transactions',
tags: { ledger: receipt.ledger },
fields: {
t: receipt.t,
flakes_added: receipt.flakes_added,
flakes_retracted: receipt.flakes_retracted,
duration_ms: receipt.duration_ms
},
timestamp: new Date(receipt.timestamp)
});
Audit Trail
Transaction Log
Build complete audit log from receipts:
async function buildAuditLog(ledger, startDate, endDate) {
const receipts = await fetchReceipts(ledger, startDate, endDate);
return receipts.map(r => ({
time: r.timestamp,
transactionId: r.t,
author: r.author || 'anonymous',
changes: {
added: r.flakes_added,
removed: r.flakes_retracted
},
commit: r.commit_id,
verifiable: true
}));
}
Compliance Reports
Generate compliance reports:
async function generateComplianceReport(ledger, period) {
const receipts = await fetchReceipts(ledger, period.start, period.end);
return {
period: period,
totalTransactions: receipts.length,
totalChanges: receipts.reduce((sum, r) => sum + r.flakes_added, 0),
authors: [...new Set(receipts.map(r => r.author))],
verifiedChain: verifyCommitChain(receipts)
};
}
Performance Monitoring
Transaction Metrics
Track transaction performance:
function analyzeReceipts(receipts, periodHours) {
  const average = xs => xs.reduce((sum, x) => sum + x, 0) / xs.length;
  const durations = receipts.map(r => r.duration_ms);
  const sizes = receipts.map(r => r.flakes_added + r.flakes_retracted);
  return {
    avgDuration: average(durations),
    maxDuration: Math.max(...durations),
    avgSize: average(sizes),
    maxSize: Math.max(...sizes),
    throughput: receipts.length / periodHours   // transactions per hour
  };
}
Alert on Anomalies
function checkForAnomalies(receipt) {
if (receipt.duration_ms > 1000) {
alert(`Slow transaction: ${receipt.t} took ${receipt.duration_ms}ms`);
}
if (receipt.flakes_added > 10000) {
alert(`Large transaction: ${receipt.t} added ${receipt.flakes_added} flakes`);
}
}
Best Practices
1. Always Store Receipts
Store transaction receipts for audit trail:
const receipt = await transact(transaction);
await storeReceipt(receipt);
2. Verify Commit Chain
Periodically verify commit chain integrity:
async function verifyChainIntegrity(ledger) {
  // Assumes receipts are ordered by ascending t (fetchAllReceipts is illustrative).
  const receipts = await fetchAllReceipts(ledger);
  for (let i = 1; i < receipts.length; i++) {
    if (receipts[i].previous_commit_id !== receipts[i-1].commit_id) {
      throw new Error(`Chain broken at t=${receipts[i].t}`);
    }
  }
}
3. Use Transaction IDs for References
Store transaction IDs rather than timestamps:
Good:
{ entity: "ex:alice", createdAt_t: 42 }
Less reliable:
{ entity: "ex:alice", createdAt: "2024-01-22T10:30:00Z" }
4. Monitor Performance
Track receipt metadata for performance insights:
const avgDuration = receipts.reduce((sum, r) => sum + r.duration_ms, 0) / receipts.length;
5. Include in Error Handling
Log receipt info on errors:
try {
const receipt = await transact(transaction);
logger.info(`Transaction successful: t=${receipt.t}`);
} catch (err) {
logger.error(`Transaction failed`, {
error: err.message,
transaction: transaction
});
}
Related Documentation
- Overview - Transaction overview
- Signed Transactions - Transaction signing
- Commit Signing and Attestation - Commit-level signatures
- Time Travel - Historical queries
- Indexing Side-Effects - Indexing behavior
Indexing Side-Effects
Transactions in Fluree trigger background indexing processes that build query-optimized data structures. Understanding these side-effects is crucial for performance tuning and capacity planning.
What is Indexing?
Indexing is the process of building query-optimized data structures from transaction data. Fluree maintains four index permutations (SPOT, POST, OPST, PSOT) that enable efficient query execution.
Commit vs Index
Commit (immediate):
- Transaction written to log
- Small, append-only files
- Published to nameservice immediately
- Available for time travel queries
Index (asynchronous):
- Query-optimized structures built
- Background process
- Published to nameservice when complete
- May lag behind commits
Index Structure
Fluree maintains four index permutations:
SPOT (Subject-Predicate-Object-Time)
ex:alice → schema:name → "Alice" → [t=1, t=5, t=10]
ex:alice → schema:age → 30 → [t=1]
ex:alice → schema:age → 31 → [t=10]
Optimized for: “What are all properties of this subject?”
POST (Predicate-Object-Subject-Time)
schema:name → "Alice" → ex:alice → [t=1, t=5, t=10]
schema:age → 30 → ex:alice → [t=1]
schema:age → 31 → ex:alice → [t=10]
Optimized for: “What subjects have this property/value?”
OPST (Object-Predicate-Subject-Time)
"Alice" → schema:name → ex:alice → [t=1, t=5, t=10]
30 → schema:age → ex:alice → [t=1]
31 → schema:age → ex:alice → [t=10]
Optimized for: “What subjects have this value?”
PSOT (Predicate-Subject-Object-Time)
schema:name → ex:alice → "Alice" → [t=1, t=5, t=10]
schema:age → ex:alice → 30 → [t=1]
schema:age → ex:alice → 31 → [t=10]
Optimized for: “What are all values for this predicate?”
Indexing Pipeline
1. Transaction Commit
t=42: Transaction committed
- Flakes written to transaction log
- Commit published to nameservice
- commit_t updated to 42
2. Index Trigger
Background indexing process detects new commits:
Indexer: commit_t=42, index_t=40
Indexer: Need to index t=41, t=42
3. Index Building
Process transactions to build indexes:
For each flake in t=41, t=42:
- Update SPOT index
- Update POST index
- Update OPST index
- Update PSOT index
4. Index Publication
When complete, publish new index:
- Write index snapshot to storage
- Publish index_id to nameservice
- Update index_t to 42
Novelty Layer
The novelty layer is the gap between indexed and committed data:
commit_t = 45
index_t = 40
novelty layer = [t=41, t=42, t=43, t=44, t=45]
Query Execution with Novelty
Queries combine index + novelty:
Query Result = Indexed Data (t ≤ 40) + Novelty Layer (41 ≤ t ≤ 45)
Performance Impact:
- Small novelty: Fast queries (mostly indexed)
- Large novelty: Slower queries (more transaction replay)
Indexing Performance
Transaction Size Impact
Larger transactions take longer to index:
Transaction with 10 flakes:
- 10 flakes × 4 indexes = 40 index updates
- Indexing time: ~1ms
Transaction with 10,000 flakes:
- 10,000 flakes × 4 indexes = 40,000 index updates
- Indexing time: ~100ms
Indexing Rate
Typical indexing rates:
Light load:
- 1,000 flakes/second
- ~10 moderate transactions/second
Heavy load:
- 10,000 flakes/second
- ~100 moderate transactions/second
Actual rates depend on:
- Hardware (CPU, disk I/O)
- Storage backend (memory, file, AWS)
- Transaction patterns
- System load
Monitoring Indexing
Check Indexing Status
curl http://localhost:8090/v1/fluree/info/mydb:main
Response:
{
"ledger_id": "mydb:main",
"commit_t": 150,
"index_t": 140
}
Indexing lag (txns): commit_t - index_t = number of unindexed transactions
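Later snippets on this page call a checkIndexingStatus() helper; a minimal sketch of it against the /info endpoint above might look like:
async function checkIndexingStatus() {
  // Returns { ledger_id, commit_t, index_t } as shown in the response above.
  const res = await fetch('http://localhost:8090/v1/fluree/info/mydb:main');
  return res.json();
}

const status = await checkIndexingStatus();
console.log(`Indexing lag: ${status.commit_t - status.index_t} transactions`);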
Healthy vs Unhealthy
Healthy:
commit_t = 1000
index_t = 998
novelty = 2 transactions (good!)
Unhealthy:
commit_t = 1000
index_t = 850
novelty = 150 transactions (indexing lag!)
Indexing Lag
Indexing lag occurs when indexing can’t keep up with transaction rate.
Causes
- High Transaction Rate
  - More transactions than indexing can handle
  - Sustained write load
- Large Transactions
  - Individual transactions with many flakes
  - Bulk imports
- Resource Constraints
  - CPU bottleneck
  - Disk I/O bottleneck
  - Memory pressure
- Storage Backend Latency
  - Slow storage (network attached)
  - AWS S3 latency
Impact
Large indexing lag affects:
Query Performance:
- More novelty to replay
- Slower query execution
- Higher CPU usage for queries
Memory Usage:
- Novelty layer held in memory
- Larger memory footprint
Backup/Recovery:
- Larger gap to replay
- Longer recovery times
Tuning Indexing
Background indexing is controlled primarily by:
- Enabling/disabling background indexing (--indexing-enabled / FLUREE_INDEXING_ENABLED)
- Novelty thresholds that trigger indexing / apply backpressure (--reindex-min-bytes, --reindex-max-bytes)
See Operations: Configuration and Background Indexing for the canonical settings and tuning guidance.
Dedicated Indexing Process
For high-load deployments, run dedicated indexer:
# Main server (transact only; background indexing disabled)
fluree-server --indexing-enabled=false
# Indexing server
./fluree-db-indexer --ledgers mydb:main,mydb:dev
Transaction Patterns and Indexing
Batch Transactions
Good pattern:
// Batch into reasonable sizes
const batchSize = 1000;
for (let i = 0; i < entities.length; i += batchSize) {
const batch = entities.slice(i, i + batchSize);
await transact({ "@graph": batch });
// Allow indexing time
if (i % (batchSize * 10) === 0) {
await sleep(1000);
}
}
Bad pattern:
// Single giant transaction
await transact({ "@graph": allEntities }); // 1 million entities!
Continuous Transactions
For continuous transaction load:
async function writeWithBackpressure(data) {
const status = await checkIndexingStatus();
const lag = status.commit_t - status.index_t;
if (lag > 100) {
// Too much lag, slow down
await sleep(1000);
}
await transact(data);
}
Bulk Imports
For large imports:
async function bulkImport(entities) {
const batchSize = 1000;
for (let i = 0; i < entities.length; i += batchSize) {
const batch = entities.slice(i, i + batchSize);
await transact({ "@graph": batch });
// Wait for indexing to catch up every 10 batches
if ((i / batchSize) % 10 === 0) {
await waitForIndexing();
}
console.log(`Imported ${i + batch.length} / ${entities.length}`);
}
}
async function waitForIndexing() {
while (true) {
const status = await checkIndexingStatus();
const lag = status.commit_t - status.index_t;
if (lag < 5) break;
await sleep(1000);
}
}
Graph Source Indexing
Graph sources have their own indexing processes:
BM25 Indexing
Full-text search indexes built asynchronously:
t=100: Transaction with new documents
- Main index updated
- BM25 indexer triggered
- Documents added to BM25 index
Vector Search Indexing
Vector embeddings can be indexed separately for approximate nearest-neighbor (ANN) search via HNSW vector indexes (implemented with usearch, feature-gated behind the vector feature).
Inline similarity functions (dotProduct, cosineSimilarity, euclideanDistance) do not require a separate graph-source index; they compute scores directly during query execution.
t=100: Transaction with embeddings
- Main index updated
- Vector indexer triggered
- Vectors added to vector index
See Vector Search for details on HNSW vector indexes and query syntax.
Best Practices
1. Monitor Novelty Layer
Track indexing lag:
setInterval(async () => {
const status = await checkIndexingStatus();
const lag = status.commit_t - status.index_t;
metrics.gauge('index_lag_txns', lag);
if (lag > 100) {
logger.warn(`High indexing lag: ${lag} transactions`);
}
}, 10000); // Check every 10 seconds
2. Batch Appropriately
Keep transactions reasonable size:
- Recommended: 100-1000 entities per transaction
- Maximum: 10,000 entities per transaction
3. Rate Limiting
Implement rate limiting for heavy write loads:
const rateLimiter = new RateLimiter({
tokensPerInterval: 100,
interval: "minute"
});
await rateLimiter.removeTokens(1);
await transact(data);
4. Scheduled Imports
Run large imports during off-hours:
if (isOffPeakHours()) {
await runBulkImport();
} else {
logger.info('Deferring bulk import to off-peak hours');
}
5. Alert on Lag
Set up alerts for indexing lag:
const lag = status.commit_t - status.index_t;
if (lag > 200) {
alert('Critical: Indexing lag > 200 transactions');
}
6. Capacity Planning
Plan capacity based on write load:
Expected load: 10,000 transactions/day
Average size: 100 flakes/transaction
Total: 1,000,000 flakes/day
Indexing capacity needed: ~12 flakes/second
With 4× safety margin: ~50 flakes/second
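The same arithmetic, spelled out with the example figures above:
const flakesPerDay = 10_000 * 100;          // 10,000 transactions/day × 100 flakes each
const requiredRate = flakesPerDay / 86_400; // ≈ 11.6 flakes/second sustained
const plannedRate  = requiredRate * 4;      // ≈ 46 → provision roughly 50 flakes/second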
Troubleshooting
High indexing lag
Symptom: commit_t - index_t growing continuously
Causes:
- Transaction rate exceeds indexing capacity
- Large transactions
- Resource constraints
Solutions:
- Reduce transaction rate
- Split large transactions
- Increase indexing resources
- Tune indexing parameters
Slow Queries
Symptom: Queries slower than expected
Possible Cause: Large novelty layer
Check:
curl http://localhost:8090/v1/fluree/info/mydb:main | jq '.commit_t - .index_t'
Solution: Wait for indexing or reduce write rate
Index Memory Usage
Symptom: High memory usage
Cause: Large indexes or large novelty layer
Solutions:
- Increase system memory
- Reduce novelty layer
- Compact indexes (if supported)
Related Documentation
- Overview - Transaction overview
- Insert - Adding data
- Commit Receipts - Transaction metadata
- Background Indexing - Indexing configuration
Ledger Configuration (Config Graph)
Fluree stores ledger-level configuration as data inside each ledger, in a dedicated system graph called the config graph. This is distinct from server configuration (TOML files, environment variables) which controls how the Fluree process runs.
The config graph holds RDF triples that define operational defaults for the ledger: which policy rules apply, whether SHACL validation runs, what reasoning modes are active, which properties enforce uniqueness, and more. Because config lives inside the ledger, it is:
- Immutable and time-travelable — config at any historical t is recoverable
- Auditable — every config change is a signed, committed transaction
- Replicable — config travels with the ledger across nodes and forks
- Replay-safe — deterministic interpretation without runtime environment state
Graph layout
Every ledger reserves system named graphs:
| Graph | IRI pattern | Purpose |
|---|---|---|
| Default graph | (implicit) | Application data |
| Txn-meta | urn:fluree:{ledger_id}#txn-meta | Commit metadata |
| Config graph | urn:fluree:{ledger_id}#config | Ledger configuration |
User-defined named graphs (created via TriG) are identified by their IRI and allocated after the system graphs.
The config graph IRI is deterministic — derived from the ledger identifier. For a ledger mydb:main, the config graph is urn:fluree:mydb:main#config.
Core concepts
f:LedgerConfig
A single f:LedgerConfig resource in the config graph defines ledger-wide defaults. If multiple exist, the one with the lexicographically smallest @id wins (with a logged warning).
Setting groups
Configuration is organized into independent setting groups, each governing a different subsystem:
| Setting group | Subsystem | Key fields |
|---|---|---|
f:policyDefaults | Policy enforcement | f:defaultAllow, f:policySource, f:policyClass |
f:shaclDefaults | SHACL validation | f:shaclEnabled, f:shapesSource, f:validationMode |
f:reasoningDefaults | OWL/RDFS reasoning | f:reasoningModes, f:schemaSource |
f:datalogDefaults | Datalog rules | f:datalogEnabled, f:rulesSource |
f:transactDefaults | Transaction constraints | f:uniqueEnabled, f:constraintsSource |
Each group is resolved independently — locking down policy does not affect whether reasoning can be overridden.
Per-graph overrides
Ledger-wide defaults apply to all graphs. For finer control, f:graphOverrides on the f:LedgerConfig contains f:GraphConfig entries that override settings for specific named graphs. See Override control for the full resolution model.
Privileged system read
Config is read via a privileged system read that bypasses policy enforcement. This is necessary because config defines the policy — reading it through the policy-enforced path would create a circular dependency. User queries against the config graph still go through normal policy enforcement.
Lagging config
Config changes take effect on the next transaction, not the current one. The transaction pipeline reads config from the pre-transaction state. This prevents a transaction from “authorizing itself” by changing config within its own payload.
Common patterns
These recipes cover typical scenarios. Each assumes the ledger mydb:main — substitute your own ledger ID.
Lock down a production ledger
Deny all access by default and require policy rules for every operation. Use f:OverrideNone so no query can bypass:
@prefix f: <https://ns.flur.ee/db#> .
GRAPH <urn:fluree:mydb:main#config> {
<urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
f:policyDefaults [
f:defaultAllow false ;
f:policySource [
a f:GraphRef ;
f:graphSource [ f:graphSelector f:defaultGraph ]
] ;
f:overrideControl f:OverrideNone
] .
}
After this transaction, the next transaction and all subsequent queries will require matching policy rules in the default graph. Make sure policy rules are already in place before enabling this — see Config mutation governance.
Enable SHACL validation in development (warn mode)
Validate data shapes but log warnings instead of rejecting — useful during development:
@prefix f: <https://ns.flur.ee/db#> .
GRAPH <urn:fluree:mydb:main#config> {
<urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
f:shaclDefaults [
f:shaclEnabled true ;
f:shapesSource [
a f:GraphRef ;
f:graphSource [ f:graphSelector f:defaultGraph ]
] ;
f:validationMode f:ValidationWarn
] .
}
Switch to f:ValidationReject when ready for production.
Enforce unique emails
Two-step setup: annotate the property, then enable enforcement:
@prefix f: <https://ns.flur.ee/db#> .
@prefix ex: <http://example.org/ns/> .
# Step 1: Annotate the property (in the default graph)
ex:email f:enforceUnique true .
# Step 2: Enable enforcement (in the config graph)
GRAPH <urn:fluree:mydb:main#config> {
<urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
f:transactDefaults [
f:uniqueEnabled true
] .
}
See Unique constraints for full details including per-graph scoping and edge cases.
Enable RDFS reasoning by default
Automatically expand rdfs:subClassOf and rdfs:subPropertyOf in all queries:
@prefix f: <https://ns.flur.ee/db#> .
GRAPH <urn:fluree:mydb:main#config> {
<urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
f:reasoningDefaults [
f:reasoningModes f:RDFS ;
f:schemaSource [
a f:GraphRef ;
f:graphSource [ f:graphSelector f:defaultGraph ]
]
] .
}
With f:OverrideAll (the default), individual queries can still opt out by passing "reasoning": "none".
Different policy per graph
Allow open access to most graphs but lock down a sensitive one:
@prefix f: <https://ns.flur.ee/db#> .
GRAPH <urn:fluree:mydb:main#config> {
<urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
f:policyDefaults [
f:defaultAllow true ;
f:overrideControl f:OverrideAll
] ;
f:graphOverrides (
[ a f:GraphConfig ;
f:targetGraph <http://example.org/sensitive> ;
f:policyDefaults [
f:defaultAllow false ;
f:policySource [
a f:GraphRef ;
f:graphSource [ f:graphSelector f:defaultGraph ]
] ;
f:overrideControl f:OverrideNone
]
]
) .
}
The sensitive graph requires policy rules and cannot be overridden at query time. All other graphs remain open.
Troubleshooting
Config changes seem to have no effect
Config uses lagging semantics — changes take effect on the next transaction, not the current one. If you enable SHACL and insert invalid data in the same transaction, the data will be accepted. The next transaction will enforce the new config.
Ledger became unmodifiable after policy misconfiguration
If you set f:defaultAllow false with f:OverrideNone before granting write access to the config graph, the ledger becomes locked — no transaction can modify it (including config changes). Recovery requires a ledger fork/restore. To prevent this:
- Always write policy rules first, then enable restrictive policy in a subsequent transaction
- Test with f:OverrideAll before switching to f:OverrideNone
- Ensure at least one identity has write access to the config graph before locking down
Multiple f:LedgerConfig resources
If the config graph contains more than one f:LedgerConfig resource, the system uses the one with the lexicographically smallest @id and logs a warning. Use the recommended subject IRI convention (urn:fluree:{ledger_id}:config:ledger) to avoid this.
Config graph query returns empty results
User queries against the config graph go through policy enforcement. If f:defaultAllow is false and no policy explicitly grants read access to the config graph, queries will return empty results even though config is active. The system’s internal privileged read is unaffected.
CLI usage
The config graph is written and queried through normal CLI transaction and query commands:
# Write config via TriG
fluree insert --ledger mydb:main --format trig config.trig
# Query the config graph via SPARQL
fluree query --ledger mydb:main --format sparql \
'PREFIX f: <https://ns.flur.ee/db#>
SELECT ?s ?p ?o
FROM <urn:fluree:mydb:main#config>
WHERE { ?s ?p ?o }'
No special CLI commands are needed — config is data, written and queried like any other named graph.
In this section
- Writing config data — How to create and update config via TriG, SPARQL, or JSON-LD
- Setting groups — All setting groups with fields and examples
- Override control — Resolution precedence, identity gating, monotonicity
- Unique constraints — Enforcing property value uniqueness with
f:enforceUnique
Writing Config Data
The config graph is mutated using normal ledger transactions — config writes are signed, versioned, and replicable like any other write. The only difference is that the triples target the config graph IRI.
Config graph IRI
Each ledger’s config graph has a deterministic IRI:
urn:fluree:{ledger_id}#config
For a ledger named mydb:main, the config graph is urn:fluree:mydb:main#config.
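Since the IRI is purely a function of the ledger ID, a client can construct it directly — a trivial sketch:
// Derive the config graph IRI from a ledger ID (pattern described above).
const configGraphIri = (ledgerId) => `urn:fluree:${ledgerId}#config`;
configGraphIri('mydb:main');   // "urn:fluree:mydb:main#config"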
Writing via TriG
TriG is the most natural format for writing to named graphs. Wrap your config triples in a GRAPH block targeting the config graph IRI:
@prefix f: <https://ns.flur.ee/db#> .
GRAPH <urn:fluree:mydb:main#config> {
<urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
f:policyDefaults [
f:defaultAllow false
] ;
f:shaclDefaults [
f:shaclEnabled true ;
f:validationMode f:ValidationReject
] .
}
Writing via SPARQL UPDATE
Use INSERT DATA with a GRAPH clause:
PREFIX f: <https://ns.flur.ee/db#>
INSERT DATA {
GRAPH <urn:fluree:mydb:main#config> {
<urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
f:reasoningDefaults [
f:reasoningModes f:RDFS ;
f:schemaSource [
a f:GraphRef ;
f:graphSource [ f:graphSelector f:defaultGraph ]
]
] .
}
}
Writing via JSON-LD
Use the @graph key with a named graph wrapper:
{
"@context": { "f": "https://ns.flur.ee/db#" },
"@graph": [
{
"@id": "urn:fluree:mydb:main:config:ledger",
"@type": "f:LedgerConfig",
"@graph": "urn:fluree:mydb:main#config",
"f:shaclDefaults": {
"f:shaclEnabled": true,
"f:validationMode": { "@id": "f:ValidationReject" }
}
}
]
}
Updating config
Config changes are normal ledger operations. To change a setting, use a DELETE/INSERT WHERE pattern that binds the existing blank node:
PREFIX f: <https://ns.flur.ee/db#>
DELETE {
GRAPH <urn:fluree:mydb:main#config> {
?policy f:defaultAllow false .
}
}
INSERT {
GRAPH <urn:fluree:mydb:main#config> {
?policy f:defaultAllow true .
}
}
WHERE {
GRAPH <urn:fluree:mydb:main#config> {
<urn:fluree:mydb:main:config:ledger> f:policyDefaults ?policy .
?policy f:defaultAllow false .
}
}
This pattern binds ?policy to the existing setting-group blank node, retracts the old value, and asserts the new one. It avoids the problem of DELETE DATA with blank nodes (which cannot match stored blank node identities).
Alternatively, give setting-group nodes explicit IRIs so they can be addressed directly:
@prefix f: <https://ns.flur.ee/db#> .
GRAPH <urn:fluree:mydb:main#config> {
<urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
f:policyDefaults <urn:fluree:mydb:main:config:policy> .
<urn:fluree:mydb:main:config:policy>
f:defaultAllow false ;
f:overrideControl f:OverrideAll .
}
With explicit IRIs, individual fields can be retracted by subject IRI without binding.
Retracting a field returns the ledger to the system default for that setting (as if the field were absent).
Config mutation governance
Config writes go through the normal policy-enforced transaction path. This means:
- Reading config is privileged (system read, bypasses policy) — necessary to bootstrap.
- Writing config is not privileged — policy enforcement applies.
A defaultAllow: false config is self-protecting: the policy it defines must explicitly grant write access to the config graph for any changes to be possible.
If a ledger becomes unmodifiable due to a policy misconfiguration (no authorized config writers), recovery requires a ledger fork/restore — there is no superuser bypass.
Recommended subject IRI
For operational simplicity, use a stable, conventional subject IRI:
urn:fluree:{ledger_id}:config:ledger
Colons (not a second # fragment) keep the IRI well-formed: the graph IRI already uses a fragment (#config), and RFC 3986 allows only one fragment per IRI. Using colons produces a valid URN (RFC 8141) that stays scoped to the ledger and avoids accidental multiple-config instances.
Querying the config graph
The config graph is a named graph like any other — you can query it with SPARQL or JSON-LD to inspect the current configuration.
SPARQL
PREFIX f: <https://ns.flur.ee/db#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?setting ?pred ?val
FROM <mydb:main#config>
WHERE {
?config rdf:type f:LedgerConfig ;
?setting ?group .
?group ?pred ?val .
FILTER(?setting IN (
f:policyDefaults, f:shaclDefaults, f:reasoningDefaults,
f:datalogDefaults, f:transactDefaults
))
}
JSON-LD query
{
"@context": { "f": "https://ns.flur.ee/db#" },
"from": {
"@id": "mydb:main",
"graph": "urn:fluree:mydb:main#config"
},
"select": ["?config", "?pred", "?val"],
"where": [
{ "@id": "?config", "@type": "f:LedgerConfig", "?pred": "?val" }
]
}
Ledger-scoped endpoint
curl -X POST "http://localhost:8090/v1/fluree/query/mydb:main" \
-H "Content-Type: application/sparql-query" \
-d 'PREFIX f: <https://ns.flur.ee/db#>
SELECT ?s ?p ?o
FROM <urn:fluree:mydb:main#config>
WHERE { ?s ?p ?o }'
Policy applies to reads
User queries against the config graph go through normal policy enforcement. If f:defaultAllow is false and no policy grants read access to the config graph, user queries will return empty results. The system still reads config via a privileged path (bypassing policy), so config always takes effect regardless of policy.
Time-travel
Config is part of the ledger’s immutable commit chain. You can query config at any historical point:
PREFIX f: <https://ns.flur.ee/db#>
SELECT ?setting ?val
FROM <mydb:main@t:5#config>
WHERE {
?config a f:LedgerConfig ;
f:policyDefaults ?policy .
?policy ?setting ?val .
}
Lagging semantics
Config changes take effect on the next transaction. The transaction pipeline reads config from the pre-transaction state (t - 1). This prevents a transaction from changing the rules it is validated against.
This means:
- Enabling SHACL in the same transaction as invalid data will not reject that data
- Enabling f:uniqueEnabled in the same transaction as duplicate values will not reject those duplicates
- The next transaction after the config change will be validated against the new config
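In client code this simply means committing the config change and the data it should govern as two separate transactions — a sketch using the same illustrative transact() helper as earlier examples:
// Transaction n: enable SHACL (or any other config change).
await transact(shaclConfigChange);
// Transaction n+1: this write is validated against the config committed at n.
await transact(customerData);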
Setting Groups
Each setting group configures a different subsystem. Groups are resolved independently — locking down one group does not affect others.
All setting groups can appear on both f:LedgerConfig (ledger-wide defaults) and f:GraphConfig (per-graph overrides), except where noted.
System defaults
When no config graph is present (or a setting group is absent), the system defaults apply:
| Setting group | System default |
|---|---|
| Policy | f:defaultAllow true — all queries and transactions are permitted |
| SHACL | Disabled — no shape validation |
| Reasoning | Disabled — no OWL/RDFS inference |
| Datalog | Disabled — no rule evaluation |
| Transact constraints | Disabled — no uniqueness enforcement |
| Override control | f:OverrideAll — any request can override any setting |
In other words, an unconfigured ledger is fully open: no policy, no validation, no reasoning. This matches the behavior of a fresh ledger and ensures backward compatibility.
Policy defaults
Group predicate: f:policyDefaults
Controls default policy enforcement behavior.
| Field | Type | Default | Description |
|---|---|---|---|
f:defaultAllow | boolean | true | Allow (true) or deny (false) when no policy rule matches |
f:policySource | f:GraphRef | (none) | Graph containing policy rules (f:Allow, f:Modify, etc.) |
f:policyClass | IRI or list | (none) | Default policy classes to apply |
f:overrideControl | IRI or object | f:OverrideAll | Override gating (see Override control) |
f:policySource is non-overridable — it can only be changed by writing to the config graph, not at query time. f:defaultAllow and f:policyClass are overridable (subject to override control).
When f:policySource is set, the policy loader scans the specified graph for policy rules instead of the default graph. This keeps policy rules separate from end-user data. If f:policySource is not set, policies are loaded from the default graph (backward compatible).
Current limitations: f:policySource only supports same-ledger graphs. Cross-ledger references (f:ledger), temporal pinning (f:atT), trust policy, and rollback guard fields are parsed but will produce an error if configured.
Example: policies in the default graph
@prefix f: <https://ns.flur.ee/db#> .
GRAPH <urn:fluree:mydb:main#config> {
<urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
f:policyDefaults [
f:defaultAllow false ;
f:policySource [
a f:GraphRef ;
f:graphSource [ f:graphSelector f:defaultGraph ]
] ;
f:overrideControl f:OverrideAll
] .
}
Example: policies in a named graph
Storing policy rules in a dedicated named graph keeps them out of the default data graph. The identity’s f:policyClass triples must also be in the policy graph.
@prefix f: <https://ns.flur.ee/db#> .
GRAPH <urn:fluree:mydb:main#config> {
<urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
f:policyDefaults [
f:defaultAllow false ;
f:policySource [
a f:GraphRef ;
f:graphSource [ f:graphSelector <urn:fluree:mydb:main/policy> ]
]
] .
}
SHACL defaults
Group predicate: f:shaclDefaults
Controls SHACL shape validation at transaction time.
| Field | Type | Default | Description |
|---|---|---|---|
f:shaclEnabled | boolean | false | Enable or disable SHACL validation |
f:shapesSource | f:GraphRef | (none) | Graph containing SHACL shapes |
f:validationMode | IRI | f:ValidationReject | f:ValidationReject (reject invalid data) or f:ValidationWarn (log warning, allow) |
f:overrideControl | IRI or object | f:OverrideAll | Override gating |
f:shapesSource is non-overridable. f:shaclEnabled and f:validationMode are overridable.
Example
@prefix f: <https://ns.flur.ee/db#> .
GRAPH <urn:fluree:mydb:main#config> {
<urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
f:shaclDefaults [
f:shaclEnabled true ;
f:shapesSource [
a f:GraphRef ;
f:graphSource [ f:graphSelector f:defaultGraph ]
] ;
f:validationMode f:ValidationReject ;
f:overrideControl f:OverrideNone
] .
}
Reasoning defaults
Group predicate: f:reasoningDefaults
Controls OWL/RDFS reasoning applied at query time.
| Field | Type | Default | Description |
|---|---|---|---|
f:reasoningModes | IRI or list | (none) | Reasoning modes: f:RDFS, f:OWL2QL, f:OWL2RL, f:Datalog |
f:schemaSource | f:GraphRef | (none) | Graph containing schema triples (rdfs:subClassOf, etc.) |
f:overrideControl | IRI or object | f:OverrideAll | Override gating |
f:schemaSource is non-overridable. f:reasoningModes is overridable.
Example
@prefix f: <https://ns.flur.ee/db#> .
GRAPH <urn:fluree:mydb:main#config> {
<urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
f:reasoningDefaults [
f:reasoningModes f:RDFS ;
f:schemaSource [
a f:GraphRef ;
f:graphSource [ f:graphSelector f:defaultGraph ]
] ;
f:overrideControl f:OverrideAll
] .
}
Datalog defaults
Group predicate: f:datalogDefaults
Controls Fluree’s stored datalog rules (f:rule).
| Field | Type | Default | Description |
|---|---|---|---|
f:datalogEnabled | boolean | false | Enable or disable datalog rule evaluation |
f:rulesSource | f:GraphRef | (none) | Graph containing f:rule definitions |
f:allowQueryTimeRules | boolean | true | Allow queries to supply ad-hoc rules |
f:overrideControl | IRI or object | f:OverrideAll | Override gating |
f:rulesSource is non-overridable. f:datalogEnabled and f:allowQueryTimeRules are overridable.
Example
@prefix f: <https://ns.flur.ee/db#> .
GRAPH <urn:fluree:mydb:main#config> {
<urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
f:datalogDefaults [
f:datalogEnabled true ;
f:rulesSource [
a f:GraphRef ;
f:graphSource [ f:graphSelector f:defaultGraph ]
] ;
f:allowQueryTimeRules false ;
f:overrideControl f:OverrideNone
] .
}
Transact defaults
Group predicate: f:transactDefaults
Controls transaction-time constraint enforcement, such as property value uniqueness.
| Field | Type | Default | Description |
|---|---|---|---|
f:uniqueEnabled | boolean | false | Enable unique constraint enforcement |
f:constraintsSource | f:GraphRef or list | default graph | Graph(s) containing constraint annotations (e.g., f:enforceUnique) |
f:overrideControl | IRI or object | f:OverrideAll | Override gating |
When f:uniqueEnabled is true and f:constraintsSource is omitted, the default graph is used as the constraint source.
Additive merge semantics
Unlike other setting groups where per-graph values replace ledger-wide values field-by-field, transact defaults use additive merge semantics:
- f:uniqueEnabled: once enabled at the ledger level, it stays enabled for all graphs; per-graph configs cannot disable it.
- f:constraintsSource: per-graph sources are added to ledger-wide sources, not substituted. A graph checks annotations from all sources (ledger-wide + graph-specific).
This prevents a per-graph override from accidentally disabling enforcement or dropping constraint sources.
Note: additive merge is still subject to override control. If the ledger-wide f:overrideControl for f:transactDefaults is f:OverrideNone, per-graph additions are blocked entirely — the ledger-wide settings are final.
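Conceptually, the merge behaves like the following sketch (field and helper names are illustrative, not the engine's internals):
function mergeTransactDefaults(ledgerWide, perGraph) {
  return {
    // Monotonic OR: per-graph config cannot disable what the ledger enabled.
    uniqueEnabled: ledgerWide.uniqueEnabled || (perGraph?.uniqueEnabled ?? false),
    // Additive: per-graph sources are appended, never substituted.
    constraintsSources: [
      ...ledgerWide.constraintsSources,
      ...(perGraph?.constraintsSources ?? [])
    ]
  };
}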
Example
@prefix f: <https://ns.flur.ee/db#> .
@prefix ex: <http://example.org/ns/> .
# Define constraint annotations in the default graph
ex:email f:enforceUnique true .
ex:ssn f:enforceUnique true .
# Enable enforcement via config
GRAPH <urn:fluree:mydb:main#config> {
<urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
f:transactDefaults [
f:uniqueEnabled true ;
f:constraintsSource [
a f:GraphRef ;
f:graphSource [ f:graphSelector f:defaultGraph ]
]
] .
}
See Unique constraints for full details on f:enforceUnique.
Full-text defaults
Group predicate: f:fullTextDefaults
Declares properties whose string values should be indexed for BM25 full-text
scoring without requiring the @fulltext datatype per value, and sets the
default analyzer language for untagged plain strings.
| Field | Type | Default | Description |
|---|---|---|---|
f:defaultLanguage | BCP-47 string | "en" | Analyzer language for plain (xsd:string) values on configured properties |
f:property | f:FullTextProperty list | empty | One node per property to full-text index |
f:overrideControl | IRI or object | f:OverrideAll | Override gating |
Each f:property entry is an f:FullTextProperty node carrying f:target —
the IRI of the property being indexed. Additional optional knobs (per-property
language, tokenizer, etc.) can be added to f:FullTextProperty in the future
without breaking the schema.
The @fulltext datatype retains its zero-config shortcut semantics: any value
tagged @fulltext always indexes as English, regardless of what
f:fullTextDefaults declares. Configured plain-string paths and
@fulltext-datatype English content share the same per-property English
arena — no duplication.
rdf:langString values auto-route to per-language arenas by their tag. An
unrecognized BCP-47 tag tokenizes + lowercases only (no stopwords, no
stemming) — consistent on both indexing and query sides.
Additive merge semantics
Like f:transactDefaults, f:fullTextDefaults uses additive merge. Per-graph
f:property entries are appended to the ledger-wide list (deduping by
target IRI — per-graph wins on a collision). Per-graph f:defaultLanguage
shadows the ledger-wide value. Ledger-wide f:OverrideNone blocks per-graph
overrides entirely.
Config changes require a manual reindex
Editing f:fullTextDefaults never triggers any indexing automatically. Arenas
reflect the config that was in effect at their build time; to pick up a
changed property list or default language, run a full reindex (fluree reindex … or equivalent). Until then, existing arenas stay authoritative and
novelty written after the config change is scored with whatever language the
current effective config resolves to — which may produce temporarily
mismatched scoring until the reindex completes.
An in-flight reindex operates on a point-in-time snapshot and will not see a config change committed during its run. Wait for the reindex to finish, then trigger a new one against the post-change state.
Example
@prefix f: <https://ns.flur.ee/db#> .
@prefix ex: <http://example.org/ns/> .
GRAPH <urn:fluree:mydb:main#config> {
<urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
f:fullTextDefaults [
a f:FullTextDefaults ;
f:defaultLanguage "en" ;
f:property [ a f:FullTextProperty ; f:target ex:title ] ,
[ a f:FullTextProperty ; f:target ex:body ]
] .
}
Per-graph override example
GRAPH <urn:fluree:mydb:main#config> {
<urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
f:fullTextDefaults [
a f:FullTextDefaults ;
f:defaultLanguage "en" ;
f:property [ a f:FullTextProperty ; f:target ex:title ]
] ;
f:graphOverrides [
a f:GraphConfig ;
f:targetGraph <urn:example:productCatalog> ;
f:fullTextDefaults [
a f:FullTextDefaults ;
f:defaultLanguage "es" ;
f:property [ a f:FullTextProperty ; f:target ex:productName ]
]
] .
}
Under this config, queries touching the productCatalog graph analyze
untagged plain strings as Spanish ("es"); other graphs keep English.
ex:title is full-text indexed everywhere (ledger-wide); ex:productName
is indexed only in the productCatalog graph.
See Inline fulltext search for the
end-user guide — when to pick this path over the @fulltext datatype,
supported languages, per-graph multilingual setups, the reindex workflow,
and how configured properties interact with @fulltext-datatype values.
Ledger-scoped settings
Some settings are structurally tied to the ledger as a whole and are not meaningful per-graph. They live exclusively on f:LedgerConfig and are ignored if present on f:GraphConfig.
Override control does not apply to ledger-scoped settings — they are changed only by writing to the config graph.
Note:
f:authzSource (an identity/relationship graph used by policy evaluation) is planned as a ledger-scoped setting but is not yet implemented. When available, it will let the config graph specify which graph contains identity data (e.g., DID→role mappings) for policy resolution.
f:GraphRef: referencing source graphs
Several fields (f:policySource, f:shapesSource, f:schemaSource, f:rulesSource, f:constraintsSource) use f:GraphRef to point at graphs containing rules, shapes, schema, or constraints.
A f:GraphRef has two levels: the outer node carries the type and optional trust/rollback settings, and a nested f:graphSource object carries the source coordinates:
| Field | Level | Type | Description |
|---|---|---|---|
f:graphSource | f:GraphRef | object | Nested source coordinates (required) |
f:trustPolicy | f:GraphRef | object | How to verify the referenced graph (future) |
f:rollbackGuard | f:GraphRef | object | Freshness constraints (future) |
f:graphSelector | f:graphSource | IRI | Target graph: f:defaultGraph, f:txnMetaGraph, or a named graph IRI |
f:ledger | f:graphSource | IRI | Ledger identifier (for cross-ledger references; not yet supported for constraint sources) |
f:atT | f:graphSource | integer | Pin to a specific transaction time (optional) |
For the common case of referencing a graph within the same ledger, only f:graphSelector is needed inside f:graphSource:
f:shapesSource [
a f:GraphRef ;
f:graphSource [ f:graphSelector f:defaultGraph ]
] .
For referencing the config graph itself (co-resident rules/shapes):
f:policySource [
a f:GraphRef ;
f:graphSource [ f:graphSelector <urn:fluree:mydb:main#config> ]
] .
Cross-ledger f:GraphRef (using f:ledger to reference another ledger) is defined in the schema but not yet supported for constraint source resolution. Currently, only local graph references are resolved.
Override Control
Fluree’s config resolution follows a three-tier precedence model. Each setting group is resolved independently, and an override control mechanism governs whether higher-priority sources can change values set at lower tiers.
Resolution precedence
Settings are resolved from lowest to highest priority:
| Priority | Source | When it applies |
|---|---|---|
| 4 (lowest) | System defaults | No config present (allow-all, no SHACL, no reasoning) |
| 3 | Ledger-wide config (f:LedgerConfig) | Fallback for any setting not overridden at higher tiers |
| 2 | Per-graph config (f:GraphConfig) | Only if ledger-wide override control permits |
| 1 (highest) | Query/transaction-time opts | Only if effective override control permits + identity check passes |
Override control modes
Each setting group may include an f:overrideControl field controlling whether higher-priority sources can override the value.
| Mode | Value | Behavior |
|---|---|---|
| No overrides | f:OverrideNone | Config values are final. No per-graph or query-time overrides permitted. |
| All overrides | f:OverrideAll | Any request can override. Default when f:overrideControl is absent. |
| Identity-gated | Object with f:controlMode: f:IdentityRestricted | Only requests with a server-verified identity matching f:allowedIdentities can override. |
Identity-gated example
{
"f:overrideControl": {
"f:controlMode": { "@id": "f:IdentityRestricted" },
"f:allowedIdentities": [
{ "@id": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK" }
]
}
}
Identity verification
Override identity is the server-verified request identity (canonical DID string), not a user-supplied query parameter. Specifically:
- With the credential feature: the DID from the verified JWS kid header
- With server auth middleware: the DID mapped from an OAuth token
- A caller cannot become an allowed identity by setting "opts": {"identity": "..."} in query JSON — that field is for policy evaluation context, not override authorization
- Anonymous requests (no verified identity) are always denied by f:IdentityRestricted
Query-time vs transact-time overrides
- Query-time overrides (reasoning modes, policy opts): identity is the query caller
- Transact-time overrides (SHACL mode, validation settings): identity is the transaction signer
Monotonicity: per-graph can only tighten
Ledger-wide f:overrideControl sets the maximum permissiveness. Per-graph configs may only restrict further, never loosen.
Permissiveness ordering: f:OverrideNone < f:IdentityRestricted < f:OverrideAll
The effective per-graph override control is min(ledger-wide, per-graph):
| Ledger-wide | Per-graph | Effective | Why |
|---|---|---|---|
OverrideNone | OverrideAll | OverrideNone | Per-graph cannot loosen (warning logged) |
IdentityRestricted({alice}) | OverrideAll | IdentityRestricted({alice}) | Per-graph cannot loosen |
IdentityRestricted({alice, bob}) | IdentityRestricted({alice}) | IdentityRestricted({alice}) | Intersection: per-graph tightens |
OverrideAll | OverrideNone | OverrideNone | Per-graph tightens (valid) |
OverrideAll | IdentityRestricted({alice}) | IdentityRestricted({alice}) | Per-graph tightens (valid) |
OverrideAll | (absent) | OverrideAll | Inherits ledger-wide |
When both are IdentityRestricted, the effective allowedIdentities is the intersection of the two lists.
Resolution algorithm
For each setting group independently:
1. Start with system defaults
2. Apply ledger-wide config for this group (if present)
3. Get ledger-wide overrideControl (default: OverrideAll)
4. If ledger-wide overrideControl is OverrideNone:
→ this group is final. Skip to step 8.
5. Apply per-graph config for this group (if present)
6. Compute effective overrideControl:
= min(ledgerWide, perGraph)
If both IdentityRestricted: allowedIdentities = intersection
7. Check effective overrideControl against query/txn-time opts:
OverrideNone → config values are final
OverrideAll → apply query-time opts
IdentityRestricted → apply only if request identity matches
8. Result is the effective setting for this group.
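The min/intersection step can be pictured with a small sketch (an illustration of the rule above, not the engine's implementation):
// Permissiveness ordering: OverrideNone < IdentityRestricted < OverrideAll.
const RANK = { OverrideNone: 0, IdentityRestricted: 1, OverrideAll: 2 };

function effectiveOverrideControl(ledgerWide, perGraph) {
  if (!perGraph) return ledgerWide;                 // inherit ledger-wide
  if (ledgerWide.mode === 'IdentityRestricted' &&
      perGraph.mode === 'IdentityRestricted') {
    // Both identity-gated: intersect the allowed identity lists.
    const ids = ledgerWide.identities.filter(id => perGraph.identities.includes(id));
    return { mode: 'IdentityRestricted', identities: ids };
  }
  // Otherwise the effective control is the less permissive (minimum) of the two.
  return RANK[perGraph.mode] < RANK[ledgerWide.mode] ? perGraph : ledgerWide;
}

// effectiveOverrideControl({ mode: 'OverrideNone' }, { mode: 'OverrideAll' })
//   → { mode: 'OverrideNone' }  (per-graph cannot loosen)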
Per-group truth tables
Policy (f:policyDefaults)
| Ledger-wide | Per-graph | Query (identity) | Effective | Why |
|---|---|---|---|---|
defaultAllow: false, OverrideNone | (none) | defaultAllow: true (any) | deny | No overrides allowed |
defaultAllow: false, OverrideAll | (none) | defaultAllow: true (any) | allow | All overrides allowed |
defaultAllow: false, IdentityRestricted({alice}) | (none) | defaultAllow: true (alice) | allow | Alice is authorized |
defaultAllow: false, IdentityRestricted({alice}) | (none) | defaultAllow: true (bob) | deny | Bob not authorized |
defaultAllow: false, IdentityRestricted({alice}) | (none) | defaultAllow: true (anon) | deny | No identity = no override |
defaultAllow: false, OverrideNone | defaultAllow: true | (none) | deny | OverrideNone blocks per-graph |
defaultAllow: false, OverrideAll | defaultAllow: true | (none) | allow | Per-graph overrides ledger-wide |
defaultAllow: true, OverrideAll | defaultAllow: false, OverrideNone | defaultAllow: true (any) | deny | Per-graph OverrideNone blocks query |
| (none) | (none) | (none) | allow | System default (allow-all) |
Reasoning (f:reasoningDefaults)
| Ledger-wide | Per-graph | Query (identity) | Effective | Why |
|---|---|---|---|---|
modes: [rdfs], OverrideNone | (none) | reasoning: [owl2-rl] (any) | rdfs | No overrides |
modes: [rdfs], OverrideAll | (none) | reasoning: [owl2-rl] (any) | owl2-rl | Override allowed |
modes: [rdfs], IdentityRestricted({alice}) | (none) | reasoning: [owl2-rl] (alice) | owl2-rl | Alice authorized |
modes: [rdfs], IdentityRestricted({alice}) | (none) | reasoning: [owl2-rl] (bob) | rdfs | Bob not authorized |
modes: [rdfs], OverrideAll | modes: [owl2-rl] | (none) | owl2-rl | Per-graph overrides |
modes: [rdfs], OverrideNone | modes: [owl2-rl] | (none) | rdfs | OverrideNone blocks per-graph |
SHACL (f:shaclDefaults)
| Ledger-wide | Per-graph | Effective | Why |
|---|---|---|---|
enabled: false, OverrideNone | enabled: true | disabled | OverrideNone blocks per-graph |
enabled: true, OverrideAll | enabled: false | disabled | Per-graph disables for its graph |
mode: warn, OverrideAll | mode: reject | reject | Per-graph overrides |
Transact (f:transactDefaults)
Transact defaults use additive merge semantics, unlike other groups. However, the general override control rule still applies: if the ledger-wide f:overrideControl is f:OverrideNone, per-graph transact defaults are blocked entirely.
| Ledger-wide | Per-graph | Effective | Why |
|---|---|---|---|
uniqueEnabled: true | uniqueEnabled: false | enabled | Monotonic OR — cannot disable |
uniqueEnabled: true, sources: [default] | sources: [schemaGraph] | sources: [default, schemaGraph] | Additive — sources accumulate |
uniqueEnabled: false | uniqueEnabled: true | enabled | Per-graph can enable |
uniqueEnabled: true, OverrideNone | sources: [schemaGraph] | sources: [default] only | OverrideNone blocks per-graph additions |
Overridable vs non-overridable fields
Not all fields in a setting group are overridable. Source pointers (where rules/shapes/schema come from) are always config-only:
| Subsystem | Overridable fields | Non-overridable (config-only) |
|---|---|---|
f:policyDefaults | f:defaultAllow, f:policyClass | f:policySource |
f:shaclDefaults | f:validationMode, f:shaclEnabled | f:shapesSource |
f:reasoningDefaults | f:reasoningModes | f:schemaSource |
f:datalogDefaults | f:datalogEnabled, f:allowQueryTimeRules | f:rulesSource |
Non-overridable fields can only be changed by writing to the config graph. This prevents a query from redirecting the engine to read rules or schema from an arbitrary graph.
Per-graph overrides
Per-graph overrides target specific named graphs by IRI:
@prefix f: <https://ns.flur.ee/db#> .
GRAPH <urn:fluree:mydb:main#config> {
<urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
f:policyDefaults [
f:defaultAllow true ;
f:overrideControl f:OverrideAll
] ;
f:graphOverrides (
[ a f:GraphConfig ;
f:targetGraph <http://example.org/sensitive> ;
f:policyDefaults [
f:defaultAllow false ;
f:overrideControl f:OverrideNone
]
]
) .
}
In this example:
- All graphs default to defaultAllow: true with OverrideAll
- http://example.org/sensitive overrides to defaultAllow: false with OverrideNone — no query can override policy for this graph
- f:targetGraph uses f:defaultGraph to refer to the default graph
Unique Constraints (f:enforceUnique)
Fluree supports transaction-time enforcement of property value uniqueness via f:enforceUnique. This is complementary to SHACL — it runs independently.
How it works
Unique constraint enforcement has two parts:
- Annotation: Mark properties as unique by asserting f:enforceUnique true on their IRIs in any graph
- Activation: Enable enforcement in the config graph via f:transactDefaults
This separation follows the same pattern as SHACL (shapes + config activation) and reasoning (schema + config activation). Annotations alone do nothing — enforcement must be explicitly enabled.
Step 1: Define unique properties
Assert f:enforceUnique true on any property IRI that should enforce uniqueness. These annotations can live in the default graph or any named graph:
@prefix f: <https://ns.flur.ee/db#> .
@prefix ex: <http://example.org/ns/> .
# In the default graph
ex:email f:enforceUnique true .
ex:ssn f:enforceUnique true .
Step 2: Enable enforcement
Enable unique constraint checking in the config graph:
@prefix f: <https://ns.flur.ee/db#> .
GRAPH <urn:fluree:mydb:main#config> {
<urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
f:transactDefaults [
f:uniqueEnabled true
] .
}
When f:constraintsSource is omitted, the default graph is used as the annotation source.
Explicit constraint source
To read annotations from a specific graph:
@prefix f: <https://ns.flur.ee/db#> .
GRAPH <urn:fluree:mydb:main#config> {
<urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
f:transactDefaults [
f:uniqueEnabled true ;
f:constraintsSource [
a f:GraphRef ;
f:graphSource [ f:graphSelector f:defaultGraph ]
]
] .
}
Multiple constraint sources
Multiple sources can be specified — all are checked:
f:transactDefaults [
f:uniqueEnabled true ;
f:constraintsSource [
a f:GraphRef ;
f:graphSource [ f:graphSelector f:defaultGraph ]
] , [
a f:GraphRef ;
f:graphSource [ f:graphSelector <http://example.org/schema> ]
]
] .
What gets enforced
Once enabled, any transaction that would result in two or more distinct subjects holding the same value for a unique property within the same graph is rejected.
Scoping: per-graph
Uniqueness is enforced per graph. The same value on the same property is allowed across different named graphs:
# Graph A: ex:alice ex:email "alice@example.com" — OK
# Graph B: ex:bob ex:email "alice@example.com" — OK (different graph)
# Graph A: ex:carol ex:email "alice@example.com" — REJECTED (same graph as alice)
Value identity
Uniqueness is determined by the storage-layer value representation, not by RDF strict equality. The uniqueness key is:
(graph, predicate, value)
where “value” is the internal storage representation (type discriminant + payload).
The enforcement query matches on (predicate, object) in the POST index without constraining by datatype or language tag. This means:
- Two values with different datatype IRIs but the same internal representation are treated as the same value. For example, "hello"^^xsd:string and "hello"^^ex:customType both store as the same string value internally, so they conflict.
- Two values with different language tags but the same string content conflict, because the language tag is metadata, not part of the value key.
- Two values with different internal representations are naturally distinct. For example, "42" (stored as a string) and 42 (stored as an integer) do not conflict because they are different value types at the storage layer.
This design matches intuitive expectations of value identity and prevents circumventing uniqueness by attaching a different datatype annotation or language tag.
Intra-transaction enforcement
Uniqueness is checked after staging, so conflicts within a single transaction are caught:
{
"@context": { "ex": "http://example.org/ns/" },
"@graph": [
{ "@id": "ex:alice", "ex:email": "same@example.com" },
{ "@id": "ex:bob", "ex:email": "same@example.com" }
]
}
This transaction is rejected because two subjects assert the same value for a unique property.
Upsert safety
Upserts that change a value are handled correctly. When an upsert retracts the old value and asserts a new one in the same transaction, the old value is no longer active — no false positive.
Idempotent re-insert
Re-asserting the same (subject, property, value) triple that already exists is allowed. One subject still holds the value — no violation.
Error message
When a uniqueness violation is detected, the transaction fails with an error like:
Unique constraint violation: property <http://example.org/ns/email>
value "alice@example.com" already exists for subject
<http://example.org/ns/alice> in graph default
(conflicting subject: <http://example.org/ns/bob>)
Lagging config
Config is read from the pre-transaction state. This means:
- Enabling f:uniqueEnabled and inserting duplicate values in the same transaction will not reject the duplicates
- The next transaction will enforce the constraint
This is intentional and consistent with all other config graph features.
Per-graph overrides
Transact defaults use additive merge semantics:
- f:uniqueEnabled uses monotonic OR — once enabled at the ledger level, per-graph configs cannot disable it
- f:constraintsSource is additive — per-graph sources are added to (not replace) ledger-wide sources
Note: additive merge is still subject to override control. If the ledger-wide f:overrideControl for f:transactDefaults is f:OverrideNone, per-graph additions are blocked entirely.
This means a per-graph override can add additional constraint sources but cannot remove ledger-wide ones:
GRAPH <urn:fluree:mydb:main#config> {
<urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
f:transactDefaults [
f:uniqueEnabled true ;
f:constraintsSource [
a f:GraphRef ;
f:graphSource [ f:graphSelector f:defaultGraph ]
]
] ;
f:graphOverrides (
[ a f:GraphConfig ;
f:targetGraph <http://example.org/graphX> ;
f:transactDefaults [
f:constraintsSource [
a f:GraphRef ;
f:graphSource [ f:graphSelector <http://example.org/schema> ]
]
]
]
) .
}
In this example, graphX checks unique annotations from both the default graph (ledger-wide) and http://example.org/schema (per-graph addition).
Zero cost when not configured
When f:uniqueEnabled is not set or is false, uniqueness checking is completely skipped — no property scan, no index queries, no overhead. The enforcement code fast-paths out immediately.
Complete example
@prefix f: <https://ns.flur.ee/db#> .
@prefix ex: <http://example.org/ns/> .
# 1. Define unique annotations in the default graph
ex:email f:enforceUnique true .
# 2. Enable enforcement in the config graph
GRAPH <urn:fluree:mydb:main#config> {
<urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
f:transactDefaults [
f:uniqueEnabled true
] .
}
After this transaction, the next transaction that attempts to give two subjects the same ex:email value (within the same graph) will be rejected.
Security and Policy
Fluree provides comprehensive security features including authentication, fine-grained access control through policies, and transparent encryption of data at rest.
Authentication
Authentication
An overview of Fluree’s authentication model, covering:
- Identity vs transport (DIDs, signed requests, Bearer tokens)
- Three auth modes: decentralized did:key, standalone server tokens, OIDC/OAuth2
- Bearer token claim set and scope definitions
- Replication vs query access boundary
- Token verification paths (Ed25519 + OIDC/JWKS)
Data Encryption
Storage Encryption
Protect data at rest with AES-256-GCM encryption:
- Transparent encryption/decryption
- Environment variable key configuration
- Portable ciphertext format
- Key rotation support
Commit Integrity
Commit Signing and Attestation
Cryptographic proof of which node wrote a commit:
- Ed25519 signatures over domain-separated commit digests
- Embedded signature blocks in commit files
- did:key signer identities
- Future: detached attestations and consensus policies
Policy System
Policy Model and Inputs
Understanding Fluree’s policy architecture:
- Policy structure and syntax
- Subject, action, resource model
- Policy evaluation order
- Input data for policy decisions
- Default allow vs default deny
Policy in Queries
How policies affect query execution:
- Query-time filtering
- Result set restrictions
- Pattern-based filtering
- Performance considerations
- Policy debugging for queries
Policy in Transactions
How policies affect transaction operations:
- Transaction validation
- Authorization checks
- Entity-level permissions
- Property-level permissions
- Policy-based retractions
Programmatic Policy API (Rust)
Using policies in Rust applications:
- wrap_identity_policy_view - Identity-based policy lookup via f:policyClass
- wrap_policy_view - Inline policies with QueryConnectionOptions
- Policy precedence rules
- Transaction-side policy enforcement
- Historical views with policy
Key Concepts
Data-Level Security
Fluree enforces security at the data level, not just the application level:
- Users see only authorized data
- Policies applied during query execution
- No unauthorized data leakage
- Transparent to applications
Policy as Data
Policies are stored as RDF triples in the database:
- Version controlled with data
- Query policies like any data
- Time travel for policy history
- Policies can reference other data
Identity-Based Access
Policies use decentralized identifiers (DIDs):
- did:key for cryptographic identity
- did:web for organization identity
- Signed requests link to DID
- Policies grant/deny based on DID
Policy Structure
Basic policy format:
{
"@context": {
"f": "https://ns.flur.ee/db#",
"ex": "http://example.org/ns/"
},
"@id": "ex:read-policy",
"@type": "f:Policy",
"f:subject": "did:key:z6Mkh...",
"f:action": "query",
"f:resource": {
"@type": "schema:Person"
},
"f:allow": true
}
- Subject: Who (DID, role, group)
- Action: What operation (query, transact)
- Resource: Which data (type, predicate, specific entities)
- Allow/Deny: Grant or deny access
Policy Enforcement Points
Query Time
Policies filter query results:
SELECT ?name
WHERE {
?person schema:name ?name .
}
Policy filters results to only show authorized people.
Transaction Time
Policies validate transaction operations:
{
"@graph": [
{ "@id": "ex:alice", "schema:age": 31 }
]
}
Policy checks if user can modify ex:alice.
Common Policy Patterns
Allow All (Development)
{
"@id": "ex:allow-all",
"f:subject": "*",
"f:action": "*",
"f:allow": true
}
Role-Based Access
{
"@id": "ex:admin-policy",
"f:subject": { "ex:role": "admin" },
"f:action": "*",
"f:allow": true
}
Resource-Type Based
{
"@id": "ex:public-data-policy",
"f:subject": "*",
"f:action": "query",
"f:resource": { "@type": "ex:PublicData" },
"f:allow": true
}
Property-Level Access
{
"@id": "ex:sensitive-property-policy",
"f:subject": { "ex:role": "hr" },
"f:action": "query",
"f:resource": {
"f:predicate": "ex:salary"
},
"f:allow": true
}
Owner-Based Access
{
"@id": "ex:owner-policy",
"f:subject": "?user",
"f:action": ["query", "transact"],
"f:resource": {
"ex:owner": "?user"
},
"f:allow": true
}
Policy Evaluation
Evaluation Order
- Collect applicable policies based on subject, action, resource
- Evaluate each policy against request context
- Combine results using policy combining algorithm
- Apply default if no policies match
Combining Algorithms
Deny Overrides (default):
- If any policy denies, access denied
- Otherwise, allow if any policy allows
- Default: deny if no matches
Allow Overrides:
- If any policy allows, access granted
- Otherwise, deny if any policy denies
- Default: deny if no matches
Policy Context
Policies have access to runtime context:
Request Context:
- Subject DID
- Action being performed
- Target resource/entity
- Timestamp
Data Context:
- Entity properties
- Related entities
- Graph structure
- Historical data
Example using context:
{
"f:subject": "?user",
"f:resource": {
"ex:department": "?dept"
},
"f:condition": "?user ex:department ?dept",
"f:allow": true
}
Allows access if user is in same department as resource.
Multi-Tenant Policies
Isolate data by tenant:
{
"@id": "ex:tenant-isolation-policy",
"f:subject": "?user",
"f:action": "*",
"f:resource": {
"ex:tenant": "?tenant"
},
"f:condition": "?user ex:tenant ?tenant",
"f:allow": true
}
Users can only access data from their tenant.
Policy Performance
Efficient Policies
Good (specific):
{
"f:resource": { "@type": "ex:PublicData" },
"f:allow": true
}
Less efficient (broad):
{
"f:resource": { "?pred": "?value" },
"f:condition": "complex graph pattern",
"f:allow": true
}
Query Optimization
Policies are optimized during query planning:
- Type-based filters pushed down
- Property filters optimized
- Complex patterns may impact performance
Policy Management
Creating Policies
Policies are created via transactions:
curl -X POST "http://localhost:8090/v1/fluree/upsert?ledger=policies:main" \
-H "Content-Type: application/json" \
-d '{
"@graph": [
{
"@id": "ex:new-policy",
"@type": "f:Policy",
"f:subject": "did:key:z6Mkh...",
"f:action": "query",
"f:allow": true
}
]
}'
Updating Policies
Update using WHERE/DELETE/INSERT:
{
"where": [
{ "@id": "ex:policy-1", "f:allow": "?oldValue" }
],
"delete": [
{ "@id": "ex:policy-1", "f:allow": "?oldValue" }
],
"insert": [
{ "@id": "ex:policy-1", "f:allow": false }
]
}
Policy Versioning
Policies are versioned with data:
- Time travel to see historical policies
- Audit who changed policies when
- Rollback policies if needed
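As a sketch of auditing a policy's history with the CLI's time-travel flag (the policy IRI and timestamp below are illustrative):

```bash
# What did ex:policy-1 allow as of June 15th?
fluree query --at 2024-06-15T00:00:00Z \
  'SELECT ?allow WHERE { <http://example.org/ns/policy-1> <https://ns.flur.ee/db#allow> ?allow }'
```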
Security Best Practices
1. Principle of Least Privilege
Grant minimum necessary permissions:
// Good: Specific permissions
{
"f:subject": "did:key:z6Mkh...",
"f:action": "query",
"f:resource": { "@type": "ex:PublicData" },
"f:allow": true
}
// Bad: Overly broad
{
"f:subject": "did:key:z6Mkh...",
"f:action": "*",
"f:allow": true
}
2. Default Deny
Start with deny-all, add specific allows:
// Default policy
{
"@id": "ex:default",
"f:subject": "*",
"f:action": "*",
"f:allow": false
}
// Specific allows
{
"@id": "ex:public-read",
"f:subject": "*",
"f:action": "query",
"f:resource": { "@type": "ex:PublicData" },
"f:allow": true
}
3. Use Roles
Define roles, not individual permissions:
{
"@id": "ex:admin-role",
"@type": "ex:Role",
"ex:permissions": ["read", "write", "admin"]
}
{
"@id": "ex:role-policy",
"f:subject": { "ex:hasRole": "ex:admin-role" },
"f:action": "*",
"f:allow": true
}
4. Audit Policy Changes
Track who changes policies:
{
"@id": "ex:policy-audit",
"ex:changedBy": "did:key:z6Mkh...",
"ex:changedAt": "2024-01-22T10:00:00Z",
"ex:reason": "Added read access for contractors"
}
5. Test Policies
Test policies before deploying:
async function testPolicy(policy, testCases) {
for (const testCase of testCases) {
const result = await evaluatePolicy(policy, testCase);
assert.equal(result.allowed, testCase.expected);
}
}
Related Documentation
- Verifiable Data - Cryptographic signatures
- Signed Requests - Request authentication
- Signed Transactions - Transaction signing
- Commit Signing and Attestation - Commit-level signatures
Authentication
Fluree supports multiple authentication mechanisms to cover different deployment scenarios — from standalone servers with no external identity provider to managed platforms using OIDC.
This document describes the authentication model, the supported modes, the bearer token claim set, and the access boundary between replication and query operations.
Identity vs transport
Identity (who)
Fluree policy enforcement is based on an identity, ideally a DID:
- Preferred: did:key:... — portable across environments, no central identity server required
- Also possible: other DIDs or IRIs mapped into Fluree policy (e.g. ex:alice)
Policies are stored as RDF triples in the ledger and evaluated at query/transaction time against the requesting identity. See Policy model for details.
Transport (how requests authenticate)
Two “on-the-wire” mechanisms carry the identity:
| Mechanism | Format | When to use |
|---|---|---|
| Signed requests | JWS/VC envelope containing the DID | Proof-of-possession; trustless environments |
| Bearer tokens | Authorization: Bearer <JWT> | Session-based; OIDC/OAuth2 flows |
Bearer tokens are a UX and deployment convenience — they do not replace the identity model. The server extracts the identity from the token claims and enforces the same dataset policies as signed requests.
Three supported auth modes
Mode 1 — Decentralized: did:key signed requests (no IdP)
- The client holds an Ed25519 keypair and derives a did:key:...
- Requests are signed using JWS or Verifiable Credential format
- The server verifies the signature and uses the DID as the principal
- Dataset policies decide allow/deny
This preserves Fluree’s core value: no central identity server required.
See Signed requests (JWS/VC) for the wire format.
Mode 2 — Standalone server with offline-minted tokens
Designed for: “stand up a server somewhere” (local dev, single-node EC2, etc.).
- An admin generates an Ed25519 keypair with fluree token keygen
- The admin mints a scoped Bearer token with fluree token create
- The admin provides the token to CLI users or stores it in a secret manager
- The server validates the token’s embedded JWK signature and enforces scopes + policy
The policy identity remains DID-based (fluree.identity claim), so authorization stays dataset/policy driven even though the transport is a Bearer token.
See CLI token command for minting instructions.
Mode 3 — OIDC/OAuth2 with an external identity provider
Designed for: managed platforms (e.g., any application using an OIDC provider).
- The IdP authenticates the user (device flow, PKCE, etc.)
- The application knows the user’s Fluree dataset entitlements
- The application issues (or exchanges for) a Fluree-scoped token carrying:
  - identity (fluree.identity — ideally a DID)
  - ledger read/write scopes
  - optional policy class
- The server verifies the token against the provider’s JWKS endpoint
This preserves separation of concerns:
- IdP: authentication (who logged in)
- Application: authorization (what they can access in Fluree)
The server must be configured with --jwks-issuer to trust OIDC tokens. See Configuration — OIDC.
Bearer token claim set
All Fluree Bearer tokens (Mode 2 and Mode 3) share the same claim set. The server extracts identity and scopes from these claims regardless of how the token was signed.
Standard JWT claims
| Claim | Required | Description |
|---|---|---|
| iss | Yes | Issuer — did:key:... for Ed25519 tokens, URL for OIDC tokens |
| sub | No | Subject — human-readable identity of the token holder |
| aud | No | Audience — target service (e.g. server URL) |
| exp | Yes | Expiration time (Unix timestamp) |
| iat | Yes | Issued-at time (Unix timestamp) |
Fluree-specific claims
| Claim | Type | Description |
|---|---|---|
| fluree.identity | String (IRI/DID) | Identity for policy enforcement — takes precedence over sub |
| fluree.policy.class | String (IRI) | Optional policy class for identity-based policy lookup |
Scope claims
Scopes control which endpoints and ledgers a token can access.
Query scopes (fluree.ledger.*)
| Claim | Type | Description |
|---|---|---|
| fluree.ledger.read.all | Boolean | Read access to all ledgers via data API |
| fluree.ledger.read.ledgers | Array of strings | Read access to specific ledgers |
| fluree.ledger.write.all | Boolean | Write access to all ledgers via data API |
| fluree.ledger.write.ledgers | Array of strings | Write access to specific ledgers |
Replication scopes (fluree.storage.*)
| Claim | Type | Description |
|---|---|---|
| fluree.storage.all | Boolean | Storage/replication access to all ledgers |
| fluree.storage.ledgers | Array of strings | Storage/replication access to specific ledgers |
Back-compat: fluree.storage.* claims also imply data API read access for the same ledgers.
Populating fluree.storage.ledgers (multi-tenant hint)
If you run an IdP or a request-router that exchanges IdP tokens for Fluree-scoped tokens, prefer populating fluree.storage.ledgers rather than granting fluree.storage.all.
Recommended conventions for mapping IdP group/role claims to ledger scopes:
- Treat group values like fluree:storage:<ledger-id> (example: fluree:storage:books:main) as permission to replicate that ledger.
- Optionally support wildcards at the router boundary (example: fluree:storage:books:* expands to the set of ledgers your router knows about under books:).
- Reserve fluree.storage.all=true for admin/service accounts.
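Under those conventions, a token your router mints for a replication service account might carry claims like the following (issuer, identity, and ledger names are illustrative):

```json
{
  "iss": "https://idp.example.com",
  "sub": "replicator@example.com",
  "exp": 1700000000,
  "iat": 1699996400,
  "fluree.identity": "did:key:z6Mkh...",
  "fluree.storage.ledgers": ["books:main", "books:staging"]
}
```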
Event scopes (fluree.events.*)
| Claim | Type | Description |
|---|---|---|
| fluree.events.all | Boolean | SSE event stream for all ledgers |
| fluree.events.ledgers | Array of strings | SSE event stream for specific ledgers |
Example token payload
{
"iss": "https://solo.example.com",
"sub": "alice@example.com",
"aud": "https://fluree.example.com",
"exp": 1700000000,
"iat": 1699996400,
"fluree.identity": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK",
"fluree.ledger.read.all": true,
"fluree.ledger.write.ledgers": ["mydb:main", "mydb:staging"]
}
Token verification paths
The server supports two verification paths, selected automatically based on the JWT header:
| JWT header | Path | Algorithm | Trust model |
|---|---|---|---|
| Contains jwk (embedded key) | Ed25519 / did:key | EdDSA | Issuer trust checked against --events-auth-trusted-issuer (or admin/storage equivalents) |
| Contains kid (key ID) | OIDC / JWKS | RS256 | Issuer must match a --jwks-issuer; key fetched from JWKS endpoint |
This dual-path dispatch is transparent to callers — the same Authorization: Bearer <token> header works for both paths. The server applies the same scope and identity enforcement regardless of which path verified the signature.
Replication vs query access boundary
Fluree draws a hard boundary between replication-scoped and query-scoped access.
Replication access (fluree.storage.*)
Replication operations — nameservice sync, storage proxy reads, and CLI fetch/pull/push — require root-level fluree.storage.* claims. These operations transfer raw commit data and index blocks; they bypass dataset policy because the data must be bit-identical to what the transaction server wrote.
Replication tokens are intended for operator and service-account use (e.g. a peer server’s storage-proxy token, or an admin’s CLI pull/push workflow). They should never be issued to end users.
Query access (fluree.ledger.read/write.*)
Query operations — /v1/fluree/query/{ledger...}, /v1/fluree/insert/{ledger...}, connection-scoped SPARQL, etc. — use fluree.ledger.read/write.* claims. These go through the full query engine and dataset policy enforcement. The server never exposes raw storage bytes through query endpoints.
Query tokens are appropriate for end users and application service accounts. Combined with a fluree.identity claim and dataset policies, the server enforces fine-grained row- and property-level access control.
CLI consequence: track vs pull
| Command | Access type | Required scope | What happens |
|---|---|---|---|
| fluree pull | Replication | fluree.storage.* | Downloads raw commits and indexes into local storage |
| fluree track | Query | fluree.ledger.read/write.* | Registers a remote ledger; queries forwarded to server |
If a user holds only query-scoped tokens, they cannot clone or pull a ledger. They can only track it and issue queries/transactions against the remote.
Identity precedence
When multiple identity signals are present, the server uses this precedence (highest first):
- Signed request DID — proof-of-possession from JWS/VC signature
- Bearer token fluree.identity — identity claim in the token
- Client-provided headers/body — only honored when the server is in unauthenticated mode
When auth is present, the server forces opts.identity (and optional policy class) from the token, ignoring any client-provided identity in headers or request bodies. This prevents identity spoofing.
Endpoint coverage
All Bearer-token-authenticated endpoints support both Ed25519 and OIDC verification paths:
| Endpoint group | Extractor | Scopes checked |
|---|---|---|
| Data API (query/update/info/exists) | MaybeDataBearer | fluree.ledger.read/write.* |
| Admin (create/drop) | require_admin_token | Issuer trust |
| Events (SSE) | MaybeBearer | fluree.events.* |
| Storage proxy | StorageProxyBearer | fluree.storage.* |
| Nameservice refs | StorageProxyBearer | fluree.storage.* |
MCP endpoints currently use the Ed25519 path only.
Security notes
- Tokens are validated server-side on every request; client-side validation is never trusted
- Out-of-scope ledgers return 404 (not 403) to avoid existence leaks
- fluree.storage.* tokens grant raw data access — issue only to trusted operators
- Connection-scoped SPARQL (FROM/FROM NAMED) requires all referenced ledgers to be within the token’s read scope
See also
- Signed requests (JWS/VC) — Wire format for signed requests
- Configuration — OIDC — Server OIDC/JWKS setup
- CLI auth command — Managing tokens on remotes
- CLI token command — Minting Ed25519 tokens
- Auth contract (CLI ↔ Server) — Discovery, exchange, and refresh protocol
- Policy model — Dataset-level access control
Storage Encryption
Fluree supports transparent encryption of data at rest using AES-256-GCM authenticated encryption. When enabled, all data written to storage is automatically encrypted, and data is decrypted transparently when read.
Overview
Key Features:
- AES-256-GCM: Industry-standard authenticated encryption with integrity protection
- Transparent Operation: Encryption/decryption happens automatically on read/write
- All Storage Backends: Works natively with file, S3, and memory storage
- Portable Ciphertext: Encrypted data can be moved between storage backends (file ↔ S3)
- Environment Variable Support: Keys can be loaded from environment variables
- Secure Key Handling: Key material in EncryptionKey is zeroized on drop
Quick Start
Rust API
#![allow(unused)]
fn main() {
use fluree_db_api::FlureeBuilder;
// Option 1: Direct key (for testing)
let key: [u8; 32] = [0u8; 32]; // replace with your 32-byte key (testing only)
let fluree = FlureeBuilder::file("/data/fluree")
.build_encrypted(key)?;
// Option 2: Base64-encoded key
let fluree = FlureeBuilder::file("/data/fluree")
.with_encryption_key_base64("your-base64-encoded-32-byte-key")?
.build_encrypted_from_config()?;
// Option 3: From JSON-LD config with env var
let config = serde_json::json!({
"@context": {"@vocab": "https://ns.flur.ee/system#"},
"@graph": [{
"@type": "Connection",
"indexStorage": {
"@type": "Storage",
"filePath": "/data/fluree",
"AES256Key": {"envVar": "FLUREE_ENCRYPTION_KEY"}
}
}]
});
let fluree = FlureeBuilder::from_json_ld(&config)?
.build_encrypted_from_config()?;
}
Server Configuration
Set the encryption key via environment variable:
# Generate a secure 32-byte key and base64 encode it
export FLUREE_ENCRYPTION_KEY=$(openssl rand -base64 32)
# Start the server with JSON-LD config
./fluree-db-server --config config.jsonld
Configuration
JSON-LD Configuration
The encryption key is specified in the storage configuration using AES256Key:
{
"@context": {
"@base": "https://example.org/config/",
"@vocab": "https://ns.flur.ee/system#"
},
"@graph": [
{
"@id": "indexStorage",
"@type": "Storage",
"filePath": "/var/lib/fluree/data",
"AES256Key": {
"envVar": "FLUREE_ENCRYPTION_KEY"
}
},
{
"@id": "mainConnection",
"@type": "Connection",
"indexStorage": {"@id": "indexStorage"},
"cacheMaxMb": 2000
}
]
}
Configuration Options
| Field | Type | Description |
|---|---|---|
| AES256Key | string or object | Base64-encoded 32-byte encryption key |
| AES256Key.envVar | string | Environment variable containing the key |
| AES256Key.defaultVal | string | Fallback key if env var is not set |
Environment Variable Indirection
You can load the encryption key from an environment variable:
{
"AES256Key": {
"envVar": "FLUREE_ENCRYPTION_KEY"
}
}
Or with a fallback default (not recommended for production):
{
"AES256Key": {
"envVar": "FLUREE_ENCRYPTION_KEY",
"defaultVal": "fallback-base64-key-for-dev-only"
}
}
Key Management
Generating Keys
Generate a cryptographically secure 32-byte key:
# Using OpenSSL (recommended)
openssl rand -base64 32
# Using /dev/urandom
head -c 32 /dev/urandom | base64
# Example output: "K7gNU3sdo+OL0wNhqoVWhr3g6s1xYv72ol/pe/Unols="
Key Storage Best Practices
- Never commit keys to version control
- Use environment variables or secret managers
- Rotate keys periodically (see Key Rotation below)
- Limit access to key material
Recommended secret management solutions:
- HashiCorp Vault
- AWS Secrets Manager
- Kubernetes Secrets
- Docker secrets
Key Rotation
The encryption envelope format includes a key_id field to support key rotation:
- Existing data continues to be readable with the old key
- New writes use the new key
- Re-encrypt on read (optional): Decrypt with old key, re-encrypt with new key
Note: Full key rotation support with a KeyProvider trait is planned for a future release. Currently, a single static key is used.
Encryption Details
Algorithm
- Cipher: AES-256-GCM (Galois/Counter Mode)
- Key Size: 256 bits (32 bytes)
- Nonce Size: 96 bits (12 bytes), randomly generated per write
- Tag Size: 128 bits (16 bytes)
Ciphertext Envelope Format
All encrypted data uses a portable envelope format:
┌──────────────────────────────────────────────────────────────┐
│ Header (22 bytes) │
├──────────┬─────────┬─────────┬──────────┬───────────────────┤
│ Magic │ Version │ Alg │ Key ID │ Nonce │
│ 4 bytes │ 1 byte │ 1 byte │ 4 bytes │ 12 bytes │
│ "FLU\0" │ 0x01 │ 0x01 │ uint32 │ random │
├──────────┴─────────┴─────────┴──────────┴───────────────────┤
│ Ciphertext (variable length) │
├──────────────────────────────────────────────────────────────┤
│ Authentication Tag (16 bytes) │
└──────────────────────────────────────────────────────────────┘
- Magic bytes: FLU\0 (0x46 0x4C 0x55 0x00) for format detection
- Version: Format version (currently 0x01)
- Algorithm: 0x01 = AES-256-GCM
- Key ID: Identifier for key rotation support
- Nonce: Randomly generated per encryption operation
- Authentication Tag: GCM integrity tag (authenticates header + ciphertext)
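As an illustration of this layout, here is a minimal sketch (not Fluree's actual decoder) that reads the 22-byte header fields from an encrypted object; the function name, error handling, and the little-endian key ID are assumptions:

```rust
/// Illustrative only: parse the 22-byte envelope header described above.
fn parse_envelope_header(data: &[u8]) -> Result<(u8, u8, u32, [u8; 12]), String> {
    if data.len() < 22 + 16 {
        return Err("too short to contain header + auth tag".into());
    }
    if &data[0..4] != b"FLU\0" {
        return Err("missing FLU\\0 magic bytes".into());
    }
    let version = data[4];   // format version (0x01)
    let algorithm = data[5]; // 0x01 = AES-256-GCM
    // The uint32 key ID's endianness is not specified above; little-endian is assumed here.
    let key_id = u32::from_le_bytes(data[6..10].try_into().unwrap());
    let mut nonce = [0u8; 12];
    nonce.copy_from_slice(&data[10..22]); // random per-write nonce
    Ok((version, algorithm, key_id, nonce))
}
```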
Security Properties
- Confidentiality: AES-256 encryption protects data content
- Integrity: GCM authentication tag detects tampering
- Authenticity: Header is included in AAD (Additional Authenticated Data)
- Non-deterministic: Random nonces mean same plaintext → different ciphertext
Portability
Encrypted data is portable between storage backends:
# Encrypted files can be copied from local storage to S3
aws s3 sync /var/lib/fluree/data s3://my-bucket/fluree/
# And back again
aws s3 sync s3://my-bucket/fluree/ /var/lib/fluree/data
The same encryption key will decrypt data regardless of where it’s stored.
Performance Considerations
- CPU overhead: ~5-15% for encryption/decryption (depends on hardware AES support)
- Storage overhead: 22 bytes header + 16 bytes tag per object
- Memory: Keys are kept in memory while the connection is open
Modern CPUs with AES-NI instructions provide hardware acceleration, minimizing the performance impact.
Troubleshooting
Common Errors
“Invalid encryption format”
- The data doesn’t have the expected magic bytes
- Possible causes: trying to read unencrypted data with encryption enabled, or corrupted data
“Unknown encryption key ID”
- The data was encrypted with a different key than what’s configured
- Check that the correct key is being used
“Decryption failed”
- The encryption key doesn’t match
- The data may be corrupted
- The authentication tag verification failed (data was tampered with)
“Encryption key must be 32 bytes”
- The provided key is the wrong length
- Base64-decode your key and verify it’s exactly 32 bytes
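A quick way to confirm the configured key decodes to exactly 32 bytes, using standard shell tools:

```bash
# Should print 32
echo -n "$FLUREE_ENCRYPTION_KEY" | base64 -d | wc -c
```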
Verifying Encryption
Check if a file is encrypted by looking for the magic bytes:
# Check first 4 bytes of a file
xxd -l 4 /var/lib/fluree/data/some-file
# Encrypted: 00000000: 464c 5500 FLU.
# Unencrypted: will show different bytes (likely JSON or Avro magic)
Changing Encryption Settings
Enabling Encryption on Existing Data
To encrypt existing unencrypted data:
- Export all ledgers to JSON-LD
- Delete the old unencrypted data directory
- Configure encryption with a new key
- Import the JSON-LD data
# 1. Export (while running without encryption)
fluree export mydb:main --format json-ld > mydb-export.jsonld
# 2. Stop server and backup/delete old data
mv /var/lib/fluree/data /var/lib/fluree/data-unencrypted-backup
# 3. Configure encryption key
export FLUREE_ENCRYPTION_KEY=$(openssl rand -base64 32)
echo "Save this key securely: $FLUREE_ENCRYPTION_KEY"
# 4. Start server with encryption config and import
./fluree-db-server --config encrypted-config.jsonld
fluree create mydb --from mydb-export.jsonld
Disabling Encryption
Warning: This exposes your data. Only do this if absolutely necessary.
Follow the same export/import process, but configure without an encryption key.
Related Documentation
- Storage Modes - Storage backend configuration
- Configuration - General configuration reference
- Policy Model - Access control and authorization
Commit Signing and Attestation
Fluree supports cryptographic signing at two levels:
- Transaction signatures prove who submitted a transaction (user-facing). See Signed Transactions.
- Commit signatures prove which node wrote a commit (infrastructure-facing). This page covers commit signatures.
Both use did:key identifiers with Ed25519 signatures, aligning with the credential infrastructure in fluree-db-credential.
Note: Requires the credential feature flag. See Compatibility and Feature Flags.
Transaction Signatures vs Commit Signatures
These two signature types serve different purposes:
| | Transaction Signature | Commit Signature |
|---|---|---|
| Proves | Who submitted the transaction | Which node wrote the commit |
| Signed by | End user (client-side) | Fluree node (server-side) |
| Trust model | User authentication | Infrastructure integrity |
| Format | JWS / Verifiable Credential | Domain-separated Ed25519 over commit hash |
| Stored in | Commit envelope (txn_signature) | Trailing signature block after commit hash |
A single commit can have both: a transaction signature from the user who submitted it, and a commit signature from the node that wrote it.
How Commit Signing Works
Commit Digest
When a commit is written, its content is hashed with SHA-256 to produce a commit_hash. The signing digest is then computed with domain separation to prevent cross-protocol and cross-ledger replay:
to_sign = SHA-256("fluree/commit/v1" || varint(ledger_id.len()) || ledger_id || commit_hash)
Where:
"fluree/commit/v1"is a domain separator (18 bytes ASCII)ledger_idis the ledger ID (name:branch, length-prefixed)commit_hashis the 32-byte SHA-256 of the commit content
Signature Block Layout
The signature block is appended after the commit hash and is not covered by it:
+-------------------------------------+
| Header (32 bytes) |
| flags: includes HAS_COMMIT_SIG |
+-------------------------------------+
| Envelope + Ops + Dictionaries |
+-------------------------------------+
| Footer (64 bytes) |
+-------------------------------------+
| commit_hash (32 bytes) |
+-------------------------------------+
| Signature Block (optional) | <-- after hash boundary
| sig_count: u16 |
| signatures: [CommitSignature] |
+-------------------------------------+
This design means:
- commit_hash is stable regardless of signatures
- Signatures can be added without changing the commit’s content address
- Existing verification (hash check) works unchanged
Signature Entry Format
Each signature entry contains:
| Field | Type | Description |
|---|---|---|
| signer | String | Signer identity (did:key:z6Mk...) |
| algo | u8 | Signing algorithm (0x01 = Ed25519) |
| signature | [u8; 64] | Ed25519 signature bytes |
| timestamp | i64 | Signing time (epoch millis, informational only) |
| metadata | Option<Vec<u8>> | Optional metadata (node_id, region, role for consensus) |
The algo byte provides forward compatibility for new signature algorithms. Unknown algo values are rejected on decode (not silently skipped).
The timestamp is informational only and is not part of the signed digest. Ordering is determined by the commit chain, not by signature timestamps.
The metadata field is reserved for future consensus features (multi-node signing, quorum sets). It allows nodes to include identifying information like node ID, region, or role. Currently unused but present in the format to avoid future versioning.
Enabling Commit Signing (Rust API)
Commit signing is opt-in via CommitOpts when using the Rust API:
#![allow(unused)]
fn main() {
use std::sync::Arc;
use fluree_db_novelty::SigningKey;
// Load or generate an Ed25519 signing key
let signing_key = Arc::new(SigningKey::from_bytes(&key_bytes));
// Attach to commit options
let opts = CommitOpts::default()
.with_signing_key(signing_key);
}
When a signing key is present, the commit writer:
- Computes the domain-separated digest from the commit hash and ledger ID
- Signs the digest with Ed25519
- Appends the signature block after the commit hash
- Sets the FLAG_HAS_COMMIT_SIG bit in the header
Verifying Commit Signatures
Verification recomputes the domain-separated digest and checks the Ed25519 signature:
#![allow(unused)]
fn main() {
use fluree_db_credential::verify_commit_digest;
verify_commit_digest(
&signer_did, // "did:key:z6Mk..."
&signature_bytes, // [u8; 64]
&commit_hash, // [u8; 32]
ledger_id, // "mydb:main"
)?;
}
The verifier:
- Extracts the Ed25519 public key from the did:key identifier
- Recomputes to_sign = SHA-256("fluree/commit/v1" || varint(ledger_id.len()) || ledger_id || commit_hash)
- Verifies the signature over to_sign
No external key registry is needed for did:key identifiers — the public key is embedded in the DID itself.
Wire Format
Each CommitSignature is encoded as:
signer_len: u16 (LE) - length of signer string
signer: [u8; signer_len] - UTF-8 did:key identifier
algo: u8 - signature algorithm (0x01 = Ed25519)
signature: [u8; 64] - Ed25519 signature bytes
timestamp: i64 (LE) - signing timestamp (epoch millis)
meta_len: u16 (LE) - metadata length (0 if none)
metadata: [u8; meta_len] - optional metadata bytes
The signature block is prefixed with sig_count: u16 (LE) containing the number of signatures.
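A sketch of encoding a single entry in that layout; the function signature is illustrative and taken from the field table above, not from Fluree's crates:

```rust
/// Illustrative only: serialize one signature entry in the wire format above.
fn encode_signature_entry(
    signer: &str,            // UTF-8 did:key identifier, e.g. "did:key:z6Mk..."
    algo: u8,                // 0x01 = Ed25519
    signature: &[u8; 64],
    timestamp_millis: i64,
    metadata: Option<&[u8]>,
) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(signer.len() as u16).to_le_bytes()); // signer_len: u16 (LE)
    out.extend_from_slice(signer.as_bytes());                    // signer
    out.push(algo);                                              // algo
    out.extend_from_slice(signature);                            // signature: [u8; 64]
    out.extend_from_slice(&timestamp_millis.to_le_bytes());      // timestamp: i64 (LE)
    let meta = metadata.unwrap_or(&[]);
    out.extend_from_slice(&(meta.len() as u16).to_le_bytes());   // meta_len: u16 (LE, 0 if none)
    out.extend_from_slice(meta);                                 // metadata bytes
    out
}
```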
Security Properties
Replay Prevention
- Cross-ledger: The ledger ID is part of the signed digest, so a signature from ledger A cannot be replayed on ledger B
- Cross-protocol: The domain separator "fluree/commit/v1" prevents signatures meant for other systems from being accepted
- Version upgrade: Changing the domain separator (e.g., v1 to v2) invalidates old signatures
What Commit Signatures Do Not Provide
- Transaction authorization: Use transaction signatures and policies for user-level access control
- Consensus: A single commit signature proves one node wrote it. Multi-node consensus requires attestation policies (see below)
- Encryption: Commit signing provides integrity and authenticity, not confidentiality. See Storage Encryption for data-at-rest protection
Future: Attestations and Consensus Policy
The following capabilities are designed but not yet implemented.
Detached Attestations
For multi-node deployments, signatures can be collected as separate attestation objects rather than embedded in the commit:
- Commit file remains immutable and content-addressed
- Signatures collected asynchronously from multiple nodes
- No coordination needed during commit write
- Attestations from different nodes can arrive at different times
Consensus Policy
Consensus policy will define how many signatures are required for a commit to be accepted:
- None: No signatures required (default)
- Single signer: One designated writer must sign
- Threshold (K-of-N): At least K signatures from an allowlist of N signers
- Quorum set: At least one signature from each required group
Policy validation runs after commit hash integrity check, before accepting the commit.
Related Documentation
- Signed Transactions — User-facing transaction signing (JWS/VC)
- Verifiable Data — Cryptographic verification concepts
- Storage Encryption — Data-at-rest encryption
- Commit Receipts — Commit metadata and content hashes
Policy Model and Inputs
This is the reference for Fluree’s access-control policy model. For a conceptual introduction, see Policy enforcement. For worked examples, see the policy cookbook. For Rust-side wiring (building a PolicyContext, wrap_identity_policy_view, transaction helpers), see Programmatic policy API.
Policy node shape
Every policy is a JSON-LD node. Required @type: f:AccessPolicy (the IRI is https://ns.flur.ee/db#AccessPolicy). A second class IRI (e.g. ex:CorpPolicy) is conventional and allows the policy to be loaded by policy-class.
{
"@id": "ex:somePolicy",
"@type": ["f:AccessPolicy", "ex:CorpPolicy"],
"f:required": true,
"f:onProperty": [{"@id": "ex:salary"}],
"f:onClass": [{"@id": "ex:Employee"}],
"f:onSubject": [{"@id": "ex:alice"}],
"f:action": [{"@id": "f:view"}, {"@id": "f:modify"}],
"f:query": "<JSON-encoded WHERE>",
"f:allow": true,
"f:exMessage": "Reason returned to caller on denial"
}
Predicate reference
| Predicate | Type | Required? | Description |
|---|---|---|---|
| f:action | array of IRIs (or single IRI string) | yes | Which operations the policy governs. Values: f:view (queries), f:modify (transactions). |
| f:allow | boolean | one of f:allow / f:query | Static decision. true permits, false denies. Takes precedence over f:query if both are present. |
| f:query | string (JSON-encoded JSON-LD WHERE) | one of f:allow / f:query | Dynamic decision. The targeted flake is permitted when the query returns at least one row. ?$this and ?$identity are pre-bound. |
| f:onProperty | array of @id references | no | Restrict the policy to flakes whose predicate is one of these IRIs. |
| f:onClass | array of @id references | no | Restrict the policy to flakes whose subject has one of these rdf:types. |
| f:onSubject | array of @id references | no | Restrict the policy to flakes whose subject IRI is one of these. |
| f:required | boolean | no, defaults to false | When true, the policy MUST allow for access to its targets to be granted, regardless of default-allow. |
| f:exMessage | string | no | User-facing error message returned when this policy denies a transaction. |
If neither f:allow nor f:query is present, the policy is deny by default.
If multiple targeting predicates are present, they intersect: the policy applies only to flakes that match the property AND the class AND the subject sets.
If all targeting predicates are omitted, the policy is a default policy that applies to every flake of its f:actions.
Action values
f:action carries IRIs in the f: namespace:
"f:view"(or{"@id": "f:view"}) — queries."f:modify"(or{"@id": "f:modify"}) — transactions.- Both:
[{"@id": "f:view"}, {"@id": "f:modify"}].
A policy with no f:action defaults to applying to both view and modify.
f:query syntax
f:query is a string containing a JSON-encoded JSON-LD query. The engine parses the string and runs the query as a subquery for each candidate flake, with two pre-bound variables:
| Variable | Binding |
|---|---|
| ?$this | The IRI of the subject being read or written. |
| ?$identity | The IRI of the requesting identity (resolved from opts.identity, policy_values["?$identity"], or the verified bearer-token subject). |
Anything else binds via the embedded WHERE just like a normal Fluree query.
Because RDF can’t carry structured JSON values natively, stored policies must JSON-encode the query (serde_json::to_string). For inline policies passed via opts.policy, you can also use the JSON-LD typed-literal form {"@type": "@json", "@value": {...}} to avoid manually escaping.
Example (string form, suitable for storing in a transaction):
"f:query": "{\"where\": {\"@id\": \"?$identity\", \"http://example.org/role\": \"hr\"}}"
Example (typed-literal form, suitable for inline policies):
"f:query": {
"@type": "@json",
"@value": {
"where": {"@id": "?$identity", "http://example.org/role": "hr"}
}
}
Inline policies must use full IRIs. Compact IRIs (schema:ssn) inside an inline policy passed through opts.policy are not expanded against the request @context. Use full IRIs (http://schema.org/ssn).
Combining algorithm
When more than one policy targets the same flake, the engine combines them as follows:
- If any required policy (f:required: true) targets the flake and does not allow it (either f:allow: false, missing f:allow, or f:query returning no rows), access is denied for that flake. Required policies are gates: they cannot be overridden by other allows or by default-allow.
- If at least one targeted (but not required) policy allows the flake, access is granted. Non-required allows combine with allow-overrides semantics.
- If a targeted policy’s f:query returns false (no rows), that policy applied but did not permit — the flake is denied even if default-allow is true. Default-allow only applies when no policy targets the flake.
- If no policies target the flake, default-allow decides. false denies; true permits.
f:allow always takes precedence over f:query: if both are set on the same policy, f:allow wins.
For a deeper treatment, including the three-state identity resolution semantics (FoundWithPolicies / FoundNoPolicies / NotFound), see the Policy combining algorithm section in the programmatic policy API reference.
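A compact sketch of those rules for a single flake, in Rust; TargetedPolicy is a hypothetical stand-in for a policy that targets the flake, with allowed reflecting its evaluated f:allow or f:query outcome:

```rust
/// Illustrative only: how one flake's decision combines, per the rules above.
struct TargetedPolicy {
    required: bool,
    allowed: bool, // f:allow, or whether f:query returned any rows
}

fn decide_flake(targeting: &[TargetedPolicy], default_allow: bool) -> bool {
    // No policy targets this flake: default-allow decides.
    if targeting.is_empty() {
        return default_allow;
    }
    // A required policy that does not allow is a hard deny (a gate).
    if targeting.iter().any(|p| p.required && !p.allowed) {
        return false;
    }
    // Otherwise allow-overrides: at least one targeting policy must allow.
    targeting.iter().any(|p| p.allowed)
}
```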
Default-allow
default-allow is the fallback decision for flakes that no policy targets:
| Setting | Behavior |
|---|---|
| default-allow: false | Fail-closed. A flake with no targeting policies is denied. Recommended for production. |
| default-allow: true | Fail-open. A flake with no targeting policies is allowed. Useful in development or in deployments where an application layer handles authorization and Fluree is recording signed transactions for provenance. |
Important: default-allow: true does not override required policies that fail. It only governs the no-policy case.
Identity resolution
When opts.identity is set, Fluree resolves it to a ?$identity SID and applies the identity’s f:policyClass automatically — every stored policy of that class is loaded into the request’s policy set.
The resolution path:
opts.identity → policy_class → policy → policy_values["?$identity"]
(highest) (lowest)
If multiple are set, the higher-priority binding wins. policy_values["?$identity"] is a manual escape hatch — useful when you want to test a specific identity SID without going through the full resolution path.
A request with no identity supplied uses an “anonymous” context: only inline policies, no class-based discovery, no ?$identity binding.
Where policies come from
Two delivery paths, often combined:
Stored policies
Persist policies as data in the ledger. The policy node carries the class type alongside f:AccessPolicy:
{
"@id": "ex:salary-restriction",
"@type": ["f:AccessPolicy", "ex:CorpPolicy"],
...
}
Identities tag themselves with f:policyClass:
{
"@id": "ex:aliceIdentity",
"ex:user": {"@id": "ex:alice"},
"f:policyClass": [{"@id": "ex:CorpPolicy"}]
}
When opts.identity = "ex:aliceIdentity", every f:AccessPolicy whose @type includes ex:CorpPolicy is loaded for the request — no per-request policy listing needed. Stored policies are versioned, time-travelable, branchable, and consistent across all callers.
Inline policies
Pass policies in opts.policy (an array of policy nodes) for ad-hoc requests:
{
"from": "mydb:main",
"select": "?x",
"where": [...],
"opts": {
"policy": [
{"@id": "ex:adhoc", "@type": "f:AccessPolicy", "f:action": "f:view", "f:allow": true}
],
"default-allow": false
}
}
Useful for tests, admin scripts, and migration tooling. Inline policies and stored policies can coexist in a single request.
Request-time options
Each request can supply these opts fields (JSON-LD form). Over SPARQL, the equivalent fluree-* HTTP headers carry the same values.
| opts field | HTTP header | Description |
|---|---|---|
| identity | fluree-identity | IRI of an identity entity. Drives f:policyClass discovery and binds ?$identity. |
| policy-class | fluree-policy-class | Class IRI(s) to load stored policies by. Repeated header or comma-separated. |
| policy-values | fluree-policy-values | JSON object of additional ?$var bindings injected into every policy’s f:query. |
| policy | fluree-policy | Inline policy array (full JSON-LD). |
| default-allow | fluree-default-allow | true / false. Fallback decision for flakes that no policy targets. |
When the server is configured with data_auth_default_policy_class, a verified bearer token’s identity claim is auto-applied to policy-values and the configured class to policy-class — no client-side opts needed. See Configuration and Authentication for the bearer-token flow.
Read enforcement vs write enforcement
The same model governs both, distinguished by f:action:
- f:view — applied during query execution. Flakes that fail the policy are filtered before the query plan emits results. The query never sees them.
- f:modify — applied during transaction staging. The transaction is rejected — with f:exMessage if provided — when a write would touch flakes the identity isn’t allowed to modify.
A single policy can govern both. See Policy in queries and Policy in transactions for path-specific details.
Performance notes
Two phases:
- Load. The relevant policies for a request are gathered once (from policy-class lookups + inline policy). Cost is small and proportional to the size of the policy set.
- Apply. During plan execution, each candidate flake is checked against the matching subset of the policy set. Cost is proportional to the number of touched flakes × the average per-flake check cost.
Two practical implications:
- Target every policy you can. A policy with f:onProperty or f:onClass only runs on flakes whose predicate or rdf:type matches. Default policies (no targeting) run on every flake.
- Keep f:query cheap. It runs once per targeted flake. Lean on identity-side properties already loaded (@type, f:policyClass, role flags) rather than deep traversals.
Policies are queryable data
Because each policy is just a JSON-LD node, you can query the policies themselves:
PREFIX f: <https://ns.flur.ee/db#>
PREFIX ex: <http://example.org/>
SELECT ?policy ?action ?onProperty
WHERE {
?policy a f:AccessPolicy ;
a ex:CorpPolicy ;
f:action ?action ;
f:onProperty ?onProperty .
}
History queries against the same shape produce a complete audit trail of policy changes over time. See Time travel for query-at-t syntax.
Related documentation
- Policy enforcement (concepts) — model and architecture
- Cookbook: Access control policies — worked examples and patterns
- Policy in queries — read-time enforcement details
- Policy in transactions — write-time enforcement details
- Programmatic policy API (Rust) —
PolicyContext, builder helpers, combining algorithm - Authentication — identities, JWTs, bearer-token verification
- Configuration — server-side policy defaults (
data_auth_default_policy_class, etc.) - Vocabulary reference — predicate IRIs
Policy in Queries
Query-time enforcement uses Fluree’s policy model to filter individual flakes during query execution. The query plan is the same regardless of policy — what changes is which flakes the engine returns. The application sees a query result; the policy filtering is invisible.
This page documents how query-time enforcement works, how patterns interact with the plan, and how to test policies from the CLI. For the policy node shape and combining algorithm, see the policy model reference. For the underlying concept, see Policy enforcement.
How query-time filtering works
When a query is executed against a PolicyContext:
- The engine resolves the request’s policy set: identity-driven f:policyClass lookups + any inline opts.policy array.
- The plan executes normally — same join order, same indices.
- Each flake the plan would emit is checked against the policies whose target matches it (f:onProperty, f:onClass, f:onSubject, or default for untargeted policies).
- A flake survives only if the combining algorithm approves it.
- Surviving flakes flow through the rest of the plan (joins, filters, aggregates) as normal.
Filtering is at the flake level — a single subject can appear in the result with some properties visible and others elided.
Worked example
Two users in a mydb:main ledger:
fluree insert '{
"@context": {"schema": "http://schema.org/", "ex": "http://example.org/"},
"@graph": [
{"@id": "ex:alice", "schema:name": "Alice", "ex:role": "engineer", "ex:salary": 130000},
{"@id": "ex:bob", "schema:name": "Bob", "ex:role": "manager", "ex:salary": 155000}
]
}'
A required policy that hides ex:salary unless the requester is a manager:
fluree insert '{
"@context": {"f": "https://ns.flur.ee/db#", "ex": "http://example.org/"},
"@graph": [
{
"@id": "ex:salary-restriction",
"@type": ["f:AccessPolicy", "ex:CorpPolicy"],
"f:required": true,
"f:onProperty": [{"@id": "ex:salary"}],
"f:action": [{"@id": "f:view"}],
"f:query": "{\"where\": {\"@id\": \"?$identity\", \"http://example.org/role\": \"manager\"}}"
},
{
"@id": "ex:default-view",
"@type": ["f:AccessPolicy", "ex:CorpPolicy"],
"f:action": [{"@id": "f:view"}],
"f:allow": true
},
{"@id": "ex:aliceIdentity", "f:policyClass": [{"@id": "ex:CorpPolicy"}], "ex:role": "engineer"},
{"@id": "ex:bobIdentity", "f:policyClass": [{"@id": "ex:CorpPolicy"}], "ex:role": "manager"}
]
}'
The same query, executed as different identities:
# As Bob (manager) — sees salaries
fluree query --as ex:bobIdentity --policy-class ex:CorpPolicy \
'SELECT ?name ?salary WHERE { ?p <http://schema.org/name> ?name ; <http://example.org/salary> ?salary }'
# → Alice 130000, Bob 155000
# As Alice (engineer) — salary flakes filtered out
fluree query --as ex:aliceIdentity --policy-class ex:CorpPolicy \
'SELECT ?name ?salary WHERE { ?p <http://schema.org/name> ?name ; <http://example.org/salary> ?salary }'
# → no results: the join requires ?salary which is filtered for Alice
To get Alice’s name back without the salary join, use OPTIONAL:
SELECT ?name ?salary WHERE {
?p <http://schema.org/name> ?name .
OPTIONAL { ?p <http://example.org/salary> ?salary }
}
Now Alice sees both names, with ?salary unbound — exactly the behavior an application expects when a property is suppressed by policy.
Targeting patterns
Property-level (f:onProperty)
Restricts a flake whose predicate matches:
{
"@id": "ex:hide-ssn",
"@type": ["f:AccessPolicy", "ex:CorpPolicy"],
"f:required": true,
"f:onProperty": [{"@id": "http://schema.org/ssn"}],
"f:action": [{"@id": "f:view"}],
"f:query": "{\"where\": {\"@id\": \"?$identity\", \"http://example.org/role\": \"hr\"}}"
}
Flakes whose predicate is not schema:ssn are unaffected by this policy.
Class-level (f:onClass)
Restricts flakes whose subject has one of the listed rdf:types:
{
"@id": "ex:employee-data-only",
"@type": ["f:AccessPolicy", "ex:CorpPolicy"],
"f:required": true,
"f:onClass": [{"@id": "http://example.org/Employee"}],
"f:action": [{"@id": "f:view"}],
"f:query": "{\"where\": {\"@id\": \"?$identity\", \"@type\": \"http://example.org/Employee\"}}"
}
Flakes about non-Employee subjects fall through to other policies.
Subject-level (f:onSubject)
Restricts flakes about specific subjects:
{
"@id": "ex:hide-internal-doc",
"@type": ["f:AccessPolicy", "ex:CorpPolicy"],
"f:required": true,
"f:onSubject": [{"@id": "http://example.org/secret-doc"}],
"f:action": [{"@id": "f:view"}],
"f:allow": false
}
Default (no targeting)
A policy with no f:onProperty / f:onClass / f:onSubject applies to every flake. Use sparingly — default policies are evaluated against every emitted flake, which is more expensive than targeted policies.
SPARQL queries
SPARQL queries have no opts block, so policy is delivered via headers:
curl -X POST 'http://localhost:8090/v1/fluree/query?ledger=mydb:main' \
-H 'Content-Type: application/sparql-query' \
-H "Authorization: Bearer $JWT" \
-H 'fluree-identity: ex:aliceIdentity' \
-H 'fluree-policy-class: ex:CorpPolicy' \
-H 'fluree-default-allow: false' \
-d 'SELECT ?name WHERE { ?p <http://schema.org/name> ?name }'
The full header set is documented in the policy model.
JSON-LD queries
JSON-LD queries put policy in opts:
{
"from": "mydb:main",
"select": ["?name", "?salary"],
"where": [
{"@id": "?p", "schema:name": "?name"},
["optional", {"@id": "?p", "ex:salary": "?salary"}]
],
"opts": {
"identity": "ex:aliceIdentity",
"policy-class": ["ex:CorpPolicy"],
"default-allow": false
}
}
Inline policies, additional policy-values, and multiple policy-class entries all live under opts. The full vocabulary is in the policy model reference.
Multi-graph queries
Policies apply per-flake, regardless of which named graph the flake came from. A query that pulls from multiple from-named graphs sees a uniformly filtered result — there’s no per-graph policy override.
If different graphs need different policy regimes, use targeted policies (f:onClass for type-scoped restrictions, f:onSubject for explicit subject lists). For wholly separate access regimes, use separate ledgers.
Time-travel queries
Policy evaluation honors the query’s t. When you query --at a past t:
- The policy set itself is resolved at that t (so retired policies still apply when you time-travel back to when they were live).
- Identity attributes used in f:query are evaluated at that t.
This makes audit-style queries — “What could Alice see on 2024-06-15?” — directly expressible:
fluree query --as ex:aliceIdentity --policy-class ex:CorpPolicy --at 2024-06-15T00:00:00Z \
'SELECT ?p ?o WHERE { <http://example.org/financial-report> ?p ?o }'
Performance considerations
Two phases: load the policy set once per request; apply it to each touched flake.
- Target policies whenever possible. A policy with f:onProperty only runs against flakes whose predicate matches. Default policies (no targeting) run against every flake.
- Keep f:query cheap. It runs once per flake-target. Lean on identity-side properties already loaded (@type, f:policyClass, role flags) rather than deep traversals.
- Avoid deep recursion in f:query. Each level of indirection multiplies the per-flake cost.
- Required policies short-circuit. If a required policy denies, no further required policies are checked for that flake.
For complex deployments, the explain plan shows whether a query is dominated by policy filtering and which policies contribute.
Testing policies from the CLI
The fluree CLI supports policy-enforced queries so you can verify that the policies you’ve configured filter results as expected — without writing any client code.
Flags
Available on fluree query (and on fluree insert, upsert, update for write-time enforcement):
| Flag | Purpose |
|---|---|
| --as <IRI> | Execute as this identity. Resolves f:policyClass on the identity subject to collect applicable policies, and binds ?$identity. |
| --policy-class <IRI> | Apply stored policies of the given class IRI. Repeatable. Narrows to the intersection with the identity’s policies, or applies directly without --as. |
| --default-allow | Allow when no matching policy exists for the operation. Defaults to false (deny-by-default). |
Workflow
- Transact your policy rules (and the identities with their f:policyClass assignments) into the ledger, using any of the normal insert / upsert / update commands.
- Re-run the same query as different identities to confirm results differ as the policies prescribe:
# Full result set (no policy enforcement)
fluree query 'SELECT ?name ?salary WHERE { ?p <http://schema.org/name> ?name ; <http://example.org/salary> ?salary }'
# As an HR user — should see all salaries
fluree query --as ex:hrIdentity --policy-class ex:CorpPolicy \
'SELECT ?name ?salary WHERE { ?p <http://schema.org/name> ?name ; <http://example.org/salary> ?salary }'
# As a regular employee — policies should hide salary field
fluree query --as ex:engineerIdentity --policy-class ex:CorpPolicy \
'SELECT ?name ?salary WHERE { ?p <http://schema.org/name> ?name ; <http://example.org/salary> ?salary }'
Local vs remote
The flags work in both modes:
- Local (default, or with --direct): the CLI loads the ledger directly and applies policy via the in-process query engine.
- Remote (with --remote <name>, or auto-routed through a running local server): the CLI sends the flags to the server as HTTP headers (fluree-identity, fluree-policy-class, fluree-default-allow) and, for JSON-LD bodies, also injects them into opts. Multi-value --policy-class rides through the body opts only; SPARQL transport is single-valued via the header.
Remote impersonation: how it’s authorized
When you run against a remote server with --as <iri>, the server treats the request as impersonation and gates it as follows:
- Your bearer token’s identity is resolved on the target ledger.
- If that identity has no f:policyClass assignments (the FoundNoPolicies outcome — your service account is unrestricted on this ledger), the server honors --as and runs the query as the target identity.
- If your bearer identity is itself policy-constrained (FoundWithPolicies) or unknown to this ledger (NotFound), the server force-overrides --as with your bearer identity. You see your own filtered view, not the target’s.
Each successful impersonation is logged at info level on the server:
policy impersonation: bearer=<svc-id> target=<as-iri> ledger=<name>
This is the standard service-account pattern: register your CLI/app-server identity in the ledger with no f:policyClass, and it gains the right to delegate to any end-user identity for testing or per-request enforcement. Assigning a policy class to that identity revokes the delegation right with no config change.
Limitations
- Inline policy rules (opts.policy) and policy variable bindings (opts.policy-values) are not yet exposed as CLI flags — use a JSON-LD query body with an "opts" block when you need those.
- For SPARQL queries against a remote, only --as, single-value --policy-class, and --default-allow are wired (via headers). Multi-value --policy-class works on JSON-LD only.
Related documentation
- Policy model and inputs — node shape, combining algorithm, request-time options
- Policy enforcement (concepts) — model overview
- Policy in transactions — write-time enforcement
- Cookbook: Access control policies — worked patterns
- Programmatic policy API (Rust) — building `PolicyContext` in code
- Query reference — SPARQL and JSON-LD syntax
- Explain plans — diagnosing policy filter overhead
Policy in Transactions
Transaction-time enforcement uses the same policy model as queries, switched on by f:action: f:modify. Where query-time enforcement filters flakes from results, transaction-time enforcement rejects the transaction when a write would touch flakes the identity isn’t allowed to modify.
This page documents how write-time enforcement integrates with the transaction lifecycle, the failure shape, and the patterns that come up most often. For the policy node shape and combining algorithm, see the policy model reference. For the conceptual frame, see Policy enforcement.
How transaction-time enforcement works
When a transaction is staged against a PolicyContext:
- The engine resolves the request’s policy set: identity-driven `f:policyClass` lookups + any inline `opts.policy` array, restricted to policies whose `f:action` includes `f:modify`.
- The transaction is staged into novelty (assertions and retractions are computed from `insert`/`delete`/`where` clauses).
- Each staged flake is checked against the matching policies.
- If any required policy denies a flake (or any non-required allow is missing where one would be needed), the entire transaction is rejected. Transactions are atomic — a partial write is never persisted.
- On rejection, the response carries the policy’s `f:exMessage` (when supplied), the offending flake, and the policy’s `@id`.
The result: the requester gets a clear authorization failure rather than a silently incomplete write.
Worked example
fluree insert '{
"@context": {"f": "https://ns.flur.ee/db#", "ex": "http://example.org/"},
"@graph": [
{
"@id": "ex:email-restriction",
"@type": ["f:AccessPolicy", "ex:CorpPolicy"],
"f:required": true,
"f:onProperty": [{"@id": "http://schema.org/email"}],
"f:action": [{"@id": "f:modify"}],
"f:exMessage": "Users can only update their own email.",
"f:query": "{\"where\": {\"@id\": \"?$identity\", \"http://example.org/user\": {\"@id\": \"?$this\"}}}"
},
{
"@id": "ex:default-rw",
"@type": ["f:AccessPolicy", "ex:CorpPolicy"],
"f:action": [{"@id": "f:view"}, {"@id": "f:modify"}],
"f:allow": true
},
{"@id": "ex:johnIdentity", "ex:user": {"@id": "ex:john"}, "f:policyClass": [{"@id": "ex:CorpPolicy"}]},
{"@id": "ex:janeIdentity", "ex:user": {"@id": "ex:jane"}, "f:policyClass": [{"@id": "ex:CorpPolicy"}]}
]
}'
Now John attempts to update his own email — succeeds:
fluree update --as ex:johnIdentity --policy-class ex:CorpPolicy '
PREFIX ex: <http://example.org/>
PREFIX schema: <http://schema.org/>
WHERE { ex:john schema:email ?email }
DELETE { ex:john schema:email ?email }
INSERT { ex:john schema:email "new-john@flur.ee" }
'
John attempts to update Jane’s email — rejected:
fluree update --as ex:johnIdentity --policy-class ex:CorpPolicy '
PREFIX ex: <http://example.org/>
PREFIX schema: <http://schema.org/>
WHERE { ex:jane schema:email ?email }
DELETE { ex:jane schema:email ?email }
INSERT { ex:jane schema:email "hacked@flur.ee" }
'
# Error: policy denied: Users can only update their own email. (ex:email-restriction)
What gets enforced
Every modification path runs the same f:modify policy check on its staged flakes:
| Operation | Flakes checked |
|---|---|
| Insert | All asserted flakes. |
| Upsert | Asserted flakes + retractions for any pre-existing values being replaced. |
| Update (WHERE/DELETE/INSERT) | Both retracted flakes (DELETE) and asserted flakes (INSERT). |
| Retraction (`@type: f:Retraction`) | Retracted flakes. |
Crucially, the policy is checked against the flakes, not the operation type. A transaction that retracts a flake the identity can’t modify is rejected just like an insert that asserts one.
Targeting patterns
Whitelist a property to a role
{
"@id": "ex:salary-write",
"@type": ["f:AccessPolicy", "ex:CorpPolicy"],
"f:required": true,
"f:onProperty": [{"@id": "http://example.org/salary"}],
"f:action": [{"@id": "f:modify"}],
"f:exMessage": "Only HR may write salary.",
"f:query": "{\"where\": {\"@id\": \"?$identity\", \"http://example.org/role\": \"hr\"}}"
}
Combined with default-allow: true (or a permissive default f:modify policy), every other property remains writable.
Owner-only edits
{
"@id": "ex:owner-edit",
"@type": ["f:AccessPolicy", "ex:CorpPolicy"],
"f:required": true,
"f:action": [{"@id": "f:modify"}],
"f:query": "{\"where\": {\"@id\": \"?$identity\", \"http://example.org/user\": {\"@id\": \"?$user\"}}, \"$where\": {\"@id\": \"?$this\", \"http://example.org/owner\": {\"@id\": \"?$user\"}}}"
}
The f:query resolves the identity’s user and verifies that ?$this (the entity being modified) has that user as its owner.
Status-based gates
Prevent edits to records past a workflow gate:
{
"@id": "ex:no-edit-after-approval",
"@type": ["f:AccessPolicy", "ex:CorpPolicy"],
"f:required": true,
"f:onClass": [{"@id": "http://example.org/Order"}],
"f:action": [{"@id": "f:modify"}],
"f:exMessage": "Approved orders cannot be modified.",
"f:query": "{\"where\": [{\"@id\": \"?$this\", \"http://example.org/status\": \"?status\"}, [\"filter\", \"(!= ?status \\\"approved\\\")\"]]}"
}
Approved orders fail the gate — their flakes can’t be retracted or modified.
Workflow service exception
Combine targeting + identity-typed checks to limit a write to a single service:
{
"@id": "ex:approved-by-workflow-only",
"@type": ["f:AccessPolicy", "ex:CorpPolicy"],
"f:required": true,
"f:onProperty": [{"@id": "http://example.org/approved"}],
"f:action": [{"@id": "f:modify"}],
"f:exMessage": "ex:approved is set by the workflow service only.",
"f:query": "{\"where\": {\"@id\": \"?$identity\", \"@type\": \"http://example.org/WorkflowService\"}}"
}
End-user identities can read ex:approved, but only the workflow service can write it.
Immutable records
{
"@id": "ex:audit-log-immutable",
"@type": ["f:AccessPolicy", "ex:CorpPolicy"],
"f:required": true,
"f:onClass": [{"@id": "http://example.org/AuditEvent"}],
"f:action": [{"@id": "f:modify"}],
"f:exMessage": "Audit events are immutable.",
"f:allow": false
}
Notice the absence of f:query — f:allow: false is a flat deny, applied to every modification of ex:AuditEvent instances. New events can still be inserted because the policy targets only existing-instance flakes; a fresh @type: ex:AuditEvent insertion creates a new subject and a new rdf:type flake, neither of which the targeting matches.
(For a hard “append-only” guarantee that forbids anything but new insertions, model the constraint with a SHACL shape that requires the property to be unset on prior commits — SHACL is a better fit for that pattern than policy.)
Failure shape
When a transaction is rejected, the API returns:
{
"error": "policy_denied",
"message": "Users can only update their own email.",
"policy": "http://example.org/email-restriction",
"subject": "http://example.org/jane",
"property": "http://schema.org/email"
}
f:exMessage is the user-visible string. The policy @id, the offending subject, and the property are reported for diagnostics.
When no f:exMessage is set, a generic message is returned ("policy denied"); the structured fields are still present so a client can surface the right error to a user.
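Client code can treat this as a stable shape to surface to users. A minimal client-side sketch; the struct and its name are ours for illustration, not part of the Fluree API:

```rust
use serde::Deserialize;

/// Hypothetical client-side mirror of the policy-denial payload shown above.
#[derive(Debug, Deserialize)]
struct PolicyDenied {
    error: String,    // "policy_denied"
    message: String,  // f:exMessage, or the generic "policy denied"
    policy: String,   // @id of the denying policy
    subject: String,  // offending subject IRI
    property: String, // offending property IRI
}

fn main() -> Result<(), serde_json::Error> {
    let body = r#"{
        "error": "policy_denied",
        "message": "Users can only update their own email.",
        "policy": "http://example.org/email-restriction",
        "subject": "http://example.org/jane",
        "property": "http://schema.org/email"
    }"#;
    let denied: PolicyDenied = serde_json::from_str(body)?;
    // Show the human-readable message; keep policy/subject/property for diagnostics.
    eprintln!("{} (policy: {})", denied.message, denied.policy);
    Ok(())
}
```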
WHERE/DELETE/INSERT semantics with policy
A WHERE/DELETE/INSERT transaction proceeds in three phases — match → retract → assert. Policy enforcement is on the staged flakes from phases 2 and 3:
PREFIX ex: <http://example.org/>
PREFIX schema: <http://schema.org/>
WHERE { ?u schema:email ?old . FILTER(?u = ex:jane) }
DELETE { ?u schema:email ?old }
INSERT { ?u schema:email "new@flur.ee" }
When run by an identity that lacks modify rights on ?u’s email:
- The WHERE pattern still binds normally — policy doesn’t filter the match phase.
- The DELETE retraction stages a flake the identity can’t modify — rejected.
To prevent rejections where the WHERE matches but the DELETE/INSERT can’t proceed, pair transaction-time f:modify policies with f:view policies of the same shape, so the WHERE itself sees a filtered view.
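One way to do that pairing, mirroring the `ex:email-restriction` policy from the worked example above (a sketch; adjust the targeting to your own data):

```json
{
  "@id": "ex:email-view-restriction",
  "@type": ["f:AccessPolicy", "ex:CorpPolicy"],
  "f:required": true,
  "f:onProperty": [{"@id": "http://schema.org/email"}],
  "f:action": [{"@id": "f:view"}],
  "f:exMessage": "Users can only see their own email.",
  "f:query": "{\"where\": {\"@id\": \"?$identity\", \"http://example.org/user\": {\"@id\": \"?$this\"}}}"
}
```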
Signed transactions and impersonation
When a transaction is signed (JWS or VC-wrapped), the signing key’s identity replaces the bearer identity for policy purposes. The signed credential becomes the source of truth: the server verifies the signature, resolves the signer’s identity entity, and applies that identity’s f:policyClass policies.
For the impersonation rules — when --as <iri> is honored vs force-overridden — see Policy in queries → Remote impersonation. The same gate applies to transactions.
See Signed / credentialed transactions for the wire format.
Provenance
Every committed transaction carries the asserting identity in its commit metadata. Combined with policy enforcement, this gives a clean audit trail:
- The identity is recorded on the commit.
- The policies in effect at commit time are themselves time-travelable.
- Replay-from-commit produces the same policy decisions.
Performance considerations
- Stage cost dominates. Most of the work is staging the transaction (computing assertions/retractions, building the novelty layer). Policy checks add a small per-flake cost on top.
- Required policies short-circuit. A failure rejects the transaction immediately without checking remaining flakes.
- Batch transactions amortize loading. Loading the policy set is per-transaction, not per-flake — large batched transactions pay the load cost once.
- Cache identity properties. The identity’s `@type`, `f:policyClass`, and any role tags used in `f:query` are loaded once per transaction.
Testing policies from the CLI
The same --as, --policy-class, and --default-allow flags used on fluree query are available on fluree insert, fluree upsert, and fluree update so you can verify write-time enforcement without any client code:
# Attempt a write as an identity that lacks the f:modify policy — expect failure
fluree insert --as ex:readOnlyIdentity --policy-class ex:CorpPolicy -f new-data.ttl
# Same write as an authorized identity — expect success
fluree insert --as ex:writerIdentity --policy-class ex:CorpPolicy -f new-data.ttl
The flags work locally and against remote servers. On remote, the CLI sends the policy options as HTTP headers (fluree-identity, fluree-policy-class, fluree-default-allow) and, for JSON-LD bodies, also injects them into opts. The server applies the root-impersonation gate: your bearer identity may delegate to --as <iri> only when the bearer identity itself has no f:policyClass on the target ledger. Restricted bearers have --as force-overridden back to their own identity, and can write only what their own policies permit.
This is the standard service-account pattern — see Policy in queries → Remote impersonation for the full authorization rules and audit-log format.
Transaction enforcement is end-to-end
Unsigned bearer-authenticated transactions build a PolicyContext from the (post-header-merge) opts and route through the policy-enforcing transact_tracked_with_policy path. A non-root bearer’s f:modify constraints apply to their writes, matching the long-standing query-side behavior. SPARQL UPDATE inherits the same enforcement, with identity sourced from either the bearer or the fluree-identity header (impersonation-gated).
Related documentation
- Policy model and inputs — node shape, combining algorithm, request-time options
- Policy enforcement (concepts) — model overview
- Policy in queries — read-time enforcement
- Cookbook: Access control policies — worked patterns
- Programmatic policy API (Rust) — building `PolicyContext` and using `transact_tracked_with_policy`
- Signed / credentialed transactions — JWS / VC transaction wrapping
- Transaction overview — transaction lifecycle
Programmatic Policy API (Rust)
This guide covers how to use Fluree’s policy system programmatically in Rust applications.
Overview
There are two main approaches to applying policies programmatically:
- Identity-based policies (`wrap_identity_policy_view`): Policies stored in the database and loaded via `f:policyClass` on an identity subject
- Inline policies (`wrap_policy_view` with `opts.policy`): Policies provided directly in the query/transaction options
Identity-Based Policy Lookup
The recommended approach for production systems. Policies are stored in the ledger and loaded dynamically based on the identity’s f:policyClass property.
Storing Policies in the Database
First, insert policies with types that will be referenced by identities:
#![allow(unused)]
fn main() {
let policies = json!({
"@context": {
"f": "https://ns.flur.ee/db#",
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"@graph": [
// Identity with policy class assignment
{
"@id": "http://example.org/identity/alice",
"f:policyClass": [{"@id": "ex:EmployeePolicy"}],
"ex:user": {"@id": "ex:alice"}
},
// SSN restriction policy - only see your own SSN
{
"@id": "ex:ssnRestriction",
"@type": ["f:AccessPolicy", "ex:EmployeePolicy"],
"f:required": true,
"f:onProperty": [{"@id": "schema:ssn"}],
"f:action": {"@id": "f:view"},
"f:query": serde_json::to_string(&json!({
"where": {
"@id": "?$identity",
"http://example.org/ns/user": {"@id": "?$this"}
}
})).unwrap()
},
// Default allow policy for other properties
{
"@id": "ex:defaultAllowView",
"@type": ["f:AccessPolicy", "ex:EmployeePolicy"],
"f:action": {"@id": "f:view"},
"f:allow": true
}
]
});
// Prefer the lazy Graph API for transactions
fluree.graph("mydb:main")
.transact()
.insert(&policies)
.commit()
.await?;
}
Using wrap_identity_policy_view
Create a policy-wrapped view using an identity IRI:
#![allow(unused)]
fn main() {
use fluree_db_api::{wrap_identity_policy_view, FlureeBuilder, GraphDb};
let fluree = FlureeBuilder::memory().build_memory();
let ledger = fluree.ledger("mydb:main").await?;
// Wrap the ledger with identity-based policy
let wrapped = wrap_identity_policy_view(
&ledger,
"http://example.org/identity/alice", // identity IRI
true // default_allow: allow access when no policy matches
).await?;
// Check policy properties
assert!(!wrapped.is_root(), "Should not be root/unrestricted");
// Create a view with the policy applied, then query using the builder
let view = GraphDb::from_ledger_state(&ledger)
.with_policy(std::sync::Arc::new(wrapped.policy().clone()));
let query = json!({
"select": ["?s", "?ssn"],
"where": {
"@id": "?s",
"@type": "ex:User",
"schema:ssn": "?ssn"
}
});
let result = view.query(&fluree)
.jsonld(&query)
.execute()
.await?;
}
How Identity Lookup Works
When you call wrap_identity_policy_view:
1. Fluree queries for policies via the identity’s `f:policyClass`: `SELECT ?policy WHERE { <identity-iri> f:policyClass ?class . ?policy a ?class . ?policy a f:AccessPolicy . }`
2. Each matching policy’s properties are loaded (`f:action`, `f:allow`, `f:query`, `f:onProperty`, etc.)
3. The `?$identity` variable is automatically bound to the identity IRI for use in `f:query` policies
Inline Policies with policy-values
For cases where policies should not be stored in the database, use inline policies with explicit ?$identity binding.
QueryConnectionOptions Pattern
#![allow(unused)]
fn main() {
use fluree_db_api::{QueryConnectionOptions, wrap_policy_view};
use std::collections::HashMap;
let policy = json!([{
"@id": "ex:inlineSsnPolicy",
"f:required": true,
"f:onProperty": [{"@id": "http://schema.org/ssn"}],
"f:action": "f:view",
"f:query": serde_json::to_string(&json!({
"where": {
"@id": "?$identity",
"http://example.org/ns/user": {"@id": "?$this"}
}
})).unwrap()
}]);
let opts = QueryConnectionOptions {
policy: Some(policy),
policy_values: Some(HashMap::from([(
"?$identity".to_string(),
json!({"@id": "http://example.org/identity/alice"}),
)])),
default_allow: true,
..Default::default()
};
let wrapped = wrap_policy_view(&ledger, &opts).await?;
}
Using query_from with Inline Policy
For FROM-driven queries where policy options are embedded in the query body, use query_from():
#![allow(unused)]
fn main() {
let query = json!({
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
},
"from": "mydb:main",
"opts": {
"default-allow": true,
"policy": [{
"@id": "inline-ssn-policy",
"f:required": true,
"f:onProperty": [{"@id": "http://schema.org/ssn"}],
"f:action": "f:view",
"f:query": serde_json::to_string(&json!({
"where": {
"@id": "?$identity",
"http://example.org/ns/user": {"@id": "?$this"}
}
})).unwrap()
}],
"policy-values": {
"?$identity": {"@id": "http://example.org/identity/alice"}
}
},
"select": ["?s", "?ssn"],
"where": {
"@id": "?s",
"@type": "ex:User",
"schema:ssn": "?ssn"
}
});
let result = fluree.query_from()
.jsonld(&query)
.execute()
.await?;
}
Policy Options Precedence
When multiple policy options are provided, they follow this precedence:
| Priority | Option | Behavior |
|---|---|---|
| 1 (highest) | opts.identity | Query f:policyClass policies, auto-bind ?$identity |
| 2 | opts.policy_class | Query policies of specified types |
| 3 (lowest) | opts.policy | Use inline policy JSON directly |
Important: If opts.identity is set, inline opts.policy is ignored.
Policy Structure Reference
f:allow (Static Allow/Deny)
{
"@id": "ex:allowAll",
"@type": ["f:AccessPolicy", "ex:MyPolicyClass"],
"f:action": {"@id": "f:view"},
"f:allow": true
}
f:query (Dynamic Evaluation)
{
"@id": "ex:ownerOnly",
"@type": ["f:AccessPolicy", "ex:MyPolicyClass"],
"f:action": {"@id": "f:view"},
"f:onProperty": [{"@id": "schema:ssn"}],
"f:required": true,
"f:query": "{\"where\": {\"@id\": \"?$identity\", \"ex:user\": {\"@id\": \"?$this\"}}}"
}
Policy Properties
| Property | Type | Description |
|---|---|---|
| `f:action` | `f:view` / `f:modify` | What action this policy applies to |
| `f:allow` | boolean | Static allow (true) or deny (false) |
| `f:query` | string (JSON) | Query that must return results for access to be granted |
| `f:onProperty` | IRI(s) | Restrict policy to specific properties |
| `f:onSubject` | IRI(s) | Restrict policy to specific subjects |
| `f:onClass` | IRI(s) | Restrict policy to instances of specific classes |
| `f:required` | boolean | If true, this policy MUST allow for access to be granted |
| `f:exMessage` | string | Custom error message when policy denies access |
Special Variables
| Variable | Binding |
|---|---|
| `?$identity` | The identity IRI (from `opts.identity` or `policy_values["?$identity"]`) |
| `?$this` | The subject being accessed (for property-level policies) |
Policy Combining Algorithm
When multiple policies match a flake, they are combined using Deny Overrides:
- If any matching policy explicitly denies (`f:allow: false`), access is denied
- If a targeted policy’s `f:query` returns false, access is denied (doesn’t fall through to Default policies)
- If any policy allows (`f:allow: true` or `f:query` returns true), access is granted
- If no policies match and `default_allow` is `true` → access is granted
- Otherwise, access is denied
Identity resolution is three-state:
`FoundWithPolicies` (restrictions apply) → `FoundNoPolicies` (subject exists, no restrictions) → `NotFound` (subject absent, no restrictions). The three-state split determines whether a concrete identity SID is available to bind `?$identity` in policy queries; it does not gate `default_allow`. An unknown identity with `default_allow: true` is granted access — this is the intended behavior for deployments where an application layer handles authorization and Fluree records signed transactions for provenance. Set `default_allow: false` for fail-closed behavior.
Important: Inline policies must use full IRIs (e.g., "http://schema.org/ssn"), not compact IRIs (e.g., "schema:ssn"). Compact IRIs in inline policies are not expanded.
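As a plain-Rust illustration of the deny-overrides combination described above, here is a simplified sketch over the decisions of already-matched policies for a single flake. The enum and function are ours for exposition (not the engine's internal types), and `f:required` handling is omitted:

```rust
/// Outcome of evaluating one matching policy against a flake (illustrative).
#[derive(Clone, Copy, PartialEq)]
enum Decision {
    Deny,  // f:allow: false, or a targeted f:query returned no rows
    Allow, // f:allow: true, or f:query returned rows
}

/// Deny-overrides combination over the matched policies for one flake.
fn combine(decisions: &[Decision], default_allow: bool) -> bool {
    if decisions.iter().any(|d| *d == Decision::Deny) {
        return false; // any explicit deny wins
    }
    if decisions.iter().any(|d| *d == Decision::Allow) {
        return true; // otherwise any allow grants access
    }
    default_allow // no matching policy: fall back to the default
}

fn main() {
    assert!(!combine(&[Decision::Allow, Decision::Deny], true)); // deny overrides allow
    assert!(combine(&[], true));   // unmatched flake, default-allow on
    assert!(!combine(&[], false)); // unmatched flake, deny-by-default
}
```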
Transactions with Policy
Policies can also be applied to transactions using the builder API:
#![allow(unused)]
fn main() {
use fluree_db_api::policy_builder;
let policy_ctx = policy_builder::build_policy_context_from_opts(
&ledger.snapshot,
ledger.novelty.as_ref(),
Some(ledger.novelty.as_ref()),
ledger.t(),
&qc_opts,
&[0], // default graph; use resolve_policy_source_g_ids() for config-driven graphs
).await?;
let txn = json!({
"@context": {"ex": "http://example.org/ns/"},
"insert": [
{"@id": "ex:alice", "ex:data": "secret"}
]
});
// Use the transaction builder with policy
let result = fluree.graph("mydb:main")
.transact()
.update(&txn)
.policy(policy_ctx)
.commit()
.await;
match result {
Ok(txn_result) => println!("Transaction succeeded at t={}", txn_result.ledger.t()),
Err(e) => println!("Policy denied: {}", e),
}
}
Historical Views with Policy
For time-travel queries with policy, load a historical graph and apply policy as a view overlay:
#![allow(unused)]
fn main() {
use fluree_db_api::{GraphDb, QueryConnectionOptions};
// Load a historical view
let graph = fluree.view_at_t("mydb:main", 100).await?;
// Apply policy to create a view
let policy_ctx = policy_builder::build_policy_context_from_opts(
&ledger.snapshot,
ledger.novelty.as_ref(),
Some(ledger.novelty.as_ref()),
ledger.t(),
&opts,
&[0],
).await?;
let view = graph.with_policy(std::sync::Arc::new(policy_ctx));
// Query the historical view with policy applied
let result = view.query(&fluree)
.jsonld(&query)
.execute()
.await?;
}
API Reference
wrap_identity_policy_view
#![allow(unused)]
fn main() {
pub async fn wrap_identity_policy_view<'a>(
ledger: &'a LedgerState,
identity_iri: &str,
default_allow: bool,
) -> Result<PolicyWrappedView<'a>>
}
Creates a policy-wrapped view using identity-based f:policyClass lookup.
Parameters:
- `ledger`: The ledger state to wrap
- `identity_iri`: IRI of the identity subject (will query `f:policyClass`)
- `default_allow`: Whether to allow access when no policies match. Ignored (forced `false`) if the identity IRI has no subject node in the ledger — see combining algorithm step 5
wrap_policy_view
#![allow(unused)]
fn main() {
pub async fn wrap_policy_view<'a>(
ledger: &'a LedgerState,
opts: &QueryConnectionOptions,
) -> Result<PolicyWrappedView<'a>>
}
Creates a policy-wrapped view from query connection options.
QueryConnectionOptions fields:
- `identity`: Identity IRI for `f:policyClass` lookup
- `policy`: Inline policy JSON
- `policy_class`: Policy class IRIs to query
- `policy_values`: Variable bindings for policy queries
- `default_allow`: Default access when no policies match
PolicyWrappedView
#![allow(unused)]
fn main() {
impl PolicyWrappedView {
/// Check if this is a root/unrestricted policy
pub fn is_root(&self) -> bool;
/// Get the underlying policy context
pub fn policy(&self) -> &PolicyContext;
/// Get the policy enforcer for query execution
pub fn enforcer(&self) -> &Arc<QueryPolicyEnforcer>;
}
}
Best Practices
1. Prefer Identity-Based Policies
Store policies in the database for:
- Version control with data
- Audit trail of policy changes
- Dynamic policy updates without code changes
- Time-travel to historical policy states
2. Use HTTP IRIs for Identities
HTTP IRIs are more portable than DIDs for identity subjects:
#![allow(unused)]
fn main() {
// Recommended
let identity = "http://example.org/identity/alice";
// Also works but may have encoding issues
let identity = "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK";
}
3. Always Set default_allow Explicitly
#![allow(unused)]
fn main() {
// Be explicit about default behavior
let wrapped = wrap_identity_policy_view(&ledger, identity, false).await?;
// ^^^^^ explicit deny
}
4. Handle Policy Errors
#![allow(unused)]
fn main() {
let graph = GraphDb::from_ledger_state(&ledger)
.with_policy(std::sync::Arc::new(policy_ctx));
match graph.query(&fluree).jsonld(&query).execute().await {
Ok(result) => process_results(result),
Err(ApiError::PolicyDenied { message, policy_id }) => {
log::warn!("Access denied by {}: {}", policy_id, message);
// Return empty or error to user
}
Err(e) => return Err(e),
}
}
Related Documentation
- Policy Model - Policy structure and evaluation
- Policy in Queries - Query-time enforcement
- Policy in Transactions - Transaction-time enforcement
- Rust API - General Rust API usage
Indexing and Search
Fluree provides powerful indexing and search capabilities beyond standard graph queries. This section covers background indexing, full-text search, and vector similarity search.
Index Types
Background Indexing
Core database indexing for query performance:
- SPOT, POST, OPST, PSOT indexes
- Automatic index maintenance
- Indexing configuration
- Performance tuning
- Monitoring and metrics
Reindex API
Manual index rebuilding for recovery and maintenance:
- Memory-bounded batched processing
- Checkpointing for resumable operations
- Progress monitoring with callbacks
- Resume after interruption
- Index configuration options
Inline Fulltext Search
Inline BM25-ranked text scoring. Two entry points, same query surface:
- `@fulltext` datatype — per-value annotation (analogous to `@vector`), always English, zero config
- `f:fullTextDefaults` config — declare properties + language once at the ledger level; supports 18 languages with Snowball stemming and per-graph overrides for multilingual setups
- `fulltext(?var, "query")` scoring function in `bind` expressions (same for both paths)
- Automatic per-(graph, property, language) fulltext arena construction during background indexing
- Unified scoring across indexed and novelty documents
- Works immediately (no-index fallback) with optimal performance after indexing
BM25 Full-Text Search
Dedicated full-text search indexes using BM25 ranking (for large-scale corpora):
- Creating BM25 indexes via Rust API
- Query-based field selection (indexing query defines what to index)
- BM25 scoring with configurable k1/b parameters
- Block-Max WAND for efficient top-k queries
- Incremental index updates via property-dependency tracking
Vector Search
Approximate nearest neighbor (ANN) search for embeddings:
- Vector index configuration
- Embedded HNSW indexes (in-process) or remote via dedicated search service
- Embedding storage with `@vector` datatype (resolves to `https://ns.flur.ee/db#embeddingVector`)
- Similarity queries via `f:*` syntax
- Deployment modes (embedded / remote)
- Use cases (semantic search, recommendations)
Geospatial
Geographic point data with native binary encoding:
- `geo:wktLiteral` datatype support (OGC GeoSPARQL)
- Automatic POINT geometry detection and optimization
- Packed 60-bit lat/lng encoding (~0.3mm precision)
- Foundation for proximity queries (latitude-band index scans)
Indexing Architecture
Fluree maintains multiple index types for different query patterns:
Core Indexes (automatic):
- SPOT: Subject-Predicate-Object-Time
- POST: Predicate-Object-Subject-Time
- OPST: Object-Predicate-Subject-Time
- PSOT: Predicate-Subject-Object-Time
Graph Source Indexes (explicit):
- BM25: Full-text search indexes
- Vector: Embedding similarity indexes
- R2RML: Relational database views
- Iceberg: Data lake integrations
Background Indexing
Core database indexing happens automatically:
Transaction → Commit → Background Indexer → Index Published
Process:
- Transaction committed (t assigned)
- Commit published to nameservice
- Background indexer detects new commit
- Indexes updated (SPOT, POST, OPST, PSOT)
- Index snapshot published
Novelty Layer:
- Gap between latest commit and latest index
- Queries combine indexed data + novelty
- Monitored via `commit_t - index_t`
See Background Indexing for details.
Inline Fulltext Search
For small-to-medium corpora (up to hundreds of thousands of documents per predicate), inline fulltext search provides BM25-ranked scoring with zero configuration:
Annotate data:
{
"@id": "ex:article-1",
"ex:content": {
"@value": "Rust is a systems programming language focused on safety",
"@type": "@fulltext"
}
}
Query with scoring:
{
"select": ["?title", "?score"],
"where": [
{ "@id": "?doc", "ex:content": "?content", "ex:title": "?title" },
["bind", "?score", "(fulltext ?content \"Rust programming\")"],
["filter", "(> ?score 0)"]
],
"orderBy": [["desc", "?score"]],
"limit": 10
}
See Inline Fulltext Search for details.
Full-Text Search (BM25 Graph Source)
For larger corpora (1M+ documents) with strict latency requirements, the BM25 graph source pipeline provides WAND-based top-k pruning, chunked posting lists, and incremental updates:
BM25 provides ranked full-text search:
Creating Index (Rust API):
#![allow(unused)]
fn main() {
use fluree_db_api::Bm25CreateConfig;
use serde_json::json;
let query = json!({
"@context": { "schema": "http://schema.org/" },
"where": [{ "@id": "?x", "@type": "schema:Product", "schema:name": "?name" }],
"select": { "?x": ["@id", "schema:name", "schema:description"] }
});
let config = Bm25CreateConfig::new("products-search", "mydb:main", query);
let result = fluree.create_full_text_index(config).await?;
}
There are no HTTP endpoints for index management yet — indexes are managed via the Rust API.
Searching:
{
"@context": {"f": "https://ns.flur.ee/db#"},
"from": "mydb:main",
"select": ["?product", "?score"],
"where": [
{
"f:graphSource": "products-search:main",
"f:searchText": "laptop computer",
"f:searchLimit": 20,
"f:searchResult": { "f:resultId": "?product", "f:resultScore": "?score" }
}
],
"orderBy": ["-?score"]
}
See BM25 for details.
Vector Search
Similarity search using vector embeddings via HNSW indexes (embedded or remote).
Important: Embeddings must be stored with the vector datatype (@type: "@vector", @type: "f:embeddingVector", or full IRI https://ns.flur.ee/db#embeddingVector) to preserve array structure.
Creating Index (Rust API):
#![allow(unused)]
fn main() {
let config = VectorCreateConfig::new(
"products-vector", "mydb:main", query, "ex:embedding", 384
);
fluree.create_vector_index(config).await?;
}
Searching:
{
"from": "mydb:main",
"select": ["?product", "?score"],
"where": [
{
"f:graphSource": "products-vector:main",
"f:queryVector": [0.1, 0.2, ..., 0.9],
"f:searchLimit": 10,
"f:searchResult": {
"f:resultId": "?product",
"f:resultScore": "?score"
}
}
]
}
See Vector Search for details.
Index as Graph Sources
Search indexes are exposed as graph sources:
Graph Source Names:
- `products-search:main` - BM25 index
- `products-vector:main` - Vector index
Query Like Regular Ledgers:
{
"@context": {"f": "https://ns.flur.ee/db#"},
"from": "mydb:main",
"select": ["?product", "?name", "?score"],
"where": [
{
"f:graphSource": "products-search:main",
"f:searchText": "laptop",
"f:searchLimit": 20,
"f:searchResult": { "f:resultId": "?product", "f:resultScore": "?score" }
},
{ "@id": "?product", "schema:name": "?name" }
]
}
Combines structured data with search results via the f:graphSource pattern.
Index Management
Creating Indexes
BM25 and vector indexes are created via the Rust API. See BM25 and Vector Search for details.
Updating Indexes
BM25 indexes are not automatically updated when the source ledger changes. They must be explicitly synced:
#![allow(unused)]
fn main() {
// Incremental sync (detects changes since last watermark)
let result = fluree.sync_bm25_index("products-search:main").await?;
// Or use the Bm25MaintenanceWorker for automatic background syncing
}
The Bm25MaintenanceWorker can be configured to watch for ledger commits and sync automatically.
Deleting Indexes
#![allow(unused)]
fn main() {
let result = fluree.drop_full_text_index("products-search:main").await?;
}
Performance Characteristics
Inline Fulltext Search
- Indexed throughput: ~625,000 docs/sec (50K paragraph-length docs in 80ms)
- Novelty throughput: ~85,000 docs/sec (50K docs in ~600ms, no index required)
- Indexed speedup: 7-7.5x faster than novelty-only
- Scaling: Near-linear; ~625K docs within a 1-second query budget
- Arena build: Adds minimal overhead to the normal binary index build
BM25 Search
- Index Build Time: O(n) for n documents
- Top-k Query Time: Sub-linear via Block-Max WAND — skips posting list segments that cannot contribute to the top-k, with early termination. Falls back to O(total matching postings) when k approaches corpus size.
- Space: ~2-3x document size
- Updates: Incremental via property-dependency tracking, O(changed docs)
Vector Search
- Flat scan (inline functions): O(n) brute-force, viable up to ~100K vectors with binary indexing; binary index provides ~6x speedup over novelty-only scans and ~25x for filtered queries
- HNSW index: O(log n) approximate nearest neighbor, recommended for 100K+ vectors or strict latency requirements
- Space: ~1.5x embedding size
- Updates: Incremental, O(1) per vector
- See Vector Search – Performance and Scaling for benchmark data and guidance on when to adopt HNSW
Combined Queries
Combine search with graph queries:
{
"@context": {"f": "https://ns.flur.ee/db#"},
"from": "mydb:main",
"select": ["?product", "?category"],
"where": [
{
"f:graphSource": "products-search:main",
"f:searchText": "laptop",
"f:searchLimit": 20,
"f:searchResult": { "f:resultId": "?product" }
},
{ "@id": "?product", "schema:category": "?category" }
]
}
Query optimizer handles joins between the search graph source and structured data efficiently.
Use Cases
Full-Text Search
E-commerce Product Search:
{
"@context": {"f": "https://ns.flur.ee/db#"},
"from": "products:main",
"select": ["?product", "?score"],
"where": [
{
"f:graphSource": "products-search:main",
"f:searchText": "wireless headphones",
"f:searchLimit": 20,
"f:searchResult": { "f:resultId": "?product", "f:resultScore": "?score" }
}
],
"orderBy": ["-?score"]
}
Document Management:
{
"@context": {"f": "https://ns.flur.ee/db#"},
"from": "documents:main",
"where": [
{
"f:graphSource": "documents-search:main",
"f:searchText": "quarterly report 2024",
"f:searchLimit": 20,
"f:searchResult": { "f:resultId": "?doc" }
},
{ "@id": "?doc", "ex:department": "finance" }
]
}
Vector Similarity
Semantic Search:
{
"from": "articles:main",
"values": [
["?queryVec"],
[{"@value": [0.1, 0.2, 0.3], "@type": "https://ns.flur.ee/db#embeddingVector"}]
],
"where": [
{
"f:graphSource": "articles-vector:main",
"f:queryVector": "?queryVec",
"f:searchLimit": 10,
"f:searchResult": {
"f:resultId": "?article",
"f:resultScore": "?vecScore"
}
}
],
"select": ["?article", "?vecScore"],
"orderBy": [["desc", "?vecScore"]]
}
Recommendation Engine:
{
"from": "products:main",
"where": [
{
"@id": "ex:product-123",
"ex:embedding": "?queryVec"
},
{
"f:graphSource": "products-vector:main",
"f:queryVector": "?queryVec",
"f:searchLimit": 5,
"f:searchResult": { "f:resultId": "?similar", "f:resultScore": "?vecScore" }
}
],
"select": ["?similar", "?vecScore"],
"orderBy": [["desc", "?vecScore"]]
}
Hybrid Search
Combine text and vector search:
{
"@context": {"f": "https://ns.flur.ee/db#"},
"from": "products:main",
"values": [
["?queryVec"],
[{"@value": [0.1, 0.2, 0.3], "@type": "https://ns.flur.ee/db#embeddingVector"}]
],
"where": [
{
"f:graphSource": "products-search:main",
"f:searchText": "laptop",
"f:searchLimit": 100,
"f:searchResult": { "f:resultId": "?product", "f:resultScore": "?textScore" }
},
{
"f:graphSource": "products-vector:main",
"f:queryVector": "?queryVec",
"f:searchLimit": 100,
"f:searchResult": { "f:resultId": "?product", "f:resultScore": "?vecScore" }
}
],
"bind": {
"?finalScore": "(?textScore * 0.6) + (?vecScore * 0.4)"
},
"orderBy": ["-?finalScore"]
}
Monitoring
Check BM25 Staleness
Check whether a BM25 index is behind its source ledger:
#![allow(unused)]
fn main() {
let check = fluree.check_bm25_staleness("products-search:main").await?;
println!("Index at t={}, ledger at t={}, stale: {}, lag: {}",
check.index_t, check.ledger_t, check.is_stale, check.lag);
}
Background Maintenance
The Bm25MaintenanceWorker watches for source ledger commits and syncs indexes automatically:
- Debounces rapid commits (configurable interval)
- Bounded concurrency for concurrent sync operations
- Registers/unregisters graph sources dynamically
Best Practices
1. Choose Appropriate Index Type
- Structured queries: Use core graph indexes
- Keyword search (< 500K docs): Use inline `@fulltext` for zero-config BM25 scoring
- Keyword search (1M+ docs): Use the BM25 graph source for WAND-optimized top-k retrieval
- Semantic similarity: Use vector search
- Hybrid: Combine multiple indexes
2. Tune BM25 Parameters
Adjust k1 and b for your corpus:
#![allow(unused)]
fn main() {
let config = Bm25CreateConfig::new("search", "docs:main", query)
.with_k1(1.5) // Higher = more weight to term frequency (default: 1.2)
.with_b(0.5); // Lower = less document length normalization (default: 0.75)
}
The indexing query controls which properties are indexed — all selected text properties contribute to the document’s searchable content.
3. Monitor Index Staleness
Check staleness after bulk operations:
#![allow(unused)]
fn main() {
let check = fluree.check_bm25_staleness("search:main").await?;
if check.is_stale {
fluree.sync_bm25_index("search:main").await?;
}
}
4. Sync After Bulk Updates
BM25 indexes require explicit sync. After bulk inserts, sync once at the end:
#![allow(unused)]
fn main() {
// Insert many documents...
for batch in batches {
fluree.insert(ledger.clone(), &batch).await?;
}
// Sync the BM25 index once after all inserts
fluree.sync_bm25_index("products-search:main").await?;
}
5. Use Appropriate Limits
Limit results for performance:
{
"@context": {"f": "https://ns.flur.ee/db#"},
"from": "docs:main",
"where": [
{
"f:graphSource": "docs-search:main",
"f:searchText": "search query",
"f:searchLimit": 100,
"f:searchResult": { "f:resultId": "?doc" }
}
]
}
Related Documentation
- Background Indexing - Core index details
- Inline Fulltext Search - `@fulltext` datatype and `fulltext()` scoring
- BM25 - Dedicated full-text search graph source
- Vector Search - Similarity search
- Graph Sources - Graph source concepts
- Query - Query syntax
Background Indexing
Fluree maintains query-optimized indexes through a background indexing process. This document covers the indexing architecture, configuration, and monitoring.
Index Architecture
Fluree maintains four index permutations for efficient query execution:
SPOT (Subject-Predicate-Object-Time)
Organized by subject first:
ex:alice → schema:name → "Alice" → [t=1, t=5]
ex:alice → schema:age → 30 → [t=1]
ex:alice → schema:age → 31 → [t=10]
Optimized for: “Give me all properties of this subject”
POST (Predicate-Object-Subject-Time)
Organized by predicate first:
schema:name → "Alice" → ex:alice → [t=1, t=5]
schema:age → 30 → ex:alice → [t=1]
schema:age → 31 → ex:alice → [t=10]
Optimized for: “Find all subjects with this property/value”
OPST (Object-Predicate-Subject-Time)
Organized by object first:
"Alice" → schema:name → ex:alice → [t=1, t=5]
30 → schema:age → ex:alice → [t=1]
31 → schema:age → ex:alice → [t=10]
Optimized for: “Find subjects with this object value”
PSOT (Predicate-Subject-Object-Time)
Organized by predicate, then subject:
schema:name → ex:alice → "Alice" → [t=1, t=5]
schema:age → ex:alice → 30 → [t=1]
schema:age → ex:alice → 31 → [t=10]
Optimized for: “Get all values for this predicate”
Indexing Process
1. Transaction Commit
t=42: Transaction committed
- Flakes written to append-only log
- Commit metadata created
- Commit published to nameservice (commit_t=42)
2. Indexer Detection
Background indexing is triggered when the ledger’s novelty exceeds the configured threshold (see Configuration below):
Indexer checks: commit_t=42, index_t=40
Indexer: Need to index t=41, t=42
3. Index Building
Background indexing builds a new index snapshot up to a specific to_t (typically the current commit_t when the job starts). During the job, new commits may arrive; those remain in novelty for the next cycle.
Incremental indexing (default path):
- Load the existing index root (CAS CID) from nameservice
- Resolve only commits with t in (index_t, to_t]
- Merge resolved novelty into only the affected leaf blobs (Copy-on-Write)
- Update dictionaries (forward packs + reverse trees)
- Assemble a new root referencing mostly-unchanged CAS artifacts
Fallback:
- If incremental indexing cannot safely proceed, fall back to a full rebuild
4. Index Publishing
When complete:
- Upload new CAS blobs (leaves, branches, dict blobs) as needed
- Upload the new index root (CAS CID)
- Publish index_head_id to nameservice (atomic “commit point”)
- Update index_t to to_t
Novelty Layer
The novelty layer consists of transactions committed but not yet indexed:
Current State:
commit_t = 150
index_t = 145
novelty = [t=146, t=147, t=148, t=149, t=150]
Query Execution with Novelty
Queries combine indexed data with novelty:
Query for ex:alice's properties:
1. Check SPOT index (up to t=145)
2. Apply novelty layer (t=146 to t=150)
3. Combine results
Impact of Large Novelty
Small novelty (< 10 transactions):
- Minimal query overhead
- Fast query execution
Large novelty (> 100 transactions):
- Significant query overhead
- Slower query execution
- Higher memory usage
Configuration
Background indexing is on by default. Indexing is triggered based on novelty size thresholds:
- Enable/disable background indexing: `--indexing-enabled` / `FLUREE_INDEXING_ENABLED` (default `true`; disable only when a peer/indexer process owns this storage)
- Trigger threshold (soft): `--reindex-min-bytes` / `FLUREE_REINDEX_MIN_BYTES`
- Backpressure threshold (hard): `--reindex-max-bytes` / `FLUREE_REINDEX_MAX_BYTES`
See Operations: Configuration for the canonical flag/env/config-file reference.
Incremental parallelism (per ledger)
Within a single incremental indexing job, Fluree can update multiple (graph, index-order) branches concurrently. This is bounded by:
- `IndexerConfig.incremental_max_concurrency` (default: 4)
This setting is part of the Rust IndexerConfig used by the indexer pipeline; it is not a server CLI flag. Increasing it can improve throughput on multi-graph ledgers and can run the four main index orders (SPOT/PSOT/POST/OPST) in parallel, at the cost of higher peak memory.
Monitoring
Check Index Status
curl http://localhost:8090/v1/fluree/info/mydb:main
Response:
{
"ledger_id": "mydb:main",
"branch": "main",
"commit_t": 150,
"index_t": 145,
"commit_id": "bafy...headCommit",
"index_id": "bafy...indexRoot"
}
Key Metrics:
- index lag (txns): `commit_t - index_t`
For byte-level novelty size and indexing trigger decisions, see the indexing block returned by transaction and replication endpoints (e.g. POST /push/<ledger>), documented in API Endpoints.
Key Log Messages
At INFO, background indexing now emits coarse-grained progress logs that make it easier to distinguish:
- request queued vs. worker started
- current wait status while `trigger_index()` is blocked
- incremental vs. rebuild path selection
- commit-chain walking progress
- commit resolution progress and phase completion
When background indexing is queued by an HTTP transaction request, the worker logs also include copied request_id and trace_id fields from the triggering request. This provides log-level correlation between the foreground request and the later background build without making the index build part of the original request trace.
At DEBUG, the same wait and commit-walk paths emit more frequent progress updates for incident debugging without changing behavior.
When you call indexing through the Rust API with `trigger_index()`, the wait timeout is optional and should generally be chosen by the caller. Leave `TriggerIndexOptions.timeout_ms` unset to wait until completion, or set it explicitly for bounded environments such as Lambda jobs, HTTP gateways, or other workers with a fixed maximum runtime.
Health Indicators
Healthy:
index_lag: 0-10 transactions
index_rate > transaction_rate
Warning:
index_lag: 10-50 transactions
index_rate ≈ transaction_rate
Critical:
index_lag: > 50 transactions
index_rate < transaction_rate
Performance Tuning
Optimize for Write-Heavy Loads
fluree-server \
--indexing-enabled \
--reindex-min-bytes 200000 \
--reindex-max-bytes 2000000
Larger thresholds reduce indexing frequency (more novelty accumulation), trading some query-time overlay cost for reduced background indexing activity.
Optimize for Read-Heavy Loads
fluree-server \
--indexing-enabled \
--reindex-min-bytes 50000
Smaller reindex-min-bytes keeps novelty smaller (better query performance) at the cost of more frequent background indexing cycles.
Index Storage
Index Snapshots
Indexes are stored as immutable, content-addressed snapshots:
- Leaf blobs (FLI3) and branch manifests (FBR3)
- Dictionary blobs (forward packs, reverse tree leaves/branches)
- An index root blob (FIR6) that references everything needed for queries
The nameservice stores the current index root CID (index_head_id) and its watermark (index_t). Peers fetch only the CAS objects they need on demand.
Index Retention
Old index snapshots are retained for time-travel safety and concurrent query safety. Cleanup is performed by the binary index garbage collector, governed by:
- `IndexerConfig.gc_max_old_indexes`
- `IndexerConfig.gc_min_time_mins`
No standalone HTTP compaction endpoint is currently exposed. Use POST /v1/fluree/reindex when you need to force a full index refresh.
Troubleshooting
High indexing lag
Symptom: commit_t - index_t grows continuously
Causes:
- Transaction rate exceeds indexing capacity
- Large transactions
- Insufficient resources
Solutions:
- Reduce `reindex-min-bytes` so indexing triggers sooner
- Increase resources for the indexer (CPU/memory and storage throughput)
- Consider running a dedicated indexer process (separate from the transactor)
- For incremental indexing, consider increasing `IndexerConfig.incremental_max_concurrency`
Slow Indexing
Symptom: index_t advances slowly (or stops advancing)
Causes:
- Disk I/O bottleneck
- CPU bottleneck
- Large index size
- Storage backend latency
Solutions:
- Use faster storage (SSD)
- Increase CPU allocation
- Optimize transaction patterns
- Use local storage vs network storage
Index Corruption
Symptom: Query errors, unexpected results
Recovery: Use the Reindex API to rebuild indexes from scratch if you suspect corruption or need to change index structure parameters.
Best Practices
1. Monitor Novelty
setInterval(async () => {
const status = await fetch('http://localhost:8090/v1/fluree/info/mydb:main')
.then(r => r.json());
const lag = status.commit_t - status.index_t;
if (lag > 50) {
console.warn(`High indexing lag: ${lag} transactions`);
}
}, 30000); // Check every 30 seconds
2. Tune for Workload
Match configuration to workload pattern:
- Write-heavy: Larger `reindex-min-bytes` (fewer indexing cycles)
- Read-heavy: Smaller `reindex-min-bytes` (less novelty overlay)
- Balanced: Default settings
3. Capacity Planning
Estimate indexing capacity:
Transaction rate: 10 txn/second
Avg flakes per txn: 100
Total flakes: 1,000 flakes/second
Indexing capacity: 2,000 flakes/second (2× margin)
4. Alert on Lag
Set up alerting:
const lag = status.commit_t - status.index_t;
if (lag > 100) {
alertOps('Critical: Indexing lag > 100 transactions');
}
5. Scheduled Reindex
Run a full reindex during off-peak hours when you need to rebuild indexes:
# Cron job
0 2 * * * curl -X POST http://localhost:8090/v1/fluree/reindex -H "Content-Type: application/json" -d '{"ledger":"mydb:main"}'
Related Documentation
- Reindex API - Manual index rebuilding and recovery
- Indexing Side-Effects - Transaction impact on indexing
- Query Performance - Query optimization
- BM25 - Full-text search indexing
- Vector Search - Vector indexing
Reindex API
The Reindex API provides full rebuilds of ledger indexes from the commit chain. Use this when you need to rebuild indexes from scratch, such as after suspected corruption or index configuration changes.
Overview
Unlike background indexing which incrementally updates indexes as transactions commit, reindexing rebuilds the entire binary columnar index from the commit history.
Reindex publishes the new index root via publish_index_allow_equal, which means a reindex can produce a new index root CID even when index_t stays the same (same logical snapshot, different physical layout/config).
When to Reindex
Common Use Cases
- Index corruption - Query errors or unexpected results suggest corrupted indexes
- Configuration changes - Changing index parameters (leaf size, branch size)
- Storage backend changes - If you move a deployment between storage backends or adopt a new index strategy/type.
Before You Reindex
Consider these factors:
- Duration: Full reindex scales with ledger size; large ledgers may take hours
- Resources: Ensure adequate memory and storage during the operation
- Availability: Queries remain available during reindex, but may be slower
- Backup: Be sure to back up data before major reindex operations
Rust API
The reindex API is exposed through the Fluree type in fluree-db-api. Fluree owns the storage backend, node cache, nameservice, and provides all ledger operations including queries, transactions, and admin functions like reindex.
Basic Reindex
#![allow(unused)]
fn main() {
use fluree_db_api::{FlureeBuilder, ReindexOptions, ReindexResult};
// Create Fluree instance
let fluree = FlureeBuilder::file("/path/to/data")
.build()
.await?;
// Reindex with default options
let result: ReindexResult = fluree.reindex("mydb:main", ReindexOptions::default()).await?;
println!("Reindexed to t={}", result.index_t);
println!("Root ID: {}", result.root_id);
}
Reindex with Custom Options
#![allow(unused)]
fn main() {
use fluree_db_api::{FlureeBuilder, ReindexOptions};
use fluree_db_indexer::IndexerConfig;
let fluree = FlureeBuilder::file("/path/to/data").build().await?;
let result = fluree.reindex("mydb:main", ReindexOptions::default()
// Use custom index node sizes
.with_indexer_config(IndexerConfig::large())
).await?;
}
ReindexOptions Reference
| Option | Default | Description |
|---|---|---|
| `indexer_config` | `IndexerConfig::default()` | Controls output index structure (leaf/branch sizes, GC settings, memory budget) |
indexer_config
Controls the output index structure and rebuild resources:
#![allow(unused)]
fn main() {
use fluree_db_indexer::IndexerConfig;
// For small datasets (< 100k flakes)
ReindexOptions::default()
.with_indexer_config(IndexerConfig::small())
// For large datasets (> 10M flakes)
ReindexOptions::default()
.with_indexer_config(IndexerConfig::large())
// Custom configuration
let config = IndexerConfig::default()
.with_gc_max_old_indexes(10) // Keep more old index versions
.with_gc_min_time_mins(60) // Retain for at least 60 minutes
.with_run_budget_bytes(1 << 30) // 1 GB memory budget for sort buffers
.with_data_dir("/data/fluree"); // Directory for index artifacts
ReindexOptions::default()
.with_indexer_config(config)
}
Key IndexerConfig fields:
| Field | Default | Description |
|---|---|---|
| `leaf_target_bytes` | 187,500 | Target bytes per leaf node |
| `leaf_max_bytes` | 375,000 | Maximum bytes per leaf node (triggers split) |
| `branch_target_children` | 100 | Target children per branch node |
| `branch_max_children` | 200 | Maximum children per branch node |
| `gc_max_old_indexes` | 5 | Old index versions to retain before GC |
| `gc_min_time_mins` | 30 | Minimum age (minutes) before an index can be GC’d |
| `run_budget_bytes` | 256 MB | Memory budget for sort buffers (split across all sort orders) |
| `data_dir` | System temp dir | Base directory for index artifacts |
| `incremental_enabled` | true | Background indexing: attempt incremental updates before full rebuild |
| `incremental_max_commits` | 10,000 | Background indexing: max commit window for incremental indexing |
| `incremental_max_concurrency` | 4 | Background indexing: max concurrent (graph, order) branch updates |
Note: Reindex is a full rebuild. The incremental_* fields are used by background indexing and are not relevant to the semantics of a reindex operation.
ReindexResult
The reindex operation returns:
#![allow(unused)]
fn main() {
pub struct ReindexResult {
/// Ledger ID
pub ledger_id: String,
/// Transaction time the index was built to
pub index_t: i64,
/// ContentId of the new index root
pub root_id: ContentId,
/// Index build statistics
pub stats: IndexStats,
}
}
Error Handling
Common Errors
#![allow(unused)]
fn main() {
use fluree_db_api::ApiError;
match fluree.reindex("mydb:main", opts).await {
Ok(result) => println!("Success: t={}", result.index_t),
Err(ApiError::NotFound(msg)) => {
// Ledger doesn't exist or has no commits
println!("Ledger not found: {}", msg);
}
Err(ApiError::ReindexConflict { expected, found }) => {
// Ledger advanced during reindex (new commits arrived)
println!("Conflict: expected t={}, found t={}", expected, found);
}
Err(e) => {
// Storage, indexing, or other errors
println!("Reindex failed: {}", e);
}
}
}
How It Works
The reindex operation:
1. Looks up the current ledger state and captures `commit_t` for conflict detection
2. Cancels any active background indexing for the ledger
3. Rebuilds a fresh binary columnar index from the full commit chain using `rebuild_index_from_commits`:
   - Phase A: Walks the commit DAG once, reading only the envelope header of each commit via byte-range requests (`ContentStore::get_range`). Returns the chronological CID list plus the genesis-most `NsSplitMode` in a single pass, so per-commit bandwidth on remote storage is ~128 KiB rather than the full commit blob.
   - Phase B: Resolves commits into batched chunks with chunk-local dictionaries (subjects, strings) and shared global dictionaries (predicates, datatypes, graphs, languages, numbigs, vectors). Commit blobs are pre-fetched concurrently (`buffered(K)`, default `K=3`, env-tunable via `FLUREE_REBUILD_FETCH_CONCURRENCY`) so S3 round-trip latency overlaps with local decode cost.
   - Phase C: Merges per-chunk dictionaries into global dictionaries with remap tables
   - Phase D: Builds SPOT indexes from sorted commit files via k-way merge with graph-aware partitioning
   - Phase E: Builds secondary indexes (PSOT, POST, OPST) per-graph from partitioned run files
   - Phase F: Uploads dictionaries and index artifacts to CAS, creates `IndexRoot` (FIR6)
4. Validates that no new commits arrived during the build (conflict detection)
5. Publishes the new index root via `publish_index_allow_equal`
6. Spawns async garbage collection to clean up old index versions
The rebuilt index preserves full time-travel history: retract-winner events and their preceding asserts are stored in Region 3 (history) of leaf nodes, enabling as-of queries at any past transaction time.
Best Practices
1. Schedule During Low-Traffic Periods
While queries continue to work during reindex, performance may be impacted. Schedule large reindex operations during maintenance windows when possible.
2. Tune Memory Budget for Large Ledgers
For ledgers with millions of flakes, increasing run_budget_bytes reduces the number of spill files and speeds up the merge phase:
#![allow(unused)]
fn main() {
let config = IndexerConfig::default()
.with_run_budget_bytes(2 * 1024 * 1024 * 1024); // 2 GB
}
3. Tune Phase B Fetch Concurrency for Remote Storage
When reindexing from remote storage (S3) on latency-bound platforms like AWS Lambda, Phase B benefits from fetching several commit blobs in parallel so S3 round-trip latency (25–50 ms) overlaps with local decode cost.
# Default: 3. Increase for high-latency links; pin to 1 for strict serial behavior.
export FLUREE_REBUILD_FETCH_CONCURRENCY=4
In-flight memory is bounded by K × avg_commit_blob_size. For typical commits (< 1 MB) and K=3, the overhead is negligible against the run_budget_bytes pool. Pathologically large commits (hundreds of MB) should set K=1 to avoid transient memory spikes.
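For example, at the default K=3 with roughly 1 MB commit blobs, only about 3 MB of commit data is in flight at once; with 100 MB commits the same K would hold roughly 300 MB, which is why K=1 is the safer choice there.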
4. Verify After Reindex
After reindex, verify the results:
#![allow(unused)]
fn main() {
// Get ledger info to check state
let info = fluree.ledger_info(ledger_id).execute().await?;
println!("Index rebuilt to t={}", info["index"]["t"]);
// Run a sample query to verify correctness
let db = fluree_db_api::GraphDb::from_ledger_state(&ledger);
let query_result = fluree.query(&db, &sample_query).await?;
}
5. Concurrent Operations
During reindex:
- Queries continue to work (using old index + novelty)
- Transactions continue to work (writes to novelty)
- Background indexing is paused for this ledger
Related Documentation
- Background Indexing - Automatic incremental indexing
- Admin and Health - Admin operations
- Rust API - Using Fluree as a library
- Storage - Storage configuration
Inline Fulltext Search
Inline fulltext search enables BM25-ranked text scoring directly in queries, using the @fulltext datatype (or a ledger-level f:fullTextDefaults config) and the fulltext() scoring function. This follows the same pattern as @vector and inline similarity functions: declare what to index, persist as normal commits, and query with a scoring function in bind expressions. No external services, no separate ingestion pipeline.
Two ways to enable fulltext scoring on a property:
- Per-value annotation (`@fulltext` datatype) — zero-config, always English. Tag individual literal values at insert time. Good for a handful of obviously-fulltext fields where English is fine.
- Property-level configuration (`f:fullTextDefaults`) — declare once in the ledger’s config graph which properties should be full-text indexed, and optionally which language to analyze them in. Plain-string values on those properties get indexed automatically — no `@type` annotation needed at insert time. Required when you want non-English stemming/stopwords, or when you want every value of a property indexed by default.
Both paths produce the same on-disk BM25 arenas and are queried with the same fulltext(?var, "query") function.
Use cases:
- Document ranking: Score and rank articles, product descriptions, or knowledge base entries by keyword relevance
- Content discovery: Find the most relevant documents for a natural language query
- Faceted search: Combine fulltext scoring with graph pattern filters (e.g., score only documents in a specific category)
- Multilingual catalogs: Index product descriptions in Spanish on one graph and English on another, with the right stemmer picked automatically per-language
The @fulltext Datatype
Why a dedicated datatype?
Plain strings in Fluree are stored as xsd:string values. They are indexed for exact matching and prefix queries, but not for full-text search. The @fulltext datatype tells Fluree that a string value should be analyzed (tokenized, stemmed, stopword-filtered) and indexed for relevance scoring.
@fulltext is a JSON-LD shorthand that resolves to the full IRI https://ns.flur.ee/db#fullText, which can also be written as f:fullText when the Fluree namespace prefix is declared in your @context.
Inserting fulltext values (JSON-LD)
Use "@type": "@fulltext" to annotate a string as fulltext-searchable:
{
"@context": {
"ex": "http://example.org/"
},
"@graph": [
{
"@id": "ex:article-1",
"@type": "ex:Article",
"ex:title": "Rust Programming",
"ex:content": {
"@value": "Rust is a systems programming language focused on safety and performance",
"@type": "@fulltext"
}
}
]
}
You can also use the full IRI or f: prefix form:
{
"@context": {
"ex": "http://example.org/",
"f": "https://ns.flur.ee/db#"
},
"@graph": [
{
"@id": "ex:article-1",
"ex:content": {
"@value": "Rust is a systems programming language...",
"@type": "f:fullText"
}
}
]
}
Inserting fulltext values (Turtle / SPARQL UPDATE)
In Turtle and SPARQL UPDATE, the @fulltext shorthand is not available. Use the f:fullText datatype IRI with the standard ^^ typed-literal syntax.
Turtle data file:
@prefix ex: <http://example.org/> .
@prefix f: <https://ns.flur.ee/db#> .
ex:article-1
a ex:Article ;
ex:title "Introduction to Rust" ;
ex:content "Rust is a systems programming language focused on safety and performance"^^f:fullText .
ex:article-2
a ex:Article ;
ex:title "Database Design Patterns" ;
ex:content "Modern database systems use columnar storage and immutable ledgers"^^f:fullText .
SPARQL UPDATE:
PREFIX ex: <http://example.org/>
PREFIX f: <https://ns.flur.ee/db#>
INSERT DATA {
ex:article-1 a ex:Article ;
ex:title "Introduction to Rust" ;
ex:content "Rust is a systems programming language focused on safety"^^f:fullText .
}
The ^^f:fullText annotation is the Turtle/SPARQL equivalent of "@type": "@fulltext" in JSON-LD. Without it, the string is stored as a plain xsd:string.
Multiple fulltext properties per entity
An entity can have @fulltext on multiple different properties:
{
"@id": "ex:article-1",
"ex:title": {
"@value": "Rust Programming Guide",
"@type": "@fulltext"
},
"ex:content": {
"@value": "Rust is a systems programming language focused on safety...",
"@type": "@fulltext"
}
}
Each property produces an independent fulltext index (arena). When you query with fulltext(), the function automatically uses the arena for the property bound to the variable.
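For instance, here is a sketch (property names are illustrative) that scores ex:title and ex:content independently; each bind resolves against the arena for its own property:
{
  "@context": { "ex": "http://example.org/" },
  "select": ["?doc", "?titleScore", "?contentScore"],
  "where": [
    { "@id": "?doc", "ex:title": "?title", "ex:content": "?content" },
    ["bind", "?titleScore", "(fulltext ?title \"rust\")"],
    ["bind", "?contentScore", "(fulltext ?content \"rust\")"],
    ["filter", "(> ?contentScore 0)"]
  ],
  "orderBy": [["desc", "?contentScore"]],
  "limit": 10
}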
Portability
@fulltext annotations are fully portable across Fluree’s data distribution pipeline. Import, export, push, and pull all preserve @fulltext type annotations, and indexes are rebuilt transparently on the receiving side.
Configured Full-Text Properties (f:fullTextDefaults)
The @fulltext datatype is a per-value shortcut — you decide at insert time, one triple at a time, whether a string gets full-text indexed, and English is the only supported language. For many real-world workloads that’s not what you want. You want to say once, at the ledger level, “index every value of ex:title”, or “index ex:productName in the product catalog graph in Spanish.” That’s what f:fullTextDefaults gives you.
When a property is declared in f:fullTextDefaults, any plain xsd:string or rdf:langString value on that property gets full-text indexed — no @type: @fulltext needed on individual values. Language-tagged (rdf:langString) values automatically route to a per-language arena (French stemmer for "fr", Spanish stopwords for "es", and so on). Untagged plain strings fall back to the configured default language.
The @fulltext datatype continues to work exactly as before: any value tagged @fulltext is always indexed as English, regardless of what f:fullTextDefaults says about its property. You can mix both paths on the same property; English content from either path lands in a single shared arena.
When to use which
| Need | Use |
|---|---|
| English-only, a few obviously-fulltext fields, want the choice per-value | @fulltext datatype |
| Non-English (or mixed languages) | f:fullTextDefaults with f:defaultLanguage |
| Every value of a property should be searchable, no per-value opt-in | f:fullTextDefaults |
| Different languages per graph (e.g. multilingual catalog) | f:fullTextDefaults with per-graph overrides |
| Zero config, just works | @fulltext datatype |
Setting it up
Write configuration into the ledger’s #config named graph, alongside any other config groups (policy, SHACL, reasoning, etc.). The config is itself a transaction — it’s versioned and auditable like any other data.
Minimal — index ex:title and ex:body, English by default:
@prefix f: <https://ns.flur.ee/db#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex: <http://example.org/> .
GRAPH <urn:fluree:mydb:main#config> {
<urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
f:fullTextDefaults [
a f:FullTextDefaults ;
f:defaultLanguage "en" ;
f:property [ a f:FullTextProperty ; f:target ex:title ] ,
[ a f:FullTextProperty ; f:target ex:body ]
] .
}
Or as JSON-LD:
{
"@context": {
"f": "https://ns.flur.ee/db#",
"ex": "http://example.org/"
},
"@graph": [
{
"@id": "urn:fluree:mydb:main:config:ledger",
"@type": "f:LedgerConfig",
"@graph": "urn:fluree:mydb:main#config",
"f:fullTextDefaults": {
"@type": "f:FullTextDefaults",
"f:defaultLanguage": "en",
"f:property": [
{ "@type": "f:FullTextProperty", "f:target": { "@id": "ex:title" } },
{ "@type": "f:FullTextProperty", "f:target": { "@id": "ex:body" } }
]
}
}
]
}
HTTP / Docker: the same JSON-LD config goes into a regular /update transaction. Wrap it in @graph and POST to the ledger:
curl -X POST 'http://localhost:8090/v1/fluree/update?ledger=mydb:main' \
-H 'Content-Type: application/json' \
-d @- <<'JSON'
{
"@context": {
"f": "https://ns.flur.ee/db#",
"ex": "http://example.org/"
},
"@graph": [
{
"@id": "urn:fluree:mydb:main:config:ledger",
"@type": "f:LedgerConfig",
"@graph": "urn:fluree:mydb:main#config",
"f:fullTextDefaults": {
"@type": "f:FullTextDefaults",
"f:defaultLanguage": "en",
"f:property": [
{ "@type": "f:FullTextProperty", "f:target": { "@id": "ex:title" } },
{ "@type": "f:FullTextProperty", "f:target": { "@id": "ex:body" } }
]
}
}
]
}
JSON
The config is stored in the ledger’s #config named graph (note the "@graph": "urn:fluree:mydb:main#config" placement directive on the resource). To verify, query the config graph:
curl -X POST http://localhost:8090/v1/fluree/query \
-H 'Content-Type: application/json' \
-d '{
"@context": { "f": "https://ns.flur.ee/db#" },
"from": "mydb:main",
"from-named": ["urn:fluree:mydb:main#config"],
"where": [{ "@graph": "urn:fluree:mydb:main#config",
"@id": "?cfg", "f:fullTextDefaults": "?defaults" }],
"select": ["?cfg", "?defaults"]
}'
After writing config, trigger a reindex so existing values on ex:title and ex:body get indexed. See Reindexing after a config change below.
Data writes don’t change. Once config is in place and the reindex has run, just insert plain strings the way you always would:
{
"@id": "ex:doc1",
"ex:title": "Rust programming language guide",
"ex:body": "Rust is a systems programming language..."
}
Both values flow into BM25 arenas automatically.
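They are then queried exactly like @fulltext-tagged values. A sketch, reusing the ex:title property configured above:
{
  "@context": { "ex": "http://example.org/" },
  "from": "mydb:main",
  "select": ["?doc", "?score"],
  "where": [
    { "@id": "?doc", "ex:title": "?title" },
    ["bind", "?score", "(fulltext ?title \"rust guide\")"],
    ["filter", "(> ?score 0)"]
  ],
  "orderBy": [["desc", "?score"]],
  "limit": 10
}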
Multiple languages
Fluree ships Snowball stemmers and curated stopwords for 18 languages. Pick one as your ledger default via f:defaultLanguage; any BCP-47 tag in the list below works.
| Tag | Language |
|---|---|
| ar | Arabic |
| da | Danish |
| de | German |
| el | Greek |
| en | English |
| es | Spanish |
| fi | Finnish |
| fr | French |
| hu | Hungarian |
| it | Italian |
| nl | Dutch |
| no (or nb, nn) | Norwegian |
| pt | Portuguese |
| ro | Romanian |
| ru | Russian |
| sv | Swedish |
| ta | Tamil |
| tr | Turkish |
A BCP-47 tag that isn’t on this list still works — it just skips stemming and stopword removal (tokenize + lowercase only). Index and query sides agree on that behavior so scores remain consistent.
Per-value language tagging via rdf:langString. If a single property holds values in different languages, tag them with @language (JSON-LD) or a language-tag suffix such as "Programmation Rust"@fr (Turtle):
{
"@id": "ex:doc1",
"ex:title": [
{ "@value": "Rust programming", "@language": "en" },
{ "@value": "Programmation Rust", "@language": "fr" }
]
}
Fluree automatically builds per-language arenas (ex:title in English, ex:title in French) and queries against the arena whose language matches the row’s tag. Untagged values fall back to the ledger’s f:defaultLanguage.
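For example (a sketch over the data above), a single fulltext() bind covers both arenas; each row is scored with the analyzer for its own language tag, so the French title is stemmed with the French analyzer:
{
  "@context": { "ex": "http://example.org/" },
  "select": ["?doc", "?score"],
  "where": [
    { "@id": "?doc", "ex:title": "?title" },
    ["bind", "?score", "(fulltext ?title \"programmation\")"],
    ["filter", "(> ?score 0)"]
  ],
  "orderBy": [["desc", "?score"]]
}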
Per-graph overrides
Different graphs can have different full-text configuration. For example, a product catalog graph might index ex:productName in Spanish while the rest of the ledger uses English:
@prefix f: <https://ns.flur.ee/db#> .
@prefix ex: <http://example.org/> .
GRAPH <urn:fluree:mydb:main#config> {
<urn:fluree:mydb:main:config:ledger> a f:LedgerConfig ;
# Ledger-wide: English, index ex:title everywhere.
f:fullTextDefaults [
a f:FullTextDefaults ;
f:defaultLanguage "en" ;
f:property [ a f:FullTextProperty ; f:target ex:title ]
] ;
# Catalog graph: also index ex:productName, default Spanish.
f:graphOverrides [
a f:GraphConfig ;
f:targetGraph <urn:example:productCatalog> ;
f:fullTextDefaults [
a f:FullTextDefaults ;
f:defaultLanguage "es" ;
f:property [ a f:FullTextProperty ; f:target ex:productName ]
]
] .
}
The merge is additive: every property in the ledger-wide list applies to every graph (including productCatalog), and the per-graph override adds ex:productName on top of ex:title. The override’s f:defaultLanguage shadows the ledger-wide language only for untagged plain strings on that specific graph.
Targeting the default graph or txn-meta explicitly. Use the f:defaultGraph sentinel to target only the default graph (g_id = 0), or f:txnMetaGraph for the ledger’s txn-meta graph:
f:graphOverrides [
a f:GraphConfig ;
f:targetGraph f:defaultGraph ;
f:fullTextDefaults [
a f:FullTextDefaults ;
f:property [ a f:FullTextProperty ; f:target ex:note ]
]
]
Locking config (f:overrideControl)
If you want to prevent per-graph overrides from modifying the ledger-wide full-text defaults, set f:overrideControl to f:OverrideNone on the ledger-wide group:
<urn:fluree:mydb:main:config:ledger> f:fullTextDefaults [
a f:FullTextDefaults ;
f:defaultLanguage "en" ;
f:overrideControl f:OverrideNone ;
f:property [ a f:FullTextProperty ; f:target ex:title ]
] .
With f:OverrideNone, any f:graphOverrides entry targeting f:fullTextDefaults is ignored at resolution time — the ledger-wide group is final. See Override control for the full model.
Reindexing after a config change
Writing or editing f:fullTextDefaults does not automatically rebuild any arenas. You control when reindexing happens.
What you need to know:
- New commits after the config change pick up the new config automatically during the next incremental index build — newly inserted values on configured properties flow into arenas as expected.
- Existing values that were committed before the config change are not retroactively indexed until you run a full reindex.
- Removing or renaming a property from f:fullTextDefaults drops it from the configured set for new commits, but the existing arena stays until you reindex.
- Changing f:defaultLanguage doesn’t rewrite existing arenas — they keep whatever language they were built with. New values get the new language; scores may be temporarily inconsistent across the old/new boundary until a reindex.
To force the full picture — pick up config changes for all existing data — run a manual reindex:
# CLI
fluree reindex mydb:main
# Or via the admin API
curl -X POST https://<fluree-server>/v1/fluree/reindex \
-H 'Content-Type: application/json' \
-d '{"ledger": "mydb:main"}'
The reindex reads the current f:fullTextDefaults, walks the entire commit chain, and rebuilds arenas with the new configuration applied consistently.
Note on concurrent reindex + config write. A reindex already in progress operates on a point-in-time snapshot and will NOT pick up a config change committed during its run. If you change config during a reindex, wait for it to finish, then trigger another reindex. See Reindex for full semantics.
How config-path and @fulltext-datatype coexist
If a value’s datatype is @fulltext, the datatype wins: that value is indexed as English, even if the property is listed in f:fullTextDefaults with a different f:defaultLanguage. This keeps the @fulltext contract stable (“I tagged this value English, index it now”) and guarantees no double-indexing.
In practice, a single property can mix:
- @fulltext-datatype values → English arena
- rdf:langString values tagged "fr" → French arena
- Plain xsd:string values → arena for the configured f:defaultLanguage
Each language becomes its own arena; queries automatically look up the right one based on the row’s language tag (with English as the fallback). Ledger-wide English content from both paths shares a single arena — no wasted duplication.
The fulltext() Scoring Function
The fulltext() function computes a BM25 relevance score for a bound text value against a query string. Use it in bind expressions within JSON-LD queries.
Basic usage
{
"@context": {
"ex": "http://example.org/"
},
"select": ["?title", "?score"],
"where": [
{ "@id": "?doc", "ex:content": "?content", "ex:title": "?title" },
["bind", "?score", "(fulltext ?content \"Rust programming\")"],
["filter", "(> ?score 0)"]
],
"orderBy": [["desc", "?score"]],
"limit": 10
}
Arguments:
- First argument: a variable bound to a @fulltext-typed value
- Second argument: the search query string (natural language)
Returns: A numeric score (xsd:double). Higher scores indicate greater relevance. Returns 0.0 when the document contains none of the query terms.
Alternative array syntax
The function also accepts array form:
["bind", "?score", ["fulltext", "?content", "Rust programming"]]
This is equivalent to the S-expression string form.
Filtering by score
Combine bind with filter to exclude non-matching documents:
["bind", "?score", "(fulltext ?content \"search terms\")"],
["filter", "(> ?score 0)"]
Combining with graph patterns
Fulltext scoring works naturally with standard graph patterns. Filter by type, category, or relationships before or after scoring:
{
"@context": {
"ex": "http://example.org/"
},
"select": ["?title", "?score"],
"where": [
{
"@id": "?doc",
"@type": "ex:Article",
"ex:content": "?content",
"ex:title": "?title",
"ex:category": "?cat"
},
["filter", "(= ?cat \"technology\")"],
["bind", "?score", "(fulltext ?content \"distributed database systems\")"],
["filter", "(> ?score 0)"]
],
"orderBy": [["desc", "?score"]],
"limit": 10
}
Placing the category filter before the fulltext() bind reduces the number of documents scored, improving query performance.
How Scoring Works
The fulltext() function uses BM25 (Best Match 25), the standard information retrieval scoring algorithm used by search engines.
BM25 formula
For each query term t in document d:
IDF(t) = ln((N - df(t) + 0.5) / (df(t) + 0.5) + 1)
TF_norm(t) = tf(t,d) * (k1 + 1) / (tf(t,d) + k1 * (1 - b + b * |d| / avgdl))
score(q,d) = SUM( IDF(t) * TF_norm(t) ) for each query term t
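To make this concrete, a worked example with illustrative numbers (N = 1,000 documents, df("rust") = 50, tf = 2, |d| = 40 words, avgdl = 50 words, default k1 = 1.2 and b = 0.75):
IDF("rust")     = ln((1000 - 50 + 0.5) / (50 + 0.5) + 1) ≈ ln(19.82) ≈ 2.99
TF_norm("rust") = 2 * (1.2 + 1) / (2 + 1.2 * (1 - 0.75 + 0.75 * 40/50)) = 4.4 / 3.02 ≈ 1.46
contribution    ≈ 2.99 * 1.46 ≈ 4.35
Repeating this for each query term and summing the contributions gives the document's score.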
What makes the scoring effective
- IDF (Inverse Document Frequency) – Downweights common terms (“the”, “is”) and boosts rare, discriminative terms. A query for “distributed database” gives more weight to “distributed” (rarer) than “database” (common in a tech corpus).
- Document length normalization – Prevents long documents from dominating purely due to having more words. Controlled by parameter b (default 0.75). A 50-word abstract mentioning “database” twice scores comparably to a 500-word article mentioning it twice.
- Term frequency saturation – Diminishing returns for repeated terms, controlled by parameter k1 (default 1.2). The 5th occurrence of “database” in a document contributes less than the 1st.
- Corpus-wide average document length (avgdl) – Anchors the length normalization across the entire collection.
Text analysis pipeline
Both documents and queries go through the same analysis pipeline, and the index and query sides always use the same analyzer for a given arena — so query stems match document stems:
- Tokenization – Split text on whitespace and punctuation (Unicode-aware)
- Lowercasing – Normalize to lowercase
- Stopword removal – Remove common stopwords for the bucket’s language (“the”, “is”, “and” in English; “le”, “la”, “et” in French; etc.)
- Stemming – Reduce words to stems using the Snowball stemmer for the bucket’s language
This means a query for “programming” against an English arena matches documents containing “programmed”, “programs”, or “programmer”. A French-language arena stems French word forms instead (“chantait” → “chant”, matching “chanter”, “chantons”, and so on).
For the @fulltext datatype, the analyzer is always English. For properties declared in f:fullTextDefaults, the analyzer matches the arena’s language (row’s rdf:langString tag, or the configured f:defaultLanguage). An unrecognized BCP-47 tag skips steps 3 and 4 — tokenize + lowercase only — consistently on both sides.
Indexing
Automatic arena construction
During background binary index builds, Fluree automatically constructs a FulltextArena (FTA1 format) for each (graph, predicate) combination that has @fulltext values. Each arena stores:
- A sorted term dictionary of stemmed tokens
- Per-document bag-of-words (BoW) entries: (term_id, tf) pairs sorted by term ID
- Corpus-level statistics: document count (N), sum of document lengths (sum_dl), and per-term document frequency (df)
This precomputed representation enables fast scoring at query time – the indexed path avoids per-row text analysis entirely, reading precomputed BoW entries via binary search.
No-index fallback
If no binary index has been built yet (e.g., immediately after ledger creation), fulltext() still works using an on-the-fly analysis fallback. Documents are tokenized and scored using TF-saturation (a simplified scoring model). This is slower but ensures the feature works before background indexing catches up.
Novelty overlay
Documents committed after the last index build (in the “novelty” layer) are automatically included in query results with consistent BM25 scores. Fluree computes effective corpus statistics by merging the persisted arena stats with a novelty delta:
- N' = N_arena + delta_N_novelty
- avgdl' = (sum_dl_arena + delta_sum_dl_novelty) / N'
- df'(t) = df_arena(t) + delta_df_novelty(t)
This ensures that indexed documents and novelty documents produce comparable, consistent scores in the same query.
Retraction handling
When a @fulltext value is retracted, it is removed from the arena at the next index build. The retracted document no longer appears in fulltext query results and its statistics are excluded from corpus-level calculations.
Performance
Query-time benchmarks
All benchmarks measure the full end-to-end query path: JSON-LD parse, query plan, scan, BM25 score, sort, and limit 10. Documents are paragraph-length (~30-60 words), representative of article abstracts, product descriptions, or knowledge base entries.
| Documents | Novelty (no index) | Indexed (arena BM25) | Speedup |
|---|---|---|---|
| 1,000 | 11.6 ms | 1.7 ms | 6.7x |
| 5,000 | 57.0 ms | 7.9 ms | 7.2x |
| 10,000 | 115.8 ms | 15.5 ms | 7.5x |
| 50,000 | 601.9 ms | 80.2 ms | 7.5x |
Indexed throughput: ~625,000 docs/sec – 50K documents scored and ranked in 80ms.
Novelty throughput: ~85,000 docs/sec – 50K documents in ~600ms (no index required).
The indexed path is 7-7.5x faster because it reads precomputed BoW entries via binary search on sorted (term_id, tf) arrays, avoiding per-row text analysis and HashMap allocation.
Scaling is near-linear. Extrapolating, the indexed path handles approximately 625K documents within a 1-second query budget.
When to consider the BM25 graph source pipeline
Inline @fulltext works well for tens to hundreds of thousands of documents per predicate. For larger corpora (1M+ documents), consider the dedicated BM25 graph source pipeline, which provides:
- WAND (Weak AND) top-k pruning – Skips documents that provably cannot enter the top-k results, critical for large corpora where scanning every document is prohibitive
- Chunked posting list storage – Compressed, seekable posting lists with skip pointers for efficient I/O at scale
- Incremental index updates – Updates posting lists in place without rebuilding the full index
- Cross-property dependency tracking – BM25 scores can depend on fields from other properties
- Configurable analyzers per property – Language-specific tokenizers, stemmers, and stopword lists
- Multi-term query optimization – Term-at-a-time vs document-at-a-time evaluation strategies
| Corpus size | Recommendation |
|---|---|
| < 100K docs | Inline @fulltext works well, especially with binary indexing |
| 100K - 500K | Inline @fulltext remains viable; query times scale linearly |
| 500K - 1M | Evaluate based on latency requirements; WAND pruning may help |
| 1M+ | Use the BM25 graph source for production workloads |
Comparison with @vector
Both @fulltext and @vector follow the same architectural pattern: annotate, commit, index, query.
| | @vector | @fulltext |
|---|---|---|
| Annotation | "@type": "@vector" | "@type": "@fulltext" |
| Index artifact | VAS1 arena (raw vectors) | FTA1 arena (BoW + corpus stats) |
| Scoring function | dotProduct, cosineSimilarity, euclideanDistance | fulltext(?var, "query") |
| Query input | Vector literal | Natural language string |
| Per-row cost | O(dims) float math | O(query_terms) integer lookups |
| Portability | Push/pull/import/export preserves @vector | Push/pull/import/export preserves @fulltext |
Complete Example
1. Insert documents with fulltext content:
{
"@context": {
"ex": "http://example.org/"
},
"@graph": [
{
"@id": "ex:article-1",
"@type": "ex:Article",
"ex:title": "Introduction to Rust",
"ex:content": {
"@value": "Rust is a systems programming language focused on safety, speed, and concurrency. It prevents segfaults and guarantees thread safety.",
"@type": "@fulltext"
}
},
{
"@id": "ex:article-2",
"@type": "ex:Article",
"ex:title": "Database Design Patterns",
"ex:content": {
"@value": "Modern database systems use columnar storage and immutable ledgers. Graph databases model relationships as first-class citizens.",
"@type": "@fulltext"
}
},
{
"@id": "ex:article-3",
"@type": "ex:Article",
"ex:title": "Rust for Systems Programming",
"ex:content": {
"@value": "Building high-performance systems in Rust requires understanding ownership, borrowing, and lifetime semantics. Rust's type system catches bugs at compile time.",
"@type": "@fulltext"
}
}
]
}
2. Query – find articles about “Rust systems programming”, ranked by relevance:
{
"@context": {
"ex": "http://example.org/"
},
"select": ["?title", "?score"],
"where": [
{
"@id": "?doc",
"@type": "ex:Article",
"ex:content": "?content",
"ex:title": "?title"
},
["bind", "?score", "(fulltext ?content \"Rust systems programming\")"],
["filter", "(> ?score 0)"]
],
"orderBy": [["desc", "?score"]],
"limit": 10
}
Expected results (ordered by relevance):
- “Rust for Systems Programming” – highest score (most query terms, multiple occurrences)
- “Introduction to Rust” – mentions Rust and systems programming
- “Database Design Patterns” – excluded by the > 0 filter (no matching terms)
SPARQL Support
Inserting data
Fulltext annotation works in SPARQL UPDATE today using the ^^f:fullText typed literal syntax (see the Turtle/SPARQL insertion examples above).
Querying
The fulltext() scoring function is currently available in JSON-LD Query only. SPARQL query support is planned for a future release, with anticipated syntax like:
PREFIX ex: <http://example.org/>
PREFIX f: <https://ns.flur.ee/db#>
SELECT ?title ?score
WHERE {
?doc a ex:Article ;
ex:content ?content ;
ex:title ?title .
BIND(f:fulltext(?content, "Rust programming") AS ?score)
FILTER(?score > 0)
}
ORDER BY DESC(?score)
LIMIT 10
This mirrors the pattern established by inline vector similarity functions (dotProduct, cosineSimilarity, euclideanDistance), which also support JSON-LD Query today with SPARQL planned.
Related Documentation
- Datatypes and Typed Values – All supported datatypes including @fulltext
- Setting Groups – Full reference for the f:fullTextDefaults schema (fields, additive merge, override control)
- Override control – Locking ledger-wide config against per-graph overrides
- Reindex – When and how to reindex (required to pick up config changes for existing data)
- JSON-LD Query – Full query language reference
- BM25 Graph Source – Dedicated BM25 full-text search for large-scale corpora
- Vector Search – Inline similarity search with @vector
- Background Indexing – How background indexing works
BM25 Full-Text Search
Fluree provides integrated full-text search using the BM25 (Best Matching 25) ranking algorithm. BM25 indexes are implemented as graph sources: they index text content from a source ledger and expose search results that can be joined with structured graph queries.
What is BM25?
BM25 is a probabilistic ranking function that scores documents based on query term frequency and document length normalization. It’s widely used in search engines and information retrieval systems.
Key features:
- Term frequency with saturation (controlled by k1)
- Inverse document frequency weighting
- Document length normalization (controlled by b)
- English stemming and stopword filtering (default analyzer)
- Block-Max WAND for efficient top-k queries (early termination)
- Incremental index updates
- Time-travel: query the index as of any past transaction
Creating a BM25 Index
BM25 indexes are created via the Rust API using Bm25CreateConfig. There are no HTTP endpoints for index management yet — indexes are managed programmatically.
Basic Index
#![allow(unused)]
fn main() {
use fluree_db_api::{Bm25CreateConfig, FlureeBuilder};
use serde_json::json;
let fluree = FlureeBuilder::file("/path/to/data").build()?;
// Create a ledger and insert some data
let ledger = fluree.create_ledger("docs:main").await?;
let tx = json!({
"@context": { "ex": "http://example.org/" },
"@graph": [
{ "@id": "ex:doc1", "@type": "ex:Article", "ex:title": "Rust programming guide" },
{ "@id": "ex:doc2", "@type": "ex:Article", "ex:title": "Python for beginners" },
{ "@id": "ex:doc3", "@type": "ex:Article", "ex:title": "Systems programming in Rust" }
]
});
let ledger = fluree.insert(ledger, &tx).await?.ledger;
// Define the indexing query
let query = json!({
"@context": { "ex": "http://example.org/" },
"where": [{ "@id": "?x", "@type": "ex:Article", "ex:title": "?title" }],
"select": { "?x": ["@id", "ex:title"] }
});
// Create the BM25 index
let config = Bm25CreateConfig::new("article-search", "docs:main", query);
let result = fluree.create_full_text_index(config).await?;
println!("Indexed {} documents", result.doc_count);
println!("Graph source: {}", result.graph_source_id); // "article-search:main"
}
The graph source ID is {name}:{branch} — for example, article-search:main.
Indexing Query
The indexing query defines what to index. It’s a standard Fluree JSON-LD query with these requirements:
- Must include @id in the select (to identify documents)
- Must use select with a map form: {"?x": ["@id", "ex:prop1", "ex:prop2"]}
- All selected text properties are extracted and tokenized for search
The query can filter by type, filter by property values, or use any valid Fluree where clause:
{
"@context": { "ex": "http://example.org/" },
"where": [
{ "@id": "?x", "@type": "ex:Article", "ex:title": "?title" },
{ "@id": "?x", "ex:status": "published" }
],
"select": { "?x": ["@id", "ex:title", "ex:content", "ex:tags"] }
}
Configuration Options
| Parameter | Default | Description |
|---|---|---|
| name | (required) | Graph source name. Cannot contain :. |
| ledger | (required) | Source ledger alias (e.g., "docs:main") |
| query | (required) | Indexing query (JSON-LD, must have select) |
| branch | "main" | Branch name for the graph source |
| k1 | 1.2 | Term frequency saturation. Higher = more weight to term frequency. Must be > 0. Typical range: 1.2-2.0. |
| b | 0.75 | Document length normalization. 0 = no normalization, 1 = full normalization. Must be 0.0-1.0. |
#![allow(unused)]
fn main() {
let config = Bm25CreateConfig::new("search", "docs:main", query)
.with_branch("dev")
.with_k1(1.5)
.with_b(0.5);
}
Text Analysis
Fluree uses a default English analyzer that applies:
- Tokenization: Unicode-aware word boundary splitting
- Lowercasing: All tokens converted to lowercase
- Stopword filtering: Common English words removed (the, a, an, is, etc.)
- Stemming: Snowball English stemmer reduces words to root forms (e.g., “programming” -> “program”)
The analyzer is not configurable — it always uses the English pipeline for consistency.
Querying BM25 Indexes
JSON-LD Query Syntax
BM25 search is integrated into Fluree’s query system via the f: namespace predicates:
{
"@context": {
"ex": "http://example.org/",
"f": "https://ns.flur.ee/db#"
},
"from": "docs:main",
"where": [
{
"f:graphSource": "article-search:main",
"f:searchText": "rust programming",
"f:searchLimit": 10,
"f:searchResult": {
"f:resultId": "?doc",
"f:resultScore": "?score"
}
},
{ "@id": "?doc", "ex:author": "?author" }
],
"select": ["?doc", "?score", "?author"]
}
Pattern fields:
| Field | Description |
|---|---|
| f:graphSource | Graph source ID (e.g., "article-search:main") |
| f:searchText | Query text (analyzed with same pipeline as indexing) |
| f:searchLimit | Maximum number of search results |
| f:searchResult | Binding object for results |
| f:resultId | Variable binding for the document IRI |
| f:resultScore | Variable binding for the BM25 relevance score |
| f:resultLedger | (Optional) Variable binding for ledger provenance |
Combining Search with Structured Queries
The search pattern produces ?doc and ?score bindings. These can be joined with ledger data using normal where clauses:
{
"@context": {
"ex": "http://example.org/",
"f": "https://ns.flur.ee/db#"
},
"from": "docs:main",
"where": [
{
"f:graphSource": "article-search:main",
"f:searchText": "rust",
"f:searchLimit": 20,
"f:searchResult": { "f:resultId": "?doc", "f:resultScore": "?score" }
},
{ "@id": "?doc", "ex:title": "?title" },
{ "@id": "?doc", "ex:author": "?author" }
],
"select": ["?doc", "?title", "?author", "?score"]
}
The BM25 search runs first and produces candidate bindings. The subsequent where clauses join those candidates with the source ledger to retrieve additional properties.
Rust API: Direct Search
You can also use the Rust API directly for programmatic search without the query engine:
#![allow(unused)]
fn main() {
use fluree_db_query::bm25::{Analyzer, Bm25Scorer};
// Load the index
let index = fluree.load_bm25_index("article-search:main").await?;
// Analyze query terms (same pipeline as indexing)
let analyzer = Analyzer::english_default();
let terms = analyzer.analyze_to_strings("rust programming");
let term_refs: Vec<&str> = terms.iter().map(|s| s.as_str()).collect();
// Score and rank
let scorer = Bm25Scorer::new(&index, &term_refs);
let results = scorer.top_k(10);
for (doc_key, score) in &results {
println!("{}: {:.2}", doc_key.subject_iri, score);
}
}
Rust API: Query with BM25
Use query_connection_with_bm25 for integrated queries:
#![allow(unused)]
fn main() {
let query = json!({
"@context": { "ex": "http://example.org/", "f": "https://ns.flur.ee/db#" },
"from": "docs:main",
"where": [
{
"f:graphSource": "article-search:main",
"f:searchText": "rust",
"f:searchLimit": 10,
"f:searchResult": { "f:resultId": "?doc", "f:resultScore": "?score" }
},
{ "@id": "?doc", "ex:author": "?author" }
],
"select": ["?doc", "?score", "?author"]
});
let result = fluree.query_connection_with_bm25(&query).await?;
}
Index Maintenance
Syncing
BM25 indexes are not automatically updated when the source ledger changes. You must explicitly sync them:
#![allow(unused)]
fn main() {
// Incremental sync (detects changes since last watermark)
let sync_result = fluree.sync_bm25_index("article-search:main").await?;
println!("Upserted: {}, Removed: {}", sync_result.upserted, sync_result.removed);
// Force full resync (rebuilds the entire index)
let sync_result = fluree.resync_bm25_index("article-search:main").await?;
}
Incremental sync uses property dependency tracking to identify which subjects changed since the last indexed commit. Only affected documents are re-queried and re-indexed. If no affected subjects are detected, it falls back to a full resync.
Background Maintenance Worker
For production use, the Bm25MaintenanceWorker can be configured to automatically sync indexes when source ledgers change:
- Watches for commit events on source ledgers
- Debounces rapid commits (configurable interval)
- Bounded concurrency for concurrent sync operations
- Registers/unregisters graph sources dynamically
Staleness Checking
Check whether an index is behind its source ledger:
#![allow(unused)]
fn main() {
let check = fluree.check_bm25_staleness("article-search:main").await?;
println!("Index at t={}, ledger at t={}, stale: {}, lag: {}",
check.index_t, check.ledger_t, check.is_stale, check.lag);
}
Time-Travel
Load an index at a specific historical transaction time:
#![allow(unused)]
fn main() {
// Load index as of transaction t=5
let (index, actual_t) = fluree.load_bm25_index_at("article-search:main", 5).await?;
println!("Loaded snapshot at t={}, docs: {}", actual_t, index.num_docs());
}
BM25 maintains a manifest of historical snapshots. The manifest is stored in content-addressed storage and tracks all snapshot versions. load_bm25_index_at selects the snapshot with the largest index_t <= as_of_t. For example, with snapshots at index_t = 3, 7, and 12, a request for as_of_t = 10 loads the t = 7 snapshot.
Dropping an Index
#![allow(unused)]
fn main() {
let drop_result = fluree.drop_full_text_index("article-search:main").await?;
println!("Deleted {} snapshots", drop_result.deleted_snapshots);
// Drop is idempotent
let drop_again = fluree.drop_full_text_index("article-search:main").await?;
assert!(drop_again.was_already_retracted);
}
Dropping marks the graph source as retracted in the nameservice and deletes all snapshot blobs from storage. The index can be recreated with the same name afterward.
Scoring and Top-K Optimization
For top-k queries (the typical case via f:searchLimit), BM25 uses Block-Max WAND (Weak AND) to avoid scoring every matching document. Posting lists are divided into fixed-size blocks (128 postings each) with per-block metadata (maximum term frequency). WAND uses these to compute score upper bounds, skipping entire blocks that cannot contribute to the current top-k results.
This makes top_k(10) on a 100K-document index significantly faster than scoring all matches — the algorithm terminates early once it can prove no remaining document can displace the current top results.
When block metadata is unavailable (e.g., during index building before the first snapshot), scoring falls back to dense accumulation over all postings.
Storage Format
V4 Chunked Format
Large BM25 indexes use a chunked storage format (v4) that splits the index into:
- Root blob: Terms dictionary, document metadata, BM25 statistics, routing table
- Posting leaflet blobs: Compressed posting lists (~2MB each), stored as separate content-addressed objects. Each posting list includes block metadata (128 postings per block with max_doc_id and max_tf) used for WAND score upper bounds and block-level navigation.
This enables selective loading: queries only fetch the leaflets containing terms that match the search query, rather than loading the entire index.
Leaflet Caching
Posting leaflets are cached in the global LeafletCache (shared with core index leaflets). Cache entries are keyed by content ID hash and are immutable (content-addressed data never changes). The cache uses moka’s TinyLFU eviction and is governed by the global cache budget (--cache-max-mb / FLUREE_CACHE_MAX_MB, default: tiered fraction of RAM — 30% <4GB, 40% 4-8GB, 50% ≥8GB).
Parallel I/O
Both reads and writes use bounded-concurrency parallel I/O (buffer_unordered(32)) for leaflet operations. This caps socket pressure when working with object stores like S3 while still providing significant throughput improvement over sequential access.
Format Selection
The storage format is selected automatically based on the storage backend:
- File storage: V3 single-blob format (optimized for local filesystem)
- Memory / S3 / object store: V4 chunked format (enables selective loading and caching)
Deployment Modes
Embedded Mode (Default)
In embedded mode, the BM25 index is loaded and searched within the same process as Fluree. This is the default behavior.
Remote Mode
In remote mode, search queries are delegated to a dedicated search service (fluree-search-httpd):
fluree-search-httpd \
--storage-root file:///var/fluree/data \
--nameservice-path file:///var/fluree/ns \
--listen 0.0.0.0:9090
Both modes use identical analyzer configuration, BM25 scoring algorithm, and time-travel semantics — queries return identical results regardless of deployment mode.
See BM25 Graph Source for details on the remote search protocol.
Related Documentation
- BM25 Graph Source - Graph source integration and remote search protocol
- Background Indexing - Core index architecture
- Vector Search - Similarity search
- Graph Sources Overview - Graph source concepts
Vector Search
Vector search enables similarity search using embedding vectors, supporting use cases like:
- Semantic search: Find similar meanings, not just keywords
- Recommendations: Find similar products, content, users
- Image search: Find similar images by visual features
- Anomaly detection: Find unusual patterns
Fluree supports two complementary approaches:
- Inline similarity functions – compute dotProduct, cosineSimilarity, or euclideanDistance directly in queries using bind. No external index required.
- HNSW vector indexes – build dedicated approximate-nearest-neighbor (ANN) indexes for large-scale similarity search using the f:* query pattern.
The @vector Datatype
Why a dedicated datatype?
In RDF, a plain JSON array like [0.5, 0.5, 0.0] is decomposed into individual values. Duplicate elements can be deduplicated, and ordering is not guaranteed. This breaks embedding vectors. The @vector datatype tells Fluree to store the array as a single, ordered, fixed-length vector.
@vector is a shorthand for the full IRI https://ns.flur.ee/db#embeddingVector, which can also be written as f:embeddingVector when the Fluree namespace prefix is declared in your @context.
Storage: f32 precision contract
All @vector values are stored as IEEE-754 binary32 (f32) arrays. This means:
- Each element in your JSON array is quantized to f32 at ingest time
- Values that are not representable as finite f32 (NaN, Infinity, values exceeding f32 range) are rejected
- Round-trip reads return the f32-quantized values (e.g., 0.1 in JSON becomes 0.10000000149011612 after f32 quantization)
- This provides a compact, cache-friendly representation optimized for SIMD similarity computation
If you need higher precision (f64) or different vector formats (sparse, integer), store them as a custom RDF datatype string.
Inserting vectors (JSON-LD)
Use "@type": "@vector" to annotate a numeric array as a vector:
{
"@context": {
"ex": "http://example.org/"
},
"@graph": [
{
"@id": "ex:doc1",
"@type": "ex:Document",
"ex:embedding": {
"@value": [0.1, 0.2, 0.3, 0.4],
"@type": "@vector"
}
}
]
}
You can also use the full IRI or the f: prefix form, which is equivalent:
{
"@context": {
"ex": "http://example.org/",
"f": "https://ns.flur.ee/db#"
},
"@graph": [
{
"@id": "ex:doc1",
"ex:embedding": {
"@value": [0.1, 0.2, 0.3, 0.4],
"@type": "f:embeddingVector"
}
}
]
}
Incorrect – plain array (will not work for similarity):
{
"@id": "ex:doc1",
"ex:embedding": [0.1, 0.2, 0.3, 0.4]
}
Plain arrays are decomposed into individual RDF values where duplicates may be removed and order is lost.
Inserting vectors (Turtle / SPARQL UPDATE)
In Turtle and SPARQL UPDATE, the @vector shorthand is not available. Use the f:embeddingVector datatype IRI with the standard ^^ typed-literal syntax:
PREFIX ex: <http://example.org/>
PREFIX f: <https://ns.flur.ee/db#>
INSERT DATA {
ex:doc1 ex:embedding "[0.1, 0.2, 0.3, 0.4]"^^f:embeddingVector .
}
The vector is represented as a JSON array string with the ^^f:embeddingVector datatype annotation.
Multiple vectors per entity
An entity can have multiple vectors on the same property:
{
"@id": "ex:doc1",
"ex:embedding": [
{"@value": [0.1, 0.9], "@type": "@vector"},
{"@value": [0.2, 0.8], "@type": "@vector"}
]
}
Each vector produces separate rows in query results.
Vector literals in query VALUES clauses
When passing a vector literal in a query values clause, use the full IRI or the f: prefix form – the @vector shorthand is only resolved in the transaction parser:
"values": [
["?queryVec"],
[{"@value": [0.7, 0.6], "@type": "f:embeddingVector"}]
]
Or with the full IRI:
"values": [
["?queryVec"],
[{"@value": [0.7, 0.6], "@type": "https://ns.flur.ee/db#embeddingVector"}]
]
Inline Similarity Functions (JSON-LD Query)
Fluree provides three vector similarity functions that can be used in bind expressions within JSON-LD queries. These compute similarity scores directly during query execution without requiring a pre-built index.
Function names are case-insensitive; dotProduct, dotproduct, and dot_product are all equivalent.
dotProduct
Computes the dot product (inner product) of two vectors. Higher scores indicate greater similarity when vectors represent aligned directions.
{
"@context": {
"ex": "http://example.org/ns/",
"f": "https://ns.flur.ee/db#"
},
"select": ["?doc", "?score"],
"values": [
["?queryVec"],
[{"@value": [0.7, 0.6], "@type": "f:embeddingVector"}]
],
"where": [
{"@id": "?doc", "ex:embedding": "?vec"},
["bind", "?score", "(dotProduct ?vec ?queryVec)"]
],
"orderBy": [["desc", "?score"]],
"limit": 10
}
Score range: (-inf, +inf). Best when vector magnitude encodes importance.
cosineSimilarity
Computes the cosine of the angle between two vectors. Ignores magnitude, focusing purely on directional similarity.
{
"@context": {
"ex": "http://example.org/ns/",
"f": "https://ns.flur.ee/db#"
},
"select": ["?doc", "?score"],
"values": [
["?queryVec"],
[{"@value": [0.7, 0.6], "@type": "f:embeddingVector"}]
],
"where": [
{"@id": "?doc", "ex:embedding": "?vec"},
["bind", "?score", "(cosineSimilarity ?vec ?queryVec)"]
],
"orderBy": [["desc", "?score"]],
"limit": 10
}
Score range: [-1, 1] (1 = identical direction, 0 = orthogonal, -1 = opposite). Returns null if either vector has zero magnitude. Best for text embeddings and normalized vectors.
euclideanDistance
Computes the L2 (straight-line) distance between two vectors. Lower scores indicate greater similarity.
{
"@context": {
"ex": "http://example.org/ns/",
"f": "https://ns.flur.ee/db#"
},
"select": ["?doc", "?distance"],
"values": [
["?queryVec"],
[{"@value": [0.7, 0.6], "@type": "f:embeddingVector"}]
],
"where": [
{"@id": "?doc", "ex:embedding": "?vec"},
["bind", "?distance", "(euclideanDistance ?vec ?queryVec)"]
],
"orderBy": "?distance",
"limit": 10
}
Score range: [0, +inf) (0 = identical). Best for geometric similarity and when absolute position matters.
Alternative array syntax
The similarity functions also accept array form instead of the S-expression string:
["bind", "?score", ["dotProduct", "?vec", "?queryVec"]]
This is equivalent to:
["bind", "?score", "(dotProduct ?vec ?queryVec)"]
Filtering by score threshold
Combine bind with filter to return only results above a similarity threshold:
{
"@context": {
"ex": "http://example.org/ns/",
"f": "https://ns.flur.ee/db#"
},
"select": ["?doc", "?score"],
"values": [
["?queryVec"],
[{"@value": [0.7, 0.6], "@type": "f:embeddingVector"}]
],
"where": [
{"@id": "?doc", "ex:embedding": "?vec"},
["bind", "?score", "(dotProduct ?vec ?queryVec)"],
["filter", "(> ?score 0.7)"]
]
}
Combining with graph patterns
Vector similarity can be combined with standard graph patterns to filter by type, property values, or relationships:
{
"@context": {
"ex": "http://example.org/ns/",
"f": "https://ns.flur.ee/db#"
},
"select": ["?doc", "?title", "?score"],
"values": [
["?queryVec"],
[{"@value": [0.9, 0.1, 0.05], "@type": "f:embeddingVector"}]
],
"where": [
{"@id": "?doc", "@type": "ex:Article", "ex:title": "?title", "ex:embedding": "?vec"},
["bind", "?score", "(cosineSimilarity ?vec ?queryVec)"],
["filter", "(> ?score 0.5)"]
],
"orderBy": [["desc", "?score"]],
"limit": 5
}
Using a stored vector as the query vector
Instead of providing a literal vector, you can use a stored entity’s vector:
{
"@context": {
"ex": "http://example.org/ns/"
},
"select": ["?similar", "?score"],
"where": [
{"@id": "ex:reference-doc", "ex:embedding": "?queryVec"},
{"@id": "?similar", "ex:embedding": "?vec"},
["filter", "(!= ?similar ex:reference-doc)"],
["bind", "?score", "(cosineSimilarity ?vec ?queryVec)"]
],
"orderBy": [["desc", "?score"]],
"limit": 10
}
Mixed datatypes
If a property contains both vector and non-vector values, the similarity functions return null for non-vector bindings:
{
"@graph": [
{"@id": "ex:a", "ex:data": {"@value": [0.6, 0.5], "@type": "@vector"}},
{"@id": "ex:b", "ex:data": "Not a vector"}
]
}
Querying with dotProduct on ?data will return a numeric score for ex:a and null for ex:b.
SPARQL support
Inline vector similarity functions (dotProduct, cosineSimilarity, euclideanDistance) are available in both JSON-LD Query and SPARQL. In SPARQL, use them as built-in function calls within BIND expressions:
dotProduct (SPARQL)
PREFIX ex: <http://example.org/ns/>
PREFIX f: <https://ns.flur.ee/db#>
SELECT ?doc ?score
WHERE {
VALUES ?queryVec { "[0.7, 0.6]"^^f:embeddingVector }
?doc ex:embedding ?vec ;
ex:title ?title .
BIND(dotProduct(?vec, ?queryVec) AS ?score)
}
ORDER BY DESC(?score)
LIMIT 10
cosineSimilarity (SPARQL)
PREFIX ex: <http://example.org/ns/>
PREFIX f: <https://ns.flur.ee/db#>
SELECT ?doc ?score
WHERE {
VALUES ?queryVec { "[0.88, 0.12, 0.08]"^^f:embeddingVector }
?doc a ex:Article ;
ex:embedding ?vec ;
ex:title ?title .
BIND(cosineSimilarity(?vec, ?queryVec) AS ?score)
FILTER(?score > 0.5)
}
ORDER BY DESC(?score)
LIMIT 5
euclideanDistance (SPARQL)
PREFIX ex: <http://example.org/ns/>
PREFIX f: <https://ns.flur.ee/db#>
SELECT ?doc ?distance
WHERE {
VALUES ?queryVec { "[0.7, 0.6]"^^f:embeddingVector }
?doc ex:embedding ?vec .
BIND(euclideanDistance(?vec, ?queryVec) AS ?distance)
}
ORDER BY ?distance
LIMIT 10
Vector literals in SPARQL
In SPARQL, vectors are passed as JSON array strings with the ^^f:embeddingVector typed literal syntax:
VALUES ?queryVec { "[0.1, 0.2, 0.3]"^^f:embeddingVector }
Or with the full IRI:
VALUES ?queryVec { "[0.1, 0.2, 0.3]"^^<https://ns.flur.ee/db#embeddingVector> }
Function name variants
Function names are case-insensitive in SPARQL. All of these are equivalent:
- dotProduct, DOTPRODUCT, dot_product
- cosineSimilarity, COSINESIMILARITY, cosine_similarity
- euclideanDistance, EUCLIDEANDISTANCE, euclidean_distance
HNSW Vector Indexes
For large-scale similarity search, Fluree provides dedicated HNSW (Hierarchical Navigable Small World) vector indexes. These are approximate nearest-neighbor (ANN) indexes that trade exact results for dramatically faster query times on large datasets.
Vector indexes are implemented using embedded usearch following the same architecture as BM25:
- Embedded in-process HNSW indexes (no external service required)
- Remote mode via dedicated search service (fluree-search-httpd)
- Snapshot-based persistence with watermarks
- Incremental sync for efficient updates
- Feature-gated via the vector feature flag
v1 limitation: HNSW vector search is head-only. Time-travel queries (e.g. @t:) are not supported.
Creating Vector Indexes
HTTP/Docker users: there is no HTTP endpoint for creating vector indexes today. Index creation is Rust-API-only. To use HNSW vector search from an HTTP-only deployment, create the index using a Rust program (or the Rust API embedded in your application) against the same storage path your Fluree server reads, then run queries normally via POST /v1/fluree/query.
Rust API
#![allow(unused)]
fn main() {
use fluree_db_api::{FlureeBuilder, VectorCreateConfig};
use fluree_db_query::vector::DistanceMetric;
use serde_json::json;
let fluree = FlureeBuilder::memory().build_memory();
// Create indexing query to select documents with embeddings
let indexing_query = json!({
"@context": { "ex": "http://example.org/" },
"where": [{ "@id": "?x", "@type": "ex:Document" }],
"select": { "?x": ["@id", "ex:embedding"] }
});
// Create vector index
let config = VectorCreateConfig::new(
"doc-embeddings", // index name
"mydb:main", // source ledger
indexing_query, // what to index
"ex:embedding", // embedding property
768 // dimensions
)
.with_metric(DistanceMetric::Cosine);
let result = fluree.create_vector_index(config).await?;
println!("Indexed {} vectors", result.vector_count);
}
Configuration Options
| Option | Description | Default |
|---|---|---|
| name | Index name (creates graph source ID name:branch) | Required |
| ledger | Source ledger ID (name:branch) | Required |
| query | JSON-LD query selecting documents | Required |
| embedding_property | Property containing embeddings | Required |
| dimensions | Vector dimensions | Required |
| metric | Distance metric (Cosine, Dot, Euclidean) | Cosine |
| connectivity | HNSW M parameter | 16 |
| expansion_add | efConstruction parameter | 128 |
| expansion_search | efSearch parameter | 64 |
Query Syntax
Vector index search uses the f:* pattern syntax in WHERE clauses:
{
"@context": {
"ex": "http://example.org/",
"f": "https://ns.flur.ee/db#"
},
"from": "mydb:main",
"where": [
{
"f:graphSource": "doc-embeddings:main",
"f:queryVector": [0.1, 0.2, 0.3],
"f:distanceMetric": "cosine",
"f:searchLimit": 10,
"f:searchResult": {
"f:resultId": "?doc",
"f:resultScore": "?score"
}
}
],
"select": ["?doc", "?score"]
}
Query Parameters
| Parameter | Description | Required |
|---|---|---|
| f:graphSource | Vector index alias | Yes |
| f:queryVector | Query vector (array or variable) | Yes |
| f:distanceMetric | Distance metric (“cosine”, “dot”, “euclidean”) | No (uses index default) |
| f:searchLimit | Maximum results | No |
| f:searchResult | Result binding (variable or object) | Yes |
| f:syncBeforeQuery | Wait for index sync before query | No (default: false) |
| f:timeoutMs | Query timeout in ms | No |
Result Binding
Simple variable binding:
"f:searchResult": "?doc"
Structured binding with score and ledger:
"f:searchResult": {
"f:resultId": "?doc",
"f:resultScore": "?similarity",
"f:resultLedger": "?source"
}
Variable Query Vectors
Query vector can be a variable bound earlier:
{
"where": [
{ "@id": "ex:reference-doc", "ex:embedding": "?queryVec" },
{
"f:graphSource": "embeddings:main",
"f:queryVector": "?queryVec",
"f:searchLimit": 5,
"f:searchResult": "?similar"
}
]
}
Index Maintenance
Sync Updates
After committing new data, sync the vector index:
#![allow(unused)]
fn main() {
let sync_result = fluree.sync_vector_index("doc-embeddings:main").await?;
println!("Upserted: {}, Removed: {}", sync_result.upserted, sync_result.removed);
}
Full Resync
Rebuild the entire index from scratch:
#![allow(unused)]
fn main() {
let resync_result = fluree.resync_vector_index("doc-embeddings:main").await?;
}
Check Staleness
#![allow(unused)]
fn main() {
let check = fluree.check_vector_staleness("doc-embeddings:main").await?;
if check.is_stale {
println!("Index is {} commits behind", check.commits_behind);
}
}
Drop Index
#![allow(unused)]
fn main() {
fluree.drop_vector_index("doc-embeddings:main").await?;
}
Distance Metrics
Cosine (Default)
Measures angle between vectors. Best for:
- Text embeddings (e.g., sentence transformers)
- Normalized vectors
- When magnitude doesn’t matter
Score range: [-1, 1] (1 = identical, 0 = orthogonal, -1 = opposite)
For unit-normalized vectors, cosine similarity equals dot product. Fluree’s SIMD kernels exploit this for faster computation when vectors are pre-normalized.
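For example, with the unit vectors below the two metrics coincide:
u = (0.6, 0.8), v = (0.8, 0.6)            # |u| = |v| = 1
dot(u, v)    = 0.6*0.8 + 0.8*0.6 = 0.96
cosine(u, v) = 0.96 / (1 * 1)    = 0.96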
Dot Product
Measures alignment and magnitude. Best for:
- Maximum inner product search (MIPS)
- When vector magnitude encodes importance
Score range: (-inf, +inf)
Euclidean (L2)
Measures straight-line distance. Best for:
- Geometric similarity
- Image feature vectors
- When absolute position matters
Raw score range: [0, +inf). In HNSW index results, normalized to (0, 1] via 1 / (1 + distance).
Note: In HNSW index results (f:* queries), all metrics are normalized to “higher is better”. In inline similarity functions, euclideanDistance returns the raw L2 distance (lower = more similar).
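For instance, the 1 / (1 + distance) normalization maps raw L2 distances as follows:
distance = 0.0  ->  1.00
distance = 1.0  ->  0.50
distance = 3.0  ->  0.25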
Deployment Modes
Vector indexes support two deployment topologies: searching in-process (embedded) or via a dedicated fluree-search-httpd service that mounts the same storage. Both topologies use identical distance-metric computation, score normalization, and snapshot serialization, so results are identical.
Embedded Mode (Default)
The vector index is loaded and searched within the same process as the Fluree server. No additional services. This is the default and is appropriate for most deployments.
Dedicated Search Service
For large indexes or when you want search traffic isolated from the main Fluree process, run the standalone fluree-search-httpd binary on the same storage volume and have your application send vector requests directly to it.
Note: Today, vector search is invoked from a Fluree query (the f:graphSource / f:queryVector pattern) using the embedded path — the main Fluree server does not yet route those queries to a remote service. The dedicated service is reachable directly via its own POST /v1/search API (the same protocol BM25 uses), which is suitable for applications that issue vector queries outside of a Fluree query context. Transparent delegation from inside a Fluree query is a planned follow-up; the wiring is in place but the deployment config is not yet persisted by create_vector_index.
See Remote Search Service for fluree-search-httpd configuration, env vars, the request/response protocol (vector and vector_similar_to query kinds), and Docker deployment.
Performance and Scaling
The importance of binary indexing
Fluree’s binary columnar index dramatically accelerates vector queries. Queries against novelty-only (unindexed) data perform a linear scan through the in-memory commit log, while indexed queries read pre-sorted, cache-friendly columnar data. Ensure background indexing is running for production workloads – the difference is substantial.
The following benchmarks use 768-dimensional vectors (typical for transformer embeddings like sentence-transformers or OpenAI text-embedding-3-small) on Apple M-series hardware:
Novelty-only (no binary index)
| Scenario | Vectors | Query time | Throughput |
|---|---|---|---|
| Scan all | 1,000 | 9.9 ms | ~101K vec/s |
| Scan all | 5,000 | 45.1 ms | ~111K vec/s |
| Filtered + score | 1,000 (75 pass filter) | 13.5 ms | ~5.5K vec/s |
| Filtered + score | 5,000 (402 pass filter) | 62.1 ms | ~6.5K vec/s |
With binary index
| Scenario | Vectors | Query time | Throughput | Speedup vs novelty |
|---|---|---|---|---|
| Scan all | 1,000 | 1.68 ms | ~595K vec/s | 5.9x |
| Scan all | 5,000 | 7.69 ms | ~650K vec/s | 5.9x |
| Filtered + score | 1,000 (75 pass filter) | 533 us | ~141K vec/s | 25x |
| Filtered + score | 5,000 (402 pass filter) | 2.40 ms | ~168K vec/s | 26x |
Key takeaways:
- Unfiltered scans are ~6x faster with the binary index
- Filtered queries (where graph patterns reduce the candidate set before scoring) are ~25x faster – the index enables efficient predicate-first access that avoids loading irrelevant vectors entirely
- At 5,000 vectors, a filtered indexed query completes in 2.4 ms – well within interactive latency budgets
Inline similarity functions (flat scan)
- Best for: Small to medium datasets, ad-hoc similarity queries, prototyping
- Complexity: O(n) linear scan – computes similarity against every matching vector
- Advantage: No index setup required, works immediately after insert
- SIMD acceleration: Fluree uses runtime-detected SIMD kernels (SSE2/AVX on x86_64, NEON on ARM) for vectorized dot/cosine/L2 computation
- Normalized embedding optimization: For unit-normalized vectors (most transformer embeddings), cosine similarity reduces to a dot product, avoiding magnitude computation entirely
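To see why that optimization works, here is a minimal sketch in plain Rust (not Fluree's SIMD kernels): cosine similarity divides the dot product by the two magnitudes, so when both vectors are unit-normalized the denominator is 1 and the dot product alone yields the same scores.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    // cosine(a, b) = dot(a, b) / (|a| * |b|); for unit-length vectors the
    // denominator is 1, so cosine and dot product return identical scores
    let norm = |v: &[f32]| v.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot(a, b) / (norm(a) * norm(b))
}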
When to consider HNSW
Inline similarity functions perform a brute-force scan over all candidate vectors. This scales linearly and remains fast for moderate datasets, but at larger scales an HNSW index provides O(log n) approximate nearest-neighbor search.
Rule of thumb:
| Vector count (per property) | Recommendation |
|---|---|
| < 100K | Flat scan works well, especially with binary indexing. Sub-100ms queries typical. |
| 100K – 1M | Start evaluating HNSW. Flat scan may still be acceptable depending on latency target and hardware, but HNSW will provide more consistent low-latency results. |
| 1M – 10M | HNSW strongly recommended for interactive latency. Flat scan can work if vectors are memory-resident and you can tolerate ~1-2 second queries. |
| > 10M | HNSW (or other ANN index) is the default recommendation. Flat scan becomes I/O- and cache-bound for low-latency use cases. |
Factors that shift the crossover:
- Hardware: Fast NVMe / large RAM pushes the threshold higher; object storage (S3) pulls it lower
- Latency target: A 50 ms budget favors HNSW earlier than a 2-second budget
- Filter selectivity: If graph patterns reduce candidates to a small fraction before scoring, flat scan remains viable at higher counts
- Normalized embeddings: Cosine-as-dot-product is faster, pushing the threshold higher
- Binary indexing: An indexed dataset scans ~6x faster than novelty-only, effectively raising the flat-scan ceiling
HNSW vector indexes
- Best for: Large datasets (100K+ vectors), production similarity search with strict latency requirements
- Complexity: O(log n) approximate nearest neighbor
- Space: ~1.5x embedding size + IRI mapping overhead
- Updates: Incremental via affected-subject tracking
Tuning parameters
| Parameter | Effect | Trade-off |
|---|---|---|
| connectivity (M) | Graph connectivity | Higher = better recall, more memory |
| expansion_add (efConstruction) | Build-time search width | Higher = better index quality, slower build |
| expansion_search (efSearch) | Query-time search width | Higher = better recall, slower queries |
Feature Flag
The HNSW vector index functionality requires the vector feature:
[dependencies]
fluree-db-api = { version = "0.1", features = ["vector"] }
Inline similarity functions (dotProduct, cosineSimilarity, euclideanDistance) and the @vector datatype are available without feature flags.
Complete Example: Semantic Search
1. Insert documents with embeddings:
{
"@context": {
"ex": "http://example.org/",
"f": "https://ns.flur.ee/db#"
},
"@graph": [
{
"@id": "ex:doc1",
"@type": "ex:Article",
"ex:title": "Introduction to Machine Learning",
"ex:embedding": {"@value": [0.9, 0.1, 0.05], "@type": "@vector"}
},
{
"@id": "ex:doc2",
"@type": "ex:Article",
"ex:title": "Database Design Patterns",
"ex:embedding": {"@value": [0.1, 0.8, 0.1], "@type": "@vector"}
},
{
"@id": "ex:doc3",
"@type": "ex:Article",
"ex:title": "Neural Network Architectures",
"ex:embedding": {"@value": [0.85, 0.15, 0.1], "@type": "@vector"}
}
]
}
2. Query – find articles similar to a “machine learning” embedding:
{
"@context": {
"ex": "http://example.org/",
"f": "https://ns.flur.ee/db#"
},
"select": ["?title", "?score"],
"values": [
["?queryVec"],
[{"@value": [0.88, 0.12, 0.08], "@type": "f:embeddingVector"}]
],
"where": [
{"@id": "?doc", "@type": "ex:Article", "ex:title": "?title", "ex:embedding": "?vec"},
["bind", "?score", "(cosineSimilarity ?vec ?queryVec)"]
],
"orderBy": [["desc", "?score"]],
"limit": 5
}
Expected results (ordered by similarity):
- “Introduction to Machine Learning” – highest cosine similarity
- “Neural Network Architectures” – similar domain
- “Database Design Patterns” – different domain, lower score
Related Documentation
- Datatypes and Typed Values - All supported datatypes including @vector
- JSON-LD Query - Full query language reference
- BM25 - Full-text search
- Background Indexing - Core indexing
- Graph Sources - Graph source concepts
Geospatial Data
Fluree provides native support for geographic point data using the OGC GeoSPARQL standard. POINT geometries from geo:wktLiteral values are stored in an optimized binary format enabling efficient storage and index-accelerated proximity queries.
Status
Geospatial support is implemented with:
- Inline GeoPoint encoding: POINT geometries stored as packed 60-bit lat/lng values
- Automatic detection: geo:wktLiteral POINT values automatically converted to native format
- Full round-trip: GeoPoints preserved through commit, index, and query paths
- ~0.3mm precision: 30-bit encoding per coordinate provides sub-millimeter accuracy
- Index-accelerated proximity queries: POST latitude-band scans with haversine post-filtering
- Time travel support: Point-in-time geo queries via from: "<ledger>@t:<t>" (see examples below)
Non-POINT geometries (polygons, linestrings, multipolygons, etc.) are indexed using a separate S2 cell-based spatial index that enables efficient containment and intersection queries.
Storing Geographic Data
WKT Literal Format
Geographic data uses the Well-Known Text (WKT) format with the geo:wktLiteral datatype:
{
"@context": {
"ex": "http://example.org/",
"geo": "http://www.opengis.net/ont/geosparql#"
},
"@graph": [
{
"@id": "ex:eiffel-tower",
"@type": "ex:Landmark",
"ex:name": "Eiffel Tower",
"ex:location": {
"@value": "POINT(2.2945 48.8584)",
"@type": "geo:wktLiteral"
}
}
]
}
Important: WKT uses POINT(longitude latitude) order (X, Y), which is the opposite of common lat/lng conventions.
Coordinate Order
| Format | Order | Example |
|---|---|---|
| WKT | longitude, latitude | POINT(2.2945 48.8584) |
| Common conventions | latitude, longitude | 48.8584, 2.2945 |
Fluree handles the conversion internally, storing coordinates in latitude-primary order for efficient latitude-band index scans.
Valid POINT Syntax
Fluree recognizes these POINT formats:
POINT(2.2945 48.8584) # Standard 2D point
POINT( 2.2945 48.8584 ) # Whitespace is flexible
POINT(-122.4194 37.7749) # Negative coordinates (San Francisco)
The following are not supported for native GeoPoint storage (stored as strings instead):
POINT EMPTY # Empty point
POINT Z(2.2945 48.8584 100) # 3D point with altitude
POINT M(2.2945 48.8584 1.0) # Point with measure
POINT ZM(2.2945 48.8584 100 1) # 3D point with measure
<http://...>POINT(...) # SRID prefix
point(2.2945 48.8584) # Lowercase (case-sensitive)
Coordinate Validation
Coordinates must be within valid ranges:
- Latitude: -90.0 to 90.0 (degrees)
- Longitude: -180.0 to 180.0 (degrees)
- Finite values only: NaN and infinity are rejected
Invalid coordinates cause the value to be stored as a plain string rather than a native GeoPoint.
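If you validate on the client before transacting, the check is equivalent to the following sketch (the function name is illustrative, not a Fluree API):
fn is_valid_point(lat: f64, lng: f64) -> bool {
    // Finite values only, within valid latitude/longitude ranges
    lat.is_finite()
        && lng.is_finite()
        && (-90.0..=90.0).contains(&lat)
        && (-180.0..=180.0).contains(&lng)
}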
Querying Geographic Data
Basic Retrieval
GeoPoints are returned in WKT format in query results:
{
"@context": {
"ex": "http://example.org/",
"geo": "http://www.opengis.net/ont/geosparql#"
},
"from": "places:main",
"where": [
{ "@id": "?place", "@type": "ex:Landmark" },
{ "@id": "?place", "ex:location": "?loc" }
],
"select": ["?place", "?loc"]
}
Result:
[
["ex:eiffel-tower", "POINT(2.2945 48.8584)"]
]
SPARQL Queries
PREFIX ex: <http://example.org/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
SELECT ?place ?location
WHERE {
?place a ex:Landmark ;
ex:location ?location .
}
Output Formats
GeoPoints appear differently based on output format:
JSON-LD (default):
{
"@id": "ex:eiffel-tower",
"ex:location": {
"@value": "POINT(2.2945 48.8584)",
"@type": "geo:wktLiteral"
}
}
SPARQL JSON:
{
"type": "literal",
"value": "POINT(2.2945 48.8584)",
"datatype": "http://www.opengis.net/ont/geosparql#wktLiteral"
}
Typed JSON:
{
"@value": "POINT(2.2945 48.8584)",
"@type": "geo:wktLiteral"
}
Storage Encoding
Binary Format
GeoPoints are stored using a compact 60-bit encoding:
- Upper 30 bits: Latitude scaled from [-90, 90] to [0, 2^30-1]
- Lower 30 bits: Longitude scaled from [-180, 180] to [0, 2^30-1]
This provides:
- 8 bytes total storage per point (vs ~25+ bytes for WKT string)
- ~0.3mm precision at the equator
- Ordered encoding enabling efficient range scans by latitude band
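The packing described above can be sketched as follows (illustrative only; Fluree's exact scaling and rounding are internal details):
fn pack_geopoint(lat: f64, lng: f64) -> u64 {
    const MAX_30_BIT: f64 = ((1u64 << 30) - 1) as f64;
    // Scale latitude from [-90, 90] and longitude from [-180, 180] into [0, 2^30 - 1]
    let lat_bits = (((lat + 90.0) / 180.0) * MAX_30_BIT).round() as u64;
    let lng_bits = (((lng + 180.0) / 360.0) * MAX_30_BIT).round() as u64;
    // Latitude occupies the upper 30 bits, so keys sort by latitude first
    (lat_bits << 30) | lng_bits
}
The latitude-first ordering is what makes the latitude-band scans described below possible.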
Index Structure
GeoPoints use ObjKind::GEO_POINT (0x14) in the binary index:
| Component | Encoding |
|---|---|
| Object kind | 1 byte (0x14) |
| Object key | 8 bytes (packed lat/lng) |
The latitude-primary encoding enables POST index scans that efficiently retrieve all points within a latitude band.
Distance Queries
Fluree supports the geof:distance function (OGC GeoSPARQL) for calculating haversine distances between geographic points.
geof:distance Function
Calculate the distance between two points in meters:
JSON-LD Query (bind + filter):
{
"@context": {
"ex": "http://example.org/",
"geo": "http://www.opengis.net/ont/geosparql#"
},
"from": "places:main",
"where": [
{ "@id": "?place", "ex:location": "?loc" },
{ "@id": "ex:paris", "ex:location": "?parisLoc" },
["bind", "?distance", "(geof:distance ?loc ?parisLoc)"],
["filter", "(< ?distance 500000)"]
],
"select": ["?place", "?distance"]
}
SPARQL:
PREFIX ex: <http://example.org/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT ?place ?distance
WHERE {
?place ex:location ?loc .
ex:paris ex:location ?parisLoc .
BIND(geof:distance(?loc, ?parisLoc) AS ?distance)
FILTER(?distance < 500000)
}
ORDER BY ?distance
Function aliases: geof:distance, geo_distance, geodistance
Arguments:
- Two GeoPoint values (stored as geo:wktLiteral POINT)
- Or two WKT POINT strings
Returns: Distance in meters (Double)
Calculation: Uses the haversine formula with Earth’s mean radius (6,371 km), accurate to within 0.3% for typical distances.
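For reference, the haversine computation amounts to the standard formula below (a sketch of the math, not Fluree's internal implementation):
fn haversine_m(lat1: f64, lng1: f64, lat2: f64, lng2: f64) -> f64 {
    const EARTH_RADIUS_M: f64 = 6_371_000.0; // Earth's mean radius
    let (phi1, phi2) = (lat1.to_radians(), lat2.to_radians());
    let d_phi = (lat2 - lat1).to_radians();
    let d_lambda = (lng2 - lng1).to_radians();
    let a = (d_phi / 2.0).sin().powi(2)
        + phi1.cos() * phi2.cos() * (d_lambda / 2.0).sin().powi(2);
    2.0 * EARTH_RADIUS_M * a.sqrt().atan2((1.0 - a).sqrt())
}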
Proximity Search
Fluree supports index-accelerated proximity queries that find points within a given distance of a center point.
Index-Accelerated Point Proximity
Use a geof:distance bind + filter pattern to run an accelerated proximity search over inline GeoPoints. This pattern works identically in both JSON-LD and SPARQL queries — the query optimizer detects the Triple + Bind(geof:distance) + Filter combination and rewrites it into an index-accelerated scan.
JSON-LD Query (find restaurants within 5km, include distance, limit to 10):
{
"@context": {
"ex": "http://example.org/",
"geo": "http://www.opengis.net/ont/geosparql#"
},
"from": "places:main",
"where": [
{ "@id": "?place", "@type": "ex:Restaurant" },
{ "@id": "?place", "ex:location": "?loc" },
["bind", "?distance", "(geof:distance ?loc \"POINT(2.35 48.85)\")"],
["filter", "(<= ?distance 5000)"]
],
"select": ["?place", "?distance"],
"orderBy": ["?distance"],
"limit": 10
}
SPARQL (same pattern, same acceleration):
PREFIX ex: <http://example.org/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT ?station ?distance
WHERE {
?station a ex:GasStation ;
ex:location ?loc .
BIND(geof:distance(?loc, "POINT(2.35 48.85)"^^geo:wktLiteral) AS ?distance)
FILTER(?distance < 10000)
}
ORDER BY ?distance
LIMIT 10
How Index Acceleration Works
- Latitude-band scan: The query planner converts the radius to latitude bounds and scans only points in [lat - δ, lat + δ]
- Haversine post-filter: Results are filtered by exact haversine distance to eliminate false positives
- Distance sorting: Results can be sorted by distance for k-nearest-neighbor queries
Performance characteristics:
- Uses POST index with latitude-primary encoding
- Scans only relevant latitude band (not full table scan)
- False positive rate: 22-70% depending on latitude and radius (eliminated by post-filter)
- Handles antimeridian crossing with multiple range scans
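Converting the radius to a latitude band (step 1) relies on one degree of latitude spanning roughly 111 km everywhere on the globe. A sketch of that bound computation (the constant and function name are illustrative, not Fluree internals):
fn latitude_band(center_lat: f64, radius_m: f64) -> (f64, f64) {
    // ~111,320 meters per degree of latitude, approximately constant at all latitudes
    let delta_deg = radius_m / 111_320.0;
    (
        (center_lat - delta_deg).max(-90.0),
        (center_lat + delta_deg).min(90.0),
    )
}
Points inside the band but outside the radius are the false positives that the haversine post-filter removes.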
Time Travel Support
Point proximity queries support time travel via the from ledger selector.
JSON-LD with time travel:
{
"@context": {
"ex": "http://example.org/",
"geo": "http://www.opengis.net/ont/geosparql#"
},
"from": "places:main@t:100",
"where": [
{ "@id": "?place", "ex:location": "?loc" },
["bind", "?dist", "(geof:distance ?loc \"POINT(2.35 48.85)\")"],
["filter", "(<= ?dist 5000)"]
],
"select": ["?place"]
}
SPARQL with time travel:
PREFIX ex: <http://example.org/>
PREFIX fluree: <https://ns.flur.ee/ledger#>
SELECT ?place ?loc
FROM <ledger:places:main?t=100>
WHERE {
?place ex:location ?loc .
}
Time travel correctly handles:
- Points that existed at time t but were later retracted
- Points added after time t (excluded from results)
- Overlay novelty merging for recent uncommitted data
Graph Scoping
Point proximity queries respect graph context. When used inside a GRAPH pattern, the query scans only the specified named graph:
{
"@context": {
"ex": "http://example.org/",
"geo": "http://www.opengis.net/ont/geosparql#"
},
"from": "world:main",
"where": [
["graph", "http://example.org/france", [
{ "@id": "?city", "ex:location": "?loc" },
["bind", "?dist", "(geof:distance ?loc \"POINT(2.35 48.85)\")"],
["filter", "(<= ?dist 50000)"]
]]
],
"select": ["?city"]
}
This returns only cities from the France graph within 50km of Paris, not cities from other named graphs.
S2 Spatial Index (Complex Geometries)
Fluree provides an S2 cell-based spatial index for complex geometries (polygons, linestrings, multipolygons). This index enables efficient spatial predicate queries like “find all places within this region” or “find all regions that contain this point.”
Supported Operations
| Operation | Description | Use Case |
|---|---|---|
| within | Find geometries that are completely inside a query geometry | “Find all buildings within this city boundary” |
| contains | Find geometries that completely contain a query geometry | “Find the district that contains this point” |
| intersects | Find geometries that overlap with a query geometry | “Find all parcels that touch this proposed road” |
| nearby | Find geometries within a radius (with distances) | “Find polygons within 10km of this point” |
Query Syntax
JSON-LD Query (find places within a polygon):
{
"@context": {
"ex": "http://example.org/",
"geo": "http://www.opengis.net/ont/geosparql#",
"idx": "https://ns.flur.ee/index#"
},
"from": "places:main",
"where": [
{
"idx:spatial": "within",
"idx:property": "ex:boundary",
"idx:geometry": "POLYGON((2.0 48.0, 3.0 48.0, 3.0 49.0, 2.0 49.0, 2.0 48.0))",
"idx:result": "?place"
}
],
"select": ["?place"]
}
Find regions containing a point:
{
"where": [
{
"idx:spatial": "contains",
"idx:property": "ex:boundary",
"idx:geometry": "POINT(2.35 48.85)",
"idx:result": "?district"
}
],
"select": ["?district"]
}
Find intersecting parcels:
{
"where": [
{
"idx:spatial": "intersects",
"idx:property": "ex:parcel",
"idx:geometry": "LINESTRING(2.0 48.0, 3.0 49.0)",
"idx:result": "?parcel"
}
],
"select": ["?parcel"]
}
Find polygons near a point (with distances):
{
"@context": {
"ex": "http://example.org/",
"idx": "https://ns.flur.ee/index#"
},
"from": "places:main",
"where": [
{
"idx:spatial": "nearby",
"idx:property": "ex:boundary",
"idx:geometry": "POINT(2.35 48.85)",
"idx:radius": 10000,
"idx:result": {
"idx:id": "?region",
"idx:distance": "?dist"
}
}
],
"select": ["?region", "?dist"],
"orderBy": ["?dist"]
}
How It Works
The S2 spatial index uses Google’s S2 geometry library to map geometries to hierarchical cells on a sphere:
- Ingestion: When a geo:wktLiteral polygon/linestring is committed, the indexer generates an S2 cell covering and stores cell entries in the spatial index.
- Query: When you query with a spatial predicate, the system:
  - Generates an S2 covering for your query geometry
  - Scans the index for matching cell ranges
  - Applies bounding-box prefiltering
  - Performs exact geometry tests on candidates
- Time-Travel: The index supports full time-travel semantics, so you can query spatial data at any historical point in time.
Index Configuration
The S2 index is automatically created for predicates with geo:wktLiteral values. Configuration options:
| Parameter | Default | Description |
|---|---|---|
| min_level | 4 | Minimum S2 cell level (coarser = faster build) |
| max_level | 16 | Maximum S2 cell level (finer = tighter coverage) |
| max_cells | 8 | Maximum cells per geometry covering |
Higher max_cells values produce tighter coverings (fewer false positives) but increase index size and build time.
Performance Characteristics
Performance depends on data distribution, covering configuration, and result selectivity. See Spatial Index Design for design rationale; a benchmark suite is recommended for deployment-specific measurements.
Supported Geometry Types
| Geometry Type | S2 Index | Notes |
|---|---|---|
| POLYGON | ✅ Yes | Most common for region queries |
| MULTIPOLYGON | ✅ Yes | Multiple disjoint regions |
| LINESTRING | ✅ Yes | Routes, boundaries |
| MULTILINESTRING | ✅ Yes | Multiple line segments |
| POINT | ⚠️ Optional | Use inline GeoPoint for proximity; S2 available with index_points=true |
| GEOMETRYCOLLECTION | ✅ Yes | Mixed geometry types |
Graph Scoping
Spatial indexes are scoped by named graph. Each graph has its own spatial index, and queries automatically use the correct index based on the graph context.
Default graph query:
{
"from": "mydb:main",
"where": [
{
"idx:spatial": "within",
"idx:property": "ex:boundary",
"idx:geometry": "POLYGON(...)",
"idx:result": "?region"
}
]
}
Named graph query (using GRAPH pattern):
{
"from": "mydb:main",
"where": [
["graph", "http://example.org/regions",
{
"idx:spatial": "within",
"idx:property": "ex:boundary",
"idx:geometry": "POLYGON(...)",
"idx:result": "?region"
}
]
]
}
When you enter a GRAPH pattern, the spatial query automatically switches to that graph’s index. This ensures results are correctly scoped—a spatial query inside GRAPH <http://example.org/france> only searches geometries in the France graph, not geometries from other named graphs.
Multiple named graphs:
If you have data across multiple named graphs (e.g., countries), you can query each independently:
{
"from": "world:main",
"where": [
["graph", "http://example.org/germany",
{
"idx:spatial": "within",
"idx:property": "ex:boundary",
"idx:geometry": "POLYGON(...)",
"idx:result": "?germanCity"
}
]
]
}
The same idx:property (e.g., ex:boundary) in different named graphs will query separate spatial indexes.
Time-Travel Support
Spatial queries support time travel via the from ledger selector:
{
"from": "places:main@t:100",
"where": [
{
"idx:spatial": "within",
"idx:property": "ex:boundary",
"idx:geometry": "POLYGON(...)",
"idx:result": "?place"
}
],
"select": ["?place"]
}
This returns places as they existed at transaction time 100, correctly handling:
- Geometries added after t=100 (excluded)
- Geometries retracted before t=100 (excluded)
- Geometries modified between t=100 and now
Note: Time travel requires t >= index.base_t. Queries for times before the index was built will return an error.
Note (v1): The historical-view API (query_historical) does not execute spatial index patterns. Use a time-pinned from selector (as above) against the current ledger state for spatial time travel.
Choosing Between Point Proximity and S2 Spatial Queries
Fluree provides two spatial query paths. Use this guide to pick the right one:
| Use Case | Approach | Reason |
|---|---|---|
| “Find restaurants near me” | geof:distance bind+filter | POINT proximity with distance ranking |
| “Find cities within 100km” | geof:distance bind+filter | POINT data with radius filter |
| “Find buildings in this district” | idx:spatial (within) | POLYGONs inside a boundary |
| “Which zone contains this address?” | idx:spatial (contains) | POLYGON containment test |
| “Find parcels crossing this road” | idx:spatial (intersects) | LINESTRING intersection |
| “Find regions near this location” | idx:spatial (nearby) | POLYGONs with distance from point |
Quick rule: Use geof:distance bind+filter for POINT locations with radius queries. Use idx:spatial for polygon/linestring containment, intersection, or region-based queries.
End-to-End Example: Points and Polygons
This example shows storing both POINT locations and POLYGON boundaries, then querying each appropriately.
1. Insert data with both geometry types:
{
"@context": {
"ex": "http://example.org/",
"geo": "http://www.opengis.net/ont/geosparql#"
},
"@graph": [
{
"@id": "ex:central-paris",
"@type": "ex:District",
"ex:name": "Central Paris",
"ex:boundary": {
"@value": "POLYGON((2.3 48.8, 2.4 48.8, 2.4 48.9, 2.3 48.9, 2.3 48.8))",
"@type": "geo:wktLiteral"
}
},
{
"@id": "ex:eiffel-tower",
"@type": "ex:Landmark",
"ex:name": "Eiffel Tower",
"ex:location": {
"@value": "POINT(2.2945 48.8584)",
"@type": "geo:wktLiteral"
}
},
{
"@id": "ex:louvre",
"@type": "ex:Landmark",
"ex:name": "Louvre Museum",
"ex:location": {
"@value": "POINT(2.3376 48.8606)",
"@type": "geo:wktLiteral"
}
}
]
}
2. Find landmarks near Eiffel Tower (POINT proximity):
{
"@context": {
"ex": "http://example.org/",
"geo": "http://www.opengis.net/ont/geosparql#"
},
"from": "places:main",
"where": [
{ "@id": "?place", "ex:location": "?loc" },
["bind", "?dist", "(geof:distance ?loc \"POINT(2.2945 48.8584)\")"],
["filter", "(<= ?dist 5000)"],
{ "@id": "?place", "ex:name": "?name" }
],
"select": ["?name", "?dist"],
"orderBy": ["?dist"]
}
3. Find which district contains the Louvre (POLYGON containment):
{
"@context": {
"ex": "http://example.org/",
"idx": "https://ns.flur.ee/index#"
},
"from": "places:main",
"where": [
{
"idx:spatial": "contains",
"idx:property": "ex:boundary",
"idx:geometry": "POINT(2.3376 48.8606)",
"idx:result": "?district"
},
{ "@id": "?district", "ex:name": "?name" }
],
"select": ["?name"]
}
MULTIPOLYGON Example
Store regions with multiple disjoint areas (e.g., archipelagos, non-contiguous territories):
{
"@context": {
"ex": "http://example.org/",
"geo": "http://www.opengis.net/ont/geosparql#"
},
"@id": "ex:hawaii",
"@type": "ex:State",
"ex:name": "Hawaii",
"ex:territory": {
"@value": "MULTIPOLYGON(((-160 22, -159 22, -159 21, -160 21, -160 22)), ((-156 20, -155 20, -155 19, -156 19, -156 20)))",
"@type": "geo:wktLiteral"
}
}
Query: “Find states that contain this coordinate”
{
"where": [
{
"idx:spatial": "contains",
"idx:property": "ex:territory",
"idx:geometry": "POINT(-155.5 19.5)",
"idx:result": "?state"
}
],
"select": ["?state"]
}
LINESTRING Example
Store routes, roads, or boundaries:
{
"@context": {
"ex": "http://example.org/",
"geo": "http://www.opengis.net/ont/geosparql#"
},
"@id": "ex:route-66",
"@type": "ex:Highway",
"ex:name": "Route 66",
"ex:path": {
"@value": "LINESTRING(-118.2 34.1, -112.0 35.2, -106.6 35.1, -97.5 35.5, -90.2 38.6, -87.6 41.9)",
"@type": "geo:wktLiteral"
}
}
Query: “Find highways that cross this region”
{
"where": [
{
"idx:spatial": "intersects",
"idx:property": "ex:path",
"idx:geometry": "POLYGON((-100 34, -95 34, -95 37, -100 37, -100 34))",
"idx:result": "?highway"
}
],
"select": ["?highway"]
}
Planned Capabilities
R-tree Index
An ephemeral R-tree is planned for:
- Spatial joins between datasets
- Range queries across multiple properties
Examples
Storing Multiple Locations
{
"@context": {
"ex": "http://example.org/",
"geo": "http://www.opengis.net/ont/geosparql#"
},
"@graph": [
{
"@id": "ex:paris",
"@type": "ex:City",
"ex:name": "Paris",
"ex:center": { "@value": "POINT(2.3522 48.8566)", "@type": "geo:wktLiteral" }
},
{
"@id": "ex:london",
"@type": "ex:City",
"ex:name": "London",
"ex:center": { "@value": "POINT(-0.1278 51.5074)", "@type": "geo:wktLiteral" }
},
{
"@id": "ex:tokyo",
"@type": "ex:City",
"ex:name": "Tokyo",
"ex:center": { "@value": "POINT(139.6917 35.6895)", "@type": "geo:wktLiteral" }
}
]
}
Turtle Format
@prefix ex: <http://example.org/> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
ex:sensor-1 a ex:WeatherStation ;
ex:name "Central Park Station" ;
ex:location "POINT(-73.9654 40.7829)"^^geo:wktLiteral .
ex:sensor-2 a ex:WeatherStation ;
ex:name "Times Square Station" ;
ex:location "POINT(-73.9855 40.7580)"^^geo:wktLiteral .
Mixed Geometry Types
Non-POINT geometries are stored as strings:
{
"@context": {
"ex": "http://example.org/",
"geo": "http://www.opengis.net/ont/geosparql#"
},
"@graph": [
{
"@id": "ex:central-park",
"@type": "ex:Park",
"ex:name": "Central Park",
"ex:entrance": {
"@value": "POINT(-73.9654 40.7829)",
"@type": "geo:wktLiteral"
},
"ex:boundary": {
"@value": "POLYGON((-73.9819 40.7681, -73.9580 40.8006, -73.9493 40.7969, -73.9732 40.7644, -73.9819 40.7681))",
"@type": "geo:wktLiteral"
}
}
]
}
The ex:entrance POINT is stored as a native GeoPoint, while the ex:boundary POLYGON is stored as a string.
GeoSPARQL-related support (v1)
Fluree supports the GeoSPARQL geo:wktLiteral datatype and geof:distance function. Point proximity queries use a unified geof:distance bind+filter pattern in both JSON-LD and SPARQL. For complex geometry queries (within/contains/intersects/nearby), use the JSON-LD idx:spatial pattern described above.
| Feature | Status |
|---|---|
| geo:wktLiteral datatype | ✅ Supported |
| POINT geometry | ✅ Native encoding (60-bit packed) |
| LINESTRING geometry | ✅ S2 spatial index |
| POLYGON geometry | ✅ S2 spatial index |
| MULTIPOLYGON geometry | ✅ S2 spatial index |
| geo:asWKT property | ✅ Use any property with wktLiteral type |
| geof:distance function | ✅ Supported (haversine, ~0.3% accuracy) |
| Proximity queries (radius) | ✅ Index-accelerated via geof:distance bind+filter |
| Time travel | ✅ Supported via from: "<ledger>@t:<t>" |
| k-NN queries (nearest K) | ✅ Via ORDER BY distance + LIMIT |
| within spatial predicate | ✅ Via JSON-LD idx:spatial |
| contains spatial predicate | ✅ Via JSON-LD idx:spatial |
| intersects spatial predicate | ✅ Via JSON-LD idx:spatial |
| Spatial join (two variables) | 🔜 Planned (R-tree) |
Best Practices
Use geo:wktLiteral for All Geometry
Always declare the datatype explicitly:
// Correct
{ "@value": "POINT(2.3522 48.8566)", "@type": "geo:wktLiteral" }
// Incorrect - stored as plain string
{ "@value": "POINT(2.3522 48.8566)" }
Coordinate Precision
While Fluree stores ~0.3mm precision, consider your source data accuracy:
// Excessive precision (GPS typically ±3-5m)
"POINT(2.352219834765 48.856614892341)"
// Appropriate precision for most applications
"POINT(2.3522 48.8566)"
Coordinate Validation
Validate coordinates before insertion:
- Latitude: -90 to 90
- Longitude: -180 to 180
- No NaN or infinity values
Invalid coordinates are stored as strings and won’t benefit from native GeoPoint indexing.
Troubleshooting
Query returns no results
Check the coordinate order. WKT uses POINT(longitude latitude), not POINT(latitude longitude):
// Correct: Paris (lng=2.35, lat=48.86)
"POINT(2.35 48.86)"
// Wrong: coordinates swapped
"POINT(48.86 2.35)"
Check the datatype. Geometry values must use geo:wktLiteral:
// Correct
{ "@value": "POINT(2.35 48.86)", "@type": "geo:wktLiteral" }
// Wrong - no datatype, stored as plain string
{ "@value": "POINT(2.35 48.86)" }
Check the predicate. The property in the triple pattern must match the data exactly:
// If data uses ex:location, the triple must use ex:location
{ "@id": "?place", "ex:location": "?loc" } // Correct
{ "@id": "?place", "ex:geo": "?loc" } // Wrong - different predicate
For S2 spatial queries, idx:property must also match:
"idx:property": "ex:boundary" // Correct
"idx:property": "ex:geo" // Wrong - different predicate
“No spatial index available” error
The spatial index is built asynchronously after commits. If querying immediately after insert:
- Wait for background indexing to complete, or
- Use from: "<ledger>@t:<t>" to query up to the indexed t
Large polygons cause slow queries
Polygons crossing the antimeridian (±180° longitude) generate many S2 cells. Consider:
- Splitting the polygon at the antimeridian
- Using a simpler bounding region for initial filtering
SPARQL spatial predicates not accelerated
In v1, SPARQL geof:* spatial predicates (like geof:sfWithin) evaluate as filters, not index operators. For accelerated spatial queries on complex geometries, use the JSON-LD idx:spatial pattern instead. Note: geof:distance bind+filter patterns are automatically accelerated in both SPARQL and JSON-LD.
Related Documentation
- Datatypes - Type system overview
- Vector Search - Similarity search
- BM25 - Full-text search
Graph Sources and Integrations
Graph sources extend Fluree’s query capabilities by integrating specialized indexes and external data sources. Graph sources appear as queryable ledgers but are backed by different storage and indexing systems.
Graph Source Types
Overview
Introduction to graph sources:
- What are graph sources
- Architecture and design
- Use cases
- Performance characteristics
- Creating and managing graph sources
Iceberg / Parquet
Apache Iceberg data lake integration:
- Querying Iceberg tables
- Parquet file support
- Schema mapping
- Partition pruning
- Performance optimization
R2RML
Relational database mapping:
- R2RML standard
- Mapping relational data to RDF
- SQL query generation
- Join optimization
- Supported databases (PostgreSQL, MySQL, etc.)
BM25 Graph Source
Full-text search as graph source:
- BM25 index as queryable ledger
- Search predicates
- Combining with structured queries
- Real-time index updates
What are Graph Sources?
Graph sources are queryable data sources that appear as Fluree ledgers but are backed by specialized storage:
Standard Ledger:
mydb:main → RDF triple store → SPOT/POST/OPST/PSOT indexes
Graph Source:
products-search:main → BM25 index → Inverted text index
products-vector:main → HNSW → Vector similarity index
warehouse-data:main → Iceberg → Parquet files
sql-db:main → R2RML → PostgreSQL tables
Query Transparency
Graph sources are queried like regular ledgers:
{
"@context": {"f": "https://ns.flur.ee/db#"},
"from": "products:main",
"select": ["?product", "?score"],
"where": [
{
"f:graphSource": "products-search:main",
"f:searchText": "laptop",
"f:searchLimit": 20,
"f:searchResult": { "f:resultId": "?product", "f:resultScore": "?score" }
}
]
}
Note: SPARQL queries use the same f: namespace pattern (f:graphSource, f:searchText, etc.) as the JSON-LD query syntax shown here.
Multi-Graph Queries
Combine regular ledgers with graph sources:
{
"@context": {"f": "https://ns.flur.ee/db#"},
"from": "products:main",
"select": ["?product", "?name", "?price", "?score"],
"where": [
{
"f:graphSource": "products-search:main",
"f:searchText": "laptop",
"f:searchLimit": 20,
"f:searchResult": { "f:resultId": "?product", "f:resultScore": "?score" }
},
{ "@id": "?product", "schema:name": "?name" },
{ "@id": "?product", "schema:price": "?price" }
],
"orderBy": ["-?score"]
}
Joins structured data from products:main with search results from the products-search:main graph source.
Graph Source Lifecycle
1. Create Graph Source
Define mapping/configuration:
curl -X POST http://localhost:8090/index/bm25?ledger=mydb:main \
-d '{"name": "products-search", "fields": [...]}'
2. Initial Indexing
Build index from source data:
- Load data from source ledger
- Transform to target format
- Build specialized index
- Publish to nameservice
3. Incremental Updates
Keep synchronized with source:
- Monitor source ledger for changes
- Update graph source incrementally
- Maintain consistency
4. Query Execution
Execute queries against graph source:
- Parse query
- Route to appropriate backend
- Execute specialized query
- Return results
Supported Graph Sources
BM25 Full-Text Search
Purpose: Keyword search with relevance ranking
Backend: Inverted index
Use Cases:
- E-commerce product search
- Document search
- Knowledge base search
Example:
{
"@context": {"f": "https://ns.flur.ee/db#"},
"from": "docs:main",
"where": [
{
"f:graphSource": "docs-search:main",
"f:searchText": "quarterly report",
"f:searchLimit": 20,
"f:searchResult": { "f:resultId": "?doc" }
}
]
}
See BM25 Graph Source and BM25 Indexing.
Vector Similarity Search
Purpose: Semantic search using embeddings
Backend: HNSW index (embedded or remote)
Use Cases:
- Semantic search
- Recommendations
- Image similarity
- Clustering
See Vector Search for details.
Apache Iceberg
Purpose: Query data lake tables
Backend: Apache Iceberg / Parquet files
Use Cases:
- Analytics on historical data
- Data warehouse integration
- Large-scale batch data
Example:
{
"from": "warehouse-sales:main",
"select": ["?date", "?revenue"],
"where": [
{ "@id": "?sale", "warehouse:date": "?date" },
{ "@id": "?sale", "warehouse:revenue": "?revenue" }
],
"filter": "?date >= '2024-01-01'"
}
See Iceberg / Parquet.
R2RML (Relational Databases)
Purpose: Query relational databases as RDF
Backend: SQL databases (PostgreSQL, MySQL, etc.)
Use Cases:
- Existing database integration
- Incremental adoption of graph queries
- Unified queries across systems
Example:
{
"from": "sql-customers:main",
"select": ["?name", "?email"],
"where": [
{ "@id": "?customer", "schema:name": "?name" },
{ "@id": "?customer", "schema:email": "?email" }
]
}
See R2RML.
Architecture
Graph Source Registry
Graph sources registered in nameservice:
{
"graph_source_id": "products-search:main",
"type": "bm25",
"source": "products:main",
"backend": "inverted_index",
"status": "ready"
}
Query Routing
Query engine routes to appropriate backend:
Query: FROM <products-search:main>
↓
Nameservice lookup: type=bm25
↓
Route to BM25 query engine
↓
Execute against inverted index
↓
Return results
Result Integration
Results from graph sources join with regular graphs:
FROM <products:main>, <products-search:main>
↓
Execute subquery on products:main → Results A
Execute subquery on products-search:main → Results B
↓
Join Results A + B on ?product
↓
Return combined results
Performance Considerations
Query Planning
Graph sources affect query optimization:
- Specialized indexes enable efficient filtering
- Push filters down to graph source when possible
- Minimize data transfer between graphs
Data Transfer
Minimize data movement:
- Filter in graph source before joining
- Use selective projections
- Leverage graph source’s native capabilities
Caching
Some graph source backends support caching:
- BM25: Results cacheable
- Vector: Similar queries share computation
- Iceberg: Parquet file caching
- R2RML: SQL query plan caching
Best Practices
1. Choose Appropriate Graph Source Type
Match graph source to use case:
- Keyword search → BM25
- Semantic search → Vector
- Analytics → Iceberg
- Relational database integration → R2RML
2. Filter Early
Push filters to graph sources:
Good:
{
"@context": {"f": "https://ns.flur.ee/db#"},
"from": "products:main",
"where": [
{
"f:graphSource": "products-search:main",
"f:searchText": "laptop",
"f:searchLimit": 50,
"f:searchResult": { "f:resultId": "?p" }
},
{ "@id": "?p", "schema:price": "?price" }
],
"filter": "?price < 1000"
}
3. Monitor Graph Source Lag
Check synchronization status:
curl http://localhost:8090/index/status/products-search:main
4. Use Appropriate Limits
Limit results from graph sources:
{
"@context": {"f": "https://ns.flur.ee/db#"},
"from": "products:main",
"where": [
{
"f:graphSource": "products-search:main",
"f:searchText": "query",
"f:searchLimit": 100,
"f:searchResult": { "f:resultId": "?p" }
}
]
}
5. Test Performance
Profile queries combining graph sources:
curl -X POST http://localhost:8090/v1/fluree/explain \
-d '{...}'
Troubleshooting
Graph Source Not Found
{
"error": "GraphSourceNotFound",
"message": "Graph source not found: products-search:main"
}
Solution: Create graph source or check name spelling.
Synchronization Lag
Graph source out of sync with source:
# Check status
curl http://localhost:8090/index/status/products-search:main
# Trigger rebuild
curl -X POST http://localhost:8090/index/rebuild/products-search:main
Poor Performance
Query combining graph sources is slow:
- Check explain plan
- Add filters to reduce result set
- Ensure indexes are up-to-date
- Consider query rewrite
Related Documentation
- Overview - Graph source concepts
- BM25 - Full-text search
- Vector Search - Similarity search
- Iceberg - Data lake integration
- R2RML - Relational mapping
- Query Datasets - Multi-graph queries
Graph Sources Overview
Graph sources enable querying specialized indexes and external data sources using the same query interface as regular Fluree ledgers. This document provides a comprehensive overview of graph source architecture and capabilities.
Concept
A graph source is anything you can address by a graph name/IRI and query as part of a single execution. Some graph sources are ledger-backed RDF graphs; others are backed by different systems optimized for specific query patterns.
Regular Ledger:
- Stored as RDF triples
- Indexed with SPOT, POST, OPST, PSOT
- Optimized for graph traversal
Non-ledger Graph Source:
- Stored in specialized format
- Custom indexing for specific queries
- Optimized for particular use cases
Both are queried using the same SPARQL or JSON-LD Query syntax.
Architecture
Components
┌─────────────────────────────────────────┐
│ Fluree Query Engine │
└─────────────────┬───────────────────────┘
│
┌───────────┴──────────┐
│ │
┌─────▼──────┐ ┌───────▼────────┐
│ Regular │ │ Graph │
│ Ledgers │ │ Sources │
└─────┬──────┘ └───────┬────────┘
│ │
│ ┌───────┴────────┐
│ │ │
┌─────▼──────┐ ┌───▼───┐ ┌─────▼──────┐
│ RDF Triple │ │ BM25 │ │ usearch │
│ Store │ │ Index │ │ Vector │
└────────────┘ └───────┘ └────────────┘
Graph Source Registry (Nameservice)
Non-ledger graph sources are registered in nameservice:
{
"graph_source_id": "products-search:main",
"type": "graph-source",
"backend": "bm25",
"source": "products:main",
"config": {
"fields": [...]
},
"status": "ready",
"last_sync": "2024-01-22T10:30:00Z"
}
Graph Source Types
1. BM25 Full-Text Search
Backend: Inverted text index
Purpose: Keyword search with relevance ranking
Configuration:
{
"type": "bm25",
"source": "products:main",
"fields": [
{ "predicate": "schema:name", "weight": 2.0 },
{ "predicate": "schema:description", "weight": 1.0 }
]
}
Query:
{
"@context": {"f": "https://ns.flur.ee/db#"},
"from": "products:main",
"where": [
{
"f:graphSource": "products-search:main",
"f:searchText": "laptop",
"f:searchLimit": 20,
"f:searchResult": { "f:resultId": "?product", "f:resultScore": "?score" }
}
],
"select": ["?product", "?score"]
}
2. Vector Similarity
Backend: HNSW index (embedded or remote)
Purpose: Semantic search using embeddings
Configuration:
{
"type": "vector",
"source": "products:main",
"embedding_property": "ex:embedding",
"dimensions": 384,
"metric": "cosine"
}
Query:
{
"from": "mydb:main",
"where": [
{
"f:graphSource": "products-vector:main",
"f:queryVector": [0.1, 0.2, ...],
"f:distanceMetric": "cosine",
"f:searchLimit": 10,
"f:searchResult": {
"f:resultId": "?product",
"f:resultScore": "?score"
}
}
],
"select": ["?product", "?score"]
}
3. Apache Iceberg
Backend: Iceberg tables / Parquet files via R2RML mapping
Purpose: Analytics on data lake
Iceberg graph sources require an R2RML mapping that defines how table rows become RDF triples. Two catalog modes select how Iceberg metadata is discovered:
- REST catalog: connects to an Iceberg REST catalog API (e.g., Polaris)
- Direct S3: reads metadata/version-hint.text from the table’s S3 location (no catalog server required)
See Iceberg / Parquet for full configuration details and examples.
Query:
{
"from": "warehouse-orders:main",
"select": ["?orderId", "?total"],
"where": [
{ "@id": "?order", "ex:orderId": "?orderId" },
{ "@id": "?order", "ex:total": "?total" }
]
}
Creating Graph Sources
Via Rust API
Graph sources are created and registered via the fluree-db-api Rust API, which publishes the graph source record into the nameservice.
use fluree_db_api::{FlureeBuilder, R2rmlCreateConfig};
let fluree = FlureeBuilder::default().build().await?;
// new_direct takes the R2RML mapping as a content string (Turtle or JSON-LD),
// not a file path, so read the mapping file first
let mapping = std::fs::read_to_string("mappings/execution_log.ttl")?;
let config = R2rmlCreateConfig::new_direct(
"execution-log",
"s3://bucket/warehouse/logs/execution_log",
mapping,
)
.with_s3_region("us-east-1");
fluree.create_r2rml_graph_source(config).await?;
Querying Graph Sources
Graph sources come in two flavors with different query models:
- Iceberg sources — queried transparently using standard SPARQL/JSON-LD patterns (FROM, GRAPH, or as a direct query target)
- Search indexes (BM25, Vector) — queried using the f:graphSource / f:searchText pattern
Iceberg (Transparent)
Iceberg graph sources are queried just like ledgers. No special syntax is needed:
As a direct target:
# Query the graph source directly
SELECT ?s ?p ?o FROM <execution-log:main> WHERE { ?s ?p ?o } LIMIT 10
Via GRAPH pattern (joining with ledger data):
{
"from": "mydb:main",
"select": ["?customer", "?orderId", "?total"],
"where": [
{ "@id": "?customer", "schema:name": "?name" },
{ "@id": "?customer", "ex:customerId": "?custId" },
{
"graph": "warehouse-orders:main",
"where": [
{ "@id": "?order", "ex:customerId": "?custId" },
{ "@id": "?order", "ex:orderId": "?orderId" },
{ "@id": "?order", "ex:total": "?total" }
]
}
]
}
Iceberg graph sources use R2RML mappings to define how table rows become RDF triples. See Iceberg / Parquet and R2RML for details.
Search Indexes (BM25, Vector)
Search indexes use the f:graphSource pattern:
Single Graph Source
Query one graph source:
{
"@context": {"f": "https://ns.flur.ee/db#"},
"from": "products:main",
"select": ["?product", "?score"],
"where": [
{
"f:graphSource": "products-search:main",
"f:searchText": "laptop",
"f:searchLimit": 20,
"f:searchResult": { "f:resultId": "?product", "f:resultScore": "?score" }
}
]
}
Multiple Graph Sources
Combine multiple graph sources:
{
"@context": {"f": "https://ns.flur.ee/db#"},
"from": "products:main",
"select": ["?product", "?textScore", "?vecScore"],
"values": [
["?queryVec"],
[{"@value": [0.1, 0.2, 0.3], "@type": "https://ns.flur.ee/db#embeddingVector"}]
],
"where": [
{
"f:graphSource": "products-search:main",
"f:searchText": "laptop",
"f:searchLimit": 100,
"f:searchResult": { "f:resultId": "?product", "f:resultScore": "?textScore" }
},
{
"f:graphSource": "products-vector:main",
"f:queryVector": "?queryVec",
"f:searchLimit": 100,
"f:searchResult": { "f:resultId": "?product", "f:resultScore": "?vecScore" }
}
]
}
Graph Sources + Regular Graphs
Combine graph sources and regular ledgers:
{
"@context": {"f": "https://ns.flur.ee/db#"},
"from": "products:main",
"select": ["?product", "?name", "?price", "?score"],
"where": [
{
"f:graphSource": "products-search:main",
"f:searchText": "laptop",
"f:searchLimit": 20,
"f:searchResult": { "f:resultId": "?product", "f:resultScore": "?score" }
},
{ "@id": "?product", "schema:name": "?name" },
{ "@id": "?product", "schema:price": "?price" }
]
}
Synchronization
Source Tracking
Graph sources track their source ledger:
Source: products:main @ t=150
Graph Source: products-search:main @ source_t=150
Update Modes
Real-Time:
- Updates immediately as source changes
- Low latency
- Higher overhead
Batch:
- Updates periodically
- Higher latency
- Lower overhead
Manual:
- Updates on demand
- Full control
- Requires manual triggering
Checking Sync Status
curl http://localhost:8090/graph-source/products-search:main/status
Response:
{
"name": "products-search:main",
"source": "products:main",
"source_t": 150,
"index_t": 148,
"lag": 2,
"last_sync": "2024-01-22T10:30:00Z",
"status": "syncing"
}
Query Execution
Query Planning
Query planner handles graph sources:
- Parse Query: Extract graph patterns
- Route Subqueries: Identify which graphs handle which patterns
- Execute Subqueries: Run against appropriate backends
- Join Results: Combine results from multiple graphs
- Apply Filters: Final filtering and sorting
Example Execution
Query:
{
"@context": {"f": "https://ns.flur.ee/db#"},
"from": "products:main",
"where": [
{
"f:graphSource": "products-search:main",
"f:searchText": "laptop",
"f:searchLimit": 50,
"f:searchResult": { "f:resultId": "?p" }
},
{ "@id": "?p", "schema:price": "?price" }
],
"filter": "?price < 1000"
}
Execution Plan:
1. Execute BM25 search on products-search:main:
f:searchText "laptop", f:searchLimit 50
→ Result: ?p = [ex:p1, ex:p2, ex:p3, ...]
2. Execute on products:main:
SELECT ?p ?price WHERE {
VALUES ?p { ex:p1 ex:p2 ex:p3 ... }
?p schema:price ?price
}
→ Result: [(ex:p1, 899), (ex:p2, 1200), ...]
3. Join and filter:
?price < 1000
→ Result: [(ex:p1, 899)]
Performance Characteristics
BM25 Graph Sources
- Index Build: O(n × avg_doc_length)
- Query: O(log n) with inverted index
- Space: 2-3× source data
- Update: Incremental, O(doc_size)
Vector Graph Sources
- Index Build: O(n log n) for HNSW
- Query: O(log n) approximate
- Space: 1.5× embedding size
- Update: Incremental, O(1)
Iceberg Graph Sources
- Index Build: No index (direct file access)
- Query: O(partitions scanned)
- Space: Zero overhead (uses Parquet files)
- Update: Batch-oriented
Best Practices
1. Choose Appropriate Type
Match graph source type to use case:
- Keyword search → BM25
- Semantic search → Vector
- Analytics / data lake → Iceberg (with R2RML mapping)
2. Monitor Synchronization
Check sync lag regularly:
setInterval(async () => {
const status = await getGraphSourceStatus('products-search:main');
if (status.lag > 10) {
console.warn(`Graph source lag: ${status.lag} transactions`);
}
}, 60000);
3. Filter in Graph Sources
Push filters to graph sources when possible:
Good (graph source pattern first narrows results before graph traversal):
{
"@context": {"f": "https://ns.flur.ee/db#"},
"from": "products:main",
"where": [
{
"f:graphSource": "products-search:main",
"f:searchText": "laptop",
"f:searchLimit": 20,
"f:searchResult": { "f:resultId": "?p" }
},
{ "@id": "?p", "schema:name": "?name" }
]
}
Bad (graph traversal before graph source means scanning all products first):
{
"@context": {"f": "https://ns.flur.ee/db#"},
"from": "products:main",
"where": [
{ "@id": "?p", "schema:name": "?name" },
{
"f:graphSource": "products-search:main",
"f:searchText": "laptop",
"f:searchLimit": 20,
"f:searchResult": { "f:resultId": "?p" }
}
]
}
4. Use Explain Plans
Understand query execution:
curl -X POST http://localhost:8090/v1/fluree/explain \
-d '{...}'
5. Limit Results
Always use LIMIT with graph sources:
{
"where": [...],
"limit": 100
}
Troubleshooting
High Sync Lag
Symptom: lag increasing
Causes:
- Source ledger write rate too high
- Graph source indexing too slow
- Resource constraints
Solutions:
- Increase indexing resources
- Batch updates
- Use manual sync mode
Query Performance Issues
Symptom: Slow queries combining graph sources
Solutions:
- Check explain plan
- Add filters to reduce intermediate results
- Ensure graph source is synced
- Consider query rewrite
Missing Results
Symptom: Expected results not returned
Causes:
- Graph source not synced
- Mapping misconfiguration
- Filter too restrictive
Solutions:
- Check sync status
- Verify mapping configuration
- Test subqueries independently
Related Documentation
- BM25 Graph Source - Full-text search
- Iceberg - Data lake integration
- R2RML - R2RML mapping reference
- BM25 Indexing - BM25 details
- Vector Search - Vector details
- Query Datasets - Multi-graph queries
Iceberg / Parquet
Fluree integrates with Apache Iceberg to query data lake tables as graph sources. An R2RML mapping defines how Iceberg table rows are materialized into RDF triples, enabling you to query large-scale analytical data stored in Parquet format using the same SPARQL / JSON-LD query interface as regular ledgers.
Note: Requires the iceberg feature flag. See Compatibility and Feature Flags.
What is Apache Iceberg?
Apache Iceberg is an open table format for huge analytical datasets. It provides:
- ACID transactions on data lakes
- Time travel and versioning
- Schema evolution
- Partition management
- Optimized file organization (Parquet)
Configuration
Catalog Modes
Fluree supports two ways to discover Iceberg metadata:
- REST catalog: discover table metadata via an Iceberg REST catalog API (e.g., Polaris).
- Direct S3 (no catalog server): bypass REST discovery and read version-hint.text from the table’s metadata/ directory to resolve the current metadata file.
CLI
The fluree iceberg map command creates Iceberg graph sources from the command line. An R2RML mapping is required to define how table rows become RDF triples.
# REST catalog with R2RML mapping
fluree iceberg map warehouse-orders \
--catalog-uri https://polaris.example.com/api/catalog \
--r2rml mappings/orders.ttl \
--auth-bearer $POLARIS_TOKEN
# Direct S3 (no catalog server) with R2RML mapping
fluree iceberg map execution-log \
--mode direct \
--table-location s3://bucket/warehouse/logs/execution_log \
--r2rml mappings/execution_log.ttl
Once mapped, graph sources appear in fluree list, can be inspected with fluree info, and removed with fluree drop. See CLI iceberg reference for all options.
HTTP API
When running the Fluree server (or Docker image) with the iceberg feature enabled, map a table by POSTing to {api_base_url}/iceberg/map (default: /v1/fluree/iceberg/map). The endpoint is admin-protected — include the admin Bearer token if admin auth is configured.
# REST catalog with R2RML mapping (mapping passed inline)
curl -X POST http://localhost:8090/v1/fluree/iceberg/map \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-d @- <<'JSON'
{
"name": "warehouse-orders",
"mode": "rest",
"catalog_uri": "https://polaris.example.com/api/catalog",
"table": "sales.orders",
"warehouse": "my-warehouse",
"auth_bearer": "polaris-token-here",
"r2rml": "@prefix rr: <http://www.w3.org/ns/r2rml#> . ...",
"r2rml_type": "text/turtle"
}
JSON
# Direct S3 mode (no catalog server)
curl -X POST http://localhost:8090/v1/fluree/iceberg/map \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-d '{
"name": "execution-log",
"mode": "direct",
"table_location": "s3://bucket/warehouse/logs/execution_log",
"r2rml": "...",
"r2rml_type": "text/turtle",
"s3_region": "us-east-1",
"s3_path_style": true
}'
R2RML can be omitted to auto-generate a direct mapping. AWS credentials for direct mode are read from the server’s environment (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION, or an attached instance role). See the Graph Source Endpoints section in the API reference for the complete request/response schema.
Rust API
R2rmlCreateConfig::new and new_direct take the R2RML mapping as a content string (Turtle or JSON-LD), not a file path — read the file yourself first. To reference an already-stored mapping by address instead, build the config directly with R2rmlMappingInput::Address(...).
REST catalog mode (Polaris-style):
use fluree_db_api::R2rmlCreateConfig;
let mapping = std::fs::read_to_string("mappings/orders.ttl")?;
let config = R2rmlCreateConfig::new(
"warehouse-orders",
"https://polaris.example.com/api/catalog",
"sales.orders",
mapping,
)
.with_warehouse("my-warehouse")
.with_auth_bearer("my-token")
.with_vended_credentials(true);
fluree.create_r2rml_graph_source(config).await?;
Direct S3 mode (no REST catalog):
use fluree_db_api::R2rmlCreateConfig;
let mapping = std::fs::read_to_string("mappings/execution_log.ttl")?;
let config = R2rmlCreateConfig::new_direct(
"execution-log",
"s3://bucket/warehouse/logs/execution_log",
mapping,
)
.with_s3_region("us-east-1")
.with_s3_path_style(true);
fluree.create_r2rml_graph_source(config).await?;
Stored Configuration Format (Nameservice)
Iceberg graph sources are persisted as an IcebergGsConfig JSON document in the nameservice record’s config field.
Note the nesting: the graph source is “Iceberg” (this page), and catalog.type selects the catalog mode (rest vs direct) used to discover Iceberg metadata.
REST catalog config:
{
"catalog": {
"type": "rest",
"uri": "https://polaris.example.com/api/catalog",
"warehouse": "my-warehouse",
"auth": { "type": "bearer", "token": { "env_var": "POLARIS_TOKEN" } }
},
"table": "sales.orders",
"io": {
"vended_credentials": true,
"s3_region": "us-east-1",
"s3_endpoint": null,
"s3_path_style": false
}
}
Direct S3 config:
{
"catalog": {
"type": "direct",
"table_location": "s3://bucket/warehouse/logs/execution_log"
},
"table": "",
"io": {
"vended_credentials": false,
"s3_region": "us-east-1",
"s3_endpoint": null,
"s3_path_style": true
}
}
Direct mode requirements:
- catalog.table_location must be an S3 URI (s3:// or s3a://) pointing to the table root directory.
- The table must contain a metadata/ subdirectory with:
  - version-hint.text (containing the current metadata filename, e.g., 00001-abc-def.metadata.json)
  - The referenced .metadata.json file
- Direct mode uses ambient AWS credentials (IAM roles, env vars, ~/.aws/credentials). It does not support vended credentials.
How Direct metadata resolution works:
- Fluree does not require you to provide a path to version-hint.text in the config. You provide the table root (table_location), and Fluree reads:
  - "{table_location}/metadata/version-hint.text" to get the current metadata filename
  - "{table_location}/metadata/{filename}" as the table’s current metadata
- version-hint.text may contain a bare filename (e.g., 00001-abc.metadata.json) or a full absolute path (s3://...).
- If version-hint.text is missing or empty, Direct mode fails with an error mentioning version-hint.text.
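The resolution logic boils down to simple path construction. A sketch of the rules above (actual reads go through Fluree's storage layer, and the function name is illustrative):
fn resolve_metadata_location(table_location: &str, version_hint: &str) -> String {
    let hint = version_hint.trim();
    if hint.starts_with("s3://") || hint.starts_with("s3a://") {
        // version-hint.text contained a full absolute path
        hint.to_string()
    } else {
        // Bare filename: join onto the table's metadata/ directory
        format!("{}/metadata/{}", table_location.trim_end_matches('/'), hint)
    }
}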
Iceberg table setup must already exist:
Direct mode assumes table_location points at a valid Iceberg table layout (created by iceberg-rust, Spark, etc.), including the metadata/ directory and referenced metadata/manifest files. Fluree does not create or “bootstrap” Iceberg tables; it only reads them.
When to use Direct vs REST:
| Scenario | Recommended |
|---|---|
| Shared catalog (multiple consumers) | REST |
| Writer and reader are the same system | Direct |
| iceberg-rust / Spark appending to known S3 path | Direct |
| Need catalog-managed credentials (vended) | REST |
| Minimizing infrastructure (no catalog server) | Direct |
RDF Mapping (R2RML)
Every Iceberg graph source requires an R2RML mapping (Turtle format) that defines how table rows become RDF triples — specifying subject IRI templates, predicate mappings, and type conversions. See R2RML for the full mapping reference.
Type Mapping
Iceberg types map to XSD types:
| Iceberg Type | RDF Type |
|---|---|
| int, long | xsd:integer |
| float, double | xsd:decimal |
| string | xsd:string |
| boolean | xsd:boolean |
| date | xsd:date |
| timestamp | xsd:dateTime |
| uuid | xsd:string |
Querying Iceberg Tables
Iceberg graph sources are queried using standard SPARQL and JSON-LD syntax. In the Rust API, mapped sources resolve transparently through the lazy query builders:
- fluree.graph("warehouse-orders:main").query() for a single target that may be either a native ledger or a mapped graph source
- fluree.query_from() when the query body itself carries the dataset ("from"/FROM) or when composing multiple sources
The lower-level materialized snapshot path (let view = fluree.db(...).await?; fluree.query(&view, ...)) is still native-ledger-oriented and should not be used for graph source aliases.
// Single-target lazy query
let result = fluree.graph("warehouse-orders:main")
.query()
.sparql("SELECT * WHERE { ?s ?p ?o } LIMIT 10")
.execute()
.await?;
// FROM-driven query
let result = fluree.query_from()
.sparql("SELECT * FROM <warehouse-orders:main> WHERE { ?s ?p ?o } LIMIT 10")
.execute()
.await?;
Basic Query
{
"@context": {
"ex": "http://example.org/ns/"
},
"from": "warehouse-orders:main",
"select": ["?orderId", "?total"],
"where": [
{ "@id": "?order", "ex:orderId": "?orderId" },
{ "@id": "?order", "ex:total": "?total" }
],
"limit": 100
}
SPARQL Query
PREFIX ex: <http://example.org/ns/>
SELECT ?orderId ?total ?date
FROM <warehouse-orders:main>
WHERE {
?order ex:orderId ?orderId .
?order ex:total ?total .
?order ex:orderDate ?date .
FILTER (?date >= "2024-01-01"^^xsd:date)
}
ORDER BY DESC(?date)
LIMIT 100
Partition Pruning
Iceberg’s partition pruning optimizes queries:
{
"from": "warehouse-orders:main",
"select": ["?orderId", "?total"],
"where": [
{ "@id": "?order", "ex:orderId": "?orderId" },
{ "@id": "?order", "ex:total": "?total" },
{ "@id": "?order", "ex:orderDate": "?date" }
],
"filter": "?date >= '2024-01-01' && ?date < '2024-02-01'"
}
If orderDate is a partition column, Iceberg only scans January 2024 partitions.
Combining with Fluree Data
Join Iceberg data with Fluree ledgers:
{
"from": ["customers:main", "warehouse-orders:main"],
"select": ["?customerName", "?orderTotal", "?orderDate"],
"where": [
{ "@id": "?customer", "schema:name": "?customerName" },
{ "@id": "?customer", "ex:customerId": "?customerId" },
{ "@id": "?order", "ex:customerId": "?customerId" },
{ "@id": "?order", "ex:total": "?orderTotal" },
{ "@id": "?order", "ex:orderDate": "?orderDate" }
],
"filter": "?orderDate >= '2024-01-01'",
"orderBy": ["-?orderDate"]
}
Combines customer data from Fluree with order data from Iceberg.
Time Travel
Query historical Iceberg snapshots:
{
"from": "warehouse-orders:main@snapshot:12345",
"select": ["?orderId", "?total"],
"where": [
{ "@id": "?order", "ex:orderId": "?orderId" },
{ "@id": "?order", "ex:total": "?total" }
]
}
Or by timestamp:
{
"from": "warehouse-orders:main@timestamp:2024-01-01T00:00:00Z",
"select": ["?orderId", "?total"],
"where": [...]
}
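In the Rust API the same suffixes can be carried in the dataset reference. A sketch, under the assumption that the `@snapshot:` / `@timestamp:` suffix documented for the JSON `"from"` field is also accepted in a SPARQL `FROM` clause via `fluree.query_from()`:
use fluree_db_api::FlureeBuilder;
async fn orders_at_snapshot() -> Result<(), Box<dyn std::error::Error>> {
    let fluree = FlureeBuilder::default().build().await?;
    // Pin the Iceberg source to a historical snapshot ID (assumption: the
    // suffixed alias is valid wherever a graph-source alias is accepted).
    let result = fluree.query_from()
        .sparql(r#"
            PREFIX ex: <http://example.org/ns/>
            SELECT ?orderId ?total
            FROM <warehouse-orders:main@snapshot:12345>
            WHERE {
              ?order ex:orderId ?orderId .
              ?order ex:total ?total .
            }
        "#)
        .execute()
        .await?;
    let _ = result;
    Ok(())
}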
Aggregations
Aggregate Iceberg data:
PREFIX ex: <http://example.org/ns/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?date (SUM(?total) AS ?dailyRevenue) (COUNT(?order) AS ?orderCount)
FROM <warehouse-orders:main>
WHERE {
?order ex:orderDate ?date .
?order ex:total ?total .
FILTER (?date >= "2024-01-01"^^xsd:date)
}
GROUP BY ?date
ORDER BY ?date
Performance
Query Planning
Fluree pushes filters to Iceberg:
Query: SELECT ?id WHERE { ?order ex:orderDate ?date } FILTER (?date > "2024-01-01")
↓
Pushed to Iceberg:
SELECT order_id FROM sales.orders WHERE order_date > '2024-01-01'
↓
Iceberg optimizations:
- Partition pruning (only scan 2024 partitions)
- File skipping (skip files outside date range)
- Column pruning (only read order_id, order_date)
Best Practices
- Partition by common filters: partition the Iceberg table by the columns you filter on, e.g. `PARTITIONED BY (YEAR(order_date), MONTH(order_date))`.
- Use filters: `{ "where": [...], "filter": "?date >= '2024-01-01'" }` enables partition pruning.
- Limit results: `{ "where": [...], "limit": 1000 }`.
- Project only needed columns: `{ "select": ["?orderId", "?total"], "where": [...] }` — only the selected columns are read from Parquet.
Schema Evolution
Iceberg supports schema evolution via metadata updates. If a schema change renames/removes columns used by your R2RML mapping, update the mapping accordingly.
Configuration Options
AWS Credentials
For S3-backed Iceberg (both REST and Direct modes):
export AWS_ACCESS_KEY_ID=your-key
export AWS_SECRET_ACCESS_KEY=your-secret
export AWS_REGION=us-east-1
REST catalog mode also supports vended credentials (credentials issued by the catalog). Direct mode uses only ambient AWS credentials (env vars, IAM roles, ~/.aws/credentials).
Use Cases
Analytics on Historical Data
Query years of historical data:
PREFIX ex: <http://example.org/ns/>
SELECT ?year (SUM(?revenue) AS ?totalRevenue)
FROM <warehouse-sales:main>
WHERE {
?sale ex:year ?year .
?sale ex:revenue ?revenue .
FILTER (?year >= 2020 && ?year <= 2023)
}
GROUP BY ?year
ORDER BY ?year
Data Warehouse Integration
Combine real-time Fluree data with warehouse analytics:
{
"from": ["products:main", "warehouse-sales:main"],
"select": ["?productName", "?totalSold"],
"where": [
{ "@id": "?product", "schema:name": "?productName" },
{ "@id": "?product", "ex:productId": "?pid" },
{ "@id": "?sale", "ex:productId": "?pid" }
]
}
Large-Scale Reporting
Generate reports from petabyte-scale data:
PREFIX ex: <http://example.org/ns/>
SELECT ?region ?category (SUM(?amount) AS ?total)
FROM <warehouse-transactions:main>
WHERE {
?txn ex:region ?region .
?txn ex:category ?category .
?txn ex:amount ?amount .
?txn ex:year ?year .
FILTER (?year = 2024)
}
GROUP BY ?region ?category
ORDER BY DESC(?total)
Limitations
- Read-Only: Iceberg graph sources are read-only (no writes via Fluree)
- Complex Joins: Large joins between Fluree and Iceberg may be slow
- No Full-Text Search: Use Fluree’s BM25 for text search
Troubleshooting
Connection Issues
{
"error": "IcebergConnectionError",
"message": "Cannot connect to Glue catalog"
}
Solutions:
- Check AWS credentials
- Verify IAM permissions
- Check network connectivity
Schema Mismatch
{
"error": "SchemaMismatchError",
"message": "Column 'order_date' not found in Iceberg table"
}
Solutions:
- Update R2RML mapping configuration (if the mapping references missing columns)
- Verify table name and catalog
Slow Queries
Causes:
- Large result sets
- No partition pruning
- Scanning many files
Solutions:
- Add date filters to enable partition pruning
- Use LIMIT clause
- Optimize Iceberg table partitioning
- Use Iceberg file compaction
Related Documentation
- Graph Sources Overview - Graph source concepts
- R2RML - Relational database mapping
- Query Datasets - Multi-graph queries
R2RML (Relational to RDF Mapping)
R2RML (RDB to RDF Mapping Language) is a W3C standard for mapping tabular data into RDF triples. In Fluree, R2RML mappings are used to expose Iceberg tables as RDF graph sources, enabling you to query data lake tables using SPARQL or JSON-LD Query.
What is R2RML?
R2RML defines how to map:
- Database tables to RDF classes
- Table columns to RDF properties
- Rows to RDF resources
- Foreign keys to RDF relationships
In Fluree, this enables querying Iceberg tables as if they were RDF graphs.
Configuration
Create R2RML Graph Source (Iceberg-backed)
Use R2rmlCreateConfig to register a graph source that combines:
- an Iceberg table (REST catalog or Direct S3), and
- an R2RML mapping (Turtle) that materializes table rows into RDF triples.
If you use Direct S3 mode, Fluree resolves the current Iceberg metadata by reading metadata/version-hint.text under the configured table_location, then loading the metadata file referenced by the hint. The Iceberg table layout must already exist at that location.
#![allow(unused)]
fn main() {
use fluree_db_api::{FlureeBuilder, R2rmlCreateConfig};
let fluree = FlureeBuilder::default().build().await?;
let config = R2rmlCreateConfig::new_direct(
"airlines-rdf",
"s3://bucket/warehouse/openflights/airlines",
"fluree:file://mappings/airlines.ttl",
)
.with_s3_region("us-east-1")
.with_s3_path_style(true)
.with_mapping_media_type("text/turtle");
fluree.create_r2rml_graph_source(config).await?;
}
R2RML Mapping
Basic Mapping
Map a table to RDF class:
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.org/ns/> .
@prefix schema: <http://schema.org/> .
<#CustomerMapping>
a rr:TriplesMap ;
rr:logicalTable [
rr:tableName "customers"
] ;
rr:subjectMap [
rr:template "http://example.org/customer/{id}" ;
rr:class schema:Person
] ;
rr:predicateObjectMap [
rr:predicate schema:name ;
rr:objectMap [ rr:column "name" ]
] ;
rr:predicateObjectMap [
rr:predicate schema:email ;
rr:objectMap [ rr:column "email" ]
] ;
rr:predicateObjectMap [
rr:predicate ex:customerId ;
rr:objectMap [ rr:column "id" ]
] .
This maps the customers table:
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name VARCHAR(255),
email VARCHAR(255)
);
To RDF triples:
<http://example.org/customer/1>
a schema:Person ;
schema:name "Alice" ;
schema:email "alice@example.org" ;
ex:customerId "1" .
Foreign Key Mapping
Map relationships:
<#OrderMapping>
a rr:TriplesMap ;
rr:logicalTable [
rr:tableName "orders"
] ;
rr:subjectMap [
rr:template "http://example.org/order/{id}" ;
rr:class ex:Order
] ;
rr:predicateObjectMap [
rr:predicate ex:orderId ;
rr:objectMap [ rr:column "id" ]
] ;
rr:predicateObjectMap [
rr:predicate ex:customer ;
rr:objectMap [
rr:parentTriplesMap <#CustomerMapping> ;
rr:joinCondition [
rr:child "customer_id" ;
rr:parent "id"
]
]
] ;
rr:predicateObjectMap [
rr:predicate ex:total ;
rr:objectMap [ rr:column "total" ]
] .
Maps foreign key customer_id to RDF object property linking to customer resource.
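Once the join condition is in place, the link behaves like any other object property. A minimal sketch (alias and prefixes follow this page's running example; the wrapper is illustrative) that hops from each mapped order to its customer's name:
use fluree_db_api::FlureeBuilder;
async fn orders_with_customer_names() -> Result<(), Box<dyn std::error::Error>> {
    let fluree = FlureeBuilder::default().build().await?;
    // ex:customer is the object property produced by the rr:joinCondition above,
    // so the order-to-customer hop is an ordinary triple pattern.
    let result = fluree.graph("warehouse-orders:main")
        .query()
        .sparql(r#"
            PREFIX schema: <http://schema.org/>
            PREFIX ex: <http://example.org/ns/>
            SELECT ?orderId ?total ?customerName
            WHERE {
              ?order ex:orderId ?orderId .
              ?order ex:total ?total .
              ?order ex:customer ?customer .
              ?customer schema:name ?customerName .
            }
        "#)
        .execute()
        .await?;
    let _ = result;
    Ok(())
}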
Complex Queries
Use SQL views for complex mappings:
<#SalesReportMapping>
a rr:TriplesMap ;
rr:logicalTable [
rr:sqlQuery """
SELECT
c.id as customer_id,
c.name as customer_name,
SUM(o.total) as total_spent,
COUNT(o.id) as order_count
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.order_date >= '2024-01-01'
GROUP BY c.id, c.name
"""
] ;
rr:subjectMap [
rr:template "http://example.org/customer/{customer_id}" ;
rr:class ex:Customer
] ;
rr:predicateObjectMap [
rr:predicate schema:name ;
rr:objectMap [ rr:column "customer_name" ]
] ;
rr:predicateObjectMap [
rr:predicate ex:totalSpent ;
rr:objectMap [ rr:column "total_spent" ; rr:datatype xsd:decimal ]
] ;
rr:predicateObjectMap [
rr:predicate ex:orderCount ;
rr:objectMap [ rr:column "order_count" ; rr:datatype xsd:integer ]
] .
Querying R2RML Graph Sources
R2RML graph sources are queried using standard SPARQL and JSON-LD query syntax — no special query language is needed. In the Rust API, graph source resolution is wired into the lazy query builders:
- `fluree.graph("my-gs:main").query()` for a single target that may be either a native ledger or a mapped graph source
- `fluree.query_from()` when the query body specifies the dataset (`"from"`/`FROM`) or combines multiple sources
The raw materialized snapshot path (fluree.db(&alias) → fluree.query(&view, ...)) is still the wrong abstraction for graph source aliases because it assumes a native ledger snapshot has already been loaded.
Graph sources can be:
- Queried directly as the target: `fluree query my-gs 'SELECT * WHERE { ?s ?p ?o }'`
- Referenced in FROM clauses: `SELECT * FROM <my-gs:main> WHERE { ... }`
- Referenced in GRAPH patterns: `SELECT * WHERE { GRAPH <my-gs:main> { ... } }` (useful for joining with ledger data)
Basic Query
{
"@context": {
"schema": "http://schema.org/",
"ex": "http://example.org/ns/"
},
"from": "warehouse-customers:main",
"select": ["?name", "?email"],
"where": [
{ "@id": "?customer", "@type": "schema:Person" },
{ "@id": "?customer", "schema:name": "?name" },
{ "@id": "?customer", "schema:email": "?email" }
]
}
The mapping controls how subjects and predicate/object values are produced from the scanned table columns.
SPARQL Query
PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/ns/>
SELECT ?name ?email
FROM <warehouse-customers:main>
WHERE {
?customer a schema:Person .
?customer schema:name ?name .
?customer schema:email ?email .
}
Filters
{
"from": "warehouse-customers:main",
"select": ["?name", "?email"],
"where": [
{ "@id": "?customer", "schema:name": "?name" },
{ "@id": "?customer", "schema:email": "?email" },
{ "@id": "?customer", "ex:status": "?status" }
],
"filter": "?status == 'active'"
}
Joins
{
"from": "warehouse-orders:main",
"select": ["?customerName", "?orderTotal"],
"where": [
{ "@id": "?customer", "schema:name": "?customerName" },
{ "@id": "?order", "ex:customer": "?customer" },
{ "@id": "?order", "ex:total": "?orderTotal" }
]
}
Combining with Fluree Data
Join Iceberg data with Fluree ledgers:
{
"from": ["products:main", "warehouse-inventory:main"],
"select": ["?productName", "?stockLevel"],
"where": [
{ "@id": "?product", "schema:name": "?productName" },
{ "@id": "?product", "ex:sku": "?sku" },
{ "@id": "?inventory", "ex:sku": "?sku" },
{ "@id": "?inventory", "ex:stockLevel": "?stockLevel" }
]
}
Combines product data from Fluree with inventory from an Iceberg-backed R2RML graph source.
Performance
R2RML graph sources execute by scanning the underlying Iceberg table and materializing RDF terms according to the mapping.
Best Practices
- Filter early: filters are pushed down to Iceberg for partition pruning, e.g. `{ "where": [...], "filter": "?date >= '2024-01-01'" }`.
- Limit results: `{ "where": [...], "limit": 100 }`.
- Project only needed columns: only columns referenced in the query and mapping are read from Parquet files.
- Partition by common filters: partition your Iceberg tables by columns frequently used in filters (e.g., date).
Use Cases
Data Lake Analytics
Query Iceberg tables containing large-scale analytical data alongside Fluree ledgers:
{
"from": ["products:main", "warehouse-sales:main"],
"select": ["?productName", "?totalSold"],
"where": [
{ "@id": "?product", "schema:name": "?productName" },
{ "@id": "?product", "ex:productId": "?pid" },
{ "@id": "?sale", "ex:productId": "?pid" },
{ "@id": "?sale", "ex:quantity": "?totalSold" }
]
}
Multi-Table Mapping
A single R2RML mapping file can define multiple TriplesMap entries, each targeting a different Iceberg table or logical view. This enables querying across related tables through a single graph source.
Limitations
- Read-Only: R2RML graph sources are read-only (no writes via Fluree)
- Performance: Complex joins across Fluree + Iceberg may be slow
- Schema Changes: Requires mapping updates when referenced columns change
Troubleshooting
Connection Errors
{
"error": "IcebergConnectionError",
"message": "Cannot load table metadata"
}
Solutions:
- Check catalog configuration (REST vs Direct)
- Verify AWS credentials and S3 access
- Verify `version-hint.text` is present for Direct mode
Mapping Errors
{
"error": "R2RMLMappingError",
"message": "Invalid R2RML mapping: table 'customers' not found"
}
Solutions:
- Verify table name / location
- Check referenced column names in the mapping
- Validate R2RML syntax (Turtle)
Slow Queries
Causes:
- Large result sets (many Parquet files scanned)
- No partition pruning
- Complex joins across Fluree + Iceberg
Solutions:
- Add date/partition filters to enable Iceberg partition pruning
- Use LIMIT clause
- Optimize R2RML mapping to project only needed columns
- Partition Iceberg tables by common filter columns
Related Documentation
- Graph Sources Overview - Graph source concepts
- Iceberg - Data lake integration
- Query Datasets - Multi-graph queries
BM25 Graph Source
BM25 indexes in Fluree are implemented as graph sources, allowing full-text search to be seamlessly integrated with structured graph queries through the standard query interface.
Overview
A BM25 graph source:
- Indexes text content from a source ledger using a configurable query
- Provides relevance-ranked search results via BM25 scoring
- Integrates with JSON-LD queries through `f:` namespace predicates
- Supports time-travel (query the index at any historical point)
- Maintains a manifest of snapshots for incremental sync
For index creation, configuration, and lifecycle management, see BM25 Full-Text Search.
Querying BM25 Graph Sources
JSON-LD Search Pattern
BM25 search uses the f: (Fluree) namespace predicates in where clauses:
{
"@context": {
"ex": "http://example.org/",
"f": "https://ns.flur.ee/db#"
},
"from": "docs:main",
"where": [
{
"f:graphSource": "article-search:main",
"f:searchText": "rust programming",
"f:searchLimit": 10,
"f:searchResult": {
"f:resultId": "?doc",
"f:resultScore": "?score"
}
},
{ "@id": "?doc", "ex:title": "?title" }
],
"select": ["?doc", "?title", "?score"]
}
Pattern Fields
| Field | Required | Description |
|---|---|---|
| `f:graphSource` | Yes | Graph source ID (e.g., `"article-search:main"`) |
| `f:searchText` | Yes | Query text. Analyzed with the same tokenizer/stemmer as indexing. |
| `f:searchLimit` | Yes | Maximum number of search results to return |
| `f:searchResult` | Yes | Object with variable bindings for results |
| `f:resultId` | Yes | Variable for the matched document IRI (e.g., `"?doc"`) |
| `f:resultScore` | No | Variable for the BM25 relevance score (e.g., `"?score"`) |
| `f:resultLedger` | No | Variable for the source ledger alias (for multi-ledger provenance) |
How It Works
- The search pattern is parsed and turned into a
Bm25SearchOperator - The operator loads the BM25 index from storage (using the leaflet cache when available)
- Query text is analyzed (tokenized, lowercased, stopwords removed, stemmed)
- The top-k results are computed using Block-Max WAND, which skips posting list segments whose upper-bound scores cannot enter the result set, then returns the highest-scoring documents
- Results produce variable bindings (
?doc,?score) that flow into subsequent where clauses - Subsequent patterns join against the source ledger to retrieve additional properties
Joining with Ledger Data
The primary use case is combining search results with structured graph data:
{
"@context": {
"ex": "http://example.org/",
"f": "https://ns.flur.ee/db#"
},
"from": "docs:main",
"where": [
{
"f:graphSource": "article-search:main",
"f:searchText": "database design",
"f:searchLimit": 20,
"f:searchResult": { "f:resultId": "?doc", "f:resultScore": "?score" }
},
{ "@id": "?doc", "ex:title": "?title" },
{ "@id": "?doc", "ex:author": "?author" },
{ "@id": "?doc", "ex:year": "?year" }
],
"select": ["?doc", "?title", "?author", "?year", "?score"]
}
The BM25 search runs first, producing a set of (?doc, ?score) bindings. The remaining where clauses join those bindings against the source ledger to enrich results with structured data.
Rust API
Creating and Querying
#![allow(unused)]
fn main() {
use fluree_db_api::{Bm25CreateConfig, FlureeBuilder};
use serde_json::json;
let fluree = FlureeBuilder::memory().build_memory();
// Seed ledger
let ledger0 = fluree.create_ledger("docs:main").await?;
let tx = json!({
"@context": { "ex": "http://example.org/" },
"@graph": [
{ "@id": "ex:doc1", "@type": "ex:Doc", "ex:title": "Rust guide", "ex:author": "Alice" },
{ "@id": "ex:doc2", "@type": "ex:Doc", "ex:title": "Python intro", "ex:author": "Bob" }
]
});
let ledger = fluree.insert(ledger0, &tx).await?.ledger;
// Create index
let query = json!({
"@context": { "ex": "http://example.org/" },
"where": [{ "@id": "?x", "@type": "ex:Doc", "ex:title": "?title" }],
"select": { "?x": ["@id", "ex:title"] }
});
let config = Bm25CreateConfig::new("search", "docs:main", query);
let created = fluree.create_full_text_index(config).await?;
// Query with BM25 search + ledger join
let search_query = json!({
"@context": { "ex": "http://example.org/", "f": "https://ns.flur.ee/db#" },
"from": "docs:main",
"where": [
{
"f:graphSource": &created.graph_source_id,
"f:searchText": "rust",
"f:searchLimit": 10,
"f:searchResult": { "f:resultId": "?doc", "f:resultScore": "?score" }
},
{ "@id": "?doc", "ex:author": "?author" }
],
"select": ["?doc", "?score", "?author"]
});
let result = fluree.query_connection_with_bm25(&search_query).await?;
}
Using FlureeIndexProvider
The FlureeIndexProvider implements the Bm25IndexProvider and Bm25SearchProvider traits, used by the query engine for graph source resolution:
#![allow(unused)]
fn main() {
use fluree_db_api::FlureeIndexProvider;
use fluree_db_query::bm25::{Bm25IndexProvider, Bm25Scorer, Analyzer};
let provider = FlureeIndexProvider::new(&fluree);
// Load index through the provider (with optional sync and time-travel)
let index = provider
.bm25_index("search:main", Some(ledger.t()), false, None)
.await?;
// Direct search
let analyzer = Analyzer::english_default();
let terms = analyzer.analyze_to_strings("rust");
let term_refs: Vec<&str> = terms.iter().map(|s| s.as_str()).collect();
let scorer = Bm25Scorer::new(&index, &term_refs);
let results = scorer.top_k(10);
}
Remote Search Service
For large indexes or multi-instance deployments, BM25 (and vector) search can be delegated to a standalone search service: the fluree-search-httpd binary.
Important: the search service is a separate process with its own listen port and its own HTTP API. It is not mounted under the main Fluree server’s `api_base_url` (`/v1/fluree/...`). It needs read access to the same storage and nameservice paths the main server writes to, so the typical deployment is to share a storage volume.
Prerequisite: the index must already exist
fluree-search-httpd only serves queries against existing indexes; it does not create them. Today, BM25 and vector graph-source indexes are created via the Rust API (Bm25CreateConfig + create_full_text_index, or VectorCreateConfig + create_vector_index). HTTP endpoints for index creation are not yet available — see the note in API endpoints.
The recommended workflow is:
- Run the Fluree server (or use the Rust API directly) to create the BM25 / vector index on a shared storage path.
- Run `fluree-search-httpd` against the same `--storage-root` and `--nameservice-path`.
- Point clients (or the main Fluree server’s `SearchDeploymentConfig`) at the search service’s `/v1/search` endpoint.
Running the Search Service
fluree-search-httpd \
--storage-root file:///var/fluree/data \
--nameservice-path file:///var/fluree/ns \
--listen 0.0.0.0:9090
Configuration options (CLI flag / env var):
| Flag | Env var | Default | Description |
|---|---|---|---|
| `--storage-root` | `FLUREE_STORAGE_ROOT` | (required) | Path to Fluree storage (where indexes are persisted). `file://` prefix optional. |
| `--nameservice-path` | `FLUREE_NAMESERVICE_PATH` | (required) | Path to nameservice data. |
| `--listen` | `FLUREE_SEARCH_LISTEN` | `0.0.0.0:9090` | Address and port to bind. |
| `--cache-max-entries` | `FLUREE_SEARCH_CACHE_MAX_ENTRIES` | 100 | Maximum cached indexes. |
| `--cache-ttl-secs` | `FLUREE_SEARCH_CACHE_TTL_SECS` | 300 | Cache TTL in seconds. |
| `--max-limit` | `FLUREE_SEARCH_MAX_LIMIT` | 1000 | Maximum results per query. |
| `--default-timeout-ms` | `FLUREE_SEARCH_DEFAULT_TIMEOUT_MS` | 30000 | Default request timeout. |
| `--max-timeout-ms` | `FLUREE_SEARCH_MAX_TIMEOUT_MS` | 300000 | Maximum allowed request timeout. |
Vector search is feature-gated: build/run a binary that includes the vector feature to enable the vector backend. When enabled, GET /v1/capabilities reports "vector" in supported_query_kinds.
Docker Deployment
Run the search service in Docker against a shared volume that the main Fluree server also mounts:
docker run -d --name fluree-search \
-p 9090:9090 \
-v fluree-data:/var/lib/fluree \
-e FLUREE_STORAGE_ROOT=/var/lib/fluree/storage \
-e FLUREE_NAMESERVICE_PATH=/var/lib/fluree/ns \
fluree/search-httpd:latest
For a full Compose example showing the main server + search service sharing a volume, see Running with Docker › Search service.
Search Protocol
The remote search service uses a JSON-based protocol on POST /v1/search. The request is the same shape regardless of backend; the query.kind discriminator selects BM25 vs. vector.
BM25 request:
{
"protocol_version": "1.0",
"graph_source_id": "article-search:main",
"query": { "kind": "bm25", "text": "rust programming" },
"limit": 20,
"as_of_t": 150,
"sync": false,
"timeout_ms": 5000
}
Vector request (requires the vector feature):
{
"protocol_version": "1.0",
"graph_source_id": "doc-embeddings:main",
"query": { "kind": "vector", "vector": [0.12, -0.34, ...], "metric": "cosine" },
"limit": 10
}
A vector_similar_to variant takes a to_iri instead of an explicit vector — the server resolves the entity’s embedding from the source ledger.
Response:
{
"protocol_version": "1.0",
"index_t": 150,
"hits": [
{ "iri": "http://example.org/doc1", "ledger_id": "docs:main", "score": 8.75 },
{ "iri": "http://example.org/doc2", "ledger_id": "docs:main", "score": 7.32 }
],
"took_ms": 12
}
Endpoints:
- `POST /v1/search` — execute a search query (BM25 or vector)
- `GET /v1/capabilities` — protocol version, supported query kinds, max limit/timeout
- `GET /v1/health` — health check
Time-travel: BM25 supports as_of_t (the service walks the manifest to find the newest snapshot ≤ t). Vector indexes are head-only and reject as_of_t.
Auth: the standalone service does not enforce auth itself — front it with a reverse proxy (or a network policy) if it shouldn’t be publicly reachable. The auth_token field on the main server’s SearchDeploymentConfig is sent as a Bearer token, so any proxy you put in front can validate it.
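For the direct-client path described below, calling the service is a plain HTTP POST. A client-side sketch, assuming `reqwest` (with its `json` feature), `tokio`, and `serde_json` as dependencies; the host, port, and bearer token are placeholders, and the payload follows the protocol above:
use serde_json::{json, Value};
async fn search_remote() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::Client::new();
    let body = json!({
        "protocol_version": "1.0",
        "graph_source_id": "article-search:main",
        "query": { "kind": "bm25", "text": "rust programming" },
        "limit": 20,
        "sync": false,
        "timeout_ms": 5000
    });
    let resp: Value = client
        .post("http://localhost:9090/v1/search")
        .bearer_auth("dev-token") // only meaningful if a proxy in front validates it
        .json(&body)
        .send()
        .await?
        .error_for_status()?
        .json()
        .await?;
    // Each hit carries the matched IRI, its source ledger, and the BM25 score;
    // join the IRIs back against the main Fluree query API as needed.
    if let Some(hits) = resp["hits"].as_array() {
        for hit in hits {
            println!("{} ({}) score={}", hit["iri"], hit["ledger_id"], hit["score"]);
        }
    }
    Ok(())
}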
Where this fits in your architecture
Two ways to use the search service today:
- Direct client → search service. Your application sends BM25 / vector requests straight to `fluree-search-httpd` and joins the resulting IRIs back to the main Fluree server’s query API on the application side. This is the path that works end-to-end today and is appropriate when search traffic dominates and you want it isolated from your main Fluree process.
- Main Fluree server → search service (transparent delegation). The query path inside the main server has the plumbing to consult a per-graph-source `SearchDeploymentConfig` and forward to a remote endpoint. This wiring is not yet exposed end-to-end through the create APIs — `Bm25CreateConfig` has no deployment builder, and the deployment field is not persisted to the nameservice config record by today’s create flow. Track this as a near-term gap; until then, query the search service directly.
Parity Guarantee
Both embedded and remote modes use identical:
- Analyzer configuration (tokenization, stemming, stopwords)
- BM25 scoring algorithm and parameters
- Time-travel and sync semantics
Queries return identical results regardless of deployment mode.
Time-travel note: BM25 time-travel selection is implemented by BM25 itself via a manifest/root in storage. The nameservice stores only a head pointer to the latest BM25 manifest (an opaque address) and does not store BM25 snapshot history.
Graph Source Identity
BM25 graph sources are registered in the nameservice as @type: "f:GraphSourceDatabase" records:
- ID format: `{name}:{branch}` (e.g., `article-search:main`)
- Name: Cannot contain `:` (reserved for ID formatting)
- Branch: Defaults to `"main"`
- Dependencies: Tracked for the source ledger(s) the index draws from
- Config: Stores the indexing query and BM25 parameters (k1, b)
List ledgers and graph sources to discover BM25 graph sources:
curl http://localhost:8090/v1/fluree/ledgers
Related Documentation
- BM25 Full-Text Search - Index creation, configuration, maintenance, and storage internals
- Graph Sources Overview - Graph source concepts
- Query Datasets - Multi-graph queries
Fluree Memory
Persistent, searchable memory for AI coding assistants — built for real work.
Fluree Memory gives tools like Claude Code, Cursor, and VS Code Copilot a long-term project brain. Facts, decisions, and constraints are captured as structured memories, stored in a local Fluree ledger you control, and retrieved via ranked recall — either by the agent through MCP or directly from the CLI.
Because memories live in plain-text TTL files under your project (.fluree-memory/repo.ttl for the team, .fluree-memory/.local/user.ttl for you), they can be committed to git and shared across the team the same way code is. No cloud service, no opaque database, no data leaving your machine. Open the file, read it, grep it, diff it, review it in a PR.
Design philosophy
We initially built Fluree Memory for ourselves, with three goals: increase the velocity of development with LLMs, work seamlessly in a git workflow, and reduce token usage – in that order. We ended up with a simple knowledge organization model (it started out more complex) and leaned into the speed and power of our knowledge graph database. We found most memory systems are designed for benchmarks or demos – they optimize for recall scores on synthetic tasks, ship your data to a hosted service, or bury context in a format only the tool can read, often running LLMs over git hooks or conversation turns that can burn more tokens than your actual coding session.
Fluree Memory has been refined by running it daily across real repositories — a 37-crate Rust workspace, multi-service TypeScript apps, real teams — and iterating on what actually gets used. The schema started with five memory kinds, four sensitivity levels, six sub-type fields, and bi-temporal validity. Usage data showed that 85% of memories were facts, “architecture” covered 81% of sub-types, and most optional fields were never set. So we simplified. Three kinds. Tags instead of sub-type taxonomies. Scope instead of a redundant sensitivity axis. Fewer decisions for the agent to make on every save means more saves actually happen.
The principles that came out of this:
- Your repo, your data. Memories are local Turtle (TTL) files. They live alongside your code, flow through your existing review and version control, and never leave your infrastructure. There is no hosted component, no account, no telemetry.
- Visible and auditable. Every memory is a block of Turtle you can read in any text editor.
`git diff` shows exactly what changed. `git blame` shows who (or what) added it. No black boxes.
- Simple enough to actually use. Three kinds — `fact`, `decision`, `constraint` — cover the real-world space. If a model has to deliberate over a five-way kind taxonomy plus sub-types on every save, it won’t save. A system that gets used at 80% fidelity beats one that’s theoretically perfect but sits idle.
- Recalled, not regurgitated. Search with metadata re-ranking (tags, branch affinity, recency) pulls what’s relevant to the current task. The agent gets a handful of targeted memories, not a dump of everything that was ever stored.
- Optimized for context tokens. Terse output, scoring thresholds, and explicit pagination hints that tell the LLM what comes next, with enough context to decide whether fetching more is worth it.
- Iterated from production. The schema, the recall ranking, the tool descriptions — all of it has been refined based on real agent behavior across real codebases. Features that earned usage stay. Features that didn’t get cut.
Why
Every AI coding session starts from zero. The model doesn’t remember what was tried last week, which library the team chose and why, or the ten subtle gotchas that live in someone’s head. You either re-explain each time, stuff it all into a CLAUDE.md / AGENTS.md that bloats context, or ship agents that repeat mistakes.
Fluree Memory is:
- Structured, not a wall of markdown. Memories have a kind (`fact`, `decision`, `constraint`), tags, scope, optional severity, rationale, and artifact references.
- Recalled on demand via BM25 keyword-scored search over memory content, with metadata-based re-ranking (tags, refs, kind, branch, recency). The agent pulls only what’s relevant to the current task, keeping context small.
- Versioned via git — `update` modifies in place (same ID, only changed fields); `git log -p` shows the full history. Use `fluree create <name> --memory` to import git history into a time-travel-capable Fluree ledger.
- Scoped per-repo or per-user, so team knowledge stays shareable and personal preferences stay yours.
- Local-first, stored in `.fluree-memory/` as TTL — no cloud dependency, you own the data.
- Secret-aware — content is scanned on write against a set of known credential patterns, and matches are redacted automatically.
Start here
- New? → Quickstart — install, init, store a memory, recall it.
- Using Claude Code? → Set up Claude Code
- Using Cursor? → Set up Cursor
- Want to understand the model? → What is a memory?
- Looking for a command? → CLI reference
How it fits
Fluree Memory is a feature of Fluree DB — installing the fluree CLI gives you both. If you only care about the memory tooling, you can still install and use Fluree as a single binary and never touch the rest of the database features.
Getting started
Three steps and you’re running:
- Install the `fluree` CLI — one binary, single command.
- Run the quickstart — initialize the memory store, add your first memory, recall it. 2 minutes.
- Wire it into your AI tool — pick yours:
Once the MCP server is configured, the AI tool gets memory_add and memory_recall tools and will start saving and retrieving memories without you having to reach for the CLI.
What to read next
- What is a memory? — the kinds and when to use each
- Repo vs user memory — how scope shapes the file layout
- Team workflows — sharing memories via git
Install Fluree
Fluree Memory ships as part of the fluree CLI. Install the binary once and you have both the database and the memory tooling.
macOS / Linux (installer script)
curl --proto '=https' --tlsv1.2 -LsSf \
https://github.com/fluree/db/releases/latest/download/fluree-db-cli-installer.sh | sh
Homebrew (macOS / Linux)
brew install fluree/tap/fluree
PowerShell (Windows)
Open PowerShell and run:
irm https://github.com/fluree/db/releases/latest/download/fluree-db-cli-installer.ps1 | iex
Open a new PowerShell session and verify with fluree --version. The binary is unsigned, so Windows SmartScreen may prompt on first run — click More info → Run anyway.
Pre-built binary
# Linux x86_64
curl -L https://github.com/fluree/db/releases/latest/download/fluree-db-cli-x86_64-unknown-linux-gnu.tar.xz | tar xJ
# macOS aarch64
curl -L https://github.com/fluree/db/releases/latest/download/fluree-db-cli-aarch64-apple-darwin.tar.xz | tar xJ
Build from source
If you have Rust installed:
git clone https://github.com/fluree/db
cd db
cargo install --path fluree-db-cli
Verify
fluree --version
fluree memory --help
You should see a list of memory subcommands: init, add, recall, update, forget, status, export, import, mcp-install.
Next: quickstart.
Quickstart
2 minutes to your first memory.
1. Initialize the memory store
From the root of a project you’d like to give memory to:
cd my-project
fluree memory init
This creates:
- `.fluree-memory/repo.ttl` — team memories, meant to be committed to git
- `.fluree-memory/.local/user.ttl` — your personal memories, gitignored
- `.fluree-memory/.gitignore` — pre-configured to ignore `.local/` (which holds your user scope plus the MCP log)
- The `__memory` ledger inside your project’s `.fluree/` store
init is idempotent; running it again is safe.
It will also detect any installed AI coding tools (Claude Code, Cursor, VS Code, Windsurf, Zed) and offer to wire up MCP. You can say no here and run fluree memory mcp-install later.
2. Add a memory
fluree memory add --kind fact \
--text "Tests use cargo nextest, not cargo test" \
--tags testing
Output:
Stored memory: mem:fact-01JDXYZ5A2B3C4D5E6F7G8H9J0
The ID is a ULID — sortable by creation time and unique across the store.
3. Recall it
fluree memory recall "how do I run tests"
Output:
Recall: "how do I run tests" (1 match)
1. [score: 13.0] mem:fact-01JDXYZ5A2B3C4D5E6F7G8H9J0
Tests use cargo nextest, not cargo test
Tags: testing
Recall is BM25-ranked over the memory content and tags. No embeddings, no network — fast and deterministic.
4. Check status
fluree memory status
Memory Store Status
Total memories: 1
Total tags: 1
By kind:
fact: 1
That’s the loop
Add memories as you learn things. Recall them when you need them. Commit .fluree-memory/repo.ttl to share team knowledge.
Next
- Wire to your AI tool — Claude Code, Cursor, or others — so the agent does this for you.
- Learn the memory kinds — What is a memory?
- Understand scope — Repo vs user memory
Set up Claude Code
Wire Fluree Memory into Claude Code so it saves and recalls memories for you.
Automatic setup
Easiest path: run init from your project root and accept the Claude Code prompt.
cd my-project
fluree memory init
When you see:
Detected AI coding tools:
- Claude Code
Install MCP config for Claude Code? [Y/n]
…press Y. This runs claude mcp add under the hood to register the Fluree Memory MCP server at local (user) scope, and appends a short section to your CLAUDE.md telling Claude when to use it.
If you already ran init and skipped it:
fluree memory mcp-install --ide claude-code
What gets added
- MCP server registered in `~/.claude.json` — scope `local`
  - Command: `fluree mcp serve --transport stdio`
- Project instructions in `<repo>/CLAUDE.md` — a short block explaining the memory tools
Verify
Restart Claude Code and start a session in the project. Ask:
What project memories do you have?
Claude should call memory_recall and return whatever you’ve added (initially nothing).
Try:
Remember: we use
cargo nextestfor tests, notcargo test.
Claude should call memory_add and report the stored ID.
Troubleshooting
The tool doesn’t appear. Confirm Claude Code sees the MCP server:
claude mcp list
You should see a fluree-memory entry. If not, re-run fluree memory mcp-install --ide claude-code.
Memories aren’t scoped to the repo. The Claude Code MCP entry doesn’t set FLUREE_HOME — the server walks up from its spawn CWD looking for a .fluree/ directory. In normal use this matches the workspace, but if Claude Code launched the server from outside your repo, memories can land in a global store. Fix by editing ~/.claude.json and adding an env block to the fluree-memory server entry:
"env": { "FLUREE_HOME": "/absolute/path/to/your/repo/.fluree" }
Then restart Claude Code.
The MCP log. The MCP server logs to <repo>/.fluree-memory/.local/mcp.log (the file is truncated on each server start). Tail it if something’s off:
tail -f .fluree-memory/.local/mcp.log
Set up Cursor
Wire Fluree Memory into Cursor so its agent mode saves and recalls memories for you.
Automatic setup
From your project root:
cd my-project
fluree memory init
Accept the Cursor prompt:
Install MCP config for Cursor? [Y/n]
Or, at any time:
fluree memory mcp-install --ide cursor
What gets written
- `<repo>/.cursor/mcp.json` — repo-scoped MCP server config
- `<repo>/.cursor/rules/fluree_rules.md` — a short rules file telling Cursor when to reach for `memory_recall`
{
"mcpServers": {
"fluree-memory": {
"type": "stdio",
"command": "fluree",
"args": ["mcp", "serve", "--transport", "stdio"],
"env": {
"FLUREE_HOME": "${workspaceFolder}/.fluree"
}
}
}
}
${workspaceFolder} is a Cursor config-interpolation token — the MCP server is always launched with FLUREE_HOME pointing at the current project, so memories stay scoped to the repo even if Cursor spawns the process from a different working directory.
Verify
Fully restart Cursor (Cmd-Q on macOS, not just reload window). Open the project and ask the agent:
Recall project memories for testing.
The agent should call memory_recall with the tag testing and return what’s in .fluree-memory/repo.ttl.
Troubleshooting
MCP isn’t connecting. Tail the MCP log:
tail -f .fluree-memory/.local/mcp.log
You should see a client initialized line within a few seconds of Cursor startup. If not, check .cursor/mcp.json exists and is valid JSON, then restart Cursor.
Memories going to a global store on macOS. If you see memories landing in ~/Library/Application Support/.fluree-memory/ instead of <repo>/.fluree-memory/, FLUREE_HOME isn’t being honored. Re-run fluree memory mcp-install --ide cursor from inside the repo and restart Cursor fully.
Rules file ignored. Cursor picks up .cursor/rules/*.md on project open. After editing, reload the window.
Set up VS Code (Copilot)
Wire Fluree Memory into VS Code with GitHub Copilot Chat so it can save and recall memories through MCP.
Automatic setup
From your project root:
fluree memory init
Accept the VS Code prompt, or run:
fluree memory mcp-install --ide vscode
What gets written
- `<repo>/.vscode/mcp.json` — repo-scoped MCP server config (key: `servers`)
- `<repo>/.vscode/fluree_rules.md` — rules file you can reference from your prompts
{
"servers": {
"fluree-memory": {
"type": "stdio",
"command": "fluree",
"args": ["mcp", "serve", "--transport", "stdio"]
}
}
}
Unlike the Cursor config, this entry does not set FLUREE_HOME — VS Code normally spawns the server from the workspace root, so the walk-up logic in fluree mcp serve finds .fluree/ on its own. If you need to pin the location explicitly (e.g. the server is ending up in a global store), add an env block pointing at the absolute path to <repo>/.fluree/.
Verify
Open the project in VS Code with Copilot Chat enabled. In chat (agent mode), ask:
Call memory_recall for “testing”.
Copilot should invoke the tool and return matching memories. On first use VS Code may prompt to allow the MCP server — approve it.
Troubleshooting
Tail .fluree-memory/.local/mcp.log and fully restart VS Code if something’s off. If memory is landing in a global store rather than the repo, add an explicit env.FLUREE_HOME pointing at <repo>/.fluree/ in .vscode/mcp.json and restart.
Set up Windsurf
Wire Fluree Memory into Windsurf (Codeium’s IDE).
Automatic setup
fluree memory init
Accept the Windsurf prompt, or run:
fluree memory mcp-install --ide windsurf
What gets written
Windsurf uses a global MCP config:
- `~/.codeium/windsurf/mcp_config.json` — a `fluree-memory` entry is merged under `mcpServers`
{
"mcpServers": {
"fluree-memory": {
"command": "fluree",
"args": ["mcp", "serve", "--transport", "stdio"]
}
}
}
Because the config is global, it’s wired once and every Windsurf project can use it. The MCP server figures out which repo it’s serving by walking up from its spawn CWD until it finds a .fluree/ directory; in normal use Windsurf spawns it from the workspace root so this works without extra configuration. No FLUREE_HOME is set by default.
Verify
Restart Windsurf and open your project. In Cascade (Windsurf’s agent chat):
Use memory_recall to find testing patterns.
The agent should invoke the tool.
Troubleshooting
If memories end up in a global store instead of <repo>/.fluree-memory/, Windsurf is likely spawning the server from outside the workspace. Edit ~/.codeium/windsurf/mcp_config.json and add an explicit absolute path:
"env": { "FLUREE_HOME": "/absolute/path/to/repo/.fluree" }
${workspaceFolder} interpolation is not guaranteed in all Windsurf versions — when in doubt, use an absolute path and switch it per project.
Set up Zed
Wire Fluree Memory into Zed’s agent via MCP.
Automatic setup
fluree memory init
Accept the Zed prompt, or run:
fluree memory mcp-install --ide zed
What gets written
- `<repo>/.zed/settings.json` — the `context_servers` key gets a `fluree-memory` entry
{
"context_servers": {
"fluree-memory": {
"command": "fluree",
"args": ["mcp", "serve", "--transport", "stdio"]
}
}
}
No FLUREE_HOME is set by default — the MCP server walks up from Zed’s spawn CWD to find the workspace’s .fluree/. If you need to pin it explicitly, add an env block alongside command/args with an absolute path.
Caveat: JSONC
Zed’s settings.json often contains // comments (JSONC). mcp-install detects this and will skip the automatic write rather than risk corrupting your settings — it prints a hint telling you to add the block by hand.
If you’d like to pre-empt that, strip comments from .zed/settings.json before running mcp-install, or paste the block yourself.
Verify
Restart Zed. In the agent panel:
Recall project memories about testing.
The agent should call memory_recall via the fluree-memory context server.
Concepts
Short, self-contained explanations of the ideas behind Fluree Memory. Read these once; they’ll save you time when you’re reading CLI reference or wiring a new IDE.
- What is a memory? — the three kinds (`fact`, `decision`, `constraint`) and when to use each.
- Repo vs user memory — how scope decides whether a memory ends up in `repo.ttl` (shared with the team) or `.local/user.ttl` (yours).
- Updates and forgetting — how `update` modifies memories in place and how history is tracked via git.
- Recall and ranking — how BM25 scores results and how tag / kind filters narrow them.
- MCP server — the tools Fluree Memory exposes to AI agents.
- Secrets and sensitivity — automatic redaction and scope-based privacy.
What is a memory?
A memory is a single structured record of something worth remembering about a project. Every memory has:
- Content — the text itself (“Tests use cargo nextest, not cargo test”)
- Kind — what sort of thing it is
- Tags — free-form keywords for filtering
- Scope — repo (shared) or user (yours)
- Refs — optional file or artifact pointers
- Timestamps — when it was created
Everything else (severity, rationale, alternatives) is optional metadata that can appear on any kind.
The three kinds
Memories are typed. The kind tells future-you (and future-agents) how to interpret the content.
fact
Something that is objectively true about the project.
“The indexer uses postcard encoding for on-disk format.” “We run PostgreSQL 16 in production.” “The BM25 code lives in `fluree-db-indexer/src/bm25.rs`.” “Error pattern defined here -> `fluree-db-core/src/error.rs`”
Use facts liberally. They’re the default and make up the bulk of a typical memory store. Use tags to categorize them (e.g. architecture, dependency, configuration). Facts can carry --rationale and --alternatives when you want to explain why something is the way it is.
decision
A choice the team made, ideally with why and what was considered.
“Use postcard for compact index encoding. Why: no_std compatible, smaller than bincode. Alternatives: bincode, CBOR, MessagePack.”
Decisions are what distinguishes a project with institutional knowledge from one where people keep re-litigating settled choices. Capture them with --rationale and --alternatives:
fluree memory add --kind decision \
--text "Use postcard for compact index encoding" \
--rationale "no_std compatible, smaller output than bincode" \
--alternatives "bincode, CBOR, MessagePack" \
--refs fluree-db-indexer/
constraint
A rule — something that must, should, or is preferred. Constraints carry a severity.
must: “Never commit secrets; use environment variables.”
should: “Integration tests run in a real Postgres, not SQLite.”
prefer: “Name errors with the module prefix (`QueryError`, not `Error`).”
fluree memory add --kind constraint \
--text "Never suppress dead code with _underscore prefix; delete it" \
--severity must \
--tags code-style \
--rationale "Underscore-prefixed names hide code from future discovery"
When an agent is about to do something, constraints are the first thing it should recall. Like facts and decisions, constraints can carry --rationale and --alternatives to explain the reasoning behind the rule.
Which kind should I use?
| You have… | Use kind |
|---|---|
| A verifiable truth | fact |
| A choice and its reasoning | decision |
| A rule that must/should be followed | constraint |
| A pointer to code / a file | fact (with --refs) |
| A soft preference or convention | fact or constraint --severity prefer |
When in doubt: fact. The kind can always be refined later via update. All three kinds support --rationale and --alternatives for capturing the why.
Repo vs user memory
Fluree Memory has two scopes, and they live in separate files:
| Scope | File | Git | Visible to |
|---|---|---|---|
| repo | .fluree-memory/repo.ttl | ✅ commit it | the whole team |
| user | .fluree-memory/.local/user.ttl | ❌ gitignored | just you |
Scope is set at write time (--scope repo or --scope user) and defaults to repo. Once set, it determines which TTL file the memory is written to.
Layout
After fluree memory init inside a project:
my-project/
├── .fluree/ # Fluree DB storage for the __memory ledger
├── .fluree-memory/
│ ├── .gitignore # contents: ".local/"
│ ├── repo.ttl # team memories — COMMIT THIS
│ └── .local/ # ignored by the .gitignore above
│ ├── user.ttl # your personal memories
│ ├── mcp.log # MCP server log
│ └── build-hash # content hash used to detect external TTL edits
└── (your code)
The .fluree-memory/.gitignore is written by init and handles the split for you. Commit the whole .fluree-memory/ directory; git will skip .local/ automatically.
When to use which
Repo scope (default):
- Facts about the codebase (“tests use cargo nextest”)
- Team decisions with rationale
- Constraints everyone must follow
- File/symbol pointers via `--refs` (“X lives at Y”)
User scope:
- Your IDE quirks
- Personal conventions the team hasn’t agreed on
- Scratch notes while you’re exploring
- Anything you’d be embarrassed to commit
Changing scope after the fact
You can’t move a memory between scopes directly. If you stored something as repo that should be user-only:
fluree memory forget <id>
fluree memory add --scope user --kind <kind> --text "..."
Recall sees both
By default, fluree memory recall and the memory_recall MCP tool return matches from both scopes — your personal notes and the team’s are merged in the result set. Filter with --scope repo or --scope user if you need to isolate one.
Sharing with the team
Memory becomes a shared asset as soon as you commit .fluree-memory/repo.ttl. A teammate who clones the repo and runs fluree memory init gets the ledger populated from the committed TTL automatically — no manual import step.
Conflicts on repo.ttl resolve like any other text file. TTL is line-oriented per-triple, so most merges are clean; occasionally you’ll see a merge mark in the middle of a memory’s fields and need to pick one side.
See Team workflows for the full story.
Updates and forgetting
Memories are updated in place. When you update a memory, the same ID is kept and only the changed fields are modified. History is tracked via git, not via internal versioning.
update modifies in place
fluree memory update mem:fact-01JDXYZ... --text "Tests use cargo nextest with --no-fail-fast"
Output:
Updated: mem:fact-01JDXYZ...
The memory keeps its original ID. The TTL file is rewritten with the new content, and git records what changed:
git diff .fluree-memory/repo.ttl
mem:fact-01JDXYZ a mem:Fact ;
- mem:content "Tests use cargo nextest" ;
+ mem:content "Tests use cargo nextest with --no-fail-fast" ;
mem:tag "cargo" ;
forget retracts
forget is different from update. It retracts the memory’s triples — the memory stops existing entirely.
fluree memory forget mem:fact-01JDXYZ...
Forgotten: mem:fact-01JDXYZ...
Rule of thumb:
| You think… | Use |
|---|---|
| “This was wrong from the start” | forget |
| “This was right but the world changed” | update |
| “I never want anyone to see this again” | forget |
History via git
Both update and forget rewrite the TTL file, and git tracks the full history. To see how a memory evolved:
git log -p .fluree-memory/repo.ttl
This shows every change — what was added, updated, or forgotten, and when.
Time-travel over memory history
If you want to query memory history with Fluree’s time-travel capabilities, you can import your git-tracked memory history into a Fluree ledger:
fluree create my-memory-ledger --memory
This replays each git commit to .fluree-memory/repo.ttl as a Fluree transaction, giving you a full time-travel-capable ledger over your memory history. Use --no-user to exclude user.ttl from the import.
Recall and ranking
recall is how you get memories out. It’s a keyword query against an inverted index with BM25 scoring — fast, local, and deterministic.
The basics
fluree memory recall "how do I run tests"
The query string is tokenized and matched against each memory’s content via a BM25-scored fulltext index. Tags, artifact refs, kind, branch, and recency contribute as re-rank bonuses on top of the BM25 score — they’re not part of the fulltext match itself. Results are sorted by combined score (higher = better) and capped at --limit (default: 3).
Recall: "how do I run tests" (2 matches)
1. [score: 13.0] mem:fact-01JDXYZ...
Tests use cargo nextest, not cargo test
Tags: testing
2. [score: 8.0] mem:fact-01JDABC...
Integration tests use assert_cmd + predicates
Tags: testing
What BM25 rewards
BM25 scores a memory’s content higher when:
- Query terms appear in the content.
- Those terms are rare in the overall store — a match on “postcard” beats a match on “the”.
- The matched terms are in a shorter memory — density matters.
- Multiple distinct terms from the query match (not the same term repeated).
There are no embeddings, no semantic matching — just lexical overlap with smart weighting. If you mean “tests” but phrase it as “unit tests” or “testing”, BM25 catches that because the stems overlap; it won’t catch “QA” unless the content mentions it.
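For reference, this is the textbook BM25 term contribution that the bullets above describe, written as a plain function with the common defaults k1 = 1.2 and b = 0.75; treat it as a sketch of the shape of the score, not Fluree Memory's exact implementation or parameters:
// Contribution of a single query term to one memory's content score.
// idf: rarity of the term across the store; tf: occurrences in this memory;
// doc_len / avg_doc_len: length normalization ("density matters").
fn bm25_term_score(idf: f64, tf: f64, doc_len: f64, avg_doc_len: f64) -> f64 {
    let (k1, b) = (1.2, 0.75);
    idf * (tf * (k1 + 1.0)) / (tf + k1 * (1.0 - b + b * doc_len / avg_doc_len))
}
// A memory's content score sums this over distinct matched terms, which is
// why several different query words beat one word repeated.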
Re-rank bonuses
After BM25 produces content scores, Fluree Memory adds small bonuses:
- Tag hit: +10 per tag that contains a query word.
- Artifact ref hit: +8 per ref path that contains a query word.
- Kind word in query: +6 if the query mentions the memory’s kind (“constraint”, “decision”, etc.).
- Branch match: +3 if the memory was captured on the current git branch.
- Recency: +2 for memories <7 days old, +1 for <30 days.
If BM25 returns no hits, recall falls back to metadata-only scoring using these same bonuses so a well-tagged memory can still surface on a content miss.
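To make the arithmetic concrete, here is a rough sketch of how those bonuses could stack on top of the content score; the struct and field names are illustrative, and this is not Fluree Memory's actual code:
// Illustrative only: the documented bonuses (+10 tag, +8 ref, +6 kind word,
// +3 branch, +2 under 7 days, +1 under 30 days) added to the BM25 content score.
struct MemoryMeta {
    tags: Vec<String>,
    refs: Vec<String>,
    kind: String, // "fact" | "decision" | "constraint"
    branch: String,
    age_days: u32,
}
fn rerank_score(bm25: f64, query_words: &[&str], mem: &MemoryMeta, current_branch: &str) -> f64 {
    let hit = |s: &str| query_words.iter().any(|w| s.to_lowercase().contains(w.to_lowercase().as_str()));
    let mut score = bm25;
    score += 10.0 * mem.tags.iter().filter(|t| hit(t.as_str())).count() as f64;
    score += 8.0 * mem.refs.iter().filter(|r| hit(r.as_str())).count() as f64;
    if query_words.iter().any(|w| w.eq_ignore_ascii_case(&mem.kind)) {
        score += 6.0; // the query mentions the memory's kind
    }
    if mem.branch == current_branch {
        score += 3.0;
    }
    score += match mem.age_days {
        0..=6 => 2.0,
        7..=29 => 1.0,
        _ => 0.0,
    };
    score
}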
Filters
Filters narrow the candidate set before scoring:
# Only constraints tagged "errors"
fluree memory recall "handling" --kind constraint --tags errors
# Only repo-scoped memories
fluree memory recall "deployment" --scope repo
# Page through results
fluree memory recall "tests" --limit 10 --offset 10
Common filter recipes:
| You want… | Flags |
|---|---|
| Team-only (ignore personal) | --scope repo |
| Just the hard rules | --kind constraint |
| Just the decisions with reasoning | --kind decision |
| Pointers to code | --kind fact --tags <domain> (with --refs) |
Output formats
fluree memory recall "tests" # text — for humans
fluree memory recall "tests" --format json # JSON — for scripts
fluree memory recall "tests" --format context # XML — for LLM injection
The context format produces a compact XML block designed to be pasted into an agent’s context window:
<memory-context>
<memory id="mem:fact-01JDXYZ..." kind="fact" score="13.0">
<content>Tests use cargo nextest, not cargo test</content>
<tags>testing</tags>
</memory>
<pagination shown="1" offset="0" total_in_store="13" />
</memory-context>
When results are cut off, the pagination element embeds a human-readable hint telling the agent how to get more:
<pagination shown="3" offset="0" limit="3" total_in_store="13">
Results 1–3. Use offset=3 to retrieve more.
</pagination>
This pattern is why Fluree Memory is practical to use with an agent: a small, ranked slice goes into context, and the agent can ask for more if the top hits aren’t enough.
How this compares to other approaches
| Approach | Cost | Quality | Works offline |
|---|---|---|---|
| BM25 (Fluree Memory) | free, instant | high for keyword overlap | yes |
| Embedding search | paid + latency | high for paraphrase | usually no |
| Stuff-it-all-in-CLAUDE.md | free | context blow-up | yes |
For developer memory — where the agent knows the words for what it’s looking for — BM25 is a very good fit. If you later want semantic recall, Fluree DB itself ships a vector search feature that the memory store could layer on.
MCP server
Fluree Memory exposes its functionality over Model Context Protocol so AI coding agents can use it natively. The MCP server is bundled with the fluree CLI — no separate install.
Start it manually
fluree mcp serve --transport stdio
In practice you never start it manually — your IDE launches it. fluree memory mcp-install writes the IDE-specific config that does the spawning. See mcp-install for the per-IDE details.
Tools exposed
The server exposes these tools to the agent:
memory_recall
Search for relevant memories.
{
"name": "memory_recall",
"arguments": {
"query": "how do I run tests",
"limit": 5,
"offset": 0,
"kind": "fact",
"tags": ["testing"],
"scope": "repo"
}
}
Returns XML context-formatted output (see Recall and ranking).
memory_add
Store a new memory. The content field is named content (not text).
{
"name": "memory_add",
"arguments": {
"kind": "fact",
"content": "Tests use cargo nextest, not cargo test",
"tags": ["testing"],
"scope": "repo"
}
}
Other optional arguments: refs, severity, rationale, alternatives. Returns the new memory ID.
memory_update
Patch an existing memory in place. The memory keeps its ID; only the fields you pass are changed. Use content (not text) for the new body.
{
"name": "memory_update",
"arguments": {
"id": "mem:fact-01JDXYZ...",
"content": "Tests use cargo nextest with --no-fail-fast"
}
}
Also accepts tags, refs, rationale, alternatives.
memory_forget
Retract a memory permanently.
{
"name": "memory_forget",
"arguments": { "id": "mem:fact-01JDXYZ..." }
}
memory_status
Return a summary of the store — totals by kind and a preview of recent memories. Agents are encouraged to call this first to discover what topics to query.
kg_query
Run a raw SPARQL SELECT against the __memory ledger. Advanced escape hatch — prefer memory_recall for ranked search.
{
"name": "kg_query",
"arguments": {
"query": "PREFIX mem: <https://ns.flur.ee/memory#> SELECT ?id ?content WHERE { ?id a mem:Constraint ; mem:content ?content } LIMIT 20"
}
}
Where the store lives
When the MCP server starts, it picks its Fluree directory the same way the CLI does:
- If `$FLUREE_HOME` is set, that directory is used (unified mode).
- Otherwise it walks up from the spawn CWD looking for an existing `.fluree/`.
When the server is in unified mode (cases 1 and 2), the memory store lives in <dir>/../.fluree-memory/ and is shared with the CLI. In global mode, file-based sync is disabled and memories live only in the global ledger.
This matters for IDE integrations: the Cursor config that mcp-install writes explicitly sets FLUREE_HOME=${workspaceFolder}/.fluree so memory stays scoped to the current repo regardless of Cursor’s CWD. The other supported IDEs (Claude Code, VS Code, Windsurf, Zed) rely on the spawn CWD plus the walk-up behavior — which normally works, but can land in a global store if the IDE spawns the MCP server from outside the repo. If you see that, set FLUREE_HOME manually in the MCP config or re-run mcp-install from inside the repo root.
The rules file
Alongside the MCP server, mcp-install writes (or appends to) a short rules file for IDEs that support one:
| IDE | Rules file |
|---|---|
| Claude Code | Short section appended to <repo>/CLAUDE.md |
| Cursor | <repo>/.cursor/rules/fluree_rules.md |
| VS Code | <repo>/.vscode/fluree_rules.md |
| Windsurf, Zed | None written — you can add your own guidance manually |
The file tells the agent when to reach for memory tools — e.g. at the start of a task (memory_recall first), after capturing something reusable (memory_add), and not to re-ask the user for things already memorized. You can edit it to customize the agent’s instincts; see Customizing the rules file.
Secrets and sensitivity
Memory is meant to be written freely and committed to git. That only works if secrets never land in there.
Automatic redaction
Every memory_add / fluree memory add runs the input through a secret detector before storage. If the content matches patterns for API keys, passwords, tokens, or connection strings, the sensitive substrings are replaced with [REDACTED] and a warning is printed:
warning: secrets detected in content — storing redacted version.
Original content contained sensitive data that was replaced with [REDACTED].
Stored memory: mem:fact-01JDXYZ...
Patterns covered include:
- AWS access key IDs (AKIA…)
- GitHub personal access tokens (ghp_…, gho_…, ghu_…, ghs_…, ghr_…)
- OpenAI keys (sk-…) and Anthropic keys (sk-ant-…)
- Fluree API keys (flk_…)
- Generic api_key=… / apikey: … assignments
- password=… / passwd: … assignments
- Connection strings with inline credentials (postgres://, mysql://, mongodb://, redis://, amqp:// containing user:pass@host)
- PEM private keys (-----BEGIN … PRIVATE KEY-----)
- Bearer tokens (Bearer eyJ…)
- JWT tokens (three base64 segments separated by dots)
Redaction preserves enough context that the memory still makes sense (e.g. “Use the API key [REDACTED] from 1Password”) while the actual value never reaches the TTL file.
The detector is pattern-based, not entropy-based — well-disguised secrets outside these patterns can still slip through. Treat redaction as a safety net, not a guarantee.
Scope as the privacy boundary
Memory visibility is controlled by scope (repo or user), not by a separate sensitivity level. Repo-scoped memories live in .fluree-memory/repo.ttl and are committed to git, so they’re visible to anyone with repo access. User-scoped memories live in .fluree-memory/.local/user.ttl, which is gitignored.
If something is client-specific or team-internal, put it in user scope or use a private sub-repo. The scope mechanism plus secret-detection on ingest handles what a separate sensitivity field used to.
What if I slip?
If something slipped past the detector and into repo.ttl before you noticed:
- fluree memory forget <id> — retracts the memory.
- Run git log -p .fluree-memory/repo.ttl and use git filter-repo (or the BFG) to scrub the history if the value leaked there too.
- Rotate the credential at the source. Redaction in memory doesn’t rotate keys.
Treat this the same way you’d treat accidentally committing .env — the git history is the hard part, the file is the easy part.
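For the history-scrubbing step, one hedged sketch using git-filter-repo’s --replace-text option (the example secret is fake; run this on a fresh clone, since filter-repo rewrites history and everyone will need to re-clone afterwards):
# replacements.txt maps each leaked literal to a replacement
echo 'AKIAIOSFODNN7EXAMPLE==>[REDACTED]' > replacements.txt
git filter-repo --replace-text replacements.txt
# then force-push the rewritten history and rotate the credential at the source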
Guides
Task-oriented walkthroughs for common situations.
- Team workflows: sharing memory via git — how repo.ttl becomes shared knowledge and how conflicts resolve.
- Customizing the rules file — tuning what your AI tool does with memory.
- Migrating from plain-markdown memory — moving from CLAUDE.md / AGENTS.md style blobs to structured memories.
Looking for end-to-end setup instead? See Getting started.
Team workflows: sharing memory via git
The whole point of repo.ttl is that memory becomes a team asset — captured once by whoever learns it, available to every teammate and every AI agent forever.
The happy path
- Someone runs fluree memory init in the repo and commits .fluree-memory/ (minus the gitignored .local/).
- Teammates pull and run fluree memory init once. The init picks up the committed repo.ttl and populates the ledger from it. No manual import.
- As people add memories, .fluree-memory/repo.ttl changes in the working tree. Commit it like any other file.
- Pulls bring in new memories automatically — fluree memory recall (and the MCP server) read the ledger, which stays in sync with the TTL file.
That’s it. No server, no sync daemon, no API tokens. Git is the sync mechanism.
What to commit
✅ Commit:
- .fluree-memory/repo.ttl
- .fluree-memory/.gitignore
- Any IDE config mcp-install created: .cursor/mcp.json, .cursor/rules/fluree_rules.md, .vscode/mcp.json, .vscode/fluree_rules.md, .zed/settings.json
❌ Don’t commit:
- .fluree-memory/.local/user.ttl — your personal memories (handled by .fluree-memory/.gitignore)
- .fluree-memory/.local/mcp.log — noisy and personal (handled by .fluree-memory/.gitignore)
- .fluree/ — the Fluree storage dir, can be re-hydrated from repo.ttl (add this to your project’s root .gitignore — .fluree-memory/.gitignore only covers its own subtree)
Reviewing memory in PRs
Treat repo.ttl changes like documentation changes in code review:
- New memory? Is the kind right? Is the wording accurate? Are the tags useful?
- Updated memory? Is the new content better (not just different)?
- Forgot memory? Was that really wrong, or should it have been updated instead?
Memories are serialized as subject blocks with one predicate per line, so most diffs are readable.
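To look at just the memory changes in a branch, plain git is enough (origin/main here is illustrative):
git diff origin/main...HEAD -- .fluree-memory/repo.ttl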
Merge conflicts
Memories in repo.ttl are sorted by (branch, id) — memories from the same git branch cluster together, and different branches land in different regions of the file. This means two feature branches that each add memories will almost never conflict, because their blocks insert at different positions in the file.
The branch name is captured automatically when a memory is created, so memories from feature/auth sort separately from memories created on feature/indexer. Within each branch group, memories are ordered chronologically (ULID encodes creation time).
When conflicts do occur, they’re usually because two branches modified the same existing memory (via update) or both worked on the same branch. These are typically clean to resolve:
<<<<<<< HEAD
mem:fact-01JD... a mem:Fact ;
mem:content "Tests use cargo nextest" ;
mem:tag "cargo" ;
mem:tag "testing" ;
...
=======
mem:fact-01JD... a mem:Fact ;
mem:content "Tests use cargo nextest with --no-fail-fast" ;
mem:tag "testing" ;
...
>>>>>>> their-branch
Pick the version you want or combine them, then re-run fluree memory status to make sure the store parses cleanly. If the merged file is genuinely messy, a cleaner path is to accept one side wholesale and then apply the other side’s changes via fluree memory add / update on top.
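A hedged sketch of the “accept one side, re-apply the other” path during a conflicted merge (IDs and text are illustrative):
# take our version of the memory file for this merge
git checkout --ours .fluree-memory/repo.ttl
git add .fluree-memory/repo.ttl
# re-apply the other branch's change through the CLI instead of by hand
fluree memory update mem:fact-01JD... --text "Tests use cargo nextest with --no-fail-fast"
fluree memory status   # confirm the store parses cleanly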
Onboarding a new teammate
When someone new clones the repo:
git clone git@github.com:team/project
cd project
fluree memory init
After init, they can immediately:
fluree memory recall "testing" -n 10
…and get everything the team has captured. No setup beyond installing fluree.
Going further
- Keep PR review short by tagging memories with their domain (auth, indexer, docs, etc.) so reviewers can filter.
- Use constraint --severity must sparingly — must constraints are the “policy layer” of memory. Prefer should or prefer for matters of taste.
- Periodically run fluree memory status and prune stale memories with forget — see the example below. The store should feel curated, not a dumping ground.
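A curation pass can be as simple as recalling by domain tag and forgetting what no longer applies (the tag and ID are illustrative):
fluree memory recall "indexer" --tags indexer -n 20
fluree memory forget mem:fact-01JD...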
Customizing the rules file
When you run fluree memory mcp-install, a short “rules file” gets written alongside the MCP server config. This file tells your AI tool when and how to use the memory tools — things the tool definitions alone don’t express.
Where it lives
| IDE | Rules file |
|---|---|
| Claude Code | Section appended to <repo>/CLAUDE.md |
| Cursor | <repo>/.cursor/rules/fluree_rules.md |
| VS Code | <repo>/.vscode/fluree_rules.md |
| Windsurf | Not written — add your own guidance to Windsurf’s memory / rules UI |
| Zed | Not written — add your own guidance via Zed’s assistant settings |
The canonical source for the default text lives in fluree-db-memory/rules/fluree_rules.md in the repo; the Cursor and VS Code installers copy it verbatim. The Claude Code installer appends a short variant directly to CLAUDE.md. Windsurf and Zed don’t have a conventional per-project rules-file slot that mcp-install targets automatically — the paragraph below is a reasonable starting point if you want to paste one in yourself.
What the default says
A minimal set of instructions along these lines:
- Before starting a task: call memory_recall with a query describing what you’re about to do. Review the top matches for constraints, decisions, and relevant facts.
- After learning something reusable: call memory_add with the appropriate kind:
  - fact — verifiable truths about the codebase (use --refs for file pointers)
  - decision — choices with rationale (use --rationale)
  - constraint — rules with severity (use --severity must/should/prefer)
- Don’t re-ask the user for things that are already in memory.
Customizing
Edit the file freely. Common tweaks:
- Add domain-specific guidance: “When working on the indexer, always recall with the indexer tag first.”
- Tighten the defaults: “Only call memory_add for memories that will apply in future sessions — not for task-specific scratch.”
- Shape the kinds: “Use fact with --refs when the memory is really a pointer to a file or symbol.”
Reloading
- Cursor / VS Code: reload the window after editing.
- Claude Code: appending to CLAUDE.md takes effect on the next session.
- Zed: the agent reads settings on connection — reload.
Keeping team customizations shared
If you edit the rules file and like what you got, commit it. Teammates get your tuning automatically on their next pull. The rules file is just markdown — treat it like any other piece of team guidance.
Migrating from plain-markdown memory
Many teams start with one big markdown file that their AI tool reads on every session — CLAUDE.md, AGENTS.md, .cursorrules, .windsurfrules, or a section in README.md. These files work until they don’t: they bloat context, mix levels (architectural rules next to “the CI flag is --all-features”), and rot silently.
Here’s a pragmatic migration from that world to structured memory.
Phase 1: leave the markdown alone
You don’t have to delete anything to start using Fluree Memory. Add memories for new things you learn while keeping the old file around. After a week or two of active use, you’ll have a sense of which things belong where.
fluree memory init
# ...work, capture things as they come up...
fluree memory add --kind constraint --severity must \
--text "All public fns must have doc comments" --tags code-style
Phase 2: categorize the markdown file
Open the old file and go paragraph by paragraph. For each chunk, ask:
| Chunk type | Where it goes |
|---|---|
| High-level overview / architecture prose | Stays in markdown (README, ARCHITECTURE.md) |
| Rules (“do this”, “don’t do that”) | → constraint memories with --severity |
| Choices + reasoning | → decision memories with --rationale |
| Named quirks / gotchas | → fact memories |
| “Look here for X” | → fact memories with --refs |
| Personal preferences | → fact memories (--scope user usually) |
The markdown file that’s left after this should be genuinely about framing — the 30-second project tour — not a knowledge base.
Phase 3: move the categorized chunks
Turn each chunk into a memory add call. Tag consistently so things group later:
fluree memory add --kind constraint --severity must \
--text "Never commit secrets; use environment variables" \
--tags security,secrets
fluree memory add --kind decision \
--text "Use postcard for index encoding" \
--rationale "no_std compatible, smaller than bincode" \
--alternatives "bincode, CBOR, MessagePack" \
--refs fluree-db-indexer/ \
--tags indexer,encoding
fluree memory add --kind fact \
--text "Error pattern defined here" \
--refs fluree-db-core/src/error.rs \
--tags errors
If you want to script it, pipe content into fluree memory add on stdin, setting --kind / --tags on each invocation; add reads stdin when --text is omitted:
echo "The index format uses postcard encoding" \
| fluree memory add --kind fact --tags indexer
Phase 4: trim the old file
Once the chunks are in memory, delete them from the markdown. What’s left is your high-level orientation doc, which is fine.
Leave a pointer at the top:
> Detailed conventions, rules, and decisions are in Fluree Memory.
> Use `memory_recall` from an MCP-enabled IDE, or `fluree memory recall "..."` from the shell.
Phase 5: review
Run fluree memory status and fluree memory recall "" -n 50 to eyeball everything. Look for:
- Duplicates — memories that say nearly the same thing with different wording (see the snippet below for a quick check).
- Mis-categorized kinds — a “decision” with no rationale is really a fact.
- Over-long content — memories should be paragraphs at most, not pages. Break up if needed.
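For the duplicates check, comparing content strings from an export catches exact repeats (near-duplicates still need a human skim):
fluree memory export | jq -r '.[].content' | sort | uniq -d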
Why this is worth doing
| Plain markdown | Structured memory |
|---|---|
| Entire file loaded every session | Only relevant matches loaded |
| No filtering | Filter by kind, tag, scope |
| No history | Full history via git log -p |
| Hard to share a slice | export + jq / curated recall |
| Drifts silently | status visibility + curation flow |
You get a knowledge base the team can actually maintain — and that costs fewer tokens per session than the markdown file it replaces.
CLI reference
The fluree memory subcommands, roughly in workflow order.
| Command | Purpose |
|---|---|
init | Create the memory store and optionally configure MCP for detected AI tools |
add | Store a new memory |
recall | Search and rank relevant memories |
update | Update an existing memory in place |
forget | Retract a memory permanently |
status | Summary of the store (totals, tags, kinds) |
export / import | Round-trip memories as JSON |
mcp-install | Install MCP config for an IDE |
Several subcommands take a --format flag (text for humans, json for scripts, and context on recall for XML intended for LLM injection). The default is always text.
The common options
A few flags show up across many subcommands:
| Flag | Default | Where |
|---|---|---|
--scope <repo|user> | repo | add; filter on recall |
--tags <t1,t2> | none | add, update; filter on recall |
--kind <kind> | fact on add | add; filter on recall |
--format <text|json> | text | add, update |
--format <text|json|context> | text | recall (XML context is for LLM injection) |
See What is a memory? for the kind taxonomy.
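For scripting, the json format pipes cleanly into other tools; the jq filter here is illustrative and assumes the recall output shape documented below:
fluree memory recall "testing" --format json | jq -r '.memories[].memory.id'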
Environment
| Variable | Effect |
|---|---|
FLUREE_HOME | When set, the CLI and MCP server use this path as the unified Fluree directory. If unset, both walk up from CWD looking for an existing .fluree/; if none is found, they fall back to a platform-global config/data directory. |
Set FLUREE_HOME=<repo>/.fluree if you need to force repo-scoped operation from a shell that starts elsewhere. Among the IDE integrations, only the Cursor MCP config sets this automatically via ${workspaceFolder}; the others rely on the walk-up behavior from spawn CWD.
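For example, to pin a shell session to a specific repo (the path is a placeholder):
export FLUREE_HOME=/path/to/project/.fluree
fluree memory status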
fluree memory init
Initialize the memory store and optionally configure MCP for detected AI coding tools. Idempotent — safe to run repeatedly.
fluree memory init [OPTIONS]
Options
| Option | Description |
|---|---|
--yes, -y | Auto-confirm all MCP installations (non-interactive) |
--no-mcp | Skip AI tool detection and MCP configuration entirely |
What init does
- Creates the __memory ledger inside <repo>/.fluree/ and transacts the memory schema.
- Creates .fluree-memory/ at the project root:
  - repo.ttl — team memories (empty to start; meant to be committed)
  - .local/user.ttl — your personal memories (gitignored)
  - .gitignore — pre-configured with .local/ (which holds your user scope and the MCP log)
- Migrates existing memories — if the ledger already has memories (e.g. from before the TTL file layout), they’re exported into the appropriate .ttl file.
- Detects AI coding tools (Claude Code, Cursor, VS Code, Windsurf, Zed) and offers to install MCP for each.
Example
$ fluree memory init
Memory store initialized at /path/to/project/.fluree-memory
Repo memories are stored in .fluree-memory/repo.ttl (git-tracked).
Commit this directory to share project knowledge with your team.
Detected AI coding tools:
- Claude Code (already configured)
- Cursor
- VS Code (Copilot) (already configured)
Install MCP config for Cursor? [Y/n] Y
Installed: .cursor/mcp.json
Installed: .cursor/rules/fluree_rules.md
Configured 1 tool.
With --yes: auto-confirms all installations without prompting. In a non-interactive shell (piped stdin) without --yes, MCP installation is skipped with a message.
Re-running
init is safe to run again. It won’t re-create or overwrite files that already exist; it just:
- Checks that the ledger and schema are current (migrating if not).
- Detects IDEs you’ve since installed and offers to configure them.
- Leaves existing memories untouched.
Run it again after:
- Installing a new AI tool you want to wire up.
- Cloning a repo someone else set up — init will pick up the committed repo.ttl and load it into the ledger automatically.
fluree memory add
Store a new memory.
fluree memory add [OPTIONS]
Options
| Option | Description |
|---|---|
--kind <KIND> | fact (default), decision, constraint |
--text <TEXT> | Content text (or provide via stdin) |
--tags <T1,T2> | Required. Comma-separated tags — the primary recall signal |
--refs <R1,R2> | Comma-separated file/artifact references |
--severity <SEV> | For constraints: must, should, prefer |
--scope <SCOPE> | repo (default) or user |
--rationale <TEXT> | Why — the reasoning behind this memory (any kind) |
--alternatives <TEXT> | Alternatives considered (any kind) |
--format <FMT> | text (default) or json |
Examples
# A simple fact
fluree memory add --kind fact \
--text "Tests use cargo nextest" \
--tags testing,cargo
# A hard constraint with rationale
fluree memory add --kind constraint \
--text "Never suppress dead code with an underscore prefix" \
--tags code-style \
--severity must \
--rationale "Underscore-prefixed names hide code from future discovery"
# From stdin (useful for piping from other tools)
echo "The index format uses postcard encoding" \
| fluree memory add --kind fact --tags indexer
# A decision with full context
fluree memory add --kind decision \
--text "Use postcard for compact index encoding" \
--rationale "no_std compatible, smaller output than bincode" \
--alternatives "bincode, CBOR, MessagePack" \
--refs fluree-db-indexer/
# A fact pointing to a file (use --refs for artifact pointers)
fluree memory add --kind fact \
--text "Error pattern defined here" \
--refs fluree-db-core/src/error.rs \
--tags errors
# A personal convention, user-scoped
fluree memory add --kind fact \
--text "Always run clippy with --all-features" \
--scope user \
--tags code-style
Output
Default (text):
Stored memory: mem:fact-01JDXYZ5A2B3C4D5E6F7G8H9J0
json:
{
"id": "mem:fact-01JDXYZ5A2B3C4D5E6F7G8H9J0",
"kind": "fact",
"scope": "repo",
"created_at": "2026-04-14T16:45:12Z"
}
Secret detection
If the content matches a known secret pattern (AWS keys, GitHub tokens, password-bearing URLs, etc.), the sensitive portions are replaced with [REDACTED] before storage and a warning is printed. See Secrets and sensitivity.
Scope and file placement
| Scope | Where it writes |
|---|---|
| --scope repo (default) | Writes to .fluree-memory/repo.ttl — committable |
| --scope user | Writes to .fluree-memory/.local/user.ttl — gitignored |
See Repo vs user memory.
See also
- recall — search stored memories
- update — update an existing memory in place
- What is a memory? — choosing the right kind
fluree memory recall
Search and retrieve relevant memories ranked by BM25 score.
fluree memory recall <QUERY> [OPTIONS]
Arguments
| Argument | Description |
|---|---|
<QUERY> | Natural-language search query (keyword-matched, not semantic) |
Options
| Option | Description |
|---|---|
-n, --limit <N> | Max results per page (default: 3) |
--offset <N> | Skip the first N results — use for pagination (default: 0) |
--kind <KIND> | Filter to a specific memory kind |
--tags <T1,T2> | Filter to memories with these tags |
--scope <SCOPE> | Filter by repo or user |
--format <FMT> | text (default), json, or context (XML for LLM) |
Examples
# Basic recall — returns top 3
fluree memory recall "how to run tests"
# Page through a longer result set
fluree memory recall "how to run tests" --offset 3
fluree memory recall "error handling" -n 10
# Narrow with filters
fluree memory recall "error handling" --kind constraint --tags errors
fluree memory recall "deployment" --scope repo
# XML output designed for LLM context injection
fluree memory recall "testing patterns" --format context
Output
text
Recall: "how to run tests" (2 matches)
1. [score: 13.0] mem:fact-01JDXYZ...
Tests use cargo nextest
Tags: testing, cargo
2. [score: 8.0] mem:fact-01JDABC...
Integration tests use assert_cmd + predicates
Tags: testing
(showing results 1–3; use --offset 3 for more)
json
{
"query": "how to run tests",
"memories": [
{
"memory": {
"id": "mem:fact-01JDXYZ...",
"kind": "fact",
"content": "Tests use cargo nextest",
"tags": ["testing", "cargo"],
"scope": "repo",
"created_at": "2026-02-22T14:00:00Z"
},
"score": 13.0
}
],
"total_count": 13
}
total_count is the total number of memories in the store, not the number of matches — useful for UI context but not for pagination math.
context (XML for LLM injection)
<memory-context>
<memory id="mem:fact-01JDXYZ..." kind="fact" score="13.0">
<content>Tests use cargo nextest</content>
<tags>testing, cargo</tags>
</memory>
<pagination shown="1" offset="0" total_in_store="13" />
</memory-context>
When results are cut off, the pagination element embeds a hint:
<pagination shown="3" offset="0" limit="3" total_in_store="13">
Results 1–3. Use offset=3 to retrieve more.
</pagination>
How ranking works
See Recall and ranking for the full story — BM25 over content plus metadata bonuses for tag, ref, kind, branch, and recency matches. Filters (--kind, --tags, --scope) narrow the candidate set. All local, deterministic, and offline.
See also
- add — store a new memory
- status — store summary
- Recall and ranking
fluree memory update
Update an existing memory in place. The memory keeps the same ID — only the changed fields are modified. History is tracked via git.
fluree memory update <ID> [OPTIONS]
Options
| Option | Description |
|---|---|
--text <TEXT> | New content text |
--tags <T1,T2> | New tags (replaces all existing) |
--refs <R1,R2> | New artifact refs (replaces all existing) |
--format <FMT> | text (default) or json |
Example
fluree memory update mem:fact-01JDXYZ... \
--text "Tests use cargo nextest with --no-fail-fast"
Output:
Updated: mem:fact-01JDXYZ...
The TTL file is rewritten with the updated content. Use git diff to see what changed, or git log -p .fluree-memory/repo.ttl to review the full history.
See also
- forget — retract instead of update
- Updates and forgetting — the update model
fluree memory forget
Retract a memory permanently. Unlike update, forget removes the memory entirely — it stops existing.
fluree memory forget <ID>
Output:
Forgotten: mem:fact-01JDXYZ...
When to forget vs. update
| You think… | Use |
|---|---|
| “This was wrong from the start” | forget |
| “This was right but the world changed” | update |
| “I never want anyone to see this again” | forget |
See Updates and forgetting for more detail.
Forgetting accidentally-committed secrets
Forgetting removes the memory from the ledger and rewrites repo.ttl (or .local/user.ttl) immediately, so the deletion shows up in your next git diff. If a secret value also ended up in git history, you need to scrub the history separately — see Secrets and sensitivity.
Memory history via git
The fluree memory explain command has been removed. Memory history is now tracked via git.
Viewing history
Since updates modify memories in place and the TTL file is rewritten on each change, git log shows the full history:
# Full history of all memory changes
git log -p .fluree-memory/repo.ttl
# Search for changes to a specific memory ID
git log -p -S "mem:fact-01JDXYZ" .fluree-memory/repo.ttl
# Compact one-line summary
git log --oneline .fluree-memory/repo.ttl
Time-travel via Fluree
For richer querying over memory history, import your git history into a Fluree ledger:
fluree create my-memory-ledger --memory
Each git commit becomes a Fluree transaction, enabling time-travel queries over the full evolution of your project’s memory.
See Updates and forgetting for the update model.
fluree memory status
Show a summary of the memory store.
fluree memory status
Output:
Directory: /path/to/project/.fluree-memory
Memory Store: 12 memories, 25 tags
Kinds: 7 fact, 2 decision, 3 constraint
Recent memories:
- [fact] Tests use cargo nextest, not cargo test [cargo, testing]
ID: mem:fact-01JDXYZ...
- [decision] Use postcard for compact index encoding [encoding, indexer]
ID: mem:decision-01JDABC...
Use memory_recall with specific keywords from above to search.
status counts all memories in the store. The “Recent memories” list is included to help agents (or you) pick good keywords for memory_recall.
When it’s useful
- Confirming init worked and the store is live.
- Sanity-checking after an import.
- Quick “how much does this project remember?” check.
For per-memory detail, use recall with a broad query (e.g. fluree memory recall "" -n 100) or export.
fluree memory export / import
Round-trip memories as JSON.
export
Write all memories to stdout as a JSON array.
fluree memory export > memories.json
export takes no options — it emits every memory, both scopes included. To get a single scope, filter with jq or use recall with --scope and a permissive limit.
Output is a flat array of full memory objects:
[
{
"id": "mem:fact-01JDXYZ...",
"kind": "fact",
"content": "Tests use cargo nextest",
"tags": ["testing", "cargo"],
"scope": "repo",
"severity": null,
"artifact_refs": [],
"branch": "main",
"created_at": "2026-02-22T14:00:00Z"
}
]
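To keep only one scope, as mentioned above, a jq filter over the scope field works:
fluree memory export | jq '[ .[] | select(.scope == "repo") ]' > repo-only.json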
import
Load memories from a JSON file produced by export (or a hand-crafted array of the same shape).
fluree memory import memories.json
Import is additive — every entry in the file is re-transacted into the ledger, with secret-detection applied to content, rationale, and alternatives. IDs and timestamps from the source file are preserved. There is no dedup step, so importing the same file twice will double-insert; forget the existing entries first (or import into a freshly-initialized store) if that’s not what you want.
When to use
- Backup / portability — export before a risky refactor.
- Bootstrapping a new repo from another project’s knowledge.
- Sharing a slice of memory out-of-band (e.g. into an issue or wiki).
For normal team sharing, you don’t need export/import — .fluree-memory/repo.ttl is committed to git and everyone who clones + runs fluree memory init picks it up automatically. See Repo vs user memory.
fluree memory mcp-install
Install MCP configuration for an IDE so its agent can use memory tools.
fluree memory mcp-install [--ide <IDE>]
Options
| Option | Description |
|---|---|
--ide <IDE> | Target IDE (auto-detected if omitted) |
Supported IDEs
| Value | Config written | Extras |
|---|---|---|
claude-code | claude mcp add → ~/.claude.json (local scope) | Appends to CLAUDE.md |
vscode | <repo>/.vscode/mcp.json (key: servers) | .vscode/fluree_rules.md |
cursor | <repo>/.cursor/mcp.json (key: mcpServers) | .cursor/rules/fluree_rules.md |
windsurf | ~/.codeium/windsurf/mcp_config.json (global) | — |
zed | <repo>/.zed/settings.json (key: context_servers) | Skips if JSONC detected |
Legacy aliases: claude-vscode and github-copilot both map to vscode.
When --ide is omitted, the first unconfigured detected tool is used; defaults to claude-code if nothing’s detected.
Example
fluree memory mcp-install --ide cursor
Output:
Installed: .cursor/mcp.json
Installed: .cursor/rules/fluree_rules.md
Per-IDE config shape
The JSON mcp-install writes differs per IDE:
Cursor (.cursor/mcp.json) is the only target that sets FLUREE_HOME by default. It uses ${workspaceFolder} interpolation to pin the memory store to the current workspace regardless of where Cursor spawns the process from:
{
"mcpServers": {
"fluree-memory": {
"type": "stdio",
"command": "fluree",
"args": ["mcp", "serve", "--transport", "stdio"],
"env": { "FLUREE_HOME": "${workspaceFolder}/.fluree" }
}
}
}
VS Code, Windsurf, Zed, Claude Code get a simpler entry with no env:
{
"command": "fluree",
"args": ["mcp", "serve", "--transport", "stdio"]
}
(The top-level wrapper key differs — servers for VS Code, mcpServers for Windsurf, context_servers for Zed. Claude Code’s entry is registered globally via claude mcp add.)
These rely on the MCP server’s walk-up behavior: on start, it looks for .fluree/ beginning at its spawn CWD. That’s usually the workspace, but if the IDE starts it elsewhere memory may land in a global store. See the troubleshooting section below.
Troubleshooting: repo vs global memory
Repo-scoped (the goal):
- Memories: <repo>/.fluree-memory/repo.ttl
- MCP log: <repo>/.fluree-memory/.local/mcp.log (truncated on each server start — tail it while reproducing the issue)
Global (something’s wrong):
- Memories under the platform default, e.g. ~/Library/Application Support/fluree/ on macOS
- Fix: add an explicit absolute FLUREE_HOME to the MCP config entry, pointing at your repo’s .fluree/, and fully restart (not just reload) the IDE — see the sketch below. For Cursor, the ${workspaceFolder}-based default should already be in place — re-run mcp-install from inside the repo if it’s missing.
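As a sketch, a VS Code .vscode/mcp.json entry pinned to a repo could look like this (the absolute path is a placeholder; the entry shape follows the examples above):
{
  "servers": {
    "fluree-memory": {
      "command": "fluree",
      "args": ["mcp", "serve", "--transport", "stdio"],
      "env": { "FLUREE_HOME": "/absolute/path/to/repo/.fluree" }
    }
  }
}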
See also
- Concepts: MCP server — what tools are exposed
- Getting started: Claude Code / Cursor / VS Code / Windsurf / Zed — per-IDE walkthroughs
Reference
Lookups — the things you open once to check a specific detail.
- IDE support matrix — what’s supported where, config paths, gotchas.
- Schema (mem: vocabulary) — the RDF shape of a memory.
- TTL file format — structure of
repo.ttlanduser.ttl.
IDE support matrix
Where each supported AI coding tool stores its MCP config and its rules file, and whether the config is scoped per-repo or global.
| IDE | MCP config | Config scope | FLUREE_HOME set? | Rules file | mcp-install value |
|---|---|---|---|---|---|
| Claude Code | ~/.claude.json (via claude mcp add) | user (local) | no | section appended to <repo>/CLAUDE.md | claude-code |
| Cursor | <repo>/.cursor/mcp.json | repo | yes — ${workspaceFolder}/.fluree | <repo>/.cursor/rules/fluree_rules.md | cursor |
| VS Code (Copilot) | <repo>/.vscode/mcp.json | repo | no | <repo>/.vscode/fluree_rules.md | vscode |
| Windsurf | ~/.codeium/windsurf/mcp_config.json | global | no | none | windsurf |
| Zed | <repo>/.zed/settings.json | repo | no | none (skipped if JSONC) | zed |
Legacy aliases:
- claude-vscode → vscode
- github-copilot → vscode
FLUREE_HOME and repo scoping
Only the Cursor config sets FLUREE_HOME automatically. For the other IDEs, the MCP server figures out which repo it’s serving by walking up from its spawn CWD until it finds a .fluree/ directory. In normal use the IDE spawns the server from the workspace root, so this works without extra configuration.
If memory ends up in a platform-global store instead of <repo>/.fluree-memory/, the fix is to add FLUREE_HOME manually to the relevant MCP config, pointing at an absolute path (or a variable the IDE interpolates — Cursor supports ${workspaceFolder}; other IDEs’ support varies). Then restart the IDE.
Known gotchas
- Zed + JSONC: If .zed/settings.json contains // comments, mcp-install refuses to write to avoid corrupting your settings. Paste the snippet yourself or strip comments first.
- Windsurf globals: Windsurf’s MCP config is user-global, not per-repo. If you work across multiple repos, you likely need to leave FLUREE_HOME unset and rely on walk-up — or switch the env var per project manually.
- Cursor restarts: Cursor caches MCP servers aggressively. If a change to .cursor/mcp.json doesn’t take effect, fully quit Cursor (Cmd-Q on macOS) rather than just reloading the window.
- Claude Code CLAUDE.md: The rules section is appended at the end of CLAUDE.md (only if one doesn’t already mention fluree memory or memory_recall). If you have a large existing CLAUDE.md, make sure the agent is actually reading to the end.
Schema (mem: vocabulary)
Every memory is a set of RDF triples. The mem: vocabulary defines the classes and predicates.
Namespace
@prefix mem: <https://ns.flur.ee/memory#> .
Classes
A memory’s kind is expressed via rdf:type (a in Turtle) — there is no mem:kind predicate.
| Class | Kind |
|---|---|
mem:Fact | fact |
mem:Decision | decision |
mem:Constraint | constraint |
mem:repo and mem:user are additional IRIs used as the range of mem:scope (see below).
Core predicates
| Predicate | Range | Required | Meaning |
|---|---|---|---|
mem:content | xsd:string (indexed as @fulltext) | ✅ | The textual content; BM25-searchable |
mem:scope | IRI — mem:repo or mem:user | ✅ | Which TTL file it lives in |
mem:createdAt | xsd:dateTime | ✅ | Insertion timestamp |
mem:tag | xsd:string (multi-valued) | optional | Free-form tags |
mem:artifactRef | xsd:string (multi-valued) | optional | File / symbol / URL references |
mem:branch | xsd:string | optional | Git branch captured at write time |
Optional predicates (any kind)
These predicates can appear on any memory kind. All values are stored as plain string literals (not IRIs).
| Predicate | Range | Meaning |
|---|---|---|
mem:rationale | xsd:string (indexed as @fulltext) | Why — the reasoning behind this memory |
mem:alternatives | xsd:string | What else was considered |
mem:severity | xsd:string — "must", "should", or "prefer" | How hard a constraint is (constraints only) |
ID format
Memory IRIs take the shape:
mem:<kind>-<ULID>
Examples:
mem:fact-01JDXYZ5A2B3C4D5E6F7G8H9J0
mem:decision-01JDABC6D7E8F9G0H1I2J3K4L5
mem:constraint-01JDLMN7O8P9Q0R1S2T3U4V5W6
ULIDs are sortable by creation time, which is why memories display nicely in chronological order without an explicit index.
Full example
@prefix mem: <https://ns.flur.ee/memory#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
mem:decision-01JDABC a mem:Decision ;
mem:content "Use postcard for compact index encoding" ;
mem:tag "encoding" ;
mem:tag "indexer" ;
mem:scope mem:repo ;
mem:artifactRef "fluree-db-indexer/" ;
mem:createdAt "2026-02-22T14:00:00Z"^^xsd:dateTime ;
mem:rationale "no_std compatible, smaller output than bincode" ;
mem:alternatives "bincode, CBOR, MessagePack" .
See also: TTL file format for how this shows up on disk.
TTL file format
The .fluree-memory/repo.ttl and .fluree-memory/.local/user.ttl files hold the serialized form of every memory in their respective scope. Each memory is a block of Turtle triples.
Structure
Each memory is a Turtle subject block: the IRI, followed by a mem:<Kind> (RDF type), then a predicate list in a canonical order. Multi-valued predicates (mem:tag, mem:artifactRef) repeat once per value.
# Fluree Memory — repo-scoped
# Auto-managed by `fluree memory`. Manual edits are supported.
@prefix mem: <https://ns.flur.ee/memory#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
mem:fact-01JDXYZ a mem:Fact ;
mem:content "Tests use cargo nextest" ;
mem:tag "cargo" ;
mem:tag "testing" ;
mem:scope mem:repo ;
mem:createdAt "2026-02-22T14:00:00Z"^^xsd:dateTime .
mem:decision-01JDABC a mem:Decision ;
mem:content "Use postcard for compact index encoding" ;
mem:tag "encoding" ;
mem:tag "indexer" ;
mem:scope mem:repo ;
mem:artifactRef "fluree-db-indexer/" ;
mem:createdAt "2026-02-22T14:05:00Z"^^xsd:dateTime ;
mem:rationale "no_std compatible, smaller output than bincode" ;
mem:alternatives "bincode, CBOR, MessagePack" .
Tags and artifact refs are sorted alphabetically within a memory for deterministic diffs. When a memory is updated, the TTL file is rewritten with the changes in place and git tracks the history.
Why TTL and not JSON
Three reasons:
- Diff-friendly — predicates are one per line within a subject block, so git diffs are readable. Memories are sorted by (branch, id), which groups memories from the same branch together and reduces merge conflicts across feature branches.
- Merge-friendly — because the sort distributes memories by originating branch, two feature branches adding memories will insert into different regions of the file and won’t conflict on merge.
- Semantically exact — Turtle is RDF, so there’s no impedance mismatch between what’s in the file and what’s in the __memory ledger.
Sync direction
The TTL file is the canonical store for a given scope. The __memory ledger is a derived cache rebuilt from the TTL files when they change.
When you memory add, the CLI / MCP server:
- Rewrites the TTL file with the new memory inserted in sorted position (authoritative).
- Transacts the new triples into the __memory ledger (so recall is fast).
- Writes a content-hash watermark to .fluree-memory/.local/build-hash.
If the ledger write fails, the hash is left stale and the next ensure_synced call rebuilds the ledger from the files. When git pulls in a new version of repo.ttl, the hash mismatch triggers the same rebuild. In practice this is invisible.
Editing by hand
You can edit repo.ttl or user.ttl directly if you need to — fix a typo, reorder, batch-retag. After editing:
fluree memory status
…to verify the store parses cleanly. If there’s a syntax error, status will point at it.
For most fixes, though, prefer update / forget — they’ll produce cleaner git history than hand-edits.
File size
TTL is compact. A project with ~200 memories typically lands under 50 KB. At that size, repo.ttl stays pleasant to review in a PR.
If a file grows past that, consider whether you’re memorizing task state instead of durable knowledge — a fluree memory status + skim + cleanup pass is usually all it takes.
Operations
This section covers operational aspects of running Fluree in production, including configuration, storage backends, monitoring, and administrative operations.
Operation Guides
Configuration
Server configuration options:
- Command-line flags
- Configuration files
- Environment variables
- Runtime settings
- Tuning parameters
Running with Docker
Configuring the official fluree/server image:
- Image internals (entrypoint, volumes, runtime user)
- Three configuration approaches: env vars, mounted JSON-LD/TOML config, CLI flags
- Common recipes: LRU cache sizing, background indexing, auth, S3+DynamoDB, query peers
- Full annotated Docker Compose example
- Troubleshooting (volume permissions, RUST_LOG vs FLUREE_LOG_LEVEL, cache auto-sizing under cgroup limits)
Storage Modes
Storage backend options:
- Memory storage (development)
- File system storage (single server)
- AWS S3/DynamoDB (distributed)
- IPFS / Kubo (decentralized)
- Storage selection criteria
- Switching between storage modes
IPFS Storage
IPFS-specific setup and configuration:
- Kubo node installation and setup
- JSON-LD configuration fields
- Content addressing and CID mapping
- Pinning strategies
- Operational considerations
DynamoDB Nameservice
DynamoDB-specific setup and configuration:
- Table creation (CLI, CloudFormation, Terraform)
- Schema reference (v2 attributes)
- AWS credentials and permissions
- Local development with LocalStack
- Production considerations
Telemetry and Logging
Monitoring and observability:
- Logging configuration
- Metrics collection
- Tracing
- Health monitoring
- Performance metrics
- Integration with monitoring systems
Admin, Health, and Stats
Administrative operations:
- Health check endpoints
- Server statistics
- Manual indexing triggers
- Backup and restore
- Maintenance operations
Query peers and replication
Run fluree-server as a read-only query peer:
- SSE nameservice events (GET /v1/fluree/events)
- Peer mode (refresh on stale + write forwarding)
- Storage proxy endpoints (/v1/fluree/storage/*) for private-storage deployments
Deployment Patterns
Development
Single-process, memory storage:
./fluree-db-server --storage memory --log-level debug
Single Server Production
File-based storage:
./fluree-db-server \
--storage file \
--data-dir /var/lib/fluree \
--port 8090 \
--log-level info
Distributed Production
AWS-backed distributed deployment:
./fluree-db-server \
--storage aws \
--s3-bucket fluree-prod-data \
--s3-region us-east-1 \
--dynamodb-table fluree-nameservice \
--port 8090
Key Configuration Areas
Server Settings
- Port and host binding
- TLS/SSL certificates
- Request size limits
- Timeout values
- CORS configuration
Storage Configuration
- Storage mode selection
- Data directory (file mode)
- AWS credentials (S3 mode)
- IPFS / Kubo connection (IPFS mode)
- Connection pooling
- Cache settings
Indexing Configuration
- Index interval
- Batch size
- Memory allocation
- Number of threads
- Index retention
Security Configuration
- Authentication mode
- API key requirements
- Signed request validation
- Policy enforcement
- Rate limiting
Monitoring
Health Checks
curl http://localhost:8090/health
Response:
{
"status": "healthy",
"version": "0.1.0",
"storage": "file",
"uptime_ms": 3600000
}
Server Statistics
curl http://localhost:8090/v1/fluree/stats
Response:
{
"version": "0.1.0",
"uptime_ms": 3600000,
"ledgers": 5,
"queries": {
"total": 12345,
"active": 3,
"avg_duration_ms": 45
},
"transactions": {
"total": 567,
"avg_duration_ms": 89
},
"indexing": {
"active": true,
"pending_ledgers": 1,
"avg_lag_ms": 1500
}
}
Metrics Collection
Use GET /v1/fluree/stats for built-in server statistics. Prometheus-style
/metrics export is not currently part of the standalone server API.
Operational Tasks
Backup
File storage backup:
# Backup data directory
tar -czf fluree-backup-$(date +%Y%m%d).tar.gz /var/lib/fluree/
AWS storage backup:
# S3 versioning enabled - automatic backups
aws s3 ls s3://fluree-prod-data/ --recursive
# Point-in-time recovery via S3 versions
Restore
File storage restore:
# Stop server
systemctl stop fluree
# Restore backup
tar -xzf fluree-backup-20240122.tar.gz -C /
# Start server
systemctl start fluree
Manual Indexing
Trigger indexing manually:
curl -X POST http://localhost:8090/v1/fluree/reindex \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb:main"}'
Compaction
There is no standalone HTTP compaction endpoint. Reindexing rebuilds index artifacts when you need to force a full refresh.
Performance Tuning
Memory Settings
./fluree-db-server \
--query-memory-mb 2048 \
--cache-size-mb 1024
Indexing Tuning
fluree-server \
--indexing-enabled \
--reindex-min-bytes 100000 \
--reindex-max-bytes 1000000
Query Tuning
./fluree-db-server \
--query-timeout-ms 30000 \
--max-query-size 1048576 \
--query-threads 8
High Availability
Load Balancing
Run multiple Fluree instances behind load balancer:
┌─────────────┐
│ Clients │
└──────┬──────┘
│
┌──────▼──────┐
│ Load │
│ Balancer │
└──────┬──────┘
│
┌────────────┼────────────┐
│ │ │
┌───▼────┐ ┌───▼────┐ ┌───▼────┐
│Fluree 1│ │Fluree 2│ │Fluree 3│
└───┬────┘ └───┬────┘ └───┬────┘
│ │ │
└───────────┼───────────┘
│
┌──────▼──────┐
│ S3/Dynamo │
│ Nameservice│
└─────────────┘
Failover
Configure health checks in load balancer:
health_check:
path: /health
interval: 10s
timeout: 5s
healthy_threshold: 2
unhealthy_threshold: 3
Security Hardening
TLS/SSL
./fluree-db-server \
--tls-cert /path/to/cert.pem \
--tls-key /path/to/key.pem \
--tls-ca /path/to/ca.pem
Require Authentication
./fluree-db-server \
--require-auth \
--require-signed-requests
Rate Limiting
./fluree-db-server \
--rate-limit-queries 100 \
--rate-limit-transactions 10 \
--rate-limit-window 60
Best Practices
1. Use Appropriate Storage Mode
- Development: memory
- Single server: file
- Production/Distributed: AWS
- Decentralized: IPFS
2. Enable Monitoring
Set up monitoring for:
- Health status
- Query latency
- Transaction rate
- Indexing lag
- Error rates
3. Regular Backups
Automate backups:
# Daily backup cron
0 2 * * * /usr/local/bin/backup-fluree.sh
4. Capacity Planning
Monitor growth:
- Storage usage
- Query volume
- Transaction rate
- Index sizes
5. Security Best Practices
- Use TLS in production
- Require authentication
- Enable rate limiting
- Regular security audits
6. Log Management
- Rotate logs regularly
- Ship logs to centralized system
- Set appropriate log levels
- Monitor error rates
Related Documentation
- Configuration - Detailed configuration reference
- Storage - Storage backend details
- Telemetry - Monitoring and metrics
- Admin and Health - Administrative operations
- Getting Started: Server - Initial setup
Configuration
Fluree server is configured via a configuration file, command-line flags, and environment variables.
Configuration Methods
Configuration File (TOML, JSON, or JSON-LD)
The server reads configuration from .fluree/config.toml (or .fluree/config.jsonld) — the same file used by the Fluree CLI. Server settings live under the [server] section (or "server" key in JSON/JSON-LD). The server walks up from the current working directory looking for .fluree/config.toml or .fluree/config.jsonld, falling back to the global Fluree config directory ($FLUREE_HOME, or the platform config directory — see table below).
Global Directory Layout
When $FLUREE_HOME is set, both config and data share that single directory. When it is not set, the platform’s config and data directories are used:
| Content | Linux | macOS | Windows |
|---|---|---|---|
Config (config.toml) | ~/.config/fluree | ~/Library/Application Support/fluree | %LOCALAPPDATA%\fluree |
Data (storage/, active) | ~/.local/share/fluree | ~/Library/Application Support/fluree | %LOCALAPPDATA%\fluree |
On Linux, config and data directories are separated per the XDG Base Directory specification. On macOS and Windows both resolve to the same directory. When directories are split, fluree init --global writes an absolute storage_path into config.toml so the server can locate the data directory regardless of working directory.
# Use default config file discovery
fluree-server
# Override config file path
fluree-server --config /etc/fluree/config.toml
# Activate a profile
fluree-server --profile prod
Example config.toml:
[server]
listen_addr = "0.0.0.0:8090"
storage_path = "/var/lib/fluree"
log_level = "info"
# cache_max_mb = 4096 # global cache budget (MB); default: tiered fraction of RAM (30% <4GB, 40% 4-8GB, 50% ≥8GB)
[server.indexing]
enabled = true
reindex_min_bytes = 100000
# reindex_max_bytes defaults to 20% of system RAM; override only if needed:
# reindex_max_bytes = 536870912 # 512 MB
[server.auth.data]
mode = "required"
trusted_issuers = ["did:key:z6Mk..."]
JSON is also supported (detected by .json file extension):
{
"server": {
"listen_addr": "0.0.0.0:8090",
"storage_path": "/var/lib/fluree",
"indexing": { "enabled": true }
}
}
JSON-LD Format
JSON-LD config files (.jsonld extension) add a @context that maps config keys to the Fluree config vocabulary (https://ns.flur.ee/config#), making the file valid JSON-LD. Generate one with:
fluree init --format jsonld
Example .fluree/config.jsonld:
{
"@context": {
"@vocab": "https://ns.flur.ee/config#"
},
"_comment": "Fluree Configuration — JSON-LD format.",
"server": {
"listen_addr": "0.0.0.0:8090",
"storage_path": ".fluree/storage",
"log_level": "info",
"indexing": {
"enabled": true,
"reindex_min_bytes": 100000
}
},
"profiles": {
"prod": {
"server": {
"log_level": "warn"
}
}
}
}
The @context is validated at load time (using the JSON-LD parser) but does not affect config value resolution — serde ignores unknown keys like @context and _comment. If both config.toml and config.jsonld exist in the same directory, TOML takes precedence and a warning is logged.
Profiles
Profiles allow environment-specific overrides. Define them in [profiles.<name>.server] and activate with --profile <name>:
[server]
log_level = "info"
[profiles.dev.server]
log_level = "debug"
[profiles.prod.server]
log_level = "warn"
[profiles.prod.server.indexing]
enabled = true
[profiles.prod.server.auth.data]
mode = "required"
Profile values are deep-merged onto [server] — only the fields present in the profile are overridden.
Command-Line Flags
fluree-server \
--listen-addr 0.0.0.0:8090 \
--storage-path /var/lib/fluree \
--log-level info
Environment Variables
All CLI flags have corresponding environment variables with FLUREE_ prefix:
export FLUREE_LISTEN_ADDR=0.0.0.0:8090
export FLUREE_STORAGE_PATH=/var/lib/fluree
export FLUREE_LOG_LEVEL=info
fluree-server
Precedence
Configuration precedence (highest to lowest):
- Command-line flags
- Environment variables
- Profile overrides ([profiles.<name>.server])
- Config file ([server])
- Built-in defaults
Error Handling
If --config or --profile is specified and the configuration cannot be loaded (file not found, parse error, missing profile), the server exits with an error. This prevents silent misconfiguration in production.
If the config file is auto-discovered (no explicit --config) and cannot be parsed, the server logs a warning and continues with CLI/env/default values only.
Server Configuration
Listen Address
Address and port to bind to:
| Flag | Env Var | Default |
|---|---|---|
--listen-addr | FLUREE_LISTEN_ADDR | 0.0.0.0:8090 |
fluree-server --listen-addr 0.0.0.0:9090
Storage Path
Path for file-based storage. If not specified, defaults to .fluree/storage relative to the working directory (the same location used by fluree init):
| Flag | Env Var | Default |
|---|---|---|
--storage-path | FLUREE_STORAGE_PATH | .fluree/storage |
# Explicit storage path (e.g. production)
fluree-server --storage-path /var/lib/fluree
# Default: uses .fluree/storage in the working directory
fluree-server
Connection Configuration (S3, DynamoDB, etc.)
For storage backends beyond local files — S3, DynamoDB nameservice, split commit/index storage, encryption — use a JSON-LD connection config file:
| Flag | Env Var | Default |
|---|---|---|
--connection-config | FLUREE_CONNECTION_CONFIG | None |
When set, the server builds its storage and nameservice from the connection config file instead of using --storage-path. The file uses the same JSON-LD format as the Fluree API connection config.
# S3 + DynamoDB via connection config
fluree server run --connection-config /etc/fluree/connection.jsonld
# Or via environment variable
FLUREE_CONNECTION_CONFIG=/etc/fluree/connection.jsonld fluree server run
Example connection config (connection.jsonld):
{
"@context": {
"@base": "https://ns.flur.ee/config/connection/",
"@vocab": "https://ns.flur.ee/system#"
},
"@graph": [
{
"@id": "commitStorage",
"@type": "Storage",
"s3Bucket": "fluree-commits",
"s3Prefix": "fluree-data/"
},
{
"@id": "indexStorage",
"@type": "Storage",
"s3Bucket": "fluree-indexes--use1-az4--x-s3"
},
{
"@id": "publisher",
"@type": "Publisher",
"dynamodbTable": "fluree-nameservice",
"dynamodbRegion": "us-east-1"
},
{
"@id": "conn",
"@type": "Connection",
"commitStorage": { "@id": "commitStorage" },
"indexStorage": { "@id": "indexStorage" },
"primaryPublisher": { "@id": "publisher" }
}
]
}
Behavior notes:
- --connection-config and --storage-path are mutually exclusive. If both are set, --connection-config takes precedence (a warning is logged).
- Server-level settings (--cache-max-mb, --indexing-enabled, --reindex-min-bytes, --reindex-max-bytes) override any equivalent values from the connection config.
- --indexing-enabled defaults to true. Pass --indexing-enabled=false only when a separate peer/indexer process owns index maintenance for the same storage.
- AWS credentials and region are resolved via the standard AWS SDK chain (env vars, instance profile, ~/.aws/config, etc.) — they are not part of the connection config.
- The connection config can use envVar indirection for sensitive fields like S3 bucket names or encryption keys (see ConfigurationValue; a sketch follows below).
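As a rough sketch only — the exact ConfigurationValue shape is an assumption here and should be checked against the reference — an envVar-indirected field might look like:
{
  "@id": "commitStorage",
  "@type": "Storage",
  "_comment": "envVar indirection shape is an assumption; see ConfigurationValue",
  "s3Bucket": { "envVar": "FLUREE_COMMITS_BUCKET" }
}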
Config file equivalent:
[server]
connection_config = "/etc/fluree/connection.jsonld"
Capabilities by Backend
Not all nameservice backends support all features. The server checks capabilities at runtime:
| Feature | File (local) | DynamoDB | Storage-backed |
|---|---|---|---|
| Query / transact | Yes | Yes | Yes |
| Event subscriptions | Yes | No | No |
| Default context (read) | Yes | Yes | Yes |
| Default context (write) | Yes | Yes | No |
If a capability is not available, the server returns an appropriate error (e.g., 501 for event subscriptions with DynamoDB).
CORS
Enable Cross-Origin Resource Sharing:
| Flag | Env Var | Default |
|---|---|---|
--cors-enabled | FLUREE_CORS_ENABLED | true |
When enabled, allows requests from any origin.
Body Limit
Maximum request body size in bytes:
| Flag | Env Var | Default |
|---|---|---|
--body-limit | FLUREE_BODY_LIMIT | 52428800 (50MB) |
Log Level
Logging verbosity:
| Flag | Env Var | Default |
|---|---|---|
--log-level | FLUREE_LOG_LEVEL | info |
Options: trace, debug, info, warn, error
Cache Size
Global cache budget (MB):
| Flag | Env Var | Default |
|---|---|---|
--cache-max-mb | FLUREE_CACHE_MAX_MB | 30/40/50% of RAM (tiered: <4GB / 4-8GB / ≥8GB) |
Background Indexing
Enable background indexing and configure novelty backpressure thresholds:
| Flag | Env Var | Default | Description |
|---|---|---|---|
--indexing-enabled | FLUREE_INDEXING_ENABLED | true | Enable background indexing (set false only when an external indexer process owns this storage) |
--reindex-min-bytes | FLUREE_REINDEX_MIN_BYTES | 100000 | Soft threshold (triggers background indexing) |
--reindex-max-bytes | FLUREE_REINDEX_MAX_BYTES | 20% of system RAM (256 MB fallback) | Hard threshold (blocks commits until reindexed) |
Config file equivalent:
[server.indexing]
enabled = true
reindex_min_bytes = 100000 # 100 KB — soft trigger
# reindex_max_bytes = 536870912 # 512 MB — defaults to 20% of system RAM if omitted
Server Role Configuration
Server Role
Operating mode: transaction server or query peer:
| Flag | Env Var | Default |
|---|---|---|
--server-role | FLUREE_SERVER_ROLE | transaction |
Options:
- transaction: Write-enabled, produces events stream
- peer: Read-only, subscribes to transaction server
Transaction Server URL (Peer Mode)
Base URL of the transaction server (required in peer mode):
| Flag | Env Var |
|---|---|
--tx-server-url | FLUREE_TX_SERVER_URL |
fluree-server \
--server-role peer \
--tx-server-url http://tx.internal:8090
Authentication Configuration
Replication vs Query Access
Fluree enforces a hard boundary between replication-scoped and query-scoped access:
- Replication (fluree.storage.*): Raw commit and index block transfer for peer sync and CLI fetch/pull/push. These operations bypass dataset policy (data must be bit-identical). Replication tokens are operator/service-account credentials — never issue them to end users.
- Query (fluree.ledger.read/write.*): Application-level data access through the query engine with full dataset policy enforcement. Query tokens are appropriate for end users and application service accounts.
A user holding only query-scoped tokens cannot clone or pull a ledger. They can fluree track a remote ledger (forwarding queries/transactions to the server) but cannot replicate its storage locally.
Events Endpoint Authentication
Protect the /v1/fluree/events SSE endpoint:
| Flag | Env Var | Default |
|---|---|---|
--events-auth-mode | FLUREE_EVENTS_AUTH_MODE | none |
--events-auth-audience | FLUREE_EVENTS_AUTH_AUDIENCE | None |
--events-auth-trusted-issuer | FLUREE_EVENTS_AUTH_TRUSTED_ISSUERS | None |
Modes:
- none: No authentication
- optional: Accept tokens but don’t require them
- required: Require valid Bearer token
Supports both Ed25519 (embedded JWK) and OIDC/JWKS (RS256) tokens when the oidc feature is enabled and --jwks-issuer is configured. For OIDC tokens, issuer trust is implicit — only tokens signed by keys from configured JWKS endpoints will verify. For Ed25519 tokens, the issuer must appear in --events-auth-trusted-issuer.
# Ed25519 tokens only
fluree-server \
--events-auth-mode required \
--events-auth-trusted-issuer did:key:z6Mk...
# OIDC + Ed25519 (both work simultaneously)
fluree-server \
--events-auth-mode required \
--jwks-issuer "https://auth.example.com=https://auth.example.com/.well-known/jwks.json" \
--events-auth-trusted-issuer did:key:z6Mk...
Data API Authentication
Protect query/transaction endpoints (including /v1/fluree/query/{ledger...},
/v1/fluree/insert/{ledger...}, /v1/fluree/upsert/{ledger...},
/v1/fluree/update/{ledger...}, /v1/fluree/info/{ledger...}, and
/v1/fluree/exists/{ledger...}):
| Flag | Env Var | Default |
|---|---|---|
--data-auth-mode | FLUREE_DATA_AUTH_MODE | none |
--data-auth-audience | FLUREE_DATA_AUTH_AUDIENCE | None |
--data-auth-trusted-issuer | FLUREE_DATA_AUTH_TRUSTED_ISSUERS | None |
--data-auth-default-policy-class | FLUREE_DATA_AUTH_DEFAULT_POLICY_CLASS | None |
Modes:
- none: No authentication (default)
- optional: Accept tokens but don’t require them (development only)
- required: Require either a valid Bearer token or a signed request (JWS/VC)
Bearer token scopes:
- Read: fluree.ledger.read.all=true or fluree.ledger.read.ledgers=[...]
- Write: fluree.ledger.write.all=true or fluree.ledger.write.ledgers=[...]
Back-compat: fluree.storage.* claims imply read scope for data endpoints.
fluree-server \
--data-auth-mode required \
--data-auth-trusted-issuer did:key:z6Mk...
OIDC / JWKS Token Verification
When the oidc feature is enabled, the server can verify JWT tokens signed by external identity
providers (e.g., Fluree Cloud Service) using JWKS (JSON Web Key Set) endpoints. This is in addition to the
existing embedded-JWK (Ed25519 did:key) verification path.
Dual-path dispatch: The server inspects each Bearer token’s header:
- Embedded JWK (Ed25519): Uses the existing verify_jws() path — no JWKS needed.
- kid header (RS256): Uses OIDC/JWKS path — fetches the signing key from the issuer’s JWKS endpoint.
Both paths coexist; no configuration change is needed for existing Ed25519 tokens.
| Flag | Env Var | Default | Description |
|---|---|---|---|
--jwks-issuer | FLUREE_JWKS_ISSUERS | None | OIDC issuer to trust (repeatable) |
--jwks-cache-ttl | FLUREE_JWKS_CACHE_TTL | 300 | JWKS cache TTL in seconds |
The --jwks-issuer flag takes the format <issuer_url>=<jwks_url>:
fluree-server \
--data-auth-mode required \
--jwks-issuer "https://solo.example.com=https://solo.example.com/.well-known/jwks.json"
For multiple issuers, repeat the flag or use comma separation in the env var:
# CLI flags (repeatable)
fluree-server \
--jwks-issuer "https://issuer1.example.com=https://issuer1.example.com/.well-known/jwks.json" \
--jwks-issuer "https://issuer2.example.com=https://issuer2.example.com/.well-known/jwks.json"
# Environment variable (comma-separated)
export FLUREE_JWKS_ISSUERS="https://issuer1.example.com=https://issuer1.example.com/.well-known/jwks.json,https://issuer2.example.com=https://issuer2.example.com/.well-known/jwks.json"
Behavior details:
- JWKS endpoints are fetched at startup (warm()), but the server starts even if they're unreachable.
- Keys are cached and refreshed when a kid miss occurs (rate-limited to one refresh per issuer every 10 seconds).
- The token's iss claim must exactly match a configured issuer URL — unconfigured issuers are rejected immediately with a clear error.
- Data API, events, admin, and storage proxy endpoints all support JWKS verification. A single --jwks-issuer flag enables OIDC tokens across all endpoint groups. MCP auth continues to use the existing Ed25519 path only.
Connection-Scoped SPARQL Scope Enforcement
When a Bearer token is present for connection-scoped SPARQL queries (/v1/fluree/query with
Content-Type: application/sparql-query), the server enforces ledger scope:
- FROM / FROM NAMED clauses are parsed to extract ledger IDs (name:branch).
- Each ledger ID is checked against the token's read scope (fluree.ledger.read.all or fluree.ledger.read.ledgers).
- Out-of-scope ledgers return 404 (no existence leak).
- If no FROM clause is present, the query proceeds normally (the engine handles missing dataset errors).
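As a sketch of what this looks like from a client — only the endpoint, Content-Type, and 404 behavior are taken from above; the Bearer token, ledger name, and the exact FROM dataset IRI syntax are placeholders (see the SPARQL docs for the real dataset syntax):
# Illustrative only: token, ledger name, and FROM IRI form are placeholders
curl -X POST http://localhost:8090/v1/fluree/query \
  -H "Authorization: Bearer $FLUREE_TOKEN" \
  -H "Content-Type: application/sparql-query" \
  --data-binary 'SELECT ?s ?p ?o FROM <books:main> WHERE { ?s ?p ?o } LIMIT 10'
If books:main is outside the token's read scope, the response is a 404, exactly as if the ledger did not exist.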
Admin Endpoint Authentication
Protect /v1/fluree/create, /v1/fluree/drop, /v1/fluree/reindex, branch
administration, and Iceberg mapping endpoints:
| Flag | Env Var | Default |
|---|---|---|
--admin-auth-mode | FLUREE_ADMIN_AUTH_MODE | none |
--admin-auth-trusted-issuer | FLUREE_ADMIN_AUTH_TRUSTED_ISSUERS | None |
Modes:
- none: No authentication (development)
- required: Require valid Bearer token (production)
Supports both Ed25519 (embedded JWK) and OIDC/JWKS (RS256) tokens when the oidc feature is enabled and --jwks-issuer is configured. For OIDC tokens, issuer trust is implicit — only tokens signed by keys from configured JWKS endpoints will verify. For Ed25519 tokens, the issuer must appear in --admin-auth-trusted-issuer or the fallback --events-auth-trusted-issuer.
# Ed25519 tokens only
fluree-server \
--admin-auth-mode required \
--admin-auth-trusted-issuer did:key:z6Mk...
# OIDC (trust comes from --jwks-issuer, no did:key issuers needed)
fluree-server \
--admin-auth-mode required \
--jwks-issuer "https://auth.example.com=https://auth.example.com/.well-known/jwks.json"
If no admin-specific issuers are configured, falls back to --events-auth-trusted-issuer.
MCP Endpoint Authentication
Protect the /mcp Model Context Protocol endpoint:
| Flag | Env Var | Default |
|---|---|---|
--mcp-enabled | FLUREE_MCP_ENABLED | false |
--mcp-auth-trusted-issuer | FLUREE_MCP_AUTH_TRUSTED_ISSUERS | None |
fluree-server \
--mcp-enabled \
--mcp-auth-trusted-issuer did:key:z6Mk...
Peer Mode Configuration
Peer Subscription
Configure what the peer subscribes to:
| Flag | Description |
|---|---|
--peer-subscribe-all | Subscribe to all ledgers and graph sources |
--peer-ledger <ledger-id> | Subscribe to specific ledger (repeatable) |
--peer-graph-source <ledger-id> | Subscribe to specific graph source (repeatable) |
fluree-server \
--server-role peer \
--tx-server-url http://tx:8090 \
--peer-subscribe-all
Or subscribe to specific resources:
fluree-server \
--server-role peer \
--tx-server-url http://tx:8090 \
--peer-ledger books:main \
--peer-ledger users:main
Peer Events Configuration
| Flag | Env Var | Description |
|---|---|---|
--peer-events-url | FLUREE_PEER_EVENTS_URL | Custom events URL (default: {tx_server_url}/v1/fluree/events) |
--peer-events-token | FLUREE_PEER_EVENTS_TOKEN | Bearer token for events (supports @filepath) |
Peer Reconnection
| Flag | Default | Description |
|---|---|---|
--peer-reconnect-initial-ms | 1000 | Initial reconnect delay |
--peer-reconnect-max-ms | 30000 | Maximum reconnect delay |
--peer-reconnect-multiplier | 2.0 | Backoff multiplier |
Peer Storage Access
| Flag | Env Var | Default |
|---|---|---|
--storage-access-mode | FLUREE_STORAGE_ACCESS_MODE | shared |
Options:
- shared: Direct storage access (requires --storage-path or --connection-config)
- proxy: Proxy reads through the transaction server
For proxy mode:
| Flag | Env Var |
|---|---|
--storage-proxy-token | FLUREE_STORAGE_PROXY_TOKEN |
--storage-proxy-token-file | FLUREE_STORAGE_PROXY_TOKEN_FILE |
Storage Proxy Configuration (Transaction Server)
Storage proxy provides replication-scoped access to raw storage for peer servers and CLI replication commands (fetch/pull/push). Tokens must carry fluree.storage.* claims — query-scoped tokens (fluree.ledger.read/write.*) are not sufficient. See Replication vs Query Access above.
Enable storage proxy endpoints for peers without direct storage access:
| Flag | Env Var | Default |
|---|---|---|
--storage-proxy-enabled | FLUREE_STORAGE_PROXY_ENABLED | false |
--storage-proxy-trusted-issuer | FLUREE_STORAGE_PROXY_TRUSTED_ISSUERS | None |
--storage-proxy-default-identity | FLUREE_STORAGE_PROXY_DEFAULT_IDENTITY | None |
--storage-proxy-default-policy-class | FLUREE_STORAGE_PROXY_DEFAULT_POLICY_CLASS | None |
--storage-proxy-debug-headers | FLUREE_STORAGE_PROXY_DEBUG_HEADERS | false |
# Ed25519 trust (did:key):
fluree-server \
--storage-proxy-enabled \
--storage-proxy-trusted-issuer did:key:z6Mk...
# OIDC/JWKS trust (same --jwks-issuer flag used by other endpoints):
fluree-server \
--storage-proxy-enabled \
--jwks-issuer "https://solo.example.com=https://solo.example.com/.well-known/jwks.json"
JWKS support: When --jwks-issuer is configured, storage proxy endpoints accept RS256 OIDC tokens in addition to Ed25519 JWS tokens. The --jwks-issuer flag is shared with data, admin, and events endpoints — a single flag enables OIDC across all endpoint groups.
Complete Configuration Examples
Development (Memory Storage)
fluree-server \
--log-level debug
Single Server (File Storage)
fluree-server \
--storage-path /var/lib/fluree \
--indexing-enabled \
--log-level info
Production with Admin Auth
fluree-server \
--storage-path /var/lib/fluree \
--indexing-enabled \
--admin-auth-mode required \
--admin-auth-trusted-issuer did:key:z6Mk... \
--log-level info
Transaction Server with Events Auth
fluree-server \
--storage-path /var/lib/fluree \
--events-auth-mode required \
--events-auth-trusted-issuer did:key:z6Mk... \
--storage-proxy-enabled \
--admin-auth-mode required
Production with OIDC (All Endpoints)
fluree-server \
--storage-path /var/lib/fluree \
--indexing-enabled \
--jwks-issuer "https://auth.example.com=https://auth.example.com/.well-known/jwks.json" \
--data-auth-mode required \
--events-auth-mode required \
--admin-auth-mode required \
--storage-proxy-enabled
Query Peer (Shared Storage)
fluree-server \
--server-role peer \
--tx-server-url http://tx.internal:8090 \
--storage-path /var/lib/fluree \
--peer-subscribe-all \
--peer-events-token @/etc/fluree/peer-token.jwt
Query Peer (Proxy Storage)
fluree-server \
--server-role peer \
--tx-server-url http://tx.internal:8090 \
--storage-access-mode proxy \
--storage-proxy-token @/etc/fluree/storage-proxy.jwt \
--peer-subscribe-all \
--peer-events-token @/etc/fluree/peer-token.jwt
S3 + DynamoDB (Connection Config)
fluree server run \
--connection-config /etc/fluree/connection.jsonld \
--indexing-enabled \
--reindex-min-bytes 100000 \
--reindex-max-bytes 5000000 \
--cache-max-mb 4096
With a config file:
[server]
connection_config = "/etc/fluree/connection.jsonld"
cache_max_mb = 4096
[server.indexing]
enabled = true
reindex_min_bytes = 100000
reindex_max_bytes = 5000000
[server.auth.data]
mode = "required"
trusted_issuers = ["did:key:z6Mk..."]
S3 Peer (Shared Storage via Connection Config)
fluree server run \
--server-role peer \
--tx-server-url http://tx.internal:8090 \
--connection-config /etc/fluree/connection.jsonld \
--peer-subscribe-all \
--peer-events-token @/etc/fluree/peer-token.jwt
Environment Variables Reference
| Variable | Description | Default |
|---|---|---|
FLUREE_HOME | Global Fluree directory (unified config + data) | Platform dirs (see Global Directory Layout) |
FLUREE_CONFIG | Config file path | .fluree/config.{toml,jsonld} (auto-discovered) |
FLUREE_PROFILE | Configuration profile name | None |
FLUREE_LISTEN_ADDR | Server address:port | 0.0.0.0:8090 |
FLUREE_STORAGE_PATH | File storage path | .fluree/storage |
FLUREE_CONNECTION_CONFIG | JSON-LD connection config file path | None |
FLUREE_CORS_ENABLED | Enable CORS | true |
FLUREE_INDEXING_ENABLED | Enable background indexing | true |
FLUREE_REINDEX_MIN_BYTES | Soft reindex threshold (bytes) | 100000 |
FLUREE_REINDEX_MAX_BYTES | Hard reindex threshold (bytes) | 20% of system RAM (256 MB fallback) |
FLUREE_CACHE_MAX_MB | Global cache budget (MB) | 30/40/50% of RAM (tiered: <4GB / 4-8GB / ≥8GB) |
FLUREE_BODY_LIMIT | Max request body bytes | 52428800 |
FLUREE_LOG_LEVEL | Log level | info |
FLUREE_SERVER_ROLE | Server role | transaction |
FLUREE_TX_SERVER_URL | Transaction server URL | None |
FLUREE_EVENTS_AUTH_MODE | Events auth mode | none |
FLUREE_EVENTS_AUTH_TRUSTED_ISSUERS | Events trusted issuers | None |
FLUREE_DATA_AUTH_MODE | Data API auth mode | none |
FLUREE_DATA_AUTH_AUDIENCE | Data API expected audience | None |
FLUREE_DATA_AUTH_TRUSTED_ISSUERS | Data API trusted issuers | None |
FLUREE_DATA_AUTH_DEFAULT_POLICY_CLASS | Data API default policy class | None |
FLUREE_ADMIN_AUTH_MODE | Admin auth mode | none |
FLUREE_ADMIN_AUTH_TRUSTED_ISSUERS | Admin trusted issuers | None |
FLUREE_MCP_ENABLED | Enable MCP endpoint | false |
FLUREE_MCP_AUTH_TRUSTED_ISSUERS | MCP trusted issuers | None |
FLUREE_STORAGE_ACCESS_MODE | Peer storage mode | shared |
FLUREE_STORAGE_PROXY_ENABLED | Enable storage proxy | false |
Command-Line Reference
fluree-server --help
Best Practices
1. Keep Secrets Out of Config Files
Tokens and credentials should not be stored as plaintext in config files (which may be committed to version control or readable by other processes). Three options, in order of preference:
Environment variables (recommended for production):
export FLUREE_PEER_EVENTS_TOKEN=$(cat /etc/fluree/token.jwt)
export FLUREE_STORAGE_PROXY_TOKEN=$(cat /etc/fluree/proxy-token.jwt)
@filepath references in config files or CLI flags (reads the file at startup):
[server.peer]
events_token = "@/etc/fluree/peer-token.jwt"
storage_proxy_token = "@/etc/fluree/proxy-token.jwt"
--peer-events-token @/etc/fluree/token.jwt
Direct values (development only): If a secret-bearing field contains a literal token in the config file, the server logs a warning at startup recommending @filepath or env vars.
The following config file fields support @filepath resolution:
| Config file key | Env var alternative |
|---|---|
peer.events_token | FLUREE_PEER_EVENTS_TOKEN |
peer.storage_proxy_token | FLUREE_STORAGE_PROXY_TOKEN |
2. Enable Admin Auth in Production
Always protect admin endpoints in production:
fluree-server \
--admin-auth-mode required \
--admin-auth-trusted-issuer did:key:z6Mk...
3. Use File Storage for Persistence
Memory storage is lost on restart:
# Development only
fluree-server
# Production
fluree-server --storage-path /var/lib/fluree
4. Monitor Logs
Use structured logging for production:
fluree-server --log-level info 2>&1 | jq .
Remote Connections
Remote connections enable SPARQL SERVICE federation against other Fluree instances. A remote connection maps a name to a server URL and bearer token. Once registered, queries can reference any ledger on that server using SERVICE <fluree:remote:<name>/<ledger>> { ... }.
Rust API
Register remote connections on the FlureeBuilder:
#![allow(unused)]
fn main() {
let fluree = FlureeBuilder::file("./data")
.remote_connection("acme", "https://acme-fluree.example.com", Some(token))
.remote_connection("partner", "https://partner.example.com", None)
.build()?;
}
Each call registers a named connection. The name is used in SPARQL queries:
SERVICE <fluree:remote:acme/customers:main> { ?s ?p ?o }
SERVICE <fluree:remote:partner/inventory:main> { ?item ex:sku ?sku }
Connection Parameters
| Parameter | Description |
|---|---|
name | Alias used in fluree:remote:<name>/... URIs |
base_url | Server URL (e.g., https://acme-fluree.example.com). The query path /v1/fluree/query/{ledger} is appended automatically. |
token | Optional bearer token for authentication. Sent as Authorization: Bearer <token> on every request. |
The default per-request timeout is 30 seconds. Requests that exceed this produce a query error (or empty results with SERVICE SILENT).
Security
Bearer tokens are stored in memory on the Fluree instance. They are never serialized to storage, included in nameservice records, or exposed through info/admin endpoints. If the token needs rotation, rebuild the Fluree instance with an updated token, or use set_remote_service() to inject a custom executor with token refresh logic.
Feature Flag
The HTTP transport for remote SERVICE requires the search-remote-client Cargo feature (which enables reqwest). Without this feature, remote connections can be registered but queries against them will fail at runtime. The feature is enabled by default in the server binary.
See SPARQL: Remote Fluree Federation for query syntax and examples.
Related Documentation
- Query Peers - Peer mode and replication
- Storage Modes - Storage backend details
- Telemetry - Monitoring configuration
- Admin and Health - Health check endpoints
Running Fluree with Docker
The official image (fluree/server) ships the fluree binary on a slim Debian base. This guide covers what’s inside the image, how to configure it (env vars, mounted config files, CLI flags), and worked recipes for the common production patterns.
What’s in the Image
| Aspect | Value |
|---|---|
| Base | debian:trixie-slim |
| Entrypoint | /usr/local/bin/fluree-entrypoint.sh |
| Default command | fluree server run |
| WORKDIR | /var/lib/fluree |
| VOLUME | /var/lib/fluree |
| Exposed port | 8090 |
| Runtime user | fluree (UID 1000, GID 1000) |
| Healthcheck | GET /health every 30s |
| Default log filter | RUST_LOG=info |
Entrypoint behavior: on first start, if /var/lib/fluree/.fluree/ does not exist, the entrypoint runs fluree init to create a default .fluree/config.toml and .fluree/storage/ directory. Subsequent starts skip init. Any arguments passed to docker run after the image name are forwarded to fluree server run, so you can append CLI flags (e.g. --log-level debug) directly.
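For example, a flag appended after the image name is passed straight through to fluree server run inside the container:
# Forwarded to `fluree server run` by the entrypoint
docker run --rm -p 8090:8090 fluree/server:latest --log-level debug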
Quick Start
docker run --rm -p 8090:8090 fluree/server:latest
Verify:
curl http://localhost:8090/health
Data lives inside the container’s writable layer here — fine for trying things out, lost when the container is removed. For anything beyond a smoke test, mount a volume.
Persisting Data
The image declares VOLUME /var/lib/fluree. Mount a host directory or named volume there:
# Named volume (recommended)
docker run -d --name fluree \
-p 8090:8090 \
-v fluree-data:/var/lib/fluree \
fluree/server:latest
# Host bind mount — make sure the directory is writable by UID 1000
mkdir -p ./fluree-data && sudo chown 1000:1000 ./fluree-data
docker run -d --name fluree \
-p 8090:8090 \
-v "$PWD/fluree-data:/var/lib/fluree" \
fluree/server:latest
The volume holds both .fluree/config.toml (config) and .fluree/storage/ (ledger data) by default.
Three Ways to Configure
Fluree resolves configuration with this precedence (highest wins):
- CLI flags appended after the image name
- Environment variables (FLUREE_*) set with -e or environment:
- Profile overrides ([profiles.<name>.server]) when you pass --profile
- Config file at .fluree/config.toml or .fluree/config.jsonld
- Built-in defaults
You can use any one of these — or, more typically, layer them: bake a base config file into a volume, then tweak per-environment with env vars or compose overrides.
Heads up — log level: The Dockerfile sets ENV RUST_LOG=info. The console log filter uses RUST_LOG if it is non-empty and only falls back to FLUREE_LOG_LEVEL when RUST_LOG is unset. Inside this image you must override RUST_LOG to change console verbosity: docker run -e RUST_LOG=debug fluree/server:latest
1. Environment Variables Only
Every CLI flag has a FLUREE_* env var equivalent (see Configuration). For simple deployments this is the lowest-friction path:
docker run -d --name fluree \
-p 8090:8090 \
-v fluree-data:/var/lib/fluree \
-e FLUREE_LISTEN_ADDR=0.0.0.0:8090 \
-e FLUREE_STORAGE_PATH=/var/lib/fluree/.fluree/storage \
-e FLUREE_INDEXING_ENABLED=true \
-e FLUREE_REINDEX_MIN_BYTES=1000000 \
-e FLUREE_REINDEX_MAX_BYTES=10000000 \
-e FLUREE_CACHE_MAX_MB=2048 \
-e RUST_LOG=info \
fluree/server:latest
2. Mounted Config File (JSON-LD or TOML)
Author a config file on the host, then mount it at /var/lib/fluree/.fluree/config.jsonld (or .toml). The server walks up from WORKDIR=/var/lib/fluree and picks it up automatically.
./fluree-config/config.jsonld:
{
"@context": { "@vocab": "https://ns.flur.ee/config#" },
"server": {
"listen_addr": "0.0.0.0:8090",
"storage_path": "/var/lib/fluree/.fluree/storage",
"log_level": "info",
"cache_max_mb": 2048,
"indexing": {
"enabled": true,
"reindex_min_bytes": 1000000,
"reindex_max_bytes": 10000000
}
},
"profiles": {
"prod": {
"server": {
"log_level": "warn",
"cache_max_mb": 8192
}
}
}
}
docker run -d --name fluree \
-p 8090:8090 \
-v fluree-data:/var/lib/fluree \
-v "$PWD/fluree-config/config.jsonld:/var/lib/fluree/.fluree/config.jsonld:ro" \
fluree/server:latest --profile prod
If both config.toml and config.jsonld exist in the same directory, TOML wins and the server logs a warning. Pick one format.
The TOML equivalent (./fluree-config/config.toml):
[server]
listen_addr = "0.0.0.0:8090"
storage_path = "/var/lib/fluree/.fluree/storage"
log_level = "info"
cache_max_mb = 2048
[server.indexing]
enabled = true
reindex_min_bytes = 1000000
reindex_max_bytes = 10000000
[profiles.prod.server]
log_level = "warn"
cache_max_mb = 8192
You can also stash the config outside WORKDIR and point at it explicitly:
docker run -d --name fluree \
-p 8090:8090 \
-v fluree-data:/var/lib/fluree \
-v "$PWD/fluree-config:/etc/fluree:ro" \
fluree/server:latest --config /etc/fluree/config.jsonld
3. Layered: File + Env Var Overrides
The common production shape: bake the base config into the image or volume, then let the orchestrator override per-environment with FLUREE_* env vars. Env vars beat the file — no file edit needed to bump cache size in staging vs. prod.
docker run -d --name fluree \
-p 8090:8090 \
-v fluree-data:/var/lib/fluree \
-v "$PWD/fluree-config/config.jsonld:/var/lib/fluree/.fluree/config.jsonld:ro" \
-e FLUREE_CACHE_MAX_MB=4096 \
-e RUST_LOG=warn \
fluree/server:latest
Common Configuration Recipes
Tuning the LRU Cache
cache_max_mb is the global budget for the in-memory index/flake cache. The default is a tiered fraction of system RAM (30%/40%/50% for <4GB/4–8GB/≥8GB hosts). On a container with a hard memory limit, set this explicitly — the auto-tier reads host RAM, not the cgroup limit, and can over-allocate.
# docker-compose.yml fragment
services:
fluree:
image: fluree/server:latest
mem_limit: 6g
environment:
FLUREE_CACHE_MAX_MB: 3072 # ~50% of the cgroup limit
Or in JSON-LD:
{
"@context": { "@vocab": "https://ns.flur.ee/config#" },
"server": { "cache_max_mb": 3072 }
}
Background Indexing
Indexing is off by default. Enable it for production write workloads — without it, every commit writes to novelty and queries get slower as novelty grows.
| Setting | Meaning |
|---|---|
indexing.enabled | Turn the background indexer on |
reindex_min_bytes | Soft threshold — novelty above this triggers a background reindex |
reindex_max_bytes | Hard threshold — commits block above this until reindexing catches up |
Tune min/max based on commit volume. Defaults (100 KB / 1 MB) are conservative; busy ledgers should raise both:
[server.indexing]
enabled = true
reindex_min_bytes = 5000000 # 5 MB — start indexing in the background
reindex_max_bytes = 50000000 # 50 MB — block commits at this point
docker run -d \
-e FLUREE_INDEXING_ENABLED=true \
-e FLUREE_REINDEX_MIN_BYTES=5000000 \
-e FLUREE_REINDEX_MAX_BYTES=50000000 \
fluree/server:latest
CORS and Request Body Size
[server]
cors_enabled = true
body_limit = 104857600 # 100 MB — raise for bulk imports
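The same two settings can be supplied as environment variables (values match the TOML snippet above):
docker run -d --name fluree \
  -p 8090:8090 \
  -v fluree-data:/var/lib/fluree \
  -e FLUREE_CORS_ENABLED=true \
  -e FLUREE_BODY_LIMIT=104857600 \
  fluree/server:latest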
Authentication (Production)
Require a Bearer token on data and admin endpoints. The trusted issuer is the did:key of your token signer.
[server.auth.data]
mode = "required"
trusted_issuers = ["did:key:z6Mk..."]
[server.auth.admin]
mode = "required"
trusted_issuers = ["did:key:z6Mk..."]
For OIDC/JWKS (e.g. an external IdP), set --jwks-issuer or FLUREE_JWKS_ISSUERS:
docker run -d \
-e FLUREE_DATA_AUTH_MODE=required \
-e FLUREE_JWKS_ISSUERS="https://auth.example.com=https://auth.example.com/.well-known/jwks.json" \
fluree/server:latest
See Configuration → Authentication for the full matrix.
S3 + DynamoDB (Distributed Storage)
For multi-node or cloud deployments, point the server at a JSON-LD connection config describing your storage and nameservice. AWS credentials come from the standard SDK chain (env vars, IAM role, etc.) — they are not part of the connection config.
./fluree-config/connection.jsonld:
{
"@context": {
"@base": "https://ns.flur.ee/config/connection/",
"@vocab": "https://ns.flur.ee/system#"
},
"@graph": [
{ "@id": "commitStorage", "@type": "Storage",
"s3Bucket": "fluree-prod-commits", "s3Prefix": "data/" },
{ "@id": "indexStorage", "@type": "Storage",
"s3Bucket": "fluree-prod-indexes" },
{ "@id": "publisher", "@type": "Publisher",
"dynamodbTable": "fluree-nameservice", "dynamodbRegion": "us-east-1" },
{ "@id": "conn", "@type": "Connection",
"commitStorage": { "@id": "commitStorage" },
"indexStorage": { "@id": "indexStorage" },
"primaryPublisher": { "@id": "publisher" } }
]
}
docker run -d --name fluree \
-p 8090:8090 \
-v "$PWD/fluree-config:/etc/fluree:ro" \
-e AWS_REGION=us-east-1 \
-e AWS_ACCESS_KEY_ID=... \
-e AWS_SECRET_ACCESS_KEY=... \
-e FLUREE_CONNECTION_CONFIG=/etc/fluree/connection.jsonld \
-e FLUREE_INDEXING_ENABLED=true \
fluree/server:latest
--connection-config and --storage-path are mutually exclusive. See Configuration → Connection Configuration and the DynamoDB guide for backend-specific setup.
Search Service (fluree-search-httpd)
Run a dedicated BM25 / vector search service alongside the main server when search traffic is heavy enough that you want it isolated from the transactional path. The service is a separate binary with its own listen port — it is not mounted under the main server’s api_base_url. It needs read access to the same storage and nameservice paths the main server writes to.
docker run -d --name fluree-search \
-p 9090:9090 \
-v fluree-data:/var/lib/fluree \
-e FLUREE_STORAGE_ROOT=/var/lib/fluree/storage \
-e FLUREE_NAMESERVICE_PATH=/var/lib/fluree/ns \
fluree/search-httpd:latest
| Env var | Default | Purpose |
|---|---|---|
FLUREE_STORAGE_ROOT | (required) | Storage path (file:// optional) |
FLUREE_NAMESERVICE_PATH | (required) | Nameservice path |
FLUREE_SEARCH_LISTEN | 0.0.0.0:9090 | Listen address |
FLUREE_SEARCH_CACHE_MAX_ENTRIES | 100 | Max cached indexes |
FLUREE_SEARCH_CACHE_TTL_SECS | 300 | Cache TTL |
FLUREE_SEARCH_MAX_LIMIT | 1000 | Max results per query |
FLUREE_SEARCH_DEFAULT_TIMEOUT_MS | 30000 | Default request timeout |
FLUREE_SEARCH_MAX_TIMEOUT_MS | 300000 | Maximum allowed timeout |
Prerequisites. The service only serves queries against indexes that already exist on the shared volume. BM25 / vector graph-source indexes are created via the Rust API today (Bm25CreateConfig + create_full_text_index, or VectorCreateConfig + create_vector_index). The @fulltext datatype and the f:fullTextDefaults config-graph paths are managed entirely through the main server’s HTTP API and don’t require this dedicated service.
Compose example with both services sharing a volume:
services:
fluree:
image: fluree/server:latest
ports:
- "8090:8090"
volumes:
- fluree-data:/var/lib/fluree
environment:
RUST_LOG: info
FLUREE_INDEXING_ENABLED: "true"
fluree-search:
image: fluree/search-httpd:latest
depends_on:
- fluree
ports:
- "9090:9090"
volumes:
- fluree-data:/var/lib/fluree:ro # read-only is sufficient
environment:
RUST_LOG: info
FLUREE_STORAGE_ROOT: /var/lib/fluree/storage
FLUREE_NAMESERVICE_PATH: /var/lib/fluree/ns
volumes:
fluree-data:
Clients send search requests to POST http://fluree-search:9090/v1/search. See BM25 → Remote Search Service for the request/response protocol.
Query Peer
Run as a read-only peer that subscribes to a transaction server’s event stream:
docker run -d --name fluree-peer \
-p 8090:8090 \
-v fluree-peer-data:/var/lib/fluree \
-e FLUREE_SERVER_ROLE=peer \
-e FLUREE_TX_SERVER_URL=http://tx.internal:8090 \
fluree/server:latest --peer-subscribe-all
See Query peers and replication for the proxy-mode and auth options.
Docker Compose: Full Example
A production-leaning single-node setup with a mounted JSON-LD config, env-var overrides, named data volume, and resource limits:
services:
fluree:
image: fluree/server:latest
container_name: fluree
restart: unless-stopped
ports:
- "8090:8090"
volumes:
- fluree-data:/var/lib/fluree
- ./fluree-config/config.jsonld:/var/lib/fluree/.fluree/config.jsonld:ro
environment:
RUST_LOG: info
FLUREE_CACHE_MAX_MB: 4096
FLUREE_INDEXING_ENABLED: "true"
FLUREE_REINDEX_MIN_BYTES: "5000000"
FLUREE_REINDEX_MAX_BYTES: "50000000"
# Auth — point at your trusted did:key signer
FLUREE_DATA_AUTH_MODE: required
FLUREE_DATA_AUTH_TRUSTED_ISSUERS: did:key:z6Mk...
FLUREE_ADMIN_AUTH_MODE: required
FLUREE_ADMIN_AUTH_TRUSTED_ISSUERS: did:key:z6Mk...
mem_limit: 8g
healthcheck:
test: ["CMD", "curl", "-fsS", "http://127.0.0.1:8090/health"]
interval: 30s
timeout: 3s
start_period: 15s
retries: 3
command: ["--profile", "prod"]
volumes:
fluree-data:
docker compose up -d
docker compose logs -f fluree
Troubleshooting
Container restarts after fluree init. First-run init only runs when /var/lib/fluree/.fluree/ is missing. If the volume is owned by a non-1000 UID, init fails. Fix with sudo chown -R 1000:1000 ./fluree-data on the host.
Mounted config file is ignored. Confirm the mount path and the file extension. The server only auto-discovers .fluree/config.toml or .fluree/config.jsonld under the working directory. Anything else needs --config <path> (or FLUREE_CONFIG=<path>). If both formats are present in the same directory, TOML wins — check the startup logs for the warning.
Setting FLUREE_LOG_LEVEL doesn’t change console output. The image’s ENV RUST_LOG=info shadows it. Override with -e RUST_LOG=debug instead.
cache_max_mb auto-default is too large under a memory limit. The auto-tier reads host RAM, not the cgroup. Set FLUREE_CACHE_MAX_MB (or cache_max_mb in the file) to a value sized to the container limit.
Health check failing. curl http://localhost:8090/health from your host. If the server is up but the healthcheck fails, the listen address is probably bound to 127.0.0.1 inside the container — set FLUREE_LISTEN_ADDR=0.0.0.0:8090.
Related Documentation
- Configuration reference — full flag/env/file matrix
- Storage modes — memory / file / AWS / IPFS
- JSON-LD connection configuration — schema for connection.jsonld
- Query peers and replication — peer-mode deployments
- Quickstart: Server — first-run walkthrough
Storage Modes
Fluree supports four storage modes, each optimized for different deployment scenarios. This document provides detailed information about each storage mode and guidance for choosing the right one.
Storage Modes
Memory Storage
In-memory storage for development and testing:
./fluree-db-server --storage memory
Characteristics:
- Data stored in RAM only
- No persistence (data lost on restart)
- Fastest performance
- No external dependencies
Use Cases:
- Local development
- Unit testing
- Temporary/ephemeral databases
- Prototyping
Limitations:
- No durability (data lost on crash/restart)
- Limited by available RAM
- Single process only
File Storage
Local file system storage:
./fluree-db-server \
--storage file \
--data-dir /var/lib/fluree
Characteristics:
- Data persisted to local disk
- Survives server restarts
- Good performance (SSD recommended)
- Simple setup
Use Cases:
- Single-server production
- Development with persistence
- Edge deployments
- Small to medium scale
Limitations:
- Single machine only
- No built-in replication
- Limited by disk capacity
- No cross-region support
AWS Storage
Distributed storage using S3 and DynamoDB:
./fluree-db-server \
--storage aws \
--s3-bucket fluree-prod-data \
--s3-region us-east-1 \
--dynamodb-table fluree-nameservice \
--dynamodb-region us-east-1
Characteristics:
- Distributed, scalable storage
- Multi-process coordination
- Cross-region replication
- High durability (99.999999999%)
Use Cases:
- Multi-server production
- High availability requirements
- Geographic distribution
- Cloud-native applications
Limitations:
- Requires AWS account
- Higher latency than local storage
- Usage costs
- More complex setup
IPFS Storage
Decentralized content-addressed storage via a local Kubo node:
{
"@context": {"@vocab": "https://ns.flur.ee/system#"},
"@graph": [{
"@type": "Connection",
"indexStorage": {
"@type": "Storage",
"ipfsApiUrl": "http://127.0.0.1:5001",
"ipfsPinOnPut": true
}
}]
}
Characteristics:
- Content-addressed (every blob identified by SHA-256 hash)
- Immutable, tamper-evident storage
- Decentralized replication via IPFS network
- Fluree’s native CIDs work directly with IPFS
Use Cases:
- Decentralized / censorship-resistant deployments
- Content integrity verification
- Cross-organization data sharing
- Foundation for IPNS/ENS-based ledger discovery
Limitations:
- Requires a running Kubo node
- No prefix listing (manifest-based tracking needed)
- No native deletion (unpin + GC)
- Higher write latency than local file I/O
See IPFS Storage Guide for complete setup and configuration.
Storage Architecture
Memory Storage
┌──────────────────────┐
│ Fluree Process │
│ ┌────────────────┐ │
│ │ Hash Map │ │
│ │ (In Memory) │ │
│ └────────────────┘ │
└──────────────────────┘
All data in process memory.
File Storage
┌──────────────────────┐
│ Fluree Process │
│ ┌────────────────┐ │
│ │ File I/O │ │
│ └────────┬───────┘ │
└───────────┼──────────┘
│
┌──────▼──────┐
│ File System │
│ /var/lib/ │
│ fluree/ │
└─────────────┘
Data persisted to local files.
AWS Storage
┌──────────────────────┐ ┌──────────────────────┐
│ Fluree Process 1 │ │ Fluree Process 2 │
│ ┌────────────────┐ │ │ ┌────────────────┐ │
│ │ AWS SDK │ │ │ │ AWS SDK │ │
│ └────────┬───────┘ │ │ └────────┬───────┘ │
└───────────┼──────────┘ └───────────┼──────────┘
│ │
└────────┬────────────────┘
│
┌──────────▼──────────┐
│ AWS Cloud │
│ ┌──────┐ ┌──────┐│
│ │ S3 │ │Dynamo││
│ └──────┘ └──────┘│
└─────────────────────┘
Multiple processes coordinate via AWS.
IPFS Storage
┌──────────────────────┐
│ Fluree Process │
│ ┌────────────────┐ │
│ │ IpfsStorage │ │
│ │ (HTTP client) │ │
│ └────────┬───────┘ │
└───────────┼──────────┘
│ HTTP RPC
┌──────▼──────┐
│ Kubo Node │
│ (IPFS) │
└──────┬──────┘
│ libp2p
┌──────▼──────┐
│ IPFS P2P │
│ Network │
└─────────────┘
Data stored as content-addressed blocks in IPFS via Kubo.
Storage Encryption
Fluree supports transparent AES-256-GCM encryption for data at rest. When enabled, all data is automatically encrypted before being written to storage.
Enabling Encryption
# Generate a 32-byte encryption key
export FLUREE_ENCRYPTION_KEY=$(openssl rand -base64 32)
Configure via JSON-LD (file storage):
{
"@context": {"@vocab": "https://ns.flur.ee/system#"},
"@graph": [{
"@type": "Connection",
"indexStorage": {
"@type": "Storage",
"filePath": "/var/lib/fluree",
"AES256Key": {"envVar": "FLUREE_ENCRYPTION_KEY"}
}
}]
}
For S3 storage with encryption:
{
"@context": {"@vocab": "https://ns.flur.ee/system#"},
"@graph": [{
"@type": "Connection",
"indexStorage": {
"@type": "Storage",
"s3Bucket": "my-fluree-bucket",
"s3Endpoint": "https://s3.us-east-1.amazonaws.com",
"AES256Key": {"envVar": "FLUREE_ENCRYPTION_KEY"}
}
}]
}
Key Features:
- AES-256-GCM authenticated encryption
- Works natively with all storage backends (memory, file, S3)
- Transparent encryption/decryption on read/write
- Portable ciphertext format (encrypted data can be moved between backends)
- Environment variable support for key configuration
See Storage Encryption for full documentation.
File Storage Details
Directory Structure
/var/lib/fluree/
├── ns@v2/ # Nameservice records
│ ├── mydb/
│ │ ├── main.json # Ledger metadata
│ │ └── dev.json
│ └── customers/
│ └── main.json
├── commit/ # Transaction commits
│ ├── abc123def456.commit
│ └── def456abc789.commit
├── index/ # Index snapshots
│ ├── mydb-main-t100.idx
│ └── mydb-main-t150.idx
└── graph-sources/ # Graph sources
└── products-search/
└── main/
└── bm25/
├── manifest.json
└── t150/
└── snapshot.bin
File Formats
Nameservice (JSON):
{
"ledger_id": "mydb:main",
"commit_t": 150,
"index_t": 145,
"commit_id": "bafybeig...commitT150",
"index_id": "bafybeig...indexRootT145"
}
Commits (Binary):
- Compressed flake data
- Transaction metadata
- Cryptographic signatures
Indexes (Binary):
- SPOT, POST, OPST, PSOT trees
- Optimized for query performance
File System Requirements
Minimum:
- 10 GB free space
- SSD recommended (HDD acceptable)
- Sufficient IOPS for workload
Recommended:
- 100 GB+ free space
- NVMe SSD
- High IOPS capability
- Regular backups
AWS Storage Details
S3 Structure
s3://fluree-prod-data/
├── commit/
│ ├── abc123def456.commit
│ └── def456abc789.commit
├── index/
│ ├── mydb-main-t100.idx
│ └── mydb-main-t150.idx
└── graph-sources/
└── products-search/
└── main/
└── bm25/
├── manifest.json
└── t150/
└── snapshot.bin
DynamoDB Schema
The nameservice uses a DynamoDB table with a composite primary key (pk + sk) for ledger and graph source metadata coordination. Each ledger or graph source is stored as multiple items (one per concern) under the same partition key.
See DynamoDB Nameservice Guide for:
- Complete table schema with composite-key layout
- Table creation scripts (AWS CLI, CloudFormation, Terraform)
- GSI setup for listing by kind
- Local development setup with LocalStack
- Production considerations and troubleshooting
Quick Reference:
Table: fluree-nameservice
Primary Key: pk (String, ledger-id) + sk (String, concern)
Sort Key Values: meta, head, index, config, status
GSI1 (gsi1-kind): kind (HASH) + pk (RANGE)
Items per ledger: 5 (meta, head, index, config, status)
Items per graph source: 4 (meta, config, index, status)
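To confirm that an existing table matches this layout, describing it with the AWS CLI shows the key schema and GSI names (assumes AWS credentials and region are already configured):
# Inspect key schema and GSI of the nameservice table
aws dynamodb describe-table --table-name fluree-nameservice \
  --query 'Table.{Keys:KeySchema,GSIs:GlobalSecondaryIndexes[].IndexName}'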
AWS Permissions
Required IAM permissions:
S3:
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::fluree-prod-data",
"arn:aws:s3:::fluree-prod-data/*"
]
}
DynamoDB:
{
"Effect": "Allow",
"Action": [
"dynamodb:GetItem",
"dynamodb:PutItem",
"dynamodb:UpdateItem",
"dynamodb:Query",
"dynamodb:BatchGetItem"
],
"Resource": [
"arn:aws:dynamodb:us-east-1:*:table/fluree-nameservice",
"arn:aws:dynamodb:us-east-1:*:table/fluree-nameservice/index/gsi1-kind"
]
}
Cost Considerations
S3 Costs:
- Storage: ~$0.023/GB/month (Standard)
- PUT requests: ~$0.005/1000 requests
- GET requests: ~$0.0004/1000 requests
DynamoDB Costs:
- Provisioned: ~$0.25/WCU/month + $0.05/RCU/month
- On-Demand: ~$1.25/million writes + $0.25/million reads
Typical Monthly Costs (medium deployment):
- S3: $50-200 (depending on data size)
- DynamoDB: $10-50 (depending on traffic)
- Total: $60-250/month
Choosing a Storage Mode
Decision Matrix
| Requirement | Memory | File | AWS | IPFS |
|---|---|---|---|---|
| Development | Best | Good | Overkill | Overkill |
| Single server | No | Best | Overkill | Good |
| Multi-server | No | No | Best | Good |
| Persistence | No | Yes | Yes | Yes |
| Cloud-native | No | No | Yes | No |
| Decentralized | No | No | No | Best |
| Content integrity | No | No | No | Best |
| Cost | Free | Free | Monthly | Free |
| Setup complexity | Trivial | Simple | Complex | Moderate |
| Performance | Fastest | Fast | Good | Good |
| Durability | None | Local | 11 9’s | Network-wide |
Recommendations
Use Memory when:
- Developing locally
- Running tests
- Data is temporary
- Maximum performance needed
Use File when:
- Single server deployment
- Local persistence needed
- Simple setup preferred
- Predictable costs important
Use AWS when:
- Multiple servers needed
- High availability required
- Geographic distribution needed
- Cloud-native architecture
Use IPFS when:
- Decentralized storage required
- Content integrity verification is critical
- Cross-organization data sharing
- Building toward IPNS/ENS-based ledger discovery
- Censorship resistance is a requirement
Switching Storage Modes
Memory to File
Export from the running system and import into the new one:
# Export from memory
curl -X POST http://localhost:8090/export?ledger=mydb:main > mydb-export.jsonld
# Stop memory server, start file server
./fluree-db-server --storage file --data-dir /var/lib/fluree
# Import to file storage
curl -X POST "http://localhost:8090/v1/fluree/insert?ledger=mydb:main" \
--data-binary @mydb-export.jsonld
File to AWS
Copy files to S3 and create the nameservice table:
# Copy data directory to S3
aws s3 sync /var/lib/fluree/ s3://fluree-prod-data/
# Create DynamoDB table (see docs/operations/dynamodb-guide.md for full schema)
aws dynamodb create-table \
--table-name fluree-nameservice \
--attribute-definitions \
AttributeName=pk,AttributeType=S \
AttributeName=sk,AttributeType=S \
AttributeName=kind,AttributeType=S \
--key-schema \
AttributeName=pk,KeyType=HASH \
AttributeName=sk,KeyType=RANGE \
--billing-mode PAY_PER_REQUEST
# Start AWS-backed server
./fluree-db-server --storage aws --s3-bucket fluree-prod-data
AWS to File
Download from S3:
# Download data from S3
aws s3 sync s3://fluree-prod-data/ /var/lib/fluree/
# Start file-backed server
./fluree-db-server --storage file --data-dir /var/lib/fluree
Backup and Recovery
Memory Storage
No native backup (data is ephemeral):
# Export ledger
curl -X POST http://localhost:8090/export?ledger=mydb:main > backup.jsonld
File Storage
Backup data directory:
# Stop server (recommended)
systemctl stop fluree
# Backup
tar -czf fluree-backup-$(date +%Y%m%d).tar.gz /var/lib/fluree/
# Start server
systemctl start fluree
For online backups, prefer storage-level snapshots or object-store versioning. The standalone server does not currently expose HTTP read-only toggle endpoints.
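One storage-level approach, assuming the data directory lives on a filesystem that supports snapshots (Btrfs is used here purely as an example), is a read-only snapshot followed by archiving the snapshot:
# Sketch: assumes /var/lib/fluree is a Btrfs subvolume; adjust for your filesystem
sudo btrfs subvolume snapshot -r /var/lib/fluree /var/lib/fluree-snap-$(date +%Y%m%d)
tar -czf fluree-backup-$(date +%Y%m%d).tar.gz /var/lib/fluree-snap-$(date +%Y%m%d)/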
AWS Storage
Use S3 versioning and lifecycle policies:
# Enable versioning
aws s3api put-bucket-versioning \
--bucket fluree-prod-data \
--versioning-configuration Status=Enabled
# Configure lifecycle
aws s3api put-bucket-lifecycle-configuration \
--bucket fluree-prod-data \
--lifecycle-configuration file://lifecycle.json
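A minimal lifecycle.json for the command above, assuming you simply want to expire noncurrent object versions after 90 days (adjust to your own retention policy), could look like:
# Assumed example policy, not a Fluree requirement
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-old-versions",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "NoncurrentVersionExpiration": { "NoncurrentDays": 90 }
    }
  ]
}
EOF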
DynamoDB backups:
# Enable point-in-time recovery
aws dynamodb update-continuous-backups \
--table-name fluree-nameservice \
--point-in-time-recovery-specification PointInTimeRecoveryEnabled=true
Troubleshooting
File Storage
Permission Errors:
sudo chown -R fluree:fluree /var/lib/fluree
chmod -R 755 /var/lib/fluree
Disk Full:
# Check space
df -h /var/lib/fluree
# Force a full index refresh
curl -X POST http://localhost:8090/v1/fluree/reindex \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb:main"}'
AWS Storage
Connection Errors:
- Verify AWS credentials
- Check IAM permissions
- Verify S3 bucket exists
- Check DynamoDB table exists
Throttling:
- Increase DynamoDB capacity
- Use provisioned capacity mode
- Implement retry logic
Related Documentation
- Configuration - Configuration options
- IPFS Storage Guide - IPFS/Kubo setup and configuration
- DynamoDB Nameservice Guide - DynamoDB-specific setup
- Getting Started: Server - Initial setup
- Admin and Health - Administrative operations
IPFS Storage
Fluree can use IPFS as a content-addressed storage backend via the Kubo HTTP RPC API. This enables decentralized, content-addressed data storage where every piece of data is identified by its cryptographic hash.
Feature flag: Requires the ipfs feature to be enabled at compile time. Build with: cargo build --features ipfs
Overview
IPFS storage maps naturally to Fluree’s content-addressed architecture. Fluree already identifies every blob (commits, transactions, index nodes) with a CIDv1 content identifier using SHA-256 hashing and Fluree-specific multicodec values. When IPFS is used as the storage backend, these CIDs are stored directly into IPFS via a local Kubo node.
Key properties:
- Content-addressed: data is identified by its SHA-256 hash, providing built-in integrity verification
- Immutable: once written, data cannot be modified or deleted (only unpinned for garbage collection)
- Decentralized: data can be replicated across IPFS nodes without centralized coordination
- Compatible: Fluree’s native CIDs work directly with IPFS (no translation layer needed)
Kubo Setup
Kubo (formerly go-ipfs) is the reference IPFS implementation. Fluree communicates with Kubo via its HTTP RPC API (default port 5001).
Install Kubo
macOS (Homebrew):
brew install ipfs
Linux (official binary):
wget https://dist.ipfs.tech/kubo/v0.32.1/kubo_v0.32.1_linux-amd64.tar.gz
tar xvfz kubo_v0.32.1_linux-amd64.tar.gz
cd kubo
sudo ./install.sh
Docker:
docker run -d \
--name ipfs \
-p 4001:4001 \
-p 5001:5001 \
-p 8080:8080 \
-v ipfs_data:/data/ipfs \
ipfs/kubo:latest
Initialize and Start
# Initialize IPFS (first time only)
ipfs init
# Start the daemon
ipfs daemon
Verify the node is running:
# Check node identity
curl -s -X POST http://127.0.0.1:5001/api/v0/id | jq .ID
Security Note
The Kubo HTTP RPC API (port 5001) provides full administrative access to the IPFS node. By default, it listens only on 127.0.0.1. Do not expose port 5001 to the public internet. If Fluree and Kubo run on different hosts, use SSH tunneling, a VPN, or a reverse proxy with authentication.
The IPFS gateway (port 8080) is read-only and can be exposed publicly if desired.
Configuration
JSON-LD Configuration
{
"@context": {
"@base": "https://ns.flur.ee/config/connection/",
"@vocab": "https://ns.flur.ee/system#"
},
"@graph": [
{
"@id": "ipfsStorage",
"@type": "Storage",
"ipfsApiUrl": "http://127.0.0.1:5001",
"ipfsPinOnPut": true
},
{
"@id": "connection",
"@type": "Connection",
"indexStorage": { "@id": "ipfsStorage" }
}
]
}
Flat JSON Configuration
{
"indexStorage": {
"@type": "IpfsStorage",
"ipfsApiUrl": "http://127.0.0.1:5001",
"ipfsPinOnPut": true
}
}
Configuration Fields
| Field | Type | Default | Description |
|---|---|---|---|
ipfsApiUrl | string | http://127.0.0.1:5001 | Kubo HTTP RPC API base URL |
ipfsPinOnPut | boolean | true | Pin blocks after writing (prevents garbage collection) |
Both fields support ConfigurationValue indirection (env vars):
{
"ipfsApiUrl": { "envVar": "FLUREE_IPFS_API_URL", "defaultVal": "http://127.0.0.1:5001" },
"ipfsPinOnPut": true
}
Architecture
┌──────────────────────┐
│ Fluree Process │
│ ┌────────────────┐ │
│ │ IpfsStorage │ │
│ │ (HTTP client) │ │
│ └────────┬───────┘ │
└───────────┼──────────┘
│ HTTP RPC
┌──────▼──────┐
│ Kubo Node │
│ (port 5001)│
└──────┬──────┘
│ libp2p
┌──────▼──────┐
│ IPFS P2P │
│ Network │
└─────────────┘
Fluree communicates with a local Kubo node via the HTTP RPC API. The Kubo node handles peer-to-peer networking, block storage, and replication with the broader IPFS network.
API Endpoints Used
| Kubo Endpoint | Purpose |
|---|---|
POST /api/v0/block/put | Store a block with optional codec and hash type |
POST /api/v0/block/get | Retrieve a block by CID |
POST /api/v0/block/stat | Check if a block exists (metadata only) |
POST /api/v0/pin/add | Pin a block to prevent garbage collection |
POST /api/v0/id | Health check (verify node is reachable) |
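These are standard Kubo RPC calls, so you can exercise them directly with curl when debugging connectivity (the CID below is a placeholder):
# Does this block exist on the local node? (metadata only)
curl -s -X POST "http://127.0.0.1:5001/api/v0/block/stat?arg=bafybeig..."
# Fetch the raw block bytes by CID
curl -s -X POST "http://127.0.0.1:5001/api/v0/block/get?arg=bafybeig..." --output block.bin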
Content Addressing
How Fluree CIDs Map to IPFS
Fluree uses CIDv1 with SHA-256 multihash and private-use multicodec values:
| Content Kind | Multicodec | Hex | Example |
|---|---|---|---|
| Commit | fluree-commit | 0x300001 | bafybeig... |
| Transaction | fluree-txn | 0x300002 | bafybeig... |
| Index Root | fluree-index-root | 0x300003 | bafybeig... |
| Index Branch | fluree-index-branch | 0x300004 | bafybeig... |
| Index Leaf | fluree-index-leaf | 0x300005 | bafybeig... |
| Dict Blob | fluree-dict-blob | 0x300006 | bafybeig... |
| Garbage Record | fluree-garbage | 0x300007 | bafybeig... |
| Ledger Config | fluree-ledger-config | 0x300008 | bafybeig... |
| Stats Sketch | fluree-stats-sketch | 0x300009 | bafybeig... |
| Graph Source Snapshot | fluree-graph-source-snapshot | 0x30000A | bafybeig... |
| Spatial Index | fluree-spatial-index | 0x30000B | bafybeig... |
These are in the multicodec private-use range (0x300000+). Kubo accepts them via the cid-codec parameter and resolves blocks by multihash regardless of codec. This means Fluree’s native CIDs work directly with IPFS without any translation layer.
Cross-Codec Retrieval
IPFS block storage is keyed by multihash internally. A block stored with codec 0x300001 (Fluree commit) can be retrieved using a CID with codec 0x55 (raw) as long as the SHA-256 digest is the same. This simplifies the address-based StorageRead implementation: given a Fluree address containing a hash, we can construct any CID with that hash to fetch the block.
Pinning
What is Pinning?
IPFS nodes periodically garbage-collect unpinned blocks to free disk space. Pinning tells the node to keep specific blocks permanently. Without pinning, blocks may be removed from the local node (though they remain available on other nodes that have them).
Default Behavior
Fluree pins every block on write when ipfsPinOnPut is true (the default). This ensures that:
- All committed data survives Kubo garbage collection
- The local node serves as a reliable storage backend
- Blocks remain available even if no other node has them
When to Disable Pinning
Set ipfsPinOnPut: false when:
- Running integration tests (faster, less disk usage)
- Using a separate pinning service (Pinata, web3.storage, etc.)
- The Kubo node is configured with --enable-gc=false
Pinning Services
For production deployments, consider using a remote pinning service for redundancy:
# Add a remote pinning service
ipfs pin remote service add pinata https://api.pinata.cloud/psa YOUR_JWT
# Pin a CID to the remote service
ipfs pin remote add --service=pinata bafybeig...
Limitations
No Prefix Listing
IPFS is a content-addressed store with no concept of directory listing or prefix enumeration. The list_prefix() operation returns an error. Operations that require listing (e.g., ledger discovery, GC scans) must use an alternative strategy such as manifest-based tracking.
No Deletion
IPFS content is immutable. The delete() operation is a no-op. Data removal is handled through:
- Unpinning the block on the local node
- Waiting for Kubo’s garbage collector to reclaim space
- The block may still exist on other IPFS nodes
Nameservice
IPFS storage currently requires a separate nameservice (file-based or DynamoDB) for ledger metadata. A future phase will add IPNS and/or ENS-based decentralized nameservices.
Latency
Writes go through the Kubo HTTP RPC API, adding HTTP overhead compared to direct file I/O. For latency-sensitive workloads, ensure Kubo runs on the same host as Fluree (localhost communication).
No Encryption
The IPFS storage backend does not currently support Fluree’s AES256Key encryption. Blocks are stored unencrypted in IPFS. If encryption is needed, use a separate encryption layer or a private IPFS network.
Storage Addresses
Fluree addresses for IPFS storage follow the standard format:
fluree:ipfs://{ledger_id}/{kind_dir}/{hash_hex}.{ext}
Examples:
fluree:ipfs://mydb/main/commit/a1b2c3...f6a1b2.fcv2
fluree:ipfs://mydb/main/index/roots/d4e5f6...c3d4e5.json
fluree:ipfs://mydb/main/index/spot/abc123...def456.fli
The hash hex in the filename is extracted and used to construct a CID for retrieval from IPFS.
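As a small illustration of that address anatomy (shell-only sketch; the real extraction happens inside the storage backend), the hash hex is simply the filename without its extension:
addr="fluree:ipfs://mydb/main/commit/a1b2c3...f6a1b2.fcv2"
file="${addr##*/}"       # a1b2c3...f6a1b2.fcv2
hash_hex="${file%%.*}"   # a1b2c3...f6a1b2, used to build a CID for the IPFS lookup
echo "$hash_hex"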
Operational Considerations
Disk Usage
Kubo stores blocks in a local datastore (by default, a flatfs blockstore at ~/.ipfs/blocks/, with LevelDB for metadata). Monitor disk usage:
# Check IPFS repo size
ipfs repo stat
# Run garbage collection (removes unpinned blocks)
ipfs repo gc
Network Bandwidth
By default, Kubo participates in the IPFS DHT and may serve blocks to other nodes. For a private deployment:
# Disable DHT (private node)
ipfs config Routing.Type none
# Or use a private IPFS network with a swarm key
# See: https://github.com/ipfs/kubo/blob/master/docs/experimental-features.md#private-networks
Performance Tuning
# Increase concurrent connections
ipfs config Swarm.ConnMgr.HighWater 300
# Adjust datastore cache
ipfs config Datastore.BloomFilterSize 1048576
# Disable automatic GC (if using external pinning)
ipfs config --json Datastore.GCPeriod '"0"'
Monitoring
Check Kubo node health:
# Node identity and version
ipfs id
# Connected peers
ipfs swarm peers | wc -l
# Repo statistics
ipfs repo stat
# Bandwidth usage
ipfs stats bw
Troubleshooting
Connection Refused
IPFS node connection failed: http://127.0.0.1:5001
Causes:
- Kubo daemon is not running
- Kubo is listening on a different address/port
- Firewall blocking the connection
Fix:
# Start the daemon
ipfs daemon
# Or check what address it's listening on
ipfs config Addresses.API
Block Not Found
IPFS block not found: bafybeig...
Causes:
- Block was never stored on this node
- Block was unpinned and garbage collected
- CID format mismatch
Fix:
# Check if block exists locally
ipfs block stat bafybeig...
# Try fetching from the network
ipfs block get bafybeig... > /dev/null
Slow Writes
Causes:
- Kubo node under heavy load
- Network latency (if Kubo is remote)
- Disk I/O bottleneck
Fix:
- Run Kubo on the same host as Fluree
- Use SSD storage for the IPFS datastore
- Consider disabling DHT for private deployments
Future Roadmap
Phase 2: Decentralized Nameservice
The IPFS storage backend is designed as the foundation for decentralized Fluree deployments. Planned additions:
- IPNS: Publish mutable pointers to ledger state (commit head, index root)
- ENS / L2 chain: On-chain CID pointers for trustless ledger discovery
- Two-tier nameservice: Local nameservice for fast reads with async push to decentralized upstream (similar to git push)
Content Pinning Strategy
Future versions may support:
- Automatic pinning profiles (pin commits only, pin everything, pin nothing)
- Integration with remote pinning services (Pinata, web3.storage)
- Manifest-based tracking for GC and prefix listing
Related Documentation
- Storage modes - Overview of all storage backends
- Configuration - Server configuration options
- JSON-LD connection config - Full config reference
- ContentId and ContentStore - Content addressing design
- Storage traits - Storage backend architecture
DynamoDB Nameservice Guide
Overview
Fluree supports Amazon DynamoDB as a nameservice backend for storing ledger and graph source metadata. The DynamoDB nameservice provides:
- Item-per-concern independence: Each concern (commit head, index, status, config) is a separate DynamoDB item, eliminating physical write contention between transactors and indexers
- Atomic conditional updates: Reduced logical contention via conditional expressions
- Strong consistency reads: Always see the latest data
- High availability: DynamoDB’s built-in redundancy and durability
- Unified ledger + graph source support: Both ledgers and graph sources (BM25, Vector, Iceberg, etc.) share the same table with a composite key
Why DynamoDB for Nameservice?
The nameservice stores metadata about ledgers and graph sources: commit IDs, index state, status, and configuration. In high-throughput scenarios, transactors and indexers may update this metadata concurrently.
DynamoDB solves this because:
- Item-per-concern layout: Each concern (head, index, status, config) is a separate DynamoDB item under the same partition key, so writes to different concerns never contend at the physical level
- Conditional updates: Each update only proceeds if the new watermark advances monotonically
- No read-modify-write cycles (for the write itself): Updates are atomic; callers should still expect occasional conditional-update conflicts under contention and retry where appropriate
Graph Sources (non-ledger)
Graph sources (BM25, Vector, Iceberg, etc.) are stored in the same nameservice table as ledgers. Under the graph-source-owned manifest design, the nameservice does not store snapshot history for graph sources.
- For ledgers, index_id points to a ledger index root.
- For graph sources, index_id points to a graph-source-owned root/manifest in storage (opaque to the nameservice).
- Snapshot history (if any) is stored in storage and managed by the graph source implementation.
This keeps DynamoDB schema stable: no unbounded “snapshot history” list is stored in the DynamoDB item.
Table Setup
Schema Overview
The table uses a composite primary key (pk + sk) with a Global Secondary Index (GSI) for listing by kind.
- pk (Partition Key, String): Alias in name:branch form (e.g., mydb:main)
- sk (Sort Key, String): Concern discriminator (meta, head, index, config, status)
- GSI1 (gsi1-kind): Enables efficient listing of all ledgers or all graph sources
AWS CLI
aws dynamodb create-table \
--table-name fluree-nameservice \
--attribute-definitions \
AttributeName=pk,AttributeType=S \
AttributeName=sk,AttributeType=S \
AttributeName=kind,AttributeType=S \
--key-schema \
AttributeName=pk,KeyType=HASH \
AttributeName=sk,KeyType=RANGE \
--global-secondary-indexes '[
{
"IndexName": "gsi1-kind",
"KeySchema": [
{"AttributeName": "kind", "KeyType": "HASH"},
{"AttributeName": "pk", "KeyType": "RANGE"}
],
"Projection": {
"ProjectionType": "INCLUDE",
"NonKeyAttributes": ["name", "branch", "source_type", "dependencies", "retracted"]
}
}
]' \
--billing-mode PAY_PER_REQUEST
CloudFormation
Resources:
FlureeNameserviceTable:
Type: AWS::DynamoDB::Table
Properties:
TableName: fluree-nameservice
BillingMode: PAY_PER_REQUEST
AttributeDefinitions:
- AttributeName: pk
AttributeType: S
- AttributeName: sk
AttributeType: S
- AttributeName: kind
AttributeType: S
KeySchema:
- AttributeName: pk
KeyType: HASH
- AttributeName: sk
KeyType: RANGE
GlobalSecondaryIndexes:
- IndexName: gsi1-kind
KeySchema:
- AttributeName: kind
KeyType: HASH
- AttributeName: pk
KeyType: RANGE
Projection:
ProjectionType: INCLUDE
NonKeyAttributes:
- name
- branch
- source_type
- dependencies
- retracted
PointInTimeRecoverySpecification:
PointInTimeRecoveryEnabled: true
Tags:
- Key: Application
Value: Fluree
Terraform
resource "aws_dynamodb_table" "fluree_nameservice" {
name = "fluree-nameservice"
billing_mode = "PAY_PER_REQUEST"
hash_key = "pk"
range_key = "sk"
attribute {
name = "pk"
type = "S"
}
attribute {
name = "sk"
type = "S"
}
attribute {
name = "kind"
type = "S"
}
global_secondary_index {
name = "gsi1-kind"
hash_key = "kind"
range_key = "pk"
projection_type = "INCLUDE"
non_key_attributes = [
"name",
"branch",
"source_type",
"dependencies",
"retracted",
]
}
point_in_time_recovery {
enabled = true
}
tags = {
Application = "Fluree"
}
}
Programmatic Table Creation
Fluree’s DynamoDbNameService also provides an ensure_table() method that creates the table with the correct schema if it doesn’t already exist:
#![allow(unused)]
fn main() {
use fluree_db_storage_aws::dynamodb::DynamoDbNameService;
let ns = DynamoDbNameService::from_client(dynamodb_client, "fluree-nameservice".to_string());
ns.ensure_table().await?;
}
This is used by integration tests and can be used for bootstrapping development environments.
Table Schema
Primary Key
| Attribute | Type | Description |
|---|---|---|
pk | String (Partition Key) | Alias in name:branch form (e.g., mydb:main) |
sk | String (Sort Key) | Concern discriminator: meta, head, index, config, status |
Items per Alias
Each ledger or graph source is represented as multiple items under the same pk:
Ledger (5 items):
| Sort Key (sk) | Description | Key Attributes |
|---|---|---|
meta | Identity and metadata | kind, name, branch, retracted, schema |
head | Commit head pointer | commit_id, commit_t |
index | Index head pointer | index_id, index_t |
config | Ledger configuration | default_context_id, config_v, config_meta |
status | Operational status | status, status_v, status_meta |
Graph Source (4 items):
| Sort Key (sk) | Description | Key Attributes |
|---|---|---|
meta | Identity and metadata | kind, source_type, name, branch, dependencies, retracted, schema |
config | Source configuration | config_json, config_v |
index | Index head pointer | index_id, index_t |
status | Operational status | status, status_v, status_meta |
Attribute Reference
All items share these common attributes:
| Attribute | Type | Description |
|---|---|---|
pk | String | Record address (name:branch) |
sk | String | Concern discriminator |
schema | Number | Schema version (always 2) |
updated_at_ms | Number | Last update timestamp (epoch milliseconds) |
meta item:
| Attribute | Type | Description |
|---|---|---|
kind | String | ledger or graph_source |
name | String | Base name (reserved word — use #name in expressions) |
branch | String | Branch name |
retracted | Boolean | Soft-delete flag |
source_type | String (graph source only) | Graph-source type (e.g., f:Bm25Index) |
dependencies | List<String> (graph source only) | Dependent ledger IDs |
head item (ledgers only):
| Attribute | Type | Description |
|---|---|---|
commit_id | String or null | Latest commit ContentId (CIDv1) |
commit_t | Number | Commit watermark (t). 0 = unborn. |
index item (ledgers + graph sources):
| Attribute | Type | Description |
|---|---|---|
index_id | String or null | Latest index ContentId (CIDv1) |
index_t | Number | Index watermark (t). 0 = unborn. |
config item:
| Attribute | Type | Description |
|---|---|---|
default_context_id | String or null | Default JSON-LD context ContentId (ledger) |
config_json | String or null | Opaque JSON config string (graph source) |
config_v | Number | Config version watermark |
config_meta | Map or null | Extensible config metadata (ledger) |
status item:
| Attribute | Type | Description |
|---|---|---|
status | String | Current state (reserved word — use #st in expressions) |
status_v | Number | Status version watermark |
status_meta | Map or null | Extensible status metadata |
GSI1: gsi1-kind
Enables listing all entities of a given kind (ledger or graph source).
| GSI Attribute | Source Attribute | Description |
|---|---|---|
| Partition Key | kind | ledger or graph_source |
| Sort Key | pk | Record address |
| Projected | name, branch, source_type, dependencies, retracted | Meta fields for listing without additional reads |
Only meta items carry the kind attribute and project into the GSI.
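For illustration, the listing that this GSI enables can be sketched with the Rust AWS SDK (aws-sdk-dynamodb). This is a hand-written example, not Fluree's internal code, and it assumes a recent SDK version; the table and index names follow the examples above.

use aws_sdk_dynamodb::{types::AttributeValue, Client};

// List the pk ("name:branch") of every ledger by querying the gsi1-kind index.
// Note the ExpressionAttributeNames entry: `name` is a DynamoDB reserved word.
async fn list_ledgers(client: &Client) -> Result<Vec<String>, aws_sdk_dynamodb::Error> {
    let resp = client
        .query()
        .table_name("fluree-nameservice")
        .index_name("gsi1-kind")
        .key_condition_expression("kind = :k")
        .expression_attribute_values(":k", AttributeValue::S("ledger".into()))
        .projection_expression("pk, #name, branch, retracted")
        .expression_attribute_names("#name", "name")
        .send()
        .await?;

    // Callers typically also skip items whose `retracted` flag is true.
    Ok(resp
        .items()
        .iter()
        .filter_map(|item| item.get("pk").and_then(|v| v.as_s().ok()).cloned())
        .collect())
}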
Initialization Semantics
All concern items are created atomically at initialization time. This is a key structural decision:
- publish_ledger_init creates all 5 items (meta, head, index, config, status) via TransactWriteItems
- publish_graph_source creates all 4 items (meta, config, index, status) via TransactWriteItems
Subsequent writes usually use UpdateItem operations (compare_and_set_ref, publish_index, push_status, push_config). The one exception is commit-head CAS on an unknown ledger ID with expected=None, where the backend bootstraps the ledger atomically via TransactWriteItems.
How Updates Work
Commit updates (transactor):
UpdateItem Key: { pk: "mydb:main", sk: "head" }
UpdateExpression: SET commit_id = :cid, commit_t = :t, updated_at_ms = :now
ConditionExpression: attribute_exists(pk) AND commit_t < :t
Index updates (indexer):
UpdateItem Key: { pk: "mydb:main", sk: "index" }
UpdateExpression: SET index_id = :cid, index_t = :t, updated_at_ms = :now
ConditionExpression: attribute_exists(pk) AND index_t < :t
Since commit and index updates target different items (different sk), they never contend at the DynamoDB physical level.
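Expressed as an SDK call, the commit-head update above looks roughly like the following Rust sketch (illustrative only, not Fluree's internal implementation; builder method names assume a recent aws-sdk-dynamodb):

use aws_sdk_dynamodb::{types::AttributeValue, Client};

// Publish a new commit head for a ledger. The ConditionExpression enforces the
// strictly-increasing watermark: a stale writer gets ConditionalCheckFailedException.
async fn publish_commit_head(
    client: &Client,
    ledger: &str,     // pk, e.g. "mydb:main"
    commit_cid: &str, // CIDv1 string
    commit_t: i64,
    now_ms: i64,
) -> Result<(), aws_sdk_dynamodb::Error> {
    client
        .update_item()
        .table_name("fluree-nameservice")
        .key("pk", AttributeValue::S(ledger.to_string()))
        .key("sk", AttributeValue::S("head".to_string()))
        .update_expression("SET commit_id = :cid, commit_t = :t, updated_at_ms = :now")
        .condition_expression("attribute_exists(pk) AND commit_t < :t")
        .expression_attribute_values(":cid", AttributeValue::S(commit_cid.to_string()))
        .expression_attribute_values(":t", AttributeValue::N(commit_t.to_string()))
        .expression_attribute_values(":now", AttributeValue::N(now_ms.to_string()))
        .send()
        .await?;
    Ok(())
}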
Status updates (CAS):
UpdateItem Key: { pk: "mydb:main", sk: "status" }
UpdateExpression: SET #st = :new_state, status_v = :new_v, updated_at_ms = :now
ConditionExpression: status_v = :expected_v AND #st = :expected_state
Config updates (CAS):
UpdateItem Key: { pk: "mydb:main", sk: "config" }
UpdateExpression: SET default_context_id = :ctx, config_v = :new_v, updated_at_ms = :now
ConditionExpression: config_v = :expected_v
RefPublisher updates (compare-and-set refs):
- CommitHead uses a strict monotonic guard: new.t > current.t
- IndexHead allows same-watermark overwrite: new.t >= current.t (reindex at the same t)
When a caller attempts compare_and_set_ref(expected=None) on an unknown ledger ID, the DynamoDB backend bootstraps the ledger by creating all 5 ledger concern items via TransactWriteItems and pre-setting the target ref to the requested value.
Retract:
UpdateItem Key: { pk: "mydb:main", sk: "meta" }
UpdateExpression: SET retracted = :true, updated_at_ms = :now
DynamoDB Reserved Words
The attributes name and status are DynamoDB reserved words. All expressions (reads, updates, projections) must use ExpressionAttributeNames:
ExpressionAttributeNames: { "#name": "name", "#st": "status" }
Trait Implementations
The DynamoDB nameservice implements all seven nameservice traits:
| Trait | Description |
|---|---|
NameService | Lookup, ledger ID resolution, list all records |
Publisher | Initialize ledgers, publish indexes, retract |
AdminPublisher | Admin index publishing (allows equal-t overwrites) |
RefPublisher | Compare-and-set on commit/index refs |
StatusPublisher | CAS-based status updates |
ConfigPublisher | CAS-based config updates (ledgers only) |
GraphSourceLookup | Read-only graph source discovery: lookup, list all records |
GraphSourcePublisher | Graph source lifecycle (extends GraphSourceLookup): create, index, retract |
Note: ConfigPublisher is scoped to ledgers only. Graph source configuration is managed through GraphSourcePublisher, which stores config as an opaque JSON string (config_json). GraphSourceLookup is a supertrait of NameService, so all nameservice implementations automatically support graph source discovery. GraphSourcePublisher adds write operations and is required only by APIs that create or drop graph sources.
Configuration
JSON-LD Connection Configuration
{
"@context": {
"@vocab": "https://ns.flur.ee/system#"
},
"@graph": [
{
"@id": "s3Storage",
"@type": "Storage",
"s3Bucket": "fluree-production-data",
"s3Endpoint": "https://s3.us-east-1.amazonaws.com",
"s3Prefix": "ledgers",
"addressIdentifier": "prod-s3"
},
{
"@id": "dynamodbNs",
"@type": "Publisher",
"dynamodbTable": "fluree-nameservice",
"dynamodbRegion": "us-east-1"
},
{
"@id": "connection",
"@type": "Connection",
"parallelism": 4,
"cacheMaxMb": 1000,
"commitStorage": {"@id": "s3Storage"},
"indexStorage": {"@id": "s3Storage"},
"primaryPublisher": {"@id": "dynamodbNs"}
}
]
}
Configuration Options
| Field | Required | Description | Default |
|---|---|---|---|
dynamodbTable | Yes | DynamoDB table name | - |
dynamodbRegion | No | AWS region | us-east-1 |
dynamodbEndpoint | No | Custom endpoint URL (for LocalStack) | AWS default |
dynamodbTimeoutMs | No | Request timeout in milliseconds | 5000 |
AWS Credentials
Authentication Methods
The DynamoDB nameservice uses the standard AWS SDK credential chain:
- Environment Variables

  export AWS_ACCESS_KEY_ID=your_access_key
  export AWS_SECRET_ACCESS_KEY=your_secret_key
  export AWS_REGION=us-east-1

- AWS Credentials File (~/.aws/credentials)

  [default]
  aws_access_key_id = your_access_key
  aws_secret_access_key = your_secret_key
  region = us-east-1

- IAM Roles (when running on EC2/ECS/Lambda)
  - Automatically uses instance/task role credentials

- Session Tokens (for temporary credentials)

  export AWS_SESSION_TOKEN=your_session_token
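If you construct the DynamoDB client yourself (for example when embedding Fluree as a Rust library), the same credential chain applies. A minimal sketch with the Rust AWS SDK, assuming a recent aws-config / aws-sdk-dynamodb; the LocalStack endpoint is only an example:

use aws_config::BehaviorVersion;
use aws_sdk_dynamodb::Client;

// Build a DynamoDB client from the standard credential chain
// (environment variables, ~/.aws/credentials, IAM role, session token).
async fn dynamodb_client(localstack: bool) -> Client {
    let shared = aws_config::load_defaults(BehaviorVersion::latest()).await;
    if localstack {
        // Override only the endpoint; credentials still come from the chain.
        let conf = aws_sdk_dynamodb::config::Builder::from(&shared)
            .endpoint_url("http://localhost:4566")
            .build();
        Client::from_conf(conf)
    } else {
        Client::new(&shared)
    }
}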
Required IAM Permissions
Full permissions (recommended):
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"dynamodb:GetItem",
"dynamodb:PutItem",
"dynamodb:UpdateItem",
"dynamodb:Query",
"dynamodb:BatchGetItem"
],
"Resource": [
"arn:aws:dynamodb:*:*:table/fluree-nameservice",
"arn:aws:dynamodb:*:*:table/fluree-nameservice/index/gsi1-kind"
]
}
]
}
If you also use ensure_table() for automated table creation (development/testing):
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"dynamodb:GetItem",
"dynamodb:PutItem",
"dynamodb:UpdateItem",
"dynamodb:Query",
"dynamodb:BatchGetItem",
"dynamodb:CreateTable",
"dynamodb:DescribeTable"
],
"Resource": [
"arn:aws:dynamodb:*:*:table/fluree-nameservice",
"arn:aws:dynamodb:*:*:table/fluree-nameservice/index/gsi1-kind"
]
}
]
}
Minimal permissions (if not using all_records, all_graph_source_records, or graph sources):
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"dynamodb:GetItem",
"dynamodb:PutItem",
"dynamodb:UpdateItem",
"dynamodb:Query"
],
"Resource": "arn:aws:dynamodb:*:*:table/fluree-nameservice"
}
]
}
Local Development
Using LocalStack
- Start LocalStack

  docker run -d --name localstack \
    -p 4566:4566 \
    -e SERVICES=dynamodb \
    localstack/localstack

- Create Test Table

  AWS_ACCESS_KEY_ID=test AWS_SECRET_ACCESS_KEY=test \
  aws --endpoint-url=http://localhost:4566 dynamodb create-table \
    --table-name fluree-nameservice \
    --attribute-definitions \
      AttributeName=pk,AttributeType=S \
      AttributeName=sk,AttributeType=S \
      AttributeName=kind,AttributeType=S \
    --key-schema \
      AttributeName=pk,KeyType=HASH \
      AttributeName=sk,KeyType=RANGE \
    --global-secondary-indexes '[
      {
        "IndexName": "gsi1-kind",
        "KeySchema": [
          {"AttributeName": "kind", "KeyType": "HASH"},
          {"AttributeName": "pk", "KeyType": "RANGE"}
        ],
        "Projection": {
          "ProjectionType": "INCLUDE",
          "NonKeyAttributes": ["name", "branch", "source_type", "dependencies", "retracted"]
        }
      }
    ]' \
    --billing-mode PAY_PER_REQUEST

- Configure Fluree

  {
    "@id": "dynamodbNs",
    "@type": "Publisher",
    "dynamodbTable": "fluree-nameservice",
    "dynamodbEndpoint": "http://localhost:4566",
    "dynamodbRegion": "us-east-1"
  }

- Set Environment Variables

  export AWS_ACCESS_KEY_ID=test
  export AWS_SECRET_ACCESS_KEY=test
Using DynamoDB Local
- Start DynamoDB Local

  docker run -d --name dynamodb-local \
    -p 8000:8000 \
    amazon/dynamodb-local

- Create Test Table (same command as LocalStack, change --endpoint-url to http://localhost:8000)
Production Considerations
Performance
- DynamoDB provides single-digit millisecond latency
- The item-per-concern layout eliminates physical contention between transactors and indexers
- Use on-demand (PAY_PER_REQUEST) billing for variable workloads
- Consider provisioned capacity for predictable high-throughput scenarios
- Enable DynamoDB Accelerator (DAX) if sub-millisecond reads are needed
Security
- Use IAM roles instead of access keys when possible
- Enable encryption at rest (default for new tables)
- Use VPC endpoints for private DynamoDB access
- Enable CloudTrail for audit logging
Monitoring
Set up CloudWatch alarms for:
- ConditionalCheckFailedRequests - indicates contention (usually normal)
- ThrottledRequests - capacity issues
- SystemErrors - service issues
- SuccessfulRequestLatency - track latency
Backup and Recovery
# Enable Point-in-Time Recovery
aws dynamodb update-continuous-backups \
--table-name fluree-nameservice \
--point-in-time-recovery-specification PointInTimeRecoveryEnabled=true
# Create on-demand backup
aws dynamodb create-backup \
--table-name fluree-nameservice \
--backup-name fluree-ns-backup-$(date +%Y%m%d)
Cost Optimization
- On-demand pricing is cost-effective for variable workloads
- Table data is small (5 items per ledger, 4 per graph source), so costs are minimal
- Typical costs: $1-10/month for small deployments
- GSI storage adds minimal cost (only meta items project into it)
Troubleshooting
Authentication Failures
Symptoms: Access denied, credential errors
Solutions:
- Verify AWS credentials are configured
- Check IAM permissions for the table and GSI
- Test with AWS CLI:
aws dynamodb describe-table --table-name fluree-nameservice
Table Not Found
Symptoms: ResourceNotFoundException
Solutions:
- Verify table name is correct
- Check table is in the correct region
- Ensure table has finished creating (including GSI)
Timeout Errors
Symptoms: Request timeout
Solutions:
- Increase the dynamodbTimeoutMs configuration
- Check network connectivity to DynamoDB
- Verify endpoint URL is correct (especially for LocalStack)
Conditional Check Failures
Symptoms: High rate of ConditionalCheckFailedException in logs
Note: This is usually normal and indicates the system is working correctly. The conditional check prevents overwriting newer data with older data. publish_index stale writes are silently ignored (the newer value is preserved). CAS operations (compare_and_set_ref, push_status, push_config) return the current value so the caller can retry or report a conflict.
Unprocessed Keys (BatchGetItem)
Symptoms: Listing graph sources intermittently returns fewer results under load, or logs show throttling.
Cause: DynamoDB may return UnprocessedKeys in BatchGetItem responses under throttling.
Behavior: Fluree retries UnprocessedKeys with exponential backoff (bounded retries). If retries are exhausted, it returns an error rather than silently dropping items.
Uninitialized Alias Errors
Symptoms: Publish operations fail with “not found” or storage errors
Cause: Attempting to publish_index or other non-bootstrap writes on a ledger ID that was never initialized with publish_ledger_init.
Solution: Ensure ledger initialization happens before index/status/config writes. Normal Fluree transaction commit-head publication uses RefPublisher CAS and can bootstrap an unknown ledger ID when expected=None.
Related Documentation
- Storage Modes - Overview of all storage options
- Configuration - Full configuration reference
- Nameservice Schema v2 Design - Schema design details
- Ledgers and the Nameservice - Conceptual overview
Query peers and replication
This document describes how to run fluree-server in transaction mode (event source + transactions) and peer mode (read replica). It also documents the events stream (/v1/fluree/events) and storage proxy endpoints (/v1/fluree/storage/*) used to keep peers up to date and/or to proxy storage reads.
This guide is written from an operator / end-user standpoint: what to deploy, how to configure it, and what to expect from each mode.
Server roles
fluree-server supports two roles:
- Transaction server (--server-role transaction)
  - Write-enabled.
  - Produces the nameservice events stream at GET /v1/fluree/events.
  - Optionally exposes storage proxy endpoints at /v1/fluree/storage/*.
- Query peer (--server-role peer)
  - Read-only API surface for clients (queries, history, etc.).
  - Subscribes to GET /v1/fluree/events from a transaction server to learn about nameservice updates.
  - Reads ledger data from storage (shared-storage deployments) and refreshes on staleness based on the events stream.
  - Forwards write/admin operations to the configured transaction server.
Events stream (SSE): GET /v1/fluree/events
The transaction server exposes a Server-Sent Events (SSE) stream that emits nameservice changes for ledgers and graph sources. Query peers use this stream to stay up to date.
Query parameters
- all=true: subscribe to all ledgers and graph sources
- ledger=<ledger_id>: subscribe to a ledger ID (name:branch, repeatable)
- graph-source=<graph_source_id>: subscribe to a graph source ID (name:branch, repeatable)
Authentication and authorization
The /v1/fluree/events endpoint can be configured to require Bearer tokens:
- --events-auth-mode none|optional|required
- --events-auth-audience <aud> (optional)
- --events-auth-trusted-issuer <did:key:...> (repeatable)
When authentication is enabled, the token can restrict what the client may subscribe to. Requests that ask for resources not covered by the token are silently filtered to the allowed scope.
The repo includes a token generator binary for operator workflows:
- fluree-events-token: generates Bearer tokens suitable for GET /v1/fluree/events
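As an illustration, a subscriber can be as small as the following Rust sketch (a hypothetical client-side helper using reqwest and tokio, not part of Fluree; the base URL, ledger ID, and token are placeholders):

use std::error::Error;

// Open the SSE stream for one ledger and print raw event chunks as they arrive.
async fn follow_events(base: &str, token: &str) -> Result<(), Box<dyn Error>> {
    let url = format!("{base}/v1/fluree/events?ledger=mydb:main");
    let mut resp = reqwest::Client::new()
        .get(&url)
        .bearer_auth(token)
        .header("Accept", "text/event-stream")
        .send()
        .await?
        .error_for_status()?;

    // SSE events arrive as text blocks separated by blank lines.
    while let Some(chunk) = resp.chunk().await? {
        print!("{}", String::from_utf8_lossy(&chunk));
    }
    Ok(())
}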
Peer mode behavior
In peer mode:
- Write forwarding: write and admin endpoints are forwarded to the transaction server configured by --tx-server-url.
- Read serving: query endpoints are served locally, using ledger/index data obtained either from shared storage or via storage proxy reads (see below). History queries are executed via the standard /query endpoint with time range specifiers.
Peer configuration (SSE subscription)
- --server-role peer
- --tx-server-url <base-url> (required)
- --peer-events-url <url> (optional; default is {tx_server_url}/v1/fluree/events)
- --peer-events-token <token-or-@file> (optional; Bearer token for /v1/fluree/events)
- Subscribe scope: --peer-subscribe-all or --peer-ledger <ledger_id> (repeatable) and/or --peer-graph-source <graph_source_id> (repeatable)
Peer storage access modes
Peer servers support two storage access modes:
- Shared storage (--storage-access-mode shared, default)
  - The peer reads the same storage backend as the transaction server (shared filesystem, shared bucket credentials, etc.).
  - Requires --storage-path.
- Proxy storage (--storage-access-mode proxy)
  - The peer does not need direct storage credentials.
  - The peer proxies all storage reads through the transaction server’s /v1/fluree/storage/* endpoints.
  - Requires --tx-server-url and a storage proxy token via --storage-proxy-token or --storage-proxy-token-file.
  - --storage-path is ignored in this mode.
Storage proxy endpoints (transaction server): /v1/fluree/storage/*
Storage proxy endpoints allow a peer to read storage through the transaction server, rather than holding storage credentials directly. This is intended for environments where storage is private and peers cannot access it.
Storage proxy supports two kinds of reads:
- Raw bytes reads (Accept: application/octet-stream) for any block type (commit blobs, branch nodes, leaf nodes).
- Policy-filtered leaf flakes reads (Accept: application/x-fluree-flakes) for ledger leaf nodes only.
Enablement
Storage proxy endpoints are disabled by default. Enable them on the transaction server:
- --storage-proxy-enabled
- --storage-proxy-trusted-issuer <did:key:...> (repeatable; optional if you reuse --events-auth-trusted-issuer)
- --storage-proxy-default-identity <iri> (optional; used when token has no fluree.identity)
- --storage-proxy-default-policy-class <class-iri> (optional; applies policy in addition to identity-based policy)
- --storage-proxy-debug-headers (optional; debug only; can leak information)
AuthZ claims (Bearer token)
Storage proxy endpoints require a Bearer token that grants storage proxy permissions:
- fluree.storage.all: true: access all ledgers (graph source artifacts are denied in v1)
- fluree.storage.ledgers: ["books:main", ...]: access specific ledgers
- fluree.identity: "ex:PeerServiceAccount" (optional): identity used for policy evaluation in policy-filtered read mode
Unauthorized requests return 404 (no existence leak).
Endpoints
GET /v1/fluree/storage/ns/{ledger-id}
Fetch a nameservice record for a ledger ID. Requires storage proxy authorization for that ledger.
POST /v1/fluree/storage/block
Fetch a block/blob by CID. The request includes the ledger ID so the server can authorize the request and derive the physical storage address internally. Currently supports:
- Accept: application/octet-stream (raw bytes; always available)
- Accept: application/x-fluree-flakes (binary “FLKB” transport of policy-filtered leaf flakes only)
- Accept: application/x-fluree-flakes+json (debug-only JSON flake transport; leaf flakes only)
If the client requests a flakes format for a non-leaf block, the server returns 406 Not Acceptable. Clients (and peers in proxy mode) should retry with Accept: application/octet-stream in that case.
Example request body:
{
"cid": "bafy...leafOrBranchCid",
"ledger": "mydb:main"
}
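A hypothetical client-side read of one block over the proxy, sketched in Rust with reqwest (the CID, token, and ledger ID are placeholders; peers in proxy mode perform this internally):

use serde_json::json;
use std::error::Error;

// Fetch raw block bytes through the transaction server's storage proxy.
async fn fetch_block(base: &str, token: &str, cid: &str) -> Result<Vec<u8>, Box<dyn Error>> {
    let resp = reqwest::Client::new()
        .post(format!("{base}/v1/fluree/storage/block"))
        .bearer_auth(token)
        .header("Accept", "application/octet-stream")
        .json(&json!({ "cid": cid, "ledger": "mydb:main" }))
        .send()
        .await?
        .error_for_status()?;
    Ok(resp.bytes().await?.to_vec())
}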
Policy filtering semantics (leaf flakes)
When a flakes format is requested and the block is a ledger leaf:
- The transaction server loads policy restrictions using the effective identity and effective policy class:
- effective identity: token
fluree.identity if present, otherwise --storage-proxy-default-identity (if configured)
--storage-proxy-default-policy-class(if configured; token-driven policy class selection may be added later)
- effective identity: token
- If the resolved policy is root/unrestricted, the server returns all leaf flakes (still encoded as FLKB in
application/x-fluree-flakes mode).
Note: the peer can still apply additional client-facing policy enforcement on top of this. Client-side policy can only further restrict results; it cannot “recover” facts filtered out upstream.
Security notes and limitations
- Branch/commit leakage (v1 limitation): filtering leaves without rewriting branches/commits can leak structure/existence information to the peer identity. This is currently an accepted v1 limitation.
- Graph source artifacts (v1): storage proxy denies graph-source artifacts by returning 404 even when
fluree.storage.all is present.
Deployment examples
Transaction server (events + storage proxy)
fluree-server \
--listen-addr 0.0.0.0:8090 \
--server-role transaction \
--storage-path /var/lib/fluree \
--events-auth-mode required \
--events-auth-trusted-issuer did:key:z6Mk... \
--storage-proxy-enabled
Query peer (shared storage)
fluree-server \
--listen-addr 0.0.0.0:8091 \
--server-role peer \
--tx-server-url http://tx.internal:8090 \
--storage-path /var/lib/fluree \
--peer-subscribe-all \
--peer-events-token @/etc/fluree/peer-events.jwt
Query peer (proxy storage mode)
In proxy storage mode, the peer does not need --storage-path and instead needs a storage proxy token:
fluree-server \
--listen-addr 0.0.0.0:8091 \
--server-role peer \
--tx-server-url http://tx.internal:8090 \
--storage-access-mode proxy \
--storage-proxy-token @/etc/fluree/storage-proxy.jwt \
--peer-subscribe-all \
--peer-events-token @/etc/fluree/peer-events.jwt
Telemetry and Logging
Fluree provides comprehensive logging, metrics, and tracing capabilities for monitoring and debugging production deployments.
Logging
Log Levels
Configure log verbosity:
--log-level error|warn|info|debug|trace
- error: Critical errors only
- warn: Warnings and errors
- info: Informational messages (default)
- debug: Detailed debugging information
- trace: Very detailed tracing
Log Formats
JSON Format (Recommended)
--log-format json
Output:
{
"timestamp": "2024-01-22T10:30:00.123Z",
"level": "INFO",
"target": "fluree_db_server",
"message": "Transaction committed",
"fields": {
"ledger": "mydb:main",
"t": 42,
"duration_ms": 45,
"flakes_added": 3
}
}
Benefits:
- Machine-parseable
- Easy to index (Elasticsearch, etc.)
- Structured fields
- JSON query tools work
Text Format
--log-format text
Output:
2024-01-22T10:30:00.123Z INFO fluree_db_server: Transaction committed ledger=mydb:main t=42 duration_ms=45
Benefits:
- Human-readable
- Compact
- Easy to grep
Log Output
Standard Output (Default)
./fluree-db-server
Logs to stdout/stderr.
Log File
--log-file /var/log/fluree/server.log
[logging]
file = "/var/log/fluree/server.log"
Log Rotation
Use logrotate:
# /etc/logrotate.d/fluree
/var/log/fluree/*.log {
daily
rotate 14
compress
delaycompress
notifempty
create 0644 fluree fluree
sharedscripts
postrotate
systemctl reload fluree
endscript
}
Structured Logging
Add context to logs:
#![allow(unused)]
fn main() {
// Rust code (for reference)
info!(
ledger = %ledger,
t = transaction_time,
duration_ms = duration.as_millis(),
"Transaction committed"
);
}
Output:
{
"message": "Transaction committed",
"ledger": "mydb:main",
"t": 42,
"duration_ms": 45
}
Metrics
Planned — not yet implemented. The metrics below are a design target for a future PR. Prometheus metrics are not currently exposed by the server. The tracing/OTEL instrumentation described in the rest of this document is the current observability mechanism.
Prometheus Metrics (planned)
curl http://localhost:8090/metrics
Planned metrics:
- fluree_transactions_total - Total transactions (counter)
- fluree_transaction_duration_seconds - Transaction latency (histogram)
- fluree_queries_total - Total queries (counter)
- fluree_query_duration_seconds - Query latency (histogram)
- fluree_query_errors_total - Query errors (counter)
- fluree_indexing_lag_transactions - Novelty count (gauge)
- fluree_index_duration_seconds - Indexing time (histogram)
- fluree_uptime_seconds - Server uptime (gauge)
Prometheus Integration (planned)
Configure Prometheus to scrape Fluree:
# prometheus.yml
scrape_configs:
- job_name: 'fluree'
static_configs:
- targets: ['localhost:8090']
metrics_path: '/metrics'
scrape_interval: 15s
Distributed Tracing (OpenTelemetry)
Fluree supports OpenTelemetry (OTEL) distributed tracing, providing deep visibility into query, transaction, and indexing performance. Traces are exported to any OTLP-compatible backend (Jaeger, Grafana Tempo, AWS X-Ray, Datadog, etc.).
Integrating your application’s traces with Fluree? See Distributed Tracing Integration for how to correlate your spans with Fluree’s – both for the Rust library (
fluree-db-api) and the HTTP server (fluree-db-server with W3C traceparent).
Enabling OTEL
Build the server with the otel feature flag:
cargo build -p fluree-db-server --features otel --release
Then set environment variables to configure the OTLP exporter:
OTEL_SERVICE_NAME=fluree-server \
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 \
OTEL_EXPORTER_OTLP_PROTOCOL=grpc \
RUST_LOG=info,fluree_db_query=debug,fluree_db_transact=debug \
./target/release/fluree-db-server --data-dir ./data
| Environment Variable | Default | Description |
|---|---|---|
OTEL_SERVICE_NAME | fluree-db-server | Service name in traces |
OTEL_EXPORTER_OTLP_ENDPOINT | http://localhost:4317 | OTLP receiver endpoint |
OTEL_EXPORTER_OTLP_PROTOCOL | grpc | Protocol: grpc or http/protobuf |
Quick Start with Jaeger
The repository includes a self-contained test harness in the otel/ directory:
cd otel/
make all # starts Jaeger, builds with --features otel, starts server, runs tests
make ui # opens Jaeger UI at http://localhost:16686
See Performance Investigation with Distributed Tracing for detailed usage.
Dual-Layer Subscriber Architecture
The OTEL exporter uses its own Targets filter independent of RUST_LOG. This is a critical design choice: without it, enabling RUST_LOG=debug causes third-party crate spans (hyper, tonic, h2, tower-http) to flood the OTEL batch processor, which overwhelms the exporter and causes parent spans to be dropped.
┌──────────────────────────────────────────────────┐
│ tracing-subscriber registry │
│ │
│ ┌─────────────────────┐ ┌────────────────────┐ │
│ │ Console fmt layer │ │ OTEL trace layer │ │
│ │ (EnvFilter from │ │ (Targets filter: │ │
│ │ RUST_LOG) │ │ fluree_* only) │ │
│ └─────────────────────┘ └────────────────────┘ │
└──────────────────────────────────────────────────┘
- Console layer: Respects RUST_LOG as-is (all crates)
- OTEL layer: Exports only fluree_* crate targets at DEBUG level. Per-leaf-node TRACE spans (binary_cursor_next_leaf, scan) are excluded to prevent flooding the batch processor queue on large queries
This means RUST_LOG=debug produces verbose console output, but the OTEL exporter only receives Fluree spans – no hyper/tonic/tower noise.
Batch processor queue size: The OTEL batch span processor queue is set to 1,000,000 spans. At ~200 bytes per span, this represents ~200MB of potential memory usage under sustained debug-level traffic. This is intentional to prevent span loss during investigation. At RUST_LOG=info without OTEL, no debug spans are created at all (true zero overhead). With OTEL enabled, the queue rarely exceeds a few thousand entries under normal operation.
Shutdown
On server shutdown, the OTEL SdkTracerProvider is flushed and shut down to ensure all pending spans are exported. This is handled automatically by the server’s shutdown hook.
Dynamic Span Naming (otel.name)
Each HTTP request span is named dynamically via the otel.name field so that traces in Jaeger/Tempo show descriptive names instead of a generic request:
| Operation | otel.name examples |
|---|---|
| Query | query:json-ld, query:sparql, query:explain |
| Transact | transact:json-ld, transact:sparql-update, transact:turtle |
| Insert | insert:json-ld, insert:turtle |
| Upsert | upsert:json-ld, upsert:turtle, upsert:trig |
| Ledger mgmt | ledger:create, ledger:drop, ledger:info, ledger:exists |
The operation span attribute retains the handler-specific name for precise filtering when needed.
Span Hierarchy
Fluree instruments queries, transactions, and indexing with structured tracing spans at two tiers. The only info_span! in the codebase is request (the HTTP request span). All operation spans use debug_span!, guaranteeing true zero overhead when OTEL is not compiled and RUST_LOG is at info.
Tier 1: DEBUG (operation and phase level)
All operation, phase, and operator spans. Visible when OTEL is enabled or when RUST_LOG includes debug:
RUST_LOG=info,fluree_db_query=debug,fluree_db_transact=debug,fluree_db_indexer=debug
Spans: query_execute, query_prepare, query_run, txn_stage, txn_commit, commit_* sub-spans, index_build, build_all_indexes, build_index, sort_blocking, groupby_blocking, core operators (scan, join, filter, project, sort), format, policy_enforce, etc.
Tier 2: TRACE (maximum detail)
Per-operator detail for deep performance analysis:
RUST_LOG=info,fluree_db_query=trace
Additional spans: binary_cursor_next_leaf, property_join, group_by, aggregate, group_aggregate, distinct, limit, offset, union, optional, subquery, having
Span Tree (Query)
query_execute (debug)
├── query_prepare (debug)
│ ├── reasoning_prep (debug)
│ ├── pattern_rewrite (debug, patterns_before, patterns_after)
│ └── plan (debug, pattern_count)
├── query_run (debug)
│ ├── scan (debug)
│ ├── join (debug)
│ │ └── join_next_batch (debug, per iteration)
│ ├── filter (debug)
│ ├── project (debug)
│ ├── sort (debug)
│ ├── sort_blocking (debug, cross-thread via spawn_blocking)
│ └── ...
└── format (debug)
Span Tree (Transaction)
transact_execute (debug)
├── txn_stage (debug, insert_count, delete_count)
│ ├── where_exec (debug, pattern_count, binding_rows, retraction_count, assertion_count)
│ │ ├── delete_gen (debug, template_count, retraction_count) ← per streaming-WHERE batch
│ │ └── insert_gen (debug, template_count, assertion_count) ← per batch (mixed DELETE+INSERT only)
│ ├── cancellation (debug) ← mixed DELETE+INSERT path
│ ├── dedup_retractions (debug) ← pure-DELETE path (no INSERT templates, not Upsert)
│ └── policy_enforce (debug)
└── txn_commit (debug, flake_count, delta_bytes)
├── commit_nameservice_lookup (debug)
├── commit_verify_sequencing (debug)
├── commit_namespace_delta (debug)
├── commit_write_raw_txn (debug) ← await of upload task spawned at pipeline entry
├── commit_build_record (debug)
├── commit_write_commit_blob (debug)
├── commit_publish_nameservice (debug)
├── commit_generate_metadata_flakes (debug)
├── commit_populate_dict_novelty (debug)
└── commit_apply_to_novelty (debug)
Span Tree (Indexing)
Indexing runs as a separate top-level trace (not nested under an HTTP request). Each index refresh cycle starts its own trace root:
index_build (debug, ledger_id)
├── commit_chain_walk (debug)
├── commit_resolve (debug, per commit)
├── dict_merge_and_remap (debug)
├── build_all_indexes (debug)
│ └── build_index (debug, per order: SPOT, PSOT, POST, OPST) [cross-thread]
├── secondary_partition (debug)
├── upload_dicts (debug)
├── upload_indexes (debug)
├── build_index_root (debug)
└── BinaryIndexStore::load (debug) [cross-thread]
index_gc is a separate top-level trace (fire-and-forget tokio::spawn):
index_gc (debug, separate trace)
├── gc_walk_chain (debug)
└── gc_delete_entries (debug)
Span Tree (Bulk Import / fluree-ingest)
Bulk import runs as a standalone top-level trace under the fluree-cli service (no HTTP server involved). The import pipeline instruments all major phases:
bulk_import (debug, alias)
├── import_chunks (debug, total_chunks, parse_threads)
│ ├── [resolver thread: inherits parent context]
│ ├── [ttl-parser-N threads: inherit parent context]
│ └── commit + run generation log events
├── import_index_build (debug)
│ ├── build_all_indexes (debug)
│ │ └── build_index (debug, per order: SPOT, PSOT, POST, OPST) [cross-thread]
│ ├── import_cas_upload (debug)
│ └── import_publish (debug)
└── cleanup log events
The import_chunks span covers the parse+commit loop. Spawned threads (resolver, parse workers) and async tasks (dict upload, index build) inherit the parent span context so their work appears nested in the trace waterfall.
Tracker-to-Span Bridge
When tracked queries or transactions are executed (via the /query or /update endpoints with tracking enabled), the tracker_time and tracker_fuel fields are recorded as deferred attributes on the query_execute and transact_execute spans. These values appear as span attributes in OTEL backends (Jaeger, Tempo, etc.), enabling correlation between the Tracker’s fuel accounting and the span waterfall.
RUST_LOG Quick Reference
| Goal | Pattern | What you see |
|---|---|---|
| Production default | info | HTTP request spans only (zero operation spans) |
| Debug slow queries | info,fluree_db_query=debug | + query_execute, query_prepare, query_run, operators |
| Debug slow transactions | info,fluree_db_transact=debug | + txn_stage, txn_commit, commit sub-spans |
| Full phase decomposition | info,fluree_db_query=debug,fluree_db_transact=debug,fluree_db_indexer=debug | All debug spans |
| Per-operator detail | info,fluree_db_query=trace | + per-leaf: binary_cursor_next_leaf, etc. |
| Console firehose | debug | Everything (OTEL still filters to fluree_*) |
Note: When OTEL is enabled, the OTEL Targets filter always captures fluree_* spans at DEBUG regardless of RUST_LOG. The table above describes console output visibility only.
Further Reading
- Distributed Tracing Integration – How to correlate your application’s traces with Fluree (library and HTTP)
- Performance Investigation with Distributed Tracing – How to use tracing to find bottlenecks, including AWS deployment patterns (ECS, Lambda, X-Ray, Tempo)
- Adding Tracing Spans – How contributors should instrument new code
- otel/ README – OTEL validation harness reference
Monitoring Integration
Grafana Dashboards
Import Fluree dashboard:
{
"dashboard": {
"title": "Fluree Monitoring",
"panels": [
{
"title": "Query Rate",
"targets": [
{
"expr": "rate(fluree_queries_total[5m])"
}
]
},
{
"title": "Query Latency (p95)",
"targets": [
{
"expr": "histogram_quantile(0.95, fluree_query_duration_seconds)"
}
]
},
{
"title": "Indexing Lag",
"targets": [
{
"expr": "fluree_indexing_lag_transactions"
}
]
}
]
}
}
Datadog Integration
Send logs to Datadog:
./fluree-db-server \
--log-format json | \
datadog-agent stream --service=fluree
New Relic Integration
Use New Relic agent:
export NEW_RELIC_LICENSE_KEY=your-key
export NEW_RELIC_APP_NAME=fluree-prod
./fluree-db-server
Elasticsearch/Kibana
Ship logs to Elasticsearch:
./fluree-db-server \
--log-format json | \
filebeat -e -c filebeat.yml
Filebeat config:
filebeat.inputs:
- type: stdin
json.keys_under_root: true
output.elasticsearch:
hosts: ["localhost:9200"]
index: "fluree-logs-%{+yyyy.MM.dd}"
Health Monitoring
Health Check Endpoint
curl http://localhost:8090/health
Response (healthy):
{
"status": "healthy",
"version": "0.1.0",
"storage": "file",
"uptime_ms": 3600000,
"checks": {
"storage": "healthy",
"indexing": "healthy",
"nameservice": "healthy"
}
}
Response (unhealthy):
{
"status": "unhealthy",
"checks": {
"storage": "healthy",
"indexing": "unhealthy",
"nameservice": "healthy"
},
"errors": [
{
"component": "indexing",
"message": "Indexing lag exceeds threshold"
}
]
}
Liveness Probe
For Kubernetes:
livenessProbe:
httpGet:
path: /health
port: 8090
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
Readiness Probe
readinessProbe:
httpGet:
path: /ready
port: 8090
initialDelaySeconds: 10
periodSeconds: 5
timeoutSeconds: 3
Alerting
Alert Rules
Prometheus alert rules:
groups:
- name: fluree
rules:
- alert: HighQueryLatency
expr: histogram_quantile(0.95, fluree_query_duration_seconds) > 1
for: 5m
annotations:
summary: "High query latency"
description: "95th percentile query latency is {{ $value }}s"
- alert: HighIndexingLag
expr: fluree_indexing_lag_transactions > 100
for: 10m
annotations:
summary: "High indexing lag"
description: "Indexing lag is {{ $value }} transactions"
- alert: HighErrorRate
expr: rate(fluree_query_errors_total[5m]) > 10
for: 5m
annotations:
summary: "High query error rate"
description: "Error rate is {{ $value }}/s"
Alert Destinations
Configure alert routing:
route:
receiver: 'team-ops'
group_by: ['alertname', 'ledger']
routes:
- match:
severity: critical
receiver: 'pagerduty'
- match:
severity: warning
receiver: 'slack'
receivers:
- name: 'pagerduty'
pagerduty_configs:
- service_key: 'your-key'
- name: 'slack'
slack_configs:
- api_url: 'https://hooks.slack.com/...'
channel: '#alerts'
Performance Monitoring
Key Metrics to Track
- Query Performance:
- p50, p95, p99 latency
- Queries per second
- Error rate
- Transaction Performance:
- Commit time
- Transactions per second
- Error rate
- Indexing:
- Novelty count
- Index time
- Indexing lag
- Resource Usage:
- CPU utilization
- Memory usage
- Disk I/O
- Network I/O
- Storage:
- Storage used
- Storage growth rate
- S3 request rate (if AWS)
Dashboards
Create operational dashboards:
Overview Dashboard:
- Request rate
- Error rate
- Response times
- Active connections
Performance Dashboard:
- Query latency percentiles
- Transaction latency
- Indexing performance
- Resource utilization
Capacity Dashboard:
- Storage usage and growth
- Memory usage trends
- Indexing lag trends
- Projection to capacity limits
Logging Best Practices
1. Use Structured Logging
JSON format with consistent fields:
{
"timestamp": "2024-01-22T10:30:00Z",
"level": "INFO",
"ledger": "mydb:main",
"operation": "query",
"duration_ms": 45
}
2. Log Request IDs
Include request IDs for tracing:
curl -X POST http://localhost:8090/v1/fluree/query \
-H "X-Request-ID: abc-123-def-456" \
-d '{...}'
3. Appropriate Log Levels
- Production:
info - Debugging:
debug - Development:
debugortrace
4. Sample High-Volume Logs
For high-traffic deployments, sample logs:
[logging]
sample_rate = 0.1 # Log 10% of requests
5. Sensitive Data
Never log sensitive data:
- API keys
- Passwords
- Personal information
- Financial data
Related Documentation
- Configuration - Configuration options
- Admin and Health - Health monitoring
- Troubleshooting - Debugging guides
Distributed Tracing Integration
This guide explains how to correlate your application’s traces and logs with Fluree’s internal instrumentation, whether you use Fluree as an embedded Rust library (fluree-db-api) or as an HTTP server (fluree-db-server).
Overview
Fluree instruments queries, transactions, and indexing with tracing spans. These spans can participate in your application’s distributed traces so that a single trace shows the full picture: your application code, the Fluree call, and every internal phase (parsing, planning, execution, commit, etc.).
There are two integration paths depending on how you use Fluree:
| Integration mode | Mechanism | What you get |
|---|---|---|
Rust library (fluree-db-api) | Shared tracing subscriber | Fluree spans automatically nest under your application spans |
HTTP server (fluree-db-server) | W3C Trace Context (traceparent header) | Fluree’s request span becomes a child of your distributed trace |
Rust Library Integration (fluree-db-api)
When you embed Fluree via fluree-db-api, trace correlation works automatically through the tracing crate’s context propagation – no special Fluree configuration required.
How it works
The tracing crate uses task-local storage to track the “current span.” When your code creates a span and then calls a Fluree API method, any spans Fluree creates internally become children of your span. This happens automatically as long as both your code and Fluree share the same tracing subscriber (which they do by default – there’s one global subscriber per process).
Basic setup
use fluree_db_api::{FlureeBuilder, Result};
use tracing::Instrument;
use tracing_subscriber::EnvFilter;
#[tokio::main]
async fn main() -> Result<()> {
// Initialize tracing -- Fluree's spans will appear here too
tracing_subscriber::fmt()
.with_env_filter(EnvFilter::from_default_env())
.init();
let fluree = FlureeBuilder::new()
.with_storage_path("./data")
.build()
.await?;
// Your application span wraps the Fluree call
let span = tracing::info_span!("handle_request", user_id = %user_id);
async {
let db = fluree.db("my-ledger", None).await?;
let result = fluree.query(&db, my_query).await?;
Ok(result)
}
.instrument(span)
.await
}
At the default RUST_LOG=info, Fluree’s info-level log events appear within your span’s context:
INFO handle_request{user_id=42}: fluree_db_api::view::query: parse_ms=0.12 plan_ms=0.45 exec_ms=3.21 query phases
With RUST_LOG=info,fluree_db_query=debug, you additionally see Fluree’s operation spans nested under yours:
INFO handle_request{user_id=42}: my_app: handling request
DEBUG handle_request{user_id=42}:query_execute: fluree_db_query: ...
DEBUG handle_request{user_id=42}:query_execute:query_prepare: fluree_db_query: ...
DEBUG handle_request{user_id=42}:query_execute:query_run: fluree_db_query: ...
INFO handle_request{user_id=42}:query_execute: fluree_db_api: parse_ms=0.12 plan_ms=0.45 exec_ms=3.21 query phases
With OpenTelemetry export
If your application exports traces to an OTEL backend (Jaeger, Tempo, Datadog, etc.), Fluree’s spans appear in the same trace waterfall:
#![allow(unused)]
fn main() {
use opentelemetry::global;
use opentelemetry_otlp::WithExportConfig;
use tracing_opentelemetry::OpenTelemetryLayer;
use tracing_subscriber::{layer::SubscriberExt, EnvFilter, Registry};
fn init_tracing() {
let exporter = opentelemetry_otlp::SpanExporter::builder()
.with_tonic()
.with_endpoint("http://localhost:4317")
.build()
.expect("OTLP exporter");
let provider = opentelemetry_sdk::trace::SdkTracerProvider::builder()
.with_simple_exporter(exporter)
.build();
global::set_tracer_provider(provider);
let otel_layer = OpenTelemetryLayer::new(global::tracer("my-app"));
let subscriber = Registry::default()
.with(otel_layer)
.with(EnvFilter::from_default_env())
.with(tracing_subscriber::fmt::layer());
tracing::subscriber::set_global_default(subscriber).unwrap();
}
}
In Jaeger/Tempo, you’ll see a single trace containing both your application spans and Fluree’s internal spans (query_execute, query_prepare, query_run, scan, join, etc.).
Three tiers of visibility
Fluree uses a tiered logging strategy. At every tier, events and spans are correlated to your application’s active span.
| Tier | RUST_LOG pattern | What you see from Fluree |
|---|---|---|
| Logs | info (default) | Info-level log events: phase timings (parse_ms, plan_ms, exec_ms), commit summaries, errors. Zero span overhead. |
| Operation spans | info,fluree_db_query=debug | + query_execute, query_prepare, query_run, operator spans — timing waterfall in Jaeger/Tempo |
| Deep tracing | info,fluree_db_query=trace | + per-leaf, per-iteration detail (binary_cursor_next_leaf, group_by, etc.) |
At the default INFO level, you get Fluree’s summary log events (timings, counts, errors) correlated inside your spans. This is sufficient for most production correlation needs.
At DEBUG, you additionally get the structured span hierarchy that produces the timing waterfall in OTEL backends. This is useful for performance investigation.
Useful RUST_LOG patterns:
| Pattern | Use case |
|---|---|
info | Production: correlatable log events, zero span overhead |
info,fluree_db_query=debug | Investigate slow queries |
info,fluree_db_transact=debug | Investigate slow transactions |
info,fluree_db_query=debug,fluree_db_transact=debug | Full operation visibility |
debug | Everything, but includes third-party crate noise |
See Telemetry and Logging for the full span hierarchy.
Key span names and fields
These are the most useful spans and fields for application-level correlation:
| Span | Level | Key fields | When it appears |
|---|---|---|---|
query_execute | DEBUG | ledger_id | Every query |
query_prepare | DEBUG | pattern_count | Query planning phase |
query_run | DEBUG | Query execution phase | |
transact_execute | DEBUG | ledger_id | Every transaction |
txn_stage | DEBUG | insert_count, delete_count | Transaction staging |
txn_commit | DEBUG | flake_count, delta_bytes | Commit to storage |
format | DEBUG | output_format, result_count | Result serialization |
Adding your own context to Fluree spans
Since spans nest automatically, the simplest approach is to wrap Fluree calls with your own spans containing the context you need:
#![allow(unused)]
fn main() {
let span = tracing::info_span!(
"api_query",
user_id = %user_id,
endpoint = %path,
ledger = %ledger_alias,
);
let result = async {
fluree.query(&db, query).await
}
.instrument(span)
.await?;
}
All of Fluree’s internal spans inherit the user_id, endpoint, and ledger fields from the parent span in trace backends that support field inheritance.
HTTP Server Integration (fluree-db-server)
When Fluree runs as a standalone HTTP server, your application connects over HTTP. Distributed trace correlation uses the W3C Trace Context standard.
W3C traceparent header
When your application sends a traceparent header with an HTTP request, fluree-db-server automatically makes its request span a child of your trace. This requires the otel feature to be enabled on the server.
traceparent: 00-{trace-id}-{parent-span-id}-{trace-flags}
Example request:
curl -X POST http://localhost:8090/v1/fluree/query \
-H "Content-Type: application/json" \
-H "traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01" \
-d '{"from": "my-ledger", "select": {"?s": ["*"]}, "where": [["?s", "rdf:type", "schema:Person"]]}'
The resulting trace in Jaeger/Tempo:
your-service: handle_request ─────────────────────────────
fluree-server: request (query:json-ld) ──────────────────────────
query_execute ─────────────────────
query_prepare ────
query_run ───────────────
scan ─────
join ─────────
format ──
Server requirements
W3C trace context propagation requires:
- otel feature enabled at build time:

  cargo build -p fluree-db-server --features otel --release

- OTEL environment variables set at runtime:

  OTEL_SERVICE_NAME=fluree-server \
  OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 \
  ./fluree-server
Without the otel feature, the traceparent header is still parsed and the trace ID is recorded as a log field for text-based correlation, but the span is not linked as a child in the OTEL trace.
For background indexing triggered by a transaction request, note the distinction between logs and traces:
- The later indexing work still runs in its own background task and appears as a separate trace/span tree.
- Fluree copies the triggering request’s request_id and trace_id into the queued indexing job, so the background worker’s log lines can still be correlated back to the originating request.
- If multiple requests coalesce onto one queued indexing job, the latest queued request metadata is the one retained on the worker logs.
X-Request-ID header (non-OTEL correlation)
For simpler log correlation without full distributed tracing, send an X-Request-ID header:
curl -X POST http://localhost:8090/v1/fluree/query \
-H "X-Request-ID: abc-123-def-456" \
-d '...'
The server logs and echoes back this ID in the response headers. All log lines for the request include the request_id field, so you can correlate with:
# In JSON log output:
grep '"request_id":"abc-123-def-456"' /var/log/fluree/server.log
This works without the otel feature and is useful for text-based log correlation. The same request_id is also copied onto background indexing logs when that request queues an index build, which helps connect the foreground transaction and later worker activity in plain log search.
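A minimal Rust sketch of this pattern (a hypothetical client helper using reqwest; the ID value and query body are placeholders):

use std::error::Error;

// Send a query with an explicit X-Request-ID and read the echoed ID back.
async fn query_with_request_id(base: &str, body: &serde_json::Value) -> Result<(), Box<dyn Error>> {
    let resp = reqwest::Client::new()
        .post(format!("{base}/v1/fluree/query"))
        .header("X-Request-ID", "abc-123-def-456")
        .json(body)
        .send()
        .await?;
    if let Some(id) = resp.headers().get("X-Request-ID") {
        // Log this alongside your own request logging for later correlation.
        println!("request id: {}", id.to_str()?);
    }
    Ok(())
}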
Client examples
Python (OpenTelemetry)
from opentelemetry import trace
from opentelemetry.propagate import inject
import requests
tracer = trace.get_tracer("my-app")
with tracer.start_as_current_span("fluree_query") as span:
headers = {"Content-Type": "application/json"}
inject(headers) # adds traceparent header automatically
response = requests.post(
"http://localhost:8090/v1/fluree/query",
headers=headers,
json={
"from": "my-ledger",
"select": {"?s": ["*"]},
"where": [["?s", "rdf:type", "schema:Person"]],
},
)
JavaScript / TypeScript (OpenTelemetry)
import { trace, context, propagation } from "@opentelemetry/api";
const tracer = trace.getTracer("my-app");
await tracer.startActiveSpan("fluree_query", async (span) => {
const headers: Record<string, string> = {
"Content-Type": "application/json",
};
propagation.inject(context.active(), headers);
const response = await fetch("http://localhost:8090/v1/fluree/query", {
method: "POST",
headers,
body: JSON.stringify({
from: "my-ledger",
select: { "?s": ["*"] },
where: [["?s", "rdf:type", "schema:Person"]],
}),
});
span.end();
return response;
});
Rust (reqwest + tracing-opentelemetry)
#![allow(unused)]
fn main() {
use opentelemetry::global;
use opentelemetry::propagation::Injector;
use reqwest::header::HeaderMap;
struct HeaderInjector<'a>(&'a mut HeaderMap);
impl Injector for HeaderInjector<'_> {
fn set(&mut self, key: &str, value: String) {
if let Ok(name) = key.parse() {
if let Ok(val) = value.parse() {
self.0.insert(name, val);
}
}
}
}
let span = tracing::info_span!("fluree_query", ledger = "my-ledger");
let _guard = span.enter();
let mut headers = HeaderMap::new();
global::get_text_map_propagator(|propagator| {
propagator.inject(&mut HeaderInjector(&mut headers));
});
let response = reqwest::Client::new()
.post("http://localhost:8090/v1/fluree/query")
.headers(headers)
.json(&query)
.send()
.await?;
}
Correlation Strategy Summary
| Scenario | Mechanism | Setup required |
|---|---|---|
Rust app embedding fluree-db-api | Shared tracing subscriber | None – automatic |
Rust app embedding fluree-db-api with OTEL | Shared subscriber + OTEL layer | Add OpenTelemetryLayer to subscriber |
HTTP client → fluree-db-server (OTEL) | traceparent header | Server built with otel feature + OTEL env vars |
HTTP client → fluree-db-server (log only) | X-Request-ID header | None – works out of the box |
Related Documentation
- Telemetry and Logging – Server-side logging, OTEL export, span hierarchy
- Adding Tracing Spans – Contributor guide for instrumenting new code
- Performance Investigation – Using traces to find bottlenecks
- Using Fluree as a Rust Library – General library usage guide
Pack format: archive and restore
Fluree’s .flpack format is a self-contained binary snapshot of an entire ledger – commits, transaction payloads, and (optionally) binary index artifacts. It enables ledger portability: archive a ledger to cold storage, restore it later under the same or a different name, or move it between environments.
Overview
The pack protocol (fluree-pack-v1) was designed for efficient bulk transfer between Fluree instances. The same format works equally well for file-based archive/restore workflows. Because all objects inside a pack are content-addressed (identified by ContentId / CIDv1), the ledger name only matters at the nameservice layer – making rename-on-restore straightforward.
What’s in a .flpack file?
A .flpack file is a binary stream of frames:
[Preamble: FPK1 + version(1)]
[Header frame] -- JSON metadata (commit count, estimated size, etc.)
[Data frames...] -- commits + txn blobs (oldest-first, topological order)
[Manifest frame]? -- marks start of index artifact phase (if included)
[Data frames...]? -- index branches, leaves, dict blobs, roots
[End frame]
Each data frame contains a CID (content identity) and the raw bytes of the object. On ingest, every frame is integrity-verified before being written to storage.
With or without indexes
A pack can include just commits + txn blobs (compact, sufficient for full restore – queries replay from commits), or it can also include binary index artifacts (larger, but the restored ledger is immediately queryable without reindexing).
CLI usage
Archive (export to .flpack)
The CLI does not yet have a dedicated fluree export --format flpack command. To produce a .flpack file today, use the pack HTTP endpoint directly or the Rust API (see below).
From the CLI, the closest equivalent is fluree clone which uses the pack protocol internally for transfer, then writes objects to local CAS.
Restore (import from .flpack)
fluree create my-restored-ledger --from /path/to/archive.flpack
This reads the .flpack file, ingests all CAS objects, and creates a new ledger pointing at the imported commit chain. The ledger name (my-restored-ledger) is independent of whatever the original ledger was called.
Rust API usage
All building blocks for archive/restore live in the API and core crates – no CLI dependency required.
Dependencies
[dependencies]
fluree-db-api = { version = "0.1", features = ["native"] }
fluree-db-core = "0.1"
fluree-db-nameservice-sync = "0.1"
tokio = { version = "1", features = ["full"] }
Archive: generate a .flpack file
Use stream_pack() from fluree-db-api to generate pack frames, then write them to a file (or S3, GCS, etc.).
#![allow(unused)]
fn main() {
use fluree_db_api::{Fluree, FlureeBuilder};
use fluree_db_api::pack::{full_ledger_pack_request, stream_pack};
use tokio::sync::mpsc;
use tokio::io::AsyncWriteExt;
async fn archive_ledger(
fluree: &Fluree<impl Storage + Clone + Send + Sync + 'static, impl NameService + RefPublisher + Send + Sync>,
ledger_id: &str,
output_path: &std::path::Path,
) -> Result<(), Box<dyn std::error::Error>> {
let handle = fluree.ledger(ledger_id).await?;
// Build a request that captures the current head commit (and index
// root, if present). `include_indexes = true` gives the restored
// ledger instant queryability; pass `false` for a smaller archive
// that reindexes on import. Empty `want` is always rejected by
// `stream_pack`, so always build via this helper.
//
// `full_ledger_pack_request` sets `include_txns = true` by default.
// To produce an even smaller archive without original transaction
// payloads (verifiable but not replayable), mutate the returned
// request: `request.include_txns = false;`.
let request = full_ledger_pack_request(&handle, /* include_indexes */ true).await?;
let (tx, mut rx) = mpsc::channel(64);
// Spawn the pack generator
let fluree_clone = fluree.clone();
let handle_clone = handle.clone();
let req_clone = request.clone();
tokio::spawn(async move {
let _ = stream_pack(&fluree_clone, &handle_clone, &req_clone, tx).await;
});
// Write frames to file
let mut file = tokio::fs::File::create(output_path).await?;
while let Some(chunk) = rx.recv().await {
file.write_all(&chunk.bytes).await?;
}
file.flush().await?;
Ok(())
}
}
To archive to S3 instead of a local file, replace the file writer with your S3 upload (e.g., aws_sdk_s3 multipart upload consuming chunks from rx).
Restore: ingest a .flpack file
Use ingest_pack_frame() from fluree-db-nameservice-sync to write each object, then finalize the nameservice pointers with set_commit_head() / set_index_head().
Streaming vs. memory-mapped reads
Pack files can be very large for production ledgers. There are two approaches to reading them:
- Memory-mapped (mmap): The CLI uses
memmap2::Mmapto map the entire file into virtual address space. This avoids heap allocation but still requires the OS to page the entire file through virtual memory. Suitable for files that fit comfortably in available address space. - Streaming: For very large archives or when reading from a non-seekable source (S3
GetObject, HTTP response, pipe), decode frames incrementally from a buffered reader. The network ingestion path (ingest_pack_stream) already works this way – it processes one frame at a time and never holds more than a single frame in memory.
For API consumers building archive/restore on large datasets, the streaming approach is recommended. The example below shows the mmap approach for simplicity; see fluree-db-nameservice-sync::pack_client::ingest_pack_stream for the streaming pattern using BytesMut + decode_frame in a loop.
#![allow(unused)]
fn main() {
use fluree_db_api::{Fluree, FlureeBuilder};
use fluree_db_core::pack::{decode_frame, read_stream_preamble, PackFrame, DEFAULT_MAX_PAYLOAD};
use fluree_db_core::{ContentKind, ContentStore};
use fluree_db_nameservice_sync::pack_client::ingest_pack_frame;
async fn restore_ledger(
fluree: &Fluree<impl Storage + Clone + Send + Sync + 'static, impl NameService + RefPublisher + Send + Sync>,
new_ledger_id: &str,
flpack_bytes: &[u8],
) -> Result<(), Box<dyn std::error::Error>> {
// 1. Create the target ledger (empty)
fluree.create(new_ledger_id).await?;
let handle = fluree.ledger(new_ledger_id).await?;
// 2. Parse preamble
let mut pos = read_stream_preamble(flpack_bytes)?;
// 3. Decode frames and ingest each CAS object
let storage = fluree.storage();
let mut ns_manifest: Option<serde_json::Value> = None;
loop {
let (frame, consumed) = decode_frame(&flpack_bytes[pos..], DEFAULT_MAX_PAYLOAD)?;
pos += consumed;
match frame {
PackFrame::Header(_header) => {
// Metadata -- log or inspect as needed
}
PackFrame::Data { cid, payload } => {
ingest_pack_frame(&cid, &payload, storage, new_ledger_id).await?;
}
PackFrame::Manifest(json) => {
// The nameservice manifest contains commit/index head CIDs and t values
if json.get("phase").and_then(|v| v.as_str()) == Some("nameservice") {
ns_manifest = Some(json);
}
}
PackFrame::End => break,
PackFrame::Error(msg) => {
return Err(format!("pack error: {msg}").into());
}
}
}
// 4. Finalize nameservice pointers from the manifest
let manifest = ns_manifest.ok_or("missing nameservice manifest in .flpack")?;
if let Some(cid_str) = manifest.get("commit_head_id").and_then(|v| v.as_str()) {
let commit_cid: fluree_db_core::ContentId = cid_str.parse()?;
let commit_t = manifest.get("commit_t").and_then(|v| v.as_i64()).unwrap_or(0);
fluree.set_commit_head(&handle, &commit_cid, commit_t).await?;
}
if let Some(cid_str) = manifest.get("index_head_id").and_then(|v| v.as_str()) {
let index_cid: fluree_db_core::ContentId = cid_str.parse()?;
let index_t = manifest.get("index_t").and_then(|v| v.as_i64()).unwrap_or(0);
fluree.set_index_head(&handle, &index_cid, index_t).await?;
}
Ok(())
}
}
Key points
- Rename on restore: The new_ledger_id parameter controls the ledger name. CAS objects are content-addressed and name-agnostic; only the nameservice pointer uses the name.
- Integrity: Every data frame is verified (SHA-256) before writing. A corrupted archive is detected immediately.
- Indexes are optional: Without indexes, the restored ledger is functional but will need to reindex (or replay from commits) before queries are efficient. With indexes, it’s ready immediately.
- Storage-agnostic: The same .flpack file can be restored to file storage, S3, or any backend that implements the Storage trait. Archive from file, restore to S3 (or vice versa).
Wire format reference
For full protocol details including frame encoding, see:
- Storage-agnostic commits and sync – design rationale and protocol overview
- ContentId and ContentStore – CID encoding and verification
- fluree-db-core/src/pack.rs – wire format constants and encode/decode functions
Architecture
| Concern | Crate | Key file |
|---|---|---|
| Wire format (FPK1 frames, encode/decode) | fluree-db-core | src/pack.rs |
| Pack stream generation (export) | fluree-db-api | src/pack.rs |
| HTTP endpoint (POST /v1/fluree/pack/*) | fluree-db-server | src/routes/pack.rs |
| Stream ingestion (import) | fluree-db-nameservice-sync | src/pack_client.rs |
| Commit/index head finalization | fluree-db-api | src/commit_transfer.rs |
| CLI .flpack file import | fluree-db-cli | src/commands/create.rs |
Admin, Health, and Stats
This document covers administrative operations, health monitoring, and server statistics for Fluree deployments.
Health Endpoints
GET /health
Basic health check:
curl http://localhost:8090/health
Response (200 OK):
{
"status": "ok",
"version": "0.1.0"
}
Use this endpoint for:
- Load balancer health checks
- Container orchestration (Kubernetes liveness/readiness probes)
- Monitoring systems
Kubernetes Example:
livenessProbe:
httpGet:
path: /health
port: 8090
initialDelaySeconds: 5
periodSeconds: 10
readinessProbe:
httpGet:
path: /health
port: 8090
initialDelaySeconds: 5
periodSeconds: 5
Statistics Endpoints
GET /v1/fluree/stats
Server statistics:
curl http://localhost:8090/v1/fluree/stats
Response:
{
"uptime_secs": 3600,
"storage_type": "file",
"indexing_enabled": true,
"cached_ledgers": 3,
"version": "0.1.0"
}
| Field | Description |
|---|---|
| uptime_secs | Server uptime in seconds |
| storage_type | Storage mode (memory or file) |
| indexing_enabled | Whether background indexing is enabled |
| cached_ledgers | Number of ledgers currently cached |
| version | Server version |
Diagnostic endpoints
GET /v1/fluree/whoami
Diagnostic endpoint for debugging Bearer tokens.
- If no token is present, returns token_present=false.
- If a token is present, attempts to cryptographically verify it using the same verification logic as authenticated endpoints (embedded-JWK Ed25519 and JWKS/OIDC when enabled/configured).
- On verification failure, returns verified=false and includes an error string. Some unverified decoded fields may be included for debugging.
curl http://localhost:8090/v1/fluree/whoami \
-H "Authorization: Bearer eyJ..."
CLI discovery
GET /.well-known/fluree.json
Discovery document used by the CLI when adding a remote (fluree remote add) or when running fluree auth login with no configured auth type.
Standalone fluree-server returns:
- {"version":1,"api_base_url":"/v1/fluree"} when no auth is enabled
- {"version":1,"api_base_url":"/v1/fluree","auth":{"type":"token"}} when any server auth mode is enabled (data/events/admin)
OIDC-capable implementations can return auth.type="oidc_device" plus issuer, client_id, and exchange_url.
The CLI treats oidc_device as “OIDC interactive login”: it uses device-code when the IdP supports it, otherwise authorization-code + PKCE (localhost callback).
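For illustration, such a deployment might return a discovery document of roughly this shape. The issuer, client_id, and exchange_url values here are placeholders, and the exact field placement is defined by the Auth contract:
{
  "version": 1,
  "api_base_url": "/v1/fluree",
  "auth": {
    "type": "oidc_device",
    "issuer": "https://id.example.com",
    "client_id": "fluree-cli",
    "exchange_url": "https://api.example.com/auth/exchange"
  }
}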
Implementations MAY also return api_base_url to tell the CLI where the Fluree API is mounted (for example, when the API is hosted under /v1/fluree or on a separate data subdomain).
See Auth contract (CLI ↔ Server) for the full schema and behavior.
GET /v1/fluree/info/<ledger…>
Get detailed ledger metadata:
curl "http://localhost:8090/v1/fluree/info/mydb:main"
Minimum fields used by the Fluree CLI:
- t (required)
- commitId (required for fluree push when t > 0)
Optional query params:
- By default, ledger-info returns the full novelty-aware stats view, including real-time datatype details and class ref edges.
- realtime_property_details=false: switch ledger-info to the lighter, fast novelty-aware stats layer that keeps counts current but skips lookup-backed class/ref enrichment.
- include_property_datatypes=false: omit stats.properties[*].datatypes when you want a smaller payload.
- include_property_estimates=true: include index-derived ndv-values, ndv-subjects, and selectivity fields under stats.properties[*].
Example:
curl "http://localhost:8090/v1/fluree/info/mydb:main"
Response:
{
"ledger": "mydb:main",
"t": 150,
"commitId": "bafybeig...commitT150",
"indexId": "bafybeig...indexRootT145",
"commit": {
"commit_id": "bafybeig...commitT150",
"t": 150
},
"index": {
"id": "bafybeig...indexRootT145",
"t": 145
},
"stats": {
"flakes": 12345,
"size": 1048576,
"indexed": 145,
"properties": {
"ex:name": {
"count": 3,
"last-modified-t": 150
}
},
"classes": {
"ex:Person": {
"count": 2,
"properties": {
"ex:worksFor": {
"count": 2,
"refs": { "ex:Organization": 2 },
"ref-classes": { "ex:Organization": 2 }
},
"ex:name": {}
},
"property-list": ["ex:name", "ex:worksFor"]
}
}
}
}
Stats freshness (real-time vs indexed)
- Real-time (includes novelty):
  - commit and the top-level t reflect the latest committed head.
  - stats.flakes and stats.size are derived from the current ledger stats view (indexed + novelty deltas).
  - stats.classes[*].properties / property-list will include properties introduced in novelty, even when the update does not restate @type.
  - stats.properties[*].datatypes is real-time by default.
  - stats.classes[*].properties[*].refs is real-time by default.
- As-of last index:
  - stats.indexed is the last index (t). If commit.t > indexed, the index is behind the head.
  - NDV-related fields in stats.properties[*] (ndv-values, ndv-subjects) and the selectivity derived from them are only as current as the last index refresh, so they are omitted by default and only included when include_property_estimates=true.
  - stats.properties[*].datatypes are omitted only when include_property_datatypes=false is requested.
  - Class property ref-edge counts (stats.classes[*].properties[*].refs) fall back to the lighter indexed/fast path only when realtime_property_details=false is requested.
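For example, to fetch a smaller payload, the optional parameters documented above can be combined on a single request:
curl "http://localhost:8090/v1/fluree/info/mydb:main?realtime_property_details=false&include_property_datatypes=false"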
GET /v1/fluree/exists/<ledger…>
Check if a ledger exists:
curl "http://localhost:8090/v1/fluree/exists/mydb:main"
Response:
{
"ledger": "mydb:main",
"exists": true
}
This is a lightweight check that only queries the nameservice without loading the ledger.
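For example, a provisioning script might gate ledger creation on this check. The endpoints are documented above and below; the script itself is only a sketch:
# Create the ledger only if it does not already exist
exists=$(curl -s "http://localhost:8090/v1/fluree/exists/mydb:main" | jq '.exists')
if [ "$exists" != "true" ]; then
  curl -X POST http://localhost:8090/v1/fluree/create \
    -H "Content-Type: application/json" \
    -d '{"ledger": "mydb:main"}'
fi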
Administrative Operations
POST /v1/fluree/create
Create a new ledger:
curl -X POST http://localhost:8090/v1/fluree/create \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb:main"}'
Response (201 Created):
{
"ledger": "mydb:main",
"t": 0,
"tx-id": "fluree:tx:sha256:abc123...",
"commit": {
"commit_id": "bafybeig...commitT0"
}
}
Authentication: When --admin-auth-mode=required, requires Bearer token from a trusted issuer.
See Admin Authentication for details.
POST /v1/fluree/drop
Drop (delete) a ledger:
# Soft drop (retract from nameservice, preserve files)
curl -X POST http://localhost:8090/v1/fluree/drop \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb:main"}'
# Hard drop (delete all files - IRREVERSIBLE)
curl -X POST http://localhost:8090/v1/fluree/drop \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb:main", "hard": true}'
Response:
{
"ledger": "mydb:main",
"status": "dropped",
"files_deleted": 23
}
| Status | Description |
|---|---|
| dropped | Successfully dropped |
| already_retracted | Was previously dropped |
| not_found | Ledger doesn’t exist |
Authentication: When --admin-auth-mode=required, requires Bearer token from a trusted issuer.
Drop Modes:
- Soft (default): Retracts from nameservice, files remain (recoverable)
- Hard: Deletes all files (irreversible)
See Dropping Ledgers for more details.
API Specification
GET /swagger.json
OpenAPI specification:
curl http://localhost:8090/swagger.json
Returns the OpenAPI 3.0 specification for the server API.
Monitoring Best Practices
1. Use Health Checks
Configure your infrastructure to poll /health:
# Simple monitoring script
while true; do
curl -sf http://localhost:8090/health > /dev/null || echo "ALERT: Server unhealthy"
sleep 10
done
2. Track Server Stats
Periodically collect statistics:
curl http://localhost:8090/v1/fluree/stats | jq .
Key metrics to track:
- uptime_secs: Detect restarts
- cached_ledgers: Cache efficiency
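A minimal polling sketch using these documented fields; the restart check and interval are illustrative, so adapt the alert action to your monitoring stack:
# Poll stats every 60s; a drop in uptime_secs means the server restarted
prev=0
while true; do
  uptime=$(curl -sf http://localhost:8090/v1/fluree/stats | jq '.uptime_secs')
  if [ -n "$uptime" ] && [ "$uptime" -lt "$prev" ]; then
    echo "ALERT: server restarted (uptime now ${uptime}s)"
  fi
  prev="${uptime:-0}"
  sleep 60
done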
3. Monitor Ledger Health
For each critical ledger:
curl "http://localhost:8090/v1/fluree/info/mydb:main" | jq .
Watch for:
- Index lag (commit.t vs index.t)
- Unexpected state changes
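For example, index lag can be computed directly from the info response:
curl -s "http://localhost:8090/v1/fluree/info/mydb:main" | jq '.t - .index.t'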
4. Set Up Alerts
Alert conditions:
- Health check failures
- Server restarts (low uptime)
- High index lag
5. Log Analysis
Enable structured logging:
fluree-server --log-level info 2>&1 | jq .
Search for:
- level: "error" - Errors
- level: "warn" - Warnings
- Slow query patterns
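For example, assuming each log line is a JSON object with a level field (as in the structured output above), errors and warnings can be isolated with jq:
fluree-server --log-level info 2>&1 | jq 'select(.level == "error" or .level == "warn")'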
Security Considerations
Protect Admin Endpoints
In production, enable admin authentication:
fluree-server \
--admin-auth-mode required \
--admin-auth-trusted-issuer did:key:z6Mk...
This protects /v1/fluree/create, /v1/fluree/drop, and other admin-protected
API routes from unauthorized access.
Limit Endpoint Exposure
Consider network-level restrictions:
- Health endpoint: Available to load balancers
- Stats endpoint: Internal monitoring only
- Admin endpoints: Restricted access
Audit Logging
Admin operations are logged. Monitor for:
- Ledger creation
- Ledger drops
- Authentication failures
Related Documentation
- Configuration - Server configuration options
- Query Peers - Distributed deployment
- Telemetry - Logging configuration
- API Endpoints - Full endpoint reference
Troubleshooting
This section helps you diagnose and resolve common issues with Fluree deployments.
Troubleshooting Guides
Common Errors
Reference for frequently encountered errors:
- Ledger not found
- Invalid IRI errors
- Transaction failures
- Query timeouts
- Permission errors
- Storage issues
- Indexing problems
Debugging Queries
Tools and techniques for query debugging:
- Using EXPLAIN plans
- Query tracing
- Performance profiling
- Identifying slow queries
- Optimizing query patterns
Quick Diagnostics
Health Check
First step for any issue:
curl http://localhost:8090/health
Check for unhealthy components.
Server Status
Check overall server state:
curl http://localhost:8090/v1/fluree/stats
Look for:
- High error counts
- Active queries/transactions stuck
- High indexing lag
- Memory issues
Logs
Check server logs:
# Recent errors
tail -f /var/log/fluree/server.log | grep ERROR
# Recent warnings
tail -f /var/log/fluree/server.log | grep WARN
Common Issue Categories
Connection Issues
Symptoms:
- Cannot connect to server
- Connection refused
- Connection timeout
Common Causes:
- Server not running
- Wrong port
- Firewall blocking
- Network issues
Quick Checks:
# Is server running?
ps aux | grep fluree-db-server
# Is port listening?
netstat -an | grep 8090
# Can you reach it?
curl http://localhost:8090/health
Query Issues
Symptoms:
- Queries return no results
- Queries timeout
- Unexpected results
- Query errors
Quick Checks:
# Enable explain
curl -X POST http://localhost:8090/v1/fluree/explain \
-d '{...}'
# Check server stats
curl http://localhost:8090/v1/fluree/stats
See Debugging Queries.
Transaction Issues
Symptoms:
- Transactions fail
- Validation errors
- Policy denials
- Slow commits
Quick Checks:
# Validate JSON-LD
# Use online validator: json-ld.org/playground
# Check permissions
curl -X POST http://localhost:8090/v1/fluree/update?dryRun=true \
-d '{...}'
# Check server stats
curl http://localhost:8090/v1/fluree/stats
Performance Issues
Symptoms:
- Slow queries
- Slow transactions
- High latency
- Timeouts
Quick Checks:
# Check indexing lag
curl http://localhost:8090/v1/fluree/info/mydb:main | jq '.t - .index.t'
# Check resource usage
curl http://localhost:8090/v1/fluree/stats
# Check active operations
curl http://localhost:8090/v1/fluree/stats
Storage Issues
Symptoms:
- Cannot write data
- Storage errors
- Disk full
- AWS errors
Quick Checks:
# Check disk space
df -h /var/lib/fluree
# Check AWS connectivity
aws s3 ls s3://fluree-prod-data/
# Check server stats
curl http://localhost:8090/v1/fluree/stats
Error Code Reference
See Common Errors for complete error code reference.
Most Common:
- LEDGER_NOT_FOUND - Ledger doesn’t exist
- PARSE_ERROR - Invalid JSON-LD or SPARQL
- INVALID_IRI - Malformed IRI
- QUERY_TIMEOUT - Query took too long
- POLICY_DENIED - Not authorized
Diagnostic Tools
Enable Debug Logging
./fluree-db-server --log-level debug
Runtime log-level changes are not currently exposed through the standalone HTTP
API; restart with the desired --log-level or RUST_LOG.
Enable Query Tracing
curl -X POST http://localhost:8090/v1/fluree/query \
-H "X-Fluree-Trace: true" \
-d '{...}'
Enable Policy Tracing
curl -X POST http://localhost:8090/v1/fluree/query \
-H "X-Fluree-Policy-Trace: true" \
-d '{...}'
Get Query Plan
curl -X POST http://localhost:8090/v1/fluree/explain \
-d '{...}'
Getting Help
Diagnostic Information to Collect
When reporting issues, include:
- Server version:
  curl http://localhost:8090/health
- Configuration:
  ./fluree-db-server --help   # Include relevant config values
- Error messages:
  - Complete error response
  - Relevant log entries
- Reproduction steps:
  - Minimal example to reproduce
  - Sample data if needed
- Environment:
  - OS and version
  - Storage mode
  - Available resources (RAM, disk)
Log Collection
Collect diagnostic logs:
# Last 1000 lines
tail -n 1000 /var/log/fluree/server.log > fluree-diagnostic.log
# Specific time range
grep "2024-01-22T10:" /var/log/fluree/server.log > issue-logs.log
Best Practices
1. Check Logs First
Always check logs before deeper investigation:
tail -f /var/log/fluree/server.log
2. Start with Health Check
curl http://localhost:8090/health
3. Isolate the Issue
Test components independently:
- Can you connect?
- Can you query?
- Can you transact?
4. Use Debug Mode Carefully
Debug logging is verbose:
- Use temporarily
- Disable in production
- May impact performance
5. Test on Development
Reproduce on development environment before investigating production.
6. Keep Logs
Retain logs for historical analysis:
# Logrotate config
/var/log/fluree/*.log {
daily
rotate 30
compress
}
Related Documentation
- Common Errors - Error reference
- Debugging Queries - Query debugging
- API Errors - HTTP error codes
- Operations - Operational guides
Common Errors
This document provides solutions for the most frequently encountered Fluree errors.
LEDGER_NOT_FOUND
{
"error": "NotFound",
"message": "Ledger not found: mydb:main",
"code": "LEDGER_NOT_FOUND"
}
Causes
- Ledger doesn’t exist
- Typo in ledger name
- Wrong branch name
- Nameservice not initialized
Solutions
Check ledger exists:
curl http://localhost:8090/v1/fluree/ledgers
Create ledger:
curl -X POST "http://localhost:8090/v1/fluree/create" \
-H "Content-Type: application/json" \
-d '{"ledger": "mydb:main"}'
Verify spelling:
- Check for typos in ledger name
- Verify branch name (default is main)
- Check case sensitivity
PARSE_ERROR
{
"error": "ParseError",
"message": "Invalid JSON-LD: unexpected token at line 5",
"code": "PARSE_ERROR",
"details": {
"line": 5,
"column": 12
}
}
Causes
- Invalid JSON syntax
- Invalid JSON-LD structure
- Invalid SPARQL syntax
- Missing required fields
Solutions
Validate JSON:
# Use jq to validate
cat query.json | jq .
Check JSON-LD:
- Validate @context format
- Check @id and @type values
- Verify array vs object usage
Check SPARQL:
- Validate syntax online
- Check PREFIX declarations
- Verify quote matching
Common JSON Mistakes:
// Bad: trailing comma
{
"select": ["?name"],
"where": [...],
}
// Good: no trailing comma
{
"select": ["?name"],
"where": [...]
}
INVALID_IRI
{
"error": "ValidationError",
"message": "Invalid IRI: not a valid URI",
"code": "INVALID_IRI",
"details": {
"iri": "not a uri"
}
}
Causes
- Malformed IRI
- Missing namespace prefix
- Invalid characters
- Spaces in IRI
Solutions
Use valid IRIs:
// Good
{"@id": "http://example.org/alice"}
{"@id": "ex:alice"}
// Bad
{"@id": "not a uri"}
{"@id": "alice"} // Missing namespace
{"@id": "ex:alice smith"} // Space
Define namespace:
{
"@context": {
"ex": "http://example.org/ns/"
},
"@graph": [
{"@id": "ex:alice"} // Now valid
]
}
URL encode spaces:
{"@id": "ex:alice%20smith"}
UNRESOLVED_COMPACT_IRI
Unresolved compact IRI 'ex:Person': prefix 'ex' is not defined in @context.
If this is intended as an absolute IRI, use a full form (e.g. http://...)
or add the prefix to @context.
This error fires from the JSON-LD strict compact-IRI guard. A value that looks like a compact IRI (prefix:suffix) appeared in an IRI position, but prefix is not defined in @context and is not a recognised absolute scheme.
Causes
- Forgotten @context on a query or transaction
- Misspelled or missing prefix in @context
- Intentionally using a bare prefix:suffix string as an opaque identifier
Solutions
Add the missing prefix to @context (most common fix):
{
"@context": {"ex": "http://example.org/ns/"},
"@graph": [{"@id": "ex:alice", "ex:name": "Alice"}]
}
Use a full absolute IRI instead of the compact form:
{
"@graph": [
{"@id": "http://example.org/ns/alice", "http://example.org/ns/name": "Alice"}
]
}
Opt out of the guard for legacy data where bare prefix:suffix strings are intentional:
{
"@context": {"ex": "http://example.org/ns/"},
"opts": {"strictCompactIri": false},
"@graph": [{"@id": "legacy:alice", "ex:name": "Alice"}]
}
The opt-out applies to both queries and transactions. See IRIs and @context — Strict Compact-IRI Guard for the full policy.
QUERY_TIMEOUT
{
"error": "Timeout",
"message": "Query execution exceeded timeout of 30000ms",
"code": "QUERY_TIMEOUT",
"details": {
"timeout_ms": 30000,
"elapsed_ms": 31245
}
}
Causes
- Complex query
- Large result set
- High indexing lag
- Insufficient resources
Solutions
Add LIMIT:
{
"select": ["?name"],
"where": [...],
"limit": 100 // Add limit
}
Add filters:
{
"where": [...],
"filter": "?age > 18" // Reduce result set
}
Check indexing lag:
curl http://localhost:8090/v1/fluree/info/mydb:main
# If (t - index.t) is large, wait for indexing (or reduce write rate)
Simplify query:
- Break into smaller queries
- Remove unnecessary joins
- Use more specific patterns
Increase timeout:
curl -X POST http://localhost:8090/v1/fluree/query \
-H "X-Fluree-Timeout: 60000" \
-d '{...}'
POLICY_DENIED
{
"error": "Forbidden",
"message": "Policy denies access to ledger mydb:main",
"code": "POLICY_DENIED",
"details": {
"subject": "did:key:z6Mkh...",
"action": "query",
"ledger": "mydb:main"
}
}
Causes
- No permission for operation
- Missing authentication
- Policy misconfiguration
- Wrong DID/identity
Solutions
Check authentication:
# Are you sending credentials?
curl -H "Authorization: Bearer token" ...
Verify policy:
# Query policies
SELECT ?policy ?subject ?action ?allow
WHERE {
?policy a f:Policy .
?policy f:subject ?subject .
?policy f:action ?action .
?policy f:allow ?allow .
}
Test with policy trace:
curl -X POST http://localhost:8090/v1/fluree/query \
-H "X-Fluree-Policy-Trace: true" \
-d '{...}'
Check DID:
- Verify DID in signed request
- Check DID is registered
- Verify public key
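You can also confirm which identity the server extracts from your token using the /v1/fluree/whoami diagnostic endpoint (see Admin, Health, and Stats):
curl http://localhost:8090/v1/fluree/whoami \
  -H "Authorization: Bearer eyJ..."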
TYPE_ERROR
{
"error": "TypeError",
"message": "Expected integer, got string",
"code": "TYPE_ERROR",
"details": {
"expected": "xsd:integer",
"actual": "xsd:string",
"value": "not a number"
}
}
Causes
- Wrong datatype
- Type mismatch in comparison
- Invalid type conversion
Solutions
Use correct types:
// Good
{"ex:age": 30}
{"ex:age": {"@value": "30", "@type": "xsd:integer"}}
// Bad
{"ex:age": "30"} // String, not integer
Check type constraints:
- Verify expected types
- Use explicit @type
- Validate before submitting
PAYLOAD_TOO_LARGE
{
"error": "PayloadTooLarge",
"message": "Transaction exceeds maximum size of 10485760 bytes",
"code": "PAYLOAD_TOO_LARGE",
"details": {
"max_size": 10485760,
"actual_size": 15000000
}
}
Causes
- Transaction too large
- Query result too large
- Large embedded data
Solutions
Batch large transactions:
const batchSize = 1000;
for (let i = 0; i < entities.length; i += batchSize) {
const batch = entities.slice(i, i + batchSize);
await transact({"@graph": batch});
}
Use LIMIT for queries:
{
"select": ["?name"],
"where": [...],
"limit": 1000 // Paginate
}
Increase limits (if appropriate):
./fluree-db-server --max-transaction-size 20971520
STORAGE_ERROR
{
"error": "StorageError",
"message": "Cannot write to storage",
"code": "STORAGE_ERROR"
}
Causes
- Disk full (file storage)
- Permission errors
- AWS connectivity (AWS storage)
- Storage backend down
Solutions
File Storage:
# Check disk space
df -h /var/lib/fluree
# Check permissions
ls -la /var/lib/fluree
sudo chown -R fluree:fluree /var/lib/fluree
AWS Storage:
# Check AWS credentials
aws sts get-caller-identity
# Check S3 access
aws s3 ls s3://fluree-prod-data/
# Check DynamoDB
aws dynamodb describe-table --table-name fluree-nameservice
HIGH_INDEXING_LAG
Not an error, but a warning condition.
Symptoms
curl http://localhost:8090/v1/fluree/info/mydb:main
{
  "t": 150,
  "commit": { "t": 150 },
  "index": { "t": 0 }
}
Causes
- Transaction rate exceeds indexing capacity
- Large transactions
- Insufficient resources
- Storage bottleneck
Solutions
Tune indexing:
fluree-server \
--indexing-enabled \
--reindex-min-bytes 100000 \
--reindex-max-bytes 1000000
Reduce transaction rate:
// Add delay between transactions
await transact(data);
await sleep(100);
Wait for catchup:
async function waitForIndexing() {
while (true) {
const status = await getStatus();
const lag = status.commit_t - status.index_t;
if (lag < 10) break;
await sleep(1000);
}
}
Add resources:
- More CPU
- More memory
- Faster disk
CONCURRENT_MODIFICATION
{
"error": "Conflict",
"message": "Concurrent modification detected",
"code": "CONCURRENT_MODIFICATION"
}
Causes
- Multiple processes updating same entity
- Nameservice contention
- Race condition
Solutions
Implement retry:
async function transactWithRetry(data, maxRetries = 3) {
for (let i = 0; i < maxRetries; i++) {
try {
return await transact(data);
} catch (err) {
if (err.code === 'CONCURRENT_MODIFICATION' && i < maxRetries - 1) {
await sleep(Math.pow(2, i) * 100);
continue;
}
throw err;
}
}
}
Use upsert for retry-friendly transactions:
# Upsert is more retry-friendly for idempotent entity transactions
POST /upsert?ledger=mydb:main
SIGNATURE_VERIFICATION_FAILED
{
"error": "SignatureVerificationFailed",
"message": "Invalid signature",
"code": "INVALID_SIGNATURE"
}
Causes
- Wrong private key
- Payload modified after signing
- Incorrect algorithm
- Key not registered
Solutions
Verify signing process:
// Ensure payload not modified
const payload = JSON.stringify(transaction);
const jws = await sign(payload, privateKey);
// Don't modify payload after signing
Check algorithm:
{
"alg": "EdDSA", // Must match key type
"kid": "did:key:z6Mkh..."
}
Verify public-key material: standalone server signed requests use the key
material embedded in supported JWS/JWT headers (or configured OIDC JWKS). There
is no /admin/keys registration endpoint.
Memory Issues
Symptoms
- Out of memory errors
- Server crashes
- Slow performance
- Swap usage
Solutions
Check memory:
curl http://localhost:8090/v1/fluree/stats
Reduce memory usage:
# See docs/operations/configuration.md for current memory-related flags.
# In general: reduce write/query load, reduce indexing lag, and provision more RAM.
Add more RAM:
- Upgrade server
- Use cloud instance with more memory
Reduce novelty:
- Index more frequently
- Reduce transaction size
Troubleshooting Checklist
When encountering issues, check:
- Server is running
- Can connect to server
- Health endpoint returns healthy
- Logs show no errors
- Ledger exists
- Correct ledger name/branch
- Valid JSON-LD/SPARQL syntax
- Sufficient resources (disk, memory)
- No network issues
- Authentication working (if required)
Related Documentation
- Debugging Queries - Query-specific debugging
- API Errors - HTTP error reference
- Operations - Operational guides
- Telemetry - Monitoring and logging
Debugging Queries
This guide provides tools and techniques for debugging query performance and correctness issues in Fluree.
Query Explain Plans
Enable Explain
Get query execution plan:
curl -X POST http://localhost:8090/v1/fluree/explain \
-H "Content-Type: application/json" \
-d '{
"from": "mydb:main",
"select": ["?name", "?age"],
"where": [
{ "@id": "?person", "schema:name": "?name" },
{ "@id": "?person", "schema:age": "?age" }
],
"filter": "?age > 25"
}'
Response:
{
"plan": {
"type": "join",
"left": {
"type": "scan",
"index": "POST",
"predicate": "schema:name",
"estimated_rows": 1000
},
"right": {
"type": "scan",
"index": "POST",
"predicate": "schema:age",
"estimated_rows": 1000
},
"join_variable": "?person",
"filter": {
"expression": "?age > 25",
"selectivity": 0.6
},
"estimated_result_rows": 600
},
"execution": {
"duration_ms": 45,
"rows_scanned": 2000,
"rows_returned": 573,
"index_hits": 2000,
"filter_applications": 1000
}
}
Understanding Explain Output
Scan Operations:
- Which index used (SPOT, POST, OPST, PSOT)
- Estimated rows
- Actual rows scanned
Join Operations:
- Join type (hash, merge, nested loop)
- Join variable
- Join order
Filter Operations:
- Filter expression
- Estimated selectivity
- Rows filtered
Execution Stats:
- Total duration
- Rows scanned vs returned
- Index efficiency
Query Tracing
Enable Tracing
Get detailed execution trace:
curl -X POST http://localhost:8090/v1/fluree/query \
-H "X-Fluree-Trace: true" \
-d '{...}'
Response:
{
"results": [...],
"trace": {
"total_duration_ms": 45,
"phases": [
{
"phase": "parse",
"duration_ms": 2
},
{
"phase": "plan",
"duration_ms": 3
},
{
"phase": "execute",
"duration_ms": 38,
"steps": [
{
"step": "scan_POST_schema:name",
"duration_ms": 12,
"rows": 1000
},
{
"step": "scan_POST_schema:age",
"duration_ms": 15,
"rows": 1000
},
{
"step": "join",
"duration_ms": 8,
"rows": 1000
},
{
"step": "filter",
"duration_ms": 3,
"rows_in": 1000,
"rows_out": 573
}
]
},
{
"phase": "serialize",
"duration_ms": 2
}
]
}
}
Trace Analysis
Look for:
- Slow phases: Which phase takes longest?
- Excessive scans: Too many rows scanned?
- Inefficient joins: Large intermediate results?
- Ineffective filters: Filters applied too late?
Common Query Issues
No Results
Symptom: Query returns empty array
Debugging Steps:
- Check data exists:
  SELECT (COUNT(*) as ?count) WHERE { ?s ?p ?o }
- Test each pattern separately:
  // Test pattern 1
  {"select": ["?person"], "where": [{"@id": "?person", "schema:name": "?name"}]}
  // Test pattern 2
  {"select": ["?person"], "where": [{"@id": "?person", "schema:age": "?age"}]}
- Check IRI matching:
  // Query with full IRI
  {"@id": "http://example.org/ns/alice"}
  // Or with prefix
  {"@id": "ex:alice"}
- Verify time specifier:
  # Current data
  "from": "mydb:main"
  # Historical might be empty
  "from": "mydb:main@t:1"
Unexpected Results
Symptom: Results don’t match expectations
Debugging Steps:
- Check each variable:
  {
    "select": ["?person", "?name", "?age"],  // See all bindings
    "where": [...]
  }
- Verify types:
  SELECT ?person ?name (DATATYPE(?name) as ?nameType)
  WHERE { ?person schema:name ?name }
- Check for duplicates:
  SELECT ?person (COUNT(?name) as ?count)
  WHERE { ?person schema:name ?name }
  GROUP BY ?person
  HAVING (?count > 1)
- Test without filters:
  // Remove filter temporarily
  {"where": [...]}  // No filter
Slow Queries
Symptom: Query takes too long
Debugging Steps:
- Check explain plan:
  curl -X POST http://localhost:8090/v1/fluree/explain -d '{...}'
- Check indexing lag:
  curl http://localhost:8090/v1/fluree/info/mydb:main
  # High indexing lag (commit.t - index.t) can slow queries
- Add LIMIT:
  {"where": [...], "limit": 100}
- Check pattern specificity:
  // Specific (fast)
  {"@id": "ex:alice", "schema:name": "?name"}
  // General (slow)
  {"@id": "?entity", "?pred": "?value"}
- Verify index usage:
  - Subject-based patterns use SPOT (fast)
  - Broad patterns may scan many triples (slow)
Query Optimization
Automatic Pattern Reordering
The query planner automatically reorders WHERE-clause patterns for optimal join order. You do not need to manually order patterns from most to least selective — the planner does this for you using a greedy algorithm that considers cardinality estimates and which variables are already bound at each step.
When database statistics are available (after at least one indexing cycle), estimates use HLL-derived property counts and distinct-value counts. Without statistics, the planner falls back to heuristic constants. You can verify the planner’s decisions using explain plans (see Explain Plans).
Both of these queries produce the same execution plan:
{
"where": [
{"@id": "?company", "schema:name": "?companyName"},
{"@id": "?person", "schema:worksFor": "?company"},
{"@id": "ex:alice", "schema:name": "?name"}
]
}
{
"where": [
{"@id": "ex:alice", "schema:name": "?name"},
{"@id": "ex:alice", "schema:worksFor": "?company"},
{"@id": "?company", "schema:name": "?companyName"}
]
}
The planner recognizes that ex:alice patterns are highly selective (bound
subject), and that ?company becomes bound after those patterns execute,
making the final pattern a cheap per-subject lookup rather than a full scan.
Filter and BIND Placement
Filters and BINDs are placed during the greedy reordering loop, as soon as all their input variables are bound. You do not need to manually position them for efficiency. For BIND patterns, only the expression’s input variables must be bound — the target variable is an output that feeds back into the bound set, enabling cascading placement of dependent patterns.
When a filter or BIND becomes ready immediately after a compound pattern (UNION, Graph, or Service), the planner pushes it into the compound pattern’s inner lists rather than placing it after. For UNION, the filter is cloned into every branch. This means filters execute within each branch, benefiting from optimal placement, range pushdown, and inline evaluation — the same optimizations available to top-level filters.
Additionally, filters whose variables are all bound by a join operator are
evaluated inline during the join itself, avoiding the overhead of a separate
filter pass. Filters that depend on a BIND’s output variable are fused into
the BindOperator and evaluated inline after computing each row’s BIND value,
similarly eliminating a separate filter pass. Range-safe filters (comparisons
like >, < on indexed properties) are pushed down to the index scan.
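For example, in a query like the following (the same shape as the explain example earlier), the ?age > 25 comparison is range-safe and is pushed down to the index scan; its position in the request does not affect where it runs:
{
  "from": "mydb:main",
  "select": ["?name", "?age"],
  "where": [
    { "@id": "?person", "schema:name": "?name" },
    { "@id": "?person", "schema:age": "?age" }
  ],
  "filter": "?age > 25"
}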
Use LIMIT
Always limit large result sets:
{
"where": [...],
"orderBy": ["?name"],
"limit": 100,
"offset": 0
}
Implement pagination for UI.
Avoid Cartesian Products
Ensure patterns are connected:
Bad (Cartesian product):
{
"where": [
{"@id": "?person", "schema:name": "?name"},
{"@id": "?company", "schema:name": "?companyName"}
// Not connected! Returns person × company combinations
]
}
Good (connected):
{
"where": [
{"@id": "?person", "schema:name": "?name"},
{"@id": "?person", "schema:worksFor": "?company"},
{"@id": "?company", "schema:name": "?companyName"}
]
}
Policy Debugging
Enable Policy Trace
See which policies apply:
curl -X POST http://localhost:8090/v1/fluree/query \
-H "X-Fluree-Policy-Trace: true" \
-d '{...}'
Response:
{
"results": [...],
"policy_trace": [
{
"policy": "ex:department-policy",
"matched": true,
"condition_met": true,
"decision": "allow",
"patterns_added": [
{"@id": "?person", "ex:department": "engineering"}
]
},
{
"policy": "ex:role-policy",
"matched": false,
"reason": "subject_mismatch"
}
],
"final_decision": "allow"
}
Policy Impact on Query
Compare query with and without policies:
// With policies (authenticated)
const authResult = await queryWithAuth(query);
// Without policies (admin override)
const fullResult = await queryAsAdmin(query);
console.log(`Auth sees ${authResult.length} rows`);
console.log(`Admin sees ${fullResult.length} rows`);
console.log(`Policy filtered ${fullResult.length - authResult.length} rows`);
Testing Queries
Isolate Components
Test query components separately:
// Test each WHERE pattern
for (const pattern of wherePatterns) {
const result = await query({
select: ["?s"],
where: [pattern]
});
console.log(`Pattern ${JSON.stringify(pattern)}: ${result.length} results`);
}
Use Smaller Datasets
Test on small dataset first:
# Create test ledger
curl -X POST "http://localhost:8090/v1/fluree/insert?ledger=test:main" \
-d '{"@graph": [small test data]}'
# Test query
curl -X POST http://localhost:8090/v1/fluree/query \
-d '{"from": "test:main", ...}'
Compare with Expected Results
const expected = [
{ name: "Alice", age: 30 },
{ name: "Bob", age: 25 }
];
const actual = await query({...});
assert.deepEqual(actual, expected);
Diagnostic Queries
Check Index Usage
# Count triples per index
SELECT (COUNT(*) as ?count)
WHERE { ?s ?p ?o }
Find Large Entities
SELECT ?entity (COUNT(?triple) as ?tripleCount)
WHERE {
?entity ?p ?o .
BIND(?entity AS ?triple)
}
GROUP BY ?entity
ORDER BY DESC(?tripleCount)
LIMIT 10
Find Common Predicates
SELECT ?predicate (COUNT(*) as ?count)
WHERE {
?s ?predicate ?o
}
GROUP BY ?predicate
ORDER BY DESC(?count)
Check Data Types
SELECT ?type (COUNT(*) as ?count)
WHERE {
?entity a ?type
}
GROUP BY ?type
ORDER BY DESC(?count)
Performance Profiling
Measure Query Time
const start = Date.now();
const result = await query({...});
const duration = Date.now() - start;
console.log(`Query returned ${result.length} rows in ${duration}ms`);
Identify Bottlenecks
Use trace to find slow operations:
const response = await queryWithTrace({...});
const trace = response.trace;
const slowSteps = trace.phases
.flatMap(p => p.steps || [])
.filter(s => s.duration_ms > 100)
.sort((a, b) => b.duration_ms - a.duration_ms);
console.log('Slow steps:', slowSteps);
Compare Approaches
Test different query formulations:
// Approach 1
const start1 = Date.now();
const result1 = await query(approach1);
const time1 = Date.now() - start1;
// Approach 2
const start2 = Date.now();
const result2 = await query(approach2);
const time2 = Date.now() - start2;
console.log(`Approach 1: ${time1}ms, Approach 2: ${time2}ms`);
Best Practices
1. Use Explain Early
Run explain on new queries:
curl -X POST http://localhost:8090/v1/fluree/explain -d '{...}'
2. Test with Representative Data
Test queries with production-like data volume:
// Load realistic test data
await loadTestData(10000); // Similar to production size
// Test query performance
const result = await query({...});
3. Monitor Query Patterns
Track slow queries:
if (duration > 1000) {
logger.warn(`Slow query: ${duration}ms`, {
query: queryText,
resultCount: result.length
});
}
4. Profile Before Optimizing
Measure before optimizing:
console.time('query');
const result = await query({...});
console.timeEnd('query');
5. Use Query Logs
Enable query logging:
[logging]
level = "debug"
log_queries = true
Common Query Antipatterns
Antipattern 1: Overly Broad Patterns
Bad:
{"@id": "?entity", "?predicate": "?value"}
Good:
{"@id": "?person", "@type": "schema:Person"},
{"@id": "?person", "schema:name": "?name"}
Antipattern 2: Disconnected Patterns (Cartesian Products)
Ensure all patterns share at least one variable with the rest of the query. Disconnected patterns produce a Cartesian product:
Bad:
{
"where": [
{"@id": "?person", "schema:name": "?name"},
{"@id": "?dept", "schema:budget": "?budget"}
]
}
Good:
{
"where": [
{"@id": "?person", "schema:name": "?name"},
{"@id": "?person", "schema:department": "?dept"},
{"@id": "?dept", "schema:budget": "?budget"}
]
}
Note: filter placement is handled automatically by the planner. Filters are applied as soon as all their referenced variables are bound, regardless of where they appear in the query.
Antipattern 3: Missing LIMIT
Bad:
{
"select": ["?name"],
"where": [...] // Could return millions
}
Good:
{
"select": ["?name"],
"where": [...],
"limit": 1000 // Always limit
}
Antipattern 4: Redundant Patterns
Bad:
{
"where": [
{"@id": "ex:alice", "schema:name": "?name"},
{"@id": "ex:alice", "schema:name": "Alice"} // Redundant
]
}
Good:
{
"where": [
{"@id": "ex:alice", "schema:name": "Alice"}
]
}
Tools
Query Validation
Validate before sending:
function validateQuery(query) {
if (!query.select) {
throw new Error('Missing select clause');
}
if (!query.where || query.where.length === 0) {
throw new Error('Missing where clause');
}
if (!query.limit && estimateResultSize(query) > 1000) {
console.warn('Query missing LIMIT clause');
}
}
Query Builder
Use query builder for complex queries:
const query = new QueryBuilder()
.from('mydb:main')
.select('?name', '?age')
.where('?person', 'schema:name', '?name')
.where('?person', 'schema:age', '?age')
.filter('?age > 25')
.limit(100)
.build();
Query Templates
Create reusable templates:
function findPersonByName(name) {
return {
from: 'mydb:main',
select: ['?person', '?email'],
where: [
{ '@id': '?person', 'schema:name': name },
{ '@id': '?person', 'schema:email': '?email' }
]
};
}
Related Documentation
- Common Errors - Error reference
- Explain Plans - Query optimization
- JSON-LD Query - Query syntax
- SPARQL - SPARQL syntax
- Telemetry - Logging and monitoring
Performance Investigation with Distributed Tracing
Fluree includes deep instrumentation that decomposes every query, transaction, and indexing operation into a span waterfall visible in Jaeger, Grafana Tempo, AWS X-Ray, or any OpenTelemetry-compatible backend. This guide explains how to use that instrumentation to find and fix performance bottlenecks.
When to Use Deep Tracing
| Symptom | Start with | Escalate to |
|---|---|---|
| Single slow query | /v1/fluree/explain plan | Deep tracing at debug level |
| Slow queries in general, unclear which phase | Deep tracing at debug level | trace level for operator detail |
| Slow transactions / commits | Deep tracing at debug level | Check txn_commit sub-spans |
| Indexing taking too long | Deep tracing at debug level | Check build_index per-order timing |
| Intermittent latency spikes | Sustained tracing + Jaeger search by duration | Correlate with indexing traces |
| Production regression | Compare Jaeger traces before/after deploy | Filter by tracker_time span attribute |
Deep tracing is complementary to explain plans, not a replacement. Explain plans show the shape of a query plan; tracing shows where wall-clock time actually went.
Quick Start: Local Investigation
The otel/ directory at the repository root provides a self-contained Makefile-driven harness for local trace investigation.
Prerequisites
- Docker (for Jaeger)
- Rust toolchain
- curl, bash
One-liner setup
cd otel/
make all # starts Jaeger, builds with --features otel, starts server, runs smoke tests
make ui # opens Jaeger UI in browser
This gives you a running Fluree server exporting traces to a local Jaeger instance with pre-loaded test data.
Investigate a specific query
Once the server is running (via make server or make all):
# Run your problematic query against the server
curl -s -X POST http://localhost:8090/v1/fluree/query/otel-test:main \
-H 'Content-Type: application/json' \
-d '{
"select": ["?name", "?price"],
"where": [
{"@id": "?p", "@type": "ex:Product"},
{"@id": "?p", "ex:name": "?name"},
{"@id": "?p", "ex:price": "?price"}
],
"orderBy": [{"desc": "?price"}],
"limit": 100
}'
Then open Jaeger (make ui or http://localhost:16686), select service fluree-server, and find the trace. The waterfall shows exactly where time was spent.
Teardown
make clean-all # stops server, stops Jaeger, removes data
Writing Custom Scenario Scripts
The otel/scripts/ directory contains scenario scripts you can use as templates. To investigate a specific performance issue:
1. Create a scenario script
#!/usr/bin/env bash
# otel/scripts/my-investigation.sh
set -euo pipefail
PORT="${PORT:-8090}"
LEDGER="${LEDGER:-otel-test:main}"
BASE="http://localhost:${PORT}"
echo "=== My investigation scenario ==="
# Step 1: Insert data that triggers the problem
curl -sf -X POST "${BASE}/${LEDGER}/insert" \
-H 'Content-Type: application/json' \
-d '{
"@context": {"ex": "http://example.org/ns/"},
"@graph": [
... your test data ...
]
}' > /dev/null
sleep 0.5 # let OTEL batch exporter flush
# Step 2: Run the problematic query multiple times
for i in $(seq 1 5); do
echo " Query iteration $i..."
curl -sf -X POST "${BASE}/${LEDGER}/query" \
-H 'Content-Type: application/json' \
-d '{ ... your query ... }' > /dev/null
sleep 0.3
done
echo "=== Done. Check Jaeger for traces. ==="
2. Run it
cd otel/
make up build server # ensure infrastructure is running
bash scripts/my-investigation.sh
make ui # inspect traces
3. Add a Makefile target (optional)
# In otel/Makefile
my-investigation: _data/storage
bash scripts/my-investigation.sh
Tips for effective scenario scripts
- Pause between requests (
sleep 0.3-0.5) to let the OTEL batch exporter flush. Without this, spans from adjacent requests may interleave in Jaeger, making waterfall analysis harder. - Run the query multiple times to see variance. Sort by duration in Jaeger to find the worst case.
- Use different
RUST_LOGlevels for different investigations. Override when starting the server:make server RUST_LOG=info,fluree_db_query=trace - Isolate variables: test with and without indexing (
INDEXING=false), with different data volumes, or with different query patterns.
Reading Jaeger Waterfalls
Anatomy of a query trace
request (info) ─────────────────────────── 834ms
query_execute (debug) ─────────────────────────── 832ms
query_prepare (debug) ──── 12ms
reasoning_prep (debug) ── 3ms
pattern_rewrite (debug) ── 2ms
plan (debug) ── 5ms
query_run (debug) ──────────────────────── 818ms
scan (debug) ── 4ms
join (debug) ─────────────────── 780ms
join_flush_scan_spot (debug) ────────────────── 775ms
filter (debug) ── 2ms
sort (debug) ── 15ms
sort_blocking (debug) ── 14ms
project (debug) ── 1ms
format (debug) ── 2ms
In this example, the bottleneck is immediately visible: the join_flush_scan_spot span accounts for 775ms of the 834ms total. This tells you the query is doing a large range scan during the join phase.
Key span attributes to check
| Span | Attribute | What it tells you |
|---|---|---|
query_execute | tracker_time, tracker_fuel | Total tracked time and fuel consumption |
pattern_rewrite | patterns_before, patterns_after | Whether pattern rewriting is effective |
plan | pattern_count | Complexity of the query plan |
scan | (trace level) | How long individual scans take |
join_flush_scan_spot | unique_subjects, total_leaves | Join scan size — large values indicate broad scans |
sort_blocking | input_rows, sort_ms | Sort cost — are you sorting a huge result set? |
txn_stage | insert_count, delete_count | Transaction size |
txn_commit | flake_count, delta_bytes | Commit I/O volume |
Note: Span attributes like
tracker_time,tracker_fuel,patterns_before,assertion_count,template_count, andpattern_countare verified by acceptance tests (it_tracing_spans.rs). Other attributes in this table (unique_subjects,total_leaves,sort_ms,flake_count,delta_bytes) are documented from code inspection but not programmatically verified — they may drift if span instrumentation is refactored.
Anatomy of a transaction trace
request (info) ─────────────── 245ms
transact_execute (debug) ─────────────── 243ms
txn_stage (debug) ────── 45ms
where_exec (debug) ── 8ms
delete_gen (debug) ── 3ms
insert_gen (debug) ── 12ms
cancellation (debug) ── 5ms
policy_enforce (debug) ── 2ms
txn_commit (debug) ──────────── 195ms
commit_nameservice_lookup (debug) ── 2ms
commit_verify_sequencing (debug) ── 1ms
commit_write_raw_txn (debug) ── 5ms (await of spawned upload)
commit_build_record (debug) ── 3ms
commit_write_commit_blob (debug) ────── 65ms
commit_publish_nameservice (debug) ────── 35ms
When store_raw_txn is opted in, the raw-transaction bytes are uploaded on a Tokio task spawned at the top of the pipeline (see PendingRawTxnUpload). commit_write_raw_txn then measures just the await of that task — usually a few ms, even on S3, because the upload overlapped staging CPU work. If you see commit_write_raw_txn approaching the upload’s intrinsic latency (50-100ms on S3), staging finished faster than the upload; otherwise the overlap has absorbed it. The bottleneck on S3 is now typically commit_write_commit_blob alone.
Anatomy of an indexing trace
If the index build was queued by an HTTP transaction request, use logs to bridge the two views: the background worker now copies the triggering request_id and trace_id onto its log lines, but the OTEL/Jaeger indexing trace remains separate by design.
Indexing runs as a separate trace (not nested under an HTTP request). Search Jaeger for the operation name index_build:
index_build (debug) ─────────────────── 12.5s
commit_chain_walk (debug) ── 50ms
commit_resolve (debug, commits=2) ── 27ms
build_all_indexes (debug) ─────────────────── 12.4s
build_index (debug, order=SPOT) ──────── 3.1s
build_index (debug, order=PSOT) ──────── 3.2s
build_index (debug, order=POST) ──────── 3.0s
build_index (debug, order=OPST) ──────── 3.1s
Common Bottleneck Patterns
1. Join scan too broad
Symptom: join_flush_scan_spot has unique_subjects in the thousands and dominates the waterfall.
Cause: A join pattern that matches too many subjects, forcing a large range scan.
Fix: Add more selective patterns or filters to narrow the join. Check the explain plan for join order.
2. Sort on large result set
Symptom: sort_blocking shows input_rows > 10,000 and sort_ms dominates.
Cause: Sorting happens after all joins/filters, on the full result set.
Fix: Add LIMIT if possible, or ensure filters run before the sort by placing restrictive patterns first.
3. Commit I/O on S3
Symptom: commit_write_commit_blob takes 50-200ms. commit_write_raw_txn may also show time if staging completed before the parallel upload finished.
Cause: S3 PutObject latency (~50-100ms per call). The raw-txn upload is parallelized with staging, so its cost is usually absorbed, but the commit-blob write is serial on the critical path.
Fix: S3 latency is inherent. Batch multiple small transactions into fewer larger ones. Consider file storage for latency-sensitive workloads. If commit_write_raw_txn is non-trivial, it indicates staging finished faster than the raw-txn upload — the overlap helped but couldn’t fully hide it.
4. Indexing backlog
Symptom: Multiple index_build traces in quick succession, each taking 10+ seconds.
Cause: Transaction volume exceeds indexing throughput, building up novelty.
Fix: Increase the novelty reindex threshold, or reduce transaction frequency. Check build_index sub-spans to see which index order is slowest.
5. Policy evaluation overhead
Symptom: policy_eval or policy_enforce takes a significant fraction of query/transaction time.
Cause: Complex policy rules that require additional queries to evaluate.
Fix: Simplify policy rules, or pre-compute policy decisions where possible.
Controlling Trace Verbosity
RUST_LOG patterns
| Goal | Pattern | Visible spans |
|---|---|---|
| Production default | info | HTTP request spans only (zero operation spans) |
| Query investigation | info,fluree_db_query=debug | + query_execute, query_prepare, query_run, operators |
| Transaction investigation | info,fluree_db_transact=debug | + txn_stage, txn_commit, commit sub-spans |
| Full debug | info,fluree_db_query=debug,fluree_db_transact=debug,fluree_db_indexer=debug | All debug spans |
| Operator-level detail | info,fluree_db_query=trace | + per-leaf: binary_cursor_next_leaf, etc. |
| Everything | debug | Console firehose (OTEL layer still filters to fluree_* only) |
Note: When OTEL is enabled, all fluree_* debug spans flow to the OTEL collector regardless of RUST_LOG. The table above describes console output only.
With the otel/ harness
# Override RUST_LOG when starting the server
make server RUST_LOG='info,fluree_db_query=trace'
# Then run your scenario
make query
In production
Set RUST_LOG via your container orchestrator’s environment variables. Start at info and increase selectively:
# ECS task definition (environment section)
RUST_LOG=info,fluree_db_query=debug,fluree_db_transact=debug
Production Tracing: AWS Deployments
Architecture: fluree-db-server on ECS/Fargate
┌─────────────┐ OTLP/gRPC (4317) ┌───────────────────┐
│ ECS Task │ ─────────────────────────▶│ OTEL Collector │
│ fluree-srv │ │ (sidecar or │
│ --features │ │ Daemon/Service) │
│ otel │ └────────┬──────────┘
└─────────────┘ │
┌─────────▼──────────┐
│ Grafana Tempo / │
│ AWS X-Ray / │
│ Jaeger │
└─────────────────────┘
ECS task definition snippet:
{
"containerDefinitions": [
{
"name": "fluree-server",
"image": "your-ecr-repo/fluree-db-server:latest",
"environment": [
{"name": "RUST_LOG", "value": "info,fluree_db_query=debug,fluree_db_transact=debug"},
{"name": "OTEL_SERVICE_NAME", "value": "fluree-server"},
{"name": "OTEL_EXPORTER_OTLP_ENDPOINT", "value": "http://localhost:4317"},
{"name": "OTEL_EXPORTER_OTLP_PROTOCOL", "value": "grpc"}
]
},
{
"name": "otel-collector",
"image": "amazon/aws-otel-collector:latest",
"essential": true,
"command": ["--config=/etc/otel-config.yaml"]
}
]
}
OTEL Collector config (for X-Ray export):
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
exporters:
awsxray:
region: us-east-1
# Or for Grafana Tempo:
# otlp:
# endpoint: tempo.internal:4317
# tls:
# insecure: true
service:
pipelines:
traces:
receivers: [otlp]
exporters: [awsxray]
Architecture: fluree-db-api as a Rust crate in AWS Lambda
When using fluree-db-api directly (not through fluree-db-server), you initialize OTEL yourself. The key pattern is the same dual-layer subscriber with a Targets filter on the OTEL layer.
#![allow(unused)]
fn main() {
use opentelemetry_otlp::SpanExporter;
use opentelemetry_sdk::trace::SdkTracerProvider;
use tracing_subscriber::{filter::Targets, layer::SubscriberExt, Registry};
fn init_tracing() {
// OTEL exporter — Lambda uses HTTP/protobuf to the collector sidecar
let exporter = SpanExporter::builder()
.with_http()
.with_endpoint("http://localhost:4318") // collector sidecar
.build()
.expect("Failed to create OTLP exporter");
let resource = opentelemetry_sdk::Resource::builder()
.with_service_name("my-lambda-fn")
.build();
let provider = SdkTracerProvider::builder()
.with_batch_exporter(exporter)
.with_resource(resource)
.build();
let tracer = provider.tracer("fluree-db");
let otel_layer = tracing_opentelemetry::layer().with_tracer(tracer);
// Critical: filter OTEL layer to fluree_* crates only
let otel_filter = Targets::new()
.with_target("fluree_db_api", tracing::Level::DEBUG)
.with_target("fluree_db_query", tracing::Level::DEBUG)
.with_target("fluree_db_transact", tracing::Level::DEBUG)
.with_target("fluree_db_indexer", tracing::Level::DEBUG)
.with_target("fluree_db_core", tracing::Level::DEBUG);
let subscriber = Registry::default()
.with(tracing_subscriber::fmt::layer())
.with(otel_layer.with_filter(otel_filter));
tracing::subscriber::set_global_default(subscriber).ok();
}
}
Lambda deployment with ADOT (AWS Distro for OpenTelemetry):
Add the ADOT Lambda layer and set:
AWS_LAMBDA_EXEC_WRAPPER=/opt/otel-handler
OTEL_SERVICE_NAME=my-fluree-lambda
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
The ADOT layer runs a collector sidecar that receives OTLP spans and exports them to X-Ray, Tempo, or any configured backend.
Grafana Tempo + Grafana UI
For production trace exploration, Grafana Tempo with the Grafana UI provides the best experience:
- Search by attributes: Find all queries with
tracker_time > 500ms - Service graph: Visualize call patterns between services
- Trace-to-logs: Jump from a slow span to the corresponding log lines
- Trace-to-metrics: Correlate latency spikes with metric dashboards
Tempo query examples (TraceQL):
# Find slow queries
{ resource.service.name = "fluree-server" && name = "query_execute" && duration > 500ms }
# Find large commits
{ name = "txn_commit" && span.flake_count > 1000 }
# Find indexing operations
{ name = "index_build" }
AWS X-Ray
X-Ray works with OTEL traces exported via the AWS OTEL Collector. Key differences from Jaeger/Tempo:
- X-Ray automatically creates a service map showing request flow
- Subsegment annotations map to OTEL span attributes
- X-Ray sampling rules can be configured server-side (no code changes)
- Use X-Ray Insights for anomaly detection on latency patterns
Using the otel/ Harness for Regression Testing
The otel/ directory is designed for reproducible trace validation. Use it to verify that tracing instrumentation works correctly after code changes:
cd otel/
# Clean slate
make fresh
# After all scenarios complete, check Jaeger:
# 1. Transaction traces should show txn_stage > txn_commit with sub-spans
# 2. Query traces should show query_prepare > query_run with operator spans
# 3. Index traces should appear as separate traces (not under a request)
Specific test scenarios
| Scenario | Command | What to verify in Jaeger |
|---|---|---|
| All transaction types | make transact | 5 traces, each with txn_stage + txn_commit |
| All query types | make query | 7 traces with query_prepare + query_run |
| Background indexing | make index | Separate index_build trace (not under a request) |
| Bulk import | make import | Many commit traces, possibly indexing traces |
| Full end-to-end | make smoke | All of the above |
| Multi-cycle stress | make cycle | 3 full cycles, multiple index_build traces |
Analyzing Exported Traces
Jaeger allows exporting traces as JSON files (click the download icon on any trace or search result). These exports are useful for offline analysis, sharing with teammates, and archiving evidence of performance issues.
Exporting from Jaeger
- Open Jaeger UI (
http://localhost:16686or your deployment) - Search for traces (by service, operation, duration, tags)
- Click the download/export icon to save as JSON
What’s in the export
The JSON file contains data[].spans[] with full span details: operation names, tags (key-value attributes), parent-child references, durations (in microseconds), and timestamps. Files range from ~100KB for a few traces to 50MB+ for large search exports.
Analyzing with Claude Code
The repository includes Claude Code skills for trace analysis:
/trace-inspect /path/to/traces.json # Drill into a single trace's span tree and timing
/trace-overview /path/to/traces.json # Aggregate stats and anomaly detection across all traces
These skills analyze the export file using targeted Python scripts (to avoid loading the full JSON into context) and cross-reference the results against the expected span hierarchy to produce a diagnosis with concrete code-level fix recommendations.
Manual analysis with Python
For quick one-off checks without Claude Code, the Jaeger JSON structure is straightforward:
import json
with open("traces.json") as f:
data = json.load(f)
for trace in data["data"]:
for span in trace["spans"]:
tags = {t["key"]: t["value"] for t in span.get("tags", [])}
print(f"{span['operationName']} dur={span['duration']/1000:.1f}ms {tags}")
Key fields: operationName (span name), duration (microseconds), tags (span attributes), references (parent-child links with refType: "CHILD_OF").
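For a quick ranking of the slowest spans without writing any Python, jq works on the same structure:
jq -r '.data[].spans[] | "\(.duration/1000) ms  \(.operationName)"' traces.json | sort -rn | head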
Related Documentation
- Telemetry and Logging – OTEL configuration reference
- Adding Tracing Spans – How to instrument new code paths
- Debugging Queries – Query-specific debugging (explain plans, etc.)
- otel/README.md – Full otel/ harness reference
Reference
Reference materials for Fluree developers and operators.
Reference Guides
Glossary
Definitions of key terms and concepts:
- RDF terminology
- Fluree-specific terms
- Database concepts
- Query terminology
- Index terminology
Fluree System Vocabulary
Complete reference for Fluree’s system vocabulary under https://ns.flur.ee/db#:
- Commit metadata predicates (
f:t,f:address,f:time,f:previous, etc.) - Search query vocabulary (BM25 and vector search patterns)
- Nameservice record fields and type taxonomy
- Policy vocabulary
- Namespace codes
Standards and Feature Flags
Standards and feature-flag reference:
- SPARQL 1.1 compliance
- JSON-LD specifications
- W3C standards support
- Feature flags
- Deprecated features
Graph Identities and Naming
Naming conventions for graphs, ledgers, and identifiers:
- User-facing terminology (ledger, graph IRI, graph source, graph snapshot)
- Time pinning syntax (
@t:,@iso:,@commit:) - Named graphs within a ledger
- Base resolution for graph references
Crate Map
Overview of Fluree’s Rust crate architecture:
- Core crates
- API crates
- Query engine crates
- Storage crates
- Dependency relationships
Quick Reference
Common Namespaces
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix schema: <http://schema.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
Fluree Namespaces
@prefix f: <https://ns.flur.ee/db#> .
Time Specifiers
ledger:branch@t:123 # Transaction number
ledger:branch@iso:2024-01-22 # ISO timestamp
ledger:branch@commit:bafybeig... # Commit ContentId
HTTP Status Codes
| Code | Meaning | Common Cause |
|---|---|---|
| 200 | OK | Success |
| 400 | Bad Request | Invalid syntax |
| 401 | Unauthorized | Missing auth |
| 403 | Forbidden | Policy denied |
| 404 | Not Found | Ledger not found |
| 408 | Timeout | Query too slow |
| 413 | Payload Too Large | Request too big |
| 429 | Too Many Requests | Rate limited |
| 500 | Internal Error | Server error |
| 503 | Unavailable | Overloaded |
Index Types
| Index | Order | Optimized For |
|---|---|---|
| SPOT | Subject-Predicate-Object-Time | Entity properties |
| POST | Predicate-Object-Subject-Time | Property values |
| OPST | Object-Predicate-Subject-Time | Value lookups |
| PSOT | Predicate-Subject-Object-Time | Predicate scans |
Standards Compliance
RDF Standards
- RDF 1.1: Full compliance
- Turtle: Full support
- JSON-LD 1.1: Full compliance
- N-Triples: Planned
Query Standards
- SPARQL 1.1 Query: Near-complete (property paths partial; see Standards and Feature Flags)
- SPARQL 1.1 Update: Partial support
- GeoSPARQL: Planned
Security Standards
- JWS (RFC 7515): EdDSA (Ed25519) only
- JWT (RFC 7519): Full support
- Verifiable Credentials: Planned (JWS verification only today)
- DIDs: did:key (Ed25519) supported
Performance Benchmarks
Indicative performance characteristics (actual figures depend on hardware, dataset size, and configuration):
Query Performance
| Query Type | Small DB | Medium DB | Large DB |
|---|---|---|---|
| Simple lookup | < 1ms | < 5ms | < 10ms |
| Pattern match | < 10ms | < 50ms | < 100ms |
| Complex join | < 50ms | < 200ms | < 500ms |
| Aggregation | < 100ms | < 500ms | < 2s |
Transaction Performance
| Operation | Typical Time |
|---|---|
| Small insert (< 10 triples) | < 10ms |
| Medium insert (< 100 triples) | < 50ms |
| Large insert (< 1000 triples) | < 200ms |
| Update | < 20ms |
| Upsert | < 30ms |
Indexing Performance
| Workload | Rate |
|---|---|
| Light | 1,000 flakes/sec |
| Medium | 5,000 flakes/sec |
| Heavy | 10,000 flakes/sec |
Related Documentation
- Glossary - Term definitions
- Compatibility - Standards compliance
- Crate Map - Code architecture
Glossary
Definitions of key terms and concepts used throughout Fluree documentation.
Core Concepts
Ledger
A versioned graph database instance in Fluree, equivalent to a database in traditional systems. Ledgers are identified by ledger IDs like mydb:main.
Example: customers:main, inventory:prod
Branch
A variant of a ledger, allowing multiple independent versions of the same logical database. Branches are part of the ledger ID after the colon.
Example: In mydb:dev, “dev” is the branch.
Transaction Time (t)
A monotonically increasing integer assigned to each transaction, representing the logical time of the transaction.
Example: t=42 is transaction number 42.
Flake
Fluree’s internal representation of an RDF triple with temporal information. A flake is a tuple: (subject, predicate, object, transaction-time, operation, metadata).
Novelty Layer
The set of transactions that have been committed but not yet indexed. The gap between commit_t and index_t.
Example: If commit_t=150 and index_t=145, the novelty layer contains transactions 146-150.
Nameservice
Fluree’s metadata registry that tracks ledger state, including commit and index locations. Enables discovery and coordination across distributed deployments.
RDF Terminology
IRI (Internationalized Resource Identifier)
A globally unique identifier for resources, predicates, and graphs. The internationalized version of URI supporting Unicode.
Example: http://example.org/alice, http://例え.jp/人物/アリス
Triple
The fundamental unit of RDF data: a subject-predicate-object statement.
Example: ex:alice schema:name "Alice"
Subject
The entity being described in a triple (first position).
Example: In ex:alice schema:name "Alice", ex:alice is the subject.
Predicate
The property or relationship in a triple (second position).
Example: In ex:alice schema:name "Alice", schema:name is the predicate.
Object
The value or target entity in a triple (third position).
Example: In ex:alice schema:name "Alice", "Alice" is the object.
Literal
A data value in a triple (string, number, date, etc.), as opposed to an IRI reference.
Example: "Alice", 30, "2024-01-22"^^xsd:date
Blank Node
An anonymous resource without an explicit IRI.
Example: [ schema:streetAddress "123 Main St" ]
Named Graph
A set of triples identified by an IRI, allowing data partitioning within a ledger.
Example: ex:graph1 containing specific triples.
Dataset
A collection of graphs (one default graph and zero or more named graphs) used for query execution.
Transaction Terms
Assertion
Adding a new triple to the database.
Example: Asserting ex:alice schema:age 30 adds this triple.
Retraction
Removing an existing triple from the current database state.
Example: Retracting ex:alice schema:age 30 removes this triple.
Commit
A persisted transaction with assigned transaction time and cryptographic signature.
Commit ContentId
Content-addressed identifier (CIDv1) for a commit, providing storage-agnostic identity and integrity verification. The SHA-256 digest is embedded in the CID.
Example: bafybeig...commitT42
Replace Mode
Transaction mode where all properties of an entity are replaced, enabling idempotent writes.
Also called: Upsert mode
WHERE/DELETE/INSERT
Update pattern for targeted modifications: match data (WHERE), remove old data (DELETE), add new data (INSERT).
Index Terms
SPOT Index
Subject-Predicate-Object-Time index, optimized for retrieving all properties of a subject.
POST Index
Predicate-Object-Subject-Time index, optimized for finding subjects with specific property values.
OPST Index
Object-Predicate-Subject-Time index, optimized for finding subjects that reference specific objects.
PSOT Index
Predicate-Subject-Object-Time index, optimized for scanning all values of a predicate.
Index Snapshot
A complete, query-optimized snapshot of the database at a specific transaction time.
Background Indexing
Asynchronous process that builds index snapshots from committed transactions.
Query Terms
Variable
A placeholder in a query pattern that matches actual values in the data, prefixed with ?.
Example: ?person, ?name, ?age
Binding
The association of a variable with a specific value during query execution.
Example: ?name binds to "Alice"
Pattern
A triple template with variables that matches actual triples in the database.
Example: { "@id": "?person", "schema:name": "?name" }
Filter
A condition that restricts which variable bindings are included in query results.
Example: "filter": "?age > 25"
CONSTRUCT
A SPARQL query form that generates RDF triples rather than variable bindings.
Graph Crawl
Following relationships recursively to explore connected entities.
Graph Source Terms
Graph Source
An addressable query source that participates in execution and can be named in SPARQL via FROM, FROM NAMED, and GRAPH <…>.
Graph sources include:
- Ledger graph sources (default graph and named graphs stored in a ledger)
- Index graph sources (BM25 and vector/HNSW indexes)
- Mapped graph sources (R2RML and Iceberg-backed graph mappings)
Graph Source (Non-Ledger)
A non-ledger graph source is a queryable data source that appears in graph queries but is backed by specialized storage (BM25 index, vector index, Iceberg table, SQL database).
Example: products-search:main, products-vector:main
BM25
Best Matching 25, a ranking algorithm for full-text search. Scores documents by relevance to query terms.
Vector Embedding
A numerical representation of data (text, images, etc.) as a high-dimensional vector, enabling similarity search.
Example: 384-dimensional vector for text embeddings
HNSW
Hierarchical Navigable Small World, a graph-based algorithm for approximate nearest neighbor search in high-dimensional spaces.
R2RML
RDB to RDF Mapping Language, a W3C standard for mapping relational databases to RDF.
Iceberg
Apache Iceberg, an open table format for huge analytical datasets with ACID guarantees.
Security Terms
Policy
A rule specifying who can perform what operations on which data.
DID (Decentralized Identifier)
A globally unique identifier that doesn’t require a central authority, used for cryptographic identity.
Example: did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK
JWS (JSON Web Signature)
An IETF standard (RFC 7515) for representing digitally signed content as JSON.
Verifiable Credential (VC)
A W3C standard for cryptographically verifiable digital credentials.
Public Key
Cryptographic key used to verify signatures, shared publicly.
Private Key
Cryptographic key used to create signatures, kept secret.
Storage Terms
ContentId
A CIDv1 (multiformats) value that uniquely identifies any immutable artifact in Fluree. Encodes the content kind (multicodec) and a SHA-256 digest. The canonical string form is base32-lower multibase (e.g., bafybeig...).
See ContentId and ContentStore for details.
ContentKind
An enum identifying the type of content a ContentId refers to: Commit, Txn, IndexRoot, IndexBranch, IndexLeaf, DictBlob, or DefaultContext. Encoded as a multicodec tag within the CID.
ContentStore
The content-addressed storage trait providing get(ContentId), put(ContentKind, bytes), and has(ContentId) operations. All immutable artifacts are stored and retrieved via ContentStore.
Commit ID
A ContentId identifying a committed transaction. Derived by hashing the canonical commit bytes with SHA-256.
Example: bafybeig...commitT42
Index ID
A ContentId identifying an index root snapshot. Derived by hashing the index root descriptor bytes with SHA-256.
Example: bafybeig...indexRootT145
Storage Backend
The underlying system storing Fluree data (memory, file system, AWS S3/DynamoDB).
Nameservice Record
Metadata about a ledger stored in the nameservice, including commit and index ContentIds.
Time Travel Terms
Time Specifier
A suffix on a ledger reference indicating which point in time to query.
Examples: @t:100, @iso:2024-01-22, @commit:bafybeig...
Point-in-Time Query
A query executed against database state at a specific transaction time.
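For example, a sketch of a point-in-time JSON-LD query pinned with the @t: specifier (the ledger name and property are placeholders):
{
  "from": "mydb:main@t:100",
  "select": ["?name"],
  "where": { "@id": "?person", "http://schema.org/name": "?name" }
}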
History Query
A query that returns changes to entities over a time range, showing assertions and retractions.
Temporal Database
A database that maintains complete history of all changes, enabling queries at any past state.
JSON-LD Terms
@context
JSON-LD mechanism for defining namespace prefixes and term mappings.
Example:
{
"@context": {
"ex": "http://example.org/ns/",
"schema": "http://schema.org/"
}
}
@id
JSON-LD property for specifying the IRI of a resource.
Example: "@id": "ex:alice"
@type
JSON-LD property for specifying the type(s) of a resource.
Example: "@type": "schema:Person"
@graph
JSON-LD property containing an array of entities.
Example:
{
"@graph": [
{ "@id": "ex:alice", "schema:name": "Alice" }
]
}
@value
JSON-LD property for specifying a literal value explicitly.
Example: {"@value": "30", "@type": "xsd:integer"}
Compact IRI
A shortened IRI using namespace prefix.
Example: ex:alice (compact) vs http://example.org/ns/alice (full)
IRI Expansion
Converting compact IRIs to full IRIs using @context mappings.
Example: ex:alice expands to http://example.org/ns/alice
IRI Compaction
Converting full IRIs to compact form using @context.
Example: http://schema.org/name compacts to schema:name
Query Execution Terms
Fuel
A measure of query/transaction execution cost. One unit of fuel is consumed for each item processed (flakes matched, items expanded during graph crawl, etc.). Used to prevent runaway queries from consuming excessive resources.
Example: "opts": {"max-fuel": 10000} limits query to 10,000 fuel units.
Tracking
Query/transaction execution monitoring that provides visibility into performance metrics. When enabled, returns time (execution duration), fuel (items processed), and policy statistics.
Example: "opts": {"meta": true} enables all tracking metrics.
TrackingTally
The result of tracking, containing time (formatted as “12.34ms”), fuel (total count), and policy stats ({policy-id: {executed, allowed}}).
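Combining these options, a sketch of a query that caps fuel and returns tracking metrics (ledger and property names are placeholders):
{
  "from": "mydb:main",
  "select": ["?person", "?name"],
  "where": { "@id": "?person", "http://schema.org/name": "?name" },
  "opts": { "max-fuel": 10000, "meta": true }
}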
Acronyms
- ANN: Approximate Nearest Neighbor
- API: Application Programming Interface
- CORS: Cross-Origin Resource Sharing
- CAS: Compare-And-Swap
- CID: Content Identifier (multiformats)
- DID: Decentralized Identifier
- HTTP: Hypertext Transfer Protocol
- HNSW: Hierarchical Navigable Small World
- IRI: Internationalized Resource Identifier
- JSON: JavaScript Object Notation
- JSON-LD: JSON for Linked Data
- JWT: JSON Web Token
- JWS: JSON Web Signature
- RDF: Resource Description Framework
- REST: Representational State Transfer
- SHA: Secure Hash Algorithm
- SPARQL: SPARQL Protocol and RDF Query Language
- SSL/TLS: Secure Sockets Layer / Transport Layer Security
- URI: Uniform Resource Identifier
- URL: Uniform Resource Locator
- VC: Verifiable Credential
- W3C: World Wide Web Consortium
- XSD: XML Schema Definition
Related Documentation
- Standards and feature flags - Standards compliance and feature flags
- Crate Map - Code architecture
- Concepts - Core concepts
Fluree System Vocabulary Reference
All Fluree system vocabulary lives under a single canonical namespace:
https://ns.flur.ee/db#
Users declare a prefix in their JSON-LD @context to use compact forms:
{ "@context": { "f": "https://ns.flur.ee/db#" } }
Any prefix works (f:, db:, fluree:, etc.) as long as it expands to the canonical IRI. Internally, Fluree always compares on fully expanded IRIs.
The @vector and @fulltext shorthands are exceptions: they are JSON-LD convenience aliases that resolve to f:embeddingVector and f:fullText respectively without requiring a prefix declaration.
Source of truth: All constants are defined in the fluree-vocab crate. This document is the user-facing reference.
Commit metadata predicates
These predicates appear on commit subjects in the txn-meta graph. Each commit produces 7-10 flakes describing the commit.
| Predicate | Full IRI | Datatype | Description |
|---|---|---|---|
f:address | https://ns.flur.ee/db#address | xsd:string | Commit ContentId (CID string) |
f:alias | https://ns.flur.ee/db#alias | xsd:string | Ledger ID (e.g. mydb:main) |
f:v | https://ns.flur.ee/db#v | xsd:int | Commit format version |
f:time | https://ns.flur.ee/db#time | xsd:long | Commit timestamp (epoch milliseconds) |
f:t | https://ns.flur.ee/db#t | xsd:int | Transaction number (watermark) |
f:size | https://ns.flur.ee/db#size | xsd:long | Cumulative data size in bytes |
f:flakes | https://ns.flur.ee/db#flakes | xsd:long | Cumulative flake count |
f:previous | https://ns.flur.ee/db#previous | @id (ref) | Reference to previous commit (optional) |
f:identity | https://ns.flur.ee/db#identity | xsd:string | Authenticated identity acting on the transaction (system-controlled — verified DID for signed requests, otherwise opts.identity / CommitOpts.identity). |
f:author | https://ns.flur.ee/db#author | xsd:string | Author claim — user-supplied via f:author in the transaction body (optional). Distinct from f:identity. |
f:txn | https://ns.flur.ee/db#txn | xsd:string | Transaction ContentId (CID string, optional) |
f:message | https://ns.flur.ee/db#message | xsd:string | Commit message — user-supplied via f:message in the transaction body (optional). |
f:asserts | https://ns.flur.ee/db#asserts | xsd:long | Assertion count in this commit |
f:retracts | https://ns.flur.ee/db#retracts | xsd:long | Retraction count in this commit |
Querying commit metadata
Commit metadata lives in the #txn-meta named graph within each ledger. To query it:
{
"@context": { "f": "https://ns.flur.ee/db#" },
"select": ["?t", "?time", "?author"],
"where": {
"@graph": "mydb:main#txn-meta",
"f:t": "?t",
"f:time": "?time",
"f:author": "?author"
}
}
Commit subject identifiers
Commit subjects use the scheme fluree:commit:<content-id> (e.g. fluree:commit:bafybeig...). This is a subject identifier scheme, not part of the db# predicate vocabulary.
Datalog rules
| Predicate | Full IRI | Description |
|---|---|---|
f:rule | https://ns.flur.ee/db#rule | Datalog rule definition predicate |
Vector datatype
| Term | IRI | Description |
|---|---|---|
f:embeddingVector | https://ns.flur.ee/db#embeddingVector | f32-precision embedding vector datatype |
@vector | (shorthand) | JSON-LD alias that resolves to f:embeddingVector |
Example usage in a transaction:
{
"@context": { "f": "https://ns.flur.ee/db#", "ex": "http://example.org/" },
"insert": {
"@id": "ex:doc1",
"ex:embedding": { "@value": [0.1, 0.2, 0.3], "@type": "f:embeddingVector" }
}
}
Or with the @vector shorthand:
{
"insert": {
"@id": "ex:doc1",
"ex:embedding": { "@value": [0.1, 0.2, 0.3], "@type": "@vector" }
}
}
A property declared with @type: "@vector" (or @type: "f:embeddingVector") in the @context may also use a bare JSON array as its value — equivalent to the explicit @value form above:
{
"@context": {
"ex": "http://example.org/",
"ex:embedding": { "@type": "@vector" }
},
"insert": {
"@id": "ex:doc1",
"ex:embedding": [0.1, 0.2, 0.3]
}
}
Validation rules
- Element type: every element must be a JSON number; non-numeric elements are rejected.
- Element range: values are quantized to f32 at ingest. Non-finite values (NaN, ±Infinity) and values outside the representable f32 range are rejected.
- Non-empty: vectors must have at least one element. The empty vector ([]) is reserved as an internal max-bound sentinel and is rejected by both the coercion layer and the write-path guard.
- Scalar values are rejected: a single number paired with the f:embeddingVector datatype (e.g. {"@value": 0.1, "@type": "@vector"}) is rejected; the value must be an array.
The same rules apply to the SPARQL typed-literal form "[0.1, 0.2, 0.3]"^^f:embeddingVector and to Turtle.
Fulltext datatype
| Term | IRI | Description |
|---|---|---|
f:fullText | https://ns.flur.ee/db#fullText | Inline full-text search datatype |
@fulltext | (shorthand) | JSON-LD alias that resolves to f:fullText |
Example usage in a transaction:
{
"@context": { "f": "https://ns.flur.ee/db#", "ex": "http://example.org/" },
"insert": {
"@id": "ex:article-1",
"ex:content": { "@value": "Rust is a systems programming language", "@type": "f:fullText" }
}
}
Or with the @fulltext shorthand:
{
"insert": {
"@id": "ex:article-1",
"ex:content": { "@value": "Rust is a systems programming language", "@type": "@fulltext" }
}
}
Values annotated with @fulltext are analyzed (tokenized, stemmed) and indexed into per-predicate fulltext arenas during background index builds. Query with the fulltext() function in bind expressions for BM25 relevance scoring.
See Inline Fulltext Search for details.
Fulltext configuration predicates
These predicates live in the ledger’s #config named graph and declare which properties to full-text index (no per-value @fulltext annotation needed). See Configured full-text properties for the end-user guide and Setting groups for the full schema reference.
| Term | IRI | Description |
|---|---|---|
f:fullTextDefaults | https://ns.flur.ee/db#fullTextDefaults | Setting group on f:LedgerConfig / f:GraphConfig |
f:FullTextDefaults | https://ns.flur.ee/db#FullTextDefaults | Class (type) of the setting-group node |
f:defaultLanguage | https://ns.flur.ee/db#defaultLanguage | BCP-47 tag used for untagged plain strings on configured properties |
f:property | https://ns.flur.ee/db#property | One entry per full-text-indexed property (cardinality 0..n) |
f:FullTextProperty | https://ns.flur.ee/db#FullTextProperty | Class of each f:property entry |
f:target | https://ns.flur.ee/db#target | IRI of the property being indexed (on f:FullTextProperty) |
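As an illustrative sketch only (the authoritative node shape is documented under Setting groups), a config-graph node declaring ex:description as a full-text-indexed property might look like this; the @id and the ex: namespace are placeholders:
{
  "@context": { "f": "https://ns.flur.ee/db#", "ex": "http://example.org/" },
  "@id": "ex:ledgerConfig",
  "@type": "f:LedgerConfig",
  "f:fullTextDefaults": {
    "@type": "f:FullTextDefaults",
    "f:defaultLanguage": "en",
    "f:property": {
      "@type": "f:FullTextProperty",
      "f:target": { "@id": "ex:description" }
    }
  }
}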
Search query vocabulary
These predicates are used in WHERE clause patterns for BM25 and vector search. Users write compact forms like "f:searchText" (with "f" in their @context) or full IRIs.
BM25 search
| Predicate | Full IRI | Required | Description |
|---|---|---|---|
f:graphSource | https://ns.flur.ee/db#graphSource | Yes | Graph source ID (name:branch, e.g. "my-search:main") |
f:searchText | https://ns.flur.ee/db#searchText | Yes | Search query text (string or variable) |
f:searchResult | https://ns.flur.ee/db#searchResult | Yes | Result binding (variable or nested object) |
f:searchLimit | https://ns.flur.ee/db#searchLimit | No | Maximum results |
f:syncBeforeQuery | https://ns.flur.ee/db#syncBeforeQuery | No | Wait for index sync before querying (boolean) |
f:timeoutMs | https://ns.flur.ee/db#timeoutMs | No | Query timeout in milliseconds |
Vector search
| Predicate | Full IRI | Required | Description |
|---|---|---|---|
f:graphSource | https://ns.flur.ee/db#graphSource | Yes | Graph source ID (name:branch) |
f:queryVector | https://ns.flur.ee/db#queryVector | Yes | Query vector (array of numbers or variable) |
f:searchResult | https://ns.flur.ee/db#searchResult | Yes | Result binding |
f:distanceMetric | https://ns.flur.ee/db#distanceMetric | No | Distance metric: "cosine", "dot", "euclidean" (default: "cosine") |
f:searchLimit | https://ns.flur.ee/db#searchLimit | No | Maximum results |
f:syncBeforeQuery | https://ns.flur.ee/db#syncBeforeQuery | No | Wait for index sync (boolean) |
f:timeoutMs | https://ns.flur.ee/db#timeoutMs | No | Query timeout in milliseconds |
Nested result objects
Both BM25 and vector search support nested result bindings:
| Predicate | Full IRI | Description |
|---|---|---|
f:resultId | https://ns.flur.ee/db#resultId | Document/subject ID binding |
f:resultScore | https://ns.flur.ee/db#resultScore | Search score binding |
f:resultLedger | https://ns.flur.ee/db#resultLedger | Source ledger ID (multi-ledger disambiguation) |
Example BM25 search with nested result:
{
"@context": { "f": "https://ns.flur.ee/db#" },
"select": ["?doc", "?score"],
"where": {
"f:graphSource": "my-search:main",
"f:searchText": "software engineer",
"f:searchLimit": 10,
"f:searchResult": {
"f:resultId": "?doc",
"f:resultScore": "?score"
}
}
}
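A vector search follows the same shape. The sketch below is illustrative: the graph source name products-vector:main and the vector values are placeholders, and a real query vector must match the index's dimensionality:
{
  "@context": { "f": "https://ns.flur.ee/db#" },
  "select": ["?doc", "?score"],
  "where": {
    "f:graphSource": "products-vector:main",
    "f:queryVector": [0.12, 0.05, 0.33],
    "f:distanceMetric": "cosine",
    "f:searchLimit": 5,
    "f:searchResult": {
      "f:resultId": "?doc",
      "f:resultScore": "?score"
    }
  }
}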
Nameservice record vocabulary
Ledger record fields
These predicates appear on ledger nameservice records (the metadata Fluree stores about each ledger).
| Predicate | Full IRI | Description |
|---|---|---|
f:ledger | https://ns.flur.ee/db#ledger | Ledger name/identifier |
f:branch | https://ns.flur.ee/db#branch | Branch name (e.g. main) |
f:t | https://ns.flur.ee/db#t | Current transaction watermark |
f:ledgerCommit | https://ns.flur.ee/db#ledgerCommit | Pointer to latest commit ContentId |
f:ledgerIndex | https://ns.flur.ee/db#ledgerIndex | Pointer to latest index root |
f:status | https://ns.flur.ee/db#status | Record status (ready, etc.) |
f:defaultContextCid | https://ns.flur.ee/db#defaultContextCid | Default JSON-LD context ContentId |
Graph source record fields
| Predicate | Full IRI | Description |
|---|---|---|
f:name | https://ns.flur.ee/db#name | Graph source base name |
f:branch | https://ns.flur.ee/db#branch | Branch |
f:status | https://ns.flur.ee/db#status | Status |
f:graphSourceConfig | https://ns.flur.ee/db#graphSourceConfig | Configuration JSON |
f:graphSourceDependencies | https://ns.flur.ee/db#graphSourceDependencies | Dependent ledger IDs |
f:graphSourceIndex | https://ns.flur.ee/db#graphSourceIndex | Index ContentId reference |
f:graphSourceIndexT | https://ns.flur.ee/db#graphSourceIndexT | Index watermark (commit t) |
f:graphSourceIndexAddress | https://ns.flur.ee/db#graphSourceIndexAddress | Index ContentId (string form) |
Record type taxonomy
Nameservice records use @type to classify what kind of graph source a record represents.
Required kind types (exactly one per record):
| Type | Full IRI | Description |
|---|---|---|
f:LedgerSource | https://ns.flur.ee/db#LedgerSource | Ledger-backed knowledge graph |
f:IndexSource | https://ns.flur.ee/db#IndexSource | Index-backed graph source (BM25/HNSW/GEO) |
f:MappedSource | https://ns.flur.ee/db#MappedSource | Mapped database (Iceberg, R2RML) |
Optional subtype @type values (further classify the record):
| Type | Full IRI | Description |
|---|---|---|
f:Bm25Index | https://ns.flur.ee/db#Bm25Index | BM25 full-text search index |
f:HnswIndex | https://ns.flur.ee/db#HnswIndex | HNSW vector similarity search index |
f:GeoIndex | https://ns.flur.ee/db#GeoIndex | Geospatial index |
f:IcebergMapping | https://ns.flur.ee/db#IcebergMapping | Iceberg-mapped database |
f:R2rmlMapping | https://ns.flur.ee/db#R2rmlMapping | R2RML relational mapping |
Policy vocabulary
These predicates are used to define access control policies.
| Predicate | Full IRI | Description |
|---|---|---|
f:policyClass | https://ns.flur.ee/db#policyClass | Marks a class as policy-governed |
f:allow | https://ns.flur.ee/db#allow | Allow/deny flag on a policy rule |
f:action | https://ns.flur.ee/db#action | Action this rule governs (view or modify) |
f:view | https://ns.flur.ee/db#view | View action IRI |
f:modify | https://ns.flur.ee/db#modify | Modify action IRI |
f:onProperty | https://ns.flur.ee/db#onProperty | Property-level policy targeting |
f:onSubject | https://ns.flur.ee/db#onSubject | Subject-level policy targeting |
f:onClass | https://ns.flur.ee/db#onClass | Class-level policy targeting |
f:query | https://ns.flur.ee/db#query | Policy query (determines applicability) |
f:required | https://ns.flur.ee/db#required | Whether the policy is required (boolean) |
f:exMessage | https://ns.flur.ee/db#exMessage | Error message when policy denies access |
See Policy model and inputs for usage details.
Config graph vocabulary
These predicates define ledger-level configuration stored in the config graph. See Ledger configuration for full documentation.
Core types
| Type | Full IRI | Description |
|---|---|---|
f:LedgerConfig | https://ns.flur.ee/db#LedgerConfig | Ledger-wide configuration resource |
f:GraphConfig | https://ns.flur.ee/db#GraphConfig | Per-graph configuration override |
f:GraphRef | https://ns.flur.ee/db#GraphRef | Reference to a graph source |
Setting group predicates
| Predicate | Full IRI | Description |
|---|---|---|
f:policyDefaults | https://ns.flur.ee/db#policyDefaults | Policy enforcement defaults |
f:shaclDefaults | https://ns.flur.ee/db#shaclDefaults | SHACL validation defaults |
f:reasoningDefaults | https://ns.flur.ee/db#reasoningDefaults | OWL/RDFS reasoning defaults |
f:datalogDefaults | https://ns.flur.ee/db#datalogDefaults | Datalog rule defaults |
f:transactDefaults | https://ns.flur.ee/db#transactDefaults | Transaction constraint defaults |
Policy fields
| Predicate | Full IRI | Description |
|---|---|---|
f:defaultAllow | https://ns.flur.ee/db#defaultAllow | Default allow/deny when no policy matches (boolean) |
f:policySource | https://ns.flur.ee/db#policySource | Graph containing policy rules (GraphRef) |
f:policyClass | https://ns.flur.ee/db#policyClass | Default policy classes to apply |
SHACL fields
| Predicate | Full IRI | Description |
|---|---|---|
f:shaclEnabled | https://ns.flur.ee/db#shaclEnabled | Enable/disable SHACL validation (boolean) |
f:shapesSource | https://ns.flur.ee/db#shapesSource | Graph containing SHACL shapes (GraphRef) |
f:validationMode | https://ns.flur.ee/db#validationMode | f:ValidationReject or f:ValidationWarn |
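For orientation, a hedged sketch of a ledger config node that enables SHACL validation (the @id is a placeholder; see Ledger configuration for the authoritative shape, including how f:shapesSource references a shapes graph):
{
  "@context": { "f": "https://ns.flur.ee/db#", "ex": "http://example.org/" },
  "@id": "ex:ledgerConfig",
  "@type": "f:LedgerConfig",
  "f:shaclDefaults": {
    "f:shaclEnabled": true,
    "f:validationMode": { "@id": "f:ValidationReject" }
  }
}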
Reasoning fields
| Predicate | Full IRI | Description |
|---|---|---|
f:reasoningModes | https://ns.flur.ee/db#reasoningModes | Reasoning modes: f:RDFS, f:OWL2QL, f:OWL2RL, f:Datalog |
f:schemaSource | https://ns.flur.ee/db#schemaSource | Graph containing schema triples (GraphRef) |
Datalog fields
| Predicate | Full IRI | Description |
|---|---|---|
f:datalogEnabled | https://ns.flur.ee/db#datalogEnabled | Enable/disable datalog rules (boolean) |
f:rulesSource | https://ns.flur.ee/db#rulesSource | Graph containing f:rule definitions (GraphRef) |
f:allowQueryTimeRules | https://ns.flur.ee/db#allowQueryTimeRules | Allow ad-hoc query-time rules (boolean) |
Transact / uniqueness fields
| Predicate | Full IRI | Description |
|---|---|---|
f:uniqueEnabled | https://ns.flur.ee/db#uniqueEnabled | Enable unique constraint enforcement (boolean) |
f:constraintsSource | https://ns.flur.ee/db#constraintsSource | Graph(s) containing constraint annotations (GraphRef) |
f:enforceUnique | https://ns.flur.ee/db#enforceUnique | Annotation on property IRIs: enforce value uniqueness (boolean) |
Override control
| Term | Full IRI | Description |
|---|---|---|
f:overrideControl | https://ns.flur.ee/db#overrideControl | Override gating on a setting group |
f:OverrideNone | https://ns.flur.ee/db#OverrideNone | No overrides permitted |
f:OverrideAll | https://ns.flur.ee/db#OverrideAll | Any request can override (default) |
f:IdentityRestricted | https://ns.flur.ee/db#IdentityRestricted | Only verified identities can override |
f:controlMode | https://ns.flur.ee/db#controlMode | Control mode (for identity-restricted objects) |
f:allowedIdentities | https://ns.flur.ee/db#allowedIdentities | List of DIDs authorized to override |
Graph targeting
| Predicate | Full IRI | Description |
|---|---|---|
f:graphOverrides | https://ns.flur.ee/db#graphOverrides | List of f:GraphConfig per-graph overrides |
f:targetGraph | https://ns.flur.ee/db#targetGraph | Target graph IRI for a f:GraphConfig |
f:graphSelector | https://ns.flur.ee/db#graphSelector | Graph selector within a f:GraphRef |
f:defaultGraph | https://ns.flur.ee/db#defaultGraph | Sentinel IRI for the default graph |
f:txnMetaGraph | https://ns.flur.ee/db#txnMetaGraph | Sentinel IRI for the txn-meta graph |
See Ledger configuration for usage details.
RDF-Star annotation predicates
Fluree supports RDF-Star annotations for transaction metadata. These predicates can appear in annotation triples:
| Predicate | Full IRI | Description |
|---|---|---|
f:t | https://ns.flur.ee/db#t | Transaction number on an annotated triple |
f:op | https://ns.flur.ee/db#op | Operation type (assert/retract) |
Namespace codes (internal)
Fluree encodes namespace IRIs as integer codes for compact storage. These are internal implementation details but useful for contributors working on the core.
| Code | Namespace | IRI |
|---|---|---|
| 0 | (empty) | "" |
| 1 | JSON-LD | @ |
| 2 | XSD | http://www.w3.org/2001/XMLSchema# |
| 3 | RDF | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
| 4 | RDFS | http://www.w3.org/2000/01/rdf-schema# |
| 5 | SHACL | http://www.w3.org/ns/shacl# |
| 6 | OWL | http://www.w3.org/2002/07/owl# |
| 7 | Fluree DB | https://ns.flur.ee/db# |
| 8 | DID Key | did:key: |
| 9 | Fluree Commit | fluree:commit: |
| 10 | Blank Node | _: |
| 11 | OGC GeoSPARQL | http://www.opengis.net/ont/geosparql# |
| 100+ | User-defined | (allocated at first use) |
Standard W3C namespaces
Fluree also recognizes these standard W3C namespaces:
| Prefix | IRI | Common predicates |
|---|---|---|
rdf: | http://www.w3.org/1999/02/22-rdf-syntax-ns# | rdf:type, rdf:first, rdf:rest |
rdfs: | http://www.w3.org/2000/01/rdf-schema# | rdfs:label, rdfs:subClassOf, rdfs:range |
xsd: | http://www.w3.org/2001/XMLSchema# | xsd:string, xsd:int, xsd:dateTime |
owl: | http://www.w3.org/2002/07/owl# | owl:sameAs, owl:inverseOf |
sh: | http://www.w3.org/ns/shacl# | sh:path, sh:datatype, sh:minCount |
See IRIs, namespaces, and JSON-LD @context for details on prefix declarations and IRI resolution.
JSON-LD Connection Configuration (Rust)
This page documents the JSON-LD connection config supported by the Rust implementation.
This config uses the same @context + @graph model as other Fluree JSON-LD config surfaces.
Using with the Fluree server
The server accepts a connection config file via --connection-config:
fluree server run --connection-config /path/to/connection.jsonld
This replaces --storage-path for S3, DynamoDB, and other non-filesystem backends. The server builds its storage and nameservice from the config file at startup. Server-level settings (--cache-max-mb, --indexing-enabled, etc.) override connection config defaults. See Configuration for full details and examples.
Entry points (Rust API)
All construction flows through FlureeBuilder:
- FlureeBuilder::from_json_ld(&json)? — parses JSON-LD config into builder settings
- Then call .build_client().await for a type-erased FlureeClient
- Or use typed terminal methods (.build(), .build_memory(), .build_s3()) for compile-time type safety
JSON-LD shape
At minimum, your document contains:
- @context with @base and @vocab
- @graph with:
  - one Connection node
  - one or more Storage nodes
  - optional Publisher nodes (nameservice backends)
{
"@context": {
"@base": "https://ns.flur.ee/config/connection/",
"@vocab": "https://ns.flur.ee/system#"
},
"@graph": [
{ "@id": "storage1", "@type": "Storage", "filePath": "./data" },
{
"@id": "connection",
"@type": "Connection",
"indexStorage": { "@id": "storage1" }
}
]
}
ConfigurationValue (env var indirection)
Many fields can be provided as direct literals or as a ConfigurationValue object:
{
"s3Bucket": { "envVar": "FLUREE_S3_BUCKET", "defaultVal": "my-bucket" },
"cacheMaxMb": { "envVar": "FLUREE_CACHE_MAX_MB", "defaultVal": "1024" }
}
Notes:
- envVar: reads from the environment (non-wasm targets)
- defaultVal: fallback string value
- javaProp: accepted for compatibility; Rust treats it like another env var key (best-effort)
Connection node fields
Supported:
- parallelism (default 4)
- cacheMaxMb (supports ConfigurationValue)
- indexStorage (required): reference to a Storage node
- commitStorage (optional): reference to a Storage node
- primaryPublisher (optional): reference to a Publisher node
- addressIdentifiers (read routing): map of identifier → storage reference
- defaults (partial; see the sketch below):
  - defaults.indexing.reindexMinBytes / reindexMaxBytes are applied as the default IndexConfig for writes
  - defaults.indexing.indexingEnabled = false suppresses background index triggers
  - defaults.indexing.maxOldIndexes sets the maximum number of old index versions to retain before GC (default: 5)
  - defaults.indexing.gcMinTimeMins sets the minimum age in minutes before an index can be garbage collected (default: 30)
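A sketch of a Connection node that tunes these indexing defaults (values are illustrative; this assumes the dotted defaults.indexing.* fields nest as a defaults → indexing object, as shown):
{
  "@id": "connection",
  "@type": "Connection",
  "parallelism": 4,
  "cacheMaxMb": 1024,
  "indexStorage": { "@id": "storage1" },
  "defaults": {
    "indexing": {
      "indexingEnabled": true,
      "reindexMinBytes": 1000000,
      "reindexMaxBytes": 100000000,
      "maxOldIndexes": 5,
      "gcMinTimeMins": 30
    }
  }
}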
addressIdentifiers (read routing)
The addressIdentifiers field maps identifier strings to storage backends, enabling read routing based on the identifier segment in Fluree addresses.
{
"@id": "connection",
"@type": "Connection",
"indexStorage": {"@id": "indexS3"},
"commitStorage": {"@id": "commitS3"},
"addressIdentifiers": {
"commit-storage": {"@id": "commitS3"},
"index-storage": {"@id": "indexS3"}
}
}
Routing behavior:
- fluree:commit-storage:s3://db/commit/abc.fcv2 → routes to commitS3
- fluree:index-storage:s3://db/index/xyz.json → routes to indexS3
- fluree:s3://db/index/xyz.json (no identifier) → routes to default storage
- fluree:unknown-id:s3://db/file.json (unknown identifier) → falls back to default storage
Notes:
- Writes always go to the default storage (TieredStorage or indexStorage), regardless of identifier
- This is a read-only routing mechanism for addresses that already contain identifiers
- Use addressIdentifier (singular) on storage nodes to write addresses with identifier segments
Not yet supported (parsed/ignored or absent):
- remoteSystems — not supported
Storage node fields
Memory storage
{ "@id": "mem", "@type": "Storage" }
File storage (requires native)
Supported:
- filePath
- AES256Key (supports ConfigurationValue)
Notes:
- Rust expects AES256Key to be base64-encoded and to decode to exactly 32 bytes.
- This encrypts the index/commit blobs written via the storage layer. The file-based nameservice remains plaintext, matching the existing builder behavior.
{
"@id": "fileStorage",
"@type": "Storage",
"filePath": "/var/lib/fluree",
"AES256Key": { "envVar": "FLUREE_ENCRYPTION_KEY" }
}
S3 storage (requires aws)
Supported fields (parsed and applied by Rust):
- s3Bucket
- s3Prefix
- s3Endpoint (optional; recommended only for LocalStack/MinIO/custom endpoints)
- s3ReadTimeoutMs, s3WriteTimeoutMs, s3ListTimeoutMs
  - Rust applies a single operation timeout of max(read, write, list)
- s3MaxRetries, s3RetryBaseDelayMs, s3RetryMaxDelayMs
  - Rust maps s3MaxRetries to the AWS SDK's max_attempts = max_retries + 1
Standard S3 (AWS)
{
"@id": "s3",
"@type": "Storage",
"s3Bucket": "fluree-prod-data",
"s3Prefix": "fluree/"
}
LocalStack / MinIO (custom endpoint)
{
"@id": "s3",
"@type": "Storage",
"s3Bucket": "fluree-test",
"s3Endpoint": "http://localhost:4566",
"s3Prefix": "fluree/"
}
S3 Express One Zone
Rust relies on the AWS SDK’s native support for directory buckets. We also provide bucket-name detection (the --x-s3 suffix and -azN availability-zone segment) for diagnostics.
{
"@id": "s3Express",
"@type": "Storage",
"s3Bucket": "my-index--use1-az1--x-s3",
"s3Prefix": "indexes/"
}
Note: omit s3Endpoint for Express directory buckets and let the AWS SDK handle endpoint
resolution. FlureeBuilder::s3() is designed for standard and LocalStack
endpoints; for Express buckets, use FlureeBuilder::from_json_ld() with a config that omits
s3Endpoint.
Guidance:
- Standard S3 in AWS: omit s3Endpoint (let the SDK pick defaults)
- Express One Zone: omit s3Endpoint
- LocalStack/MinIO/custom: set s3Endpoint
addressIdentifier
Rust parses addressIdentifier on storage nodes and uses it to rewrite published
commit/index ContentIds so they include the identifier segment, e.g.:
fluree:{addressIdentifier}:s3://....
This is mainly useful when you have multiple storage backends and want addresses to carry an explicit storage identifier.
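For example, a storage node carrying an identifier (a sketch; the bucket and identifier names are placeholders that mirror the read-routing example above):
{
  "@id": "commitS3",
  "@type": "Storage",
  "s3Bucket": "commits-bucket",
  "s3Prefix": "fluree/",
  "addressIdentifier": "commit-storage"
}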
Split commit vs index storage (tiered S3)
Rust supports the tiered commitStorage + indexStorage format via FlureeBuilder::from_json_ld() / build_client().
Internally, Rust routes:
- .../commit/... and .../txn/... → commit storage
- everything else → index storage
{
"@context": {"@base": "https://ns.flur.ee/config/connection/", "@vocab": "https://ns.flur.ee/system#"},
"@graph": [
{ "@id": "commitStorage", "@type": "Storage", "s3Bucket": "commits-bucket", "s3Prefix": "fluree-data/" },
{ "@id": "indexStorage", "@type": "Storage", "s3Bucket": "index--use1-az1--x-s3" },
{ "@id": "publisher", "@type": "Publisher", "dynamodbTable": "fluree-nameservice", "dynamodbRegion": "us-east-1" },
{
"@id": "connection",
"@type": "Connection",
"commitStorage": {"@id": "commitStorage"},
"indexStorage": {"@id": "indexStorage"},
"primaryPublisher": {"@id": "publisher"}
}
]
}
IPFS storage (requires ipfs)
Supported:
- ipfsApiUrl (default http://127.0.0.1:5001): Kubo HTTP RPC API base URL
- ipfsPinOnPut (default true): pin blocks after writing
{
"@id": "ipfsStorage",
"@type": "Storage",
"ipfsApiUrl": "http://127.0.0.1:5001",
"ipfsPinOnPut": true
}
With env var indirection:
{
"@id": "ipfsStorage",
"@type": "Storage",
"ipfsApiUrl": { "envVar": "FLUREE_IPFS_API_URL", "defaultVal": "http://127.0.0.1:5001" },
"ipfsPinOnPut": true
}
Notes:
- Requires a running Kubo node at the specified URL
- Fluree’s CIDs (SHA-256 + private-use multicodec) are stored directly into IPFS
- No encryption support (AES256Key is not applicable)
- See IPFS Storage Guide for Kubo setup and operational details
Publisher (nameservice) node fields
Storage-backed nameservice
Supported:
- storage (reference to a Storage node)
{
"@id": "publisher",
"@type": "Publisher",
"storage": { "@id": "s3" }
}
DynamoDB nameservice (requires aws)
Supported (and applied):
- dynamodbTable
- dynamodbRegion
- dynamodbEndpoint
- dynamodbTimeoutMs
Compatibility notes
This Rust JSON-LD model is intended to stay aligned with existing Fluree docs:
- ../db/docs/S3_STORAGE_GUIDE.md
- ../db/docs/FILE_STORAGE_GUIDE.md
- ../db/docs/DYNAMODB_NAMESERVICE_GUIDE.md
Current intentional gaps in Rust:
- remoteSystems not supported
- defaults.identity is parsed but not currently applied
- defaults.indexing.trackClassStats is parsed but not currently applied
Standards and Feature Flags
This document covers Fluree’s compliance with standards and feature flags.
Standards Compliance
RDF 1.1
Status: Fully compliant
Fluree implements the W3C RDF 1.1 specification:
- RDF triples (subject-predicate-object)
- IRI identifiers
- Typed literals
- Language tags
- Blank nodes
- RDF datasets
Specification: https://www.w3.org/TR/rdf11-concepts/
JSON-LD 1.1
Status: Fully compliant
Fluree supports JSON-LD 1.1:
- @context for namespace mappings
- @id for resource identification
- @type for type specification
- @graph for multiple entities
- @value and @type for literals
- @language for language tags
- Nested objects
- Arrays
Specification: https://www.w3.org/TR/json-ld11/
SPARQL 1.1 Query
Status: In progress toward full compliance
Supported SPARQL features:
- SELECT queries
- CONSTRUCT queries
- ASK queries
- DESCRIBE queries
- FROM and FROM NAMED clauses
- GRAPH patterns
- OPTIONAL patterns
- UNION patterns
- FILTER expressions
- BIND expressions
- Aggregations (COUNT, SUM, AVG, MIN, MAX, SAMPLE, GROUP_CONCAT) with DISTINCT modifier
- GROUP BY (variables and expressions)
- ORDER BY
- LIMIT and OFFSET
- Subqueries
- Property paths (partial: +, *, ^, |, /; see SPARQL docs)
Aggregate result types: COUNT and SUM of integers return xsd:integer (per W3C spec), not xsd:long. SUM of mixed types and AVG return xsd:double.
W3C Compliance Testing: Fluree runs the official W3C SPARQL test suite via the testsuite-sparql crate. The suite automatically discovers and runs 700+ test cases from W3C manifest files. See the compliance test guide for details.
Specification: https://www.w3.org/TR/sparql11-query/
SPARQL 1.1 Update
Status: Partial support
Supported:
- INSERT DATA (via JSON-LD transactions)
- DELETE/INSERT WHERE (via WHERE/DELETE/INSERT)
Not yet supported:
- DELETE DATA
- LOAD
- CLEAR
- DROP
- COPY, MOVE, ADD
Use JSON-LD transactions for update operations that SPARQL Update does not yet cover.
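As a sketch, the JSON-LD equivalent of a DELETE/INSERT WHERE update, using the where/delete/insert transaction pattern described in the Transactions docs (ex:alice and the ages are placeholders):
{
  "@context": { "ex": "http://example.org/", "schema": "http://schema.org/" },
  "where": { "@id": "ex:alice", "schema:age": "?age" },
  "delete": { "@id": "ex:alice", "schema:age": "?age" },
  "insert": { "@id": "ex:alice", "schema:age": 31 }
}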
Specification: https://www.w3.org/TR/sparql11-update/
Turtle
Status: Fully supported
Fluree parses Turtle 1.1:
- @prefix declarations
- Base IRIs
- Abbreviated syntax (a, ;, ,)
- Literals with datatypes and language tags
- Collections
- Blank nodes
Specification: https://www.w3.org/TR/turtle/
JSON Web Signature (JWS)
Status: Partial (EdDSA only)
Supported algorithms:
- EdDSA (Ed25519) - Only supported algorithm
Not yet supported:
- ES256, ES384, ES512 (ECDSA)
- RS256 (RSA)
- HS256, HS384, HS512 (HMAC)
Specification: RFC 7515
Note: Requires the credential feature flag.
Verifiable Credentials
Status: Planned (not yet implemented)
The credential module currently supports JWS verification only. Full VC support (proof verification, JSON-LD canonicalization) is planned but not yet available.
Specification: https://www.w3.org/TR/vc-data-model/
Decentralized Identifiers (DIDs)
Status: Partial support
Supported DID methods:
- did:key (Ed25519 keys only)
Not yet supported:
- did:web
- did:ion
- did:ethr
Specification: https://www.w3.org/TR/did-core/
Note: Requires the credential feature flag.
Compile-Time Feature Flags (Cargo)
These features are controlled at compile time via Cargo:
fluree-db-api Features
| Feature | Default | Description |
|---|---|---|
native | Yes | File storage support |
aws | No | AWS-backed storage support (S3, storage-backed nameservice). Enables FlureeBuilder::s3() and S3-based JSON-LD configs. |
credential | No | DID/JWS/VerifiableCredential support for signed queries/transactions. Pulls in crypto dependencies (ed25519-dalek, bs58). |
iceberg | No | Apache Iceberg/R2RML graph source support |
shacl | No | SHACL constraint validation (requires fluree-db-transact + fluree-db-shacl). Default in server/CLI. |
vector | No | Embedded vector similarity search (HNSW indexes via usearch) |
ipfs | No | IPFS-backed storage via Kubo HTTP RPC |
search-remote-client | No | HTTP client for remote BM25 and vector search services |
aws-testcontainers | No | Opt-in LocalStack-backed S3/DynamoDB tests (auto-start via testcontainers) |
full | No | Convenience bundle: native, credential, iceberg, shacl, ipfs |
Example:
[dependencies]
fluree-db-api = { path = "../fluree-db-api", features = ["native", "credential"] }
fluree-db-server Features
| Feature | Default | Description |
|---|---|---|
native | Yes | File storage support (forwards to fluree-db-api/native) |
credential | Yes | Signed request verification (forwards to fluree-db-api/credential) |
shacl | Yes | SHACL constraint validation (forwards to fluree-db-api/shacl) |
iceberg | Yes | Apache Iceberg/R2RML graph source support (forwards to fluree-db-api/iceberg) |
aws | No | AWS S3 storage + DynamoDB nameservice (forwards to fluree-db-api/aws) |
oidc | No | OIDC JWT verification via JWKS (RS256 tokens from external IdPs) |
swagger-ui | No | Swagger UI endpoint |
otel | No | OpenTelemetry tracing |
To build the server without credential support (faster compile):
cargo build -p fluree-db-server --no-default-features --features native
Runtime Behavior
Reasoning, SPARQL property paths, and GeoSPARQL functions are always available in any build that links the corresponding crate features (see the build-time feature tables above). They are not gated behind a runtime flag.
Reasoning is opted into per query (via the reasoning parameter or the
SPARQL PRAGMA reasoning directive) or per ledger (via
f:reasoningDefaults in the ledger configuration graph). See
Query-time reasoning and
Setting groups.
Parsing Modes
Strict Mode (Default)
Enforces strict compliance with standards:
- Invalid IRIs rejected
- Type mismatches rejected
- Strict JSON-LD parsing
./fluree-db-server --strict-mode true
Lenient Mode
More permissive parsing:
- Attempts to fix malformed IRIs
- Coerces types when possible
- Accepts non-standard syntax
./fluree-db-server --strict-mode false
Use lenient mode only when you fully control inputs and explicitly want permissive parsing behavior.
API Versioning
Current API version: v1
Version Header:
X-Fluree-API-Version: 1
Supported Data Formats
JSON-LD
Supported JSON-LD versions:
- JSON-LD 1.0: Yes
- JSON-LD 1.1: Yes
SPARQL
Supported SPARQL versions:
- SPARQL 1.0: Yes
- SPARQL 1.1: Yes
RDF Formats
| Format | Read | Write |
|---|---|---|
| JSON-LD | Yes | Yes |
| Turtle | Yes | Yes |
| N-Triples | Planned | Planned |
| N-Quads | Planned | Planned |
| RDF/XML | Planned | No |
| TriG | Planned | Planned |
Protocol Support
HTTP Versions
- HTTP/1.1: Fully supported
- HTTP/2: Supported
- HTTP/3: Planned
TLS Versions
- TLS 1.2: Supported
- TLS 1.3: Supported
- SSL 3.0: Not supported (deprecated)
- TLS 1.0/1.1: Not supported (deprecated)
Client Support
Fluree works with:
HTTP Clients:
- curl
- Postman
- Insomnia
- Any HTTP client library
RDF Libraries:
- Apache Jena (Java)
- RDF4J (Java)
- rdflib (Python)
- N3.js (JavaScript)
SPARQL Clients:
- Apache Jena ARQ
- RDF4J SPARQLRepository
- Any SPARQL 1.1 client
Platform Support
Operating Systems
Server:
- Linux (x86_64, aarch64)
- macOS (Intel, Apple Silicon)
- Windows (x86_64)
Clients:
- Any OS with HTTP support
Cloud Platforms
- AWS (native support)
- Google Cloud Platform (via file storage)
- Azure (via file storage)
- Self-hosted / on-premises
Container Support
- Docker: Full support
- Kubernetes: Full support
- Podman: Supported
- Docker Compose: Full support
Database Support
Import Sources
Fluree can import from:
RDF Databases:
- Apache Jena TDB
- Virtuoso
- Stardog
- GraphDB
- Any RDF export
Graph Databases:
- Neo4j (via RDF export)
- Amazon Neptune (via RDF export)
Relational Databases:
- Via R2RML mapping
- Direct SQL query
Export Formats
Export Fluree data to:
- Turtle files
- JSON-LD documents
- SPARQL CONSTRUCT results
- Any RDF format
Feature Roadmap
Planned Features
Query:
- SPARQL property paths: remaining operators (? zero-or-one, ! negated set)
- GeoSPARQL
- SPARQL 1.1 Federation
- Full SPARQL UPDATE
Storage:
- Additional cloud providers (GCP, Azure)
- Hybrid storage modes
Security:
- OAuth 2.0 integration
- SAML support
- Additional DID methods
Graph Sources:
- BigQuery integration
- Snowflake integration
- Elasticsearch integration
Feature Discovery
Feature availability is documented in this compatibility matrix and by
crate feature flags; the standalone server does not expose a /features
HTTP endpoint.
Browser Support
For web applications using Fluree API:
Supported Browsers:
- Chrome/Edge 90+
- Firefox 88+
- Safari 14+
Requirements:
- Fetch API support
- CORS support
- WebSocket support (for future streaming)
Tool Support
RDF Tools
Compatible with standard RDF tools:
- Protégé (ontology editor)
- TopBraid Composer
- RDF validators
- SPARQL editors
Data Tools
Works with data engineering tools:
- Apache Airflow (via HTTP operators)
- dbt (via SQL proxy with R2RML)
- Apache Spark (via Iceberg)
- Pandas (via query API)
Version Requirements
Rust Version
Building from source requires:
- Rust 1.75.0 or later
- Cargo 1.75.0 or later
Dependencies
Runtime dependencies:
- None (statically linked binary)
Optional dependencies:
- AWS SDK (for AWS storage)
Related Documentation
- Glossary - Term definitions
- Crate Map - Code architecture
- Getting Started - Installation
Graph Identities and Naming
This document defines names → things and recommended naming conventions for Fluree. It is split into:
- User-facing naming: what we say in docs, examples, and APIs.
- Internal naming: how we name types/components in Rust so the implementation stays clear.
The goal is to make these simultaneously true:
- Users can think in the familiar model: “database as a value” (immutable, time-travelable).
- SPARQL semantics remain correct: GRAPH <…> identifies a graph by IRI.
- Fluree can seamlessly query across:
- graphs inside a ledger (default + named graphs),
- across ledgers (federation),
- and non-ledger sources (BM25, vector, Iceberg/R2RML, etc.).
Summary: the model in one paragraph
In Fluree, you query graphs and often load a graph snapshot (an immutable point-in-time view you can query repeatedly). In SPARQL, graph scoping uses GRAPH <iri> { … }, where <iri> is a graph identifier (a graph IRI). Fluree supports multiple kinds of graph sources (ledger graphs and non-ledger sources like BM25/vector indexes and tabular mappings). Users may refer to graphs with short, friendly aliases that Fluree resolves against a configured base into canonical graph IRIs. Not all graph sources support the same time-travel semantics — time pinning and “as-of” behavior is a graph-source capability, not a universal guarantee.
Under the hood, this “graph snapshot” corresponds to the same semantic idea many temporal systems describe as “database as a value”: immutable, time-travelable, and safe to pass around.
User-facing naming (recommended)
Core terms
- Ledger: A durable data product with a commit chain, identified by a ledger ID like mydb:main.
  - A ledger is what users create/manage.
  - A ledger can contain multiple graphs (default graph + named graphs).
- Graph Source ID: A canonical name:branch identifier used in APIs/CLI/config to refer to a graph source, e.g. products-search:main.
  - This is an alias-style identifier (not a full IRI).
  - In SPARQL contexts it may appear inside <…> and can be resolved against a configured base into a canonical Graph IRI.
- Graph: A query scope (SPARQL term).
  - In a query, a “graph” is identified by an IRI and used to scope patterns (GRAPH <iri> { … }).
- Graph Snapshot: An immutable point-in-time view of a graph that can be queried repeatedly.
- LedgerSnapshot (database value): The underlying semantic model: an immutable value at a point in time.
- In product/docs we usually say “graph snapshot” because it aligns with SPARQL and the Rust API.
  - Internally the type is LedgerSnapshot in fluree-db-core.
- Graph IRI: The canonical identity of a graph. This is what SPARQL uses.
- Graph reference (GraphRef): What a user types (often an alias-like string), which Fluree resolves to a Graph IRI.
- Graph Source: Anything addressable by a Graph IRI that can participate in query execution.
- Federation (preferred) / Dataset (SPARQL term): A query executed over a set of graphs.
- We prefer “federation” when describing the product feature to non-SPARQL users.
“Graph snapshot” vs “Graph IRI” (how to talk about it)
- Graph Snapshot (value) answers: “What immutable point-in-time graph am I querying?”
- Graph IRI (identifier) answers: “Which graph does this part of the query run against?”
In practice, you query a graph snapshot by naming its graph:
- When you write FROM <…> or GRAPH <…>, you are naming a graph IRI.
- That graph IRI resolves to a graph snapshot (an immutable value) at execution time.
Time pinning syntax (“the part after @ pins the snapshot”)
Fluree supports time pinning in graph references.
Current syntax (implemented today):
- <ledger>:<branch>@t:<t> — pin to transaction time
- <ledger>:<branch>@iso:<rfc3339> — pin to ISO datetime
- <ledger>:<branch>@commit:<commit-content-id> — pin to commit ContentId (prefix allowed)
Note: you may see an = form in older design notes (@t=100, etc.). That form is not the supported user-facing syntax today; use the @t: / @iso: / @commit: forms in docs and examples.
From a user perspective:
- The @… portion selects which snapshot value you mean for that ledger graph.
Important nuance:
- For ledger graph sources, @… selects a pinned point-in-time view.
- For non-ledger graph sources, @… support is capability-specific:
  - Some sources may support time pinning by selecting an appropriate snapshot/root.
  - Some sources are head-only and should reject time-pinned requests with a clear error.
Named graphs within a ledger
We support multiple named graphs inside a single ledger (shared commit chain, distinct graph identities/indexes).
System graphs
Every ledger reserves system named graphs for internal use:
| Graph | IRI pattern | Purpose |
|---|---|---|
| Default graph | (implicit) | Application data |
| Txn-meta | urn:fluree:{ledger_id}#txn-meta | Commit metadata |
| Config graph | urn:fluree:{ledger_id}#config | Ledger configuration |
User-defined named graphs (created via TriG) are identified by their IRI and allocated after the system graphs.
The config graph stores ledger-level operational defaults (policy, SHACL, reasoning, uniqueness constraints) as RDF triples. See Ledger configuration for details.
Naming convention
Recommended user-facing convention (alias-friendly, URL-friendly, avoids / as a delimiter inside the ledger namespace):
<ledger>:<branch>[ @time-spec ] [ #<named-graph-alias> ]
Examples:
- Default graph, latest: <acme/people:main>
- Default graph at t=1000: <acme/people:main@t:1000>
- Txn metadata graph at latest: <acme/people:main#txn-meta>
- Txn metadata graph at t=1000: <acme/people:main@t:1000#txn-meta>
Important note about # fragments
Using #<named-graph-alias> is idiomatic RDF identity, but HTTP clients do not send fragments to servers.
That’s fine for graph identity and query semantics, but if you want a dereferenceable HTTP endpoint for a named graph,
plan to expose a server-visible selector (e.g., ?graph=txn-meta) in addition to the canonical identity.
Full IRIs are always allowed
Semantic web users may prefer full IRIs:
https://data.flur.ee/acme/people:main@t:1000#txn-meta
These should be used as-is (no resolution needed).
Base resolution (“make aliases globally identifiable”)
Many users prefer short names like people:main or acme/people:main. To make them globally identifiable:
- Allow a configured base (SPARQL BASE <…> or a connection/query base configuration).
- Treat alias-style graph references as relative IRI references resolved against that base.
Example:
- Base: https://data.flur.ee/
- Ref: <acme/people:main@t:1000#txn-meta>
- Graph IRI: https://data.flur.ee/acme/people:main@t:1000#txn-meta
Character and encoding rules (user-facing)
To avoid ambiguity and URL pitfalls:
- Reserved delimiters:
  - @ separates the time specifier
  - # separates a named-graph alias (fragment)
  - : is used inside the ledger ID as ledger:branch
- Do not use raw @ or # inside ledger names, branch names, or named-graph aliases.
  - If needed, percent-encode them.
- RFC3339 / ISO timestamps must be URL-safe:
  - Prefer UTC with Z (e.g., 2026-02-03T17:02:11Z).
  - If offsets are used (+05:00), they should be percent-encoded in IRI contexts.
Graph Sources (user-facing)
We use Graph Source as the umbrella term for anything you can name in FROM, FROM NAMED, or GRAPH <…> and query as part of a single execution.
Graph sources differ in capabilities:
- Some behave like RDF triple stores (ledger graphs, some mappings).
- Some provide specialized operators/patterns (BM25 and vector search).
- Some support time pinning / time travel; others are head-only.
Categories
- Ledger Graph Sources: RDF graphs stored in a ledger (default graph or named graph).
- Index Graph Sources: persisted indexes queried through graph-integrated patterns (BM25, Vector/HNSW).
- Mapped Graph Sources: non-ledger data mapped into an RDF-shaped graph (R2RML, Iceberg).
Conventions and examples
SPARQL: base + pinned graphs
BASE <https://data.flur.ee/>
SELECT ?s ?p ?o
FROM <acme/people:main@t:1000>
WHERE {
?s ?p ?o .
}
SPARQL: txn metadata named graph
BASE <https://data.flur.ee/>
SELECT ?commit ?t
FROM NAMED <acme/people:main@t:1000#txn-meta>
WHERE {
GRAPH <acme/people:main@t:1000#txn-meta> {
?commit <https://ns.flur.ee/db#t> ?t .
}
}
JSON-LD Query: pinned graph reference
{
"from": "acme/people:main@t:1000",
"select": ["?name"],
"where": [{ "@id": "?p", "http://schema.org/name": "?name" }]
}
Internal naming conventions (Rust code)
These conventions govern how we name types and variables in Rust code related to graphs, ledgers, and identifiers.
Canonical identities: prefer explicit types, not raw String
Internally we should distinguish:
- `LedgerId`: the durable ledger identifier (e.g., `acme/people:main`)
- `GraphIri`: canonical graph identity used by SPARQL (`Arc<str>` or a validated URL/IRI type)
- `GraphRef`: user input token, resolved to a `GraphIri` using base rules
Even if these are initially just `struct LedgerId(Arc<str>)` newtypes, they prevent accidental mixing and make APIs self-documenting.
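A minimal sketch of such newtypes, assuming `Arc<str>` wrappers; the derives and the helper function are illustrative, not the actual definitions:

use std::sync::Arc;

/// Durable ledger identifier, e.g. "acme/people:main".
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct LedgerId(Arc<str>);

/// Canonical graph identity used by SPARQL (already base-resolved).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct GraphIri(Arc<str>);

/// Raw user input (alias or full IRI), not yet resolved.
#[derive(Clone, Debug)]
pub struct GraphRef(pub String);

// With distinct types, a signature like this cannot silently accept a
// user-supplied GraphRef where a canonical GraphIri is required.
pub fn load_graph(_ledger_id: &LedgerId, _graph_iri: &GraphIri) {
    // Hypothetical function, for illustration only.
}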
Naming rules (make each word mean one thing)
This repo reserves `_id` for content identifiers (e.g. `commit_id`, `index_id`, `default_context_id`) and identity/lookup keys (`name:branch` tokens). The recommended rule set:
- `id`: canonical identifier used as a cache key / stable identity.
  - For ledgers this is the full `name:branch` form (e.g., `people:main`).
  - For graph sources this is the full `name:branch` form (e.g., `products-search:main`).
- `_id` (for content references): a content identifier (`ContentId`) used by the storage layer to fetch immutable artifacts.
  - Examples: `commit_id`, `index_id`, `default_context_id`.
- `name`: a base name without branch (e.g., `people`).
- `branch`: the branch name (e.g., `main`).
- `alias`: a human-friendly label, and only that. For ledger identifiers, prefer `ledger_id`.
Practical guidelines:
- If a string is used to load/cache/lookup a ledger, call it `ledger_id` (not `ledger_alias`, not `ledger_address`).
- If a string is used to load/cache/lookup a graph source by `name:branch`, call it `graph_source_id` (not `gs_alias`, not `graph_source_alias`).
- If a value identifies a content-addressed artifact (commit, index, context), call it `*_id` (e.g., `commit_id`, `index_id`, `default_context_id`).
- If a string is used to identify a graph in SPARQL (`FROM`, `GRAPH`), call it `graph_iri` (canonical) or `graph_ref` (user input).
- Avoid having two different meanings for the same field name across crates.
Ledger vs Db vs Graph (internal meaning)
- Ledger: the durable identity + commit chain + publication state (nameservice record).
- LedgerSnapshot: the indexed snapshot value used for range/scan (hot path).
- Graph: query scoping mechanism (active graph, dataset graph selection, `GRAPH` operator).
Avoid using “graph” as a synonym for “db” in code comments. Prefer:
- “ledger graph” (RDF graph inside a ledger)
- “graph IRI” (identifier)
- “graph source” (registry/resolution concept)
“Dataset” naming internally
SPARQL calls the set-of-graphs a “dataset”, and the code already models a DataSet.
For product-facing APIs and docs, prefer “federation”, but internally keep DataSet as the SPARQL-aligned term.
Related docs
- `docs/concepts/time-travel.md` (time pinning syntax)
- `docs/concepts/datasets-and-named-graphs.md` (SPARQL dataset semantics)
- `docs/graph-sources/overview.md` and `docs/concepts/graph-sources.md` (graph source overview)
- `docs/reference/glossary.md` (terms glossary)
OWL & RDFS Support Reference
This page lists every OWL and RDFS construct that Fluree’s reasoning engine supports. For conceptual background see Reasoning and inference; for query syntax see Query-time reasoning.
Quick orientation
Fluree implements reasoning via two techniques:
- Query rewriting (RDFS and OWL 2 QL modes) — patterns are expanded at compile time; no facts are materialized.
- Forward-chaining materialization (OWL 2 RL mode) — derived facts are computed before query execution using the OWL 2 RL rule set.
The tables below indicate which technique handles each construct.
RDFS constructs
These constructs are handled by query rewriting in RDFS mode (and also by materialization in OWL 2 RL mode).
rdfs:subClassOf
Declares that every instance of one class is also an instance of another.
ex:Student rdfs:subClassOf ex:Person .
Effect: A query for ?x rdf:type ex:Person also returns instances typed as
ex:Student (and any subclass of Student, transitively).
JSON-LD transaction:
{"@id": "ex:Student", "rdfs:subClassOf": {"@id": "ex:Person"}}
rdfs:subPropertyOf
Declares that one property is a specialization of another.
ex:hasMother rdfs:subPropertyOf ex:hasParent .
Effect: A query for ?x ex:hasParent ?y also returns triples asserted with
ex:hasMother.
JSON-LD transaction:
{"@id": "ex:hasMother", "rdfs:subPropertyOf": {"@id": "ex:hasParent"}}
rdfs:domain
Declares that the subject of a property is an instance of a class.
ex:teaches rdfs:domain ex:Professor .
Effect (OWL 2 QL / OWL 2 RL): If ex:alice ex:teaches ex:cs101, then
ex:alice rdf:type ex:Professor is inferred.
JSON-LD transaction:
{"@id": "ex:teaches", "rdfs:domain": {"@id": "ex:Professor"}}
rdfs:range
Declares that the object of a property is an instance of a class.
ex:teaches rdfs:range ex:Course .
Effect (OWL 2 QL / OWL 2 RL): If ex:alice ex:teaches ex:cs101, then
ex:cs101 rdf:type ex:Course is inferred.
JSON-LD transaction:
{"@id": "ex:teaches", "rdfs:range": {"@id": "ex:Course"}}
Note: Range inference applies to IRI-valued objects only. Literal values (strings, numbers, etc.) are not assigned a type via
rdfs:range.
OWL property constructs
These are handled by materialization in OWL 2 RL mode (some also by query rewriting in OWL 2 QL mode, as noted).
owl:inverseOf
Declares that two properties are inverses of each other.
ex:hasMother owl:inverseOf ex:motherOf .
Effect: If ex:alice ex:hasMother ex:carol, then
ex:carol ex:motherOf ex:alice is inferred (and vice versa).
Handled by: OWL 2 QL (query rewriting) and OWL 2 RL (materialization).
OWL 2 RL rule: prp-inv
JSON-LD transaction:
{"@id": "ex:hasMother", "owl:inverseOf": {"@id": "ex:motherOf"}}
owl:SymmetricProperty
Declares that a property holds in both directions.
ex:livesWith a owl:SymmetricProperty .
Effect: If ex:alice ex:livesWith ex:bob, then
ex:bob ex:livesWith ex:alice is inferred.
OWL 2 RL rule: prp-symp
JSON-LD transaction:
{"@id": "ex:livesWith", "@type": "owl:SymmetricProperty"}
owl:TransitiveProperty
Declares that a property chains through intermediate nodes.
ex:hasAncestor a owl:TransitiveProperty .
Effect: If ex:a ex:hasAncestor ex:b and ex:b ex:hasAncestor ex:c, then
ex:a ex:hasAncestor ex:c is inferred.
OWL 2 RL rule: prp-trp
JSON-LD transaction:
{"@id": "ex:hasAncestor", "@type": "owl:TransitiveProperty"}
owl:FunctionalProperty
Declares that a property can have at most one value per subject.
ex:hasBirthDate a owl:FunctionalProperty .
Effect: If ex:alice ex:hasBirthDate ex:d1 and
ex:alice ex:hasBirthDate ex:d2, then ex:d1 owl:sameAs ex:d2 is inferred.
OWL 2 RL rule: prp-fp
JSON-LD transaction:
{"@id": "ex:hasBirthDate", "@type": "owl:FunctionalProperty"}
owl:InverseFunctionalProperty
Declares that a property’s object uniquely identifies the subject.
ex:hasSSN a owl:InverseFunctionalProperty .
Effect: If ex:alice ex:hasSSN "123" and ex:bob ex:hasSSN "123", then
ex:alice owl:sameAs ex:bob is inferred.
OWL 2 RL rule: prp-ifp
JSON-LD transaction:
{"@id": "ex:hasSSN", "@type": "owl:InverseFunctionalProperty"}
owl:equivalentProperty
Declares that two properties have identical extensions.
ex:author owl:equivalentProperty ex:writtenBy .
Effect: Treated as mutual rdfs:subPropertyOf — queries and rules see both
properties’ triples when either is used.
owl:propertyChainAxiom
Declares that a chain of properties implies another property.
ex:hasUncle owl:propertyChainAxiom ( ex:hasParent ex:hasBrother ) .
Effect: If ex:alice ex:hasParent ex:bob and
ex:bob ex:hasBrother ex:charlie, then ex:alice ex:hasUncle ex:charlie is
inferred.
OWL 2 RL rule: prp-spo2
Chains can be of arbitrary length (2 or more properties) and can include inverse elements:
ex:hasNephew owl:propertyChainAxiom (
[ owl:inverseOf ex:hasBrother ]
ex:hasChild
) .
JSON-LD transaction:
{
"@id": "ex:hasUncle",
"owl:propertyChainAxiom": {
"@list": [{"@id": "ex:hasParent"}, {"@id": "ex:hasBrother"}]
}
}
OWL class constructs
owl:equivalentClass
Declares that two classes have identical extensions.
ex:Pupil owl:equivalentClass ex:Student .
Effect: Instances of either class are inferred to be instances of both.
OWL 2 RL rule: cax-eqc
owl:hasKey
Declares a set of properties that uniquely identify instances of a class.
ex:Person owl:hasKey ( ex:hasSSN ) .
Effect: If two ex:Person instances share the same ex:hasSSN value, they
are inferred to be owl:sameAs.
OWL 2 RL rule: prp-key
OWL restrictions (class expressions)
OWL restrictions define classes based on property constraints. They are used with OWL 2 RL materialization.
owl:hasValue
Defines a class of all subjects that have a specific value for a property.
ex:RedThings a owl:Restriction ;
owl:onProperty ex:color ;
owl:hasValue ex:Red .
Effect (forward — cls-hv1): If ?x rdf:type ex:RedThings, then
?x ex:color ex:Red is inferred.
Effect (backward — cls-hv2): If ?x ex:color ex:Red, then
?x rdf:type ex:RedThings is inferred.
Limitation: Currently supports IRI-valued `hasValue` only. Literal values (strings, numbers) are not yet supported.
owl:someValuesFrom
Defines a class of subjects that have at least one value of a given type for a property.
ex:Parent a owl:Restriction ;
owl:onProperty ex:hasChild ;
owl:someValuesFrom ex:Person .
Effect (cls-svf1): If ?x ex:hasChild ?y and ?y rdf:type ex:Person,
then ?x rdf:type ex:Parent is inferred.
owl:allValuesFrom
Defines a class where all values of a property belong to a given type.
ex:VeganRestaurant a owl:Restriction ;
owl:onProperty ex:servesFood ;
owl:allValuesFrom ex:VeganDish .
Effect (cls-avf): If ?x rdf:type ex:VeganRestaurant and
?x ex:servesFood ?y, then ?y rdf:type ex:VeganDish is inferred.
owl:maxCardinality (= 1)
When a restriction specifies maxCardinality of 1, it acts like a
context-specific functional property.
ex:SingleChild a owl:Restriction ;
owl:onProperty ex:hasChild ;
owl:maxCardinality 1 .
Effect (cls-maxc2): If ?x rdf:type ex:SingleChild,
?x ex:hasChild ?y1, and ?x ex:hasChild ?y2, then
?y1 owl:sameAs ?y2 is inferred.
owl:maxQualifiedCardinality (= 1)
Like maxCardinality but restricted to objects of a specific class.
ex:MonogamousPerson a owl:Restriction ;
owl:onProperty ex:marriedTo ;
owl:maxQualifiedCardinality 1 ;
owl:onClass ex:Person .
Effect (cls-maxqc3/4): If ?x is typed as this restriction class, has two
ex:marriedTo values, and both are ex:Person, they are inferred to be
owl:sameAs.
OWL class operations
owl:intersectionOf
Defines a class as the intersection of member classes.
ex:WorkingStudent owl:intersectionOf ( ex:Student ex:Employee ) .
Effect (forward — cls-int1): If ?x rdf:type ex:Student and
?x rdf:type ex:Employee, then ?x rdf:type ex:WorkingStudent is inferred.
Effect (backward — cls-int2): If ?x rdf:type ex:WorkingStudent, then
both ?x rdf:type ex:Student and ?x rdf:type ex:Employee are inferred.
owl:unionOf
Defines a class as the union of member classes.
ex:PersonOrOrg owl:unionOf ( ex:Person ex:Organization ) .
Effect (cls-uni): If ?x rdf:type ex:Person (or ex:Organization), then
?x rdf:type ex:PersonOrOrg is inferred.
owl:oneOf
Defines an enumerated class — a fixed set of individuals.
ex:PrimaryColor owl:oneOf ( ex:Red ex:Blue ex:Yellow ) .
Effect (cls-oo): ex:Red, ex:Blue, and ex:Yellow are each inferred to
be of type ex:PrimaryColor.
owl:sameAs
owl:sameAs declares that two IRIs refer to the same real-world entity.
ex:alice owl:sameAs ex:aliceSmith .
Effect: All facts about ex:alice and ex:aliceSmith are merged. Queries
for either IRI return the combined set of properties.
How sameAs is produced
owl:sameAs can be asserted explicitly or inferred by these rules:
| Rule | Trigger |
|---|---|
prp-fp | Functional property with multiple objects |
prp-ifp | Inverse functional property with multiple subjects |
prp-key | owl:hasKey match across instances |
cls-maxc2 | maxCardinality = 1 violation |
cls-maxqc3/4 | maxQualifiedCardinality = 1 violation |
Equivalence properties
owl:sameAs is handled as an equivalence relation:
- Symmetric: if `a sameAs b` then `b sameAs a`
- Transitive: if `a sameAs b` and `b sameAs c` then `a sameAs c`
- Reflexive: every resource is same-as itself (implicit)
The engine uses a union-find data structure to efficiently track equivalence classes and select a canonical representative for each.
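A minimal union-find sketch showing the general technique; this is illustrative only, not Fluree's implementation:

use std::collections::HashMap;

/// Toy union-find over IRI strings, illustrating sameAs canonicalization.
struct SameAsIndex {
    parent: HashMap<String, String>,
}

impl SameAsIndex {
    fn new() -> Self {
        Self { parent: HashMap::new() }
    }

    /// Find the canonical representative for an IRI (with path compression).
    fn find(&mut self, iri: &str) -> String {
        let p = self.parent.get(iri).cloned().unwrap_or_else(|| iri.to_string());
        if p == iri {
            return p;
        }
        let root = self.find(&p);
        self.parent.insert(iri.to_string(), root.clone());
        root
    }

    /// Record `a owl:sameAs b` by merging their equivalence classes.
    fn same_as(&mut self, a: &str, b: &str) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra != rb {
            // Pick the lexicographically smaller root as canonical (arbitrary choice).
            let (canon, other) = if ra < rb { (ra, rb) } else { (rb, ra) };
            self.parent.insert(other, canon);
        }
    }
}

fn main() {
    let mut idx = SameAsIndex::new();
    idx.same_as("ex:alice", "ex:aliceSmith");
    idx.same_as("ex:aliceSmith", "ex:a.smith");
    // All three IRIs now share one canonical representative.
    assert_eq!(idx.find("ex:alice"), idx.find("ex:a.smith"));
}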
OWL 2 RL rule index
For reference, the complete set of OWL 2 RL rules implemented by Fluree:
Identity-producing rules (Phase B)
These rules produce owl:sameAs facts and run before other rules to ensure
proper canonicalization.
| Rule | Construct | Description |
|---|---|---|
prp-fp | owl:FunctionalProperty | Same subject + different objects → sameAs |
prp-ifp | owl:InverseFunctionalProperty | Same object + different subjects → sameAs |
prp-key | owl:hasKey | Same class + matching key values → sameAs |
cls-maxc2 | owl:maxCardinality = 1 | Over-cardinality → sameAs |
cls-maxqc3 | owl:maxQualifiedCardinality = 1 | Qualified over-cardinality → sameAs |
cls-maxqc4 | owl:maxQualifiedCardinality = 1 | Variant for owl:Thing |
Non-identity rules (Phase C)
| Rule | Construct | Description |
|---|---|---|
prp-symp | owl:SymmetricProperty | P(x,y) → P(y,x) |
prp-trp | owl:TransitiveProperty | P(x,y) ∧ P(y,z) → P(x,z) |
prp-inv | owl:inverseOf | P(x,y) → Q(y,x) |
prp-dom | rdfs:domain | P(x,y) → type(x,C) |
prp-rng | rdfs:range | P(x,y) → type(y,C) |
prp-spo1 | rdfs:subPropertyOf | P1(x,y) → P2(x,y) |
prp-spo2 | owl:propertyChainAxiom | Chain match → P(first,last) |
cax-sco | rdfs:subClassOf | type(x,C1) → type(x,C2) |
cax-eqc | owl:equivalentClass | type(x,C1) ↔ type(x,C2) |
cls-hv1 | owl:hasValue (forward) | type(x,C) → P(x,v) |
cls-hv2 | owl:hasValue (backward) | P(x,v) → type(x,C) |
cls-svf1 | owl:someValuesFrom | P(x,y) ∧ type(y,D) → type(x,C) |
cls-avf | owl:allValuesFrom | type(x,C) ∧ P(x,y) → type(y,D) |
cls-int1 | owl:intersectionOf (forward) | All member types → intersection type |
cls-int2 | owl:intersectionOf (backward) | Intersection type → all member types |
cls-uni | owl:unionOf | Any member type → union type |
cls-oo | owl:oneOf | Listed individual → enumerated type |
Known limitations
| Area | Limitation |
|---|---|
| Literal hasValue | owl:hasValue with literal values (strings, numbers) is not yet supported; only IRI-valued restrictions work. |
| Negation | owl:complementOf and negation-as-failure are not supported. OWL 2 RL is a positive-only fragment. |
| Disjointness | owl:disjointWith and owl:AllDisjointClasses do not trigger inconsistency detection. |
| Cardinality > 1 | Only maxCardinality = 1 and maxQualifiedCardinality = 1 are implemented (these are the only identity-producing cardinalities in OWL 2 RL). |
| Datatype reasoning | No inference over datatypes (e.g., xsd:integer subtype of xsd:decimal). |
Namespaces
For reference, the standard namespace prefixes:
| Prefix | URI |
|---|---|
rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
rdfs | http://www.w3.org/2000/01/rdf-schema# |
owl | http://www.w3.org/2002/07/owl# |
xsd | http://www.w3.org/2001/XMLSchema# |
Crate Map
Fluree is organized into multiple Rust crates, each with a specific purpose. This document provides an overview of the crate architecture and dependencies.
Crate Organization
fluree-db/
├── Core
│ ├── fluree-vocab/ # RDF vocabulary constants and namespace codes
│ ├── fluree-db-core/ # Runtime-agnostic core types and queries
│ └── fluree-db-novelty/ # Novelty overlay and commit types
│
├── Graph Processing
│ ├── fluree-graph-ir/ # Format-agnostic RDF intermediate representation
│ ├── fluree-graph-json-ld/ # JSON-LD processing
│ ├── fluree-graph-turtle/ # Turtle parser
│ └── fluree-graph-format/ # RDF formatters (JSON-LD, Turtle, etc.)
│
├── Query & Transaction
│ ├── fluree-db-query/ # Query engine (JSON-LD Query)
│ ├── fluree-db-sparql/ # SPARQL parser and lowering
│ └── fluree-db-transact/ # Transaction processing
│
├── Storage & Connection
│ ├── fluree-db-connection/ # Storage backends and connection management
│ ├── fluree-db-storage-aws/ # AWS storage (S3, S3 Express, DynamoDB)
│ ├── fluree-db-nameservice/ # Nameservice implementations
│ └── fluree-db-nameservice-sync/# Git-like remote sync for nameservice
│
├── Indexing
│ ├── fluree-db-binary-index/ # Binary index formats + read-side runtime
│ ├── fluree-db-indexer/ # Index building
│ └── fluree-db-ledger/ # Ledger state (indexed DB + novelty)
│
├── Security & Validation
│ ├── fluree-db-policy/ # Policy enforcement
│ ├── fluree-db-credential/ # JWS/VerifiableCredential verification
│ ├── fluree-db-crypto/ # Storage encryption (AES-256-GCM)
│ └── fluree-db-shacl/ # SHACL validation engine
│
├── Reasoning
│ └── fluree-db-reasoner/ # OWL2-RL reasoning engine
│
├── Graph Sources
│ ├── fluree-db-tabular/ # Tabular column batch types
│ ├── fluree-db-iceberg/ # Apache Iceberg integration
│ └── fluree-db-r2rml/ # R2RML mapping support
│
├── Search
│ ├── fluree-search-protocol/ # Search service protocol types
│ ├── fluree-search-service/ # Search backend implementations
│ └── fluree-search-httpd/ # Standalone HTTP search server
│
├── Networking
│ ├── fluree-sse/ # Server-Sent Events parser
│ └── fluree-db-peer/ # SSE protocol for peer mode
│
└── Top-Level
├── fluree-db-api/ # Public API and high-level operations
└── fluree-db-server/ # HTTP server (binary)
Foundation Crates
fluree-vocab
Purpose: RDF vocabulary constants and namespace codes
Responsibilities:
- Standard RDF namespace definitions (rdf:, rdfs:, xsd:, owl:, etc.)
- Fluree-specific namespace codes
- IRI constants for common predicates
Dependencies: None (foundation crate)
fluree-db-core
Purpose: Runtime-agnostic core library for Fluree DB
Responsibilities:
- Core types (Flake, Sid, IndexType, etc.)
- Index structures (SPOT, POST, OPST, PSOT)
- Range query operations
- Database snapshot representation
- Statistics and cardinality tracking
- Content-addressed identity (`ContentId`, `ContentKind`)
- Content store trait (`ContentStore`)
Key Types:
- `Flake` - Indexed triple representation
- `Sid` - Subject identifier
- `LedgerSnapshot` - Database snapshot at a point in time
- `IndexType` - Index selection enum
- `StatsView` - Query statistics
- `ContentId` - CIDv1 content-addressed identifier
- `ContentKind` - Content type enum (Commit, Txn, IndexRoot, etc.)
- `ContentStore` - Content-addressed storage trait (see the illustrative sketch at the end of this entry)
- `BranchedContentStore` - Recursive content store with namespace fallback for branches
Dependencies:
- fluree-vocab
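For orientation, a purely illustrative sketch of the content-addressed store idea; the real `ContentStore` trait in fluree-db-core has its own method names, signatures, and async/associated-type details:

/// Illustrative sketch only: a content-addressed store keyed by content ID.
/// Names and signatures are assumptions, not the actual fluree-db-core trait.
trait ContentStoreSketch {
    /// Fetch an immutable artifact by its content identifier.
    fn get(&self, content_id: &str) -> Option<Vec<u8>>;
    /// Store bytes and return the derived content identifier (e.g. a CIDv1).
    fn put(&mut self, bytes: Vec<u8>) -> String;
}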
fluree-db-novelty
Purpose: Novelty overlay and commit types
Responsibilities:
- In-memory novelty (uncommitted/unindexed flakes)
- Commit metadata and structure
- Novelty application and slicing
Key Types:
- `Novelty` - In-memory flake overlay
- `Commit` - Commit metadata
- `FlakeId` - Novelty flake identifier
Dependencies:
- fluree-db-core
- fluree-db-binary-index
- fluree-vocab
Graph Processing Crates
fluree-graph-ir
Purpose: Format-agnostic RDF intermediate representation
Responsibilities:
- Generic graph IR for RDF data
- Triple/quad representation
- Format-independent graph operations
Dependencies:
- fluree-vocab
fluree-graph-json-ld
Purpose: Minimal JSON-LD processing
Responsibilities:
- JSON-LD expansion
- JSON-LD compaction
- @context handling
- IRI resolution
Dependencies:
- fluree-graph-ir
- fluree-vocab
fluree-graph-turtle
Purpose: Turtle (TTL) parser
Responsibilities:
- Turtle syntax parsing
- Triple generation from Turtle
Dependencies:
- fluree-graph-ir
- fluree-vocab
fluree-graph-format
Purpose: RDF graph formatters
Responsibilities:
- Output formatting (JSON-LD, Turtle, N-Triples)
- Serialization utilities
Dependencies:
- fluree-graph-ir
Query & Transaction Crates
fluree-db-query
Purpose: Query engine for JSON-LD Query
Responsibilities:
- Query parsing and planning
- Statistics-driven pattern reordering across all WHERE-clause pattern types (triples, UNION, OPTIONAL, MINUS, search patterns, Graph, Service, etc.)
- Bound-variable-aware selectivity estimation using HLL-derived property statistics (with heuristic fallbacks)
- Query execution
- Filter pushdown (index-level range filters, inline join/BIND evaluation, dependency-based placement, compound pattern nesting)
- Aggregations
- BM25 and vector search integration
- Explain plan generation for optimization debugging
Key Types:
- `Query` - Parsed query
- `VarRegistry` - Variable management
- `Pattern` - Query patterns
- `TriplePattern` - Subject-predicate-object pattern with optional `DatatypeConstraint`
- `Ref` - Variable or constant in subject/predicate position (no literals)
- `Term` - Variable or constant in object position (includes literals)
- `DatatypeConstraint` - Explicit datatype (`Explicit(Sid)`) or language tag (`LangTag`; implies `rdf:langString` datatype)
- `PatternEstimate` - Cardinality classification (Source, Reducer, Expander, Deferred)
Dependencies:
- fluree-db-core
fluree-db-sparql
Purpose: SPARQL parsing and execution
Responsibilities:
- SPARQL lexing and parsing
- AST construction
- Lowering to internal IR
- Diagnostic reporting
Key Types:
- `Query` - SPARQL query AST
- `Pattern` - Graph pattern
- `Diagnostic` - Parse/validation errors
Dependencies:
- fluree-db-query
- fluree-db-core
fluree-db-transact
Purpose: Transaction processing
Responsibilities:
- JSON-LD transaction parsing
- RDF triple generation
- Flake generation
- Commit creation
Dependencies:
- fluree-graph-json-ld
- fluree-db-core
Storage & Connection Crates
fluree-db-connection
Purpose: Storage backends and connection management
Responsibilities:
- Storage abstraction trait
- Memory, file, and cloud storage
- Address resolution
- Commit storage and retrieval
Key Types:
- `Storage` trait
- `MemoryStorage`
- `FileStorage`
Dependencies:
- fluree-db-core
- fluree-graph-json-ld
- fluree-db-storage-aws (optional)
- fluree-db-nameservice
fluree-db-storage-aws
Purpose: AWS storage backends
Responsibilities:
- S3 storage implementation
- S3 Express One Zone support
- DynamoDB integration
Dependencies:
- fluree-db-core
- fluree-db-nameservice
fluree-db-nameservice
Purpose: Nameservice implementations
Responsibilities:
- Nameservice abstraction
- Ledger metadata management
- Publish/lookup operations
- Branch creation and listing
- File and DynamoDB backends
Key Types:
- `NameService` trait (includes `list_branches`, `create_branch`, `drop_branch`)
- `Publisher` trait (commit/index publishing)
- `NsRecord` - Nameservice record (includes `source_branch` for ancestry and a `branches` child count for reference counting)
- `FileNameService`
Dependencies:
- fluree-db-core
fluree-db-nameservice-sync
Purpose: Git-like remote sync for nameservice
Responsibilities:
- Remote nameservice synchronization (fetch/push refs)
- Multi-origin CAS object fetching with integrity verification
- Pack protocol client (streaming binary transport for clone/pull)
- SSE-based change streaming
- Sync driver (fetch/pull/push orchestration)
Key Types:
- `MultiOriginFetcher` - Priority-ordered HTTP origin fallback
- `HttpOriginFetcher` - Single-origin CAS object + pack fetcher
- `SyncDriver` - Orchestrates fetch/pull/push with remote clients
- `PackIngestResult` - Result of streaming pack import
Dependencies:
- fluree-db-core
- fluree-db-nameservice
- fluree-db-novelty
- fluree-sse
Indexing Crates
fluree-db-binary-index
Purpose: Binary index wire formats and read-side runtime
Responsibilities:
- Binary index format codecs (FIR6 root, FBR3 branch, FLI3 leaf, leaflet layout)
- Dictionary artifacts and readers (inline dicts, dict trees, arenas)
- Query-time read types (`BinaryIndexStore`, `BinaryGraphView`, cursors)
Dependencies:
- fluree-db-core
fluree-db-indexer
Purpose: Index building for Fluree DB
Responsibilities:
- Incremental index updates
- Full reindexing
- Index refresh orchestration
Dependencies:
- fluree-db-core
- fluree-db-binary-index
- fluree-db-novelty
- fluree-db-nameservice
- fluree-vocab
fluree-db-ledger
Purpose: Ledger state management
Responsibilities:
- Combining indexed DB with novelty overlay
- Ledger snapshot creation
- State transitions
- Building `BranchedContentStore` trees from branch ancestry
Key Types:
- `LedgerState` - Complete ledger snapshot
Dependencies:
- fluree-db-core
- fluree-db-novelty
- fluree-db-nameservice
Security & Validation Crates
fluree-db-policy
Purpose: Policy enforcement
Responsibilities:
- Policy parsing and evaluation
- Query augmentation for policy
- Transaction authorization
Dependencies:
- fluree-db-query
- fluree-db-core
fluree-db-credential
Purpose: Credential verification
Responsibilities:
- JWS signature verification
- VerifiableCredential processing
- DID resolution
Dependencies: None (standalone)
fluree-db-crypto
Purpose: Storage encryption
Responsibilities:
- AES-256-GCM encryption/decryption
- Key management
- Encrypted storage layer
Dependencies:
- fluree-db-core
fluree-db-shacl
Purpose: SHACL validation engine
Responsibilities:
- SHACL shapes parsing
- Constraint validation
- Validation reports
Dependencies:
- fluree-db-core
- fluree-db-query
- fluree-vocab
Reasoning
fluree-db-reasoner
Purpose: OWL2-RL reasoning engine
Responsibilities:
- OWL2-RL rule application
- Inference generation
- Materialization
Dependencies:
- fluree-db-core
- fluree-vocab
Graph Source Crates
fluree-db-tabular
Purpose: Tabular column batch types
Responsibilities:
- Arrow-compatible column batches
- Graph source data abstraction
Dependencies: None (foundation for graph sources)
fluree-db-iceberg
Purpose: Apache Iceberg integration
Responsibilities:
- Iceberg REST catalog support
- Iceberg table scanning
- Parquet file reading
Dependencies:
- fluree-db-core
- fluree-db-tabular
fluree-db-r2rml
Purpose: R2RML mapping support
Responsibilities:
- R2RML mapping parsing
- Relational-to-RDF mapping
- Graph source generation
Dependencies:
- fluree-graph-ir
- fluree-graph-turtle (optional)
- fluree-db-tabular
- fluree-vocab
Search Crates
fluree-search-protocol
Purpose: Search service protocol types
Responsibilities:
- Request/response structs
- Error model and codes
- Protocol version constants
- BM25 and vector query definitions
Dependencies: serde, thiserror
fluree-search-service
Purpose: Search backend implementations
Responsibilities:
- `SearchBackend` trait
- Vector backend (usearch, feature-gated)
- Index caching with TTL
Dependencies:
- fluree-search-protocol
- fluree-db-query
- fluree-db-core
fluree-search-httpd
Purpose: Standalone HTTP search server
Responsibilities:
- HTTP API for search queries
- Index loading from storage
- Health and capabilities endpoints
Dependencies:
- fluree-search-protocol
- fluree-search-service
- axum, tokio
Networking Crates
fluree-sse
Purpose: Lightweight SSE parser
Responsibilities:
- Server-Sent Events parsing
- Event stream handling
Dependencies: None (foundation)
fluree-db-peer
Purpose: SSE protocol for peer mode
Responsibilities:
- Peer protocol types
- SSE client for peer communication
Dependencies:
- fluree-sse
Top-Level Crates
fluree-db-api
Purpose: Public API and orchestration
Responsibilities:
- Ledger lifecycle (create, load, drop, branch)
- Query execution coordination
- Transaction execution
- Time travel resolution
- Policy application
- Dataset and view composition
Key Types:
- `Fluree` - Main entry point
- `Graph` - Lazy handle for chaining
- `GraphSnapshot` - Materialized snapshot
- `LedgerState` - Loaded ledger state
- `QueryResult` - Query results
- `TransactResult` - Commit receipt
Dependencies:
- fluree-db-query
- fluree-db-sparql
- fluree-db-transact
- fluree-db-connection
- fluree-db-nameservice
- fluree-db-policy
- fluree-db-reasoner
- fluree-db-shacl
fluree-db-server
Purpose: HTTP server (binary)
Responsibilities:
- HTTP API endpoints
- Request routing
- Response formatting
- TLS/SSL, CORS handling
Dependencies:
- fluree-db-api
- axum
Dependency Layers
Layer 5 (Top) fluree-db-server
│
fluree-db-api
│
Layer 4 (Features) ┌──────┼──────┬──────────┬───────────┐
│ │ │ │ │
policy shacl reasoner credential crypto
│ │ │
Layer 3 (Query) └──────┴──────┴──────────┐
│
fluree-db-query ←── fluree-db-sparql
│
Layer 2 (Data) ledger, binary-index, indexer, novelty, connection
│
Layer 1 (Core) fluree-db-core
│
Layer 0 (Foundation) fluree-vocab, fluree-sse, fluree-db-tabular
External Dependencies
Key External Crates
Web Framework:
- `axum` - HTTP server framework
- `tokio` - Async runtime
- `tower` - Service abstractions
Serialization:
- `serde` - Serialization framework
- `serde_json` - JSON support
RDF:
- `oxiri` - IRI parsing and validation
Storage:
- `aws-sdk-s3` - AWS S3 client
- `aws-sdk-dynamodb` - AWS DynamoDB client
Search:
- `tantivy` - BM25 full-text search
- `usearch` - Vector similarity search (HNSW indexes)
Analytics:
- `iceberg-rust` - Apache Iceberg support
- `parquet` - Parquet file reading
Cryptography:
- `ed25519-dalek` - Ed25519 signatures
- `ring` - Cryptographic operations
Building
Build All
cargo build --release
Build Server Only
cargo build --release --bin fluree-db-server
Run Tests
cargo test
Build with Features
cargo build --features native,vector
Crate Versions
All crates use synchronized versioning and are updated together.
Check versions:
cargo tree | grep fluree
Related Documentation
- Contributing: Dev Setup - Development environment
- Contributing: Tests - Testing guide
- Glossary - Term definitions
Contributing
Welcome to the Fluree contributor documentation! This section provides everything you need to contribute to Fluree.
Getting Started
Dev Setup
Set up your development environment:
- Install dependencies
- Clone repository
- Build from source
- Run development server
- IDE configuration
Tests
Testing guide:
- Running tests
- Writing tests
- Test organization
- Integration tests
- Benchmarking
- Continuous integration
Adding Tracing Spans
How to instrument new code paths with tracing spans:
- The two-tier span strategy (info / debug / trace)
- Code patterns for sync and async spans
- Deferred field recording
- Testing spans with SpanCaptureLayer
- Common gotchas (`!Send` guards, OTEL floods, etc.)
W3C SPARQL Compliance Suite
Guide to the manifest-driven W3C compliance test suite:
- Running and interpreting results
- Debugging failures
- From failure to issue/PR workflow
- Using Claude Code for compliance work
- Architecture overview
SHACL Implementation
How SHACL validation is wired into Fluree, for contributors adding constraints or fixing bugs:
- Pipeline: compile → cache → validate
- Crate layout (`fluree-db-shacl` / `-transact` / `-api`)
- Shared post-stage helper and its call sites
- Per-graph config, `f:shapesSource`, target-type resolution
- Adding a new constraint (walkthrough)
- Testing patterns (unit + integration + temp-revert regression trick)
- Known gaps (`sh:uniqueLang`, `sh:qualifiedValueShape`, cross-txn cache)
How to Contribute
Ways to Contribute
- Report Bugs: File issues with reproduction steps
- Suggest Features: Propose enhancements with use cases
- Fix Bugs: Submit pull requests for bug fixes
- Add Features: Implement new capabilities
- Improve Documentation: Fix typos, clarify explanations, add examples
- Review Pull Requests: Help review others’ contributions
- Answer Questions: Help users in discussions
Before Contributing
- Check existing issues: Search for duplicate issues
- Read documentation: Understand the feature area
- Discuss major changes: Open issue before large PRs
- Follow style guide: Match existing code style
- Add tests: Include tests for new features
- Update docs: Document new features
Contribution Workflow
1. Fork Repository
# Fork on GitHub, then clone
git clone https://github.com/YOUR-USERNAME/db.git
cd db
2. Create Branch
git checkout -b feature/my-feature
Branch naming:
- `feature/` - New features
- `fix/` - Bug fixes
- `docs/` - Documentation
- `refactor/` - Code refactoring
- `test/` - Test additions
3. Make Changes
Edit code, following style guidelines.
4. Add Tests
# Run existing tests
cargo test
# Add new tests
# Edit tests/test_my_feature.rs
5. Run Checks
# Format code
cargo fmt
# Lint code
cargo clippy
# Run all tests
cargo test --all
6. Commit Changes
git add .
git commit -m "Add feature: description"
Commit message format:
Short summary (50 chars or less)
More detailed explanation if needed. Wrap at 72 characters.
- Key point 1
- Key point 2
Fixes #123
7. Push and Create PR
git push origin feature/my-feature
Create pull request on GitHub.
8. Address Review Comments
Respond to reviewer feedback, make requested changes.
Code Style
Rust Style
Follow Rust standard style:
# Format all code
cargo fmt
# Check style
cargo clippy
Naming Conventions
Types: PascalCase
#![allow(unused)]
fn main() {
struct Dataset { ... }
enum QueryResult { ... }
}
Functions: snake_case
#![allow(unused)]
fn main() {
fn execute_query() { ... }
fn parse_json_ld() { ... }
}
Constants: SCREAMING_SNAKE_CASE
#![allow(unused)]
fn main() {
const MAX_QUERY_SIZE: usize = 1_048_576;
}
Modules: snake_case
#![allow(unused)]
fn main() {
mod query_engine;
mod storage_backend;
}
Documentation
Document public APIs:
#![allow(unused)]
fn main() {
/// Executes a query against the dataset.
///
/// # Arguments
///
/// * `query` - The query to execute
/// * `context` - Execution context
///
/// # Returns
///
/// Query results or error
///
/// # Examples
///
/// ```
/// let results = dataset.query(&query, &context)?;
/// ```
pub fn query(&self, query: &Query, context: &Context) -> Result<Vec<Solution>> {
// Implementation
}
}
Error Handling
Use Result types:
#![allow(unused)]
fn main() {
// Good
pub fn parse_query(input: &str) -> Result<Query, ParseError> {
// ...
}
// Bad
pub fn parse_query(input: &str) -> Query {
// No error handling
}
}
Testing
Write tests for new code:
#![allow(unused)]
fn main() {
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_query_execution() {
let query = Query::parse("...").unwrap();
let result = execute(&query).unwrap();
assert_eq!(result.len(), 2);
}
}
}
Pull Request Guidelines
PR Title
Format: category: short description
Examples:
- `feat: Add SPARQL property paths support`
- `fix: Correct transaction time ordering`
- `docs: Update query examples`
- `test: Add integration tests for time travel`
- `refactor: Simplify index scan logic`
PR Description
Include:
- Summary: What does this PR do?
- Motivation: Why is this needed?
- Changes: What changed?
- Testing: How was it tested?
- Breaking Changes: Any breaking changes?
Example:
## Summary
Adds support for SPARQL property paths, enabling recursive graph traversal.
## Motivation
Many users need to query hierarchical data structures. Property paths are a standard SPARQL feature.
## Changes
- Added property path parser to fluree-db-sparql
- Implemented path evaluation in query engine
- Added tests for various path patterns
## Testing
- Unit tests for parser
- Integration tests for path queries
- Benchmarks show acceptable performance
## Breaking Changes
None
PR Checklist
- Code follows style guidelines
- Tests added/updated
- Documentation updated
- All tests pass
- No clippy warnings
- Commit messages clear
- PR description complete
Review Process
What Reviewers Look For
- Correctness: Does it work as intended?
- Tests: Adequate test coverage?
- Style: Follows conventions?
- Documentation: Properly documented?
- Performance: No obvious performance issues?
- Breaking Changes: Backward compatible?
Responding to Reviews
- Be receptive to feedback
- Ask questions if unclear
- Make requested changes promptly
- Explain your reasoning when appropriate
- Say thanks for helpful reviews
Community Guidelines
Code of Conduct
- Be respectful and inclusive
- Assume good intentions
- Give constructive feedback
- Welcome newcomers
- No harassment or discrimination
Communication
- GitHub Issues: Bug reports, feature requests
- Pull Requests: Code contributions
- Discussions: Questions, ideas, help
Getting Help
- Read documentation first
- Search existing issues
- Ask specific questions
- Provide reproduction steps
- Be patient and respectful
License
Contributions licensed under Apache 2.0.
By contributing, you agree to license your contributions under the same license.
Recognition
Contributors are recognized in:
- CONTRIBUTORS.md file
- Release notes
- GitHub contributors page
Thank you for contributing to Fluree!
Related Documentation
- Dev Setup - Development environment
- Tests - Testing guide
- Graph Identities and Naming - Naming conventions
Development Setup
This guide walks through setting up a development environment for contributing to Fluree.
Prerequisites
Required
Rust:
# Install rustup
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Verify installation
rustc --version # Should be 1.75.0 or later
cargo --version
Git:
git --version # Should be 2.0 or later
Recommended
IDE/Editor:
- Visual Studio Code with rust-analyzer
- IntelliJ IDEA with Rust plugin
- Vim/Neovim with rust-analyzer LSP
Tools:
- `cargo-watch` - Auto-rebuild on changes
- `cargo-nextest` - Faster test runner
- `cargo-flamegraph` - Performance profiling
cargo install cargo-watch cargo-nextest cargo-flamegraph
Clone Repository
# Clone main repository
git clone https://github.com/fluree/db.git
cd db
# Or clone your fork
git clone https://github.com/YOUR-USERNAME/db.git
cd db
Build from Source
Development Build
# Build all crates
cargo build
# Build specific crate
cd fluree-db-query
cargo build
Release Build
# Optimized build
cargo build --release
# Server binary at: target/release/fluree-db-server
Build Server Only
cargo build --release --bin fluree-db-server
Run Development Server
Quick Start
# Run with default settings (memory storage)
cargo run --bin fluree-db-server
Server starts on http://localhost:8090
With Custom Settings
cargo run --bin fluree-db-server -- \
--storage file \
--data-dir ./dev-data \
--log-level debug
Watch Mode
Auto-rebuild and restart on changes:
cargo watch -x 'run --bin fluree-db-server'
Run Tests
All Tests
cargo test --all
Specific Crate Tests
cd fluree-db-query
cargo test
Specific Test
cargo test test_query_execution
With Output
cargo test -- --nocapture
Integration Tests
cargo test --test integration_tests
With Nextest (Faster)
cargo nextest run
IDE Setup
Visual Studio Code
Install Extensions:
- rust-analyzer
- CodeLLDB (debugging)
- Even Better TOML
Settings (.vscode/settings.json):
{
"rust-analyzer.cargo.features": "all",
"rust-analyzer.checkOnSave.command": "clippy",
"rust-analyzer.inlayHints.enable": true
}
Launch Config (.vscode/launch.json):
{
"version": "0.2.0",
"configurations": [
{
"type": "lldb",
"request": "launch",
"name": "Debug server",
"cargo": {
"args": ["build", "--bin=fluree-db-server"],
"filter": {
"name": "fluree-db-server",
"kind": "bin"
}
},
"args": ["--storage", "memory", "--log-level", "debug"],
"cwd": "${workspaceFolder}"
}
]
}
IntelliJ IDEA
Install Plugin:
- Rust plugin (official)
Configure:
- File → Settings → Languages & Frameworks → Rust
- Set toolchain location
- Enable external linter (clippy)
Vim/Neovim
Install rust-analyzer:
For Neovim with built-in LSP:
-- init.lua
require'lspconfig'.rust_analyzer.setup{}
For Vim with CoC:
" Install coc-rust-analyzer
:CocInstall coc-rust-analyzer
Development Workflow
Make Changes
# Create branch
git checkout -b feature/my-feature
# Edit code
vim fluree-db-query/src/execute.rs
# Format
cargo fmt
# Check
cargo clippy
Test Changes
# Run affected tests
cargo test -p fluree-db-query
# Run all tests
cargo test --all
Verify Build
# Development build
cargo build
# Release build
cargo build --release
# Check all features compile
cargo build --all-features
Run Server Locally
cargo run --bin fluree-db-server -- \
--storage memory \
--log-level debug
Test your changes:
# In another terminal
curl http://localhost:8090/health
curl -X POST http://localhost:8090/v1/fluree/query -d '{...}'
Debugging
With rust-lldb
# Build with debug symbols
cargo build
# Run with lldb
rust-lldb target/debug/fluree-db-server
# Set breakpoint
(lldb) b fluree_db_query::execute::execute_query
(lldb) run --storage memory
# Debug commands
(lldb) continue
(lldb) step
(lldb) print variable_name
With VS Code
Use launch.json configuration from above, then F5 to debug.
Print Debugging
#![allow(unused)]
fn main() {
// Quick debugging
println!("Debug: value = {:?}", value);
// Better: use tracing
tracing::debug!(?value, "Processing query");
}
Logging
Enable debug logs:
RUST_LOG=debug cargo run --bin fluree-db-server
Or trace specific module:
RUST_LOG=fluree_db_query=trace cargo run --bin fluree-db-server
Performance Profiling
Criterion Benchmarks
Run benchmarks:
cargo bench
View results: target/criterion/report/index.html
Flamegraphs
Generate flamegraph:
# Install tools (Linux)
sudo apt install linux-tools-common linux-tools-generic
# Generate flamegraph
cargo flamegraph --bin fluree-db-server
# Open flamegraph.svg in browser
perf (Linux)
# Record
cargo build --release
perf record -g target/release/fluree-db-server
# Report
perf report
Common Development Tasks
Add New Query Feature
- Add to query parser (fluree-db-query/src/parse/)
- Add to query executor (fluree-db-query/src/execute/)
- Add tests (fluree-db-query/tests/)
- Update documentation (docs/query/)
Add New Transaction Feature
- Add to transaction parser (fluree-db-transact/src/parse/)
- Add to staging logic (fluree-db-transact/src/stage.rs)
- Add tests (fluree-db-transact/tests/)
- Update documentation (docs/transactions/)
Add New Storage Backend
- Implement Storage trait (fluree-db-storage/src/)
- Add backend-specific logic
- Add tests
- Update configuration options
- Document in docs/operations/storage.md
Code Organization
Module Structure
fluree-db-query/
├── src/
│ ├── lib.rs # Public API and re-exports
│ ├── triple.rs # TriplePattern, Ref, Term, DatatypeConstraint
│ ├── parse/ # Query parsing
│ │ ├── mod.rs
│ │ ├── ast.rs # Unresolved AST (before IRI resolution)
│ │ ├── lower.rs # AST → IR lowering
│ │ └── node_map.rs # JSON-LD node-map → AST
│ ├── execute/ # Query execution
│ │ ├── mod.rs
│ │ ├── runner.rs
│ │ ├── operator_tree.rs
│ │ └── where_plan.rs # WHERE-clause planning (pattern types, reordering)
│ ├── bind.rs # Variable binding
│ └── filter.rs # Filter evaluation
├── tests/ # Integration tests
└── benches/ # Benchmarks
Import Organization
#![allow(unused)]
fn main() {
// Standard library
use std::collections::HashMap;
// External crates
use serde::{Deserialize, Serialize};
// Internal crates
use fluree_db_common::{Iri, Literal};
// Current crate
use crate::parse::Query;
}
Documentation
Code Documentation
Use Rustdoc:
#![allow(unused)]
fn main() {
/// Executes a query against a dataset.
///
/// This function parses the query, generates an execution plan,
/// and runs the plan against the dataset's indexes.
///
/// # Arguments
///
/// * `dataset` - The dataset to query
/// * `query` - The query to execute
///
/// # Returns
///
/// A vector of solutions (variable bindings)
///
/// # Errors
///
/// Returns error if query is invalid or execution fails
///
/// # Examples
///
/// ```
/// use fluree_db_api::query;
///
/// let results = query(&dataset, &query)?;
/// assert_eq!(results.len(), 10);
/// ```
pub fn query(dataset: &Dataset, query: &Query) -> Result<Vec<Solution>> {
// Implementation
}
}
Generate docs:
cargo doc --open
User Documentation
Update relevant docs in docs/ directory when adding user-facing features.
Dependencies
Adding Dependencies
Add to Cargo.toml:
[dependencies]
serde = { version = "1.0", features = ["derive"] }
tokio = { version = "1.35", features = ["full"] }
Updating Dependencies
# Update all dependencies
cargo update
# Update specific dependency
cargo update -p serde
Checking for Outdated
cargo install cargo-outdated
cargo outdated
Troubleshooting Development Issues
Build Fails
# Clean and rebuild
cargo clean
cargo build
Tests Fail
# Run with output
cargo test -- --nocapture
# Run specific test
cargo test test_name -- --nocapture
Clippy Warnings
# Fix automatically where possible
cargo clippy --fix
rustfmt Issues
# Format all code
cargo fmt
Development Tools
Cargo Commands
cargo build # Build
cargo test # Test
cargo run # Run
cargo bench # Benchmark
cargo doc # Documentation
cargo clean # Clean
cargo check # Quick check (no binary)
cargo clippy # Lint
cargo fmt # Format
Useful Cargo Plugins
# Install useful plugins
cargo install cargo-watch # Auto-rebuild
cargo install cargo-nextest # Faster tests
cargo install cargo-outdated # Check deps
cargo install cargo-audit # Security audit
cargo install cargo-expand # Expand macros
Performance Tips
Development Builds
Use development builds during development:
- Faster compilation
- Slower execution
- Debug symbols included
Release Builds
Use release builds for testing performance:
cargo build --release
cargo test --release
Link Time Optimization
For maximum performance:
[profile.release]
lto = true
codegen-units = 1
Warning: Significantly slower compile times.
Related Documentation
- Tests - Testing guide
- Graph Identities and Naming - Naming conventions
- Crate Map - Code architecture
Tests
This guide covers testing practices, test organization, and how to run tests in the Fluree codebase.
Test Organization
Unit Tests
Tests in the same file as code:
#![allow(unused)]
fn main() {
// src/query.rs
pub fn execute_query(query: &Query) -> Result<Vec<Solution>> {
// Implementation
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_execute_query() {
let query = Query::parse("SELECT ?s WHERE { ?s ?p ?o }").unwrap();
let results = execute_query(&query).unwrap();
assert!(!results.is_empty());
}
}
}
Integration Tests
Tests in tests/ directory:
#![allow(unused)]
fn main() {
// tests/integration_test.rs
use fluree_db_api::{Dataset, query};
#[test]
fn test_query_workflow() {
let dataset = Dataset::new_memory();
// Insert data
dataset.transact(test_data()).unwrap();
// Query data
let results = query(&dataset, test_query()).unwrap();
// Verify
assert_eq!(results.len(), 5);
}
}
Example Tests
Tests in examples/:
// examples/basic_query.rs
fn main() -> Result<()> {
let dataset = Dataset::new_memory();
dataset.transact(sample_data())?;
let results = dataset.query(sample_query())?;
println!("Results: {:?}", results);
Ok(())
}
Run with:
cargo run --example basic_query
Running Tests
All Tests
cargo test --all
Opt-in LocalStack (S3/DynamoDB) tests
Some AWS/S3 tests are intentionally opt-in and will not run during typical cargo test runs.
They require Docker and start LocalStack automatically.
cargo test -p fluree-db-connection --features aws-testcontainers --test aws_testcontainers_test -- --nocapture
Specific Crate
cargo test -p fluree-db-query
Specific Test
cargo test test_query_execution
With Output
cargo test -- --nocapture
Integration Tests Only
cargo test --test '*'
Doc Tests
cargo test --doc
With Nextest (Faster)
cargo nextest run
Writing Tests
Unit Test Example
#![allow(unused)]
fn main() {
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_parse_simple_query() {
let input = r#"{"select": ["?s"], "where": [{"@id": "?s"}]}"#;
let query = parse_query(input).unwrap();
assert_eq!(query.select_vars.len(), 1);
assert_eq!(query.where_patterns.len(), 1);
}
#[test]
fn test_parse_invalid_query() {
let input = "invalid json";
let result = parse_query(input);
assert!(result.is_err());
assert!(matches!(result.unwrap_err(), ParseError::InvalidJson));
}
}
}
Integration Test Example
#![allow(unused)]
fn main() {
// tests/it_query.rs
use fluree_db_api::*;
#[tokio::test]
async fn test_basic_query() {
// Setup
let dataset = Dataset::new_memory().await.unwrap();
// Insert test data
let txn = r#"{
"@context": {"ex": "http://example.org/ns/"},
"@graph": [{"@id": "ex:alice", "ex:name": "Alice"}]
}"#;
dataset.transact(txn).await.unwrap();
// Execute query
let query = r#"{
"from": "test:main",
"select": ["?name"],
"where": [{"@id": "?s", "ex:name": "?name"}]
}"#;
let results = dataset.query(query).await.unwrap();
// Verify
assert_eq!(results.len(), 1);
assert_eq!(results[0]["name"], "Alice");
}
}
Async Tests
Use tokio test runtime:
#![allow(unused)]
fn main() {
#[tokio::test]
async fn test_async_operation() {
let result = async_function().await.unwrap();
assert_eq!(result, expected);
}
}
Property-Based Tests
Use proptest for property-based testing:
#![allow(unused)]
fn main() {
use proptest::prelude::*;
proptest! {
#[test]
fn test_parse_roundtrip(s in "\\PC*") {
        // Arbitrary strings usually aren't valid IRIs; only check the
        // round-trip when parsing succeeds.
        if let Ok(iri) = Iri::parse(&s) {
            let serialized = iri.to_string();
            let reparsed = Iri::parse(&serialized).unwrap();
            prop_assert_eq!(iri, reparsed);
        }
}
}
}
Test Fixtures
Test Data
Create reusable test data:
#![allow(unused)]
fn main() {
// tests/fixtures/mod.rs
pub fn sample_person_data() -> &'static str {
r#"{
"@context": {"schema": "http://schema.org/"},
"@graph": [
{"@id": "ex:alice", "@type": "schema:Person", "schema:name": "Alice"},
{"@id": "ex:bob", "@type": "schema:Person", "schema:name": "Bob"}
]
}"#
}
pub fn sample_query() -> &'static str {
r#"{
"select": ["?name"],
"where": [{"@id": "?p", "schema:name": "?name"}]
}"#
}
}
Use in tests:
#![allow(unused)]
fn main() {
#[test]
fn test_with_fixtures() {
let dataset = Dataset::new_memory();
dataset.transact(fixtures::sample_person_data()).unwrap();
let results = dataset.query(fixtures::sample_query()).unwrap();
assert_eq!(results.len(), 2);
}
}
Test Helpers
#![allow(unused)]
fn main() {
// tests/helpers/mod.rs
pub async fn setup_test_dataset() -> Dataset {
let dataset = Dataset::new_memory().await.unwrap();
dataset.transact(sample_data()).await.unwrap();
dataset
}
pub fn assert_query_results(results: &[Solution], expected: &[(&str, &str)]) {
assert_eq!(results.len(), expected.len());
for (result, (var, value)) in results.iter().zip(expected) {
assert_eq!(result.get(var).unwrap().to_string(), *value);
}
}
}
Test Categories
Fast Tests
Quick unit tests:
#![allow(unused)]
fn main() {
#[test]
fn test_fast_operation() {
// < 1ms execution
}
}
Slow Tests
Tests that take longer:
#![allow(unused)]
fn main() {
#[test]
#[ignore] // Ignored by default
fn test_slow_operation() {
// > 1s execution
}
}
Run slow tests:
cargo test -- --ignored
Integration Tests
End-to-end workflows:
#![allow(unused)]
fn main() {
// tests/it_full_workflow.rs
#[tokio::test]
async fn test_complete_workflow() {
let dataset = setup_test_dataset().await;
// Multiple operations
transact_initial_data(&dataset).await;
query_and_verify(&dataset).await;
update_data(&dataset).await;
query_history(&dataset).await;
}
}
Benchmarking
Criterion Benchmarks
Create benchmarks:
#![allow(unused)]
fn main() {
// benches/query_bench.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use fluree_db_query::*;
fn benchmark_query_execution(c: &mut Criterion) {
let dataset = setup_benchmark_dataset();
let query = parse_query(QUERY).unwrap();
c.bench_function("query execution", |b| {
b.iter(|| {
execute_query(black_box(&dataset), black_box(&query))
});
});
}
criterion_group!(benches, benchmark_query_execution);
criterion_main!(benches);
}
Run benchmarks:
cargo bench
Comparison Benchmarks
Compare different approaches:
#![allow(unused)]
fn main() {
fn benchmark_approaches(c: &mut Criterion) {
let mut group = c.benchmark_group("approach_comparison");
group.bench_function("approach_1", |b| {
b.iter(|| approach_1(black_box(&data)))
});
group.bench_function("approach_2", |b| {
b.iter(|| approach_2(black_box(&data)))
});
group.finish();
}
}
Continuous Integration
GitHub Actions
Tests run automatically on:
- Pull requests
- Commits to main
- Scheduled (nightly)
Workflow: .github/workflows/test.yml
name: Tests
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions-rs/toolchain@v1
with:
toolchain: stable
- run: cargo test --all
- run: cargo clippy -- -D warnings
- run: cargo fmt -- --check
Pre-commit Checks
Run before committing:
#!/bin/bash
# .git/hooks/pre-commit
cargo fmt --check || exit 1
cargo clippy -- -D warnings || exit 1
cargo test --all || exit 1
Make executable:
chmod +x .git/hooks/pre-commit
Test Best Practices
1. Test One Thing
Each test should verify one behavior:
Good:
#![allow(unused)]
fn main() {
#[test]
fn test_query_returns_correct_count() {
let results = query(&dataset, &query).unwrap();
assert_eq!(results.len(), 5);
}
#[test]
fn test_query_returns_correct_values() {
let results = query(&dataset, &query).unwrap();
assert_eq!(results[0]["name"], "Alice");
}
}
Bad:
#![allow(unused)]
fn main() {
#[test]
fn test_query() {
let results = query(&dataset, &query).unwrap();
assert_eq!(results.len(), 5);
assert_eq!(results[0]["name"], "Alice");
assert_eq!(results[1]["name"], "Bob");
// Too many assertions
}
}
2. Use Descriptive Names
#![allow(unused)]
fn main() {
#[test]
fn test_query_with_filter_returns_only_matching_results() {
// Clear what's being tested
}
}
3. Arrange-Act-Assert
Structure tests clearly:
#![allow(unused)]
fn main() {
#[test]
fn test_example() {
// Arrange: Setup
let dataset = setup_test_dataset();
let query = parse_query(TEST_QUERY);
// Act: Execute
let results = execute_query(&dataset, &query).unwrap();
// Assert: Verify
assert_eq!(results.len(), 3);
}
}
4. Test Error Cases
#![allow(unused)]
fn main() {
#[test]
fn test_invalid_query_returns_error() {
let result = parse_query("invalid");
assert!(result.is_err());
}
#[tokio::test]
async fn test_missing_ledger_returns_ledger_not_found() {
let result = fluree.ledger("nonexistent:main").await;
assert!(matches!(result.unwrap_err(), Error::LedgerNotFound(_)));
}
}
5. Avoid Flaky Tests
Don’t depend on:
- Timing
- Random values (use a seeded RNG; see the sketch below)
- External services
- File system state
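For example, a test that needs randomness can use a seeded RNG so every run generates the same input. A minimal sketch, assuming the `rand` crate's 0.8-style API:

use rand::{rngs::StdRng, Rng, SeedableRng};

#[test]
fn test_with_seeded_rng() {
    // A fixed seed makes the generated input identical on every run,
    // so any failure is reproducible.
    let mut rng = StdRng::seed_from_u64(42);
    let input: Vec<u32> = (0..100).map(|_| rng.gen_range(0..1_000)).collect();
    assert_eq!(input.len(), 100);
}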
6. Clean Up Resources
#![allow(unused)]
fn main() {
#[test]
fn test_with_temp_file() {
let temp_dir = tempfile::tempdir().unwrap();
let file_path = temp_dir.path().join("test.db");
// Test with file_path
// temp_dir automatically cleaned up
}
}
7. Use Test Utilities
#![allow(unused)]
fn main() {
// tests/common/mod.rs
pub fn assert_solution_contains(solutions: &[Solution], var: &str, value: &str) {
let found = solutions.iter().any(|s| {
s.get(var).map(|v| v.to_string() == value).unwrap_or(false)
});
assert!(found, "Expected to find {}={} in results", var, value);
}
}
W3C SPARQL Compliance Tests
The testsuite-sparql crate runs official W3C SPARQL test cases against Fluree’s parser and query engine. Tests are discovered automatically from W3C manifest files — zero hand-written test cases.
# Run all W3C SPARQL tests (the crate is excluded from the workspace,
# so run from inside testsuite-sparql/)
cd testsuite-sparql
cargo test
# Run with verbose output
cargo test -- --nocapture 2>&1
The suite covers SPARQL 1.0 and 1.1 syntax tests (293 tests) plus query evaluation tests across 12 categories (233 tests). Eval tests are #[ignore]’d by default — run with --include-ignored or via make test-eval in testsuite-sparql/.
For the full guide on interpreting results, debugging failures, and contributing fixes, see the W3C SPARQL Compliance Suite guide.
Test Coverage
Generate Coverage Report
Using tarpaulin:
cargo install cargo-tarpaulin
cargo tarpaulin --out Html --output-dir coverage/
View: coverage/index.html
Coverage Goals
- Core functionality: 90%+ coverage
- Edge cases: Tested
- Error paths: Tested
- Public APIs: 100% covered
Related Documentation
- Dev Setup - Development environment
- Graph Identities and Naming - Naming conventions
- Contributing - Contribution guidelines
W3C SPARQL Compliance Test Suite
The testsuite-sparql crate runs official W3C SPARQL test cases against Fluree’s parser and query engine. Every test is discovered automatically from W3C manifest files — there are zero hand-written test cases.
This guide covers how to run the suite, interpret results, and turn failures into fixes.
Why This Exists
The W3C publishes its SPARQL test suite as RDF data. Each manifest.ttl file declares test entries: a query file, optional input data, and expected results. Every serious SPARQL implementation (Oxigraph, Apache Jena, Eclipse RDF4J) runs these manifests programmatically. We do the same.
The ratio is extraordinary: ~700 lines of Rust infrastructure drive 700+ W3C test cases. Each failure is a spec-backed bug report with built-in test data and expected results.
Philosophy: failures are features. When a test fails, the default response is to fix Fluree, not skip the test. Skip entries are reserved for documented, deliberate design divergences reviewed by the team.
Quick Start
Important: The
testsuite-sparqlcrate is excluded from the Cargo workspace (see rootCargo.toml). You mustcd testsuite-sparql/before running anycargoormakecommands. Usingcargo test -p testsuite-sparqlfrom the workspace root will fail.
All commands below assume you are already in testsuite-sparql/.
Run All Tests
cd testsuite-sparql
cargo test
This runs all non-ignored W3C test suites. Currently that includes SPARQL 1.0 and 1.1 syntax tests. Query evaluation tests (12 categories, 327 tests) are registered but #[ignore]’d — run them with --include-ignored or via the Makefile.
Run a Specific Suite
# SPARQL 1.1 syntax only
cargo test sparql11_syntax_query_tests
# SPARQL 1.0 syntax only
cargo test sparql10_syntax_tests
# Full query evaluation (~5 min, includes all 12 categories)
cargo test sparql11_query_w3c_testsuite -- --include-ignored
# Single evaluation category
cargo test sparql11_functions -- --include-ignored
Run With Verbose Output
cargo test -- --nocapture 2>&1
The suite writes progress to stderr (Running test N: <test_id> ...) and a summary at the end.
Using the Makefile
The testsuite-sparql/Makefile provides convenience targets:
# --- Running tests ---
make test # Run syntax tests (live output)
make test-syntax11 # SPARQL 1.1 syntax tests only
make test-syntax10 # SPARQL 1.0 syntax tests only
make test-eval # Full eval suite, all 12 categories
make test-eval-cat CAT=functions   # Run one eval category
make test-eval10 # Run SPARQL 1.0 eval tests
# --- Reports ---
make count-eval # Quick pass/fail counts for eval tests
make report-eval-json # JSON report for 1.1 eval → report-eval.json
make report-10-json # JSON report for 1.0 eval → report-10.json
make cat-json CAT=functions        # JSON report for a single category
# --- Analysis (requires report-eval.json) ---
make summary # Per-category pass/fail breakdown
make classify # Group failures by error type
make failures-eval # List all eval failures with type
make failures-eval CAT=functions   # Filter failures to one category
# --- Investigating specific tests ---
make investigate-eval TEST=substring01           # Search eval report for a test
make show-query TEST=syntax-select-expr-04.rq    # Print the .rq file for a test
make clean # Remove generated report files
Understanding the Output
Test Summary
After running, the suite prints:
=== Test Summary ===
Total: 94
Passed: 79
Ignored: 0
Failed: 15
- Total: Number of W3C test cases discovered from manifest files
- Passed: Tests where Fluree’s behavior matched the W3C expectation
- Ignored: Tests in the skip list (should be near zero)
- Failed: Tests where Fluree diverged from the spec — these are bugs or gaps
Failure Messages
Each failure includes the test ID, type, and error details:
https://w3c.github.io/rdf-tests/sparql/sparql11/syntax-query/manifest.ttl#test_34:
Positive syntax test failed — parser rejected valid query.
Test: ...#test_34
File: .../syntax-query/syntax-select-expr-04.rq
For syntax tests, failures fall into three categories:
| Failure Type | What It Means | Example |
|---|---|---|
| Positive test fails | Parser rejects valid SPARQL | Missing feature (subqueries, property path |) |
| Negative test fails | Parser accepts invalid SPARQL | Missing validation (BIND scope, GROUP BY scope) |
| Parser timeout | Parser enters infinite loop | Bug in grammar handling (mitigated by safety-net forward-progress check) |
Test IDs
Every test has a unique IRI like:
https://w3c.github.io/rdf-tests/sparql/sparql11/syntax-query/manifest.ttl#test_34
The fragment (#test_34) identifies the specific test within that manifest. The path tells you the W3C category (syntax-query, aggregates, bind, etc.).
Analyzing Results
Per-Category Breakdown
Use make summary to see pass/fail rates by W3C category:
make summary
This requires report-eval.json (generated automatically if missing). Output looks like:
Category Pass Fail Total Rate
----------------------------------------------------
syntax-query 80 14 94 85%
subquery 8 6 14 57%
functions 27 48 75 36%
...
----------------------------------------------------
TOTAL 167 160 327 51.1%
Error Classification
Use make classify to group failures by root cause:
make classify
Error types:
- RESULT MISMATCH — Query runs but returns wrong values
- INTERNAL ERROR — Execution fails with an internal error
- PARSE/LOWERING — SPARQL parsing or IR lowering fails
- NEGATIVE SYNTAX — Parser accepts a query it should reject
- POSITIVE SYNTAX — Parser rejects a query it should accept
- EMPTY RESULTS — Query returns no results when some were expected
- NOT IMPLEMENTED — Feature not yet implemented
- PANIC — Subprocess crashed (usually an index/unwrap bug)
- TIMEOUT — Test exceeded 5s (syntax) or 10s (eval) timeout
Listing Failures
Use make failures-eval to list all failures with their type and first error line:
make failures-eval # All failures
make failures-eval CAT=functions # Just one category
JSON Reports
For programmatic analysis, generate a JSON report:
make report-eval-json # → report-eval.json
make report-10-json # → report-10.json
make cat-json CAT=bind # → report-bind.json
Report format:
{
"total": 327, "passed": 167, "failed": 160, "pass_rate": "51.1%",
"tests": [
{ "test_id": "http://...#agg01", "status": "pass", "error": null, "timeout": false },
{ "test_id": "http://...#agg02", "status": "fail", "error": "Results not isomorphic...", "timeout": false }
]
}
The analysis script at scripts/analyze_report.py can also be used directly:
python3 scripts/analyze_report.py summary report-eval.json
python3 scripts/analyze_report.py classify report-eval.json
python3 scripts/analyze_report.py failures report-eval.json --category functions
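If you prefer to consume the report from Rust rather than the bundled Python script, the shape shown above deserializes with serde along these lines. This is a minimal sketch, assuming serde and serde_json are available; the helper is not part of the crate, and the field list simply mirrors the example report.

```rust
use serde::Deserialize;

// Mirrors the report JSON shown above; illustrative only, not part of testsuite-sparql.
#[derive(Deserialize)]
struct Report {
    total: u32,
    passed: u32,
    failed: u32,
    pass_rate: String,
    tests: Vec<TestEntry>,
}

#[derive(Deserialize)]
struct TestEntry {
    test_id: String,
    status: String, // "pass" | "fail"
    error: Option<String>,
    timeout: bool,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let raw = std::fs::read_to_string("report-eval.json")?;
    let report: Report = serde_json::from_str(&raw)?;
    let timeouts = report.tests.iter().filter(|t| t.timeout).count();
    println!("{} failed / {} total ({} timeouts)", report.failed, report.total, timeouts);
    Ok(())
}
```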
From Failure to Fix: The Workflow
Step 1: Identify the Failure Category
Run the suite and look at the failure message:
cargo test sparql11_syntax_query_tests -- --nocapture 2>&1 | tail -40
Determine which category:
- Parser timeout → Bug in fluree-db-sparql grammar rules causing infinite loop (mitigated by safety-net forward-progress check in parse_group_graph_pattern(), but can still occur in other parse entry points)
- Positive syntax rejected → Missing parser feature or incorrect grammar
- Negative syntax accepted → Missing semantic validation pass
- Query evaluation mismatch → Bug in query engine, data loading, or result formatting
Step 2: Find the Test Query
Every W3C test references a .rq (query) or .ru (update) file. The failure message includes the file URL. Map it to a local path:
URL: https://w3c.github.io/rdf-tests/sparql/sparql11/syntax-query/syntax-select-expr-04.rq
Local: testsuite-sparql/rdf-tests/sparql/sparql11/syntax-query/syntax-select-expr-04.rq
The pattern: strip https://w3c.github.io/rdf-tests/ and prepend testsuite-sparql/rdf-tests/.
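The rewrite is mechanical enough to express in a few lines. Here is a minimal sketch; the real mapping lives in testsuite-sparql/src/files.rs and may handle more cases.

```rust
use std::path::{Path, PathBuf};

/// Sketch of the URL → local path rewrite described above; the actual logic
/// lives in testsuite-sparql/src/files.rs.
fn local_test_path(url: &str) -> Option<PathBuf> {
    let rest = url.strip_prefix("https://w3c.github.io/rdf-tests/")?;
    Some(Path::new("testsuite-sparql/rdf-tests").join(rest))
}

fn main() {
    let p = local_test_path(
        "https://w3c.github.io/rdf-tests/sparql/sparql11/syntax-query/syntax-select-expr-04.rq",
    );
    assert_eq!(
        p.as_deref(),
        Some(Path::new(
            "testsuite-sparql/rdf-tests/sparql/sparql11/syntax-query/syntax-select-expr-04.rq"
        ))
    );
}
```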
Read the query to understand what SPARQL feature is being tested:
cat testsuite-sparql/rdf-tests/sparql/sparql11/syntax-query/syntax-select-expr-04.rq
Step 3: Reproduce in Isolation
Try parsing the query directly to see the exact error:
#![allow(unused)]
fn main() {
// Quick test in fluree-db-sparql
let output = fluree_db_sparql::parse_sparql("SELECT (1 + ?x AS ?y) WHERE { ?x ?p ?o }");
println!("has_errors: {}", output.has_errors());
for err in output.errors() {
println!(" error: {err:?}");
}
}
If you suspect an infinite loop, the subprocess timeout will catch it automatically when run via the harness.
Step 4: Investigate the Root Cause
For parser issues, the relevant code is in fluree-db-sparql/. Start with:
- src/parser/ — Grammar rules and parser combinators
- src/ast/ — AST types the parser emits
For query evaluation issues, the chain is:
- fluree-db-sparql → parses to SparqlAst
- fluree-db-query → evaluates the AST against a ledger
- fluree-db-api → orchestrates ledger creation and query execution
Step 5: Create an Issue
Use this template:
## W3C SPARQL Compliance: [short description]
**Test ID:** `https://w3c.github.io/rdf-tests/sparql/sparql11/[category]/manifest.ttl#[test_name]`
**Category:** [syntax-query | aggregates | bind | etc.]
**Failure type:** [parser timeout | positive syntax rejected | negative syntax accepted | evaluation mismatch]
### Test Query
\`\`\`sparql
[paste the .rq file contents]
\`\`\`
### Expected Behavior
[For positive syntax: should parse successfully]
[For negative syntax: should be rejected]
[For evaluation: expected results from the .srx/.srj file]
### Actual Behavior
[Error message or incorrect output]
### Root Cause Analysis
[What part of the code needs to change and why]
### W3C Spec Reference
[Link to relevant section of https://www.w3.org/TR/sparql11-query/]
Step 6: Fix and Verify
After making code changes:
# Verify the specific test passes (from testsuite-sparql/)
cargo test sparql11_syntax_query_tests -- --nocapture 2>&1 | grep "test_34"
# Verify you haven't regressed other tests
make count-eval
# Run the parser's own tests (from workspace root)
cd .. && cargo test -p fluree-db-sparql
# Full CI parity check
cargo clippy -p fluree-db-sparql --all-features -- -D warnings
Using Claude Code for Debugging
Claude Code is particularly effective for SPARQL compliance work because each failure is self-contained: a query file, an expected behavior, and a specific error. Here’s how to give a session full context.
Prompt Template for Parser Failures
I'm working on W3C SPARQL compliance in Fluree. The following test is failing:
Test ID: https://w3c.github.io/rdf-tests/sparql/sparql11/syntax-query/manifest.ttl#test_34
Category: Positive syntax test (parser should accept this query but rejects it)
The query file is at: testsuite-sparql/rdf-tests/sparql/sparql11/syntax-query/syntax-select-expr-04.rq
The SPARQL parser is in fluree-db-sparql/. The parse entry point is
`parse_sparql()` which returns `ParseOutput<SparqlAst>` — check `has_errors()`.
Please:
1. Read the failing query file
2. Understand what SPARQL feature it tests
3. Find the relevant parser grammar in fluree-db-sparql/src/parser/
4. Identify why the parser rejects this input
5. Propose a fix
Prompt Template for Query Evaluation Failures
I'm working on W3C SPARQL compliance. This query evaluation test is failing:
Test ID: https://w3c.github.io/rdf-tests/sparql/sparql11/aggregates/manifest.ttl#agg01
Test data: testsuite-sparql/rdf-tests/sparql/sparql11/aggregates/agg01.ttl
Query file: testsuite-sparql/rdf-tests/sparql/sparql11/aggregates/agg01.rq
Expected results: testsuite-sparql/rdf-tests/sparql/sparql11/aggregates/agg01.srx
The test harness creates an in-memory Fluree ledger, loads the data via
stage_owned().insert_turtle(), executes the query via query_sparql(), and
compares results.
Actual output: [paste actual output]
Expected output: [paste expected from .srx file]
Please investigate why the results differ and propose a fix.
Key Files to Reference
When asking Claude Code for help, these files provide essential context:
| Context Needed | File(s) |
|---|---|
| Test harness architecture | testsuite-sparql/src/lib.rs, src/evaluator.rs |
| Subprocess timeout isolation | testsuite-sparql/src/subprocess.rs |
| Subprocess worker binary | testsuite-sparql/src/bin/run_w3c_test.rs |
| How manifests are parsed | testsuite-sparql/src/manifest.rs |
| Syntax test handlers | testsuite-sparql/src/sparql_handlers.rs |
| Eval test handler (data load + query + compare) | testsuite-sparql/src/query_handler.rs |
| Expected result parsing (.srx/.srj) | testsuite-sparql/src/result_format.rs |
| Isomorphic result comparison | testsuite-sparql/src/result_comparison.rs |
| SPARQL parser entry point | fluree-db-sparql/src/lib.rs (parse_sparql()) |
| Parser grammar rules | fluree-db-sparql/src/parser/ |
| SPARQL AST types | fluree-db-sparql/src/ast/ |
| Query engine | fluree-db-query/src/ |
| API orchestration | fluree-db-api/src/ |
| W3C SPARQL test categories | testsuite-sparql/tests/w3c_sparql.rs |
Batch Processing Tips
When multiple tests fail for the same root cause (e.g., “all BIND tests timeout”), group them:
These 3 tests all timeout in the parser on BIND expressions:
- test_34: SELECT (1 + ?x AS ?y)
- test_40: SELECT (CONCAT(?x, "!") AS ?label)
- test_65: subquery with SELECT expression
All are in testsuite-sparql/rdf-tests/sparql/sparql11/syntax-query/.
The parser code for BIND is in fluree-db-sparql/src/parser/. Please find the
common root cause and fix all three.
JSON-LD Query Parity
SPARQL and JSON-LD queries in Fluree compile to the same intermediate representation (fluree-db-query/src/ir.rs) and share the entire execution engine. This means:
- Shared code changes affect both languages. If you add a new Expression variant, Pattern variant, or AggregateFn for SPARQL, it automatically becomes available to JSON-LD query as well. Ensure JSON-LD tests still pass.
- New SPARQL features may need JSON-LD test coverage. If a feature you’re implementing for SPARQL compliance (e.g., a new built-in function, a new filter operator) is also expressible in JSON-LD query syntax, add corresponding JSON-LD integration tests.
- Some features are SPARQL-only. Property paths, RDF-star, ASK query form, and SPARQL Update don’t have JSON-LD equivalents. These don’t require parity testing.
Where to add parity tests
| Language | Test files |
|---|---|
| SPARQL | fluree-db-api/tests/it_query_sparql.rs |
| JSON-LD | fluree-db-api/tests/it_query.rs, it_query_analytical.rs, it_query_grouping.rs |
| Shared | Unit tests in fluree-db-query/src/ modules |
Validation after shared-code changes
# SPARQL W3C tests (from testsuite-sparql/)
make test-eval-cat CAT=<category>
# JSON-LD query tests (from workspace root)
cargo test -p fluree-db-api --test it_query
cargo test -p fluree-db-api --test it_query_analytical
Architecture Overview
Crate Structure
testsuite-sparql/
├── Cargo.toml # Excluded from workspace, publish = false
├── Makefile # Developer convenience targets
├── scripts/
│ └── analyze_report.py # JSON report analysis (summary, classify, failures)
├── src/
│ ├── lib.rs # check_testsuite() entry point
│ ├── vocab.rs # W3C namespace constants (mf:, qt:, etc.)
│ ├── files.rs # URL → local file path mapping
│ ├── manifest.rs # TestManifest: Iterator<Item=Test>
│ ├── evaluator.rs # TestEvaluator: type → handler dispatch
│ ├── sparql_handlers.rs # Handler registration (syntax + eval)
│ ├── query_handler.rs # QueryEvaluationTest: load data, run query, compare
│ ├── subprocess.rs # Subprocess isolation for timeout enforcement
│ ├── result_format.rs # Parse .srx/.srj expected result files
│ ├── result_comparison.rs # Isomorphic result comparison (blank node mapping)
│ ├── report.rs # JSON report generation
│ └── bin/
│ └── run_w3c_test.rs # Subprocess worker binary
├── tests/
│ └── w3c_sparql.rs # Test entry points (syntax + 12 eval categories)
└── rdf-tests/ # Git submodule → github.com/w3c/rdf-tests
How It Works
1. Manifest Parsing (manifest.rs): TestManifest implements Iterator<Item = Result<Test>>. It loads manifest.ttl files using Fluree’s own Turtle parser, follows mf:include links recursively, and extracts per-test metadata: type, query file, data file, expected results.
2. Handler Dispatch (evaluator.rs): TestEvaluator maps test type URIs (e.g., mf:PositiveSyntaxTest11) to handler functions. For each test, it finds the matching handler and invokes it.
3. SPARQL Handlers (sparql_handlers.rs + query_handler.rs): The Fluree-specific logic. Both syntax and evaluation tests run in isolated subprocesses via the run-w3c-test binary (subprocess.rs). For syntax tests, the subprocess calls parse_sparql() + validate() and reports whether errors were found (5-second timeout). For evaluation tests, the subprocess creates an in-memory Fluree ledger, loads Turtle test data, executes the SPARQL query, and compares results against expected .srx/.srj files using isomorphic matching (10-second timeout). If a test exceeds its timeout, the parent kills the child process — no zombie threads.
4. Test Entry Points (tests/w3c_sparql.rs): Each test function is ~5 lines — just a manifest URL and a skip list. The harness does the rest.
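For orientation, a syntax-suite entry point amounts to little more than the call below. This is a sketch that mirrors the check_testsuite usage shown under “Managing the Skip List”; the real functions in tests/w3c_sparql.rs may differ in names and attributes (eval suites, for example, carry #[ignore]).

```rust
// Sketch of a test entry point: one manifest URL plus a (hopefully empty) skip list.
// Mirrors the check_testsuite() call shown in "Managing the Skip List" below.
#[test]
fn sparql11_syntax_query_tests() {
    check_testsuite(
        "https://w3c.github.io/rdf-tests/sparql/sparql11/syntax-query/manifest.ttl",
        &[], // skip list: keep empty unless a divergence has been reviewed
    )
}
```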
Key Design Decisions
- Subprocess isolation for all test execution. Each syntax and eval test runs in a child process (the run-w3c-test binary) that can be killed on timeout. This prevents zombie threads from parser infinite loops or runaway queries (see the sketch after this list).
- Syntax timeout: 5 seconds, eval timeout: 10 seconds. If a test exceeds its limit, the subprocess is killed and the test is marked as a timeout failure.
- Uses Fluree’s own Turtle parser for manifest files. If our parser can’t handle well-formed W3C manifests, that’s a bug worth knowing about.
- Fluree’s list_index approach (instead of rdf:first/rdf:rest) simplifies manifest list handling.
- @base is prepended to manifest files since they use <> (empty relative IRI), which requires a base.
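The timeout mechanics themselves are simple; the sketch below shows the kill-on-deadline idea with std-only APIs. It is illustrative, not the actual subprocess.rs implementation, which also captures output and reports a structured result.

```rust
use std::process::Command;
use std::time::{Duration, Instant};

/// Sketch: run a child process and kill it if it exceeds `limit`.
/// Returns Ok(true) on success within the limit, Ok(false) on timeout or failure.
fn run_with_timeout(mut cmd: Command, limit: Duration) -> std::io::Result<bool> {
    let mut child = cmd.spawn()?;
    let deadline = Instant::now() + limit;
    loop {
        if let Some(status) = child.try_wait()? {
            return Ok(status.success()); // child finished on its own
        }
        if Instant::now() >= deadline {
            child.kill()?; // past the deadline: kill so nothing is left running
            child.wait()?; // reap the killed child
            return Ok(false);
        }
        std::thread::sleep(Duration::from_millis(20)); // poll until done or deadline
    }
}
```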
Test Categories
Syntax Tests (Phase 1)
| Suite | What It Tests | Manifest |
|---|---|---|
| SPARQL 1.1 syntax | Parser correctness for SPARQL 1.1 grammar | syntax-query/manifest.ttl |
| SPARQL 1.0 syntax | Backward compatibility with SPARQL 1.0 | manifest-syntax.ttl |
Query Evaluation Tests (Phase 2)
Each test creates an in-memory Fluree ledger, loads RDF data, executes a SPARQL query, and compares results against W3C expected outputs. Run with make test-eval-cat CAT=<name>.
| Suite | What It Tests | Manifest |
|---|---|---|
| Aggregates | COUNT, SUM, AVG, MIN, MAX, GROUP_CONCAT, SAMPLE | aggregates/manifest.ttl |
| BIND | BIND expressions, variable assignment | bind/manifest.ttl |
| Bindings | VALUES inline data | bindings/manifest.ttl |
| Cast | xsd:integer(), xsd:double(), xsd:string() | cast/manifest.ttl |
| Construct | CONSTRUCT query form | construct/manifest.ttl |
| Exists | FILTER EXISTS, FILTER NOT EXISTS | exists/manifest.ttl |
| Functions | String, numeric, date/time, hash, IRI functions | functions/manifest.ttl |
| Grouping | GROUP BY semantics, error handling | grouping/manifest.ttl |
| Negation | MINUS, NOT EXISTS | negation/manifest.ttl |
| Project-Expression | SELECT expressions, AS aliases | project-expression/manifest.ttl |
| Property-Path | /, |, ^, +, *, ? operators | property-path/manifest.ttl |
| Subquery | Nested SELECT within WHERE | subquery/manifest.ttl |
BIND / VALUES Compliance Notes
BIND (10/10 — 100%):
- Fixed lexer to tokenize + / - as separate operators per the SPARQL spec (INTEGER is unsigned; INTEGER_POSITIVE / INTEGER_NEGATIVE are grammar-level). This fixed ?o+10 being mis-tokenized as Var, Integer(10) instead of Var, Plus, Integer(10).
- BIND input variable liveness is handled by precompute_suffix_vars (cross-block) and pending_binds.expr.variables() (within-block) in the WHERE planner — no special handling needed in compute_variable_deps.
- Explicitly nested { } blocks inside WHERE are lowered as anonymous subqueries (SubqueryPattern) to preserve SPARQL scope boundaries (bind10).
VALUES / Bindings (10/11 — 91%):
- Post-query VALUES (WHERE { ... } VALUES ?x { ... }) is now parsed and lowered. Added values field on SelectQuery AST and post_values field on ParsedQuery to prevent the planner from reordering it relative to OPTIONAL/UNION.
- NestedLoopJoinOperator::combine_rows fixed to handle Unbound/Poisoned left-side shared variables by falling back to right-side values. This fixes VALUES with UNDEF (values4, values5, values8).
- ValuesOperator updated to treat Poisoned (from failed OPTIONAL) as wildcard in is_compatible and merge_rows, fixing values7 (OPTIONAL + VALUES).
- Remaining failure: the graph test requires named graph support (GRAPH keyword) — tracked separately.
Managing the Skip List
Skip entries are the ignored_tests parameter in check_testsuite() calls:
#![allow(unused)]
fn main() {
check_testsuite(
"https://w3c.github.io/rdf-tests/sparql/sparql11/syntax-query/manifest.ttl",
&[
// Deliberately accept bare `1` as integer literal (RDF 1.1 vs 1.0)
// Spec: https://www.w3.org/TR/sparql11-query/#rNumericLiteral
// Reviewed: 2025-02-15 by @ajohnson, @bsmith
"https://...#test_99",
],
)
}
Rules:
- Start with an empty skip list. Expect full compliance.
- Only add entries after investigation confirms a deliberate design choice, not a bug.
- Every skip entry must have a comment explaining why, linking to the relevant spec section.
- Skip entries require review by 2+ team members.
- The total skip list should be <5% of tests (Oxigraph skips ~25 out of 700+).
- Review skip entries periodically — remove them as features are added.
Updating the rdf-tests Submodule
The W3C test data lives in a git submodule at testsuite-sparql/rdf-tests/. To update to the latest W3C tests:
cd testsuite-sparql/rdf-tests
git pull origin main
cd ../..
git add testsuite-sparql/rdf-tests
git commit -m "chore: update W3C rdf-tests submodule"
After updating, run the full suite to check for new tests or changed expectations:
cd testsuite-sparql
cargo test
Related Documentation
- Tests guide — General testing practices
- SPARQL query docs — User-facing SPARQL feature documentation
- Compatibility — Standards compliance status
- Crate map — Workspace architecture
SHACL Implementation
This is the contributor-facing guide to how SHACL validation is wired into Fluree. It covers the pipeline, the crate layout, and the places you’ll want to touch when fixing a bug or adding a constraint.
User-facing docs: Cookbook: SHACL Validation and Setting Groups — SHACL.
Pipeline at a glance
Transaction flakes
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ fluree-db-transact :: stage() │
│ stages flakes into a StagedLedger (novelty overlay) │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ fluree-db-api :: apply_shacl_policy_to_staged_view() │
│ (shared post-stage helper — called from every write surface) │
│ │
│ 1. load_transaction_config(ledger) │
│ 2. build_per_graph_shacl_policy(config, graph_delta) │
│ → HashMap<GraphId, ShaclGraphPolicy> │
│ 3. resolve_shapes_source_g_ids(config, snapshot) │
│ → Vec<GraphId> (where to compile shapes from) │
│ 4. ShaclEngine::from_dbs_with_overlay(&[GraphDbRef], ledger) │
│ 5. validate_view_with_shacl(view, cache, ..., per_graph_policy)│
│ → ShaclValidationOutcome { reject, warn } │
│ 6. log warn bucket; propagate ShaclViolation for reject bucket │
└─────────────────────────────────────────────────────────────────┘
Crate layout
| Crate | Role |
|---|---|
| fluree-db-shacl | SHACL engine: shape compilation, cache, per-node validation, constraint evaluators. No transaction-layer concerns. |
| fluree-db-transact | Staged-validation plumbing: validate_view_with_shacl, validate_staged_nodes. Knows about StagedLedger, staged flakes, and graph routing. Defines the per-graph policy types. |
| fluree-db-api | Config resolution, policy building, and the shared helper that every write surface (JSON-LD, Turtle, commit replay) calls through. |
SHACL is feature-gated (shacl). See Standards and feature flags.
The shared post-stage helper
All SHACL-enforced write surfaces route through apply_shacl_policy_to_staged_view in fluree-db-api/src/tx.rs:
#![allow(unused)]
fn main() {
pub(crate) async fn apply_shacl_policy_to_staged_view(
view: &StagedLedger,
ctx: StagedShaclContext<'_>,
) -> Result<(), TransactError>
}
StagedShaclContext carries everything that varies between call sites:
| Field | Populated by JSON-LD txn | Populated by Turtle insert | Populated by commit replay |
|---|---|---|---|
| graph_delta | Some(&txn.graph_delta) (IRIs) | None | Some(&routing.graph_iris) |
| graph_sids | Some(&graph_sids) | None | Some(&routing.graph_sids) |
| tracker | options.tracker | None | None |
Why not fold this into fluree-db-transact? Config resolution (three-tier merge, override control, per-graph lookup) is API-layer policy, not a staging primitive. Keeping the helper in tx.rs lets fluree-db-transact stay focused on staging mechanics.
Call sites:
- fluree-db-api/src/tx.rs::stage_with_config_shacl (JSON-LD / SPARQL UPDATE txns)
- fluree-db-api/src/tx.rs::stage_turtle_insert (plain Turtle)
- fluree-db-api/src/commit_transfer.rs (push / replay)
Config resolution
Ledger-wide and per-graph policy
build_per_graph_shacl_policy(config, graph_delta) returns Option<HashMap<GraphId, ShaclGraphPolicy>>:
- Graphs absent from the map are disabled — their staged subjects are skipped by the validator.
- ShaclGraphPolicy { mode: ValidationMode } controls warn vs reject for that graph.
- The default graph (g_id=0) always gets the ledger-wide resolved policy when SHACL is enabled.
- Every graph in graph_delta is resolved independently via config_resolver::resolve_effective_config(config, Some(graph_iri)), which applies the three-tier merge (query-time → per-graph → ledger-wide) under override-control rules.
- Returns None when every graph resolves to disabled → the helper short-circuits before building the SHACL engine.
The transact layer’s validate_view_with_shacl signature:
#![allow(unused)]
fn main() {
pub async fn validate_view_with_shacl(
view: &StagedLedger,
shacl_cache: &ShaclCache,
graph_sids: Option<&HashMap<GraphId, Sid>>,
tracker: Option<&Tracker>,
per_graph_policy: Option<&HashMap<GraphId, ShaclGraphPolicy>>,
) -> Result<ShaclValidationOutcome>
}
- per_graph_policy = None: treat every graph with staged flakes as Reject (legacy / shapes-exist-heuristic path).
- per_graph_policy = Some(map): only graphs in the map participate; their mode drives the warn/reject split.
Output:
#![allow(unused)]
fn main() {
pub struct ShaclValidationOutcome {
pub reject_violations: Vec<ValidationResult>,
pub warn_violations: Vec<ValidationResult>,
}
}
The API helper logs the warn bucket and returns TransactError::ShaclViolation for the reject bucket.
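A rough sketch of that split follows. The stand-in types are simplified versions of the structs shown on this page, and the ShaclViolation payload shape is an assumption, so treat this as an illustration of the warn/reject handling rather than the actual tx.rs code.

```rust
// Stand-in types, simplified from the structs described above.
#[derive(Debug)]
struct ValidationResult { message: String }

struct ShaclValidationOutcome {
    reject_violations: Vec<ValidationResult>,
    warn_violations: Vec<ValidationResult>,
}

#[derive(Debug)]
enum TransactError { ShaclViolation(Vec<ValidationResult>) } // payload shape assumed

fn apply_outcome(outcome: ShaclValidationOutcome) -> Result<(), TransactError> {
    for v in &outcome.warn_violations {
        // warn bucket: surfaced in logs only, the transaction proceeds
        eprintln!("SHACL violation (warn-mode graph): {}", v.message);
    }
    if outcome.reject_violations.is_empty() {
        Ok(())
    } else {
        // reject bucket: the staged transaction is refused
        Err(TransactError::ShaclViolation(outcome.reject_violations))
    }
}
```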
f:shapesSource resolution
resolve_shapes_source_g_ids(config, snapshot) in tx.rs is the sibling of policy_builder::resolve_policy_source_g_ids — identical shape, different namespace. Both:
- Start with [0] (default graph) when the source field is unset.
- Map f:defaultGraph → [0].
- Map a named graph IRI to its registered GraphId via snapshot.graph_registry.graph_id_for_iri.
- Reject unsupported dimensions: f:atT, f:trustPolicy, f:rollbackGuard, cross-ledger f:ledger (these surface as TransactError::Parse).
f:shapesSource is authoritative, not additive — when set, shapes come exclusively from the configured graph(s). It’s intentionally non-overridable at query/txn time; it can only be changed via a config-graph transaction.
Shape compilation from multiple graphs
ShapeCompiler::compile_from_dbs(&[GraphDbRef]) in fluree-db-shacl/src/compile.rs scans each input graph for every SHACL predicate (see the shacl_predicates list), accumulates into a single ShapeCompiler, then finalizes. Cross-graph sh:and / sh:or / sh:xone / sh:in list references still resolve because finalization runs once after all graphs are consumed.
ShaclEngine::from_dbs_with_overlay(&[GraphDbRef], ledger_id) is the corresponding engine constructor. from_db_with_overlay(db, ledger_id) is a single-graph convenience that delegates to the multi-graph path via slice::from_ref(&db).
The engine’s SchemaHierarchy is taken from the first graph’s snapshot — hierarchy is schema-level and not graph-scoped.
Target-type resolution
The cache (fluree-db-shacl/src/cache.rs) holds four indexes:
| Field | Keyed by | Used for |
|---|---|---|
| by_target_class | class Sid (with rdfs:subClassOf* expansion) | sh:targetClass |
| by_target_node | subject Sid | sh:targetNode |
| by_target_subjects_of | predicate Sid | sh:targetSubjectsOf |
| by_target_objects_of | predicate Sid | sh:targetObjectsOf |
ShaclEngine::validate_node assembles applicable shapes for a focus node by:
1. shapes_for_node(focus) — O(1) hashmap hit.
2. shapes_for_class(type) for each of the focus’s rdf:type values — O(1) per type.
3. For each key p in by_target_subjects_of: existence check db.range(SPOT, s=focus, p=p) — if non-empty, the shape applies.
4. For each key p in by_target_objects_of: existence check db.range(OPST, p=p, o=focus) — if non-empty, the shape applies.
Why the live db check for steps 3/4 instead of precomputed staged-flake hints? Three scenarios a hint-only approach can’t cover:
- Base-state edge: the triggering edge is already indexed; the current txn only touches another property.
- Retraction-only: the staged flake set for a focus contains retractions that don’t remove the last matching edge.
- Cross-graph routing: a subject’s edge exists in graph A but we’re validating the subject in graph B — the per-graph db ref sees only B.
db.range() returns only post-state assertions (retractions are filtered in the range pipeline — see fluree-db-core/src/range.rs), so the check is exactly “is this edge present in the post-txn view of this graph”.
Cost is bounded by the number of predicate-targeted shapes in the cache, not by data size — typically 0–10 per ledger.
Staged validation loop
validate_staged_nodes in fluree-db-transact/src/stage.rs:
- Partition staged flakes into subjects_by_graph: HashMap<GraphId, HashSet<Sid>>.
  - Every flake’s subject is added (including retractions — class/node targets still need to see them).
  - Every assert flake’s Ref-object is also added to the graph’s focus set (ensures sh:targetObjectsOf shapes fire on newly-referenced nodes).
- For each (g_id, subjects):
  - If enabled_graphs is Some and g_id is not in it: skip.
  - Build a per-graph GraphDbRef with view as overlay and view.staged_t() as t.
  - Attach the tracker (if any) — fuel accounting works for SHACL range scans too.
  - For each subject: fetch rdf:type flakes, then call engine.validate_node(db, subject, &types).
  - Tag each returned ValidationResult with graph_id = Some(g_id) so the caller can partition reject vs warn.
RDFS subclass fallback (is_subclass_of)
When the indexed SchemaHierarchy doesn’t know about a rdfs:subClassOf edge (e.g. asserted in the same or a recent unindexed transaction), validate_class_constraint calls is_subclass_of(db, start, target) which walks rdfs:subClassOf upward via BFS.
Two invariants in that walk:
- Always scope to g_id=0 via rescope_to_schema_graph(db) — schema lives in the default graph, matching how SchemaHierarchy::from_db_root_schema is built. Subject may be in graph G but the subClassOf edge must be looked up in the schema graph.
- Preserve tracker + other GraphDbRef fields — rescope_to_schema_graph uses a db copy + g_id = 0 mutation rather than GraphDbRef::new(..), which would reset tracker, runtime_small_dicts, and eager. There’s a unit test pinning this (rescope_to_schema_graph_preserves_tracker_and_other_fields).
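The walk itself is an ordinary breadth-first search over rdfs:subClassOf edges. Below is a self-contained sketch with the schema-graph lookup abstracted behind a closure; the real is_subclass_of reads the edges via db.range under the two invariants above.

```rust
use std::collections::{HashSet, VecDeque};

type Sid = u64; // stand-in for the real subject-id type

/// Sketch: does `start` reach `target` by walking rdfs:subClassOf upward?
/// `parents_of` stands in for the schema-graph range lookup the real code performs.
fn is_subclass_of(start: Sid, target: Sid, parents_of: impl Fn(Sid) -> Vec<Sid>) -> bool {
    if start == target {
        return true;
    }
    let mut seen: HashSet<Sid> = HashSet::from([start]);
    let mut queue: VecDeque<Sid> = VecDeque::from([start]);
    while let Some(class) = queue.pop_front() {
        for parent in parents_of(class) {
            if parent == target {
                return true;
            }
            if seen.insert(parent) {
                queue.push_back(parent); // unvisited superclass: keep walking upward
            }
        }
    }
    false // exhausted the hierarchy without reaching `target`
}
```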
Adding a new constraint
1. Compile
In fluree-db-shacl/src/compile.rs:
- Add a variant to the Constraint enum (or NodeConstraint for node-level).
- Add the predicate name to the shacl_predicates array in ShapeCompiler::compile_from_dbs.
- Handle the predicate in process_flake (sets the right field on the intermediate shape builder).
- If the constraint takes arguments via an RDF list, extend expand_rdf_lists.
2. Validate
Pure per-value constraints (no db access) go in fluree-db-shacl/src/constraints/:
- Add a validate_<name>(values, ..) -> Option<ConstraintViolation> helper next to the similar ones in cardinality.rs / value.rs / etc.
- Wire it into the big match in validate_constraint in fluree-db-shacl/src/validate.rs.
Constraints that need database access (sh:class, pair constraints) are handled before the pure dispatch, inside validate_property_shape. Pattern:
#![allow(unused)]
fn main() {
Constraint::MyConstraint(target) => {
let helper_violations = validate_my_constraint(db, &values, target).await?;
for v in helper_violations {
results.push(ValidationResult {
focus_node: focus_node.clone(),
result_path: Some(prop_shape.path.clone()),
source_shape: parent_shape.id.clone(),
source_constraint: Some(prop_shape.id.clone()),
severity: prop_shape.severity,
message: v.message,
value: v.value,
graph_id: None, // tagged later in validate_staged_nodes
});
}
}
}
3. Advertise
Update fluree-db-shacl/src/lib.rs:
- Add the constraint to the Supported Constraints list.
- Remove from the Not Yet Supported section if it was listed.
4. Test
- Add a unit test next to your validate_<name> helper for the pure logic.
- Add an integration test in fluree-db-api/src/shacl_tests.rs that transacts a shape + violating data + valid data.
- For a bug fix: temp-revert the fix, confirm the test fails, restore, confirm it passes. This pins the regression into the test.
Testing patterns
Integration tests
Most SHACL integration tests live in fluree-db-api/src/shacl_tests.rs and use the assert_shacl_violation(err, "substring") helper. Pattern:
#![allow(unused)]
fn main() {
let shape = json!({ /* sh:NodeShape with the constraint under test */ });
let ledger = fluree.create_ledger("shacl/foo:main").await.unwrap();
let ledger = fluree.upsert(ledger, &shape).await.unwrap().ledger;
// Negative case
let err = fluree.upsert(ledger, &violating_data).await.unwrap_err();
assert_shacl_violation(err, "expected message fragment");
// Positive case
fluree.upsert(ledger, &valid_data).await.expect("must pass");
}
Cross-graph / per-graph tests
See fluree-db-api/tests/it_config_graph.rs for patterns that write config via TriG into the config graph, then stage transactions across multiple graphs. Examples:
- shacl_shapes_source_points_to_named_graph — f:shapesSource wiring.
- shacl_per_graph_disable_honored — per-graph shaclEnabled: false.
- shacl_per_graph_mode_warn_vs_reject — mixed modes across graphs.
- shacl_target_subjects_of_fires_on_base_state_edge — base-state predicate-target discovery.
The temp-revert trick
For every correctness-fix PR, confirm the regression test actually covers the bug:
- Apply the minimum temp-revert in the production code (comment out the fix with a // TEMP REVERT: marker).
- Run the new test — it should fail with the expected symptom.
- Restore the fix — test passes.
- Commit the fix + the test together.
This is how we guard against tests that pass trivially but don’t actually exercise the fix.
Known gaps
- sh:uniqueLang, sh:languageIn — parsed but not evaluated. Needs language-tag metadata on flakes, which isn’t yet threaded through the validation path.
- sh:qualifiedValueShape (+ sh:qualifiedMinCount / sh:qualifiedMaxCount) — parsed but not evaluated. Needs recursive nested-shape counting.
- Cross-transaction shape cache — every call to from_dbs_with_overlay recompiles from scratch. ShaclCacheKey has a schema_epoch field that’s ready to drive a shared Arc<ShaclCache> cache on the connection, but nothing populates it yet. Low priority until perf regressions are observed.
Where to look in the code
| What | File |
|---|---|
| Shape compilation (Turtle/JSON-LD → CompiledShape) | fluree-db-shacl/src/compile.rs |
| Shape cache with target indexes | fluree-db-shacl/src/cache.rs |
| Per-focus validation engine | fluree-db-shacl/src/validate.rs |
| Per-constraint validators (pure values) | fluree-db-shacl/src/constraints/ |
| Staged-validation loop (per-graph) | fluree-db-transact/src/stage.rs::validate_staged_nodes |
| Public transact entry + outcome split | fluree-db-transact/src/stage.rs::validate_view_with_shacl |
| Policy types (ShaclGraphPolicy, ShaclValidationOutcome) | fluree-db-transact/src/stage.rs |
| Shared post-stage helper | fluree-db-api/src/tx.rs::apply_shacl_policy_to_staged_view |
| Per-graph policy builder | fluree-db-api/src/tx.rs::build_per_graph_shacl_policy |
| f:shapesSource resolver | fluree-db-api/src/tx.rs::resolve_shapes_source_g_ids |
| JSON-LD / SPARQL txn call site | fluree-db-api/src/tx.rs::stage_with_config_shacl |
| Turtle insert call site | fluree-db-api/src/tx.rs::stage_turtle_insert |
| Commit replay call site | fluree-db-api/src/commit_transfer.rs |
| Config field definition | fluree-db-core/src/ledger_config.rs::ShaclDefaults |
| Config graph parser | fluree-db-api/src/config_resolver.rs::read_shacl_defaults |
| Effective-config merge | fluree-db-api/src/config_resolver.rs::merge_shacl_opts |
Related
- Cookbook: SHACL Validation — user-facing usage guide
- Setting Groups — SHACL — config reference
- Override Control — three-tier precedence and monotonicity rules
- Crate map — layering overview
Adding Tracing Spans to New Code
When you add or modify code paths in Fluree, you should instrument them with tracing spans so that performance investigations can decompose wall-clock time into meaningful phases. This guide explains the conventions, patterns, and gotchas.
The Two-Tier Span Strategy
Fluree uses a tiered approach so that tracing is zero-overhead by default but deeply informative on demand.
The request span: info_span! (the one exception)
The HTTP request span in telemetry.rs::create_request_span() is the only info_span! in the codebase. It provides operators with HTTP request visibility at the production default RUST_LOG=info. All other operation spans are debug_span! — this guarantees true zero overhead when the otel feature is not compiled and RUST_LOG is at info.
Tier 1: debug_span! – operation and phase level
All operation spans (query_execute, transact_execute, txn_stage, txn_commit, index_build, sort_blocking, etc.) and their phases use debug_span!. They are visible when OTEL is enabled (the OTEL Targets filter registers interest at DEBUG for fluree_* crates) or when a developer sets RUST_LOG=debug or RUST_LOG=info,fluree_db_query=debug. Without either, debug_span! short-circuits to a single atomic load (~1-2ns, unmeasurable).
Tier 2: trace_span! – maximum detail
Per-operator, per-item, or per-iteration spans. Visible at RUST_LOG=info,fluree_db_query=trace. Use for fine-grained instrumentation in hot paths where you only want visibility during deep investigation. The OTEL Targets filter intentionally excludes TRACE to prevent flooding the batch processor.
Decision guide
| You’re adding… | Span level | Example |
|---|---|---|
| New top-level operation, phase, or operator | debug_span! | query_execute, reasoning_prep, join, binary_open_leaf |
| Detail or per-iteration instrumentation | trace_span! | group_by, distinct, binary_cursor_next_leaf |
Do not use info_span! for new operation spans. The request span is the sole exception.
Code Patterns
Sync phases (no .await)
Use span.enter() which creates a guard dropped at end of scope:
#![allow(unused)]
fn main() {
let span = tracing::debug_span!(
"pattern_rewrite",
patterns_before = patterns.len() as u64,
patterns_after = tracing::field::Empty, // recorded later
);
let _guard = span.enter();
// ... do the rewriting ...
span.record("patterns_after", rewritten.len() as u64);
// _guard dropped here, span ends
}
Async phases (contains .await)
Never hold a span.enter() guard across an .await point. In tokio’s multi-threaded runtime, span.enter() enters the span on the current thread. When the task yields at .await, the span remains “entered” on that thread. Other tasks polled on the same thread will then inherit this span as their parent, causing cross-request trace contamination — completely unrelated operations become nested under each other in Jaeger. This was the root cause of a critical trace corruption bug in the HTTP route handlers.
Symptoms in Jaeger: If you see sequential, independent requests nested as children of an earlier request (especially where child spans outlive their parents), the cause is almost certainly span.enter() held across .await.
Instead, use .instrument(span):
#![allow(unused)]
fn main() {
let span = tracing::debug_span!(
"format",
output_format = %format_name,
result_count = total_rows as u64,
);
format_results(batch, format).instrument(span).await
}
If you need to record deferred fields on a span that wraps an async block, use Span::current() inside the instrumented block:
#![allow(unused)]
fn main() {
let span = tracing::debug_span!(
"txn_stage",
insert_count = tracing::field::Empty,
delete_count = tracing::field::Empty,
);
async {
// ... do staging work ...
let current = tracing::Span::current();
current.record("insert_count", inserts as u64);
current.record("delete_count", deletes as u64);
Ok(result)
}.instrument(span).await
}
HTTP route handlers (axum)
Route handlers are async and must use .instrument(). The standard pattern wraps the entire handler body in an async move block instrumented with the request span, then uses Span::current() inside:
#![allow(unused)]
fn main() {
pub async fn query(
State(state): State<Arc<AppState>>,
headers: FlureeHeaders,
) -> Result<impl IntoResponse> {
let span = create_request_span("query", request_id.as_deref(), ...);
async move {
let span = tracing::Span::current(); // Same span, safe to .record() on
tracing::info!(status = "start", "query request received");
let alias = get_ledger_alias(...)?;
span.record("ledger_alias", alias.as_str());
execute_query(&state, &alias, &query_json).await
}
.instrument(span)
.await
}
}
Why async move + Span::current() instead of just .instrument(): Route handlers need to record deferred fields (like ledger_alias, error_code) on the span after creation. By obtaining Span::current() inside the instrumented block, you get a handle to the same span that .instrument() entered, letting you call .record() and pass it to set_span_error_code().
spawn_blocking
For tokio::task::spawn_blocking, enter the span inside the closure:
#![allow(unused)]
fn main() {
let span = tracing::debug_span!("heavy_compute");
tokio::task::spawn_blocking(move || {
let _guard = span.enter();
// ... sync work ...
}).await
}
std::thread::scope (parallel OS threads)
std::thread::scope spawned threads do NOT inherit tracing span context from the parent thread. Capture the current span before spawning and enter it inside each closure:
#![allow(unused)]
fn main() {
let parent_span = tracing::Span::current();
std::thread::scope(|s| {
for item in &work_items {
let thread_span = parent_span.clone();
s.spawn(move || {
let _guard = thread_span.enter();
// ... work that creates child spans ...
});
}
});
}
This is safe because scoped threads are pure sync (no .await). The same pattern applies to any OS thread spawning (std::thread::spawn, rayon, etc.).
Lightweight operators (hot path)
For simple operators that just need a span marker, use the terse .entered() pattern:
#![allow(unused)]
fn main() {
fn open(&mut self, ctx: &mut Context<S, C>) -> Result<()> {
let _span = tracing::trace_span!("filter").entered();
self.child.open(ctx)?;
// ...
Ok(())
}
}
Deferred Fields
Declare fields as tracing::field::Empty at span creation, then record values later. This is essential for fields whose values aren’t known until the operation completes.
#![allow(unused)]
fn main() {
let span = tracing::debug_span!(
"plan",
pattern_count = tracing::field::Empty,
);
let _guard = span.enter();
let plan = build_plan(&patterns)?;
span.record("pattern_count", plan.patterns.len() as u64);
}
Gotcha: tracing::Span::current().record(...) records on the current innermost span. If you’ve entered a child span, .record() targets the child, not the parent. Get a handle to the parent span before entering children:
#![allow(unused)]
fn main() {
let parent_span = tracing::debug_span!("outer", total = tracing::field::Empty);
let _parent_guard = parent_span.enter();
{
let _child = tracing::trace_span!("inner").entered();
// Span::current() is now "inner", NOT "outer"
}
// Back to "outer" scope -- safe to record on parent
parent_span.record("total", count as u64);
}
#[tracing::instrument] vs Manual Spans
Use #[tracing::instrument] for simple functions where:
- You want span entry/exit to match the function boundary
- The function name is a good span name
- You don’t need deferred field recording
Always use skip_all and explicitly list fields:
#![allow(unused)]
fn main() {
#[tracing::instrument(level = "debug", name = "parse", skip_all, fields(input_format, input_bytes))]
fn parse_query(input: &[u8], format: &str) -> Result<Query> {
// ...
}
}
Use manual spans when:
- The span covers only part of a function
- You need a different name than the function
- You need deferred fields
- The function is a hot path (the #[instrument] macro captures all arguments by default unless you skip_all)
Where to Add Spans
New query feature
If you add a new phase to query execution (e.g., a new optimization pass):
- Add a debug_span! in the code path
- Add the span name to the hierarchy in docs/operations/telemetry.md
- Add a test in fluree-db-api/tests/it_tracing_spans.rs verifying the span emits
New operator
If you add a new query operator:
- For core structural operators (scan, join, filter, project, sort), use debug_span! in open()
- For detail operators (group_by, distinct, limit, offset, etc.), use trace_span! in open()
- If it’s a blocking/buffering operator (like sort), add a debug_span! timing span in next_batch()
- Add a test verifying the span emits at the correct level
For lower-level remote storage diagnostics on the binary path, prefer short-lived debug_span! blocks around:
- leaf-open strategy selection (binary_open_leaf)
- remote leaf metadata reads (binary_fetch_header_dir)
- individual range reads (binary_range_fetch)
- leaflet cache hit/miss points (binary_load_leaflet)
These spans are intended for investigation of repeated remote I/O and cache effectiveness under query load.
New transaction phase
If you add a new phase to transaction processing:
- Add a debug_span! in the phase code
- Record relevant counts/sizes as deferred fields
- Add the span to the hierarchy in docs/operations/telemetry.md
New background task
If you add a new background task (like indexing, garbage collection, compaction):
- Add a debug_span! as the trace root (these are independent traces, not children of HTTP requests)
- Add debug sub-spans for phases within the task
- Ensure the crate target is listed in the OTEL Targets filter in telemetry.rs
Testing Spans
All new spans should have at least one test verifying they emit with expected fields at the right level.
Test utilities
The test infrastructure lives in fluree-db-api/tests/support/span_capture.rs:
#![allow(unused)]
fn main() {
mod support;
use support::span_capture;
#[tokio::test]
async fn my_new_span_emits_at_debug_level() {
let (store, _guard) = span_capture::init_test_tracing(); // captures ALL levels
// ... run the code that emits the span ...
assert!(store.has_span("my_new_phase"));
let span = store.find_span("my_new_phase").unwrap();
assert_eq!(span.level, tracing::Level::DEBUG);
assert!(span.fields.contains_key("some_field"));
}
#[tokio::test]
async fn my_new_span_not_visible_at_info() {
let (store, _guard) = span_capture::init_info_only_tracing(); // captures only INFO+
// ... run the code ...
assert!(!store.has_span("my_new_phase")); // zero noise at info
}
}
Test helpers available
- span_capture::init_test_tracing() – captures all spans regardless of level (for verifying span existence)
- span_capture::init_info_only_tracing() – captures only INFO+ (for verifying zero-noise at default level)
- SpanStore::has_span(name) – check if a span was emitted
- SpanStore::find_span(name) – get span details (level, fields, parent)
- SpanStore::find_spans(name) – find all spans with a given name
- SpanStore::span_names() – list all captured span names
Where to put tests
- Tracing integration tests go in fluree-db-api/tests/it_tracing_spans.rs
- The test utilities are in fluree-db-api/tests/support/span_capture.rs
OTEL Layer Configuration
If you add a new crate that emits spans that should be exported via OTEL, add it to the Targets filter in fluree-db-server/src/telemetry.rs:
#![allow(unused)]
fn main() {
let otel_filter = Targets::new()
.with_target("fluree_db_server", Level::DEBUG)
.with_target("fluree_db_api", Level::DEBUG)
// ... existing targets ...
.with_target("my_new_crate", Level::DEBUG); // ADD THIS
}
Without this, spans from the new crate will appear in console logs but not in Jaeger/Tempo.
Important: All OTEL targets are set to DEBUG level. Do not set any target to TRACE in the OTEL filter — TRACE-level spans (e.g., binary_cursor_next_leaf, per-scan spans) can generate thousands of spans per query, overwhelming the BatchSpanProcessor queue and causing parent spans to be dropped. Users who need TRACE-level detail should use RUST_LOG for console output; the OTEL exporter intentionally excludes TRACE spans.
Checklist for New Instrumentation
- Used debug_span! (not info_span!) for all new operation spans
- Used span.enter() only in sync code, .instrument(span) for async
- Propagated span context into spawned threads (spawn_blocking, std::thread::scope, etc.)
- Added deferred fields for values computed after span creation
- Tested span emission with SpanCaptureLayer
- Verified zero overhead at INFO level (no debug/trace spans appear without OTEL or RUST_LOG=debug)
- Updated span hierarchy in docs/operations/telemetry.md if adding spans
- Updated .claude/skills/*/references/span-hierarchy.md (both copies)
- Added new crate to OTEL Targets filter if applicable
Common Gotchas
- span.enter() across .await causes cross-request contamination – This is the most dangerous tracing bug. In tokio’s multi-threaded runtime, span.enter() sets the span on the current thread. When the task suspends at .await, the span stays “entered” on that thread. Other tasks polled on the same thread inherit it as their parent. Result: unrelated requests cascade into each other’s traces in Jaeger, with child spans that outlive their parents. Always use .instrument(span) in async code. This was a real bug in the HTTP route handlers and took Jaeger analysis to identify.
- Span::current().record() targets the innermost span – not necessarily the one you intend. Hold a reference to the span you want to record on.
- OTEL exporter floods – if you set RUST_LOG=debug globally, third-party crates (hyper, tonic, h2) emit debug spans that overwhelm the OTEL batch processor. The Targets filter on the OTEL layer prevents this.
- Tower-HTTP TraceLayer removed – tower-http’s TraceLayer was removed entirely because it created a duplicate request span that collided with Fluree’s own request span in create_request_span(). If you re-add tower-http tracing, ensure it does not conflict.
- set_global_default in tests – can only be called once per process. Use set_default() which returns a guard scoped to the test.
- Compiler won’t catch span.enter() across .await – Unlike what the tracing docs suggest, Entered may actually be Send (since &Span is Send when Span: Sync). The code compiles fine but produces incorrect traces at runtime. The only way to detect this is visual inspection in Jaeger. Grep for span.enter() in async functions as part of code review.
- std::thread::scope / std::thread::spawn drops span context – New OS threads start with empty thread-local span context, so any spans created on them become orphaned root traces. You must capture Span::current() and .enter() it inside the thread closure. This same issue applies to tokio::task::spawn_blocking, rayon, and any other thread-spawning API.
Claude Code Trace Analysis Skills
Two Claude Code skills are available for analyzing Jaeger trace exports:
/trace-inspect
Drills into a single trace: span tree visualization, timing breakdown, structural health checks. Use when you have a specific slow request and want to understand where time went.
/trace-inspect path/to/traces.json
/trace-overview
Analyzes all traces in an export: aggregate statistics, anomaly detection across the corpus, comparison of query vs transaction patterns. Use when you want a high-level understanding of system behavior.
/trace-overview path/to/traces.json
Exporting traces from Jaeger
- Open Jaeger UI (default: http://localhost:16686)
- Search for traces of interest (by service name, operation, duration, etc.)
- Click the JSON download button on a trace or search result
- Save to a file and pass to either skill
See the OTEL dev harness for running a local Jaeger instance.
Related Documentation
- Performance Investigation – How operators use deep tracing
- Telemetry and Logging – Configuration reference
- Deep Tracing Playbook – Comprehensive implementation reference
Releasing
Cutting a release is two phases: a release PR that bumps the version, then a tag pushed after the PR merges. The tag triggers cargo-dist, which builds binaries and publishes the GitHub Release. Release notes are auto-generated by GitHub from merged PR titles since the previous tag.
The flow
# 1. Branch off main and bump the workspace version.
git checkout main && git pull
git checkout -b release/v4.0.3
$EDITOR Cargo.toml # update [workspace.package].version
cargo update --workspace # refresh Cargo.lock
git commit -am "release v4.0.3"
git push -u origin release/v4.0.3
gh pr create --title "release v4.0.3"
# 2. After the PR is reviewed and merged to main, tag the merge commit.
git checkout main && git pull
git tag v4.0.3
git push origin v4.0.3 # ← triggers .github/workflows/release.yml
Watch the Actions tab. cargo-dist builds platform binaries, creates the GitHub Release with auto-generated notes, publishes the Homebrew formula, and pushes the multi-arch Docker image.
Why the two-phase split
cargo-dist’s release workflow triggers on any pushed vX.Y.Z tag regardless of which branch the tag points at. Holding the tag step until after merge ensures every release goes through PR review.
How release notes are generated
.github/workflows/release.yml calls gh release create --generate-notes. GitHub builds the release body automatically from PRs merged since the previous tag, categorized per .github/release.yml:
| Label | Section |
|---|---|
| breaking-change, semver:major | Breaking Changes |
| feature, enhancement, feat | Features |
| bug, fix | Bug Fixes |
| performance, perf | Performance |
| documentation, docs | Documentation |
| (anything else) | Other Changes |
Apply one of these labels to each PR before merging. Unlabeled PRs still appear under “Other Changes” with their full title and author credit, so categorization is nice-to-have, not required.
PR titles already follow the feat: / fix: / docs: convention from CLAUDE.md, which makes the unlabeled “Other Changes” list readable on its own.
Rolling back
During Phase 1, before pushing:
git checkout main
git branch -D release/vX.Y.Z
After the release PR is opened but before merge:
Close the PR and delete the branch on GitHub. Nothing has shipped.
After the tag is pushed but cargo-dist still running:
git push origin :refs/tags/vX.Y.Z # delete the remote tag
git tag -d vX.Y.Z # delete the local tag
Cancel the in-progress Release workflow run from the Actions tab.
After cargo-dist created the GitHub Release:
Delete the GitHub Release from the UI, then delete the tag (commands above). The merge commit on main stays in place — re-tag it once the issue is fixed, or supersede it with another release PR.
Configuration files
- .github/release.yml — categorization for GitHub’s auto-generated release notes.
- dist-workspace.toml — cargo-dist’s distribution targets and installers.
- .github/workflows/release.yml — autogenerated by cargo-dist (with allow-dirty = ["ci"] set so our --generate-notes edit survives dist init).