Indexing and Search

Fluree provides powerful indexing and search capabilities beyond standard graph queries. This section covers background indexing, full-text search, and vector similarity search.

Index Types

Background Indexing

Core database indexing for query performance:

SPOT, POST, OPST, PSOT indexes
Automatic index maintenance
Indexing configuration
Performance tuning
Monitoring and metrics

Reindex API

Manual index rebuilding for recovery and maintenance:

Memory-bounded batched processing
Checkpointing for resumable operations
Progress monitoring with callbacks
Resume after interruption
Index configuration options

Inline Fulltext Search

Inline BM25-ranked text scoring. Two entry points, same query surface:

@fulltext datatype — per-value annotation (analogous to @vector), always English, zero config
f:fullTextDefaults config — declare properties + language once at the ledger level; supports 18 languages with Snowball stemming and per-graph overrides for multilingual setups
fulltext(?var, "query") scoring function in bind expressions (same for both paths)
Automatic per-(graph, property, language) fulltext arena construction during background indexing
Unified scoring across indexed and novelty documents
Works immediately (no-index fallback) with optimal performance after indexing

BM25 Full-Text Search

Dedicated full-text search indexes using BM25 ranking (for large-scale corpora):

Creating BM25 indexes via Rust API
Query-based field selection (indexing query defines what to index)
BM25 scoring with configurable k1/b parameters
Block-Max WAND for efficient top-k queries
Incremental index updates via property-dependency tracking

Vector Search

Approximate nearest neighbor (ANN) search for embeddings:

Vector index configuration
Embedded HNSW indexes (in-process) or remote via dedicated search service
Embedding storage with @vector datatype (resolves to https://ns.flur.ee/db#embeddingVector)
Similarity queries via f:* syntax
Deployment modes (embedded / remote)
Use cases (semantic search, recommendations)

Geospatial

Geographic point data with native binary encoding:

geo:wktLiteral datatype support (OGC GeoSPARQL)
Automatic POINT geometry detection and optimization
Packed 60-bit lat/lng encoding (~0.3mm precision)
Foundation for proximity queries (latitude-band index scans)

Indexing Architecture

Fluree maintains multiple index types for different query patterns:

Core Indexes (automatic):

SPOT: Subject-Predicate-Object-Time
POST: Predicate-Object-Subject-Time
OPST: Object-Predicate-Subject-Time
PSOT: Predicate-Subject-Object-Time

Graph Source Indexes (explicit):

BM25: Full-text search indexes
Vector: Embedding similarity indexes
R2RML: Relational database views
Iceberg: Data lake integrations

Background Indexing

Core database indexing happens automatically:

Transaction → Commit → Background Indexer → Index Published

Process:

Transaction committed (t assigned)
Commit published to nameservice
Background indexer detects new commit
Indexes updated (SPOT, POST, OPST, PSOT)
Index snapshot published

Novelty Layer:

Gap between latest commit and latest index
Queries combine indexed data + novelty
Monitored via commit_t - index_t

See Background Indexing for details.

Inline Fulltext Search

For small-to-medium corpora (up to hundreds of thousands of documents per predicate), inline fulltext search provides BM25-ranked scoring with zero configuration:

Annotate data:

{
  "@id": "ex:article-1",
  "ex:content": {
    "@value": "Rust is a systems programming language focused on safety",
    "@type": "@fulltext"
  }
}

Query with scoring:

{
  "select": ["?title", "?score"],
  "where": [
    { "@id": "?doc", "ex:content": "?content", "ex:title": "?title" },
    ["bind", "?score", "(fulltext ?content \"Rust programming\")"],
    ["filter", "(> ?score 0)"]
  ],
  "orderBy": [["desc", "?score"]],
  "limit": 10
}

See Inline Fulltext Search for details.

Full-Text Search (BM25 Graph Source)

For larger corpora (1M+ documents) with strict latency requirements, the BM25 graph source pipeline provides WAND-based top-k pruning, chunked posting lists, and incremental updates:

BM25 provides ranked full-text search:

Creating Index (Rust API):

#![allow(unused)]
fn main() {
use fluree_db_api::Bm25CreateConfig;
use serde_json::json;

let query = json!({
    "@context": { "schema": "http://schema.org/" },
    "where": [{ "@id": "?x", "@type": "schema:Product", "schema:name": "?name" }],
    "select": { "?x": ["@id", "schema:name", "schema:description"] }
});
let config = Bm25CreateConfig::new("products-search", "mydb:main", query);
let result = fluree.create_full_text_index(config).await?;
}

There are no HTTP endpoints for index management yet — indexes are managed via the Rust API.

Searching:

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "from": "mydb:main",
  "select": ["?product", "?score"],
  "where": [
    {
      "f:graphSource": "products-search:main",
      "f:searchText": "laptop computer",
      "f:searchLimit": 20,
      "f:searchResult": { "f:resultId": "?product", "f:resultScore": "?score" }
    }
  ],
  "orderBy": ["-?score"]
}

See BM25 for details.

Vector Search

Similarity search using vector embeddings via HNSW indexes (embedded or remote).

Important: Embeddings must be stored with the vector datatype (@type: "@vector", @type: "f:embeddingVector", or full IRI https://ns.flur.ee/db#embeddingVector) to preserve array structure.

Creating Index (Rust API):

#![allow(unused)]
fn main() {
let config = VectorCreateConfig::new(
    "products-vector", "mydb:main", query, "ex:embedding", 384
);
fluree.create_vector_index(config).await?;
}

Searching:

{
  "from": "mydb:main",
  "select": ["?product", "?score"],
  "where": [
    {
      "f:graphSource": "products-vector:main",
      "f:queryVector": [0.1, 0.2, ..., 0.9],
      "f:searchLimit": 10,
      "f:searchResult": {
        "f:resultId": "?product",
        "f:resultScore": "?score"
      }
    }
  ]
}

See Vector Search for details.

Index as Graph Sources

Search indexes are exposed as graph sources:

Graph Source Names:

products-search:main - BM25 index
products-vector:main - Vector index

Query Like Regular Ledgers:

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "from": "mydb:main",
  "select": ["?product", "?name", "?score"],
  "where": [
    {
      "f:graphSource": "products-search:main",
      "f:searchText": "laptop",
      "f:searchLimit": 20,
      "f:searchResult": { "f:resultId": "?product", "f:resultScore": "?score" }
    },
    { "@id": "?product", "schema:name": "?name" }
  ]
}

Combines structured data with search results via the f:graphSource pattern.

#![allow(unused)]
fn main() {
// Incremental sync (detects changes since last watermark)
let result = fluree.sync_bm25_index("products-search:main").await?;

// Or use the Bm25MaintenanceWorker for automatic background syncing
}

The Bm25MaintenanceWorker can be configured to watch for ledger commits and sync automatically.

Deleting Indexes

#![allow(unused)]
fn main() {
let result = fluree.drop_full_text_index("products-search:main").await?;
}

Performance Characteristics

Inline Fulltext Search

Indexed throughput: ~625,000 docs/sec (50K paragraph-length docs in 80ms)
Novelty throughput: ~85,000 docs/sec (50K docs in ~600ms, no index required)
Indexed speedup: 7-7.5x faster than novelty-only
Scaling: Near-linear; ~625K docs within a 1-second query budget
Arena build: Adds minimal overhead to the normal binary index build

BM25 Search

Index Build Time: O(n) for n documents
Top-k Query Time: Sub-linear via Block-Max WAND — skips posting list segments that cannot contribute to the top-k, with early termination. Falls back to O(total matching postings) when k approaches corpus size.
Space: ~2-3x document size
Updates: Incremental via property-dependency tracking, O(changed docs)

Vector Search

Flat scan (inline functions): O(n) brute-force, viable up to ~100K vectors with binary indexing; binary index provides ~6x speedup over novelty-only scans and ~25x for filtered queries
HNSW index: O(log n) approximate nearest neighbor, recommended for 100K+ vectors or strict latency requirements
Space: ~1.5x embedding size
Updates: Incremental, O(1) per vector
See Vector Search – Performance and Scaling for benchmark data and guidance on when to adopt HNSW

Combined Queries

Combine search with graph queries:

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "from": "mydb:main",
  "select": ["?product", "?category"],
  "where": [
    {
      "f:graphSource": "products-search:main",
      "f:searchText": "laptop",
      "f:searchLimit": 20,
      "f:searchResult": { "f:resultId": "?product" }
    },
    { "@id": "?product", "schema:category": "?category" }
  ]
}

Query optimizer handles joins between the search graph source and structured data efficiently.

Use Cases

Full-Text Search

E-commerce Product Search:

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "from": "products:main",
  "select": ["?product", "?score"],
  "where": [
    {
      "f:graphSource": "products-search:main",
      "f:searchText": "wireless headphones",
      "f:searchLimit": 20,
      "f:searchResult": { "f:resultId": "?product", "f:resultScore": "?score" }
    }
  ],
  "orderBy": ["-?score"]
}

Document Management:

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "from": "documents:main",
  "where": [
    {
      "f:graphSource": "documents-search:main",
      "f:searchText": "quarterly report 2024",
      "f:searchLimit": 20,
      "f:searchResult": { "f:resultId": "?doc" }
    },
    { "@id": "?doc", "ex:department": "finance" }
  ]
}

Vector Similarity

Semantic Search:

{
  "from": "articles:main",
  "values": [
    ["?queryVec"],
    [{"@value": [0.1, 0.2, 0.3], "@type": "https://ns.flur.ee/db#embeddingVector"}]
  ],
  "where": [
    {
      "f:graphSource": "articles-vector:main",
      "f:queryVector": "?queryVec",
      "f:searchLimit": 10,
      "f:searchResult": {
        "f:resultId": "?article",
        "f:resultScore": "?vecScore"
      }
    }
  ],
  "select": ["?article", "?vecScore"],
  "orderBy": [["desc", "?vecScore"]]
}

Recommendation Engine:

{
  "from": "products:main",
  "where": [
    {
      "@id": "ex:product-123",
      "ex:embedding": "?queryVec"
    },
    {
      "f:graphSource": "products-vector:main",
      "f:queryVector": "?queryVec",
      "f:searchLimit": 5,
      "f:searchResult": { "f:resultId": "?similar", "f:resultScore": "?vecScore" }
    }
  ],
  "select": ["?similar", "?vecScore"],
  "orderBy": [["desc", "?vecScore"]]
}

Hybrid Search

Combine text and vector search:

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "from": "products:main",
  "values": [
    ["?queryVec"],
    [{"@value": [0.1, 0.2, 0.3], "@type": "https://ns.flur.ee/db#embeddingVector"}]
  ],
  "where": [
    {
      "f:graphSource": "products-search:main",
      "f:searchText": "laptop",
      "f:searchLimit": 100,
      "f:searchResult": { "f:resultId": "?product", "f:resultScore": "?textScore" }
    },
    {
      "f:graphSource": "products-vector:main",
      "f:queryVector": "?queryVec",
      "f:searchLimit": 100,
      "f:searchResult": { "f:resultId": "?product", "f:resultScore": "?vecScore" }
    }
  ],
  "bind": {
    "?finalScore": "(?textScore * 0.6) + (?vecScore * 0.4)"
  },
  "orderBy": ["-?finalScore"]
}

Monitoring

Check BM25 Staleness

Check whether a BM25 index is behind its source ledger:

#![allow(unused)]
fn main() {
let check = fluree.check_bm25_staleness("products-search:main").await?;
println!("Index at t={}, ledger at t={}, stale: {}, lag: {}",
    check.index_t, check.ledger_t, check.is_stale, check.lag);
}

Background Maintenance

The Bm25MaintenanceWorker watches for source ledger commits and syncs indexes automatically:

Debounces rapid commits (configurable interval)
Bounded concurrency for concurrent sync operations
Registers/unregisters graph sources dynamically

Best Practices

1. Choose Appropriate Index Type

Structured queries: Use core graph indexes
Keyword search (< 500K docs): Use inline @fulltext for zero-config BM25 scoring
Keyword search (1M+ docs): Use the BM25 graph source for WAND-optimized top-k retrieval
Semantic similarity: Use vector search
Hybrid: Combine multiple indexes

2. Tune BM25 Parameters

Adjust k1 and b for your corpus:

#![allow(unused)]
fn main() {
let config = Bm25CreateConfig::new("search", "docs:main", query)
    .with_k1(1.5)  // Higher = more weight to term frequency (default: 1.2)
    .with_b(0.5);   // Lower = less document length normalization (default: 0.75)
}

The indexing query controls which properties are indexed — all selected text properties contribute to the document’s searchable content.

3. Monitor Index Staleness

Check staleness after bulk operations:

#![allow(unused)]
fn main() {
let check = fluree.check_bm25_staleness("search:main").await?;
if check.is_stale {
    fluree.sync_bm25_index("search:main").await?;
}
}

4. Sync After Bulk Updates

BM25 indexes require explicit sync. After bulk inserts, sync once at the end:

#![allow(unused)]
fn main() {
// Insert many documents...
for batch in batches {
    fluree.insert(ledger.clone(), &batch).await?;
}
// Sync the BM25 index once after all inserts
fluree.sync_bm25_index("products-search:main").await?;
}

5. Use Appropriate Limits

Limit results for performance:

{
  "@context": {"f": "https://ns.flur.ee/db#"},
  "from": "docs:main",
  "where": [
    {
      "f:graphSource": "docs-search:main",
      "f:searchText": "search query",
      "f:searchLimit": 100,
      "f:searchResult": { "f:resultId": "?doc" }
    }
  ]
}

Background Indexing - Core index details
Inline Fulltext Search - @fulltext datatype and fulltext() scoring
BM25 - Dedicated full-text search graph source
Vector Search - Similarity search
Graph Sources - Graph source concepts
Query - Query syntax

Fluree DB

Indexing and Search

Index Types

Background Indexing

Reindex API

Inline Fulltext Search

BM25 Full-Text Search

Vector Search

Geospatial

Indexing Architecture

Background Indexing

Inline Fulltext Search

Full-Text Search (BM25 Graph Source)

Vector Search

Index as Graph Sources

Index Management

Creating Indexes

Updating Indexes

Deleting Indexes

Performance Characteristics

Inline Fulltext Search

BM25 Search

Vector Search

Combined Queries

Use Cases

Full-Text Search

Vector Similarity

Hybrid Search

Monitoring

Check BM25 Staleness

Background Maintenance

Best Practices

1. Choose Appropriate Index Type

2. Tune BM25 Parameters

3. Monitor Index Staleness

4. Sync After Bulk Updates

5. Use Appropriate Limits

Keyboard shortcuts

Fluree DB