R2RML (Relational to RDF Mapping)

R2RML (RDB to RDF Mapping Language) is a W3C Recommendation for expressing mappings from relational tables to RDF triples. In Fluree, R2RML mappings expose Iceberg tables as RDF graph sources, so you can query data lake tables with SPARQL or JSON-LD Query.

What is R2RML?

R2RML defines how to map:

Database tables to RDF classes
Table columns to RDF properties
Rows to RDF resources
Foreign keys to RDF relationships

In Fluree, this enables querying Iceberg tables as if they were RDF graphs.

Configuration

Create R2RML Graph Source (Iceberg-backed)

Use R2rmlCreateConfig to register a graph source that combines:

an Iceberg table (REST catalog or Direct S3), and
an R2RML mapping (Turtle) that materializes table rows into RDF triples.

If you use Direct S3 mode, Fluree resolves the current Iceberg metadata by reading metadata/version-hint.text under the configured table_location, then loading the metadata file referenced by the hint. The Iceberg table layout must already exist at that location.

use fluree_db_api::{FlureeBuilder, R2rmlCreateConfig};

// Run this inside an async function whose return type supports `?`.
let fluree = FlureeBuilder::default().build().await?;

// Direct S3 mode: graph source name, Iceberg table location, mapping location.
let config = R2rmlCreateConfig::new_direct(
    "airlines-rdf",
    "s3://bucket/warehouse/openflights/airlines",
    "fluree:file://mappings/airlines.ttl",
)
.with_s3_region("us-east-1")
.with_s3_path_style(true)
.with_mapping_media_type("text/turtle");

// Register the Iceberg-backed R2RML graph source.
fluree.create_r2rml_graph_source(config).await?;
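
For Direct S3 mode, the metadata lookup described above amounts to two path computations. The sketch below is illustrative only and is not Fluree's implementation. The function names are hypothetical, and it assumes the common Iceberg file-system convention in which version-hint.text holds a bare version number N and the current metadata file is metadata/vN.metadata.json (compressed metadata file names are not handled).

```rust
/// Path of the version hint under a table location (trailing '/' tolerated).
fn version_hint_path(table_location: &str) -> String {
    format!(
        "{}/metadata/version-hint.text",
        table_location.trim_end_matches('/')
    )
}

/// Assumption: the hint contains a bare version number, and the metadata file
/// it points to is named `v<N>.metadata.json`.
fn metadata_path_from_hint(table_location: &str, hint_contents: &str) -> Option<String> {
    let version: u64 = hint_contents.trim().parse().ok()?;
    Some(format!(
        "{}/metadata/v{}.metadata.json",
        table_location.trim_end_matches('/'),
        version
    ))
}
```

If the hint file is missing or does not parse, there is no metadata file to load, which is why the table layout must already exist at table_location.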

R2RML Mapping

Basic Mapping

Map a table to an RDF class:

@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.org/ns/> .
@prefix schema: <http://schema.org/> .

<#CustomerMapping>
  a rr:TriplesMap ;
  
  rr:logicalTable [
    rr:tableName "customers"
  ] ;
  
  rr:subjectMap [
    rr:template "http://example.org/customer/{id}" ;
    rr:class schema:Person
  ] ;
  
  rr:predicateObjectMap [
    rr:predicate schema:name ;
    rr:objectMap [ rr:column "name" ]
  ] ;
  
  rr:predicateObjectMap [
    rr:predicate schema:email ;
    rr:objectMap [ rr:column "email" ]
  ] ;
  
  rr:predicateObjectMap [
    rr:predicate ex:customerId ;
    rr:objectMap [ rr:column "id" ]
  ] .

This maps the customers table:

CREATE TABLE customers (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255),
  email VARCHAR(255)
);

To RDF triples:

<http://example.org/customer/1>
  a schema:Person ;
  schema:name "Alice" ;
  schema:email "alice@example.org" ;
  ex:customerId "1" .
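
To make the mapping concrete, the following sketch mimics by hand what the subject template and column object maps do for a single row. It is not Fluree's mapping engine: expand_template and customer_triples are hypothetical helpers, values are substituted without percent-encoding, and every literal is printed as a plain string.

```rust
use std::collections::BTreeMap;

/// Replaces each `{column}` placeholder with the row's value for that column.
/// Returns None if a placeholder is unterminated or the column is missing.
fn expand_template(template: &str, row: &BTreeMap<&str, &str>) -> Option<String> {
    let mut out = String::new();
    let mut rest = template;
    while let Some(open) = rest.find('{') {
        let close = open + rest[open..].find('}')?;
        out.push_str(&rest[..open]);
        out.push_str(row.get(&rest[open + 1..close])?);
        rest = &rest[close + 1..];
    }
    out.push_str(rest);
    Some(out)
}

/// Emits Turtle-like lines for one `customers` row, following <#CustomerMapping>.
fn customer_triples(row: &BTreeMap<&str, &str>) -> Option<Vec<String>> {
    let subject = expand_template("http://example.org/customer/{id}", row)?;
    Some(vec![
        format!("<{}> a schema:Person .", subject),
        format!("<{}> schema:name \"{}\" .", subject, row.get("name")?),
        format!("<{}> schema:email \"{}\" .", subject, row.get("email")?),
        format!("<{}> ex:customerId \"{}\" .", subject, row.get("id")?),
    ])
}
```

Each rr:predicateObjectMap contributes one line per row, and the rr:class on the subject map contributes the type statement.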

Foreign Key Mapping

Map relationships:

<#OrderMapping>
  a rr:TriplesMap ;
  
  rr:logicalTable [
    rr:tableName "orders"
  ] ;
  
  rr:subjectMap [
    rr:template "http://example.org/order/{id}" ;
    rr:class ex:Order
  ] ;
  
  rr:predicateObjectMap [
    rr:predicate ex:orderId ;
    rr:objectMap [ rr:column "id" ]
  ] ;
  
  rr:predicateObjectMap [
    rr:predicate ex:customer ;
    rr:objectMap [
      rr:parentTriplesMap <#CustomerMapping> ;
      rr:joinCondition [
        rr:child "customer_id" ;
        rr:parent "id"
      ]
    ]
  ] ;
  
  rr:predicateObjectMap [
    rr:predicate ex:total ;
    rr:objectMap [ rr:column "total" ]
  ] .

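This maps the customer_id foreign key to an object property (ex:customer) whose value is the IRI of the matching customer resource, as generated by the subject map of <#CustomerMapping>.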
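
The effect of rr:joinCondition can be pictured as a plain equality join between child and parent rows. The sketch below is a hand-rolled illustration with hypothetical Customer and Order structs, not how Fluree executes the join. It pairs each order IRI with the IRI of every customer whose id equals the order's customer_id.

```rust
struct Customer {
    id: i64,
}

struct Order {
    id: i64,
    customer_id: i64,
}

/// Returns (order IRI, customer IRI) pairs, one per ex:customer triple.
fn customer_links(orders: &[Order], customers: &[Customer]) -> Vec<(String, String)> {
    let mut links = Vec::new();
    for order in orders {
        // Join condition: child "customer_id" equals parent "id".
        for customer in customers.iter().filter(|c| c.id == order.customer_id) {
            links.push((
                format!("http://example.org/order/{}", order.id),
                format!("http://example.org/customer/{}", customer.id),
            ));
        }
    }
    links
}
```

An order whose customer_id matches no parent row yields no ex:customer triple at all.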

Complex Queries

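Use an R2RML view (rr:sqlQuery) as the logical table for complex mappings. This example uses xsd: datatypes, so declare that prefix alongside the ones shown earlier:

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<#SalesReportMapping>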
  a rr:TriplesMap ;
  
  rr:logicalTable [
    rr:sqlQuery """
      SELECT
        c.id as customer_id,
        c.name as customer_name,
        SUM(o.total) as total_spent,
        COUNT(o.id) as order_count
      FROM customers c
      JOIN orders o ON o.customer_id = c.id
      WHERE o.order_date >= '2024-01-01'
      GROUP BY c.id, c.name
    """
  ] ;
  
  rr:subjectMap [
    rr:template "http://example.org/customer/{customer_id}" ;
    rr:class ex:Customer
  ] ;
  
  rr:predicateObjectMap [
    rr:predicate schema:name ;
    rr:objectMap [ rr:column "customer_name" ]
  ] ;
  
  rr:predicateObjectMap [
    rr:predicate ex:totalSpent ;
    rr:objectMap [ rr:column "total_spent" ; rr:datatype xsd:decimal ]
  ] ;
  
  rr:predicateObjectMap [
    rr:predicate ex:orderCount ;
    rr:objectMap [ rr:column "order_count" ; rr:datatype xsd:integer ]
  ] .

Querying R2RML Graph Sources

R2RML graph sources are queried using standard SPARQL and JSON-LD query syntax — no special query language is needed. In the Rust API, graph source resolution is wired into the lazy query builders:

fluree.graph("my-gs:main").query() for a single target that may be either a native ledger or a mapped graph source
fluree.query_from() when the query body specifies the dataset ("from" / FROM) or combines multiple sources

The raw materialized snapshot path (fluree.db(&alias) → fluree.query(&view, ...)) is still the wrong abstraction for graph source aliases because it assumes a native ledger snapshot has already been loaded.

Graph sources can be:

Queried directly as the target: fluree query my-gs 'SELECT * WHERE { ?s ?p ?o }'
Referenced in FROM clauses: SELECT * FROM <my-gs:main> WHERE { ... }
Referenced in GRAPH patterns: SELECT * WHERE { GRAPH <my-gs:main> { ... } } (useful for joining with ledger data)

Basic Query

{
  "@context": {
    "schema": "http://schema.org/",
    "ex": "http://example.org/ns/"
  },
  "from": "warehouse-customers:main",
  "select": ["?name", "?email"],
  "where": [
    { "@id": "?customer", "@type": "schema:Person" },
    { "@id": "?customer", "schema:name": "?name" },
    { "@id": "?customer", "schema:email": "?email" }
  ]
}

The mapping controls how subjects and predicate/object values are produced from the scanned table columns.

SPARQL Query

PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/ns/>

SELECT ?name ?email
FROM <warehouse-customers:main>
WHERE {
  ?customer a schema:Person .
  ?customer schema:name ?name .
  ?customer schema:email ?email .
}

Filters

{
  "from": "warehouse-customers:main",
  "select": ["?name", "?email"],
  "where": [
    { "@id": "?customer", "schema:name": "?name" },
    { "@id": "?customer", "schema:email": "?email" },
    { "@id": "?customer", "ex:status": "?status" }
  ],
  "filter": "?status == 'active'"
}

Joins

{
  "from": "warehouse-orders:main",
  "select": ["?customerName", "?orderTotal"],
  "where": [
    { "@id": "?customer", "schema:name": "?customerName" },
    { "@id": "?order", "ex:customer": "?customer" },
    { "@id": "?order", "ex:total": "?orderTotal" }
  ]
}

Combining with Fluree Data

Join Iceberg data with Fluree ledgers:

{
  "from": ["products:main", "warehouse-inventory:main"],
  "select": ["?productName", "?stockLevel"],
  "where": [
    { "@id": "?product", "schema:name": "?productName" },
    { "@id": "?product", "ex:sku": "?sku" },
    { "@id": "?inventory", "ex:sku": "?sku" },
    { "@id": "?inventory", "ex:stockLevel": "?stockLevel" }
  ]
}

This combines product data from a Fluree ledger with inventory from an Iceberg-backed R2RML graph source, joined on the shared ?sku value.

Performance

R2RML graph sources execute by scanning the underlying Iceberg table and materializing RDF terms according to the mapping.

Best Practices

Filter Early: Filters are pushed down to Iceberg for partition pruning.
```
{
  "where": [...],
  "filter": "?date >= '2024-01-01'"
}
```
Limit Results:
```
{
  "where": [...],
  "limit": 100
}
```
Project Only Needed Columns: Only columns referenced in the query and mapping are read from Parquet files.
Partition by Common Filters: Partition your Iceberg tables by columns frequently used in filters (e.g., date).
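
To illustrate why filtering on a partition column helps, the sketch below prunes data files using per-file date bounds. It is a simplified model under stated assumptions: DataFile and prune_by_lower_bound are hypothetical names, dates are ISO-8601 strings (which compare correctly as text), and Iceberg itself derives such bounds from table metadata rather than from a hand-written struct.

```rust
struct DataFile {
    path: &'static str,
    min_date: &'static str,
    max_date: &'static str,
}

/// Keeps only files that can contain rows with `date >= lower`.
/// A file whose max_date is below the bound is skipped without being read.
fn prune_by_lower_bound<'a>(files: &'a [DataFile], lower: &str) -> Vec<&'a DataFile> {
    files.iter().filter(|f| f.max_date >= lower).collect()
}
```

A filter such as ?date >= '2024-01-01' only pays off this way when the table is partitioned (or at least clustered) on the filtered column. Otherwise every file's range tends to overlap the bound and nothing is skipped.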

Bound-subject key pushdown

A query that names a specific subject IRI — e.g. <http://example.org/store/5> ex:name ?n — can be answered without scanning the whole table. Fluree reverses the subject template (rr:subjectMap rr:template "http://example.org/store/{store_key}") to recover the key value (store_key = 5) and pushes it to the Iceberg scan as an equality predicate, so the reader can prune to the matching rows. The mapping’s subject equality is always re-checked in-engine, so this only ever affects which rows the scan returns, never the result.

This is on by default. It shares the Iceberg predicate-pushdown kill-switch, FLUREE_ICEBERG_PREDICATE_PUSHDOWN=0 (see Configuration); with pushdown off, bound-subject queries are still answered correctly — just by scanning and filtering in-engine instead of pruning.
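
A minimal sketch of the idea, assuming a template with a single trailing placeholder and key values that need no percent-encoding. The names recover_trailing_key, Store, and lookup are hypothetical and do not reflect Fluree internals. The recovered key narrows the scan, and the subject equality is still re-checked on every returned row.

```rust
/// Recovers the raw key from `iri` for a template ending in one placeholder,
/// e.g. "http://example.org/store/{store_key}".
fn recover_trailing_key<'a>(template: &str, iri: &'a str) -> Option<&'a str> {
    let open = template.find('{')?;
    let name = template[open + 1..].strip_suffix('}')?;
    if name.is_empty() || name.contains('{') || name.contains('}') {
        return None; // not a single trailing placeholder
    }
    let key = iri.strip_prefix(&template[..open])?;
    if key.is_empty() { None } else { Some(key) }
}

struct Store {
    store_key: &'static str,
    name: &'static str,
}

fn lookup<'a>(rows: &'a [Store], template: &str, iri: &str) -> Vec<&'a Store> {
    let prefix = &template[..template.find('{').unwrap_or(template.len())];
    let pushed = recover_trailing_key(template, iri);
    rows.iter()
        // Scan-side predicate: applied only when a key could be recovered.
        .filter(|r| pushed.map_or(true, |k| r.store_key == k))
        // Engine-side re-check: subject equality is always enforced.
        .filter(|r| format!("{}{}", prefix, r.store_key) == iri)
        .collect()
}
```

Because the second filter runs regardless, dropping the first one (pushdown off) changes how many rows are examined but not which rows are returned.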

The percent-encoding contract. R2RML builds subject IRIs by substituting column values into the template and percent-encoding each value per RFC 3987 (letters, digits, and - . _ ~ ! $ & ' ( ) * + , ; = : @ are left literal; everything else — including /, #, ?, space, and non-ASCII — becomes %XX). Reversal is the exact inverse. For the pushdown to engage, a query’s subject IRI must use this same encoding. For example, a store_key of west/5 is stored in the IRI as http://example.org/store/west%2F5 — query it with the %2F, not a literal /.

If a query IRI is encoded differently, it simply matches no generated subject — and it does so identically whether pushdown is on or off (both return no rows). A mis-encoded IRI is therefore a query-authoring issue, never a case where pushdown and a full scan disagree.
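
The encoding rule above can be sketched as a pair of helper functions. This is an illustration of the stated contract, not Fluree's code: encode_value and decode_value are hypothetical names, and the set of literal characters is copied from the list in this section.

```rust
/// Characters left literal, per the contract described in this section.
fn is_left_literal(byte: u8) -> bool {
    byte.is_ascii_alphanumeric() || b"-._~!$&'()*+,;=:@".contains(&byte)
}

/// Percent-encodes every UTF-8 byte that is not left literal (uppercase hex).
fn encode_value(value: &str) -> String {
    let mut out = String::new();
    for byte in value.bytes() {
        if is_left_literal(byte) {
            out.push(byte as char);
        } else {
            out.push_str(&format!("%{:02X}", byte));
        }
    }
    out
}

/// Inverse of encode_value. Returns None on a malformed escape or invalid UTF-8.
fn decode_value(encoded: &str) -> Option<String> {
    let bytes = encoded.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            let hex = encoded.get(i + 1..i + 3)?;
            if !hex.bytes().all(|c| c.is_ascii_hexdigit()) {
                return None;
            }
            out.push(u8::from_str_radix(hex, 16).ok()?);
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    String::from_utf8(out).ok()
}
```

When building subject IRIs for queries in application code, running key values through a function like encode_value before substituting them into the template keeps the query IRI aligned with the generated subjects.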

What is pushed. Pushdown engages only for templates that reverse unambiguously and key columns whose physical type is supported:

Template shape: a single trailing placeholder (.../{key}) always qualifies; multi-placeholder templates (.../{a}/{b}) qualify only when the separators between placeholders are characters that are always percent-encoded (like /). Ambiguous shapes such as .../{a};{b} (; is left literal) are skipped and fall back to a full scan.
Key column type: string keys, and integer-valued keys on int, long, or decimal columns. A decimal column of any scale qualifies — the recovered key is pushed as an integer literal (the Arrow reader casts it to the column’s decimal type), and a key that is not integer-valued fails the parse and falls back to a full scan. Other physical types (dates, floats) are not pushed yet. The pushdown never affects correctness regardless: the operator always re-enforces the subject equality.
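
One way to read the template-shape rule is as a check on the literal text between consecutive placeholders. The sketch below encodes that reading and is only an approximation of the rule described above: separators, template_qualifies, and parse_integer_key are hypothetical names, a separator counts as safe only if it is non-empty and consists entirely of characters that are always percent-encoded, and the check does not verify that a single placeholder is trailing.

```rust
/// Characters left literal by the encoding contract described above.
fn is_left_literal(c: char) -> bool {
    c.is_ascii_alphanumeric() || "-._~!$&'()*+,;=:@".contains(c)
}

/// Literal text found between consecutive placeholders in a template.
fn separators(template: &str) -> Vec<&str> {
    let mut seps = Vec::new();
    let mut rest = template;
    let mut seen_placeholder = false;
    while let Some(open) = rest.find('{') {
        if seen_placeholder {
            seps.push(&rest[..open]);
        }
        match rest[open..].find('}') {
            Some(close) => rest = &rest[open + close + 1..],
            None => break,
        }
        seen_placeholder = true;
    }
    seps
}

/// True when no separator could also appear inside an encoded value.
fn template_qualifies(template: &str) -> bool {
    separators(template)
        .iter()
        .all(|sep| !sep.is_empty() && sep.chars().all(|c| !is_left_literal(c)))
}

/// Integer-valued keys parse; anything else means falling back to a full scan.
fn parse_integer_key(raw: &str) -> Option<i64> {
    raw.parse().ok()
}
```

The ambiguity is easy to see with {a};{b}: since ; is left literal, the IRI segment x;y;z could split as (x, y;z) or (x;y, z), so no single key pair can be recovered.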

Use Cases

Data Lake Analytics

Query Iceberg tables containing large-scale analytical data alongside Fluree ledgers:

{
  "from": ["products:main", "warehouse-sales:main"],
  "select": ["?productName", "?totalSold"],
  "where": [
    { "@id": "?product", "schema:name": "?productName" },
    { "@id": "?product", "ex:productId": "?pid" },
    { "@id": "?sale", "ex:productId": "?pid" },
    { "@id": "?sale", "ex:quantity": "?totalSold" }
  ]
}

Multi-Table Mapping

A single R2RML mapping file can define multiple TriplesMap entries, each targeting a different Iceberg table or logical view. This enables querying across related tables through a single graph source.

Limitations

Read-Only: R2RML graph sources are read-only (no writes via Fluree)
Performance: Complex joins across Fluree + Iceberg may be slow
Schema Changes: Requires mapping updates when referenced columns change

Troubleshooting

Connection Errors

{
  "error": "IcebergConnectionError",
  "message": "Cannot load table metadata"
}

Solutions:

Check catalog configuration (REST vs Direct)
Verify AWS credentials and S3 access
Verify version-hint.text is present for Direct mode

Mapping Errors

{
  "error": "R2RMLMappingError",
  "message": "Invalid R2RML mapping: table 'customers' not found"
}

Solutions:

Verify table name / location
Check referenced column names in the mapping
Validate R2RML syntax (Turtle)

Slow Queries

Causes:

Large result sets (many Parquet files scanned)
No partition pruning
Complex joins across Fluree + Iceberg

Solutions:

Add date/partition filters to enable Iceberg partition pruning
Use LIMIT clause
Optimize R2RML mapping to project only needed columns
Partition Iceberg tables by common filter columns

Related Documentation

Graph Sources Overview - Graph source concepts
Iceberg - Data lake integration
Query Datasets - Multi-graph queries
