R2RML (Relational to RDF Mapping)
R2RML (RDB to RDF Mapping Language) is a W3C standard for mapping tabular data into RDF triples. In Fluree, R2RML mappings are used to expose Iceberg tables as RDF graph sources, enabling you to query data lake tables using SPARQL or JSON-LD Query.
What is R2RML?
R2RML defines how to map:
- Database tables to RDF classes
- Table columns to RDF properties
- Rows to RDF resources
- Foreign keys to RDF relationships
In Fluree, this enables querying Iceberg tables as if they were RDF graphs.
Configuration
Create R2RML Graph Source (Iceberg-backed)
Use R2rmlCreateConfig to register a graph source that combines:
- an Iceberg table (REST catalog or Direct S3), and
- an R2RML mapping (Turtle) that materializes table rows into RDF triples.
If you use Direct S3 mode, Fluree resolves the current Iceberg metadata by reading metadata/version-hint.text under the configured table_location, then loading the metadata file referenced by the hint. The Iceberg table layout must already exist at that location.
#![allow(unused)]
fn main() {
use fluree_db_api::{FlureeBuilder, R2rmlCreateConfig};
let fluree = FlureeBuilder::default().build().await?;
let config = R2rmlCreateConfig::new_direct(
"airlines-rdf",
"s3://bucket/warehouse/openflights/airlines",
"fluree:file://mappings/airlines.ttl",
)
.with_s3_region("us-east-1")
.with_s3_path_style(true)
.with_mapping_media_type("text/turtle");
fluree.create_r2rml_graph_source(config).await?;
}
R2RML Mapping
Basic Mapping
Map a table to RDF class:
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.org/ns/> .
@prefix schema: <http://schema.org/> .
<#CustomerMapping>
a rr:TriplesMap ;
rr:logicalTable [
rr:tableName "customers"
] ;
rr:subjectMap [
rr:template "http://example.org/customer/{id}" ;
rr:class schema:Person
] ;
rr:predicateObjectMap [
rr:predicate schema:name ;
rr:objectMap [ rr:column "name" ]
] ;
rr:predicateObjectMap [
rr:predicate schema:email ;
rr:objectMap [ rr:column "email" ]
] ;
rr:predicateObjectMap [
rr:predicate ex:customerId ;
rr:objectMap [ rr:column "id" ]
] .
This maps the customers table:
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name VARCHAR(255),
email VARCHAR(255)
);
To RDF triples:
<http://example.org/customer/1>
a schema:Person ;
schema:name "Alice" ;
schema:email "alice@example.org" ;
ex:customerId "1" .
Foreign Key Mapping
Map relationships:
<#OrderMapping>
a rr:TriplesMap ;
rr:logicalTable [
rr:tableName "orders"
] ;
rr:subjectMap [
rr:template "http://example.org/order/{id}" ;
rr:class ex:Order
] ;
rr:predicateObjectMap [
rr:predicate ex:orderId ;
rr:objectMap [ rr:column "id" ]
] ;
rr:predicateObjectMap [
rr:predicate ex:customer ;
rr:objectMap [
rr:parentTriplesMap <#CustomerMapping> ;
rr:joinCondition [
rr:child "customer_id" ;
rr:parent "id"
]
]
] ;
rr:predicateObjectMap [
rr:predicate ex:total ;
rr:objectMap [ rr:column "total" ]
] .
Maps foreign key customer_id to RDF object property linking to customer resource.
Complex Queries
Use SQL views for complex mappings:
<#SalesReportMapping>
a rr:TriplesMap ;
rr:logicalTable [
rr:sqlQuery """
SELECT
c.id as customer_id,
c.name as customer_name,
SUM(o.total) as total_spent,
COUNT(o.id) as order_count
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.order_date >= '2024-01-01'
GROUP BY c.id, c.name
"""
] ;
rr:subjectMap [
rr:template "http://example.org/customer/{customer_id}" ;
rr:class ex:Customer
] ;
rr:predicateObjectMap [
rr:predicate schema:name ;
rr:objectMap [ rr:column "customer_name" ]
] ;
rr:predicateObjectMap [
rr:predicate ex:totalSpent ;
rr:objectMap [ rr:column "total_spent" ; rr:datatype xsd:decimal ]
] ;
rr:predicateObjectMap [
rr:predicate ex:orderCount ;
rr:objectMap [ rr:column "order_count" ; rr:datatype xsd:integer ]
] .
Querying R2RML Graph Sources
R2RML graph sources are queried using standard SPARQL and JSON-LD query syntax — no special query language is needed. In the Rust API, graph source resolution is wired into the lazy query builders:
fluree.graph("my-gs:main").query()for a single target that may be either a native ledger or a mapped graph sourcefluree.query_from()when the query body specifies the dataset ("from"/FROM) or combines multiple sources
The raw materialized snapshot path (fluree.db(&alias) → fluree.query(&view, ...)) is still the wrong abstraction for graph source aliases because it assumes a native ledger snapshot has already been loaded.
Graph sources can be:
- Queried directly as the target:
fluree query my-gs 'SELECT * WHERE { ?s ?p ?o }' - Referenced in FROM clauses:
SELECT * FROM <my-gs:main> WHERE { ... } - Referenced in GRAPH patterns:
SELECT * WHERE { GRAPH <my-gs:main> { ... } }(useful for joining with ledger data)
Basic Query
{
"@context": {
"schema": "http://schema.org/",
"ex": "http://example.org/ns/"
},
"from": "warehouse-customers:main",
"select": ["?name", "?email"],
"where": [
{ "@id": "?customer", "@type": "schema:Person" },
{ "@id": "?customer", "schema:name": "?name" },
{ "@id": "?customer", "schema:email": "?email" }
]
}
The mapping controls how subjects and predicate/object values are produced from the scanned table columns.
SPARQL Query
PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/ns/>
SELECT ?name ?email
FROM <warehouse-customers:main>
WHERE {
?customer a schema:Person .
?customer schema:name ?name .
?customer schema:email ?email .
}
Filters
{
"from": "warehouse-customers:main",
"select": ["?name", "?email"],
"where": [
{ "@id": "?customer", "schema:name": "?name" },
{ "@id": "?customer", "schema:email": "?email" },
{ "@id": "?customer", "ex:status": "?status" }
],
"filter": "?status == 'active'"
}
Joins
{
"from": "warehouse-orders:main",
"select": ["?customerName", "?orderTotal"],
"where": [
{ "@id": "?customer", "schema:name": "?customerName" },
{ "@id": "?order", "ex:customer": "?customer" },
{ "@id": "?order", "ex:total": "?orderTotal" }
]
}
Combining with Fluree Data
Join Iceberg data with Fluree ledgers:
{
"from": ["products:main", "warehouse-inventory:main"],
"select": ["?productName", "?stockLevel"],
"where": [
{ "@id": "?product", "schema:name": "?productName" },
{ "@id": "?product", "ex:sku": "?sku" },
{ "@id": "?inventory", "ex:sku": "?sku" },
{ "@id": "?inventory", "ex:stockLevel": "?stockLevel" }
]
}
Combines product data from Fluree with inventory from an Iceberg-backed R2RML graph source.
Performance
R2RML graph sources execute by scanning the underlying Iceberg table and materializing RDF terms according to the mapping.
Best Practices
-
Filter Early: Filters are pushed down to Iceberg for partition pruning.
{ "where": [...], "filter": "?date >= '2024-01-01'" } -
Limit Results:
{ "where": [...], "limit": 100 } -
Project Only Needed Columns: Only columns referenced in the query and mapping are read from Parquet files.
-
Partition by Common Filters: Partition your Iceberg tables by columns frequently used in filters (e.g., date).
Bound-subject key pushdown
A query that names a specific subject IRI — e.g. <http://example.org/store/5> ex:name ?n — can be answered without scanning the whole table. Fluree reverses the subject template (rr:subjectMap rr:template "http://example.org/store/{store_key}") to recover the key value (store_key = 5) and pushes it to the Iceberg scan as an equality predicate, so the reader can prune to the matching rows. The mapping’s subject equality is always re-checked in-engine, so this only ever affects which rows the scan returns, never the result.
This is on by default. It shares the Iceberg predicate-pushdown kill-switch, FLUREE_ICEBERG_PREDICATE_PUSHDOWN=0 (see Configuration); with pushdown off, bound-subject queries are still answered correctly — just by scanning and filtering in-engine instead of pruning.
The percent-encoding contract. R2RML builds subject IRIs by substituting column values into the template and percent-encoding each value per RFC 3987 (letters, digits, and - . _ ~ ! $ & ' ( ) * + , ; = : @ are left literal; everything else — including /, #, ?, space, and non-ASCII — becomes %XX). Reversal is the exact inverse. For the pushdown to engage, a query’s subject IRI must use this same encoding. For example, a store_key of west/5 is stored in the IRI as http://example.org/store/west%2F5 — query it with the %2F, not a literal /.
If a query IRI is encoded differently, it simply matches no generated subject — and it does so identically whether pushdown is on or off (both return no rows). A mis-encoded IRI is therefore a query-authoring issue, never a case where pushdown and a full scan disagree.
What is pushed. Pushdown engages only for templates that reverse unambiguously and key columns whose physical type is supported:
- Template shape: a single trailing placeholder (
.../{key}) always qualifies; multi-placeholder templates (.../{a}/{b}) qualify only when the separators between placeholders are characters that are always percent-encoded (like/). Ambiguous shapes such as.../{a};{b}(;is left literal) are skipped and fall back to a full scan. - Key column type:
stringkeys, and integer-valued keys onint,long, ordecimalcolumns. Adecimalcolumn of any scale qualifies — the recovered key is pushed as an integer literal (the Arrow reader casts it to the column’s decimal type), and a key that is not integer-valued fails the parse and falls back to a full scan. Other physical types (dates, floats) are not pushed yet. The pushdown never affects correctness regardless: the operator always re-enforces the subject equality.
Use Cases
Data Lake Analytics
Query Iceberg tables containing large-scale analytical data alongside Fluree ledgers:
{
"from": ["products:main", "warehouse-sales:main"],
"select": ["?productName", "?totalSold"],
"where": [
{ "@id": "?product", "schema:name": "?productName" },
{ "@id": "?product", "ex:productId": "?pid" },
{ "@id": "?sale", "ex:productId": "?pid" },
{ "@id": "?sale", "ex:quantity": "?totalSold" }
]
}
Multi-Table Mapping
A single R2RML mapping file can define multiple TriplesMap entries, each targeting a different Iceberg table or logical view. This enables querying across related tables through a single graph source.
Limitations
- Read-Only: R2RML graph sources are read-only (no writes via Fluree)
- Performance: Complex joins across Fluree + Iceberg may be slow
- Schema Changes: Requires mapping updates when referenced columns change
Troubleshooting
Connection Errors
{
"error": "IcebergConnectionError",
"message": "Cannot load table metadata"
}
Solutions:
- Check catalog configuration (REST vs Direct)
- Verify AWS credentials and S3 access
- Verify
version-hint.textis present for Direct mode
Mapping Errors
{
"error": "R2RMLMappingError",
"message": "Invalid R2RML mapping: table 'customers' not found"
}
Solutions:
- Verify table name / location
- Check referenced column names in the mapping
- Validate R2RML syntax (Turtle)
Slow Queries
Causes:
- Large result sets (many Parquet files scanned)
- No partition pruning
- Complex joins across Fluree + Iceberg
Solutions:
- Add date/partition filters to enable Iceberg partition pruning
- Use LIMIT clause
- Optimize R2RML mapping to project only needed columns
- Partition Iceberg tables by common filter columns
Related Documentation
- Graph Sources Overview - Graph source concepts
- Iceberg - Data lake integration
- Query Datasets - Multi-graph queries