
Stratum
Fork any table in O(1). Query past snapshots. Pure JVM, no native dependencies. Faster than DuckDB on 35 of 46 queries via the Java Vector API. PostgreSQL wire protocol - connect with psql, JDBC, or DBeaver.
Fused SIMD execution over copy-on-write columnar data. Single-threaded comparison against DuckDB v1.4.4 (in-process via JDBC) on 10M rows, Intel Core Ultra 7 258V, JVM 25; median of 10 iterations after 5 warmup iterations:
| Query | Stratum | DuckDB | Ratio |
|---|---|---|---|
| TPC-H Q6 (filter + sum-product) | 13ms | 28ms | 2.2x faster |
| Filtered COUNT (NEQ pred) | 3ms | 12ms | 4.0x faster |
| H2O Q3 (100K string groups) | 71ms | 362ms | 5.1x faster |
| LIKE '%search%' (string scan) | 47ms | 240ms | 5.1x faster |
| H2O Q6 (STDDEV group-by) | 30ms | 81ms | 2.7x faster |
| H2O Q10 (10M groups, 6 cols) | 832ms | 7056ms | 8.5x faster |
| AVG(LENGTH(URL)) | 38ms | 170ms | 4.5x faster |
Across the full suite, Stratum is faster on 35 of 46 queries at 10M rows under the same single-threaded setup. DuckDB is faster on sparse-selectivity filters, high-cardinality hash group-by, and global COUNT(DISTINCT). Full methodology and raw results are in the benchmark docs.
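To make "fused" concrete, the minimal sketch below evaluates a TPC-H Q6-style predicate and accumulates the sum-product in one loop over primitive columns, without materializing a filtered copy of the data. It is plain scalar Clojure, not Stratum's kernel (which, per this page, uses the Java Vector API); the function name `fused-q6` and the simplified predicate (the ship-date range of real Q6 is omitted) are assumptions for illustration.

```clojure
(defn fused-q6
  "SUM(price * discount) WHERE discount BETWEEN 0.05 AND 0.07 AND qty < 24,
  computed in a single pass: the predicate and the aggregate are evaluated
  in the same loop, so no filtered column or selection vector is built.
  Hypothetical helper for illustration only."
  [^doubles price ^doubles discount ^doubles qty]
  (let [n (alength price)]
    (loop [i 0
           acc 0.0]
      (if (< i n)
        (let [d (aget discount i)]
          (recur (inc i)
                 (if (and (<= 0.05 d 0.07)        ; discount predicate
                          (< (aget qty i) 24.0))  ; quantity predicate
                   (+ acc (* (aget price i) d))   ; qualifying row: accumulate
                   acc)))                         ; otherwise skip the row
        acc))))

;; Usage: rows 0 and 3 qualify, so the result is 100*0.06 + 400*0.05
(fused-q6 (double-array [100.0 200.0 300.0 400.0])
          (double-array [0.06 0.04 0.07 0.05])
          (double-array [10.0 10.0 30.0 5.0]))
;; => 26.0
```

An unfused version would chain `filter`, `map` and `reduce` over row indices; it computes the same sum but allocates intermediate sequences, which is the overhead fusion avoids.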
`st/fork` creates an O(1) copy-on-write branch: no data is copied, only a pointer to the shared chunks. `st/sync!` persists a branch to storage, and `st/load` restores any named branch. To query with SQL, either pass column data to `st/q` as a table map, or register live storage-backed tables in the server with `register-live-table!`.
```clojure
(require '[stratum.api :as st]
         '[konserve.file-store :as fs]
         '[clojure.core.async :refer [<!!]])

;; Open storage and load the orders dataset (10M rows)
(def store (<!! (fs/new-fs-store "/data/stratum")))
(def orders (<!! (st/load store "orders")))

;; Fork in O(1): structural sharing, zero data copied
(def experiment (st/fork orders))

;; ... modify `experiment` here (elided); `orders` is unaffected

;; Persist the fork as a named branch; it shares chunks with main
(<!! (st/sync! experiment store "experiment"))

;; Query both branches via SQL, passing column data as a table map
(st/q "SELECT SUM(price * qty) FROM t" {"t" (st/columns orders)})
;; => {:SUM(price * qty) 4821903.40}  ← main branch, unchanged
(st/q "SELECT SUM(price * qty) FROM t" {"t" (st/columns experiment)})
;; => {:SUM(price * qty) 4401238.66}  ← experiment branch

;; Time travel: load any historical branch from storage
(def baseline (<!! (st/load store "orders-baseline")))
(st/q "SELECT COUNT(*) FROM t" {"t" (st/columns baseline)})
;; => {:COUNT(*) 9847233}
```

Yggdrasil extends branching across your whole stack: fork Datahike, Stratum, and Proximum together for consistent snapshot isolation across SQL, Datalog, and vector search.
Datahike is the system-of-record: Datalog queries, immutable transactions, time travel. Stratum is the SIMD SQL engine for scans and analytics over those same snapshots - fast on group-bys, joins, and window functions where a triple index isn't the right memory layout.
SIMD-accelerated via Java Vector API. Fused single-pass execution, dense group-by indexing, zone-map pruning. 35 of 46 queries faster than DuckDB at 10M rows.
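Two of the techniques named above can be sketched in plain Clojure. Zone-map pruning keeps min/max statistics per chunk and skips chunks that cannot satisfy a predicate. Dense group-by indexing applies when group keys are known to be small integers (for example dictionary codes): it accumulates into a flat array indexed by key instead of a hash table. This is a hypothetical illustration; the chunk layout and the names `make-chunk`, `scan-chunk`, `sum-where-greater` and `dense-group-sum` are assumptions, not Stratum's internals.

```clojure
;; Hypothetical chunk layout: a slice of one column plus its zone map (min/max).
(defn make-chunk [xs]
  {:vals (double-array xs)
   :min  (apply min xs)
   :max  (apply max xs)})

(defn- scan-chunk
  "Full scan of one chunk: sum of the values strictly greater than threshold."
  [^doubles vals threshold]
  (let [t (double threshold)]
    (loop [i 0 s 0.0]
      (if (< i (alength vals))
        (let [v (aget vals i)]
          (recur (inc i) (if (> v t) (+ s v) s)))
        s))))

(defn sum-where-greater
  "SUM(x) WHERE x > threshold. Chunks whose max is <= threshold cannot
  contain a match and are skipped without reading their values.
  Returns [sum chunks-scanned]."
  [chunks threshold]
  (reduce (fn [[acc scanned] {vals :vals hi :max}]
            (if (<= hi threshold)
              [acc scanned]                            ; pruned by zone map
              [(+ acc (scan-chunk vals threshold)) (inc scanned)]))
          [0.0 0]
          chunks))

(defn dense-group-sum
  "SUM(v) GROUP BY k for integer keys known to lie in [0, n-groups):
  each key indexes directly into a flat accumulator array."
  [^longs ks ^doubles vs n-groups]
  (let [^doubles acc (double-array n-groups)]
    (dotimes [i (alength ks)]
      (let [k (aget ks i)]
        (aset acc k (+ (aget acc k) (aget vs i)))))
    (vec acc)))

;; Usage
(def chunks [(make-chunk [1 2 3]) (make-chunk [10 20 30]) (make-chunk [4 5 6])])
(sum-where-greater chunks 5)
;; => [66.0 2]  the first chunk (max 3) is never scanned
(dense-group-sum (long-array [0 2 0 1 2]) (double-array [1 2 3 4 5]) 3)
;; => [4.0 4.0 7.0]
```

The dense variant trades memory proportional to the key range for the absence of hashing, which is why it only pays off when that range is small.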
Fork any table in O(1) via copy-on-write structural sharing. Named branches, time-travel, CoW snapshots - built into the engine, not bolted on.
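A toy model of copy-on-write structural sharing, assuming a column is a persistent vector of immutable chunks: forking just reuses the value, and an update copies only the chunk it touches. The names `column`, `fork` and `update-row` are hypothetical and say nothing about how `st/fork` is actually implemented over storage-backed chunks.

```clojure
;; Hypothetical model: a column is a persistent vector of immutable chunks.
(defn column [chunk-size xs]
  (mapv vec (partition-all chunk-size xs)))

(defn fork
  "O(1): persistent values are immutable, so the fork is the same value."
  [col]
  col)

(defn update-row
  "Copy-on-write update: only the chunk holding row idx is copied;
  every other chunk is shared by reference with the original column."
  [col chunk-size idx f]
  (let [c (quot idx chunk-size)
        r (rem idx chunk-size)]
    (assoc col c (update (nth col c) r f))))

;; Usage
(def main (column 4 (range 12)))                ; chunks [0..3] [4..7] [8..11]
(def exp  (update-row (fork main) 4 5 #(* 10 %)))
(get-in main [1 1])                             ; => 5, main is unchanged
(get-in exp  [1 1])                             ; => 50
(identical? (nth main 0) (nth exp 0))           ; => true, chunk 0 is shared
```

Because untouched chunks stay `identical?` between `main` and `exp`, a fork costs a constant amount of work and a write costs one chunk copy, not a table copy.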
No JNI, no native compilation. Runs anywhere a JVM does. SIMD acceleration without deployment complexity or platform lock-in.
PostgreSQL wire protocol with full DML, CTEs, window functions, aggregates, and FROM read_csv / read_parquet. Works with psql, JDBC, DBeaver, and psycopg2.
Datasets implement IEditableCollection, ILookup, and IPersistentCollection, so tablecloth and tech.ml.dataset work on them directly. Query with the DSL or with SQL strings.
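Implementing these core interfaces is what lets ordinary Clojure code treat a dataset like a map. As a minimal sketch of the mechanism for ILookup only, the hypothetical `ColumnTable` type below (not Stratum's dataset class) implements `valAt`, which is enough for `get` and keyword lookup to work on it.

```clojure
;; Hypothetical toy type, NOT Stratum's dataset implementation:
;; a table stored as a map from column keyword to column data.
(deftype ColumnTable [cols]
  clojure.lang.ILookup
  (valAt [_ k] (get cols k))
  (valAt [_ k not-found] (get cols k not-found)))

(def t (->ColumnTable {:price [120.0 80.0] :qty [2 5]}))

(get t :price)          ; => [120.0 80.0], clojure.core/get calls valAt
(:qty t)                ; => [2 5], keyword lookup also goes through ILookup
(get t :missing :none)  ; => :none
```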
Branch Datahike, Stratum, and Proximum together via Yggdrasil. Consistent snapshots across SQL, Datalog, and vector search.
```shell
# Java 21+, no Clojure needed
java --add-modules jdk.incubator.vector \
     --enable-native-access=ALL-UNNAMED \
     -jar stratum-standalone.jar \
     --index orders:/data/orders.csv

# Or try the built-in demo tables (lineitem and taxi, 100K rows each)
java --add-modules jdk.incubator.vector \
     --enable-native-access=ALL-UNNAMED \
     -jar stratum-standalone.jar --demo

# Connect with any PostgreSQL client
psql -h localhost -p 5432 -U stratum
```

```clojure
(require '[stratum.api :as st])

;; prices, quantities and regions hold the column data (equal length);
;; they are assumed to be defined elsewhere

;; Query with the DSL
(st/q {:from  {:price prices :qty quantities :region regions}
       :where [[:> :price 100]]
       :group [:region]
       :agg   [[:sum [:* :price :qty]]
               [:count]]})

;; Or with SQL
(st/q "SELECT region, SUM(price * qty), COUNT(*)
       FROM orders WHERE price > 100 GROUP BY region"
      {"orders" {:price prices :qty quantities :region regions}})
```

SQL queries can also read files directly with FROM read_csv('file.csv') or FROM read_parquet('file.parquet').

For help getting Stratum into production, we offer integration, custom development, and support contracts.
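```clojure
;; deps.edn (Clojure CLI)
;; check https://clojars.org/org.replikativ/stratum for the latest version
{:deps {org.replikativ/stratum {:mvn/version "0.1.114"}}
 ;; The JVM flags are required; deps.edn accepts :jvm-opts only inside an
 ;; alias (the name :stratum is arbitrary), e.g. start with `clj -M:stratum`
 :aliases {:stratum {:jvm-opts ["--add-modules=jdk.incubator.vector"
                                "--enable-native-access=ALL-UNNAMED"]}}}

;; Leiningen (project.clj)
;; :dependencies [[org.replikativ/stratum "0.1.114"]]
;; :jvm-opts ["--add-modules=jdk.incubator.vector"
;;            "--enable-native-access=ALL-UNNAMED"]
```

Requires JDK 21+ and Clojure 1.12+. Apache 2.0 license. The latest version is on Clojars.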
Hi HN - I’m the author of Stratum.
Stratum is a columnar SQL engine built on the JVM exploring two ideas:
• SIMD execution using the Java Vector API
• copy-on-write branching for tables
It speaks the PostgreSQL wire protocol, so tools like psql, JDBC, and DBeaver work out of the box.
The engine is pure JVM (no JNI). In benchmarks on 10M rows it performs competitively with DuckDB on many analytical queries.
GitHub repo: https://github.com/replikativ/stratum
Benchmarks and methodology are described in the docs.
Happy to answer questions - feedback very welcome.