Structured Key Model
LSM trees provide efficient point lookups and range scans on key prefixes. This is due to the physical clustering of keys with based on their lexicographical sort ordering. To take advantage of this, OpenData systems carefully construct keys both to model the raw data but also to construct secondary indexes and embed them within the underlying key-value storage. In the example below, we have three different “types” of keys. The type of the key is encoded in the first bytedata|index|meta, and then the value of the key
is dependent on the type. The raw data key is just the user ID. For the inverted
index, the key is attribute_name:attribute_value and the value are all the
user ids that have the given attribute pair. The metadata keys are global keys
that describe the database (e.g. the version or total number of rows).
Segmenting Data
Because keys are physically sorted in the LSM, encoding a segment identifier into the key naturally groups related records together on disk. A query that targets a single segment only scans the key range for that segment so everything outside it is never read. Each segment can also be managed independently: old segments can be compacted, rolled up, or garbage collected without affecting the rest of the keyspace. The segment field can represent whatever boundary makes sense for the data model — a time window, a partition ID, a record type prefix, or a monotonic sequence number. The key insight is that the LSM’s sort order turns logical partitions into physical locality, giving you the benefits of partitioned storage without managing separate stores.Miscellaneous
OpenData systems actively leverage some interesting features of SlateDB in order to optimize storage efficiency and performance:| Property | How it works | Benefit |
|---|---|---|
| Prefix compression | Common prefixes are only stored once the storage system. | OpenData keys rely heavily on prefixes, and these prefixes are often repeated. This reduces storage overhead without requiring explicit dictionary encoding. |
| Merge operators | New entries are written as merge operands instead of read-modify-write. The LSM merges them lazily during compaction. | Turns N individual writes into a single merged result, avoiding read-modify-write cycles for posting lists and bitmaps. |
For specific encoding details and storage layouts, see the per-database
design pages:
Timeseries,
Log, and
Vector.