Storage engines¶
Plugin category: bsb.storage.engines
A storage engine is an Engine subclass
that persists a reconstruction (placement, connectivity, files, morphologies). The
two reference implementations are bsb_hdf5 (full
reconstruction, HDF5-backed, see HDF5Engine) and bsb-core’s
FileSystemEngine (filesystem layout,
metadata-only).
Engine ABC¶
The Engine ABC has two groups of
abstract members. The lifecycle group covers
create,
move,
copy,
remove,
exists,
clear_placement, and
clear_connectivity.
The provenance surface (below) is the second group.
metadata (property)
    Returns the root-level provenance bundle as a JSON-serialisable dict. The canonical layout is built by build_root_metadata; persist its output verbatim on create. Return {} if the engine is opened read-only against an artefact that lacks a bundle and cannot be upgraded.
_bump_state
    Increments state_id atomically. The engine invokes this itself from every mutating code path: clear_placement, clear_connectivity, FileStore.store and FileStore.remove, and the PlacementSet and ConnectivitySet mutators. No-op when the engine is in read-only mode.
_upgrade_if_needed
    Called from __init__ after super().__init__. Detects an existing artefact missing the provenance bundle, stamps a fresh storage_id and the rest of the bundle with current values, and emits a single BsbProvenanceUpgradeWarning. No-op for fresh artefacts (already stamped by create), read-only engines, and roots that do not exist yet.
Every mutator that writes to disk must call
_bump_state (or the local
equivalent that updates the root attrs in the same open handle) so the counter
stays in sync with reality.
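As a rough illustration of this rule, the sketch below shows a toy engine whose mutator bumps the counter while it still holds the lock, and whose bump is a no-op in read-only mode. ToyEngine, _root_attrs and write_something are hypothetical names used for illustration only; they are not part of bsb.

```python
import threading


class ToyEngine:
    """Hypothetical stand-in for an Engine; only the counter logic is shown."""

    def __init__(self, readonly=False):
        self._readonly = readonly
        self._lock = threading.Lock()
        # Stand-in for the attributes stored on the storage root.
        self._root_attrs = {"state_id": 0}

    def _bump_state(self):
        # No-op in read-only mode, as the contract requires.
        if self._readonly:
            return
        self._root_attrs["state_id"] += 1

    def write_something(self, payload):
        # A mutator: perform the write, then bump before the lock is released.
        with self._lock:
            self._root_attrs["last_payload"] = payload
            self._bump_state()

    @property
    def state_id(self):
        return self._root_attrs["state_id"]
```

Bumping inside the same critical section as the write is what keeps a reader from observing new data paired with an old counter.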
The provenance bundle¶
Every storage root carries this bundle. It is exposed read-only on the
Scaffold as
scaffold.storage_id (UUID4, immutable),
scaffold.state_id (monotonic int) and
scaffold.provenance (the full dict).
| Key | Meaning |
|---|---|
| storage_id | UUID4, immutable. |
| state_id | Monotonic revision counter (int). Bumped on every mutating write. Not a content fingerprint: it answers “did this artefact change since I last looked?”, not “is it the same network as that other artefact?”. |
|  | Version of the bundle layout itself, so future BSB versions can read / write / migrate older and newer schemas. |
|  | ISO 8601 UTC timestamp of engine creation. |
|  |  |
|  |  |
|  | Engine package version at creation time. |
|  |  |
|  | Diagnostic, optional: |
|  | Diagnostic, optional: |
The bundle is the back-pointer target for simulation result files. Every .nio
neo.core.Block annotates bsb_provenance.scaffold = {storage_id,
state_id, root} (see The recorder annotation convention).
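To make the difference between storage_id and state_id concrete, here is a minimal sketch of how a consumer might compare a result file's back-pointer with the storage it has open. The function name, the returned strings and the example values are made up for illustration; only the three keys of the annotation come from the text above.

```python
def check_backpointer(annotation, storage_id, state_id):
    """Classify a result file's back-pointer against the currently open storage."""
    if annotation.get("storage_id") != storage_id:
        # Different UUID: the result was recorded against another artefact.
        return "different storage"
    if annotation.get("state_id") != state_id:
        # Same artefact, but it was mutated after the recording was made.
        return "storage changed since recording"
    return "up to date"


# Example values are invented; a real annotation is written by the recorder.
pointer = {
    "storage_id": "1b4e28ba-2fa1-4c3e-9d0a-0f2a6c1e5b7d",
    "state_id": 41,
    "root": "network.hdf5",
}
status = check_backpointer(pointer, pointer["storage_id"], 42)
```

Because state_id is a revision counter and not a content fingerprint, a mismatch only says that something was written, not what changed.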
Sub-interfaces¶
Beyond the engine object itself, a storage backend supplies a concrete subclass of
each sub-interface it supports (a metadata-only backend may supply only
FileStore). Each subsection links the
abstract class (whose page lists every method signature) and then states the contract
an implementation must honour: what it stores, the data model, which methods are
mandatory, how access is coordinated, and the provenance hooks.
Declaring an implementation¶
There is no registry call or decorator. The framework discovers an engine’s
implementations by scanning the engine’s plugin module for the first class that
subclasses each storage interface (the Engine itself is found the same way). You declare a
sub-interface implementation simply by subclassing the abstract class and exporting
it from your plugin package’s top-level namespace, next to a StorageNode config
node (which is mandatory):
# my_engine/__init__.py (the module registered under the bsb.storage.engines entry point)
from bsb import Engine, StorageNode as IStorageNode
from bsb import FileStore as IFileStore
class MyEngine(Engine): ...
class StorageNode(IStorageNode): ... # the config node for storage.engine
class FileStore(IFileStore): ... # discovered as this engine's FileStore
# Re-export from submodules so the scan finds them at the package top level:
# from .placement_set import PlacementSet
# from .connectivity_set import ConnectivitySet
# from .morphology_repository import MorphologyRepository
A sub-interface you do not subclass resolves to a NotSupported stand-in that
raises on use; that is how a metadata-only engine omits placement and connectivity.
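The discovery rule can be pictured with the following self-contained sketch. It is not the framework's actual scanner: the interface classes, discover and not_supported are hypothetical stand-ins, and the plugin module is faked with types.ModuleType so the example runs on its own.

```python
import inspect
import types


class FileStoreInterface:
    """Hypothetical stand-in for an abstract storage interface."""


class PlacementSetInterface:
    """Hypothetical stand-in for a second abstract storage interface."""


def not_supported(interface):
    """Build a stand-in class that raises as soon as it is used."""

    class NotSupported:
        def __init__(self, *args, **kwargs):
            raise NotImplementedError(
                f"{interface.__name__} is not supported by this engine"
            )

    return NotSupported


def discover(module, interfaces):
    """Map each interface to the first class in `module` that subclasses it."""
    found = {}
    for interface in interfaces:
        candidates = (
            obj
            for obj in vars(module).values()
            if inspect.isclass(obj)
            and issubclass(obj, interface)
            and obj is not interface  # skip the re-imported abstract class itself
        )
        found[interface] = next(candidates, not_supported(interface))
    return found


# A fake plugin module that only implements the file store.
plugin = types.ModuleType("my_engine")


class FileStore(FileStoreInterface):
    pass


plugin.FileStore = FileStore
support = discover(plugin, [FileStoreInterface, PlacementSetInterface])
```

In this sketch support[FileStoreInterface] is the plugin's FileStore, while instantiating support[PlacementSetInterface] raises, mirroring how a metadata-only engine omits placement.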
Two of the four sub-interfaces declare an engine_key on the abstract class
(FileStore keys files,
MorphologyRepository keys
morphologies). For those, the Storage factory
instantiates your subclass once and binds the singleton on the engine under that key,
so engine.files and engine.morphologies are ready to use. The other two have
no singleton (there is one placement set per cell type and one connectivity set per
tag); the engine hands those out through its factory methods, such as
require_placement_set, which call your subclass’s create / require.
Two cross-cutting rules apply to all four:
- Ordinary reads and data writes are individual actions. The engine serialises concurrent access with a lock, so any single rank may read or write on its own; placement and connectivity strategies routinely write their chunk’s data from one rank. Only a handful of engine-level operations are collective and must be entered from every rank under MPI (they barrier internally): create, move, copy, remove, clear_placement, clear_connectivity, and store_active_config. Those may not be called from component code.
- Every write path must leave the provenance counters consistent by calling _bump_state on the engine (directly, or via the in-handle equivalent) before the lock is released.
FileStore¶
Class: FileStore
A key-value store of opaque blobs. Each entry is a (content, meta) pair filed
under a string id. The framework uses it for the active configuration JSON and for
every file or data dependency declared by a component: morphology sources, atlases,
NRRD volumes, and so on. Components SHOULD declare every external file they depend on
so the file store can absorb it, leaving a single self-contained reconstruction file
that carries everything needed to rebuild the network with no loose external paths.
This is the smallest sub-interface and the only one the
fs engine implements, which makes it
the best place to start a new backend.
An entry’s meta is a free dict, but three keys are conventional and the
framework relies on them: mtime (write timestamp), encoding (text codec, or
absent for binary), and active_config (the boolean flag marking the live
configuration).
Mandatory methods:
- active configuration: store_active_config, load_active_config.
The lookup helpers (get, find_files, find_file, find_id, find_meta) are concrete on the ABC, built on all.
store(content, id=None, meta=None, encoding=None, overwrite=False) -> str
    The single write primitive. Steps an implementation must perform:
    1. If id is None, mint one with uuid4 and use it as the return value.
    2. Normalise content. A str is encoded to bytes (default utf-8, unless encoding overrides it); bytes are stored verbatim with encoding left None. Record the chosen encoding in the entry so load can decode it back.
    3. If overwrite is false and the id already exists, raise FileExistsError. If it is true, replace the existing entry.
    4. Stamp meta["mtime"] with the current time and, for provenance, a content_sha256 of the bytes and a producer ({"package", "version"}) describing who wrote the file.
    5. Persist content and meta, then bump engine state.
A minimal filesystem implementation looks like this:
import hashlib, json, time, uuid

def store(self, content, id=None, meta=None, encoding=None, overwrite=False):
    if isinstance(content, str):
        encoding = encoding or "utf-8"
        content = content.encode(encoding)
    id = id or str(uuid.uuid4())
    meta = {**(meta or {})}
    if not overwrite and self.has(id):
        raise FileExistsError(f"'{id}' already in the store")
    meta.setdefault("content_sha256", hashlib.sha256(content).hexdigest())
    self._write_blob(id, content)
    self._write_meta(id, {"meta": meta, "mtime": time.time(), "encoding": encoding})
    self._engine._bump_state()
    return id
load(id) -> tuple[str | bytes, dict]
    Return (content, meta). Decode the content with the stored encoding (so text round-trips as str); return raw bytes when encoding is None. The second element is the user meta mapping, not the internal record. Raise FileNotFoundError for an unknown id.
remove(id)
    Delete the entry (both content and meta) and bump engine state. Raise FileNotFoundError for an unknown id.
all() -> dict[str, dict]
    Return {id: meta} for every entry. This is the workhorse the concrete find_* helpers iterate, so keep it cheap; return only the meta, never the content.
has(id) -> bool
    Whether an entry with that id exists. Must not raise for a missing id.
get_meta(id) -> dict / get_mtime(id) / get_encoding(id)
    Targeted accessors for one entry: the user meta mapping, the numeric write timestamp, and the text codec (or None for binary) respectively.
store_active_config(config) -> str
    Persist config as the active configuration. There is at most one at a time, so first remove any entry whose meta has active_config set, then store json.dumps(config.__tree__()) with meta={"active_config": True, "producer": {...}}. Returns the new id. This call is collective under MPI; the reference engines do the actual write on rank 0 and broadcast the id.
load_active_config() -> Configuration
    Find the entry flagged active_config (the find_meta("active_config", True) helper does this), parse its JSON back into a Configuration with Configuration(**tree), re-attach the stored meta as cfg._meta, and return it. Raise MissingActiveConfigError when no entry is flagged. Parse with json.loads(), never eval: stored files may come from untrusted sources.
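The contract above can be exercised end to end with a small in-memory stand-in. This is only a sketch under stated assumptions: MemoryFileStore is a hypothetical class (not bsb's FileStore), the configuration is a plain dict instead of a Configuration object, LookupError stands in for MissingActiveConfigError, and provenance stamping and state bumping are left out to keep the focus on encoding round-trips and the "at most one active configuration" rule.

```python
import json
import time
import uuid


class MemoryFileStore:
    """Hypothetical in-memory store illustrating the FileStore contract."""

    def __init__(self):
        self._entries = {}  # id -> internal record (content, meta, mtime, encoding)

    def store(self, content, id=None, meta=None, encoding=None, overwrite=False):
        if isinstance(content, str):
            encoding = encoding or "utf-8"
            content = content.encode(encoding)
        id = id or str(uuid.uuid4())
        if not overwrite and id in self._entries:
            raise FileExistsError(f"'{id}' already in the store")
        self._entries[id] = {
            "content": content,
            "meta": dict(meta or {}),
            "mtime": time.time(),
            "encoding": encoding,
        }
        return id

    def load(self, id):
        if id not in self._entries:
            raise FileNotFoundError(f"'{id}' not in the store")
        record = self._entries[id]
        content = record["content"]
        if record["encoding"] is not None:
            # Text round-trips as str; binary stays bytes.
            content = content.decode(record["encoding"])
        return content, record["meta"]

    def remove(self, id):
        if id not in self._entries:
            raise FileNotFoundError(f"'{id}' not in the store")
        del self._entries[id]

    def all(self):
        # Only the user meta, never the content.
        return {id: record["meta"] for id, record in self._entries.items()}

    def store_active_config(self, tree):
        # At most one active configuration: drop any previously flagged entry.
        for id, meta in list(self.all().items()):
            if meta.get("active_config"):
                self.remove(id)
        return self.store(json.dumps(tree), meta={"active_config": True})

    def load_active_config(self):
        for id, meta in self.all().items():
            if meta.get("active_config"):
                content, _ = self.load(id)
                return json.loads(content)  # never eval stored content
        raise LookupError("no active configuration stored")
```

Storing a second active configuration replaces the first, so load_active_config always has a single candidate to return.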
PlacementSet¶
Class: PlacementSet
Stores placement data for a cell type. Its identifier is its tag; by convention
a cell type has one placement set whose tag is the cell-type name, but neither the
one-set-per-type nor the tag-equals-name correspondence is enforced. Data is
partitioned into chunks (spatial buckets). A
cell’s placement-set id is its rank across all chunks taken in sorted chunk order, so
the engine derives ids from the per-chunk counts rather than storing them.
Per cell the set holds: a position (N×3 float), a rotation (N×3), a
morphology index (into the set’s morphology loaders), an encoded label set, and any
number of named additional arrays.
Mandatory methods group into:
- construction and existence: create, exists, __init__, __len__ (require is concrete on the ABC);
- reads: load_positions, load_rotations, load_morphologies, load_additional, load_ids, get_all_chunks, get_chunk_stats, __iter__;
- writes: append_data, append_additional, clear;
- scoping and labels: chunk_context, set_chunk_filter, set_morphology_label_filter, label, label_by_mask, remove_labels, remove_labels_by_mask, get_label_mask, get_labelled, get_unique_labels.
count_morphologies, load_boxes, load_box_tree, set_label_filter and get_label_filter are concrete on the ABC.
The chunk model¶
Each append_data call
targets a single chunk, but you may create
new chunks and append to existing ones in any order. A chunk is a fixed-size box of
space identified by its integer key; placement strategies fill chunks in parallel,
each rank owning a subset. An implementation therefore stores every per-cell dataset
partitioned by chunk, and never assumes a single contiguous array.
A cell’s placement-set id is not stored. It is defined as the cell’s rank when
chunks are concatenated in sorted key order: all cells of the lowest chunk first
(\(0 \ldots n_0 - 1\)), then the next chunk
(\(n_0 \ldots n_0 + n_1 - 1\)), and so on. Every read that returns ids (or that a
caller will index by id) must use this ordering, which is why
get_chunk_stats
(the {chunk_key: count} map) is load-bearing: it is the source of truth that
turns chunk-local rows into global ids, and connectivity relies on it to offset its
location matrices.
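The id derivation can be written down in a few lines. The sketch below assumes plain integer chunk keys and a {chunk_key: count} map like the one get_chunk_stats returns; chunk_offsets and global_id are hypothetical helper names, and the counts are invented.

```python
from itertools import accumulate


def chunk_offsets(chunk_stats):
    """Return {chunk_key: first placement-set id} for chunks in sorted key order."""
    keys = sorted(chunk_stats)
    counts = [chunk_stats[key] for key in keys]
    # Running total of the counts, shifted so each chunk gets its starting id.
    starts = [0, *accumulate(counts)][:-1]
    return dict(zip(keys, starts))


def global_id(chunk_stats, chunk, local_row):
    """Translate a chunk-local row index into a placement-set id."""
    if not 0 <= local_row < chunk_stats[chunk]:
        raise IndexError(f"row {local_row} is out of range for chunk {chunk}")
    return chunk_offsets(chunk_stats)[chunk] + local_row


stats = {7: 2, 3: 4, 5: 3}  # invented {chunk_key: count} map
offsets = chunk_offsets(stats)
```

With these counts chunk 3 starts at id 0, chunk 5 at id 4 and chunk 7 at id 7, so row 1 of chunk 5 is cell 5. An inexact count in any earlier chunk would shift every id after it, which is why the stats must be kept exact.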
Writing data¶
append_data(chunk, positions=None, morphologies=None, rotations=None, additional=None, count=None)
    Append cells to one chunk. The optional arguments fill datasets left to right and are positionally dependent: to pass morphologies you must also pass positions; to pass rotations you must pass at least positions. count is the entity escape hatch: it creates count position-less cells and is mutually exclusive with positions / morphologies / rotations. After writing the datasets, update this set’s per-chunk counts and total length, then bump engine state. Sketch:

        def append_data(self, chunk, positions=None, morphologies=None, rotations=None, additional=None, count=None):
            n = count if count is not None else len(positions)
            if positions is not None:
                self._append(chunk, "position", positions)
            if morphologies is not None:  # merges loaders, stores indices
                self._append_morphologies(chunk, morphologies)
            if rotations is not None:
                self._append(chunk, "rotation", rotations)
            for key, data in (additional or {}).items():
                self.append_additional(key, chunk, data)
            self._track_add(chunk, n)  # update stats + len, bump state

append_additional(name, chunk, data)
    Append len(data) rows to the chunk’s array stored under name, creating it on first use. It appends, never overwrites: call it with each batch’s rows alongside the matching append_data so the named array grows in lockstep with the chunk’s cells (append N positions then N rows, later M positions then M rows). Use it for arbitrary per-cell user data that should live and be filtered alongside the placement.
clear(chunks=None)
    Drop all data (or only the given chunks), decrementing the chunk counts you track so get_chunk_stats stays exact. Bump engine state.
Reading data¶
load_positions() / load_rotations() / load_additional(key=None)
    Return the concatenated N×3 positions, a RotationSet, and the named additional array(s), in placement-set id order. When a chunk or label filter is active they return only the matching rows. load_rotations and load_morphologies raise DatasetNotFoundError when the data is absent (unless allow_empty is set).
load_morphologies(allow_empty=False)
    Return a MorphologySet pairing the set’s loaders with the per-cell morphology index. The loaders are obtained from the engine’s MorphologyRepository, and the per-cell dataset holds integer indices into that loader list, so storing morphologies means storing names plus indices, not duplicating geometry per cell.
load_ids()
    Return the global ids in scope as a flat array, derived from the chunk counts (and masked by the label filter when set). A caller uses these to line recorded data back up with cells.
get_all_chunks() / get_chunk_stats()
    The chunks that hold data, and the {chunk_key: count} map. Keep get_chunk_stats exact: id derivation and connectivity scoping both depend on it.
Scope filters¶
Filters narrow a set in place: while one is active, __len__ and every load_*
reflect only the matching cells, and a freshly loaded set has none set.
chunk_context(chunks)
    A context manager that restricts the set to chunks for the duration of a with block. Use it internally to read one chunk’s slice without mutating the persistent filter.
set_chunk_filter(chunks)
    Persistently restrict reads to the given chunks until changed.
set_morphology_label_filter(morphology_labels)
    Restrict the sub-cellular scope: morphologies returned by load_morphologies will be filtered to these labels. set_label_filter and get_label_filter (cell-level, concrete on the ABC) cover the cell scope.
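One way to get the "temporary versus persistent" behaviour is to have the context manager save the current filter and restore it on exit. The sketch below is a toy under stated assumptions: ToySet and load_rows are hypothetical, rows are plain Python values, and None is used to mean "no filter".

```python
from contextlib import contextmanager


class ToySet:
    """Hypothetical chunked set; only the chunk scoping is modelled."""

    def __init__(self, rows_by_chunk):
        self._rows = rows_by_chunk  # {chunk_key: [row, ...]}
        self._chunk_filter = None   # None means "no filter"

    def set_chunk_filter(self, chunks):
        # Persistent: stays in force until changed.
        self._chunk_filter = None if chunks is None else set(chunks)

    @contextmanager
    def chunk_context(self, chunks):
        # Temporary: remember the persistent filter and restore it afterwards,
        # even if the body of the with block raises.
        previous = self._chunk_filter
        self.set_chunk_filter(chunks)
        try:
            yield self
        finally:
            self._chunk_filter = previous

    def load_rows(self):
        # Concatenate in sorted chunk order, honouring the active filter.
        return [
            row
            for chunk in sorted(self._rows)
            if self._chunk_filter is None or chunk in self._chunk_filter
            for row in self._rows[chunk]
        ]

    def __len__(self):
        return len(self.load_rows())
```

Restoring in a finally block means an exception inside the with block cannot leave the set scoped to the wrong chunks.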
Labels¶
Each cell carries an encoded label set. The label methods read and mutate it:
label(labels, cells) / label_by_mask(labels, mask)
    Add labels to the cells named by id, or by a boolean mask the length of the set. Validate that ids are in range / the mask fits, raising LabellingError otherwise.
remove_labels(labels, cells) / remove_labels_by_mask(labels, mask)
    The inverse: strip labels from the selected cells.
get_label_mask(labels=None) / get_labelled(labels=None) / get_unique_labels()
    Query: a boolean mask for cells carrying labels, their ids, and the set of all labels in use. Passing an empty list selects unlabelled cells.
ConnectivitySet¶
Class: ConnectivitySet
Stores the connections between a presynaptic and a postsynaptic cell type, written
once from each perspective (inc and out) and partitioned per chunk so that
incoming and outgoing queries are both cheap. The engine must set the class
attributes tag, pre_type_name, post_type_name, pre_type and
post_type on every instance.
A connection is a pair of locations. A location is a row [cell_id, branch_id,
point_id]; point-neuron connections use -1 for the branch and point columns.
Locations are interpreted in one of two frames:
- connect takes placement-set-scoped cell ids (the rank within the whole set);
- chunk_connect takes chunk-relative cell ids (the rank within the named chunk).
Mandatory methods:
- writes: connect, chunk_connect, clear;
- reads: get_local_chunks, get_global_chunks, flat_iter_connections, nested_iter_connections, load_block_connections, load_local_connections.
require and
load_connections
(which returns a
ConnectivityIterator) are
concrete on the ABC.
The location and direction model¶
src_locs and dest_locs are equal-length N×3 integer matrices: one row
per connection, columns [cell_id, branch_id, point_id]. src is presynaptic,
dest is postsynaptic. A point-neuron connection sets branch and point to -1;
a 1-D id vector is broadcast to that shape for you by the reference helper.
Connections are stored twice, once per direction, so that both “who do I send to” and “who sends to me” are local reads:
- out: keyed by the presynaptic (local) chunk, pointing at postsynaptic (global) chunks;
- inc: keyed by the postsynaptic (local) chunk, pointing at presynaptic (global) chunks.
“Local” is the chunk you index by; “global” is the chunk on the other end of the
connection. Writing one batch of connections means appending to both an out block
and an inc block.
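The double bookkeeping can be sketched with two dictionaries of blocks. ToyConnectivity is a hypothetical in-memory stand-in, locations are plain (cell_id, branch_id, point_id) tuples, and counts and state bumping are omitted; only the method names and the local/global convention are taken from this page.

```python
from collections import defaultdict


class ToyConnectivity:
    """Hypothetical store that keeps every connection once per direction."""

    def __init__(self):
        # blocks[direction][(local_chunk, global_chunk)] -> [(local_loc, global_loc), ...]
        self._blocks = {"out": defaultdict(list), "inc": defaultdict(list)}

    def chunk_connect(self, src_chunk, dst_chunk, src_locs, dst_locs):
        if len(src_locs) != len(dst_locs):
            raise ValueError("src_locs and dst_locs must be the same length")
        for src, dst in zip(src_locs, dst_locs):
            # Outgoing view: the presynaptic chunk is local.
            self._blocks["out"][(src_chunk, dst_chunk)].append((src, dst))
            # Incoming view: the postsynaptic chunk is local.
            self._blocks["inc"][(dst_chunk, src_chunk)].append((dst, src))

    def load_block_connections(self, direction, local_, global_):
        # .get avoids creating an empty block as a side effect of reading.
        pairs = self._blocks[direction].get((local_, global_), [])
        local_locs = [local for local, _ in pairs]
        global_locs = [remote for _, remote in pairs]
        return local_locs, global_locs
```

Reading the same batch back from either side returns the same pairs with the roles of local and global swapped, which is what makes both "who do I send to" and "who sends to me" local reads.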
Writing connections¶
connect(pre_set, post_set, src_locs, dest_locs)
    The high-level entry point. src_locs and dest_locs carry placement-set-scoped cell ids (rank within the whole set). It must: resolve any active label filter on pre_set / post_set (translating filtered ids back to stored ids), apply morphology back-mapping where the sets require it, then demultiplex the rows per (pre_chunk, post_chunk) pair and hand each block to chunk_connect. An engine that implements chunk_connect plus the iterators gets connect for free by reusing the reference demultiplexer; only override it if your backend can route locations to chunks more cheaply.
chunk_connect(src_chunk, dst_chunk, src_locs, dst_locs)
    The low-level primitive. Here the cell ids are chunk-relative (rank within the named chunk). Append src_locs to the out block of src_chunk (global dst_chunk) and to the inc block of dst_chunk (global src_chunk), then update the per-direction counts and bump engine state. src_locs and dst_locs must be the same length.
clear(chunks=None)
    Drop the connectivity (or only the given chunks) from both directions and reset the counts. Bump engine state.
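The demultiplexing step of connect can be illustrated as follows. This is not the reference demultiplexer: locate and demultiplex are hypothetical helpers, chunk keys are assumed to be plain integers, and label-filter resolution and morphology back-mapping are deliberately left out. It only shows how set-scoped ids become chunk-relative ids grouped per (pre_chunk, post_chunk) pair, using {chunk_key: count} maps.

```python
from bisect import bisect_right
from collections import defaultdict
from itertools import accumulate


def locate(chunk_stats, cell_id):
    """Split a set-scoped cell id into (chunk_key, chunk-relative id)."""
    keys = sorted(chunk_stats)
    # Exclusive end id of each chunk when chunks are concatenated in key order.
    ends = list(accumulate(chunk_stats[key] for key in keys))
    if not ends or not 0 <= cell_id < ends[-1]:
        raise IndexError(f"cell id {cell_id} is out of range")
    index = bisect_right(ends, cell_id)
    start = ends[index - 1] if index else 0
    return keys[index], cell_id - start


def demultiplex(pre_stats, post_stats, src_locs, dest_locs):
    """Group connections per (pre_chunk, post_chunk) with chunk-relative ids."""
    blocks = defaultdict(lambda: ([], []))
    for (pre_id, *pre_rest), (post_id, *post_rest) in zip(src_locs, dest_locs):
        pre_chunk, pre_rel = locate(pre_stats, pre_id)
        post_chunk, post_rel = locate(post_stats, post_id)
        src_block, dst_block = blocks[(pre_chunk, post_chunk)]
        # Only the cell id column changes; branch and point ids pass through.
        src_block.append((pre_rel, *pre_rest))
        dst_block.append((post_rel, *post_rest))
    return dict(blocks)


pre_stats = {0: 2, 1: 2}   # invented counts: presynaptic ids 0-1 in chunk 0, 2-3 in chunk 1
post_stats = {0: 3}        # invented counts: postsynaptic ids 0-2 in chunk 0
blocks = demultiplex(
    pre_stats,
    post_stats,
    [(0, -1, -1), (3, -1, -1)],
    [(2, -1, -1), (1, -1, -1)],
)
```

Each value of blocks is a (src_locs, dst_locs) pair in the chunk-relative frame, ready to be passed to chunk_connect for that chunk pair.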
Reading connections¶
get_local_chunks(direction)
    List the local chunks that hold data in "inc" or "out".
get_global_chunks(direction, local_)
    List the global chunks reachable from a given local chunk in that direction.
load_block_connections(direction, local_, global_)
    Return the (local_locs, global_locs) pair for one (local, global) block. This is the leaf read the iterators are built on; an empty block returns two (0, 3) arrays rather than raising.
load_local_connections(direction, local_)
    Return all connections of one local chunk as (local_locs, global_chunk_ids, global_locs), where the middle array tags each global row with its chunk so the caller can resolve cross-chunk ids.
flat_iter_connections(direction=None, local_=None, global_=None) / nested_iter_connections(...)
    Iterate the blocks. The flat form yields (direction, local_chunk, global_chunk, data) tuples; the nested form yields nested iterators for hand-written loops. Omitting an argument iterates that axis; passing one pins it. The concrete load_connections wraps these into a ConnectivityIterator that applies the placement-set chunk offsets, so most callers never touch the raw blocks:

        cs = scaffold.get_connectivity_set("A_to_B")
        for pre_loc, post_loc in cs.load_connections():
            # pre_loc / post_loc are [cell_id, branch_id, point_id], ids global to the set
            ...
MorphologyRepository¶
Class: MorphologyRepository
Stores morphologies and their metadata, content-addressed by a hash kept in each morphology’s meta. PlacementSets reference its loaders by name.
Mandatory methods:
- reads: all, select, has, preload, load, get_meta, get_all_meta;
- writes: save, set_all_meta, update_all_meta.
__contains__ and list are concrete on the ABC.
A morphology has two parts an implementation must round-trip: its geometry (the
branch tree of points, radii, per-point labels and properties) and its meta (a
free dict that carries at least the content hash). The two are queried
separately because most of the framework only needs names and meta, and loading
geometry is expensive. That split is why there is both a lazy
StoredMorphology (a loader
plus meta) and an eager
Morphology.
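The lazy half of that split can be pictured with a small loader-plus-meta object. LazyMorphology is a hypothetical illustration, not the StoredMorphology class; in particular, caching the loaded geometry is a choice made for this sketch and not a statement about the real class.

```python
class LazyMorphology:
    """Hypothetical loader-plus-meta pair; geometry is only read on demand."""

    def __init__(self, name, loader, meta):
        self.name = name
        self._loader = loader     # zero-argument callable that reads the geometry
        self._meta = dict(meta)   # cheap: available without touching geometry
        self._cached = None

    def get_meta(self):
        return dict(self._meta)

    def load(self):
        # The expensive read happens here, at most once in this sketch.
        if self._cached is None:
            self._cached = self._loader()
        return self._cached


calls = []


def read_geometry():
    """Stand-in for an expensive backend read; records that it was invoked."""
    calls.append("read")
    return {"points": [(0.0, 0.0, 0.0), (0.0, 10.0, 0.0)], "radii": [1.0, 0.5]}


stored = LazyMorphology("pyramidal", read_geometry, {"hash": "abc123"})
```

Code that only needs names and meta, such as selectors, can work with stored without ever triggering read_geometry.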
Writing¶
save(name, morphology, overwrite=False)
    Persist a Morphology under name. Raise MorphologyRepositoryError if the name exists and overwrite is false. Serialise the branch tree to your backend and write the morphology’s meta, which must include the hash so a PlacementSet write can read it back into morphology_hashes.
set_all_meta(all_meta) / update_all_meta(meta)
    Replace, or merge into, the {name: meta} map for the whole repository. Used to rewrite metadata without touching geometry.
Reading¶
all()
    Return a StoredMorphology for every stored morphology, preloaded from the meta map (no geometry read).
has(name)
    Whether a morphology of that name exists. Backs the concrete __contains__.
preload(name)
    Return a lazy StoredMorphology: its meta is read now, its geometry only when the caller calls .load() on it. This is what PlacementSet.load_morphologies collects, so it must be cheap.
load(name)
    Read the geometry and return a fully constructed Morphology: rebuild the branch tree from your stored point, radius, label and property arrays and attach the meta.
get_meta(name) / get_all_meta()
    The meta dict for one morphology, or the whole {name: meta} map. Raise MissingMorphologyError for an unknown name.
select(*selectors)
    Run MorphologySelector objects against all() and return the matching stored morphologies. Call each selector’s validate once before filtering, so a selector that names a missing morphology fails loudly:

        def select(self, *selectors):
            if not selectors:
                return []
            loaders = self.all()
            picked = []
            for selector in selectors:
                selector.validate(loaders)
                picked.extend(filter(selector.pick, loaders))
            return picked
Keeping provenance current¶
Every write path across the sub-interfaces keeps the provenance bundle in step with the data:
- placement writes (append_data, append_additional, the label* methods, clear) bump revision on the placement set and refresh morphology_hashes when morphology data changed;
- connectivity writes (connect, chunk_connect, clear) bump revision on the connectivity set;
- file writes (store, remove, store_active_config) record content_sha256 and producer per file;
- all of them bump the engine’s state_id (the cross-cutting _bump_state rule).
Backfilling a storage that lacks provenance. When an engine opens a root with no
provenance bundle (one written before the engine grew provenance support, or by a
third-party tool), it backfills one: the _upgrade_if_needed step stamps a fresh
bundle transparently and emits a single
BsbProvenanceUpgradeWarning. Read-only opens skip the backfill;
scaffold.storage_id and
scaffold.state_id are then None, and any
simulation result written against that scaffold records "storage_id": None in its
back-pointer. To force the backfill, reopen writable and trigger any mutation.
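The backfill decision can be summarised in a short sketch. Everything here is illustrative: upgrade_if_needed works on a plain dict standing in for the root attributes (None meaning the root does not exist yet), ProvenanceUpgradeWarning is a local stand-in for BsbProvenanceUpgradeWarning, the stamped_at key name and the initial state_id of 0 are assumptions, and the rest of the bundle is omitted.

```python
import uuid
import warnings
from datetime import datetime, timezone


class ProvenanceUpgradeWarning(UserWarning):
    """Local stand-in for the warning emitted when a bundle is backfilled."""


def upgrade_if_needed(root_attrs, readonly=False):
    """Stamp a fresh bundle on `root_attrs` if it lacks one; return True if stamped."""
    if root_attrs is None:
        return False  # the root does not exist yet; create will stamp it
    if readonly:
        return False  # read-only opens skip the backfill
    if "storage_id" in root_attrs:
        return False  # already stamped
    root_attrs.update(
        storage_id=str(uuid.uuid4()),
        state_id=0,  # assumed starting value
        stamped_at=datetime.now(timezone.utc).isoformat(),  # hypothetical key name
    )
    warnings.warn(
        "storage had no provenance bundle; a fresh one was stamped",
        ProvenanceUpgradeWarning,
        stacklevel=2,
    )
    return True
```

Because the second call finds a storage_id already present, the warning is emitted once per artefact and not on every open.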
Reference walkthrough¶
For a worked example, follow the
HDF5Engine implementation (full PS/CS
support) and the FileSystemEngine
implementation (metadata-only, atomic writes via tmpfile and os.replace()).
Both reuse build_root_metadata,
iso_now, and the shared plugin and host collectors
so the bundle stays consistent across engines.
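The tmpfile-plus-os.replace() pattern mentioned above looks roughly like this when written with the standard library alone. atomic_write is a hypothetical helper written for this page, not code taken from the FileSystemEngine.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write `data` (bytes) to `path` so readers never observe a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # swaps the file in, overwriting any old copy
    except BaseException:
        # Clean up the temporary file if anything went wrong before the swap.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A reader that opens path during the write sees either the old content or the new content in full, which suits small metadata files that several ranks may read while one rank rewrites them.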