Module: secbench.storage¶
This is the API documentation for the secbench.storage module. An interactive user guide is also available.
You will mainly use two classes from this module:
- Store, the storage abstraction layer. A Store contains zero or more Datasets.
- Dataset, which allows you to access and modify datasets.
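As a quick orientation, here is a minimal end-to-end sketch (the file name, dataset name, fields, and capacity are illustrative, not part of the API):

    from secbench.storage import Store

    store = Store("acquisition.hdf5", mode="a")  # open, creating the file if needed
    ds = store.create_dataset("traces", 100, "data", "plaintext")
    print(store.dataset_names())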
Storage types¶
The Store class is the top-level accessor for datasets.
- class Store¶
Loader for the Secbench HDF5-based dataset storage format.
- __init__(path, mode=OpenMode.read, temporary=False, **kwargs)¶
Load a new store from a path.
- Parameters:
path – Path or label of the dataset.
temporary – If True, the store is temporary and the open mode is ignored.
Here are the supported open modes:

Mode (string)   Mode (OpenMode)              Description
'r'             OpenMode.read                Read only, file must exist (default)
'r+'            OpenMode.read_write          Read/write, file must exist
'w'             OpenMode.create_truncate     Create file, truncate if exists
'w-'            OpenMode.create              Create file, fail if exists
'a'             OpenMode.read_write_create   Read/write if exists, create otherwise
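Both spellings can be passed as mode. A small sketch, assuming OpenMode is importable from secbench.storage alongside Store, as the signatures above suggest (the file name is illustrative):

    from secbench.storage import OpenMode, Store

    # Equivalent calls: read/write, creating the file if it does not exist.
    store = Store("traces.hdf5", mode="a")
    store = Store("traces.hdf5", mode=OpenMode.read_write_create)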
- classmethod open(path, mode=OpenMode.read, **kwargs)¶
Alias for the Store constructor, kept for backward compatibility.
- classmethod temporary(name)¶
Create a temporary store (backed by RAM).
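A minimal sketch (the store label is illustrative):

    from secbench.storage import Store

    # In-memory store, convenient for tests; nothing is written to disk.
    store = Store.temporary("scratch")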
- create_dataset(name, capacity, *fields)¶
Create a new dataset.
- Parameters:
name – Name of the dataset.
capacity – Maximum number of entries that can be stored in the dataset.
fields – One or more field names (as strings).
- Returns:
A Dataset object.
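For example, continuing with a store opened as above (the dataset name, capacity, and field names are illustrative):

    # Reserve room for 1000 rows, each with a "data" and a "plaintext" field.
    ds = store.create_dataset("traces", 1000, "data", "plaintext")
    print(ds.fields())  # fields are returned in the order given here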
- dataset_names()¶
List of datasets available in the store.
- datasets(callback=None)¶
Iterate over the datasets in a store.
- Parameters:
callback – An optional filter callback. It has the signature callback(name: str, group: h5py.Group) and should return True to keep an entry.
- Returns:
An iterator over the DatasetRef entries found.
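A short sketch of a filter callback (the name prefix is arbitrary):

    # Keep only datasets whose name starts with "aes_".
    def only_aes(name, group):
        return name.startswith("aes_")

    for ref in store.datasets(callback=only_aes):
        print(ref)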
- export_dataset(name, dst_store, new_name=None, shrink=True, chunk_size=100000)¶
Export a dataset from this store into another store.
- Parameters:
name – Name of the dataset to export.
dst_store – Destination store.
shrink – Only export valid data to the output dataset (i.e., do not reserve additional capacity).
new_name – Optional new name for the exported dataset. If not given, the source dataset name is used.
chunk_size – Number of traces loaded in RAM during export. A lower chunk_size may be beneficial on systems with little RAM.
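For instance, to copy a dataset into a fresh file (file and dataset names are illustrative):

    backup = Store("backup.hdf5", mode="w")
    store.export_dataset("traces", backup, new_name="traces_copy")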
- load_dataset(name)¶
Load an existing dataset.
- Parameters:
name – Name of the dataset to load.
- Returns:
A Dataset object.
- Raises:
KeyError – if the dataset does not exist.
ValueError – if loading failed.
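A small usage sketch:

    try:
        ds = store.load_dataset("traces")
    except KeyError:
        print("no dataset named 'traces' in this store")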
Then, the Dataset class is used to manage individual datasets.
- class Dataset¶
- __init__(backend, name, capacity, fields, size=0, initialized=False)¶
- add_asset(name, data, replace=False)¶
Add meta-information (an asset) to this dataset.
This data can be either raw bytes or a numpy array.
- Parameters:
name – identifier of the asset
data – content of the asset. It is stored as bytes in the HDF5.
replace – if True, overwrites existing entries
- add_json_asset(name, data, replace=False)¶
A shorthand to add a JSON asset to the current dataset.
This function is only a wrapper around add_asset(); the asset is stored as a regular asset.
- Parameters:
name – name of the asset.
data – content of the asset (should be a “JSON-serializable” object)
replace – if True, overwrites existing entries
- get_asset(name, raw=False)¶
Retrieve an asset from this dataset.
- Parameters:
name – name of the asset.
raw – if True, returns raw bytes instead of a numpy array.
- get_json_asset(name)¶
Load a JSON asset.
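A round-trip sketch combining the asset helpers above (asset names and payloads are illustrative):

    import numpy as np

    ds.add_asset("key", np.arange(16, dtype=np.uint8))
    ds.add_json_asset("params", {"cipher": "AES-128"})

    key = ds.get_asset("key")             # back as a numpy array
    params = ds.get_json_asset("params")  # back as a Python object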
- assets()¶
List of assets available in this dataset.
- fields()¶
List of fields available in this dataset.
Note
The fields are returned in the order passed in the constructor.
- get(*fields)¶
Access arrays associated with each field.
- Parameters:
fields – Zero or more field names.
- Returns:
A tuple of arrays, in the same order as requested in the arguments of this function. Those arrays have a numpy-compatible API.
- Raises:
KeyError – If a requested field does not exist.
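For example, with the two fields used earlier:

    # Unpack one array per requested field; indexing is numpy-like.
    data, plaintext = ds.get("data", "plaintext")
    print(data[0], plaintext[0])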
- reset()¶
Reset the size of a dataset to 0.
Once this is done, you can append() or extend() new data. This will erase existing data.
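For instance, assuming a dataset with two int32 fields (an illustrative layout):

    import numpy as np

    ds.reset()  # size back to 0; previous rows are discarded
    ds.append(np.int32(1), np.int32(2))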
- append(*args)¶
Add a single row in the dataset.
You must stick to the following rules:
- The order of fields is the same as returned by Dataset.fields().
- All fields are explicitly typed (for example, np.int32(3), not 3).
- The type of each field does not change between calls to append or extend.
- Parameters:
args – Value of each field. The order of fields must be the same as given to the constructor (or returned by Dataset.fields()).
- Raises:
ValueError – When the fields are invalid, or if the dataset is full.
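A sketch that respects these rules, assuming the dataset was created with fields ("data", "plaintext") as in the earlier examples:

    import numpy as np

    trace = np.zeros(1000, dtype=np.int8)  # explicitly typed array
    pt = np.zeros(16, dtype=np.uint8)
    ds.append(trace, pt)                   # one row, fields in order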
- extend(*args)¶
Add multiple rows in the dataset.
You must stick to the following rules:
- The order of fields is the same as returned by Dataset.fields().
- All fields are arrays, with the same first dimension.
- The type of each field does not change between calls to append or extend.
- Parameters:
args – Value of each field.
- Raises:
ValueError – When the fields are invalid, or if the dataset is full.
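A sketch of batched insertion under the same assumed field layout:

    import numpy as np

    # 50 rows at once: the first dimension must match across fields.
    traces = np.zeros((50, 1000), dtype=np.int8)
    pts = np.zeros((50, 16), dtype=np.uint8)
    ds.extend(traces, pts)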
Helpers¶
- version()¶
Current version of the module (as a string).