Module: secbench.storage

This is the API documentation for the secbench.storage module. An interactive user guide is also available.

You will mainly use two classes from this module:

  • Store, the storage abstraction layer. A Store contains zero or more Datasets.

  • Dataset, which allows you to access and modify datasets.

Shared datasets

Over the years, we have accumulated several datasets from real targets. To list the available datasets, use:

$ secbench-db list -l

From Python, you can use secbench.storage.load_shared_dataset(). If you need to list datasets, use secbench.storage.shared_datasets().
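
For example, a minimal sketch (the dataset identifier is a placeholder; use one reported by secbench-db list on your installation, and note that load_shared_dataset() is assumed here to return an open Store):

from secbench.storage import shared_datasets, load_shared_dataset

# List the identifiers of all shared datasets.
print(list(shared_datasets().keys()))

# Open one of them, read-only by default ("stm32_aes_v1" is a placeholder).
store = load_shared_dataset("stm32_aes_v1")
print(store.dataset_names())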

load_shared_dataset(name, mode='r', config_path=None, **kwargs)

Load a shared dataset by its identifier.

You can obtain the list of dataset identifiers with list(shared_datasets().keys()) in Python, or by running secbench-db list in a terminal. Full details on the available datasets are returned by shared_datasets().

Parameters:
  • name – identifier of the dataset, as found in shared_datasets() (or the secbench-db list command)

  • mode – mode used to open the file. Since datasets are shared, you will most likely open them in read-only mode (the default).

  • kwargs – the remaining arguments are forwarded to the Store.open() method.

shared_datasets(config_path=None, load_description=True)

Return information on all shared datasets as a nested dictionary.

The keys of the dictionary returned are the identifiers for datasets. They can be passed to load_shared_dataset() for loading a specific dataset.

Parameters:
  • load_description – if True, read the content of each dataset's “description_file” (if the key is present) into its “description” field.

  • config_path – path to the configuration to load. If not given, the dataset specification embedded in the secbench.storage package is used.
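
For example, a minimal sketch of inspecting the returned dictionary (the exact keys present in each entry depend on the dataset specification):

from secbench.storage import shared_datasets

for name, info in shared_datasets(load_description=True).items():
    # "description" is only present when the entry defines a "description_file".
    print(name, info.get("description", "<no description>"))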

Storage types

The Store class is the top-level accessor for datasets.

class Store

Loader for the Secbench HDF5-based dataset storage format.

__init__(path, mode=OpenMode.read, temporary=False, **kwargs)

Load a new store from a path.

Parameters:
  • path – path or label of the dataset.

  • temporary – if True, the store will be temporary and the open mode will be ignored.

Here are the supported open modes:

Mode (string)   Mode (OpenMode)               Description
'r'             OpenMode.read                 Read only, file must exist (default)
'r+'            OpenMode.read_write           Read/write, file must exist
'w'             OpenMode.create_truncate      Create file, truncate if exists
'w-'            OpenMode.create               Create file, fail if exists
'a'             OpenMode.read_write_create    Read/write if exists, create otherwise
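
For example, a minimal sketch (file names are placeholders; OpenMode is assumed to be importable from secbench.storage):

from secbench.storage import Store, OpenMode

# Open an existing store read-only; the string and enum forms are equivalent.
store = Store("campaign.hdf5", mode="r")
store = Store("campaign.hdf5", mode=OpenMode.read)

# Create a new store, failing if the file already exists.
new_store = Store("export.hdf5", mode="w-")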

classmethod open(path, mode=OpenMode.read, **kwargs)

Alias for the Store constructor, kept for backward compatibility.

classmethod temporary(name)

Create a temporary store (backed in RAM).
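
For example:

from secbench.storage import Store

# RAM-backed store, handy for tests or intermediate processing ("scratch" is just a label).
tmp = Store.temporary("scratch")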

create_dataset(name, capacity, *fields)

Create a new dataset.

Parameters:
  • name – Name of the dataset in the store.

  • capacity – Maximum number of entries that can be stored in the dataset.

  • fields – One or more field names (as strings).

Returns:

A Dataset object.
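
A minimal sketch (the dataset and field names are placeholders):

from secbench.storage import Store

store = Store("campaign.hdf5", mode="w")

# Reserve room for 1000 entries with two fields.
ds = store.create_dataset("aes_traces", 1000, "data", "plaintext")
print(ds.fields())  # fields come back in the order passed above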

dataset_names()

Return the names of the datasets available in the store.

datasets(callback=None)

Iterate datasets in a store.

Parameters:

callback – An optional filter callback. It has the signature callback(name: str, group: h5py.Group) and should return True to keep an entry.

Returns:

An iterator over the DatasetRef entries found.
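
For example, assuming store is an open Store, the following sketch keeps only datasets whose name starts with "aes":

def keep_aes(name, group):
    return name.startswith("aes")

for ref in store.datasets(callback=keep_aes):
    ds = ref.load()
    print(ds.fields())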

export_dataset(name, dst_store, new_name=None, shrink=True, chunk_size=100000)

Export a dataset from this store into another store.

Parameters:
  • name – Name of the dataset to export.

  • dst_store – Destination store.

  • shrink – Only export valid data in the output dataset (i.e., do not reserve additional capacity).

  • new_name – Optional new name for the exported dataset. If not given, use the source dataset name.

  • chunk_size – number of traces loaded in RAM during the export. A lower chunk_size may be beneficial on systems with little RAM.
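
A minimal sketch (file and dataset names are placeholders):

from secbench.storage import Store

src = Store("campaign.hdf5", mode="r")
dst = Store("export.hdf5", mode="w")

# Copy only the valid rows, loading 10000 traces at a time.
src.export_dataset("aes_traces", dst, new_name="aes_traces_export",
                   shrink=True, chunk_size=10000)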

load_dataset(name)

Load an existing dataset.

Parameters:

name – Name of the dataset to load.

Returns:

A Dataset object.

Raises:

Then, the Dataset class is used to manage individual datasets.

class Dataset

__init__(backend, name, capacity, fields, size=0, initialized=False)

add_asset(name, data, replace=False)

Add some meta information to this dataset.

This data can be either raw bytes or a numpy array.

Parameters:
  • name – identifier of the asset

  • data – content of the asset. It is stored as bytes in the HDF5.

  • replace – if True, overwrites existing entries
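
For example, assuming ds is a Dataset (the asset name and key value are placeholders):

import numpy as np

# Store the fixed AES key used during acquisition.
key = np.frombuffer(bytes.fromhex("000102030405060708090a0b0c0d0e0f"), dtype=np.uint8)
ds.add_asset("key", key, replace=True)

recovered = ds.get_asset("key")        # numpy array
raw = ds.get_asset("key", raw=True)    # raw bytes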

add_json_asset(name, data, replace=False)

A shorthand to add a JSON asset in the current dataset.

This function is only a wrapper around add_asset(); the data is stored as a regular asset.

Parameters:
  • name – name of the asset.

  • data – content of the asset (should be a “JSON-serializable” object)

  • replace – if True, overwrites existing entries

get_asset(name, raw=False)

Retrieve an asset from this dataset.

Parameters:
  • name – name of the asset.

  • raw – if True, returns raw bytes instead of a numpy array.

get_json_asset(name)

Load a JSON asset.
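
For example, assuming ds is a Dataset (the asset name and its content are placeholders):

params = {"samples": 5000, "sampling_rate_hz": 1.25e9}
ds.add_json_asset("acquisition_params", params, replace=True)

loaded = ds.get_json_asset("acquisition_params")
assert loaded["samples"] == 5000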

assets()

List of assets available in this dataset.

fields()

List of fields available in this dataset.

Note

The fields are returned in the order passed in the constructor.

get(*fields)

Access arrays associated with each field.

Parameters:

fields – Zero or more field names.

Returns:

A tuple of arrays, in the same order as the fields requested. The arrays have a numpy-compatible API.

Raises:

KeyError – If a requested field does not exist.
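
For example, assuming ds has the fields "data" and "plaintext":

data, plaintext = ds.get("data", "plaintext")
print(data.shape, plaintext.shape)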

reset()

Reset the size of a dataset to 0.

Resetting erases the existing data; once done, you can append() or extend() new data.

append(*args)

Add a single row in the dataset.

You must stick to the following rules:

  • The order of fields is the same as returned by Dataset.fields().

  • All fields are explicitly typed (for example, np.int32(3), not 3).

  • The type of each field does not change between calls to append or extend.

Parameters:

args – Value of each field. The order of fields must be the same as given to the constructor (or returned by Dataset.fields()).

Raises:

ValueError – When the fields are invalid, or if the dataset is full.
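
A minimal sketch, reusing the "data"/"plaintext" dataset created above:

import numpy as np

# One row: each value carries an explicit numpy type, in the field order.
trace = np.zeros(5000, dtype=np.int16)
plaintext = np.random.randint(0, 256, size=16, dtype=np.uint8)
ds.append(trace, plaintext)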

extend(*args)

Add multiple rows in the dataset.

You must stick to the following rules:

  • The order of fields is the same as returned by Dataset.fields().

  • All fields are arrays, with the same first dimension.

  • The type of each field does not change between calls to append or extend.

Parameters:

args – Value of each field.

Raises:

ValueError – When the fields are invalid, or if the dataset is full.
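
A minimal sketch, again with the "data"/"plaintext" fields:

import numpy as np

# 100 rows at once: every array shares the same first dimension.
traces = np.zeros((100, 5000), dtype=np.int16)
plaintexts = np.random.randint(0, 256, size=(100, 16), dtype=np.uint8)
ds.extend(traces, plaintexts)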

class DatasetRef

Reference for a Dataset.

This class is simply a tuple (HDF5 object, Path of the dataset).

__init__(store, name)

load()

Load the dataset referenced by this object.

Returns:

A Dataset.

Helpers

version()

Current version of the module (as a string).