Secbench Storage

You can also run this tutorial interactively: download the notebook and open it in Jupyter inside a secbench environment.

Overview

The secbench storage API is a thin layer on top of HDF5 files (see h5py Documentation).

It provides a Store class, which abstracts the storage backend, such as an HDF5 file. A store contains zero or more Dataset objects.

A Dataset object represents an aggregation of several measurements. You can think of a dataset as a 2-dimensional array, where each row contains different fields (for example, x, y, z, temperature).

In side-channel analysis, we usually create a dataset for each acquisition campaign, with fields like:

  • side-channel measurements from the scope,

  • the plaintext associated with each trace,

  • the ciphertext associated with each trace.

In this walkthrough, we will see how to create a dataset and read it.

We start with some imports.

import json

# All data are represented and stored as numpy arrays
import numpy as np

# Dataset and Store classes from the secbench storage API
from secbench.storage import Dataset, Store

Creating Datasets

The tutorial uses generated data, so that the notebook is self-contained.

We generate a 50 MB dataset. This is not enough to really showcase the performance of the library, but feel free to adapt the parameters to stress-test it on your machine.

capacity = 10_000
samples = 5_000
data = np.random.randint(-128, 127, size=(capacity, samples), dtype=np.int8)
pts = np.random.randint(0, 256, size=(capacity, 16), dtype=np.uint8)
print(f'Dataset size: {data.size / 1e6} MB')
Dataset size: 50.0 MB

The first step before creating or loading a dataset is to open a destination Store. For this, we can use the secbench.storage.Store constructor directly or the secbench.storage.Store.open() class method.

We create one called “walkthrough.hdf5”.

Note that we open this file in ‘w’ mode, which clears the file at each execution. This mode is convenient for the tutorial. In practice, however, we recommend the ‘a’ mode when writing a dataset and the ‘r’ mode when reading one. The modes supported by the secbench.storage.Store.__init__() constructor are:

Mode (string)   Mode (OpenMode)              Description
‘r’             OpenMode.read                Read only, file must exist (default)
‘r+’            OpenMode.read_write          Read/write, file must exist
‘w’             OpenMode.create_truncate     Create file, truncate if exists
‘w-’            OpenMode.create              Create file, fail if exists
‘a’             OpenMode.read_write_create   Read/write if exists, create otherwise
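For instance, the string mode and the corresponding OpenMode value should be interchangeable. A quick sketch (we assume here that OpenMode is importable from secbench.storage):

from secbench.storage import OpenMode

# Assumption of this sketch: OpenMode is exported by secbench.storage.
# These two calls should be equivalent:
scratch = Store('scratch.hdf5', mode='w')
scratch.close()
scratch = Store('scratch.hdf5', mode=OpenMode.create_truncate)
scratch.close()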

store = Store('walkthrough.hdf5', mode='w')

Now, we can create a dataset called “my_acquisition” (the name should be chosen to easily identify datasets). This dataset will have a capacity of 10_000 entries (the capacity defined above) and two fields: “data” and “plaintext”.

It means that this dataset can hold at most 10_000 pairs of “data” and “plaintext”. When created, the dataset is empty.

ds = store.create_dataset('my_acquisition', capacity, 'data', 'plaintext')

We can introspect various properties of the dataset. The size attribute represents the number of rows (entries) currently in the dataset.

print("fields:", ds.fields())
print("capacity:", ds.capacity)
print("size", ds.size)
fields: ['data', 'plaintext']
capacity: 10000
size 0

The first way to add entries to the dataset is to use the secbench.storage.Dataset.append() method, which adds a single entry.

%%time
for i in range(1000):
    ds.append(data[i], pts[i])
CPU times: user 61 ms, sys: 3.93 ms, total: 65 ms
Wall time: 64.9 ms

We can see that the size of the dataset was updated.

print(ds.size)
1000

However, a much faster way to insert traces (20x faster!) is to use the extend() method, which adds many entries at once.

%%time
ds.extend(data[1000:2000], pts[1000:2000])
CPU times: user 1.98 ms, sys: 193 μs, total: 2.18 ms
Wall time: 1.95 ms

IMPORTANT: On the first call to extend() or append(), these methods inspect the type and shape of all fields (here data and pts) and allocate the corresponding storage in the HDF5 file. This implies that the arguments to append (or extend) must always have the same type and shape.
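For example, now that the layout of ds is fixed, appending a trace with a different shape is expected to be rejected. A minimal sketch (the exact exception type may vary):

try:
    # The 'data' field was allocated for 5000-sample int8 traces,
    # so a shorter trace no longer matches the recorded shape.
    ds.append(data[0][:100], pts[0])
except Exception as e:
    print("!! append rejected:", e)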

Once some rows are in the dataset, you can access the underlying arrays directly with the get() method. This method returns each array with its full capacity; the data is only valid on the slice [:ds.size].

ds_data, ds_plaintext = ds.get('data', 'plaintext')
print(ds_data[0])
print(data[0])

print(ds_data.shape)
[  41  -90  -65 ... -104   32  -86]
[  41  -90  -65 ... -104   32  -86]
(10000, 5000)
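Since the returned arrays have the full capacity, remember to slice them to ds.size before processing:

# Only the first ds.size rows (2000 at this point) contain valid data.
valid_traces = ds_data[:ds.size]
valid_pts = ds_plaintext[:ds.size]
print(valid_traces.shape, valid_pts.shape)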

If something went wrong, you can reset the dataset and reinsert data.

print('size (before reset)', ds.size)
ds.reset()
print('size (after reset)', ds.size)
size (before reset) 2000
size (after reset) 0

Now, we store the full dataset in one shot. Thanks to HDF5 buffering, storing 50 MB is nearly instant, and even inserting gigabytes is very fast!

%%time
ds.extend(data, pts)
CPU times: user 0 ns, sys: 15.1 ms, total: 15.1 ms
Wall time: 14.9 ms

Adding Assets

It is very common to add constant data to a dataset (e.g., the secret key used, the scope configuration). Such constants are called assets and can be numpy arrays or raw bytes. Assets are managed with three methods: add_asset(), get_asset(), and assets().

In addition, we provide two helpers for JSON-serializable objects: add_json_asset() and get_json_asset().

ds_2 = store.create_dataset("dataset_with_assets", 10, "x", "y")
ds_2.append(np.array([1, 2]), np.array([3, 5]))

Let’s insert some assets:

ds_2.add_asset("name_of_the_asset", np.arange(100, dtype=np.int16))
ds_2.add_asset("name_of_byte_asset", b"coucou")


scope_config = {"samples": 100, "precision": 1e-3}
ds_2.add_json_asset("scope_config.json", scope_config)

Note

Here, we crafted a dummy scope config manually. In a real acquisition, you may find the secbench.api.instrument.Scope.config() method helpful to obtain this JSON object.
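For reference, here is a hypothetical sketch of what this could look like during a real acquisition (not run in this notebook; scope is a placeholder for your actual instrument):

# Hypothetical sketch, not executed here: `scope` stands for your
# actual instrument obtained from the bench.
# scope_config = scope.config()
# ds_2.add_json_asset("scope_config.json", scope_config)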

Now, we can check that the assets are present in the dataset and load them back.

ds_2.assets()
['name_of_byte_asset', 'name_of_the_asset', 'scope_config.json']
ds_2.get_asset("name_of_the_asset")
<HDF5 dataset "name_of_the_asset": shape (100,), type "<i2">
ds_2.get_json_asset("scope_config.json")
{'samples': 100, 'precision': 0.001}
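As the output above shows, get_asset() returns an h5py dataset handle rather than a numpy array. If you need a plain array, slicing the handle materializes it:

# Slicing an h5py dataset handle loads it as a numpy array.
raw_asset = ds_2.get_asset("name_of_the_asset")[:]
print(raw_asset.dtype, raw_asset.shape)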

Exercise

It is time for you to dive in!

  • (1) Create a dataset named “exercice_1”, with 4 fields “x”, “y”, “z”, “power”, and a capacity of 1000 elements.

  • (2) Add a single entry into it. We assume that “x”, “y” and “z” have type np.float32.

    • (a) Try something like ds_ex.append(3.0, 4.5, 5.0, power). Why does it fail?

    • (b) To fix this issue, try to pass explicitly typed scalar values: np.float32(3.0) instead of 3.0.

  • (3) Fill the rest of the dataset with extend.

  • (4) Try to add an additional entry and see what happens.

  • (5) Add an asset to the dataset.

# Solution to (1) 
ds_ex = store.create_dataset('exercice_1', 1000, 'x', 'y', 'z', 'power')
print(ds_ex.fields())
['x', 'y', 'z', 'power']
# Solution to (2)
try:
    ds_ex.append(3.0, 4.5, 5.0, np.random.random(10))
except AttributeError as e:
    print("!! append raised an exception:", e)
# (2.a) It fails because the values are not explicitly typed numpy scalars.

# (2.b) correct call:
ds_ex.append(np.float32(3.0), np.float32(4.5), np.float32(5.0), np.random.random(10))
print("ds_ex, size after insertion:", ds_ex.size)
!! append raised an exception: 'float' object has no attribute 'shape'
ds_ex, size after insertion: 1
# Solution to (3)
coords = np.random.random(1000 - 1).astype(np.float32)
power = np.random.random(size=(1000 - 1, 10))

ds_ex.extend(coords, coords, coords, power)
# Solution to (4)
try:
    ds_ex.append(np.float32(3.0), np.float32(4.5), np.float32(5.0), np.random.random(10))
except ValueError as e:
    print("!! append raised an exception:", e)
!! append raised an exception: this dataset is full, cannot append.
# Solution to (5)
ds_ex.add_asset("demo_asset", b"anything you want")

To continue the tutorial, we close the HDF5 file used for creating the dataset.

store.close()
del store

Loading Datasets

When working in the analysis stage, we recommend opening HDF5 files in read-only mode to prevent unexpected modifications.

store = Store.open('walkthrough.hdf5', mode='r')

You can list the available datasets with the datasets() method.

list(store.datasets())
[<DatasetRef: name="exercice_1">,
 <DatasetRef: name="dataset_with_assets">,
 <DatasetRef: name="my_acquisition">]

You can also check if a specific dataset is available, or iterate over dataset names:

print("'my_acquisition' defined:", "my_acquisition" in store)
print("'not_existing' defined:", "not_existing" in store)

print("\nDatasets:")
for name in store:
    print("-", name)
'my_acquisition' defined: True
'not_existing' defined: False

Datasets:
- exercice_1
- dataset_with_assets
- my_acquisition

Then, you can open the dataset with load_dataset() or using the store["dataset_name"] syntax, as follows:

ds_rd = store["my_acquisition"]
print("size:", ds_rd.size, "/", ds_rd.capacity)
size: 10000 / 10000

You can then access the fields using the get() method (or the dataset["field_name"] syntax):

data_rd = ds_rd["data"]
print(data_rd[0])
print(data[0])
[  41  -90  -65 ... -104   32  -86]
[  41  -90  -65 ... -104   32  -86]

In addition, you can easily check if a field is available or iterate through field names:

print("'data' field exists:", "data" in ds_rd)
print("'aa' field exists:", "aa" in ds_rd)

print("\nFields:")
for name in ds_rd:
    print("-", name)
'data' field exists: True
'aa' field exists: False

Fields:
- data
- plaintext

If you opened the file in read/write mode (e.g., ‘a’ mode), you can continue to push rows into the dataset or reset it. In read-only mode, however, these operations will fail.

try:
    ds_rd.reset()
    ds_rd.append(data[0], pts[0])
except Exception as e:
    print("!! got runtime error:", e)
!! got runtime error: 'Unable to delete attribute (no write intent on file)'
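If you later need to add more rows, the workflow would be to reopen the store with write intent. A sketch (left commented out here, since my_acquisition is already at full capacity):

# Sketch: reopen the store in 'a' mode to resume writing.
# store_rw = Store('walkthrough.hdf5', mode='a')
# ds_rw = store_rw['my_acquisition']
# ds_rw.reset()                  # or append()/extend() if capacity remains
# store_rw.close()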

Command Line Interface

The command line tool secbench-db allows direct interaction with a dataset.

Exporting a Dataset

You can inspect a store with the status subcommand and export datasets with the export subcommand (or, from Python, using the export() method).

!secbench-db status walkthrough.hdf5
ROOT
 +-- exercice_1
 |   +-- capacity: 1000
 |   +-- size: 1000
 |   +-- fields
 |   |   +-- x: shape=(1000,), dtype=float32
 |   |   +-- y: shape=(1000,), dtype=float32
 |   |   +-- z: shape=(1000,), dtype=float32
 |   |   +-- power: shape=(1000, 10), dtype=float64
 |   +-- assets
 |       +-- demo_asset: shape=(17,), dtype=uint8
 +-- my_acquisition
 |   +-- capacity: 10000
 |   +-- size: 10000
 |   +-- fields
 |       +-- data: shape=(10000, 5000), dtype=int8
 |       +-- plaintext: shape=(10000, 16), dtype=uint8
 +-- dataset_with_assets
     +-- capacity: 10
     +-- size: 1
     +-- fields
     |   +-- x: shape=(10, 2), dtype=int64
     |   +-- y: shape=(10, 2), dtype=int64
     +-- assets
         +-- name_of_byte_asset: shape=(6,), dtype=uint8
         +-- name_of_the_asset: shape=(100,), dtype=int16
         +-- scope_config.json: shape=(36,), dtype=uint8
!rm -f walkthrough_export.hdf5
!secbench-db export -o walkthrough_export.hdf5 --rename exercice_1_exp walkthrough.hdf5 exercice_1
!secbench-db status walkthrough_export.hdf5
ROOT
 +-- exercice_1_exp
     +-- capacity: 1000
     +-- size: 1000
     +-- fields
     |   +-- x: shape=(1000,), dtype=float32
     |   +-- y: shape=(1000,), dtype=float32
     |   +-- z: shape=(1000,), dtype=float32
     |   +-- power: shape=(1000, 10), dtype=float64
     +-- assets
         +-- demo_asset: shape=(17,), dtype=uint8

Cleanup

!rm -f walkthrough.hdf5 walkthrough_export.hdf5