Secbench Storage
You can also run this tutorial in a secbench environment: download the notebook and open it in Jupyter inside the secbench environment.
Overview
The secbench storage API is a thin layer on top of HDF5 files (see the h5py documentation). It provides a Store class, which abstracts the storage backend, such as an HDF5 file. A store contains zero or more Datasets.
A Dataset object represents an aggregation of several measurements. You can think of a dataset as a 2-dimensional array, where each row contains different fields (for example, x, y, z, temperature).
In side-channel analysis, we usually create a dataset for each acquisition campaign, with fields like:
side-channel measurements from the scope,
the plaintext associated with each trace,
the ciphertext associated with each trace.
In this walkthrough, we will see how to create a dataset and read it.
We start with some imports.
import json
# All data are represented and stored as numpy arrays
import numpy as np
# Dataset and Store are the two main abstractions of secbench.storage
from secbench.storage import Dataset, Store
Creating Datasets
The tutorial uses generated data, so that the notebook is self-contained.
We generate a 50 MB dataset. This is too small to really demonstrate the library's performance, but feel free to adapt the parameters to test on your machine.
capacity = 10_000
samples = 5_000
data = np.random.randint(-128, 127, size=(capacity, samples), dtype=np.int8)
pts = np.random.randint(0, 256, size=(capacity, 16), dtype=np.uint8)
print(f'Dataset size: {data.size / 1e6} MB')
Dataset size: 50.0 MB
The first step before creating or loading a dataset is to open a destination Store. For this, we use the secbench.storage.Store.open() class method. We create one called “walkthrough.hdf5”.
Note that we open this file in ‘w’ mode, which clears the file at each execution. This mode is convenient for the tutorial. In practice, however, we recommend the ‘a’ mode when writing a dataset and the ‘r’ mode when reading one. The modes supported by the secbench.storage.Store.__init__() constructor are:
Mode | Description
---|---
‘r’ | Read only, file must exist (default)
‘r+’ | Read/write, file must exist
‘w’ | Create file, truncate if exists
‘w-’ | Create file, fail if exists
‘a’ | Read/write if exists, create otherwise
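In a real acquisition, following the recommendation above, you would typically open the store in ‘a’ mode. A minimal sketch, with a hypothetical file name:
# 'a' mode reuses the file if it exists and creates it otherwise,
# so an interrupted campaign can be resumed without losing previous traces.
acq_store = Store('my_campaign.hdf5', mode='a')
acq_store.close()
For this tutorial, we use the ‘w’ mode: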
store = Store('walkthrough.hdf5', mode='w')
Now, we can create a dataset called “my_acquisition” (the name should be chosen to easily identify datasets). This dataset will have a capacity of 10_000 entries and two fields: “data” and “plaintext”.
It means that this dataset can hold at most 10_000 pairs of “data” and “plaintext”. When created, the dataset is empty.
ds = store.create_dataset('my_acquisition', capacity, 'data', 'plaintext')
We can inspect various properties of the dataset. The size attribute represents the number of rows (entries) currently in the dataset.
print("fields:", ds.fields())
print("capacity:", ds.capacity)
print("size", ds.size)
fields: ['data', 'plaintext']
capacity: 10000
size 0
The first way to add entries to the dataset is the secbench.storage.Dataset.append() method, which adds a single entry.
%%time
for i in range(1000):
ds.append(data[i], pts[i])
CPU times: user 61 ms, sys: 3.93 ms, total: 65 ms
Wall time: 64.9 ms
We can see that the size of the dataset was updated.
print(ds.size)
1000
However, a much faster way to insert traces (more than 20× faster here) is the extend() method, which adds many entries at once.
%%time
ds.extend(data[1000:2000], pts[1000:2000])
CPU times: user 1.98 ms, sys: 193 μs, total: 2.18 ms
Wall time: 1.95 ms
IMPORTANT: On the first call to extend() or append(), these methods look at the type and shape of all fields (here data and pts) and allocate the corresponding storage in the HDF5 file. This implies that the arguments to append (or extend) must always have the same type and shape.
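As an illustration, a row whose shape differs from the layout fixed by the first call should be rejected (a sketch, not executed as part of this walkthrough; the exact exception type may vary):
try:
    # 'data' rows were fixed to shape (5000,) by the calls above;
    # a shorter array no longer matches.
    ds.append(data[0][:100], pts[0])
except Exception as e:
    print("!! append rejected:", e)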
Once some rows are in the dataset, you can access the underlying arrays directly with the get() method. This method returns each array with its full capacity; the data is only valid on the slice [:ds.size].
ds_data, ds_plaintext = ds.get('data', 'plaintext')
print(ds_data[0])
print(data[0])
print(ds_data.shape)
[ 41 -90 -65 ... -104 32 -86]
[ 41 -90 -65 ... -104 32 -86]
(10000, 5000)
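Since get() returns arrays at full capacity, restrict processing to the valid slice. A minimal sketch:
# Only the first ds.size rows contain data written so far.
valid_data = ds_data[:ds.size]
valid_pts = ds_plaintext[:ds.size]
print(valid_data.shape)  # (2000, 5000) at this point: 1000 appended + 1000 extended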
If something went wrong, you can reset the dataset and reinsert data.
print('size (before reset)', ds.size)
ds.reset()
print('size (after reset)', ds.size)
size (before reset) 2000
size (after reset) 0
Now, we store the full dataset in one shot. Thanks to HDF5’s buffering, storing 50 MB is instant, and inserting gigabytes is also very fast!
%%time
ds.extend(data, pts)
CPU times: user 0 ns, sys: 15.1 ms, total: 15.1 ms
Wall time: 14.9 ms
Adding Assets
It is very common to store constant data alongside a dataset (e.g., the secret key used, the scope configuration). Such constant data, either a numpy array or raw bytes, is called an asset. You manage assets with three methods:
- add_asset(), to insert or replace an asset,
- get_asset(), to retrieve the content of an asset,
- assets(), to list the available assets.
In addition, we provide two helpers:
- add_json_asset(), to encode a Python object in JSON format and store it as an asset,
- get_json_asset(), to load an asset stored in JSON format.
ds_2 = store.create_dataset("dataset_with_assets", 10, "x", "y")
ds_2.append(np.array([1, 2]), np.array([3, 5]))
Let’s insert some assets:
ds_2.add_asset("name_of_the_asset", np.arange(100, dtype=np.int16))
ds_2.add_asset("name_of_byte_asset", b"coucou")
scope_config = {"samples": 100, "precision": 1e-3}
ds_2.add_json_asset("scope_config.json", scope_config)
Note
Here, we crafted a dummy scope config manually. In a real acquisition, you may find the secbench.api.instrument.Scope.config() method helpful to obtain this JSON object.
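For instance (a sketch, not executed here; my_scope stands for a hypothetical Scope instance obtained from your bench):
# my_scope is a hypothetical secbench Scope instance.
ds_2.add_json_asset("scope_config.json", my_scope.config())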
Now, we can check that the assets are present in the dataset and load them back.
ds_2.assets()
['name_of_byte_asset', 'name_of_the_asset', 'scope_config.json']
ds_2.get_asset("name_of_the_asset")
<HDF5 dataset "name_of_the_asset": shape (100,), type "<i2">
ds_2.get_json_asset("scope_config.json")
{'samples': 100, 'precision': 0.001}
Exercise
It is time for you to dive in!
(1) Create a dataset named “exercice_1”, with 4 fields “x”, “y”, “z”, “power”, and a capacity of 1000 elements.
(2) Add a single entry into it. We assume that “x”, “y” and “z” have type np.float32.
(a) Try something like ds_ex.append(3.0, 4.5, 5.0, power); why does it fail?
(b) To fix this issue, pass explicitly typed scalar values: np.float32(3.0) instead of 3.0.
(3) Fill the rest of the dataset with extend.
(4) Try to add an additional entry and see what happens.
(5) Add an asset in the dataset.
# Solution to (1)
ds_ex = store.create_dataset('exercice_1', 1000, 'x', 'y', 'z', 'power')
print(ds_ex.fields())
['x', 'y', 'z', 'power']
# Solution to (2)
try:
ds_ex.append(3.0, 4.5, 5.0, np.random.random(10))
except AttributeError as e:
print("!! append raised an exception:", e)
# (2.a) It fails because the values are not explicitly typed (plain floats have no shape or dtype).
# (2.b) correct call:
ds_ex.append(np.float32(3.0), np.float32(4.5), np.float32(5.0), np.random.random(10))
print("ds_ex, size after insertion:", ds_ex.size)
!! append raised an exception: 'float' object has no attribute 'shape'
ds_ex, size after insertion: 1
# Solution to (3)
coords = np.random.random(1000 - 1).astype(np.float32)
power = np.random.random(size=(1000 - 1, 10))
ds_ex.extend(coords, coords, coords, power)
# Solution to (4)
try:
ds_ex.append(np.float32(3.0), np.float32(4.5), np.float32(5.0), np.random.random(10))
except ValueError as e:
print("!! append raised an exception:", e)
!! append raised an exception: this dataset is full, cannot append.
# Solution to (5)
ds_ex.add_asset("demo_asset", b"anything you want")
To continue the tutorial, we close the HDF5 file used for creating the dataset.
store.close()
del store
Loading Datasets
When working in the analysis stage, we recommend opening HDF5 files in read-only mode to prevent unexpected modifications.
store = Store.open('walkthrough.hdf5', mode='r')
You can list the available datasets with the datasets() method.
list(store.datasets())
[<DatasetRef: name="exercice_1">,
<DatasetRef: name="dataset_with_assets">,
<DatasetRef: name="my_acquisition">]
You can also check if a specific dataset is available, or iterate over dataset names:
print("'my_acquisition' defined:", "my_acquisition" in store)
print("'not_existing' defined:", "not_existing" in store)
print("\nDatasets:")
for name in store:
print("-", name)
'my_acquisition' defined: True
'not_existing' defined: False
Datasets:
- exercice_1
- dataset_with_assets
- my_acquisition
Then, you can open a dataset with load_dataset() or with the store["dataset_name"] syntax, as follows:
ds_rd = store["my_acquisition"]
print("size:", ds_rd.size, "/", ds_rd.capacity)
size: 10000 / 10000
You can then access the fields using the get() method (or the dataset["field name"] syntax):
data_rd = ds_rd["data"]
print(data_rd[0])
print(data[0])
[ 41 -90 -65 ... -104 32 -86]
[ 41 -90 -65 ... -104 32 -86]
In addition, you can easily check if a field is available or iterate through field names:
print("'data' field exists:", "data" in ds_rd)
print("'aa' field exists:", "aa" in ds_rd)
print("\nFields:")
for name in ds_rd:
print("-", name)
'data' field exists: True
'aa' field exists: False
Fields:
- data
- plaintext
If you opened the file in read/write mode (e.g., ‘a’), you can continue to push rows into the dataset or reset it. In read-only mode, however, these operations will fail.
try:
ds_rd.reset()
ds_rd.append(data[0], pts[0])
except Exception as e:
print("!! got runtime error:", e)
!! got runtime error: 'Unable to delete attribute (no write intent on file)'
Command Line Interface
The command line tool secbench-db allows direct interaction with a dataset.
Exporting a Dataset
You can export datasets from the command line, or using the export() method.
!secbench-db status walkthrough.hdf5
ROOT
+-- exercice_1
| +-- capacity: 1000
| +-- size: 1000
| +-- fields
| | +-- x: shape=(1000,), dtype=float32
| | +-- y: shape=(1000,), dtype=float32
| | +-- z: shape=(1000,), dtype=float32
| | +-- power: shape=(1000, 10), dtype=float64
| +-- assets
| +-- demo_asset: shape=(17,), dtype=uint8
+-- my_acquisition
| +-- capacity: 10000
| +-- size: 10000
| +-- fields
| +-- data: shape=(10000, 5000), dtype=int8
| +-- plaintext: shape=(10000, 16), dtype=uint8
+-- dataset_with_assets
+-- capacity: 10
+-- size: 1
+-- fields
| +-- x: shape=(10, 2), dtype=int64
| +-- y: shape=(10, 2), dtype=int64
+-- assets
+-- name_of_byte_asset: shape=(6,), dtype=uint8
+-- name_of_the_asset: shape=(100,), dtype=int16
+-- scope_config.json: shape=(36,), dtype=uint8
!rm -f walkthrough_export.hdf5
!secbench-db export -o walkthrough_export.hdf5 --rename exercice_1_exp walkthrough.hdf5 exercice_1
!secbench-db status walkthrough_export.hdf5
ROOT
+-- exercice_1_exp
+-- capacity: 1000
+-- size: 1000
+-- fields
| +-- x: shape=(1000,), dtype=float32
| +-- y: shape=(1000,), dtype=float32
| +-- z: shape=(1000,), dtype=float32
| +-- power: shape=(1000, 10), dtype=float64
+-- assets
+-- demo_asset: shape=(17,), dtype=uint8
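If you prefer to stay in Python, a similar copy can be sketched using only the APIs demonstrated in this tutorial (the built-in export() method is the more direct route; see the API reference for its exact signature). A sketch, with walkthrough_copy.hdf5 as a hypothetical output file:
# 'store' is the read-only Store opened earlier in this walkthrough.
dst = Store('walkthrough_copy.hdf5', mode='w')
ds_src = store['exercice_1']
ds_dst = dst.create_dataset('exercice_1_exp', ds_src.capacity, *ds_src.fields())
# Copy the valid rows of every field in a single extend() call.
ds_dst.extend(*(ds_src[f][:ds_src.size] for f in ds_src.fields()))
# Copy assets; [...] reads each HDF5 dataset into a numpy array.
for name in ds_src.assets():
    ds_dst.add_asset(name, ds_src.get_asset(name)[...])
dst.close()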
Cleanup
!rm -f walkthrough.hdf5 walkthrough_export.hdf5