Datasets
Open-source AES datasets
The standard version of AISY Framework comes with the definitions of open-source side-channel analysis datasets.
The user can download the datasets by running the following commands in a terminal:
Windows 10
curl.exe -o ASCAD_data.zip https://www.data.gouv.fr/s/resources/ascad/20180530-163000/ASCAD_data.zip
curl.exe -o ascad-variable.h5 https://static.data.gouv.fr/resources/ascad-atmega-8515-variable-key/20190903-083349/ascad-variable.h5
curl.exe -o ches_ctf.h5 http://aisylabdatasets.ewi.tudelft.nl/aes_hd.h5
curl.exe -o ches_ctf.h5 http://aisylabdatasets.ewi.tudelft.nl/aes_hd_ext.h5
Linux
wget https://www.data.gouv.fr/s/resources/ascad/20180530-163000/ASCAD_data.zip
wget https://static.data.gouv.fr/resources/ascad-atmega-8515-variable-key/20190903-083349/ascad-variable.h5
wget http://aisylabdatasets.ewi.tudelft.nl/aes_hd.h5
wget http://aisylabdatasets.ewi.tudelft.nl/aes_hd_ext.h5
Creating datasets for AISY Framework
.h5 datasets
In the academic community, the most used datasets in deep learning-based SCA are ASCAD trace sets from ANSSI (France) github repository.
The authors of ASCAD repository delivered source code
to generate .h5
datasets according to their specific format. The lines 127-145 from file ASCAD_generate.py
provide an example of
how to generate .h5
datasets.
If the user wants to process datasets in the .h5
format, AISY Framework expects files generated according to ASCAD database.
The only limitation found in ASCAD_generate.py
code is that it does not generate ciphertexts in the metadata field. Therefore, we
modified the code in order to overcome this limitation.
out_file = h5py.File('new_dataset.h5', 'w')
profiling_traces_group = out_file.create_group("Profiling_traces")
attack_traces_group = out_file.create_group("Attack_traces")
profiling_traces_group.create_dataset(name="traces", data=train_samples, dtype=train_samples.dtype)
attack_traces_group.create_dataset(name="traces", data=test_samples, dtype=test_samples.dtype)
metadata_type_profiling = np.dtype([("plaintext", profiling_plaintext.dtype, (len(profiling_plaintext[0]),)),
("ciphertext", profiling_ciphertext.dtype, (len(profiling_ciphertext[0]),)),
("key", profiling_key.dtype, (len(profiling_key[0]),)),
("mask", profiling_key.dtype, (len(profiling_key[0]),))
])
metadata_type_attack = np.dtype([("plaintext", attack_plaintext.dtype, (len(attack_plaintext[0]),)),
("ciphertext", attack_ciphertext.dtype, (len(attack_ciphertext[0]),)),
("key", attack_key.dtype, (len(attack_key[0]),)),
("mask", attack_key.dtype, (len(attack_key[0]),))
])
profiling_metadata = np.array([(profiling_plaintext[n], profiling_ciphertext[n], profiling_key[n], profiling_mask[n]) for n, k in
zip(profiling_index, range(0, len(train_samples)))], dtype=metadata_type_profiling)
profiling_traces_group.create_dataset("metadata", data=profiling_metadata, dtype=metadata_type_profiling)
attack_metadata = np.array([(attack_plaintext[n], attack_ciphertext[n], attack_key[n], attack_mask[n]) for n, k in
zip(attack_index, range(0, len(test_samples)))], dtype=metadata_type_attack)
attack_traces_group.create_dataset("metadata", data=attack_metadata, dtype=metadata_type_attack)
out_file.flush()
out_file.close()
Here is an example of how to read .h5
dataset in the ASCAD format:
in_file = h5py.File("my_location/my_dataset.h5", "r")
# reading trace samples
profiling_samples = numpy.array(in_file['Profiling_traces/traces'], dtype=numpy.float64)
attack_samples = numpy.array(in_file['Attack_traces/traces'], dtype=numpy.float64)
# reading trace plaintexts
profiling_plaintext = in_file['Profiling_traces/metadata']['plaintext']
attack_plaintext = in_file['Attack_traces/metadata']['plaintext']
# reading trace ciphertexts
profiling_ciphertext = in_file['Profiling_traces/metadata']['ciphertext']
attack_ciphertext = in_file['Attack_traces/metadata']['ciphertext']
# reading trace keys
profiling_key = in_file['Profiling_traces/metadata']['key']
attack_key = in_file['Attack_traces/metadata']['key']
# reading trace masks
profiling_mask = in_file['Profiling_traces/metadata']['mask']
attack_mask = in_file['Attack_traces/metadata']['mask']
Other dataset formats
.npz
and .csv
dataset formats will be available in future releases.
Dataset root folder
The root location of datasets can be defined in two ways.
Defining datasets location inapp.py
file
The root location of datasets is defined in app.py
file as follows:
datasets_root_folder = "my_location/datasets/"
The user is free to change this location.
Defining database location in script file
The user can also easily set the dataset location in the main script, as in the example below:
aisy = AisyAes()
aisy.set_datasets_root_folder("my_location/datasets/")
aisy.set_dataset("ascad-variable.h5")
Datasets Specifications
As for the location, specifications for the datasets can be done either in the main script or, as recommended, in a dictionary containing
six specifications for all datasets in custom/custom_datasets/datasets.py
file. The possible specifications are:
"file_name":
string setting the file name of the dataset. Example:"filename": "ascad-variable.h5"
."key":
hex string defining the dataset key. Example:"key": "00112233445566778899AABBCCDDEEFF"
."first_sample":
integer defining the index of the first sample in each trace to be analysed. Example:"first_sample": 0
."number_of_samples":
integer defining the number of samples in each trace to be analysed. Example:"number_of_samples": 700
."number_of_profiling_traces":
integer defining the number of profiling traces. Example:"number_of_profiling_traces": 200000
."number_of_attack_traces":
integer defining the number of attack traces. Example:"number_of_attack_traces": 10000
.
If the user decides to set the dataset specification in custom/custom_datasets/datasets.py
file, the dataset is called from the main
script according to the name defined in the dictionary. In the main script:
aisy = AisyAes()
aisy.set_dataset("ascad-variable.h5")
And in the dictionary in custom/custom_datasets/datasets.py
:
datasets_dict = {
"ascad-variable.h5": {
"filename": "ascad-variable.h5",
"key": "00112233445566778899AABBCCDDEEFF",
"first_sample": 0,
"number_of_samples": 1400,
"number_of_profiling_traces": 100000,
"number_of_attack_traces": 1000
}
}
Alternatively, the user can completely ignore custom/custom_datasets/datasets.py
file and set all the specifications in the main script:
aisy = aisy_sca.Aisy()
aisy.set_dataset_filename("ascad-variable.h5")
aisy.set_key("00112233445566778899AABBCCDDEEFF")
aisy.set_number_of_profiling_traces(200000)
aisy.set_number_of_attack_traces(10000)
aisy.set_first_sample(0)
aisy.set_number_of_samples(1400)
Note that if the user decides to avoid the datasets definitions from datasets_dict
, it is strictly necessary to provide all six specifications.