Data Augmentation
The AISY Framework also allows easy configuration of data augmentation techniques during model training. Data augmentation is a common machine learning technique and is widely applied in side-channel analysis to improve model learnability. In essence, data augmentation applies small modifications to the side-channel traces during training, which helps the model reduce overfitting and, as a consequence, improve generalization.
Creating Data Augmentation Methods
The user may define data augmentation methods in any location.
For better code organization, we recommend writing the methods in the file custom/custom_data_augmentation/data_augmentation.py.
The code below provides two examples of data augmentation methods: data_augmentation_shifts and data_augmentation_gaussian_noise:
import random

import numpy as np


def data_augmentation_shifts(data_set_samples, data_set_labels, batch_size, input_layer_shape):
    ns = len(data_set_samples[0])
    while True:
        x_train_shifted = np.zeros((batch_size, ns))
        # randomly select a mini-batch of profiling traces
        rnd = random.randint(0, len(data_set_samples) - batch_size)
        x_mini_batch = data_set_samples[rnd:rnd + batch_size]
        for trace_index in range(batch_size):
            x_train_shifted[trace_index] = x_mini_batch[trace_index]
            # circularly shift each trace by a random offset in [-5, 5]
            shift = random.randint(-5, 5)
            if shift > 0:
                x_train_shifted[trace_index][0:ns - shift] = x_mini_batch[trace_index][shift:ns]
                x_train_shifted[trace_index][ns - shift:ns] = x_mini_batch[trace_index][0:shift]
            else:
                x_train_shifted[trace_index][0:abs(shift)] = x_mini_batch[trace_index][ns - abs(shift):ns]
                x_train_shifted[trace_index][abs(shift):ns] = x_mini_batch[trace_index][0:ns - abs(shift)]
        if len(input_layer_shape) == 3:
            # convolutional input layers expect an extra channel dimension
            x_train_shifted_reshaped = x_train_shifted.reshape((x_train_shifted.shape[0], x_train_shifted.shape[1], 1))
            yield x_train_shifted_reshaped, data_set_labels[rnd:rnd + batch_size]
        else:
            yield x_train_shifted, data_set_labels[rnd:rnd + batch_size]


def data_augmentation_gaussian_noise(data_set_samples, data_set_labels, batch_size, input_layer_shape):
    ns = len(data_set_samples[0])
    while True:
        x_train_augmented = np.zeros((batch_size, ns))
        # randomly select a mini-batch of profiling traces
        rnd = random.randint(0, len(data_set_samples) - batch_size)
        x_mini_batch = data_set_samples[rnd:rnd + batch_size]
        # one Gaussian noise vector, added to every trace in the mini-batch
        noise = np.random.normal(0, 1, ns)
        for trace_index in range(batch_size):
            x_train_augmented[trace_index] = x_mini_batch[trace_index] + noise
        if len(input_layer_shape) == 3:
            # convolutional input layers expect an extra channel dimension
            x_train_augmented_reshaped = x_train_augmented.reshape((x_train_augmented.shape[0], x_train_augmented.shape[1], 1))
            yield x_train_augmented_reshaped, data_set_labels[rnd:rnd + batch_size]
        else:
            yield x_train_augmented, data_set_labels[rnd:rnd + batch_size]
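As a quick sanity check, one can draw a single batch from such a generator by hand with next(). The sketch below uses a condensed copy of the Gaussian-noise generator and a hypothetical toy dataset (random traces and one-hot labels, not real measurements):

```python
import random
import numpy as np


def gaussian_noise_generator(samples, labels, batch_size, input_layer_shape):
    # Condensed version of data_augmentation_gaussian_noise, for illustration only.
    ns = samples.shape[1]
    while True:
        rnd = random.randint(0, len(samples) - batch_size)
        noise = np.random.normal(0, 1, ns)               # one noise vector per batch
        batch = samples[rnd:rnd + batch_size] + noise    # broadcast over the mini-batch
        if len(input_layer_shape) == 3:                  # CNN input: add a channel axis
            batch = batch.reshape((batch_size, ns, 1))
        yield batch, labels[rnd:rnd + batch_size]


# Hypothetical toy dataset: 1000 traces of 700 samples, 9 one-hot HW classes.
traces = np.random.rand(1000, 700)
labels = np.eye(9)[np.random.randint(0, 9, 1000)]

x, y = next(gaussian_noise_generator(traces, labels, 400, (None, 700, 1)))
print(x.shape, y.shape)  # (400, 700, 1) (400, 9)
```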
As we can see, the method must be created with four input parameters:

data_set_samples:
: array containing the profiling traces.

data_set_labels:
: array containing the profiling categorical labels.

batch_size:
: integer defining the batch size.

input_layer_shape:
: shape of the input layer.
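Any generator that follows this four-parameter signature can be used. As an illustration, below is a hypothetical variant (not part of the framework) that performs the same kind of circular shift as data_augmentation_shifts, but more compactly via np.roll:

```python
import random
import numpy as np


def data_augmentation_random_roll(data_set_samples, data_set_labels, batch_size, input_layer_shape):
    # Hypothetical method with the required signature: circularly shifts each
    # trace in a randomly chosen mini-batch by a random offset in [-5, 5].
    ns = len(data_set_samples[0])
    while True:
        rnd = random.randint(0, len(data_set_samples) - batch_size)
        x_mini_batch = data_set_samples[rnd:rnd + batch_size]
        x_rolled = np.zeros((batch_size, ns))
        for trace_index in range(batch_size):
            x_rolled[trace_index] = np.roll(x_mini_batch[trace_index], random.randint(-5, 5))
        if len(input_layer_shape) == 3:
            x_rolled = x_rolled.reshape((batch_size, ns, 1))
        yield x_rolled, data_set_labels[rnd:rnd + batch_size]


# Usage on a hypothetical toy dataset: 500 traces of 100 samples, 9 classes.
traces = np.random.rand(500, 100)
labels = np.eye(9)[np.random.randint(0, 9, 500)]
x, y = next(data_augmentation_random_roll(traces, labels, 32, (None, 100, 1)))
print(x.shape, y.shape)  # (32, 100, 1) (32, 9)
```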
When data augmentation is used, model training is performed with the fit_generator method from Keras.
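Conceptually, fit_generator pulls batches from the generator with next(), steps_per_epoch times per epoch. The sketch below simulates that consumption loop with a hypothetical stand-in generator (no actual Keras model is trained):

```python
import numpy as np


def toy_generator(samples, labels, batch_size):
    # Stand-in for an augmentation generator: yields mini-batches forever.
    while True:
        idx = np.random.randint(0, len(samples) - batch_size)
        yield samples[idx:idx + batch_size], labels[idx:idx + batch_size]


samples = np.random.rand(1000, 700)
labels = np.random.randint(0, 9, 1000)
gen = toy_generator(samples, labels, batch_size=400)

steps_per_epoch = 100  # corresponds to the second entry of the data_augmentation list
epochs = 2
batches_seen = 0
for epoch in range(epochs):
    for step in range(steps_per_epoch):
        x_batch, y_batch = next(gen)  # what fit_generator does internally
        batches_seen += 1             # a real model would train on this batch here

print(batches_seen)  # 200
```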
Calling Data Augmentation in the Main Script
To pass data augmentation to the run method, the user must provide a list with two parameters, as in the example below:
import aisy_sca
from app import *
from custom.custom_models.neural_networks import *
from custom.custom_data_augmentation.data_augmentation import *

aisy = aisy_sca.Aisy()
aisy.set_resources_root_folder(resources_root_folder)
aisy.set_database_root_folder(databases_root_folder)
aisy.set_datasets_root_folder(datasets_root_folder)
aisy.set_database_name("database_ascad.sqlite")
aisy.set_dataset(datasets_dict["ascad-variable.h5"])
aisy.set_aes_leakage_model(leakage_model="HW", byte=2)
aisy.set_batch_size(400)
aisy.set_epochs(10)
aisy.set_neural_network(mlp)
aisy.run(data_augmentation=[data_augmentation_gaussian_noise, 100])
The first parameter is the data augmentation method, as defined in the custom/custom_data_augmentation/data_augmentation.py file.
The second parameter indicates the number of data augmentation iterations in a single epoch. Each iteration applies the data augmentation method to a defined amount of profiling traces. In the examples above, each iteration randomly selects a number of profiling traces equal to the batch size and applies the modifications to them; with a batch size of 400 and 100 iterations, each epoch therefore trains on 100 × 400 = 40,000 augmented traces.