How to perform Keras hyperparameter optimization x3 faster on TPU for free



Are you a data-driven scientist or data engineer who wants complete control of your Keras models and wants to be free from mindless parameter hopping and searching?

Hyperparameter optimization generally requires training the model multiple times with different configurations, which usually means a fast computer with multiple graphics cards is needed to keep the total training time down. After reading this post, you will be able to configure your Keras model for hyperparameter optimization experiments and run them roughly 3x faster on a free Cloud TPU, compared to running the same setup on my single GTX 1070.

Based on my experience with several open-source hyperparameter optimization solutions that support Keras, Talos offers the most intuitive, easy-to-learn, and permissive access to the important hyperparameter optimization capabilities. Let's build an experiment with Talos that searches for the best CNN model parameters for classifying the Fashion-MNIST dataset.

A baseline notebook running the hyperparameter optimization on my single GTX 1070 GPU and the TPU version are both available on my GitHub.

Prepare a Keras model for hyperparameter optimization

Unlike neural architecture search tools such as Auto-Keras, with Talos there is no black box in the hyperparameter optimization process; it is up to you to specify the options for the parameter search.
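For reference, here is roughly how the Fashion-MNIST data used in the snippets below can be loaded and preprocessed. This is a minimal sketch; the exact preprocessing in my notebooks may differ slightly, and the variable names x, y, x_test and y_test are assumptions.

import tensorflow as tf

# Load Fashion-MNIST and reshape to (samples, 28, 28, 1) for the Conv2D layers.
(x, y), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x = x.reshape(-1, 28, 28, 1).astype('float32') / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0

# One-hot encode the labels to match the categorical_crossentropy loss.
y = tf.keras.utils.to_categorical(y, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)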

Consider a candidate CNN model for the Fashion-MNIST classification task, written in Keras the way you normally would:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (BatchNormalization, Conv2D, MaxPooling2D,
                                     Dropout, Flatten, Dense, Activation)

model = Sequential()
model.add(BatchNormalization(input_shape=x_train.shape[1:]))
model.add(Conv2D(64, (5, 5), padding='same', activation='elu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2)))
model.add(Dropout(0.25))

model.add(BatchNormalization(input_shape=x_train.shape[1:]))
model.add(Conv2D(128, (5, 5), padding='same', activation='elu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(BatchNormalization(input_shape=x_train.shape[1:]))
model.add(Conv2D(256, (5, 5), padding='same', activation='elu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(256))
model.add(Activation('elu'))
model.add(Dropout(0.5))
model.add(Dense(10))
model.add(Activation('softmax'))

To prepare the model for a Talos scan over the hyperparameters, simply replace the values you want to include in the scan with references to a parameter dictionary, as below:

def fashion_mnist_fn(x_train, y_train, x_val, y_val, params):
    conv_dropout = float(params['conv_dropout'])
    dense1_neuron = int(params['dense1_neuron'])
    model = Sequential()
    model.add(BatchNormalization(input_shape=x_train.shape[1:]))
    model.add(Conv2D(64, (5, 5), padding='same', activation=params['activation']))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2)))
    model.add(Dropout(conv_dropout))

    model.add(BatchNormalization(input_shape=x_train.shape[1:]))
    model.add(Conv2D(128, (5, 5), padding='same', activation=params['activation']))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(conv_dropout))

    model.add(BatchNormalization(input_shape=x_train.shape[1:]))
    model.add(Conv2D(256, (5, 5), padding='same', activation=params['activation']))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2)))
    model.add(Dropout(conv_dropout))

    model.add(Flatten())
    model.add(Dense(dense1_neuron))
    model.add(Activation(params['activation']))
    model.add(Dropout(0.5))
    model.add(Dense(10))
    model.add(Activation('softmax'))
  
    model.compile(
        optimizer='adam',
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )


    out = model.fit(
        x_train, y_train, epochs=10, batch_size=32,
        verbose=0,
        validation_data=(x_val, y_val)
    )
    return out, model

The params argument is a dictionary that Talos passes into fashion_mnist_fn dynamically while the scan is running. Note that the function returns the history output of model.fit() along with the model itself, so that the Talos scanner can assess the model's performance after training.

Here is how you define the lists of hyperparameter values and start the search.

import talos as ta
para = {
    'dense1_neuron': [256, 512],
    'activation': ['relu', 'elu'],
    'conv_dropout': [0.25, 0.4]
}
scan_results = ta.Scan(x, y, para, fashion_mnist_fn)

Talos supports several common optimization strategies. For the simplest one, grid search, every combination of the parameters is plugged into the fashion_mnist_fn you defined previously for model training.
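Since this experiment uses a plain grid search, the number of models trained is simply the product of the list lengths in para. A quick sanity check (not part of the original notebooks) confirms the 8 variations reported in the benchmark at the end of this post:

from itertools import product

# 2 dense sizes x 2 activations x 2 dropout rates = 8 combinations.
n_combinations = len(list(product(*para.values())))
print(n_combinations)  # 8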

Run hyperparameter scan on TPU

If you run the previous scan as-is, it will only run on your default TensorFlow device, either CPU or GPU.

However, to run the whole process much faster on a Cloud TPU, a few extra steps are needed after constructing the model in order to convert it to a TPU model.

import os
import tensorflow as tf

def fashion_mnist_fn_tpu(x_train, y_train, x_val, y_val, params):
    # Step 1: Reset the TensorFlow backend session.
    tf.keras.backend.clear_session()
    # Step 2: Define the model with variable hyperparameters.
    conv_dropout = float(params['conv_dropout'])
    dense1_neuron = int(params['dense1_neuron'])
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.BatchNormalization(input_shape=x_train.shape[1:]))
    model.add(tf.keras.layers.Conv2D(64, (5, 5), padding='same', activation=params['activation']))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2,2)))
    model.add(tf.keras.layers.Dropout(conv_dropout))

    model.add(tf.keras.layers.BatchNormalization(input_shape=x_train.shape[1:]))
    model.add(tf.keras.layers.Conv2D(128, (5, 5), padding='same', activation=params['activation']))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(tf.keras.layers.Dropout(conv_dropout))

    model.add(tf.keras.layers.BatchNormalization(input_shape=x_train.shape[1:]))
    model.add(tf.keras.layers.Conv2D(256, (5, 5), padding='same', activation=params['activation']))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2,2)))
    model.add(tf.keras.layers.Dropout(conv_dropout))

    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(params['dense1_neuron']))
    model.add(tf.keras.layers.Activation(params['activation']))
    model.add(tf.keras.layers.Dropout(0.5))
    model.add(tf.keras.layers.Dense(10))
    model.add(tf.keras.layers.Activation('softmax'))
    
    # Step 3: Convert the model to a TPU model and compile with a TensorFlow optimizer.
    tpu_model = tf.contrib.tpu.keras_to_tpu_model(
        model,
        strategy=tf.contrib.tpu.TPUDistributionStrategy(
            tf.contrib.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
        )
    )
    tpu_model.compile(
        optimizer=tf.train.AdamOptimizer(learning_rate=1e-3),
        loss=tf.keras.losses.categorical_crossentropy,
        metrics=['categorical_accuracy']
    )

    # Step 4: Train the model on TPU with fixed batch size.
    out = tpu_model.fit(
        x_train, y_train, epochs=10, batch_size=1024,
        verbose=0,
        validation_data=(x_val, y_val)
    )
    # Step 5: Return the history output and the model synced back to the CPU.
    return out, tpu_model.sync_to_cpu()

Note that the differences are in steps 1, 3, and 4. The batch size of 1024 is split evenly across the 8 TPU cores, so each core trains on batches of 128 input samples.
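Before launching the scan, it is worth verifying that the Colab runtime actually has a TPU attached, since the conversion in step 3 will fail otherwise, and then pointing the scan at the TPU version of the model function. This is a minimal sketch assuming you are running in Colab with the TPU runtime selected:

import os
import talos as ta

# Fail early with a clear message if no TPU is attached to the runtime.
assert 'COLAB_TPU_ADDR' in os.environ, \
    'TPU not found. Select Runtime > Change runtime type > TPU in Colab.'
print('TPU address: grpc://' + os.environ['COLAB_TPU_ADDR'])

# Run the same grid search, but with the TPU model function.
scan_results = ta.Scan(x, y, para, fashion_mnist_fn_tpu)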

After the scan is complete, you recover the index of the best model by the highest validation accuracy, or another metric of your choice.

# Get the best model index with highest 'val_categorical_accuracy' 
model_id = scan_results.data['val_categorical_accuracy'].astype('float').argmax() - 1
# Clear any previous TensorFlow session.
tf.keras.backend.clear_session()

# Load the model parameters from the scanner.
from tensorflow.keras.models import model_from_json
model = model_from_json(scan_results.saved_models[model_id])
model.set_weights(scan_results.saved_weights[model_id])
model.summary()
model.save('./best_model.h5')

# Download the saved model to your local file system from Colab.
from google.colab import files

files.download('./best_model.h5')
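If you want to sanity-check the restored model before using it, you can recompile it and evaluate it on the held-out test set. A minimal sketch, assuming x_test and y_test are the one-hot-encoded Fashion-MNIST test split shown earlier (model_from_json restores only the architecture, so the model must be compiled again before evaluation):

# Recompile the restored model, then evaluate on the test set.
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['categorical_accuracy']
)
loss, acc = model.evaluate(x_test, y_test, verbose=0)
print('Test categorical accuracy: {:.4f}'.format(acc))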

Benchmark and conclusion

It took 12:29 to completely train the 8 variations of the CNN with different hyperparameters on the TPU, compared to 40:18 on my GTX 1070.

TPU search: 12:29

GPU search: 40:18

Be sure to check out the runnable Colab notebook for this tutorial and the GPU/CPU counterpart Jupyter notebook on my GitHub.

Want to train an RNN Keras model 20x faster with a TPU? Read my previous post - How to train Keras model x20 times faster with TPU for free.

Also, read more about Talos on GitHub.
