Are you a data-driven scientist or data engineer who wants complete control of your Keras models and to be free from mindless parameter hopping and searching?
Hyperparameter optimization generally requires training the model multiple times with different configurations, which usually calls for a fast machine with multiple graphics cards to keep the total run time reasonable. After reading this post, you will be able to configure your Keras model for hyperparameter optimization experiments and run them roughly 3x faster on a free Cloud TPU than the same setup runs on my single GTX 1070.
Based on my experience with several open source hyperparameter optimization solutions that support Keras, Talos offers the most intuitive, easy-to-learn and permissive access to important hyperparameter optimization capabilities. Let's build an experiment with Talos to search for the best CNN model parameters for the Fashion-MNIST classification dataset.
Unlike some other neural architecture search tools like Auto-Keras, there is no black box during the hyperparameter optimization process, and it is up to you to specify the options for the parameter search.
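The snippets below assume the Fashion-MNIST data has already been loaded and preprocessed. Here is a minimal sketch of one way to do that with the loader built into tf.keras (this particular preprocessing is my assumption, not code from the original experiment):
import tensorflow as tf

# Load Fashion-MNIST and reshape to 28x28x1 single-channel images.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0

# One-hot encode the 10 clothing classes for categorical_crossentropy.
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)

# The x, y passed to the Talos scan later are simply the training arrays.
x, y = x_train, y_train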
Consider a candidate CNN model for the Fashion-MNIST classification task, written in Keras the way you normally would.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (BatchNormalization, Conv2D, MaxPooling2D,
                                     Dropout, Flatten, Dense, Activation)

model = Sequential()
model.add(BatchNormalization(input_shape=x_train.shape[1:]))
model.add(Conv2D(64, (5, 5), padding='same', activation='elu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2)))
model.add(Dropout(0.25))
model.add(BatchNormalization(input_shape=x_train.shape[1:]))
model.add(Conv2D(128, (5, 5), padding='same', activation='elu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(BatchNormalization(input_shape=x_train.shape[1:]))
model.add(Conv2D(256, (5, 5), padding='same', activation='elu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256))
model.add(Activation('elu'))
model.add(Dropout(0.5))
model.add(Dense(10))
model.add(Activation('softmax'))
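On its own, this model would be compiled and trained in the usual way. Here is a minimal sketch, mirroring the optimizer, loss, epochs and batch size used in the Talos function below (the validation_split value is just an illustrative choice):
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train on the preprocessed Fashion-MNIST training arrays prepared earlier.
history = model.fit(x_train, y_train,
                    epochs=10, batch_size=32,
                    validation_split=0.1, verbose=1)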
To prepare the model for a Talos scan over the hyperparameters, simply replace the values you want to search over with references to the params dictionary, as shown below.
def fashion_mnist_fn(x_train, y_train, x_val, y_val, params):
    conv_dropout = float(params['conv_dropout'])
    dense1_neuron = int(params['dense1_neuron'])
    model = Sequential()
    model.add(BatchNormalization(input_shape=x_train.shape[1:]))
    model.add(Conv2D(64, (5, 5), padding='same', activation=params['activation']))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(Dropout(conv_dropout))
    model.add(BatchNormalization(input_shape=x_train.shape[1:]))
    model.add(Conv2D(128, (5, 5), padding='same', activation=params['activation']))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(conv_dropout))
    model.add(BatchNormalization(input_shape=x_train.shape[1:]))
    model.add(Conv2D(256, (5, 5), padding='same', activation=params['activation']))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(Dropout(conv_dropout))
    model.add(Flatten())
    model.add(Dense(dense1_neuron))
    model.add(Activation(params['activation']))
    model.add(Dropout(0.5))
    model.add(Dense(10))
    model.add(Activation('softmax'))
    model.compile(
        optimizer='adam',
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )
    out = model.fit(
        x_train, y_train, epochs=10, batch_size=32,
        verbose=0,
        validation_data=(x_val, y_val)
    )
    return out, model
The params dictionary is passed into fashion_mnist_fn by the Talos scanner during the scan, and the function returns both the history object from model.fit() and the model itself, which is the return signature Talos expects.
Here is how you define the lists of hyperparameter options and start the search.
import talos as ta

para = {
    'dense1_neuron': [256, 512],
    'activation': ['relu', 'elu'],
    'conv_dropout': [0.25, 0.4]
}

scan_results = ta.Scan(x, y, para, fashion_mnist_fn)
Talos supports several common optimization strategies. For the simplest one, grid search, every combination of the parameters above is plugged into fashion_mnist_fn and evaluated, 2 x 2 x 2 = 8 configurations in this case, as the sketch below illustrates.
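Conceptually, the grid the scanner walks through is just the Cartesian product of the option lists. Here is a small sketch that enumerates it yourself (for illustration only, Talos does this internally):
from itertools import product

keys = sorted(para)
combos = [dict(zip(keys, values)) for values in product(*(para[k] for k in keys))]
print(len(combos))  # 8 parameter combinations for the dictionary above
for combo in combos:
    print(combo)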
If you run the previous scan as-is, it will run on your default TensorFlow device, whether that is a CPU or a GPU.
However, to make the whole process much faster on a Cloud TPU, a few extra steps are needed after you construct the model, chiefly converting it to a TPU model.
import os
import tensorflow as tf

def fashion_mnist_fn_tpu(x_train, y_train, x_val, y_val, params):
    # Step 1: reset the TensorFlow backend session.
    tf.keras.backend.clear_session()

    # Step 2: define the model with variable hyperparameters.
    conv_dropout = float(params['conv_dropout'])
    dense1_neuron = int(params['dense1_neuron'])
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.BatchNormalization(input_shape=x_train.shape[1:]))
    model.add(tf.keras.layers.Conv2D(64, (5, 5), padding='same', activation=params['activation']))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(tf.keras.layers.Dropout(conv_dropout))
    model.add(tf.keras.layers.BatchNormalization(input_shape=x_train.shape[1:]))
    model.add(tf.keras.layers.Conv2D(128, (5, 5), padding='same', activation=params['activation']))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(tf.keras.layers.Dropout(conv_dropout))
    model.add(tf.keras.layers.BatchNormalization(input_shape=x_train.shape[1:]))
    model.add(tf.keras.layers.Conv2D(256, (5, 5), padding='same', activation=params['activation']))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(tf.keras.layers.Dropout(conv_dropout))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(dense1_neuron))
    model.add(tf.keras.layers.Activation(params['activation']))
    model.add(tf.keras.layers.Dropout(0.5))
    model.add(tf.keras.layers.Dense(10))
    model.add(tf.keras.layers.Activation('softmax'))

    # Step 3: convert the model to a TPU model and compile with a TensorFlow optimizer.
    tpu_model = tf.contrib.tpu.keras_to_tpu_model(
        model,
        strategy=tf.contrib.tpu.TPUDistributionStrategy(
            tf.contrib.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
        )
    )
    tpu_model.compile(
        optimizer=tf.train.AdamOptimizer(learning_rate=1e-3),
        loss=tf.keras.losses.categorical_crossentropy,
        metrics=['categorical_accuracy']
    )

    # Step 4: train the model on the TPU with a fixed batch size.
    out = tpu_model.fit(
        x_train, y_train, epochs=10, batch_size=1024,
        verbose=0,
        validation_data=(x_val, y_val)
    )

    # Step 5: return the history output and the model synced back to the CPU.
    return out, tpu_model.sync_to_cpu()
Notice that the differences are in steps 1, 3 and 4. The batch size of 1024 is split evenly across the 8 TPU cores, so each core trains on batches of 128 samples.
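The TPU scan is launched exactly like the earlier one, only with the TPU model function. A minimal sketch, assuming the same x, y and para as before:
# Run the scan with the TPU model-building function.
scan_results = ta.Scan(x, y, para, fashion_mnist_fn_tpu)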
After the scan completes, you can recover the index of the best model, the one with the highest validation accuracy or whichever metric you prefer.
# Get the best model index with highest 'val_categorical_accuracy'
model_id = scan_results.data['val_categorical_accuracy'].astype('float').argmax() - 1
# Clear any previous TensorFlow session.
tf.keras.backend.clear_session()
# Load the model parameters from the scanner.
from tensorflow.keras.models import model_from_json
model = model_from_json(scan_results.saved_models[model_id])
model.set_weights(scan_results.saved_weights[model_id])
model.summary()
model.save('./best_model.h5')
# Download the saved model to your local file system from Colab.
from google.colab import files
files.download('./best_model.h5')
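Once downloaded, the file can be loaded and checked like any other Keras model. A brief sketch, assuming x_test and y_test are the preprocessed Fashion-MNIST test arrays from the data preparation snippet earlier:
from tensorflow.keras.models import load_model

best_model = load_model('./best_model.h5')

# The model was saved without a training configuration, so compile it
# again before evaluating on the held-out test set.
best_model.compile(optimizer='adam',
                   loss='categorical_crossentropy',
                   metrics=['accuracy'])
loss, accuracy = best_model.evaluate(x_test, y_test, verbose=0)
print('Test accuracy: %.4f' % accuracy)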
It took 12:29 to completely train all 8 variations of the CNN with the variable hyperparameters on the TPU, compared to 40:18 on my GTX 1070.
Be sure to check out the runnable Colab notebook for this tutorial and the GPU/CPU counterpart Jupyter notebook on my GitHub.
Want to train an RNN Keras model 20x faster on a TPU? Read my previous post - How to train Keras model x20 times faster with TPU for free.
Also, read more about Talos on GitHub.