
Compared to simpler hyperparameter search methods like grid search and random search, Bayesian optimization builds on Bayesian inference and Gaussian processes to find the maximum of an unknown function in as few iterations as possible. It is particularly suited to optimizing expensive functions, such as hyperparameter search for a deep learning model, and to other situations where the balance between exploration and exploitation is important.

The Bayesian optimization package we are going to use is BayesianOptimization, which can be installed with the following command:

`pip install bayesian-optimization`

First, we specify the function to be optimized. In our case, hyperparameter search, the function takes a set of hyperparameter values as inputs and outputs the evaluation accuracy for the Bayesian optimizer. Inside the function, a new model is constructed with the specified hyperparameters, trained for a number of epochs, and evaluated against a set of metrics. Every newly evaluated accuracy becomes a new observation for the Bayesian optimizer, which contributes to the choice of the next hyperparameter values to try.

Let's first create a helper function that builds the model with various parameters.

```
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Dropout, BatchNormalization, MaxPooling2D, Flatten, Activation
# Note: this is a private TensorFlow module path; it may differ between TensorFlow versions.
from tensorflow.python.keras.optimizer_v2 import rmsprop

NUM_CLASSES = 10  # MNIST has 10 digit classes.


def get_model(input_shape, dropout2_rate=0.5, dense_1_neurons=128):
    """Builds a Sequential CNN model to recognize MNIST.

    Args:
      input_shape: Shape of the input depending on the `image_data_format`.
      dropout2_rate: float between 0 and 1. Fraction of the input units to drop for `dropout_2` layer.
      dense_1_neurons: int, number of units in the `dense_1` layer.

    Returns:
      a Keras model
    """
    # Reset the tensorflow backend session.
    # tf.keras.backend.clear_session()
    # Define a CNN model to recognize MNIST.
    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3),
                     activation='relu',
                     input_shape=input_shape,
                     name="conv2d_1"))
    model.add(Conv2D(64, (3, 3), activation='relu', name="conv2d_2"))
    model.add(MaxPooling2D(pool_size=(2, 2), name="maxpool2d_1"))
    model.add(Dropout(0.25, name="dropout_1"))
    model.add(Flatten(name="flatten"))
    model.add(Dense(dense_1_neurons, activation='relu', name="dense_1"))
    model.add(Dropout(dropout2_rate, name="dropout_2"))
    model.add(Dense(NUM_CLASSES, activation='softmax', name="dense_2"))
    return model
```

Then, here is the function to be optimized with the Bayesian optimizer. The **partial** function takes care of two arguments of `fit_with`, `input_shape` and `verbose`, which have fixed values during the run.

The function takes two hyperparameters to search, the dropout rate for the "dropout_2" layer and the learning rate. It trains the model for 1 epoch and outputs the evaluation accuracy for the Bayesian optimizer.

```
from functools import partial


def fit_with(input_shape, verbose, dropout2_rate, lr):
    # Create the model using the specified hyperparameters.
    model = get_model(input_shape, dropout2_rate)

    # Compile the model with an RMSProp optimizer at the given learning rate.
    optimizer = rmsprop.RMSProp(learning_rate=lr)
    model.compile(loss=tf.keras.losses.categorical_crossentropy,
                  optimizer=optimizer,
                  metrics=['accuracy'])

    # Train the model with the train dataset.
    model.fit(x=train_ds, epochs=1, steps_per_epoch=468,
              batch_size=64, verbose=verbose)

    # Evaluate the model with the eval dataset.
    score = model.evaluate(eval_ds, steps=10, verbose=0)
    print('Test loss:', score[0])
    print('Test accuracy:', score[1])

    # Return the accuracy.
    return score[1]


verbose = 1
# Fix input_shape and verbose so only the searched hyperparameters remain.
fit_with_partial = partial(fit_with, input_shape, verbose)
```
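To see in isolation what `partial` does here, the sketch below binds the first two positional arguments of a stand-in objective and then calls it with keyword arguments only, which is how the `bayes_opt` package passes parameters to the target function. `toy_fit_with` and its scoring formula are made up for illustration; nothing is trained.

```python
from functools import partial


def toy_fit_with(input_shape, verbose, dropout2_rate, lr):
    """Stand-in for fit_with: returns a made-up score instead of training."""
    if verbose:
        print("input_shape:", input_shape, "dropout2_rate:", dropout2_rate, "lr:", lr)
    # Hypothetical score that peaks at dropout2_rate=0.2 and lr=0.003.
    return 1.0 - (dropout2_rate - 0.2) ** 2 - (lr - 0.003) ** 2


# Bind the two fixed arguments positionally, as in the tutorial code.
toy_partial = partial(toy_fit_with, (28, 28, 1), 0)

# Only the searched hyperparameters remain, and they are passed by keyword.
score = toy_partial(dropout2_rate=0.2, lr=0.003)
print(score)
```

Because the remaining arguments are passed by keyword, their order in the bounds dictionary does not have to match the order in the function signature.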

The **BayesianOptimization** object works out of the box without much tuning. The constructor takes the function to be optimized as well as the bounds of the hyperparameters to search. The main method you should be aware of is `maximize`, which does exactly what you think it does: it maximizes the evaluation accuracy over the hyperparameters.

```
from bayes_opt import BayesianOptimization

# Bounded region of parameter space
pbounds = {'dropout2_rate': (0.1, 0.5), 'lr': (1e-4, 1e-2)}

optimizer = BayesianOptimization(
    f=fit_with_partial,
    pbounds=pbounds,
    verbose=2,  # verbose = 1 prints only when a maximum is observed, verbose = 0 is silent
    random_state=1,
)

optimizer.maximize(init_points=10, n_iter=10)

for i, res in enumerate(optimizer.res):
    print("Iteration {}: \n\t{}".format(i, res))

print(optimizer.max)
```
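Both `optimizer.res` and `optimizer.max` are built from dictionaries with a `target` and a `params` key. The sketch below shows one way to pick out the best entry and unpack its parameters. It uses a hand-written list in place of a live optimizer, with values copied from the first four rows of the log shown later in this post, so it runs without TensorFlow or `bayes_opt`.

```python
# Hand-written stand-in for optimizer.res; values copied from the log output.
res = [
    {"target": 0.9828, "params": {"dropout2_rate": 0.2668, "lr": 0.007231}},
    {"target": 0.9828, "params": {"dropout2_rate": 0.1, "lr": 0.003093}},
    {"target": 0.9812, "params": {"dropout2_rate": 0.1587, "lr": 0.001014}},
    {"target": 0.9891, "params": {"dropout2_rate": 0.1745, "lr": 0.003521}},
]

# Same shape as optimizer.max: the entry with the highest target.
best = max(res, key=lambda r: r["target"])
best_params = best["params"]

print("best accuracy:", best["target"])
print("best dropout2_rate:", best_params["dropout2_rate"])
print("best lr:", best_params["lr"])

# The parameters could then be fed back into the objective, e.g.
# fit_with_partial(**best_params), to retrain with the best settings found.
```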

There are many parameters you can pass to `maximize`; nonetheless, the most important ones are:

- `n_iter`: How many steps of Bayesian optimization you want to perform. The more steps, the more likely you are to find a good maximum.
- `init_points`: How many steps of **random** exploration you want to perform. Random exploration can help by diversifying the exploration space.

Each row below merges one iteration's Keras training log (468 steps, about 4s at 8-9ms/step) with the optimizer's own table row:

| iter | target | dropou... | lr | train loss | train acc | test loss | test accuracy |
|---|---|---|---|---|---|---|---|
| 1 | 0.9828 | 0.2668 | 0.007231 | 0.2575 | 0.9246 | 0.061651699058711526 | 0.9828125 |
| 2 | 0.9828 | 0.1 | 0.003093 | 0.2065 | 0.9363 | 0.04886047407053411 | 0.9828125 |
| 3 | 0.9812 | 0.1587 | 0.001014 | 0.2199 | 0.9336 | 0.05553104653954506 | 0.98125 |
| 4 | 0.9891 | 0.1745 | 0.003521 | 0.2075 | 0.9390 | 0.04128134781494737 | 0.9890625 |

After four search steps, the model built with the hyperparameters found achieves an evaluation accuracy of 98.9% with just one epoch of training.
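To make the two phases behind `init_points` and `n_iter` more concrete, here is a minimal pure-Python sketch of the general idea: a few random evaluations, followed by steps guided by a Gaussian process. The RBF kernel, the upper-confidence-bound rule, the candidate grid, and every name in the block (`toy_maximize`, `gp_posterior`, `kappa`, and so on) are assumptions chosen for illustration. The `bayes_opt` package has its own implementation and defaults, which this does not reproduce.

```python
import math
import random


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two scalars."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        partial_sum = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - partial_sum) / aug[i][i]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    gram = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
            for i, a in enumerate(xs)]
    k_new = [rbf(a, x_new) for a in xs]
    alpha = solve(gram, ys)     # K^-1 y
    beta = solve(gram, k_new)   # K^-1 k_new
    mean = sum(k * a for k, a in zip(k_new, alpha))
    var = rbf(x_new, x_new) - sum(k * b for k, b in zip(k_new, beta))
    return mean, math.sqrt(max(var, 0.0))


def toy_maximize(f, low, high, init_points=3, n_iter=10, kappa=2.5, seed=1):
    rng = random.Random(seed)
    # Phase 1: random exploration (the role of init_points).
    xs = [rng.uniform(low, high) for _ in range(init_points)]
    ys = [f(x) for x in xs]
    candidates = [low + (high - low) * i / 100 for i in range(101)]
    # Phase 2: guided steps (the role of n_iter). Each step uses all
    # previous observations to decide where to evaluate next.
    for _ in range(n_iter):
        def ucb(x):
            mean, std = gp_posterior(xs, ys, x)
            return mean + kappa * std  # high mean = exploit, high std = explore
        x_next = max(candidates, key=ucb)
        xs.append(x_next)
        ys.append(f(x_next))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], xs


def toy_objective(x):
    """Made-up objective with its maximum at x = 2."""
    return math.exp(-(x - 2.0) ** 2)


best_x, best_y, history = toy_maximize(toy_objective, 0.0, 5.0)
print("best x:", best_x, "best y:", best_y, "evaluations:", len(history))
```

A larger `kappa` favors uncertain regions (exploration), while a smaller one favors regions where the predicted mean is already high (exploitation).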

Unlike grid search, which searches over a finite number of discrete hyperparameter combinations, Bayesian optimization with Gaussian processes doesn't offer an easy or intuitive way of dealing with discrete parameters.

For example, suppose we want to search for the number of neurons of a dense layer from a list of options. To apply Bayesian optimization, it is necessary to explicitly convert the continuous input parameters to discrete ones before constructing the model.

You can do something like this.

```
pbounds = {'dropout2_rate': (0.1, 0.5), 'lr': (1e-4, 1e-2), "dense_1_neurons_x128": (0.9, 3.1)}


def fit_with(input_shape, verbose, dropout2_rate, dense_1_neurons_x128, lr):
    # Create the model using the specified hyperparameters.
    # Round to the nearest integer multiplier (1, 2 or 3) before scaling by 128.
    dense_1_neurons = max(int(round(dense_1_neurons_x128)) * 128, 128)
    model = get_model(input_shape, dropout2_rate, dense_1_neurons)
    # ...
```

The dense layer's neuron count will be mapped to 3 unique discrete values, 128, 256 and 384, before constructing the model.
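The same idea extends to any list of options: search over a continuous index and truncate it to pick an entry. Below is a small sketch of that mapping. `pick_option` is a hypothetical helper, not part of `bayes_opt`, and it assumes the optimizer's bound for the parameter is set to `(0, len(options))`.

```python
def pick_option(value, options):
    """Map a continuous value in [0, len(options)) to one entry of options."""
    index = int(value)  # truncate toward zero
    index = min(max(index, 0), len(options) - 1)  # clamp to a valid index
    return options[index]


neuron_options = [128, 256, 384]

# Values an optimizer might sample from the bound (0, len(neuron_options)).
for sampled in (0.2, 1.0, 1.7, 2.999, 3.0):
    print(sampled, "->", pick_option(sampled, neuron_options))
```

Each option gets an interval of equal width, and the options do not have to be numeric: the same helper could pick an activation name from a list of strings.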

In Bayesian optimization, each new set of search values depends on previous observations (previous evaluation accuracies), so the optimization process is harder to distribute or parallelize than grid or random search.
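For contrast, here is a sketch of why grid search parallelizes so easily: every point is known before any evaluation starts, so all of them can be submitted at once. `toy_objective` is a made-up score standing in for a full train-and-evaluate run, and the thread pool is only there to show the structure; real training jobs would more likely be spread across processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def toy_objective(dropout2_rate, lr):
    """Made-up score standing in for a full train-and-evaluate run."""
    return 1.0 - (dropout2_rate - 0.3) ** 2 - (100 * (lr - 0.001)) ** 2


dropout_grid = [0.1, 0.3, 0.5]
lr_grid = [1e-4, 1e-3, 1e-2]
grid = list(product(dropout_grid, lr_grid))

# No evaluation depends on another, so the whole grid can run concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = list(pool.map(lambda point: toy_objective(*point), grid))

best_score, best_point = max(zip(scores, grid))
print(best_point, best_score)
```

A Bayesian optimizer cannot be written this way without extra machinery, because the point for step `n + 1` is only chosen after the result of step `n` is known.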

This quick tutorial introduced how to do hyperparameter search with Bayesian optimization. It can be more efficient than other methods like grid or random search, since every search step is "**guided**" by previous search results.

BayesianOptimization - The Python implementation of global optimization with Gaussian processes used in this tutorial.

How to perform Keras hyperparameter optimization x3 faster on TPU for free - My previous tutorial on performing grid hyperparameter search with Colab's free TPU.

