In this quick tutorial, I am going to show you two simple examples of using the `sparse_categorical_crossentropy` loss function and the `sparse_categorical_accuracy` metric when compiling your Keras model.

The first example uses MNIST, a multi-class, single-label classification dataset. The task is to classify grayscale images of handwritten digits (28 pixels by 28 pixels) into ten categories (0 to 9). Let's build a Keras CNN model for it whose last layer uses a "softmax" activation and outputs an array of ten probability scores (summing to 1). Each score is the probability that the current digit image belongs to one of the ten digit classes.

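```
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
# Assumes Keras' default channels_last data format: (height, width, channels).
# With channels_first, the input shape would be (1, 28, 28) instead.
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=(28, 28, 1)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
# Ten probability scores, one per digit class.
model.add(Dense(10, activation='softmax'))
```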

For such a model, with an output shape of (None, 10), the conventional way is to convert the target outputs to one-hot encoded arrays that match the output shape. With the help of the `sparse_categorical_crossentropy` loss function, however, we can skip that step and keep the integers as targets.

All you need to do is replace `categorical_crossentropy` with `sparse_categorical_crossentropy` when compiling the model.

```
import keras

model.compile(
    optimizer=keras.optimizers.Adadelta(),
    loss='sparse_categorical_crossentropy',
    metrics=['sparse_categorical_accuracy'])

# The conventional way (requires one-hot encoded targets)
# model.compile(
#     optimizer=keras.optimizers.Adadelta(),
#     loss=keras.losses.categorical_crossentropy,
#     metrics=['accuracy'])
```

After that, you can train the model with integer targets, i.e. a one-dimensional array like

`array([5, 0, 4, 1, 9 ...], dtype=uint8)`

Note that this won't affect the model's output shape: it still outputs ten probability scores for each input sample.
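To see why integer targets are enough, it can help to compute both losses by hand. The sketch below is a simplified pure-Python illustration, not Keras's actual implementation (which works on tensors and also clips probabilities). The helper names `sparse_ce`, `dense_ce`, `to_one_hot`, and `sparse_accuracy`, as well as the toy probabilities, are made up for this example.

```python
import math


def sparse_ce(labels, probs):
    """Per-sample loss from integer labels: -log(p[true_class])."""
    return [-math.log(p[y]) for y, p in zip(labels, probs)]


def dense_ce(one_hot_labels, probs):
    """Per-sample loss from one-hot labels: -sum(y_i * log(p_i))."""
    return [-sum(y_i * math.log(p_i) for y_i, p_i in zip(y, p))
            for y, p in zip(one_hot_labels, probs)]


def to_one_hot(labels, num_classes):
    """Turn integer labels into one-hot rows."""
    return [[1.0 if i == y else 0.0 for i in range(num_classes)]
            for y in labels]


def sparse_accuracy(labels, probs):
    """Fraction of samples whose highest-scoring class equals the integer label."""
    hits = [int(max(range(len(p)), key=p.__getitem__) == y)
            for y, p in zip(labels, probs)]
    return sum(hits) / len(hits)


# Toy softmax outputs for two samples and three classes (hypothetical values).
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.3, 0.6]]
labels = [0, 1]  # integer targets, no one-hot encoding needed

sparse = sparse_ce(labels, probs)
dense = dense_ce(to_one_hot(labels, 3), probs)
acc = sparse_accuracy(labels, probs)

print(sparse, dense, acc)
```

Both loss lists agree because the one-hot vector zeroes out every term except the true class's, which is the same term the sparse version looks up directly.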

In the second example, we'll train a model on the combined works of William Shakespeare, then use it to compose a play in a similar style.

Every character in the text blob is first converted to an integer by calling Python's built-in `ord()` function. For example, `ord('a')` returns `97`.
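As a minimal illustration of this encoding, characters can be mapped to integers and back as shown below. The sample string is arbitrary and not taken from the training data, and the bound of 256 mirrors the `MAX_TOKENS` value used for the model in this tutorial; how characters outside that range should be handled is left open here.

```python
MAX_TOKENS = 256  # vocabulary size assumed by the model in this tutorial

text = "To be, or not to be"

# Character -> integer code point.
encoded = [ord(c) for c in text]

# Integer code point -> character, to recover the original text.
decoded = "".join(chr(i) for i in encoded)

# Every code must fit in the embedding / softmax range [0, MAX_TOKENS).
fits_vocabulary = all(0 <= i < MAX_TOKENS for i in encoded)

print(encoded[:5], decoded, fits_vocabulary)
```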

Given a moving window of sequence length 100, the model learns to predict the sequence one time-step in the future. In other words, given characters of timesteps T0~T99 in the sequence, the model predicts characters of timesteps T1~T100.
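The shift-by-one relationship between inputs and targets can be sketched as follows. This is an illustration under assumptions, not the data pipeline from the repository: `make_windows` is a hypothetical helper, and a short string with `seq_len=4` stands in for the real text and the window of 100.

```python
def make_windows(encoded, seq_len):
    """Return (input, target) pairs; each target is its input shifted one step ahead."""
    pairs = []
    for start in range(len(encoded) - seq_len):
        window = encoded[start:start + seq_len + 1]  # seq_len + 1 characters
        pairs.append((window[:-1], window[1:]))      # T0..T(n-1) -> T1..Tn
    return pairs


encoded = [ord(c) for c in "hello world"]
pairs = make_windows(encoded, seq_len=4)

x0, y0 = pairs[0]
print("".join(map(chr, x0)), "->", "".join(map(chr, y0)))
```

Note that the targets stay plain integers, one per timestep, which is the form `sparse_categorical_crossentropy` expects.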

Let's build a simple sequence to sequence model in Keras.

```
import tensorflow as tf

EMBEDDING_DIM = 512
MAX_TOKENS = 256


def lstm_model(seq_len=100, batch_size=None, stateful=True, max_tokens=256):
    """Language model: predict the next char given the current char."""
    source = tf.keras.Input(
        name='seed', shape=(seq_len,), batch_size=batch_size, dtype=tf.int32)
    embedding = tf.keras.layers.Embedding(
        input_dim=max_tokens, output_dim=EMBEDDING_DIM)(source)
    lstm_1 = tf.keras.layers.LSTM(
        EMBEDDING_DIM, stateful=stateful, return_sequences=True)(embedding)
    lstm_2 = tf.keras.layers.LSTM(
        EMBEDDING_DIM, stateful=stateful, return_sequences=True)(lstm_1)
    # One softmax over max_tokens classes at every timestep.
    predicted_char = tf.keras.layers.TimeDistributed(
        tf.keras.layers.Dense(max_tokens, activation='softmax'))(lstm_2)
    model = tf.keras.Model(inputs=[source], outputs=[predicted_char])
    model.compile(
        # On TF 1.x, tf.train.RMSPropOptimizer(learning_rate=0.01) is the equivalent.
        optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.01),
        loss='sparse_categorical_crossentropy',
        metrics=['sparse_categorical_accuracy'])
    return model


training_model = lstm_model(
    seq_len=100, batch_size=128, stateful=False, max_tokens=MAX_TOKENS)
```

We can further visualize the structure of the model to understand its input and output shape respectively.

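Even though the model has a 3-dimensional output, when compiled with the loss `sparse_categorical_crossentropy`, we can feed the training targets as sequences of integers. As in the previous example, without `sparse_categorical_crossentropy` one would first need to convert the integer targets to one-hot encoded form to fit the model.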

The training model is:

- non-stateful
- seq_len = 100
- batch_size = 128
- Model input shape: (batch_size, seq_len)
- Model output shape: (batch_size, seq_len, MAX_TOKENS)
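How integer targets of shape (batch_size, seq_len) line up with a model output of shape (batch_size, seq_len, MAX_TOKENS) can be sketched with NumPy. This is an illustration under assumptions rather than the Keras implementation: the tiny sizes, the random stand-in for the softmax output, and the target values are all made up.

```python
import numpy as np

batch_size, seq_len, max_tokens = 2, 3, 4  # tiny stand-ins for 128, 100, 256

# Stand-in for the model's softmax output: (batch_size, seq_len, max_tokens).
rng = np.random.default_rng(0)
logits = rng.normal(size=(batch_size, seq_len, max_tokens))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# Integer targets: (batch_size, seq_len), one token id per timestep.
targets = np.array([[1, 0, 3],
                    [2, 2, 1]])

# Sparse loss: pick the probability of the true token at every timestep.
true_probs = np.take_along_axis(probs, targets[..., None], axis=-1).squeeze(-1)
sparse_loss = -np.log(true_probs)                      # (batch_size, seq_len)

# One-hot route: targets become (batch_size, seq_len, max_tokens) first.
one_hot = np.eye(max_tokens)[targets]
dense_loss = -(one_hot * np.log(probs)).sum(axis=-1)   # (batch_size, seq_len)

print(sparse_loss.shape, one_hot.shape, np.allclose(sparse_loss, dense_loss))
```

The one-hot route needs MAX_TOKENS times more memory for the targets yet yields the same per-timestep loss, which is the practical reason for preferring the sparse variant here.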

Once the model is trained, we can make it "stateful" and predict five characters at a time. When the model is stateful, the LSTMs' last state for each sample in a batch is used as the initial state for the sample at the same position in the following batch. Put simply, each of the five characters predicted in one batch continues the sequence of the character at the same position in the previous batch.

The prediction model loads the trained model weights and predicts five chars at a time. It is:

- stateful
- seq_len = 1, one character per sample in each batch
- batch_size = 5
- Model input shape: (batch_size, seq_len)
- Model output shape: (batch_size, seq_len, MAX_TOKENS)
- `reset_states()` must be called before prediction to reset the LSTMs' initial states.

For more implementation details of the model, please refer to my GitHub repository.

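This tutorial explored two examples of using `sparse_categorical_crossentropy` to keep integer class labels as training targets without converting them to one-hot vectors.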

To learn how `keras.backend.sparse_categorical_crossentropy` and `sparse_categorical_accuracy` are actually implemented, you can find their source in the TensorFlow repository. Don't forget to download the source code for this tutorial from my GitHub.
