How to train a Keras model to generate colors


[Image: crayons labeled with color names]

Ever wonder how paint colors are named? “Princess ivory”, “Bull cream.” And what about “Keras red”? It turns out that people make a living naming those colors. In this post, I’m going to show you how to build a simple deep learning model to do something similar — give the model a color name as input, and have the model generate an RGB color value for that name.

This post is beginner friendly. I will introduce you to the basic concepts of processing text data with deep learning.

Overview

  1. Choose a language model to best represent the input text.
  2. Clean and prepare the data for training.
  3. Build a basic Keras sequential neural network model.
  4. Apply a recurrent neural network (RNN) to process character sequences.
  5. Generate 3-channel RGB color outputs.

Let’s take a look at the big picture of what we’re going to build:

[Figure: overview of the model we are going to build]

Language model

There are two general options for language modeling: word level models and character level models. Each has its own advantages and disadvantages. Let’s go through them now.

Word level language model

The word level language model can handle relatively long and clean sentences. By “clean”, I mean the words in the text dataset are free from typos and there are few words outside of the English vocabulary. The word level language model encodes each unique word as a corresponding integer, and there’s a predefined, fixed-size vocabulary dictionary to look up the word-to-integer mapping. One major benefit of the word level language model is its ability to leverage pre-trained word embeddings such as Word2Vec or GloVe. These embeddings represent words as vectors with useful properties: words close in context are close in Euclidean distance, which can be used to understand analogies like “man is to woman as king is to queen”. Using these ideas, you can train a word level model with a relatively small labeled training set.
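To make the contrast concrete, here is a minimal sketch of word level tokenization with the Keras Tokenizer; the sentences and variable names are made up purely for illustration:

from tensorflow.python.keras.preprocessing.text import Tokenizer

# Hypothetical sentences, just to illustrate word level encoding
sentences = ["a pale princess ivory", "a deep bull cream"]

word_tokenizer = Tokenizer()            # word level is the default
word_tokenizer.fit_on_texts(sentences)
print(word_tokenizer.word_index)        # each unique word maps to an integer
print(word_tokenizer.texts_to_sequences(sentences))

In a full word level model, those integers would typically index into an embedding matrix initialized from Word2Vec or GloVe vectors.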

Character level language model

But there’s an even simpler language model, one that splits a text string into characters and associates a unique integer with every single character. There are some reasons you might choose the character level language model over the more popular word level model:

  • Your text dataset contains a noticeable number of out-of-vocabulary or infrequent words. In our case, some legitimate color names could be “aquatone”, “chartreuse” and “fuchsia”. I’d have to check a dictionary to find out what they mean, and traditional word-level embeddings may not contain them.
  • The majority of the text strings are short, bounded-length strings. If you’re looking for a specific length limit: I’ve built a Yelp review generation model with a character-level encoding and a sequence length of 60 characters and still got decent results. You can find that blog post here: How to generate realistic yelp restaurant reviews with Keras. Usually, a character level generation model can create text with more variety, since its imagination is not constrained by a predefined vocabulary.

You may also be aware of the limitations that come with adopting a character level language model:

  • Character level models may not capture long-range dependencies as well as word level language models.
  • Character level models are also more computationally expensive to train: given the same text dataset, the sequences are longer and, as a result, require more training time.

Fortunately, these limitations won’t pose a problem for our color generation task: we’re limiting our color names to 25 characters in length, and we only have 14157 training samples.

Dataset

We mentioned that we’re limiting our color names to 25 characters. To arrive at this number, we checked the distribution of color name lengths across all training samples and visualized it to make sure the limit we picked makes sense.
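The snippets in this post rely on a names series and a data frame holding the color dataset, which aren’t shown being loaded. Here’s a minimal sketch, assuming a CSV with name, red, green and blue columns (the file name and column names are my assumptions, not part of the original code):

import pandas as pd

# Assumed file and column names; adjust to match your copy of the dataset
data = pd.read_csv("colors.csv")
names = data["name"]
print(len(names))    # 14157 training samples in this post's dataset
print(data.head())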

import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

# Sort the character lengths of all color names
h = sorted(names.str.len().to_numpy())

# Fit a normal distribution to the lengths and overlay it on a histogram
fit = stats.norm.pdf(h, np.mean(h), np.std(h))
plt.plot(h, fit, '-o')
plt.hist(h, density=True)
plt.xlabel('Chars')
plt.ylabel('Probability density')
plt.show()

That gives us the plot below, and you can clearly see that the majority of the color name strings have lengths less than or equal to 25, even though the maximum length goes up to 30.

We could pick a max length of 30, but the model we’re going to build would then need to be trained on longer sequences for more time. Picking a shorter sequence length reduces training complexity without compromising the integrity of the training data.

[Figure: distribution of color name character lengths]

With the max length decided, the next step in the character level data pre-processing is to transform each color name string into a list of 25 integer values. This is made easy with the Keras text tokenization utility.

from tensorflow.python.keras.preprocessing.text import Tokenizer
from tensorflow.python.keras import preprocessing

maxlen = 25
t = Tokenizer(char_level=True)
t.fit_on_texts(names)
tokenized = t.texts_to_sequences(names)
padded_names = preprocessing.sequence.pad_sequences(tokenized, maxlen=maxlen)

Right now padded_names has the shape (14157, 25), where 14157 is the total number of training samples and 25 is the max sequence length. If a string has fewer than 25 characters, it is padded with 0s at the beginning of the sequence.
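As a quick illustration of that pre-padding (using the character mapping shown in the next section), a short name like “red” ends up as mostly zeros:

# Short names are padded at the front because pad_sequences pads 'pre' by default
example = t.texts_to_sequences(["red"])
print(preprocessing.sequence.pad_sequences(example, maxlen=maxlen))
# e.g. [[ 0  0  0 ...  0  3  1 13]] -- 22 zeros followed by the codes for r, e, d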

You might be thinking that all the inputs are now integers, so our model should be able to process them. But there is one more step we can take to make model training more effective.

One-hot encoding

We can view the character-to-integer mapping by inspecting the t.word_index property of the Keras Tokenizer instance.

{' ': 4, 'a': 2, 'b': 18, 'c': 11, 'd': 13, 'e': 1, 'f': 22, 'g': 14, 'h': 16, 'i': 5, 'j': 26, 'k': 21, 'l': 7, 'm': 17, 'n': 6, 'o': 8, 'p': 15, 'q': 25, 'r': 3, 's': 10, 't': 9, 'u': 12, 'v': 23, 'w': 20, 'x': 27, 'y': 19, 'z': 24}

The integer values have no natural ordinal relationship to one another, so our model cannot harness any benefit from them as they are. What’s worse, our model would initially assume such an ordering among those characters (i.e. “a” is 2 and “e” is 1, but that should not signify a relationship), which can lead to unwanted results. We will use one-hot encoding to represent the input sequence instead.

Each integer will be represented by a boolean array in which only one element has the value 1. The max integer value in the character dictionary determines the length of that array.

In our case, the max integer value is 'x': 27, so the length of a one-hot boolean array will be 28 (since the lowest value is 0, which is reserved for padding).

For example, instead of using the integer value 2 to represent the character ‘a’, we use the one-hot array [0, 0, 1, 0, ..., 0].

One-hot encoding is also accessible in Keras.

from keras.utils import np_utils
one_hot_names = np_utils.to_categorical(padded_names)

The resulting one_hot_names has the shape (14157, 25, 28), which stands for (# of training samples, max sequence length, # of unique tokens).
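As an optional sanity check (not part of the original pipeline), we can decode a sample back from its one-hot form by taking the argmax along the last axis and reversing the tokenizer’s word_index:

# Reverse the character-to-integer mapping, then decode the first sample
index_to_char = {i: c for c, i in t.word_index.items()}
ids = one_hot_names[0].argmax(axis=-1)
print(''.join(index_to_char[i] for i in ids if i != 0))  # skip the 0 padding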

Data normalization

Remember, we’re predicting 3 color channel values, each ranging between 0 and 255. There is no golden rule for data normalization; it’s purely practical, because in practice a model can take forever to converge if the training data values are spread out too much. A common normalization technique is to scale values to [-1, 1]. In our model, the last layer uses a sigmoid activation, which outputs values between 0 and 1, so we’ll normalize the target values to [0, 1].

# The RGB values are between 0 - 255
# scale them to be between 0 - 1
def norm(value):
    return value / 255.0

normalized_values = np.column_stack([norm(data["red"]), norm(data["green"]), norm(data["blue"])])

Build the model

To build our model we’re going to use two types of neural networks: a feed forward neural network and a recurrent neural network. The feed forward neural network is by far the most common type of neural network. In this network, information enters at the input units and flows in one direction through the hidden layers until it reaches the output units.

In recurrent neural networks, information can flow around in cycles, which lets these networks remember information over time. Recurrent networks are a very natural way to model sequential data. In our model, we’re using one of the most powerful recurrent architectures, the long short-term memory (LSTM) network.

The easiest way to build a deep learning model in Keras is with its sequential API: we simply connect the neural network layers by calling model.add(), like stacking LEGO bricks.

from tensorflow.python.keras.models import Sequential
from tensorflow.python.keras.layers import Dense, Dropout, LSTM, Reshape

model = Sequential()
model.add(LSTM(256, return_sequences=True, input_shape=(maxlen, 28)))
model.add(LSTM(128))
model.add(Dense(128, activation='relu'))
model.add(Dense(3, activation='sigmoid'))
model.compile(optimizer='adam', loss='mse', metrics=['acc'])

Training the model couldn’t be easier: just call model.fit(). Notice that we’re reserving 10% of the samples for validation purposes. If the model achieves great accuracy on the training set but much lower accuracy on the validation set, it’s likely overfitting. You can find more about dealing with overfitting in my other blog post: Two Simple Recipes for Over Fitted Model.

history = model.fit(one_hot_names, normalized_values,
                    epochs=40,
                    batch_size=32,
                    validation_split=0.1)
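A simple way to check for the overfitting mentioned above is to plot the training and validation loss from the history object that fit() returns; a minimal sketch:

# Compare training and validation loss across epochs to spot overfitting
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('Epoch')
plt.ylabel('MSE loss')
plt.legend()
plt.show()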

Generate RGB colors

Let’s define some functions to generate and display the predicted color.

For a color name input, we need to transform it into the same one-hot representation. To do that, we tokenize the characters to integers with the same tokenizer we used on the training data, pad to the max sequence length of 25, then apply one-hot encoding to the integer sequence.

And for the output RGB values, we need to scale them back to 0–255 so we can display them correctly.

# plot a color image
def plot_rgb(rgb):
    data = [[rgb]]
    plt.figure(figsize=(2,2))
    plt.imshow(data, interpolation='nearest')
    plt.show()

def scale(n):
    return int(n * 255) 

def predict(name):
    name = name.lower()
    tokenized = t.texts_to_sequences([name])
    padded = preprocessing.sequence.pad_sequences(tokenized, maxlen=maxlen)
    one_hot = np_utils.to_categorical(padded, num_classes=28)
    pred = model.predict(np.array(one_hot))[0]
    r, g, b = scale(pred[0]), scale(pred[1]), scale(pred[2])
    print(name + ',', 'R,G,B:', r,g,b)
    plot_rgb(pred)

Let's give the predict() function a try.

predict("tensorflow orange")
predict("forest")
predict("keras red")

[Figure: predicted colors for the sample inputs]

“keras red” looks a bit darker than the one we’re familiar with, but hey, that’s what the model proposed.

Conclusion and further reading

In this post, we talked about how to build a Keras model that takes any color name and comes up with an RGB color value. More specifically, we looked at how to apply one-hot encoding in a character level language model and how to build a model that combines a feed forward neural network with a recurrent neural network.

Here’s a diagram to summarize what we have built in the post, starting from the bottom and showing every step of the data flow.

[Figure: data pipeline from raw color names to RGB output]

If you’re new to deep learning or the Keras library, there are some great resources that are easy and fun to read or experiment with.

TensorFlow playground: an interactive visualization of neural networks that runs in your browser.

Coursera deep learning course: learn the foundations of deep learning and lots of practical advice.

Keras get started guide: the official guide for the user-friendly, modular Python deep learning library.

Also, check out the source code for this post in my GitHub repo.

