We are going to build a Keras model that leverages the pre-trained "Universal Sentence Encoder" to classify a given question text into one of six categories.
TensorFlow Hub modules can be applied to a variety of transfer learning tasks and datasets, whether images or text. The "Universal Sentence Encoder" is one of many newly published reusable TensorFlow Hub modules: a self-contained piece of TensorFlow graph with its pre-trained weight values included.
A runnable Colab notebook is available, so you can experiment with the code while reading on.
You can treat all TensorFlow Hub modules as black boxes, agnostic of what happens inside, and still build a functional transfer learning model. However, a deeper understanding gives you a new perspective on what each module is capable of, what its constraints are, and how good the transfer learning results could potentially be.
If you recall the GloVe word embedding vectors from our previous tutorial, which turn a word into a 50-dimensional vector, the Universal Sentence Encoder is much more powerful: it can embed not only words but also phrases and sentences. That is, it takes variable-length English text as input and outputs a 512-dimensional vector. Handling variable-length input sounds great, but the catch is that the longer a sentence gets (counted in words), the more diluted its embedding can become. And since the model was trained at the word level, it will likely find typos and difficult words challenging to process. To learn more about the difference between word-level and character-level language models, you can read my previous tutorial.
There are two Universal Sentence Encoders to choose from, with different encoder architectures built for distinct design goals. The one based on the Transformer architecture targets high accuracy at the cost of greater model complexity and resource consumption. The other, based on a deep averaging network (DAN), targets efficient inference with slightly reduced accuracy.
Figure: Side-by-side comparison of the Transformer and DAN sentence encoder architectures.
The original Transformer model consists of an encoder and a decoder, but here only its encoder part is used.
The encoder is composed of a stack of N = 6 identical layers. Each layer has two sub-layers: the first is a multi-head self-attention mechanism, and the second is a simple, position-wise fully connected feed-forward network. A residual connection is employed around each of the two sub-layers, followed by layer normalization. Since the model contains no recurrence and no convolution, it must inject some information about the relative or absolute position of the tokens in order to make use of the sequence order; that is what the "positional encodings" do.
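To make the attention sub-layer and the positional encodings more concrete, here is a minimal NumPy sketch of a single self-attention head applied to token vectors with sinusoidal positional encodings added. The sinusoidal formula is the one from the original Transformer paper; this post does not say which position scheme the Universal Sentence Encoder itself uses. The sketch also omits multiple heads, the feed-forward sub-layer, residual connections and layer normalization, and the function names and toy sizes are illustrative.

```python
import numpy as np


def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings as in the original Transformer paper."""
    positions = np.arange(seq_len)[:, np.newaxis]            # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]           # the even indices 2i
    angles = positions / np.power(10000.0, dims / d_model)   # pos / 10000^(2i/d)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)  # odd dimensions use cosine
    return pe


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a (seq_len, d_model) input."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])                  # (seq_len, seq_len)
    scores -= scores.max(axis=-1, keepdims=True)             # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)           # softmax over keys
    return weights @ v                                       # weighted mix of values


rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
tokens = rng.normal(size=(seq_len, d_model))                 # stand-in token embeddings
x = tokens + positional_encoding(seq_len, d_model)           # inject order information
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (5, 8)
```

Without the added positional encodings, permuting the rows of `tokens` would simply permute the rows of `out`, which is why order information has to be injected explicitly.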
The deep averaging network (DAN) is much simpler: input embeddings for words and bi-grams are first averaged together and then passed through a feedforward deep neural network (DNN) to produce sentence embeddings. The primary advantage of the DAN encoder is that its compute time is linear in the length of the input sequence.
The type of training data and the chosen training metric can have a significant impact on the transfer learning result.
Besides unsupervised data drawn from web sources, both models were also trained on the Stanford Natural Language Inference (SNLI) corpus. The SNLI corpus is a collection of 570k human-written English sentence pairs, manually labeled for balanced classification with the labels entailment, contradiction, and neutral, supporting the task of natural language inference (NLI), also known as recognizing textual entailment (RTE). Essentially, the models were trained to learn the semantic similarity between sentence pairs.
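Here is a toy NumPy sketch of the DAN idea: average the unigram and bigram embeddings, then pass the average through a small feed-forward network. Everything in it is hypothetical and untrained (the random embedding table, the layer sizes, the tanh activations); it only illustrates how sentences of any length end up as a fixed-size vector after a single linear pass over their n-grams.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM, HIDDEN_DIM, OUTPUT_DIM = 16, 32, 8  # toy sizes, not the real model's


def ngrams(tokens):
    """Unigrams plus bigrams, e.g. ['a', 'b'] -> ['a', 'b', 'a b']."""
    return tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


# Hypothetical lookup table: each unigram/bigram gets a random vector on first use
embedding_table = {}


def lookup(gram):
    if gram not in embedding_table:
        embedding_table[gram] = rng.normal(size=EMBED_DIM)
    return embedding_table[gram]


# Randomly initialized feed-forward layers (untrained; tanh is an assumption)
W1, b1 = rng.normal(size=(EMBED_DIM, HIDDEN_DIM)), np.zeros(HIDDEN_DIM)
W2, b2 = rng.normal(size=(HIDDEN_DIM, OUTPUT_DIM)), np.zeros(OUTPUT_DIM)


def dan_encode(sentence):
    grams = ngrams(sentence.lower().split())
    # One pass over the n-grams, so cost grows linearly with sentence length
    avg = np.mean([lookup(g) for g in grams], axis=0)
    hidden = np.tanh(avg @ W1 + b1)
    return np.tanh(hidden @ W2 + b2)


short = dan_encode("That band rocks!")
longer = dan_encode("That song is really cool and I could listen to it all day.")
print(short.shape, longer.shape)  # (8,) (8,)
```

The averaging step is also where the dilution mentioned earlier comes from: each additional n-gram contributes a smaller share of the final average.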
With that in mind, the sentence embeddings can be trivially used to compute sentence-level semantic similarity scores.
The source code to generate the similarity heat map is available both in my Colab notebook and in my GitHub repo. Each cell is colored based on the inner product of the encodings for the two sentences, so the more similar two sentences are, the darker the color.
Loading the Universal Sentence Encoder and computing embeddings for some text can be as easy as the following.
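The inner-product heat map boils down to a pairwise similarity matrix. The sketch below L2-normalizes each embedding first, so every entry is a cosine similarity; the function name and the toy 3-D vectors are illustrative stand-ins for the 512-dimensional sentence embeddings.

```python
import numpy as np


def similarity_matrix(embeddings):
    """Pairwise inner products of L2-normalized sentence embeddings."""
    embeddings = np.asarray(embeddings, dtype=np.float64)
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    return unit @ unit.T  # entry (i, j) is the cosine similarity of sentences i and j


# Toy 3-D stand-ins for 512-D sentence embeddings
emb = np.array([[1.0, 0.0, 0.0],
                [0.9, 0.1, 0.0],
                [0.0, 0.0, 1.0]])
sim = similarity_matrix(emb)
print(np.round(sim, 2))
```

The resulting matrix is what gets rendered as the heat map: sentences 0 and 1 point in nearly the same direction and score close to 1, while sentence 2 is orthogonal to both and scores 0.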
```python
# TF 1.x-style API: tf.Session and hub.Module
import tensorflow as tf
import tensorflow_hub as hub

module_url = "https://tfhub.dev/google/universal-sentence-encoder-large/3"

# Import the Universal Sentence Encoder's TF Hub module
embed = hub.Module(module_url)

# Compute a representation for each message, showing various lengths supported.
messages = ["That band rocks!", "That song is really cool."]

with tf.Session() as session:
    # The module needs both its variables and its lookup tables initialized
    session.run([tf.global_variables_initializer(), tf.tables_initializer()])
    message_embeddings = session.run(embed(messages))
```
Loading the module for the first time can take a while, since it downloads the weight files. The resulting `message_embeddings` holds one 512-dimensional vector per message:
```
array([[ 0.06587551,  0.02066354, -0.01454356, ...,  0.06447642,  0.01654527, -0.04688655],
       [ 0.06909196,  0.01529877,  0.03278331, ...,  0.01220771,  0.03000253, -0.01277521]], dtype=float32)
```
To respond correctly to a question given a large collection of texts, it is crucial to first classify the question into a fine-grained class.
The dataset we use is the TREC Question Classification dataset. It contains 5,452 training and 500 test questions, each categorized into one of six coarse labels (ABBR, DESC, ENTY, HUM, LOC and NUM).
We want a multiclass classification model that takes strings as input and outputs a probability for each of the six class labels. This tells us how to prepare the training and test data.
The first step is to turn the raw text file into a pandas DataFrame and set the "label" column to be a categorical column, so that we can later access each label as a numeric value.
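Concretely, "a probability for each of the six class labels" is what a softmax output layer produces, and the categorical cross-entropy loss used later scores those probabilities against a one-hot target. The plain NumPy sketch below mirrors the math with made-up scores for two questions; it is not Keras's actual implementation, and the function names are illustrative.

```python
import numpy as np


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # avoid overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def categorical_crossentropy(one_hot, probs, eps=1e-7):
    """Mean negative log-probability assigned to the true class."""
    probs = np.clip(probs, eps, 1.0)  # guard against log(0)
    return float(-np.mean(np.sum(one_hot * np.log(probs), axis=-1)))


# Hypothetical scores for 2 questions over 6 classes
logits = np.array([[2.0, 0.1, 0.1, 0.1, 0.1, 0.1],
                   [0.0, 0.0, 0.0, 0.0, 0.0, 3.0]])
targets = np.array([[1, 0, 0, 0, 0, 0],
                    [0, 0, 0, 0, 0, 1]])
probs = softmax(logits)
print(probs.sum(axis=1))  # each row sums to 1
print(categorical_crossentropy(targets, probs))
```

The loss shrinks as the probability assigned to the true class approaches 1, which is what training pushes the classifier toward.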
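```python
import re

import pandas as pd


def get_dataframe(filename):
    """Parse a TREC question-classification file into a DataFrame."""
    # latin-1 maps every byte to a character, so stray non-UTF-8 bytes won't raise
    with open(filename, 'r', encoding='latin-1') as f:
        lines = f.read().splitlines()
    data = []
    for line in lines:
        # Each line is "<COARSE>:<fine> <question text>"
        label = line.split(' ')[0]
        label = label.split(':')[0]  # keep only the coarse label
        text = ' '.join(line.split(' ')[1:])
        # Drop characters outside the allowed set
        text = re.sub(r'[^A-Za-z0-9 ,\?\'\"-._\+\!/\`@=;:]+', '', text)
        data.append([label, text])
    df = pd.DataFrame(data, columns=['label', 'text'])
    # The categorical dtype gives each label an integer code
    df.label = df.label.astype('category')
    return df


df_train = get_dataframe('train_5500.txt')
df_train.head()
```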
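The categorical dtype is what provides the numeric view of the labels: pandas collects the distinct labels, sorts them, and assigns each one an integer code. A quick check on a few made-up labels:

```python
import pandas as pd

# A few coarse TREC-style labels in arbitrary order (made up for illustration)
labels = pd.Series(["NUM", "LOC", "HUM", "NUM", "DESC"]).astype("category")

print(labels.cat.categories.tolist())  # ['DESC', 'HUM', 'LOC', 'NUM']
print(labels.cat.codes.tolist())       # [3, 2, 1, 3, 0]
```

Because the categories are sorted alphabetically, the same `cat.categories` list can later translate a predicted class index back into a label name.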
The first five training samples look like this.
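Next, we prepare the input and output data for the model: the input is a list of question strings, and the output is a list of one-hot encoded labels. If you are not yet familiar with one-hot encoding, I cover it in part of my previous post. The test split is prepared the same way; its filename below is a placeholder.
```python
import numpy as np
import pandas as pd

# Inputs: an (n, 1) object array of question strings
train_text = np.array(df_train['text'].tolist(), dtype=object)[:, np.newaxis]
# Outputs: an (n, 6) array of one-hot labels
train_label = np.asarray(pd.get_dummies(df_train.label), dtype=np.int8)

# Hypothetical test filename; the test split is prepared the same way
df_test = get_dataframe('test_data.txt')
test_text = np.array(df_test['text'].tolist(), dtype=object)[:, np.newaxis]
test_label = np.asarray(pd.get_dummies(df_test.label), dtype=np.int8)
```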
If you take a peek at the values of `train_label`, you will see one-hot rows like these:
```
array([[0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 1, 0],
       [0, 0, 0, 1, 0, 0],
       ...], dtype=int8)
```
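The one-hot columns produced by `pd.get_dummies` follow the same category order, so taking the argmax of a one-hot row (or of a row of predicted probabilities) recovers the category index. A small round trip with hypothetical labels:

```python
import numpy as np
import pandas as pd

labels = pd.Series(["NUM", "LOC", "HUM", "DESC"]).astype("category")

# get_dummies creates one column per category, in the category order
one_hot = np.asarray(pd.get_dummies(labels), dtype=np.int8)
print(one_hot)

# argmax of each row recovers the category index; the index recovers the name
categories = labels.cat.categories.tolist()
indices = one_hot.argmax(axis=1)
print([categories[i] for i in indices])  # ['NUM', 'LOC', 'HUM', 'DESC']
```

This is the same mapping the prediction step relies on when it turns the model's output probabilities back into label names.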
Now we are ready to build the model.
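We loaded the Universal Sentence Encoder earlier as the variable `embed`. To use it inside a Keras model, we wrap it in a function that casts the input to strings and removes the extra column dimension, and then pass that function to a Keras `Lambda` layer.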
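```python
def UniversalEmbedding(x):
    # Keras passes a (batch, 1) tensor; squeeze axis 1 only, so a batch of one
    # sentence stays 1-D instead of collapsing to a scalar
    return embed(tf.squeeze(tf.cast(x, tf.string), axis=1),
                 signature="default", as_dict=True)["default"]
```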
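One detail worth knowing about the squeeze inside `UniversalEmbedding`: Keras hands the `Lambda` layer a `(batch, 1)` column of strings, while the encoder expects a 1-D batch of sentences. A squeeze with no axis removes every size-1 dimension, including the batch dimension when a batch holds a single sentence, whereas squeezing only axis 1 always leaves a 1-D batch. The shape rules are easy to see with NumPy's `np.squeeze`, which behaves like `tf.squeeze` in this respect:

```python
import numpy as np

# Keras feeds the Lambda layer a (batch, 1) column of strings
batch_of_three = np.array([["a"], ["b"], ["c"]], dtype=object)
batch_of_one = np.array([["a"]], dtype=object)

# Squeezing every size-1 axis also removes the batch axis when batch size is 1
print(np.squeeze(batch_of_three).shape)        # (3,)
print(np.squeeze(batch_of_one).shape)          # ()
# Squeezing only axis 1 always keeps a 1-D batch
print(np.squeeze(batch_of_one, axis=1).shape)  # (1,)
```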
Then we build the model with the standard Keras Functional API:
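```python
from tensorflow.keras import layers
from tensorflow.keras.models import Model

embed_size = 512  # output dimension of the Universal Sentence Encoder
category_counts = len(df_train.label.cat.categories)  # 6 question classes

input_text = layers.Input(shape=(1,), dtype=tf.string)
embedding = layers.Lambda(UniversalEmbedding, output_shape=(embed_size,))(input_text)
dense = layers.Dense(256, activation='relu')(embedding)
pred = layers.Dense(category_counts, activation='softmax')(dense)
model = Model(inputs=[input_text], outputs=pred)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
```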
The model summary shows that only the Keras `Dense` layers have trainable parameters. That is how this transfer learning setup works: the Universal Sentence Encoder's weights stay untouched, and only the new classification layers are trained.
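```
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 1)                 0
lambda_1 (Lambda)            (None, 512)               0
dense_1 (Dense)              (None, 256)               131328
dense_2 (Dense)              (None, 6)                 1542
=================================================================
Total params: 132,870
Trainable params: 132,870
Non-trainable params: 0
```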
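The parameter counts in the summary follow from simple arithmetic: a `Dense` layer has one weight per input/output pair plus one bias per unit. The `Lambda` layer reports 0 parameters because Keras does not count the encoder's weights as layer parameters. A quick check (the helper name is illustrative):

```python
def dense_param_count(input_dim, units):
    """A Dense layer has input_dim * units weights plus one bias per unit."""
    return input_dim * units + units


hidden = dense_param_count(512, 256)  # 512-d sentence embedding -> 256 ReLU units
output = dense_param_count(256, 6)    # 256 hidden units -> 6 class probabilities
print(hidden, output, hidden + output)  # 131328 1542 132870
```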
Next, we train the model on the training set and evaluate it on the test set at the end of each epoch.
```python
from tensorflow.keras import backend as K

with tf.Session() as session:
    K.set_session(session)
    # Initialize the hub module's variables and its lookup tables
    session.run(tf.global_variables_initializer())
    session.run(tf.tables_initializer())
    history = model.fit(train_text, train_label,
                        validation_data=(test_text, test_label),
                        epochs=10, batch_size=32)
    model.save_weights('./model.h5')
```
The best validation accuracy reaches around 97% after training for 10 epochs.
Once the model is trained and its weights are saved to a file, it is ready to make predictions on new questions.
Here are three new questions for the model to classify.
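To read the best epoch off a training run, you can inspect the `history.history` dict returned by `model.fit`. The helper below is a hypothetical sketch: it handles both key names Keras has used for validation accuracy (`val_acc` in older releases, `val_accuracy` in newer ones), and the numbers in the example are made up, not results from this model.

```python
def best_val_accuracy(history_dict):
    """Return (best accuracy, 1-based epoch) from a Keras History.history dict."""
    # Older Keras versions log 'val_acc'; newer ones log 'val_accuracy'
    key = 'val_accuracy' if 'val_accuracy' in history_dict else 'val_acc'
    scores = history_dict[key]
    best = max(scores)
    return best, scores.index(best) + 1


# Made-up numbers for illustration only
fake_history = {'val_acc': [0.91, 0.95, 0.96, 0.95]}
print(best_val_accuracy(fake_history))  # (0.96, 3)
```

After training, `best_val_accuracy(history.history)` would return the highest validation accuracy and the epoch in which it occurred.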
```python
new_text = ["In what year did the titanic sink ?",
            "What is the highest peak in California ?",
            "Who invented the light bulb ?"]
new_text = np.array(new_text, dtype=object)[:, np.newaxis]

with tf.Session() as session:
    K.set_session(session)
    session.run(tf.global_variables_initializer())
    session.run(tf.tables_initializer())
    # Restore the trained weights after initialization so they are not overwritten
    model.load_weights('./model.h5')
    predicts = model.predict(new_text, batch_size=32)

# Map each row's highest-probability index back to its label name
categories = df_train.label.cat.categories.tolist()
predict_indices = predicts.argmax(axis=1)
predict_labels = [categories[i] for i in predict_indices]
print(predict_labels)
```
The classification results look decent.
```
['NUM', 'LOC', 'HUM']
```
Congratulations! You have built a Keras text transfer learning model powered by the Universal Sentence Encoder and achieved a great result in the question classification task. The Universal Sentence Encoder can also embed longer paragraphs, so feel free to experiment with other datasets, such as news topic classification or sentiment analysis.
Here are some related resources you might find useful.
TensorFlow Hub example notebooks
For an introduction to using Google Colab notebooks, you can read the first section of my post "How to run Object Detection and Segmentation on a Video Fast for Free".
The source code in my GitHub and a runnable Colab notebook.