How to choose Last-layer activation and loss function

Posted by: Chengwei 7 years, 7 months ago

Without further ado, here are the combinations of last-layer activation and loss function that suit different tasks.

Last-layer activation and loss function combinations

Problem type	Last-layer activation	Loss function	Example
Binary classification	sigmoid	binary_crossentropy	Dog vs cat, sentiment analysis (pos/neg)
Multi-class, single-label classification	softmax	categorical_crossentropy	MNIST: 10 classes, one label per image (each prediction is one digit)
Multi-class, multi-label classification	sigmoid	binary_crossentropy	News tag classification: one article can have multiple tags
Regression to arbitrary values	None	mse	Predict a house price (an integer or floating-point value)
Regression to values between 0 and 1	sigmoid	mse or binary_crossentropy	Engine health assessment where 0 is broken, 1 is new
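The table above can also be read as a lookup from problem type to last-layer choices. The sketch below encodes it as a plain Python dictionary so the mapping can be reused in code; the key names and the `choose_head` helper are hypothetical illustrations, not part of Keras. Only the activation and loss strings come from the table.

```python
# Hypothetical encoding of the table: problem type -> (last-layer activation, candidate losses).
# The keys are names chosen for this sketch; the values mirror the table rows.
LAST_LAYER_CHOICES = {
    "binary": ("sigmoid", ["binary_crossentropy"]),
    "multiclass_single_label": ("softmax", ["categorical_crossentropy"]),
    "multiclass_multi_label": ("sigmoid", ["binary_crossentropy"]),
    "regression_arbitrary": (None, ["mse"]),  # None = no activation (linear output)
    "regression_0_1": ("sigmoid", ["mse", "binary_crossentropy"]),
}


def choose_head(problem_type):
    """Return (activation, candidate_losses) for a problem type from the table."""
    if problem_type not in LAST_LAYER_CHOICES:
        raise ValueError(f"unknown problem type: {problem_type!r}")
    activation, losses = LAST_LAYER_CHOICES[problem_type]
    return activation, list(losses)  # copy so callers cannot mutate the table


activation, losses = choose_head("multiclass_multi_label")
print(activation, losses)  # sigmoid ['binary_crossentropy']
```

Note that the binary and multi-label rows share the same pair: both treat each output unit as an independent yes/no decision.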

Binary classification - Dog VS Cat

In this Kaggle competition, you write an algorithm to classify whether an image contains a dog or a cat. It is a binary classification task: the model outputs a single number between 0 and 1, where a lower value means the image looks more "cat"-like and a higher value means the model thinks it looks more "dog"-like.
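To make the "single number between 0 and 1" concrete, here is a minimal pure-Python sketch of what a sigmoid output unit does with its raw score (logit), and how that probability might be turned into a label. The 0.5 threshold and the example logits are assumptions for illustration; Keras itself only returns the probability.

```python
import math


def sigmoid(z):
    """Squash a raw score (logit) into the range 0 to 1, avoiding overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def label_from_score(p, threshold=0.5):
    """Map a sigmoid output to a class name; 0.5 is an assumed, adjustable threshold."""
    return "dog" if p >= threshold else "cat"


for logit in (-3.0, 2.5):  # hypothetical raw scores for two images
    p = sigmoid(logit)
    print(f"logit={logit:+.1f} -> p={p:.3f} -> {label_from_score(p)}")
```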

Here is the code for the last fully connected layer and the loss function used by the model:

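# Dog VS Cat last Dense layer
# (assumes `model` is a keras Sequential model whose earlier layers are already added)
from tensorflow.keras import layers, optimizers

model.add(layers.Dense(1, activation='sigmoid'))  # one probability per image
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(learning_rate=1e-4),  # `lr` is deprecated
              metrics=['acc'])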
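The binary_crossentropy loss paired with the sigmoid output can be written out by hand. The pure-Python sketch below computes the mean of -(y·log(p) + (1 - y)·log(1 - p)) over a small batch, clipping predictions away from 0 and 1 so the logarithm stays finite (the clip value eps=1e-7 is an assumption of this sketch). It shows why confident wrong predictions are punished much harder than confident right ones; the labels and outputs are made up.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over 0/1 labels and sigmoid outputs.

    Predictions are clipped to [eps, 1 - eps] so log(0) never occurs.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


labels = [1, 0, 1]                    # 1 = dog, 0 = cat
confident_right = [0.95, 0.05, 0.90]  # hypothetical model outputs
confident_wrong = [0.05, 0.95, 0.10]
print(binary_crossentropy(labels, confident_right))  # about 0.069
print(binary_crossentropy(labels, confident_wrong))  # about 2.765
```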

If you are interested in the full source code for this dog vs cat task, take a look at this awesome tutorial on GitHub.

Multi-class single-label classification - MNIST

The task is to classify grayscale images of handwritten digits (28 by 28 pixels) into 10 categories (0 to 9). The dataset ships with the Keras package, so it is easy to try out.

The last layer uses the "softmax" activation, so it returns an array of 10 probability scores that sum to 1. Each score is the probability that the current digit image belongs to the corresponding one of the 10 digit classes.
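As a sketch of what softmax does to the 10 raw outputs of the last Dense layer, the pure-Python function below exponentiates each score and normalizes by the total, subtracting the maximum first to avoid overflow. The logits are hypothetical; the predicted digit is simply the index with the highest probability.

```python
import math


def softmax(scores):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    m = max(scores)  # subtracting the max does not change the result but avoids overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores from the last layer for one image, one per digit 0-9.
logits = [0.1, 0.3, 2.0, 0.0, -1.0, 0.5, 0.2, 4.0, 0.0, 1.0]
probs = softmax(logits)
predicted_digit = max(range(10), key=lambda i: probs[i])
print(f"sum={sum(probs):.6f}, predicted digit={predicted_digit}")
```

Because every score competes for the same probability mass, raising one class's score necessarily lowers the others, which is exactly the single-label assumption.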

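# MNIST last Dense layer
from tensorflow.keras import layers

model.add(layers.Dense(10, activation='softmax'))  # 10 probabilities summing to 1
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',  # expects one-hot encoded labels
              metrics=['accuracy'])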
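categorical_crossentropy compares the softmax output with a one-hot label vector, so only the probability assigned to the true class contributes to the loss: -log(p_true). The pure-Python sketch below assumes a made-up softmax output for one image. If your labels are plain integers instead of one-hot vectors, Keras offers `sparse_categorical_crossentropy`, or you can convert them with `keras.utils.to_categorical`.

```python
import math


def one_hot(label, num_classes=10):
    """Encode an integer class label as a one-hot vector."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def categorical_crossentropy(y_true_one_hot, probs, eps=1e-7):
    """Cross-entropy for one sample: -sum(y_i * log(p_i)); only the true class contributes."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true_one_hot, probs))


probs = [0.01, 0.01, 0.05, 0.01, 0.01, 0.01, 0.01, 0.85, 0.02, 0.02]  # assumed softmax output
print(categorical_crossentropy(one_hot(7), probs))  # -log(0.85), about 0.1625
print(categorical_crossentropy(one_hot(2), probs))  # -log(0.05), about 2.9957
```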

Again, the full source code for MNIST classification is available on GitHub.

Multi-class, multi-label classification - News tags classification

Reuters-21578 is a collection of about 20K newswire stories categorized with 672 labels. The labels are divided into five main categories:

Topics
Places
People
Organizations
Exchanges

For example, a single news story can have three tags:

Places: USA, China
Topics: trade
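Because one story can carry several tags, its training target is a multi-hot vector (several 1s) rather than a one-hot vector. The sketch below encodes the three tags from the example against a tiny hypothetical vocabulary; the real Reuters-21578 label set has 672 labels, and the `category:name` tag strings are an assumed naming scheme.

```python
# Hypothetical, tiny tag vocabulary; the real dataset has 672 labels.
TAG_VOCAB = ["places:usa", "places:china", "places:japan", "topics:trade", "topics:grain"]


def multi_hot(tags, vocab=TAG_VOCAB):
    """Encode a set of tags as a 0/1 vector with one slot per known tag."""
    index = {tag: i for i, tag in enumerate(vocab)}
    vec = [0] * len(vocab)
    for tag in tags:
        vec[index[tag]] = 1  # raises KeyError for tags outside the vocabulary
    return vec


print(multi_hot(["places:usa", "places:china", "topics:trade"]))  # [1, 1, 0, 1, 0]
```

The length of this vector corresponds to `num_categories` in the model below: one output unit per possible tag.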

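# News tags classification last Dense layer
from tensorflow.keras.layers import Dense

# num_categories: number of distinct tags; one independent sigmoid per tag
model.add(Dense(num_categories, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam', metrics=['accuracy'])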
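The key difference from the MNIST head is that each sigmoid unit is judged on its own, so the probabilities do not have to sum to 1 and any number of tags can be switched on. The pure-Python sketch below, using a hypothetical five-tag vocabulary and made-up logits, averages binary cross-entropy across tags (which is what binary_crossentropy does over the last axis) and keeps every tag whose probability clears an assumed 0.5 threshold.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mean_binary_crossentropy(y_true, logits, eps=1e-7):
    """Average per-tag binary cross-entropy; each tag is its own yes/no problem."""
    total = 0.0
    for y, z in zip(y_true, logits):
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def predict_tags(logits, vocab, threshold=0.5):
    """Keep every tag whose sigmoid probability clears the (assumed) 0.5 threshold."""
    return [tag for tag, z in zip(vocab, logits) if sigmoid(z) >= threshold]


vocab = ["places:usa", "places:china", "places:japan", "topics:trade", "topics:grain"]
logits = [2.0, 1.2, -1.5, 3.0, -2.5]  # hypothetical raw scores for one article
print(predict_tags(logits, vocab))  # ['places:usa', 'places:china', 'topics:trade']
print(sum(sigmoid(z) for z in logits))  # greater than 1: tags do not compete
print(mean_binary_crossentropy([1, 1, 0, 1, 0], logits))
```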

You can take a look at the source code for this task on my GitHub.

I also wrote a separate blog post covering this task in more detail; check it out if you are interested.

Regression to arbitrary values - Boston Housing price prediction

The goal is to predict the price of a house, a single continuous value rather than a discrete label, from the given data.

The network ends with a Dense layer without any activation. Applying a squashing activation such as sigmoid would constrain the output to the range 0 to 1, which we don't want for a price.

The mse (mean squared error) loss computes the average of the squared differences between the predictions and the targets; it is a widely used loss function for regression tasks. The model also tracks mae (mean absolute error) as a metric.
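Both quantities are easy to compute by hand. The pure-Python sketch below uses made-up prices, expressed in thousands of dollars as in the Boston Housing targets, to show how mse and mae differ: squaring makes the single 4-unit error contribute 16 of the 21 total, so mse is dominated by large misses, while mae stays in the same units as the price and is easier to read.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared differences (the 'mse' loss)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute differences (the 'mae' metric)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up prices in thousands of dollars.
targets = [24.0, 21.5, 34.75]
predictions = [22.0, 22.5, 30.75]
print(mse(targets, predictions))  # (4 + 1 + 16) / 3 = 7.0
print(mae(targets, predictions))  # (2 + 1 + 4) / 3, about 2.333
```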

# predict house price last Dense layer
from tensorflow.keras import layers

model.add(layers.Dense(1))  # no activation: linear, unbounded output
model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])

Full source code can be found in the same GitHub repo.

Regression to values between 0 and 1

Consider assessing the health of a jet engine from several sensor recordings. We want the output to be a continuous value from 0 to 1, where 0 means the engine needs to be replaced, 1 means it is in perfect condition, and values in between indicate that some degree of maintenance is needed. Compared to the previous regression problem, we apply the "sigmoid" activation to the last Dense layer to constrain the output to between 0 and 1.

# Jet engine health assessment last Dense layer
from tensorflow.keras import layers

model.add(layers.Dense(1, activation='sigmoid'))  # output constrained to 0..1
model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
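The table lists either mse or binary_crossentropy for this case. Binary cross-entropy is well defined for "soft" targets anywhere in [0, 1], not just 0 and 1, so both losses are options. The pure-Python sketch below, assuming a hypothetical health score of 0.7, searches a grid of predictions and shows that both losses are minimized when the prediction equals the target; unlike mse, binary cross-entropy does not reach zero there, which matters only when reading the loss value, not for training.

```python
import math


def bce(y, p, eps=1e-7):
    """Binary cross-entropy for one soft target y in [0, 1] and prediction p in (0, 1)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def squared_error(y, p):
    """Squared error for one target/prediction pair."""
    return (y - p) ** 2


target = 0.7  # hypothetical health score: mostly healthy engine
candidates = [i / 100 for i in range(1, 100)]  # predictions 0.01 .. 0.99
best_mse = min(candidates, key=lambda p: squared_error(target, p))
best_bce = min(candidates, key=lambda p: bce(target, p))
print(best_mse, best_bce)  # both losses are minimized at p = 0.7
print(bce(target, 0.7))  # positive: BCE does not reach zero for a soft target
```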
