Using TensorFlow to Create a Neural Network with TripAdvisor Data: Part II


Here, we continue with “Using TensorFlow to Create a Neural Network with TripAdvisor Data.”

It’ll be helpful to look back at the first part of this article as we review the code sections below.

len(np.unique(ttarget)) is the number of distinct TripAdvisor Scores: 1, 2, 3, 4, and 5. We add 1 to that value to get nnum_labels. This is because the scores are used as array indices, and the first value of an array has index 0.

We pass nnum_labels to the NumPy np.eye function and index the resulting identity matrix with ttarget to create one-hot vectors. A one-hot vector is a way to represent the Scores 1, 2, 3, 4, 5 in an array where every element is 0 except the one whose position gives the value. So a 1 at index 2 (the third element, since we start counting at 0) indicates a score of 2.

aall_Y

array([[ 0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.],
       ..., 
       [ 0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.]])
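
To make the one-hot construction concrete, here is a minimal NumPy sketch. The sscores array below is made up for illustration and simply stands in for ttarget; the real scores also run from 1 through 5.

import numpy as np

sscores = np.array([5, 3, 5, 4, 2, 1])             # hypothetical stand-in for ttarget

# 5 distinct scores plus 1, because score k lands at index k and indices start at 0
nnum_labels = len(np.unique(sscores)) + 1          # 6

# np.eye(6) is the 6x6 identity matrix; indexing it with the scores picks out
# one row per score, giving a one-hot row for each rating
one_hot = np.eye(nnum_labels)[sscores]

print(one_hot[0])    # score 5 -> [ 0.  0.  0.  0.  0.  1.]
print(one_hot[-1])   # score 1 -> [ 0.  1.  0.  0.  0.  0.]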

Now we use the scikit-learn function train_test_split to split the features (X) and labels (y) into four data sets: ttrain_X, ttest_X, ttrain_y, and ttest_y.

ttrain_X, ttest_X, ttrain_y, ttest_y = train_test_split(aall_X, aall_Y, test_size=0.33, random_state=RANDOM_SEED)

In the code below, we use the softmax function to normalize our numbers: instead of working with large raw outputs, we work with values between 0 and 1 that sum to 1.
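
As a quick illustration (the numbers here are made up), softmax exponentiates each value and divides by the total, so large raw outputs become probabilities that sum to 1:

import numpy as np

logits = np.array([2.0, 1.0, 0.1])                 # hypothetical raw outputs
softmax = np.exp(logits) / np.sum(np.exp(logits))

print(softmax)          # [ 0.659  0.242  0.099] approximately
print(softmax.sum())    # 1.0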

Remember that initially our weights are a guess. The neural network iteratively tries new weights until it finds the ones that minimize the cost function. We have to initialize the weights by drawing from some statistical distribution, so we use a normal distribution with standard deviation σ = 0.1:

weights = tf.random_normal(shape, stddev=0.1)

def init_weights(shape):
    weights = tf.random_normal(shape, stddev=0.1)
    return tf.Variable(weights)

Now, we define the forwardprop function. Remember that we are dealing with a multilayer perceptron, a feed-forward neural network.

forwardprop applies an activation function. Below we use tf.nn.sigmoid, which is one of many nonlinear activation functions. See below for clarification.

def forwardprop(X, w_1, w_2):
    h = tf.nn.sigmoid(tf.matmul(X, w_1))   # hidden layer: weighted inputs passed through the sigmoid
    yhat = tf.matmul(h, w_2)               # output layer: raw logits (softmax is applied later by the cost function)
    return yhat

tf.matmul(X, w_1) multiplies the matrices X and w_1.

ŷ (pronounced y-hat) is the function ŷ = wx + b, where w is the weights, x is the input matrix, and b is the bias. In other words, it is the predicted value of a perceptron, i.e., the output of the activation function.

To understand what all of this means, let’s look at the diagram below from Towards Data Science. The x’s are inputs, the w’s are weights, and bias is the bias. ∑ means to calculate the sum x1w1 + x2w2 + … + xmwm. Then we add the bias. The green perceptron then uses, in this case, the simplest possible activation function: it returns 1 if that sum is positive and 0 otherwise. ŷ is then the predicted value of the neural network.
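
Here is a minimal NumPy sketch of that single perceptron; the inputs, weights, and bias are made-up numbers, and the activation is the simple step function described above:

import numpy as np

x = np.array([1.0, 0.0, 2.0])      # hypothetical inputs x1, x2, x3
w = np.array([0.5, -0.3, 0.25])    # hypothetical weights w1, w2, w3
b = -0.5                           # hypothetical bias

z = np.dot(x, w) + b               # the weighted sum plus the bias
y_hat = 1 if z > 0 else 0          # step activation: 1 if positive, 0 otherwise

print(z, y_hat)                    # 0.5 1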

XX is the tensor (i.e., matrix) of inputs, ww_1 and ww_2 are the weights, and yyhat is the prediction made by the network. tf.argmax returns the index with the largest value.

XX = tf.placeholder("float", shape=[None, xx_size])
yy = tf.placeholder("float", shape=[None, yy_size])
ww_1 = init_weights((xx_size, hh_size))
ww_2 = init_weights((hh_size, yy_size))

The next function is the cost function. Remember that the goal of any kind of regression analysis is to minimize the cost function. In the case of simple linear regression, that is the mean squared error (MSE), i.e., the average of the squared differences between the predicted and actual values. So we follow that convention and use tf.reduce_mean.

In tf.nn.softmax_cross_entropy_with_logits, the logits are the raw, unnormalized outputs of the network, while the labels are probabilities that sum to 1. The logit is the logistic regression function, specifically the log odds. Softmax, as we mentioned above, normalizes the values.

Cross entropy is the cost we minimize, and tf.train.GradientDescentOptimizer is the algorithm that drives the cost down as quickly as possible.

yyhat    = forwardprop(XX, ww_1, ww_2)
ppredict = tf.argmax(yyhat, axis=1)

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=yy, logits=yyhat))
updates = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
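
To see concretely what that cost measures, here is a small NumPy sketch of softmax cross-entropy for one made-up training example. This is what tf.nn.softmax_cross_entropy_with_logits computes for each row before tf.reduce_mean averages over the batch:

import numpy as np

logits = np.array([1.2, 0.3, -0.4, 2.0, 0.1, 0.5])   # hypothetical yyhat row
label  = np.array([0., 0., 0., 1., 0., 0.])          # one-hot row for the true score

probs = np.exp(logits) / np.sum(np.exp(logits))      # softmax turns logits into probabilities
cross_entropy = -np.sum(label * np.log(probs))       # small when the true class gets high probability

print(cross_entropy)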

Now, we create tensors to store our neural network calculations. Their shapes must match the size of the inputs to the neural network, the size of the outputs, and the size of the hidden layer.

xx_size is 21 because we prepended the value 1 to our feature set as a place to store the bias.

We pick 20 for hh_size, the number of nodes in our hidden layer. (The number of hidden nodes is basically a guess, i.e., a value that can be changed to home in on better results.)

hh_size is the size of the hidden layer. “A hidden layer transforms a single-layer perceptron into a multi-layer perceptron.”

yy_size is the size of our output.

xx_size = ttrain_X.shape[1]
hh_size = 20
yy_size = ttrain_y.shape[1]

Remember that XX and yy are placeholders and do not yet have values. The tensors must have matching sizes along one axis to allow matrix multiplication. You can check that by looking at each:

XX
<tf.Tensor 'Placeholder:0' shape=(?, 21) dtype=float32>
yy
<tf.Tensor 'Placeholder_1:0' shape=(?, 6) dtype=float32>
ww_1
<tf.Variable 'Variable:0' shape=(21, 20) dtype=float32_ref>
ww_2
<tf.Variable 'Variable_1:0' shape=(20, 6) dtype=float32_ref>
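
A quick NumPy shape check with random stand-in data shows why those dimensions line up: the inner dimensions match at each multiplication, and the batch size flows through unchanged.

import numpy as np

batch = np.random.rand(5, 21)        # 5 hypothetical rows shaped like XX
w1 = np.random.rand(21, 20)          # shaped like ww_1
w2 = np.random.rand(20, 6)           # shaped like ww_2

hidden = np.dot(batch, w1)           # (5, 21) x (21, 20) -> (5, 20)
output = np.dot(hidden, w2)          # (5, 20) x (20, 6)  -> (5, 6)

print(hidden.shape, output.shape)    # (5, 20) (5, 6)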

Inspect yyhat and ppredict and you can see that they are not arrays at all but operations in the TensorFlow graph that have not yet been evaluated.

Recall that tensors do not have values until we initialize the variables and run the session:

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

Now, we use the feed_dict argument to inject values into the placeholders. We loop an arbitrary number of times (epochs) and keep track of how often the predicted values equal the actual values to calculate the accuracy.

for epoch in range(100):
    for i in range(len(ttrain_X)):
        sess.run(updates, feed_dict={XX: ttrain_X[i: i + 1], yy: ttrain_y[i: i + 1]})
    train_accuracy = np.mean(np.argmax(ttrain_y, axis=1) ==
                                 sess.run(ppredict, feed_dict={XX: ttrain_X, yy: ttrain_y}))
    test_accuracy  = np.mean(np.argmax(ttest_y, axis=1) ==
                                 sess.run(ppredict, feed_dict={XX: ttest_X, yy:  ttest_y}))
    print("Epoch = %d, train accuracy = %.2f%%, test accuracy = %.2f%%"
              % (epoch + 1, 100. * train_accuracy, 100. * test_accuracy))
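
The accuracy calculation itself is plain NumPy: np.argmax recovers the predicted and actual score indices, and the mean of the element-wise comparison is the fraction of matches. A tiny made-up example:

import numpy as np

actual    = np.array([[0, 0, 0, 0, 0, 1],    # true score 5
                      [0, 0, 0, 1, 0, 0],    # true score 3
                      [0, 0, 1, 0, 0, 0]])   # true score 2
predicted = np.array([5, 3, 4])              # hypothetical output of ppredict

accuracy = np.mean(np.argmax(actual, axis=1) == predicted)
print(accuracy)   # 0.666..., i.e., 2 of 3 correct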

Results

Here are our results. They are not good. In the paper cited at the top, the authors achieved only 74% accuracy. Building a neural network to do handwriting recognition can get close to 100% accuracy, because that is an easier problem to model. Here there are many other factors to consider.

...
Epoch = 90, train accuracy = 45.10%, test accuracy = 44.91%
Epoch = 91, train accuracy = 45.10%, test accuracy = 44.91%
Epoch = 92, train accuracy = 45.10%, test accuracy = 44.91%
Epoch = 93, train accuracy = 45.10%, test accuracy = 44.91%
Epoch = 94, train accuracy = 45.10%, test accuracy = 44.91%
Epoch = 95, train accuracy = 45.10%, test accuracy = 44.91%
Epoch = 96, train accuracy = 45.10%, test accuracy = 44.91%
Epoch = 97, train accuracy = 45.10%, test accuracy = 44.91%
Epoch = 98, train accuracy = 45.10%, test accuracy = 44.91%
Epoch = 99, train accuracy = 45.10%, test accuracy = 44.91%
Epoch = 100, train accuracy = 45.10%, test accuracy = 44.91%

Improving our Accuracy

So how do we improve our model? If we look at resources like Slav Ivanov’s “37 Reasons why your Neural Network is not working” or Michael Nielsen’s “Improving the way neural networks learn,” and go back to the paper from which we drew our data, there are lots of things we could change.

For example, we could reject the notion that TripAdvisor score is somehow a function of the number of reviews a person has written, the time of year, etc. But that would not be logical.

Also, we do not have enough training data. And it could be that some of our input variables are correlated, in which case some should be omitted; so we need to drop certain columns and try again. And then there is the possibility that we picked the wrong cost function.
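
One way to check for correlated inputs, sketched here with an illustrative 0.8 threshold, is the pandas corr() method on the numeric feature DataFrame we built above (df, after dropping Score):

import pandas as pd

corr = df.corr()   # pairwise correlations between the numeric feature columns

# Print pairs of columns whose absolute correlation exceeds the threshold;
# one column from each such pair is a candidate to drop before retraining
threshold = 0.8
for a in corr.columns:
    for b in corr.columns:
        if a < b and abs(corr.loc[a, b]) > threshold:
            print(a, b, corr.loc[a, b])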

We will adjust all of this and report back to you. But the point is that the data scientist tunes a model over and over until it yields satisfactory results.

The Complete Code

Here is the complete code in one place to make copying it easier.

import tensorflow as tf
import pandas as pd 
import numpy as np
from sklearn.model_selection import train_test_split

RANDOM_SEED = 42
tf.set_random_seed(RANDOM_SEED)

def yesNo(x):
    if x=="YES":
        return 1
    else:
        return 0
 
cols = ['User country', 'Nr. reviews','Nr. hotel reviews','Helpful votes','Score','Period of stay','Traveler type','Pool','Gym','Tennis court','Spa','Casino','Free internet','Hotel name','Hotel stars','Nr. rooms','User continent','Member years','Review month','Review weekday']

df = pd.read_csv('LasVegasTripAdvisorReviews-Dataset.csv',sep=';',header=0)

df['Casino']=df['Casino'].apply(lambda x : yesNo(x))
df['Gym']=df['Gym'].apply(lambda x : yesNo(x))
df['Pool']=df['Pool'].apply(lambda x : yesNo(x))
df['Tennis court']=df['Tennis court'].apply(lambda x : yesNo(x))
df['Free internet']=df['Free internet'].apply(lambda x : yesNo(x))
df['Spa']=df['Spa'].apply(lambda x : yesNo(x))


# Convert a string to an integer by summing the character codes of its letters
def toOrd(s):
    x = 0
    for l in s:
        x += ord(l)
    return int(x)

cols2 = ['Period of stay', 'Hotel name', 'User country', 'Traveler type', 'User continent', 'Review month', 'Review weekday']

for y in cols2:
    df[y]=df[y].apply(lambda x: toOrd(x))

def init_weights(shape):
    weights = tf.random_normal(shape, stddev=0.1)
    return tf.Variable(weights)

ttarget = df['Score'].values
df = df.drop('Score',axis=1)
ddata=pd.DataFrame.as_matrix(df,cols).astype(int)


# Prepend the column of 1s for bias

NN, MM  = ddata.shape
aall_X = np.ones((NN, MM + 1))
aall_X[:, 1:] = ddata

nnum_labels = len(np.unique(ttarget)) + 1
aall_Y = np.eye(nnum_labels)[ttarget] 

ttrain_X, ttest_X, ttrain_y, ttest_y = train_test_split(aall_X, aall_Y, test_size=0.33, random_state=RANDOM_SEED)

# Layer's sizes
    
xx_size = ttrain_X.shape[1]   # Number of input nodes (features plus the bias column)
hh_size = 20                  # Number of hidden nodes
yy_size = ttrain_y.shape[1]   # Number of output nodes (one-hot TripAdvisor score labels)

# Symbols
    
XX = tf.placeholder("float", shape=[None, xx_size])
yy = tf.placeholder("float", shape=[None, yy_size])

# Weight initializations
ww_1 = init_weights((xx_size, hh_size))
ww_2 = init_weights((hh_size, yy_size))

def forwardprop(X, w_1, w_2):
    """
    Forward-propagation.
    IMPORTANT: yhat is not softmax since TensorFlow's softmax_cross_entropy_with_logits() does that internally.
    """
    h    = tf.nn.sigmoid(tf.matmul(X, w_1))  # The \sigma function
    yhat = tf.matmul(h, w_2)  # The \varphi function
    return yhat

# Forward propagation
yyhat    = forwardprop(XX, ww_1, ww_2)
ppredict = tf.argmax(yyhat, axis=1)

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=yy, logits=yyhat))
updates = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

for epoch in range(100):
    for i in range(len(ttrain_X)):
        sess.run(updates, feed_dict={XX: ttrain_X[i: i + 1], yy: ttrain_y[i: i + 1]})
    train_accuracy = np.mean(np.argmax(ttrain_y, axis=1) ==
                                 sess.run(ppredict, feed_dict={XX: ttrain_X, yy: ttrain_y}))
    test_accuracy  = np.mean(np.argmax(ttest_y, axis=1) ==
                                 sess.run(ppredict, feed_dict={XX: ttest_X, yy:  ttest_y}))
    print("Epoch = %d, train accuracy = %.2f%%, test accuracy = %.2f%%"
              % (epoch + 1, 100. * train_accuracy, 100. * test_accuracy))




Walker Rowe

Walker Rowe is an American freelance tech writer and programmer living in Chile. He specializes in big data, analytics, and cloud architecture. Find him on LinkedIn or at Southern Pacific Review, where he publishes short stories, poems, and news.