Here, we continue with “Using Tensorflow to Create Neural Network with TripAdvisor Data.”

It’ll be helpful to look back at the first part of this article as we review the code sections below.

**len(np.unique(ttarget))** is the number of distinct target TripAdvisor Scores: 1, 2, 3, 4, 5. We add 1 to that value to give **nnum_labels**. This is because array indexing starts at 0, so we need positions 0 through 5 for the score 5 to fit.

We then index the identity matrix **np.eye(nnum_labels)** with **ttarget** to create a **one-hot vector** for each score. A one-hot vector is a way to represent the Scores 1, 2, 3, 4, 5 in an array where each element is 0 except the one at the position given by the score. So, a 1 in the 3rd position indicates a 2 (we start counting at 0).

```
aall_Y
array([[ 0., 0., 0., 0., 0., 1.],
       [ 0., 0., 0., 1., 0., 0.],
       [ 0., 0., 0., 0., 0., 1.],
       ...,
       [ 0., 0., 0., 0., 1., 0.],
       [ 0., 0., 1., 0., 0., 0.],
       [ 0., 0., 0., 0., 1., 0.]])
```
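To make the construction concrete, here is a minimal sketch using a hypothetical handful of scores standing in for the real ttarget:

```python
import numpy as np

# Hypothetical scores covering every value 1-5 (the real ttarget comes from the dataset)
scores = np.array([5, 3, 5, 1, 4, 2, 4])

num_labels = len(np.unique(scores)) + 1   # 5 distinct scores + 1 = 6
one_hot = np.eye(num_labels)[scores]      # row i is all zeros except at index scores[i]

print(one_hot[0])  # score 5 -> [0. 0. 0. 0. 0. 1.]
```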

Now we use the scikit-learn function **train_test_split** to split the features (X) and labels (y) into four data sets: ttrain_X, ttest_X, ttrain_y, and ttest_y.

`ttrain_X, ttest_X, ttrain_y, ttest_y = train_test_split(aall_X, aall_Y, test_size=0.33, random_state=RANDOM_SEED)`
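Under the hood, a split like this just shuffles the rows and slices them. Here is a rough numpy-only sketch of the idea, using made-up data rather than the real feature matrix (sklearn's actual implementation differs in details such as rounding):

```python
import numpy as np

rng = np.random.RandomState(42)      # fixed seed, like RANDOM_SEED
X = np.arange(30).reshape(10, 3)     # 10 made-up samples, 3 features each
Y = np.arange(10)

idx = rng.permutation(len(X))        # shuffle the row indices
n_test = int(round(0.33 * len(X)))   # hold out roughly a third
test_idx, train_idx = idx[:n_test], idx[n_test:]

train_X, test_X = X[train_idx], X[test_idx]
train_y, test_y = Y[train_idx], Y[test_idx]
```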

In the code below, we use the **softmax** function to normalize our output values: it turns raw scores of any size into values between 0 and 1 that sum to 1, so we can treat them as probabilities.
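The normalization is easy to see in plain numpy. This is a standalone sketch of the softmax formula, not TensorFlow's implementation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
# the outputs are all positive and sum to 1, so they behave like probabilities
```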

Remember that initially our weights are a guess. The neural network iteratively tries new weights until it finds the set that minimizes the cost function. We must initialize the weights from some statistical distribution, so we use a **normal distribution** with standard deviation **σ = 0.1**.

`weights = tf.random_normal(shape, stddev=0.1)`

```python
def init_weights(shape):
    weights = tf.random_normal(shape, stddev=0.1)
    return tf.Variable(weights)
```

Now, we define the **forwardprop** function. Remember that we are dealing with a multilayer perceptron, a **feed-forward** neural network.

**forwardprop** applies an **activation** function. Below we use **tf.nn.sigmoid**, which is one of the many nonlinear activation functions. See below for clarification.

```python
def forwardprop(X, w_1, w_2):
    h = tf.nn.sigmoid(tf.matmul(X, w_1))
    yhat = tf.matmul(h, w_2)
    return yhat
```

**tf.matmul(X, w_1)** multiplies the matrices X and w_1.

**ŷ** (pronounced "y-hat") is given by **ŷ = wx + b**, where w is the weights, x is the input matrix, and b is the bias. In other words, it is the predicted value of a perceptron, i.e., the output it passes forward.

To understand what all of this means, let’s look at the diagram below from **Towards Data Science**. The x’s are inputs. The w’s are weights and bias is bias. ∑ means to calculate the sum x1w1 + x2w2 + … + xmwm. Then we add the bias. The green perceptron then uses, in this case, the simplest possible activation function: return 1 if that sum is positive and 0 otherwise. ŷ is then the predicted value for the neural network.
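The diagram’s arithmetic can be written out directly. This toy sketch, with invented weights, uses the same step activation as the diagram:

```python
import numpy as np

def perceptron(x, w, b):
    total = np.dot(x, w) + b      # sum of x_i * w_i, plus the bias
    return 1 if total > 0 else 0  # simplest activation: the step function

# numbers invented purely for illustration
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.25, 0.1])
b = -0.1
yhat = perceptron(x, w, b)  # 0.5 - 0.5 + 0.3 - 0.1 = 0.2 > 0, so yhat = 1
```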

**XX** is the Tensor (i.e., matrix) of inputs, **ww_1** and **ww_2** are the weights, and **yyhat** is the prediction made by the network. **tf.argmax** returns the index with the largest value.

```python
XX = tf.placeholder("float", shape=[None, xx_size])
yy = tf.placeholder("float", shape=[None, yy_size])
ww_1 = init_weights((xx_size, hh_size))
ww_2 = init_weights((hh_size, yy_size))
```

The next function is the **cost** function. Remember that the goal with any kind of regression analysis is to minimize the cost function. In the case of simple linear regression, it is the mean of the squared differences between the predicted and actual values, or MSE (mean squared error). So, we follow that convention and use **tf.reduce_mean**.

In **tf.nn.softmax_cross_entropy_with_logits**, **logits** means the raw, unnormalized outputs of the network. The term comes from logistic regression, where the logit is the **log odds**. Softmax, as we mentioned above, normalizes those values into probabilities that sum to 1.
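What that function computes can be sketched in plain numpy. This is a simplified single-example version for illustration, not TensorFlow's actual implementation:

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    e = np.exp(logits - logits.max())   # softmax of the raw logits
    p = e / e.sum()
    return -np.sum(labels * np.log(p))  # cross entropy against one-hot labels

logits = np.array([2.0, 0.5, 0.1])
labels = np.array([1.0, 0.0, 0.0])      # the true class is index 0
loss_good = softmax_cross_entropy(logits, labels)
loss_bad = softmax_cross_entropy(logits, np.array([0.0, 0.0, 1.0]))
# the loss is smaller when the high logit lines up with the true label
```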

**Cross entropy** measures how far the predicted probability distribution is from the actual one, and **tf.train.GradientDescentOptimizer** is the algorithm we use to iteratively minimize that cost.

```python
yyhat = forwardprop(XX, ww_1, ww_2)
ppredict = tf.argmax(yyhat, axis=1)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=yy, logits=yyhat))
updates = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
```

Now, we create Tensors to hold our neural network calculations. Their shapes must match the sizes of the network’s inputs, hidden layer, and outputs.

**xx_size** is 21 because we stuck the value 1 onto the front of our features set as a place to store our bias.

We pick 20 for **hh_size**, the number of nodes in our hidden layer. (The number of hidden nodes is basically a guess, i.e., a value that can be changed to home in on better results.)

**hh_size** is the size of the hidden layer. “A hidden layer transforms a single-layer perceptron into a multi-layer perceptron.”

**yy_size** is the size of our output.

```python
xx_size = ttrain_X.shape[1]
hh_size = 20
yy_size = ttrain_y.shape[1]
```

Remember these are **placeholders** and do not yet have a value. They must have the same size along one axis to allow matrix multiplication. You can check that by looking at each:

```
XX
<tf.Tensor 'Placeholder:0' shape=(?, 21) dtype=float32>

yy
<tf.Tensor 'Placeholder_1:0' shape=(?, 6) dtype=float32>

ww_1
<tf.Variable 'Variable:0' shape=(21, 20) dtype=float32_ref>

ww_2
<tf.Variable 'Variable_1:0' shape=(20, 6) dtype=float32_ref>
```
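The shape bookkeeping can be checked with plain numpy, using dummy arrays of the same shapes as the placeholders and weight matrices:

```python
import numpy as np

X = np.ones((1, 21))    # one input row: 20 features plus the bias column
w1 = np.ones((21, 20))  # input layer -> hidden layer
w2 = np.ones((20, 6))   # hidden layer -> output layer

h = X @ w1      # (1, 21) @ (21, 20) -> (1, 20)
yhat = h @ w2   # (1, 20) @ (20, 6)  -> (1, 6)
```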

Check and you can observe that **yyhat** and **ppredict** are not arrays at all but TensorFlow operations, which hold no values until the session runs.

Recall that Tensors do not have values until we **init**ialize the variables and **run** the session:

```python
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
```

Now, we use the **feed_dict** argument to inject values into the placeholders. We loop an arbitrary number of times (epochs) and keep track of how often the predicted values equal the actual values to calculate the accuracy.

```python
for epoch in range(100):
    for i in range(len(ttrain_X)):
        sess.run(updates, feed_dict={XX: ttrain_X[i: i + 1],
                                     yy: ttrain_y[i: i + 1]})
    train_accuracy = np.mean(np.argmax(ttrain_y, axis=1) ==
                             sess.run(ppredict, feed_dict={XX: ttrain_X, yy: ttrain_y}))
    test_accuracy = np.mean(np.argmax(ttest_y, axis=1) ==
                            sess.run(ppredict, feed_dict={XX: ttest_X, yy: ttest_y}))
    print("Epoch = %d, train accuracy = %.2f%%, test accuracy = %.2f%%"
          % (epoch + 1, 100. * train_accuracy, 100. * test_accuracy))
```
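The accuracy arithmetic inside the loop works like this, shown here with toy labels rather than the real data:

```python
import numpy as np

actual = np.array([[0, 1, 0],
                   [0, 0, 1],
                   [1, 0, 0]])   # three one-hot labels: classes 1, 2, 0
predicted = np.array([1, 2, 2])  # class indices such as ppredict might return

# argmax recovers the class index from each one-hot row; 2 of 3 match
accuracy = np.mean(np.argmax(actual, axis=1) == predicted)
```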

## Results

Here are our results. They are not good. In the paper cited at the top, the authors only achieved 74% accuracy. A neural network built for handwriting recognition can get close to 100% accuracy because that is a better-defined problem; here there are many other factors to consider.

```
...
Epoch = 90, train accuracy = 45.10%, test accuracy = 44.91%
Epoch = 91, train accuracy = 45.10%, test accuracy = 44.91%
Epoch = 92, train accuracy = 45.10%, test accuracy = 44.91%
Epoch = 93, train accuracy = 45.10%, test accuracy = 44.91%
Epoch = 94, train accuracy = 45.10%, test accuracy = 44.91%
Epoch = 95, train accuracy = 45.10%, test accuracy = 44.91%
Epoch = 96, train accuracy = 45.10%, test accuracy = 44.91%
Epoch = 97, train accuracy = 45.10%, test accuracy = 44.91%
Epoch = 98, train accuracy = 45.10%, test accuracy = 44.91%
Epoch = 99, train accuracy = 45.10%, test accuracy = 44.91%
Epoch = 100, train accuracy = 45.10%, test accuracy = 44.91%
```

## Improving our Accuracy

So how do we improve our model? If we look at resources like Slav Ivanov’s “37 Reasons why your Neural Network is not working” or “Improving the way neural networks learn” from Michael Nielsen, and go back to the paper from which we drew our data, there are lots of things we could change.

For example, we could reject the notion that TripAdvisor score is somehow a function of the number of reviews a person has written, the time of year, and so on. But abandoning the model altogether would not be logical.

Also, we may not have enough training data. And it could be that there is a correlation between some of our input variables that we have not accounted for, so we need to drop certain columns and try again. And then there is the possibility that we picked the wrong cost function.

We will adjust all of this and report back to you. But the point is that the data scientist tunes a model over and over until it yields satisfactory results.

## The Complete Code

Here is the complete code in one place to make copying it easier.

```python
import tensorflow as tf
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

RANDOM_SEED = 42
tf.set_random_seed(RANDOM_SEED)

def yesNo(x):
    if x == "YES":
        return 1
    else:
        return 0

cols = ['User country', 'Nr. reviews', 'Nr. hotel reviews', 'Helpful votes',
        'Score', 'Period of stay', 'Traveler type', 'Pool', 'Gym',
        'Tennis court', 'Spa', 'Casino', 'Free internet', 'Hotel name',
        'Hotel stars', 'Nr. rooms', 'User continent', 'Member years',
        'Review month', 'Review weekday']

df = pd.read_csv('LasVegasTripAdvisorReviews-Dataset.csv', sep=';', header=0)

# Convert the YES/NO amenity columns to 1/0
df['Casino'] = df['Casino'].apply(lambda x: yesNo(x))
df['Gym'] = df['Gym'].apply(lambda x: yesNo(x))
df['Pool'] = df['Pool'].apply(lambda x: yesNo(x))
df['Tennis court'] = df['Tennis court'].apply(lambda x: yesNo(x))
df['Free internet'] = df['Free internet'].apply(lambda x: yesNo(x))
df['Spa'] = df['Spa'].apply(lambda x: yesNo(x))

# Turn a string into an integer by summing its character codes
def toOrd(str):
    x = 0
    for l in str:
        x += ord(l)
    return int(x)

cols2 = ['Period of stay', 'Hotel name', 'User country', 'Traveler type',
         'User continent', 'Review month', 'Review weekday']
for y in cols2:
    df[y] = df[y].apply(lambda x: toOrd(x))

def init_weights(shape):
    weights = tf.random_normal(shape, stddev=0.1)
    return tf.Variable(weights)

ttarget = df['Score'].values
df = df.drop('Score', axis=1)
ddata = pd.DataFrame.as_matrix(df, cols).astype(int)

# Prepend the column of 1s for bias
NN, MM = ddata.shape
aall_X = np.ones((NN, MM + 1))
aall_X[:, 1:] = ddata

nnum_labels = len(np.unique(ttarget)) + 1
aall_Y = np.eye(nnum_labels)[ttarget]

ttrain_X, ttest_X, ttrain_y, ttest_y = train_test_split(
    aall_X, aall_Y, test_size=0.33, random_state=RANDOM_SEED)

# Layer sizes
xx_size = ttrain_X.shape[1]  # Number of input nodes: 20 features plus the bias column
hh_size = 20                 # Number of hidden nodes
yy_size = ttrain_y.shape[1]  # Number of outcomes: 6 one-hot positions for Scores 1-5

# Symbols
XX = tf.placeholder("float", shape=[None, xx_size])
yy = tf.placeholder("float", shape=[None, yy_size])

# Weight initializations
ww_1 = init_weights((xx_size, hh_size))
ww_2 = init_weights((hh_size, yy_size))

def forwardprop(X, w_1, w_2):
    """
    Forward-propagation.
    IMPORTANT: yhat is not softmax since TensorFlow's
    softmax_cross_entropy_with_logits() does that internally.
    """
    h = tf.nn.sigmoid(tf.matmul(X, w_1))  # The \sigma function
    yhat = tf.matmul(h, w_2)              # The \varphi function
    return yhat

# Forward propagation
yyhat = forwardprop(XX, ww_1, ww_2)
ppredict = tf.argmax(yyhat, axis=1)

cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=yy, logits=yyhat))
updates = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

for epoch in range(100):
    for i in range(len(ttrain_X)):
        sess.run(updates, feed_dict={XX: ttrain_X[i: i + 1],
                                     yy: ttrain_y[i: i + 1]})
    train_accuracy = np.mean(np.argmax(ttrain_y, axis=1) ==
                             sess.run(ppredict, feed_dict={XX: ttrain_X, yy: ttrain_y}))
    test_accuracy = np.mean(np.argmax(ttest_y, axis=1) ==
                            sess.run(ppredict, feed_dict={XX: ttest_X, yy: ttest_y}))
    print("Epoch = %d, train accuracy = %.2f%%, test accuracy = %.2f%%"
          % (epoch + 1, 100. * train_accuracy, 100. * test_accuracy))
```

