Using TensorFlow to Create Neural Network with TripAdvisor Data: Part II

In Part One we explained the problem we want to solve: predicting how someone might rate one of the Las Vegas hotels on TripAdvisor, given how other people have rated them. Here we write the code to build a neural network to do that. In this part we create and train the model. In the next blog post we will make predictions.

Prerequisites

• Python 3
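• You need to install TensorFlow in Python 3, i.e., pip3 install --upgrade tensorflow. The code in this post was written against the TensorFlow 1.x API; in TensorFlow 2.x, tf.decode_csv is available as tf.io.decode_csv.
• Download this data. This is the TripAdvisor data converted to integers using this program. It has the column headings below. All of these items are features except Score, which is the label.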
`User country,Nr. reviews,Nr. hotel reviews,Helpful votes,Score,Period of stay,Traveler type,Pool,Gym,Tennis court,Spa,Casino,Free internet,Hotel name,Hotel stars,Nr. rooms,User continent,Member years,Review month,Review weekday`

Below is the code, which you can copy from here. We explain each section.

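Below we put each column name into a list. The header above lists 20 columns, so FIELD_DEFAULTS holds 20 integer defaults, one per column; tf.decode_csv() needs the number of defaults to match the number of columns in the file. We use integers to tell TensorFlow that these values are integers and not floats.

```python
import tensorflow as tf

# One name per CSV column, in file order. Score is the label; the rest are features.
feature_names = ['Usercountry', 'Nrreviews', 'Nrhotelreviews', 'Helpfulvotes', 'Score', 'Periodofstay',
                 'Travelertype', 'Pool', 'Gym', 'Tenniscourt', 'Spa', 'Casino', 'Freeinternet',
                 'Hotelname', 'Hotelstars', 'Nrrooms', 'Usercontinent', 'Memberyears',
                 'Reviewmonth', 'Reviewweekday']
# One default per column; integer defaults make tf.decode_csv() parse each field as an integer.
FIELD_DEFAULTS = [[0], [0], [0], [0], [0],
                  [0], [0], [0], [0], [0],
                  [0], [0], [0], [0], [0],
                  [0], [0], [0], [0], [0]]
```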
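A quick way to double-check the column count is to split the header line listed under Prerequisites. The sketch below uses only the Python standard library; the helper name count_columns is ours for illustration and is not part of TensorFlow.

```python
import csv
import io

# Header line copied from the Prerequisites section.
HEADER = ("User country,Nr. reviews,Nr. hotel reviews,Helpful votes,Score,"
          "Period of stay,Traveler type,Pool,Gym,Tennis court,Spa,Casino,"
          "Free internet,Hotel name,Hotel stars,Nr. rooms,User continent,"
          "Member years,Review month,Review weekday")


def count_columns(header_line):
    """Return the number of comma-separated fields in one CSV line."""
    return len(next(csv.reader(io.StringIO(header_line))))


n_columns = count_columns(HEADER)
# Build one integer default per column, in the same shape FIELD_DEFAULTS uses.
field_defaults = [[0] for _ in range(n_columns)]
print(n_columns, len(field_defaults))
```

Running the same check against the first line of your own copy of the data file would confirm that the file and the defaults agree.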

Next we want to read the data as a .csv file. TensorFlow provides the tf.decode_csv() method to parse one line at a time. We use the dataset map() method to call parse_line for each line in the dataset. This creates a TensorFlow dataset, which is not a normal Python collection. It is designed to work with Tensors. If you do not know what a Tensor is you can review this.

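This routine returns the features as a dictionary and the label as a separate value. Notice that we delete the Score (parsed_line[4]) from the features since Score is not a feature. It is the label. We leave Score out of the key names as well, so that dict(zip()) pairs each remaining name with its own column.

```python
def parse_line(line):
    # Parse one CSV line into a list of scalar tensors, one per column.
    parsed_line = tf.decode_csv(line, FIELD_DEFAULTS)
    # Column 4 is Score, the label; take it out of the feature list.
    label = parsed_line[4]
    del parsed_line[4]
    features = parsed_line
    # Drop 'Score' from the names too, so names and values stay aligned.
    feature_keys = [name for name in feature_names if name != 'Score']
    d = dict(zip(feature_keys, features))
    print("dictionary", d, " label = ", label)
    return d, label
```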
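To see the feature/label split without TensorFlow, here is a minimal plain-Python sketch of the same idea. The function name parse_line_py and the sample row are made up for illustration; the row is not taken from the real data file. Pairing names with values first and then removing Score keeps every name lined up with its own column.

```python
import csv
import io

FEATURE_NAMES = ['Usercountry', 'Nrreviews', 'Nrhotelreviews', 'Helpfulvotes',
                 'Score', 'Periodofstay', 'Travelertype', 'Pool', 'Gym',
                 'Tenniscourt', 'Spa', 'Casino', 'Freeinternet', 'Hotelname',
                 'Hotelstars', 'Nrrooms', 'Usercontinent', 'Memberyears',
                 'Reviewmonth', 'Reviewweekday']
LABEL_NAME = 'Score'


def parse_line_py(line):
    """Split one CSV line of integers into (features dict, label)."""
    values = [int(v) for v in next(csv.reader(io.StringIO(line)))]
    row = dict(zip(FEATURE_NAMES, values))
    # Remove the label by name so the remaining keys are features only.
    label = row.pop(LABEL_NAME)
    return row, label


# Hypothetical row of 20 integers, in the column order above.
sample = "3,11,4,13,5,2,1,0,1,0,0,1,1,7,3,3773,4,9,1,2"
features, label = parse_line_py(sample)
print(label, features["Periodofstay"], len(features))
```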

TensorFlow provides the tf.data.TextLineDataset() method to read a .csv file into a TensorFlow dataset. tf.estimator.DNNClassifier.train() requires that we pass it an input function, in this case csv_input_fn(), which returns a dataset of features and labels. We use dataset.shuffle() so that examples reach the network in random order rather than file order. repeat() lets the dataset be read again when it runs out, and batch() groups the examples into batches of batch_size.

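```python
def csv_input_fn(csv_path, batch_size):
    # Read the file line by line, then parse each line into (features, label).
    dataset = tf.data.TextLineDataset(csv_path)
    dataset = dataset.map(parse_line)
    # Shuffle with a 1000-element buffer, repeat indefinitely, and batch.
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)
    return dataset
```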
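The effect of shuffle(), repeat(), and batch() can be approximated with plain Python generators. This is only a rough sketch of the semantics, not how TensorFlow implements them; the helper names shuffle_buffer, repeat_forever, and batch are ours.

```python
import itertools
import random


def shuffle_buffer(items, buffer_size, rng):
    """Yield items in random order using a bounded buffer."""
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) > buffer_size:
            # Swap a random element to the end and emit it.
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Input exhausted: emit whatever is left, in random order.
    rng.shuffle(buffer)
    yield from buffer


def repeat_forever(make_iter):
    """Restart the underlying iterator every time it runs out."""
    while True:
        yield from make_iter()


def batch(items, batch_size):
    """Group a stream into lists of batch_size elements."""
    it = iter(items)
    while True:
        chunk = list(itertools.islice(it, batch_size))
        if not chunk:
            return
        yield chunk


rows = list(range(10))  # stand-in for 10 parsed CSV rows
rng = random.Random(0)
stream = batch(repeat_forever(lambda: shuffle_buffer(rows, 1000, rng)), 4)
first_three = list(itertools.islice(stream, 3))
print(first_three)
```

Because repeat() comes before batch(), a batch can straddle the end of one pass over the data and the start of the next, which is why the third batch here is still full.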

We have to create a feature column for each column in the dataset. We have both categorical data (e.g., 0 and 1) and numeric data (e.g., number of reviews).

Categorical data we encode as in the example below, where the 47 means there are 47 categories. In other words, our sample data comes from people from 47 different countries. With the data loaded into a pandas DataFrame df, we can use df['User continent'].groupby(df['User continent']).count(), for example, to count the unique elements.

`tf.feature_column.indicator_column(tf.feature_column.categorical_column_with_identity("Usercountry",47))`
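Roughly speaking, categorical_column_with_identity() treats each integer as its own category id in the range 0 to num_buckets - 1, and indicator_column() turns that id into a one-hot vector. The plain-Python sketch below illustrates the idea; the helper names and the sample ids are made up, and it assumes the ids start at 0.

```python
def num_buckets_for(ids):
    """Smallest bucket count that covers every integer id (ids start at 0)."""
    return max(ids) + 1


def one_hot(category_id, num_buckets):
    """Return a vector with 1.0 at category_id and 0.0 elsewhere."""
    if not 0 <= category_id < num_buckets:
        raise ValueError("id %d is outside [0, %d)" % (category_id, num_buckets))
    vec = [0.0] * num_buckets
    vec[category_id] = 1.0
    return vec


# Hypothetical continent ids, not taken from the real data file.
continent_ids = [0, 3, 5, 3, 1, 2, 4]
buckets = num_buckets_for(continent_ids)
encoded = one_hot(3, buckets)
print(buckets, encoded)
```

Note that the bucket count has to cover the largest id, which equals the number of unique values only when the ids are contiguous from 0.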

Numeric data we encode with, for example:

`Nrreviews = tf.feature_column.numeric_column("Nrreviews")`

Here is that full section. Notice in the last line we collect the feature columns into the list feature_columns.

```python
Usercountry = tf.feature_column.indicator_column(tf.feature_column.categorical_column_with_identity("Usercountry",47))

Nrreviews = tf.feature_column.numeric_column("Nrreviews")

Nrhotelreviews = tf.feature_column.numeric_column("Nrhotelreviews")

Helpfulvotes = tf.feature_column.numeric_column("Helpfulvotes")

Periodofstay = tf.feature_column.numeric_column("Periodofstay")

Travelertype = tf.feature_column.indicator_column(tf.feature_column.categorical_column_with_identity("Travelertype",5))

Pool = tf.feature_column.indicator_column(tf.feature_column.categorical_column_with_identity("Pool",2))

Gym = tf.feature_column.indicator_column(tf.feature_column.categorical_column_with_identity("Gym",2))

Tenniscourt = tf.feature_column.indicator_column(tf.feature_column.categorical_column_with_identity("Tenniscourt",2))

Spa = tf.feature_column.indicator_column(tf.feature_column.categorical_column_with_identity("Spa",2))

Casino = tf.feature_column.indicator_column(tf.feature_column.categorical_column_with_identity("Casino",2))

Freeinternet = tf.feature_column.indicator_column(tf.feature_column.categorical_column_with_identity("Freeinternet",2))

Hotelname = tf.feature_column.indicator_column(tf.feature_column.categorical_column_with_identity("Hotelname",24))

Hotelstars = tf.feature_column.indicator_column(tf.feature_column.categorical_column_with_identity("Hotelstars",5))

Nrrooms = tf.feature_column.numeric_column("Nrrooms")

Usercontinent = tf.feature_column.indicator_column(tf.feature_column.categorical_column_with_identity("Usercontinent",6))

Memberyears = tf.feature_column.numeric_column("Memberyears")

Reviewmonth = tf.feature_column.indicator_column(tf.feature_column.categorical_column_with_identity("Reviewmonth",12))

Reviewweekday = tf.feature_column.indicator_column(tf.feature_column.categorical_column_with_identity("Reviewweekday",7))

# Every feature column defined above, including Reviewmonth.
feature_columns = [Usercountry, Nrreviews, Nrhotelreviews, Helpfulvotes, Periodofstay,
                   Travelertype, Pool, Gym, Tenniscourt, Spa, Casino, Freeinternet,
                   Hotelname, Hotelstars, Nrrooms, Usercontinent, Memberyears,
                   Reviewmonth, Reviewweekday]
```

Here is the tf.estimator.DNNClassifier, where DNN means Deep Neural Network. We give it the feature columns and the directory where it should store the model. We also say there are 5 classes since hotel scores range from 1 to 5. Note that DNNClassifier expects integer labels in the range 0 to n_classes - 1, so scores of 1 to 5 need to be stored as 0 to 4 in the data file (or n_classes raised to 6). For hidden units we pick [10, 10]. This means the first hidden layer of the neural network has 10 nodes and the second has 10. You can read more about how to pick that number by reading, for example, this StackOverflow article. I do not yet know if this is the correct value. We will see when we make predictions in the next post.

```python
classifier = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[10, 10],
    n_classes=5,
    model_dir="/tmp")

batch_size = 100
```
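To get a feel for the size of this network, the sketch below counts the weights and biases of a fully connected network with two hidden layers of 10 nodes and 5 outputs. It assumes every feature column defined above, including Reviewmonth, is passed to the classifier, and that each indicator column contributes one input per category. The helper name dense_param_count is ours, and the count ignores anything the optimizer stores.

```python
# Width of each one-hot (indicator) column, taken from the definitions above.
ONE_HOT_WIDTHS = {
    "Usercountry": 47, "Travelertype": 5, "Pool": 2, "Gym": 2,
    "Tenniscourt": 2, "Spa": 2, "Casino": 2, "Freeinternet": 2,
    "Hotelname": 24, "Hotelstars": 5, "Usercontinent": 6,
    "Reviewmonth": 12, "Reviewweekday": 7,
}
# Each numeric column contributes a single input.
NUMERIC_COLUMNS = ["Nrreviews", "Nrhotelreviews", "Helpfulvotes",
                   "Periodofstay", "Nrrooms", "Memberyears"]


def dense_param_count(layer_sizes):
    """Weights plus biases for consecutive fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


input_width = sum(ONE_HOT_WIDTHS.values()) + len(NUMERIC_COLUMNS)
layer_sizes = [input_width, 10, 10, 5]  # inputs, two hidden layers, 5 classes
total_params = dense_param_count(layer_sizes)
print(input_width, total_params)
```

Under these assumptions the network has 124 inputs and 1,415 trainable parameters, most of them in the first layer.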

Finally we call the train() method and give it an inline (lambda) function that calls csv_input_fn with the path of the csv file and the batch size.

```python
classifier.train(
    steps=1000,
    input_fn=lambda: csv_input_fn("/home/walker/tripAdvisorFL.csv", batch_size))
```
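Since the dataset repeats indefinitely, steps is what ends training: each step consumes one batch. The small sketch below works out how many examples that is and how many passes over the data it amounts to. The row count of 500 is a hypothetical placeholder, not the size of the real file, and the function names are ours.

```python
def examples_seen(steps, batch_size):
    """Total examples drawn from the dataset during training."""
    return steps * batch_size


def passes_over_data(steps, batch_size, n_rows):
    """How many times the whole file is read, on average."""
    return examples_seen(steps, batch_size) / n_rows


hypothetical_rows = 500  # placeholder; substitute the row count of your file
total = examples_seen(1000, 100)
passes = passes_over_data(1000, 100, hypothetical_rows)
print(total, passes)
```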

In the next blog post we will show how to make predictions from this model, meaning we estimate how a customer might rate a hotel given their characteristics. The hotel could then decide how much effort to expend to make this customer happy, or whether to expend any effort at all.

Last updated: 12/29/2017

These postings are my own and do not necessarily represent BMC's position, strategies, or opinion.

### About the author

#### Walker Rowe

Walker Rowe is a freelance tech writer and programmer. He specializes in big data, analytics, and programming languages. Find him on LinkedIn or Upwork.