TensorFlow vs PyTorch: Choosing Your ML Framework

Among the most used machine learning (ML) frameworks, two are quite popular:

  • PyTorch is Facebook’s ML package
  • TensorFlow is from Google

Both allow you to build machine learning models, both have easy out-of-the-box models, and both are highly customizable. Whether you are new to the field or an expert, these libraries can satisfy all your needs—from testing to deployment. Let’s take a look at the differences between them.

Note: For those looking to choose a programming language, both libraries are written in Python. If machine learning is the direction you intend to go, learning Python is a common denominator. TensorFlow also has TensorFlow.js, so it can be used with JavaScript as well.

(This tutorial is part of our Guide to Machine Learning with TensorFlow & Keras. Use the right-hand menu to navigate.)

Getting started

TensorFlow is older and more widespread. Initially, it had a lot more community support and tutorials out on the web, but PyTorch has caught up. There is now extensive documentation for both, so learning one will not be any easier than the other. The key is to just get started.

With modelling, too, you rarely need to start from scratch. For both frameworks you can find pre-built models as well as models with pre-trained weights.

Huggingface, too, is a great ML community that uses both PyTorch and TensorFlow models interchangeably (they literally have a tool to convert models and weights between the two), and their new Hugging Face course can walk you through how to get started with Machine Learning.

Additional resources include:

Code style

The code for each looks different. PyTorch is more pythonic and uses an object-oriented programming style. TensorFlow has made things easy for coders, but, in doing so, it has taken away some of the ordinary coding style developers are used to.

Tensors

Both frameworks use:

  • Tensors to pass data through to their models
  • Graphs to define their models

TensorFlow has a statically defined graph that gets exposed to the user through commands such as tf.Session and tf.placeholder. PyTorch has dynamically defined graphs, which are useful for some more complex models. TensorFlow Fold adds similar dynamism to TensorFlow.
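The contrast is easiest to see in code. Below is a minimal sketch, assuming TensorFlow 2.x with its v1 compatibility layer and PyTorch are both installed; the tensors and values are made up purely for illustration:

import tensorflow.compat.v1 as tf1
import torch

tf1.disable_eager_execution()

# TensorFlow 1.x style: define a static graph with placeholders first,
# then feed values through it inside a session.
x = tf1.placeholder(tf1.float32, shape=(None, 2))
doubled = x * 2
with tf1.Session() as sess:
    print(sess.run(doubled, feed_dict={x: [[1.0, 2.0]]}))

# PyTorch style: the graph is built as ordinary Python runs (define-by-run),
# so you just operate on tensors directly.
t = torch.tensor([[1.0, 2.0]], requires_grad=True)
print((t * 2).sum())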

Comparing TensorFlow vs PyTorch

Now, let’s turn to a comparison of both frameworks.

Performance tests

At the beginning of 2020, OpenAI standardized on PyTorch. They found it easier and quicker to test new ideas.

PyTorch reports slower training times. Given that training some of the models these companies use can take thousands of hours of compute, training time is directly associated with cost. PyTorch also tends to take more memory during training.

Accuracy

Both model frameworks can get the same accuracy. (A Comparison of Two Popular Machine Learning Frameworks)

Models Grouped by Framework

This comparison is a little dated (2017), but, from what is reported, its results appear to still hold.

Distributed training

Distributed training is getting closer to a must for large models. The big idea is to train a model on a Kubernetes cluster. On PyTorch, implementing a distributed training model is easy:

import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel

# Join the process group (rank, world size, and master address are normally
# supplied through environment variables by a launcher such as torchrun).
dist.init_process_group("nccl")

# Wrap an existing model so gradients are synchronized across processes.
model = DistributedDataParallel(model)

On TensorFlow, you’ll have to do more of that setup yourself, for example by configuring one of the tf.distribute strategies.
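For comparison, here is a minimal sketch of the built-in route TensorFlow offers, tf.distribute.MirroredStrategy, which replicates a Keras model across the GPUs on one machine; the layer sizes here are arbitrary illustration values:

import tensorflow as tf

# Replicate the model across all GPUs visible on this machine.
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Any model built inside the scope is mirrored and its gradients
    # are aggregated across replicas automatically.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(8, activation="relu", input_shape=(14,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(loss="binary_crossentropy", optimizer="sgd")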

Deploying models

Models are great for research and backend purposes, but they need to work out in the world, too. This means the model and its weights need to be packaged and delivered to wherever they will actually be used.

At first, TensorFlow was the best framework for this, but as time has passed, both frameworks can be used to deploy models into production and there is plenty of documentation on how to do so.

Now, given that Google has its own cloud platform, TensorFlow can be integrated with Google’s services pretty easily if you are using Google Cloud, such as saving a TF Lite model to Firestore and delivering it to a mobile application. Though some of the TF tooling integrates well with Google’s services, there are plenty of workarounds to get PyTorch models working, too.

Data visualization

TensorFlow has TensorBoard, which lets you inspect many aspects of the machine learning model, from the model graph to the loss curve, and it offers lots of options for exploring your model.

PyTorch has no native visualization tool. Instead, there exists a graph visualization package called Visdom.

Visdom Graph Package

If needed, there is also a way to display your PyTorch models on TensorBoard.
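For example, recent PyTorch versions ship a SummaryWriter that writes TensorBoard log files directly. A minimal sketch; the log directory and the logged values are placeholders:

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/experiment-1")   # any log directory will do

for step in range(100):
    fake_loss = 1.0 / (step + 1)              # placeholder value for illustration
    writer.add_scalar("train/loss", fake_loss, step)

writer.close()
# Then start TensorBoard with:  tensorboard --logdir runs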

That concludes this comparison of TensorFlow and PyTorch. Browse the Guide to learn more.

Related reading

Using TensorFlow to Create a Neural Network (with Examples)

When people are trying to learn neural networks with TensorFlow, they usually start with the handwriting database. This builds a model that predicts what digit a person has drawn, based upon handwriting samples obtained from thousands of people. To put that into features-labels terms, the combinations of pixels in a grayscale image (white, black, grey) determine what digit is drawn (0, 1, …, 9).

Here we use other data.

(This tutorial is part of our Guide to Machine Learning with TensorFlow & Keras. Use the right-hand menu to navigate.)

Prerequisites

Before reading this TensorFlow Neural Network tutorial, you should first study these three blog posts:

Introduction to TensorFlow and Logistic Regression
What is a Neural Network? Introduction to Neural Networks Part I
Introduction to Neural Networks Part II

Then you need to install TensorFlow. The easiest way to do that on Ubuntu is to follow these instructions and use virtualenv.

Then install the Python Pandas, NumPy, scikit-learn, and SciPy packages.

The Las Vegas Strip Hotel Dataset from Trip Advisor

Programmers who are learning to use TensorFlow often start with the Iris dataset, which predicts the species of an Iris flower from measurements of its petals and sepals. But we want to do something original here instead of using the Iris dataset. So we will use the Las Vegas Strip Data Set, cited in the paper “Moro, S., Rita, P., & Coelho, J. (2017). Stripping customers’ feedback on hotels through data mining: The case of Las Vegas Strip. Tourism Management Perspectives, 23, 41-52.” and see if we can wrap a neural network around it.

In their paper, the authors wrote a model using the R programming language and used Support Vector Machines (SVMs) as their algorithm. That is a type of non-linear regression problem. It uses the same approach as ordinary linear regression, which is to find a line that reduces the MSE (mean squared error) to its lowest point in order to build a predictive model. But SVMs take that up a notch in complexity by working with multiple, nonlinear inputs and finding a plane in n-dimensional space rather than a line on the XY Cartesian plane.

Here we take the same data but use a neural network instead of an SVM. We will present this in 3 blog posts:

  1. Put data into numeric format.
  2. Train neural network.
  3. Make prediction.

The data and code for this tutorial is located here.

The Data

Click here to see the data in Google Sheets format. The data is too wide to fit on one screen, so we show it below in two screen prints. If you read the paper cited above you can get more details about the data, but basically it is TripAdvisor data for 21 hotels along the Las Vegas Strip. The goal is to build a model that will predict what score an individual is likely to give to which hotel. The score is 1 to 5 and the inputs are 20 variables described in the spreadsheet below.

The authors of the paper say that certain data, like whether the hotel has a casino, pool, a certain number of stars, or free internet, does not have much bearing on the score given by the hotel guest on TripAdvisor. Rather, the factors that most heavily predict the score are the number of reviews the reviewer has written and how long they have been writing reviews. Other factors that influence the score are the day of the week and the month of the year.

Convert Values to Integers

You can download the code below from this iPython notebook.

First we need to convert all of those values to integers as machine learning uses arrays of numbers as input. We adopt three approaches:

  1. If the value is already an integer, leave it.
  2. If the value is YES or NO, change it to 1 or 0.
  3. If the element is a string, use the ordinal string function to change each letter to an integer, then sum those integers.
import pandas as pd  

def yesNo(x):
    # Map "YES"/"NO" values to 1/0.
    if x=="YES":
        return 1
    else:
        return 0

def toOrd(str):
    # Sum the ordinal (Unicode) value of each character in the string.
    x=0
    for l in str:
        x += ord(l)
    return int(x) 

cols = ['User country', 'Nr. reviews','Nr. hotel reviews','Helpful votes',
        'Score','Period of stay','Traveler type','Pool','Gym','Tennis court',
        'Spa','Casino','Free internet','Hotel name','Hotel stars','Nr. rooms',
        'User continent','Member years','Review month','Review weekday']

df = pd.read_csv('/home/walker/TripAdvisor.csv',sep=',',header=0)

Here we change every string to an integer. You would have to save the string-integer combination in some data structure so that later you could see which integer equals what string value.
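The original post shows this step as a screenshot; here is a rough sketch of what the conversion loop could look like. The column groupings below are assumptions based on the dataset description, so adjust the lists to match the actual file:

# Columns that hold YES/NO values (an assumption from the dataset description).
yes_no_cols = ['Pool', 'Gym', 'Tennis court', 'Spa', 'Casino', 'Free internet']
for c in yes_no_cols:
    df[c] = df[c].apply(yesNo)

# Columns that hold free-text strings; sum the character ordinals with toOrd().
string_cols = ['User country', 'Period of stay', 'Traveler type', 'Hotel name',
               'User continent', 'Review month', 'Review weekday']
for c in string_cols:
    df[c] = df[c].astype(str).apply(toOrd)

# The remaining columns are already numeric and are left as they are.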

Here is what our data looks like now.

We’ve already explained the problem we want to solve, which is to predict how someone might rate one of the Las Vegas hotels on TripAdvisor given how other people have done that. Next we will write the code to build a neural network to do that. In this part we will create the training model. After that we can use our TensorFlow Neural Network to make predictions.

Prerequisites for building our neural network

  • Python 3
  • You need to install TensorFlow in Python 3, i.e., pip3 install --upgrade tensorflow
  • Download this data. This is the Trip Advisor data converted to integers using this program. It has the column headings below. All of these items are features except Score, which is the label.
User country,Nr. reviews,Nr. hotel reviews,Helpful votes,Score,Period of stay,
Traveler type,Pool,Gym,Tennis court,Spa,Casino,Free internet,Hotel name,
Hotel stars,Nr. rooms,User continent,Member years,Review month,Review weekday

Below is the code, which you can copy from here. We explain each section.

Below we put the feature column names into an array; Score, the label, is not included because the parser handles it separately. The CSV file has 20 columns, so FIELD_DEFAULTS is given as 20 integers, one per column. We use integers to tell TensorFlow that these values are integers and not floats.

import tensorflow as tf

feature_names = ['Usercountry', 'Nrreviews','Nrhotelreviews','Helpfulvotes','Periodofstay',
                 'Travelertype','Pool','Gym','Tenniscourt','Spa','Casino','Freeinternet','Hotelname',
                 'Hotelstars','Nrrooms','Usercontinent','Memberyears','Reviewmonth','Reviewweekday']
FIELD_DEFAULTS = [[0], [0], [0], [0], [0],
                  [0], [0], [0], [0], [0],
                  [0], [0], [0], [0], [0],
                  [0], [0], [0], [0], [0]]

Next we want to read the data as a .csv file. Tensorflow provides the tf.decode_csv() method to read one line at a time. We use the dataset map() method to call parse_line for each line in the dataset. This creates a TensorFlow dataset, which is not a normal Python dataset. It is designed to work with Tensors. If you do not know what a Tensor is you can review this.

This routine returns the features as a dictionary and the label as a separate tensor. Notice that we delete the Score (parsed_line[4]) from the features, since Score is not a feature; it is the label. The dict(zip()) call pairs the feature names with the feature tensors to build the dictionary.

def parse_line(line):
   parsed_line = tf.decode_csv(line, FIELD_DEFAULTS)
   label = parsed_line[4]
   del parsed_line[4]
   features = parsed_line 
   d = dict(zip(feature_names, features))
   print ("dictionary", d, " label = ", label)   
   return d, label

Tensorflow provides the tf.data.TextLineDataset() method to read a .csv file into a TensorFlow dataset. tf.estimator.DNNClassifier.train() requires that we supply a function, in this case csv_input_fn(), which returns a dataset of features and labels. We use dataset.shuffle() to randomize the order of the training examples, which is standard practice when training a neural network.

def csv_input_fn(csv_path, batch_size):
   dataset = tf.data.TextLineDataset(csv_path)
   dataset = dataset.map(parse_line)
   dataset = dataset.shuffle(1000).repeat().batch(batch_size)
   return dataset

We have to create a feature column for each column in the dataset. We have both categorical data (e.g., 0 and 1) and numbers, e.g., number of reviews.

Categorical data we encode with an indicator column wrapped around a categorical identity column, as shown below, where 47 means there are 47 categories. In other words, our sample data comes from people from 47 different countries. We can use df[‘User continent’].groupby(df[‘User continent’]).count(), for example, to count the unique elements.

tf.feature_column.indicator_column(tf.feature_column.categorical_column_with_identity("Usercountry",47))

Numeric data we encode with, for example:

Nrreviews = tf.feature_column.numeric_column("Nrreviews")

Here is that full section. Notice that at the end we create the array of feature columns in feature_columns.

Usercountry = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_identity("Usercountry", 47))
Nrreviews = tf.feature_column.numeric_column("Nrreviews")
Nrhotelreviews = tf.feature_column.numeric_column("Nrhotelreviews")
Helpfulvotes = tf.feature_column.numeric_column("Helpfulvotes")
Periodofstay = tf.feature_column.numeric_column("Periodofstay")
Travelertype = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_identity("Travelertype", 5))
Pool = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_identity("Pool", 2))
Gym = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_identity("Gym", 2))
Tenniscourt = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_identity("Tenniscourt", 2))
Spa = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_identity("Spa", 2))
Casino = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_identity("Casino", 2))
Freeinternet = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_identity("Freeinternet", 2))
Hotelname = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_identity("Hotelname", 24))
Hotelstars = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_identity("Hotelstars", 5))
Nrrooms = tf.feature_column.numeric_column("Nrrooms")
Usercontinent = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_identity("Usercontinent", 6))
Memberyears = tf.feature_column.numeric_column("Memberyears")
Reviewmonth = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_identity("Reviewmonth", 12))
Reviewweekday = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_identity("Reviewweekday", 7))

feature_columns = [Usercountry, Nrreviews, Nrhotelreviews, Helpfulvotes, Periodofstay,
                   Travelertype, Pool, Gym, Tenniscourt, Spa, Casino, Freeinternet,
                   Hotelname, Hotelstars, Nrrooms, Usercontinent, Memberyears,
                   Reviewmonth, Reviewweekday]

Here is the tf.estimator.DNNClassifier, where DNN means Deep Neural Network. We give it the feature columns and the directory where it should store the model. We also say there are 5 classes since hotel scores range from 1 to 5. For hidden units we pick [10, 10]. This means the first layer of the neural network has 10 nodes and the next layer has 10. You can read more about how to pick that number by reading, for example, this StackOverflow article. I do not yet know if this is the correct value. We will see when we make predictions in the next post.

classifier=tf.estimator.DNNClassifier(
   feature_columns=feature_columns,
   hidden_units=[10, 10],
   n_classes=5,
   model_dir="/tmp")

batch_size = 100

Finally we call the train() method and give it an anonymous (lambda) function that calls csv_input_fn with the path of the csv file to read.

classifier.train(
    steps=1000,
    input_fn=lambda: csv_input_fn("/home/walker/tripAdvisorFL.csv", batch_size))

In a separate blog post we will show how to make predictions from this model, meaning estimate how a customer might rate a hotel given their characteristics. The hotel could then decide how much effort they might want to expend to make this customer happy or expend no effort at all.
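As a preview, here is a rough sketch of what that prediction step could look like. The feature values below are made-up integers for illustration only, and the mapping from class ids back to scores is an assumption about how the labels are encoded:

def predict_input_fn():
    # One made-up row, with every feature set to 0 purely for illustration.
    example = {name: [0] for name in feature_names if name != 'Score'}
    return tf.data.Dataset.from_tensor_slices(example).batch(1)

predictions = classifier.predict(input_fn=predict_input_fn)
for p in predictions:
    # class_ids run from 0 to n_classes - 1; here we assume they map to scores 1-5.
    print("predicted score:", p["class_ids"][0] + 1)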

Deep Learning Step-by-Step Neural Network Tutorial with Keras

In this article, we’ll show how to use Keras to create a neural network, an expansion of this original blog post. The goal is to predict how likely someone is to buy a particular product based on their income, whether they own a house, whether they have a college education, etc.

The source code for this Zeppelin notebook is here.

(This tutorial is part of our Guide to Machine Learning with TensorFlow & Keras. Use the right-hand menu to navigate.)

First, download the data from the internet. The feature data is already categorical (0 and 1) so there is no need to do any conversion on that.

The label column is data[‘Buy’]. data.iloc[:,2:16] takes columns 2 through 15 (the end of the range is exclusive) and all rows, where iloc is the integer-position index operator. The colon (:) means all rows.

import tensorflow as tf
from keras.models import Sequential
import pandas as pd
from keras.layers import Dense

url = 'https://raw.githubusercontent.com/werowe/logisticRegressionBestModel/master/KidCreative.csv'

data = pd.read_csv(url, delimiter=',')

labels=data['Buy']
features = data.iloc[:,2:16]

Now, we split the data into 67% training and 33% testing data sets, which is the normal convention for machine learning: take the input data, split it, and use one set to train the model. Then use the other set to check its accuracy.

np.ravel(labels) creates a flat array from the Pandas series labels.

import numpy as np
from sklearn.model_selection import train_test_split

X=features

y=np.ravel(labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

Here we normalize the data, which means putting it on a common scale ((value – mean) / standard deviation), a standard machine learning practice.

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler().fit(X_train)

X_train = scaler.transform(X_train)

X_test = scaler.transform(X_test)

Now we create a neural network with three layers. The input shape is (14,) since there are 14 feature columns in the Pandas dataframe.

We use binary_crossentropy for the loss function and stochastic gradient descent for the optimizer, as well as different activation functions. The choice is somewhat arbitrary: pick one, try another, and see which produces the most accurate model.

The number of layers and the number of epochs are completely arbitrary. You can increase those to perhaps increase the accuracy of the model, but any variation is going to be only slight as this model converges in just a couple of steps.

model = Sequential()

model.add(Dense(8, activation='relu', input_shape=(14,)))

model.add(Dense(8, activation='relu'))

model.add(Dense(1, activation='sigmoid'))


model.compile(loss='binary_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
                   
model.fit(X_train, y_train,epochs=8, batch_size=1, verbose=1)
Epoch 1/8
450/450 [==============================] - 2s 5ms/step - loss: 0.4760 - acc: 0.7844
Epoch 2/8
450/450 [==============================] - 1s 1ms/step - loss: 0.3338 - acc: 0.8511
Epoch 3/8
450/450 [==============================] - 1s 1ms/step - loss: 0.2521 - acc: 0.9000
Epoch 4/8
450/450 [==============================] - 1s 1ms/step - loss: 0.2058 - acc: 0.9156
Epoch 5/8
450/450 [==============================] - 1s 2ms/step - loss: 0.1829 - acc: 0.9311
Epoch 6/8
450/450 [==============================] - 1s 1ms/step - loss: 0.1740 - acc: 0.9311
Epoch 7/8
450/450 [==============================] - 1s 2ms/step - loss: 0.1630 - acc: 0.9311
Epoch 8/8
450/450 [==============================] - 1s 1ms/step - loss: 0.1583 - acc: 0.9378

Here we use the evaluate() method to show the accuracy of the model, meaning the ratio (number of correct predictions) / (number of predictions).

You can print y_pred and y_test side-by-side and see that most of the predictions are the same as the test values. That’s to be expected as the accuracy of this model is 93.78%.

y_pred = model.predict(X_test)

score = model.evaluate(X_test, y_test,verbose=1)

print(score)
223/223 [==============================] - 1s 3ms/step
[0.18696444257759728, 0.9372197314762748]
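Here is a small sketch of that side-by-side check; since the output layer is a sigmoid, predictions above 0.5 are counted as a "buy":

import numpy as np

y_pred_class = (y_pred > 0.5).astype(int).ravel()

# Show the first ten predictions next to the actual labels.
for predicted, actual in list(zip(y_pred_class, y_test))[:10]:
    print(predicted, actual)

print("agreement:", np.mean(y_pred_class == y_test))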

Correlating the data

A necessary step in machine learning is to plot the data to see whether it supports your hypothesis that the features are correlated with the outcome. Let’s separate the data into buyers and non-buyers and plot the features in histograms. We stack one histogram on top of the other so that they all fit.

Buyers

We expect that features such as income, education level, and current/prior home ownership will predict whether someone buys or not.

df is a new dataframe with all the columns of the original dataframe (data.columns). We create that and then select the people who have bought using df.loc[(df.Buy == 1)]. df.loc selects rows based on a condition, here the value of the Buy column.

fig, ax = plt.subplots(2,8,figsize=(16, 4)) means create a figure with two rows of 8 charts each. Each chart is then referenced by ax[row,column].

fig.subplots_adjust(hspace=1, wspace=0.2) adds vertical and horizontal spacing between the subplots.

buyers.columns[2:] means take the 3rd column through the last one in the dataframe. We don’t have to plot the observation number or the buy-or-not label, as those would not lend themselves to being plotted as histograms.

df = pd.DataFrame(data, columns= np.array(data.columns))

buyers = df.loc[(df.Buy == 1)]

import matplotlib.pyplot as plt


fig, ax = plt.subplots(2,8,figsize=(16, 4) )

i = 0
j = 0

for c in buyers.columns[2:]:
    ax[j,i].hist(buyers[c])
    ax[j,i].set_title(c)
    i = i + 1
    if i == 8:
        j = 1
        i = 0
 
fig.subplots_adjust(hspace=1, wspace=0.2)
plt.show()

As you can see, people with higher incomes, professionals, those with jobs, those with dual incomes, and those who own their house are the most likely to have bought.

Non-buyers

Now we do the opposite and pick out people who did not buy anything.

df = pd.DataFrame(data, columns = data.columns )

notbuyers = df.loc[(df.Buy == 0)]

import matplotlib.pyplot as plt


fig, ax = plt.subplots(2,8,figsize=(16, 4) )

i = 0
j = 0

for c in np.array(data.columns)[2:]:
    ax[j,i].hist(notbuyers[c])
    ax[j,i].set_title(c)
    i = i + 1
    if i == 8:
        j = 1
        i = 0
 
fig.subplots_adjust(hspace=1, wspace=0.2)
plt.show()

Comparing histograms

To make it easy to see the differences between buyers and non-buyers, we show both charts together below.

Buyers histogram

Non-buyers histogram

How to Use Keras to Solve Classification Problems with a Neural Network

Keras can be used to build a neural network to solve a classification problem. In this article, we will:

  • Describe Keras and why you should use it instead of TensorFlow
  • Explain perceptrons in a neural network
  • Illustrate how to use Keras to solve a Binary Classification problem

For some of this code, we draw on insights from a blog post at DataCamp by Karlijn Willems.

(This tutorial is part of our Guide to Machine Learning with TensorFlow & Keras. Use the right-hand menu to navigate.)

What is Keras?

Keras is an API that sits on top of Google’s TensorFlow, Microsoft Cognitive Toolkit (CNTK), and other machine learning frameworks. The goal is to have a single API to work with all of those and to make that work easier.
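With the original multi-backend Keras, the backend was chosen through a configuration file or an environment variable before importing the library. A minimal sketch; TensorFlow is already the default, so the line is only needed when switching backends:

import os

# Must be set before keras is imported; valid values included
# "tensorflow", "theano", and "cntk" in the multi-backend releases.
os.environ["KERAS_BACKEND"] = "tensorflow"

import keras
print(keras.backend.backend())   # prints the name of the active backend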

In my view, you should always use Keras instead of raw TensorFlow, as Keras is far simpler and therefore you’re less prone to build models that reach the wrong conclusions. Too many people dive in and start using TensorFlow and struggle to make it work. Keras adds simplicity. But you can use TensorFlow functions directly with Keras, and you can expand Keras by writing your own functions.

Keras prerequisites

In order to run through the example below, you must have Zeppelin installed as well as these Python packages:

  • TensorFlow
  • Keras
  • Theano
  • Seaborn
  • Matplotlib
  • NumPy
  • pydot
  • scikit-learn

You’ll also need this package:

  • sudo apt install graphviz

The data

First, we use this data set from Kaggle which tracks diabetes in Pima Native Americans. We use it to build a predictive model of how likely someone is to get or have diabetes given their age, body mass index, glucose and insulin levels, skin thickness, etc.

The code below plugs these features (glucose, BMI, etc.) and labels (the single value yes [1] or no [0]) into a Keras neural network to build a model that, with about 80% accuracy, can predict whether someone has or will get Type II diabetes.

Neural network

Here we are going to build a multi-layer perceptron. This is also known as a feed-forward neural network. That’s opposed to fancier ones that can make more than one pass through the network in an attempt to boost the accuracy of the model.

If the neural network had just one layer, then it would just be a logistic regression model.

You can still think of this as a logistic regression model, but one having a higher degree of accuracy by running logistic regression calculations multiple times.  That’s the basic idea behind the neural network:  calculate, test, calculate again, test again, and repeat until an optimal solution is found. This approach works for handwriting, facial recognition, and predicting diabetes.

Neural networks explained

You should have a basic understanding of the logic behind neural networks before you study the code below. Here is a quick review; you’ll need a basic understanding of linear algebra to follow the discussion.

Basically, a neural network is a connected graph of perceptrons. Each perceptron is just a function. In a classification problem, its outcome is the same as the labels in the classification problem. For this model it is 0 or 1. For handwriting recognition, the outcome would be the letters in the alphabet.

Each perceptron makes a calculation and hands that off to the next perceptron. This calculation is really a probability. In the case of a classification problem, a threshold t is arbitrarily set such that if the probability of event x is > t, then the result is 1 (true), otherwise 0 (false). For logistic regression, that threshold is 50%.

The functions used are sigmoid functions, meaning S-shaped curves that vary between two known values. The logistic sigmoid function works well in this example since we are trying to predict whether someone has or will get diabetes (1) or not (0).

A neural network is just a large linear or logistic regression problem

Logistic regression is closely related to linear regression. The only difference is that logistic regression outputs a discrete outcome while linear regression outputs a real number. In fact, if we have a linear model y = wx + b and let t = y, then the logistic function is 1 / (1 + e**-t).

It’s a number designed to range between 0 and 1, so it works well for probability calculations.

In the simple linear equation y = mx + b we are working with only one variable, x. You can solve that problem using Microsoft Excel or Google Sheets. You don’t need a neural network for that.

In most problems we face in the real world, we are dealing with many variables. In that case m and x are matrices. But the math is similar because we still have the concept of weights and a bias in mx + b.

In the formula below, the weight matrix is of size m x 1, so it’s a vector, which is a one-dimensional matrix. The weights are w1, w2, …, wm, one for each of the m features x1, x2, …, xm; in the diabetes data, x includes BMI, glucose, and so on. The weights w1, w2, …, wm and the bias b are the numbers that most accurately predict the relationship between those indicators and the probability that the person is diabetic.

For each node in the neural network, we calculate the dot product w • x, which means multiplying every weight w by its corresponding feature x taken from our training set, and then add a bias b to shift the calculation up or down.

The expanded calculation looks like this, where you take every element from vector w and multiply it by its corresponding element in vector x:

f(x) = (w1* x1 + w2 * x2 + … + wm * xm) + b.

This gives us a real number. In the case of the logistic function, as we said above, if the resulting probability is > 50%, then the perceptron outputs 1; otherwise 0.
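Here is a toy version of that calculation in NumPy; the weights, bias, and feature values are made-up numbers purely to illustrate the arithmetic:

import numpy as np

w = np.array([0.4, -0.2, 0.1])        # one weight per feature
x = np.array([1.0, 3.0, 2.0])         # one observation with three features
b = 0.5                               # bias shifts the weighted sum up or down

z = np.dot(w, x) + b                  # (w . x) + b
probability = 1 / (1 + np.exp(-z))    # logistic sigmoid squashes z into (0, 1)
prediction = 1 if probability > 0.5 else 0

print(z, probability, prediction)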

Solving the neural network problem

The algorithm stops when the model converges, meaning when the error reaches the minimum possible value. In plain English, that means we have built a model with a certain degree of accuracy. The error is the value error = 1 – (number of times the model is correct) / (number of observations).

A mathematician would say the model converges when we have found a hyperplane that separates each point in this m dimensional space (since there are m input variables) with maximum distance between the plane and the points in space. Each of the positive outcomes is on one side of the hyperplane and each of the negative outcomes is on the other. In other words, it’s like calculating the LSE (least squares error) in a simple linear regression problem, except this is working in more than one dimension.

If no such hyperplane exists, then there is no solution to the problem. Then we conclude that a model cannot be built because there is not enough correlation between the variables.

Neural network layers

Remember that the approach to solving such a problem is iterative. In terms of a neural network, you can see this in this graphic below.

Source: Wikipedia

We have an input layer, which is where we feed our matrix of features and labels.  Those perceptron functions then calculate an initial set of weights and hand off to any number of hidden layers.  How many times it does this is governed by the parameters you pass to the algorithms, the algorithm you pick for the loss and activation function, and the number of nodes that you allow the network to use.

The final solution comes out in the output layer. There’s just one input and one output layer.  There’s no scientific way to determine how many hidden layers you should use.  The data scientist just varies those and the algorithms used at each layer until the most accurate solution is found.  So it’s trial and error.

The code

We have stored the code for this example in a Jupyter notebook here.

Seaborn correlation plot

A first step in data analysis should be plotting as it is easier to see if we can discern any pattern.

We could start by looking to see if there is some correlation between variables. So, we use the powerful Seaborn correlation plot. Seaborn is an extension to matplotlib.

First load the data into a dataframe:

import tensorflow as tf
from keras.models import Sequential
import pandas as pd
from keras.layers import Dense

data = pd.read_csv('/home/ubuntu/Downloads/diabetes.csv', delimiter=',')

Then visually inspect it:

First let’s browse the data, listing the maximum, minimum, and average values:

data.describe()
Pregnancies Glucose BloodPressure SkinThickness Insulin \
count 768.000000 768.000000 768.000000 768.000000 768.000000
mean 3.845052 120.894531 69.105469 20.536458 79.799479
std 3.369578 31.972618 19.355807 15.952218 115.244002
min 0.000000 0.000000 0.000000 0.000000 0.000000
25% 1.000000 99.000000 62.000000 0.000000 0.000000
50% 3.000000 117.000000 72.000000 23.000000 30.500000
75% 6.000000 140.250000 80.000000 32.000000 127.250000
max 17.000000 199.000000 122.000000 99.000000 846.000000

BMI DiabetesPedigreeFunction Age Outcome \
count 768.000000 768.000000 768.000000 768.000000
mean 31.992578 0.471876 33.240885 0.348958
std 7.884160 0.331329 11.760232 0.476951
min 0.000000 0.078000 21.000000 0.000000
25% 27.300000 0.243750 24.000000 0.000000
50% 32.000000 0.372500 29.000000 0.000000
75% 36.600000 0.626250 41.000000 1.000000
max 67.100000 2.420000 81.000000 1.000000

prediction
count 768.000000
mean 0.317708
std 0.465889
min 0.000000
25% 0.000000
50% 0.000000
75% 1.000000
max 1.000000

You can also inspect the values in the dataframe like this:

 data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
Pregnancies                 768 non-null int64
Glucose                     768 non-null int64
BloodPressure               768 non-null int64
SkinThickness               768 non-null int64
Insulin                     768 non-null int64
BMI                         768 non-null float64
DiabetesPedigreeFunction    768 non-null float64
Age                         768 non-null int64
Outcome                     768 non-null int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB  

Check correlation with heatmap graph

Next, run this code to see any correlation between variables. That is not important for the final model but is useful to gain further insight into the data.

Seaborn creates a heatmap-type chart, plotting each value from the dataset against itself and every other value. Then it figures out if these two values are in any way correlated with each other.

import seaborn as sns
import matplotlib as plt
corr = data.corr()
sns.heatmap(corr, 
            xticklabels=corr.columns.values,
            yticklabels=corr.columns.values)

Items that are perfectly correlated have correlation value 1. Obviously, every metric is perfectly correlated with itself, as illustrated by the tan line going diagonally across the middle of the chart.

There are not a lot of orange squares in the chart. So you can say that no single value is 80% likely to give you diabetes (outcome). There does not seem to be much correlation between these individual variables. But we will see that, taken in the aggregate, we can predict with almost 75% accuracy who will develop diabetes given all of these factors together.

You can check the correlation between two variables in a dataframe as shown below. There is not much correlation here since 0.28 and 0.54 are far from 1.00.

data['BloodPressure'].corr( data["BMI"])

0.2818052888499106

data["Pregnancies"].corr(data["Age"])

0.5443412284023394

Prepare the test and training data sets

  • Outcome is the column with the label (0 or 1).
  • The rest of the columns are the features.
  • We use the scikit-learn function train_test_split(X, y, test_size=0.33, random_state=42) to split the data into training and test data sets, giving 33% of the records to the test data set.  The training data set is used to train the model, meaning find the weights and biases.  The test data set is used to check its accuracy.
  • labels is not an array. It is a column in a dataframe (a Pandas Series).  So we use the NumPy np.ravel() function to convert it to an array.
import numpy as np

labels=data['Outcome']
features = data.iloc[:,0:8]

from sklearn.model_selection import train_test_split

X=features

y=np.ravel(labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42) 

Now we normalize the values, meaning take each x in the training and test data set and calculate (x – μ) / δ, or the distance from the mean (μ) divided by the standard deviation (δ). That puts the data on a standard scale, which is a standard practice with machine learning.

StandardScaler does this in two steps:  fit() and transform().

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler().fit(X_train)

X_train = scaler.transform(X_train)

X_test = scaler.transform(X_test)  

The Keras sequential model

The code below creates a Keras sequential model, which means building up the layers in the neural network by adding them one at a time, as opposed to other techniques and neural network types.

Activation function

Pick an activation function for each layer. It takes the value ((w • x) + b) and calculates a probability. Then a threshold determines whether the neuron’s output should be 1 (true) or 0 (false). (That’s not the same as saying diabetic, 1, or not, 0, as neural networks can handle problems with more than just two discrete outcomes.)

For the first two layers we use a relu (rectified linear unit) activation function. That choice is somewhat arbitrary; you could have picked sigmoid. relu passes through the input itself for all positive values and returns 0 for all negative ones. So:

f(x) = 0 if x <= 0
f(x) = x if x > 0

This is the same as saying f(x) = max(0, x). So f(-1), for example, = max(0, -1) = 0. In other words, if the weighted sum is negative, the unit outputs 0; otherwise it passes the value through unchanged.

The rule as to which activation function to pick is trial and error. Pick different ones and see which produces the most accurate predictions. There are others: Sigmoid, tanh, Softmax, ReLU, and Leaky ReLU. Some are more suitable to multiple rather than binary outputs.

Sigmoid uses the logistic function, 1 / (1 + e**-z), where z = f(x) = ((w • x) + b).

This graph from Beyond Data Science shows each function plotted as a curve.

Some notes on the code:

  • input_shape—we only have to give it the shape (dimensions) of the input on the first layer. It’s (8,) since the input is a vector of 8 features; in other words, an 8 x 1 vector.
  • Dense—to apply the activation function over ((w • x) + b). The first argument in the Dense function is the number of hidden units, a parameter that you can adjust to improve the accuracy of the model. Hidden units is, like the number of hidden layers, a complex topic not easy to understand or explain, but it’s one we can safely gloss over.  (The complexity of these two topics is what makes most people say that working with neural networks is art. A mathematician would mock that lack of rigor.)
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()

model.add(Dense(8, activation='relu', input_shape=(8,)))

model.add(Dense(8, activation='relu'))

model.add(Dense(1, activation='sigmoid'))
  • loss—the goal of the neural network is to minimize the loss function, i.e., the difference between predicted and observed values. There are many functions we can use. We pick binary_crossentropy because our label data is binary (1) diabetic and (0) not diabetic.
  • optimizer—we use the optimizer function sgd, Stochastic Gradient Descent. It’s an algorithm designed to minimize the loss function in the quickest way possible. There are others.
  • epoch—an epoch is one pass over the training data; the epochs parameter says how many passes to make. Remember that it is an iterative process. You could add additional epochs, but the accuracy might not change much. You just have to try and see.
  • metrics—means what metrics to display as it runs. Accuracy means how accurately the evolving model predicts the outcome.
  • batch_size—setting batch_size=n means the network processes n training samples at a time before updating the weights.
  • fit()—trains the model, meaning it calculates the weights and biases for each layer.
model.compile(loss='binary_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
                   
model.fit(X_train, y_train,epochs=4, batch_size=1, verbose=1)

Above, we talked about the iterative process of solving a neural network for weights and bias.  That’s done with epochs. Here is the output as it runs those. As you can see the accuracy goes up quickly then levels off.

Epoch 1/4
514/514 [==============================] - 2s 3ms/step - loss: 0.6016 - acc: 0.6790
Epoch 2/4
514/514 [==============================] - 1s 1ms/step - loss: 0.5118 - acc: 0.7588
Epoch 3/4
514/514 [==============================] - 1s 1ms/step - loss: 0.4755 - acc: 0.7782
Epoch 4/4
514/514 [==============================] - 1s 1ms/step - loss: 0.4597 - acc: 0.7802

You can use model.summary() to print some information.

Here are the weights for each layer we mentioned.

for layer in model.layers:
    weights = layer.get_weights()
    print(weights)

It looks like this:

[array([[ 0.11246287,  0.64353085,  0.00519296, -0.3992814 ,  0.29071185,
         0.3010074 ,  0.21385622,  0.31609035],
       [ 0.6338699 ,  0.5349119 ,  0.11174025, 
…

We can also draw a picture of the layers and their shapes. It’s not very useful but nice to see.

from keras.utils import plot_model
plot_model(model, to_file='/tmp/model.png', show_shapes=True,)

As you would expect, the shape of the output is 1, as there we have our prediction:

Then we can get configuration information on each layer with layer.get_config() and on the model with model.get_config():
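For example, a quick sketch of that inspection:

# Print the configuration of each layer, then of the whole model.
for layer in model.layers:
    print(layer.get_config())

print(model.get_config())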

Now we can run predictions on test data.

y_pred = model.predict_classes(X_test)

This prints the score, or accuracy.

score = model.evaluate(X_test, y_test,verbose=1)

print(score)
254/254 [==============================] - 0s 46us/step
[0.5745944582571195, 0.7204724428221936]

So, our predictive model is 72% accurate.

If you read the discussions at DataCamp, you can see other analysts have been able to get slightly better results trying other techniques. But remember the danger of overfitting.

Google Cloud TPUs for ML Acceleration

We already wrote how machine learning frameworks are using NVIDIA GPUs (graphical processing units) to speed machine learning. Now Google is taking that idea and using it to speed machine learning using their own ASIC hardware, called TPUs, Tensor Processing Units. What Google has really done is take technology invented by NVIDIA (GPUs) and pushed it to the cloud.

A Tensor is an n-dimensional matrix. This is the basic unit of operation in TensorFlow, the open source machine learning framework launched by Google Brain.

A Tensor is analogous to a NumPy array, and in fact uses NumPy. According to the NumPy documentation, “NumPy is the fundamental package for scientific computing with Python. It contains among other things a powerful N-dimensional array object …”

Arrays are the fundamental data structures used by machine learning algorithms. Multiplying and taking slices from arrays takes a lot of CPU clock cycles and memory. So Numpy was written to make writing code to do that easier. GPUs now make those operations run faster.
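A small sketch of that relationship; the values are arbitrary, and the .numpy() call assumes TensorFlow 2.x running in eager mode:

import numpy as np
import tensorflow as tf

a = np.array([[1.0, 2.0], [3.0, 4.0]])   # a 2 x 2 NumPy array
t = tf.constant(a)                        # the same values as a TensorFlow tensor

print(t.shape)      # (2, 2)
print(t.numpy())    # converts back to a NumPy array in eager mode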

In particular, the math involved in doing ML includes adding and multiplying these kinds of objects: scalars, vectors, matrices, and higher-dimensional tensors.

GPUs were originally built to offload the intensive math calculations need to rotate graphics on a screen and otherwise speed up any kind of graphical operation, like painting screens in gaming applications. The goal was to not overburden the CPU. But then NVIDIA wrote the CUDA SDK letting programmers who write things like Tensorflow use GPUs for any kind of scalar, vector, or matrix addition or multiplication.

A CPU has 1 to 8 cores or more. A GPU has hundreds. The GPU and the TPU are much the same kind of technology; the difference is that Google is now selling TPUs as a cloud service, using proprietary chips that it sells to no one else.

Google’s approach to provisioning a TPU is different than Amazon’s. At Amazon you pick a GPU-enabled template and spin up a virtual machine with that. Those templates all start with the letters P3 and are listed here.

With Google you use their command line tool ctpu to provision machines with TPUs. (And you can continue to use NVIDIA GPUs as well.)

According to Google’s pricing information, each TPU costs $4.50 an hour. Apparently they do not charge different rates for different TPU models, even though they show three models on their website. That seems confusing, as the TPUs have different memory sizes and clock speeds, so one should be more expensive than another.

The TPU workload is distributed to what they call their TPU Cloud Server, as shown below.

(This tutorial is part of our Guide to Machine Learning with TensorFlow & Keras. Use the right-hand menu to navigate.)

Hardware

According to their architecture docs, their TPUs are connected to their cloud machines through a PCI interface. That is the same way that NVIDIA let gamers add graphical expansion cards to boost the performance of the graphics on the computer. So they are not just onboard chips, but expansion cards.

Google says, “Each chip consists of two compute cores called Tensor Cores. A Tensor Core consists of scalar, vector and matrix units (MXU). In addition, 16 GB of on-chip memory (HBM) is associated with each Tensor Core.”

Software

An estimator is the tf.estimator.Estimator class. Estimators are implementations of neural networks, linear regression, and other models in Python code that makes creating those kinds of objects simpler, since they leverage NumPy, Pandas, and other Python data structures and utilities.

Now Google says they have a TPU Estimator. However, you cannot download a TPU-enabled version of TensorFlow the way you can download the GPU-enabled version (which differs from regular TensorFlow in that it uses the CUDA SDK for the parts of the code written in C and C++). There is no separate TPU-enabled version of TensorFlow. And unlike with GPUs, there appears to be no way to explicitly tell the code to use the TPU device, as in this code snippet that multiplies two matrices on GPU device /device:GPU:n. (To use the CPU you would write /device:CPU:n, where n can be any of the n CPUs on the computer.)

with tf.device('/device:GPU:0'):
    c = tf.matmul(x, y)

Scale Advantage?

One advantage of the TPU design is that it lets you scale operations across different machines with their TPU servers. The user, of course, does not need to write any code to do that. This researcher has not yet studied how to do that with GPUs. In other words, what do you do when your calculation runs out of memory? You can add PCI expansion cards (/device:GPU:1, 2, …, n) or effectively do the same thing by paying Amazon for a larger template. But how do you implement something like a Mesos equivalent that would let you scale across a cluster of servers without having to hard-code device and server names? We will look at that and report back.

How Keras Machine Language API Makes TensorFlow Easier

Keras is a Python framework designed to make working with Tensorflow (also written in Python) easier. It builds neural networks, which, of course, are used for classification problems. The example problem below is binary classification. You can find the code here. The binary classification problem here is to determine whether a customer will buy something given 14 different features. You can see the data here.

Keras can run on top of:

  • TensorFlow
  • Theano
  • CNTK

Here we use TensorFlow. So install Anaconda and then run these commands to install the rest.

conda install theano
conda install tensorflow
conda install keras

(This tutorial is part of our Guide to Machine Learning with TensorFlow & Keras. Use the right-hand menu to navigate.)

The Code

In the code below, we have a dataframe of shape (673,14), meaning 673 rows and 14 feature columns. We take the column called Buy and use that for the labels. You can use the Keras methods with dataframes, NumPy arrays, or Tensors.

We declare our model to be Sequential. These are a stack of layers.

We tell Keras to report the accuracy metric with metrics=[‘accuracy’].

import tensorflow as tf
from keras.models import Sequential
import pandas as pd
from keras.layers import Dense

url = 'https://raw.githubusercontent.com/werowe/logisticRegressionBestModel/master/KidCreative.csv'

data = pd.read_csv(url, delimiter=',')

labels=data['Buy']
features = data.iloc[:,2:16]

model = Sequential()

model.add(Dense(units=64, activation='relu', input_dim=1))
model.add(Dense(units=14, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

model.fit(labels, features,
          batch_size=12,
          epochs=10,
          verbose=1,
          validation_data=(labels, features))
          
model.evaluate(labels, features, verbose=0)

model.summary()

The output looks like this. As you can see, it ran the sgd (stochastic gradient descent) optimizer with the categorical_crossentropy loss 10 times, since we set the epochs to 10. Since we used the same data for training and evaluating, we get a 0.9866 accuracy. In actual use we would split the input data into training and test data, following the standard convention. The loss is 362.1225. We could have used mse (mean squared error), but we used categorical_crossentropy. The goal of the optimizer (in this case sgd) is to minimize the loss function, meaning the difference between the actual and predicted values.

Train on 673 samples, validate on 673 samples
Epoch 1/10
2018-07-26 08:43:32.122722: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
673/673 [==============================] - 0s 494us/step - loss: 1679.5777 - acc: 0.9851 - val_loss: 362.1225 - val_acc: 0.9866
Epoch 2/10
673/673 [==============================] - 0s 233us/step - loss: 362.1225 - acc: 0.9866 - val_loss: 362.1225 - val_acc: 0.9866
Epoch 3/10
673/673 [==============================] - 0s 218us/step - loss: 362.1225 - acc: 0.9866 - val_loss: 362.1225 - val_acc: 0.9866
Epoch 4/10
673/673 [==============================] - 0s 208us/step - loss: 362.1225 - acc: 0.9866 - val_loss: 362.1225 - val_acc: 0.9866
Epoch 5/10
673/673 [==============================] - 0s 213us/step - loss: 362.1225 - acc: 0.9866 - val_loss: 362.1225 - val_acc: 0.9866
Epoch 6/10
673/673 [==============================] - 0s 212us/step - loss: 362.1225 - acc: 0.9866 - val_loss: 362.1225 - val_acc: 0.9866
Epoch 7/10
673/673 [==============================] - 0s 216us/step - loss: 362.1225 - acc: 0.9866 - val_loss: 362.1225 - val_acc: 0.9866
Epoch 8/10
673/673 [==============================] - 0s 218us/step - loss: 362.1225 - acc: 0.9866 - val_loss: 362.1225 - val_acc: 0.9866
Epoch 9/10
673/673 [==============================] - 0s 228us/step - loss: 362.1225 - acc: 0.9866 - val_loss: 362.1225 - val_acc: 0.9866
Epoch 10/10
673/673 [==============================] - 0s 239us/step - loss: 362.1225 - acc: 0.9866 - val_loss: 362.1225 - val_acc: 0.9866
<keras.callbacks.History object at 0x7fa48f3ccac8>
>>>           
... model.evaluate(labels, features, verbose=0)
[362.1224654085746, 0.986627043090639]
>>> 
>>> model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 64)                128       
_________________________________________________________________
dense_2 (Dense)              (None, 14)                910       
=================================================================
Total params: 1,038
Trainable params: 1,038
Non-trainable params: 0

Now you can use the predict() method to make some prediction on whether a person is likely to buy this product or not.

Using TensorFlow Neural Network for Machine Learning Predictions with TripAdvisor Data

Here is the last part of our analysis of the Tripadvisor data. Part one is here. In order to understand this, you will need to know Python and Numpy Arrays and the basics behind tensorflow and neural networks. If you do not, you can read an introduction to tensorflow here.

The code from this example is here and input data here. We create a neural network using the Tensorflow tf.estimator.DNNClassifier. (DNN means deep neural network, i.e., one with hidden layers between the input and output layers.)

Below we discuss each section of the code.

(This tutorial is part of our Guide to Machine Learning with TensorFlow & Keras. Use the right-hand menu to navigate.)

feature_names and FIELD_DEFAULTS

feature_names holds the names we have assigned to the feature columns.

FIELD_DEFAULTS is an array of 20 integers, one for each column in the .csv file. This tells TensorFlow that our inputs are integers; if we had used 1.0 it would declare them as floats.

import tensorflow as tf
import numpy as np

feature_names = ['Usercountry', 'Nrreviews','Nrhotelreviews','Helpfulvotes','Periodofstay',
                 'Travelertype','Pool','Gym','Tenniscourt','Spa','Casino',
                 'Freeinternet','Hotelname','Hotelstars','Nrrooms','Usercontinent',
                 'Memberyears','Reviewmonth','Reviewweekday']
FIELD_DEFAULTS = [[0], [0], [0], [0], [0],
                  [0], [0], [0], [0], [0],
                  [0], [0], [0], [0], [0],
                  [0], [0], [0], [0], [0]]

parse_line

DNNClassifier.train requires an input_fn that returns features and labels. It is not supposed to be called with arguments, so we use lambda below to call it and to pass it a parameter, which is the name of the text file to read.

We cannot simply use one of the examples provided by TensorFlow, such as the hello-world-type one that reads the Iris flower data. We made our own data and put it into a .csv file, so we need our own parser. In this case, we use the tf.data.TextLineDataset method to read from the csv text file and feed it into this parser. That will read those lines and return the features and labels as a dictionary and tensor pair.

In del parsed_line[4] we delete the 5th tensor from the input, which is the TripAdvisor score, because that is a label (i.e., an output) and not a feature (an input).

tf.decode_csv(line, FIELD_DEFAULTS) creates a tensor for each item read from the .csv file.

You cannot see the values of tensors directly; they have no value until you run them in a session. But you can inspect them using tf.Print(). Note also that, for debugging purposes, you could take a quick look at the raw data with pandas and run the parser over the file with a small tf.data pipeline:

import pandas as pd
df = pd.read_csv("/home/walker/TripAdvisor.csv")   # quick look at the raw data
print(df.head())

ds = tf.data.TextLineDataset("/home/walker/TripAdvisor.csv").map(parse_line)
print(ds.output_types, ds.output_shapes)           # structure returned by parse_line

Continuing with our explanation, dict(zip(feature_names, features)) creates a dictionary from the feature tensors and the feature names. For the label we simply assign label = parsed_line[4], the 5th item in parsed_line (before it is deleted from the list).

def parse_line(line):
    parsed_line = tf.decode_csv(line, FIELD_DEFAULTS)
    tf.Print(input_=parsed_line, data=[parsed_line], message="parsed_line ")
    tf.Print(input_=parsed_line[4], data=[parsed_line[4]], message="score")
    label = parsed_line[4]
    del parsed_line[4]
    features = parsed_line
    d = dict(zip(feature_names, features))
    return d, label
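One caveat worth noting: tf.Print is an identity op with a printing side effect, so its return value has to flow through the graph for anything to be printed; the two bare calls above are effectively no-ops. A hedged sketch of how you might wire one of them in (an illustrative variant, not the original code):

def parse_line(line):
    parsed_line = tf.decode_csv(line, FIELD_DEFAULTS)
    # Route the label through tf.Print so the score is logged when the graph runs.
    label = tf.Print(parsed_line[4], data=[parsed_line[4]], message="score ")
    del parsed_line[4]
    return dict(zip(feature_names, parsed_line)), label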

csv_input_fn

A dataset here is a TensorFlow tf.data.Dataset and not a simple Python object. We create it from the .csv text file with tf.data.TextLineDataset(csv_path) and then apply parse_line to every line with the dataset.map() method.

def csv_input_fn(csv_path, batch_size):
    dataset = tf.data.TextLineDataset(csv_path)
    dataset = dataset.map(parse_line)
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)
    return dataset

Create Feature Columns

Here we create all the feature columns as continuous (numeric) columns rather than categorical ones. This works but could be improved. See the note below.

Note: Usercountry is a set of discrete values, so we could instead have used, for example, Usercountry = tf.feature_column.indicator_column(tf.feature_column.categorical_column_with_identity("Usercountry", 47)), since there are 47 countries in our dataset. You can experiment with that and see if you can make the change work. I got errors when I tried it: tf.decode_csv() appeared to be reading the wrong column in certain cases, producing values that were not, for example, one of the 47 countries. So there must be a few rows in the input data that have a different number of commas than the others. You can experiment with that; the Addendum at the end lists the full set of categorical columns to try.

Finally, feature_columns is a list of all the feature columns we have created.

Usercountry = tf.feature_column.numeric_column("Usercountry")
Nrreviews = tf.feature_column.numeric_column("Nrreviews")
Nrhotelreviews = tf.feature_column.numeric_column("Nrhotelreviews")
Helpfulvotes = tf.feature_column.numeric_column("Helpfulvotes")
Periodofstay = tf.feature_column.numeric_column("Periodofstay")
Travelertype = tf.feature_column.numeric_column("Travelertype")
Pool = tf.feature_column.numeric_column("Pool")
Gym = tf.feature_column.numeric_column("Gym")
Tenniscourt = tf.feature_column.numeric_column("Tenniscourt")
Spa = tf.feature_column.numeric_column("Spa")
Casino = tf.feature_column.numeric_column("Casino")
Freeinternet = tf.feature_column.numeric_column("Freeinternet")
Hotelname = tf.feature_column.numeric_column("Hotelname")
Hotelstars = tf.feature_column.numeric_column("Hotelstars")
Nrrooms = tf.feature_column.numeric_column("Nrrooms")
Usercontinent = tf.feature_column.numeric_column("Usercontinent")
Memberyears = tf.feature_column.numeric_column("Memberyears")
Reviewmonth = tf.feature_column.numeric_column("Reviewmonth")
Reviewweekday = tf.feature_column.numeric_column("Reviewweekday")
feature_columns = [Usercountry, Nrreviews, Nrhotelreviews, Helpfulvotes, Periodofstay,
                   Travelertype, Pool, Gym, Tenniscourt, Spa, Casino, Freeinternet,
                   Hotelname, Hotelstars, Nrrooms, Usercontinent, Memberyears,
                   Reviewmonth, Reviewweekday]

Create Classifier

Now we create the classifier. The hidden_units value [10, 10] means the deep neural network has two hidden layers of 10 nodes each. The model_dir is the temporary folder in which to store the trained model. The hotel scores range from 1 to 5, so n_classes is 6: the integer class labels must fall in the range [0, n_classes), and the largest label we use is 5.

classifier = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[10, 10],
    n_classes=6,
    model_dir="/tmp")
batch_size = 100

Train the model

Now we train the model. We use lambda because the documentation says “Estimators expect an input_fn to take no arguments. To work around this restriction, we use lambda to capture the arguments and provide the expected interface.”

classifier.train(
    steps=100,
    input_fn=lambda: csv_input_fn("/home/walker/tripAdvisorFL.csv", batch_size))

Make a Prediction

Now we make a prediction with the trained model. In practice you should also run an evaluation step. You will see in the code on GitHub that I wrote that step, but it never exited, so that remains an open issue to sort out here.
We need some data to test with, so we take the first line from the training set and key it in here. That reviewer gave the hotel a score of 5, so our expected result is 5, and the neural network will give the probability that the result is 5. The classifier.predict() method runs the input function we tell it to run, in this case predict_input_fn(), which returns the features as a dictionary. If we had been running the evaluation, we would need both the features and the label.

features = {'Usercountry': np.array([233]), 'Nrreviews': np.array([11]),
            'Nrhotelreviews': np.array([4]), 'Helpfulvotes': np.array([13]),
            'Periodofstay': np.array([582]), 'Travelertype': np.array([715]),
            'Pool': np.array([0]), 'Gym': np.array([1]),
            'Tenniscourt': np.array([0]), 'Spa': np.array([0]),
            'Casino': np.array([0]), 'Freeinternet': np.array([1]),
            'Hotelname': np.array([3367]), 'Hotelstars': np.array([3]),
            'Nrrooms': np.array([3773]), 'Usercontinent': np.array([1245]),
            'Memberyears': np.array([9]), 'Reviewmonth': np.array([730]),
            'Reviewweekday': np.array([852])}
def predict_input_fn():
    return features
expected = [5]
prediction = classifier.predict(input_fn=predict_input_fn)
for pred_dict, expec in zip(prediction, expected):
    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]
    print('class_ids=', class_id, ' probabilities=', probability)

We then print the results. The probability of a 5 in this example is 38%. We would hope to get something close to, say, 90%. This could be an outlier value; we do not know, since we have yet to evaluate the model.

Obviously we need to go back, evaluate the model, and try again with additional data. One would think that hotel scores are indeed correlated with the TripAdvisor data we have given it, but the focus here is just to get the model to work. Now we need to fine-tune it and see whether another ML model might be more appropriate. A hedged evaluation sketch follows the output below.

class_ids= 5  probabilities= 0.38341486
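As for the evaluation step that never exited: one plausible cause is that csv_input_fn calls repeat() with no count, so the dataset is infinite and evaluate() never runs out of data unless you bound it. A minimal sketch of how you might cap it (the eval file path and the 100-step limit are assumptions, not from the original code):

eval_results = classifier.evaluate(
    input_fn=lambda: csv_input_fn("/home/walker/tripAdvisorFL.csv", batch_size),
    steps=100)  # steps bounds the loop, since the repeated dataset never ends on its own
print(eval_results)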

Addendum

You can try these to make the discrete-valued columns categorical, as mentioned above:

Usercountry = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_identity("Usercountry", 47))
Nrreviews = tf.feature_column.numeric_column("Nrreviews")
Nrhotelreviews = tf.feature_column.numeric_column("Nrhotelreviews")
Helpfulvotes = tf.feature_column.numeric_column("Helpfulvotes")
Periodofstay = tf.feature_column.numeric_column("Periodofstay")
Travelertype = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_identity("Travelertype", 5))
Pool = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_identity("Pool", 2))
Gym = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_identity("Gym", 2))
Tenniscourt = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_identity("Tenniscourt", 2))
Spa = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_identity("Spa", 2))
Casino = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_identity("Casino", 2))
Freeinternet = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_identity("Freeinternet", 2))
Hotelname = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_identity("Hotelname", 22))
Hotelstars = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_identity("Hotelstars", 5))
Nrrooms = tf.feature_column.numeric_column("Nrrooms")
Usercontinent = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_identity("Usercontinent", 6))
Memberyears = tf.feature_column.numeric_column("Memberyears")
Reviewmonth = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_identity("Reviewmonth", 12))
Reviewweekday = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_identity("Reviewweekday", 7))
]]>
Introduction to TensorFlow and Logistic Regression https://www.bmc.com/blogs/introduction-to-tensorflow-and-logistic-regression/ Mon, 06 Nov 2017 09:57:49 +0000 http://www.bmc.com/blogs/?p=11421 Here we introduce TensorFlow, an opensource machine learning library developed by Google. We explain what it does and show how to use it to do logistic regression. (This tutorial is part of our Guide to Machine Learning with TensorFlow & Keras. Use the right-hand menu to navigate.) Background TensorFlow has many applications to machine learning, […]]]>

Here we introduce TensorFlow, an open-source machine learning library developed by Google. We explain what it does and show how to use it to do logistic regression.

(This tutorial is part of our Guide to Machine Learning with TensorFlow & Keras. Use the right-hand menu to navigate.)

Background

TensorFlow has many applications in machine learning, including neural networks. One application of neural networks is handwriting analysis; another is facial recognition. TensorFlow is designed to allow such problems to scale without limit, as the nodes in the graph can be run across a distributed network. Google uses TensorFlow in some of its production applications.

One interesting aspect of TensorFlow is that it can run not only on a machine's CPU but also on its GPU (graphics processing unit). That provides more power per machine, since GPUs are built for the fast, parallel arithmetic needed to drive a computer screen.

Install and Basic Concepts

To follow this tutorial, first install TF using the directions here.

The basic unit in TensorFlow is the tensor. A tensor is an array of any number of dimensions. For example:

[1] is a one-dimensional array
[[1,1]] is a two-dimensional array

To get started, first run Python and import TensorFlow:

import tensorflow as tf
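As a quick aside (an illustrative sketch, not from the original article), you can confirm the ranks of the two examples above:

import tensorflow as tf  # already imported above; repeated so this snippet stands alone
a = tf.constant([1])       # one-dimensional: shape (1,)
b = tf.constant([[1, 1]])  # two-dimensional: shape (1, 2)
print(a.shape, b.shape)    # prints (1,) (1, 2)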

You can assign values directly, or create a placeholder whose value you supply later (a placeholder sketch appears a little further down). For example, a single value can be written:

x =  tf.constant(3.0, dtype=tf.float32)

Where x is an immutable constant (meaning you cannot change it).

But the tensor has no value until you initiate a Session and run it:

import tensorflow as tf
sess = tf.Session()
x =  tf.constant(3.0, dtype=tf.float32)
print(sess.run([x])) 
Outputs:
[3.0]

Or you can write:

import tensorflow as tf
sess = tf.Session()
y = tf.Variable([3.0], dtype=tf.float32)
init = tf.global_variables_initializer()
sess.run(init) 
print(sess.run([y]))
Outputs:
[array([ 3.], dtype=float32)]

In the example above, the Variable(s) have no value until you run tf.global_variables_initializer().
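The placeholder mentioned earlier works like this; a minimal sketch (an illustrative addition, not from the original article):

import tensorflow as tf
sess = tf.Session()
p = tf.placeholder(tf.float32)            # declared now, value supplied later
print(sess.run(p, feed_dict={p: 7.0}))    # feed_dict supplies 7.0 at run time; prints 7.0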

You can add tensors and do other math, like this:

x =  tf.constant([3,3], dtype=tf.float32)
y =  tf.constant([4,4], dtype=tf.float32)
print (x + y)
print(sess.run([x+y])) 
outputs:
Tensor("add_4:0", shape=(2,), dtype=float32)
[array([ 7.,  7.], dtype=float32)]

As you can see, the expression x + y is just a tensor; it has no value until you call run.

Here is another example. This is the graph of a line f(x)=mx + b, where m is the slope and b the y-intercept.

m = tf.Variable([2], dtype=tf.float32)
b = tf.Variable([3], dtype=tf.float32)
x = tf.placeholder(tf.float32)
y = m * x + b

You can pass an array of n values for x, and the function is evaluated for all n values in a single run. Here we use [1, 2, 3, 4]:

init = tf.global_variables_initializer()
sess.run(init)
print(sess.run(y, {x: [1, 2, 3, 4]}))
Outputs:
[  5.   7.   9.  11.]

Logistic Regression with tf.estimator

For background on logistic regression, and interpretation of the results, you can read this document from Wikipedia. We also get our test data from that document. The goal is to predict the likelihood that a student will pass a test given how many hours they have studied.

Copy and paste the code below into the Python interpreter as we explain.

Having installed TensorFlow, now run python.

First we import pandas, as it is the easiest way to work with columnar data. The hours are floating-point numbers, like x.xx. We multiply them by 100 and convert them to integers, since the TensorFlow functions we use for logistic regression require either strings or integers.

import pandas
hours = [0.50,0.75,1.00,1.25,1.50,1.75,1.75,2.00,2.25,2.50,2.75,3.00,3.25,3.50,4.00,4.25,4.50,4.75,5.00,5.50]
passx = [0,0,0,0,0,0,1,0,1,0,1,0,1,0,1,1,1,1,1,1]
df = pandas.DataFrame(passx)
df['hours'] = hours
df.columns = ['pass', 'hours']
h = df['hours'].apply(lambda x: x * 100).astype(int)
df['hours']=h
print(df)
outputs:
   pass  hours
0     0     50
1     0     75
2     0    100
3     0    125
...

We create a function input_fn that we can pass into the LinearClassifier model below. It returns an input function built with the tf.estimator.inputs.pandas_input_fn method.

def input_fn(df):
    labels = df["pass"]
    return tf.estimator.inputs.pandas_input_fn(
    x=df,
    y=labels,
    batch_size=100,
    num_epochs=10,
    shuffle=False,
    num_threads=5)

TensorFlow writes its working data to disk, so we give it a place to do that. We also have to create a numeric feature column, since our independent variable is continuous and not categorical. Then we create the LinearClassifier model.

import tensorflow as tf
import tempfile
model_dir = tempfile.mkdtemp()
hours = tf.feature_column.numeric_column("hours")
base_columns = [hours]
m = tf.estimator.LinearClassifier(model_dir=model_dir, feature_columns=base_columns)

Now we run the train method.

m.train(input_fn(df),steps=None)
Outputs:
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into /tmp/tmpS8OD2H/model.ckpt.
INFO:tensorflow:loss = 69.3147, step = 1
INFO:tensorflow:Saving checkpoints for 10 into /tmp/tmpS8OD2H/model.ckpt.
INFO:tensorflow:Loss for final step: 54.1885.
<tensorflow.python.estimator.canned.linear.LinearClassifier object at 0x7f103b560390>

We use the same data for the test set as for the training set. In real life you would split them in two, but we have very little data here.

results = m.evaluate(input_fn(df),steps=None)
Outputs:
INFO:tensorflow:Starting evaluation at 2017-11-02-14:20:16
INFO:tensorflow:Restoring parameters from /tmp/tmpS8OD2H/model.ckpt-10
INFO:tensorflow:Finished evaluation at 2017-11-02-14:20:16
INFO:tensorflow:Saving dict for global step 10: accuracy = 0.75, accuracy_baseline = 0.5, auc = 0.895, auc_precision_recall = 0.907308, average_loss = 0.535767, global_step = 10, label/mean = 0.5, loss = 53.5767, prediction/mean = 0.585759

Here we print out the same results as above, but in an easier-to-read format.

print("model directory = %s" % model_dir)
for key in sorted(results):
    print("%s: %s" % (key, results[key]))
Outputs:
accuracy: 0.75
accuracy_baseline: 0.5
auc: 0.895
auc_precision_recall: 0.907308
average_loss: 0.535767
global_step: 10
label/mean: 0.5
loss: 53.5767
prediction/mean: 0.585759

The accuracy could be improved. You could create a larger data set and split the input into training and test sets, as sketched below. You could also adjust num_epochs and other values.
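For example, here is a minimal sketch of such a split (the 80/20 ratio and random seed are arbitrary choices for illustration, not from the original post):

# Hold out 20% of the rows for evaluation.
train_df = df.sample(frac=0.8, random_state=1)
test_df = df.drop(train_df.index)

m.train(input_fn(train_df), steps=None)
results = m.evaluate(input_fn(test_df), steps=None)
print(results["accuracy"])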

]]>