Using Zeppelin with Big Data

Zeppelin is an interactive notebook. It lets you write code in a web page, execute it, and display the results as a table or graph. It also supports Markdown and JavaScript (Angular), so you can write code, hide it from your users, and build and share polished reports. You can also create real-time reports and …

Spark Decision Tree Classifier

Here we explain how to use the Decision Tree Classifier with Apache Spark ML (machine learning). We use data from the University of Pennsylvania. We write the solution in Scala and walk the reader through each line of the code. You can skip the mathematics part of the lecture notes from Penn unless you know a lot of …

Using Logistic Regression, Scala, and Spark

Here we explain how to do logistic regression with Apache Spark. Logistic regression (LR) is closely related to linear regression. But instead of predicting a dependent value from some independent input values, it predicts a probability and a binary (yes or no) outcome. You use linear or logistic regression when you believe there is some …
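
To make the "probability and binary outcome" idea concrete, here is a minimal sketch in plain Python. It is not the article's Scala and Spark code; the weights, bias, threshold, and function names are hypothetical values chosen only for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features, threshold=0.5):
    """Return (probability, label) for one row of input features.

    The linear part is the same as in linear regression; the sigmoid
    turns it into a probability, and the threshold turns that
    probability into a yes (1) or no (0) answer.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    probability = sigmoid(z)
    return probability, int(probability >= threshold)


# Hypothetical, hand-picked parameters (not fitted to any real data).
weights = [0.8, -0.4]
bias = -0.2

probability, label = predict(weights, bias, [2.0, 1.0])
print(probability, label)
```

A real model would learn the weights and bias from training data instead of using hand-picked values.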

SGD Linear Regression Example with Apache Spark

This article explains how to do linear regression with Apache Spark. It assumes you have some basic knowledge of linear regression. If you do not, then you need to learn about it, as it is one of the simplest ideas in statistics. Also, most machine learning models are an extension of this basic idea. It is so simple to understand and use that you …
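
As a rough illustration of the SGD (stochastic gradient descent) in the title, the sketch below fits a straight line in plain Python. It is not the article's Spark code; the function name, learning rate, epoch count, and the noise-free toy data are all assumptions made for this example.

```python
import random


def sgd_linear_regression(points, learning_rate=0.01, epochs=200, seed=0):
    """Fit y = slope * x + intercept by stochastic gradient descent.

    Each update uses a single (x, y) point and follows the gradient of
    the squared error 0.5 * (prediction - y) ** 2 for that point.
    """
    rng = random.Random(seed)
    slope, intercept = 0.0, 0.0
    data = list(points)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the points in a new random order each epoch
        for x, y in data:
            error = (slope * x + intercept) - y
            slope -= learning_rate * error * x
            intercept -= learning_rate * error
    return slope, intercept


# Toy data generated from y = 2x + 1 with no noise (an assumption for the demo).
points = [(float(x), 2.0 * x + 1.0) for x in range(-5, 6)]
slope, intercept = sgd_linear_regression(points)
print(slope, intercept)
```

On this toy data the fitted slope and intercept should land close to 2 and 1. With noisy data, or a learning rate that is too large, the estimates would be less exact or could diverge.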