Machine Learning & Big Data Blog

Using ElasticSearch with Apache Spark

2 minute read

ElasticSearch is a JSON database popular with log processing systems. For example, organizations...

Using Spark with Hive

3 minute read

Here we explain how to use Apache Spark with Hive. That means instead of Hive storing data in...

How to write a Hive User Defined Function (UDF) in Java

3 minute read

Here we show how to write a user-defined function (UDF) in Java and call it from Hive. You can...

What is Apache HCatalog? HCatalog Explained

4 minute read

Here we explain what HCatalog is and why it is useful to Hadoop programmers. Basically, HCatalog...

Apache Hive Beeline Client, Import CSV File into Hive

4 minute read

Beeline has replaced the Hive CLI in what was formerly called HiveServer1. Now Hive is called...

K-means Clustering with Apache Spark

3 minute read

Here we show a simple example of how to use k-means clustering. We will look at crime statistics...

Apache Spark: Working with Streams

4 minute read

In our last two posts, we explained how to read data streaming from Twitter into Apache...

Using Zeppelin with Big Data

4 minute read

Zeppelin is an interactive notebook. It lets you write code into a web page, execute it, and...

Spark Decision Tree Classifier

4 minute read

Here we explain how to use the Decision Tree Classifier with Apache Spark ML (machine learning). We...

Using Logistic Regression, Scala, and Spark

5 minute read

Here we explain how to do logistic regression with Apache Spark. Logistic regression (LR) is...

Author - Walker Rowe