Using Apache Hive with ElasticSearch

Here we explain how to use Apache Hive with ElasticSearch. We will copy an Apache web server log into ElasticSearch and then use Hive SQL to query it. Why do this? Hive lets you write user defined functions and use SQL (actually HQL), which is easier to work with and provides more functions than ElasticSearch, whose query language is Lucene Query. For …
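The excerpt mentions copying an Apache web server log into ElasticSearch. As a rough illustration of that first step, here is a minimal Python sketch, assuming the log uses Apache's standard "combined" format, that parses one line into a JSON document. The field names (`host`, `status`, `agent`, and so on) and the `parse_line` helper are hypothetical choices for this sketch, not taken from the article, which may load the log with a different tool.

```python
import json
import re

# Assumed input: Apache "combined" log format
#   host ident authuser [date] "request" status bytes "referer" "user-agent"
# The group names below are hypothetical field names chosen for this sketch.
COMBINED_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Turn one combined-format log line into a dict, or None if it does not match."""
    m = COMBINED_LOG.match(line)
    if m is None:
        return None
    doc = m.groupdict()
    doc["status"] = int(doc["status"])
    # Apache writes "-" when no response body bytes were sent
    doc["bytes"] = 0 if doc["bytes"] == "-" else int(doc["bytes"])
    return doc


sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" "Mozilla/4.08"')
print(json.dumps(parse_line(sample)))
```

Lines that do not match (for example a malformed request logged as `"-"`) come back as `None`, so a loader can skip or count them instead of failing.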

Using ElasticSearch with Apache Spark

ElasticSearch is a JSON database popular with log processing systems. For example, organizations often use ElasticSearch with Logstash or Filebeat to send web server logs, Windows events, Linux syslogs, and other data to it. Then they use the Kibana web interface to query log events. All of this is important for cybersecurity, operations, …
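Shippers such as Logstash and Filebeat deliver events to ElasticSearch through its bulk API, which takes newline-delimited JSON: for each event, an action line followed by the document itself. The minimal Python sketch below only builds such a request body. The index name `logs-demo`, the event fields, and the `build_bulk_body` helper are hypothetical, and actually sending the body (a POST to `/_bulk` with `Content-Type: application/x-ndjson`) is left out.

```python
import json


def build_bulk_body(index, docs):
    """Build a newline-delimited JSON body for ElasticSearch's _bulk endpoint.

    Each document is preceded by an "index" action line, and the whole
    body must end with a newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


# Hypothetical events standing in for what a shipper might collect
events = [
    {"source": "syslog", "host": "web01", "message": "sshd: Accepted publickey"},
    {"source": "winevent", "host": "dc01", "event_id": 4625},
]
body = build_bulk_body("logs-demo", events)  # "logs-demo" is a made-up index name
print(body, end="")
```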

Using Spark with Hive

Here we explain how to use Apache Spark with Hive. That means Hive runs its jobs on Spark instead of on Hadoop MapReduce. People choose Spark over MapReduce because Spark processes data in memory, so Hive jobs can run much faster there. It also moves programmers toward a common platform if your company runs predominantly Spark. It …

How to write a Hive User Defined Function (UDF) in Java

Here we show how to write a user defined function (UDF) in Java and call it from Hive. You can then use the UDF in Hive SQL statements. It runs over whatever element you send it and returns a result, so you could write a function to format strings or do something far more complex. In this example, we use this …

What is Apache HCatalog? HCatalog Explained

Here we explain what HCatalog is and why it is useful to Hadoop programmers. Basically, HCatalog provides a consistent interface between Apache Hive, Apache Pig, and MapReduce. Since it ships with Hive, you could consider it an extension of Hive. (We have written tutorials here on Apache Pig, MapReduce, and Hive.) Why this Matters To …

Apache Hive Beeline Client, Import CSV File into Hive

Beeline has replaced the Hive CLI, which belonged to what was formerly called HiveServer1. Hive now runs HiveServer2, and the new, improved CLI is Beeline. Apache Hive says, “HiveServer2 (introduced in Hive 0.11) has its own CLI called Beeline. HiveCLI is now deprecated in favor of Beeline, as it lacks the multi-user, security, and other capabilities …

Graphing Spark Data with HighCharts

Here we look at how to use HighCharts with Spark. HighCharts is a charting framework written in JavaScript. It works with both static and streaming data, so you can make live charts with it. Its collection of charts is a beautiful set of designs, made larger by the annual competition the company holds. HighCharts is free for non-commercial use. It …
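To chart Spark results in HighCharts, the aggregated rows eventually have to become a HighCharts series, which on a datetime x-axis is a list of `[x, y]` pairs with x in milliseconds since the Unix epoch. The Python sketch below assumes the rows have already been collected from Spark as `(datetime, value)` tuples; the `to_highcharts_series` helper, the series name, and the sample values are hypothetical, and serving the resulting JSON to the browser is not shown.

```python
import json
from datetime import datetime, timezone


def to_highcharts_series(name, rows):
    """Convert (datetime, value) rows into a HighCharts series object.

    On a datetime x-axis HighCharts expects x values in milliseconds
    since the Unix epoch, so each timestamp is converted accordingly.
    Rows are sorted by time so the line is drawn left to right.
    """
    data = [[int(ts.timestamp() * 1000), value] for ts, value in sorted(rows)]
    return {"name": name, "data": data}


# Hypothetical rows, standing in for an aggregation collected from Spark
rows = [
    (datetime(2018, 1, 1, 12, 0, 10, tzinfo=timezone.utc), 7),
    (datetime(2018, 1, 1, 12, 0, 0, tzinfo=timezone.utc), 4),
]
options = {
    "xAxis": {"type": "datetime"},
    "series": [to_highcharts_series("requests", rows)],
}
print(json.dumps(options))
```

Using timezone-aware datetimes keeps the epoch conversion independent of the machine's local time zone.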

Basics of Graphing Streaming Big Data

Imagine creating a live chart that updates as data flows in. With it you could watch currency value fluctuations, streaming IoT data, application performance, cybersecurity events, or other data in real time. Creating Spark Streaming data is not hard; we give an example below. But creating any graphs more elaborate than simple SQL …
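A live chart usually plots one value per time interval. This plain-Python sketch (not the Spark API) shows the idea of tumbling windows: each event timestamp is assigned to a fixed, non-overlapping window and the events in each window are counted. The event times, the 5-second window, and the `tumbling_counts` helper are hypothetical; in Spark Structured Streaming the comparable operation is a `groupBy` on the `window` function from `pyspark.sql.functions`.

```python
from collections import Counter


def tumbling_counts(event_times, window_seconds):
    """Count events per fixed, non-overlapping window.

    Each event time (in seconds) is mapped to the start of its window,
    like a streaming job that emits one count per interval for a chart.
    Returns (window_start, count) pairs sorted by window start.
    """
    counts = Counter(int(t // window_seconds) * window_seconds for t in event_times)
    return sorted(counts.items())


# Hypothetical event arrival times, in seconds
events = [0.5, 1.2, 4.9, 5.0, 7.3, 12.1]
print(tumbling_counts(events, 5))  # [(0, 3), (5, 2), (10, 1)]
```

Windows with no events are simply absent from the result, so a chart that needs an evenly spaced x-axis would have to fill those gaps with zeros.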