Imagine creating a live chart that updates as data flows in. With this you could watch currency fluctuations, streaming IoT data, application performance, cybersecurity events, or other data in real time.
Below we show how to pass variables from Spark to AngularJS in Zeppelin.
HighCharts is free for non-commercial use. You could also look at googleVis, which is the R programming language interface to Google Charts. Regarding Google Charts, it is designed for Google’s own cloud and Google Bigtable rather than open source Apache Spark.
One thing that makes graphing difficult in any language is understanding the different kinds of charts. There are dozens. You can learn something about that by studying these design principles on the HighCharts website.
So let’s get going. Here we are going to give the simplest possible example of passing data to a web page (i.e., AngularJS). Then we give the simplest possible example of Spark Streaming.
First, you need to install Zeppelin. The easiest way to do that is to use Docker:
docker pull dylanmei/zeppelin
docker run --rm -p 8080:8080 dylanmei/zeppelin
Zeppelin lets you pass variables to Angular using z-commands, like z.put, z.run, and z.angularBind. Here we make the simplest possible example. (The problem with most examples on the internet is they are too complicated. No one starts with a very simple example that you can easily understand. So we do.)
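Below we make an Array of 1 element. Then we create an RDD, collect it back to the driver, and loop over it 1 time to pass the value 1 to the HTML table element “1” shown below. (A bare map would do nothing here, because Spark transformations are lazy and only run when an action such as collect is called.)
val data = Array(1)
val distData = sc.parallelize(data)
// collect() is an action, so the binding actually runs, and it runs on the driver where z lives
distData.collect().foreach(l => z.angularBind(l.toString, 1))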
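The binding and an Angular template that displays it can also be sketched in a single Spark paragraph. This is only a minimal sketch, not the markup from the original example. It assumes it runs inside Zeppelin, where z is predefined, and it uses a hypothetical variable name, count, because a name made only of digits cannot be referenced from an Angular expression.

```scala
// Runs only inside a Zeppelin Spark paragraph, where `z` (the ZeppelinContext) is predefined.
// "count" is a hypothetical variable name chosen for this sketch.
z.angularBind("count", 1)

// Output that starts with %angular asks Zeppelin to render the rest as an Angular template.
println("""%angular
<table>
  <tr><td>count</td><td>{{count}}</td></tr>
</table>""")
```

Binding a new value to the same name later should update the table cell without re-running the template.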
So when you run that and the Spark Scala code it will output:
Below we give an example of Spark Streaming code. In part II of this blog post we will show how to make a graph using this data and HighCharts. But for now just get a feel for how to write streaming code. We also wrote another, more complex example of streaming Twitter tweets here.
Know that when you run streaming code in Zeppelin it will freeze up your browser, since it is streaming. So you have to kill the Docker process like this:
docker stop $(docker ps -aq)
If you kill it any other way you will have errors because you are trying to run two SparkContexts at the same time.
What this example does is connect to port 80 on your laptop, i.e., the port used for web page traffic, and read whatever text arrives there. If you were to dump that as text you could see data packets. Here we just create a DStream and use the print() method to echo some of that. It does not show the whole packet.
To prove that you have data coming into port 80 on your laptop, you could install nmap and then run:
while true; do nmap --packet-trace -A localhost -p 80; sleep 1; done
Here is the code.
val ssc = new StreamingContext(sc, Seconds(5))
val lines = ssc.socketTextStream("localhost", 80)
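The two lines above only define the stream. Nothing runs until the context is started. A fuller sketch might look like the following. It assumes the sc that Zeppelin provides, and the 30-second bounded wait is our own choice for illustration, not part of the original example.

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

// `sc` is the SparkContext that Zeppelin (or spark-shell) creates for you.
val ssc = new StreamingContext(sc, Seconds(5))

// Read newline-delimited text from the TCP endpoint at localhost:80.
val lines = ssc.socketTextStream("localhost", 80)

// Print the first few elements of every 5-second batch.
lines.print()

// Start the computation and wait at most 30 seconds (an arbitrary value for this sketch).
ssc.start()
ssc.awaitTerminationOrTimeout(30000L)

// Stop streaming but keep the shared SparkContext alive for other paragraphs.
ssc.stop(stopSparkContext = false, stopGracefully = true)
```

Waiting with a timeout instead of awaitTermination() hands control back to the notebook after a while, which may reduce the need to kill the container, though that depends on your setup.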
It should output something like this in a continuous loop:
Time: 1499497850000 ms
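If you only see the batch timestamp and no lines, it may help to know that socketTextStream reads newline-delimited text from a TCP server. For testing you can feed it from a tiny server of your own. Here is a minimal sketch, assuming a hypothetical helper named serveLines and an unprivileged port such as 9999 rather than the port 80 used in this post.

```scala
import java.io.PrintWriter
import java.net.ServerSocket

// Hypothetical helper: accept one client on `port` and send it each string
// as a newline-terminated line, then close the connection.
def serveLines(port: Int, lines: Seq[String]): Unit = {
  val server = new ServerSocket(port)
  try {
    val client = server.accept() // blocks until a client (e.g. Spark) connects
    val out = new PrintWriter(client.getOutputStream, true) // autoflush on println
    try lines.foreach(l => out.println(l))
    finally { out.close(); client.close() }
  } finally server.close()
}
```

With serveLines(9999, Seq("hello", "world")) running in a separate Scala session, pointing socketTextStream at "localhost" and port 9999 should make those lines show up under the batch timestamp.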