Zeppelin supports many different language interpreters and features. For example:
| Hadoop file operations | Psql |
| HBase | Linux shell (sh) |
You can also add third-party interpreters.
Below we give a Spark example. But first here is a little Markdown:
Click the arrow to run the paragraph, and Zeppelin renders the text as Markdown, i.e., the same syntax used to write README.md files on GitHub.
Zeppelin is a notebook, like Jupyter, and you can get it already bundled with Spark and other tools, or install it with apt-get or yum. We downloaded it from here, unzipped it, and copied conf/zeppelin-site.xml.template to conf/zeppelin-site.xml.
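Then you might have to change the IP address it listens on from 0.0.0.0 to something else by editing the zeppelin.server.addr property in conf/zeppelin-site.xml.
Then start it by running bin/zeppelin-daemon.sh start from the Zeppelin directory.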
Important note about security
If you set up Zeppelin so that you can reach it over the public internet, it is extremely important that you put a password on it. Otherwise anyone who finds it can create a bash shell interpreter and do serious damage to your server. Follow the simple instructions from Zeppelin here to do that.
Here we show a simple example of how to read a text file, convert it to a Spark dataframe, then query it using SQL to create a table display and graph.
- First download the data used in this example from here. Then copy it to the Hadoop file system or local file system.
- Then open Zeppelin. It is usually listening on http://localhost:8080.
- Then click “Create new note.”
- Now select the Spark interpreter and give the notebook a name. Zeppelin assumes that the Spark master is on the same server as Zeppelin. Note that you can change these properties on the interpreter tab.
- Then paste the code below into the cell shown below. Since we chose the Spark interpreter, it is not necessary to make the first line %spark. Note: one annoying feature is that you cannot right-click and select Paste. Instead you have to use Ctrl+V.
Below in pink we highlight some of the buttons. Reading left to right and top to bottom we have:
- Run code (called Paragraph)
- Clear output, meaning clear the results displayed
- Run one paragraph
- Access submenu
- Add a new paragraph
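import org.apache.spark.sql.types._ // StructField, StructType, DoubleType
// Read the file as an RDD of lines; sc is the SparkContext that Zeppelin provides
val data = sc.textFile("file:/home/walker/Downloads/lpsa.data")
// A schema of five nullable Double columns (for reference; toDF() below takes its columns from the case class)
val schemaStr = "a b c d e"
val fields = schemaStr.split(" ").map(fieldName => StructField(fieldName, DoubleType, nullable = true))
val schema = StructType(fields)
// Turn the comma in each line into a space, then split the line on spaces
val parsedData = data.map(l => l.replace(",", " ").split(" "))
case class DataClass(a: Double, b: Double, c: Double, d: Double, e: Double)
// Keep the first five values of each line as Doubles
val x = parsedData.map(b => DataClass(b(0).toDouble, b(1).toDouble, b(2).toDouble, b(3).toDouble, b(4).toDouble))
val df = x.toDF()
// Register the DataFrame so that a SQL paragraph can query it as "df"
df.createOrReplaceTempView("df") // Spark 2.x and later; on Spark 1.x use df.registerTempTable("df")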
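If you want to see what the parsing step does to a single line without starting Spark, here is a minimal sketch in plain Scala. It assumes each line looks like "label,f1 f2 f3 ...", which is what the replace and split calls above imply. The names LpsaParse, Row5 and parseLine are made up for this illustration, and the sample line is invented, not taken from the data file. Unlike the paragraph above, this version splits on any run of whitespace and returns None for lines it cannot parse, so it is a more defensive variant rather than the exact same code.

```scala
// Plain Scala, no Spark needed: the per-line logic from the Spark paragraph.
object LpsaParse {
  // Hypothetical stand-in for DataClass: the first five numbers on a line.
  case class Row5(a: Double, b: Double, c: Double, d: Double, e: Double)

  // Replace the comma with a space, split on whitespace, and convert the
  // first five tokens to Double. Short or non-numeric lines give None.
  def parseLine(line: String): Option[Row5] = {
    val tokens = line.replace(",", " ").trim.split("\\s+")
    if (tokens.length < 5) None
    else scala.util.Try(
      Row5(tokens(0).toDouble, tokens(1).toDouble, tokens(2).toDouble,
           tokens(3).toDouble, tokens(4).toDouble)
    ).toOption
  }

  def main(args: Array[String]): Unit = {
    // Invented sample line in the assumed "label,f1 f2 ..." layout
    println(parseLine("-0.5,1.0 2.0 3.0 4.0 5.0")) // Some(Row5(-0.5,1.0,2.0,3.0,4.0))
    println(parseLine("not,enough"))               // None
  }
}
```

Returning an Option means a bad line is skipped instead of failing the whole job. In Spark you could get the same effect with flatMap in place of map.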
- Now in the next paragraph you can use regular SQL against the table you just registered as a temporary table. Start the paragraph with %sql so that Zeppelin hands it to the Spark SQL interpreter. Enter the code below and run it. Then look at the data below. Now there are buttons where you can create graphs.
%sql
select * from df
Click the pie chart icon and now you have a pie chart. (Here we selected only 2 columns. Change that under the settings option.)
My experience is that Zeppelin is kind of buggy. It seems to lose notebooks at times, it does not always update variables, and the graph feature crashes. So you might have to delete paragraphs and put them back etc. to fix those. Also export your notebooks to JSON format to have a backup.
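If you would rather script the backup than click the export button, one option is Zeppelin's REST API, which has an export call at /api/notebook/export/[noteId]. Below is a rough sketch, assuming Zeppelin is reachable without a login and that you know the note ID (it appears in the notebook's URL). NoteBackup, exportUrl and backup are names invented for this example, and the response is saved exactly as the server returns it.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import scala.io.Source

object NoteBackup {
  // Build the export URL for a note; base is e.g. "http://localhost:8080"
  def exportUrl(base: String, noteId: String): String =
    s"${base.stripSuffix("/")}/api/notebook/export/$noteId"

  // Fetch the export response and write it to outFile unchanged.
  // Assumes no authentication; with a password set you would need to log in first.
  def backup(base: String, noteId: String, outFile: String): Unit = {
    val src = Source.fromURL(exportUrl(base, noteId), "UTF-8")
    try Files.write(Paths.get(outFile), src.mkString.getBytes(StandardCharsets.UTF_8))
    finally src.close()
  }
}
```

For example, NoteBackup.backup("http://localhost:8080", "2ABCDEFGH", "note-backup.json") would save one note, where 2ABCDEFGH is a placeholder ID. Run it on a schedule and you have a copy to fall back on when a notebook goes missing.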