
Here we show how to load CSV data into Elasticsearch using Logstash.

The file we use contains network traffic data. It has no header row, so we will supply the field names ourselves.

Download and Unzip the Data

Download the file eecs498.zip from Kaggle, then unzip it. The resulting file, conn250K.csv, has 256,670 records.

Next, change permissions on the file, since it is extracted with no permissions set:

chmod 777 conn250K.csv

Now, create this Logstash config file, csv.conf, changing the path and server name to match your environment.

input {
  file {
    path => "/home/ubuntu/Documents/esearch/conn250K.csv"
    start_position => "beginning"
  }
}

filter {
  csv {
    columns => [ "record_id", "duration", "src_bytes", "dest_bytes" ]
  }
}

output {
  elasticsearch {
    hosts => ["parisx:9200"]
    index => "network"
  }
}
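One optional refinement, not part of the original config: the csv filter indexes every column as a string. If you want to run numeric aggregations on the byte counts in Elasticsearch, Logstash's mutate filter can convert them. A sketch of the filter section, assuming the same column names:

```
filter {
  csv {
    columns => [ "record_id", "duration", "src_bytes", "dest_bytes" ]
  }
  # Convert the numeric columns so Elasticsearch maps them as numbers
  mutate {
    convert => {
      "duration"   => "integer"
      "src_bytes"  => "integer"
      "dest_bytes" => "integer"
    }
  }
}
```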

Then start Logstash, giving it that config file name:

sudo bin/logstash -f config/csv.conf

While the load is running, you can list some documents:

curl -XGET 'http://parisx:9200/network/_search?pretty'

which returns hits like this one:

"_index" : "network",
"_type" : "_doc",
"_id" : "dmx9emwB7Q7sfK_2g0Zo",
"_score" : 1.0,
"_source" : {
  "record_id" : "72552",
  "duration" : "0",
  "src_bytes" : "297",
  "host" : "paris",
  "message" : "72552,0,297,9317",
  "@version" : "1",
  "@timestamp" : "2019-08-10T07:45:41.642Z",
  "dest_bytes" : "9317",
  "path" : "/home/ubuntu/Documents/esearch/conn250K.csv"
}
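Notice that every value in _source, including the byte counts, is a quoted string: the csv filter does not infer types. A quick sanity check on the sample document above (a Python sketch using just the CSV columns):

```python
import json

# The CSV columns from the sample _source above
source = json.loads("""
{
  "record_id": "72552",
  "duration": "0",
  "src_bytes": "297",
  "dest_bytes": "9317"
}
""")

# All four values are strings, even the numeric-looking ones
print(all(isinstance(v, str) for v in source.values()))  # prints: True
```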

You can run this query to check whether the data load is complete; it is done when the document count reaches 256,670.

curl -XGET 'http://parisx:9200/_cat/indices?v'

Create Index Pattern in Kibana

Open Kibana.

Create the index pattern. Don’t use @timestamp as the key field, since it only records when the data was loaded into Logstash. Unfortunately, the data provided by Kaggle does not include any date, which is strange for network data; we can, however, use record_id in later time series analysis.

Now go to the Discover tab and list some documents.

In the next blog post we will show how to use Elasticsearch Machine Learning to do Anomaly Detection on this network traffic.


Last updated: 08/22/2019


About the author

Walker Rowe

Walker Rowe is an American freelance tech writer and programmer living in Cyprus. He writes tutorials on analytics and big data and specializes in documenting SDKs and APIs. He is the founder of the Hypatia Academy Cyprus, an online school to teach secondary school children programming.