Machine Learning & Big Data Blog – BMC Blogs https://www.bmc.com/blogs BMC Software Fri, 07 Dec 2018 13:50:33 +0000 en-US

How to Setup a Cassandra Cluster
https://www.bmc.com/blogs/setup-cassandra-cluster/ Thu, 06 Dec 2018 00:00:05 +0000 https://www.bmc.com/blogs/?p=13165
Here we show how to set up a Cassandra cluster. We will use two machines, 172.31.47.43 and 172.31.46.15. First, open these firewall ports on both: 7000, 7001, 7199, 9042, 9160, 9142. Then follow this document to install Cassandra and get familiar with its basic concepts. Make sure to install Cassandra on each node. Configure Cluster […]

ElasticSearch Joins: Has_Child, Has_parent query
https://www.bmc.com/blogs/elasticsearch-joins-child-parent/ Thu, 06 Dec 2018 00:00:04 +0000 https://www.bmc.com/blogs/?p=13182
Once again we tackle the complex and sometimes contradictory documentation of ElasticSearch and try to make it easier to understand. Here we look at parent-child relationships between documents. The Concepts: Parent, Child, and Join. In a relational database, a parent-child relationship is called a join. A mathematician would call that the intersection of […]

ElasticSearch Nested Queries: How to Search for Embedded Documents
https://www.bmc.com/blogs/elasticsearch-nested-searches-embedded-documents/ Fri, 16 Nov 2018 00:00:55 +0000 https://www.bmc.com/blogs/?p=13053
ElasticSearch is annoyingly complicated at times. You can run a search that returns the wrong results, and nothing tells you so. This can happen when, for example, you have a nested JSON document, i.e., one JSON document inside another.
This is because a Lucene (i.e., ElasticSearch) query has no understanding of object […]

ElasticSearch Search Syntax and Boolean and Aggregation Searches
https://www.bmc.com/blogs/elasticsearch-search-syntax-boolean-aggregation/ Tue, 13 Nov 2018 00:00:59 +0000 https://www.bmc.com/blogs/?p=13033
Here we explain how to do searches in ElasticSearch (ES). ES has a seemingly endless list of search options, which can seem overwhelming. So we will start with some simple examples and build from there. We look at these items: Basic Search Syntax, Boolean Searches, Aggregation. Prerequisites: If you have not installed […]

How to Setup An ElasticSearch Cluster on Amazon EC2
https://www.bmc.com/blogs/how-to-setup-elasticsearch-cluster-amazon-ec2/ Fri, 09 Nov 2018 00:00:07 +0000 https://www.bmc.com/blogs/?p=13038
Here we explain how to set up an ElasticSearch (ES) cluster on Amazon EC2. The main difference between Amazon and non-Amazon setups is that Amazon considers unicast to be a security weakness, since it broadcasts the existence of servers across the network. So they have their own mechanism for node discovery, the ElasticSearch EC2 Discovery Plugin. With Amazon, […]

Will Robots Replace My Service Desk?
https://www.bmc.com/blogs/will-robots-replace-my-service-desk/ Thu, 01 Nov 2018 00:00:33 +0000 https://www.bmc.com/blogs/?p=13003
The explosive growth of artificial intelligence, predictive analytics and big data tools has had a revolutionary impact on the way businesses consider, evaluate and execute their strategy. In fact, PwC estimates AI will have a $15.7 trillion impact on the global economy by 2030, and this trend is only likely to accelerate as these […]

How to write Apache Spark data to ElasticSearch using Python
https://www.bmc.com/blogs/write-apache-spark-elasticsearch-python/ Fri, 26 Oct 2018 00:00:50 +0000 https://www.bmc.com/blogs/?p=12987
Here we explain how to write Apache Spark data to ElasticSearch (ES) using Python.
We will write Apache log data into ES. This topic is made complicated by all the bad, convoluted examples on the internet. But here we make it easy. One complicating factor is that Spark provides native support for writing to […]

Spark ElasticSearch Hadoop Update and Upsert Example and Explanation
https://www.bmc.com/blogs/spark-elasticsearch-hadoop/ Thu, 25 Oct 2018 00:00:05 +0000 https://www.bmc.com/blogs/?p=12983
Here we explain how to write Python code to update an ElasticSearch document from an Apache Spark Dataframe and RDD. There are few instructions on the internet. Those written by ElasticSearch are difficult to understand and offer no examples. So we make the simplest possible example here. This code adds additional fields to an […]

ElasticSearch Commands Cheat Sheet
https://www.bmc.com/blogs/elasticsearch-commands/ Mon, 15 Oct 2018 00:00:15 +0000 https://www.bmc.com/blogs/?p=12964
Here we show some of the most common ElasticSearch commands using curl. ElasticSearch is sometimes complicated. So here we make it simple.
delete index
Below the index is named samples.
curl -X DELETE 'http://localhost:9200/samples'
list all indexes
curl -X GET 'http://localhost:9200/_cat/indices?v'
list all docs in index
curl -X GET 'http://localhost:9200/sample/_search'
query using URL parameters
Here […]

Installing Jupyter for Big Data and Analytics
https://www.bmc.com/blogs/installing-jupyter-for-big-data-and-analytics/ Thu, 11 Oct 2018 10:00:26 +0000 https://www.bmc.com/blogs/?p=12925
Jupyter and Zeppelin both provide an interactive Python, Scala, Spark, etc. interpreter. Plus they do what the command line cannot, which is support graphical output with graphing packages like matplotlib. While I personally prefer Zeppelin, it seems more data scientists and big data engineers are using Jupyter (aka IPython). For example, most of the interactive […]