Machine Learning & Big Data Blog – BMC Blogs
BMC Software
Wed, 20 Jun 2018 15:35:19 +0000

Top 5 Machine Learning Algorithms for Beginners
Fri, 08 Jun 2018 00:00:18 +0000
Machine learning is a major component in the race towards artificial intelligence. Whether you’re seeking true artificial intelligence or simply trying to gain insight from all the data you’ve been collecting, machine learning is a major step forward. But where to get started? If you’re a beginner, machine learning can feel overwhelming – how to […]

Introduction to Spark’s Machine Learning Pipeline
Wed, 06 Jun 2018 00:00:36 +0000
Here we explain what is a Spark machine learning pipeline. We will do this by converting existing code that we wrote, which is done in stages, to pipeline format. This will run all the data transformation and model fit operations under the pipeline mechanism. The existing Apache Spark ML code is explained in two blog […]

NLU vs NLP: What’s the Difference?
Mon, 28 May 2018 00:00:04 +0000
In the 21st century, computers can analyze all sorts of data, providing insights and performing tasks based on the learned outcome. When that data is language, however, it is a whole different world. Asking a computer to process real-world language is more complicated and difficult to mine in an efficient manner that offers productive results. […]

How to use Apache Spark to make predictions for preventive maintenance
Fri, 25 May 2018 00:00:39 +0000
In part one we explained how to create a training model. In this part we show how to make predictions to show which machines in our dataset should be taken out of service for maintenance. First, here is how to submit the job to Spark with spark-submit: jar file that contains what file to […]

Predictive and Preventive Maintenance using IoT, Machine Learning & Apache Spark
Wed, 23 May 2018 00:00:04 +0000
Here we explain a use case of how to use Apache Spark and machine learning. This is the classic preventive maintenance problem, one of the most common business use cases of machine learning and IoT too. We take the data for this analysis from the Kaggle website, a site dedicated to data science. This is […]

Using Tensorflow Neural Network for Machine Learning Predictions with TripAdvisor Data
Mon, 21 May 2018 00:00:52 +0000
Here is the last part of our analysis of the Tripadvisor data. Part one is here. In order to understand this, you will need to know Python and Numpy Arrays and the basics behind tensorflow and neural networks. If you do not, you can read an introduction to tensorflow here. The code from this example […]

What is Refactoring? Code Refactoring Explained
Fri, 18 May 2018 00:00:14 +0000
Code refactoring means to take a working program and change it to make some improvements. It changes the code but not the outcome. These improvements can: make it easier for other programmers to read; make aesthetic improvements such as implement a clever idea; make the program run faster or use less resources; adhere to […]

Introduction to Google Cloud Machine Learning Engine
Thu, 10 May 2018 00:00:36 +0000
The Google Cloud Machine Learning Engine is almost exactly the same as Amazon Sagemaker. It is not a SaaS program that you can just upload data to and start using like the Google Natural Language API. Instead, you have to program Google Cloud ML using any of the ML frameworks such as TensorFlow, scikit-learn, XGBoost, […]

What is AI-as-a-Service? AIaaS Explained
Wed, 25 Apr 2018 00:00:48 +0000
We’ve all heard of IaaS and SaaS before. These terms have become ubiquitous for infrastructure-as-a-service and software-as-a-service. Another variant is PaaS, short for platform-as-a-Service. Today, most companies are using at least one type of “as a service” offering as a way to focus on their core business and spend less money on an important service. […]

Real Time vs Batch Processing vs Stream Processing: What’s The Difference?
Tue, 17 Apr 2018 00:00:24 +0000
With the constant rate of current innovations, developers can expect to analyze terabytes and even petabytes of data in any given period of time. While this allows advantages far beyond what we can see, it can be difficult to know the best way to accelerate and speed up these technologies, especially when reactions must occur […]