Walker Rowe – BMC Blogs
https://www.bmc.com/blogs
BMC Software
Tue, 17 Jul 2018 20:43:54 +0000

Bias and Variance in Machine Learning
https://www.bmc.com/blogs/bias-variance-machine-learning/
Tue, 10 Jul 2018 00:00:56 +0000
https://www.bmc.com/blogs/?p=12530
The risk in following ML models is that they could be based on false assumptions and skewed by noise and outliers. That could lead to bad predictions. That is why ML cannot be a black box. The user must understand the data and algorithms if the models are to be trusted. So here we look […]

Mean Squared Error, R2, and Variance in Regression Analysis
https://www.bmc.com/blogs/mean-squared-error-r2-and-variance-in-regression-analysis/
Thu, 05 Jul 2018 00:00:59 +0000
http://blogs.bmc.com/?p=12474
Here we introduce some terms important to machine learning: variance, r2 score, and mean squared error. We illustrate these concepts using scikit-learn. It is important to understand these metrics to determine whether regression models are accurate or misleading. Following a flawed model is a bad idea. So it is important that you can quantify […]

Getting Started with scikit-learn
https://www.bmc.com/blogs/scikit-learn/
Wed, 27 Jun 2018 00:00:31 +0000
http://blogs.bmc.com/?p=12419
Here we explore another machine learning framework, scikit-learn, and show how to use matplotlib to draw graphs. The scikit-learn Python ML API predates Apache Spark and TensorFlow, which is to say it has been around longer than big data. It has long been used by those who see themselves as pure data scientists, […]

Introduction to Spark’s Machine Learning Pipeline
https://www.bmc.com/blogs/introduction-to-sparks-machine-learning-pipeline/
Wed, 06 Jun 2018 00:00:36 +0000
http://blogs.bmc.com/?p=12336
Here we explain what a Spark machine learning pipeline is. We will do this by converting existing code that we wrote, which is done in stages, to pipeline format. This will run all the data transformation and model fit operations under the pipeline mechanism. The existing Apache Spark ML code is explained in two blog […]

How to use Apache Spark to make predictions for preventive maintenance
https://www.bmc.com/blogs/how-to-use-apache-spark-to-make-predictions-for-preventive-maintenance/
Fri, 25 May 2018 00:00:39 +0000
http://blogs.bmc.com/?p=12286
In part one we explained how to create a training model. In this part we show how to make predictions to show which machines in our dataset should be taken out of service for maintenance. First, here is how to submit the job to Spark with spark-submit: jar file that contains com.bmc.lr.makePrediction what file to […]

Predictive and Preventive Maintenance using IoT, Machine Learning & Apache Spark
https://www.bmc.com/blogs/predictive-and-preventive-maintenance-using-iot-machine-learning-apache-spark/
Wed, 23 May 2018 00:00:04 +0000
http://blogs.bmc.com/?p=12280
Here we explain a use case for Apache Spark and machine learning. This is the classic preventive maintenance problem, one of the most common business use cases of machine learning and of IoT too. We take the data for this analysis from the Kaggle website, a site dedicated to data science. This is […]

Using Tensorflow Neural Network for Machine Learning Predictions with TripAdvisor Data
https://www.bmc.com/blogs/using-tensorflow-neural-network-for-machine-learning-predictions-with-tripadvisor-data/
Mon, 21 May 2018 00:00:52 +0000
http://blogs.bmc.com/?p=12259
Here is the last part of our analysis of the TripAdvisor data. Part one is here. In order to understand this, you will need to know Python and NumPy arrays and the basics behind TensorFlow and neural networks. If you do not, you can read an introduction to TensorFlow here. The code from this example […]

What is Refactoring? Code Refactoring Explained
https://www.bmc.com/blogs/code-refactoring-2/
Fri, 18 May 2018 00:00:14 +0000
http://blogs.bmc.com/?p=12255
Code refactoring means taking a working program and changing it to make some improvements. It changes the code but not the outcome. These improvements can: make it easier for other programmers to read; make aesthetic improvements, such as implementing a clever idea; make the program run faster or use fewer resources; adhere to […]

Introduction to Google Cloud Machine Learning Engine
https://www.bmc.com/blogs/google-cloud-machine-learning-engine/
Thu, 10 May 2018 00:00:36 +0000
http://blogs.bmc.com/?p=12220
The Google Cloud Machine Learning Engine is almost exactly the same as Amazon SageMaker. It is not a SaaS program that you can just upload data to and start using like the Google Natural Language API. Instead, you have to program Google Cloud ML using any of the ML frameworks such as TensorFlow, scikit-learn, XGBoost, […]

AWS Linear Learner: Using Amazon SageMaker for Logistic Regression
https://www.bmc.com/blogs/aws-linear-learner/
Mon, 16 Apr 2018 00:00:58 +0000
http://blogs.bmc.com/?p=12139
In the last blog post we showed you how to use Amazon SageMaker. Read that one before you read this one, because there we show screen prints and explain how to use the graphical interface of the product, including its hosted Jupyter Notebooks feature. We also introduced the SageMaker API, which is a front […]