With all the attention that big data and machine learning have been getting from the media lately, it’s no wonder that so many people are asking, “What is machine learning?” In this third post of BMC’s big data series, you’ll see a powerful example of machine learning at play in the Beijing economy, followed by a concise answer to the question, “What is machine learning?” Lastly, you’ll get a brief overview of how machine learning methods can benefit your business.
One way to answer the question, “What is machine learning?” is to look at an example of it in action. To say that Beijing has an environmental health problem is to say that it has an economic problem. IBM’s Green Horizon, a recently developed machine learning model, pinpoints why this is true and what Beijing officials can do to solve the problem.
The alarming extent to which Beijing’s air pollution exceeds international standards has been making the news lately. No doubt you’ve heard about it, and it’s not just hype either. Within 48 hours of the writing of this article, the Air Quality Index Network showed Beijing’s PM10 concentration readings at a maximum of 786 µg/m³, a stark contrast to the World Health Organization’s target concentration of 20 µg/m³ for PM10 (the lowest level at which people begin to show an increase in cardiopulmonary and lung cancer mortality). Researchers have discovered an interdependent relationship between Beijing’s air quality and the city’s weather patterns, factory operations, and automotive activities. With almost 9 million residents and a vast network of sensors that stream data continuously, quantifying these incredibly complex interdependent relationships is beyond human cognitive capacity. The risk that these pollution levels pose to human health meant that Beijing had to find a solution, and fast. But how?
Although the scale, magnitude, and complexity of Beijing’s air pollution problem are beyond what the human mind can solve, they are typical of the problems that machine learning methods are used to solve. Green Horizon gives the Chinese government the information it needs to begin curtailing pollutant emissions in the Beijing region. By combining sensor data from the city’s air quality monitoring network with factory location and operation data and with weather patterns, the model generates predictive insights about how bad air quality will be at any time, in any given city sub-region. Since the model is built on machine learning algorithms, its predictions become increasingly accurate the longer it operates.
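To make the idea of predictions that improve over time concrete, here is a minimal sketch of an online learner. It is not how Green Horizon works; the "sensor readings," the hidden relationship (`true_signal`), and the learning rate are all made-up assumptions chosen only for illustration. The learner scores each prediction before it learns from the new reading, so the error list shows how accuracy changes as more data streams in.

```python
# Toy online learner: every value here is hypothetical, not Green Horizon data.
def true_signal(x):
    return 2.0 * x + 1.0  # assumed relationship the learner has to discover


weight, bias = 0.0, 0.0   # the model starts out knowing nothing
learning_rate = 0.5       # assumed step size for each update
errors = []

for step in range(200):
    x = (step % 10) / 10            # stand-in for a streaming sensor reading
    y = true_signal(x)              # the value that is observed afterwards
    prediction = weight * x + bias
    error = y - prediction
    errors.append(abs(error))       # score the prediction before learning from it
    weight += learning_rate * error * x   # gradient step on the squared error
    bias += learning_rate * error

early_error = sum(errors[:20]) / 20
late_error = sum(errors[-20:]) / 20
print(f"mean absolute error, first 20 readings: {early_error:.4f}")
print(f"mean absolute error, last 20 readings:  {late_error:.4f}")
```

In this toy setting the error over the last readings is smaller than over the first ones, which is the behavior the paragraph above describes. Real models only improve this way when the new data is representative and the model keeps being updated.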
These predictions make up only half of the solution though. The Chinese government also needs to know what actions it can take to curb air pollutant emissions. That’s where the model’s prescriptive insights come in. Beijing officials know they have the power to curb air pollutant emissions by either taking some of the local factories offline, or by restricting the number of cars that are allowed on the road. They just need to know exactly how many cars to restrict, or which plants to take offline on a daily basis, in order to reach their given air quality goal. Green Horizon tells them just that. The model couples its air quality predictions with defined parameters for local factory and vehicle emissions, to inform Beijing officials about what exact actions they can take to reach their air quality goals.
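As a rough illustration of what "prescriptive" means here, the sketch below pairs a predicted PM10 level with a list of candidate actions and picks actions until the target is met. This is a simple greedy heuristic, not IBM’s method. The `Action` class, the action names, the reduction figures, and the costs are all hypothetical, and the sketch assumes reductions simply add up, which real atmospheric chemistry does not guarantee.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    reduction: float  # assumed PM10 reduction in µg/m³ if the action is taken
    cost: float       # assumed economic cost, in arbitrary units


def choose_actions(predicted_pm10, target_pm10, actions):
    """Greedily pick the actions with the best reduction per unit of cost
    until the predicted concentration reaches the target."""
    remaining = predicted_pm10 - target_pm10
    chosen = []
    ranked = sorted(actions, key=lambda a: a.reduction / a.cost, reverse=True)
    for action in ranked:
        if remaining <= 0:
            break
        chosen.append(action)
        remaining -= action.reduction
    expected_pm10 = predicted_pm10 - sum(a.reduction for a in chosen)
    return chosen, expected_pm10


# Hypothetical inputs: a forecast of 150 µg/m³ and a goal of 100 µg/m³.
candidates = [
    Action("take plant A offline", reduction=40.0, cost=10.0),
    Action("restrict 20% of cars", reduction=30.0, cost=5.0),
    Action("take plant B offline", reduction=25.0, cost=12.0),
]
plan, expected = choose_actions(150.0, 100.0, candidates)
for action in plan:
    print(action.name)
print(f"expected PM10 after plan: {expected}")
```

A greedy ranking like this is easy to follow but does not always find the cheapest plan. A production system would more likely treat the choice as a constrained optimization problem.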
Now for the economic problem: Beijing’s economy depends heavily on the wellbeing of businesses operating in the manufacturing sector. Forcing factories offline harms local business, which in turn harms Beijing’s economy. But by relying on Green Horizon’s prescriptive insights, Beijing officials can begin using incentives and deterrents to steer businesses and residents toward more environmentally responsible choices. For example, the Chinese government can incentivize businesses to switch from coal to renewable energy sources, or it can begin monthly plant inspections and then fine factories that exceed emissions standards.
What is Machine Learning?
A simple answer to the question, “What is machine learning?”, is as follows. Machine learning is an analytical method that uses algorithms to iteratively learn from, and find hidden patterns in, large datasets. As mentioned in the example above, machine learning models are adaptive. As new data is introduced, machine learning models adapt and become increasingly accurate in their predictions. The process of applying a machine learning method can be broken into five main steps.
- Data Analysis – This is where you produce descriptive statistics and carry out exploratory data analysis.
- Data Preparation – This is where you clean, reformat, munge, and aggregate your data.
- Algorithm Testing and Evaluation – In step 3, you deploy various algorithms and then compare and evaluate the results of each. Select the appropriate algorithm based on your findings.
- Model Fine-Tuning – This is where you make small tweaks to your model, in order to improve its performance.
- Data Presentation – In the final step, you present your findings to your team and begin working to get the model into production.
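The five steps above can be sketched end to end in a few lines of Python. This is only an illustrative walk-through under stated assumptions: the dataset is a made-up toy series (roughly y = 3x + 5 with one missing value), the two candidate "algorithms" are a mean baseline and a tiny k-nearest-neighbors predictor written by hand, and names such as `fit_knn` and `mean_absolute_error` are hypothetical helpers, not part of any library.

```python
from statistics import mean, stdev

# Hypothetical toy data, not from any real source: y is roughly 3x + 5.
raw = [(x, 3 * x + 5 + ((x * 7) % 5 - 2)) for x in range(20)]
raw[4] = (4, None)  # simulate one missing measurement

# 1. Data analysis: descriptive statistics on the observed values.
observed = [y for _, y in raw if y is not None]
summary = {"rows": len(raw), "missing": len(raw) - len(observed),
           "mean": mean(observed), "stdev": stdev(observed)}

# 2. Data preparation: drop incomplete rows, then split the data three ways.
clean = [(x, y) for x, y in raw if y is not None]
test = [row for i, row in enumerate(clean) if i % 4 == 0]
validation = [row for i, row in enumerate(clean) if i % 4 == 1]
train = [row for i, row in enumerate(clean) if i % 4 >= 2]


def mean_absolute_error(model, rows):
    return mean(abs(model(x) - y) for x, y in rows)


# 3. Algorithm testing and evaluation: compare candidates on validation data.
def fit_baseline(rows):
    average = mean(y for _, y in rows)
    return lambda x: average  # always predicts the training average


def fit_knn(rows, k):
    def predict(x):
        nearest = sorted(rows, key=lambda row: abs(row[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict


scores = {"baseline": mean_absolute_error(fit_baseline(train), validation),
          "knn": mean_absolute_error(fit_knn(train, 3), validation)}
best_algorithm = min(scores, key=scores.get)

# 4. Model fine-tuning: try a few values of k (assumes k-nearest neighbors
#    was the better candidate in step 3).
k_scores = {k: mean_absolute_error(fit_knn(train, k), validation)
            for k in (1, 2, 3, 5)}
best_k = min(k_scores, key=k_scores.get)

# 5. Data presentation: report results on data the model has never seen.
test_error = mean_absolute_error(fit_knn(train, best_k), test)
print(f"summary: {summary}")
print(f"best algorithm: {best_algorithm}, best k: {best_k}")
print(f"mean absolute error on the test set: {test_error:.2f}")
```

Keeping a separate validation set for steps 3 and 4 and holding the test set back until step 5 is a common way to avoid reporting an overly optimistic error. On a real project, each step would typically use a dedicated library rather than hand-written helpers.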
How Machine Learning Can Benefit Your Business
Machine learning approaches are being deployed to solve all sorts of problems in all sorts of places. From Beijing’s environmental-economic crisis, to Google Gmail’s continuous battle against spam, it’s hard to imagine a scenario where machine learning methods couldn’t be put to good use.
With respect to business applications, machine learning methods are being used to do things like:
- Build more accurate pricing models
- Detect network intrusions
- Generate real-time targeted advertising on websites
- Reach record sales via recommendation engine deployment
- Improve demand forecasts in retail
- Detect and prevent fraud in real time
Many of these methods, as well as other important topics in big data and data science, are discussed in greater detail in Managing Big Data Workflows for Dummies. If you liked the way this post answered the question, “What is machine learning?”, then you’ll love this Dummies guide! Grab your downloadable version here.
These postings are my own and do not necessarily represent BMC's position, strategies, or opinion.