Bharat Saxena – BMC Software | Blogs

Supervised, Unsupervised & Other Machine Learning Methods

Bharat Saxena — Wed, 23 Jun 2021 11:18:04 +0000

Machine learning is augmenting human capabilities and making things possible—things that just a few years back were considered impossible.

Take, for example, the protein folding problem. For about 50 years, the biology field assumed that solving this problem was beyond human capabilities. But with the might of AI and ML, folks at DeepMind were finally able to come up with a solution to this problem.

ML-based applications are ubiquitous these days and they continue to evolve day by day. Before long we might also manage to build a fully autonomous vehicle.

But then the question arises: how exactly do you make a machine learn?

Let’s look at the two most well-known machine learning methods—supervised and unsupervised learning. We’ll deep dive into how they both work, and we’ll look at up-and-coming learning methods, too.

Machines can learn

We are very familiar with the paradigm of coding programs. Coding is akin to explicitly telling the machine what to do. The programmed machine cannot make a decision on its own. And it most certainly cannot handle a situation that it hasn’t been programmed for.

This is like giving machines a fish when, really, we want to teach machines how to fish.

In the field of AI and ML, the ways machines are made to learn generally fall under two categories:

Supervised learning
Unsupervised learning

In a nutshell, the difference between these two methods is that in supervised learning we also provide the correct results in the form of labeled data. Labeled data in machine learning parlance means that we know the correct output values of the data beforehand.

In unsupervised machine learning, the data is not labeled. So, are the machines left to fend for themselves in unsupervised learning? Not quite.

(Understand the role of data annotation in ML.)

How supervised machine learning works

The notion of ‘supervision’ in supervised machine learning comes from the labeled data.

With the help of labels, the predictions a machine learning model makes can be compared against the known correct values. This helps with gauging the accuracy of the model and calculating the loss, which in turn can be fed back to the model to further improve its predictions. (This labeled data seems like the answer to all our problems, right? What could ever go wrong!)

But, as they say, with great power comes great responsibility. We need to be careful about how heavily we lean on the labels during supervised learning or, in machine learning jargon, how much we train our model.

The pitfall of too much training is overfitting. This is what happens when the ML model learns the training data so well that, when new data comes in, the model often fails to perform correctly.

(Unsupervised learning algorithms can also overfit, but the problem is more prevalent in supervised learning algorithms. Eagerness to train for one more epoch, for the sake of better accuracy, often leads to overfitting.)
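
Here is a minimal sketch of overfitting, using made-up one-dimensional data in which two training labels are deliberately wrong. The names `train`, `test`, `nearest_neighbor_predict`, and `fit_threshold` are illustrative assumptions, not part of any library. The memorizing model is perfect on the data it has seen, while the simpler threshold rule does better on new points.

```python
# Toy data, assumed for illustration: the true rule is "label is 1 when x >= 5".
# Two training labels (at x=2 and x=8) are deliberately flipped to act as noise.
train = [(1, 0), (2, 1), (3, 0), (4, 0), (6, 1), (7, 1), (8, 0), (9, 1)]
test = [(1.8, 0), (2.2, 0), (4.5, 0), (5.5, 1), (7.8, 1), (8.3, 1)]


def nearest_neighbor_predict(x):
    """Memorizing model: copy the label of the closest training point."""
    _, label = min(train, key=lambda point: abs(point[0] - x))
    return label


def fit_threshold(data):
    """Simple model: pick the cut-off that makes the fewest training errors."""
    xs = sorted(x for x, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def errors(t):
        return sum((x >= t) != bool(y) for x, y in data)

    return min(candidates, key=errors)


def accuracy(predict, data):
    """Fraction of points whose prediction matches the known label."""
    return sum(predict(x) == y for x, y in data) / len(data)


threshold = fit_threshold(train)


def threshold_predict(x):
    return int(x >= threshold)


for name, model in [("memorizer", nearest_neighbor_predict),
                    ("threshold", threshold_predict)]:
    print(name, "train:", accuracy(model, train), "test:", accuracy(model, test))
```

The memorizer reproduces the two noisy labels faithfully, which is exactly what hurts it on the clean test points nearby.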

Broadly, supervised machine learning finds its application in 2 types of tasks:

Classification
Regression

Classification

In this type of task, the model tries to classify a given input into one of a set of predefined categories.

For example, consider classifying a tumor as malignant or benign. Here we train the model on input data that has already been correctly labeled as either malignant or benign. We compare the generated output with these labels and keep retraining until the model is robust.
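
As a rough sketch of the idea, the snippet below trains a nearest-centroid classifier on a handful of invented tumor measurements. The two features, their values, and the function names are assumptions made for illustration; a real diagnostic model would use far richer data and a stronger algorithm.

```python
# Hypothetical labeled data: (tumor size in cm, irregularity score) with its known label.
training_data = [
    ((1.0, 0.2), "benign"),
    ((1.5, 0.3), "benign"),
    ((1.2, 0.1), "benign"),
    ((4.0, 0.8), "malignant"),
    ((4.5, 0.9), "malignant"),
    ((3.8, 0.7), "malignant"),
]


def fit_centroids(data):
    """Average the feature vectors of each class."""
    grouped = {}
    for features, label in data:
        grouped.setdefault(label, []).append(features)
    return {
        label: [sum(axis) / len(rows) for axis in zip(*rows)]
        for label, rows in grouped.items()
    }


def classify(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def squared_distance(label):
        return sum((c - f) ** 2 for c, f in zip(centroids[label], features))
    return min(centroids, key=squared_distance)


centroids = fit_centroids(training_data)

# Compare the model's output with the known labels, as described above.
training_accuracy = sum(
    classify(centroids, features) == label for features, label in training_data
) / len(training_data)
print(classify(centroids, (4.2, 0.85)), training_accuracy)
```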

Regression

In this type of task, the model tries to predict a numerical value (a real number).

An example of this is predicting housing prices given housing data. The key point here is that the output in this case is a real or continuous value—it’s not one bucket or the other, as in classification. Again, we compare the predicted value with the known correct values and make further tweaks to improve the model.
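
The compare-and-tweak loop can be sketched with a one-parameter model trained by gradient descent. The house sizes, prices, learning rate, and the choice of a model with no intercept are all assumptions picked to keep the example small.

```python
# Hypothetical labeled data: house size (in 100 m^2) and its known price (in $100k).
sizes = [0.5, 0.8, 1.0, 1.2]
prices = [1.5, 2.4, 3.0, 3.6]


def mean_squared_error(weight):
    """Average squared gap between predictions and the known prices."""
    return sum((weight * x - y) ** 2 for x, y in zip(sizes, prices)) / len(sizes)


def train(steps=100, learning_rate=0.1):
    weight = 0.0  # start from a deliberately poor guess
    losses = []
    for _ in range(steps):
        losses.append(mean_squared_error(weight))
        # Gradient of the mean squared error with respect to the weight.
        gradient = sum(2 * (weight * x - y) * x for x, y in zip(sizes, prices)) / len(sizes)
        weight -= learning_rate * gradient  # tweak the model using the error signal
    return weight, losses


weight, losses = train()
print("learned weight:", round(weight, 3))
print("predicted price for size 0.9:", round(weight * 0.9, 3))
```

Because the invented prices are exactly three times the size, the learned weight should settle close to 3 and the recorded loss should shrink from the first step to the last.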

As you can see, in both classification and regression, the labels themselves are, in a sense, “supervising” the training of the machine learning model.

How unsupervised learning works

On the other side of the aisle, unsupervised learning algorithms work on unlabeled data. This is where the notion of unsupervised comes from: there are no labels for the model to course correct against while training.

But the absence of labels does not mean that unsupervised learning methods wander aimlessly. Actually, these algorithms look for underlying patterns or connections within the data and use them to help understand and analyze the data.

So, why would someone use unlabeled data in the first place, you may ask? There can be multiple reasons for this:

Sometimes labeled data is simply not available.
Often, the cost to label the data is very high.
The size of the data is so huge that it is impossible to get labels added in a reasonable time.

This is not all bad. In fact, you’ll often use unsupervised learning algorithms when you don’t know what you are looking for.

For example, suppose you have demographic data on grocery shoppers in a city and you want to sort them into logical groups. Unsupervised learning algorithms can help identify the clusters/groups in the data. This is called clustering, and it’s the most common application of unsupervised learning algorithms.

Unsupervised learning algorithms are not limited to clustering tasks alone. Other applications include dimensionality reduction and density estimation.

Clustering

As we saw in the example, clustering is where an algorithm finds similarities within the data points and groups similar data together. This can be based on:

Distance (K-Means)
Density (DBSCAN)
And other characteristics

(Learn about common ML architectures, including K-Means.)
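
To make distance-based clustering concrete, here is a minimal K-Means sketch in plain Python. The shopper data is invented, and seeding the centroids with the first k points is an assumption made so the run is deterministic; practical implementations usually choose starting points more carefully.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=10):
    # Assumption: start from the first k points so the run is deterministic.
    centroids = [list(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = [sum(axis) / len(cluster) for axis in zip(*cluster)]
    return centroids, clusters


# Hypothetical shoppers described by (age, weekly spend). No labels are given.
shoppers = [(22, 40), (25, 45), (24, 38), (60, 120), (63, 130), (58, 125)]
centroids, clusters = k_means(shoppers, k=2)
print(clusters)
```

No labels are involved at any point: the two groups emerge purely from how close the points are to one another.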

Dimensionality reduction

Often, the data contains way too many features, and not all features contribute equally to the predictive power of the model. Unsupervised learning algorithms can help remove superfluous features from the dataset.
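
One of the simplest ways to do this is to drop features that barely vary, since a column with the same value in every row cannot help tell data points apart. The sketch below applies such a variance threshold; the dataset, the feature names, and the threshold are illustrative assumptions. Techniques such as principal component analysis (PCA) go further by combining features rather than just discarding them.

```python
from statistics import pvariance


def drop_low_variance_features(rows, names, threshold=0.0):
    """Keep only the columns whose variance exceeds the threshold."""
    columns = list(zip(*rows))
    keep = [i for i, column in enumerate(columns) if pvariance(column) > threshold]
    reduced_rows = [[row[i] for i in keep] for row in rows]
    return reduced_rows, [names[i] for i in keep]


# Hypothetical dataset: "store_id" is identical in every row, so it carries no signal.
names = ["age", "weekly_visits", "store_id"]
rows = [[25, 3, 7], [40, 10, 7], [31, 7, 7], [52, 1, 7]]
reduced_rows, kept = drop_low_variance_features(rows, names)
print(kept)
```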

Density estimation

The aim of density estimation is to:

Discover relations among attributes in data.
Estimate the underlying probability density function from the data.

One common use case for this method is anomaly detection.

(Explore anomaly detection with ML.)
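
A small sketch of density-based anomaly detection: fit a Gaussian density to readings assumed to be normal, then flag new readings that land where the estimated density is very low. The sensor values and the `min_density` cut-off are made up, and the Gaussian shape is itself an assumption about the data.

```python
from statistics import NormalDist

# Hypothetical sensor readings assumed to represent normal behavior.
normal_readings = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7]

# Estimate a Gaussian density from the data (assumes roughly bell-shaped data).
density = NormalDist.from_samples(normal_readings)


def find_anomalies(readings, min_density=0.01):
    """Flag readings that fall in a low-density region of the fitted distribution."""
    return [r for r in readings if density.pdf(r) < min_density]


new_readings = [10.05, 9.9, 12.5]
print(find_anomalies(new_readings))
```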

Supervised vs unsupervised learning algorithms

By now, we can say that the main difference between these two categories of algorithms lies in the labeling of the training data.

But you’ll also need to consider other factors when building a machine learning pipeline, such as:

Using unsupervised methods on labeled data. Doing so can identify hidden traits as a part of Exploratory Data Analysis (EDA).
Using supervised and unsupervised algorithms together. For example, you can use unsupervised learning algorithms to reduce the dimensionality of the labeled data, and then proceed with supervised learning algorithms.
Using manual effort to label data. If the data is unlabeled and the use case requires highly accurate classification into specially defined categories, then you can apply manual effort to add labels to the input data. Often, this process is time-consuming and expensive, but it’s always available for situations that demand it.

What is semi-supervised learning?

Machine learning algorithms yearn for data. The more, the merrier. These days we are generating data at an astronomical rate. So far so good, but only a very small portion of this data is labeled. So, while unsupervised learning methods can identify patterns and create clusters, they cannot be guided to either:

Create custom or specific clusters
Attain a certain accuracy threshold

Luckily, we don’t always have to choose between large datasets or the customization and accuracy of supervised learning.

A new class of algorithms, semi-supervised algorithms, can learn from partially labeled datasets. These algorithms are especially useful in:

Natural language processing (NLP) tasks like classifying text
The field of medicine, for tasks such as protein sequence classification

These are both areas where we have loads of data, but it’s mostly unlabeled. So, here’s how semi-supervised machine learning works:

We add labels to a fraction of the data either algorithmically or via human labor.
Next, we use unsupervised learning algorithms to create clusters of similar data points.
Then we use the labeled data to assign labels to the remaining unlabeled points in each cluster.
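
The three steps above can be sketched as a cluster-then-label routine. Everything here is an illustrative assumption: the single numeric feature, the two-cluster setup, and the helper names `two_means` and `propagate_labels`. Real semi-supervised methods are considerably more sophisticated.

```python
from collections import Counter


def nearest_center(value, centers):
    return 0 if abs(value - centers[0]) <= abs(value - centers[1]) else 1


def two_means(values, iterations=10):
    """Tiny 1-D k-means with k=2, seeded with the smallest and largest value.

    Assumes the data contains at least two distinct values.
    """
    centers = [min(values), max(values)]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            groups[nearest_center(v, centers)].append(v)
        centers = [sum(g) / len(g) for g in groups]
    return centers


def propagate_labels(values, known_labels):
    centers = two_means(values)
    # Let the few labeled points vote on the label of their cluster.
    votes = [Counter(), Counter()]
    for value, label in known_labels.items():
        votes[nearest_center(value, centers)][label] += 1
    cluster_label = [v.most_common(1)[0][0] if v else None for v in votes]
    # Keep the labels we already know; fill in the rest from the cluster.
    return {
        value: known_labels.get(value, cluster_label[nearest_center(value, centers)])
        for value in values
    }


# Hypothetical feature: number of promotional phrases per email. Only two are labeled.
emails = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.8, 5.1]
labels = propagate_labels(emails, {1.0: "ham", 5.0: "spam"})
print(labels)
```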

Semi-supervised ML is like an explorer getting the lay of the land upon arrival, and then seeking the help of locals to get to know individual areas more intimately.

One example of a system based on semi-supervised learning is Google’s Expander, a technology that Google uses in many of its products, including Gmail and Google Photos.

(See the similarities with human-in-the-loop ML.)

The path ahead

In this article we have covered a lot of ground on two major types of machine learning algorithms, finally arriving at semi-supervised learning, which epitomizes the notion of: why choose when you can have both?

Importantly, we are not limited to just these methods. As the field continues to evolve, new methods are emerging to tackle new problems.

Reinforcement learning

For example, another class of machine learning approaches is reinforcement learning. Instead of relying on labeled or unlabeled data, reinforcement learning employs the concept of rewards and penalties to make the machine learning model solve a problem on its own.

This learning setup is almost autonomous, with human intervention limited to altering the environment and tweaking rewards/penalties. Reinforcement learning finds wide application in building autonomous systems, be it a self-driving car or a video-game-playing bot.
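
As a small illustration of learning from rewards and penalties, the sketch below applies the Q-learning update rule to an invented five-cell corridor with a reward in the last cell. The environment, the reward values, and the hyperparameters are assumptions. To keep the run deterministic, it tries every action in every state on each pass instead of exploring randomly, as an agent normally would.

```python
# Hypothetical environment: a corridor of 5 cells; the reward sits in the last cell.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = {"left": -1, "right": +1}


def step(state, action):
    """Apply an action and return (next_state, reward)."""
    next_state = min(max(state + ACTIONS[action], 0), GOAL)
    reward = 1.0 if next_state == GOAL else -0.01  # small penalty per wasted move
    return next_state, reward


def train(sweeps=200, alpha=0.5, gamma=0.9):
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}
    for _ in range(sweeps):
        # Assumption: visit every action in every non-goal state on each sweep
        # (instead of random exploration) so the sketch stays deterministic.
        for state in range(GOAL):
            for action in ACTIONS:
                next_state, reward = step(state, action)
                target = reward + gamma * max(q[next_state].values())
                q[state][action] += alpha * (target - q[state][action])
    return q


q_table = train()
# The learned policy picks the highest-valued action in each state.
policy = {s: max(q_table[s], key=q_table[s].get) for s in range(GOAL)}
print(policy)
```

Nobody tells the model that moving right is correct. That preference emerges from the reward at the goal and the small penalty on every other move.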

Adversarial learning

As machine learning models become mainstream, the threat of attacks and hacking attempts is increasing. It’s in this scenario that adversarial learning, a sub-class of supervised learning, comes to the rescue.

Adversarial learning is widely used to make machine learning models more robust against attacks.
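
To give a flavor of what such an attack looks like, the sketch below nudges the input of a hypothetical, already trained linear classifier just enough to flip its prediction, in the style of a gradient-sign perturbation. The weights, the input, and `epsilon` are invented for illustration. Adversarial training then feeds such perturbed inputs, with their correct labels, back into training.

```python
# Hypothetical linear classifier that has already been trained.
WEIGHTS = [2.0, -1.0]
BIAS = -0.5


def score(x):
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def predict(x):
    return 1 if score(x) > 0 else 0


def sign(value):
    return (value > 0) - (value < 0)


def adversarial_example(x, label, epsilon):
    """Nudge every feature by at most epsilon in the direction that hurts the true label.

    For a linear model the gradient of the score with respect to the input is
    just the weight vector, so the sign of each weight gives the direction.
    """
    direction = -1 if label == 1 else 1
    return [xi + direction * epsilon * sign(w) for xi, w in zip(x, WEIGHTS)]


original = [1.0, 1.0]
perturbed = adversarial_example(original, label=1, epsilon=0.3)
print(predict(original), predict(perturbed))

# Adversarial training would add the perturbed input, with its correct label,
# back into the training set before retraining the model.
augmented_training_set = [(original, 1), (perturbed, 1)]
```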

They say modern problems need modern solutions. So, as we are evolving in AI and ML, we find ourselves tackling new challenges. And the machine learning methods are also evolving to keep pace.

NLP vs NLU: What’s The Difference?

Bharat Saxena — Thu, 13 May 2021 13:00:04 +0000

A natural language is one that has evolved over time via use and repetition. It does not involve deliberate planning and strategy. Latin, English, Spanish, and many other spoken languages are all languages that evolved naturally over time.

Natural languages are different from formal or constructed languages, which have a different origin and development path. For example, programming languages including C, Java, Python, and many more were created for a specific reason.

For a machine to be autonomous, a key tenet is to be able to communicate via one of the natural languages known to humans. In the wide world of Artificial Intelligence, one field deals with enabling machines to interact using these languages: Natural Language Processing (NLP).

NLP is an umbrella term that encompasses anything and everything related to making machines able to process natural language, be it receiving the input, understanding the input, or generating a response.

In this context, another term which is often used as a synonym is Natural Language Understanding (NLU). Actually, though, NLP and NLU focus on different areas. In this article, we’ll look at them to understand the nuances.

What is natural language processing?

From the computer’s point of view, any natural language is free-form text. That means there are no set keywords at set positions when providing an input.

Beyond the unstructured nature, there can also be multiple ways to express something using a natural language. For example, consider these three sentences:

How is the weather today?
Is it going to rain today?
Do I need to take my umbrella today?

All these sentences have the same underlying question, which is to enquire about today’s weather forecast.

As humans, we can identify such underlying similarities almost effortlessly and respond accordingly. But this is a problem for machines—any algorithm will need the input to be in a set format, and these three sentences vary in their structure and format. And if we decide to code rules for each and every combination of words in any natural language to help a machine understand, then things will get very complicated very quickly.

This is where NLP enters the picture.

NLP is a subset of AI tasked with enabling machines to interact using natural languages. The domain of NLP also ensures that machines can:

Process large amounts of natural language data
Derive insights and information

But before any of this natural language processing can happen, the text needs to be standardized.

In machine learning (ML) jargon, this series of steps is called data pre-processing. The idea is to break down the natural language text into smaller and more manageable chunks. These can then be analyzed by ML algorithms to find relations, dependencies, and context among various chunks.

Some examples of pre-processing steps are:

Parsing
Stop-word removal
Part-of-speech (POS) tagging
Tokenization
Many more
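
Two of these steps, tokenization and stop-word removal, can be sketched with the standard library alone. The stop-word list below is a tiny hand-picked assumption; real pipelines rely on much larger lists and on dedicated NLP libraries for steps such as parsing and POS tagging.

```python
import re

# Assumption: a tiny hand-picked stop-word list; real pipelines use much larger ones.
STOP_WORDS = {"a", "an", "the", "is", "it", "to", "do", "i", "my", "how"}


def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def preprocess(text):
    """Tokenize, then remove stop words that carry little meaning on their own."""
    return [token for token in tokenize(text) if token not in STOP_WORDS]


for sentence in [
    "How is the weather today?",
    "Is it going to rain today?",
    "Do I need to take my umbrella today?",
]:
    print(preprocess(sentence))
```

After pre-processing, the three weather questions are reduced to short lists of content words that are much easier for downstream algorithms to compare.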

Thus, we can sum up: the aim of NLP is to process free-form natural language text so that it gets transformed into a standardized structure.

What is natural language understanding (NLU)?

Considered a subtopic of NLP, the main focus of natural language understanding is to make machines:

Interpret the natural language
Derive meaning
Identify context
Draw insights

For example, in NLU, various ML algorithms are used to identify the sentiment, perform Named Entity Recognition (NER), process semantics, etc. NLU algorithms often operate on text that has already been standardized by text pre-processing steps.

Going back to our weather enquiry example, it is NLU which enables the machine to understand that those three different questions have the same underlying weather forecast query. After all, different sentences can mean the same thing, and, vice versa, the same words can mean different things depending on how they are used.

Let’s take another example:

The banks will be closed for Thanksgiving.
The river will overflow the banks during floods.

A task called word sense disambiguation, which sits under the NLU umbrella, makes sure that the machine is able to understand the two different senses in which the word “bank” is used.
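
A very rough sketch of the idea, loosely inspired by overlap-based approaches such as the Lesk algorithm: pick the sense whose typical context words overlap most with the sentence. The two context-word sets are hand-written assumptions, not taken from a real lexicon, and modern systems learn such associations from data instead.

```python
import re

# Hypothetical context words for each sense of "bank"; not taken from a real lexicon.
SENSE_CONTEXT = {
    "financial institution": {"money", "loan", "account", "closed", "open", "thanksgiving"},
    "river edge": {"river", "water", "overflow", "flood", "floods", "shore"},
}


def disambiguate_bank(sentence):
    """Pick the sense whose context words overlap most with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return max(SENSE_CONTEXT, key=lambda sense: len(words & SENSE_CONTEXT[sense]))


print(disambiguate_bank("The banks will be closed for Thanksgiving."))
print(disambiguate_bank("The river will overflow the banks during floods."))
```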

So, how do NLP & NLU differ?

In natural language, what is expressed (either via speech or text) is not always what is meant. Let’s take an example sentence:

Please crack the windows, the car is getting hot.

NLP focuses on processing the text in a literal sense, like what was said. Conversely, NLU focuses on extracting the context and intent, or in other words, what was meant.

NLP will take the request to crack the windows in the literal sense, but it will be NLU which will help draw the inference that the user may be intending to roll down the windows.

NLP alone could result in literal damage

NLP can process text from the point of view of grammar, structure, and typos, but it will be NLU that helps the machine infer the intent behind the text. So, even though there are many overlaps between NLP and NLU, this differentiation sets them distinctly apart.

Do we need both?

In one word, yes.

On our quest to make more robust autonomous machines, it is imperative that we are able to not only process the input in the form of natural language, but also understand the meaning and context—that’s the value of NLU. This enables machines to produce more accurate and appropriate responses during interactions.

Let’s take the example of ubiquitous chatbots.

Gone are the days when chatbots could only produce programmed and rule-based interactions with their users. Back then, the moment a user strayed from the set format, the chatbot either made the user start over or made the user wait while it found a human to take over the conversation.

Combining NLU and NLP, today’s chatbots are more robust. Using NLU methods, chatbots are able to:

Be aware of the conversation’s context
Extract the conversation’s meaning based on that context
Guide users on the topic of conversation

Ecommerce websites rely heavily on sentiment analysis of the reviews and feedback from the users—was a review positive, negative, or neutral? Here, they need to know what was said and they also need to understand what was meant.

User reviews aren’t always easy to understand
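
A rough illustration of the gap between what was said and what was meant: the lexicon-based scorer below counts positive and negative words and flips the polarity of a word that follows a negation. The word lists are tiny hand-picked assumptions, and a review that relies on sarcasm or subtler context would still fool it, which is where NLU techniques come in.

```python
import re

# Assumption: tiny hand-picked word lists, for illustration only.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "broken"}
NEGATIONS = {"not", "never", "no"}


def sentiment(review):
    """Classify a review as positive, negative, or neutral from word counts."""
    score = 0
    negate = False
    for token in re.findall(r"[a-z']+", review.lower()):
        if token in NEGATIONS:
            negate = True  # flip the polarity of the next sentiment word
            continue
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        if polarity:
            score += -polarity if negate else polarity
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for review in ["Great phone, love it", "Not a good product", "It arrived on Tuesday"]:
    print(review, "->", sentiment(review))
```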

When are machines intelligent?

In the world of AI, the classic benchmark for whether a machine can be considered intelligent is the Turing Test, proposed by Alan Turing in 1950, which pits a machine against a human.

In the test, a human evaluator interacts with a machine and another human at the same time, each in a different room. If the evaluator is not able to reliably tell the difference between the responses generated by the machine and those of the other human, then the machine passes the test and is considered to be exhibiting “intelligent” behavior.

This is a crude gauge of intelligence, albeit an effective one. One of the earliest notable attempts came in 1966 in the form of the famous ELIZA program, which was capable of carrying on a limited form of conversation with a user.

Since then, with the help of progress made in the field of AI and specifically in NLP and NLU, we have come very far in this quest. After all, chatbots are everywhere.

NLP & NLU use cases

According to various industry estimates, only about 20% of data collected is structured data. The remaining 80% is unstructured data, the majority of which is unstructured text data that’s unusable for traditional methods. Just think of all the online text you consume daily: social media, news, research, product websites, and more.

NLP and NLU techniques together are ensuring that this huge pile of unstructured data can be processed to draw insights from data in a way that the human eye wouldn’t immediately see. Machines can find patterns in numbers and statistics, pick up on subtleties like sarcasm which aren’t inherently readable from text, or understand the true purpose of a body of text or a speech.

NLP and NLU are helping to ensure that we are able to process and use this enormous amount of data being generated. Some common use cases using NLP techniques are:

Speech recognition (e.g., Siri, Alexa)
Machine translation (e.g., Google Translate)
Chatbots
Sentiment analysis

The future for language

Thanks to recent advancements, another sub-field of NLP, Natural Language Generation (NLG), has gained a lot of attention.

In addition to processing natural language similarly to a human, NLG-trained machines are now able to generate new natural language text, as if written by another human. All this has sparked a lot of interest from both industry and academia, making NLP one of the most active research topics in AI today.

(See how AI language models & GPT-3 actually work.)