NoSQL vs SQL: What’s The Difference and How To Choose

BY and

One of the world’s richest men, and has been for decades, is Larry Ellison. He founded Oracle in the early 1970’s taking an idea from IBM’s Edgar F. Cobb for a SQL relational database to create the Oracle Database rdbms (relational database management system). Oracle obtained a monopoly position in this market as in those days there was virtually no free software and practically all applications ran on IBM mainframes. Even IBM’s database, DB2, could not unseat Oracle as the market leader as that database only ran on mainframes. It was not until racks of PCs began to be used a servers in the 1990s (as they are today) that people used anything but mainframes. Oracle is still a monopoly for most transactional business applications among the Fortune 500. Oracle has purchased the most common opensource alternative, MySQL, and has even purchased opensource Java itself. But those remain free.

When it comes to choosing a database, one of the biggest decisions an organization may come across is whether to pick a relational (SQL) or non-relational (NoSQL) data structure. While both of these are good choices, each have clear advantages and disadvantages which must be kept in mind. We have broken down the most important differences between SQL and NoSQL as well as highlighted the benefits of each.

What is SQL?

SQL (Structured Query Language) is a programming language that is used to manage data in relational databases. Relational databases use relations, which are typically called tables, to store data and then match that data by using common characteristics within the dataset. Some common relational database management systems that use SQL are: Oracle, Sybase, Microsoft SQL Server, Access, and Ingres.

Cobb’s breakthrough paper describes a database where objects could be constructed and queried using something he called SQL, structured query language. He used SQL to create both data (in objects called tables) and the schema for that data, which describes fields in columns. A single record in a SQL database is called a row.

What is NoSQL?

A NoSQL database, on the other hand, is self-describing, so does not require a schema. Nor does it enforce relations between tables in all case. All its documents are JSON documents, which are complete entities that one can readily read and understand. NoSQL refers to high-performance, non-relational databases that utilize a wide variety of data models. These databases are highly recognized for their ease-of-use, scalable performance, strong resilience, and wide availability. NoSQL database examples include MongoDB, MarkLogic, Couchbase, CloudDB, and Amazon’s Dynamo DB.

Major Differences

There are many differences between SQL and NoSQL, all of which are important to understand when making a decision about what might be the best data management system for your organization. These include differences among the:

  • Language
  • Scalability
  • Community
  • Structure

The Language

One of the major differences between SQL relational and NoSQL non-relational databases is the language. As mentioned, SQL databases use Structured Query Language for defining and manipulating data. This allows SQL to be extremely versatile and widely-used, however, it also makes it more restrictive. SQL requires that you use predefined schemas to determine the structure of your data before you even begin to work with it. Your data must also follow the same structure, as well, which can entail both significant up-front preparation along with careful execution.

A NoSQL database features a dynamic schema for unstructured data and the data can be stored in many different ways, whether it be graph-based, document-oriented, column-oriented, or organized as a KeyValue store. This extreme flexibility allows you to create documents without first having to carefully plan and define their structure, add fields as you go, and vary the syntax from database to database. It also allows you to give each document its own unique structure, giving you more freedom overall.

The Scalability

Another big difference between SQL and NoSQL is their scalability. In most SQL databases, they are vertically scalable, which means that you can increase the load on a single server by increasing components like RAM, SSD, or CPU. In contrast, NoSQL databases are horizontally scalable, which means that they can handle more traffic simply by adding more servers to the database. NoSQL databases have the ability to become larger and much more powerful, making them the preferred choice for large or constantly evolving data sets.

The Community

Due to SQL’s maturity, it has a much stronger and more developed community compared to NoSQL. There are thousands of chats and forums available where experts can share knowledge and discuss SQL best practices, continuously enhancing skills. Although NoSQL is growing rapidly, its community is not as well defined as SQL due to the fact that it is still relatively new.

The Structure

Finally, a last thing to consider when debating SQL versus NoSQL is their structures. SQL database are table-based which makes them a better option for applications that require multi-row transactions. Samples of these may be accounting systems or even legacy systems that were originally built for a relational structure. NoSQL databases can be key-value pairs, wide-column stores, graph databases, or document-based.

So Which Database Is Right For Your Business?

The best way to determine which database is right for your business is to analyze what you need its functions to be. SQL is a good choice for any organization that will benefit from a predefined structure and set schemas, particularly if they require multi-row transactions. It is also a good option if all data must be consistent without leaving room for error, such as with accounting systems.

NoSQL is a good choice for those companies experiencing rapid growth with no clear schema definitions. NoSQL offers much more flexibility than a relational database and is a solid option for companies who must analyze large quantities of data or whose data structures they manage are variable.

Examples

below you can clearly see that the first field is student and the second field is class.

{ student:  "Walker Rowe",
  class:  "biology"
}

In terms of SQL the user would first create this schema before they could add data to the database.:

CREATE TABLE studentClasses (
    student varchar,
    class varchar
);

Where varchar is variable character length.

To add data to that table one would:

INSERT INTO studentClasses (student, class)
VALUES ("Walker Rowe", "biology:);

With a NoSQL database, in this example MongoDB, you would use the database API to insert data like this:

db.studentClasses.insert( { name: "Walker Rowe", class: "biology" } )

And then you can create the union (all elements from two or more sets) and intersection (common elements of two or more sets) of sets using SQL.

The big breakthrough here was to let programmers do all this using easy-to-understand SQL syntax. Then Oracle made further technological advances to ensure database referential integrity and improve performance by indexing fields and caching records. (Database referential integrity means the completeness of transactions so that there are no orphaned records. For example, a sales record with no corresponding product item. This is what is I meant by saying Oracle can enforce the relationship between tables.)

Note that in the MongoDB example we have described above, Oracle programmers would say that the table studentClasses is a intersection. Because you can determine from it both what classes a student has and which students are in which class. In this case you would also have both student and class records to contain things like the class room number and the student phone number.

The Oracle database is called a row oriented database. Data is grouped into rows and columns. We don’t need to mention column-oriented databases here, like Cassandra, as they different in architecture and not conception to such a large degree. So they are not so fundamentally different as SQL versus noSQL. In particular, the Cassandra NoSQL database is used to group similar columns of data near each other so they can be retrieved at the highest possible speed. Also Cassandra and noSQL database get rid of the concept of database normalization, which is key to Oracle, as we explain below. And they do not store empty column values. So the row lengths can differ.

Efficiency and Normalization

One thing that Oracle stressed was the relationship between objects. They said that all data should be normalized. This means no data should be stored twice. So instead of putting, for example, the school address in every student record, it would be better to maintain a school table and store the address there. NoSQL databases have gotten rid of this constraint, to a certain degree.

Disk space was expensive in the 1970s and so was memory, so normalization made sense. But it can take some time to do a join operation to bring together a record that is stored in different tables into one logical unit. It also requires the overhead of maintaining index files and writing to those as data is added or deleted

NoSQL databases say all that does not matter as disk space and memory are cheap. Proponents of that say it is OK to, regarding the aforementioned case, put the school address in with the student. This speeds data retrieval time and makes coding easier.

NoSQL vs SQL

Oracle’s largest competitor in the business market is SAP. They have their own database, Hana. But the only difference between than and Oracle is Hana stores all its records in memory (flushing them to disk as needed.). It does this for speed. Still, it is still a rdbms.

It is difficult to make the case to switch to noSQL databases in business applications that have been running for decades or to propose those for new applications when companies already have knowledge of rdbms. There are management issues that Oracle has solved, such as data replication, that could leave someone using, for example, ElasticSearch, without support and with a downed system. But to fill that gap some companies have taken over the support of, and sometimes most of the programming for, so-called opensource databases, like ElasticSearch. If you want support for that then you can buy support and a supported version from Elastic.

The other too is the paradigm switch for transactional systems. It is easy to conceive of adding a sale to a sales database. Oracle then would automatically calculated on on-hand inventory using a saved SQL operation called a view. For MongoDB, a program would have to sort through the inventory items and subtract the sales to determine the new on-hand inventory.

Some Common NoSQL Databases

If you read the use cases for NoSQL databases you will find that those tend to be adopted as niche and not enterprise systems. For example, Uber uses Cassandra to keep track of drivers. But its needs are unique, including the need to write millions of records per second across multiple data centers. They even wrote their own implementation of Cassandra so that it could run on Mesos. Mesos is an orchestration system similar to containers.

Amazon markets is DynamoDB database as having “millisecond latency.” They also drop the term noSQL and simply call it a nonrelational database.

DynamoDB, like MongoDB, has a JavaScript interface. So you can work with it using that relatively simple programming language as well. For example to add a a record, you first instantiate an instance of the database then add the JSON item like this:

var docClient = AWS.DynamoDB.DocumentClient()
docClient.put("{JSON … }"}

One implementation detail is that you can run these operations in MongoDB and DynamoDB using Node.js. That is JavaScript running in the middle tier. So you do not need to create JAR files or middleware servers like Oracle Weblogic.

So which should you be using for your new project? Your accounting system could very well continue to run on an RDBMS system. But there are alternatives to paying Oracle for licensing fees, like using MySQL. But will it use MongoDB? That is not very likely for the short term as there are millions of programmers around the world using Java and Oracle and project managers and users who understand that. Use ElasticSearch for logs and Spark for analytics. As for the others, study those on a case by case basis to see which works best given your resources, skill, ability to suffer lost transactions, etc.

Conclusion

No matter what field you are in, choosing the correct database for your organization is an important decision. NoSQL databases are quickly becoming a major part of the database landscape today, and they are proving to be a real game-changer in the IT arena. They have numerous benefits, including lower cost, open-source availability, and easier scalability, which make NoSQL an appealing option for anyone thinking about integrating in Big Data. They are a young technology, however, which makes them slightly more volatile.

On the other hand, SQL databases have proven themselves for over 40 years and use long-established standards that are well defined. They have a huge community of experts behind them and the opportunity for collaboration is limitless.

Overall, the decision of using SQL versus NoSQL for business is not entirely black and white; it requires some comparing and contrasting to determine which database best fits your specific needs. With the proper amount of research and preparation, however, you will ensure that the database you choose provides an efficient and streamlined management system for your organization.

Running SQL on Db2?

Learn about BMC’s Performance for Db2 SQL.

Find and eliminate the wasteful SQL statements that are slowing you down, with the BMC Performance for Db2 SQL solution. Designed to manage SQL performance throughout the application lifecycle, the tools in this solution will help you:

  • Diagnose performance problems and track them to their source, so you can effectively tune your SQL
  • Anticipate SQL-related slowdowns so you can resolve them before they impact service levels
  • Avoid cumbersome reorganizations and costly CPU upgrades by making the most of your resources
  • Ensure your SQL is running efficiently and cost effectively at all times
Related posts:

Wikibon: Automate your Big Data pipeline

Learn how data management experts throughout the industry are transforming their Big Data infrastructure for maximum business impact.

Download Now ›

These postings are my own and do not necessarily represent BMC's position, strategies, or opinion.

Share This Post


Walker Rowe

Walker Rowe

Walker Rowe is an American freelance tech writer and programmer living in Chile. He specializes in big data, analytics, and cloud architecture. Find him on LinkedIn or at Southern Pacific Review, where he publishes short stories, poems, and news.

Laura Shiff

Laura Shiff

Laura Shiff is a researcher and technical writer based in the Twin Cities. She specializes in software, technology, and medicine. You can reach Laura at LauraShiffCopywriting@gmail.com or her website at https://www.laurashiff.com