MongoDB Overview: Getting Started with MongoDB


Here we provide an overview of the MongoDB database. In subsequent posts we will give more in-depth examples of how to use MongoDB.

First, MongoDB is a NoSQL big data database. It fits the definition of big data because it scales out: you increase capacity simply by adding more servers to a distributed system. And unlike an RDBMS such as Oracle, it does not require a schema.

MongoDB data records are stored in a JSON (JavaScript Object Notation) style format (a binary encoding of JSON called BSON), which is self-describing, meaning the metadata (i.e., the field names that make up the schema) is stored together with the data.

Command Line Shell

MongoDB has an interactive command shell. JavaScript programmers will love this because the syntax is JavaScript. To open the shell you simply type the command below (in recent MongoDB releases the shell is named mongosh instead):

mongo

Concepts

MongoDB records are called documents. Each MongoDB database (you can have many) includes collections, which are sets of JSON documents. Each document has an _id field, which holds an ObjectId created by MongoDB unless the programmer supplies a value.
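
An ObjectId is a 12-byte value usually shown as 24 hexadecimal characters, and its first 4 bytes hold the creation time in seconds since the Unix epoch. As a small illustration, this Python sketch decodes that timestamp using only the standard library. The helper name object_id_time is hypothetical, and the sample value is one of the IDs from the shell output later in this post.

```python
from datetime import datetime, timezone


def object_id_time(object_id_hex):
    """Return the creation time embedded in a 24-character ObjectId string."""
    if len(object_id_hex) != 24:
        raise ValueError("an ObjectId is written as 24 hexadecimal characters")
    # The first 8 hex characters (4 bytes) are seconds since the Unix epoch.
    seconds = int(object_id_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(object_id_time("59d1e9d5ccf50b62c5a7af55").isoformat())
```

Because the timestamp comes first, sorting documents by _id roughly sorts them by creation time.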

To illustrate, suppose we have one database called products.

We could have two collections to contain all products grouping them by where they are sold:

Collection | Documents and fields
Products sold in Europe | EAN (European barcode), weight in kilos
Products sold in the USA | UPC (American barcode), weight in pounds
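
A rough way to picture this hierarchy is as nested Python data: a database maps collection names to lists of documents. The sketch below is plain Python, not MongoDB itself, and the collection names and barcode values are invented for illustration. It shows that documents in different collections carry their own field names, so no shared table definition is needed.

```python
import json

# A plain-Python model of the hierarchy: database -> collections -> documents.
# Collection names and field values are hypothetical.
products_db = {
    "europe": [{"ean": "4006381333931", "weight_kg": 1.2}],
    "usa": [{"upc": "036000291452", "weight_lb": 2.6}],
}


def field_names(collection):
    """Return the set of field names used by the documents in a collection."""
    names = set()
    for document in collection:
        names.update(document.keys())
    return names


for name, collection in products_db.items():
    # Each document is self-describing: its field names travel with the data.
    print(name, sorted(field_names(collection)), json.dumps(collection))
```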

Data storage is cheap, while memory and CPU cost more. So some big data databases, like Cassandra and MongoDB, throw out the idea of a normalized database, which is one of the key principles of an RDBMS.

For example, with Oracle you would have a product category in a product record. The product category table contains the fields common to all products in that category. Each product record points to a product category record, so that such common data is not stored more than once:

Product table
Fields: product number, product category

Product Category table
Fields: product category, weight, color

But then you have to do a join operation if you want to know the color or weight of a product, and a join is computationally expensive, so it takes time. MongoDB would instead store the data like this:

Product documents
Fields: product number, product category, weight, color

RDBMS programmers say that creates duplication and wastes space. MongoDB programmers would say “yes,” but speed is more important than storage.

In other words, MongoDB records might look like this:

Product | Category | Color
1 | boy’s diapers small | blue
2 | boy’s diapers large | blue
3 | girl’s diapers small | pink

Obviously, once you know the category you know the color, so the color is repeated in every record.
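
The trade-off can be sketched in plain Python, with dictionaries standing in for tables and documents. This is only an illustration of the two layouts, not how either database is implemented, and the names and values are hypothetical.

```python
# Normalized layout: the shared attributes live once, in a category table,
# and each product row only references its category.
categories = {
    "boys small": {"color": "blue"},
    "girls small": {"color": "pink"},
}
products_normalized = [
    {"product": 1, "category": "boys small"},
    {"product": 3, "category": "girls small"},
]


def color_with_join(product_number):
    """Look up a product, then follow its category reference (the 'join')."""
    for row in products_normalized:
        if row["product"] == product_number:
            return categories[row["category"]]["color"]
    return None


# Denormalized layout: every product document repeats the category's fields.
products_denormalized = [
    {"product": 1, "category": "boys small", "color": "blue"},
    {"product": 3, "category": "girls small", "color": "pink"},
]


def color_without_join(product_number):
    """Read the color straight from the product document."""
    for doc in products_denormalized:
        if doc["product"] == product_number:
            return doc["color"]
    return None


print(color_with_join(3), color_without_join(3))
```

Both functions return the same answer. The denormalized version needs one lookup instead of two, at the cost of storing the color in every document.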

We will illustrate that by creating the products database and adding some products there. Paste these commands into the MongoDB shell.

First create the products database.

> use products
switched to db products

Then these two collections:

> db.createCollection("boyDiapers")
{ "ok" : 1 }
> db.createCollection("girlDiapers")
{ "ok" : 1 }
>

Then add some data:

db.boyDiapers.insert([
    {
        size: 1,
        color: 'blue',
        brand: 'toddler tyke'
    }
])
db.girlDiapers.insert([
    {
        size: 1,
        color: 'pink',
        brand: 'little angel'
    }
])

Notice two things. First, we use the format db.(collection).insert to add the document. Second, we use the brackets [], which indicate an array, so that we can add more than one document at a time. (Newer versions of the shell prefer insertOne and insertMany, but insert illustrates the idea.)
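
For readers who will later use the Python driver, the same batch insert can be sketched with PyMongo's insert_many, which takes a list of documents and returns a result whose inserted_ids attribute lists the generated _id values. The helper name add_diapers is hypothetical, and the function takes the collection as a parameter so that it makes no assumption about how you connect.

```python
def add_diapers(collection, documents):
    """Insert a list of documents in one call and return the new _id values.

    `collection` is expected to be a pymongo Collection, for example
    MongoClient().products.boyDiapers (connection details are an assumption).
    """
    if not documents:
        # insert_many rejects an empty list, so skip the call entirely.
        return []
    result = collection.insert_many(documents)
    return list(result.inserted_ids)


boy_diapers = [
    {"size": 1, "color": "blue", "brand": "toddler tyke"},
]
# Usage, assuming a MongoDB server is running locally:
#   from pymongo import MongoClient
#   ids = add_diapers(MongoClient().products.boyDiapers, boy_diapers)
```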

Now create some more data so that we can query for data:

db.boyDiapers.insert([
    {
        size: 2,
        color: 'white',
        brand: 'boy large white'
    }
])
db.girlDiapers.insert([
    {
        size: 2,
        color: 'while',
        brand: 'girl large'
    }
])

Selecting Data

If you use find with no arguments it lists all documents. Use pretty to display the results in easy-to-read indented JSON format:

> db.girlDiapers.find().pretty()
{
    "_id" : ObjectId("59d1e9d5ccf50b62c5a7af55"),
    "size" : 1,
    "color" : "pink",
    "brand" : "little angel"
}
{
    "_id" : ObjectId("59d1f022ccf50b62c5a7af57"),
    "size" : 1,
    "color" : "while",
    "brand" : "girl large"
}
{
    "_id" : ObjectId("59d1f565ccf50b62c5a7af59"),
    "size" : 2,
    "color" : "while",
    "brand" : "girl large"
}

To find all girl diapers of size 2, add an argument to the find statement:

db.girlDiapers.find({"size":2})
{ "_id" : ObjectId("59d1f565ccf50b62c5a7af59"), "size" : 2, "color" : "while", "brand" : "girl large" }
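
As a rough mental model only (not MongoDB's actual implementation), an equality filter such as {"size": 2} selects the documents whose named fields equal the given values. The sketch below applies that rule to plain Python dictionaries. The function names and sample data are hypothetical, and query operators such as $gt are deliberately left out.

```python
def matches(document, query):
    """Return True if every field in `query` equals the same field in `document`."""
    return all(
        field in document and document[field] == value
        for field, value in query.items()
    )


def find(collection, query=None):
    """Return the documents that satisfy an equality-only filter.

    With no query (or an empty one), every document is returned,
    mirroring find() with no arguments.
    """
    query = query or {}
    return [doc for doc in collection if matches(doc, query)]


girl_diapers = [
    {"size": 1, "color": "pink", "brand": "little angel"},
    {"size": 2, "color": "white", "brand": "girl large"},
]

print(find(girl_diapers, {"size": 2}))
```

Listing several fields in the filter narrows the result, since a document must match all of them.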

Note that you cannot search both the boy’s and girl’s diaper collections with one find, because find runs against a single collection. Instead, you have to program that in your application, which you would write using a driver (see below).

Normalized Documents

We just said that MongoDB throws out normalization because storage is cheap and computational power is expensive. But you can still create normalized documents.

For example, we can create a sales record for the size 2 “girl large” document like this, with the diaper field pointing to the diaper document by its ObjectId. That might make more sense in this case, as you would not want the diaper collection to grow many times larger each time you make a sale.

db.girlDiapers.insert([
    {
        "diaper" : ObjectId("59d1f565ccf50b62c5a7af59"),
        "price" : 45.2,
        "quantity" : 10,
        "sku" : "case"
    }
])
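
Reading a sales record then takes a second lookup to fetch the diaper it points to, which is the application-side equivalent of a join. Here is a sketch in plain Python, with strings standing in for ObjectId values and a dictionary keyed by _id standing in for the collection. All names and IDs are illustrative.

```python
# Diaper documents keyed by their _id (strings stand in for ObjectIds).
diapers_by_id = {
    "d1": {"_id": "d1", "size": 2, "brand": "girl large"},
}

# Sales records store only a reference to the diaper, not a copy of it.
sales = [
    {"diaper": "d1", "price": 45.2, "quantity": 10, "sku": "case"},
    {"diaper": "d1", "price": 45.2, "quantity": 3, "sku": "case"},
]


def resolve(sale):
    """Return a copy of the sale with the referenced diaper document attached."""
    resolved = dict(sale)
    resolved["diaper"] = diapers_by_id.get(sale["diaper"])
    return resolved


for sale in sales:
    full = resolve(sale)
    print(full["diaper"]["brand"], full["quantity"])
```

Each sale stays small because it holds only the reference. The cost is the extra lookup whenever the diaper details are needed.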

MongoDB Drivers

Of course, you probably would not use the command line shell for an application. Instead, you would write a program that interacts with MongoDB using any of the many drivers available. There are drivers for C++, C#, Java, Node.js, Scala, Python, and more.

For example, to use Python:

pip install pymongo

Then to query for size 2 diapers across the boy and girl collections:

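from pymongo import MongoClient

# Connect to a MongoDB server on localhost:27017 (the default)
client = MongoClient()
db = client.products

# Run the same query against every collection in the database.
# list_collection_names() replaces the older collection_names().
for name in db.list_collection_names():
    collection = db.get_collection(name)
    for doc in collection.find({"size": 2}):
        print(doc)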

Outputs:

{'size': 2.0, 'brand': 'boy large white', 'color': 'white', '_id': ObjectId('59d1f564ccf50b62c5a7af58')}
{'size': 2.0, 'brand': 'girl large', 'color': 'while', '_id': ObjectId('59d1f565ccf50b62c5a7af59')}

In the next post we will get into some more advanced MongoDB topics.


These postings are my own and do not necessarily represent BMC's position, strategies, or opinion.

Walker Rowe

Walker Rowe is an American freelance tech writer and programmer living in Chile. He specializes in big data, analytics, and cloud architecture. Find him on LinkedIn or at Southern Pacific Review, where he publishes short stories, poems, and news.