MongoDB Sharding Explained


MongoDB is designed to be scalable, meaning you can spread its data across a cluster of servers on a distributed platform. That is called sharding.

You assign different parts of the data to different servers using an indexed field, called the shard key. For example, records with the key value customers could be on one set of servers and vendors on the other. If you want a completely random distribution, use a hashed value of the key. You can also assign data to servers using ranges of key values.
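To make the difference between ranged and hashed placement concrete, here is a minimal sketch in plain JavaScript. It is a toy model, not MongoDB's implementation: the shard names, the range boundaries, and the FNV-1a hash are all assumptions chosen for illustration (MongoDB uses its own hash function for hashed keys).

```javascript
// Toy model of ranged vs. hashed placement; not MongoDB's real implementation.
const SHARDS = ["shardA", "shardB", "shardC"]; // hypothetical shard names

// Ranged placement: each shard owns a half-open range [min, max) of key values.
const RANGES = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 2000, shard: "shardB" },
  { min: 2000, max: Infinity, shard: "shardC" },
];

function shardByRange(key) {
  return RANGES.find((r) => key >= r.min && key < r.max).shard;
}

// FNV-1a, a simple 32-bit string hash, used here only for illustration.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value unsigned
  }
  return hash;
}

// Hashed placement: hash the key, then map the hash onto a shard.
function shardByHash(key) {
  return SHARDS[fnv1a(String(key)) % SHARDS.length];
}

// Count how many of the sequential keys 1000..1099 land on each shard.
function countPlacements(placeFn) {
  const counts = {};
  for (let key = 1000; key < 1100; key++) {
    const shard = placeFn(key);
    counts[shard] = (counts[shard] || 0) + 1;
  }
  return counts;
}

console.log("ranged:", countPlacements(shardByRange)); // all 100 keys fall in shardB's range
console.log("hashed:", countPlacements(shardByHash)); // keys are scattered by hash value
```

Sequential keys stay together under ranged placement, which is good for range queries but can load a single server. Hashing scatters neighboring keys, which trades range locality for a more even spread.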


Sharded Cluster

The basic architecture is as shown in this graphic from MongoDB. You communicate with the routers, and they consult the config servers to determine where to read and write data on the shard servers, i.e., where the data is stored. The config servers also run as a replica set, meaning their data is replicated so that there is an extra copy.
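The division of labor between the three roles can be sketched in a few lines of plain JavaScript. This is only a toy model under assumed names: the chunk boundaries, the shard names shard1 and shard2, and the functions locate, write, and read are hypothetical and are not part of any MongoDB API.

```javascript
// Config server role: metadata that maps key ranges (chunks) to shards.
const configMetadata = [
  { min: "a", max: "n", shard: "shard1" }, // keys from "a" up to, but not including, "n"
  { min: "n", max: "{", shard: "shard2" }, // "{" sorts right after "z"
];

// Shard server role: each shard stores its own documents.
const shards = { shard1: new Map(), shard2: new Map() };

// Router role: look up the owning shard in the metadata...
function locate(key) {
  const chunk = configMetadata.find((c) => key >= c.min && key < c.max);
  if (!chunk) throw new Error(`no chunk covers key ${key}`);
  return chunk.shard;
}

// ...then forward the write or read to that shard.
function write(key, doc) {
  const shardName = locate(key);
  shards[shardName].set(key, doc);
  return shardName;
}

function read(key) {
  return shards[locate(key)].get(key);
}

console.log(write("customers", { count: 3 })); // shard1
console.log(write("vendors", { count: 7 })); // shard2
console.log(read("vendors")); // { count: 7 }
```

The client only ever talks to the router functions. It never needs to know which shard holds a given key.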

You are supposed to have at least 3 config servers for production.  Here we show how to set up the shard configuration using the minimum of 1 router, 1 config server, and 1 shard server in a development environment.

Install Software and Set Up Virtual Machines

We need three VMs. In /etc/hosts we have the three shown below. You can use any IP addresses you want, but set the hostnames to the same names as shown below for the purposes of this tutorial.

We have:

mongomaster: this is the router.
mongoshard: this is the database server that we wish to shard, i.e., run on a cluster.
mongoconfig: this is both a config server and a database replica.

mongomaster mongoshard mongoconfig

  • Install Software

We are using Mongo 3.4.10 on Ubuntu 16.04. I mention that because Mongo changed the software so that a server can no longer be just a config server. Rather, it is a replica set member with the config server role. Other replicas can just be replicas.

Install the Mongo software on all three servers.

sudo apt-key adv --keyserver hkp:// --recv 0C49F3730359A14518585931BC711F9BA15703C6
echo "deb [ arch=amd64,arm64 ] xenial/mongodb-org/3.4 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.4.list
sudo apt-get update
sudo apt-get install -y mongodb-org

  • Create the Config Server Replica Set

rs is the name that we give to this replica set. The --configsvr switch means this will serve as both a replica server and a config server.

ssh mongoconfig
sudo mongod --configsvr --dbpath /data/configdb --replSet rs

Now, from any server, run mongo. We connect to port 27019 because that is the port mongod reported in stdout when we started it:

This node is mongoconfig:27019 in the config.

Connect with the mongo client and initialize the replica set:

mongo --host mongoconfig --port 27019
rs.initiate()
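Calling rs.initiate() with no argument lets MongoDB choose a default configuration for the set. If you would rather spell the configuration out, a document along the following lines could be passed in instead. This is a minimal sketch: buildReplicaSetConfig is a hypothetical helper, not a MongoDB function, and the values are the hostname and port used in this tutorial.

```javascript
// Build a replica set configuration document of the kind rs.initiate() accepts.
// hosts is a list of "hostname:port" strings.
// isConfigServer adds the configsvr flag, needed for a config server replica set.
function buildReplicaSetConfig(name, hosts, isConfigServer) {
  const config = {
    _id: name, // must match the name given to mongod with --replSet
    members: hosts.map((host, index) => ({ _id: index, host })),
  };
  if (isConfigServer) {
    config.configsvr = true;
  }
  return config;
}

const configReplicaSet = buildReplicaSetConfig("rs", ["mongoconfig:27019"], true);
console.log(JSON.stringify(configReplicaSet, null, 2));
// In the mongo shell the document would be used as: rs.initiate(configReplicaSet)
```

For production, the hosts list would hold the three config servers mentioned earlier rather than just one.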

Now run rs.status() to look at the status and verify that it is a config server:

"set" : "rs",
"date" : ISODate("2017-10-28T06:38:52.355Z"),
"myState" : 1,
"term" : NumberLong(1),
"configsvr" : true,

  • Start Mongod on the Shard Server

ssh mongoshard
sudo mongod --shardsvr --replSet rs

Now connect using the mongo client and initialize that one as well:

mongo --host mongoshard --port 27018
rs.initiate()
"info2" : "no configuration specified. Using a default configuration for the set",
"me" : "mongoshard:27018",
"ok" : 1

  • Start Mongos on the Router

ssh mongomaster
mongos --configdb rs/mongoconfig:27019

Notice this message in stdout, which gives us the port number for the next step:

waiting for connections on port 27017

  • Add Shards to the Cluster

mongo --host mongomaster --port 27017
sh.addShard( "rs/mongoshard:27018")
{ "shardAdded" : "rs", "ok" : 1 }
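The string passed to sh.addShard() has the form replica set name, a slash, then one or more host:port members separated by commas. The sketch below takes such a string apart so you can see the pieces. parseShardString is a hypothetical helper written for this illustration, not something the mongo shell provides.

```javascript
// Split a shard connection string such as "rs/mongoshard:27018"
// into the replica set name and its list of members.
function parseShardString(shardString) {
  const slash = shardString.indexOf("/");
  if (slash < 1) {
    throw new Error("expected the form replicaSetName/host:port");
  }
  const replicaSet = shardString.slice(0, slash);
  const members = shardString
    .slice(slash + 1)
    .split(",")
    .map((member) => {
      const [host, port] = member.trim().split(":");
      return { host, port: Number(port) };
    });
  return { replicaSet, members };
}

console.log(parseShardString("rs/mongoshard:27018"));
// { replicaSet: 'rs', members: [ { host: 'mongoshard', port: 27018 } ] }
```

The replica set name here is the one we gave mongod with --replSet, and the port is the one the shard server listens on.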

  • Enable Sharding for Database

Use any database name here. Remember that MongoDB will create the database if it does not exist. We use tobacco, since that was the name we used in the last tutorial, i.e., the one on using NodeJS with MongoDB.

sh.enableSharding("tobacco")
{ "ok" : 1 }

  • Check Status

Run sh.status() and check that the database is there and that sharding is enabled for it, i.e., that partitioned is true.

--- Sharding Status ---
  sharding version: {
    "_id" : 1,
    "minCompatibleVersion" : 5,
    "currentVersion" : 6,
    "clusterId" : ObjectId("59f425f12fdbabb0db68b690")
  }
  shards:
    { "_id" : "rs", "host" : "rs/mongoshard:27018", "state" : 1 }
  active mongoses:
    "3.4.10" : 1
  autosplit:
    Currently enabled: yes
  balancer:
    Currently enabled: yes
    Currently running: no
    Failed balancer rounds in last 5 attempts: 0
    Migration Results for the last 24 hours:
      No recent migrations
  databases:
    { "_id" : "tobacco", "primary" : "rs", "partitioned" : true }
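If you want to check the result in a script rather than by eye, the two entries that matter are the shard line and the database line from the status output above. Here is a minimal sketch of such a check in plain JavaScript. The entries are copied from that output, and checkDatabaseSharding is a hypothetical helper, not a MongoDB function.

```javascript
// Entries copied from the status output above.
const shardEntries = [{ _id: "rs", host: "rs/mongoshard:27018", state: 1 }];
const databaseEntries = [{ _id: "tobacco", primary: "rs", partitioned: true }];

// Returns a list of problems; an empty list means the database looks correctly set up.
function checkDatabaseSharding(dbName, databases, shardList) {
  const entry = databases.find((d) => d._id === dbName);
  if (!entry) {
    return [`database ${dbName} is not listed`];
  }
  const problems = [];
  if (entry.partitioned !== true) {
    problems.push(`sharding is not enabled for ${dbName}`);
  }
  if (!shardList.some((s) => s._id === entry.primary)) {
    problems.push(`primary shard ${entry.primary} is not a known shard`);
  }
  return problems;
}

console.log(checkDatabaseSharding("tobacco", databaseEntries, shardEntries)); // []
```

An empty list here corresponds to what we verified by eye: tobacco is listed, partitioned is true, and its primary shard rs is one of the shards in the cluster.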

These postings are my own and do not necessarily represent BMC's position, strategies, or opinion.

Walker Rowe

Walker Rowe is an American freelance tech writer and programmer living in Chile. He specializes in big data, analytics, and cloud architecture. Find him on LinkedIn or at Southern Pacific Review, where he publishes short stories, poems, and news.