DynamoDB Bulk Insert: An Easy Tutorial

Walker Rowe

In this article, we’ll show how to do bulk inserts in DynamoDB.

Bulk inserts and deletes

DynamoDB can handle bulk inserts and bulk deletes in a single batch write request. We use the AWS CLI since it’s language agnostic. The request file can carry up to 16 MB of data but cannot contain more than 25 request operations.

Request operations can be:

  • PutRequest
  • DeleteRequest

The bulk request does not handle updates.

Data from IMDB

To illustrate, we have pulled 24 items from the IMDB (Internet Movie Database) and put them into JSON format. You can download that data from here.

The format for the bulk operation is:

{ "table name": [
    {
        "request operation": {
            "Item": {
                (put your item here in AttributeValue format)
            }
        }
    }
]
}
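The same structure holds for deletes, with one difference: a DeleteRequest carries a "Key" (the item's primary key) instead of an "Item". The following is a minimal Python sketch of how such a request file could be assembled. The helper names and the key tt0000001 are hypothetical, chosen only for illustration.

```python
import json


def put_request(item):
    """Wrap an item (already in AttributeValue format) in a PutRequest."""
    return {"PutRequest": {"Item": item}}


def delete_request(key):
    """Wrap a primary key (in AttributeValue format) in a DeleteRequest."""
    return {"DeleteRequest": {"Key": key}}


def build_request_items(table_name, operations):
    """Return the top-level object: the table name mapped to its operations."""
    return {table_name: list(operations)}


request = build_request_items("title", [
    put_request({"tconst": {"S": "tt0276132"}, "titleType": {"S": "movie"}}),
    # Hypothetical key, used only to show the DeleteRequest shape.
    delete_request({"tconst": {"S": "tt0000001"}}),
])

# The printed JSON can be saved to a file and passed to the CLI.
print(json.dumps(request, indent=2))
```

Building the file from code rather than by hand makes it harder to misplace a brace or a quote.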

Here is an example:

{
	"title": [{
		"PutRequest": {
			"Item": {
				"tconst": {
					"S": "tt0276132"
				},
				"titleType": {
					"S": "movie"
				},
				"primaryTitle": {
					"S": "The Fetishist"
				},
				"originalTitle": {
					"S": "The Fetishist"
				},
				"isAdult": {
					"S": "0"
				},
				"startYear": {
					"S": "2019"
				},
				"endYear": {
					"S": "\\N"
				},
				"runtimeMinutes": {
					"S": "\\N"
				},
				"genres": {
					"S": "Animation"
				}
			}
		}
	}]
}
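Because one request can hold at most 25 operations, a larger data set has to be split across several requests. Here is a rough Python sketch of that split. The function name and the sixty placeholder items are assumptions made for the example, not part of the IMDB data.

```python
MAX_OPERATIONS = 25  # limit on request operations in one batch write


def chunk_put_requests(table_name, items, batch_size=MAX_OPERATIONS):
    """Yield request objects holding at most batch_size PutRequests each."""
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        yield {table_name: [{"PutRequest": {"Item": item}} for item in batch]}


# Sixty placeholder items keyed the same way as the title table.
items = [{"tconst": {"S": f"tt{n:07d}"}} for n in range(60)]

batches = list(chunk_put_requests("title", items))
print([len(batch["title"]) for batch in batches])  # [25, 25, 10]
```

Each yielded object can be written to its own JSON file and loaded with a separate batch write call.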

If you are running DynamoDB locally then start it like this:

java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb

Create a table like this:

aws dynamodb create-table \
    --table-name title \
    --attribute-definitions AttributeName=tconst,AttributeType=S \
    --key-schema AttributeName=tconst,KeyType=HASH \
    --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5 \
    --endpoint-url http://localhost:8000

Save the IMDB data in the file 100.basics.json, then load it like this:

aws dynamodb batch-write-item \
    --endpoint-url http://localhost:8000 \
    --request-items file:///Users/walkerrowe/Documents/imdb/100.basics.json \
    --return-consumed-capacity TOTAL \
    --return-item-collection-metrics SIZE

It responds:

{
    "UnprocessedItems": {}, 
    "ConsumedCapacity": [
        {
            "CapacityUnits": 23.0, 
            "TableName": "title"
        }
    ]
}
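An empty UnprocessedItems means every operation in the file was applied. When DynamoDB cannot finish a batch (for example, because of throttling), it returns the leftover operations under UnprocessedItems in the same shape as the request, so they can be submitted again. The sketch below, with a hypothetical function name, shows one way to check a saved response for leftovers.

```python
import json


def summarize_response(response_text):
    """Parse batch-write-item output; return (unprocessed count, capacity units)."""
    response = json.loads(response_text)
    unprocessed = response.get("UnprocessedItems", {})
    # UnprocessedItems maps each table name to a list of leftover operations.
    unprocessed_count = sum(len(ops) for ops in unprocessed.values())
    capacity = sum(entry.get("CapacityUnits", 0.0)
                   for entry in response.get("ConsumedCapacity", []))
    return unprocessed_count, capacity


sample = """
{
    "UnprocessedItems": {},
    "ConsumedCapacity": [
        {"CapacityUnits": 23.0, "TableName": "title"}
    ]
}
"""

unprocessed_count, capacity = summarize_response(sample)
print(unprocessed_count, capacity)  # 0 23.0
```

If the count is not zero, the UnprocessedItems object can be saved as a new request file and loaded again.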

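ConsumedCapacity reports how many write capacity units the request used. A write of an item up to 1 KB normally costs one unit, so for small items like these the figure roughly tracks the number of records written. You can verify that the load worked by querying for one of the items: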

aws dynamodb query \
    --endpoint-url http://localhost:8000 \
    --table-name title \
    --key-condition-expression "tconst = :tconst" \
    --expression-attribute-values '{ ":tconst":{"S":"tt0276132"}}'

Attribute Types and AttributeValue

Here we show some of the AttributeValue types, meaning the data types that DynamoDB supports. Those include:

  • S
  • BOOL
  • L
  • M
  • etc.

Note: Numeric values are also wrapped in quotes. For example, the number 35 is written as {"N": "35"}.

Attribute type    Description

S       String. Notice that a date is stored as a string in ISO 8601 format, like this:

        "currentTime": {
            "S": "2020-07-24T09:25:49+0000"
        }

BOOL    Boolean. Use true or false.

L       List. A list of attribute values that have no attribute names:

        "other": {
            "L": [{"S": "Paris"},
                  {"N": "13000000"}]
        }

M       Map. This is like a JSON object, except that each value is an attribute value. So, it’s like a list of named attributes:

        "map": {
            "M": {"Name": {"S": "Joe"},
                  "Age": {"N": "35"}}
        }

Here is an example showing how to use those DynamoDB attribute types.

{
	"title": [{
		"PutRequest": {
			"Item": {
				"tconst": {
					"S": "tt9276132"
				},
				"titleType": {
					"S": "movie"
				},
				"primaryTitle": {
					"S": "Zorba"
				},
				"isAdult": {
					"BOOL": true
				},
				"Years": {
					"NS": ["2019","2020"]
				},
				"actors": {
					"SS": ["Anthony Quinn", "Marcel Marciano", "David Niven", "Peter Sellers"]
				},
				"currentTime": {
					"S": "2020-07-24T09:25:49+0000"
				},
				"other": {
					"L": [{"S": "Paris"}, {"N": "13000000"}]
				},
				"map": {
					"M": {"Name": {"S": "Joe"}, "Age": {"N": "35"}}
				}
			}
		}
	}]
}
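Writing the type wrappers by hand gets tedious for more than a few items. As an illustration only, here is a small Python sketch that converts native values into AttributeValue format. The function name and the type mapping are assumptions of this sketch, and it does not cover binary types.

```python
from decimal import Decimal

NUMBER_TYPES = (int, float, Decimal)


def to_attribute_value(value):
    """Convert a native Python value to DynamoDB AttributeValue format."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"BOOL": value}
    if value is None:
        return {"NULL": True}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, NUMBER_TYPES):
        return {"N": str(value)}  # numbers are sent as strings
    if isinstance(value, (set, frozenset)):
        if not value:
            raise ValueError("DynamoDB sets cannot be empty")
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(isinstance(v, NUMBER_TYPES) and not isinstance(v, bool)
               for v in value):
            return {"NS": [str(v) for v in sorted(value)]}
        raise TypeError("sets must hold only strings or only numbers")
    if isinstance(value, (list, tuple)):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {str(k): to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Native values mirroring part of the example item above.
native = {
    "tconst": "tt9276132",
    "isAdult": True,
    "Years": {2019, 2020},
    "other": ["Paris", 13000000],
    "map": {"Name": "Joe", "Age": 35},
}
item = {name: to_attribute_value(value) for name, value in native.items()}
print(item["map"])  # {'M': {'Name': {'S': 'Joe'}, 'Age': {'N': '35'}}}
```

Sets are sorted here only to make the output predictable. DynamoDB does not preserve order within a set.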

These postings are my own and do not necessarily represent BMC's position, strategies, or opinion.

About the author

Walker Rowe

Walker Rowe is an American freelance tech writer and programmer living in Cyprus. He writes tutorials on analytics and big data and specializes in documenting SDKs and APIs. He is the founder of the Hypatia Academy Cyprus, an online school to teach secondary school children programming.