DynamoDB Guide – BMC Software | Blogs
https://s7280.pcdn.co
Wed, 16 Dec 2020 08:24:28 +0000

DynamoDB Advanced Queries: A Cheat Sheet
https://s7280.pcdn.co/dynamodb-advanced-queries/
Fri, 31 Jul 2020 07:42:41 +0000

This article covers advanced queries in Amazon DynamoDB. It builds upon DynamoDB basic queries.

(This tutorial is part of our DynamoDB Guide. Use the right-hand menu to navigate.)

DynamoDB Query Rules

Remember the basic rules for querying in DynamoDB:

  • A query includes a key condition and, optionally, a filter expression.
  • The key condition selects the partition key and, optionally, a sort key.
  • The partition key condition can only be equality (=). If you need other operators, such as ranges or prefixes, use a composite primary key and apply them to the sort key.
  • Having selected a subset of the table with the key condition, you can narrow it down with a filter expression. That can run against any attribute other than the key attributes.
  • Key conditions on the sort key accept =, <, <=, >, >=, BETWEEN, and begins_with. Filter expressions accept those and more, including contains, IN, attribute_exists, and attribute_not_exists, none of which can be used in a key condition.
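The order in which these rules apply can be sketched in plain Python. This is only an in-memory illustration, not how DynamoDB is implemented: the query helper and the sample items are hypothetical, and only the attribute names come from the sample data used in this article.

```python
# In-memory stand-in for a table; the items are made up for illustration.
ITEMS = [
    {"tconst": "movie", "primaryTitle": "Alpha", "genres": "Drama"},
    {"tconst": "movie", "primaryTitle": "Avalon", "genres": "Comedy"},
    {"tconst": "movie", "primaryTitle": "Brooklyn Nights", "genres": "Drama"},
    {"tconst": "short", "primaryTitle": "Arrow", "genres": "Drama"},
]


def query(items, partition_value, sort_condition=None, filter_condition=None):
    """Mimic the order in which a query narrows its results (hypothetical helper)."""
    # Step 1: the partition key is matched with strict equality.
    selected = [i for i in items if i["tconst"] == partition_value]
    # Step 2: an optional sort-key condition narrows the partition.
    if sort_condition is not None:
        selected = [i for i in selected if sort_condition(i["primaryTitle"])]
    # Step 3: an optional filter runs on whatever the key condition selected.
    if filter_condition is not None:
        selected = [i for i in selected if filter_condition(i)]
    # Return the matches ordered by sort key.
    return sorted(selected, key=lambda i: i["primaryTitle"])


result = query(
    ITEMS,
    "movie",
    sort_condition=lambda title: title.startswith("A"),
    filter_condition=lambda item: item["genres"] == "Drama",
)
print([item["primaryTitle"] for item in result])  # ['Alpha']
```

The filter only ever sees the items that the key condition has already selected, which is why it cannot replace the key condition.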

Load sample data

To perform these advanced queries, we need some data to work with. Download this sample data from GitHub, which is data from IMDB that I’ve slightly modified.

Create a table

In this document we are using DynamoDB on a local machine, so we specify --endpoint-url http://localhost:8000.

Create the title table like this:

aws dynamodb create-table \
    --endpoint-url http://localhost:8000 \
    --table-name title \
    --attribute-definitions AttributeName=tconst,AttributeType=S \
                            AttributeName=primaryTitle,AttributeType=S \
    --key-schema AttributeName=tconst,KeyType=HASH \
                 AttributeName=primaryTitle,KeyType=RANGE \
    --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5

Notice that the primary key is the combination of the attributes tconst (partition key) and primaryTitle (sort key).

Our sample data looks like the excerpt below. Every item has the same partition key value, movie, and the movie's primaryTitle is the sort key.

{
          "tconst": {
            "S": "movie"
          },
           "primaryTitle": {
            "S": "Travel Daze"
          },

Then load the data like this:

aws dynamodb batch-write-item \
    --endpoint-url http://localhost:8000 \
    --request-items file:///Users/walkerrowe/Documents/imdb/movies.json \
    --return-consumed-capacity TOTAL \
    --return-item-collection-metrics SIZE

Between query

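Here we use a space (the first printable ASCII character) and ÿ (U+00FF, the last character in Latin-1) as the bounds of the range. DynamoDB orders strings by their UTF-8 bytes, so this range selects every title that starts with a printable ASCII or Latin-1 character.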

aws dynamodb query \
    --endpoint-url http://localhost:8000 \
    --table-name title \
    --key-condition-expression "tconst = :tconst and primaryTitle BETWEEN :fromTitle AND :toTitle" \
    --expression-attribute-values '{
        ":tconst": {"S": "movie"},
        ":fromTitle": {"S": " "},
        ":toTitle": {"S": "ÿ"}
    }'
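To see which titles such a range would cover, here is a small Python sketch. It assumes that string sort keys are compared by their UTF-8 bytes. The utf8_between helper and the last two titles are made up for illustration.

```python
def utf8_between(value, low, high):
    """Return True if value falls in [low, high] when compared as UTF-8 bytes."""
    return low.encode("utf-8") <= value.encode("utf-8") <= high.encode("utf-8")


titles = ["Travel Daze", "Zorba", "élan", "Ωmega"]
for title in titles:
    print(title, utf8_between(title, " ", "ÿ"))
# Travel Daze True
# Zorba True
# élan True
# Ωmega False
```

A title that starts with a character beyond U+00FF, such as the Greek Ω, falls outside the range, so the range only covers every title when the data stays within Latin-1.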

Begins with query

aws dynamodb query \
    --endpoint-url http://localhost:8000 \
    --table-name title \
    --key-condition-expression "tconst = :tconst and begins_with(primaryTitle, :beginsWith)" \
    --expression-attribute-values '{
        ":tconst": {"S": "movie"},
        ":beginsWith": {"S": "A"}
    }'

Contains query

Here we put the condition in a filter expression rather than in the key condition. As mentioned above, you cannot use contains in a key condition, so it has to go in the filter.

aws dynamodb query \
    --endpoint-url http://localhost:8000 \
    --table-name title \
    --key-condition-expression "tconst = :tconst" \
    --filter-expression 'contains(originalTitle, :containsStr)' \
    --expression-attribute-values '{
        ":tconst": {"S": "movie"},
        ":containsStr": {"S": "Brooklyn"}
    }'

Attribute exists query

aws dynamodb query \
    --endpoint-url http://localhost:8000 \
    --table-name title \
    --key-condition-expression "tconst = :tconst" \
    --filter-expression 'attribute_exists(genres)' \
    --expression-attribute-values '{
        ":tconst": {"S": "movie"}
    }'

Attribute not exists query

aws dynamodb query \
    --endpoint-url http://localhost:8000 \
    --table-name title \
    --key-condition-expression "tconst = :tconst" \
    --filter-expression 'attribute_not_exists(genres)' \
    --expression-attribute-values '{
        ":tconst": {"S": "movie"}
    }'

In query

aws dynamodb query \
    --endpoint-url http://localhost:8000 \
    --table-name title \
    --key-condition-expression "tconst = :tconst" \
    --filter-expression "genres IN (:inDrama, :inComedy)" \
    --expression-attribute-values '{
        ":tconst": {"S": "movie"},
        ":inDrama": {"S": "Drama"},
        ":inComedy": {"S": "Comedy"}
    }'

String set query

Our data contains data like this:

"actors": {
	"SS": ["Anthony Quinn", "Marcel Marciano", "David Niven", "Peter Sellers"]
},

An equality condition on a string set (SS) attribute matches only when the two sets are identical, so the value you supply must contain all of those strings.

aws dynamodb query \
    --endpoint-url http://localhost:8000 \
    --table-name title \
    --key-condition-expression "tconst = :tconst" \
    --filter-expression "actors = :actors" \
    --expression-attribute-values '{
        ":tconst": {"S": "movie"},
        ":actors": {"SS": ["Anthony Quinn", "Marcel Marciano", "David Niven", "Peter Sellers"]}
    }'
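Whole-set equality behaves much like comparing Python sets, so a plain Python sketch (no DynamoDB involved) may help. The variable names are made up. To match a single member instead of the whole set, DynamoDB offers the contains function in a filter expression, which corresponds to the membership test on the last line.

```python
stored_actors = {"Anthony Quinn", "Marcel Marciano", "David Niven", "Peter Sellers"}

# Equality compares the whole set; the order of the members does not matter.
same_members = {"Peter Sellers", "David Niven", "Marcel Marciano", "Anthony Quinn"}
partial = {"Anthony Quinn", "David Niven"}

print(stored_actors == same_members)     # True
print(stored_actors == partial)          # False
print("David Niven" in stored_actors)    # True
```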

Boolean query

Boolean data is stored like this:

"isComedy": {
	"BOOL": false
}

So, you query it like this:

aws dynamodb query \
    --endpoint-url http://localhost:8000 \
    --table-name title \
    --key-condition-expression "tconst = :tconst" \
    --filter-expression "isComedy = :isComedy" \
    --expression-attribute-values '{
        ":tconst": {"S": "movie"},
        ":isComedy": {"BOOL": true}
    }'

Query map type

The map query is similar to the nested query (see the next item).

aws dynamodb query \
    --endpoint-url http://localhost:8000 \
    --table-name title \
    --key-condition-expression "tconst = :tconst" \
    --filter-expression "aditionalInfo = :aditionalInfo" \
    --expression-attribute-values '{
        ":tconst": {"S": "movie"},
        ":aditionalInfo": {
            "M": {"Location": {"S": "Bost"}, "Language": {"S": "FR"}}
        }
    }'

Nested query

A nested DynamoDB attribute is one that sits inside a map. You refer to the element using the dot notation parent.child. For the data below, you would write aditionalInfo.Location.

"aditionalInfo": {
            "M": {"Location": {"S": "Bost"}, "Language": {"S": "FR"}}
        },

Location is a reserved word, so we have to give it an alias using:

--expression-attribute-names '{"#loc": "Location"}'

And here is the nested DynamoDB query:

aws dynamodb query \
    --endpoint-url http://localhost:8000 \
    --table-name title \
    --key-condition-expression "tconst = :tconst" \
    --filter-expression "aditionalInfo.#loc = :loc" \
    --expression-attribute-names '{"#loc": "Location"}' \
    --expression-attribute-values '{
        ":tconst": {"S": "movie"},
        ":loc": {"S": "Bost"}
    }'
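As a rough illustration of how a path such as aditionalInfo.#loc picks out a nested value, the sketch below walks an item held in AttributeValue format. The resolve_path helper is hypothetical and handles only maps, not list indexes.

```python
def resolve_path(item, path, names):
    """Follow a dotted document path through an item in AttributeValue format."""
    node = {"M": item}  # treat the item itself as a top-level map
    for part in path.split("."):
        # Swap a #placeholder for the real attribute name it stands for.
        attribute = names.get(part, part)
        node = node["M"][attribute]
    return node


item = {
    "tconst": {"S": "movie"},
    "aditionalInfo": {
        "M": {"Location": {"S": "Bost"}, "Language": {"S": "FR"}}
    },
}
names = {"#loc": "Location"}

print(resolve_path(item, "aditionalInfo.#loc", names))  # {'S': 'Bost'}
```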

Projection Expression

Use a projection expression to limit the attributes that DynamoDB returns. By default, it returns all attributes.

aws dynamodb query \
    --endpoint-url http://localhost:8000 \
    --table-name title \
    --key-condition-expression "tconst = :tconst" \
    --filter-expression "aditionalInfo = :aditionalInfo" \
    --expression-attribute-values '{
        ":tconst": {"S": "movie"},
        ":aditionalInfo": {
            "M": {"Location": {"S": "Bost"}, "Language": {"S": "FR"}}
        }
    }' \
    --projection-expression "originalTitle, runtimeMinutes"

DynamoDB Bulk Insert: An Easy Tutorial
https://www.bmc.com/blogs/dynamodb-bulk-insert/
Thu, 30 Jul 2020 00:00:44 +0000

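In this article, we’ll show how to do bulk inserts in DynamoDB. If you’re new to Amazon DynamoDB, start with these resources:

  • Introduction to Amazon DynamoDB
  • How To Add Data to Amazon DynamoDB
  • How To Query Amazon DynamoDB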


Bulk inserts and deletes

DynamoDB can handle bulk inserts and bulk deletes. We use the CLI since it’s language agnostic. A single request file can be up to 16 MB but cannot contain more than 25 request operations.

Request operations can be:

  • PutRequest
  • DeleteRequest

The bulk request does not handle updates.

Data from IMDB

To illustrate, we have pulled 24 items from the IMDB (Internet Movie Database) and put them into JSON format. You can download that data from here.

The format for the bulk operation is:

{ "table name": [
      { "request operation": {
            "Item": {
                (put your item here in AttributeValue format)
            }
        }
      }
  ]
}

Here is an example:

{
	"title": [{
		"PutRequest": {
			"Item": {
				"tconst": {
					"S": "tt0276132"
				},
				"titleType": {
					"S": "movie"
				},
				"primaryTitle": {
					"S": "The Fetishist"
				},
				"originalTitle": {
					"S": "The Fetishist"
				},
				"isAdult": {
					"S": "0"
				},
				"startYear": {
					"S": "2019"
				},
				"endYear": {
					"S": "\\N"
				},
				"runtimeMinutes": {
					"S": "\\N"
				},
				"genres": {
					"S": "Animation"
				}
			}
		}
	}]
}
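Since one request can hold at most 25 operations, a larger data set has to be split into several request files. The Python sketch below shows one way to do that. The build_batches helper and the generated items are made up for illustration. Each resulting dictionary has the same shape as the example above and could be written to its own file with json.dump.

```python
import json

MAX_REQUESTS_PER_BATCH = 25  # the per-request limit described in this article


def build_batches(table_name, items, batch_size=MAX_REQUESTS_PER_BATCH):
    """Split items (already in AttributeValue format) into request payloads."""
    batches = []
    for start in range(0, len(items), batch_size):
        chunk = items[start:start + batch_size]
        batches.append(
            {table_name: [{"PutRequest": {"Item": item}} for item in chunk]}
        )
    return batches


# Sixty made-up items, just to exercise the splitting.
items = [{"tconst": {"S": f"tt{n:07d}"}} for n in range(60)]
batches = build_batches("title", items)

print([len(batch["title"]) for batch in batches])  # [25, 25, 10]
print(json.dumps(batches[0]["title"][0]))
```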

If you are running DynamoDB locally then start it like this:

java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb

Create a table like this:

aws dynamodb create-table \
    --table-name title \
    --attribute-definitions AttributeName=tconst,AttributeType=S \
    --key-schema AttributeName=tconst,KeyType=HASH \
    --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5 \
    --endpoint-url http://localhost:8000

Then load the data like this, having saved the IMDB data in the file 100.basics.json.

aws dynamodb batch-write-item \
    --endpoint-url http://localhost:8000 \
    --request-items file:///Users/walkerrowe/Documents/imdb/100.basics.json \
    --return-consumed-capacity TOTAL \
    --return-item-collection-metrics SIZE

It responds:

{
    "UnprocessedItems": {}, 
    "ConsumedCapacity": [
        {
            "CapacityUnits": 23.0, 
            "TableName": "title"
        }
    ]
}
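UnprocessedItems is empty here, but it can come back non-empty when DynamoDB could not write everything, and those requests then need to be sent again. Below is a minimal Python sketch of such a retry loop. The write_with_retry helper and the fake sender are hypothetical. The send argument stands for whatever actually submits the request.

```python
import time


def write_with_retry(send, request_items, max_attempts=5, base_delay=0.1):
    """Resend whatever comes back in UnprocessedItems, with exponential backoff.

    `send` is any callable that takes a request-items dict and returns a
    response dict shaped like the one above.
    """
    pending = request_items
    for attempt in range(max_attempts):
        response = send(pending)
        pending = response.get("UnprocessedItems", {})
        if not pending:
            return True  # everything was written
        time.sleep(base_delay * (2 ** attempt))
    return False  # gave up with items still unwritten


# A fake sender for demonstration: it rejects the request once, then accepts it.
calls = []


def fake_send(request_items):
    calls.append(request_items)
    if len(calls) == 1:
        return {"UnprocessedItems": request_items}
    return {"UnprocessedItems": {}}


request = {"title": [{"PutRequest": {"Item": {"tconst": {"S": "tt0276132"}}}}]}
ok = write_with_retry(fake_send, request, base_delay=0.01)
print(ok, len(calls))  # True 2
```

With Boto3, the sender could be a small function that calls client.batch_write_item(RequestItems=...) and returns the response.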

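The response reports the write capacity consumed rather than a record count. Each item here is under 1 KB and costs one write capacity unit, so the number of units should match the number of items written. An empty UnprocessedItems means nothing was left unwritten. You can check that the load worked like this: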

aws dynamodb query \
    --endpoint-url http://localhost:8000 \
    --table-name title \
    --key-condition-expression "tconst = :tconst" \
    --expression-attribute-values '{ ":tconst":{"S":"tt0276132"}}'

Attribute Types and AttributeValue

Here we show some of the attribute (data) types that DynamoDB supports. In the JSON, each value is wrapped in an AttributeValue that names its type. Those types include:

  • S
  • BOOL
  • L
  • M
  • etc.

Note: Even with numeric values you wrap them in quotes.

Attribute type    Description

S                 String. A date is stored as an ISO 8601 string, like this:
                  "currentTime": {"S": "2020-07-24T09:25:49+0000"}

BOOL              Boolean. Use true or false.

L                 List. Each element is an AttributeValue with its own type but no attribute name:
                  "other": {"L": [{"S": "Paris"}, {"N": "13000000"}]}

M                 Map. This is like a JSON object, except that each member is an AttributeValue. So, it's like a list of named attributes:
                  "map": {"M": {"Name": {"S": "Joe"}, "Age": {"N": "35"}}}

Here is an example showing how to use those DynamoDB attribute types.

{
	"title": [{
		"PutRequest": {
			"Item": {
				"tconst": {
					"S": "tt9276132"
				},
				"titleType": {
					"S": "movie"
				},
				"primaryTitle": {
					"S": "Zorba"
				},
				"isAdult": {
					"BOOL": true
				},
				"Years": {
					"NS": ["2019","2020"]
				},
				"actors": {
					"SS": ["Anthony Quinn", "Marcel Marciano", "David Niven", "Peter Sellers"]
				},
				"currentTime": {
					"S": "2020-07-24T09:25:49+0000"
				},
				"other": {
				"L": [{"S": "Paris"}, 
				     {"N": "13000000"}]
				},
				"map": {
				"M": {"Name": {"S": "Joe"}, "Age": {"N": "35"}}
				}
			}
		}
	}]
}
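Writing the type wrappers by hand gets tedious, so here is a minimal Python sketch that converts plain Python values into this format. The to_attribute_value helper is hypothetical and covers only the types shown in this article. Boto3 ships its own serializer (boto3.dynamodb.types.TypeSerializer), which is the better choice in real code.

```python
from decimal import Decimal


def _is_number(value):
    """True for ints and Decimals, but not for booleans."""
    return isinstance(value, (int, Decimal)) and not isinstance(value, bool)


def to_attribute_value(value):
    """Convert a plain Python value to DynamoDB's AttributeValue JSON shape."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"BOOL": value}
    if _is_number(value):
        return {"N": str(value)}  # numbers travel as quoted strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (set, frozenset)):
        members = sorted(value, key=str)
        if members and all(isinstance(m, str) for m in members):
            return {"SS": members}
        if members and all(_is_number(m) for m in members):
            return {"NS": [str(m) for m in members]}
        raise TypeError("sets must be non-empty and hold only strings or only numbers")
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(to_attribute_value({"Name": "Joe", "Age": 35}))
# {'M': {'Name': {'S': 'Joe'}, 'Age': {'N': '35'}}}
print(to_attribute_value(["Paris", 13000000]))
# {'L': [{'S': 'Paris'}, {'N': '13000000'}]}
```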

How To Query Amazon DynamoDB
https://www.bmc.com/blogs/dynamodb-queries/
Thu, 23 Jul 2020 00:00:16 +0000

In this article, we explain the basics of DynamoDB queries. The query syntax is quite different from other databases, so it takes some time to get used to it.


Hash key in DynamoDB

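The primary reason for that complexity is that you cannot run a query without the hash key, which is also called the partition key. A query therefore never reads the entire table. For that, DynamoDB has a separate Scan operation, which reads every item the way a full table scan does in other databases and is correspondingly slower and more expensive.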

However, the primary key can include a second attribute, which is called a sort key. The condition on the partition key must be = (equals). The operators for the sort key can be:

  • =
  • <
  • <=
  • >
  • >=
  • begins_with
  • between

Each query has two distinct parts:

  1. The key condition (i.e., the partition key and, optionally, the sort key)
  2. The filter expression (a condition on any other attribute you want)

Load sample data

We give some examples below, but first we need some data:

  1. Install DynamoDB and run it locally, as we explained in How To Add Data in DynamoDB.
  2. Install node so we can run some JavaScript code.
  3. Install the Amazon SDK using npm, which is part of node:
npm install aws-sdk

Run these programs from the Amazon JavaScript examples:

  1. Create the Movies table by running MoviesCreateTable.js.
  2. Download the sample data from here and unzip it.
  3. Load some data by running MoviesLoadData.js.

Inspect the data

Take a look at the top of the data file. It is a movie database that includes nested JSON fields and arrays. So, it is designed to be used as a teaching exercise.

{
        "year": 2013,
        "title": "Rush",
        "info": {
            "directors": ["Ron Howard"],
            "release_date": "2013-09-02T00:00:00Z",
            "rating": 8.3,
            "genres": [
                "Action",
                "Biography",
                "Drama",
                "Sport"
            ],
            "image_url": "http://ia.media-imdb.com/images/M/MV5BMTQyMDE0MTY0OV5BMl5BanBnXkFtZTcwMjI2OTI0OQ@@._V1_SX400_.jpg",
            "plot": "A re-creation of the merciless 1970s rivalry between Formula One rivals James Hunt and Niki Lauda.",
            "rank": 2,
            "running_time_secs": 7380,
            "actors": [
                "Daniel Bruhl",
                "Chris Hemsworth",
                "Olivia Wilde"
            ]
        }
    }

Describe table

You can verify that the table exists and see how many items it holds using:

aws dynamodb describe-table   --table-name Movies   --endpoint-url http://localhost:8000

Here is the output. It describes the table (key schema, throughput, and item count) rather than showing a record.

{
    "Table": {
        "TableArn": "arn:aws:dynamodb:ddblocal:000000000000:table/Movies", 
        "AttributeDefinitions": [
            {
                "AttributeName": "year", 
                "AttributeType": "N"
            }, 
            {
                "AttributeName": "title", 
                "AttributeType": "S"
            }
        ], 
        "ProvisionedThroughput": {
            "NumberOfDecreasesToday": 0, 
            "WriteCapacityUnits": 10, 
            "LastIncreaseDateTime": 0.0, 
            "ReadCapacityUnits": 10, 
            "LastDecreaseDateTime": 0.0
        }, 
        "TableSizeBytes": 2095292, 
        "TableName": "Movies", 
        "TableStatus": "ACTIVE", 
        "KeySchema": [
            {
                "KeyType": "HASH", 
                "AttributeName": "year"
            }, 
            {
                "KeyType": "RANGE", 
                "AttributeName": "title"
            }
        ], 
        "ItemCount": 4608, 
        "CreationDateTime": 1595056037.797
    }
}

Query structure

We will use the AWS CLI (command line interface) to execute queries. If you are using any of the programming language SDKs, the principles are the same. The only part that varies is the syntax.

Queries are composed of two parts:

  1. Key condition expression
  2. Filter expression

Key condition expression

The key condition expression can contain the partition key and, optionally, the sort key. Together they make up the primary key. In the movies database, the partition key is year and the sort key is title. The partition key condition only allows equals (=). The sort key allows:

  • =
  • <
  • <=
  • >
  • >=
  • begins_with
  • between

The Amazon sample data includes a reserved word, year. So, we have to deal with that complexity up front.

The query is below. We use a backslash (\) at the end of each line because that’s the continuation character in the bash shell. We put single quotes (') around the JSON so that the shell passes it through unchanged, double quotes included, and so that it can span lines.

Here is the query:

aws dynamodb query \
    --endpoint-url http://localhost:8000 \
    --table-name Movies \
    --key-condition-expression "#yr = :yyyy" \
    --expression-attribute-names '{"#yr": "year"}' \
    --expression-attribute-values '{ ":yyyy":{"N":"2010"}}'

Let’s break down each part of the query:

--key-condition-expression

#yr = :yyyy

The format of this expression is:

partition key name = placeholder

And you could add a sort key, which for this database is title. But we don’t want it here as we are looking for a title and not searching by one.

The pound (#) sign marks a placeholder for an attribute name. We define it in the parameter expression-attribute-names. Usually you just write the attribute name, but you cannot write year directly in an expression because it is a reserved word.

The colon (:) marks a placeholder for a value. We define it in the parameter expression-attribute-values.

--expression-attribute-names '{"#yr": "year"}'

This is where we provide an alias for the attribute year. Because year is a reserved word, you can’t use it directly in an expression.

--expression-attribute-values '{ ":yyyy":{"N":"2010"}}'

Think of this as the definition of the placeholder used in the key or filter expressions. (We will get to filter expressions below.) In the key condition we wrote :yyyy, which is just a temporary placeholder. We define the value it holds here:

'{ ":yyyy":{"N":"2010"}}'

The N means it is a number. S is string. B is binary.

2010 is the targeted value.
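To make the two kinds of placeholder concrete, the Python sketch below fills them in and prints what the expression means. DynamoDB does not literally rewrite the text this way. The expand_expression helper is hypothetical and exists only to show which parameter defines which placeholder.

```python
import re


def expand_expression(expression, names, values):
    """Show what an expression means once its placeholders are filled in."""
    def substitute(match):
        token = match.group(0)
        if token.startswith("#"):
            return names[token]  # attribute-name alias
        type_code, raw = next(iter(values[token].items()))
        # Numbers are shown bare, other types in quotes.
        return raw if type_code == "N" else repr(raw)

    return re.sub(r"[#:][A-Za-z0-9_]+", substitute, expression)


expanded = expand_expression(
    "#yr = :yyyy",
    names={"#yr": "year"},
    values={":yyyy": {"N": "2010"}},
)
print(expanded)  # year = 2010
```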

Filter expression

Remember that the key condition serves to select a set of records based upon their partition. It’s a two-step process: pull a subset of the table with the key condition, then query that subset with the filter expression.

In the example below, we want to show all films in the year 2010 that have a rating higher than 8.5. The (1) key condition gets the year and (2) filter expression lets you query by rating all movies from the year 2010. It’s designed this way for speed, by reducing the amount of data to query.

Here is the query:

aws dynamodb query \
    --endpoint-url http://localhost:8000 \
    --table-name Movies \
    --key-condition-expression "#yr = :yyyy" \
    --expression-attribute-names '{"#yr": "year"}' \
    --filter-expression 'info.rating > :rating' \
    --expression-attribute-values '{
        ":yyyy": {"N": "2010"},
        ":rating": {"N": "8.5"}
    }'

The filter expression has the same syntax as the key condition, but there are a couple of items to note.

--filter-expression 'info.rating > :rating'

The filter expression, like the key condition, is just an attribute on the left, an operator in the middle, and a placeholder on the right. In other words, it’s not JSON, whereas we use JSON elsewhere.

rating is nested underneath info. So, we write info.rating.

--expression-attribute-values '{
    ":yyyy": {"N": "2010"},
    ":rating": {"N": "8.5"}
}'

In the key condition query above we used the same parameter with a single placeholder:

--expression-attribute-values '{ ":yyyy":{"N":"2010"}}'

The only difference here is that we now have two placeholders (:yyyy and :rating), so we need two entries in that JSON.

This also illustrates the point that a query with a filter expression must still include the key condition. So, one expression-attribute-values parameter defines the placeholders for both.
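The same query can be assembled for an SDK. The Python sketch below builds the keyword arguments that the Boto3 client's query method accepts. The build_query helper is hypothetical, and the actual call is left as a comment because it needs Boto3 and a running DynamoDB instance.

```python
def build_query(year, min_rating):
    """Assemble keyword arguments for a DynamoDB query on the Movies table."""
    return {
        "TableName": "Movies",
        "KeyConditionExpression": "#yr = :yyyy",
        "FilterExpression": "info.rating > :rating",
        "ExpressionAttributeNames": {"#yr": "year"},
        # One map defines the placeholders of both expressions.
        "ExpressionAttributeValues": {
            ":yyyy": {"N": str(year)},
            ":rating": {"N": str(min_rating)},
        },
    }


params = build_query(2010, 8.5)
print(sorted(params["ExpressionAttributeValues"]))  # [':rating', ':yyyy']

# With Boto3 installed and DynamoDB running locally, the call would look like:
#   import boto3
#   client = boto3.client("dynamodb", endpoint_url="http://localhost:8000")
#   response = client.query(**params)
```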

How To Add Data to Amazon DynamoDB
https://www.bmc.com/blogs/dynamodb-adding-data/
Wed, 22 Jul 2020 00:00:39 +0000

In this article, we show how to add data to Amazon DynamoDB using the command line and Python. If you’re new to this product, see our DynamoDB introduction.


Set up DynamoDB

First, download DynamoDB from Amazon. Run it locally to avoid paying subscription fees before you’re ready to push your project to the cloud. (Amazon says this is how you should use their database.)

Unzip DynamoDB then start it like this:

java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb

Install Boto3, the AWS SDK for Python. It is Amazon’s Python interface to their products, such as S3 and DynamoDB.

pip install boto3

Now, we will make up some data: financial transactions. The key will be transNo, so create the expenses table with that key. You need to install the AWS CLI client first.

Note that we use endpoint-url to indicate that we are using DynamoDB locally.

aws dynamodb create-table \
    --table-name expenses \
    --attribute-definitions AttributeName=transNo,AttributeType=N \
    --key-schema AttributeName=transNo,KeyType=HASH \
    --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5 \
--endpoint-url http://localhost:8000

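Open a Python shell and check that the table exists. Creating the Table object alone does not contact the database, so read table_status to force a lookup:

import boto3

table = boto3.resource('dynamodb', endpoint_url='http://localhost:8000').Table('expenses')
table.table_status

It should respond:

'ACTIVE'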

Add data from the command line

Below we add a transaction with just a transaction number and a date. The N means that transNo is a number and S means that the date is a string. (DynamoDB has no date type, but ISO 8601 strings sort chronologically, so you can compare and range over that attribute as if it were a date.)

Important note: When you use put-item you have to give the data type (N for number, S for string, etc.) with the value in the JSON. You put numbers in quotes too. When you use the Boto3 resource interface you don’t need to do that.

aws dynamodb put-item \
--table-name expenses \
--item '{
"transNo": {"N": "1" },
"date": {"S": "2020-03-19"}
}' \
--return-consumed-capacity TOTAL --endpoint-url http://localhost:8000

Add data with Python Boto3

The code below is self-explanatory, with these additional notes.

  • Python dictionaries map directly onto JSON objects, so build the item as a dictionary. The put_item method takes that dictionary as its Item argument.
  • Note that Boto3 does not accept floating point numbers. Instead, use the Decimal type.
  • For an amount, we pick a random integer with randint() and multiply it by random(). We scale it up because random() alone is always less than 1, and financial amounts are usually greater than 1.

The data we pass to DynamoDB looks like this:

{'transNo': 87049131615, 'amount': Decimal('373.5446821689723'), 'transDate': '2020-03-19'}
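If your data already contains floats, perhaps nested inside lists or dictionaries, you can convert them in one pass before calling put_item. The helper below is a hypothetical sketch, and the sample transaction is made up.

```python
from decimal import Decimal


def floats_to_decimal(value):
    """Recursively replace floats with Decimal so the item can be stored."""
    if isinstance(value, float):
        # Going through str() avoids binary floating point noise:
        # Decimal(0.1) would carry dozens of spurious digits.
        return Decimal(str(value))
    if isinstance(value, dict):
        return {k: floats_to_decimal(v) for k, v in value.items()}
    if isinstance(value, list):
        return [floats_to_decimal(v) for v in value]
    return value


trans = {"transNo": 1, "amount": 373.54, "tags": [0.1, "food"]}
print(floats_to_decimal(trans))
# {'transNo': 1, 'amount': Decimal('373.54'), 'tags': [Decimal('0.1'), 'food']}
```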

The complete code

Here is the complete code:

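import random
from decimal import Decimal

import boto3


def load_transactions(dynamodb):
    """Write one made-up transaction to the expenses table."""
    table = dynamodb.Table('expenses')

    trans = {}

    # transNo is the partition key, so it has to be present.
    trans['transNo'] = random.randint(100000, 99999999999)
    # Boto3 rejects floats, so convert the amount to Decimal via str().
    trans['amount'] = Decimal(str(random.random() * random.randint(10, 1000)))
    trans['transDate'] = '2020-03-19'

    print(trans)

    table.put_item(Item=trans)


if __name__ == '__main__':
    # endpoint_url points Boto3 at the local DynamoDB instance.
    dynamodb = boto3.resource('dynamodb', endpoint_url="http://localhost:8000")

    load_transactions(dynamodb)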

Introduction to Amazon DynamoDB
https://www.bmc.com/blogs/amazon-dynamodb/
Wed, 15 Jul 2020 00:00:45 +0000

DynamoDB is a key-value, NoSQL database developed by Amazon. It’s unlike some other products offered by Amazon and other vendors in that it’s not just an open source system, like Spark, hosted on the vendor’s platform. Amazon wrote this for their own internal needs and now they make it available to their customers.


How does DynamoDB work?

DynamoDB items look just like JSON, with the one difference being that each item must include its key. Because every item is addressed by that key, you can update an item in place. Document databases such as MongoDB support in-place updates too.

DynamoDB also lets you work with transactions, something that MongoDB supports as well. Not all NoSQL databases let you do that. This is important as certain database operations logically must go together. For example, a sales transaction must both decrement inventory and increase cash-on-hand. If one of those two operations failed then the sales and inventory systems would be out of balance.
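As a sketch of what such a transaction could look like, the Python below builds the list of operations that the TransactWriteItems call takes. Everything specific is an assumption made for illustration: the inventory and accounts tables, their attribute names, and the build_sale_transaction helper are all hypothetical.

```python
def build_sale_transaction(sku, quantity, amount):
    """Build two updates that must succeed or fail together (hypothetical tables)."""
    return [
        {
            "Update": {
                "TableName": "inventory",
                "Key": {"sku": {"S": sku}},
                "UpdateExpression": "SET onHand = onHand - :qty",
                # Refuse the whole transaction if stock would go negative.
                "ConditionExpression": "onHand >= :qty",
                "ExpressionAttributeValues": {":qty": {"N": str(quantity)}},
            }
        },
        {
            "Update": {
                "TableName": "accounts",
                "Key": {"account": {"S": "cash-on-hand"}},
                "UpdateExpression": "SET balance = balance + :amt",
                "ExpressionAttributeValues": {":amt": {"N": str(amount)}},
            }
        },
    ]


transaction = build_sale_transaction("widget-1", 2, 30)
print(len(transaction))  # 2

# With Boto3, the list would be passed as:
#   client.transact_write_items(TransactItems=transaction)
```

If either update fails, including the stock check, neither change is applied.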

You work with the database using the AWS command line client, APIs for different programming languages, their NoSQL Workbench desktop tool, or the AWS website. For example, the screen below shows how you create a table.

[Screenshot: creating a table on the AWS website]

Notice that you define only the key. That’s because, as with JSON, there is no fixed structure or schema. All the other attributes can be anything.

DynamoDB Definitions

DynamoDB has these concepts and more:

  • Table: a collection of items
  • Item: a collection of attributes. (Other databases call these records or documents.)
  • Stream: a time-ordered log of the changes made to items in a table, which other applications can read.
  • Partition key: a simple primary key made of one attribute. When it is the only key, its value must be unique for each item.
  • Partition key and sort key: a composite primary key, meaning a primary key with two attributes, like employee name and employee ID (necessary because two employees could have the same name). The combination must be unique.
  • Secondary indexes: you can index other attributes that you frequently query to speed up reads.

API and SDK

As with most cloud systems, DynamoDB exposes its services via web services. But that does not mean you have to format your data to JSON and then post it using HTTP. Instead they provide software development kits (SDKs). The SDK takes the requests you send it and then translates that to HTTP calls behind the scenes. In this way, the SDK provides a more natural and far less wordy way to work with the database. The SDK lets you work with DynamoDB as you would work with regular objects.

The SDK has these methods:

  • PutItem
  • BatchWriteItem
  • GetItem
  • BatchGetItem
  • Query
  • Scan
  • UpdateItem
  • DeleteItem
  • ListStreams
  • GetShardIterator
  • GetRecords
  • TransactWriteItems
  • TransactGetItems

AWS CLI

As with other Amazon products you can use the AWS command line client. That lets you run database operations from the command line without having to write a program. You use JSON to work with DynamoDB.

For example, there are these operations and a few more:

  • aws dynamodb create-table
  • aws dynamodb put-item

SDKs

DynamoDB has SDKs for these programming languages:

  • Java
  • JavaScript
  • .NET
  • Node.js
  • PHP
  • Python
  • Ruby
  • C++
  • Go
  • Android
  • iOS

For Java and .NET, they provide object mapping. It lets you work with table items as if they were objects in those programming languages. It also translates between data types, so you can use, for example, dates and bytes instead of being limited to facsimiles of those as strings or numbers, as you would with JSON.

For example, with Java you add the @DynamoDBTable annotation and can then work with the table as if it were a Java object. That way you don’t have to do something wordier, like using SQL or JSON.

@DynamoDBTable(tableName = "ProductCatalog")
    public static class CatalogItem {
        private Integer id;
        private String title;
        private String ISBN;
        private Set<String> bookAuthors;

DynamoDB is downloadable

You can also download DynamoDB and run it on your local system. Their web site does not mention whether this is free, meaning whether you could use it forever and never pay. The idea is you use it to test code locally, and thus save money, before committing it to the cloud.

You can, for example, add it as a Maven dependency in your Java project. That means when you run your Java code in the code editor it will download DynamoDB and spin up an instance for you with little to no configuration required.

Amazon DynamoDB Accelerator (DAX)

DAX is an optional in-memory cache that sits in front of DynamoDB and speeds up reads.

Integration with other systems

  • AWS Glue. In this case you pull data from DynamoDB into AWS Glue. There you do ETL and then write it out to other systems like the Amazon Redshift data warehouse.
  • Apache Hive on Amazon EMR. This is more of a back-and-forth interface. You can use it, for example, to query DynamoDB tables with HiveQL (the Hive SQL language). And you can copy data into Hadoop. (Hive rides atop Hadoop, and EMR, short for Elastic MapReduce, is Amazon’s hosted Hadoop service.) You can also join DynamoDB tables in Hive.

Quotas

DynamoDB users are subject to quotas set by Amazon. These are given in capacity units. The user can request a higher quota. For example, for a standard write, 1 capacity unit = one write per second for an item up to 1 KB.

Transactions, reads, and stream operations are all measured in capacity units as well.
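The arithmetic can be sketched in Python. The write rule is the one stated above. The read rule (one unit per 4 KB for a strongly consistent read, half that for an eventually consistent one) is not covered in this article and is added here as background. Both helper functions are hypothetical.

```python
import math


def write_capacity_units(item_size_bytes):
    """Standard write: one unit per 1 KB of item size, rounded up."""
    return max(1, math.ceil(item_size_bytes / 1024))


def read_capacity_units(item_size_bytes, strongly_consistent=True):
    """Read: one unit per 4 KB, rounded up; eventually consistent reads cost half."""
    units = max(1, math.ceil(item_size_bytes / 4096))
    return units if strongly_consistent else units / 2


print(write_capacity_units(500))    # 1
print(write_capacity_units(2500))   # 3
print(read_capacity_units(2500))    # 1
print(read_capacity_units(9000, strongly_consistent=False))  # 1.5
```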

Pricing

As with EC2, there are separate prices for on-demand and provisioned (i.e., dedicated) resources. For provisioned capacity, a write capacity unit is $0.00065 per hour. A read capacity unit is $0.00013 per hour.

Storage prices vary according to the Amazon data center. US-East in Ohio is $0.25 per GB per month, with the first 25 GB free.

DAX is priced by the hour and varies according to virtual CPU and memory size. For example, 1 vCPU and 1.5 GB memory is $0.04 per hour.
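To get a feel for these numbers, here is a rough Python estimate of the monthly cost of provisioned capacity. It assumes the unit prices quoted above are hourly rates and that a month has 730 hours. The monthly_provisioned_cost function is hypothetical, and current prices may differ.

```python
WRITE_UNIT_PRICE = 0.00065  # dollars per write capacity unit, from the text above
READ_UNIT_PRICE = 0.00013   # dollars per read capacity unit, from the text above
HOURS_PER_MONTH = 730       # assumption: an average month


def monthly_provisioned_cost(read_units, write_units):
    """Estimate the monthly cost of provisioned capacity, assuming hourly rates."""
    hourly = read_units * READ_UNIT_PRICE + write_units * WRITE_UNIT_PRICE
    return round(hourly * HOURS_PER_MONTH, 2)


# The tables in this guide are created with 5 read and 5 write capacity units.
print(monthly_provisioned_cost(read_units=5, write_units=5))  # 2.85
```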
