Comparing AWS Cloud Database Technologies: Non-Relational Databases

A substantial number of database types are now offered in the cloud to suit a variety of business use cases. This adds flexibility to data and application solution design but increases the difficulty of deciding which type is best suited to a given scenario. In a previous post, we looked at the relational databases available on AWS and defined how they can best be leveraged for a variety of solutions. This post explores the opposite set of databases, those that fall into the non-relational category, and serves to disentangle the non-relational services offered on AWS.

Non-Relational Databases

This is the opposite of a relational database: the data is not stored in a tabular structure (columns and rows), and no predefined relationships exist among tables. Instead, the storage model is based on the type of data being stored, which makes it very flexible and adaptable.

1) Key-Value

A key-value database stores data as a key-value pair. An example is a dictionary, where the key is the unique identifier, and the value holds the associated attributes. The two major key-value databases available on AWS are Amazon DynamoDB and Amazon Keyspaces.

A) Amazon DynamoDB

This is a fully managed, serverless NoSQL database on AWS. It is one of the most commonly used NoSQL databases because it supports ACID transactions that update items across multiple tables, and it allows in-memory caching with DAX. These are some of its features:

Global Tables: Multi-region, multi-master database
Backups are supported, including point-in-time recovery
Single-digit millisecond performance at any scale
Supports CRUD (Create/Read/Update/Delete) operations through APIs
No direct analytical queries (joins are not allowed)
Access patterns must be known ahead of time for efficient design and performance
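
To make the last point concrete, here is a minimal sketch in plain Python (it does not call the DynamoDB API). The table is modeled as a nested dictionary, and the `CUSTOMER#`/`ORDER#` key prefixes and the `put_item` and `query` helpers are illustrative assumptions rather than anything prescribed by AWS. The idea is that the key layout is chosen so the expected query ("all orders for one customer") becomes a lookup within a single partition.

```python
from collections import defaultdict

# In-memory stand-in for a table: partition key -> {sort key -> item}.
# This only mimics the lookup shape; it is not the DynamoDB API.
table = defaultdict(dict)

def put_item(partition_key, sort_key, attributes):
    """Store an item under a composite (partition key, sort key)."""
    table[partition_key][sort_key] = attributes

def query(partition_key, sort_key_prefix=""):
    """Return items in one partition whose sort key starts with the prefix,
    ordered by sort key."""
    partition = table.get(partition_key, {})
    return [partition[sk] for sk in sorted(partition) if sk.startswith(sort_key_prefix)]

# Hypothetical data: one customer profile plus orders keyed by date.
put_item("CUSTOMER#42", "PROFILE", {"name": "Ada"})
put_item("CUSTOMER#42", "ORDER#2024-01-15", {"total": 30})
put_item("CUSTOMER#42", "ORDER#2024-03-02", {"total": 75})
put_item("CUSTOMER#7", "ORDER#2024-02-20", {"total": 12})

# The access pattern "orders for customer 42" touches a single partition.
orders_for_42 = query("CUSTOMER#42", "ORDER#")
```

A question the keys were not designed for, such as "all orders above 50 across every customer", would require walking every partition in this sketch, which is why the access patterns need to shape the key design up front.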

B) Amazon Keyspaces

Amazon Keyspaces is a fully managed, serverless database used to run Apache Cassandra workloads on AWS. Cassandra is an open-source NoSQL database. Keyspaces is available in both on-demand and provisioned capacity modes.

C) Amazon S3

S3 can also be viewed as a key-value store, used for storing huge volumes of semi-structured or unstructured data. For each uploaded file, the key is the file's unique name within its bucket, and the value is the content of the file.

2) Document

A document database is a NoSQL database used for storing and managing JSON-like documents. This data model is popular with developers because it matches the data format used in their application code. Documents store data in field-value pairs. The dedicated document database on AWS is Amazon DocumentDB.

Amazon DocumentDB

This is a fully managed NoSQL document database for running MongoDB-compatible workloads. JSON documents are stored in collections; a collection is a group of documents, similar to a table. It uses the same underlying architecture as Amazon Aurora.
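
As a rough illustration of the document model, the sketch below uses plain Python dictionaries and the standard `json` module rather than a real DocumentDB or MongoDB driver. The `customers` collection, its fields, and the `find` helper are all hypothetical names chosen for the example.

```python
import json

# A "collection" modeled as a list of JSON-like documents (plain dicts).
# Documents in one collection need not share the same fields.
customers = [
    {"_id": 1, "name": "Ada", "address": {"city": "London"}, "tags": ["vip"]},
    {"_id": 2, "name": "Grace", "address": {"city": "New York"}},
]

def find(collection, **criteria):
    """Return documents whose top-level fields equal all the given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]

match = find(customers, name="Grace")

# The same structure serializes directly to JSON for the application.
as_json = json.dumps(match[0], sort_keys=True)
```

Note how the first document carries a `tags` field that the second lacks; a table with fixed columns would need a schema change to allow that.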

3) In-Memory

In-memory databases are used mainly where reading data from disk would be an expensive operation. Keeping the data in memory saves time and improves performance compared with repeatedly pulling the same information from disk.

A) Amazon ElastiCache

This is the main fully managed in-memory service available on AWS. It runs on its own dedicated caching instances (a remote cache). ElastiCache supports two in-memory engines, Redis and Memcached. Redis is suitable for complex applications, including message queues, session caching, and leaderboards. Memcached, on the other hand, is suitable for simple application and database caches and is also useful when working with a multithreaded architecture.
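
A common way to use a remote cache like this is the cache-aside pattern. The sketch below is a minimal, self-contained version in plain Python: the `TTLCache` class stands in for Redis or Memcached, and `slow_database_lookup` is a hypothetical placeholder for an expensive disk-backed query. None of these names come from ElastiCache itself.

```python
import time

class TTLCache:
    """Tiny in-memory cache with per-entry expiry, standing in for a remote cache."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: evict and report a miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self.clock() + self.ttl_seconds, value)

database_reads = 0

def slow_database_lookup(user_id):
    """Hypothetical stand-in for an expensive disk-backed query."""
    global database_reads
    database_reads += 1
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(cache, user_id):
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    key = f"user:{user_id}"
    user = cache.get(key)
    if user is None:
        user = slow_database_lookup(user_id)
        cache.set(key, user)
    return user

cache = TTLCache(ttl_seconds=60)
first = get_user(cache, 1)   # miss: reads the database
second = get_user(cache, 1)  # hit: served from memory
```

After the two calls, `database_reads` is 1: only the first request reached the slow lookup. The expiry time bounds how stale a cached value can get.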

B) DynamoDB Accelerator (DAX)

DAX is an in-memory caching service for DynamoDB. It allows for faster in-memory reads and better performance when working with DynamoDB. DAX maintains two caches, an item cache and a query cache. The item cache stores the results of individual item reads by primary key (GetItem and BatchGetItem), while the query cache stores the results of Query and Scan operations. A typical use case for DAX is a workload in which users access a small number of items far more frequently than others.

4) Graph

A graph database shows how data is interconnected. It captures the relationships between the data in a database in detail, using nodes (which store data entities) and edges (which store the relationships between nodes).

Amazon Neptune

Amazon Neptune is the fully managed graph database service available on AWS. This database makes it easy to quickly traverse complex relationships between connected datasets. Its graph query languages include Apache TinkerPop Gremlin (for property graphs) and SPARQL (for RDF data).

Useful Scenario

Fraud detection and recommendation engines. For fraud detection, transactions are stored as a graph, which helps identify related entities in a dataset. Once the patterns are detected, it becomes much easier to find the fraudulent ones.
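
The fraud scenario can be sketched with a small adjacency-list graph and a breadth-first search in plain Python. This is a toy model rather than a Gremlin or SPARQL query, and the account, device, and card identifiers, along with the `connected_accounts` helper, are made up for illustration. Starting from a known fraudulent account, it finds other accounts linked through shared devices or cards.

```python
from collections import defaultdict, deque

# Undirected graph as an adjacency list: node -> set of neighboring nodes.
graph = defaultdict(set)

def add_edge(a, b):
    """Connect two entities (for example, an account and a device it used)."""
    graph[a].add(b)
    graph[b].add(a)

add_edge("account:alice", "device:laptop-1")
add_edge("account:bob", "device:laptop-1")
add_edge("account:bob", "card:1111")
add_edge("account:mallory", "card:1111")
add_edge("account:carol", "device:phone-9")

def connected_accounts(start, max_hops):
    """Breadth-first search for accounts reachable from `start` within `max_hops` edges."""
    seen = {start}
    queue = deque([(start, 0)])
    found = set()
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                if neighbor.startswith("account:"):
                    found.add(neighbor)
                queue.append((neighbor, hops + 1))
    return found

# Accounts linked to a known fraudulent account through shared cards or devices.
suspects = connected_accounts("account:mallory", max_hops=4)
```

Here Bob shares a card with Mallory and Alice shares a device with Bob, so both are flagged, while Carol has no path to Mallory. In a relational model the same question would need a chain of self-joins whose depth grows with the hop limit.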

5) Search

A search service makes it easy to search for any kind of information in your data (including log files, text files, and messages) and to provide near real-time visualizations and analytics of that data.

Amazon OpenSearch Service

This is the fully managed search service available on AWS. It is built on OpenSearch, an open-source fork of Elasticsearch and Kibana, and the service was renamed from Amazon Elasticsearch Service to Amazon OpenSearch Service.

Useful Scenario

This is mostly used by developers for full-text search and log analytics. An example is searching documents for a particular word: the service can return an aggregate count of the word and summarize the matching data.
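
The word-search example rests on an inverted index, which maps each word to the documents containing it. The sketch below builds one in plain Python; it is a conceptual illustration, not the OpenSearch API, and the sample log lines and the `tokenize` and `search` helpers are assumptions made for the example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical log lines to index.
documents = {
    "doc1": "Error: disk full on server A",
    "doc2": "Server B restarted after error; error cleared",
    "doc3": "Nightly backup completed",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Inverted index: word -> {document id -> number of occurrences}.
index = defaultdict(Counter)
for doc_id, text in documents.items():
    for word in tokenize(text):
        index[word][doc_id] += 1

def search(word):
    """Return (matching document ids ordered by occurrences, total count)."""
    postings = index.get(word.lower(), Counter())
    ranked = [doc_id for doc_id, _ in postings.most_common()]
    return ranked, sum(postings.values())

hits, total = search("error")
```

Searching for "error" returns `doc2` ahead of `doc1` because the word appears twice in `doc2`, with an aggregate count of 3. The lookup never rescans the original text, which is what keeps searches fast as the data grows.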

6) Time Series

A time series database is used to efficiently store and retrieve trillions of events in real time. The data is stored as pairs of a timestamp and an associated value. This layout facilitates time series analysis, since every data point is tied to a point in time.

Amazon Timestream

This is a fully managed serverless time-series database service. It is used to process a huge volume of data over time.

Useful Scenario

High-volume stock market and IoT device data, where time is the most important dimension along which to analyze trends and patterns. Amazon Timestream has in-memory capabilities that make real-time use cases (analyzing the most recent data) extremely performant.
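
To show why storing data as time-value pairs helps, here is a minimal sketch in plain Python. It is not the Timestream API: the `TimeSeries` class, its method names, and the temperature readings are all hypothetical. Because points are kept in time order, a time-window query becomes a binary search followed by a slice.

```python
from bisect import bisect_left, bisect_right

class TimeSeries:
    """Append-only series of (timestamp, value) points kept in time order."""

    def __init__(self):
        self.timestamps = []
        self.values = []

    def append(self, timestamp, value):
        if self.timestamps and timestamp < self.timestamps[-1]:
            raise ValueError("points must arrive in non-decreasing time order")
        self.timestamps.append(timestamp)
        self.values.append(value)

    def range(self, start, end):
        """Return values with start <= timestamp <= end using binary search."""
        lo = bisect_left(self.timestamps, start)
        hi = bisect_right(self.timestamps, end)
        return self.values[lo:hi]

    def average(self, start, end):
        """Mean of the values in the window, or None if the window is empty."""
        window = self.range(start, end)
        return sum(window) / len(window) if window else None

# Hypothetical IoT sensor readings: (seconds since start, temperature).
temperature = TimeSeries()
for second, reading in [(0, 20.0), (10, 21.0), (20, 23.0), (30, 26.0)]:
    temperature.append(second, reading)

recent_average = temperature.average(10, 30)
```

The same structure makes "most recent data" queries cheap, since the newest points always sit at the end of the series.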

7) Ledger

A ledger database is an append-only NoSQL database; that is, it is an immutable, transparent, and cryptographically verifiable ledger.

Amazon QLDB

This is a fully managed, serverless ledger database that uses PartiQL as its query language and stores data in the Amazon Ion format. The three main concepts in QLDB are the ledger, the journal, and tables.

A ledger contains a journal and a set of tables.
The journal holds the ordered, cryptographically verifiable history of every change made to the tables.
Tables are sets of documents, i.e. the actual data, stored in the Amazon Ion format.
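
The idea of a cryptographically verifiable journal can be sketched as a simple hash chain using Python's standard `hashlib`. This is a deliberately simplified model and should not be read as QLDB's actual implementation or API; the `Journal` class, its fields, and the vehicle records are illustrative assumptions. Each entry's hash covers the previous entry's hash, so changing any past entry invalidates the chain.

```python
import hashlib
import json

class Journal:
    """Append-only log where each entry's hash covers the previous entry's hash."""

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(previous_hash, change):
        payload = json.dumps({"prev": previous_hash, "change": change}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, change):
        previous_hash = self.entries[-1]["hash"] if self.entries else ""
        entry = {
            "change": change,
            "prev": previous_hash,
            "hash": self._digest(previous_hash, change),
        }
        self.entries.append(entry)
        return entry["hash"]

    def verify(self):
        """Recompute every hash in order; any edited entry breaks the chain."""
        previous_hash = ""
        for entry in self.entries:
            if entry["prev"] != previous_hash:
                return False
            if entry["hash"] != self._digest(previous_hash, entry["change"]):
                return False
            previous_hash = entry["hash"]
        return True

# Hypothetical ownership history for one vehicle.
journal = Journal()
journal.append({"vin": "VIN123", "owner": "Ada"})
journal.append({"vin": "VIN123", "owner": "Grace"})
valid_before = journal.verify()

journal.entries[0]["change"]["owner"] = "Mallory"  # simulate tampering
valid_after = journal.verify()
```

Verification succeeds on the untouched journal and fails once the first entry is edited, because the stored hash no longer matches the recomputed one.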

Useful Scenario

A government agency requires a method to track the history of vehicle ownership. In this case, an immutable record-based system such as Amazon QLDB, which stores the history of vehicle ownership over time, could be a perfect fit.

Lastly, I have highlighted the differences between the NoSQL databases in the table below:

Database	Data Type	Workloads	Data Size	Performance
Amazon DynamoDB	Semi-structured	Transactional key-value / Document Store	High TB range	Ultra-high throughput, low latency (ultra-low latency with DAX)
Amazon Keyspaces	Semi-structured	Cassandra	N/A	Low latency
Amazon DocumentDB	Semi-structured	MongoDB	Up to 64 TB	High throughput, low latency
Amazon ElastiCache	Semi-structured/ Unstructured	In-memory caching	Low TB Range	High throughput, ultra-low latency
Amazon Neptune	Graph-Structured	Highly connected graph datasets	Mid TB Range	High throughput, low latency
Amazon QLDB	Structured/ Semi-structured	Transactional	N/A	High throughput, low latency

In Conclusion

The explosion of non-relational database services has simplified and optimized backend architectures for a variety of old and new use cases. Each of these non-relational databases caters to specific use cases, and AWS has been bringing them all onto its platform, making it easy to access them in one environment. For example, Amazon QLDB is used for building ledger databases, and Amazon Keyspaces is used to run Cassandra workloads. If you are looking for a cloud solution that fits your business, you can reach out to us directly.
