
Distributed Database System

  • Writer: sumitnagle
  • Jul 5
  • 4 min read

We have done a lot with a single database, meaning there is only one server/node/machine (whatever you want to call it!) serving our database needs, also known as a centralised database system. Is it sufficient? Yes, but only up to a limit, because after a certain point the database can become a bottleneck, and as with any single-entity system, it becomes a single point of failure. To get past that, we need multiple database servers. This is what we refer to as a distributed database system.

In a distributed database system, data is stored across multiple physical locations (servers), which can be on the same network or spread across different geographical locations. Despite being distributed, the database appears as a single unified system to the end-users. Distributed databases ensure data distribution (and even replication) to improve reliability, scalability, availability, and performance.

In a distributed database system, all nodes can run the same DBMS (for example, all nodes are MySQL), called a homogeneous distributed database, or the nodes may run different DBMSs (for example, some nodes are MySQL and others are MongoDB), called a heterogeneous distributed database, which is more complex to manage due to differences in data models and query languages.


As the database is distributed, our data can be partitioned (data partitioning, which we will discuss in more detail in later blogs; for now, think of it as splitting the data so that each database server holds only a particular subset of it) or replicated (data replication; again, more details later, but for now, think of it as copying data to multiple places).


Earlier, we discussed transaction management and concurrency control for a single, centralised database system. However, these mechanisms work differently in a distributed database system.

In a distributed database system, transaction management ensures that a transaction spanning multiple database nodes follows ACID properties, managing start, commit, or rollback across all involved sites. Alongside this, concurrency control handles conflicts between transactions running on different nodes, ensuring consistency when multiple users access or modify distributed data simultaneously. To coordinate this, distributed databases use techniques like two-phase commit (2PC) for atomicity and distributed concurrency control protocols (like distributed locking or timestamp ordering) to maintain isolation across nodes while managing failures, latency, and partial commits.
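To make 2PC a little more concrete, here is a minimal, in-memory sketch of the protocol, assuming hypothetical Participant nodes and a coordinator function (none of this comes from a real library). It only shows the vote-then-commit flow and ignores the hard parts a real system must handle: durable logging, timeouts, and coordinator failure.

```python
# A toy sketch of two-phase commit (2PC); all names are illustrative.

class Participant:
    def __init__(self, name):
        self.name = name
        self.staged = None       # change staged during the prepare phase
        self.committed = {}      # "durable" state after commit

    def prepare(self, txn):
        # Phase 1 (voting): stage the change and vote. A real node would
        # write a durable "prepared" record to its log before voting yes.
        self.staged = txn
        return True              # vote yes; return False to vote no

    def commit(self):
        # Phase 2 (commit path): make the staged change permanent.
        key, value = self.staged
        self.committed[key] = value
        self.staged = None

    def rollback(self):
        # Phase 2 (abort path): discard the staged change.
        self.staged = None

def two_phase_commit(participants, txn):
    # Phase 1: the coordinator asks every participant to prepare.
    votes = [p.prepare(txn) for p in participants]
    if all(votes):
        # Phase 2: everyone voted yes, so all nodes commit.
        for p in participants:
            p.commit()
        return "committed"
    # Any single "no" vote aborts the transaction on every node.
    for p in participants:
        p.rollback()
    return "aborted"

nodes = [Participant("node-1"), Participant("node-2"), Participant("node-3")]
print(two_phase_commit(nodes, ("balance:42", 100)))  # -> committed
```

The key property: the transaction becomes durable on all nodes or on none, which is exactly the atomicity guarantee 2PC exists to provide.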


Due to the distributed nature of the system, consistency (data needs to be consistent across all the nodes) and availability (all, or at least some, of the nodes need to be available) become important factors, concerns that simply do not arise in a single, centralised database system. And for distributed systems, the CAP theorem is a fundamental concept. It states that in the presence of a network partition, one must choose between maintaining strict consistency and high availability. Don't worry, we will be covering it later!


Also, we can easily intuit that consistency is an important concern for a distributed database system. Because of this, distributed database systems declare their consistency model, which defines the rules for the visibility and ordering of updates across the system. Essentially, it specifies what data values a read operation can return after a write, especially when multiple replicas or concurrent users are involved. Different models offer different guarantees,

  • Strong consistency (linearisability), every read reflects the most recent write.

  • Eventual consistency, replicas may temporarily diverge, but will eventually become consistent if no new writes happen.

  • Causal consistency, respects cause-effect relationships between operations.

  • Read-your-writes consistency, a client always sees its own updates.

The choice of model affects system performance, availability, and correctness trade-offs, especially under network partitions (as per the CAP theorem). A distributed database system can be designed to support multiple consistency models for different use cases, or configurable consistency levels for individual operations. For example, Cassandra lets us choose consistency levels per read/write (like ONE, QUORUM, or ALL), MongoDB offers tuneable consistency via read/write concerns, and DynamoDB provides both eventually consistent and strongly consistent reads as options.
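As an illustration of per-operation consistency levels, here is a sketch using Cassandra's Python driver (cassandra-driver). The keyspace and table names (shop, orders) are made up for this example; the per-query consistency-level mechanism is the point.

```python
# Per-query consistency levels with the DataStax Python driver.
from cassandra.cluster import Cluster
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("shop")  # hypothetical keyspace

# Write with QUORUM: a majority of replicas must acknowledge the write.
write = SimpleStatement(
    "INSERT INTO orders (id, status) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(write, (42, "paid"))

# Read with ONE: fastest option, but it may return stale data.
read = SimpleStatement(
    "SELECT status FROM orders WHERE id = %s",
    consistency_level=ConsistencyLevel.ONE,
)
row = session.execute(read, (42,)).one()
```

Writing at QUORUM and also reading at QUORUM would give overlapping replica sets (and therefore up-to-date reads); dropping the read side to ONE trades that guarantee for lower latency.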


While concurrency control and transaction management ensure local correctness (preventing anomalies such as lost updates within or across nodes), the consistency model defines how soon and under what guarantees changes become visible across all replicas once a transaction has committed.

Concurrency control is the mechanism, transaction management is the process, and the consistency model is the rule being followed.

The consistency model sets the target rules for how data should behave under concurrent access and distributed replication. Based on the chosen model, the system decides,

  • Transaction management. How to enforce ACID (or relax it if needed).

  • Concurrency control techniques. Which protocols (locking, MVCC, timestamps, and so on) to use to achieve the desired isolation and ordering.

  • Data replication strategies. Synchronous or asynchronous, quorum-based, leader-follower, all chosen to align with the desired consistency and availability (see the quorum sketch after this list).

  • Recovery and durability mechanisms. WAL, shadow paging, 2PC/3PC for distributed durability.

The desired consistency model heavily influences other system design choices, but other factors like performance, fault tolerance, and business requirements also play a big role.
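To make the quorum-based replication point concrete, here is a toy sketch (no real database involved) of why choosing read and write quorum sizes with R + W > N gives up-to-date reads: any R replicas must overlap any W replicas in at least one node that holds the latest write.

```python
# Toy quorum replication: N replicas, writes wait for W acks, reads poll R.
import random

N, W, R = 3, 2, 2          # R + W > N, so read and write quorums overlap

replicas = [dict() for _ in range(N)]   # each maps key -> (version, value)
version = 0

def write(key, value):
    global version
    version += 1
    # Pretend only W replicas acknowledge in time (picked at random here).
    for i in random.sample(range(N), W):
        replicas[i][key] = (version, value)

def read(key):
    # Poll R replicas and keep the value carrying the highest version.
    answers = [replicas[i][key] for i in random.sample(range(N), R)
               if key in replicas[i]]
    return max(answers)[1] if answers else None

write("cart:7", "apple")
write("cart:7", "banana")
print(read("cart:7"))  # always "banana": some polled replica has the latest
```

With R + W <= N (say W = 1, R = 1), a polled replica could miss the newest write entirely, which is exactly the eventual-consistency territory described above.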


🌟 Based on system needs and CAP theorem trade-offs, we first decide between prioritising consistency or availability under network partitions. Then, we choose a suitable consistency model (like strong or eventual) to match that choice. This, in turn, guides our data replication strategy (synchronous or asynchronous) and system behaviour during normal operation.

When a network partition occurs, the system behaves as designed (for example, sacrificing availability to maintain consistency, or vice versa). After recovery, conflict resolution techniques (like last-write-wins, vector clocks, or app-specific logic) handle any data divergence.
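As a flavour of how such divergence is even detected, here is a toy vector clock sketch (nothing here comes from a real library): each write carries a per-node counter map, and when neither of two clocks dominates the other, the writes were concurrent and need resolution.

```python
# Toy vector clocks: detect whether two updates are ordered or concurrent.

def increment(clock, node):
    # Record one local event on `node`, returning a new clock.
    c = dict(clock)
    c[node] = c.get(node, 0) + 1
    return c

def dominates(a, b):
    # Clock `a` dominates `b` if it has seen everything `b` has seen.
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a, b):
    if dominates(a, b) and dominates(b, a):
        return "same history"
    if dominates(a, b):
        return "a happened after b"
    if dominates(b, a):
        return "b happened after a"
    return "concurrent: conflict, must be resolved"

# Two replicas update the same key while partitioned from each other.
base = {}
on_node1 = increment(base, "node1")   # {"node1": 1}
on_node2 = increment(base, "node2")   # {"node2": 1}
print(compare(on_node1, on_node2))    # concurrent: conflict, must be resolved
```

Last-write-wins would instead just compare timestamps and silently drop one of the two updates, which is simpler but can lose data.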


There's a lot to cover, but let's start with one of the pillars of distributed database systems: the CAP theorem.


