Multi-Version Concurrency Control
- sumitnagle
- Jun 27
We have already discussed two main ways of managing concurrency, optimistic locking and pessimistic locking! Now we will go through some other ways to achieve concurrency control.
The multi-version concurrency control (MVCC) method manages concurrent transactions without locking database resources. It ensures consistency and isolation while allowing high concurrency by using these techniques,
Versioning. Instead of locking rows during a transaction, the database maintains multiple versions of each data item. Every write operation creates a new version of the data, while old versions are kept for transactions that might still need them.
Timestamps. Each transaction is assigned a timestamp or transaction ID. Every data version also has a creation timestamp indicating when it was created, and an expiration timestamp (or validity range) determining until when it is valid.
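To make that concrete, here is a hypothetical sketch of what versioned storage could look like if versions were an ordinary table (real engines keep this metadata internally; the table and column names here are my own illustration),

-- Every write appends a new version row instead of overwriting the old one.
CREATE TABLE employee_versions (
    id         INT,
    salary     INT,
    created_ts INT,   -- transaction/timestamp that created this version
    expired_ts INT    -- transaction/timestamp that superseded it (NULL = still current)
);

-- The "current" row for id 1 is simply the version that has not expired yet,
SELECT * FROM employee_versions WHERE id = 1 AND expired_ts IS NULL;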
Traditional locks (pessimistic locking) can cause performance bottlenecks because readers block writers and writers block readers. MVCC avoids this by giving each transaction a "snapshot" of the database as it existed at the start of the transaction.
Transactions read the version of data that was committed before they started. This ensures that transactions don't see uncommitted changes, which solves the dirty read problem.
When a transaction updates data, it creates a new version with a new timestamp. The old version remains accessible to older transactions. For example, say transaction A is reading version 1 of a row. If another transaction B updates this row, a new version (version 2) is created; if transaction A reads the row again, it still reads version 1. Once version 2 is committed, it becomes visible to transactions that start after that.
Since reads and writes don't block other reads and writes, MVCC is highly concurrent, and because it reduces lock contention it also helps avoid deadlocks. Each transaction works with a consistent snapshot of the database. However, keeping multiple snapshots of the same data requires additional storage, and these old versions must be periodically cleaned up (vacuuming in PostgreSQL).
If you check MVCCEmployees, each row snapshot has two pieces of metadata, xmin (the transaction ID that created this row version) and xmax (the transaction ID that deleted, or superseded, this row version).
Say we have an employee, as sketched below.
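A minimal, hypothetical reconstruction of the MVCCEmployees table (the schema and the sample employee are my own assumption). In PostgreSQL, xmin and xmax are hidden system columns that exist on every row; you can inspect them by naming them explicitly in a SELECT,

CREATE TABLE MVCCEmployees (
    ID     INT PRIMARY KEY,
    Name   TEXT,
    Salary INT
);

INSERT INTO MVCCEmployees VALUES (1, 'Alice', 50000);

-- xmin = ID of the transaction that inserted the row,
-- xmax = 0, meaning no transaction has deleted or superseded it yet.
SELECT xmin, xmax, * FROM MVCCEmployees WHERE ID = 1;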
When a transaction (call its ID currentTransactionId) wants to read the data,
SELECT * FROM MVCCEmployees WHERE ID = 1;

it will see exactly one snapshot (a version of the above row; we will discuss why only one),
which was created before our transaction started, mathematically, xmin <= currentTransactionId,
and which hasn't been deleted yet, i.e. either xmax is null or xmax > currentTransactionId. In other words, it ignores snapshots which have xmax <= currentTransactionId.
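These two conditions can be written down as an ordinary WHERE clause. A sketch, assuming a hypothetical mvcc_employee_versions table in which xmin and xmax are explicit columns (in PostgreSQL they are hidden system columns and this filtering happens internally),

-- Everything visible to transaction 101 for row ID = 1,
SELECT *
FROM mvcc_employee_versions
WHERE ID = 1
  AND xmin <= 101                      -- created before (or by) our transaction
  AND (xmax IS NULL OR xmax > 101);    -- not yet deleted, as far as we can see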
And when this transaction wants to update this row (or rather, its current snapshot), it doesn't change the existing snapshot. Instead, it creates a new snapshot whose xmin is currentTransactionId, and deletes the old snapshot (by setting its xmax to currentTransactionId). The old snapshot is retained until no active transaction needs it, after which it is garbage collected.
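You can watch this happen in PostgreSQL. A sketch, assuming the MVCCEmployees table from above (txid_current() returns the current transaction's ID),

BEGIN;

SELECT txid_current();   -- note our transaction ID

-- This doesn't modify the row in place; it writes a new row version whose
-- xmin is our transaction ID, and stamps the old version's xmax with it.
UPDATE MVCCEmployees SET Salary = Salary + 1000 WHERE ID = 1;

SELECT xmin, xmax, * FROM MVCCEmployees WHERE ID = 1;   -- xmin = our ID now

COMMIT;
-- The old version still exists on disk until VACUUM garbage-collects it.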
In this way, each transaction sees data based on the snapshot it got when it started, and this is how readers don't block writers and writers don't block readers.
For example, for the above data, say we have three transactions, with transaction IDs 101, 102 and 103. And say transaction A (101) tries to read the data; it sees the current snapshot (the version whose xmin predates 101 and whose xmax is still null).
Now comes transaction B (102), and it wants to update this row, so it also sees the same current snapshot. When it performs the update, say increasing the salary 😌, this creates a new snapshot (with xmin = 102) and deletes the current version (by setting its xmax = 102).
Now if another transaction C (103) tries to read the data, it will choose the second snapshot, as this is the latest committed snapshot! However, if we go back to transaction A, even if it tries to read the data again, it will see the first snapshot and not the new one! You can do your own homework: check those two conditions and verify that only the earlier snapshot qualifies, not the new one.
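Here is the same story as a two-session sketch you can try in psql, assuming the MVCCEmployees table from above. One caveat: the whole-transaction snapshot behaviour described here corresponds to PostgreSQL's REPEATABLE READ isolation level (the default READ COMMITTED takes a fresh snapshot per statement),

-- Session A (transaction 101),
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT Salary FROM MVCCEmployees WHERE ID = 1;   -- sees the old version

-- Session B (transaction 102),
BEGIN;
UPDATE MVCCEmployees SET Salary = Salary + 1000 WHERE ID = 1;
COMMIT;                                          -- new version is now committed

-- Session A again,
SELECT Salary FROM MVCCEmployees WHERE ID = 1;   -- still the old version!
COMMIT;

-- Session C (transaction 103), started after B committed,
SELECT Salary FROM MVCCEmployees WHERE ID = 1;   -- sees the new version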
And while doing this math, one question may come to your mind! What if two snapshots satisfy the reading condition? The answer is, never. We will never have two versions that satisfy the visibility rules for a single transaction at the same time. When a row is updated (a new snapshot is created), the old version's xmax is set to the updating transaction's ID, and the new version's xmin is that same ID. So at any snapshot point (say transactionId1), only one of these versions can satisfy the visibility rules,
either the old version's xmax is less than or equal to transactionId1, making the old version invisible (and the new one, whose xmin is that same value, visible),
or the new version's xmin is greater than transactionId1, making the new version invisible (and the old one visible).
Since the old version's xmax equals the new version's xmin, exactly one of these two cases holds, so exactly one version is visible, never both.
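A quick numeric check of this claim, assuming transaction 102 did the update (so old xmax = new xmin = 102) and the old version's xmin is some much earlier ID,

SELECT reader,
       (102 > reader)  AS old_version_visible,   -- old version: xmax > reader
       (102 <= reader) AS new_version_visible    -- new version: xmin <= reader
FROM (VALUES (101), (103)) AS t(reader);
-- reader 101: old visible, new not; reader 103: new visible, old not.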
To conclude, a transaction's snapshot (the one it sees until it commits or rolls back) freezes the visible versions at its start! It doesn't see changes made by other transactions that started after it began, even if those other transactions commit; this avoids dirty reads.
MVCC doesn't provide the latest, real-time data; instead it provides a consistent snapshot of the data (which is not necessarily the latest). This ensures consistency while avoiding locking, as within one transaction, multiple reads of the same data always give the same result, avoiding non-repeatable reads.
A question! Is this control good for a banking system? No! Can we still use it? Yes, we can still use MVCC by introducing locking. MVCC normally avoids locking by letting each transaction work with its own snapshot of data, allowing reads and writes to happen concurrently without blocking. However, in critical cases like banking transactions where we must read a value and immediately update it safely (like checking and deducting a balance), MVCC alone isn't enough, because another transaction could update the same row in parallel. To handle this, we combine MVCC with explicit locking using statements like SELECT ... FOR UPDATE, which locks the row for the rest of the transaction. This ensures no other transaction can modify that row until the current one finishes, providing safe, isolated control while still benefiting from MVCC's versioning for other, non-locked operations, as sketched below.
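A minimal sketch of the check-and-deduct pattern, assuming a hypothetical Accounts(ID, Balance) table,

BEGIN;

-- Row lock: any other transaction that tries to UPDATE or SELECT ... FOR
-- UPDATE this row now blocks until we commit or roll back.
SELECT Balance FROM Accounts WHERE ID = 1 FOR UPDATE;

-- Safe to deduct: the balance we just read cannot change under us.
UPDATE Accounts SET Balance = Balance - 100 WHERE ID = 1;

COMMIT;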
Before we close, we will see yet another concurrency control method,
Timestamp-Based Concurrency Control
The timestamp-based concurrency control method assigns a unique timestamp to each transaction. These timestamps help determine the order of execution and resolve conflicts.
Each data item (i.e., a database row) keeps track of rts (read timestamp, the latest time it was read) and wts (write timestamp, the latest time it was written); check TCCEmployees.
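A minimal, hypothetical reconstruction of the TCCEmployees table (the schema is my own assumption; timestamps are kept as plain integers here),

CREATE TABLE TCCEmployees (
    ID     INT PRIMARY KEY,
    Salary INT,
    rts    INT DEFAULT 0,   -- read timestamp: newest transaction that read this row
    wts    INT DEFAULT 0    -- write timestamp: newest transaction that wrote this row
);

INSERT INTO TCCEmployees (ID, Salary) VALUES (1, 50000);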
When a transaction wants to read,
it proceeds if its (unique) timestamp is newer than the data's wts, i.e. it is reading the latest committed version, and then it updates the rts.
Otherwise, it is aborted, because another transaction must have already written a newer value.
When a transaction wants to write the data,
it proceeds if its timestamp is newer than the rts (meaning no newer transaction has read this data yet) and also newer than the wts (meaning no newer transaction has already written to this data), and then it updates the wts.
Otherwise, it is aborted, because some other transaction must have already read or written a newer value. Both rules are sketched below.
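A minimal sketch of the two rules in SQL, assuming the hypothetical TCCEmployees table above and a transaction whose timestamp is 105. A conditional UPDATE enforces each rule atomically: if it reports 0 rows affected, the condition failed and the transaction must abort,

-- Read rule: allowed only if our timestamp is at least the row's wts;
-- on success, advance rts to our timestamp if ours is newer.
UPDATE TCCEmployees
SET rts = GREATEST(rts, 105)
WHERE ID = 1 AND wts <= 105;

-- Write rule: allowed only if no newer transaction has read or written;
-- on success, record the new value and stamp wts with our timestamp.
UPDATE TCCEmployees
SET Salary = 60000, wts = 105
WHERE ID = 1 AND rts <= 105 AND wts <= 105;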
Since transactions don't wait for each other, deadlocks are avoided and multiple transactions can proceed in parallel unless conflicts occur. It also provides serialisability. However, this has a drawback: as the conditions for reading and writing are very strict, older transactions are more likely to be aborted if newer ones conflict, which means some transactions may be repeatedly aborted and never complete (starvation).