MVCC part 1 has been added

2019-04-28 18:51:24 +01:00 · 2019-04-28 18:51:24 +01:00 · 411fbda7af
parent 9b14d4215b
commit 411fbda7af
2 changed files with 131 additions and 6 deletions
--- a/_drafts/mvcc.md
+++ b/_drafts/mvcc.md
@@ -0,0 +1,123 @@
+---
+layout: post
+title:  "Multi-Version Concurrency Control"
+date:   2019-04-28
+categories: DB
+tags: concurrency
+theme: dark
+---
+
+Multi-version concurrency control, or **MVCC** for short, is a well-known and commonly used concurrency
+control method in [DBMS](https://en.wikipedia.org/wiki/Database#Database_management_system)s
+and in some programming languages (for [Transactional Memory](https://en.wikipedia.org/wiki/Transactional_memory)).
+Like lots of other concepts and algorithms in computer science, it is old (it was introduced in the 1970s).
+
+Before we start, I presume you are familiar with
+[transaction processing](https://en.wikipedia.org/wiki/Transaction_processing). Also, as a heads up: since
+MVCC is a huge topic, far too big for a single blog post, I'll split it into several posts. In this post
+we're going to get an overview of MVCC.
+
+## What is Concurrency Control?
+Concurrency control is the procedure that data-oriented systems, such as a DBMS or a programming language, use to
+manage simultaneous operations so that they don't conflict with each other. Concurrent access is quite easy if
+everyone just wants to read data: in a read-only environment there is no way that read operations can interfere
+with one another. But the purpose of almost every system is to process some data and make changes to the world.
+Write operations are an important part of every system, and concurrency control is all about handling simultaneous
+write operations in a conflict-free way.
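+
+To make the write problem concrete, here is a minimal Python sketch of the classic *lost update*, with no
+concurrency control at all. The account, the amounts and the interleaving are made up for illustration. The
+interleaving is hard-coded so the outcome is deterministic.
+
+```python
+def lost_update_demo():
+    # shared state, no concurrency control
+    db = {"balance": 100}
+
+    # T1 and T2 both read the balance before either one writes
+    t1_read = db["balance"]  # T1 sees 100
+    t2_read = db["balance"]  # T2 sees 100
+
+    db["balance"] = t1_read + 50  # T1 deposits 50 and writes 150
+    db["balance"] = t2_read + 30  # T2 deposits 30 and writes 130: T1's deposit is lost
+
+    return db["balance"]
+
+
+print(lost_update_demo())  # 130, although 180 was expected
+```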
+
+**MVCC** is one of the most popular and widely used concurrency control methods. For more on concurrency control,
+check out [this Wikipedia page](https://en.wikipedia.org/wiki/Concurrency_control).
+
+## MVCC
+With MVCC, the system (a DBMS or a programming language) maintains multiple physical versions of a single
+logical object (anything under the control of the system, e.g. a tuple in a relational DBMS or some data
+in memory managed by a programming language):
+
+* When a transaction writes to an object, the system creates a new version of that object.
+* When a transaction reads an object, it reads the newest version that existed when the transaction started.
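+
+The two rules above can be sketched as a toy in-memory store in Python. This is a simplification, not how any
+particular system does it. It assumes every write commits immediately and that a transaction's snapshot is just
+the value of a logical clock when the transaction starts. The class and method names are hypothetical.
+
+```python
+class VersionedStore:
+    """Toy MVCC store: writes append versions, reads pick one by snapshot timestamp."""
+
+    def __init__(self):
+        self.clock = 0      # logical clock, bumped on every (auto-committed) write
+        self.versions = {}  # key -> list of (commit_ts, value), oldest first
+
+    def begin(self):
+        # a transaction's snapshot is the clock value when it starts
+        return self.clock
+
+    def write(self, key, value):
+        # never overwrite: append a new physical version of the logical object
+        self.clock += 1
+        self.versions.setdefault(key, []).append((self.clock, value))
+
+    def read(self, key, snapshot_ts):
+        # newest version committed at or before the snapshot
+        for commit_ts, value in reversed(self.versions.get(key, [])):
+            if commit_ts <= snapshot_ts:
+                return value
+        return None
+
+
+store = VersionedStore()
+store.write("x", "v1")
+reader = store.begin()                 # the reader starts while only v1 exists
+store.write("x", "v2")                 # a later write adds a second version of x
+print(store.read("x", reader))         # v1
+print(store.read("x", store.begin()))  # v2
+```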
+
+We'll see how MVCC works in a minute, but first let's discuss why we would use MVCC.
+
+There are lots of benefits to using MVCC as the concurrency control method. Some of the main ones
+are:
+
+* Writes don't block readers:
+  With MVCC, write operations can be done without blocking any reader. This is not the case in
+  [Two Phase Locking](https://en.wikipedia.org/wiki/Two-phase_locking), where a writer's exclusive lock blocks readers.
+
+* Lock-free read operations via consistent snapshots:
+  Read-only transactions don't have to acquire any locks, because they are provided with a consistent snapshot
+  of the current state of the system to operate on.
+
+* Time-traveling operations:
+  By storing all the versions of an object, we can easily operate on the version of that object that was
+  current at a given time. For example, in a DBMS we can run a query against the state of the database
+  from 2 years ago (as long as those old versions haven't been garbage-collected).
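+
+Time travel falls out of keeping the versions around. Here is a minimal Python sketch. It assumes a hypothetical
+history of one object, stored as `(commit_time, value)` pairs sorted by time. `bisect_right` finds the last
+version committed at or before the requested time:
+
+```python
+from bisect import bisect_right
+
+
+def read_as_of(history, at_time):
+    """Return the value current at `at_time`, or None if the object did not exist yet.
+
+    `history` is a list of (commit_time, value) pairs sorted by commit_time.
+    """
+    commit_times = [t for t, _ in history]
+    # index of the last version committed at or before at_time
+    i = bisect_right(commit_times, at_time) - 1
+    return history[i][1] if i >= 0 else None
+
+
+# hypothetical history; times are plain integers (e.g. days) for simplicity
+history = [(10, "draft"), (20, "reviewed"), (35, "published")]
+print(read_as_of(history, 5))    # None
+print(read_as_of(history, 25))   # reviewed
+print(read_as_of(history, 100))  # published
+```
+
+Rebuilding `commit_times` on every call keeps the sketch short. A real system would keep the timestamps indexed.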
+
+MVCC is useful not just for concurrency control. It also shines when it comes to managing multi-version
+data.
+
+## Snapshot Isolation (SI)
+In order to understand how MVCC works, first we need to know about snapshot isolation (SI). MVCC and
+SI have a two-way relationship. MVCC is the natural way to implement SI, since every transaction needs its
+own snapshot of the data. In turn, SI is the isolation level that many MVCC systems are built around.
+
+Basically, when a transaction starts, the system provides it with a consistent snapshot of the current
+state of the system. By *current* I mean the exact state of the system just before the transaction started.
+By *consistent* I mean that the snapshot does not contain any uncommitted data from running transactions.
+So if at a given time transaction T1 is running and T2 is about to start, the system will not include
+T1's changes in the snapshot used by T2. Simple as that.
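+
+One way to build such a snapshot is to record which transactions were still running when it was taken.
+Here is a Python sketch under simplifying assumptions: no aborts, and transaction ids handed out in increasing
+order. The idea is loosely similar to how PostgreSQL snapshots work, but all names here are hypothetical:
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class Snapshot:
+    next_txid: int     # transactions with an id >= this had not started yet
+    active: frozenset  # transactions that were running (uncommitted) at snapshot time
+
+
+class TxManager:
+    def __init__(self):
+        self.next_txid = 1
+        self.active = set()
+
+    def begin(self):
+        txid = self.next_txid
+        self.next_txid += 1
+        # take the snapshot before registering the new transaction as active
+        snap = Snapshot(self.next_txid, frozenset(self.active))
+        self.active.add(txid)
+        return txid, snap
+
+    def commit(self, txid):
+        self.active.discard(txid)
+
+
+def visible(creator_txid, snap, reader_txid):
+    """Is a version written by creator_txid visible in reader_txid's snapshot?"""
+    if creator_txid == reader_txid:
+        return True   # a transaction sees its own writes
+    if creator_txid >= snap.next_txid:
+        return False  # the writer started after the snapshot was taken
+    if creator_txid in snap.active:
+        return False  # the writer was still running at snapshot time
+    return True       # the writer committed before the snapshot (no aborts assumed)
+
+
+tm = TxManager()
+t1, _ = tm.begin()             # T1 is running and writing
+t2, snap2 = tm.begin()         # T2 starts while T1 is still running
+print(visible(t1, snap2, t2))  # False
+tm.commit(t1)
+print(visible(t1, snap2, t2))  # False: T2's snapshot does not change
+t3, snap3 = tm.begin()
+print(visible(t1, snap3, t3))  # True: T1 committed before T3 started
+```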
+
+This way a transaction never sees torn writes from another running transaction. A torn write is, for example,
+a write operation that is supposed to update two objects but so far has written only the first one.
+
+Another important rule here is that if two concurrent transactions want to update the same object, the first
+one wins and the second one has to abort and retry.
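+
+Here is a minimal Python sketch of this rule, using the *first-committer-wins* variant: the conflict is detected
+at commit time. Some systems instead detect it when the second transaction tries to write. The sketch buffers
+writes until commit and keeps only the newest value per key for brevity. All names are hypothetical:
+
+```python
+class ConflictError(Exception):
+    pass
+
+
+class SIStore:
+    def __init__(self):
+        self.clock = 0
+        self.data = {}           # key -> newest committed value
+        self.latest_commit = {}  # key -> commit timestamp of that value
+
+    def begin(self):
+        return {"start_ts": self.clock, "writes": {}}
+
+    def write(self, tx, key, value):
+        tx["writes"][key] = value  # buffered, invisible to others until commit
+
+    def commit(self, tx):
+        # abort if another transaction committed a newer version of any key
+        # we wrote after our snapshot was taken
+        for key in tx["writes"]:
+            if self.latest_commit.get(key, 0) > tx["start_ts"]:
+                raise ConflictError(f"write-write conflict on {key!r}, retry")
+        self.clock += 1
+        for key, value in tx["writes"].items():
+            self.data[key] = value
+            self.latest_commit[key] = self.clock
+
+
+store = SIStore()
+t1, t2 = store.begin(), store.begin()
+store.write(t1, "x", 1)
+store.write(t2, "x", 2)
+store.commit(t1)  # the first committer wins
+try:
+    store.commit(t2)
+except ConflictError as err:
+    print(err)  # write-write conflict on 'x', retry
+print(store.data["x"])  # 1
+```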
+
+Snapshots can be physical or logical, depending on the system. For example, in a DBMS it does not make
+sense to copy the database state for each transaction (a physical snapshot), because obviously it would be
+huge. Instead, a DBMS uses logical snapshots that share the same physical data. But in a programming language,
+it might be much faster to just take a physical snapshot of some data in memory than to pay the bookkeeping
+overhead of a logical snapshot.
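+
+The difference can be sketched in Python with a toy store (all names are hypothetical). A physical snapshot
+copies the data. A logical snapshot only remembers a timestamp and resolves reads against an append-only log of
+versions that all snapshots share:
+
+```python
+import copy
+
+
+class Store:
+    def __init__(self, initial):
+        self.clock = 0
+        self.live = dict(initial)                           # current values
+        self.log = [(0, k, v) for k, v in initial.items()]  # append-only (commit_ts, key, value)
+
+    def write(self, key, value):
+        self.clock += 1
+        self.live[key] = value
+        self.log.append((self.clock, key, value))
+
+    def physical_snapshot(self):
+        # copies every value up front: memory cost grows with the size of the state
+        return copy.deepcopy(self.live)
+
+    def logical_snapshot(self):
+        # remembers only a timestamp; the data itself stays shared
+        return self.clock
+
+    def logical_read(self, snapshot_ts, key):
+        # the log is ordered by commit_ts, so the last matching entry wins
+        value = None
+        for ts, k, v in self.log:
+            if k == key and ts <= snapshot_ts:
+                value = v
+        return value
+
+
+store = Store({"x": 1, "y": 2})
+phys = store.physical_snapshot()
+logi = store.logical_snapshot()
+store.write("x", 10)
+print(phys["x"], store.logical_read(logi, "x"), store.live["x"])  # 1 1 10
+```
+
+`logical_read` scans the whole log to keep the sketch short. A real system would follow a per-object version
+chain or an index instead.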
+
+It's important to bear in mind that SI is not serializable isolation by default. If you need serializable
+isolation on top of snapshots, you have to take care of some extra stuff, such as preventing anomalies like
+write skew that SI allows.
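+
+The best-known anomaly that SI allows is *write skew*. The Python sketch below hand-simulates the usual "doctors
+on call" example; the names and the invariant are illustrative. Both transactions check the invariant against
+their own snapshot and update different rows, so they pass a write-write conflict check. Together, however, they
+break the invariant:
+
+```python
+def write_skew_demo():
+    # Application invariant: at least one doctor must stay on call.
+    committed = {"alice": True, "bob": True}
+
+    # Both transactions take their snapshot before either one commits.
+    snap_t1 = dict(committed)
+    snap_t2 = dict(committed)
+
+    writes_t1, writes_t2 = {}, {}
+    # T1: "someone else is on call, so Alice can go off call"
+    if sum(snap_t1.values()) >= 2:
+        writes_t1["alice"] = False
+    # T2: the same check, against its own (equally stale) snapshot
+    if sum(snap_t2.values()) >= 2:
+        writes_t2["bob"] = False
+
+    # The write sets don't overlap, so a write-write conflict check
+    # (first committer wins) lets both transactions commit.
+    assert not (writes_t1.keys() & writes_t2.keys())
+    committed.update(writes_t1)
+    committed.update(writes_t2)
+    return committed
+
+
+print(write_skew_demo())  # {'alice': False, 'bob': False}: nobody is on call
+```
+
+Run serially, the second transaction would see only one doctor on call and skip its update. SI lets both through.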
+
+## Design of MVCC
+In order to implement MVCC in a system, we need to make decisions about the different aspects of the system
+that MVCC is involved with. The most crucial aspects are:
+
+* Bookkeeping of the extra data we need to store
+* Concurrency control protocol
+* Index Management
+* Garbage Collection
+* Storage
+
+### Data bookkeeping
+Depending on the concurrency control protocol we want to use, we have to manage some extra data about
+every object in our system. In general, we need to keep track of the following information about each
+object version:
+
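+* Transaction ID (`TxID`) of the transaction that created, or is currently modifying, the version
+* Lifetime of each version:
+  * When the version became visible, typically the timestamp of the transaction that created it: `BEGIN-TS`
+  * When the version was superseded, typically the timestamp of the transaction that created the next version: `END-TS`
+* A link to the previous/next versions of the same object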
+
+There is also some other information, depending on the protocol we use for concurrency control. It's crucial
+to decide how to manage and store this data, and that depends entirely on the nature of your system. Is it a
+disk-oriented, single-node database management system? Is it a programming language running in a
+single-threaded environment? Or maybe it's an in-memory, distributed database management system?
+
+Whatever it is, keep in mind that computer science is about trade-offs; there is no ultimate answer.
+For example, storing this kind of data alongside the object itself increases storage usage but can save
+you lots of computation time. That can be a wise choice in a DBMS but not in a programming language
+implementing STM.
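+
+Put together, a version record with the fields listed above could look like the following Python sketch. The
+names are hypothetical. `txid` is treated as a write lock (the id of the transaction currently modifying the
+version, `None` when unlocked), although locking itself is not exercised here. A version is assumed visible at
+time `ts` when `BEGIN-TS <= ts < END-TS`:
+
+```python
+from dataclasses import dataclass
+from typing import Any, Optional
+
+INFINITY = float("inf")
+
+
+@dataclass
+class Version:
+    value: Any
+    txid: Optional[int]               # write lock: id of the modifying transaction, None if unlocked
+    begin_ts: float                   # BEGIN-TS: start of this version's lifetime
+    end_ts: float = INFINITY          # END-TS: end of its lifetime, INFINITY while it is the newest
+    prev: Optional["Version"] = None  # link to the previous (older) version
+
+    def visible_at(self, ts):
+        return self.begin_ts <= ts < self.end_ts
+
+
+def install(head, value, ts):
+    """Put a new version on top of `head` and close head's lifetime at `ts`."""
+    new = Version(value=value, txid=None, begin_ts=ts, prev=head)
+    if head is not None:
+        head.end_ts = ts
+    return new
+
+
+def read(head, ts):
+    """Walk the chain from newest to oldest and return the value visible at `ts`."""
+    v = head
+    while v is not None:
+        if v.visible_at(ts):
+            return v.value
+        v = v.prev
+    return None
+
+
+head = install(None, "a", ts=1)
+head = install(head, "b", ts=5)
+print(read(head, 3))  # a
+print(read(head, 7))  # b
+print(read(head, 0))  # None
+```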
+
+### Concurrency control protocols for MVCC
+
+* Multi-Version Timestamp Ordering (MVTO)
+* Multi-Version Optimistic Concurrency Control (MVOCC)
+* Multi-Version Two-Phase Locking (MV2PL)
+* Serializable Snapshot Isolation (SSI)
--- a/_drafts/variant-types-of-transactions.md
+++ b/_drafts/variant-types-of-transactions.md
@@ -23,12 +23,14 @@ have to book 4 flights from, `C1 -> CA -> CB -> C2`. The process of booking each
 is a transaction by itself and the whole process is a transaction too.

 * Bulk updates
-Let's say we want to update billion tuples. What if the very last tuple fails to update. Then we
-need to revert our changes to a billion tuples.
+Let's say we want to update a billion tuples. What if the very last tuple fails to update and causes
+the transaction to abort? Then we need to revert all the changes made by the transaction, i.e. roll back
+a billion tuples, which is obviously a huge task.

-## Savepoints transactions
-These transactions are similar to save point transaction but they have one extra thing which is
-save points So any where in there transaction users case ask for a save point and again they can
+
+## Transaction Savepoints
+These transactions are similar to flat transactions, with one extra feature: savepoints. Anywhere in
+their transaction, users can ask for a savepoint, and later they can
 roll back to a savepoint or roll back the entire transaction.

 ```sql
@@ -38,7 +40,7 @@ BEGIN
    SAVEPOINT 1
    WRITE(B)
    SAVEPOINT 2
-    ROLLBACK 2
+    ROLLBACK TO 1
 COMMIT
 ```
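+
+The same flow can be tried against a real database. Below is a minimal Python sketch using the standard `sqlite3`
+module. SQLite spells the rollback as `ROLLBACK TO <savepoint-name>`, and savepoint names must be identifiers,
+so `sp1`/`sp2` stand in for `1`/`2`. The table and the write before the first savepoint are hypothetical:
+
+```python
+import sqlite3
+
+
+def run_savepoint_demo():
+    # isolation_level=None: the module opens no implicit transactions,
+    # so BEGIN/SAVEPOINT/COMMIT below are issued exactly as written
+    conn = sqlite3.connect(":memory:", isolation_level=None)
+    conn.execute("CREATE TABLE t (name TEXT)")
+    conn.execute("BEGIN")
+    conn.execute("INSERT INTO t VALUES ('A')")  # hypothetical write before the first savepoint
+    conn.execute("SAVEPOINT sp1")
+    conn.execute("INSERT INTO t VALUES ('B')")  # WRITE(B)
+    conn.execute("SAVEPOINT sp2")
+    conn.execute("ROLLBACK TO sp1")             # undoes WRITE(B) and discards sp2
+    conn.execute("COMMIT")                      # commits what is left: A only
+    rows = [row[0] for row in conn.execute("SELECT name FROM t ORDER BY name")]
+    conn.close()
+    return rows
+
+
+print(run_savepoint_demo())  # ['A']
+```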