MVCC part 1 has been added

2019-04-28 18:51:24 +01:00 · 2019-04-28 18:51:24 +01:00 · 411fbda7af
parent 9b14d4215b
commit 411fbda7af
2 changed files with 131 additions and 6 deletions
--- a/_drafts/mvcc.md
+++ b/_drafts/mvcc.md
@ -0,0 +1,123 @@
 ---
 layout: post
 title:  "Multi-Version Concurrency Control"
 date:   2019-04-28
 categories: DB
 tags: concurrency
 theme: dark
 ---
 Multi-version concurrency control, or **MVCC** for short, is a well-known and commonly used concurrency
 control method in [DBMS](https://en.wikipedia.org/wiki/Database#Database_management_system)s
 and in some programming languages (for [Transactional Memory](https://en.wikipedia.org/wiki/Transactional_memory)).
 Like lots of other concepts and algorithms in computer science, it is old (introduced in the 1970s).
 Before we start, I presume you are familiar with
 [transaction processing](https://en.wikipedia.org/wiki/Transaction_processing). Also, as a heads-up: since
 MVCC is a huge topic that goes far beyond a single blog post, I'll split it into several posts. In this post
 we're going to have an overview of MVCC.
 ## What is Concurrency Control?
 Concurrency control is the procedure that data-oriented systems, such as a DBMS or a programming language, use to manage
 simultaneous operations so that they don't conflict with one another. Concurrent access is quite easy if everyone
 just wants to read data. In a read-only environment there is no way that read operations can interfere with one
 another. But the purpose of every system in this world is to process some data and make changes to the world. Write
 operations are an important part of every system, and concurrency control is all about handling simultaneous
 operations that involve writes in a conflict-free way.
 **MVCC** is one of the most popular and widely used concurrency control methods. For more on concurrency control,
 check out [this Wikipedia page](https://en.wikipedia.org/wiki/Concurrency_control).
 ## MVCC
 Under MVCC, the system (a DBMS or a programming language) maintains multiple physical versions of a single
 logical object (anything under the control of the system, either a tuple in a relational DBMS or some data
 in memory controlled by a programming language):
 * When a transaction writes to an object, the system creates a new version of that object.
 * When a transaction reads an object, it reads the newest committed version that existed when the transaction started.
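 The two rules above can be sketched as a toy in-memory store. This is only an illustration, not how any
 particular DBMS does it: `VersionedStore`, its logical clock, and the assumption that every write commits
 immediately are all made up for this example.

```python
from collections import defaultdict


class VersionedStore:
    """Toy multi-version store: every write appends, nothing is overwritten."""

    def __init__(self):
        # key -> list of (commit_ts, value), oldest first
        self._versions = defaultdict(list)
        self._clock = 0  # hypothetical logical clock standing in for real timestamps

    def now(self):
        return self._clock

    def write(self, key, value):
        """Create a new version of `key` and return its commit timestamp."""
        self._clock += 1
        self._versions[key].append((self._clock, value))
        return self._clock

    def read(self, key, start_ts):
        """Return the newest version committed at or before `start_ts`."""
        for commit_ts, value in reversed(self._versions[key]):
            if commit_ts <= start_ts:
                return value
        raise KeyError(key)


store = VersionedStore()
store.write("x", "v1")
reader_start = store.now()      # a reader transaction starts here
store.write("x", "v2")          # a later writer adds a new version

old_view = store.read("x", reader_start)   # the earlier reader still sees "v1"
new_view = store.read("x", store.now())    # a fresh reader sees "v2"
```

 Note that the writer never had to wait for the reader, and the reader never took a lock: each one simply
 works with a different version of `x`.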
 We'll see how MVCC works in a minute, but first let's discuss why we would use MVCC.
 There are lots of benefits to using MVCC as the concurrency control method, but some of the main ones
 are:
 * Writers don't block readers:
  With MVCC, write operations can be done in such a way that no reader gets blocked by them,
  unlike in [Two-Phase Locking](https://en.wikipedia.org/wiki/Two-phase_locking), where a writer's lock can make readers wait.
 * Lock-free read operations via consistent snapshots:
  Read-only transactions don't have to acquire locks anymore because they are provided with a snapshot
  of the current state of the system to operate on.
 * Time-travel operations:
  By storing all the versions of an object in the system (and not garbage collecting them), we can easily
  operate on the version of an object as of a given time. For example, in the case of a DBMS, we can run a
  query against the state of the database from 2 years ago.
 MVCC is useful not just for concurrency control. It can shine when it comes to multi-version data control
 as well.
 ## Snapshot Isolation (SI)
 In order to understand how MVCC works, we first need to know about snapshot isolation (SI). MVCC and
 SI go hand in hand: in practice SI is implemented on top of MVCC, because handing every transaction its own
 snapshot is only cheap when old versions are kept around, and many MVCC systems offer SI as their natural isolation level.
 Basically, when a transaction starts, the system provides it with a consistent
 snapshot of the current state of the system. By current, I mean the exact state of the system just before
 the transaction started, and by consistent, I mean that the snapshot does not contain any uncommitted data
 from a running transaction. So if at any given time transaction T1 is running and T2 is about to start,
 the system does not include T1's changes in the snapshot that is going to be used for T2. Simple as that.
 This way we never observe torn writes (for example, a write operation that is supposed to
 write two objects in the state but has written only the first one so far) from any running transaction.
 Also, the important rule here is that if two concurrent transactions want to update the same object, the first one
 wins and the second one has to retry.
 Snapshots might be physical or logical, depending on the system. For example, in a DBMS it does not make
 sense to copy the database state for each transaction (a physical snapshot) because obviously it would be
 huge. Instead, it uses logical snapshots that share the same physical data. But in a programming language,
 it might be much faster to just use a physical snapshot of some data in memory instead of handling the overhead
 of the bookkeeping necessary for a logical snapshot.
 It's important to bear in mind that SI is not serializable isolation by default: it still allows anomalies such
 as write skew. If you need serializable isolation for the snapshots in your system, you have to take care of some extra stuff.
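 The snapshot rules described in this section can be sketched in a few lines. Treat this as a minimal sketch
 under stated assumptions: `Database`, `Transaction`, `WriteConflict` and the integer logical clock are
 hypothetical names for this example, the snapshot is logical (just a start timestamp), and conflicts are
 detected at commit time (first committer wins).

```python
class WriteConflict(Exception):
    """Raised when a concurrent transaction already committed to the same key."""


class Database:
    def __init__(self):
        self.committed = {}   # key -> list of (commit_ts, value), oldest first
        self.clock = 0        # hypothetical logical clock

    def begin(self):
        # The "snapshot" is logical: we only remember when the transaction started.
        return Transaction(self, start_ts=self.clock)


class Transaction:
    def __init__(self, db, start_ts):
        self.db = db
        self.start_ts = start_ts
        self.writes = {}      # private, uncommitted changes

    def read(self, key):
        if key in self.writes:               # a transaction sees its own writes
            return self.writes[key]
        for commit_ts, value in reversed(self.db.committed.get(key, [])):
            if commit_ts <= self.start_ts:   # only what was committed before we began
                return value
        return None

    def write(self, key, value):
        self.writes[key] = value             # invisible to others until commit

    def commit(self):
        # First committer wins: abort if any key we wrote has a version
        # committed after our snapshot was taken.
        for key in self.writes:
            versions = self.db.committed.get(key, [])
            if versions and versions[-1][0] > self.start_ts:
                raise WriteConflict(key)
        self.db.clock += 1
        for key, value in self.writes.items():
            self.db.committed.setdefault(key, []).append((self.db.clock, value))


db = Database()
setup = db.begin()
setup.write("balance", 100)
setup.commit()

t1 = db.begin()
t2 = db.begin()
t1.write("balance", 150)
seen_by_t2 = t2.read("balance")         # T1's uncommitted write is not in T2's snapshot
t1.commit()
still_seen_by_t2 = t2.read("balance")   # T2's snapshot does not move after T1 commits

t2.write("balance", 120)
try:
    t2.commit()
    t2_outcome = "committed"
except WriteConflict:
    t2_outcome = "must retry"           # T1 got there first
```

 Real systems differ in when they detect the conflict (at write time or at commit time), but the outcome
 for the loser is the same: abort and retry on a fresh snapshot.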
 ## Design of MVCC
 In order to implement MVCC in a system, we need to make decisions about the different aspects of the system
 that are involved with MVCC. The most crucial aspects are:
 * Bookkeeping of the data we need to store
 * Concurrency control protocol
 * Index Management
 * Garbage Collection
 * Storage
 ### Data bookkeeping
 Depending on the concurrency control protocol we want to use, we have to manage some extra data about
 every object in our system. In general, we need to keep track of the following information about each
 object:
 * Transaction ID (`TxID`)
 * Lifetime of each version of the object:
  * The timestamp from which this version is valid, set by the transaction that created it: `BEGIN-TS`
  * The timestamp at which this version stops being valid, set by the transaction that replaced or deleted it: `END-TS`
 * A link to the previous/next versions of the same object
 There is also some other information, depending on the protocol we use for concurrency control. It's crucial to
 decide how to manage and store this data in your system, and that totally depends on the nature of
 your system. Is it a disk-oriented, single-node database management system? Is it a programming
 language operating in a single-threaded environment? Or maybe it's an in-memory, distributed
 database management system?
 Whatever it is, you have to keep in mind that computer science is about tradeoffs. There is no
 ultimate answer. For example, storing this kind of data alongside the object itself can
 increase your storage usage but can save you lots of computation time. It can be wise to do it
 in a DBMS, but not in a programming language implementing STM.
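 To make the bookkeeping concrete, here is a rough sketch of a version record that carries the fields listed
 above next to the value itself. The `Version` class, the `visible_version` helper and the half-open
 `[begin_ts, end_ts)` lifetime are assumptions made for this example; real systems lay this out very differently.

```python
from dataclasses import dataclass
from typing import Any, Optional

INFINITY = float("inf")  # END-TS of a version that has not been superseded yet


@dataclass
class Version:
    tx_id: int                          # TxID of the transaction that created this version
    begin_ts: float                     # BEGIN-TS
    end_ts: float                       # END-TS
    value: Any
    prev: Optional["Version"] = None    # link to the previous (older) version


def visible_version(newest: Optional[Version], read_ts: float) -> Optional[Version]:
    """Walk the chain from newest to oldest and return the version whose
    lifetime [begin_ts, end_ts) contains read_ts, or None if there is none."""
    version = newest
    while version is not None:
        if version.begin_ts <= read_ts < version.end_ts:
            return version
        version = version.prev
    return None


# Two versions of the same logical object, linked newest-to-oldest.
v1 = Version(tx_id=1, begin_ts=10, end_ts=20, value="old")
v2 = Version(tx_id=2, begin_ts=20, end_ts=INFINITY, value="new", prev=v1)

at_15 = visible_version(v2, read_ts=15)   # falls inside v1's lifetime
at_25 = visible_version(v2, read_ts=25)   # falls inside v2's lifetime
at_5 = visible_version(v2, read_ts=5)     # the object did not exist yet
```

 Keeping the metadata inline like this costs a few extra fields per version, which is the storage-versus-computation
 tradeoff mentioned above.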
 ### Concurrency control protocols for MVCC
 * Multi-Version Timestamp Ordering (MVTO)
 * Multi-Version Optimistic Concurrency Control (MVOCC)
 * Multi-Version Two-Phase Locking (MV2PL)
 * Serializable Snapshot Isolation (SSI)
--- a/_drafts/variant-types-of-transactions.md
+++ b/_drafts/variant-types-of-transactions.md
@ -23,12 +23,14 @@ have to book 4 flights from, `C1 -> CA -> CB -> C2`. The process of booking each
 is a transaction by itself and the whole process is a transaction too.
 * Bulk updates
-Let's say we want to update billion tuples. What if the very last tuple fails to update. Then we
+Let's say we want to update a billion tuples. What if the very last tuple fails to update and causes
-need to revert our changes to a billion tuples.
+the transaction to abort? Then we need to revert the changes the transaction made to
 a billion tuples, which obviously is a huge task.
-## Savepoints transactions
+
-These transactions are similar to save point transaction but they have one extra thing which is
+## Transaction Savepoints
-save points So any where in there transaction users case ask for a save point and again they can
+These transactions are similar to flat transactions, with the addition of one extra thing:
 savepoints. Anywhere in their transaction, users can ask for a savepoint, and later they can
 roll back to a savepoint or roll back the entire transaction.
 ```sql
@ -38,7 +40,7 @@ BEGIN
    SAVEPOINT 1
    WRITE(B)
    SAVEPOINT 2
-    ROLLBACK 2
+    ROLLBACK TO 1
 COMMIT
 ```
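 The pseudocode above can be tried out against a real engine. The following is a small sketch using SQLite
 through Python's standard `sqlite3` module; the `kv` table, the savepoint names `sp1`/`sp2`, and the write
 to `A` before the first savepoint are assumptions made for this example.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN / SAVEPOINT / ROLLBACK TO / COMMIT statements below are the only
# transaction control in play.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v INTEGER)")

conn.execute("BEGIN")
conn.execute("INSERT INTO kv VALUES ('A', 1)")   # WRITE(A), assumed for this example
conn.execute("SAVEPOINT sp1")
conn.execute("INSERT INTO kv VALUES ('B', 2)")   # WRITE(B)
conn.execute("SAVEPOINT sp2")
conn.execute("ROLLBACK TO sp1")                  # undoes everything done after sp1
conn.execute("COMMIT")

rows = conn.execute("SELECT k, v FROM kv ORDER BY k").fetchall()
conn.close()
```

 After the commit only the write to `A` survives: rolling back to the first savepoint discards the write
 to `B` without aborting the whole transaction.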