Concurrency Control Techniques
1. What is concurrency control? How is it implemented?
Ans: When several transactions execute concurrently, their operations may be interleaved, and the isolation property of transactions may no longer be preserved. Thus, concurrent execution may leave the database in an inconsistent state. To understand this, consider a situation in which two transactions concurrently access the same data item. One transaction modifies a tuple, and the other makes a decision on the basis of that modification. Now, suppose that the first transaction rolls back. At this point, the decision of the second transaction becomes invalid. Thus, there is a need to control the interaction among concurrent transactions, which is referred to as concurrency control.
Concurrency control is implemented with the help of concurrency control techniques, which ensure that concurrent transactions maintain the integrity of the database by avoiding interference among them, and further ensure a serializable order in the schedule of transactions. The various concurrency control techniques are locking, timestamp-based, optimistic and multiversion techniques.
2. Define lock. What are the two modes of locking?
Ans: A lock is a variable associated with each data item that indicates whether a read or write operation can be applied to the data item. In addition, it synchronizes the concurrent access of the data item. Acquiring the lock by modifying its value is called locking. It controls the concurrent access and manipulation of the locked data item by other transactions and, hence, maintains the consistency and integrity of the database. Database systems mainly use two modes of locking, namely, exclusive locks and shared locks.
Exclusive lock (denoted by
X) is the commonly used locking strategy that provides a transaction an exclusive control on the data item. A transaction that wants to read as well as write a data item must acquire an exclusive lock on the data item. Hence, an exclusive lock is also known as the update lock or write lock. If a transaction (say, Ti) has acquired an exclusive lock on a data item (say, Q), no other transaction is allowed to access Q until Ti releases its lock on Q.
Shared lock (denoted by
S) can be acquired on a data item when a transaction wants to only read a data item and not modify it. Hence, it is also known as read lock. If a transaction Ti has acquired a shared lock on data item Q, Ti can read but cannot write on Q. In addition, any number of transactions can acquire shared locks on Q simultaneously. However, no transaction can acquire an exclusive lock on Q.
3. What do you understand by lock compatibility? Discuss in detail with examples.
Ans: Whenever a transaction needs to perform some operation on a data item, it should request a lock in appropriate mode on that data item. If the requested data item is not locked by any other transaction, the lock manager grants the request immediately. Otherwise, the lock request may or may not be granted depending on the compatibility of locks. Lock compatibility determines whether locks can be acquired on a data item by multiple transactions at the same time. Suppose a transaction Ti requests a lock of mode
m1 on a data item Q on which another transaction Tj currently holds a lock of mode m2. If mode m2 is compatible with mode m1, the request is immediately granted, otherwise rejected. The lock compatibility can be represented by a matrix called the compatibility matrix (see Figure 10.1). The term “YES” indicates that the request can be granted and “NO” indicates that the request cannot be granted.
Figure 10.1 Compatibility Matrix
If the mode
m1 is shared, the lock request of transaction Ti is granted immediately, if and only if m2 is also shared. Otherwise the lock request is not granted and the transaction Ti has to wait. On the other hand, if mode m1 is exclusive, the lock request (either shared or exclusive) by transaction Ti is not granted and Ti has to wait.
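The compatibility rules above can be sketched in code. The following is a minimal illustration, assuming lock modes are represented by the strings "S" (shared) and "X" (exclusive); the function and names are illustrative, not part of any particular system.

```python
# The compatibility matrix of Figure 10.1 expressed as a dictionary:
# (held mode, requested mode) -> whether the request can be granted.
COMPATIBLE = {
    ("S", "S"): True,   # two shared locks can coexist
    ("S", "X"): False,  # shared held, exclusive requested
    ("X", "S"): False,  # exclusive held, shared requested
    ("X", "X"): False,  # two exclusive locks can never coexist
}

def can_grant(held_mode, requested_mode):
    """Return True if a lock of requested_mode can be granted on a data
    item on which another transaction currently holds held_mode."""
    return COMPATIBLE[(held_mode, requested_mode)]
```

For example, `can_grant("S", "S")` is the only combination that succeeds, matching the single "YES" entry in the matrix.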
4. How is locking implemented? What is the role of the lock table in implementation? How are requests to lock and unlock a data item handled?
Ans: The locking or unlocking of data items is implemented by a subsystem of the database system known as the lock manager. It receives lock requests from transactions and responds to them with a lock grant message or a rollback message (in case of deadlock). In response to an unlock request, the lock manager replies only with an acknowledgement; however, the unlock may also result in lock grant messages to other waiting transactions.
The lock manager maintains a linked list of records for each locked data item in the order in which the requests arrive. Each record of the linked list is used to keep the transaction identifier that made the request and the mode in which the lock is requested. It also records whether the request has been granted. Lock manager uses a hash table known as lock table, indexed on the data item identifier, to find the linked list for a data item. Figure 10.2 shows a lock table, which contains locks for three different data items, namely,
Q1, Q2 and Q3. In addition, two transactions, namely, Ti and Tj, are shown which have been granted locks or are waiting for locks.
Observe that Ti has been granted locks on the
Q1 and Q3 in shared mode. Similarly, Tj has been granted lock on Q2 and Q3 in shared mode, and is waiting to acquire a lock on Q1 in exclusive mode, which has already been locked by Ti.
The lock manager handles the requests by transactions to lock and unlock a data item in the following way: when a lock request arrives for a data item that is not present in the lock table, the lock manager creates a new linked list for that data item with a record for the request and grants the lock immediately. If the data item is already present in the lock table, the request is appended to the end of the item's linked list and is granted only if it is compatible with all the locks currently held on the item and no earlier request is still waiting; otherwise, the requesting transaction must wait. When an unlock request arrives, the lock manager deletes the corresponding record from the linked list and then checks whether the requests following it can now be granted.

Figure 10.2 Lock Table
If a transaction aborts, the lock manager deletes all waiting lock requests by the transaction. In addition, the lock manager releases all locks acquired by the transaction and updates the records in the lock table.
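The lock table described above can be sketched as follows. This is a simplified, hypothetical illustration: the hash table is a Python dict keyed on the data item identifier, each record stores the transaction, mode and grant status, and the grant rule is reduced to "only shared requests can coexist".

```python
from collections import namedtuple

# Each record keeps the transaction identifier, the requested mode and
# whether the request has been granted, in arrival order.
LockRecord = namedtuple("LockRecord", "txn mode granted")

lock_table = {}  # hash table indexed on the data item identifier

def request_lock(txn, item, mode):
    """Append a request to the item's list and grant it only when it is
    compatible with every record already present in the list."""
    queue = lock_table.setdefault(item, [])
    # Only shared requests can coexist; anything else must wait.
    granted = (not queue) or (mode == "S" and all(r.mode == "S" for r in queue))
    queue.append(LockRecord(txn, mode, granted))
    return granted
```

Mirroring Figure 10.2, Ti's shared lock on Q1 is granted, Tj's later shared request on Q1 is also granted, but Tj's exclusive request on Q1 must wait.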
5. Write a short note on deadlock.
Ans: Deadlock is a situation that occurs when all the transactions in a set of two or more transactions are in a simultaneous wait state, each of them waiting for the release of a data item held by one of the other waiting transactions in the set. None of the transactions can proceed until at least one of the waiting transactions releases its lock on the data item. For example, consider the partial schedule of transactions T1 and T2 shown in Figure 10.3.

Figure 10.3 Partial Schedule
Observe that T1 is waiting for T2 to unlock
Q, and T2 is waiting for T1 to unlock R. Thus, a situation has been reached where these two transactions can no longer continue with their normal execution. This situation is called deadlock. Now, one of these transactions must be rolled back by the system so that the data items locked by that transaction are released and become available to the other transaction.
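A deadlock like the one in Figure 10.3 can be detected by looking for a cycle in a wait-for graph, where an edge T1 → T2 means T1 is waiting for a lock held by T2. The following depth-first search is an illustrative sketch, not part of the text's own algorithms.

```python
def has_deadlock(waits_for):
    """Detect a cycle in a wait-for graph given as {txn: [holders]}."""
    visited, on_stack = set(), set()

    def dfs(txn):
        visited.add(txn)
        on_stack.add(txn)
        for holder in waits_for.get(txn, []):
            if holder in on_stack or (holder not in visited and dfs(holder)):
                return True
        on_stack.discard(txn)
        return False

    return any(dfs(t) for t in waits_for if t not in visited)
```

For the schedule of Figure 10.3, T1 waits for T2 (to unlock Q) and T2 waits for T1 (to unlock R), so `has_deadlock({"T1": ["T2"], "T2": ["T1"]})` reports a deadlock.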
6. What is the difference between lock-based technique and timestamp-based technique?
Ans: Both lock-based and timestamp-based techniques are used for concurrency control. The lock-based technique requires all the transactions in a schedule to follow some set of rules, which indicate when a transaction may lock or unlock a data item. Two-phase locking is a lock-based concurrency control technique that generates serializable schedules based on the order in which the transactions acquire locks on the data items. However, it does not ensure freedom from deadlock. In contrast, the timestamp-based technique is a non-lock concurrency control technique, which orders the execution of transactions in advance using timestamps. It also ensures serializability of schedules, and deadlocks cannot occur since no transaction ever waits.
7. Explain the two-phase locking protocol with the help of an example. What are its disadvantages? How can these disadvantages be overcome? What is the benefit of rigorous two-phase locking?
Ans: Two-phase locking (2PL) is a lock-based concurrency control technique that requires each transaction to be divided into two phases. During the first phase, the transaction acquires all the locks; during the second phase, it releases all the locks. The phase during which locks are acquired is the growing (or expanding) phase. In this phase, the number of locks held by a transaction increases from zero to the maximum. On the other hand, the phase during which locks are released is the shrinking (or contracting) phase. In this phase, the number of locks held by a transaction decreases from the maximum to zero. Whenever a transaction releases a lock on a data item, it enters the shrinking phase, and from this point onwards, it is not allowed to acquire any further locks. Therefore, the release of locks must be delayed until all the required locks on data items have been acquired. The point in the schedule at which the transaction successfully acquires its last lock is called the lock point of the transaction.
To understand 2PL technique, consider two transactions T1 and T2 along with their lock requests given in Figure 10.4.

Figure 10.4 Transactions T1 and T2 with their Lock Requests
Figure 10.5 shows the transactions T1 and T2 written under two-phase locking.

Figure 10.5 Transactions T1 and T2 in Two-phase Locking
In Figure 10.5, the statements used for releasing the locks are written at the end of the transaction. However, such statements do not always need to appear at the end of the transaction to retain the two-phase locking property. For example, the
unlock(R) statement of T1 may appear just after the lock-X(Q) statement and still maintain the two-phase locking property.
Some disadvantages of the two-phase locking technique are as follows:
(1) It does not ensure freedom from deadlock, since two transactions may each be waiting for a data item locked by the other.
(2) It may lead to cascading rollback, since a transaction may be allowed to read a value written by another transaction that subsequently aborts.
To avoid cascading rollback in 2PL, a modification of two-phase locking called strict two-phase locking can be used. In the strict two-phase locking, a transaction does not release any of its exclusive-mode locks until it commits or aborts. Strict schedules ensure that no other transaction can read or write the data item exclusively locked by a transaction T until it commits. It makes strict schedules recoverable.
Rigorous two-phase locking is another, more restrictive version of two-phase locking. Its benefit is that a transaction does not release any of its locks (both exclusive and shared) until it commits or aborts.
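The two-phase rule itself is easy to enforce mechanically. The following sketch, with illustrative names, refuses any lock acquisition once the transaction has released a lock and thereby entered its shrinking phase.

```python
class TwoPhaseTxn:
    """Minimal sketch of a transaction constrained to two-phase locking."""

    def __init__(self):
        self.locks = set()
        self.shrinking = False  # False while in the growing phase

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violated: lock requested after an unlock")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True   # the first unlock ends the growing phase
        self.locks.discard(item)
```

For instance, a transaction may lock Q and R, unlock R, and is then refused any further lock request, exactly as the growing/shrinking description above requires.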
8. Define lock conversion. What do you understand by lock upgrade and lock downgrade?
Ans: A transaction can change the mode of locks from one mode to another on a data item on which it already holds a lock. This process of changing the mode of lock is referred to as lock conversion. Lock conversion helps in achieving more concurrency.
The changing of lock mode from shared (less restrictive) to exclusive (more restrictive) is known as lock upgrade. Similarly, changing the lock mode from exclusive to shared is known as lock downgrade. The lock on a data item can be upgraded only in the growing phase. On the other hand, the lock on a data item can be downgraded only in the shrinking phase.
9. Discuss the graph-based locking technique.
Ans: Two-phase locking ensures serializability even when the information about the order in which the data items are accessed is unknown. However, if the order in which the data items are accessed is known in advance, it is possible to develop other techniques to ensure conflict serializability. One such technique is graph-based locking.
In this technique, a directed acyclic graph called a database graph is formed, which consists of a set
S = {A, B, …, L} of all data items as its nodes, and a set of directed edges. A directed edge from A to B (A → B) in the database graph denotes that any transaction Ti must access A before B when it needs to access both A and B. A simple kind of graph-based locking is tree-locking in which all database graphs are tree-structured, and any transaction Ti can acquire only exclusive locks on data items. A tree-structured database graph for the set S is shown in Figure 10.6.
Figure 10.6 Tree-structured Database Graph
A transaction Ti can acquire its first lock on any data item. Suppose Ti requests a lock on data item
F; as it is Ti's first lock, it is granted. After that, Ti can lock a data item only if it has already locked the parent of that data item. For example, if transaction Ti needs to access L, it has to lock J before requesting a lock on L. Note that locking the parent of a data item does not automatically lock that data item. Tree locking allows a transaction to unlock a data item at any time. However, once a transaction releases the lock on a data item, it cannot relock that data item.
Some advantages of the tree-locking are as follows:
(1) It ensures conflict serializability as well as freedom from deadlock, so no rollbacks due to deadlock are required.
(2) Unlocking may occur earlier than in two-phase locking, which leads to shorter waiting times and an increase in concurrency.
The disadvantage of the tree-locking is that in order to access a data item, a transaction may have to lock other data items even if it does not need to access them. For example, in order to lock data items
A and L in Figure 10.6, a transaction must lock not only A and L, but also data items C, F and J. Thus, the number of locks and the associated locking overhead, including the possibility of additional waiting time, are increased.
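The tree-protocol rules above can be sketched as a single check. This assumes only the path A → C → F → J → L that the text mentions from Figure 10.6; the remaining edges of the tree are omitted, and the function names are illustrative.

```python
# Parent map for the path of Figure 10.6 discussed in the text; the
# other edges of the tree are omitted for brevity.
parent = {"C": "A", "F": "C", "J": "F", "L": "J"}

def try_lock(held, released, node):
    """Return True if locking `node` obeys tree locking: the first lock
    may be on any node, later locks require the node's parent to be
    held, and a released node may never be relocked."""
    if node in released:
        return False        # relocking is forbidden
    if not held:
        return True         # the first lock may be on any data item
    return parent.get(node) in held
```

With F as the first lock, locking L is refused until J is locked; once J is unlocked, it can never be relocked, matching the rules above.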
10. What is phantom problem for concurrency control and how does index locking resolve this problem?
Ans: Due to the dynamic nature of the database, the phantom problem may arise. Consider the
BOOK relation of the Online Book database that stores information about books, including their price. Now, suppose that the PUBLISHER relation is modified to store information about the average price of books published by the corresponding publishers in the attribute Avg_price. Consider a transaction T1 that verifies whether the average price of books in the PUBLISHER relation for the publisher P001 is consistent with the information about the individual books recorded in the BOOK relation that are published by P001. T1 first locks all the tuples of books published by P001 in the BOOK relation and thereafter locks the tuple in the PUBLISHER relation referring to P001. Meanwhile, another transaction T2 inserts a new tuple for a book published by P001 into the BOOK relation, and then, before T1 locks the tuple in the PUBLISHER relation referring to P001, T2 locks this tuple and updates it with the new value. In this case, the average computed by T1 will be inconsistent even though both transactions follow two-phase locking, since the new book tuple is not taken into account. The new book tuple inserted into the BOOK relation by T2 is called a phantom tuple. This is because T1 assumes that the relation it has locked includes all information about books published by P001, and this assumption is violated when T2 inserts the new book tuple into the BOOK relation.
Here, the problem is that T1 did not lock what it logically required to lock. Instead of locking only the existing tuples of publisher P001, it also needed to restrict the insertion of new tuples having
P_ID = 'P001'. This can be done by using the index locking technique. In this technique, any transaction that inserts a tuple with P_ID = 'P001' must also insert a data entry pointing to the new tuple into the index, and to do so it must acquire an exclusive lock on the affected index entry. In order to access a relation, one or more indexes are used. In our example, assume that we have an index on the BOOK relation for P_ID. Then, the index entry for P001 will be locked by T1, and T2 is blocked until T1 releases the lock. In this way, the creation of phantoms is prevented, since creating a phantom requires the index to be updated, which in turn requires an exclusive lock on the index entry that T1 already holds.
11. Discuss two techniques for concurrency control in tree-structured indexes like B+-trees.
Ans: Two techniques for concurrency control in tree-structured indexes like B+-trees are as follows:
(1) Lock-coupling (also called crabbing): While traversing the tree, a transaction acquires a lock on a child node and releases the lock on the parent node as soon as it is clear that the parent will not be affected (for example, when the child is not full during an insertion). Thus, at most a parent and a child are locked at a time, instead of the whole search path.
(2) B-link technique: Every node is augmented with a pointer to its right sibling. If a node is split concurrently with a search, the search can follow the sibling pointer to find the required entries, which allows locks to be released even earlier than in lock-coupling.
12. Explain locking granularity, coarse granularity and fine granularity.
Ans: Locking granularity is the size of the data item that a lock protects. It is important for the performance of a database system. Coarse granularity refers to large data item sizes such as an entire relation or a database, whereas fine granularity refers to small data item sizes such as a tuple or an attribute.
13. What is multiple-granularity locking? Under what situations is it used?
Ans: Generally, different transactions have dissimilar requests and require different levels of lock granularity on the basis of the operation being performed. For example, an attempt to modify a particular tuple requires only that tuple to be locked while an attempt to modify or delete multiple tuples requires an entire relation to be locked. Thus, it is desirable that the database system provides a range of locking granules called multiple-granularity locking. The efficiency of the locking mechanism can be improved by considering the lock granularity to be applied.
Multiple-granularity locking permits each transaction to use the levels of locking that are most suitable for its operations. This implies that long transactions use coarse granularity and short transactions use fine granularity. In this way, long transactions save time by using a few locks on large data items, and short transactions do not block a large data item when their requirements can be satisfied by a small data item.
The size of the data items to be locked influences the level of concurrency. While choosing locking granularity, trade-off between concurrency and overheads should be considered. In other words, the choice is between the loss of concurrency due to coarse granularity and increased overheads of maintaining fine granularity. This choice depends upon the type of the applications being executed and how those applications utilize the database. Since the choice depends upon the type of the applications, it is appropriate for the database system to support or use multiple-granularity locking.
As shown in Figure 10.7, the hierarchy of data items of various sizes can be represented in the form of a tree in which small data items are nested within larger data items. The hierarchy in this figure consists of database
DB with three files, namely, F1, F2 and F3. Each file consists of several records as its child nodes. Here, notice the difference between the multiple-granularity tree and tree locking. In the multiple-granularity tree, each non-leaf node represents data associated with its descendents, whereas each node in tree locking is an independent data item.
Figure 10.7 Multiple-granularity Tree
Whenever a transaction acquires a shared-mode or exclusive-mode lock on a node in the multiple-granularity tree, the descendants of that node are implicitly locked in the same mode. Consider the following scenario to clarify this point: suppose Ti acquires an exclusive lock on file
F1 in Figure 10.7. In that case, it implicitly has an exclusive lock on all the records, that is, R11, …, R1n of that file. Observe that, in this way, Ti has to acquire only a single lock on F1 instead of acquiring locks on the individual records of F1. Now, assume that Tj requests a shared lock on R11 of F1. Tj must traverse from the root of the tree to record R11 to determine whether this request can be granted. If any node on that path is locked in an incompatible mode, the lock request for R11 is denied and Tj must wait.
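The path check just described can be sketched as follows, assuming the tree of Figure 10.7 and reducing "incompatible mode" to an exclusive lock held by another transaction; the names are illustrative.

```python
def can_lock_shared(path, held_locks):
    """Walk the root-to-leaf path and grant a shared request only if no
    node on it is held in exclusive ("X") mode by another transaction.
    `held_locks` maps a node to the mode currently held on it."""
    return all(held_locks.get(node) != "X" for node in path)
```

In the scenario above, Ti's exclusive lock on F1 causes Tj's shared request on R11 (path DB → F1 → R11) to be denied, while a record under a different file is unaffected.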
14. What are intention locks? How do they provide a higher degree of concurrency?
Ans: Intention lock is a type of lock mode used in multiple-granularity locking in which a transaction intends to explicitly lock a lower level of the tree. While traversing from the root node to the desired node, all the nodes along the path are locked in intention mode. In simpler terms, no transaction is allowed to acquire a lock on a node before acquiring an intention-mode lock on all its ancestor nodes. For example, to lock a node
R11 in Figure 10.7, a transaction needs to lock all the nodes along the path from root node to node R11 in an intention mode before acquiring lock on node R11.
To provide a higher degree of concurrency, the intention mode is associated with shared mode and exclusive mode. When intention mode is associated with shared mode, it is called intention-shared (IS) mode, and when it is associated with exclusive mode, it is called intention-exclusive (IX) mode. To lock a node in shared mode, all its ancestor nodes must be locked in intention-shared (IS) mode. Similarly, to lock a node in exclusive mode, all its ancestor nodes must be locked in intention-exclusive (IX) mode.
It is clear from the discussion that the lower levels of the tree have to be explicitly locked in the mode requested by the transaction. However, this is undesirable if a transaction needs to modify only a small portion of the tree while reading the rest. Thus, another type of lock mode, called shared and intention-exclusive (SIX) mode, is introduced. The shared and intention-exclusive mode locks the subtree rooted at a node in shared mode, while the parts of that subtree that are to be modified are explicitly locked in exclusive mode. This mode provides a higher degree of concurrency than exclusive mode because it allows other transactions to access the parts of the subtree that are not locked in exclusive mode.
15. What are the locking rules for multiple-granularity locking?
Ans: In multiple-granularity locking, each transaction Ti can acquire a lock on a node
N in any locking mode by following certain rules, which ensure serializability. These rules are given as follows:
(1) Ti must observe the lock compatibility matrix shown in Figure 10.8.
(2) Ti must acquire its first lock on the root of the tree, which may be locked in any mode.
(3) Ti can acquire an S or IS lock on N only if it has successfully acquired either IX or IS lock on the parent of N.
(4) Ti can acquire an X, IX or SIX lock on N only if it has successfully acquired either IX or SIX lock on the parent of N.
(5) Ti can lock a node only if it has not previously unlocked any node (that is, Ti is two-phase).
(6) Ti can unlock N only if it has released all its locks on the lower level of the subtree rooted by N.
Figure 10.8 Compatibility Matrix for Different Modes in Multiple-granularity Locking
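The standard multiple-granularity compatibility matrix can be written out as a small table; the following sketch encodes, for each held mode, the set of modes that may be granted concurrently (the representation is illustrative).

```python
# Compatibility among the five lock modes: each held mode maps to the
# set of modes another transaction may still be granted on the node.
COMPAT = {
    "IS":  {"IS", "IX", "S", "SIX"},
    "IX":  {"IS", "IX"},
    "S":   {"IS", "S"},
    "SIX": {"IS"},
    "X":   set(),          # exclusive is compatible with nothing
}

def compatible(held, requested):
    """Return True if `requested` can be granted while `held` is active."""
    return requested in COMPAT[held]
```

For example, an SIX lock coexists only with IS locks, which is exactly why SIX lets readers of unrelated records proceed while updates are in progress.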
16. What are the two factors that govern the performance of locking? Discuss how blocking of transactions can degrade the performance.
Ans: The two factors that govern the performance of locking are resource contention and data contention.
Resource contention refers to contention over memory space, computing time and other resources. It determines the rate at which a transaction executes between its lock requests. On the other hand, data contention refers to contention over data items. It determines the number of concurrently executing transactions. Assume that concurrency control is turned off; in that case, the transactions suffer only from resource contention. For high loads, the system may thrash, that is, the throughput of the system first increases and then decreases. Initially, the throughput increases since only a few transactions request the resources. Later, with the increase in the number of transactions, the throughput decreases. If the system has enough resources (memory space, computing power, etc.) to make the contention over resources negligible, the transactions suffer only from data contention.
For high loads, the system may thrash due to blocking. Blocking of transactions can degrade performance, as it prevents other transactions from accessing the data items held by the blocked transaction. An instance of blocking is deadlock, in which two or more transactions are in a simultaneous wait state, each waiting for a data item locked by one of the other transactions. Hence, the transactions remain blocked until one of the deadlocked transactions is aborted. In practice, it is observed that fewer than 1% of transactions are involved in a deadlock, and aborts are comparatively rare. Thus, the system thrashes mainly due to blocking.
17. What is timestamp? How does a system generate timestamps?
Ans: Timestamp is a unique identifier assigned to transactions in the order in which they begin. If transaction Ti begins before transaction Tj, Ti is assigned a lower timestamp. A transaction's priority is higher if its timestamp is lower; that is, the older transaction has the higher priority. The timestamp of a transaction Ti is denoted by
TS(Ti). Timestamp can be considered as the starting time of the transaction and it is assigned to each transaction Ti in the order in which the transactions are entered in the database system. Suppose a new transaction Tj enters in the system after Ti. In that case, the timestamp of Ti is less than that of Tj, which can be denoted as TS(Ti)< TS(Tj).
The timestamp of each transaction can be generated in two simple ways:
(1) Using a counter that is incremented each time a new transaction enters the system; the counter value becomes the timestamp of that transaction.
(2) Using the current value of the system clock, ensuring that no two transactions are assigned the same clock value.
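The counter-based scheme can be sketched in a few lines; the names below are illustrative and the counter is assumed to be reset when the system starts.

```python
import itertools

# Each transaction entering the system receives the next value of a
# monotonically increasing counter as its timestamp.
_counter = itertools.count(1)

def new_timestamp():
    return next(_counter)

ts_ti = new_timestamp()  # Ti enters first
ts_tj = new_timestamp()  # Tj enters later, so TS(Ti) < TS(Tj)
```

Because the counter never repeats a value, timestamps are unique, and because it only increases, the older transaction always holds the smaller timestamp.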
18. What is timestamp ordering? What are the values associated with timestamps?
Ans: When two concurrently executing transactions, namely, Ti and Tj are assigned the timestamp values
TS(Ti) and TS(Tj), respectively, such that TS(Ti)< TS(Tj), the system generates a schedule equivalent to a serial schedule in which the older transaction Ti appears before the younger transaction Tj. This is termed as timestamp ordering (TO). Timestamp ordering can be implemented by a number of techniques including basic timestamp ordering, strict timestamp ordering and Thomas' Write Rule.
When the timestamp ordering is enforced, the order in which the data item is accessed by conflicting operations in the schedule does not violate the serializability order. In order to determine this, the following two timestamp values must be associated with each data item Q:
(1) read_TS(Q): The read timestamp of Q; this is the largest timestamp among all the timestamps of transactions that have successfully executed read(Q).
(2) write_TS(Q): The write timestamp of Q; this is the largest timestamp among all the timestamps of transactions that have successfully executed write(Q).
Whenever new
read(Q) or write(Q) operations are executed, read_TS(Q) and write_TS(Q) are updated.
19. Discuss basic timestamp ordering. How is it different from the strict timestamp ordering protocol?
Ans: The basic timestamp ordering ensures that the transaction Ti is executed in the timestamp order whenever Ti requests read or write operation on
Q. This is done by comparing the timestamp of Ti with read_TS(Q) and write_TS(Q). If the timestamp order is violated, the system rolls back the transaction Ti and restarts it with a new timestamp. In this situation, a transaction Tj, which may have used a value of the data item written by Ti, must be rolled back. Similarly, any other transaction Tk, which may have used a value of the data item written by Tj, must also be rolled back, and so on. This situation is known as cascading rollback, which is one of the problems with basic timestamp ordering.
Basic timestamp ordering operates as follows:
(1) Transaction Ti requests a read operation on
Q:
(a) If
TS(Ti)< write_TS(Q), the read operation is rejected and Ti is rolled back. This is because another transaction with higher timestamp (that is, younger transaction) has already written the value of Q, which Ti needs to read. The transaction Ti is too late in performing the required read operation and any other values it has acquired are expected to be inconsistent with the modified value of Q. Thus, it is better to roll back Ti and restart it with a new timestamp.
(b) If
TS(Ti) ≥ write_TS(Q), the read operation is executed, and read_TS(Q) is set to the maximum of read_TS(Q) and TS(Ti). This is because Ti is younger than the transaction that has written the value of Q, which Ti needs to read. If the timestamp of Ti is more than the current value of read timestamp (that is, read_TS(Q)), the timestamp of Ti becomes the new value of read_TS(Q).
(2) Transaction Ti requests a write operation on
Q:
(a) If
TS(Ti)< read_TS(Q), the write operation is rejected and Ti is rolled back. This is because another transaction with higher timestamp (that is, younger transaction) has already read the value of Q, which Ti needs to write. Therefore, writing the value of Q by Ti violates the timestamp ordering. Thus, it is better to roll back Ti and restart it with a new timestamp.
(b) If
TS(Ti)< write_TS(Q), the write operation is rejected and Ti is rolled back. This is because another transaction with higher timestamp (that is, younger transaction) has already written the value of Q, which Ti needs to write. Therefore, the value that Ti is attempting to write becomes obsolete. Thus, it is better to roll back Ti and restart it with a new timestamp.
(c) If TS(Ti) ≥ read_TS(Q) and TS(Ti) ≥ write_TS(Q), the write operation is executed, and the value of TS(Ti) becomes the new value of write_TS(Q). This is because Ti is younger than both the transaction that last wrote the value of Q and the transaction that last read it.
Basic timestamp ordering executes the conflicting operations in timestamp order; hence, it ensures conflict serializability. In addition, it is deadlock-free, since no transaction ever waits.
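The read and write rules above can be condensed into two small functions. This is a sketch under the assumption that each data item is represented by a dict holding its read_TS and write_TS, and that "rollback" stands for rejecting the operation and restarting Ti with a new timestamp.

```python
def read_item(ts, item):
    """Basic TO read rule for a transaction with timestamp `ts`."""
    if ts < item["write_TS"]:      # rule 1(a): a younger txn already wrote Q
        return "rollback"
    item["read_TS"] = max(item["read_TS"], ts)   # rule 1(b)
    return "ok"

def write_item(ts, item):
    """Basic TO write rule for a transaction with timestamp `ts`."""
    if ts < item["read_TS"] or ts < item["write_TS"]:  # rules 2(a), 2(b)
        return "rollback"
    item["write_TS"] = ts          # rule 2(c)
    return "ok"
```

For example, after a transaction with timestamp 5 reads Q, a write by an older transaction with timestamp 3 is rejected, while a write by timestamp 7 succeeds.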
The strict timestamp ordering is a variation of basic timestamp ordering. In addition to the basic timestamp ordering constraints, it follows another constraint when the transaction Ti, requests read or write operations on some data item. This constraint ensures a strict schedule, which guarantees recoverability in addition to serializability. Suppose a transaction Ti requests a read or write operation on
Q and TS(Ti) > write_TS(Q), Ti is delayed until the transaction, say Tj, that wrote the value of Q has committed or aborted. The strict timestamp ordering is implemented by locking the data item Q that has been written by transaction Tj. Furthermore, the strict timestamp ordering ensures freedom from deadlock, since Ti waits for Tj only if TS(Ti) > TS(Tj).
20. Explain Thomas' Write Rule?
Ans: Thomas' Write Rule is the modification to the basic timestamp ordering, in which the rules for write operations are slightly different from those of basic timestamp ordering. The rules for a transaction Ti that request a write operation on data item
Q are as follows:
(1) If
TS(Ti)< read_TS(Q), the write operation is rejected and Ti is rolled back. This is because another transaction with higher timestamp (that is, younger transaction) has already read the value of Q, which Ti needs to write. Therefore, the value of Q that Ti is producing was previously needed, and it had been assumed that the value would never be produced.
(2) If TS(Ti) < write_TS(Q), the write operation can be ignored. This is because another transaction with a higher timestamp (that is, a younger transaction) has already written the value of Q. The value that Ti is attempting to write is therefore obsolete; had it been written, it would have been immediately overwritten, and no transaction would ever have read it. Hence, the write can safely be skipped.
(3) If both the conditions given in points (1) and (2) do not occur, the write operation of Ti is executed and
write_TS(Q) is set to TS(Ti).
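Thomas' Write Rule differs from the basic write rule only in point (2). The following sketch reuses the per-item dict of timestamps assumed in the basic timestamp-ordering example, with "ignore" marking the skipped obsolete write.

```python
def thomas_write(ts, item):
    """Thomas' Write Rule for a transaction with timestamp `ts`."""
    if ts < item["read_TS"]:
        return "rollback"   # rule (1): a younger transaction already read Q
    if ts < item["write_TS"]:
        return "ignore"     # rule (2): the write is obsolete, skip it
    item["write_TS"] = ts   # rule (3): perform the write
    return "ok"
```

Note that the obsolete write neither rolls Ti back nor changes write_TS(Q); execution simply continues, which is what makes schedules such as the one in Figure 10.10 possible.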
Thomas' Write Rule does not enforce conflict serializability; however, the schedules it generates are view serializable. It is possible to generate serializable schedules using Thomas' Write Rule that are not possible under other techniques like two-phase locking, tree-locking, etc. To illustrate this, consider the schedule shown in Figure 10.9. In this figure, the write operation of T2 follows the read operation of T1 and precedes the write operation of T1. Hence, the schedule is not conflict serializable.

Figure 10.9 A Non-conflict Serializable Schedule
Under Thomas' Write Rule, the
write(Q) operation of T1 is ignored. Therefore, the schedule shown in Figure 10.10 is view equivalent to the serial schedule of T1 followed by T2.
Figure 10.10 A Schedule Under Thomas' Write Rule
21. How are optimistic concurrency control techniques different from other concurrency control techniques? Why are they named so? Discuss the typical phases of an optimistic technique.
Ans: All the concurrency control techniques discussed so far, including locking and timestamp ordering, result either in transaction delay or in transaction rollback, and are therefore called pessimistic techniques. These techniques require performing a check before executing any read or write operation. These checks can be expensive and represent an overhead during transaction execution, as they slow down the transactions. In addition, these checks are an unnecessary overhead when the majority of transactions are read-only, because the rate of conflicts among such transactions may be low. Therefore, these transactions can be executed without applying checks, while still maintaining the consistency of the database, by using an alternative technique known as the optimistic (or validation) technique. The optimistic technique is named so because transactions execute optimistically, under the assumption that few conflicts will occur among them.
In optimistic concurrency control techniques, it is assumed that the transactions do not directly update the data items in the database until they finish their execution. Instead, each transaction maintains local copies of the data items it requires and updates them during execution. All the data items in the database are updated at the end of the transaction execution. In this technique, each transaction Ti proceeds through three phases, namely, the read phase, the validation phase, and the write phase, depending on whether it is a read-only or an update transaction. The three phases of an optimistic technique are as follows:
Read phase: Ti reads the values of data items from the database, and these values are stored in temporary local copies of the data items kept in the workspace of Ti. All modifications are performed on these temporary local copies without updating the actual data items of the database.
Validation phase: The system performs a validation test when Ti decides to commit. This test determines whether the modifications made to the temporary local copies can be copied to the database, that is, whether Ti may conflict with any other concurrently executing transaction. If any conflict exists, Ti is rolled back, its workspace is cleared, and Ti is restarted.
Write phase: If the validation succeeds, the modifications made to the temporary local copies are applied to the database. For a read-only transaction, this phase is omitted.
22. How is a transaction validated in an optimistic concurrency control technique?
Ans: Each transaction in an optimistic concurrency control technique goes through three phases, namely, the read, validation and write phases. Accordingly, three different timestamps are associated with each transaction: Start(Ti), Validation(Ti) and Finish(Ti), recording when Ti begins its read phase, enters its validation phase, and completes its write phase, respectively.
The validation test for a transaction Tj requires that, for each transaction Ti with
TS(Ti)<TS(Tj), one of the following validation conditions must hold:
1.
Finish(Ti)<Start(Tj): This condition ensures that the older transaction Ti completes all three of its phases before the younger transaction Tj begins its read phase. Thus, this condition maintains serializability order.
2.
Finish(Ti)<Validation(Tj): This condition ensures that Ti completes its write phase before Tj starts its validation phase, and it additionally requires that the write_set of Ti does not overlap with the read_set of Tj. Because the older transaction Ti completes its write phase before the younger transaction Tj starts its validation (and hence its write) phase, the writes of Ti cannot overwrite the writes of Tj. Hence, serializability order is maintained.
3.
Validation(Ti)<Validation(Tj): This condition ensures that Ti completes its read phase before Tj completes its read phase. More precisely, it requires that the write_set of Ti intersects neither the read_set nor the write_set of Tj. Since Ti completes its read phase before Tj completes its own, and their sets do not conflict, Ti cannot affect the read or write phase of Tj, which again maintains serializability.
To validate Tj, the first condition is checked for each committed transaction Ti with TS(Ti)<TS(Tj). If the first condition does not hold, the second condition is checked, and if the second condition is also false, the third is checked. If any one of these conditions holds, there is no conflict and Tj is validated successfully. If none of them holds, there is a conflict, the validation of Tj fails, and Tj is rolled back and restarted.
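The validation test just described can be sketched in Python as follows. Transactions are represented as plain dictionaries holding the phase timestamps and read/write sets; this representation and the function name are assumptions made for the illustration.

```python
# A sketch of the three-condition validation test for optimistic
# concurrency control.
def validate(tj, committed):
    """Return True if Tj passes validation against every committed Ti
    with TS(Ti) < TS(Tj)."""
    for ti in committed:
        if ti["ts"] >= tj["ts"]:
            continue  # only older committed transactions are checked
        # Condition 1: Ti finished all three phases before Tj started.
        if ti["finish"] < tj["start"]:
            continue
        # Condition 2: Ti finished its write phase before Tj's validation
        # phase, and write_set(Ti) does not overlap read_set(Tj).
        if ti["finish"] < tj["validation"] and not (ti["write_set"] & tj["read_set"]):
            continue
        # Condition 3: Ti finished its read phase before Tj did, and
        # write_set(Ti) intersects neither read_set(Tj) nor write_set(Tj).
        if (ti["validation"] < tj["validation"]
                and not (ti["write_set"] & tj["read_set"])
                and not (ti["write_set"] & tj["write_set"])):
            continue
        return False  # no condition holds: conflict, Tj must be restarted
    return True
```

If validate returns False, the system rolls Tj back, clears its workspace and restarts it, exactly as described above.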
23. Discuss the multiversion technique, for what purpose is it used? Discuss multiversion timestamp ordering and multiversion two-phase locking. Compare multiversion two-phase locking with basic two-phase locking.
Ans: The multiversion technique is a concurrency control technique that keeps the old version (or value) of each data item in the system when the data item is updated; thus, several versions of a data item are maintained. When a transaction issues a write operation on a data item Q, a new version of Q is created and the old version of Q is retained. When a transaction issues a read operation on Q, an appropriate version of Q is chosen by the concurrency control technique in such a way that the serializability of the currently executing transactions is maintained.
Multiversion timestamp ordering
For each version of a data item, say Qi, the system maintains the value of the version and associates the following two timestamps with it:
read_TS(Qi): The read timestamp of Qi; this is the largest timestamp among all the timestamps of transactions that have successfully read Qi.
write_TS(Qi): The write timestamp of Qi; this is the timestamp of the transaction that has written the value of Qi.
Consider a data item Q whose most recent version is Qi, that is, write_TS(Qi) is the largest among all the versions. Further, assume a transaction Ti with write_TS(Qi)<=TS(Ti). When Ti issues a read(Q) request, the system returns the value of Qi and updates read_TS(Qi) to TS(Ti) if read_TS(Qi)<TS(Ti). On the other hand, when Ti issues a write(Q) request, one of the following situations occurs:
read_TS(Qi)>TS(Ti), which means some younger transaction has already read the value of Qi. In this situation, Ti is rolled back.
TS(Ti)=write_TS(Qi). In this situation, the contents of Qi are overwritten.
Otherwise, a new version, Qj, of Q is created with read_TS(Qj)=write_TS(Qj)=TS(Ti).
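The read and write rules above can be sketched in Python as follows. Versions are kept as small dictionaries; the class name MVTO and its structure are assumptions made for the example.

```python
# A sketch of multiversion timestamp ordering for a single data item Q.
class MVTO:
    def __init__(self, initial):
        # version 0 is treated as written at timestamp 0
        self.versions = [{"value": initial, "read_ts": 0, "write_ts": 0}]

    def _version_for(self, ts):
        # the version Qi with the largest write_TS(Qi) <= TS(Ti)
        return max((v for v in self.versions if v["write_ts"] <= ts),
                   key=lambda v: v["write_ts"])

    def read(self, ts):
        v = self._version_for(ts)
        v["read_ts"] = max(v["read_ts"], ts)  # reads are never blocked
        return v["value"]

    def write(self, ts, value):
        v = self._version_for(ts)
        if v["read_ts"] > ts:
            return "rollback"       # a younger transaction already read Qi
        if v["write_ts"] == ts:
            v["value"] = value      # the transaction overwrites its own version
            return "overwritten"
        # otherwise create a new version Qj with read_TS = write_TS = TS(Ti)
        self.versions.append({"value": value, "read_ts": ts, "write_ts": ts})
        return "new version"
```

Note that read never returns a rollback decision, which reflects the advantage discussed next: read requests are never blocked.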
The main advantage of the multiversion timestamp ordering technique is that read requests by transactions are never blocked. Hence, it is important for typical database systems in which read requests are more frequent than write requests. On the other hand, there are two main disadvantages of this technique, which are given here:
Every read(Q) request requires that read_TS(Qi) be updated. This results in accessing the disk twice, that is, once for the data item and once for updating read_TS(Qi).
Conflicts between transactions are resolved through rollbacks rather than through waits, which can be expensive.
Multiversion two-phase locking
In addition to the read and write lock modes, multiversion two-phase locking provides another lock mode, called certify. To determine whether these lock modes are compatible with each other, consider Figure 10.11.

Figure 10.11 Compatibility Matrix for Multiversion Two-phase Locking
The term “YES” indicates that if a transaction Ti holds the lock on a data item Q in the mode specified in the column header, and another transaction Tj requests a lock on Q in the mode specified in the row header, then the lock can be granted, because the requested mode is compatible with the mode of the lock held. On the other hand, the term “NO” indicates that the requested mode is not compatible with the mode of the lock held, so the transaction that has requested the lock must wait until the lock is released.
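Since Figure 10.11 is not reproduced here, the compatibility matrix below is a sketch of the standard multiversion two-phase locking matrix: read is compatible with read and write, write is compatible only with read, and certify is compatible with nothing. The dictionary layout and function name are assumptions made for the example.

```python
# A sketch of the lock-compatibility check for multiversion 2PL.
COMPATIBLE = {
    # held mode -> set of requested modes that can be granted
    "read":    {"read", "write"},
    "write":   {"read"},
    "certify": set(),
}

def can_grant(held_mode, requested_mode):
    """Return True if a lock in requested_mode can be granted while
    another transaction holds the item in held_mode."""
    return requested_mode in COMPATIBLE[held_mode]
```

For example, a read lock can be granted while a write lock is held (readers see the certified version), but nothing can be granted while a certify lock is held.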
Unlike the basic locking technique, multiversion two-phase locking allows other transactions to read a data item while a transaction still holds an exclusive lock on it. This is done by maintaining two versions of each data item. One version, known as the certified version, has been written by committed transactions; the second, known as the uncertified version, is created when an active transaction acquires an exclusive lock on the data item. Suppose a transaction Tj requests a shared lock on a data item Q on which another transaction Ti holds an exclusive lock. In this situation, Tj is allowed to read the certified version of Q while Ti writes the value of the uncertified version of Q. However, once Ti is ready to commit, it must acquire a certify lock on Q. Since the certify lock is not compatible with other locks (see Figure 10.11), Ti must delay its commit until no transaction is accessing the certified version of the data item, in order to obtain the certify lock. Once Ti acquires the certify lock on Q, the uncertified version becomes the certified version of Q and the old certified version is deleted.
The multiversion two-phase locking technique has an advantage over basic two-phase locking in that many read operations can execute concurrently with a single write operation on a data item, which is not possible under basic two-phase locking. The technique, however, has the overhead that a transaction may have to delay its commit until it has acquired certify locks on all the data items it has updated. It avoids cascading rollbacks because transactions read the certified version instead of the uncertified version. However, deadlock may occur if a read lock is allowed to upgrade to a write lock, and such deadlocks must be handled using some deadlock handling technique.
24. Explain how to prevent the deadlock.
Ans: Deadlock prevention ensures that a deadlock never happens. Generally, this technique is used when the chances of a deadlock occurring are high. The approaches to prevent deadlocks are as follows:
Each transaction locks all the data items it needs before it begins execution.
A linear ordering is imposed on all the data items, and every transaction is required to acquire its locks only in that order.
Timestamps are assigned to transactions, and lock conflicts are resolved on the basis of these timestamps, as in the wait-die and wound-wait techniques.
The first two approaches can be easily implemented if all the data items to be accessed by a transaction are known at the beginning. However, this is not a practical assumption, because it is often difficult to predict what data items a transaction needs before it begins execution. In addition, both approaches limit concurrency, since a transaction locks many data items that may remain unused for a long duration. Thus, the third approach is mainly used for deadlock prevention.
25. Discuss “wait-die” and “wound-wait” techniques of deadlock prevention.
Ans: Both wait-die and wound-wait are deadlock prevention techniques based on timestamps. To understand them, consider a situation in which a data item Q is locked by a transaction Ti and another transaction Tj issues an incompatible lock request on Q. In this situation, the two techniques behave as follows:
Wait-die: If Tj is older than Ti (that is, TS(Tj)<TS(Ti)), Tj is allowed to wait; otherwise, Tj “dies”, that is, it is rolled back and later restarted with its original timestamp.
Wound-wait: If Tj is older than Ti, it “wounds” Ti, that is, Ti is rolled back and later restarted with its original timestamp; otherwise, Tj is allowed to wait.
Both the wait-die and wound-wait techniques have some similarities, which are as follows:
In both techniques, the older transaction is never rolled back in favour of a younger one.
In both techniques, a rolled-back transaction is restarted with its original timestamp, so it eventually becomes the oldest transaction; hence, starvation is avoided.
In both techniques, transactions that are not actually involved in a deadlock may be rolled back unnecessarily.
In spite of these similarities, there are certain important differences between the wait-die and wound-wait techniques which are given in Table 10.1.
| Wait-die Technique | Wound-wait Technique |
|---|---|
| • In this technique, the waiting that may be required by a transaction with higher priority (that is, an older transaction) can be significantly longer. | • In this technique, an older transaction has a greater probability of acquiring a lock on a data item. It never waits for a younger transaction to release the lock on a data item. |
| • In this technique, a transaction may be rolled back many times before acquiring the lock on the requested data item. For example, when a younger transaction Ti requests a data item Q held by an older transaction Tj, it is rolled back. When it is restarted, it again requests a lock on Q. Now, if Q is still locked by Tj, Ti will be rolled back again. In this way, Ti may be rolled back several times before it acquires the lock on Q. | • Rollbacks in the wound-wait technique are fewer as compared to the wait-die technique. For example, when an older transaction Ti requests a data item Q held by a younger transaction Tj, then Tj is rolled back. When it is restarted, it requests Q, which is now locked by Ti. In this situation, Tj waits. |
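The two decision rules can be sketched in Python as follows, for the situation where a holder transaction has locked Q and a requester issues an incompatible request. Smaller timestamps denote older transactions; the function names are assumptions made for the example.

```python
# A sketch of the wait-die and wound-wait decisions.
def wait_die(ts_holder, ts_requester):
    # An older requester waits; a younger requester "dies", i.e. it is
    # rolled back and later restarted with its original timestamp.
    return "wait" if ts_requester < ts_holder else "die"

def wound_wait(ts_holder, ts_requester):
    # An older requester "wounds" (rolls back) the younger holder;
    # a younger requester waits.
    return "wound holder" if ts_requester < ts_holder else "wait"
```

Note the symmetry: in both schemes the older transaction is favoured, but the schemes differ in whether the younger transaction waits or is rolled back.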
26. How is deadlock detected by wait-for graph? What are the measures for recovery from deadlock, explain.
Ans: To detect a deadlock, the system maintains a wait-for graph, which consists of nodes and directed arcs. The nodes of this graph represent the currently executing transactions, and there is a directed arc from one node to another if the first transaction is waiting for the second to release a lock. If there exists a cycle in the wait-for graph, it indicates a deadlock in the system. To understand the creation of a wait-for graph, consider four transactions T1, T2, T3 and T4, whose partial schedule is given in Figure 10.12.
In this schedule, the transaction T1 waits for the transaction T2 to release the lock on the data item Q2. Similarly, T2 waits for T4, and T3 waits for T2. For this situation, the wait-for graph is shown in Figure 10.13.
Observe that there is no cycle in this graph; thus, there is no deadlock. Now assume that the transaction T4 requests a lock on the data item Q4 held by T3; a directed arc is then added in the wait-for graph from T4 to T3 (see Figure 10.14). This directed arc creates a cycle in the wait-for graph; thus, a deadlock occurs in the system, in which the transactions T2, T3 and T4 are involved.
Figure 10.12 A Partial Schedule for T1, T2, T3 and T4

Figure 10.13 Wait-for Graph

Figure 10.14 Wait-for Graph with Cycle
The wait-for graph is periodically examined by the system to check for the existence of a deadlock. Once a deadlock is detected, the system must recover from it. The simplest solution is to abort some of the transactions involved in the deadlock in order to allow the other transactions to proceed. The transactions chosen to be aborted are called victim transactions. Some criteria should be followed in choosing a victim transaction, such as the transaction holding the fewest locks or the transaction that has done the minimum amount of work.
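Cycle detection in the wait-for graph can be sketched with a depth-first search, as follows. The graph maps each transaction to the set of transactions it is waiting for; the representation and function name are assumptions made for the example.

```python
# A sketch of deadlock detection on a wait-for graph via depth-first search.
def has_deadlock(wait_for):
    """Return True if the wait-for graph contains a cycle."""
    visited, on_stack = set(), set()

    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for nxt in wait_for.get(node, ()):
            if nxt in on_stack:
                return True              # back edge: a cycle exists
            if nxt not in visited and dfs(nxt):
                return True
        on_stack.discard(node)
        return False

    return any(dfs(t) for t in wait_for if t not in visited)
```

With the arcs T1→T2, T2→T4 and T3→T2 of Figure 10.13 there is no cycle; adding the arc T4→T3 of Figure 10.14 creates one, and the function reports a deadlock.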
27. Draw a wait-for graph for the following requests and find whether the transactions are deadlocked or not.

Ans: In the given schedule, transactions T1 and T3 are waiting for the transaction T2 to release the locks on the data items B and C, respectively. The wait-for graph corresponding to the given schedule is shown below:
Since there is no cycle in the wait-for graph, the transactions are not deadlocked.