GATE 2018 NOTES .. : Transaction Mnagement System Questions

1. Define transaction. What are the properties of a transaction? Explain with the help of an example.

Ans: A collection of operations that form a single logical unit of work is called a transaction. The operations that make up a transaction typically consist of requests to access existing data, modify existing data, add new data or any combination of these requests. The statements of a transaction are enclosed within the begin transaction and end transaction statements.

To ensure the integrity of the data, the database system must maintain some desirable properties of the transaction. These properties are known as ACID properties, the acronym derived from the first letter of the terms atomicity, consistency, isolation and durability.

Atomicity implies that either all of the operations that make up a transaction should execute or none of them should occur.

Consistency implies that if all the operations of a transaction are executed completely, the database is transformed from one consistent state to another.

Isolation implies that each transaction appears to run in isolation with other concurrently running transactions.

Durability (also known as permanence) implies that once a transaction is completed successfully, the changes made by the transaction persist in the database, even if the system fails.

For example, consider a transaction T₁ that transfers $100 from account A to account B. Let the initial values of account A be $2000 and account B be $1500. The sum of the values of account A and B is $3500 before the execution of transaction T₁. Since T₁ is a fund-transferring transaction, the sum of the values of account A and B should be $3500 even after its execution. The transaction T₁ can be written as follows:

T¹:

         read(A);
         A:=A-100;

         write(A);
         read(B);
         B:= B+100;
         write(B);

If transaction T₁ is executed, either $100 should be transferred from account A to B or neither of the accounts should be affected. If T₁ fails after debiting $100 from account A, but before crediting $100 to account B, the effects of this failed transaction on account A must be undone. This is the atomicity property of transaction T₁. The execution of transaction T₁ should also preserve the consistency of the database, that is, the sum of the values of account A and B should be $3500 even after the execution of T₁.

Now, let us consider the isolation property of T₁. Suppose that during the execution of transaction T₁ when $100 is debited from account A and not yet credited to account B, another concurrently running transaction, say T₂, reads the values of account A and B. Since T₁ has not yet completed, T₂ will read inconsistent values. The isolation property ensures that the effects of the transaction T₁ are not visible to other transaction T₂ until T₁ is completed. If the transaction T₁ is successfully completed and the user who initiated the transaction T₁ has been notified about successful transfer of funds, then the data regarding the transfer of funds should not be lost even if the system fails. This is the durability property of T₁. The durability can be guaranteed by ensuring that either

all the changes made by the transaction are written to the disk before the transaction is completed or

the information about all the changes made by the transaction is written to the disk and is sufficient to enable the database system to reconstruct the changes when the database system is restarted after the failure.

2. Explain state transition diagram. Explain, when a transaction is said to be failed.

Ans: State transition diagram is a diagram that describes how a transaction passes through various states during its execution. Whenever a transaction is submitted to a DBMS for execution, either it executes successfully or fails due to some reasons. During its execution, a transaction passes through various states that are active, partially committed, committed, failed and aborted as shown in Figure 9.1.

Figure 9.1 State Transition Diagram Showing Various States of a Transaction

A transaction enters into the active state with its commencement. At this point, the system marks BEGIN_TRANSACTION operation to specify the beginning of the transaction execution. During its execution, the transaction stays in the active state and executes several READ and WRITE operations on the database. The READ operation transfers a data item from the database to a local buffer of the transaction that has executed the read operation. The WRITE operation transfers the data item from the local buffer of the transaction back to the database.

Once the transaction executes its final operation, the system marks END_TRANSACTION operation to specify the end of the transaction execution. At this point, the transaction enters into the partially committed state. The actual output at this point may still be residing in the main memory and, thus, any kind of hardware failure might prevent its successful completion. In such a case, the transaction may have to be aborted.

Before actually updating the database on the disk, the system first writes the details of updates performed by the transaction in the log file. The log file is then written to the disk so that, even in case of failure, the system can re-construct the updates performed by the transaction when the system restarts after the failure. When this information is successfully written out in the log file, the system marks COMMIT_TRANSACTION operation to indicate the successful end of the transaction. Now, the transaction is said to be committed and all its changes must be reflected permanently in the database.

If the transaction is aborted during its active state or the system fails to write the changes in the log file, the transaction enters the failed state. The failed transaction must be rolled back to undo its effects on the database to maintain the consistency of the database. When the transaction leaves the system, it enters into the terminated state. At this point, the transaction information maintained in the log file during its execution is removed.

3. Why is it desirable to have concurrent execution of multiple transactions? What are the various anomalies that can occur due to undesirable interleaving of transactions? Explain by giving suitable examples.

Ans: The database system allows concurrent execution of transactions due to two reasons as given below:

To increase system throughput: A transaction performing read or write operation using I/O devices may not be using the CPU at a particular point of time. Thus, while one transaction is performing I/O operations, the CPU can process another transaction. This is possible because CPU and I/O system in the computer system are capable of operating in parallel. This overlapping of I/O and CPU activities reduces the amount of time for which the disks and processors are idle and, thus, increases the throughput of the system.

To reduce average response time: An interleaved execution of a short and a long transaction allows short transaction to be completed quickly. If a short transaction is executed after the execution of a long transaction in a serial order, the short transaction may have to wait for a long time, leading to unpredictable delays in executing a transaction. On the other hand, if these transactions are executed concurrently, it reduces the unpredictable delays, and thereby reduces the average response time.

If the transactions are interleaved in an undesirable manner, it may lead to several anomalies such as lost update, dirty read and unrepeatable read. To understand these anomalies, consider two transactions T₁ and T₂, where T₁ transfers $100 from account A to account B, and T₂ adds two percent interest to account A. Suppose that the initial values of account A and B are $2000 and $1500, respectively. Then, after serial execution of transactions T₁ and T₂ the value of account A should be $1938 and that of account B should be $1600.

Suppose that the operations of T and T₂ are interleaved in such a way that T₂ reads the value of account A before T₁ updates its value in the database. Now, when T₂ updates the value of account A in the database, the value of account A updated by the transaction T₁ is overwritten and hence is lost. This is known as lost update problem. The value of account A at the end of both the transactions is $2040 instead of $1938, which leads to data inconsistency. The interleaved execution of transactions T₁ and T₂ that leads to lost update problem is shown in Figure 9.2(a).

The second problem occurs when a transaction fails after updating a data item, and before this data item is changed back to its original value, another transaction reads this updated value. For example, assume that T_j fails after debiting $100 from account A, but before crediting this amount to account B. This will leave the database in an inconsistent state. The value of account A is now $1900, which must be changed back to the original one, that is, $2000. However, before the transaction T_j is rolled back, let another transaction T₂ reads the incorrect value of account A. This incorrect value of account A that is read by transaction T₂ is called dirty data, and the problem is called dirty read problem. The interleaved execution of transactions T_j and T₂ that leads to dirty read problem is shown in Figure 9.2(b).

The third problem occurs when a transaction tries to read the value of data item twice, and another transaction updates the same data item in between the two read operations of the first transaction. As a result, the first transaction reads varied values of same data item during its execution. This is known as unrepeatable read. For example, consider a transaction T₃ that reads the value of account A. At this point, let another transaction T₄ updates the value of account A. Now if T₃ again tries to read the value of account A, it will get a different value. As a result, the transaction T₃ receives different values for two reads of account A. The interleaved schedule of transactions T₃ and T₄ that leads to a problem of unrepeatable read is shown in Figure 9.2(c).

Figure 9.2 Anomalies Due to Interleaved Execution

4. Define the following terms in context of a transaction:

(a) Schedule

Ans: A list of operations (such as reading, writing, aborting or committing) from a set of transactions is known as a schedule (or history). A schedule should comprise all the instructions of the participating transactions and also preserve the order in which the instructions appear in each individual transaction.

(b) Non-serial schedule

Ans: In a multi-user database system, several transactions are executed concurrently for efficient use of system resources. When two transactions are executed concurrently, the operating system may execute one transaction for some time, then perform a context switch and execute the second transaction for some time, and then switch back to the first transaction, and so on. Thus, when multiple transactions are executed concurrently, the CPU time is shared among all the transactions. The schedule resulted from this type of interleaving of operations from various transactions is known as non-serial schedule.

(c) Cascading rollback

Ans: Cascading rollback is a situation in which failure of single transaction leads to a series of transaction rollbacks.

(d) Blind writes

Ans: Blind writes are those write operations which are performed without performing the read operation.

(e) Topological sorting

Ans: The process of ordering the nodes of an acyclic preceding graph is known as topological sorting. The topological ordering produces equivalent serial schedule.

(f) Observable external writes

Ans: There are some situations when a transaction needs to perform write operations on a terminal or a printer. These writes are known as observable external writes. Since the content once written on the printer or the terminal cannot be erased, the database system allows such writes to take place only after the transaction has entered the committed state.

(g) Compensating transaction

Ans : Compensating transaction is a transaction which reverses the effect of a committed transaction.

5. Define serial schedule. Why is it always considered to be correct?

Ans: Serial schedule is a schedule consisting of a sequence of instructions from various transactions, where the operations of one single transaction appear together in that schedule. Each transaction in a serial schedule is executed independently without any interference from the operations of other transactions. As long as every transaction is executed from beginning to end without any interference from other transactions, it gives a correct end result on the database. Therefore, every serial schedule is considered to be correct. Hence, for a set of n transactions, n! different valid serial schedules are possible. The serial schedule of transactions T₁ and T₅ in the order T₅ followed by T₁ is shown in Figure 9.3.

Figure 9.3 Serial Schedule in the Order T₅ Followed by T₁

6. Explain serializable schedule by giving an example.

Ans: The consistency of the database under concurrent execution can be ensured by interleaving the operations of transactions in such a way that the final output is the same as that of some serial schedule of those transactions. Such a schedule is referred to as serializable schedule. Thus, a schedule S of n transactions T₁, T₂, T₃, …, T_n is serializable if it is equivalent to some serial schedule of the same n transactions.

Figure 9.4 shows a non-serial schedule of transactions T₁ and T₅, which is equivalent to serial schedule shown in Figure 9.3. After the execution of this schedule, the final values of accounts A, B and C are $1900, $1550 and $550. Thus, the sum A + B + C is preserved and, hence, it is a serializable schedule.

Figure 9.4 Serializable Schedule

7. What are result equivalent schedules? Explain with example.

Ans: Two different schedules may produce the same final state of the database. Such schedules are known as result equivalent, since they have the same effect on the database. However, in some cases, two different schedules may accidentally produce the same final state of database. For example, consider two schedules S_i and S. that are updating the data item Q, as shown in Figure 9.5. Suppose that the value of data item Q is $100 then the final state of database produced by schedules S_i and S_j is the same, that is, they produce the same value of Q ($200).

Figure 9.5 Result Equivalent Schedules

8. Discuss the two different forms of schedule equivalence.

Ans: Two different forms of schedule equivalence are conflict equivalence and view equivalence that lead to the notions of conflict serializability and view serializability.

Conflict equivalence and conflict serializability: Two operations in a schedule are said to conflict if they belong to different transactions, access the same data item, and at least one of them is the write operation. On the other hand, two operations belonging to different transactions in a schedule do not conflict if both of them are read operations or both of them are accessing different data items. To understand the concept of conflicting operations in a schedule, consider two transactions T₆ and T₇ that are updating the data items Q and R in the database. A non-serial schedule of these transactions is shown in Figure 9.6.

Figure 9.6 Non-serial Schedule Showing Conflicting Operations

In Figure 9.6, the write(Q) operation of T₆ conflicts with the read(Q) operation of T₇ because both the operations are accessing the same data item Q, and one of these operation is the write operation. On the other hand, the write(Q) operation of T₇ is not conflicting with the read(R) operation of T₆, because both are accessing different data items Q and R.

If a schedule S can be transformed into a schedule S‘ by performing a series of swaps of non-conflicting operations, then S and S’ are conflict equivalents. Note that while swapping the order of execution of two conflicting operations cannot be changed because if they are applied in different order, they can have different effect on the database or on the other transactions in the schedule. Thus, two schedules are said to be conflict equivalent if the order of any two conflicting operations is same in both the schedules. A schedule, which is conflict equivalent to some serial schedule, is known as conflict serializable.

View equivalence and view serializability: Two schedules S and S' are said to be view equivalent if the schedules satisfy these conditions.

The same set of transactions participates in S and S‘, and S and S’ include the same set of operations of those transactions.

If the transaction T_i reads the initial value of a data item say Q in schedule S, then T_i must read the initial value of same data item Q in schedule S' also.

For each data item Q, if T_i executes read(Q) operation after the write(Q) operation of transaction T_j in schedule S, then T_i must execute the read(Q) operation after the write(Q) operation of T_j in schedule S' also.

If the transaction T_i performs the final write operation on any data item say Q in schedule S, then it must perform final write operation on the same data item Q in schedule S' also.

A schedule S is said to be view serializable if it is view equivalent to some serial schedule.

9. What is a precedence graph? How can it be used to test the conflict serializability of a schedule?

Ans: A precedence graph is a directed graph G=(N,E) where N={T₁, T₂, …, T_n} is a set of nodes and E = {e₁, e₂, …, e_n} is a set of directed edges. For each transaction T_i in a schedule, there exists a node in the graph. An edge e_i in the graph is shown as (T_i → T_j), where T_i is the starting node and T_j is the ending node of the edge e_i. The edge e_i is created if one of the operations in T_i appears in the schedule before some conflicting operation in T_j.

The algorithm to test the conflict serializability of a schedule S is given as follows:

1. For each transaction T_i participating in the schedule S, create a node labelled T_i.

2. If T_j executes a read operation after T_i executes a write operation on any data item say Q, create an edge (T_i → T_j) in the precedence graph.

3. If T_j executes a write operation after T_i executes a read operation on any data item say Q, create an edge (T_i → T_j) in the precedence graph.

4. If T executes a write operation after T>_j executes a write operation on any data item say Q, create an edge (T_i → T_j) in the precedence graph.

5. If the resultant precedence graph is acyclic, that is, it does not contain any cycle, then the schedule S is considered to be conflict serializable. If a precedence graph contains a cycle, the schedule S is not conflict serializable.

An edge from T_i to T_j in the precedence graph implies that the transaction T_i must appear before T_j in any serial schedule S‘ that is equivalent to S. Thus, if there are no cycles in the precedence graph, a serial schedule S’ equivalent to S can be created by ordering the transactions in such a way that whenever an edge (T_i → T_j) exists in the precedence graph, T must appear before T. in the equivalent serial schedule S'.

For example, consider the schedule shown in Figure 9.5. The precedence graph of this schedule is shown in Figure 9.7(a). It contains two nodes, one for each transaction—T₆ and T₇. An edge from T₆ to T₇ is created, which is labelled with the data items Q and R that led to the creation of this edge. Since there are no cycles in the precedence graph, this schedule is a conflict serializable schedule. The serial schedule equivalent to Schedule 5 is shown in Figure 9.6(b).

Figure 9.7 Testing Conflict Serializability

10. What are recoverable and cascadeless schedules? Explain.

Ans: If, in a schedule S, a transaction T_j reads a data item written by T_i, then the commit operation of T_i should appear before the commit operation of T_j. Such a schedule is known as recoverable schedule.

Sometimes, even if a schedule is recoverable, several transactions may have to be rolled back in order to recover completely from the failure of a transaction T_i It happens if transactions have read the value of a data item written by T_i. For example, consider the schedule shown in Figure 9.8. In this schedule, the transaction T₈ reads the value of data item Q written by transaction T₉, and transaction T₁₀ reads the value of data item Q written by transaction T₈.

Now suppose that the transaction T₈ fails due to any reason, it has to be rolled back. Since transaction T₉ has read the data written by T₈, it also needs to be rolled back. The transaction T₁₀ also needs to be rolled back because it has in turn read the data written by T₉. Such a situation, in which failure of single transaction leads to a series of transaction rollbacks, is called cascading rollback.

Cascading rollback requires undoing of significant amount of work and, thus, is undesirable. Therefore, schedules should be restricted to avoid cascading rollbacks. The schedules that avoid cascading rollbacks are known as cascadeless schedule. In cascadeless schedules, transactions T_i and T_j are interleaved in such a way that T_j reads a data item previously written by T_i however, commit operation of T_i appears before the read operation of T_j.

Figure 9.8 Schedule Resulting Cascading Rollback

11. Discuss the four levels of isolation in SQL. What are the possible violations for these levels?

Ans: SQL:92 supports four levels of transaction isolation. The isolation levels reduce the isolation between concurrently executing transactions. The level providing the greatest isolation from other transactions is SERIALIZABLE. It is the default level of isolation. At this level, a transaction is fully isolated from the changes made by other transactions. The other isolation levels are READ UNCOMMITTED, READ COMMITTED and REPEATABLE READ. READ UNCOMMITTED is the lowest level of isolation. At this level, a transaction can read subsequent changes made by other transactions, either committed or uncommitted. With read uncommitted isolation level, one or more of the three problems, namely, dirty read, unrepeatable read and phantom read may occur.

In READ COMMITTED isolation level, dirty reads are not possible since a transaction is not allowed to read the updates made by uncommitted transactions. However, unrepeatable reads and phantoms are possible. Phantom read problem occurs when a transaction T reads a set of rows from a table based on some condition specified in WHERE clause, meanwhile another transaction T inserts a new row into that table that also satisfies the condition and commits. Now, if T is again executed, then it will see a tuple (or phantom tuple) that previously did not exist.

In REPEATABLE READ isolation level, dirty reads and unrepeatable reads are not possible but phantoms are possible. In SERIALIZABLE isolation level, dirty reads, unrepeatable reads, and phantoms are not possible. The possible violations for different transaction isolation levels are summarized in Table 9.1.

Table 9.1 Possible violations for different isolation levels

12. Discuss the commit and rollback statements in SQL.

Ans: The COMMIT statement terminates the current transaction and makes all the changes made by the transaction permanent in the database. That is, it commits the changes to the database. The COMMIT statement has the following general format:

COMMIT [WORK]

where WORK is an optional keyword that does not change the semantics of COMMIT.

The ROLLBACK statement, on the other hand, terminates the current transaction and undoes all the changes made by the transaction. That is, it rolls back the changes to the database. The ROLLBACK statement has the following general format:

ROLLBACK [WORK]

In this case also, WORK is an optional keyword that does not change the semantics of ROLLBACK.

13. Consider two transactions T₁ and T₂ given below:

Give a serial schedule for these transactions and an equivalent non-serial schedule for these transactions.

Ans: Serial schedule

Non-serial schedule

14. Consider the following precedence graph. Is the corresponding schedule conflict serializable? Give reasons also.

Ans: The corresponding schedule will be conflict serializable as the precedence graph is acyclic.

15. Consider three transactions T₁, T₂ and T₃ and two schedules S₁ and S₂ given below. Draw the precedence graph for schedules S₁ and S₂ and test whether they are conflict serializable or not.

T₁: r₁ (P); r₁ (R); w₁ (P);
T₂: r₂ (R); r₂(Q); w₂ (R); w₂ (Q);
T₃: r₃ (P); r₃ (Q); w₃ (Q);
S₁: r₁ (P); r₂ (R); r₁ (R); r₃ (P); r₃ (Q); w₁ (P); w₃ (Q); r₂ (Q); w₂ (R);
w₂ (Q);
S₂: r₁ (P); r₂ (R); r₃ (P); r₁ (R); r₂ (Q); r₃ (Q); w₁ (P); w₂ (R); w₃ (Q);
w₂(Q);

Ans: Schedule S₁

The precedence graph for this schdeule is shown below:

This schedule is conflict serializable since the precedence graph is acyclic. Schedule S₂

This schedule is not conflict serializable since the precedence graph is cyclic.

16. Consider schedule S₃ given below. Determine whether it is cascadeless, recoverable or non-recoverable.

S₃: r₁(P); r₂(R); r₁(R); r₃(P); r₃(Q); w₁(P); w₃(Q); r₂(Q); w₃(R); w₂(Q); c₁ c₂; c₃

Ans:

Here, transaction T₂ reads the data item Q written by T₃. If transaction T₃ fails after T₂ commits, its effects need to be undone. Since T₂ has read the values of Q written by transaction T₃, it also has to be rolled back. However, T₂ has already been committed, thus cannot be rolled back. Therefore, this schedule is a non-recoverable schedule.

17. Give an example of SQL transaction.

Ans: An example of an SQL transaction consisting of multiple SQL statements is given in Figure 9.9. In this transaction, two SQL statements are given that update the BOOK and PUBLISHER relations of Online Book database. The first statement updates the price of all the books published by publisher P001. The next statement updates the email-id of the publisher with P_ID="P002". If an error occurs on any SQL statement, the entire transaction is rolled back. That is, any updated value would be restored to its original value.

Figure 9.9 An Example of SQL Transaction

GATE 2018 NOTES ..

Wednesday, 29 March 2017

Transaction Mnagement System Questions

No comments:

Post a Comment

GATE 2018 NOTES

Amazing idea

Report Abuse