Distributed Concurrency Control and Two-Phase Commit Protocol
Concurrency control becomes especially important in the distributed database environment because multisite, multiple-process operations are more likely to create data inconsistencies and deadlocked transactions than single-site systems are. For example, the TP component of a DDBMS must ensure that all parts of the transaction are completed at all sites before a final COMMIT is issued to record the transaction.
Suppose that each local DP committed its own transaction operations independently, but one of the DPs could not commit the transaction’s results. Such a scenario would yield the problems illustrated in the following figure.
The transaction(s) would yield an inconsistent database, with its inevitable integrity problems, because committed data cannot be uncommitted! The solution to the problem illustrated in the figure is a two-phase commit protocol.
Two-Phase Commit Protocol
Centralized databases require only one DP. All database operations take place at only one site, and the consequences of database operations are immediately known to the DBMS. In contrast, distributed databases make it possible for a transaction to access data at several sites. A final COMMIT must not be issued until all sites have committed their parts of the transaction. The two-phase commit protocol guarantees that if a portion of a transaction operation cannot be committed, all changes made at the other sites participating in the transaction will be undone to maintain a consistent database state.
Each DP maintains its own transaction log. The two-phase commit protocol requires that the transaction log entry for each DP be written before the database fragment is actually updated. To meet that requirement, the two-phase commit protocol relies on a DO-UNDO-REDO protocol and a write-ahead protocol.
The DO-UNDO-REDO protocol is used by the DP to roll back and/or roll forward transactions with the help of the system’s transaction log entries. The DO-UNDO-REDO protocol defines three types of operations:
• DO performs the operation and records the “before” and “after” values in the transaction log.
• UNDO reverses an operation, using the log entries written by the DO portion of the sequence.
• REDO redoes an operation, using the log entries written by the DO portion of the sequence.
To ensure that the DO, UNDO, and REDO operations can survive a system crash while they are being executed, a write-ahead protocol is used. The write-ahead protocol forces the log entry to be written to permanent storage before the actual operation takes place.
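The DO, UNDO, and REDO operations and the write-ahead rule can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the names DataProcessor and LogEntry are hypothetical, the fragment is a plain dictionary, and an in-memory list stands in for the log on permanent storage.

```python
from dataclasses import dataclass, field

_MISSING = object()  # marks "key did not exist before the operation"


@dataclass
class LogEntry:
    key: str
    before: object  # "before" value, used by UNDO
    after: object   # "after" value, used by REDO


@dataclass
class DataProcessor:
    """Hypothetical local DP: one database fragment plus its transaction log."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # stand-in for permanent storage

    def do(self, key, new_value):
        # Write-ahead: record the log entry BEFORE changing the fragment.
        entry = LogEntry(key, self.data.get(key, _MISSING), new_value)
        self.log.append(entry)
        self.data[key] = new_value
        return entry

    def undo(self, entry):
        # Reverse the operation using the "before" value written by DO.
        if entry.before is _MISSING:
            self.data.pop(entry.key, None)
        else:
            self.data[entry.key] = entry.before

    def redo(self, entry):
        # Reapply the operation using the "after" value written by DO.
        self.data[entry.key] = entry.after


dp = DataProcessor(data={"balance": 100})
entry = dp.do("balance", 70)   # log entry is written first, then the update
dp.undo(entry)                 # balance is back to 100
dp.redo(entry)                 # balance is 70 again
```

Because every entry holds both the "before" and the "after" value, the same log supports rolling a transaction back and rolling it forward.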
The two-phase commit protocol defines the operations between two types of nodes: the coordinator and one or more subordinates, or cohorts. The participating nodes agree on a coordinator. Generally, the coordinator role is assigned to the node that initiates the transaction.
The protocol is implemented in two phases:
Phase 1: Preparation
1. The coordinator sends a PREPARE TO COMMIT message to all subordinates.
2. The subordinates receive the message; write the transaction log, using the write-ahead protocol; and send an acknowledgment (YES/PREPARED TO COMMIT or NO/NOT PREPARED) message to the coordinator.
3. The coordinator makes sure that all nodes are ready to commit, or it aborts the action.
If all nodes are PREPARED TO COMMIT, the transaction goes to Phase 2. If one or more nodes reply NO or NOT PREPARED, the coordinator broadcasts an ABORT message to all subordinates.
Phase 2: The Final COMMIT
1. The coordinator broadcasts a COMMIT message to all subordinates and waits for the replies.
2. Each subordinate receives the COMMIT message and then updates the database using the DO protocol.
3. The subordinates reply with a COMMITTED or NOT COMMITTED message to the coordinator.
If one or more subordinates did not commit, the coordinator sends an ABORT message, thereby forcing them to UNDO all changes.
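The two phases described above can be sketched as a small single-process simulation. It is a minimal illustration, not a production implementation: the Subordinate class, the two_phase_commit function, and the can_prepare and can_commit flags are hypothetical names used to simulate failures, and networking, timeouts, and crash recovery are left out.

```python
class Subordinate:
    """Hypothetical subordinate node with a simple list as its transaction log."""

    def __init__(self, name, can_prepare=True, can_commit=True):
        self.name = name
        self.can_prepare = can_prepare  # assumption: simulates a Phase 1 failure
        self.can_commit = can_commit    # assumption: simulates a Phase 2 failure
        self.log = []
        self.state = "INITIAL"

    def prepare(self):
        # Phase 1: write the log first (write-ahead), then acknowledge.
        if not self.can_prepare:
            self.state = "NOT PREPARED"
            return "NO"
        self.log.append("PREPARED")
        self.state = "PREPARED"
        return "YES"

    def commit(self):
        # Phase 2: apply the update and report the outcome.
        if not self.can_commit:
            return "NOT COMMITTED"
        self.log.append("COMMIT")
        self.state = "COMMITTED"
        return "COMMITTED"

    def abort(self):
        # UNDO any changes made on behalf of the transaction.
        self.log.append("ABORT")
        self.state = "ABORTED"


def two_phase_commit(subordinates):
    """Coordinator logic: returns "COMMITTED" or "ABORTED"."""
    # Phase 1: send PREPARE TO COMMIT and collect the acknowledgments.
    votes = [s.prepare() for s in subordinates]
    if any(vote != "YES" for vote in votes):
        for s in subordinates:
            s.abort()
        return "ABORTED"

    # Phase 2: broadcast COMMIT and wait for the replies.
    replies = [s.commit() for s in subordinates]
    if any(reply != "COMMITTED" for reply in replies):
        # As described above: force every node to UNDO its changes.
        for s in subordinates:
            s.abort()
        return "ABORTED"
    return "COMMITTED"


nodes = [Subordinate("site_a"), Subordinate("site_b", can_prepare=False)]
print(two_phase_commit(nodes))  # one NO vote in Phase 1 aborts the transaction
```

Passing subordinates that all prepare and commit successfully makes the function return "COMMITTED"; a single failure in either phase leaves every node in the ABORTED state.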
The objective of the two-phase commit is to ensure that each node commits its part of the transaction; otherwise, the transaction is aborted. If one of the nodes fails to commit, the information necessary to recover the database is in the transaction log, and the database can be recovered with the DO-UNDO-REDO protocol.