Workshop do Projeto ARGO

Workshop do Projeto ARGO – Junho/2001

A Review of Replication Control Protocols from the Communication Support Perspective

Udo Fritzke Jr^*

Departamento de Automação e Sistemas - DAS/UFSC

e-mail: udo@das.ufsc.br

Keywords: Transactions, groups, replication, broadcast, multicast.

Transactions and group communication are two structuring concepts that simplify design and implementation of fault-tolerant distributed applications. Systems supporting transactions ensure that users only read consistent data in spite of concurrent updates and failures. Group communication systems facilitate the management of groups of processes and offer broadcast or multicast primitives with different delivery properties; several such systems have been developed in the last decade [11].

Replication of data improves data availability in spite of process and communication failures. Transactions that access replicated data can execute even if sites (or communication links) are, permanently or temporary, not available. Several protocols for replication control have been proposed in the literature along the last twenty years. This paper briefly reports some results of our work [4] comparing important replication control protocols. Existent replication control protocols can be classified according to the communication primitives they use: point-to-point, broadcast or multicast. Particularly, we consider multicast and broadcast primitives implementing reliable and total ordered delivery of messages. In the following, we summarize the benefits of these delivery properties by stressing model and functional aspects they help to implement at the transactional level. We focus here on the replication and failure detection models assumed, and on the integration of atomic commitment and deadlock prevention.

Traditional protocols are based only on point-to-point communication [1]. They allow partial replication of data: each object (or data item) can have any number of copies. The use of group communication as a support for transactions on replicated data appeared recently (e.g.[8, 9, 10]). These protocols assume a total replication model in which a database is fully replicated by the processes of the distributed system. Total ordered deliveries easily provide one-copy equivalent executions, that can be extended to one-copy serializable ones with the help of well known locking mechanisms. Total replication presents however two major drawbacks in a context more general than database. Firstly, it prevents different data items to have different replication degrees, what represents a problem if some data have placement constraints. Secondly, such an approach does not scale well when the data to be replicated are distributed in a large scale system.

To overcome these limitations, keeping the benefits of group communication, Schiper and Raynal presented in [12] a partial replication model where each object is made fault-tolerant by a process group. While previous protocols using group communication are based on broadcast, a primitive that addresses messages to all processes of the replicated system, this partial replication model requires multicast primitives, that allow to send messages to subsets of processes composing the system. With the same replication model, but in opposition to the one-shot (or static) transactions of [12], we proposed in [5] a replication control protocol that handles ordinary transactions, which are dynamically programmed.

Much research has been spent to asynchronous distributed systems, aiming to understand the difficulty of solving agreement problems with unreliable failure detectors [3].This concerns replication control in that distributed atomic commitment and distributed locking can be considered agreement problems. Further research, however, has shown that both atomic broadcast and atomic multicast can be implemented in asynchronous systems [2, 6] when some hypotheses on the properties of failure detection mechanisms can be taken into account. Consequently, in opposition to point-to-point based replication, protocols based on those group communication services have the advantage of working without modifications even with unreliable failure detectors.

Additionally, reliability and total order directly implement distributed atomic commitment when either a total replication, or a partial replication model with one-shot transactions is used. With ordinary transactions on partially replicated data, transaction atomicity may be achieved with a straightforward two-phase commit protocol, when reliable and atomic multicasts are used. In point- to-point based communication, atomicity has to be implemented by complex ad-hoc non-blocking atomic commitment protocols.

In traditional point-to-point based protocols, copies are accessed directly by transactions, following different access rules (e.g. available copies and quorum). However, it has been shown [7] that if transactions access copies of data directly, they become quite susceptible to deadlocks. Protocols conceived to reduce deadlock probability (e.g. [7]) either constraint data placement, or do not ensure complete isolation of transactions. Replication control protocols based on multicast and broadcast avoid deadlocks by implementing transaction abort policies in some transaction conflict cases, and by taking benefit from total order of message deliveries. In most protocols, write operations of transactions are executed in the same order in all implied processes, and waits due to lock requests are kept acyclic. This cannot be enforced in usual point-to-point based replication, that have to make use of ad-hoc deadlock detection and resolution mechanisms.

We remark also that the use of genuine multicast implementations (that is, implementations where only the sender and the destination processes are involved in a message multicast), has a fundamental importance when one has to face large-scale distributed environments with partially replicated data. If broadcast is used in such a model, replication control would be related to the size of the whole distributed system, while it is desirable to limit communication costs to the replication degrees of actually accessed data items.

References

[1] P. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency Control and Recovery in Database Systems. Addison Wesley, Massachusetts, 1987.

[2] T. Chandra and S. Toueg. Unreliable failure detectors for reliable distributed systems. JACM, 43(1):225-267, March 1996.

[3] Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2):374-382, April 1985.

[4] Udo Fritzke Jr. Les systèmes transactionnels répartis pour données dupliquées fondés sur la communication de groupes. PhD thesis, Université de Rennes 1, January 2001.

[5] Udo Fritzke Jr. and Philippe Ingels. Transactions on partially replicated data based on reliable and atomic multicasts. In Proceedings of the IEEE ICDCS-21, Phoenix, Arizona, USA, April 2001.

[6] Udo Fritzke Jr., Philippe Ingels, Achour Mostefaoui, and Michel Raynal. Fault-tolerant total order multicast to asynchronous groups. In Proceedings of the 17h IEEE SRDS, West Lafayette, USA., October 1998.

[7] Jim Gray,Pat Helland, Patrick O'Neil, and Dennis Shasha. The dangers of replication and a solution. In Proceedings of the 1996 ACM SIGMOD, pages 1-10, Montreal, Canada, June 1996. ACM.

[8] JoAnne Holliday, Divyakant Agrawal, and Amr El Abbadi. The performance of database replication with group multicast. In Proceedings of the IEEE FTCS29, pages 158-165. IEEE, May 1999.

[9] Bettina Kemme and Gustavo Alonso. A suite of database replication protocols based on group communication primitives. In Proceedings of the IEEE ICDCS-18, pages 156-163, Amsterdam, The Netherlands, May 1998.

[10] F. Pedone, R. Guerraoui, and A. Schiper. Exploiting atomic broadcast in replicated databases. In Proceedings of the 4th International Euro-Par, LNCS 1470, pages 514{520, Southhampton, September 1998. Springer.

[11] David Powell. Special issue in group communication. CACM, 39(4), April 1996.

[12] André Schiper and Michel Raynal. From group communications to transactions in distributed systems. CACM, 39(4):84-87, April 1996.