Workshop do Projeto ARGO – Junho/2001
Eden : a Consensus Based Group Communication System
Frédéric Tronel
IRISA/Université de Rennes (France)
ftronel@irisa.fr
Agreement problems are among the most difficult problems that distributed system designers have to cope with. These difficulties and the different ways to circumvent them have been detailed in the previous talk.
During this talk, we will focus on the implementation of a group communication system that rely on the use of a generic agreement block (whose specification is derived from the one of the consensus problem). This middleware is called Eden, and is divided into two major sets of components written in Java.
Eva : an Event Based Architecture for Building High Level Protocol
The first set of components on which Eden is built is called Eva. Eva [1] is a set of classes that provides the key features of an event-based architecture. This choice is plainly justified by the fact that most of the algorithms written in the field of distributed algorithms are formulated under a form that is very close to event based architectures. So they can be easily translated into the Eva semantics.
All the entities created by Eva are either producers or consumers (or both) of events. Events capture and extend the notion of messages usually encountered in distributed algorithms.
The design of Eva has been conducted having efficiency in mind, while at the same time trying to preserve simplicity, and openness. As instance, because events are short-life objects, they are generated by the mean of factories that are pools of objects acting as a filter for memory management. This is useful to decouple Eva from the garbage collecting of the Java virtual machine.
Moreover, events that have to be sent over the network need to be serialized. This mechanism is tightly combined with the factory paradigm, leading to good performances. Furthermore, Eva is able to serialized events in a hierarchical manner, allowing the partial deserialization of objects. This technology is very efficient since it allows to check if an event is carrying sufficiently significant information at one level, without requiring its full deserialization. Old events (encountered when message transfer delays are unstable) are discarded without being fully deserialized. This mechanism is also fully integrated in the memory management of factories.
Adam : an Integrated Agreement Framework for Building Group Communication Services
Adam is the second set of components that makes up Eden. It is written in the Eva style of writing high level protocol. That is, we have designed Java classes that synchronized by producing/consuming events (either locally, or across the network).
Adam is providing the necessary mechanisms to build up a group communication service. To do this, a generic agreement framework (called GAF [3] herein after) lies at the core of Adam.
The group communications abstraction provided by Adam can be used to provide an active replication scheme for fault tolerance. It can also be employed in all the distributed applications that require to take into account dynamical evolution of their membership.
To provide the strong semantics usually associated to group communication (e.g. virtual synchrony), one needs to solve many agreement problems, namely, atomic broadcast and membership [2]. As explained in the previous talk about agreement problems by Michel Hurfin, all these agreement problems can be viewed as particular instantiations of a common problem called Consensus. Since Adam is providing a powerful and generic class called GAF that can solve any agreement problem, it could have been possible to implement virtual synchrony by instantiating in different thread as many consensus blocks as needed (atomic broadcast, membership ...). Then we would have had to implement an external mechanism, whose goal would have been to synchronize the different decisions issued by the different consensus blocks. Indeed, in this implementation nothing guarantees that two different decisions issued by two different agreement service, are seen in the same order by all the site. Due to the inherent asynchronism of the system, it may well happen that two geographic distant sites could see the two decisions in different orders. This would lead to a misbehavior, and would surely violate the strong property of virtual synchrony. The price to pay would be to use a third part agreement to synchronize the different decisions issued by the different service.
We have discarded this design choice in favor of a better use of GAF, which naturally provides this third part synchronizer mechanism for free. Indeed, in its current implementation GAF is carrying out a infinite sequence of decisions. Each decision issued by GAF is subdivided into different decisions, one by agreement problems to be solved. The real semantics of these fields is hidden to GAF, that communicates these fields to specialized components in charge of their treatments. It is even possible to add and remove agreement problems on the y. However, it is still to the responsibility of the application to guarantee the correctness of such dynamic changes. But it is quite simple to implement a static agreement service that is always loaded into GAF at the start of the application and that can not be withdrawn by any process during the runtime. This service would be in charge of the necessary synchronization between the different group members to carry out adding and removing of agreement service in a consistent manner.
Note that the required synchronizations between the different agreement problems follow from the total order obtained on the sequence of decisions produced by GAF. Since, GAF is solving multiple agreement problems at the same time, it may happen that one agreement service is not able to provide any significant input at a given time. This is perfectly taken into account by GAF, that is able to decide empty values, when it is not provided with enough significant values.
Another interesting point in the implementation of Adam and GAF in particular, is that we do not rely on the use of traditional failure detectors to provide agreement. Indeed, failure detectors are relegated to the membership service, where they are used with long timeout delay to minimize the number of false suspicions. While instead GAF makes use of strict timeout to progress between successive rounds. This mechanism has many advantages :
References
[1] F. Brasileiro, F. Greve, M. Hurfin, J.P. Le Narzul, and F. Tronel. Eva: an event-based framework for developing specialised communication protocols. Research Report 1345, IRISA, Jul 2000.
[2] F. Greve, M. Hurfin, M. Raynal, and F. Tronel. Primary component asynchronous group membership as an instance of a generic agreement framework. In Proc. of the 5th IEEE Int. Symposium on Autonomous Decentralized Systems, Dallas, TX, Mar 2001.
[3] M. Hurfin, R. Macedo, M. Raynal, and F. Tronel. A general framework to solve agreement problems. In Proc. of the 18th IEEE Symposium on Reliable Distributed Systems, (SRDS-99), pages 56{67, Lausanne, Oct 1999. IEEE.