Tutorial 2: Beyond the glamour of Byzantine Fault Tolerance: OR why resisting intrusions means more than BFT
Prof. Paulo Veríssimo (Faculty of Sciences, University of Lisbon)
http://homepages.di.fc.ul.pt/~pjv/pjv.html

Abstract: Byzantine Fault Tolerance (BFT) has become a reference paradigm for dealing with faults and intrusions, achieving security (and dependability) in an automatic way, much along the lines of classical fault tolerance. However, BFT is a means to an end (intrusion tolerance and resilience), and resilience to intrusions actually means more than BFT.

The explosive combination of the desired asynchrony of these systems with the real-life (and real-time) power of attackers has exposed limitations of the paradigm as a basis for designing resilient systems. Several researchers have addressed these limitations, some of which are quite unexpected. Although recent practical algorithmic or system-level fixes have partially improved the situation, we show that the problems have a formal root: exhaustion failure, and the susceptibility of current BFT systems to it. We give several practical examples of the phenomenon.

The tutorial consolidates recent results showing that there is more to designing resilient systems than BFT and that, surprisingly or not, not all BFT algorithms lead to resilient designs. Here, resilience means the capacity of a system to fulfill its mission to the end in the presence of possibly harsh accidents and attacks, i.e. faults and intrusions.

First, we discuss the theoretical underpinnings: we propose a system predicate, called exhaustion safety (ES), that should be met by any BFT algorithm and system intended to be resilient; we show impossibility results for ES in asynchronous BFT systems and show that they can be overcome under hybrid distributed systems models; and we review recent algorithmic lower bounds that show the power of this latter model. Then, we review recent research results that address a complete approach to designing resilient BFT systems, especially in dynamic and long-lived environments. Concepts like consensus, state machine replication, proactive/reactive recovery/resilience, diversity, distributed systems hybridization and exhaustion safety are put in context in a coherent whole, giving insight into the correct design of resilient systems: how to structure a BFT hybrid distributed system; how to design and show the correctness of BFT algorithms under hybrid models; and how to actually solve the above-mentioned problems of BFT. Finally, extensive literature pointers are given, namely to works that aim at actual resilience against Byzantine faults. The material of the tutorial has been presented and refined over several editions, for example in PhD-level courses at U. Roma la Sapienza, Carnegie Mellon and the Swiss Romande PhD Spring School, and recently at the INRIA Winter School on Hot Topics in Distributed Computing.

Tutorial elements:

  • General problem definition: prevention vs. tolerance vs. resilience
  • Specific problem definition: misconceptions and limitations w.r.t. Byzantine Fault Tolerance
  • Formalisation: exhaustion failure and exhaustion safety
  • Practical examples of the problems
  • Solutions: hybrid distributed system models
  • Validity of the hybrid approach: algorithms, lower bounds, related work
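
As a rough illustration of the formalisation item above, the following minimal Python sketch shows the resource bound behind exhaustion failure. It assumes the classical requirement of n >= 3f + 1 replicas to tolerate f Byzantine faults in asynchronous BFT, and it simplifies exhaustion safety to the condition that the mission ends before more than f replicas are compromised. The function names are hypothetical and are not taken from the tutorial material.

```python
def max_tolerated_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1 (assumed classical BFT bound)."""
    if n < 1:
        raise ValueError("a system needs at least one replica")
    return (n - 1) // 3


def is_exhausted(n: int, compromised: int) -> bool:
    """True once more replicas are compromised than the system tolerates."""
    return compromised > max_tolerated_faults(n)


def is_exhaustion_safe(t_exec: float, t_exhaust: float) -> bool:
    """Simplified predicate (an assumption of this sketch): the system is
    safe if its execution ends strictly before resource exhaustion."""
    return t_exec < t_exhaust


if __name__ == "__main__":
    n = 4  # four replicas tolerate one Byzantine fault
    print(max_tolerated_faults(n))
    print(is_exhausted(n, compromised=1))
    print(is_exhausted(n, compromised=2))
    # An unbounded (asynchronous) execution time can never be shown to
    # finish before a finite exhaustion time.
    print(is_exhaustion_safe(float("inf"), 1000.0))
```

The last call hints at why asynchrony is problematic in this simplified view: if execution time has no upper bound, no finite time to exhaustion can be guaranteed to exceed it.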


Short bio: Paulo Veríssimo is currently a professor in the Department of Informatics (DI) of the Faculty of Sciences of the University of Lisboa (http://www.di.fc.ul.pt/~pjv), and past Director of LASIGE, a research laboratory of the DI (http://lasige.di.fc.ul.pt). He is a Fellow of the IEEE and a Fellow of the ACM. He is associate editor of the Elsevier Int’l Journal on Critical Infrastructure Protection, and past associate editor of the IEEE Trans. on Dependable and Secure Computing. He was a member of the European Security & Dependability Advisory Board. He is past Chair of the IEEE Technical Committee on Fault Tolerant Computing and of the Steering Committee of the DSN conference, and was a member of the Executive Board of the CaberNet European Network of Excellence. He was coordinator of the CORTEX IST/FET project (http://cortex.di.fc.ul.pt). Paulo Veríssimo leads the Navigators research group of LASIGE, and is currently interested in architecture, middleware and protocols for distributed, pervasive and embedded systems, in the facets of real-time adaptability and fault/intrusion tolerance. He is the author of more than 160 refereed publications in international scientific conferences and journals in the area, and co-author of five books (e.g. http://www.navigators.di.fc.ul.pt/dssa/).