Formal Verification of Fault-Tolerant and Recovery Mechanisms for Safe Node Sequence Protocol.

The main idea is to implement a fault tolerance strategy using a fault tree and an ordered set of recovery modules. ... This paper presents a generic approach to specify a fault tolerant robot controller, and its implementation and validation with ROS and Gazebo. ... Basically, a fault tolerance mechanism (FTM) is composed of a detection module (DM) and a recovery module (RM). ...

doi:10.1109/dsn-w50199.2020.00031 dblp:conf/dsn/FavierMGFL20 fatcat:2v3jo6gu3zgx5j2u5x2ikjukze

Bugs in these systems have led to the loss of critical data and unacceptable service outages. We present Verdi, a framework for implementing and formally verifying distributed systems in Coq. ... Verdi formalizes various network semantics with different faults, and the developer chooses the most appropriate fault model when verifying their implementation. ... Ricketts, and Ryan Stutsman. We also thank Nate Foster for shepherding our paper, and the anonymous reviewers for their helpful and insightful feedback. ...

doi:10.1145/2737924.2737958 dblp:conf/pldi/WilcoxWPTWEA15 fatcat:psuhauftirc55n6vkk3bpt2sfy

doi:10.1145/2813885.2737958 fatcat:c4mh5tkhdjdzdi2h7qkkbpfrdq

In particular, VBFT tolerates f = (n-1)/3 faults (which is the best possible), guarantees strong safety for honest leaders, and requires no trusted hardware. ... Existing low-latency protocols have achieved consensus with just two communication steps by reducing the maximum number of faults the protocol can tolerate (from f = (n-1)/3 to f = (n+1)/5), by relaxing protocol ... Fault tolerance improves the protocol resilience in comparison to the fast BFT protocols that have traded fault tolerance for lower latency. ...

arXiv:2310.09663v5 fatcat:5hmzizeswzgd7igf2vusrewo2i
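
The two resilience bounds quoted in the VBFT entry above can be made concrete with a small calculation. The following is a minimal sketch, not code from the paper: the function names are hypothetical, and rounding down to a whole number of faulty nodes is an assumption, since the snippet only states the bounds as fractions of n.

```python
def max_faults_classic(n: int) -> int:
    """Largest whole f with f <= (n - 1) / 3, i.e. n >= 3f + 1.

    Assumption: the bound is floored to an integer number of nodes.
    """
    if n < 1:
        raise ValueError("n must be a positive number of nodes")
    return (n - 1) // 3


def max_faults_fast(n: int) -> int:
    """Largest whole f with f <= (n + 1) / 5, the reduced bound the
    snippet attributes to earlier two-step (low-latency) protocols.

    Assumption: the bound is floored to an integer number of nodes.
    """
    if n < 1:
        raise ValueError("n must be a positive number of nodes")
    return (n + 1) // 5


if __name__ == "__main__":
    # Compare the two bounds for a few cluster sizes.
    for n in (4, 7, 10, 16):
        print(n, max_faults_classic(n), max_faults_fast(n))
```

Under these assumptions, a 16-node deployment tolerates 5 faults at the (n-1)/3 bound but only 3 at the (n+1)/5 bound, which is the resilience gap the abstract refers to.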

These extensions add runtime verification mechanisms and self-healing capabilities via new reusable nodes, some of them leveraging meta-programming techniques. ... As they permeate our everyday lives, more of them become safety-critical, increasing the need for making them testable and fault-tolerant, with minimal human intervention. ... The use of runtime verification probes the system for errors, and the self-healing is accomplished by the activation of system recovery and/or maintenance of health mechanisms. ...

doi:10.1007/978-3-030-50426-7_27 fatcat:t6y465y5ujax3d6fmxjcqy5iha

(For more information, see the "Background and related works" sidebar.) Here, we focus on fault tolerance modeling and the control of activity flow in fault tolerance mechanisms (FTMs). ... Removal of fault tolerance deficiencies: Due to the very negative impact of deficiencies affecting the design and/or implementation of FTMs, the early verification of these mechanisms is essential. ... We can identify fault tolerance chains that are composed of several mechanisms in series such as coding and decoding devices, error detection and error recovery, and broadcast and vote. ...

doi:10.1109/40.782569 fatcat:wvly2wjecvfonelnzuryofdr4a

Abstract: The development and validation of fault-tolerant computers for critical real-time applications are currently both costly and time consuming. ... The two dimensions of physical redundancy allow the definition of a wide variety of instances with different fault tolerance strategies. ... Formal Verification: Formal approaches were used both for specification and as a design-aid. ...

doi:10.1109/71.774908 fatcat:6fts7tmq5veilfhkejnm744b34

Previous protocols have achieved consensus with just two communication steps either by reducing the bound on the number of faults the protocol can tolerate (f ≤ (n-1)/5) or by using trusted hardware like Trusted ... Fast B4B can tolerate the maximum number of faults a partial BFT consensus can tolerate (f ≤ (n-1)/3). Furthermore, Fast B4B does not require the use of any trusted hardware. ... A protocol is R-safe against all Byzantine faults if the following statement holds: in the presence of f Byzantine nodes, if 2f + 1 nodes or f + 1 correct (honest) nodes commit a block at the sequence ...

arXiv:2109.14604v2 fatcat:skot5ijs5nadfk7trre2j4vvvq

This report is a slightly modified version of the first chapter of the monograph Fault tolerance techniques for high-performance computing edited by Thomas Herault and Yves Robert, and to be published ... We present the two main protocols, namely coordinated checkpointing and hierarchical checkpointing. Then we introduce performance models and use them to assess the performance of these protocols. ... The research presented in this report was supported in part by the French ANR (Rescue project) and by contracts with the DOE through the SUPER-SCIDAC project, and the CREST project of the Japan Science ...

doi:10.1007/978-3-319-20943-2_1 fatcat:2rehvmh6bvdsvbru6ukpcij6ae

The high replication cost of Byzantine fault-tolerance (BFT) methods has been a major barrier to their widespread adoption in commercial distributed applications. ... We also show that ZZ can handle simultaneous failures and achieve sub-second recovery. ... Agreement nodes in ZZ are capable of detecting invalid execution or checkpoint messages; the fault detection and recovery steps for each of these are identical, so for brevity we focus on invalid or missing ...

doi:10.1145/1966445.1966457 dblp:conf/eurosys/WoodSVSC11 fatcat:oyzrb64apve3bdej33xuv74k4a

In particular, we present and implement a BDD-based synthesis heuristic for adding masking fault-tolerance to existing fault-intolerant distributed programs automatically. ... Intuitively, a program is masking fault-tolerant if it satisfies its safety and liveness specifications in the absence and presence of faults. ... Such deadlock freedom can be achieved by adding safe recovery, and no state elimination is required. Formally, the fault-tolerant version of Infuse is the following program: IF ′ 0 j :: (r.j = r. ...

doi:10.1007/s00446-011-0139-3 fatcat:mcom6lyexvgrxpz3wgowawrosm

Fault tolerance will be an ineludible consideration for extreme-scale computing. ... We introduce Camel, a protocol that has a low memory overhead for multicast and reduction operations. ... This work also used machine resources from the PARTS project and the Director's discretionary allocation on Intrepid at ANL, for which the authors thank the ALCF and ANL staff. ...

doi:10.1007/s11227-015-1402-3 fatcat:zqylwttdbbhnze4bmwkztppkhy

Byzantine fault-tolerant (BFT) state machine replication (SMR) is regarded as an ideal candidate that can tolerate arbitrary faulty behaviors. ... For each representative protocol, we conduct an in-depth discussion of its most important architectural building blocks as well as the key techniques they used. ... And, a replica recovery mechanism is required to recover replicas from faults. ...

dblp:journals/iacr/Wang21c fatcat:wggbkbi25fg43fieebp4qzlk3m

This robustness is largely due to the widespread belief in a set of guidelines for critical design decisions such as where to initiate recovery and how to maintain state. ... In this paper we propose a set of six design guidelines for improving the robustness of network protocols to these kinds of arbitrary failures. ... In particular, cryptographic authentication, fault-tolerance via consensus, and formal (and informal) protocol specification are extremely useful, and should be used whenever appropriate. ...

doi:10.1145/774763.774783 fatcat:k7hw3glvz5g6njbgdp5fjzywcu

Our technique for synthesis is based on the use of (bi)simulation algorithms for capturing different fault-tolerance classes, and the extension of a synthesis algorithm for CTL to cope with dCTL specifications ... a logical specification of the component, and the system's required level of fault-tolerance. ... Acknowledgements First of all, I would like to express my deepest gratitude to my supervisor Tom Maibaum, for his advice, guidance, and encouragement throughout the course of this work. ...

doi:10.1109/ase.2013.6693149 dblp:conf/kbse/Demasi13 fatcat:z7sidjnzrfdwnginmj7nujhcg4

A hierarchical fault tolerant architecture for an autonomous robot

Verdi: a framework for implementing and formally verifying distributed systems

VBFT: Veloce Byzantine Fault Tolerant Consensus for Blockchains [article]

Visual Self-healing Modelling for Reliable Internet-of-Things Systems [chapter]

Validation-based development of dependable systems

GUARDS: a generic upgradable architecture for real-time dependable systems

Fast B4B: Fast BFT for Blockchains [article]

Fault Tolerance Techniques for High-Performance Computing [chapter]

ZZ and the art of practical BFT execution

Symbolic synthesis of masking fault-tolerant distributed programs

Camel: collective-aware message logging

SoK: Understanding BFT Consensus in the Age of Blockchains [article]

Design guidelines for robust Internet protocols

Synthesizing fault-tolerant programs from deontic logic specifications
