Byzantine fault tolerance (BFT) is the ability of a distributed system to continue operating correctly even if some of its components fail or behave arbitrarily, including by sending false or conflicting information.
A defining characteristic of blockchains such as Bitcoin or Ethereum is that they are run by open networks of nodes, which anyone can join pseudonymously without permission, provided they have the necessary hardware and software. They are decentralized, so there is no central entity coordinating activities. Instead, rules and communication protocols govern the operation of the network and the methods by which nodes reach consensus.
But what if a node, or group of nodes, decides to attack the network by transmitting information about false transactions in an attempt to steal funds? The ability of the network to resist such an attack and continue operating uninterrupted is known as Byzantine fault tolerance.
History of Byzantine fault tolerance
The term Byzantine fault tolerance originates with the Byzantine Generals’ problem, a thought experiment from distributed computing that poses the dilemma faced by a hypothetical group of Byzantine generals stationed outside an enemy city who wish to coordinate an attack. To do so, they need a reliable means of communication that allows them to agree on a plan even if some generals are transmitting unreliable information, possibly in an attempt to subvert the attack. It’s a simple analogy, but it captures the central challenge facing developers of distributed systems.
The Byzantine Generals’ problem was first outlined in a 1982 paper by three computer scientists, Leslie Lamport, Robert Shostak, and Marshall Pease. The term “fault” was already in common use for component failures in computing systems; thus, Byzantine fault tolerance became the term for resilience to the kind of failure the Byzantine Generals’ problem describes.
In the late 1990s, researchers developed an algorithm called “Practical Byzantine Fault Tolerance” (pBFT), which enabled nodes in a network to reach consensus without relying on a central entity to coordinate. However, it scales poorly to large networks: the number of messages exchanged to reach consensus grows roughly with the square of the number of nodes, so communication overhead quickly becomes a bottleneck as the network grows.
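To make the scaling concern concrete, the sketch below counts the messages exchanged for a single request in pBFT's three-phase normal case (pre-prepare, prepare, commit). It is a simplified estimate, not a full model of the protocol: it ignores client requests and replies, checkpoints, and view changes, and the function name is hypothetical.

```python
def pbft_normal_case_messages(n: int) -> int:
    """Approximate message count for one request with n replicas.

    Simplifying assumption: only the three normal-case phases are counted.
    """
    if n < 4:
        raise ValueError("pBFT needs at least 4 replicas to tolerate one fault")
    pre_prepare = n - 1           # the primary sends to every backup
    prepare = (n - 1) * (n - 1)   # every backup sends to every other replica
    commit = n * (n - 1)          # every replica sends to every other replica
    return pre_prepare + prepare + commit


for n in (4, 16, 64):
    print(n, pbft_normal_case_messages(n))
# prints:
# 4 24
# 16 480
# 64 8064
```

Under these assumptions the total works out to 2n(n - 1), so making the network 16 times larger (4 to 64 replicas) multiplies the message count by 336.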
In 2008, Satoshi Nakamoto published the Bitcoin white paper, which proposed a novel Byzantine fault-tolerant consensus method based on proof of work (PoW). Since the launch of Bitcoin, blockchain researchers have advanced these efforts by developing other consensus methods, such as proof of stake (PoS), which also aim to achieve Byzantine fault tolerance.
How blockchain consensus protocols attain Byzantine fault tolerance
In any blockchain network, regardless of the consensus method used, miners or validators must reach consensus over the validity of each transaction before it’s added to the blockchain ledger. To check whether a transaction is valid, it’s compared to the historical data on the ledger and discarded if it is inconsistent with that history – for example, if someone is trying to send funds they don’t have in their account. Validated transactions, by contrast, are included as a permanent, unalterable record on the ledger, which is shared with all participants. As a result, all participants have a shared point of truth against which to validate future incoming transactions.
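The validation step can be illustrated with a minimal sketch. It assumes a simple account-balance model in which the ledger is an ordered, immutable sequence of past transfers; real blockchains also verify signatures, fees, and block structure, which are left out here. All names (Transaction, balance_of, is_valid, apply_if_valid, and the "mint" account) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    sender: str
    recipient: str
    amount: int


def balance_of(ledger: tuple, account: str) -> int:
    """Replay the ledger history to compute an account's current balance."""
    balance = 0
    for tx in ledger:
        if tx.recipient == account:
            balance += tx.amount
        if tx.sender == account:
            balance -= tx.amount
    return balance


def is_valid(ledger: tuple, tx: Transaction) -> bool:
    """A transfer is valid only if the sender can cover it."""
    return tx.amount > 0 and balance_of(ledger, tx.sender) >= tx.amount


def apply_if_valid(ledger: tuple, tx: Transaction) -> tuple:
    """Return a new ledger with tx appended, or the unchanged ledger if tx is invalid."""
    return ledger + (tx,) if is_valid(ledger, tx) else ledger


# Assumed starting state: a genesis entry crediting alice with 10 units.
ledger = (Transaction("mint", "alice", 10),)
ledger = apply_if_valid(ledger, Transaction("alice", "bob", 4))   # accepted
ledger = apply_if_valid(ledger, Transaction("alice", "bob", 7))   # discarded: alice has only 6
print(balance_of(ledger, "alice"), balance_of(ledger, "bob"))     # 6 4
```

Because every node replays the same history, every honest node reaches the same verdict on the same transaction.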
Consensus protocols rely on principles of game theory to give network participants sufficient incentives to act in the interests of the network rather than launch an attack. Large networks such as Bitcoin and Ethereum have remained secure for many years thanks in large part to the incentive power of their reward structures.
Understanding Byzantine fault tolerance in blockchains
A Byzantine fault is defined as any distributed system failure that manifests with different symptoms to different observers. Since blockchain networks aim to achieve consensus regarding the state of the ledger, Byzantine faults typically take the form of conflicting information regarding transaction data, which may involve a miner or validator proposing an invalid block or attempting to validate an invalid transaction.
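One way to picture a fault that "manifests with different symptoms to different observers" is equivocation, where a node tells different peers different things. The sketch below is purely illustrative: it assumes observers can pool what each sender told them, and the function and node names are hypothetical. In a real network, messages would need to be signed so that conflicting statements can be attributed to their sender.

```python
from collections import defaultdict


def find_equivocators(reports: dict) -> set:
    """Return senders that were seen proposing more than one value.

    reports maps each observer to a {sender: value} dict describing
    what that observer received from each sender.
    """
    seen = defaultdict(set)
    for messages in reports.values():
        for sender, value in messages.items():
            seen[sender].add(value)
    return {sender for sender, values in seen.items() if len(values) > 1}


reports = {
    "observer_1": {"node_a": "block_X", "node_b": "block_X"},
    "observer_2": {"node_a": "block_X", "node_b": "block_Y"},
}
print(find_equivocators(reports))  # {'node_b'}
```

Neither observer can spot the problem alone, since each received a perfectly plausible message from node_b; the conflict only becomes visible when their views are compared.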
In some cases, the node operators behind such faults may be attempting to launch a malicious attack, but nodes can also suffer from faulty hardware or software that may cause them to inadvertently present false information. The larger a network becomes, the more likely such faults are to occur.
Therefore, the purpose of consensus protocols is not to eliminate such faults but to ensure that the system can continue operating even though Byzantine faults will inevitably occur. The protocol only needs to ensure that enough honest nodes can reach consensus: classical protocols such as pBFT require that fewer than one-third of nodes are faulty, while proof-of-work systems such as Bitcoin assume that honest participants control a majority of the computing power. As long as that threshold holds, the blockchain continues operating with the transaction ledger intact.
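A classical result from the Byzantine generals literature is that a network of n nodes can tolerate at most f Byzantine nodes when n ≥ 3f + 1. The sketch below applies that bound in a simple one-node-one-vote setting, which is an assumption made for illustration and is not how proof-of-work chains weigh participants. The function names are hypothetical, and the quorum formula is chosen so that any two quorums share at least one honest node.

```python
from collections import Counter


def max_byzantine_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1."""
    return (n - 1) // 3


def quorum_size(n: int) -> int:
    """Smallest vote count such that any two quorums overlap in an honest node."""
    f = max_byzantine_faults(n)
    return (n + f) // 2 + 1


def decide(votes: dict, n: int):
    """Return the value backed by a quorum of the n nodes, or None if there is none."""
    if not votes:
        return None
    value, count = Counter(votes.values()).most_common(1)[0]
    return value if count >= quorum_size(n) else None


print(max_byzantine_faults(4), quorum_size(4))                      # 1 3
print(decide({"a": "X", "b": "X", "c": "X", "d": "Y"}, n=4))        # X
print(decide({"a": "X", "b": "X", "c": "Y", "d": "Y"}, n=4))        # None
```

With four nodes, a single faulty vote for Y cannot block agreement on X, but a two-two split leaves the network unable to decide until more votes arrive.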
Although Byzantine fault tolerance is a feature of blockchains, it’s also a feature of other types of distributed systems, including those used in nuclear power plants, flight control systems, and space travel.
Byzantine fault tolerance essentials
- Byzantine fault tolerance refers to the resilience of distributed systems against the failure or misbehavior of one or more components
- Blockchains achieve Byzantine fault tolerance through the application of consensus protocols which enable network nodes to validate transactions
- The objective of consensus protocols is not to eliminate Byzantine faults but to ensure that the system is sufficiently tolerant of faults, provided that a sufficient majority of participants are acting in its interests