Data availability is a key concept in blockchain: the guarantee that all the data needed to verify transactions is accessible to network participants.
What is data availability in blockchain?
Blockchain data availability refers to the ability of any given network node to access historical data on the ledger.
Blockchains like Bitcoin or Ethereum operate as a ledger of transactions. Each transaction comprises a set of data, such as sending and receiving addresses, the amount, or the timestamp. Transactions are organized into blocks, each of which undergoes validation by the network using a consensus protocol such as proof of work or proof of stake.
Each block is linked cryptographically to its predecessor, which enables the nodes to validate transaction data. With the entire ledger history available, each block, and every transaction in it, can be traced back through the chain of preceding blocks.
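The cryptographic link between blocks can be illustrated with a minimal Python sketch. It assumes a toy block format (a dictionary with `prev_hash` and `transactions` fields) and SHA-256 over a JSON encoding; the helper names `block_hash`, `make_block`, and `verify_chain` are hypothetical and do not reflect how Bitcoin or Ethereum actually serialize blocks.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's contents deterministically with SHA-256."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_block(prev_hash: str, transactions: list) -> dict:
    """Build a toy block that records the hash of its predecessor."""
    return {"prev_hash": prev_hash, "transactions": transactions}


def verify_chain(chain: list) -> bool:
    """Return True if every block points at the hash of the block before it."""
    for previous, current in zip(chain, chain[1:]):
        if current["prev_hash"] != block_hash(previous):
            return False
    return True


# Build a three-block chain (all addresses and amounts are made up).
genesis = make_block("0" * 64, [{"from": "alice", "to": "bob", "amount": 5}])
second = make_block(block_hash(genesis), [{"from": "bob", "to": "carol", "amount": 2}])
third = make_block(block_hash(second), [{"from": "carol", "to": "dave", "amount": 1}])
chain = [genesis, second, third]
print(verify_chain(chain))  # True

# Changing historical data breaks the link stored in the next block.
tampered_genesis = make_block("0" * 64, [{"from": "alice", "to": "bob", "amount": 500}])
print(verify_chain([tampered_genesis, second, third]))  # False
```

The check only works if a node can actually read the earlier blocks, which is why validation depends on historical data being available.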
As decentralized networks with no trusted authority, blockchains are designed based on the principle of “don’t trust, verify.” Therefore, access to historical blockchain data is a prerequisite for the secure validation of a block.
Such access is known as data availability (DA), and it plays a critical role in the security and transparency of blockchain networks. If the availability of data cannot be guaranteed at all times to all network participants, the integrity of block data may be compromised. Furthermore, no participant should be able to affect the availability of data by withholding or changing it. This would be considered a malicious act.
The key challenge is that over time, the amount of data on the ledger invariably increases, making it more difficult to ensure full data availability. Data storage is expensive, so the financial barriers to becoming a node operator increase, in turn increasing the risk of centralization. It also takes longer to download and verify a full copy of the ledger, which slows the onboarding of new nodes. This challenge is known as blockchain bloat.
While Layer 2 platforms can help alleviate the traffic on the network, they still rely heavily on the base layer for validation and settlement of bundled transactions and overall security. Therefore, the proliferation of Layer 2 platforms over recent years has also added to Ethereum’s challenges in ensuring data availability.
Solutions for blockchain data availability
Blockchain developers have devised various data availability solutions that aim to overcome the challenges of blockchain bloat without making unacceptable compromises on security or decentralization.
Data availability sampling
Data availability sampling involves nodes downloading small random samples of ledger data to verify that the data is available. By eliminating the need for every node to download the entire ledger, sampling aims to maintain or increase throughput.
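However, sampling gives only a probabilistic guarantee: a node that checks a handful of random chunks cannot be certain that all data is available at all times. This leaves room for so-called withholding attacks, where a block producer publishes a block but holds back part of its data, hoping that no sampling node requests the missing part.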
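The probabilistic nature of sampling can be made concrete with a short Python sketch. It assumes a simplified model in which a block is split into equal chunks and each sample picks a chunk uniformly at random, independently of the others; the function names `detection_probability` and `sample_block` are hypothetical and not taken from any client implementation.

```python
import random


def detection_probability(withheld_fraction: float, samples: int) -> float:
    """Chance that at least one of `samples` independent uniform samples
    lands on a withheld chunk."""
    return 1.0 - (1.0 - withheld_fraction) ** samples


def sample_block(available: list, samples: int, rng: random.Random) -> bool:
    """Request `samples` random chunks; return True only if all were served.

    `available[i]` is True when chunk i can be downloaded.
    """
    indices = [rng.randrange(len(available)) for _ in range(samples)]
    return all(available[i] for i in indices)


# If half of the chunks are withheld, 20 samples almost always notice.
print(detection_probability(0.5, 20))

# If only 1% is withheld, 20 samples usually miss it.
print(detection_probability(0.01, 20))
```

Under this model, withholding half of the chunks is caught with probability 1 - 0.5^20, which is very close to 1, while withholding 1% is caught only about 18% of the time. Small withheld portions are therefore the hard case for plain sampling.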
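One way around this is to deploy erasure coding. Erasure coding adds redundant information to each block so that the original transaction data can be reconstructed even if part of it is found to be missing. A block producer then has to withhold a large share of the encoded block to make it unrecoverable, which random sampling is far more likely to detect.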
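The idea behind erasure coding can be sketched with the simplest possible code: a single XOR parity chunk, which can rebuild any one missing chunk. This is a deliberately reduced stand-in chosen for illustration; production data availability designs use stronger codes such as Reed-Solomon, which tolerate many missing chunks. The function names `encode` and `recover` are hypothetical, and chunks are assumed to be of equal length.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(chunks: List[bytes]) -> List[bytes]:
    """Append one parity chunk: the XOR of all data chunks."""
    parity = reduce(xor_bytes, chunks)
    return chunks + [parity]


def recover(coded: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing chunk (marked None); return the data chunks."""
    missing = [i for i, chunk in enumerate(coded) if chunk is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can repair only one missing chunk")
    repaired = list(coded)
    if missing:
        present = [chunk for chunk in coded if chunk is not None]
        # XOR of all surviving chunks equals the missing one.
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired[:-1]  # drop the parity chunk


data = [b"tx-aaaa", b"tx-bbbb", b"tx-cccc"]
coded = encode(data)

# Simulate one chunk being withheld.
damaged = [coded[0], None, coded[2], coded[3]]
print(recover(damaged) == data)  # True
```

With a single parity chunk, withholding two chunks already defeats recovery. Stronger codes raise that threshold, which forces an attacker to withhold far more data.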
Data availability sampling is featured on Ethereum’s scalability roadmap. It is also currently in operation on the data availability layer Celestia.
Data availability committees
Data availability committees (DACs) are off-chain groups of nodes or entities that are trusted to store data on behalf of the blockchain network, making it available as and when required. The DAC receives data about the state of the blockchain from network nodes and, in turn, publishes attestations that act as proof of data availability.
DACs are potentially more secure than sampling since the availability of all data should be guaranteed by the DAC. However, they are typically permissioned committees, meaning that users must trust the committee members to store the data and serve it honestly. StarkEx and Arbitrum are both examples of solutions that rely on data availability committees. Infura, the Ethereum node infrastructure provider operated by ConsenSys, is a participant in these DACs.
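The attestation flow described above can be sketched as a quorum check: each committee member attests to a batch of data, and the batch is treated as available once enough valid attestations exist. In this Python sketch, HMAC tags stand in for the digital signatures a real committee would use, which means the verifier here has to hold the members' keys. That is an assumption made only to keep the example within the standard library. The member names, keys, and threshold are all hypothetical.

```python
import hashlib
import hmac


def attest(member_key: bytes, data: bytes) -> bytes:
    """A member's attestation: a keyed tag over the hash of the data it stores."""
    digest = hashlib.sha256(data).digest()
    return hmac.new(member_key, digest, hashlib.sha256).digest()


def has_quorum(data: bytes, attestations: dict, member_keys: dict, threshold: int) -> bool:
    """Return True if at least `threshold` known members validly attested to `data`."""
    digest = hashlib.sha256(data).digest()
    valid = 0
    for member, tag in attestations.items():
        key = member_keys.get(member)
        if key is None:
            continue  # ignore attestations from unknown members
        expected = hmac.new(key, digest, hashlib.sha256).digest()
        if hmac.compare_digest(tag, expected):
            valid += 1
    return valid >= threshold


member_keys = {"member-1": b"key-1", "member-2": b"key-2", "member-3": b"key-3"}
batch = b"example batch of transaction data"

# Two of the three members attest to the batch.
attestations = {
    "member-1": attest(member_keys["member-1"], batch),
    "member-2": attest(member_keys["member-2"], batch),
}

print(has_quorum(batch, attestations, member_keys, threshold=2))               # True
print(has_quorum(b"a different batch", attestations, member_keys, threshold=2))  # False
```

The threshold captures the trust assumption: the scheme is only as reliable as the willingness of that many members to actually store and serve the data.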
Data availability protocols
Several projects have been developed as blockchain layers that can address the challenges of data availability.
Data availability protocols operate similarly to DACs in that they offload data entirely from the main chain. However, DA protocols generally operate their own chain using a proof of stake consensus or similar. Network participants are responsible for storing data and guaranteeing its availability by staking tokens in the network.
Should the data not be available on demand, the relevant participants will have their stake slashed. In this way, participants are incentivized to ensure data is available at all times.
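The incentive mechanism can be illustrated with a toy Python model. It assumes a simple availability challenge in which an operator either serves a requested piece of data or loses a fixed percentage of its stake; the `Operator` class, the `challenge` function, and the 50% penalty are hypothetical and do not describe the rules of any specific protocol.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    name: str
    stake: int    # tokens staked in the network
    stored: set   # identifiers of the data this operator actually holds


def challenge(operator: Operator, data_id: str, slash_percent: int = 50) -> bool:
    """Ask the operator for `data_id`; slash its stake if it cannot serve it.

    Returns True when the data was available.
    """
    if data_id in operator.stored:
        return True
    penalty = operator.stake * slash_percent // 100
    operator.stake -= penalty
    return False


op = Operator(name="operator-a", stake=1000, stored={"blob-1"})

print(challenge(op, "blob-1"), op.stake)  # True 1000
print(challenge(op, "blob-2"), op.stake)  # False 500
```

In this model, an operator that keeps serving data retains its full stake, while each failed challenge costs it tokens.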
DA layers may also deploy the methods outlined above, using data availability sampling or a committee approach to allow further scaling of data capacity.
Examples of data availability protocols include NEAR, Celestia, and EigenDA.
Data availability essentials
- Data availability refers to the availability of historical blockchain data to the miners, validators, or node operators in the blockchain network.
- Data availability is critical to the security of a blockchain since historical data is necessary for verifying that transactions are legitimate.
- As the amount of ledger data increases, blockchains rely on solutions such as sampling, committees, or data availability protocols to overcome the challenges of maintaining availability.