Realizing an archiving solution with the help of a blockchain involves several considerations. First of all, a blockchain is not an efficient place to store large amounts of data. For this reason, we usually use a hybrid architecture: a centralized or decentralized storage layer holds the documents, and a blockchain platform stores the integrity data of the document versions.
This architecture allows many variants and combinations:
- Blockchain: can be public or consortium-based. It may use one of many consensus algorithms, which provide different kinds and strengths of cryptoeconomic guarantees.
- Storage: can be fully centralized, such as a file server or a cloud storage service, or decentralized, such as IPFS, Swarm or BitTorrent.
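To make the split between the two layers concrete, here is a minimal Python sketch. It assumes an in-memory dictionary as a stand-in for the storage layer (a file store, a cloud bucket, IPFS or Swarm would take its place) and a toy hash-linked list as a stand-in for the blockchain. `OffChainStorage`, `ToyLedger`, `archive_version` and `verify_version` are hypothetical names, not part of any real platform's API. Only the SHA-256 digest of each document version is recorded on the ledger.

```python
import hashlib
import json
import time


class OffChainStorage:
    """Hypothetical stand-in for file, cloud, IPFS or Swarm storage, addressed by SHA-256."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]


class ToyLedger:
    """Hypothetical append-only ledger; each record links to the hash of the previous one."""

    def __init__(self):
        self.records = []

    def append(self, payload: dict) -> dict:
        prev = self.records[-1]["record_hash"] if self.records else "0" * 64
        body = {"prev": prev, "payload": payload}
        # Hash a canonical JSON encoding of the record body.
        record_hash = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode("utf-8")
        ).hexdigest()
        record = {**body, "record_hash": record_hash}
        self.records.append(record)
        return record


def archive_version(storage: OffChainStorage, ledger: ToyLedger, doc_id: str, data: bytes) -> dict:
    # The document goes to storage; only its digest and a timestamp go on the ledger.
    digest = storage.put(data)
    return ledger.append({"doc_id": doc_id, "sha256": digest, "ts": int(time.time())})


def verify_version(ledger: ToyLedger, doc_id: str, data: bytes) -> bool:
    # A version is intact if its current digest matches one recorded on the ledger.
    digest = hashlib.sha256(data).hexdigest()
    return any(
        r["payload"]["doc_id"] == doc_id and r["payload"]["sha256"] == digest
        for r in ledger.records
    )
```

Because each ledger record includes the hash of the previous record, rewriting an older record would change every later `record_hash`; a real blockchain adds consensus on top of this so that no single party can rewrite the chain.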
The integrity of a document can be ensured by hashing the document data together with a timestamp and some metadata and writing the resulting hash into the blockchain. This stores the integrity information on the chain and provides evidence that the document existed no later than the time the transaction was recorded. Real implementations need further consideration, since a plain hash value might be vulnerable to a dictionary or rainbow table attack. For this reason, the hash input can be extended with a random salt, or the document can be encrypted first so that only the hash of the encrypted version is written into the chain.
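The salted variant can be sketched as follows. This is an illustrative assumption, not a prescribed format: `make_commitment` and `check_commitment` are hypothetical helpers, and the field layout (32-byte salt, SHA-256 of the document, 8-byte big-endian timestamp, canonical JSON metadata) is just one reasonable choice. The digest is what would be written on-chain; the salt stays off-chain with the document, so someone who only sees the chain cannot confirm a guessed document by hashing candidates.

```python
import hashlib
import hmac
import json
import os
from typing import Optional, Tuple


def make_commitment(
    document: bytes, timestamp: int, metadata: dict, salt: Optional[bytes] = None
) -> Tuple[bytes, str]:
    """Return (salt, hex digest). Hypothetical layout; only the digest would go on-chain."""
    if salt is None:
        salt = os.urandom(32)  # random 32-byte salt, kept off-chain
    h = hashlib.sha256()
    h.update(salt)
    h.update(hashlib.sha256(document).digest())
    h.update(timestamp.to_bytes(8, "big"))
    h.update(json.dumps(metadata, sort_keys=True).encode("utf-8"))
    return salt, h.hexdigest()


def check_commitment(
    document: bytes, timestamp: int, metadata: dict, salt: bytes, onchain_digest: str
) -> bool:
    # Recompute with the stored salt and compare in constant time.
    _, digest = make_commitment(document, timestamp, metadata, salt)
    return hmac.compare_digest(digest, onchain_digest)
```

Every field except the metadata has a fixed length and the metadata comes last, so the concatenated input cannot be split in two different ways. The encryption-based alternative is not shown here.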
A further architectural option applies if we do not want to store even the hash value on the chain. In this scenario the blockchain is used only to track a set of trusted validators, and a document is regarded as valid if a majority of the tracked validators have signed the document together with some metadata. In this architecture the chain holds no information about the existence of the document, but if the document exists, we can prove whether it is valid.
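The validation rule of this architecture can be sketched in Python as below. Python's standard library has no public-key signature scheme, so this sketch uses HMAC-SHA256 tags purely as a stand-in for signatures; a real deployment would use public-key signatures (for example Ed25519), with the registry of validator public keys tracked on the chain, so that verification needs no secrets. The helper names and the strict-majority rule (more than half of the registered validators) are assumptions for illustration.

```python
import hashlib
import hmac
import json


def message_for(document: bytes, metadata: dict) -> bytes:
    # What validators sign: the document hash plus canonical JSON metadata.
    return hashlib.sha256(document).digest() + json.dumps(metadata, sort_keys=True).encode("utf-8")


def sign(key: bytes, message: bytes) -> bytes:
    # STAND-IN ONLY: HMAC-SHA256 instead of a real public-key signature.
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    return hmac.compare_digest(sign(key, message), tag)


def is_valid(document: bytes, metadata: dict, signatures: dict, registry: dict) -> bool:
    """registry: validator_id -> key (the set tracked on-chain); signatures: validator_id -> tag."""
    message = message_for(document, metadata)
    approvals = sum(
        1
        for vid, tag in signatures.items()
        if vid in registry and verify(registry[vid], message, tag)
    )
    # Strict majority of the registered validators.
    return approvals > len(registry) // 2
```

Signatures from validators that are not in the registry are ignored, and because `signatures` is keyed by validator ID, each validator is counted at most once.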
Last but not least, we should consider how the archiving logic itself works. The archiving logic might be more complex, for example involving several different archiving rules. In such a scenario we might also evaluate whether the logic itself should run centrally or in a decentralized way, for example on top of a Byzantine fault tolerant system.