The Software Economist Blog: do it yourself


Saturday, September 1, 2018

How to implement a Blockchain from scratch - block and blockchain objects

When building a blockchain architecture, two of the most important components are certainly the blocks and the blockchain itself. The implementation has to take at least three important scenarios into account:

- Mining: the miner creates a brand new block, adds transactions from the transaction pool to it and solves a cryptographic puzzle. Once the block has been created correctly, it is added to the local blockchain and announced to the network with the help of a gossip protocol.

- Synchronization: the blocks are queried from the network one by one, validated and added to the chain. Synchronization does not necessarily need a forking strategy: the blocks can be queried by block id, and the ids can be provided as one consistent set describing the chain. Similarly, the other side of the synchronization has to provide a set of transaction ids which is regarded as the longest consistent blockchain.

- Gossiping blocks: before the node handles blocks received from the network, it should already be synchronized. A newly received block must be validated, and a fork resolution algorithm has to be executed as well. If the block extends the longest chain, it should be added to the longest chain. If it extends one of the alternative chains, it can be added to that chain, and it must be decided whether that chain has become the new longest chain. It might be the case that the block cannot be added to any chain at all; if so, it can be put into a pool of orphaned blocks. Last but not least, the block might not be attachable on its own, but only together with another block that is already in the pool of orphaned blocks.
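The handling of gossiped blocks can be sketched in a few lines of Python. This is only a minimal illustration under simplifying assumptions: the names `Block`, `Chain` and `receive` are hypothetical, validation of the block content is omitted, and "longest chain" simply means the greatest height.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    block_id: str
    prev_id: str


class Chain:
    """Keeps all attached blocks, the tip of the longest chain and an orphan pool."""

    def __init__(self, genesis):
        self.blocks = {genesis.block_id: genesis}
        self.height = {genesis.block_id: 0}
        self.best_tip = genesis.block_id
        self.orphans = {}  # missing parent id -> {block id: block}

    def _attach(self, block):
        self.blocks[block.block_id] = block
        self.height[block.block_id] = self.height[block.prev_id] + 1
        if self.height[block.block_id] > self.height[self.best_tip]:
            self.best_tip = block.block_id  # fork resolution: highest block wins
            return "new_tip"
        return "side_chain"

    def receive(self, block):
        """Returns 'known', 'orphan', 'new_tip' or 'side_chain'."""
        if block.block_id in self.blocks:
            return "known"
        if block.prev_id not in self.blocks:
            self.orphans.setdefault(block.prev_id, {})[block.block_id] = block
            return "orphan"
        result = self._attach(block)
        # The new block may be the missing parent of blocks in the orphan pool.
        pending = [block.block_id]
        while pending:
            parent_id = pending.pop()
            for child in self.orphans.pop(parent_id, {}).values():
                if self._attach(child) == "new_tip":
                    result = "new_tip"
                pending.append(child.block_id)
        return result
```

A block whose parent is unknown waits in `orphans`; as soon as the parent arrives, both are attached and the tip is re-evaluated.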

Tuesday, August 28, 2018

How to implement a Blockchain from scratch - syncing accounts between state and wallet

In an account/balance based blockchain system, there are accounts both in the blockchain state and in the wallet. It is important to understand the life cycle of these elements and the synchronization between them:

- The accounts in the wallet should represent only a copy of the accounts of the state.

- Extended information can be stored in the accounts of the wallet, for example the private keys, to make signing simpler.

- The accounts of the state should contain only public keys or addresses derived from public keys; no private key should be stored in an account on the chain.

- After every new block, the wallet has to be synchronized. It is an open question how the synchronization should interact with the fork resolution strategy. There might be different strategies, like always showing the values of the top block of the current state, or waiting for a certain number of confirmations to avoid forks.

- If a new transaction is initiated, it might refer to accounts that are not yet in the state: only the public/private key pair or the address has been generated, and they are stored only in the wallet.

- In a currency transfer transaction, the from account has to be in the state with sufficient funds and with a consistent nonce.

- In a currency transfer transaction, the to account does not necessarily have to be in the state. It can be added at mining with the amount of money that is transferred to it. It is important that the to account is compatible with the from account if we consider a multi-asset scenario.

- There must be a couple of genesis accounts and/or coinbase transactions for each cryptoasset, for the initial distribution of the monetary supply. The exact implementation depends on the issuance of the cryptoasset. For creating a genesis or coinbase of a new cryptoasset, a new validation rule, perhaps a brand new transaction type, has to be introduced.

- In a data setting transaction, the initial account does not necessarily have to exist; it can be added anytime if there is a valid signature related to the address of the account.
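One of the synchronization strategies mentioned above, waiting for a number of confirmations, could look roughly like the following Python sketch. The representation is an assumption made for illustration: `chain_states` is a list with one state dictionary per block of the longest chain, and wallet entries are plain dictionaries that keep the private key locally.

```python
def confirmed_state(chain_states, confirmations):
    """Return the state that has at least `confirmations` blocks on top of it."""
    if not chain_states:
        return {}
    index = max(0, len(chain_states) - 1 - confirmations)
    return chain_states[index]


def sync_wallet(wallet, chain_states, confirmations=0):
    """Copy balance and nonce from the confirmed state into the wallet accounts."""
    state = confirmed_state(chain_states, confirmations)
    for address, entry in wallet.items():
        on_chain = state.get(address)
        if on_chain is None:
            # Generated locally, but not (yet) part of the confirmed state.
            entry["synced"] = False
            continue
        entry["balance"] = on_chain["balance"]
        entry["nonce"] = on_chain["nonce"]
        entry["synced"] = True
```

With `confirmations=0` the wallet always shows the top block; a larger value trades freshness for protection against short forks.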

Tuesday, August 21, 2018

How to implement a Blockchain from scratch - gossip protocol

Blockchain protocols communicate in several different ways, some gossip based and some not. The beginning of the network communication is usually not gossip based: a peer connects to several neighboring peers, checks the versions of the peers and queries further peer information if required. Similarly, synchronizing the blockchain is not a gossip protocol either: the peer queries its neighbors for the latest block number and, based on an inventory query, synchronizes the whole blockchain. Blocks and transactions, however, are propagated with the help of a gossip protocol. The logic is roughly the following:

- If the node initiates a new valid transaction, the transaction is added to the transaction pool and propagated to all neighbouring peers.

- If a node receives a transaction, first the transaction has to be validated. If it is valid, it has to be checked whether the transaction has already been mined somewhere in the blockchain or is already in the transaction pool. If so, nothing has to be done. If not, the transaction has to be added to the transaction pool and propagated to the connected peers, except for the one from which we got it.

- If a miner has mined a new block, the block has to be propagated to the network, and the local wallet has to be updated based on the new block information.

- If a node gets a block from the network, first the validity of the block has to be checked. This might be a little bit difficult, because its parent might not yet be in the blockchain. Therefore there should be an explicit set containing stale blocks that cannot yet be added to the blockchain. A new block is valid if it can be added directly to the blockchain, or if there is already a stale block in the pool and the two blocks together can be added to the blockchain. If it cannot be added to the chain, it should be saved in the stale blocks pool. If the block is already in the blockchain or in the stale blocks pool, it should not be propagated further. Otherwise the block must be propagated to the neighboring peers.

To avoid network overload, it is possible to use only the block and transaction ids in the gossip (flooding) process and to fetch the content of the data only if it is necessary.
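The transaction part of the gossip logic can be illustrated with a small in-process simulation in Python. The class `Node` and its methods are hypothetical names, peers are called directly instead of over a network, and `is_valid` is only a placeholder for real signature and state checks.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.peers = []
        self.tx_pool = {}   # transaction id -> payload
        self.mined = set()  # ids of transactions already in the blockchain

    def connect(self, other):
        self.peers.append(other)
        other.peers.append(self)

    def is_valid(self, tx_id, payload):
        return bool(tx_id) and bool(payload)  # placeholder check only

    def receive_transaction(self, tx_id, payload, sender=None):
        """Returns True if the transaction was new and has been forwarded."""
        if not self.is_valid(tx_id, payload):
            return False
        if tx_id in self.tx_pool or tx_id in self.mined:
            return False  # already known: stop the flooding here
        self.tx_pool[tx_id] = payload
        for peer in self.peers:
            if peer is not sender:  # do not send it back to where it came from
                peer.receive_transaction(tx_id, payload, sender=self)
        return True
```

The flooding terminates because every node forwards a given transaction at most once.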

Saturday, August 18, 2018

How to implement a Blockchain from scratch - event bus

A key component of every blockchain architecture must be a reliable event bus. There are many parallel actors working with the data of a node, like

- peers gossiping and requesting information, like new transactions, or new blocks

- wallets initiating transactions

- miners or validators working directly or indirectly with a node

- blockchain explorers requesting important data regularly

- and of course an advanced logging system that writes everything to a local log and supports both standard and debug mode.

For this reason, it is practical that every node implements an event bus with the following functionalities:

- different actors can push different pieces of information on the bus, with the type of information and the severity of the information or error.

- different actors can subscribe to different pieces of information. As an example, a logger would write everything into a file, while a wallet would be interested in events such as the blockchain getting synchronized, an initiated transaction getting mined or validated, or the balance of a supervised account changing. Similarly, a blockchain explorer is interested in whether there is a new transaction being gossiped into the system, a new but not yet validated block, a new validated block and so on.

Even some parts of the standard protocol might work totally asynchronously from each other, realizing the central communication protocol via an internal event bus of the node.
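A minimal, synchronous version of such an event bus might look like this in Python. The class name, the topic strings and the `"*"` wildcard are assumptions made for the sketch; a real node would probably dispatch events asynchronously.

```python
class EventBus:
    def __init__(self):
        self._subscribers = {}  # topic -> list of handler functions

    def subscribe(self, topic, handler):
        """Register a handler for one topic, or for every topic with "*"."""
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic, payload, severity="info"):
        """Deliver the event to the topic subscribers and to the "*" subscribers."""
        handlers = self._subscribers.get(topic, []) + self._subscribers.get("*", [])
        for handler in handlers:
            handler(topic, payload, severity)


# Usage: a logger that sees everything and a wallet that watches validated blocks.
bus = EventBus()
log_lines = []
wallet_events = []
bus.subscribe("*", lambda topic, payload, severity: log_lines.append((topic, severity)))
bus.subscribe("block.validated", lambda topic, payload, severity: wallet_events.append(payload))
bus.publish("transaction.received", {"id": "t1"})
bus.publish("block.validated", {"height": 5}, severity="debug")
```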

Sunday, August 12, 2018

How to implement a Blockchain from scratch - smart contract simplified

In a simple account/balance/state based blockchain system, implementing smart contracts is pretty straightforward. As a first step, accounts represent not necessarily just a balance but also a kind of general data that can be modified by smart contracts. In order to create smart contracts, you should define the language or smart contract programming environment itself and the effect that a smart contract can have on the state. Certainly one way of doing it is to define a virtual machine which guarantees that the smart contract is executed exactly the same way on every peer. However, we might as well consider an existing virtual machine, like the Java virtual machine, and somehow limit the effects of the program. As an example, working with a simple smart contract could consist of:

- reading some of the state information of the blockchain, manifested by account data and balances. This state information belongs to the previous block, on top of which we want to mine our contract.

- doing some computation on top of it.

- changing the data value of a certain account.

- storing the smart contract code somehow as data or string, like with the help of serialization.

- creating a special transaction containing the smart contract as data, signed by the account that you want to modify, indirectly indicating that the owner of the account allows the data to be changed.

In the mining process:

- The signature of the smart contract transaction has to be checked.

- It has to be made sure that only the affected accounts are modified.

- The code has to be run and the new data value must be calculated.

- It has to be made sure that the smart contract does not cause an infinite loop. One way of doing it is to avoid general loops, or to terminate the contract after a certain number of iterations, marking the transaction as invalid. Certainly other ways can be built in as well: for instance, longer smart contract runtime can be disincentivized with the help of a cryptoeconomic mechanism, just like in Ethereum.

- The new value or values have to be applied.

- The block must contain the valid transaction and the new valid state, which is the new value of the computed accounts.

In the validation process, practically the same steps have to be repeated, without the last one, which is putting the transaction and state into a new block and doing the proof of work calculation or the voting of a Byzantine consensus mechanism:

- The signature of the smart contract transaction has to be checked.

- It has to be made sure that only the affected accounts are modified.

- The code has to be run and the new data value must be calculated.

- The calculation must have finite time.

- It has to be checked whether the new state values in the given block are the values calculated from the values of the previous block.
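To make the "run the code, but in finite time" step concrete, here is a toy interpreter in Python. It is purely illustrative: the instruction set (`push`, `load`, `add`, `store`, `jump`), the JSON serialization and the step limit are assumptions of this sketch, not the design of any existing virtual machine.

```python
import json


class OutOfSteps(Exception):
    """Raised when a contract exceeds its step budget."""


def run_contract(code_json, account_data, max_steps=1000):
    """Run a serialized contract against one account value and return the new value."""
    program = json.loads(code_json)
    stack, pc, steps = [], 0, 0
    data = account_data
    while pc < len(program):
        steps += 1
        if steps > max_steps:
            raise OutOfSteps("step limit exceeded, transaction is invalid")
        op, *args = program[pc]
        pc += 1
        if op == "push":
            stack.append(args[0])
        elif op == "load":     # read the current data value of the account
            stack.append(data)
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "store":    # write the new data value of the account
            data = stack.pop()
        elif op == "jump":
            pc = args[0]
        else:
            raise ValueError("unknown instruction: " + str(op))
    return data


def validate_result(code_json, old_data, claimed_data, max_steps=1000):
    """Validator side: re-run the contract and compare with the value in the block."""
    try:
        return run_contract(code_json, old_data, max_steps) == claimed_data
    except (OutOfSteps, ValueError, IndexError):
        return False
```

The miner would call `run_contract` to compute the new value; every validator repeats the computation with `validate_result` and rejects the block if the values differ or the contract does not terminate within the limit.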

The wallet functionality has to be extended:
- to have the possibility to write or integrate smart contracts.
- to transform the programs into data, like with the help of serialization.
- to create transactions based on the smart contract.
- to sign them.
- to broadcast them on the network.

How to implement a Blockchain from scratch - extended wrapper classes

When designing and implementing a blockchain system from scratch, there might be some contradictory design perspectives: on the one hand, elements of the blockchain that are stored or transported on the network should require as little storage as possible, resulting in lower bandwidth and data storage requirements. On the other hand, for efficient processing, some further information is usually required. Examples are:

- Blocks in a basic scenario should store the difficulty, the nonce and a hash value of these values together with the Merkle roots of the transactions and state and the previous block hash.

- Blocks in an extended scenario might explicitly contain a link to the previous block and to the following one, and some information on whether the block is an orphan block or on the block height.

- Accounts in a basic scenario should contain the address of the account, which is usually a public key, and some kind of a change request, like transferring money or changing a value. On top of that, certainly a nonce value.

- Accounts in an advanced scenario are related rather to the accounts of a certain wallet, so they might explicitly contain the private key and meta-information on whether the account is synchronized with the blockchain or not yet available in the blockchain.

There might be a similar consideration for other elements of the blockchain as well, like Block Headers, Transactions or Peers. Practically every object that is moved on the network can be implemented in a basic version containing only the relevant data, and in an extended version containing all the computationally relevant data.
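As an illustration of the basic versus extended split, the following Python sketch wraps a compact block in an object that carries the extra bookkeeping. The class and field names are hypothetical, and the way the hash is built from a formatted string is a simplification.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BasicBlock:
    """Only what has to be stored and sent over the network."""
    prev_hash: str
    tx_root: str
    state_root: str
    difficulty: int
    nonce: int

    def block_hash(self):
        payload = f"{self.prev_hash}|{self.tx_root}|{self.state_root}|{self.difficulty}|{self.nonce}"
        return hashlib.sha256(payload.encode()).hexdigest()


@dataclass
class ExtendedBlock:
    """Local wrapper with information that is convenient for processing."""
    basic: BasicBlock
    previous: Optional["ExtendedBlock"] = None
    height: int = 0
    is_orphan: bool = False


def wrap(basic, known):
    """Wrap a received block; `known` maps block hashes to ExtendedBlock objects."""
    parent = known.get(basic.prev_hash)
    if parent is None:
        return ExtendedBlock(basic, is_orphan=True)
    return ExtendedBlock(basic, previous=parent, height=parent.height + 1)
```

Only `BasicBlock` would be serialized; `ExtendedBlock` is rebuilt locally by each node.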

Saturday, August 11, 2018

How to implement a Blockchain from scratch - network protocols

When creating a blockchain solution from scratch, one of the most important parts is to design a set of network protocols. It is important to note that these protocols are on the application level; under the hood they might be realized by plain TCP sockets, by something more abstract like RPC or JSON-RPC, or even with onion routing. In any case, the following protocols should be considered:

0. Get client version: connecting to a known peer and getting the version of that peer. It prevents the usage of incompatible peers, which might be highly important, especially if the code is updated regularly.

1. Update peer information: a brand new peer is usually started with a couple of preconfigured nodes as initial peer information. These peers are queried for further known peers until the number of known peers reaches a certain limit (like 15 in Bitcoin by default). Peers may go offline and come back online again, therefore the active peers have to be checked regularly to see whether they are still alive. If not, different strategies can be implemented, like:
- deleting the inactive peer from the cached peer list.
- waiting for a certain time if the peer will be online again.
- querying all the still active nodes to get more peers that are hopefully active.
- or a combination of the previous strategies.

2. Synchronizing the blocks and states: before using the blockchain, the local blockchain must be more or less up to date with the rest of the world. As a first step, the node can connect with all of the peers and ask the size of the blockchain. Based on that information, it knows exactly if the blockchain needs to be synchronized or not. As a second step a

3. Propagating transactions and propagating blocks: if new transactions or blocks arrive, they must be registered locally and communicated further as fast as possible to all of the connected peers in the form of a gossip protocol. A transaction must be added to the local transaction pool; a block must be added to the possible top headers of the blockchain. Speed is especially critical for new blocks, as the winner of the mining competition depends on the speed of the propagation. This phase should be available only after the blockchain has reached at least a quasi-synchronized status. It is however an interesting general question how the propagation mechanism can stop without effectively flooding the network. One way might be to implement something like a finite hop system, in which a given piece of information is propagated only a given number of times. Another idea might be to implement regular handshakes where peers exchange information on relevant transactions and blocks first, before transporting them.

In all of the communication protocols, it is always a question whether to use a push or a pull protocol. In both cases, the design direction should be to transfer as little data as possible, and only if it is necessary. As a consequence, most protocols should transfer the inventory first, meaning the hash values of blocks and transactions. After that, the nodes should be able to download the content behind the hash values on demand.
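The inventory-first idea can be sketched as follows in Python. The class `InventoryPeer` and its methods are hypothetical, and the two peers talk to each other through direct method calls rather than over a real network.

```python
import hashlib


class InventoryPeer:
    def __init__(self):
        self.store = {}  # content hash -> content bytes

    def add(self, content):
        digest = hashlib.sha256(content).hexdigest()
        self.store[digest] = content
        return digest

    def inventory(self):
        """Step 1: announce only the hashes, not the content."""
        return set(self.store)

    def get_data(self, hashes):
        """Step 2: hand out the content for the requested hashes."""
        return {h: self.store[h] for h in hashes if h in self.store}

    def sync_from(self, other):
        """Ask for the inventory, then download only what is missing locally."""
        missing = other.inventory() - self.inventory()
        for digest, content in other.get_data(missing).items():
            # Accept the content only if it really matches the announced hash.
            if hashlib.sha256(content).hexdigest() == digest:
                self.store[digest] = content
        return missing
```

Only the hashes cross the wire in the first step, so data that both sides already hold is never transferred twice.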

Thursday, August 9, 2018

How to implement a Blockchain from scratch - tasks of a miner

Contrary to the common idea, the task of the miner is not only to calculate hashes. Its task in a blockchain system is to create a new block with a set of valid transactions and state information. In an account/balance based blockchain system, the most important tasks are the following:

- Choose one of the top blocks of the different, partly competing blockchains: as the top of the blockchain always forks, one task of the miner or validator is to pick one of the possible top blocks and start building on top of it. As these top blocks are competing with each other, a heuristic has to be used to choose the chain that will most likely be the longest one. Miners or validators are cryptoeconomically incentivized to pick the block that most probably belongs to the longest chain.

- Pick a valid set of transactions from the transaction pool. Whether a transaction is valid or not depends highly on the transaction itself. The transaction has to contain a valid signature, and its from and to addresses should either refer to valid accounts or the transaction should be able to add the related account. Transactions amounting to double spending or replay attacks must be avoided. There might be further considerations for picking the transactions: there might be a transaction fee included that has to be maximized, and there can be a limit on the number of transactions as well.

- Apply transactions to the state. How exactly this is carried out is again highly dependent on the transactions. If it is a transfer transaction, then the balances of the related accounts have to be modified. If it is a state modifying transaction, the state of the corresponding account has to be modified.

- As soon as the set of transactions is chosen and the state is calculated, the Merkle or Patricia roots of both the transactions and the state have to be calculated.

- As the last but one step, proof of work has to be done, meaning that nonce values have to be tried until the hash value is less than the target given by the difficulty.

- As a last step, the new block has to be broadcast to the system.
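The last computational steps, building a Merkle root and searching for a nonce, can be sketched with Python's standard `hashlib`. The header layout, the function names and the pairing rule for an odd number of leaves (duplicating the last hash, as Bitcoin does) are assumptions made for this illustration.

```python
import hashlib


def sha256_hex(text):
    return hashlib.sha256(text.encode()).hexdigest()


def merkle_root(leaves):
    """Hash the leaves pairwise until a single root hash remains."""
    if not leaves:
        return sha256_hex("")
    level = [sha256_hex(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [sha256_hex(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def mine(prev_hash, tx_root, state_root, target, max_nonce=2**32):
    """Try nonces until the header hash, read as an integer, is below the target."""
    for nonce in range(max_nonce):
        digest = sha256_hex(f"{prev_hash}|{tx_root}|{state_root}|{nonce}")
        if int(digest, 16) < target:
            return nonce, digest
    return None  # no solution in the searched nonce range


# Usage with a very easy target: roughly one in 256 hashes qualifies.
tx_root = merkle_root(["tx1", "tx2", "tx3"])
found = mine("00" * 32, tx_root, merkle_root(["A:70", "B:30"]), target=2**248)
```

A lower target means more attempts on average; real systems adjust it so that blocks arrive at a roughly constant rate.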

Wednesday, August 8, 2018

How to implement a Blockchain from scratch - adding system transactions

Adding a system transaction to a blockchain system is actually quite easy. Let us imagine that we have a number of privileged accounts (they might be related to nodes as well) and we want to initiate a transaction that adds a new one to these privileged accounts:

- There must be a transaction that contains as its address one of the addresses of the privileged accounts.

- There must be state information in the blockchain containing the available set of system accounts.

- There must be a validation rule that not only validates the signature but also checks whether the related account is among the possible addresses in the system state.

- There must be a state transforming rule, which adds the new account to the state if the transaction is valid.

The system can be started by giving the first node a system right; the first node can then attach further nodes to the system account state.

Optionally, there can be other special accounts or activities in the state that can be carried out only by system transactions.

Certainly it is an open question whether such a system is more vulnerable to forking attacks.

Such a system could provide the basis for realizing on-chain governance. Let us imagine that the difficulty is a variable in the system state that can be modified only by system transactions. In one model, every authorized account could modify the variable directly; here, however, we imagine that each transaction just casts a kind of vote for the new value. If the number of positive votes reaches a certain value, the new difficulty will be used in the future.
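A possible shape of the system state and its two rules (adding a privileged account, voting on the difficulty) is sketched below in Python. Everything here is an assumption for illustration: the class name `SystemState`, the vote threshold, and the fact that signature checking is left out.

```python
class SystemState:
    def __init__(self, privileged, difficulty, threshold):
        self.privileged = set(privileged)  # addresses allowed to send system transactions
        self.difficulty = difficulty
        self.threshold = threshold         # votes needed to change the difficulty
        self.votes = {}                    # proposed value -> set of voter addresses

    def add_privileged(self, sender, new_account):
        """System transaction: a privileged account admits a new one."""
        if sender not in self.privileged:
            return False                   # validation rule: sender must be privileged
        self.privileged.add(new_account)   # state transforming rule
        return True

    def vote_difficulty(self, sender, proposed):
        """System transaction: one vote for a new difficulty value."""
        if sender not in self.privileged:
            return False
        self.votes.setdefault(proposed, set()).add(sender)  # one vote per account
        if len(self.votes[proposed]) >= self.threshold:
            self.difficulty = proposed
            self.votes.clear()
        return True
```

Storing the voters in a set means that voting twice for the same value does not count twice.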

How to implement a Blockchain from scratch - apply transaction to the state

Mining in an account/balance blockchain system practically means finding a set of consistent valid transactions and applying them to the system state. For this activity we need two sub-functions: on the one hand, the transaction has to be validated against the state; on the other hand, the transaction has to be applied to the state. For the sake of simplicity we assume that each block stores the whole state, which is copied one to one when the block is initialized.

For a simple transaction only setting data, the pseudo code is the following:

ValidateTransaction (Transaction t)
    if t.signature is not valid
        return error signature not valid
    foreach account a in the state
        if a.address == t.fromAddress
            if t.nonce != a.nonce + 1
                return error replay attack
            else
                return transaction is valid
    // no account with fromAddress exists yet
    add new account to accounts with fromAddress
    return transaction is valid

For a transaction transferring money:

ValidateTransaction (Transaction t)
    if t.signature is not valid
        return error signature not valid
    foreach account a in the state
        if a.address == t.fromAddress
            if t.nonce != a.nonce + 1
                return error replay attack
            if a.balance < t.amount
                return error balance is not enough
            foreach account b in the state
                if b.address == t.toAddress
                    return transaction is valid
            // the to account does not exist yet
            add new account to accounts with toAddress
            return transaction is valid
    return error from account must exist at transferring money

For applying the state, we have to iterate over all of the accounts, check that they are valid and after that apply the state change. It might vary again based on the exact type of the transaction. If it is a simple data setting transaction:

ApplyState (Transaction t)
    foreach account a in the state
        if a.address == t.fromAddress
            a.data = t.data
            a.nonce = t.nonce

For the money transferring transaction it is a little bit more complicated:

ApplyState (Transaction t)
    foreach account a in the state
        if a.address == t.fromAddress
            a.balance -= t.amount
            a.nonce = t.nonce
    foreach account a in the state
        if a.address == t.toAddress
            a.balance += t.amount
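For comparison, the money transfer case could be written in Python roughly as follows. This is a sketch, not a reference implementation: the state is assumed to be a dictionary keyed by address instead of a list that is iterated, the signature check is replaced by a boolean stand-in, the transaction nonce is assumed to be exactly one higher than the account nonce, and the check for a positive amount is an extra safeguard that the pseudocode does not contain.

```python
from dataclasses import dataclass


@dataclass
class Account:
    address: str
    nonce: int = 0
    balance: int = 0
    data: str = ""


@dataclass(frozen=True)
class Transfer:
    from_address: str
    to_address: str
    amount: int
    nonce: int
    signature_ok: bool = True  # stand-in for a real signature verification


def validate_transfer(state, t):
    """Return "ok" or a short error text."""
    if not t.signature_ok:
        return "signature not valid"
    sender = state.get(t.from_address)
    if sender is None:
        return "from account must exist at transferring money"
    if t.nonce != sender.nonce + 1:
        return "replay attack"
    if t.amount <= 0:
        return "amount must be positive"  # extra safeguard, see lead-in
    if sender.balance < t.amount:
        return "balance is not enough"
    return "ok"


def apply_transfer(state, t):
    """Validate the transfer and apply it to the state in place."""
    error = validate_transfer(state, t)
    if error != "ok":
        raise ValueError(error)
    sender = state[t.from_address]
    # The to account is created on the fly if it is not in the state yet.
    receiver = state.setdefault(t.to_address, Account(t.to_address))
    sender.balance -= t.amount
    sender.nonce = t.nonce
    receiver.balance += t.amount
```

Because the nonce of the sender is updated when the transfer is applied, sending the same transaction again fails with the replay error.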

How to implement a Blockchain from scratch - minimum wallet

A wallet is practically a set of keys that is stored directly or indirectly in the local node. It is important to note that this is the wallet functionality of a blockchain node, which might have similar functionalities to a mobile or web wallet; however, the implementation might be a little bit different.

In a UTXO based system, a wallet stores the keys and a set of unspent transaction outputs that make it possible to read out the balance without initiating a full blockchain search. In an account/balance based system, it is practical to store a copy of the locally managed accounts, with the corresponding private keys. This makes it possible to read out balances or data in an efficient way. On the other hand, accounts that were created locally but are not yet synchronized with the blockchain have to be managed as well.

A wallet has the following data:

- Accounts: a list of extended account information is practical to be stored, meaning the basic account information with the private keys and some flag if they are fully synchronized with the blockchain. If the private keys are encrypted, further information might be required as well.

A minimum wallet should implement the following functionalities:

- CreateNewAccount: creating a new account locally, generating a new public/private key pair and calculating the necessary further information. The functionality might be implemented differently depending on whether it is a random wallet, a hierarchical wallet, or a hierarchical deterministic wallet.

- ImportAccount: importing an account based on the private key.

- CreateAndSignTransaction: creating a new transaction, which can be any type of transaction, like a value transferring or data setting transaction. Which private key signs the transaction might vary based on the type of the transaction. A value transferring transaction is signed by the private key of the sender address. A value setting transaction is signed by the private key of the account whose value has to be set. After a successful signature, the transaction is broadcast into the network. The functionality might be separated into several sub-functionalities, like creating a transaction, signing an existing transaction, and broadcasting it into the network, as three separate tasks.

- GetBalance: getting the balance of an account or of the whole wallet.

- BackupWallet: backing up the locally stored account information, especially the private keys.

- RestoreWallet: restoring the account based on the backed up information, especially on the private keys.

How to implement a Blockchain from scratch - transactions

Transactions are responsible in every blockchain for creating changes in the system. In an account/balance based system, they are actually simple statements saying "transfer money from account A to account B" or "change the state of account A to a new value". It is important that the properties of a transaction can be set only once, practically at the beginning, to avoid possible hacking attempts. A transaction should contain at least the following elements:

- TransactionId: it is actually a hash of all the important values of a transaction, practically the hash of all of the other elements. On the one hand, TransactionId serves as a kind of primary key for the transaction itself: the transaction can be identified based on this id. On the other hand, it might provide a kind of hacking-resistant consistency check: the transaction is only valid if the TransactionId is consistent with the other values. It is certainly a question whether the TransactionId itself should be stored on the blockchain or whether it is enough to generate it. If we generate the value, we might miss one consistency guarantee; on the other hand, we might save storage space on the chain. In any case, TransactionId is practical if we want to refer to created but not yet signed transactions on the client side.

- Nonce: the value should prevent replay attacks. It should be set by the wallet software as an increment of the account nonce.

- Address: the address of the account that we want to modify, or from which we want to transfer cryptocurrency. If the address is the public key, this field should contain the public key; if it is a value calculated from the public key, like with hashing or double hashing, then this calculated value should be here.

- Signature: valid transactions must have a signature, which is created by signing all relevant information in the transaction with the private key. The signature is generated by the cryptographic algorithm in use, like Elliptic Curve Cryptography. In case we have a TransactionId as well, this id does not necessarily have to be covered by the signature. The reason is that we might want to administer valid but not yet signed transactions on the wallet side.

Depending on the exact transaction type, we can have further properties as well. It is important to note that in a given system, we might have several different kinds of transactions, like one for transferring money and further ones for setting data, as in the case of an identity management system.

- ToAddress: if our transaction is a value movement transaction, we will need the address where we want to move the money.

- Amount: in case of a value movement transaction, we will need the amount to transfer as well.

- Data: if our transaction is meant to register data in the blockchain, we will need the new data value as well.

Transactions need to have the following functionalities:

- CreateTransaction: in a way that all of the important properties can be set only once.

- SignTransaction: with a private key and a given cryptography, the signature of the transaction can be created. The signature should likewise be settable only once.

- VerifyTransaction: based on the signature, the existing data parameters and the public key, the signature can be verified. If the public key is directly the address, the verification is simple. If the address is derived with a hash function from the public key, the public key must also be given as an input.

Advanced scenarios might have further functionalities as well, like creating raw transactions or partially signing transactions.
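The "set only once" requirement and a generated TransactionId map naturally onto an immutable data class in Python. The sketch below is an assumption-laden illustration: the field names are hypothetical, the id is a SHA-256 hash over a JSON list of the fields, and the signature is treated as an opaque string because the Python standard library offers no elliptic curve signing.

```python
import hashlib
import json
from dataclasses import dataclass, replace


@dataclass(frozen=True)  # frozen: properties cannot be changed after creation
class Transaction:
    address: str
    nonce: int
    to_address: str = ""
    amount: int = 0
    data: str = ""
    signature: str = ""

    @property
    def transaction_id(self):
        """Hash of all values except the signature, so unsigned transactions have an id too."""
        fields = [self.address, self.nonce, self.to_address, self.amount, self.data]
        return hashlib.sha256(json.dumps(fields).encode()).hexdigest()

    def with_signature(self, signature):
        """Return a signed copy; the signature can be attached only once."""
        if self.signature:
            raise ValueError("transaction is already signed")
        return replace(self, signature=signature)


# Usage: the id stays the same before and after signing.
tx = Transaction(address="A", nonce=1, to_address="B", amount=5)
signed = tx.with_signature("signature-bytes-from-the-wallet")
```

Generating the id on demand, as done here, corresponds to the option of not storing the TransactionId on the chain.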

How to implement a Blockchain from scratch - accounts and balances

There are two kinds of blockchain:

1. UTXO based systems store only transactions with inputs and outputs, which practically represent coins. An output is spent if there is a transaction with an input that refers to that output of the other transaction.

2. In state based systems, there is an explicit representation of an account which contains a balance or other information as data.

An account has the following data structure:

- Address: it is a public key or the hash value of the public key.

- Sequence: it provides protection against replay attacks. In the simplest case it is an integer; in more complicated scenarios it is a kind of a ring of hashes.

- Balance: if the blockchain solution implements a cryptocurrency, either externally or as an internal cryptoeconomic incentive, there must be a numeric variable for it, such as a double or float (storing an integer number of the smallest unit avoids floating point rounding errors).

- Data: if the platform implements something other than a cryptocurrency, we can have some more data elements as well, like a string or strings for identity management, or an array of key-value pairs for a general smart contract system.

- AccountId: it is questionable whether an account id is required, as the account address should identify the account as a primary key. Such an id would be implemented as the hash of all or some of the data in the account. On the one hand, it provides some more consistency and hacking resistance; on the other hand, however, the value has to be recalculated at every new value assignment or balance change.

If the account is not stored in the blockchain but in the local wallet, then further data and functionalities should be taken into consideration:

- Private Key: as the wallet creates and signs transactions of the account, the private key of the account has to be stored as well, either in a plain or an encrypted version. It is important that private keys are stored, plain or encrypted, on the wallet side, but they must not be stored in the blockchain.

- Synchronized with the blockchain: if we create a brand new account with the help of the wallet, it might not yet be added to the blockchain. It can be added to the blockchain at the first transaction (like when changing the value of the account or when transferring money to that account), or it can be added with an explicit transaction. Independently of whether the account is added to the blockchain, at each round there might be a synchronizing step that copies the account values from the blockchain state to the local wallet account store.

As functionality, one should provide the following services on the wallet side related to the accounts:

- GenerateAccount: creating a brand new account means creating a new private and public key with a cryptographic key generation mechanism, like elliptic curve cryptography. The private key should be given back to the user and stored in the wallet account, possibly protected by a symmetric encryption mechanism. If the address of the account is not the public key directly, the address should be generated from the public key, for example with the help of different hash algorithms.

- ImportAccount: the account should be created with the help of an input private key. Public key and address should be generated based on the given cryptographical protocol. As this is not a brand new account, the data of the account must be synchronized from the blockchain.

- SyncAccount: the data, balance and sequence parameters of the wallet account should be updated based on the latest reliable values from the blockchain.

From a conceptual point of view, accounts that are stored by a wallet are the accounts that are administered by the wallet. In an account/balance based system, storing them is similar to storing the list of unspent transaction outputs in a UTXO based blockchain platform.

There might be more than one kind of account in the system, similar to externally owned and smart contract accounts in Ethereum. Typically one kind can be responsible only for storing the cryptocurrency balance, while others store data on the blockchain.


"On a long enough timeline we will all become Satoshi Nakamoto.."
Daniel Szego

	*Daniel Szego*
Having spent one and a half decades in software development, engineering, R&D, project management and leading software companies, and having two master's degrees, one in engineering and one in business administration, I thought I would summarize some of my theoretical and practical thoughts on the software industry and enterprise software.
