The Internet is plagued with definitions of blockchain technology that make no one any wiser. Blockchain technology is a distributed, immutable database for verifying, recording, and storing information, but what does any of that mean? Sure, you might not need a full understanding of blockchain technology in order to participate in the industry, but I believe that deeply understanding the fundamentals of blockchain technology allows you to cut through the hype and make better decisions.
That’s why I wrote this article. My intention is to explain, in an easy-to-understand way, how blockchain technology actually works. To do so, I lean on my own experience writing for blockchain magazines, but also on Satoshi Nakamoto’s original Bitcoin white paper, an excellent blog post about the Bitcoin protocol that Michael Nielsen wrote in 2013, and a 3Blue1Brown video on the mechanics of Bitcoin. At the end of my article, I hope you’ll come away with a much better understanding of blockchain technology.
Blockchain is 3 Ideas Combined
New ideas often spring into existence through the combination of disparate old ideas. In his 2008 white paper, Satoshi Nakamoto invented both blockchain technology and its first application: Bitcoin. In order to do so, Nakamoto combined three ideas: the Internet, public-key cryptography (the kind that relies on a private key and a matching public key), and a protocol for incentivization. Let’s go over each idea and explain how it is relevant for blockchain technology.
The Internet Makes a Blockchain Distributed
The Internet has two elements that make it important for blockchain technology: Firstly, it’s a fast and efficient way to send information back and forth. Secondly, billions of people have access to it.
Let’s now imagine a scenario where Alice and Bob are two friends who often go out together. They don’t want to worry about splitting the bill when they’re going out, so they have a Google Sheet where they record who paid for what. At the end of the month, they add up their bills to see who owes whom. If Alice paid $63 while Bob paid $37, the total is $100, so each of them should have paid $50 and Bob owes Alice $13 to even it out.
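The month-end arithmetic can be sketched in a few lines of Python. The function name settle_up and the dictionary layout are assumptions made for illustration, not part of any real tool.

```python
def settle_up(paid):
    """Return (debtor, creditor, amount) for a two-person shared tab.

    `paid` maps each person's name to the total they paid this month.
    """
    (a, paid_a), (b, paid_b) = paid.items()
    fair_share = (paid_a + paid_b) / 2  # what each person should have paid
    if paid_a >= paid_b:
        return b, a, paid_a - fair_share
    return a, b, paid_b - fair_share


debtor, creditor, amount = settle_up({"Alice": 63, "Bob": 37})
print(f"{debtor} owes {creditor} ${amount:.2f}")  # Bob owes Alice $13.00
```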
Then, disaster strikes. A major fault in Google’s codebase makes Google Sheets unavailable for weeks. Google publishes a statement saying that some data may have been lost forever. Alice and Bob cannot access their spreadsheet and no longer know who owes whom, or how much. They realize that there’s always a risk, no matter how unlikely, in trusting a third party with your data, whether that’s a bank, the government, or a big tech company like Google.
So Alice and Bob create a new spreadsheet. This time, however, they each keep the spreadsheet locally, as a file on their computer. Whenever Bob pays for something, he adds it to his copy and sends his updated copy of the spreadsheet, through a peer-to-peer network (P2P), directly to Alice’s computer, where her copy of the spreadsheet is updated in turn. This ensures that both Alice and Bob always have the latest, updated version of the spreadsheet on their computer.
This is how blockchain technology works too. A blockchain is a database that people can add information to. This database is not hosted in a single place. Instead, it is hosted on many different machines (computers, laptops, phones, …) that each store a copy of the entire database locally. These machines are called nodes, and they’re all directly connected with one another through a P2P network. Changes to the database are communicated to all other nodes so they will always end up with the same, updated version of the database.
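As a rough sketch of that idea, the toy Node class below keeps a full local copy of the ledger and passes every new entry on to its peers. The class, its method names, and the in-memory "network" are all assumptions made for illustration; a real P2P network also has to deal with latency, conflicting updates, and dishonest nodes, which is what the rest of this article is about.

```python
class Node:
    """A toy node that stores the whole ledger locally (hypothetical design)."""

    def __init__(self, name):
        self.name = name
        self.ledger = []  # this node's local copy of the database
        self.peers = []   # direct connections to other nodes

    def connect(self, other):
        # Connections work in both directions.
        if other not in self.peers:
            self.peers.append(other)
            other.peers.append(self)

    def submit(self, entry):
        """Record a new entry locally and pass it on to the network."""
        self.receive(entry)

    def receive(self, entry):
        if entry in self.ledger:
            return  # already seen: stop forwarding (entries are assumed unique)
        self.ledger.append(entry)
        for peer in self.peers:
            peer.receive(entry)


alice, bob, carol = Node("Alice"), Node("Bob"), Node("Carol")
alice.connect(bob)
bob.connect(carol)  # Carol only knows Bob, yet still receives Alice's entry
alice.submit("Bob paid $37 for dinner")
print(alice.ledger == bob.ledger == carol.ledger)  # True
```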
That is why blockchain technology is often called decentralized or distributed. In fact, another name for blockchain technology is distributed ledger technology (DLT). It’s distributed, because the database runs on many different nodes in the network. There’s no single entity that has absolute control over the database, so there’s no need to trust such an entity either. The Internet made that possible.
Of course, there are still many problems with the local spreadsheet of Alice and Bob. While they might have found a solution where they no longer need to trust Google (or any third party), how can they trust each other? What if Alice decided to add a few bills that she didn’t pay for? What if a hacker accesses Bob’s computer and adds a few transactions on the spreadsheet without Bob knowing? That’s where public-key cryptography, with its private and public keys, comes in.
Cryptography Makes a Blockchain Trustless
Realizing the flaws in their spreadsheet, Alice and Bob introduce a new element to it. They decide to add their digital signature to every transaction they add to the spreadsheet. A digital signature differs from a physical signature in that it’s much harder to copy or fake.
To create a digital signature, you need two digital keys: a public key and a private key. You create your digital signature with a function. You insert both the transaction you want to sign off on and your private key into the function and it will generate a seemingly random string of hexadecimal characters. This output is called a hash and it serves as your digital signature for that transaction. Because the transaction itself is one of the inputs, every transaction gets its own signature.
So Alice could sign off on the $27 bill she paid for by putting the $27 transaction and her private key into the function. This function, in turn, would create the hash that is her digital signature. Now here’s the trick: If Bob wanted to check whether Alice actually signed off on this transaction, he could use another function that takes the transaction, Alice’s digital signature, and her public key (which anyone can find) as inputs. This function will give a true or false output, which would tell Bob whether it was actually Alice who signed off on that particular transaction or not.
So, to summarize, there are two functions and here’s how they work:
- generateDigitalSignature(transaction, private key) = hash
- verifyDigitalSignature(transaction, digital signature, public key) = true or false
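To make the two functions concrete, here is a minimal sketch in Python. It swaps in textbook RSA with a deliberately tiny key, only because that fits in the standard library; real systems use vetted schemes such as ECDSA through an audited library, with far larger keys. The constants, the digest helper, and the snake_case function names are assumptions made for this illustration.

```python
import hashlib

# Toy RSA key pair built from two well-known Mersenne primes.
# Far too small to be secure; for illustration only (needs Python 3.8+).
P, Q = 2**61 - 1, 2**31 - 1
N = P * Q                                             # modulus shared by both keys
PUBLIC_KEY = 65537                                    # public exponent
PRIVATE_KEY = pow(PUBLIC_KEY, -1, (P - 1) * (Q - 1))  # secret exponent


def digest(transaction):
    """Hash the transaction and reduce it to a number the toy key can sign."""
    h = hashlib.sha256(transaction.encode()).digest()
    return int.from_bytes(h, "big") % N


def generate_digital_signature(transaction, private_key):
    return pow(digest(transaction), private_key, N)


def verify_digital_signature(transaction, signature, public_key):
    return pow(signature, public_key, N) == digest(transaction)


signature = generate_digital_signature("Alice paid $27", PRIVATE_KEY)
print(verify_digital_signature("Alice paid $27", signature, PUBLIC_KEY))  # True
print(verify_digital_signature("Alice paid $72", signature, PUBLIC_KEY))  # False
```

Note that Bob never needs the private key: verification uses only the transaction, the signature, and the public key.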
The function to generate the digital signature is special because it’s cryptographic. This means that it’s practically impossible to figure out what the private key is on the basis of the hash and the transaction. You cannot reverse engineer it. Bitcoin, for example, signs transactions with an elliptic-curve algorithm called ECDSA and uses the SHA-256 algorithm for hashing. SHA-256 generates a hash that is 256 bits long and that changes drastically whenever there’s even the slightest change in any of the inputs.
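You can watch a hash change drastically with Python’s standard hashlib module. In the sketch below, the two sample transactions differ by a single character; the helper name sha256_hex is just a convenience chosen for this example.

```python
import hashlib


def sha256_hex(text):
    """Return the SHA-256 digest of `text` as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode()).hexdigest()


a = sha256_hex("Alice paid $27")
b = sha256_hex("Alice paid $28")
print(a)
print(b)

# Count how many of the 64 hex characters differ between the two digests.
differing = sum(x != y for x, y in zip(a, b))
print(f"{differing} of {len(a)} hex characters differ")
```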
The only way to “hack” a cryptographic function is through trial and error. A hash generated through SHA-256 has 2^256 possible combinations of zeros and ones that you need to pick through. That is impossible work for even the fastest supercomputer (quantum computers may change that picture, but that’s a topic for a different blog post).
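A quick back-of-the-envelope calculation shows the scale. The guessing rate below is an assumption: a hypothetical machine that tests a billion billion candidates every second, which is generous for today’s hardware.

```python
SEARCH_SPACE = 2**256
GUESSES_PER_SECOND = 10**18            # assumed rate, purely hypothetical
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

years = SEARCH_SPACE // (GUESSES_PER_SECOND * SECONDS_PER_YEAR)
print(f"2^256 has {len(str(SEARCH_SPACE))} decimal digits")                   # 78
print(f"Trying every value would take about 10^{len(str(years)) - 1} years")  # 10^51
```

For comparison, the universe is usually estimated to be on the order of 10^10 years old.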
The main point is that Alice and Bob now have a cryptographically secure way to sign and verify each other’s transactions. The distribution of local copies through a P2P network and cryptography have made Alice and Bob’s spreadsheet trustless.
But it seems like a lot of work for Alice and Bob to verify that every transaction is valid. They could do spot checks, but it would be much more secure if a machine could check every transaction. That’s where the incentivization protocol comes in.
The Incentivization Protocol Keeps a Blockchain Secure
Let’s imagine that Alice and Bob have a computer that checks every transaction. It verifies whether the right person signed the transaction and whether there are any duplicate transactions. The problem here is that it probably shouldn’t be either Alice or Bob’s computer that does the checking. That would pose a conflict of interest; the owner of the computer could easily sabotage how the computer checks their transactions. It probably can’t be both Alice and Bob’s computers either. A disagreement on a transaction, with Alice’s computer saying one thing and Bob’s another, would have no way out.
Instead, there should be many computers of other people checking the spreadsheet. All of these computers should have a local copy of the spreadsheet and the first computer that can verify the validity of a transaction should broadcast it out to all computers in the network, which will update their copies in turn.
In essence, that’s how a blockchain works. There are many machines continuously verifying the validity of information. They bundle a bunch of information into a block, push it through the cryptographic function we discussed above to sign off on it, broadcast it to all machines in the network, and attach it to the previous block by including the previous block’s hash along with all the information of the new block.
That final point is important, because it’s what makes it near impossible to change any information in the chain of blocks once that information has been verified. If you’d change the information in block 231 even slightly, the hash of the block would change dramatically. It would no longer match the previous hash that was included in block 232. So if someone wanted to change information in block 231, they’d need to change the previous hash in block 232 too. But that would change the hash of block 232, which was included as the previous hash in block 233. So the hacker would need to change that block too.
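The chaining described above can be sketched as follows. The block layout (a dictionary with data and prev_hash fields) and the function names are assumptions for illustration; real blocks carry more fields, but the linking principle is the same.

```python
import hashlib
import json


def block_hash(block):
    """Hash a block's full contents, including the previous block's hash."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def build_chain(batches):
    chain, prev = [], "0" * 64  # placeholder previous hash for the first block
    for data in batches:
        block = {"data": data, "prev_hash": prev}
        chain.append(block)
        prev = block_hash(block)
    return chain


def first_broken_link(chain):
    """Return the index of the first block whose prev_hash no longer matches, or None."""
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return i
    return None


chain = build_chain([["Alice paid $27"], ["Bob paid $37"], ["Alice paid $36"]])
print(first_broken_link(chain))         # None: the chain is intact
chain[0]["data"] = ["Alice paid $270"]  # tamper with the first block
print(first_broken_link(chain))         # 1: the next block no longer matches
```

To hide the change, the tamperer would have to rewrite the prev_hash of the next block, which changes that block’s hash in turn, and so on down the chain.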
The bottom line is that, if a hacker wanted to change anything in any block, he’d need to modify that block and every block that comes after it. Not only would this be very obvious to everyone else, but it’s not so easy to change information in a block either…
You see, all these machines verifying information don’t work for free. It would make no sense to spend computing resources without asking for something in return. That’s why every blockchain has a protocol that 1) keeps it secure and 2) rewards machines for verifying information. Let’s talk about security first.
At this point in Alice and Bob’s spreadsheet (and it’s unlikely any two people would ever go to such lengths to split bills, but please bear with me) security comes from a number of machines agreeing on which transactions are valid and which are not. How they agree is called the consensus mechanism.
There are many different ways to come to a consensus on the validity of information. One of the bigger debates in the blockchain industry is about which consensus mechanism is better. We’ll cover different consensus mechanisms in another blog post, so let’s use Bitcoin’s consensus mechanism, which is called Proof of Work (PoW), to continue.
PoW is set up in such a way that it’s difficult for machines to validate information (transactions, in Bitcoin’s case). This might sound counterintuitive at first. After all, why would you want to make it hard for a machine to do something that’s so essential for the security of a blockchain?
To answer this, think about a hacker who wants to mess with some of the information in a block. He’s read through this article up until this point (good on him) and realizes that he’ll need to change the hashes of all subsequent blocks. Imagine if that’s easy to do. That wouldn’t be a desirable outcome. If it’s hard to do, he’d be less inclined to try. That’s one of the reasons why consensus mechanisms generally make it hard for machines to validate information.
For PoW specifically, it’s once again made hard through cryptography. Miners need to find a number, called a nonce, that, when attached to the list of transactions, creates a hash that starts with a certain number of zeros. They can only find this number through trial and error. Because of the number of possibilities, this takes quite a bit of computational work.
Bitcoin automatically varies the number of zeros that need to be found to make the challenge either easier (fewer zeros in the hash) or harder (more zeros in the hash). It varies the challenge in such a way that it should take on average ten minutes to solve. So roughly every ten minutes, a new block full of transactions gets added to the Bitcoin blockchain.
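Under the hood, “number of zeros” is a simplification: Bitcoin compares the hash, read as a number, against a target, and a smaller target means more leading zeros and a harder puzzle. The sketch below follows the adjustment rule as I understand it (recalculate every 2,016 blocks, scale by how long those blocks actually took, and cap each change at a factor of four). It leaves out the compact encoding and edge cases of the real implementation, and the function name retarget is my own.

```python
EXPECTED_SECONDS = 2016 * 10 * 60  # 2,016 blocks at ten minutes each (two weeks)


def retarget(old_target, actual_seconds):
    """Scale the target by how long the last 2,016 blocks actually took.

    A hash must be below the target to count, so a smaller target
    means a harder puzzle.
    """
    # Limit any single adjustment to a factor of four in either direction.
    clamped = max(EXPECTED_SECONDS // 4, min(actual_seconds, EXPECTED_SECONDS * 4))
    return old_target * clamped // EXPECTED_SECONDS


old = 2**224
# Blocks arrived twice as fast as intended, so the target is cut in half.
print(retarget(old, EXPECTED_SECONDS // 2) == 2**223)  # True
```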
From the moment a miner finds the solution to the challenge, it broadcasts it out to the network. The other miners can easily verify whether it’s the right solution by plugging the number into the hashing function to see whether the hash does indeed start with the right number of zeros.
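Here is a minimal sketch of both sides of that exchange: the slow search for a nonce and the quick check by everyone else. It is a simplification built on assumptions: it hashes a plain string once and counts leading zeros in hexadecimal, whereas Bitcoin hashes a binary block header twice with SHA-256 and compares the result against a numeric target. The function names are my own.

```python
import hashlib


def pow_hash(transactions, nonce):
    return hashlib.sha256(f"{transactions}|{nonce}".encode()).hexdigest()


def mine(transactions, zeros):
    """Try nonces 0, 1, 2, ... until the hash starts with `zeros` hex zeros."""
    nonce = 0
    while not pow_hash(transactions, nonce).startswith("0" * zeros):
        nonce += 1
    return nonce


def is_valid(transactions, nonce, zeros):
    """Checking a claimed solution takes a single hash."""
    return pow_hash(transactions, nonce).startswith("0" * zeros)


transactions = "Alice paid $27; Bob paid $37"
nonce = mine(transactions, 4)  # tens of thousands of attempts on average
print(nonce, pow_hash(transactions, nonce))
print(is_valid(transactions, nonce, 4))  # True
```

Each extra hexadecimal zero multiplies the expected number of attempts by sixteen, while verification always costs one hash.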
If two miners find the solution at the same time and broadcast it to the network, the blockchain “forks.” Miners will pick one of the two forks and keep working on it until one chain becomes longer than the other, at which point they all switch to the longer one. This means that the fork with the most computational resources working on it will, in all likelihood, be the one miners eventually settle on.
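The switching rule can be sketched as a one-line function. It counts blocks, as described above; strictly speaking, Bitcoin nodes compare the total amount of work behind each chain, which usually but not always amounts to the same thing. The list-of-strings representation is an assumption for illustration.

```python
def choose_fork(forks):
    """Pick the fork with the most blocks; on a tie, keep the one seen first."""
    return max(forks, key=len)


shared = ["block 1", "block 2"]
fork_a = shared + ["block 3a"]
fork_b = shared + ["block 3b"]
print(choose_fork([fork_a, fork_b])[-1])  # block 3a (tie: first seen wins)

fork_b.append("block 4b")                 # someone extends fork B first
print(choose_fork([fork_a, fork_b])[-1])  # block 4b (fork B is now longer)
```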
That’s also why it can take a while before Bitcoin transactions are verified. Not only do machines need to solve the challenge, but the block also needs to be part of the longest fork (in case there is a fork). By widely followed convention (it is not a hard rule of the protocol), at least five blocks need to follow before the transactions in a block are considered fully verified. The block that contains a transaction counts as its first confirmation and every subsequent block adds another, which is why Bitcoin transactions are usually said to require six confirmations.
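Counting confirmations is simple arithmetic on block heights. The sketch below assumes the usual counting, where the block containing the transaction is itself the first confirmation; the function name is hypothetical.

```python
def confirmations(block_height, tip_height):
    """Confirmations for a transaction included in the block at `block_height`."""
    if block_height > tip_height:
        return 0  # the block is not in the chain yet
    return tip_height - block_height + 1


print(confirmations(628110, 628110))  # 1: no blocks have followed yet
print(confirmations(628110, 628115))  # 6: five blocks have followed
```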
Let’s now talk about reward. As said above, these miners don’t work for free. That’s why the first miner who figures out the solution to the challenge of a new block gets a certain number of bitcoins as a reward. It’s why miners keep on mining and how new bitcoins are created. When the Bitcoin blockchain first started running, the challenge was still very easy because not many computers were working on the blockchain. The reward, in retrospect, was very high too: for every new block, a miner received 50 bitcoin.
As I’m writing this, that would’ve been worth almost $400,000. For solving one block. Every ten minutes, there’s a new block, so the early Bitcoin miners must all be millionaires (if not billionaires, as the Winklevoss twins are) if they’ve kept their bitcoins until now.
But the Bitcoin reward is no longer 50 bitcoin. Every 210,000 blocks, the reward halves. This means that there’s an upper boundary to the number of bitcoins that will ever be in circulation: 21 million. Bitcoin is deflationary in nature, because there will only ever be a limited number of them (but there are a lot of caveats to that statement that I won’t go into). Currently, the mining reward sits at 12.5 bitcoin, but that’s going to halve in a few weeks to 6.25.
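The 21 million cap follows directly from the halving schedule, as this short calculation shows. It counts in satoshis, the smallest unit of a bitcoin (one hundred-millionth), and assumes each halving rounds down to a whole satoshi, so the reward eventually reaches zero.

```python
HALVING_INTERVAL = 210_000
INITIAL_REWARD = 50 * 100_000_000  # 50 bitcoin, counted in satoshis

total, era = 0, 0
reward = INITIAL_REWARD
while reward > 0:
    total += HALVING_INTERVAL * reward
    era += 1
    reward = INITIAL_REWARD >> era  # halve again, rounding down

print(f"Reward eras before the reward rounds down to zero: {era}")  # 33
print(f"Total supply: {total / 100_000_000:,.4f} BTC")              # just under 21 million
```

At ten minutes per block, 210,000 blocks take roughly four years, so 33 eras span about 132 years from Bitcoin’s launch in 2009.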
Once all bitcoins have been mined, miners can still earn money through transaction fees. For every transaction they verify, they are allowed to charge a small fee to keep them going. This will be their only source of revenue from 2140 onward, when the last bitcoin will be mined. Let’s see what happens. Suffice it to say a lot can happen before then.
Let’s Put Our Knowledge into Practice
To finish up this article, let’s put our newly-found knowledge to use. The screenshot below shows block 628110 of the Bitcoin blockchain. In the first line, you can see that the hash starts with a number of zeros. Toward the bottom, the nonce is the number that the miner needed to find to generate the hash of the block.
You can see that this was the last block in the chain when I took this screenshot, because it had only one confirmation at the time (which means no other blocks have come after it). The height tells you that 628,110 blocks came before it (the very first block has height 0). This block has 2,785 transactions inside and was mined by F2Pool, who received 12.5 BTC for solving the puzzle and 0.24 BTC in transaction fees. You can examine the block for yourself here if you want to see the individual transactions that went into this block.
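The 12.5 BTC reward can be checked from the block height alone. The sketch below assumes the schedule described earlier (50 BTC, halved every 210,000 blocks); the function name block_reward is my own.

```python
def block_reward(height):
    """Block reward in BTC at a given height, assuming 50 BTC halved every 210,000 blocks."""
    return 50 / 2 ** (height // 210_000)


height = 628_110
print(block_reward(height))         # 12.5

blocks_left = 210_000 - height % 210_000
print(blocks_left)                  # 1890 blocks until the next halving
print(blocks_left * 10 / 60 / 24)   # 13.125 days at ten minutes per block
```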
To conclude, let’s turn back to the blockchain definition we brought up at the very beginning of the article: blockchain technology is a distributed, immutable database for verifying, recording, and storing information.
- It is distributed because of the local copies on many different machines.
- It is immutable because of cryptography and because each block is attached to the previous one.
- Machines are incentivized for verifying, recording, and storing that information with block rewards and transaction fees.
There are still many areas that I haven’t covered, particularly since all blockchains are set up in different ways, but I hope this article has given you a good idea of the fundamental concepts that make blockchain such a unique and interesting idea.