blockchain data structure

The first use case for blockchain technology is digital money. To have a monetary system without central control, you must have a special and sophisticated way to handle all the data produced with each transfer. Imagine if every person could access and modify the databases kept by banks. It would be a disaster.
In order to make decentralized money a reality a method of accounting had to be developed - the UTXO model, also referred to as triple-entry accounting. You can compute every account balance at any time by storing all transactions in a digital ledger.
A digital ledger used for digital money requires a set of properties that were not achievable before blockchain came along. In this article, we will look at how the blockchain handles data and why blockchains special properties partly result from it.
Common Data Structures
Let’s develop an understanding of data structures before we look at blockchain itself. Here are some of the most common data structures:
Arrays
Arrays are one of the purest forms to store data. Arrays are useful when you know how many data elements you need to store and how large each data element will be. Your computer will calculate the required storage from those inputs and set it aside, preventing other programs from accessing this partition of your memory. The drawback to partitioning memory is that reserved memory may be too small for future expansion. In this case, the entire array must be moved to a different location.
Each element of an array has an index that starts at 0. You can instantly access and modify an element if you know where you stored it. If you don’t know an element’s location, you must do a sequential lookup. This means you check the elements one by one (starting at index 0) until you find it.
Arrays are useful for their simplicity and instant access property.
One-dimensional array with six elements.
Linked List
Programs that use a linked list to store data don’t have to know how many data elements you want to store beforehand, but the linked list does need to know what each element consists of. The data elements of a linked list are called nodes. Each node can contain several objects of different types.
For example, If you were to store information about cars in a linked list, you could define a node as the set of information about the brand, model, year produced, and license plate.
The first element of a linked list is called the head, and the last one is called the tail.
Linked list with three nodes and n objects per node.
Each node also contains a pointer to the next node. The pointer tells your computer where the following node is located in memory. This allows you to expand a linked list easily because the data doesn’t have to be in a single, continuous location in memory.
Usage of RAM by arrays and linked lists. Array stored in a single location; linked list stored across memory.
When searching for a piece of data, your computer will check the head of the linked list first. If it’s not there, it will look at the pointer, go to the location in memory where the following node is stored, and continue following pointers until it finds the desired data. This method of finding data is called sequential lookup.
Using a linked list gives you more flexibility in terms of expanding the list later on by adding new nodes, but unlike arrays, it doesn’t give you instant access.
Hash Table
The last data structure we want to look at before moving on to the blockchain is the hash table. The data elements you are storing in a hash table are called keys. To store a key, it is first hashed using a hash function. All you need to know at this point is that a hash function uses an argument of variable length as input and produces an output of fixed length. In the example below, the output is a three-digit number.
The keys are mapped to buckets by their hash value, e.g., if “Alice” hashes to 152, it is stored in this bucket. The buckets can be stored in an array because the output space of the hash function is known. Each bucket can instantly be accessed through its index.
A hash table is useful when you need to store many related data elements, like in a customer database. Initially, you could create a customer ID by hashing the customer’s name. Now there is a dedicated location to store purchases, refunds, or contact information. Whenever you need to access the customer data, your computer would hash the name you are looking for to find the bucket efficiently and add, change, or delete data.
The hash functions used for hash tables are usually not collision-resistant. This means two keys might produce the same hash and would consequently be mapped to the same bucket.
A linked list within the hash table is used to store several keys within a single bucket. In the example below, bucket 152 stores a pointer to Alice’s data in the first node, which points to the second node containing Dave’s data.
If the hash table is well-dimensioned, the cost (or the number of instructions/computations) for each lookup is independent of the total number of elements stored in the table. Hash tables give you instant access without even knowing the location of every element in memory. The location is defined by the data itself, making it convenient for systems that have to store large amounts of data and repeatedly access them.
Schematic of a hash table. Keys mapped to buckets (array elements) using a hash function. Buckets expanded as linked lists to accommodate overflow entries.
There are many different data structures; each of them comes with some trade-offs, and depending on the use case, one might choose one over the other. Sophisticated data structures often leverage several more simple concepts in combination to achieve the set of desired properties. We chose the three examples above to show how an array and a linked list can be used to build a hash table.
The blockchain is a rather sophisticated data structure, made up of many sub-structures. It gives us a set of properties that are paramount to building a decentralized ledger for digital money.

Posted on by