Hash table data structure using the C programming language. I briefly demonstrate linear and binary search, and then design and implement a hash table. My goal is to show that hash table internals are not scary, but within certain constraints are easy enough to build from scratch.
Go to: Linear search | Binary search | Hash tables | Implementation | Discussion
simple program that counts word frequencies across various languages, and one of the things that came up was how C doesnt have a hash table data structure in its standard library.
There are many things you can do when you realize this: use linear search, use binary search, grab someone elses hash table implementation, or write your own hash table. Or switch to a richer language. Were going to take a quick look at linear and binary search, and then learn how to write our own hash table. This is often necessary in C, but it can also be useful if you need a custom hash table when using another language.
Linear search
The simplest option is to use linear search to scan through an array. This is actually not a bad strategy if youve only got a few items in my simple comparison using strings, its faster than a hash table lookup up to about 7 items (but unless your program is very performance-sensitive, its probably fine up to 20 or 30 items). Linear search also allows you to append new items to the end of the array. With this type of search youre comparing an average of num_keys/2 items.
Lets say youre searching for the key bob in the following array (each item is a string key with an associated integer value):
Index 0 1 2 3 4 5 6
Key foo bar bazz buzz bob jane x
Value 10 42 36 7 11 100 200
You simply start at the beginning (foo at index 0) and compare each key. If the key matches what youre looking for, youre done. If not, you move to the next slot. Searching for bob takes five steps (indexes 0 through 4).
Here is the algorithm in C (assuming each array item is a string key and integer value):
typedef struct {
char* key;
int value;
} item;
item* linear_search(item* items, size_t size, const char* key) {
for (size_t i=0; i<size; i++) {
if (strcmp(items[i].key, key) == 0) {
return &items[i];
}
}
return NULL;
}
int main(void) {
item items[] = {
{"foo", 10}, {"bar", 42}, {"bazz", 36}, {"buzz", 7},
{"bob", 11}, {"jane", 100}, {"x", 200}};
size_t num_items = sizeof(items) / sizeof(item);
item* found = linear_search(items, num_items, "bob");
if (!found) {
return 1;
}
printf("linear_search: value of 'bob' is %d\n", found->value);
return 0;
}
Binary search
Another simple approach is to put the items in an array which is sorted by key, and use binary search to reduce the number of comparisons. This is kind of how we might look something up in a (paper) dictionary.
C even has a bsearch function in its standard library. Binary search is reasonably fast even for hundreds of items (though not as fast as a hash table), because youre only comparing an average of log(num_keys) items. However, because the array needs to stay sorted, you cant insert items without copying the rest down, so insertions still require an average of num_keys/2 operations.
Assume were looking up bob again (in this pre-sorted array):
Index 0 1 2 3 4 5 6
Key bar bazz bob buzz foo jane x
Value 42 36 11 7 10 100 200
With binary search, we start in the middle (buzz), and if the key there is greater than what were looking for, we repeat the process with the lower half. If its greater, we repeat the process with the higher half. In this case it results in three steps, at indexes 3, 1, 2, and then we have it. This is 3 steps instead of 5, and the improvement over linear search gets (exponentially) better the more items you have.
Heres how youd do it in C (with and withoutbsearch). The definition of the item struct is the same as above.
int cmp(const void* a, const void* b) {
item* item_a = (item*)a;
item* item_b = (item*)b;
return strcmp(item_a->key, item_b->key);
}
item* binary_search(item* items, size_t size, const char* key) {
if (size + size < size) {
return NULL; // size too big; avoid overflow
}
size_t low = 0;
size_t high = size;
while (low < high) {
size_t mid = (low + high) / 2;
int c = strcmp(items[mid].key, key);
if (c == 0) {
return &items[mid];
}
if (c < 0) {
low = mid + 1; // eliminate low half of array
} else {
high = mid; // eliminate high half of array
}
}
// Entire array has been eliminated, key not found.
return NULL;
}
int main(void) {
item items[] = {
{"bar", 42}, {"bazz", 36}, {"bob", 11}, {"buzz", 7},
{"foo", 10}, {"jane", 100}, {"x", 200}};
size_t num_items = sizeof(items) / sizeof(item);
item key = {"bob", 0};
item* found = bsearch(&key, items, num_items, sizeof(item), cmp);
if (found == NULL) {
return 1;
}
printf("bsearch: value of 'bob' is %d\n", found->value);
found = binary_search(items, num_items, "bob");
if (found == NULL) {
return 1;
}
printf("binary_search: value of 'bob' is %d\n", found->value);
return 0;
}