Quick & Dirty Notes / Mini-Quiz: HashTables

Janu Sung
5 min readMar 22, 2021

A quick cheat sheet / mini-quiz on HashTables to brush up on the fundamentals.

This blog post will be formatted with a question and then a corresponding answer. I find that having a question posed can help solidify my ability to recall information as well as force me to have a better understanding of the topic. See if it does the same for you.

As you go through the blog try to answer the questions in your own words before looking at the provided answer.

What are some of the other names for a Hash Table?

  • Hash Maps
  • Maps
  • Unordered Maps
  • Dictionaries
  • Objects

*depending on the language there will be slight differences, but they’ll have a built-in hashTable

JS → Object
Python → Dictionaries
Java → Maps
Ruby → Hashes

What is a hash table?

  • Hash tables are a data structure that stores data with key-value pairs.
  • Keys are used as the index to locate the data in memory.
  • The data is known as the ‘value’ → Hash Tables create ‘key value pairs’.

*This is done with a hash function. The hash function takes the name of the key and creates a key into an index in memory and stores both the key and its value within the memory slot/address.

What is a hash function?

A hash function is simply a function that generates a value of fixed length for each input that it gets.

i.e. it takes a set of inputs of any arbitrary size and fits them into a table or other data structure that contains fixed-size elements.

What are the Key Aspects of Hash Functions?

  • Hash Functions are one-way. Once the hash is generated the original input cannot be determined (unless the hashing algorithm is cracked).
  • Once the hash is generated, if you pass the same input to the hash function it will reproduce the exact same hash. If the input is changed at all it will produce a different corresponding hash.

* Hash Functions are idempotent: A function given the same input will always produce the same output. → pure functions are idempotent by nature.

What’s the benefit of a Hash Function? Why would we want to use it in a data structure? How does it do it?

  • We get really fast access to our data.
  • The hash function will take the key → convert it to a hash → then convert the hash into an index/address space where it will store the data for the key.
  • When we look up data, we simply have to pass the key → the function generates the hash → and we immediately know where to retrieve our data.
  • Because we have the hash/index address we can lookup data very quickly.
  • Hash functions built into languages for Hash Tables are intentionally very fast in order to allow for quick lookups. We can assume their bigO to be O(1) → constant time.

Briefly go over the procedure of adding/retrieving data to a hash table.

1. Create a key — -> pass the hash function a string
2. Convert the key to a hash
3. Use the hash to create/find an index/address space within memory
4a. Store the key and the value (data) pair within the address space (for adding)
4b. Access the address space within memory and retrieve the value/data (for retrieval)

What are the BigO’s for Hash Table operations? How do they work under the hood?

  • insert → O(1)
    *hash the key → places it automatically within the address space that it came up with
    **unless there’s a collision → O(n)
  • lookup → O(1)
    *same → pass the key→ the key gets hashed → we use the hash to look up the exact space to find our value/data
    **unless there’s a collision → O(n)
  • delete → O(1)
    *same → use the key → right away we know where to delete the item from → because it's not ordered, we do not need to shift indices like arrays so no O(n)
  • search → O(1)
    *simply use the hash function to check to see if the value is there

What can you store within a Hash table?

Different Languages will implement Hash tables differently.

However, typically the key and the value can be of any data type → all data types are allowed

That said. Within Javascript ES6.

Hash Tables:
* will always have their keys as strings → whether an int or function it will be stringified. The values can be of any data type including functions.
** has no order → data is stored randomly

Hash Maps:
* will allow for keys to be of any data type including a function. Same for values. You can Save any data type as a key.
** also allow for data to be stored in sequence → maintains the insertion order unlike Hash Tables → therefore when you loop through a Hash Map → you loop through in order.

Hash Sets:
* stores only the keys and no values

What is a collision?

When two separate keys are assigned the same location in memory.

Note: this occurs when two different inputs produce the same output in hashing (in hashing X inputs are mapped to a fixed # characters)

* there is no way to predict collisions and is considered an acceptable risk — therefore hashing is still used for integrity and non-repudiation

How does a collision occur?

Collisions occur randomly.

Because the hashing function doesn’t have anything to tell it to distribute space evenly, it randomly assigns the same space to two separate keys when the hashing function returns the same hash for both keys.

How does a collision impact runtime?

If collisions occur, theoretically it slows down reading and writing to → O(n/k) → k being the size of the hash table → it resolves to → O(n)

There are a few common ways of resolving collisions:

- separate chaining
- open addressing
- robin hood hashing

(see wiki page to learn more)

Why use a hash table over an array?

Hash tables are great for when you quick access to certain values.

  • Search → O(1)
    *often used in databases
  • insert → O(1)
    *most of the time we don’t need to worry about collision
  • lookup → O(1)
  • delete → O(1)

Compared to Array:

  • search → O(n)
  • insert → O(n)
  • lookup → O(1)
  • *push → O(1)
  • delete → O(n)

What are some strengths and weaknesses of Hash Tables?

  • Faster lookups (as long as there's good collision resolution included — like in most languages). Faster inserts. Flexible keys (depending on the language)
  • Slower key iteration → O(n).
    Unordered / Scattered Memory → makes it take longer to locate all addresses in memory

What's the trade-off with Hash Tables?

  • The tradeoff is improved Time Complexity due to faster access to data.
  • Cost → requires more memory → greater Space Complexity of O(n).

How would you implement your own Hash Table?

Depending on the language you’re most familiar with the answer may vary. But take some time to walk through a solution. Consider what operations are inherent to hash tables, and try writing out a class definition for an array in pseudo-code (or actual code) and see how well you understand it.

That’s it for this mini-quiz/cheatsheet. Hope it tickled your brain and you had some fun!

--

--

Janu Sung

Just another one of those dreamers with a sparkle in his eyes.