Novel DS & Algos Roundup #1 | CS Journey

These are a few novel algorithms that I stumbled upon this year. I intend to curate and jot down stuff like this on a regular basis from now on so that I can quickly get to speed in the future (that is once I’ve forgotten about them). These also serve as notes. I don’t have to hunt down these links in the future.

Bloom Filters

It’s a space efficient algorithm that checks if a particular item exists in a list or not. It trades memory requirements for accuracy. It has a very high accuracy but isn’t right all the time (you can tune it to use more memory and increase accuracy or reduce memory requirement and accuracy at the same time). Ingenious use of hashing.

Bloom Filter for Dummies
This is a well explained article keeping a sample usage in mind. Loved it.
Tutorial with a Demo

Music Shuffling

Random algorithms obviously aren’t deterministic. When using them it’s possible that at times you might end up the same number multiple times clustered together. For example 1, 2, 2, 8, 4, 5, 5, 5. This algorithm ensures that the songs are uniformly distributed.

The Art of Music Shuffling
The above link is visually explained and is very easy to grok.

Rsync

Rsync efficiently transfers only the differences between the two files. Say you have an outdated file A on your server and the latest file A on your personal computer. You have to replace the file on the server with the latest file. The naive way to send the file would be to send the entire file from your computer to the server. It uses more bandwidth and takes more time.

The smarter way would be to just send the differences between both the files. Only the parts that have actually changed. Rsync uses hashing to find out the differences and can then enable you to send only the differences saving you time and bandwidth.

Overview of How Rsync Works

Hyperloglog

I stumbled upon this during my attempt to read the source code of Redis. It is used to return the number of distinct elements in a list or container. It uses lesser memory than the traditional approach (iterate over each thing, if not already encountered increase count by 1) but isn’t exact. It’s pretty accurate that even Redis uses it.

Explanation by Antirez
StackOverflow
Points to a number of related resources and gives a decent explanation as well.

Pratt Parsing

Pratt parsing is a simple parsing technique. I haven’t worked with parsers so I don’t know the benefits it offers over other parsing techniques.

These tutorials go really in depth.