Before I cause a panic… The chance of a collision is about 1 in 1,000 **if you have 3,000 sample files, all the same length**.

*:: math geekery ahead ::*

This is essentially a variation on the Birthday Problem, but with files and hash values, not people and birthdays.

The hash value is 32 bits. Assuming it is a decent algorithm, each file gets an arbitrary ~~birthday~~ hash value somewhere between 0 and about 4 billion (2^32).

For a given number of files, we can compute the likelihood of two of them ~~partying~~ colliding on the same ~~birthday~~ hash:

p(n files) ≈ 1 - e ^ (-n×n / 2^33)

At about 3,000 files, this is a 1-in-1,000 chance; at 9,000 files, about a 1% chance; and at 77,000 files, about a 50% chance of a collision. See the first row from this table.
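
If you want to check those numbers yourself, here is a minimal Python sketch of the approximation above. The function name `collision_probability` and its parameters are mine, purely for illustration; it only evaluates the formula and knows nothing about the actual hardware.

```python
import math

def collision_probability(n_files: int, hash_bits: int = 32) -> float:
    """Birthday-problem approximation: p ≈ 1 - e^(-n² / 2^(bits+1))."""
    exponent = -(n_files * n_files) / 2 ** (hash_bits + 1)
    # -expm1(x) equals 1 - e^x, but stays accurate when x is very close to 0
    return -math.expm1(exponent)

for n in (3_000, 9_000, 77_000):
    print(f"{n:>6} files: {collision_probability(n):.4%}")
```

The three printed values should land close to the 0.1%, 1% and 50% figures quoted above.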

**BUT before you freak out,** the files are only considered the same if the length *and* the hash are the same. So you need 3,000 files of exactly the same size before you hit that 1-in-1,000 chance.
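
To make the "length *and* hash" rule concrete, here is a rough Python sketch of how such a check could look. Big assumptions: I'm using CRC-32 from `zlib` purely as a stand-in 32-bit hash (I don't know which algorithm the unit really uses), and `sample_key` and `find_clashes` are made-up names.

```python
import zlib

def sample_key(data: bytes) -> tuple[int, int]:
    # Stand-in 32-bit hash (CRC-32); the device's real algorithm is an unknown here.
    return (len(data), zlib.crc32(data) & 0xFFFFFFFF)

def find_clashes(samples: dict[str, bytes]) -> list[tuple[str, str]]:
    """Return (first_seen, later) name pairs that share both length and hash."""
    seen: dict[tuple[int, int], str] = {}
    clashes = []
    for name, data in samples.items():
        key = sample_key(data)
        if key in seen:
            # Same length AND same hash: treated as the same file,
            # whether it is a true duplicate or an unlucky collision.
            clashes.append((seen[key], name))
        else:
            seen[key] = name
    return clashes
```

Two files of different lengths can never clash under this rule, which is why only same-sized files count toward the 3,000.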

So, sure, if you load up 3,000 single cycle waveforms, all exactly the same number of samples, you have a 1-in-1,000 chance that two of these will collide and one won’t load. If you load up 77,000 single cycle waveforms… you have more time to audition samples than I do!

This all assumes that the hash algorithm is good, but any modern algo. producing 32 bits from 1 KB or more of sample data will be good. As far as I can tell, the algo. used by Elektron has all the right goodness.

Yes, it is a bit of a shame they chose only 32 bits for the hash. At 64 bits, you couldn’t load enough samples on the unit to get a 1-in-a-million chance of collision. At 128 bits, we could all be loading samples until the heat death of the universe and still have an infinitesimal chance of collision.
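
For the curious, the same approximation can be turned around to ask how many same-length files you'd need for a given collision chance: n ≈ sqrt(-2^(bits+1) × ln(1 - p)). A small Python sketch (again, `files_for_probability` is just a name I picked):

```python
import math

def files_for_probability(p: float, hash_bits: int = 32) -> float:
    """Invert p ≈ 1 - e^(-n² / 2^(bits+1)) to get the file count n."""
    # log1p(-p) equals ln(1 - p) but keeps precision for very small p
    return math.sqrt(-(2 ** (hash_bits + 1)) * math.log1p(-p))

print(f"32-bit, 50% chance:          {files_for_probability(0.5, 32):,.0f} files")
print(f"64-bit, 1-in-a-million:      {files_for_probability(1e-6, 64):,.0f} files")
```

With a 64-bit hash, the 1-in-a-million mark works out to roughly six million files of identical length.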

*:: end of math geekery ::*

Don’t worry: a hash collision is not impossible, but it is not likely unless you load thousands of samples all the same size.