Philip Oakley <philipoakley@iee.email> writes: > Recently I saw a report [1] on a new theoretical result about how to > manage hash maps which get nearly 'full', which beats Knuth's limit > formula. The full paper is at [2] > > As I understand it, the method adds the gravestone entries early during > has collisions to avoid clumping of such collision insertions, rather > than always having to enter the collision list at the end. This keeps > the available slots relatively randomly spaced. > > It feels like the old random bus arrival problem where the average wait > for next bus is identical to the average time since the last bust, which > is the same as the average bus interval (thus 1 + 1 = 1), and the > technique maintains that advantageous perception. > > Given Git's use of hashes, it sounds like it could have uses, assuming > the theory pans out. I've not yet gone through the paper itself [2] but > hope springs eternal. > > Philip > > [1] > S. Nadis and M. I. of Technology, “Theoretical breakthrough could boost > data storage.” > https://techxplore.com/news/2021-11-theoretical-breakthrough-boost-storage.html > (accessed Nov. 18, 2021). > > [2] > M. A. Bender, B. C. Kuszmaul, and W. Kuszmaul, “Linear Probing > Revisited: Tombstones Mark the Death of Primary Clustering,” > arXiv:2107.01250 [cs, math], Jul. 2021, Accessed: Nov. 18, 2021. > [Online]. Available: http://arxiv.org/abs/2107.01250 Very interesting, thanks for sharing. I haven't read the full paper either, but this is an interesting result. It seems that this result is limited to hashmaps with a approximately equal number of insertions and deletions.. >From [1] They found that for applications where the number of insertions and deletions stays about the same—and the amount of data added is roughly equal to that removed—linear-probing hash tables can operate at high storage capacities without sacrificing speed.apacities without sacrificing speed. and [2] ...We then turn our attention to sequences of operations that contain deletions, and show that the tombstones left behind by those deletions have a primary-anti-clustering effect, that is, they have a tendency to speed up future insertions Do we have any such use cases?