Re: New (better?) hash map technique in limit case.

Philip Oakley <philipoakley@iee.email> writes:

> Recently I saw a report [1] on a new theoretical result about how to
> manage hash maps which get nearly 'full', which beats Knuth's limit
> formula. The full paper is at [2]
>
> As I understand it, the method adds the gravestone entries early during
> hash collisions to avoid clumping of such collision insertions, rather
> than always having to enter the collision list at the end. This keeps
> the available slots relatively randomly spaced.
>
> It feels like the old random bus arrival problem, where the average wait
> for the next bus is identical to the average time since the last bus,
> which is the same as the average bus interval (thus 1 + 1 = 1), and the
> technique maintains that advantageous perception.
>
> Given Git's use of hashes, it sounds like it could have uses, assuming
> the theory pans out. I've not yet gone through the paper itself [2] but
> hope springs eternal.
>
> Philip
>
> [1]
> S. Nadis and Massachusetts Institute of Technology, “Theoretical
> breakthrough could boost data storage.”
> https://techxplore.com/news/2021-11-theoretical-breakthrough-boost-storage.html
> (accessed Nov. 18, 2021).
>
> [2]
> M. A. Bender, B. C. Kuszmaul, and W. Kuszmaul, “Linear Probing
> Revisited: Tombstones Mark the Death of Primary Clustering,”
> arXiv:2107.01250 [cs, math], Jul. 2021, Accessed: Nov. 18, 2021.
> [Online]. Available: http://arxiv.org/abs/2107.01250

Very interesting, thanks for sharing. I haven't read the full paper
either, but the result looks promising.

It seems that this result is limited to hash maps with an approximately
equal number of insertions and deletions.

From [1]

  They found that for applications where the number of insertions and
  deletions stays about the same—and the amount of data added is roughly
  equal to that removed—linear-probing hash tables can operate at high
  storage capacities without sacrificing speed.

and [2]

  ...We then turn our attention to sequences of operations that contain
  deletions, and show that the tombstones left behind by those deletions
  have a primary-anti-clustering effect, that is, they have a tendency
  to speed up future insertions
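
To make the mechanism concrete, here is a minimal sketch of the
conventional scheme the paper analyzes (not the paper's improved
algorithm, and independent of git's own hashmap; names and sizes are
made up): a toy linear-probing table where delete() leaves a tombstone
and a later insert() reclaims the first tombstone on its probe path,
which is what hands deleted slots back to new insertions.

/*
 * Toy sketch, hypothetical names and sizes.  Conventional
 * tombstone-based linear probing only; not the paper's algorithm.
 */
#include <stdint.h>
#include <stdio.h>

#define TABLE_SIZE 16                   /* power of two, toy-sized */

enum slot_state { EMPTY, OCCUPIED, TOMBSTONE };

struct slot {
	enum slot_state state;
	uint32_t key;
};

static struct slot table[TABLE_SIZE];

static uint32_t slot_index(uint32_t key)
{
	return key & (TABLE_SIZE - 1);  /* identity "hash" for the toy */
}

/*
 * Linear-probing insert.  The first tombstone seen is remembered and
 * reused if the key turns out not to be present, so a deleted slot is
 * handed back to a later insertion instead of the run growing.
 */
static int insert(uint32_t key)
{
	uint32_t i = slot_index(key);
	int first_tombstone = -1;
	unsigned int probes;

	for (probes = 0; probes < TABLE_SIZE;
	     probes++, i = (i + 1) & (TABLE_SIZE - 1)) {
		if (table[i].state == OCCUPIED && table[i].key == key)
			return 0;               /* already present */
		if (table[i].state == TOMBSTONE && first_tombstone < 0)
			first_tombstone = i;
		if (table[i].state == EMPTY)
			break;                  /* key cannot be further on */
	}
	if (first_tombstone >= 0)
		i = first_tombstone;            /* reclaim the deleted slot */
	else if (probes == TABLE_SIZE)
		return -1;                      /* genuinely full */
	table[i].state = OCCUPIED;
	table[i].key = key;
	return 0;
}

/* Delete by tombstoning, so probe chains for other keys stay intact. */
static void delete(uint32_t key)
{
	uint32_t i = slot_index(key);
	unsigned int probes;

	for (probes = 0; probes < TABLE_SIZE;
	     probes++, i = (i + 1) & (TABLE_SIZE - 1)) {
		if (table[i].state == EMPTY)
			return;                 /* not present */
		if (table[i].state == OCCUPIED && table[i].key == key) {
			table[i].state = TOMBSTONE;
			return;
		}
	}
}

int main(void)
{
	insert(1);      /* keys 1, 17, 33, 49 all hash to slot 1 here */
	insert(17);     /* probes to slot 2 */
	insert(33);     /* probes to slot 3 */

	delete(17);     /* leaves a tombstone in slot 2 */
	insert(49);     /* reclaims slot 2 instead of extending the run */

	printf("slot 2 holds %u\n", (unsigned int)table[2].key); /* 49 */
	return 0;
}

Under the kind of churn [1] describes, where insertions and deletions
roughly balance, the slots given back by delete() keep getting recycled
by insert(), so in this toy the table stays at a high load factor
without its probe runs growing.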

Do we have any such use cases?



