On Thu, Feb 22, 2024 at 08:37:23PM +0000, Matthew Wilcox (Oracle) wrote: > Rosebush is a resizing, scalable, cache-aware, RCU optimised hash table. > I've written a load of documentation about how it works, mostly in > Documentation/core-api/rosebush.rst but some is dotted through the > rosebush.c file too. > > You can see this code as a further exploration of the "Linked lists are > evil" design space. For the workloads which a hashtable is suited to, > it has lower overhead than either the maple tree or the rhashtable. > It cannot support ranges (so is not a replacement for the maple tree), > but it does have per-bucket locks so is more scalable for write-heavy > workloads. I suspect one could reimplement the rhashtable API on top > of the rosebush, but I am not interested in doing that work myself. > > The per-object overhead is 12 bytes, as opposed to 16 bytes for the > rhashtable and 32 bytes for the maple tree. The constant overhead is also > small, being just 16 bytes for the struct rosebush. The exact amount > of memory consumed for a given number of objects is going to depend on > the distribution of hashes; here are some estimated consumptions for > power-of-ten entries distributed evenly over a 32-bit hash space in the > various data structures: > > number xarray maple rhash rosebush > 1 3472 272 280 256 > 10 32272 784 424 256 > 100 262kB 3600 1864 2080 > 1000 [1] 34576 17224 16432 > 10k [1] 343k 168392 131344 > 100k [1] 3.4M 1731272 2101264 So I think the interesting numbers to see (besides performance numbers) are going to be the fill factor numbers under real world use. It's an interesting technique, I've played around with it a bit (actually using it in bcachefs for the nocow locks hash table), but no idea if it makes sense as a general purpose thing yet... you also mentioned that a motivation was API mismatch between rhashtable and dcache - could you elaborate on that?