On Fri, Mar 07, 2014 at 09:17:19AM -0800, Andi Kleen wrote:
> Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes:
>
> > * switch allocation to alloc_large_system_hash()
> > * make sizes overridable by boot parameters (mhash_entries=, mphash_entries=)
> > * switch mountpoint_hashtable from list_head to hlist_head
>
> So how much memory does this use on a standard system (<4GB memory)?

Two hash chains per megabyte. IOW, 2GB => 4096 of them. Could drive it
lower, probably, but I'm a bit nervous about _really_ low-end boxen. Right
now it matches your variant on a 128MB box.

> How much memory does it use on a large system (0.5TB)?

Probably over the top - alloc_large_system_hash() has that problem in
general. 0.5TB would amount to 2^20 hash chains. That is probably too much,
but then the dentry hash on the same box will have 2^26 chains and the
inode hash 2^25.

> How good is your hash function?  Would jhash be more appropriate
> and allow smaller hash tables?

How the hell does the hash function affect the average chain length? And
yes, the chains are pretty evenly distributed. And while we are at it, what
the hell does jhash have to do with any of this? There are no strings
involved - the hash key is a pair of pointers to objects allocated by
kmem_cache_alloc(), so their middle bits are already fairly random.

> Perhaps just want a tree here.

_What_ tree? Andi, WTF are you talking about? That hash is a mapping from
(parent vfsmount, mountpoint dentry) to child vfsmount. Sure, we do have
the mount tree - all children of a given vfsmount are on a cyclic list
anchored in it. But iterating through that list is painfully slow on
realistic setups. Have a bunch of nfs4 referrals on one fs, and you are
welcome to a hundred vfsmounts on the child list of a single parent.
Create a bunch of bindings in assorted places in /usr, and *any* mountpoint
in /usr pays the price every time we cross it during pathname resolution.
Same with /, since there tends to be a bunch of mounts on directories in
the root filesystem. FWIW, on the box I'm typing at right now: /proc, /sys,
/dev, /run, /tmp, /home, /usr, /var - 8 of them. On the box I'm sshed into:
/proc, /sys, /dev, /run, /usr, /media, /var, /tmp, /boot, /archive - 10.
And traversing that list would be the price on *any* pathname resolution
trying to cross from the root to e.g. /home or /usr. To get equally bad
behaviour out of the current setup (even with the ridiculously small hash
table size set by your patch back in 2002), you'd need ~2000 mounts. And it
degrades very easily - consider e.g. a box that has a shitload of stuff
automounted under /home; that can easily go well past 10 simultaneous
mounts...

We could keep a per-mountpoint list anchored in the dentry - and pay with
an extra pointer in every struct dentry out there, which would cost a lot
more. Or we could anchor such a list in struct mountpoint and do a hash
lookup *and* a list search on each mountpoint crossing - one to find the
struct mountpoint for a given struct dentry, the other to find the vfsmount
with the right parent.
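
To put rough numbers on the sizing argument: at two chains per megabyte,
with each chain head being a single pointer, the table stays tiny on small
boxes and only starts to look generous at the very top end. A throwaway
userspace sketch of that arithmetic (just the numbers above, nothing
kernel-specific):

/* Back-of-the-envelope check of the numbers above: one hash chain per
 * 512KB of RAM (two per megabyte), each chain head being one pointer. */
#include <stdio.h>

int main(void)
{
	/* memory sizes in megabytes: 128MB, 2GB, 0.5TB */
	unsigned long long mem_mb[] = { 128, 2ULL << 10, 512ULL << 10 };
	const char *label[] = { "128MB", "2GB", "0.5TB" };

	for (int i = 0; i < 3; i++) {
		unsigned long long chains = mem_mb[i] * 2; /* 2 chains/MB */
		unsigned long long bytes = chains * sizeof(void *);
		printf("%6s: %8llu chains, %6llu KB of chain heads\n",
		       label[i], chains, bytes >> 10);
	}
	return 0;
}

On a 64-bit box that comes out to 2KB, 32KB and 8MB of chain heads for
128MB, 2GB and 0.5TB respectively.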
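
As for the hash itself, it does not need to be anything clever. Something
along these lines is enough (a sketch of the general shape with made-up
constant names, not a claim about the exact code in the patch): drop the
low bits that are constant due to allocator alignment, add the two
pointers, fold the high bits down and mask to the table size.

/* Sketch of hashing a (parent vfsmount *, mountpoint dentry *) pair into a
 * power-of-two table.  CACHE_LINE and TABLE_SHIFT are illustrative values,
 * not the kernel's.  The point: both pointers come from kmem_cache_alloc(),
 * so once the constant low (alignment) bits are dropped the remaining bits
 * are already well scattered - no byte-string hash like jhash is needed,
 * because there is no string to hash. */
#include <stdint.h>

#define CACHE_LINE	64			/* assumed object alignment */
#define TABLE_SHIFT	12			/* e.g. 4096 chains - the 2GB case */
#define TABLE_MASK	((1UL << TABLE_SHIFT) - 1)

static inline unsigned long mnt_pair_hash(const void *parent_mnt,
					  const void *mountpoint_dentry)
{
	unsigned long tmp = (uintptr_t)parent_mnt / CACHE_LINE;

	tmp += (uintptr_t)mountpoint_dentry / CACHE_LINE;
	tmp += tmp >> TABLE_SHIFT;		/* fold high bits back in */
	return tmp & TABLE_MASK;
}

With keys like that the chains come out evenly populated, which is all that
matters for the average chain length.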
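
And to spell out why "just use the mount tree" loses: the question answered
on every mountpoint crossing is "which mount is stacked on this (parent,
dentry) pair?". With the hash that costs a walk of one short chain; with
the tree it costs a walk of *all* children of the parent. A simplified
sketch (toy userspace structs, not the kernel's actual types):

#include <stddef.h>

struct dentry;				/* opaque for the purpose of the sketch */

struct mount {
	struct mount	*mnt_parent;		/* parent mount */
	struct dentry	*mnt_mountpoint;	/* dentry we are mounted on */
	struct mount	*hash_next;		/* next entry on the hash chain */
	struct mount	*child_next;		/* next sibling under the same parent */
};

/* table and hash function as in the previous sketches */
extern struct mount *mount_hash[];
extern unsigned long mnt_pair_hash(const void *parent, const void *mountpoint);

/* hash lookup: cost is the length of one chain, which stays short as long
 * as the table scales with the number of mounts */
static struct mount *lookup_by_hash(struct mount *parent, struct dentry *mp)
{
	struct mount *m;

	for (m = mount_hash[mnt_pair_hash(parent, mp)]; m; m = m->hash_next)
		if (m->mnt_parent == parent && m->mnt_mountpoint == mp)
			return m;
	return NULL;
}

/* "use the tree" variant: cost is the number of children of the parent,
 * which is exactly what blows up with referrals, automounts and bindings */
static struct mount *lookup_by_children(struct mount *first_child,
					struct dentry *mp)
{
	struct mount *m;

	for (m = first_child; m; m = m->child_next)
		if (m->mnt_mountpoint == mp)
			return m;
	return NULL;
}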