On Thu, Nov 07, 2024 at 09:54:36AM -1000, Linus Torvalds wrote:
> On Thu, 31 Oct 2024 at 12:31, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Added some stats, and on my load (reading email in the web browser,
> > some xterms and running an allmodconfig kernel build), I get about a
> > 45% hit-rate for the fast-case: out of 44M calls to
> > generic_permission(), about 20M hit the fast-case path.
>
> So the 45% hit rate really bothered me, because on the load I was
> testing I really thought it should be 100%.
>
> And in fact, sometimes it *was* 100% when I did profiles, and I never
> saw the slow case at all. So I saw that odd bimodal behavior where
> sometimes about half the accesses went through the slow path, and
> sometimes none of them did.
>
> It took me way too long to realize why that was the case: the quick
> "do we have ACL's" test works wonderfully well when the ACL
> information is cached, but the cached case isn't always filled in.
>
> For some unfathomable reason I just mindlessly thought that "if the
> ACL info isn't filled in, and we will go to the slow case, it now
> *will* be filled in, so next time around we'll have it in the cache".
>
> But that was just silly of me. We may never call "check_acl()" at all,
> because if we do the lookup as the owner, we never even bother to look
> up any ACL information:
>
>         /* Are we the owner? If so, ACL's don't matter */
>
> So next time around, the ACL info *still* won't be filled in, and so
> we *still* won't take the fastpath.
>
> End result: that patch is not nearly as effective as I would have
> liked. Yes, it actually gets reasonable hit-rates, but the
> ACL_NOT_CACHED state ends up being a lot stickier than my original
> mental model incorrectly throught it would be.
>

How about filesystems maintaining a flag: IOP_EVERYONECANTRAREVERSE?

The name is a keyboardful and not the actual proposal.

Rationale:

To my reading generic_permission gets called for all path components,
where almost all of them just want to check if they can traverse. It so
happens that for the vast majority of real path components the x bit is
there for *everyone*.

Even in the case of /home/$user/crap, while the middle dir has x only
for the owner and maybe the group, everything *below* tends to also be
all x.

I just did a kernel build while poking at the state with bpftrace:

  bpftrace -e 'kprobe:generic_permission { @[(((struct inode *)arg1)->i_mode & 0x49) == 0x49] = count(); }'

result:

  @[0]: 5623736
  @[1]: 64867147

IOW, in 92% of calls everyone had x. Also note this counts non-traversal
calls as well, so the real hit ratio for traversal is higher, so to
speak. I don't use ACLs here, so they were of no consequence anyway, btw.

So if a filesystem cares to be faster, then when instantiating an inode
or getting setattr called on it, it can (re)compute whether there is
anything blocking x for anyone. If nothing is in the way, it can set the
flag and allow link_path_walk to skip everything; otherwise it *unsets*
the flag (as needed).

This is completely transparent to filesystems which don't participate.

So that would be my proposal, no interest in coding it.