On Thu, 18 Jan 2024 at 07:42, Gabriel Krisman Bertazi <krisman@xxxxxxx> wrote: > > But I don't see how this could be a problem. the str pointer itself > can't change since it's already a copy of the dentry->d_name pointer; if > the string is external, it is guaranteed to exist until the end of the > lookup; Finally, If it's inlined, the string can change concurrently > which, in the worst case scenario, gives us a false result. But then, > VFS will retry, like we do for the case-insensitive comparison right > below. > > ..Right? :) Wrong, for two subtle reasons. The issue is not that the string can go away. The issue is that the string and the length have been loaded independently - and may not match. So when you do that if (len == name->len && !memcmp(str, name->name, len)) the 'name->len' you test for equality with 'len' may not match the length of the string allocated in 'name->name', because they are two different loads of two different values, and we do not hold the lock that makes them consistent. See the big comment (and the READ_ONCE()" in dentry_cmp(), and notice how dentry_cmp() intentionally doesn't use 'name->len' at all. Subtle, subtle - but this is *incredibly* performance-critical code. Locks are completely out of the question - they would make the whole RCU pathwalk completely pointless, and the RCU path-walk with no stores to the dentry cache is what makes the dentry cache perform so well. So it's not even that locks have contention, it's literally that RCU path lookup treats the dentries as read-only and you actually get shared cachelines and true parallel lookups. No reference count updates, no *nothing* like that. This is why dentry_cmp() does that magical dentry_string_cmp(), which is very subtle: it knows that regardless of *which* source of naem we have, the word-aligned reads of d_name->name are safe, because (a) the allocations are word-aligned (b) the dentry name allocations all end with a NUL byte (unlike the pathname ones that can have '/' at the end of the name) (c) the *pathname* length is reliable, and the previous word didn't have a NUL byte in it because that would have ended the compare (d) so that "read one word from cs" is safe, *despite* the fact that the length we have isn't something we can rely on (it comes from the pathname side, of course). And yes, this code is subtle. There are very few people who really understand all the dentry rules. But it is *the* most critical piece of code in the kernel for a lot of real-life loads. Looking up pathnames is pretty much "Job #1" of an OS - a lot of the rest is "details". Linus