On Wed, Jun 12, 2024 at 06:23:18PM -0700, Linus Torvalds wrote: > So I have no problem with your patch 2/2 - moving the lockref data > structure away from everything else that can be shared read-only makes > a ton of sense independently of anything else. > > Except you also randomly increased a retry count in there, which makes no sense. > Cmon man, that's a change which unintentionally crept into the patch and I failed to notice. > But this patch 1/2 makes me go "Eww, hacky hacky". > > We already *have* the retry loop, it's just that currently it only > covers the cmpxchg failures. > > The natural thing to do is to just make the "wait for unlocked" be > part of the same loop. > I was playing with a bpftrace probe reporting how many spins were performed waiting for the lock. For my intentional usage with ls it was always < 30 or so. The random-ass intruder which messes with my bench on occasion needed over 100. The bump is something I must have fat-fingered into the wrong patch. Ultimately even if *some* iterations will still take the lock, they should be able to avoid it next time around, so the permanent degradation problem is still solved. > Mind testing this approach instead? > So I'm not going to argue about the fix. I tested your code and confirm it fixes the problem, nothing blows up either and I fwiw I don't see any bugs in it. When writing the commit message feel free to use mine in whatever capacity (including none) you want. On Wed, Jun 12, 2024 at 06:49:00PM -0700, Linus Torvalds wrote: > On Wed, 12 Jun 2024 at 18:23, Linus Torvalds > One of the ideas behind the reflux was that locking should be an > exceptional thing when something special happens. So things like final > dput() and friends. > > What I *think* is going on - judging by your description of how you > triggered this - is that sadly our libfs 'readdir()' thing is pretty > nasty. > > It does use d_lock a lot for the cursor handling, and things like > scan_positives() in particular. > > I understand *why* it does that, and maybe it's practically unfixable, > but I do think the most likely deeper reason for that "go into slow > mode" is the cursor on the directory causing issues. > > Put another way: while I think doing the retry loop will help > benchmarks, it would be lovely if you were to look at that arguably > deeper issue of the 'd_sib' list. > I think lockref claiming to be a general locking facility means it should not be suffering the degradation problem to begin with, so that would be the real problem as far as lockref goes. For vfs as a whole I do agree that the d_lock usage in various spots is probably avoidable and if so would be nice to get patched out, readdir included. So Someone(tm) should certainly look into it. As for me at the moment I have other stuff on my todo list, so I am not going to do it for the time being. Regardless, patching up lockref to dodge the lock is a step forward in the right direction AFAICS. ===== All that aside, you did not indicate how do you want to move forward regarding patch submission. As indicated in my cover letter the vfs change (if it is to land) needs to be placed *after* the lockref issue is addressed, otherwise it may result in bogus regression reports. Thus I think it makes most sense to just ship them together. So maybe you can send me a patch and I send a v2 of the patchset with that, alternatively perhaps you can edit out the unintional retry adjustment from my dentry patch and handle the rest yourself. Or maybe you have some other idea. The thing that matters to me is not landing in a state where d_lockref is moved around, but the lockref corner case is not patched (even it is patched *after*). I really don't want to investigate a regression report only to find it's caused by the above.