On Mon, Mar 23, 2020 at 09:33:42AM -0700, Darrick J. Wong wrote: > On Mon, Mar 23, 2020 at 10:22:02AM +0100, Pavel Reichl wrote: > > Oh, thanks for the heads up...I'll try to investigate. > > Ahah, I figured it out. It took me a while to pin down a solid reproducer, > but here's a stack trace that I see most often: .... > lockdep tracks the rwsem's lock state /and/ which process actually > holds the rwsem. This ownership doesn't transfer from 28177 to 27081, > so when the kworker asks lockdep if it holds ILOCK_EXCL, lockdep says > no, because 27081 doesn't own the lock, 28177 does. Kaboom. Yeah, linux has this weird idea that semaphores can only be accessed from a single process context, like a mutex. We've beaten our head against this many times, and it's the only reason struct semaphore still exists and everything isn't a mutex. rwsems now do crazy stuff like track the owner for spinning optimisations, despite the fact it's a semaphore and, by definition, can be released by non-owner contexts.... > The old mrlock_t had that 'int mr_writer' field which didn't care about > lock ownership and so isilocked would return true and so the assert was > happy. > > So now comes the fun part -- the old isilocked code had a glaring hole > in which it would return true if *anyone* held the lock, even if the > owner is some other unrelated thread. That's probably good enough for > most of the fstests because we generally only run one thread at a time, > and developers will probably notice. :) No, that's not a bug, that's how semaphores work - semaphores protect the object, not the process context that took the lock. i.e. the difference between a mutex and a semaphore is that the mutex protects a code path from running concurrently with other processes, while a semaphore protects an object from other accesses, even when they don't have a process context associated with them (e.g. while they are under IO). > However, with your series applied, the isilocked function becomes much > more powerful when lockdep is active because now we can test that the > lock is held *by the caller*, which closes that hole. Not really, it's just another "lockdep behaviour is wrong" issue we have to work around. When we switch to a worker, we need to release and acquire the lockdep context so that it thinks the current process working on the object owns the lock. > Unfortunately, it also trips over this bmap split case, so if there's a > way to solve that problem we'd better find it quickly. Unfortunately, I > don't know of a way to gift a lock to another thread temporarily... rwsem_release() when the work is queued, rwsem_acquire() when the work is run. And vice versa when the work is done. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx