On Thu, 17 Dec 2020 11:03:20 +0100
Daniel Vetter <daniel.vetter@xxxxxxxx> wrote:

> I think we're tripping over the might_sleep() all the mutexes have,
> and that's not as good as yours, but good enough to catch a missing
> rcu_read_unlock(). That's kinda why I'm baffled, since like almost
> every 2nd function in the backtrace grabbed a mutex and it was all
> fine until the very last.
>
> I think it would be really nice if the rcu checks could retain (in
> debugging only) the backtrace of the outermost rcu_read_lock, so we
> could print that when something goes wrong in cases where it's leaked.
> For normal locks lockdep does that already (well not full backtrace I
> think, just the function that acquired the lock, but that's often
> enough). I guess that doesn't exist yet?
>
> Also yes without reproducer this is kinda tough nut to crack.

I'm looking at drm_client_modeset_commit_atomic(), where it triggered
after the "retry:" label. Getting there takes a bit of goto spaghetti:
a -EDEADLK is detected, followed by a goto backoff, which in turn does
a goto retry, and then the next mutex taken is the one that triggers
the bug.

As this is hard to reproduce, but reproducible by a fuzzer, I'm guessing
there's some error return path somewhere in there that doesn't release
an rcu_read_lock().

-- Steve
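
To illustrate the kind of bug being guessed at above, here is a minimal
userspace sketch. It is not kernel code and makes no claim about where
the leak actually is. Every name in it (rcu_read_lock_model(),
might_sleep_model(), mutex_lock_model(), lookup_model(), commit_model())
is a hypothetical stand-in that only counts nesting depth. It shows how
an error return that skips the unlock stays silent until the first
mutex taken after the "retry:" label.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-ins for illustration only, NOT the kernel primitives. */
static int rcu_depth;        /* models RCU read-side nesting depth */
static int sleep_violations; /* might_sleep hits while depth > 0 */

static void rcu_read_lock_model(void)   { rcu_depth++; }
static void rcu_read_unlock_model(void) { rcu_depth--; }

/* Models the check that sleeping locks perform before blocking. */
static void might_sleep_model(void)
{
	if (rcu_depth > 0) {
		sleep_violations++;
		fprintf(stderr, "BUG: sleeping call inside read-side section (depth %d)\n",
			rcu_depth);
	}
}

static void mutex_lock_model(void) { might_sleep_model(); }

/* Hypothetical helper whose error path forgets the unlock. */
static int lookup_model(bool fail)
{
	rcu_read_lock_model();
	if (fail)
		return -EDEADLK; /* leak: returns without unlocking */
	rcu_read_unlock_model();
	return 0;
}

/* Same shape as the flow described above: error, backoff, retry. */
static int commit_model(bool fail_first)
{
	bool fail = fail_first;
	int ret;

retry:
	mutex_lock_model(); /* first mutex taken after the retry label */
	ret = lookup_model(fail);
	if (ret == -EDEADLK)
		goto backoff;
	return ret;

backoff:
	fail = false; /* the second attempt succeeds */
	goto retry;
}
```

Calling commit_model(true) leaks one level of nesting on the first
pass, and the complaint only shows up at the mutex taken after the
retry, far from the function that leaked. That is the same reason a
recorded backtrace of the outermost rcu_read_lock() would help.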