Hi,

On Mon, Aug 01, 2022 at 10:04:46PM -0700, Dipanjan Das wrote:
> On Sun, Jul 31, 2022 at 2:53 AM Willy Tarreau <w@xxxxxx> wrote:
> >
> > Thus I'm a bit confused about what to look for. It's very likely that
> > there are still bugs left in this driver, but trying to identify them
> > and to validate a fix will be difficult if they cannot be reproduced.
> > Maybe they only happen under emulation due to timing issues.
> >
> > As such, any hint about the exact setup and how long to wait to get
> > the error would be much appreciated.
>
> We can confirm that we were able to trigger the issue on the latest
> 5.19 (commit: 3d7cb6b04c3f3115719235cc6866b10326de34cd) with the
> C-repro within a VM. We use this:
> https://syzkaller.appspot.com/text?tag=KernelConfig&x=cd73026ceaed1402
> config to build the kernel. The issue triggers after around 143
> seconds. For all the five times we tried, we were able to reproduce
> the issue deterministically every time. Please let us know if you need
> any other information.

Yep, I could reproduce it under qemu as well. I've added traces, and
ugly things are happening with the lock (but I haven't understood what
yet). What I saw was this sequence:

  1. process_fd_request() is called under the lock;
  2. the lock is dropped;
  3. __floppy_read_block_0() is called under the lock, and it calls
     process_fd_request();
  4. the lock is dropped and wait_for_completion() is called;
  5. process_fd_request() is called again, this time without the lock;
  6. from there we loop in fd_wait_for_completion().

I need to dig into the details, but it doesn't seem right to me that
process_fd_request() is sometimes called with the lock held and
sometimes without, nor that __floppy_read_block_0() is called with the
lock held and that the lock gets released from within it. I may have
missed some things due to the concurrent accesses, but in any case I
probably should not be observing this. I'll try to dig deeper.

I really don't know that area, and I must confess it's not the most
exciting one to rediscover each time :-)

Thanks,
Willy