On Tue, May 23, 2023 at 03:34:31PM +0200, Jan Kara wrote:
> I've checked the code and AFAICT it is all indeed handled. BTW, I've now
> remembered that GFS2 has dealt with the same deadlocks - b01b2d72da25
> ("gfs2: Fix mmap + page fault deadlocks for direct I/O") - in a different
> way (by prefaulting pages from the iter before grabbing the problematic
> lock and then disabling page faults for the iomap_dio_rw() call). I guess
> we should somehow unify these schemes so that we don't have two mechanisms
> for avoiding exactly the same deadlock. Adding GFS2 guys to CC.
>
> Also good that you've written a fstest for this, that is definitely a useful
> addition, although I suspect GFS2 guys added a test for this not so long
> ago when testing their stuff. Maybe they have a pointer handy?

generic/708 is the btrfs version of this.

But I think all of the file systems that have this deadlock are actually
fundamentally broken because they have a messed up locking hierarchy
where page faults take the same lock that is held over the direct I/O
operation. And the right thing is to fix this. I have work in
progress for btrfs, and something similar should apply to gfs2, with
the added complication that it probably means a revision to their
network protocol.

I'm absolutely not in favour of adding workarounds for these kinds of
locking problems to the core kernel. I already feel bad for allowing the
small workaround in iomap for btrfs, as just fixing the locking back
then would have avoided massive ratholing.
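
For reference, the gfs2 scheme quoted above (prefault the user pages
before taking the lock, then do the I/O with page faults disabled and
retry if it comes up short) can be modelled roughly as below. This is a
toy userspace sketch, not kernel code and not what b01b2d72da25 actually
does: struct model, prefault(), io_nofault(), direct_io(), NPAGES and
WINDOW are all made-up names, and the "lock" is just a flag standing in
for the lock that the page fault path would also take.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only. Every name here is hypothetical, not a kernel API. */
#define NPAGES 8   /* pages in the user buffer */
#define WINDOW 4   /* pages prefaulted per pass (arbitrary choice) */

struct model {
	bool resident[NPAGES];	/* is this user page faulted in? */
	bool lock_held;		/* stands in for the lock page faults also take */
	int passes;		/* number of lock/unlock cycles */
};

/*
 * Fault in up to WINDOW pages starting at 'start'.  A real page fault
 * would take the lock, so this must only run while the lock is NOT held.
 */
static void prefault(struct model *m, size_t start)
{
	assert(!m->lock_held);
	for (size_t i = start; i < NPAGES && i < start + WINDOW; i++)
		m->resident[i] = true;
}

/*
 * I/O with page faults disabled: runs under the lock and stops at the
 * first non-resident page instead of faulting.  Returns pages transferred.
 */
static size_t io_nofault(const struct model *m, size_t start)
{
	size_t n = 0;

	assert(m->lock_held);
	while (start + n < NPAGES && m->resident[start + n])
		n++;
	return n;
}

/*
 * Retry loop: prefault outside the lock, do the I/O inside it, and
 * repeat until the whole buffer has been transferred.
 */
static size_t direct_io(struct model *m)
{
	size_t done = 0;

	while (done < NPAGES) {
		prefault(m, done);
		m->lock_held = true;
		done += io_nofault(m, done);
		m->lock_held = false;
		m->passes++;
	}
	return done;
}
```

The point of the model is only that no fault is ever taken while the
lock is held; the cost is the retry loop, which is the kind of
workaround the reply above argues against in favour of fixing the
locking hierarchy itself.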