On Fri, May 26, 2023 at 12:25:31AM +0200, Andreas Grünbacher wrote:
> On Tue, May 23, 2023 at 18:28, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> > On Tue, May 23, 2023 at 03:34:31PM +0200, Jan Kara wrote:
> > > I've checked the code and AFAICT it is all indeed handled. BTW, I've now
> > > remembered that GFS2 has dealt with the same deadlocks - b01b2d72da25
> > > ("gfs2: Fix mmap + page fault deadlocks for direct I/O") - in a different
> > > way (by prefaulting pages from the iter before grabbing the problematic
> > > lock and then disabling page faults for the iomap_dio_rw() call). I guess
> > > we should somehow unify these schemes so that we don't have two mechanisms
> > > for avoiding exactly the same deadlock. Adding GFS2 guys to CC.
> > >
> > > Also good that you've written a fstest for this, that is definitely a useful
> > > addition, although I suspect GFS2 guys added a test for this not so long
> > > ago when testing their stuff. Maybe they have a pointer handy?
> >
> > generic/708 is the btrfs version of this.
> >
> > But I think all of the file systems that have this deadlock are actually
> > fundamentally broken because they have a messed up locking hierarchy
> > where page faults take the same lock that is held over the direct I/O
> > operation. And the right thing is to fix this. I have work in progress
> > for btrfs, and something similar should apply to gfs2, with the added
> > complication that it probably means a revision to their network
> > protocol.
>
> We do disable page faults, and there can be deadlocks in page fault
> handlers while no page faults are allowed.
>
> I'm roughly aware of the locking hierarchy that other filesystems use,
> and that's something we want to avoid because of two reasons: (1) it
> would be an incompatible change, and (2) we want to avoid cluster-wide
> locking operations as much as possible because they are very slow.
>
> These kinds of locking conflicts are so rare in practice that the
> theoretical inefficiency of having to retry the operation doesn't
> matter.

Would you be willing to expand on that? I'm wondering if this would
simplify things for gfs2, but you mention the locking hierarchy being an
incompatible change - how does that work?

>
> > I'm absolutely not in favour of adding workarounds for these kinds of locking
> > problems to the core kernel. I already feel bad for allowing the
> > small workaround in iomap for btrfs, as just fixing the locking back
> > then would have avoided massive ratholing.
>
> Please let me know when those btrfs changes are in a presentable shape ...

I would also be curious to know what btrfs needs and what the approach
is there.