Am Do., 25. Mai 2023 um 10:56 Uhr schrieb Jan Kara <jack@xxxxxxx>: > On Tue 23-05-23 12:49:06, Kent Overstreet wrote: > > > > No, that's definitely handled (and you can see it in the code I linked), > > > > and I wrote a torture test for fstests as well. > > > > > > I've checked the code and AFAICT it is all indeed handled. BTW, I've now > > > remembered that GFS2 has dealt with the same deadlocks - b01b2d72da25 > > > ("gfs2: Fix mmap + page fault deadlocks for direct I/O") - in a different > > > way (by prefaulting pages from the iter before grabbing the problematic > > > lock and then disabling page faults for the iomap_dio_rw() call). I guess > > > we should somehow unify these schemes so that we don't have two mechanisms > > > for avoiding exactly the same deadlock. Adding GFS2 guys to CC. > > > > Oof, that sounds a bit sketchy. What happens if the dio call passes in > > an address from the same address space? > > If we submit direct IO that uses mapped file F at offset O as a buffer for > direct IO from file F, offset O, it will currently livelock in an > indefinite retry loop. It should rather return error or fall back to > buffered IO. But that should be fixable. Andreas? Yes, I guess. Thanks for the heads-up. Andreas > But if the buffer and direct IO range does not overlap, it will just > happily work - iomap_dio_rw() invalidates only the range direct IO is done > to. > > > What happens if we race with the pages we faulted in being evicted? > > We fault them in again and retry. > > > > Also good that you've written a fstest for this, that is definitely a useful > > > addition, although I suspect GFS2 guys added a test for this not so long > > > ago when testing their stuff. Maybe they have a pointer handy? > > > > More tests more good. > > > > So if we want to lift this scheme to the VFS layer, we'd start by > > replacing the lock you added (grepping for it, the name escapes me) with > > a different type of lock - two_state_shared_lock in my code, it's like a > > rw lock except writers don't exclude other writers. That way the DIO > > path can use it without singlethreading writes to a single file. > > Yes, I've noticed that you are introducing in bcachefs a lock with very > similar semantics to mapping->invalidate_lock, just with this special lock > type. What I'm kind of worried about with two_state_shared_lock as > implemented in bcachefs is the fairness. AFAICS so far if someone is e.g. > heavily faulting pages on a file, direct IO to that file can be starved > indefinitely. That is IMHO not a good thing and I would not like to use > this type of lock in VFS until this problem is resolved. But it should be > fixable e.g. by introducing some kind of deadline for a waiter after which > it will block acquisitions of the other lock state. > > Honza > -- > Jan Kara <jack@xxxxxxxx> > SUSE Labs, CR