On Tue, May 23, 2023 at 15:37, Jan Kara <jack@xxxxxxx> wrote:
> On Wed 10-05-23 02:18:45, Kent Overstreet wrote:
> > On Wed, May 10, 2023 at 03:07:37AM +0200, Jan Kara wrote:
> > > On Tue 09-05-23 12:56:31, Kent Overstreet wrote:
> > > > From: Kent Overstreet <kent.overstreet@xxxxxxxxx>
> > > >
> > > > This is used by bcachefs to fix a page cache coherency issue with
> > > > O_DIRECT writes.
> > > >
> > > > Also relevant: mapping->invalidate_lock, see below.
> > > >
> > > > O_DIRECT writes (and other filesystem operations that modify file data
> > > > while bypassing the page cache) need to shoot down ranges of the page
> > > > cache - and additionally, need locking to prevent those pages from
> > > > being pulled back in.
> > > >
> > > > But O_DIRECT writes invoke the page fault handler (via get_user_pages),
> > > > and the page fault handler will need to take that same lock - this is a
> > > > classic recursive deadlock if userspace has mmaped the file they're DIO
> > > > writing to and uses those pages for the buffer to write from, and it's a
> > > > lock ordering deadlock in general.
> > > >
> > > > Thus we need a way to signal from the dio code to the page fault handler
> > > > when we already are holding the pagecache add lock on an address space -
> > > > this patch just adds a member to task_struct for this purpose. For now
> > > > only bcachefs is implementing this locking, though it may be moved out
> > > > of bcachefs and made available to other filesystems in the future.
> > >
> > > It would be nice to have at least a link to the code that's actually using
> > > the field you are adding.
> >
> > Bit of a trick to link to a _later_ patch in the series from a commit
> > message, but...
> >
> > https://evilpiepirate.org/git/bcachefs.git/tree/fs/bcachefs/fs-io.c#n975
> > https://evilpiepirate.org/git/bcachefs.git/tree/fs/bcachefs/fs-io.c#n2454
>
> Thanks and I'm sorry for the delay.
>
> > > Also I think we were already through this discussion [1] and we ended up
> > > agreeing that your scheme actually solves only the AA deadlock but a
> > > malicious userspace can easily create an AB-BA deadlock by running direct IO
> > > to file A using mapped file B as a buffer *and* direct IO to file B using
> > > mapped file A as a buffer.
> >
> > No, that's definitely handled (and you can see it in the code I linked),
> > and I wrote a torture test for fstests as well.
>
> I've checked the code and AFAICT it is all indeed handled. BTW, I've now
> remembered that GFS2 has dealt with the same deadlocks - b01b2d72da25
> ("gfs2: Fix mmap + page fault deadlocks for direct I/O") - in a different
> way (by prefaulting pages from the iter before grabbing the problematic
> lock and then disabling page faults for the iomap_dio_rw() call). I guess
> we should somehow unify these schemes so that we don't have two mechanisms
> for avoiding exactly the same deadlock. Adding GFS2 guys to CC.
>
> Also good that you've written an fstest for this, that is definitely a useful
> addition, although I suspect GFS2 guys added a test for this not so long
> ago when testing their stuff. Maybe they have a pointer handy?

Ah yes, that's xfstests commit d3cbdabf ("generic: Test page faults
during read and write").

Thanks,
Andreas

> Honza
> --
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR
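
For readers following the thread, the scheme Jan attributes to GFS2 (prefault
the buffer pages before taking the problematic lock, then copy with page
faults disabled and retry on a miss) can be modelled with a small userspace
toy. This is only a sketch under stated assumptions: ToyFile, fault_in,
dio_write and evict_once are hypothetical names made up for illustration, the
"lock" is a plain flag, and none of this is kernel code or the actual GFS2 or
bcachefs implementation.

```python
class ToyFile:
    """Hypothetical stand-in for a file with one lock and a page cache."""

    def __init__(self, name):
        self.name = name
        self.locked = False      # models the lock the fault handler also needs
        self.resident = set()    # page indices currently in the "page cache"


def fault_in(buf_file, pages):
    """Model a page fault: it must take the buffer file's lock to add pages."""
    if buf_file.locked:
        # In the kernel this would be the recursive (AA) deadlock.
        raise RuntimeError("deadlock: fault needs a lock that is already held")
    buf_file.resident.update(pages)


def dio_write(target, buf_file, pages, evict_once=None):
    """Prefault outside the lock, copy with faults 'disabled', retry on a miss.

    Returns the number of attempts needed to copy all pages.
    """
    attempts = 0
    done = 0
    while done < len(pages):
        attempts += 1
        fault_in(buf_file, pages[done:])       # prefault before locking
        if evict_once is not None:             # simulate reclaim racing with us
            buf_file.resident.discard(evict_once)
            evict_once = None
        target.locked = True                   # take the problematic lock
        try:
            # Faults are "disabled": stop at the first non-resident page
            # instead of calling fault_in() with the lock held.
            while done < len(pages) and pages[done] in buf_file.resident:
                done += 1
        finally:
            target.locked = False              # drop the lock before retrying
    return attempts
```

Passing the same ToyFile as both target and buf_file models the case where
the DIO buffer is an mmap of the file being written. The write completes
because fault_in() is only ever called with the lock dropped, at the cost of
a retry whenever a page disappears between prefault and copy.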