On Tue, Jul 16, 2013 at 01:18:06PM -0700, Linus Torvalds wrote: > On Tue, Jul 16, 2013 at 12:33 PM, Ben Myers <bpm@xxxxxxx> wrote: > >> > > >> > And looking more at that, I'm actually starting to think this is an > >> > XFS locking problem. XFS really should not call back to splice while > >> > holding the inode lock. > > .. that was misleading, normally "inode lock" would be i_lock, but > here I meant the XFS-specific i_iolock. > > But you clearly picked up on it: > > > CPU0 CPU1 CPU2 > > ---- ---- ---- > > lock(&sig->cred_guard_mutex); > > lock(&pipe->mutex/1); > > lock(&(&ip->io_lock)->mr_lock); > > lock(&(&ip->io_lock)->mr_lock); > > lock(&sig->cred_guard_mutex); > > lock(&pipe->mutex/1); > > Yup. > > > I agree that fixing this in XFS seems to be the most reasonable plan, > > and Dave's approach looks ok to me. Seems like patch 1 should go > > through Al's tree, but we'll also need to commit it to the xfs tree > > prerequisite to patch 2. > > Btw, is there some reason why XFS couldn't just use > generic_file_splice_read() directly? Yes - IO is serialised based on the ip->i_iolock, not i_mutex. We don't use i_mutex for many things IO related, and so internal locking is needed to serialise against stuff like truncate, hole punching, etc, that are run through non-vfs interfaces. > And splice has mmap semantics - the whole point of splice is about > moving pages around, after all - so I *think* your current > "xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);" is actually over-serialization. No, that's just taking the ip->i_iolock in shared mode - that's less serialisation than holding i_mutex as it allow parallel read operations but still locks out concurrent buffered writes to the file (i.e. posix atomic write vs read requirements) > The pages will be shared by the pipe buffers anyway, so splicing by > definition doesn't imply full data serialization (because the reading > of the data itself will happen much later). > > So the per-page serialization done by readpage() should already be > sufficient, no? > > I dunno. Maybe there's something I'm missing. XFS does seem to wrap > all the other generic functions in that lock too, but since mmap() etc > clearly have to be able to get things one page at a time (without any > wrapping at higher layers), I'm just wondering whether splice_read > might not be able to avoid it. Read isn't the problem - it's write that's the deadlock issue... Cheers, Dave. > > Linus > -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs