On Thu, Jul 18, 2013 at 05:21:17PM -0500, Ben Myers wrote: > Dave, > > On Thu, Jul 18, 2013 at 04:16:32PM -0500, Ben Myers wrote: > > On Thu, Jul 18, 2013 at 01:42:03PM +1000, Dave Chinner wrote: > > > On Wed, Jul 17, 2013 at 05:17:36PM -0700, Linus Torvalds wrote: > > > > On Wed, Jul 17, 2013 at 4:40 PM, Ben Myers <bpm@xxxxxxx> wrote: > > > > >> > > > > >> We're still talking at cross purposes then. > > > > >> > > > > >> How the hell do you handle mmap() and page faulting? > > > > > > > > > > __xfs_get_blocks serializes access to the block map with the i_lock on the > > > > > xfs_inode. This appears to be racy with respect to hole punching. > > > > > > > > Would it be possible to just make __xfs_get_blocks get the i_iolock > > > > (non-exclusively)? > > > > > > No. __xfs_get_blocks() operates on metadata (e.g. extent lists), and > > > as such is protected by the i_ilock (note: not the i_iolock). i.e. > > > XFS has a multi-level locking strategy: > > > > > > i_iolock is provided for *data IO serialisation*, > > > i_ilock is for *inode metadata serialisation*. > > > > I think if __xfs_get_blocks has some way of knowing it is the mmap/page fault > > path, taking the iolock shared in addition to the ilock (in just that case) > > would prevent the mmap from being able to read stale data from disk. You would > > see either the data before the punch or you would see the hole. > > > > Actually... I think that is wrong: You'd have to take the iolock across the > > read itself (not just the access to the block map) for it to have the desired > > effect: > > > > 1608 int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf) > > ... > > 1704 page_not_uptodate: > > 1705 /* > > 1706 * Umm, take care of errors if the page isn't up-to-date. > > 1707 * Try to re-read it _once_. We do this synchronously, > > 1708 * because there really aren't any performance issues here > > 1709 * and we need to check for errors. > > 1710 */ > > 1711 ClearPageError(page); > > 1712 error = mapping->a_ops->readpage(file, page); > > 1713 if (!error) { > > 1714 wait_on_page_locked(page); > > 1715 if (!PageUptodate(page)) > > 1716 error = -EIO; > > 1717 } > > 1718 page_cache_release(page); > > > > Wouldn't you have to hold the iolock until after wait_on_page_locked returns? > > Maybe like so (crappy/untested/probably wrong/fodder for ridicule/etc): Try running it with lockdep. You'll see pretty quickly why you can't take the i_iolock or i_mutex in the ->fault path - it is called with the mmap_sem held. The lock inversion that can deadlock is that the page fault might be occurring in the read path that is already holding the i_mutex/i_iolock.... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs