Re: hole-punch vs fault

Matthew Wilcox <matthew@xxxxxx> · Mon, 2 Dec 2013 13:13:15 -0700

On Mon, Dec 02, 2013 at 08:58:25AM -0700, Matthew Wilcox wrote:
> On Mon, Dec 02, 2013 at 09:33:18AM +0100, Jan Kara wrote:
> > > Indeed, XIP could make use of the structures we already
> > > have in the struct inode/address space to behave more like a normal
> > > filesystem. We already use the mapping tree to store
> > > per-page information that is used to serialise truncate vs page
> > > faults, so why not make XIP do exactly the same thing? 
> >   I believe that grabbing mmap_sem for writing during truncate (in case of
> > ext2 around xip_truncate_page() & truncate_setsize() calls should do the
> > trick. But I need to verify with lockdep that it doesn't introduce new
> > locking problems.
> 
> Umm ... mmap_sem for write?  On each of the tasks that have a file mmaped?
> I know we have double_down that orders by memory address, but I don't
> think we have an N_down_write().

Here's a simpler solution: Grab i_mmap_sem over vm_insert_mixed().
i_mmap_sem is already taken in unmap_mapping_range(), called from
truncate_pagecache(), so we know that while we hold i_mmap_sem that
i_size has already been updated and the block can't be deallocated
from under us.  We know that i_mmap_sem is safe to take here because it
already is (in __xip_unmap()).

Now we need only worry about holepunch, and it's the same problem for
pagecache and XIP files alike.

diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c
index d8d9fe3..f6de4c3 100644
--- a/mm/filemap_xip.c
+++ b/mm/filemap_xip.c
@@ -260,8 +260,17 @@ again:
 		__xip_unmap(mapping, vmf->pgoff);
 
 found:
+		/* We must recheck i_size under i_mmap_mutex */
+		mutex_lock(&mapping->i_mmap_mutex);
+		size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >>
+							PAGE_CACHE_SHIFT;
+		if (unlikely(vmf->pgoff >= size)) {
+			mutex_unlock(&mapping->i_mmap_mutex);
+			return VM_FAULT_SIGBUS;
+		}
 		err = vm_insert_mixed(vma, (unsigned long)vmf->virtual_address,
 							xip_pfn);
+		mutex_unlock(&mapping->i_mmap_mutex);
 		if (err == -ENOMEM)
 			return VM_FAULT_OOM;
 		/*
@@ -285,6 +294,15 @@ found:
 		}
 		if (error != -ENODATA)
 			goto out;
+
+		/* We must recheck i_size under i_mmap_mutex */
+		mutex_lock(&mapping->i_mmap_mutex);
+		size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >>
+							PAGE_CACHE_SHIFT;
+		if (unlikely(vmf->pgoff >= size)) {
+			ret = VM_FAULT_SIGBUS;
+			goto out;
+		}
 		/* not shared and writable, use xip_sparse_page() */
 		page = xip_sparse_page();
 		if (!page)
@@ -296,6 +314,7 @@ found:
 
 		ret = VM_FAULT_NOPAGE;
 out:
+		mutex_unlock(&mapping->i_mmap_mutex);
 		write_seqcount_end(&xip_sparse_seq);
 		mutex_unlock(&xip_sparse_mutex);
 

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html