Re: hole-punch vs fault

Jan Kara <jack@xxxxxxx> · Tue, 3 Dec 2013 00:13:01 +0100



On Mon 02-12-13 13:13:15, Matthew Wilcox wrote:
> On Mon, Dec 02, 2013 at 08:58:25AM -0700, Matthew Wilcox wrote:
> > On Mon, Dec 02, 2013 at 09:33:18AM +0100, Jan Kara wrote:
> > > > Indeed, XIP could make use of the structures we already
> > > > have in the struct inode/address space to behave more like a normal
> > > > filesystem. We already use the mapping tree to store
> > > > per-page information that is used to serialise truncate vs page
> > > > faults, so why not make XIP do exactly the same thing? 
> > >   I believe that grabbing mmap_sem for writing during truncate (in case of
> > > ext2 around xip_truncate_page() & truncate_setsize() calls should do the
> > > trick. But I need to verify with lockdep that it doesn't introduce new
> > > locking problems.
> > 
> > Umm ... mmap_sem for write?  On each of the tasks that have a file mmaped?
> > I know we have double_down that orders by memory address, but I don't
> > think we have an N_down_write().
> 
> Here's a simpler solution: Grab i_mmap_sem over vm_insert_mixed().
> i_mmap_sem is already taken in unmap_mapping_range(), called from
> truncate_pagecache(), so we know that while we hold i_mmap_sem that
> i_size has already been updated and the block can't be deallocated
> from under us.  We know that i_mmap_sem is safe to take here because it
> already is (in __xip_unmap()).
  Yeah, that should do the trick.

								Honza

> Now we need only worry about holepunch, and it's the same problem for
> pagecache and XIP files alike.
> 
> diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c
> index d8d9fe3..f6de4c3 100644
> --- a/mm/filemap_xip.c
> +++ b/mm/filemap_xip.c
> @@ -260,8 +260,17 @@ again:
>  		__xip_unmap(mapping, vmf->pgoff);
>  
>  found:
> +		/* We must recheck i_size under i_mmap_mutex */
> +		mutex_lock(&mapping->i_mmap_mutex);
> +		size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >>
> +							PAGE_CACHE_SHIFT;
> +		if (unlikely(vmf->pgoff >= size)) {
> +			mutex_unlock(&mapping->i_mmap_mutex);
> +			return VM_FAULT_SIGBUS;
> +		}
>  		err = vm_insert_mixed(vma, (unsigned long)vmf->virtual_address,
>  							xip_pfn);
> +		mutex_unlock(&mapping->i_mmap_mutex);
>  		if (err == -ENOMEM)
>  			return VM_FAULT_OOM;
>  		/*
> @@ -285,6 +294,15 @@ found:
>  		}
>  		if (error != -ENODATA)
>  			goto out;
> +
> +		/* We must recheck i_size under i_mmap_mutex */
> +		mutex_lock(&mapping->i_mmap_mutex);
> +		size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >>
> +							PAGE_CACHE_SHIFT;
> +		if (unlikely(vmf->pgoff >= size)) {
> +			ret = VM_FAULT_SIGBUS;
> +			goto out;
> +		}
>  		/* not shared and writable, use xip_sparse_page() */
>  		page = xip_sparse_page();
>  		if (!page)
> @@ -296,6 +314,7 @@ found:
>  
>  		ret = VM_FAULT_NOPAGE;
>  out:
> +		mutex_unlock(&mapping->i_mmap_mutex);
>  		write_seqcount_end(&xip_sparse_seq);
>  		mutex_unlock(&xip_sparse_mutex);
>  
> 
> -- 
> Matthew Wilcox				Intel Open Source Technology Centre
> "Bill, look, we understand that you're interested in selling us this
> operating system, but compare it to ours.  We can't possibly take such
> a retrograde step."
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html