Re: hole-punch vs fault

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 28 Nov 2013 15:22:48 +1100

On Wed, Nov 27, 2013 at 07:33:43PM -0700, Matthew Wilcox wrote:
> On Wed, Nov 27, 2013 at 11:19:32PM +0100, Jan Kara wrote:
> > > The second is to put some kind of generation counter in the inode or
> > > address_space that is incremented on every deallocation.  The fault path
> > > would read the generation counter before calling find_get_page() and then
> > > check it hasn't changed after getting the page lock (goto retry_find).  I
> > > like that one a little more since we can fix it all in common code, and I
> > > think it'll be lower overhead in the fault path.
> >   I'm not sure I understand how this should work. After pagecache is
> > truncated, fault can come and happily instantiate the page again. After
> > that fault is done, hole punching awakes and removes blocks from under that
> > page. So checking the generation counter after you get the page lock seems
> > useless to me.
> 
> Yeah, I don't think the page lock is enough (maybe someone can convince
> me otherwise).  How does this look?  Checking the generation number
> after we get i_mutex ensures that the truncate has finished running.

But there's no guarantee that whoever is removing the pages from the
page cache is holding the i_mutex (XFS hole punching and most page
cache operations serialise on XFS_IOLOCK_EXCL, not i_mutex), so you
can't use i_mutex in this way for generic code.

Besides...

> @@ -1617,6 +1617,7 @@ int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>  	pgoff_t offset = vmf->pgoff;
>  	struct page *page;
>  	pgoff_t size;
> +	unsigned damage = mapping_damage(mapping);
>  	int ret = 0;
>  
>  	size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
> @@ -1676,6 +1677,18 @@ retry_find:
>  		return VM_FAULT_SIGBUS;
>  	}
>  
> +	/*
> +	 * Check if we were the unlucky victim of a holepunch
> +	 */
> +	mutex_lock(&inode->i_mutex);
> +	if (unlikely(mapping_is_damaged(mapping, damage))) {
> +		mutex_unlock(&inode->i_mutex);
> +		unlock_page(page);
> +		page_cache_release(page);
> +		damage = mapping_damage(mapping);
> +		goto retry_find;
> +	}

... aren't we holding the mmap_sem here? i.e. taking the i_mutex
here will lead to deadlocks....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html