Re: hole-punch vs fault

Dave Chinner <david@xxxxxxxxxxxxx> · Fri, 29 Nov 2013 09:12:46 +1100

On Wed, Nov 27, 2013 at 09:44:54PM -0700, Matthew Wilcox wrote:
> On Wed, Nov 27, 2013 at 07:33:43PM -0700, Matthew Wilcox wrote:
> > On Wed, Nov 27, 2013 at 11:19:32PM +0100, Jan Kara wrote:
> > > > The second is to put some kind of generation counter in the inode or
> > > > address_space that is incremented on every deallocation.  The fault path
> > > > would read the generation counter before calling find_get_page() and then
> > > > check it hasn't changed after getting the page lock (goto retry_find).  I
> > > > like that one a little more since we can fix it all in common code, and I
> > > > think it'll be lower overhead in the fault path.
> > >   I'm not sure I understand how this should work. After pagecache is
> > > truncated, fault can come and happily instantiate the page again. After
> > > that fault is done, hole punching awakes and removes blocks from under that
> > > page. So checking the generation counter after you get the page lock seems
> > > useless to me.
> > 
> > Yeah, I don't think the page lock is enough (maybe someone can convince
> > me otherwise).  How does this look?  Checking the generation number
> > after we get i_mutex ensures that the truncate has finished running.
> 
> Ohh.  We can't take i_mutex because that's an inversion with mmap_sem.
> So option 1 is to convince ourselves that page lock is enough (and that
> doesn't solve the problem for ext2-xip).
> 
> Option 2 is to do something similar to seqcount_t.  We can't use
> seqcount_t as-is because it spins in read_seqcount_begin, and it could
> be spinning for a very long time while we read a page in from disc!

Option 2 is really any other method of synchronisation. It doesn't
matter is it's a seqlock like construct, or a more complex
shared/exclusive range lock - it's still the "same" solution.

Indeed, we've already discussed this at length - if we are going to
introduce a new method of synchronisation between mmap and
holepunch, then we cannot ignore all the other mmap vs IO
synchronisation problems we have that are caused by the same issue.
Hacking workarounds into the holepunch/mmap paths is not a robust or
maintainable solution.

> How about something like this:
> 
> typedef struct seqtex {
> 	unsigned		sequence;
> 	spinlock_t		wait_lock;
> 	struct list_head	wait_list;
> #ifdef CONFIG_DEBUG_LOCK_ALLOC
> 	struct lockdep_map	dep_map;
> #endif
> }

The biggest problem I see here is that

	a) it is a entire file lock and truncate/holepunch do not
	   affect the entire scope of the file. i.e. there is no
	   reason why holepunch cannot run concurrently with IO if
	   they are to different ranges of a file.

	b) it increases the struct inode by at least 5%. If we are
	   going to increase the inode by that much, we better make
	   sure we solve more than one isolated, relatively rare
	   corner case behaviour.

We need to do better than playing whack-a-mole with a big hammer
here...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html