On Wed, Nov 27, 2013 at 09:44:54PM -0700, Matthew Wilcox wrote: > On Wed, Nov 27, 2013 at 07:33:43PM -0700, Matthew Wilcox wrote: > > On Wed, Nov 27, 2013 at 11:19:32PM +0100, Jan Kara wrote: > > > > The second is to put some kind of generation counter in the inode or > > > > address_space that is incremented on every deallocation. The fault path > > > > would read the generation counter before calling find_get_page() and then > > > > check it hasn't changed after getting the page lock (goto retry_find). I > > > > like that one a little more since we can fix it all in common code, and I > > > > think it'll be lower overhead in the fault path. > > > I'm not sure I understand how this should work. After pagecache is > > > truncated, fault can come and happily instantiate the page again. After > > > that fault is done, hole punching awakes and removes blocks from under that > > > page. So checking the generation counter after you get the page lock seems > > > useless to me. > > > > Yeah, I don't think the page lock is enough (maybe someone can convince > > me otherwise). How does this look? Checking the generation number > > after we get i_mutex ensures that the truncate has finished running. > > Ohh. We can't take i_mutex because that's an inversion with mmap_sem. > So option 1 is to convince ourselves that page lock is enough (and that > doesn't solve the problem for ext2-xip). > > Option 2 is to do something similar to seqcount_t. We can't use > seqcount_t as-is because it spins in read_seqcount_begin, and it could > be spinning for a very long time while we read a page in from disc! Option 2 is really any other method of synchronisation. It doesn't matter is it's a seqlock like construct, or a more complex shared/exclusive range lock - it's still the "same" solution. Indeed, we've already discussed this at length - if we are going to introduce a new method of synchronisation between mmap and holepunch, then we cannot ignore all the other mmap vs IO synchronisation problems we have that are caused by the same issue. Hacking workarounds into the holepunch/mmap paths is not a robust or maintainable solution. > How about something like this: > > typedef struct seqtex { > unsigned sequence; > spinlock_t wait_lock; > struct list_head wait_list; > #ifdef CONFIG_DEBUG_LOCK_ALLOC > struct lockdep_map dep_map; > #endif > } The biggest problem I see here is that a) it is a entire file lock and truncate/holepunch do not affect the entire scope of the file. i.e. there is no reason why holepunch cannot run concurrently with IO if they are to different ranges of a file. b) it increases the struct inode by at least 5%. If we are going to increase the inode by that much, we better make sure we solve more than one isolated, relatively rare corner case behaviour. We need to do better than playing whack-a-mole with a big hammer here... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html