Re: [PATCH 1/1 v4] ext4: fix xfstests 75, 112, 127 punch hole failure


 



Thanks, much better.  I still have some questions though.

>  /*
> + * ext4_unmap_partial_page_buffers()
> + * Unmaps a page range of length 'length' starting from offset
> + * 'from'.  The range to be unmapped must be contained within
> + * one page.  If the specified range exceeds the end of the page
> + * it will be shortened to the end of the page that corresponds to
> + * 'from'.  Only block-aligned buffers will be unmapped; buffers
> + * that are not block aligned are skipped.
> + */
> +static int ext4_unmap_partial_page_buffers(handle_t *handle,
> +		struct address_space *mapping, loff_t from, loff_t length)

Is "unmap" really the right name of this function?  Isn't the real
goal to invalidate a buffer head that corresponds to a punched-out
block?

Correct me if I'm wrong, but the primary problem that we need to worry
about here is that the block that has just been punched out could be
reused by some other inode --- but if we still have a dirty buffer
head mapped to the now-punched-out-and-released-block, writeback could
end up corrupting some other buffer.  So in some sense, what we are
effectively doing is a bforget, right?

In fact, if we are doing data journalling, don't we need to call
ext4_forget()?  Otherwise, the block gets reassigned, but if the block
has been data journaled, without the revoke record written by
ext4_forget(), won't we have a potential problem where the journal
replay could end up corrupting another inode's data?

Furthermore, the question I have is why we don't have precisely the
same problem with truncate.  Suppose we have a 16k page size: we open
a 15k file for writing, overwrite the first 15k, and then truncate
the file down to 4k.  At the moment we'll zero out the last
11k of the file, and leave the buffer heads dirty and mapped; then
suppose those blocks get reassigned and used before writeback happens.
We're not calling ext4_unmap_partial_page_buffers() now; why isn't
this a problem today?  Are we just getting lucky?  Why is this more of
a problem with punch, such that xfstests 75, 112, 127, etc. are
failing?

Am I missing something here?

> +	/*
> +	 * Now we need to unmap the non-page-aligned buffers.
> +	 * If the block size is smaller than the page size
> +	 * and the file space being truncated is not
> +	 * page aligned, then unmap the buffers
> +	 */
> +	if (inode->i_sb->s_blocksize < PAGE_CACHE_SIZE &&
> +	   !((offset % PAGE_CACHE_SIZE == 0) &&
> +	   (length % PAGE_CACHE_SIZE == 0))) {

How does this solve the situation I had outlined before?  Suppose we
have a 4k page size and a 4k block size, and we issue a punch
operation with offset=4092 and length=4105.  In the previous section
of code, the byte ranges 4092--4095 and 8192--8196 will be zeroed
out.  Offset is not a multiple of the page size, and length is not a
multiple of the page size, but what has been left to be dealt with
*is* an exact multiple of the page size, and thus could be optimized
out.

I didn't see any changes in your patch that adjust offset and length
after calling ext4_block_zero_page_range(); so I think we still have
an optimization opportunity we're missing here.

Regards,

						- Ted

