Re: [PATCH] e2fsck: do not allow initialized blocks pass i_size

Andreas Dilger <adilger@xxxxxxxxx> · Tue, 3 Jul 2018 11:16:32 -0600

On Jul 2, 2018, at 2:30 PM, Theodore Y. Ts'o <tytso@xxxxxxx> wrote:
> 
> On Fri, Jun 29, 2018 at 01:35:41PM -0600, Andreas Dilger wrote:
>>>>> Right.  So there are two choices:
>>>>> 
>>>>> 1) Keep the blocks beyond i_size marked as uninitialized.  You
>>>>> transfer and write the full PAGE_SIZE of data, but it simply will
>>>>> never be available to the user.
>>> 
>>> Yes, that's for extent mapped files.
>>> 
>>>>> 2) Zero the page, write it out to the file, and then extend i_size and
>>>>> mark the extents as uninitialized.
>>> 
>>> Except at that point you do not really need to mark the extent as
>>> uninitialized; the blocks are allocated and written to, and i_size is
>>> extended. That's how it needs to be done for indirect block mapped
>>> files.
> 
>>>>> Why is it that Lustre is choosing to keep i_size where it is, but to
>>>>> mark the blocks beyond it as initialized?
>>>> 
>>>> This isn't about initialized vs. uninitialized extents.  It is only about
>>>> allocated vs. unallocated blocks, possibly with block-mapped files.  There
>>>> is no way to have uninitialized blocks with a block-mapped file.
> 
> Does Lustre really support block-mapped files today?  If so, why?

We used to support block-mapped files on the data servers, and we
can't say for sure that all such files are gone.  Also, we recently
added a feature to store small files on the metadata servers.  Those
are formatted without extents, because they are smaller than 16TB and
block-mapped directories are more efficient than extent-mapped ones.

> And if it must support block-mapped files and not just
> extent-mapped files, is there any reason why Lustre can't just make sure
> (a) there are no blocks allocated past i_size --- ext4 can handle this
> case just fine, even if that means there are parts of the page which
> are not mapped to a block.  Alternatively, (b) if (a) is impossible,
> to simply make sure i_size is moved to a PAGE_SIZE boundary and all of
> the allocated blocks are zeroed if they haven't been written yet?

I would have to see how hard (a) is to implement, but it was definitely
implemented in this way for a reason in the first place.

I don't see how (b) is possible, since i_size would no longer be
correct in that case.  We definitely zero the end of the page beyond
i_size so that the data is correct if the file is later truncated to a
larger size, or if blocks are written beyond i_size.
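To make the situation concrete, here is a minimal sketch, assuming
4 KiB pages and a 1 KiB filesystem block size, so that one page spans
several blocks (a full-page write can only leave blocks past i_size
when the block size is smaller than the page size).  The helper names
are hypothetical and do not come from the Lustre or ext4 code; the
sketch only models the arithmetic of zeroing the tail of the last page
and finding which logical blocks such a write leaves allocated beyond
i_size.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def zero_page_tail(page: bytearray, i_size: int) -> None:
    """Zero the part of the file's last page that lies beyond i_size."""
    off = i_size % PAGE_SIZE
    if off:  # a page-aligned i_size leaves no partial tail to clear
        page[off:] = bytes(len(page) - off)


def blocks_past_eof(i_size: int, block_size: int, alloc_end: int) -> list[int]:
    """Logical blocks from the first block wholly past i_size up to byte alloc_end."""
    first = -(-i_size // block_size)  # ceil(i_size / block_size)
    return list(range(first, alloc_end // block_size))


# A 5000-byte file written out as two full pages (bytes 0..8191).
page = bytearray(b"\xff" * PAGE_SIZE)  # stand-in for the second page
zero_page_tail(page, 5000)
print(page.count(0))                               # 3192 bytes zeroed
print(blocks_past_eof(5000, 1024, 2 * PAGE_SIZE))  # [5, 6, 7]
```

Blocks 5-7 are allocated and hold zeroes, but lie wholly past i_size,
which is exactly what the new e2fsck check reports.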

>> Like I said previously, this is done with Lustre, which has a different
>> IO submission path than stock ext4.  I don't think there is any
>> requirement that this only be in upstream ext4, since e2fsprogs also
>> has code to support running on BSD, Windows, even Hurd.
> 
> If neither (a) nor (b) is possible, I'm willing to entertain this.  If
> we have to go down that path, then it should be something that is
> configurable, perhaps via /etc/e2fsck.conf.  The reason for this is
> that Lustre really is a minority use case, and it is *useful* for
> e2fsck to flag cases where there are initialized blocks past i_size,
> since that should never happen with the Linux stack.  And if it does,
> it's a bug, and we should (for example) flag it when running xfstests.
> 
> So I think what I'm going to do for 1.44.3 is to take Lukas's patch.
> 
> We can possibly put it back under some kind of conditional, either via
> e2fsck.conf, or via some kind of superblock flag.  Or it can be
> something that can be patched back in for the Lustre fork of
> e2fsprogs.
> 
> 						- Ted
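For illustration only, a rough model of the check being discussed,
with the kind of opt-out Ted mentions.  The `allow_blocks_past_eof`
switch and the other names are hypothetical and invented for this
sketch; this is not the e2fsck implementation.  Uninitialized extents
past i_size (e.g. from fallocate with FALLOC_FL_KEEP_SIZE) stay legal,
so only initialized blocks are flagged, and every block of a
block-mapped file counts as initialized.

```python
from dataclasses import dataclass


@dataclass
class CheckOptions:
    # Hypothetical switch, e.g. parsed from /etc/e2fsck.conf; invented for this sketch.
    allow_blocks_past_eof: bool = False


def blocks_to_flag(i_size, block_size, mapped, opts):
    """Return logical block numbers that should be reported as lying past i_size.

    `mapped` is a list of (logical_block, initialized) pairs.  Blocks of a
    block-mapped file are always initialized; uninitialized extent blocks
    past i_size (preallocation) are left alone.
    """
    if opts.allow_blocks_past_eof:
        return []
    first_past_eof = -(-i_size // block_size)  # ceil(i_size / block_size)
    return [lblk for lblk, initialized in mapped
            if initialized and lblk >= first_past_eof]


mapped = [(0, True), (4, True), (5, True), (9, False)]
print(blocks_to_flag(5000, 1024, mapped, CheckOptions()))  # [5]
print(blocks_to_flag(5000, 1024, mapped,
                     CheckOptions(allow_blocks_past_eof=True)))  # []
```

Defaulting the switch to off keeps the check strict for the stock
Linux stack, while a Lustre build of e2fsprogs could turn it on.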

Cheers, Andreas
