On Jul 2, 2018, at 2:30 PM, Theodore Y. Ts'o <tytso@xxxxxxx> wrote:
>
> On Fri, Jun 29, 2018 at 01:35:41PM -0600, Andreas Dilger wrote:
>>>>> Right. So there are two choices:
>>>>>
>>>>> 1) Keep the blocks beyond i_size marked as uninitialized. You
>>>>> transfer and write the full PAGE_SIZE of data, but it simply will
>>>>> never be available to the user.
>>>
>>> Yes, that's for extent mapped files.
>>>
>>>>> 2) Zero the page, write it out to the file, and then extend i_size and
>>>>> mark the extents as uninitialized.
>>>
>>> Except at that point you do not really need to mark the extent as
>>> uninitialized, the blocks are allocated and written to and i_size is
>>> extended. That's how it needs to be done for indirect block mapped
>>> files.
>
>>>>> Why is it that Lustre is choosing to keep i_size where it is, but to
>>>>> mark the blocks beyond it as initialized?
>>>>
>>>> This isn't about initialized vs. uninitialized extents. It is only about
>>>> allocated vs. unallocated blocks, possibly with block-mapped files. There
>>>> is no way to have uninitialized blocks with a block-mapped file.
>
> Does Lustre really support block-mapped files today? If so, why?

We used to support block-mapped files on the data servers, and we can't say for sure that all such files are gone. Also, we recently added a feature to support small files on the metadata servers, which are formatted without extents because they are < 16TB and it is more efficient to use block-mapped dirs than extent-mapped dirs.

> And if it must support block-mapped files and not just only
> extent-mapped files, is there any reason why Lustre can just make sure
> (a) there are no blocks allocated past i_size --- ext4 can handle this
> case just fine, even if that means there are parts of the page which
> are not mapped to a block. Alternatively, (b) if (a) is impossible,
> to simply make sure i_size is moved to page_size boundary and all of
> the allocated blocks are zero'ed if they haven't been written yet?
I would have to see how hard (a) is to implement, but it was definitely implemented this way for a reason in the first place. I don't see how (b) is possible, since i_size would not be correct in that case. We definitely zero the end of the page beyond i_size so that the data is correct if the file is truncated to a larger size, or if blocks are written beyond i_size.

>> Like I said previously, this is done with Lustre, which has a different IO submission path than stock ext4. I don't think
>> there is any requirement that this only be in upstream ext4,
>> since e2fsprogs also has code to support running on BSD, Windows,
>> even Hurd.
>
> If neither (a) or (b) is possible, I'm willing to entertain this. If
> we have to go down that path, then it should be something that
> should be configured, perhaps via /etc/e2fsck.conf. The reason for
> this is Lustre really is minority use case; and it is *useful* for
> e2fsck to flag cases where there are initialized blocks past i_size,
> since it should never happen with the Linux stack. And if it does,
> it's a bug, and we should (for example) flag it when running xfstests.
>
> So I think what I'm going to do for 1.44.3 is to take Lukas's patch.
>
> We can possibly put it back under some kind of conditional, either via
> e2fsck.conf, or via some kind of superblock flag. Or it can be
> something that can be patched back in for the Lustre fork of
> e2fsprogs.
>
> - Ted

Cheers, Andreas