On Fri, Jun 29, 2018 at 01:35:41PM -0600, Andreas Dilger wrote:
> >>> Right. So there are two choices:
> >>>
> >>> 1) Keep the blocks beyond i_size marked as uninitialized. You
> >>> transfer and write the full PAGE_SIZE of data, but it simply will
> >>> never be available to the user.
> >
> > Yes, that's for extent mapped files.
> >
> >>> 2) Zero the page, write it out to the file, and then extend i_size and
> >>> mark the extents as uninitialized.
> >
> > Except at that point you do not really need to mark the extent as
> > unitialized, the blocks are allocated and written to and i_size is
> > extended. That's how it needs to be done for indirect block mapped
> > files.
> >>> Why is it that Lustre is choosing to keep i_size where it is, but to
> >>> mark the blocks beyond it as initialized?
> >>
> >> This isn't about initialized vs. uninitialized extents. It is only about
> >> allocated vs. unallocated blocks, possibly with block-mapped files. There
> >> is no way to have uninitialized blocks with a block-mapped file.

Does Lustre really support block-mapped files today? If so, why?

And if it must support block-mapped files and not just extent-mapped
files, is there any reason why Lustre can't just make sure that (a) there
are no blocks allocated past i_size? ext4 can handle this case just
fine, even if that means there are parts of the page which are not
mapped to a block.

Alternatively, (b) if (a) is impossible, why not simply make sure i_size
is moved to a page_size boundary and all of the allocated blocks are
zeroed if they haven't been written yet?

> Like I said previously, this is done with Lustre, which has a different IO
> submission path than stock ext4. I don't think there is any requirement that
> this only be in upstream ext4, since e2fsprogs also has code to support running
> on BSD, Windows, even Hurd.

If neither (a) nor (b) is possible, I'm willing to entertain this.
If we have to go down that path, then it should be something that can
be configured, perhaps via /etc/e2fsck.conf. The reason for this is
that Lustre really is a minority use case, and it is *useful* for e2fsck
to flag cases where there are initialized blocks past i_size, since it
should never happen with the Linux stack. And if it does, it's a bug,
and we should (for example) flag it when running xfstests.

So I think what I'm going to do for 1.44.3 is to take Lukas's patch. We
can possibly put it back under some kind of conditional, either via
e2fsck.conf, or via some kind of superblock flag. Or it can be
something that can be patched back in for the Lustre fork of e2fsprogs.

- Ted
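
To make options (a) and (b) above concrete, here is a minimal sketch of
the arithmetic involved. It is not code from e2fsprogs, ext4, or Lustre;
the helper names (blocks_covering_size, size_rounded_to_page,
block_is_past_size) are hypothetical and exist only for illustration. It
assumes logical block numbers start at 0 and that page_size is a
multiple of block_size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of blocks needed to hold i_size bytes.
 * Under option (a), no block with a logical index >= this value
 * would be allocated. */
static uint64_t blocks_covering_size(uint64_t i_size, uint32_t block_size)
{
    assert(block_size > 0);
    return i_size / block_size + (i_size % block_size != 0);
}

/* Hypothetical helper: i_size rounded up to the next page boundary,
 * which is where option (b) would move i_size. */
static uint64_t size_rounded_to_page(uint64_t i_size, uint32_t page_size)
{
    assert(page_size > 0);
    uint64_t rem = i_size % page_size;
    return rem ? i_size + (page_size - rem) : i_size;
}

/* Returns nonzero if logical block lblk lies entirely past i_size,
 * i.e. the kind of block this thread is arguing about. */
static int block_is_past_size(uint64_t lblk, uint64_t i_size,
                              uint32_t block_size)
{
    return lblk >= blocks_covering_size(i_size, block_size);
}
```

For example, with 1 KiB blocks, 4 KiB pages, and i_size = 5000, blocks
0 through 4 hold file data, while blocks 5 through 7 belong to the same
page but lie past i_size. Option (a) leaves those three blocks
unallocated; option (b) moves i_size to 8192 and zeroes them, after
which none of them is past i_size any more.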