On Sep 20, 2021, at 9:42 PM, Sarthak Kukreti <sarthakkukreti@xxxxxxxxxxxx> wrote:
>
> From: Sarthak Kukreti <sarthakkukreti@xxxxxxxxxxxx>
>
> This patch adds an extended option "assume_storage_prezeroed" to
> mke2fs. When enabled, this option acts as a hint to mke2fs that
> the underlying block device was zeroed before mke2fs was called.
> This allows mke2fs to optimize out the zeroing of the inode
> table and the journal, which speeds up the filesystem creation
> time.
>
> Additionally, on thinly provisioned storage devices (like Ceph,
> dm-thin), ... and newly-created sparse loopback files
> reads on unmapped extents return zero. This property
> allows mke2fs (with assume_storage_prezeroed) to avoid
> pre-allocating metadata space for inode tables for the entire
> filesystem and saves space that would normally be preallocated
> for zero inode tables.
>
> Testing on ChromeOS (running linux kernel 4.19) with dm-thin
> and 200GB thin logical volumes using 'mke2fs -t ext4 <dev>':
>
> - Time taken by mke2fs drops from 1.07s to 0.08s.
> - Avoiding zeroing out the inode table and journal reduces the
>   initial metadata space allocation from 0.48% to 0.01%.
> - Lazy inode table zeroing results in a further 1.45% of logical
>   volume space getting allocated for inode tables, even if no file
>   data is added to the filesystem. With assume_storage_prezeroed,
>   the metadata allocation remains at 0.01%.

This seems beneficial, but I'm wondering if this could also be done
automatically when TRIM/DISCARD is used by mke2fs to erase a device?

One safe way to do this automatically would be to start by *reading*
the disk blocks and checking whether they are all zero, and only switch
to zero-block writes if a block with non-zero data is found. That would
avoid the extra space usage from zero-block writes in the above cases,
and would also work for the vast majority of users, who won't know that
the "assume_storage_prezeroed" option even exists, though it won't
necessarily reduce the runtime.
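To make the read-before-zero idea above concrete, here is a minimal sketch in plain POSIX C. It is not taken from the patch or from e2fsprogs: region_is_zeroed() and make_sparse_file() are hypothetical names, the 64 KiB chunk size is arbitrary, and real mke2fs code would presumably go through the libext2fs io_channel layer rather than raw pread().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_SIZE 65536	/* arbitrary read size for this sketch */

/*
 * Hypothetical helper: returns 1 if every byte in [offset, offset + len)
 * reads back as zero, 0 if a non-zero byte is found, and -1 on a read
 * error or if the file ends before the region does.
 */
static int region_is_zeroed(int fd, off_t offset, size_t len)
{
	static const char zeros[CHUNK_SIZE];	/* zero-initialized */
	char *buf;
	int ret = 1;

	assert(fd >= 0);
	buf = malloc(CHUNK_SIZE);
	if (!buf)
		return -1;

	while (len > 0) {
		size_t want = len < CHUNK_SIZE ? len : CHUNK_SIZE;
		ssize_t got = pread(fd, buf, want, offset);

		if (got <= 0) {		/* error or unexpected EOF */
			ret = -1;
			break;
		}
		if (memcmp(buf, zeros, (size_t)got) != 0) {
			ret = 0;	/* caller must fall back to zeroing */
			break;
		}
		offset += got;
		len -= (size_t)got;
	}
	free(buf);
	return ret;
}

/*
 * Hypothetical helper for experimenting: creates an unlinked sparse
 * temporary file of the given size and returns its descriptor, or -1.
 */
static int make_sparse_file(off_t size)
{
	char path[] = "/tmp/prezero-XXXXXX";
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);			/* file vanishes once fd is closed */
	if (ftruncate(fd, size) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A caller would skip the zero-block writes for a region only when region_is_zeroed() returns 1, and treat both 0 and -1 as "zero it explicitly". Note that this trades writes for reads, so as mentioned above it saves space on thin or sparse storage but not necessarily time.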
> diff --git a/misc/mke2fs.c b/misc/mke2fs.c
> index 04b2fbce..5293d9b0 100644
> --- a/misc/mke2fs.c
> +++ b/misc/mke2fs.c
> @@ -3095,6 +3102,18 @@ int main (int argc, char *argv[])
> 		io_channel_set_options(fs->io, opt_string);
> 	}
>
> +  if (assume_storage_prezeroed) {
> +    if (verbose)
> +      printf("%s",
> +             _("Assuming the storage device is prezeroed "
> +               "- skipping inode table and journal wipe\n"));
> +
> +    lazy_itable_init = 1;
> +    itable_zeroed = 1;
> +    zero_hugefile = 0;
> +    journal_flags |= EXT2_MKJOURNAL_LAZYINIT;
> +  }

Indentation appears to be broken here - only 2 spaces instead of a tab.

The patch is also missing any kind of test case. Since many of the
e2fsck test cases use loopback filesystems created on a sparse file,
those would both make good test cases for this option and reduce the
time/space used during testing.

Cheers, Andreas