Thanks for reviewing the patch, Andreas!

On Tue, Sep 21, 2021 at 2:39 PM Andreas Dilger <adilger@xxxxxxxxx> wrote:
>
> On Sep 20, 2021, at 9:42 PM, Sarthak Kukreti <sarthakkukreti@xxxxxxxxxxxx> wrote:
> >
> > From: Sarthak Kukreti <sarthakkukreti@xxxxxxxxxxxx>
> >
> > ...
> > Additionally, on thinly provisioned storage devices (like Ceph,
> > dm-thin),
>
> ... and newly-created sparse loopback files
>
Thanks for pointing that out, added to the commit message in v2.

...
> > Testing on ChromeOS (running linux kernel 4.19) with dm-thin
> > and 200GB thin logical volumes using 'mke2fs -t ext4 <dev>':
> >
> > - Time taken by mke2fs drops from 1.07s to 0.08s.
> > - Avoiding zeroing out the inode table and journal reduces the
> >   initial metadata space allocation from 0.48% to 0.01%.
> > - Lazy inode table zeroing results in a further 1.45% of logical
> >   volume space getting allocated for inode tables, even if no file
> >   data is added to the filesystem. With assume_storage_prezeroed,
> >   the metadata allocation remains at 0.01%.
>
> This seems beneficial, but I'm wondering if this could also be
> done automatically when TRIM/DISCARD is used by mke2fs to erase
> a device?
>
> One safe option to do this automatically would be to start by
> *reading* the disk blocks and check if they are all zero, and only
> switch to zero-block writes if any block is found with non-zero
> data. That would avoid the extra space usage from zero-block
> writes in the above cases, and also work for the huge majority of
> users that won't know the "assume_storage_prezeroed" option even
> exists, though it won't necessarily reduce the runtime.
>
I agree with Ted (quoting a reply on a forked thread below) that
reading all inode table blocks on the device will slow down mke2fs a
lot, depending on the storage medium and size. Maybe it can be done
instead at first mount, in conjunction with lazy_itable_init, i.e.
ext4 reads the block and only issues a zero-out if the block is not
already zero?
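
To make the read-before-zero idea concrete, here is a rough userspace
sketch. It is not taken from the patch and is not the ext4 kernel path;
block_is_zero() and zero_block_if_needed() are hypothetical names used
only for illustration, and the sketch assumes a plain file descriptor
and a caller-supplied block size.

```c
#define _POSIX_C_SOURCE 200809L	/* for pread()/pwrite() */

#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if every byte in buf[0..len) is zero, 0 otherwise. */
int block_is_zero(const unsigned char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (buf[i] != 0)
			return 0;
	return 1;
}

/*
 * Hypothetical helper: read one block at 'offset' and overwrite it
 * with zeroes only if it currently holds non-zero data.
 *
 * Returns 0 if the block was already zero (no write issued),
 *         1 if the block had to be zeroed,
 *        -1 on allocation failure, short read/write or I/O error.
 */
int zero_block_if_needed(int fd, off_t offset, size_t blocksize)
{
	unsigned char *buf;
	ssize_t n;
	int ret = 0;

	assert(blocksize > 0);

	buf = malloc(blocksize);
	if (!buf)
		return -1;

	n = pread(fd, buf, blocksize, offset);
	if (n != (ssize_t)blocksize) {
		ret = -1;
	} else if (!block_is_zero(buf, blocksize)) {
		/* Only dirty the block when it is not already zero. */
		memset(buf, 0, blocksize);
		n = pwrite(fd, buf, blocksize, offset);
		ret = (n == (ssize_t)blocksize) ? 1 : -1;
	}

	free(buf);
	return ret;
}
```

On a thin volume the point of the check is that an already-zero
(unprovisioned) block is never written, so no space gets allocated for
it. The cost is one read per block, which is the slowdown discussed
above.
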
Even so, an explicit hint would be compatible with this approach: it
avoids (unnecessarily) reading through all the inode table blocks as
long as the hint was passed at creation time.

On Wed, Sep 22, 2021 at 8:57 PM Theodore Ts'o <tytso@xxxxxxx> wrote:
> The problem is mke2fs really does need to care about the performance
> of discard or write same. Users want mke2fs to be fast, especially
> during the distro installation process. That's why we implemented the
> lazy inode table initialization feature in the first place. So
> reading each block from the inode table to see if it's zero might
> be slow, and so we might be better off just doing the lazy itable init
> instead.

...
> > +  if (assume_storage_prezeroed) {
> > +    if (verbose)
> > +      printf("%s",
> > +             _("Assuming the storage device is prezeroed "
> > +               "- skipping inode table and journal wipe\n"));
> > +
> > +    lazy_itable_init = 1;
> > +    itable_zeroed = 1;
> > +    zero_hugefile = 0;
> > +    journal_flags |= EXT2_MKJOURNAL_LAZYINIT;
> > +  }
>
> Indentation appears to be broken here - only 2 spaces instead of a tab.
>
> This is also missing any kind of test case. Since a large number of
> the e2fsck test cases are using loopback filesystems created on a sparse
> file, this would both be good test cases, as well as reducing time/space
> used during testing.
>
Oops, thanks for catching that! Fixed in v2, and I added a test case
for this option. I was playing around with adding the option as a
default to tests/mke2fs.conf.in; that didn't affect the overall test
run time much (a lot of the tests seem to be dd'ing entire files and
not using sparse files).

Best,
Sarthak