Hi all,

Thanks for the discussions on the original patch. I wanted to circle back
and see if you had any further comments/concerns on the second version of
the patchset.

Best
Sarthak

On Mon, Sep 27, 2021 at 3:44 AM Sarthak Kukreti <sarthakkukreti@xxxxxxxxxxxx> wrote:
>
> This patch adds an extended option "assume_storage_prezeroed" to
> mke2fs. When enabled, this option acts as a hint to mke2fs that
> the underlying block device was zeroed before mke2fs was called.
> This allows mke2fs to optimize out the zeroing of the inode
> table and the journal, which speeds up the filesystem creation
> time.
>
> Additionally, on thinly provisioned storage devices (like Ceph,
> dm-thin, newly created sparse loopback files), reads on unmapped extents
> return zero. This property allows mke2fs (with assume_storage_prezeroed)
> to avoid pre-allocating metadata space for inode tables for the entire
> filesystem and saves space that would normally be preallocated
> for zero inode tables.
>
> Tests
> -----
> 1) Running 'mke2fs -t ext4' on 10G sparse files on an ext4
>    filesystem drops the time taken by mke2fs from 0.09s to 0.04s
>    and reduces the initial metadata space allocation (stat on
>    sparse file) from 139736 blocks (545M) to 8672 blocks (34M).
>
> 2) On ChromeOS (running linux kernel 4.19) with dm-thin
>    and 200GB thin logical volumes using 'mke2fs -t ext4 <dev>':
>
>    - Time taken by mke2fs drops from 1.07s to 0.08s.
>    - Avoiding zeroing out the inode table and journal reduces the
>      initial metadata space allocation from 0.48% to 0.01%.
>    - Lazy inode table zeroing results in a further 1.45% of logical
>      volume space getting allocated for inode tables, even if no file
>      data is added to the filesystem. With assume_storage_prezeroed,
>      the metadata allocation remains at 0.01%.
>
> Signed-off-by: Sarthak Kukreti <sarthakkukreti@xxxxxxxxxxxx>
> --
> Changes in v2: Added regression test, fixed indentation.
> ---
>  misc/mke2fs.8.in                        |  7 ++++++
>  misc/mke2fs.c                           | 21 ++++++++++++++++-
>  tests/m_assume_storage_prezeroed/expect |  2 ++
>  tests/m_assume_storage_prezeroed/script | 31 +++++++++++++++++++++++++
>  4 files changed, 60 insertions(+), 1 deletion(-)
>  create mode 100644 tests/m_assume_storage_prezeroed/expect
>  create mode 100644 tests/m_assume_storage_prezeroed/script
>
> diff --git a/misc/mke2fs.8.in b/misc/mke2fs.8.in
> index c0b53245..5c6ea5ec 100644
> --- a/misc/mke2fs.8.in
> +++ b/misc/mke2fs.8.in
> @@ -365,6 +365,13 @@ small risk if the system crashes before the journal has been overwritten
>  entirely one time. If the option value is omitted, it defaults to 1 to
>  enable lazy journal inode zeroing.
>  .TP
> +.B assume_storage_prezeroed\fR[\fB= \fI<0 to disable, 1 to enable>\fR]
> +If enabled,
> +.BR mke2fs
> +assumes that the storage device has been prezeroed, skips zeroing the journal
> +and inode tables, and annotates the block group flags to signal that the inode
> +table has been zeroed.
> +.TP
>  .B no_copy_xattrs
>  Normally
>  .B mke2fs
> diff --git a/misc/mke2fs.c b/misc/mke2fs.c
> index 04b2fbce..24c69966 100644
> --- a/misc/mke2fs.c
> +++ b/misc/mke2fs.c
> @@ -95,6 +95,7 @@ int journal_size;
>  int journal_flags;
>  int journal_fc_size;
>  static int lazy_itable_init;
> +static int assume_storage_prezeroed;
>  static int packed_meta_blocks;
>  int no_copy_xattrs;
>  static char *bad_blocks_filename = NULL;
> @@ -1012,6 +1013,11 @@ static void parse_extended_opts(struct ext2_super_block *param,
>  				lazy_itable_init = strtoul(arg, &p, 0);
>  			else
>  				lazy_itable_init = 1;
> +		} else if (!strcmp(token, "assume_storage_prezeroed")) {
> +			if (arg)
> +				assume_storage_prezeroed = strtoul(arg, &p, 0);
> +			else
> +				assume_storage_prezeroed = 1;
>  		} else if (!strcmp(token, "lazy_journal_init")) {
>  			if (arg)
>  				journal_flags |= strtoul(arg, &p, 0) ?
> @@ -1115,7 +1121,8 @@ static void parse_extended_opts(struct ext2_super_block *param,
>  			"\tnodiscard\n"
>  			"\tencoding=<encoding>\n"
>  			"\tencoding_flags=<flags>\n"
> -			"\tquotatype=<quota type(s) to be enabled>\n\n"),
> +			"\tquotatype=<quota type(s) to be enabled>\n"
> +			"\tassume_storage_prezeroed=<0 to disable, 1 to enable>\n\n"),
>  			badopt ? badopt : "");
>  		free(buf);
>  		exit(1);
> @@ -3095,6 +3102,18 @@ int main (int argc, char *argv[])
>  		io_channel_set_options(fs->io, opt_string);
>  	}
>
> +	if (assume_storage_prezeroed) {
> +		if (verbose)
> +			printf("%s",
> +			       _("Assuming the storage device is prezeroed "
> +			       "- skipping inode table and journal wipe\n"));
> +
> +		lazy_itable_init = 1;
> +		itable_zeroed = 1;
> +		zero_hugefile = 0;
> +		journal_flags |= EXT2_MKJOURNAL_LAZYINIT;
> +	}
> +
>  	/* Can't undo discard ... */
>  	if (!noaction && discard && dev_size && (io_ptr != undo_io_manager)) {
>  		retval = mke2fs_discard_device(fs);
> diff --git a/tests/m_assume_storage_prezeroed/expect b/tests/m_assume_storage_prezeroed/expect
> new file mode 100644
> index 00000000..2ca3784a
> --- /dev/null
> +++ b/tests/m_assume_storage_prezeroed/expect
> @@ -0,0 +1,2 @@
> +2384
> +336
> diff --git a/tests/m_assume_storage_prezeroed/script b/tests/m_assume_storage_prezeroed/script
> new file mode 100644
> index 00000000..0745fb28
> --- /dev/null
> +++ b/tests/m_assume_storage_prezeroed/script
> @@ -0,0 +1,31 @@
> +test_description="test prezeroed storage metadata allocation"
> +FILE_SIZE=16M
> +
> +LOG=$test_name.log
> +OUT=$test_name.out
> +EXP=$test_dir/expect
> +
> +dd if=/dev/zero of=$TMPFILE.1 bs=1 count=0 seek=$FILE_SIZE >> $LOG 2>&1
> +dd if=/dev/zero of=$TMPFILE.2 bs=1 count=0 seek=$FILE_SIZE >> $LOG 2>&1
> +
> +$MKE2FS -o Linux -t ext4 -O has_journal $TMPFILE.1 >> $LOG 2>&1
> +stat -c "%b" $TMPFILE.1 > $OUT
> +
> +$MKE2FS -o Linux -t ext4 -O has_journal -E assume_storage_prezeroed=1 $TMPFILE.2 >> $LOG 2>&1
> +stat -c "%b" $TMPFILE.2 >> $OUT
> +
> +rm -f $TMPFILE.1 $TMPFILE.2
> +
> +cmp -s $OUT $EXP
> +status=$?
> +
> +if [ "$status" = 0 ] ; then
> +	echo "$test_name: $test_description: ok"
> +	touch $test_name.ok
> +else
> +	echo "$test_name: $test_description: failed"
> +	cat $LOG > $test_name.failed
> +	diff $EXP $OUT >> $test_name.failed
> +fi
> +
> +unset LOG OUT EXP FILE_SIZE
> \ No newline at end of file
> --
> 2.31.0
>
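
For anyone who wants to sanity-check the allocation numbers by hand, here is
a minimal sketch of the measurement the regression test relies on: create a
sparse file and read its allocated block count with stat. It assumes GNU
coreutils (dd, stat, mktemp) and a filesystem that supports sparse files. The
helper names make_sparse and allocated_blocks are made up for illustration and
are not part of e2fsprogs. The mke2fs step is left as a comment because it
needs a build with this patch applied.

```shell
#!/bin/sh
# Sketch only: shows how the allocated size of a sparse file can be read,
# the same way the regression test in the patch uses stat -c "%b".

# Hypothetical helper (not part of e2fsprogs): create a sparse file of the
# requested size without writing any data blocks.
make_sparse() {
    # count=0 copies nothing; seek= makes dd extend the file to that size.
    dd if=/dev/zero of="$1" bs=1 count=0 seek="$2" 2>/dev/null
}

# Hypothetical helper: print the number of blocks allocated to a file.
# GNU stat reports %b in units of %B bytes (normally 512).
allocated_blocks() {
    stat -c "%b" "$1"
}

IMG=$(mktemp)
make_sparse "$IMG" 16M

APPARENT_SIZE=$(stat -c "%s" "$IMG")
BLOCKS_BEFORE=$(allocated_blocks "$IMG")
echo "apparent size: $APPARENT_SIZE bytes, allocated blocks: $BLOCKS_BEFORE"

# With a patched mke2fs available, the comparison from the commit message
# could be repeated on two such files, for example:
#   mke2fs -q -t ext4 "$IMG"                               # baseline
#   mke2fs -q -t ext4 -E assume_storage_prezeroed=1 "$IMG" # with the hint
#   allocated_blocks "$IMG"

rm -f "$IMG"
```

Running allocated_blocks on the file before and after each mke2fs invocation
shows how much metadata was actually written to the backing store.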