I noticed that `mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0` was
much faster on a dm-crypt device (1 min 30 s) than on the underlying NVMe
drive (MKNSSDPL2TB-D8, 8 min). I tracked the difference down to the
`fallocate` system calls made while writing the inode tables (with mode
FALLOC_FL_ZERO_RANGE).
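For reference, the calls can be observed with something like the
following (standard strace(1) options: trace only fallocate, follow
child processes):

# Show the fallocate calls mkfs.ext4 and its children make:
strace -f -e trace=fallocate \
    mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/nvme0n1p1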
To make the difference more pronounced, I ran the `fallocate` command
directly on the block devices with a larger size (1 GiB). Before each
test, I ran `echo 3 > /proc/sys/vm/drop_caches`; after each test, I ran
`sync`, which finished almost instantly.
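In full, each run looked like this (with <device> being one of the
block devices below):

# Drop the page cache so fallocate hits the device, then time it:
echo 3 > /proc/sys/vm/drop_caches
time fallocate --zero-range <device> -o 1073741824 -l 1073741824
sync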
time fallocate --zero-range /dev/mapper/test -o 1073741824 -l 1073741824
real 0m0.488s
user 0m0.000s
sys 0m0.026s
time fallocate --zero-range /dev/nvme0n1p1 -o 1073741824 -l 1073741824
real 0m15.253s
user 0m0.000s
sys 0m0.037s
When opening the dm-crypt device with NO_READ_WORKQUEUE and
NO_WRITE_WORKQUEUE (see the cryptsetup invocation sketched after the
timings below), the difference is not quite as big, probably because the
encryption then happens on a single core instead of in 8 tasks on 4
physical cores.
time fallocate --zero-range /dev/mapper/test -o 1073741824 -l 1073741824
real 0m0.943s
user 0m0.003s
sys 0m0.939s
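For reference, such a mapping can be opened with cryptsetup's
performance flags (per cryptsetup(8); needs a recent cryptsetup and
kernel 5.9+; device and mapping name taken from the tests above):

# Bypass dm-crypt's read/write workqueues for this mapping:
cryptsetup open --perf-no_read_workqueue --perf-no_write_workqueue \
    /dev/nvme0n1p1 test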
The slowdown on the unencrypted device doesn’t really affect me except
when running benchmarks (the original goal was to test the performance
of various operations ON the file system, not file system creation), but
if I can help track down the source of the slowdown, I’d be happy to
provide more information.