I noticed that `mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0` was
much faster on a dm-crypt device (1 min 30 s) than on the underlying NVMe
drive (MKNSSDPL2TB-D8, 8 min). I tracked the difference down to the
`fallocate` system calls made while writing the inode tables (with mode
FALLOC_FL_ZERO_RANGE).
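For reference, the calls can be observed with something like the
following (standard strace(1) options: trace only fallocate, follow
child processes):

# Show the fallocate calls mkfs.ext4 and its children make:
strace -f -e trace=fallocate \
    mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/nvme0n1p1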
To make the difference more pronounced, I ran the `fallocate` command
directly on the block devices with a larger size (1 GiB). Before each
test, I ran `echo 3 > /proc/sys/vm/drop_caches`; after each test, I ran
`sync`, which finished almost instantly.
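In full, each run looked like this (with <device> being one of the
block devices below):

# Drop the page cache so fallocate hits the device, then time it:
echo 3 > /proc/sys/vm/drop_caches
time fallocate --zero-range <device> -o 1073741824 -l 1073741824
sync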
time fallocate --zero-range /dev/mapper/test -o 1073741824 -l 1073741824
real 0m0.488s
user 0m0.000s
sys 0m0.026s
time fallocate --zero-range /dev/nvme0n1p1 -o 1073741824 -l 1073741824
real 0m15.253s
user 0m0.000s
sys 0m0.037s
When opening the dm-crypt device with NO_READ_WORKQUEUE and
NO_WRITE_WORKQUEUE (see the cryptsetup invocation sketched after the
timings below), the difference is not quite as big, probably because the
encryption then happens on a single core instead of in 8 tasks on 4
physical cores.
time fallocate --zero-range /dev/mapper/test -o 1073741824 -l 1073741824
real 0m0.943s
user 0m0.003s
sys 0m0.939s
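For reference, such a mapping can be opened with cryptsetup's
performance flags (per cryptsetup(8); needs a recent cryptsetup and
kernel 5.9+; device and mapping name taken from the tests above):

# Bypass dm-crypt's read/write workqueues for this mapping:
cryptsetup open --perf-no_read_workqueue --perf-no_write_workqueue \
    /dev/nvme0n1p1 test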
The slowdown on the unencrypted device doesn’t really affect me except
when running benchmarks (the original goal was to test the performance
of various operations ON the file system, not file system creation), but
if I can help track down the source of the slowdown, I’d be happy to
provide more information.