Re: Does LVM have any plan/schedule to support btrfs in fsadm

Gionatan Danti <g.danti@xxxxxxxxxx> · Wed, 30 Jun 2021 00:32:05 +0200

On 2021-06-29 01:00 Chris Murphy wrote:
Pretty sure it's fixed since 4.14.
https://lkml.org/lkml/2019/2/10/23

Hi Chris, the headline states "harden against duplicate fsid". Does it 
mean that the issue is "only" less likely, or was it really solved?

It's not inherently slow, it's a tracking cost problem as very large
numbers of extents accumulate. And it also depends on the write
pattern of the guest file system. If you use Btrfs in a guest on a
host using Btrfs, it's a lot more competitive. There's certainly room
for improvement, possibly with some hinting to avoid writing out a
metric ton of 4KiB blocks as other file systems are prone to doing.
While btrfs can turn these into largely sequential writes, they lose
any locality optimization the guest file system expects for subsequent
reads. A lot of the locality issue is a factor on rotational devices.
When talking about hundreds of thousands of extents per VM file, this
has a noticeable impact on even SSDs, but the much reduced latency
makes it tolerable for some scenarios.

I think the main issue stems from btrfs sticking to 4K CoW extents.
ZFS has a default 128K recordsize that, while incurring a fair 
read/modify/write overhead, works much better with HDDs (for SSDs one 
can lower recordsize to 16K or 32K).
XFS with reflink does something similar, doing CoW at 128K block 
granularity (we had a similar discussion in the past: 
https://www.spinics.net/lists/linux-xfs/msg35679.html)
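
To put rough numbers on the granularity trade-off above, here is a
back-of-the-envelope sketch in Python. The 40 GiB image size and the 4 KiB
guest write size are assumptions chosen only for illustration; the CoW unit
sizes are the ones mentioned above. It models an upper bound on fragmentation
and the rewrite cost of one small aligned write, not any real allocator.

```python
KIB = 1024
GIB = 1024 ** 3


def worst_case_extents(image_bytes, cow_unit_bytes):
    """Upper bound on extents if every CoW unit ends up relocated on its own."""
    return -(-image_bytes // cow_unit_bytes)  # ceiling division


def rmw_amplification(write_bytes, cow_unit_bytes):
    """Bytes rewritten per payload byte for one aligned write of write_bytes."""
    units_touched = -(-write_bytes // cow_unit_bytes)
    return units_touched * cow_unit_bytes / write_bytes


IMAGE = 40 * GIB       # assumed VM image size (illustrative only)
GUEST_WRITE = 4 * KIB  # assumed guest write size (illustrative only)

for label, unit in [("4K", 4 * KIB), ("16K", 16 * KIB),
                    ("64K", 64 * KIB), ("128K", 128 * KIB)]:
    print(label,
          worst_case_extents(IMAGE, unit),
          rmw_amplification(GUEST_WRITE, unit))
```

Under these assumptions a 4K unit allows up to about 10.5 million extents
with no read/modify/write penalty, while a 128K unit caps the count at
327680 extents but rewrites 32 bytes for every byte of a 4K guest write.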

But I've seen similar problems with VMs on LVM thinp when making many
snapshots and incurring cow, however temporary (like a btrfs nodatacow
file that's subject to snapshots or reflink copies; or a backing file
on xfs likewise reflink copied). There really isn't much better we can
do than LVM thick in this regard. And if that's the standard bearer,
it's not much different if you fallocate a nodatacow file.

If I remember correctly, the minimum chunk size for thin LVM is 64K, 
making it much less prone to fragmentation. Moreover, it only does CoW 
when a snapshotted chunk is overwritten for the first time (ZFS 
reallocates at each write and I think btrfs does something similar).
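
The difference between the two behaviours can be illustrated with a toy
model in Python. This is only a sketch of the idea as I described it, not
how dm-thin, ZFS or btrfs are actually implemented; the function name and
the overwrite pattern are hypothetical.

```python
def count_relocations(writes, realloc_every_write):
    """Count block relocations caused by overwrites after a single snapshot.

    writes: sequence of block indices overwritten after the snapshot.
    realloc_every_write: True models "reallocate on each write";
        False models "CoW only on the first overwrite of a shared block".
    """
    already_private = set()  # blocks no longer shared with the snapshot
    relocations = 0
    for block in writes:
        if realloc_every_write or block not in already_private:
            relocations += 1
            already_private.add(block)
    return relocations


writes = [0, 1, 0, 0, 2, 1, 0]  # hypothetical overwrite pattern
print(count_relocations(writes, realloc_every_write=False))  # 3
print(count_relocations(writes, realloc_every_write=True))   # 7
```

In the first-overwrite model the cost is bounded by the number of distinct
blocks touched, while in the reallocate-always model it grows with every
write.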

In the distant past, I benchmarked a virtual machine running on btrfs on 
a fallocated+nocow file, and the results were quite bleak. Maybe things 
have improved more than I can imagine... time for some more benchmarks, I 
suppose! Do you have any to share?

Some databases are cow friendly, notably rocksdb. And sqlite with wal
enabled is at least not cow unfriendly. The worst offender seems to be
postgresql but I haven't seen any benchmarking since the multiple
kernel series of fsync work done on btrfs to improve the performance
of databases in general; that was kernel 5.8 through 5.11.

Yeah, both PostgreSQL and MySQL tend to be slow on btrfs.

Regards.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@xxxxxxxxxx - info@xxxxxxxxxx
GPG public key ID: FF5F32A8

_______________________________________________
linux-lvm mailing list
linux-lvm@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/