xfs_clear_incompat_log_features considered harmful?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi everyone,

Christoph spied the xfs_swapext_can_use_without_log_assistance
function[0] in the atomic file updates patchset[1] and wondered why we
go through this inverted-bitmask dance to avoid setting the
XFS_SB_FEAT_INCOMPAT_LOG_SWAPEXT feature.

(The same principles apply to xfs_attri_can_use_without_log_assistance
from the since-merged LARP series.)

The reason for this dance is that xfs_add_incompat_log_feature is an
expensive operation -- it forces the log, pushes the AIL, and then if
nobody's beaten us to it, sets the feature bit and issues a synchronous
write of the primary superblock.  That could be a one-time cost
amortized over the life of the filesystem, but the log quiesce and cover
operations call xfs_clear_incompat_log_features to remove feature bits
opportunistically.  On a moderately loaded filesystem this leads to us
cycling those bits on and off over and over, which hurts performance.

Why do we clear the log incompat bits?  Back in ~2020 I think Dave and I
had a conversation on IRC[2] about what the log incompat bits represent.
IIRC in that conversation we decided that the log incompat bits protect
unrecovered log items so that old kernels won't try to recover them and
barf.  Since a clean log has no protected log items, we could clear the
bits at cover/quiesce time.  At the time, I think we decided to go with
this idea because we didn't like the idea of reducing the span of
kernels that can recover a filesystem over the lifetime of that
filesystem.

[ISTR Eric pointing out at some point that adding incompat feature bits
at runtime could confuse users who crash and try to recover with Ye Olde
Knoppix CD because there's now a log incompat bit set that wasn't there
at format time, but my memory is a bit hazy.]

Christoph wondered why I don't just set the log incompat bits at mkfs
time, especially for filesystems that have some newer feature set (e.g.
parent pointers, metadir, rtgroups...) to avoid the runtime cost of
adding the feature flag.  I don't bother with that because of the log
clearing behavior.  He also noted that _can_use_without_log_assistance
is potentially dangerous if distro vendors backport features to old
kernels in a different order than they land in upstream.

Another problem with this scheme is the l_incompat_users rwsem that we
use to protect a log cleaning operation from clearing a feature bit that
a frontend thread is trying to set -- this lock adds another way to fail
w.r.t. locking.  For the swapext series, I shard that into multiple
locks just to work around the lockdep complaints, and that's fugly.

My final point is that this cycling increases fstests runtime by a good
5%, though fstests isn't a normal workflow.

So.

Given that this set/clear dance imposes continuous runtime costs on all
the users, I want to remove xfs_clear_incompat_log_features.  Log
incompat bits get set once, and they never go away.  This eliminates the
need for the rwsem, all the extra incompat-clearing bits in the log
code, and fixes the performance problems I see.

Going forward, I'd make mkfs set the log incompat features during a
fresh format if any of the currently-undefined feature bits are set,
which means that they'll be enabled by default on any filesystem with
directory parent pointers and/or metadata directories.  I'd also add
mkfs -l options so that sysadmins can turn it on at format time.

We can discuss whether we want to allow people to set the log incompat
features at runtime -- allowing it at least for recent filesystems (e.g.
v5 + rmap) is easier for users, but only if we decide that we don't
really care about the "recover with old Knoppix" surprise.  If we decide
against online upgrades, we /could/ at least allow upgrades via
xfs_admin like we have for bigtime/inobtcnt.  Or we could decide that
new functionality requires a reformat.

Thoughts?  I vote for removing xfs_clear_incompat_log_features and
letting people turn on log incompat features at runtime.

--D

[0] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?h=djwong-wtf&id=fc6f493c540c520b24e28dfa77c23eea773fa20d
[1] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/tag/?h=atomic-file-updates_2024-01-30
[2] https://lore.kernel.org/linux-xfs/10237117-2149-d504-bbad-6ec28d420c9b@xxxxxxxxxx/




[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux