Re: mnt_list corruption triggered during btrfs/326

On 2025/1/4 21:56, Christian Brauner wrote:
On Wed, Jan 01, 2025 at 07:05:10AM +1030, Qu Wenruo wrote:


On 2024/12/30 19:59, Qu Wenruo wrote:
Hi,

Although I know it's triggered from btrfs, the mnt_list handling is
out of btrfs' control, so I'm asking here for some help.

Thanks for the report.


[BUG]
With CONFIG_DEBUG_LIST and CONFIG_BUG_ON_DATA_CORRUPTION, and an
upstream 6.13-rc kernel, which has commit 951a3f59d268 ("btrfs: fix
mount failure due to remount races"), I can hit the following crash,
with varying frequency (from 1 in 4 runs to hundreds of runs with no crash):

There is also another WARNING triggered, without any btrfs callback
involved at all:

[  192.688671] ------------[ cut here ]------------
[  192.690016] WARNING: CPU: 3 PID: 59747 at fs/mount.h:150

This would indicate that move_from_ns() was called on a mount that isn't
attached to a mount namespace (either no longer or never was).

Here it's particularly peculiar because it looks like the warning is
caused by calling move_from_ns() when moving a mount from an anonymous
mount namespace in attach_recursive_mnt().

Can you please try and reproduce this with
commit 211364bef4301838b2e1 ("fs: kill MNT_ONRB")
from the vfs-6.14.mount branch in
https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git ?


After the initial 1000 runs (with 951a3f59d268 ("btrfs: fix mount failure due to remount races") cherry-picked, or it won't pass that test case), there has been no crash or warning so far.

It's already the best run so far, but I'll keep it running for another day or so just to be extra safe.

So I guess the offending commit is 2eea9ce4310d ("mounts: keep list of mounts in an rbtree")? Putting a list and an rb_tree node into a union does seem a little dangerous. Sorry I didn't notice that earlier, but my vmcore does show a seemingly valid mnt_node (color = 1, both left/right are NULL).
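
To illustrate why sharing storage between a list entry and an rbtree node can be fragile, here is a minimal user-space sketch. It is not the kernel code: struct mount_sketch, struct rb_node_sketch, SKETCH_ON_RB and the sketch_* helpers are hypothetical names made up for this example, and it assumes a typical platform where uintptr_t and pointers have the same size. It only shows that a node that is perfectly valid in its rbtree view (color = 1, no children) fails a list sanity check when read through the list view.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
struct list_head { struct list_head *next, *prev; };
struct rb_node_sketch {
	uintptr_t parent_color;              /* parent pointer plus color bit */
	struct rb_node_sketch *right, *left;
};

#define SKETCH_ON_RB 0x1u                    /* hypothetical "is on the tree" flag */

struct mount_sketch {
	union {                              /* both views share the same bytes */
		struct rb_node_sketch node;
		struct list_head list;
	};
	unsigned int flags;
};

/* Put the object into "rbtree" state: color bit set, no parent, no children. */
static void sketch_init_as_rb_node(struct mount_sketch *m)
{
	m->node.parent_color = 1;
	m->node.right = NULL;
	m->node.left = NULL;
	m->flags |= SKETCH_ON_RB;
}

/* Put the object into "list" state: an empty, self-linked entry. */
static void sketch_init_as_list(struct mount_sketch *m)
{
	m->list.next = &m->list;
	m->list.prev = &m->list;
	m->flags &= ~SKETCH_ON_RB;
}

/* Roughly the kind of check a debug list implementation performs:
 * both neighbours must exist and must point back at this entry. */
static int sketch_list_entry_valid(const struct list_head *e)
{
	if (!e->prev || !e->next)
		return 0;
	return e->prev->next == e && e->next->prev == e;
}

/* Only interpret the list view when the flag says the object is off the tree. */
static int sketch_safe_list_check(const struct mount_sketch *m)
{
	if (m->flags & SKETCH_ON_RB)
		return -1;                   /* wrong view: refuse to interpret */
	return sketch_list_entry_valid(&m->list);
}
```

After sketch_init_as_rb_node(), sketch_list_entry_valid() rejects the entry, because the bytes that hold the NULL child pointer are read as a NULL prev pointer. The correctness of such a union therefore depends entirely on every code path agreeing on which view is currently live.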

Thanks a lot for the fix, and it's really a huge relief that it's not something inside btrfs causing the bug.

Thanks,
Qu

[...]

The only caller that doesn't hold @mount_lock is iterate_mounts(), but that's
only called from audit, and I'm not sure if audit is even involved in
this case.

This is fine as audit creates a private copy of the mount tree it is
interested in. The mount tree is not visible to other callers anymore.


So I ran out of ideas as to how this mnt_list corruption can even happen.

Even if btrfs is abusing something, all mnt_list users are properly
protected, so it should not lead to such list corruption.

Any advice would be appreciated.

Thanks,
Qu







