Re: mnt_list corruption triggered during btrfs/326

Daniel Vacek <neelx@xxxxxxxx> · Mon, 6 Jan 2025 22:50:58 +0100

On Sat, 4 Jan 2025 at 23:26, Qu Wenruo <wqu@xxxxxxxx> wrote:
>
>
>
> On 2025/1/4 21:56, Christian Brauner wrote:
> > On Wed, Jan 01, 2025 at 07:05:10AM +1030, Qu Wenruo wrote:
> >>
> >>
> >> On 2024/12/30 19:59, Qu Wenruo wrote:
> >>> Hi,
> >>>
> >>> Although I know it's triggered from btrfs, the mnt_list handling is
> >>> out of btrfs' control, so I'm here asking for some help.
> >
> > Thanks for the report.
> >
> >>>
> >>> [BUG]
> >>> With CONFIG_DEBUG_LIST and CONFIG_BUG_ON_DATA_CORRUPTION, and an
> >>> upstream 6.13-rc kernel, which has commit 951a3f59d268 ("btrfs: fix
> >>> mount failure due to remount races"), I can hit the following crash,
> >>> with varied frequency (from 1 in 4 runs to hundreds of runs without a crash):
> >>
> >> There is also another WARNING triggered, without btrfs callback involved
> >> at all:
> >>
> >> [  192.688671] ------------[ cut here ]------------
> >> [  192.690016] WARNING: CPU: 3 PID: 59747 at fs/mount.h:150
> >
> > This would indicate that move_from_ns() was called on a mount that isn't
> > attached to a mount namespace (either no longer or never was).
> >
> > Here it's particularly peculiar because it looks like the warning is
> > caused by calling move_from_ns() when moving a mount from an anonymous
> > mount namespace in attach_recursive_mnt().
> >
> > Can you please try and reproduce this with
> > commit 211364bef4301838b2e1 ("fs: kill MNT_ONRB")
> > from the vfs-6.14.mount branch in
> > https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git ?
> >
>
> After the initial 1000 runs (with 951a3f59d268 ("btrfs: fix mount
> failure due to remount races") cherry picked, or it won't pass that test
> case), there is no crash nor warning so far.
>
> It's already the best run so far, but I'll keep it running for another
> day or so just to be extra safe.
>
> So I guess the offending commit is 2eea9ce4310d ("mounts: keep list of
> mounts in an rbtree")?

This one was merged in v6.8 - why would it cause crashes only now?

> Putting a list and rb_tree into a union indeed seems a little dangerous,
> sorry I didn't notice that earlier, but my vmcore indeed shows a
> seemingly valid mnt_node (color = 1, both left/right are NULL).

The union seems fine to me as long as the `MNT_ONRB` bit stays
consistent. The crashes (and the warnings) are simply caused by the flag
missing where it should have been set.
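
To illustrate what I mean by the flag staying consistent, here is a
minimal userspace sketch. It is not the kernel code: NODE_ON_TREE,
struct node and both helpers are hypothetical stand-ins for MNT_ONRB,
struct mount and the move_from_ns() path, and the list/tree link
structs are simplified models of list_head and rb_node.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the MNT_ONRB bit. */
#define NODE_ON_TREE 0x1u

/* Simplified models of list_head and rb_node (not the kernel types). */
struct list_links { struct list_links *next, *prev; };
struct tree_links {
	unsigned long parent_color;
	struct tree_links *right, *left;
};

struct node {
	unsigned int flags;
	/* Same storage is either tree links or list links, never both. */
	union {
		struct tree_links tree;
		struct list_links list;
	};
};

/* Put the node into its "on the tree" state and record that in flags. */
static void node_init_tree(struct node *n)
{
	n->tree.parent_color = 1;	/* model value: no parent, colour bit set */
	n->tree.left = NULL;
	n->tree.right = NULL;
	n->flags |= NODE_ON_TREE;
}

/*
 * Switch the union from tree use to list use and append to @head.
 * Returns false when the flag did not say "on tree" beforehand, which
 * is the situation a WARN_ON-style check would report.
 */
static bool node_move_to_list(struct node *n, struct list_links *head)
{
	bool was_on_tree = (n->flags & NODE_ON_TREE) != 0;

	n->flags &= ~NODE_ON_TREE;

	/* Open-coded tail insertion; this overwrites the tree links. */
	n->list.next = head;
	n->list.prev = head->prev;
	head->prev->next = &n->list;
	head->prev = &n->list;

	return was_on_tree;
}
```

As long as every transition goes through helpers like these, the flag
always tells you which union member is live. Once some path changes
the node's state without updating the flag, the next user reads the
union as the wrong type, and that is where list debugging fires.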

--nX

> Thanks a lot for the fix, and it's really a huge relief that it's not
> something inside btrfs causing the bug.
>
> Thanks,
> Qu
>
> [...]
> >>>
> >>> The only caller that doesn't hold @mount_lock is iterate_mounts(), but that's
> >>> only called from audit, and I'm not sure if audit is even involved in
> >>> this case.
> >
> > This is fine as audit creates a private copy of the mount tree it is
> > interested in. The mount tree is not visible to other callers anymore.
> >
> >>>
> >>> So I ran out of ideas why this mnt_list corruption can even happen.
> >>>
> >>> Even if btrfs is abusing something here, all mnt_list users are properly
> >>> protected, thus it should not lead to such list corruption.
> >>>
> >>> Any advice would be appreciated.
> >>>
> >>> Thanks,
> >>> Qu
> >>>
> >>
> >
>
>