Re: [PATCH] ceph: force updating the msg pointer in non-split case

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, May 17, 2023 at 6:18 PM Venky Shankar <vshankar@xxxxxxxxxx> wrote:
>
> Hey Xiubo,
>
> On Wed, May 17, 2023 at 4:45 PM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
> >
> >
> > On 5/17/23 19:04, Xiubo Li wrote:
> > >
> > > On 5/17/23 18:31, Ilya Dryomov wrote:
> > >> On Wed, May 17, 2023 at 7:24 AM <xiubli@xxxxxxxxxx> wrote:
> > >>> From: Xiubo Li <xiubli@xxxxxxxxxx>
> > >>>
> > >>> When the MClientSnap reqeust's op is not CEPH_SNAP_OP_SPLIT the
> > >>> request may still contain a list of 'split_realms', and we need
> > >>> to skip it anyway. Or it will be parsed as a corrupt snaptrace.
> > >>>
> > >>> Cc: stable@xxxxxxxxxxxxxxx
> > >>> Cc: Frank Schilder <frans@xxxxxx>
> > >>> Reported-by: Frank Schilder <frans@xxxxxx>
> > >>> URL: https://tracker.ceph.com/issues/61200
> > >>> Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
> > >>> ---
> > >>>   fs/ceph/snap.c | 3 +++
> > >>>   1 file changed, 3 insertions(+)
> > >>>
> > >>> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> > >>> index 0e59e95a96d9..d95dfe16b624 100644
> > >>> --- a/fs/ceph/snap.c
> > >>> +++ b/fs/ceph/snap.c
> > >>> @@ -1114,6 +1114,9 @@ void ceph_handle_snap(struct ceph_mds_client
> > >>> *mdsc,
> > >>>                                  continue;
> > >>>                          adjust_snap_realm_parent(mdsc, child,
> > >>> realm->ino);
> > >>>                  }
> > >>> +       } else {
> > >>> +               p += sizeof(u64) * num_split_inos;
> > >>> +               p += sizeof(u64) * num_split_realms;
> > >>>          }
> > >>>
> > >>>          /*
> > >>> --
> > >>> 2.40.1
> > >>>
> > >> Hi Xiubo,
> > >>
> > >> This code appears to be very old -- it goes back to the initial commit
> > >> 963b61eb041e ("ceph: snapshot management") in 2009.  Do you have an
> > >> explanation for why this popped up only now?
> > >
> > > As I remembered we hit this before in one cu BZ last year, but I
> > > couldn't remember exactly which one.  But I am not sure whether @Jeff
> > > saw this before I joint ceph team.
> > >
> > @Venky,
> >
> > Do you remember which one ? As I remembered this is why we fixed the
> > snaptrace issue by blocking all the IOs and at the same time
> > blocklisting the kclient before.
> >
> > Before the kcleint won't dump the corrupted msg and we don't know what
> > was wrong with the msg and also we added code to dump the msg in the
> > above fix.
>
> The "corrupted" snaptrace issue happened just after the mds asserted
> hitting the metadata corruption (dentry first corrupted) and it
> _seemed_ that this corruption somehow triggered a corrupted snaptrace
> to be sent to the client.

[sent message a bit early - cotd...]

But I think you found the issue - the message dump did help and its
not related to the dentry first corruption.

>
> >
> > Thanks
> >
> > - Xiubo
> >
> > >
> > >> Has MDS always been including split_inos and split_realms arrays in
> > >> !CEPH_SNAP_OP_SPLIT case or is this a recent change?  If it's a recent
> > >> change, I'd argue that this needs to be addressed on the MDS side.
> > >
> > > While in MDS side for the _UPDATE op it won't send the 'split_realm'
> > > list just before the commit in 2017:
> > >
> > > commit 93e7267757508520dfc22cff1ab20558bd4a44d4
> > > Author: Yan, Zheng <zyan@xxxxxxxxxx>
> > > Date:   Fri Jul 21 21:40:46 2017 +0800
> > >
> > >     mds: send snap related messages centrally during mds recovery
> > >
> > >     sending CEPH_SNAP_OP_SPLIT and CEPH_SNAP_OP_UPDATE messages to
> > >     clients centrally in MDCache::open_snaprealms()
> > >
> > >     Signed-off-by: "Yan, Zheng" <zyan@xxxxxxxxxx>
> > >
> > > Before this commit it will only send the 'split_realm' list for the
> > > _SPLIT op.
> > >
> > >
> > > The following the snaptrace:
> > >
> > > [Wed May 10 16:03:06 2023] header: 00000000: 05 00 00 00 00 00 00 00
> > > 00 00 00 00 00 00 00 00  ................
> > > [Wed May 10 16:03:06 2023] header: 00000010: 12 03 7f 00 01 00 00 01
> > > 00 00 00 00 00 00 00 00  ................
> > > [Wed May 10 16:03:06 2023] header: 00000020: 00 00 00 00 02 01 00 00
> > > 00 00 00 00 00 01 00 00  ................
> > > [Wed May 10 16:03:06 2023] header: 00000030: 00 98 0d 60
> > > 93                                   ...`.
> > > [Wed May 10 16:03:06 2023]  front: 00000000: 00 00 00 00 00 00 00 00
> > > 00 00 00 00 00 00 00 00  ................ <<== The op is 0, which is
> > > 'CEPH_SNAP_OP_UPDATE'
> > > [Wed May 10 16:03:06 2023]  front: 00000010: 0c 00 00 00 88 00 00 00
> > > d1 c0 71 38 00 01 00 00  ..........q8....           <<== The '0c' is
> > > the split_realm number
> > > [Wed May 10 16:03:06 2023]  front: 00000020: 22 c8 71 38 00 01 00 00
> > > d7 c7 71 38 00 01 00 00  ".q8......q8....       <<== All the 'q8' are
> > > the ino#
> > > [Wed May 10 16:03:06 2023]  front: 00000030: d9 c7 71 38 00 01 00 00
> > > d4 c7 71 38 00 01 00 00  ..q8......q8....
> > > [Wed May 10 16:03:06 2023]  front: 00000040: f1 c0 71 38 00 01 00 00
> > > d4 c0 71 38 00 01 00 00  ..q8......q8....
> > > [Wed May 10 16:03:06 2023]  front: 00000050: 20 c8 71 38 00 01 00 00
> > > 1d c8 71 38 00 01 00 00   .q8......q8....
> > > [Wed May 10 16:03:06 2023]  front: 00000060: ec c0 71 38 00 01 00 00
> > > d6 c0 71 38 00 01 00 00  ..q8......q8....
> > > [Wed May 10 16:03:06 2023]  front: 00000070: ef c0 71 38 00 01 00 00
> > > 6a 11 2d 1a 00 01 00 00  ..q8....j.-.....
> > > [Wed May 10 16:03:06 2023]  front: 00000080: 01 00 00 00 00 00 00 00
> > > 01 00 00 00 00 00 00 00  ................
> > > [Wed May 10 16:03:06 2023]  front: 00000090: ee 01 00 00 00 00 00 00
> > > 01 00 00 00 00 00 00 00  ................
> > > [Wed May 10 16:03:06 2023]  front: 000000a0: 00 00 00 00 00 00 00 00
> > > 01 00 00 00 00 00 00 00  ................
> > > [Wed May 10 16:03:06 2023]  front: 000000b0: 01 09 00 00 00 00 00 00
> > > 00 00 00 00 00 00 00 00  ................
> > > [Wed May 10 16:03:06 2023]  front: 000000c0: 01 00 00 00 00 00 00 00
> > > 02 09 00 00 00 00 00 00  ................
> > > [Wed May 10 16:03:06 2023]  front: 000000d0: 05 00 00 00 00 00 00 00
> > > 01 09 00 00 00 00 00 00  ................
> > > [Wed May 10 16:03:06 2023]  front: 000000e0: ff 08 00 00 00 00 00 00
> > > fd 08 00 00 00 00 00 00  ................
> > > [Wed May 10 16:03:06 2023]  front: 000000f0: fb 08 00 00 00 00 00 00
> > > f9 08 00 00 00 00 00 00  ................
> > > [Wed May 10 16:03:06 2023] footer: 00000000: ca 39 06 07 00 00 00 00
> > > 00 00 00 00 42 06 63 61  .9..........B.ca
> > > [Wed May 10 16:03:06 2023] footer: 00000010: 7b 4b 5d 2d
> > > 05                                   {K]-.
> > >
> > >
> > > And if the split_realm number equals to sizeof(ceph_mds_snap_realm) +
> > > extra snap buffer size by coincidence, the above 'corrupted' snaptrace
> > > will be parsed by kclient too and kclient won't give any warning, but
> > > it will corrupted the snaprealm and capsnap info in kclient.
> > >
> > >
> > > Thanks
> > >
> > > - Xiubo
> > >
> > >
> > >> Thanks,
> > >>
> > >>                  Ilya
> > >>
> >
>
>
> --
> Cheers,
> Venky



-- 
Cheers,
Venky





[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux