Re: [PATCH] ceph: force updating the msg pointer in non-split case

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, May 17, 2023 at 5:03 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
> On Wed, May 17, 2023 at 7:27 AM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> >
> > On Wed, May 17, 2023 at 3:59 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> > >
> > > On Wed, May 17, 2023 at 4:33 AM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> > > >
> > > > On Wed, May 17, 2023 at 1:04 PM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
> > > > >
> > > > >
> > > > > On 5/17/23 18:31, Ilya Dryomov wrote:
> > > > > > On Wed, May 17, 2023 at 7:24 AM <xiubli@xxxxxxxxxx> wrote:
> > > > > >> From: Xiubo Li <xiubli@xxxxxxxxxx>
> > > > > >>
> > > > > >> When the MClientSnap reqeust's op is not CEPH_SNAP_OP_SPLIT the
> > > > > >> request may still contain a list of 'split_realms', and we need
> > > > > >> to skip it anyway. Or it will be parsed as a corrupt snaptrace.
> > > > > >>
> > > > > >> Cc: stable@xxxxxxxxxxxxxxx
> > > > > >> Cc: Frank Schilder <frans@xxxxxx>
> > > > > >> Reported-by: Frank Schilder <frans@xxxxxx>
> > > > > >> URL: https://tracker.ceph.com/issues/61200
> > > > > >> Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
> > > > > >> ---
> > > > > >>   fs/ceph/snap.c | 3 +++
> > > > > >>   1 file changed, 3 insertions(+)
> > > > > >>
> > > > > >> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> > > > > >> index 0e59e95a96d9..d95dfe16b624 100644
> > > > > >> --- a/fs/ceph/snap.c
> > > > > >> +++ b/fs/ceph/snap.c
> > > > > >> @@ -1114,6 +1114,9 @@ void ceph_handle_snap(struct ceph_mds_client *mdsc,
> > > > > >>                                  continue;
> > > > > >>                          adjust_snap_realm_parent(mdsc, child, realm->ino);
> > > > > >>                  }
> > > > > >> +       } else {
> > > > > >> +               p += sizeof(u64) * num_split_inos;
> > > > > >> +               p += sizeof(u64) * num_split_realms;
> > > > > >>          }
> > > > > >>
> > > > > >>          /*
> > > > > >> --
> > > > > >> 2.40.1
> > > > > >>
> > > > > > Hi Xiubo,
> > > > > >
> > > > > > This code appears to be very old -- it goes back to the initial commit
> > > > > > 963b61eb041e ("ceph: snapshot management") in 2009.  Do you have an
> > > > > > explanation for why this popped up only now?
> > > > >
> > > > > As I remembered we hit this before in one cu BZ last year, but I
> > > > > couldn't remember exactly which one.  But I am not sure whether @Jeff
> > > > > saw this before I joint ceph team.
> > > > >
> > > > >
> > > > > > Has MDS always been including split_inos and split_realms arrays in
> > > > > > !CEPH_SNAP_OP_SPLIT case or is this a recent change?  If it's a recent
> > > > > > change, I'd argue that this needs to be addressed on the MDS side.
> > > > >
> > > > > While in MDS side for the _UPDATE op it won't send the 'split_realm'
> > > > > list just before the commit in 2017:
> > > > >
> > > > > commit 93e7267757508520dfc22cff1ab20558bd4a44d4
> > > > > Author: Yan, Zheng <zyan@xxxxxxxxxx>
> > > > > Date:   Fri Jul 21 21:40:46 2017 +0800
> > > > >
> > > > >      mds: send snap related messages centrally during mds recovery
> > > > >
> > > > >      sending CEPH_SNAP_OP_SPLIT and CEPH_SNAP_OP_UPDATE messages to
> > > > >      clients centrally in MDCache::open_snaprealms()
> > > > >
> > > > >      Signed-off-by: "Yan, Zheng" <zyan@xxxxxxxxxx>
> > > > >
> > > > > Before this commit it will only send the 'split_realm' list for the
> > > > > _SPLIT op.
> > > >
> > > > It sounds like we have the culprit.  This should be treated as
> > > > a regression and fixed on the MDS side.  I don't see a justification
> > > > for putting useless data on the wire.
> > >
> > > I don't really understand this viewpoint. We can treat it as an MDS
> > > regression if we want, but it's a six-year-old patch so this is in
> > > nearly every version of server code anybody's running. Why wouldn't we
> > > fix it on both sides?
> >
> > Well, if I didn't speak up chances are we wouldn't have identified the
> > regression in the MDS at all.  People seem to have this perception that
> > the client is somehow "easier" to fix, assume that the server is always
> > doing the right thing and default to patching the client.  I'm just
> > trying to push back on that.
> >
> > In this particular case, after understanding the scope of the issue
> > _and_ getting a committal for the MDS side fix, I approved taking the
> > kernel client patch in an earlier reply.
> >
> > >
> > > On Wed, May 17, 2023 at 4:07 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
> > > > And if the split_realm number equals to sizeof(ceph_mds_snap_realm) +
> > > > extra snap buffer size by coincidence, the above 'corrupted' snaptrace
> > > > will be parsed by kclient too and kclient won't give any warning, but it
> > > > will corrupted the snaprealm and capsnap info in kclient.
> > >
> > > I'm a bit confused about this patch, but I also don't follow the
> > > kernel client code much so please forgive my ignorance. The change
> > > you've made is still only invoked inside of the CEPH_SNAP_OP_SPLIT
> > > case, so clearly the kclient *mostly* does the right thing when these
> >
> > No, it's invoked outside: it introduces a "op != CEPH_SNAP_OP_SPLIT"
> > branch.
>
> Oh I mis-parsed the braces/spacing there.
>
> I'm still not getting how the precise size is causing the problem —
> obviously this isn't an unheard-of category of issue, but the fact
> that it works until the count matches a magic number is odd. Is that
> ceph_decode_need macro being called from ceph_update_snap_trace just
> skipping over the split data somehow? *puzzled*

I'm not sure what count or magic number you are referring to.

AFAIU the issue is that ceph_update_snap_trace() expects to be called
with "p" pointing at ceph_mds_snap_realm encoding.  In the case that is
being fixed it gets called with "p" pointing at "split_inos" array (and
"split_realms" array following that).  The exact check that throws it
off (whether ceph_decode_need() or something else) is irrelevant -- we
are already toasted by then.  In the worst case no check could fire and
ceph_update_snap_trace() could interpret the bogus array data as
a valid ceph_mds_snap_realm with who knows what consequences...

Thanks,

                Ilya




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux