On Wed, May 17, 2023 at 5:03 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote: > > On Wed, May 17, 2023 at 7:27 AM Ilya Dryomov <idryomov@xxxxxxxxx> wrote: > > > > On Wed, May 17, 2023 at 3:59 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote: > > > > > > On Wed, May 17, 2023 at 4:33 AM Ilya Dryomov <idryomov@xxxxxxxxx> wrote: > > > > > > > > On Wed, May 17, 2023 at 1:04 PM Xiubo Li <xiubli@xxxxxxxxxx> wrote: > > > > > > > > > > > > > > > On 5/17/23 18:31, Ilya Dryomov wrote: > > > > > > On Wed, May 17, 2023 at 7:24 AM <xiubli@xxxxxxxxxx> wrote: > > > > > >> From: Xiubo Li <xiubli@xxxxxxxxxx> > > > > > >> > > > > > >> When the MClientSnap reqeust's op is not CEPH_SNAP_OP_SPLIT the > > > > > >> request may still contain a list of 'split_realms', and we need > > > > > >> to skip it anyway. Or it will be parsed as a corrupt snaptrace. > > > > > >> > > > > > >> Cc: stable@xxxxxxxxxxxxxxx > > > > > >> Cc: Frank Schilder <frans@xxxxxx> > > > > > >> Reported-by: Frank Schilder <frans@xxxxxx> > > > > > >> URL: https://tracker.ceph.com/issues/61200 > > > > > >> Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx> > > > > > >> --- > > > > > >> fs/ceph/snap.c | 3 +++ > > > > > >> 1 file changed, 3 insertions(+) > > > > > >> > > > > > >> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c > > > > > >> index 0e59e95a96d9..d95dfe16b624 100644 > > > > > >> --- a/fs/ceph/snap.c > > > > > >> +++ b/fs/ceph/snap.c > > > > > >> @@ -1114,6 +1114,9 @@ void ceph_handle_snap(struct ceph_mds_client *mdsc, > > > > > >> continue; > > > > > >> adjust_snap_realm_parent(mdsc, child, realm->ino); > > > > > >> } > > > > > >> + } else { > > > > > >> + p += sizeof(u64) * num_split_inos; > > > > > >> + p += sizeof(u64) * num_split_realms; > > > > > >> } > > > > > >> > > > > > >> /* > > > > > >> -- > > > > > >> 2.40.1 > > > > > >> > > > > > > Hi Xiubo, > > > > > > > > > > > > This code appears to be very old -- it goes back to the initial commit > > > > > > 963b61eb041e ("ceph: snapshot management") in 2009. Do you have an > > > > > > explanation for why this popped up only now? > > > > > > > > > > As I remembered we hit this before in one cu BZ last year, but I > > > > > couldn't remember exactly which one. But I am not sure whether @Jeff > > > > > saw this before I joint ceph team. > > > > > > > > > > > > > > > > Has MDS always been including split_inos and split_realms arrays in > > > > > > !CEPH_SNAP_OP_SPLIT case or is this a recent change? If it's a recent > > > > > > change, I'd argue that this needs to be addressed on the MDS side. > > > > > > > > > > While in MDS side for the _UPDATE op it won't send the 'split_realm' > > > > > list just before the commit in 2017: > > > > > > > > > > commit 93e7267757508520dfc22cff1ab20558bd4a44d4 > > > > > Author: Yan, Zheng <zyan@xxxxxxxxxx> > > > > > Date: Fri Jul 21 21:40:46 2017 +0800 > > > > > > > > > > mds: send snap related messages centrally during mds recovery > > > > > > > > > > sending CEPH_SNAP_OP_SPLIT and CEPH_SNAP_OP_UPDATE messages to > > > > > clients centrally in MDCache::open_snaprealms() > > > > > > > > > > Signed-off-by: "Yan, Zheng" <zyan@xxxxxxxxxx> > > > > > > > > > > Before this commit it will only send the 'split_realm' list for the > > > > > _SPLIT op. > > > > > > > > It sounds like we have the culprit. This should be treated as > > > > a regression and fixed on the MDS side. I don't see a justification > > > > for putting useless data on the wire. > > > > > > I don't really understand this viewpoint. We can treat it as an MDS > > > regression if we want, but it's a six-year-old patch so this is in > > > nearly every version of server code anybody's running. Why wouldn't we > > > fix it on both sides? > > > > Well, if I didn't speak up chances are we wouldn't have identified the > > regression in the MDS at all. People seem to have this perception that > > the client is somehow "easier" to fix, assume that the server is always > > doing the right thing and default to patching the client. I'm just > > trying to push back on that. > > > > In this particular case, after understanding the scope of the issue > > _and_ getting a committal for the MDS side fix, I approved taking the > > kernel client patch in an earlier reply. > > > > > > > > On Wed, May 17, 2023 at 4:07 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote: > > > > And if the split_realm number equals to sizeof(ceph_mds_snap_realm) + > > > > extra snap buffer size by coincidence, the above 'corrupted' snaptrace > > > > will be parsed by kclient too and kclient won't give any warning, but it > > > > will corrupted the snaprealm and capsnap info in kclient. > > > > > > I'm a bit confused about this patch, but I also don't follow the > > > kernel client code much so please forgive my ignorance. The change > > > you've made is still only invoked inside of the CEPH_SNAP_OP_SPLIT > > > case, so clearly the kclient *mostly* does the right thing when these > > > > No, it's invoked outside: it introduces a "op != CEPH_SNAP_OP_SPLIT" > > branch. > > Oh I mis-parsed the braces/spacing there. > > I'm still not getting how the precise size is causing the problem — > obviously this isn't an unheard-of category of issue, but the fact > that it works until the count matches a magic number is odd. Is that > ceph_decode_need macro being called from ceph_update_snap_trace just > skipping over the split data somehow? *puzzled* I'm not sure what count or magic number you are referring to. AFAIU the issue is that ceph_update_snap_trace() expects to be called with "p" pointing at ceph_mds_snap_realm encoding. In the case that is being fixed it gets called with "p" pointing at "split_inos" array (and "split_realms" array following that). The exact check that throws it off (whether ceph_decode_need() or something else) is irrelevant -- we are already toasted by then. In the worst case no check could fire and ceph_update_snap_trace() could interpret the bogus array data as a valid ceph_mds_snap_realm with who knows what consequences... Thanks, Ilya