Re: mds dump inode crashes file system

Update to the list: a first issue was discovered and fixed on both the MDS and the kclient side. The tracker for the bug is here: https://tracker.ceph.com/issues/61200 . It contains a link to the kclient patchwork. There is no link to the MDS PR (yet).

This bug is responsible for the mount going stale. I still need to confirm that it did not lead to metadata corruption and that the fix also resolves the original problem reported here, the crash on "mds dump inode". TBC.
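
For reference, "mds dump inode" refers to the MDS admin command that asks a running MDS to dump a single inode from its cache. A rough sketch of the invocation, with the MDS name and the inode number as placeholders (the exact form may vary between Ceph releases, so check "ceph tell mds.<name> help" first):

  # dump one inode (decimal inode number) from the cache of a running MDS
  ceph tell mds.<rank-or-name> dump inode <inode-number>

The same command should also be reachable through the local admin socket as "ceph daemon mds.<name> dump inode <inode-number>".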

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Xiubo Li <xiubli@xxxxxxxxxx>
Sent: Wednesday, May 17, 2023 7:43 AM
To: Gregory Farnum; Frank Schilder
Cc: ceph-users@xxxxxxx
Subject: Re:  Re: mds dump inode crashes file system


On 5/16/23 21:55, Gregory Farnum wrote:
> On Fri, May 12, 2023 at 5:28 AM Frank Schilder <frans@xxxxxx> wrote:
>> Dear Xiubo and others.
>>
>>>> I have never heard about that option until now. How do I check that and how to I disable it if necessary?
>>>> I'm in meetings pretty much all day and will try to send some more info later.
>>> $ mount|grep ceph
>> I get
>>
>> MON-IPs:SRC on DST type ceph (rw,relatime,name=con-fs2-rit-pfile,secret=<hidden>,noshare,acl,mds_namespace=con-fs2,_netdev)
>>
>> so async dirop seems disabled.
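
For context, a quick way to double-check is to look for the "nowsync" flag in the mount options. This is a sketch assuming a kernel client that supports the async dirops options "wsync"/"nowsync" (option names may differ between kernel versions):

  # async dirops are enabled only if "nowsync" shows up in the mount options
  mount | grep ceph | grep -o nowsync
  # forcing synchronous dirops explicitly at mount time
  mount -t ceph MON-IPs:SRC DST -o name=con-fs2-rit-pfile,mds_namespace=con-fs2,wsync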
>>
>>> Yeah, the kclient just received a corrupted snaptrace from MDS.
>>> So the first thing is you need to fix the corrupted snaptrace issue in cephfs and then continue.
>> Ooookaaayyyy. I will take it as a compliment that you seem to assume I know how to do that. The documentation gives 0 hits. Could you please provide me with instructions on what to look for and/or what to do first?
>>
>>> If possible, you can parse the above corrupted snap message to check what exactly is corrupted.
>>> I haven't gotten a chance to do that.
>> Again, how would I do that? Is there some documentation and what should I expect?
>>
>>> It seems you didn't enable the 'osd blocklist' cephx auth cap for mon:
>> I can't find anything about an osd blocklist client auth cap in the documentation. Is this something that came after octopus? Our caps are as shown in the documentation for a ceph fs client (https://docs.ceph.com/en/octopus/cephfs/client-auth/), the one for mon is "allow r":
>>
>>          caps mds = "allow rw path=/shares"
>>          caps mon = "allow r"
>>          caps osd = "allow rw tag cephfs data=con-fs2"
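
For reference, the cap Xiubo refers to is an additional mon capability that lets the client issue the blocklist command. A sketch of how it could be added with "ceph auth caps" (the client name is assumed from the mount line above; "ceph auth caps" replaces all caps, so the existing mds/osd caps must be repeated, and on Octopus the command may still be spelled "osd blacklist" rather than "osd blocklist"):

  ceph auth caps client.con-fs2-rit-pfile \
      mds 'allow rw path=/shares' \
      mon 'allow r, allow command "osd blocklist"' \
      osd 'allow rw tag cephfs data=con-fs2'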
>>
>>
>>> I checked that, but by reading the code I couldn't figure out what had caused the MDS crash.
>>> There seems to be something wrong that corrupted the metadata in cephfs.
>> He wrote something about an invalid xattr (empty value). It would be really helpful to get a clue on how to proceed. I managed to dump the MDS cache with the critical inode in cache. Would this help with debugging? I also managed to get debug logs with debug_mds=20 during a crash caused by an "mds dump inode" command. Would they contain something interesting? I can also pull the rados objects out and upload all of these files.
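
For anyone who wants to collect the same artifacts, the commands are roughly as follows (a sketch; the MDS name, the output path and the idea of resetting the debug level afterwards are assumptions, so verify against your deployment):

  # dump the MDS cache to a file on the host running the MDS
  ceph daemon mds.<name> dump cache /tmp/mds-cache-dump.txt
  # raise MDS debugging before triggering the crash, then lower it again
  ceph daemon mds.<name> config set debug_mds 20
  ceph daemon mds.<name> config set debug_mds 1/5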
> I was just guessing about the invalid xattr based on the very limited
> crash info, so if it's clearly broken snapshot metadata from the
> kclient logs I would focus on that.

Actually, the snaptrace was not corrupted, and I have fixed the bug on
the kclient side. For more detail, please see my reply to the last mail.

For the crash on the MDS side:

  ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7fe979ae9b92]
  2: (()+0x27ddac) [0x7fe979ae9dac]
  3: (()+0x5ce831) [0x7fe979e3a831]
  4: (InodeStoreBase::dump(ceph::Formatter*) const+0x153) [0x55c08c59b543]
  5: (CInode::dump(ceph::Formatter*, int) const+0x144) [0x55c08c59b8d4]
  6: (MDCache::dump_inode(ceph::Formatter*, unsigned long)+0x7c) [0x55c08c41e00c]

I just guess it may have gotten corrupted when dumping the 'old_inode':

void InodeStoreBase::dump(Formatter *f) const
{

...
  f->open_array_section("old_inodes");
  for (const auto &p : old_inodes) {
    f->open_object_section("old_inode");
    // The key is the last snapid, the first is in the mempool_old_inode
    f->dump_int("last", p.first);
    p.second.dump(f);
    f->close_section();  // old_inode
  }
  f->close_section();  // old_inodes
...
}

Because incorrectly parsing the snaptrace in the kclient may corrupt the
snaprealm and then send a corrupted capsnap back to the MDS.

Thanks

- Xiubo

> I'm surprised/concerned your system managed to generate one of those,
> of course...I'll let Xiubo work with you on that.
> -Greg
>




