Re: mds dump inode crashes file system

On Fri, May 12, 2023 at 5:28 AM Frank Schilder <frans@xxxxxx> wrote:
>
> Dear Xiubo and others.
>
> >> I have never heard about that option until now. How do I check that and how do I disable it if necessary?
> >> I'm in meetings pretty much all day and will try to send some more info later.
> >
> > $ mount|grep ceph
>
> I get
>
> MON-IPs:SRC on DST type ceph (rw,relatime,name=con-fs2-rit-pfile,secret=<hidden>,noshare,acl,mds_namespace=con-fs2,_netdev)
>
> so async dirop seems disabled.
>
> > Yeah, the kclient just received a corrupted snaptrace from MDS.
> > So the first thing is you need to fix the corrupted snaptrace issue in cephfs and then continue.
>
> Ooookaaayyyy. I will take it as a compliment that you seem to assume I know how to do that. The documentation gives 0 hits. Could you please provide me with instructions of what to look for and/or what to do first?
>
> > If possible, you can parse the corrupted snap message above to check what exactly is corrupted.
> > I haven't had a chance to do that.
>
> Again, how would I do that? Is there some documentation and what should I expect?
>
> > You don't seem to have enabled the 'osd blocklist' cephx auth cap for mon:
>
> I can't find anything about an osd blocklist client auth cap in the documentation. Is this something that came after octopus? Our caps are as shown in the documentation for a ceph fs client (https://docs.ceph.com/en/octopus/cephfs/client-auth/), the one for mon is "allow r":
>
>         caps mds = "allow rw path=/shares"
>         caps mon = "allow r"
>         caps osd = "allow rw tag cephfs data=con-fs2"
>
>
> > I checked that, but from reading the code I couldn't work out what had caused the MDS crash.
> > Something seems to have corrupted the metadata in cephfs.
>
> He wrote something about an invalid xattrib (empty value). It would be really helpful to get a clue how to proceed. I managed to dump the MDS cache with the critical inode in cache. Would this help with debugging? I also managed to get debug logs with debug_mds=20 during a crash caused by an "mds dump inode" command. Would this contain something interesting? I can also pull the rados objects out and can upload all of these files.

I was just guessing about the invalid xattr based on the very limited
crash info, so if it's clearly broken snapshot metadata from the
kclient logs I would focus on that.

I'm surprised/concerned your system managed to generate one of those,
of course...I'll let Xiubo work with you on that.
-Greg