Re: ceph-fs crashes on getfattr

Gregory Farnum <gfarnum@xxxxxxxxxx> · Mon, 11 Jul 2022 10:14:26 -0700

On Mon, Jul 11, 2022 at 8:26 AM Frank Schilder <frans@xxxxxx> wrote:
>
> Hi all,
>
> we made a very weird observation on our ceph test cluster today. A simple getfattr with a misspelled attribute name sends the MDS cluster into a crash+restart loop. Something as simple as
>
>   getfattr -n ceph.dir.layout.po /mnt/cephfs
>
> kills a ceph-fs completely. The problem can be resolved by executing "umount -f /mnt/cephfs" on the host where the getfattr was run; the MDS daemons then need a restart, and one might also need to clear the OSD blacklist.
>
> We observe this with a kernel client on 5.18.6-1.el7.elrepo.x86_64 (CentOS 7) with mimic, and I'm absolutely sure I have not seen this problem with mimic on earlier 5.9.x kernels.
>
> Is this a known kernel client bug? Possibly fixed already?

That obviously shouldn't happen. Please file a tracker ticket.

There's been a fair bit of churn in how we handle the "vxattrs", so my
guess is that an incompatibility got introduced between newer clients
and the old server implementation. Obviously we want it to work, and we
especially shouldn't be crashing the MDS. Skimming through the code, I'm
actually not seeing what a client *could* do in that path to crash the
server, so I'm a bit confused...

Oh. I think I see it now, but I'd like to confirm. Yeah, please make
that tracker ticket and attach the backtrace you get.

Thanks,
-Greg

>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>