On Mon, Jul 11, 2022 at 8:26 AM Frank Schilder <frans@xxxxxx> wrote:
>
> Hi all,
>
> we made a very weird observation on our ceph test cluster today. A simple getfattr with a misspelled attribute name sends the MDS cluster into a crash+restart loop. Something as simple as
>
>   getfattr -n ceph.dir.layout.po /mnt/cephfs
>
> kills a ceph-fs completely. The problem can be resolved if one executes a "umount -f /mnt/cephfs" on the host where the getfattr was executed. The MDS daemons need a restart. One might also need to clear the OSD blacklist.
>
> We observe this with a kernel client on 5.18.6-1.el7.elrepo.x86_64 (CentOS 7) with mimic and I'm absolutely sure I have not seen this problem with mimic on earlier 5.9.X-kernel versions.
>
> Is this known to be a kernel client bug? Possibly fixed already?

That obviously shouldn't happen. Please file a tracker ticket.

There's been a fair bit of churn in how we handle the "vxattrs" so my guess is an incompatibility got introduced between newer clients and the old server implementation, but obviously we want it to work and we especially shouldn't be crashing the MDS.

Skimming through it I'm actually not seeing what a client *could* do in that path to crash the server so I'm a bit confused...

Oh. I think I see it now, but I'd like to confirm. Yeah, please make that tracker ticket and attach the backtrace you get.

Thanks,
-Greg

>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
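
For reference, the recovery steps described in the report (force-unmount on the client, restart the MDS daemons, check the OSD blacklist) could be sketched roughly as below. This is only a dry-run sketch, not a verified procedure: the mount point is the one from the report, the `ceph-mds.target` unit assumes systemd-managed daemons, and the helper names (`run`, `client_recover`, `mds_recover`, `blacklist_list`, `blacklist_remove`) are made up for illustration. By default it only prints what it would run.

```shell
#!/bin/sh
# Dry-run sketch of the recovery steps from the report.
# Nothing is executed unless DRY_RUN=0 is set explicitly.
DRY_RUN="${DRY_RUN:-1}"
MOUNTPOINT="${MOUNTPOINT:-/mnt/cephfs}"   # mount point as given in the report

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Step 1: on the client host that issued the getfattr, force-unmount CephFS.
client_recover() {
    run umount -f "$MOUNTPOINT"
}

# Step 2: on each MDS host, restart the MDS daemons.
# ASSUMPTION: daemons are managed by systemd under ceph-mds.target.
mds_recover() {
    run systemctl restart ceph-mds.target
}

# Step 3: from an admin node, inspect the OSD blacklist and, if needed,
# remove an entry. Mimic uses "blacklist"; later releases renamed the
# command to "blocklist".
blacklist_list() {
    run ceph osd blacklist ls
}

blacklist_remove() {
    # $1 is an address exactly as printed by "ceph osd blacklist ls"
    run ceph osd blacklist rm "$1"
}

# Print the plan.
client_recover
mds_recover
blacklist_list
```

With `DRY_RUN=0` the functions execute for real, so in practice each one would be called on the appropriate host (client, MDS host, admin node) rather than all from one place. `blacklist_remove` is left out of the default plan on purpose, since which entries to clear depends on what `ceph osd blacklist ls` shows.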