On 7/11/22 17:22, Frank Schilder wrote:
Hi all,

we made a very weird observation on our Ceph test cluster today. A simple getfattr with a misspelled attribute name sends the MDS cluster into a crash+restart loop. Something as simple as

  getfattr -n ceph.dir.layout.po /mnt/cephfs

kills a CephFS file system completely.

The problem can be resolved by executing "umount -f /mnt/cephfs" on the host where the getfattr was executed. The MDS daemons need a restart, and one might also need to clear the OSD blacklist.

We observe this with a kernel client on 5.18.6-1.el7.elrepo.x86_64 (CentOS 7) with Mimic, and I'm absolutely sure I have not seen this problem with Mimic on earlier 5.9.X kernel versions.

Is this known to be a kernel client bug? Possibly fixed already?
I cannot reproduce this on a 5.18.9 kernel (Arch Linux) with a Ceph Octopus 15.2.16 cluster:

# getfattr -n ceph.dir.layout.po /mnt/cephfs
/mnt/cephfs: ceph.dir.layout.po: No such attribute

Gr. Stefan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
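
For anyone who wants to repeat this check on other kernel/Ceph combinations, here is a minimal sketch of the same lookup in Python. It is an illustration, not part of the original report. The helper name probe_xattr and its getter parameter are hypothetical. It assumes Linux, where os.getxattr is available, and treats ENODATA as the condition getfattr prints as "No such attribute". Given the crash loop described above, it should only be pointed at a test cluster.

```python
import errno
import os


def probe_xattr(path, name, getter=None):
    """Look up one extended attribute and classify the outcome.

    Returns a (status, detail) tuple where status is "present",
    "missing" or "error". `getter` is a hypothetical hook so the logic
    can be exercised without a CephFS mount; it defaults to os.getxattr.
    """
    if getter is None:
        getter = os.getxattr  # Linux only
    try:
        value = getter(path, name)
    except OSError as exc:
        if exc.errno == errno.ENODATA:
            # The healthy answer for a misspelled attribute name.
            return ("missing", str(exc))
        # Anything else (EIO, ENOENT, ENOTSUP, ...) is reported as-is.
        return ("error", str(exc))
    return ("present", value.decode(errors="replace"))
```

Calling probe_xattr("/mnt/cephfs", "ceph.dir.layout.po") on an unaffected client should come back as "missing", matching the getfattr output above. On an affected client the call may not return cleanly at all, so the sketch is only useful for confirming the healthy case.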