MDS Stuck in Replay Loop (Segfault) after subvolume creation

Hello,

I wanted to test CephFS subvolumes: how to mount one and how to set a quota.

After running some "ceph fs" commands, I got an e-mail from Prometheus saying the cluster was in
"Health Warn".
The cause was that every MDS crashed with a segfault.

Here is some information about my cluster.
The cluster runs in containers via podman.
ceph version 16.2.3 (381b476cb3900f9a92eb95d03b4850b953cfd79a) pacific (stable) 


These are the commands I used to set up my subvolume:

ceph fs subvolumegroup create cephfs test-mount

# Size 1G (1073741824 bytes)
ceph fs subvolume create cephfs test-mount-volume test-mount --size=1073741824
ceph fs subvolume info cephfs test-mount-volume test-mount 

# Failed 
ceph fs subvolume authorize cephfs test-mount-volume test-mount-client test-mount / 

# Success 
ceph fs subvolume authorize cephfs test-mount-volume test-mount-client test-mount 
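Side note on the quota: `--size` takes a value in bytes, so 1073741824 is 1 GiB, while 10 GiB would be 10737418240. A quick sanity check for the conversion (the helper name is just for illustration, not part of Ceph):

```python
# Hypothetical helper: convert a size in GiB to the byte count
# passed to `ceph fs subvolume create ... --size=<bytes>`.
def gib_to_bytes(gib: int) -> int:
    return gib * 1024 ** 3

print(gib_to_bytes(1))   # 1073741824
print(gib_to_bytes(10))  # 10737418240
```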


Then Prometheus sent me a Health Warn alert.
The reason: the MDS daemons are stuck in a "replay" loop and none of them come up anymore.
Every MDS crashes with the following error message:
# MDS Error Message 
replayed ESubtreeMap at 12471325829 subtree root 0x1 is not mine in cache (it's -2,-2) 
*** Caught signal (Segmentation fault) **

The journal looks OK:
cephfs-journal-tool --rank=cephfs:all journal inspect 
Overall journal integrity: OK

I took a backup of the journal with:
cephfs-journal-tool --rank=cephfs:all journal export backup.bin

and an export of the cephfs_metadata pool:
rados -p cephfs_metadata export cephfs_metadata_backup


A short listing of the journal events:
cephfs-journal-tool --rank=cephfs:all event get list
https://pastebin.com/jUDTQL2U [1]



To recover from the MDS failure, I would take the following steps:
cephfs-journal-tool --rank=cephfs:all event recover_dentries summary
cephfs-journal-tool --rank=cephfs:all journal reset
cephfs-table-tool --rank=cephfs:all reset session


Any suggestions on how to fix this?

Thanks
-- 
Carsten Feuls



--------
[1] https://pastebin.com/jUDTQL2U
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx