Hello, we are having issues with our CephFS cluster. Any help would be appreciated. We are still running 18.2.0. During the holidays we had an outage caused by the root filesystem filling up. OSDs started randomly dying and for a while not all PGs were active. That issue is already resolved and all OSDs work fine, but we're stuck with some MDS issues.
The warnings we are concerned about:

[WRN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
    mds.arm-vol.k02r04nvm01.zaqebs(mds.0): 29 slow metadata IOs are blocked > 30 secs, oldest blocked for 1899 secs
[WRN] MDS_TRIM: 1 MDSs behind on trimming
    mds.arm-vol.k02r04nvm01.zaqebs(mds.0): Behind on trimming (4851/128) max_segments: 128, num_segments: 4851
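In case the raw counters help: as far as I understand, the same segment numbers can be watched live through the MDS perf counters (counter names as in recent releases; seg should match num_segments above):

    # on the MDS host: seg = current journal segment count,
    # segtrm = segments trimmed so far; if segtrm stays flat while
    # seg keeps growing, trimming is genuinely stalled, not just slow
    ceph daemon ./ceph-mds.arm-vol.k02r04nvm01.zaqebs.asok perf dump mds_log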
Our two concerns:

1. Our MDSs are not trimming.
2. Our active MDS has slow metadata ops which we cannot explain.

CephFS status looks OK and the main MDS is active. All metadata pool PGs are active and working, and there are no laggy PGs. Dumping ops from the MDS also doesn't help:

    ceph daemon ./ceph-mds.arm-vol.k02r04nvm01.zaqebs.asok dump_ops_in_flight
    {
        "ops": [],
        "num_ops": 0
    }

MDS failover or restart doesn't help either; the slow metadata ops always come back after an MDS restart (all MDSs have this issue).
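One thing I'm unsure about: as far as I know, dump_ops_in_flight only lists client requests, while MDS_SLOW_METADATA_IO refers to the MDS's own reads/writes against the metadata pool. Those should show up in the objecter instead:

    # on the MDS host: dump the MDS-to-OSD (objecter) requests,
    # which is where slow metadata IOs should actually appear
    ceph daemon ./ceph-mds.arm-vol.k02r04nvm01.zaqebs.asok objecter_requests

If that list is also empty while the health warning persists, that would be interesting in itself.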
After a failover the main MDS is stuck in the rejoin state for a long time. We've used the mds_wipe_sessions config option to bring it quickly into the active state.
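For completeness, roughly what we did (with the obvious caveat that wiping sessions discards client session state):

    # temporarily drop client sessions so the MDS can get past rejoin
    ceph config set mds mds_wipe_sessions true
    # trigger the failover and wait for the MDS to go active,
    # then restore the default so future restarts keep sessions
    ceph config set mds mds_wipe_sessions false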
My guess is that the slow metadata ops are what's stopping the MDS from trimming, but we cannot figure out what is causing the slow ops themselves.
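The next step we can think of is correlating the blocked IOs with the OSD side; a minimal sketch, assuming the metadata pool is named arm-vol-metadata (the pool name and osd.N are placeholders, adjust to your layout):

    # find the PGs (and thus acting OSDs) backing the metadata pool
    ceph pg ls-by-pool arm-vol-metadata
    # then, on the host of each acting OSD, look for stuck or slow ops
    ceph daemon osd.N dump_ops_in_flight
    ceph daemon osd.N dump_historic_ops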
Best regards,
Adam Prycki