I hope this message finds you well. I have a CephFS cluster with 3 active MDS daemons, exported through the kernel client by a 3-node Samba setup. Currently, two of the MDS daemons are reporting slow requests. We have tried restarting them; after a few hours of journal replay they returned to the active state, but the slow requests reappear. The slow requests do not appear to come from clients, but from requests issued between the MDS daemons themselves. Looking forward to your prompt response.

HEALTH_WARN 2 MDSs report slow requests; 2 MDSs behind on trimming
[WRN] MDS_SLOW_REQUEST: 2 MDSs report slow requests
    mds.osd44(mds.0): 2 slow requests are blocked > 30 secs
    mds.osd43(mds.1): 2 slow requests are blocked > 30 secs
[WRN] MDS_TRIM: 2 MDSs behind on trimming
    mds.osd44(mds.0): Behind on trimming (18642/1024) max_segments: 1024, num_segments: 18642
    mds.osd43(mds.1): Behind on trimming (976612/1024) max_segments: 1024, num_segments: 976612

mds.0
{
    "ops": [
        {
            "description": "peer_request:mds.1:1",
            "initiated_at": "2023-12-31T11:19:38.679925+0800",
            "age": 4358.8009461359998,
            "duration": 4358.8009636369998,
            "type_data": {
                "flag_point": "dispatched",
                "reqid": "mds.1:1",
                "op_type": "peer_request",
                "leader_info": {
                    "leader": "1"
                },
                "events": [
                    { "time": "2023-12-31T11:19:38.679925+0800", "event": "initiated" },
                    { "time": "2023-12-31T11:19:38.679925+0800", "event": "throttled" },
                    { "time": "2023-12-31T11:19:38.679925+0800", "event": "header_read" },
                    { "time": "2023-12-31T11:19:38.679936+0800", "event": "all_read" },
                    { "time": "2023-12-31T11:19:38.679940+0800", "event": "dispatched" }
                ]
            }
        },
        {
            "description": "peer_request:mds.1:2",
            "initiated_at": "2023-12-31T11:19:38.679938+0800",
            "age": 4358.8009326969996,
            "duration": 4358.8009763549999,
            "type_data": {
                "flag_point": "dispatched",
                "reqid": "mds.1:2",
                "op_type": "peer_request",
                "leader_info": {
                    "leader": "1"
                },
                "events": [
                    { "time": "2023-12-31T11:19:38.679938+0800", "event": "initiated" },
                    { "time": "2023-12-31T11:19:38.679938+0800", "event": "throttled" },
                    { "time": "2023-12-31T11:19:38.679938+0800", "event": "header_read" },
                    { "time": "2023-12-31T11:19:38.679941+0800", "event": "all_read" },
                    { "time": "2023-12-31T11:19:38.679991+0800", "event": "dispatched" }
                ]
            }
        }
    ],
    "complaint_time": 30,
    "num_blocked_ops": 2
}

mds.1
{
    "ops": [
        {
            "description": "internal op exportdir:mds.1:1",
            "initiated_at": "2023-12-31T11:19:34.416451+0800",
            "age": 4384.38814198,
            "duration": 4384.3881617610004,
            "type_data": {
                "flag_point": "failed to wrlock, waiting",
                "reqid": "mds.1:1",
                "op_type": "internal_op",
                "internal_op": 5377,
                "op_name": "exportdir",
                "events": [
                    { "time": "2023-12-31T11:19:34.416451+0800", "event": "initiated" },
                    { "time": "2023-12-31T11:19:34.416451+0800", "event": "throttled" },
                    { "time": "2023-12-31T11:19:34.416451+0800", "event": "header_read" },
                    { "time": "2023-12-31T11:19:34.416451+0800", "event": "all_read" },
                    { "time": "2023-12-31T11:19:34.416451+0800", "event": "dispatched" },
                    { "time": "2023-12-31T11:19:38.679923+0800", "event": "requesting remote authpins" },
                    { "time": "2023-12-31T11:19:38.693981+0800", "event": "failed to wrlock, waiting" }
                ]
            }
        },
        {
            "description": "internal op exportdir:mds.1:2",
            "initiated_at": "2023-12-31T11:19:34.416482+0800",
            "age": 4384.3881117999999,
            "duration": 4384.3881714600002,
            "type_data": {
                "flag_point": "failed to wrlock, waiting",
                "reqid": "mds.1:2",
                "op_type": "internal_op",
                "internal_op": 5377,
                "op_name": "exportdir",
                "events": [
                    { "time": "2023-12-31T11:19:34.416482+0800", "event": "initiated" },
                    { "time": "2023-12-31T11:19:34.416482+0800", "event": "throttled" },
                    { "time": "2023-12-31T11:19:34.416482+0800", "event": "header_read" },
                    { "time": "2023-12-31T11:19:34.416482+0800", "event": "all_read" },
                    { "time": "2023-12-31T11:19:34.416482+0800", "event": "dispatched" },
                    { "time": "2023-12-31T11:19:38.679929+0800", "event": "requesting remote authpins" },
                    { "time": "2023-12-31T11:19:38.693995+0800", "event": "failed to wrlock, waiting" }
                ]
            }
        }
    ],
    "complaint_time": 30,
    "num_blocked_ops": 2
}
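For reference, the health summary and the blocked-op dumps above were collected with commands roughly like the following, run against the admin sockets of the two MDS daemons named in the warning (mds.osd44 and mds.osd43); adjust the daemon names for your own deployment:

    # overall cluster health, including the MDS_SLOW_REQUEST / MDS_TRIM warnings
    ceph health detail

    # blocked operations on each active MDS, via the local admin socket
    ceph daemon mds.osd44 dump_blocked_ops
    ceph daemon mds.osd43 dump_blocked_ops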
"throttled" }, { "time": "2023-12-31T11:19:34.416482+0800", "event": "header_read" }, { "time": "2023-12-31T11:19:34.416482+0800", "event": "all_read" }, { "time": "2023-12-31T11:19:34.416482+0800", "event": "dispatched" }, { "time": "2023-12-31T11:19:38.679929+0800", "event": "requesting remote authpins" }, { "time": "2023-12-31T11:19:38.693995+0800", "event": "failed to wrlock, waiting" } ] } } ], "complaint_time": 30, "num_blocked_ops": 2 } I can't find any other solution other than restarting the mds service with slow requests. Currently, the backlog of mds logs in the metadata pool exceeds 4TB. Best regards, _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx