I've just spent a couple of hours waiting for an MDS server to replay a journal that it was behind on and it seems to be getting worse. The system is not terribly busy, but there are 14 ops in flight that are very old and do not seem to go away on their own. Is there anything I can do to unwedge the mds server? root@dalmore:~# ceph health detail HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow requests; 1 MDSs behind on trimming MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs mds.dalmore(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest blocked for 4107 secs MDS_SLOW_REQUEST 1 MDSs report slow requests mds.dalmore(mds.0): 14 slow requests are blocked > 30 secs MDS_TRIM 1 MDSs behind on trimming mds.dalmore(mds.0): Behind on trimming (3443/128) max_segments: 128, num_segments: 3443 root@dalmore:~# ceph daemon mds.dalmore dump_ops_in_flight { "ops": [ { "description": "client_request(client.9801215:348401522 readdir #0x2001fba1c83 2021-04-21 15:46:30.820531 caller_uid=1000, caller_gid=1000{})", "initiated_at": "2021-04-21 15:46:30.822084", "age": 4020.3439449050002, "duration": 4020.343998116, "type_data": { "flag_point": "cleaned up request", "reqid": "client.9801215:348401522", "op_type": "client_request", "client_info": { "client": "client.9801215", "tid": 348401522 }, "events": [ { "time": "2021-04-21 15:46:30.822084", "event": "initiated" }, { "time": "2021-04-21 15:46:30.822084", "event": "header_read" }, { "time": "2021-04-21 15:46:30.822090", "event": "throttled" }, { "time": "2021-04-21 15:46:30.822108", "event": "all_read" }, { "time": "2021-04-21 15:46:30.966336", "event": "dispatched" }, { "time": "2021-04-21 15:46:30.966419", "event": "acquired locks" }, { "time": "2021-04-21 16:01:40.084378", "event": "killing request" }, { "time": "2021-04-21 16:01:40.084438", "event": "cleaned up request" } ] } }, { "description": "client_request(client.9801215:348454506 setfilelock rule 1, type 4, owner 14336643358275911908, pid 32085, start 0, length 0, wait 0 #0x1001e694623 20 21-04-21 15:55:50.829175 caller_uid=1000, caller_gid=1000{})", "initiated_at": "2021-04-21 15:55:50.832432", "age": 3460.3335972099999, "duration": 3460.333751787, "type_data": { "flag_point": "cleaned up request", "reqid": "client.9801215:348454506", "op_type": "client_request", "client_info": { "client": "client.9801215", "tid": 348454506 }, "events": [ { "time": "2021-04-21 15:55:50.832432", "event": "initiated" }, { "time": "2021-04-21 15:55:50.832432", "event": "header_read" }, { "time": "2021-04-21 15:55:50.832439", "event": "throttled" }, { "time": "2021-04-21 15:55:50.832465", "event": "all_read" }, { "time": "2021-04-21 15:55:50.832608", "event": "dispatched" }, { "time": "2021-04-21 16:01:40.084448", "event": "killing request" }, { "time": "2021-04-21 16:01:40.084451", "event": "cleaned up request" } ] } }, { "description": "client_request(client.9143865:2383945773 getattr pAsLsXsFs #0x10004f6a16b 2021-04-21 16:35:30.075110 caller_uid=0, caller_gid=0{})", "initiated_at": "2021-04-21 16:35:30.077995", "age": 1081.0880337159999, "duration": 1081.0882681850001, "type_data": { "flag_point": "failed to authpin, subtree is being exported", "reqid": "client.9143865:2383945773", "op_type": "client_request", "client_info": { "client": "client.9143865", "tid": 2383945773 }, "events": [ { "time": "2021-04-21 16:35:30.077995", "event": "initiated" }, { "time": "2021-04-21 16:35:30.077995", "event": "header_read" }, { "time": "2021-04-21 16:35:30.077999", "event": "throttled" }, { "time": "2021-04-21 16:35:30.078018", "event": "all_read" }, { "time": "2021-04-21 16:35:30.078091", "event": "dispatched" }, { "time": "2021-04-21 16:35:30.078156", "event": "failed to authpin, subtree is being exported" } ] } }, { "description": "client_request(client.9801215:348417261 lookup #0x10004cd0675/tasks 2021-04-21 15:49:09.248696 caller_uid=1000, caller_gid=1000{})", "initiated_at": "2021-04-21 15:49:09.252834", "age": 3861.9131950870001, "duration": 3861.9135065690002, "type_data": { "flag_point": "cleaned up request", "reqid": "client.9801215:348417261", "op_type": "client_request", "client_info": { "client": "client.9801215", "tid": 348417261 }, "events": [ { "time": "2021-04-21 15:49:09.252834", "event": "initiated" }, { "time": "2021-04-21 15:49:09.252834", "event": "header_read" }, { "time": "2021-04-21 15:49:09.252839", "event": "throttled" }, { "time": "2021-04-21 15:49:09.252856", "event": "all_read" }, { "time": "2021-04-21 15:49:09.252982", "event": "dispatched" }, { "time": "2021-04-21 15:49:09.253070", "event": "failed to authpin, subtree is being exported" }, { "time": "2021-04-21 16:01:40.084439", "event": "killing request" }, { "time": "2021-04-21 16:01:40.084444", "event": "cleaned up request" } ] } }, { "description": "client_request(client.9143865:2383945793 getattr pAsLsXsFs #0x10004f6a16b 2021-04-21 16:40:33.868658 caller_uid=0, caller_gid=0{})", "initiated_at": "2021-04-21 16:40:33.871565", "age": 777.29446343799998, "duration": 777.29486169500001, "type_data": { "flag_point": "failed to authpin, subtree is being exported", "reqid": "client.9143865:2383945793", "op_type": "client_request", "client_info": { "client": "client.9143865", "tid": 2383945793 }, "events": [ { "time": "2021-04-21 16:40:33.871565", "event": "initiated" }, { "time": "2021-04-21 16:40:33.871565", "event": "header_read" }, { "time": "2021-04-21 16:40:33.871572", "event": "throttled" }, { "time": "2021-04-21 16:40:33.871617", "event": "all_read" }, { "time": "2021-04-21 16:40:33.871699", "event": "dispatched" }, { "time": "2021-04-21 16:40:33.871909", "event": "failed to authpin, subtree is being exported" } ] } }, { "description": "client_request(client.9801215:348424659 lookup #0x10004cd0675/slaves 2021-04-21 15:50:01.592755 caller_uid=1000, caller_gid=1000{})", "initiated_at": "2021-04-21 15:50:01.595195", "age": 3809.5708340870001, "duration": 3809.5713271260001, "type_data": { "flag_point": "cleaned up request", "reqid": "client.9801215:348424659", "op_type": "client_request", "client_info": { "client": "client.9801215", "tid": 348424659 }, "events": [ { "time": "2021-04-21 15:50:01.595195", "event": "initiated" }, { "time": "2021-04-21 15:50:01.595195", "event": "header_read" }, { "time": "2021-04-21 15:50:01.595198", "event": "throttled" }, { "time": "2021-04-21 15:50:01.595210", "event": "all_read" }, { "time": "2021-04-21 15:50:01.595307", "event": "dispatched" }, { "time": "2021-04-21 15:50:01.595389", "event": "failed to authpin, subtree is being exported" }, { "time": "2021-04-21 16:01:40.084444", "event": "killing request" }, { "time": "2021-04-21 16:01:40.084448", "event": "cleaned up request" } ] } }, { "description": "client_request(client.9801215:348456034 setfilelock rule 1, type 4, owner 14336643483181153700, pid 32085, start 0, length 0, wait 0 #0x1001e694623 20 21-04-21 15:56:20.849213 caller_uid=1000, caller_gid=1000{})", "initiated_at": "2021-04-21 15:56:20.851803", "age": 3430.3142257680001, "duration": 3430.3148090620002, "type_data": { "flag_point": "cleaned up request", "reqid": "client.9801215:348456034", "op_type": "client_request", "client_info": { "client": "client.9801215", "tid": 348456034 }, "events": [ { "time": "2021-04-21 15:56:20.851803", "event": "initiated" }, { "time": "2021-04-21 15:56:20.851803", "event": "header_read" }, { "time": "2021-04-21 15:56:20.851815", "event": "throttled" }, { "time": "2021-04-21 15:56:20.851853", "event": "all_read" }, { "time": "2021-04-21 15:56:20.851983", "event": "dispatched" }, { "time": "2021-04-21 16:01:40.084452", "event": "killing request" }, { "time": "2021-04-21 16:01:40.084454", "event": "cleaned up request" } ] } }, { "description": "client_request(client.9801215:348456035 lookup #0x10004cd0675/tasks 2021-04-21 15:56:20.849213 caller_uid=1000, caller_gid=1000{})", "initiated_at": "2021-04-21 15:56:20.851899", "age": 3430.314129722, "duration": 3430.3147965909998, "type_data": { "flag_point": "cleaned up request", "reqid": "client.9801215:348456035", "op_type": "client_request", "client_info": { "client": "client.9801215", "tid": 348456035 }, "events": [ { "time": "2021-04-21 15:56:20.851899", "event": "initiated" }, { "time": "2021-04-21 15:56:20.851899", "event": "header_read" }, { "time": "2021-04-21 15:56:20.851900", "event": "throttled" }, { "time": "2021-04-21 15:56:20.851908", "event": "all_read" }, { "time": "2021-04-21 15:56:20.852165", "event": "dispatched" }, { "time": "2021-04-21 15:56:20.852254", "event": "failed to authpin, subtree is being exported" }, { "time": "2021-04-21 16:01:40.084455", "event": "killing request" }, { "time": "2021-04-21 16:01:40.084457", "event": "cleaned up request" } ] } }, { "description": "client_request(client.9801215:348456036 lookup #0x10004cd0675/slaves 2021-04-21 15:56:20.849213 caller_uid=1000, caller_gid=1000{})", "initiated_at": "2021-04-21 15:56:20.879287", "age": 3430.2867415639998, "duration": 3430.287496596, "type_data": { "flag_point": "cleaned up request", "reqid": "client.9801215:348456036", "op_type": "client_request", "client_info": { "client": "client.9801215", "tid": 348456036 }, "events": [ { "time": "2021-04-21 15:56:20.879287", "event": "initiated" }, { "time": "2021-04-21 15:56:20.879287", "event": "header_read" }, { "time": "2021-04-21 15:56:20.879292", "event": "throttled" }, { "time": "2021-04-21 15:56:20.879306", "event": "all_read" }, { "time": "2021-04-21 15:56:20.879408", "event": "dispatched" }, { "time": "2021-04-21 15:56:20.879502", "event": "failed to authpin, subtree is being exported" }, { "time": "2021-04-21 16:01:40.084458", "event": "killing request" }, { "time": "2021-04-21 16:01:40.084460", "event": "cleaned up request" } ] } }, { "description": "client_request(client.9801215:348456042 lookup #0x10004cd0675/jul 2021-04-21 15:58:53.729411 caller_uid=1000, caller_gid=1000{})", "initiated_at": "2021-04-21 15:58:53.740318", "age": 3277.4257106780001, "duration": 3277.4265688219998, "type_data": { "flag_point": "cleaned up request", "reqid": "client.9801215:348456042", "op_type": "client_request", "client_info": { "client": "client.9801215", "tid": 348456042 }, "events": [ { "time": "2021-04-21 15:58:53.740318", "event": "initiated" }, { "time": "2021-04-21 15:58:53.740318", "event": "header_read" }, { "time": "2021-04-21 15:58:53.740324", "event": "throttled" }, { "time": "2021-04-21 15:58:53.740345", "event": "all_read" }, { "time": "2021-04-21 15:58:53.743424", "event": "dispatched" }, { "time": "2021-04-21 15:58:53.750527", "event": "failed to authpin, subtree is being exported" }, { "time": "2021-04-21 16:01:40.084465", "event": "killing request" }, { "time": "2021-04-21 16:01:40.084468", "event": "cleaned up request" } ] } }, { "description": "client_request(client.9143865:2383914038 getattr pAsLsXsFs #0x10004f6a16b 2021-04-21 16:29:35.332632 caller_uid=903, caller_gid=1003{})", "initiated_at": "2021-04-21 16:29:35.333838", "age": 1435.8321910940001, "duration": 1435.833136707, "type_data": { "flag_point": "failed to authpin, subtree is being exported", "reqid": "client.9143865:2383914038", "op_type": "client_request", "client_info": { "client": "client.9143865", "tid": 2383914038 }, "events": [ { "time": "2021-04-21 16:29:35.333838", "event": "initiated" }, { "time": "2021-04-21 16:29:35.333838", "event": "header_read" }, { "time": "2021-04-21 16:29:35.333841", "event": "throttled" }, { "time": "2021-04-21 16:29:35.333854", "event": "all_read" }, { "time": "2021-04-21 16:29:35.335013", "event": "dispatched" }, { "time": "2021-04-21 16:29:35.335091", "event": "failed to authpin, subtree is being exported" } ] } }, { "description": "client_request(client.10439445:7 lookup #0x10004cd0675/jul 2021-04-21 16:26:17.334272 caller_uid=1000, caller_gid=1000{})", "initiated_at": "2021-04-21 16:26:17.352448", "age": 1633.813581226, "duration": 1633.8146036109999, "type_data": { "flag_point": "cleaned up request", "reqid": "client.10439445:7", "op_type": "client_request", "client_info": { "client": "client.10439445", "tid": 7 }, "events": [ { "time": "2021-04-21 16:26:17.352448", "event": "initiated" }, { "time": "2021-04-21 16:26:17.352448", "event": "header_read" }, { "time": "2021-04-21 16:26:17.352450", "event": "throttled" }, { "time": "2021-04-21 16:26:17.352460", "event": "all_read" }, { "time": "2021-04-21 16:26:17.356901", "event": "dispatched" }, { "time": "2021-04-21 16:26:17.357337", "event": "failed to authpin, subtree is being exported" }, { "time": "2021-04-21 16:38:08.054109", "event": "killing request" }, { "time": "2021-04-21 16:38:08.054120", "event": "cleaned up request" } ] } }, { "description": "client_request(client.9801215:348456041 lookup #0x10004cd0675/jul 2021-04-21 15:56:23.089216 caller_uid=1000, caller_gid=1000{})", "initiated_at": "2021-04-21 15:56:23.091531", "age": 3428.0744974170002, "duration": 3428.0756079930002, "type_data": { "flag_point": "cleaned up request", "reqid": "client.9801215:348456041", "op_type": "client_request", "client_info": { "client": "client.9801215", "tid": 348456041 }, "events": [ { "time": "2021-04-21 15:56:23.091531", "event": "initiated" }, { "time": "2021-04-21 15:56:23.091531", "event": "header_read" }, { "time": "2021-04-21 15:56:23.091536", "event": "throttled" }, { "time": "2021-04-21 15:56:23.091555", "event": "all_read" }, { "time": "2021-04-21 15:56:23.091660", "event": "dispatched" }, { "time": "2021-04-21 15:56:23.091763", "event": "failed to authpin, subtree is being exported" }, { "time": "2021-04-21 16:01:40.084461", "event": "killing request" }, { "time": "2021-04-21 16:01:40.084465", "event": "cleaned up request" } ] } }, { "description": "client_request(client.9143865:2383885743 getattr pAsLsXsFs #0x10004f6a16b 2021-04-21 16:26:36.713371 caller_uid=903, caller_gid=1003{})", "initiated_at": "2021-04-21 16:26:36.716418", "age": 1614.4496113350001, "duration": 1614.4508114350001, "type_data": { "flag_point": "failed to authpin, subtree is being exported", "reqid": "client.9143865:2383885743", "op_type": "client_request", "client_info": { "client": "client.9143865", "tid": 2383885743 }, "events": [ { "time": "2021-04-21 16:26:36.716418", "event": "initiated" }, { "time": "2021-04-21 16:26:36.716418", "event": "header_read" }, { "time": "2021-04-21 16:26:36.716421", "event": "throttled" }, { "time": "2021-04-21 16:26:36.716435", "event": "all_read" }, { "time": "2021-04-21 16:26:36.716526", "event": "dispatched" }, { "time": "2021-04-21 16:26:36.716632", "event": "failed to authpin, subtree is being exported" } ] } } ], "num_ops": 14 } -- Flemming Frandsen - YAPH - http://osaa.dk - http://dren.dk/ _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx