Hi all, I needed to reduce the number of active MDS daemons from 4 to 1. Unfortunately, the last MDS to stop is stuck in stopping state. Ceph version is mimic 13.2.10. Each MDS has 3 blocked OPS, that seem to be related to deleted snapshots; more info below. I failed the MDS in stopping state already several times in the hope that the operations get flushed out. Before failing rank 0, I would appreciate if someone could look at this issue and advise on how to proceed safely. Some diagnostic info: # ceph fs status con-fs2 - 1659 clients ======= +------+----------+---------+---------------+-------+-------+ | Rank | State | MDS | Activity | dns | inos | +------+----------+---------+---------------+-------+-------+ | 0 | active | ceph-08 | Reqs: 176 /s | 2844k | 2775k | | 1 | stopping | ceph-17 | | 27.7k | 59 | +------+----------+---------+---------------+-------+-------+ +---------------------+----------+-------+-------+ | Pool | type | used | avail | +---------------------+----------+-------+-------+ | con-fs2-meta1 | metadata | 555M | 1261G | | con-fs2-meta2 | data | 0 | 1261G | | con-fs2-data | data | 1321T | 5756T | | con-fs2-data-ec-ssd | data | 252G | 4035G | | con-fs2-data2 | data | 389T | 5233T | +---------------------+----------+-------+-------+ +-------------+ | Standby MDS | +-------------+ | ceph-09 | | ceph-24 | | ceph-14 | | ceph-16 | | ceph-12 | | ceph-23 | | ceph-10 | | ceph-15 | | ceph-13 | | ceph-11 | +-------------+ MDS version: ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable) # ceph status cluster: id: health: HEALTH_WARN 2 MDSs report slow requests services: mon: 5 daemons, quorum ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 mgr: ceph-01(active), standbys: ceph-02, ceph-03, ceph-25, ceph-26 mds: con-fs2-2/2/1 up {0=ceph-08=up:active,1=ceph-17=up:stopping}, 10 up:standby osd: 1051 osds: 1050 up, 1050 in data: pools: 13 pools, 17374 pgs objects: 1.01 G objects, 1.9 PiB usage: 2.3 PiB used, 9.2 PiB / 11 PiB avail pgs: 17352 active+clean 20 active+clean+scrubbing+deep 2 active+clean+scrubbing io: client: 129 MiB/s rd, 175 MiB/s wr, 2.57 kop/s rd, 2.77 kop/s wr # ceph health detail HEALTH_WARN 2 MDSs report slow requests MDS_SLOW_REQUEST 2 MDSs report slow requests mdsceph-08(mds.0): 3 slow requests are blocked > 30 secs mdsceph-17(mds.1): 3 slow requests are blocked > 30 secs # ssh ceph-08 ceph daemon mds.ceph-08 dump_blocked_ops { "ops": [ { "description": "client_request(mds.1:126521 rename #0x100/stray5/1000eec35f7 #0x101/stray5/1000eec35f7 caller_uid=0, caller_gid=0{})", "initiated_at": "2021-12-13 13:08:59.430597", "age": 5034.983083, "duration": 5034.983109, "type_data": { "flag_point": "acquired locks", "reqid": "mds.1:126521", "op_type": "client_request", "client_info": { "client": "mds.1", "tid": 126521 }, "events": [ { "time": "2021-12-13 13:08:59.430597", "event": "initiated" }, { "time": "2021-12-13 13:08:59.430597", "event": "header_read" }, { "time": "2021-12-13 13:08:59.430597", "event": "throttled" }, { "time": "2021-12-13 13:08:59.430601", "event": "all_read" }, { "time": "2021-12-13 13:09:00.730197", "event": "dispatched" }, { "time": "2021-12-13 13:09:01.517306", "event": "requesting remote authpins" }, { "time": "2021-12-13 13:09:01.557219", "event": "failed to xlock, waiting" }, { "time": "2021-12-13 13:09:01.647692", "event": "failed to wrlock, waiting" }, { "time": "2021-12-13 13:09:01.663629", "event": "waiting for remote wrlocks" }, { "time": "2021-12-13 13:09:01.673789", "event": "waiting for remote wrlocks" }, { "time": "2021-12-13 13:09:01.676523", "event": "failed to xlock, waiting" }, { "time": "2021-12-13 13:09:01.691962", "event": "failed to xlock, waiting" }, { "time": "2021-12-13 13:09:01.704202", "event": "acquired locks" } ] } }, { "description": "client_request(mds.1:1 rename #0x100/stray5/1000eec35f7 #0x101/stray5/1000eec35f7 caller_uid=0, caller_gid=0{})", "initiated_at": "2021-12-13 13:31:56.260453", "age": 3658.153227, "duration": 3658.153337, "type_data": { "flag_point": "requesting remote authpins", "reqid": "mds.1:1", "op_type": "client_request", "client_info": { "client": "mds.1", "tid": 1 }, "events": [ { "time": "2021-12-13 13:31:56.260453", "event": "initiated" }, { "time": "2021-12-13 13:31:56.260453", "event": "header_read" }, { "time": "2021-12-13 13:31:56.260454", "event": "throttled" }, { "time": "2021-12-13 13:31:56.260461", "event": "all_read" }, { "time": "2021-12-13 13:31:56.260511", "event": "dispatched" }, { "time": "2021-12-13 13:31:56.260604", "event": "requesting remote authpins" } ] } }, { "description": "client_request(mds.1:993 rename #0x100/stray5/1000eec35f7 #0x101/stray5/1000eec35f7 caller_uid=0, caller_gid=0{})", "initiated_at": "2021-12-13 13:15:31.979997", "age": 4642.433683, "duration": 4642.433850, "type_data": { "flag_point": "requesting remote authpins", "reqid": "mds.1:993", "op_type": "client_request", "client_info": { "client": "mds.1", "tid": 993 }, "events": [ { "time": "2021-12-13 13:15:31.979997", "event": "initiated" }, { "time": "2021-12-13 13:15:31.979997", "event": "header_read" }, { "time": "2021-12-13 13:15:31.979998", "event": "throttled" }, { "time": "2021-12-13 13:15:31.980003", "event": "all_read" }, { "time": "2021-12-13 13:15:31.980079", "event": "dispatched" }, { "time": "2021-12-13 13:15:31.980174", "event": "requesting remote authpins" }, { "time": "2021-12-13 13:31:50.634734", "event": "requesting remote authpins" } ] } } ], "complaint_time": 30.000000, "num_blocked_ops": 3 } # ssh ceph-17 ceph daemon mds.ceph-17 dump_blocked_ops { "ops": [ { "description": "rejoin:mds.1:126521", "initiated_at": "2021-12-13 13:30:34.602931", "age": 3791.164314, "duration": 3791.164335, "type_data": { "flag_point": "dispatched", "reqid": "mds.1:126521", "op_type": "no_available_op_found", "events": [ { "time": "2021-12-13 13:30:34.602931", "event": "initiated" }, { "time": "2021-12-13 13:30:34.602931", "event": "header_read" }, { "time": "2021-12-13 13:30:34.602932", "event": "throttled" }, { "time": "2021-12-13 13:30:34.602978", "event": "all_read" }, { "time": "2021-12-13 13:30:34.605856", "event": "dispatched" } ] } }, { "description": "slave_request(mds.1:993.0 authpin)", "initiated_at": "2021-12-13 13:31:50.634857", "age": 3715.132388, "duration": 3715.132451, "type_data": { "flag_point": "dispatched", "reqid": "mds.1:993", "op_type": "slave_request", "master_info": { "master": "mds.0" }, "request_info": { "attempt": 0, "op_type": "authpin", "lock_type": 0, "object_info": "0x1000eec35f7.head", "srcdnpath": "", "destdnpath": "", "witnesses": "", "has_inode_export": false, "inode_export_v": 0, "op_stamp": "0.000000" }, "events": [ { "time": "2021-12-13 13:31:50.634857", "event": "initiated" }, { "time": "2021-12-13 13:31:50.634857", "event": "header_read" }, { "time": "2021-12-13 13:31:50.634858", "event": "throttled" }, { "time": "2021-12-13 13:31:50.634867", "event": "all_read" }, { "time": "2021-12-13 13:31:50.634893", "event": "dispatched" } ] } }, { "description": "slave_request(mds.1:1.0 authpin)", "initiated_at": "2021-12-13 13:31:56.260729", "age": 3709.506516, "duration": 3709.506631, "type_data": { "flag_point": "dispatched", "reqid": "mds.1:1", "op_type": "slave_request", "master_info": { "master": "mds.0" }, "request_info": { "attempt": 0, "op_type": "authpin", "lock_type": 0, "object_info": "0x1000eec35f7.head", "srcdnpath": "", "destdnpath": "", "witnesses": "", "has_inode_export": false, "inode_export_v": 0, "op_stamp": "0.000000" }, "events": [ { "time": "2021-12-13 13:31:56.260729", "event": "initiated" }, { "time": "2021-12-13 13:31:56.260729", "event": "header_read" }, { "time": "2021-12-13 13:31:56.260731", "event": "throttled" }, { "time": "2021-12-13 13:31:56.260743", "event": "all_read" }, { "time": "2021-12-13 13:31:56.264063", "event": "dispatched" } ] } } ], "complaint_time": 30.000000, "num_blocked_ops": 3 } ================= Frank Schilder AIT Risø Campus Bygning 109, rum S14 _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx