Dear Gregory,

Thanks for your fast response. The situation started worsening shortly after I sent my e-mail and I had to take action. More operations got stuck on the active MDS, which eventually caused journal trimming to fail. I more or less went through all combinations of what you wrote. In the end, the last option was to fail rank 0. This also crashed rank 1, but then things started to clear up. There were more blocked-ops warnings every now and then, but these were temporary. After watching the 2 MDSes struggle for a while, the "stopping" one finally became standby, and everything is up and clean right now.

The reason for this exercise is a (rare?) race condition between dirfrag subtree exports (multi-active MDS) and certain client operations (rename, create), which leads to a complete standstill of the file system (the fs journal size eventually reaches the pool quota). It seems to be caused by different clients (servers) accessing the same sub-directory on the file system concurrently. We observe this with only one workload; no useful information so far.
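For anyone running into the same situation, the steps above map to roughly the following commands (a sketch only, assuming the fs name con-fs2 and the daemon names from the status output quoted further down; not a verbatim transcript of what I ran):

# ceph fs set con-fs2 max_mds 1
(reduce to a single active rank, which is what started this exercise)

# ceph daemon mds.ceph-08 dump_blocked_ops
# ceph daemon mds.ceph-17 dump_blocked_ops
(check for stuck operations on each rank; see the dumps quoted below)

# ceph mds fail 0
(last resort: fail rank 0 and let a standby take over; in our case this also took rank 1 down before things cleared up)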
Thanks again and best regards,

=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Gregory Farnum <gfarnum@xxxxxxxxxx>
Sent: 13 December 2021 17:39:55
To: Frank Schilder
Cc: ceph-users
Subject: Re: MDS stuck in stopping state

This looks awkward; just from the ops, it seems mds.1 is trying to move some stray items (presumably snapshots of since-deleted files, from what you said?) into mds.0's stray directory, and then mds.0 tries to get auth pins from mds.1, but that fails for some reason which isn't apparent from the dump.

Somebody might be able to get farther along by tracing logs of mds.1 rebooting, but my guess is that rebooting both servers will clear it up. You might also try increasing max_mds to 2 and seeing if that jogs things loose; I'm not sure what would be less disruptive for you.
-Greg

On Mon, Dec 13, 2021 at 5:37 AM Frank Schilder <frans@xxxxxx> wrote:
>
> Hi all,
>
> I needed to reduce the number of active MDS daemons from 4 to 1. Unfortunately, the last MDS to stop is stuck in the stopping state. The Ceph version is mimic 13.2.10. Each MDS has 3 blocked ops that seem to be related to deleted snapshots; more info below. I have already failed the MDS in the stopping state several times in the hope that the operations get flushed out. Before failing rank 0, I would appreciate it if someone could look at this issue and advise on how to proceed safely.
>
> Some diagnostic info:
>
> # ceph fs status
> con-fs2 - 1659 clients
> =======
> +------+----------+---------+---------------+-------+-------+
> | Rank |  State   |   MDS   |    Activity   |  dns  |  inos |
> +------+----------+---------+---------------+-------+-------+
> |  0   |  active  | ceph-08 |  Reqs: 176 /s | 2844k | 2775k |
> |  1   | stopping | ceph-17 |               | 27.7k |   59  |
> +------+----------+---------+---------------+-------+-------+
> +---------------------+----------+-------+-------+
> |         Pool        |   type   |  used | avail |
> +---------------------+----------+-------+-------+
> |    con-fs2-meta1    | metadata |  555M | 1261G |
> |    con-fs2-meta2    |   data   |   0   | 1261G |
> |     con-fs2-data    |   data   | 1321T | 5756T |
> | con-fs2-data-ec-ssd |   data   |  252G | 4035G |
> |    con-fs2-data2    |   data   |  389T | 5233T |
> +---------------------+----------+-------+-------+
> +-------------+
> | Standby MDS |
> +-------------+
> |   ceph-09   |
> |   ceph-24   |
> |   ceph-14   |
> |   ceph-16   |
> |   ceph-12   |
> |   ceph-23   |
> |   ceph-10   |
> |   ceph-15   |
> |   ceph-13   |
> |   ceph-11   |
> +-------------+
> MDS version: ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)
>
> # ceph status
>   cluster:
>     id:
>     health: HEALTH_WARN
>             2 MDSs report slow requests
>
>   services:
>     mon: 5 daemons, quorum ceph-01,ceph-02,ceph-03,ceph-25,ceph-26
>     mgr: ceph-01(active), standbys: ceph-02, ceph-03, ceph-25, ceph-26
>     mds: con-fs2-2/2/1 up {0=ceph-08=up:active,1=ceph-17=up:stopping}, 10 up:standby
>     osd: 1051 osds: 1050 up, 1050 in
>
>   data:
>     pools:   13 pools, 17374 pgs
>     objects: 1.01 G objects, 1.9 PiB
>     usage:   2.3 PiB used, 9.2 PiB / 11 PiB avail
>     pgs:     17352 active+clean
>              20    active+clean+scrubbing+deep
>              2     active+clean+scrubbing
>
>   io:
>     client: 129 MiB/s rd, 175 MiB/s wr, 2.57 kop/s rd, 2.77 kop/s wr
>
> # ceph health detail
> HEALTH_WARN 2 MDSs report slow requests
> MDS_SLOW_REQUEST 2 MDSs report slow requests
>     mdsceph-08(mds.0): 3 slow requests are blocked > 30 secs
>     mdsceph-17(mds.1): 3 slow requests are blocked > 30 secs
>
> # ssh ceph-08 ceph daemon mds.ceph-08 dump_blocked_ops
> {
>     "ops": [
>         {
>             "description": "client_request(mds.1:126521 rename #0x100/stray5/1000eec35f7 #0x101/stray5/1000eec35f7 caller_uid=0, caller_gid=0{})",
>             "initiated_at": "2021-12-13 13:08:59.430597",
>             "age": 5034.983083,
>             "duration": 5034.983109,
>             "type_data": {
>                 "flag_point": "acquired locks",
>                 "reqid": "mds.1:126521",
>                 "op_type": "client_request",
>                 "client_info": {
>                     "client": "mds.1",
>                     "tid": 126521
>                 },
>                 "events": [
>                     { "time": "2021-12-13 13:08:59.430597", "event": "initiated" },
>                     { "time": "2021-12-13 13:08:59.430597", "event": "header_read" },
>                     { "time": "2021-12-13 13:08:59.430597", "event": "throttled" },
>                     { "time": "2021-12-13 13:08:59.430601", "event": "all_read" },
>                     { "time": "2021-12-13 13:09:00.730197", "event": "dispatched" },
>                     { "time": "2021-12-13 13:09:01.517306", "event": "requesting remote authpins" },
>                     { "time": "2021-12-13 13:09:01.557219", "event": "failed to xlock, waiting" },
>                     { "time": "2021-12-13 13:09:01.647692", "event": "failed to wrlock, waiting" },
>                     { "time": "2021-12-13 13:09:01.663629", "event": "waiting for remote wrlocks" },
>                     { "time": "2021-12-13 13:09:01.673789", "event": "waiting for remote wrlocks" },
>                     { "time": "2021-12-13 13:09:01.676523", "event": "failed to xlock, waiting" },
>                     { "time": "2021-12-13 13:09:01.691962", "event": "failed to xlock, waiting" },
>                     { "time": "2021-12-13 13:09:01.704202", "event": "acquired locks" }
>                 ]
>             }
>         },
>         {
>             "description": "client_request(mds.1:1 rename #0x100/stray5/1000eec35f7 #0x101/stray5/1000eec35f7 caller_uid=0, caller_gid=0{})",
>             "initiated_at": "2021-12-13 13:31:56.260453",
>             "age": 3658.153227,
>             "duration": 3658.153337,
>             "type_data": {
>                 "flag_point": "requesting remote authpins",
>                 "reqid": "mds.1:1",
>                 "op_type": "client_request",
>                 "client_info": {
>                     "client": "mds.1",
>                     "tid": 1
>                 },
>                 "events": [
>                     { "time": "2021-12-13 13:31:56.260453", "event": "initiated" },
>                     { "time": "2021-12-13 13:31:56.260453", "event": "header_read" },
>                     { "time": "2021-12-13 13:31:56.260454", "event": "throttled" },
>                     { "time": "2021-12-13 13:31:56.260461", "event": "all_read" },
>                     { "time": "2021-12-13 13:31:56.260511", "event": "dispatched" },
>                     { "time": "2021-12-13 13:31:56.260604", "event": "requesting remote authpins" }
>                 ]
>             }
>         },
>         {
>             "description": "client_request(mds.1:993 rename #0x100/stray5/1000eec35f7 #0x101/stray5/1000eec35f7 caller_uid=0, caller_gid=0{})",
>             "initiated_at": "2021-12-13 13:15:31.979997",
>             "age": 4642.433683,
>             "duration": 4642.433850,
>             "type_data": {
>                 "flag_point": "requesting remote authpins",
>                 "reqid": "mds.1:993",
>                 "op_type": "client_request",
>                 "client_info": {
>                     "client": "mds.1",
>                     "tid": 993
>                 },
>                 "events": [
>                     { "time": "2021-12-13 13:15:31.979997", "event": "initiated" },
>                     { "time": "2021-12-13 13:15:31.979997", "event": "header_read" },
>                     { "time": "2021-12-13 13:15:31.979998", "event": "throttled" },
>                     { "time": "2021-12-13 13:15:31.980003", "event": "all_read" },
>                     { "time": "2021-12-13 13:15:31.980079", "event": "dispatched" },
>                     { "time": "2021-12-13 13:15:31.980174", "event": "requesting remote authpins" },
>                     { "time": "2021-12-13 13:31:50.634734", "event": "requesting remote authpins" }
>                 ]
>             }
>         }
>     ],
>     "complaint_time": 30.000000,
>     "num_blocked_ops": 3
> }
>
> # ssh ceph-17 ceph daemon mds.ceph-17 dump_blocked_ops
> {
>     "ops": [
>         {
>             "description": "rejoin:mds.1:126521",
>             "initiated_at": "2021-12-13 13:30:34.602931",
>             "age": 3791.164314,
>             "duration": 3791.164335,
>             "type_data": {
>                 "flag_point": "dispatched",
>                 "reqid": "mds.1:126521",
>                 "op_type": "no_available_op_found",
>                 "events": [
>                     { "time": "2021-12-13 13:30:34.602931", "event": "initiated" },
>                     { "time": "2021-12-13 13:30:34.602931", "event": "header_read" },
>                     { "time": "2021-12-13 13:30:34.602932", "event": "throttled" },
>                     { "time": "2021-12-13 13:30:34.602978", "event": "all_read" },
>                     { "time": "2021-12-13 13:30:34.605856", "event": "dispatched" }
>                 ]
>             }
>         },
>         {
>             "description": "slave_request(mds.1:993.0 authpin)",
>             "initiated_at": "2021-12-13 13:31:50.634857",
>             "age": 3715.132388,
>             "duration": 3715.132451,
>             "type_data": {
>                 "flag_point": "dispatched",
>                 "reqid": "mds.1:993",
>                 "op_type": "slave_request",
>                 "master_info": {
>                     "master": "mds.0"
>                 },
>                 "request_info": {
>                     "attempt": 0,
>                     "op_type": "authpin",
>                     "lock_type": 0,
>                     "object_info": "0x1000eec35f7.head",
>                     "srcdnpath": "",
>                     "destdnpath": "",
>                     "witnesses": "",
>                     "has_inode_export": false,
>                     "inode_export_v": 0,
>                     "op_stamp": "0.000000"
>                 },
>                 "events": [
>                     { "time": "2021-12-13 13:31:50.634857", "event": "initiated" },
>                     { "time": "2021-12-13 13:31:50.634857", "event": "header_read" },
>                     { "time": "2021-12-13 13:31:50.634858", "event": "throttled" },
>                     { "time": "2021-12-13 13:31:50.634867", "event": "all_read" },
>                     { "time": "2021-12-13 13:31:50.634893", "event": "dispatched" }
>                 ]
>             }
>         },
>         {
>             "description": "slave_request(mds.1:1.0 authpin)",
>             "initiated_at": "2021-12-13 13:31:56.260729",
>             "age": 3709.506516,
>             "duration": 3709.506631,
>             "type_data": {
>                 "flag_point": "dispatched",
>                 "reqid": "mds.1:1",
>                 "op_type": "slave_request",
>                 "master_info": {
>                     "master": "mds.0"
>                 },
>                 "request_info": {
>                     "attempt": 0,
>                     "op_type": "authpin",
>                     "lock_type": 0,
>                     "object_info": "0x1000eec35f7.head",
>                     "srcdnpath": "",
>                     "destdnpath": "",
>                     "witnesses": "",
>                     "has_inode_export": false,
>                     "inode_export_v": 0,
>                     "op_stamp": "0.000000"
>                 },
>                 "events": [
>                     { "time": "2021-12-13 13:31:56.260729", "event": "initiated" },
>                     { "time": "2021-12-13 13:31:56.260729", "event": "header_read" },
>                     { "time": "2021-12-13 13:31:56.260731", "event": "throttled" },
>                     { "time": "2021-12-13 13:31:56.260743", "event": "all_read" },
>                     { "time": "2021-12-13 13:31:56.264063", "event": "dispatched" }
>                 ]
>             }
>         }
>     ],
>     "complaint_time": 30.000000,
>     "num_blocked_ops": 3
> }
>
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx