Re: MDS stuck ops

Venky Shankar <vshankar@xxxxxxxxxx> · Mon, 28 Nov 2022 23:11:19 +0530

On Mon, Nov 28, 2022 at 10:19 PM Reed Dier <reed.dier@xxxxxxxxxxx> wrote:
>
> Hopefully someone will be able to point me in the right direction here:
>
> Cluster is Octopus/15.2.17 on Ubuntu 20.04.
> All are kernel cephfs clients, either 5.4.0-131-generic or 5.15.0-52-generic.
> The cluster is nearfull, and more storage is coming, but delivery is still 2-4 weeks out.
>
> > HEALTH_WARN 1 clients failing to respond to capability release; 1 clients failing to advance oldest client/flush tid; 1 MDSs report slow requests; 2 MDSs behind on trimming; 28 nearfull osd(s); 8 pool(s) nearfull; (muted: MDS_CLIENT_RECALL POOL_TOO_FEW_PGS POOL_TOO_MANY_PGS)
> > [WRN] MDS_CLIENT_LATE_RELEASE: 1 clients failing to respond to capability release
> >     mds.mds1(mds.0): Client $client1 failing to respond to capability release client_id: 2825526519
> > [WRN] MDS_CLIENT_OLDEST_TID: 1 clients failing to advance oldest client/flush tid
> >     mds.mds1(mds.0): Client $client2 failing to advance its oldest client/flush tid.  client_id: 2825533964
> > [WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
> >     mds.mds1(mds.0): 4 slow requests are blocked > 30 secs
> > [WRN] MDS_TRIM: 2 MDSs behind on trimming
> >     mds.mds1(mds.0): Behind on trimming (13258/128) max_segments: 128, num_segments: 13258
> >     mds.mds2(mds.0): Behind on trimming (13260/128) max_segments: 128, num_segments: 13260
> > [WRN] OSD_NEARFULL: 28 nearfull osd(s)
>
> > cephfs - 121 clients
> > ======
> > RANK      STATE       MDS      ACTIVITY     DNS    INOS
> >  0        active       mds1   Reqs: 4303 /s  5905k  5880k
> > 0-s   standby-replay   mds2   Evts:  244 /s  1483k   586k
> >     POOL       TYPE     USED  AVAIL
> > fs-metadata  metadata   243G  11.0T
> >    fs-hd3      data    3191G  12.0T
> >    fs-ec73     data     169T  25.3T
> >    fs-ec82     data     211T  28.9T
> > MDS version: ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
>
> Pastebin of mds ops-in-flight: https://pastebin.com/5DqBDynj

A good chunk of those ops are waiting for a directory to finish
fragmenting (splitting). I think they are not progressing because
fragmentation involves creating more objects in the metadata pool.
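To see how many of the stuck ops are waiting on the same thing, the ops dump can be tallied by flag_point. This is a minimal sketch, assuming the JSON saved from "ceph daemon mds.<name> dump_ops_in_flight" has an "ops" list whose entries carry "age" and "type_data.flag_point". Those key names are assumptions to check against your own dump, and the sample data and flag_point strings are made-up placeholders, not real MDS output. The 30 second cutoff mirrors the "> 30 secs" threshold in the health warning.

```python
import json
from collections import Counter

# Hypothetical, trimmed sample shaped like dump_ops_in_flight output.
# Keys ("ops", "age", "type_data", "flag_point") are assumptions; the
# flag_point strings are made-up placeholders.
SAMPLE = {
    "ops": [
        {"age": 812.4, "type_data": {"flag_point": "placeholder: waiting on dir split"}},
        {"age": 640.0, "type_data": {"flag_point": "placeholder: waiting on dir split"}},
        {"age": 5.0, "type_data": {"flag_point": "placeholder: waiting for rdlock"}},
    ],
    "num_ops": 3,
}


def load_dump(path: str) -> dict:
    """Read a saved dump_ops_in_flight JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def summarize_ops(dump: dict, min_age: float = 30.0) -> Counter:
    """Count ops at least min_age seconds old, grouped by flag_point."""
    counts: Counter = Counter()
    for op in dump.get("ops", []):
        if op.get("age", 0.0) < min_age:
            continue  # skip ops that are still young
        flag = op.get("type_data", {}).get("flag_point", "<no flag_point>")
        counts[flag] += 1
    return counts


if __name__ == "__main__":
    for flag, n in summarize_ops(SAMPLE).most_common():
        print(f"{n:4d}  {flag}")
```

For real data, save the dump to a file and pass load_dump("ops.json") (hypothetical filename) to summarize_ops.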

>
> I seem to have about 43 MDS ops that are stuck and not progressing, and I’m unsure how to unstick them and get everything back to a healthy state.
> Comparing the client IDs of the stuck ops against ceph tell mds.$mds client ls, I don’t see any pattern pointing to a specific problematic client or kernel version.
> The fs-metadata pool is on SSDs, while the data pools are on HDDs in various replication/EC configs.
>
> I decreased mds_cache_trim_decay_rate to 0.9, but num_segments just keeps climbing.
> I suspect that trimming may be queued behind some operation that is stuck.
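The client-ID comparison against client ls can be scripted instead of done by eye. A minimal sketch, assuming client ls returns a list of sessions with an "id" and a "client_metadata.kernel_version" field, and that each op exposes its client as "type_data.client_info.client" in the form "client.<id>". Both shapes are assumptions to verify against real output, and the client IDs 1001/1002/9999 are hypothetical. Ops whose client is missing from client ls land in a separate bucket rather than being dropped.

```python
from collections import Counter
from typing import Optional

# Hypothetical inputs; field layout is assumed, IDs are made up.
CLIENT_LS = [
    {"id": 1001, "client_metadata": {"kernel_version": "5.4.0-131-generic"}},
    {"id": 1002, "client_metadata": {"kernel_version": "5.15.0-52-generic"}},
]
OPS = {
    "ops": [
        {"type_data": {"client_info": {"client": "client.1001"}}},
        {"type_data": {"client_info": {"client": "client.1001"}}},
        {"type_data": {"client_info": {"client": "client.1002"}}},
        {"type_data": {"client_info": {"client": "client.9999"}}},
    ]
}


def client_id(op: dict) -> Optional[int]:
    """Return the numeric client id of an op, or None if it has none."""
    name = op.get("type_data", {}).get("client_info", {}).get("client", "")
    prefix, _, num = name.partition(".")
    return int(num) if prefix == "client" and num.isdigit() else None


def group_stuck_ops(ops_dump: dict, client_ls: list):
    """Count ops per client id and per reported kernel version."""
    kernel_of = {
        s["id"]: s.get("client_metadata", {}).get("kernel_version", "<unknown>")
        for s in client_ls
    }
    per_client: Counter = Counter()
    per_kernel: Counter = Counter()
    for op in ops_dump.get("ops", []):
        cid = client_id(op)
        if cid is None:
            continue  # op not tied to a client session
        per_client[cid] += 1
        per_kernel[kernel_of.get(cid, "<not in client ls>")] += 1
    return per_client, per_kernel


if __name__ == "__main__":
    clients, kernels = group_stuck_ops(OPS, CLIENT_LS)
    print("per client:", clients.most_common())
    print("per kernel:", kernels.most_common())
```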

Update ops append events to the MDS journal, which consumes disk
space that you are already running out of.
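To check whether the journal backlog is still growing, the MDS_TRIM lines from ceph health detail can be parsed and sampled over time. A minimal sketch, assuming the line format shown in the health output quoted above; segments_per_minute is a hypothetical helper for comparing two samples of num_segments. With the numbers quoted above, both MDSs hold roughly 100 times max_segments (13258/128 is about 103.6).

```python
import re

# Matches MDS_TRIM detail lines such as
# "mds.mds1(mds.0): Behind on trimming (13258/128) max_segments: 128, ..."
TRIM_RE = re.compile(
    r"mds\.(?P<name>[^\s(]+)\(mds\.(?P<rank>\d+)\): Behind on trimming "
    r"\((?P<num>\d+)/(?P<max>\d+)\)"
)


def trim_backlog(health_detail: str) -> dict:
    """Map MDS name -> (num_segments, max_segments, num/max)."""
    backlog = {}
    for m in TRIM_RE.finditer(health_detail):
        num, max_segments = int(m["num"]), int(m["max"])
        backlog[m["name"]] = (num, max_segments, num / max_segments)
    return backlog


def segments_per_minute(earlier: int, later: int, seconds: float) -> float:
    """Growth rate of num_segments between two samples `seconds` apart."""
    return (later - earlier) * 60.0 / seconds


if __name__ == "__main__":
    detail = """\
    mds.mds1(mds.0): Behind on trimming (13258/128) max_segments: 128, num_segments: 13258
    mds.mds2(mds.0): Behind on trimming (13260/128) max_segments: 128, num_segments: 13260
"""
    for name, (num, mx, ratio) in trim_backlog(detail).items():
        print(f"{name}: {num} segments retained, {ratio:.1f}x max_segments ({mx})")
```

A steadily positive rate between samples is consistent with trimming being blocked rather than merely slow.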

>
> I’ve considered bumping the nearfull ratio up to see whether getting out of the synchronous-write penalty makes any difference, but I suspect something is more deeply unhappy than just that.
>
> Appreciate any pointers anyone can give.
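On the nearfull question: the thresholds are cluster-wide ratios stored in the OSDMap (visible in ceph osd dump, changed with ceph osd set-nearfull-ratio), and raising nearfull_ratio only moves the warning line; it does not free any space. The sketch below classifies per-OSD utilization against those ratios, assuming the usual defaults of 0.85 / 0.90 / 0.95 and treating a value strictly above a ratio as crossing it (that boundary handling is an assumption). The utilization figures in the demo are hypothetical.

```python
# Usual defaults for the nearfull / backfillfull / full ratios.
DEFAULT_NEARFULL = 0.85
DEFAULT_BACKFILLFULL = 0.90
DEFAULT_FULL = 0.95


def osd_fullness(used_fraction: float,
                 nearfull: float = DEFAULT_NEARFULL,
                 backfillfull: float = DEFAULT_BACKFILLFULL,
                 full: float = DEFAULT_FULL) -> str:
    """Classify one OSD's utilization against the cluster full ratios."""
    if not nearfull <= backfillfull <= full:
        raise ValueError("ratios must satisfy nearfull <= backfillfull <= full")
    # Assumption: a value strictly above a ratio counts as crossing it.
    if used_fraction > full:
        return "full"
    if used_fraction > backfillfull:
        return "backfillfull"
    if used_fraction > nearfull:
        return "nearfull"
    return "ok"


if __name__ == "__main__":
    # Hypothetical per-OSD utilization fractions, e.g. read from ceph osd df.
    utils = [0.83, 0.86, 0.87, 0.89]
    for ratio in (0.85, 0.88):
        n = sum(osd_fullness(u, nearfull=ratio) == "nearfull" for u in utils)
        print(f"nearfull_ratio={ratio}: {n} OSD(s) nearfull")
```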

If you have snapshots that are no longer required, maybe consider
deleting those?
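In CephFS a snapshot is removed by running rmdir on its entry under the .snap directory of the directory where it was taken. A minimal sketch that lists, and optionally removes, snapshots not on a keep-list, assuming a mounted CephFS path. It defaults to a dry run so the selection can be reviewed first. It also assumes snapshots inherited from parent directories show up with an underscore prefix ("_<name>_<inode>") and skips them. The snapshot names are hypothetical, and the demo only mimics a .snap layout in a local temporary directory. Space from deleted snapshots is released gradually as the OSDs trim them, not instantly.

```python
import tempfile
from pathlib import Path


def prune_snapshots(directory: str, keep: set, dry_run: bool = True) -> list:
    """Return (and unless dry_run, remove) snapshots of `directory` not in `keep`."""
    snap_dir = Path(directory) / ".snap"
    selected = []
    for entry in sorted(snap_dir.iterdir()):
        name = entry.name
        # Assumed convention: snapshots inherited from an ancestor appear as
        # "_<name>_<inode>" and must be removed from that ancestor instead.
        if name.startswith("_") or name in keep:
            continue
        if not dry_run:
            entry.rmdir()  # on CephFS, rmdir inside .snap deletes the snapshot
        selected.append(name)
    return selected


def demo(dry_run: bool = True) -> tuple:
    """Run against a plain local temp dir that only mimics a .snap layout."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("daily-1", "daily-2", "_weekly_1099511627776"):
            (Path(tmp) / ".snap" / name).mkdir(parents=True)
        selected = prune_snapshots(tmp, keep={"daily-2"}, dry_run=dry_run)
        left = sorted(p.name for p in (Path(tmp) / ".snap").iterdir())
    return selected, left


if __name__ == "__main__":
    print(demo())
```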

>
> Thanks,
> Reed
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

-- 
Cheers,
Venky
