Hi Venky,

Thanks for responding.

> A good chunk of those are waiting for the directory to finish
> fragmentation (split). I think those ops are not progressing since
> fragmentation involves creating more objects in the metadata pool.
> Update ops will involve appending to the mds journal consuming disk
> space which you are already running out of.

The metadata pool is on SSDs, which are not nearfull, so I don't believe space should be an issue there.

> POOL         ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
> fs-metadata  16   32  84 GiB    10.63M  251 GiB   0.74     11 TiB

But in the past it has seemed like all OSDs get implicated in the nearfull penalty, not just the nearfull ones.
Assuming that is true, could the dirfrag split be slowed down by the nearfull synchronous writes?
If so, maybe raising the nearfull ratio temporarily could get the dirfrag split across the finish line, and then I can retreat back to nearfull safety.
Is there a way to monitor dirfrag progress?
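
In case it helps frame what I mean, this is roughly the sequence I had in mind. The 0.87 is just an example value, and I'm assuming the dir_split/dir_merge perf counters are the right thing to watch for split progress:

# ceph osd dump | grep ratio
    (confirm the current full/backfillfull/nearfull ratios first)
# ceph osd set-nearfull-ratio 0.87
    (temporarily lift the nearfull threshold above current usage)
# ceph daemon mds.mds1 perf dump | grep -E 'dir_split|dir_merge'
    (run on the active MDS host; watch whether the split counter advances)
# ceph daemon mds.mds1 dump_ops_in_flight
    (re-check whether the stuck ops start to drain)
# ceph osd set-nearfull-ratio 0.85
    (drop back to the default once the splits complete)

If those counters don't move even with the nearfull flag cleared, then presumably the splits are blocked on something other than the sync-write penalty.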

> If you have snapshots that are no longer required, maybe consider
> deleting those?

There are actually no snapshots on cephfs, so that shouldn't be an issue either.

> # ceph fs get cephfs
> Filesystem 'cephfs' (1)
> fs_name cephfs
> epoch 1081642
> flags 30
> created 2016-12-01T12:02:37.528559-0500
> modified 2022-11-28T13:03:52.630590-0500
> tableserver 0
> root 0
> session_timeout 60
> session_autoclose 300
> max_file_size 1099511627776
> min_compat_client 0 (unknown)
> last_failure 0
> last_failure_osd_epoch 0
> compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
> max_mds 1
> in 0
> up {0=2824746206}
> failed
> damaged
> stopped
> data_pools [17,37,40]
> metadata_pool 16
> inline_data disabled
> balancer
> standby_count_wanted 1

I'm including the fs info in case a compat issue stands out.
There is only a single rank, with an active MDS and a standby-replay MDS.
I also don't have any MDS-specific configs set outside of mds_cache_memory_limit and mds_standby_replay, so all of the mds_bal_* values should be at their defaults.

Again, appreciate the pointers.

Thanks,
Reed

> On Nov 28, 2022, at 11:41 AM, Venky Shankar <vshankar@xxxxxxxxxx> wrote:
>
> On Mon, Nov 28, 2022 at 10:19 PM Reed Dier <reed.dier@xxxxxxxxxxx> wrote:
>>
>> Hopefully someone will be able to point me in the right direction here:
>>
>> Cluster is Octopus/15.2.17 on Ubuntu 20.04.
>> All are kernel cephfs clients, either 5.4.0-131-generic or 5.15.0-52-generic.
>> Cluster is nearfull, and more storage is coming, but still 2-4 weeks out from delivery.
>>
>>> HEALTH_WARN 1 clients failing to respond to capability release; 1 clients failing to advance oldest client/flush tid; 1 MDSs report slow requests; 2 MDSs behind on trimming; 28 nearfull osd(s); 8 pool(s) nearfull; (muted: MDS_CLIENT_RECALL POOL_TOO_FEW_PGS POOL_TOO_MANY_PGS)
>>> [WRN] MDS_CLIENT_LATE_RELEASE: 1 clients failing to respond to capability release
>>>     mds.mds1(mds.0): Client $client1 failing to respond to capability release client_id: 2825526519
>>> [WRN] MDS_CLIENT_OLDEST_TID: 1 clients failing to advance oldest client/flush tid
>>>     mds.mds1(mds.0): Client $client2 failing to advance its oldest client/flush tid. client_id: 2825533964
>>> [WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
>>>     mds.mds1(mds.0): 4 slow requests are blocked > 30 secs
>>> [WRN] MDS_TRIM: 2 MDSs behind on trimming
>>>     mds.mds1(mds.0): Behind on trimming (13258/128) max_segments: 128, num_segments: 13258
>>>     mds.mds2(mds.0): Behind on trimming (13260/128) max_segments: 128, num_segments: 13260
>>> [WRN] OSD_NEARFULL: 28 nearfull osd(s)
>>
>>> cephfs - 121 clients
>>> ======
>>> RANK  STATE           MDS   ACTIVITY       DNS    INOS
>>>  0    active          mds1  Reqs: 4303 /s  5905k  5880k
>>>  0-s  standby-replay  mds2  Evts:  244 /s  1483k   586k
>>>     POOL        TYPE      USED   AVAIL
>>>  fs-metadata  metadata    243G   11.0T
>>>  fs-hd3       data       3191G   12.0T
>>>  fs-ec73      data        169T   25.3T
>>>  fs-ec82      data        211T   28.9T
>>> MDS version: ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
>>
>> Pastebin of mds ops-in-flight: https://pastebin.com/5DqBDynj
>
> A good chunk of those are waiting for the directory to finish
> fragmentation (split). I think those ops are not progressing since
> fragmentation involves creating more objects in the metadata pool.
>
>>
>> I seem to have about 43 mds ops that are just stuck and not progressing, and I'm unsure how to unstick the ops and get everything back to a healthy state.
>> Comparing the client IDs for the stuck ops against ceph tell mds.$mds client ls, I don't see any patterns for a specific problematic client(s) or kernel version(s).
>> The fs-metadata pool is on SSDs, while the data pools are on HDDs in various replication/EC configs.
>>
>> I decreased the mds_cache_trim_decay_rate down to 0.9, but the num_segments just continues to climb.
>> I suspect that trimming may be queued behind some operation that is stuck.
>
> Update ops will involve appending to the mds journal consuming disk
> space which you are already running out of.
>
>>
>> I've considered bumping the nearfull ratio up to try and see if getting out of the synchronous writes penalty makes any difference, but I assume something may be more deeply unhappy than just that.
>>
>> Appreciate any pointers anyone can give.
>
> If you have snapshots that are no longer required, maybe consider
> deleting those?
>
>>
>> Thanks,
>> Reed
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>
>
> --
> Cheers,
> Venky
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx