Hi Venky and cephers,

Thanks for the reply. No config changes were made before the issues occurred. I suspect it is a client bug. Please see the messages below about the log segments accumulating and waiting to be trimmed. For the moment, the problematic client nodes cannot be rebooted, and evicting the clients would definitely interrupt business. Any thoughts on how to stop the warnings?

Best wishes,
Ben

On Thu, Sep 28, 2023 at 11:56, Venky Shankar <vshankar@xxxxxxxxxx> wrote:

> Hi Ben,
>
> On Tue, Sep 26, 2023 at 6:02 PM Ben <ruidong.gao@xxxxxxxxx> wrote:
> >
> > Hi,
> > See below for details of the warnings.
> > The cluster is running 17.2.5. The warnings have been around for a while.
> > One concern of mine is num_segments growing over time.
>
> Were any config changes related to trimming made? A slow metadata
> pool can also cause slow journal trimming.
>
> > The number of clients with the MDS_CLIENT_OLDEST_TID warning
> > has increased from 18 to 25 as well.
>
> You're likely running into
>
> https://tracker.ceph.com/issues/62257
>
> It's likely a bug in the MDS, since this warning shows up even with
> kclients.
>
> > The nodes are running kernel 4.19.0-91.82.42.uelc20.x86_64.
> > It looks like a bug in the client library. And would rebooting the
> > problem nodes fix it only for a short period of time? Any suggestions
> > from the community for fixing it?
> >
> > Thanks,
> > Ben
> >
> >
> > [root@8cd2c0657c77 /]# ceph health detail
> >
> > HEALTH_WARN 6 hosts fail cephadm check; 2 clients failing to respond to capability release; 25 clients failing to advance oldest client/flush tid; 3 MDSs report slow requests; 3 MDSs behind on trimming
> >
> > [WRN] CEPHADM_HOST_CHECK_FAILED: 6 hosts fail cephadm check
> >     host host15w (192.168.31.33) failed check: Unable to reach remote host host15w. Process exited with non-zero exit status 1
> >     host host20w (192.168.31.38) failed check: Unable to reach remote host host20w. Process exited with non-zero exit status 1
> >     host host19w (192.168.31.37) failed check: Unable to reach remote host host19w. Process exited with non-zero exit status 1
> >     host host17w (192.168.31.35) failed check: Unable to reach remote host host17w. Process exited with non-zero exit status 1
> >     host host18w (192.168.31.36) failed check: Unable to reach remote host host18w. Process exited with non-zero exit status 1
> >     host host16w (192.168.31.34) failed check: Unable to reach remote host host16w. Process exited with non-zero exit status 1
> >
> > [WRN] MDS_CLIENT_LATE_RELEASE: 2 clients failing to respond to capability release
> >     mds.code-store.host18w.fdsqff(mds.1): Client k8s-node36 failing to respond to capability release client_id: 460983
> >     mds.code-store.host16w.vucirx(mds.3): Client failing to respond to capability release client_id: 460983
> >
> > [WRN] MDS_CLIENT_OLDEST_TID: 25 clients failing to advance oldest client/flush tid
> >     mds.code-store.host18w.fdsqff(mds.1): Client k8s-node36 failing to advance its oldest client/flush tid. client_id: 460983
> >     mds.code-store.host18w.fdsqff(mds.1): Client failing to advance its oldest client/flush tid. client_id: 460226
> >     mds.code-store.host18w.fdsqff(mds.1): Client k8s-node32 failing to advance its oldest client/flush tid. client_id: 239797
> >     mds.code-store.host15w.reolpx(mds.5): Client k8s-node34 failing to advance its oldest client/flush tid. client_id: 460226
> >     mds.code-store.host15w.reolpx(mds.5): Client k8s-node32 failing to advance its oldest client/flush tid. client_id: 239797
> >     mds.code-store.host15w.reolpx(mds.5): Client failing to advance its oldest client/flush tid. client_id: 460983
> >     mds.code-store.host18w.rtyvdy(mds.7): Client k8s-node34 failing to advance its oldest client/flush tid. client_id: 460226
> >     mds.code-store.host18w.rtyvdy(mds.7): Client failing to advance its oldest client/flush tid. client_id: 239797
> >     mds.code-store.host18w.rtyvdy(mds.7): Client k8s-node36 failing to advance its oldest client/flush tid. client_id: 460983
> >     mds.code-store.host17w.kcdopb(mds.2): Client failing to advance its oldest client/flush tid. client_id: 239797
> >     mds.code-store.host17w.kcdopb(mds.2): Client failing to advance its oldest client/flush tid. client_id: 460983
> >     mds.code-store.host17w.kcdopb(mds.2): Client k8s-node34 failing to advance its oldest client/flush tid. client_id: 460226
> >     mds.code-store.host17w.kcdopb(mds.2): Client k8s-node24 failing to advance its oldest client/flush tid. client_id: 12072730
> >     mds.code-store.host20w.bfoftp(mds.4): Client k8s-node32 failing to advance its oldest client/flush tid. client_id: 239797
> >     mds.code-store.host20w.bfoftp(mds.4): Client k8s-node36 failing to advance its oldest client/flush tid. client_id: 460983
> >     mds.code-store.host19w.ywrmiz(mds.6): Client k8s-node24 failing to advance its oldest client/flush tid. client_id: 12072730
> >     mds.code-store.host19w.ywrmiz(mds.6): Client k8s-node34 failing to advance its oldest client/flush tid. client_id: 460226
> >     mds.code-store.host19w.ywrmiz(mds.6): Client failing to advance its oldest client/flush tid. client_id: 239797
> >     mds.code-store.host19w.ywrmiz(mds.6): Client failing to advance its oldest client/flush tid. client_id: 460983
> >     mds.code-store.host16w.vucirx(mds.3): Client failing to advance its oldest client/flush tid. client_id: 460983
> >     mds.code-store.host16w.vucirx(mds.3): Client failing to advance its oldest client/flush tid. client_id: 460226
> >     mds.code-store.host16w.vucirx(mds.3): Client failing to advance its oldest client/flush tid. client_id: 239797
> >     mds.code-store.host17w.pdziet(mds.0): Client k8s-node32 failing to advance its oldest client/flush tid. client_id: 239797
> >     mds.code-store.host17w.pdziet(mds.0): Client k8s-node34 failing to advance its oldest client/flush tid. client_id: 460226
> >     mds.code-store.host17w.pdziet(mds.0): Client k8s-node36 failing to advance its oldest client/flush tid. client_id: 460983
> >
> > [WRN] MDS_SLOW_REQUEST: 3 MDSs report slow requests
> >     mds.code-store.host15w.reolpx(mds.5): 4 slow requests are blocked > 5 secs
> >     mds.code-store.host20w.bfoftp(mds.4): 6 slow requests are blocked > 5 secs
> >     mds.code-store.host16w.vucirx(mds.3): 97 slow requests are blocked > 5 secs
> >
> > [WRN] MDS_TRIM: 3 MDSs behind on trimming
> >     mds.code-store.host15w.reolpx(mds.5): Behind on trimming (25831/128) max_segments: 128, num_segments: 25831
> >     mds.code-store.host20w.bfoftp(mds.4): Behind on trimming (27605/128) max_segments: 128, num_segments: 27605
> >     mds.code-store.host16w.vucirx(mds.3): Behind on trimming (28676/128) max_segments: 128, num_segments: 28676
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
> --
> Cheers,
> Venky