Hi Ben,

On Tue, Sep 26, 2023 at 6:02 PM Ben <ruidong.gao@xxxxxxxxx> wrote:
>
> Hi,
> see below for details of the warnings.
> The cluster is running 17.2.5. The warnings have been around for a
> while. One concern of mine is num_segments growing over time.

Were any config changes related to trimming done? A slow metadata pool
can also cause slow journal trimming. (A couple of checks I would start
with are sketched a bit further below.)

> The number of clients with the MDS_CLIENT_OLDEST_TID warning has
> increased from 18 to 25 as well.

You're likely running into https://tracker.ceph.com/issues/62257. It is
likely a bug in the MDS, since the warning shows up even with kclients
(kernel clients).
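To see what those sessions look like from the MDS side, something along
these lines should work. This is only a sketch: the daemon name and
client IDs are taken from your health output below, and the exact
session fields vary a bit between releases.

    # dump sessions on one of the affected MDS daemons
    ceph tell mds.code-store.host16w.vucirx session ls > sessions.json

    # inspect one of the clients from the warning
    # (460983, 460226, 239797, 12072730): caps held, client metadata,
    # and how large its request backlog has grown
    jq '.[] | select(.id == 460983)' sessions.json

    # as a stop-gap only (not a fix), evicting the client clears the
    # warning until it builds up again, much like the reboots you did:
    # ceph tell mds.code-store.host16w.vucirx client evict id=460983

Eviction, like the reboots, only clears the state temporarily; the
underlying issue is what the tracker above is about.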
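For the trimming question above, this is roughly what I would check
first. The metadata pool name below is an assumption, so take the real
one from 'ceph fs status'; and if 'ceph tell ... perf dump' is not
available on your release, the same command works through 'ceph daemon'
on the MDS host.

    # was anything trimming-related overridden from its default?
    ceph config dump | grep -i mds_log

    # effective value on one of the lagging MDS daemons (default is 128,
    # matching the max_segments shown in the warning)
    ceph config show mds.code-store.host15w.reolpx mds_log_max_segments

    # journal counters: watch whether the segment count keeps growing
    # or slowly drains between runs
    ceph tell mds.code-store.host15w.reolpx perf dump mds_log

    # rule out a slow metadata pool: pool I/O activity and per-OSD latency
    ceph osd pool stats cephfs.code-store.meta
    ceph osd perf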
> The nodes are running kernel 4.19.0-91.82.42.uelc20.x86_64.
> It looks like a bug in the client library, and rebooting the nodes
> with the problem only fixes it for a short period of time. Any
> suggestions from the community for fixing this?
>
> Thanks,
> Ben
>
> [root@8cd2c0657c77 /]# ceph health detail
> HEALTH_WARN 6 hosts fail cephadm check; 2 clients failing to respond to
> capability release; 25 clients failing to advance oldest client/flush tid;
> 3 MDSs report slow requests; 3 MDSs behind on trimming
> [WRN] CEPHADM_HOST_CHECK_FAILED: 6 hosts fail cephadm check
>     host host15w (192.168.31.33) failed check: Unable to reach remote host host15w. Process exited with non-zero exit status 1
>     host host20w (192.168.31.38) failed check: Unable to reach remote host host20w. Process exited with non-zero exit status 1
>     host host19w (192.168.31.37) failed check: Unable to reach remote host host19w. Process exited with non-zero exit status 1
>     host host17w (192.168.31.35) failed check: Unable to reach remote host host17w. Process exited with non-zero exit status 1
>     host host18w (192.168.31.36) failed check: Unable to reach remote host host18w. Process exited with non-zero exit status 1
>     host host16w (192.168.31.34) failed check: Unable to reach remote host host16w. Process exited with non-zero exit status 1
> [WRN] MDS_CLIENT_LATE_RELEASE: 2 clients failing to respond to capability release
>     mds.code-store.host18w.fdsqff(mds.1): Client k8s-node36 failing to respond to capability release client_id: 460983
>     mds.code-store.host16w.vucirx(mds.3): Client failing to respond to capability release client_id: 460983
> [WRN] MDS_CLIENT_OLDEST_TID: 25 clients failing to advance oldest client/flush tid
>     mds.code-store.host18w.fdsqff(mds.1): Client k8s-node36 failing to advance its oldest client/flush tid. client_id: 460983
>     mds.code-store.host18w.fdsqff(mds.1): Client failing to advance its oldest client/flush tid. client_id: 460226
>     mds.code-store.host18w.fdsqff(mds.1): Client k8s-node32 failing to advance its oldest client/flush tid. client_id: 239797
>     mds.code-store.host15w.reolpx(mds.5): Client k8s-node34 failing to advance its oldest client/flush tid. client_id: 460226
>     mds.code-store.host15w.reolpx(mds.5): Client k8s-node32 failing to advance its oldest client/flush tid. client_id: 239797
>     mds.code-store.host15w.reolpx(mds.5): Client failing to advance its oldest client/flush tid. client_id: 460983
>     mds.code-store.host18w.rtyvdy(mds.7): Client k8s-node34 failing to advance its oldest client/flush tid. client_id: 460226
>     mds.code-store.host18w.rtyvdy(mds.7): Client failing to advance its oldest client/flush tid. client_id: 239797
>     mds.code-store.host18w.rtyvdy(mds.7): Client k8s-node36 failing to advance its oldest client/flush tid. client_id: 460983
>     mds.code-store.host17w.kcdopb(mds.2): Client failing to advance its oldest client/flush tid. client_id: 239797
>     mds.code-store.host17w.kcdopb(mds.2): Client failing to advance its oldest client/flush tid. client_id: 460983
>     mds.code-store.host17w.kcdopb(mds.2): Client k8s-node34 failing to advance its oldest client/flush tid. client_id: 460226
>     mds.code-store.host17w.kcdopb(mds.2): Client k8s-node24 failing to advance its oldest client/flush tid. client_id: 12072730
>     mds.code-store.host20w.bfoftp(mds.4): Client k8s-node32 failing to advance its oldest client/flush tid. client_id: 239797
>     mds.code-store.host20w.bfoftp(mds.4): Client k8s-node36 failing to advance its oldest client/flush tid. client_id: 460983
>     mds.code-store.host19w.ywrmiz(mds.6): Client k8s-node24 failing to advance its oldest client/flush tid. client_id: 12072730
>     mds.code-store.host19w.ywrmiz(mds.6): Client k8s-node34 failing to advance its oldest client/flush tid. client_id: 460226
>     mds.code-store.host19w.ywrmiz(mds.6): Client failing to advance its oldest client/flush tid. client_id: 239797
>     mds.code-store.host19w.ywrmiz(mds.6): Client failing to advance its oldest client/flush tid. client_id: 460983
>     mds.code-store.host16w.vucirx(mds.3): Client failing to advance its oldest client/flush tid. client_id: 460983
>     mds.code-store.host16w.vucirx(mds.3): Client failing to advance its oldest client/flush tid. client_id: 460226
>     mds.code-store.host16w.vucirx(mds.3): Client failing to advance its oldest client/flush tid. client_id: 239797
>     mds.code-store.host17w.pdziet(mds.0): Client k8s-node32 failing to advance its oldest client/flush tid. client_id: 239797
>     mds.code-store.host17w.pdziet(mds.0): Client k8s-node34 failing to advance its oldest client/flush tid. client_id: 460226
>     mds.code-store.host17w.pdziet(mds.0): Client k8s-node36 failing to advance its oldest client/flush tid. client_id: 460983
> [WRN] MDS_SLOW_REQUEST: 3 MDSs report slow requests
>     mds.code-store.host15w.reolpx(mds.5): 4 slow requests are blocked > 5 secs
>     mds.code-store.host20w.bfoftp(mds.4): 6 slow requests are blocked > 5 secs
>     mds.code-store.host16w.vucirx(mds.3): 97 slow requests are blocked > 5 secs
> [WRN] MDS_TRIM: 3 MDSs behind on trimming
>     mds.code-store.host15w.reolpx(mds.5): Behind on trimming (25831/128) max_segments: 128, num_segments: 25831
>     mds.code-store.host20w.bfoftp(mds.4): Behind on trimming (27605/128) max_segments: 128, num_segments: 27605
>     mds.code-store.host16w.vucirx(mds.3): Behind on trimming (28676/128) max_segments: 128, num_segments: 28676

--
Cheers,
Venky
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx