Re: cephfs health warn

Hi Venky, and cephers

Thanks for the reply.

No config changes had been made before the issues occurred, so we suspect a
client bug. Please see the message below for details on the accumulation of
log segments waiting to be trimmed. For the moment the problematic client
nodes cannot be rebooted, and evicting the clients would definitely interrupt
business. Any thoughts on how to stop the warnings?
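For reference, one way to watch whether num_segments keeps growing is to poll
`ceph health detail` in JSON form and extract the trim backlog from the
MDS_TRIM check. A minimal sketch, assuming the JSON layout of recent Ceph
releases (a top-level `checks` map whose `MDS_TRIM` entry carries `detail`
messages like the ones quoted below):

```python
import json
import re

# Detail messages look like the ones in the quoted health output, e.g.
# "mds.code-store.host15w.reolpx(mds.5): Behind on trimming (25831/128)
#  max_segments: 128, num_segments: 25831"
TRIM_RE = re.compile(
    r"(mds\.[^(]+)\(mds\.\d+\): Behind on trimming \((\d+)/(\d+)\)"
)

def mds_trim_backlog(health_json: str) -> dict:
    """Return {mds_name: num_segments} parsed from the output of
    `ceph health detail -f json` (JSON layout assumed, see above)."""
    checks = json.loads(health_json).get("checks", {})
    trim = checks.get("MDS_TRIM", {})
    backlog = {}
    for item in trim.get("detail", []):
        m = TRIM_RE.search(item.get("message", ""))
        if m:
            # First number in "(N/M)" is num_segments, second is max_segments.
            backlog[m.group(1)] = int(m.group(2))
    return backlog
```

Run periodically (e.g. from cron) and compare successive snapshots: a backlog
that only ever grows points at trimming being stuck rather than merely slow.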

Best wishes,
Ben

Venky Shankar <vshankar@xxxxxxxxxx> wrote on Thu, Sep 28, 2023 at 11:56:

> Hi Ben,
>
> On Tue, Sep 26, 2023 at 6:02 PM Ben <ruidong.gao@xxxxxxxxx> wrote:
> >
> > Hi,
> > See below for details of the warnings.
> > The cluster is running 17.2.5 and the warnings have been around for a while.
> > One concern of mine is num_segments growing over time.
>
> Were any config changes related to trimming done? A slow metadata
> pool can also cause slow journal trimming.
>
> > Clients with the
> > MDS_CLIENT_OLDEST_TID warning
> > increased from 18 to 25 as well.
>
> You're likely running into
>
>         https://tracker.ceph.com/issues/62257
>
> It's likely a bug in the MDS, since this warning shows up even with
> kclients.
>
> > The nodes are running kernel 4.19.0-91.82.42.uelc20.x86_64.
> > It looks like a bug in the client library, and rebooting the affected
> > nodes only fixes it for a short period of time. Any suggestions from
> > the community for fixing this?
> >
> > Thanks,
> > Ben
> >
> >
> > [root@8cd2c0657c77 /]# ceph health detail
> >
> > HEALTH_WARN 6 hosts fail cephadm check; 2 clients failing to respond to
> > capability release; 25 clients failing to advance oldest client/flush tid;
> > 3 MDSs report slow requests; 3 MDSs behind on trimming
> >
> > [WRN] CEPHADM_HOST_CHECK_FAILED: 6 hosts fail cephadm check
> >
> >     host host15w (192.168.31.33) failed check: Unable to reach remote
> >     host host15w. Process exited with non-zero exit status 1
> >
> >     host host20w (192.168.31.38) failed check: Unable to reach remote
> >     host host20w. Process exited with non-zero exit status 1
> >
> >     host host19w (192.168.31.37) failed check: Unable to reach remote
> >     host host19w. Process exited with non-zero exit status 1
> >
> >     host host17w (192.168.31.35) failed check: Unable to reach remote
> >     host host17w. Process exited with non-zero exit status 1
> >
> >     host host18w (192.168.31.36) failed check: Unable to reach remote
> >     host host18w. Process exited with non-zero exit status 1
> >
> >     host host16w (192.168.31.34) failed check: Unable to reach remote
> >     host host16w. Process exited with non-zero exit status 1
> >
> > [WRN] MDS_CLIENT_LATE_RELEASE: 2 clients failing to respond to capability
> > release
> >
> >     mds.code-store.host18w.fdsqff(mds.1): Client k8s-node36 failing to
> > respond to capability release client_id: 460983
> >
> >     mds.code-store.host16w.vucirx(mds.3): Client  failing to respond to
> > capability release client_id: 460983
> >
> > [WRN] MDS_CLIENT_OLDEST_TID: 25 clients failing to advance oldest
> > client/flush tid
> >
> >     mds.code-store.host18w.fdsqff(mds.1): Client k8s-node36 failing to
> > advance its oldest client/flush tid.  client_id: 460983
> >
> >     mds.code-store.host18w.fdsqff(mds.1): Client  failing to advance its
> > oldest client/flush tid.  client_id: 460226
> >
> >     mds.code-store.host18w.fdsqff(mds.1): Client k8s-node32 failing to
> > advance its oldest client/flush tid.  client_id: 239797
> >
> >     mds.code-store.host15w.reolpx(mds.5): Client k8s-node34 failing to
> > advance its oldest client/flush tid.  client_id: 460226
> >
> >     mds.code-store.host15w.reolpx(mds.5): Client k8s-node32 failing to
> > advance its oldest client/flush tid.  client_id: 239797
> >
> >     mds.code-store.host15w.reolpx(mds.5): Client  failing to advance its
> > oldest client/flush tid.  client_id: 460983
> >
> >     mds.code-store.host18w.rtyvdy(mds.7): Client k8s-node34 failing to
> > advance its oldest client/flush tid.  client_id: 460226
> >
> >     mds.code-store.host18w.rtyvdy(mds.7): Client  failing to advance its
> > oldest client/flush tid.  client_id: 239797
> >
> >     mds.code-store.host18w.rtyvdy(mds.7): Client k8s-node36 failing to
> > advance its oldest client/flush tid.  client_id: 460983
> >
> >     mds.code-store.host17w.kcdopb(mds.2): Client  failing to advance its
> > oldest client/flush tid.  client_id: 239797
> >
> >     mds.code-store.host17w.kcdopb(mds.2): Client  failing to advance its
> > oldest client/flush tid.  client_id: 460983
> >
> >     mds.code-store.host17w.kcdopb(mds.2): Client k8s-node34 failing to
> > advance its oldest client/flush tid.  client_id: 460226
> >
> >     mds.code-store.host17w.kcdopb(mds.2): Client k8s-node24 failing to
> > advance its oldest client/flush tid.  client_id: 12072730
> >
> >     mds.code-store.host20w.bfoftp(mds.4): Client k8s-node32 failing to
> > advance its oldest client/flush tid.  client_id: 239797
> >
> >     mds.code-store.host20w.bfoftp(mds.4): Client k8s-node36 failing to
> > advance its oldest client/flush tid.  client_id: 460983
> >
> >     mds.code-store.host19w.ywrmiz(mds.6): Client k8s-node24 failing to
> > advance its oldest client/flush tid.  client_id: 12072730
> >
> >     mds.code-store.host19w.ywrmiz(mds.6): Client k8s-node34 failing to
> > advance its oldest client/flush tid.  client_id: 460226
> >
> >     mds.code-store.host19w.ywrmiz(mds.6): Client  failing to advance its
> > oldest client/flush tid.  client_id: 239797
> >
> >     mds.code-store.host19w.ywrmiz(mds.6): Client  failing to advance its
> > oldest client/flush tid.  client_id: 460983
> >
> >     mds.code-store.host16w.vucirx(mds.3): Client  failing to advance its
> > oldest client/flush tid.  client_id: 460983
> >
> >     mds.code-store.host16w.vucirx(mds.3): Client  failing to advance its
> > oldest client/flush tid.  client_id: 460226
> >
> >     mds.code-store.host16w.vucirx(mds.3): Client  failing to advance its
> > oldest client/flush tid.  client_id: 239797
> >
> >     mds.code-store.host17w.pdziet(mds.0): Client k8s-node32 failing to
> > advance its oldest client/flush tid.  client_id: 239797
> >
> >     mds.code-store.host17w.pdziet(mds.0): Client k8s-node34 failing to
> > advance its oldest client/flush tid.  client_id: 460226
> >
> >     mds.code-store.host17w.pdziet(mds.0): Client k8s-node36 failing to
> > advance its oldest client/flush tid.  client_id: 460983
> >
> > [WRN] MDS_SLOW_REQUEST: 3 MDSs report slow requests
> >
> >     mds.code-store.host15w.reolpx(mds.5): 4 slow requests are blocked > 5 secs
> >
> >     mds.code-store.host20w.bfoftp(mds.4): 6 slow requests are blocked > 5 secs
> >
> >     mds.code-store.host16w.vucirx(mds.3): 97 slow requests are blocked > 5 secs
> >
> > [WRN] MDS_TRIM: 3 MDSs behind on trimming
> >
> >     mds.code-store.host15w.reolpx(mds.5): Behind on trimming (25831/128)
> > max_segments: 128, num_segments: 25831
> >
> >     mds.code-store.host20w.bfoftp(mds.4): Behind on trimming (27605/128)
> > max_segments: 128, num_segments: 27605
> >
> >     mds.code-store.host16w.vucirx(mds.3): Behind on trimming (28676/128)
> > max_segments: 128, num_segments: 28676
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >
>
>
> --
> Cheers,
> Venky
>
>



