Re: cephfs health warn

Hi Ben,

On Tue, Sep 26, 2023 at 6:02 PM Ben <ruidong.gao@xxxxxxxxx> wrote:
>
> Hi,
> See below for details of the warnings.
> The cluster is running 17.2.5. The warnings have been around for a while.
> One concern of mine is num_segments growing over time.

Were any config changes related to trimming made? A slow metadata
pool can also cause slow journal trimming.
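
To rule that out, you could compare the trim settings against the
defaults and take a rough look at metadata pool load. A sketch (the
daemon and pool names below are placeholders based on your output and
cephadm's default naming; adjust to your deployment):

        # show any trimming-related overrides in effect
        ceph config dump | grep mds_log

        # effective value for one of the MDS daemons from your output
        ceph config show mds.code-store.host16w.vucirx mds_log_max_segments

        # rough look at metadata pool load
        ceph osd pool stats cephfs.code-store.meta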

> clients with
> warn of MDS_CLIENT_OLDEST_TID
> increase from 18 to 25 as well.

You're likely running into

        https://tracker.ceph.com/issues/62257

It's likely a bug in the MDS, since this warning shows up even with kclients.
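
In the meantime, you can match the client_ids from the warning against
live sessions and, as a last resort, evict a stuck one. A sketch (the
daemon name and client id are taken from your output; eviction will
disrupt that client's mount, so use it with care):

        # list sessions on the reporting MDS and look for the client_id
        ceph tell mds.code-store.host16w.vucirx session ls

        # last resort: evict the stuck session
        ceph tell mds.code-store.host16w.vucirx client evict id=460983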

> The nodes are running kernel 4.19.0-91.82.42.uelc20.x86_64.
> It looks like a bug in the client library. Rebooting the problem nodes
> fixes it, but only for a short period of time. Any suggestions from the
> community for a proper fix?
>
> Thanks,
> Ben
>
>
> [root@8cd2c0657c77 /]# ceph health detail
>
> HEALTH_WARN 6 hosts fail cephadm check; 2 clients failing to respond to
> capability release; 25 clients failing to advance oldest client/flush tid;
> 3 MDSs report slow requests; 3 MDSs behind on trimming
>
> [WRN] CEPHADM_HOST_CHECK_FAILED: 6 hosts fail cephadm check
>
>     host host15w (192.168.31.33) failed check: Unable to reach remote host
> host15w. Process exited with non-zero exit status 1
>
>     host host20w (192.168.31.38) failed check: Unable to reach remote host
> host20w. Process exited with non-zero exit status 1
>
>     host host19w (192.168.31.37) failed check: Unable to reach remote host
> host19w. Process exited with non-zero exit status 1
>
>     host host17w (192.168.31.35) failed check: Unable to reach remote host
> host17w. Process exited with non-zero exit status 1
>
>     host host18w (192.168.31.36) failed check: Unable to reach remote host
> host18w. Process exited with non-zero exit status 1
>
>     host host16w (192.168.31.34) failed check: Unable to reach remote host
> host16w. Process exited with non-zero exit status 1
>
> [WRN] MDS_CLIENT_LATE_RELEASE: 2 clients failing to respond to capability
> release
>
>     mds.code-store.host18w.fdsqff(mds.1): Client k8s-node36 failing to
> respond to capability release client_id: 460983
>
>     mds.code-store.host16w.vucirx(mds.3): Client  failing to respond to
> capability release client_id: 460983
>
> [WRN] MDS_CLIENT_OLDEST_TID: 25 clients failing to advance oldest
> client/flush tid
>
>     mds.code-store.host18w.fdsqff(mds.1): Client k8s-node36 failing to
> advance its oldest client/flush tid.  client_id: 460983
>
>     mds.code-store.host18w.fdsqff(mds.1): Client  failing to advance its
> oldest client/flush tid.  client_id: 460226
>
>     mds.code-store.host18w.fdsqff(mds.1): Client k8s-node32 failing to
> advance its oldest client/flush tid.  client_id: 239797
>
>     mds.code-store.host15w.reolpx(mds.5): Client k8s-node34 failing to
> advance its oldest client/flush tid.  client_id: 460226
>
>     mds.code-store.host15w.reolpx(mds.5): Client k8s-node32 failing to
> advance its oldest client/flush tid.  client_id: 239797
>
>     mds.code-store.host15w.reolpx(mds.5): Client  failing to advance its
> oldest client/flush tid.  client_id: 460983
>
>     mds.code-store.host18w.rtyvdy(mds.7): Client k8s-node34 failing to
> advance its oldest client/flush tid.  client_id: 460226
>
>     mds.code-store.host18w.rtyvdy(mds.7): Client  failing to advance its
> oldest client/flush tid.  client_id: 239797
>
>     mds.code-store.host18w.rtyvdy(mds.7): Client k8s-node36 failing to
> advance its oldest client/flush tid.  client_id: 460983
>
>     mds.code-store.host17w.kcdopb(mds.2): Client  failing to advance its
> oldest client/flush tid.  client_id: 239797
>
>     mds.code-store.host17w.kcdopb(mds.2): Client  failing to advance its
> oldest client/flush tid.  client_id: 460983
>
>     mds.code-store.host17w.kcdopb(mds.2): Client k8s-node34 failing to
> advance its oldest client/flush tid.  client_id: 460226
>
>     mds.code-store.host17w.kcdopb(mds.2): Client k8s-node24 failing to
> advance its oldest client/flush tid.  client_id: 12072730
>
>     mds.code-store.host20w.bfoftp(mds.4): Client k8s-node32 failing to
> advance its oldest client/flush tid.  client_id: 239797
>
>     mds.code-store.host20w.bfoftp(mds.4): Client k8s-node36 failing to
> advance its oldest client/flush tid.  client_id: 460983
>
>     mds.code-store.host19w.ywrmiz(mds.6): Client k8s-node24 failing to
> advance its oldest client/flush tid.  client_id: 12072730
>
>     mds.code-store.host19w.ywrmiz(mds.6): Client k8s-node34 failing to
> advance its oldest client/flush tid.  client_id: 460226
>
>     mds.code-store.host19w.ywrmiz(mds.6): Client  failing to advance its
> oldest client/flush tid.  client_id: 239797
>
>     mds.code-store.host19w.ywrmiz(mds.6): Client  failing to advance its
> oldest client/flush tid.  client_id: 460983
>
>     mds.code-store.host16w.vucirx(mds.3): Client  failing to advance its
> oldest client/flush tid.  client_id: 460983
>
>     mds.code-store.host16w.vucirx(mds.3): Client  failing to advance its
> oldest client/flush tid.  client_id: 460226
>
>     mds.code-store.host16w.vucirx(mds.3): Client  failing to advance its
> oldest client/flush tid.  client_id: 239797
>
>     mds.code-store.host17w.pdziet(mds.0): Client k8s-node32 failing to
> advance its oldest client/flush tid.  client_id: 239797
>
>     mds.code-store.host17w.pdziet(mds.0): Client k8s-node34 failing to
> advance its oldest client/flush tid.  client_id: 460226
>
>     mds.code-store.host17w.pdziet(mds.0): Client k8s-node36 failing to
> advance its oldest client/flush tid.  client_id: 460983
>
> [WRN] MDS_SLOW_REQUEST: 3 MDSs report slow requests
>
>     mds.code-store.host15w.reolpx(mds.5): 4 slow requests are blocked > 5
> secs
>
>     mds.code-store.host20w.bfoftp(mds.4): 6 slow requests are blocked > 5
> secs
>
>     mds.code-store.host16w.vucirx(mds.3): 97 slow requests are blocked > 5
> secs
>
> [WRN] MDS_TRIM: 3 MDSs behind on trimming
>
>     mds.code-store.host15w.reolpx(mds.5): Behind on trimming (25831/128)
> max_segments: 128, num_segments: 25831
>
>     mds.code-store.host20w.bfoftp(mds.4): Behind on trimming (27605/128)
> max_segments: 128, num_segments: 27605
>
>     mds.code-store.host16w.vucirx(mds.3): Behind on trimming (28676/128)
> max_segments: 128, num_segments: 28676
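
On the num_segments growth you mentioned: it's worth confirming whether
trimming is making any progress at all. A sketch (the daemon name is
taken from the health output above):

        # per-MDS journal counters, including segment counts
        ceph tell mds.code-store.host15w.reolpx perf dump mds_log

        # or just watch the health output over time
        ceph health detail | grep num_segments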


-- 
Cheers,
Venky



