Re: HEALTH_WARN 1 MDSs report slow metadata IOs

Check whether any requests are hung in the output of
'ceph daemon mds.xxx objecter_requests'.
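
If the dump is long, a small script can group the pending requests by OSD, so
it is easier to see whether they all point at the same OSD. The following is
only a rough sketch. It assumes the command prints JSON with a top-level "ops"
list whose entries carry "osd" and "object_id" fields; check this against the
output of your own release. The function names are made up for illustration,
and the script must run on the host where the MDS admin socket lives.

```python
import json
import subprocess
import sys
from collections import defaultdict


def pending_ops_by_osd(dump):
    """Group in-flight objecter ops by the OSD they were sent to.

    `dump` is the parsed JSON from `ceph daemon mds.<name> objecter_requests`.
    Assumption: ops live under "ops" and have "osd" and "object_id" keys.
    """
    by_osd = defaultdict(list)
    for op in dump.get("ops", []):
        # -1 marks ops where no target OSD is reported
        by_osd[op.get("osd", -1)].append(op.get("object_id"))
    return dict(by_osd)


def fetch_objecter_requests(mds_name):
    """Run the admin-socket command for one MDS and parse its JSON output."""
    result = subprocess.run(
        ["ceph", "daemon", f"mds.{mds_name}", "objecter_requests"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python3 objecter_summary.py cephmds-01
    dump = fetch_objecter_requests(sys.argv[1])
    for osd, objects in sorted(pending_ops_by_osd(dump).items()):
        print(f"osd.{osd}: {len(objects)} pending op(s): {objects}")
```

If most of the pending ops target one OSD, that OSD (or the PG behind it) is
the first place I would look.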

On Tue, Jul 16, 2019 at 11:51 PM Dietmar Rieder
<dietmar.rieder@xxxxxxxxxxx> wrote:
>
> On 7/16/19 4:11 PM, Dietmar Rieder wrote:
> > Hi,
> >
> > We are running ceph version 14.1.2 with cephfs only.
> >
> > I just noticed that one of our PGs had scrub errors, which I was able
> > to repair:
> >
> > # ceph health detail
> > HEALTH_ERR 1 MDSs report slow metadata IOs; 1 MDSs report slow requests;
> > 1 scrub errors; Possible data damage: 1 pg inconsistent
> > MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
> >     mdscephmds-01(mds.0): 3 slow metadata IOs are blocked > 30 secs,
> > oldest blocked for 47743 secs
> > MDS_SLOW_REQUEST 1 MDSs report slow requests
> >     mdscephmds-01(mds.0): 2 slow requests are blocked > 30 secs
> > OSD_SCRUB_ERRORS 1 scrub errors
> > PG_DAMAGED Possible data damage: 1 pg inconsistent
> >     pg 6.e0b is active+clean+inconsistent, acting
> > [194,23,116,183,149,82,42,132,26]
> >
> >
> > Apparently I was able to repair the pg:
> >
> > #  rados list-inconsistent-pg hdd-ec-data-pool
> > ["6.e0b"]
> >
> > # ceph pg repair 6.e0b
> > instructing pg 6.e0bs0 on osd.194 to repair
> >
> > [...]
> > 2019-07-16 15:07:13.700 7f851d720700  0 log_channel(cluster) log [DBG] :
> > 6.e0b repair starts
> > 2019-07-16 15:10:23.852 7f851d720700  0 log_channel(cluster) log [DBG] :
> > 6.e0b repair ok, 0 fixed
> > [....]
> >
> >
> > However, I still have HEALTH_WARN due to slow metadata IOs.
> >
> > # ceph health detail
> > HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow requests
> > MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
> >     mdscephmds-01(mds.0): 3 slow metadata IOs are blocked > 30 secs,
> > oldest blocked for 51123 secs
> > MDS_SLOW_REQUEST 1 MDSs report slow requests
> >     mdscephmds-01(mds.0): 5 slow requests are blocked > 30 secs
> >
> >
> > I already rebooted all my client machines accessing the cephfs via
> > kernel client, but the HEALTH_WARN status is still the one above.
> >
> > In the MDS log I see tons of the following messages:
> >
> > [...]
> > 2019-07-16 16:08:17.770 7f727fd2e700  0 log_channel(cluster) log [WRN] :
> > slow request 1920.184123 seconds old, received at 2019-07-16
> > 15:36:17.586647: client_request(client.3902814:84 getattr pAsLsXsFs
> > #0x10001daa8ad 2019-07-16 15:36:17.585355 caller_uid=40059,
> > caller_gid=50000{}) currently failed to rdlock, waiting
> > 2019-07-16 16:08:19.069 7f7282533700  1 mds.cephmds-01 Updating MDS map
> > to version 12642 from mon.0
> > 2019-07-16 16:08:22.769 7f727fd2e700  0 log_channel(cluster) log [WRN] :
> > 5 slow requests, 0 included below; oldest blocked for > 49539.644840 secs
> > 2019-07-16 16:08:26.683 7f7282533700  1 mds.cephmds-01 Updating MDS map
> > to version 12643 from mon.0
> > [...]
> >
> > How can I get back to normal?
> >
> > I'd be grateful for any help
>
>
> After I restarted the 3 MDS daemons, the blocked client requests went
> away, but the slow metadata IOs warning is still there:
>
>
> # ceph health detail
> HEALTH_WARN 1 MDSs report slow metadata IOs
> MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
>     mdscephmds-01(mds.0): 2 slow metadata IOs are blocked > 30 secs,
> oldest blocked for 563 secs
>
> The MDS log now shows these messages every ~5 seconds:
> [...]
> 2019-07-16 17:31:20.456 7f38947a2700  1 mds.cephmds-01 Updating MDS map
> to version 13638 from mon.2
> 2019-07-16 17:31:24.529 7f38947a2700  1 mds.cephmds-01 Updating MDS map
> to version 13639 from mon.2
> 2019-07-16 17:31:28.560 7f38947a2700  1 mds.cephmds-01 Updating MDS map
> to version 13640 from mon.2
> [...]
>
> What does this tell me? Can I do something about it?
> For now I stopped all IO.
>
> Best
>   Dietmar
>
>
>
>
> --
> _________________________________________
> D i e t m a r  R i e d e r, Mag.Dr.
> Innsbruck Medical University
> Biocenter - Division for Bioinformatics
> Innrain 80, 6020 Innsbruck
> Phone: +43 512 9003 71402
> Fax: +43 512 9003 73100
> Email: dietmar.rieder@xxxxxxxxxxx
> Web:   http://www.icbi.at
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


