Check if there is any hung request in 'ceph daemon mds.xxx objecter_requests'.

On Tue, Jul 16, 2019 at 11:51 PM Dietmar Rieder <dietmar.rieder@xxxxxxxxxxx> wrote:
>
> On 7/16/19 4:11 PM, Dietmar Rieder wrote:
> > Hi,
> >
> > We are running ceph version 14.1.2 with cephfs only.
> >
> > I just noticed that one of our PGs had scrub errors, which I could repair:
> >
> > # ceph health detail
> > HEALTH_ERR 1 MDSs report slow metadata IOs; 1 MDSs report slow requests;
> > 1 scrub errors; Possible data damage: 1 pg inconsistent
> > MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
> >     mdscephmds-01(mds.0): 3 slow metadata IOs are blocked > 30 secs,
> > oldest blocked for 47743 secs
> > MDS_SLOW_REQUEST 1 MDSs report slow requests
> >     mdscephmds-01(mds.0): 2 slow requests are blocked > 30 secs
> > OSD_SCRUB_ERRORS 1 scrub errors
> > PG_DAMAGED Possible data damage: 1 pg inconsistent
> >     pg 6.e0b is active+clean+inconsistent, acting
> > [194,23,116,183,149,82,42,132,26]
> >
> > Apparently I was able to repair the pg:
> >
> > # rados list-inconsistent-pg hdd-ec-data-pool
> > ["6.e0b"]
> >
> > # ceph pg repair 6.e0b
> > instructing pg 6.e0bs0 on osd.194 to repair
> >
> > [...]
> > 2019-07-16 15:07:13.700 7f851d720700 0 log_channel(cluster) log [DBG] :
> > 6.e0b repair starts
> > 2019-07-16 15:10:23.852 7f851d720700 0 log_channel(cluster) log [DBG] :
> > 6.e0b repair ok, 0 fixed
> > [...]
> >
> > However, I still have HEALTH_WARN due to slow metadata IOs:
> >
> > # ceph health detail
> > HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow requests
> > MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
> >     mdscephmds-01(mds.0): 3 slow metadata IOs are blocked > 30 secs,
> > oldest blocked for 51123 secs
> > MDS_SLOW_REQUEST 1 MDSs report slow requests
> >     mdscephmds-01(mds.0): 5 slow requests are blocked > 30 secs
> >
> > I already rebooted all my client machines that access the cephfs via the
> > kernel client, but the HEALTH_WARN status is still the one above.
> >
> > In the MDS log I see tons of the following messages:
> >
> > [...]
> > 2019-07-16 16:08:17.770 7f727fd2e700 0 log_channel(cluster) log [WRN] :
> > slow request 1920.184123 seconds old, received at 2019-07-16
> > 15:36:17.586647: client_request(client.3902814:84 getattr pAsLsXsFs
> > #0x10001daa8ad 2019-07-16 15:36:17.585355 caller_uid=40059,
> > caller_gid=50000{}) currently failed to rdlock, waiting
> > 2019-07-16 16:08:19.069 7f7282533700 1 mds.cephmds-01 Updating MDS map
> > to version 12642 from mon.0
> > 2019-07-16 16:08:22.769 7f727fd2e700 0 log_channel(cluster) log [WRN] :
> > 5 slow requests, 0 included below; oldest blocked for > 49539.644840 secs
> > 2019-07-16 16:08:26.683 7f7282533700 1 mds.cephmds-01 Updating MDS map
> > to version 12643 from mon.0
> > [...]
> >
> > How can I get back to normal?
> >
> > I'd be grateful for any help.
>
> After I restarted the 3 MDS daemons I got rid of the blocked client
> requests, but there is still the slow metadata IOs warning:
>
> # ceph health detail
> HEALTH_WARN 1 MDSs report slow metadata IOs
> MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
>     mdscephmds-01(mds.0): 2 slow metadata IOs are blocked > 30 secs,
> oldest blocked for 563 secs
>
> The MDS log now has these messages every ~5 seconds:
> [...]
> 2019-07-16 17:31:20.456 7f38947a2700 1 mds.cephmds-01 Updating MDS map
> to version 13638 from mon.2
> 2019-07-16 17:31:24.529 7f38947a2700 1 mds.cephmds-01 Updating MDS map
> to version 13639 from mon.2
> 2019-07-16 17:31:28.560 7f38947a2700 1 mds.cephmds-01 Updating MDS map
> to version 13640 from mon.2
> [...]
>
> What does this tell me? Can I do something about it?
> For now I have stopped all IO.
>
> Best
> Dietmar
>
> --
> _________________________________________
> D i e t m a r  R i e d e r, Mag.Dr.
> Innsbruck Medical University
> Biocenter - Division for Bioinformatics
> Innrain 80, 6020 Innsbruck
> Phone: +43 512 9003 71402
> Fax: +43 512 9003 73100
> Email: dietmar.rieder@xxxxxxxxxxx
> Web: http://www.icbi.at
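To expand on the suggestion at the top, here is a rough sketch of what I mean (I'm assuming the active MDS is the mds.cephmds-01 that shows up in your health output, and that you run the commands on the host where that daemon runs):

# ceph daemon mds.cephmds-01 objecter_requests
# ceph daemon mds.cephmds-01 dump_ops_in_flight

objecter_requests lists the RADOS requests the MDS itself has outstanding against the OSDs; dump_ops_in_flight (if your release exposes it on the MDS admin socket) lists the ops the MDS is still processing. If objecter_requests shows requests that never complete, note which osd.N they target and check that OSD with 'ceph daemon osd.N dump_ops_in_flight' on its host, since a slow metadata IO warning generally means the MDS is waiting on the metadata pool OSDs.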