Hi,

We are running ceph version 14.1.2 with cephfs only.

I just noticed that one of our pgs had scrub errors, which I could repair:

# ceph health detail
HEALTH_ERR 1 MDSs report slow metadata IOs; 1 MDSs report slow requests; 1 scrub errors; Possible data damage: 1 pg inconsistent
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
    mdscephmds-01(mds.0): 3 slow metadata IOs are blocked > 30 secs, oldest blocked for 47743 secs
MDS_SLOW_REQUEST 1 MDSs report slow requests
    mdscephmds-01(mds.0): 2 slow requests are blocked > 30 secs
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
    pg 6.e0b is active+clean+inconsistent, acting [194,23,116,183,149,82,42,132,26]

Apparently I was able to repair the pg:

# rados list-inconsistent-pg hdd-ec-data-pool
["6.e0b"]

# ceph pg repair 6.e0b
instructing pg 6.e0bs0 on osd.194 to repair

[...]
2019-07-16 15:07:13.700 7f851d720700 0 log_channel(cluster) log [DBG] : 6.e0b repair starts
2019-07-16 15:10:23.852 7f851d720700 0 log_channel(cluster) log [DBG] : 6.e0b repair ok, 0 fixed
[....]

However, I still have HEALTH_WARN due to slow metadata IOs:

# ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow requests
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
    mdscephmds-01(mds.0): 3 slow metadata IOs are blocked > 30 secs, oldest blocked for 51123 secs
MDS_SLOW_REQUEST 1 MDSs report slow requests
    mdscephmds-01(mds.0): 5 slow requests are blocked > 30 secs

I have already rebooted all client machines that access the cephfs via the kernel client, but the HEALTH_WARN status is still the one above.

In the MDS log I see tons of the following messages:

[...]
2019-07-16 16:08:17.770 7f727fd2e700 0 log_channel(cluster) log [WRN] : slow request 1920.184123 seconds old, received at 2019-07-16 15:36:17.586647: client_request(client.3902814:84 getattr pAsLsXsFs #0x10001daa8ad 2019-07-16 15:36:17.585355 caller_uid=40059, caller_gid=50000{}) currently failed to rdlock, waiting
2019-07-16 16:08:19.069 7f7282533700 1 mds.cephmds-01 Updating MDS map to version 12642 from mon.0
2019-07-16 16:08:22.769 7f727fd2e700 0 log_channel(cluster) log [WRN] : 5 slow requests, 0 included below; oldest blocked for > 49539.644840 secs
2019-07-16 16:08:26.683 7f7282533700 1 mds.cephmds-01 Updating MDS map to version 12643 from mon.0
[...]

How can I get back to normal?

I'd be grateful for any help.

Thanks
  Dietmar

--
_________________________________________
D i e t m a r R i e d e r, Mag.Dr.
Innsbruck Medical University
Biocenter - Division for Bioinformatics
Innrain 80, 6020 Innsbruck
Phone: +43 512 9003 71402
Fax: +43 512 9003 73100
Email: dietmar.rieder@xxxxxxxxxxx
Web: http://www.icbi.at
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com