Dear All,

A bad disk controller appears to have damaged our cluster...

# ceph health
HEALTH_ERR 10 scrub errors; Possible data damage: 10 pgs inconsistent

Probing to find the bad pgs...

# ceph health detail
HEALTH_ERR 10 scrub errors; Possible data damage: 10 pgs inconsistent
OSD_SCRUB_ERRORS 10 scrub errors
PG_DAMAGED Possible data damage: 10 pgs inconsistent
    pg 4.1de is active+clean+inconsistent, acting [333,367,315,36,241,280,200,439,182,121]
(SNIP... the next 9 bad pgs are listed similarly to the one above)

Now looking for further detail...

[root@ceph1 ~]# rados list-inconsistent-obj 4.1de
No scrub information available for pg 4.1de
error 2: (2) No such file or directory

Presumably we need to initiate a manual scrub...?

# ceph pg scrub 4.1de
instructing pg 4.1des0 on osd.333 to scrub

The current date/time is...

# date +"%F %T"
2018-06-21 09:57:27

Now look at the osd log...

# tail -2 ceph-osd.333.log
2018-06-21 07:27:56.253 7f39a4423700  0 log_channel(cluster) log [DBG] : 5.d27 deep-scrub starts
2018-06-21 07:27:56.331 7f39a4423700  0 log_channel(cluster) log [DBG] : 5.d27 deep-scrub ok

Note the date stamps above: the most recent scrub activity in the log (on a
different pg, 5.d27) predates our scrub command by two and a half hours, so
the command appears to have been ignored.

Any ideas on why this is happening, and what we can do to fix the error?

Some background:
Cluster recently upgraded from Luminous (12.2.5) to Mimic (13.2.0)
Pool uses EC 8+2; 10 nodes with 450 x 8TB Bluestore OSDs in total

Any ideas gratefully received...

Jake
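P.S. Our current working theory (unconfirmed) is that the scrub request was
queued rather than dropped: osd_max_scrubs defaults to 1, and a scrub of this
EC 8+2 pg needs scrub reservations on all ten acting OSDs, so any other scrub
in flight on any of them will delay ours. A rough checklist we intend to work
through, using only stock ceph CLI commands (osd.333 and pg 4.1de taken from
the output above):

First, confirm scrubbing isn't disabled cluster-wide:

# ceph osd dump | grep flags

If noscrub/nodeep-scrub are not set, check the per-OSD scrub limit on the
primary (run on the node hosting osd.333, via its admin socket):

# ceph daemon osd.333 config get osd_max_scrubs

Then watch the log for our specific pg instead of tailing the last two lines:

# grep 4.1de /var/log/ceph/ceph-osd.333.log

And since list-inconsistent-obj complained about missing scrub information,
a deep scrub should regenerate it:

# ceph pg deep-scrub 4.1de
# rados list-inconsistent-obj 4.1de --format=json-pretty

Once we can see which shards are inconsistent, "ceph pg repair 4.1de" looks
like the next step, but we'd want to understand the damage before running it.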