Hi Sean,

Many thanks for the suggestion, but unfortunately deep-scrub also
appears to be ignored:

# ceph pg deep-scrub 4.ff
instructing pg 4.ffs0 on osd.318 to deep-scrub

'tail -f ceph-osd.318.log' shows no new entries.

To get more info, I set debug level 10 on the OSD and issued another
repair command:

# ceph daemon osd.318 config set debug_osd 10
# ceph pg repair 4.ff
instructing pg 4.ffs0 on osd.318 to repair

Tailing the OSD log showed what might be an appropriate response:

2018-07-04 13:54:44.181 7faaaeaa8700 10 osd.318 pg_epoch: 180138 pg[4.ffs0( v 180138'5043225 (180078'5040201,180138'5043225] local-lis/les=179843/179844 n=124423 ec=735/735 lis/c 179843/179843 les/c/f 179844/180011/0 179841/179843/174426) [318,403,150,13,225,261,382,175,282,324]p318(0) r=0 lpr=179843 crt=180138'5043225 lcod 180138'5043224 mlcod 180138'5043224 active+clean+inconsistent MUST_REPAIR MUST_DEEP_SCRUB MUST_SCRUB ps=926] state<Started/Primary>: marking for scrub

However, the scrub still doesn't start...

# ceph pg 4.ff query

shows

.....
"last_deep_scrub_stamp": "2018-07-01 18:00:41.769956",
"last_clean_scrub_stamp": "2018-06-27 05:55:13.023760",
"num_scrub_errors": 23,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 23,
"scrub": {
    "scrubber.epoch_start": "178857",
    "scrubber.active": false,
    "scrubber.state": "INACTIVE",
    "scrubber.start": "MIN",
    "scrubber.end": "MIN",
    "scrubber.max_end": "MIN",
    "scrubber.subset_last_update": "0'0",
    "scrubber.deep": false,
    "scrubber.waiting_on_whom": []

Not sure where to go from here :(

Jake

On 04/07/18 01:14, Sean Redmond wrote:
> do a deep-scrub instead of just a scrub
>
> On Tue, 3 Jul 2018, 12:37 Jake Grimmett, <jog@xxxxxxxxxxxxxxxxx> wrote:
>
> Dear All,
>
> Sorry to bump the thread, but I still can't manually repair
> inconsistent pgs on our Mimic cluster (13.2.0, upgraded from 12.2.5)
>
> There are many similarities to an unresolved bug:
>
> http://tracker.ceph.com/issues/15781
>
> To give more examples of the problem:
>
> The following commands appear to run OK, but *nothing* appears in the
> OSD log to indicate that the commands are running. The OSDs are
> otherwise working & logging OK.
>
> # ceph pg scrub 4.e19
> instructing pg 4.e19s0 on osd.246 to scrub
>
> # ceph pg repair 4.e19
> instructing pg 4.e19s0 on osd.246 to repair
>
> # ceph osd scrub 246
> instructed osd(s) 246 to scrub
>
> # ceph osd repair 246
> instructed osd(s) 246 to repair
>
> It does not matter which OSD or pg the repair is initiated on.
>
> This command also fails:
> # rados list-inconsistent-obj 4.e19
> No scrub information available for pg 4.e19
> error 2: (2) No such file or directory
>
> From the OSD logs and 'ceph -s' I can see that the OSDs are still
> doing automatic background pg scrubs, just not the ones I have asked
> them to do; at the time of my request they are not currently scrubbing.
>
> Could it be that my commands are not being sent to the OSDs?
>
> Any idea on how to debug this?
>
> ...
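(Answering my own quoted question above -- these are the checks I'm
planning to try next. Only a sketch, assuming the usual cluster flags
and admin socket options apply here; I haven't run them yet.)

Make sure no cluster-wide noscrub/nodeep-scrub flags are set:

# ceph osd dump | grep flags

Check the scrub-related settings on the acting primary (osd.318 for
pg 4.ff):

# ceph daemon osd.318 config get osd_max_scrubs
# ceph daemon osd.318 config get osd_scrub_during_recovery

Briefly raise messenger debug to see whether the scrub request
actually arrives at the OSD, then turn it back down:

# ceph daemon osd.318 config set debug_ms 1
# ceph pg deep-scrub 4.ff
# ceph daemon osd.318 config set debug_ms 0

If the request shows up in the OSD log with debug_ms 1 but the scrub
still never starts, that should at least rule out a messaging problem.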
>
> Further info:
>
> Output of 'ceph pg 4.e19 query' is here:
> http://p.ip.fi/9x5v
>
> Output of 'ceph daemon osd.246 config show' is here:
> http://p.ip.fi/RAuk
>
> Cluster has 10 nodes, 128GB RAM, dual Xeon
> 450 Bluestore SATA OSD, EC 8:2
> 4 NVME OSD, replicated
> used for cephfs (2.3PB), daily snapshots only
>
> # ceph health detail
> HEALTH_ERR 9500031/5149746146 objects misplaced (0.184%); 80 scrub
> errors; Possible data damage: 7 pgs inconsistent
> OBJECT_MISPLACED 9500031/5149746146 objects misplaced (0.184%)
> OSD_SCRUB_ERRORS 80 scrub errors
> PG_DAMAGED Possible data damage: 7 pgs inconsistent
>     pg 4.ff is active+clean+inconsistent, acting [318,403,150,13,225,261,382,175,282,324]
>     pg 4.2e2 is active+clean+inconsistent, acting [352,59,328,451,195,119,42,66,158,150]
>     pg 4.551 is active+clean+inconsistent, acting [391,105,124,150,205,22,269,184,293,91]
>     pg 4.61c is active+clean+inconsistent, acting [382,131,84,35,282,214,236,366,309,150]
>     pg 4.8cd is active+clean+inconsistent, acting [353,58,5,252,187,183,323,150,387,32]
>     pg 4.a20 is active+clean+inconsistent, acting [346,104,398,282,225,133,150,70,165,17]
>     pg 4.e19 is active+clean+inconsistent, acting [246,447,245,98,170,348,111,155,150,295]
>
> again, thanks for any advice,
>
> Jake
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

--
Dr Jake Grimmett
Head Of Scientific Computing
MRC Laboratory of Molecular Biology
Francis Crick Avenue, Cambridge CB2 0QH, UK.
Phone 01223 267019
Mobile 0776 9886539

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com