There should actually be "[ERR]" messages in the OSD logs some time after the "deep-scrub starts" message. Can we see those, and a pg query for one of the affected pgs?
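Something like the following should gather them (a sketch only; the log path is the Hammer default and may differ on your systems, and pg 1.356 here stands in for any of the inconsistent pgs from your health detail below):

    # pull the scrub errors out of the current OSD logs
    grep '\[ERR\]' /var/log/ceph/ceph-osd.*.log

    # query one of the inconsistent pgs
    ceph pg 1.356 query

If the errors have already rotated out of the logs, a fresh deep scrub should log them again, though the nodeep-scrub flag you have set may prevent it from running until unset:

    ceph osd unset nodeep-scrub
    ceph pg deep-scrub 1.356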
--
Cheers,
Brad

On Sat, Sep 3, 2016 at 9:52 PM, Ronny Aasen <ronny+ceph-users@xxxxxxxx> wrote:
> Hello
>
> I am running ceph hammer on debian jessie, using 6 old, underwhelming
> servers.
>
> The cluster is an in-migration bastard mix of 3TB SATA drives with
> on-disk journal partitions, being migrated to 5-disk RAID5 MD arrays
> with SSD journals, for RAM limitation reasons. There are about 18 RAID5
> sets at the moment; the rest are 3TB spinners.
>
> I have some challenges with scrub errors that I am trying to sort out
> using the method at http://ceph.com/planet/ceph-manually-repair-object/,
> but they are quite stubborn/sticky.
>
> I do see that osd.8 often appears in these inconsistencies, but the
> broken objects are not always on osd.8 itself.
>
> In the instructions at http://ceph.com/planet/ceph-manually-repair-object/,
> one finds the object name by grepping the logs, but some of these errors
> have been here a while. How can I identify the broken object if the log
> file has been rotated away?
>
> In the end I move the broken object (size 0) out of the way and run pg
> repair, but the error is not removed. Does the pg need to scrub after
> the repair for it to clear the error?
>
> Any advice is appreciated.
>
> kind regards
> Ronny Aasen
>
>
> # ceph -s
>     cluster 3c229f54-bd12-4b4e-a143-1ec73dd0f12a
>      health HEALTH_ERR
>             3 pgs degraded
>             9 pgs inconsistent
>             3 pgs recovering
>             3 pgs stuck degraded
>             3 pgs stuck unclean
>             recovery 88/125583766 objects degraded (0.000%)
>             recovery 666778/125583766 objects misplaced (0.531%)
>             recovery 88/45311043 unfound (0.000%)
>             9 scrub errors
>             noout,noscrub,nodeep-scrub flag(s) set
>      monmap e1: 3 mons at {mon1=10.24.11.11:6789/0,mon2=10.24.11.12:6789/0,mon3=10.24.11.13:6789/0}
>             election epoch 60, quorum 0,1,2 mon1,mon2,mon3
>      osdmap e105977: 92 osds: 92 up, 92 in; 2 remapped pgs
>             flags noout,noscrub,nodeep-scrub
>       pgmap v12896186: 4608 pgs, 3 pools, 117 TB data, 44249 kobjects
>             308 TB used, 107 TB / 416 TB avail
>             88/125583766 objects degraded (0.000%)
>             666778/125583766 objects misplaced (0.531%)
>             88/45311043 unfound (0.000%)
>                 4593 active+clean
>                    9 active+clean+inconsistent
>                    3 active+clean+scrubbing
>                    2 active+recovering+degraded+remapped
>                    1 active+recovering+degraded
>   client io 4572 kB/s rd, 1141 op/s
>
> # ceph health detail
> HEALTH_ERR 3 pgs degraded; 9 pgs inconsistent; 3 pgs recovering;
> 3 pgs stuck degraded; 3 pgs stuck unclean; recovery 88/125583766 objects
> degraded (0.000%); recovery 666778/125583766 objects misplaced (0.531%);
> recovery 88/45311043 unfound (0.000%); 9 scrub errors;
> noout,noscrub,nodeep-scrub flag(s) set
> pg 6.d4 is stuck unclean for 3770820.461291, current state active+recovering+degraded+remapped, last acting [62,8]
> pg 6.da is stuck unclean for 2420102.778679, current state active+recovering+degraded, last acting [6,110]
> pg 6.ab is stuck unclean for 3774233.330685, current state active+recovering+degraded+remapped, last acting [12,8]
> pg 6.d4 is stuck degraded for 304239.715211, current state active+recovering+degraded+remapped, last acting [62,8]
> pg 6.da is stuck degraded for 416210.309539, current state active+recovering+degraded, last acting [6,110]
> pg 6.ab is stuck degraded for 304239.779541, current state active+recovering+degraded+remapped, last acting [12,8]
> pg 1.356 is active+clean+inconsistent, acting [8,84,39]
> pg 1.1a7 is active+clean+inconsistent, acting [8,36,34]
> pg 1.11e is active+clean+inconsistent, acting [8,12,6]
> pg 6.da is active+recovering+degraded, acting [6,110], 25 unfound
> pg 6.d4 is active+recovering+degraded+remapped, acting [62,8], 25 unfound
> pg 6.ab is active+recovering+degraded+remapped, acting [12,8], 38 unfound
> pg 1.de4 is active+clean+inconsistent, acting [41,8,108]
> pg 1.c90 is active+clean+inconsistent, acting [12,71,8]
> pg 1.ae6 is active+clean+inconsistent, acting [8,36,49]
> pg 1.8bc is active+clean+inconsistent, acting [59,8,107]
> pg 1.806 is active+clean+inconsistent, acting [60,3,106]
> pg 1.675 is active+clean+inconsistent, acting [37,106,62]
> recovery 88/125583766 objects degraded (0.000%)
> recovery 666778/125583766 objects misplaced (0.531%)
> recovery 88/45311043 unfound (0.000%)
> 9 scrub errors
> noout,noscrub,nodeep-scrub flag(s) set
>
> NB: The 88 unfound objects are in a pool I experimented with at size 2,
> so they are not important in this context.
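On the repair question: "ceph pg repair" is implemented as a scrub that fixes what it finds, so the inconsistent flag only clears once that scrub completes. With noscrub/nodeep-scrub set, it may never get to run. A rough sequence following the blog post's method (the init command and filestore path below are assumptions for a Hammer/jessie box, and <object> is a placeholder for the name you find in the log):

    # let scrubs run again
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub

    # on the node holding the bad copy (osd.8 in this example):
    service ceph stop osd.8                  # or your init system's equivalent
    ceph-osd -i 8 --flush-journal            # flush the journal before touching the store
    mv /var/lib/ceph/osd/ceph-8/current/1.356_head/<object> ~/backup/
    service ceph start osd.8

    # repair re-scrubs the pg and should clear the error on success
    ceph pg repair 1.356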