stubborn/sticky scrub errors

hello

I am running Ceph Hammer on Debian Jessie, using 6 old, used, underwhelming servers.

The cluster is an "in-migration" bastard mix of 3TB SATA drives with on-disk journal partitions, being migrated to 5-disk RAID5 MD arrays with SSD journals, for RAM limitation reasons. There are about 18 RAID5 sets at the moment, and the rest are 3TB spinners.

I have some challenges with scrub errors that I am trying to sort out using the method at http://ceph.com/planet/ceph-manually-repair-object/, but they are quite stubborn/sticky.

I do see that osd.8 is often represented in these inconsistencies, but the broken objects are not always on osd.8 itself.
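
To check that impression against the `ceph health detail` output further down, the acting sets of the inconsistent pgs can be tallied per OSD. This is only a minimal Python sketch, assuming the output format shown in this post; the function name `count_osds_in_inconsistent_pgs` and the `HEALTH_DETAIL` constant are made up for illustration and are not part of any Ceph tooling.

```python
import re
from collections import Counter

# Lines copied from the `ceph health detail` output in this post.
HEALTH_DETAIL = """\
pg 1.356 is active+clean+inconsistent, acting [8,84,39]
pg 1.1a7 is active+clean+inconsistent, acting [8,36,34]
pg 1.11e is active+clean+inconsistent, acting [8,12,6]
pg 6.da is active+recovering+degraded, acting [6,110], 25 unfound
pg 6.d4 is active+recovering+degraded+remapped, acting [62,8], 25 unfound
pg 6.ab is active+recovering+degraded+remapped, acting [12,8], 38 unfound
pg 1.de4 is active+clean+inconsistent, acting [41,8,108]
pg 1.c90 is active+clean+inconsistent, acting [12,71,8]
pg 1.ae6 is active+clean+inconsistent, acting [8,36,49]
pg 1.8bc is active+clean+inconsistent, acting [59,8,107]
pg 1.806 is active+clean+inconsistent, acting [60,3,106]
pg 1.675 is active+clean+inconsistent, acting [37,106,62]
"""

# Matches only pgs whose state string contains "inconsistent" and
# captures the pg id and the comma-separated acting set.
PG_LINE = re.compile(
    r"^pg (\S+) is [\w+]*inconsistent[\w+]*, acting \[([\d,]+)\]",
    re.MULTILINE,
)


def count_osds_in_inconsistent_pgs(text):
    """Return a Counter mapping OSD id -> number of inconsistent pgs it serves."""
    counts = Counter()
    for _pgid, acting in PG_LINE.findall(text):
        counts.update(int(osd) for osd in acting.split(","))
    return counts


if __name__ == "__main__":
    counts = count_osds_in_inconsistent_pgs(HEALTH_DETAIL)
    for osd, n in counts.most_common(4):
        print(f"osd.{osd}: {n} inconsistent pgs")
```

On the output pasted here, osd.8 is in the acting set of 7 of the 9 inconsistent pgs, while no other OSD appears more than twice. That only shows which OSDs share the affected pgs; it does not say which replica holds the bad copy.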

In the instructions at http://ceph.com/planet/ceph-manually-repair-object/, one finds the object name by grepping the logs. But some of these errors have been here a while, so how can I identify the broken object if the log file has been rotated away?

In the end I move away the broken object with size 0 and run pg repair, but the error is not removed.
Does the pg need to be scrubbed again after the repair for the error to clear?

Any advice is appreciated.

kind regards
Ronny Aasen


# ceph -s
    cluster 3c229f54-bd12-4b4e-a143-1ec73dd0f12a
     health HEALTH_ERR
            3 pgs degraded
            9 pgs inconsistent
            3 pgs recovering
            3 pgs stuck degraded
            3 pgs stuck unclean
            recovery 88/125583766 objects degraded (0.000%)
            recovery 666778/125583766 objects misplaced (0.531%)
            recovery 88/45311043 unfound (0.000%)
            9 scrub errors
            noout,noscrub,nodeep-scrub flag(s) set
     monmap e1: 3 mons at {mon1=10.24.11.11:6789/0,mon2=10.24.11.12:6789/0,mon3=10.24.11.13:6789/0}
            election epoch 60, quorum 0,1,2 mon1,mon2,mon3
     osdmap e105977: 92 osds: 92 up, 92 in; 2 remapped pgs
            flags noout,noscrub,nodeep-scrub
      pgmap v12896186: 4608 pgs, 3 pools, 117 TB data, 44249 kobjects
            308 TB used, 107 TB / 416 TB avail
            88/125583766 objects degraded (0.000%)
            666778/125583766 objects misplaced (0.531%)
            88/45311043 unfound (0.000%)
                4593 active+clean
                   9 active+clean+inconsistent
                   3 active+clean+scrubbing
                   2 active+recovering+degraded+remapped
                   1 active+recovering+degraded
  client io 4572 kB/s rd, 1141 op/s

# ceph health detail
HEALTH_ERR 3 pgs degraded; 9 pgs inconsistent; 3 pgs recovering; 3 pgs stuck degraded; 3 pgs stuck unclean; recovery 88/125583766 objects degraded (0.000%); recovery 666778/125583766 objects misplaced (0.531%); recovery 88/45311043 unfound (0.000%); 9 scrub errors; noout,noscrub,nodeep-scrub flag(s) set
pg 6.d4 is stuck unclean for 3770820.461291, current state active+recovering+degraded+remapped, last acting [62,8]
pg 6.da is stuck unclean for 2420102.778679, current state active+recovering+degraded, last acting [6,110]
pg 6.ab is stuck unclean for 3774233.330685, current state active+recovering+degraded+remapped, last acting [12,8]
pg 6.d4 is stuck degraded for 304239.715211, current state active+recovering+degraded+remapped, last acting [62,8]
pg 6.da is stuck degraded for 416210.309539, current state active+recovering+degraded, last acting [6,110]
pg 6.ab is stuck degraded for 304239.779541, current state active+recovering+degraded+remapped, last acting [12,8]
pg 1.356 is active+clean+inconsistent, acting [8,84,39]
pg 1.1a7 is active+clean+inconsistent, acting [8,36,34]
pg 1.11e is active+clean+inconsistent, acting [8,12,6]
pg 6.da is active+recovering+degraded, acting [6,110], 25 unfound
pg 6.d4 is active+recovering+degraded+remapped, acting [62,8], 25 unfound
pg 6.ab is active+recovering+degraded+remapped, acting [12,8], 38 unfound
pg 1.de4 is active+clean+inconsistent, acting [41,8,108]
pg 1.c90 is active+clean+inconsistent, acting [12,71,8]
pg 1.ae6 is active+clean+inconsistent, acting [8,36,49]
pg 1.8bc is active+clean+inconsistent, acting [59,8,107]
pg 1.806 is active+clean+inconsistent, acting [60,3,106]
pg 1.675 is active+clean+inconsistent, acting [37,106,62]
recovery 88/125583766 objects degraded (0.000%)
recovery 666778/125583766 objects misplaced (0.531%)
recovery 88/45311043 unfound (0.000%)
9 scrub errors
noout,noscrub,nodeep-scrub flag(s) set

NB: the 88 unfound objects are in a pool where I experimented with size 2, so they are not important in this context.


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
