Luminous won't fully recover


 



We recently had a few Ceph nodes go offline, which required a reboot.  I have been able to get the cluster back to the state listed below; however, it does not seem like it will progress past the point of 23473/287823588 objects misplaced.



Yesterday about 13% of the data was misplaced; this morning it has gotten down to 0.008%, but it has not moved past that point in about an hour.



Does anyone see anything in the output below that points to the problem, and/or does anyone have suggestions I can follow in order to figure out why the cluster health is not moving beyond this point?
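
For reference, these are the sorts of commands I assume would be the next step for digging into the stuck PGs (pg 3.1521 is one of the activating+remapped PGs from the health detail below; the other commands just summarize what is still inactive or remapped and how full the OSDs are):

root@rbd1:~# ceph pg dump_stuck inactive
root@rbd1:~# ceph pg ls remapped
root@rbd1:~# ceph pg 3.1521 query
root@rbd1:~# ceph osd df tree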





---------------------------------------------------

root@rbd1:~# ceph -s

cluster:

    id:     504b5794-34bd-44e7-a8c3-0494cf800c23

    health: HEALTH_ERR

            crush map has legacy tunables (require argonaut, min is firefly)

            23473/287823588 objects misplaced (0.008%)

            14 scrub errors

            Reduced data availability: 2 pgs inactive

            Possible data damage: 8 pgs inconsistent



  services:

    mon: 3 daemons, quorum hqceph1,hqceph2,hqceph3

    mgr: hqceph2(active), standbys: hqceph3

    osd: 288 osds: 270 up, 270 in; 2 remapped pgs

    rgw: 1 daemon active



  data:

    pools:   17 pools, 9411 pgs

    objects: 95.95M objects, 309TiB

    usage:   936TiB used, 627TiB / 1.53PiB avail

    pgs:     0.021% pgs not active

             23473/287823588 objects misplaced (0.008%)

             9369 active+clean

             30   active+clean+scrubbing+deep

             8    active+clean+inconsistent

             2    activating+remapped

             2    active+clean+scrubbing



  io:

    client:   1000B/s rd, 0B/s wr, 0op/s rd, 0op/s wr



root@rbd1:~# ceph health detail

HEALTH_ERR crush map has legacy tunables (require argonaut, min is firefly); 1 osds down; 23473/287823588 objects misplaced (0.008%); 14 scrub errors; Reduced data availability: 3 pgs inactive, 13 pgs peering; Possible data damage: 8 pgs inconsistent; Degraded data redundancy: 408658/287823588 objects degraded (0.142%), 38 pgs degraded

OLD_CRUSH_TUNABLES crush map has legacy tunables (require argonaut, min is firefly)

    see http://docs.ceph.com/docs/master/rados/operations/crush-map/#tunables

OSD_DOWN 1 osds down

    osd.95 (root=default,host=hqosd8) is down

OBJECT_MISPLACED 23473/287823588 objects misplaced (0.008%)

OSD_SCRUB_ERRORS 14 scrub errors

PG_AVAILABILITY Reduced data availability: 3 pgs inactive, 13 pgs peering

    pg 3.b41 is stuck peering for 106.682058, current state peering, last acting [204,190]

    pg 3.c33 is stuck peering for 103.403643, current state peering, last acting [228,274]

    pg 3.d15 is stuck peering for 128.537454, current state peering, last acting [286,24]

    pg 3.fa9 is stuck peering for 106.526146, current state peering, last acting [286,47]

    pg 3.fb7 is stuck peering for 105.878878, current state peering, last acting [62,97]

    pg 3.13a2 is stuck peering for 106.491138, current state peering, last acting [270,219]

    pg 3.1521 is stuck inactive for 170180.165265, current state activating+remapped, last acting [94,186,188]

    pg 3.1565 is stuck peering for 106.782784, current state peering, last acting [121,60]

    pg 3.157c is stuck peering for 128.557448, current state peering, last acting [128,268]

    pg 3.1744 is stuck peering for 106.639603, current state peering, last acting [192,142]

    pg 3.1ac8 is stuck peering for 127.839550, current state peering, last acting [221,190]

    pg 3.1e24 is stuck peering for 128.201670, current state peering, last acting [118,158]

    pg 3.1e46 is stuck inactive for 169121.764376, current state activating+remapped, last acting [87,199,170]

    pg 18.36 is stuck peering for 128.554121, current state peering, last acting [204]

    pg 21.1ce is stuck peering for 106.582584, current state peering, last acting [266,192]

PG_DAMAGED Possible data damage: 8 pgs inconsistent

    pg 3.1ca is active+clean+inconsistent, acting [201,8,180]

    pg 3.56a is active+clean+inconsistent, acting [148,240,8]

    pg 3.b0f is active+clean+inconsistent, acting [148,260,8]

    pg 3.b56 is active+clean+inconsistent, acting [218,8,240]

    pg 3.10ff is active+clean+inconsistent, acting [262,8,211]

    pg 3.1192 is active+clean+inconsistent, acting [192,8,187]

    pg 3.124a is active+clean+inconsistent, acting [123,8,222]

    pg 3.1c55 is active+clean+inconsistent, acting [180,8,287]

PG_DEGRADED Degraded data redundancy: 408658/287823588 objects degraded (0.142%), 38 pgs degraded

    pg 3.8f is active+undersized+degraded, acting [163,149]

    pg 3.ba is active+undersized+degraded, acting [68,280]

    pg 3.1aa is active+undersized+degraded, acting [176,211]

    pg 3.29e is active+undersized+degraded, acting [241,194]

    pg 3.323 is active+undersized+degraded, acting [78,194]

    pg 3.343 is active+undersized+degraded, acting [242,144]

    pg 3.4ae is active+undersized+degraded, acting [153,237]

    pg 3.524 is active+undersized+degraded, acting [252,222]

    pg 3.5c9 is active+undersized+degraded, acting [272,252]

    pg 3.713 is active+undersized+degraded, acting [273,80]

    pg 3.730 is active+undersized+degraded, acting [235,212]

    pg 3.88f is active+undersized+degraded, acting [222,285]

    pg 3.8cb is active+undersized+degraded, acting [285,20]

    pg 3.9a0 is active+undersized+degraded, acting [240,200]

    pg 3.c19 is active+undersized+degraded, acting [165,276]

    pg 3.ec8 is active+undersized+degraded, acting [158,40]

    pg 3.1025 is active+undersized+degraded, acting [258,274]

    pg 3.1058 is active+undersized+degraded, acting [38,68]

    pg 3.14e4 is active+undersized+degraded, acting [185,39]

    pg 3.150c is active+undersized+degraded, acting [138,140]

    pg 3.1545 is active+undersized+degraded, acting [222,55]

    pg 3.15a6 is active+undersized+degraded, acting [242,272]

    pg 3.1620 is active+undersized+degraded, acting [200,164]

    pg 3.1710 is active+undersized+degraded, acting [176,285]

    pg 3.1792 is active+undersized+degraded, acting [190,11]

    pg 3.17bd is active+undersized+degraded, acting [207,15]

    pg 3.17da is active+undersized+degraded, acting [5,160]

    pg 3.183e is active+undersized+degraded, acting [273,136]

    pg 3.197d is active+undersized+degraded, acting [241,139]

    pg 3.1a3d is active+undersized+degraded, acting [184,121]

    pg 3.1ba6 is active+undersized+degraded, acting [47,249]

    pg 3.1c2b is active+undersized+degraded, acting [268,80]

    pg 3.1ca2 is active+undersized+degraded, acting [280,152]

    pg 3.1cd4 is active+undersized+degraded, acting [2,129]

    pg 3.1e13 is active+undersized+degraded, acting [247,114]

    pg 12.56 is active+undersized+degraded, acting [54]

    pg 18.8 is undersized+degraded+peered, acting [260]

    pg 21.9f is active+undersized+degraded, acting [215,201]
--------------------------------------------------------------------------------------------------
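
As a side note: for the 8 inconsistent PGs, I assume the usual scrub-error procedure applies once the peering/activation issue is sorted out, i.e. something along these lines (pg 3.1ca is one of the inconsistent PGs listed above), but I am open to suggestions there as well:

root@rbd1:~# rados list-inconsistent-obj 3.1ca --format=json-pretty
root@rbd1:~# ceph pg repair 3.1ca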


Thanks,
Shain

Shain Miley | Director of Platform and Infrastructure | Digital Media | smiley@xxxxxxx





