incomplete pg, and some mess

Linux Chips <linux.chips@xxxxxxxxx> · Tue, 22 Dec 2015 01:19:20 +0300



    Hi everyone,

    We somehow got our cluster really messed up. We had one node down
    due to a failing system disk. While we were working to bring it back,
    a few OSDs started crashing and kept crashing, so we stopped them.
    That would be a story for another thread, though.

    Now we have a few unfound objects, which we are OK with losing. But
    we also have two incomplete PGs.

    One is on a pool with size 2, the other on an erasure-coded pool
    (12+4). Both are stuck. I am OK with losing the data, but I cannot
    figure out how to get rid of them. All requests to objects inside
    them are blocked, driving the whole cluster to a halt.

    Thanks

    
    # ceph health detail  | grep incomplete

    HEALTH_WARN 679 pgs backfill; 2 pgs backfilling; 3141 pgs degraded;
    2 pgs incomplete; 2488 pgs recovery_wait; 3141 pgs stuck degraded; 2
    pgs stuck inactive; 3171 pgs stuck unclean; 1226 pgs stuck
    undersized; 1226 pgs undersized; 103 requests are blocked > 32
    sec; 2 osds have slow requests; recovery 15237140/686754017 objects
    degraded (2.219%); recovery 23314256/686754017 objects misplaced
    (3.395%); recovery 79/102730138 unfound (0.000%);
    noout,noscrub,nodeep-scrub flag(s) set

    pg 19.e8d is stuck inactive since forever, current state incomplete,
    last acting [94,78]

    pg 108.176 is stuck inactive for 35922.975233, current state
    remapped+incomplete, last acting
    [2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,8,231,176,299,175,118,277,293,100,44]

    pg 108.176 is stuck unclean for 69383.394860, current state
    remapped+incomplete, last acting
    [2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,8,231,176,299,175,118,277,293,100,44]

    pg 19.e8d is stuck unclean since forever, current state incomplete,
    last acting [94,78]

    pg 19.e8d is incomplete, acting [94,78]

    pg 108.176 is remapped+incomplete, acting
    [2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,8,231,176,299,175,118,277,293,100,44]
    (reducing pool .rgw.buckets.erasure min_size from 12 may help;
    search ceph.com/docs for 'incomplete')

    
    # ceph health detail  | grep unfound

    HEALTH_WARN 679 pgs backfill; 2 pgs backfilling; 3141 pgs degraded;
    2 pgs incomplete; 2488 pgs recovery_wait; 3141 pgs stuck degraded; 2
    pgs stuck inactive; 3171 pgs stuck unclean; 1226 pgs stuck
    undersized; 1226 pgs undersized; 103 requests are blocked > 32
    sec; 2 osds have slow requests; recovery 15240378/686758679 objects
    degraded (2.219%); recovery 23313887/686758679 objects misplaced
    (3.395%); recovery 79/102730392 unfound (0.000%);
    noout,noscrub,nodeep-scrub flag(s) set

    pg 4.5d3 is active+recovery_wait+undersized+degraded+remapped,
    acting [208], 15 unfound

    pg 19.5c4 is active+recovery_wait+undersized+degraded+remapped,
    acting [208], 15 unfound

    pg 4.4a7 is active+recovery_wait+undersized+degraded+remapped,
    acting [201], 13 unfound

    pg 19.498 is active+recovery_wait+undersized+degraded+remapped,
    acting [201], 13 unfound

    pg 4.1d0 is active+recovery_wait+undersized+degraded+remapped,
    acting [208], 13 unfound

    pg 19.1c1 is active+recovery_wait+undersized+degraded+remapped,
    acting [208], 10 unfound

    recovery 79/102730392 unfound (0.000%)
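
As a sanity check on the numbers above, the per-PG unfound counts can be summed and compared with the 79 in the recovery line. This is a minimal sketch that parses the pasted text only; it assumes the wrapped lines have been rejoined so that each PG is on one line, and `parse_unfound` is a name made up for this example.

```python
import re
from collections import defaultdict

# Lines copied from the output above, with wrapped lines rejoined.
SAMPLE = """\
pg 4.5d3 is active+recovery_wait+undersized+degraded+remapped, acting [208], 15 unfound
pg 19.5c4 is active+recovery_wait+undersized+degraded+remapped, acting [208], 15 unfound
pg 4.4a7 is active+recovery_wait+undersized+degraded+remapped, acting [201], 13 unfound
pg 19.498 is active+recovery_wait+undersized+degraded+remapped, acting [201], 13 unfound
pg 4.1d0 is active+recovery_wait+undersized+degraded+remapped, acting [208], 13 unfound
pg 19.1c1 is active+recovery_wait+undersized+degraded+remapped, acting [208], 10 unfound
"""

LINE_RE = re.compile(
    r"^pg (\S+) is [a-z_+]+, acting \[(\d+(?:,\d+)*)\], (\d+) unfound$"
)


def parse_unfound(text):
    """Return a list of (pgid, acting_osds, unfound_count) tuples."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            pgid, acting, count = match.groups()
            rows.append((pgid, [int(o) for o in acting.split(",")], int(count)))
    return rows


rows = parse_unfound(SAMPLE)
total = sum(count for _, _, count in rows)

# Group the unfound counts by the first OSD in each acting set.
per_osd = defaultdict(int)
for _, acting, count in rows:
    per_osd[acting[0]] += count

print(total)          # 79
print(dict(per_osd))  # {208: 53, 201: 26}
```

The total matches the 79 unfound objects reported, and all six affected PGs are currently served by a single OSD each (208 or 201).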

    