Re: ceph inconsistent pg missing ec object

Okay, after consulting with a colleague this appears to be an instance of http://tracker.ceph.com/issues/21382. Assuming the object is one that doesn't have snapshots, your easiest resolution is to use "rados get" to retrieve the object (which, unlike recovery, should work) and then "rados put" it back into place.
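
Roughly, that workaround looks like the following (the pool name and the temp
file path are placeholders, adjust to your cluster):

  # pull the object out via normal client IO, then write it straight back
  rados -p <ec-pool-name> get 10014d3184b.00000000 /tmp/10014d3184b.00000000
  rados -p <ec-pool-name> put 10014d3184b.00000000 /tmp/10014d3184b.00000000
  # re-run the deep scrub afterwards to confirm the inconsistency clears
  ceph pg deep-scrub 5.5e3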

This fix might be backported to Jewel for a later release, but it's tricky, so it wasn't done proactively.
-Greg

On Fri, Oct 20, 2017 at 12:27 AM Stijn De Weirdt <stijn.deweirdt@xxxxxxxx> wrote:
hi gregory,

we more or less followed the instructions from that article (famous last
words, i know ;))

grepping for the error in the logs of the osds in the pg, the primary's
log had "5.5e3s0 shard 59(5) missing
5:c7ae919b:::10014d3184b.00000000:head"

we looked for the object using the find command and got:

> [root@osd003 ~]# find /var/lib/ceph/osd/ceph-35/current/5.5e3s0_head/ -name "*10014d3184b.00000000*"
>
> /var/lib/ceph/osd/ceph-35/current/5.5e3s0_head/DIR_3/DIR_E/DIR_5/DIR_7/DIR_9/10014d3184b.00000000__head_D98975E3__5_ffffffffffffffff_0

then we ran this find on all 11 osds from the pg, and 10 out of the 11
osds gave a similar path (the suffix _[0-9a] matched the index of the
osd in the list of osds reported by the pg, so i assumed that was the ec
code splitting the data into 11 pieces)
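
something along these lines is enough to check every shard (illustrative
only: the osd ids are the acting set from ceph health detail, and each
find obviously has to run on the host that owns that osd):

  for osd in 35 50 91 18 139 59 124 40 104 12 71; do
      find /var/lib/ceph/osd/ceph-${osd}/current/ -name "*10014d3184b.00000000*" 2>/dev/null
  done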

on one osd in the list there was no such object (the 6th one, index 5;
again assuming on our side that this was the 5 in the 5:... from the
logfile). so we assumed this was the missing object that the error
reported. we have absolutely no clue why it was missing or what
happened; there was nothing in any of the logs.

what we did then was stop the osd that had the missing object, flush the
journal, start the osd again and run a repair. (the guide says to delete
an object; we did not delete anything, because we assumed the issue was
the object already missing from the 6th osd)
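
concretely, it was roughly the following (a sketch from memory, assuming
systemd units; osd.59 is the one with the missing shard):

  systemctl stop ceph-osd@59
  ceph-osd -i 59 --flush-journal    # this is the step that segfaulted
  systemctl start ceph-osd@59
  ceph pg repair 5.5e3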

flushing the journal segfaulted, but the osd started fine again.

the scrub errors did not disappear, so we did the same again on the
primary (again without deleting anything; and again, the flush
segfaulted).

wrt the segfault, i attached the output of a segfaulting flush with
debug enabled on another osd.
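
for reference, the flush with debug was invoked more or less like this
(the osd id is a placeholder and the debug levels are just what we
picked):

  ceph-osd -i <osd-id> --flush-journal --debug-osd 20 --debug-filestore 20 --debug-journal 20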


stijn


On 10/20/2017 02:56 AM, Gregory Farnum wrote:
> Okay, you're going to need to explain in very clear terms exactly what
> happened to your cluster, and *exactly* what operations you performed
> manually.
>
> The PG shards seem to have different views of the PG in question. The
> primary has a different log_tail, last_user_version, and last_epoch_clean
> from the others. Plus different log sizes? It's not making a ton of sense
> at first glance.
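>
> (For reference, the per-shard fields I'm comparing come from the pg query
> output you attached, i.e. roughly:
>
>   ceph pg 5.5e3 query > pg-5.5e3-query.json
>   grep -E 'log_tail|last_user_version|last_epoch_clean' pg-5.5e3-query.json
>
> looking at the "info" and "peer_info" sections for each shard.)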
> -Greg
>
> On Thu, Oct 19, 2017 at 1:08 AM Stijn De Weirdt <stijn.deweirdt@xxxxxxxx>
> wrote:
>
>> hi greg,
>>
>> i attached the gzipped output of the query and some more info below. if you
>> need more, let me know.
>>
>> stijn
>>
>>> [root@mds01 ~]# ceph -s
>>>     cluster 92beef0a-1239-4000-bacf-4453ab630e47
>>>      health HEALTH_ERR
>>>             1 pgs inconsistent
>>>             40 requests are blocked > 512 sec
>>>             1 scrub errors
>>>             mds0: Behind on trimming (2793/30)
>>>      monmap e1: 3 mons at {mds01=
>> 1.2.3.4:6789/0,mds02=1.2.3.5:6789/0,mds03=1.2.3.6:6789/0}
>>>             election epoch 326, quorum 0,1,2 mds01,mds02,mds03
>>>       fsmap e238677: 1/1/1 up {0=mds02=up:active}, 2 up:standby
>>>      osdmap e79554: 156 osds: 156 up, 156 in
>>>             flags sortbitwise,require_jewel_osds
>>>       pgmap v51003893: 4096 pgs, 3 pools, 387 TB data, 243 Mobjects
>>>             545 TB used, 329 TB / 874 TB avail
>>>                 4091 active+clean
>>>                    4 active+clean+scrubbing+deep
>>>                    1 active+clean+inconsistent
>>>   client io 284 kB/s rd, 146 MB/s wr, 145 op/s rd, 177 op/s wr
>>>   cache io 115 MB/s flush, 153 MB/s evict, 14 op/s promote, 3 PG(s)
>> flushing
>>
>>> [root@mds01 ~]# ceph health detail
>>> HEALTH_ERR 1 pgs inconsistent; 52 requests are blocked > 512 sec; 5 osds
>> have slow requests; 1 scrub errors; mds0: Behind on trimming (2782/30)
>>> pg 5.5e3 is active+clean+inconsistent, acting
>> [35,50,91,18,139,59,124,40,104,12,71]
>>> 34 ops are blocked > 524.288 sec on osd.8
>>> 6 ops are blocked > 524.288 sec on osd.67
>>> 6 ops are blocked > 524.288 sec on osd.27
>>> 1 ops are blocked > 524.288 sec on osd.107
>>> 5 ops are blocked > 524.288 sec on osd.116
>>> 5 osds have slow requests
>>> 1 scrub errors
>>> mds0: Behind on trimming (2782/30)(max_segments: 30, num_segments: 2782)
>>
>>> # zgrep -C 1 ERR ceph-osd.35.log.*.gz
>>> ceph-osd.35.log.5.gz:2017-10-14 11:25:52.260668 7f34d6748700  0 --
>> 10.141.16.13:6801/1001792 >> 1.2.3.11:6803/1951 pipe(0x56412da80800
>> sd=273 :6801 s=2 pgs=3176 cs=31 l=0 c=0x564156e83b00).fault with nothing to
>> send, going to standby
>>> ceph-osd.35.log.5.gz:2017-10-14 11:26:06.071011 7f3511be4700 -1
>> log_channel(cluster) log [ERR] : 5.5e3s0 shard 59(5) missing
>> 5:c7ae919b:::10014d3184b.00000000:head
>>> ceph-osd.35.log.5.gz:2017-10-14 11:28:36.465684 7f34ffdf5700  0 --
>> 1.2.3.13:6801/1001792 >> 1.2.3.21:6829/1834 pipe(0x56414e2a2000 sd=37
>> :6801 s=0 pgs=0 cs=0 l=0 c=0x5641470d2a00).accept connect_seq 33 vs
>> existing 33 state standby
>>> ceph-osd.35.log.5.gz:--
>>> ceph-osd.35.log.5.gz:2017-10-14 11:43:35.570711 7f3508efd700  0 --
>> 1.2.3.13:6801/1001792 >> 1.2.3.20:6825/1806 pipe(0x56413be34000 sd=138
>> :6801 s=2 pgs=2763 cs=45 l=0 c=0x564132999480).fault with nothing to send,
>> going to standby
>>> ceph-osd.35.log.5.gz:2017-10-14 11:44:02.235548 7f3511be4700 -1
>> log_channel(cluster) log [ERR] : 5.5e3s0 deep-scrub 1 missing, 0
>> inconsistent objects
>>> ceph-osd.35.log.5.gz:2017-10-14 11:44:02.235554 7f3511be4700 -1
>> log_channel(cluster) log [ERR] : 5.5e3 deep-scrub 1 errors
>>> ceph-osd.35.log.5.gz:2017-10-14 11:59:02.331454 7f34d6d4e700  0 --
>> 1.2.3.13:6801/1001792 >> 1.2.3.11:6817/1941 pipe(0x56414d370800 sd=227
>> :42104 s=2 pgs=3238 cs=89 l=0 c=0x56413122d200).fault with nothing to send,
>> going to standby
>>
>>
>>
>> On 10/18/2017 10:19 PM, Gregory Farnum wrote:
>>> It would help if you can provide the exact output of "ceph -s", "pg query",
>>> and any other relevant data. You shouldn't need to do manual repair of
>>> erasure-coded pools, since they have checksums and can tell which bits are
>>> bad. Following that article may not have done you any good (though I
>>> wouldn't expect it to hurt, either...)...
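>>>
>>> If it's easy to grab, the scrub detail for the pg would also be useful,
>>> e.g.:
>>>
>>>   rados list-inconsistent-obj <pgid> --format=json-pretty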
>>> -Greg
>>>
>>> On Wed, Oct 18, 2017 at 5:56 AM Stijn De Weirdt <stijn.deweirdt@xxxxxxxx> wrote:
>>>
>>>> hi all,
>>>>
>>>> we have a ceph 10.2.7 cluster with an 8+3 EC pool.
>>>> in that pool, there is a pg in inconsistent state.
>>>>
>>>> we followed http://ceph.com/geen-categorie/ceph-manually-repair-object/;
>>>> however, we are unable to solve our issue.
>>>>
>>>> from the primary osd logs, the reported pg had a missing object.
>>>>
>>>> we found a related object on the primary osd, and then looked for
>>>> similar ones on the other osds in the same path (i guess it just has
>>>> the index of the osd in the pg's list of osds as a suffix)
>>>>
>>>> one osd did not have such a file (the 10 others did).
>>>>
>>>> so we did the "stop osd / flush journal / start osd / pg repair" on both
>>>> the primary osd and on the osd with the missing EC part.
>>>>
>>>> however, the scrub error still exists.
>>>>
>>>> does anyone have any hints what to do in this case?
>>>>
>>>> stijn
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>
>>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
