Re: ceph inconsistent pg missing ec object

hi gregory,

we more or less followed the instructions on the site (famous last
words, i know ;)

grepping for the error in the logs of the osds in the pg, the primary's
log had "5.5e3s0 shard 59(5) missing
5:c7ae919b:::10014d3184b.00000000:head"
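
for reference, the search was along these lines on each osd host (35 is
the primary; the log location is just where our setup keeps it):

  zgrep -C 1 ERR /var/log/ceph/ceph-osd.35.log.*.gz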

we looked for the object using the find command, and we got:

> [root@osd003 ~]# find /var/lib/ceph/osd/ceph-35/current/5.5e3s0_head/ -name "*10014d3184b.00000000*"
> 
> /var/lib/ceph/osd/ceph-35/current/5.5e3s0_head/DIR_3/DIR_E/DIR_5/DIR_7/DIR_9/10014d3184b.00000000__head_D98975E3__5_ffffffffffffffff_0

then we ran this find on all 11 osds of the pg, and 10 out of the 11
gave a similar path (the suffix _[0-9a] matched the index of the osd in
the list of osds reported by the pg, so i assumed that was the ec code
splitting the data into 11 pieces)
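
a rough sketch of how the acting set was walked, in case it helps; the
osd-to-host lookup via 'ceph osd find' and passwordless ssh to the osd
nodes are assumptions of our setup:

  for osd in 35 50 91 18 139 59 124 40 104 12 71; do
    host=$(ceph osd find $osd -f json-pretty | grep '"host"' | cut -d'"' -f4)
    echo "=== osd.$osd on $host ==="
    # look for any piece of the reported object in that osd's filestore
    ssh $host "find /var/lib/ceph/osd/ceph-$osd/current/ -name '*10014d3184b.00000000*'"
  done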

on one osd in the list there was no such object (the 6th one, index 5,
which matches the '(5)' in 'shard 59(5)' from the logfile). so we
assumed this was the missing object the error reported. we have
absolutely no clue why it went missing or what happened; there is
nothing in any of the logs.

what we did then was stop the osd that was missing the object, flush its
journal, start the osd again, and run a pg repair. (the guide says to
delete the bad object; we did not delete anything, because we assumed
the issue was the object already missing from the 6th osd)
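
per osd, the sequence was roughly the following, followed by a pg repair
(the systemd unit name is from our deployment, the pg id is the
inconsistent one from above):

  systemctl stop ceph-osd@59
  # flush the filestore journal while the daemon is down
  ceph-osd -i 59 --flush-journal
  systemctl start ceph-osd@59
  ceph pg repair 5.5e3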

flushing the journal segfaulted, but the osd started fine again.

the scrub errors did not disappear, so we did the same again on the
primary (again without deleting anything; and again, the flush
segfaulted).

wrt the segfault, i attached the output of a segfaulting flush with
debug enabled on another osd.
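
in case someone wants to reproduce it: a sketch of how the flush can be
rerun with more verbose journal/filestore logging (the debug levels and
log path are assumptions, not necessarily what produced the attachment):

  systemctl stop ceph-osd@60
  ceph-osd -i 60 --flush-journal --debug-journal 20 --debug-filestore 20
  # the output ends up in the usual osd log
  less /var/log/ceph/ceph-osd.60.log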


stijn


On 10/20/2017 02:56 AM, Gregory Farnum wrote:
> Okay, you're going to need to explain in very clear terms exactly what
> happened to your cluster, and *exactly* what operations you performed
> manually.
> 
> The PG shards seem to have different views of the PG in question. The
> primary has a different log_tail, last_user_version, and last_epoch_clean
> from the others. Plus different log sizes? It's not making a ton of sense
> at first glance.
> -Greg
> 
> On Thu, Oct 19, 2017 at 1:08 AM Stijn De Weirdt <stijn.deweirdt@xxxxxxxx>
> wrote:
> 
>> hi greg,
>>
>> i attached the gzip output of the query and some more info below. if you
>> need more, let me know.
>>
>> stijn
>>
>>> [root@mds01 ~]# ceph -s
>>>     cluster 92beef0a-1239-4000-bacf-4453ab630e47
>>>      health HEALTH_ERR
>>>             1 pgs inconsistent
>>>             40 requests are blocked > 512 sec
>>>             1 scrub errors
>>>             mds0: Behind on trimming (2793/30)
>>>      monmap e1: 3 mons at {mds01=
>> 1.2.3.4:6789/0,mds02=1.2.3.5:6789/0,mds03=1.2.3.6:6789/0}
>>>             election epoch 326, quorum 0,1,2 mds01,mds02,mds03
>>>       fsmap e238677: 1/1/1 up {0=mds02=up:active}, 2 up:standby
>>>      osdmap e79554: 156 osds: 156 up, 156 in
>>>             flags sortbitwise,require_jewel_osds
>>>       pgmap v51003893: 4096 pgs, 3 pools, 387 TB data, 243 Mobjects
>>>             545 TB used, 329 TB / 874 TB avail
>>>                 4091 active+clean
>>>                    4 active+clean+scrubbing+deep
>>>                    1 active+clean+inconsistent
>>>   client io 284 kB/s rd, 146 MB/s wr, 145 op/s rd, 177 op/s wr
>>>   cache io 115 MB/s flush, 153 MB/s evict, 14 op/s promote, 3 PG(s)
>> flushing
>>
>>> [root@mds01 ~]# ceph health detail
>>> HEALTH_ERR 1 pgs inconsistent; 52 requests are blocked > 512 sec; 5 osds
>> have slow requests; 1 scrub errors; mds0: Behind on trimming (2782/30)
>>> pg 5.5e3 is active+clean+inconsistent, acting
>> [35,50,91,18,139,59,124,40,104,12,71]
>>> 34 ops are blocked > 524.288 sec on osd.8
>>> 6 ops are blocked > 524.288 sec on osd.67
>>> 6 ops are blocked > 524.288 sec on osd.27
>>> 1 ops are blocked > 524.288 sec on osd.107
>>> 5 ops are blocked > 524.288 sec on osd.116
>>> 5 osds have slow requests
>>> 1 scrub errors
>>> mds0: Behind on trimming (2782/30)(max_segments: 30, num_segments: 2782)
>>
>>> # zgrep -C 1 ERR ceph-osd.35.log.*.gz
>>> ceph-osd.35.log.5.gz:2017-10-14 11:25:52.260668 7f34d6748700  0 --
>> 10.141.16.13:6801/1001792 >> 1.2.3.11:6803/1951 pipe(0x56412da80800
>> sd=273 :6801 s=2 pgs=3176 cs=31 l=0 c=0x564156e83b00).fault with nothing to
>> send, going to standby
>>> ceph-osd.35.log.5.gz:2017-10-14 11:26:06.071011 7f3511be4700 -1
>> log_channel(cluster) log [ERR] : 5.5e3s0 shard 59(5) missing
>> 5:c7ae919b:::10014d3184b.00000000:head
>>> ceph-osd.35.log.5.gz:2017-10-14 11:28:36.465684 7f34ffdf5700  0 --
>> 1.2.3.13:6801/1001792 >> 1.2.3.21:6829/1834 pipe(0x56414e2a2000 sd=37
>> :6801 s=0 pgs=0 cs=0 l=0 c=0x5641470d2a00).accept connect_seq 33 vs
>> existing 33 state standby
>>> ceph-osd.35.log.5.gz:--
>>> ceph-osd.35.log.5.gz:2017-10-14 11:43:35.570711 7f3508efd700  0 --
>> 1.2.3.13:6801/1001792 >> 1.2.3.20:6825/1806 pipe(0x56413be34000 sd=138
>> :6801 s=2 pgs=2763 cs=45 l=0 c=0x564132999480).fault with nothing to send,
>> going to standby
>>> ceph-osd.35.log.5.gz:2017-10-14 11:44:02.235548 7f3511be4700 -1
>> log_channel(cluster) log [ERR] : 5.5e3s0 deep-scrub 1 missing, 0
>> inconsistent objects
>>> ceph-osd.35.log.5.gz:2017-10-14 11:44:02.235554 7f3511be4700 -1
>> log_channel(cluster) log [ERR] : 5.5e3 deep-scrub 1 errors
>>> ceph-osd.35.log.5.gz:2017-10-14 11:59:02.331454 7f34d6d4e700  0 --
>> 1.2.3.13:6801/1001792 >> 1.2.3.11:6817/1941 pipe(0x56414d370800 sd=227
>> :42104 s=2 pgs=3238 cs=89 l=0 c=0x56413122d200).fault with nothing to send,
>> going to standby
>>
>>
>>
>> On 10/18/2017 10:19 PM, Gregory Farnum wrote:
>>> It would help if you can provide the exact output of "ceph -s", "pg
>> query",
>>> and any other relevant data. You shouldn't need to do manual repair of
>>> erasure-coded pools, since they have checksums and can tell which bits are
>>> bad. Following that article may not have done you any good (though I
>>> wouldn't expect it to hurt, either...)...
>>> -Greg
>>>
>>> On Wed, Oct 18, 2017 at 5:56 AM Stijn De Weirdt <stijn.deweirdt@xxxxxxxx>
>>> wrote:
>>>
>>>> hi all,
>>>>
>>>> we have a ceph 10.2.7 cluster with a 8+3 EC pool.
>>>> in that pool, there is a pg in inconsistent state.
>>>>
>>>> we followed http://ceph.com/geen-categorie/ceph-manually-repair-object/
>> ,
>>>> however, we are unable to solve our issue.
>>>>
>>>> from the primary osd logs, the reported pg had a missing object.
>>>>
>>>> we found a related object on the primary osd, and then looked for
>>>> similar ones on the other osds in the same path (i guess it just has the
>>>> index of the osd in the pg's list of osds as a suffix)
>>>>
>>>> one osd did not have such a file (the 10 others did).
>>>>
>>>> so we did the "stop osd/flush/start osd/pg repair" on both the primary
>>>> osd and on the osd with the missing EC part.
>>>>
>>>> however, the scrub error still exists.
>>>>
>>>> does anyone have any hints what to do in this case?
>>>>
>>>> stijn
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>
>>
> 
2017-10-18 14:08:03.210112 7f9fde1bf800  0 ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185), process ceph-osd, pid 2403656
2017-10-18 14:08:03.234189 7f9fde1bf800  0 filestore(/var/lib/ceph/osd/ceph-60) backend xfs (magic 0x58465342)
2017-10-18 14:08:03.234472 7f9fde1bf800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-60) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2017-10-18 14:08:03.234478 7f9fde1bf800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-60) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2017-10-18 14:08:03.234489 7f9fde1bf800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-60) detect_features: splice is supported
2017-10-18 14:08:03.295779 7f9fde1bf800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-60) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2017-10-18 14:08:03.295820 7f9fde1bf800  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-60) detect_feature: extsize is disabled by conf
2017-10-18 14:08:03.296497 7f9fde1bf800  1 leveldb: Recovering log #141536
2017-10-18 14:08:03.346301 7f9fde1bf800  1 leveldb: Delete type=2 #141537

2017-10-18 14:08:03.346513 7f9fde1bf800  1 leveldb: Delete type=3 #141535

2017-10-18 14:08:03.346568 7f9fde1bf800  1 leveldb: Delete type=0 #141536

2017-10-18 14:08:03.346666 7f9fd7c9d700  1 leveldb: Compacting 4@0 + 7@1 files
2017-10-18 14:08:03.347443 7f9fde1bf800  0 filestore(/var/lib/ceph/osd/ceph-60) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2017-10-18 14:08:03.352156 7f9fde1bf800  1 journal _open /var/lib/ceph/osd/ceph-60/journal fd 14: 10737418240 bytes, block size 4096 bytes, directio = 1, aio = 1
2017-10-18 14:08:03.354994 7f9fde1bf800  1 journal _open /var/lib/ceph/osd/ceph-60/journal fd 14: 10737418240 bytes, block size 4096 bytes, directio = 1, aio = 1
2017-10-18 14:08:03.356157 7f9fde1bf800  1 filestore(/var/lib/ceph/osd/ceph-60) upgrade
2017-10-18 14:08:03.356309 7f9fde1bf800  1 journal close /var/lib/ceph/osd/ceph-60/journal
*** Caught signal (Segmentation fault) **
 in thread 7f9fd7c9d700 thread_name:ceph-osd
 ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
 1: (()+0x91d8ea) [0x55b5595848ea]
 2: (()+0xf5e0) [0x7f9fdc90e5e0]
 3: [0x55b564718670]
2017-10-18 14:08:03.373770 7f9fd7c9d700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f9fd7c9d700 thread_name:ceph-osd

 ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
 1: (()+0x91d8ea) [0x55b5595848ea]
 2: (()+0xf5e0) [0x7f9fdc90e5e0]
 3: [0x55b564718670]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
   -45> 2017-10-18 14:08:03.207336 7f9fde1bf800  5 asok(0x55b564670140) register_command perfcounters_dump hook 0x55b564644030
   -44> 2017-10-18 14:08:03.207355 7f9fde1bf800  5 asok(0x55b564670140) register_command 1 hook 0x55b564644030
   -43> 2017-10-18 14:08:03.207358 7f9fde1bf800  5 asok(0x55b564670140) register_command perf dump hook 0x55b564644030
   -42> 2017-10-18 14:08:03.207362 7f9fde1bf800  5 asok(0x55b564670140) register_command perfcounters_schema hook 0x55b564644030
   -41> 2017-10-18 14:08:03.207365 7f9fde1bf800  5 asok(0x55b564670140) register_command 2 hook 0x55b564644030
   -40> 2017-10-18 14:08:03.207368 7f9fde1bf800  5 asok(0x55b564670140) register_command perf schema hook 0x55b564644030
   -39> 2017-10-18 14:08:03.207371 7f9fde1bf800  5 asok(0x55b564670140) register_command perf reset hook 0x55b564644030
   -38> 2017-10-18 14:08:03.207375 7f9fde1bf800  5 asok(0x55b564670140) register_command config show hook 0x55b564644030
   -37> 2017-10-18 14:08:03.207378 7f9fde1bf800  5 asok(0x55b564670140) register_command config set hook 0x55b564644030
   -36> 2017-10-18 14:08:03.207381 7f9fde1bf800  5 asok(0x55b564670140) register_command config get hook 0x55b564644030
   -35> 2017-10-18 14:08:03.207384 7f9fde1bf800  5 asok(0x55b564670140) register_command config diff hook 0x55b564644030
   -34> 2017-10-18 14:08:03.207386 7f9fde1bf800  5 asok(0x55b564670140) register_command log flush hook 0x55b564644030
   -33> 2017-10-18 14:08:03.207388 7f9fde1bf800  5 asok(0x55b564670140) register_command log dump hook 0x55b564644030
   -32> 2017-10-18 14:08:03.207391 7f9fde1bf800  5 asok(0x55b564670140) register_command log reopen hook 0x55b564644030
   -31> 2017-10-18 14:08:03.210112 7f9fde1bf800  0 ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185), process ceph-osd, pid 2403656
   -30> 2017-10-18 14:08:03.234006 7f9fde1bf800  5 asok(0x55b564670140) init /var/run/ceph/ceph-osd.60.asok
   -29> 2017-10-18 14:08:03.234013 7f9fde1bf800  5 asok(0x55b564670140) bind_and_listen /var/run/ceph/ceph-osd.60.asok
   -28> 2017-10-18 14:08:03.234100 7f9fde1bf800  5 asok(0x55b564670140) register_command 0 hook 0x55b5646400d0
   -27> 2017-10-18 14:08:03.234105 7f9fde1bf800  5 asok(0x55b564670140) register_command version hook 0x55b5646400d0
   -26> 2017-10-18 14:08:03.234109 7f9fde1bf800  5 asok(0x55b564670140) register_command git_version hook 0x55b5646400d0
   -25> 2017-10-18 14:08:03.234112 7f9fde1bf800  5 asok(0x55b564670140) register_command help hook 0x55b564644120
   -24> 2017-10-18 14:08:03.234116 7f9fde1bf800  5 asok(0x55b564670140) register_command get_command_descriptions hook 0x55b564644130
   -23> 2017-10-18 14:08:03.234140 7f9fd849e700  5 asok(0x55b564670140) entry start
   -22> 2017-10-18 14:08:03.234189 7f9fde1bf800  0 filestore(/var/lib/ceph/osd/ceph-60) backend xfs (magic 0x58465342)
   -21> 2017-10-18 14:08:03.234472 7f9fde1bf800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-60) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
   -20> 2017-10-18 14:08:03.234478 7f9fde1bf800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-60) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
   -19> 2017-10-18 14:08:03.234489 7f9fde1bf800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-60) detect_features: splice is supported
   -18> 2017-10-18 14:08:03.295779 7f9fde1bf800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-60) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
   -17> 2017-10-18 14:08:03.295820 7f9fde1bf800  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-60) detect_feature: extsize is disabled by conf
   -16> 2017-10-18 14:08:03.296497 7f9fde1bf800  1 leveldb: Recovering log #141536
   -15> 2017-10-18 14:08:03.346301 7f9fde1bf800  1 leveldb: Delete type=2 #141537

   -14> 2017-10-18 14:08:03.346513 7f9fde1bf800  1 leveldb: Delete type=3 #141535

   -13> 2017-10-18 14:08:03.346568 7f9fde1bf800  1 leveldb: Delete type=0 #141536

   -12> 2017-10-18 14:08:03.346666 7f9fd7c9d700  1 leveldb: Compacting 4@0 + 7@1 files
   -11> 2017-10-18 14:08:03.347443 7f9fde1bf800  0 filestore(/var/lib/ceph/osd/ceph-60) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
   -10> 2017-10-18 14:08:03.349818 7f9fde1bf800  2 journal open /var/lib/ceph/osd/ceph-60/journal fsid a0006624-4824-4c46-82dd-c213231a3520 fs_op_seq 126845496
    -9> 2017-10-18 14:08:03.352156 7f9fde1bf800  1 journal _open /var/lib/ceph/osd/ceph-60/journal fd 14: 10737418240 bytes, block size 4096 bytes, directio = 1, aio = 1
    -8> 2017-10-18 14:08:03.352745 7f9fde1bf800  2 journal No further valid entries found, journal is most likely valid
    -7> 2017-10-18 14:08:03.352752 7f9fde1bf800  2 journal No further valid entries found, journal is most likely valid
    -6> 2017-10-18 14:08:03.352754 7f9fde1bf800  3 journal journal_replay: end of journal, done.
    -5> 2017-10-18 14:08:03.354994 7f9fde1bf800  1 journal _open /var/lib/ceph/osd/ceph-60/journal fd 14: 10737418240 bytes, block size 4096 bytes, directio = 1, aio = 1
    -4> 2017-10-18 14:08:03.356157 7f9fde1bf800  1 filestore(/var/lib/ceph/osd/ceph-60) upgrade
    -3> 2017-10-18 14:08:03.356255 7f9fd364f700  1 FileStore::op_tp worker finish
    -2> 2017-10-18 14:08:03.356262 7f9fd3e50700  1 FileStore::op_tp worker finish
    -1> 2017-10-18 14:08:03.356309 7f9fde1bf800  1 journal close /var/lib/ceph/osd/ceph-60/journal
     0> 2017-10-18 14:08:03.373770 7f9fd7c9d700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f9fd7c9d700 thread_name:ceph-osd

 ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
 1: (()+0x91d8ea) [0x55b5595848ea]
 2: (()+0xf5e0) [0x7f9fdc90e5e0]
 3: [0x55b564718670]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 newstore
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   1/ 5 kinetic
   1/ 5 fuse
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file 
--- end dump of recent events ---
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
