Not yet. I will create one.
But according to the mailing lists and the Inktank docs, it's expected behaviour when a cache tier is enabled.
2015-08-20 19:56 GMT+03:00 Samuel Just <sjust@xxxxxxxxxx>:
Is there a bug for this in the tracker?
-Sam
On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor
<igor.voloshanenko@xxxxxxxxx> wrote:
> The issue is that in forward mode fstrim doesn't work properly, and when we take
> a snapshot the data is not properly updated in the cache layer, so the client (ceph)
> sees a damaged snapshot, because the headers are still served from the cache layer.
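> For illustration, a minimal sketch of the sequence that hits this behaviour
> (the pool and image names below are hypothetical, not taken from our cluster):
>
>   # hypothetical names: cache-pool (cache tier), rbd/vm-disk (RBD image)
>   # put the cache tier into forward mode
>   ceph osd tier cache-mode cache-pool forward
>   # inside a guest backed by the RBD image, discard unused blocks
>   fstrim /
>   # take an RBD snapshot of that image
>   rbd snap create rbd/vm-disk@snap1
>   # reading the snapshot afterwards is where the damage shows up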
>
> 2015-08-20 19:53 GMT+03:00 Samuel Just <sjust@xxxxxxxxxx>:
>>
>> What was the issue?
>> -Sam
>>
>> On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor
>> <igor.voloshanenko@xxxxxxxxx> wrote:
>> > Samuel, we turned off the cache layer a few hours ago...
>> > I will post the ceph.log in a few minutes.
>> >
>> > As for the snapshot - we found the issue; it was connected with the cache tier.
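>> > (A sketch of the usual sequence for taking a cache tier out of the write path -
>> > the pool names "cache-pool" and "storage-pool" are placeholders, not necessarily
>> > the exact commands we ran:
>> >
>> >   # stop promoting new objects into the cache tier
>> >   ceph osd tier cache-mode cache-pool forward
>> >   # flush and evict everything still held in the cache pool
>> >   rados -p cache-pool cache-flush-evict-all
>> >   # detach the overlay and remove the tier relationship
>> >   ceph osd tier remove-overlay storage-pool
>> >   ceph osd tier remove storage-pool cache-pool
>> > )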
>> >
>> > 2015-08-20 19:23 GMT+03:00 Samuel Just <sjust@xxxxxxxxxx>:
>> >>
>> >> Ok, you appear to be using a replicated cache tier in front of a
>> >> replicated base tier. Please scrub both inconsistent pgs and post the
>> >> ceph.log from before when you started the scrub until after. Also,
>> >> what command are you using to take snapshots?
>> >> -Sam
>> >>
>> >> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor
>> >> <igor.voloshanenko@xxxxxxxxx> wrote:
>> >> > Hi Samuel, we tried to fix it in a tricky way.
>> >> >
>> >> > We checked all the affected rbd_data chunks from the OSD logs, then queried
>> >> > rbd info to find which rbd image contains the bad rbd_data. After that we
>> >> > mapped that rbd as rbd0, created an empty rbd, and dd'ed everything from the
>> >> > bad volume to the new one.
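>> >> > A rough sketch of that workaround - the image and pool names are made up;
>> >> > the real ones come from matching the rbd_data prefix in the logs against
>> >> > the block_name_prefix shown by rbd info:
>> >> >
>> >> >   # hypothetical names: rbd/bad-volume (damaged image), rbd/new-volume (copy target)
>> >> >   rbd info rbd/bad-volume                    # confirm block_name_prefix matches the bad rbd_data objects
>> >> >   rbd map rbd/bad-volume                     # shows up as e.g. /dev/rbd0
>> >> >   rbd create rbd/new-volume --size 102400    # size in MB; match the original image
>> >> >   rbd map rbd/new-volume                     # shows up as e.g. /dev/rbd1
>> >> >   dd if=/dev/rbd0 of=/dev/rbd1 bs=4M         # copy the data block-for-block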
>> >> >
>> >> > But after that the scrub errors kept growing... It was 15 errors... now 35...
>> >> > We also tried to mark out the OSD which was the primary, but after rebalancing
>> >> > these 2 pgs still have 35 scrub errors...
>> >> >
>> >> > ceph osd getmap -o <outfile> - attached
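>> >> > (If it helps, the attached map can be inspected offline with osdmaptool, e.g.:
>> >> >
>> >> >   # <outfile> is the file produced by the getmap command above
>> >> >   osdmaptool <outfile> --print
>> >> > )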
>> >> >
>> >> >
>> >> > 2015-08-18 18:48 GMT+03:00 Samuel Just <sjust@xxxxxxxxxx>:
>> >> >>
>> >> >> Is the number of inconsistent objects growing? Can you attach the
>> >> >> whole ceph.log from the 6 hours before and after the snippet you
>> >> >> linked above? Are you using cache/tiering? Can you attach the
>> >> >> osdmap
>> >> >> (ceph osd getmap -o <outfile>)?
>> >> >> -Sam
>> >> >>
>> >> >> On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor
>> >> >> <igor.voloshanenko@xxxxxxxxx> wrote:
>> >> >> > ceph - 0.94.2
>> >> >> > It happened during rebalancing.
>> >> >> >
>> >> >> > I thought too that some OSD was missing a copy, but it looks like all of them are missing it...
>> >> >> > So, any advice on which direction I need to go?
>> >> >> >
>> >> >> > 2015-08-18 14:14 GMT+03:00 Gregory Farnum <gfarnum@xxxxxxxxxx>:
>> >> >> >>
>> >> >> >> From a quick peek it looks like some of the OSDs are missing clones of
>> >> >> >> objects. I'm not sure how that could happen and I'd expect the pg
>> >> >> >> repair to handle that, but if it's not there's probably something
>> >> >> >> wrong; what version of Ceph are you running? Sam, is this something
>> >> >> >> you've seen, a new bug, or some kind of config issue?
>> >> >> >> -Greg
>> >> >> >>
>> >> >> >> On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor
>> >> >> >> <igor.voloshanenko@xxxxxxxxx> wrote:
>> >> >> >> > Hi all, on our production cluster, due to heavy rebalancing ((( we have
>> >> >> >> > 2 pgs in an inconsistent state...
>> >> >> >> >
>> >> >> >> > root@temp:~# ceph health detail | grep inc
>> >> >> >> > HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
>> >> >> >> > pg 2.490 is active+clean+inconsistent, acting [56,15,29]
>> >> >> >> > pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
>> >> >> >> >
>> >> >> >> > From OSD logs, after recovery attempt:
>> >> >> >> >
>> >> >> >> > root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read i; do ceph pg repair ${i} ; done
>> >> >> >> > dumped all in format plain
>> >> >> >> > instructing pg 2.490 on osd.56 to repair
>> >> >> >> > instructing pg 2.c4 on osd.56 to repair
>> >> >> >> >
>> >> >> >> > /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 f5759490/rbd_data.1631755377d7e.00000000000004da/head//2 expected clone 90c59490/rbd_data.eb486436f2beb.0000000000007a65/141//2
>> >> >> >> > /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 fee49490/rbd_data.12483d3ba0794b.000000000000522f/head//2 expected clone f5759490/rbd_data.1631755377d7e.00000000000004da/141//2
>> >> >> >> > /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 a9b39490/rbd_data.12483d3ba0794b.00000000000037b3/head//2 expected clone fee49490/rbd_data.12483d3ba0794b.000000000000522f/141//2
>> >> >> >> > /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 bac19490/rbd_data.1238e82ae8944a.000000000000032e/head//2 expected clone a9b39490/rbd_data.12483d3ba0794b.00000000000037b3/141//2
>> >> >> >> > /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 98519490/rbd_data.123e9c2ae8944a.0000000000000807/head//2 expected clone bac19490/rbd_data.1238e82ae8944a.000000000000032e/141//2
>> >> >> >> > /var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 c3c09490/rbd_data.1238e82ae8944a.0000000000000c2b/head//2 expected clone 98519490/rbd_data.123e9c2ae8944a.0000000000000807/141//2
>> >> >> >> > /var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 28809490/rbd_data.edea7460fe42b.00000000000001d9/head//2 expected clone c3c09490/rbd_data.1238e82ae8944a.0000000000000c2b/141//2
>> >> >> >> > /var/log/ceph/ceph-osd.56.log:58:2015-08-18 07:26:37.036432 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 e1509490/rbd_data.1423897545e146.00000000000009a6/head//2 expected clone 28809490/rbd_data.edea7460fe42b.00000000000001d9/141//2
>> >> >> >> > /var/log/ceph/ceph-osd.56.log:59:2015-08-18 07:26:38.548765 7f94663b3700 -1 log_channel(cluster) log [ERR] : 2.490 deep-scrub 17 errors
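>> >> >> >> > (To check which clones actually exist for one of the objects named in
>> >> >> >> > these errors, the object can be queried directly with rados - a sketch,
>> >> >> >> > assuming pool 2 is the data pool and is named "rbd"; substitute the real
>> >> >> >> > pool name:
>> >> >> >> >
>> >> >> >> >   # lists the head object and the snapshot clones the OSDs know about
>> >> >> >> >   rados -p rbd listsnaps rbd_data.1631755377d7e.00000000000004da
>> >> >> >> > )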
>> >> >> >> >
>> >> >> >> > So, how can I solve this "expected clone" situation by hand?
>> >> >> >> > Thanks in advance!
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >
>> >> >> >
>> >> >
>> >> >
>> >
>> >
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com