Ah, this is kind of silly. I think you don't have 37 errors, but 2 errors.
pg 2.490 object 3fac9490/rbd_data.eb5f22eb141f2.00000000000004ba/snapdir//2
is missing snap 141. If you look at the objects after that in the log:

2015-08-20 20:15:44.865670 osd.19 10.12.2.6:6838/1861727 298 : cluster [ERR] repair 2.490 68c89490/rbd_data.16796a3d1b58ba.0000000000000047/head//2 expected clone 2d7b9490/rbd_data.18f92c3d1b58ba.0000000000006167/141//2
2015-08-20 20:15:44.865817 osd.19 10.12.2.6:6838/1861727 299 : cluster [ERR] repair 2.490 ded49490/rbd_data.11a25c7934d3d4.0000000000008a8a/head//2 expected clone 68c89490/rbd_data.16796a3d1b58ba.0000000000000047/141//2

The clone from the second line matches the head object from the previous
line, and they have the same clone id. I *think* that the first error is
real, and the subsequent ones are just scrub being dumb. Same deal with
pg 2.c4. I just opened http://tracker.ceph.com/issues/12738.

The original problem is that
3fac9490/rbd_data.eb5f22eb141f2.00000000000004ba/snapdir//2 and
22ca30c4/rbd_data.e846e25a70bf7.0000000000000307/snapdir//2 are both
missing a clone. Not sure how that happened; my money is on a
cache/tiering evict racing with a snap trim. If you have any logging or
relevant information from when that happened, you should open a bug. The
'snapdir' in the two object names indicates that the head object has
actually been deleted (which makes sense if you moved the image to a new
image and deleted the old one) and is only being kept around since there
are live snapshots.

I suggest you leave the snapshots for those images alone for the time
being -- removing them might cause the osd to crash trying to clean up
the weird on-disk state. Other than the leaked space from those two image
snapshots and the annoying spurious scrub errors, I think no actual
corruption is going on, though. I created a tracker ticket for a feature
that would let ceph-objectstore-tool remove the spurious clone from the
head/snapdir metadata.

Am I right that you haven't actually seen any osd crashes or user-visible
corruption (except possibly on snapshots of those two images)?
-Sam
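
To make the pattern Sam describes easier to spot -- each spurious error's
"expected clone" is simply the head object from the error line directly
above it -- something like the following rough, untested sketch can walk
the [ERR] lines for one pg and flag only the entries that do not chain off
the previous head; those point at the genuinely missing clones. The log
path, the pg id, and the assumed field layout ("<head> expected clone
<clone>") are placeholders based on the lines quoted above, not a
supported tool:

# Rough sketch only: list scrub/repair errors whose "expected clone" does
# NOT match the head object of the previous error line for the same pg.
pg="2.490"                       # run again with pg="2.c4"
log="/var/log/ceph/ceph.log"     # adjust to wherever the cluster log lives

grep "expected clone" "$log" | grep -F " $pg " | awk '
{
    # assumed layout: ... <head-object> expected clone <clone-object>
    for (i = 1; i <= NF; i++)
        if ($i == "expected" && $(i + 1) == "clone") {
            head = $(i - 1); clone = $(i + 2)
        }
    split(head, h, "/"); split(clone, c, "/")
    head_key  = h[1] "/" h[2]    # <hash>/<rbd_data...>, snap id stripped
    clone_key = c[1] "/" c[2]
    if (clone_key != prev_head)
        print "does not chain off previous head (likely real):", clone
    prev_head = head_key
}'

Run over the full cluster log, this should leave roughly one flagged entry
per pg, which would match Sam's reading that a single missing clone in
each of 2.490 and 2.c4 is fanning out into dozens of spurious scrub
errors.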

On Thu, Aug 20, 2015 at 10:07 AM, Voloshanenko Igor <igor.voloshanenko@xxxxxxxxx> wrote:
> Inktank:
> https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding%20FAQ.pdf
>
> Mailing list:
> https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg18338.html
>
> 2015-08-20 20:06 GMT+03:00 Samuel Just <sjust@xxxxxxxxxx>:
>>
>> Which docs?
>> -Sam
>>
>> On Thu, Aug 20, 2015 at 9:57 AM, Voloshanenko Igor <igor.voloshanenko@xxxxxxxxx> wrote:
>> > Not yet. I will create one.
>> > But according to the mailing lists and Inktank docs, it's expected
>> > behaviour when the cache is enabled.
>> >
>> > 2015-08-20 19:56 GMT+03:00 Samuel Just <sjust@xxxxxxxxxx>:
>> >>
>> >> Is there a bug for this in the tracker?
>> >> -Sam
>> >>
>> >> On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor <igor.voloshanenko@xxxxxxxxx> wrote:
>> >> > The issue is that in forward mode fstrim doesn't work properly, and
>> >> > when we take a snapshot the data is not properly updated in the
>> >> > cache layer, so the client (ceph) sees a damaged snap, as the
>> >> > headers are requested from the cache layer.
>> >> >
>> >> > 2015-08-20 19:53 GMT+03:00 Samuel Just <sjust@xxxxxxxxxx>:
>> >> >>
>> >> >> What was the issue?
>> >> >> -Sam
>> >> >>
>> >> >> On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor <igor.voloshanenko@xxxxxxxxx> wrote:
>> >> >> > Samuel, we turned off the cache layer a few hours ago...
>> >> >> > I will post ceph.log in a few minutes.
>> >> >> >
>> >> >> > For the snap issue - we found it was connected with the cache tier..
>> >> >> >
>> >> >> > 2015-08-20 19:23 GMT+03:00 Samuel Just <sjust@xxxxxxxxxx>:
>> >> >> >>
>> >> >> >> Ok, you appear to be using a replicated cache tier in front of a
>> >> >> >> replicated base tier. Please scrub both inconsistent pgs and post
>> >> >> >> the ceph.log from before when you started the scrub until after.
>> >> >> >> Also, what command are you using to take snapshots?
>> >> >> >> -Sam
>> >> >> >>
>> >> >> >> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor <igor.voloshanenko@xxxxxxxxx> wrote:
>> >> >> >> > Hi Samuel, we tried to fix it in a tricky way.
>> >> >> >> >
>> >> >> >> > We checked all the affected rbd_data chunks from the OSD logs,
>> >> >> >> > then queried rbd info to find which rbd contains the bad
>> >> >> >> > rbd_data; after that we mounted this rbd as rbd0, created an
>> >> >> >> > empty rbd, and dd'd everything from the bad volume to the new
>> >> >> >> > one.
>> >> >> >> >
>> >> >> >> > But after that the scrub errors kept growing... It was 15
>> >> >> >> > errors... now 35... We also tried to out the OSD which was the
>> >> >> >> > lead, but after rebalancing these 2 pgs still have 35 scrub
>> >> >> >> > errors...
>> >> >> >> >
>> >> >> >> > ceph osd getmap -o <outfile> - attached
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > 2015-08-18 18:48 GMT+03:00 Samuel Just <sjust@xxxxxxxxxx>:
>> >> >> >> >>
>> >> >> >> >> Is the number of inconsistent objects growing? Can you attach
>> >> >> >> >> the whole ceph.log from the 6 hours before and after the
>> >> >> >> >> snippet you linked above? Are you using cache/tiering? Can you
>> >> >> >> >> attach the osdmap (ceph osd getmap -o <outfile>)?
>> >> >> >> >> -Sam
>> >> >> >> >>
>> >> >> >> >> On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor <igor.voloshanenko@xxxxxxxxx> wrote:
>> >> >> >> >> > ceph - 0.94.2
>> >> >> >> >> > It happened during rebalancing.
>> >> >> >> >> >
>> >> >> >> >> > I thought too that some OSD was missing a copy, but it looks
>> >> >> >> >> > like they all are... So, any advice on which direction I
>> >> >> >> >> > need to go?
>> >> >> >> >> >
>> >> >> >> >> > 2015-08-18 14:14 GMT+03:00 Gregory Farnum <gfarnum@xxxxxxxxxx>:
>> >> >> >> >> >>
>> >> >> >> >> >> From a quick peek it looks like some of the OSDs are missing
>> >> >> >> >> >> clones of objects. I'm not sure how that could happen and
>> >> >> >> >> >> I'd expect the pg repair to handle that but if it's not
>> >> >> >> >> >> there's probably something wrong; what version of Ceph are
>> >> >> >> >> >> you running? Sam, is this something you've seen, a new bug,
>> >> >> >> >> >> or some kind of config issue?
>> >> >> >> >> >> -Greg
>> >> >> >> >> >>
>> >> >> >> >> >> On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor <igor.voloshanenko@xxxxxxxxx> wrote:
>> >> >> >> >> >> > Hi all, at our production cluster, due to high rebalancing
>> >> >> >> >> >> > ((( we have 2 pgs in an inconsistent state...
>> >> >> >> >> >> >
>> >> >> >> >> >> > root@temp:~# ceph health detail | grep inc
>> >> >> >> >> >> > HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
>> >> >> >> >> >> > pg 2.490 is active+clean+inconsistent, acting [56,15,29]
>> >> >> >> >> >> > pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
>> >> >> >> >> >> >
>> >> >> >> >> >> > From the OSD logs, after a recovery attempt:
>> >> >> >> >> >> >
>> >> >> >> >> >> > root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read i; do ceph pg repair ${i} ; done
>> >> >> >> >> >> > dumped all in format plain
>> >> >> >> >> >> > instructing pg 2.490 on osd.56 to repair
>> >> >> >> >> >> > instructing pg 2.c4 on osd.56 to repair
>> >> >> >> >> >> >
>> >> >> >> >> >> > /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 f5759490/rbd_data.1631755377d7e.00000000000004da/head//2 expected clone 90c59490/rbd_data.eb486436f2beb.0000000000007a65/141//2
>> >> >> >> >> >> > /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 fee49490/rbd_data.12483d3ba0794b.000000000000522f/head//2 expected clone f5759490/rbd_data.1631755377d7e.00000000000004da/141//2
>> >> >> >> >> >> > /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 a9b39490/rbd_data.12483d3ba0794b.00000000000037b3/head//2 expected clone fee49490/rbd_data.12483d3ba0794b.000000000000522f/141//2
>> >> >> >> >> >> > /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 bac19490/rbd_data.1238e82ae8944a.000000000000032e/head//2 expected clone a9b39490/rbd_data.12483d3ba0794b.00000000000037b3/141//2
>> >> >> >> >> >> > /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 98519490/rbd_data.123e9c2ae8944a.0000000000000807/head//2 expected clone bac19490/rbd_data.1238e82ae8944a.000000000000032e/141//2
>> >> >> >> >> >> > /var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 c3c09490/rbd_data.1238e82ae8944a.0000000000000c2b/head//2 expected clone 98519490/rbd_data.123e9c2ae8944a.0000000000000807/141//2
>> >> >> >> >> >> > /var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 28809490/rbd_data.edea7460fe42b.00000000000001d9/head//2 expected clone c3c09490/rbd_data.1238e82ae8944a.0000000000000c2b/141//2
>> >> >> >> >> >> > /var/log/ceph/ceph-osd.56.log:58:2015-08-18 07:26:37.036432 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 e1509490/rbd_data.1423897545e146.00000000000009a6/head//2 expected clone 28809490/rbd_data.edea7460fe42b.00000000000001d9/141//2
>> >> >> >> >> >> > /var/log/ceph/ceph-osd.56.log:59:2015-08-18 07:26:38.548765 7f94663b3700 -1 log_channel(cluster) log [ERR] : 2.490 deep-scrub 17 errors
>> >> >> >> >> >> >
>> >> >> >> >> >> > So, how can I solve the "expected clone" situation by hand?
>> >> >> >> >> >> > Thanks in advance!
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
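
As a footnote to the procedure Igor describes in the quoted thread
(matching the affected rbd_data chunks from the logs back to images via
rbd info), that lookup can be scripted roughly along these lines. This is
an untested sketch: the pool name "rbd" is a placeholder for the real base
pool, and the two prefixes are taken from the snapdir objects Sam
identified above.

# Rough sketch only: map rbd_data.<prefix> names from the [ERR] lines back
# to the RBD images that own them, via the block_name_prefix in 'rbd info'.
pool="rbd"                                  # placeholder -- use the real base pool
prefixes="eb5f22eb141f2 e846e25a70bf7"      # prefixes from the two snapdir objects

for img in $(rbd -p "$pool" ls); do
    bnp=$(rbd -p "$pool" info "$img" | awk '/block_name_prefix/ {print $2}')
    for p in $prefixes; do
        if [ "$bnp" = "rbd_data.$p" ]; then
            echo "rbd_data.$p belongs to image: $pool/$img"
        fi
    done
done

Once the owning images are known, 'rbd snap ls <pool>/<image>' shows which
snapshots are still pinning those snapdir objects; per Sam's advice above,
they are probably best left in place until the spurious clone metadata can
be cleaned up.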