Re: Repair inconsistent pgs..

Samuel Just <sjust@xxxxxxxxxx> · Thu, 20 Aug 2015 15:36:32 -0700

Actually, now that I think about it, you probably didn't remove the
images for 3fac9490/rbd_data.eb5f22eb141f2.00000000000004ba/snapdir//2
and 22ca30c4/rbd_data.e846e25a70bf7.0000000000000307/snapdir//2, but
other images.  That's why the scrub errors went down briefly: those
objects, which were fine, went away.  You might want to export
and reimport those two images into new images, but leave the old ones
alone until you can clean up the on-disk state (image and snapshots)
and clear the scrub errors.  You probably don't want to read the
snapshots for those images either.  Everything else is, I think,
harmless.
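
For the export/reimport step, a minimal sketch of the two rbd
invocations might look like the following.  The pool name, image names,
and dump path are placeholders (the thread never names the affected
images), and the helper only builds the command lines; it does not run
them.

```python
import shlex


def export_reimport_commands(pool, image, new_image, dump_path):
    """Return the rbd export and rbd import invocations as argv lists.

    Nothing is executed here; the old image is never modified or removed.
    """
    src = f"{pool}/{image}"
    dst = f"{pool}/{new_image}"
    return [
        ["rbd", "export", src, dump_path],   # dump the old image to a file
        ["rbd", "import", dump_path, dst],   # load the file into a new image
    ]


# Placeholder names, for illustration only.
for argv in export_reimport_commands(
    "rbd", "old-image", "old-image-copy", "/tmp/old-image.img"
):
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing the commands rather than running them keeps this a dry run, so
each image can be handled by hand while the old one is left alone.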

The ceph-objectstore-tool feature would probably not be too hard,
actually.  Each head/snapdir object has two attrs, '_' and 'snapset',
which contain encoded representations of object_info_t and SnapSet
(both can be found in src/osd/osd_types.h).  The attrs are possibly
stored in leveldb -- that's why you want to modify the
ceph-objectstore-tool and use its interfaces rather than mucking about
with the files directly.  SnapSet has a set of clones and related
metadata -- you want to read the SnapSet attr off disk and commit a
transaction writing out a new version with that clone removed.

I'd start by cloning the repo, starting a vstart cluster locally, and
reproducing the issue.  Next, get familiar with using
ceph-objectstore-tool on the osds in that vstart cluster.  A good
first change would be creating a ceph-objectstore-tool op that lets
you dump JSON for the object_info_t and SnapSet (both types have
format() methods which make that easy) on an object to stdout so you
can confirm what's actually there.  #ceph-devel on OFTC or the
ceph-devel mailing list would be the right place to ask questions.
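
To make the SnapSet change concrete, here is a toy model in Python.  It
is not Ceph code: the field names are meant to mirror SnapSet in
src/osd/osd_types.h but are assumptions here, as are the sample values
and the remove_clone helper.  The real tool would have to decode the
'snapset' attr, make the equivalent change, re-encode it, and commit it
in a transaction.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SnapSetModel:
    """Simplified stand-in for SnapSet; field names are assumptions."""
    seq: int
    head_exists: bool
    snaps: list = field(default_factory=list)
    clones: list = field(default_factory=list)
    clone_overlap: dict = field(default_factory=dict)
    clone_size: dict = field(default_factory=dict)


def remove_clone(snapset, clone_id):
    """Return a copy of snapset with clone_id and its metadata dropped."""
    if clone_id not in snapset.clones:
        raise KeyError(f"clone {clone_id} is not listed in the snapset")
    return SnapSetModel(
        seq=snapset.seq,
        head_exists=snapset.head_exists,
        snaps=list(snapset.snaps),
        clones=[c for c in snapset.clones if c != clone_id],
        clone_overlap={k: v for k, v in snapset.clone_overlap.items()
                       if k != clone_id},
        clone_size={k: v for k, v in snapset.clone_size.items()
                    if k != clone_id},
    )


# Made-up values: a snapdir object whose metadata lists a clone (141)
# that does not exist on disk.
before = SnapSetModel(
    seq=141,
    head_exists=False,
    snaps=[141, 120],
    clones=[120, 141],
    clone_overlap={120: [[0, 4194304]], 141: []},
    clone_size={120: 4194304, 141: 4194304},
)
after = remove_clone(before, 141)

# Dump the result as JSON, in the spirit of the proposed "dump" op.
print(json.dumps(asdict(after), indent=2, sort_keys=True))
```

The sketch only drops the entries keyed by the clone id.  Whether the
overlap data of neighbouring clones also needs adjusting is left open
here.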

Otherwise, it'll probably get done in the next few weeks.
-Sam

On Thu, Aug 20, 2015 at 3:10 PM, Voloshanenko Igor
<igor.voloshanenko@xxxxxxxxx> wrote:
> Thank you, Sam!
> I also noticed these linked errors during scrub...
>
> Now it all looks reasonable!
>
> So we will wait for the bug to be closed.
>
> Do you need any help with it?
>
> I mean, I can help with coding, testing, etc.
>
> 2015-08-21 0:52 GMT+03:00 Samuel Just <sjust@xxxxxxxxxx>:
>>
>> Ah, this is kind of silly.  I think you don't have 37 errors, but 2
>> errors.  pg 2.490 object
>> 3fac9490/rbd_data.eb5f22eb141f2.00000000000004ba/snapdir//2 is missing
>> snap 141.  If you look at the objects after that in the log:
>>
>> 2015-08-20 20:15:44.865670 osd.19 10.12.2.6:6838/1861727 298 : cluster [ERR] repair 2.490 68c89490/rbd_data.16796a3d1b58ba.0000000000000047/head//2 expected clone 2d7b9490/rbd_data.18f92c3d1b58ba.0000000000006167/141//2
>> 2015-08-20 20:15:44.865817 osd.19 10.12.2.6:6838/1861727 299 : cluster [ERR] repair 2.490 ded49490/rbd_data.11a25c7934d3d4.0000000000008a8a/head//2 expected clone 68c89490/rbd_data.16796a3d1b58ba.0000000000000047/141//2
>>
>> The clone from the second line matches the head object from the
>> previous line, and they have the same clone id.  I *think* that the
>> first error is real, and the subsequent ones are just scrub being
>> dumb.  Same deal with pg 2.c4.  I just opened
>> http://tracker.ceph.com/issues/12738.
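
The chaining described above can be checked mechanically.  A small
sketch, assuming each error line has the form "<head object> expected
clone <clone object>" as in the two lines quoted above (rejoined onto
single lines here); split_errors is a made-up name for illustration.

```python
import re

# One object reference: <hash>/<name>/<snap>//<pool>
OBJ = (r"(?P<{p}hash>[0-9a-f]+)/(?P<{p}name>[^/\s]+)"
       r"/(?P<{p}snap>[^/\s]+)//(?P<{p}pool>\d+)")
LINE = re.compile(
    OBJ.format(p="h_") + r"\s+expected\s+clone\s+" + OBJ.format(p="c_")
)


def split_errors(lines):
    """Split 'expected clone' errors into unexplained and follow-on ones.

    An error is a follow-on if the object it expects a clone of is the
    head object named in another error line of the same input.
    """
    parsed = []
    for line in lines:
        m = LINE.search(line)
        if m:
            head = (m["h_hash"], m["h_name"])
            clone_of = (m["c_hash"], m["c_name"])
            parsed.append((head, clone_of, m["c_snap"]))
    heads = {head for head, _, _ in parsed}
    unexplained = [e for e in parsed if e[1] not in heads]
    follow_on = [e for e in parsed if e[1] in heads]
    return unexplained, follow_on


LOG_LINES = [
    "2015-08-20 20:15:44.865670 osd.19 10.12.2.6:6838/1861727 298 : cluster "
    "[ERR] repair 2.490 "
    "68c89490/rbd_data.16796a3d1b58ba.0000000000000047/head//2 expected "
    "clone 2d7b9490/rbd_data.18f92c3d1b58ba.0000000000006167/141//2",
    "2015-08-20 20:15:44.865817 osd.19 10.12.2.6:6838/1861727 299 : cluster "
    "[ERR] repair 2.490 "
    "ded49490/rbd_data.11a25c7934d3d4.0000000000008a8a/head//2 expected "
    "clone 68c89490/rbd_data.16796a3d1b58ba.0000000000000047/141//2",
]

first, follow_on = split_errors(LOG_LINES)
print("not explained by another line:", [e[0][1] for e in first])
print("follow-on errors:", [e[0][1] for e in follow_on])
```

With only these two lines as input, the first one is reported as
unexplained.  Fed the whole log, its expected clone would presumably
match the head of an earlier line, and so on back to the first real
error.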
>>
>> The original problem is that
>> 3fac9490/rbd_data.eb5f22eb141f2.00000000000004ba/snapdir//2 and
>> 22ca30c4/rbd_data.e846e25a70bf7.0000000000000307/snapdir//2 are both
>> missing a clone.  Not sure how that happened, my money is on a
>> cache/tiering evict racing with a snap trim.  If you have any logging
>> or relevant information from when that happened, you should open a
>> bug.  The 'snapdir' in the two object names indicates that the head
>> object has actually been deleted (which makes sense if you moved the
>> image to a new image and deleted the old one) and is only being kept
>> around since there are live snapshots.  I suggest you leave the
>> snapshots for those images alone for the time being -- removing them
>> might cause the osd to crash trying to clean up the weird on-disk
>> state.  Other than the leaked space from those two image snapshots and
>> the annoying spurious scrub errors, I think no actual corruption is
>> going on though.  I created a tracker ticket for a feature that would
>> let ceph-objectstore-tool remove the spurious clone from the
>> head/snapdir metadata.
>>
>> Am I right that you haven't actually seen any osd crashes or user
>> visible corruption (except possibly on snapshots of those two images)?
>> -Sam
>>
>> On Thu, Aug 20, 2015 at 10:07 AM, Voloshanenko Igor
>> <igor.voloshanenko@xxxxxxxxx> wrote:
>> > Inktank:
>> >
>> > https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding%20FAQ.pdf
>> >
>> > Mail-list:
>> > https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg18338.html
>> >
>> > 2015-08-20 20:06 GMT+03:00 Samuel Just <sjust@xxxxxxxxxx>:
>> >>
>> >> Which docs?
>> >> -Sam
>> >>
>> >> On Thu, Aug 20, 2015 at 9:57 AM, Voloshanenko Igor
>> >> <igor.voloshanenko@xxxxxxxxx> wrote:
>> >> > Not yet. I will create one.
>> >> > But according to the mailing lists and the Inktank docs, this is
>> >> > expected behaviour when the cache is enabled.
>> >> >
>> >> > 2015-08-20 19:56 GMT+03:00 Samuel Just <sjust@xxxxxxxxxx>:
>> >> >>
>> >> >> Is there a bug for this in the tracker?
>> >> >> -Sam
>> >> >>
>> >> >> On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor
>> >> >> <igor.voloshanenko@xxxxxxxxx> wrote:
>> >> >> > The issue is that in forward mode fstrim doesn't work properly, and
>> >> >> > when we take a snapshot the data is not properly updated in the
>> >> >> > cache layer, so the client (ceph) sees a damaged snap, as headers
>> >> >> > are requested from the cache layer.
>> >> >> >
>> >> >> > 2015-08-20 19:53 GMT+03:00 Samuel Just <sjust@xxxxxxxxxx>:
>> >> >> >>
>> >> >> >> What was the issue?
>> >> >> >> -Sam
>> >> >> >>
>> >> >> >> On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor
>> >> >> >> <igor.voloshanenko@xxxxxxxxx> wrote:
>> >> >> >> > Samuel, we turned off the cache layer a few hours ago...
>> >> >> >> > I will post ceph.log in a few minutes.
>> >> >> >> >
>> >> >> >> > As for the snap, we found the issue; it was connected with the
>> >> >> >> > cache tier.
>> >> >> >> >
>> >> >> >> > 2015-08-20 19:23 GMT+03:00 Samuel Just <sjust@xxxxxxxxxx>:
>> >> >> >> >>
>> >> >> >> >> Ok, you appear to be using a replicated cache tier in front of a
>> >> >> >> >> replicated base tier.  Please scrub both inconsistent pgs and
>> >> >> >> >> post the ceph.log from before when you started the scrub until
>> >> >> >> >> after.  Also, what command are you using to take snapshots?
>> >> >> >> >> -Sam
>> >> >> >> >>
>> >> >> >> >> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor
>> >> >> >> >> <igor.voloshanenko@xxxxxxxxx> wrote:
>> >> >> >> >> > Hi Samuel, we tried to fix it in a tricky way.
>> >> >> >> >> >
>> >> >> >> >> > We checked all the affected rbd_data chunks from the OSD logs, then
>> >> >> >> >> > queried rbd info to find which rbd contains the bad rbd_data. After
>> >> >> >> >> > that we mounted this rbd as rbd0, created an empty rbd, and dd'd all
>> >> >> >> >> > the data from the bad volume to the new one.
>> >> >> >> >> >
>> >> >> >> >> > But after that the scrub errors kept growing... There were 15 errors,
>> >> >> >> >> > now there are 35... We also tried to out the OSD which was the lead,
>> >> >> >> >> > but after rebalancing these 2 pgs still have 35 scrub errors...
>> >> >> >> >> >
>> >> >> >> >> > ceph osd getmap -o <outfile> - attached
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> > 2015-08-18 18:48 GMT+03:00 Samuel Just <sjust@xxxxxxxxxx>:
>> >> >> >> >> >>
>> >> >> >> >> >> Is the number of inconsistent objects growing?  Can you attach
>> >> >> >> >> >> the whole ceph.log from the 6 hours before and after the snippet
>> >> >> >> >> >> you linked above?  Are you using cache/tiering?  Can you attach
>> >> >> >> >> >> the osdmap (ceph osd getmap -o <outfile>)?
>> >> >> >> >> >> -Sam
>> >> >> >> >> >>
>> >> >> >> >> >> On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor
>> >> >> >> >> >> <igor.voloshanenko@xxxxxxxxx> wrote:
>> >> >> >> >> >> > ceph - 0.94.2
>> >> >> >> >> >> > It happened during rebalancing.
>> >> >> >> >> >> >
>> >> >> >> >> >> > I also thought that some OSD was missing a copy, but it looks
>> >> >> >> >> >> > like all of them are missing it...
>> >> >> >> >> >> > So, any advice on which direction I should go in?
>> >> >> >> >> >> >
>> >> >> >> >> >> > 2015-08-18 14:14 GMT+03:00 Gregory Farnum
>> >> >> >> >> >> > <gfarnum@xxxxxxxxxx>:
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> From a quick peek it looks like some of the OSDs are missing
>> >> >> >> >> >> >> clones of objects. I'm not sure how that could happen and I'd
>> >> >> >> >> >> >> expect the pg repair to handle that but if it's not there's
>> >> >> >> >> >> >> probably something wrong; what version of Ceph are you
>> >> >> >> >> >> >> running? Sam, is this something you've seen, a new bug, or
>> >> >> >> >> >> >> some kind of config issue?
>> >> >> >> >> >> >> -Greg
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor
>> >> >> >> >> >> >> <igor.voloshanenko@xxxxxxxxx> wrote:
>> >> >> >> >> >> >> > Hi all, on our production cluster, due to heavy rebalancing (((
>> >> >> >> >> >> >> > we have 2 pgs in an inconsistent state...
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > root@temp:~# ceph health detail | grep inc
>> >> >> >> >> >> >> > HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
>> >> >> >> >> >> >> > pg 2.490 is active+clean+inconsistent, acting [56,15,29]
>> >> >> >> >> >> >> > pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > From OSD logs, after recovery attempt:
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read i; do ceph pg repair ${i} ; done
>> >> >> >> >> >> >> > dumped all in format plain
>> >> >> >> >> >> >> > instructing pg 2.490 on osd.56 to repair
>> >> >> >> >> >> >> > instructing pg 2.c4 on osd.56 to repair
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 f5759490/rbd_data.1631755377d7e.00000000000004da/head//2 expected clone 90c59490/rbd_data.eb486436f2beb.0000000000007a65/141//2
>> >> >> >> >> >> >> > /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 fee49490/rbd_data.12483d3ba0794b.000000000000522f/head//2 expected clone f5759490/rbd_data.1631755377d7e.00000000000004da/141//2
>> >> >> >> >> >> >> > /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 a9b39490/rbd_data.12483d3ba0794b.00000000000037b3/head//2 expected clone fee49490/rbd_data.12483d3ba0794b.000000000000522f/141//2
>> >> >> >> >> >> >> > /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 bac19490/rbd_data.1238e82ae8944a.000000000000032e/head//2 expected clone a9b39490/rbd_data.12483d3ba0794b.00000000000037b3/141//2
>> >> >> >> >> >> >> > /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 98519490/rbd_data.123e9c2ae8944a.0000000000000807/head//2 expected clone bac19490/rbd_data.1238e82ae8944a.000000000000032e/141//2
>> >> >> >> >> >> >> > /var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 c3c09490/rbd_data.1238e82ae8944a.0000000000000c2b/head//2 expected clone 98519490/rbd_data.123e9c2ae8944a.0000000000000807/141//2
>> >> >> >> >> >> >> > /var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 28809490/rbd_data.edea7460fe42b.00000000000001d9/head//2 expected clone c3c09490/rbd_data.1238e82ae8944a.0000000000000c2b/141//2
>> >> >> >> >> >> >> > /var/log/ceph/ceph-osd.56.log:58:2015-08-18 07:26:37.036432 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 e1509490/rbd_data.1423897545e146.00000000000009a6/head//2 expected clone 28809490/rbd_data.edea7460fe42b.00000000000001d9/141//2
>> >> >> >> >> >> >> > /var/log/ceph/ceph-osd.56.log:59:2015-08-18 07:26:38.548765 7f94663b3700 -1 log_channel(cluster) log [ERR] : 2.490 deep-scrub 17 errors
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > So, how can I solve the "expected clone" situation by hand?
>> >> >> >> >> >> >> > Thanks in advance!
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > _______________________________________________
>> >> >> >> >> >> >> > ceph-users mailing list
>> >> >> >> >> >> >> > ceph-users@xxxxxxxxxxxxxx
>> >> >> >> >> >> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com