There may be a problem with your disk. Did you check the syslog or the
dmesg log? From the code, it returns 'read_error' only when the read
returns EIO, so I suspect your disk has a sector error.

Stuart Longland <stuartl@xxxxxxxxxxxxxxxxxx> wrote on Sat, 18 May 2019 at 9:43 AM:
>
> On 18/5/19 11:34 am, huang jun wrote:
> > Stuart Longland <stuartl@xxxxxxxxxxxxxxxxxx> wrote on Sat, 18 May 2019 at 9:26 AM:
> >>
> >> On 16/5/19 8:55 pm, Stuart Longland wrote:
> >>> As this is Bluestore, it's not clear what I should do to resolve that,
> >>> so I thought I'd "RTFM" before asking here:
> >>> http://docs.ceph.com/docs/luminous/rados/operations/pg-repair/
> >>>
> >>> Maybe there's a secret hand-shake my web browser doesn't know about or
> >>> maybe the page is written in invisible ink, but that page appears blank
> >>> to me.
> >>
> >> Does anyone know why that page shows up blank? I still have a placement
> >> group that is "inconsistent". (A different one this time, but still!)
> >>
> > There may be something wrong with ceph.com; it's a blank page for me, too.
>
> Ahh okay, so I'm not going crazy … yet. :-)
>
> >> Some pages I've researched suggest going to the OSD's mount-point and
> >> moving the offending object away; however, Linux kernel 4.19.17 does not
> >> have a 'bluestore' driver, so I can't mount the file system to get at
> >> the offending object.
> >>
> >> Running `ceph pg repair <ID>` tells me it has "instructed" the OSD to do
> >> a repair. The OSD shows nothing at all in its logs even acknowledging
> >> the command, and the problem persists.
> >> The only log messages I have of the issue are from yesterday:
> >>
> >>> 2019-05-17 05:59:53.170552 7f009b0be700 -1 log_channel(cluster) log [ERR] : 7.1a shard 3 soid 7:581d78de:::rbd_data.b48c7238e1f29.0000000000001b34:head : candidate had a read error
> >>> 2019-05-17 07:07:20.723999 7f009b0be700 -1 log_channel(cluster) log [ERR] : 7.1a shard 3 soid 7:5b335293:::rbd_data.8c9e1238e1f29.0000000000001438:head : candidate had a read error
> >>> 2019-05-17 07:29:16.537539 7f009b0be700 -1 log_channel(cluster) log [ERR] : 7.1a deep-scrub 0 missing, 2 inconsistent objects
> >>> 2019-05-17 07:29:16.537557 7f009b0be700 -1 log_channel(cluster) log [ERR] : 7.1a deep-scrub 2 errors
> >>
> >> … not from just now when I issued the command. Why is my `ceph pg
> >> repair` command being ignored?
> >
> > 'ceph pg repair' will have the PG scrub and repair the inconsistency.
> > Do you still see these warning messages after 'pg repair'?
>
> Yes, I've been running `ceph pg repair 7.1a` repeatedly for the past 4
> hours. No new log messages, and still `ceph health detail` shows this:
>
> > carbon ~ # ceph pg repair 7.1a
> > instructing pg 7.1a on osd.2 to repair
> > carbon ~ # ceph health detail
> > HEALTH_ERR 2 scrub errors; Possible data damage: 1 pg inconsistent
> > OSD_SCRUB_ERRORS 2 scrub errors
> > PG_DAMAGED Possible data damage: 1 pg inconsistent
> >     pg 7.1a is active+clean+inconsistent, acting [2,3]
>
> I've also tried `ceph pg deep-scrub 7.1a` to no effect.
>
> I may shut the cluster down later to do some power infrastructure work
> (I need to add a new power distribution box to power two new nodes) and
> possibly even install a new 48-port Ethernet switch, but right now I'd
> like to try and get my storage cluster back to health.
>
> Regards,
> --
> Stuart Longland (aka Redhatter, VK4MSL)
>
> I haven't lost my mind...
> ...it's backed up on a tape somewhere.

-- 
Thank you!
HuangJun

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
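The disk check suggested at the top of the thread could be sketched roughly as below. This is only an illustrative example: `/dev/sdX` is a placeholder for whatever device backs the suspect OSD (osd.3 here), and the grep patterns are just common kernel I/O-error signatures, not an exhaustive list.

```shell
# Sketch: look for read errors from the OSD's backing disk.
# /dev/sdX is a placeholder -- substitute the device backing osd.3.

# Filter typical kernel I/O-error signatures out of a log stream
# (reads stdin when no file argument is given).
io_error_lines() {
  grep -E 'I/O error|blk_update_request|Medium Error' "$@"
}

# Usage (needs root):
#   dmesg | io_error_lines
#   io_error_lines /var/log/syslog
#   smartctl -a /dev/sdX   # then check reallocated/pending sector counts
```

If dmesg or SMART shows pending/reallocated sectors on that device, the 'candidate had a read error' lines are consistent with a failing disk rather than a Ceph-level problem.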
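For inspecting what scrub actually flagged before (or after) repairing, a rough sketch of the usual workflow is below. It assumes a reachable cluster with the `rados` and `ceph` CLIs on PATH; the grep/sed helper is only a quick assumption-laden way to pull object names out of the JSON, not a robust parser.

```shell
# Sketch: inspect what deep-scrub flagged in PG 7.1a before repairing.
# Assumes the rados/ceph CLIs and a running cluster.
#
#   rados list-inconsistent-obj 7.1a --format=json-pretty
#   ceph pg deep-scrub 7.1a   # refresh scrub results
#   ceph pg repair 7.1a
#   ceph health detail        # check whether the PG is still inconsistent

# Helper: crudely pull object names out of list-inconsistent-obj JSON.
inconsistent_names() {
  grep -o '"name": *"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}
```

For example, `rados list-inconsistent-obj 7.1a --format=json-pretty | inconsistent_names` would print the affected object names, which should match the `rbd_data.*` soids in the scrub errors above.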