Hi Martin,

You are perfectly right. I had checked the pg number earlier... and found the host. I did dmesg on the host... one of the drives is already reporting errors, with this log:

sd 0:0:2:0: [sdc] Unhandled sense code
sd 0:0:2:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 0:0:2:0: [sdc] Sense Key : Hardware Error [current]
sd 0:0:2:0: [sdc] Add. Sense: Internal target failure
sd 0:0:2:0: [sdc] CDB: Read(16): 88 00 00 00 00 03 80 01 18 20 00 00 00 08 00 00
XFS (sdc): I/O error occurred: meta-data dev sdc block 0x380011820 ("xfs_trans_read_buf") error 121 buf count 4096
XFS (sdc): I/O error occurred: meta-data dev sdc block 0x380011820 ("xfs_trans_read_buf") error 121 buf count 4096
XFS (sdc): I/O error occurred: meta-data dev sdc block 0x380011820 ("xfs_trans_read_buf") error 121 buf count 4096
XFS (sdc): I/O error occurred: meta-data dev sdc block 0x380011820 ("xfs_trans_read_buf") error 121 buf count 4096
sd 0:0:2:0: [sdc] Unhandled sense code
sd 0:0:2:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 0:0:2:0: [sdc] Sense Key : Hardware Error [current]
sd 0:0:2:0: [sdc] Add. Sense: Internal target failure
sd 0:0:2:0: [sdc] CDB: Read(16): 88 00 00 00 00 03 80 01 18 20 00 00 00 08 00 00

Then I checked the consistency of the drive on Linux and it was OK (a sketch of that kind of check is included after the quoted thread below). Then I went back to Ceph and did the following:

# ceph health details | more
HEALTH_ERR 1 pgs inconsistent; recovery recovering 5 o/s, 4307B/s; 1 scrub errors
pg 1.73c is active+clean+inconsistent, acting [38,68]
recovery recovering 5 o/s, 4307B/s
1 scrub errors

# ceph pg repair 1.73c
instructing pg 1.73c on osd.38 to repair

# ceph -w
   health HEALTH_ERR 1 pgs inconsistent; 1 pgs stuck unclean; recovery 1/4240325 degraded (0.000%); 1 scrub errors
   monmap e1: 3 mons at {a=172.16.0.25:6789/0,b=172.16.0.24:6789/0,c=172.16.0.27:6789/0}, election epoch 38, quorum 0,1,2 a,b,c
   osdmap e1020: 96 osds: 96 up, 96 in
   pgmap v10240: 12416 pgs: 12415 active+clean, 1 active+inconsistent; 10738 MB data, 1009 GB used, 674 TB / 675 TB avail; 1/4240325 degraded (0.000%)
   mdsmap e35: 1/1/1 up {0=b=up:active}, 1 up:standby

2013-02-22 14:04:36.338239 osd.38 [ERR] 1.73c missing primary copy of 9d7a673c/100001b306c.00000000/head//1, unfound

Summary: the pg won't repair... what do you suggest?

Regards,
Femi.

On Fri, Feb 22, 2013 at 1:26 PM, Martin B Nielsen <martin@xxxxxxxxxxx> wrote:
> Hi Femi,
>
> I just had a few of those as well - turned out it was a disk going bad
> and it eventually died ~12h after those turned up.
>
> While it was ongoing I fixed it by first finding the pg in question with:
>
> ceph pg dump | grep inconsistent
>
> You should get a pg id then; I then did a deep scrub of it:
>
> ceph pg deep-scrub <pg_id>
>
> I watched the logs and found that it was inconsistent. I checked dmesg
> and syslog and found that a disk had reported a bad block via SMART. I
> continued by repairing it with:
>
> ceph pg repair <pg_id>
>
> I verified with another deep-scrub afterwards.
>
> More info here: http://eu.ceph.com/docs.raw/ref/wip-3072/control/#pg-subsystem
>
> /Martin
>
> On Fri, Feb 22, 2013 at 1:18 PM, femi anjorin <femi.anjorin@xxxxxxxxx> wrote:
>> Hi,
>>
>> Please, how should I solve this error?
>>
>> # ceph health
>> HEALTH_ERR 1 pgs inconsistent, 1 scrub errors
>>
>> I just want to take the cluster back to the clean state.
>>
>> Regards.
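
For reference, a minimal sketch of the kind of read-only drive check mentioned above ("checked the consistency of the drive on Linux"). The thread does not show the exact commands Femi ran, so this is only illustrative; it assumes smartmontools and xfsprogs are installed and that /dev/sdc (the device from the dmesg output) can be taken offline for the dry run:

# smartctl -H /dev/sdc     # overall SMART health self-assessment (PASSED/FAILED)
# smartctl -a /dev/sdc     # full SMART attribute table and the device's error log
# umount /dev/sdc          # xfs_repair needs the filesystem unmounted (i.e. the OSD on it stopped)
# xfs_repair -n /dev/sdc   # "no modify" mode: reports XFS problems without changing anything

Even if such checks come back clean, the "Hardware Error / Internal target failure" sense data in the kernel log points at the drive itself, which matches Martin's experience of the disk dying shortly after the errors appeared.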
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com