Hi,

Did you ever find an answer/solution, and/or work out why it happened? I'm sure
someone else will eventually run into this as well.

/Martin

On Fri, Feb 22, 2013 at 2:18 PM, femi anjorin <femi.anjorin@xxxxxxxxx> wrote:
> Hi Martin,
>
> You are perfectly right. I had checked the pg num earlier... and found the host.
>
> I did dmesg on the host... one of the drives is already reporting
> errors, with this log:
>
> sd 0:0:2:0: [sdc] Unhandled sense code
> sd 0:0:2:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 0:0:2:0: [sdc] Sense Key : Hardware Error [current]
> sd 0:0:2:0: [sdc] Add. Sense: Internal target failure
> sd 0:0:2:0: [sdc] CDB: Read(16): 88 00 00 00 00 03 80 01 18 20 00 00 00 08 00 00
> XFS (sdc): I/O error occurred: meta-data dev sdc block 0x380011820
> ("xfs_trans_read_buf") error 121 buf count 4096
> XFS (sdc): I/O error occurred: meta-data dev sdc block 0x380011820
> ("xfs_trans_read_buf") error 121 buf count 4096
> XFS (sdc): I/O error occurred: meta-data dev sdc block 0x380011820
> ("xfs_trans_read_buf") error 121 buf count 4096
> XFS (sdc): I/O error occurred: meta-data dev sdc block 0x380011820
> ("xfs_trans_read_buf") error 121 buf count 4096
> sd 0:0:2:0: [sdc] Unhandled sense code
> sd 0:0:2:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 0:0:2:0: [sdc] Sense Key : Hardware Error [current]
> sd 0:0:2:0: [sdc] Add. Sense: Internal target failure
> sd 0:0:2:0: [sdc] CDB: Read(16): 88 00 00 00 00 03 80 01 18 20 00 00 00 08 00 00
>
> Then I checked the consistency of the drive on Linux and it was OK...
>
> Then I went back to ceph and did the following:
>
> # ceph health details | more
> HEALTH_ERR 1 pgs inconsistent; recovery recovering 5 o/s, 4307B/s; 1 scrub errors
> pg 1.73c is active+clean+inconsistent, acting [38,68]
> recovery recovering 5 o/s, 4307B/s
> 1 scrub errors
>
> # ceph pg repair 1.73c
> instructing pg 1.73c on osd.38 to repair
>
> # ceph -w
>    health HEALTH_ERR 1 pgs inconsistent; 1 pgs stuck unclean; recovery
> 1/4240325 degraded (0.000%); 1 scrub errors
>    monmap e1: 3 mons at
> {a=172.16.0.25:6789/0,b=172.16.0.24:6789/0,c=172.16.0.27:6789/0},
> election epoch 38, quorum 0,1,2 a,b,c
>    osdmap e1020: 96 osds: 96 up, 96 in
>    pgmap v10240: 12416 pgs: 12415 active+clean, 1
> active+inconsistent; 10738 MB data, 1009 GB used, 674 TB / 675 TB
> avail; 1/4240325 degraded (0.000%)
>    mdsmap e35: 1/1/1 up {0=b=up:active}, 1 up:standby
>
> 2013-02-22 14:04:36.338239 osd.38 [ERR] 1.73c missing primary copy of
> 9d7a673c/100001b306c.00000000/head//1, unfound
>
> Summary: the pg won't repair... what do you suggest?
>
> Regards,
> Femi.
>
> On Fri, Feb 22, 2013 at 1:26 PM, Martin B Nielsen <martin@xxxxxxxxxxx> wrote:
>> Hi Femi,
>>
>> I just had a few of those as well - it turned out a disk was going bad,
>> and it eventually died ~12h after those errors turned up.
>>
>> While it was ongoing, I fixed it by first finding the pg in question with:
>>
>> ceph pg dump | grep inconsistent
>>
>> You should get a pg id then; I then did a deep scrub of it:
>>
>> ceph pg deep-scrub <pg_id>
>>
>> I watched the logs and found that it was inconsistent. I checked dmesg
>> and syslog and found the disk had reported a bad block via SMART. I
>> then repaired it with:
>>
>> ceph pg repair <pg_id>
>>
>> I verified with another deep-scrub afterwards.
>>
>> More info here: http://eu.ceph.com/docs.raw/ref/wip-3072/control/#pg-subsystem
>>
>> /Martin
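(For anyone who hits this later: the steps Martin describes above boil down to roughly the sketch below. This is only an outline, not an exact recipe - <pg_id> is a placeholder for the pg reported inconsistent (1.73c in this thread), the dmesg check is run on the host carrying the acting OSDs, and the grep pattern is just an example.)

    # find the pg(s) flagged inconsistent
    ceph health detail
    ceph pg dump | grep inconsistent

    # deep-scrub the affected pg and watch the cluster log while it runs
    ceph pg deep-scrub <pg_id>
    ceph -w

    # on the acting OSDs' host, look for the kind of hardware errors shown above
    dmesg | grep -iE 'hardware error|i/o error'

    # ask the primary OSD to repair the pg, then verify with another deep-scrub
    ceph pg repair <pg_id>
    ceph pg deep-scrub <pg_id>

    # if repair stalls on unfound objects, this gives more detail on recovery state
    ceph pg <pg_id> query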
>> On Fri, Feb 22, 2013 at 1:18 PM, femi anjorin <femi.anjorin@xxxxxxxxx> wrote:
>>> Hi,
>>>
>>> Please, how should I solve this error?
>>>
>>> # ceph health
>>> HEALTH_ERR 1 pgs inconsistent, 1 scrub errors
>>>
>>> I just want to take the cluster back to the clean state.
>>>
>>> Regards.
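(Since both reports above trace back to a dying drive rather than to Ceph itself, a few disk-side checks can confirm that. This is a sketch only: /dev/sdc is taken from Femi's dmesg output, smartctl and xfsprogs are assumed to be installed, and the read-only xfs_repair check should only be run with the OSD stopped and the filesystem unmounted.)

    # kernel log entries for the suspect device
    dmesg | grep -i sdc

    # overall SMART verdict, plus the attributes that usually precede a failure
    smartctl -H /dev/sdc
    smartctl -A /dev/sdc | grep -iE 'reallocated|pending|uncorrectable'

    # optionally run a short self-test, then re-check the self-test log
    smartctl -t short /dev/sdc
    smartctl -l selftest /dev/sdc

    # read-only XFS consistency check (OSD stopped, filesystem unmounted)
    xfs_repair -n /dev/sdc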