Hi Martin,

You are perfectly right. I had checked the pg number earlier... and found the host. I did dmesg on the host... one of the drives is already reporting errors, with this log:

sd 0:0:2:0: [sdc] Unhandled sense code
sd 0:0:2:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 0:0:2:0: [sdc] Sense Key : Hardware Error [current]
sd 0:0:2:0: [sdc] Add. Sense: Internal target failure
sd 0:0:2:0: [sdc] CDB: Read(16): 88 00 00 00 00 03 80 01 18 20 00 00 00 08 00 00
XFS (sdc): I/O error occurred: meta-data dev sdc block 0x380011820 ("xfs_trans_read_buf") error 121 buf count 4096
XFS (sdc): I/O error occurred: meta-data dev sdc block 0x380011820 ("xfs_trans_read_buf") error 121 buf count 4096
XFS (sdc): I/O error occurred: meta-data dev sdc block 0x380011820 ("xfs_trans_read_buf") error 121 buf count 4096
XFS (sdc): I/O error occurred: meta-data dev sdc block 0x380011820 ("xfs_trans_read_buf") error 121 buf count 4096
sd 0:0:2:0: [sdc] Unhandled sense code
sd 0:0:2:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 0:0:2:0: [sdc] Sense Key : Hardware Error [current]
sd 0:0:2:0: [sdc] Add. Sense: Internal target failure
sd 0:0:2:0: [sdc] CDB: Read(16): 88 00 00 00 00 03 80 01 18 20 00 00 00 08 00 00

Then I checked the consistency of the drive on Linux and it was OK (a sketch of that kind of check is included after the quoted thread below). Then I went back to Ceph and did the following:

# ceph health details | more
HEALTH_ERR 1 pgs inconsistent; recovery recovering 5 o/s, 4307B/s; 1 scrub errors
pg 1.73c is active+clean+inconsistent, acting [38,68]
recovery recovering 5 o/s, 4307B/s
1 scrub errors

# ceph pg repair 1.73c
instructing pg 1.73c on osd.38 to repair

# ceph -w
   health HEALTH_ERR 1 pgs inconsistent; 1 pgs stuck unclean; recovery 1/4240325 degraded (0.000%); 1 scrub errors
   monmap e1: 3 mons at {a=172.16.0.25:6789/0,b=172.16.0.24:6789/0,c=172.16.0.27:6789/0}, election epoch 38, quorum 0,1,2 a,b,c
   osdmap e1020: 96 osds: 96 up, 96 in
   pgmap v10240: 12416 pgs: 12415 active+clean, 1 active+inconsistent; 10738 MB data, 1009 GB used, 674 TB / 675 TB avail; 1/4240325 degraded (0.000%)
   mdsmap e35: 1/1/1 up {0=b=up:active}, 1 up:standby

2013-02-22 14:04:36.338239 osd.38 [ERR] 1.73c missing primary copy of 9d7a673c/100001b306c.00000000/head//1, unfound

Summary: the pg won't repair... what do you suggest?

Regards,
Femi.

On Fri, Feb 22, 2013 at 1:26 PM, Martin B Nielsen <martin@xxxxxxxxxxx> wrote:
> Hi Femi,
>
> I just had a few of those as well - turned out it was a disk going bad
> and it eventually died ~12h after those turned up.
>
> While it was ongoing I fixed it by first finding the pg in question with:
>
> ceph pg dump | grep inconsistent
>
> You should get a pg id then; I then did a deep scrub of it:
>
> ceph pg deep-scrub <pg_id>
>
> I watched the logs and found that it was inconsistent. I checked dmesg
> and syslog and found that a disk had reported a bad block via SMART. I
> continued by repairing it with:
>
> ceph pg repair <pg_id>
>
> I verified with another deep-scrub afterwards.
>
> More info here: http://eu.ceph.com/docs.raw/ref/wip-3072/control/#pg-subsystem
>
> /Martin
>
> On Fri, Feb 22, 2013 at 1:18 PM, femi anjorin <femi.anjorin@xxxxxxxxx> wrote:
>> Hi,
>>
>> Please, how should I solve this error?
>>
>> # ceph health
>> HEALTH_ERR 1 pgs inconsistent, 1 scrub errors
>>
>> I just want to take the cluster back to the clean state.
>>
>> Regards.
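
For reference, a minimal sketch of the kind of read-only drive check mentioned above ("checked the consistency of the drive on Linux"). The thread does not show the exact commands Femi ran, so this is only illustrative; it assumes smartmontools and xfsprogs are installed and that /dev/sdc (the device from the dmesg output) can be taken offline for the dry run:

# smartctl -H /dev/sdc     # overall SMART health self-assessment (PASSED/FAILED)
# smartctl -a /dev/sdc     # full SMART attribute table and the device's error log
# umount /dev/sdc          # xfs_repair needs the filesystem unmounted (i.e. the OSD on it stopped)
# xfs_repair -n /dev/sdc   # "no modify" mode: reports XFS problems without changing anything

Even if such checks come back clean, the "Hardware Error / Internal target failure" sense data in the kernel log points at the drive itself, which matches Martin's experience of the disk dying shortly after the errors appeared.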
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com