Probably common sense, but I was bitten by this once in a similar situation.
Cheers,
Martin
On Wed, Nov 13, 2013 at 12:59 AM, David Zafman <david.zafman@xxxxxxxxxxx> wrote:
Since the disk is failing and you have 2 other copies, I would take osd.0 down. This means that Ceph will not attempt to read from the bad disk, either for clients or to make another copy of the data:
***** I'm not sure about the exact syntax of this command for the version of Ceph you are running
ceph osd down 0
Then mark it “out”, which will immediately trigger recovery to create more copies of the data on the remaining OSDs.
ceph osd out 0
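The two steps above can be wrapped in a small script so they always run in the same order. This is a minimal Python sketch, assuming the ceph CLI is on PATH with admin credentials; the function name drain_osd and its dry_run flag are hypothetical, not part of Ceph. By default it only returns the commands instead of running them.

```python
import subprocess
from typing import List


def drain_osd(osd_id: int, dry_run: bool = True) -> List[List[str]]:
    """Mark an OSD down (so nothing reads from it), then out (so recovery
    re-creates its data on the remaining OSDs).

    Hypothetical helper: with dry_run=True the commands are only returned,
    not executed. With dry_run=False, check=True stops at the first
    command that exits non-zero.
    """
    cmds = [
        ["ceph", "osd", "down", str(osd_id)],
        ["ceph", "osd", "out", str(osd_id)],
    ]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    # Dry run: print the commands for osd.0 instead of executing them.
    for cmd in drain_osd(0):
        print(" ".join(cmd))
```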
You can then finish removing the OSD by following these instructions:
http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual
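As a rough sketch of the manual-removal steps on that page, the commands that remain once the OSD is out, recovery has finished and its ceph-osd daemon has been stopped are listed below. The command names follow the documentation of that era and may differ in other Ceph releases; stopping the daemon is left out because it depends on the init system, and removal_commands is a hypothetical helper name.

```python
from typing import List


def removal_commands(osd_id: int) -> List[List[str]]:
    """Return the remaining removal commands for an OSD that is already
    out, fully recovered, and whose ceph-osd daemon has been stopped.

    Hypothetical helper: it only builds the argument lists, it does not
    run them. Check each command against the docs for your Ceph release.
    """
    name = f"osd.{osd_id}"
    return [
        ["ceph", "osd", "crush", "remove", name],  # drop it from the CRUSH map
        ["ceph", "auth", "del", name],             # delete its authentication key
        ["ceph", "osd", "rm", str(osd_id)],        # remove it from the OSD map
    ]


if __name__ == "__main__":
    for cmd in removal_commands(0):
        print(" ".join(cmd))
```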
David Zafman
Senior Developer
http://www.inktank.com
On Nov 12, 2013, at 3:16 AM, Mihály Árva-Tóth <mihaly.arva-toth@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> Hello,
>
> I have 3 nodes, with 3 OSDs in each node. I'm using the .rgw.buckets pool with 3 replicas. One of my HDDs (osd.0) has just developed bad sectors: when I try to read an object directly from the OSD, I get an Input/output error. dmesg:
>
> [1214525.670065] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
> [1214525.670072] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
> [1214525.670100] sd 0:0:2:0: [sdc] Unhandled sense code
> [1214525.670104] sd 0:0:2:0: [sdc]
> [1214525.670107] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [1214525.670110] sd 0:0:2:0: [sdc]
> [1214525.670112] Sense Key : Medium Error [current]
> [1214525.670117] Info fld=0x60c8f21
> [1214525.670120] sd 0:0:2:0: [sdc]
> [1214525.670123] Add. Sense: Unrecovered read error
> [1214525.670126] sd 0:0:2:0: [sdc] CDB:
> [1214525.670128] Read(16): 88 00 00 00 00 00 06 0c 8f 20 00 00 00 08 00 00
>
> Okay, I know I need to replace the HDD.
>
> A fragment of the ceph -s output:
> pgmap v922039: 856 pgs: 855 active+clean, 1 active+clean+inconsistent;
>
> ceph pg dump | grep inconsistent
>
> 11.15d 25443 0 0 0 6185091790 3001 3001 active+clean+inconsistent 2013-11-06 02:30:45.23416.....
>
> ceph pg map 11.15d
>
> osdmap e1600 pg 11.15d (11.15d) -> up [0,8,3] acting [0,8,3]
>
> Neither pg repair nor deep-scrub can fix this issue. But if I understand correctly, the cluster should know that it cannot retrieve the object from osd.0 and needs to replicate it to another OSD, because there are no longer 3 working replicas.
>
> Thank you,
> Mihaly
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
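The diagnosis in the original question (ceph pg dump | grep inconsistent, then ceph pg map on the result) can be scripted. Below is a minimal Python sketch, assuming the plain-text output formats quoted in the thread; column layout varies between Ceph versions, and the function names are hypothetical. In the acting set, the first OSD is the PG's primary, which for 11.15d is osd.0, the failing disk.

```python
import re
from typing import Dict, List

# PG ids look like <pool>.<hex sequence>, e.g. 11.15d
_PGID_RE = re.compile(r"\d+\.[0-9a-f]+")
# Matches e.g. "... -> up [0,8,3] acting [0,8,3]"
_PG_MAP_RE = re.compile(r"up \[([\d,]*)\] acting \[([\d,]*)\]")


def inconsistent_pgs(pg_dump_text: str) -> List[str]:
    """Return ids of PGs whose state contains 'inconsistent', like
    `ceph pg dump | grep inconsistent`, assuming the PG id is the first
    whitespace-separated column. Header lines are skipped because their
    first column is not a PG id."""
    pgs = []
    for line in pg_dump_text.splitlines():
        fields = line.split()
        if not fields or not _PGID_RE.fullmatch(fields[0]):
            continue
        # States are '+'-joined flags, e.g. active+clean+inconsistent
        if any("inconsistent" in f.split("+") for f in fields[1:]):
            pgs.append(fields[0])
    return pgs


def _ids(s: str) -> List[int]:
    return [int(x) for x in s.split(",") if x]


def parse_pg_map(line: str) -> Dict[str, List[int]]:
    """Extract the up and acting OSD sets from one line of `ceph pg map`."""
    m = _PG_MAP_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognised pg map output: {line!r}")
    return {"up": _ids(m.group(1)), "acting": _ids(m.group(2))}


if __name__ == "__main__":
    dump = ("11.15d 25443 0 0 0 6185091790 3001 3001 "
            "active+clean+inconsistent 2013-11-06 02:30:45.23416")
    print(inconsistent_pgs(dump))  # ['11.15d']
    mapping = parse_pg_map(
        "osdmap e1600 pg 11.15d (11.15d) -> up [0,8,3] acting [0,8,3]")
    print(mapping["acting"][0])  # primary OSD: 0
```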