Re: HDD bad sector, pg inconsistent, no object remapping

David Zafman <david.zafman@xxxxxxxxxxx> · Tue, 12 Nov 2013 15:59:13 -0800

Since the disk is failing and you have two other copies, I would take osd.0 down.  That way ceph will not attempt to read the bad disk, either to serve clients or to make another copy of the data:

***** Not sure about the syntax of this for the version of ceph you are running
ceph osd down 0

Then mark it “out”, which will immediately trigger recovery to create more copies of the data on the remaining OSDs:
ceph osd out 0
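
It is probably worth confirming that recovery has settled before going further. What follows is only a rough sketch, assuming the pgmap line printed by `ceph -s` has the same shape as the fragment quoted further down ("856 pgs: 855 active+clean, 1 active+clean+inconsistent;"). The name unclean_pg_states is a hypothetical helper, not a ceph command, and the output format may differ between versions:

```shell
# Hypothetical helper: read a pgmap summary line on stdin and print every
# PG state group that is not exactly "active+clean", one "count state" per line.
unclean_pg_states() {
    sed -e 's/^.* pgs: *//' -e 's/;.*$//' \
        | tr ',' '\n' \
        | sed -e 's/^ *//' -e 's/ *$//' \
        | awk 'NF == 2 && $2 != "active+clean" { print $1, $2 }'
}

# Possible use against a live cluster (assumes the line contains "pgmap"):
#   ceph -s | grep pgmap | unclean_pg_states
```

Empty output would mean every PG is reported as active+clean; anything else lists the states that still need attention.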

You can now finish the process of removing the osd by looking at these instructions:

http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual
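
As a rough outline of what that page walks through, here is a sketch that only prints the commands for a given OSD id rather than running them. osd_removal_plan is a hypothetical helper name. How you stop the ceph-osd daemon depends on your init system, so it is left as a comment. Stopping the daemon matters because `ceph osd down` only marks the OSD down in the map, and a daemon that is still running will typically report itself up again. Please check each step against the documentation for your version before executing anything:

```shell
# Hypothetical helper: print (do not execute) the manual removal steps
# for one OSD id, in the order the linked documentation describes them.
osd_removal_plan() {
    id="$1"
    echo "ceph osd out ${id}"
    echo "# wait for recovery, then stop the ceph-osd daemon for osd.${id} on its host"
    echo "ceph osd crush remove osd.${id}"
    echo "ceph auth del osd.${id}"
    echo "ceph osd rm ${id}"
}

osd_removal_plan 0
```

If you keep per-OSD sections in ceph.conf, the osd.0 entry would also need to be removed by hand afterwards.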

David Zafman
Senior Developer
http://www.inktank.com

On Nov 12, 2013, at 3:16 AM, Mihály Árva-Tóth <mihaly.arva-toth@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> Hello,
> 
> I have 3 nodes, with 3 OSDs in each node. I'm using the .rgw.buckets pool with 3 replicas. One of my HDDs (osd.0) has developed bad sectors: when I try to read an object directly from the OSD, I get an Input/output error. dmesg:
> 
> [1214525.670065] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
> [1214525.670072] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
> [1214525.670100] sd 0:0:2:0: [sdc] Unhandled sense code
> [1214525.670104] sd 0:0:2:0: [sdc]  
> [1214525.670107] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [1214525.670110] sd 0:0:2:0: [sdc]  
> [1214525.670112] Sense Key : Medium Error [current] 
> [1214525.670117] Info fld=0x60c8f21
> [1214525.670120] sd 0:0:2:0: [sdc]  
> [1214525.670123] Add. Sense: Unrecovered read error
> [1214525.670126] sd 0:0:2:0: [sdc] CDB: 
> [1214525.670128] Read(16): 88 00 00 00 00 00 06 0c 8f 20 00 00 00 08 00 00
> 
> Okay, I know I need to replace the HDD.
> 
> Fragment of ceph -s output:
>   pgmap v922039: 856 pgs: 855 active+clean, 1 active+clean+inconsistent;
> 
> ceph pg dump | grep inconsistent
> 
> 11.15d  25443   0       0       0       6185091790      3001    3001    active+clean+inconsistent       2013-11-06 02:30:45.23416.....
> 
> ceph pg map 11.15d
> 
> osdmap e1600 pg 11.15d (11.15d) -> up [0,8,3] acting [0,8,3]
> 
> pg repair or deep-scrub cannot fix this issue. But if I understand correctly, the OSD should know that it cannot retrieve the object from osd.0, and the object needs to be replicated to another OSD, because there are no longer 3 working replicas.
> 
> Thank you,
> Mihaly
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
