Since the disk is failing and you have two other copies, I would take osd.0 down. That way Ceph will not attempt to read the bad disk, either for clients or to make another copy of the data:

***** Not sure about the syntax of this for the version of Ceph you are running
ceph osd down 0

Then mark it “out”, which will immediately trigger recovery to create more copies of the data on the remaining OSDs:

ceph osd out 0

You can now finish removing the OSD by following these instructions:

http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual

David Zafman
Senior Developer
http://www.inktank.com

On Nov 12, 2013, at 3:16 AM, Mihály Árva-Tóth <mihaly.arva-toth@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> Hello,
>
> I have 3 nodes with 3 OSDs each. I'm using the .rgw.buckets pool with 3 replicas. One of my HDDs (osd.0) has developed bad sectors; when I try to read an object directly from the OSD, I get an Input/output error. dmesg:
>
> [1214525.670065] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
> [1214525.670072] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
> [1214525.670100] sd 0:0:2:0: [sdc] Unhandled sense code
> [1214525.670104] sd 0:0:2:0: [sdc]
> [1214525.670107] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [1214525.670110] sd 0:0:2:0: [sdc]
> [1214525.670112] Sense Key : Medium Error [current]
> [1214525.670117] Info fld=0x60c8f21
> [1214525.670120] sd 0:0:2:0: [sdc]
> [1214525.670123] Add. Sense: Unrecovered read error
> [1214525.670126] sd 0:0:2:0: [sdc] CDB:
> [1214525.670128] Read(16): 88 00 00 00 00 00 06 0c 8f 20 00 00 00 08 00 00
>
> Okay, I know I need to replace the HDD.
>
> Fragment of ceph -s output:
> pgmap v922039: 856 pgs: 855 active+clean, 1 active+clean+inconsistent;
>
> ceph pg dump | grep inconsistent
>
> 11.15d 25443 0 0 0 6185091790 3001 3001 active+clean+inconsistent 2013-11-06 02:30:45.23416.....
>
> ceph pg map 11.15d
>
> osdmap e1600 pg 11.15d (11.15d) -> up [0,8,3] acting [0,8,3]
>
> pg repair or deep-scrub cannot fix this issue. But if I understand correctly, the OSD should know that it cannot retrieve the object from osd.0 and that the object needs to be replicated to another OSD, because there are no longer 3 working replicas.
>
> Thank you,
> Mihaly
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
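
As a rough illustration of the removal sequence suggested in the reply, here is a dry-run shell sketch. It only prints the commands and runs nothing against the cluster. `removal_plan` is a hypothetical helper name, and the command forms are taken from the manual-removal documentation linked in the reply, so check them against the Ceph version you actually run before executing anything.

```shell
#!/bin/sh
# Dry run: print the manual OSD removal sequence for one OSD id.
# removal_plan is a hypothetical helper; nothing here talks to a cluster.
# Command forms follow the linked "removing OSDs (manual)" documentation
# and are an assumption for any other Ceph release.
removal_plan() {
    id="$1"
    echo "ceph osd out ${id}"               # trigger re-replication onto other OSDs
    echo "# stop the ceph-osd daemon for osd.${id} on its host (init-system dependent)"
    echo "ceph osd crush remove osd.${id}"  # remove it from the CRUSH map
    echo "ceph auth del osd.${id}"          # delete its authentication key
    echo "ceph osd rm ${id}"                # remove it from the OSD map
}

# Default to OSD 0, the failing disk discussed in this thread.
removal_plan "${1:-0}"
```

Printing the plan first gives a chance to wait for recovery to finish (watch ceph -s) after the "out" step before running the remaining commands by hand.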