>>> 马忠明 <manian1987@xxxxxxx> wrote on Sunday, 20 November 2016 at 12:16:
> Hi guys,
> So our cluster keeps getting OSDs marked down due to medium errors. Our
> current action plan is to replace the defective disk drive, but I was
> wondering whether Ceph is too sensitive in taking the OSD down, or whether
> our action plan is too simple and crude. Any advice on this issue would be
> appreciated.

No, your plan is correct. Replacing cluster components during normal
operation is exactly what Ceph was made for.

Do

    ceph osd out osd.<x>
    stop ceph-osd id=<x>
    ceph osd crush remove osd.<x>
    ceph auth del osd.<x>
    ceph osd rm osd.<x>

for the specific OSD, replace the disk (it's hot-pluggable, isn't it?) and
configure a new OSD; see the sketch at the end of this message. That's it.

Regards
Steffen

> Medium error from dmesg:
>
> [Sun Nov 20 15:52:10 2016] sd 0:0:15:0: [sdm]
> [Sun Nov 20 15:52:10 2016] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [Sun Nov 20 15:52:10 2016] sd 0:0:15:0: [sdm]
> [Sun Nov 20 15:52:10 2016] Sense Key : Medium Error [current]
> [Sun Nov 20 15:52:10 2016] Info fld=0x235f23e0
> [Sun Nov 20 15:52:10 2016] sd 0:0:15:0: [sdm]
> [Sun Nov 20 15:52:10 2016] Add. Sense: Unrecovered read error
> [Sun Nov 20 15:52:10 2016] sd 0:0:15:0: [sdm] CDB:
> [Sun Nov 20 15:52:10 2016] Read(10): 28 00 23 5f 23 60 00 02 30 00
> [Sun Nov 20 15:52:10 2016] end_request: critical medium error, dev sdm, sector 593437664
>
> The OSD log always shows the OSD catching a read error after a deep-scrub:
>
>  -3> 2016-11-20 16:54:39.740795 7f71f7e75700 0 log_channel(cluster) log [INF] : 13.7e9 deep-scrub starts
>  -2> 2016-11-20 16:54:41.958706 7f71f7e75700 0 log_channel(cluster) log [INF] : 13.7e9 deep-scrub ok
>  -1> 2016-11-20 16:54:48.740180 7f71f7e75700 0 log_channel(cluster) log [INF] : 13.5c9 deep-scrub starts
>   0> 2016-11-20 16:55:00.704106 7f71f7e75700 -1 os/FileStore.cc: In function 'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t, size_t, ceph::bufferlist&, uint32_t, bool)' thread 7f71f7e75700 time 2016-11-20 16:55:00.699763
> os/FileStore.cc: 2850: FAILED assert(allow_eio || !m_filestore_fail_eio || got != -5)
>
> ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f7228bad78b]
> 2: (FileStore::read(coll_t, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int, bool)+0xc58) [0x7f722898b718]
> 3: (ReplicatedBackend::be_deep_scrub(hobject_t const&, unsigned int, ScrubMap::object&, ThreadPool::TPHandle&)+0x2f9) [0x7f7228a17279]
> 4: (PGBackend::be_scan_list(ScrubMap&, std::vector<hobject_t, std::allocator<hobject_t> > const&, bool, unsigned int, ThreadPool::TPHandle&)+0x2c8) [0x7f72289510a8]
> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, unsigned int, ThreadPool::TPHandle&)+0x1fa) [0x7f7228869eea]
> 6: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x480) [0x7f7228870100]
> 7: (PG::scrub(ThreadPool::TPHandle&)+0x2ee) [0x7f72288717ee]
> 8: (OSD::ScrubWQ::_process(PG*, ThreadPool::TPHandle&)+0x19) [0x7f7228756069]
> 9: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa56) [0x7f7228b9e376]
> 10: (ThreadPool::WorkThread::entry()+0x10) [0x7f7228b9f420]
> 11: (()+0x8182) [0x7f72279ab182]
> 12: (clone()+0x6d) [0x7f7225f1647d]
>
> megacli showed the medium error count:
> Enclosure Device ID: 32
> Slot Number: 15
> Device Id: 15
> Sequence Number: 2
> Media Error Count: 9
> Other Error Count: 0
> Predictive Failure Count: 0
> Last Predictive Failure Event Seq Number: 0
> PD Type: SAS
> Raw Size: 1.090 TB [0x8bba0cb0 Sectors]
> Non Coerced Size: 1.090 TB [0x8baa0cb0 Sectors]
> Coerced Size: 1.090 TB [0x8ba80000 Sectors]
> Firmware state: JBOD
> SAS Address(0): 0x5000c50084f2971d
> SAS Address(1): 0x0
> Connected Port Number: 0(path0)

--
Klinik-Service Neubrandenburg GmbH
Allendestr. 30, 17036 Neubrandenburg
Amtsgericht Neubrandenburg, HRB 2457
Managing Director: Gudrun Kappich
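Putting the steps above together, here is a minimal shell sketch of the whole
replacement. It assumes an Upstart-managed host (matching the
"stop ceph-osd id=<x>" syntax in the reply) and that the new disk is prepared
with ceph-disk, as was usual on Hammer. The OSD id (15) is a hypothetical
example, not taken from the thread; /dev/sdm is used only because that is the
device named in the dmesg output, so verify both on your own host.

    #!/bin/bash
    set -e

    # Hypothetical example values -- substitute the id shown as down in
    # 'ceph osd tree' and the device of the failed drive.
    OSD_ID=15
    DEV=/dev/sdm

    # Take the failed OSD out of data placement and stop its daemon
    # (Upstart service syntax, as in the reply above).
    ceph osd out osd.${OSD_ID}
    stop ceph-osd id=${OSD_ID}

    # Remove it from the CRUSH map, delete its auth key, and drop the OSD entry.
    ceph osd crush remove osd.${OSD_ID}
    ceph auth del osd.${OSD_ID}
    ceph osd rm osd.${OSD_ID}

    # After physically swapping the drive, prepare and activate the new disk.
    # ceph-disk assigns the next free OSD id itself; on many installs udev
    # triggers activation automatically after 'prepare'.
    ceph-disk prepare ${DEV}
    ceph-disk activate ${DEV}1

    # Watch recovery/backfill until the cluster returns to HEALTH_OK.
    ceph osd tree
    ceph -w

Watching "ceph -w" (or "ceph health detail") until backfill finishes confirms
that the replacement OSD has taken over the placement groups of the failed one.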