Re: ceph osd down

>>> 马忠明 <manian1987@xxxxxxx> wrote on Sunday, 20 November 2016 at 12:16:
> Hi guys,
> So our cluster keeps having OSDs go down due to medium errors. Our current
> action plan is to replace the defective disk drive, but I was wondering
> whether Ceph is too sensitive in taking the OSD down, or whether our action
> plan is too simple and crude. Any advice on this issue will be appreciated.
> 

No, your plan is correct. Replacing cluster components during normal
operation is exactly what Ceph was made for.

Do

ceph osd out osd.<x>
stop ceph-osd id=<x>
ceph osd crush remove osd.<x>
ceph auth del osd.<x>
ceph osd rm osd.<x>

for the specific osd.
Replace the disk (it's hot pluggable, isn't it?) and configure a new
osd.
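
For the "configure a new osd" part, something along these lines should do
on a Hammer (0.94.x) cluster that was deployed with ceph-disk. Take it as a
sketch rather than a recipe: the replacement disk may not come back as
/dev/sdm (I only took that name from your dmesg output), and the journal
layout depends on your setup.

# partition and register the replacement disk (co-located journal assumed)
ceph-disk prepare /dev/sdm
ceph-disk activate /dev/sdm1

# or, if you manage OSDs with ceph-deploy:
# ceph-deploy osd create <hostname>:/dev/sdm

# then watch the new OSD come up and backfill
ceph osd tree
ceph -w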

That's it.

Regards

Steffen

> 
> medium error from dmesg:
> [Sun Nov 20 15:52:10 2016] sd 0:0:15:0: [sdm]
> [Sun Nov 20 15:52:10 2016] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [Sun Nov 20 15:52:10 2016] sd 0:0:15:0: [sdm]
> [Sun Nov 20 15:52:10 2016] Sense Key : Medium Error [current]
> [Sun Nov 20 15:52:10 2016] Info fld=0x235f23e0
> [Sun Nov 20 15:52:10 2016] sd 0:0:15:0: [sdm]
> [Sun Nov 20 15:52:10 2016] Add. Sense: Unrecovered read error
> [Sun Nov 20 15:52:10 2016] sd 0:0:15:0: [sdm] CDB:
> [Sun Nov 20 15:52:10 2016] Read(10): 28 00 23 5f 23 60 00 02 30 00
> [Sun Nov 20 15:52:10 2016] end_request: critical medium error, dev sdm, sector 593437664
> 
> 
> 
> 
> The osd log always shows the osd catching a read error right after a deep-scrub starts:
> 
>     -3> 2016-11-20 16:54:39.740795 7f71f7e75700  0 log_channel(cluster) log [INF] : 13.7e9 deep-scrub starts
>     -2> 2016-11-20 16:54:41.958706 7f71f7e75700  0 log_channel(cluster) log [INF] : 13.7e9 deep-scrub ok
>     -1> 2016-11-20 16:54:48.740180 7f71f7e75700  0 log_channel(cluster) log [INF] : 13.5c9 deep-scrub starts
>      0> 2016-11-20 16:55:00.704106 7f71f7e75700 -1 os/FileStore.cc: In function 'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t, size_t, ceph::bufferlist&, uint32_t, bool)' thread 7f71f7e75700 time 2016-11-20 16:55:00.699763
> os/FileStore.cc: 2850: FAILED assert(allow_eio || !m_filestore_fail_eio || got != -5)
> 
> 
>  ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f7228bad78b]
>  2: (FileStore::read(coll_t, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int, bool)+0xc58) [0x7f722898b718]
>  3: (ReplicatedBackend::be_deep_scrub(hobject_t const&, unsigned int, ScrubMap::object&, ThreadPool::TPHandle&)+0x2f9) [0x7f7228a17279]
>  4: (PGBackend::be_scan_list(ScrubMap&, std::vector<hobject_t, std::allocator<hobject_t> > const&, bool, unsigned int, ThreadPool::TPHandle&)+0x2c8) [0x7f72289510a8]
>  5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, unsigned int, ThreadPool::TPHandle&)+0x1fa) [0x7f7228869eea]
>  6: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x480) [0x7f7228870100]
>  7: (PG::scrub(ThreadPool::TPHandle&)+0x2ee) [0x7f72288717ee]
>  8: (OSD::ScrubWQ::_process(PG*, ThreadPool::TPHandle&)+0x19) [0x7f7228756069]
>  9: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa56) [0x7f7228b9e376]
>  10: (ThreadPool::WorkThread::entry()+0x10) [0x7f7228b9f420]
>  11: (()+0x8182) [0x7f72279ab182]
>  12: (clone()+0x6d) [0x7f7225f1647d]
> 
> 
> 
> 
> megacli shows a medium error count for the drive:
> Enclosure Device ID: 32
> Slot Number: 15
> Device Id: 15
> Sequence Number: 2
> Media Error Count: 9
> Other Error Count: 0
> Predictive Failure Count: 0
> Last Predictive Failure Event Seq Number: 0
> PD Type: SAS
> Raw Size: 1.090 TB [0x8bba0cb0 Sectors]
> Non Coerced Size: 1.090 TB [0x8baa0cb0 Sectors]
> Coerced Size: 1.090 TB [0x8ba80000 Sectors]
> Firmware state: JBOD
> SAS Address(0): 0x5000c50084f2971d
> SAS Address(1): 0x0
> Connected Port Number: 0(path0) 
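
One more thought on the hot swap: your MegaCli output points at enclosure 32,
slot 15, so you can usually blink that slot's locate LED before pulling the
drive. The exact binary name (MegaCli vs. MegaCli64) and the adapter number
(-a0 below) depend on your box, so treat this as a sketch:

# start blinking the locate LED on enclosure 32, slot 15 (adapter 0 assumed)
MegaCli64 -PdLocate -start -PhysDrv [32:15] -a0
# ...swap the drive, then turn the LED off again
MegaCli64 -PdLocate -stop -PhysDrv [32:15] -a0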

-- 
Klinik-Service Neubrandenburg GmbH
Allendestr. 30, 17036 Neubrandenburg
Amtsgericht Neubrandenburg, HRB 2457
Geschaeftsfuehrerin: Gudrun Kappich
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



