> -----Original Message-----
> From: Gregory Farnum [mailto:greg@xxxxxxxxxxx]
> Sent: 02 June 2015 18:34
> To: Nick Fisk
> Cc: ceph-users
> Subject: Re: Read Errors and OSD Flapping
>
> On Sat, May 30, 2015 at 2:23 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> >
> > Hi All,
> >
> > I was noticing poor performance on my cluster and when I went to
> > investigate I noticed OSD 29 was flapping up and down. On investigation
> > it looks like it has 2 pending sectors, and the kernel log is filled
> > with the following:
> >
> > end_request: critical medium error, dev sdk, sector 4483365656
> > end_request: critical medium error, dev sdk, sector 4483365872
> >
> > I can see in the OSD logs that when the OSD was crashing it was trying
> > to scrub the PG, probably failing when the kernel passes up the read
> > error.
> >
> > ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> > 1: /usr/bin/ceph-osd() [0xacaf4a]
> > 2: (()+0x10340) [0x7fdc43032340]
> > 3: (gsignal()+0x39) [0x7fdc414d1cc9]
> > 4: (abort()+0x148) [0x7fdc414d50d8]
> > 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fdc41ddc6b5]
> > 6: (()+0x5e836) [0x7fdc41dda836]
> > 7: (()+0x5e863) [0x7fdc41dda863]
> > 8: (()+0x5eaa2) [0x7fdc41ddaaa2]
> > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > const*)+0x278) [0xbc2908]
> > 10: (FileStore::read(coll_t, ghobject_t const&, unsigned long,
> > unsigned long, ceph::buffer::list&, unsigned int, bool)+0xc98) [0x9168e8]
> > 11: (ReplicatedBackend::be_deep_scrub(hobject_t const&, unsigned int,
> > ScrubMap::object&, ThreadPool::TPHandle&)+0x2f9) [0xa05bf9]
> > 12: (PGBackend::be_scan_list(ScrubMap&, std::vector<hobject_t,
> > std::allocator<hobject_t> > const&, bool, unsigned int,
> > ThreadPool::TPHandle&)+0x2c8) [0x8dab98]
> > 13: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
> > unsigned int, ThreadPool::TPHandle&)+0x1fa) [0x7f099a]
> > 14: (PG::replica_scrub(MOSDRepScrub*, ThreadPool::TPHandle&)+0x4a2)
> > [0x7f1132]
> > 15: (OSD::RepScrubWQ::_process(MOSDRepScrub*,
> > ThreadPool::TPHandle&)+0xbe) [0x6e583e]
> > 16: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb38ae]
> > 17: (ThreadPool::WorkThread::entry()+0x10) [0xbb4950]
> > 18: (()+0x8182) [0x7fdc4302a182]
> > 19: (clone()+0x6d) [0x7fdc4159547d]
> >
> > A few questions:
> >
> > 1. Is this the expected behaviour, or should Ceph try and do something
> > to either keep the OSD down or rewrite the sector to cause a sector
> > remap?
>
> So the OSD is committing suicide and we want it to stay dead. But the
> init system is restarting it. We are actually discussing how that should
> change right now, but aren't quite sure what the right settings are:
> http://tracker.ceph.com/issues/11798
>
> Presuming you still have the logs, how long was the cycle time for it to
> suicide, restart, and suicide again?

Just looking through a few examples of it, it looks like it took about 2
seconds from suicide to restart and then about 5 minutes until it died
again. I have taken a copy of the log, let me know if it's of any use to
you.
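
As an aside on the pending sectors themselves: they should be visible in
the drive's SMART attributes as Current_Pending_Sector, and that count
should drop back to zero once the affected LBAs are rewritten (with
Reallocated_Sector_Ct going up if the drive has to remap them). Below is
a minimal sketch of the kind of check that could be run from cron,
assuming smartmontools is installed and the drive exposes the usual ATA
attribute table; the device list is only an example.

#!/usr/bin/env python
# Sketch: report pending/reallocated sector counts for a set of drives.
# Assumes smartmontools ('smartctl -A') and the standard ATA attribute
# names; needs to run as root.
import subprocess

DEVICES = ["/dev/sdk"]  # example: the device from the kernel log above
ATTRS = ("Current_Pending_Sector", "Reallocated_Sector_Ct")

def smart_attrs(dev):
    """Return {attribute_name: raw_value} parsed from 'smartctl -A'."""
    out = subprocess.check_output(["smartctl", "-A", dev])
    values = {}
    for line in out.decode("utf-8", "replace").splitlines():
        fields = line.split()
        # Attribute rows look like: ID# ATTRIBUTE_NAME FLAG VALUE ... RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit():
            values[fields[1]] = fields[9]
    return values

for dev in DEVICES:
    attrs = smart_attrs(dev)
    for name in ATTRS:
        print("%s %s = %s" % (dev, name, attrs.get(name, "n/a")))

Anything non-zero in Current_Pending_Sector on an OSD disk is worth an
alert before a deep scrub trips over it like this one did.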
> > 2. I am monitoring smart stats, but is there any other way of picking
> > this up or getting Ceph to highlight it? Something like a flapping OSD
> > notification would be nice.
> >
> > 3. I'm assuming at this stage this disk will not be replaceable under
> > warranty. Am I best to mark it as out, let it drain and then
> > re-introduce it again, which should overwrite the sector and cause a
> > remap? Or is there a better way?
>
> I'm not really sure about these ones. I imagine most users are covering
> it via nagios monitoring of the processes themselves?
> -Greg
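
On the notification side, one crude way to get a flapping alert out of
ceph itself is to poll the OSD map and diff up/down state between runs.
A rough sketch, assuming the JSON layout that 'ceph osd dump --format
json' produces on hammer (an "osds" array with "osd" and "up" fields);
the state file path is arbitrary:

#!/usr/bin/env python
# Sketch: flag OSDs whose up/down state changed since the last run.
# Assumes 'ceph osd dump --format json' emits an "osds" array whose
# entries carry an "osd" id and an "up" flag (0/1).
import json
import os
import subprocess

STATE_FILE = "/var/tmp/osd_up_state.json"  # example path

def current_state():
    out = subprocess.check_output(["ceph", "osd", "dump", "--format", "json"])
    dump = json.loads(out.decode("utf-8"))
    return {str(o["osd"]): int(o["up"]) for o in dump["osds"]}

def previous_state():
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f)
    return {}

now, before = current_state(), previous_state()
for osd_id in sorted(now, key=int):
    if osd_id in before and before[osd_id] != now[osd_id]:
        print("osd.%s changed state: %s -> %s"
              % (osd_id,
                 "up" if before[osd_id] else "down",
                 "up" if now[osd_id] else "down"))

with open(STATE_FILE, "w") as f:
    json.dump(now, f)

Run from cron, or wrapped as a nagios check that exits non-zero whenever
anything is printed, the same OSD flipping between runs is the flapping
signal asked about in question 2; it complements rather than replaces
monitoring the ceph-osd processes themselves.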