OSD is crashing while running admin socket

On Tue, 9 Sep 2014, Somnath Roy wrote:
> I created the following tracker issue and assigned it to myself.
> 
> http://tracker.ceph.com/issues/9384

By the way, this might be the same as or similar to
http://tracker.ceph.com/issues/8885

Thanks!
sage


> 
> Thanks & Regards
> Somnath
> 
> -----Original Message-----
> From: Samuel Just [mailto:sam.just at inktank.com]
> Sent: Monday, September 08, 2014 5:22 PM
> To: Somnath Roy
> Cc: Sage Weil (sweil at redhat.com); ceph-devel at vger.kernel.org; ceph-users at lists.ceph.com
> Subject: Re: OSD is crashing while running admin socket
> 
> That seems reasonable.  Bug away!
> -Sam
> 
> On Mon, Sep 8, 2014 at 5:11 PM, Somnath Roy <Somnath.Roy at sandisk.com> wrote:
> > Hi Sage/Sam,
> >
> >
> >
> > I hit an OSD crash with the latest Ceph master. Here is the stack
> > trace from the log.
> >
> >
> >
> > ceph version 0.85-677-gd5777c4
> > (d5777c421548e7f039bb2c77cb0df2e9c7404723)
> >
> > 1: ceph-osd() [0x990def]
> >
> > 2: (()+0xfbb0) [0x7f72ae6e6bb0]
> >
> > 3: (gsignal()+0x37) [0x7f72acc08f77]
> >
> > 4: (abort()+0x148) [0x7f72acc0c5e8]
> >
> > 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f72ad5146e5]
> >
> > 6: (()+0x5e856) [0x7f72ad512856]
> >
> > 7: (()+0x5e883) [0x7f72ad512883]
> >
> > 8: (()+0x5eaae) [0x7f72ad512aae]
> >
> > 9: (ceph::buffer::list::substr_of(ceph::buffer::list const&, unsigned
> > int, unsigned int)+0x277) [0xa88747]
> >
> > 10: (ceph::buffer::list::write(int, int, std::ostream&) const+0x81)
> > [0xa89541]
> >
> > 11: (operator<<(std::ostream&, OSDOp const&)+0x1f6) [0x717a16]
> >
> > 12: (MOSDOp::print(std::ostream&) const+0x172) [0x6e5e32]
> >
> > 13: (TrackedOp::dump(utime_t, ceph::Formatter*) const+0x223)
> > [0x6b6483]
> >
> > 14: (OpTracker::dump_ops_in_flight(ceph::Formatter*)+0xa7) [0x6b7057]
> >
> > 15: (OSD::asok_command(std::string, std::map<std::string,
> > boost::variant<std::string, bool, long, double,
> > std::vector<std::string, std::allocator<std::string> >,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_>, std::less<std::string>,
> > std::allocator<std::pair<std::string const,
> > boost::variant<std::string, bool, long, double,
> > std::vector<std::string, std::allocator<std::string> >,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_> > > >&, std::string,
> > std::ostream&)+0x1d7) [0x612cb7]
> >
> > 16: (OSDSocketHook::call(std::string, std::map<std::string,
> > boost::variant<std::string, bool, long, double,
> > std::vector<std::string, std::allocator<std::string> >,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_>, std::less<std::string>,
> > std::allocator<std::pair<std::string const,
> > boost::variant<std::string, bool, long, double,
> > std::vector<std::string, std::allocator<std::string> >,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_, boost::detail::variant::void_,
> > boost::detail::variant::void_> > > >&, std::string,
> > ceph::buffer::list&)+0x67) [0x67c8b7]
> >
> > 17: (AdminSocket::do_accept()+0x1007) [0xa79817]
> >
> > 18: (AdminSocket::entry()+0x258) [0xa7b448]
> >
> > 19: (()+0x7f6e) [0x7f72ae6def6e]
> >
> > 20: (clone()+0x6d) [0x7f72acccc9cd]
> >
> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> > needed to interpret this.
> >
> >
> >
> > Steps to reproduce:
> >
> > -----------------------
> >
> >
> >
> > 1.       Run I/O.
> >
> > 2.       While the I/O is running, run the following command continuously:
> >
> >
> >
> > "ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight"
> >
> >
> >
> > 3.       At some point the OSD will crash.
> >
> >
> >
> > I think I have found the root cause:
> >
> >
> >
> > 1.       OpTracker::RemoveOnDelete::operator() calls
> > op->_unregistered(), which clears out message->data() and the payload.
> >
> > 2.       After that, if op tracking is enabled, we call
> > unregister_inflight_op(), which removes the op from the xlist.
> >
> > 3.       Meanwhile, while dumping ops, we call
> > _dump_op_descriptor_unlocked() from TrackedOp::dump(), which tries to
> > print the message.
> >
> > 4.       So there is a race condition: the dump can try to print a
> > message whose ops (data) field has already been cleared.
> >
> >
> >
> > A possible fix is to call op->_unregistered() (when op tracking is
> > enabled) only after the op has been removed from the xlist.
> >
> >
> >
> > With this fix, I am not getting the crash anymore.
> >
> >
> >
> > If my observation is correct, please let me know. I will raise a bug
> > and will fix that as part of the overall optracker performance
> > improvement (I will submit that pull request soon).
> >
> >
> >
> > Thanks & Regards
> >
> > Somnath
> >
> >
> > ________________________________
> >
> > PLEASE NOTE: The information contained in this electronic mail message
> > is intended only for the use of the designated recipient(s) named
> > above. If the reader of this message is not the intended recipient,
> > you are hereby notified that you have received this message in error
> > and that any review, dissemination, distribution, or copying of this
> > message is strictly prohibited. If you have received this
> > communication in error, please notify the sender by telephone or
> > e-mail (as shown above) immediately and destroy any and all copies of
> > this message in your possession (whether hard copies or electronically stored copies).
> >
> 
> 
> 
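
The ordering problem described in the quoted report can be illustrated with a
small, self-contained C++ sketch. This is not Ceph code: Op, Tracker,
release_payload(), dump_count_stale() and stale_ops_seen() are hypothetical
names made up for illustration. The dump is invoked between the two retirement
steps to stand in for the racing admin-socket thread, so the outcome does not
depend on real thread timing.

```cpp
#include <cstddef>
#include <list>
#include <mutex>
#include <string>

// Hypothetical stand-in for a tracked op; this is NOT Ceph's TrackedOp.
struct Op {
    std::string payload;                          // models the message data/payload
    void release_payload() { payload.clear(); }   // models op->_unregistered()
};

// Hypothetical stand-in for the tracker and its in-flight list (the xlist).
class Tracker {
public:
    void add(Op* op) {
        std::lock_guard<std::mutex> guard(lock_);
        in_flight_.push_back(op);
    }
    void remove(Op* op) {
        std::lock_guard<std::mutex> guard(lock_);
        in_flight_.remove(op);
    }
    // Models dump_ops_in_flight: walks the list under the lock and counts
    // listed ops whose payload has already been cleared.
    std::size_t dump_count_stale() {
        std::lock_guard<std::mutex> guard(lock_);
        std::size_t stale = 0;
        for (const Op* op : in_flight_) {
            if (op->payload.empty()) ++stale;
        }
        return stale;
    }
private:
    std::mutex lock_;
    std::list<Op*> in_flight_;
};

// Retires one op and runs a dump between the two retirement steps, which
// deterministically simulates a dump landing inside the race window.
std::size_t stale_ops_seen(bool unlist_first) {
    Tracker tracker;
    Op op{"write 0~4096"};
    tracker.add(&op);
    std::size_t stale = 0;
    if (unlist_first) {
        tracker.remove(&op);                  // proposed order: unlist, then clear
        stale = tracker.dump_count_stale();
        op.release_payload();
    } else {
        op.release_payload();                 // reported order: clear, then unlist
        stale = tracker.dump_count_stale();
        tracker.remove(&op);
    }
    return stale;
}
```

Under these assumptions, stale_ops_seen(false) returns 1: the dump finds an op
that is still listed but whose payload is gone, which is the state the report
associates with the crash. stale_ops_seen(true) returns 0, because the op is
no longer reachable from the list by the time its payload is cleared.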

