Re: OSD is crashing while running admin socket


 



That seems reasonable.  Bug away!
-Sam

On Mon, Sep 8, 2014 at 5:11 PM, Somnath Roy <Somnath.Roy@xxxxxxxxxxx> wrote:
> Hi Sage/Sam,
>
>
>
> I faced a crash in OSD with latest Ceph master. Here is the log trace for
> the same.
>
>
>
> ceph version 0.85-677-gd5777c4 (d5777c421548e7f039bb2c77cb0df2e9c7404723)
>
> 1: ceph-osd() [0x990def]
>
> 2: (()+0xfbb0) [0x7f72ae6e6bb0]
>
> 3: (gsignal()+0x37) [0x7f72acc08f77]
>
> 4: (abort()+0x148) [0x7f72acc0c5e8]
>
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f72ad5146e5]
>
> 6: (()+0x5e856) [0x7f72ad512856]
>
> 7: (()+0x5e883) [0x7f72ad512883]
>
> 8: (()+0x5eaae) [0x7f72ad512aae]
>
> 9: (ceph::buffer::list::substr_of(ceph::buffer::list const&, unsigned int,
> unsigned int)+0x277) [0xa88747]
>
> 10: (ceph::buffer::list::write(int, int, std::ostream&) const+0x81)
> [0xa89541]
>
> 11: (operator<<(std::ostream&, OSDOp const&)+0x1f6) [0x717a16]
>
> 12: (MOSDOp::print(std::ostream&) const+0x172) [0x6e5e32]
>
> 13: (TrackedOp::dump(utime_t, ceph::Formatter*) const+0x223) [0x6b6483]
>
> 14: (OpTracker::dump_ops_in_flight(ceph::Formatter*)+0xa7) [0x6b7057]
>
> 15: (OSD::asok_command(std::string, std::map<std::string,
> boost::variant<std::string, bool, long, double, std::vector<std::string,
> std::allocator<std::string> >, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_>,
> std::less<std::string>, std::allocator<std::pair<std::string const,
> boost::variant<std::string, bool, long, double, std::vector<std::string,
> std::allocator<std::string> >, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_> > > >&,
> std::string, std::ostream&)+0x1d7) [0x612cb7]
>
> 16: (OSDSocketHook::call(std::string, std::map<std::string,
> boost::variant<std::string, bool, long, double, std::vector<std::string,
> std::allocator<std::string> >, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_>,
> std::less<std::string>, std::allocator<std::pair<std::string const,
> boost::variant<std::string, bool, long, double, std::vector<std::string,
> std::allocator<std::string> >, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_,
> boost::detail::variant::void_, boost::detail::variant::void_> > > >&,
> std::string, ceph::buffer::list&)+0x67) [0x67c8b7]
>
> 17: (AdminSocket::do_accept()+0x1007) [0xa79817]
>
> 18: (AdminSocket::entry()+0x258) [0xa7b448]
>
> 19: (()+0x7f6e) [0x7f72ae6def6e]
>
> 20: (clone()+0x6d) [0x7f72acccc9cd]
>
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
>
>
>
> Steps to reproduce:
>
> -----------------------
>
>
>
> 1.       Run IOs.
>
> 2.       While the IOs are running, run the following command repeatedly:
>
> "ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight"
>
> 3.       At some point the OSD will crash.
>
>
>
> I think I have root-caused it:
>
>
>
> 1.       OpTracker::RemoveOnDelete::operator() calls op->_unregistered(),
> which clears out the message's data and payload.
>
> 2.       After that, if op tracking is enabled, we call
> unregister_inflight_op(), which removes the op from the xlist.
>
> 3.       Meanwhile, while dumping ops, we call
> _dump_op_descriptor_unlocked() from TrackedOp::dump, which tries to print
> the message.
>
> 4.       So there is a race: the dump can try to print a message
> whose ops (data) field has already been cleared.
>
>
>
> The fix could be to call op->_unregistered() (when op tracking is enabled)
> only after the op has been removed from the xlist.
>
>
>
> With this fix, I am not getting the crash anymore.
>
>
>
> If my observation is correct, please let me know. I will raise a bug and
> fix it as part of the overall OpTracker performance improvement (I will
> submit that pull request soon).
>
>
>
> Thanks & Regards
>
> Somnath
>
>
> ________________________________
>
> PLEASE NOTE: The information contained in this electronic mail message is
> intended only for the use of the designated recipient(s) named above. If the
> reader of this message is not the intended recipient, you are hereby
> notified that you have received this message in error and that any review,
> dissemination, distribution, or copying of this message is strictly
> prohibited. If you have received this communication in error, please notify
> the sender by telephone or e-mail (as shown above) immediately and destroy
> any and all copies of this message in your possession (whether hard copies
> or electronically stored copies).
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



