Re: OSDs going down/up at random

On 10/01/2018 4:24 PM, Sam Huracan wrote:
> Hi Mike,
>
> Could you show the system log from the moment the OSD goes down and up?
OK, I have no idea how I missed this each time I looked, but the syslog
does show a problem.

I've created the dump file mentioned in the log. It's 29M compressed, so
I'll have to send it directly to anyone who wants it.

Mike

------
Jan 10 15:56:31 pve ceph-osd[2722]: 2018-01-10 15:56:31.338068
7efe5eac1700 -1 abort: Corruption: block checksum mismatch
Jan 10 15:56:31 pve ceph-osd[2722]: *** Caught signal (Aborted) **
Jan 10 15:56:31 pve ceph-osd[2722]:  in thread 7efe5eac1700
thread_name:tp_osd_tp
Jan 10 15:56:31 pve ceph-osd[2722]:  ceph version 12.2.2
(215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)
Jan 10 15:56:31 pve ceph-osd[2722]:  1: (()+0xa16664) [0x55a8b396b664]
Jan 10 15:56:31 pve ceph-osd[2722]:  2: (()+0x110c0) [0x7efe796b70c0]
Jan 10 15:56:31 pve ceph-osd[2722]:  3: (gsignal()+0xcf) [0x7efe7867efcf]
Jan 10 15:56:31 pve ceph-osd[2722]:  4: (abort()+0x16a) [0x7efe786803fa]
Jan 10 15:56:31 pve ceph-osd[2722]:  5:
(RocksDBStore::get(std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const&, char const*,
unsigned long, ceph::buffer::list*)+0x29f) [0x55a8b38a995f]
Jan 10 15:56:31 pve ceph-osd[2722]:  6:
(BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x5ae)
[0x55a8b382d2ae]
Jan 10 15:56:31 pve ceph-osd[2722]:  7:
(BlueStore::getattr(boost::intrusive_ptr<ObjectStore::CollectionImpl>&,
ghobject_t const&, char const*, ceph::buffer::ptr&)+0xf6) [0x55a8b382e326]
Jan 10 15:56:31 pve ceph-osd[2722]:  8:
(PGBackend::objects_get_attr(hobject_t const&,
std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > const&, ceph::buffer::list*)+0x106) [0x55a8b35bde26]
Jan 10 15:56:31 pve ceph-osd[2722]:  9:
(PrimaryLogPG::get_snapset_context(hobject_t const&, bool,
std::map<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> >, ceph::buffer::list,
std::less<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > >,
std::allocator<std::pair<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const,
ceph::buffer::list> > > const*, bool)+0x3fb) [0x55a8b35081db]
Jan 10 15:56:31 pve ceph-osd[2722]:  10:
(PrimaryLogPG::get_object_context(hobject_t const&, bool,
std::map<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> >, ceph::buffer::list,
std::less<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > >,
std::allocator<std::pair<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const,
ceph::buffer::list> > > const*)+0xc39) [0x55a8b352fec9]
Jan 10 15:56:31 pve ceph-osd[2722]:  11:
(PrimaryLogPG::find_object_context(hobject_t const&,
std::shared_ptr<ObjectContext>*, bool, bool, hobject_t*)+0x387)
[0x55a8b3533687]
Jan 10 15:56:31 pve ceph-osd[2722]:  12:
(PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x2214)
[0x55a8b3571694]
Jan 10 15:56:31 pve ceph-osd[2722]:  13:
(PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
ThreadPool::TPHandle&)+0xec6) [0x55a8b352c436]
Jan 10 15:56:31 pve ceph-osd[2722]:  14:
(OSD::dequeue_op(boost::intrusive_ptr<PG>,
boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3ab)
[0x55a8b33a99eb]
Jan 10 15:56:31 pve ceph-osd[2722]:  15:
(PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
const&)+0x5a) [0x55a8b3647eba]
Jan 10 15:56:31 pve ceph-osd[2722]:  16:
(OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x103d) [0x55a8b33d0f4d]
Jan 10 15:56:31 pve ceph-osd[2722]:  17:
(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8ef)
[0x55a8b39b806f]
Jan 10 15:56:31 pve ceph-osd[2722]:  18:
(ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55a8b39bb370]
Jan 10 15:56:31 pve ceph-osd[2722]:  19: (()+0x7494) [0x7efe796ad494]
Jan 10 15:56:31 pve ceph-osd[2722]:  20: (clone()+0x3f) [0x7efe78734aff]
Jan 10 15:56:31 pve ceph-osd[2722]: 2018-01-10 15:56:31.343532
7efe5eac1700 -1 *** Caught signal (Aborted) **
Jan 10 15:56:31 pve ceph-osd[2722]:  in thread 7efe5eac1700
thread_name:tp_osd_tp
Jan 10 15:56:31 pve ceph-osd[2722]:  ceph version 12.2.2
(215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)
Jan 10 15:56:31 pve ceph-osd[2722]:  1: (()+0xa16664) [0x55a8b396b664]
Jan 10 15:56:31 pve ceph-osd[2722]:  2: (()+0x110c0) [0x7efe796b70c0]
Jan 10 15:56:31 pve ceph-osd[2722]:  3: (gsignal()+0xcf) [0x7efe7867efcf]
Jan 10 15:56:31 pve ceph-osd[2722]:  4: (abort()+0x16a) [0x7efe786803fa]
Jan 10 15:56:31 pve ceph-osd[2722]:  5:
(RocksDBStore::get(std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const&, char const*,
unsigned long, ceph::buffer::list*)+0x29f) [0x55a8b38a995f]
Jan 10 15:56:31 pve ceph-osd[2722]:  6:
(BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x5ae)
[0x55a8b382d2ae]
Jan 10 15:56:31 pve ceph-osd[2722]:  7:
(BlueStore::getattr(boost::intrusive_ptr<ObjectStore::CollectionImpl>&,
ghobject_t const&, char const*, ceph::buffer::ptr&)+0xf6) [0x55a8b382e326]
Jan 10 15:56:31 pve ceph-osd[2722]:  8:
(PGBackend::objects_get_attr(hobject_t const&,
std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > const&, ceph::buffer::list*)+0x106) [0x55a8b35bde26]
Jan 10 15:56:31 pve ceph-osd[2722]:  9:
(PrimaryLogPG::get_snapset_context(hobject_t const&, bool,
std::map<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> >, ceph::buffer::list,
std::less<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > >,
std::allocator<std::pair<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const,
ceph::buffer::list> > > const*, bool)+0x3fb) [0x55a8b35081db]
Jan 10 15:56:31 pve ceph-osd[2722]:  10:
(PrimaryLogPG::get_object_context(hobject_t const&, bool,
std::map<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> >, ceph::buffer::list,
std::less<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > >,
std::allocator<std::pair<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const,
ceph::buffer::list> > > const*)+0xc39) [0x55a8b352fec9]
Jan 10 15:56:31 pve ceph-osd[2722]:  11:
(PrimaryLogPG::find_object_context(hobject_t const&,
std::shared_ptr<ObjectContext>*, bool, bool, hobject_t*)+0x387)
[0x55a8b3533687]
Jan 10 15:56:31 pve ceph-osd[2722]:  12:
(PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x2214)
[0x55a8b3571694]
Jan 10 15:56:31 pve ceph-osd[2722]:  13:
(PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
ThreadPool::TPHandle&)+0xec6) [0x55a8b352c436]
Jan 10 15:56:31 pve ceph-osd[2722]:  14:
(OSD::dequeue_op(boost::intrusive_ptr<PG>,
boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3ab)
[0x55a8b33a99eb]
Jan 10 15:56:31 pve ceph-osd[2722]:  15:
(PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
const&)+0x5a) [0x55a8b3647eba]
Jan 10 15:56:31 pve ceph-osd[2722]:  16:
(OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x103d) [0x55a8b33d0f4d]
Jan 10 15:56:31 pve ceph-osd[2722]:  17:
(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8ef)
[0x55a8b39b806f]
Jan 10 15:56:31 pve ceph-osd[2722]:  18:
(ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55a8b39bb370]
Jan 10 15:56:31 pve ceph-osd[2722]:  19: (()+0x7494) [0x7efe796ad494]
Jan 10 15:56:31 pve ceph-osd[2722]:  20: (clone()+0x3f) [0x7efe78734aff]
Jan 10 15:56:31 pve ceph-osd[2722]:  NOTE: a copy of the executable, or
`objdump -rdS <executable>` is needed to interpret this.
Jan 10 15:56:31 pve systemd[1]: ceph-osd@12.service: Main process
exited, code=killed, status=6/ABRT
Jan 10 15:56:31 pve systemd[1]: ceph-osd@12.service: Unit entered failed
state.
Jan 10 15:56:31 pve systemd[1]: ceph-osd@12.service: Failed with result
'signal'.
Jan 10 15:56:31 pve kernel: [171262.263294] libceph: osd12 down
Jan 10 15:56:51 pve systemd[1]: ceph-osd@12.service: Service hold-off
time over, scheduling restart.
Jan 10 15:56:51 pve systemd[1]: Stopped Ceph object storage daemon osd.12.
Jan 10 15:56:51 pve systemd[1]: Starting Ceph object storage daemon
osd.12...
Jan 10 15:56:51 pve systemd[1]: Started Ceph object storage daemon osd.12.
Jan 10 15:56:51 pve ceph-osd[26121]: starting osd.12 at - osd_data
/var/lib/ceph/osd/ceph-12 /var/lib/ceph/osd/ceph-12/journal
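
For anyone wanting to check their own syslog for the same failure, here is a minimal sketch in Python. It assumes the log uses the same "date host program[pid]: message" prefix as the excerpt above and that long entries may have been wrapped onto continuation lines, as they were in this mail. The names `PREFIX`, `join_wrapped` and `find_osd_aborts` are made up for illustration and are not part of Ceph.

```python
import re

# Syslog prefix as seen in the excerpt: "Jan 10 15:56:31 pve ceph-osd[2722]: ..."
# (assumption: other syslog formats would need a different pattern)
PREFIX = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) (\S+) ([^\s:\[]+)(?:\[(\d+)\])?: (.*)$"
)


def join_wrapped(lines):
    """Re-join wrapped entries: a line without a syslog prefix is
    treated as a continuation of the previous entry."""
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        if PREFIX.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line.strip()
    return entries


def find_osd_aborts(lines):
    """Return (timestamp, pid, reason) for each 'abort:' message
    logged by a ceph-osd process."""
    hits = []
    for entry in join_wrapped(lines):
        m = PREFIX.match(entry)
        if not m:
            continue
        stamp, _host, prog, pid, msg = m.groups()
        if prog == "ceph-osd" and " abort: " in msg:
            hits.append((stamp, pid, msg.split(" abort: ", 1)[1]))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_osd_aborts.py /var/log/syslog
    with open(sys.argv[1], errors="replace") as fh:
        for stamp, pid, reason in find_osd_aborts(fh):
            print(f"{stamp} ceph-osd[{pid}]: {reason}")
```

This only lists the abort messages so they are harder to miss. It does not say anything about why the checksum mismatch happened.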