Re: v0.61.6 Cuttlefish update released

On 07/25/2013 12:01 PM, peter@xxxxxxxxx wrote:
On 2013-07-25 11:52, Wido den Hollander wrote:
On 07/25/2013 11:46 AM, peter@xxxxxxxxx wrote:
Any news on this? I'm not sure if you guys received the link to the log
and monitor files. One monitor and one OSD are still crashing with the
error below.

I think you are seeing this issue: http://tracker.ceph.com/issues/5737

You can try the new packages from here:
http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/wip-5737-cuttlefish/

That should resolve it.

Wido

Hi Wido,

This is the same issue I reported earlier with 0.61.5. I applied the
above package and the problem was solved. Then 0.61.6 was released with
a fix for this issue. After installing 0.61.6, the issue is back on one
of my monitors and one OSD is crashing. So either the bug is still
present in 0.61.6 or this is a new bug. It seems the guys from Inktank
haven't picked this up yet.


It has been picked up; Sage mentioned this yesterday on the dev list:

"This is fixed in the cuttlefish branch as of earlier this afternoon. I've spent most of the day expanding the automated test suite to include upgrade combinations to trigger this, and *finally* figured out that this particular problem seems to surface on clusters that upgraded from bobtail -> cuttlefish but not on clusters created on cuttlefish.

If you've run into this issue, please use the cuttlefish branch build for now. We will have a release out in the next day or so that includes this and a few other pending fixes.

I'm sorry we missed this one! The upgrade test matrix I've been working on today should catch this type of issue in the future."

Wido

Regards,



On 2013-07-24 09:57, peter@xxxxxxxxx wrote:
Hi Sage,

I just had a 0.61.6 monitor and one OSD crash. The mon and all OSDs
restarted just fine after the update, but the mon crashed after 15
minutes or so. See a snippet of the log file below. I have sent you a
link to the log files and monitor store. It seems the bug hasn't been
fully fixed or something else is going on. I should note, though, that
one monitor had a clock skew warning for a few minutes (this happened
because of a reboot and was fixed by NTP). So beware when upgrading.

Cheers,

mon:

--- begin dump of recent events ---
     0> 2013-07-24 09:42:57.655257 7f262392e780 -1 *** Caught signal (Aborted) **
 in thread 7f262392e780

 ceph version 0.61.6 (59ddece17e36fef69ecf40e239aeffad33c9db35)
 1: /usr/bin/ceph-mon() [0x597cfa]
 2: (()+0xfcb0) [0x7f2622fc8cb0]
 3: (gsignal()+0x35) [0x7f2621b9e425]
 4: (abort()+0x17b) [0x7f2621ba1b8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f26224f069d]
 6: (()+0xb5846) [0x7f26224ee846]
 7: (()+0xb5873) [0x7f26224ee873]
 8: (()+0xb596e) [0x7f26224ee96e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1df) [0x64ffaf]
 10: (OSDMonitor::update_from_paxos(bool*)+0x29e7) [0x507c77]
 11: (PaxosService::refresh(bool*)+0x19b) [0x4ede7b]
 12: (Monitor::refresh_from_paxos(bool*)+0x57) [0x48e617]
 13: (Monitor::init_paxos()+0xf5) [0x48e7d5]
 14: (Monitor::preinit()+0x6ac) [0x4a4e6c]
 15: (main()+0x1c19) [0x4835c9]
 16: (__libc_start_main()+0xed) [0x7f2621b8976d]
 17: /usr/bin/ceph-mon() [0x485eed]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   0/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 hadoop
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-mon.ceph3.log
--- end dump of recent events ---
2013-07-24 09:42:57.935730 7fb08d67a780  0 ceph version 0.61.6 (59ddece17e36fef69ecf40e239aeffad33c9db35), process ceph-mon, pid 19878
2013-07-24 09:42:57.943330 7fb08d67a780  1 mon.ceph3@-1(probing) e1 preinit fsid 97e515bb-d334-4fa7-8b53-7d85615809fd
2013-07-24 09:42:57.966551 7fb08d67a780 -1 mon/OSDMonitor.cc: In function 'virtual void OSDMonitor::update_from_paxos(bool*)' thread 7fb08d67a780 time 2013-07-24 09:42:57.964379
mon/OSDMonitor.cc: 167: FAILED assert(latest_bl.length() != 0)

 ceph version 0.61.6 (59ddece17e36fef69ecf40e239aeffad33c9db35)
 1: (OSDMonitor::update_from_paxos(bool*)+0x29e7) [0x507c77]
 2: (PaxosService::refresh(bool*)+0x19b) [0x4ede7b]
 3: (Monitor::refresh_from_paxos(bool*)+0x57) [0x48e617]
 4: (Monitor::init_paxos()+0xf5) [0x48e7d5]
 5: (Monitor::preinit()+0x6ac) [0x4a4e6c]
 6: (main()+0x1c19) [0x4835c9]
 7: (__libc_start_main()+0xed) [0x7fb08b8d576d]
 8: /usr/bin/ceph-mon() [0x485eed]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
   -25> 2013-07-24 09:42:57.933545 7fb08d67a780  5 asok(0x1a1e000) register_command perfcounters_dump hook 0x1a13010
   -24> 2013-07-24 09:42:57.933581 7fb08d67a780  5 asok(0x1a1e000) register_command 1 hook 0x1a13010
   -23> 2013-07-24 09:42:57.933584 7fb08d67a780  5 asok(0x1a1e000) register_command perf dump hook 0x1a13010
   -22> 2013-07-24 09:42:57.933592 7fb08d67a780  5 asok(0x1a1e000) register_command perfcounters_schema hook 0x1a13010
   -21> 2013-07-24 09:42:57.933595 7fb08d67a780  5 asok(0x1a1e000) register_command 2 hook 0x1a13010
   -20> 2013-07-24 09:42:57.933597 7fb08d67a780  5 asok(0x1a1e000) register_command perf schema hook 0x1a13010
   -19> 2013-07-24 09:42:57.933601 7fb08d67a780  5 asok(0x1a1e000) register_command config show hook 0x1a13010
   -18> 2013-07-24 09:42:57.933604 7fb08d67a780  5 asok(0x1a1e000) register_command config set hook 0x1a13010
   -17> 2013-07-24 09:42:57.933606 7fb08d67a780  5 asok(0x1a1e000) register_command log flush hook 0x1a13010
   -16> 2013-07-24 09:42:57.933609 7fb08d67a780  5 asok(0x1a1e000) register_command log dump hook 0x1a13010
   -15> 2013-07-24 09:42:57.933612 7fb08d67a780  5 asok(0x1a1e000) register_command log reopen hook 0x1a13010
   -14> 2013-07-24 09:42:57.935730 7fb08d67a780  0 ceph version 0.61.6 (59ddece17e36fef69ecf40e239aeffad33c9db35), process ceph-mon, pid 19878
   -13> 2013-07-24 09:42:57.938697 7fb08d67a780  5 asok(0x1a1e000) init /var/run/ceph/ceph-mon.ceph3.asok
   -12> 2013-07-24 09:42:57.938732 7fb08d67a780  5 asok(0x1a1e000) bind_and_listen /var/run/ceph/ceph-mon.ceph3.asok
   -11> 2013-07-24 09:42:57.938775 7fb08d67a780  5 asok(0x1a1e000) register_command 0 hook 0x1a120b0
   -10> 2013-07-24 09:42:57.938793 7fb08d67a780  5 asok(0x1a1e000) register_command version hook 0x1a120b0
    -9> 2013-07-24 09:42:57.938804 7fb08d67a780  5 asok(0x1a1e000) register_command git_version hook 0x1a120b0
    -8> 2013-07-24 09:42:57.938815 7fb08d67a780  5 asok(0x1a1e000) register_command help hook 0x1a130d0
    -7> 2013-07-24 09:42:57.940462 7fb08971b700  5 asok(0x1a1e000) entry start
    -6> 2013-07-24 09:42:57.943200 7fb08d67a780  1 -- 10.255.0.30:6789/0 learned my addr 10.255.0.30:6789/0
    -5> 2013-07-24 09:42:57.943227 7fb08d67a780  1 accepter.accepter.bind my_inst.addr is 10.255.0.30:6789/0 need_addr=0
    -4> 2013-07-24 09:42:57.943265 7fb08d67a780  5 adding auth protocol: cephx
    -3> 2013-07-24 09:42:57.943279 7fb08d67a780  5 adding auth protocol: cephx
    -2> 2013-07-24 09:42:57.943330 7fb08d67a780  1 mon.ceph3@-1(probing) e1 preinit fsid 97e515bb-d334-4fa7-8b53-7d85615809fd
    -1> 2013-07-24 09:42:57.963915 7fb08d67a780  4 mon.ceph3@-1(probing).mds e228053 new map
     0> 2013-07-24 09:42:57.966551 7fb08d67a780 -1 mon/OSDMonitor.cc: In function 'virtual void OSDMonitor::update_from_paxos(bool*)' thread 7fb08d67a780 time 2013-07-24 09:42:57.964379
mon/OSDMonitor.cc: 167: FAILED assert(latest_bl.length() != 0)

 ceph version 0.61.6 (59ddece17e36fef69ecf40e239aeffad33c9db35)
 1: (OSDMonitor::update_from_paxos(bool*)+0x29e7) [0x507c77]
 2: (PaxosService::refresh(bool*)+0x19b) [0x4ede7b]
 3: (Monitor::refresh_from_paxos(bool*)+0x57) [0x48e617]
 4: (Monitor::init_paxos()+0xf5) [0x48e7d5]
 5: (Monitor::preinit()+0x6ac) [0x4a4e6c]
 6: (main()+0x1c19) [0x4835c9]
 7: (__libc_start_main()+0xed) [0x7fb08b8d576d]
 8: /usr/bin/ceph-mon() [0x485eed]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- end dump of recent events ---
2013-07-24 09:42:57.973296 7fb08d67a780 -1 *** Caught signal (Aborted) **
 in thread 7fb08d67a780

 ceph version 0.61.6 (59ddece17e36fef69ecf40e239aeffad33c9db35)
 1: /usr/bin/ceph-mon() [0x597cfa]
 2: (()+0xfcb0) [0x7fb08cd14cb0]
 3: (gsignal()+0x35) [0x7fb08b8ea425]
 4: (abort()+0x17b) [0x7fb08b8edb8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fb08c23c69d]
 6: (()+0xb5846) [0x7fb08c23a846]
 7: (()+0xb5873) [0x7fb08c23a873]
 8: (()+0xb596e) [0x7fb08c23a96e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1df) [0x64ffaf]
 10: (OSDMonitor::update_from_paxos(bool*)+0x29e7) [0x507c77]
 11: (PaxosService::refresh(bool*)+0x19b) [0x4ede7b]
 12: (Monitor::refresh_from_paxos(bool*)+0x57) [0x48e617]
 13: (Monitor::init_paxos()+0xf5) [0x48e7d5]
 14: (Monitor::preinit()+0x6ac) [0x4a4e6c]
 15: (main()+0x1c19) [0x4835c9]
 16: (__libc_start_main()+0xed) [0x7fb08b8d576d]
 17: /usr/bin/ceph-mon() [0x485eed]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


osd:

--- begin dump of recent events ---
     0> 2013-07-24 09:21:34.087645 7f3c2fa7d700 -1 *** Caught signal (Aborted) **
 in thread 7f3c2fa7d700

 ceph version 0.61.6 (59ddece17e36fef69ecf40e239aeffad33c9db35)
 1: /usr/bin/ceph-osd() [0x79219a]
 2: (()+0xfcb0) [0x7f3c40aaacb0]
 3: (gsignal()+0x35) [0x7f3c3f263425]
 4: (abort()+0x17b) [0x7f3c3f266b8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f3c3fbb569d]
 6: (()+0xb5846) [0x7f3c3fbb3846]
 7: (()+0xb5873) [0x7f3c3fbb3873]
 8: (()+0xb596e) [0x7f3c3fbb396e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1df) [0x84303f]
 10: (OSDService::get_map(unsigned int)+0x428) [0x630c88]
 11: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PG::RecoveryCtx*, std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >*)+0x11d) [0x6327bd]
 12: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x244) [0x632f14]
 13: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x12) [0x66e2d2]
 14: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x838476]
 15: (ThreadPool::WorkThread::entry()+0x10) [0x83a2a0]
 16: (()+0x7e9a) [0x7f3c40aa2e9a]
 17: (clone()+0x6d) [0x7f3c3f320ccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   0/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 hadoop
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.6.log
--- end dump of recent events ---
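
Before concluding that a crash is the same bug, it helps to check whether your own logs carry the same signatures as the ones pasted above. A minimal sketch, assuming Python 3 and unwrapped log files (the copies in this thread were line-wrapped by the mail client): it collects the Ceph version strings and counts the monitor assertion and the OSD `get_map` abort. The function `triage` and its report keys are hypothetical, not a Ceph tool, and a match only suggests, not proves, that you are hitting tracker issue 5737.

```python
import re

# Signatures copied from the crash logs in this thread. The helper itself is
# hypothetical (not a Ceph tool) and assumes unwrapped, one-event-per-line logs.
MON_ASSERT = re.compile(
    r"mon/OSDMonitor\.cc: \d+: FAILED assert\(latest_bl\.length\(\) != 0\)"
)
OSD_GET_MAP = re.compile(r"OSDService::get_map\(unsigned int\)")
VERSION = re.compile(r"ceph version (\d+\.\d+\.\d+) \(([0-9a-f]{40})\)")


def triage(lines):
    """Count known crash signatures and collect Ceph versions in log lines."""
    report = {
        "versions": set(),
        "mon_latest_full_assert": 0,  # monitor: empty latest full osdmap
        "osd_get_map_abort": 0,       # OSD: abort inside OSDService::get_map
    }
    for line in lines:
        match = VERSION.search(line)
        if match:
            report["versions"].add(match.group(1))
        if MON_ASSERT.search(line):
            report["mon_latest_full_assert"] += 1
        if OSD_GET_MAP.search(line):
            report["osd_get_map_abort"] += 1
    return report
```

To use it, open the monitor log (here `/var/log/ceph/ceph-mon.ceph3.log`) and pass the file object to `triage`. Counts are per matching line, so a single crash normally shows up several times, because Ceph repeats the backtrace in its dump of recent events.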

On 2013-07-24 06:47, Sage Weil wrote:
There was a problem with the monitor daemons in v0.61.5 that would prevent
them from restarting after some period of time. This release fixes the
bug and works around the issue to allow affected monitors to restart.
All v0.61.5 users are strongly recommended to upgrade.
Thanks to everyone who helped track the problem down!
Notable changes:
 * mon: record latest full osdmap
 * mon: work around previous bug in which latest full osdmap was not
   recorded
 * mon: avoid scrub while paxos is updating
For more information, please see the complete release notes:
 * http://ceph.com/docs/master/release-notes/#v0-61-6-cuttlefish
You can get v0.61.6 from the usual locations:
 * Git at git://github.com/ceph/ceph.git
 * Tarball at http://ceph.com/download/ceph-0.61.6.tar.gz
 * For Debian/Ubuntu packages, see http://ceph.com/docs/master/install/debian
 * For RPMs, see http://ceph.com/docs/master/install/rpm
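
One plausible reading of the first two notable changes: if a monitor never recorded which full osdmap is the latest, it ends up with an empty buffer at startup and trips the `assert(latest_bl.length() != 0)` seen in the logs above. The toy model below illustrates that reading in Python. It assumes a store that keeps full maps in a dict keyed by epoch plus a `latest_full` pointer; the class, its fields and the backwards scan are hypothetical illustrations, not Ceph's monitor store layout or the actual v0.61.6 patch.

```python
class ToyOsdMapStore:
    """Toy model of a monitor's osdmap store (hypothetical; not Ceph's layout).

    full_maps maps an epoch to a serialized full osdmap; latest_full is the
    epoch of the newest full map the store has *recorded* (0 = never recorded).
    """

    def __init__(self):
        self.full_maps = {}
        self.latest_full = 0
        self.last_committed = 0

    def commit_full(self, epoch, blob, record_latest=True):
        """Store a full map; record_latest=False simulates the older bug."""
        self.full_maps[epoch] = blob
        self.last_committed = max(self.last_committed, epoch)
        if record_latest:
            self.latest_full = epoch

    def load_latest_full(self, workaround=True):
        """Return the latest full map, optionally falling back to a scan."""
        latest_bl = self.full_maps.get(self.latest_full, b"")
        if not latest_bl and workaround:
            # Assumed fallback: walk back from last_committed to the newest
            # full map that actually exists, then record it.
            for epoch in range(self.last_committed, 0, -1):
                if epoch in self.full_maps:
                    self.latest_full = epoch
                    latest_bl = self.full_maps[epoch]
                    break
        if len(latest_bl) == 0:
            # Mirrors the check that fails in the monitor logs.
            raise AssertionError("FAILED assert(latest_bl.length() != 0)")
        return latest_bl
```

In this model, recording the pointer on every commit (the first note) prevents new gaps, while the scan (the second note) only repairs stores written before the fix. The release notes do not say whether the real workaround scans this way.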
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on



