Re: v0.61.6 Cuttlefish update released




On 2013-07-25 12:08, Wido den Hollander wrote:
On 07/25/2013 12:01 PM, peter@xxxxxxxxx wrote:
On 2013-07-25 11:52, Wido den Hollander wrote:
On 07/25/2013 11:46 AM, peter@xxxxxxxxx wrote:
Any news on this? I'm not sure whether you received the link to the log and monitor files. One monitor and one OSD are still crashing with the error below.

I think you are seeing this issue: http://tracker.ceph.com/issues/5737

You can try with new packages from here:
http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/wip-5737-cuttlefish/


That should resolve it.

Wido

Hi Wido,

This is the same issue I reported earlier with 0.61.5. I applied the
above package and the problem was solved. Then 0.61.6 was released with
a fix for this issue. I installed 0.61.6, and the issue is back on one of
my monitors; I also have one OSD crashing. So it seems either the bug is
still there in 0.61.6 or this is a new bug. It seems the guys from
Inktank haven't picked this up yet.


It has been picked up, Sage mentioned this yesterday on the dev list:

"This is fixed in the cuttlefish branch as of earlier this afternoon.
I've spent most of the day expanding the automated test suite to
include upgrade combinations to trigger this and *finally* figured out
that this particular problem seems to surface on clusters that
upgraded from bobtail -> cuttlefish but not clusters created on
cuttlefish.

If you've run into this issue, please use the cuttlefish branch build
for now.  We will have a release out in the next day or so that
includes this and a few other pending fixes.

I'm sorry we missed this one!  The upgrade test matrix I've been
working on today should catch this type of issue in the future."

Wido

Regards,

We created this cluster on cuttlefish, not on bobtail, so that doesn't apply. I'm not sure whether I'm being unclear or missing something here, but I still see this issue either way :-)

I will also check the dev list, but perhaps someone from Inktank can at least look at the files I provided.

Peter




On 2013-07-24 09:57, peter@xxxxxxxxx wrote:
Hi Sage,

I just had a 0.61.6 monitor and one OSD crash. The mon and all OSDs restarted just fine after the update, but it decided to crash after 15 minutes or so. See a snippet of the log file below. I have sent you a link to the log files and the monitor store. It seems the bug hasn't been fully fixed, or something else is going on. I should note, though, that one monitor had a clock skew warning for a few minutes (this happened because of a reboot; it was fixed by ntp). So beware when upgrading.

Cheers,

mon:

--- begin dump of recent events ---
0> 2013-07-24 09:42:57.655257 7f262392e780 -1 *** Caught signal
(Aborted) **
 in thread 7f262392e780

 ceph version 0.61.6 (59ddece17e36fef69ecf40e239aeffad33c9db35)
 1: /usr/bin/ceph-mon() [0x597cfa]
 2: (()+0xfcb0) [0x7f2622fc8cb0]
 3: (gsignal()+0x35) [0x7f2621b9e425]
 4: (abort()+0x17b) [0x7f2621ba1b8b]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f26224f069d]
 6: (()+0xb5846) [0x7f26224ee846]
 7: (()+0xb5873) [0x7f26224ee873]
 8: (()+0xb596e) [0x7f26224ee96e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x1df) [0x64ffaf]
 10: (OSDMonitor::update_from_paxos(bool*)+0x29e7) [0x507c77]
 11: (PaxosService::refresh(bool*)+0x19b) [0x4ede7b]
 12: (Monitor::refresh_from_paxos(bool*)+0x57) [0x48e617]
 13: (Monitor::init_paxos()+0xf5) [0x48e7d5]
 14: (Monitor::preinit()+0x6ac) [0x4a4e6c]
 15: (main()+0x1c19) [0x4835c9]
 16: (__libc_start_main()+0xed) [0x7f2621b8976d]
 17: /usr/bin/ceph-mon() [0x485eed]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   0/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 hadoop
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-mon.ceph3.log
--- end dump of recent events ---
2013-07-24 09:42:57.935730 7fb08d67a780  0 ceph version 0.61.6
(59ddece17e36fef69ecf40e239aeffad33c9db35), process ceph-mon, pid
19878
2013-07-24 09:42:57.943330 7fb08d67a780 1 mon.ceph3@-1(probing) e1
preinit fsid 97e515bb-d334-4fa7-8b53-7d85615809fd
2013-07-24 09:42:57.966551 7fb08d67a780 -1 mon/OSDMonitor.cc: In
function 'virtual void OSDMonitor::update_from_paxos(bool*)' thread
7fb08d67a780 time 2013-07-24 09:42:57.964379
mon/OSDMonitor.cc: 167: FAILED assert(latest_bl.length() != 0)

 ceph version 0.61.6 (59ddece17e36fef69ecf40e239aeffad33c9db35)
 1: (OSDMonitor::update_from_paxos(bool*)+0x29e7) [0x507c77]
 2: (PaxosService::refresh(bool*)+0x19b) [0x4ede7b]
 3: (Monitor::refresh_from_paxos(bool*)+0x57) [0x48e617]
 4: (Monitor::init_paxos()+0xf5) [0x48e7d5]
 5: (Monitor::preinit()+0x6ac) [0x4a4e6c]
 6: (main()+0x1c19) [0x4835c9]
 7: (__libc_start_main()+0xed) [0x7fb08b8d576d]
 8: /usr/bin/ceph-mon() [0x485eed]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

--- begin dump of recent events ---
   -25> 2013-07-24 09:42:57.933545 7fb08d67a780  5 asok(0x1a1e000)
register_command perfcounters_dump hook 0x1a13010
   -24> 2013-07-24 09:42:57.933581 7fb08d67a780  5 asok(0x1a1e000)
register_command 1 hook 0x1a13010
   -23> 2013-07-24 09:42:57.933584 7fb08d67a780  5 asok(0x1a1e000)
register_command perf dump hook 0x1a13010
   -22> 2013-07-24 09:42:57.933592 7fb08d67a780  5 asok(0x1a1e000)
register_command perfcounters_schema hook 0x1a13010
   -21> 2013-07-24 09:42:57.933595 7fb08d67a780  5 asok(0x1a1e000)
register_command 2 hook 0x1a13010
   -20> 2013-07-24 09:42:57.933597 7fb08d67a780  5 asok(0x1a1e000)
register_command perf schema hook 0x1a13010
   -19> 2013-07-24 09:42:57.933601 7fb08d67a780  5 asok(0x1a1e000)
register_command config show hook 0x1a13010
   -18> 2013-07-24 09:42:57.933604 7fb08d67a780  5 asok(0x1a1e000)
register_command config set hook 0x1a13010
   -17> 2013-07-24 09:42:57.933606 7fb08d67a780  5 asok(0x1a1e000)
register_command log flush hook 0x1a13010
   -16> 2013-07-24 09:42:57.933609 7fb08d67a780  5 asok(0x1a1e000)
register_command log dump hook 0x1a13010
   -15> 2013-07-24 09:42:57.933612 7fb08d67a780  5 asok(0x1a1e000)
register_command log reopen hook 0x1a13010
   -14> 2013-07-24 09:42:57.935730 7fb08d67a780  0 ceph version
0.61.6 (59ddece17e36fef69ecf40e239aeffad33c9db35), process ceph-mon,
pid 19878
   -13> 2013-07-24 09:42:57.938697 7fb08d67a780  5 asok(0x1a1e000)
init /var/run/ceph/ceph-mon.ceph3.asok
   -12> 2013-07-24 09:42:57.938732 7fb08d67a780  5 asok(0x1a1e000)
bind_and_listen /var/run/ceph/ceph-mon.ceph3.asok
   -11> 2013-07-24 09:42:57.938775 7fb08d67a780  5 asok(0x1a1e000)
register_command 0 hook 0x1a120b0
   -10> 2013-07-24 09:42:57.938793 7fb08d67a780  5 asok(0x1a1e000)
register_command version hook 0x1a120b0
    -9> 2013-07-24 09:42:57.938804 7fb08d67a780  5 asok(0x1a1e000)
register_command git_version hook 0x1a120b0
    -8> 2013-07-24 09:42:57.938815 7fb08d67a780  5 asok(0x1a1e000)
register_command help hook 0x1a130d0
    -7> 2013-07-24 09:42:57.940462 7fb08971b700  5 asok(0x1a1e000)
entry start
    -6> 2013-07-24 09:42:57.943200 7fb08d67a780  1 --
10.255.0.30:6789/0 learned my addr 10.255.0.30:6789/0
    -5> 2013-07-24 09:42:57.943227 7fb08d67a780  1
accepter.accepter.bind my_inst.addr is 10.255.0.30:6789/0 need_addr=0
    -4> 2013-07-24 09:42:57.943265 7fb08d67a780  5 adding auth
protocol: cephx
    -3> 2013-07-24 09:42:57.943279 7fb08d67a780  5 adding auth
protocol: cephx
    -2> 2013-07-24 09:42:57.943330 7fb08d67a780  1
mon.ceph3@-1(probing) e1 preinit fsid
97e515bb-d334-4fa7-8b53-7d85615809fd
    -1> 2013-07-24 09:42:57.963915 7fb08d67a780  4
mon.ceph3@-1(probing).mds e228053 new map
0> 2013-07-24 09:42:57.966551 7fb08d67a780 -1 mon/OSDMonitor.cc: In function 'virtual void OSDMonitor::update_from_paxos(bool*)' thread
7fb08d67a780 time 2013-07-24 09:42:57.964379
mon/OSDMonitor.cc: 167: FAILED assert(latest_bl.length() != 0)

 ceph version 0.61.6 (59ddece17e36fef69ecf40e239aeffad33c9db35)
 1: (OSDMonitor::update_from_paxos(bool*)+0x29e7) [0x507c77]
 2: (PaxosService::refresh(bool*)+0x19b) [0x4ede7b]
 3: (Monitor::refresh_from_paxos(bool*)+0x57) [0x48e617]
 4: (Monitor::init_paxos()+0xf5) [0x48e7d5]
 5: (Monitor::preinit()+0x6ac) [0x4a4e6c]
 6: (main()+0x1c19) [0x4835c9]
 7: (__libc_start_main()+0xed) [0x7fb08b8d576d]
 8: /usr/bin/ceph-mon() [0x485eed]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   0/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 hadoop
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-mon.ceph3.log
--- end dump of recent events ---
2013-07-24 09:42:57.973296 7fb08d67a780 -1 *** Caught signal
(Aborted) **
 in thread 7fb08d67a780

 ceph version 0.61.6 (59ddece17e36fef69ecf40e239aeffad33c9db35)
 1: /usr/bin/ceph-mon() [0x597cfa]
 2: (()+0xfcb0) [0x7fb08cd14cb0]
 3: (gsignal()+0x35) [0x7fb08b8ea425]
 4: (abort()+0x17b) [0x7fb08b8edb8b]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fb08c23c69d]
 6: (()+0xb5846) [0x7fb08c23a846]
 7: (()+0xb5873) [0x7fb08c23a873]
 8: (()+0xb596e) [0x7fb08c23a96e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x1df) [0x64ffaf]
 10: (OSDMonitor::update_from_paxos(bool*)+0x29e7) [0x507c77]
 11: (PaxosService::refresh(bool*)+0x19b) [0x4ede7b]
 12: (Monitor::refresh_from_paxos(bool*)+0x57) [0x48e617]
 13: (Monitor::init_paxos()+0xf5) [0x48e7d5]
 14: (Monitor::preinit()+0x6ac) [0x4a4e6c]
 15: (main()+0x1c19) [0x4835c9]
 16: (__libc_start_main()+0xed) [0x7fb08b8d576d]
 17: /usr/bin/ceph-mon() [0x485eed]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

--- begin dump of recent events ---
0> 2013-07-24 09:42:57.973296 7fb08d67a780 -1 *** Caught signal
(Aborted) **
 in thread 7fb08d67a780

 ceph version 0.61.6 (59ddece17e36fef69ecf40e239aeffad33c9db35)
 1: /usr/bin/ceph-mon() [0x597cfa]
 2: (()+0xfcb0) [0x7fb08cd14cb0]
 3: (gsignal()+0x35) [0x7fb08b8ea425]
 4: (abort()+0x17b) [0x7fb08b8edb8b]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fb08c23c69d]
 6: (()+0xb5846) [0x7fb08c23a846]
 7: (()+0xb5873) [0x7fb08c23a873]
 8: (()+0xb596e) [0x7fb08c23a96e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x1df) [0x64ffaf]
 10: (OSDMonitor::update_from_paxos(bool*)+0x29e7) [0x507c77]
 11: (PaxosService::refresh(bool*)+0x19b) [0x4ede7b]
 12: (Monitor::refresh_from_paxos(bool*)+0x57) [0x48e617]
 13: (Monitor::init_paxos()+0xf5) [0x48e7d5]
 14: (Monitor::preinit()+0x6ac) [0x4a4e6c]
 15: (main()+0x1c19) [0x4835c9]
 16: (__libc_start_main()+0xed) [0x7fb08b8d576d]
 17: /usr/bin/ceph-mon() [0x485eed]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   0/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 hadoop
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-mon.ceph3.log
--- end dump of recent events ---

osd:

--- begin dump of recent events ---
0> 2013-07-24 09:21:34.087645 7f3c2fa7d700 -1 *** Caught signal
(Aborted) **
 in thread 7f3c2fa7d700

 ceph version 0.61.6 (59ddece17e36fef69ecf40e239aeffad33c9db35)
 1: /usr/bin/ceph-osd() [0x79219a]
 2: (()+0xfcb0) [0x7f3c40aaacb0]
 3: (gsignal()+0x35) [0x7f3c3f263425]
 4: (abort()+0x17b) [0x7f3c3f266b8b]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f3c3fbb569d]
 6: (()+0xb5846) [0x7f3c3fbb3846]
 7: (()+0xb5873) [0x7f3c3fbb3873]
 8: (()+0xb596e) [0x7f3c3fbb396e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x1df) [0x84303f]
 10: (OSDService::get_map(unsigned int)+0x428) [0x630c88]
 11: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&,
PG::RecoveryCtx*, std::set<boost::intrusive_ptr<PG>,
std::less<boost::intrusive_ptr<PG> >,
std::allocator<boost::intrusive_ptr<PG> > >*)+0x11d) [0x6327bd]
12: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*>
const&, ThreadPool::TPHandle&)+0x244) [0x632f14]
13: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> >
const&, ThreadPool::TPHandle&)+0x12) [0x66e2d2]
14: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x838476]
 15: (ThreadPool::WorkThread::entry()+0x10) [0x83a2a0]
 16: (()+0x7e9a) [0x7f3c40aa2e9a]
 17: (clone()+0x6d) [0x7f3c3f320ccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   0/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 hadoop
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.6.log
--- end dump of recent events ---
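
For anyone triaging a similar crash, the dumps above are long but repetitive: the useful parts are the `FAILED assert(...)` line and the named frames of the backtrace. The following is a minimal sketch, assuming the log lines look like the ones pasted in this thread, that pulls out just those parts. The function names `failed_asserts` and `frame_symbols` and the `SAMPLE` text are hypothetical helpers written for illustration and are not part of Ceph. Frames that the mail client wrapped across two lines are not matched, so run it against the original log file rather than a pasted copy.

```python
import re

# Matches assertion lines such as:
#   mon/OSDMonitor.cc: 167: FAILED assert(latest_bl.length() != 0)
ASSERT_RE = re.compile(
    r"^(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<expr>.+)\)\s*$"
)

# Matches symbolised backtrace frames such as:
#   10: (OSDMonitor::update_from_paxos(bool*)+0x29e7) [0x507c77]
FRAME_RE = re.compile(
    r"^\s*(?P<idx>\d+): \((?P<sym>.+)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]\s*$"
)


def failed_asserts(log_text):
    """Return distinct (file, line, expression) triples in first-seen order."""
    seen = []
    for raw in log_text.splitlines():
        m = ASSERT_RE.match(raw.strip())
        if m:
            item = (m.group("file"), int(m.group("line")), m.group("expr"))
            if item not in seen:  # the same assert is logged more than once
                seen.append(item)
    return seen


def frame_symbols(log_text):
    """Return the symbol of every named backtrace frame, in log order."""
    symbols = []
    for raw in log_text.splitlines():
        m = FRAME_RE.match(raw)
        # Frames printed as "(()+0x...)" carry no symbol name; skip them.
        if m and m.group("sym") != "()":
            symbols.append(m.group("sym"))
    return symbols


# Illustrative input only: a few lines copied from the monitor log above.
SAMPLE = """\
mon/OSDMonitor.cc: 167: FAILED assert(latest_bl.length() != 0)
 1: (OSDMonitor::update_from_paxos(bool*)+0x29e7) [0x507c77]
 2: (PaxosService::refresh(bool*)+0x19b) [0x4ede7b]
 7: (()+0xb5873) [0x7f26224ee873]
 8: /usr/bin/ceph-mon() [0x485eed]
mon/OSDMonitor.cc: 167: FAILED assert(latest_bl.length() != 0)
"""

if __name__ == "__main__":
    for path, line_no, expr in failed_asserts(SAMPLE):
        print(f"{path}:{line_no}: assert({expr})")
    for symbol in frame_symbols(SAMPLE):
        print(symbol)
```

To use it on a real log, read the file (for example the `log_file` path shown in the dump) into a string and pass that string to both functions. Reporting the assert line plus the first few named frames is usually enough to match a crash against an existing tracker issue.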

On 2013-07-24 06:47, Sage Weil wrote:
There was a problem with the monitor daemons in v0.61.5 that would
prevent them from restarting after some period of time. This release
fixes the bug and works around the issue to allow affected monitors to
restart. All v0.61.5 users are strongly recommended to upgrade.

Thanks everyone who helped track the problem down!

Notable changes:

 * mon: record latest full osdmap
 * mon: work around previous bug in which latest full osdmap was not
   recorded
 * mon: avoid scrub while paxos is updating

For more information please see the complete release notes:

 * http://ceph.com/docs/master/release-notes/#v0-61-6-cuttlefish

You can get v0.61.6 from the usual locations:

 * Git at git://github.com/ceph/ceph.git
 * Tarball at http://ceph.com/download/ceph-0.61.6.tar.gz
 * For Debian/Ubuntu packages, see http://ceph.com/docs/master/install/debian
 * For RPMs, see http://ceph.com/docs/master/install/rpm
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



