Re: v0.61.6 Cuttlefish update released


 



On 2013-07-25 15:21, Joao Eduardo Luis wrote:
On 07/25/2013 11:20 AM, peter@xxxxxxxxx wrote:
On 2013-07-25 12:08, Wido den Hollander wrote:
On 07/25/2013 12:01 PM, peter@xxxxxxxxx wrote:
On 2013-07-25 11:52, Wido den Hollander wrote:
On 07/25/2013 11:46 AM, peter@xxxxxxxxx wrote:
Any news on this? I'm not sure if you guys received the link to the log
and monitor files. One monitor and one OSD are still crashing with the
error below.

I think you are seeing this issue: http://tracker.ceph.com/issues/5737

You can try with new packages from here:
http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/wip-5737-cuttlefish/



That should resolve it.

Wido

Hi Wido,

This is the same issue I reported earlier with 0.61.5. I applied the
above package and the problem was solved. Then 0.61.6 was released with
a fix for this issue. I installed 0.61.6 and the issue is back on one of
my monitors, and I have one OSD crashing. So it seems the bug is still
there in 0.61.6, or it is a new bug. It seems the guys from Inktank
haven't picked this up yet.


It has been picked up, Sage mentioned this yesterday on the dev list:

"This is fixed in the cuttlefish branch as of earlier this afternoon.
I've spent most of the day expanding the automated test suite to
include upgrade combinations to trigger this and *finally* figured out
that this particular problem seems to surface on clusters that
upgraded from bobtail-> cuttlefish but not clusters created on
cuttlefish.

If you've run into this issue, please use the cuttlefish branch build
for now.  We will have a release out in the next day or so that
includes this and a few other pending fixes.

I'm sorry we missed this one!  The upgrade test matrix I've been
working on today should catch this type of issue in the future."

Wido

Regards,

We created this cluster on cuttlefish and not on bobtail, so that
doesn't apply. I'm not sure whether I'm being unclear or missing
something here, but I still see this issue either way :-)

I will check out the dev list also but perhaps someone from Inktank can
at least look at the files I provided.

Peter,

We did take a look at your files (thanks a lot btw!), and as of last
night's patches (which are now on the cuttlefish branch), your store
worked just fine.

As Sage mentioned on ceph-devel, one of the issues would only happen
on a bobtail -> cuttlefish cluster.  That is not your issue though.  I
believe Sage meant the FAILED assert(latest_full > 0) -- i.e., the one
reported on #5737.

Your issue, however, was caused by a bug in a patch meant to fix #5704.
It caused an on-disk key to be updated erroneously with a value for a
version that did not yet exist at the time update_from_paxos() was
called.  In a nutshell, one of the latest patches (see
115468c73f121653eec2efc030d5ba998d834e43) fixed that issue and another
patch (see 27f31895664fa7f10c1617d486f2a6ece0f97091) worked around it.
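
As a rough illustration of the failure mode described above, here is a toy
model in Python. It is not Ceph's code: the dictionary standing in for the
monitor store and the key names "osdmap:latest_full" and "osdmap:full_<N>"
are made up for the example. It only shows why a pointer key that names a
version with no data on disk would trip a check shaped like
assert(latest_bl.length() != 0).

```python
# Toy model only; the store layout and key names are hypothetical.

def update_from_store(store):
    """Load the blob that the 'latest full' pointer key refers to."""
    latest_full = store["osdmap:latest_full"]
    latest_bl = store.get(f"osdmap:full_{latest_full}", b"")
    # Same shape as the failing check: the pointer must name a version
    # whose data has actually been written.
    assert len(latest_bl) != 0, f"no data stored for version {latest_full}"
    return latest_bl


# Consistent store: the pointer names a version that exists.
good = {"osdmap:latest_full": 41, "osdmap:full_41": b"map-v41"}

# Inconsistent store: the pointer was advanced to 42 before any data
# for version 42 was written.
bad = {"osdmap:latest_full": 42, "osdmap:full_41": b"map-v41"}
```

Calling update_from_store(good) returns the stored blob, while
update_from_store(bad) raises AssertionError.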

A point release should come out soon, but in the meantime the
cuttlefish branch should be safe to use.

If you run into any other issues, please let us know.

  -Joao

Hi Joao,

I installed the packages from that branch but I still see the same crashes:

root@ceph3:~/ceph# ceph-mon -v
ceph version 0.61.6-1-g28720b0 (28720b0b4d55ef98f3b7d0855b18339e75f759e3)
root@ceph3:~/ceph# ceph-osd -v
ceph version 0.61.6-1-g28720b0 (28720b0b4d55ef98f3b7d0855b18339e75f759e3)

Both the monitor and one of the three OSDs (on that host) still crash on startup. I must be doing something wrong if it works for you...
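
For reference, the version strings above appear to follow the usual
"git describe" layout: release tag, number of commits after the tag, and an
abbreviated commit hash prefixed with "g", followed by the full hash in
parentheses. A small Python sketch for pulling those fields apart is below.
The regular expression and the function name parse_ceph_version are
assumptions made for this example, based only on the two lines shown here.

```python
import re

# Assumes the "<release>-<commits>-g<short hash> (<full hash>)" layout
# seen in the output above; plain release builds omit the middle part.
VERSION_RE = re.compile(
    r"ceph version (?P<release>\d+(?:\.\d+)*)"
    r"(?:-(?P<commits>\d+)-g(?P<short>[0-9a-f]+))?"
    r" \((?P<full>[0-9a-f]{40})\)"
)


def parse_ceph_version(line):
    """Split a `ceph-mon -v` / `ceph-osd -v` line into its parts."""
    m = VERSION_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognised version line: {line!r}")
    info = m.groupdict()
    # Number of commits on top of the release tag (0 for a tagged build).
    info["commits"] = int(info["commits"]) if info["commits"] else 0
    return info


line = "ceph version 0.61.6-1-g28720b0 (28720b0b4d55ef98f3b7d0855b18339e75f759e3)"
info = parse_ceph_version(line)
```

The string only identifies which commit a binary was built from. Whether a
given fix commit is part of that build has to be checked against the
repository history.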

OSD:

--- begin dump of recent events ---
0> 2013-07-25 15:35:32.563404 7f8172241700 -1 *** Caught signal (Aborted) **
 in thread 7f8172241700

ceph version 0.61.6-1-g28720b0 (28720b0b4d55ef98f3b7d0855b18339e75f759e3)
 1: /usr/bin/ceph-osd() [0x79430a]
 2: (()+0xfcb0) [0x7f81833e1cb0]
 3: (gsignal()+0x35) [0x7f81814af425]
 4: (abort()+0x17b) [0x7f81814b2b8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f8181e0169d]
 6: (()+0xb5846) [0x7f8181dff846]
 7: (()+0xb5873) [0x7f8181dff873]
 8: (()+0xb596e) [0x7f8181dff96e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1df) [0x84618f]
 10: (OSDService::get_map(unsigned int)+0x428) [0x63bc48]
 11: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PG::RecoveryCtx*, std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >*)+0x11d) [0x63d77d]
 12: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x244) [0x63ded4]
 13: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x12) [0x678c52]
 14: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x83b5c6]
 15: (ThreadPool::WorkThread::entry()+0x10) [0x83d3f0]
 16: (()+0x7e9a) [0x7f81833d9e9a]
 17: (clone()+0x6d) [0x7f818156cccd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   0/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 hadoop
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.6.log
--- end dump of recent events ---

MON:

--- end dump of recent events ---
2013-07-25 15:34:14.171423 7fca27f0e780 0 ceph version 0.61.6-1-g28720b0 (28720b0b4d55ef98f3b7d0855b18339e75f759e3), process ceph-mon, pid 18182
2013-07-25 15:34:14.181109 7fca27f0e780 1 mon.ceph3@-1(probing) e1 preinit fsid 97e515bb-d334-4fa7-8b53-7d85615809fd
2013-07-25 15:34:14.203909 7fca27f0e780 -1 mon/OSDMonitor.cc: In function 'virtual void OSDMonitor::update_from_paxos(bool*)' thread 7fca27f0e780 time 2013-07-25 15:34:14.201568
mon/OSDMonitor.cc: 167: FAILED assert(latest_bl.length() != 0)

ceph version 0.61.6-1-g28720b0 (28720b0b4d55ef98f3b7d0855b18339e75f759e3)
 1: (OSDMonitor::update_from_paxos(bool*)+0x29e7) [0x512b07]
 2: (PaxosService::refresh(bool*)+0x19b) [0x4f925b]
 3: (Monitor::refresh_from_paxos(bool*)+0x57) [0x49a5b7]
 4: (Monitor::init_paxos()+0xe5) [0x49a755]
 5: (Monitor::preinit()+0x6ac) [0x4b0a6c]
 6: (main()+0x1c19) [0x48f799]
 7: (__libc_start_main()+0xed) [0x7fca25fc276d]
 8: /usr/bin/ceph-mon() [0x4920bd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
-25> 2013-07-25 15:34:14.169227 7fca27f0e780 5 asok(0x18fe000) register_command perfcounters_dump hook 0x18f3010
-24> 2013-07-25 15:34:14.169262 7fca27f0e780 5 asok(0x18fe000) register_command 1 hook 0x18f3010
-23> 2013-07-25 15:34:14.169267 7fca27f0e780 5 asok(0x18fe000) register_command perf dump hook 0x18f3010
-22> 2013-07-25 15:34:14.169275 7fca27f0e780 5 asok(0x18fe000) register_command perfcounters_schema hook 0x18f3010
-21> 2013-07-25 15:34:14.169278 7fca27f0e780 5 asok(0x18fe000) register_command 2 hook 0x18f3010
-20> 2013-07-25 15:34:14.169280 7fca27f0e780 5 asok(0x18fe000) register_command perf schema hook 0x18f3010
-19> 2013-07-25 15:34:14.169284 7fca27f0e780 5 asok(0x18fe000) register_command config show hook 0x18f3010
-18> 2013-07-25 15:34:14.169287 7fca27f0e780 5 asok(0x18fe000) register_command config set hook 0x18f3010
-17> 2013-07-25 15:34:14.169290 7fca27f0e780 5 asok(0x18fe000) register_command log flush hook 0x18f3010
-16> 2013-07-25 15:34:14.169292 7fca27f0e780 5 asok(0x18fe000) register_command log dump hook 0x18f3010
-15> 2013-07-25 15:34:14.169295 7fca27f0e780 5 asok(0x18fe000) register_command log reopen hook 0x18f3010
-14> 2013-07-25 15:34:14.171423 7fca27f0e780 0 ceph version 0.61.6-1-g28720b0 (28720b0b4d55ef98f3b7d0855b18339e75f759e3), process ceph-mon, pid 18182
-13> 2013-07-25 15:34:14.171523 7fca27f0e780 5 asok(0x18fe000) init /var/run/ceph/ceph-mon.ceph3.asok
-12> 2013-07-25 15:34:14.171540 7fca27f0e780 5 asok(0x18fe000) bind_and_listen /var/run/ceph/ceph-mon.ceph3.asok
-11> 2013-07-25 15:34:14.171571 7fca27f0e780 5 asok(0x18fe000) register_command 0 hook 0x18f2030
-10> 2013-07-25 15:34:14.171576 7fca27f0e780 5 asok(0x18fe000) register_command version hook 0x18f2030
-9> 2013-07-25 15:34:14.171580 7fca27f0e780 5 asok(0x18fe000) register_command git_version hook 0x18f2030
-8> 2013-07-25 15:34:14.171583 7fca27f0e780 5 asok(0x18fe000) register_command help hook 0x18f3050
-7> 2013-07-25 15:34:14.173270 7fca24d85700 5 asok(0x18fe000) entry start
-6> 2013-07-25 15:34:14.180955 7fca27f0e780 1 -- 10.255.0.30:6789/0 learned my addr 10.255.0.30:6789/0
-5> 2013-07-25 15:34:14.180994 7fca27f0e780 1 accepter.accepter.bind my_inst.addr is 10.255.0.30:6789/0 need_addr=0
-4> 2013-07-25 15:34:14.181041 7fca27f0e780 5 adding auth protocol: cephx
-3> 2013-07-25 15:34:14.181054 7fca27f0e780 5 adding auth protocol: cephx
-2> 2013-07-25 15:34:14.181109 7fca27f0e780 1 mon.ceph3@-1(probing) e1 preinit fsid 97e515bb-d334-4fa7-8b53-7d85615809fd
-1> 2013-07-25 15:34:14.201153 7fca27f0e780 4 mon.ceph3@-1(probing).mds e228053 new map
0> 2013-07-25 15:34:14.203909 7fca27f0e780 -1 mon/OSDMonitor.cc: In function 'virtual void OSDMonitor::update_from_paxos(bool*)' thread 7fca27f0e780 time 2013-07-25 15:34:14.201568
mon/OSDMonitor.cc: 167: FAILED assert(latest_bl.length() != 0)

ceph version 0.61.6-1-g28720b0 (28720b0b4d55ef98f3b7d0855b18339e75f759e3)
 1: (OSDMonitor::update_from_paxos(bool*)+0x29e7) [0x512b07]
 2: (PaxosService::refresh(bool*)+0x19b) [0x4f925b]
 3: (Monitor::refresh_from_paxos(bool*)+0x57) [0x49a5b7]
 4: (Monitor::init_paxos()+0xe5) [0x49a755]
 5: (Monitor::preinit()+0x6ac) [0x4b0a6c]
 6: (main()+0x1c19) [0x48f799]
 7: (__libc_start_main()+0xed) [0x7fca25fc276d]
 8: /usr/bin/ceph-mon() [0x4920bd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   0/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 hadoop
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-mon.ceph3.log
--- end dump of recent events ---
2013-07-25 15:34:14.210161 7fca27f0e780 -1 *** Caught signal (Aborted) **
 in thread 7fca27f0e780

ceph version 0.61.6-1-g28720b0 (28720b0b4d55ef98f3b7d0855b18339e75f759e3)
 1: /usr/bin/ceph-mon() [0x59f1ca]
 2: (()+0xfcb0) [0x7fca27aeccb0]
 3: (gsignal()+0x35) [0x7fca25fd7425]
 4: (abort()+0x17b) [0x7fca25fdab8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fca2692969d]
 6: (()+0xb5846) [0x7fca26927846]
 7: (()+0xb5873) [0x7fca26927873]
 8: (()+0xb596e) [0x7fca2692796e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1df) [0x65873f]
 10: (OSDMonitor::update_from_paxos(bool*)+0x29e7) [0x512b07]
 11: (PaxosService::refresh(bool*)+0x19b) [0x4f925b]
 12: (Monitor::refresh_from_paxos(bool*)+0x57) [0x49a5b7]
 13: (Monitor::init_paxos()+0xe5) [0x49a755]
 14: (Monitor::preinit()+0x6ac) [0x4b0a6c]
 15: (main()+0x1c19) [0x48f799]
 16: (__libc_start_main()+0xed) [0x7fca25fc276d]
 17: /usr/bin/ceph-mon() [0x4920bd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
0> 2013-07-25 15:34:14.210161 7fca27f0e780 -1 *** Caught signal (Aborted) **
 in thread 7fca27f0e780

ceph version 0.61.6-1-g28720b0 (28720b0b4d55ef98f3b7d0855b18339e75f759e3)
 1: /usr/bin/ceph-mon() [0x59f1ca]
 2: (()+0xfcb0) [0x7fca27aeccb0]
 3: (gsignal()+0x35) [0x7fca25fd7425]
 4: (abort()+0x17b) [0x7fca25fdab8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fca2692969d]
 6: (()+0xb5846) [0x7fca26927846]
 7: (()+0xb5873) [0x7fca26927873]
 8: (()+0xb596e) [0x7fca2692796e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1df) [0x65873f]
 10: (OSDMonitor::update_from_paxos(bool*)+0x29e7) [0x512b07]
 11: (PaxosService::refresh(bool*)+0x19b) [0x4f925b]
 12: (Monitor::refresh_from_paxos(bool*)+0x57) [0x49a5b7]
 13: (Monitor::init_paxos()+0xe5) [0x49a755]
 14: (Monitor::preinit()+0x6ac) [0x4b0a6c]
 15: (main()+0x1c19) [0x48f799]
 16: (__libc_start_main()+0xed) [0x7fca25fc276d]
 17: /usr/bin/ceph-mon() [0x4920bd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   0/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 hadoop
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-mon.ceph3.log
--- end dump of recent events ---
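
Since the same assertion shows up several times in a dump like the one
above, a short Python sketch for listing each distinct failed assert once
may help when triaging such logs. The regular expression is an assumption
based on the "file: line: FAILED assert(expr)" lines in this message, and
failed_asserts is a name chosen for the example, not part of any Ceph tool.

```python
import re

# Matches lines such as:
#   mon/OSDMonitor.cc: 167: FAILED assert(latest_bl.length() != 0)
ASSERT_RE = re.compile(
    r"(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)\s*$"
)


def failed_asserts(log_text):
    """Return distinct (file, line, expression) tuples in order of appearance."""
    seen = []
    for raw in log_text.splitlines():
        m = ASSERT_RE.search(raw)
        if m is None:
            continue
        hit = (m.group("file"), int(m.group("line")), m.group("expr"))
        if hit not in seen:
            seen.append(hit)
    return seen


sample = """\
mon/OSDMonitor.cc: 167: FAILED assert(latest_bl.length() != 0)
 1: (OSDMonitor::update_from_paxos(bool*)+0x29e7) [0x512b07]
mon/OSDMonitor.cc: 167: FAILED assert(latest_bl.length() != 0)
"""
hits = failed_asserts(sample)
```

To run it against a real log, read the file into a string first, for
example with pathlib.Path("/var/log/ceph/ceph-mon.ceph3.log").read_text().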





