ceph 0.78 mon and mds crashing (bus error)

Kenneth.Waegeman at UGent.be (Kenneth Waegeman) · Tue, 01 Apr 2014 18:02:15 +0200

----- Message from "Yan, Zheng" <ukernel at gmail.com> ---------
    Date: Tue, 1 Apr 2014 22:12:15 +0800
    From: "Yan, Zheng" <ukernel at gmail.com>
Subject: Re: ceph 0.78 mon and mds crashing (bus error)
      To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be>
      Cc: Joao Eduardo Luis <joao.luis at inktank.com>, ceph-users  
<ceph-users at lists.ceph.com>

> On Tue, Apr 1, 2014 at 10:02 PM, Kenneth Waegeman
> <Kenneth.Waegeman at ugent.be> wrote:
>> After some more searching, I've found that the source of the problem is with
>> the mds and not the mon.. The mds crashes, generates a core dump that eats
>> the local space, and in turn the monitor (because of leveldb) crashes.
>>
>> The error in the mds log of one host:
>>
>> 2014-04-01 15:46:34.414615 7f870e319700  0 -- 10.141.8.180:6836/13152 >>
>> 10.141.8.180:6789/0 pipe(0x517371180 sd=54 :42439 s=4 pgs=0 cs=0 l=1
>> c=0x147ac780).connect got RESETSESSION but no longer connecting
>> 2014-04-01 15:46:34.438792 7f871194f700  0 -- 10.141.8.180:6836/13152 >>
>> 10.141.8.180:6789/0 pipe(0x1b099f580 sd=8 :43150 s=4 pgs=0 cs=0 l=1
>> c=0x1fd44360).connect got RESETSESSION but no longer connecting
>> 2014-04-01 15:46:34.439028 7f870e319700  0 -- 10.141.8.180:6836/13152 >>
>> 10.141.8.182:6789/0 pipe(0x13aa64880 sd=54 :37085 s=4 pgs=0 cs=0 l=1
>> c=0x1fd43de0).connect got RESETSESSION but no longer connecting
>> 2014-04-01 15:46:34.468257 7f871b7ae700 -1 mds/CDir.cc: In function 'void
>> CDir::_omap_fetched(ceph::bufferlist&, std::map<std::basic_string<char,
>> std::char_traits<char>, std::allocator<char> >, ceph::buffer::list,
>> std::less<std::basic_string<char, std::char_traits<char>,
>> std::allocator<char> > >, std::allocator<std::pair<const
>> std::basic_string<char, std::char_traits<char>, std::allocator<char> >,
>> ceph::buffer::list> > >&, const std::string&, int)' thread 7f871b7ae700 time
>> 2014-04-01 15:46:34.448320
>> mds/CDir.cc: 1474: FAILED assert(r == 0 || r == -2 || r == -61)
>>
>
> could you use gdb to check what is value of variable 'r' .
>
I'm not very experienced with gdb, but I think I'm doing it right.
But when I run gdb /usr/bin/ceph-mds core.ceph-mds.37714 I get  :

'Reading symbols from /usr/bin/ceph-mds...(no debugging symbols  
found)...done.'
(So print r gives 'No symbol table is loaded.  Use the "file" command.')

and

Missing separate debuginfos, use: debuginfo-install ceph-0.78-0.el6.x86_64

But that package doesn't seem to exists for this version..(only 0.68  
on http://ceph.com/rpm-testing/el6/x86_64/)

Kind regards,
Kenneth

> Regards
> Yan, Zheng
>
>
>
>
>>  ceph version 0.78 (f6c746c314d7b87b8419b6e584c94bfe4511dbd4)
>>  1: (CDir::_omap_fetched(ceph::buffer::list&, std::map<std::string,
>> ceph::buffer::list, std::less<std::string>,
>> std::allocator<std::pair<std::string const, ceph::buffer::list> > >&,
>> std::string const&, int)+0x4d71) [0x77c3c1]
>>  2: (Context::complete(int)+0x9) [0x56bb79]
>>  3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x11a6) [0x806dd6]
>>  4: (MDS::handle_core_message(Message*)+0x9c7) [0x5901d7]
>>  5: (MDS::_dispatch(Message*)+0x2f) [0x59028f]
>>  6: (MDS::ms_dispatch(Message*)+0x1ab) [0x591d4b]
>>  7: (DispatchQueue::entry()+0x582) [0x902072]
>>  8: (DispatchQueue::DispatchThread::entry()+0xd) [0x85ef4d]
>>  9: /lib64/libpthread.so.0() [0x34c36079d1]
>>  10: (clone()+0x6d) [0x34c32e8b6d]
>>
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
>> interpret this.
>>
>> --- begin dump of recent events ---
>> -10000> 2014-04-01 15:46:34.313794 7f871b7ae700  1 --
>> 10.141.8.180:6836/13152 --> 10.141.8.182:6789/0 -- auth(proto 0 42 bytes
>> epoch 1) v1 -- ?+0 0x5b301c800 con 0x377e5c0
>>  -9999> 2014-04-01 15:46:34.313811 7f871b7ae700 10 monclient(hunting):
>> renew_subs
>>  -9998> 2014-04-01 15:46:34.313829 7f871b7ae700  1 --
>> 10.141.8.180:6836/13152 <== osd.3 10.141.8.180:6827/4366 33052 ====
>> osd_op_reply(4176829 10000c23f15.00000000 [getxattr] v0'0 uv0 ack = -108
>> (Cannot send after transport endpoint shutdown)) v6 ==== 187+0+0 (3499028751
>> 0 0) 0x51a1fb200 con 0x37ea3c0
>>  -9997> 2014-04-01 15:46:34.313858 7f871b7ae700  5 mds.0.16 is_laggy
>> 600.641332 > 15 since last acked beacon
>>  -9996> 2014-04-01 15:46:34.313865 7f871b7ae700  5 mds.0.16 initiating
>> monitor reconnect; maybe we're not the slow one
>>  -9995> 2014-04-01 15:46:34.313867 7f871b7ae700 10 monclient(hunting):
>> _reopen_session rank -1 name
>>  -9994> 2014-04-01 15:46:34.313872 7f871b7ae700  1 --
>> 10.141.8.180:6836/13152 mark_down 0x377e5c0 -- 0xae9cb480
>>  -9993> 2014-04-01 15:46:34.313888 7f870e319700  2 --
>> 10.141.8.180:6836/13152 >> 10.141.8.182:6789/0 pipe(0xae9cb480 sd=54 :0 s=4
>> pgs=0 cs=0 l=1 c=0x377e5c0).connect couldn't read banner, Success
>>  -9992> 2014-04-01 15:46:34.313902 7f870e319700  3 --
>> 10.141.8.180:6836/13152 >> 10.141.8.182:6789/0 pipe(0xae9cb480 sd=54 :0 s=4
>> pgs=0 cs=0 l=1 c=0x377e5c0).connect fault, but state = closed != connecting,
>> stopping
>>  -9991> 2014-04-01 15:46:34.313912 7f871b7ae700 10 monclient(hunting):
>> picked mon.ceph001 con 0x377aec0 addr 10.141.8.180:6789/0
>>  -9990> 2014-04-01 15:46:34.313923 7f871b7ae700 10 monclient(hunting):
>> _send_mon_message to mon.ceph001 at 10.141.8.180:6789/0
>>  -9989> 2014-04-01 15:46:34.313927 7f871b7ae700  1 --
>> 10.141.8.180:6836/13152 --> 10.141.8.180:6789/0 -- auth(proto 0 42 bytes
>> epoch 1) v1 -- ?+0 0x5b301f500 con 0x377aec0
>>  -9988> 2014-04-01 15:46:34.313935 7f871b7ae700 10 monclient(hunting):
>> renew_subs
>>  -9987> 2014-04-01 15:46:34.313944 7f871b7ae700  1 --
>> 10.141.8.180:6836/13152 <== osd.2 10.141.8.180:6800/2233 33052 ====
>> osd_op_reply(2116129 100003f7763.00000000 [getxattr] v0'0 uv0 ack = -108
>> (Cannot send after transport endpoint shutdown)) v6 ==== 187+0+0 (567486810
>> 0 0) 0x4ea1b9e00 con 0x37ebb20
>> .....
>> .....
>>    -19> 2014-04-01 15:46:34.440672 7f871b7ae700  1 --
>> 10.141.8.180:6836/13152 mark_down 0x4eff853e0 -- 0xa8693980
>>    -18> 2014-04-01 15:46:34.440684 7f871194f700  2 --
>> 10.141.8.180:6836/13152 >> 10.141.8.182:6789/0 pipe(0xa8693980 sd=54 :0 s=4
>> pgs=0 cs=0 l=1 c=0x4eff853e0).connect couldn't read banner, Success
>>    -17> 2014-04-01 15:46:34.440697 7f871b7ae700 10 monclient(hunting):
>> picked mon.ceph001 con 0x4eff82260 addr 10.141.8.180:6789/0
>>    -16> 2014-04-01 15:46:34.440703 7f871b7ae700 10 monclient(hunting):
>> _send_mon_message to mon.ceph001 at 10.141.8.180:6789/0
>>    -15> 2014-04-01 15:46:34.440701 7f871194f700  3 --
>> 10.141.8.180:6836/13152 >> 10.141.8.182:6789/0 pipe(0xa8693980 sd=54 :0 s=4
>> pgs=0 cs=0 l=1 c=0x4eff853e0).connect fault, but state = closed !=
>> connecting, stopping
>>    -14> 2014-04-01 15:46:34.440708 7f871b7ae700  1 --
>> 10.141.8.180:6836/13152 --> 10.141.8.180:6789/0 -- auth(proto 0 42 bytes
>> epoch 1) v1 -- ?+0 0x5e77ec140 con 0x4eff82260
>>    -13> 2014-04-01 15:46:34.440716 7f871b7ae700 10 monclient(hunting):
>> renew_subs
>>    -12> 2014-04-01 15:46:34.440746 7f871b7ae700  1 --
>> 10.141.8.180:6836/13152 <== osd.10 10.141.8.180:6806/2663 33077 ====
>> osd_op_reply(1698429 100003917b8.00000000 [getxattr] v0'0 uv0 ack = -108
>> (Cannot send after transport endpoint shutd
>> own)) v6 ==== 187+0+0 (1851692513 0 0) 0x369dc7080 con 0x37ea680
>>    -11> 2014-04-01 15:46:34.440770 7f871b7ae700  5 mds.0.16 is_laggy
>> 600.768243 > 15 since last acked beacon
>>    -10> 2014-04-01 15:46:34.440777 7f871b7ae700  5 mds.0.16 initiating
>> monitor reconnect; maybe we're not the slow one
>>     -9> 2014-04-01 15:46:34.440779 7f871b7ae700 10 monclient(hunting):
>> _reopen_session rank -1 name
>>     -8> 2014-04-01 15:46:34.440781 7f871b7ae700  1 --
>> 10.141.8.180:6836/13152 mark_down 0x4eff82260 -- 0xc3287800
>>     -7> 2014-04-01 15:46:34.440796 7f871063c700  2 --
>> 10.141.8.180:6836/13152 >> 10.141.8.180:6789/0 pipe(0xc3287800 sd=8 :43167
>> s=4 pgs=0 cs=0 l=1 c=0x4eff82260).connect couldn't write my addr,
>> Interrupted system call
>>     -6> 2014-04-01 15:46:34.440810 7f871063c700  3 --
>> 10.141.8.180:6836/13152 >> 10.141.8.180:6789/0 pipe(0xc3287800 sd=8 :43167
>> s=4 pgs=0 cs=0 l=1 c=0x4eff82260).connect fault, but state = closed !=
>> connecting, stopping
>>     -5> 2014-04-01 15:46:34.440822 7f871b7ae700 10 monclient(hunting):
>> picked mon.ceph002 con 0x4eff865c0 addr 10.141.8.181:6789/0
>>     -4> 2014-04-01 15:46:34.440832 7f871b7ae700 10 monclient(hunting):
>> _send_mon_message to mon.ceph002 at 10.141.8.181:6789/0
>>     -3> 2014-04-01 15:46:34.440837 7f871b7ae700  1 --
>> 10.141.8.180:6836/13152 --> 10.141.8.181:6789/0 -- auth(proto 0 42 bytes
>> epoch 1) v1 -- ?+0 0x5e77e8900 con 0x4eff865c0
>>     -2> 2014-04-01 15:46:34.440848 7f871b7ae700 10 monclient(hunting):
>> renew_subs
>>     -1> 2014-04-01 15:46:34.440860 7f871b7ae700  1 --
>> 10.141.8.180:6836/13152 <== osd.3 10.141.8.180:6827/4366 33077 ====
>> osd_op_reply(4179177 100001f2ef1.00000000 [omap-get-header 0~0,omap-get-vals
>> 0~16] v0'0 uv0 ack = -108 (Cannot send a
>> fter transport endpoint shutdown)) v6 ==== 229+0+0 (958358678 0 0)
>> 0x2cff7aa80 con 0x37ea3c0
>>      0> 2014-04-01 15:46:34.468257 7f871b7ae700 -1 mds/CDir.cc: In function
>> 'void CDir::_omap_fetched(ceph::bufferlist&,
>> std::map<std::basic_string<char, std::char_traits<char>,
>> std::allocator<char> >, ceph::buffer::list, std::less<std::ba
>> sic_string<char, std::char_traits<char>, std::allocator<char> > >,
>> std::allocator<std::pair<const std::basic_string<char,
>> std::char_traits<char>, std::allocator<char> >, ceph::buffer::list> > >&,
>> const std::string&, int)' thread 7f871b7ae7
>> 00 time 2014-04-01 15:46:34.448320
>> mds/CDir.cc: 1474: FAILED assert(r == 0 || r == -2 || r == -61)
>> .....
>>
>> 2014-04-01 15:46:34.557614 7f871b7ae700 -1 *** Caught signal (Aborted) **
>>  in thread 7f871b7ae700
>>
>>  ceph version 0.78 (f6c746c314d7b87b8419b6e584c94bfe4511dbd4)
>>  1: /usr/bin/ceph-mds() [0x827909]
>>  2: /lib64/libpthread.so.0() [0x34c360f710]
>>  3: (gsignal()+0x35) [0x34c3232925]
>>  4: (abort()+0x175) [0x34c3234105]
>>  5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x34c3abea5d]
>>
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
>> interpret this.
>>
>>
>> Thanks again!
>>
>>
>>
>>
>> ----- Message from Kenneth Waegeman <Kenneth.Waegeman at UGent.be> ---------
>>    Date: Tue, 01 Apr 2014 14:36:20 +0200
>>    From: Kenneth Waegeman <Kenneth.Waegeman at UGent.be>
>>
>> Subject: Re: ceph 0.78 mon and mds crashing (bus error)
>>      To: Joao Eduardo Luis <joao.luis at inktank.com>
>>      Cc: ceph-users <ceph-users at lists.ceph.com>
>>
>>
>>
>>> ----- Message from Joao Eduardo Luis <joao.luis at inktank.com> ---------
>>>    Date: Tue, 01 Apr 2014 13:33:05 +0100
>>>    From: Joao Eduardo Luis <joao.luis at inktank.com>
>>> Subject: Re: ceph 0.78 mon and mds crashing (bus error)
>>>      To: Kenneth Waegeman <Kenneth.Waegeman at UGent.be>, ceph-users
>>> <ceph-users at lists.ceph.com>
>>>
>>>
>>>> On 04/01/2014 01:17 PM, Kenneth Waegeman wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> We have installed Ceph 0.78 on 3 hosts running SL6, each serving OSD and
>>>>> MON daemons, 2 of them running MDS in active/backup. Since today the MON
>>>>> and MDS daemons are crashing every time (after a short while). Rebooting
>>>>> the nodes didn't help. The cluster was running for 1-2 weeks without
>>>>> this problem. This is what's in the log files:
>>>>>
>>>>
>>>> Something is going wrong on leveldb.  What leveldb version are you using?
>>>
>>>
>>> This is leveldb-1.7.0-2.el6.x86_64
>>>>
>>>>
>>>>  -Joao
>>>>
>>>>>
>>>>>
>>>>> 2014-04-01 12:48:09.442819 7fa39cb1d700  0 log [INF] : mdsmap e67: 1/1/1
>>>>> up {0=ceph001.cubone.os=up:active(laggy or crashed)}
>>>>> 2014-04-01 12:48:09.442897 7fa39cb1d700  1
>>>>> mon.ceph001 at 0(leader).paxos(paxos active c 504009..504758) is_readable
>>>>> now=2014-04-01 12:48:09.442898 lease_expire=2014-04-01 12:48:14.442854
>>>>> has v0 lc 504758
>>>>> 2014-04-01 12:48:15.340419 7fa39d51e700  1
>>>>> mon.ceph001 at 0(leader).paxos(paxos updating c 504009..504758) is_readable
>>>>> now=2014-04-01 12:48:15.340420 lease_expire=2014-04-01 12:48:14.442854
>>>>> has v0 lc 504758
>>>>> 2014-04-01 12:48:15.340438 7fa39d51e700  1
>>>>> mon.ceph001 at 0(leader).paxos(paxos updating c 504009..504758) is_readable
>>>>> now=2014-04-01 12:48:15.340439 lease_expire=2014-04-01 12:48:14.442854
>>>>> has v0 lc 504758
>>>>> 2014-04-01 12:48:15.340443 7fa39d51e700  1
>>>>> mon.ceph001 at 0(leader).paxos(paxos updating c 504009..504758) is_readable
>>>>> now=2014-04-01 12:48:15.340444 lease_expire=2014-04-01 12:48:14.442854
>>>>> has v0 lc 504758
>>>>> 2014-04-01 12:48:22.260674 7fa39d51e700  1
>>>>> mon.ceph001 at 0(leader).paxos(paxos active c 504009..504759) is_readable
>>>>> now=2014-04-01 12:48:22.260676 lease_expire=2014-04-01 12:48:27.260625
>>>>> has v0 lc 504759
>>>>> 2014-04-01 12:48:22.260688 7fa39d51e700  1
>>>>> mon.ceph001 at 0(leader).paxos(paxos active c 504009..504759) is_readable
>>>>> now=2014-04-01 12:48:22.260689 lease_expire=2014-04-01 12:48:27.260625
>>>>> has v0 lc 504759
>>>>> 2014-04-01 12:48:22.260703 7fa39d51e700  1
>>>>> mon.ceph001 at 0(leader).paxos(paxos active c 504009..504759) is_readable
>>>>> now=2014-04-01 12:48:22.260704 lease_expire=2014-04-01 12:48:27.260625
>>>>> has v0 lc 504759
>>>>> 2014-04-01 12:48:22.292363 7fa39cb1d700  0 log [INF] : mon.ceph001
>>>>> calling new monitor election
>>>>> 2014-04-01 12:48:22.292428 7fa39cb1d700  1
>>>>> mon.ceph001 at 0(electing).elector(43) init, la2014-04-01 12:48:27.450606
>>>>> 7fa39d51e700 -1 *** Caught signal (Bus error) **
>>>>> in thread 7fa39d51e700
>>>>>
>>>>> ceph version 0.78 (f6c746c314d7b87b8419b6e584c94bfe4511dbd4)
>>>>> 1: /usr/bin/ceph-mon() [0x85e619]
>>>>> 2: /lib64/libpthread.so.0() [0x34c360f710]
>>>>> 3: (memcpy()+0x15b) [0x34c3289aab]
>>>>> 4: (()+0x3d4b2) [0x7fa3a0ae54b2]
>>>>> 5: (leveldb::log::Writer::EmitPhysicalRecord(leveldb::log::RecordType,
>>>>> char const*, unsigned long)+0x125) [0x7fa3a0acbe85]
>>>>> 6: (leveldb::log::Writer::AddRecord(leveldb::Slice const&)+0xe2)
>>>>> [0x7fa3a0acc092]
>>>>> 7: (leveldb::DBImpl::Write(leveldb::WriteOptions const&,
>>>>> leveldb::WriteBatch*)+0x38e) [0x7fa3a0ac0bde]
>>>>> 8:
>>>>>
>>>>> (LevelDBStore::submit_transaction_sync(std::tr1::shared_ptr<KeyValueDB::TransactionImpl>)+0x2b)
>>>>> [0x7c63eb]
>>>>> 9: (MonitorDBStore::apply_transaction(MonitorDBStore::Transaction
>>>>> const&)+0x183) [0x53c6b3]
>>>>> 10: (Paxos::begin(ceph::buffer::list&)+0x604) [0x5a2ca4]
>>>>> 11: (Paxos::propose_queued()+0x273) [0x5a3773]
>>>>> 12: (Paxos::propose_new_value(ceph::buffer::list&, Context*)+0x160)
>>>>> [0x5a3960]
>>>>> 13: (PaxosService::propose_pending()+0x386) [0x5ad946]
>>>>> 14: (Context::complete(int)+0x9) [0x580609]
>>>>> 15: (SafeTimer::timer_thread()+0x453) [0x768a33]
>>>>> 16: (SafeTimerThread::entry()+0xd) [0x76abed]
>>>>> 17: /lib64/libpthread.so.0() [0x34c36079d1]
>>>>> 18: (clone()+0x6d) [0x34c32e8b6d]
>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>> needed to interpret this.
>>>>>
>>>>> --- begin dump of recent events ---
>>>>> -10000> 2014-04-01 12:37:41.891458 7fa39cb1d700  1 --
>>>>> 10.141.8.180:6789/0 --> mon.2 10.141.8.182:6789/0 -- paxos(lease lc
>>>>> 504309 fc 503758 pn 0 opn 0) v3 -- ?+0 0x6246e00
>>>>> -9999> 2014-04-01 12:37:41.891476 7fa39cb1d700  1 --
>>>>> 10.141.8.180:6789/0 <== mon.0 10.141.8.180:6789/0 0 ==== log(1 entries)
>>>>> v1 ==== 0+0+0 (0 0 0) 0x29f4380 con 0x2050840
>>>>> -9998> 2014-04-01 12:37:41.891487 7fa39cb1d700  1
>>>>> mon.ceph001 at 0(leader).paxos(paxos active c 503758..504309) is_readable
>>>>> now=2014-04-01 12:37:41.891488 lease_expire=2014-04-01 12:37:46.891457
>>>>> has v0 lc 504309
>>>>> -9997> 2014-04-01 12:37:41.893979 7fa39cb1d700  1 --
>>>>> 10.141.8.180:6789/0 <== mon.2 10.141.8.182:6789/0 672234191 ====
>>>>> paxos(lease_ack lc 504309 fc 503758 pn 0 opn 0) v3 ==== 80+0+0
>>>>> (2650854818 0 0) 0x6246e00 con 0x2051760
>>>>> -9996> 2014-04-01 12:37:42.087358 7fa39cb1d700  1 --
>>>>> 10.141.8.180:6789/0 <== mon.2 10.141.8.182:6789/0 672234192 ====
>>>>> forward(pg_stats(1 pgs tid 13980 v 0) v1 caps allow profile osd tid
>>>>> 57315 con_features 4398046511103) to leader v2 ==== 916+0+0 (3952263404
>>>>> 0 0) 0x6247580 con 0x2051760
>>>>> -9995> 2014-04-01 12:37:42.087395 7fa39cb1d700  1
>>>>> mon.ceph001 at 0(leader).paxos(paxos active c 503758..504309) is_readable
>>>>> now=2014-04-01 12:37:42.087396 lease_expire=2014-04-01 12:37:46.891457
>>>>> has v0 lc 504309
>>>>> -9994> 2014-04-01 12:37:42.890767 7fa39d51e700  5
>>>>> mon.ceph001 at 0(leader).paxos(paxos active c 503758..504309)
>>>>> queue_proposal bl 16145 bytes; ctx = 0x1fa6e90
>>>>> -9993> 2014-04-01 12:37:42.891438 7fa39d51e700  1 --
>>>>> 10.141.8.180:6789/0 --> mon.2 10.141.8.182:6789/0 -- paxos(begin lc
>>>>> 504309 fc 0 pn 1800 opn 0) v3 -- ?+0 0x369d000
>>>>> -9992> 2014-04-01 12:37:42.891520 7fa39d51e700  5
>>>>> mon.ceph001 at 0(leader).paxos(paxos updating c 503758..504309)
>>>>> queue_proposal bl 1025 bytes; ctx = 0x1fa0af0
>>>>> -9991> 2014-04-01 12:37:42.891527 7fa39d51e700  5
>>>>> mon.ceph001 at 0(leader).paxos(paxos updating c 503758..504309)
>>>>> propose_new_value not active; proposal queued
>>>>> -9990> 2014-04-01 12:37:42.892656 7fa39cb1d700  1 --
>>>>> 10.141.8.180:6789/0 <== mon.2 10.141.8.182:6789/0 672234193 ====
>>>>> paxos(accept lc 504309 fc 0 pn 1800 opn 0) v3 ==== 80+0+0 (2064056753 0
>>>>> 0) 0x369d000 con 0x2051760
>>>>> -9989> 2014-04-01 12:37:42.893660 7fa39cb1d700  1 --
>>>>> 10.141.8.180:6789/0 --> mon.2 10.141.8.182:6789/0 -- paxos(commit lc
>>>>> 504310 fc 0 pn 1800 opn 0) v3 -- ?+0 0x6247580
>>>>> -9988> 2014-04-01 12:37:42.894019 7fa39cb1d700  1 --
>>>>> 10.141.8.180:6789/0 --> 10.141.8.180:6789/0 -- log(last 1348) v1 -- ?+0
>>>>> 0x41ddcc0 con 0x2050840
>>>>> -9987> 2014-04-01 12:37:42.894038 7fa39cb1d700  1 --
>>>>> 10.141.8.180:6789/0 --> mon.2 10.141.8.182:6789/0 -- paxos(lease lc
>>>>> 504310 fc 503758 pn 0 opn 0) v3 -- ?+0 0x6246e00
>>>>>
>>>>> 2 of the 3 MONS are down, the MDSs are down, but the OSDs are still
>>>>> running..
>>>>> Does someone knows what is happening here?
>>>>>
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Kind regards,
>>>>> Kenneth
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users at lists.ceph.com
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>>
>>>>
>>>> --
>>>> Joao Eduardo Luis
>>>> Software Engineer | http://inktank.com | http://ceph.com
>>>
>>>
>>>
>>> ----- End message from Joao Eduardo Luis <joao.luis at inktank.com> -----
>>>
>>> --
>>>
>>> Met vriendelijke groeten,
>>> Kenneth Waegeman
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users at lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>> ----- End message from Kenneth Waegeman <Kenneth.Waegeman at UGent.be> -----
>>
>>
>> --
>>
>> Met vriendelijke groeten,
>> Kenneth Waegeman
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

----- End message from "Yan, Zheng" <ukernel at gmail.com> -----

-- 

Met vriendelijke groeten,
Kenneth Waegeman