Re: [BUG] ceph-mon crashes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 12.11.2012 um 00:26 schrieb Joao Eduardo Luis <joao.luis@xxxxxxxxxxx>:

> On 11/11/2012 10:22 PM, Stefan Priebe wrote:
>> I wanted to remove osd.51 to 54. And executed the following commands:
>> ceph osd 51 out
>> ceph osd 52 out
>> ceph osd 53 out
>> ceph osd 54 out
>> 
>> Greets,
>> Stefan
> 
> Were those osds shutdown/killed before issuing the out commands?
No


> 
> And I'm assuming all the commands completed successfully with a 'marked
> out osd.X'. Is this correct?
Yes

Stefan


> 
>  -Joao
> 
> 
>> Am 11.11.2012 23:20, schrieb Joao Eduardo Luis:
>>> Hi Stefan,
>>> 
>>> Any chance you can get us the output of `ceph osd dump | grep 'osd.53'`?
>>> 
>>> Also, do you have any idea what happened to osd.53?
>>> 
>>>   -Joao
>>> 
>>> On 11/11/2012 09:51 PM, Stefan Priebe wrote:
>>>> Hello list,
>>>> 
>>>> i've now seen the following ceph-mon crash several times:
>>>>    -22> 2012-11-11 22:45:10.941641 7f5d49168700  1
>>>> mon.b@1(leader).paxos(pgmap active c 11425..11925) is_readable
>>>> now=2012-11-11 22:45:10.941643 lease_expire=2012-11-11 22:45:15.313577
>>>> has v1017 lc 11925
>>>>    -21> 2012-11-11 22:45:11.017199 7f5d49168700  1 --
>>>> 10.255.0.101:6789/0 <== mon.2 10.255.0.102:6789/0 123 ==== paxos(logm
>>>> lease_ack lc 14415 fc 13913 pn 0 opn 0 gv {}) v2 ==== 88+0+0 (1926207536
>>>> 0 0) 0x58fe840 con 0x2c38c60
>>>>    -20> 2012-11-11 22:45:11.084775 7f5d49168700  1 --
>>>> 10.255.0.101:6789/0 <== mon.2 10.255.0.102:6789/0 124 ==== paxos(mdsmap
>>>> lease_ack lc 1 fc 1 pn 0 opn 0 gv {}) v2 ==== 88+0+0 (2037515134 0 0)
>>>> 0x4e59080 con 0x2c38c60
>>>>    -19> 2012-11-11 22:45:11.084812 7f5d49168700  1 --
>>>> 10.255.0.101:6789/0 <== mon.2 10.255.0.102:6789/0 125 ==== paxos(monmap
>>>> lease_ack lc 1 fc 1 pn 0 opn 0 gv {}) v2 ==== 88+0+0 (3822314369 0 0)
>>>> 0x4e58580 con 0x2c38c60
>>>>    -18> 2012-11-11 22:45:11.230970 7f5d48065700  2 --
>>>> 10.255.0.101:6789/0 >> 10.255.0.100:6789/0 pipe(0x2cc9b40 sd=17 :6789
>>>> pgs=16 cs=2 l=0).connect error 10.255.0.100:6789/0, 111: Connection
>>>> refused
>>>>    -17> 2012-11-11 22:45:11.231004 7f5d48065700  2 --
>>>> 10.255.0.101:6789/0 >> 10.255.0.100:6789/0 pipe(0x2cc9b40 sd=17 :6789
>>>> pgs=16 cs=2 l=0).fault 111: Connection refused
>>>>    -16> 2012-11-11 22:45:11.238427 7f5d49168700  1 --
>>>> 10.255.0.101:6789/0 <== osd.33 10.255.0.102:6806/3537 1315 ====
>>>> pg_stats(478 pgs tid 185 v 1017) v1 ==== 158856+0+0 (1263639313 0 0)
>>>> 0x2ca5b40 con 0x2ca4dc0
>>>>    -15> 2012-11-11 22:45:11.238451 7f5d49168700  1
>>>> mon.b@1(leader).paxos(pgmap active c 11425..11925) is_readable
>>>> now=2012-11-11 22:45:11.238452 lease_expire=2012-11-11 22:45:15.313577
>>>> has v1017 lc 11925
>>>>    -14> 2012-11-11 22:45:11.287036 7f5d49969700  5
>>>> mon.b@1(leader).paxos(pgmap active c 11425..11925) propose_new_value
>>>> 11926 311714 bytes, gv 28550
>>>>    -13> 2012-11-11 22:45:11.296084 7f5d49168700  1 --
>>>> 10.255.0.101:6789/0 <== osd.41 10.255.0.103:6800/3324 1086 ====
>>>> pg_stats(440 pgs tid 185 v 1017) v1 ==== 146276+0+0 (1903337195 0 0)
>>>> 0x39bad80 con 0x2ca4c60
>>>>    -12> 2012-11-11 22:45:11.350697 7f5d49969700  1 --
>>>> 10.255.0.101:6789/0 --> mon.2 10.255.0.102:6789/0 -- paxos(pgmap begin
>>>> lc 11925 fc 0 pn 3501 opn 0 gv {11926=28550}) v2 -- ?+0 0x4e60b00
>>>>    -11> 2012-11-11 22:45:11.350790 7f5d49168700  1
>>>> mon.b@1(leader).paxos(pgmap updating c 11425..11925) is_readable
>>>> now=2012-11-11 22:45:11.350793 lease_expire=2012-11-11 22:45:15.313577
>>>> has v1017 lc 11925
>>>>    -10> 2012-11-11 22:45:11.435570 7f5d49168700  1 --
>>>> 10.255.0.101:6789/0 <== mon.2 10.255.0.102:6789/0 126 ==== paxos(pgmap
>>>> accept lc 11925 fc 0 pn 3501 opn 0 gv {}) v2 ==== 88+0+0 (3719346061 0
>>>> 0) 0x4e60b00 con 0x2c38c60
>>>>     -9> 2012-11-11 22:45:11.476091 7f5d49168700  1 --
>>>> 10.255.0.101:6789/0 --> mon.2 10.255.0.102:6789/0 -- paxos(pgmap commit
>>>> lc 11926 fc 0 pn 3501 opn 0 gv {11926=28550}) v2 -- ?+0 0x4e58580
>>>>     -8> 2012-11-11 22:45:11.476125 7f5d49168700  1 --
>>>> 10.255.0.101:6789/0 --> mon.2 10.255.0.102:6789/0 -- paxos(pgmap lease
>>>> lc 11926 fc 11425 pn 0 opn 0 gv {}) v2 -- ?+0 0x4e59080
>>>>     -7> 2012-11-11 22:45:11.593945 7f5d49168700  0 log [INF] : pgmap
>>>> v11926: 7032 pgs: 5977 active+clean, 62 active+remapped+wait_backfill,
>>>> 586 active+degraded+wait_backfill, 32 active+remapped+backfilling, 18
>>>> active+degraded+backfilling, 343 active+degraded+remapped+wait_backfill,
>>>> 2 remapped+peering, 12 active+degraded+remapped+backfilling; 61211 MB
>>>> data, 61662 MB used, 4185 GB / 4245 GB avail; 5419/31221 degraded
>>>> (17.357%)
>>>>     -6> 2012-11-11 22:45:11.593954 7f5d49168700  1 --
>>>> 10.255.0.101:6789/0 --> mon.1 10.255.0.101:6789/0 -- log(1 entries) v1
>>>> -- ?+0 0x40c7200
>>>>     -5> 2012-11-11 22:45:11.593969 7f5d49168700  1 --
>>>> 10.255.0.101:6789/0 --> 10.255.0.103:6803/3461 -- pg_stats_ack(460 pgs
>>>> tid 185) v1 -- ?+0 0x3ab5a80 con 0x3bd5420
>>>>     -4> 2012-11-11 22:45:11.594050 7f5d49168700  1 --
>>>> 10.255.0.101:6789/0 --> 10.255.0.102:6806/3537 -- pg_stats_ack(478 pgs
>>>> tid 185) v1 -- ?+0 0x4f0b8c0 con 0x2ca4dc0
>>>>     -3> 2012-11-11 22:45:11.594359 7f5d49168700  1
>>>> mon.b@1(leader).paxos(pgmap active c 11426..11926) is_readable
>>>> now=2012-11-11 22:45:11.594363 lease_expire=2012-11-11 22:45:16.476123
>>>> has v1017 lc 11926
>>>>     -2> 2012-11-11 22:45:11.594736 7f5d49168700  1 --
>>>> 10.255.0.101:6789/0 <== mon.1 10.255.0.101:6789/0 0 ==== log(1 entries)
>>>> v1 ==== 0+0+0 (0 0 0) 0x40c7200 con 0x2c382c0
>>>>     -1> 2012-11-11 22:45:11.595139 7f5d49969700  1 mon.b@1(leader).osd
>>>> e1017  we have enough reports/reporters to mark osd.53 down
>>>>      0> 2012-11-11 22:45:11.595601 7f5d49969700 -1 ./osd/OSDMap.h: In
>>>> function 'entity_inst_t OSDMap::get_inst(int) const' thread 7f5d49969700
>>>> time 2012-11-11 22:45:11.595149
>>>> ./osd/OSDMap.h: 345: FAILED assert(is_up(osd))
>>>> 
>>>>  ceph version 0.53-758-g26e5f2d
>>>> (26e5f2d63f569b955bb07b50aa8f930ed9450bc4)
>>>>  1: (OSDMonitor::check_failure(utime_t, int, failure_info_t&)+0xc09)
>>>> [0x4b14e9]
>>>>  2: (OSDMonitor::check_failures(utime_t)+0x3a) [0x4b16ca]
>>>>  3: (OSDMonitor::tick()+0xab) [0x4b213b]
>>>>  4: (Monitor::tick()+0x6d) [0x477fcd]
>>>>  5: (SafeTimer::timer_thread()+0x453) [0x58f203]
>>>>  6: (SafeTimerThread::entry()+0xd) [0x5913dd]
>>>>  7: (()+0x68ca) [0x7f5d4dbce8ca]
>>>>  8: (clone()+0x6d) [0x7f5d4c456bfd]
>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>> needed to interpret this.
>>> 
>>> -- 
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux