Re: ceph-mon segmentation fault

Hi!

>>> I created a ticket: http://tracker.ceph.com/issues/7487
>>> 
>>> But my guess is that this is a result of having 0 CRUSH weight for the
>>> entire tree while linking them up. Can you give the OSD a weight and
>>> see if it works after that?
>> 
>> How can I do this?
>> I'm still not very familiar with the ceph tools yet :)
> 
> See http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#adding-osds
> In particular you'll want to use "ceph osd reweight <osd-id>
> <weight>". (The weight should probably just be 1, or the disk size in
> TB, or similar.)
> 

I have tried the following command:
root@ceph-base:/# ceph osd reweight 0 1

Nothing changed:

root@ceph-base:/# ceph osd tree
# id    weight  type name       up/down reweight
-3      0       osd osd.0
-1      0       root default
-2      0               host osd-host

The following command caused an error:

root@ceph-base:/# ceph osd reweight osd.0 1
Invalid command:  osd.0 doesn't represent an int
osd reweight <int[0-]> <float[0.0-1.0]> :  reweight osd to 0.0 < <weight> < 1.0
Error EINVAL: invalid command
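
By the way, judging from the usage line above, "ceph osd reweight" takes the numeric OSD id and an override weight between 0.0 and 1.0; as far as I understand, it does not change the CRUSH weight shown in "ceph osd tree". If the zero CRUSH weight is the problem, I guess the command to set it would be "ceph osd crush reweight", roughly like this (untested on my side):

root@ceph-base:/# ceph osd crush reweight osd.0 1.0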

The original problem still exists anyway; moving the OSD into the host bucket crashes the monitor again:

root@ceph-base:/# ceph osd reweight 0 1
root@ceph-base:/# ceph osd crush move osd.0 host=osd-host
2014-02-20 21:52:30.580751 7f2cf92f2700  0 monclient: hunting for new mon
2014-02-20 21:52:30.580943 7f2cf81ef700  0 -- 172.17.0.223:0/1000358 >> 172.17.0.222:6789/0 pipe(0x7f2ce80046e0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f2ce8004940).fault

> I assumed you were basically following those steps already!


I have just been playing with ceph, trying to resolve my "osd down" problem, and right now I have finally found the solution:

If I use the "ceph osd create" command with a UUID, ceph-osd --mkfs creates the filesystem without taking this UUID into account, so the OSD cannot connect to the monitor afterwards. Removing the UUID parameter from "ceph osd create" fixes the problem.
If this is not a bug, maybe it would be better to document this behavior.
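
For reference, the sequence that works for me now looks roughly like this; the keyring path and host name are taken from my own config, so treat it as a sketch rather than a recipe:

root@ceph-base:/# OSD_ID=$(ceph osd create)   # note: no UUID argument
root@ceph-base:/# ceph-osd -i $OSD_ID --mkfs --mkkey
root@ceph-base:/# ceph auth add osd.$OSD_ID osd 'allow *' mon 'allow rwx' -i /data/ceph.osd.keyring
root@ceph-base:/# ceph osd crush add osd.$OSD_ID 1.0 host=osd-host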

With best regards,
  Pavel.




>> Pavel.
>> 
>> 
>>> -Greg
>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>> 
>>> 
>>> On Tue, Feb 18, 2014 at 4:21 AM, Pavel V. Kaygorodov <pasha@xxxxxxxxx> wrote:
>>>> Hi!
>>>> 
>>>> Playing with ceph, I found a bug:
>>>> 
>>>> I have compiled and installed ceph from sources on debian/jessie:
>>>> 
>>>> git clone --recursive -b v0.75 https://github.com/ceph/ceph.git
>>>> cd ceph/ && ./autogen.sh && ./configure && make && make install
>>>> 
>>>> /usr/local/bin/ceph-authtool --create-keyring /data/ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'
>>>> /usr/local/bin/ceph-authtool --create-keyring /ceph.client.admin.keyring --gen-key -n client.admin --set-uid=0 --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow'
>>>> /usr/local/bin/ceph-authtool /data/ceph.mon.keyring --import-keyring /ceph.client.admin.keyring
>>>> /usr/local/bin/monmaptool --create --fsid e90dfd37-98d1-45bb-a847-8590a5ed8e71 /data/monmap
>>>> /usr/local/bin/ceph-mon --mkfs -i ceph-mon.dkctl --monmap /data/monmap --keyring /data/ceph.mon.keyring
>>>> 
>>>> my ceph.conf is as follows (I have configured a local TLD dkctl. with a ceph-mon A-record):
>>>> 
>>>> [global]
>>>> 
>>>> fsid = e90dfd37-98d1-45bb-a847-8590a5ed8e71
>>>> mon initial members = ceph-mon.dkctl
>>>> 
>>>> auth cluster required = cephx
>>>> auth service required = cephx
>>>> auth client required = cephx
>>>> 
>>>> keyring = /ceph.client.admin.keyring
>>>> 
>>>> osd pool default size = 2
>>>> osd pool default min size = 2
>>>> osd pool default pg num = 333
>>>> osd pool default pgp num = 333
>>>> osd crush chooseleaf type = 1
>>>> osd journal size = 1000
>>>> 
>>>> filestore xattr use omap = true
>>>> 
>>>> mon host = ceph-mon.dkctl
>>>> mon addr = ceph-mon.dkctl
>>>> 
>>>> log file = /data/logs/ceph.log
>>>> 
>>>> [mon]
>>>> mon data = /data/mon
>>>> keyring = /data/ceph.mon.keyring
>>>> log file = /data/logs/mon.log
>>>> 
>>>> [osd.0]
>>>> osd host    = osd0
>>>> osd data    = /data/osd
>>>> osd journal = /data/osd.journal
>>>> log file    = /data/logs/osd.log
>>>> keyring     = /data/ceph.osd.keyring
>>>> 
>>>> started ceph-mon:
>>>> 
>>>> /usr/local/bin/ceph-mon -c /ceph.conf --public-addr `grep ceph-mon /etc/hosts | awk '{print $1}'` -i ceph-mon.dkctl
>>>> 
>>>> After that, the following commands crashed the ceph-mon daemon:
>>>> 
>>>> root@ceph-mon:/# ceph osd crush add-bucket osd-host host
>>>> added bucket osd-host type host to crush map
>>>> root@ceph-mon:/# ceph osd crush move osd-host root=default
>>>> moved item id -2 name 'osd-host' to location {root=default} in crush map
>>>> root@ceph-mon:/# ceph osd crush add-bucket osd.0 osd
>>>> added bucket osd.0 type osd to crush map
>>>> root@ceph-mon:/# ceph osd tree
>>>> # id    weight  type name       up/down reweight
>>>> -3      0       osd osd.0
>>>> -1      0       root default
>>>> -2      0               host osd-host
>>>> 
>>>> root@ceph-mon:/# ceph osd crush move osd.0 host=osd-host
>>>> 2014-02-18 16:00:14.093243 7ff077fff700  0 monclient: hunting for new mon
>>>> 2014-02-18 16:00:14.093781 7ff07c130700  0 -- 172.17.0.160:0/1000148 >> 172.17.0.160:6789/0 pipe(0x7ff06c004770 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7ff06c0049d0).fault
>>>> 2014-02-18 16:00:16.996981 7ff07c231700  0 -- 172.17.0.160:0/1000148 >> 172.17.0.160:6789/0 pipe(0x7ff060000c00 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7ff060000e60).fault
>>>> 2014-02-18 16:00:19.998108 7ff07c130700  0 -- 172.17.0.160:0/1000148 >> 172.17.0.160:6789/0 pipe(0x7ff060003010 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7ff060001e70).fault
>>>> 
>>>> Log file of ceph mon shows:
>>>> 
>>>> *** Caught signal (Segmentation fault) **
>>>> in thread 7f09109dd700
>>>> ceph version 0.75 (946d60369589d6a269938edd65c0a6a7b1c3ef5c)
>>>> 1: /usr/local/bin/ceph-mon() [0x83457e]
>>>> 2: (()+0xf210) [0x7f0915772210]
>>>> 3: /usr/local/bin/ceph-mon() [0x7c398a]
>>>> 4: /usr/local/bin/ceph-mon() [0x7c3c9c]
>>>> 5: /usr/local/bin/ceph-mon() [0x7c3d31]
>>>> 6: (crush_do_rule()+0x20a) [0x7c448a]
>>>> 7: (OSDMap::_pg_to_osds(pg_pool_t const&, pg_t, std::vector<int, std::allocator<int> >&) const+0xdd) [0x725add]
>>>> 8: (OSDMap::pg_to_acting_osds(pg_t, std::vector<int, std::allocator<int> >&) const+0x81) [0x725da1]
>>>> 9: (PGMonitor::map_pg_creates()+0x15f) [0x610abf]
>>>> 10: (PGMonitor::post_paxos_update()+0x25) [0x611205]
>>>> 11: (Monitor::refresh_from_paxos(bool*)+0x95) [0x543205]
>>>> 12: (Paxos::do_refresh()+0x24) [0x590c24]
>>>> 13: (Paxos::begin(ceph::buffer::list&)+0x99e) [0x59b54e]
>>>> 14: (Paxos::propose_queued()+0xdd) [0x59b92d]
>>>> 15: (Paxos::propose_new_value(ceph::buffer::list&, Context*)+0x150) [0x59ca30]
>>>> 16: (PaxosService::propose_pending()+0x6d9) [0x5a3099]
>>>> 17: (PaxosService::dispatch(PaxosServiceMessage*)+0xd77) [0x5a4347]
>>>> 18: (Monitor::handle_command(MMonCommand*)+0x1073) [0x56e253]
>>>> 19: (Monitor::dispatch(MonSession*, Message*, bool)+0x2e8) [0x571168]
>>>> 20: (Monitor::_ms_dispatch(Message*)+0x1e4) [0x571774]
>>>> 21: (Monitor::ms_dispatch(Message*)+0x20) [0x590050]
>>>> 22: (DispatchQueue::entry()+0x56a) [0x80a65a]
>>>> 23: (DispatchQueue::DispatchThread::entry()+0xd) [0x73e75d]
>>>> 24: (()+0x7e0e) [0x7f091576ae0e]
>>>> 25: (clone()+0x6d) [0x7f0913d1c0fd]
>>>> 2014-02-18 16:00:14.088851 7f09109dd700 -1 *** Caught signal (Segmentation fault) **
>>>> in thread 7f09109dd700
>>>> 
>>>> ceph version 0.75 (946d60369589d6a269938edd65c0a6a7b1c3ef5c)
>>>> 1: /usr/local/bin/ceph-mon() [0x83457e]
>>>> 2: (()+0xf210) [0x7f0915772210]
>>>> 3: /usr/local/bin/ceph-mon() [0x7c398a]
>>>> 4: /usr/local/bin/ceph-mon() [0x7c3c9c]
>>>> 5: /usr/local/bin/ceph-mon() [0x7c3d31]
>>>> 6: (crush_do_rule()+0x20a) [0x7c448a]
>>>> 7: (OSDMap::_pg_to_osds(pg_pool_t const&, pg_t, std::vector<int, std::allocator<int> >&) const+0xdd) [0x725add]
>>>> 8: (OSDMap::pg_to_acting_osds(pg_t, std::vector<int, std::allocator<int> >&) const+0x81) [0x725da1]
>>>> 9: (PGMonitor::map_pg_creates()+0x15f) [0x610abf]
>>>> 10: (PGMonitor::post_paxos_update()+0x25) [0x611205]
>>>> 11: (Monitor::refresh_from_paxos(bool*)+0x95) [0x543205]
>>>> 12: (Paxos::do_refresh()+0x24) [0x590c24]
>>>> 13: (Paxos::begin(ceph::buffer::list&)+0x99e) [0x59b54e]
>>>> 14: (Paxos::propose_queued()+0xdd) [0x59b92d]
>>>> 15: (Paxos::propose_new_value(ceph::buffer::list&, Context*)+0x150) [0x59ca30]
>>>> 16: (PaxosService::propose_pending()+0x6d9) [0x5a3099]
>>>> 17: (PaxosService::dispatch(PaxosServiceMessage*)+0xd77) [0x5a4347]
>>>> 18: (Monitor::handle_command(MMonCommand*)+0x1073) [0x56e253]
>>>> 19: (Monitor::dispatch(MonSession*, Message*, bool)+0x2e8) [0x571168]
>>>> 20: (Monitor::_ms_dispatch(Message*)+0x1e4) [0x571774]
>>>> 21: (Monitor::ms_dispatch(Message*)+0x20) [0x590050]
>>>> 22: (DispatchQueue::entry()+0x56a) [0x80a65a]
>>>> 23: (DispatchQueue::DispatchThread::entry()+0xd) [0x73e75d]
>>>> 24: (()+0x7e0e) [0x7f091576ae0e]
>>>> 25: (clone()+0x6d) [0x7f0913d1c0fd]
>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>> 
>>>> --- begin dump of recent events ---
>>>> -395> 2014-02-18 15:59:09.388974 7f0915dfb7c0  5 asok(0x354af50) register_command perfcounters_dump hook 0x3542010
>>>> -394> 2014-02-18 15:59:09.389006 7f0915dfb7c0  5 asok(0x354af50) register_command 1 hook 0x3542010
>>>> -393> 2014-02-18 15:59:09.389011 7f0915dfb7c0  5 asok(0x354af50) register_command perf dump hook 0x3542010
>>>> -392> 2014-02-18 15:59:09.389016 7f0915dfb7c0  5 asok(0x354af50) register_command perfcounters_schema hook 0x3542010
>>>> -391> 2014-02-18 15:59:09.389020 7f0915dfb7c0  5 asok(0x354af50) register_command 2 hook 0x3542010
>>>> -390> 2014-02-18 15:59:09.389021 7f0915dfb7c0  5 asok(0x354af50) register_command perf schema hook 0x3542010
>>>> -389> 2014-02-18 15:59:09.389023 7f0915dfb7c0  5 asok(0x354af50) register_command config show hook 0x3542010
>>>> -388> 2014-02-18 15:59:09.389028 7f0915dfb7c0  5 asok(0x354af50) register_command config set hook 0x3542010
>>>> -387> 2014-02-18 15:59:09.389029 7f0915dfb7c0  5 asok(0x354af50) register_command config get hook 0x3542010
>>>> -386> 2014-02-18 15:59:09.389031 7f0915dfb7c0  5 asok(0x354af50) register_command log flush hook 0x3542010
>>>> -385> 2014-02-18 15:59:09.389035 7f0915dfb7c0  5 asok(0x354af50) register_command log dump hook 0x3542010
>>>> -384> 2014-02-18 15:59:09.389037 7f0915dfb7c0  5 asok(0x354af50) register_command log reopen hook 0x3542010
>>>> -383> 2014-02-18 15:59:09.390539 7f0915dfb7c0  0 ceph version 0.75 (946d60369589d6a269938edd65c0a6a7b1c3ef5c), process ceph-mon, pid 6
>>>> -382> 2014-02-18 15:59:09.390870 7f0915dfb7c0  5 asok(0x354af50) init /var/run/ceph/ceph-mon.ceph-mon.dkctl.asok
>>>> -381> 2014-02-18 15:59:09.390898 7f0915dfb7c0  5 asok(0x354af50) bind_and_listen /var/run/ceph/ceph-mon.ceph-mon.dkctl.asok
>>>> -380> 2014-02-18 15:59:09.391018 7f0915dfb7c0  5 asok(0x354af50) register_command 0 hook 0x353e038
>>>> -379> 2014-02-18 15:59:09.391043 7f0915dfb7c0  5 asok(0x354af50) register_command version hook 0x353e038
>>>> -378> 2014-02-18 15:59:09.391046 7f0915dfb7c0  5 asok(0x354af50) register_command git_version hook 0x353e038
>>>> -377> 2014-02-18 15:59:09.391049 7f0915dfb7c0  5 asok(0x354af50) register_command help hook 0x3542050
>>>> -376> 2014-02-18 15:59:09.391051 7f0915dfb7c0  5 asok(0x354af50) register_command get_command_descriptions hook 0x3542040
>>>> -375> 2014-02-18 15:59:09.391104 7f09121e0700  5 asok(0x354af50) entry start
>>>> -374> 2014-02-18 15:59:09.459305 7f0915dfb7c0  1 -- 172.17.0.160:6789/0 learned my addr 172.17.0.160:6789/0
>>>> -373> 2014-02-18 15:59:09.459333 7f0915dfb7c0  1 accepter.accepter.bind my_inst.addr is 172.17.0.160:6789/0 need_addr=0
>>>> -372> 2014-02-18 15:59:09.459359 7f0915dfb7c0  5 adding auth protocol: cephx
>>>> -371> 2014-02-18 15:59:09.459363 7f0915dfb7c0  5 adding auth protocol: cephx
>>>> -370> 2014-02-18 15:59:09.459451 7f0915dfb7c0  1 mon.ceph-mon.dkctl@-1(probing) e1 preinit fsid e90dfd37-98d1-45bb-a847-8590a5ed8e71
>>>> -369> 2014-02-18 15:59:09.459512 7f0915dfb7c0  1 mon.ceph-mon.dkctl@-1(probing) e1  initial_members ceph-mon.dkctl, filtering seed monmap
>>>> -368> 2014-02-18 15:59:09.459524 7f0915dfb7c0  1  keeping ceph-mon.dkctl 172.17.0.160:6789/0
>>>> -367> 2014-02-18 15:59:09.459812 7f0915dfb7c0  2 auth: KeyRing::load: loaded key file /data/mon/keyring
>>>> -366> 2014-02-18 15:59:09.459832 7f0915dfb7c0  5 asok(0x354af50) register_command mon_status hook 0x35420e0
>>>> -365> 2014-02-18 15:59:09.459838 7f0915dfb7c0  5 asok(0x354af50) register_command quorum_status hook 0x35420e0
>>>> -364> 2014-02-18 15:59:09.459840 7f0915dfb7c0  5 asok(0x354af50) register_command sync_force hook 0x35420e0
>>>> -363> 2014-02-18 15:59:09.459842 7f0915dfb7c0  5 asok(0x354af50) register_command add_bootstrap_peer_hint hook 0x35420e0
>>>> -362> 2014-02-18 15:59:09.459844 7f0915dfb7c0  5 asok(0x354af50) register_command quorum enter hook 0x35420e0
>>>> -361> 2014-02-18 15:59:09.459845 7f0915dfb7c0  5 asok(0x354af50) register_command quorum exit hook 0x35420e0
>>>> -360> 2014-02-18 15:59:09.459851 7f0915dfb7c0  1 -- 172.17.0.160:6789/0 messenger.start
>>>> -359> 2014-02-18 15:59:09.459917 7f0915dfb7c0  2 mon.ceph-mon.dkctl@-1(probing) e1 init
>>>> -358> 2014-02-18 15:59:09.459979 7f0915dfb7c0  1 accepter.accepter.start
>>>> -357> 2014-02-18 15:59:09.460029 7f0915dfb7c0  0 mon.ceph-mon.dkctl@-1(probing) e1  my rank is now 0 (was -1)
>>>> -356> 2014-02-18 15:59:09.460033 7f0915dfb7c0  1 -- 172.17.0.160:6789/0 mark_down_all
>>>> -355> 2014-02-18 15:59:09.460045 7f0915dfb7c0  1 mon.ceph-mon.dkctl@0(probing) e1 win_standalone_election
>>>> -354> 2014-02-18 15:59:09.482424 7f0915dfb7c0  0 log [INF] : mon.ceph-mon.dkctl@0 won leader election with quorum 0
>>>> -353> 2014-02-18 15:59:09.482450 7f0915dfb7c0 10 send_log to self
>>>> -352> 2014-02-18 15:59:09.482453 7f0915dfb7c0 10  log_queue is 1 last_log 1 sent 0 num 1 unsent 1 sending 1
>>>> -351> 2014-02-18 15:59:09.482457 7f0915dfb7c0 10  will send 2014-02-18 15:59:09.482449 mon.0 172.17.0.160:6789/0 1 : [INF] mon.ceph-mon.dkctl@0 won leader election with quorum 0
>>>> -350> 2014-02-18 15:59:09.482491 7f0915dfb7c0  1 -- 172.17.0.160:6789/0 --> mon.0 172.17.0.160:6789/0 -- log(1 entries) v1 -- ?+0 0x35866c0
>>>> -349> 2014-02-18 15:59:09.482564 7f09109dd700  1 -- 172.17.0.160:6789/0 <== mon.0 172.17.0.160:6789/0 0 ==== log(1 entries) v1 ==== 0+0+0 (0 0 0) 0x35866c0 con 0x359a420
>>>> -348> 2014-02-18 15:59:09.482598 7f0915dfb7c0  5 mon.ceph-mon.dkctl@0(leader).paxos(paxos active c 0..0) queue_proposal bl 398 bytes; ctx = 0x35420c0
>>>> -347> 2014-02-18 15:59:09.530752 7f0915dfb7c0  0 log [INF] : pgmap v1: 0 pgs: ; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
>>>> -346> 2014-02-18 15:59:09.530776 7f0915dfb7c0 10 send_log to self
>>>> -345> 2014-02-18 15:59:09.530778 7f0915dfb7c0 10  log_queue is 2 last_log 2 sent 1 num 2 unsent 1 sending 1
>>>> -344> 2014-02-18 15:59:09.530781 7f0915dfb7c0 10  will send 2014-02-18 15:59:09.482449 mon.0 172.17.0.160:6789/0 1 : [INF] mon.ceph-mon.dkctl@0 won leader election with quorum 0
>>>> -343> 2014-02-18 15:59:09.530808 7f0915dfb7c0  1 -- 172.17.0.160:6789/0 --> mon.0 172.17.0.160:6789/0 -- log(1 entries) v1 -- ?+0 0x3586d80
>>>> -342> 2014-02-18 15:59:09.530898 7f0915dfb7c0  5 mon.ceph-mon.dkctl@0(leader).paxos(paxos active c 1..1) queue_proposal bl 477 bytes; ctx = 0x35420c0
>>>> -341> 2014-02-18 15:59:09.578860 7f0915dfb7c0  4 mon.ceph-mon.dkctl@0(leader).mds e1 new map
>>>> -340> 2014-02-18 15:59:09.578888 7f0915dfb7c0  0 mon.ceph-mon.dkctl@0(leader).mds e1 print_map
>>>> 
>>>> With best regards,
>>>> Pavel.
>>>> 
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



