Hi, all nodes are in one VLAN connected to a single switch. Connectivity is OK, MTU is 1500, and I can transfer data over netcat and mbuffer at 660 Mbps.
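For reference, the throughput test looked roughly like this (a sketch, assuming a version of mbuffer with built-in network mode and a free TCP port; exact flags vary by mbuffer version):

  # on node2: receive on TCP port 5001 and discard the data
  node2# mbuffer -I 5001 > /dev/null

  # on node1: push 1 GB of zeroes across and let mbuffer report the rate
  node1# dd if=/dev/zero bs=1M count=1024 | mbuffer -O node2:5001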
With debug_ms there is nothing interesting:

  /usr/bin/ceph-osd --debug_ms 100 -f -i 0 --pid-file /run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
  starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
  2015-12-29 00:18:05.878954 7fd9892e7800 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
  2015-12-29 00:18:05.899633 7fd9892e7800 -1 osd.0 24 log_to_monitors {default=true}
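(For completeness: the messenger debug level can usually also be raised on a live daemon instead of restarting it; a sketch, assuming the admin keyring is available on the node:)

  # bump messenger debugging on the running osd.0, in memory only
  ceph tell osd.0 injectargs '--debug-ms 1'

  # or make it persistent across restarts in /etc/ceph/ceph.conf:
  [osd]
  debug ms = 1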
Thanks,
Martin

On 29. 12. 2015 at 00:08, Somnath Roy wrote:

It could be a network issue, maybe related to MTU (?). Try running with debug_ms = 1 and see if you find anything. Also, try running a command like 'traceroute' and see if it reports any errors.

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Ing. Martin Samek
Sent: Monday, December 28, 2015 2:59 PM
To: Ceph Users
Subject: My OSDs are down and not coming UP

Hi,

I'm a newbie to the Ceph world. I'm trying to set up my first test Ceph cluster. My MON servers are running and talking to each other, but my OSDs are still down and won't come up. In fact, only the OSD running on the same node as the elected master is able to connect and come UP.

To be technical: I have 4 physical nodes living in a pure IPv6 environment, running Gentoo Linux and Ceph 9.2. All node names are resolvable in DNS and also listed in the hosts files.

I'm running the OSDs with a command like this:

  node1# /usr/bin/ceph-osd -f -i 1 --pid-file /run/ceph/osd.1.pid -c /etc/ceph/ceph.conf

A single mon.0 is also running on node1, and osd.1 comes up:

  2015-12-28 23:37:27.931686 mon.0 [INF] osd.1 [2001:718:2:1612::50]:6800/23709 boot
  2015-12-28 23:37:27.932605 mon.0 [INF] osdmap e19: 2 osds: 1 up, 1 in
  2015-12-28 23:37:27.933963 mon.0 [INF] pgmap v24: 64 pgs: 64 stale+active+undersized+degraded; 0 bytes data, 1057 MB used, 598 GB / 599 GB avail

But running osd.0 on node2:

  # /usr/bin/ceph-osd -f -i 0 --pid-file /run/ceph/osd.0.pid -c /etc/ceph/ceph.conf

does nothing. The process is running, and netstat shows an open connection from ceph-osd between node2 and node1. Here I'm lost. IPv6 connectivity is OK, DNS is OK, time is in sync, 1 mon is running, there are 2 OSDs but only one is UP. What is missing?

ceph-osd in debug mode shows differences between node1 and node2.

node1, UP:

  2015-12-28 01:42:59.084371 7f72f9873800 20 osd.1 15 clearing temps in 0.3f_head pgid 0.3f
  2015-12-28 01:42:59.084453 7f72f9873800  0 osd.1 15 load_pgs
  2015-12-28 01:42:59.085248 7f72f9873800 10 osd.1 15 load_pgs ignoring unrecognized meta
  2015-12-28 01:42:59.094690 7f72f9873800 10 osd.1 15 pgid 0.0 coll 0.0_head
  2015-12-28 01:42:59.094835 7f72f9873800 30 osd.1 0 get_map 15 -cached
  2015-12-28 01:42:59.094848 7f72f9873800 10 osd.1 15 _open_lock_pg 0.0
  2015-12-28 01:42:59.094857 7f72f9873800 10 osd.1 15 _get_pool 0
  2015-12-28 01:42:59.094928 7f72f9873800  5 osd.1 pg_epoch: 15 pg[0.0(unlocked)] enter Initial
  2015-12-28 01:42:59.094980 7f72f9873800 20 osd.1 pg_epoch: 15 pg[0.0(unlocked)] enter NotTrimming
  2015-12-28 01:42:59.094998 7f72f9873800 30 osd.1 pg_epoch: 15 pg[0.0( DNE empty local-les=0 n=0 ec=0 les/c/f 0/0/0 0/0/0) [] r=0 lpr=0 crt=0'0 inactive NIBBLEW
  2015-12-28 01:42:59.095186 7f72f9873800 20 read_log coll 0.0_head log_oid 0/00000000//head

node2, DOWN:

  2015-12-28 01:36:54.437246 7f4507957800  0 osd.0 11 load_pgs
  2015-12-28 01:36:54.437267 7f4507957800 10 osd.0 11 load_pgs ignoring unrecognized meta
  2015-12-28 01:36:54.437274 7f4507957800  0 osd.0 11 load_pgs opened 0 pgs
  2015-12-28 01:36:54.437278 7f4507957800 10 osd.0 11 build_past_intervals_parallel nothing to build
  2015-12-28 01:36:54.437282 7f4507957800  2 osd.0 11 superblock: i am osd.0
  2015-12-28 01:36:54.437287 7f4507957800 10 osd.0 11 create_logger
  2015-12-28 01:36:54.438157 7f4507957800 -1 osd.0 11 log_to_monitors {default=true}
  2015-12-28 01:36:54.449278 7f4507957800 10 osd.0 11 set_disk_tp_priority class priority -1
  2015-12-28 01:36:54.450813 7f44ddbff700 30 osd.0 11 heartbeat
  2015-12-28 01:36:54.452558 7f44ddbff700 30 osd.0 11 heartbeat checking stats
  2015-12-28 01:36:54.452592 7f44ddbff700 20 osd.0 11 update_osd_stat osd_stat(1056 MB used, 598 GB avail, 599 GB total, peers []/[] op hist [])
  2015-12-28 01:36:54.452611 7f44ddbff700  5 osd.0 11 heartbeat: osd_stat(1056 MB used, 598 GB avail, 599 GB total, peers []/[] op hist [])
  2015-12-28 01:36:54.452618 7f44ddbff700 30 osd.0 11 heartbeat check
  2015-12-28 01:36:54.452622 7f44ddbff700 30 osd.0 11 heartbeat lonely?
  2015-12-28 01:36:54.452624 7f44ddbff700 30 osd.0 11 heartbeat done
  2015-12-28 01:36:54.452627 7f44ddbff700 30 osd.0 11 heartbeat_entry sleeping for 2.3
  2015-12-28 01:36:54.452588 7f44da7fc700 10 osd.0 11 agent_entry start
  2015-12-28 01:36:54.453338 7f44da7fc700 20 osd.0 11 agent_entry empty queue
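Note the empty "peers []/[]" in the heartbeat lines, which suggests osd.0 never acquires any heartbeat peers. Two read-only commands that can help confirm what addresses and states the cluster has recorded (a sketch; the second must run on node2, where osd.0's admin socket lives):

  # what the monitors believe: OSD up/down states and registered IPv6 addresses
  ceph osd dump

  # what the stuck daemon itself reports, via its local admin socket
  ceph daemon osd.0 status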
My ceph.conf looks like this:

  [global]
  fsid = b186d870-9c6d-4a8b-ac8a-e263f4c205da
  ms_bind_ipv6 = true
  public_network = xxxx:xxxx:2:1612::/64
  mon initial members = 0
  mon host = [xxxx:xxxx:2:1612::50]:6789
  auth cluster required = cephx
  auth service required = cephx
  auth client required = cephx
  osd pool default size = 2
  osd pool default min size = 1
  osd journal size = 1024
  osd mkfs type = xfs
  osd mount options xfs = rw,inode64
  osd crush chooseleaf type = 1

  [mon.0]
  host = node1
  mon addr = [xxxx:xxxx:2:1612::50]:6789

  [mon.1]
  host = node3
  mon addr = [xxxx:xxxx:2:1612::30]:6789

  [osd.0]
  host = node2
  devs = /dev/vg0/osd0

  [osd.1]
  host = node1
  devs = /dev/vg0/osd

My ceph osd tree:

  node1 # ceph osd tree
  ID WEIGHT  TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
  -1 2.00000 root default
  -2 1.00000     host node2
   0 1.00000         osd.0      down        0          1.00000
  -3 1.00000     host node1
   1 1.00000         osd.1        up  1.00000          1.00000

Any help on how to cope with this is appreciated. I followed the steps in these guides:

  https://wiki.gentoo.org/wiki/Ceph/Installation#Installation
  http://docs.ceph.com/docs/master/install/manual-deployment/#adding-osds
  http://www.mad-hacking.net/documentation/linux/ha-cluster/storage-area-network/ceph-additional-nodes.xml
  http://blog.widodh.nl/2014/05/deploying-ceph-over-ipv6/

Thanks in advance.
Martin

--
====================================
Ing. Martin Samek
ICT systems engineer
FELK Admin
Czech Technical University in Prague
Faculty of Electrical Engineering
Department of Control Engineering
Karlovo namesti 13/E, 121 35 Prague
Czech Republic
e-mail: samekma1@xxxxxxxxxxx
phone:  +420 22435 7599
mobile: +420 605 285 125
====================================