MON running 'ceph -w' doesn't see OSDs booting

I have 3 storage servers, each with 30 OSDs. Each OSD has a journal on a partition of a virtual drive that is a RAID 0 of 6 SSDs. To bring up Ceph and figure out the configuration, etc., I initially brought up a 3-OSD cluster (1 OSD per storage server).
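For context, the relevant pieces of my ceph.conf look roughly like this (the journal device name below is a placeholder for the partition on the SSD RAID 0 virtual drive, not the actual device):

[global]
mon host = 209.243.160.84          # ceph-mon01
auth cluster required = cephx
auth service required = cephx
auth client required = cephx

[osd.0]
host = ceph0
osd data = /var/lib/ceph/osd/ceph-0
osd journal = /dev/sdX1            # placeholder: partition on the RAID 0 of 6 SSDs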

From: Dan Van Der Ster [mailto:daniel.vanderster@xxxxxxx]
Sent: Thursday, August 21, 2014 1:17 AM
To: Bruce McFarland
Cc: ceph-users at ceph.com
Subject: Re: MON running 'ceph -w' doesn't see OSDs booting

Hi,
You only have one OSD? I've seen similar strange things in test pools having only one OSD - and I kinda explained it by assuming that OSDs need peers (other OSDs sharing the same PG) to behave correctly. Install a second OSD and see how it goes...
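Once the second OSD is up, a quick way to see whether the PGs actually peer is something like:

ceph osd stat                  # expect something like "2 osds: 2 up, 2 in"
ceph pg stat                   # the PGs should at least leave the 'creating' state
ceph pg dump_stuck inactive    # lists any PGs that still haven't peered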
Cheers, Dan


On 21 Aug 2014, at 02:59, Bruce McFarland <Bruce.McFarland at taec.toshiba.com> wrote:


I have a cluster with 1 monitor and 3 OSD servers. Each server has multiple OSDs running on it. When I start an OSD using /etc/init.d/ceph start osd.0,
I see the expected interaction between the OSD and the monitor (key authentication, etc.) and finally the OSD starts.

Watching the cluster with 'ceph -w' running on the monitor, I never see the INFO messages I expect. There is no message from osd.0 for the boot event, nor the expected INFO messages from osdmap and pgmap for the OSD and its PGs being added to those maps. All I see is output from the last time the monitor was booted, when it won the monitor election and reported monmap, pgmap, and mdsmap info.
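(To rule out 'ceph -w' simply not displaying the events, the maps can also be queried directly on the monitor, e.g.:

ceph osd tree                  # does host ceph0 list osd.0, and is it marked up/in?
ceph osd dump | grep osd.0     # osd.0's state straight from the current osdmap
)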

The firewalls are disabled: SELinux is set to disabled and iptables is turned off. All hosts can ssh into each other without passwords, and I've verified traffic between hosts using tcpdump captures. Any ideas on what I need to add to ceph.conf, or what I might have overlooked, would be greatly appreciated.
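For reference, the firewall/connectivity checks were roughly along these lines (a sketch rather than a verbatim transcript, run on the OSD host ceph0):

getenforce                                     # should print "Disabled"
service iptables status                        # should report iptables as stopped
nc -z 209.243.160.84 6789 && echo "mon port reachable"
tcpdump host 209.243.160.84 and port 6789      # watch OSD <-> monitor traffic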
Thanks,
Bruce

[root@ceph0 ceph]# /etc/init.d/ceph restart osd.0
=== osd.0 ===
=== osd.0 ===
Stopping Ceph osd.0 on ceph0...kill 15676...done
=== osd.0 ===
2014-08-20 17:43:46.456592 7fa51a034700  1 -- :/0 messenger.start
2014-08-20 17:43:46.457363 7fa51a034700  1 -- :/1025971 --> 209.243.160.84:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7fa51402f9e0 con 0x7fa51402f570
2014-08-20 17:43:46.458229 7fa5189f0700  1 -- 209.243.160.83:0/1025971 learned my addr 209.243.160.83:0/1025971
2014-08-20 17:43:46.459664 7fa5135fe700  1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 1 ==== mon_map v1 ==== 200+0+0 (3445960796 0 0) 0x7fa508000ab0 con 0x7fa51402f570
2014-08-20 17:43:46.459849 7fa5135fe700  1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (536914167 0 0) 0x7fa508000f60 con 0x7fa51402f570
2014-08-20 17:43:46.460180 7fa5135fe700  1 -- 209.243.160.83:0/1025971 --> 209.243.160.84:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0 0x7fa4fc0012d0 con 0x7fa51402f570
2014-08-20 17:43:46.461341 7fa5135fe700  1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 206+0+0 (409581826 0 0) 0x7fa508000f60 con 0x7fa51402f570
2014-08-20 17:43:46.461514 7fa5135fe700  1 -- 209.243.160.83:0/1025971 --> 209.243.160.84:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0 0x7fa4fc001cf0 con 0x7fa51402f570
2014-08-20 17:43:46.462824 7fa5135fe700  1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 4 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 393+0+0 (2134012784 0 0) 0x7fa5080011d0 con 0x7fa51402f570
2014-08-20 17:43:46.463011 7fa5135fe700  1 -- 209.243.160.83:0/1025971 --> 209.243.160.84:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x7fa51402bbc0 con 0x7fa51402f570
2014-08-20 17:43:46.463073 7fa5135fe700  1 -- 209.243.160.83:0/1025971 --> 209.243.160.84:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0 0x7fa4fc0025d0 con 0x7fa51402f570
2014-08-20 17:43:46.463329 7fa51a034700  1 -- 209.243.160.83:0/1025971 --> 209.243.160.84:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x7fa514030490 con 0x7fa51402f570
2014-08-20 17:43:46.463363 7fa51a034700  1 -- 209.243.160.83:0/1025971 --> 209.243.160.84:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x7fa5140309b0 con 0x7fa51402f570
2014-08-20 17:43:46.463564 7fa5135fe700  1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 5 ==== mon_map v1 ==== 200+0+0 (3445960796 0 0) 0x7fa508001100 con 0x7fa51402f570
2014-08-20 17:43:46.463639 7fa5135fe700  1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 6 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (540052875 0 0) 0x7fa5080013e0 con 0x7fa51402f570
2014-08-20 17:43:46.463707 7fa5135fe700  1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 7 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 194+0+0 (1040860857 0 0) 0x7fa5080015d0 con 0x7fa51402f570
2014-08-20 17:43:46.468877 7fa51a034700  1 -- 209.243.160.83:0/1025971 --> 209.243.160.84:6789/0 -- mon_command({"prefix": "get_command_descriptions"} v 0) v1 -- ?+0 0x7fa514030e20 con 0x7fa51402f570
2014-08-20 17:43:46.469862 7fa5135fe700  1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 8 ==== osd_map(554..554 src has 1..554) v3 ==== 59499+0+0 (2180258623 0 0) 0x7fa50800f980 con 0x7fa51402f570
2014-08-20 17:43:46.470428 7fa5135fe700  1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 9 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (540052875 0 0) 0x7fa50800fc40 con 0x7fa51402f570
2014-08-20 17:43:46.475021 7fa5135fe700  1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 10 ==== osd_map(554..554 src has 1..554) v3 ==== 59499+0+0 (2180258623 0 0) 0x7fa508001100 con 0x7fa51402f570
2014-08-20 17:43:46.475081 7fa5135fe700  1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 11 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (540052875 0 0) 0x7fa508001310 con 0x7fa51402f570
2014-08-20 17:43:46.477559 7fa5135fe700  1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 12 ==== mon_command_ack([{"prefix": "get_command_descriptions"}]=0  v0) v1 ==== 72+0+29681 (1092875540 0 3117897362) 0x7fa5080012b0 con 0x7fa51402f570
2014-08-20 17:43:46.592859 7fa51a034700  1 -- 209.243.160.83:0/1025971 --> 209.243.160.84:6789/0 -- mon_command({"prefix": "osd crush create-or-move", "args": ["host=ceph0", "root=default"], "id": 0, "weight": 3.6400000000000001} v 0) v1 -- ?+0 0x7fa514030e20 con 0x7fa51402f570
2014-08-20 17:43:46.594426 7fa5135fe700  1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 13 ==== mon_command_ack([{"prefix": "osd crush create-or-move", "args": ["host=ceph0", "root=default"], "id": 0, "weight": 3.6400000000000001}]=0 create-or-move updated item name 'osd.0' weight 3.64 at location {host=ceph0,root=default} to crush map v554) v1 ==== 254+0+0 (748268703 0 0) 0x7fa508001100 con 0x7fa51402f570
create-or-move updated item name 'osd.0' weight 3.64 at location {host=ceph0,root=default} to crush map
2014-08-20 17:43:46.602415 7fa51a034700  1 -- 209.243.160.83:0/1025971 mark_down 0x7fa51402f570 -- 0x7fa51402f300
2014-08-20 17:43:46.602500 7fa51a034700  1 -- 209.243.160.83:0/1025971 mark_down_all
2014-08-20 17:43:46.602666 7fa51a034700  1 -- 209.243.160.83:0/1025971 shutdown complete.
Starting Ceph osd.0 on ceph0...
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
[root@ceph0 ceph]#


'ceph -w' output from ceph-mon01:
2014-08-20 17:20:24.648538 7f326ebfd700  0 monclient: hunting for new mon
2014-08-20 17:20:24.648857 7f327455f700  0 -- 209.243.160.84:0/1005462 >> 209.243.160.84:6789/0 pipe(0x7f3264020300 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3264020570).fault
2014-08-20 17:20:26.077687 mon.0 [INF] mon.ceph-mon01@0 won leader election with quorum 0
2014-08-20 17:20:26.077810 mon.0 [INF] monmap e1: 1 mons at {ceph-mon01=209.243.160.84:6789/0}
2014-08-20 17:20:26.077931 mon.0 [INF] pgmap v555: 192 pgs: 192 creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
2014-08-20 17:20:26.078032 mon.0 [INF] mdsmap e1: 0/0/1 up

_______________________________________________
ceph-users mailing list
ceph-users at lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
