I have 3 storage servers, each with 30 OSDs. Each OSD has a journal that is a partition on a virtual drive that is a RAID0 of 6 SSDs. I brought up a 3-OSD cluster (1 OSD per storage server) to bring Ceph up and work out the configuration. (A minimal ceph.conf sketch for this layout, and commands for checking the OSD count against the pool replication size, are appended after the thread below.)

From: Dan Van Der Ster [mailto:daniel.vanderster@xxxxxxx]
Sent: Thursday, August 21, 2014 1:17 AM
To: Bruce McFarland
Cc: ceph-users at ceph.com
Subject: Re: MON running 'ceph -w' doesn't see OSDs booting

Hi,

You only have one OSD? I've seen similar strange things in test pools having only one OSD - and I kinda explained it by assuming that OSDs need peers (other OSDs sharing the same PG) to behave correctly. Install a second OSD and see how it goes...

Cheers, Dan

On 21 Aug 2014, at 02:59, Bruce McFarland <Bruce.McFarland at taec.toshiba.com> wrote:

I have a cluster with 1 monitor and 3 OSD servers. Each server has multiple OSDs running on it. When I start an OSD with /etc/init.d/ceph start osd.0, I see the expected interaction between the OSD and the monitor (authenticating keys, etc.) and finally the OSD starts. Watching the cluster with 'ceph -w' on the monitor, however, I never see the INFO messages I expect: there is no message from osd.0 for the boot event, and no INFO messages from the osdmap and pgmap about the OSD and its placement groups being added to those maps. I only see output from the last time the monitor booted, when it won the monitor election and reported monmap, pgmap, and mdsmap info. The firewalls are disabled (selinux disabled, iptables turned off), all hosts can ssh into each other without passwords, and I've verified traffic between hosts with tcpdump captures. Any ideas on what I need to add to ceph.conf, or may have overlooked, would be greatly appreciated. (Commands for verifying that the OSD actually registered with the monitor are appended after the log output below.)

Thanks,
Bruce

[root@ceph0 ceph]# /etc/init.d/ceph restart osd.0
=== osd.0 ===
=== osd.0 ===
Stopping Ceph osd.0 on ceph0...kill 15676...done
=== osd.0 ===
2014-08-20 17:43:46.456592 7fa51a034700 1 -- :/0 messenger.start
2014-08-20 17:43:46.457363 7fa51a034700 1 -- :/1025971 --> 209.243.160.84:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7fa51402f9e0 con 0x7fa51402f570
2014-08-20 17:43:46.458229 7fa5189f0700 1 -- 209.243.160.83:0/1025971 learned my addr 209.243.160.83:0/1025971
2014-08-20 17:43:46.459664 7fa5135fe700 1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 1 ==== mon_map v1 ==== 200+0+0 (3445960796 0 0) 0x7fa508000ab0 con 0x7fa51402f570
2014-08-20 17:43:46.459849 7fa5135fe700 1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (536914167 0 0) 0x7fa508000f60 con 0x7fa51402f570
2014-08-20 17:43:46.460180 7fa5135fe700 1 -- 209.243.160.83:0/1025971 --> 209.243.160.84:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0 0x7fa4fc0012d0 con 0x7fa51402f570
2014-08-20 17:43:46.461341 7fa5135fe700 1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 206+0+0 (409581826 0 0) 0x7fa508000f60 con 0x7fa51402f570
2014-08-20 17:43:46.461514 7fa5135fe700 1 -- 209.243.160.83:0/1025971 --> 209.243.160.84:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0 0x7fa4fc001cf0 con 0x7fa51402f570
2014-08-20 17:43:46.462824 7fa5135fe700 1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 4 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 393+0+0 (2134012784 0 0) 0x7fa5080011d0 con 0x7fa51402f570
2014-08-20 17:43:46.463011 7fa5135fe700 1 -- 209.243.160.83:0/1025971 --> 209.243.160.84:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x7fa51402bbc0 con 0x7fa51402f570
2014-08-20 17:43:46.463073 7fa5135fe700 1 -- 209.243.160.83:0/1025971 --> 209.243.160.84:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0 0x7fa4fc0025d0 con 0x7fa51402f570
2014-08-20 17:43:46.463329 7fa51a034700 1 -- 209.243.160.83:0/1025971 --> 209.243.160.84:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x7fa514030490 con 0x7fa51402f570
2014-08-20 17:43:46.463363 7fa51a034700 1 -- 209.243.160.83:0/1025971 --> 209.243.160.84:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x7fa5140309b0 con 0x7fa51402f570
2014-08-20 17:43:46.463564 7fa5135fe700 1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 5 ==== mon_map v1 ==== 200+0+0 (3445960796 0 0) 0x7fa508001100 con 0x7fa51402f570
2014-08-20 17:43:46.463639 7fa5135fe700 1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 6 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (540052875 0 0) 0x7fa5080013e0 con 0x7fa51402f570
2014-08-20 17:43:46.463707 7fa5135fe700 1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 7 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 194+0+0 (1040860857 0 0) 0x7fa5080015d0 con 0x7fa51402f570
2014-08-20 17:43:46.468877 7fa51a034700 1 -- 209.243.160.83:0/1025971 --> 209.243.160.84:6789/0 -- mon_command({"prefix": "get_command_descriptions"} v 0) v1 -- ?+0 0x7fa514030e20 con 0x7fa51402f570
2014-08-20 17:43:46.469862 7fa5135fe700 1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 8 ==== osd_map(554..554 src has 1..554) v3 ==== 59499+0+0 (2180258623 0 0) 0x7fa50800f980 con 0x7fa51402f570
2014-08-20 17:43:46.470428 7fa5135fe700 1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 9 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (540052875 0 0) 0x7fa50800fc40 con 0x7fa51402f570
2014-08-20 17:43:46.475021 7fa5135fe700 1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 10 ==== osd_map(554..554 src has 1..554) v3 ==== 59499+0+0 (2180258623 0 0) 0x7fa508001100 con 0x7fa51402f570
2014-08-20 17:43:46.475081 7fa5135fe700 1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 11 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (540052875 0 0) 0x7fa508001310 con 0x7fa51402f570
2014-08-20 17:43:46.477559 7fa5135fe700 1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 12 ==== mon_command_ack([{"prefix": "get_command_descriptions"}]=0 v0) v1 ==== 72+0+29681 (1092875540 0 3117897362) 0x7fa5080012b0 con 0x7fa51402f570
2014-08-20 17:43:46.592859 7fa51a034700 1 -- 209.243.160.83:0/1025971 --> 209.243.160.84:6789/0 -- mon_command({"prefix": "osd crush create-or-move", "args": ["host=ceph0", "root=default"], "id": 0, "weight": 3.6400000000000001} v 0) v1 -- ?+0 0x7fa514030e20 con 0x7fa51402f570
2014-08-20 17:43:46.594426 7fa5135fe700 1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 13 ==== mon_command_ack([{"prefix": "osd crush create-or-move", "args": ["host=ceph0", "root=default"], "id": 0, "weight": 3.6400000000000001}]=0 create-or-move updated item name 'osd.0' weight 3.64 at location {host=ceph0,root=default} to crush map v554) v1 ==== 254+0+0 (748268703 0 0) 0x7fa508001100 con 0x7fa51402f570
create-or-move updated item name 'osd.0' weight 3.64 at location {host=ceph0,root=default} to crush map
2014-08-20 17:43:46.602415 7fa51a034700 1 -- 209.243.160.83:0/1025971 mark_down 0x7fa51402f570 -- 0x7fa51402f300
2014-08-20 17:43:46.602500 7fa51a034700 1 -- 209.243.160.83:0/1025971 mark_down_all
2014-08-20 17:43:46.602666 7fa51a034700 1 -- 209.243.160.83:0/1025971 shutdown complete.
=== osd.0 ===
Starting Ceph osd.0 on ceph0...
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
[root@ceph0 ceph]#

ceph -w output from ceph-mon01:
2014-08-20 17:20:24.648538 7f326ebfd700 0 monclient: hunting for new mon
2014-08-20 17:20:24.648857 7f327455f700 0 -- 209.243.160.84:0/1005462 >> 209.243.160.84:6789/0 pipe(0x7f3264020300 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3264020570).fault
2014-08-20 17:20:26.077687 mon.0 [INF] mon.ceph-mon01@0 won leader election with quorum 0
2014-08-20 17:20:26.077810 mon.0 [INF] monmap e1: 1 mons at {ceph-mon01=209.243.160.84:6789/0}
2014-08-20 17:20:26.077931 mon.0 [INF] pgmap v555: 192 pgs: 192 creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
2014-08-20 17:20:26.078032 mon.0 [INF] mdsmap e1: 0/0/1 up
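
For reference, a minimal ceph.conf sketch matching the layout described at the top of the thread (sysvinit-style per-OSD sections, journal on a partition of the SSD RAID0 virtual drive). The fsid placeholder and journal size are assumptions for illustration, not values from the original post; only the monitor address and the osd.0 data path come from the thread:

    [global]
        fsid = <cluster-uuid>                    ; assumption: use the real cluster fsid
        mon host = 209.243.160.84                ; ceph-mon01, as seen in the logs above
        auth cluster required = cephx
        auth service required = cephx
        auth client required = cephx

    [osd]
        osd journal size = 10240                 ; MB; assumption, size to taste

    [osd.0]
        host = ceph0
        osd data = /var/lib/ceph/osd/ceph-0      ; matches the path in the startup output
        osd journal = /var/lib/ceph/osd/ceph-0/journal   ; typically a symlink to the journal partition on the RAID0 virtual drive

With the sysvinit script used in the thread (/etc/init.d/ceph), the host entry in each [osd.N] section is how the script decides which daemons belong to which node, so 'service ceph start osd.0' only acts on the host named there.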
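
On Dan's point about needing peers: with only one or two OSDs up and a pool replication size of 3 (the Firefly-era default; older releases defaulted to 2), placement groups cannot finish creating and peering, which fits the "192 pgs: 192 creating" line in the ceph -w output above (three default pools at 64 PGs each). A few commands to check this from the monitor node; the pool name 'rbd' is only an example:

    # how many OSDs exist, and how many are up and in
    ceph osd stat
    ceph osd tree

    # replication size of a pool
    ceph osd pool get rbd size

    # for a throwaway test cluster only: let PGs go active with fewer replicas
    ceph osd pool set rbd size 2
    ceph osd pool set rbd min_size 1

Once at least 'size' OSDs are up and in (on distinct hosts, with the default CRUSH rules), the PGs should move from creating to active+clean.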
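
Since the osd.0 restart above looks clean from the client side, a few ways to confirm whether the daemon actually booted and registered with the monitor, beyond the firewall/selinux/tcpdump checks already described. The admin-socket and log paths below are the stock defaults and an assumption for this cluster:

    # on the monitor: is osd.0 known, up, and in?
    ceph osd stat
    ceph osd dump | grep '^osd.0'
    ceph health detail

    # on ceph0: query the running daemon over its admin socket
    ceph daemon osd.0 status
    # equivalently:
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok status

    # on ceph0: is the OSD listening, and can it reach the monitor?
    netstat -tlnp | grep ceph-osd      # OSDs bind to ports from 6800 upward
    nc -vz 209.243.160.84 6789         # the monitor address from the thread

    # on ceph0: the OSD's own log usually says why a boot never reached the monitor
    tail -n 100 /var/log/ceph/ceph-osd.0.log

If 'ceph osd stat' shows the OSD down or out even though the daemon is running, the OSD log and the public/cluster network settings in ceph.conf are the usual places to look.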