Are the OSD processes still alive? What's the osdmap output of "ceph -w"
(which was not in the output you pasted)?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Thu, Aug 21, 2014 at 7:11 AM, Bruce McFarland
<Bruce.McFarland at taec.toshiba.com> wrote:
> I have 3 storage servers, each with 30 OSDs. Each OSD has a journal that is
> a partition on a virtual drive that is a RAID 0 of 6 SSDs. I brought up a
> 3-OSD cluster (1 OSD per storage server) to bring up Ceph and figure out
> configuration, etc.
>
> From: Dan Van Der Ster [mailto:daniel.vanderster at cern.ch]
> Sent: Thursday, August 21, 2014 1:17 AM
> To: Bruce McFarland
> Cc: ceph-users at ceph.com
> Subject: Re: [ceph-users] MON running 'ceph -w' doesn't see OSD's booting
>
> Hi,
>
> You only have one OSD? I've seen similar strange things in test pools
> having only one OSD, and I kinda explained it by assuming that OSDs need
> peers (other OSDs sharing the same PG) to behave correctly. Install a
> second OSD and see how it goes...
>
> Cheers, Dan
>
> On 21 Aug 2014, at 02:59, Bruce McFarland
> <Bruce.McFarland at taec.toshiba.com> wrote:
>
> I have a cluster with 1 monitor and 3 OSD servers. Each server has
> multiple OSDs running on it. When I start an OSD using
> /etc/init.d/ceph start osd.0, I see the expected interaction between the
> OSD and the monitor (authenticating keys, etc.) and finally the OSD starts.
>
> Watching the cluster with "ceph -w" running on the monitor, I never see
> the INFO messages I expect. There isn't a message from osd.0 for the boot
> event, nor the expected INFO messages from the osdmap and pgmap for the
> OSD and its PGs being added to those maps. I only see, from the last time
> the monitor was booted, that it wins the monitor election and reports
> monmap, pgmap, and mdsmap info.
>
> The firewalls are disabled, with SELinux disabled and iptables turned off.
> All hosts can ssh without passwords into each other, and I've verified
> traffic between hosts using tcpdump captures. Any ideas on what I'd need
> to add to ceph.conf, or what I might have overlooked, would be greatly
> appreciated.
>
> Thanks,
> Bruce
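
For anyone hitting the same symptoms, here is a minimal sketch of how to answer
the two questions at the top of this mail, run from the monitor host (plus a
process check on each OSD host). These are standard Ceph CLI commands; the
sample outputs in the comments are illustrative only:

  ps aux | grep ceph-osd          # on each OSD host: are the ceph-osd processes still running?
  ceph osd stat                   # e.g. "3 osds: 0 up, 0 in" would match the symptoms described
  ceph osd tree                   # did the OSDs ever get registered in the CRUSH map / osdmap?
  ceph osd dump | grep "^osd\."   # per-OSD up/in state and the addresses the mon has for each OSD

If the daemons are running but "ceph osd stat" still reports 0 up / 0 in, the
OSDs never completed their boot handshake with the monitor.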

> [root@ceph0 ceph]# /etc/init.d/ceph restart osd.0
> === osd.0 ===
> === osd.0 ===
> Stopping Ceph osd.0 on ceph0...kill 15676...done
> === osd.0 ===
> 2014-08-20 17:43:46.456592 7fa51a034700 1 -- :/0 messenger.start
> 2014-08-20 17:43:46.457363 7fa51a034700 1 -- :/1025971 --> 209.243.160.84:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7fa51402f9e0 con 0x7fa51402f570
> 2014-08-20 17:43:46.458229 7fa5189f0700 1 -- 209.243.160.83:0/1025971 learned my addr 209.243.160.83:0/1025971
> 2014-08-20 17:43:46.459664 7fa5135fe700 1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 1 ==== mon_map v1 ==== 200+0+0 (3445960796 0 0) 0x7fa508000ab0 con 0x7fa51402f570
> 2014-08-20 17:43:46.459849 7fa5135fe700 1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (536914167 0 0) 0x7fa508000f60 con 0x7fa51402f570
> 2014-08-20 17:43:46.460180 7fa5135fe700 1 -- 209.243.160.83:0/1025971 --> 209.243.160.84:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0 0x7fa4fc0012d0 con 0x7fa51402f570
> 2014-08-20 17:43:46.461341 7fa5135fe700 1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 206+0+0 (409581826 0 0) 0x7fa508000f60 con 0x7fa51402f570
> 2014-08-20 17:43:46.461514 7fa5135fe700 1 -- 209.243.160.83:0/1025971 --> 209.243.160.84:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0 0x7fa4fc001cf0 con 0x7fa51402f570
> 2014-08-20 17:43:46.462824 7fa5135fe700 1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 4 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 393+0+0 (2134012784 0 0) 0x7fa5080011d0 con 0x7fa51402f570
> 2014-08-20 17:43:46.463011 7fa5135fe700 1 -- 209.243.160.83:0/1025971 --> 209.243.160.84:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x7fa51402bbc0 con 0x7fa51402f570
> 2014-08-20 17:43:46.463073 7fa5135fe700 1 -- 209.243.160.83:0/1025971 --> 209.243.160.84:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0 0x7fa4fc0025d0 con 0x7fa51402f570
> 2014-08-20 17:43:46.463329 7fa51a034700 1 -- 209.243.160.83:0/1025971 --> 209.243.160.84:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x7fa514030490 con 0x7fa51402f570
> 2014-08-20 17:43:46.463363 7fa51a034700 1 -- 209.243.160.83:0/1025971 --> 209.243.160.84:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x7fa5140309b0 con 0x7fa51402f570
> 2014-08-20 17:43:46.463564 7fa5135fe700 1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 5 ==== mon_map v1 ==== 200+0+0 (3445960796 0 0) 0x7fa508001100 con 0x7fa51402f570
> 2014-08-20 17:43:46.463639 7fa5135fe700 1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 6 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (540052875 0 0) 0x7fa5080013e0 con 0x7fa51402f570
> 2014-08-20 17:43:46.463707 7fa5135fe700 1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 7 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 194+0+0 (1040860857 0 0) 0x7fa5080015d0 con 0x7fa51402f570
> 2014-08-20 17:43:46.468877 7fa51a034700 1 -- 209.243.160.83:0/1025971 --> 209.243.160.84:6789/0 -- mon_command({"prefix": "get_command_descriptions"} v 0) v1 -- ?+0 0x7fa514030e20 con 0x7fa51402f570
> 2014-08-20 17:43:46.469862 7fa5135fe700 1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 8 ==== osd_map(554..554 src has 1..554) v3 ==== 59499+0+0 (2180258623 0 0) 0x7fa50800f980 con 0x7fa51402f570
> 2014-08-20 17:43:46.470428 7fa5135fe700 1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 9 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (540052875 0 0) 0x7fa50800fc40 con 0x7fa51402f570
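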
> 2014-08-20 17:43:46.475021 7fa5135fe700 1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 10 ==== osd_map(554..554 src has 1..554) v3 ==== 59499+0+0 (2180258623 0 0) 0x7fa508001100 con 0x7fa51402f570
> 2014-08-20 17:43:46.475081 7fa5135fe700 1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 11 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (540052875 0 0) 0x7fa508001310 con 0x7fa51402f570
> 2014-08-20 17:43:46.477559 7fa5135fe700 1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 12 ==== mon_command_ack([{"prefix": "get_command_descriptions"}]=0 v0) v1 ==== 72+0+29681 (1092875540 0 3117897362) 0x7fa5080012b0 con 0x7fa51402f570
> 2014-08-20 17:43:46.592859 7fa51a034700 1 -- 209.243.160.83:0/1025971 --> 209.243.160.84:6789/0 -- mon_command({"prefix": "osd crush create-or-move", "args": ["host=ceph0", "root=default"], "id": 0, "weight": 3.6400000000000001} v 0) v1 -- ?+0 0x7fa514030e20 con 0x7fa51402f570
> 2014-08-20 17:43:46.594426 7fa5135fe700 1 -- 209.243.160.83:0/1025971 <== mon.0 209.243.160.84:6789/0 13 ==== mon_command_ack([{"prefix": "osd crush create-or-move", "args": ["host=ceph0", "root=default"], "id": 0, "weight": 3.6400000000000001}]=0 create-or-move updated item name 'osd.0' weight 3.64 at location {host=ceph0,root=default} to crush map v554) v1 ==== 254+0+0 (748268703 0 0) 0x7fa508001100 con 0x7fa51402f570
> create-or-move updated item name 'osd.0' weight 3.64 at location {host=ceph0,root=default} to crush map
> 2014-08-20 17:43:46.602415 7fa51a034700 1 -- 209.243.160.83:0/1025971 mark_down 0x7fa51402f570 -- 0x7fa51402f300
> 2014-08-20 17:43:46.602500 7fa51a034700 1 -- 209.243.160.83:0/1025971 mark_down_all
> 2014-08-20 17:43:46.602666 7fa51a034700 1 -- 209.243.160.83:0/1025971 shutdown complete.
> Starting Ceph osd.0 on ceph0...
> starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
> [root@ceph0 ceph]#
>
> Ceph -w output from ceph-mon01:
> 2014-08-20 17:20:24.648538 7f326ebfd700 0 monclient: hunting for new mon
> 2014-08-20 17:20:24.648857 7f327455f700 0 -- 209.243.160.84:0/1005462 >> 209.243.160.84:6789/0 pipe(0x7f3264020300 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3264020570).fault
> 2014-08-20 17:20:26.077687 mon.0 [INF] mon.ceph-mon01@0 won leader election with quorum 0
> 2014-08-20 17:20:26.077810 mon.0 [INF] monmap e1: 1 mons at {ceph-mon01=209.243.160.84:6789/0}
> 2014-08-20 17:20:26.077931 mon.0 [INF] pgmap v555: 192 pgs: 192 creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
> 2014-08-20 17:20:26.078032 mon.0 [INF] mdsmap e1: 0/0/1 up
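
One last note on the "ceph -w" capture: the pgmap stuck at "192 pgs: 192
creating; 0 bytes data" is consistent with no OSD ever having reported in, and
the initial monclient fault is typically just the ceph -w client re-establishing
its monitor session. A rough checklist for narrowing this down, assuming the
default OSD log path and an nc that supports -z (both assumptions on my part,
not something stated in this thread):

  # On the monitor: does the osdmap list osd.0 as up/in with a real address?
  ceph osd dump | grep "^osd\."
  # On ceph0: did osd.0 log a boot attempt or errors while contacting the mon?
  tail -n 50 /var/log/ceph/ceph-osd.0.log
  # On ceph0: is the monitor's public address/port reachable at all?
  nc -z 209.243.160.84 6789 && echo "mon port reachable"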