MON running 'ceph -w' doesn't see OSDs booting


 



Are the OSD processes still alive? What's the osdmap output of "ceph
-w" (which was not in the output you pasted)?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Thu, Aug 21, 2014 at 7:11 AM, Bruce McFarland
<Bruce.McFarland at taec.toshiba.com> wrote:
> I have 3 storage servers, each with 30 OSDs. Each OSD has a journal on a
> partition of a virtual drive that is a RAID0 of 6 SSDs. I brought up a
> 3-OSD cluster (1 per storage server) to bring up Ceph and figure out the
> configuration, etc.
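>
> For reference, the journal placement is configured in ceph.conf roughly like
> this (a sketch only; the device path and size below are hypothetical):
>
>     [osd]
>         osd journal size = 10240                  ; MB, hypothetical value
>     [osd.0]
>         host = ceph0
>         osd journal = /dev/mapper/ssd-raid0p1     ; hypothetical partition on the SSD RAID0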
>
>
>
> From: Dan Van Der Ster [mailto:daniel.vanderster at cern.ch]
> Sent: Thursday, August 21, 2014 1:17 AM
> To: Bruce McFarland
> Cc: ceph-users at ceph.com
> Subject: Re: [ceph-users] MON running 'ceph -w' doesn't see OSDs booting
>
>
>
> Hi,
>
> You only have one OSD? I've seen similar strange things in test pools having
> only one OSD, and I kinda explained it by assuming that OSDs need peers
> (other OSDs sharing the same PG) to behave correctly. Install a second OSD
> and see how it goes...
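>
> (If you add it by hand rather than with ceph-deploy, the short form is
> roughly the following; the id, paths, and CRUSH location are illustrative:)
>
> ceph osd create                      # allocates the next free id, e.g. 1
> mkdir -p /var/lib/ceph/osd/ceph-1
> ceph-osd -i 1 --mkfs --mkkey
> ceph auth add osd.1 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-1/keyring
> ceph osd crush add osd.1 1.0 host=ceph1 root=default
> /etc/init.d/ceph start osd.1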
>
> Cheers, Dan
>
>
>
>
>
> On 21 Aug 2014, at 02:59, Bruce McFarland <Bruce.McFarland at taec.toshiba.com>
> wrote:
>
>
>
> I have a cluster with 1 monitor and 3 OSD servers. Each server has multiple
> OSDs running on it. When I start the OSD using /etc/init.d/ceph start osd.0
>
> I see the expected interaction between the OSD and the monitor
> (authenticating keys, etc.), and finally the OSD starts.
>
>
>
> Watching the cluster with 'ceph -w' running on the monitor, I never see the
> INFO messages I expect. There isn't a msg from osd.0 for the boot event, nor
> the expected INFO messages from osdmap and pgmap for the OSD and its PGs
> being added to those maps. I only see the last time the monitor was booted,
> when it wins the monitor election and reports monmap, pgmap, and mdsmap
> info.
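>
> For comparison, on a healthy boot I'd expect 'ceph -w' to show lines roughly
> of this shape (placeholders, not real output):
>
> <timestamp> mon.0 [INF] osd.0 <osd-addr>:6800/<pid> boot
> <timestamp> mon.0 [INF] osdmap e<N>: 3 osds: 1 up, 1 in
> <timestamp> mon.0 [INF] pgmap v<M>: 192 pgs: ...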
>
>
>
> The firewalls are disabled with selinux==disabled and iptables turned off.
> All hosts can ssh w/o passwords into each other and I've verified traffic
> between hosts using tcpdump captures. Any ideas on what I'd need to add to
> ceph.conf or have overlooked would be greatly appreciated.
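>
> (The checks were along these lines; the interface name is just an example:)
>
> getenforce                                   # prints "Disabled"
> iptables -L -n                               # no rules, policy ACCEPT
> service iptables status                      # reports stopped
> tcpdump -nn -i eth0 'port 6789 or portrange 6800-6810'   # mon and OSD traffic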
>
> Thanks,
>
> Bruce
>
>
>
> [root@ceph0 ceph]# /etc/init.d/ceph restart osd.0
>
> === osd.0 ===
>
> === osd.0 ===
>
> Stopping Ceph osd.0 on ceph0...kill 15676...done
>
> === osd.0 ===
>
> 2014-08-20 17:43:46.456592 7fa51a034700  1 -- :/0 messenger.start
>
> 2014-08-20 17:43:46.457363 7fa51a034700  1 -- :/1025971 -->
> 209.243.160.84:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0
> 0x7fa51402f9e0 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.458229 7fa5189f0700  1 -- 209.243.160.83:0/1025971
> learned my addr 209.243.160.83:0/1025971
>
> 2014-08-20 17:43:46.459664 7fa5135fe700  1 -- 209.243.160.83:0/1025971 <==
> mon.0 209.243.160.84:6789/0 1 ==== mon_map v1 ==== 200+0+0 (3445960796 0 0)
> 0x7fa508000ab0 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.459849 7fa5135fe700  1 -- 209.243.160.83:0/1025971 <==
> mon.0 209.243.160.84:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ====
> 33+0+0 (536914167 0 0) 0x7fa508000f60 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.460180 7fa5135fe700  1 -- 209.243.160.83:0/1025971 -->
> 209.243.160.84:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0
> 0x7fa4fc0012d0 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.461341 7fa5135fe700  1 -- 209.243.160.83:0/1025971 <==
> mon.0 209.243.160.84:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ====
> 206+0+0 (409581826 0 0) 0x7fa508000f60 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.461514 7fa5135fe700  1 -- 209.243.160.83:0/1025971 -->
> 209.243.160.84:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0
> 0x7fa4fc001cf0 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.462824 7fa5135fe700  1 -- 209.243.160.83:0/1025971 <==
> mon.0 209.243.160.84:6789/0 4 ==== auth_reply(proto 2 0 (0) Success) v1 ====
> 393+0+0 (2134012784 0 0) 0x7fa5080011d0 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.463011 7fa5135fe700  1 -- 209.243.160.83:0/1025971 -->
> 209.243.160.84:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x7fa51402bbc0
> con 0x7fa51402f570
>
> 2014-08-20 17:43:46.463073 7fa5135fe700  1 -- 209.243.160.83:0/1025971 -->
> 209.243.160.84:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0
> 0x7fa4fc0025d0 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.463329 7fa51a034700  1 -- 209.243.160.83:0/1025971 -->
> 209.243.160.84:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0
> 0x7fa514030490 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.463363 7fa51a034700  1 -- 209.243.160.83:0/1025971 -->
> 209.243.160.84:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0
> 0x7fa5140309b0 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.463564 7fa5135fe700  1 -- 209.243.160.83:0/1025971 <==
> mon.0 209.243.160.84:6789/0 5 ==== mon_map v1 ==== 200+0+0 (3445960796 0 0)
> 0x7fa508001100 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.463639 7fa5135fe700  1 -- 209.243.160.83:0/1025971 <==
> mon.0 209.243.160.84:6789/0 6 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0
> (540052875 0 0) 0x7fa5080013e0 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.463707 7fa5135fe700  1 -- 209.243.160.83:0/1025971 <==
> mon.0 209.243.160.84:6789/0 7 ==== auth_reply(proto 2 0 (0) Success) v1 ====
> 194+0+0 (1040860857 0 0) 0x7fa5080015d0 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.468877 7fa51a034700  1 -- 209.243.160.83:0/1025971 -->
> 209.243.160.84:6789/0 -- mon_command({"prefix": "get_command_descriptions"}
> v 0) v1 -- ?+0 0x7fa514030e20 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.469862 7fa5135fe700  1 -- 209.243.160.83:0/1025971 <==
> mon.0 209.243.160.84:6789/0 8 ==== osd_map(554..554 src has 1..554) v3 ====
> 59499+0+0 (2180258623 0 0) 0x7fa50800f980 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.470428 7fa5135fe700  1 -- 209.243.160.83:0/1025971 <==
> mon.0 209.243.160.84:6789/0 9 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0
> (540052875 0 0) 0x7fa50800fc40 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.475021 7fa5135fe700  1 -- 209.243.160.83:0/1025971 <==
> mon.0 209.243.160.84:6789/0 10 ==== osd_map(554..554 src has 1..554) v3 ====
> 59499+0+0 (2180258623 0 0) 0x7fa508001100 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.475081 7fa5135fe700  1 -- 209.243.160.83:0/1025971 <==
> mon.0 209.243.160.84:6789/0 11 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0
> (540052875 0 0) 0x7fa508001310 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.477559 7fa5135fe700  1 -- 209.243.160.83:0/1025971 <==
> mon.0 209.243.160.84:6789/0 12 ==== mon_command_ack([{"prefix":
> "get_command_descriptions"}]=0  v0) v1 ==== 72+0+29681 (1092875540 0
> 3117897362) 0x7fa5080012b0 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.592859 7fa51a034700  1 -- 209.243.160.83:0/1025971 -->
> 209.243.160.84:6789/0 -- mon_command({"prefix": "osd crush create-or-move",
> "args": ["host=ceph0", "root=default"], "id": 0, "weight":
> 3.6400000000000001} v 0) v1 -- ?+0 0x7fa514030e20 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.594426 7fa5135fe700  1 -- 209.243.160.83:0/1025971 <==
> mon.0 209.243.160.84:6789/0 13 ==== mon_command_ack([{"prefix": "osd crush
> create-or-move", "args": ["host=ceph0", "root=default"], "id": 0, "weight":
> 3.6400000000000001}]=0 create-or-move updated item name 'osd.0' weight 3.64
> at location {host=ceph0,root=default} to crush map v554) v1 ==== 254+0+0
> (748268703 0 0) 0x7fa508001100 con 0x7fa51402f570
>
> create-or-move updated item name 'osd.0' weight 3.64 at location
> {host=ceph0,root=default} to crush map
>
> 2014-08-20 17:43:46.602415 7fa51a034700  1 -- 209.243.160.83:0/1025971
> mark_down 0x7fa51402f570 -- 0x7fa51402f300
>
> 2014-08-20 17:43:46.602500 7fa51a034700  1 -- 209.243.160.83:0/1025971
> mark_down_all
>
> 2014-08-20 17:43:46.602666 7fa51a034700  1 -- 209.243.160.83:0/1025971
> shutdown complete.
>
> Starting Ceph osd.0 on ceph0...
>
> starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0
> /var/lib/ceph/osd/ceph-0/journal
>
> [root@ceph0 ceph]#
>
>
>
>
>
> 'ceph -w' output from ceph-mon01:
>
> 2014-08-20 17:20:24.648538 7f326ebfd700  0 monclient: hunting for new mon
>
> 2014-08-20 17:20:24.648857 7f327455f700  0 -- 209.243.160.84:0/1005462 >>
> 209.243.160.84:6789/0 pipe(0x7f3264020300 sd=3 :0 s=1 pgs=0 cs=0 l=1
> c=0x7f3264020570).fault
>
> 2014-08-20 17:20:26.077687 mon.0 [INF] mon.ceph-mon01@0 won leader election
> with quorum 0
>
> 2014-08-20 17:20:26.077810 mon.0 [INF] monmap e1: 1 mons at
> {ceph-mon01=209.243.160.84:6789/0}
>
> 2014-08-20 17:20:26.077931 mon.0 [INF] pgmap v555: 192 pgs: 192 creating; 0
> bytes data, 0 kB used, 0 kB / 0 kB avail
>
> 2014-08-20 17:20:26.078032 mon.0 [INF] mdsmap e1: 0/0/1 up
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



