MON running 'ceph -w' doesn't see OSD's booting

Yes, all of the ceph-osd processes are up and running. I performed a ceph-mon restart to see if that might trigger the osdmap update, but there are none of the INFO messages from the osdmap or the pgmap that I expect when the OSDs are started. All of the OSDs and their hosts appear in the CRUSH map and in ceph.conf.
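
Before touching the monitor, a couple of read-only queries (standard ceph CLI, run on the monitor host) show whether osd.0 ever registered in the osdmap and the CRUSH tree:

# Does the osdmap know about the OSDs, and are any marked up/in?
ceph osd stat
ceph osd dump | grep '^osd\.'

# Does CRUSH show the hosts and OSDs with nonzero weight?
ceph osd tree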

Since I went through a bunch of issues getting the multiple-OSDs-per-host setup up and working, I'm assuming that the monitor's tables might be hosed, so I'm going to purgedata and reinstall the monitor and see if it builds the proper mappings. I've stopped all of the OSDs and verified that there aren't any active ceph-osd processes. Then I'll follow the procedure for bringing a new monitor online in an existing cluster, so that I use the proper fsid.
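
For reference, the monitor rebuild I'm planning looks roughly like this (a sketch, assuming the default sysvinit setup, the default /var/lib/ceph paths, and a mon keyring at /etc/ceph/ceph.mon.keyring; the fsid is read out of /etc/ceph/ceph.conf):

# Stop the mon and move the old store aside
service ceph stop mon.ceph-mon01
mv /var/lib/ceph/mon/ceph-ceph-mon01 /var/lib/ceph/mon/ceph-ceph-mon01.old

# Build a fresh monmap that carries the cluster's existing fsid
FSID=$(awk '/^fsid/ {print $NF}' /etc/ceph/ceph.conf)
monmaptool --create --add ceph-mon01 209.243.160.84:6789 --fsid $FSID /tmp/monmap

# Re-create the mon store from that monmap and the keyring (keyring path may differ), then start it
mkdir /var/lib/ceph/mon/ceph-ceph-mon01
ceph-mon --mkfs -i ceph-mon01 --monmap /tmp/monmap --keyring /etc/ceph/ceph.mon.keyring
service ceph start mon.ceph-mon01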

2014-08-20 17:20:24.648538 7f326ebfd700  0 monclient: hunting for new mon
2014-08-20 17:20:24.648857 7f327455f700  0 -- 209.243.160.84:0/1005462 >> 209.243.160.84:6789/0 pipe(0x7f3264020300 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3264020570).fault
2014-08-20 17:20:26.077687 mon.0 [INF] mon.ceph-mon01@0 won leader election with quorum 0
2014-08-20 17:20:26.077810 mon.0 [INF] monmap e1: 1 mons at {ceph-mon01=209.243.160.84:6789/0}
2014-08-20 17:20:26.077931 mon.0 [INF] pgmap v555: 192 pgs: 192 creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
2014-08-20 17:20:26.078032 mon.0 [INF] mdsmap e1: 0/0/1 up


-----Original Message-----
From: Gregory Farnum [mailto:greg@xxxxxxxxxxx] 
Sent: Thursday, August 21, 2014 8:44 AM
To: Bruce McFarland
Cc: Dan Van Der Ster; ceph-users@ceph.com
Subject: Re: MON running 'ceph -w' doesn't see OSD's booting

Are the OSD processes still alive? What's the osdmap output of "ceph -w" (which was not in the output you pasted)?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
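
[A quick way to answer that from the OSD side is the daemon's admin socket, which reports the daemon's own state and the newest osdmap epoch it has seen; this assumes the default socket path:]

# On the OSD host: what state does osd.0 itself report?
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok status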


On Thu, Aug 21, 2014 at 7:11 AM, Bruce McFarland <Bruce.McFarland@taec.toshiba.com> wrote:
> I have 3 storage servers each with 30 osds. Each osd has a journal 
> that is a partition on a virtual drive that is a raid0 of 6 ssds. I 
> brought up a 3 osd
> (1 per storage server) cluster to bring up Ceph and figure out 
> configuration etc.
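>
> (In ceph.conf terms that means each OSD section points osd journal at
> its own partition; the device name below is hypothetical:)
>
> [osd.0]
> osd journal = /dev/md0p1   ; partition on the 6-SSD raid0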
>
>
>
> From: Dan Van Der Ster [mailto:daniel.vanderster@cern.ch]
> Sent: Thursday, August 21, 2014 1:17 AM
> To: Bruce McFarland
> Cc: ceph-users@ceph.com
> Subject: Re: [ceph-users] MON running 'ceph -w' doesn't see OSD's 
> booting
>
>
>
> Hi,
>
> You only have one OSD? I've seen similar strange things in test pools
> having only one OSD; I kinda explained it by assuming that OSDs need
> peers (other OSDs sharing the same PG) to behave correctly. Install a
> second OSD and see how it goes...
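>
> A minimal sketch of adding that second OSD by hand (assuming a data
> dir is already mounted at /var/lib/ceph/osd/ceph-1 on host ceph0):
>
> # Register a new OSD id with the cluster (prints the new id, e.g. 1)
> ceph osd create
> # Initialize its data dir and key, authorize it, place it in CRUSH, start it
> ceph-osd -i 1 --mkfs --mkkey
> ceph auth add osd.1 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-1/keyring
> ceph osd crush add osd.1 1.0 host=ceph0
> service ceph start osd.1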
>
> Cheers, Dan
>
>
>
>
>
> On 21 Aug 2014, at 02:59, Bruce McFarland
> <Bruce.McFarland@taec.toshiba.com> wrote:
>
>
>
> I have a cluster with 1 monitor and 3 OSD servers. Each server has
> multiple OSDs running on it. When I start an OSD using
> /etc/init.d/ceph start osd.0, I see the expected interaction between
> the OSD and the monitor (authenticating keys, etc.), and finally the
> OSD starts.
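>
> (To confirm the boot actually completes from the OSD's side, its log
> is worth grepping; this assumes the default log location:)
>
> # Look for boot/osdmap lines in the OSD's own log
> grep -Ei 'boot|osdmap' /var/log/ceph/ceph-osd.0.log | tail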
>
>
>
> Watching the cluster with 'ceph -w' running on the monitor, I never
> see the INFO messages I expect: there is no msg from osd.0 for the
> boot event, and none of the expected INFO messages from the osdmap
> and pgmap for the OSD and its PGs being added to those maps. I only
> see, from the last time the monitor was booted, that it wins the
> monitor election and reports monmap, pgmap, and mdsmap info.
>
>
>
> The firewalls are disabled, with selinux disabled and iptables turned
> off. All hosts can ssh without passwords into each other, and I've
> verified traffic between hosts using tcpdump captures. Any ideas on
> what I'd need to add to ceph.conf, or what I might have overlooked,
> would be greatly appreciated.
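>
> The checks I ran amount to something like this (the OSD listening
> ports, 6800 and up, are an assumption from the defaults):
>
> # Confirm selinux and iptables really are out of the picture
> getenforce
> iptables -L -n
>
> # From an OSD host: is the mon port reachable?
> nc -z 209.243.160.84 6789 && echo mon ok
>
> # From the mon: is an OSD's listening port reachable?
> nc -z 209.243.160.83 6800 && echo osd ok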
>
> Thanks,
>
> Bruce
>
>
>
> [root@ceph0 ceph]# /etc/init.d/ceph restart osd.0
>
> === osd.0 ===
>
> === osd.0 ===
>
> Stopping Ceph osd.0 on ceph0...kill 15676...done
>
> === osd.0 ===
>
> 2014-08-20 17:43:46.456592 7fa51a034700  1 -- :/0 messenger.start
>
> 2014-08-20 17:43:46.457363 7fa51a034700  1 -- :/1025971 -->
> 209.243.160.84:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0
> 0x7fa51402f9e0 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.458229 7fa5189f0700  1 -- 209.243.160.83:0/1025971 
> learned my addr 209.243.160.83:0/1025971
>
> 2014-08-20 17:43:46.459664 7fa5135fe700  1 -- 209.243.160.83:0/1025971 
> <==
> mon.0 209.243.160.84:6789/0 1 ==== mon_map v1 ==== 200+0+0 (3445960796 
> 0 0)
> 0x7fa508000ab0 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.459849 7fa5135fe700  1 -- 209.243.160.83:0/1025971 
> <==
> mon.0 209.243.160.84:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) 
> v1 ====
> 33+0+0 (536914167 0 0) 0x7fa508000f60 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.460180 7fa5135fe700  1 -- 209.243.160.83:0/1025971 
> -->
> 209.243.160.84:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0
> 0x7fa4fc0012d0 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.461341 7fa5135fe700  1 -- 209.243.160.83:0/1025971 
> <==
> mon.0 209.243.160.84:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) 
> v1 ====
> 206+0+0 (409581826 0 0) 0x7fa508000f60 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.461514 7fa5135fe700  1 -- 209.243.160.83:0/1025971 
> -->
> 209.243.160.84:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0
> 0x7fa4fc001cf0 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.462824 7fa5135fe700  1 -- 209.243.160.83:0/1025971 
> <==
> mon.0 209.243.160.84:6789/0 4 ==== auth_reply(proto 2 0 (0) Success) 
> v1 ====
> 393+0+0 (2134012784 0 0) 0x7fa5080011d0 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.463011 7fa5135fe700  1 -- 209.243.160.83:0/1025971 
> -->
> 209.243.160.84:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 
> 0x7fa51402bbc0 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.463073 7fa5135fe700  1 -- 209.243.160.83:0/1025971 
> -->
> 209.243.160.84:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0
> 0x7fa4fc0025d0 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.463329 7fa51a034700  1 -- 209.243.160.83:0/1025971 
> -->
> 209.243.160.84:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0
> 0x7fa514030490 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.463363 7fa51a034700  1 -- 209.243.160.83:0/1025971 
> -->
> 209.243.160.84:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0
> 0x7fa5140309b0 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.463564 7fa5135fe700  1 -- 209.243.160.83:0/1025971 
> <==
> mon.0 209.243.160.84:6789/0 5 ==== mon_map v1 ==== 200+0+0 (3445960796 
> 0 0)
> 0x7fa508001100 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.463639 7fa5135fe700  1 -- 209.243.160.83:0/1025971 
> <==
> mon.0 209.243.160.84:6789/0 6 ==== mon_subscribe_ack(300s) v1 ==== 
> 20+0+0
> (540052875 0 0) 0x7fa5080013e0 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.463707 7fa5135fe700  1 -- 209.243.160.83:0/1025971 
> <==
> mon.0 209.243.160.84:6789/0 7 ==== auth_reply(proto 2 0 (0) Success) 
> v1 ====
> 194+0+0 (1040860857 0 0) 0x7fa5080015d0 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.468877 7fa51a034700  1 -- 209.243.160.83:0/1025971 
> -->
> 209.243.160.84:6789/0 -- mon_command({"prefix": 
> "get_command_descriptions"} v 0) v1 -- ?+0 0x7fa514030e20 con 
> 0x7fa51402f570
>
> 2014-08-20 17:43:46.469862 7fa5135fe700  1 -- 209.243.160.83:0/1025971 
> <==
> mon.0 209.243.160.84:6789/0 8 ==== osd_map(554..554 src has 1..554) v3 
> ====
> 59499+0+0 (2180258623 0 0) 0x7fa50800f980 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.470428 7fa5135fe700  1 -- 209.243.160.83:0/1025971 
> <==
> mon.0 209.243.160.84:6789/0 9 ==== mon_subscribe_ack(300s) v1 ==== 
> 20+0+0
> (540052875 0 0) 0x7fa50800fc40 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.475021 7fa5135fe700  1 -- 209.243.160.83:0/1025971 
> <==
> mon.0 209.243.160.84:6789/0 10 ==== osd_map(554..554 src has 1..554) 
> v3 ====
> 59499+0+0 (2180258623 0 0) 0x7fa508001100 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.475081 7fa5135fe700  1 -- 209.243.160.83:0/1025971 
> <==
> mon.0 209.243.160.84:6789/0 11 ==== mon_subscribe_ack(300s) v1 ==== 
> 20+0+0
> (540052875 0 0) 0x7fa508001310 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.477559 7fa5135fe700  1 -- 209.243.160.83:0/1025971 
> <==
> mon.0 209.243.160.84:6789/0 12 ==== mon_command_ack([{"prefix":
> "get_command_descriptions"}]=0  v0) v1 ==== 72+0+29681 (1092875540 0
> 3117897362) 0x7fa5080012b0 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.592859 7fa51a034700  1 -- 209.243.160.83:0/1025971 
> -->
> 209.243.160.84:6789/0 -- mon_command({"prefix": "osd crush 
> create-or-move",
> "args": ["host=ceph0", "root=default"], "id": 0, "weight":
> 3.6400000000000001} v 0) v1 -- ?+0 0x7fa514030e20 con 0x7fa51402f570
>
> 2014-08-20 17:43:46.594426 7fa5135fe700  1 -- 209.243.160.83:0/1025971 
> <==
> mon.0 209.243.160.84:6789/0 13 ==== mon_command_ack([{"prefix": "osd 
> crush create-or-move", "args": ["host=ceph0", "root=default"], "id": 0, "weight":
> 3.6400000000000001}]=0 create-or-move updated item name 'osd.0' weight 
> 3.64 at location {host=ceph0,root=default} to crush map v554) v1 ==== 
> 254+0+0
> (748268703 0 0) 0x7fa508001100 con 0x7fa51402f570
>
> create-or-move updated item name 'osd.0' weight 3.64 at location 
> {host=ceph0,root=default} to crush map
>
> 2014-08-20 17:43:46.602415 7fa51a034700  1 -- 209.243.160.83:0/1025971 
> mark_down 0x7fa51402f570 -- 0x7fa51402f300
>
> 2014-08-20 17:43:46.602500 7fa51a034700  1 -- 209.243.160.83:0/1025971 
> mark_down_all
>
> 2014-08-20 17:43:46.602666 7fa51a034700  1 -- 209.243.160.83:0/1025971 
> shutdown complete.
>
> Starting Ceph osd.0 on ceph0...
>
> starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 
> /var/lib/ceph/osd/ceph-0/journal
>
> [root@ceph0 ceph]#
>
>
>
>
>
> 'ceph -w' output from ceph-mon01:
>
> 2014-08-20 17:20:24.648538 7f326ebfd700  0 monclient: hunting for new 
> mon
>
> 2014-08-20 17:20:24.648857 7f327455f700  0 -- 209.243.160.84:0/1005462 
> >>
> 209.243.160.84:6789/0 pipe(0x7f3264020300 sd=3 :0 s=1 pgs=0 cs=0 l=1 
> c=0x7f3264020570).fault
>
> 2014-08-20 17:20:26.077687 mon.0 [INF] mon.ceph-mon01@0 won leader
> election with quorum 0
>
> 2014-08-20 17:20:26.077810 mon.0 [INF] monmap e1: 1 mons at 
> {ceph-mon01=209.243.160.84:6789/0}
>
> 2014-08-20 17:20:26.077931 mon.0 [INF] pgmap v555: 192 pgs: 192 
> creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
>
> 2014-08-20 17:20:26.078032 mon.0 [INF] mdsmap e1: 0/0/1 up
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

