Re: mon not binding to public interface

Monitors use the public network, not the cluster network; only OSDs use the cluster network. The cluster network exists because OSDs generate a lot of traffic among themselves: heartbeat checks, data replication, recovery, and rebalancing. As a result, the cluster network typically sees more traffic than the front-end public network. See http://ceph.com/docs/master/rados/configuration/mon-osd-interaction/

By contrast, Ceph clients connect to both monitors and OSDs, so those daemons must be reachable on the public network. See the diagram here: http://ceph.com/docs/master/rados/configuration/network-config-ref/ Notice that all daemons use the public network, because that is how clients connect, while only OSDs use the cluster network.

In your configuration, you specified the following: 

[mon.controller1]
  host = controller1
  mon addr = 10.100.10.1:6789
  public addr = 10.100.0.150
  cluster addr = 10.100.10.1
  cluster network = 10.100.10.0/24
  public network = 10.100.0.0/21

The address for mon.controller1 is set to a cluster network IP address, namely 10.100.10.1:6789. Monitors should only be on the public network, but you have explicitly given this one a cluster network address in mon addr, and that is why it is listening on the cluster network. Your monitor address should be in the public network range, something like 10.100.0.155:6789.
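
As a quick sanity check, the subnet membership described above can be verified with Python's standard ipaddress module. This is only a sketch: the two networks are copied from the ceph.conf in this thread, the helper name classify is made up for illustration, and it only handles IPv4 addresses of the form host or host:port.

```python
import ipaddress

# Networks as given in the ceph.conf quoted in this thread.
PUBLIC_NETWORK = ipaddress.ip_network("10.100.0.0/21")
CLUSTER_NETWORK = ipaddress.ip_network("10.100.10.0/24")


def classify(addr: str) -> str:
    """Report which configured network an IPv4 'host[:port]' address falls in.

    Hypothetical helper for illustration; not part of Ceph.
    """
    host = addr.rsplit(":", 1)[0]  # drop the port if one is present
    ip = ipaddress.ip_address(host)
    if ip in CLUSTER_NETWORK:
        return "cluster"
    if ip in PUBLIC_NETWORK:
        return "public"
    return "neither"


if __name__ == "__main__":
    # The configured mon addr, and the kind of address suggested instead.
    for candidate in ("10.100.10.1:6789", "10.100.0.155:6789"):
        print(candidate, "->", classify(candidate))
```

Running it against the configured mon addr and against an address in the public range shows which side each one lands on.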

However, now that you have a monitor IP address, changing it can be a bit troublesome too. See the following:

http://ceph.com/docs/master/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address





On Wed, Jan 15, 2014 at 1:13 PM, Jeff Bachtel <jbachtel@xxxxxxxxxxxxxxxxxxxxxx> wrote:
If I understand correctly then, I should either not specify mon addr or set it to an external IP?

Thanks for the clarification,

Jeff


On 01/15/2014 03:58 PM, John Wilkins wrote:
Jeff,

First, if you've specified the public and cluster networks in [global], you don't need to specify them anywhere else. If you do, the more specific settings override the global ones. That's not the issue here, though. It appears from your ceph.conf file that you've given the monitor an address on the cluster network. Specifically, you specified mon addr = 10.100.10.1:6789, but you indicated elsewhere that this IP address belongs to the cluster network.
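
To illustrate the override behavior, here is a rough Python model of how a setting is resolved: the daemon's own section is consulted first, then the daemon-type section, then [global]. It is a simplified sketch and not Ceph's actual parser; the CONF dictionary and the lookup function are hypothetical names, and the values are taken from the ceph.conf in this thread with the redundant entries left out.

```python
# Simplified model of ceph.conf sections (values from this thread).
CONF = {
    "global": {
        "public network": "10.100.0.0/21",
        "cluster network": "10.100.10.0/24",
    },
    "mon": {},
    "mon.controller1": {
        "host": "controller1",
        "mon addr": "10.100.10.1:6789",
    },
}


def lookup(conf, daemon, key):
    """Return the most specific value for key, or None if it is unset.

    Search order: [mon.controller1] -> [mon] -> [global].
    Hypothetical helper for illustration; not Ceph's real resolution code.
    """
    daemon_type = daemon.split(".", 1)[0]
    for section in (daemon, daemon_type, "global"):
        if key in conf.get(section, {}):
            return conf[section][key]
    return None


if __name__ == "__main__":
    # Inherited from [global] because no narrower section sets it.
    print(lookup(CONF, "mon.controller1", "public network"))
    # Set only in the daemon's own section.
    print(lookup(CONF, "mon.controller1", "mon addr"))
```

With the networks defined once in [global], the monitor section still sees them, which is why repeating them per daemon is unnecessary.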


On Mon, Jan 13, 2014 at 11:29 AM, Jeff Bachtel <jbachtel@xxxxxxxxxxxxxxxxxxxxxx> wrote:
I've got a cluster with 3 mons, all of which are binding solely to a cluster network IP, and neither to 0.0.0.0:6789 nor a public IP. I hadn't noticed the problem until now because it makes little difference in how I normally use Ceph (rbd and radosgw), but now that I'm trying to use cephfs it's obviously suboptimal.

[global]
  auth cluster required = cephx
  auth service required = cephx
  auth client required = cephx
  keyring = /etc/ceph/keyring
  cluster network = 10.100.10.0/24
  public network = 10.100.0.0/21
  public addr = 10.100.0.150
  cluster addr = 10.100.10.1
  fsid = de10594a-0737-4f34-a926-58dc9254f95f

[mon]
  cluster network = 10.100.10.0/24
  public network = 10.100.0.0/21
  mon data = "">
[mon.controller1]
  host = controller1
  mon addr = 10.100.10.1:6789
  public addr = 10.100.0.150
  cluster addr = 10.100.10.1
  cluster network = 10.100.10.0/24
  public network = 10.100.0.0/21

And then with /usr/bin/ceph-mon -i controller1 --debug_ms 12 --pid-file /var/run/ceph/mon.controller1.pid -c /etc/ceph/ceph.conf I get in logs

2014-01-13 14:19:13.578458 7f195e6d97a0  0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mon, pid 7559
2014-01-13 14:19:13.641639 7f195e6d97a0 10 -- :/0 rank.bind 10.100.10.1:6789/0
2014-01-13 14:19:13.641668 7f195e6d97a0 10 accepter.accepter.bind
2014-01-13 14:19:13.642773 7f195e6d97a0 10 accepter.accepter.bind bound to 10.100.10.1:6789/0
2014-01-13 14:19:13.642800 7f195e6d97a0  1 -- 10.100.10.1:6789/0 learned my addr 10.100.10.1:6789/0
2014-01-13 14:19:13.642808 7f195e6d97a0  1 accepter.accepter.bind my_inst.addr is 10.100.10.1:6789/0 need_addr=0

With no mention of public addr (10.100.2.1) or public network (10.100.0.0/21) found. mds (on this host) and osd (on other hosts) bind to 0.0.0.0 and a public IP, respectively.

At this point public/cluster addr/network are WAY overspecified in ceph.conf, but the problem appeared with far less specification.

Any ideas? Thanks,

Jeff
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
John Wilkins
Senior Technical Writer
Inktank
john.wilkins@xxxxxxxxxxx
(415) 425-9599
http://inktank.com


