Hello,

I think this might very well be my poor, unacknowledged bug report:
http://tracker.ceph.com/issues/10012

People with a mon_host entry in [global] (as created by ceph-deploy) will
be fine; people with mons specified outside of [global] will not.

Regards,

Christian

On Thu, 11 Dec 2014 00:49:03 +0000 Joao Eduardo Luis wrote:

> On 12/10/2014 09:05 PM, Gregory Farnum wrote:
> > What version is he running?
> >
> > Joao, does this make any sense to you?
>
> From the MonMap code I'm pretty sure that the client should have built
> the monmap from the [mon.X] sections, and solely based on 'mon addr'.
>
> 'mon_initial_members' is only useful to the monitors anyway, so it can
> be disregarded.
>
> Thus, there are two ways for a client to build a monmap:
> 1) based on 'mon_host' on the config (or -m on cli); or
> 2) based on 'mon addr = ip1,ip2...' from the [mon.X] sections
>
> I don't see a 'mon host = ip1,ip2,...' on the config file, and I'm
> assuming no '-m ip1,ip2...' has been supplied on the cli, so we would
> have been left with the 'mon addr' options on each individual [mon.X]
> section.
>
> We are left with two options here: assume there was unexpected behavior
> on this code path -- logs or steps to reproduce would be appreciated in
> this case! -- or assume something else failed:
>
> - are the ips on the remaining mon sections correct (nodo-1 && nodo-2)?
> - were all the remaining monitors up and running when the failure
>   occurred?
> - were the remaining monitors reachable by the client?
>
> In case you are able to reproduce this behavior, it would be nice if you
> could provide logs with 'debug monc = 10' and 'debug ms = 1'.
>
> Cheers!
>
>   -Joao
>
> > -Greg
> >
> > On Wed, Dec 10, 2014 at 11:54 AM, Christopher Armstrong
> > <chris@xxxxxxxxxxxx> wrote:
> >> Thanks Greg - I thought the same thing, but confirmed with the user
> >> that it appears the radosgw client is indeed using initial members -
> >> when he added all of his hosts to initial members, things worked just
> >> fine. In either event, all of the monitors were always fully
> >> enumerated later in the config file. Is this potentially a bug
> >> specific to radosgw? Here's his config file:
> >>
> >> [global]
> >> fsid = fc0e2e09-ade3-4ff6-b23e-f789775b2515
> >> mon initial members = nodo-3
> >> auth cluster required = cephx
> >> auth service required = cephx
> >> auth client required = cephx
> >> osd pool default size = 3
> >> osd pool default min_size = 1
> >> osd pool default pg_num = 128
> >> osd pool default pgp_num = 128
> >> osd recovery delay start = 15
> >> log file = /dev/stdout
> >> mon_clock_drift_allowed = 1
> >>
> >>
> >> [mon.nodo-1]
> >> host = nodo-1
> >> mon addr = 192.168.2.200:6789
> >>
> >> [mon.nodo-2]
> >> host = nodo-2
> >> mon addr = 192.168.2.201:6789
> >>
> >> [mon.nodo-3]
> >> host = nodo-3
> >> mon addr = 192.168.2.202:6789
> >>
> >>
> >>
> >> [client.radosgw.gateway]
> >> host = deis-store-gateway
> >> keyring = /etc/ceph/ceph.client.radosgw.keyring
> >> rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
> >> log file = /dev/stdout
> >>
> >>
> >> On Wed, Dec 10, 2014 at 11:40 AM, Gregory Farnum <greg@xxxxxxxxxxx>
> >> wrote:
> >>>
> >>> On Tue, Dec 9, 2014 at 3:11 PM, Christopher Armstrong
> >>> <chris@xxxxxxxxxxxx> wrote:
> >>>> Hi folks,
> >>>>
> >>>> I think we have a bit of confusion around how initial members is
> >>>> used. I understand that we can specify a single monitor (or a
> >>>> subset of monitors) so that the cluster can form a quorum when it
> >>>> first comes up.
> >>>> This is how we're using the setting now - so the cluster can come
> >>>> up with just one monitor, with the other monitors to follow later.
> >>>>
> >>>> However, a Deis user reported that when the monitor in his initial
> >>>> members list went down, radosgw stopped functioning, even though
> >>>> there are three mons in his config file. I would think that the
> >>>> radosgw client would connect to any of the nodes in the config file
> >>>> to get the state of the cluster, and that the initial members list
> >>>> is only used when the monitors first come up and are trying to
> >>>> achieve quorum.
> >>>>
> >>>> The issue he filed is here: https://github.com/deis/deis/issues/2711
> >>>>
> >>>> He also found this Ceph issue filed:
> >>>> https://github.com/ceph/ceph/pull/1233
> >>>
> >>> Nope, this has nothing to do with it.
> >>>
> >>>>
> >>>> Is that what we're seeing here? Can anyone point us in the right
> >>>> direction?
> >>>
> >>> I didn't see the actual conf file posted anywhere to look at, but my
> >>> guess is simply that (since it looks like you're using generated conf
> >>> files which can differ across hosts) the one on the server(s) in
> >>> question doesn't have the monitors listed in it. I'm only skimming
> >>> the code, but from it and my recollection, when a Ceph client starts
> >>> up it will try to assemble a list of monitors to contact from:
> >>> 1) the contents of the "mon host" config entry
> >>> 2) the "mon addr" value in any of the "global", "mon" or "mon.X"
> >>>    sections
> >>>
> >>> The clients don't even look at mon_initial_members that I can see,
> >>> actually, so perhaps your client config only lists the initial
> >>> monitor, without adding the others?
> >>> -Greg
> >>
> >>
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
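
For anyone trying to reason about the lookup order discussed in this thread, here is a minimal Python sketch of it. It is only an illustration of the two sources described above ('mon host' in [global], then 'mon addr' in the [mon.X] sections) and is not Ceph's actual MonMap code. The function name candidate_monitors, the helper _norm and the trimmed SAMPLE_CONF are hypothetical, and the sketch assumes that spaces and underscores in option names are interchangeable. It ignores 'mon initial members' on purpose, since the thread says clients do not consult it, and it does not reproduce the failure reported in the tracker issue.

```python
import configparser

# Trimmed copy of the config quoted in this thread (monitor-related parts only).
SAMPLE_CONF = """
[global]
fsid = fc0e2e09-ade3-4ff6-b23e-f789775b2515
mon initial members = nodo-3

[mon.nodo-1]
host = nodo-1
mon addr = 192.168.2.200:6789

[mon.nodo-2]
host = nodo-2
mon addr = 192.168.2.201:6789

[mon.nodo-3]
host = nodo-3
mon addr = 192.168.2.202:6789
"""


def _norm(key):
    # Assumption: "mon host" and "mon_host" name the same option.
    return key.strip().lower().replace(" ", "_")


def candidate_monitors(conf_text):
    """Return the monitor addresses a client could try, per the thread.

    Hypothetical helper: 'mon host' in [global] wins if present,
    otherwise every 'mon addr' found in a [mon.X] section is collected.
    'mon initial members' is deliberately not consulted.
    """
    parser = configparser.ConfigParser(
        delimiters=("=",), interpolation=None, strict=False
    )
    parser.optionxform = _norm
    parser.read_string(conf_text)

    # Source 1: a comma-separated 'mon host' entry in [global].
    mon_host = parser.get("global", "mon_host", fallback=None)
    if mon_host:
        return [h.strip() for h in mon_host.split(",") if h.strip()]

    # Source 2: one 'mon addr' per [mon.X] section, in file order.
    addrs = []
    for section in parser.sections():
        if section.startswith("mon."):
            addr = parser.get(section, "mon_addr", fallback=None)
            if addr:
                addrs.append(addr.strip())
    return addrs


if __name__ == "__main__":
    print(candidate_monitors(SAMPLE_CONF))
```

With the config from this thread, the sketch falls through to the second source and collects all three 'mon addr' values, which is the behavior Joao and Greg expected from the client. Adding a 'mon host' line to [global] makes the first source take precedence.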