Re: Braindump: path names, partition labels, FHS, auto-discovery

On Mon, Mar 19, 2012 at 02:21, Bernard Grymonpon <bernard@xxxxxxxxxxxx> wrote:
> As I've been constructing some cookbooks to setup a default cluster, this is
> what I bumped into:
>
> - the numbering (0, 1, ...) of the OSDs and their need to keep the same number
>  throughout the lifetime of the cluster is a bit of a hassle. Each OSD needs to
>  have a complete view of all the components of the cluster before it can
>  determine its own ID. A random, auto-generated UUID would be nice (I
>  currently solved this by assigning each cluster a global "clustername", and
>  search the chef server for all nodes, look for the highest indexed OSDs, and
>  increment this to determine the new OSD's index - there must be a better
>  way).

That's why you ask the monitors to assign you one:
https://github.com/ceph/ceph-cookbooks/blob/1d381a3e1dd767c4c8ab668878285b545a70846a/ceph/recipes/bootstrap_osd.rb#L47

As far as I know, chef does NOT provide the necessary atomicity to
reliably allocate unique ids.
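
For reference, a hand-run equivalent of what that recipe does is roughly
this (a sketch; which key you run it with depends on your auth setup):

    # Ask the monitors to allocate the next free OSD id.
    # The allocation happens on the mon side, so it is safe to run
    # concurrently from several nodes.
    OSD_ID=$(ceph osd create)
    echo "new osd id: ${OSD_ID}"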

> - the config file needs to be the same on all hosts - which is only partially
>  true. From my point of view, an OSD should only have some way of contacting
>  one mon, which would inform the OSD of the cluster layout. So, only the
>  mon-info should be there (together with the info for the OSD itself,
>  obviously)

We can't rely on a single mon; that would be a single point of failure.

The only thing the config really needs is the monitor locations. I
expect the rest of this to slowly go away, as we improve the defaults
in the code and the cookbook:

https://github.com/ceph/ceph-cookbooks/blob/1d381a3e1dd767c4c8ab668878285b545a70846a/ceph/templates/default/ceph.conf.erb
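
To sketch where that's heading: the config file really only needs enough
for a daemon to find the monitors. Something along these lines should be
close (names, addresses and the exact option spellings are illustrative
and vary a bit between versions):

    [global]
        auth supported = cephx

    ; one section per monitor, so any daemon can find the quorum
    [mon.a]
        mon addr = 192.168.0.10:6789
    [mon.b]
        mon addr = 192.168.0.11:6789
    [mon.c]
        mon addr = 192.168.0.12:6789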

> - there is a chicken-and-egg problem in the authentication of an OSD to the mon. An
>  OSD should have permission to join the mon, for which we need to add the OSD
>  to the mon. As chef works on the node, and can't trigger stuff on other
>  nodes, the node that will hold the OSD needs some way of authenticating
>  itself to the mon (I solved this by storing the "client.admin" secret on the
>  mon-node, and then pulling this from there on the osd node, and using it to
>  register myself to the mon. It is like putting a copy of your house key on
>  your front door...). I see no obvious solution here.

I use a "bootstrap-osd" key that can create new OSDs and authorize
keys for them. It's less powerful than client.admin.

https://github.com/ceph/ceph-cookbooks/blob/1d381a3e1dd767c4c8ab668878285b545a70846a/ceph/recipes/single_mon.rb#L49

https://github.com/ceph/ceph-cookbooks/blob/1d381a3e1dd767c4c8ab668878285b545a70846a/ceph/recipes/bootstrap_osd.rb#L20
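
The rough idea (the exact capability syntax has changed between Ceph
releases, and the paths below are illustrative, so take this as a sketch
rather than the recipes verbatim): the mon side creates a limited key
that can only do what's needed to bring up a new OSD, and that key is
what gets copied to OSD nodes instead of client.admin.

    # On the monitor: create a narrow-purpose bootstrap key.
    # (Newer releases spell the capability as a "profile"; older ones
    # list the allowed monitor commands explicitly.)
    ceph auth get-or-create client.bootstrap-osd \
        mon 'allow profile bootstrap-osd' \
        -o /var/lib/ceph/bootstrap-osd/ceph.keyring

    # On the OSD node, using only that key: ask for an id and register
    # the new OSD's own key, without ever seeing client.admin.
    ceph --name client.bootstrap-osd \
        --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring \
        osd create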

> - the current (debian) start/stop scripts are a hassle to work with, as chef
>  doesn't understand the third parameter (/etc/init.d/ceph start mon.0). Each
>  mon / osd / ... should have its own start/stop script.

The cookbook uses upstart jobs and runs one instance per osd id, etc.

https://github.com/ceph/ceph-cookbooks/blob/1d381a3e1dd767c4c8ab668878285b545a70846a/ceph/templates/default/upstart-ceph-osd.conf.erb
https://github.com/ceph/ceph-cookbooks/tree/1d381a3e1dd767c4c8ab668878285b545a70846a/ceph/templates/default
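
Stripped down, the osd job looks something like this (start conditions
and paths are illustrative; the real template is in the repo above):

    # /etc/init/ceph-osd.conf -- one upstart instance per OSD id
    description "ceph osd"

    instance $id
    respawn

    exec /usr/bin/ceph-osd -i $id -f

With that in place chef (or a human) can start and stop daemons
individually, e.g. "start ceph-osd id=12" / "stop ceph-osd id=12",
without touching a sysvinit script's third parameter.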

> - there should be some way to ask a local running OSD/MON for its status,
>  without having to go through the monitor-nodes. Sort of "ceph-local-daemon
>  --uuid=xxx --type=mon status", which would inform us if it is running,
>  healthy, part of the cluster, lost in space...

That'd be the "admin socket". It's unfortunately not well documented currently.

> - growing the cluster bit by bit would be ideal; this is how chef works (it
>  handles nodes one by one, not a bunch of nodes in one go)

The cookbook handles this, with some limitations that will be removed
once we have resources to work on it again.

> - ideally, there would be an automatic crushmap-expansion command which would
>  add a device to an existing crushmap (or remove one). Now, the crushmap needs
>  to be reconstructed completely, and if your numbering changes somehow, you're
>  screwed. Ideally it would be "take the current crushmap and add OSD with uuid
>  xxx" - "take the current crushmap and remove OSD xxx"

That exists already: individual devices can be added to or removed from
the live CRUSH map without rebuilding it from scratch.
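
Roughly (the exact argument order has shifted a bit across versions, and
the weight/location values here are made up):

    # Put osd.12 into the running crush map at a given weight/location.
    ceph osd crush add osd.12 1.0 root=default host=myhost

    # And take it out again when decommissioning.
    ceph osd crush remove osd.12

The monitors apply the change to the map for you, so the rest of the
numbering is untouched.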

> Just my thoughts! I've been following the ceph project for a while now, set up
> a couple of test clusters in the past and over the last two weeks, and made
> the cookbooks to make my life easier (and bumped into a lot of ops trouble
> doing this...).

To summarize:

Status of the cookbook at https://github.com/ceph/ceph-cookbooks is:

- it assumes you only run a single monitor
- it assumes you run 1 osd per node, as a subdirectory of /srv

Both of these restrictions will eventually be lifted; they were just
there to get us started.

Right now, I know we have one admin looking to lift the "1 osd per
node" limitation (he's OK doing mons manually), but other than that
I'm the only person who has put time into the cookbooks, and I'm
currently busy setting up our automated test infrastructure. We're
hiring Chef devopsy people; come help us!