I've reworked the monitor bootstrapping. It's still a little rough around the edges in terms of feeding in initial cluster state, but all the monitor refactoring is done, so it should be mainly cleanup from here.

The basic bootstrap/mkfs process looks something like this:

 $ ceph-authtool /etc/ceph/keyring --create-keyring --gen-key -n client.admin
 $ ceph-authtool /etc/ceph/keyring --gen-key -n mon.

and then either

 $ monmaptool /tmp/monmap --create --clobber --add host1 1.2.3.4 --add host2 1.2.3.5 [...]

and on each host

 $ ceph-mon -i `hostname` --mkfs --monmap /tmp/monmap

or define monitors, mon addrs, and an fsid (`uuidgen`) in ceph.conf and on each host

 $ ceph-mon -i `hostname` --mkfs

One way or another, --mkfs is building an initial "seed" monmap that has an fsid and a list of initial monitor addresses. If you explicitly pass in a monmap (generated by monmaptool --create ...) that's pretty clear. Alternatively, it will make an initial map based on the --mon-host a,b,c list of addresses or on what it finds in ceph.conf. (This is the same bootstrapping that takes place when a random daemon or tool starts up and needs to contact a monitor to authenticate.) The fsid is required, but can come from the generated monmap, the command line (--fsid $uuid), or an 'fsid' option in ceph.conf.

There is likely some tweaking we can do here, particularly with the manual address specification step (TV is working on this), but the basic requirement is that we have (1) a unique fsid, (2) a list of initial monitor addresses, and (3) a keyring with the mon. and client.admin secret keys. Without those the new monitors don't know who to talk to to form the new cluster and initialize themselves.

Thereafter, you can add monitors to the cluster the exact same way. As long as the fsid matches, the secret key is valid, and one of the monitors in the seed monmap is alive and well, the new monitor will sync itself and then add itself to the cluster (by adding itself to the cluster's master monmap). For example, after adding a new [mon.`hostname`] section to your ceph.conf with 'mon addr' defined,

 $ ceph auth get mon. -o /tmp/monkey
 $ fsid=`ceph fsid --concise`
 $ ceph-mon -i `hostname` --mkfs -k /tmp/monkey --fsid $fsid
 $ ceph-mon -i `hostname`

will add a new monitor to the cluster. Here, the new monitor gets its peers from ceph.conf and the mon. key and fsid explicitly. You could also pass a recent copy of the monmap instead of relying on ceph.conf (if, say, the local ceph.conf doesn't list all monitors).
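For that variant, something like the following should work (just a sketch; the /tmp paths are arbitrary, and since the monmap carries the fsid there's no need for --fsid):

 $ ceph mon getmap -o /tmp/monmap     # current monmap from the live cluster
 $ ceph auth get mon. -o /tmp/monkey  # mon. secret, as above
 $ ceph-mon -i `hostname` --mkfs --monmap /tmp/monmap -k /tmp/monkey
 $ ceph-mon -i `hostname`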
The vstart.sh script has been switched to use the new process. Mainly this means that the initial osdmap isn't generated beforehand. Instead, when each osd is added, we do something like

 $ n=`ceph osd create --concise`
 $ ceph osd crush add $n osd.$n 1.0 host=localhost rack=localrack pool=default
 $ ceph-osd -i $n --mkfs --mkkey
 $ ceph auth add osd.$n osd "allow *" mon "allow rwx" -i dev/osd$n/keyring
 $ ceph-osd -i $n

which allocates an osd id, adds it to the crush map, initializes the osd data dir and creates a random secret, adds that secret to the monitor auth database, and then starts the osd.

One other piece here: currently, when a tool or daemon starts up, we build our initial monmap (the list of monitors to try to contact) in this order of preference:

1- Was --monmap <fn> specified?  (Normally it's not.)
2- Was --mon-host <list> specified?  If so, resolve DNS names and use that.  Fill in the fsid if provided (in ceph.conf or on the command line; normally it's not).
3- Look at the 'mon addr' values in the mon.* sections in my ceph.conf to build a list.  Fill in the fsid if provided.

The current normal practice is #3, with a ceph.conf on every node that has [mon.NNN] sections and mon addr values. Instead, you can do #2, which means you have something like

 [global]
         mon host = one.foo.com two.foo.com three.foo.com

One nice thing is that the client will try these at random until it connects and authenticates. Once that happens, it gets the real current monmap, which may include hosts not listed here. That means things like adding new monitors don't strictly require that you update ceph.conf all over the place (although that's presumably a good thing to do at some point).

That's where we are currently in the master branch. For those of you working on the Chef and Juju stuff, if you have feedback on whether there are still pain points, now's the time to share! :)

sage