On Thu, 12 Apr 2012, Bernard Grymonpon wrote:
>
> On 09 Apr 2012, at 21:22, Tommi Virtanen wrote:
>
> > On Mon, Apr 9, 2012 at 11:16, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >> One thing we need to keep in mind here is that the individual disks are
> >> placed in the CRUSH hierarchy based on the host/rack/etc location in the
> >> datacenter. Moving disks around arbitrarily will break the placement
> >> constraints if that position isn't also changed.
> >
> > Yeah, the location will have to be updated. I tend to think disks
> > *will* move, and it's better to cope with that than to assume it won't
> > happen. All it takes is a simple power supply/mobo/raid
> > controller/nic/etc failure; if there are any free slots anywhere, it's
> > probably better to plug the disks in there than to wait for a
> > replacement part. I'm working under the assumption that it's better to
> > "just bring them up" than to have an extended osd outage or to claim
> > the osd as lost.
>
> I've updated my recipes to support disk moving now (and multi-mon
> clusters, btw), and have moved from
>
> /var/lib/ceph/osd/$clustername-$id
>
> to
>
> /var/lib/ceph/osd/$clustername-$uuid
>
> It just isn't pretty to mount a disk in a temp place, check the "whoami"
> file, and then umount and remount everything on a certain ID. It is all
> handled automatically now, and I think this feels okay.
>
> The disks are detected by their label, which I made "$cluster.ceph". If
> such a label is detected, the disk is mounted, the whoami file is read,
> and the OSD is started with the correct parameters. If the whoami file
> is not present, the OSD is initialized and added to the mons...
>
> Input would be much appreciated - both using chef (one node at a time)
> and "building" a cluster incrementally instead of initializing a full
> cluster at once make the setup a bit strange sometimes (I don't know how
> the number of PGs is determined, or how it can be suggested, when
> creating a cluster).

The pg_num will eventually be adjustable manually, or will auto-scale to
the size of the cluster or pool. That's a couple of versions out still,
but coming up soon.

> > Updating the new location for the osd could be something we do even at
> > every osd start -- it's a no-op if the location is the same as the old
> > one. And we can say the host knows where it is, and that information
> > is available in /etc or /var/lib/ceph.
>
> I also got to the point where I want to update the location of an OSD
> when bringing an OSD online.
>
> Adding a new (bare) disk (and OSD) is easy:
>
> ceph osd crush add 3 osd.3 1 pool=default host=2 rack=1
>
> (with host=2 and rack=1 coming from the node itself, somehow - it would
> be easy if we could use alphanumeric hostnames in those parameters...)
>
> If there were a
>
> ceph osd crush update 3 osd.3 pool=default host=3 rack=2
>
> command, that would solve the whole location problem.

Added this to the tracker, #2268. Thanks!

sage

>
> Rgds,
> Bernard
>
> > I'll come back to this once it's a little bit more concrete; I'd
> > rather not make speculative changes until I can actually trigger the
> > behavior in a test bench.
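
To make the label-detection flow above concrete, here is a minimal shell
sketch of what a recipe could run at boot, assuming the default cluster
name "ceph" (so the label is "ceph.ceph"), the /var/lib/ceph/osd/$cluster-$uuid
mount layout Bernard describes, and that ceph-osd is started with
--cluster/-i; the exact invocation and helper commands are assumptions,
not Bernard's actual chef code:

    cluster=ceph
    # find every device labelled "$cluster.ceph" and mount it under its fs UUID
    for dev in $(blkid -t LABEL="${cluster}.ceph" -o device); do
        uuid=$(blkid -s UUID -o value "$dev")
        mnt=/var/lib/ceph/osd/${cluster}-${uuid}
        mkdir -p "$mnt"
        mountpoint -q "$mnt" || mount "$dev" "$mnt"
        if [ -f "$mnt/whoami" ]; then
            # existing OSD: start it under the id recorded on the disk itself
            ceph-osd --cluster "$cluster" -i "$(cat "$mnt/whoami")"
        else
            # bare disk: it would be initialized and added to the mons here
            echo "$dev has no whoami yet; initialize it as a new OSD" >&2
        fi
    done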
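
Tommi's "update the location at every osd start" idea combines naturally
with the proposed update command: the node could keep its own location in
a small file (the path /etc/ceph/crush_location is hypothetical) and
re-assert it whenever an OSD comes up. Note that "ceph osd crush update"
is only the command proposed in this thread (tracker #2268); it does not
exist yet:

    loc=$(cat /etc/ceph/crush_location)          # e.g. "pool=default host=3 rack=2"
    id=$(cat "$mnt/whoami")                      # reusing $mnt from the sketch above
    ceph osd crush update "$id" "osd.$id" $loc   # proposed command; no-op if unchanged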
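
As for the manual pg_num adjustment mentioned above, it has not landed
yet, so the command shape below is purely an assumption about what a
per-pool knob might look like once it does:

    ceph osd pool set data pg_num 256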