On Fri, Feb 8, 2013 at 4:45 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> Hi Marcus-
>
> On Fri, 8 Feb 2013, Marcus Sorensen wrote:
>> I know people have been discussing on and off about providing a
>> "preferred OSD" for things like multi-datacenter setups, or, even
>> within a single datacenter, choosing an OSD that would avoid
>> traversing uplinks. Has there been any discussion on how to do this?
>> I seem to remember people saying things like 'the crush map doesn't
>> work that way at the moment'. Presumably, when a client needs to
>> access an object, it looks up where the object should be stored via
>> the crush map, which returns all OSDs that could be read from.
>
> Exactly.
>
>> I was thinking this morning that you could potentially leave the
>> crush map out of it, by setting a location for each OSD in ceph.conf
>> and an /etc/ceph/location file for the client, then using the
>> absolute value of the difference to determine the preferred OSD. So,
>> if OSD0 were at location=1, OSD1 at location=3, and client 1 at
>> location=2, the client would do the normal thing, but if client 1
>> were at location=1.3, it would prefer OSD0 for reads. Perhaps that's
>> overly simplistic and wouldn't scale to meet everyone's requirements,
>> but you could define multiple locations and sprinkle clients in
>> between them in various ways. Or perhaps the location is a matrix, so
>> you could literally map it out on a grid with a set of coordinates.
>> What ideas are being discussed around how to implement this?
>
> We can do something like this for reads today, where we pick a read
> replica based on the closest IP or some other metric/mask. We generally
> don't enable this because it leads to non-optimal cache behavior, but it
> could in principle be enabled via a config option for certain clusters
> (and in fact some of that code is already in place).

Just to be specific: there are currently flags that let the client read
from the local host if it can figure that out. They aren't heavily
tested, but they do work when we turn them on. Other metrics of
"closeness" aren't implemented yet, though.

In general, CRUSH locations seem like a good measure of closeness that
the client could rely on, rather than a separate "location" value,
although that becomes less useful if you've configured multiple CRUSH
root nodes. I think it would need to support a tree of some kind,
though, rather than just a linear value; a rough sketch of what I have
in mind is below.
-Greg
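
To make the tree idea concrete, here is a rough, untested sketch (plain
Python, with made-up names and data structures; none of this is actual
Ceph code) of a client-side chooser that scores each replica in the
acting set by how many CRUSH bucket levels it shares with the client,
instead of by a single linear distance:

```python
# Hypothetical sketch only: score read replicas by how much of their CRUSH
# path (root -> datacenter -> rack -> host) they share with the client.
# Names and structures below are invented for illustration.

def shared_levels(client_loc, osd_loc):
    """Count the CRUSH bucket levels two locations have in common,
    starting from the root of the hierarchy."""
    n = 0
    for a, b in zip(client_loc, osd_loc):
        if a != b:
            break
        n += 1
    return n

def pick_read_osd(client_loc, acting_set):
    """acting_set: list of (osd_id, crush_location) tuples, primary first.
    Prefer the replica whose CRUSH path overlaps most with the client's;
    ties fall back to the primary because max() keeps the first maximum."""
    return max(acting_set,
               key=lambda entry: shared_levels(client_loc, entry[1]))[0]

# Example: a client in dc1/rack2 prefers osd.0 over the primary (osd.1 in dc2).
client = ("default", "dc1", "rack2")
acting = [
    (1, ("default", "dc2", "rack5", "host-a")),  # primary
    (0, ("default", "dc1", "rack2", "host-b")),
    (7, ("default", "dc1", "rack3", "host-c")),
]
print(pick_read_osd(client, acting))  # -> 0
```

The linear scheme would just be the degenerate one-level case of this,
and multiple CRUSH roots simply show up as replicas that share zero
levels with the client, which matches the caveat above.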