Re: Location of MONs

Gregory Farnum <greg@xxxxxxxxxxx> · Tue, 23 Jul 2013 09:16:15 -0700

On Tue, Jul 23, 2013 at 9:12 AM, Matthew Walster <matthew@xxxxxxxxxxx> wrote:
> On 23 July 2013 17:07, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>
>> If you have three OSDs that are
>> separated by 5ms each and all hosting a PG, then your lower-bound
>> latency for a write op is 10ms: 5ms to send from the primary to the
>> replicas, 5ms for them to ack back.
>
>
> Without wanting to sound daft for having missed a salient configuration
> detail: is there no way to release when it's written to the primary?

Definitely not. Ceph's consistency guarantees and recovery mechanisms
are all built on top of all the replicas having a consistent copy and
that breaks if you do primary-only acks. Maybe in the future something
like this will happen, but it's all very blue-sky right now.
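
To make the arithmetic from the quoted example concrete, here is a rough
sketch of the lower bound, assuming the primary sends to all replicas in
parallel and has to wait for every ack. The function name and the list of
one-way latencies are made up for illustration, and it ignores the
client-to-primary hop and any disk time.

```python
def write_latency_lower_bound_ms(one_way_ms_to_replicas):
    """Lower bound on write latency when the primary waits for every replica.

    one_way_ms_to_replicas: hypothetical one-way network latency (ms) from
    the primary to each replica. Replication is assumed to go out in
    parallel, so the slowest round trip sets the bound.
    """
    if not one_way_ms_to_replicas:
        # No replicas to wait for, so no replication round trip.
        return 0.0
    return max(2.0 * d for d in one_way_ms_to_replicas)


# Three OSDs hosting a PG: one primary, two replicas, 5ms away each.
bound = write_latency_lower_bound_ms([5.0, 5.0])
print(f"lower bound: {bound} ms")
```

Moving one replica closer doesn't help under this model; the farthest
replica still sets the floor.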

> Likewise, there's no way of influencing a client to write to a particular
> structure in the CRUSH map in preference? i.e. influence the write so it
> tries to read/write from local where possible/available? Essentially I'm
> asking: if I have a structure of DC, rack, server, disk, can I say "this
> client is part of this DC, operate here first" and let the OSDs deal with
> the replication?

You can do things like say "this data is always accessed from this
location" and set up your pools and crush rules to associate the data
with a location; you cannot write to arbitrary replicas. There is some
limited work around doing things like "read from local host if it's
there" (which exists now) and "read from the closest CRUSH item"
(which I think exists in a branch somewhere, but I'm not sure), but
it's got some consistency issues right now (possible to get stale
data).
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com