> On 13 Feb 2013 18:16, "Gregory Farnum" <greg@xxxxxxxxxxx> wrote:
> >
> > On Wed, Feb 13, 2013 at 3:40 AM, Ben Rowland <ben.rowland@xxxxxxxxx> wrote:
> So it sounds from the rest of your post like you'd want to, for each
> pool that RGW uses (it's not just .rgw), run "ceph osd set .rgw
> min_size 2". (and for .rgw.buckets, etc etc)

Thanks, that did the trick. When the number of up OSDs is less than
min_size, writes block for 30 seconds and then return HTTP 500. Ceph
honours my CRUSH rule in this case: adding more OSDs to only one of the
two failure domains still leaves writes blocked - all well and good!

> > If this is the expected behaviour of Ceph, then it seems to prefer
> > write-availability over read-availability (in this case my data is
> > only stored on 1 OSD, thus a SPOF). Is there any way to change this
> > trade-off, e.g. as you can in Cassandra with its write quorums?
>
> I'm not quite sure this is describing it correctly — Ceph guarantees
> that anything that's been written to disk will be readable later on,
> and placement groups won't go active if they can't retrieve all data.
> The sort of flexible policies allowed by Cassandra aren't possible
> within Ceph — it is a strictly consistent system.

Are objects always readable even if a PG is missing some OSDs and
cannot recover? Example: two hosts, each with one OSD, a pool with
min_size 2, and a CRUSH rule that places a replica on both hosts. I
write a file successfully, then one host goes down and is eventually
marked 'out'. Is the file still readable on the 'up' host (say, if I'm
running RGW there)? What if the 'up' host does not hold the primary
copy?

Furthermore, if Ceph is strictly consistent, how would it prevent stale
reads? Say that in the two-host example the network link between the
hosts died, but min_size was set to 1. Would it be possible for writes
to proceed, say making edits to an existing object? Could readers at
the other host see stale data?

Thanks again in advance,

Ben
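
For reference, the full form of the command Greg mentions is "ceph osd
pool set <pool> min_size <n>". A minimal sketch, assuming the default
RGW pool names of that era (the pool list below is illustrative only;
"rados lspools" shows what your cluster actually has):

  # Set min_size=2 on every pool RGW writes to (pool names are the
  # usual defaults and only an illustration).
  for pool in .rgw .rgw.control .rgw.gc .rgw.buckets \
              .users .users.email .users.swift .users.uid .log; do
      ceph osd pool set $pool min_size 2
  done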
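
And a sketch of the kind of CRUSH rule described in the two-host
example (one replica on each host). The rule name, ruleset number and
pool name are made up for illustration, and the usual
decompile/edit/recompile cycle with crushtool applies:

  # Rule fragment for the CRUSH map text file: place each replica on a
  # different host. Note that min_size/max_size here are the rule's
  # applicability bounds, not the pool's min_size setting.
  rule two_hosts {
          ruleset 1
          type replicated
          min_size 1
          max_size 2
          step take default
          step chooseleaf firstn 0 type host
          step emit
  }

  # Compile the edited map, inject it, and point a pool at the rule.
  crushtool -c crushmap.txt -o crushmap.bin
  ceph osd setcrushmap -i crushmap.bin
  ceph osd pool set .rgw.buckets crush_ruleset 1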