Thank you very much, Samuel. And sorry for writing from different e-mails.

2012/3/22 Samuel Just <sam.just@xxxxxxxxxxxxx>:
> First, an object is hashed into a pg. The pg is then moved between
> osds as osds are added and removed (or fail).
>
> Once enough other osds report an osd as failed, the monitors will mark
> the osd as down. During this time, pgs with that osd as a replica will
> continue to serve writes in a degraded state, writing to the remaining
> replicas. Some time later, that osd will be marked out, causing the pg
> to be remapped to different osds, at which point the degraded objects
> will re-replicate. The time between being marked down and being marked
> out is controlled by mon_osd_down_out_interval. Note that setting that
> option to 0 prevents the down osd from ever being marked out. I have
> created a bug to allow the user to force a down osd to be immediately
> marked out (#2198).
> -Sam Just
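For reference, the interval Sam describes is an ordinary ceph.conf option, and a down osd can also be taken out by hand with the ceph CLI. A minimal sketch, assuming a stock ceph.conf layout and a hypothetical failed osd with id 12:

    [mon]
        ; seconds to wait after an osd is marked down before marking it out
        mon osd down out interval = 300

    # or mark the down osd out immediately by hand:
    ceph osd out 12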
> On Wed, Mar 21, 2012 at 11:44 AM, Бородин Владимир <volk@xxxxxxxx> wrote:
>> Thank you, Samuel.
>>
>> Here is what I meant in the fourth question:
>> We have a pool "data", and in the crushmap there is a rule like this:
>> rule data {
>>         ruleset 1
>>         type replicated
>>         min_size 3
>>         max_size 3
>>         step take root
>>         step chooseleaf firstn 0 type rack
>>         step emit
>> }
>> The client tries to write an object to pool "data". Ceph selects a PG
>> to put the object in. The object is then written to the buffer of the
>> primary OSD. The primary OSD then tries to write copies to the buffers
>> of the two replica OSDs and fails (perhaps they are marked as down).
>> Does the client receive an ack? Or will ceph try to write the object
>> into another PG?
>> The primary OSD then writes the object from its buffer to disk. The
>> two replicas are still down. Does the client receive a commit?
>> Is there a way to always write 3 successful copies of an object and
>> only then return an ack to the client?
>>
>> 21.03.2012, 21:12, "Samuel Just" <sam.just@xxxxxxxxxxxxx>:
>>> 1. The standby mds does not maintain cached metadata in memory or
>>> serve reads. When taking over from a failed mds, it reads the
>>> primary's journal, which does warm up its cache somewhat. Optionally,
>>> you can put an mds into standby-replay for an active mds. In this
>>> case, the standby-replay mds will continually replay the primary's
>>> journal in order to take over more quickly in the case of a crash.
>>>
>>> 2. Primary-copy is the only strategy currently implemented.
>>>
>>> 3. On a sync, we wait for commits to ensure data safety.
>>>
>>> 4. I don't quite understand this question.
>>>
>>> 5. Currently, there is not a good way to accomplish this using only
>>> ceph. Replication is synchronous on writes, so you would be paying
>>> the latency cost between data centers on each write if you wanted to
>>> replicate between data centers. We don't currently support reading
>>> from the closest replica.
>>>
>>> On Wed, Mar 21, 2012 at 2:22 AM, Borodin Vladimir <v.a.borodin@xxxxxxxxx> wrote:
>>>> Hi all.
>>>>
>>>> I've read everything in ceph.newdream.net/docs and
>>>> ceph.newdream.net/wiki. I've also read some articles from
>>>> ceph.newdream.net/publications. But I haven't found answers to some
>>>> questions:
>>>> 1. There is one active MDS and one in standby mode. The active MDS
>>>> caches all metadata in RAM. Does the standby MDS copy this
>>>> information into RAM? Will it fetch metadata from the OSDs on every
>>>> request after a primary MDS failure? Do read requests go to the
>>>> standby MDS?
>>>> 2. Which replication strategy on the OSDs (primary-copy, chain or
>>>> splay) is turned on by default?
>>>> 3. When does the kernel client consider a write successful (when it
>>>> receives the ack or the commit from the primary OSD)?
>>>> 4. I want to have 3 copies of each object. By default it is possible
>>>> for only one copy to be written successfully (while the other two
>>>> fail), isn't it? Is there a way to turn this off (so that if one of
>>>> the three copies fails, the object is placed into another PG)?
>>>> 5. If I have several data centers with a good network connection
>>>> between them, what is the way to provide data locality? For example,
>>>> most write and read requests from Spain go to the Spanish DC and
>>>> most write and read requests from Russia go to the Russian DC. Is it
>>>> possible?
>>>>
>>>> Regards,
>>>> Vladimir.
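Regarding question 4: in releases that expose the min_size pool setting, a pool can be told to refuse I/O while fewer than the required number of replicas are up, which keeps acknowledged writes from landing on fewer copies than intended. A minimal sketch, assuming the pool is named "data" and the running release supports the min_size setting:

    # keep 3 copies of every object in the pool
    ceph osd pool set data size 3
    # make pgs refuse reads/writes unless all 3 replicas are available
    ceph osd pool set data min_size 3

Note that with min_size equal to size, any single osd failure makes the affected pgs block I/O until the osd returns or the data is re-replicated elsewhere, so availability is traded for the stricter write guarantee.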