Thank you very much, Samuel. And sorry for writing from different e-mails.

2012/3/22 Samuel Just <sam.just@xxxxxxxxxxxxx>:
> First, an object is hashed into a pg. The pg is then moved between
> osds as osds are added and removed (or fail).
>
> Once enough other osds report an osd as failed, the monitors will mark
> the osd as down. During this time, pgs with that osd as a replica will
> continue to serve writes in a degraded state, writing to the remaining
> replicas. Some time later, that osd will be marked out, causing the pg
> to be remapped to different osds, at which point the degraded objects
> will re-replicate. The time between being marked down and being marked
> out is controlled by mon_osd_down_out_interval. Note that setting that
> option to 0 prevents the down osd from ever being marked out. I have
> created a bug to allow the user to force a down osd to be immediately
> marked out (#2198).
> -Sam Just
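For reference, the interval Sam describes is an ordinary ceph.conf option, and a down osd can also be taken out by hand with the ceph CLI. A minimal sketch, assuming a stock ceph.conf layout and a hypothetical failed osd with id 12:

    [mon]
        ; seconds to wait after an osd is marked down before marking it out
        mon osd down out interval = 300

    # or mark the down osd out immediately by hand:
    ceph osd out 12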
> On Wed, Mar 21, 2012 at 11:44 AM, Бородин Владимир <volk@xxxxxxxx> wrote:
>> Thank you, Samuel.
>>
>> Here is what I meant in the fourth question:
>> We have a pool "data", and in the crushmap there is a rule like this:
>> rule data {
>>         ruleset 1
>>         type replicated
>>         min_size 3
>>         max_size 3
>>         step take root
>>         step chooseleaf firstn 0 type rack
>>         step emit
>> }
>> The client tries to write an object to pool "data". Ceph selects a PG
>> to put the object in. The object is then written to the buffer of the
>> primary OSD. The primary OSD then tries to write copies to the buffers
>> of the two replica OSDs and fails (perhaps they are marked as down).
>> Does the client receive an ack? Or will ceph try to write the object
>> into another PG?
>> The primary OSD then writes the object from its buffer to disk. The
>> two replicas are still down. Does the client receive a commit?
>> Is there a way to always write 3 successful copies of an object and
>> only then return an ack to the client?
>>
>> 21.03.2012, 21:12, "Samuel Just" <sam.just@xxxxxxxxxxxxx>:
>>> 1. The standby mds does not maintain cached metadata in memory or
>>> serve reads. When taking over from a failed mds, it reads the
>>> primary's journal, which does warm up its cache somewhat. Optionally,
>>> you can put an mds into standby-replay for an active mds. In this
>>> case, the standby-replay mds will continually replay the primary's
>>> journal in order to take over more quickly in the case of a crash.
>>>
>>> 2. Primary-copy is the only strategy currently implemented.
>>>
>>> 3. On a sync, we wait for commits to ensure data safety.
>>>
>>> 4. I don't quite understand this question.
>>>
>>> 5. Currently, there is not a good way to accomplish this using only
>>> ceph. Replication is synchronous on writes, so you would be paying
>>> the latency cost between data centers on each write if you wanted to
>>> replicate between data centers. We don't currently support reading
>>> from the closest replica.
>>>
>>> On Wed, Mar 21, 2012 at 2:22 AM, Borodin Vladimir <v.a.borodin@xxxxxxxxx> wrote:
>>>> Hi all.
>>>>
>>>> I've read everything in ceph.newdream.net/docs and
>>>> ceph.newdream.net/wiki. I've also read some articles from
>>>> ceph.newdream.net/publications. But I haven't found answers to some
>>>> questions:
>>>> 1. There is one active MDS and one in standby mode. The active MDS
>>>> caches all metadata in RAM. Does the standby MDS copy this
>>>> information into RAM? Will it fetch metadata from the OSDs on every
>>>> request after a primary MDS failure? Do read requests go to the
>>>> standby MDS?
>>>> 2. Which replication strategy on the OSDs (primary-copy, chain or
>>>> splay) is turned on by default?
>>>> 3. When does the kernel client consider a write successful (when it
>>>> receives the ack or the commit from the primary OSD)?
>>>> 4. I want to have 3 copies of each object. By default it is possible
>>>> for only one copy to be written successfully (while the other two
>>>> fail), isn't it? Is there a way to turn this off (so that if one of
>>>> the three copies fails, the object is placed into another PG)?
>>>> 5. If I have several data centers with a good network connection
>>>> between them, what is the way to provide data locality? For example,
>>>> most write and read requests from Spain go to the Spanish DC and
>>>> most write and read requests from Russia go to the Russian DC. Is it
>>>> possible?
>>>>
>>>> Regards,
>>>> Vladimir.
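Regarding question 4: in releases that expose the min_size pool setting, a pool can be told to refuse I/O while fewer than the required number of replicas are up, which keeps acknowledged writes from landing on fewer copies than intended. A minimal sketch, assuming the pool is named "data" and the running release supports the min_size setting:

    # keep 3 copies of every object in the pool
    ceph osd pool set data size 3
    # make pgs refuse reads/writes unless all 3 replicas are available
    ceph osd pool set data min_size 3

Note that with min_size equal to size, any single osd failure makes the affected pgs block I/O until the osd returns or the data is re-replicated elsewhere, so availability is traded for the stricter write guarantee.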