On Wed, Jan 18, 2012 at 13:57, Andrey Stepachev <octo47@xxxxxxxxx> wrote:
> Thanks for the clarification. I haven't dug into how data is moved when
> an osd is marked as out, and how the pg for that data is calculated
> (in other words, how the new place for the missing replica is calculated,
> and what happens when the out osd returns).

That would be the CRUSH algorithm just getting a different cluster state
as input, and thus deciding on a new location for (some of) the data. At
that point, any data that needs to be migrated would be migrated.

If the same osd id returns (on new hardware or old), once it's marked
"in", the CRUSH algorithm will again start placing data on it. Whether
that's the same set of data or not depends on whether anything else
changed in the cluster in the meanwhile. (There's a toy sketch of this
idea at the end of this mail.)

Currently, CRUSH is best described in the academic papers about Ceph.
There's a quick braindump/simplified explanation at
http://ceph.newdream.net/docs/latest/dev/placement-group/

> We aren't trying to use Ceph over a true WAN. But we are trying to find
> a dfs that will operate well in our environment: multiple dcs,
> relatively low latency (<10ms on average), irregular dc outages. And we
> need synchronous replication and we need bigtable (hbase in the case of
> ceph). I believe that ceph can be configured to operate in such an
> environment, but I could be completely wrong, so I'm trying to check
> some boundary conditions.

As long as you understand that a network blip translates to a storage
blip, with Ceph. That is, we don't just write to the master and hope
that replication catches up at some later time. (Also sketched, equally
loosely, at the end of this mail.)

> Yeah, it would be good to see some progress (like mdadm shows), but it
> is not critical. Can you point me to who does this job? The MDS?

This is answered in
http://ceph.newdream.net/docs/latest/dev/delayed-delete/
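
To make the CRUSH point above concrete, here's a toy placement sketch in
Python. It is not the real CRUSH algorithm (no hierarchy, no weights, no
failure domains; just rendezvous hashing), and every name in it is made up
for illustration. It only shows that placement is a pure function of the
object and the current set of "in" osds, so marking an osd out moves data,
and returning to the identical cluster state gives the identical placement.

# Toy sketch, NOT the real CRUSH algorithm: rendezvous-style hashing that
# maps an object to N osds purely as a function of the object name and the
# set of "in" osds.
import hashlib

def _score(obj, osd):
    # Deterministic pseudo-random weight for the (object, osd) pair.
    return int(hashlib.sha1(("%s/%s" % (obj, osd)).encode()).hexdigest(), 16)

def place(obj, in_osds, num_replicas=2):
    # Pick the num_replicas osds with the highest score for this object.
    return sorted(in_osds, key=lambda o: _score(obj, o), reverse=True)[:num_replicas]

osds = ["osd.0", "osd.1", "osd.2", "osd.3"]

before = place("myobject", osds)
# Mark osd.2 "out": the placement function just sees a smaller cluster and
# deterministically picks a new home for whatever lived on osd.2.
during = place("myobject", [o for o in osds if o != "osd.2"])
# Mark it "in" again: same input state as before, so same placement, unless
# something else changed in the cluster in the meanwhile.
after = place("myobject", osds)

print(before, during, after)
assert before == after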
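
And a similarly hypothetical sketch of the "network blip == storage blip"
point (this is not Ceph's actual replication protocol, just the general
shape of synchronous replication): the write only succeeds once every
replica has applied it, so an unreachable replica stalls the write instead
of being caught up later.

class ReplicaUnavailable(Exception):
    pass

class Replica:
    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.data = {}

    def apply(self, key, value):
        if not self.reachable:
            # In real life this would be a timeout/retry; here we fail fast
            # to show that the write cannot complete.
            raise ReplicaUnavailable(self.name)
        self.data[key] = value

def replicated_write(replicas, key, value):
    # Synchronous: the caller sees success only after *all* replicas have
    # applied the write; nothing is deferred to "catch up later".
    for r in replicas:
        r.apply(key, value)

replicas = [Replica("a"), Replica("b"), Replica("c", reachable=False)]
try:
    replicated_write(replicas, "obj1", b"payload")
except ReplicaUnavailable as who:
    print("write blocked: replica %s unreachable" % who)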