On Mon, Feb 3, 2014 at 10:43 AM, Craig Lewis <clewis@xxxxxxxxxxxxxxxxxx> wrote:
> I've been noticing something strange with my RGW federation. I added some
> statistics to radosgw-agent to try and get some insight
> (https://github.com/ceph/radosgw-agent/pull/7), but that just showed me that
> I don't understand how replication works.
>
> When PUT traffic was relatively slow to the master zone, replication had no
> issues keeping up. Now I'm trying to cause replication to fall behind, by
> deliberately exceeding the amount of bandwidth between the two zones
> (they're in different datacenters). Instead of falling behind, both the
> radosgw-agent logs and the stats I added say that the slave zone is keeping up.
>
> But some of the numbers don't add up. I'm not using enough bandwidth
> between the two facilities, and I'm not using enough disk space in the slave
> zone. The disk usage in the slave zone continues to fall further and
> further behind the master. Despite this, I'm always able to download
> objects from both zones.
>
> How does radosgw-agent actually replicate metadata and data? Does
> radosgw-agent actually copy all the bytes, or does it create placeholders in
> the slave zone? If radosgw-agent is creating placeholders in the slave
> zone, and radosgw populates the placeholder in the background, then that
> would explain the behavior I'm seeing. If this is true, how can I tell if
> replication is keeping up?

Are you overwriting the same objects? Replication copies over the "present"
version of an object, not all the versions which have ever existed.
Similarly, the slave zone doesn't keep all the (garbage-collected) logs that
the master zone has to, so either factor could explain the differing disk
usage.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
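The overwrite explanation can be made concrete with a toy model. This is a hypothetical sketch, not radosgw-agent's code: it assumes the agent periodically copies only the current ("present") version of each object that changed since its last pass, and the names `ToyZones`, `put`, and `sync` are invented for illustration. It does not model the garbage-collected logs. Under that assumption, repeated PUTs to one key cost the master far more upload bandwidth than the slave ever receives, yet both zones end up holding the same present version.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyZones:
    """Hypothetical toy model of master->slave sync; not radosgw-agent's implementation."""
    master: dict[str, int] = field(default_factory=dict)  # key -> size of present version
    slave: dict[str, int] = field(default_factory=dict)   # key -> size of replicated version
    dirty: set[str] = field(default_factory=set)          # keys changed since the last sync pass
    bytes_put: int = 0          # bytes clients uploaded to the master zone
    bytes_replicated: int = 0   # bytes copied from master to slave

    def put(self, key: str, size: int) -> None:
        # An overwrite replaces the present version; older versions are not kept.
        self.master[key] = size
        self.dirty.add(key)
        self.bytes_put += size

    def sync(self) -> None:
        # Assumption: each pass copies only the present version of each changed key,
        # so versions overwritten between passes are never transferred.
        for key in sorted(self.dirty):
            self.slave[key] = self.master[key]
            self.bytes_replicated += self.master[key]
        self.dirty.clear()


MiB = 1024 * 1024
zones = ToyZones()
for i in range(100):
    zones.put("photo.jpg", 4 * MiB)  # hypothetical object, overwritten 100 times
    if (i + 1) % 10 == 0:            # hypothetical: one sync pass every 10 PUTs
        zones.sync()

print(zones.bytes_put // MiB, zones.bytes_replicated // MiB)  # 400 40
```

In this model the slave is fully caught up after the last pass (`zones.slave == zones.master`), even though only 40 MiB crossed the link for 400 MiB of PUTs, which would look like "keeping up" with less bandwidth than expected.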