Re: Consistency problem with multiple rgws

On 12/16/2016 02:36 AM, sw zhang wrote:
Hi,
I tested it again today with one RGW in each zone and the config
'rgw_num_rados_handles=2'. I used cosbench to upload 50,000 objects,
each 4 MB, with 10 workers.
After the data sync finished (I checked with 'radosgw-admin bucket
sync status --bucket=<name>' and 'radosgw-admin sync status'), the
bucket stats were as follows:

Master zone:
[root@ceph36 ~]# radosgw-admin bucket stats --bucket=shard23
{
     "bucket": "shard23",
     "pool": "master.rgw.buckets.data",
     "index_pool": "master.rgw.buckets.index",
     "id": "cc3594b6-6282-421a-a3d5-3f7f3fa7efd0.702243.1",
     "marker": "cc3594b6-6282-421a-a3d5-3f7f3fa7efd0.702243.1",
     "owner": "zsw-test",
     "ver": "0#50039,1#49964",
     "master_ver": "0#0,1#0",
     "mtime": "2016-12-16 10:58:56.174049",
     "max_marker": "0#00000050038.56144.3,1#00000049963.56109.3",
     "usage": {
         "rgw.main": {
             "size_kb": 195300782,
             "size_kb_actual": 195388276,
             "num_objects": 50000
         }
     },
     "bucket_quota": {
         "enabled": false,
         "max_size_kb": -1,
         "max_objects": -1
     }
}

Slave zone:
[root@ceph05 ~]# radosgw-admin bucket stats --bucket=shard23
{
     "bucket": "shard23",
     "pool": "slave.rgw.buckets.data",
     "index_pool": "slave.rgw.buckets.index",
     "id": "cc3594b6-6282-421a-a3d5-3f7f3fa7efd0.702243.1",
     "marker": "cc3594b6-6282-421a-a3d5-3f7f3fa7efd0.702243.1",
     "owner": "zsw-test",
     "ver": "0#51172,1#51070",
     "master_ver": "0#0,1#0",
     "mtime": "2016-12-16 10:58:56.174049",
     "max_marker": "0#00000051171.112193.3,1#00000051069.79607.3",
     "usage": {
         "rgw.main": {
             "size_kb": 194769532,
             "size_kb_actual": 194856788,
             "num_objects": 49861
         }
     },
     "bucket_quota": {
         "enabled": false,
         "max_size_kb": -1,
         "max_objects": -1
     }
}

We can see that in the slave zone the object count in the bucket stats
is lower than in the master zone. But if I list the bucket in the
slave zone with s3cmd, the result is correct:
[root@ceph05 ~]# s3cmd ls s3://shard23 | wc -l
50000

After listing the bucket with s3cmd, I run bucket stats in the slave
zone again:
[root@ceph05 ~]# radosgw-admin bucket stats --bucket=shard23
{
     "bucket": "shard23",
     "pool": "slave.rgw.buckets.data",
     "index_pool": "slave.rgw.buckets.index",
     "id": "cc3594b6-6282-421a-a3d5-3f7f3fa7efd0.702243.1",
     "marker": "cc3594b6-6282-421a-a3d5-3f7f3fa7efd0.702243.1",
     "owner": "zsw-test",
     "ver": "0#51182,1#51079",
     "master_ver": "0#0,1#0",
     "mtime": "2016-12-16 10:58:56.174049",
     "max_marker": "0#00000051181.112203.9,1#00000051078.79616.9",
     "usage": {
         "rgw.main": {
             "size_kb": 194769532,
             "size_kb_actual": 194856788,
             "num_objects": 50000
         }
     },
     "bucket_quota": {
         "enabled": false,
         "max_size_kb": -1,
         "max_objects": -1
     }
}

We can see that num_objects is now correct. (According to the code,
listing a bucket sends a 'dir_suggest_changes' request to the OSD; I
think that is why the number gets fixed.)
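
As a rough illustration of that repair path (a made-up model only, not
the actual cls_rgw 'dir_suggest' code, and all names below are
invented): a listing that recounts the index entries and notices the
cached header counter disagrees can suggest a correction that brings
num_objects back in line, which matches the behaviour observed above.

#include <iostream>
#include <map>
#include <string>

// One row per object in the (simplified) bucket index shard.
struct IndexEntry { bool exists = true; };

struct BucketIndexShard {
  std::map<std::string, IndexEntry> entries;  // keyed by object name
  long header_num_objects = 0;                // cached stat, may drift
};

// Analogue of a listing that notices the cached header disagrees with
// the entries it just walked and applies a suggested correction.
void list_and_suggest(BucketIndexShard& shard) {
  long actual = 0;
  for (const auto& kv : shard.entries)
    if (kv.second.exists) ++actual;
  if (actual != shard.header_num_objects) {
    std::cout << "suggesting fix: header=" << shard.header_num_objects
              << " actual=" << actual << "\n";
    shard.header_num_objects = actual;        // repair the drifted stat
  }
}

int main() {
  BucketIndexShard shard;
  for (int i = 0; i < 100; ++i)
    shard.entries["obj" + std::to_string(i)] = IndexEntry{};
  shard.header_num_objects = 97;              // simulate the drift
  list_and_suggest(shard);                    // listing repairs it
  std::cout << "num_objects=" << shard.header_num_objects << "\n";
}
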
If each zone has two RGWs with 'rgw_num_rados_handles=1', the
difference between the bucket stats is smaller, around 10 to 40
objects. If each zone has one RGW with 'rgw_num_rados_handles=1', the
bucket stats match. My colleague and I have tested this multiple times
on two different clusters (Ceph version jewel), and the problem occurs
nearly every time.

Thanks for the extra info; I'll look into this. Could you please open a ticket at http://tracker.ceph.com/projects/rgw/issues/new and include this output?

2016-12-16 0:05 GMT+08:00 Casey Bodley <cbodley@xxxxxxxxxx>:
Hi,

On 12/15/2016 02:55 AM, 18896724396 wrote:

Hi,
We have two RGWs in the master zone and two RGWs in the slave zone. We
used cosbench to upload 50,000 objects to a single bucket. After the
data sync finished, the bucket stats were not the same between the
master and slave zones.

The data sync may take a while with that many objects. How are you verifying
that data sync finished? Have you tried 'radosgw-admin bucket sync status
--bucket=<name>'?

Then we tested the same case with one RGW in the master zone and one
in the slave zone, and the stats still did not match. Finally we
tested with one RGW and set rgw_num_rados_handles to 1 (we had set it
to 2 before); this time the stats matched and were correct, though
multiple RGWs still show the problem.
According to the code, when we update the bucket index, RGW calls
cls_rgw_bucket_complete_op to update the bucket stats, and the OSD
ends up calling rgw_bucket_complete_op. In that function the OSD first
reads the bucket header, then updates the stats, and finally writes
the header back. So I think two concurrent requests updating the stats
may lead to the consistency problem, and maybe some other operations
have the same issue. How could we solve this consistency problem?
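
To make the suspected failure mode concrete, here is a minimal
standalone sketch (an analogy only; BucketHeader and complete_ops are
invented names, not the real rgw_bucket_complete_op path): if two
writers performed the read/update/write of the header without
coordination, some completions would never reach the stats. Whether
the real cls path can actually interleave like this is what the reply
below addresses.

#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>

struct BucketHeader { std::atomic<long> num_objects{0}; };

// Deliberately uncoordinated read-modify-write: each writer reads the
// header counter, bumps it, and writes it back. The atomic only keeps
// the demo well-defined; the load/store pair is still a non-atomic
// read-modify-write, so increments can be lost.
void complete_ops(BucketHeader& hdr, int ops) {
  for (int i = 0; i < ops; ++i) {
    long v = hdr.num_objects.load();   // read bucket header
    std::this_thread::sleep_for(std::chrono::microseconds(1));
    hdr.num_objects.store(v + 1);      // write header back (may clobber)
  }
}

int main() {
  BucketHeader hdr;
  std::thread a(complete_ops, std::ref(hdr), 5000);
  std::thread b(complete_ops, std::ref(hdr), 5000);
  a.join();
  b.join();
  // 10000 expected if updates were serialized; two uncoordinated
  // writers usually print noticeably less.
  std::cout << "num_objects=" << hdr.num_objects.load() << "\n";
}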

The osd guarantees that two operations in the same placement group won't run
concurrently, so this kind of logic in cls should be safe. How far off are
the bucket stats? Can you share some example output?
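
To make that guarantee concrete, here is the same two-writer experiment
as in the sketch above, but with a mutex standing in for the OSD's
serialization of operations on the bucket index object (again an
analogy, not OSD code): once the read-modify-write is serialized, no
updates are lost.

#include <iostream>
#include <mutex>
#include <thread>

struct BucketHeader { long num_objects = 0; };

std::mutex pg_lock;  // stands in for "one op at a time" on the object

void complete_ops(BucketHeader& hdr, int ops) {
  for (int i = 0; i < ops; ++i) {
    std::lock_guard<std::mutex> g(pg_lock);  // serialized, like the osd
    long v = hdr.num_objects;                // read bucket header
    hdr.num_objects = v + 1;                 // update stats, write back
  }
}

int main() {
  BucketHeader hdr;
  std::thread a(complete_ops, std::ref(hdr), 10000);
  std::thread b(complete_ops, std::ref(hdr), 10000);
  a.join();
  b.join();
  std::cout << "num_objects=" << hdr.num_objects << "\n";  // 20000
}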


Best regards.
Zhang Shaowen


Thanks,
Casey
