Re: Consistency problem with multiple rgws

My colleague has already opened a ticket, so I have added the extra info to it:
http://tracker.ceph.com/issues/18260

2016-12-22 0:48 GMT+08:00 Casey Bodley <cbodley@xxxxxxxxxx>:
>
> On 12/16/2016 02:36 AM, sw zhang wrote:
>>
>> Hi,
>> I tested it again today with one RGW in each zone and
>> 'rgw_num_rados_handles=2'. I used cosbench to upload 50,000 objects,
>> each object 4M in size, with 10 workers.
>> I confirmed that the data sync had finished using 'radosgw-admin
>> bucket sync status --bucket=<name>' and 'radosgw-admin sync status'
>> (exact invocations below).
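>>
>> For reference, these are the commands I run on the slave zone
>> (shard23 is our test bucket):
>>
>>   $ radosgw-admin sync status
>>   $ radosgw-admin bucket sync status --bucket=shard23
>>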
>> Below are the bucket stats:
>>
>> Master zone:
>> [root@ceph36 ~]# radosgw-admin bucket stats --bucket=shard23
>> {
>>      "bucket": "shard23",
>>      "pool": "master.rgw.buckets.data",
>>      "index_pool": "master.rgw.buckets.index",
>>      "id": "cc3594b6-6282-421a-a3d5-3f7f3fa7efd0.702243.1",
>>      "marker": "cc3594b6-6282-421a-a3d5-3f7f3fa7efd0.702243.1",
>>      "owner": "zsw-test",
>>      "ver": "0#50039,1#49964",
>>      "master_ver": "0#0,1#0",
>>      "mtime": "2016-12-16 10:58:56.174049",
>>      "max_marker": "0#00000050038.56144.3,1#00000049963.56109.3",
>>      "usage": {
>>          "rgw.main": {
>>              "size_kb": 195300782,
>>              "size_kb_actual": 195388276,
>>              "num_objects": 50000
>>          }
>>      },
>>      "bucket_quota": {
>>          "enabled": false,
>>          "max_size_kb": -1,
>>          "max_objects": -1
>>      }
>> }
>>
>> Slave zone:
>> [root@ceph05 ~]# radosgw-admin bucket stats --bucket=shard23
>> {
>>      "bucket": "shard23",
>>      "pool": "slave.rgw.buckets.data",
>>      "index_pool": "slave.rgw.buckets.index",
>>      "id": "cc3594b6-6282-421a-a3d5-3f7f3fa7efd0.702243.1",
>>      "marker": "cc3594b6-6282-421a-a3d5-3f7f3fa7efd0.702243.1",
>>      "owner": "zsw-test",
>>      "ver": "0#51172,1#51070",
>>      "master_ver": "0#0,1#0",
>>      "mtime": "2016-12-16 10:58:56.174049",
>>      "max_marker": "0#00000051171.112193.3,1#00000051069.79607.3",
>>      "usage": {
>>          "rgw.main": {
>>              "size_kb": 194769532,
>>              "size_kb_actual": 194856788,
>>              "num_objects": 49861
>>          }
>>      },
>>      "bucket_quota": {
>>          "enabled": false,
>>          "max_size_kb": -1,
>>          "max_objects": -1
>>      }
>> }
>>
>> We can see that in the slave zone the object count in the bucket
>> stats is lower than in the master zone. But if I use s3cmd to list
>> the bucket in the slave zone, the result is correct:
>> [root@ceph05 ~]# s3cmd ls s3://shard23 | wc -l
>> 50000
>>
>> And after listing the bucket with s3cmd, I run bucket stats in the
>> slave zone again:
>> [root@ceph05 ~]# radosgw-admin bucket stats --bucket=shard23
>> {
>>      "bucket": "shard23",
>>      "pool": "slave.rgw.buckets.data",
>>      "index_pool": "slave.rgw.buckets.index",
>>      "id": "cc3594b6-6282-421a-a3d5-3f7f3fa7efd0.702243.1",
>>      "marker": "cc3594b6-6282-421a-a3d5-3f7f3fa7efd0.702243.1",
>>      "owner": "zsw-test",
>>      "ver": "0#51182,1#51079",
>>      "master_ver": "0#0,1#0",
>>      "mtime": "2016-12-16 10:58:56.174049",
>>      "max_marker": "0#00000051181.112203.9,1#00000051078.79616.9",
>>      "usage": {
>>          "rgw.main": {
>>              "size_kb": 194769532,
>>              "size_kb_actual": 194856788,
>>              "num_objects": 50000
>>          }
>>      },
>>      "bucket_quota": {
>>          "enabled": false,
>>          "max_size_kb": -1,
>>          "max_objects": -1
>>      }
>> }
>>
>> We can see that num_objects is correct now. (According to the code,
>> listing a bucket sends 'dir_suggest_changes' requests to the OSD; I
>> think this is why the number is fixed afterwards. See the sketch
>> below.)
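>>
>> Roughly, the repair path I see in RGWRados::cls_bucket_list() looks
>> like this (a simplified sketch from my reading of the jewel source,
>> with most arguments omitted; not the literal code):
>>
>>   bufferlist updates;
>>   for (auto& entry : listed_entries) {
>>     // check_disk_state() compares the index entry against its head
>>     // object; when they disagree it appends a CEPH_RGW_UPDATE or
>>     // CEPH_RGW_REMOVE suggestion to 'updates'
>>     check_disk_state(..., entry, updates);
>>   }
>>   if (updates.length() > 0) {
>>     librados::ObjectWriteOperation op;
>>     // ask the index shard to replay the suggestions; applying them
>>     // also corrects the header stats
>>     cls_rgw_suggest_changes(op, updates);
>>     index_ctx.aio_operate(shard_oid, completion, &op);
>>   }
>>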
>> If each zone has two RGWs with 'rgw_num_rados_handles=1', the
>> difference in the bucket stats is smaller: the object count is off
>> by only 10 to 40. If each zone has one RGW with
>> 'rgw_num_rados_handles=1', the bucket stats match. (An example
>> ceph.conf snippet is below.)
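>>
>> For reference, we set the option in ceph.conf under the rgw client
>> section, something like this (the section name is just an example
>> from our setup):
>>
>>   [client.rgw.ceph36]
>>   rgw_num_rados_handles = 2    # with 1, the stats stay consistent
>>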
>> My colleague and I have tested this multiple times on two different
>> clusters (Ceph version jewel), and the problem occurs nearly every
>> time.
>>
> Thanks for the extra info, I'll look into this. Could you please open a
> ticket at http://tracker.ceph.com/projects/rgw/issues/new and include this
> output?
>
>
>> 2016-12-16 0:05 GMT+08:00 Casey Bodley <cbodley@xxxxxxxxxx>:
>>>
>>> Hi,
>>>
>>> On 12/15/2016 02:55 AM, 18896724396 wrote:
>>>
>>> Hi,
>>> We have two RGWs in the master zone and two RGWs in the slave zone.
>>> We use cosbench to upload 50,000 objects to a single bucket. After
>>> the data sync finishes, the bucket stats are not the same between
>>> the master and slave zones.
>>>
>>> The data sync may take a while with that many objects. How are you
>>> verifying that data sync finished? Have you tried 'radosgw-admin
>>> bucket sync status --bucket=<name>'?
>>>
>>> Then we tested the same case with one RGW in each of the master and
>>> slave zones, and the stats were still not the same. Finally we
>>> tested with one RGW and changed rgw_num_rados_handles to 1 (we had
>>> set it to 2 before), and this time the stats were the same and
>>> correct, though multiple RGWs still show the problem.
>>> Reading the code, I find that when we update the bucket index, RGW
>>> calls cls_rgw_bucket_complete_op to update the bucket stats, and
>>> the OSD ultimately runs rgw_bucket_complete_op. In this function
>>> the OSD first reads the bucket header, then updates the stats, and
>>> finally writes the header back; a sketch of the pattern follows
>>> below. So I think two concurrent requests updating the stats may
>>> lead to a consistency problem, and maybe some other operations have
>>> the same problem. How could we solve this consistency problem?
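>>>
>>> Roughly, the pattern looks like this (a simplified sketch from my
>>> reading of cls_rgw.cc, with error handling and most fields omitted;
>>> 'category' and 'accounted_size' stand in for values decoded from
>>> the op; not the literal source):
>>>
>>>   int rgw_bucket_complete_op(cls_method_context_t hctx,
>>>                              bufferlist *in, bufferlist *out)
>>>   {
>>>     rgw_bucket_dir_header header;
>>>     // 1. read the current header of this index shard
>>>     int rc = read_bucket_header(hctx, &header);
>>>     if (rc < 0)
>>>       return rc;
>>>
>>>     // 2. apply the completed op to the in-memory stats
>>>     rgw_bucket_category_stats& stats = header.stats[category];
>>>     stats.num_entries++;
>>>     stats.total_size += accounted_size;
>>>
>>>     // 3. write the updated header back
>>>     return write_bucket_header(hctx, &header);
>>>   }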
>>>
>>> The osd guarantees that two operations in the same placement group
>>> won't run concurrently, so this kind of logic in cls should be
>>> safe. How far off are the bucket stats? Can you share some example
>>> output?
>>>
>>>
>>> Best regards.
>>> Zhang Shaowen
>>>
>>>
>>> Thanks,
>>> Casey
>
>