Re: RGW multisite replication failures

Hello Orit,

Yes, this bug looks like it correlates. Was the fix included in 10.2.3?

I guess not, as I have since updated to 10.2.3 but am still getting the
same errors.

That bug talks about not retrying after a failure, though; do you know
why the sync fails in the first place? It seems that basically any
object over 500k in size fails :(
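
For reference, something along these lines is how I have been testing it:
upload objects of increasing size to the bucket on the master zone, wait a
bit, and then check whether they turn up on the secondary. This is only a
rough boto sketch, not the exact script I ran; the secondary endpoint, the
credentials and the 120s wait are placeholders:

#!/usr/bin/env python
# Upload objects of increasing size to the master zone, wait for data sync,
# then check whether each object has been replicated to the secondary zone.
import time
import boto
import boto.s3.connection

def connect(host):
    return boto.connect_s3(
        aws_access_key_id='ACCESS_KEY',          # placeholder
        aws_secret_access_key='SECRET_KEY',      # placeholder
        host=host,
        is_secure=True,
        calling_format=boto.s3.connection.OrdinaryCallingFormat())

master = connect('bbpobjectstorage.epfl.ch')     # master zone endpoint
secondary = connect('secondary.example.com')     # placeholder secondary endpoint

sizes_kb = (100, 250, 500, 750, 1024)

src = master.get_bucket('bentest1')
for size_kb in sizes_kb:
    key = src.new_key('synctest-%dk' % size_kb)
    key.set_contents_from_string('x' * (size_kb * 1024))

time.sleep(120)  # give data sync some time to catch up (arbitrary wait)

dst = secondary.get_bucket('bentest1')
for size_kb in sizes_kb:
    name = 'synctest-%dk' % size_kb
    print('%s: %s' % (name, 'synced' if dst.get_key(name) else 'MISSING'))

With a test like this, the smaller objects show up on the secondary and the
larger ones do not, which matches the "failed to sync object" errors in the
log below.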

Kind regards,

Ben Morrice

______________________________________________________________________
Ben Morrice | e: ben.morrice@xxxxxxx | t: +41-21-693-9670
EPFL ENT CBS BBP
Biotech Campus
Chemin des Mines 9
1202 Geneva
Switzerland

On 23/09/16 16:52, Orit Wasserman wrote:
> Hi Ben,
> It seems to be http://tracker.ceph.com/issues/16742.
> It is being backported to jewel (http://tracker.ceph.com/issues/16794);
> you can try applying it and see if it helps you.
>
> Regards,
> Orit
>
> On Fri, Sep 23, 2016 at 9:21 AM, Ben Morrice <ben.morrice@xxxxxxx> wrote:
>> Hello all,
>>
>> I have two separate ceph (10.2.2) clusters and have configured multisite
>> replication between the two. I can see that some buckets get synced, but
>> others do not.
>>
>> Both clusters are RHEL7, and I have upgraded libcurl from 7.29 to 7.50
>> (to avoid http://tracker.ceph.com/issues/15915).
>>
>> Below is some debug output on the 'secondary' zone (bbp-gva-secondary)
>> after uploading a file to the bucket 'bentest1' on the master zone
>> (bbp-gva-master).
>>
>> This appears to be happening very frequently. The size of my bucket
>> pool on the master is ~120GB, however on the secondary site it is only
>> 5GB, so things are not very happy at the moment.
>>
>> What steps can I take to work out why RGW cannot create a lock in the
>> log pool?
>>
>> Is there a way to force a full sync, starting fresh (the secondary site
>> is not advertised to users, so it's fine to even clean the pools and
>> start again)?
>>
>>
>> 2016-09-23 09:03:28.498292 7f992e664700 20 execute(): read data:
>> [{"key":6,"val":["bentest1:bbp-gva-master.85732351.16:-1"]}]
>> 2016-09-23 09:03:28.498453 7f992e664700 20 execute(): modified
>> key=bentest1:bbp-gva-master.85732351.16:-1
>> 2016-09-23 09:03:28.498456 7f992e664700 20 wakeup_data_sync_shards:
>> source_zone=bbp-gva-master,
>> shard_ids={6=bentest1:bbp-gva-master.85732351.16:-1}
>> 2016-09-23 09:03:28.498547 7f9a72ffd700 20 incremental_sync(): async
>> update notification: bentest1:bbp-gva-master.85732351.16:-1
>> 2016-09-23 09:03:28.499137 7f9a7dffb700 20 get_system_obj_state:
>> rctx=0x7f9a3c5f8e08
>> obj=.bbp-gva-secondary.log:bucket.sync-status.bbp-gva-master:bentest1:bbp-gva-master.85732351.16
>> state=0x7f9a0c069848 s->prefetch_data=0
>> 2016-09-23 09:03:28.501379 7f9a72ffd700 20 operate(): sync status for
>> bucket bentest1:bbp-gva-master.85732351.16:-1: 2
>> 2016-09-23 09:03:28.501433 7f9a877fe700 20 reading from
>> .bbp-gva-secondary.domain.rgw:.bucket.meta.bentest1:bbp-gva-master.85732351.16
>> 2016-09-23 09:03:28.501447 7f9a877fe700 20 get_system_obj_state:
>> rctx=0x7f9a877fc6d0
>> obj=.bbp-gva-secondary.domain.rgw:.bucket.meta.bentest1:bbp-gva-master.85732351.16
>> state=0x7f9a340cfbe8 s->prefetch_data=0
>> 2016-09-23 09:03:28.503269 7f9a877fe700 20 get_system_obj_state:
>> rctx=0x7f9a877fc6d0
>> obj=.bbp-gva-secondary.domain.rgw:.bucket.meta.bentest1:bbp-gva-master.85732351.16
>> state=0x7f9a340cfbe8 s->prefetch_data=0
>> 2016-09-23 09:03:28.510428 7f9a72ffd700 20 sending request to
>> https://bbpobjectstorage.epfl.ch:443/admin/log?bucket-instance=bentest1%3Abbp-gva-master.85732351.16&format=json&marker=00000000034.4578.3&type=bucket-index&rgwx-zonegroup=bbp-gva
>> 2016-09-23 09:03:28.625755 7f9a72ffd700 20 [inc sync] skipping object:
>> bentest1:bbp-gva-master.85732351.16:-1/1m: non-complete operation
>> 2016-09-23 09:03:28.625759 7f9a72ffd700 20 [inc sync] syncing object:
>> bentest1:bbp-gva-master.85732351.16:-1/1m
>> 2016-09-23 09:03:28.625831 7f9a72ffd700 20 bucket sync single entry
>> (source_zone=bbp-gva-master)
>> b=bentest1(@{i=.bbp-gva-secondary.rgw.buckets.index,e=.bbp-gva-master.rgw.buckets.extra}.bbp-gva-secondary.rgw.buckets[bbp-gva-master.85732351.16]):-1/1m[0]
>> log_entry=00000000036.4586.3 op=0 op_state=1
>> 2016-09-23 09:03:28.625857 7f9a72ffd700  5 bucket sync: sync obj:
>> bbp-gva-master/bentest1(@{i=.bbp-gva-secondary.rgw.buckets.index,e=.bbp-gva-master.rgw.buckets.extra}.bbp-gva-secondary.rgw.buckets[bbp-gva-master.85732351.16])/1m[0]
>> 2016-09-23 09:03:28.626092 7f9a85ffb700 20 get_obj_state:
>> rctx=0x7f9a85ff96a0 obj=bentest1:1m state=0x7f9a30051cf8 s->prefetch_data=0
>> 2016-09-23 09:03:28.626119 7f9a72ffd700 20 sending request to
>> https://bbpobjectstorage.epfl.ch:443/admin/log?bucket-instance=bentest1%3Abbp-gva-master.85732351.16&format=json&marker=00000000036.4586.3&type=bucket-index&rgwx-zonegroup=bbp-gva
>> 2016-09-23 09:03:28.627560 7f9a85ffb700 10 get_canon_resource():
>> dest=/bentest1/1m
>> /bentest1/1m
>> 2016-09-23 09:03:28.627612 7f9a85ffb700 20 sending request to
>> https://bbpobjectstorage.epfl.ch:443/bentest1/1m?rgwx-zonegroup=bbp-gva&rgwx-prepend-metadata=bbp-gva
>> 2016-09-23 09:03:28.725185 7f9a72ffd700 20 incremental_sync:1067:
>> shard_id=6 log_entry: 1_1474614207.373384_1713810.1:2016-09-23
>> 09:03:27.0.373384s:bentest1:bbp-gva-master.85732351.16
>> 2016-09-23 09:03:28.725477 7f9a9affd700 20 get_system_obj_state:
>> rctx=0x7f9a3c5f8e08
>> obj=.bbp-gva-secondary.log:bucket.sync-status.bbp-gva-master:bentest1:bbp-gva-master.85732351.16
>> state=0x7f9a741bb0a8 s->prefetch_data=0
>> 2016-09-23 09:03:28.728404 7f9a72ffd700 20 operate(): sync status for
>> bucket bentest1:bbp-gva-master.85732351.16:-1: 2
>> 2016-09-23 09:03:28.728462 7f9a7b7f6700 20 reading from
>> .bbp-gva-secondary.domain.rgw:.bucket.meta.bentest1:bbp-gva-master.85732351.16
>> 2016-09-23 09:03:28.728490 7f9a7b7f6700 20 get_system_obj_state:
>> rctx=0x7f9a7b7f46d0
>> obj=.bbp-gva-secondary.domain.rgw:.bucket.meta.bentest1:bbp-gva-master.85732351.16
>> state=0x7f9a000b19b8 s->prefetch_data=0
>> 2016-09-23 09:03:28.729664 7f9a7b7f6700 20 get_system_obj_state:
>> rctx=0x7f9a7b7f46d0
>> obj=.bbp-gva-secondary.domain.rgw:.bucket.meta.bentest1:bbp-gva-master.85732351.16
>> state=0x7f9a000b19b8 s->prefetch_data=0
>> 2016-09-23 09:03:28.731703 7f9a72ffd700 20
>> cr:s=0x7f9a3c5a4f90:op=0x7f9a3ca75ef0:20RGWContinuousLeaseCR: couldn't
>> lock
>> .bbp-gva-secondary.log:bucket.sync-status.bbp-gva-master:bentest1:bbp-gva-master.85732351.16:sync_lock:
>> retcode=-16
>> 2016-09-23 09:03:28.731721 7f9a72ffd700  0 ERROR: incremental sync on
>> bentest1 bucket_id=bbp-gva-master.85732351.16 shard_id=-1 failed,
>> retcode=-16
>> 2016-09-23 09:03:28.758421 7f9a72ffd700 20 store_marker(): updating
>> marker
>> marker_oid=bucket.sync-status.bbp-gva-master:bentest1:bbp-gva-master.85732351.16
>> marker=00000000035.4585.2
>> 2016-09-23 09:03:28.829207 7f9a72ffd700  0 ERROR: failed to sync object:
>> bentest1:bbp-gva-master.85732351.16:-1/1m
>> 2016-09-23 09:03:28.834281 7f9a72ffd700 20 store_marker(): updating
>> marker
>> marker_oid=bucket.sync-status.bbp-gva-master:bentest1:bbp-gva-master.85732351.16
>> marker=00000000036.4586.3
>>
>>
>>
>> --
>> Kind regards,
>>
>> Ben Morrice
>>
>> ______________________________________________________________________
>> Ben Morrice | e: ben.morrice@xxxxxxx | t: +41-21-693-9670
>> EPFL ENT CBS BBP
>> Biotech Campus
>> Chemin des Mines 9
>> 1202 Geneva
>> Switzerland
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


