Hello ceph users,

We have a federated gateway configured to replicate between two zones. Replication seems to be working smoothly between the master and slave zones; however, I have a recurring error in the radosgw-agent replication log with the following info:

INFO:radosgw_agent.worker:17573 is processing shard number 60
INFO:radosgw_agent.sync:60/128 items processed
INFO:radosgw_agent.worker:finished processing shard 60
INFO:radosgw_agent.sync:61/128 items processed
INFO:radosgw_agent.worker:17573 is processing shard number 61
INFO:radosgw_agent.worker:bucket instance "xxx-secondary-01:alph-1.80907.1" has 1 entries after "00000000112.112.3"
INFO:radosgw_agent.worker:syncing bucket "xxx-secondary-01"
ERROR:radosgw_agent.worker:failed to sync object xxx-secondary-01/snapshots/8/56/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd: state is error
INFO:radosgw_agent.worker:finished processing shard 61
INFO:radosgw_agent.sync:62/128 items processed
INFO:radosgw_agent.worker:17573 is processing shard number 62
INFO:radosgw_agent.worker:finished processing shard 62

This file was originally created and deleted via a 3rd-party application (Citrix CloudPlatform). On the master zone I can see where the file was deleted and the delete placed in a complete state, see below:

(MASTER) radosgw-admin bilog list --bucket=xxxx-secondary-01 -n $GATEWAY_INST

.
{ "op_id": "00000000107.107.2",
  "op_tag": "alph-1.81679.241",
  "op": "del",
  "object": "snapshots\/8\/56\/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd",
  "state": "pending",
  "index_ver": 107,
  "timestamp": "2014-09-18 02:57:58.000000Z",
  "ver": { "pool": 76,
      "epoch": 267}},
{ "op_id": "00000000108.108.3",
  "op_tag": "alph-1.81679.241",
  "op": "del",
  "object": "snapshots\/8\/56\/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd",
  "state": "complete",
  "index_ver": 108,
  "timestamp": "2014-09-18 02:57:58.000000Z",
  "ver": { "pool": 76,
      "epoch": 348}},
.

While looking through the slave zone I found the following:

(SLAVE) radosgw-admin opstate list -n $GATEWAY_INST

.
{ "client_id": "radosgw-agent",
  "op_id": "xxxx-xxxx-r1:25526:2",
  "object": "xxx-secondary-01\/snapshots\/8\/56\/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd",
  "timestamp": "2014-09-29 17:12:43.402487Z",
  "state": "error"},
.

Also, the bucket index log on the slave has no reference to the object; the following returned nothing:

(SLAVE) radosgw-admin bilog list --bucket=xxx-secondary-01 -n $GATEWAY_INST

The gateway log on the slave has some related information:

(SLAVE)
2014-09-29 13:26:49.554771 7f58881cc700  1 ====== req done req=0x7f58a8080690 http_status=204 ======
2014-09-29 13:26:49.581884 7f58a61fc700  1 ====== starting new request req=0x7f58a8063be0 =====
2014-09-29 13:26:49.582592 7f58a61fc700  0 WARNING: couldn't find acl header for bucket, generating default
2014-09-29 13:26:49.587044 7f58a61fc700  0 > HTTP_DATE -> Mon Sep 29 17:26:49 2014
2014-09-29 13:26:49.587063 7f58a61fc700  0 > HTTP_X_AMZ_COPY_SOURCE -> xxx-secondary-01/snapshots/8/56/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd
2014-09-29 13:26:49.608648 7f58a61fc700  0 curl_easy_performed returned error: couldn't connect to host
2014-09-29 13:26:49.612826 7f58a61fc700  1 ====== req done req=0x7f58a8063be0 http_status=400 ======
2014-09-29 13:26:49.640460 7f5898fe7700  1 ====== starting new request req=0x7f58a8077550 =====
2014-09-29 13:26:49.643624 7f5898fe7700  1 ====== req done req=0x7f58a8077550 http_status=200 ======

From the log above it appears the slave is attempting to connect to the master and copy the object, yet the file it's requesting no longer exists there.
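In case it's useful, the only cleanup I've come up with so far is removing the stale opstate entry on the slave with radosgw-admin's "opstate rm" subcommand, feeding it the client_id/op_id/object from the listing above. This is just a guess on my part (I haven't run it, and I'm assuming I'm reading the radosgw-admin usage correctly), so I don't know whether it's the right or a safe fix:

(SLAVE) radosgw-admin opstate rm -n $GATEWAY_INST \
            --client-id=radosgw-agent \
            --op-id="xxxx-xxxx-r1:25526:2" \
            --object="xxx-secondary-01/snapshots/8/56/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd"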
That said, I don't think "couldn't connect to host" is accurate, because we're not seeing that issue with any other objects that have been replicated. Has anyone by chance run across an instance of this, and if so, what can be done to remove the stale references or clean it up?

Thanks in advance for any help,
MLM