On Mon, Sep 29, 2014 at 10:44 AM, Lyn Mitchell <mitch95 at bellsouth.net> wrote:
>
> Hello ceph users,
>
> We have a federated gateway configured to replicate between two zones.
> Replication seems to be working smoothly between the master and slave zone,
> however I have a recurring error in the replication log with the following
> info:
>
> INFO:radosgw_agent.worker:17573 is processing shard number 60
> INFO:radosgw_agent.sync:60/128 items processed
> INFO:radosgw_agent.worker:finished processing shard 60
> INFO:radosgw_agent.sync:61/128 items processed
> INFO:radosgw_agent.worker:17573 is processing shard number 61
> INFO:radosgw_agent.worker:bucket instance "xxx-secondary-01:alph-1.80907.1" has 1 entries after "00000000112.112.3"
> INFO:radosgw_agent.worker:syncing bucket "xxx-secondary-01"
> ERROR:radosgw_agent.worker:failed to sync object xxx-secondary-01/snapshots/8/56/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd: state is error
> INFO:radosgw_agent.worker:finished processing shard 61
> INFO:radosgw_agent.sync:62/128 items processed
> INFO:radosgw_agent.worker:17573 is processing shard number 62
> INFO:radosgw_agent.worker:finished processing shard 62
>
> This file was originally created and deleted via a 3rd party application
> (Citrix CloudPlatform). On the master zone I can see where the file was
> deleted and placed in a completed state, see below:
>
> (MASTER)
> radosgw-admin bilog list --bucket=xxxx-secondary-01 -n $GATEWAY_INST
>
> ...
>
> { "op_id": "00000000107.107.2",
>   "op_tag": "alph-1.81679.241",
>   "op": "del",
>   "object": "snapshots\/8\/56\/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd",
>   "state": "pending",
>   "index_ver": 107,
>   "timestamp": "2014-09-18 02:57:58.000000Z",
>   "ver": { "pool": 76,
>       "epoch": 267}},
> { "op_id": "00000000108.108.3",
>   "op_tag": "alph-1.81679.241",
>   "op": "del",
>   "object": "snapshots\/8\/56\/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd",
>   "state": "complete",
>   "index_ver": 108,
>   "timestamp": "2014-09-18 02:57:58.000000Z",
>   "ver": { "pool": 76,
>       "epoch": 348}},
> ...
>
> While looking through the slave zone I found the following:
>
> (SLAVE):
> radosgw-admin opstate list -n $GATEWAY_INST
>
> ...
> { "client_id": "radosgw-agent",
>   "op_id": "xxxx-xxxx-r1:25526:2",
>   "object": "xxx-secondary-01\/snapshots\/8\/56\/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd",
>   "timestamp": "2014-09-29 17:12:43.402487Z",
>   "state": "error"},
> ...
>
> Also, there was no reference when using:
>
> (SLAVE):
> radosgw-admin bilog list --bucket=xxx-secondary-01 -n $GATEWAY_INST
>
> nothing was returned.
>
> (SLAVE):
> The gateway log on the slave has some information:
>
> 2014-09-29 13:26:49.554771 7f58881cc700 1 ====== req done req=0x7f58a8080690 http_status=204 ======
> 2014-09-29 13:26:49.581884 7f58a61fc700 1 ====== starting new request req=0x7f58a8063be0 =====
> 2014-09-29 13:26:49.582592 7f58a61fc700 0 WARNING: couldn't find acl header for bucket, generating default
> 2014-09-29 13:26:49.587044 7f58a61fc700 0 > HTTP_DATE -> Mon Sep 29 17:26:49 2014
> 2014-09-29 13:26:49.587063 7f58a61fc700 0 > HTTP_X_AMZ_COPY_SOURCE -> xxx-secondary-01/snapshots/8/56/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd
> 2014-09-29 13:26:49.608648 7f58a61fc700 0 curl_easy_performed returned error: couldn't connect to host
> 2014-09-29 13:26:49.612826 7f58a61fc700 1 ====== req done req=0x7f58a8063be0 http_status=400 ======
> 2014-09-29 13:26:49.640460 7f5898fe7700 1 ====== starting new request req=0x7f58a8077550 =====
> 2014-09-29 13:26:49.643624 7f5898fe7700 1 ====== req done req=0x7f58a8077550 http_status=200 ======
>
> From the error above it appears the slave is attempting to connect to the
> master, yet the file it's requesting doesn't exist. I don't think "couldn't
> connect to host" is accurate because we're not seeing the issue with any
> other objects which have been replicated.
>

I think that's the error that libcurl returns, so I think it should reflect
what's actually happening.

> Has anyone by chance run across an instance of this and if so what can be
> done to remove the references or clean it up?
>

Can you turn up rgw debugging?

debug ms = 1
debug rgw = 20

Thanks,
Yehuda
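
To see which objects the slave has recorded as failed, the opstate listing
quoted above can be filtered for entries in the "error" state. The following
is only a minimal sketch: it assumes the full output of
"radosgw-admin opstate list" is a JSON array of objects with the fields shown
in the excerpt, which the excerpt itself does not confirm. The function name
failed_ops, the SAMPLE text, and the second sample entry are hypothetical and
exist only for illustration.

```python
import json


def failed_ops(opstate_json):
    """Return the opstate entries whose "state" field is "error".

    Assumes opstate_json is a JSON array of objects, as in the excerpt
    from "radosgw-admin opstate list" (the enclosing array is an assumption).
    """
    entries = json.loads(opstate_json)
    return [entry for entry in entries if entry.get("state") == "error"]


# First entry mirrors the one quoted in the post; the second is hypothetical.
SAMPLE = r'''
[
  { "client_id": "radosgw-agent",
    "op_id": "xxxx-xxxx-r1:25526:2",
    "object": "xxx-secondary-01\/snapshots\/8\/56\/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd",
    "timestamp": "2014-09-29 17:12:43.402487Z",
    "state": "error"},
  { "client_id": "radosgw-agent",
    "op_id": "hypothetical:1:1",
    "object": "xxx-secondary-01\/hypothetical-object",
    "timestamp": "2014-09-29 17:12:44.000000Z",
    "state": "complete"}
]
'''

if __name__ == "__main__":
    for entry in failed_ops(SAMPLE):
        print(entry["timestamp"], entry["object"])
```

In practice the listing could be saved to a file first and passed to
failed_ops() in place of SAMPLE. Comparing the failed object names against
the bilog on the master would show whether each one corresponds to an object
that has since been deleted there.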