Hello ceph users,

We have a federated gateway configured to replicate between two zones. Replication seems to be working smoothly between the master and slave zones; however, I have a recurring error in the radosgw-agent replication log with the following info:

INFO:radosgw_agent.worker:17573 is processing shard number 60
INFO:radosgw_agent.sync:60/128 items processed
INFO:radosgw_agent.worker:finished processing shard 60
INFO:radosgw_agent.sync:61/128 items processed
INFO:radosgw_agent.worker:17573 is processing shard number 61
INFO:radosgw_agent.worker:bucket instance "xxx-secondary-01:alph-1.80907.1" has 1 entries after "00000000112.112.3"
INFO:radosgw_agent.worker:syncing bucket "xxx-secondary-01"
ERROR:radosgw_agent.worker:failed to sync object xxx-secondary-01/snapshots/8/56/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd: state is error
INFO:radosgw_agent.worker:finished processing shard 61
INFO:radosgw_agent.sync:62/128 items processed
INFO:radosgw_agent.worker:17573 is processing shard number 62
INFO:radosgw_agent.worker:finished processing shard 62

This file was originally created and deleted via a 3rd-party application (Citrix CloudPlatform). On the master zone I can see where the file was deleted and the delete placed in a complete state, see below:

(MASTER) radosgw-admin bilog list --bucket=xxxx-secondary-01 -n $GATEWAY_INST

.
{ "op_id": "00000000107.107.2",
  "op_tag": "alph-1.81679.241",
  "op": "del",
  "object": "snapshots\/8\/56\/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd",
  "state": "pending",
  "index_ver": 107,
  "timestamp": "2014-09-18 02:57:58.000000Z",
  "ver": { "pool": 76,
      "epoch": 267}},
{ "op_id": "00000000108.108.3",
  "op_tag": "alph-1.81679.241",
  "op": "del",
  "object": "snapshots\/8\/56\/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd",
  "state": "complete",
  "index_ver": 108,
  "timestamp": "2014-09-18 02:57:58.000000Z",
  "ver": { "pool": 76,
      "epoch": 348}},
.

While looking through the slave zone I found the following:

(SLAVE) radosgw-admin opstate list -n $GATEWAY_INST

.
{ "client_id": "radosgw-agent",
  "op_id": "xxxx-xxxx-r1:25526:2",
  "object": "xxx-secondary-01\/snapshots\/8\/56\/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd",
  "timestamp": "2014-09-29 17:12:43.402487Z",
  "state": "error"},
.

Also, the bucket index log on the slave has no reference to the object; the following returned nothing:

(SLAVE) radosgw-admin bilog list --bucket=xxx-secondary-01 -n $GATEWAY_INST

The gateway log on the slave has some related information:

(SLAVE)
2014-09-29 13:26:49.554771 7f58881cc700  1 ====== req done req=0x7f58a8080690 http_status=204 ======
2014-09-29 13:26:49.581884 7f58a61fc700  1 ====== starting new request req=0x7f58a8063be0 =====
2014-09-29 13:26:49.582592 7f58a61fc700  0 WARNING: couldn't find acl header for bucket, generating default
2014-09-29 13:26:49.587044 7f58a61fc700  0 > HTTP_DATE -> Mon Sep 29 17:26:49 2014
2014-09-29 13:26:49.587063 7f58a61fc700  0 > HTTP_X_AMZ_COPY_SOURCE -> xxx-secondary-01/snapshots/8/56/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd
2014-09-29 13:26:49.608648 7f58a61fc700  0 curl_easy_performed returned error: couldn't connect to host
2014-09-29 13:26:49.612826 7f58a61fc700  1 ====== req done req=0x7f58a8063be0 http_status=400 ======
2014-09-29 13:26:49.640460 7f5898fe7700  1 ====== starting new request req=0x7f58a8077550 =====
2014-09-29 13:26:49.643624 7f5898fe7700  1 ====== req done req=0x7f58a8077550 http_status=200 ======

From the log above it appears the slave is attempting to connect to the master and copy the object, yet the file it's requesting no longer exists there.
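In case it's useful, the only cleanup I've come up with so far is removing the stale opstate entry on the slave with radosgw-admin's "opstate rm" subcommand, feeding it the client_id/op_id/object from the listing above. This is just a guess on my part (I haven't run it, and I'm assuming I'm reading the radosgw-admin usage correctly), so I don't know whether it's the right or a safe fix:

(SLAVE) radosgw-admin opstate rm -n $GATEWAY_INST \
            --client-id=radosgw-agent \
            --op-id="xxxx-xxxx-r1:25526:2" \
            --object="xxx-secondary-01/snapshots/8/56/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd"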
That said, I don't think "couldn't connect to host" is accurate, because we're not seeing that issue with any other objects that have been replicated. Has anyone by chance run across an instance of this, and if so, what can be done to remove the stale references or clean it up?

Thanks in advance for any help,
MLM