failed to sync object

On Mon, Sep 29, 2014 at 10:44 AM, Lyn Mitchell <mitch95 at bellsouth.net> wrote:
>
>
> Hello ceph users,
>
>
>
> We have a federated gateway configured to replicate between two zones.
> Replication seems to be working smoothly between the master and slave zone,
> however I have a recurring error in the replication log with the following
> info:
>
>
> INFO:radosgw_agent.worker:17573 is processing shard number 60
>
> INFO:radosgw_agent.sync:60/128 items processed
>
> INFO:radosgw_agent.worker:finished processing shard 60
>
> INFO:radosgw_agent.sync:61/128 items processed
>
> INFO:radosgw_agent.worker:17573 is processing shard number 61
>
> INFO:radosgw_agent.worker:bucket instance "xxx-secondary-01:alph-1.80907.1"
> has 1 entries after "00000000112.112.3"
>
> INFO:radosgw_agent.worker:syncing bucket "xxx-secondary-01"
>
> ERROR:radosgw_agent.worker:failed to sync object
> xxx-secondary-01/snapshots/8/56/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd:
> state is error
>
> INFO:radosgw_agent.worker:finished processing shard 61
>
> INFO:radosgw_agent.sync:62/128 items processed
>
> INFO:radosgw_agent.worker:17573 is processing shard number 62
>
> INFO:radosgw_agent.worker:finished processing shard 62
>
>
>
> This file was originally created and deleted via a 3rd party application
> (Citrix CloudPlatform).  On the master zone I can see where the file was
> deleted and placed in a completed state, see below:
>
>
>
> (MASTER)
>
> radosgw-admin bilog list --bucket=xxxx-secondary-01 -n $GATEWAY_INST
>
> ...
>
>     { "op_id": "00000000107.107.2",
>
>       "op_tag": "alph-1.81679.241",
>
>       "op": "del",
>
>       "object":
> "snapshots\/8\/56\/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd",
>
>       "state": "pending",
>
>       "index_ver": 107,
>
>       "timestamp": "2014-09-18 02:57:58.000000Z",
>
>       "ver": { "pool": 76,
>
>           "epoch": 267}},
>
>     { "op_id": "00000000108.108.3",
>
>       "op_tag": "alph-1.81679.241",
>
>       "op": "del",
>
>       "object":
> "snapshots\/8\/56\/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd",
>
>       "state": "complete",
>
>       "index_ver": 108,
>
>       "timestamp": "2014-09-18 02:57:58.000000Z",
>
>       "ver": { "pool": 76,
>
>           "epoch": 348}},
>
> ...
>
>
>
> While looking through the slave zone I found the following:
>
> (SLAVE):
>
> radosgw-admin opstate list -n $GATEWAY_INST
>
> ...
>
>     { "client_id": "radosgw-agent",
>
>       "op_id": "xxxx-xxxx-r1:25526:2",
>
>       "object":
> "xxx-secondary-01\/snapshots\/8\/56\/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd",
>
>       "timestamp": "2014-09-29 17:12:43.402487Z",
>
>       "state": "error"},
>
> ...
>
> Also, there was no reference when using:
> (SLAVE):
>
> radosgw-admin bilog list --bucket=xxx-secondary-01 -n $GATEWAY_INST
>
>        nothing was returned.
>
>
>
> (SLAVE):
>
> The gateway log on the slave has some information:
> 2014-09-29 13:26:49.554771 7f58881cc700  1 ====== req done
> req=0x7f58a8080690 http_status=204 ======
>
> 2014-09-29 13:26:49.581884 7f58a61fc700  1 ====== starting new request
> req=0x7f58a8063be0 =====
>
> 2014-09-29 13:26:49.582592 7f58a61fc700  0 WARNING: couldn't find acl header
> for bucket, generating default
>
> 2014-09-29 13:26:49.587044 7f58a61fc700  0 > HTTP_DATE -> Mon Sep 29
> 17:26:49 2014
>
> 2014-09-29 13:26:49.587063 7f58a61fc700  0 > HTTP_X_AMZ_COPY_SOURCE ->
> xxx-secondary-01/snapshots/8/56/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd
>
> 2014-09-29 13:26:49.608648 7f58a61fc700  0 curl_easy_performed returned
> error: couldn't connect to host
>
> 2014-09-29 13:26:49.612826 7f58a61fc700  1 ====== req done
> req=0x7f58a8063be0 http_status=400 ======
>
> 2014-09-29 13:26:49.640460 7f5898fe7700  1 ====== starting new request
> req=0x7f58a8077550 =====
>
> 2014-09-29 13:26:49.643624 7f5898fe7700  1 ====== req done
> req=0x7f58a8077550 http_status=200 ======
>
>
>
> From the error above it appears the slave is attempting to connect to the
> master, yet the file it's requesting doesn't exist.  I don't think "couldn't
> connect to host" is accurate because we're not seeing the issue with any
> other objects which have been replicated.
>

I think that's the error libcurl returns, so it should reflect what's
actually happening.

>
>
> Has anyone by chance run across an instance of this and if so what can be
> done to remove the references or clean it up?
>
>

Can you turn up rgw debugging?

debug ms = 1
debug rgw = 20
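
While collecting that output, it may also help to count how often the agent
reports each failing object. The following is only a rough sketch, not part of
radosgw-agent: it assumes the replication log contains lines in the form quoted
above ("ERROR:radosgw_agent.worker:failed to sync object <bucket>/<key>: state
is <state>") with each entry on a single line, and the function name
failed_objects is made up for illustration.

```python
import re
from collections import Counter

# Matches the agent's error line as it appears in the quoted log.
# Assumption: each log entry is on one line (the mail client wrapped it above).
FAILED_RE = re.compile(
    r"^ERROR:radosgw_agent\.worker:failed to sync object "
    r"(?P<obj>.+): state is (?P<state>\w+)\s*$"
)


def failed_objects(lines):
    """Count (object, state) pairs reported as failed in the agent log.

    `lines` is any iterable of log lines, e.g. an open file object.
    """
    counts = Counter()
    for line in lines:
        match = FAILED_RE.match(line.strip())
        if match:
            counts[(match.group("obj"), match.group("state"))] += 1
    return counts
```

Passing an open log file to failed_objects() and printing the result should
show whether this one object is the only one failing, and how many times it
has been retried.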

Thanks,
Yehuda

