Re: RGW Replication

Craig Lewis
Senior Systems Engineer
Office +1.714.602.1309
Email clewis@xxxxxxxxxxxxxxxxxx


On 2/4/14 11:36, Yehuda Sadeh wrote:
> Also, verify whether any objects are missing. Start with just counting
> the total number of objects in the buckets (radosgw-admin bucket stats
> can give you that info).
>
> Yehuda

Thanks, I didn't know about bucket stats.

bucket stats reports that the slave has fewer objects and fewer kB than the master.
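
For anyone who wants to repeat the comparison, here is a rough Python sketch of how the two bucket stats outputs could be diffed. It assumes the JSON from radosgw-admin bucket stats has a "usage" map whose categories carry "num_objects" and "size_kb"; check that against your own output before relying on it. The sample values are made up and are not from my clusters.

```python
import json


def usage_totals(stats):
    """Sum num_objects and size_kb over every usage category of one bucket.

    Assumes the layout {"usage": {"<category>": {"num_objects": N, "size_kb": K}}}.
    """
    usage = stats.get("usage", {})
    objects = sum(cat.get("num_objects", 0) for cat in usage.values())
    size_kb = sum(cat.get("size_kb", 0) for cat in usage.values())
    return objects, size_kb


def compare_bucket_stats(master_json, slave_json):
    """Return (objects, kB) present on the master but not on the slave."""
    m_obj, m_kb = usage_totals(json.loads(master_json))
    s_obj, s_kb = usage_totals(json.loads(slave_json))
    return m_obj - s_obj, m_kb - s_kb


# Made-up sample data for illustration only.
master = '{"bucket": "live-2", "usage": {"rgw.main": {"size_kb": 2048, "num_objects": 120}}}'
slave = '{"bucket": "live-2", "usage": {"rgw.main": {"size_kb": 1024, "num_objects": 100}}}'

missing_objects, missing_kb = compare_bucket_stats(master, slave)
print("slave is behind by %d objects and %d kB" % (missing_objects, missing_kb))
```

In practice the two JSON strings would come from running bucket stats for the same bucket against each zone and saving the output.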

Now that I know objects are missing on the slave, how do I fix it?  Would running radosgw-agent with --sync-scope=full do it?



I figured out why replication went so quickly after the restart.  I missed an error in the radosgw-agent logs:
2014-02-04T08:16:28.936 14145:WARNING:radosgw_agent.worker:error locking shard 36 log,  skipping for now. Traceback:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/radosgw_agent/worker.py", line 58, in lock_shard
    self.lock.acquire()
  File "/usr/lib/python2.7/dist-packages/radosgw_agent/lock.py", line 65, in acquire
    self.zone_id, self.timeout, self.locker_id)
  File "/usr/lib/python2.7/dist-packages/radosgw_agent/client.py", line 241, in lock_shard
    expect_json=False)
  File "/usr/lib/python2.7/dist-packages/radosgw_agent/client.py", line 155, in request
    check_result_status(result)
  File "/usr/lib/python2.7/dist-packages/radosgw_agent/client.py", line 116, in check_result_status
    HttpError)(result.status_code, result.content)
HttpError: Http error code 423 content {"Code":"Locked"}
2014-02-04T08:16:28.939 12730:ERROR:radosgw_agent.sync:error syncing shard 36
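
Since I missed this the first time, here is a small Python sketch of the kind of filter that would have caught it: it keeps WARNING and ERROR lines and drops the noisy "has fallen behind" warnings. The line format is inferred only from the log lines in this message, so treat the regular expression as an assumption. The timestamp and PID on the "fallen behind" sample line are invented for illustration.

```python
import re

# Format inferred from the log excerpts: "<timestamp> <pid>:<LEVEL>:<logger>:<message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+) (?P<pid>\d+):(?P<level>[A-Z]+):(?P<logger>[\w.]+):(?P<msg>.*)$"
)


def find_problems(lines, ignore=("has fallen behind",)):
    """Return (timestamp, level, message) for WARNING/ERROR lines not in `ignore`."""
    problems = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m is None:
            continue  # traceback continuation lines do not match the format
        if m.group("level") not in ("WARNING", "ERROR"):
            continue
        if any(text in m.group("msg") for text in ignore):
            continue
        problems.append((m.group("ts"), m.group("level"), m.group("msg")))
    return problems


sample = [
    "2014-02-04T08:15:58.966 14045:INFO:radosgw_agent.worker:finished processing shard 36",
    # Timestamp and PID on the next line are invented for illustration.
    "2014-02-04T08:16:00.000 12730:WARNING:radosgw_agent.sync:shard 36 log has fallen behind",
    "2014-02-04T08:16:28.936 14145:WARNING:radosgw_agent.worker:error locking shard 36 log,  skipping for now. Traceback:",
    'HttpError: Http error code 423 content {"Code":"Locked"}',
    "2014-02-04T08:16:28.939 12730:ERROR:radosgw_agent.sync:error syncing shard 36",
]

problems = find_problems(sample)
for ts, level, msg in problems:
    print(ts, level, msg)
```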

Full radosgw-agent.log, starting at restart: https://cd.centraldesktop.com/p/eAAAAAAAC60_AAAAAAia_J0



I shut down radosgw-agent and restarted all radosgw daemons in the slave cluster.  Replication is proceeding again on shard 36, but I'm seeing the same behavior: the slave is catching up much too quickly.

Before the stall:
root@ceph1c:/var/log/ceph# zegrep '(live-2:us-west-1|shard 36)' radosgw-agent.us-west-1.us-central-1.log.1.gz | grep -v 'WARNING:radosgw_agent.sync:shard 36 log has fallen behind' | tail
2014-02-03T23:19:11.434 11783:INFO:radosgw_agent.worker:bucket instance "live-2:us-west-1.35026898.2" has 1000 entries after "00000115883.315938.2"
2014-02-03T23:24:51.246 11783:INFO:radosgw_agent.worker:finished processing shard 36
2014-02-03T23:25:30.185 6419:INFO:radosgw_agent.worker:finished processing shard 36
2014-02-03T23:25:46.826 6468:INFO:radosgw_agent.worker:bucket instance "live-2:us-west-1.35026898.2" has 1000 entries after "00000116882.316964.3"
2014-02-03T23:30:13.648 6468:INFO:radosgw_agent.worker:finished processing shard 36
2014-02-03T23:30:50.132 29240:INFO:radosgw_agent.worker:finished processing shard 36
2014-02-03T23:31:06.808 29390:INFO:radosgw_agent.worker:bucket instance "live-2:us-west-1.35026898.2" has 1000 entries after "00000117881.317984.2"
2014-02-03T23:38:56.830 29390:INFO:radosgw_agent.worker:finished processing shard 36
2014-02-03T23:39:58.408 3744:INFO:radosgw_agent.worker:finished processing shard 36
2014-02-03T23:40:15.049 3837:INFO:radosgw_agent.worker:bucket instance "live-2:us-west-1.35026898.2" has 1000 entries after "00000118880.319057.3"

After the radosgw and radosgw-agent restart (contained in the full logs linked above):
root@ceph1c:/var/log/ceph# egrep '(live-2:us-west-1|shard 36)' radosgw-agent.us-west-1.us-central-1.log | grep -v 'WARNING:radosgw_agent.sync:shard 36 log has fallen behind'
2014-02-04T08:15:58.966 14045:INFO:radosgw_agent.worker:finished processing shard 36
2014-02-04T08:16:28.936 14145:WARNING:radosgw_agent.worker:error locking shard 36 log,  skipping for now. Traceback:
2014-02-04T08:16:28.939 12730:ERROR:radosgw_agent.sync:error syncing shard 36
2014-02-04T08:23:50.318 15231:INFO:radosgw_agent.worker:finished processing shard 36
2014-02-04T08:24:05.970 15288:INFO:radosgw_agent.worker:bucket instance "live-2:us-west-1.35026898.2" has 1000 entries after "00000118880.319057.3"
2014-02-04T08:42:20.351 15288:INFO:radosgw_agent.worker:finished processing shard 36
2014-02-04T08:48:36.509 24250:INFO:radosgw_agent.worker:finished processing shard 36
2014-02-04T08:48:53.145 24280:INFO:radosgw_agent.worker:bucket instance "live-2:us-west-1.35026898.2" has 1000 entries after "00000119879.320127.2"
2014-02-04T08:57:22.429 24280:INFO:radosgw_agent.worker:finished processing shard 36
2014-02-04T09:03:35.292 23586:INFO:radosgw_agent.worker:finished processing shard 36
2014-02-04T09:03:53.561 23744:INFO:radosgw_agent.worker:bucket instance "live-2:us-west-1.35026898.2" has 1000 entries after "00000120878.321183.3"
2014-02-04T09:14:36.249 23744:INFO:radosgw_agent.worker:finished processing shard 36
2014-02-04T09:20:15.250 30093:INFO:radosgw_agent.worker:finished processing shard 36
2014-02-04T09:20:31.925 30330:INFO:radosgw_agent.worker:bucket instance "live-2:us-west-1.35026898.2" has 1000 entries after "00000121877.322255.2"
2014-02-04T09:26:46.652 30330:INFO:radosgw_agent.worker:finished processing shard 36
2014-02-04T09:32:57.308 20145:INFO:radosgw_agent.worker:finished processing shard 36
2014-02-04T09:33:13.897 20215:INFO:radosgw_agent.worker:bucket instance "live-2:us-west-1.35026898.2" has 1000 entries after "00000122876.323275.3"
2014-02-04T09:43:05.327 20215:INFO:radosgw_agent.worker:finished processing shard 36
2014-02-04T09:49:20.255 25443:INFO:radosgw_agent.worker:finished processing shard 36
2014-02-04T09:49:35.869 25479:INFO:radosgw_agent.worker:bucket instance "live-2:us-west-1.35026898.2" has 1000 entries after "00000123875.324352.2"
2014-02-04T09:57:12.177 25479:INFO:radosgw_agent.worker:finished processing shard 36
2014-02-04T10:03:55.676 23373:INFO:radosgw_agent.worker:finished processing shard 36
2014-02-04T10:04:11.318 23450:INFO:radosgw_agent.worker:bucket instance "live-2:us-west-1.35026898.2" has 1000 entries after "00000124874.325371.3"
2014-02-04T10:10:00.548 23450:INFO:radosgw_agent.worker:finished processing shard 36
2014-02-04T13:29:05.528 28131:INFO:radosgw_agent.worker:finished processing shard 36
2014-02-04T13:29:36.329 28219:INFO:radosgw_agent.worker:bucket instance "live-2:us-west-1.35026898.2" has 1000 entries after "00000125873.326393.2"
2014-02-04T13:35:25.659 28219:INFO:radosgw_agent.worker:finished processing shard 36
2014-02-04T13:40:56.360 14609:INFO:radosgw_agent.worker:finished processing shard 36
2014-02-04T13:41:12.087 14679:INFO:radosgw_agent.worker:bucket instance "live-2:us-west-1.35026898.2" has 1000 entries after "00000126872.327440.3"
2014-02-04T13:48:23.826 14679:INFO:radosgw_agent.worker:finished processing shard 36
2014-02-04T13:56:18.406 15364:INFO:radosgw_agent.worker:finished processing shard 36
2014-02-04T13:56:34.125 15578:INFO:radosgw_agent.worker:bucket instance "live-2:us-west-1.35026898.2" has 1000 entries after "00000127871.328492.2"
2014-02-04T14:05:30.358 15578:INFO:radosgw_agent.worker:finished processing shard 36
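
To put numbers on the pacing, here is a rough Python sketch that pairs up consecutive "has 1000 entries after" lines and reports the elapsed time and how far the marker moved. Reading the first dotted field of the marker as a sequence number is my assumption, not something confirmed. The three sample lines are copied from the output above.

```python
import re
from datetime import datetime

BATCH_RE = re.compile(
    r'^(?P<ts>\S+) \d+:INFO:radosgw_agent\.worker:bucket instance "[^"]+" '
    r'has \d+ entries after "(?P<seq>\d+)\.'
)


def batch_steps(lines):
    """Return [(seconds_elapsed, marker_delta), ...] between consecutive batch lines.

    Assumes the first dotted field of the marker is an increasing sequence number.
    """
    points = []
    for line in lines:
        m = BATCH_RE.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
            points.append((ts, int(m.group("seq"))))
    return [
        ((t2 - t1).total_seconds(), s2 - s1)
        for (t1, s1), (t2, s2) in zip(points, points[1:])
    ]


sample = [
    '2014-02-04T08:24:05.970 15288:INFO:radosgw_agent.worker:bucket instance "live-2:us-west-1.35026898.2" has 1000 entries after "00000118880.319057.3"',
    '2014-02-04T08:42:20.351 15288:INFO:radosgw_agent.worker:finished processing shard 36',
    '2014-02-04T08:48:53.145 24280:INFO:radosgw_agent.worker:bucket instance "live-2:us-west-1.35026898.2" has 1000 entries after "00000119879.320127.2"',
    '2014-02-04T09:03:53.561 23744:INFO:radosgw_agent.worker:bucket instance "live-2:us-west-1.35026898.2" has 1000 entries after "00000120878.321183.3"',
]

steps = batch_steps(sample)
for seconds, delta in steps:
    print("%.0f s between batches, marker advanced by %d" % (seconds, delta))
```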



