Re: RGW Replication

On Tue, Feb 4, 2014 at 5:06 PM, Craig Lewis <clewis@xxxxxxxxxxxxxxxxxx> wrote:
>
>
> On 2/4/14 14:43 , Yehuda Sadeh wrote:
>
> Now that objects are missing in the slave, how do I fix it?  radosgw-agent
> --sync-scope=full ?
>
> That would do it, yes.
>
>
> I'm hesitant to do this, at least until I understand what's going on better.  I know something is wrong, but I don't know what is wrong.
> I want to solve that before using a --sync-scope=full.  Otherwise it'll just happen again next time I start importing data.
>
> I'm going to shut down replication cleanly and leave it off.  I'll import enough objects to exceed 1000 entries, then start replication again with --verbose.  Then I'll check whether all the imported objects exist in both clusters, and repeat until I find missing objects in the slave cluster.
>
>
>
>
> A shard was locked by the agent, but the agent never unlocked it
> (maybe because you took it down?).  The lock itself has a timeout, so
> it's supposed to get released after a while, and then processing
> should resume as usual. However, when it happens you can try playing
> with the rados lock commands (rados lock list, rados lock info, rados
> lock break) to release it (as long as there's no agent running that
> has locked the shard).
>
>
> The rados lock command requires an object name.  I'll see if I can figure out how to map "shard 36" to a rados object in the .rgw.buckets pool.
>
> Thanks!
>
> Does it ever catch up? You mentioned before that most of the writes
> went to the same two buckets, so that's probably one of them. Note
> that writes to the same bucket are being handled in-order by the
> agent.
>
> Yehuda
>
>
> ... I think so.  This is what my graph looks like:
>
>
>
> Being able to answer that question is really what this graph is about.  If you know of a generic way to answer it, I'm open to suggestions.  If you'd like to see what I'm doing, take a look at https://github.com/ceph/radosgw-agent/pull/7
>
> Now that I've started seeing missing objects, I'm not able to download objects that should be on the slave if replication is up to date.  Either it's not up to date, or it's skipping objects every pass.
>
> I'm trying to get the radosgw-agent --verbose output I mentioned above, but this question is more fundamental.  If I don't know if it's up to date or not, looking for missing objects isn't going to do me any good.  I'll work on this now, and get back to the other experiment later.


You can run

$ radosgw-admin bilog list --bucket=<bucket> --marker=<id>

E.g.,

$ radosgw-admin bilog list --bucket=live-2 --marker=00000127871.328492.2

The entries there should have timestamp info.
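
As a rough illustration of using those timestamps to answer the "is it caught up?" question, here is a minimal Python sketch. It assumes the command prints a JSON array, that each entry carries a "timestamp" field formatted like "2014-02-04 22:43:12.000000Z", and that the entries listed are the ones after the given marker. The function names parse_ts and pending_lag are hypothetical helpers, not part of radosgw-admin or radosgw-agent, so check the field name and format against your own output first.

```python
import json
from datetime import datetime, timezone


def parse_ts(value):
    """Parse a timestamp such as '2014-02-04 22:43:12.000000Z' as UTC.

    The format is an assumption about the bilog output, not a documented
    contract; adjust it if your entries look different.
    """
    text = value.strip().rstrip("Z").replace("T", " ")
    head, _, frac = text.partition(".")
    # Pad or truncate the fractional part to microseconds.
    micro = int((frac + "000000")[:6]) if frac else 0
    base = datetime.strptime(head, "%Y-%m-%d %H:%M:%S")
    return base.replace(microsecond=micro, tzinfo=timezone.utc)


def pending_lag(bilog_json, now=None):
    """Return (entry_count, lag_seconds) for one bilog listing.

    lag_seconds is the age of the oldest listed entry, or 0.0 when the
    listing is empty (nothing pending after the marker).
    """
    entries = json.loads(bilog_json)
    # "timestamp" is the assumed field name; entries without it are skipped.
    stamps = [parse_ts(e["timestamp"]) for e in entries if "timestamp" in e]
    if not stamps:
        return 0, 0.0
    if now is None:
        now = datetime.now(timezone.utc)
    return len(stamps), (now - min(stamps)).total_seconds()
```

One way to use it is to save the output of the bilog list command above to a file and pass the file's contents to pending_lag. An empty listing suggests the bucket has nothing pending after that marker, while a lag that keeps growing suggests the agent is falling behind on that bucket.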

Yehuda