Re: radosgw multizone not syncing large bucket completely to other zone

On Sun, Jun 24, 2018 at 12:59 AM, Enrico Kern <enrico.kern@xxxxxxxxxxxxxxx> wrote:
Hello,

We have two Ceph Luminous clusters (12.2.5).

Recently one of our big buckets stopped syncing properly. We have one specific bucket of around 30 TB, consisting of a lot of directories, each holding files of 10-20 MB.

The secondary zone is often completely missing multiple days of data in this bucket, while all the other, smaller buckets sync just fine.

Even when that data is missing entirely, radosgw-admin sync status always reports that everything is fine.

The sync error log doesn't show anything for those days.

Running radosgw-admin metadata sync and data sync doesn't solve the issue either. The only way to make it sync again is to disable and re-enable the sync. That has to be done as often as 10 times an hour to make it sync properly.

radosgw-admin bucket sync disable
radosgw-admin bucket sync enable
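
For reference, the disable/enable workaround above could be wrapped in a small helper. This is only a sketch: the function name `toggle_bucket_sync` and the `RGW_ADMIN` variable are made up for illustration, and it assumes the bucket is selected with `--bucket`.

```shell
# Command to invoke; override RGW_ADMIN to point at a different binary.
RGW_ADMIN="${RGW_ADMIN:-radosgw-admin}"

# Hypothetical helper: disable and then re-enable sync for one bucket.
# $1: bucket name (placeholder, not a name taken from this thread)
toggle_bucket_sync() {
    # Stop replication for this bucket; give up if that already fails.
    "$RGW_ADMIN" bucket sync disable --bucket="$1" || return 1
    # Turn replication back on so the other zone picks the bucket up again.
    "$RGW_ADMIN" bucket sync enable --bucket="$1"
}
```

It would be called as `toggle_bucket_sync <bucket name>` on a host where radosgw-admin can reach the cluster.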

When I run data sync init I sometimes get this:

 radosgw-admin data sync init --source-zone berlin
2018-06-24 07:55:46.337858 7fe7557fa700  0 ERROR: failed to distribute cache for amsterdam.rgw.log:datalog.sync-status.6a9448d2-bdba-4bec-aad6-aba72cd8eac6

Sometimes, when a really large amount of data is missing (yesterday it was more than 1 month), running this on the secondary zone helps get the zones back in sync:

radosgw-admin bucket check --fix --check-objects 

How can I debug this problem further? We have so many requests on the cluster that it is hard to dig anything out of the log files.

Given that all the smaller buckets are perfectly in sync, I suspect the problem is related to the size of the bucket.

How many objects in the bucket? Is it getting automatically resharded?
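
A rough sketch of how both questions could be checked from the command line. The helper names and the `RGW_ADMIN` variable are hypothetical; the sketch assumes that `bucket stats` prints JSON containing `num_objects` fields and that `reshard list` shows the buckets currently queued for resharding.

```shell
# Command to invoke; override RGW_ADMIN to point at a different binary.
RGW_ADMIN="${RGW_ADMIN:-radosgw-admin}"

# Hypothetical helper: print the object-count lines from the bucket stats.
# $1: bucket name (placeholder)
bucket_object_count() {
    "$RGW_ADMIN" bucket stats --bucket="$1" | grep '"num_objects"'
}

# Hypothetical helper: list buckets that are queued for resharding.
pending_reshards() {
    "$RGW_ADMIN" reshard list
}
```

Running `bucket_object_count <bucket name>` in both zones and comparing the numbers gives a quick idea of how far the secondary is behind.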
 

Any pointers in the right direction are greatly appreciated.

A few things to look at that might help identify the issue.

What does this show (I think the luminous command is as follows):

$ radosgw-admin bucket sync status --source-zone=<zone>

You can try manually syncing the bucket, and get specific logs for that operation:

$ radosgw-admin bucket sync run --source-zone=<zone> --debug-rgw=20 --debug-ms=1
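
Since the cluster logs are too busy to search, one option is to capture the output of that single run in its own file. A minimal sketch, with hypothetical helper names and an `RGW_ADMIN` variable that is not part of any tool; it assumes the bucket is selected with `--bucket` and that the debug output arrives on stdout or stderr.

```shell
# Command to invoke; override RGW_ADMIN to point at a different binary.
RGW_ADMIN="${RGW_ADMIN:-radosgw-admin}"

# Hypothetical helper: run a one-off sync of a single bucket and keep
# the verbose output in a file.
# $1: bucket name, $2: source zone, $3: log file path (all placeholders)
bucket_sync_run_logged() {
    "$RGW_ADMIN" bucket sync run --bucket="$1" --source-zone="$2" \
        --debug-rgw=20 --debug-ms=1 > "$3" 2>&1
}

# Hypothetical helper: show only the lines that mention an error or failure.
# $1: log file written by bucket_sync_run_logged
show_sync_errors() {
    grep -iE 'error|fail' "$1"
}
```

For example, `bucket_sync_run_logged <bucket name> <zone> /tmp/bucket-sync.log` followed by `show_sync_errors /tmp/bucket-sync.log`.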

And you can try getting more info from the sync trace module:

$ ceph --admin-daemon <path to radosgw admin socket> sync trace history <bucket name>

You can also try the 'sync trace show' command.
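
Both sync trace commands could be wrapped as follows. This is a sketch only: the function names and the `CEPH` variable are hypothetical, and the admin socket path has to be filled in for the local radosgw instance.

```shell
# Command to invoke; override CEPH to point at a different binary.
CEPH="${CEPH:-ceph}"

# Hypothetical helper: show the current sync trace entries.
# $1: path to the radosgw admin socket (placeholder)
sync_trace_show() {
    "$CEPH" --admin-daemon "$1" sync trace show
}

# Hypothetical helper: show the sync trace history, optionally filtered.
# $1: path to the radosgw admin socket, $2: optional search term (e.g. a bucket name)
sync_trace_history() {
    "$CEPH" --admin-daemon "$1" sync trace history ${2:+"$2"}
}
```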


Yehuda

 

Regards,

Enrico

--

Enrico Kern
VP IT Operations


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

