Re: Multi-Site Cluster RGW Sync issues

Matthew H <matthew.heler@xxxxxxxxxxx> · Thu, 28 Feb 2019 05:02:37 +0000

Hey Ben,

Could you include the following?

radosgw-admin mdlog list

Thanks,

From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Benjamin.Zieglmeier <Benjamin.Zieglmeier@xxxxxxxxxx>

Sent: Tuesday, February 26, 2019 9:33 AM

To: ceph-users@xxxxxxxxxxxxxx

Subject: Re: Multi-Site Cluster RGW Sync issues

Hello,

We have a two-zone multisite Luminous 12.2.5 cluster. The cluster has been running for about a year and holds only ~140G of data (~350k objects). We recently added a third zone to the zonegroup to facilitate a migration out of an existing site. Sync appears to be working, and running `radosgw-admin sync status` and `radosgw-admin sync status --rgw-zone=<new zone name>` reflects the same. The problem is that once data replication completes, the radosgw process on one of the rgws serving the new zone consumes all the CPU, and the rgw log is flooded with "ERROR: failed to read mdlog info with (2) No such file or directory" at a rate of 1000 log entries/sec.

This has been happening for days on end now, and we are concerned about what is going on between these two zones. Logs are constantly filling up on the rgws and we are out of ideas. Are they trying to catch up on metadata? After extensive searching and racking our brains, we are unable to figure out what is causing all these requests (and errors) between the two zones.
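
To put a number on the log flood, a minimal sketch like the one below could tally the mdlog error per second from an rgw log. It assumes each log line begins with a `YYYY-MM-DD HH:MM:SS` timestamp (the first 19 characters), which may not match every logging setup; the function name is made up for illustration and is not part of any Ceph tooling.

```python
from collections import Counter

# The error text quoted in the message above.
MDLOG_ERROR = "ERROR: failed to read mdlog info with (2) No such file or directory"


def count_mdlog_errors_per_second(lines):
    """Return a Counter mapping a per-second timestamp to the number of
    lines in that second that contain the mdlog error.

    `lines` is any iterable of log lines, for example an open file.
    """
    per_second = Counter()
    for line in lines:
        if MDLOG_ERROR in line:
            # Assumption: the first 19 characters are the timestamp,
            # truncated to whole seconds ("YYYY-MM-DD HH:MM:SS").
            per_second[line[:19]] += 1
    return per_second
```

Passing an open log file to the function (for example `count_mdlog_errors_per_second(open(path))`, with `path` pointing at the rgw log) and printing the sorted items would show whether the rate is steady or comes in bursts, which may help narrow down what is driving the requests.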

Thanks,

Ben
