RGW multisite slowness issue due to the "304 Not Modified" responses on primary zone

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,
We have 2 clusters (v18.2.1) primarily used for RGW which has over 2+ billion RGW objects. They are also in multisite configuration totaling to 2 zones and we've got around 2 Gbps of bandwidth dedicated (P2P) for the multisite traffic. We see that using "radosgw-admin sync status" on the zone 2, all the 128 shards are recovering and unfortunately there is very less data transfer from primary zone ie., the link utilization is barely 100 Mbps / 2 Gbps. Our objects are quite small as well like avg. of 1 MB in size. 
On further inspection, we noticed the rgw access the logs at primary site are mostly yielding "304 Not Modified" for RGWs at site-2. Is this expected? Here are some of the logs (information is redacted)

root@host-04:~# tail -f /var/log/haproxy-msync.log
Feb 12 05:06:51 host-04 haproxy[971171]: 10.1.85.14:33730 [12/Feb/2024:05:06:51.047] https~ backend/host-04-msync 0/0/0/2/2 304 143 - - ---- 56/55/1/0/0 0/0 "GET /bucket1/object1.jpg?rgwx-zonegroup=71dceb3d-3092-4dc6-897f-a9abf60c9972&rgwx-prepend-metadata=true&rgwx-sync-manifest&rgwx-sync-cloudtiered&rgwx-skip-decrypt&rgwx-if-not-replicated-to=a8204ce2-b69e-4d90-bca1-93edd05a1a29%3Abucket1%3A8b96aea5-c763-40a3-8430-efd67cff0c62.20010.7 HTTP/1.1"
Feb 12 05:06:51 host-04 haproxy[971171]: 10.1.85.14:59730 [12/Feb/2024:05:06:51.048] https~ backend/host-04-msync 0/0/0/2/2 304 143 - - ---- 56/55/3/1/0 0/0 "GET /bucket1/object91.jpg?rgwx-zonegroup=71dceb3d-3092-4dc6-897f-a9abf60c9972&rgwx-prepend-metadata=true&rgwx-sync-manifest&rgwx-sync-cloudtiered&rgwx-skip-decrypt&rgwx-if-not-replicated-to=a8204ce2-b69e-4d90-bca1-93edd05a1a29%3Abucket1%3A8b96aea5-c763-40a3-8430-efd67cff0c62.20010.7 HTTP/1.1"

We also took a look at our Grafana instance and out of 1000 requests / second, 200 requests are getting  "200 OK" and 800 requests are getting "304 Not Modified". Sync threads are run on only 2 rgw daemons per zone and are behind a Load Balancer. "# radosgw-admin sync error list" also contains around 20 errors which are mostly automatically recoverable.
As we understand, does it mean that RGW multisite sync logs in the log pool are yet to be generated or some sort? Please provide us some insights and let us know how to resolve this.

Thanks,
Praveen
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux