Hi Ryan,
Sorry to hear about the crashes. Since it's happening on the source zone,
I'm guessing you're hitting this infinite loop that leads to OOM:
http://tracker.ceph.com/issues/20386. The jewel backport for this one is
still pending, so I've raised its priority to Urgent. I'm afraid there
isn't a workaround here - the infinite loop reproduces once the
'data changes log' grows above 1000 entries.
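If it helps to confirm you're in that state, a rough sketch like the one
below could be run against the source zone; it just shells out to
'radosgw-admin datalog list' and counts the entries. Note that the command
may cap how many entries it returns per call, and the 1000-entry figure is
only the threshold described in the tracker issue, not a tunable:

#!/usr/bin/env python3
# Rough sketch, not an official tool: count entries currently in the RGW
# 'data changes log' by shelling out to radosgw-admin. Assumes the command
# is on PATH and that 'datalog list' prints a JSON array of entries; the
# command may cap how many entries it returns in a single call.
import json
import subprocess

SUSPECT_THRESHOLD = 1000  # threshold described in tracker issue 20386

def datalog_entry_count():
    out = subprocess.check_output(["radosgw-admin", "datalog", "list"])
    return len(json.loads(out))

if __name__ == "__main__":
    count = datalog_entry_count()
    print("data changes log entries: %d" % count)
    if count >= SUSPECT_THRESHOLD:
        print("at or above %d entries - likely to hit the loop in issue 20386"
              % SUSPECT_THRESHOLD)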
Casey
On 07/26/2017 11:05 AM, Ryan Leimenstoll wrote:
Hi all,
We are currently trying to migrate our RGW Object Storage service from one zone to another (in the same zonegroup), in part to make use of erasure-coded data pools. However, the rgw daemon on the origin host serving the original zone (and thus the current production data) is reliably being OOM-killed due to high rgw memory usage. We are willing to consider adding more memory to the rgw daemon's hosts to solve this problem, but were wondering what memory usage would be expected (at least as a rule of thumb). I noticed there were a few memory-related rgw sync fixes in 10.2.9, but so far upgrading hasn't seemed to prevent the crashes.
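To put numbers on the growth before the OOM kill, something like the minimal sketch below can be used to poll the daemon's resident memory from /proc (it assumes a Linux /proc filesystem and that the daemon's process name is 'radosgw'):

#!/usr/bin/env python3
# Minimal sketch: poll resident memory (VmRSS) of any process whose comm is
# 'radosgw' so the growth leading up to the OOM kill can be graphed.
# Assumes a Linux /proc filesystem; adjust PROC_NAME if the daemon runs
# under a different name.
import os
import time

PROC_NAME = "radosgw"

def rss_kb_by_pid():
    """Return {pid: VmRSS in kB} for all radosgw processes."""
    result = {}
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open("/proc/%s/comm" % pid) as f:
                if f.read().strip() != PROC_NAME:
                    continue
            with open("/proc/%s/status" % pid) as f:
                for line in f:
                    if line.startswith("VmRSS:"):
                        result[int(pid)] = int(line.split()[1])
        except (FileNotFoundError, PermissionError):
            continue  # process exited or is unreadable; skip it
    return result

if __name__ == "__main__":
    while True:
        for pid, rss in sorted(rss_kb_by_pid().items()):
            print("%s pid=%d rss=%d kB" % (time.strftime("%H:%M:%S"), pid, rss))
        time.sleep(60)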
Some details about our cluster:
Ceph Version: 10.2.9
OS: RHEL 7.3
584 OSDs
Serving RBD, CephFS, and RGW
RGW Origin Hosts:
Virtualized via KVM/QEMU, RHEL 7.3
Memory: 32GB
CPU: 12 virtual cores (Hypervisor processors: Intel E5-2630)
First zone data and index pools:
pool name           KB            objects   clones  degraded  unfound  rd          rd KB         wr         wr KB
.rgw.buckets        112190858231  34239746  0       0         0        2713542251  265848150719  475841837  153970795085
.rgw.buckets.index  0             4972      0       0         0        3721485483  5926323574    36030098   0
Thanks,
Ryan Leimenstoll
University of Maryland Institute for Advanced Computer Studies
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com