Hi all,

We are trying to cope with radosgw crashing every 5-15 minutes. This seems to be getting worse, but we are unable to determine the cause. There is nothing in the logs, as it appears to be a radosgw hang: the port is open and accepts a connection, but there is no response to a HEAD, GET, or any other request. We are unsure where to go from here.

Our setup:

- HAProxy runs on a server with dual 10G connections. It also does SSL offload for the gateways.
- The gateways use civetweb.
- obj01/02 run on physical hardware.
- We have tried running 4 instances on the same machine. The machine can cope, but the instances still crash.
- We are running 0.94-1337-gce175f3-1, which is https://github.com/ceph/ceph/tree/wip-rgw-orphans/src/rgw

Attached is the data via the load balancer for the last week. As you can see, it is close to 500-900MB/s most of the time.

Attachment: Screen Shot 2015-05-26 at 12.38.09 pm.png (image/png, 34493 bytes)
<http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20150526/69ec3620/attachment.png>

The gateway config:

    [client.radosgw.ceph-obj02]
    host = ceph-obj02
    keyring = /etc/ceph/keyring.radosgw.ceph-obj02
    rgw socket path = /tmp/radosgw.sock
    log file = /var/log/ceph/radosgw.log
    rgw data = /var/lib/ceph/radosgw/ceph-obj02
    rgw thread pool size = 1024
    rgw print continue = False
    rgw enable ops log = False
    log to stderr = False
    rgw enable usage log = False

Anyone have any thoughts? Is this just a pure capacity/performance issue with civetweb, and do I need to run more threads/gateways?
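For anyone wanting to catch the symptom described above (port open, connection accepted, no answer to a HEAD), here is a minimal probe sketch in Python. It is not part of the original setup: the function name `probe_gateway`, the 5 second timeout, and the result labels are all assumptions chosen for illustration. It only separates "connection refused" (process gone) from "connected but silent" (process hung), which might help decide whether this is a crash or a stall.

```python
import socket


def probe_gateway(host, port, timeout=5.0):
    """Send a bare HEAD request and classify what comes back.

    Returns one of: "refused", "unreachable", "hung", "reset",
    "closed", "garbage", "ok". The labels are illustrative only.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        # Nothing is listening: the process has most likely exited.
        return "refused"
    except OSError:
        # Connect timed out, host unreachable, name lookup failed, etc.
        return "unreachable"

    with sock:
        sock.settimeout(timeout)
        request = (
            f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        ).encode("ascii")
        try:
            sock.sendall(request)
            data = sock.recv(1024)
        except socket.timeout:
            # TCP connect worked but no bytes came back in time:
            # this matches the "accepts a connect, no response" symptom.
            return "hung"
        except OSError:
            return "reset"

    if not data:
        return "closed"  # peer closed without sending anything
    return "ok" if data.startswith(b"HTTP/") else "garbage"
```

Run from cron or a loop against each gateway directly (bypassing HAProxy), for example `probe_gateway("ceph-obj02", 7480)`, where 7480 is only a placeholder for whatever port the gateway actually listens on. A run of "hung" results just before HAProxy marks a backend down would point at a stall inside radosgw rather than a process exit.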