Sorry for posting this if it has been addressed already; I am not sure how to search through old ceph-users mailing list posts. I used to use gmane.org, but that seems to be down.
My setup:
I have a moderately sized ceph cluster (ceph hammer 0.94.9 - fe6d859066244b97b24f09d46552afc2071e6f90). The cluster nodes run Ubuntu, but the gateways run CentOS 7 due to an odd memory issue we hit across all of our gateways.
Outside of that the cluster is pretty standard and healthy:
[root@kh11-9 ~]# ceph -s
cluster XXX-XXX-XXX-XXX
health HEALTH_OK
monmap e4: 3 mons at {kh11-8=X.X.X.X:6789/0,kh12-8=X.X.X.X:6789/0,kh13-8=X.X.X.X:6789/0}
election epoch 150, quorum 0,1,2 kh11-8,kh12-8,kh13-8
osdmap e69678: 627 osds: 627 up, 627 in
Here is my radosgw config in ceph.conf:
[client.rgw.kh09-10]
log_file = /var/log/radosgw/client.radosgw.log
rgw_frontends = "civetweb port=80 access_log_file=/var/log/radosgw/rgw.access error_log_file=/var/log/radosgw/rgw.error"
rgw_enable_ops_log = true
rgw_ops_log_rados = true
rgw_thread_pool_size = 1000
rgw_override_bucket_index_max_shards = 23
error_log_file = /var/log/radosgw/civetweb.error.log
access_log_file = /var/log/radosgw/civetweb.access.log
objecter_inflight_op_bytes = 1073741824
objecter_inflight_ops = 20480
ms_dispatch_throttle_bytes = 209715200
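For reference, the values the running gateway actually picked up can be double-checked over the admin socket, something like this (the socket name below is a guess based on the client name above; check /var/run/ceph/ for the actual .asok file):
# socket name is a guess based on [client.rgw.kh09-10] -- ls /var/run/ceph/ for the real .asok
ceph --admin-daemon /var/run/ceph/ceph-client.rgw.kh09-10.asok config show \
    | grep -E 'rgw_thread_pool_size|rgw_override_bucket_index_max_shards|objecter_inflight'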
The gateways are sitting behind haproxy for ssl termination. Here is my haproxy config:
global
log /dev/log local0
log /dev/log local1 notice
chroot /var/lib/haproxy
stats socket /var/lib/haproxy/admin.sock mode 660 level admin
stats timeout 30s
user haproxy
group haproxy
daemon
ca-base /etc/ssl/certs
crt-base /etc/ssl/private
tune.ssl.default-dh-param 2048
tune.ssl.maxrecord 2048
ssl-default-bind-ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256
ssl-default-bind-options no-sslv3 no-tlsv10 no-tlsv11 no-tls-tickets
ssl-default-server-ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256
ssl-default-server-options no-sslv3 no-tlsv10 no-tlsv11 no-tls-tickets
defaults
log global
mode http
option httplog
option dontlognull
timeout connect 5000
timeout client 50000
timeout server 50000
errorfile 400 /etc/haproxy/errors/400.http
errorfile 403 /etc/haproxy/errors/403.http
errorfile 408 /etc/haproxy/errors/408.http
errorfile 500 /etc/haproxy/errors/500.http
errorfile 502 /etc/haproxy/errors/502.http
errorfile 503 /etc/haproxy/errors/503.http
errorfile 504 /etc/haproxy/errors/504.http
option forwardfor
option http-server-close
frontend fourfourthree
bind :443 ssl crt /etc/ssl/STAR.opensciencedatacloud.org.pem
reqadd X-Forwarded-Proto:\ https
default_backend radosgw
backend radosgw
cookie RADOSGWLB insert indirect nocache
server primary 127.0.0.1:80 check cookie primary
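To rule haproxy out, the same GET can be pointed at civetweb directly on the loopback (BUCKET/OBJECT below are placeholders, and an unsigned request against a private object will come back 403 rather than 200, but it at least shows what civetweb returns without the proxy in the path):
# unsigned probe straight at civetweb on the gateway; expect 403 for a private object,
# anything in the 5xx range points at radosgw rather than haproxy
curl -s -o /dev/null -w '%{http_code}\n' 'http://127.0.0.1:80/BUCKET/OBJECT'
# same request through the haproxy frontend for comparison (hostname is a placeholder)
curl -s -o /dev/null -w '%{http_code}\n' 'https://gateway.example.org/BUCKET/OBJECT'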
--------------------
I am seeing sporadic 500 errors in my access logs on all of my radosgws:
/var/log/radosgw/client.radosgw.log-2017-01-13 11:30:41.635645 7feacf6c6700 0 RGWObjManifest::operator++(): result: ofs=12607029248 stripe_ofs=12607029248 part_ofs=12598640640 rule->part_size=15728640
/var/log/radosgw/client.radosgw.log-2017-01-13 11:30:41.637559 7feacf6c6700 0 RGWObjManifest::operator++(): result: ofs=12611223552 stripe_ofs=12611223552 part_ofs=12598640640 rule->part_size=15728640
/var/log/radosgw/client.radosgw.log-2017-01-13 11:30:41.642630 7feacf6c6700 0 RGWObjManifest::operator++(): result: ofs=12614369280 stripe_ofs=12614369280 part_ofs=12614369280 rule->part_size=15728640
/var/log/radosgw/client.radosgw.log-2017-01-13 11:30:41.644368 7feadf6e6700 1 ====== req done req=0x7fed00053a50 http_status=500 ======
/var/log/radosgw/client.radosgw.log:2017-01-13 11:30:41.644475 7feadf6e6700 1 civetweb: 0x7fed00009340: 10.64.0.124 - - [13/Jan/2017:11:28:24 -0600] "GET /BUCKET/306d4fe1-1515-44e0-b527-eee0e83412bf/306d4fe1-1515-44e0-b527-eee0e83412bf_gdc_realn_rehead.bam HTTP/1.1" 500 0 - Boto/2.36.0 Python/2.7.6 Linux/3.13.0-95-generic
/var/log/radosgw/client.radosgw.log-2017-01-13 11:30:41.645611 7feacf6c6700 0 RGWObjManifest::operator++(): result: ofs=12618563584 stripe_ofs=12618563584 part_ofs=12614369280 rule->part_size=15728640
/var/log/radosgw/client.radosgw.log-2017-01-13 11:30:41.647998 7feacf6c6700 0 RGWObjManifest::operator++(): result: ofs=12622757888 stripe_ofs=12622757888 part_ofs=12614369280 rule->part_size=15728640
/var/log/radosgw/client.radosgw.log-2017-01-13 11:30:41.650262 7feacf6c6700 0 RGWObjManifest::operator++(): result: ofs=12626952192 stripe_ofs=12626952192 part_ofs=12614369280 rule->part_size=15728640
/var/log/radosgw/client.radosgw.log-2017-01-13 11:30:41.656394 7feacf6c6700 0 RGWObjManifest::operator++(): result: ofs=12630097920 stripe_ofs=12630097920 part_ofs=12630097920 rule->part_size=15728640
I am able to download that file just fine locally using boto, but I have heard from some users that the download occasionally hangs indefinitely. As far as I can tell the cluster has been healthy for the entire period (graphite shows HEALTH_OK throughout). I am not sure why this is happening or how to troubleshoot it further. rgw is clearly returning a 500, which to me points at an underlying issue with ceph or the rgw server itself, yet all of my own boto downloads complete successfully. Is there anything I can do to figure out where the 500 is coming from and troubleshoot it further?
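One thing I was planning to try is temporarily turning up rgw debugging on a single gateway via the admin socket and waiting to catch one of these requests, roughly like this (again, the socket name is a guess; check /var/run/ceph/):
# raise rgw verbosity temporarily on one gateway (socket name is a guess -- check /var/run/ceph/)
ceph --admin-daemon /var/run/ceph/ceph-client.rgw.kh09-10.asok config set debug_rgw 20
ceph --admin-daemon /var/run/ceph/ceph-client.rgw.kh09-10.asok config set debug_ms 1
# after the next 500, pull every line for that request using the req pointer from the "req done" line
grep 'req=0x7fed00053a50' /var/log/radosgw/client.radosgw.log
# then drop the levels back down
ceph --admin-daemon /var/run/ceph/ceph-client.rgw.kh09-10.asok config set debug_rgw 0
ceph --admin-daemon /var/run/ceph/ceph-client.rgw.kh09-10.asok config set debug_ms 0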
- Sean