Hello,
We have three RGW servers set up with five OSDs. One application does fairly steady writes, and that application and several others also do a lot of reads.
Over the last week or so, the application doing the writing has been randomly getting blocked connections. In the RGW logs we can see radosgw restarting, but even with debug logging enabled we are not getting any information about why.
Here are the relevant logs. Any pointers on where to look would be greatly appreciated. We are somewhat at a loss, as nothing changed on either side before this started.
2016-11-15 03:34:04.565162 7ff5dbfff700 1 ====== req done req=0x7ff5dbff9710 op status=0 http_status=200 ======
2016-11-15 03:34:04.568338 7ff5dbfff700 1 civetweb: 0x7ff5cc0008c0: 10.247.176.69 - - [15/Nov/2016:03:34:04 -0500] "GET / HTTP/1.0" 200 0 - -
2016-11-15 03:34:04.582781 7ff558ff9700 1 ====== req done req=0x7ff558ff3710 op status=0 http_status=200 ======
2016-11-15 03:34:04.593918 7ff558ff9700 1 civetweb: 0x7ff5340008c0: 10.247.176.66 - - [15/Nov/2016:03:34:04 -0500] "GET / HTTP/1.0" 200 0 - -
2016-11-15 03:34:10.973015 7fe7f9c309c0 0 deferred set uid:gid to 167:167 (ceph:ceph)
2016-11-15 03:34:10.973038 7fe7f9c309c0 0 ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b), process radosgw, pid 9350
2016-11-15 03:34:11.197719 7fe7f9c309c0 0 error in read_id for id : (2) No such file or directory
2016-11-15 03:34:11.209898 7fe7f9c309c0 0 error in read_id for id : (2) No such file or directory
2016-11-15 03:34:11.542444 7fe7f9c309c0 0 starting handler: civetweb
2016-11-15 03:34:11.543580 7fe7d7fff700 0 -- 10.247.179.50:0/1174194833 submit_message mon_subscribe({osdmap=508}) v2 remote, 10.247.179.50:6789/0, failed lossy con, dropping message 0x7fe7c4012be0
2016-11-15 03:34:11.543721 7fe7d7fff700 0 monclient: hunting for new mon
2016-11-15 03:34:11.712692 7fe6f8ff9700 1 ====== starting new request req=0x7fe6f8ff3710 =====
2016-11-15 03:34:11.774729 7fe6e7fff700 1 ====== starting new request req=0x7fe6e7ff9710 =====
2016-11-15 03:34:11.920085 7fe6e77fe700 1 ====== starting new request req=0x7fe6e77f8710 =====
2016-11-15 03:34:11.992696 7fe6f8ff9700 1 ====== req done req=0x7fe6f8ff3710 op status=0 http_status=200 ======
2016-11-15 03:34:11.992731 7fe6f8ff9700 1 civetweb: 0x7fe7b4002e90: 10.246.179.210 - - [15/Nov/2016:03:34:11 -0500] "GET /admin/log HTTP/1.1" 200 0 - -
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com