Yikes, I think this was actually the problem. Never mind.
# ulimit -n
1024
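For reference, the same limit can be inspected from inside a process. The following is a minimal Python sketch, not anything taken from Ceph: the helper names (fd_limits, open_fd_count, raise_soft_limit) are made up for illustration, and the /proc lookup assumes Linux.

```python
import os
import resource


def fd_limits():
    """Return the (soft, hard) RLIMIT_NOFILE pair for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid="self"):
    """Count open descriptors by listing /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def raise_soft_limit(target):
    """Raise the soft limit toward `target`, capped at the hard limit.

    Returns the soft limit in effect afterwards. Never lowers the limit.
    """
    soft, hard = fd_limits()
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to raise
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return fd_limits()[0]


if __name__ == "__main__":
    soft, hard = fd_limits()
    print(f"soft={soft} hard={hard} open={open_fd_count()}")
```

A soft limit of 1024 matches the `ulimit -n` output above. Raising it past the hard limit needs root or a change to the system limits configuration.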
-----
root@issdm-23:/var/log/ceph# grep -n "Too many" full_conn_refused.log
2417924:2011-12-04 14:52:15.289873 7f1406ecb700 -- 192.168.141.123:6800/1325 accepter no incoming connection? sd = -1 errno 24 Too many open files
2417925:2011-12-04 14:52:15.289923 7f1406ecb700 -- 192.168.141.123:6800/1325 accepter no incoming connection? sd = -1 errno 24 Too many open files
2417926:2011-12-04 14:52:15.289952 7f1406ecb700 -- 192.168.141.123:6800/1325 accepter no incoming connection? sd = -1 errno 24 Too many open files
2417927:2011-12-04 14:52:15.289970 7f1406ecb700 -- 192.168.141.123:6800/1325 accepter no incoming connection? sd = -1 errno 24 Too many open files
2417928:2011-12-04 14:52:15.290002 7f1406ecb700 -- 192.168.141.123:6800/1325 accepter no incoming connection? sd = -1 errno 24 Too many open files
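To see how often each errno shows up in a large log rather than grepping for one message, something like the following Python sketch could be used. The regular expression is an assumption based only on the accepter lines above (it expects "errno <number> <message>" at the end of an unwrapped line), and count_errnos is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed format: "... errno <number> <message>" at the end of a log line,
# as in the accepter lines shown above.
ERRNO_RE = re.compile(r"errno (\d+) (.+)$")


def count_errnos(lines):
    """Tally (errno, message) pairs found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = ERRNO_RE.search(line)
        if match:
            key = (int(match.group(1)), match.group(2).strip())
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # File name taken from the grep command above.
    with open("full_conn_refused.log", errors="replace") as log:
        for (number, message), n in count_errnos(log).most_common():
            print(f"{n:8d}  errno {number}  {message}")
```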
On 12/04/2011 04:22 PM, Noah Watkins wrote:
We are experiencing client connection problems that occur only after
some period of heavy use. Prior to the 'connection refused' error in
the client log the cluster behaves as normal. Restarting Ceph solves
the problem but we are not able to finish long jobs.
Logs attached. I have the full 1 GB MDS log if needed; I included
only the portion of the log in which the client had problems, plus
about 5 seconds of context on either side of the test.
Thanks,
Noah
Client
====
...
2011-12-04 16:07:58.154523 7f4458314700 -- 192.168.141.123:0/1009375 >> 192.168.141.123:6800/1325 pipe(0x7f445437d020 sd=55 pgs=0 cs=0 l=0).connect 0
2011-12-04 16:07:58.154562 7f4458314700 -- 192.168.141.123:0/1009375 >> 192.168.141.123:6800/1325 pipe(0x7f445437d020 sd=55 pgs=0 cs=0 l=0).connecting to 192.168.141.123:6800/1325
2011-12-04 16:07:58.154605 7f4458314700 -- 192.168.141.123:0/1009375 >> 192.168.141.123:6800/1325 pipe(0x7f445437d020 sd=55 pgs=0 cs=0 l=0).connect error 192.168.141.123:6800/1325, 111: Connection refused
2011-12-04 16:07:58.154620 7f4458314700 -- 192.168.141.123:0/1009375 >> 192.168.141.123:6800/1325 pipe(0x7f445437d020 sd=55 pgs=0 cs=0 l=0).fault 111: Connection refused
2011-12-04 16:07:58.154635 7f4458314700 -- 192.168.141.123:0/1009375 >> 192.168.141.123:6800/1325 pipe(0x7f445437d020 sd=55 pgs=0 cs=0 l=0).fault waiting 3.200000
Full logs attached.