On Fri, Feb 24, 2012 at 07:38, Jim Schutt <jaschut@xxxxxxxxxx> wrote:
> I've finally figured out what is going on with this behaviour.
> Memory usage was on the right track.
>
> It turns out to be an unfortunate interaction between the
> number of OSDs/server, number of clients, TCP socket buffer
> autotuning, the policy throttler, and limits on the total
> memory used by the TCP stack (net/ipv4/tcp_mem sysctl).
>
> What happens is that for throttled reader threads, the
> TCP stack will continue to receive data as long as there
> is available socket buffer, and the sender has data to send.

Ohh! Yes, if userspace stops reading from a socket, the kernel keeps
buffering incoming data up to SO_RCVBUF etc. And since TCP has global
limits, that is going to push it uncomfortably close to the global
limit.

Ceph *could* shrink SO_RCVBUF at the time it decides to throttle a
client (rough sketch at the end of this mail); that would limit the TCP
buffer space consumed by throttled clients, except for a race where the
data got received before Ceph called setsockopt. I recall seeing a
trick like that pulled off somewhere, but I can't find an example right
now.

Or perhaps we just say "sorry, your server is swamped with too much
work for the resources it's been given; you need more of them". That's
not nice though, when throttling can slow down the non-throttled
connections.
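For what it's worth, here's a minimal sketch of what that SO_RCVBUF
trick might look like. The on_throttle/on_unthrottle hooks and the
buffer sizes are made up for illustration and aren't anything in the
Ceph messenger today; it also doesn't help with data the kernel has
already queued before the setsockopt call.

  // Hypothetical sketch, not Ceph code: shrink the socket receive buffer
  // when a connection gets throttled, restore it when the throttle lifts.
  #include <sys/socket.h>
  #include <cerrno>
  #include <cstring>
  #include <iostream>

  static bool set_rcvbuf(int fd, int bytes) {
    // Linux doubles this value internally and clamps it between
    // SOCK_MIN_RCVBUF and net.core.rmem_max; data already queued in the
    // receive buffer is not released by shrinking it.
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0) {
      std::cerr << "setsockopt(SO_RCVBUF): " << strerror(errno) << "\n";
      return false;
    }
    return true;
  }

  // Hook names are assumptions; wherever the policy throttler decides to
  // stop reading from a client, it could call these on that connection's fd.
  void on_throttle(int fd)   { set_rcvbuf(fd, 4096);       }  // tiny buffer while throttled
  void on_unthrottle(int fd) { set_rcvbuf(fd, 256 * 1024); }  // back to a normal size

Note that setting SO_RCVBUF also disables receive-buffer autotuning for
that socket, so the "unthrottle" size would have to be chosen with some
care.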