Re: [RFC PATCH 0/6] Understanding delays due to throttling under very heavy write load

Tommi Virtanen <tommi.virtanen@xxxxxxxxxxxxx> · Fri, 24 Feb 2012 10:31:43 -0800

On Fri, Feb 24, 2012 at 07:38, Jim Schutt <jaschut@xxxxxxxxxx> wrote:
> I've finally figured out what is going on with this behaviour.
> Memory usage was on the right track.
>
> It turns out to be an unfortunate interaction between the
> number of OSDs/server, number of clients, TCP socket buffer
> autotuning, the policy throttler, and limits on the total
> memory used by the TCP stack (net/ipv4/tcp_mem sysctl).
>
> What happens is that for throttled reader threads, the
> TCP stack will continue to receive data as long as there
> is available socket buffer, and the sender has data to send.

Ohh! Yes, if userspace stops reading a socket, the kernel will keep
buffering data up to SO_RCVBUF etc. TCP also has global memory limits,
and with enough throttled sockets that buffering is going to push total
usage uncomfortably close to the global limit.
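
For anyone wanting to see how close a box is to that limit, here is a
rough sketch, assuming Linux's /proc layout: /proc/sys/net/ipv4/tcp_mem
holds three page counts (low, pressure, high) and the "TCP:" line of
/proc/net/sockstat reports current usage as "mem N" pages. The function
names are made up for illustration; nothing like this exists in Ceph as
far as I know.

```python
def parse_tcp_mem(text):
    """Parse /proc/sys/net/ipv4/tcp_mem contents: 'low pressure high', in pages."""
    low, pressure, high = (int(field) for field in text.split())
    return low, pressure, high


def parse_sockstat_tcp_mem(text):
    """Return the 'mem' value (pages) from the 'TCP:' line of /proc/net/sockstat."""
    for line in text.splitlines():
        if line.startswith("TCP:"):
            fields = line.split()
            # Fields come as name/value pairs, e.g. 'TCP: inuse 5 ... mem 94160'.
            return int(fields[fields.index("mem") + 1])
    raise ValueError("no 'TCP:' line found")


def tcp_mem_headroom(tcp_mem_text, sockstat_text):
    """Compare current TCP memory use against the tcp_mem thresholds."""
    _low, pressure, high = parse_tcp_mem(tcp_mem_text)
    used = parse_sockstat_tcp_mem(sockstat_text)
    return {
        "used_pages": used,
        "pressure_pages": pressure,
        "high_pages": high,
        "fraction_of_high": used / high,
    }
```

On a live Linux host you would feed it the contents of the two files,
e.g. open("/proc/sys/net/ipv4/tcp_mem").read() and
open("/proc/net/sockstat").read().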

Ceph *could* shrink SO_RCVBUF at the time it decides to throttle a
client. That would limit the TCP buffer space consumed by throttled
clients (except for a race where the data was already received before
Ceph called setsockopt). I recall seeing a trick like that pulled off
somewhere, but I can't find an example right now.
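
Not the example I was thinking of, but the basic shape would be
something like the sketch below (Python just for brevity; the same
setsockopt call applies from C++). The function name and the 4096-byte
default are placeholders, not anything Ceph does today. Two caveats as
I understand them on Linux: the kernel doubles the value you pass and
enforces a minimum, and setting SO_RCVBUF explicitly turns off receive
buffer autotuning for that socket, so "un-throttling" would mean
picking a fixed larger size rather than going back to autotuning.

```python
import socket


def clamp_rcvbuf(sock, nbytes=4096):
    """Ask the kernel to cap sock's receive buffer at roughly nbytes.

    Returns the size the kernel reports afterwards, which may differ
    from nbytes (Linux doubles it and applies a lower bound).
    Hypothetical helper for illustration only.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, nbytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

Data already queued in the socket buffer before the call stays queued,
which is the race mentioned above.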

Or perhaps we just say "sorry, your server is swamped with too much
work for the resources it's given; you need more of them". That's not
nice though, when throttling can slow down the non-throttled
connections.