I created ticket http://tracker.newdream.net/issues/2100 for this.

On Fri, Feb 24, 2012 at 10:31, Tommi Virtanen <tommi.virtanen@xxxxxxxxxxxxx> wrote:
> On Fri, Feb 24, 2012 at 07:38, Jim Schutt <jaschut@xxxxxxxxxx> wrote:
>> I've finally figured out what is going on with this behaviour.
>> Memory usage was on the right track.
>>
>> It turns out to be an unfortunate interaction between the
>> number of OSDs per server, the number of clients, TCP socket
>> buffer autotuning, the policy throttler, and the limits on total
>> memory used by the TCP stack (the net/ipv4/tcp_mem sysctl).
>>
>> What happens is that for throttled reader threads, the
>> TCP stack will continue to receive data as long as there
>> is socket buffer available and the sender has data to send.
>
> Ohh! Yes, if userspace stops reading a socket, the kernel will
> buffer data up to SO_RCVBUF etc. And TCP has global limits, so
> enough throttled connections will push total TCP memory usage
> uncomfortably close to those limits.
>
> Ceph *could* shrink SO_RCVBUF at the time it decides to
> throttle a client; that would limit the TCP buffer space consumed
> by throttled clients (except for a race where data is received
> before Ceph calls setsockopt). I recall seeing a trick like that
> pulled off somewhere, but I can't find an example right now.
>
> Or perhaps we just say "sorry, your server is swamped with too much
> work for the resources it's given; you need more of them". That's
> not nice, though, when throttling can slow down the non-throttled
> connections.
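
For anyone who wants to experiment with the setsockopt idea TV
describes above, here is a minimal sketch. This is not Ceph code:
the helper name, the fd, and the 4 KB figure are placeholders I
chose for illustration.

  #include <stdio.h>
  #include <sys/socket.h>

  /* Hypothetical helper: clamp a throttled connection's kernel
   * receive buffer so the TCP stack stops soaking up memory on
   * behalf of a reader thread that isn't consuming data. */
  static int clamp_rcvbuf(int fd, int bytes)
  {
      /* On Linux the kernel doubles this value internally for
       * bookkeeping overhead, and setting it disables receive
       * buffer autotuning for this socket. */
      if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                     &bytes, sizeof(bytes)) < 0) {
          perror("setsockopt(SO_RCVBUF)");
          return -1;
      }
      return 0;
  }

  /* At throttle time:      clamp_rcvbuf(fd, 4096);
   * when the throttle lifts, set it back to a normal size
   * (or reopen autotuning by reconnecting) to restore throughput. */

Two caveats: data the kernel has already accepted into the old,
larger buffer stays there, which is the race mentioned above; and
TCP will not retract a receive window it has already advertised, so
the clamp only bounds future buffering rather than freeing memory
immediately.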