Re: [PATCH v2] memcg: add charging of already allocated slab objects

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/29/24 18:10, Shakeel Butt wrote:
> On Thu, Aug 29, 2024 at 11:42:10AM GMT, Vlastimil Babka wrote:
>> On 8/28/24 01:52, Shakeel Butt wrote:
>> > At the moment, the slab objects are charged to the memcg at the
>> > allocation time. However there are cases where slab objects are
>> > allocated at the time where the right target memcg to charge it to is
>> > not known. One such case is the network sockets for the incoming
>> > connection which are allocated in the softirq context.
>> > 
>> > Couple hundred thousand connections are very normal on large loaded
>> > server and almost all of those sockets underlying those connections get
>> > allocated in the softirq context and thus not charged to any memcg.
>> > However later at the accept() time we know the right target memcg to
>> > charge. Let's add new API to charge already allocated objects, so we can
>> > have better accounting of the memory usage.
>> > 
>> > To measure the performance impact of this change, tcp_crr is used from
>> > the neper [1] performance suite. Basically it is a network ping pong
>> > test with new connection for each ping pong.
>> > 
>> > The server and the client are run inside 3 level of cgroup hierarchy
>> > using the following commands:
>> > 
>> > Server:
>> >  $ tcp_crr -6
>> > 
>> > Client:
>> >  $ tcp_crr -6 -c -H ${server_ip}
>> > 
>> > If the client and server run on different machines with 50 GBPS NIC,
>> > there is no visible impact of the change.
>> > 
>> > For the same machine experiment with v6.11-rc5 as base.
>> > 
>> >           base (throughput)     with-patch
>> > tcp_crr   14545 (+- 80)         14463 (+- 56)
>> > 
>> > It seems like the performance impact is within the noise.
>> > 
>> > Link: https://github.com/google/neper [1]
>> > Signed-off-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>
>> > ---
>> > v1: https://lore.kernel.org/all/20240826232908.4076417-1-shakeel.butt@xxxxxxxxx/
>> > Changes since v1:
>> > - Correctly handle large allocations which bypass slab
>> > - Rearrange code to avoid compilation errors for !CONFIG_MEMCG builds
>> > 
>> > RFC: https://lore.kernel.org/all/20240824010139.1293051-1-shakeel.butt@xxxxxxxxx/
>> > Changes since the RFC:
>> > - Added check for already charged slab objects.
>> > - Added performance results from neper's tcp_crr
>> > 
>> >  include/linux/slab.h            |  1 +
>> >  mm/slub.c                       | 51 +++++++++++++++++++++++++++++++++
>> >  net/ipv4/inet_connection_sock.c |  5 ++--
>> >  3 files changed, 55 insertions(+), 2 deletions(-)
>> 
>> I can take the v3 in slab tree, if net people ack?
> 
> Thanks.
> 
>> 
>> BTW, will this be also useful for Linus's idea of charging struct files only
>> after they exist? But IIRC there was supposed to be also a part where we
>> have a way to quickly determine if we're not over limit (while allowing some
>> overcharge to make it quicker).
>>
> 
> Do you have link to those discussions or pointers to the code? From what
> you have described, I think this should work. We have the relevant gfp
> flags to control the charging behavior (with some caveats).

I think this was the last part of the discussion, and in the cover letter of
that there are links to the older threads for more context

https://lore.kernel.org/all/CAHk-%3DwhgFtbTxCAg2CWQtDj7n6CEyzvdV1wcCj2qpMfpw0%3Dm1A@xxxxxxxxxxxxxx/

>> Because right now this just overcharges unconditionally, but that's
>> understandable when the irq context creating the socket can't know the memcg
>> upfront. In the open() situation this is different.
>> 
> 
> For networking we deliberately overcharges in the irq context (if
> needed) and the course correct in the task context. However networking
> stack is very robust due to mechanisms like backoff, retransmit to handle
> situations like packet drops, allocation failures, congestion etc. Other
> subsystem are not that robust against ENOMEM. Once I have more detail I
> can follow up on the struct files case.

Ack. Agreed with Roman that it would be a separate followup change.

> thanks,
> Shakeel
> 
> 





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [Monitors]

  Powered by Linux