Re: [PATCH] mm: memcontrol: do not miss MEMCG_MAX events for enforced allocations

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 6, 2022 at 11:56 AM Roman Gushchin <roman.gushchin@xxxxxxxxx> wrote:
>
> On Wed, Jul 06, 2022 at 11:42:50AM +0800, Yafang Shao wrote:
> > On Wed, Jul 6, 2022 at 11:28 AM Roman Gushchin <roman.gushchin@xxxxxxxxx> wrote:
> > >
> > > On Wed, Jul 06, 2022 at 10:46:48AM +0800, Yafang Shao wrote:
> > > > On Wed, Jul 6, 2022 at 4:49 AM Roman Gushchin <roman.gushchin@xxxxxxxxx> wrote:
> > > > >
> > > > > On Mon, Jul 04, 2022 at 05:07:30PM +0200, Michal Hocko wrote:
> > > > > > On Sat 02-07-22 08:39:14, Roman Gushchin wrote:
> > > > > > > On Fri, Jul 01, 2022 at 10:50:40PM -0700, Shakeel Butt wrote:
> > > > > > > > On Fri, Jul 1, 2022 at 8:35 PM Roman Gushchin <roman.gushchin@xxxxxxxxx> wrote:
> > > > > > > > >
> > > > > > > > > Yafang Shao reported an issue related to the accounting of bpf
> > > > > > > > > memory: if a bpf map is charged indirectly for memory consumed
> > > > > > > > > from an interrupt context and allocations are enforced, MEMCG_MAX
> > > > > > > > > events are not raised.
> > > > > > > > >
> > > > > > > > > It's not/less of an issue in a generic case because consequent
> > > > > > > > > allocations from a process context will trigger the reclaim and
> > > > > > > > > MEMCG_MAX events. However a bpf map can belong to a dying/abandoned
> > > > > > > > > memory cgroup, so it might never happen.
> > > > > > > >
> > > > > > > > The patch looks good but the above sentence is confusing. What might
> > > > > > > > never happen? Reclaim or MAX event on dying memcg?
> > > > > > >
> > > > > > > Direct reclaim and MAX events. I agree it might be not clear without
> > > > > > > looking into the code. How about something like this?
> > > > > > >
> > > > > > > "It's not/less of an issue in a generic case because consequent
> > > > > > > allocations from a process context will trigger the direct reclaim
> > > > > > > and MEMCG_MAX events will be raised. However a bpf map can belong
> > > > > > > to a dying/abandoned memory cgroup, so there will be no allocations
> > > > > > > from a process context and no MEMCG_MAX events will be triggered."
> > > > > >
> > > > > > Could you expand little bit more on the situation? Can those charges to
> > > > > > offline memcg happen indefinetely?
> > > > >
> > > > > Yes.
> > > > >
> > > > > > How can it ever go away then?
> > > > >
> > > > > Bpf map should be deleted by a user first.
> > > > >
> > > >
> > > > It can't apply to pinned bpf maps, because the user expects the bpf
> > > > maps to continue working after the user agent exits.
> > > >
> > > > > > Also is this something that we actually want to encourage?
> > > > >
> > > > > Not really. We can implement reparenting (probably objcg-based), I think it's
> > > > > a good idea in general. I can take a look, but can't promise it will be fast.
> > > > >
> > > > > In thory we can't forbid deleting cgroups with associated bpf maps, but I don't
> > > > > thinks it's a good idea.
> > > > >
> > > >
> > > > Agreed. It is not a good idea.
> > > >
> > > > > > In other words shouldn't those remote charges be redirected when the
> > > > > > target memcg is offline?
> > > > >
> > > > > Reparenting is the best answer I have.
> > > > >
> > > >
> > > > At the cost of increasing the complexity of deployment, that may not
> > > > be a good idea neither.
> > >
> > > What do you mean? Can you please elaborate on it?
> > >
> >
> >                    parent memcg
> >                          |
> >                     bpf memcg   <- limit the memory size of bpf
> > programs
> >                         /           \
> >          bpf user agent     pinned bpf program
> >
> > After bpf user agents exit, the bpf memcg will be dead, and then all
> > its memory will be reparented.
> > That is okay for preallocated bpf maps, but not okay for
> > non-preallocated bpf maps.
> > Because the bpf maps will continue to charge, but as all its memory
> > and objcg are reparented, so we have to limit the bpf memory size in
> > the parent as follows,
>
> So you're relying on the memory limit of a dying cgroup?

No. I didn't say it.  What I said is you can't use a dying cgroup to
limit it, that's why I said that we have to use parant memcg to limit
it.

> Sorry, but I don't think we can seriously discuss such a design.
> A dying cgroup is invisible for a user, a user can't change any tunables,
> they have zero visibility into any stats or charges. Why would you do this?
>
> If you want the cgroup to be an active part of the memory management
> process, don't delete it. There are exactly zero guarantees about what
> happens with a memory cgroup after being deleted by a user, it's all
> implementation details.
>
> Anyway, here is the patch for reparenting bpf maps:
> https://github.com/rgushchin/linux/commit/f57df8bb35770507a4624fe52216b6c14f39c50c
>
> I gonna post it to bpf@ after some testing.
>

I will take a look at it.
But AFAIK the reparenting can't resolve the problem of non-preallocated maps.


-- 
Regards
Yafang



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux