Re: [PATCH bpf-next 0/5] bpf: BPF specific memory allocator.

On Thu, Jul 14, 2022 at 12:24 AM Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> Hello,
>
> On Wed, Jul 13, 2022 at 10:24:05PM +0800, Yafang Shao wrote:
> > I have told you that it is not reasonable to refuse a containerized
> > process to pin bpf programs, but if you are not familiar with k8s, it
> > is not easy to explain clearly why it is a trouble for deployment.
> > But I can try to explain to you from a *systemd user's* perspective.
>
> The way systemd currently sets up cgroup hierarchy doesn't work for
> persistent per-service resource tracking. It needs to introduce an extra
> layer for that which would be a significant change for systemd too.
>
> > I assume the above hierarchy is what you expect.
> > But you know, in the k8s environment, everything is pod-based, that
> > means if we use the above hierarchy in the k8s environment, the k8s's
> > limiting, monitoring, debugging must be changed consequently.  That
> > means it may be a fullstack change in k8s, a great refactor.
> >
> > So below hierarchy is a reasonable solution,
> >                                           bpf-memcg
> >                                                 |
> >   bpf-foo pod                    bpf-foo-memcg     (limited)
> >        /          \                                /
> > (charge)     (not-charged)      (charged)
> > proc-foo                     bpf-foo
> >
> > And then keep the bpf-memgs persistent.
>
> It looks like you draw the diagram with variable width font and it's
> difficult to tell what you're trying to say.

Maybe the diagram below is clearer?
                              bpf-memcg
                                  |
    bpf-foo pod             bpf-foo-memcg (limited)
     /       \                   /
(charged) (not charged)     (charged)
    |            \             /
    |             \           /
proc-foo           bpf-foo

bpf-foo is loaded by proc-foo, but it is not charged to the bpf-foo
pod. Instead, it is remotely charged to bpf-foo-memcg.

>  That said, I don't think the
> argument you're making is a good one in general. The topic at hand is future
> architectural direction in handling shared resources, which was never well
> supported before. ie. We're not talking about breaking existing behaviors.
>
> We don't want to architect kernel features to suit the expectations of one
> particular application. It has to be longer term than that and it can't be
> a one-way road. Sometimes the kernel adapts to existing applications
> because the expectations make sense. At other times, kernel takes a
> direction which may require some work from applications to use new
> capabilities because that makes more sense in the long term.
>

Shared resources and remote charging are not new issues; see
task->active_memcg. The case we are handling now (map->memcg or
map->objcg) is similar to task->active_memcg. If we want to make it
generic, I think we can start with task->active_memcg.
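
For readers less familiar with the pattern, here is a minimal user-space
sketch of remote charging, loosely modelled on how set_active_memcg()
temporarily overrides the charge target and hands back the previous one.
Every toy_* name is hypothetical and exists only for illustration; this
is not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a memcg: a usage counter plus a parent pointer. */
struct toy_memcg {
    const char *name;
    long usage;
    struct toy_memcg *parent;
};

/* Toy stand-in for a task: its own memcg and an optional override. */
struct toy_task {
    struct toy_memcg *memcg;        /* memcg the task lives in */
    struct toy_memcg *active_memcg; /* remote charge target, or NULL */
};

/* Install a new override and return the old one so it can be restored. */
static struct toy_memcg *toy_set_active_memcg(struct toy_task *t,
                                              struct toy_memcg *m)
{
    struct toy_memcg *old = t->active_memcg;

    t->active_memcg = m;
    return old;
}

/* Charge the override if one is set, else the task's own memcg.
 * The charge propagates to every ancestor (the vertical direction). */
static void toy_charge(struct toy_task *t, long bytes)
{
    struct toy_memcg *m = t->active_memcg ? t->active_memcg : t->memcg;

    for (; m != NULL; m = m->parent)
        m->usage += bytes;
}

/* Allocate "map memory" on behalf of map_memcg instead of the caller. */
static void toy_map_alloc(struct toy_task *t, struct toy_memcg *map_memcg,
                          long bytes)
{
    struct toy_memcg *old = toy_set_active_memcg(t, map_memcg);

    toy_charge(t, bytes);
    toy_set_active_memcg(t, old);
    assert(t->active_memcg == old); /* override must not leak */
}
```

With proc-foo living in the bpf-foo pod and bpf-foo-memcg passed as
map_memcg, toy_map_alloc() leaves the pod's counter untouched and charges
bpf-foo-memcg and its parent bpf-memcg.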

To make it generic, I have some preliminary thoughts on the cgroup side:
1) Can we extend the cgroup tree to a cgroup graph?
2) Can we extend the cgroup from process-based (cgroup.procs) to
resource-based (cgroup.resources)?

Regarding question 1).
Originally the charge direction was purely vertical, which looks like a
tree:
      parent
        ^
        |
      cgroup

But since task->active_memcg was introduced, there is also a horizontal
charge:
      parent
        ^
        |
      cgroup ----> friend

The two share a common ancestor, so in the end it looks like a graph:
            ancestor
            /      \
          ...      ...
          /          \
      cgroup ------ friend
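
The graph idea can be sketched in user space as follows. This is only a
toy model under stated assumptions: struct toy_cgroup, its friend_cg
edge, and the toy_* helpers are all hypothetical and do not correspond
to any existing kernel structure. It shows that a vertical charge and a
horizontal charge both end up accounted in the common ancestor.

```c
#include <assert.h>
#include <stddef.h>

struct toy_cgroup {
    struct toy_cgroup *parent;    /* vertical edge */
    struct toy_cgroup *friend_cg; /* horizontal edge, NULL if none */
    long usage;
};

/* Vertical charge: account in cg and in every ancestor. */
static void toy_charge_up(struct toy_cgroup *cg, long bytes)
{
    for (; cg != NULL; cg = cg->parent)
        cg->usage += bytes;
}

/* Horizontal charge: follow the friend edge if present, then walk up. */
static void toy_charge_remote(struct toy_cgroup *cg, long bytes)
{
    toy_charge_up(cg->friend_cg ? cg->friend_cg : cg, bytes);
}

/* Number of parent hops from cg to its root. */
static int toy_depth(const struct toy_cgroup *cg)
{
    int d = 0;

    for (; cg->parent != NULL; cg = cg->parent)
        d++;
    return d;
}

/* Lowest common ancestor, or NULL if the two are in different trees. */
static struct toy_cgroup *toy_common_ancestor(struct toy_cgroup *a,
                                              struct toy_cgroup *b)
{
    int da = toy_depth(a), db = toy_depth(b);

    for (; da > db; da--)
        a = a->parent;
    for (; db > da; db--)
        b = b->parent;
    while (a != b) {
        a = a->parent;
        b = b->parent;
        if (a == NULL || b == NULL)
            return NULL;
    }
    return a;
}
```

In this model the ancestor's counter is the only place where the local
and the remote charge of the same cgroup are visible together.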


Regarding question 2).
The lifecycle of a leaf cgroup is the same as that of the processes
inside it. But now that remote charging has been introduced, the
lifecycle of a leaf cgroup may instead be tied to processes in other
cgroups. So it is no longer sufficient to treat a cgroup as
process-based: what it really cares about is the resources, so maybe we
should extend it to be resource-based.
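
A rough sketch of the difference, assuming a hypothetical nr_resources
counter that tracks remotely charged objects (what a cgroup.resources
file might list). None of these names exist in the kernel; the point is
only that a cgroup with no processes can still have live charges.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_cgroup {
    int nr_procs;     /* what a cgroup.procs-like view would count */
    int nr_resources; /* hypothetical: remotely charged objects */
};

/* A remotely charged object (e.g. a pinned map) starts using the cgroup. */
static void toy_attach_resource(struct toy_cgroup *cg)
{
    cg->nr_resources++;
}

/* The remotely charged object is freed. */
static void toy_release_resource(struct toy_cgroup *cg)
{
    assert(cg->nr_resources > 0);
    cg->nr_resources--;
}

/* Process-based view: the cgroup is unused once its processes are gone. */
static bool toy_unused_process_based(const struct toy_cgroup *cg)
{
    return cg->nr_procs == 0;
}

/* Resource-based view: it is unused only when nothing is charged to it. */
static bool toy_unused_resource_based(const struct toy_cgroup *cg)
{
    return cg->nr_procs == 0 && cg->nr_resources == 0;
}
```

After the last process exits while a pinned map is still charged, the
process-based check reports the cgroup as unused, while the
resource-based check keeps it alive until the map is released.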

> Let's keep the discussion more focused on technical merits.
>
> Thanks.
>
> --
> tejun



-- 
Regards
Yafang



