On Wed, Jun 09, 2021 at 02:14:16PM -0500, Eric W. Biederman wrote:
> "Enrico Weigelt, metux IT consult" <lkml@xxxxxxxxx> writes:
>
> > On 03.06.21 13:33, Chris Down wrote:
> >
> > Hi folks,
> >
> >> Putting stuff in /proc to get around the problem of "some other
> >> metric I need might not be exported to a container" is not a very
> >> compelling argument. If they want it, then export it to the
> >> container...
> >>
> >> Ultimately, if they're going to have to add support for a new
> >> /proc/self/meminfo file anyway, these use cases should just do it
> >> properly through the already supported APIs.
> >
> > It's even a bit more complex ...
> >
> > /proc/meminfo always tells what the *machine* has available, not
> > what a process can eat up. That has been this way even long before
> > cgroups (eg. ulimits).
> >
> > Even if you want a container to look more like a VM - /proc/meminfo
> > showing what the container (instead of the machine) has available -
> > just looking at the calling task's cgroup is also wrong. Because
> > there're cgroups outside containers (that really shouldn't be
> > affected) and there're even other cgroups inside the container
> > (that further restrict below the container's limits).
> >
> > BTW: applications trying to autotune themselves by looking at
> > /proc/meminfo are broken-by-design anyways. This never has been a
> > valid metric on how much memory individual processes can or should
> > eat.
>
> Which brings us to the problem.
>
> Using /proc/meminfo is not valid unless your application can know it
> has the machine to itself. Something that is becoming increasingly
> less common.
>
> Unless something has changed in the last couple of years, reading
> values out of the cgroup filesystem is both difficult (v1 and v2 have
> some gratuitous differences) and is actively discouraged.
>
> So what should applications do?
>
> Alex has found applications that are trying to do something with
> meminfo, and the fields that those applications care about. I don't
> see anyone making the case that specifically what the applications
> are trying to do is buggy.
>
> Alex's suggestion is to have a /proc/self/meminfo that has the
> information that applications want, which would be something that
> would be easy to switch applications to. The patch to userspace at
> that point is as simple as 3 lines of code. I can imagine people
> taking that patch into their userspace programs.

But is it actually what applications want?

Not all the information at the system level translates well to the
container level. Things like available memory require a hierarchical
assessment rather than just a look at the local level, since there
could be limits higher up the tree. Not all items in meminfo have a
container equivalent, either.

The familiar format is likely a liability rather than an asset.

> The simple fact that people are using /proc/meminfo when it doesn't
> make sense for anything except system monitoring tools is a pretty
> solid bug report on the existing Linux APIs.

I agree that we likely need a better interface for applications to
query the memory state of their container. But I don't think we should
try to emulate a format that is a poor fit for this.

We should also not speculate about what users intend to do with the
meminfo data right now. There is a surprising amount of misconception
around what these values actually mean. I'd rather have users show up
on the mailing list directly and outline the broader use case.
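
(To make the "gratuitous differences" between v1 and v2 that Eric
mentions concrete - this is my own illustration, not something from
the thread: on cgroup v1 an application reads its limit from

    /sys/fs/cgroup/memory/<cgroup path>/memory.limit_in_bytes

where "no limit" shows up as a huge byte count, while on cgroup v2 the
same information lives in

    /sys/fs/cgroup/<cgroup path>/memory.max

where "no limit" is the literal string "max" - so the application
first has to figure out which hierarchy it is running on before it can
even parse the number.)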
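
To illustrate the "hierarchical assessment" point above, here is a
rough sketch of what an application has to do today just to learn its
hard memory ceiling. It is purely illustrative: it assumes a pure
cgroup2 setup mounted at /sys/fs/cgroup, ignores memory.high,
protections and swap, and keeps error handling minimal.

/*
 * The effective ceiling is the *minimum* of memory.max along the
 * whole path from the task's cgroup up to the root, not the value in
 * the local cgroup alone.
 */
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static unsigned long long memory_max_at(const char *cgpath)
{
	char path[PATH_MAX + 32], buf[32];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/fs/cgroup%s/memory.max", cgpath);
	f = fopen(path, "r");
	if (!f || !fgets(buf, sizeof(buf), f)) {
		if (f)
			fclose(f);
		return ULLONG_MAX;	/* no controller here: no limit */
	}
	fclose(f);
	if (!strncmp(buf, "max", 3))
		return ULLONG_MAX;	/* explicitly unlimited */
	return strtoull(buf, NULL, 10);
}

int main(void)
{
	char line[PATH_MAX + 16], cgpath[PATH_MAX] = "";
	unsigned long long limit = ULLONG_MAX, max;
	char *slash;
	FILE *f;

	/* On a pure cgroup2 system /proc/self/cgroup is a single "0::/path" line. */
	f = fopen("/proc/self/cgroup", "r");
	if (!f || !fgets(line, sizeof(line), f))
		return 1;
	fclose(f);
	sscanf(line, "0::%4095[^\n]", cgpath);

	/* Walk from our own cgroup up to the root, taking the minimum. */
	for (;;) {
		max = memory_max_at(cgpath);
		if (max < limit)
			limit = max;
		slash = strrchr(cgpath, '/');
		if (!slash)
			break;
		*slash = 0;		/* chop off one level */
	}

	if (limit == ULLONG_MAX)
		printf("no memory limit in effect\n");
	else
		printf("effective memory.max: %llu bytes\n", limit);
	return 0;
}

Even that walk only yields the hard ceiling; actual headroom also
depends on memory.current, memory.high and protections along the same
path, which is part of why recycling the meminfo format maps so poorly
onto containers.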