On Wed, Jun 09, 2021 at 02:14:16PM -0500, Eric W. Biederman wrote:
> "Enrico Weigelt, metux IT consult" <lkml@xxxxxxxxx> writes:
>
> > On 03.06.21 13:33, Chris Down wrote:
> >
> > Hi folks,
> >
> >> Putting stuff in /proc to get around the problem of "some other
> >> metric I need might not be exported to a container" is not a very
> >> compelling argument. If they want it, then export it to the
> >> container...
> >>
> >> Ultimately, if they're going to have to add support for a new
> >> /proc/self/meminfo file anyway, these use cases should just do it
> >> properly through the already supported APIs.
> >
> > It's even a bit more complex ...
> >
> > /proc/meminfo always tells what the *machine* has available, not
> > what a process can eat up. That has been this way even long before
> > cgroups (eg. ulimits).
> >
> > Even if you want a container to look more like a VM - /proc/meminfo
> > showing what the container (instead of the machine) has available -
> > just looking at the calling task's cgroup is also wrong. Because
> > there're cgroups outside containers (that really shouldn't be
> > affected) and there're even other cgroups inside the container
> > (that further restrict below the container's limits).
> >
> > BTW: applications trying to autotune themselves by looking at
> > /proc/meminfo are broken-by-design anyways. This never has been a
> > valid metric on how much memory individual processes can or should
> > eat.
>
> Which brings us to the problem.
>
> Using /proc/meminfo is not valid unless your application can know it
> has the machine to itself. Something that is becoming increasingly
> less common.
>
> Unless something has changed in the last couple of years, reading
> values out of the cgroup filesystem is both difficult (v1 and v2 have
> some gratuitous differences) and is actively discouraged.
>
> So what should applications do?
>
> Alex has found applications that are trying to do something with
> meminfo, and the fields that those applications care about. I don't
> see anyone making the case that specifically what the applications
> are trying to do is buggy.
>
> Alex's suggestion is to have a /proc/self/meminfo that has the
> information that applications want, which would be something that
> would be easy to switch applications to. The patch to userspace at
> that point is as simple as 3 lines of code. I can imagine people
> taking that patch into their userspace programs.

But is it actually what applications want?

Not all the information at the system level translates well to the
container level. Things like available memory require a hierarchical
assessment rather than just a look at the local level, since there
could be limits higher up the tree. Not all items in meminfo have a
container equivalent, either.

The familiar format is likely a liability rather than an asset.

> The simple fact that people are using /proc/meminfo when it doesn't
> make sense for anything except system monitoring tools is a pretty
> solid bug report on the existing Linux APIs.

I agree that we likely need a better interface for applications to
query the memory state of their container. But I don't think we should
try to emulate a format that is a poor fit for this.

We should also not speculate about what users intend to do with the
meminfo data right now. There is a surprising amount of misconception
around what these values actually mean. I'd rather have users show up
on the mailing list directly and outline the broader use case.
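
(To make the "gratuitous differences" between v1 and v2 that Eric
mentions concrete - this is my own illustration, not something from
the thread: on cgroup v1 an application reads its limit from

    /sys/fs/cgroup/memory/<cgroup path>/memory.limit_in_bytes

where "no limit" shows up as a huge byte count, while on cgroup v2 the
same information lives in

    /sys/fs/cgroup/<cgroup path>/memory.max

where "no limit" is the literal string "max" - so the application
first has to figure out which hierarchy it is running on before it can
even parse the number.)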
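
To illustrate the "hierarchical assessment" point above, here is a
rough sketch of what an application has to do today just to learn its
hard memory ceiling. It is purely illustrative: it assumes a pure
cgroup2 setup mounted at /sys/fs/cgroup, ignores memory.high,
protections and swap, and keeps error handling minimal.

/*
 * The effective ceiling is the *minimum* of memory.max along the
 * whole path from the task's cgroup up to the root, not the value in
 * the local cgroup alone.
 */
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static unsigned long long memory_max_at(const char *cgpath)
{
	char path[PATH_MAX + 32], buf[32];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/fs/cgroup%s/memory.max", cgpath);
	f = fopen(path, "r");
	if (!f || !fgets(buf, sizeof(buf), f)) {
		if (f)
			fclose(f);
		return ULLONG_MAX;	/* no controller here: no limit */
	}
	fclose(f);
	if (!strncmp(buf, "max", 3))
		return ULLONG_MAX;	/* explicitly unlimited */
	return strtoull(buf, NULL, 10);
}

int main(void)
{
	char line[PATH_MAX + 16], cgpath[PATH_MAX] = "";
	unsigned long long limit = ULLONG_MAX, max;
	char *slash;
	FILE *f;

	/* On a pure cgroup2 system /proc/self/cgroup is a single "0::/path" line. */
	f = fopen("/proc/self/cgroup", "r");
	if (!f || !fgets(line, sizeof(line), f))
		return 1;
	fclose(f);
	sscanf(line, "0::%4095[^\n]", cgpath);

	/* Walk from our own cgroup up to the root, taking the minimum. */
	for (;;) {
		max = memory_max_at(cgpath);
		if (max < limit)
			limit = max;
		slash = strrchr(cgpath, '/');
		if (!slash)
			break;
		*slash = 0;		/* chop off one level */
	}

	if (limit == ULLONG_MAX)
		printf("no memory limit in effect\n");
	else
		printf("effective memory.max: %llu bytes\n", limit);
	return 0;
}

Even that walk only yields the hard ceiling; actual headroom also
depends on memory.current, memory.high and protections along the same
path, which is part of why recycling the meminfo format maps so poorly
onto containers.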