Thank you for your comments. On 22/07/21 8:52 pm, Eric W. Biederman wrote:
As stated I think this idea is a non-starter.

There is a real problem: there are applications that have a legitimate need to know what cpu resources are available for them to use, and we don't have a good interface for them to request that information. I think MESOS solved this by passing a MAX_CPUS environment variable, and at least the JVM was modified to use that variable. That said, as situations can be a bit more dynamic and fluid, having something where an application can look and see what resources are available from its view of the world seems reasonable. AKA we need something so applications can stop conflating the physical cpu resources that exist with the cpu resources an application is allowed to use. This might be as simple as implementing a /proc/self/cpus_available file.

Without the will to go through and find the existing open source applications that care, and update them so that they will use the new interface, I don't think anything will really happen.
From a process-granular point of view I believe a /proc/self approach solves this problem at the root. However, as you have stated, applications will now have to look at another interface for the correct information, and that could be a challenge.
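To make the application side concrete, a consumer of such an interface might look roughly like the sketch below. To be clear, /proc/self/cpus_available does not exist today; the file name and the single-integer format are assumptions made purely for illustration, and the fallback path is what applications can portably do right now:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Hypothetical: prefer a per-process view if the kernel provides one,
 * otherwise fall back to counting the CPUs in our affinity mask. */
static int usable_cpus(void)
{
	FILE *f = fopen("/proc/self/cpus_available", "r");
	cpu_set_t set;
	int n;

	if (f) {
		if (fscanf(f, "%d", &n) == 1 && n > 0) {
			fclose(f);
			return n;
		}
		fclose(f);
	}

	if (sched_getaffinity(0, sizeof(set), &set) == 0)
		return CPU_COUNT(&set);

	return 1;
}

Whatever the final format ends up being, the point is that a single open/read of a /proc/self file is something runtimes like the JVM could adopt with very little churn.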
The problem I see with changing existing interfaces that describe the hardware is that the definition becomes unclear and so different applications can legitimately expect different things, and it would become impossible to implement what is needed correctly.
In our experimentation and survey we found that container applications restricted by a cgroup - whether through cpuset or through period/quota - benefited from coherent information. That also matches my understanding of how tools like LXCFS are used in userspace. Would you happen to know of any applications that expect the full hardware/topology view even though they themselves are restricted in their usage?
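For the period/quota case in particular, the stitching an application (or LXCFS on its behalf) has to do today looks roughly like the sketch below - assuming cgroup v2 with the container's own subtree mounted at /sys/fs/cgroup, which is a common container setup but not something an application can rely on in general:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Best-effort guess at usable CPUs: start from the affinity mask
 * (reflects cpuset restrictions), then clamp by the cpu.max
 * bandwidth limit, whose format is "<quota> <period>" or "max <period>". */
static int effective_cpus(void)
{
	cpu_set_t set;
	long quota, period;
	int cpus = 1;
	FILE *f;

	if (sched_getaffinity(0, sizeof(set), &set) == 0)
		cpus = CPU_COUNT(&set);

	f = fopen("/sys/fs/cgroup/cpu.max", "r");
	if (f) {
		if (fscanf(f, "%ld %ld", &quota, &period) == 2 &&
		    quota > 0 && period > 0) {
			long limit = (quota + period - 1) / period;
			if (limit < cpus)
				cpus = (int)limit;
		}
		fclose(f);
	}
	return cpus;
}

None of this is a stable application-facing contract (the mount point, the hierarchy version, and the rounding policy are all guesses the application has to make), which is exactly the incoherence the experiments were trying to quantify.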
The problem I see with using cgroup interfaces is that they are not targeted at end user applications but rather at the problem of controlling access to a resource. Using them to report what is available again gets you into the multiple-master problem, especially as cgroups may not be the only thing in the system controlling access to your resource.
I agree: cgroup is a control interface and should not be used for presenting information, and cgroups may not be the only thing in the system controlling access to the resources. This is where the idea for a different interface really stemmed from - although there are mechanisms to restrict and control usage, there is no interface that presents that information coherently to userspace.
So I really think the only good solution that people won't mind is to go through the applications, figure out what information is legitimately needed from an application perspective, and build an interface tailored for applications to get that information. Then applications can be updated to use the new interface, and as the implementation of the system changes, the implementation in the kernel can be updated to keep the applications working.
I concur with this approach of building an application-first interface. My current frame of reference for the problems comes from tools like LXCFS, which are built around the existing interfaces to present information, and the experiments were designed to quantify those shortcomings. We could definitely use some help in understanding the shortcomings of the current interfaces from people who use these applications.

-- Pratik
Eric