Hello,

On Fri, Jun 07, 2013 at 11:12:20AM +0100, Daniel P. Berrange wrote:
> Well we pretty much need the tunables available in the cpu, cpuset
> and cpuacct controllers to be available for the set of non-vCPU threads
> as a group. e.g. cpu_shares, cfs_period_us, cfs_quota_us, cpuacct.usage,
> cpuacct.usage_percpu, cpuset.cpus, cpuset.mems.
>
> CPU/memory affinity could possibly be done with a combination of
> sched_setaffinity + libnuma, but I'm not sure that it has quite
> the same semantics. IIUC, with cpuset cgroup changing affinity
> will cause the kernel to migrate existing memory allocations to
> the newly specified node masks, but this isn't done if you just
> use sched_setaffinity/libnuma.

The features mostly overlap.  The exact details might differ, but is
that really an overriding concern?  With autonuma, the kernel will do
the migration automatically anyway, and it's likely to show better
overall behavior by performing the migration gradually.

One of the problems with some of cgroup's features is that they're
crutches or hacks to achieve very specific goals, and I want to at
least discourage such usage.  If something can be done with the usual
programming APIs, it's better to stick with them (see the sketch
below).  They tend to be much better thought through and engineered,
and they get a lot more attention.

> For cpu accounting, you'd have to look at the overall cgroup usage
> and then subtract the usage accounted to each vcpu thread to get
> the non-vCPU thread group total. Possible but slightly tedious &
> more inaccurate since you will have timing delays getting info
> from each thread's /proc files

But you would want to keep track of how much CPU time each vCPU
consumes anyway, right?  That's a logical thing to do.

> I don't see any way to do cpu_shares, cfs_period_us and cfs_quota_us
> for the group of non-vCPU threads as a whole. You can't set these
> at the per-thread level since that is semantically very different.
> You can't set these for the process as a whole at the cgroup level,
> since that'll confine vCPU threads at the same time which is also
> semantically very different.

The above doesn't really explain why you need them.  It just describes
what you're doing right now and how that doesn't map exactly to the
usual interface.  What effect are you achieving by tuning those
scheduler params, which, BTW, are much more deeply connected to
scheduler internals and thus a lot more volatile?

Exploiting such features is dangerous for both userland and the
kernel - userland is much more likely to be affected by kernel
implementation changes, and in turn the kernel gets locked into an
interface it never intended to make widely available and is restricted
in what it can do to improve the implementation.  While quite unlikely,
people regularly talk about alternative scheduler implementations, and
things like the above mean that any such implementation would have to
emulate, of course imperfectly, knobs which may have no meaning
whatsoever to the new scheduler.

I'm sure you had good reasons for adding those, but the fact that
you're depending on them at all instead of the usual scheduler API is
not a good thing and, if at all possible, that dependency should be
removed.  This is the same theme I talked about above.  It looks like
a good feature on the surface but is detrimental in the long term.  We
want to limit access to such implementation details as much as
possible.
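For concreteness, here's an untested sketch of the kind of thing I mean
by the usual programming APIs - pin a group of helper threads with
sched_setaffinity(2) and ask for the page migration explicitly via
libnuma rather than relying on the cpuset cgroup to do it.  The tids
and the node number are made up for illustration.

/*
 * Untested sketch, not libvirt code: pin a set of helper threads to
 * one NUMA node and migrate the process's existing pages there using
 * only generic APIs (sched_setaffinity(2) + libnuma).  The tids and
 * the node number below are hypothetical.
 *
 * gcc -Wall -o pin pin.c -lnuma
 */
#define _GNU_SOURCE
#include <sched.h>
#include <numa.h>
#include <stdio.h>
#include <unistd.h>

static int pin_tid_to_node(pid_t tid, int node)
{
	struct bitmask *cpus = numa_allocate_cpumask();
	cpu_set_t set;
	unsigned int i;

	/* Translate the node into its cpumask and apply it to the thread. */
	if (numa_node_to_cpus(node, cpus) < 0) {
		perror("numa_node_to_cpus");
		return -1;
	}

	CPU_ZERO(&set);
	for (i = 0; i < cpus->size && i < CPU_SETSIZE; i++)
		if (numa_bitmask_isbitset(cpus, i))
			CPU_SET(i, &set);
	numa_free_cpumask(cpus);

	/* Affinity is per-thread on Linux, so a tid works here. */
	if (sched_setaffinity(tid, sizeof(set), &set) < 0) {
		perror("sched_setaffinity");
		return -1;
	}
	return 0;
}

int main(void)
{
	pid_t helper_tids[] = { 1234, 1235 };	/* hypothetical tids */
	int node = 1;				/* hypothetical target node */
	struct bitmask *from, *to;
	unsigned int i;

	if (numa_available() < 0)
		return 1;

	for (i = 0; i < sizeof(helper_tids) / sizeof(helper_tids[0]); i++)
		pin_tid_to_node(helper_tids[i], node);

	/*
	 * cpuset would also migrate already-allocated memory; with the
	 * generic API you simply ask for it explicitly.
	 * numa_migrate_pages() wraps migrate_pages(2); mbind() with
	 * MPOL_MF_MOVE would be the per-mapping variant.
	 */
	from = numa_allocate_nodemask();
	to = numa_allocate_nodemask();
	numa_bitmask_setbit(from, 0);
	numa_bitmask_setbit(to, node);
	if (numa_migrate_pages(getpid(), from, to) < 0)
		perror("numa_migrate_pages");
	numa_free_nodemask(from);
	numa_free_nodemask(to);
	return 0;
}

Again, just a sketch - the point is that both the affinity change and
the migration are reachable through interfaces that were designed to be
used by ordinary programs.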
So, can you please explain what benefits you're getting out of tuning
those scheduler-specific knobs that can't be obtained via the normal
scheduler API, and how much difference that actually makes?  Because if
it's something legitimately necessary, it should really be accessible
in a generic manner so that lay programs can make use of it too.

Thanks.

-- 
tejun