On Sun, Jan 05, 2014 at 05:23:07AM +0000, Waskiewicz Jr, Peter P wrote:
> The processor doesn't need to understand the grouping at all, but it
> also isn't tracking things per-process that are rolled up later.
> They're tracked via the RMID resource in the hardware, which could
> correspond to a single process, or to 500 processes. It really comes
> down to the ease of managing tasks in groups for two consumers:
> 1) the end user, and 2) the process scheduler.
>
> I think I still may not be explaining how the CPU side works well
> enough for you to understand what I'm trying to do with the cgroup.
> Let me try to be a bit more clear, and if I'm still sounding vague or
> not making sense, please tell me what isn't clear and I'll try to be
> more specific. The new Documentation addition in patch 4 also has a
> good overview, but let's try this:
>
> A CPU may have 32 RMIDs in hardware. This is for the platform, not
> per core. I may want to have a single process assigned to an RMID for
> tracking, say qemu, to monitor cache usage of a specific VM. But I
> also may want to monitor cache usage of all MySQL database processes
> with another RMID, or even split specific processes of that database
> between different RMIDs. It all comes down to how the end user wants
> to monitor their specific workloads, and how those workloads are
> impacting cache usage and occupancy.
>
> With the implementation I've sent, all tasks are in RMID 0 by
> default. One can then create a subdirectory, just like with the
> cpuacct cgroup, and add tasks to that subdirectory's task list. Once
> that subdirectory's task list is enabled (through the
> cacheqos.monitor_cache handle), a free RMID is assigned from the CPU,
> and when the scheduler switches to any of the tasks in that cgroup
> under that RMID, the RMID begins monitoring the usage.
>
> The CPU side is easy and clean.
> When something in the software wants to monitor when a particular
> task is scheduled and started, write whatever RMID that task is
> assigned to (through some mechanism) to the proper MSR in the CPU.
> When that task is swapped out, clear the MSR to stop monitoring of
> that RMID. When that RMID's statistics are requested by the software
> (through some mechanism), the CPU's MSRs are written with the RMID in
> question, and the value collected so far is read out. In my case, I
> decided to use a cgroup for this "mechanism", since so much of the
> grouping and task/group association already exists and doesn't need
> to be rebuilt or re-invented.

This still doesn't explain why you can't use perf-cgroup for this.

> > In general, I'm quite strongly opposed to using cgroup as an
> > arbitrary grouping mechanism for anything other than resource
> > control, especially given that we're moving away from multiple
> > hierarchies.
>
> Just to clarify then, would the mechanism in the cpuacct cgroup to
> create a group off the root subsystem be considered
> multi-hierarchical? If not, then the intent is for this new cacheqos
> subsystem to be identical to cpuacct in that regard.
>
> This is a resource controller; it just happens to be tied to a
> hardware resource instead of an OS resource.

No, cpuacct and perf-cgroup aren't actually controllers at all. They're
resource monitors at best. The same goes for your Cache QoS Monitor: it
doesn't control anything.

_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/containers