On Fri, Apr 8, 2011 at 1:38 PM, Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
> On Fri, Apr 08, 2011 at 09:00:56PM +0200, Frederic Weisbecker wrote:
>> On Fri, Apr 08, 2011 at 03:37:48AM -0400, Steven Rostedt wrote:
>> > I actually agree, as perf is more focused on per process (or group) than
>> > ftrace. But that said, I guess the issue is also, if they have a simple
>> > solution that is not invasive and suits their needs, what's the harm in
>> > accepting it?
>>
>> What about a kind of cgroup_of(path) operator that we can use on
>> filters?
>>
>>     common_pid cgroup_of(path)
>> or
>>     common_pid __cgroup_of__ path
>>
>> That way you don't bloat the tracing fast path?
>
> Note in this example, we would simply ignore the common_pid
> value and assume that pid is the one of current. This economizes
> a step to pid -> task resolution.
>

This is a decent idea, but I'm worried about the complexity of using
filters like this. Filters are written to *every* event that you want
the filter to apply to (if you set the top-level filter, it just copies
the filter to all applicable events), and this is a filter you would
mostly only want to apply to *all* events at once.

Furthermore, filters work by discarding the event *after* the event has
already been written, so all tasks will be incurring full tracing
overhead. With cgroup filtering up front, we can avoid ~90% [0] of the
overhead for untraced cgroups.

I'm also thinking that cgroups could be a way to expose tracing to
non-root users. Making it a filter doesn't work for that.

Hmm.. Maybe ftrace needs a "global filters" feature. cgroup and pid
would be prime candidates for this, perhaps there are others. These
would be an optional list of filters applied *before* writing the event
or reserving buffer space, so they could not use the event fields.
Mostly I'm thinking they would use things accessible from the current
task_struct.
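
To illustrate the difference between discarding after the write and
checking up front, here is a rough user-space sketch. It is not ftrace
code: toy_task, toy_buffer, global_filter and every function name are
made up for illustration, and I'm assuming that all filters in the
global list must match (the list could just as well be OR'ed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the bits of task_struct a global filter reads. */
struct toy_task {
	int pid;
	int cgroup_id;
};

/* Toy ring buffer: only counts how often we paid for a reservation. */
struct toy_buffer {
	size_t reserved;	/* events for which buffer space was reserved */
	size_t committed;	/* events still in the buffer after filtering */
};

/* A global filter sees only the current task, never the event fields. */
typedef bool (*global_filter_fn)(const struct toy_task *cur, const void *arg);

struct global_filter {
	global_filter_fn match;
	const void *arg;
};

static bool match_cgroup(const struct toy_task *cur, const void *arg)
{
	return cur->cgroup_id == *(const int *)arg;
}

/* Assumption: every filter must pass; an empty list matches all tasks. */
static bool global_filters_pass(const struct global_filter *f, size_t n,
				const struct toy_task *cur)
{
	for (size_t i = 0; i < n; i++)
		if (!f[i].match(cur, f[i].arg))
			return false;
	return true;
}

/* Per-event filter model: reserve and write first, discard afterwards. */
static void trace_event_post_filter(struct toy_buffer *rb,
				    const struct toy_task *cur,
				    int wanted_cgroup)
{
	assert(rb && cur);
	rb->reserved++;			/* cost is paid for every task */
	if (cur->cgroup_id == wanted_cgroup)
		rb->committed++;	/* otherwise the event is discarded */
}

/* Global filter model: check the task before touching the buffer. */
static void trace_event_pre_filter(struct toy_buffer *rb,
				   const struct toy_task *cur,
				   const struct global_filter *f, size_t n)
{
	assert(rb && cur);
	if (!global_filters_pass(f, n, cur))
		return;			/* untraced task: no reservation */
	rb->reserved++;
	rb->committed++;
}
```

With one task inside the traced cgroup and one outside, both paths
commit a single event, but the post-filter path reserves twice and the
pre-filter path reserves once. That reservation is the cost I'd like
untraced cgroups to skip.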
If we could work all that out, then I would change a couple of things:
one of my grand plans for tracing is to remove pid from every event, and
replace it with a tiny "pid_changed" event (unless "sched_switch" et al
is enabled). So I wouldn't want to attach it to common_pid at all.
Instead, I would make it a unary operator.

The operator as proposed also doesn't work with multiple hierarchies.
When you refer to a cgroup path of "/apps/container_3", are we talking
about the cgroup for cpu, or mem, or blkio, or all, or a subset? This is
what the "tracing_enabled" files in the cgroup filesystem in Vaibhav's
proposal were for. Maybe this could be an optional argument to the unary
operator.

So, the operator becomes: cgroup_of(/path) means any subsystem,
cgroup_of(/path, cpu, mem) means cpu or mem.

[0] This figure is made up. Like most statistics. ;)
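
To make the cgroup_of(/path) versus cgroup_of(/path, cpu, mem)
semantics concrete, here is a rough user-space sketch. Nothing in it is
an existing kernel interface: toy_subsys, toy_task_cgroups and
cgroup_of_filter are made-up names, and I'm assuming an exact path match
(whether descendants of /path should also match is an open question).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subsystem list; the real set depends on the kernel config. */
enum toy_subsys { SUBSYS_CPU, SUBSYS_MEM, SUBSYS_BLKIO, SUBSYS_COUNT };

#define SUBSYS_BIT(s)	(1u << (s))
#define SUBSYS_ANY	((1u << SUBSYS_COUNT) - 1)

/* The cgroup path a task belongs to in each hierarchy. */
struct toy_task_cgroups {
	const char *path[SUBSYS_COUNT];
};

/* Parsed form of cgroup_of(/path[, subsys...]). */
struct cgroup_of_filter {
	const char *path;
	unsigned int subsys_mask;	/* SUBSYS_ANY when no list is given */
};

/* True if the task sits at f->path in at least one selected subsystem. */
static bool cgroup_of(const struct cgroup_of_filter *f,
		      const struct toy_task_cgroups *t)
{
	assert(f && t);
	for (int s = 0; s < SUBSYS_COUNT; s++) {
		if (!(f->subsys_mask & SUBSYS_BIT(s)))
			continue;	/* subsystem not named in the filter */
		if (t->path[s] && strcmp(t->path[s], f->path) == 0)
			return true;
	}
	return false;
}
```

For a task that is in /apps/container_3 only in the cpu hierarchy, the
"any subsystem" form and the (cpu, mem) form both match, while a filter
restricted to blkio does not.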