On Tue, Jan 17, 2017 at 5:58 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> On Tue 17-01-17 14:32:04, Peter Zijlstra wrote:
>> On Tue, Jan 17, 2017 at 02:03:03PM +0100, Michal Hocko wrote:
>> > On Sun 15-01-17 20:19:01, Tejun Heo wrote:
>> > [...]
>> > > So, what's proposed is a proper part of bpf. In terms of
>> > > implementation, cgroup helps by hosting the pointers but that doesn't
>> > > necessarily affect the conceptual structure of it. Given that, I
>> > > don't think it'd be a good idea to add anything to cgroup interface
>> > > for this feature. Introspection is great to have but this should be
>> > > introspectable together with other bpf programs using the same
>> > > mechanism. That's where it belongs.
>> >
>> > If BPF only piggy backs on top of cgroup to iterate tasks shouldn't we
>> > at least enforce that the cgroup has to be a leaf one and no further
>> > children groups can be created once there is BPF program attached?
>>
>> Why (again) this stupid constraint?
>>
>> If you want to use cgroups for tagging (like perf does), _any_ parent
>> cgroup will also tag you.
>>
>> So creating child cgroups, and placing tasks in it, should not be a
>> problem, the BPF thing should apply to all of them.
>
> This would require using hierarchical cgroup iterators to iterate over
> tasks. As per Andy's testing this doesn't seem to be the case. I haven't
> checked the implementation closely but my understanding was that using
> only cgroup specific tasks was intentional.

The current semantics are AFAIK that only the innermost cgroup that
has a hook installed is in effect. I think this is the wrong design.
I think that the right semantics are probably to support both
innermost-to-outermost and outermost-to-innermost and to select which
is appropriate for each hook.

Suppose we have a cgroup /a/b where a and b both have hooks installed.
If the hook is a socket creation or egress hook, I think that b's hook
should run first.
If b's hook rejects, then a's hook is not run. If b's hook accepts,
then a's hook is run. This way a gets the last word on any changes to
the socket settings and a sees exactly what would happen if it were to
accept.

Conversely, for ingress hooks, I think that a's hook should run first.
This way a sees the packet as it originally came in and can modify or
reject it, and then b only sees whatever a chooses to let through.

The guiding principle here is that, for actions that originate outside
the machine, the outer hooks should IMO run first and, for actions
that originate from a task in a cgroup, the innermost hooks should run
first.

--Andy
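
For illustration only, the ordering rule proposed in this message can be
modelled in a few lines of Python. This is a toy sketch, not kernel code
and not how the BPF cgroup hooks are implemented: run_hooks, make_hook,
the originates_outside flag and the representation of a hook as a
function returning accept/reject are all hypothetical names made up for
this example.

```python
from typing import Callable, Dict, List

# Hypothetical model: a hook inspects an event and returns True to
# accept it or False to reject it.
Hook = Callable[[dict], bool]


def run_hooks(path: List[str], hooks: Dict[str, Hook], event: dict,
              originates_outside: bool) -> bool:
    """Run the hooks installed along a cgroup path.

    path lists cgroups from outermost to innermost, e.g. ["a", "b"]
    for /a/b.  Actions that originate outside the machine (ingress)
    run outermost-first; actions that originate from a task in the
    cgroup (socket creation, egress) run innermost-first.  The first
    rejection stops the chain.
    """
    order = path if originates_outside else list(reversed(path))
    for cgroup in order:
        hook = hooks.get(cgroup)
        if hook is None:
            continue  # no hook installed on this cgroup
        if not hook(event):
            return False  # rejected; remaining hooks are not run
    return True


def make_hook(name: str, log: List[str], accept: bool = True) -> Hook:
    """Build a hook that records its name in log when it runs."""
    def hook(event: dict) -> bool:
        log.append(name)
        return accept
    return hook
```

With hooks on both a and b, an egress-style call
(originates_outside=False) runs b and then a, and a is skipped if b
rejects. An ingress-style call (originates_outside=True) runs a and
then b.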