On 10/9/15 4:45 AM, Hannes Frederic Sowa wrote:
> Afaics this problem hasn't even been solved in perf so far; tracepoints currently hit independent of the namespace.
Yes, and that's exactly what we're trying to solve. The "demux+worker bpf programs" proposal is a work-in-progress solution to gain confidence in how to actually separate tracepoint events into namespaces before adding any new APIs to the kernel.
> For me namespacing of ebpf code is actually not that important; I would much rather control which namespace is allowed to execute ebpf in an unprivileged manner. Like Thomas wrote, a capability would be great for that, but I don't know if any new capabilities will be added.
I think we're mixing too many things here. First, I believe eBPF 'socket filters' do not need any caps. They're packet read-only and functionally very similar to classic BPF, with the distinction that packet data can be aggregated into maps and programs can be written in C. So I see no reason to restrict them per user or per namespace.

The OpenStack use case is different. There it will be prog_type_sched_cls, which can mangle packets, change skb metadata, etc. under the TC framework. These are not suitable for all users, and this patch leaves them root-only. If you're proposing to add CAP_BPF_TC to let containers use them without being CAP_SYS_ADMIN, then I agree, it is useful, but it needs a lot more safety analysis on the tc side.

Similarly for prog_type_kprobe: we can add CAP_BPF_KPROBE to let some trusted applications run unprivileged while still being able to do performance monitoring/analytics. We would need to think carefully about program restrictions, since bpf_probe_read and kernel pointer walking are an essential part of tracing.
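To make the split above concrete, here is a rough user-space sketch of the per-program-type policy being discussed. It is only a model, not kernel code: the enum, the flag names and may_load() are made up for illustration, and CAP_BPF_TC / CAP_BPF_KPROBE exist only as proposals in this thread.

```c
#include <stdbool.h>

/* Program types named in the discussion; this enum is illustrative only. */
enum prog_type {
    PROG_SOCKET_FILTER,
    PROG_SCHED_CLS,
    PROG_KPROBE,
};

/*
 * Hypothetical capability flags. HAS_BPF_TC and HAS_BPF_KPROBE stand in for
 * the proposed CAP_BPF_TC and CAP_BPF_KPROBE, which are assumptions here.
 */
enum {
    HAS_SYS_ADMIN  = 1u << 0,
    HAS_BPF_TC     = 1u << 1,
    HAS_BPF_KPROBE = 1u << 2,
};

/* Returns true if a task holding `caps` may load a program of `type`. */
bool may_load(enum prog_type type, unsigned int caps)
{
    /* CAP_SYS_ADMIN keeps access to every program type. */
    if (caps & HAS_SYS_ADMIN)
        return true;

    switch (type) {
    case PROG_SOCKET_FILTER:
        /* Packet read-only, so no capability is required. */
        return true;
    case PROG_SCHED_CLS:
        /* Can mangle packets and skb metadata under tc. */
        return (caps & HAS_BPF_TC) != 0;
    case PROG_KPROBE:
        /* Can walk kernel pointers via bpf_probe_read. */
        return (caps & HAS_BPF_KPROBE) != 0;
    }
    return false;
}
```

With this model, may_load(PROG_SOCKET_FILTER, 0) succeeds for any user, while may_load(PROG_SCHED_CLS, 0) fails until the task holds either the hypothetical tc capability or CAP_SYS_ADMIN. The per-type program restrictions that kprobe would still need are not modeled here.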