> -----Original Message----- > From: netdev-owner@xxxxxxxxxxxxxxx [mailto:netdev-owner@xxxxxxxxxxxxxxx] > On Behalf Of Tejun Heo > Sent: Tuesday, December 8, 2015 6:39 > To: davem@xxxxxxxxxxxxx; pablo@xxxxxxxxxxxxx; kaber@xxxxxxxxx; > kadlec@xxxxxxxxxxxxxxxxx; daniel@xxxxxxxxxxxxx; daniel.wagner@xxxxxxxxxxxx; > nhorman@xxxxxxxxxxxxx > Cc: lizefan@xxxxxxxxxx; hannes@xxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; > netfilter-devel@xxxxxxxxxxxxxxx; coreteam@xxxxxxxxxxxxx; > cgroups@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; kernel-team@xxxxxx; > ninasc@xxxxxx > Subject: [PATCHSET v4] netfilter, cgroup: implement cgroup2 path match in > xt_cgroup > > Hello, > > This is v4 of the xt_cgroup2 patchset. Changes from v3 are > > * Refreshed on top of net-next/master e72c932d3f8a ("cxgb3: Convert > simple_strtoul to kstrtox"). Dropped "cgroups: Allow dynamically > changing net_classid" from series as it's already applied. > > The changes from v2 to v3 are > > * Folded cgroup2 path matching into xt_cgroup as a new revision rather > than a separate xt_cgroup2 match as suggested by Pablo. > > * Refreshed on top of Nina's net_cls dynamic config update fix patch. > I included the fix patch as part of this series to ease reviewing. > > The changes from v1 to v2 are > > * Instead of adding sock->sk_cgroup separately, sock->sk_cgrp_data now > carries either (prioidx, classid) pair or cgroup2 pointer. This > avoids inflating struct sock with yet another cgroup related field. > Unfortunately, this does add some complexity but that's the > trade-off and the complexity is contained in cgroup proper. > > * Various small updats as per David and Jan's reviews. > > > In cgroup v1, dealing with cgroup membership was difficult because the > number of membership associations was unbound. As a result, cgroup v1 > grew several controllers whose primary purpose is either tagging > membership or pull in configuration knobs from other subsystems so > that cgroup membership test can be avoided. > > net_cls and net_prio controllers are examples of the latter. They > allow configuring network-specific attributes from cgroup side so that > network subsystem can avoid testing cgroup membership; unfortunately, > these are not only cumbersome but also problematic. > > Both net_cls and net_prio aren't properly hierarchical. Both inherit > configuration from the parent on creation but there's no interaction > afterwards. An ancestor doesn't restrict the behavior in its subtree > in anyway and configuration changes aren't propagated downwards. > Especially when combined with cgroup delegation, this is problematic > because delegatees can mess up whatever network configuration > implemented at the system level. net_prio would allow the delegatees > to set whatever priority value regardless of CAP_NET_ADMIN and net_cls > the same for classid. > > While it is possible to solve these issues from controller side by > implementing hierarchical allowable ranges in both controllers, it > would involve quite a bit of complexity in the controllers and further > obfuscate network configuration as it becomes even more difficult to > tell what's actually being configured looking from the network side. > While not much can be done for v1 at this point, as membership > handling is sane on cgroup v2, it'd be better to make cgroup matching > behave like other network matches and classifiers than introducing > further complications. > > This patchset includes the following eight patches. > > 0001-cgroup-record-ancestor-IDs-and-reimplement-cgroup_is.patch > 0002-kernfs-implement-kernfs_walk_and_get.patch > 0003-cgroup-implement-cgroup_get_from_path-and-expose-cgr.patch > 0004-netprio_cgroup-limit-the-maximum-css-id-to-USHRT_MAX.patch > 0005-net-wrap-sock-sk_cgrp_prioidx-and-sk_classid-inside-.patch > 0006-sock-cgroup-add-sock-sk_cgroup.patch > 0007-netfilter-prepare-xt_cgroup-for-multi-revisions.patch > 0008-netfilter-implement-xt_cgroup-cgroup2-path-match.patch > > 0001-0003 are prepatory patches in kernfs and cgroup. These patches > are available in the following branch which will stay stable. > > git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-4.5-ancestor-test > > 0004-0006 consolidate two cgroup related fields in struct sock into > cgroup_sock_data and update it so that it can alternatively carry a > cgroup pointer. > > 0007-0008 implement cgroup2 patch matching in xt_cgroup. > > This patchset is on top of net-next/master and also available in the > following git branch. > > git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-xt_cgroup2 > > I'll post iptables extension as a reply. diffstat follows. Thanks. > > fs/kernfs/dir.c | 46 +++++++++++ > include/linux/cgroup-defs.h | 126 +++++++++++++++++++++++++++++++ > include/linux/cgroup.h | 66 +++++++++++++++- > include/linux/kernfs.h | 12 ++ > include/net/cls_cgroup.h | 11 +- > include/net/netprio_cgroup.h | 16 +++ > include/net/sock.h | 13 --- > include/uapi/linux/netfilter/xt_cgroup.h | 15 +++ > kernel/cgroup.c | 126 ++++++++++++++++++++++++------- > net/Kconfig | 6 + > net/core/dev.c | 3 > net/core/netclassid_cgroup.c | 11 +- > net/core/netprio_cgroup.c | 19 ++++ > net/core/scm.c | 4 > net/core/sock.c | 17 ---- > net/netfilter/nft_meta.c | 2 > net/netfilter/xt_cgroup.c | 108 ++++++++++++++++++++++---- > 17 files changed, 513 insertions(+), 88 deletions(-) > > -- > tejun Hi Tejun, With today's linux-next (next-20151214), I still got the same back trace, which was previously reported at http://lists.openwall.net/netdev/2015/11/23/80: [ 15.129701] BUG: spinlock bad magic on CPU#6, (systemd)/1012 [ 15.129701] lock: cgroup_sk_update_lock+0x0/0x40, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0 [ 15.129701] CPU: 6 PID: 1012 Comm: (systemd) Not tainted 4.4.0-rc4-next-20151214+ #3 [ 15.129701] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006 05/23/2012 [ 15.129701] ffffffffae6cddc0 ffff8800e158bab0 ffffffffad317212 0000000000000000 [ 15.129701] ffff8800e158bad0 ffffffffad0a1b8c ffffffffae6cddc0 ffffffffad800ee6 [ 15.129701] ffff8800e158baf0 ffffffffad0a1c06 ffffffffae6cddc0 ffff8800ead9f080 [ 15.129701] Call Trace: [ 15.129701] [<ffffffffad317212>] dump_stack+0x44/0x62 [ 15.129701] [<ffffffffad0a1b8c>] spin_dump+0x7c/0xd0 [ 15.129701] [<ffffffffad0a1c06>] spin_bug+0x26/0x30 [ 15.129701] [<ffffffffad0a1d55>] do_raw_spin_lock+0xe5/0x120 [ 15.129701] [<ffffffffad5968b9>] _raw_spin_lock+0x39/0x40 [ 15.129701] [<ffffffffad4c27b3>] ? update_classid_sock+0x33/0x80 [ 15.129701] [<ffffffffad4c27b3>] update_classid_sock+0x33/0x80 [ 15.129701] [<ffffffffad4c2780>] ? write_classid+0x30/0x30 [ 15.129701] [<ffffffffad1e2c2a>] iterate_fd+0x5a/0x90 [ 15.129701] [<ffffffffad4c2717>] update_classid+0x47/0x80 [ 15.129701] [<ffffffffad4c2825>] cgrp_attach+0x25/0x30 [ 15.129701] [<ffffffffad0de43b>] cgroup_taskset_migrate+0x14b/0x280 [ 15.129701] [<ffffffffad0de62f>] cgroup_migrate+0xbf/0x100 [ 15.129701] [<ffffffffad0de575>] ? cgroup_migrate+0x5/0x100 [ 15.129701] [<ffffffffad0de725>] cgroup_attach_task+0xb5/0x100 [ 15.129701] [<ffffffffad0de675>] ? cgroup_attach_task+0x5/0x100 [ 15.129701] [<ffffffffad0dea0a>] __cgroup_procs_write+0x1da/0x310 [ 15.129701] [<ffffffffad0de88e>] ? __cgroup_procs_write+0x5e/0x310 [ 15.129701] [<ffffffffad0deb74>] cgroup_procs_write+0x14/0x20 [ 15.129701] [<ffffffffad0db3d0>] cgroup_file_write+0x40/0x130 [ 15.129701] [<ffffffffad242fa0>] kernfs_fop_write+0x130/0x180 [ 15.129701] [<ffffffffad1c4618>] __vfs_write+0x28/0xe0 [ 15.129701] [<ffffffffad09bb3c>] ? percpu_down_read+0x3c/0x90 [ 15.129701] [<ffffffffad1c7ddc>] ? __sb_start_write+0xdc/0xf0 [ 15.129701] [<ffffffffad1c7ddc>] ? __sb_start_write+0xdc/0xf0 [ 15.129701] [<ffffffffad1c4eb9>] vfs_write+0xa9/0x190 [ 15.129701] [<ffffffffad1c5e19>] SyS_write+0x49/0xa0 [ 15.129701] [<ffffffffad5972f6>] entry_SYSCALL_64_fastpath+0x16/0x7a My kernel config is attached FYI. Thanks, -- Dexuan
Attachment:
kernnel.config
Description: kernnel.config