RE: [PATCHSET v4] netfilter, cgroup: implement cgroup2 path match in xt_cgroup

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> -----Original Message-----
> From: netdev-owner@xxxxxxxxxxxxxxx [mailto:netdev-owner@xxxxxxxxxxxxxxx]
> On Behalf Of Tejun Heo
> Sent: Tuesday, December 8, 2015 6:39
> To: davem@xxxxxxxxxxxxx; pablo@xxxxxxxxxxxxx; kaber@xxxxxxxxx;
> kadlec@xxxxxxxxxxxxxxxxx; daniel@xxxxxxxxxxxxx; daniel.wagner@xxxxxxxxxxxx;
> nhorman@xxxxxxxxxxxxx
> Cc: lizefan@xxxxxxxxxx; hannes@xxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx;
> netfilter-devel@xxxxxxxxxxxxxxx; coreteam@xxxxxxxxxxxxx;
> cgroups@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; kernel-team@xxxxxx;
> ninasc@xxxxxx
> Subject: [PATCHSET v4] netfilter, cgroup: implement cgroup2 path match in
> xt_cgroup
> 
> Hello,
> 
> This is v4 of the xt_cgroup2 patchset.  Changes from v3 are
> 
> * Refreshed on top of net-next/master e72c932d3f8a ("cxgb3: Convert
>   simple_strtoul to kstrtox").  Dropped "cgroups: Allow dynamically
>   changing net_classid" from series as it's already applied.
> 
> The changes from v2 to v3 are
> 
> * Folded cgroup2 path matching into xt_cgroup as a new revision rather
>   than a separate xt_cgroup2 match as suggested by Pablo.
> 
> * Refreshed on top of Nina's net_cls dynamic config update fix patch.
>   I included the fix patch as part of this series to ease reviewing.
> 
> The changes from v1 to v2 are
> 
> * Instead of adding sock->sk_cgroup separately, sock->sk_cgrp_data now
>   carries either (prioidx, classid) pair or cgroup2 pointer.  This
>   avoids inflating struct sock with yet another cgroup related field.
>   Unfortunately, this does add some complexity but that's the
>   trade-off and the complexity is contained in cgroup proper.
> 
> * Various small updats as per David and Jan's reviews.
> 
> 
> In cgroup v1, dealing with cgroup membership was difficult because the
> number of membership associations was unbound.  As a result, cgroup v1
> grew several controllers whose primary purpose is either tagging
> membership or pull in configuration knobs from other subsystems so
> that cgroup membership test can be avoided.
> 
> net_cls and net_prio controllers are examples of the latter.  They
> allow configuring network-specific attributes from cgroup side so that
> network subsystem can avoid testing cgroup membership; unfortunately,
> these are not only cumbersome but also problematic.
> 
> Both net_cls and net_prio aren't properly hierarchical.  Both inherit
> configuration from the parent on creation but there's no interaction
> afterwards.  An ancestor doesn't restrict the behavior in its subtree
> in anyway and configuration changes aren't propagated downwards.
> Especially when combined with cgroup delegation, this is problematic
> because delegatees can mess up whatever network configuration
> implemented at the system level.  net_prio would allow the delegatees
> to set whatever priority value regardless of CAP_NET_ADMIN and net_cls
> the same for classid.
> 
> While it is possible to solve these issues from controller side by
> implementing hierarchical allowable ranges in both controllers, it
> would involve quite a bit of complexity in the controllers and further
> obfuscate network configuration as it becomes even more difficult to
> tell what's actually being configured looking from the network side.
> While not much can be done for v1 at this point, as membership
> handling is sane on cgroup v2, it'd be better to make cgroup matching
> behave like other network matches and classifiers than introducing
> further complications.
> 
> This patchset includes the following eight patches.
> 
>  0001-cgroup-record-ancestor-IDs-and-reimplement-cgroup_is.patch
>  0002-kernfs-implement-kernfs_walk_and_get.patch
>  0003-cgroup-implement-cgroup_get_from_path-and-expose-cgr.patch
>  0004-netprio_cgroup-limit-the-maximum-css-id-to-USHRT_MAX.patch
>  0005-net-wrap-sock-sk_cgrp_prioidx-and-sk_classid-inside-.patch
>  0006-sock-cgroup-add-sock-sk_cgroup.patch
>  0007-netfilter-prepare-xt_cgroup-for-multi-revisions.patch
>  0008-netfilter-implement-xt_cgroup-cgroup2-path-match.patch
> 
> 0001-0003 are prepatory patches in kernfs and cgroup.  These patches
> are available in the following branch which will stay stable.
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-4.5-ancestor-test
> 
> 0004-0006 consolidate two cgroup related fields in struct sock into
> cgroup_sock_data and update it so that it can alternatively carry a
> cgroup pointer.
> 
> 0007-0008 implement cgroup2 patch matching in xt_cgroup.
> 
> This patchset is on top of net-next/master and also available in the
> following git branch.
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-xt_cgroup2
> 
> I'll post iptables extension as a reply.  diffstat follows.  Thanks.
> 
>  fs/kernfs/dir.c                          |   46 +++++++++++
>  include/linux/cgroup-defs.h              |  126 +++++++++++++++++++++++++++++++
>  include/linux/cgroup.h                   |   66 +++++++++++++++-
>  include/linux/kernfs.h                   |   12 ++
>  include/net/cls_cgroup.h                 |   11 +-
>  include/net/netprio_cgroup.h             |   16 +++
>  include/net/sock.h                       |   13 ---
>  include/uapi/linux/netfilter/xt_cgroup.h |   15 +++
>  kernel/cgroup.c                          |  126 ++++++++++++++++++++++++-------
>  net/Kconfig                              |    6 +
>  net/core/dev.c                           |    3
>  net/core/netclassid_cgroup.c             |   11 +-
>  net/core/netprio_cgroup.c                |   19 ++++
>  net/core/scm.c                           |    4
>  net/core/sock.c                          |   17 ----
>  net/netfilter/nft_meta.c                 |    2
>  net/netfilter/xt_cgroup.c                |  108 ++++++++++++++++++++++----
>  17 files changed, 513 insertions(+), 88 deletions(-)
> 
> --
> tejun

Hi Tejun,
With today's linux-next (next-20151214), I still got the same back trace, which
was previously reported at http://lists.openwall.net/netdev/2015/11/23/80:

[   15.129701] BUG: spinlock bad magic on CPU#6, (systemd)/1012
[   15.129701]  lock: cgroup_sk_update_lock+0x0/0x40, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
[   15.129701] CPU: 6 PID: 1012 Comm: (systemd) Not tainted 4.4.0-rc4-next-20151214+ #3
[   15.129701] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006  05/23/2012
[   15.129701]  ffffffffae6cddc0 ffff8800e158bab0 ffffffffad317212 0000000000000000
[   15.129701]  ffff8800e158bad0 ffffffffad0a1b8c ffffffffae6cddc0 ffffffffad800ee6
[   15.129701]  ffff8800e158baf0 ffffffffad0a1c06 ffffffffae6cddc0 ffff8800ead9f080
[   15.129701] Call Trace:
[   15.129701]  [<ffffffffad317212>] dump_stack+0x44/0x62
[   15.129701]  [<ffffffffad0a1b8c>] spin_dump+0x7c/0xd0
[   15.129701]  [<ffffffffad0a1c06>] spin_bug+0x26/0x30
[   15.129701]  [<ffffffffad0a1d55>] do_raw_spin_lock+0xe5/0x120
[   15.129701]  [<ffffffffad5968b9>] _raw_spin_lock+0x39/0x40
[   15.129701]  [<ffffffffad4c27b3>] ? update_classid_sock+0x33/0x80
[   15.129701]  [<ffffffffad4c27b3>] update_classid_sock+0x33/0x80
[   15.129701]  [<ffffffffad4c2780>] ? write_classid+0x30/0x30
[   15.129701]  [<ffffffffad1e2c2a>] iterate_fd+0x5a/0x90
[   15.129701]  [<ffffffffad4c2717>] update_classid+0x47/0x80
[   15.129701]  [<ffffffffad4c2825>] cgrp_attach+0x25/0x30
[   15.129701]  [<ffffffffad0de43b>] cgroup_taskset_migrate+0x14b/0x280
[   15.129701]  [<ffffffffad0de62f>] cgroup_migrate+0xbf/0x100
[   15.129701]  [<ffffffffad0de575>] ? cgroup_migrate+0x5/0x100
[   15.129701]  [<ffffffffad0de725>] cgroup_attach_task+0xb5/0x100
[   15.129701]  [<ffffffffad0de675>] ? cgroup_attach_task+0x5/0x100
[   15.129701]  [<ffffffffad0dea0a>] __cgroup_procs_write+0x1da/0x310
[   15.129701]  [<ffffffffad0de88e>] ? __cgroup_procs_write+0x5e/0x310
[   15.129701]  [<ffffffffad0deb74>] cgroup_procs_write+0x14/0x20
[   15.129701]  [<ffffffffad0db3d0>] cgroup_file_write+0x40/0x130
[   15.129701]  [<ffffffffad242fa0>] kernfs_fop_write+0x130/0x180
[   15.129701]  [<ffffffffad1c4618>] __vfs_write+0x28/0xe0
[   15.129701]  [<ffffffffad09bb3c>] ? percpu_down_read+0x3c/0x90
[   15.129701]  [<ffffffffad1c7ddc>] ? __sb_start_write+0xdc/0xf0
[   15.129701]  [<ffffffffad1c7ddc>] ? __sb_start_write+0xdc/0xf0
[   15.129701]  [<ffffffffad1c4eb9>] vfs_write+0xa9/0x190
[   15.129701]  [<ffffffffad1c5e19>] SyS_write+0x49/0xa0
[   15.129701]  [<ffffffffad5972f6>] entry_SYSCALL_64_fastpath+0x16/0x7a

My kernel config is attached FYI.

Thanks,
-- Dexuan

Attachment: kernnel.config
Description: kernnel.config


[Index of Archives]     [Netfitler Users]     [LARTC]     [Bugtraq]     [Yosemite Forum]

  Powered by Linux