Greg,

I've found a number of CVEs fixed upstream a while ago but still
affecting the stable branches. The following commits should fix most of
those for 4.4:

d29216842a85c7970c536108e093963f02714498 (CVE-2016-6213) [backported]
8dfbcc4351a0b6d2f2d77f367552f48ffefafe18 (CVE-2016-7913)
c58d6c93680f28ac58984af61d0a7ebf4319c241 (CVE-2016-7917)
3de81b758853f0b29c61e246679d20b513c4cfec (CVE-2016-8632) [backported]
05692d7005a364add85c6e25a6c4447ce08f913a (CVE-2016-9083, CVE-2016-9084)
9590232bb4f4cc824f3425a6e1349afbe6d6d2b7 (CVE-2016-9120)
43a6684519ab0a6c52024b5e25322476cabad893 (CVE-2017-2671)
321027c1fe77f892f4ea07846aeae08cefbbb290 (CVE-2017-6001) [backported]
8f8d28e4d6d815a391285e121c3a53a0b6cb9e7b (CVE-2017-7308)
bcc5364bdcfe131e6379363f089e7b4108d35b70 (CVE-2017-7308)

I've attached patches for those that needed work to backport.

CVE-2017-7308 isn't yet fixed in 4.9 or 4.10, but David Miller has the
patches queued up.

I should be able to provide you with a (much longer) list for 3.18
later.

Ben.

-- 
Ben Hutchings
Software Developer, Codethink Ltd.
From ca2e63098cf9189b277bf57ca3f1b2f9d03fac4b Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
Date: Wed, 28 Sep 2016 00:27:17 -0500
Subject: [PATCH 1/3] mnt: Add a per mount namespace limit on the number of mounts

commit d29216842a85c7970c536108e093963f02714498 upstream.

CAI Qian <caiqian@xxxxxxxxxx> pointed out that the semantics of shared
subtrees make it possible to create an exponentially increasing number
of mounts in a mount namespace.

mkdir /tmp/1 /tmp/2
mount --make-rshared /
for i in $(seq 1 20) ; do mount --bind /tmp/1 /tmp/2 ; done

Will create 2^20 or 1048576 mounts, which is a practical problem as
some people have managed to hit this by accident. As such
CVE-2016-6213 was assigned.

Ian Kent <raven@xxxxxxxxxx> described the situation for autofs users
as follows:

> The number of mounts for direct mount maps is usually not very large because of
> the way they are implemented, large direct mount maps can have performance
> problems. There can be anywhere from a few (likely case a few hundred) to less
> than 10000, plus mounts that have been triggered and not yet expired.
>
> Indirect mounts have one autofs mount at the root plus the number of mounts that
> have been triggered and not yet expired.
>
> The number of autofs indirect map entries can range from a few to the common
> case of several thousand and in rare cases up to between 30000 and 50000. I've
> not heard of people with maps larger than 50000 entries.
>
> The larger the number of map entries the greater the possibility for a large
> number of active mounts so it's not hard to expect cases of a 1000 or somewhat
> more active mounts.

So I am setting the default number of mounts allowed per mount
namespace at 100,000. This is more than enough for any use case I
know of, but small enough to quickly stop an exponential increase
in mounts. Which should be perfect to catch misconfigurations and
malfunctioning programs.

For anyone who needs a higher limit this can be changed by writing
to the new /proc/sys/fs/mount-max sysctl.

Tested-by: CAI Qian <caiqian@xxxxxxxxxx>
Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
[bwh: Backported to 4.4: adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@xxxxxxxxxxxxxxx>
---
 Documentation/sysctl/fs.txt |  7 +++++++
 fs/mount.h                  |  2 ++
 fs/namespace.c              | 50 ++++++++++++++++++++++++++++++++++++++++++++-
 fs/pnode.c                  |  2 +-
 fs/pnode.h                  |  1 +
 include/linux/mount.h       |  2 ++
 kernel/sysctl.c             |  9 ++++++++
 7 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index 302b5ed616a6..35e17f748ca7 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -265,6 +265,13 @@ aio-nr can grow to.
 
 ==============================================================
 
+mount-max:
+
+This denotes the maximum number of mounts that may exist
+in a mount namespace.
+
+==============================================================
+
 2. /proc/sys/fs/binfmt_misc
 ----------------------------------------------------------
 
diff --git a/fs/mount.h b/fs/mount.h
index 3dc7dea5a357..13a4ebbbaa74 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -13,6 +13,8 @@ struct mnt_namespace {
 	u64			seq;	/* Sequence number to prevent loops */
 	wait_queue_head_t	poll;
 	u64			event;
+	unsigned int		mounts; /* # of mounts in the namespace */
+	unsigned int		pending_mounts;
 };
 
 struct mnt_pcp {
diff --git a/fs/namespace.c b/fs/namespace.c
index 7df3d406d3e0..f26d18d69712 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -27,6 +27,9 @@
 #include "pnode.h"
 #include "internal.h"
 
+/* Maximum number of mounts in a mount namespace */
+unsigned int sysctl_mount_max __read_mostly = 100000;
+
 static unsigned int m_hash_mask __read_mostly;
 static unsigned int m_hash_shift __read_mostly;
 static unsigned int mp_hash_mask __read_mostly;
@@ -925,6 +928,9 @@ static void commit_tree(struct mount *mnt)
 
 	list_splice(&head, n->list.prev);
 
+	n->mounts += n->pending_mounts;
+	n->pending_mounts = 0;
+
 	__attach_mnt(mnt, parent);
 	touch_mnt_namespace(n);
 }
@@ -1445,11 +1451,16 @@ static void umount_tree(struct mount *mnt, enum umount_tree_flags how)
 	propagate_umount(&tmp_list);
 
 	while (!list_empty(&tmp_list)) {
+		struct mnt_namespace *ns;
 		bool disconnect;
 		p = list_first_entry(&tmp_list, struct mount, mnt_list);
 		list_del_init(&p->mnt_expire);
 		list_del_init(&p->mnt_list);
-		__touch_mnt_namespace(p->mnt_ns);
+		ns = p->mnt_ns;
+		if (ns) {
+			ns->mounts--;
+			__touch_mnt_namespace(ns);
+		}
 		p->mnt_ns = NULL;
 		if (how & UMOUNT_SYNC)
 			p->mnt.mnt_flags |= MNT_SYNC_UMOUNT;
@@ -1850,6 +1861,28 @@ static int invent_group_ids(struct mount *mnt, bool recurse)
 	return 0;
 }
 
+int count_mounts(struct mnt_namespace *ns, struct mount *mnt)
+{
+	unsigned int max = READ_ONCE(sysctl_mount_max);
+	unsigned int mounts = 0, old, pending, sum;
+	struct mount *p;
+
+	for (p = mnt; p; p = next_mnt(p, mnt))
+		mounts++;
+
+	old = ns->mounts;
+	pending = ns->pending_mounts;
+	sum = old + pending;
+	if ((old > sum) ||
+	    (pending > sum) ||
+	    (max < sum) ||
+	    (mounts > (max - sum)))
+		return -ENOSPC;
+
+	ns->pending_mounts = pending + mounts;
+	return 0;
+}
+
 /*
  *  @source_mnt : mount tree to be attached
  *  @nd         : place the mount tree @source_mnt is attached
@@ -1919,6 +1952,7 @@ static int attach_recursive_mnt(struct mount *source_mnt,
 			struct path *parent_path)
 {
 	HLIST_HEAD(tree_list);
+	struct mnt_namespace *ns = dest_mnt->mnt_ns;
 	struct mountpoint *smp;
 	struct mount *child, *p;
 	struct hlist_node *n;
@@ -1931,6 +1965,13 @@ static int attach_recursive_mnt(struct mount *source_mnt,
 	if (IS_ERR(smp))
 		return PTR_ERR(smp);
 
+	/* Is there space to add these mounts to the mount namespace? */
+	if (!parent_path) {
+		err = count_mounts(ns, source_mnt);
+		if (err)
+			goto out;
+	}
+
 	if (IS_MNT_SHARED(dest_mnt)) {
 		err = invent_group_ids(source_mnt, true);
 		if (err)
@@ -1970,11 +2011,14 @@ static int attach_recursive_mnt(struct mount *source_mnt,
  out_cleanup_ids:
 	while (!hlist_empty(&tree_list)) {
 		child = hlist_entry(tree_list.first, struct mount, mnt_hash);
+		child->mnt_parent->mnt_ns->pending_mounts = 0;
 		umount_tree(child, UMOUNT_SYNC);
 	}
 	unlock_mount_hash();
 	cleanup_group_ids(source_mnt, NULL);
  out:
+	ns->pending_mounts = 0;
+
 	read_seqlock_excl(&mount_lock);
 	put_mountpoint(smp);
 	read_sequnlock_excl(&mount_lock);
@@ -2804,6 +2848,8 @@ static struct mnt_namespace *alloc_mnt_ns(struct user_namespace *user_ns)
 	init_waitqueue_head(&new_ns->poll);
 	new_ns->event = 0;
 	new_ns->user_ns = get_user_ns(user_ns);
+	new_ns->mounts = 0;
+	new_ns->pending_mounts = 0;
 	return new_ns;
 }
 
@@ -2853,6 +2899,7 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
 	q = new;
 	while (p) {
 		q->mnt_ns = new_ns;
+		new_ns->mounts++;
 		if (new_fs) {
 			if (&p->mnt == new_fs->root.mnt) {
 				new_fs->root.mnt = mntget(&q->mnt);
@@ -2891,6 +2938,7 @@ static struct mnt_namespace *create_mnt_ns(struct vfsmount *m)
 		struct mount *mnt = real_mount(m);
 		mnt->mnt_ns = new_ns;
 		new_ns->root = mnt;
+		new_ns->mounts++;
 		list_add(&mnt->mnt_list, &new_ns->list);
 	} else {
 		mntput(m);
diff --git a/fs/pnode.c b/fs/pnode.c
index b9f2af59b9a6..b394ca5307ec 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -259,7 +259,7 @@ static int propagate_one(struct mount *m)
 		read_sequnlock_excl(&mount_lock);
 	}
 	hlist_add_head(&child->mnt_hash, list);
-	return 0;
+	return count_mounts(m->mnt_ns, child);
 }
 
 /*
diff --git a/fs/pnode.h b/fs/pnode.h
index 623f01772bec..dc87e65becd2 100644
--- a/fs/pnode.h
+++ b/fs/pnode.h
@@ -54,4 +54,5 @@ void mnt_change_mountpoint(struct mount *parent, struct mountpoint *mp,
 struct mount *copy_tree(struct mount *, struct dentry *, int);
 bool is_path_reachable(struct mount *, struct dentry *,
 		       const struct path *root);
+int count_mounts(struct mnt_namespace *ns, struct mount *mnt);
 #endif /* _LINUX_PNODE_H */
diff --git a/include/linux/mount.h b/include/linux/mount.h
index f822c3c11377..dc6cd800cd5d 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -95,4 +95,6 @@ extern void mark_mounts_for_expiry(struct list_head *mounts);
 
 extern dev_t name_to_dev_t(const char *name);
 
+extern unsigned int sysctl_mount_max;
+
 #endif /* _LINUX_MOUNT_H */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 2f0d157258a2..300d64162aff 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -65,6 +65,7 @@
 #include <linux/sched/sysctl.h>
 #include <linux/kexec.h>
 #include <linux/bpf.h>
+#include <linux/mount.h>
 
 #include <asm/uaccess.h>
 #include <asm/processor.h>
@@ -1749,6 +1750,14 @@ static struct ctl_table fs_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_doulongvec_minmax,
 	},
+	{
+		.procname	= "mount-max",
+		.data		= &sysctl_mount_max,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &one,
+	},
 	{ }
 };
 
-- 
2.11.0
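
A note on the accounting above: count_mounts() is written as a chain of
comparisons against the running sum precisely so that unsigned wrap-around
cannot sneak an oversized mount tree past the limit, and propagated mounts are
staged in pending_mounts until commit_tree() folds them in. Below is a small
stand-alone sketch of the same comparison structure; the counter values and the
would_exceed_limit() helper are made up for illustration, only the shape of the
check mirrors the patch:

#include <stdio.h>

/* Made-up counter values standing in for the per-namespace fields. */
static unsigned int ns_mounts = 90000;		/* mounts already committed */
static unsigned int ns_pending_mounts = 8000;	/* staged by propagation */
static unsigned int mount_max = 100000;		/* /proc/sys/fs/mount-max */

/* Same comparison structure as count_mounts(): 0 if 'mounts' new mounts fit. */
static int would_exceed_limit(unsigned int mounts)
{
	unsigned int old = ns_mounts;
	unsigned int pending = ns_pending_mounts;
	unsigned int sum = old + pending;

	/* old > sum or pending > sum means the unsigned addition wrapped. */
	if ((old > sum) || (pending > sum) ||
	    (mount_max < sum) || (mounts > (mount_max - sum)))
		return -1;
	return 0;
}

int main(void)
{
	printf("adding 1000 mounts: %s\n",
	       would_exceed_limit(1000) ? "rejected" : "allowed");
	printf("adding 5000 mounts: %s\n",
	       would_exceed_limit(5000) ? "rejected" : "allowed");
	return 0;
}

Anyone who genuinely needs more mounts can raise the limit at run time by
writing a larger value to the new /proc/sys/fs/mount-max sysctl.
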
From 7975af4c56dcc79851602f3154c06642c89f5ce3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Michal=20Kube=C4=8Dek?= <mkubecek@xxxxxxx>
Date: Fri, 2 Dec 2016 09:33:41 +0100
Subject: [PATCH 2/3] tipc: check minimum bearer MTU
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

commit 3de81b758853f0b29c61e246679d20b513c4cfec upstream.

Qian Zhang (张谦) reported a potential socket buffer overflow in
tipc_msg_build() which is also known as CVE-2016-8632: due to
insufficient checks, a buffer overflow can occur if MTU is too short
for even tipc headers. As anyone can set device MTU in a user/net
namespace, this issue can be abused by a regular user.

As agreed in the discussion on Ben Hutchings' original patch, we should
check the MTU at the moment a bearer is attached rather than for each
processed packet. We also need to repeat the check when bearer MTU is
adjusted to new device MTU.

UDP case also needs a check to avoid overflow when calculating bearer
MTU.

Fixes: b97bf3fd8f6a ("[TIPC] Initial merge")
Signed-off-by: Michal Kubecek <mkubecek@xxxxxxx>
Reported-by: Qian Zhang (张谦) <zhangqian-c@xxxxxx>
Acked-by: Ying Xue <ying.xue@xxxxxxxxxxxxx>
Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
[bwh: Backported to 4.4:
 - Adjust context
 - NETDEV_GOING_DOWN and NETDEV_CHANGEMTU cases in net notifier were combined]
Signed-off-by: Ben Hutchings <ben.hutchings@xxxxxxxxxxxxxxx>
---
 net/tipc/bearer.c    | 13 +++++++++++--
 net/tipc/bearer.h    | 13 +++++++++++++
 net/tipc/udp_media.c |  5 +++++
 3 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index 648f2a67f314..cb1381513c82 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -381,6 +381,10 @@ int tipc_enable_l2_media(struct net *net, struct tipc_bearer *b,
 	dev = dev_get_by_name(net, driver_name);
 	if (!dev)
 		return -ENODEV;
+	if (tipc_mtu_bad(dev, 0)) {
+		dev_put(dev);
+		return -EINVAL;
+	}
 
 	/* Associate TIPC bearer with L2 bearer */
 	rcu_assign_pointer(b->media_ptr, dev);
@@ -570,14 +574,19 @@ static int tipc_l2_device_event(struct notifier_block *nb, unsigned long evt,
 	if (!b_ptr)
 		return NOTIFY_DONE;
 
-	b_ptr->mtu = dev->mtu;
-
 	switch (evt) {
 	case NETDEV_CHANGE:
 		if (netif_carrier_ok(dev))
 			break;
 	case NETDEV_GOING_DOWN:
+		tipc_reset_bearer(net, b_ptr);
+		break;
 	case NETDEV_CHANGEMTU:
+		if (tipc_mtu_bad(dev, 0)) {
+			bearer_disable(net, b_ptr);
+			break;
+		}
+		b_ptr->mtu = dev->mtu;
 		tipc_reset_bearer(net, b_ptr);
 		break;
 	case NETDEV_CHANGEADDR:
diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h
index 552185bc4773..5f11e18b1fa1 100644
--- a/net/tipc/bearer.h
+++ b/net/tipc/bearer.h
@@ -39,6 +39,7 @@
 
 #include "netlink.h"
 #include "core.h"
+#include "msg.h"
 #include <net/genetlink.h>
 
 #define MAX_MEDIA	3
@@ -61,6 +62,9 @@
 #define TIPC_MEDIA_TYPE_IB	2
 #define TIPC_MEDIA_TYPE_UDP	3
 
+/* minimum bearer MTU */
+#define TIPC_MIN_BEARER_MTU	(MAX_H_SIZE + INT_H_SIZE)
+
 /**
  * struct tipc_node_map - set of node identifiers
  * @count: # of nodes in set
@@ -226,4 +230,13 @@ void tipc_bearer_xmit(struct net *net, u32 bearer_id,
 void tipc_bearer_bc_xmit(struct net *net, u32 bearer_id,
 			 struct sk_buff_head *xmitq);
 
+/* check if device MTU is too low for tipc headers */
+static inline bool tipc_mtu_bad(struct net_device *dev, unsigned int reserve)
+{
+	if (dev->mtu >= TIPC_MIN_BEARER_MTU + reserve)
+		return false;
+	netdev_warn(dev, "MTU too low for tipc bearer\n");
+	return true;
+}
+
 #endif	/* _TIPC_BEARER_H */
diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
index 6af78c6276b4..39497a40b96b 100644
--- a/net/tipc/udp_media.c
+++ b/net/tipc/udp_media.c
@@ -376,6 +376,11 @@ static int tipc_udp_enable(struct net *net, struct tipc_bearer *b,
 		udp_conf.local_ip.s_addr = htonl(INADDR_ANY);
 		udp_conf.use_udp_checksums = false;
 		ub->ifindex = dev->ifindex;
+		if (tipc_mtu_bad(dev, sizeof(struct iphdr) +
+				      sizeof(struct udphdr))) {
+			err = -EINVAL;
+			goto err;
+		}
 		b->mtu = dev->mtu - sizeof(struct iphdr)
 			- sizeof(struct udphdr);
 #if IS_ENABLED(CONFIG_IPV6)
-- 
2.11.0
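
To make the MTU arithmetic above concrete: a bearer is refused (or disabled on
NETDEV_CHANGEMTU) whenever the device MTU cannot carry even the largest TIPC
header, and the UDP bearer additionally reserves room for the IP and UDP
encapsulation headers before computing b->mtu. The stand-alone sketch below
mirrors the shape of tipc_mtu_bad(); the two header-size constants are
placeholders rather than the real MAX_H_SIZE/INT_H_SIZE values from
net/tipc/msg.h:

#include <stdbool.h>
#include <stdio.h>

/* Placeholder header sizes -- stand-ins for MAX_H_SIZE and INT_H_SIZE. */
#define FAKE_MAX_H_SIZE		60
#define FAKE_INT_H_SIZE		40
#define MIN_BEARER_MTU		(FAKE_MAX_H_SIZE + FAKE_INT_H_SIZE)

#define IP_HDR_LEN		20	/* sizeof(struct iphdr), no options */
#define UDP_HDR_LEN		8	/* sizeof(struct udphdr) */

/* Mirrors the shape of tipc_mtu_bad(): true if the device MTU is too small. */
static bool mtu_bad(unsigned int dev_mtu, unsigned int reserve)
{
	if (dev_mtu >= MIN_BEARER_MTU + reserve)
		return false;
	fprintf(stderr, "MTU %u too low for tipc bearer\n", dev_mtu);
	return true;
}

int main(void)
{
	/* Plain L2 bearer: no encapsulation reserve, as in tipc_enable_l2_media(). */
	printf("l2 bearer, MTU 1500: %s\n", mtu_bad(1500, 0) ? "rejected" : "ok");
	printf("l2 bearer, MTU 60:   %s\n", mtu_bad(60, 0) ? "rejected" : "ok");

	/* UDP bearer: reserve IP + UDP header room, as in tipc_udp_enable(). */
	printf("udp bearer, MTU 120: %s\n",
	       mtu_bad(120, IP_HDR_LEN + UDP_HDR_LEN) ? "rejected" : "ok");
	return 0;
}
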
From 325e7202d8d71aeb48034bf4f692d57146c95370 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Date: Wed, 11 Jan 2017 21:09:50 +0100
Subject: [PATCH 3/3] perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race

commit 321027c1fe77f892f4ea07846aeae08cefbbb290 upstream.

Di Shen reported a race between two concurrent sys_perf_event_open()
calls where both try and move the same pre-existing software group
into a hardware context.

The problem is exactly that described in commit:

  f63a8daa5812 ("perf: Fix event->ctx locking")

... where, while we wait for a ctx->mutex acquisition, the event->ctx
relation can have changed under us.

That very same commit failed to recognise sys_perf_event_context() as an
external access vector to the events and thereby didn't apply the
established locking rules correctly.

So while one sys_perf_event_open() call is stuck waiting on
mutex_lock_double(), the other (which owns said locks) moves the group
about. So by the time the former sys_perf_event_open() acquires the
locks, the context we've acquired is stale (and possibly dead).

Apply the established locking rules as per perf_event_ctx_lock_nested()
to the mutex_lock_double() for the 'move_group' case. This obviously
means we need to validate state after we acquire the locks.

Reported-by: Di Shen (Keen Lab)
Tested-by: John Dias <joaodias@xxxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Cc: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
Cc: Kees Cook <keescook@xxxxxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Min Chong <mchong@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Stephane Eranian <eranian@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Vince Weaver <vincent.weaver@xxxxxxxxx>
Fixes: f63a8daa5812 ("perf: Fix event->ctx locking")
Link: http://lkml.kernel.org/r/20170106131444.GZ3174@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
[bwh: Backported to 4.4:
 - Test perf_event::group_flags instead of group_caps
 - Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@xxxxxxxxxxxxxxx>
---
 kernel/events/core.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 53 insertions(+), 4 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index e4b5494f05f8..784ab8fe8714 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8250,6 +8250,37 @@ static int perf_event_set_clock(struct perf_event *event, clockid_t clk_id)
 	return 0;
 }
 
+/*
+ * Variation on perf_event_ctx_lock_nested(), except we take two context
+ * mutexes.
+ */
+static struct perf_event_context *
+__perf_event_ctx_lock_double(struct perf_event *group_leader,
+			     struct perf_event_context *ctx)
+{
+	struct perf_event_context *gctx;
+
+again:
+	rcu_read_lock();
+	gctx = READ_ONCE(group_leader->ctx);
+	if (!atomic_inc_not_zero(&gctx->refcount)) {
+		rcu_read_unlock();
+		goto again;
+	}
+	rcu_read_unlock();
+
+	mutex_lock_double(&gctx->mutex, &ctx->mutex);
+
+	if (group_leader->ctx != gctx) {
+		mutex_unlock(&ctx->mutex);
+		mutex_unlock(&gctx->mutex);
+		put_ctx(gctx);
+		goto again;
+	}
+
+	return gctx;
+}
+
 /**
  * sys_perf_event_open - open a performance event, associate it to a task/cpu
  *
@@ -8486,8 +8517,26 @@ SYSCALL_DEFINE5(perf_event_open,
 	}
 
 	if (move_group) {
-		gctx = group_leader->ctx;
-		mutex_lock_double(&gctx->mutex, &ctx->mutex);
+		gctx = __perf_event_ctx_lock_double(group_leader, ctx);
+
+		/*
+		 * Check if we raced against another sys_perf_event_open() call
+		 * moving the software group underneath us.
+		 */
+		if (!(group_leader->group_flags & PERF_GROUP_SOFTWARE)) {
+			/*
+			 * If someone moved the group out from under us, check
+			 * if this new event wound up on the same ctx, if so
+			 * its the regular !move_group case, otherwise fail.
+			 */
+			if (gctx != ctx) {
+				err = -EINVAL;
+				goto err_locked;
+			} else {
+				perf_event_ctx_unlock(group_leader, gctx);
+				move_group = 0;
+			}
+		}
 	} else {
 		mutex_lock(&ctx->mutex);
 	}
@@ -8582,7 +8631,7 @@ SYSCALL_DEFINE5(perf_event_open,
 	perf_unpin_context(ctx);
 
 	if (move_group)
-		mutex_unlock(&gctx->mutex);
+		perf_event_ctx_unlock(group_leader, gctx);
 	mutex_unlock(&ctx->mutex);
 
 	if (task) {
@@ -8610,7 +8659,7 @@ SYSCALL_DEFINE5(perf_event_open,
 
 err_locked:
 	if (move_group)
-		mutex_unlock(&gctx->mutex);
+		perf_event_ctx_unlock(group_leader, gctx);
 	mutex_unlock(&ctx->mutex);
 /* err_file: */
 	fput(event_file);
-- 
2.11.0
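
The fix above is an instance of a general pattern: pin the object a pointer
currently refers to, take its lock (possibly sleeping), then re-check that the
pointer still refers to the same object and retry if it moved, which is what
__perf_event_ctx_lock_double() does with event->ctx. Below is a minimal
user-space sketch of that pattern using pthreads; all names are illustrative
and the RCU/refcount-zero handling of the kernel version is deliberately
omitted:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

struct ctx {
	pthread_mutex_t mutex;
	atomic_int refcount;
};

/* The pointer other threads may retarget, analogous to event->ctx. */
static struct ctx *_Atomic current_ctx;

static void put_ctx(struct ctx *c)
{
	/* Drop the pin taken in lock_current_ctx(); freeing is omitted here. */
	atomic_fetch_sub(&c->refcount, 1);
}

/*
 * Lock whatever current_ctx points at, guarding against the pointer being
 * switched while we slept on the mutex -- the same shape as the helper
 * added by the patch above.
 */
static struct ctx *lock_current_ctx(void)
{
	struct ctx *c;

again:
	c = atomic_load(&current_ctx);
	atomic_fetch_add(&c->refcount, 1);	/* pin it so it stays around */

	pthread_mutex_lock(&c->mutex);		/* may sleep for a long time */

	if (atomic_load(&current_ctx) != c) {	/* moved underneath us? */
		pthread_mutex_unlock(&c->mutex);
		put_ctx(c);
		goto again;			/* start over on the new object */
	}
	return c;				/* locked and still current */
}

int main(void)
{
	static struct ctx a = { PTHREAD_MUTEX_INITIALIZER, 1 };

	atomic_store(&current_ctx, &a);

	struct ctx *c = lock_current_ctx();
	printf("locked ctx %p, refcount now %d\n", (void *)c,
	       atomic_load(&c->refcount));
	pthread_mutex_unlock(&c->mutex);
	put_ctx(c);
	return 0;
}

The re-check after pthread_mutex_lock() is what closes the window in which
another thread retargets the pointer while we sleep on the mutex, just as the
group_leader->ctx re-check does in the kernel helper.
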