On Thu, Jan 16, 2020 at 8:39 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > > > > > > On 1/15/20 6:54 AM, Dan Carpenter wrote:
> > > > > > > > What we are trying to do is change the '=' character to a NUL terminator
> > > > > > > > and then at the end of the function we restore it back to an '='. The
> > > > > > > > problem is there are two error paths where we jump to the end of the
> > > > > > > > function before we have replaced the '=' with NUL. We end up putting
> > > > > > > > the '=' in the wrong place (possibly one element before the start of
> > > > > > > > the buffer).
> > > > > > >
> > > > > > > Bleh.
> > > > > > >
> > > > > > > > Reported-by: syzbot+e64a13c5369a194d67df@xxxxxxxxxxxxxxxxxxxxxxxxx
> > > > > > > > Fixes: 095f1fc4ebf3 ("mempolicy: rework shmem mpol parsing and display")
> > > > > > > > Signed-off-by: Dan Carpenter <dan.carpenter@xxxxxxxxxx>
> > > > > > >
> > > > > > > Acked-by: Vlastimil Babka <vbabka@xxxxxxx>
> > > > > > >
> > > > > > > CC stable perhaps? Can this (tmpfs mount options parsing AFAICS?) become
> > > > > > > part of unprivileged operation in some scenarios?
> > > > > >
> > > > > > Yes, tmpfs can be mounted by any user inside of a user namespace.
> > > > >
> > > > > Huh, is there any restriction though? It is certainly not nice to have
> > > > > an arbitrary memory allocated without a way of reclaiming it and OOM
> > > > > killer wouldn't help for shmem.
> > > >
> > > > The last time I checked there were hundreds of ways to allocate
> > > > arbitrary amounts of memory without any restrictions by any user. The
> > > > example at hand was setting up GB-sized netfilter tables in netns
> > > > under userns. It's not subject to ulimit/memcg.
> > >
> > > That's bad!
> > >
> > > > Most kmalloc/vmalloc's are not accounted and can be abused.
> > >
> > > Many of those should be bound to some objects and if those are directly
> > > controllable by userspace then we should account at least.
> > > And if they are not bound to a process life time then restricted.
> >
> > I see you actually added one GFP_ACCOUNT in netfilter in "netfilter:
> > x_tables: do not fail xt_alloc_table_info too easilly". But it seems
> > there are more:
> >
> > $ grep vmalloc\( net/netfilter/*.c
> > net/netfilter/nf_tables_api.c: return kvmalloc(alloc, GFP_KERNEL);
> > net/netfilter/x_tables.c: xt[af].compat_tab = vmalloc(mem);
> > net/netfilter/x_tables.c: mem = vmalloc(len);
> > net/netfilter/x_tables.c: info = kvmalloc(sz, GFP_KERNEL_ACCOUNT);
> > net/netfilter/xt_hashlimit.c: /* FIXME: don't use vmalloc() here or anywhere else -HW */
> > net/netfilter/xt_hashlimit.c: hinfo = vmalloc(struct_size(hinfo, hash, size));
> >
> > These are not bound to processes/threads as namespaces are orthogonal to tasks.
>
> I cannot really comment on those. This is for networking people to
> examine and find out whether they allow an untrusted user to runaway.

Unless I am missing an elephant in this whole picture, kernel code
contains 20K+ unaccounted allocations and, if I am not mistaken, few of
them were audited and are intentionally unaccounted rather than
unaccounted just because it's the default. So if we want DoS
protection, it's really for every kernel developer/maintainer to audit
and fix these allocation sites. And since we have a monolithic kernel,
a single unaccounted allocation may compromise the whole kernel. I
assume we would need something like GFP_UNACCOUNTED to mark audited
allocations that don't need accounting, and then slowly reduce the
number of allocations that are marked as neither ACCOUNTED nor
UNACCOUNTED.

> > Somebody told me that it's not good to use GFP_ACCOUNT if the
> > allocation is not tied to the lifetime of the process. Is it still
> > true?
>
> Those are more tricky. Mostly because there is no way to reclaim the
> memory once the hard limit is hit. Even the memcg oom killer will not
> help much. So a care should be taken when adding GFP_ACCOUNT for those.
> On the other hand it would prevent an unbounded allocations at least
> so the DoS would be reduced to the hard limited memcg.

What exactly is this care in practice? It seems that in
a148ce15375fc664ad64762c751c0c2aecb2cafe you just added it and the
allocation is not tied to the process. At least I don't see any
explanation as to why that one is safe, while accounting other similar
allocations is not...
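
To make the GFP_UNACCOUNTED idea a bit more concrete, here is a rough
sketch of how the audit state could be tracked from outside the kernel.
It is only an illustration under stated assumptions: GFP_UNACCOUNTED is
the hypothetical marker suggested above and does not exist today, the
names classify() and summarize() are made up for this mail, and the
matching is as crude as the grep quoted earlier (purely textual, so it
also catches comments and misses allocation wrappers).

```python
import re
from collections import Counter

# Textual match for the common allocators (kmalloc, kzalloc, vmalloc,
# vzalloc, kvmalloc, kvzalloc). Deliberately simplistic, like the grep.
ALLOC_CALL = re.compile(r"\b(?:kv|k|v)[mz]alloc\s*\(")

# Real kernel spellings that request memcg accounting.
ACCOUNTED = re.compile(r"\b(?:__GFP_ACCOUNT|GFP_KERNEL_ACCOUNT)\b")

# HYPOTHETICAL marker for "audited, intentionally not accounted".
# This flag is an assumption; it is not part of the kernel.
UNACCOUNTED = re.compile(r"\bGFP_UNACCOUNTED\b")


def classify(line):
    """Return the audit state of one source line, or None if the line
    does not look like an allocation call site."""
    if not ALLOC_CALL.search(line):
        return None
    if UNACCOUNTED.search(line):
        return "audited-unaccounted"
    if ACCOUNTED.search(line):
        return "accounted"
    # Neither marker present: nobody has made an explicit decision yet.
    return "unreviewed"


def summarize(lines):
    """Count allocation call sites per audit state."""
    counts = Counter()
    for line in lines:
        state = classify(line)
        if state is not None:
            counts[state] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Example: grep -rn 'alloc(' net/netfilter/ | python3 audit.py
    for state, count in sorted(summarize(sys.stdin).items()):
        print(f"{state}: {count}")
```

Fed with the grep output quoted above, only the
kvmalloc(sz, GFP_KERNEL_ACCOUNT) line would count as accounted;
everything else stays "unreviewed" until somebody audits it and either
adds accounting or the explicit marker. The goal would then be to drive
the "unreviewed" count down over time.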