On 10/9/19 4:05 PM, Al Viro wrote:
> get_mempolicy(2) and related syscalls have always passed
> 1 + number of bits in nodemask as maxnodes argument - see e.g.
> copy_nodes_to_user() and get_nodes(). Or libnuma, for the userland
> side -
> static void getpol(int *oldpolicy, struct bitmask *bmp)
> {
>         if (get_mempolicy(oldpolicy, bmp->maskp, bmp->size + 1, 0, 0) < 0)
>                 numa_error("get_mempolicy");
> }
> and similar for other syscalls. However, the check for insufficient
> destination size in get_mempolicy(2) used to be
>         if (nmask != NULL && maxnode < MAX_NUMNODES)
>                 return -EINVAL;
> IOW, maxnode == MAX_NUMNODES (representing "MAX_NUMNODES - 1 bits")
> had been accepted. The reason why that hadn't messed up the libnuma
> logic used to determine the required bitmap size is that
> MAX_NUMNODES is always a power of 2 and the loop in libnuma
> is
>         nodemask_sz = 16;
>         do {
>                 nodemask_sz <<= 1;
>                 mask = realloc(mask, nodemask_sz / 8);
>                 if (!mask)
>                         return;
>         } while (get_mempolicy(&pol, mask, nodemask_sz + 1, 0, 0) < 0 && errno == EINVAL &&
>                         nodemask_sz < 4096*8);
> I.e. it's been passing 33, 65, 129, etc. until it got it large enough.

Sigh, it was silly of me to hope nobody is doing that [1]. I thought
libnuma was parsing /proc/self/status instead; IIRC I had checked [2].

> That sidesteps the boundary case - we never try to pass exactly
> MAX_NUMNODES there.
>
> However, that has changed recently, when get_mempolicy() switched
> to
>         if (nmask != NULL && maxnode < nr_node_ids)
>                 return -EINVAL;
> _That_ can trigger. Consider a box with nr_node_ids == 65.
> The first call in libnuma:set_nodemask_size() loop will
> pass 33 and fail, then we'll raise nodemask_sz to 64,
> allocate a 64bit mask and call get_mempolicy(&pol, mask, 65, 0, 0),
> which will succeed. OK, so we decide to use 64bit bitmaps, and
> subsequent getpol() will be passing 65 to get_mempolicy(2).
> Which is not a good idea, since kernel-side we'll get
>         copy_nodes_to_user(nmask, 65, &nodes)
> And that will copy only 8 bytes out of kernel-side bitmap with
> 65 bits in it...
>
> IOW, that check always should have been <=, not <; it didn't matter
> until commit 050c17f239fd ("numa: change get_mempolicy() to use
> nr_node_ids instead of MAX_NUMNODES") this year. The fix is trivial
> - we need to make that check consistent with the code that does
> actual copyin/copyout.
>
> Fixes: 050c17f239fd ("numa: change get_mempolicy() to use nr_node_ids instead of MAX_NUMNODES")
> Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>

We should have reverted 050c17f239fd, as it was fixing a patch in mmotm
that was ultimately discarded. It's not ideal e.g. for CRIU, which may
determine maxnode on an old system and keep the value even on a new
system with possibly more nodes. But the commit was too quickly pushed
into stable kernels, complicating the situation.

If we're not reverting, then

Acked-by: Vlastimil Babka <vbabka@xxxxxxx>

Thanks.

[1] https://lore.kernel.org/linux-mm/32575d26-b141-6985-833a-12d48c0dce6a@xxxxxxx/
[2] https://lore.kernel.org/linux-mm/4dab8a83-803a-56e0-6bbf-bdf581f2d1b4@xxxxxxx/

> ---
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 4ae967bcf954..e184df7633b0 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1561,7 +1561,7 @@ static int kernel_get_mempolicy(int __user *policy,
>
>         addr = untagged_addr(addr);
>
> -       if (nmask != NULL && maxnode < nr_node_ids)
> +       if (nmask != NULL && maxnode <= nr_node_ids)
>                 return -EINVAL;
>
>         err = do_get_mempolicy(&pval, &nodes, addr, flags);
>
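To make the boundary case concrete, here is a minimal userspace C model of the exchange described above, assuming a 64-bit machine. The helpers check_rejects(), probe_nodemask_sz(), bytes_copied() and bytes_needed() are hypothetical. The probing loop is reduced to the size check alone (no real get_mempolicy(2) call), and bytes_copied() assumes copy_nodes_to_user() rounds maxnode - 1 bits up to whole 64-bit words. That is consistent with the "only 8 bytes" figure in the thread but is a simplification of the real kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the kernel-side size check. maxnode follows the
 * convention from the thread: 1 + the number of bits in the user mask. */
static bool check_rejects(unsigned long maxnode, unsigned long nr_node_ids,
                          bool fixed)
{
    /* Check from 050c17f239fd: reject if maxnode < nr_node_ids.
     * Fixed check: reject if maxnode <= nr_node_ids, i.e. accept only
     * when maxnode - 1 >= nr_node_ids bits fit in the user buffer. */
    return fixed ? (maxnode <= nr_node_ids) : (maxnode < nr_node_ids);
}

/* Model of the libnuma probing loop quoted above, with the syscall
 * replaced by the size check. Returns the nodemask_sz it settles on. */
static unsigned long probe_nodemask_sz(unsigned long nr_node_ids, bool fixed)
{
    unsigned long nodemask_sz = 16;

    do {
        nodemask_sz <<= 1;
    } while (check_rejects(nodemask_sz + 1, nr_node_ids, fixed) &&
             nodemask_sz < 4096 * 8);
    return nodemask_sz;
}

/* Assumed copy-out size: maxnode - 1 bits rounded up to 64-bit words. */
static unsigned long bytes_copied(unsigned long maxnode)
{
    assert(maxnode >= 1); /* maxnode always includes the "+ 1" */
    return ((maxnode - 1 + 63) / 64) * 8;
}

/* Bytes occupied by a kernel-side bitmap of nr_node_ids bits. */
static unsigned long bytes_needed(unsigned long nr_node_ids)
{
    return ((nr_node_ids + 63) / 64) * 8;
}
```

With nr_node_ids == 65, probe_nodemask_sz(65, false) settles on 64 bits, so getpol() would pass maxnode 65; bytes_copied(65) is then 8 while bytes_needed(65) is 16, which is the truncation described above. With the fixed check, probe_nodemask_sz(65, true) moves on to 128 bits and bytes_copied(129) covers all 16 bytes. For a power-of-two count such as 64, both checks settle on 64 bits, matching the observation that the old MAX_NUMNODES check never hit the boundary.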