[PATCH] fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[in vfs.git#fixes]

copy_fd_bitmaps(new, old, count) is expected to copy the first
count/BITS_PER_LONG bits from old->full_fds_bits[] and fill
the rest with zeroes.  What it does is copying enough words
(BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest.
That works fine, *if* all bits past the cutoff point are
clear.  Otherwise we are risking garbage from the last word
we'd copied.

For most of the callers that is true - expand_fdtable() has
count equal to old->max_fds, so there's no open descriptors
past count, let alone fully occupied words in ->open_fds[],
which is what bits in ->full_fds_bits[] correspond to.

The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds),
which is the smallest multiple of BITS_PER_LONG that covers all
opened descriptors below max_fds.  In the common case (copying on
fork()) max_fds is ~0U, so all opened descriptors will be below
it and we are fine, by the same reasons why the call in expand_fdtable()
is safe.

Unfortunately, there is a case where max_fds is less than that
and where we might, indeed, end up with junk in ->full_fds_bits[] -
close_range(from, to, CLOSE_RANGE_UNSHARE) with
	* descriptor table being currently shared
	* 'to' being above the current capacity of descriptor table
	* 'from' being just under some chunk of opened descriptors.
In that case we end up with observably wrong behaviour - e.g. spawn
a child with CLONE_FILES, get all descriptors in range 0..127 open,
then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending
up with descriptor #128, despite #64 being observably not open.

The best way to fix that is in dup_fd() - there we both can easily check
whether we might need to fix anything up and see which word needs the
upper bits cleared.
    
Reproducer follows:

#define __GNU_SOURCE
#include <linux/close_range.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <sched.h>
#include <stdio.h>
#include <stdbool.h>
#include <sys/mman.h>
    
void is_open(int fd)
{
	printf("#%d is %s\n", fd,
		fcntl(fd, F_GETFD) >= 0 ? "opened" : "not opened");
}
    
int child(void *unused)
{
	while(1) {
	}
	return 0;
}
    
int main(void)
{
	char *stack;
	pid_t pid;

	stack = mmap(NULL, 1024*1024, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
	if (stack == MAP_FAILED) {
		perror("mmap");
		return -1;
	}

	pid = clone(child, stack + 1024*1024, CLONE_FILES | SIGCHLD, NULL);
	if (pid == -1) {
		perror("clone");
		return -1;
	}
	for (int i = 2; i < 128; i++)
	    dup2(0, i);
	close_range(64, ~0U, CLOSE_RANGE_UNSHARE);

	is_open(64);
	printf("dup(0) => %d, expected 64\n", dup(0));

	kill(pid, 9);
	return 0;
}

Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
---
diff --git a/fs/file.c b/fs/file.c
index a11e59b5d602..7f0ab8636d7c 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -380,6 +380,20 @@ struct files_struct *dup_fd(struct files_struct *oldf, unsigned int max_fds, int
 	}
 
 	copy_fd_bitmaps(new_fdt, old_fdt, open_files);
+	if (unlikely(max_fds != NR_OPEN_MAX)) {
+		/*
+		 * if there had been opened descriptors past open_files,
+		 * the last copied word in full_fds_bits might have picked
+		 * stray bits; nothing of that sort happens to open_fds and
+		 * close_on_exec, since there the part that needs to be copied
+		 * will always fall on a word boundary.
+		 */
+		unsigned int n = open_files / BITS_PER_LONG;
+		if (n % BITS_PER_LONG) {
+			unsigned long mask = BITMAP_LAST_WORD_MASK(n);
+			new_fdt->full_fds_bits[n / BITS_PER_LONG] &= mask;
+		}
+	}
 
 	old_fds = old_fdt->fd;
 	new_fds = new_fdt->fd;




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux