Re: [PATCH] fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Aug 03, 2024 at 11:50:54PM GMT, Al Viro wrote:
> [in vfs.git#fixes]
> 
> copy_fd_bitmaps(new, old, count) is expected to copy the first
> count/BITS_PER_LONG bits from old->full_fds_bits[] and fill
> the rest with zeroes.  What it does is copying enough words
> (BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest.
> That works fine, *if* all bits past the cutoff point are
> clear.  Otherwise we are risking garbage from the last word
> we'd copied.
> 
> For most of the callers that is true - expand_fdtable() has
> count equal to old->max_fds, so there's no open descriptors
> past count, let alone fully occupied words in ->open_fds[],
> which is what bits in ->full_fds_bits[] correspond to.
> 
> The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds),
> which is the smallest multiple of BITS_PER_LONG that covers all
> opened descriptors below max_fds.  In the common case (copying on
> fork()) max_fds is ~0U, so all opened descriptors will be below
> it and we are fine, by the same reasons why the call in expand_fdtable()
> is safe.
> 
> Unfortunately, there is a case where max_fds is less than that
> and where we might, indeed, end up with junk in ->full_fds_bits[] -
> close_range(from, to, CLOSE_RANGE_UNSHARE) with
> 	* descriptor table being currently shared
> 	* 'to' being above the current capacity of descriptor table
> 	* 'from' being just under some chunk of opened descriptors.
> In that case we end up with observably wrong behaviour - e.g. spawn
> a child with CLONE_FILES, get all descriptors in range 0..127 open,
> then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending
> up with descriptor #128, despite #64 being observably not open.
> 
> The best way to fix that is in dup_fd() - there we both can easily check
> whether we might need to fix anything up and see which word needs the
> upper bits cleared.
>     
> Reproducer follows:
> 
> #define __GNU_SOURCE
> #include <linux/close_range.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <signal.h>
> #include <sched.h>
> #include <stdio.h>
> #include <stdbool.h>
> #include <sys/mman.h>
>     
> void is_open(int fd)
> {
> 	printf("#%d is %s\n", fd,
> 		fcntl(fd, F_GETFD) >= 0 ? "opened" : "not opened");
> }
>     
> int child(void *unused)
> {
> 	while(1) {
> 	}
> 	return 0;
> }
>     
> int main(void)
> {
> 	char *stack;
> 	pid_t pid;
> 
> 	stack = mmap(NULL, 1024*1024, PROT_READ | PROT_WRITE,
> 		     MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
> 	if (stack == MAP_FAILED) {
> 		perror("mmap");
> 		return -1;
> 	}
> 
> 	pid = clone(child, stack + 1024*1024, CLONE_FILES | SIGCHLD, NULL);
> 	if (pid == -1) {
> 		perror("clone");
> 		return -1;
> 	}
> 	for (int i = 2; i < 128; i++)
> 	    dup2(0, i);
> 	close_range(64, ~0U, CLOSE_RANGE_UNSHARE);
> 
> 	is_open(64);
> 	printf("dup(0) => %d, expected 64\n", dup(0));
> 
> 	kill(pid, 9);
> 	return 0;
> }
> 
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>

Hm, this should probably also get a fixes tag for
4b3b3b3b3b3b ("vfs: clear remainder of 'full_fds_bits' in dup_fd()")
(and possibly for the CLOSE_RANGE_UNSHARE addition).




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux