On Thu, Oct 29, 2020 at 05:43:35AM +0100, Jann Horn wrote: > On Thu, Oct 29, 2020 at 3:04 AM Tycho Andersen <tycho@tycho.pizza> wrote: > > On Thu, Oct 29, 2020 at 02:42:58AM +0100, Jann Horn wrote: > > > On Mon, Oct 26, 2020 at 10:55 AM Michael Kerrisk (man-pages) > > > <mtk.manpages@xxxxxxxxx> wrote: > > > > static bool > > > > getTargetPathname(struct seccomp_notif *req, int notifyFd, > > > > char *path, size_t len) > > > > { > > > > char procMemPath[PATH_MAX]; > > > > > > > > snprintf(procMemPath, sizeof(procMemPath), "/proc/%d/mem", req->pid); > > > > > > > > int procMemFd = open(procMemPath, O_RDONLY); > > > > if (procMemFd == -1) > > > > errExit("\tS: open"); > > > > > > > > /* Check that the process whose info we are accessing is still alive. > > > > If the SECCOMP_IOCTL_NOTIF_ID_VALID operation (performed > > > > in checkNotificationIdIsValid()) succeeds, we know that the > > > > /proc/PID/mem file descriptor that we opened corresponds to the > > > > process for which we received a notification. If that process > > > > subsequently terminates, then read() on that file descriptor > > > > will return 0 (EOF). */ > > > > > > > > checkNotificationIdIsValid(notifyFd, req->id); > > > > > > > > /* Read bytes at the location containing the pathname argument > > > > (i.e., the first argument) of the mkdir(2) call */ > > > > > > > > ssize_t nread = pread(procMemFd, path, len, req->data.args[0]); > > > > if (nread == -1) > > > > errExit("pread"); > > > > > > As discussed at > > > <https://lore.kernel.org/r/CAG48ez0m4Y24ZBZCh+Tf4ORMm9_q4n7VOzpGjwGF7_Fe8EQH=Q@xxxxxxxxxxxxxx>, > > > we need to re-check checkNotificationIdIsValid() after reading remote > > > memory but before using the read value in any way. Otherwise, the > > > syscall could in the meantime get interrupted by a signal handler, the > > > signal handler could return, and then the function that performed the > > > syscall could free() allocations or return (thereby freeing buffers on > > > the stack). > > > > > > In essence, this pread() is (unavoidably) a potential use-after-free > > > read; and to make that not have any security impact, we need to check > > > whether UAF read occurred before using the read value. This should > > > probably be called out elsewhere in the manpage, too... > > > > > > Now, of course, **reading** is the easy case. The difficult case is if > > > we have to **write** to the remote process... because then we can't > > > play games like that. If we write data to a freed pointer, we're > > > screwed, that's it. (And for somewhat unrelated bonus fun, consider > > > that /proc/$pid/mem is originally intended for process debugging, > > > including installing breakpoints, and will therefore happily write > > > over "readonly" private mappings, such as typical mappings of > > > executable code.) > > > > > > So, uuuuh... I guess if anyone wants to actually write memory back to > > > the target process, we'd better come up with some dedicated API for > > > that, using an ioctl on the seccomp fd that magically freezes the > > > > By freeze here you mean a killable wait instead of an interruptible > > wait, right? > > Nope, nonkillable. > > Consider the case of vfork(), where a target process does something like this: > > void spawn_executable(char **argv, char **envv) { > pid_t child = vfork(); > if (child == 0) { > char path[1000]; > sprintf(path, ...); > execve(path, argv, envv); > } > } > > and the seccomp notifier wants to look at the execve() path (as a > somewhat silly example). The child process is just borrowing the > parent's stack, and as soon as the child either gets far enough into > execve() or dies, the parent continues using that stack. Ah ha, yes. Thanks for the explanation. Tycho _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/containers