On 10/30/20 8:20 PM, Jann Horn wrote: > On Thu, Oct 29, 2020 at 8:14 PM Michael Kerrisk (man-pages) > <mtk.manpages@xxxxxxxxx> wrote: >> On 10/29/20 2:42 AM, Jann Horn wrote: >>> As discussed at >>> <https://lore.kernel.org/r/CAG48ez0m4Y24ZBZCh+Tf4ORMm9_q4n7VOzpGjwGF7_Fe8EQH=Q@xxxxxxxxxxxxxx>, >>> we need to re-check checkNotificationIdIsValid() after reading remote >>> memory but before using the read value in any way. Otherwise, the >>> syscall could in the meantime get interrupted by a signal handler, the >>> signal handler could return, and then the function that performed the >>> syscall could free() allocations or return (thereby freeing buffers on >>> the stack). >>> >>> In essence, this pread() is (unavoidably) a potential use-after-free >>> read; and to make that not have any security impact, we need to check >>> whether UAF read occurred before using the read value. This should >>> probably be called out elsewhere in the manpage, too... >>> >>> Now, of course, **reading** is the easy case. The difficult case is if >>> we have to **write** to the remote process... because then we can't >>> play games like that. If we write data to a freed pointer, we're >>> screwed, that's it. (And for somewhat unrelated bonus fun, consider >>> that /proc/$pid/mem is originally intended for process debugging, >>> including installing breakpoints, and will therefore happily write >>> over "readonly" private mappings, such as typical mappings of >>> executable code.) >>> >>> So, uuuuh... I guess if anyone wants to actually write memory back to >>> the target process, we'd better come up with some dedicated API for >>> that, using an ioctl on the seccomp fd that magically freezes the >>> target process inside the syscall while writing to its memory, or >>> something like that? And until then, the manpage should have a big fat >>> warning that writing to the target's memory is simply not possible >>> (safely). >> >> Thank you for your very clear explanation! It turned out to be >> trivially easy to demonstrate this issue with a slightly modified >> version of my program. >> >> As well as the change to the code example that I already mentioned >> my reply of a few hours ago, I've added the following text to the >> page: >> >> Caveats regarding the use of /proc/[tid]/mem >> The discussion above noted the need to use the >> SECCOMP_IOCTL_NOTIF_ID_VALID ioctl(2) when opening the >> /proc/[tid]/mem file of the target to avoid the possibility of >> accessing the memory of the wrong process in the event that the >> target terminates and its ID is recycled by another (unrelated) >> thread. However, the use of this ioctl(2) operation is also >> necessary in other situations, as explained in the following >> pargraphs. > > (nit: paragraphs) I spotted that one also already. But thanks for reading carefully! >> Consider the following scenario, where the supervisor tries to >> read the pathname argument of a target's blocked mount(2) system >> call: > [...] >> Seem okay? > > Yeah, sounds good. > >> By the way, is there any analogous kind of issue concerning >> pidfd_getfd()? I'm thinking not, but I wonder if I've missed >> something. > > When it is used by a seccomp supervisor, you mean? I think basically > the same thing applies - when resource identifiers (such as memory > addresses or file descriptors) are passed to a syscall, it generally > has to be assumed that those identifiers may become invalid and be > reused as soon as the syscall has returned. I probably needed to be more explicit. Would the following (i.e., a single cookie check) not be sufficient to handle the above scenario. Here, the target is making a syscall a system call that employs the file descriptor 'tfd': T: makes syscall that triggers notification S: Get notification S: pidfd = pidfd_open(T, 0); S: sfd = pifd_getfd(pidfd, tfd, 0) S: check that the cookie is still valid S: do operation with sfd [*] By contrast, I can see that we might want to do multiple cookie checks in the /proc/PID/mem case, since the supervisor might do multiple reads. Or, do you mean: there really needs to be another cookie check after the point [*], since, if the the target's syscall was interrupted and 'tfd' was closed/resused, then the supervisor would be operating with a file descriptor that refers to an open file description (a "struct file") that is no longer meaningful in the target? (Thinking about it, I think this probably is what you mean, but I want to confirm.) Thanks, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/