On Wed, 23 Oct 2024 at 10:12, Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> wrote:
>
> +cc Linus as reference a commit of his below...
>
> On Wed, Oct 23, 2024 at 09:19:03AM +0200, David Hildenbrand wrote:
> > On 23.10.24 08:24, Dmitry Vyukov wrote:
> > > Hi Florian, Lorenzo,
> > >
> > > This looks great!
>
> Thanks!
>
> > >
> > > What I am VERY interested in is if poisoned pages cause SIGSEGV even when
> > > the access happens in the kernel. Namely, the syscall still returns EFAULT,
> > > but also SIGSEGV is queued on return to user-space.
>
> Yeah we don't in any way.
>
> I think adding something like this would be a bit of its own project.

I can totally understand this.

> The fault handler for this is in handle_pte_marker() in mm/memory.c, where
> we do the following:
>
>         /* Hitting a guard page is always a fatal condition. */
>         if (marker & PTE_MARKER_GUARD)
>                 return VM_FAULT_SIGSEGV;
>
> So basically we pass this back to whoever invoked the fault. For uaccess we
> end up in arch-specific code that eventually checks exception tables
> etc. and for x86-64 that's kernelmode_fixup_or_oops().
>
> There used to be a sig_on_uaccess_err in the x86-specific thread_struct
> that let you propagate it but Linus pulled it out in commit 02b670c1f88e
> ("x86/mm: Remove broken vsyscall emulation code from the page fault code")
> where it was presumably used for vsyscall.
>
> Of course we could just get something much higher up the stack to send the
> signal, but we'd need to be careful we weren't breaking anything doing
> it...

Could setting TIF_NOTIFY_RESUME at fault time, and then sending the signal
when returning to userspace, help here?

> I address GUP below.
>
> > >
> > > Catching bad accesses in system calls is currently the weak spot for
> > > all user-space bug detection tools (GWP-ASan, libefence, libefency, etc).
> > > It's almost possible with userfaultfd, but catching faults in the kernel
> > > requires admin capability, so not really an option for generic bug
> > > detection tools (+inconvenience of userfaultfd setup/handler).
> > > Intercepting all EFAULT from syscalls is not generally possible
> > > (w/o ptrace, usually not an option as well), and EFAULT does not always
> > > mean a bug.
> > >
> > > Triggering SIGSEGV even in syscalls would be not just a performance
> > > optimization, but a new useful capability that would allow it to catch
> > > more bugs.
> >
> > Right, we discussed that offline also as a possible extension to the
> > userfaultfd SIGBUS mode.
> >
> > I did not look into that yet, but I was wondering if there could be cases where
> > a different process could trigger that SIGSEGV, and how to (and if to)
> > handle that.
> >
> > For example, ptrace (access_remote_vm()) -> GUP likely can trigger that. I
> > think with userfaultfd() we will currently return -EFAULT, because we call
> > get_user_page_vma_remote() that is not prepared for dropping the mmap lock.
> > Possibly that is the right thing to do, but not sure :)

That's a good corner case. I guess process_vm_readv/writev as well.
Not triggering the signal in these cases looks like the right thing to do.

> > These "remote" faults set FOLL_REMOTE -> FAULT_FLAG_REMOTE, so we might be
> > able to distinguish them and perform different handling.
>
> So all GUP will return -EFAULT when hitting guard pages unless we change
> something.
>
> In GUP we handle this in faultin_page():
>
>         if (ret & VM_FAULT_ERROR) {
>                 int err = vm_fault_to_errno(ret, flags);
>
>                 if (err)
>                         return err;
>                 BUG();
>         }
>
> And vm_fault_to_errno() is:
>
> static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags)
> {
>         if (vm_fault & VM_FAULT_OOM)
>                 return -ENOMEM;
>         if (vm_fault & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE))
>                 return (foll_flags & FOLL_HWPOISON) ? -EHWPOISON : -EFAULT;
>         if (vm_fault & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV))
>                 return -EFAULT;
>         return 0;
> }
>
> Again, I think if we wanted special handling here we'd need to probably
> propagate that fault from higher up, but yes we'd need to for one
> definitely not do so if it's remote but I worry about other cases.
>
> > --
> > Cheers,
> >
> > David / dhildenb
> >
>
> Overall while I sympathise with this, it feels dangerous and a pretty major
> change, because there'll be something somewhere that will break because it
> expects faults to be swallowed that we no longer do swallow.
>
> So I'd say it'd be something we should defer, but of course it's a highly
> user-facing change so how easy that would be I don't know.
>
> But I definitely don't think an 'introduce the ability to do cheap PROT_NONE
> guards' series is the place to also fundamentally change how user access
> page faults are handled within the kernel :)

Would delivering signals on kernel accesses be a backwards-compatible change
later, or would we need a different API, e.g. MADV_GUARD_POISON_KERNEL?
It's just somewhat painful to detect and update all of userspace if we add
this feature in the future. Can we document that signal delivery on kernel
accesses is unspecified?
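For reference, the behaviour the compatibility question is about can be observed from user space today. This is a minimal sketch, assuming an anonymous PROT_NONE mapping is an acceptable stand-in for a guard page (the series' madvise() operation is not used here). A direct load from the page raises SIGSEGV, while handing the same address to write(2) on a pipe makes the kernel-side copy fail with EFAULT and no signal. The helper names (map_inaccessible_page, user_load_faults, syscall_read_errno, demo_inaccessible_page) are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Jump target used to recover from the SIGSEGV raised by a direct load. */
static sigjmp_buf segv_env;

static void segv_handler(int sig)
{
	(void)sig;
	siglongjmp(segv_env, 1);
}

/* Map one anonymous PROT_NONE page (assumed stand-in for a guard page). */
static char *map_inaccessible_page(void)
{
	long pg = sysconf(_SC_PAGESIZE);
	void *p = mmap(NULL, (size_t)pg, PROT_NONE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : (char *)p;
}

/* Return 1 if a load from p performed in user space raised SIGSEGV. */
static int user_load_faults(const char *p)
{
	struct sigaction sa, old;
	volatile const char *vp = p;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = segv_handler;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGSEGV, &sa, &old);

	if (sigsetjmp(segv_env, 1) != 0) {
		/* Reached via siglongjmp() from the SIGSEGV handler. */
		sigaction(SIGSEGV, &old, NULL);
		return 1;
	}
	char c = *vp;	/* volatile load, faults on a PROT_NONE page */
	(void)c;
	sigaction(SIGSEGV, &old, NULL);
	return 0;
}

/*
 * Pass p to write(2) on a pipe so the kernel, not user space, reads it.
 * Return the resulting errno, 0 on success, or -1 if pipe() failed.
 */
static int syscall_read_errno(const char *p, size_t len)
{
	int fds[2], err = 0;

	if (pipe(fds) != 0)
		return -1;
	if (write(fds[1], p, len) < 0)
		err = errno;	/* saved before close() can clobber errno */
	close(fds[0]);
	close(fds[1]);
	return err;
}

/* Usage sketch: both checks are expected to hold on Linux. */
static void demo_inaccessible_page(void)
{
	long pg = sysconf(_SC_PAGESIZE);
	char *g = map_inaccessible_page();

	assert(g != NULL);
	assert(user_load_faults(g) == 1);            /* SIGSEGV in user space */
	assert(syscall_read_errno(g, 16) == EFAULT); /* EFAULT, no signal */
	munmap(g, (size_t)pg);
}
```

A program that relies on the EFAULT path (for example, probing whether an address is readable via a syscall) is presumably the kind of user that could be surprised if the existing operation started queuing SIGSEGV on kernel accesses.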