On 23.10.24 08:24, Dmitry Vyukov wrote:
Hi Florian, Lorenzo,

This looks great! What I am VERY interested in is whether poisoned pages cause SIGSEGV even when the access happens in the kernel. Namely, the syscall still returns EFAULT, but SIGSEGV is also queued on return to user-space.

Catching bad accesses in system calls is currently the weak spot for all user-space bug detection tools (GWP-ASan, libefence, libefency, etc). It's almost possible with userfaultfd, but catching faults in the kernel requires admin capability, so it is not really an option for generic bug detection tools (plus the inconvenience of the userfaultfd setup/handler). Intercepting all EFAULT returns from syscalls is not generally possible (w/o ptrace, which is usually not an option either), and EFAULT does not always mean a bug. Triggering SIGSEGV even in syscalls would be not just a performance optimization, but a new useful capability that would allow these tools to catch more bugs.
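To make the gap described above concrete, here is a minimal user-space sketch of today's behaviour. It assumes a plain PROT_NONE anonymous mapping as a stand-in for a guard/poisoned page (it does not use the interface from this series), and the helper name errno_from_syscall_on_guard_page() is made up for illustration. The kernel's failed copy from the inaccessible page is reported only through the syscall's return value; no signal is delivered.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Set by the handler if a SIGSEGV is ever delivered to this process. */
static volatile sig_atomic_t segv_seen;

static void on_segv(int sig)
{
    (void)sig;
    segv_seen = 1;
}

/*
 * Hypothetical helper: hands an inaccessible buffer to write(2) on a pipe
 * and returns the resulting errno (0 if the write unexpectedly succeeded,
 * -1 if the setup itself failed).
 */
static int errno_from_syscall_on_guard_page(void)
{
    struct sigaction sa = {0};
    int fds[2];
    int err = -1;

    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;

    long page = sysconf(_SC_PAGESIZE);
    /* PROT_NONE stands in for a guard page: any access to it faults. */
    void *guard = mmap(NULL, (size_t)page, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guard == MAP_FAILED)
        return -1;

    if (pipe(fds) == 0) {
        /* The kernel faults while copying from 'guard' on our behalf. */
        ssize_t n = write(fds[1], guard, 16);
        err = (n < 0) ? errno : 0;
        close(fds[0]);
        close(fds[1]);
    }

    munmap(guard, (size_t)page);
    return err;
}
```

On current kernels the helper is expected to return EFAULT while segv_seen stays 0, so a tool that relies on a SIGSEGV handler never learns about the bad pointer. A pipe is used rather than /dev/null because a write to /dev/null does not read the user buffer at all.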
Right, we discussed that offline also as a possible extension to the userfaultfd SIGBUS mode.
I did not look into that yet, but I was wondering whether there could be cases where a different process could trigger that SIGSEGV, and how (and whether) to handle that.
For example, ptrace (access_remote_vm()) -> GUP likely can trigger that. I think with userfaultfd() we will currently return -EFAULT, because we call get_user_page_vma_remote() that is not prepared for dropping the mmap lock. Possibly that is the right thing to do, but not sure :)
These "remote" faults set FOLL_REMOTE -> FAULT_FLAG_REMOTE, so we might be able to distinguish them and perform different handling.
-- Cheers, David / dhildenb