Re: [PATCH v2 0/5] implement lightweight guard pages

David Hildenbrand <david@xxxxxxxxxx> · Wed, 23 Oct 2024 09:19:03 +0200

On 23.10.24 08:24, Dmitry Vyukov wrote:
> Hi Florian, Lorenzo,
>
> This looks great!
>
> What I am VERY interested in is if poisoned pages cause SIGSEGV even when
> the access happens in the kernel. Namely, the syscall still returns EFAULT,
> but also SIGSEGV is queued on return to user-space.
>
> Catching bad accesses in system calls is currently the weak spot for
> all user-space bug detection tools (GWP-ASan, libefence, libefency, etc).
> It's almost possible with userfaultfd, but catching faults in the kernel
> requires admin capability, so not really an option for generic bug
> detection tools (+inconvenience of userfaultfd setup/handler).
> Intercepting all EFAULT from syscalls is not generally possible
> (w/o ptrace, usually not an option as well), and EFAULT does not always
> mean a bug.
>
> Triggering SIGSEGV even in syscalls would be not just a performance
> optimization, but a new useful capability that would allow it to catch
> more bugs.

Right, we discussed that offline also as a possible extension to the 
userfaultfd SIGBUS mode.

I did not look into that yet, but I was wondering whether there could be
cases where a different process could trigger that SIGSEGV, and how (and
whether) to handle that.

For example, ptrace (access_remote_vm()) -> GUP likely can trigger that. 
I think with userfaultfd() we will currently return -EFAULT, because we 
call get_user_page_vma_remote() that is not prepared for dropping the 
mmap lock. Possibly that is the right thing to do, but not sure :)

These "remote" faults set FOLL_REMOTE -> FAULT_FLAG_REMOTE, so we might 
be able to distinguish them and perform different handling.
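
For reference, the behavior described above (the syscall fails with
EFAULT and no signal is delivered) can be reproduced from user space.
The following is only a minimal sketch: it uses a plain PROT_NONE
mapping as a stand-in for a guard page, not the interface from this
series, and the names probe_guard_write() and on_segv() are made up for
illustration. A pipe is used as the target so that the kernel really has
to copy from the inaccessible buffer.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t segv_seen;

/* Hypothetical handler: only records that SIGSEGV was delivered. */
static void on_segv(int sig)
{
	(void)sig;
	segv_seen = 1;
}

/*
 * Pass a PROT_NONE buffer to write(2) and report what happened.
 * Returns the errno from write() (0 if it succeeded), or -1 if the
 * setup failed. *segv_out is set to 1 if a SIGSEGV was observed.
 */
static int probe_guard_write(int *segv_out)
{
	long page = sysconf(_SC_PAGESIZE);
	struct sigaction sa, old;
	int fds[2];
	void *guard;
	ssize_t n;
	int err;

	assert(page > 0);

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_segv;
	sa.sa_flags = SA_RESETHAND;	/* do not loop on a real fault */
	sigemptyset(&sa.sa_mask);
	segv_seen = 0;
	if (sigaction(SIGSEGV, &sa, &old) != 0)
		return -1;

	/* Stand-in for a guard page: an inaccessible anonymous page. */
	guard = mmap(NULL, (size_t)page, PROT_NONE,
		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (guard == MAP_FAILED || pipe(fds) != 0) {
		if (guard != MAP_FAILED)
			munmap(guard, (size_t)page);
		sigaction(SIGSEGV, &old, NULL);
		return -1;
	}

	/* The kernel faults while copying from the buffer on our behalf. */
	errno = 0;
	n = write(fds[1], guard, 16);
	err = (n < 0) ? errno : 0;

	close(fds[0]);
	close(fds[1]);
	munmap(guard, (size_t)page);
	sigaction(SIGSEGV, &old, NULL);

	*segv_out = segv_seen;
	return err;
}
```

On current kernels I would expect probe_guard_write() to return EFAULT
with *segv_out left at 0, which is exactly the case a user-space
detection tool cannot observe today without intercepting syscall return
values.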

--
Cheers,

David / dhildenb