On Wed, Oct 23, 2024 at 01:36:10PM +0200, David Hildenbrand wrote:
> On 23.10.24 13:31, Marco Elver wrote:
> > On Wed, 23 Oct 2024 at 11:29, David Hildenbrand <david@xxxxxxxxxx> wrote:
> > >
> > > On 23.10.24 11:18, Lorenzo Stoakes wrote:
> > > > On Wed, Oct 23, 2024 at 11:13:47AM +0200, David Hildenbrand wrote:
> > > > > On 23.10.24 11:06, Vlastimil Babka wrote:
> > > > > > On 10/23/24 10:56, Dmitry Vyukov wrote:
> > > > > > > >
> > > > > > > > Overall while I sympathise with this, it feels dangerous and a pretty major
> > > > > > > > change, because there'll be something somewhere that will break because it
> > > > > > > > expects faults to be swallowed that we no longer do swallow.
> > > > > > > >
> > > > > > > > So I'd say it'd be something we should defer, but of course it's a highly
> > > > > > > > user-facing change so how easy that would be I don't know.
> > > > > > > >
> > > > > > > > But I definitely don't think a 'introduce the ability to do cheap PROT_NONE
> > > > > > > > guards' series is the place to also fundamentally change how user access
> > > > > > > > page faults are handled within the kernel :)
> > > > > > >
> > > > > > > Will delivering signals on kernel access be a backwards compatible
> > > > > > > change? Or will we need a different API? MADV_GUARD_POISON_KERNEL?
> > > > > > > It's just somewhat painful to detect/update all userspace if we add
> > > > > > > this feature in future. Can we say signal delivery on kernel accesses
> > > > > > > is unspecified?
> > > > > >
> > > > > > Would adding signal delivery to guard PTEs only help enough the ASAN etc
> > > > > > usecase? Wouldn't it be instead possible to add some prctl to opt-in the
> > > > > > whole ASANized process to deliver all existing segfaults as signals instead
> > > > > > of -EFAULT ?
> > > > >
> > > > > Not sure if it is an "instead", you might have to deliver the signal in
> > > > > addition to letting the syscall fail (not that I would be an expert on
> > > > > signal delivery :D ).
> > > > >
> > > > > prctl sounds better, or some way to configure the behavior on VMA ranges;
> > > > > otherwise we would need yet another marker, which is not the end of the
> > > > > world but would make it slightly more confusing.
> > > > >
> > > >
> > > > Yeah prctl() sounds sensible, and since we are explicitly adding a marker
> > > > for guard pages here we can do this as a follow up too without breaking any
> > > > userland expectations, i.e. 'new feature to make guard pages signal' is not
> > > > going to contradict the default behaviour.
> > > >
> > > > So all makes sense to me, but I do think best as a follow up! :)
> > >
> > > Yeah, fully agreed. And my gut feeling is that it might not be that easy
> > > ... :)
> > >
> > > In the end, what we want is *some* notification that a guard PTE was
> > > accessed. Likely the notification must not necessarily be completely
> > > synchronous (although it would be ideal) and it must not be a signal.
> > >
> > > Maybe having a different way to obtain that information from user space
> > > would work.
> >
> > For bug detection tools (like GWP-ASan [1]) it's essential to have
> > useful stack traces. As such, having this signal be synchronous would
> > be more useful. I don't see how one could get a useful stack trace (or
> > other information like what's stashed away in ucontext like CPU
> > registers) if this were asynchronous.
>
> Yes, I know. But it would be better than not getting *any* notification
> except of some syscalls simply failing with -EFAULT, and not having an idea
> which address was even accessed.
>
> Maybe the signal injection is easier than I think, but I somehow doubt it
> ...

Yeah, I'm afraid I don't think this series is the place where I can
fundamentally change how something so sensitive works in the kernel.

It's especially sensitive because this is a uAPI change, and a wrong
decision here could result in guard pages being broken out of the gate. I
really don't want to risk that.

>
> --
> Cheers,
>
> David / dhildenb
>
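
For reference, the "syscall fails with -EFAULT, no signal" behaviour
discussed above can be observed from userspace with an ordinary PROT_NONE
mapping. The following is only a minimal sketch and not part of this
series: it maps one PROT_NONE page and passes it as the source buffer to
write(2) on a pipe, so the faulting access happens inside the kernel. The
helper name is made up for illustration, and the expectation that errno is
EFAULT (rather than the process receiving SIGSEGV) is based on current
Linux behaviour.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (name invented for this example).
 *
 * Maps one PROT_NONE page and asks the kernel to copy from it by using it
 * as the source buffer of write(2) on a pipe. Returns the errno the
 * syscall failed with, 0 if the write unexpectedly succeeded, or -1 if
 * the setup (pipe/mmap) failed.
 */
int errno_from_kernel_access_to_prot_none(void)
{
	long page_size = sysconf(_SC_PAGESIZE);
	assert(page_size > 0);

	int fds[2];
	if (pipe(fds) != 0)
		return -1;

	void *guard = mmap(NULL, (size_t)page_size, PROT_NONE,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (guard == MAP_FAILED) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	/* The kernel, not userspace, touches the inaccessible page here. */
	errno = 0;
	ssize_t n = write(fds[1], guard, 1);
	int saved_errno = (n < 0) ? errno : 0;

	munmap(guard, (size_t)page_size);
	close(fds[0]);
	close(fds[1]);
	return saved_errno;
}
```

A caller would compare the return value against EFAULT. Note that nothing
here tells the process which address was touched, which is the gap the
thread is discussing.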