On Mon, Oct 21, 2024 at 07:05:27PM +0200, David Hildenbrand wrote:
> On 20.10.24 18:20, Lorenzo Stoakes wrote:
> > Implement a new lightweight guard page feature, that is regions of userland
> > virtual memory that, when accessed, cause a fatal signal to arise.
> >
> > Currently users must establish PROT_NONE ranges to achieve this.
> >
> > However this is very costly memory-wise - we need a VMA for each and every
> > one of these regions AND they become unmergeable with surrounding VMAs.
> >
> > In addition repeated mmap() calls require repeated kernel context switches
> > and contention of the mmap lock to install these ranges, potentially also
> > having to unmap memory if installed over existing ranges.
> >
> > The lightweight guard approach eliminates the VMA cost altogether - rather
> > than establishing a PROT_NONE VMA, it operates at the level of page table
> > entries - poisoning PTEs such that accesses to them cause a fault followed
> > by a SIGSEGV signal being raised.
> >
> > This is achieved through the PTE marker mechanism, which a previous commit
> > in this series extended to permit this to be done, installed via the
> > generic page walking logic, also extended by a prior commit for this
> > purpose.
> >
> > These poison ranges are established with MADV_GUARD_POISON, and if the
> > range in which they are installed contains any existing mappings, they will
> > be zapped, i.e. free the range and unmap memory (thus mimicking the
> > behaviour of MADV_DONTNEED in this respect).
> >
> > Any existing poison entries will be left untouched. There is no nesting of
> > poisoned pages.
> >
> > Poisoned ranges are NOT cleared by MADV_DONTNEED, as this would be rather
> > unexpected behaviour, but are cleared on process teardown or unmapping of
> > memory ranges.
> >
> > Ranges can have the poison property removed by MADV_GUARD_UNPOISON -
> > 'remedying' the poisoning.
> > The ranges over which this is applied, should
> > they contain non-poison entries, will be untouched, only poison entries
> > will be cleared.
> >
> > We permit this operation on anonymous memory only, and only VMAs which are
> > non-special, non-huge and not mlock()'d (if we permitted this we'd have to
> > drop locked pages which would be rather counterintuitive).
> >
> > Suggested-by: Vlastimil Babka <vbabka@xxxxxxx>
> > Suggested-by: Jann Horn <jannh@xxxxxxxxxx>
> > Suggested-by: David Hildenbrand <david@xxxxxxxxxx>
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
> > ---
> >  arch/alpha/include/uapi/asm/mman.h     |   3 +
> >  arch/mips/include/uapi/asm/mman.h      |   3 +
> >  arch/parisc/include/uapi/asm/mman.h    |   3 +
> >  arch/xtensa/include/uapi/asm/mman.h    |   3 +
> >  include/uapi/asm-generic/mman-common.h |   3 +
> >  mm/madvise.c                           | 168 +++++++++++++++++++++++++
> >  mm/mprotect.c                          |   3 +-
> >  mm/mseal.c                             |   1 +
> >  8 files changed, 186 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
> > index 763929e814e9..71e13f27742d 100644
> > --- a/arch/alpha/include/uapi/asm/mman.h
> > +++ b/arch/alpha/include/uapi/asm/mman.h
> > @@ -78,6 +78,9 @@
> >  #define MADV_COLLAPSE 25 /* Synchronous hugepage collapse */
> > +#define MADV_GUARD_POISON 102 /* fatal signal on access to range */
> > +#define MADV_GUARD_UNPOISON 103 /* revoke guard poisoning */
>
> Just to raise it here: MADV_GUARD_INSTALL / MADV_GUARD_REMOVE or sth. like
> that would have been even clearer, at least to me. :)

It still feels like poisoning to me because we're explicitly putting
something in the page tables to make a range have different fault behaviour
like a HW poisoning, and 'installing' suggests backing or something like
this, I think that's more confusing.

>
> But no strong opinion, just if somebody else reading along was wondering
> about the same.
>
>
> I'm hoping to find more time to have a closer look at this this week, but in
> general, the concept sounds reasonable to me.

Thanks!

>
> --
> Cheers,
>
> David / dhildenb
>
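
For anyone reading along, here is a minimal userland sketch of how the
interface described in the commit message might be exercised. It is an
illustration under stated assumptions, not code from the series: the
advice values 102/103 are taken from the quoted hunk, guard_demo() is a
hypothetical helper name, and the SIGSEGV is caught with a handler purely
so the demo can report what happened. On a kernel without this support,
madvise() is expected to fail with EINVAL.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: values copied from the quoted hunk, used if headers lack them. */
#ifndef MADV_GUARD_POISON
#define MADV_GUARD_POISON   102
#endif
#ifndef MADV_GUARD_UNPOISON
#define MADV_GUARD_UNPOISON 103
#endif

static sigjmp_buf guard_jmp;

static void guard_on_segv(int sig)
{
	(void)sig;
	siglongjmp(guard_jmp, 1);	/* unwind out of the faulting access */
}

/*
 * Hypothetical helper. Returns 1 if the guard page faulted as described,
 * 0 if the kernel rejected the advice (EINVAL), -1 on any other outcome.
 */
static int guard_demo(void)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t len = 3 * (size_t)page;
	volatile int faulted = 0;
	struct sigaction sa, old;

	/* One anonymous VMA; the middle page becomes the guard. */
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	if (madvise(buf + page, page, MADV_GUARD_POISON) != 0) {
		int unsupported = (errno == EINVAL);

		munmap(buf, len);
		return unsupported ? 0 : -1;
	}

	sa.sa_handler = guard_on_segv;
	sa.sa_flags = 0;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGSEGV, &sa, &old);

	if (sigsetjmp(guard_jmp, 1) == 0)
		*(volatile char *)(buf + page) = 1;	/* touch the guard page */
	else
		faulted = 1;

	sigaction(SIGSEGV, &old, NULL);

	/* Remove the guard; the page should be ordinary memory again. */
	if (madvise(buf + page, page, MADV_GUARD_UNPOISON) != 0) {
		munmap(buf, len);
		return -1;
	}
	buf[page] = 1;

	munmap(buf, len);
	return faulted ? 1 : -1;
}
```

Calling guard_demo() from main() and printing the result is enough to try
it. The point of the sketch is that the guard lives inside a single
mapping: no extra mmap() or mprotect() call is made, so no additional VMA
is created for the guarded page.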