On Mon, Oct 21, 2024 at 07:00:53PM +0200, David Hildenbrand wrote: > On 21.10.24 18:51, Lorenzo Stoakes wrote: > > On Mon, Oct 21, 2024 at 06:44:04PM +0200, David Hildenbrand wrote: > > > On 21.10.24 18:23, Lorenzo Stoakes wrote: > > > > On Mon, Oct 21, 2024 at 06:00:20PM +0200, David Hildenbrand wrote: > > > > [snip] > > > > > > > > > > > > To summarise for on-list: > > > > > > > > > > > > * MADV_FREE, while ostensibly being a 'lazy free' mechanism, has the > > > > > > ability to be 'cancelled' if you write to the memory. Also, after the > > > > > > freeing is complete, you can write to the memory to reuse it, the mapping > > > > > > is still there. > > > > > > > > > > > > * For hardware poison markers it makes sense to drop them as you're > > > > > > effectively saying 'I am done with this range that is now unbacked and > > > > > > expect to get an empty page should I use it now'. UFFD WP I am not sure > > > > > > about but presumably also fine. > > > > > > > > > > > > * However, guard pages are different - if you 'cancel' and you are left > > > > > > with a block of memory allocated to you by a pthread or userland > > > > > > allocator implementation, you don't want to then no longer be protected > > > > > > from overrunning into other thread memory. > > > > > > > > > > Agreed. What happens on MADV_DONTNEED/MADV_FREE on guard pages? Ignored or > > > > > error? It sounds like a usage "error" to me (in contrast to munmap()). > > > > > > > > It's ignored, no errror. On MADV_DONTNEED we already left the guard pages in > > > > place, from v3 we will do the same for MADV_FREE. > > > > > > > > I'm not sure I'd say it's an error per se, as somebody might have a use case > > > > where they want to zap over a range but keep guard pages, perhaps an allocator > > > > or something? > > > > > > Hm, not sure I see use for that. > > > > > > Staring at madvise_walk_vmas(), we return ENOMEM on VMA holes, but would > > > process PROT_NONE. So current behavior is at least consistent with PROT_NONE > > > handling (where something could be mapped, though). > > > > Err, the handling of holes is terrible, yes we return ENOMEM, but we _carry out > > the whole procedure_ then return an error, an error _indistinguishable from an > > error arising from any of the individual parts_. > > > > Which is just, awful. > > Yes, absolutely. I don't know why we decided to continue. And why we return > ENOMEM ... Anyway UAPI so no turning back now :) > > > > > > > > > No strong opinion. > > > > Well you used up your strong opinion on the naming ;) > > He, and I am out of energy for this year ;) Haha understandable... > > In retrospective, "install or remove a guard PTE" is just much better than > anything else ... > > So I should never have been mislead to suggest poison/unpoison as a > replacement for poison/remedy :P You know the reason ;) > > > > > > > > > > > > > > Also the existing logic is that existing markers (HW poison, uffd-simulated HW > > > > poison, uffd wp marker) are retained and no error raised on MADV_DONTNEED, and > > > > no error on MADV_FREE either, so it'd be consistent with existing behaviour. > > > > > > > > > HW poison / uffd-simulated HW poison are expected to be zapped: it's just > > > like a mapped page with HWPOISON. So that is correct. > > > > Well, poison is _not_ zapped on MADV_DONTNEED but _is_ on MADV_FREE :) anyway, I > > Huh? > > madvise_dontneed_single_vma()->zap_page_range_single(details=NULL)->unmap_single_vma(details=NULL) > ... zap_pte_range() > > } else if (is_hwpoison_entry(entry) || > is_poisoned_swp_entry(entry)) { > if (!should_zap_cows(details)) > continue; > ... > > Should just zap them. > > What am I missing? Yeah ok it's me who's missing something here, I hadn't noticed details == NULL so should_zap_cows() is true, my mistake! In any case we explicitly add code here to prevent guard pages from going. I will correct everything where I wrongly say otherwise, doh! > > > mean the MADV flags are a confusing mess generally, as per Vlasta's comments > > which to begin with I strongly disagreed with then, discussing further, realsed > > that no this is just a bit insane and had driven _me_ insane. > > > > > > > > UFFD-WP behavior is ... weird. Would not expect MADV_DONTNEED to zap uffd-wp > > > entries. > > > > > > > > > > > Also semantically you are achieving what the calls expect you are freeing the > > > > ranges since the guard page regions are unbacked so are already freed... so yeah > > > > I don't think an error really makes sense here. > > > > > > I you compare it to a VMA hole, it make sense to fail. If we treat it like > > > PROT_NONE, it make sense to skip them. > > > > > > > > > > > We might also be limiting use cases by assuming they might _only_ be used for > > > > allocators and such. > > > > > > I don't buy that as an argument, sorry :) > > > > > > "Let's map the kernel writable into all user space because otherwise we > > > might be limiting use cases" > > > > That's a great idea! Patch series incoming, 1st April 2025... :>) > > :) Just flip the bit on x86 and we're done! ;) > > > > > > > > > > :P > > > > > > -- > > > Cheers, > > > > > > David / dhildenb > > > > > > > Overall I think just always leaving in place except on remedy err sorry sorry > > unpoison and munmap and not returning an error if encountered elsewhere (other > > than, of course, GUP) is the right way forward and most in line with user > > expectation and practical usage. > > > Fine with me, make sure to document that is behaves like a PROT_NONE VMA, > not like a memory hole, except when something would trigger a fault (GUP > etc). Ack will make sure to document. > > > -- > Cheers, > > David / dhildenb >