Re: [PATCH v21 1/4] mm: add VM_DROPPABLE for designating always lazily freeable mappings

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi David,

Thanks a lot for the review of the code. Am very glad to have somebody
who knows this code take a careful look at it.

On Sun, Jul 07, 2024 at 09:42:38AM +0200, David Hildenbrand wrote:
> Patch subject would be better to talk about MAP_DROPPABLE now.

Will do. Or, well, in light of the conversation downthread, MAP_DROPPABLE.

> But I don't immediately see why MAP_WIPEONFORK and MAP_DONTDUMP have to 
> be mmap() flags. Using mmap(MAP_NORESERVE|MAP_DROPPABLE) with madvise() 
> to configure these (for users that require that) should be good enough, 
> just like they are for existing users.

I looked into that too, and coming up with some clunky mechanism for
automating several calls to madvise() for each thing. I could make it
work need be, but it's really not nice. And it sort of then leads in the
direction, "this interface isn't great; why don't you just make a
dedicated syscall that does everything you need in one fell swoop,"
which is explicitly what Linus doesn't want. Making it accessible to
mmap() instead makes it more of a direct thing that isn't a whole new
syscall.

Anyway, it indeed looks like there are more PROT_ bits available, and
also that PROT_ has been used this way before. In addition to PROT_SEM,
there are a few arch-specific PROT_ bits that seem similar enough. The
distinction is pretty blurry between MAP_ and PROT_.

So I'll just move this to PROT_ for v+1.

> > diff --git a/mm/mprotect.c b/mm/mprotect.c
> > index 8c6cd8825273..57b8dad9adcc 100644
> > --- a/mm/mprotect.c
> > +++ b/mm/mprotect.c
> > @@ -623,7 +623,7 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
> >   				may_expand_vm(mm, oldflags, nrpages))
> >   			return -ENOMEM;
> >   		if (!(oldflags & (VM_ACCOUNT|VM_WRITE|VM_HUGETLB|
> > -						VM_SHARED|VM_NORESERVE))) {
> > +				  VM_SHARED|VM_NORESERVE|VM_DROPPABLE))) {
> >   			charged = nrpages;
> >   			if (security_vm_enough_memory_mm(mm, charged))
> >   				return -ENOMEM;
> 
> I don't quite understand this change here. If MAP_DROPPABLE does not 
> affect memory accounting during mmap(), it should not affect the same 
> during mprotect(). VM_NORESERVE / MAP_NORESERVE is responsible for that.
> 
> Did I missing something where MAP_DROPPABLE changes the memory 
> accounting during mmap()?

Actually, I think I errored by not adding it to mmap() (via the check in
accountable_mapping(), I believe), and I should add it there.  That also
might be another reason why this is better as a MAP_ (or, rather PROT_)
bit, rather than an madvise call.

Tell me if you disagree, as I might be way off here. But I was thinking
that because the system can just "drop" this memory, it's not sensible
to account for it, because it can be taken right back.

> > diff --git a/mm/rmap.c b/mm/rmap.c
> We use
> 
> /*
>   * Comment start
>   * Comment end
>   */
> 
> styled comments in MM.

Fixed.

> 
> > +				     (vma->vm_flags & VM_DROPPABLE))) {
> >   					dec_mm_counter(mm, MM_ANONPAGES);
> >   					goto discard;
> >   				}
> > @@ -1851,7 +1858,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> >   				 * discarded. Remap the page to page table.
> >   				 */
> >   				set_pte_at(mm, address, pvmw.pte, pteval);
> > -				folio_set_swapbacked(folio);
> > +				/* Unlike MADV_FREE mappings, VM_DROPPABLE ones
> > +				 * never get swap backed on failure to drop. */
> > +				if (!(vma->vm_flags & VM_DROPPABLE))
> > +					folio_set_swapbacked(folio);
> >   				ret = false;
> >   				page_vma_mapped_walk_done(&pvmw);
> >   				break;
> 
> A note that in mm/mm-stable, "madvise_free_huge_pmd" exists to optimize 
> MADV_FREE on PMDs. I suspect we'd want to extend that one as well for 
> dropping support, but likely it would also only be a performance 
> improvmeent and not affect functonality if not handled.

That's for doing the freeing of PTEs after the fact, right? If the
mapping was created, got filled with some data, and then sometime later
it got MADV_FREE'd, which is the pattern people follow typically with
MADV_FREE. If we do this as PROT_/MAP_, then that's not a case we need
to worry about, if I understand this code correctly.

Jason




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux