Re: [PATCH RFC 00/39] mm/rmap: interface overhaul

David Hildenbrand <david@xxxxxxxxxx> · Tue, 5 Dec 2023 14:39:39 +0100

On 05.12.23 14:31, Ryan Roberts wrote:
On 05/12/2023 09:56, David Hildenbrand wrote:

Ryan has a series where we would make use of folio_remove_rmap_ptes() [1]
-- he carries his own batching variant right now -- and
folio_try_dup_anon_rmap_ptes()/folio_dup_file_rmap_ptes() [2].

Note that the contpte series at [2] has a new patch in v3 (patch 2), which could
benefit from folio_remove_rmap_ptes() or equivalent. My plan was to revive [1]
on top of [2] once it is merged.

There is some overlap with both series (and some other work, like
multi-size THP [3]), so that will need some coordination, and likely a
stepwise inclusion.

Selfishly, I'd really like to get my stuff merged as soon as there is no
technical reason not to. I'd prefer not to add this as a dependency if we can
help it.

It's easy to rework either series on top of each other. The mTHP series has
highest priority, no question, that will go in first.

Music to my ears! It would be great to either get a reviewed-by or feedback on
why not, for the key 2 patches in that series (3 & 4) and also your opinion on
whether we need to wait for compaction to land (see cover letter). It would be
great to get this into linux-next ASAP IMHO.

On it :)

Regarding the contpte, I think it needs more work. Especially, as raised, to not
degrade order-0 performance. Maybe we won't make the next merge window (and you
already predicted that in some cover letter :P ). Let's see.

Yeah that's ok. I'll do the work to fix the order-0 perf. And also do the same
for patch 2 in that series - would also be really helpful if you had a chance to
look at patch 2 - it's new for v3.

I only skimmed over it, but it seems to go in the direction we'll
need. Keeping order-0 performance unharmed should have highest priority.
Hopefully my microbenchmarks are helpful.

But again, the conflicts are all trivial, so I'll happily rebase on top of
whatever is in mm-unstable. Or move the relevant rework to the front so you can
just carry them/base on them. (the batched variants for dup do make the contpte
code much easier)

So perhaps we should aim for mTHP, then this, then contpte last, benefiting from
the batching.

Yeah. And again, I don't care too much if I have to rebase on top of 
your work if this here takes longer. It's all a fairly trivial conversion.

[...]

New (extended) hugetlb interface that operates on the entire folio:
   * hugetlb_add_new_anon_rmap() -> Already existed
   * hugetlb_add_anon_rmap() -> Already existed
   * hugetlb_try_dup_anon_rmap()
   * hugetlb_try_share_anon_rmap()
   * hugetlb_add_file_rmap()
   * hugetlb_remove_rmap()

New "ordinary" interface for small folios / THP:
   * folio_add_new_anon_rmap() -> Already existed
   * folio_add_anon_rmap_[pte|ptes|pmd]()
   * folio_try_dup_anon_rmap_[pte|ptes|pmd]()
   * folio_try_share_anon_rmap_[pte|pmd]()
   * folio_add_file_rmap_[pte|ptes|pmd]()
   * folio_dup_file_rmap_[pte|ptes|pmd]()
   * folio_remove_rmap_[pte|ptes|pmd]()

I'm not sure if there are official guidelines, but personally if we are
reworking the API, I'd take the opportunity to move "rmap" to the front of the
name, rather than having it buried in the middle as it is for some of these:

rmap_hugetlb_*()

rmap_folio_*()

No strong opinion. But we might want slightly different names then. For example,
it's "bio_add_folio" and not "bio_folio_add":

rmap_add_new_anon_hugetlb()
rmap_add_anon_hugetlb()
...
rmap_remove_hugetlb()

Thoughts?

Having now reviewed your series, I have a less strong opinion, perhaps it's
actually best with your original names; "folio" is actually the subject after
all; it's the thing being operated on.

I think having "folio" in there looks cleaner and more consistent with
other functions.

I tend to like "rmap_dup_file_folio_[pte|ptes|pmd]()", because then we 
have "file folio" and "anon folio" as one word.

But then I wonder about the hugetlb part. Maybe simply 
"hugetlb_rmap_remove_folio()" etc.

Having the "hugetlb_" prefix at the beginning feels like the right thing 
to do, looking at other hugetlb special-handlings.

But I'll wait a bit until I go crazy on renaming :)

I guess reading the patches will tell me, but what's the point of "ptes"? Surely
you're either mapping at pte or pmd level, and the number of pages is determined
by the folio size? (or presumably nr param passed in)

It's really (currently) one function to handle 1 vs. multiple PTEs. For example:

void folio_remove_rmap_ptes(struct folio *, struct page *, unsigned int nr,
         struct vm_area_struct *);
#define folio_remove_rmap_pte(folio, page, vma) \
     folio_remove_rmap_ptes(folio, page, 1, vma)
void folio_remove_rmap_pmd(struct folio *, struct page *,
         struct vm_area_struct *);
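
To make the "one function for 1 vs. multiple PTEs" pattern concrete, here is a
minimal userspace sketch of the same shape. It is only an illustration, not the
code from the series: struct mock_folio, its fields and the mock_* names are
hypothetical stand-ins, and the vma parameter is left out to keep it short.

```c
#include <assert.h>

/* Hypothetical userspace stand-in; NOT the kernel's struct folio. */
struct mock_folio {
	unsigned int nr_pages;	/* number of pages in the folio */
	unsigned int mapped;	/* mock count of pages still mapped by a PTE */
};

/*
 * Batched variant: remove the PTE mappings of 'nr' consecutive pages,
 * starting at page index 'first', in a single call.
 */
static void mock_remove_rmap_ptes(struct mock_folio *folio, unsigned int first,
		unsigned int nr)
{
	assert(nr >= 1);
	assert(first + nr <= folio->nr_pages);
	assert(nr <= folio->mapped);
	folio->mapped -= nr;
}

/* Single-PTE variant: simply the batched variant with nr == 1. */
#define mock_remove_rmap_pte(folio, first) \
	mock_remove_rmap_ptes(folio, first, 1)
```

In this mock, one call with nr == 4 leaves the folio in the same state as four
single-PTE calls. The batched form just gives the callee a chance to do any
per-folio work once per batch instead of once per page, which is presumably
where a real implementation would gain.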

Yeah now that I've looked at the series, this makes sense. "ptes" was originally
making me think of contpte, but I suspect I'll be the only one with that
association :)

Ah, yes :)

--
Cheers,

David / dhildenb