On 12/8/20 11:34 AM, Jason Gunthorpe wrote:
On Tue, Dec 08, 2020 at 05:28:59PM +0000, Joao Martins wrote:
Rather than decrementing the ref count one by one, we
walk the page array and check which pages belong to the same
compound_head. We then decrement the calculated number
of references in a single write to the head page.
Signed-off-by: Joao Martins <joao.m.martins@xxxxxxxxxx>
mm/gup.c | 41 ++++++++++++++++++++++++++++++++---------
1 file changed, 32 insertions(+), 9 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 194e6981eb03..3a9a7229f418 100644
+++ b/mm/gup.c
@@ -212,6 +212,18 @@ static bool __unpin_devmap_managed_user_page(struct page *page)
}
#endif /* CONFIG_DEV_PAGEMAP_OPS */
+static int record_refs(struct page **pages, int npages)
+{
+ struct page *head = compound_head(pages[0]);
+ int refs = 1, index;
+
+ for (index = 1; index < npages; index++, refs++)
+ if (compound_head(pages[index]) != head)
+ break;
+
+ return refs;
+}
+
/**
* unpin_user_page() - release a dma-pinned page
* @page: pointer to page to be released
@@ -221,9 +233,9 @@ static bool __unpin_devmap_managed_user_page(struct page *page)
* that such pages can be separately tracked and uniquely handled. In
* particular, interactions with RDMA and filesystems need special handling.
*/
-void unpin_user_page(struct page *page)
+static void __unpin_user_page(struct page *page, int refs)
Refs should be unsigned everywhere.
That's fine (although, see my comments in the previous patch for
pitfalls). But it should be a preparatory patch, in order to avoid
clouding up this one and your others as well.
I suggest using clear language: 'page' here should always be a compound
head, so call it 'head' (or do we have another common variable name for
this?)
Agreed. Matthew's struct folio upgrade will allow us to really make
things clear in a typesafe way, but meanwhile, it's probably good to use
one of the following patterns:
page = compound_head(page); // at the very beginning of a routine
or
do_things_to_this_single_page(page);
head = compound_head(page);
do_things_to_this_compound_page(head);
'refs' is the number of tail pages within the compound page, so call it
'ntails' or something similar
I think it's OK to leave it as "refs", because within gup.c, refs has
a very particular meaning. But if you change to ntails or something, I'd
want to see a complete change: no leftovers of refs that are really ntails.
So far I'd rather leave it as refs, but it's not a big deal either way.
{
- int refs = 1;
+ int orig_refs = refs;
page = compound_head(page);
Caller should always do this
@@ -237,14 +249,19 @@ void unpin_user_page(struct page *page)
return;
if (hpage_pincount_available(page))
- hpage_pincount_sub(page, 1);
+ hpage_pincount_sub(page, refs);
Maybe a nice touch would be to pass in orig_refs, because there
is no intention to use a possibly modified refs. So:
hpage_pincount_sub(page, orig_refs);
...obviously a fine point, I realize. :)
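To make the two counts in this hunk concrete, here is a minimal user-space sketch, not kernel code. The helper name refcount_delta() is hypothetical, and the value of GUP_PIN_COUNTING_BIAS is an assumption about the kernel at the time of this thread. It only shows why the pin count (orig_refs) and the amount subtracted from the page refcount (refs) can differ:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value; check include/linux/mm.h in your tree. */
#define GUP_PIN_COUNTING_BIAS (1U << 10)

/*
 * Hypothetical helper: how much to subtract from the head page's refcount
 * when releasing 'npins' pins. With a separate hpage pincount, each pin
 * costs one reference; otherwise each pin is stored as one bias unit.
 */
static unsigned int refcount_delta(unsigned int npins, bool has_hpage_pincount)
{
	assert(npins > 0);
	return has_hpage_pincount ? npins : npins * GUP_PIN_COUNTING_BIAS;
}
```

The statistics counter and hpage_pincount_sub() would both take npins, never the scaled value, which is the reason for keeping orig_refs around.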
else
- refs = GUP_PIN_COUNTING_BIAS;
+ refs *= GUP_PIN_COUNTING_BIAS;
if (page_ref_sub_and_test(page, refs))
__put_page(page);
- mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED, 1);
+ mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED, orig_refs);
+}
And really this should be placed directly after
try_grab_compound_head() and be given a similar name
'unpin_compound_head()'. Even better would be to split the FOLL_PIN
part into a function so there was a clear logical pairing.
And reviewing it like that, I want to ask whether this unpin sequence is in
the right order. I would expect it to be the reverse order of the get.
John?
Is it safe to call mod_node_page_state() after releasing the refcount?
This could race with hot-unplugging the struct pages so I think it is
wrong.
Yes, I think you are right! I wasn't in a hot unplug state of mind when I
thought about the ordering there, but I should have been. :)
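The ordering concern can be illustrated with a small user-space model. Everything here (struct node_stats, struct head_page, unpin_head_model()) is a hypothetical stand-in, not the kernel's types or API, and the pin-count bias is left out. The point is only that the accounting happens while references are still held, and the reference drop comes last:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-node counters such as NR_FOLL_PIN_RELEASED. */
struct node_stats {
	unsigned long foll_pin_released;
};

/* Hypothetical stand-in for a compound head page. */
struct head_page {
	unsigned int refcount;
	struct node_stats *node;
};

/*
 * Release 'refs' references on 'head'. Returns true when the last
 * reference was dropped, after which the caller must not touch 'head'.
 */
static bool unpin_head_model(struct head_page *head, unsigned int refs)
{
	assert(refs > 0 && refs <= head->refcount);

	/* 1. Account while references are still held: 'head' cannot vanish yet. */
	head->node->foll_pin_released += refs;

	/* 2. Drop the references last, in the spirit of page_ref_sub_and_test(). */
	head->refcount -= refs;
	return head->refcount == 0;
}
```

In the real code the subtraction and the zero test are one atomic operation; the model splits them only for readability.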
+void unpin_user_page(struct page *page)
+{
+ __unpin_user_page(page, 1);
Thus this is
__unpin_user_page(compound_head(page), 1);
@@ -274,6 +291,7 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
bool make_dirty)
{
unsigned long index;
+ int refs = 1;
/*
* TODO: this can be optimized for huge pages: if a series of pages is
I think you can delete this TODO block now, and the one in unpin_user_pages_dirty_lock(),
as a result of these changes.
@@ -286,8 +304,9 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
return;
}
- for (index = 0; index < npages; index++) {
+ for (index = 0; index < npages; index += refs) {
struct page *page = compound_head(pages[index]);
+
I think this is really hard to read, it should end up as something like:
for_each_compound_head(page_list, page_list_len, &head, &ntails) {
if (!PageDirty(head))
set_page_dirty_lock(head, ntails);
unpin_user_page(head, ntails);
}
Maybe you open-code that iteration, but the basic idea stands: finding a
compound_head and its ntails should be the computational work performed.
No reason not to fix set_page_dirty_lock() too while you are here.
Eh? What's wrong with set_page_dirty_lock() ?
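For reference, the head/ntails iteration being discussed could look roughly like the user-space sketch below. The struct page here is a stand-in with a single field, compound_head() is mocked, and count_same_head() and for_each_run() are hypothetical names; the counting loop follows record_refs() from the patch, with unsigned counters as requested above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's struct page. */
struct page {
	struct page *head;	/* points to itself for a head page */
};

/* Mock of the kernel's compound_head(). */
static struct page *compound_head(struct page *page)
{
	return page->head;
}

/*
 * Count how many consecutive entries, starting at pages[0], share the
 * compound head of pages[0], and report that head through 'head'.
 */
static unsigned long count_same_head(struct page **pages, unsigned long npages,
				     struct page **head)
{
	unsigned long n;

	assert(npages > 0);
	*head = compound_head(pages[0]);
	for (n = 1; n < npages; n++)
		if (compound_head(pages[n]) != *head)
			break;
	return n;
}

/* Visit each run once; returns the number of runs, i.e. head-page updates. */
static unsigned long for_each_run(struct page **pages, unsigned long npages,
				  void (*fn)(struct page *head, unsigned long ntails))
{
	unsigned long index = 0, runs = 0, ntails;
	struct page *head;

	while (index < npages) {
		ntails = count_same_head(pages + index, npages - index, &head);
		if (fn)
			fn(head, ntails);
		index += ntails;	/* skip the whole run in one step */
		runs++;
	}
	return runs;
}
```

With six entries covering a four-page compound followed by a two-page compound, the callback runs twice instead of six times, which is the saving the commit message describes.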
Also, this patch and the next can be completely independent of the
rest of the series, it is valuable regardless of the other tricks. You
can split them and progress them independently.
.. and I was just talking about this with Daniel Jordan and some other
people at your company :)
Thanks,
Jason
thanks,
--
John Hubbard
NVIDIA