On 07.12.23 15:45, Ryan Roberts wrote:
On 07/12/2023 13:28, David Hildenbrand wrote:
Right, but you know from the first loop which order is applicable (and will be
fed to the second loop) and could just pte_unmap(pte) + tryalloc. If that fails,
remap and try with the next orders.
You mean something like this?
pte = pte_offset_map(vmf->pmd, vmf->address & PMD_MASK);
if (!pte)
return ERR_PTR(-EAGAIN);
order = highest_order(orders);
while (orders) {
addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
if (!pte_range_none(pte + pte_index(addr), 1 << order)) {
order = next_order(&orders, order);
continue;
}
pte_unmap(pte);
folio = vma_alloc_folio(gfp, order, vma, addr, true);
if (folio) {
clear_huge_page(&folio->page, vmf->address, 1 << order);
return folio;
}
pte = pte_offset_map(vmf->pmd, vmf->address & PMD_MASK);
if (!pte)
return ERR_PTR(-EAGAIN);
order = next_order(&orders, order);
}
pte_unmap(pte);
I don't really like that because if high order folio allocations fail, then you
are calling pte_range_none() again for the next lower order; once that check has
succeeded for an order it shouldn't be required for any lower orders. In this
case you also have lots of pte map/unmap.
I see what you mean.
The original version feels more efficient to me.
Yes it is. Adding in some comments might help, like
/*
* Find the largest order where the aligned range is completely prot_none(). Note
* that all remaining orders will be completely prot_none().
*/
...
/* Try allocating the largest of the remaining orders. */
OK added.
That would make the code certainly easier to understand. That "orders" magic of
constructing, filtering, walking is confusing :)
I might find some time today to see if there is an easy way to cleanup all what
I spelled out above. It really is a mess. But likely that cleanup could be
deferred (but you're touching it, so ... :) ).
I'm going to ignore the last 5 words. I heard the "that cleanup could be
deferred" part loud and clear though :)
:)
If we could stop passing orders into thp_vma_allowable_orders(), that would
probably
be the biggest win. It's just all a confusing mess.
I tried an approach like you suggested in the other thread originally, but I
struggled to define exactly what "thp_vma_configured_orders()" should mean;
Ideally, I just want "all the THP orders that are currently enabled for this
VMA+flags". But some callers want to enforce_sysfs and others don't, so you
probably have to at least pass that flag. Then you have DAX which explicitly
Yes, the flags would still be passed. It's kind of the "context".
ignores enforce_sysfs, but only in a page fault. And shmem, which ignores
enforce_sysfs, but only outside of a page fault. So it quickly becomes pretty
complex. It is basically thp_vma_allowable_orders() as currently defined.
Yeah, but moving the "can we actually fit a THP in there" check out of
the picture.
If this could be a simple function then it could be inline and as you say, we
can do the masking in the caller and exit early for the order-0 case. But it is
very complex (at least if you want to retain the equivalent logic to what
thp_vma_allowable_orders() has) so I'm not sure how to do the order-0 early exit
without passing in the orders bitfield. And we are unlikely to exit early
because PMD-sized THP is likely enabled and because we didn't pass in a orders
bitfield, that wasn't filtered out.
In short, I can't see a solution that's better than the one I have. But if you
have something in mind, if you can spell it out, then I'll have a go at tidying
it up and integrating it into the series. Otherwise I really would prefer to
leave it for a separate series.
I'm playing with some cleanups, but they can all be built on top if they
materialize.
--
Cheers,
David / dhildenb