Thanks Mel for your detailed comments!
On 2022/9/20 19:02, Mel Gorman wrote:
On Tue, Sep 20, 2022 at 05:38:30PM +0800, Zhenhua Huang wrote:
Also this patch doesn't really explain why it should work and honestly
it doesn't really make much sense to me either.
Sorry, my fault. IMO, the reason it should work is this: for an order-3
allocation, we can perform direct reclaim more times, because the free_list
only has order-2 pages (8214*16kB (UEC)), and order 2 is what this change
*lowers* the check to. The order requirement that I have lowered is the one
in should_reclaim_retry -> __zone_watermark_ok:
	for (o = order; o < MAX_ORDER; o++) {
		struct free_area *area = &z->free_area[o];
		...
		for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) {
			if (!free_area_empty(area, mt))
				return true;
		}
	}
An order-2 requirement is more easily met, hence the VM has more chances to
return true from should_reclaim_retry.
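To make the effect concrete, here is a minimal userspace sketch of that loop.
It is a toy model, not the kernel code: the toy_* names, the array layout and
the decision to put all 8214 order-2 blocks into a single migratetype are
assumptions made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for kernel constants; the values are assumptions. */
#define TOY_MAX_ORDER 11

enum toy_migratetype {
	TOY_UNMOVABLE,
	TOY_MOVABLE,
	TOY_RECLAIMABLE,
	TOY_PCPTYPES	/* number of migratetypes scanned by the loop */
};

/* Toy zone: nr_free[o][mt] is the number of free blocks of order o, type mt. */
struct toy_zone {
	unsigned long nr_free[TOY_MAX_ORDER][TOY_PCPTYPES];
};

/* Return true if at least one free block of order >= 'order' exists. */
static bool toy_high_order_ok(const struct toy_zone *z, unsigned int order)
{
	assert(order < TOY_MAX_ORDER);

	for (unsigned int o = order; o < TOY_MAX_ORDER; o++) {
		for (int mt = 0; mt < TOY_PCPTYPES; mt++) {
			if (z->nr_free[o][mt] != 0)
				return true;
		}
	}
	return false;
}

/*
 * Build a zone that loosely resembles the report: free blocks exist at
 * order 2 (16kB) only. Placing them all in one migratetype is an assumption.
 */
static struct toy_zone toy_zone_order2_only(void)
{
	struct toy_zone z = { 0 };

	z.nr_free[2][TOY_UNMOVABLE] = 8214;
	return z;
}
```

With such a zone, toy_high_order_ok(&z, 3) finds nothing, while
toy_high_order_ok(&z, 2) succeeds, which is why checking against the lowered
order lets the retry logic keep going.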
This is a wrong approach to the problem because there is no real
guarantee the reclaim round will do anything useful. You should be
really looking at the compaction side of the thing.
Thanks Michal for the advice, I'll look at it from the compaction side as
well. But I have one further question: IMO reclaim (~2GB of LRU pages can be
reclaimed) should be more feasible than compaction (already tried at the
highest priority and failed) in this case? Could you please elaborate more?
It seems I still do not fully understand why it's a wrong approach to check
from the reclaim phase.
Because it risks major slowdowns due to excessive reclaim. Early support
used "lumpy reclaim" instead of compaction and resulted in major stalls when
trying to allocate THP resulting in THP often being disabled. The success
rates were great but systems could become unusable for several minutes
and ultimately this resulted in compaction and the current backoff logic
of reclaim. Your scenario is similar: you want to aggressively try to
shrink slabs in case an order-3 block of pages gets freed. It might succeed
but the system grinds to a halt with excessive re-reading of information
from the disk for other use cases.
Thanks, I've noticed that as well. Continuous reclaim obviously lengthened
the time to OOM, from what I saw.
Your focus likely should be on reclaim and compaction aborting
prematurely because free CMA pages are available at the correct order
but the calling context cannot use CMA pages.
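A rough userspace sketch of that mismatch follows. It is a toy model under
stated assumptions, not the kernel's implementation: the toy_* names, the
TOY_ALLOC_CMA flag and the two-counter free area are hypothetical and exist
only to show how a context-blind check and a context-aware check can disagree.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_ORDER 11
#define TOY_ALLOC_CMA 0x1u	/* hypothetical flag: caller may use CMA pages */

/* Toy free area: free blocks split into regular and CMA-backed counts. */
struct toy_free_area {
	unsigned long nr_regular;
	unsigned long nr_cma;
};

struct toy_zone {
	struct toy_free_area area[TOY_MAX_ORDER];
};

/* Context-blind check: any free block of order >= 'order' counts. */
static bool toy_any_free_block(const struct toy_zone *z, unsigned int order)
{
	assert(order < TOY_MAX_ORDER);

	for (unsigned int o = order; o < TOY_MAX_ORDER; o++) {
		if (z->area[o].nr_regular != 0 || z->area[o].nr_cma != 0)
			return true;
	}
	return false;
}

/* Context-aware check: CMA blocks count only if the caller can use them. */
static bool toy_usable_free_block(const struct toy_zone *z, unsigned int order,
				  unsigned int alloc_flags)
{
	assert(order < TOY_MAX_ORDER);

	for (unsigned int o = order; o < TOY_MAX_ORDER; o++) {
		if (z->area[o].nr_regular != 0)
			return true;
		if ((alloc_flags & TOY_ALLOC_CMA) && z->area[o].nr_cma != 0)
			return true;
	}
	return false;
}
```

If the only free order-3 blocks are CMA-backed, toy_any_free_block() reports
success while toy_usable_free_block() without TOY_ALLOC_CMA does not. A
heuristic that relies on the first kind of check could stop early even though
the caller still cannot allocate.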
It's strange to hear of a driver that has a strict need for order-3 pages
being available at all times due to a lack of an IOMMU because that is
going to be fragile. One point of CMA was to carve out a region for such
drivers so they could get the contiguous regions they needed. I believe phone
cameras were an early example. If your driver has strict requirements for
high-order page availability then CMA probably should be configured and
the driver should use CMA.
Your point is to avoid allocating order-3 pages unless that is really needed.
Got it, thanks.
Thanks,
Zhenhua