On 10/24/24 06:35, Yu Zhao wrote: > On Wed, Oct 23, 2024 at 1:35 AM Vlastimil Babka <vbabka@xxxxxxx> wrote: >> >> On 10/23/24 08:36, Yu Zhao wrote: >> > On Tue, Oct 22, 2024 at 4:53 AM Vlastimil Babka <vbabka@xxxxxxx> wrote: >> >> >> >> +Cc Mel and Matt >> >> >> >> On 10/21/24 19:25, Michal Hocko wrote: >> >> >> >> Hm I don't think it's completely WAI. The intention is that we should be >> >> able to unreserve the highatomic pageblocks before going OOM, and there >> >> seems to be an unintended corner case that if the pageblocks are fully >> >> exhausted, they are not reachable for unreserving. >> > >> > I still think unreserving should only apply to highatomic PBs that >> > contain free pages. Otherwise, it seems to me that it'd be >> > self-defecting because: >> > 1. Unreserving fully used hightatomic PBs can't fulfill the alloc >> > demand immediately. >> >> I thought the alloc demand is only blocked on the pessimistic watermark >> calculation. Usable free pages exist, but the allocation is not allowed to >> use them. > > I think we are talking about two different problems here: > 1. The estimation problem. > 2. The unreserving policy problem. > > What you said here is correct w.r.t. the first problem, and I was > talking about the second problem. OK but the problem with unreserving currently makes the problem of estimation worse and unfixable. >> > 2. More importantly, it only takes one alloc failure in >> > __alloc_pages_direct_reclaim() to reset nr_reserved_highatomic to 2MB, >> > from as high as 1% of a zone (in this case 1GB). IOW, it makes more >> > sense to me that highatomic only unreserves what it doesn't fully use >> > each time unreserve_highatomic_pageblock() is called, not everything >> > it got (except the last PB). >> >> But if the highatomic pageblocks are already full, we are not really >> removing any actual highatomic reserves just by changing the migratetype and >> decreasing nr_reserved_highatomic? > > If we change the MT, they can be fragmented a lot faster, i.e., from > the next near OOM condition to upon becoming free. Trying to persist > over time is what actually makes those PBs more fragmentation > resistant. If we assume the allocations there have similar sizes and lifetimes, then I guess yeah. >> In fact that would allow the reserves >> grow with some actual free pages in the future. > > Good point. I think I can explain it better along this line. > > If highatomic is under the limit, both your proposal and the current > implementation would try to grow, making not much difference. However, > the current implementation can also reuse previously full PBs when > they become available. So there is a clear winner here: the current > implementation. I'd say it depends on the user of the highatomic blocks (the workload), which way ends up better. > If highatomic has reached the limit, with your proposal, the growth > can only happen after unreserve, and unreserve only happens under > memory pressure. This means it's likely that it tries to grow under > memory pressure, which is more difficult than the condition where > there is plenty of memory. For the current implementation, it doesn't > try to grow, rather, it keeps what it already has, betting those full > PBs becoming available for reuse. So I don't see a clear winner > between trying to grow under memory pressure and betting on becoming > available for reuse. Understood. But also note there are many conditions where the current implementation and my proposal behave the same. If highatomic pageblocks become full and then only one or few pages from each is freed, it suddenly becomes possible to unreserve them due to memory pressure, and there is no reuse for those highatomic allocations anymore. This very different outcome only depends on whether a single page is free for the unreserve to work, but from the efficiency of pageblock reusal you describe above a single page is only a minor difference. My proposal would at least remove the sudden change of behavior when going from a single free page to no free page. >> Hm that assumes we're adding some checks in free fastpath, and for that to >> work also that there will be a freed page in highatomic PC in near enough >> future from the decision we need to unreserve something. Which is not so >> much different from the current assumption we'll find such a free page >> already in the free list immediately. >> >> > To summarize, I think this is an estimation problem, which I would >> > categorize as a lesser problem than accounting problems. But it sounds >> > to me that you think it's a policy problem, i.e., the highatomic >> > unreserving policy is wrong or not properly implemented? >> >> Yeah I'd say not properly implemented, but that sounds like a mechanism, not >> policy problem to me :) > > What about adding a new counter to keep track of the size of free > pages reserved for highatomic? That's doable but not so trivial and means starting to handle the highatomic pageblocks much more carefully, like we do with CMA pageblocks and NR_FREE_CMA_PAGES counter, otherwise we risk drifting the counter unrecoverably. > Mel?