On 4/28/20 9:43 AM, Michal Hocko wrote:
> On Mon 27-04-20 16:35:58, Andrew Morton wrote:
> [...]
>> No consumer of GFP_ATOMIC memory should consume an unbounded amount of
>> it. Subsystems such as networking will consume a certain amount and
>> will then start recycling it. The total amount in-flight will vary
>> over the longer term as workloads change. A dynamically tuning
>> threshold system will need to adapt rapidly enough to sudden load
>> shifts, which might require unreasonable amounts of headroom.
>
> I do agree. __GFP_HIGH/__GFP_ATOMIC are bound by the size of the
> reserves under memory pressure. Then allocations start failing very
> quickly and users have to cope with that, usually by deferring to a
> sleepable context. Tuning reserves dynamically for heavy reserves
> consumers would be possible but I am worried that this is far from
> trivial.
>
> We definitely need to understand what is going on here. Why don't
> kswapd + N direct reclaimers provide enough memory to satisfy both
> the N threads and the reserves consumers? How many times do those
> direct reclaimers have to retry?

Wasn't this supposed to be avoided with PSI? User-space should have a
fair chance to take action before things go bad.

> We used to have the allocation stall warning as David mentioned in the
> patch description and I have seen it triggering without heavy reserves
> consumers (aka reported free pages corresponded to the min watermark).
> The underlying problem was usually kswapd being stuck on some FS locks,
> direct reclaimers stuck in shrinkers, or a way too overloaded system
> with dozens if not hundreds of processes stuck in the page allocator,
> each racing with the reclaim and betting on luck. The last problem was
> the most annoying because it is really hard to tune for.
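For reference, the usual coping pattern Michal describes ("deferring to a
sleepable context") looks roughly like the sketch below. This is a
hypothetical illustration, not code from any patch in this thread;
consume_buffer() and hot_path() are made-up names:

```c
/* Hypothetical sketch: if an atomic allocation fails because the
 * reserves are exhausted, defer the work to process context, where
 * GFP_KERNEL is allowed to sleep and reclaim. */
#include <linux/slab.h>
#include <linux/workqueue.h>

static void retry_in_process_ctx(struct work_struct *work)
{
	/* Sleepable context: the allocator may enter direct reclaim, so
	 * this can block, but it is far more likely to succeed than a
	 * GFP_ATOMIC attempt under sustained memory pressure. */
	void *buf = kmalloc(PAGE_SIZE, GFP_KERNEL);

	if (buf)
		consume_buffer(buf);	/* hypothetical consumer */
}
static DECLARE_WORK(retry_work, retry_in_process_ctx);

static void hot_path(void)
{
	/* Atomic context (e.g. an interrupt handler): no sleeping allowed,
	 * so the allocation may dip into the reserves and fail quickly. */
	void *buf = kmalloc(PAGE_SIZE, GFP_ATOMIC);

	if (buf)
		consume_buffer(buf);
	else
		schedule_work(&retry_work);	/* cope by deferring */
}
```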