On Fri, Apr 6, 2018 at 4:44 AM, Andrey Ryabinin <aryabinin@xxxxxxxxxxxxx> wrote:
> On 04/06/2018 05:13 AM, Shakeel Butt wrote:
>> On Fri, Mar 23, 2018 at 8:20 AM, Andrey Ryabinin
>> <aryabinin@xxxxxxxxxxxxx> wrote:
>>> memcg reclaim may alter pgdat->flags based on the state of LRU lists
>>> in cgroup and its children. PGDAT_WRITEBACK may force kswapd to sleep
>>> in congested_wait(), PGDAT_DIRTY may force kswapd to writeback filesystem
>>> pages. But the worst here is PGDAT_CONGESTED, since it may force all
>>> direct reclaims to stall in wait_iff_congested(). Note that only kswapd
>>> has the power to clear any of these bits. This might just never happen if
>>> cgroup limits are configured that way. So all direct reclaims will stall
>>> as long as we have some congested bdi in the system.
>>>
>>> Leave all pgdat->flags manipulations to kswapd. kswapd scans the whole
>>> pgdat, only kswapd can clear pgdat->flags once node is balanced, thus
>>> it's reasonable to leave all decisions about node state to kswapd.
>>
>> What about global reclaimers? Is the assumption that when global
>> reclaimers hit such a condition, kswapd will be running and correctly
>> set PGDAT_CONGESTED?
>>
>
> The reason I moved this under if(current_is_kswapd()) is because only kswapd
> can clear these flags. I'm less worried about the case when PGDAT_CONGESTED is
> falsely not set, and more worried about the case when it is falsely set. If a
> direct reclaimer sets PGDAT_CONGESTED, do we have a guarantee that, after the
> congestion problem is sorted, kswapd will be woken up and clear the flag? It
> seems like there is no such guarantee. E.g. direct reclaimers may eventually
> balance pgdat and kswapd simply won't wake up (see wakeup_kswapd()).
>

Thanks for the explanation, I think it should be in the commit message.
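
For readers following the thread, the asymmetry discussed above (anyone may set the flag, only kswapd may clear it) can be sketched as a small userspace model. This is not kernel code and not the patch itself: struct node_model, NODE_CONGESTED, reclaim_pass(), kswapd_balanced() and direct_reclaim_would_stall() are hypothetical names invented for illustration, and the only_kswapd_sets switch merely stands in for the if (current_is_kswapd()) check mentioned in the reply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every name here is hypothetical, not a kernel symbol. */
enum { NODE_CONGESTED = 1u << 0 };     /* stands in for PGDAT_CONGESTED */

struct node_model {
	unsigned int flags;            /* stands in for pgdat->flags */
};

/*
 * One reclaim pass over the node. If the pass saw congestion it may set
 * the flag. With only_kswapd_sets == true, only a pass run by kswapd is
 * allowed to do so (the behaviour argued for in the thread); with false,
 * any reclaimer, including a direct or memcg reclaimer, may set it.
 */
void reclaim_pass(struct node_model *node, bool is_kswapd,
		  bool saw_congestion, bool only_kswapd_sets)
{
	assert(node != NULL);

	if (!saw_congestion)
		return;
	if (only_kswapd_sets && !is_kswapd)
		return;

	node->flags |= NODE_CONGESTED;
}

/* In this model, the flag is cleared only when kswapd finds the node balanced. */
void kswapd_balanced(struct node_model *node)
{
	assert(node != NULL);
	node->flags &= ~NODE_CONGESTED;
}

/* Direct reclaimers stall for as long as the flag stays set. */
bool direct_reclaim_would_stall(const struct node_model *node)
{
	assert(node != NULL);
	return (node->flags & NODE_CONGESTED) != 0;
}
```

In this model, a direct reclaimer calling reclaim_pass() with only_kswapd_sets == false leaves the flag set until kswapd_balanced() runs. If kswapd is never woken, that call never happens and every later direct reclaimer stalls. With only_kswapd_sets == true the flag can only be set by the same actor that is able to clear it.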