> On Oct 20, 2022, at 1:06 AM, Huang, Ying <ying.huang@xxxxxxxxx> wrote:
> 
> "Alex Zhu (Kernel)" <alexlzhu@xxxxxxxx> writes:
> 
>>> On Oct 19, 2022, at 6:24 PM, Huang, Ying <ying.huang@xxxxxxxxx> wrote:
>>> 
>>> "Alex Zhu (Kernel)" <alexlzhu@xxxxxxxx> writes:
>>> 
>>>>> On Oct 19, 2022, at 12:04 AM, Huang, Ying <ying.huang@xxxxxxxxx> wrote:
>>>>> 
>>>>> <alexlzhu@xxxxxx> writes:
>>>>> 
> 
> [snip]
> 
>>>>>> diff --git a/mm/list_lru.c b/mm/list_lru.c
>>>>>> index a05e5bef3b40..8cc56a84b554 100644
>>>>>> --- a/mm/list_lru.c
>>>>>> +++ b/mm/list_lru.c
>>>>>> @@ -140,6 +140,32 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item)
>>>>>>  }
>>>>>>  EXPORT_SYMBOL_GPL(list_lru_add);
>>>>>>  
>>>>>> +bool list_lru_add_page(struct list_lru *lru, struct page *page, struct list_head *item)
>>>>>> +{
>>>>>> +	int nid = page_to_nid(page);
>>>>>> +	struct list_lru_node *nlru = &lru->node[nid];
>>>>>> +	struct list_lru_one *l;
>>>>>> +	struct mem_cgroup *memcg;
>>>>>> +	unsigned long flags;
>>>>>> +
>>>>>> +	spin_lock_irqsave(&nlru->lock, flags);
>>>>>> +	if (list_empty(item)) {
>>>>>> +		memcg = page_memcg(page);
>>>>>> +		l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg));
>>>>>> +		list_add_tail(item, &l->list);
>>>>>> +		/* Set shrinker bit if the first element was added */
>>>>>> +		if (!l->nr_items++)
>>>>>> +			set_shrinker_bit(memcg, nid,
>>>>>> +					 lru_shrinker_id(lru));
>>>>>> +		nlru->nr_items++;
>>>>>> +		spin_unlock_irqrestore(&nlru->lock, flags);
>>>>>> +		return true;
>>>>>> +	}
>>>>>> +	spin_unlock_irqrestore(&nlru->lock, flags);
>>>>>> +	return false;
>>>>>> +}
>>>>>> +EXPORT_SYMBOL_GPL(list_lru_add_page);
>>>>>> +
>>>>> 
>>>>> It appears that only 2 lines are different from list_lru_add().  Is it
>>>>> possible for us to share code?  For example, add another flag for
>>>>> page_memcg() case?
>>>> 
>>>> I believe there are 4 lines: the page_to_nid(page) and the
>>>> spin_lock_irqsave/restore.
>>>> It was implemented this way as we found we needed to take a page as a
>>>> parameter and obtain the node id from the page. This is because the
>>>> THP is not necessarily a slab object, as the list_lru_add/delete code
>>>> assumes.
>>>> 
>>>> Also, there is a potential deadlock when split_huge_page is called
>>>> from reclaim and when split_huge_page is called by the THP shrinker,
>>>> which is why we need irqsave/restore.
>>> 
>>> Can you provide more details on this?  If it's necessary, I think it
>>> should be OK to use irqsave/restore.
>> 
>> There is an ABBA deadlock between reclaim and the THP shrinker where
>> one waits for the page lock and the other waits for the list_lru lock.
> 
> Per my understanding, when we hold list_lru lock, we only use
> trylock_page() and backoff if we fail to acquire the page lock.  So it
> appears that we are safe here?  If not, can you provide the detailed
> race condition information, for example actions on different CPUs?

The ABBA deadlock was the reason we added this trylock_page(). It is
safe now; it was not safe in v3.

> 
>> There is another deadlock where free_transhuge_page interrupts the THP
>> shrinker while it is splitting THPs (holding the list_lru lock) and
>> then acquires the same list_lru lock (self deadlock). These deadlocks
>> occurred in our production experiments. I do believe irqsave/restore
>> is necessary.
> 
> Got it.  We may use list_lru lock inside IRQ handler when free pages.  I
> think this information is good for code comments or patch description.

Sounds good! I’ll add some comments to make that more clear.

> 
>>> 
>>>> I thought this would be cleaner than attempting to share code with
>>>> list_lru_add/delete. Only the shrinker makes use of this. All other
>>>> use cases assume slab objects.
>>> 
>>> Not only for slab objects now.  The original code works for slab and
>>> normal page, which is controlled by list_lru->memcg_aware.  You can add
>>> another flag for your new use case, or refactor it to use a function
>>> pointer.
>>> 
>>> Best Regards,
>>> Huang, Ying
>> 
>> I’ll take a look at this tomorrow and get back to you. Thanks for the
>> suggestion!
>>> 
>>>>> 
> 
> [snip]
> 
> Best Regards,
> Huang, Ying