"Alex Zhu (Kernel)" <alexlzhu@xxxxxxxx> writes: >> On Oct 19, 2022, at 6:24 PM, Huang, Ying <ying.huang@xxxxxxxxx> wrote: >> >> "Alex Zhu (Kernel)" <alexlzhu@xxxxxxxx> writes: >> >>>> On Oct 19, 2022, at 12:04 AM, Huang, Ying <ying.huang@xxxxxxxxx> wrote: >>>> >>>>> >>>> <alexlzhu@xxxxxx> writes: >>>> [snip] >>>>> diff --git a/mm/list_lru.c b/mm/list_lru.c >>>>> index a05e5bef3b40..8cc56a84b554 100644 >>>>> --- a/mm/list_lru.c >>>>> +++ b/mm/list_lru.c >>>>> @@ -140,6 +140,32 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item) >>>>> } >>>>> EXPORT_SYMBOL_GPL(list_lru_add); >>>>> >>>>> +bool list_lru_add_page(struct list_lru *lru, struct page *page, struct list_head *item) >>>>> +{ >>>>> + int nid = page_to_nid(page); >>>>> + struct list_lru_node *nlru = &lru->node[nid]; >>>>> + struct list_lru_one *l; >>>>> + struct mem_cgroup *memcg; >>>>> + unsigned long flags; >>>>> + >>>>> + spin_lock_irqsave(&nlru->lock, flags); >>>>> + if (list_empty(item)) { >>>>> + memcg = page_memcg(page); >>>>> + l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg)); >>>>> + list_add_tail(item, &l->list); >>>>> + /* Set shrinker bit if the first element was added */ >>>>> + if (!l->nr_items++) >>>>> + set_shrinker_bit(memcg, nid, >>>>> + lru_shrinker_id(lru)); >>>>> + nlru->nr_items++; >>>>> + spin_unlock_irqrestore(&nlru->lock, flags); >>>>> + return true; >>>>> + } >>>>> + spin_unlock_irqrestore(&nlru->lock, flags); >>>>> + return false; >>>>> +} >>>>> +EXPORT_SYMBOL_GPL(list_lru_add_page); >>>>> + >>>> >>>> It appears that only 2 lines are different from list_lru_add(). Is it >>>> possible for us to share code? For example, add another flag for >>>> page_memcg() case? >>> >>> I believe there are 4 lines. The page_to_nid(page) and the spin_lock_irqsave/restore. >>> It was implemented this way as we found we needed to take a page as a parameter and obtain the node id from >>> the page. This is because the THP is not necessarily a slab object, as the list_lru_add/delete code assumes. >>> >>> Also, there is a potential deadlock when split_huge_page is called from reclaim and when split_huge_page is called >>> by the THP Shrinker, which is why we need irqsave/restore. >> >> Can you provide more details on this? If it's necessary, I think it should be OK to use irqsave/restore. > > There is an ABBA deadlock between reclaim and the THP shrinker where > one waits for the page lock and the other waits for the list_lru lock. Per my understanding, when we hold list_lru lock, we only use trylock_page() and backoff if we fail to acquire the page lock. So it appears that we are safe here? If not, can you provide the detailed race condition information, for example actions on different CPUs? > There is another deadlock where free_transhuge_page interrupts the THP > shrinker while it is splitting THPs (has the list_lru lock) and then > acquires the same list_lru lock (self deadlock). These happened in our > prod experiments. I do believe irqsave/restore is necessary. Got it. We may use list_lru lock inside IRQ handler when free pages. I think this information is good for code comments or patch description. >> >>> I though this would be cleaner than attempting to shared code with list_lru_add/delete. Only the shrinker makes use of this. >>> All other use cases assume slab objects. >> >> Not only for slab objects now. The original code works for slab and >> normal page. Which is controlled by list_lru->memcg_aware. You can add >> another flag for your new use case, or refactor it to use a function >> pointer. >> >> Best Regards, >> Huang, Ying > > I’ll take a look at this tomorrow and get back to you. Thanks for the suggestion! >> >>>> [snip] Best Regards, Huang, Ying