On 11/18/22 11:17, Mel Gorman wrote:
> The pcp_spin_lock_irqsave protecting the PCP lists is IRQ-safe as a task
> allocating from the PCP must not re-enter the allocator from IRQ context.
> In each instance where IRQ-reentrancy is possible, the lock is acquired
> using pcp_spin_trylock_irqsave() even though IRQs are disabled and
> re-entrancy is impossible.
>
> Demoting the lock to pcp_spin_lock avoids an IRQ disable/enable in the
> common case at the cost of some IRQ allocations taking a slower path. If
> the PCP lists need to be refilled, the zone lock still needs to disable
> IRQs, but that will only happen on PCP refill and drain. If an IRQ is
> raised when a PCP allocation is in progress, the trylock will fail and
> the allocation will fall back to using the buddy lists directly. Note
> that this may not be a universal win if an interrupt-intensive workload
> also allocates heavily from interrupt context and contends heavily on
> the zone->lock as a result.
>
> [yuzhao@xxxxxxxxxx: Reported lockdep issue on IO completion from softirq]
> [hughd@xxxxxxxxxx: Fix list corruption, lock improvements, micro-optimisations]
> Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>

Reviewed-by: Vlastimil Babka <vbabka@xxxxxxx>

Some nits below:

> @@ -3516,10 +3485,10 @@ void free_unref_page(struct page *page, unsigned int order)
>   */
>  void free_unref_page_list(struct list_head *list)
>  {
> +	unsigned long __maybe_unused UP_flags;
>  	struct page *page, *next;
>  	struct per_cpu_pages *pcp = NULL;
>  	struct zone *locked_zone = NULL;
> -	unsigned long flags;
>  	int batch_count = 0;
>  	int migratetype;
>
> @@ -3550,11 +3519,26 @@ void free_unref_page_list(struct list_head *list)
>
>  		/* Different zone, different pcp lock. */
>  		if (zone != locked_zone) {
> -			if (pcp)
> -				pcp_spin_unlock_irqrestore(pcp, flags);
> +			if (pcp) {
> +				pcp_spin_unlock(pcp);
> +				pcp_trylock_finish(UP_flags);
> +			}
>
> +			/*
> +			 * trylock is necessary as pages may be getting freed
> +			 * from IRQ or SoftIRQ context after an IO completion.
> +			 */
> +			pcp_trylock_prepare(UP_flags);
> +			pcp = pcp_spin_trylock(zone->per_cpu_pageset);
> +			if (!pcp) {

Perhaps use unlikely() here?

> +				pcp_trylock_finish(UP_flags);
> +				free_one_page(zone, page, page_to_pfn(page),
> +					      0, migratetype, FPI_NONE);

Not critical for correctness, but the migratetype here might be stale and we
should do get_pcppage_migratetype(page);

> +				locked_zone = NULL;
> +				continue;
> +			}
>  			locked_zone = zone;
> -			pcp = pcp_spin_lock_irqsave(locked_zone->per_cpu_pageset, flags);
> +			batch_count = 0;
>  		}
>
>  		/*
> @@ -3569,18 +3553,23 @@ void free_unref_page_list(struct list_head *list)
>  		free_unref_page_commit(zone, pcp, page, migratetype, 0);
>
>  		/*
> -		 * Guard against excessive IRQ disabled times when we get
> -		 * a large list of pages to free.
> +		 * Guard against excessive lock hold times when freeing
> +		 * a large list of pages. Lock will be reacquired if
> +		 * necessary on the next iteration.
>  		 */
>  		if (++batch_count == SWAP_CLUSTER_MAX) {
> -			pcp_spin_unlock_irqrestore(pcp, flags);
> +			pcp_spin_unlock(pcp);
> +			pcp_trylock_finish(UP_flags);
>  			batch_count = 0;
> -			pcp = pcp_spin_lock_irqsave(locked_zone->per_cpu_pageset, flags);
> +			pcp = NULL;
> +			locked_zone = NULL;

AFAICS if this block was just "locked_zone = NULL;" then the existing code
would do the right thing.
Or maybe to have simpler code, just do batch_count++ here and make the
relocking check do

  if (zone != locked_zone || batch_count == SWAP_CLUSTER_MAX)

(a rough sketch is at the end of this mail).

>  		}
>  	}
>
> -	if (pcp)
> -		pcp_spin_unlock_irqrestore(pcp, flags);
> +	if (pcp) {
> +		pcp_spin_unlock(pcp);
> +		pcp_trylock_finish(UP_flags);
> +	}
>  }
>
>  /*
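
To make that second suggestion a bit more concrete, a rough and untested
sketch of the relocking part of the loop, reusing the names from the hunks
above (per-page setup and the commit path are elided, and I may well be
missing a detail):

	list_for_each_entry_safe(page, next, list, lru) {
		struct zone *zone = page_zone(page);

		/* ... per-page setup as before ... */

		/*
		 * Relock when crossing into a different zone or after
		 * SWAP_CLUSTER_MAX frees under the same pcp lock.
		 */
		if (zone != locked_zone || batch_count == SWAP_CLUSTER_MAX) {
			if (pcp) {
				pcp_spin_unlock(pcp);
				pcp_trylock_finish(UP_flags);
			}

			batch_count = 0;

			/* trylock because of IRQ/SoftIRQ frees, as above */
			pcp_trylock_prepare(UP_flags);
			pcp = pcp_spin_trylock(zone->per_cpu_pageset);
			if (unlikely(!pcp)) {
				pcp_trylock_finish(UP_flags);
				free_one_page(zone, page, page_to_pfn(page),
					      0, migratetype, FPI_NONE);
				locked_zone = NULL;
				continue;
			}
			locked_zone = zone;
		}

		/* ... */

		free_unref_page_commit(zone, pcp, page, migratetype, 0);
		batch_count++;
	}

That would drop the mid-loop unlock/NULL block entirely and keep the batch
accounting next to the commit.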