On Sun, Nov 06, 2022 at 08:42:32AM -0800, Hugh Dickins wrote:
> On Fri, 4 Nov 2022, Mel Gorman wrote:
> 
> > Changelog since v1
> > o Use trylock in free_unref_page_list due to IO completion from
> >   softirq context
> > 
> > The pcp_spin_lock_irqsave protecting the PCP lists is IRQ-safe as a
> > task allocating from the PCP must not re-enter the allocator from
> > IRQ context. In each instance where IRQ-reentrancy is possible, the
> > lock is acquired using pcp_spin_trylock_irqsave() even though IRQs
> > are disabled and re-entrancy is impossible.
> > 
> > Demoting the lock to pcp_spin_lock avoids an IRQ disable/enable in
> > the common case at the cost of some IRQ allocations taking a slower
> > path. If the PCP lists need to be refilled, the zone lock still
> > needs to disable IRQs, but that will only happen on PCP refill and
> > drain. If an IRQ is raised while a PCP allocation is in progress,
> > the trylock will fail and the allocation will fall back to using
> > the buddy lists directly. Note that this may not be a universal win
> > if an interrupt-intensive workload also allocates heavily from
> > interrupt context and contends heavily on the zone->lock as a
> > result.
> > 
> > [yuzhao@xxxxxxxxxx: Reported lockdep issue on IO completion from softirq]
> > Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
> 
> Hi Mel, I think you Cc'ed me for the purpose of giving this patch a
> run, and reporting if it's not good. That is the case, I'm afraid.

Thanks for testing, and yes, you were cc'd in the hope you'd run it
through a stress test of some sort. A lot of the tests I run are
performance-oriented and relatively few target functional issues.

> I first tried it on a v6.1-rc3, and very soon crashed under load with
> something about PageBuddy in the output. When I reverted, no problem;
> I thought maybe it's dependent on other commits in akpm's tree.

Can you tell me what sort of load it's under? I would like to add
something similar to the general battery of tests I run all patches
affecting the page allocator through. Even if this is just a crude
shell script, it would be enough for me to work with and incorporate
into mmtests. Once it's there, if I find mm-unstable has its own
problems, bisection can brute-force them.

> Later I tried on current mm-unstable: which is living up to the name
> in other ways, but when other issues patched, it soon crashed under
> load, GPF probably for non-canonical address 0xdead0000000000f8
> in compact_zone < compact_zone_order < try_to_compact_pages <
> .... < shmem_alloc_hugefolio < ...

0xdead000* looks like ILLEGAL_POINTER_VALUE, which is used as a poison
value, so a full list of the debugging options you apply for the
stress test would also be useful.

> I do try to exercise compaction as hard as I can, even to the point
> of having a hack in what used to be called shmem_getpage_gfp(),
> reverting to the stronger attempt to get huge pages, before Rik
> weakened the effect of huge=always with vma_thp_gfp_mask() in 5.12:
> so shmem is probably applying stronger flags for compaction than it
> would in your tree - I'm using
> GFP_TRANSHUGE_LIGHT | __GFP_RECLAIM | __GFP_NORETRY there.
> 
> Sorry for not giving you more info, I'm rather hoping that compaction
> is relevant, and will give you a clue (maybe that capture code, which
> surprised us once before??).

While capture is a possibility, it's a bad fit for this patch because
pages are captured under task context.
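For reference, the shape of the allocation-side change described in
the changelog is roughly the following (a simplified sketch rather
than the exact diff; the UP-specific pcp_trylock_prepare/finish
handling and some arguments are elided):

	struct page *page = NULL;
	struct per_cpu_pages *pcp;
	struct list_head *list;

	/* Fast path: take the PCP lock without disabling IRQs. */
	pcp = pcp_spin_trylock(zone->per_cpu_pageset);
	if (pcp) {
		list = &pcp->lists[order_to_pindex(migratetype, order)];
		page = __rmqueue_pcplist(zone, order, migratetype,
					 alloc_flags, pcp, list);
		pcp_spin_unlock(pcp);
	}

	if (!page) {
		/*
		 * The trylock failed, e.g. an IRQ allocation interrupted
		 * a task already holding the PCP lock. Bypass the PCP
		 * lists and allocate under the IRQ-safe zone->lock.
		 */
		page = rmqueue_buddy(preferred_zone, zone, order,
				     alloc_flags, migratetype);
	}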
> What I'm really trying to do is fix the bug in Linus's rmap/TLB
> series, and its interaction with my rmap series, and report back on
> his series (asking for temporary drop), before next-20221107 goes
> down in flames.
> 
> I'd advocate for dropping this patch of yours too; but if it's giving
> nobody else any trouble, I can easily continue to patch it out.

Given that you tested the patch against v6.1-rc3, it's clear that the
patch on its own causes problems. Having a reproduction case will help
me figure out why.

-- 
Mel Gorman
SUSE Labs