On Mon, Oct 10, 2022 at 2:45 PM Vlastimil Babka <vbabka@xxxxxxx> wrote: > > On 10/10/22 16:22, Mel Gorman wrote: > > On Wed, Aug 24, 2022 at 10:58:26PM -0600, Yu Zhao wrote: > >> On Wed, Aug 24, 2022 at 8:18 AM Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote: > >>> > >>> The pcp_spin_lock_irqsave protecting the PCP lists is IRQ-safe as a task > >>> allocating from the PCP must not re-enter the allocator from IRQ context. > >>> In each instance where IRQ-reentrancy is possible, the lock is acquired using > >>> pcp_spin_trylock_irqsave() even though IRQs are disabled and re-entrancy > >>> is impossible. > >>> > >>> Demote the lock to pcp_spin_lock avoids an IRQ disable/enable in the common > >>> case at the cost of some IRQ allocations taking a slower path. If the PCP > >>> lists need to be refilled, the zone lock still needs to disable IRQs but > >>> that will only happen on PCP refill and drain. If an IRQ is raised when > >>> a PCP allocation is in progress, the trylock will fail and fallback to > >>> using the buddy lists directly. Note that this may not be a universal win > >>> if an interrupt-intensive workload also allocates heavily from interrupt > >>> context and contends heavily on the zone->lock as a result. > >> > >> Hi, > >> > >> This patch caused the following warning. Please take a look. > >> > >> Thanks. > >> > >> WARNING: inconsistent lock state > >> 6.0.0-dbg-DEV #1 Tainted: G S W O > >> -------------------------------- > > > > I finally found time to take a closer look at this and I cannot reproduce > > it against 6.0. What workload triggered the warning, on what platform and > > can you post the kernel config used please? It would also help if you > > can remember what git commit the patch was tested upon. > > > > Thanks and sorry for the long delay. > > I didn't (try to) reproduce this, but FWIW the report looked legit to > me, as after the patch, pcp_spin_trylock() has to be used for both > allocation and freeing to be IRQ safe. free_unref_page() uses it, so > it's fine. But as the stack trace in the report shows, > free_unref_page_list() does pcp_spin_lock() and not _trylock, and that's > IMHO the problem. If this is not the case, please let me know and I'll try repro again.