On Wed, Mar 23, 2022 at 10:45:30AM +0100, Michal Hocko wrote: > [Let me add more people to the CC list - I am not really sure who is the > most familiar with all the tricks that mmu notifiers might do] > > On Wed 23-03-22 09:43:59, Juergen Gross wrote: > > Hi, > > > > during analysis of a customer's problem on a 4.12 based kernel > > (deadlock due to a blocking mmu notifier in a Xen driver) I came > > across upstream patches 93065ac753e4 ("mm, oom: distinguish > > blockable mode for mmu notifiers") et al. > > > > The backtrace of the blocked tasks was typically something like: > > > > #0 [ffffc9004222f228] __schedule at ffffffff817223e2 > > #1 [ffffc9004222f2b8] schedule at ffffffff81722a02 > > #2 [ffffc9004222f2c8] schedule_preempt_disabled at ffffffff81722d0a > > #3 [ffffc9004222f2d0] __mutex_lock at ffffffff81724104 > > #4 [ffffc9004222f360] mn_invl_range_start at ffffffffc01fd398 [xen_gntdev] > > #5 [ffffc9004222f398] __mmu_notifier_invalidate_page at ffffffff8123375a > > #6 [ffffc9004222f3c0] try_to_unmap_one at ffffffff812112cb > > #7 [ffffc9004222f478] rmap_walk_file at ffffffff812105cd > > #8 [ffffc9004222f4d0] try_to_unmap at ffffffff81212450 > > #9 [ffffc9004222f508] shrink_page_list at ffffffff811e0755 > > #10 [ffffc9004222f5c8] shrink_inactive_list at ffffffff811e13cf > > #11 [ffffc9004222f6a8] shrink_node_memcg at ffffffff811e241f > > #12 [ffffc9004222f790] shrink_node at ffffffff811e29c5 > > #13 [ffffc9004222f808] do_try_to_free_pages at ffffffff811e2ee1 > > #14 [ffffc9004222f868] try_to_free_pages at ffffffff811e3248 > > #15 [ffffc9004222f8e8] __alloc_pages_slowpath at ffffffff81262c37 > > #16 [ffffc9004222f9f0] __alloc_pages_nodemask at ffffffff8121afc1 > > #17 [ffffc9004222fa48] alloc_pages_current at ffffffff8122f350 > > #18 [ffffc9004222fa78] __get_free_pages at ffffffff8121685a > > #19 [ffffc9004222fa80] __pollwait at ffffffff8127e795 > > #20 [ffffc9004222faa8] evtchn_poll at ffffffffc00e802b [xen_evtchn] > > #21 [ffffc9004222fab8] do_sys_poll at ffffffff8127f953 > > #22 [ffffc9004222fec8] sys_ppoll at ffffffff81280478 > > #23 [ffffc9004222ff30] do_syscall_64 at ffffffff81004954 > > #24 [ffffc9004222ff50] entry_SYSCALL_64_after_hwframe at ffffffff818000b6 > > > > It was found that the notifier of the Xen gntdev driver was using a > > mutex resulting in the deadlock. The bug here is that prior to commit a81461b0546c ("xen/gntdev: update to new mmu_notifier semantic") wired the mn_invl_range_start() which takes a mutex to invalidate_page, which is defined to run in an atomic context. > > Michal Hocko suggested that backporting above mentioned patch might > > help, as the mmu notifier call is happening in atomic context. IIRC "blocking" was not supposed to be about atomic context or not, but more about time to completion/guarenteed completion as it is used from a memory reclaim path. > Just to be more explicit. The current upstream code calls > mmu_notifier_invalidate_range while the page table locks are held. > Are there any notifiers which could sleep in those? There should not be, that would be a bug that lockdep would find. > In other words should we use the nonblock start/stop in > try_to_unmap? AFAICT, no. Jason