On 3/1/21 9:23 AM, Michal Hocko wrote:
> On Mon 01-03-21 08:39:22, Shakeel Butt wrote:
>> On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
> [...]
>>> Then how come this can ever be a problem? in_task() should exclude
>>> soft irq context unless I am mistaken.
>>>
>>
>> If I take the following example of syzbot's deadlock scenario then
>> CPU1 is the one freeing the hugetlb pages. It is in the process
>> context but has disabled softirqs (see __tcp_close()).
>>
>>        CPU0                    CPU1
>>        ----                    ----
>>   lock(hugetlb_lock);
>>                                local_irq_disable();
>>                                lock(slock-AF_INET);
>>                                lock(hugetlb_lock);
>>   <Interrupt>
>>     lock(slock-AF_INET);
>>
>> So, this deadlock scenario is very much possible.
>
> OK, I see the point now. I was focusing on the IRQ context and hugetlb
> side too much. We do not need to be freeing from there. All it takes is
> to get a dependency chain over a common lock held here. Thanks for
> bearing with me.
>
> Let's see whether we can make hugetlb_lock irq safe.

I may be confused, but it seems like we have a general problem with
calling free_huge_page (as a result of put_page) with interrupts
disabled.

Consider the current free_huge_page code. Today, we drop the lock when
processing gigantic pages because we may need to block on a mutex in cma
code. If our caller has disabled interrupts, then it does not matter
whether the hugetlb lock is irq safe: when we drop it, interrupts will
still be disabled and we cannot block. Right? If correct, then making
hugetlb_lock irq safe would not help. Again, I may be missing something.

Note that we are also considering doing more with the hugetlb lock
dropped in this path in the 'free vmemmap of hugetlb pages' series.

Since we need to do some work that could block in this path, it seems
like we really need to use a workqueue. It is too bad that there is not
an interface to identify all the cases where interrupts are disabled.
-- 
Mike Kravetz