On Sun, Sep 15, 2024 at 04:14:21PM GMT, Dan Carpenter wrote:
> On Sun, Sep 15, 2024 at 01:38:40PM +0100, Lorenzo Stoakes wrote:
> > + get_maintainers.pl people for drivers/misc/sgi-gru/grumain.c
> >
> > On Sun, Sep 15, 2024 at 03:09:35PM GMT, Dan Carpenter wrote:
> > > On Sun, Sep 15, 2024 at 01:01:43PM +0100, Lorenzo Stoakes wrote:
> > > > On Sun, Sep 15, 2024 at 01:08:27PM GMT, Dan Carpenter wrote:
> > > > > Hi Linus,
> > > > >
> > > > > Commit 79a61cc3fc04 ("mm: avoid leaving partial pfn mappings around in
> > > > > error case") from Sep 11, 2024 (linux-next), leads to the following
> > > > > Smatch static checker warning:
> > > > >
> > > > >     mm/memory.c:2709 remap_pfn_range_notrack()
> > > > >     warn: sleeping in atomic context
> > > > >
> > > > > mm/memory.c
> > > > >     2696  int remap_pfn_range_notrack(struct vm_area_struct *vma, unsigned long addr,
> > > > >     2697                  unsigned long pfn, unsigned long size, pgprot_t prot)
> > > > >     2698  {
> > > > >     2699          int error = remap_pfn_range_internal(vma, addr, pfn, size, prot);
> > > > >     2700
> > > > >     2701          if (!error)
> > > > >     2702                  return 0;
> > > > >     2703
> > > > >     2704          /*
> > > > >     2705           * A partial pfn range mapping is dangerous: it does not
> > > > >     2706           * maintain page reference counts, and callers may free
> > > > >     2707           * pages due to the error. So zap it early.
> > > > >     2708           */
> > > > > --> 2709          zap_page_range_single(vma, addr, size, NULL);
> > > > >
> > > > > The lru_add_drain() function at the start of zap_page_range_single() takes a
> > > > > mutext.
> > > >
> > > > Hm does it? I see a local lock, and some folio batch locking which are
> > > > local locks too?
> > >
> > > Ah... No it doesn't. It's the mmu_notifier_invalidate_range_start() which is
> > > a might_sleep() function. Sorry for the confusion.
> >
> > OK so in conclusion it seems to be that Linus's commit introducing
> > zap_page_range_single() accidentally had smatch hit a might_sleep() via
> > mmu_notifier_invalidate_range_start(), but it should, in theory, have fired
> > due to page table allocations invoking the page allocator that might sleep,
> > but didn't, because smatch misses the below might_alloc() path...
> >
> > -> prepare_alloc_pages()
> >    -> might_alloc()
> >       -> might_sleep_if(gfpflags_allow_blocking(gfp_mask))
> >
> > ...as a result of get_zeroed_page() tripping it up *breathes*. :)
> >
> > (please correct me if I am wrong here).
>
> That's an accurate summary... Thanks!
>
> >
> > The preempt_disable() is introduced in commit fe5bb6b00c3a9 ("sgi-gru: misc
> > GRU cleanup") from... 2009, but it fixed it from the far far more broken
> > 'disable preemption before taking a mutex' situation that existed before.
> >
> > So fix seems to me to not invoke remap_pfn_range() with preemption disabled
> > and a mutex held? gru_fault() maintainers added for input...
>
> Every time I get a response to this bug report I feel dumber. How did I not
> see that this was a bug in drivers/misc/sgi-gru/? Here is another one from the
> same driver:
>
> drivers/misc/sgi-gru/grukservices.c:262 gru_get_cpu_resources() warn: sleeping in atomic context

Nothing to feel dumb about, this stuff is confounding by nature, if I had a
penny for every time I felt dumb doing kernel work I'd be very rich by now! ;)

>
> regards,
> dan carpenter

Cheers for report! This means we can now get this thing fixed...