On Wed, Jul 12, 2023 at 12:35 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 12.07.23 01:45, Suren Baghdasaryan wrote:
> > On Tue, Jul 11, 2023 at 3:21 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >>
> >> On Tue, Jul 11, 2023 at 02:55:20PM -0700, Suren Baghdasaryan wrote:
> >>> On Tue, Jul 11, 2023 at 12:21 AM Dan Carpenter <dan.carpenter@xxxxxxxxxx> wrote:
> >>>>
> >>>> Hello Suren Baghdasaryan,
> >>>>
> >>>> The patch 1c71222e5f23: "mm: replace vma->vm_flags direct
> >>>> modifications with modifier calls" from Jan 26, 2023, leads to the
> >>>> following Smatch static checker warning:
> >>>>
> >>>> ./include/linux/mm.h:729 vma_start_write()
> >>>> warn: sleeping in atomic context
> >>>>
> >>>> include/linux/mm.h
> >>>>     722 static inline void vma_start_write(struct vm_area_struct *vma)
> >>>>     723 {
> >>>>     724         int mm_lock_seq;
> >>>>     725
> >>>>     726         if (__is_vma_write_locked(vma, &mm_lock_seq))
> >>>>     727                 return;
> >>>>     728
> >>>> --> 729         down_write(&vma->vm_lock->lock);
> >>>>     730         vma->vm_lock_seq = mm_lock_seq;
> >>>>     731         up_write(&vma->vm_lock->lock);
> >>>>     732 }
> >>>>
> >>>> The call tree is:
> >>>>
> >>>> gru_fault() <- disables preempt
> >>>> -> remap_pfn_range()
> >>>> -> track_pfn_remap()
> >>>> -> remap_pfn_range_notrack()
> >>>> -> vm_flags_set()
> >>>> -> vma_start_write()
> >>>>
> >>>> Before track_pfn_remap() and remap_pfn_range_notrack() would just do |=
> >>>> to set the flags but now they use vm_flags_set() so there is a potential
> >>>> they could sleep.
> >>>
> >>> Hi Dan,
> >>> Thanks for reporting! Looks like the page fault handler is modifying
> >>> the VMA flags, which has to be done under write-locked mmap_lock and I
> >>> don't see that being done here... I wonder if that should be allowed.
> >>> I'm CC'ing some MM folks to check if this is a valid VMA modification
> >>> and should be allowed. Matthew, this might be especially interesting
> >>> for you since gru_fault() handles file-backed page faults AFAIKT.
> >>
> >> I don't run the ->fault handler under RCU, only the ->map_pages()
> >> method. I don't intend to change that.
> >>
> >>> Back to the issue at hand. If such modification should be indeed
> >>> allowed then the simplest fix I think would be to add new
> >>> remap_pfn_range_locked() function to be called from gru_fault() which
> >>> would use __vm_flags_mod() instead of vm_flags_set(). __vm_flags_mod()
> >>> does not lock the VMA, so would not have this issue. If the conclusion
> >>> is that this is a valid scenario then I can post a fix I described.
> >>
> >> I'm not certain, but calling remap_pfn_range() in the fault handler
> >> is definitely weird. It's normally called _instead_ of having a fault
> >> handler. The fault handler usually calls set_pte_at() directly.
> >
> > Hmm. Is it weird enough to be considered invalid or weird but still ok?
> > Also, is it ok to modify VMA flags here without write-locking the
> > mmap_lock (and without write-locking the VMA)? The fault handler is
> > done under read-locked mmap_lock but I thought VMA modifications
> > require stronger locking...
> >
>
> The "easy" fix would be to have something like "remap_pfn_range_prepare"
> that the remap_pfn_range() caller calls during mmap().

Are you suggesting splitting remap_pfn_range() into two stages
(remap_pfn_range_prepare(), then remap_pfn_range())? If so, there are
many places where remap_pfn_range() is called, and IIUC all of them
would need to switch to that two-stage approach (lots of code churn).
In addition, this is an exported function, so many more drivers might
expect the current behavior. My suggestion was to add another function
called something like remap_pfn_range_nolock() that internally uses the
same code, with an additional flag telling it whether to lock the VMA
or not. Such a change should be quite simple and small.

>
> We can let that one set these flags, and then we can later let
> remap_pfn_range() fail if the relevant flags are not set.
>
> It's certainly one of these "always done like that and suboptimal but
> somehow it worked" thingies.
>
> I suspect, because only the first pagefault->remap_pfn_range() will
> actually modify the VMA flags, that this is ok. Otherwise, there usually
> wouldn't be any pagetables and nothing mapped, so who really cares about
> these VMA flags (e.g., GUP cannot pin anything if nothing is mapped).

Is my understanding correct that the reason remap_pfn_range() is
allowed here without write-locking the VMA (or write-locking
mmap_lock) is that the flags are modified only when there are no
pre-existing page tables?
Thanks,
Suren.

>
> --
> Cheers,
>
> David / dhildenb
>
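
To make the remap_pfn_range_nolock() idea discussed above a bit more
concrete, here is a rough user-space sketch. It is not kernel code and not
a proposed patch: every toy_* name, the struct layout, and the flag values
are made up for illustration, and the "lock" is only a counter standing in
for the potentially sleeping down_write() in vma_start_write(). It only
shows the shape of the change: one shared helper, plus a flag that selects
between the locking and the non-locking flag update.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder flag bits; NOT the kernel's VM_* values. */
#define TOY_VM_IO        0x1UL
#define TOY_VM_PFNMAP    0x2UL
#define TOY_REMAP_FLAGS  (TOY_VM_IO | TOY_VM_PFNMAP)

/* Toy stand-in for struct vm_area_struct (hypothetical layout). */
struct toy_vma {
    unsigned long vm_flags;
    int write_lock_count; /* how often the sleeping lock path was taken */
};

/* Models vma_start_write(): the real one may sleep in down_write(). */
static void toy_vma_start_write(struct toy_vma *vma)
{
    vma->write_lock_count++;
}

/* Models vm_flags_set(): write-lock the VMA, then set the flags. */
static void toy_vm_flags_set(struct toy_vma *vma, unsigned long flags)
{
    toy_vma_start_write(vma);
    vma->vm_flags |= flags;
}

/* Models __vm_flags_mod(): update the flags without locking the VMA. */
static void toy_vm_flags_mod(struct toy_vma *vma,
                             unsigned long set, unsigned long clear)
{
    vma->vm_flags = (vma->vm_flags | set) & ~clear;
}

/* Shared body; lock_vma picks the locking or non-locking flag update. */
static int toy_remap_common(struct toy_vma *vma, bool lock_vma)
{
    assert(vma);
    if (lock_vma)
        toy_vm_flags_set(vma, TOY_REMAP_FLAGS);
    else
        toy_vm_flags_mod(vma, TOY_REMAP_FLAGS, 0);
    /* Page table setup would follow here; omitted in this sketch. */
    return 0;
}

/* Existing entry point: behavior unchanged for current callers. */
static int toy_remap_pfn_range(struct toy_vma *vma)
{
    return toy_remap_common(vma, true);
}

/* Hypothetical variant for callers that must not sleep. */
static int toy_remap_pfn_range_nolock(struct toy_vma *vma)
{
    return toy_remap_common(vma, false);
}
```

In this model, existing callers keep calling toy_remap_pfn_range()
unchanged, and only an atomic-context caller such as gru_fault() would
switch to the _nolock variant. Whether skipping the VMA lock is actually
safe there is exactly the open question in this thread; the sketch does
not answer it.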