On Wed, Apr 17, 2019 at 09:15:52AM +0000, Thomas Hellstrom wrote: > On Tue, 2019-04-16 at 10:46 -0400, Jerome Glisse wrote: > > On Sat, Apr 13, 2019 at 08:34:02AM +0000, Thomas Hellstrom wrote: > > > Hi, Jérôme > > > > > > On Fri, 2019-04-12 at 17:07 -0400, Jerome Glisse wrote: > > > > On Fri, Apr 12, 2019 at 04:04:18PM +0000, Thomas Hellstrom wrote: [...] > > > > > -/* > > > > > - * Scan a region of virtual memory, filling in page tables as > > > > > necessary > > > > > - * and calling a provided function on each leaf page table. > > > > > +/** > > > > > + * apply_to_pfn_range - Scan a region of virtual memory, > > > > > calling a > > > > > provided > > > > > + * function on each leaf page table entry > > > > > + * @closure: Details about how to scan and what function to > > > > > apply > > > > > + * @addr: Start virtual address > > > > > + * @size: Size of the region > > > > > + * > > > > > + * If @closure->alloc is set to 1, the function will fill in > > > > > the > > > > > page table > > > > > + * as necessary. Otherwise it will skip non-present parts. > > > > > + * Note: The caller must ensure that the range does not > > > > > contain > > > > > huge pages. > > > > > + * The caller must also assure that the proper mmu_notifier > > > > > functions are > > > > > + * called. Either in the pte leaf function or before and after > > > > > the > > > > > call to > > > > > + * apply_to_pfn_range. > > > > > > > > This is wrong there should be a big FAT warning that this can > > > > only be > > > > use > > > > against mmap of device file. The page table walking above is > > > > broken > > > > for > > > > various thing you might find in any other vma like THP, device > > > > pte, > > > > hugetlbfs, > > > > > > I was figuring since we didn't export the function anymore, the > > > warning > > > and checks could be left to its users, assuming that any other > > > future > > > usage of this function would require mm people audit anyway. But I > > > can > > > of course add that warning also to this function if you still want > > > that? > > > > Yeah more warning are better, people might start using this, i know > > some poeple use unexported symbol and then report bugs while they > > just were doing something illegal. > > > > > > ... > > > > > > > > Also the mmu notifier can not be call from the pfn callback as > > > > that > > > > callback > > > > happens under page table lock (the change_pte notifier callback > > > > is > > > > useless > > > > and not enough). So it _must_ happen around the call to > > > > apply_to_pfn_range > > > > > > In the comments I was having in mind usage of, for example > > > ptep_clear_flush_notify(). But you're the mmu_notifier expert here. > > > Are > > > you saying that function by itself would not be sufficient? > > > In that case, should I just scratch the text mentioning the pte > > > leaf > > > function? > > > > ptep_clear_flush_notify() is useless ... i have posted patches to > > either > > restore it or remove it. In any case you must call mmu notifier range > > and > > they can not happen under lock. You usage looked fine (in the next > > patch) > > but i would rather have a bit of comment here to make sure people are > > also > > aware of that. > > > > While we can hope that people would cc mm when using mm function, it > > is > > not always the case. So i rather be cautious and warn in comment as > > much > > as possible. > > > > OK. Understood. All this actually makes me tend to want to try a bit > harder using a slight modification to the pagewalk code instead. Don't > really want to encourage two parallel code paths doing essentially the > same thing; one good and one bad. > > One thing that confuses me a bit with the pagewalk code is that callers > (for example softdirty) typically call > mmu_notifier_invalidate_range_start() around the pagewalk, but then if > it ends up splitting a pmd, mmu_notifier_invalidate_range is called > again, within the first range. Docs aren't really clear whether that's > permitted or not. Is it? It is mandatory ie you have to call mmu_notifier_invalidate_range() in some cases. This is all documented in mmu_notifier.h see struct mmu_notifier_ops comments and also Documentation/vm/mmu_notifier.rst Roughly anytime you go from one valid pte (pmd/pud/p4d) to another valid pte (pmd/pud/p4d) with a different page then you have to call after clearing pte (pmd/pud/p4d) and before replacing it with its new value. Changing permission on same page ie going from read and write to read only, or read only to read and write, does not require any extra call to mmu_notifier_invalidate_range() The mmu_notifier_invalidate_range() is important for IOMMU with ATS/ PASID as it is when the flush the TLB and remote device TLB. So you must flush those secondary TLB after clearing entry so that it can not race to repopulate the TLB and before setting the new entry so that at no point in time any hardware can wrongly access old page while a new page is just now active. Hopes that clarify it, between if you see any improvement to mmu- notifier doc it would be more than welcome. I try to put comments in enough places that people should see at least one of them but maybe i miss a place where i should have put a comments to point to the doc :) Cheers, Jérôme