On Mon, Sep 03, 2018 at 07:56:54AM +0200, Michal Hocko wrote: > On Thu 30-08-18 14:39:44, Jerome Glisse wrote: > > On Thu, Aug 30, 2018 at 11:05:16AM -0700, Mike Kravetz wrote: > > > On 08/30/2018 09:57 AM, Jerome Glisse wrote: > > > > On Thu, Aug 30, 2018 at 06:19:52PM +0200, Michal Hocko wrote: > > > >> On Thu 30-08-18 10:08:25, Jerome Glisse wrote: > > > >>> On Thu, Aug 30, 2018 at 12:56:16PM +0200, Michal Hocko wrote: > > > >>>> On Wed 29-08-18 17:11:07, Jerome Glisse wrote: > > > >>>>> On Wed, Aug 29, 2018 at 08:39:06PM +0200, Michal Hocko wrote: > > > >>>>>> On Wed 29-08-18 14:14:25, Jerome Glisse wrote: > > > >>>>>>> On Wed, Aug 29, 2018 at 10:24:44AM -0700, Mike Kravetz wrote: > > > >>>>>> [...] > > > >>>>>>>> What would be the best mmu notifier interface to use where there are no > > > >>>>>>>> start/end calls? > > > >>>>>>>> Or, is the best solution to add the start/end calls as is done in later > > > >>>>>>>> versions of the code? If that is the suggestion, has there been any change > > > >>>>>>>> in invalidate start/end semantics that we should take into account? > > > >>>>>>> > > > >>>>>>> start/end would be the one to add, 4.4 seems broken in respect to THP > > > >>>>>>> and mmu notification. Another solution is to fix user of mmu notifier, > > > >>>>>>> they were only a handful back then. For instance properly adjust the > > > >>>>>>> address to match first address covered by pmd or pud and passing down > > > >>>>>>> correct page size to mmu_notifier_invalidate_page() would allow to fix > > > >>>>>>> this easily. > > > >>>>>>> > > > >>>>>>> This is ok because user of try_to_unmap_one() replace the pte/pmd/pud > > > >>>>>>> with an invalid one (either poison, migration or swap) inside the > > > >>>>>>> function. So anyone racing would synchronize on those special entry > > > >>>>>>> hence why it is fine to delay mmu_notifier_invalidate_page() to after > > > >>>>>>> dropping the page table lock. > > > >>>>>>> > > > >>>>>>> Adding start/end might the solution with less code churn as you would > > > >>>>>>> only need to change try_to_unmap_one(). > > > >>>>>> > > > >>>>>> What about dependencies? 369ea8242c0fb sounds like it needs work for all > > > >>>>>> notifiers need to be updated as well. > > > >>>>> > > > >>>>> This commit remove mmu_notifier_invalidate_page() hence why everything > > > >>>>> need to be updated. But in 4.4 you can get away with just adding start/ > > > >>>>> end and keep around mmu_notifier_invalidate_page() to minimize disruption. > > > >>>> > > > >>>> OK, this is really interesting. I was really worried to change the > > > >>>> semantic of the mmu notifiers in stable kernels because this is really > > > >>>> a hard to review change and high risk for anybody running those old > > > >>>> kernels. If we can keep the mmu_notifier_invalidate_page and wrap them > > > >>>> into the range scope API then this sounds like the best way forward. > > > >>>> > > > >>>> So just to make sure we are at the same page. Does this sounds goo for > > > >>>> stable 4.4. backport? Mike's hugetlb pmd shared fixup can be applied on > > > >>>> top. What do you think? > > > >>> > > > >>> You need to invalidate outside page table lock so before the call to > > > >>> page_check_address(). For instance like below patch, which also only > > > >>> do the range invalidation for huge page which would avoid too much of > > > >>> a behavior change for user of mmu notifier. > > > >> > > > >> Right. I would rather not make this PageHuge special though. So the > > > >> fixed version should be. > > > > > > > > Why not testing for huge ? Only huge is broken and thus only that > > > > need the extra range invalidation. Doing the double invalidation > > > > for single page is bit overkill. > > > > > > I am a bit confused, and hope this does not add to any confusion by others. > > > > > > IIUC, the patch below does not attempt to 'fix' anything. It is simply > > > there to add the start/end notifiers to the v4.4 version of this routine > > > so that a subsequent patch can use them (with modified ranges) to handle > > > unmapping a shared pmd huge page. That is the mainline fix which started > > > this thread. > > > > > > Since we are only/mostly interested in fixing the shared pmd issue in > > > 4.4, how about just adding the start/end notifiers to the very specific > > > case where pmd sharing is possible? > > > > > > I can see the value in trying to back port dependent patches such as this > > > so that stable releases look more like mainline. However, I am not sure of > > > the value in this case as this patch was part of a larger set changing > > > notifier semantics. > > > > For all intents and purposes this is not a backport of the original > > patch so maybe we should just drop the commit reference and just > > explains that it is there to fix mmu notifier in respect to huge page > > migration. > > > > The original patches fix more than this case because newer featurers > > like THP migration, THP swapping, ... added more cases where things > > would have been wrong. But in 4.4 frame there is only huge tlb fs > > migration. > > And THP migration is still a problem with 4.4 AFAICS. All other cases > simply split the huge page but THP migration keeps it in one piece and > as such it is theoretically broken as you have explained. So I would > stick with what I posted with some more clarifications in the changelog > if you think it is appropriate (suggestions welcome). Reading code there is no THP migration in 4.4 only huge tlb migration. Look at handle_mm_fault which do not know how to handle swap pmd, only the huge tlb fs fault handler knows how to handle those. Hence why i was checking for huge tlb exactly as page_check_address() to only range invalidate for huge tlb fs migration. But i am fine with doing the range invalidation with all. Cheers, Jérôme