> On Aug 2, 2019, at 3:31 AM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote: > > On 08/01, Song Liu wrote: >> >> >>> On Aug 1, 2019, at 7:50 AM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote: >>> >>> On 07/31, Song Liu wrote: >>>> >>>> +static int khugepaged_add_pte_mapped_thp(struct mm_struct *mm, >>>> + unsigned long addr) >>>> +{ >>>> + struct mm_slot *mm_slot; >>>> + int ret = 0; >>>> + >>>> + /* hold mmap_sem for khugepaged_test_exit() */ >>>> + VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_sem), mm); >>>> + VM_BUG_ON(addr & ~HPAGE_PMD_MASK); >>>> + >>>> + if (unlikely(khugepaged_test_exit(mm))) >>>> + return 0; >>>> + >>>> + if (!test_bit(MMF_VM_HUGEPAGE, &mm->flags) && >>>> + !test_bit(MMF_DISABLE_THP, &mm->flags)) { >>>> + ret = __khugepaged_enter(mm); >>>> + if (ret) >>>> + return ret; >>>> + } >>> >>> could you explain why do we need mm->mmap_sem, khugepaged_test_exit() check >>> and __khugepaged_enter() ? >> >> If the mm doesn't have a mm_slot, we would like to create one here (by >> calling __khugepaged_enter()). > > I can be easily wrong, I never read this code before, but this doesn't > look correct. > > Firstly, mm->mmap_sem cam ONLY help if a) the task already has mm_slot > and b) this mm_slot is khugepaged_scan.mm_slot. Otherwise khugepaged_exit() > won't take mmap_sem for writing and thus we can't rely on test_exit(). > > and this means that down_read(mmap_sem) before khugepaged_add_pte_mapped_thp() > is pointless and can't help; this mm was found by vma_interval_tree_foreach(). > > so __khugepaged_enter() can race with khugepaged_exit() and this is wrong > in any case. > >> This happens when the THP is created by another mm, or by tmpfs with >> "huge=always"; and then page table of this mm got split by split_huge_pmd(). >> With current kernel, this happens when we attach/detach uprobe to a file >> in tmpfs with huge=always. > > Well. In this particular case khugepaged_enter() was likely already called > by shmem_mmap() or khugepaged_enter_vma_merge(), or madvise. > > (in fact I think do_set_pmd() or shmem_fault() should call _enter() too, > like do_huge_pmd_anonymous_page() does, but this is another story). Hmm.. I didn't notice the one in shmem_mmap(). And yes, you are right, we don't really need __khugepaged_enter() in khugepaged_add_pte_mapped_thp(), especially when uprobes.c doesn't call it. This should simplify this patch. > > > And I forgot to mention... I don't understand why > khugepaged_collapse_pte_mapped_thps() has to be called with khugepaged_mm_lock. You are right that khugepaged_collapse_pte_mapped_thps() doesn't need khugepaged_mm_lock in this version. For v1, when uprobes.c calls khugepaged_add_pte_mapped_thp(), this is necessary. Let me clean this up. Thanks, Song