On Tue, May 28, 2019 at 04:53:01PM +0800, Hillf Danton wrote:
>
> On Mon, 20 May 2019 12:52:48 +0900 Minchan Kim wrote:
> > +static int madvise_cool_pte_range(pmd_t *pmd, unsigned long addr,
> > +				unsigned long end, struct mm_walk *walk)
> > +{
> > +	pte_t *orig_pte, *pte, ptent;
> > +	spinlock_t *ptl;
> > +	struct page *page;
> > +	struct vm_area_struct *vma = walk->vma;
> > +	unsigned long next;
> > +
> > +	next = pmd_addr_end(addr, end);
> > +	if (pmd_trans_huge(*pmd)) {
> > +		spinlock_t *ptl;
>
> Seems not needed with another ptl declared above.

Will remove it.

> > +
> > +		ptl = pmd_trans_huge_lock(pmd, vma);
> > +		if (!ptl)
> > +			return 0;
> > +
> > +		if (is_huge_zero_pmd(*pmd))
> > +			goto huge_unlock;
> > +
> > +		page = pmd_page(*pmd);
> > +		if (page_mapcount(page) > 1)
> > +			goto huge_unlock;
> > +
> > +		if (next - addr != HPAGE_PMD_SIZE) {
> > +			int err;
>
> Alternately, we deactivate thp only if the address range from userspace
> is sane enough, in order to avoid complex works we have to do here.

Not sure it's a good idea. That's the way we have done in MADV_FREE so want to
be consistent.

> > +
> > +			get_page(page);
> > +			spin_unlock(ptl);
> > +			lock_page(page);
> > +			err = split_huge_page(page);
> > +			unlock_page(page);
> > +			put_page(page);
> > +			if (!err)
> > +				goto regular_page;
> > +			return 0;
> > +		}
> > +
> > +		pmdp_test_and_clear_young(vma, addr, pmd);
> > +		deactivate_page(page);
> > +huge_unlock:
> > +		spin_unlock(ptl);
> > +		return 0;
> > +	}
> > +
> > +	if (pmd_trans_unstable(pmd))
> > +		return 0;
> > +
> > +regular_page:
>
> Take a look at pending signal?

Do you have any reason to see pending signal here? I want to know what's your
requirement so that what's the better place to handle it.

>
> > +	orig_pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> > +	for (pte = orig_pte; addr < end; pte++, addr += PAGE_SIZE) {
>
> s/end/next/ ?

Why do you think it should be next?

> > +		ptent = *pte;
> > +
> > +		if (pte_none(ptent))
> > +			continue;
> > +
> > +		if (!pte_present(ptent))
> > +			continue;
> > +
> > +		page = vm_normal_page(vma, addr, ptent);
> > +		if (!page)
> > +			continue;
> > +
> > +		if (page_mapcount(page) > 1)
> > +			continue;
> > +
> > +		ptep_test_and_clear_young(vma, addr, pte);
> > +		deactivate_page(page);
> > +	}
> > +
> > +	pte_unmap_unlock(orig_pte, ptl);
> > +	cond_resched();
> > +
> > +	return 0;
> > +}
> > +
> > +static long madvise_cool(struct vm_area_struct *vma,
> > +			unsigned long start_addr, unsigned long end_addr)
> > +{
> > +	struct mm_struct *mm = vma->vm_mm;
> > +	struct mmu_gather tlb;
> > +
> > +	if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
> > +		return -EINVAL;
>
> No service in case of VM_IO?

I don't know VM_IO would have regular LRU pages but just follow normal
convention for DONTNEED and FREE.
Do you have anything in your mind?
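
For the s/end/next/ question on the PTE loop bound, a minimal user-space
sketch of the pmd_addr_end() clamping arithmetic may help. It assumes 2 MiB
PMDs (as on x86-64 with 4 KiB pages); PMD_SHIFT_ASSUMED and
pmd_addr_end_model() are hypothetical names made up for illustration, not
kernel symbols. If the page walker has already clamped "end" to the current
PMD before calling the callback (an assumption worth checking against
mm/pagewalk.c), then "next" and "end" coincide and either bound walks the
same PTEs.

```c
#include <assert.h>

/* Assumption: 2 MiB PMDs, as on x86-64 with 4 KiB base pages. */
#define PMD_SHIFT_ASSUMED 21
#define PMD_SIZE_ASSUMED  (1UL << PMD_SHIFT_ASSUMED)
#define PMD_MASK_ASSUMED  (~(PMD_SIZE_ASSUMED - 1))

/*
 * Hypothetical user-space model of pmd_addr_end(): return the first PMD
 * boundary above addr, or end if end comes first.
 */
unsigned long pmd_addr_end_model(unsigned long addr, unsigned long end)
{
	unsigned long boundary = (addr + PMD_SIZE_ASSUMED) & PMD_MASK_ASSUMED;

	assert(addr < end);
	/* The "- 1" keeps the comparison correct if boundary wraps to 0. */
	return (boundary - 1 < end - 1) ? boundary : end;
}
```

With these assumptions, a range that crosses a PMD boundary is cut at the
boundary, while a range that already sits inside one PMD comes back
unchanged, which is the case where "addr < end" and "addr < next" are the
same test.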