On 24.08.22 16:30, Baolin Wang wrote: > > > On 8/24/2022 7:55 PM, David Hildenbrand wrote: >> On 24.08.22 11:41, Baolin Wang wrote: >>> >>> >>> On 8/24/2022 3:31 PM, David Hildenbrand wrote: >>>>>>>> >>>>>>>> IMHO, these follow_huge_xxx() functions are arch-specified at first and >>>>>>>> were moved into the common hugetlb.c by commit 9e5fc74c3025 ("mm: >>>>>>>> hugetlb: Copy general hugetlb code from x86 to mm"), and now there are >>>>>>>> still some arch-specified follow_huge_xxx() definition, for example: >>>>>>>> ia64: follow_huge_addr >>>>>>>> powerpc: follow_huge_pd >>>>>>>> s390: follow_huge_pud >>>>>>>> >>>>>>>> What I mean is that follow_hugetlb_page() is a common and >>>>>>>> not-arch-specified function, is it suitable to change it to be >>>>>>>> arch-specified? >>>>>>>> And thinking more, can we rename follow_hugetlb_page() as >>>>>>>> hugetlb_page_faultin() and simplify it to only handle the page faults of >>>>>>>> hugetlb like the faultin_page() for normal page? That means we can make >>>>>>>> sure only follow_page_mask() can handle hugetlb. >>>>>>>> >>>>>> >>>>>> Something like that might work, but you still have two page table walkers >>>>>> for hugetlb. I like David's idea (if I understand it correctly) of >>>>> >>>>> What I mean is we may change the hugetlb handling like normal page: >>>>> 1) use follow_page_mask() to look up a hugetlb firstly. >>>>> 2) if can not get the hugetlb, then try to page fault by >>>>> hugetlb_page_faultin(). >>>>> 3) if page fault successed, then retry to find hugetlb by >>>>> follow_page_mask(). >>>> >>>> That implies putting more hugetlbfs special code into generic GUP, >>>> turning it even more complicated. But of course, it depends on how the >>>> end result looks like. My gut feeling was that hugetlb is better handled >>>> in follow_hugetlb_page() separately (just like we do with a lot of other >>>> page table walkers). >>> >>> OK, fair enough. >>> >>>>> >>>>> Just a rough thought, and I need more investigation for my idea and >>>>> David's idea. >>>>> >>>>>> using follow_hugetlb_page for both cases. As noted, it will need to be >>>>>> taught how to not trigger faults in the follow_page_mask case. >>>>> >>>>> Anyway, I also agree we need some cleanup, and firstly I think we should >>>>> cleanup these arch-specified follow_huge_xxx() on some architectures >>>>> which are similar with the common ones. I will look into these. >>>> >>>> There was a recent discussion on that, e.g.: >>>> >>>> https://lkml.kernel.org/r/20220818135717.609eef8a@thinkpad >>> >>> Thanks. >>> >>>> >>>>> >>>>> However, considering cleanup may need more investigation and >>>>> refactoring, now I prefer to make these bug-fix patches of this patchset >>>>> into mainline firstly, which are suitable to backport to old version to >>>>> fix potential race issues. Mike and David, how do you think? Could you >>>>> help to review these patches? Thanks. >>>> >>>> Patch #1 certainly add more special code just to handle another hugetlb >>>> corner case (CONT pages), and maybe just making it all use >>>> follow_hugetlb_page() would be even cleaner and less error prone. >>>> >>>> I agree that locking is shaky, but I'm not sure if we really want to >>>> backport this to stable trees: >>>> >>>> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html >>>> >>>> "It must fix a real bug that bothers people (not a, “This could be a >>>> problem...” type thing)." >>>> >>>> >>>> Do we actually have any instance of this being a real (and not a >>>> theoretical) problem? If not, I'd rather clean it all up right away. >>> >>> I think this is a real problem (not theoretical), and easy to write some >>> code to show the issue. For example, suppose thread A is trying to look >>> up a CONT-PTE size hugetlb page under the lock, however antoher thread B >>> can migrate the CONT-PTE hugetlb page at the same time, which will cause >>> thread A to get an incorrect page, if thread A want to do something for >>> this incorrect page, error occurs. >>> >>> Actually we also want to backport these fixes to the distro with old >>> kernel versions to make the hugetlb more stable. Otherwise we must hit >>> these issues sooner or later if the customers use CONT-PTE/PMD hugetlb. >>> >>> Anyway, if you and Mike still think these issues are not important >>> enough to be fixed in the old versions, I can do the cleanup firstly. >>> >> >> [asking myself which follow_page() users actually care about hugetlb, >> and why we need this handling in follow_page at all] >> >> Which follow_page() user do we care about here? Primarily mm/migrate.c >> only I assume? > > Right, mainly affects the move_pages() syscall I think. Yes, I can not > know all of the users of the move_pages() syscall now or in the future > in our data center, but like I said the move_pages() syscall + hugetlb > can be a real potential stability issue. > I wonder if we can get rid of follow_page() completely, there are not too many users. Or alternatively simply make it use general GUP infrastructure more clearly. We'd need something like FOLL_NOFAULT that also covers "absolutely no faults". -- Thanks, David / dhildenb