On 11/15/21 22:03, Baolin Wang wrote:
>
>
> On 2021/11/16 12:21, Andrew Morton wrote:
>> On Sun, 7 Nov 2021 16:57:26 +0800 Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> wrote:
>>
>>> Correct the migration stats for hugetlb with using compound_nr() instead
>>> of thp_nr_pages(),
>>
>> It would be helpful to explain why using thp_nr_pages() was wrong.
>
> Sure. Using thp_nr_pages() to get the number of subpages for a hugetlb
> is incorrect, since the number of subpages in the hugetlb is not always
> HPAGE_PMD_NR.
>

Correct.  However, prior to this patch the return value from
thp_nr_pages() was never used for hugetlb pages, only for THP.  So, as far
as I can see, this did not have any bad side effects before this patch.

>> And to explain the end user visible effects of this bug so we can
>
> Actually not only user visible effects, but also hugetlb migration stats
> in kernel are incorrect. For the end user visible effects, like I
> described in patch 1, the syscall move_pages() can return a non-migrated
> number larger than the number of pages the users tried to migrate, when
> a THP page fails to migrate. This is confusing for users.
>

It looks like hugetlb pages were never taken into account when originally
defining the migration stats.  The documentation (page_migration.rst)
only talks about Normal and THP pages.  It does not mention how hugetlb
pages are counted.

Currently, hugetlb pages count as 'a single page' in the stats
PGMIGRATE_SUCCESS/FAIL.  Correct?  After this change we will increment
these stats by the number of sub-pages.  Correct?

I 'think' this is OK since the behavior is not really defined today.  But,
we are changing user visible output.  Perhaps we should go ahead and
document the hugetlb behavior when making these changes?

-- 
Mike Kravetz
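To make the subpage-count point above concrete, here is a minimal user-space
sketch of the two helpers.  It is not kernel code: struct model_page, the
model_* functions and the MODEL_* constants are hypothetical names made up for
illustration, and the orders assume x86-64 with 4 KiB base pages (order 9 for
a 2 MiB page, order 18 for a 1 GiB hugetlb page).  The model of
thp_nr_pages() assumes it returns HPAGE_PMD_NR for any compound head, which is
the behavior described in the thread.

```c
/* Assumed x86-64 values with 4 KiB base pages; other architectures differ. */
#define MODEL_HPAGE_PMD_ORDER 9
#define MODEL_HPAGE_PMD_NR (1UL << MODEL_HPAGE_PMD_ORDER)

/* Hypothetical stand-in for struct page: only the compound order matters. */
struct model_page {
	unsigned int order; /* 0 = base page, 9 = 2 MiB, 18 = 1 GiB */
};

/* Models thp_nr_pages(): every compound head is treated as PMD-sized. */
static unsigned long model_thp_nr_pages(const struct model_page *page)
{
	return page->order ? MODEL_HPAGE_PMD_NR : 1UL;
}

/* Models compound_nr(): the count follows the page's actual order. */
static unsigned long model_compound_nr(const struct model_page *page)
{
	return 1UL << page->order;
}
```

Under these assumptions the two helpers agree for a base page and for a 2 MiB
page, but for an order-18 page the THP-style helper still reports 512 while
the compound-order helper reports 262144.  That gap is the reason a stat
incremented per sub-page needs the compound-order count for hugetlb.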