Re: v5.18-rc1: migratepages triggers VM_BUG_ON_FOLIO(folio_nr_pages(old) != nr_pages)

On 4 Apr 2022, at 11:18, Naoya Horiguchi wrote:

> On Mon, Apr 04, 2022 at 10:47:20AM -0400, Zi Yan wrote:
>> On 4 Apr 2022, at 10:29, Matthew Wilcox wrote:
>>
>>> On Mon, Apr 04, 2022 at 10:05:00AM -0400, Zi Yan wrote:
>>>> On 4 Apr 2022, at 9:29, Naoya Horiguchi wrote:
>>>>> I found that the below VM_BUG_ON_FOLIO is triggered on v5.18-rc1
>>>>> (and also reproducible with mmotm on 3/31).
>>>>> I have no idea about the bug's mechanism, but it seems not to be
>>>>> shared in LKML yet, so let me just share. config.gz is attached.
>>>>>
>>>>> [   48.206424] page:0000000021452e3a refcount:6 mapcount:0 mapping:000000003aaf5253 index:0x0 pfn:0x14e600
>>>>> [   48.213316] head:0000000021452e3a order:9 compound_mapcount:0 compound_pincount:0
>>>>> [   48.218830] aops:xfs_address_space_operations [xfs] ino:dee dentry name:"libc.so.6"
>>>>> [   48.225098] flags: 0x57ffffc0012027(locked|referenced|uptodate|active|private|head|node=1|zone=2|lastcpupid=0x1fffff)
>>>>> [   48.232792] raw: 0057ffffc0012027 0000000000000000 dead000000000122 ffff8a0dc9a376b8
>>>>> [   48.238464] raw: 0000000000000000 ffff8a0dc6b23d20 00000006ffffffff 0000000000000000
>>>>> [   48.244109] page dumped because: VM_BUG_ON_FOLIO(folio_nr_pages(old) != nr_pages)
>>>>> [   48.249196] ------------[ cut here ]------------
>>>>> [   48.251240] kernel BUG at mm/memcontrol.c:6857!
>>>>> [   48.260535] RIP: 0010:mem_cgroup_migrate+0x217/0x320
>>>>> [   48.286942] Call Trace:
>>>>> [   48.287665]  <TASK>
>>>>> [   48.288255]  iomap_migrate_page+0x64/0x190
>>>>> [   48.289366]  move_to_new_page+0xa3/0x470
>>>>
>>>> Is it because migration code assumes all THPs have order=HPAGE_PMD_ORDER?
>>>> Would the patch below fix the issue?
>
> I briefly confirmed that this bug didn't reproduce with your change,
> thank you very much!
>

Thanks.


Hi Matthew,

I am wondering whether my change is the right fix. Folios with order>0
are still available when CONFIG_TRANSPARENT_HUGEPAGE is not set, right?
In that case PageTransHuge() always returns false and the VM_BUG_ON will
still be triggered, since no code in these paths allocates a destination
folio with order>0.

Maybe the patch below could cover !CONFIG_TRANSPARENT_HUGEPAGE too?

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index a2516d31db6c..6e60b5c4b565 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1209,7 +1209,7 @@ static struct page *new_page(struct page *page, unsigned long start)
                struct page *thp;

                thp = alloc_hugepage_vma(GFP_TRANSHUGE, vma, address,
-                                        HPAGE_PMD_ORDER);
+                                        thp_order(page));
                if (!thp)
                        return NULL;
                prep_transhuge_page(thp);
@@ -1218,8 +1218,8 @@ static struct page *new_page(struct page *page, unsigned long start)
        /*
         * if !vma, alloc_page_vma() will use task or system default policy
         */
-       return alloc_page_vma(GFP_HIGHUSER_MOVABLE | __GFP_RETRY_MAYFAIL,
-                       vma, address);
+       return alloc_pages_vma(GFP_HIGHUSER_MOVABLE | __GFP_RETRY_MAYFAIL,
+                       folio_order(page_folio(page)), vma, address);
 }
 #else

diff --git a/mm/migrate.c b/mm/migrate.c
index de175e2fdba5..b079605854d7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1522,7 +1522,7 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
 {
        struct migration_target_control *mtc;
        gfp_t gfp_mask;
-       unsigned int order = 0;
+       unsigned int order = folio_order(page_folio(page));
        struct page *new_page = NULL;
        int nid;
        int zidx;
@@ -1547,7 +1547,7 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
                 */
                gfp_mask &= ~__GFP_RECLAIM;
                gfp_mask |= GFP_TRANSHUGE;
-               order = HPAGE_PMD_ORDER;
+               order = thp_order(page);
        }
        zidx = zone_idx(page_zone(page));
        if (is_highmem_idx(zidx) || zidx == ZONE_MOVABLE)


>>>
>>> This looks entirely plausible to me!  I do have changes in this area,
>>> but clearly I should have submitted them earlier.  Let's get these fixes
>>> in as they are.
>>>
>>> Is there a test suite that tests page migration?  I usually use xfstests
>>> and it does no page migration at all (at least 'git grep migrate'
>>> finds nothing useful).
>>
>> https://github.com/linux-test-project/ltp has some migrate_pages and move_pages
>> tests. You can run them after installing LTP:
>> sudo ./runltp -f syscalls -s migrate_pages and
>> sudo ./runltp -f syscalls -s move_pages


--
Best Regards,
Yan, Zi
