Hi Andrew,

On 8/20/2022 8:45 AM, Andrew Morton wrote:
> On Wed, 10 Aug 2022 14:49:07 +0800 Yin Fengwei <fengwei.yin@xxxxxxxxx> wrote:
>
>> If there is private data attached to THP, the refcount of
>> THP will be increased and block the THP split. Release
>> private data attached to THP before split it to increase
>> the chance of splitting THP successfully.
>>
>> There was a memory failure issue hit during HW error
>> injection testing with 5.18 kernel + xfs as rootfs. Test
>> got killed and system reboot was required to re-run the
>> test.
>>
>> The issue was tracked down to THP split failure caused the
>> memory failure not being handled. The page dump showed:
>>
>> [ 1785.433075] page:0000000025f9530b refcount:18 mapcount:0 mapping:000000008162eea7 index:0xa10 pfn:0x2f0200
>> [ 1785.443954] head:0000000025f9530b order:4 compound_mapcount:0 compound_pincount:0
>> [ 1785.452408] memcg:ff4247f2d28e9000
>> [ 1785.456304] aops:xfs_address_space_operations ino:8555182 dentry name:"baseos-filenames.solvx"
>> [ 1785.466612] flags: 0x1000000000012036(referenced|uptodate|lru|active|private|head|node=0|zone=2)
>> [ 1785.476514] raw: 1000000000012036 ffb9460f8bc07c08 ffb9460f8bc08408 ff4247f22e6299f8
>> [ 1785.485268] raw: 0000000000000a10 ff4247f194ade900 00000012ffffffff ff4247f2d28e9000
>>
>> It was like the error was injected to a large folio for xfs
>> with private data attached.
>>
>> With private data released before split THP, the test case
>> could be run successfully many times without reboot system.
>
> I did a bit of editorial work on the changelog. Please check, Note my
> addition of "attempt to" to the second sentence.
Thanks a lot for the update. Looks good to me.

>
> : If there is private data attached to a THP, the refcount of THP will be
> : increased and will prevent the THP from being split. Attempt to release
> : any private data attached to the THP before attempting the split to
> : increase the chance of splitting successfully.
> :
> : There was a memory failure issue hit during HW error injection testing
> : with 5.18 kernel + xfs as rootfs. The test was killed and a system reboot
> : was required to re-run the test.
> :
> : The issue was tracked down to a THP split failure caused by the memory
> : failure not being handled. The page dump showed:
> :
> : [ 1785.433075] page:0000000025f9530b refcount:18 mapcount:0 mapping:000000008162eea7 index:0xa10 pfn:0x2f0200
> : [ 1785.443954] head:0000000025f9530b order:4 compound_mapcount:0 compound_pincount:0
> : [ 1785.452408] memcg:ff4247f2d28e9000
> : [ 1785.456304] aops:xfs_address_space_operations ino:8555182 dentry name:"baseos-filenames.solvx"
> : [ 1785.466612] flags: 0x1000000000012036(referenced|uptodate|lru|active|private|head|node=0|zone=2)
> : [ 1785.476514] raw: 1000000000012036 ffb9460f8bc07c08 ffb9460f8bc08408 ff4247f22e6299f8
> : [ 1785.485268] raw: 0000000000000a10 ff4247f194ade900 00000012ffffffff ff4247f2d28e9000
> :
> : It was like the error was injected to a large folio for xfs with private
> : data attached.
> :
> : With private data released before splitting the THP, the test case could
> : be run successfully many times without rebooting the system.
>
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>>
>> ...
>>
>> @@ -2635,8 +2637,16 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>>                      goto out;
>>              }
>>
>> -            xas_split_alloc(&xas, head, compound_order(head),
>> -                            mapping_gfp_mask(mapping) & GFP_RECLAIM_MASK);
>> +            gfp = current_gfp_context(mapping_gfp_mask(mapping) &
>> +                                                    GFP_RECLAIM_MASK);
>> +
>> +            if (folio_test_private(folio) &&
>> +                            !filemap_release_folio(folio, gfp)) {
>> +                    ret = -EBUSY;
>> +                    goto out;
>> +            }
>> +
>> +            xas_split_alloc(&xas, head, compound_order(head), gfp);
>
> Because I assume we run into the same problem if
> filemap_release_folio() fails?
Yes. You are right. Thanks.

Regards
Yin, Fengwei
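
P.S. For anyone reading the page dump: the numbers are consistent with a single extra reference held for the private data. Below is a minimal userspace sketch of that arithmetic, not kernel code. It assumes the split check expects a page-cache folio to hold one reference per subpage, one per mapping, and one for the caller, which is my reading of the check in mm/huge_memory.c. Every toy_* name is hypothetical, and toy_release_private() only models the case where filemap_release_folio() succeeds.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_folio {
    int refcount;      /* "refcount:18" in the dump */
    int mapcount;      /* "mapcount:0" in the dump */
    unsigned order;    /* "order:4" in the dump, i.e. 16 subpages */
    bool has_private;  /* the "private" bit in the flags line */
};

/*
 * Assumed rule: one reference per subpage from the page cache,
 * one per mapping, plus one held by the caller of the split.
 */
static int toy_expected_refs(const struct toy_folio *f)
{
    return (1 << f->order) + f->mapcount + 1;
}

/* The split can only proceed when nobody else holds a reference. */
static bool toy_can_split(const struct toy_folio *f)
{
    return f->refcount == toy_expected_refs(f);
}

/*
 * Models a *successful* release of private data: the reference taken
 * when the private data was attached is dropped. The real call can fail,
 * which is why the patch returns -EBUSY in that case.
 */
static bool toy_release_private(struct toy_folio *f)
{
    if (f->has_private) {
        f->has_private = false;
        f->refcount--;
    }
    return true;
}

/* Replays the values from the page dump. */
static void toy_selfcheck(void)
{
    struct toy_folio f = {
        .refcount = 18, .mapcount = 0, .order = 4, .has_private = true,
    };

    assert(toy_expected_refs(&f) == 17);  /* 16 + 0 + 1 */
    assert(!toy_can_split(&f));           /* 18 != 17: split refused */
    assert(toy_release_private(&f));
    assert(toy_can_split(&f));            /* 17 == 17: split may proceed */
}
```

With the dump values, the expected count is 16 + 0 + 1 = 17 while the folio shows 18, so the split is refused until the private reference is dropped.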