On Wed, 6 Apr 2016, Mika Penttila wrote: > On 04/06/2016 12:53 AM, Hugh Dickins wrote: > > +static void shmem_recovery_work(struct work_struct *work) ... > > + > > + if (head) { > > + /* We are resuming work from a previous partial recovery */ > > + if (PageTeam(page)) > > + shr_stats(resume_teamed); > > + else > > + shr_stats(resume_tagged); > > + } else { > > + gfp_t gfp = mapping_gfp_mask(mapping); > > + /* > > + * XXX: Note that with swapin readahead, page_to_nid(page) will > > + * often choose an unsuitable NUMA node: something to fix soon, > > + * but not an immediate blocker. > > + */ > > + head = __alloc_pages_node(page_to_nid(page), > > + gfp | __GFP_NOWARN | __GFP_THISNODE, HPAGE_PMD_ORDER); > > + if (!head) { > > + shr_stats(huge_failed); > > + error = -ENOMEM; > > + goto out; > > + } > > Should this head marked PageTeam? Because in patch 27/31 when given as a hint to shmem_getpage_gfp() : > > hugehint = NULL; > + if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && > + sgp == SGP_TEAM && *pagep) { > + struct page *head; > + > + if (!get_page_unless_zero(*pagep)) { > + error = -ENOENT; > + goto decused; > + } > + page = *pagep; > + lock_page(page); > + head = page - (index & (HPAGE_PMD_NR-1)); > > we fail always because : > + if (!PageTeam(head)) { > + error = -ENOENT; > + goto decused; > + } Great observation, thank you Mika. We don't fail always, because in most cases the page wanted for the head will either be already in memory, or read in from swap, and that SGP_TEAM block in shmem_getpage_gfp() (with the -ENOENT you show) not come into play on it: then shmem_recovery_populate() does its !recovery->exposed_team SetPageTeam(head) and all is well from then on. But I think what you point out means that the current recovery code is incapable of assembling a hugepage if its first page was not already instantiated earlier: not something I'd realized until you showed me. Not a common failing, and would never be the case for an extent which had been mapped huge in the past, but it's certainly not what I'd intended. As to whether the head should be marked PageTeam immediately after the hugepage allocation: I think not, especially because of the swapin case (26/31). Swapin may need to read data from disk into that head page, and I've never had to think about the consequences of having a swap page marked PageTeam. Perhaps it would work out okay, but I'd prefer not to go there. At this moment I'm too tired to think what the right answer will be, and certainly won't be able to commit to any without some testing. So, not as incapacitating as perhaps you thought, and not any danger to people trying out huge tmpfs, but definitely something to be fixed: I'll mull it over in the background and let you know when I'm sure. Thank you again, Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>