On Mon, Jun 7, 2021 at 11:55 AM Michal Hocko <mhocko@xxxxxxxx> wrote: > > On Mon 07-06-21 10:00:01, Yang Shi wrote: > > On Sun, Jun 6, 2021 at 11:21 PM Michal Hocko <mhocko@xxxxxxxx> wrote: > > > > > > On Fri 04-06-21 13:35:13, Yang Shi wrote: > > > > When trying to migrate pages to obey mempolicy, the huge zero page is > > > > split then the page table walk at PTE level just skips zero page. So it > > > > seems pointless to split huge zero page, it could be just skipped like > > > > base zero page. > > > > > > My THP knowledge is not the best but this is incorrect AIACS. Huge zero > > > page is not split. We do split the pmd which is mapping the said page. I > > > suspect you refer to vm_normal_page when talking about a zero page but > > > please be aware that huge zero page is not a normal zero page. It is > > > allocated dynamically (see get_huge_zero_page). > > > > For a normal huge page, yes, split_huge_pmd() just splits pmd. But > > actually the base zero pfn will be inserted to PTEs when splitting > > huge zero pmd. Please check __split_huge_zero_page_pmd() out. > > My bad. I didn't have a look all the way down there. The naming > suggested that this is purely page table operations and I have suspected > that ptes just point to the offset of the THP. > > But I am obviously wrong here. Sorry about that. > > > I should make this point clearer in the commit log. Sorry for the confusion. > > > > > > > > So in the end you patch disables mbind of zero pages to a target node > > > and that is a regression. > > > > Do we really migrate zero page? IIUC zero page is just skipped by > > vm_normal_page() check in queue_pages_pte_range(), isn't it? > > Yeah, normal zero pages are skipped indeed. I haven't studied why this > is the case yet. It surely sounds a bit suspicious because this is an > explicit request to migrate memory and if the zero page is misplaced it > should be moved. On the hand this would increase RSS so maybe this is > the point. The zero page is a global shared page, I don't think "misplace" applies to it. It doesn't make too much sense to migrate a shared page. Actually there is page mapcount check in migrate_page_add() to skip shared normal pages as well. > > > > Have you tested the patch? > > > > No, just build test. I thought this change was straightforward. > > > > > > > > > Set ACTION_CONTINUE to prevent the walk_page_range() split the pmd for > > > > this case. > > > > > > Btw. this changelog is missing a problem statement. I suspect there is > > > no actual problem that it should fix and it is likely driven by reading > > > the code. Right? > > > > The actual problem is it is pointless to split a huge zero pmd. Yes, > > it is driven by visual inspection. > > Is there any actual workload that cares? This is quite a subtle area so > I would be careful to do changes just because... I'm not sure whether there is measurable improvement for actual workloads, but I believe this change does eliminate some unnecessary work. I think the test shown in the previous email gives us some confidence that the change doesn't have regression. > -- > Michal Hocko > SUSE Labs