On Wed, 14 Jun 2017 22:04:55 -0400 Jerome Glisse <jglisse@xxxxxxxxxx> wrote:

> On Thu, Jun 15, 2017 at 11:41:59AM +1000, Balbir Singh wrote:
> > On Wed, 14 Jun 2017 16:11:43 -0400
> > Jérôme Glisse <jglisse@xxxxxxxxxx> wrote:
> >
> > > HMM pages (private or public device pages) are ZONE_DEVICE pages and
> > > thus need special handling when it comes to lru or refcount. This
> > > patch makes sure that memcontrol properly handles those when it
> > > faces them. Those pages are used like regular pages in a process
> > > address space, either as anonymous pages or as file-backed pages,
> > > so from the memcg point of view we want to handle them like regular
> > > pages, for now at least.
> > >
> > > Signed-off-by: Jérôme Glisse <jglisse@xxxxxxxxxx>
> > > Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> > > Cc: Michal Hocko <mhocko@xxxxxxxxxx>
> > > Cc: Vladimir Davydov <vdavydov.dev@xxxxxxxxx>
> > > Cc: cgroups@xxxxxxxxxxxxxxx
> > > ---
> > >  kernel/memremap.c |  2 ++
> > >  mm/memcontrol.c   | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++-----
> > >  2 files changed, 55 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/kernel/memremap.c b/kernel/memremap.c
> > > index da74775..584984c 100644
> > > --- a/kernel/memremap.c
> > > +++ b/kernel/memremap.c
> > > @@ -479,6 +479,8 @@ void put_zone_device_private_or_public_page(struct page *page)
> > >  		__ClearPageActive(page);
> > >  		__ClearPageWaiters(page);
> > >
> > > +		mem_cgroup_uncharge(page);
> > > +
> >
> > A zone device page could have a mem_cgroup charge if
> >
> > 1. The old page was charged to a cgroup and the new page from
> >    ZONE_DEVICE then gets the charge that we need to drop here
> >
> > And should not be charged if
> >
> > 2. The driver allowed mmap based allocation (these pages are not on
> >    the LRU)
> >
> > Since put_zone_device_private_or_public_page() is called from
> > release_pages(), I think the assumption is that 2 is not a problem?
> > I've not tested the mmap bits yet.

Well, that is one of the big questions.
Do we care about the memory cgroup despite the page not being on the LRU,
and thus not being reclaimable through the usual path?

I believe we do want to keep charging ZONE_DEVICE pages against the memory
cgroup so that userspace limits are enforced. This is important especially
for device private memory when migrating back to system memory due to a
CPU page fault. We do not want the migration back to fail because of a
memory cgroup limit.

Hence I do want to charge ZONE_DEVICE pages just like regular pages. If we
have people that run into OOM because of this then we can start thinking
about how to account those pages slightly differently inside the memory
cgroup.

For now I believe we do want this patch.

Yes, we do need the patch. I was trying to check whether we'll end up
trying to uncharge a page that is not charged -- just double checking.

> > [...]
>
> > > @@ -4610,6 +4637,9 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
> > >  		 */
> > >  		if (page->mem_cgroup == mc.from) {
> > >  			ret = MC_TARGET_PAGE;
> > > +			if (is_device_private_page(page) ||
> > > +			    is_device_public_page(page))
> > > +				ret = MC_TARGET_DEVICE;
> > >  			if (target)
> > >  				target->page = page;
> > >  		}
> > > @@ -4669,6 +4699,11 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd,
> > >
> > >  	ptl = pmd_trans_huge_lock(pmd, vma);
> > >  	if (ptl) {
> > > +		/*
> > > +		 * Note their can not be MC_TARGET_DEVICE for now as we do not
> >
> >                  there
> >
> > > +		 * support transparent huge page with MEMORY_DEVICE_PUBLIC or
> > > +		 * MEMORY_DEVICE_PRIVATE but this might change.
> >
> > I am trying to remind myself why THP and MEMORY_DEVICE_* pages don't
> > work well together today; the driver could allocate a THP-sized set of
> > pages and migrate it. There are patches to do THP migration, not
> > upstream yet. Could you remind me of any other limitations?
>
> No, there is nothing that would be problematic AFAICT. Persistent memory
> already uses huge pages so we should be in the clear.
> But I would rather enable that as a separate patchset altogether, and
> have proper testing specifically for such a scenario.

Agreed.

Balbir Singh