RE: mm/DAMON: Profiling enhancements for DAMON

Resending as a plain-text message.

The profiling enhancements can be applied to Arm64, provided the Arm hardware updates the accessed bits at the higher levels of the page table tree.


From: Prasad, Aravinda 
Sent: Monday, March 4, 2024 11:54 AM
To: 罗午阳 <luowuyang@xxxxxxxxxx>; 'damon@xxxxxxxxxxxxxxx' <damon@xxxxxxxxxxxxxxx>; 'linux-mm@xxxxxxxxx' <linux-mm@xxxxxxxxx>; 'sj@xxxxxxxxxx' <sj@xxxxxxxxxx>; 'linux-kernel@xxxxxxxxxxxxxxx' <linux-kernel@xxxxxxxxxxxxxxx>
Cc: 's2322819@xxxxxxxx' <s2322819@xxxxxxxx>; Kumar, Sandeep4 <sandeep4.kumar@xxxxxxxxx>; Huang, Ying <ying.huang@xxxxxxxxx>; Hansen, Dave <dave.hansen@xxxxxxxxx>; Williams, Dan J <dan.j.williams@xxxxxxxxx>; Subramoney, Sreenivas <sreenivas.subramoney@xxxxxxxxx>; Kervinen, Antti <antti.kervinen@xxxxxxxxx>; Kanevskiy, Alexander <alexander.kanevskiy@xxxxxxxxx>
Subject: RE: mm/DAMON: Profiling enhancements for DAMON

The profiling enhancements can be applied to Arm64, provided the Arm hardware updates the accessed bits at the higher levels of the page table tree.

From: 罗午阳 <luowuyang@xxxxxxxxxx> 
Sent: Monday, March 4, 2024 7:25 AM
To: Prasad, Aravinda <aravinda.prasad@xxxxxxxxx>; 'damon@xxxxxxxxxxxxxxx' <damon@xxxxxxxxxxxxxxx>; 'linux-mm@xxxxxxxxx' <linux-mm@xxxxxxxxx>; 'sj@xxxxxxxxxx' <sj@xxxxxxxxxx>; 'linux-kernel@xxxxxxxxxxxxxxx' <linux-kernel@xxxxxxxxxxxxxxx>
Cc: 's2322819@xxxxxxxx' <s2322819@xxxxxxxx>; Kumar, Sandeep4 <sandeep4.kumar@xxxxxxxxx>; Huang, Ying <ying.huang@xxxxxxxxx>; Hansen, Dave <dave.hansen@xxxxxxxxx>; Williams, Dan J <dan.j.williams@xxxxxxxxx>; Subramoney, Sreenivas <sreenivas.subramoney@xxxxxxxxx>; Kervinen, Antti <antti.kervinen@xxxxxxxxx>; Kanevskiy, Alexander <alexander.kanevskiy@xxxxxxxxx>; 'alan.nair@xxxxxxxxx' <alan.nair@xxxxxxxxx>
Subject: mm/DAMON: Profiling enhancements for DAMON

From: Aravinda Prasad <aravinda.prasad@xxxxxxxxx>
To: damon@xxxxxxxxxxxxxxx,
        linux-mm@xxxxxxxxx,
        sj@xxxxxxxxxx,
        linux-kernel@xxxxxxxxxxxxxxx
Cc: aravinda.prasad@xxxxxxxxx,
        s2322819@xxxxxxxx,
        sandeep4.kumar@xxxxxxxxx,
        ying.huang@xxxxxxxxx,
        dave.hansen@xxxxxxxxx,
        dan.j.williams@xxxxxxxxx,
        sreenivas.subramoney@xxxxxxxxx,
        antti.kervinen@xxxxxxxxx,
        alexander.kanevskiy@xxxxxxxxx,
        Alan Nair <alan.nair@xxxxxxxxx>
Subject: mm/DAMON: Profiling enhancements for DAMON
Date: Fri, 15 Dec 2023 13:16:19 +0530
Message-Id: <20231215074619.173787-1-aravinda.prasad@xxxxxxxxx>

DAMON randomly samples one or more pages in every region and tracks
accesses to them using the accessed bit in the PTE (or the PMD for
2MB pages). When the region size is large (e.g., several GBs), which
is common for large-footprint applications, detecting whether the
region is accessed depends entirely on whether the actively accessed
pages in the region happen to be picked during random sampling. If
such pages are not picked, DAMON fails to identify the region as
accessed. However, increasing the sampling rate or the number of
regions increases the CPU overhead of kdamond.
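
To make this concrete, here is a minimal user-space sketch
(illustrative only, not part of the patch; the 10% hot fraction and
the sample counts are assumptions) that estimates the probability
that at least one random sample lands on the hot data of a region:

#include <stdio.h>
#include <math.h>

/*
 * With a hot fraction h of the region's address range and n
 * independent uniform samples, P(at least one hit) = 1 - (1 - h)^n.
 * DAMON's region splitting/merging changes the odds over time, so
 * this is only a first-order estimate.
 */
int main(void)
{
        double h = 0.10;        /* assumed: 10% of the region is hot */
        int n;

        for (n = 1; n <= 64; n *= 2)
                printf("%2d samples: P(hit) = %.3f\n",
                       n, 1.0 - pow(1.0 - h, n));
        return 0;
}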

This patch proposes profiling different levels of the application’s
page table tree to detect whether a region is accessed or not. It is
based on the observation that, when the accessed bit for a page is
set, the accessed bits at the higher levels of the page table tree
(PMD/PUD/PGD) along the page table walk path are also set. Hence, it
is efficient to check the accessed bits at the higher levels of the
page table tree to detect whether a region is accessed. For example,
if the accessed bit for a PUD entry is set, then one or more pages in
the 1GB range covered by that entry have been accessed, as each PUD
entry maps a 1GB range. Hence, instead of sampling thousands of
4K/2M pages to detect accesses in a large region, sampling at higher
levels of the page table tree is faster and more efficient.
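
As a back-of-the-envelope illustration of why a higher-level check is
cheaper, the sketch below (assuming the usual x86-64 4-level paging
constants; not code from the patch) counts how many entries at each
level cover a region:

#include <stdio.h>

/* x86-64: a PTE maps 4KB, a PMD entry 2MB, a PUD entry 1GB. */
int main(void)
{
        unsigned long long region = 10ULL << 30;  /* assumed: 10GB region */

        printf("4KB pages (PTE): %llu\n", region >> 12);  /* 2621440 */
        printf("2MB maps  (PMD): %llu\n", region >> 21);  /* 5120 */
        printf("1GB maps  (PUD): %llu\n", region >> 30);  /* 10 */
        return 0;
}

A single accessed-bit check at the PUD level can thus stand in for
millions of leaf-level checks when the region spans several gigabytes.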

This patch is based on the 6.6.3 kernel.

TODO: Support 5-level page table tree

Evaluation:

- MASIM benchmark with 1GB, 10GB, 100GB footprint with 10% hot data
  and 5TB with 10GB hot data.
- DAMON: 5ms sampling, 200ms aggregation interval. All other
  parameters are set to their default values.
- DAMON+PTP: Page table profiling applied to DAMON with the above
  parameters.

Profiling efficiency in detecting hot data [1]:

Footprint      1GB     10GB    100GB   5TB
---------------------------------------------
DAMON          >90%    <50%    ~0%     0%
DAMON+PTP      >90%    >90%    >90%    >90%

CPU overheads (in billion cycles) for kdamond:

Footprint      1GB     10GB    100GB   5TB
---------------------------------------------
DAMON          1.15    19.53   3.52    9.55
DAMON+PTP      0.83    3.20    1.27    2.55

A detailed explanation and evaluation can be found in the arXiv paper:
[1] https://arxiv.org/pdf/2311.10275.pdf

Regards,
Aravinda

Signed-off-by: Alan Nair <alan.nair@xxxxxxxxx>
Signed-off-by: Sandeep Kumar <sandeep4.kumar@xxxxxxxxx>
Signed-off-by: Aravinda Prasad <aravinda.prasad@xxxxxxxxx>
---
 arch/x86/include/asm/pgtable.h |   17 +++++
 include/linux/damon.h          |   13 ++++
 include/linux/pgtable.h        |   31 ++++++++
 mm/damon/core.c                |   28 ++++++++
 mm/damon/vaddr.c               |  146 +++++++++++++++++++++++++++++++++++++---
 5 files changed, 223 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index e02b179ec..accdabb95 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -169,6 +169,11 @@ static inline int pud_young(pud_t pud)
       return pud_flags(pud) & _PAGE_ACCESSED;
}

+static inline int pgd_young(pgd_t pgd)
+{
+       return pgd_flags(pgd) & _PAGE_ACCESSED;
+}
+
static inline int pte_write(pte_t pte)
{
       /*
@@ -681,6 +686,18 @@ static inline pud_t pud_mkwrite(pud_t pud)
       return pud_clear_saveddirty(pud);
}

+static inline pgd_t pgd_clear_flags(pgd_t pgd, pgdval_t clear)
+{
+       pgdval_t v = native_pgd_val(pgd);
+
+       return native_make_pgd(v & ~clear);
+}
+
+static inline pgd_t pgd_mkold(pgd_t pgd)
+{
+       return pgd_clear_flags(pgd, _PAGE_ACCESSED);
+}
+
#ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
static inline int pte_soft_dirty(pte_t pte)
{
diff --git a/include/linux/damon.h b/include/linux/damon.h
index c70cca8a8..8521a62ec 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -19,6 +19,14 @@
/* Max priority score for DAMON-based operation schemes */
#define DAMOS_MAX_SCORE              (99)

+/* DAMON profiling levels */
+enum damon_profile_level {
+       PTE_LEVEL,
+       PMD_LEVEL,
+       PUD_LEVEL,
+       PGD_LEVEL,
+};
+
/* Get a random number in [l, r) */
static inline unsigned long damon_rand(unsigned long l, unsigned long r)
{
@@ -57,6 +65,8 @@ struct damon_region {
       unsigned int age;
/* private: Internal value for age calculation. */
       unsigned int last_nr_accesses;
+       /* Page table profiling level */
+       enum damon_profile_level profile_level;
};

 /**
@@ -656,6 +666,9 @@ int damon_stop(struct damon_ctx **ctxs, int nr_ctxs);
int damon_set_region_biggest_system_ram_default(struct damon_target *t,
                              unsigned long *start, unsigned long *end);

+enum damon_profile_level pick_profile_level(unsigned long start,
+              unsigned long end, unsigned long addr);
+
#endif /* CONFIG_DAMON */

 #endif /* _DAMON_H */
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index af7639c3b..82d5f67ea 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -935,6 +935,37 @@ static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
#define flush_tlb_fix_spurious_fault(vma, address, ptep) flush_tlb_page(vma, address)
#endif

+/*
+ * When walking page tables, get the address of the current/passed boundary,
+ * or the start address of the range if that comes earlier.
+ */
+
+#define pgd_addr_start(addr, start)                 \
+({     unsigned long __boundary = (addr) & PGDIR_MASK;       \
+       (__boundary > start) ? __boundary : (start);  \
+})
+
+#ifndef p4d_addr_start
+#define p4d_addr_start(addr, start)                 \
+({     unsigned long __boundary = (addr) & P4D_MASK; \
+       (__boundary > start) ? __boundary : (start);  \
+})
+#endif
+
+#ifndef pud_addr_start
+#define pud_addr_start(addr, start)                 \
+({     unsigned long __boundary = (addr) & PUD_MASK; \
+       (__boundary > start) ? __boundary : (start);  \
+})
+#endif
+
+#ifndef pmd_addr_start
+#define pmd_addr_start(addr, start)                 \
+({     unsigned long __boundary = (addr) & PMD_MASK; \
+       (__boundary > start) ? __boundary : (start);  \
+})
+#endif
+
/*
  * When walking page tables, get the address of the next boundary,
  * or the end address of the range if that comes earlier.  Although no
diff --git a/mm/damon/core.c b/mm/damon/core.c
index fd5be73f6..2a7d5c041 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -31,6 +31,33 @@ static struct damon_operations damon_registered_ops[NR_DAMON_OPS];

 static struct kmem_cache *damon_region_cache __ro_after_init;

+/* Pick the highest possible page table profiling level for the
+ * region corresponding to addr
+ */
+enum damon_profile_level pick_profile_level(unsigned long start,
+       unsigned long end, unsigned long addr)
+{
+       enum damon_profile_level level = PTE_LEVEL;
+
+       if (pmd_addr_start(addr, (start) - 1) < start
+              || pmd_addr_end(addr, (end) + 1) > end)
+              goto out;
+       level = PMD_LEVEL;
+
+       if (pud_addr_start(addr, (start) - 1) < start
+              || pud_addr_end(addr, (end) + 1) > end)
+              goto out;
+       level = PUD_LEVEL;
+
+       if (pgd_addr_start(addr, (start) - 1) < start
+              || pgd_addr_end(addr, (end) + 1) > end)
+              goto out;
+       level = PGD_LEVEL;
+
+out:
+       return level;
+}
+
/* Should be called under damon_ops_lock with id smaller than NR_DAMON_OPS */
static bool __damon_is_registered_ops(enum damon_ops_id id)
{
@@ -132,6 +159,7 @@ struct damon_region *damon_new_region(unsigned long start, unsigned long end)

        region->age = 0;
       region->last_nr_accesses = 0;
+       region->profile_level = PTE_LEVEL;

        return region;
}
diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
index cf8a9fc5c..b71221b3e 100644
--- a/mm/damon/vaddr.c
+++ b/mm/damon/vaddr.c
@@ -387,16 +387,76 @@ static int damon_mkold_hugetlb_entry(pte_t *pte, unsigned long hmask,
#define damon_mkold_hugetlb_entry NULL
#endif /* CONFIG_HUGETLB_PAGE */

-static const struct mm_walk_ops damon_mkold_ops = {
-       .pmd_entry = damon_mkold_pmd_entry,
+static int damon_mkold_pmd(pmd_t *pmd, unsigned long addr,
+       unsigned long next, struct mm_walk *walk)
+{
+       spinlock_t *ptl;
+
+       if (!pmd_present(*pmd) || pmd_none(*pmd))
+              goto out;
+
+       ptl = pmd_lock(walk->mm, pmd);
+       if (pmd_young(*pmd))
+              *pmd = pmd_mkold(*pmd);
+
+       spin_unlock(ptl);
+
+out:
+       return 0;
+}
+
+static int damon_mkold_pud(pud_t *pud, unsigned long addr,
+       unsigned long next, struct mm_walk *walk)
+{
+       spinlock_t *ptl;
+
+       if (!pud_present(*pud) || pud_none(*pud))
+              goto out;
+
+       ptl = pud_lock(walk->mm, pud);
+       if (pud_young(*pud))
+              *pud = pud_mkold(*pud);
+
+       spin_unlock(ptl);
+
+out:
+       return 0;
+}
+
+static int damon_mkold_pgd(pgd_t *pgd, unsigned long addr,
+       unsigned long next, struct mm_walk *walk)
+{
+
+       if (!pgd_present(*pgd) || pgd_none(*pgd))
+              goto out;
+
+       spin_lock(&pgd_lock);
+       if (pgd_young(*pgd))
+              *pgd = pgd_mkold(*pgd);
+
+       spin_unlock(&pgd_lock);
+
+out:
+       return 0;
+}
+
+static const struct mm_walk_ops damon_mkold_ops[] = {
+       {.pmd_entry = damon_mkold_pmd_entry,
       .hugetlb_entry = damon_mkold_hugetlb_entry,
-       .walk_lock = PGWALK_RDLOCK,
+       .walk_lock = PGWALK_RDLOCK},
+       {.pmd_entry = damon_mkold_pmd},
+       {.pud_entry = damon_mkold_pud},
+       {.pgd_entry = damon_mkold_pgd},
};

-static void damon_va_mkold(struct mm_struct *mm, unsigned long addr)
+static void damon_va_mkold(struct mm_struct *mm, struct damon_region *r)
{
+       unsigned long addr = r->sampling_addr;
+
+       r->profile_level = pick_profile_level(r->ar.start, r->ar.end, addr);
+
       mmap_read_lock(mm);
-       walk_page_range(mm, addr, addr + 1, &damon_mkold_ops, NULL);
+       walk_page_range(mm, addr, addr + 1, damon_mkold_ops + r->profile_level, NULL);
       mmap_read_unlock(mm);
}

@@ -409,7 +469,7 @@ static void __damon_va_prepare_access_check(struct mm_struct *mm,
{
       r->sampling_addr = damon_rand(r->ar.start, r->ar.end);

-       damon_va_mkold(mm, r->sampling_addr);
+       damon_va_mkold(mm, r);
}

 static void damon_va_prepare_access_checks(struct damon_ctx *ctx)
@@ -531,22 +591,84 @@ static int damon_young_hugetlb_entry(pte_t *pte, unsigned long hmask,
#define damon_young_hugetlb_entry NULL
#endif /* CONFIG_HUGETLB_PAGE */

-static const struct mm_walk_ops damon_young_ops = {
-       .pmd_entry = damon_young_pmd_entry,
+static int damon_young_pmd(pmd_t *pmd, unsigned long addr,
+              unsigned long next, struct mm_walk *walk)
+{
+       spinlock_t *ptl;
+       struct damon_young_walk_private *priv = walk->private;
+
+       if (!pmd_present(*pmd) || pmd_none(*pmd))
+              goto out;
+
+       ptl = pmd_lock(walk->mm, pmd);
+       if (pmd_young(*pmd) || mmu_notifier_test_young(walk->mm, addr))
+              priv->young = true;
+
+       *priv->folio_sz = (1UL << PMD_SHIFT);
+       spin_unlock(ptl);
+out:
+       return 0;
+}
+
+static int damon_young_pud(pud_t *pud, unsigned long addr,
+              unsigned long next, struct mm_walk *walk)
+{
+       spinlock_t *ptl;
+       struct damon_young_walk_private *priv = walk->private;
+
+       if (!pud_present(*pud) || pud_none(*pud))
+              goto out;
+
+       ptl = pud_lock(walk->mm, pud);
+       if (pud_young(*pud) || mmu_notifier_test_young(walk->mm, addr))
+              priv->young = true;
+
+       *priv->folio_sz = (1UL << PUD_SHIFT);
+
+       spin_unlock(ptl);
+out:
+       return 0;
+}
+
+static int damon_young_pgd(pgd_t *pgd, unsigned long addr,
+              unsigned long next, struct mm_walk *walk)
+{
+       struct damon_young_walk_private *priv = walk->private;
+
+       if (!pgd_present(*pgd) || pgd_none(*pgd))
+              goto out;
+
+       spin_lock(&pgd_lock);
+       if (pgd_young(*pgd) || mmu_notifier_test_young(walk->mm, addr))
+              priv->young = true;
+
+       *priv->folio_sz = (1UL << PGDIR_SHIFT);
+
+       spin_unlock(&pgd_lock);
+out:
+       return 0;
+}
+
+static const struct mm_walk_ops damon_young_ops[] = {
+       {.pmd_entry = damon_young_pmd_entry,
       .hugetlb_entry = damon_young_hugetlb_entry,
-       .walk_lock = PGWALK_RDLOCK,
+       .walk_lock = PGWALK_RDLOCK},
+       {.pmd_entry = damon_young_pmd},
+       {.pud_entry = damon_young_pud},
+       {.pgd_entry = damon_young_pgd},
};

-static bool damon_va_young(struct mm_struct *mm, unsigned long addr,
+static bool damon_va_young(struct mm_struct *mm, struct damon_region *r,
              unsigned long *folio_sz)
{
+       unsigned long addr = r->sampling_addr;
       struct damon_young_walk_private arg = {
               .folio_sz = folio_sz,
              .young = false,
       };

        mmap_read_lock(mm);
-       walk_page_range(mm, addr, addr + 1, &damon_young_ops, &arg);
+       walk_page_range(mm, addr, addr + 1, damon_young_ops + r->profile_level, &arg);
       mmap_read_unlock(mm);
       return arg.young;
}
@@ -572,7 +694,7 @@ static void __damon_va_check_access(struct mm_struct *mm,
              return;
       }

-       last_accessed = damon_va_young(mm, r->sampling_addr, &last_folio_sz);
+       last_accessed = damon_va_young(mm, r, &last_folio_sz);
       if (last_accessed)
              r->nr_accesses++;

Hi Aravinda,

Recently, we have been researching this patch. What do you think of
applying it to Arm64? Do you have any suggestions?

Thanks!