On 9/9/24 1:43 AM, David Hildenbrand wrote:
On 06.09.24 19:59, Yang Shi wrote:
Enable MTE support for hugetlb.
The MTE page flags will be set on the folio only. When copying a
hugetlb folio (for example, on CoW), the tags for all subpages will be
copied when the first subpage is copied.

When freeing a hugetlb folio, the MTE flags will be cleared.
Signed-off-by: Yang Shi <yang@xxxxxxxxxxxxxxxxxxxxxx>
---
arch/arm64/include/asm/hugetlb.h | 15 ++++++-
arch/arm64/include/asm/mman.h | 3 +-
arch/arm64/include/asm/mte.h | 67 ++++++++++++++++++++++++++++++++
arch/arm64/kernel/hibernate.c | 7 ++++
arch/arm64/kernel/mte.c | 25 +++++++++++-
arch/arm64/kvm/guest.c | 16 ++++++--
arch/arm64/kvm/mmu.c | 11 ++++++
arch/arm64/mm/copypage.c | 33 +++++++++++++---
fs/hugetlbfs/inode.c | 2 +-
9 files changed, 166 insertions(+), 13 deletions(-)
v3: * Fixed the build error when !CONFIG_ARM64_MTE.
* Incorporated the comment from David to have hugetlb folio
specific APIs for manipulating the page flags.
* Don't assume the first page is the head page since huge page copy
can start from any subpage.
v2: * Reimplemented the patch to address the comments from Catalin.
* Added test cases (patch #2) per Catalin.
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 293f880865e8..06f621c5cece 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -11,6 +11,7 @@
#define __ASM_HUGETLB_H
#include <asm/cacheflush.h>
+#include <asm/mte.h>
#include <asm/page.h>
#ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
@@ -18,9 +19,21 @@
extern bool arch_hugetlb_migration_supported(struct hstate *h);
#endif
+#ifdef CONFIG_ARM64_MTE
+#define CLEAR_FLAGS (BIT(PG_dcache_clean) | BIT(PG_mte_tagged) | \
+ BIT(PG_mte_lock))
+#else
+#define CLEAR_FLAGS BIT(PG_dcache_clean)
+#endif
+
static inline void arch_clear_hugetlb_flags(struct folio *folio)
{
- clear_bit(PG_dcache_clean, &folio->flags);
+ if (!system_supports_mte()) {
+ clear_bit(PG_dcache_clean, &folio->flags);
+ return;
+ }
+
+ folio->flags &= ~CLEAR_FLAGS;
In contrast to clear_bit(), this is no longer an atomic operation.
Could we have concurrent modifications (locking the folio? MTE?) where
we could mess things up (IOW, is there a reason we don't use
__clear_bit() in the existing code)?
AFAICT, there is no concurrent modification to them.
arch_clear_hugetlb_flags() is called when:
* a hugetlb folio is freed
* a new hugetlb folio is enqueued (not available for allocation yet)
And it is protected by hugetlb_lock; all hugetlb folio allocation is
serialized by the same lock. The PG_mte_* flags can't be set before the
hugetlb folio is allocated, so I don't think an atomic operation is
actually necessary. Not sure whether I'm missing something.
But all arches use clear_bit(). As you suggested below, it may be safer
to start with clear_bit().
Maybe start with:
static inline void arch_clear_hugetlb_flags(struct folio *folio)
{
	clear_bit(PG_dcache_clean, &folio->flags);
#ifdef CONFIG_ARM64_MTE
	if (system_supports_mte()) {
		clear_bit(PG_mte_tagged, &folio->flags);
		clear_bit(PG_mte_lock, &folio->flags);
	}
#endif
}
And if you can argue that atomics are not required, convert all to
__clear_bit() and have the compiler optimize it for you.
}
#define arch_clear_hugetlb_flags arch_clear_hugetlb_flags
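FWIW, if the hugetlb_lock argument above holds, the __clear_bit() variant
you suggest could look roughly like this (untested sketch, not what the
current patch does):

static inline void arch_clear_hugetlb_flags(struct folio *folio)
{
	/* Callers serialize via hugetlb_lock, so non-atomic bitops suffice. */
	__clear_bit(PG_dcache_clean, &folio->flags);

#ifdef CONFIG_ARM64_MTE
	if (system_supports_mte()) {
		__clear_bit(PG_mte_tagged, &folio->flags);
		__clear_bit(PG_mte_lock, &folio->flags);
	}
#endif
}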
diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index 0f84518632b4..cec9fb6fec3b 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -41,6 +41,8 @@ void mte_free_tag_storage(char *storage);
static inline void set_page_mte_tagged(struct page *page)
{
+ VM_WARN_ON_ONCE(folio_test_hugetlb(page_folio(page)));
+
/*
* Ensure that the tags written prior to this function are visible
* before the page flags update.
@@ -51,6 +53,8 @@ static inline void set_page_mte_tagged(struct page *page)
static inline bool page_mte_tagged(struct page *page)
{
+ VM_WARN_ON_ONCE(folio_test_hugetlb(page_folio(page)));
+
bool ret = test_bit(PG_mte_tagged, &page->flags);
/*
@@ -76,6 +80,8 @@ static inline bool page_mte_tagged(struct page *page)
*/
static inline bool try_page_mte_tagging(struct page *page)
{
+ VM_WARN_ON_ONCE(folio_test_hugetlb(page_folio(page)));
+
if (!test_and_set_bit(PG_mte_lock, &page->flags))
return true;
[...]
+static inline void set_folio_hugetlb_mte_tagged(struct folio *folio)
+{
+}
+
+static inline bool folio_hugetlb_mte_tagged(struct folio *folio)
+{
+ return false;
+}
+
+static inline bool try_folio_hugetlb_mte_tagging(struct folio *folio)
+{
+ return false;
+}
I would suggest sticking to the format of our folio_test_* /
folio_set_* ... functions. Please refer to
folio_set_hugetlb_migratable / folio_set_hugetlb_hwpoison / ...
Something like:
folio_test_hugetlb_mte_tagged
folio_set_hugetlb_mte_tagged
I don't have a strong opinion on these two. They are straightforward
enough, and users can easily map them to the page_* variants.
But the semantics of try_folio_hugetlb_mte_tagging() are a bit less
obvious. I would suggest
folio_test_and_set_hugetlb_mte_lock()
This one is a little bit hard to map to try_page_mte_tagging() TBH.
We should probably clean up the page_* variants separately.
Probably. The most confusing one is actually try_page_mte_tagging(). We
should probably change it to test_and_set_page_mte_lock(). But anyway
they are only used by arm64, so I don't worry too much and I'd like to
hear from the arm64 maintainers. (A rough sketch of what the renamed
helpers could look like follows the quoted hunk below.)
But ARM maintainers can feel free to intervene here.
+#endif
+
static inline void mte_disable_tco_entry(struct task_struct *task)
{
if (!system_supports_mte())
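For reference, with the naming suggested above, the set/test pair would
just mirror the existing page_* variants, e.g. (rough sketch only; the
VM_WARN_ON_ONCE and the barriers are copied from the page_* helpers, and
the final names are up to the arm64 maintainers):

static inline void folio_set_hugetlb_mte_tagged(struct folio *folio)
{
	VM_WARN_ON_ONCE(!folio_test_hugetlb(folio));

	/*
	 * Ensure that the tags written prior to this function are visible
	 * before the folio flags update.
	 */
	smp_wmb();
	set_bit(PG_mte_tagged, &folio->flags);
}

static inline bool folio_test_hugetlb_mte_tagged(struct folio *folio)
{
	bool ret = test_bit(PG_mte_tagged, &folio->flags);

	VM_WARN_ON_ONCE(!folio_test_hugetlb(folio));

	/*
	 * If the folio is tagged, ensure ordering with a likely subsequent
	 * read of the tags.
	 */
	if (ret)
		smp_rmb();
	return ret;
}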
diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index 02870beb271e..ebf81fffa79d 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -266,10 +266,17 @@ static int swsusp_mte_save_tags(void)
max_zone_pfn = zone_end_pfn(zone);
for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++) {
struct page *page = pfn_to_online_page(pfn);
+ struct folio *folio;
if (!page)
continue;
Nit: I would drop this empty line.
OK
+ folio = page_folio(page);
+
+ if (folio_test_hugetlb(folio) &&
+ !folio_hugetlb_mte_tagged(folio))
+ continue;
+
if (!page_mte_tagged(page))
continue;
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index 6174671be7c1..c8b13bf36fc6 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -38,7 +38,22 @@ EXPORT_SYMBOL_GPL(mte_async_or_asymm_mode);
void mte_sync_tags(pte_t pte, unsigned int nr_pages)
{
struct page *page = pte_page(pte);
- unsigned int i;
+ struct folio *folio = page_folio(page);
+ unsigned long i;
+
+ if (folio_test_hugetlb(folio)) {
+ unsigned long nr = folio_nr_pages(folio);
Nit: empty line please.
OK
+ /* Hugetlb MTE flags are set for head page only */
+ if (try_folio_hugetlb_mte_tagging(folio)) {
+ for (i = 0; i < nr; i++, page++)
+ mte_clear_page_tags(page_address(page));
+ set_folio_hugetlb_mte_tagged(folio);
+ }
+
+ smp_wmb();
We already do have one in set_folio_hugetlb_mte_tagged() [and
try_folio_hugetlb_mte_tagging() does some magic as well], so do we
really need this smp_wmb()?
In general, I think checkpatch will tell you to document memory
barriers and their counterparts thoroughly.
It is not there to guarantee the PG_mte_tagged flag is visible; it is
there to guarantee the tags are visible before the PTE is set. The
non-hugetlb path has:
/* ensure the tags are visible before the PTE is set */
smp_wmb();
I should keep the comment for the hugetlb path as well, something like
the sketch after the quoted hunk below.
+
+ return;
+ }
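Concretely, the hugetlb branch would then carry the same comment as the
non-hugetlb path, something like (sketch with the helper names used in
this version):

	if (folio_test_hugetlb(folio)) {
		unsigned long nr = folio_nr_pages(folio);

		/* Hugetlb MTE flags are set on the head page only */
		if (try_folio_hugetlb_mte_tagging(folio)) {
			for (i = 0; i < nr; i++, page++)
				mte_clear_page_tags(page_address(page));
			set_folio_hugetlb_mte_tagged(folio);
		}

		/* ensure the tags are visible before the PTE is set */
		smp_wmb();

		return;
	}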