From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>

For VM-based confidential computing (AMD SEV-SNP and Intel TDX), the KVM
guest memfd effort [1] is ongoing.  It allows KVM guests to use a file
descriptor and an offset into it as protected guest memory, without a
user-space virtual address mapping.

Intel TDX uses machine checks to notify the host OS/VMM that the TDX guest
consumed corrupted memory.  KVM/x86 handles machine checks for guest vcpus
specially.  It sets up the guest so that the vcpu exits on a machine check,
checks the exit reason, and manually raises the machine check by calling
do_machine_check().

To test the KVM machine check path, KVM wants to poison memory based on a
file descriptor and an offset.  Current memory poisoning is based either on
the physical address, /sys/kernel/debug/hwpoison/{corrupt-pfn, unpoison-pfn},
or on the virtual address, MADV_HWPOISON and MADV_SOFT_OFFLINE.

Add new flags FADV_HWPOISON and FADV_SOFT_OFFLINE to posix_fadvise(),
following MADV_HWPOISON and MADV_SOFT_OFFLINE.
9893e49d64a4 ("HWPOISON: Add madvise() based injector for hardware poisoned pages v4")

The possible options would be
- Add FADV flags for memory poison.  This patch.
- Introduce an ioctl specific to KVM guest memfd.
- Introduce debugfs entries for KVM guest memfd:
  /sys/kernel/debug/kvm/<pid>-<vm-fd>/guest-memfd<fd>/hwpoison/
  {corrupt-offset, unpoison-offset}.
  This option follows /sys/kernel/debug/hwpoison/{corrupt-pfn, unpoison-pfn}.

[1] https://lore.kernel.org/all/20230914015531.1419405-1-seanjc@xxxxxxxxxx/
    KVM guest_memfd() and per-page attributes
    https://lore.kernel.org/all/20230921203331.3746712-1-seanjc@xxxxxxxxxx/
    [PATCH 00/13] KVM: guest_memfd fixes

Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
---
 include/uapi/linux/fadvise.h |  4 +++
 mm/fadvise.c                 | 58 +++++++++++++++++++++++++++++++++++-
 2 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/fadvise.h b/include/uapi/linux/fadvise.h
index 0862b87434c2..3699b4b0adcd 100644
--- a/include/uapi/linux/fadvise.h
+++ b/include/uapi/linux/fadvise.h
@@ -19,4 +19,8 @@
 #define POSIX_FADV_NOREUSE	5 /* Data will be accessed once.  */
 #endif
 
+/* Same as MADV_HWPOISON and MADV_SOFT_OFFLINE */
+#define FADV_HWPOISON		100	/* poison a page for testing */
+#define FADV_SOFT_OFFLINE	101	/* soft offline page for testing */
+
 #endif	/* FADVISE_H_INCLUDED */
diff --git a/mm/fadvise.c b/mm/fadvise.c
index 6c39d42f16dc..1f028a6e1d90 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -18,11 +18,62 @@
 #include <linux/writeback.h>
 #include <linux/syscalls.h>
 #include <linux/swap.h>
+#include <linux/hugetlb.h>
 
 #include <asm/unistd.h>
 
 #include "internal.h"
 
+static int fadvise_inject_error(struct file *file, struct address_space *mapping,
+				loff_t offset, loff_t endbyte, int advice)
+{
+	pgoff_t start_index, end_index, index, next;
+	struct folio *folio;
+	unsigned int shift;
+	unsigned long pfn;
+	int ret;
+
+	if (!IS_ENABLED(CONFIG_MEMORY_FAILURE))
+		return -EOPNOTSUPP;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (is_file_hugepages(file))
+		shift = huge_page_shift(hstate_file(file));
+	else
+		shift = PAGE_SHIFT;
+	start_index = offset >> shift;
+	end_index = endbyte >> shift;
+
+	index = start_index;
+	while (index <= end_index) {
+		folio = filemap_get_folio(mapping, index);
+		if (IS_ERR(folio))
+			return PTR_ERR(folio);
+
+		next = folio_next_index(folio);
+		pfn = folio_pfn(folio);
+		if (advice == FADV_SOFT_OFFLINE) {
+			pr_info("Soft offlining pfn %#lx at file index %#lx\n",
+				pfn, index);
+			ret = soft_offline_page(pfn, MF_COUNT_INCREASED);
+		} else {
+			pr_info("Injecting memory failure for pfn %#lx at file index %#lx\n",
+				pfn, index);
+			ret = memory_failure(pfn, MF_COUNT_INCREASED | MF_SW_SIMULATED);
+			if (ret == -EOPNOTSUPP)
+				ret = 0;
+		}
+
+		if (ret)
+			return ret;
+		index = next;
+	}
+
+	return 0;
+}
+
 /*
  * POSIX_FADV_WILLNEED could set PG_Referenced, and POSIX_FADV_NOREUSE could
  * deactivate the pages and clear PG_Referenced.
@@ -57,11 +108,13 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 		case POSIX_FADV_NOREUSE:
 		case POSIX_FADV_DONTNEED:
 			/* no bad return value, but ignore advice */
+			return 0;
+		case FADV_HWPOISON:
+		case FADV_SOFT_OFFLINE:
 			break;
 		default:
 			return -EINVAL;
 		}
-		return 0;
 	}
 
 	/*
@@ -170,6 +223,9 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 			}
 		}
 		break;
+	case FADV_HWPOISON:
+	case FADV_SOFT_OFFLINE:
+		return fadvise_inject_error(file, mapping, offset, endbyte, advice);
 	default:
 		return -EINVAL;
 	}
-- 
2.25.1
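
For reference, a minimal user-space sketch of how a test might drive the new advice values, assuming a kernel with this patch applied. The values 100 and 101 are copied from the uapi hunk above and are not in released headers. The helper names fadv_index_range() and fadv_inject() are hypothetical and exist only for this illustration. The index arithmetic mirrors fadvise_inject_error() for a nonzero length.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>

/* Values taken from the uapi hunk in this patch (assumption: patch applied). */
#ifndef FADV_HWPOISON
#define FADV_HWPOISON		100
#endif
#ifndef FADV_SOFT_OFFLINE
#define FADV_SOFT_OFFLINE	101
#endif

/* Hypothetical helper type: first and last file index touched by a request. */
struct fadv_range {
	unsigned long start;
	unsigned long end;	/* inclusive */
};

/*
 * Hypothetical helper: reproduce the index computation done by
 * fadvise_inject_error().  'shift' is PAGE_SHIFT for regular files or the
 * huge page shift for hugetlbfs files.  Only len > 0 is handled here.
 */
static inline struct fadv_range fadv_index_range(off_t offset, off_t len,
						 unsigned int shift)
{
	struct fadv_range r;
	off_t endbyte;

	assert(offset >= 0 && len > 0);
	endbyte = offset + len - 1;	/* inclusive end byte */
	r.start = (unsigned long)(offset >> shift);
	r.end = (unsigned long)(endbyte >> shift);
	return r;
}

/*
 * Hypothetical wrapper: ask the kernel to poison or soft-offline the pages
 * backing [offset, offset + len) of fd.  posix_fadvise() returns 0 or an
 * error number directly.  With this patch the caller needs CAP_SYS_ADMIN
 * and the kernel needs CONFIG_MEMORY_FAILURE.
 */
static inline int fadv_inject(int fd, off_t offset, off_t len, int advice)
{
	return posix_fadvise(fd, offset, len, advice);
}
```

A test could call fadv_inject(fd, 0, 4096, FADV_HWPOISON) on a file descriptor whose first page is populated and then check that the guest or the mapping observes the poisoned page. On a kernel without this patch the call is expected to fail with EINVAL.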