On 2022/12/2 7:09, Andrew Morton wrote:
On Wed, 30 Nov 2022 16:59:30 -0800 Jiaqi Yan <jiaqiyan@xxxxxxxxxx> wrote:
Make __collapse_huge_page_copy return whether copying anonymous pages
succeeded, and make collapse_huge_page handle the return status.
Break existing PTE scan loop into two for-loops. The first loop copies
source pages into target huge page, and can fail gracefully when running
into memory errors in source pages. If copying all pages succeeds, the
second loop releases and clears up these normal pages. Otherwise, the
second loop rolls back the page table and page states by:
- re-establishing the original PTEs-to-PMD connection.
- releasing source pages back to their LRU list.
Tested manually:
0. Enable khugepaged on system under test.
1. Start a two-thread application. Each thread allocates a chunk of
non-huge anonymous memory buffer.
2. Pick 4 random buffer locations (2 in each thread) and inject
uncorrectable memory errors at corresponding physical addresses.
3. Signal both threads to make their memory buffer collapsible, i.e.
calling madvise(MADV_HUGEPAGE).
4. Wait and check kernel log: khugepaged is able to recover from poisoned
pages and skips collapsing them.
5. Signal both threads to inspect their buffer contents and make sure no
data corruption.
Looks like a nice patchset. I'd like to give it a run in linux-next
but we're at -rc7 and we have no review/ack tags. So it should be a
post-6.2-rc1 thing.
I have a quibble.
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -361,6 +361,27 @@ static inline void copy_highpage(struct page *to, struct page *from)
#endif
+/*
+ * Machine check exception handled version of copy_highpage. Return number
+ * of bytes not copied if there was an exception; otherwise 0 for success.
+ * Note handling #MC requires arch opt-in.
+ */
+static inline int copy_mc_highpage(struct page *to, struct page *from)
+{
+ char *vfrom, *vto;
+ unsigned long ret;
+
+ vfrom = kmap_local_page(from);
+ vto = kmap_local_page(to);
+ ret = copy_mc_to_kernel(vto, vfrom, PAGE_SIZE);
+ if (ret == 0)
+ kmsan_copy_page_meta(to, from);
+ kunmap_local(vto);
+ kunmap_local(vfrom);
+
+ return ret;
+}
Why inlined? It's large, it's slow, it's called only from
khugepaged.c. A regular out-of-line function which is static to
khugepaged.c seems more appropriate?
There is a similar function copy_mc_user_highpage(), could we reuse
it , see commit a873dfe1032a mm, hwpoison: try to recover from copy-on
write faults
.