On 2022/12/3 1:29, Jiaqi Yan wrote:
On Thu, Dec 1, 2022 at 6:25 PM Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> wrote:
On 2022/12/2 7:09, Andrew Morton wrote:
On Wed, 30 Nov 2022 16:59:30 -0800 Jiaqi Yan <jiaqiyan@xxxxxxxxxx> wrote:
Make __collapse_huge_page_copy return whether copying anonymous pages
succeeded, and make collapse_huge_page handle the return status.
Break the existing PTE scan loop into two for-loops. The first loop copies
source pages into the target huge page, and can fail gracefully when it
runs into a memory error in a source page. If copying all pages succeeds,
the second loop releases and cleans up these normal pages. Otherwise, the
second loop rolls back the page table and page states by:
- re-establishing the original PTEs-to-PMD connection.
- releasing source pages back to their LRU list.
Tested manually:
0. Enable khugepaged on system under test.
1. Start a two-thread application. Each thread allocates a chunk of
non-huge anonymous memory buffer.
2. Pick 4 random buffer locations (2 in each thread) and inject
uncorrectable memory errors at corresponding physical addresses.
3. Signal both threads to make their memory buffer collapsible, i.e. by
calling madvise(MADV_HUGEPAGE).
4. Wait and check kernel log: khugepaged is able to recover from poisoned
pages and skips collapsing them.
5. Signal both threads to inspect their buffer contents and verify there
is no data corruption.
Looks like a nice patchset. I'd like to give it a run in linux-next
but we're at -rc7 and we have no review/ack tags. So it should be a
post-6.2-rc1 thing.
I have a quibble.
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -361,6 +361,27 @@ static inline void copy_highpage(struct page *to, struct page *from)
#endif
+/*
+ * Machine check exception handled version of copy_highpage. Return number
+ * of bytes not copied if there was an exception; otherwise 0 for success.
+ * Note handling #MC requires arch opt-in.
+ */
+static inline int copy_mc_highpage(struct page *to, struct page *from)
+{
+ char *vfrom, *vto;
+ unsigned long ret;
+
+ vfrom = kmap_local_page(from);
+ vto = kmap_local_page(to);
+ ret = copy_mc_to_kernel(vto, vfrom, PAGE_SIZE);
+ if (ret == 0)
+ kmsan_copy_page_meta(to, from);
+ kunmap_local(vto);
+ kunmap_local(vfrom);
+
+ return ret;
+}
Why inlined? It's large, it's slow, it's called only from
khugepaged.c. A regular out-of-line function which is static to
khugepaged.c seems more appropriate?
There is a similar function, copy_mc_user_highpage(); could we reuse it?
See commit a873dfe1032a ("mm, hwpoison: try to recover from copy-on-write
faults").
To Kefeng: As I explained in v7, besides `to` and `from` pages,
copy_mc_user_highpage takes `struct vm_area_struct *vma` and `vaddr`.
While it fits __collapse_huge_page_copy, it doesn't fit well for
collapse_file (needed for the 2nd commit). When Shi Yang reviewed
my patches, we agreed we should take this opportunity to unify the
copying routines in khugepaged.c (for both file-backed and anon
memory), and copy_highpage fits both (alternatively we could use
copy_user_highpage and pass vaddr=NULL and vma=NULL, but I don't
like that). So I chose to make copy_highpage MC-recoverable.
Does this make sense to you?
Yes, thanks for your explanation.
To Andrew: I think it is a reasonable "quibble". I will prepare the
update in v9 while waiting for more reviews on v8, if there are any.