On Wed, 30 Nov 2022 16:59:30 -0800 Jiaqi Yan <jiaqiyan@xxxxxxxxxx> wrote:

> Make __collapse_huge_page_copy return whether copying anonymous pages
> succeeded, and make collapse_huge_page handle the return status.
>
> Break existing PTE scan loop into two for-loops. The first loop copies
> source pages into target huge page, and can fail gracefully when running
> into memory errors in source pages. If copying all pages succeeds, the
> second loop releases and clears up these normal pages. Otherwise, the
> second loop rolls back the page table and page states by:
> - re-establishing the original PTEs-to-PMD connection.
> - releasing source pages back to their LRU list.
>
> Tested manually:
> 0. Enable khugepaged on system under test.
> 1. Start a two-thread application. Each thread allocates a chunk of
>    non-huge anonymous memory buffer.
> 2. Pick 4 random buffer locations (2 in each thread) and inject
>    uncorrectable memory errors at corresponding physical addresses.
> 3. Signal both threads to make their memory buffer collapsible, i.e.
>    calling madvise(MADV_HUGEPAGE).
> 4. Wait and check kernel log: khugepaged is able to recover from poisoned
>    pages and skips collapsing them.
> 5. Signal both threads to inspect their buffer contents and make sure no
>    data corruption.

Looks like a nice patchset. I'd like to give it a run in linux-next, but
we're at -rc7 and we have no review/ack tags. So it should be a
post-6.2-rc1 thing.

I have a quibble.

> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -361,6 +361,27 @@ static inline void copy_highpage(struct page *to, struct page *from)
>
>  #endif
>
> +/*
> + * Machine check exception handled version of copy_highpage. Return number
> + * of bytes not copied if there was an exception; otherwise 0 for success.
> + * Note handling #MC requires arch opt-in.
> + */
> +static inline int copy_mc_highpage(struct page *to, struct page *from)
> +{
> +	char *vfrom, *vto;
> +	unsigned long ret;
> +
> +	vfrom = kmap_local_page(from);
> +	vto = kmap_local_page(to);
> +	ret = copy_mc_to_kernel(vto, vfrom, PAGE_SIZE);
> +	if (ret == 0)
> +		kmsan_copy_page_meta(to, from);
> +	kunmap_local(vto);
> +	kunmap_local(vfrom);
> +
> +	return ret;
> +}

Why inlined? It's large, it's slow, and it's called only from khugepaged.c.
A regular out-of-line function which is static to khugepaged.c seems
more appropriate?
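
Something like the below, perhaps (untested sketch, just moving the quoted
body out of highmem.h and dropping the `inline`; if later callers appear
outside khugepaged.c this would need revisiting):

	/* mm/khugepaged.c */

	/*
	 * Machine check exception handled version of copy_highpage().
	 * Returns the number of bytes not copied if there was an exception,
	 * otherwise 0.  Note handling #MC requires arch opt-in.
	 */
	static int copy_mc_highpage(struct page *to, struct page *from)
	{
		char *vfrom, *vto;
		unsigned long ret;

		vfrom = kmap_local_page(from);
		vto = kmap_local_page(to);
		ret = copy_mc_to_kernel(vto, vfrom, PAGE_SIZE);
		if (ret == 0)
			kmsan_copy_page_meta(to, from);
		kunmap_local(vto);
		kunmap_local(vfrom);

		return ret;
	}

That also avoids growing highmem.h with a helper that has a single user.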