On Thu, Dec 1, 2022 at 6:25 PM Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> wrote: > > > On 2022/12/2 7:09, Andrew Morton wrote: > > On Wed, 30 Nov 2022 16:59:30 -0800 Jiaqi Yan <jiaqiyan@xxxxxxxxxx> wrote: > > > >> Make __collapse_huge_page_copy return whether copying anonymous pages > >> succeeded, and make collapse_huge_page handle the return status. > >> > >> Break existing PTE scan loop into two for-loops. The first loop copies > >> source pages into target huge page, and can fail gracefully when running > >> into memory errors in source pages. If copying all pages succeeds, the > >> second loop releases and clears up these normal pages. Otherwise, the > >> second loop rolls back the page table and page states by: > >> - re-establishing the original PTEs-to-PMD connection. > >> - releasing source pages back to their LRU list. > >> > >> Tested manually: > >> 0. Enable khugepaged on system under test. > >> 1. Start a two-thread application. Each thread allocates a chunk of > >> non-huge anonymous memory buffer. > >> 2. Pick 4 random buffer locations (2 in each thread) and inject > >> uncorrectable memory errors at corresponding physical addresses. > >> 3. Signal both threads to make their memory buffer collapsible, i.e. > >> calling madvise(MADV_HUGEPAGE). > >> 4. Wait and check kernel log: khugepaged is able to recover from poisoned > >> pages and skips collapsing them. > >> 5. Signal both threads to inspect their buffer contents and make sure no > >> data corruption. > > Looks like a nice patchset. I'd like to give it a run in linux-next > > but we're at -rc7 and we have no review/ack tags. So it should be a > > post-6.2-rc1 thing. > > > > I have a quibble. > > > >> --- a/include/linux/highmem.h > >> +++ b/include/linux/highmem.h > >> @@ -361,6 +361,27 @@ static inline void copy_highpage(struct page *to, struct page *from) > >> > >> #endif > >> > >> +/* > >> + * Machine check exception handled version of copy_highpage. Return number > >> + * of bytes not copied if there was an exception; otherwise 0 for success. > >> + * Note handling #MC requires arch opt-in. > >> + */ > >> +static inline int copy_mc_highpage(struct page *to, struct page *from) > >> +{ > >> + char *vfrom, *vto; > >> + unsigned long ret; > >> + > >> + vfrom = kmap_local_page(from); > >> + vto = kmap_local_page(to); > >> + ret = copy_mc_to_kernel(vto, vfrom, PAGE_SIZE); > >> + if (ret == 0) > >> + kmsan_copy_page_meta(to, from); > >> + kunmap_local(vto); > >> + kunmap_local(vfrom); > >> + > >> + return ret; > >> +} > > Why inlined? It's large, it's slow, it's called only from > > khugepaged.c. A regular out-of-line function which is static to > > khugepaged.c seems more appropriate? > > There is a similar function copy_mc_user_highpage(), could we reuse > it , see commit a873dfe1032a mm, hwpoison: try to recover from copy-on > write faults > > To Kefeng: As I explained in v7, besides `to` and `from` pages, copy_mc_user_highpage takes `struct vm_area_struct *vma` and `vaddr`. While it fits __collapse_huge_page_copy, it doesn't really fit well for collapse_file (needed for the 2nd commit). When Shi Yang reviewed my patches, we agreed that we should borrow this opportunity to unify the copying routines in khugepaged.c (for both file-backed and anon memory), and copy_highpage fits both (alternatively we can use copy_user_highpage and passing vaddr=null and vma=null, but I don't like that). So I choose to make copy_highpage to be MC recoverable. Does this make sense to you? To Andrew: I think it is a reasonable "quibble". I will prepare the update in v9 while waiting for more reviews on v8 if there is.