On Mon, Oct 17, 2022 at 04:42:03PM -0700, Tony Luck wrote: > If the kernel is copying a page as the result of a copy-on-write > fault and runs into an uncorrectable error, Linux will crash because > it does not have recovery code for this case where poison is consumed > by the kernel. > > It is easy to set up a test case. Just inject an error into a private > page, fork(2), and have the child process write to the page. > > I wrapped that neatly into a test at: > > git://git.kernel.org/pub/scm/linux/kernel/git/aegl/ras-tools.git > > just enable ACPI error injection and run: > > # ./einj_mem-uc -f copy-on-write > > [Note this test needs some better reporting for the case where this > patch has been applied and the system does NOT crash] > > Patch below works ... but there are probably many places where it could > fit better into the general "mm" way of doing things. E.g. using the > copy_mc_to_kernel() function does what I need here, but the name doesn't > seem like it is quite right. As for the name "copy_user_highpage_mc", it simply sounds like an mcsafe variant of copy_user_highpage, so I have no objection. Recently similar routine is suggested in https://lore.kernel.org/linux-mm/20221010160142.1087120-2-jiaqiyan@xxxxxxxxxx/ , so maybe we need some coordination between related interfaces. > > Basic idea is very simple ... if the kernel gets a machine check copying > the page, just free up the new page that was going to be the target of > the copy and return VM_FAULT_HWPOISON to the calling stack. > > Slightly-signed-off-by: Tony Luck <tony.luck@xxxxxxxxx> I'm basically supportive to what this patch intends to do. > --- > include/linux/highmem.h | 19 +++++++++++++++++++ > mm/memory.c | 28 ++++++++++++++++++++-------- > 2 files changed, 39 insertions(+), 8 deletions(-) > > diff --git a/include/linux/highmem.h b/include/linux/highmem.h > index e9912da5441b..5967541fbf0e 100644 > --- a/include/linux/highmem.h > +++ b/include/linux/highmem.h > @@ -319,6 +319,25 @@ static inline void copy_user_highpage(struct page *to, struct page *from, > > #endif > > +static inline int copy_user_highpage_mc(struct page *to, struct page *from, > + unsigned long vaddr, struct vm_area_struct *vma) > +{ > + unsigned long ret = 0; > +#ifdef copy_mc_to_kernel > + char *vfrom, *vto; > + > + vfrom = kmap_local_page(from); > + vto = kmap_local_page(to); > + ret = copy_mc_to_kernel(vto, vfrom, PAGE_SIZE); > + kunmap_local(vto); > + kunmap_local(vfrom); > +#else > + copy_user_highpage(to, from, vaddr, vma); > +#endif > + > + return ret; > +} > + > #ifndef __HAVE_ARCH_COPY_HIGHPAGE > > static inline void copy_highpage(struct page *to, struct page *from) > diff --git a/mm/memory.c b/mm/memory.c > index f88c351aecd4..b5e22bf4c10a 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -2848,8 +2848,14 @@ static inline int pte_unmap_same(struct vm_fault *vmf) > return same; > } > > -static inline bool __wp_page_copy_user(struct page *dst, struct page *src, > - struct vm_fault *vmf) > +/* > + * Return: > + * -1 = copy failed due to poison in source page Simply calling "poison" might cause confusion with page poisoning feature, so "hwpoison" might be better. But I know that "poison" is commonly used under arch/x86, and it's not clear to me what to do with this terminology. Maybe using -EHWPOISON instead of -1 might be helpful to the distinction. > + * 0 = copied failed (some other reason) > + * 1 = copied succeeded > + */ > +static inline int __wp_page_copy_user(struct page *dst, struct page *src, > + struct vm_fault *vmf) > { > bool ret; > void *kaddr; ... > @@ -3121,7 +3129,11 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) > if (!new_page) > goto oom; > > - if (!__wp_page_copy_user(new_page, old_page, vmf)) { > + ret = __wp_page_copy_user(new_page, old_page, vmf); > + if (ret == -1) { > + put_page(new_page); Maybe I miss something, but don't you have to care about refcount of old_page in this branch (as done in "ret == 0" branch)? Thanks, Naoya Horiguchi > + return VM_FAULT_HWPOISON; > + } else if (ret == 0) { > /* > * COW failed, if the fault was solved by other, > * it's fine. If not, userspace would re-fault on > -- > 2.37.3