On 2023/9/20 22:02, ankita@xxxxxxxxxx wrote:
> From: Ankit Agrawal <ankita@xxxxxxxxxx>
>
> The kernel MM currently does not handle ECC errors / poison on a memory
> region that is not backed by struct pages. If a memory region is mapped
> using remap_pfn_range(), but not added to the kernel, MM will not have
> associated struct pages. Add a new mechanism to handle memory failure
> on such memory.
>
> Make kernel MM expose a function to allow modules managing the device
> memory to register a failure function and the physical address space
> associated with the device memory. MM maintains this information as
> interval tree. The registered memory failure function is used by MM to
> notify the kernel module managing the PFN, so that the module may take
> any required action. The module for example may use the information
> to track the poisoned pages.
>
> In this implementation, kernel MM follows the following sequence similar
> (mostly) to the memory_failure() handler for struct page backed memory:
> 1. memory_failure() is triggered on reception of a poison error. An
> absence of struct page is detected and consequently memory_failure_pfn()
> is executed.
> 2. memory_failure_pfn() call the newly introduced failure handler exposed
> by the module managing the poisoned memory to notify it of the problematic
> PFN.
> 3. memory_failure_pfn() unmaps the stage-2 mapping to the PFN.
> 4. memory_failure_pfn() collects the processes mapped to the PFN.
> 5. memory_failure_pfn() sends SIGBUS (BUS_MCEERR_AO) to all the processes
> mapping the faulty PFN using kill_procs().
> 6. An access to the faulty PFN by an operation in VM at a later point
> is trapped and user_mem_abort() is called.
> 7. The vma ops fault function gets called due to the absence of Stage-2
> mapping. It is expected to return VM_FAULT_HWPOISON on the PFN.
> 8. __gfn_to_pfn_memslot() then returns KVM_PFN_ERR_HWPOISON, which cause
> the poison with SIGBUS (BUS_MCEERR_AR) to be sent to the QEMU process
> through kvm_send_hwpoison_signal().
>
> Signed-off-by: Ankit Agrawal <ankita@xxxxxxxxxx>

Thanks for your patch.

<snip>

> /*
>  * Return values:
>  *   1: the page is dissolved (if needed) and taken off from buddy,
> @@ -422,15 +428,15 @@ static unsigned long dev_pagemap_mapping_shift(struct vm_area_struct *vma,
>  * Schedule a process for later kill.
>  * Uses GFP_ATOMIC allocations to avoid potential recursions in the VM.
>  *
> - * Note: @fsdax_pgoff is used only when @p is a fsdax page and a
> - * filesystem with a memory failure handler has claimed the
> - * memory_failure event. In all other cases, page->index and
> - * page->mapping are sufficient for mapping the page back to its
> + * Notice: @pgoff is used either when @p is a fsdax page or a PFN is not
> + * backed by struct page and a filesystem with a memory failure handler
> + * has claimed the memory_failure event. In all other cases, page->index
> + * and page->mapping are sufficient for mapping the page back to its
>  * corresponding user virtual address.
>  */
> static void __add_to_kill(struct task_struct *tsk, struct page *p,
>                           struct vm_area_struct *vma, struct list_head *to_kill,
> -                         unsigned long ksm_addr, pgoff_t fsdax_pgoff)
> +                         unsigned long ksm_addr, pgoff_t pgoff)
> {
>         struct to_kill *tk;
>
> @@ -440,13 +446,18 @@ static void __add_to_kill(struct task_struct *tsk, struct page *p,
>                 return;
>         }
>
> -       tk->addr = ksm_addr ? ksm_addr : page_address_in_vma(p, vma);
> -       if (is_zone_device_page(p)) {
> -               if (fsdax_pgoff != FSDAX_INVALID_PGOFF)
> -                       tk->addr = vma_pgoff_address(fsdax_pgoff, 1, vma);
> -               tk->size_shift = dev_pagemap_mapping_shift(vma, tk->addr);
> -       } else
> -               tk->size_shift = page_shift(compound_head(p));
> +       if (vma->vm_flags | PFN_MAP) {

if (vma->vm_flags | PFN_MAP)? A bitwise OR against a non-zero constant is
always true, so this branch is always selected?

> +               tk->addr = vma_pgoff_address(pgoff, 1, vma);
> +               tk->size_shift = PAGE_SHIFT;
> +       } else {
> +               tk->addr = ksm_addr ? ksm_addr : page_address_in_vma(p, vma);
> +               if (is_zone_device_page(p)) {
> +                       if (pgoff != FSDAX_INVALID_PGOFF)
> +                               tk->addr = vma_pgoff_address(pgoff, 1, vma);
> +                       tk->size_shift = dev_pagemap_mapping_shift(vma, tk->addr);
> +               } else
> +                       tk->size_shift = page_shift(compound_head(p));
> +       }
>

IIUC, the page passed to __add_to_kill() is NULL in this case. So when
tk->addr == -EFAULT, we will have a problem doing page_to_pfn(p) in the
following pr_info:

        if (tk->addr == -EFAULT) {
                pr_info("Unable to find user space address %lx in %s\n",
                        page_to_pfn(p), tsk->comm);

>         /*
>          * Send SIGKILL if "tk->addr == -EFAULT". Also, as
> @@ -666,8 +677,7 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
>         i_mmap_unlock_read(mapping);
> }
>

<snip>

> /**
>  * memory_failure - Handle memory failure of a page.
>  * @pfn: Page Number of the corrupted page
> @@ -2183,6 +2271,11 @@ int memory_failure(unsigned long pfn, int flags)
>         if (!(flags & MF_SW_SIMULATED))
>                 hw_memory_failure = true;
>
> +       if (!pfn_valid(pfn) && !arch_is_platform_page(PFN_PHYS(pfn))) {

Could it be better to add a helper here to detect PFNs that are not backed
by a struct page?

> +               res = memory_failure_pfn(pfn, flags);
> +               goto unlock_mutex;
> +       }
> +
>         p = pfn_to_online_page(pfn);
>         if (!p) {
>                 res = arch_memory_failure(pfn, flags);
>

Thanks.
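
To make the flag check comment above concrete: if the new branch is only
meant to be taken for VMAs that map raw PFNs without struct pages, I would
have expected a bitwise AND against VM_PFNMAP rather than an OR against
PFN_MAP. A rough, untested sketch of what I assume was intended:

        /*
         * Take the pfn-based path only for VM_PFNMAP VMAs (raw PFN
         * mappings without struct pages); the else branch from the patch
         * would keep handling struct-page backed memory as before.
         */
        if (vma->vm_flags & VM_PFNMAP) {
                tk->addr = vma_pgoff_address(pgoff, 1, vma);
                tk->size_shift = PAGE_SHIFT;
        }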
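
For the pr_info() issue, one possible way out (only a sketch; the extra
pfn argument is hypothetical and would have to be threaded through
__add_to_kill() by its callers) is to avoid dereferencing the NULL page:

        if (tk->addr == -EFAULT) {
                /* 'pfn' is a hypothetical extra parameter with the raw PFN. */
                pr_info("Unable to find user space address %lx in %s\n",
                        p ? page_to_pfn(p) : pfn, tsk->comm);
        }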
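
And on the helper suggestion, I was thinking of something along these lines
(the name is made up, purely illustrative), so the intent in
memory_failure() reads more clearly than the open-coded check:

static inline bool pfn_has_no_struct_page(unsigned long pfn)
{
        return !pfn_valid(pfn) && !arch_is_platform_page(PFN_PHYS(pfn));
}

memory_failure() would then simply do:

        if (pfn_has_no_struct_page(pfn)) {
                res = memory_failure_pfn(pfn, flags);
                goto unlock_mutex;
        }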