Re: [PATCH v3 0/7] hugetlbfs memory HW error fixes

Hello David,

I've finally tested many page mapping combinations and looked at how the kernel reacts to an error injected on each of them, to see whether mmap() can be used to recover the impacted area. I used the latest upstream kernel I have for that (6.12.0-rc7.master.20241117.ol9.x86_64), but I also got similar results with a kernel not supporting MADV_DONTNEED, for example 5.15.0-301.163.5.2.el9uek.x86_64.


Let's start with mapping a file without modifying the mapped area.
In this case, the process should have a clean page cache page mapped.
If an error is injected on this page, the kernel doesn't even inform the process about the error, as the clean page is simply dropped and replaced on the next access (whether the mapping is shared or not).

The kernel indicates this situation with the following messages:

[10759.371701] Injecting memory failure at pfn 0x10d88e
[10759.374922] Memory failure: 0x10d88e: corrupted page was clean: dropped without side effects
[10759.377525] Memory failure: 0x10d88e: recovery action for clean LRU page: Recovered


Now, when the page content is modified with the standard page size, we need to distinguish between MAP_PRIVATE and MAP_SHARED:

- In the case of a MAP_PRIVATE page, the page is corrupted and the modified data are lost; the kernel uses the SIGBUS mechanism to inform the process if needed. But remapping the area sweeps away the poisoned page and allows the process to use the area again.

- In the case of a MAP_SHARED page, if the content hasn't been synced with the backing file, we also lose the modified data, and the kernel can also raise SIGBUS. Remapping the area recreates the page cache from the on-disk file content, clearing the error.

In both cases, the kernel indicates messages like:
[41589.578750] Injecting memory failure for pfn 0x122105 at process virtual address 0x7f13bad55000
[41589.582237] Memory failure: 0x122105: Sending SIGBUS to testdh:7343 due to hardware memory corruption
[41589.584907] Memory failure: 0x122105: recovery action for dirty LRU page: Recovered
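
To illustrate the remap recovery for standard pages, here is a minimal sketch that maps the same file range again with MAP_FIXED on top of the affected area. The helper names are hypothetical (this is not the QEMU patch code), the range is assumed to be page aligned, and no error is injected: the demo only shows that a MAP_PRIVATE page modified in memory is replaced by the file content after the remap.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not QEMU code): map the file range again on top of
 * [addr, addr + len). MAP_FIXED replaces the existing pages of that range,
 * including a private copy-on-write page, so the next access faults the
 * data back in from the file. addr, len and offset must be page aligned.
 */
static int remap_file_range(void *addr, size_t len, int fd, off_t offset,
                            int prot, int flags)
{
    void *p = mmap(addr, len, prot, flags | MAP_FIXED, fd, offset);
    return p == MAP_FAILED ? -1 : 0;
}

/*
 * Hypothetical demo: modify a MAP_PRIVATE page, remap it and check that
 * the file content is visible again. No memory error is injected here.
 * Returns 0 on success, -1 if a system call fails.
 */
static int private_remap_demo(void)
{
    char path[] = "/tmp/remap-demoXXXXXX";
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);

    if (fd < 0) {
        return -1;
    }
    unlink(path); /* the file stays usable through fd */

    if (ftruncate(fd, (off_t)len) != 0 ||
        pwrite(fd, "on disk", 8, 0) != 8) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memcpy(p, "private", 8); /* creates a private copy-on-write page */
    assert(strcmp(p, "private") == 0);

    if (remap_file_range(p, len, fd, 0, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE) != 0) {
        munmap(p, len);
        close(fd);
        return -1;
    }
    assert(strcmp(p, "on disk") == 0); /* the private copy is gone */

    munmap(p, len);
    close(fd);
    return 0;
}
```

With MAP_SHARED, the same remap maps the page cache again, which only helps once the kernel has removed the poisoned page from it, as observed above for the standard page size.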


Now, in the case of hugetlbfs pages:
With MAP_PRIVATE, the behavior is the same as with the standard page size: mmap() of the underlying file sweeps away the poisoned page. But the MAP_SHARED case is different: mmap() doesn't clear anything, and fallocate() must be used instead.


In both cases, the kernel indicates messages like:
[89141.724295] Injecting memory failure for pfn 0x117800 at process virtual address 0x7fd148800000
[89141.727103] Memory failure: 0x117800: Sending SIGBUS to testdh:9480 due to hardware memory corruption
[89141.729829] Memory failure: 0x117800: recovery action for huge page: Recovered
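
For the shared hugetlbfs case, the fallocate() approach can be sketched as punching a hole over the affected file range (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE), so that the next access faults in a new page instead of the poisoned one. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the demo uses a memfd (tmpfs) file with the standard page size as a stand-in for a hugetlbfs file, and no error is injected. On hugetlbfs, the offset and length should be huge page aligned.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not QEMU code): release the backing pages of a file
 * range while keeping the file size. Processes mapping that range with
 * MAP_SHARED get a new zero-filled page on their next access.
 */
static int punch_file_range(int fd, off_t offset, off_t len)
{
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     offset, len);
}

/*
 * Hypothetical demo on a memfd (stand-in for a hugetlbfs file): write to
 * a MAP_SHARED page, punch a hole over it, and check that the mapping now
 * shows a fresh zeroed page. Returns 0 on success, -1 on a syscall error.
 */
static int shared_punch_demo(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    int fd = memfd_create("punch-demo", MFD_CLOEXEC);

    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memcpy(p, "shared", 7); /* dirty the shared page cache page */

    /* A remap here would map the same page cache page again. */
    if (punch_file_range(fd, 0, (off_t)len) != 0) {
        munmap(p, len);
        close(fd);
        return -1;
    }
    assert(p[0] == '\0'); /* the old page was released */

    munmap(p, len);
    close(fd);
    return 0;
}
```

As with the lost data in the logs above, the previous content of the punched range is not recovered; the process only gets usable (zeroed) memory back.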

Conclusion:
We can't rely on mmap() alone to recover a hugetlbfs area mapped with MAP_SHARED.

So, according to these test results, we should change this part of the qemu_ram_remap() function (in the 2nd patch) to something like:

+                if (ram_block_discard_range(block, offset + block->fd_offset,
+                                            length) != 0) {
+                    /*
+                     * Fall back to using mmap(), but it cannot repair a
+                     * shared hugetlbfs region. In this case we fail.
+                     */
+                    if (block->fd >= 0 && qemu_ram_is_shared(block) &&
+                        (length > TARGET_PAGE_SIZE)) {
+                        error_report("Memory hugetlbfs poison recovery failure addr: "
+                                     RAM_ADDR_FMT "@" RAM_ADDR_FMT "",
+                                     length, addr);
+                        exit(1);
+                    }
+                    qemu_ram_remap_mmap(block, vaddr, page_size, offset);
+                    memory_try_enable_merging(vaddr, size);
+                    qemu_ram_setup_dump(vaddr, size);
                 }

The subsequent patch should be adjusted accordingly.

As a side note about the 3rd patch, I'll also reduce the lp_msg[57] message size to 54 bytes (there is no '0x' prefix on the hexadecimal values, and the message ends with a terminating zero byte).

So if you agree with this v3 proposal (including the above modifications), I can submit a v4 for integration.

Please let me know what you think about that, and whether you see any additional changes we should consider before integration.

Thanks in advance,
William.
