Re: [PATCH v3 0/7] hugetlbfs memory HW error fixes


 



On 03.12.24 15:39, William Roche wrote:
On 12/3/24 15:08, David Hildenbrand wrote:
[...]

Let me take a look at your tool below to see if I can find an
explanation of what is happening, because it's weird :)

[...]


At the end of this email, I included the source code of a simplistic
test case that shows that the page is replaced in the case of standard
page size.

The idea of this test is simple:

1/ Create a local FILE with:
# dd if=/dev/zero of=./FILE bs=4k count=2
2+0 records in
2+0 records out
8192 bytes (8.2 kB, 8.0 KiB) copied, 0.000337674 s, 24.3 MB/s

2/ As root run:
# ./poisonedShared4k
Mapping 8192 bytes from file FILE
Reading and writing the first 2 pages content:
Read: Read: Wrote: Initial mem page 0
Wrote: Initial mem page 1
Data pages at 0x7f71a19d6000  physically 0x124fb0000
Data pages at 0x7f71a19d7000  physically 0x128ce4000
Poisoning 4k at 0x7f71a19d6000
Signal 7 received
     code 4        Signal code
     addr 0x7f71a19d6000    Memory location
     si_addr_lsb 12
siglongjmp used
Remapping the poisoned page
Reading and writing the first 2 pages content:
Read: Read: Initial mem page 1
Wrote: Rewrite mem page 0
Wrote: Rewrite mem page 1
Data pages at 0x7f71a19d6000  physically 0x10c367000
Data pages at 0x7f71a19d7000  physically 0x128ce4000


    ---

As we can see, this process:
- maps the FILE,
- tries to read and write the beginning of the first 2 pages
- gives their physical addresses
- poisons the first page with a madvise(MADV_HWPOISON) call
- shows the SIGBUS signal received and recovers from it
- simply remaps the same page from the file
- tries again to read and write the beginning of the first 2 pages
- gives their physical addresses
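The steps above can be sketched roughly as follows. This is not the poisonedShared4k source, only a minimal illustration of the same sequence; the function names (install_sigbus_handler, probe_page, poison_page, remap_page) are made up for this sketch, and poison_page() is assumed to run as root on a kernel built with CONFIG_MEMORY_FAILURE.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Where the SIGBUS handler jumps back to. */
static sigjmp_buf recover_point;

/* SIGBUS handler: abandon the faulting access and return to probe_page(). */
static void on_sigbus(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)info;
    (void)ctx;
    siglongjmp(recover_point, 1);
}

/* Install the handler; returns 0 on success, -1 on error. */
int install_sigbus_handler(void)
{
    struct sigaction sa = { 0 };

    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = on_sigbus;
    return sigaction(SIGBUS, &sa, NULL);
}

/* Read one byte of the page; 0 if readable, -1 if the read raised SIGBUS. */
int probe_page(volatile const char *page)
{
    if (sigsetjmp(recover_point, 1) != 0) {
        return -1;              /* we got here through siglongjmp() */
    }
    (void)page[0];
    return 0;
}

/* Inject a simulated hardware error on one page (privileged operation). */
int poison_page(void *page, size_t page_size)
{
    return madvise(page, page_size, MADV_HWPOISON);
}

/* Map the same file offset again over the same virtual address. */
void *remap_page(void *page, size_t page_size, int fd, off_t offset)
{
    return mmap(page, page_size, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_FIXED, fd, offset);
}
```

A caller would probe both pages, call poison_page() on the first one, probe again (expecting -1), then call remap_page() and probe once more to see whether the mapping is usable again.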



Turns out the code will try to truncate the pagecache page using
mapping->a_ops->error_remove_folio().

That, however, is only implemented on *some* filesystems.

Most prominently, it is not implemented on shmem either.


So if you run your test with shmem (e.g., /tmp/FILE), it doesn't work.

Correct: on tmpfs the test case cannot keep using the memory area and
gets a SIGBUS, whereas it works with xfs.




Using fallocate+MADV_DONTNEED seems to work on shmem.
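For reference, a minimal sketch of what such a fallocate+MADV_DONTNEED discard could look like. The helper name discard_file_range() is invented here and this is not the code of ram_block_discard_range(); it only assumes a Linux filesystem that supports hole punching (shmem does).

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

/*
 * Drop the backing storage and the mapped pages for one range of a
 * file-backed mapping. Returns 0 on success, -1 on failure (errno set).
 */
int discard_file_range(int fd, void *addr, off_t offset, size_t length)
{
    /* Free the pagecache pages/blocks behind the range; file size is kept. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  offset, (off_t)length) != 0) {
        return -1;
    }
    /* Drop what is currently mapped at addr so the next access refaults. */
    return madvise(addr, length, MADV_DONTNEED);
}
```

After a successful call, the next access to the range should fault in a fresh page from the file, which reads as zeroes because of the punched hole.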


Our new QEMU code first tries the fallocate+MADV_DONTNEED procedure
for standard-sized pages (in ram_block_discard_range()) and only falls
back to mmap() if that fails. So maybe my proposal to implement:

+                    /*
+                     * Fall back to using mmap(), but it should not repair a
+                     * shared file memory region. In this case we fail.
+                     */
+                    if (block->fd >= 0 && qemu_ram_is_shared(block)) {
+                        error_report("Shared memory poison recovery failure addr: "
+                                     RAM_ADDR_FMT "@" RAM_ADDR_FMT "",
+                                     length, addr);
+                        exit(1);
+                    }

Could be the right choice.

Right. But then, what about a mmap(MAP_PRIVATE, shmem), where the pagecache page is poisoned and needs an explicit fallocate? :)

It's all tricky. I wonder if we should just say "if it's backed by a file, and we cannot discard, then mmap() can't fix it reliably".

if (block->fd >= 0) {
	...
}
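Spelled out as a standalone decision function, that rule could look roughly like this. All names here (recovery_action, choose_recovery) are hypothetical and only illustrate the policy, they are not QEMU code; discard_ok stands for "the fallocate/MADV_DONTNEED discard succeeded".

```c
#include <stdbool.h>

enum recovery_action {
    RECOVERY_DONE,   /* discard worked, nothing else to do */
    RECOVERY_REMAP,  /* not file-backed: mmap(MAP_FIXED) can replace the page */
    RECOVERY_FAIL    /* file-backed and not discardable: give up */
};

/* fd < 0 means the block is not backed by a file. */
enum recovery_action choose_recovery(int fd, bool discard_ok)
{
    if (discard_ok) {
        return RECOVERY_DONE;
    }
    if (fd >= 0) {
        /* mmap() cannot reliably get rid of a poisoned pagecache page. */
        return RECOVERY_FAIL;
    }
    return RECOVERY_REMAP;
}
```

Note that this treats shared and private file mappings the same way, which is the point of dropping the qemu_ram_is_shared() check.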

After all, we don't even expect the fallocate/MADV_DONTNEED to ever fail :) So I was also wondering if we could get rid of the mmap(MAP_FIXED) completely ... but who knows what older Linux kernels do.

--
Cheers,

David / dhildenb




