Re: PING: [PATCH] KVM: HWPoison: Fix memory address&size during remap

Eiichi Tsukata <eiichi.tsukata@xxxxxxxxxxx> · Thu, 4 Aug 2022 06:59:42 +0000

Hi 

We’ve also hit this case.

> On May 5, 2022, at 9:32, zhenwei pi <pizhenwei@xxxxxxxxxxxxx> wrote:
> 
> Hi, Paolo
> 
> I would appreciate it if you could review the patch.
> 
> On 4/20/22 14:45, zhenwei pi wrote:
>> qemu exits during reset with log:
>> qemu-system-x86_64: Could not remap addr: 1000@22001000
>> Currently, after an MCE on guest RAM, qemu records only the ram_addr
>> and remaps that address with a fixed size (TARGET_PAGE_SIZE) during reset.
>> In the hugetlbfs scenario, mmap(addr...) needs a page_size-aligned
>> address and the correct size, so an unaligned address makes mmap fail.

As far as I can tell, the SIGBUS sent from memory_failure() due to PR_MCE_KILL_EARLY carries an aligned address
in siginfo, but the SIGBUS sent from kvm_mmu_page_fault() carries an unaligned one. The latter happens only when the
guest touches poisoned pages before they have been remapped. This is uncommon, but it does happen.

FYI, the call path:
       CPU 1/KVM-328915  [005] d..1. 711765.805910: signal_generate: sig=7 errno=0 code=4 comm=CPU 1/KVM pid=328915 grp=0 res=0
       CPU 1/KVM-328915  [005] d..1. 711765.805915: <stack trace>
 => trace_event_raw_event_signal_generate
 => __send_signal
 => do_send_sig_info
 => send_sig_mceerr
 => handle_abnormal_pfn
 => direct_page_fault
 => kvm_mmu_page_fault
 => kvm_arch_vcpu_ioctl_run
 => kvm_vcpu_ioctl
 => __x64_sys_ioctl
 => do_syscall_64

In addition, aligning the length suppresses the following madvise error messages from qemu_ram_setup_dump():

  qemu_madvise: Invalid argument
  madvise doesn't support MADV_DONTDUMP, but dump_guest_core=off specified

Thanks

Eiichi