On 2018/1/3 21:31, Igor Mammedov wrote:
> On Wed, 3 Jan 2018 10:21:06 +0800
> gengdongjiu <gengdongjiu@xxxxxxxxxx> wrote:
>
> [...]
>>>
>>>> In order to simulation, we hard code the error
>>>> type to Multi-bit ECC.
>>> Not sure what this is about, care to elaborate?
>>
>> please see Memory Error Record in [1], in which the "Memory Error Type" field is used to describe the
>> error type, such as Multi-bit ECC or Parity Error etc. Because KVM or host does not pass the memory
>> error type to Qemu, so Qemu does not know what is the error type for the memory section. Hence we let QEMU simulate
>> the error type to Multi-bit ECC.
> Agreed that in case of TCG qemu won't likely have any way to get hw error from kernel
> so it could be useful only for testing purposes (i.e. 'make check' and/or testing
> how guest OS handles errors)
>
> But with KVM in kernel it should be possible to fish error out from host kernel
> and forward it to guest. If this are intended for handling HW errors,
> I'm not sure that 'Multi-bit ECC' could replace all real errors reported by host
> firmware.

Thanks for the mail. I understand your point; let me explain in more detail.

(1) In fact, the memory error type is not important to the guest OS. When an OS
(such as the guest OS) does memory recovery, it does not use the memory error type.
It mainly uses the memory_failure() function [1] to do the recovery, and this
function does not care what the memory error type is; it does not even know it.

(2) Having KVM forward the error type to the guest would take more effort and may
not be worth doing. The real memory error type exists in the host APEI table, and
only the host APEI driver can get it; KVM cannot get it directly. To forward it to
the guest, KVM would first need to get the error type from the APEI driver and then
pass it on, which may be opposed by James (james.morse@xxxxxxx). I once tried to
export more error information to the guest, but James did not agree with that.
On the ARM64 platform, we have no implementation that passes the error information
from the APEI driver to KVM or to other kernel modules.

[1]:
int memory_failure(unsigned long pfn, int trapno, int flags)
{
    ......
}

>
>
>> [1]:
>> UEFI Spec 2.6 Errata A:
>>
>> "N.2.5 Memory Error Section"
>> -----------------+---------------+--------------+-------------------------------------------+
>> Mnemonic         | Byte Offset   | Byte Length  | Description                               |
>> -----------------+---------------+--------------+-------------------------------------------+
>> ........         | ............  | .........    | ...........                               |
>> -----------------+---------------+--------------+-------------------------------------------+
>> Memory Error Type|      72       |       1      |Identifies the type of error that occurred:|
>>                  |               |              |  0 – Unknown                              |
>>                  |               |              |  1 – No error                             |
>>                  |               |              |  2 – Single-bit ECC                       |
>>                  |               |              |  3 – Multi-bit ECC                        |
>>                  |               |              |  4 – Single-symbol ChipKill ECC           |
>>                  |               |              |  5 – Multi-symbol ChipKill ECC            |
>>                  |               |              |  6 – Master abort                         |
>>                  |               |              |  7 – Target abort                         |
>>                  |               |              |  8 – Parity Error                         |
>>                  |               |              |  9 – Watchdog timeout                     |
>>                  |               |              | 10 – Invalid address                      |
>>                  |               |              | 11 – Mirror Broken                        |
>>                  |               |              | 12 – Memory Sparing                       |
>>                  |               |              | 13 - Scrub corrected error                |
>>                  |               |              | 14 - Scrub uncorrected error              |
>>                  |               |              | 15 - Physical Memory Map-out event        |
>>                  |               |              | All other values reserved.                |
>> -----------------+---------------+--------------+-------------------------------------------+
>> ........         | ............  | .........    | ...........                               |
>> -----------------+---------------+--------------+-------------------------------------------+
> [...]
>
> .
>
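
To make the hard-coded error type concrete, here is a minimal sketch in C of what "simulate the error type as Multi-bit ECC" could look like. It is not the code from the patch: the enum, the macros and the mem_section_*() helpers are hypothetical names invented for illustration. Only the byte offset (72), the byte length (1) and the numeric values come from the quoted N.2.5 table. All other fields of the section are simply left zeroed here and are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* "Memory Error Type" values from the quoted UEFI N.2.5 table.
 * The identifier names are hypothetical; only the numbers are from the table. */
enum mem_error_type {
    MEM_ERR_UNKNOWN             = 0,
    MEM_ERR_NO_ERROR            = 1,
    MEM_ERR_SINGLE_BIT_ECC      = 2,
    MEM_ERR_MULTI_BIT_ECC       = 3,
    MEM_ERR_SINGLE_SYM_CHIPKILL = 4,
    MEM_ERR_MULTI_SYM_CHIPKILL  = 5,
    MEM_ERR_MASTER_ABORT        = 6,
    MEM_ERR_TARGET_ABORT        = 7,
    MEM_ERR_PARITY              = 8,
    MEM_ERR_WATCHDOG_TIMEOUT    = 9,
    MEM_ERR_INVALID_ADDRESS     = 10,
    MEM_ERR_MIRROR_BROKEN       = 11,
    MEM_ERR_MEMORY_SPARING      = 12,
    MEM_ERR_SCRUB_CORRECTED     = 13,
    MEM_ERR_SCRUB_UNCORRECTED   = 14,
    MEM_ERR_PHYS_MAP_OUT        = 15
};

/* Field position taken from the table: byte offset 72, byte length 1. */
#define MEM_ERR_TYPE_OFFSET 72
#define MEM_ERR_TYPE_LENGTH 1
#define MEM_ERR_MIN_LEN     (MEM_ERR_TYPE_OFFSET + MEM_ERR_TYPE_LENGTH)

/* Hypothetical helper: store the error type in a raw section buffer. */
void mem_section_set_error_type(uint8_t *section, size_t len,
                                enum mem_error_type type)
{
    assert(section != NULL && len >= MEM_ERR_MIN_LEN);
    section[MEM_ERR_TYPE_OFFSET] = (uint8_t)type;
}

/* Hypothetical helper: read the error type back from the buffer. */
uint8_t mem_section_get_error_type(const uint8_t *section, size_t len)
{
    assert(section != NULL && len >= MEM_ERR_MIN_LEN);
    return section[MEM_ERR_TYPE_OFFSET];
}

/* Neither KVM nor the host passes the real error type down, so the
 * simulated section always reports Multi-bit ECC. Every other byte is
 * zeroed because this sketch does not model the remaining fields. */
void mem_section_simulate(uint8_t *section, size_t len)
{
    assert(section != NULL && len >= MEM_ERR_MIN_LEN);
    memset(section, 0, len);
    mem_section_set_error_type(section, len, MEM_ERR_MULTI_BIT_ECC);
}
```

A caller would pass any buffer of at least 73 bytes to mem_section_simulate() and then find the value 3 at byte 72. Since the guest recovers through memory_failure(), which takes only a pfn, trapno and flags, the value stored in this field should not change the recovery path.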