On 3 January 2018 at 02:21, gengdongjiu <gengdongjiu@xxxxxxxxxx> wrote:
> On 2017/12/28 22:18, Igor Mammedov wrote:
>> On Thu, 28 Dec 2017 13:54:11 +0800
>> Dongjiu Geng <gengdongjiu@xxxxxxxxxx> wrote:
>>> In order to simulation, we hard code the error
>>> type to Multi-bit ECC.
>> Not sure what this is about, care to elaborate?
>
> please see Memory Error Record in [1], in which the "Memory Error Type" field is used to describe the
> error type, such as Multi-bit ECC or Parity Error etc. Because KVM or host does not pass the memory
> error type to Qemu, so Qemu does not know what is the error type for the memory section. Hence we let QEMU simulate
> the error type to Multi-bit ECC.
>
> [1]:
> UEFI Spec 2.6 Errata A:
>
> "N.2.5 Memory Error Section"
> -----------------+---------------+--------------+-------------------------------------------+
> Mnemonic         | Byte Offset   | Byte Length  | Description                               |
> -----------------+---------------+--------------+-------------------------------------------+
> ........         | ............  | .........    | ...........                               |
> -----------------+---------------+--------------+-------------------------------------------+
> Memory Error Type| 72            | 1            |Identifies the type of error that occurred:|
>                  |               |              | 0 – Unknown                               |
>                  |               |              | 1 – No error                              |
>                  |               |              | 2 – Single-bit ECC                        |
>                  |               |              | 3 – Multi-bit ECC                         |
>                  |               |              | 4 – Single-symbol ChipKill ECC            |
>                  |               |              | 5 – Multi-symbol ChipKill ECC             |
>                  |               |              | 6 – Master abort                          |
>                  |               |              | 7 – Target abort                          |
>                  |               |              | 8 – Parity Error                          |
>                  |               |              | 9 – Watchdog timeout                      |
>                  |               |              | 10 – Invalid address                      |
>                  |               |              | 11 – Mirror Broken                        |
>                  |               |              | 12 – Memory Sparing                       |
>                  |               |              | 13 - Scrub corrected error                |
>                  |               |              | 14 - Scrub uncorrected error              |
>                  |               |              | 15 - Physical Memory Map-out event        |
>                  |               |              | All other values reserved.                |
> -----------------+---------------+--------------+-------------------------------------------+
> ........         | ............  | .........    | ...........                               |
> -----------------+---------------+--------------+-------------------------------------------+

There's a value specified for "we don't know what the error type
is", which is "0 - Unknown". I think we should use that rather
than claiming that we have a particular type of error when we
don't actually know that.

I agree with James that we don't want to report a particular
type of error to the guest anyway -- the VM is a virtual
environment, and the exact reason why the hardware/firmware/host
kernel have decided that a bit of RAM isn't usable any more
doesn't matter to the guest. We just want to report "this RAM
has gone away, sorry" to it.

thanks
-- PMM
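
To make the suggestion concrete, here is a minimal sketch of filling in
the "Memory Error Type" byte with "0 - Unknown". The field offset (72),
the field length (1 byte) and the values come from the table quoted
above. Everything else is an assumption made for illustration: the
names cper_mem_err_type, cper_mem_set_error_type and
cper_mem_report_unknown are hypothetical and are not QEMU functions,
and the section is treated as a plain byte buffer whose length the
caller supplies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values of the "Memory Error Type" field, as listed in the quoted table. */
enum cper_mem_err_type {
    CPER_MEM_ERR_UNKNOWN            = 0,
    CPER_MEM_ERR_NO_ERROR           = 1,
    CPER_MEM_ERR_SINGLE_BIT_ECC     = 2,
    CPER_MEM_ERR_MULTI_BIT_ECC      = 3,
    CPER_MEM_ERR_SINGLE_SYM_CHIPKILL = 4,
    CPER_MEM_ERR_MULTI_SYM_CHIPKILL = 5,
    CPER_MEM_ERR_MASTER_ABORT       = 6,
    CPER_MEM_ERR_TARGET_ABORT       = 7,
    CPER_MEM_ERR_PARITY             = 8,
    CPER_MEM_ERR_WATCHDOG_TIMEOUT   = 9,
    CPER_MEM_ERR_INVALID_ADDRESS    = 10,
    CPER_MEM_ERR_MIRROR_BROKEN      = 11,
    CPER_MEM_ERR_MEMORY_SPARING     = 12,
    CPER_MEM_ERR_SCRUB_CORRECTED    = 13,
    CPER_MEM_ERR_SCRUB_UNCORRECTED  = 14,
    CPER_MEM_ERR_MAP_OUT            = 15,
};

/* Byte offset of the field within the Memory Error Section (from the table). */
#define CPER_MEM_ERR_TYPE_OFFSET 72

/* The field is one byte long, so every defined value must fit in a uint8_t. */
static_assert(CPER_MEM_ERR_MAP_OUT <= UINT8_MAX,
              "Memory Error Type must fit in a 1-byte field");

/*
 * Hypothetical helper: store an error type into a section buffer.
 * Returns 0 on success, -1 if the buffer is too short to contain the
 * field or the value is outside the range the table defines (0..15).
 */
int cper_mem_set_error_type(uint8_t *section, size_t len,
                            enum cper_mem_err_type type)
{
    if (section == NULL || len <= CPER_MEM_ERR_TYPE_OFFSET) {
        return -1;
    }
    if ((unsigned)type > CPER_MEM_ERR_MAP_OUT) {
        return -1; /* all other values are reserved */
    }
    section[CPER_MEM_ERR_TYPE_OFFSET] = (uint8_t)type;
    return 0;
}

/* Hypothetical helper: report "Unknown" instead of a hard-coded real type. */
int cper_mem_report_unknown(uint8_t *section, size_t len)
{
    return cper_mem_set_error_type(section, len, CPER_MEM_ERR_UNKNOWN);
}
```

With something like this, the code that builds the record would call
cper_mem_report_unknown() rather than writing 3 (Multi-bit ECC), so the
guest isn't told about an error cause that nobody actually knows.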