On Mon, 17 Dec 2018 at 15:56, James Morse <james.morse@xxxxxxx> wrote: > I think the root issue here is the name of the cpufeature 'RAS Extensions', this > doesn't mean RAS is new, or even requires these features. It's just standardised > records, classification and a barrier. > Not only is it possible to build a platform that supports RAS without this > extensions: there are at least three platforms out there that do! > > > On 15/12/2018 00:12, gengdongjiu wrote: > >> On Fri, 14 Dec 2018 at 13:56, James Morse <james.morse@xxxxxxx> wrote: > >>> On 14/12/2018 10:15, Dongjiu Geng wrote: > >>>> When user space do memory recovery, it will check whether KVM and > >>>> guest support the error recovery, only when both of them support, > >>>> user space will do the error recovery. This patch exports this > >>>> capability of KVM to user space. > >>> > >>> I can understand user-space only wanting to do the work if host and > >>> guest support the feature. But 'error recovery' isn't a KVM feature, > >>> its a Linux kernel feature. > > [...] > > > Thanks Peter's explanation. Frankly speaking, I agree Peter's suggestion. > > > > To James, I explain more to you, as peter said QEMU needs to check whether > > the guest CPU is a type which can handle the error though guest ACPI table. > > I don't think this really matters. Its only the NMIlike notifications that the > guest doesn't have to register or poll. The ones we support today extend the > architectures existing behaviour: you would have taken an external-abort on a > real system, whether you know about the additional metadata doesn't matter to Qemu. Consider the case where we booted the guest using a DTB and no ACPI table at all -- we certainly can't just call QEMU code that tries to add entries to a nonexistent table. My main point is that there needs to be logic in Dongjiu's QEMU patches that checks more than just "does this KVM feature exist". I'm not sufficiently familiar with all this RAS stuff to be certain what those checks should be and what the right choices are; I just know we need to check *something*... > > Let us see the X86's QEMU logic: > > 1. Before the vCPU created, it will set a default env->mcg_cap value with > > > MCE_CAP_DEF flag, MCG_SER_P means it expected the guest CPU model supports > > RAS error recovery.[1] 2. when the vCPU initialize, it will check whether host > > kernel support this feature[2]. Only when host kernel and default env->mcg_cap > > value all expected this feature, then it will setup vCPU support RAS error > > recovery[3]. > > This looks like KVM exposing a CPU capability to Qemu, which then configures the > behaviour KVM gives to the guest. This doesn't tell you anything about what the > guest supports. It tells you what the *guest CPU* supports, which for x86 is a combination of (a) what did the user/machine model ask for and (b) what can KVM actually implement. I don't much care whether the guest OS supports anything or not, that's its business... but it does seem odd to me that the equivalent Arm code is not similarly saying "what were we asked for, and what can we do?". I think one question here which it would be good to answer is: if we are modelling a guest and we haven't specifically provided it an ACPI table to tell it about memory errors, what do we do when we get a sigbus from the host? We have basically two choices: (1) send the guest an SError (aka asynchronous external abort) anyway (with no further info about what the memory error is) (2) just stop QEMU (as we would for a memory error in QEMU's own memory) thanks -- PMM