On 17.07.2013, at 17:36, Yoder Stuart-B08248 wrote:

>> -----Original Message-----
>> From: Alexander Graf [mailto:agraf@xxxxxxx]
>> Sent: Wednesday, July 17, 2013 10:21 AM
>> To: Yoder Stuart-B08248
>> Cc: Wood Scott-B07421; Bhushan Bharat-R65777; kvm@xxxxxxxxxxxxxxx;
>> kvm-ppc@xxxxxxxxxxxxxxx; Gleb Natapov
>> Subject: Re: [PATCH 3/5] booke: define reset and shutdown hcalls
>>
>> On 17.07.2013, at 17:19, Yoder Stuart-B08248 wrote:
>>
>>>> -----Original Message-----
>>>> From: Alexander Graf [mailto:agraf@xxxxxxx]
>>>> Sent: Wednesday, July 17, 2013 7:19 AM
>>>> To: Gleb Natapov
>>>> Cc: Wood Scott-B07421; Bhushan Bharat-R65777; kvm@xxxxxxxxxxxxxxx;
>>>> kvm-ppc@xxxxxxxxxxxxxxx; Yoder Stuart-B08248; Bhushan Bharat-R65777
>>>> Subject: Re: [PATCH 3/5] booke: define reset and shutdown hcalls
>>>>
>>>> On 17.07.2013, at 13:00, Gleb Natapov wrote:
>>>>
>>>>> On Tue, Jul 16, 2013 at 06:04:34PM -0500, Scott Wood wrote:
>>>>>> On 07/16/2013 01:35:55 AM, Gleb Natapov wrote:
>>>>>>> On Mon, Jul 15, 2013 at 01:17:33PM -0500, Scott Wood wrote:
>>>>>>>> On 07/15/2013 06:30:20 AM, Gleb Natapov wrote:
>>>>>>>>> There is not much sense in sharing hypercalls between
>>>>>>>>> architectures. There is zero probability x86 will implement
>>>>>>>>> those, for instance.
>>>>>>>>
>>>>>>>> This is similar to the question of whether to keep device API
>>>>>>>> enumerations per-architecture... It costs very little to keep
>>>>>>>> it in a common place, and it's hard to go back in the other
>>>>>>>> direction if we later realize there are things that should be
>>>>>>>> shared.
>>>>>>>>
>>>>>>> This is different from the device API, since with the device API
>>>>>>> all arches have to create/destroy devices, so it makes sense to
>>>>>>> put device lifecycle management into the common code. The device
>>>>>>> API also has a single entry point into the code - the device fd
>>>>>>> ioctl - where it makes sense to handle common tasks, if any, and
>>>>>>> dispatch the rest to the specific device implementation.
>>>>>>>
>>>>>>> This is totally unlike hypercalls, which are by definition very
>>>>>>> architecture specific (the way they are triggered, the way
>>>>>>> parameters are passed from guest to host, which hypercalls an
>>>>>>> arch needs...).
>>>>>>
>>>>>> The ABI is architecture specific. The API doesn't need to be, any
>>>>>> more than it does with syscalls (I consider the
>>>>>> architecture-specific definition of syscall numbers and similar
>>>>>> constants in Linux to be unfortunate, especially for tools such
>>>>>> as strace or QEMU's linux-user emulation).
>>>>>>
>>>>> Unlike syscalls, different arches have very different ideas about
>>>>> which hypercalls they need to implement, so while I can see how a
>>>>> unified syscall space may benefit a (very) small number of tools,
>>>>> I do not see what advantage a unified hypercall space would give
>>>>> us. The disadvantage is one more global namespace to manage.
>>>>>
>>>>>>>> Keeping it in a common place also makes it more visible to
>>>>>>>> people looking to add new hcalls, which could cut down on
>>>>>>>> reinventing the wheel.
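[For concreteness, the "single entry point" Gleb refers to above looks roughly like this from userspace - a minimal sketch using the KVM_CREATE_DEVICE ioctl, with error handling and the follow-up KVM_SET_DEVICE_ATTR calls elided:]

    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Minimal sketch: all device lifecycle management funnels through
     * one ioctl on the VM fd. vm_fd is assumed to be an open KVM VM
     * file descriptor; type is a device type constant such as
     * KVM_DEV_TYPE_FSL_MPIC_20. On success the kernel fills in a
     * per-device fd, and all further, device-specific configuration is
     * dispatched through that fd. */
    int create_kvm_device(int vm_fd, __u32 type)
    {
        struct kvm_create_device kcd = { .type = type };

        if (ioctl(vm_fd, KVM_CREATE_DEVICE, &kcd) < 0)
            return -1;    /* errno describes the failure */

        return kcd.fd;    /* use with KVM_SET_DEVICE_ATTR etc. */
    }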
>>>>>>> I do not want other arches to start using hypercalls the way
>>>>>>> powerpc started to use them - as a separate device I/O space -
>>>>>>> so it is better to hide this as far away from common code as
>>>>>>> possible :) But on a more serious note, hypercalls should be a
>>>>>>> last resort, added only when no other possibility exists.
>>>>>>> People should not look at what hcalls others have implemented
>>>>>>> so they can add them to their favorite arch; they should have a
>>>>>>> problem at hand that they cannot solve without an hcall, and at
>>>>>>> that point they will have a pretty good idea of what the hcall
>>>>>>> should do.
>>>>>>
>>>>>> Why are hcalls such a bad thing?
>>>>>>
>>>>> Because they are often used to do non-architectural things, making
>>>>> OSes behave differently from how they run on real HW, and real HW
>>>>> is what OSes are designed and tested for. Example: there once was
>>>>> a KVM hypercall (Xen has/had a similar one) to accelerate MMU
>>>>> operations. One thing it allowed was flushing the TLB without
>>>>> doing an IPI if the vcpu was not running. Later an optimization
>>>>> was added to the Linux MMU code that _relies_ on those IPIs for
>>>>> synchronisation. It is good that by that point those hypercalls
>>>>> had already been deprecated in KVM (IIRC Xen was broken for some
>>>>> time in that regard). Which brings me to another point: hypercalls
>>>>> often get obsoleted by code improvements and HW advancements (as
>>>>> happened to the aforementioned MMU hypercalls), but they are hard
>>>>> to deprecate if the hypervisor supports live migration; without
>>>>> live migration it is less of a problem. The next point is that
>>>>> people often try to use hypercalls instead of emulating a PV or
>>>>> real device just because they think it is easier, but it is often
>>>>> not so. Example: the pvpanic device was initially proposed as a
>>>>> hypercall, so let's say we had implemented it as such. It would
>>>>> have been KVM specific, the implementation would have touched
>>>>> core guest KVM code, and it would have been Linux guest specific.
>>>>> Instead it was implemented as a platform device with a very small
>>>>> platform driver confined to the drivers/ directory, immediately
>>>>> usable by Xen and QEMU TCG in addition
>>>>
>>>> This is actually a very good point. How do we support reboot and
>>>> shutdown for TCG guests? We surely don't want to expose TCG as a
>>>> KVM hypervisor.
>>>
>>> Hmm...so are you proposing that we abandon the current approach,
>>> and switch to a device-based mechanism for reboot/shutdown?
>>
>> Reading Gleb's email it sounds like the more future-proof approach,
>> yes. I'm not quite sure yet where we should plug this in, though.
>
> What do you mean...where the paravirt device would go in the physical
> address map??

Right. Either we

- let the guest decide (PCI)
- let QEMU decide, but potentially break the SoC layout (SysBus)
- let QEMU decide, but only for the virt machine so that we don't
  break anyone (PlatBus)

Alex
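[For a sense of what the SysBus/PlatBus options would mean in practice, here is a hypothetical sketch of such a paravirt power-control device against the QEMU device model of that era. The type name, the one-register MMIO layout, and the register encoding are all invented for illustration, and exact QEMU signatures drifted between releases around this time:]

    #include "hw/sysbus.h"
    #include "sysemu/sysemu.h"   /* qemu_system_*_request() */

    #define TYPE_PV_POWER "pv-power"   /* hypothetical device name */
    #define PV_POWER(obj) OBJECT_CHECK(PVPowerState, (obj), TYPE_PV_POWER)

    typedef struct PVPowerState {
        SysBusDevice parent_obj;
        MemoryRegion iomem;
    } PVPowerState;

    static void pv_power_write(void *opaque, hwaddr addr,
                               uint64_t val, unsigned size)
    {
        /* Invented register encoding: 1 = reboot, 2 = power off. */
        switch (val) {
        case 1:
            qemu_system_reset_request();
            break;
        case 2:
            qemu_system_shutdown_request();
            break;
        }
    }

    static const MemoryRegionOps pv_power_ops = {
        .write = pv_power_write,
        .endianness = DEVICE_NATIVE_ENDIAN,
    };

    static int pv_power_init(SysBusDevice *sbd)
    {
        PVPowerState *s = PV_POWER(sbd);

        /* One 4-byte register; the device itself does not pick an
         * address, it only exposes an MMIO region. */
        memory_region_init_io(&s->iomem, &pv_power_ops, s, "pv-power", 4);
        sysbus_init_mmio(sbd, &s->iomem);
        return 0;
    }

    static void pv_power_class_init(ObjectClass *klass, void *data)
    {
        SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);

        k->init = pv_power_init;
    }

    static const TypeInfo pv_power_info = {
        .name          = TYPE_PV_POWER,
        .parent        = TYPE_SYS_BUS_DEVICE,
        .instance_size = sizeof(PVPowerState),
        .class_init    = pv_power_class_init,
    };

    static void pv_power_register_types(void)
    {
        type_register_static(&pv_power_info);
    }

    type_init(pv_power_register_types)

[The machine code would then instantiate the device and choose where it lives, e.g. sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, some_addr). Doing that only in the virt machine is exactly the PlatBus trade-off above: QEMU decides the address, but existing SoC layouts are untouched.]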