On 17.07.2013, at 17:59, Bhushan Bharat-R65777 wrote:

>
>> -----Original Message-----
>> From: Alexander Graf [mailto:agraf@xxxxxxx]
>> Sent: Wednesday, July 17, 2013 9:22 PM
>> To: Bhushan Bharat-R65777
>> Cc: Yoder Stuart-B08248; Wood Scott-B07421; kvm@xxxxxxxxxxxxxxx; kvm-ppc@xxxxxxxxxxxxxxx; Gleb Natapov
>> Subject: Re: [PATCH 3/5] booke: define reset and shutdown hcalls
>>
>> On 17.07.2013, at 17:47, Bhushan Bharat-R65777 wrote:
>>
>>>
>>>>>>>> On 17.07.2013, at 13:00, Gleb Natapov wrote:
>>>>>>>>
>>>>>>>>> On Tue, Jul 16, 2013 at 06:04:34PM -0500, Scott Wood wrote:
>>>>>>>>>> On 07/16/2013 01:35:55 AM, Gleb Natapov wrote:
>>>>>>>>>>> On Mon, Jul 15, 2013 at 01:17:33PM -0500, Scott Wood wrote:
>>>>>>>>>>>> On 07/15/2013 06:30:20 AM, Gleb Natapov wrote:
>>>>>>>>>>>>> There is not much sense in sharing hypercalls between architectures. There is zero probability that x86 will implement those, for instance.
>>>>>>>>>>>>
>>>>>>>>>>>> This is similar to the question of whether to keep device API enumerations per-architecture... It costs very little to keep it in a common place, and it's hard to go back in the other direction if we later realize there are things that should be shared.
>>>>>>>>>>>>
>>>>>>>>>>> This is different from the device API, since with the device API all arches have to create/destroy devices, so it makes sense to put device lifecycle management into the common code, and the device API has a single entry point into the code - the device fd ioctl - where it makes sense to handle common tasks, if any, and dispatch the rest to the specific device implementation.
>>>>>>>>>>>
>>>>>>>>>>> This is totally unlike hypercalls, which are by definition very architecture specific (the way they are triggered, the way parameters are passed from guest to host, which hypercalls an arch needs...).
>>>>>>>>>>
>>>>>>>>>> The ABI is architecture specific. The API doesn't need to be, any more than it does with syscalls (I consider the architecture-specific definition of syscall numbers and similar constants in Linux to be unfortunate, especially for tools such as strace or QEMU's linux-user emulation).
>>>>>>>>>>
>>>>>>>>> Unlike syscalls, different arches have very different ideas about which hypercalls they need to implement, so while I can see how a unified syscall space may benefit a (very) small number of tools, I do not see what advantage a unified hypercall space would give us. The disadvantage is one more global namespace to manage.
>>>>>>>>>
>>>>>>>>>>>> Keeping it in a common place also makes it more visible to people looking to add new hcalls, which could cut down on reinventing the wheel.
>>>>>>>>>>>
>>>>>>>>>>> I do not want other arches to start using hypercalls the way powerpc started to use them, as a separate device I/O space, so it is better to hide this as far away from common code as possible :) But on a more serious note, hypercalls should be a last resort, added only when no other possibility exists. People should not look at what hcalls others implemented so they can add them to their favorite arch; they should have a problem at hand that they cannot solve without an hcall, and at that point they will have a pretty good idea of what the hcall should do.
>>>>>>>>>>
>>>>>>>>>> Why are hcalls such a bad thing?
>>>>>>>>>>
>>>>>>>>> Because they are often used to do non-architectural things, making OSes behave differently from how they run on real HW, and real HW is what OSes are designed and tested for. Example: there once was a KVM hypercall (XEN has/had a similar one) to accelerate MMU operations. One thing it allowed was flushing the TLB without doing an IPI if the vcpu is not running. Later an optimization was added to the Linux MMU code that _relies_ on those IPIs for synchronisation. Good thing that at that point those hypercalls were already deprecated on KVM (IIRC XEN was broken for some time in that regard). Which brings me to another point: hypercalls often get obsoleted by code improvements and HW advancements (this happened to the aforementioned MMU hypercalls), but they are hard to deprecate if the hypervisor supports live migration; without live migration it is less of a problem. The next point is that people often try to use them instead of emulating a PV or real device just because they think it is easier, but it is often not so. Example: the pvpanic device was initially proposed as a hypercall, so let's say we had implemented it as such. It would have been KVM specific, the implementation would have touched core guest KVM code, and it would have been Linux guest specific. Instead it was implemented as a platform device with a very small platform driver confined to the drivers/ directory, immediately usable by XEN and QEMU TCG in addition.
>>>>>>>>
>>>>>>>> This is actually a very good point. How do we support reboot and shutdown for TCG guests? We surely don't want to expose TCG as a KVM hypervisor.
>>>>>>>
>>>>>>> Hmm... so are you proposing that we abandon the current approach and switch to a device-based mechanism for reboot/shutdown?
>>>>>>
>>>>>> Reading Gleb's email it sounds like the more future-proof approach, yes. I'm not quite sure yet where we should plug this in though.
>>>>>
>>>>> What do you mean... where would the paravirt device go in the physical address map?
>>>>
>>>> Right. Either we
>>>>
>>>> - let the guest decide (PCI)
>>>> - let QEMU decide, but potentially break the SoC layout (SysBus)
>>>> - let QEMU decide, but only for the virt machine so that we don't break anyone (PlatBus)
>>>
>>> Can you please elaborate on the above two points?
>>
>> If we emulate an MPC8544DS, we need to emulate an MPC8544DS. Any time we diverge from the layout of the original chip, things can break.
>>
>> However, for our PV machine (-M ppce500 / e500plat) we don't care about real hardware layouts. We simply emulate a machine that is 100% described through the device tree. So guests that can't deal with the machine looking different from real hardware don't really matter anyway, since they'd already be broken.
>>
>
> Ah, so we can choose any address range in the CCSR space of a PV machine (-M ppce500 / e500plat).

No, we don't put it in CCSR space. It'd just be orthogonal to CCSR.

> What about the MPC8544DS machine?

I guess we'll have to live with GUTS there.

> So what is the preferred way, a virtio reset/shutdown device or the above mentioned?

A virtio device would clutter our PCI space, which we're already pretty tight on. So I'd personally prefer the above mentioned.
Alex
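For illustration, here is a minimal sketch of the guest side of the platform-device approach Gleb describes for pvpanic, applied to reset/shutdown: a tiny Linux platform driver that binds to a paravirt control device advertised in the e500plat device tree and writes a command register to reboot or power off. The compatible string, register layout, and command values are assumptions made up for this sketch; they are not part of this patch series or of any existing QEMU binding.

/*
 * Hypothetical sketch only: "qemu,e500plat-reset", the register layout
 * and the command values are assumptions for illustration.  The point is
 * that the guest side can be a tiny platform driver, like pvpanic,
 * instead of a KVM-specific hypercall.
 */
#include <linux/err.h>
#include <linux/io.h>
#include <linux/module.h>
#include <linux/of.h>
#include <linux/platform_device.h>
#include <linux/pm.h>
#include <asm/machdep.h>

#define PV_CMD_RESET		0x1	/* assumed command encoding */
#define PV_CMD_SHUTDOWN		0x2

static void __iomem *pv_ctl_base;

static void pv_restart(char *cmd)
{
	/* The hypervisor resets the machine; we are not expected to return. */
	writel(PV_CMD_RESET, pv_ctl_base);
	for (;;)
		;
}

static void pv_power_off(void)
{
	writel(PV_CMD_SHUTDOWN, pv_ctl_base);
	for (;;)
		;
}

static int pv_ctl_probe(struct platform_device *pdev)
{
	struct resource *res;

	/* The MMIO register comes from the device tree node's "reg" property. */
	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
	pv_ctl_base = devm_ioremap_resource(&pdev->dev, res);
	if (IS_ERR(pv_ctl_base))
		return PTR_ERR(pv_ctl_base);

	/* Hook the generic powerpc reset/power-off paths (simplified). */
	ppc_md.restart = pv_restart;
	pm_power_off = pv_power_off;
	return 0;
}

static const struct of_device_id pv_ctl_match[] = {
	{ .compatible = "qemu,e500plat-reset" },	/* hypothetical binding */
	{ }
};
MODULE_DEVICE_TABLE(of, pv_ctl_match);

static struct platform_driver pv_ctl_driver = {
	.driver = {
		.name		= "e500plat-reset",
		.of_match_table	= pv_ctl_match,
	},
	.probe = pv_ctl_probe,
};
module_platform_driver(pv_ctl_driver);

MODULE_LICENSE("GPL");

On the QEMU side the same device would presumably be a small SysBus/PlatBus MMIO stub that calls qemu_system_reset_request() or qemu_system_shutdown_request() when those values are written, which is what makes it work identically for KVM and TCG guests.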