On 17.07.2013, at 18:21, Bhushan Bharat-R65777 wrote:

>>>>>>>>>> On 17.07.2013, at 13:00, Gleb Natapov wrote:
>>>>>>>>>>
>>>>>>>>>>> On Tue, Jul 16, 2013 at 06:04:34PM -0500, Scott Wood wrote:
>>>>>>>>>>>> On 07/16/2013 01:35:55 AM, Gleb Natapov wrote:
>>>>>>>>>>>>> On Mon, Jul 15, 2013 at 01:17:33PM -0500, Scott Wood wrote:
>>>>>>>>>>>>>> On 07/15/2013 06:30:20 AM, Gleb Natapov wrote:
>>>>>>>>>>>>>>> There is not much sense in sharing hypercalls between architectures. There is zero probability x86 will implement those, for instance.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is similar to the question of whether to keep device API enumerations per-architecture... It costs very little to keep it in a common place, and it's hard to go back in the other direction if we later realize there are things that should be shared.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is different from the device API, since with the device API all arches have to create/destroy devices, so it makes sense to put device lifecycle management into the common code; the device API also has a single entry point into the code - the device fd ioctl - where it makes sense to handle common tasks, if any, and dispatch the rest to the specific device implementation.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is totally unlike hypercalls, which are, by definition, very architecture specific (the way they are triggered, the way parameters are passed from guest to host, which hypercalls an arch needs...).
>>>>>>>>>>>>
>>>>>>>>>>>> The ABI is architecture specific. The API doesn't need to be, any more than it does with syscalls (I consider the architecture-specific definition of syscall numbers and similar constants in Linux to be unfortunate, especially for tools such as strace or QEMU's linux-user emulation).
>>>>>>>>>>>
>>>>>>>>>>> Unlike syscalls, different arches have very different ideas about which hypercalls they need to implement, so while I can see how a unified syscall space may benefit a (very) small number of tools, I do not see what advantage a unified hypercall space would give us. The disadvantage is one more global namespace to manage.
>>>>>>>>>>>
>>>>>>>>>>>>>> Keeping it in a common place also makes it more visible to people looking to add new hcalls, which could cut down on reinventing the wheel.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I do not want other arches to start using hypercalls the way powerpc started to use them - as a separate device I/O space - so it is better to hide this as far away from common code as possible :) But on a more serious note, hypercalls should be a last resort, added only when no other possibility exists. People should not browse the hcalls other arches implemented so they can add them to their favorite arch; they should have a problem at hand that they cannot solve without an hcall, and at that point they will have a pretty good idea what the hcall should do.
>>>>>>>>>>>>
>>>>>>>>>>>> Why are hcalls such a bad thing?
>>>>>>>>>>>>
>>>>>>>>>>> Because they are often used to do non-architectural things, making OSes behave differently from how they run on real HW, and real HW is what OSes are designed and tested for.
>>>>>>>>>>> Example: there once was a KVM hypercall (Xen has/had a similar one) to accelerate MMU operations. One thing it allowed was flushing the TLB without doing an IPI if the vcpu is not running. Later an optimization was added to the Linux MMU code that _relies_ on those IPIs for synchronisation. Good thing that at that point those hypercalls were already deprecated on KVM (IIRC Xen was broken for some time in that regard). Which brings me to another point: such hypercalls often get obsoleted by code improvements and HW advancement (as happened to the aforementioned MMU hypercalls), but they are hard to deprecate if the hypervisor supports live migration; without live migration it is less of a problem.
>>>>>>>>>>>
>>>>>>>>>>> The next point is that people often try to use them instead of emulating a PV or real device just because they think it is easier, but it often is not so. Example: the pvpanic device was initially proposed as a hypercall, so let's say we had implemented it as such. It would have been KVM specific, the implementation would have touched core guest KVM code, and it would have been Linux guest specific. Instead it was implemented as a platform device with a very small platform driver confined to the drivers/ directory, immediately usable by Xen and QEMU TCG in addition.
>>>>>>>>>>
>>>>>>>>>> This is actually a very good point. How do we support reboot and shutdown for TCG guests? We surely don't want to expose TCG as a KVM hypervisor.
>>>>>>>>>
>>>>>>>>> Hmm...so are you proposing that we abandon the current approach and switch to a device-based mechanism for reboot/shutdown?
>>>>>>>>
>>>>>>>> Reading Gleb's email it sounds like the more future-proof approach, yes. I'm not quite sure yet where we should plug this in, though.
>>>>>>>
>>>>>>> What do you mean...where the paravirt device would go in the physical address map?
>>>>>>
>>>>>> Right. Either we
>>>>>>
>>>>>> - let the guest decide (PCI)
>>>>>> - let QEMU decide, but potentially break the SoC layout (SysBus)
>>>>>> - let QEMU decide, but only for the virt machine so that we don't break anyone (PlatBus)
>>>>>
>>>>> Can you please elaborate on the above two points?
>>>>
>>>> If we emulate an MPC8544DS, we need to emulate an MPC8544DS. Any time we diverge from the layout of the original chip, things can break.
>>>>
>>>> However, for our PV machine (-M ppce500 / e500plat) we don't care about real hardware layouts. We simply emulate a machine that is 100% described through the device tree. So guests that can't deal with the machine looking different from real hardware don't really matter anyway, since they'd already be broken.
>>>>
>>> Ah, so we can choose any address range in CCSR space of a PV machine (-M ppce500 / e500plat).
>>
>> No, we don't put it in CCSR space. It'd just be orthogonal to CCSR.
>
> All devices are represented in the guest device tree, so how will we represent this device in the guest device tree?

Not inside of the CCSR node :).


Alex
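
For concreteness, the "single entry point" Gleb mentions for the device API is the KVM_CREATE_DEVICE ioctl on the VM fd, which hands back a per-device fd for further attribute ioctls. A minimal userspace sketch follows; the device type used (the in-kernel Freescale MPIC) is just an example, and vm_fd is assumed to come from an earlier KVM_CREATE_VM:

```c
/*
 * Sketch of creating an in-kernel device through the common KVM device API.
 * The lifecycle and attribute plumbing is arch-neutral; only the device
 * implementation behind the returned fd is arch specific.
 */
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int create_in_kernel_device(int vm_fd)
{
	struct kvm_create_device cd = {
		.type  = KVM_DEV_TYPE_FSL_MPIC_20,	/* example device type */
		.flags = 0,
	};

	if (ioctl(vm_fd, KVM_CREATE_DEVICE, &cd) < 0) {
		perror("KVM_CREATE_DEVICE");
		return -1;
	}

	/*
	 * cd.fd is filled in by the kernel and is the handle for further
	 * ioctls such as KVM_SET_DEVICE_ATTR / KVM_GET_DEVICE_ATTR.
	 */
	return cd.fd;
}
```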
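
And to illustrate the pvpanic point: the guest side really is just a tiny platform driver that hangs a panic notifier off one device register. A rough sketch, modelled loosely on drivers/misc/pvpanic.c and targeting a current kernel; the compatible string, the single event register and the PVPANIC_PANICKED bit here are illustrative assumptions rather than the exact upstream binding, and teardown paths are omitted for brevity:

```c
/* Rough sketch of a pvpanic-style guest platform driver. */
#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/panic_notifier.h>
#include <linux/notifier.h>
#include <linux/of.h>
#include <linux/err.h>
#include <linux/io.h>

#define PVPANIC_PANICKED	(1 << 0)	/* "guest panicked" event bit (assumed) */

static void __iomem *event_reg;	/* event register exposed by QEMU/Xen/TCG */

static int pvpanic_sketch_panic(struct notifier_block *nb,
				unsigned long code, void *unused)
{
	/* Tell the hypervisor we panicked by poking the device register. */
	iowrite8(PVPANIC_PANICKED, event_reg);
	return NOTIFY_DONE;
}

static struct notifier_block pvpanic_sketch_nb = {
	.notifier_call = pvpanic_sketch_panic,
};

static int pvpanic_sketch_probe(struct platform_device *pdev)
{
	/* Where the register lives comes from the device tree, not from KVM. */
	event_reg = devm_platform_ioremap_resource(pdev, 0);
	if (IS_ERR(event_reg))
		return PTR_ERR(event_reg);

	atomic_notifier_chain_register(&panic_notifier_list, &pvpanic_sketch_nb);
	return 0;
}

static const struct of_device_id pvpanic_sketch_of_match[] = {
	{ .compatible = "qemu,pvpanic-sketch" },	/* assumed binding name */
	{ }
};

static struct platform_driver pvpanic_sketch_driver = {
	.driver = {
		.name		= "pvpanic-sketch",
		.of_match_table	= pvpanic_sketch_of_match,
	},
	.probe	= pvpanic_sketch_probe,
};
module_platform_driver(pvpanic_sketch_driver);

MODULE_LICENSE("GPL");
```

Nothing in there knows or cares whether the hypervisor underneath is KVM, Xen or TCG, which is exactly the argument for a device over an hcall.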