RE: [PATCH 3/5] booke: define reset and shutdown hcalls

Bhushan Bharat-R65777 <R65777@xxxxxxxxxxxxx> · Wed, 17 Jul 2013 16:21:36 +0000

> >>>>>>>> On 17.07.2013, at 13:00, Gleb Natapov wrote:
> >>>>>>>>
> >>>>>>>>> On Tue, Jul 16, 2013 at 06:04:34PM -0500, Scott Wood wrote:
> >>>>>>>>>> On 07/16/2013 01:35:55 AM, Gleb Natapov wrote:
> >>>>>>>>>>> On Mon, Jul 15, 2013 at 01:17:33PM -0500, Scott Wood wrote:
> >>>>>>>>>>>> On 07/15/2013 06:30:20 AM, Gleb Natapov wrote:
> >>>>>>>>>>>>> There is no much sense to share hypercalls between architectures.
> >>>>>>>>>>>>> There
> >>>>>>>>>>>>> is zero probability x86 will implement those for instance
> >>>>>>>>>>>>
> >>>>>>>>>>>> This is similar to the question of whether to keep device
> >>>>>>>>>>>> API enumerations per-architecture...  It costs very little
> >>>>>>>>>>>> to keep it in a common place, and it's hard to go back in
> >>>>>>>>>>>> the other direction if we later realize there are things
> >>>>>>>>>>>> that should be
> >> shared.
> >>>>>>>>>>>>
> >>>>>>>>>>> This is different from device API since with device API all
> >>>>>>>>>>> arches have to create/destroy devices, so it make sense to
> >>>>>>>>>>> put device lifecycle management into the common code, and
> >>>>>>>>>>> device API has single entry point to the code - device fd
> >>>>>>>>>>> ioctl - where it makes sense to handle common tasks, if any,
> >>>>>>>>>>> and despatch others to specific device implementation.
> >>>>>>>>>>>
> >>>>>>>>>>> This is totally unlike hypercalls which are, by definition,
> >>>>>>>>>>> very architecture specific (the way they are triggered, the
> >>>>>>>>>>> way parameter are passed from guest to host, what hypercalls
> >>>>>>>>>>> arch
> >> needs...).
> >>>>>>>>>>
> >>>>>>>>>> The ABI is architecture specific.  The API doesn't need to
> >>>>>>>>>> be, any more than it does with syscalls (I consider the
> >>>>>>>>>> architecture-specific definition of syscall numbers and
> >>>>>>>>>> similar constants in Linux to be unfortunate, especially for
> >>>>>>>>>> tools such as strace or QEMU's linux-user emulation).
> >>>>>>>>>>
> >>>>>>>>> Unlike syscalls different arches have very different ideas
> >>>>>>>>> what hypercalls they need to implement, so while with unified
> >>>>>>>>> syscall space I can see how it may benefit (very) small number
> >>>>>>>>> of tools, I do not see what advantage it will give us. The
> >>>>>>>>> disadvantage is one more global name space to manage.
> >>>>>>>>>
> >>>>>>>>>>>> Keeping it in a common place also makes it more visible to
> >>>>>>>>>>>> people looking to add new hcalls, which could cut down on
> >>>>>>>>>>>> reinventing the wheel.
> >>>>>>>>>>> I do not want other arches to start using hypercalls in the
> >>>>>>>>>>> way powerpc started to use them: separate device io space,
> >>>>>>>>>>> so it is better to hide this as far away from common code as
> >>>>>>>>>>> possible :) But on a more serious note hypercalls should be
> >>>>>>>>>>> a last resort and added only when no other possibility
> >>>>>>>>>>> exists, so people should not look what hcalls others
> >>>>>>>>>>> implemented, so they can add them to their favorite arch,
> >>>>>>>>>>> but they should have a problem at hand that they cannot
> >>>>>>>>>>> solve without hcall, but at this point they will have pretty good
> idea what this hcall should do.
> >>>>>>>>>>
> >>>>>>>>>> Why are hcalls such a bad thing?
> >>>>>>>>>>
> >>>>>>>>> Because they often used to do non architectural things making
> >>>>>>>>> OSes behave different from how they runs on real HW and real
> >>>>>>>>> HW is what OSes are designed and tested for. Example: there
> >>>>>>>>> once was a KVM (XEN have/had similar one) hypercall to
> >>>>>>>>> accelerate MMU
> >> operation.
> >>>>>>>>> One thing it allowed is to to flush tlb without doing IPI if
> >>>>>>>>> vcpu is not running. Later optimization was added to Linux MMU
> >>>>>>>>> code that _relies_ on those IPIs for synchronisation. Good
> >>>>>>>>> that at that point those hypercalls were already deprecated on
> >>>>>>>>> KVM (IIRC XEN was broke for some time in that regard). Which
> >>>>>>>>> brings me to another point: they often get obsoleted by code
> >>>>>>>>> improvement and HW advancement (happened to aforementioned MMU
> >>>>>>>>> hypercalls), but they hard to deprecate if hypervisor supports
> >>>>>>>>> live migration, without live migration it is less of a problem.
> >>>>>>>>> Next point is that people often try to use them instead of
> >>>>>>>>> emulate PV or real device just because they think it is
> >>>>>>>>> easier, but it
> >> is often not so. Example:
> >>>>>>>>> pvpanic device was initially proposed as hypercall, so lets
> >>>>>>>>> say we would implement it as such. It would have been KVM
> >>>>>>>>> specific, implementation would touch core guest KVM code and
> >>>>>>>>> would have been Linux guest specific. Instead it was
> >>>>>>>>> implemented as platform device with very small platform driver
> >>>>>>>>> confined in drivers/ directory, immediately usable by XEN and
> >>>>>>>>> QEMU tcg in addition
> >>>>>>>>
> >>>>>>>> This is actually a very good point. How do we support reboot
> >>>>>>>> and shutdown for TCG guests? We surely don't want to expose TCG
> >>>>>>>> as KVM
> >>>> hypervisor.
> >>>>>>>
> >>>>>>> Hmm...so are you proposing that we abandon the current approach,
> >>>>>>> and switch to a device-based mechanism for reboot/shutdown?
> >>>>>>
> >>>>>> Reading Gleb's email it sounds like the more future proof
> >>>>>> approach, yes. I'm not quite sure yet where we should plug this though.
> >>>>>
> >>>>> What do you mean...where the paravirt device would go in the
> >>>>> physical address map??
> >>>>
> >>>> Right. Either we
> >>>>
> >>>> - let the guest decide (PCI)
> >>>> - let QEMU decide, but potentially break the SoC layout (SysBus)
> >>>> - let QEMU decide, but only for the virt machine so that we don't
> >>>> break anyone
> >>>> (PlatBus)
> >>>
> >>> Can you please elaborate above two points ?
> >>
> >> If we emulate an MPC8544DS, we need to emulate an MPC8544DS. Any time
> >> we diverge from the layout of the original chip, things can break.
> >>
> >> However, for our PV machine (-M ppce500 / e500plat) we don't care
> >> about real hardware layouts. We simply emulate a machine that is 100%
> >> described through the device tree. So guests that can't deal with the
> >> machine looking different from real hardware don't really matter anyways,
> since they'd already be broken.
> >>
> >
> > Ah, so we can choose any address range in ccsr space of a PV machine (-M
> ppce500 / e500plat).
> 
> No, we don't put it in CCSR space. It'd just be orthogonal to CCSR.

All devices are represented in guest device tree, so how we will represent this device in guest Device Tree?

-Bharat

> 
> > What about MPC8544DS machine?.
> 
> I guess we'll have to live with GUTS there.
> 
> > So what is preferred way, vitio-reset/shutdown device or the above mentioned ?
> 
> A virtio device would clutter our PCI space which we're already pretty tight on.
> So I'd personally prefer the above mentioned.
> 
> 
> Alex
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html