> >>>>>>>> On 17.07.2013, at 13:00, Gleb Natapov wrote:
> >>>>>>>>
> >>>>>>>>> On Tue, Jul 16, 2013 at 06:04:34PM -0500, Scott Wood wrote:
> >>>>>>>>>> On 07/16/2013 01:35:55 AM, Gleb Natapov wrote:
> >>>>>>>>>>> On Mon, Jul 15, 2013 at 01:17:33PM -0500, Scott Wood wrote:
> >>>>>>>>>>>> On 07/15/2013 06:30:20 AM, Gleb Natapov wrote:
> >>>>>>>>>>>>> There is not much sense in sharing hypercalls between architectures. There is zero probability that x86 will implement those, for instance.
> >>>>>>>>>>>>
> >>>>>>>>>>>> This is similar to the question of whether to keep device API enumerations per-architecture... It costs very little to keep it in a common place, and it's hard to go back in the other direction if we later realize there are things that should be shared.
> >>>>>>>>>>>
> >>>>>>>>>>> This is different from the device API, since with the device API all arches have to create/destroy devices, so it makes sense to put device lifecycle management into the common code. The device API also has a single entry point into the code - the device fd ioctl - where it makes sense to handle common tasks, if any, and dispatch the rest to the specific device implementation.
> >>>>>>>>>>>
> >>>>>>>>>>> This is totally unlike hypercalls, which are, by definition, very architecture specific (the way they are triggered, the way parameters are passed from guest to host, which hypercalls an arch needs...).
> >>>>>>>>>>
> >>>>>>>>>> The ABI is architecture specific. The API doesn't need to be, any more than it does with syscalls (I consider the architecture-specific definition of syscall numbers and similar constants in Linux to be unfortunate, especially for tools such as strace or QEMU's linux-user emulation).
> >>>>>>>>>
> >>>>>>>>> Unlike syscalls, different arches have very different ideas about which hypercalls they need to implement, so while I can see how a unified syscall space may benefit a (very) small number of tools, I do not see what advantage it would give us. The disadvantage is one more global name space to manage.
> >>>>>>>>>
> >>>>>>>>>>>> Keeping it in a common place also makes it more visible to people looking to add new hcalls, which could cut down on reinventing the wheel.
> >>>>>>>>>>>
> >>>>>>>>>>> I do not want other arches to start using hypercalls the way powerpc started to use them - as a separate device I/O space - so it is better to hide this as far away from common code as possible :) But on a more serious note, hypercalls should be a last resort, added only when no other possibility exists. People should not look at which hcalls others have implemented so they can add them to their favorite arch; they should have a problem at hand that they cannot solve without an hcall, and at that point they will have a pretty good idea of what the hcall should do.
> >>>>>>>>>>
> >>>>>>>>>> Why are hcalls such a bad thing?
> >>>>>>>>>>
> >>>>>>>>> Because they are often used to do non-architectural things, making OSes behave differently from how they run on real HW, and real HW is what OSes are designed and tested for.
> >>>>>>>>> Example: there once was a KVM hypercall (Xen has/had a similar one) to accelerate MMU operation. One thing it allowed was flushing the TLB without doing an IPI if the vcpu is not running. Later, an optimization was added to the Linux MMU code that _relies_ on those IPIs for synchronisation. It is a good thing that by that point those hypercalls had already been deprecated in KVM (IIRC Xen was broken for some time in that regard). Which brings me to another point: hypercalls often get obsoleted by code improvements and HW advancements (as happened to the aforementioned MMU hypercalls), but they are hard to deprecate if the hypervisor supports live migration; without live migration it is less of a problem.
> >>>>>>>>>
> >>>>>>>>> The next point is that people often try to use them instead of emulating a PV or real device just because they think it is easier, but it is often not so. Example: the pvpanic device was initially proposed as a hypercall, so let's say we had implemented it as such. It would have been KVM specific, the implementation would have touched core guest KVM code, and it would have been Linux guest specific. Instead it was implemented as a platform device with a very small platform driver confined to the drivers/ directory, immediately usable by Xen and QEMU TCG as well.
> >>>>>>>>
> >>>>>>>> This is actually a very good point. How do we support reboot and shutdown for TCG guests? We surely don't want to expose TCG as a KVM hypervisor.
> >>>>>>>
> >>>>>>> Hmm... so are you proposing that we abandon the current approach, and switch to a device-based mechanism for reboot/shutdown?
> >>>>>>
> >>>>>> Reading Gleb's email it sounds like the more future proof approach, yes. I'm not quite sure yet where we should plug this in, though.
> >>>>>
> >>>>> What do you mean... where the paravirt device would go in the physical address map?
> >>>>
> >>>> Right. Either we
> >>>>
> >>>> - let the guest decide (PCI)
> >>>> - let QEMU decide, but potentially break the SoC layout (SysBus)
> >>>> - let QEMU decide, but only for the virt machine so that we don't break anyone (PlatBus)
> >>>
> >>> Can you please elaborate on the above two points?
> >>
> >> If we emulate an MPC8544DS, we need to emulate an MPC8544DS. Any time we diverge from the layout of the original chip, things can break.
> >>
> >> However, for our PV machine (-M ppce500 / e500plat) we don't care about real hardware layouts. We simply emulate a machine that is 100% described through the device tree. So guests that can't deal with the machine looking different from real hardware don't really matter anyway, since they'd already be broken.
> >>
> >
> > Ah, so we can choose any address range in the CCSR space of a PV machine (-M ppce500 / e500plat).
>
> No, we don't put it in CCSR space. It'd just be orthogonal to CCSR.

All devices are represented in the guest device tree, so how will we represent this device in the guest device tree?
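To make the question concrete, below is only a rough sketch of what I imagine the guest side could look like, along the lines of the pvpanic driver: a tiny platform driver keyed off a device tree node. The "qemu,e500-reset" compatible string and the register layout are invented here for illustration, this is not an actual binding, and the code is written against a current kernel (the exact reboot/power-off hooks on powerpc may differ):

/*
 * Rough sketch only -- not an actual binding or driver.  The
 * "qemu,e500-reset" compatible string and the register values
 * below are invented for illustration.
 */
#include <linux/io.h>
#include <linux/module.h>
#include <linux/of.h>
#include <linux/platform_device.h>
#include <linux/pm.h>
#include <linux/reboot.h>

#define PV_RESET_CMD_REBOOT	0x1	/* invented register layout */
#define PV_RESET_CMD_SHUTDOWN	0x2

static void __iomem *pv_reset_base;

/* Reboot: poke the (hypothetical) command register, the host does the rest. */
static int pv_reset_restart(struct notifier_block *nb, unsigned long action,
			    void *data)
{
	iowrite32(PV_RESET_CMD_REBOOT, pv_reset_base);
	return NOTIFY_DONE;
}

static struct notifier_block pv_reset_restart_nb = {
	.notifier_call	= pv_reset_restart,
	.priority	= 128,
};

static void pv_reset_power_off(void)
{
	iowrite32(PV_RESET_CMD_SHUTDOWN, pv_reset_base);
}

static int pv_reset_probe(struct platform_device *pdev)
{
	struct resource *res;

	/* Single MMIO command register described by the device tree node. */
	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
	pv_reset_base = devm_ioremap_resource(&pdev->dev, res);
	if (IS_ERR(pv_reset_base))
		return PTR_ERR(pv_reset_base);

	pm_power_off = pv_reset_power_off;
	return register_restart_handler(&pv_reset_restart_nb);
}

static const struct of_device_id pv_reset_of_match[] = {
	{ .compatible = "qemu,e500-reset" },	/* invented name */
	{ }
};
MODULE_DEVICE_TABLE(of, pv_reset_of_match);

static struct platform_driver pv_reset_driver = {
	.probe	= pv_reset_probe,
	.driver	= {
		.name		= "pv-reset",
		.of_match_table	= pv_reset_of_match,
	},
};
module_platform_driver(pv_reset_driver);

MODULE_LICENSE("GPL");

Nothing in such a driver touches core guest KVM code, so the same device tree node could be backed by QEMU TCG or another hypervisor just as well, which is exactly the pvpanic argument above.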
-Bharat

> > What about the MPC8544DS machine?
>
> I guess we'll have to live with GUTS there.
>
> > So what is the preferred way, a virtio reset/shutdown device or the above mentioned?
>
> A virtio device would clutter our PCI space, which we're already pretty tight on. So I'd personally prefer the above mentioned.
>
>
> Alex
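For the QEMU side of -M ppce500 / e500plat, I picture the sysbus device roughly like the below. Again, this is only an untested sketch against a recent QEMU tree; the "pv-reset" type name and register values mirror the invented layout in the guest sketch above, and header paths and API details differ between QEMU versions:

/* Rough sketch of a QEMU sysbus reset/shutdown device -- invented names. */
#include "qemu/osdep.h"
#include "hw/sysbus.h"
#include "sysemu/runstate.h"

#define TYPE_PV_RESET "pv-reset"
OBJECT_DECLARE_SIMPLE_TYPE(PvResetState, PV_RESET)

struct PvResetState {
    SysBusDevice parent_obj;
    MemoryRegion iomem;
};

static uint64_t pv_reset_read(void *opaque, hwaddr addr, unsigned size)
{
    return 0;
}

static void pv_reset_write(void *opaque, hwaddr addr, uint64_t val,
                           unsigned size)
{
    switch (val) {
    case 0x1:   /* reboot, same invented values as the guest sketch */
        qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
        break;
    case 0x2:   /* shutdown */
        qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
        break;
    }
}

static const MemoryRegionOps pv_reset_ops = {
    .read = pv_reset_read,
    .write = pv_reset_write,
    .endianness = DEVICE_BIG_ENDIAN,
};

static void pv_reset_init(Object *obj)
{
    PvResetState *s = PV_RESET(obj);

    /*
     * One 4-byte command register.  The machine code would map this
     * wherever it likes and describe it in the guest device tree.
     */
    memory_region_init_io(&s->iomem, obj, &pv_reset_ops, s,
                          TYPE_PV_RESET, 4);
    sysbus_init_mmio(SYS_BUS_DEVICE(obj), &s->iomem);
}

static const TypeInfo pv_reset_info = {
    .name          = TYPE_PV_RESET,
    .parent        = TYPE_SYS_BUS_DEVICE,
    .instance_size = sizeof(PvResetState),
    .instance_init = pv_reset_init,
};

static void pv_reset_register_types(void)
{
    type_register_static(&pv_reset_info);
}
type_init(pv_reset_register_types)

Since the PV machine is 100% described through the device tree, the e500plat machine code would just map one instance at an address of its choosing and emit the matching node/compatible into the guest device tree, the same way its other devices are described.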