On 17.07.2013, at 18:21, Bhushan Bharat-R65777 wrote:

>>>>>>>>>> On 17.07.2013, at 13:00, Gleb Natapov wrote:
>>>>>>>>>>
>>>>>>>>>>> On Tue, Jul 16, 2013 at 06:04:34PM -0500, Scott Wood wrote:
>>>>>>>>>>>> On 07/16/2013 01:35:55 AM, Gleb Natapov wrote:
>>>>>>>>>>>>> On Mon, Jul 15, 2013 at 01:17:33PM -0500, Scott Wood wrote:
>>>>>>>>>>>>>> On 07/15/2013 06:30:20 AM, Gleb Natapov wrote:
>>>>>>>>>>>>>>> There is not much sense in sharing hypercalls between architectures. There is zero probability x86 will implement those, for instance.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is similar to the question of whether to keep device API enumerations per-architecture... It costs very little to keep it in a common place, and it's hard to go back in the other direction if we later realize there are things that should be shared.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is different from the device API, since with the device API all arches have to create/destroy devices, so it makes sense to put device lifecycle management into the common code; the device API also has a single entry point into the code - the device fd ioctl - where it makes sense to handle common tasks, if any, and dispatch the rest to the specific device implementation.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is totally unlike hypercalls, which are, by definition, very architecture specific (the way they are triggered, the way parameters are passed from guest to host, which hypercalls an arch needs...).
>>>>>>>>>>>>
>>>>>>>>>>>> The ABI is architecture specific. The API doesn't need to be, any more than it does with syscalls (I consider the architecture-specific definition of syscall numbers and similar constants in Linux to be unfortunate, especially for tools such as strace or QEMU's linux-user emulation).
>>>>>>>>>>>
>>>>>>>>>>> Unlike syscalls, different arches have very different ideas about which hypercalls they need to implement, so while I can see how a unified syscall space may benefit a (very) small number of tools, I do not see what advantage a unified hypercall space would give us. The disadvantage is one more global namespace to manage.
>>>>>>>>>>>
>>>>>>>>>>>>>> Keeping it in a common place also makes it more visible to people looking to add new hcalls, which could cut down on reinventing the wheel.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I do not want other arches to start using hypercalls the way powerpc started to use them - as a separate device I/O space - so it is better to hide this as far away from common code as possible :) But on a more serious note, hypercalls should be a last resort, added only when no other possibility exists. People should not browse the hcalls other arches implemented so they can add them to their favorite arch; they should have a problem at hand that they cannot solve without an hcall, and at that point they will have a pretty good idea what the hcall should do.
>>>>>>>>>>>>
>>>>>>>>>>>> Why are hcalls such a bad thing?
>>>>>>>>>>>>
>>>>>>>>>>> Because they are often used to do non-architectural things, making OSes behave differently from how they run on real HW, and real HW is what OSes are designed and tested for.
>>>>>>>>>>> Example: there once was a KVM hypercall (Xen has/had a similar one) to accelerate MMU operations. One thing it allowed was flushing the TLB without doing an IPI if the vcpu is not running. Later an optimization was added to the Linux MMU code that _relies_ on those IPIs for synchronisation. Good thing that at that point those hypercalls were already deprecated on KVM (IIRC Xen was broken for some time in that regard). Which brings me to another point: such hypercalls often get obsoleted by code improvements and HW advancement (as happened to the aforementioned MMU hypercalls), but they are hard to deprecate if the hypervisor supports live migration; without live migration it is less of a problem.
>>>>>>>>>>>
>>>>>>>>>>> The next point is that people often try to use them instead of emulating a PV or real device just because they think it is easier, but it often is not so. Example: the pvpanic device was initially proposed as a hypercall, so let's say we had implemented it as such. It would have been KVM specific, the implementation would have touched core guest KVM code, and it would have been Linux guest specific. Instead it was implemented as a platform device with a very small platform driver confined to the drivers/ directory, immediately usable by Xen and QEMU TCG in addition.
>>>>>>>>>>
>>>>>>>>>> This is actually a very good point. How do we support reboot and shutdown for TCG guests? We surely don't want to expose TCG as a KVM hypervisor.
>>>>>>>>>
>>>>>>>>> Hmm...so are you proposing that we abandon the current approach and switch to a device-based mechanism for reboot/shutdown?
>>>>>>>>
>>>>>>>> Reading Gleb's email it sounds like the more future-proof approach, yes. I'm not quite sure yet where we should plug this in, though.
>>>>>>>
>>>>>>> What do you mean...where the paravirt device would go in the physical address map?
>>>>>>
>>>>>> Right. Either we
>>>>>>
>>>>>> - let the guest decide (PCI)
>>>>>> - let QEMU decide, but potentially break the SoC layout (SysBus)
>>>>>> - let QEMU decide, but only for the virt machine so that we don't break anyone (PlatBus)
>>>>>
>>>>> Can you please elaborate on the above two points?
>>>>
>>>> If we emulate an MPC8544DS, we need to emulate an MPC8544DS. Any time we diverge from the layout of the original chip, things can break.
>>>>
>>>> However, for our PV machine (-M ppce500 / e500plat) we don't care about real hardware layouts. We simply emulate a machine that is 100% described through the device tree. So guests that can't deal with the machine looking different from real hardware don't really matter anyway, since they'd already be broken.
>>>>
>>> Ah, so we can choose any address range in CCSR space of a PV machine (-M ppce500 / e500plat).
>>
>> No, we don't put it in CCSR space. It'd just be orthogonal to CCSR.
>
> All devices are represented in the guest device tree, so how will we represent this device in the guest device tree?

Not inside of the CCSR node :).


Alex
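
For concreteness, the "single entry point" Gleb mentions for the device API is the KVM_CREATE_DEVICE ioctl on the VM fd, which hands back a per-device fd for further attribute ioctls. A minimal userspace sketch follows; the device type used (the in-kernel Freescale MPIC) is just an example, and vm_fd is assumed to come from an earlier KVM_CREATE_VM:

```c
/*
 * Sketch of creating an in-kernel device through the common KVM device API.
 * The lifecycle and attribute plumbing is arch-neutral; only the device
 * implementation behind the returned fd is arch specific.
 */
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int create_in_kernel_device(int vm_fd)
{
	struct kvm_create_device cd = {
		.type  = KVM_DEV_TYPE_FSL_MPIC_20,	/* example device type */
		.flags = 0,
	};

	if (ioctl(vm_fd, KVM_CREATE_DEVICE, &cd) < 0) {
		perror("KVM_CREATE_DEVICE");
		return -1;
	}

	/*
	 * cd.fd is filled in by the kernel and is the handle for further
	 * ioctls such as KVM_SET_DEVICE_ATTR / KVM_GET_DEVICE_ATTR.
	 */
	return cd.fd;
}
```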
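
And to illustrate the pvpanic point: the guest side really is just a tiny platform driver that hangs a panic notifier off one device register. A rough sketch, modelled loosely on drivers/misc/pvpanic.c and targeting a current kernel; the compatible string, the single event register and the PVPANIC_PANICKED bit here are illustrative assumptions rather than the exact upstream binding, and teardown paths are omitted for brevity:

```c
/* Rough sketch of a pvpanic-style guest platform driver. */
#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/panic_notifier.h>
#include <linux/notifier.h>
#include <linux/of.h>
#include <linux/err.h>
#include <linux/io.h>

#define PVPANIC_PANICKED	(1 << 0)	/* "guest panicked" event bit (assumed) */

static void __iomem *event_reg;	/* event register exposed by QEMU/Xen/TCG */

static int pvpanic_sketch_panic(struct notifier_block *nb,
				unsigned long code, void *unused)
{
	/* Tell the hypervisor we panicked by poking the device register. */
	iowrite8(PVPANIC_PANICKED, event_reg);
	return NOTIFY_DONE;
}

static struct notifier_block pvpanic_sketch_nb = {
	.notifier_call = pvpanic_sketch_panic,
};

static int pvpanic_sketch_probe(struct platform_device *pdev)
{
	/* Where the register lives comes from the device tree, not from KVM. */
	event_reg = devm_platform_ioremap_resource(pdev, 0);
	if (IS_ERR(event_reg))
		return PTR_ERR(event_reg);

	atomic_notifier_chain_register(&panic_notifier_list, &pvpanic_sketch_nb);
	return 0;
}

static const struct of_device_id pvpanic_sketch_of_match[] = {
	{ .compatible = "qemu,pvpanic-sketch" },	/* assumed binding name */
	{ }
};

static struct platform_driver pvpanic_sketch_driver = {
	.driver = {
		.name		= "pvpanic-sketch",
		.of_match_table	= pvpanic_sketch_of_match,
	},
	.probe	= pvpanic_sketch_probe,
};
module_platform_driver(pvpanic_sketch_driver);

MODULE_LICENSE("GPL");
```

Nothing in there knows or cares whether the hypervisor underneath is KVM, Xen or TCG, which is exactly the argument for a device over an hcall.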