On 17.07.2013, at 17:59, Bhushan Bharat-R65777 wrote:

>
>> -----Original Message-----
>> From: Alexander Graf [mailto:agraf@xxxxxxx]
>> Sent: Wednesday, July 17, 2013 9:22 PM
>> To: Bhushan Bharat-R65777
>> Cc: Yoder Stuart-B08248; Wood Scott-B07421; kvm@xxxxxxxxxxxxxxx; kvm-ppc@xxxxxxxxxxxxxxx; Gleb Natapov
>> Subject: Re: [PATCH 3/5] booke: define reset and shutdown hcalls
>>
>> On 17.07.2013, at 17:47, Bhushan Bharat-R65777 wrote:
>>
>>>
>>>>>>>> On 17.07.2013, at 13:00, Gleb Natapov wrote:
>>>>>>>>
>>>>>>>>> On Tue, Jul 16, 2013 at 06:04:34PM -0500, Scott Wood wrote:
>>>>>>>>>> On 07/16/2013 01:35:55 AM, Gleb Natapov wrote:
>>>>>>>>>>> On Mon, Jul 15, 2013 at 01:17:33PM -0500, Scott Wood wrote:
>>>>>>>>>>>> On 07/15/2013 06:30:20 AM, Gleb Natapov wrote:
>>>>>>>>>>>>> There is not much sense in sharing hypercalls between architectures. There is zero probability that x86 will implement those, for instance.
>>>>>>>>>>>>
>>>>>>>>>>>> This is similar to the question of whether to keep device API enumerations per-architecture... It costs very little to keep it in a common place, and it's hard to go back in the other direction if we later realize there are things that should be shared.
>>>>>>>>>>>>
>>>>>>>>>>> This is different from the device API, since with the device API all arches have to create/destroy devices, so it makes sense to put device lifecycle management into the common code, and the device API has a single entry point into the code - the device fd ioctl - where it makes sense to handle common tasks, if any, and dispatch the rest to the specific device implementation.
>>>>>>>>>>>
>>>>>>>>>>> This is totally unlike hypercalls, which are by definition very architecture specific (the way they are triggered, the way parameters are passed from guest to host, which hypercalls an arch needs...).
>>>>>>>>>>
>>>>>>>>>> The ABI is architecture specific. The API doesn't need to be, any more than it does with syscalls (I consider the architecture-specific definition of syscall numbers and similar constants in Linux to be unfortunate, especially for tools such as strace or QEMU's linux-user emulation).
>>>>>>>>>>
>>>>>>>>> Unlike syscalls, different arches have very different ideas about which hypercalls they need to implement, so while I can see how a unified syscall space may benefit a (very) small number of tools, I do not see what advantage a unified hypercall space would give us. The disadvantage is one more global namespace to manage.
>>>>>>>>>
>>>>>>>>>>>> Keeping it in a common place also makes it more visible to people looking to add new hcalls, which could cut down on reinventing the wheel.
>>>>>>>>>>>
>>>>>>>>>>> I do not want other arches to start using hypercalls the way powerpc started to use them, as a separate device I/O space, so it is better to hide this as far away from common code as possible :) But on a more serious note, hypercalls should be a last resort, added only when no other possibility exists. People should not look at what hcalls others implemented so they can add them to their favorite arch; they should have a problem at hand that they cannot solve without an hcall, and at that point they will have a pretty good idea of what the hcall should do.
>>>>>>>>>>
>>>>>>>>>> Why are hcalls such a bad thing?
>>>>>>>>>>
>>>>>>>>> Because they are often used to do non-architectural things, making OSes behave differently from how they run on real HW, and real HW is what OSes are designed and tested for. Example: there once was a KVM hypercall (XEN has/had a similar one) to accelerate MMU operations. One thing it allowed was flushing the TLB without doing an IPI if the vcpu is not running. Later an optimization was added to the Linux MMU code that _relies_ on those IPIs for synchronisation. Good thing that at that point those hypercalls were already deprecated on KVM (IIRC XEN was broken for some time in that regard). Which brings me to another point: hypercalls often get obsoleted by code improvements and HW advancements (this happened to the aforementioned MMU hypercalls), but they are hard to deprecate if the hypervisor supports live migration; without live migration it is less of a problem. The next point is that people often try to use them instead of emulating a PV or real device just because they think it is easier, but it is often not so. Example: the pvpanic device was initially proposed as a hypercall, so let's say we had implemented it as such. It would have been KVM specific, the implementation would have touched core guest KVM code, and it would have been Linux guest specific. Instead it was implemented as a platform device with a very small platform driver confined to the drivers/ directory, immediately usable by XEN and QEMU TCG in addition.
>>>>>>>>
>>>>>>>> This is actually a very good point. How do we support reboot and shutdown for TCG guests? We surely don't want to expose TCG as a KVM hypervisor.
>>>>>>>
>>>>>>> Hmm... so are you proposing that we abandon the current approach and switch to a device-based mechanism for reboot/shutdown?
>>>>>>
>>>>>> Reading Gleb's email it sounds like the more future-proof approach, yes. I'm not quite sure yet where we should plug this in though.
>>>>>
>>>>> What do you mean... where would the paravirt device go in the physical address map?
>>>>
>>>> Right. Either we
>>>>
>>>> - let the guest decide (PCI)
>>>> - let QEMU decide, but potentially break the SoC layout (SysBus)
>>>> - let QEMU decide, but only for the virt machine so that we don't break anyone (PlatBus)
>>>
>>> Can you please elaborate on the above two points?
>>
>> If we emulate an MPC8544DS, we need to emulate an MPC8544DS. Any time we diverge from the layout of the original chip, things can break.
>>
>> However, for our PV machine (-M ppce500 / e500plat) we don't care about real hardware layouts. We simply emulate a machine that is 100% described through the device tree. So guests that can't deal with the machine looking different from real hardware don't really matter anyway, since they'd already be broken.
>>
>
> Ah, so we can choose any address range in the CCSR space of a PV machine (-M ppce500 / e500plat).

No, we don't put it in CCSR space. It'd just be orthogonal to CCSR.

> What about the MPC8544DS machine?

I guess we'll have to live with GUTS there.

> So what is the preferred way, a virtio reset/shutdown device or the above mentioned?

A virtio device would clutter our PCI space, which we're already pretty tight on. So I'd personally prefer the above mentioned.
Alex
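For illustration, here is a minimal sketch of the guest side of the platform-device approach Gleb describes for pvpanic, applied to reset/shutdown: a tiny Linux platform driver that binds to a paravirt control device advertised in the e500plat device tree and writes a command register to reboot or power off. The compatible string, register layout, and command values are assumptions made up for this sketch; they are not part of this patch series or of any existing QEMU binding.

/*
 * Hypothetical sketch only: "qemu,e500plat-reset", the register layout
 * and the command values are assumptions for illustration.  The point is
 * that the guest side can be a tiny platform driver, like pvpanic,
 * instead of a KVM-specific hypercall.
 */
#include <linux/err.h>
#include <linux/io.h>
#include <linux/module.h>
#include <linux/of.h>
#include <linux/platform_device.h>
#include <linux/pm.h>
#include <asm/machdep.h>

#define PV_CMD_RESET		0x1	/* assumed command encoding */
#define PV_CMD_SHUTDOWN		0x2

static void __iomem *pv_ctl_base;

static void pv_restart(char *cmd)
{
	/* The hypervisor resets the machine; we are not expected to return. */
	writel(PV_CMD_RESET, pv_ctl_base);
	for (;;)
		;
}

static void pv_power_off(void)
{
	writel(PV_CMD_SHUTDOWN, pv_ctl_base);
	for (;;)
		;
}

static int pv_ctl_probe(struct platform_device *pdev)
{
	struct resource *res;

	/* The MMIO register comes from the device tree node's "reg" property. */
	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
	pv_ctl_base = devm_ioremap_resource(&pdev->dev, res);
	if (IS_ERR(pv_ctl_base))
		return PTR_ERR(pv_ctl_base);

	/* Hook the generic powerpc reset/power-off paths (simplified). */
	ppc_md.restart = pv_restart;
	pm_power_off = pv_power_off;
	return 0;
}

static const struct of_device_id pv_ctl_match[] = {
	{ .compatible = "qemu,e500plat-reset" },	/* hypothetical binding */
	{ }
};
MODULE_DEVICE_TABLE(of, pv_ctl_match);

static struct platform_driver pv_ctl_driver = {
	.driver = {
		.name		= "e500plat-reset",
		.of_match_table	= pv_ctl_match,
	},
	.probe = pv_ctl_probe,
};
module_platform_driver(pv_ctl_driver);

MODULE_LICENSE("GPL");

On the QEMU side the same device would presumably be a small SysBus/PlatBus MMIO stub that calls qemu_system_reset_request() or qemu_system_shutdown_request() when those values are written, which is what makes it work identically for KVM and TCG guests.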