On 04.10.2013, at 13:53, Paul Mackerras wrote:

> On Thu, Oct 03, 2013 at 04:29:52PM +0200, Greg Kurz wrote:
>> Hi,
>>
>> There has been some work on the topic lately but no agreement has
>> been reached yet. I want to consolidate the facts in a single thread of
>> mail and re-start the discussion. Please find below a recap of what we
>> have as of today:
>>
>> From a virtio POV, guest endianness is reflected by the endianness of
>> the interrupt vectors (ILE bit in the LPCR register). The guest kernel
>> relies on the H_SET_MODE_RESOURCE_LE hcall to set this bit, early in the
>> boot process.
>>
>> Rusty sent a patchset on qemu-devel@ to provide the necessary bits to
>> perform byteswap in QEMU:
>>
>> http://patchwork.ozlabs.org/patch/266451/
>> http://patchwork.ozlabs.org/patch/266452/
>> http://patchwork.ozlabs.org/patch/266450/
>> (plus other enablement patches for virtio drivers, not essential for
>> the discussion).
>>
>> In non-KVM mode, QEMU implements the H_SET_MODE_RESOURCE_LE and updates
>> its internal value for LPCR when the guest requests it. Rusty's patchset
>> works out-of-the-box in this mode: I could successfully set up and use a
>> 9p share over virtio transport (broader virtio testing still to be done
>> though).
>>
>> When using KVM, the story is different: QEMU is not on this
>> endianness change flow anymore, provided KVM has the following
>> patch from Anton:
>>
>> http://patchwork.ozlabs.org/patch/277079/
>>
>> There are *at least* two approaches to bring back endianness knowledge
>> to QEMU: polling (1) and propagation (2).
>>
>> (1) QEMU must retrieve LPCR from the kernel using the following API:
>>
>> http://patchwork.ozlabs.org/patch/273029/
>>
>> (2) KVM can resume execution to the host and thus propagate
>> H_SET_MODE_RESOURCE_LE to QEMU. Laurent came up with a patch on
>> linuxppc-dev@ to do this:
>>
>> http://patchwork.ozlabs.org/patch/278590/
>>
>> I would say (1) is a standard and sane way of addressing the issue:
>> since the LPCR register value is held by KVM, it makes sense to
>> introduce an API to get/set it. Then, it is up to QEMU to use this API.
>>
>> We can dumbly do the polling in all the places where byteswapping
>> matters: it is clearly suboptimal, especially since the LPCR_ILE bit
>> doesn't change so often. Rusty suggested we can retrieve it at virtio
>> device reset time and cache it, since an endianness change after the
>> devices have started to be used is nonsensical.
>>
>> I have searched for an appropriate place to add the polling and I must
>> admit I did not find any... I am no QEMU expert but I suspect we would
>> need some kind of arch specific hook to be called from the virtio code
>> to do this... :-\ I hope I am wrong, please correct me if so.
>>
>> On the other hand, (2) looks a bit hacky: KVM usually returns to the
>> host when it cannot fully handle the hcall. Propagating may look like
>> a useless path to follow from a KVM POV. From a QEMU POV, things are
>> different: propagation will trigger the fallback code in QEMU, already
>> working in non-KVM mode. Nothing more to be done.
>
> I don't mind particularly whether H_SET_MODE for the endianness
> setting gets handled in the kernel or in QEMU, but I don't think it
> should be handled in both. If you want QEMU to know about the
> endianness setting immediately, make the kernel version do nothing and
> get QEMU to handle it -- which if KVM is enabled will mean iterating
> over all vcpus and getting them all to send the new LPCR setting to
> the kernel via the SET_ONE_REG ioctl.
>
> However, I want the setting of breakpoint registers (CIABR and DAWR/X)
> via H_SET_MODE to happen in the kernel, preferably in real mode, since
> that can happen on context switch and thus needs to be quick.

I don't want to see a single hypercall be split across the QEMU/KVM barrier. So if there's a reasonable incentive to handle H_SET_MODE in KVM, we should handle all of it in KVM.

Alex
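
For reference, the "retrieve it at virtio device reset time and cache it" idea from approach (1) in the quoted text could look roughly like the sketch below. This is only an illustration, not code from any of the patches linked above: `virtio_endian_cache`, `virtio_endian_reset()`, `virtio_ld16()`, the `lpcr_reader` callback and `fake_read_lpcr()` are all hypothetical names, and the `LPCR_ILE` mask value is an assumption that should be checked against the kernel headers. In a real setup the callback would be whatever arch-specific hook fetches LPCR from KVM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed mask for the ILE bit in LPCR; verify against the kernel headers. */
#define LPCR_ILE 0x02000000ULL

/* Hypothetical arch hook: returns the current LPCR value for the guest. */
typedef uint64_t (*lpcr_reader)(void *opaque);

/* Hypothetical per-device cache of the guest endianness. */
struct virtio_endian_cache {
    bool valid;        /* set once the cache has been filled at reset */
    bool guest_is_le;  /* true if LPCR[ILE] was set at reset time */
};

/* Poll LPCR once, at device reset, and remember the result. */
static void virtio_endian_reset(struct virtio_endian_cache *c,
                                lpcr_reader read_lpcr, void *opaque)
{
    c->guest_is_le = (read_lpcr(opaque) & LPCR_ILE) != 0;
    c->valid = true;
}

/*
 * Load a 16-bit guest value using the cached endianness. Assembling the
 * value byte by byte keeps this independent of the host byte order.
 */
static uint16_t virtio_ld16(const struct virtio_endian_cache *c,
                            const uint8_t *p)
{
    if (c->guest_is_le) {
        return (uint16_t)(p[0] | (p[1] << 8));
    }
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Stand-in reader for experimenting without KVM: opaque points to an LPCR value. */
static uint64_t fake_read_lpcr(void *opaque)
{
    return *(uint64_t *)opaque;
}
```

The point of the sketch is that the data path never queries LPCR: a change made by the guest after reset is deliberately ignored until the next reset, which matches the argument that switching endianness while devices are in use is nonsensical.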