On Mon, Jan 06, 2014 at 05:59:03PM -0800, Victor Kamensky wrote: > On 6 January 2014 14:56, Christoffer Dall <christoffer.dall@xxxxxxxxxx> wrote: > > On Mon, Jan 06, 2014 at 10:31:42PM +0000, Peter Maydell wrote: > >> On 6 January 2014 18:20, Marc Zyngier <marc.zyngier@xxxxxxx> wrote: > >> > No matter how data is stored in memory (BE, LE, or > >> > even PDP endianness), CPU registers always have a consistent > >> > representation. They are immune to CPU endianness change, and storing > >> > to/reading from memory won't change the value, as long as you use the > >> > same endianness for writing/reading. > >> > >> Ah, endianness. This always confuses me, but I hope the following > >> is correct... (in all the following when I say BE I mean BE8, not BE32, > >> since BE32 and virtualization never occur in the same CPU). > >> > >> Certainly registers don't have endianness, but the entire point > >> of the CPSR.E bit is exactly that it changes the value as it is > >> stored to / read from memory, isn't it? -- that's where and when the > >> byte-lane flipping happens. > >> > >> Where this impacts the hypervisor is that instead of actually sending > >> the data out to the bus via the byte-swapping h/w, we've trapped instead. > >> The hypervisor reads the original data directly from the guest CPU > >> registers, and so it's the hypervisor and userspace support code that > >> between them have to emulate the equivalent of the byte lane > >> swapping h/w. You could argue that it shouldn't be the kernel's > >> job, but since the kernel has to do it for the devices it emulates > >> internally, I'm not sure that makes much sense. > > > > As far as I understand, this is exactly what vcpu_data_guest_to_host and > > vcpu_data_host_to_guest do; emulate the byte lane swapping. > > > > The problem is that it only works on a little-endian host with the > > current code, because be16_to_cpu (for example), actually perform a > > byteswap, which is what needs to be emulated. On a big-endian host, we > > do nothing, so we end up giving a byteswapped value to the emulated > > device. > > Yes, that was my point on the thread: vcpu_data_guest_to_host and > vcpu_data_host_to_guest functions for any given host endianity should > give opposite endian results depending on CPSR E bit value. And > currently it is not happening in BE host case. It seems that Peter and > you agree with that and I gave example in another email with > dynamically switching E bit illustrating this problem for BE host. > > > I think a cleaner fix than this patch is to just change the > > be16_to_cpu() to a __swab16() instead, which clearly indicates that > > 'here is the byte lane swapping'. > > Yes, that may work, but it is a bit orthogonal issue. Why? I don't think it is, I think it's addressing exactly the point at hand. > And I don't think > it is better. For this to work one need to change canonical endianity on > one of the sides around vcpu_data_guest_to_host and > vcpu_data_host_to_guest functions. You have to simply clearly define which format you want mmio.data to be in. This is a user space interface across multiple architectures and therefore something you have to consider carefully and you're limited in choices to something that works with existing user space code. > > Changing it on side that faces hypervisor (code that handles guest spilled > CPU register set) does not make sense at all - if we will keep guest CPU > register set in memory in LE form and hypervisor runs in BE (BE host), > code that spills registers would need to do constant byteswaps. Also any > access by host kernel and hypervisor (all running in BE) would need to do > byteswaps while working with guest saved registers. > > Changing canonical form of data on side that faces emulator and mmio > part of kvm_run does not make sense either. kvm_run mmio.data field is > bytes array, when it comes to host kernel from emulator, it already contains > device memory in correct endian order that corresponds to endianity of > emulated device. For example for LE device word read access, after call is > emulated, mmio.data will contain mmio.data[0], mmio.data[1], mmio.data[2] > mmio.data[3] values in LE order (mmio.data[3] is MSB). Now look at > mmio_read_buf function introduced by Marc's 6d89d2d9 commit, this function > will byte copy this mmio.data buffer into integer according to ongoing mmio > access size. Note in BE host case such integer, in 'data' variable of > kvm_handle_mmio_return function, will have byteswapped value. Now when it will > be passed into vcpu_data_host_to_guest function, and it emulates read access > of guest with E bit set, and if we follow your suggestion, it will be > byteswapped. > I.e 'data' integer will contain non byteswapped value of LE device. It will be > further stored into some vcpu_reg register, still in native format (BE > store), and > further restored into guest CPU register, still non byteswapped (BE hypervisor). > And that is not what BE client reading word of LE device expects - BE client > knowing that it reads LE device with E bit set, it will issue additional rev > instruction to get device memory as integer. If we really want to follow your > suggestion, one may introduce compensatory byteswaps in mmio_read_buf > and mmio_write_buf functions in case of BE host, rather then just do > memcpy ... but I am not sure what it will buy us - in BE case it will swap data > twice. > > Note in above description by "canonical" I mean some form of data regardless > of current access CPSR E value. But it may differ depending on host endianess. > There's a lot of text to digest here, talking about a canonical form here doesn't help; just define the layout of the destination byte array. I also got completely lost in what you're referring to when you talk about 'sides' here. The thing we must decide is how the data is stored in kvm_exit_mmio.data. See Peter's recent thread "KVM and variable-endianness guest CPUs". Once we agree on this, the rest should be easy (assuming we use the same structure for the data in the kernel's internal kvm_exit_mmio declared on the stack in io_mem_abort()). The format you suggest requires any consumer of this data to consider the host endianness, which I don't think makes anything more clear (see my comment on the vgic patch). The in-kernel interface between the io_mem_abort() code and any in-kernel emulated device must do exactly the same as the interface between KVM and QEMU must do for KVM_EXIT_MMIO. -- Christoffer _______________________________________________ kvmarm mailing list kvmarm@xxxxxxxxxxxxxxxxxxxxx https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm