Hi Alex, On Wed, Jan 22, 2014 at 12:11 PM, Alexander Graf <agraf@xxxxxxx> wrote: > > >> Am 22.01.2014 um 07:31 schrieb Anup Patel <anup@xxxxxxxxxxxxxx>: >> >> On Wed, Jan 22, 2014 at 11:09 AM, Victor Kamensky >> <victor.kamensky@xxxxxxxxxx> wrote: >>> Hi Guys, >>> >>> Christoffer and I had a bit heated chat :) on this >>> subject last night. Christoffer, really appreciate >>> your time! We did not really reach agreement >>> during the chat and Christoffer asked me to follow >>> up on this thread. >>> Here it goes. Sorry, it is very long email. >>> >>> I don't believe we can assign any endianity to >>> mmio.data[] byte array. I believe mmio.data[] and >>> mmio.len acts just memcpy and that is all. As >>> memcpy does not imply any endianity of underlying >>> data mmio.data[] should not either. >>> >>> Here is my definition: >>> >>> mmio.data[] is array of bytes that contains memory >>> bytes in such form, for read case, that if those >>> bytes are placed in guest memory and guest executes >>> the same read access instruction with address to this >>> memory, result would be the same as real h/w device >>> memory access. Rest of KVM host and hypervisor >>> part of code should really take care of mmio.data[] >>> memory so it will be delivered to vcpu registers and >>> restored by hypervisor part in such way that guest CPU >>> register value is the same as it would be for real >>> non-emulated h/w read access (that is emulation part). >>> The same goes for write access, if guest writes into >>> memory and those bytes are just copied to emulated >>> h/w register it would have the same effect as real >>> mapped h/w register write. >>> >>> In shorter form, i.e for len=4 access: endianity of integer >>> at &mmio.data[0] address should match endianity >>> of emulated h/w device behind phys_addr address, >>> regardless what is endianity of emulator, KVM host, >>> hypervisor, and guest >>> >>> Examples that illustrate my definition >>> -------------------------------------- >>> >>> 1) LE guest (E bit is off in ARM speak) reads integer >>> (4 bytes) from mapped h/w LE device register - >>> mmio.data[3] contains MSB, mmio.data[0] contains LSB. >>> >>> 2) BE guest (E bit is on in ARM speak) reads integer >>> from mapped h/w LE device register - mmio.data[3] >>> contains MSB, mmio.data[0] contains LSB. Note that >>> if &mmio.data[0] memory would be placed in guest >>> address space and instruction restarted with new >>> address, then it would meet BE guest expectations >>> - the guest knows that it reads LE h/w so it will byteswap >>> register before processing it further. This is BE guest ARM >>> case (regardless of what KVM host endianity is). >>> >>> 3) BE guest reads integer from mapped h/w BE device >>> register - mmio.data[0] contains MSB, mmio.data[3] >>> contains LSB. Note that if &mmio.data[0] memory would >>> be placed in guest address space and instruction >>> restarted with new address, then it would meet BE >>> guest expectation - the guest knows that it reads >>> BE h/w so it will proceed further without any other >>> work. I guess, it is BE ppc case. >>> >>> >>> Arguments in favor of memcpy semantics of mmio.data[] >>> ------------------------------------------------------ >>> >>> x) What are possible values of 'len'? Previous discussions >>> imply that is always powers of 2. Why is that? Maybe >>> there will be CPU that would need to do 5 bytes mmio >>> access, or 6 bytes. How do you assign endianity to >>> such case? 'len' 5 or 6, or any works fine with >>> memcpy semantics. I admit it is hypothetical case, but >>> IMHO it tests how clean ABI definition is. >>> >>> x) Byte array does not have endianity because it >>> does not have any structure. If one would want to >>> imply structure why mmio is not defined in such way >>> so structure reflected in mmio definition? >>> Something like: >>> >>> >>> /* KVM_EXIT_MMIO */ >>> struct { >>> __u64 phys_addr; >>> union { >>> __u8 byte; >>> __u16 hword; >>> __u32 word; >>> __u64 dword; >>> } data; >>> __u32 len; >>> __u8 is_write; >>> } mmio; >>> >>> where len is really serves as union discriminator and >>> only allowed len values are 1, 2, 4, 8. >>> In this case, I agree, endianity of integer types >>> should be defined. I believe, use of byte array strongly >>> implies that original intent was to have semantics of >>> byte stream copy, just like memcpy does. >>> >>> x) Note there is nothing wrong with user kernel ABI to >>> use just bytes stream as parameter. There is already >>> precedents like 'read' and 'write' system calls :). >>> >>> x) Consider case when KVM works with emulated memory mapped >>> h/w devices where some devices operate in LE mode and others >>> operate in BE mode. It is defined by semantics of real h/w >>> device which is it, and should be emulated by emulator and KVM >>> given all other context. As far as mmio.data[] array concerned, if the >>> same integer value is read from these devices registers, mmio.data[] >>> memory should contain integer in opposite endianity for these >>> two cases, i.e MSB is data[0] in one case and MSB is >>> data[3] is in another case. It cannot be the same, because >>> except emulator and guest kernel, all other, like KVM host >>> and hypervisor, have no clue what endianity of device >>> actually is - it should treat mmio.data[] in the same way. >>> But resulting guest target CPU register would need to contain >>> normal integer value in one case and byteswapped in another, >>> because guest kernel would use it directly in one case and >>> byteswap it in another. Byte stream semantics allows to do >>> that. I don't see how it could happen if you fixate mmio.data[] >>> endianity in such way that it would contain integer in >>> the same format for BE and LE emulated device types. >>> >>> If by this point you agree, that mmio.data[] user-land/kernel >>> ABI semantics should be just memcpy, stop reading :). If not, >>> you may would like to take a look at below appendix where I >>> described in great details endianity of data at different >>> points along mmio processing code path of existing ARM LE KVM, >>> and proposed ARM BE KVM. Note appendix, is very long and very >>> detailed, sorry about that, but I feel that earlier more >>> digested explanations failed, so it driven me to write out >>> all details how I see them. If I am wrong, I hope it would be >>> easier for folks to point in detailed explanation places >>> where my logic goes bad. Also, I am not sure whether this >>> mail thread is good place to discuss all details described >>> in the appendix. Christoffer, please advise whether I should take >>> that one back on [1]. But I hope this bigger picture may help to >>> see the mmio.data[] semantics issue in context. >>> >>> More inline and appendix is at the end. >>> >>>> On 20 January 2014 11:19, Christoffer Dall <christoffer.dall@xxxxxxxxxx> wrote: >>>>> On Mon, Jan 20, 2014 at 03:22:11PM +0100, Alexander Graf wrote: >>>>> >>>>>> On 17.01.2014, at 19:52, Peter Maydell <peter.maydell@xxxxxxxxxx> wrote: >>>>>> >>>>>>> On 17 January 2014 17:53, Peter Maydell <peter.maydell@xxxxxxxxxx> wrote: >>>>>>> Specifically, the KVM API says "here's a uint8_t[] byte >>>>>>> array and a length", and the current QEMU code treats that >>>>>>> as "this is a byte array written as if the guest CPU >>>>>>> (a) were in TARGET_WORDS_BIGENDIAN order and (b) wrote its >>>>>>> I/O access to this buffer rather than to the device". >>>>>>> >>>>>>> The KVM API docs don't actually specify the endianness >>>>>>> semantics of the byte array, but I think that that really >>>>>>> needs to be nailed down. I can think of a couple of options: >>>>>>> * always LE >>>>>>> * always BE >>>>>>> [these first two are non-starters because they would >>>>>>> break either x86 or PPC existing code] >>>>>>> * always the endianness the guest is at the time >>>>>>> * always some arbitrary endianness based purely on the >>>>>>> endianness the KVM implementation used historically >>>>>>> * always the endianness of the host QEMU binary >>>>>>> * something else? >>>>>>> >>>>>>> Any preferences? Current QEMU code basically assumes >>>>>>> "always the endianness of TARGET_WORDS_BIGENDIAN", >>>>>>> which is pretty random. >>>>>> >>>>>> Having thought a little more about this, my opinion is: >>>>>> >>>>>> * we should specify that the byte order of the mmio.data >>>>>> array is host kernel endianness (ie same endianness >>>>>> as the QEMU process itself) [this is what it actually >>>>>> is, I think, for all the cases that work today] >>> >>> In above please consider two types of mapped emulated >>> h/w devices: BE and LE they cannot have mmio.data in the >>> same endianity. Currently in all observable cases LE ARM >>> and BE PPC devices endianity matches kernel/qemu >>> endianity but it would break when BE ARM is introduced >>> or LE PPC or one would start emulating BE devices on LE >>> ARM. >>> >>>>>> * we should fix the code path in QEMU for handling >>>>>> mmio.data which currently has the implicit assumption >>>>>> that when using KVM TARGET_WORDS_BIGENDIAN is the same >>>>>> as the QEMU host process endianness (because it's using >>>>>> load/store functions which swap if TARGET_WORDS_BIGENDIAN >>>>>> is different from HOST_WORDS_BIGENDIAN) >>> >>> I do not follow above. Maybe I am missing bigger context. >>> What is CPU under discussion in above? On ARM V7 system >>> when LE device is accessed as integer &mmio.data[0] address >>> would contain integer is in LE format, ie mmio.data[0] is LSB. >>> >>> Here is gdb session of LE qemu running on V7 LE kernel and >>> TC1 LE guest. Guest kernel accesses sys_cfgstat register which is >>> arm_sysctl registers with offset of 0xa8. Note.arm_sysct is memory >>> mapped LE device. >>> Please check run->mmio structure after read >>> (cpu_physical_memory_rw) completes it is in 4 bytes integer in >>> LE format mmio.data[0] is LSB and is equal to 1 >>> (s->syscfgstat value): >>> >>> (gdb) bt >>> #0 arm_sysctl_read (opaque=0x95a600, offset=168, size=4) at >>> /home/root/20131219/qemu-be/hw/misc/arm_sysctl.c:127 >>> #1 0x0023b9b4 in memory_region_read_accessor (mr=0x95b8e0, >>> addr=<optimized out>, value=0xb5c0dc18, size=4, shift=0, >>> mask=4294967295) >>> at /home/root/20131219/qemu-be/memory.c:407 >>> #2 0x0023aba4 in access_with_adjusted_size (addr=4294967295, >>> value=0xb5c0dc18, value@entry=0xb5c0dc10, size=size@entry=4, >>> access_size_min=1, >>> access_size_max=2357596, access=access@entry=0x23b96c >>> <memory_region_read_accessor>, mr=mr@entry=0x95b8e0) at >>> /home/root/20131219/qemu-be/memory.c:477 >>> #3 0x0023f95c in memory_region_dispatch_read1 (size=4, addr=168, >>> mr=0x95b8e0) at /home/root/20131219/qemu-be/memory.c:944 >>> #4 memory_region_dispatch_read (size=4, pval=0xb5c0dc68, addr=168, >>> mr=0x95b8e0) at /home/root/20131219/qemu-be/memory.c:966 >>> #5 io_mem_read (mr=mr@entry=0x95b8e0, addr=<optimized out>, >>> pval=pval@entry=0xb5c0dc68, size=size@entry=4) at >>> /home/root/20131219/qemu-be/memory.c:1743 >>> #6 0x001abd38 in address_space_rw (as=as@entry=0x8102d8 >>> <address_space_memory>, addr=469827752, buf=buf@entry=0xb6fd6028 "", >>> len=4, is_write=false, >>> is_write@entry=true) at /home/root/20131219/qemu-be/exec.c:2025 >>> #7 0x001abf90 in cpu_physical_memory_rw (addr=<optimized out>, >>> buf=buf@entry=0xb6fd6028 "", len=<optimized out>, is_write=0) >>> at /home/root/20131219/qemu-be/exec.c:2070 >>> #8 0x00239e00 in kvm_cpu_exec (cpu=cpu@entry=0x8758f8) at >>> /home/root/20131219/qemu-be/kvm-all.c:1701 >>> #9 0x001a3f78 in qemu_kvm_cpu_thread_fn (arg=0x8758f8) at >>> /home/root/20131219/qemu-be/cpus.c:874 >>> #10 0xb6cae06c in start_thread (arg=0xb5c0e310) at pthread_create.c:314 >>> #11 0xb69f5070 in ?? () at >>> ../ports/sysdeps/unix/sysv/linux/arm/clone.S:97 from /lib/libc.so.6 >>> #12 0xb69f5070 in ?? () at >>> ../ports/sysdeps/unix/sysv/linux/arm/clone.S:97 from /lib/libc.so.6 >>> Backtrace stopped: previous frame identical to this frame (corrupt stack?) >>> (gdb) p /x s->sys_cfgstat >>> $25 = 0x1 >>> (gdb) finish >>> Run till exit from #0 arm_sysctl_read (opaque=0x95a600, offset=168, >>> size=4) at /home/root/20131219/qemu-be/hw/misc/arm_sysctl.c:127 >>> memory_region_read_accessor (mr=0x95b8e0, addr=<optimized out>, >>> value=0xb5c0dc18, size=4, shift=0, mask=4294967295) at >>> /home/root/20131219/qemu-be/memory.c:408 >>> 408 trace_memory_region_ops_read(mr, addr, tmp, size); >>> Value returned is $26 = 1 >>> (gdb) enable 2 >>> (gdb) cont >>> Continuing. >>> >>> Breakpoint 2, kvm_cpu_exec (cpu=cpu@entry=0x8758f8) at >>> /home/root/20131219/qemu-be/kvm-all.c:1660 >>> 1660 kvm_arch_pre_run(cpu, run); >>> (gdb) bt >>> #0 kvm_cpu_exec (cpu=cpu@entry=0x8758f8) at >>> /home/root/20131219/qemu-be/kvm-all.c:1660 >>> #1 0x001a3f78 in qemu_kvm_cpu_thread_fn (arg=0x8758f8) at >>> /home/root/20131219/qemu-be/cpus.c:874 >>> #2 0xb6cae06c in start_thread (arg=0xb5c0e310) at pthread_create.c:314 >>> #3 0xb69f5070 in ?? () at >>> ../ports/sysdeps/unix/sysv/linux/arm/clone.S:97 from /lib/libc.so.6 >>> #4 0xb69f5070 in ?? () at >>> ../ports/sysdeps/unix/sysv/linux/arm/clone.S:97 from /lib/libc.so.6 >>> Backtrace stopped: previous frame identical to this frame (corrupt stack?) >>> (gdb) p /x run->mmio >>> $27 = {phys_addr = 0x1c0100a8, data = {0x1, 0x0, 0x0, 0x0, 0x0, 0x0, >>> 0x0, 0x0}, len = 0x4, is_write = 0x0} >>> >>> Also please look at adjust_endianness function and >>> struct MemoryRegion 'endianness' field. IMHO in qemu it >>> works quite nicely already. MemoryRegion 'read' and 'write' >>> callbacks return/get data in native format adjust_endianness >>> function checks whether emulated device endianness matches >>> emulator endianness and if it is different it does byteswap >>> according to size. As in above example arm_sysctl_ops memory >>> region should be marked as DEVICE_LITTLE_ENDIAN when it >>> returns s->sys_cfgstat value LE qemu sees that endianity >>> matches and it does not byteswap of result, so integer at >>> &mmio.data[0] address is in LE form. When qemu would >>> run in BE mode on BE kernel, it would see that endianity >>> mismatches and it will byteswap s->sys_cfgstat native value >>> (BE), so mmio.data would contain integer in LE format again. >>> >>> Note in currently committed code arm_sysctl_ops endianity >>> is DEVICE_NATIVE_ENDIAN, which is wrong - real vexpress >>> arm_sysctl device always gives/receives data in LE format regardless >>> of current CPSR E bit value, so it cannot be marked as NATIVE. >>> LE and BE kernels always read it as LE device; BE kernel follows >>> with byteswap. It was OK while we just run qemu in LE, but it >>> should be fixed to be LITTLE_ENDIAN for BE qemu work correctly >>> ... and actually that device and few other ARM specific devices >>> endianity change to LITTLE_ENDIAN was the only change in qemu >>> to make BE KVM to work. >>> >>>>> >>>>> Yes, I fully agree :). >>>> Great, I'll prepare a patch for the KVM API documentation. >>>> >>>> -Christoffer >>>> _______________________________________________ >>>> kvmarm mailing list >>>> kvmarm@xxxxxxxxxxxxxxxxxxxxx >>>> https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm >>> >>> Thanks, >>> Victor >>> >>> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-January/thread.html#223186 >>> >>> >>> Appendix >>> Data path endianity in ARM KVM mmio >>> =================================== >>> >>> This writeup considers several scenarios and tracks endianity >>> of data how it travels from emulator to guest CPU register, in >>> case of ARM KVM. It starts with currently committed code for LE >>> KVM host case and further discusses proposed BE KVM host >>> arrangement. >>> >>> Just to restrict discussion writeup considers code path of >>> integer (4 bytes) read from h/w mapped emulated device memory. >>> Writeup considers endianity of essential places involved in such >>> code path. >>> >>> For all cases when endianity is defined, it is assumed that >>> values under consideration are in memory (opposite to be in >>> register that does not have endianity). I.e even if function >>> variable could be actually allocated in CPU register writeup >>> will reference to it as it is in memory, just to keep >>> discussion clean, except for final guest CPU register. >>> >>> Let's consider the following places along data path from >>> emulator to guest CPU register: >>> >>> 1) emulator code that holds integer value to be read, assume >>> it would be global 'int emulated_hw_device_val' variable. >>> Normally in emulator it is held in native endian format - i.e >>> it is CPSR E bit is the same as kernel CPSR E bit. Just for >>> discussion sake assume that this h/w device registers >>> holds 5 as its value. >>> >>> 2) KVM_EXIT_MMIO part of 'struct kvm_run' structure, i.e >>> mmio.data byte array. Byte array does not have endianity, >>> but for this discussion it would track endianity of integer >>> at &mmio.data[0] address >>> >>> 3) 'data' variable type of 'unsigned long' in >>> kvm_handle_mmio_return function before vcpu_data_host_to_guest >>> call. KVM host mmio_read_buf function is used to fill this >>> variable from mmio.data buffer. mmio_read_buf actually >>> acts as memcpy from mmio.data buffer address, >>> just taking access size in account. >>> >>> 4) the same 'data' variable as above, but after >>> vcpu_data_host_to_guest function call, just before it is copied >>> to vcpu_reg target register location. Note >>> vcpu_data_host_to_guest function may byteswap value of 'data' >>> depending on current KVM host endianity and value of >>> guest CPSR E bit. >>> >>> 5) guest CPU spilled register array, location of target register >>> i.e integer at vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt) address >>> >>> 6) finally guest CPU register filled from vcpu_reg just before >>> guest resume execution of trapped emulated instruction. Note >>> it is done by hypervisor part of code and hypervisor EE bit is >>> the same as KVM host CPSR E bit. >>> >>> Note again, KVM host, emulator, and hypervisor part of code (guest >>> CPU registers save and restore code) always run in the same >>> endianity. Endianity of accessed emulated devices and endianity >>> of guest varies independently of KVM host endianity. >>> >>> Below sections consider all permutations of all possible cases, >>> it maybe quite boring to read. I've created summary table at >>> the end, you can jump to the table, after reading few cases. >>> But if you have objections and you see things happen differently >>> please comment inline of the use cases steps. >>> >>> LE KVM host >>> =========== >>> >>> Use case 1 >>> ---------- >>> >>> Emulated h/w device gives data in LE form; emulator and KVM >>> host endianity is LE (host CPSR E bit is off); guest compiled >>> in LE mode; and guest does access with CPSR E bit off >>> >>> 1) 'emulated_hw_device_val' emulator variable is LE >>> 2) &mmio.data[0] holds integer in LE format, matches device >>> endianity >>> 3) 'data' is LE >>> 4) 'data' is LE (since guest CPSR E bit is off no byteswap) >>> 5) integer at 'vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt)' is LE >>> 6) final guest target CPU register contains 5 (0x00000005) >>> >>> guest resumes execution ... Let's say after 'ldr r1, [r0]' >>> instruction, where r0 holds address of devices, it knows >>> that it reads LE mapped h/w so no addition processing is >>> needed >>> >>> Use case 2 >>> ---------- >>> >>> Emulated h/w device gives data in LE form; emulator and KVM >>> host endianity is LE (host CPSR E bit is off); guest compiled >>> in BE mode; and guest does access with CPSR E bit on >>> >>> 1) 'emulated_hw_device_val' emulator variable is LE >>> 2) &mmio.data[0] holds integer in LE format; matches device >>> endianity >>> 3) 'data' is LE >>> 4) 'data' is BE (since guest CPSR E bit is on, vcpu_data_host_to_guest >>> will do byteswap: cpu_to_be) >>> 5) integer at 'vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt)' is BE >>> 6) final guest target CPU register contains 0x05000000 >>> >>> guest resumes execution after 'ldr r1, [r0]', guest kernel >>> knows that it runs in BE mode (E bit on), it knows that it reads >>> LE device memory, it needs to byteswap r1 before further >>> processing so it does 'rev r1, r1' and proceed with result >>> >>> Use case 3 >>> ---------- >>> >>> Emulated h/w device gives data in BE form; emulator and KVM >>> host endianity is LE (host CPSR E bit is off); guest compiled >>> in LE mode; and guest does access with CPSR E bit off >>> >>> 1) 'emulated_hw_device_val' emulator variable is LE >>> 2) &mmio.data[0] holds integer in BE format; emulator byteswaps >>> it because it knows that device endianity is opposite to native, >>> and it should match device endianity >>> 3) 'data' is BE >>> 4) 'data' is BE (since guest CPSR E bit is off no byteswap) >>> 5) integer at 'vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt)' is BE >>> 6) final guest target CPU register contains 0x05000000 >>> >>> guest resumes execution after 'ldr r1, [r0]', guest kernel >>> knows that it runs in LE mode (E bit off), it knows that it >>> reads BE device memory, it need to byteswap r1 before further >>> processing so it does 'rev r1, r1' and proceeds with result >>> >>> Use case 4 >>> ---------- >>> >>> Emulated h/w device gives data in BE form; emulator and KVM >>> host endianity is LE (host CPSR E bit is off); guest compiled >>> in BE mode; and guest does access with CPSR E bit on >>> >>> 1) 'emulated_hw_device_val' emulator variable is LE >>> 2) &mmio.data[0] holds integer in BE format; emulator byteswaps >>> it because it knows that device endianity is opposite to native, >>> and should match device endianity >>> 3) 'data' is BE >>> 4) 'data' is LE (since guest CPSR E bit is on, vcpu_data_host_to_guest >>> will do byteswap: cpu_to_be) >>> 5) integer at 'vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt)' is LE >>> 6) final guest target CPU register contains 5 (0x00000005) >>> >>> guest resumes execution after 'ldr r1, [r0]', guest kernel >>> knows that it runs in BE mode, it knows that it reads BE device >>> memory, so it does not need to do anything before further >>> processing. >>> >>> >>> Above uses cases that is exactly what we have now after Marc's >>> commit to support BE guest on LE KVM host. Further use >>> cases describe how it would work with BE KVM patches I proposed. >>> It is understood that it is subject of further discussion. >>> >>> >>> BE KVM host >>> =========== >>> >>> Use case 5 >>> ---------- >>> >>> Emulated h/w device gives data in LE form; emulator and KVM >>> host endianity is BE (host CPSR E bit is on); guest compiled >>> in BE mode; and guest does access with CPSR E bit on >>> >>> 1) 'emulated_hw_device_val' emulator variable is BE >>> 2) &mmio.data[0] holds integer in LE format; emulator byteswaps >>> it because it knows that device endianity is opposite to native; >>> matches device endianity >>> 3) 'data' is LE >>> 4) 'data' is LE (since guest CPSR E bit is on, BE KVM host kernel >>> does *not* do byteswap: cpu_to_be no effect in BE host kernel) >>> 5) integer at 'vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt)' is LE >>> 6) final guest target CPU register contains 0x05000000 because >>> hypervisor runs in BE mode, so load of LE integer will be >>> byteswapped value in register >>> >>> guest resumes execution after 'ldr r1, [r0]', guest kernel >>> knows that it runs in BE mode, it knows that it reads LE device >>> memory, it need to byteswap r1 before further processing so it >>> does 'rev r1, r1' and proceeds with result >>> >>> Use case 6 >>> ---------- >>> >>> Emulated h/w device gives data in LE form; emulator and KVM >>> host endianity is BE (host CPSR E bit is on); guest compiled >>> in LE mode; and guest does access with CPSR E bit off >>> >>> 1) 'emulated_hw_device_val' emulator variable is BE >>> 2) &mmio.data[0] holds integer in LE format; emulator byteswaps >>> it because it knows that device endianity is opposite to native; >>> matches device endianity >>> 3) 'data' is LE >>> 4) 'data' is BE (since guest CPSR E bit is off, BE KVM host kernel >>> does byteswap: cpu_to_le) >>> 5) integer at 'vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt)' is BE >>> 6) final guest target CPU register contains 5 (0x00000005) because >>> hypervisor runs in BE mode, so load of BE integer will be OK >>> >>> guest resumes execution after 'ldr r1, [r0]', guest kernel >>> knows that it runs in LE mode, it knows that it reads LE device >>> memory, so it does not need to do anything else it just proceeds >>> >>> Use case 7 >>> ---------- >>> >>> Emulated h/w device gives data in BE form; emulator and KVM >>> host endianity is BE (host CPSR E bit is on); guest compiled >>> in BE mode; and guest does access with CPSR E bit on >>> >>> 1) 'emulated_hw_device_val' emulator variable is BE >>> 2) &mmio.data[0] holds integer in BE format; matches device >>> endianity >>> 3) 'data' is BE >>> 4) 'data' is BE (since guest CPSR E bit is on, BE KVM host kernel >>> does *not* do byteswap: cpu_to_be no effect in BE host kernel) >>> 5) integer at 'vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt)' is BE >>> 6) final guest target CPU register contains 5 (0x00000005) because >>> hypervisor runs in BE mode, so load of BE integer will be OK >>> >>> guest resumes execution after 'ldr r1, [r0]', guest kernel >>> knows that it runs in BE mode, it knows that it reads BE device >>> memory, so it does not need to do anything else it just proceeds >>> >>> Use case 8 >>> ---------- >>> >>> Emulated h/w device gives data in BE form; emulator and KVM >>> host endianity is BE (host CPSR E bit is on); guest compiled >>> in LE mode; and guest does access with CPSR E bit off >>> >>> 1) 'emulated_hw_device_val' emulator variable is BE >>> 2) &mmio.data[0] holds integer in BE format; matches device >>> endianity >>> 3) 'data' is BE >>> 4) 'data' is LE (since guest CPSR E bit is off, BE KVM host kernel >>> does byteswap: cpu_to_le) >>> 5) integer at 'vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt)' is LE >>> 6) final guest target CPU register contains 0x05000000 because >>> hypervisor runs in BE mode, so load of LE integer will be >>> byteswapped value in register >>> >>> guest resumes execution after 'ldr r1, [r0]', guest kernel >>> knows that it runs in LE mode, it knows that it reads BE device >>> memory, it need to byteswap r1 before further processing so it >>> does 'rev r1, r1' and proceeds with result >>> >>> Note that with BE kernel we actually have some initial portion >>> of assembler code that is executed with CPSR bit off and it reads >>> LE h/w - i.e it falls into use case 1. >>> >>> Summary Table (please use fixed font to see it correctly) >>> ======================================== >>> >>> -------------------------------------------------------------- >>> | Use Case # | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | >>> -------------------------------------------------------------- >>> | KVM Host, | LE | LE | LE | LE | BE | BE | BE | BE | >>> | Emulator, | | | | | | | | | >>> | Hypervisor | | | | | | | | | >>> | Endianity | | | | | | | | | >>> -------------------------------------------------------------- >>> | Device | LE | LE | BE | BE | LE | LE | BE | BE | >>> | Endianity | | | | | | | | | >>> -------------------------------------------------------------- >>> | Guest | LE | BE | LE | BE | BE | LE | BE | LE | >>> | Access | | | | | | | | | >>> | Endianity | | | | | | | | | >>> -------------------------------------------------------------- >>> | Step 1) | LE | LE | LE | LE | BE | BE | BE | BE | >>> -------------------------------------------------------------- >>> | Step 2) | LE | LE | BE | BE | LE | LE | BE | BE | >>> -------------------------------------------------------------- >>> | Step 3) | LE | LE | BE | BE | LE | LE | BE | BE | >>> -------------------------------------------------------------- >>> | Step 4) | LE | BE | BE | LE | LE | BE | BE | LE | >>> -------------------------------------------------------------- >>> | Step 5) | LE | BE | BE | LE | LE | BE | BE | LE | >>> -------------------------------------------------------------- >>> | Final Reg | no | yes | yes | no | yes | no | no | yes | >>> | value | | | | | | | | | >>> | byteswapped| | | | | | | | | >>> -------------------------------------------------------------- >>> | Guest | no | yes | yes | no | yes | no | no | yes | >>> | Follows | | | | | | | | | >>> | with rev | | | | | | | | | >>> -------------------------------------------------------------- >>> >>> Few objservations >>> ================= >>> >>> x) Note above table is symmetric wrt to BE<->LE change: >>> 1<-->7 >>> 2<-->8 >>> 3<-->5 >>> 4<-->6 >>> >>> x) &mmio.data[0] address always holds integer in the same >>> format as emulated device endianity >>> >>> x) During step 4) when vcpu_data_host_to_guest function >>> is used, if guest E bit value different, but everything else >>> is the same, opposite result are produced (1&2, 3&4, 5&6, >>> 7&8) >>> >>> If you reached to this end :), again, thank you very much for >>> reading it! >>> >>> - Victor >>> _______________________________________________ >>> kvmarm mailing list >>> kvmarm@xxxxxxxxxxxxxxxxxxxxx >>> https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm >> >> Hi Victor, >> >> First of all I really appreciate the thorough description with >> all the use-cases. >> >> Below would be a summary of what I understood from your >> analysis: >> >> 1. Any MMIO device marked as NATIVE ENDIAN in user > > "Native endian" really is just a shortcut for "target endian" which is LE for ARM and BE for PPC. There shouldn't be a qemu-system-armeb or qemu-system-ppc64le. > > QEMU emulates everything that comes after the CPU, so imagine the ioctl struct as a bus package. Your bus doesn't care what endianness the CPU is in - it just gets data from the CPU. > > A bus write on the CPU however honors the endianness setting of the CPU. So when we convert from a value in register to a value on the bus we need to take this endian configuration into account. > > That's exactly what we are talking about here. KVM should do the cpu configured register->bus endian mapping while QEMU does the bus->device endian map. Thanks for the info on QEMU side handling of MMIO data. I was not aware that we would be only have "target endian = LE" for ARM/ARM64 in QEMU. I think Marc Z had mentioned similar thing about MMIO this in our previous discussions on his patches. (Please refer, http://www.spinics.net/lists/arm-kernel/msg283313.html) This clearly means MMIO data passed to user space (QEMU) has to of host endianness so that QEMU can take care of bust->device endian map. Current vcpu_data_guest_to_host() and vcpu_data_host_to_guest() does not perform endianness conversion of MMIO data to LE when we are running LE guest on BE host so we do need Victor's patch for fixing vcpu_data_guest_to_host() and vcpu_data_host_to_guest(). (Already reported long time back by me, http://www.spinics.net/lists/arm-kernel/msg283308.html) Regards, Anup > > > Alex > >> space tool (QEMU or KVMTOOL) is bad for cross-endian >> Guest. For supporting cross-endian Guest we need to have >> all MMIO device with fixed ENDIANESS. >> >> 2. We don't need to do any endianness conversions in KVM >> for MMIO writes that are being forwarded to user space. It is >> the job of user space (QEMU or KVMTOOL) to interpret the >> endianness of MMIO write data based on device endianness. >> >> 3. The MMIO read operation is the one which will need >> explicit handling in KVM because the target VCPU register >> of MMIO read operation should be loaded with MMIO data >> (returned from user space) based upon current VCPU >> endianness (i.e. VCPU CPSR.E bit). >> >> 4. In-kernel emulated devices (such as VGIC) will have not >> require any explicit endianness conversion of MMIO data for >> MMIO write operations (same as point 2). >> >> 5. In-kernel emulated devices (such as VGIC) will have to >> explicit endianness conversion of MMIO data for MMIO read >> operations based on device endianness (same as point 3). >> >> I hope above summary of my understanding is as-per your >> description. If so then I am in-support of your description. >> >> I think your description (and above 5 points) takes care of >> all use cases of cross-endianness without changing current >> MMIO ABI. >> >> Regards, >> Anup >> -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html