> Am 22.01.2014 um 07:31 schrieb Anup Patel <anup@xxxxxxxxxxxxxx>: > > On Wed, Jan 22, 2014 at 11:09 AM, Victor Kamensky > <victor.kamensky@xxxxxxxxxx> wrote: >> Hi Guys, >> >> Christoffer and I had a bit heated chat :) on this >> subject last night. Christoffer, really appreciate >> your time! We did not really reach agreement >> during the chat and Christoffer asked me to follow >> up on this thread. >> Here it goes. Sorry, it is very long email. >> >> I don't believe we can assign any endianity to >> mmio.data[] byte array. I believe mmio.data[] and >> mmio.len acts just memcpy and that is all. As >> memcpy does not imply any endianity of underlying >> data mmio.data[] should not either. >> >> Here is my definition: >> >> mmio.data[] is array of bytes that contains memory >> bytes in such form, for read case, that if those >> bytes are placed in guest memory and guest executes >> the same read access instruction with address to this >> memory, result would be the same as real h/w device >> memory access. Rest of KVM host and hypervisor >> part of code should really take care of mmio.data[] >> memory so it will be delivered to vcpu registers and >> restored by hypervisor part in such way that guest CPU >> register value is the same as it would be for real >> non-emulated h/w read access (that is emulation part). >> The same goes for write access, if guest writes into >> memory and those bytes are just copied to emulated >> h/w register it would have the same effect as real >> mapped h/w register write. >> >> In shorter form, i.e for len=4 access: endianity of integer >> at &mmio.data[0] address should match endianity >> of emulated h/w device behind phys_addr address, >> regardless what is endianity of emulator, KVM host, >> hypervisor, and guest >> >> Examples that illustrate my definition >> -------------------------------------- >> >> 1) LE guest (E bit is off in ARM speak) reads integer >> (4 bytes) from mapped h/w LE device register - >> mmio.data[3] contains MSB, mmio.data[0] contains LSB. >> >> 2) BE guest (E bit is on in ARM speak) reads integer >> from mapped h/w LE device register - mmio.data[3] >> contains MSB, mmio.data[0] contains LSB. Note that >> if &mmio.data[0] memory would be placed in guest >> address space and instruction restarted with new >> address, then it would meet BE guest expectations >> - the guest knows that it reads LE h/w so it will byteswap >> register before processing it further. This is BE guest ARM >> case (regardless of what KVM host endianity is). >> >> 3) BE guest reads integer from mapped h/w BE device >> register - mmio.data[0] contains MSB, mmio.data[3] >> contains LSB. Note that if &mmio.data[0] memory would >> be placed in guest address space and instruction >> restarted with new address, then it would meet BE >> guest expectation - the guest knows that it reads >> BE h/w so it will proceed further without any other >> work. I guess, it is BE ppc case. >> >> >> Arguments in favor of memcpy semantics of mmio.data[] >> ------------------------------------------------------ >> >> x) What are possible values of 'len'? Previous discussions >> imply that is always powers of 2. Why is that? Maybe >> there will be CPU that would need to do 5 bytes mmio >> access, or 6 bytes. How do you assign endianity to >> such case? 'len' 5 or 6, or any works fine with >> memcpy semantics. I admit it is hypothetical case, but >> IMHO it tests how clean ABI definition is. >> >> x) Byte array does not have endianity because it >> does not have any structure. If one would want to >> imply structure why mmio is not defined in such way >> so structure reflected in mmio definition? >> Something like: >> >> >> /* KVM_EXIT_MMIO */ >> struct { >> __u64 phys_addr; >> union { >> __u8 byte; >> __u16 hword; >> __u32 word; >> __u64 dword; >> } data; >> __u32 len; >> __u8 is_write; >> } mmio; >> >> where len is really serves as union discriminator and >> only allowed len values are 1, 2, 4, 8. >> In this case, I agree, endianity of integer types >> should be defined. I believe, use of byte array strongly >> implies that original intent was to have semantics of >> byte stream copy, just like memcpy does. >> >> x) Note there is nothing wrong with user kernel ABI to >> use just bytes stream as parameter. There is already >> precedents like 'read' and 'write' system calls :). >> >> x) Consider case when KVM works with emulated memory mapped >> h/w devices where some devices operate in LE mode and others >> operate in BE mode. It is defined by semantics of real h/w >> device which is it, and should be emulated by emulator and KVM >> given all other context. As far as mmio.data[] array concerned, if the >> same integer value is read from these devices registers, mmio.data[] >> memory should contain integer in opposite endianity for these >> two cases, i.e MSB is data[0] in one case and MSB is >> data[3] is in another case. It cannot be the same, because >> except emulator and guest kernel, all other, like KVM host >> and hypervisor, have no clue what endianity of device >> actually is - it should treat mmio.data[] in the same way. >> But resulting guest target CPU register would need to contain >> normal integer value in one case and byteswapped in another, >> because guest kernel would use it directly in one case and >> byteswap it in another. Byte stream semantics allows to do >> that. I don't see how it could happen if you fixate mmio.data[] >> endianity in such way that it would contain integer in >> the same format for BE and LE emulated device types. >> >> If by this point you agree, that mmio.data[] user-land/kernel >> ABI semantics should be just memcpy, stop reading :). If not, >> you may would like to take a look at below appendix where I >> described in great details endianity of data at different >> points along mmio processing code path of existing ARM LE KVM, >> and proposed ARM BE KVM. Note appendix, is very long and very >> detailed, sorry about that, but I feel that earlier more >> digested explanations failed, so it driven me to write out >> all details how I see them. If I am wrong, I hope it would be >> easier for folks to point in detailed explanation places >> where my logic goes bad. Also, I am not sure whether this >> mail thread is good place to discuss all details described >> in the appendix. Christoffer, please advise whether I should take >> that one back on [1]. But I hope this bigger picture may help to >> see the mmio.data[] semantics issue in context. >> >> More inline and appendix is at the end. >> >>> On 20 January 2014 11:19, Christoffer Dall <christoffer.dall@xxxxxxxxxx> wrote: >>>> On Mon, Jan 20, 2014 at 03:22:11PM +0100, Alexander Graf wrote: >>>> >>>>> On 17.01.2014, at 19:52, Peter Maydell <peter.maydell@xxxxxxxxxx> wrote: >>>>> >>>>>> On 17 January 2014 17:53, Peter Maydell <peter.maydell@xxxxxxxxxx> wrote: >>>>>> Specifically, the KVM API says "here's a uint8_t[] byte >>>>>> array and a length", and the current QEMU code treats that >>>>>> as "this is a byte array written as if the guest CPU >>>>>> (a) were in TARGET_WORDS_BIGENDIAN order and (b) wrote its >>>>>> I/O access to this buffer rather than to the device". >>>>>> >>>>>> The KVM API docs don't actually specify the endianness >>>>>> semantics of the byte array, but I think that that really >>>>>> needs to be nailed down. I can think of a couple of options: >>>>>> * always LE >>>>>> * always BE >>>>>> [these first two are non-starters because they would >>>>>> break either x86 or PPC existing code] >>>>>> * always the endianness the guest is at the time >>>>>> * always some arbitrary endianness based purely on the >>>>>> endianness the KVM implementation used historically >>>>>> * always the endianness of the host QEMU binary >>>>>> * something else? >>>>>> >>>>>> Any preferences? Current QEMU code basically assumes >>>>>> "always the endianness of TARGET_WORDS_BIGENDIAN", >>>>>> which is pretty random. >>>>> >>>>> Having thought a little more about this, my opinion is: >>>>> >>>>> * we should specify that the byte order of the mmio.data >>>>> array is host kernel endianness (ie same endianness >>>>> as the QEMU process itself) [this is what it actually >>>>> is, I think, for all the cases that work today] >> >> In above please consider two types of mapped emulated >> h/w devices: BE and LE they cannot have mmio.data in the >> same endianity. Currently in all observable cases LE ARM >> and BE PPC devices endianity matches kernel/qemu >> endianity but it would break when BE ARM is introduced >> or LE PPC or one would start emulating BE devices on LE >> ARM. >> >>>>> * we should fix the code path in QEMU for handling >>>>> mmio.data which currently has the implicit assumption >>>>> that when using KVM TARGET_WORDS_BIGENDIAN is the same >>>>> as the QEMU host process endianness (because it's using >>>>> load/store functions which swap if TARGET_WORDS_BIGENDIAN >>>>> is different from HOST_WORDS_BIGENDIAN) >> >> I do not follow above. Maybe I am missing bigger context. >> What is CPU under discussion in above? On ARM V7 system >> when LE device is accessed as integer &mmio.data[0] address >> would contain integer is in LE format, ie mmio.data[0] is LSB. >> >> Here is gdb session of LE qemu running on V7 LE kernel and >> TC1 LE guest. Guest kernel accesses sys_cfgstat register which is >> arm_sysctl registers with offset of 0xa8. Note.arm_sysct is memory >> mapped LE device. >> Please check run->mmio structure after read >> (cpu_physical_memory_rw) completes it is in 4 bytes integer in >> LE format mmio.data[0] is LSB and is equal to 1 >> (s->syscfgstat value): >> >> (gdb) bt >> #0 arm_sysctl_read (opaque=0x95a600, offset=168, size=4) at >> /home/root/20131219/qemu-be/hw/misc/arm_sysctl.c:127 >> #1 0x0023b9b4 in memory_region_read_accessor (mr=0x95b8e0, >> addr=<optimized out>, value=0xb5c0dc18, size=4, shift=0, >> mask=4294967295) >> at /home/root/20131219/qemu-be/memory.c:407 >> #2 0x0023aba4 in access_with_adjusted_size (addr=4294967295, >> value=0xb5c0dc18, value@entry=0xb5c0dc10, size=size@entry=4, >> access_size_min=1, >> access_size_max=2357596, access=access@entry=0x23b96c >> <memory_region_read_accessor>, mr=mr@entry=0x95b8e0) at >> /home/root/20131219/qemu-be/memory.c:477 >> #3 0x0023f95c in memory_region_dispatch_read1 (size=4, addr=168, >> mr=0x95b8e0) at /home/root/20131219/qemu-be/memory.c:944 >> #4 memory_region_dispatch_read (size=4, pval=0xb5c0dc68, addr=168, >> mr=0x95b8e0) at /home/root/20131219/qemu-be/memory.c:966 >> #5 io_mem_read (mr=mr@entry=0x95b8e0, addr=<optimized out>, >> pval=pval@entry=0xb5c0dc68, size=size@entry=4) at >> /home/root/20131219/qemu-be/memory.c:1743 >> #6 0x001abd38 in address_space_rw (as=as@entry=0x8102d8 >> <address_space_memory>, addr=469827752, buf=buf@entry=0xb6fd6028 "", >> len=4, is_write=false, >> is_write@entry=true) at /home/root/20131219/qemu-be/exec.c:2025 >> #7 0x001abf90 in cpu_physical_memory_rw (addr=<optimized out>, >> buf=buf@entry=0xb6fd6028 "", len=<optimized out>, is_write=0) >> at /home/root/20131219/qemu-be/exec.c:2070 >> #8 0x00239e00 in kvm_cpu_exec (cpu=cpu@entry=0x8758f8) at >> /home/root/20131219/qemu-be/kvm-all.c:1701 >> #9 0x001a3f78 in qemu_kvm_cpu_thread_fn (arg=0x8758f8) at >> /home/root/20131219/qemu-be/cpus.c:874 >> #10 0xb6cae06c in start_thread (arg=0xb5c0e310) at pthread_create.c:314 >> #11 0xb69f5070 in ?? () at >> ../ports/sysdeps/unix/sysv/linux/arm/clone.S:97 from /lib/libc.so.6 >> #12 0xb69f5070 in ?? () at >> ../ports/sysdeps/unix/sysv/linux/arm/clone.S:97 from /lib/libc.so.6 >> Backtrace stopped: previous frame identical to this frame (corrupt stack?) >> (gdb) p /x s->sys_cfgstat >> $25 = 0x1 >> (gdb) finish >> Run till exit from #0 arm_sysctl_read (opaque=0x95a600, offset=168, >> size=4) at /home/root/20131219/qemu-be/hw/misc/arm_sysctl.c:127 >> memory_region_read_accessor (mr=0x95b8e0, addr=<optimized out>, >> value=0xb5c0dc18, size=4, shift=0, mask=4294967295) at >> /home/root/20131219/qemu-be/memory.c:408 >> 408 trace_memory_region_ops_read(mr, addr, tmp, size); >> Value returned is $26 = 1 >> (gdb) enable 2 >> (gdb) cont >> Continuing. >> >> Breakpoint 2, kvm_cpu_exec (cpu=cpu@entry=0x8758f8) at >> /home/root/20131219/qemu-be/kvm-all.c:1660 >> 1660 kvm_arch_pre_run(cpu, run); >> (gdb) bt >> #0 kvm_cpu_exec (cpu=cpu@entry=0x8758f8) at >> /home/root/20131219/qemu-be/kvm-all.c:1660 >> #1 0x001a3f78 in qemu_kvm_cpu_thread_fn (arg=0x8758f8) at >> /home/root/20131219/qemu-be/cpus.c:874 >> #2 0xb6cae06c in start_thread (arg=0xb5c0e310) at pthread_create.c:314 >> #3 0xb69f5070 in ?? () at >> ../ports/sysdeps/unix/sysv/linux/arm/clone.S:97 from /lib/libc.so.6 >> #4 0xb69f5070 in ?? () at >> ../ports/sysdeps/unix/sysv/linux/arm/clone.S:97 from /lib/libc.so.6 >> Backtrace stopped: previous frame identical to this frame (corrupt stack?) >> (gdb) p /x run->mmio >> $27 = {phys_addr = 0x1c0100a8, data = {0x1, 0x0, 0x0, 0x0, 0x0, 0x0, >> 0x0, 0x0}, len = 0x4, is_write = 0x0} >> >> Also please look at adjust_endianness function and >> struct MemoryRegion 'endianness' field. IMHO in qemu it >> works quite nicely already. MemoryRegion 'read' and 'write' >> callbacks return/get data in native format adjust_endianness >> function checks whether emulated device endianness matches >> emulator endianness and if it is different it does byteswap >> according to size. As in above example arm_sysctl_ops memory >> region should be marked as DEVICE_LITTLE_ENDIAN when it >> returns s->sys_cfgstat value LE qemu sees that endianity >> matches and it does not byteswap of result, so integer at >> &mmio.data[0] address is in LE form. When qemu would >> run in BE mode on BE kernel, it would see that endianity >> mismatches and it will byteswap s->sys_cfgstat native value >> (BE), so mmio.data would contain integer in LE format again. >> >> Note in currently committed code arm_sysctl_ops endianity >> is DEVICE_NATIVE_ENDIAN, which is wrong - real vexpress >> arm_sysctl device always gives/receives data in LE format regardless >> of current CPSR E bit value, so it cannot be marked as NATIVE. >> LE and BE kernels always read it as LE device; BE kernel follows >> with byteswap. It was OK while we just run qemu in LE, but it >> should be fixed to be LITTLE_ENDIAN for BE qemu work correctly >> ... and actually that device and few other ARM specific devices >> endianity change to LITTLE_ENDIAN was the only change in qemu >> to make BE KVM to work. >> >>>> >>>> Yes, I fully agree :). >>> Great, I'll prepare a patch for the KVM API documentation. >>> >>> -Christoffer >>> _______________________________________________ >>> kvmarm mailing list >>> kvmarm@xxxxxxxxxxxxxxxxxxxxx >>> https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm >> >> Thanks, >> Victor >> >> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-January/thread.html#223186 >> >> >> Appendix >> Data path endianity in ARM KVM mmio >> =================================== >> >> This writeup considers several scenarios and tracks endianity >> of data how it travels from emulator to guest CPU register, in >> case of ARM KVM. It starts with currently committed code for LE >> KVM host case and further discusses proposed BE KVM host >> arrangement. >> >> Just to restrict discussion writeup considers code path of >> integer (4 bytes) read from h/w mapped emulated device memory. >> Writeup considers endianity of essential places involved in such >> code path. >> >> For all cases when endianity is defined, it is assumed that >> values under consideration are in memory (opposite to be in >> register that does not have endianity). I.e even if function >> variable could be actually allocated in CPU register writeup >> will reference to it as it is in memory, just to keep >> discussion clean, except for final guest CPU register. >> >> Let's consider the following places along data path from >> emulator to guest CPU register: >> >> 1) emulator code that holds integer value to be read, assume >> it would be global 'int emulated_hw_device_val' variable. >> Normally in emulator it is held in native endian format - i.e >> it is CPSR E bit is the same as kernel CPSR E bit. Just for >> discussion sake assume that this h/w device registers >> holds 5 as its value. >> >> 2) KVM_EXIT_MMIO part of 'struct kvm_run' structure, i.e >> mmio.data byte array. Byte array does not have endianity, >> but for this discussion it would track endianity of integer >> at &mmio.data[0] address >> >> 3) 'data' variable type of 'unsigned long' in >> kvm_handle_mmio_return function before vcpu_data_host_to_guest >> call. KVM host mmio_read_buf function is used to fill this >> variable from mmio.data buffer. mmio_read_buf actually >> acts as memcpy from mmio.data buffer address, >> just taking access size in account. >> >> 4) the same 'data' variable as above, but after >> vcpu_data_host_to_guest function call, just before it is copied >> to vcpu_reg target register location. Note >> vcpu_data_host_to_guest function may byteswap value of 'data' >> depending on current KVM host endianity and value of >> guest CPSR E bit. >> >> 5) guest CPU spilled register array, location of target register >> i.e integer at vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt) address >> >> 6) finally guest CPU register filled from vcpu_reg just before >> guest resume execution of trapped emulated instruction. Note >> it is done by hypervisor part of code and hypervisor EE bit is >> the same as KVM host CPSR E bit. >> >> Note again, KVM host, emulator, and hypervisor part of code (guest >> CPU registers save and restore code) always run in the same >> endianity. Endianity of accessed emulated devices and endianity >> of guest varies independently of KVM host endianity. >> >> Below sections consider all permutations of all possible cases, >> it maybe quite boring to read. I've created summary table at >> the end, you can jump to the table, after reading few cases. >> But if you have objections and you see things happen differently >> please comment inline of the use cases steps. >> >> LE KVM host >> =========== >> >> Use case 1 >> ---------- >> >> Emulated h/w device gives data in LE form; emulator and KVM >> host endianity is LE (host CPSR E bit is off); guest compiled >> in LE mode; and guest does access with CPSR E bit off >> >> 1) 'emulated_hw_device_val' emulator variable is LE >> 2) &mmio.data[0] holds integer in LE format, matches device >> endianity >> 3) 'data' is LE >> 4) 'data' is LE (since guest CPSR E bit is off no byteswap) >> 5) integer at 'vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt)' is LE >> 6) final guest target CPU register contains 5 (0x00000005) >> >> guest resumes execution ... Let's say after 'ldr r1, [r0]' >> instruction, where r0 holds address of devices, it knows >> that it reads LE mapped h/w so no addition processing is >> needed >> >> Use case 2 >> ---------- >> >> Emulated h/w device gives data in LE form; emulator and KVM >> host endianity is LE (host CPSR E bit is off); guest compiled >> in BE mode; and guest does access with CPSR E bit on >> >> 1) 'emulated_hw_device_val' emulator variable is LE >> 2) &mmio.data[0] holds integer in LE format; matches device >> endianity >> 3) 'data' is LE >> 4) 'data' is BE (since guest CPSR E bit is on, vcpu_data_host_to_guest >> will do byteswap: cpu_to_be) >> 5) integer at 'vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt)' is BE >> 6) final guest target CPU register contains 0x05000000 >> >> guest resumes execution after 'ldr r1, [r0]', guest kernel >> knows that it runs in BE mode (E bit on), it knows that it reads >> LE device memory, it needs to byteswap r1 before further >> processing so it does 'rev r1, r1' and proceed with result >> >> Use case 3 >> ---------- >> >> Emulated h/w device gives data in BE form; emulator and KVM >> host endianity is LE (host CPSR E bit is off); guest compiled >> in LE mode; and guest does access with CPSR E bit off >> >> 1) 'emulated_hw_device_val' emulator variable is LE >> 2) &mmio.data[0] holds integer in BE format; emulator byteswaps >> it because it knows that device endianity is opposite to native, >> and it should match device endianity >> 3) 'data' is BE >> 4) 'data' is BE (since guest CPSR E bit is off no byteswap) >> 5) integer at 'vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt)' is BE >> 6) final guest target CPU register contains 0x05000000 >> >> guest resumes execution after 'ldr r1, [r0]', guest kernel >> knows that it runs in LE mode (E bit off), it knows that it >> reads BE device memory, it need to byteswap r1 before further >> processing so it does 'rev r1, r1' and proceeds with result >> >> Use case 4 >> ---------- >> >> Emulated h/w device gives data in BE form; emulator and KVM >> host endianity is LE (host CPSR E bit is off); guest compiled >> in BE mode; and guest does access with CPSR E bit on >> >> 1) 'emulated_hw_device_val' emulator variable is LE >> 2) &mmio.data[0] holds integer in BE format; emulator byteswaps >> it because it knows that device endianity is opposite to native, >> and should match device endianity >> 3) 'data' is BE >> 4) 'data' is LE (since guest CPSR E bit is on, vcpu_data_host_to_guest >> will do byteswap: cpu_to_be) >> 5) integer at 'vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt)' is LE >> 6) final guest target CPU register contains 5 (0x00000005) >> >> guest resumes execution after 'ldr r1, [r0]', guest kernel >> knows that it runs in BE mode, it knows that it reads BE device >> memory, so it does not need to do anything before further >> processing. >> >> >> Above uses cases that is exactly what we have now after Marc's >> commit to support BE guest on LE KVM host. Further use >> cases describe how it would work with BE KVM patches I proposed. >> It is understood that it is subject of further discussion. >> >> >> BE KVM host >> =========== >> >> Use case 5 >> ---------- >> >> Emulated h/w device gives data in LE form; emulator and KVM >> host endianity is BE (host CPSR E bit is on); guest compiled >> in BE mode; and guest does access with CPSR E bit on >> >> 1) 'emulated_hw_device_val' emulator variable is BE >> 2) &mmio.data[0] holds integer in LE format; emulator byteswaps >> it because it knows that device endianity is opposite to native; >> matches device endianity >> 3) 'data' is LE >> 4) 'data' is LE (since guest CPSR E bit is on, BE KVM host kernel >> does *not* do byteswap: cpu_to_be no effect in BE host kernel) >> 5) integer at 'vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt)' is LE >> 6) final guest target CPU register contains 0x05000000 because >> hypervisor runs in BE mode, so load of LE integer will be >> byteswapped value in register >> >> guest resumes execution after 'ldr r1, [r0]', guest kernel >> knows that it runs in BE mode, it knows that it reads LE device >> memory, it need to byteswap r1 before further processing so it >> does 'rev r1, r1' and proceeds with result >> >> Use case 6 >> ---------- >> >> Emulated h/w device gives data in LE form; emulator and KVM >> host endianity is BE (host CPSR E bit is on); guest compiled >> in LE mode; and guest does access with CPSR E bit off >> >> 1) 'emulated_hw_device_val' emulator variable is BE >> 2) &mmio.data[0] holds integer in LE format; emulator byteswaps >> it because it knows that device endianity is opposite to native; >> matches device endianity >> 3) 'data' is LE >> 4) 'data' is BE (since guest CPSR E bit is off, BE KVM host kernel >> does byteswap: cpu_to_le) >> 5) integer at 'vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt)' is BE >> 6) final guest target CPU register contains 5 (0x00000005) because >> hypervisor runs in BE mode, so load of BE integer will be OK >> >> guest resumes execution after 'ldr r1, [r0]', guest kernel >> knows that it runs in LE mode, it knows that it reads LE device >> memory, so it does not need to do anything else it just proceeds >> >> Use case 7 >> ---------- >> >> Emulated h/w device gives data in BE form; emulator and KVM >> host endianity is BE (host CPSR E bit is on); guest compiled >> in BE mode; and guest does access with CPSR E bit on >> >> 1) 'emulated_hw_device_val' emulator variable is BE >> 2) &mmio.data[0] holds integer in BE format; matches device >> endianity >> 3) 'data' is BE >> 4) 'data' is BE (since guest CPSR E bit is on, BE KVM host kernel >> does *not* do byteswap: cpu_to_be no effect in BE host kernel) >> 5) integer at 'vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt)' is BE >> 6) final guest target CPU register contains 5 (0x00000005) because >> hypervisor runs in BE mode, so load of BE integer will be OK >> >> guest resumes execution after 'ldr r1, [r0]', guest kernel >> knows that it runs in BE mode, it knows that it reads BE device >> memory, so it does not need to do anything else it just proceeds >> >> Use case 8 >> ---------- >> >> Emulated h/w device gives data in BE form; emulator and KVM >> host endianity is BE (host CPSR E bit is on); guest compiled >> in LE mode; and guest does access with CPSR E bit off >> >> 1) 'emulated_hw_device_val' emulator variable is BE >> 2) &mmio.data[0] holds integer in BE format; matches device >> endianity >> 3) 'data' is BE >> 4) 'data' is LE (since guest CPSR E bit is off, BE KVM host kernel >> does byteswap: cpu_to_le) >> 5) integer at 'vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt)' is LE >> 6) final guest target CPU register contains 0x05000000 because >> hypervisor runs in BE mode, so load of LE integer will be >> byteswapped value in register >> >> guest resumes execution after 'ldr r1, [r0]', guest kernel >> knows that it runs in LE mode, it knows that it reads BE device >> memory, it need to byteswap r1 before further processing so it >> does 'rev r1, r1' and proceeds with result >> >> Note that with BE kernel we actually have some initial portion >> of assembler code that is executed with CPSR bit off and it reads >> LE h/w - i.e it falls into use case 1. >> >> Summary Table (please use fixed font to see it correctly) >> ======================================== >> >> -------------------------------------------------------------- >> | Use Case # | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | >> -------------------------------------------------------------- >> | KVM Host, | LE | LE | LE | LE | BE | BE | BE | BE | >> | Emulator, | | | | | | | | | >> | Hypervisor | | | | | | | | | >> | Endianity | | | | | | | | | >> -------------------------------------------------------------- >> | Device | LE | LE | BE | BE | LE | LE | BE | BE | >> | Endianity | | | | | | | | | >> -------------------------------------------------------------- >> | Guest | LE | BE | LE | BE | BE | LE | BE | LE | >> | Access | | | | | | | | | >> | Endianity | | | | | | | | | >> -------------------------------------------------------------- >> | Step 1) | LE | LE | LE | LE | BE | BE | BE | BE | >> -------------------------------------------------------------- >> | Step 2) | LE | LE | BE | BE | LE | LE | BE | BE | >> -------------------------------------------------------------- >> | Step 3) | LE | LE | BE | BE | LE | LE | BE | BE | >> -------------------------------------------------------------- >> | Step 4) | LE | BE | BE | LE | LE | BE | BE | LE | >> -------------------------------------------------------------- >> | Step 5) | LE | BE | BE | LE | LE | BE | BE | LE | >> -------------------------------------------------------------- >> | Final Reg | no | yes | yes | no | yes | no | no | yes | >> | value | | | | | | | | | >> | byteswapped| | | | | | | | | >> -------------------------------------------------------------- >> | Guest | no | yes | yes | no | yes | no | no | yes | >> | Follows | | | | | | | | | >> | with rev | | | | | | | | | >> -------------------------------------------------------------- >> >> Few objservations >> ================= >> >> x) Note above table is symmetric wrt to BE<->LE change: >> 1<-->7 >> 2<-->8 >> 3<-->5 >> 4<-->6 >> >> x) &mmio.data[0] address always holds integer in the same >> format as emulated device endianity >> >> x) During step 4) when vcpu_data_host_to_guest function >> is used, if guest E bit value different, but everything else >> is the same, opposite result are produced (1&2, 3&4, 5&6, >> 7&8) >> >> If you reached to this end :), again, thank you very much for >> reading it! >> >> - Victor >> _______________________________________________ >> kvmarm mailing list >> kvmarm@xxxxxxxxxxxxxxxxxxxxx >> https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm > > Hi Victor, > > First of all I really appreciate the thorough description with > all the use-cases. > > Below would be a summary of what I understood from your > analysis: > > 1. Any MMIO device marked as NATIVE ENDIAN in user "Native endian" really is just a shortcut for "target endian" which is LE for ARM and BE for PPC. There shouldn't be a qemu-system-armeb or qemu-system-ppc64le. QEMU emulates everything that comes after the CPU, so imagine the ioctl struct as a bus package. Your bus doesn't care what endianness the CPU is in - it just gets data from the CPU. A bus write on the CPU however honors the endianness setting of the CPU. So when we convert from a value in register to a value on the bus we need to take this endian configuration into account. That's exactly what we are talking about here. KVM should do the cpu configured register->bus endian mapping while QEMU does the bus->device endian map. Alex > space tool (QEMU or KVMTOOL) is bad for cross-endian > Guest. For supporting cross-endian Guest we need to have > all MMIO device with fixed ENDIANESS. > > 2. We don't need to do any endianness conversions in KVM > for MMIO writes that are being forwarded to user space. It is > the job of user space (QEMU or KVMTOOL) to interpret the > endianness of MMIO write data based on device endianness. > > 3. The MMIO read operation is the one which will need > explicit handling in KVM because the target VCPU register > of MMIO read operation should be loaded with MMIO data > (returned from user space) based upon current VCPU > endianness (i.e. VCPU CPSR.E bit). > > 4. In-kernel emulated devices (such as VGIC) will have not > require any explicit endianness conversion of MMIO data for > MMIO write operations (same as point 2). > > 5. In-kernel emulated devices (such as VGIC) will have to > explicit endianness conversion of MMIO data for MMIO read > operations based on device endianness (same as point 3). > > I hope above summary of my understanding is as-per your > description. If so then I am in-support of your description. > > I think your description (and above 5 points) takes care of > all use cases of cross-endianness without changing current > MMIO ABI. > > Regards, > Anup > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html