Re: [Qemu-ppc] KVM and variable-endianness guest CPUs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> Am 22.01.2014 um 07:31 schrieb Anup Patel <anup@xxxxxxxxxxxxxx>:
> 
> On Wed, Jan 22, 2014 at 11:09 AM, Victor Kamensky
> <victor.kamensky@xxxxxxxxxx> wrote:
>> Hi Guys,
>> 
>> Christoffer and I had a bit heated chat :) on this
>> subject last night. Christoffer, really appreciate
>> your time! We did not really reach agreement
>> during the chat and Christoffer asked me to follow
>> up on this thread.
>> Here it goes. Sorry, it is very long email.
>> 
>> I don't believe we can assign any endianity to
>> mmio.data[] byte array. I believe mmio.data[] and
>> mmio.len acts just memcpy and that is all. As
>> memcpy does not imply any endianity of underlying
>> data mmio.data[] should not either.
>> 
>> Here is my definition:
>> 
>> mmio.data[] is array of bytes that contains memory
>> bytes in such form, for read case, that if those
>> bytes are placed in guest memory and guest executes
>> the same read access instruction with address to this
>> memory, result would be the same as real h/w device
>> memory access. Rest of KVM host and hypervisor
>> part of code should really take care of mmio.data[]
>> memory so it will be delivered to vcpu registers and
>> restored by hypervisor part in such way that guest CPU
>> register value is the same as it would be for real
>> non-emulated h/w read access (that is emulation part).
>> The same goes for write access, if guest writes into
>> memory and those bytes are just copied to emulated
>> h/w register it would have the same effect as real
>> mapped h/w register write.
>> 
>> In shorter form, i.e for len=4 access: endianity of integer
>> at &mmio.data[0] address should match endianity
>> of emulated h/w device behind phys_addr address,
>> regardless what is endianity of emulator, KVM host,
>> hypervisor, and guest
>> 
>> Examples that illustrate my definition
>> --------------------------------------
>> 
>> 1) LE guest (E bit is off in ARM speak) reads integer
>> (4 bytes) from mapped h/w LE device register -
>> mmio.data[3] contains MSB, mmio.data[0] contains LSB.
>> 
>> 2) BE guest (E bit is on in ARM speak) reads integer
>> from mapped h/w LE device register - mmio.data[3]
>> contains MSB, mmio.data[0] contains LSB. Note that
>> if &mmio.data[0] memory would be placed in guest
>> address space and instruction restarted with new
>> address, then it would meet BE guest expectations
>> - the guest knows that it reads LE h/w so it will byteswap
>> register before processing it further. This is BE guest ARM
>> case (regardless of what KVM host endianity is).
>> 
>> 3) BE guest reads integer from mapped h/w BE device
>> register - mmio.data[0] contains MSB, mmio.data[3]
>> contains LSB. Note that if &mmio.data[0] memory would
>> be placed in guest address space and instruction
>> restarted with new address, then it would meet BE
>> guest expectation - the guest knows that it reads
>> BE h/w so it will proceed further without any other
>> work. I guess, it is BE ppc case.
>> 
>> 
>> Arguments in favor of memcpy semantics of mmio.data[]
>> ------------------------------------------------------
>> 
>> x) What are possible values of 'len'? Previous discussions
>> imply that is always powers of 2. Why is that? Maybe
>> there will be CPU that would need to do 5 bytes mmio
>> access, or 6 bytes. How do you assign endianity to
>> such case? 'len' 5 or 6, or any works fine with
>> memcpy semantics. I admit it is hypothetical case, but
>> IMHO it tests how clean ABI definition is.
>> 
>> x) Byte array does not have endianity because it
>> does not have any structure. If one would want to
>> imply structure why mmio is not defined in such way
>> so structure reflected in mmio definition?
>> Something like:
>> 
>> 
>>                /* KVM_EXIT_MMIO */
>>                struct {
>>                          __u64 phys_addr;
>>                          union {
>>                               __u8 byte;
>>                               __u16 hword;
>>                               __u32 word;
>>                               __u64 dword;
>>                          }  data;
>>                          __u32 len;
>>                          __u8  is_write;
>>                } mmio;
>> 
>> where len is really serves as union discriminator and
>> only allowed len values are 1, 2, 4, 8.
>> In this case, I agree, endianity of integer types
>> should be defined. I believe, use of byte array strongly
>> implies that original intent was to have semantics of
>> byte stream copy, just like memcpy does.
>> 
>> x) Note there is nothing wrong with user kernel ABI to
>> use just bytes stream as parameter. There is already
>> precedents like 'read' and 'write' system calls :).
>> 
>> x) Consider case when KVM works with emulated memory mapped
>> h/w devices where some devices operate in LE mode and others
>> operate in BE mode. It is defined by semantics of real h/w
>> device which is it, and should be emulated by emulator and KVM
>> given all other context. As far as mmio.data[] array concerned, if the
>> same integer value is read from these devices registers, mmio.data[]
>> memory should contain integer in opposite endianity for these
>> two cases, i.e MSB is data[0] in one case and MSB is
>> data[3] is in another case. It cannot be the same, because
>> except emulator and guest kernel, all other, like KVM host
>> and hypervisor, have no clue what endianity of device
>> actually is - it should treat mmio.data[] in the same way.
>> But resulting guest target CPU register would need to contain
>> normal integer value in one case and byteswapped in another,
>> because guest kernel would use it directly in one case and
>> byteswap it in another. Byte stream semantics allows to do
>> that. I don't see how it could happen if you fixate mmio.data[]
>> endianity in such way that it would contain integer in
>> the same format for BE and LE emulated device types.
>> 
>> If by this point you agree, that mmio.data[] user-land/kernel
>> ABI semantics should be just memcpy, stop reading :). If not,
>> you may would like to take a look at below appendix where I
>> described in great details endianity of data at different
>> points along mmio processing code path of existing ARM LE KVM,
>> and proposed ARM BE KVM. Note appendix, is very long and very
>> detailed, sorry about that, but I feel that earlier more
>> digested explanations failed, so it driven me to write out
>> all details how I see them. If I am wrong, I hope it would be
>> easier for folks to point in detailed explanation places
>> where my logic goes bad. Also, I am not sure whether this
>> mail thread is good place to discuss all details described
>> in the appendix. Christoffer, please advise whether I should take
>> that one back on [1]. But I hope this bigger picture may help to
>> see the mmio.data[] semantics issue in context.
>> 
>> More inline and appendix is at the end.
>> 
>>> On 20 January 2014 11:19, Christoffer Dall <christoffer.dall@xxxxxxxxxx> wrote:
>>>> On Mon, Jan 20, 2014 at 03:22:11PM +0100, Alexander Graf wrote:
>>>> 
>>>>> On 17.01.2014, at 19:52, Peter Maydell <peter.maydell@xxxxxxxxxx> wrote:
>>>>> 
>>>>>> On 17 January 2014 17:53, Peter Maydell <peter.maydell@xxxxxxxxxx> wrote:
>>>>>> Specifically, the KVM API says "here's a uint8_t[] byte
>>>>>> array and a length", and the current QEMU code treats that
>>>>>> as "this is a byte array written as if the guest CPU
>>>>>> (a) were in TARGET_WORDS_BIGENDIAN order and (b) wrote its
>>>>>> I/O access to this buffer rather than to the device".
>>>>>> 
>>>>>> The KVM API docs don't actually specify the endianness
>>>>>> semantics of the byte array, but I think that that really
>>>>>> needs to be nailed down. I can think of a couple of options:
>>>>>> * always LE
>>>>>> * always BE
>>>>>>  [these first two are non-starters because they would
>>>>>>  break either x86 or PPC existing code]
>>>>>> * always the endianness the guest is at the time
>>>>>> * always some arbitrary endianness based purely on the
>>>>>>  endianness the KVM implementation used historically
>>>>>> * always the endianness of the host QEMU binary
>>>>>> * something else?
>>>>>> 
>>>>>> Any preferences? Current QEMU code basically assumes
>>>>>> "always the endianness of TARGET_WORDS_BIGENDIAN",
>>>>>> which is pretty random.
>>>>> 
>>>>> Having thought a little more about this, my opinion is:
>>>>> 
>>>>> * we should specify that the byte order of the mmio.data
>>>>>  array is host kernel endianness (ie same endianness
>>>>>  as the QEMU process itself) [this is what it actually
>>>>>  is, I think, for all the cases that work today]
>> 
>> In above please consider two types of mapped emulated
>> h/w devices: BE and LE they cannot have mmio.data in the
>> same endianity. Currently in all observable cases LE ARM
>> and BE PPC devices endianity matches kernel/qemu
>> endianity but it would break when BE ARM is introduced
>> or LE PPC or one would start emulating BE devices on LE
>> ARM.
>> 
>>>>> * we should fix the code path in QEMU for handling
>>>>>  mmio.data which currently has the implicit assumption
>>>>>  that when using KVM TARGET_WORDS_BIGENDIAN is the same
>>>>>  as the QEMU host process endianness (because it's using
>>>>>  load/store functions which swap if TARGET_WORDS_BIGENDIAN
>>>>>  is different from HOST_WORDS_BIGENDIAN)
>> 
>> I do not follow above. Maybe I am missing bigger context.
>> What is CPU under discussion in above? On ARM V7 system
>> when LE device is accessed as integer &mmio.data[0] address
>> would contain integer is in LE format, ie mmio.data[0] is LSB.
>> 
>> Here is gdb session of LE qemu running on V7 LE kernel and
>> TC1 LE guest. Guest kernel accesses sys_cfgstat register which is
>> arm_sysctl registers with offset of 0xa8. Note.arm_sysct is memory
>> mapped LE device.
>> Please check run->mmio structure after read
>> (cpu_physical_memory_rw) completes it is in 4 bytes integer in
>> LE format mmio.data[0] is LSB and is equal to 1
>> (s->syscfgstat value):
>> 
>> (gdb) bt
>> #0  arm_sysctl_read (opaque=0x95a600, offset=168, size=4) at
>> /home/root/20131219/qemu-be/hw/misc/arm_sysctl.c:127
>> #1  0x0023b9b4 in memory_region_read_accessor (mr=0x95b8e0,
>> addr=<optimized out>, value=0xb5c0dc18, size=4, shift=0,
>> mask=4294967295)
>>    at /home/root/20131219/qemu-be/memory.c:407
>> #2  0x0023aba4 in access_with_adjusted_size (addr=4294967295,
>> value=0xb5c0dc18, value@entry=0xb5c0dc10, size=size@entry=4,
>> access_size_min=1,
>>    access_size_max=2357596, access=access@entry=0x23b96c
>> <memory_region_read_accessor>, mr=mr@entry=0x95b8e0) at
>> /home/root/20131219/qemu-be/memory.c:477
>> #3  0x0023f95c in memory_region_dispatch_read1 (size=4, addr=168,
>> mr=0x95b8e0) at /home/root/20131219/qemu-be/memory.c:944
>> #4  memory_region_dispatch_read (size=4, pval=0xb5c0dc68, addr=168,
>> mr=0x95b8e0) at /home/root/20131219/qemu-be/memory.c:966
>> #5  io_mem_read (mr=mr@entry=0x95b8e0, addr=<optimized out>,
>> pval=pval@entry=0xb5c0dc68, size=size@entry=4) at
>> /home/root/20131219/qemu-be/memory.c:1743
>> #6  0x001abd38 in address_space_rw (as=as@entry=0x8102d8
>> <address_space_memory>, addr=469827752, buf=buf@entry=0xb6fd6028 "",
>> len=4, is_write=false,
>>    is_write@entry=true) at /home/root/20131219/qemu-be/exec.c:2025
>> #7  0x001abf90 in cpu_physical_memory_rw (addr=<optimized out>,
>> buf=buf@entry=0xb6fd6028 "", len=<optimized out>, is_write=0)
>>    at /home/root/20131219/qemu-be/exec.c:2070
>> #8  0x00239e00 in kvm_cpu_exec (cpu=cpu@entry=0x8758f8) at
>> /home/root/20131219/qemu-be/kvm-all.c:1701
>> #9  0x001a3f78 in qemu_kvm_cpu_thread_fn (arg=0x8758f8) at
>> /home/root/20131219/qemu-be/cpus.c:874
>> #10 0xb6cae06c in start_thread (arg=0xb5c0e310) at pthread_create.c:314
>> #11 0xb69f5070 in ?? () at
>> ../ports/sysdeps/unix/sysv/linux/arm/clone.S:97 from /lib/libc.so.6
>> #12 0xb69f5070 in ?? () at
>> ../ports/sysdeps/unix/sysv/linux/arm/clone.S:97 from /lib/libc.so.6
>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>> (gdb) p /x s->sys_cfgstat
>> $25 = 0x1
>> (gdb) finish
>> Run till exit from #0  arm_sysctl_read (opaque=0x95a600, offset=168,
>> size=4) at /home/root/20131219/qemu-be/hw/misc/arm_sysctl.c:127
>> memory_region_read_accessor (mr=0x95b8e0, addr=<optimized out>,
>> value=0xb5c0dc18, size=4, shift=0, mask=4294967295) at
>> /home/root/20131219/qemu-be/memory.c:408
>> 408        trace_memory_region_ops_read(mr, addr, tmp, size);
>> Value returned is $26 = 1
>> (gdb) enable 2
>> (gdb) cont
>> Continuing.
>> 
>> Breakpoint 2, kvm_cpu_exec (cpu=cpu@entry=0x8758f8) at
>> /home/root/20131219/qemu-be/kvm-all.c:1660
>> 1660            kvm_arch_pre_run(cpu, run);
>> (gdb) bt
>> #0  kvm_cpu_exec (cpu=cpu@entry=0x8758f8) at
>> /home/root/20131219/qemu-be/kvm-all.c:1660
>> #1  0x001a3f78 in qemu_kvm_cpu_thread_fn (arg=0x8758f8) at
>> /home/root/20131219/qemu-be/cpus.c:874
>> #2  0xb6cae06c in start_thread (arg=0xb5c0e310) at pthread_create.c:314
>> #3  0xb69f5070 in ?? () at
>> ../ports/sysdeps/unix/sysv/linux/arm/clone.S:97 from /lib/libc.so.6
>> #4  0xb69f5070 in ?? () at
>> ../ports/sysdeps/unix/sysv/linux/arm/clone.S:97 from /lib/libc.so.6
>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>> (gdb) p /x run->mmio
>> $27 = {phys_addr = 0x1c0100a8, data = {0x1, 0x0, 0x0, 0x0, 0x0, 0x0,
>> 0x0, 0x0}, len = 0x4, is_write = 0x0}
>> 
>> Also please look at adjust_endianness function and
>> struct MemoryRegion 'endianness' field. IMHO in qemu it
>> works quite nicely already. MemoryRegion 'read' and 'write'
>> callbacks return/get data in native format adjust_endianness
>> function checks whether emulated device endianness matches
>> emulator endianness and if it is different it does byteswap
>> according to size. As in above example arm_sysctl_ops memory
>> region should be marked as DEVICE_LITTLE_ENDIAN when it
>> returns s->sys_cfgstat value LE qemu sees that endianity
>> matches and it does not byteswap of result, so integer at
>> &mmio.data[0] address is in LE form. When qemu would
>> run in BE mode on BE kernel, it would see that endianity
>> mismatches and it will byteswap s->sys_cfgstat native value
>> (BE), so mmio.data would contain integer in LE format again.
>> 
>> Note in currently committed code arm_sysctl_ops endianity
>> is DEVICE_NATIVE_ENDIAN, which is wrong - real vexpress
>> arm_sysctl device always gives/receives data in LE format regardless
>> of current CPSR E bit value, so it cannot be marked as NATIVE.
>> LE and BE kernels always read it as LE device; BE kernel follows
>> with byteswap. It was OK while we just run qemu in LE, but it
>> should be fixed to be LITTLE_ENDIAN for BE qemu work correctly
>> ... and actually that device and few other ARM specific devices
>> endianity change to LITTLE_ENDIAN was the only change in qemu
>> to make BE KVM to work.
>> 
>>>> 
>>>> Yes, I fully agree :).
>>> Great, I'll prepare a patch for the KVM API documentation.
>>> 
>>> -Christoffer
>>> _______________________________________________
>>> kvmarm mailing list
>>> kvmarm@xxxxxxxxxxxxxxxxxxxxx
>>> https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm
>> 
>> Thanks,
>> Victor
>> 
>> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-January/thread.html#223186
>> 
>> 
>>    Appendix
>>    Data path endianity in ARM KVM mmio
>>    ===================================
>> 
>> This writeup considers several scenarios and tracks endianity
>> of data how it travels from emulator to guest CPU register, in
>> case of ARM KVM. It starts with currently committed code for LE
>> KVM host case and further discusses proposed BE KVM host
>> arrangement.
>> 
>> Just to restrict discussion writeup considers code path of
>> integer (4 bytes) read from h/w mapped emulated device memory.
>> Writeup considers endianity of essential places involved in such
>> code path.
>> 
>> For all cases when endianity is defined, it is assumed that
>> values under consideration are in memory (opposite to be in
>> register that does not have endianity). I.e even if function
>> variable could be actually allocated in CPU register writeup
>> will reference to it as it is in memory, just to keep
>> discussion clean, except for final guest CPU register.
>> 
>> Let's consider the following places along data path from
>> emulator to guest CPU register:
>> 
>> 1) emulator code that holds integer value to be read, assume
>> it would be global 'int emulated_hw_device_val' variable.
>> Normally in emulator it is held in native endian format - i.e
>> it is CPSR E bit is the same as kernel CPSR E bit. Just for
>> discussion sake assume that this h/w device registers
>> holds 5 as its value.
>> 
>> 2) KVM_EXIT_MMIO part of 'struct kvm_run' structure, i.e
>> mmio.data byte array. Byte array does not have endianity,
>> but for this discussion it would track endianity of integer
>> at &mmio.data[0] address
>> 
>> 3) 'data' variable type of 'unsigned long' in
>> kvm_handle_mmio_return function before vcpu_data_host_to_guest
>> call. KVM host mmio_read_buf function is used to fill this
>> variable from mmio.data buffer. mmio_read_buf actually
>> acts as memcpy from mmio.data buffer address,
>> just taking access size in account.
>> 
>> 4) the same 'data' variable as above, but after
>> vcpu_data_host_to_guest function call, just before it is copied
>> to vcpu_reg target register location. Note
>> vcpu_data_host_to_guest function may byteswap value of 'data'
>> depending on current KVM host endianity and value of
>> guest CPSR E bit.
>> 
>> 5) guest CPU spilled register array, location of target register
>> i.e integer at vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt) address
>> 
>> 6) finally guest CPU register filled from vcpu_reg just before
>> guest resume execution of trapped emulated instruction. Note
>> it is done by hypervisor part of code and hypervisor EE bit is
>> the same as KVM host CPSR E bit.
>> 
>> Note again, KVM host, emulator, and hypervisor part of code (guest
>> CPU registers save and restore code) always run in the same
>> endianity. Endianity of accessed emulated devices and endianity
>> of guest varies independently of KVM host endianity.
>> 
>> Below sections consider all permutations of all possible cases,
>> it maybe quite boring to read. I've created summary table at
>> the end, you can jump to the table, after reading few cases.
>> But if you have objections and you see things happen differently
>> please comment inline of the use cases steps.
>> 
>> LE KVM host
>> ===========
>> 
>> Use case 1
>> ----------
>> 
>> Emulated h/w device gives data in LE form; emulator and KVM
>> host endianity is LE (host CPSR E bit is off); guest compiled
>> in LE mode; and guest does access with CPSR E bit off
>> 
>> 1) 'emulated_hw_device_val' emulator variable is LE
>> 2) &mmio.data[0] holds integer in LE format, matches device
>> endianity
>> 3) 'data' is LE
>> 4) 'data' is LE (since guest CPSR E bit is off no byteswap)
>> 5) integer at 'vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt)' is LE
>> 6) final guest target CPU register contains 5 (0x00000005)
>> 
>> guest resumes execution ... Let's say after 'ldr r1, [r0]'
>> instruction, where r0 holds address of devices, it knows
>> that it reads LE mapped h/w so no addition processing is
>> needed
>> 
>> Use case 2
>> ----------
>> 
>> Emulated h/w device gives data in LE form; emulator and KVM
>> host endianity is LE (host CPSR E bit is off); guest compiled
>> in BE mode; and guest does access with CPSR E bit on
>> 
>> 1) 'emulated_hw_device_val' emulator variable is LE
>> 2) &mmio.data[0] holds integer in LE format; matches device
>> endianity
>> 3) 'data' is LE
>> 4) 'data' is BE (since guest CPSR E bit is on, vcpu_data_host_to_guest
>> will do byteswap: cpu_to_be)
>> 5) integer at 'vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt)' is BE
>> 6) final guest target CPU register contains 0x05000000
>> 
>> guest resumes execution after 'ldr r1, [r0]', guest kernel
>> knows that it runs in BE mode (E bit on), it knows that it reads
>> LE device memory, it needs to byteswap r1 before further
>> processing so it does 'rev r1, r1' and proceed with result
>> 
>> Use case 3
>> ----------
>> 
>> Emulated h/w device gives data in BE form; emulator and KVM
>> host endianity is LE (host CPSR E bit is off); guest compiled
>> in LE mode; and guest does access with CPSR E bit off
>> 
>> 1) 'emulated_hw_device_val' emulator variable is LE
>> 2) &mmio.data[0] holds integer in BE format; emulator byteswaps
>> it because it knows that device endianity is opposite to native,
>> and it should match device endianity
>> 3) 'data' is BE
>> 4) 'data' is BE (since guest CPSR E bit is off no byteswap)
>> 5) integer at 'vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt)' is BE
>> 6) final guest target CPU register contains 0x05000000
>> 
>> guest resumes execution after 'ldr r1, [r0]', guest kernel
>> knows that it runs in LE mode (E bit off), it knows that it
>> reads BE device memory, it need to byteswap r1 before further
>> processing so it does 'rev r1, r1' and proceeds with result
>> 
>> Use case 4
>> ----------
>> 
>> Emulated h/w device gives data in BE form; emulator and KVM
>> host endianity is LE (host CPSR E bit is off); guest compiled
>> in BE mode; and guest does access with CPSR E bit on
>> 
>> 1) 'emulated_hw_device_val' emulator variable is LE
>> 2) &mmio.data[0] holds integer in BE format; emulator byteswaps
>> it because it knows that device endianity is opposite to native,
>> and should match device endianity
>> 3) 'data' is BE
>> 4) 'data' is LE (since guest CPSR E bit is on, vcpu_data_host_to_guest
>> will do byteswap: cpu_to_be)
>> 5) integer at 'vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt)' is LE
>> 6) final guest target CPU register contains 5 (0x00000005)
>> 
>> guest resumes execution after 'ldr r1, [r0]', guest kernel
>> knows that it runs in BE mode, it knows that it reads BE device
>> memory, so it does not need to do anything before further
>> processing.
>> 
>> 
>> Above uses cases that is exactly what we have now after Marc's
>> commit to support BE guest on LE KVM host. Further use
>> cases describe how it would work with BE KVM patches I proposed.
>> It is understood that it is subject of further discussion.
>> 
>> 
>> BE KVM host
>> ===========
>> 
>> Use case 5
>> ----------
>> 
>> Emulated h/w device gives data in LE form; emulator and KVM
>> host endianity is BE (host CPSR E bit is on); guest compiled
>> in BE mode; and guest does access with CPSR E bit on
>> 
>> 1) 'emulated_hw_device_val' emulator variable is BE
>> 2) &mmio.data[0] holds integer in LE format; emulator byteswaps
>> it because it knows that device endianity is opposite to native;
>> matches device endianity
>> 3) 'data' is LE
>> 4) 'data' is LE (since guest CPSR E bit is on, BE KVM host kernel
>> does *not* do byteswap: cpu_to_be no effect in BE host kernel)
>> 5) integer at 'vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt)' is LE
>> 6) final guest target CPU register contains 0x05000000 because
>> hypervisor runs in BE mode, so load of LE integer will be
>> byteswapped value in register
>> 
>> guest resumes execution after 'ldr r1, [r0]', guest kernel
>> knows that it runs in BE mode, it knows that it reads LE device
>> memory, it need to byteswap r1 before further processing so it
>> does 'rev r1, r1' and proceeds with result
>> 
>> Use case 6
>> ----------
>> 
>> Emulated h/w device gives data in LE form; emulator and KVM
>> host endianity is BE (host CPSR E bit is on); guest compiled
>> in LE mode; and guest does access with CPSR E bit off
>> 
>> 1) 'emulated_hw_device_val' emulator variable is BE
>> 2) &mmio.data[0] holds integer in LE format; emulator byteswaps
>> it because it knows that device endianity is opposite to native;
>> matches device endianity
>> 3) 'data' is LE
>> 4) 'data' is BE (since guest CPSR E bit is off, BE KVM host kernel
>> does byteswap: cpu_to_le)
>> 5) integer at 'vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt)' is BE
>> 6) final guest target CPU register contains 5 (0x00000005) because
>> hypervisor runs in BE mode, so load of BE integer will be OK
>> 
>> guest resumes execution after 'ldr r1, [r0]', guest kernel
>> knows that it runs in LE mode, it knows that it reads LE device
>> memory, so it does not need to do anything else it just proceeds
>> 
>> Use case 7
>> ----------
>> 
>> Emulated h/w device gives data in BE form; emulator and KVM
>> host endianity is BE (host CPSR E bit is on); guest compiled
>> in BE mode; and guest does access with CPSR E bit on
>> 
>> 1) 'emulated_hw_device_val' emulator variable is BE
>> 2) &mmio.data[0] holds integer in BE format; matches device
>> endianity
>> 3) 'data' is BE
>> 4) 'data' is BE (since guest CPSR E bit is on, BE KVM host kernel
>> does *not* do byteswap: cpu_to_be no effect in BE host kernel)
>> 5) integer at 'vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt)' is BE
>> 6) final guest target CPU register contains 5 (0x00000005) because
>> hypervisor runs in BE mode, so load of BE integer will be OK
>> 
>> guest resumes execution after 'ldr r1, [r0]', guest kernel
>> knows that it runs in BE mode, it knows that it reads BE device
>> memory, so it does not need to do anything else it just proceeds
>> 
>> Use case 8
>> ----------
>> 
>> Emulated h/w device gives data in BE form; emulator and KVM
>> host endianity is BE (host CPSR E bit is on); guest compiled
>> in LE mode; and guest does access with CPSR E bit off
>> 
>> 1) 'emulated_hw_device_val' emulator variable is BE
>> 2) &mmio.data[0] holds integer in BE format; matches device
>> endianity
>> 3) 'data' is BE
>> 4) 'data' is LE (since guest CPSR E bit is off, BE KVM host kernel
>> does byteswap: cpu_to_le)
>> 5) integer at 'vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt)' is LE
>> 6) final guest target CPU register contains 0x05000000 because
>> hypervisor runs in BE mode, so load of LE integer will be
>> byteswapped value in register
>> 
>> guest resumes execution after 'ldr r1, [r0]', guest kernel
>> knows that it runs in LE mode, it knows that it reads BE device
>> memory, it need to byteswap r1 before further processing so it
>> does 'rev r1, r1' and proceeds with result
>> 
>> Note that with BE kernel we actually have some initial portion
>> of assembler code that is executed with CPSR bit off and it reads
>> LE h/w - i.e it falls into use case 1.
>> 
>> Summary Table (please use fixed font to see it correctly)
>> ========================================
>> 
>> --------------------------------------------------------------
>> | Use Case # | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   |
>> --------------------------------------------------------------
>> | KVM Host,  | LE  | LE  | LE  | LE  | BE  | BE  | BE  | BE  |
>> | Emulator,  |     |     |     |     |     |     |     |     |
>> | Hypervisor |     |     |     |     |     |     |     |     |
>> | Endianity  |     |     |     |     |     |     |     |     |
>> --------------------------------------------------------------
>> | Device     | LE  | LE  | BE  | BE  | LE  | LE  | BE  | BE  |
>> | Endianity  |     |     |     |     |     |     |     |     |
>> --------------------------------------------------------------
>> | Guest      | LE  | BE  | LE  | BE  | BE  | LE  | BE  | LE  |
>> | Access     |     |     |     |     |     |     |     |     |
>> | Endianity  |     |     |     |     |     |     |     |     |
>> --------------------------------------------------------------
>> | Step 1)    | LE  | LE  | LE  | LE  | BE  | BE  | BE  | BE  |
>> --------------------------------------------------------------
>> | Step 2)    | LE  | LE  | BE  | BE  | LE  | LE  | BE  | BE  |
>> --------------------------------------------------------------
>> | Step 3)    | LE  | LE  | BE  | BE  | LE  | LE  | BE  | BE  |
>> --------------------------------------------------------------
>> | Step 4)    | LE  | BE  | BE  | LE  | LE  | BE  | BE  | LE  |
>> --------------------------------------------------------------
>> | Step 5)    | LE  | BE  | BE  | LE  | LE  | BE  | BE  | LE  |
>> --------------------------------------------------------------
>> | Final Reg  | no  | yes | yes | no  | yes | no  | no  | yes |
>> | value      |     |     |     |     |     |     |     |     |
>> | byteswapped|     |     |     |     |     |     |     |     |
>> --------------------------------------------------------------
>> | Guest      | no  | yes | yes | no  | yes | no  | no  | yes |
>> | Follows    |     |     |     |     |     |     |     |     |
>> | with rev   |     |     |     |     |     |     |     |     |
>> --------------------------------------------------------------
>> 
>> Few objservations
>> =================
>> 
>> x) Note above table is symmetric wrt to BE<->LE change:
>>       1<-->7
>>       2<-->8
>>       3<-->5
>>       4<-->6
>> 
>> x) &mmio.data[0] address always holds integer in the same
>> format as emulated device endianity
>> 
>> x) During step 4) when vcpu_data_host_to_guest function
>> is used, if guest E bit value different, but everything else
>> is the same, opposite result are produced (1&2, 3&4, 5&6,
>> 7&8)
>> 
>> If you reached to this end :), again, thank you very much for
>> reading it!
>> 
>> - Victor
>> _______________________________________________
>> kvmarm mailing list
>> kvmarm@xxxxxxxxxxxxxxxxxxxxx
>> https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm
> 
> Hi Victor,
> 
> First of all I really appreciate the thorough description with
> all the use-cases.
> 
> Below would be a summary of what I understood from your
> analysis:
> 
> 1. Any MMIO device marked as NATIVE ENDIAN in user

"Native endian" really is just a shortcut for "target endian" which is LE for ARM and BE for PPC. There shouldn't be a qemu-system-armeb or qemu-system-ppc64le.

QEMU emulates everything that comes after the CPU, so imagine the ioctl struct as a bus package. Your bus doesn't care what endianness the CPU is in - it just gets data from the CPU.

A bus write on the CPU however honors the endianness setting of the CPU. So when we convert from a value in register to a value on the bus we need to take this endian configuration into account.

That's exactly what we are talking about here. KVM should do the cpu configured register->bus endian mapping while QEMU does the bus->device endian map.


Alex

> space tool (QEMU or KVMTOOL) is bad for cross-endian
> Guest. For supporting cross-endian Guest we need to have
> all MMIO device with fixed ENDIANESS.
> 
> 2. We don't need to do any endianness conversions in KVM
> for MMIO writes that are being forwarded to user space. It is
> the job of user space (QEMU or KVMTOOL) to interpret the
> endianness of MMIO write data based on device endianness.
> 
> 3. The MMIO read operation is the one which will need
> explicit handling in KVM because the target VCPU register
> of MMIO read operation should be loaded with MMIO data
> (returned from user space) based upon current VCPU
> endianness (i.e. VCPU CPSR.E bit).
> 
> 4. In-kernel emulated devices (such as VGIC) will have not
> require any explicit endianness conversion of MMIO data for
> MMIO write operations (same as point 2).
> 
> 5. In-kernel emulated devices (such as VGIC) will have to
> explicit endianness conversion of MMIO data for MMIO read
> operations based on device endianness (same as point 3).
> 
> I hope above summary of my understanding is as-per your
> description. If so then I am in-support of your description.
> 
> I think your description (and above 5 points) takes care of
> all use cases of cross-endianness without changing current
> MMIO ABI.
> 
> Regards,
> Anup
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux