Re: [PATCH v2] KVM: Specify byte order for KVM_EXIT_MMIO

On 25 January 2014 01:20, Alexander Graf <agraf@xxxxxxx> wrote:
>
>
>> Am 25.01.2014 um 03:37 schrieb Victor Kamensky <victor.kamensky@xxxxxxxxxx>:
>>
>>> On 24 January 2014 18:15, Alexander Graf <agraf@xxxxxxx> wrote:
>>>
>>>> On 25.01.2014, at 02:58, Scott Wood <scottwood@xxxxxxxxxxxxx> wrote:
>>>>
>>>>> On Sat, 2014-01-25 at 00:24 +0000, Peter Maydell wrote:
>>>>>> On 24 January 2014 23:51, Scott Wood <scottwood@xxxxxxxxxxxxx> wrote:
>>>>>>> On Fri, 2014-01-24 at 15:39 -0800, Christoffer Dall wrote:
>>>>>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>>>>>> index 366bf4b..6dbd68c 100644
>>>>>>> --- a/Documentation/virtual/kvm/api.txt
>>>>>>> +++ b/Documentation/virtual/kvm/api.txt
>>>>>>> @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which could not be satisfied
>>>>>>> by kvm.  The 'data' member contains the written data if 'is_write' is
>>>>>>> true, and should be filled by application code otherwise.
>>>>>>>
>>>>>>> +The 'data' member byte order is host kernel native endianness, regardless of
>>>>>>> +the endianness of the guest, and represents the value as it would go on the
>>>>>>> +bus in real hardware.  The host user space should always be able to do:
>>>>>>> +<type> val = *((<type> *)mmio.data).
>>>>>>
>>>>>> Host userspace should be able to do that with what results?  It would
>>>>>> only produce a directly usable value if host endianness is the same as
>>>>>> the emulated device's endianness.
>>>>>
>>>>> With the result that it gets the value the CPU has sent out on
>>>>> the bus as the memory transaction.
>>>>
>>>> Doesn't that assume the host kernel endianness is the same as the bus
>>>> (or rather, that the host CPU would not swap such an access before it
>>>> hits the bus)?
>>>>
>>>> If you take the same hardware and boot a little endian host kernel one
>>>> day, and a big endian host kernel the next, the bus doesn't change, and
>>>> neither should the bytewise (assuming address invariance) contents of
>>>> data[].  How data[] would look when read as a larger integer would of
>>>> course change -- but that's due to how you're reading it.
>>>>
>>>> It's clear to say that a value in memory has been stored there in host
>>>> endianness when the value is as you would want to see it in a CPU
>>>> register, but it's less clear when you talk about it relative to values
>>>> on a bus.  It's harder to correlate that to something that is software
>>>> visible.
>>>>
>>>> I don't think there's any actual technical difference between your
>>>> wording and mine when each wording is properly interpreted, but I
>>>> suspect my wording is less likely to be misinterpreted (I could be
>>>> wrong).
>>>>
>>>>> Obviously if what userspace
>>>>> is emulating is a bus which has a byteswapping bridge or if it's
>>>>> being helpful to device emulation by providing "here's the value
>>>>> even though you think you're wired up backwards" then it needs
>>>>> to byteswap.
>>>>
>>>> Whether the emulated bus has "a byteswapping bridge" doesn't sound like
>>>> something that depends on the endianness that the host CPU is currently
>>>> running in.
>>>>
>>>>>> How about a wording like this:
>>>>>>
>>>>>> The 'data' member contains, in its first 'len' bytes, the value as it
>>>>>> would appear if the guest had accessed memory rather than I/O.
>>>>>
>>>>> I think this is confusing, because now userspace authors have
>>>>> to figure out how to get back to "value X of size Y at address Z"
>>>>> by interpreting this text... Can you write out the equivalent of
>>>>> Christoffer's text "here's how you get the memory transaction
>>>>> value" for what you want?
>>>>
>>>> Userspace swaps the value if and only if userspace's endianness differs
>>>> from the endianness with which the device interprets the data
>>>> (regardless of whether said interpretation is considered natural or
>>>> swapped relative to the way the bus is documented).  It's similar to how
>>>> userspace would handle emulating DMA.
>>>>
>>>> KVM swaps the value if and only if the endianness of the guest access
>>>> differs from that of the host, i.e. if it would have done swapping when
>>>> emulating an ordinary memory access.
>>>>
>>>>> (Also, value as it would appear to who?)
>>>>
>>>> As it would appear to anyone.  It works because data[] actually is
>>>> memory.  Any difference in how data appears based on the reader's
>>>> context would already be reflected when the reader performs the load.
>>>>
>>>>> I think your wording implies that the order of bytes in data[] depend
>>>>> on the guest CPU "usual byte order", ie the order which the CPU
>>>>> does not do a byte-lane-swap for (LE for ARM, BE for PPC),
>>>>> and it would mean it would come out differently from
>>>>> my/Alex/Christoffer's proposal if the host kernel was the opposite
>>>>> endianness from that "usual" order.
>>>>
>>>> It doesn't depend on "usual" anything.  The only thing it implicitly
>>>> says about guest byte order is that it's KVM's job to implement any
>>>> swapping if the endianness of the guest access is different from the
>>>> endianness of the host kernel access (whether it's due to the guest's
>>>> mode, the way a page is mapped, the instruction used, etc).
>>>>
>>>>> Finally, I think it's a bit confusing in that "as if the guest had
>>>>> accessed memory" is assigning implicit semantics to memory
>>>>> in the emulated system, when memory is actually kind of outside
>>>>> KVM's purview because it's not part of the CPU.
>>>>
>>>> That's sort of the point.  It defines it in a way that is independent of
>>>> the CPU, and thus independent of what endianness the CPU operates in.
>>>
>>> Ok, let's go through the combinations for a 32-bit write of 0x01020304 on PPC and what data[] looks like
>>>
>>> your proposal:
>>>
>>>  BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>>>  LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>>>  BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>>>  LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>>
>> Actually it is { 0x01, 0x02, 0x03, 0x04 } in all cases.
>> First of all, it does not depend on host endianness, because in
>> Scott's case it is defined in terms where the KVM host is not
>> even present.
>>
>> If you take real h/w (no emulation) and put memory in place of
>> the I/O region, then as long as the same driver performs the
>> same access operation on the same device, the bytes will always
>> look the same. When the driver runs in one endianness it just
>> writes the word; when it runs in the opposite endianness it
>> swaps the word first and then writes it. Byte-wise it is the
>> same memory all the time, and the device sees the same value in
>> both cases.
>>
>> Look at writel, readl, readl_relaxed, etc. A clean,
>> endian-agnostic driver just uses those accessors, and they
>> handle the byteswap, if needed, under the cover.
>
> You're thinking Linux again. Of course Linux does things on
> its side to compensate for endian swaps on the bus.

Anyone who wants to work with the same h/w in an endian-agnostic
way will have to do those endian swaps for one of the cases.
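To make that concrete, here is a minimal C sketch of the point, not
kernel code: swap32() and store_be32() below are hypothetical
stand-ins for the kernel's cpu_to_be32()/writel-style accessors. An
endian-agnostic driver always goes through such helpers, which
byteswap under the cover, so the byte-wise contents the device sees
are identical whichever endianness the CPU happens to run in.

```c
#include <stdint.h>
#include <string.h>

/* Pure 32-bit byteswap, as cpu_to_be32() would do on an LE CPU. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Runtime endianness probe: look at the lowest-addressed byte of 1. */
int cpu_is_little_endian(void)
{
    uint32_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Store 'val' the way a big-endian device expects it: swap first when
 * running little-endian, then do a plain native store. Byte-wise the
 * result is { 0x01, 0x02, 0x03, 0x04 } for val = 0x01020304 on any
 * host, which is Victor's invariant. */
void store_be32(uint8_t mmio[4], uint32_t val)
{
    if (cpu_is_little_endian())
        val = swap32(val);
    memcpy(mmio, &val, 4);
}
```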

> It even
> does it on BE PPC if you access devices through swizzling
> buses, but we don't care as hypervisor. All we know in kvm is:
>
>   - instruction used for access
>   -> originating register (value)
>   -> access size (len)
>   -> virtual address (which we translate to phys)
>   - guest endianness mode

Do you agree that, in the above, if you just change the guest
endianness then the device will see a different (byteswapped)
value? As far as the device is concerned those are completely
different transactions with different effects, so why compare
them at all? If you want the same transaction, with the same
effect on the device, then when the guest switches endianness
it will need to add or drop a byteswap.
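The same argument can be written out as a toy model in C (a model of
the byte-wise, address-invariant view of the bus, not KVM code;
store32() and byteswap32() are illustrative names):

```c
#include <stdint.h>

/* Toy model of the byte-wise (address-invariant) view of a 32-bit
 * store. A big-endian store of 0x01020304 puts
 * { 0x01, 0x02, 0x03, 0x04 } on the bus; a little-endian store of
 * the same register value puts { 0x04, 0x03, 0x02, 0x01 }. */
void store32(uint8_t bus[4], uint32_t val, int store_big_endian)
{
    for (int i = 0; i < 4; i++) {
        uint8_t byte = (uint8_t)(val >> (24 - 8 * i)); /* MSB first */

        bus[store_big_endian ? i : 3 - i] = byte;
    }
}

/* The byteswap the guest must add (or drop) when it switches
 * endianness, if it wants the device to see the same transaction. */
uint32_t byteswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}
```

A BE guest doing store32(bus, 0x01020304, 1) and an LE guest doing
store32(bus, byteswap32(0x01020304), 0) produce identical bus bytes;
an LE guest that stores the raw value without the swap does not.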

Thanks,
Victor

> So my example below was trying to show what happens
> when value=0x01020304, len=4 with different host/guest endian
> combinations.
>
>
> Alex
>
>>
>> Thanks,
>> Victor
>>
>>> -> ldw_p() will give us the correct value to work with
>>>
>>> current proposal:
>>>
>>>  BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>>>  LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>>>  BE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>>>  LE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>>>
>>> -> *(uint32_t*)data will give us the correct value to work with
>>>
>>>
>>> There are pros and cons for both approaches.
>>>
>>> In favor of approach 1: it fits the way data[] is read today, so no QEMU changes are required. However, it means that user space needs to be aware of the "default endianness".
>>> With approach 2 you don't care about endianness at all anymore - you just get a payload that the host process can read in.
>>>
>>> Obviously both approaches would work as long as they're properly defined :).
>>>
>>>
>>> Alex
>>>
>>>