Re: [PATCH v2] KVM: Specify byte order for KVM_EXIT_MMIO

> Am 25.01.2014 um 03:34 schrieb Christoffer Dall <christoffer.dall@xxxxxxxxxx>:
> 
>> On Sat, Jan 25, 2014 at 03:15:35AM +0100, Alexander Graf wrote:
>> 
>>> On 25.01.2014, at 02:58, Scott Wood <scottwood@xxxxxxxxxxxxx> wrote:
>>> 
>>>> On Sat, 2014-01-25 at 00:24 +0000, Peter Maydell wrote:
>>>>> On 24 January 2014 23:51, Scott Wood <scottwood@xxxxxxxxxxxxx> wrote:
>>>>>> On Fri, 2014-01-24 at 15:39 -0800, Christoffer Dall wrote:
>>>>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>>>>> index 366bf4b..6dbd68c 100644
>>>>>> --- a/Documentation/virtual/kvm/api.txt
>>>>>> +++ b/Documentation/virtual/kvm/api.txt
>>>>>> @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which could not be satisfied
>>>>>> by kvm.  The 'data' member contains the written data if 'is_write' is
>>>>>> true, and should be filled by application code otherwise.
>>>>>> 
>>>>>> +The 'data' member byte order is host kernel native endianness, regardless of
>>>>>> +the endianness of the guest, and represents the value as it would go on the
>>>>>> +bus in real hardware.  The host user space should always be able to do:
>>>>>> +<type> val = *((<type> *)mmio.data).
>>>>> 
>>>>> Host userspace should be able to do that with what results?  It would
>>>>> only produce a directly usable value if host endianness is the same as
>>>>> the emulated device's endianness.
>>>> 
>>>> With the result that it gets the value the CPU has sent out on
>>>> the bus as the memory transaction.
>>> 
>>> Doesn't that assume the host kernel endianness is the same as the bus
>>> (or rather, that the host CPU would not swap such an access before it
>>> hits the bus)?
>>> 
>>> If you take the same hardware and boot a little endian host kernel one
>>> day, and a big endian host kernel the next, the bus doesn't change, and
>>> neither should the bytewise (assuming address invariance) contents of
>>> data[].  How data[] would look when read as a larger integer would of
>>> course change -- but that's due to how you're reading it.
>>> 
>>> It's clear to say that a value in memory has been stored there in host
>>> endianness when the value is as you would want to see it in a CPU
>>> register, but it's less clear when you talk about it relative to values
>>> on a bus.  It's harder to correlate that to something that is software
>>> visible.
>>> 
>>> I don't think there's any actual technical difference between your
>>> wording and mine when each wording is properly interpreted, but I
>>> suspect my wording is less likely to be misinterpreted (I could be
>>> wrong).
>>> 
>>>> Obviously if what userspace
>>>> is emulating is a bus which has a byteswapping bridge or if it's
>>>> being helpful to device emulation by providing "here's the value
>>>> even though you think you're wired up backwards" then it needs
>>>> to byteswap.
>>> 
>>> Whether the emulated bus has "a byteswapping bridge" doesn't sound like
>>> something that depends on the endianness that the host CPU is currently
>>> running in.
>>> 
>>>>> How about a wording like this:
>>>>> 
>>>>> The 'data' member contains, in its first 'len' bytes, the value as it
>>>>> would appear if the guest had accessed memory rather than I/O.
>>>> 
>>>> I think this is confusing, because now userspace authors have
>>>> to figure out how to get back to "value X of size Y at address Z"
>>>> by interpreting this text... Can you write out the equivalent of
>>>> Christoffer's text "here's how you get the memory transaction
>>>> value" for what you want?
>>> 
>>> Userspace swaps the value if and only if userspace's endianness differs
>>> from the endianness with which the device interprets the data
>>> (regardless of whether said interpretation is considered natural or
>>> swapped relative to the way the bus is documented).  It's similar to how
>>> userspace would handle emulating DMA.
>>> 
>>> KVM swaps the value if and only if the endianness of the guest access
>>> differs from that of the host, i.e. if it would have done swapping when
>>> emulating an ordinary memory access.
>>> 
>>>> (Also, value as it would appear to who?)
>>> 
>>> As it would appear to anyone.  It works because data[] actually is
>>> memory.  Any difference in how data appears based on the reader's
>>> context would already be reflected when the reader performs the load.
>>> 
>>>> I think your wording implies that the order of bytes in data[] depend
>>>> on the guest CPU "usual byte order", ie the order which the CPU
>>>> does not do a byte-lane-swap for (LE for ARM, BE for PPC),
>>>> and it would mean it would come out differently from
>>>> my/Alex/Christoffer's proposal if the host kernel was the opposite
>>>> endianness from that "usual" order.
>>> 
>>> It doesn't depend on "usual" anything.  The only thing it implicitly
>>> says about guest byte order is that it's KVM's job to implement any
>>> swapping if the endianness of the guest access is different from the
>>> endianness of the host kernel access (whether it's due to the guest's
>>> mode, the way a page is mapped, the instruction used, etc).
>>> 
>>>> Finally, I think it's a bit confusing in that "as if the guest had
>>>> accessed memory" is assigning implicit semantics to memory
>>>> in the emulated system, when memory is actually kind of outside
>>>> KVM's purview because it's not part of the CPU.
>>> 
>>> That's sort of the point.  It defines it in a way that is independent of
>>> the CPU, and thus independent of what endianness the CPU operates in.
>> 
>> Ok, let's go through the combinations for a 32-bit write of 0x01020304 on PPC and what data[] looks like
>> 
>> your proposal:
>> 
>>  BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>>  LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>>  BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>>  LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>> 
>> -> ldw_p() will give us the correct value to work with
>> 
>> current proposal:
>> 
>>  BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>>  LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>>  BE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>>  LE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>> 
>> -> *(uint32_t*)data will give us the correct value to work with
>> 
>> 
>> There are pros and cons for both approaches.
>> 
>> A point in favor of approach 1 is that it fits the way data[] is read today, so no QEMU changes are required. However, it means that user space needs to be aware of the "default endianness".
>> With approach 2 you don't care about endianness at all anymore - you just get a payload that the host process can read in.
>> 
>> Obviously both approaches would work as long as they're properly defined :).
> Just to clarify, with approach 2 existing supported QEMU configurations
> of BE/LE on both ARM and PPC still work - it is only for future mixed
> endian support we need to modify QEMU, right?

Yes, I'm fairly sure we all agree that today's combinations should keep working the way they used to. It's only the new host ones (LE ppc, BE arm) that are under discussion.


Alex

> 
> -Christoffer
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html