On 25.01.2014, at 02:58, Scott Wood <scottwood@xxxxxxxxxxxxx> wrote:

> On Sat, 2014-01-25 at 00:24 +0000, Peter Maydell wrote:
>> On 24 January 2014 23:51, Scott Wood <scottwood@xxxxxxxxxxxxx> wrote:
>>> On Fri, 2014-01-24 at 15:39 -0800, Christoffer Dall wrote:
>>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>>> index 366bf4b..6dbd68c 100644
>>>> --- a/Documentation/virtual/kvm/api.txt
>>>> +++ b/Documentation/virtual/kvm/api.txt
>>>> @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which could not be satisfied
>>>>  by kvm. The 'data' member contains the written data if 'is_write' is
>>>>  true, and should be filled by application code otherwise.
>>>>
>>>> +The 'data' member byte order is host kernel native endianness, regardless of
>>>> +the endianness of the guest, and represents the the value as it would go on the
>>>> +bus in real hardware. The host user space should always be able to do:
>>>> +<type> val = *((<type> *)mmio.data).
>>>
>>> Host userspace should be able to do that with what results? It would
>>> only produce a directly usable value if host endianness is the same as
>>> the emulated device's endianness.
>>
>> With the result that it gets the value the CPU has sent out on
>> the bus as the memory transaction.
>
> Doesn't that assume the host kernel endianness is the same as the bus
> (or rather, that the host CPU would not swap such an access before it
> hits the bus)?
>
> If you take the same hardware and boot a little endian host kernel one
> day, and a big endian host kernel the next, the bus doesn't change, and
> neither should the bytewise (assuming address invariance) contents of
> data[]. How data[] would look when read as a larger integer would of
> course change -- but that's due to how you're reading it.
>
> It's clear to say that a value in memory has been stored there in host
> endianness when the value is as you would want to see it in a CPU
> register, but it's less clear when you talk about it relative to values
> on a bus. It's harder to correlate that to something that is software
> visible.
>
> I don't think there's any actual technical difference between your
> wording and mine when each wording is properly interpreted, but I
> suspect my wording is less likely to be misinterpreted (I could be
> wrong).
>
>> Obviously if what userspace
>> is emulating is a bus which has a byteswapping bridge or if it's
>> being helpful to device emulation by providing "here's the value
>> even though you think you're wired up backwards" then it needs
>> to byteswap.
>
> Whether the emulated bus has "a byteswapping bridge" doesn't sound like
> something that depends on the endianness that the host CPU is currently
> running in.
>
>>> How about a wording like this:
>>>
>>> The 'data' member contains, in its first 'len' bytes, the value as it
>>> would appear if the guest had accessed memory rather than I/O.
>>
>> I think this is confusing, because now userspace authors have
>> to figure out how to get back to "value X of size Y at address Z"
>> by interpreting this text... Can you write out the equivalent of
>> Christoffer's text "here's how you get the memory transaction
>> value" for what you want?
>
> Userspace swaps the value if and only if userspace's endianness differs
> from the endianness with which the device interprets the data
> (regardless of whether said interpretation is considered natural or
> swapped relative to the way the bus is documented). It's similar to how
> userspace would handle emulating DMA.
>
> KVM swaps the value if and only if the endianness of the guest access
> differs from that of the host, i.e. if it would have done swapping when
> emulating an ordinary memory access.
>
>> (Also, value as it would appear to who?)
>
> As it would appear to anyone. It works because data[] actually is
> memory. Any difference in how data appears based on the reader's
> context would already be reflected when the reader performs the load.
>
>> I think your wording implies that the order of bytes in data[] depend
>> on the guest CPU "usual byte order", ie the order which the CPU
>> does not do a byte-lane-swap for (LE for ARM, BE for PPC),
>> and it would mean it would come out differently from
>> my/Alex/Christoffer's proposal if the host kernel was the opposite
>> endianness from that "usual" order.
>
> It doesn't depend on "usual" anything. The only thing it implicitly
> says about guest byte order is that it's KVM's job to implement any
> swapping if the endianness of the guest access is different from the
> endianness of the host kernel access (whether it's due to the guest's
> mode, the way a page is mapped, the instruction used, etc).
>
>> Finally, I think it's a bit confusing in that "as if the guest had
>> accessed memory" is assigning implicit semantics to memory
>> in the emulated system, when memory is actually kind of outside
>> KVM's purview because it's not part of the CPU.
>
> That's sort of the point. It defines it in a way that is independent of
> the CPU, and thus independent of what endianness the CPU operates in.

OK, let's go through the combinations for a 32-bit write of 0x01020304 on PPC and what data[] looks like with each proposal.

Your proposal:

  BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
  LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
  BE guest, LE host: { 0x01, 0x02, 0x03, 0x04 }
  LE guest, LE host: { 0x04, 0x03, 0x02, 0x01 }

  -> ldw_p() will give us the correct value to work with

Current proposal:

  BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
  LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
  BE guest, LE host: { 0x04, 0x03, 0x02, 0x01 }
  LE guest, LE host: { 0x01, 0x02, 0x03, 0x04 }

  -> *(uint32_t*)data will give us the correct value to work with

There are pros and cons for both approaches.
The advantage of approach 1 is that it fits the way data[] is read today, so no QEMU changes are required. However, it means that user space needs to be aware of the "default endianness". With approach 2 you don't need to care about endianness at all anymore - you just get a payload that the host process can read in directly.

Obviously both approaches would work as long as they're properly defined :).

Alex