Re: KVM and variable-endianness guest CPUs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 23 January 2014 18:14, Christoffer Dall <christoffer.dall@xxxxxxxxxx> wrote:
> On Thu, Jan 23, 2014 at 04:50:18PM -0800, Victor Kamensky wrote:
>> On 23 January 2014 12:45, Christoffer Dall <christoffer.dall@xxxxxxxxxx> wrote:
>> > On Thu, Jan 23, 2014 at 08:25:35AM -0800, Victor Kamensky wrote:
>> >> On 23 January 2014 07:33, Peter Maydell <peter.maydell@xxxxxxxxxx> wrote:
>> >> > On 23 January 2014 15:06, Victor Kamensky <victor.kamensky@xxxxxxxxxx> wrote:
>> >> >> In [1] I wrote
>> >> >>
>> >> >> "I don't see why you so attached to desire to describe
>> >> >> data part of memory transaction as just one of int
>> >> >> types. If we are talking about bunch of hypothetical
>> >> >> cases imagine such bus that allow transaction with
>> >> >> size of 6 bytes. How do you describe such data in
>> >> >> your ints speak? What endianity you can assign to
>> >> >> sequence of 6 bytes? While note that description of
>> >> >> such transaction as set of 6 byte values at address
>> >> >> $whatever makes perfect sense."
>> >> >>
>> >> >> But notice that in your next reply [2] you just dropped it
>> >> >
>> >> > Yes. This is because it was one of the places where
>> >> > I would have just had to repeat "no, I'm afraid you're wrong
>> >> > about how hardware works". I think in general it's going
>> >> > to be better if I don't try to reply point by point to this
>> >> > email; I think you should go back and reread the emails I've
>> >> > sent. Key points:
>> >> >  (1) hardware is not doing anything involving arrays
>> >> >      of bytes
>> >>
>> >> Array of bytes or integers is just a way to describe data lines
>> >> on the bus. Did you look at this document?
>> >>
>> >> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0290g/ch06s05s01.html
>> >>
>> >> A0, A1, ,,, A7 byte values are the same for both LE and BE-8
>> >> case (first two columns in the table) and they unambiguously
>> >> describe data bus signals
>> >>
>> >
>> > The point is simple, and Peter has made it over and over:
>> > Any consumer of a memory operation sees "value, len, address".
>>
>> and "endianess" of operation.
>
> no, value is value, is value.  By a consumer I mean whatever sits and
> the end of the memory bus.  There is no endianness.
>
>>
>> here is memory operation
>>
>> *(int *) (0x1000) = 0x01020304;
>
> this is from the CPU's perspective and involves specifics of a
> programming language and a compiler.  You cannot compare to the above.

compare it with this description of memory like this
unsigned char mem[] = {0x4, 0x3, 0x2, 0x1};
that is the same memory content from *anyone* perspective
at address mem we have 0x4, at addres mem+1 we have 0x3, etc

>>
>> can you tell how memory will look like at 0x1000 address - you can't
>> in LE it will look one way in BE byteswapped.
>>
>> > This is what KVM_EXIT_MMIO emulates.  So just by knowing the ABI
>> > definition and having a pointer to the structure you need to be able to
>> > tell me "value, len, address".
>> >
>> >> >  (2) the API between kernel and userspace needs to define
>> >> >      the semantics of mmio.data, ie how to map between
>> >> >      "x byte wide transaction with value v" and the array,
>> >> >      and that is primarily what this conversation is about
>> >> >  (3) the only choice which is both (a) sensible and (b)
>> >> >      not breaking existing usage is to say "the array is
>> >> >      in host-kernel-byte-order"
>> >> >  (4) PPC CPUs in BE mode and ARM CPUs in BE mode are not
>> >> >      the same, because in the ARM case it is doing an
>> >> >      internal-to-CPU byteswap, and in the PPC case it is not
>> >>
>> >> That is one of the key disconnects. I'll go find real examples
>> >> in ARM LE, ARM BE, and PPC BE Linux kernel. Just for
>> >> everybody sake's here is summary of the disconnect:
>> >>
>> >> If we have the same h/w connected to memory bus in ARM
>> >> and PPC systems and we have the following three pieces
>> >> of code that work with r0 having same device same
>> >> register address:
>> >>
>> >> 1. ARM LE word write of  0x04030201:
>> >> setend le
>> >> mov r1, #0x04030201
>> >> str r1, [r0]
>> >>
>> >> 2. ARM BE word write of 0x01020304:
>> >> setend be
>> >> mov r1, #0x01020304
>> >> str r1, [r0]
>> >>
>> >> 3. PPC BE word write of 0x01020304:
>> >> lis     r1,0x102
>> >> ori     r1,r1,0x304
>> >> stw    r1,0(r0)
>> >>
>> >> I claim that h/w will see the same data on bus lines in all
>> >> three cases, and h/w would acts the same in all three
>> >> cases. Peter says that ARM BE and PPC BE case h/w
>> >> would act differently.
>> >>
>> >> If anyone else can offer opinion on that while I am looking
>> >> for real examples that would be great.
>> >>
>> >
>> > I really don't think listing all these examples help.
>>
>> I think Peter is wrong in his understanding how real
>> BE PPC kernel drivers work with h/w mapped devices. Going
>> with such misunderstanding to suggest how it should hold
>> info in emulated mmio case is quite strange.
>>
>> > You need to focus
>> > on the key points that Peter listed in his previous mail.
>> >
>> > I tried in our chat to ask you this questions:
>> >
>> > vcpu_data_host_to_guest() is handling a read from an emulated device.
>> > All the info you have is:
>> > (1) len of memory access
>> > (2) mmio.data pointer
>> > (3) destination register
>> > (4) host CPU endianness
>> > (5) guest CPU endianness
>> >
>> > Based on this information alone, you need to decide whether you do a
>> > byteswap or not before loading the hardware register upon returning to
>> > the guest.
>> >
>> > You will find it impossible to answer, because you don't know the layout
>> > of mmio.data, and that is the thing we are trying to solve.
>>
>> Actually I am not arguing with above. I agree that
>> meaning of mmio.data should be better clarified.
>
> Progress,  \o/
>
>>
>> I propose my clarification as array of bytes at
>> phys_addr address on BE-8,
>> byte invariant, memory bus.
>> That unambiguously
>> describes data bus signals in case of BE-8 memory
>> bus. Please look at
>>
>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0290g/ch06s05s01.html
>>
>> first two columns LE and BE-8, If one will specify all
>> values for A0, A1, ... A7 it will define all bus signals.
>> Note that is the only endian agnostic way to describe data
>> bus signals. If one would try to describe them in
>> half-word[s], word[s], double word one need to tell what
>> endianity of those integers (other columns in document
>> table).
>
> This is a reference to an arm1136 processor specification, which does
> not support virtualization.  We are discussing a generic kernel
> interface documentation, I'm afraid it's not useful.

The reason why BE-8 vs BE-32 explanation is in this CPU spec,
because it was transitional from BE-32 to BE-8 memory bus,
and this CPU has bit that can specify whether CPU works with
memory in BE-8 or BE-32. Since then all ARM cpus works in
BE-8, when they drop switch bit they dropped this section.
Check  CONFIG_CPU_ENDIAN_BE8 kernel config.
In fact all modern CPUs that support both endianities work
with memory bus in BE-8 mode.

>>
>> Peter claims that "I don't understand how h/w bus works".
>> I disagree with that. I gave pointer on document that describes
>> how BE-8, byte invariant, memory bus works. I would
>> appreciate pointer to document, section and page that
>> describes Peter's memory bus operation understanding.
>
> I choose to trust that Peter understands very well how a h/w bus works.
> I am not sure the documentation you request exists in public.
>
> I, however, understand how KVM works, and I understand how the
> kernel<->user ABI works, and you still haven't been able to express your
> proposal in a concise, understandable, generic maner that works for a
> KVM ABI specification, unfortunately.
>
>>
>> I pointed that Peter's proposal would have the following issue:
>> BE qemu will have to act differently depending on CPU
>> type while emulating the same device. If Peter's proposal is
>> accepted n qemu code should do something like:
>>
>> #ifdef WORD_BIGENDIAN
>> #ifdef __PPC_
>>    do one thing
>> #else
>>   do another
>> #endif
>> #endif
>
> No, your device has an operation: write32(u32 val)
>
> That's it, Peter suggests having proper device containers in QEMU that
> translate from bus native endianness to the device native endianness.
>
>>
>> there reason for that because the same device write in mmio
>> will look like this:
>>
>> ARM LE mmio.data[] = { 0x01, 0x02, 0x03, 0x04 }
>> ARM BE mmio.data[] = { 0x04, 0x03, 0x02, 0x01 }
>> PPC LE mmio.data[] = { 0x04, 0x03, 0x02, 0x01 }
>> PPC BE mmio.data[] = { 0x01, 0x02, 0x03, 0x04 }
>>
>> for ARM BE and PPC BE arrays are different even
>> it is just BE case, so code would need to '#if ARM'
>> thing
>
> I don't understand this, sorry.

In above example suppose in case of ARM guest we had guest
little endian write of 0x04030201 into device address emulated
on ARM LE host/qemu, and in this case content of
mmio.data[] array looks like this:
  mmio.data[] = { 0x01, 0x02, 0x03, 0x04 }
it is current case nothing we can do about
when the same guest run in ARM BE KVM host/qemu
and it executes exactly same access as in above
and we follow clarification you just merged, mmio.data[]
array will look like this:
  mmio.data[] = { 0x04, 0x03, 0x02, 0x01 }

>>
>> If you follow my proposal to clarify mmio.data[] meaning
>> mmio.data[] array will look the same in all 4 cases,
>> compatible with current usage.
>
> I still don't know what your proposal is, "array will look the same in
> all 4 cases" is not a definition that I can use to interpret the data
> written.
>
> The semantics you need to be able to describe is that of the memory
> operation: for example, store a word, not store an array of bytes.
>
>>
>> If Peter's proposal is adopted ARM BE and PPC LE cases
>> would be penalized with excessive back and forth
>> byteswaps. That is possible to avoid with my proposal.
>>
>
> I would take 50 byteswaps with a clear ABI any day over an obscure
> standard that can avoid a single hardware-on-register instruction.  This
> is about designing a clean software interface, not about building an
> optimized integrated stack.
>
> Unfortunately, this is going nowhere, so I think we need to stop this
> thread.  As you can see I have sent a patch as a clarification to the
> ABI, if it's merged we can move on with more important tasks.

OK, that is fine. I still believe is not the best choice,
but I agree that we need to move on. I will respin my
V7 KVM BE patches according to this new semantics, I will
integrate comments that you (thanks!) and others gave me
over mailing list and post my series again when it is ready.

Thanks,
Victor

> Thanks,
> -Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux