Re: KVM and variable-endianness guest CPUs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 23, 2014 at 04:50:18PM -0800, Victor Kamensky wrote:
> On 23 January 2014 12:45, Christoffer Dall <christoffer.dall@xxxxxxxxxx> wrote:
> > On Thu, Jan 23, 2014 at 08:25:35AM -0800, Victor Kamensky wrote:
> >> On 23 January 2014 07:33, Peter Maydell <peter.maydell@xxxxxxxxxx> wrote:
> >> > On 23 January 2014 15:06, Victor Kamensky <victor.kamensky@xxxxxxxxxx> wrote:
> >> >> In [1] I wrote
> >> >>
> >> >> "I don't see why you so attached to desire to describe
> >> >> data part of memory transaction as just one of int
> >> >> types. If we are talking about bunch of hypothetical
> >> >> cases imagine such bus that allow transaction with
> >> >> size of 6 bytes. How do you describe such data in
> >> >> your ints speak? What endianity you can assign to
> >> >> sequence of 6 bytes? While note that description of
> >> >> such transaction as set of 6 byte values at address
> >> >> $whatever makes perfect sense."
> >> >>
> >> >> But notice that in your next reply [2] you just dropped it
> >> >
> >> > Yes. This is because it was one of the places where
> >> > I would have just had to repeat "no, I'm afraid you're wrong
> >> > about how hardware works". I think in general it's going
> >> > to be better if I don't try to reply point by point to this
> >> > email; I think you should go back and reread the emails I've
> >> > sent. Key points:
> >> >  (1) hardware is not doing anything involving arrays
> >> >      of bytes
> >>
> >> Array of bytes or integers is just a way to describe data lines
> >> on the bus. Did you look at this document?
> >>
> >> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0290g/ch06s05s01.html
> >>
> >> A0, A1, ,,, A7 byte values are the same for both LE and BE-8
> >> case (first two columns in the table) and they unambiguously
> >> describe data bus signals
> >>
> >
> > The point is simple, and Peter has made it over and over:
> > Any consumer of a memory operation sees "value, len, address".
> 
> and "endianess" of operation.

no, value is value, is value.  By a consumer I mean whatever sits and
the end of the memory bus.  There is no endianness.

> 
> here is memory operation
> 
> *(int *) (0x1000) = 0x01020304;

this is from the CPU's perspective and involves specifics of a
programming language and a compiler.  You cannot compare to the above.

> 
> can you tell how memory will look like at 0x1000 address - you can't
> in LE it will look one way in BE byteswapped.
> 
> > This is what KVM_EXIT_MMIO emulates.  So just by knowing the ABI
> > definition and having a pointer to the structure you need to be able to
> > tell me "value, len, address".
> >
> >> >  (2) the API between kernel and userspace needs to define
> >> >      the semantics of mmio.data, ie how to map between
> >> >      "x byte wide transaction with value v" and the array,
> >> >      and that is primarily what this conversation is about
> >> >  (3) the only choice which is both (a) sensible and (b)
> >> >      not breaking existing usage is to say "the array is
> >> >      in host-kernel-byte-order"
> >> >  (4) PPC CPUs in BE mode and ARM CPUs in BE mode are not
> >> >      the same, because in the ARM case it is doing an
> >> >      internal-to-CPU byteswap, and in the PPC case it is not
> >>
> >> That is one of the key disconnects. I'll go find real examples
> >> in ARM LE, ARM BE, and PPC BE Linux kernel. Just for
> >> everybody sake's here is summary of the disconnect:
> >>
> >> If we have the same h/w connected to memory bus in ARM
> >> and PPC systems and we have the following three pieces
> >> of code that work with r0 having same device same
> >> register address:
> >>
> >> 1. ARM LE word write of  0x04030201:
> >> setend le
> >> mov r1, #0x04030201
> >> str r1, [r0]
> >>
> >> 2. ARM BE word write of 0x01020304:
> >> setend be
> >> mov r1, #0x01020304
> >> str r1, [r0]
> >>
> >> 3. PPC BE word write of 0x01020304:
> >> lis     r1,0x102
> >> ori     r1,r1,0x304
> >> stw    r1,0(r0)
> >>
> >> I claim that h/w will see the same data on bus lines in all
> >> three cases, and h/w would acts the same in all three
> >> cases. Peter says that ARM BE and PPC BE case h/w
> >> would act differently.
> >>
> >> If anyone else can offer opinion on that while I am looking
> >> for real examples that would be great.
> >>
> >
> > I really don't think listing all these examples help.
> 
> I think Peter is wrong in his understanding how real
> BE PPC kernel drivers work with h/w mapped devices. Going
> with such misunderstanding to suggest how it should hold
> info in emulated mmio case is quite strange.
> 
> > You need to focus
> > on the key points that Peter listed in his previous mail.
> >
> > I tried in our chat to ask you this questions:
> >
> > vcpu_data_host_to_guest() is handling a read from an emulated device.
> > All the info you have is:
> > (1) len of memory access
> > (2) mmio.data pointer
> > (3) destination register
> > (4) host CPU endianness
> > (5) guest CPU endianness
> >
> > Based on this information alone, you need to decide whether you do a
> > byteswap or not before loading the hardware register upon returning to
> > the guest.
> >
> > You will find it impossible to answer, because you don't know the layout
> > of mmio.data, and that is the thing we are trying to solve.
> 
> Actually I am not arguing with above. I agree that
> meaning of mmio.data should be better clarified.

Progress,  \o/

> 
> I propose my clarification as array of bytes at
> phys_addr address on BE-8,
> byte invariant, memory bus. 
> That unambiguously
> describes data bus signals in case of BE-8 memory
> bus. Please look at
> 
> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0290g/ch06s05s01.html
> 
> first two columns LE and BE-8, If one will specify all
> values for A0, A1, ... A7 it will define all bus signals.
> Note that is the only endian agnostic way to describe data
> bus signals. If one would try to describe them in
> half-word[s], word[s], double word one need to tell what
> endianity of those integers (other columns in document
> table).

This is a reference to an arm1136 processor specification, which does
not support virtualization.  We are discussing a generic kernel
interface documentation, I'm afraid it's not useful.

> 
> Peter claims that "I don't understand how h/w bus works".
> I disagree with that. I gave pointer on document that describes
> how BE-8, byte invariant, memory bus works. I would
> appreciate pointer to document, section and page that
> describes Peter's memory bus operation understanding.

I choose to trust that Peter understands very well how a h/w bus works.
I am not sure the documentation you request exists in public.

I, however, understand how KVM works, and I understand how the
kernel<->user ABI works, and you still haven't been able to express your
proposal in a concise, understandable, generic maner that works for a
KVM ABI specification, unfortunately.

> 
> I pointed that Peter's proposal would have the following issue:
> BE qemu will have to act differently depending on CPU
> type while emulating the same device. If Peter's proposal is
> accepted n qemu code should do something like:
> 
> #ifdef WORD_BIGENDIAN
> #ifdef __PPC_
>    do one thing
> #else
>   do another
> #endif
> #endif

No, your device has an operation: write32(u32 val)

That's it, Peter suggests having proper device containers in QEMU that
translate from bus native endianness to the device native endianness.

> 
> there reason for that because the same device write in mmio
> will look like this:
> 
> ARM LE mmio.data[] = { 0x01, 0x02, 0x03, 0x04 }
> ARM BE mmio.data[] = { 0x04, 0x03, 0x02, 0x01 }
> PPC LE mmio.data[] = { 0x04, 0x03, 0x02, 0x01 }
> PPC BE mmio.data[] = { 0x01, 0x02, 0x03, 0x04 }
> 
> for ARM BE and PPC BE arrays are different even
> it is just BE case, so code would need to '#if ARM'
> thing

I don't understand this, sorry.

> 
> If you follow my proposal to clarify mmio.data[] meaning
> mmio.data[] array will look the same in all 4 cases,
> compatible with current usage.

I still don't know what your proposal is, "array will look the same in
all 4 cases" is not a definition that I can use to interpret the data
written.

The semantics you need to be able to describe is that of the memory
operation: for example, store a word, not store an array of bytes.

> 
> If Peter's proposal is adopted ARM BE and PPC LE cases
> would be penalized with excessive back and forth
> byteswaps. That is possible to avoid with my proposal.
> 

I would take 50 byteswaps with a clear ABI any day over an obscure
standard that can avoid a single hardware-on-register instruction.  This
is about designing a clean software interface, not about building an
optimized integrated stack.

Unfortunately, this is going nowhere, so I think we need to stop this
thread.  As you can see I have sent a patch as a clarification to the
ABI, if it's merged we can move on with more important tasks.

Thanks,
-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux