On Thu, Jan 23, 2014 at 04:50:18PM -0800, Victor Kamensky wrote: > On 23 January 2014 12:45, Christoffer Dall <christoffer.dall@xxxxxxxxxx> wrote: > > On Thu, Jan 23, 2014 at 08:25:35AM -0800, Victor Kamensky wrote: > >> On 23 January 2014 07:33, Peter Maydell <peter.maydell@xxxxxxxxxx> wrote: > >> > On 23 January 2014 15:06, Victor Kamensky <victor.kamensky@xxxxxxxxxx> wrote: > >> >> In [1] I wrote > >> >> > >> >> "I don't see why you so attached to desire to describe > >> >> data part of memory transaction as just one of int > >> >> types. If we are talking about bunch of hypothetical > >> >> cases imagine such bus that allow transaction with > >> >> size of 6 bytes. How do you describe such data in > >> >> your ints speak? What endianity you can assign to > >> >> sequence of 6 bytes? While note that description of > >> >> such transaction as set of 6 byte values at address > >> >> $whatever makes perfect sense." > >> >> > >> >> But notice that in your next reply [2] you just dropped it > >> > > >> > Yes. This is because it was one of the places where > >> > I would have just had to repeat "no, I'm afraid you're wrong > >> > about how hardware works". I think in general it's going > >> > to be better if I don't try to reply point by point to this > >> > email; I think you should go back and reread the emails I've > >> > sent. Key points: > >> > (1) hardware is not doing anything involving arrays > >> > of bytes > >> > >> Array of bytes or integers is just a way to describe data lines > >> on the bus. Did you look at this document? > >> > >> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0290g/ch06s05s01.html > >> > >> A0, A1, ,,, A7 byte values are the same for both LE and BE-8 > >> case (first two columns in the table) and they unambiguously > >> describe data bus signals > >> > > > > The point is simple, and Peter has made it over and over: > > Any consumer of a memory operation sees "value, len, address". > > and "endianess" of operation. no, value is value, is value. By a consumer I mean whatever sits and the end of the memory bus. There is no endianness. > > here is memory operation > > *(int *) (0x1000) = 0x01020304; this is from the CPU's perspective and involves specifics of a programming language and a compiler. You cannot compare to the above. > > can you tell how memory will look like at 0x1000 address - you can't > in LE it will look one way in BE byteswapped. > > > This is what KVM_EXIT_MMIO emulates. So just by knowing the ABI > > definition and having a pointer to the structure you need to be able to > > tell me "value, len, address". > > > >> > (2) the API between kernel and userspace needs to define > >> > the semantics of mmio.data, ie how to map between > >> > "x byte wide transaction with value v" and the array, > >> > and that is primarily what this conversation is about > >> > (3) the only choice which is both (a) sensible and (b) > >> > not breaking existing usage is to say "the array is > >> > in host-kernel-byte-order" > >> > (4) PPC CPUs in BE mode and ARM CPUs in BE mode are not > >> > the same, because in the ARM case it is doing an > >> > internal-to-CPU byteswap, and in the PPC case it is not > >> > >> That is one of the key disconnects. I'll go find real examples > >> in ARM LE, ARM BE, and PPC BE Linux kernel. Just for > >> everybody sake's here is summary of the disconnect: > >> > >> If we have the same h/w connected to memory bus in ARM > >> and PPC systems and we have the following three pieces > >> of code that work with r0 having same device same > >> register address: > >> > >> 1. ARM LE word write of 0x04030201: > >> setend le > >> mov r1, #0x04030201 > >> str r1, [r0] > >> > >> 2. ARM BE word write of 0x01020304: > >> setend be > >> mov r1, #0x01020304 > >> str r1, [r0] > >> > >> 3. PPC BE word write of 0x01020304: > >> lis r1,0x102 > >> ori r1,r1,0x304 > >> stw r1,0(r0) > >> > >> I claim that h/w will see the same data on bus lines in all > >> three cases, and h/w would acts the same in all three > >> cases. Peter says that ARM BE and PPC BE case h/w > >> would act differently. > >> > >> If anyone else can offer opinion on that while I am looking > >> for real examples that would be great. > >> > > > > I really don't think listing all these examples help. > > I think Peter is wrong in his understanding how real > BE PPC kernel drivers work with h/w mapped devices. Going > with such misunderstanding to suggest how it should hold > info in emulated mmio case is quite strange. > > > You need to focus > > on the key points that Peter listed in his previous mail. > > > > I tried in our chat to ask you this questions: > > > > vcpu_data_host_to_guest() is handling a read from an emulated device. > > All the info you have is: > > (1) len of memory access > > (2) mmio.data pointer > > (3) destination register > > (4) host CPU endianness > > (5) guest CPU endianness > > > > Based on this information alone, you need to decide whether you do a > > byteswap or not before loading the hardware register upon returning to > > the guest. > > > > You will find it impossible to answer, because you don't know the layout > > of mmio.data, and that is the thing we are trying to solve. > > Actually I am not arguing with above. I agree that > meaning of mmio.data should be better clarified. Progress, \o/ > > I propose my clarification as array of bytes at > phys_addr address on BE-8, > byte invariant, memory bus. > That unambiguously > describes data bus signals in case of BE-8 memory > bus. Please look at > > http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0290g/ch06s05s01.html > > first two columns LE and BE-8, If one will specify all > values for A0, A1, ... A7 it will define all bus signals. > Note that is the only endian agnostic way to describe data > bus signals. If one would try to describe them in > half-word[s], word[s], double word one need to tell what > endianity of those integers (other columns in document > table). This is a reference to an arm1136 processor specification, which does not support virtualization. We are discussing a generic kernel interface documentation, I'm afraid it's not useful. > > Peter claims that "I don't understand how h/w bus works". > I disagree with that. I gave pointer on document that describes > how BE-8, byte invariant, memory bus works. I would > appreciate pointer to document, section and page that > describes Peter's memory bus operation understanding. I choose to trust that Peter understands very well how a h/w bus works. I am not sure the documentation you request exists in public. I, however, understand how KVM works, and I understand how the kernel<->user ABI works, and you still haven't been able to express your proposal in a concise, understandable, generic maner that works for a KVM ABI specification, unfortunately. > > I pointed that Peter's proposal would have the following issue: > BE qemu will have to act differently depending on CPU > type while emulating the same device. If Peter's proposal is > accepted n qemu code should do something like: > > #ifdef WORD_BIGENDIAN > #ifdef __PPC_ > do one thing > #else > do another > #endif > #endif No, your device has an operation: write32(u32 val) That's it, Peter suggests having proper device containers in QEMU that translate from bus native endianness to the device native endianness. > > there reason for that because the same device write in mmio > will look like this: > > ARM LE mmio.data[] = { 0x01, 0x02, 0x03, 0x04 } > ARM BE mmio.data[] = { 0x04, 0x03, 0x02, 0x01 } > PPC LE mmio.data[] = { 0x04, 0x03, 0x02, 0x01 } > PPC BE mmio.data[] = { 0x01, 0x02, 0x03, 0x04 } > > for ARM BE and PPC BE arrays are different even > it is just BE case, so code would need to '#if ARM' > thing I don't understand this, sorry. > > If you follow my proposal to clarify mmio.data[] meaning > mmio.data[] array will look the same in all 4 cases, > compatible with current usage. I still don't know what your proposal is, "array will look the same in all 4 cases" is not a definition that I can use to interpret the data written. The semantics you need to be able to describe is that of the memory operation: for example, store a word, not store an array of bytes. > > If Peter's proposal is adopted ARM BE and PPC LE cases > would be penalized with excessive back and forth > byteswaps. That is possible to avoid with my proposal. > I would take 50 byteswaps with a clear ABI any day over an obscure standard that can avoid a single hardware-on-register instruction. This is about designing a clean software interface, not about building an optimized integrated stack. Unfortunately, this is going nowhere, so I think we need to stop this thread. As you can see I have sent a patch as a clarification to the ABI, if it's merged we can move on with more important tasks. Thanks, -Christoffer -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html