On 23 January 2014 18:14, Christoffer Dall <christoffer.dall@xxxxxxxxxx> wrote: > On Thu, Jan 23, 2014 at 04:50:18PM -0800, Victor Kamensky wrote: >> On 23 January 2014 12:45, Christoffer Dall <christoffer.dall@xxxxxxxxxx> wrote: >> > On Thu, Jan 23, 2014 at 08:25:35AM -0800, Victor Kamensky wrote: >> >> On 23 January 2014 07:33, Peter Maydell <peter.maydell@xxxxxxxxxx> wrote: >> >> > On 23 January 2014 15:06, Victor Kamensky <victor.kamensky@xxxxxxxxxx> wrote: >> >> >> In [1] I wrote >> >> >> >> >> >> "I don't see why you so attached to desire to describe >> >> >> data part of memory transaction as just one of int >> >> >> types. If we are talking about bunch of hypothetical >> >> >> cases imagine such bus that allow transaction with >> >> >> size of 6 bytes. How do you describe such data in >> >> >> your ints speak? What endianity you can assign to >> >> >> sequence of 6 bytes? While note that description of >> >> >> such transaction as set of 6 byte values at address >> >> >> $whatever makes perfect sense." >> >> >> >> >> >> But notice that in your next reply [2] you just dropped it >> >> > >> >> > Yes. This is because it was one of the places where >> >> > I would have just had to repeat "no, I'm afraid you're wrong >> >> > about how hardware works". I think in general it's going >> >> > to be better if I don't try to reply point by point to this >> >> > email; I think you should go back and reread the emails I've >> >> > sent. Key points: >> >> > (1) hardware is not doing anything involving arrays >> >> > of bytes >> >> >> >> Array of bytes or integers is just a way to describe data lines >> >> on the bus. Did you look at this document? >> >> >> >> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0290g/ch06s05s01.html >> >> >> >> A0, A1, ,,, A7 byte values are the same for both LE and BE-8 >> >> case (first two columns in the table) and they unambiguously >> >> describe data bus signals >> >> >> > >> > The point is simple, and Peter has made it over and over: >> > Any consumer of a memory operation sees "value, len, address". >> >> and "endianess" of operation. > > no, value is value, is value. By a consumer I mean whatever sits and > the end of the memory bus. There is no endianness. > >> >> here is memory operation >> >> *(int *) (0x1000) = 0x01020304; > > this is from the CPU's perspective and involves specifics of a > programming language and a compiler. You cannot compare to the above. compare it with this description of memory like this unsigned char mem[] = {0x4, 0x3, 0x2, 0x1}; that is the same memory content from *anyone* perspective at address mem we have 0x4, at addres mem+1 we have 0x3, etc >> >> can you tell how memory will look like at 0x1000 address - you can't >> in LE it will look one way in BE byteswapped. >> >> > This is what KVM_EXIT_MMIO emulates. So just by knowing the ABI >> > definition and having a pointer to the structure you need to be able to >> > tell me "value, len, address". >> > >> >> > (2) the API between kernel and userspace needs to define >> >> > the semantics of mmio.data, ie how to map between >> >> > "x byte wide transaction with value v" and the array, >> >> > and that is primarily what this conversation is about >> >> > (3) the only choice which is both (a) sensible and (b) >> >> > not breaking existing usage is to say "the array is >> >> > in host-kernel-byte-order" >> >> > (4) PPC CPUs in BE mode and ARM CPUs in BE mode are not >> >> > the same, because in the ARM case it is doing an >> >> > internal-to-CPU byteswap, and in the PPC case it is not >> >> >> >> That is one of the key disconnects. I'll go find real examples >> >> in ARM LE, ARM BE, and PPC BE Linux kernel. Just for >> >> everybody sake's here is summary of the disconnect: >> >> >> >> If we have the same h/w connected to memory bus in ARM >> >> and PPC systems and we have the following three pieces >> >> of code that work with r0 having same device same >> >> register address: >> >> >> >> 1. ARM LE word write of 0x04030201: >> >> setend le >> >> mov r1, #0x04030201 >> >> str r1, [r0] >> >> >> >> 2. ARM BE word write of 0x01020304: >> >> setend be >> >> mov r1, #0x01020304 >> >> str r1, [r0] >> >> >> >> 3. PPC BE word write of 0x01020304: >> >> lis r1,0x102 >> >> ori r1,r1,0x304 >> >> stw r1,0(r0) >> >> >> >> I claim that h/w will see the same data on bus lines in all >> >> three cases, and h/w would acts the same in all three >> >> cases. Peter says that ARM BE and PPC BE case h/w >> >> would act differently. >> >> >> >> If anyone else can offer opinion on that while I am looking >> >> for real examples that would be great. >> >> >> > >> > I really don't think listing all these examples help. >> >> I think Peter is wrong in his understanding how real >> BE PPC kernel drivers work with h/w mapped devices. Going >> with such misunderstanding to suggest how it should hold >> info in emulated mmio case is quite strange. >> >> > You need to focus >> > on the key points that Peter listed in his previous mail. >> > >> > I tried in our chat to ask you this questions: >> > >> > vcpu_data_host_to_guest() is handling a read from an emulated device. >> > All the info you have is: >> > (1) len of memory access >> > (2) mmio.data pointer >> > (3) destination register >> > (4) host CPU endianness >> > (5) guest CPU endianness >> > >> > Based on this information alone, you need to decide whether you do a >> > byteswap or not before loading the hardware register upon returning to >> > the guest. >> > >> > You will find it impossible to answer, because you don't know the layout >> > of mmio.data, and that is the thing we are trying to solve. >> >> Actually I am not arguing with above. I agree that >> meaning of mmio.data should be better clarified. > > Progress, \o/ > >> >> I propose my clarification as array of bytes at >> phys_addr address on BE-8, >> byte invariant, memory bus. >> That unambiguously >> describes data bus signals in case of BE-8 memory >> bus. Please look at >> >> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0290g/ch06s05s01.html >> >> first two columns LE and BE-8, If one will specify all >> values for A0, A1, ... A7 it will define all bus signals. >> Note that is the only endian agnostic way to describe data >> bus signals. If one would try to describe them in >> half-word[s], word[s], double word one need to tell what >> endianity of those integers (other columns in document >> table). > > This is a reference to an arm1136 processor specification, which does > not support virtualization. We are discussing a generic kernel > interface documentation, I'm afraid it's not useful. The reason why BE-8 vs BE-32 explanation is in this CPU spec, because it was transitional from BE-32 to BE-8 memory bus, and this CPU has bit that can specify whether CPU works with memory in BE-8 or BE-32. Since then all ARM cpus works in BE-8, when they drop switch bit they dropped this section. Check CONFIG_CPU_ENDIAN_BE8 kernel config. In fact all modern CPUs that support both endianities work with memory bus in BE-8 mode. >> >> Peter claims that "I don't understand how h/w bus works". >> I disagree with that. I gave pointer on document that describes >> how BE-8, byte invariant, memory bus works. I would >> appreciate pointer to document, section and page that >> describes Peter's memory bus operation understanding. > > I choose to trust that Peter understands very well how a h/w bus works. > I am not sure the documentation you request exists in public. > > I, however, understand how KVM works, and I understand how the > kernel<->user ABI works, and you still haven't been able to express your > proposal in a concise, understandable, generic maner that works for a > KVM ABI specification, unfortunately. > >> >> I pointed that Peter's proposal would have the following issue: >> BE qemu will have to act differently depending on CPU >> type while emulating the same device. If Peter's proposal is >> accepted n qemu code should do something like: >> >> #ifdef WORD_BIGENDIAN >> #ifdef __PPC_ >> do one thing >> #else >> do another >> #endif >> #endif > > No, your device has an operation: write32(u32 val) > > That's it, Peter suggests having proper device containers in QEMU that > translate from bus native endianness to the device native endianness. > >> >> there reason for that because the same device write in mmio >> will look like this: >> >> ARM LE mmio.data[] = { 0x01, 0x02, 0x03, 0x04 } >> ARM BE mmio.data[] = { 0x04, 0x03, 0x02, 0x01 } >> PPC LE mmio.data[] = { 0x04, 0x03, 0x02, 0x01 } >> PPC BE mmio.data[] = { 0x01, 0x02, 0x03, 0x04 } >> >> for ARM BE and PPC BE arrays are different even >> it is just BE case, so code would need to '#if ARM' >> thing > > I don't understand this, sorry. In above example suppose in case of ARM guest we had guest little endian write of 0x04030201 into device address emulated on ARM LE host/qemu, and in this case content of mmio.data[] array looks like this: mmio.data[] = { 0x01, 0x02, 0x03, 0x04 } it is current case nothing we can do about when the same guest run in ARM BE KVM host/qemu and it executes exactly same access as in above and we follow clarification you just merged, mmio.data[] array will look like this: mmio.data[] = { 0x04, 0x03, 0x02, 0x01 } >> >> If you follow my proposal to clarify mmio.data[] meaning >> mmio.data[] array will look the same in all 4 cases, >> compatible with current usage. > > I still don't know what your proposal is, "array will look the same in > all 4 cases" is not a definition that I can use to interpret the data > written. > > The semantics you need to be able to describe is that of the memory > operation: for example, store a word, not store an array of bytes. > >> >> If Peter's proposal is adopted ARM BE and PPC LE cases >> would be penalized with excessive back and forth >> byteswaps. That is possible to avoid with my proposal. >> > > I would take 50 byteswaps with a clear ABI any day over an obscure > standard that can avoid a single hardware-on-register instruction. This > is about designing a clean software interface, not about building an > optimized integrated stack. > > Unfortunately, this is going nowhere, so I think we need to stop this > thread. As you can see I have sent a patch as a clarification to the > ABI, if it's merged we can move on with more important tasks. OK, that is fine. I still believe is not the best choice, but I agree that we need to move on. I will respin my V7 KVM BE patches according to this new semantics, I will integrate comments that you (thanks!) and others gave me over mailing list and post my series again when it is ready. Thanks, Victor > Thanks, > -Christoffer -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html