On Mon, 26 Jan 2015, David Daney wrote:

> > Well, read(2), write(2) and similar calls operate on byte streams, these
> > are endianness agnostic (like the text of this e-mail for example is --
> > it's stored in memory of a byte-addressed computer the same way regardless
> > of its processor's endianness).
>
> This is precisely the point I was attempting to make. What you say here is
> *not* correct with respect to MIPS as specified in the architecture reference
> mentioned above. The byte streams are scrambled up when viewed from contexts
> of opposite endianness.
>
> Byte streams are *not* endian agnostic, but aligned 64-bit loads and stores
> are.
>
> It is bizarre, and perhaps almost mind bending, but that seems to be how it is
> specified. Certainly the OCTEON implementation works this way.

Well, I think this observation:

"2.2.2.2 Memory Operation Functions

"Regardless of byte ordering (big- or little-endian), the address of a halfword, word, or doubleword is the smallest byte address of the bytes that form the object. For big-endian ordering this is the most-significant byte; for a little-endian ordering this is the least-significant byte."

contradicts your claim, as it would not be possible to arrange all these quantities such that the smallest byte address points at the quantity itself *and* at its MSB or LSB for the big- and the little-endian memory interface byte ordering respectively, *both* at the same time.
The implication of the above observation is that the 64-bit, 32-bit and 16-bit values of 0xfedcba9876543210, 0x76543210 and 0x3210 respectively are stored in memory like this for the individual memory interface endiannesses:

          memory interface endiannesses         | memory
     64-bit          32-bit          16-bit     |
  big    little   big    little   big    little | address
------------------------------------------------+---------
  0x10   0xfe                                   |    7
  0x32   0xdc                                   |    6
  0x54   0xba                                   |    5
  0x76   0x98                                   |    4
  0x98   0x76     0x10   0x76                   |    3
  0xba   0x54     0x32   0x54                   |    2
  0xdc   0x32     0x54   0x32     0x10   0x32   |    1
  0xfe   0x10     0x76   0x10     0x32   0x10   |    0

This representation meets all the requirements set by 2.2.2.2 and makes the reverse-endian interpretation correct as well.

And the supposedly bizarre physical address adjustment made by the LB, etc. pseudocode you refer to merely reflects the fact that (in the 64-bit case considered here) sub-doubleword addresses (i.e. the 3 LSBs) are presented on the SysAD bus with byte enables rather than via address lines. This is clearly indicated in the description of the `LoadMemory' and `StoreMemory' pseudocode:

"The low-order 2 (or 3) bits of the address and the AccessLength indicate which of the bytes within MemElem need to be passed to the processor."

So, given a 64-bit SysAD bus, to load a byte from the bus/memory address 0x00000000 a 0b00000001 logical bit pattern has to be driven on BE[7:0]. That pattern corresponds to the CPU's physical address 0x00000000 in the native-endian load/store instruction mode, but to 0x00000007 in the reverse-endian load/store instruction mode, because the doubleword location requested is swapped compared to how the memory interface has been configured (the `BigEndianMem' setting in the architecture spec). Hence the `XOR ReverseEndian' address adjustment made for the reverse-endian mode. The same applies to other sub-doubleword accesses, which have correspondingly more byte enables asserted.
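A minimal Python sketch of the above, assuming the model used in this message: a 64-bit SysAD bus where byte enable BEn selects bus/memory address offset n within the doubleword. `layout', `byte_enables' and `adjust_paddr' are hypothetical helpers made up for illustration, not names from the architecture spec; `layout' regenerates the table columns, while the other two show the BE[7:0] pattern and the `XOR ReverseEndian' adjustment.

```python
def layout(value, size, big_endian):
    """Map address -> byte for `value` stored at address 0.

    Per 2.2.2.2 the object's address is its smallest byte address, which
    holds the MSB for big-endian and the LSB for little-endian ordering.
    """
    return dict(enumerate(value.to_bytes(size, "big" if big_endian else "little")))


def byte_enables(offset, length):
    """BE[7:0] mask for `length` bytes starting at doubleword offset `offset`."""
    return ((1 << length) - 1) << offset


def adjust_paddr(paddr, reverse_endian):
    """Flip the 3 LSBs of the address in the reverse-endian mode, as in LB."""
    return paddr ^ (0b111 if reverse_endian else 0)


# Rebuild the table columns, printed from address 7 down to 0 like the table.
for size, value in ((8, 0xfedcba9876543210), (4, 0x76543210), (2, 0x3210)):
    for big in (True, False):
        column = layout(value, size, big)
        print(size * 8, "big" if big else "little",
              [hex(column[a]) for a in sorted(column, reverse=True)])

print(format(byte_enables(0, 1), "08b"))  # byte at bus address 0: 00000001
print(adjust_paddr(0, False), adjust_paddr(0, True))  # 0 7
```

By the same token a halfword at doubleword offset 2 would drive 0b00001100 on BE[7:0] in this model.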
So again, to illustrate, we have a 64-bit value of 0xfedcba9876543210 stored at the bus/memory address 0x00000000 and will use LB to retrieve the byte at that address. For the big-endian memory interface and a native access we have this:

BE:      BE7  BE6  BE5  BE4  BE3  BE2  BE1  BE0
memory:  0x10 0x32 0x54 0x76 0x98 0xba 0xdc 0xfe
          |    |    |    |    |    |    |    |
          \    \    \    \    /    /    /    /
           \    \    \    \  /    /    /    /
            ---------------\ /---------------
                            X
            ---------------/ \---------------
           /    /    /    /  \    \    \    \
          /    /    /    /    \    \    \    \
          |    |    |    |    |    |    |    |
buffer:  0xfe 0xdc 0xba 0x98 0x76 0x54 0x32 0x10
pAddr:    0    1    2    3    4    5    6    7

(forgive the inferior ASCII art, I hope the way lanes are swapped is clear). So the byte at pAddr 0 in the buffer corresponds to 0xfe at BE0 and consequently to the bus/memory address 0x00000000, as expected.

Now in the reverse-endian load/store instruction mode the lane swapping in the memory interface is reconfigured, so things look like this:

BE:      BE7  BE6  BE5  BE4  BE3  BE2  BE1  BE0
memory:  0x10 0x32 0x54 0x76 0x98 0xba 0xdc 0xfe
          |    |    |    |    |    |    |    |
          |    |    |    |    |    |    |    |
          |    |    |    |    |    |    |    |
          |    |    |    |    |    |    |    |
          |    |    |    |    |    |    |    |
          |    |    |    |    |    |    |    |
          |    |    |    |    |    |    |    |
          |    |    |    |    |    |    |    |
          |    |    |    |    |    |    |    |
buffer:  0x10 0x32 0x54 0x76 0x98 0xba 0xdc 0xfe
pAddr:    0    1    2    3    4    5    6    7

Of course (on a byte-addressed machine) byte addresses are the same regardless of the endianness, so we still want to retrieve the 0xfe byte at BE0. But that now corresponds to pAddr 7!
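The two big-endian diagrams above can be modelled in a few lines of Python. This is only a sketch under the assumptions the diagrams themselves make: lane BEn carries bus/memory address offset n, the buffer is listed left to right as drawn with pAddr labels 0..7, and the lanes are crossed for the native access and straight for the reverse-endian one. `LANES', `buffer' and `paddr_of' are hypothetical names.

```python
VALUE = 0xfedcba9876543210  # doubleword at bus/memory address 0x00000000

# Big-endian memory interface: lane BEn carries bus/memory address offset n,
# so BE0 holds 0xfe and BE7 holds 0x10, as in the "memory:" row.
LANES = list(VALUE.to_bytes(8, "big"))


def buffer(reverse_endian):
    """Buffer slots left to right as drawn, as (pAddr, byte) pairs."""
    slots = []
    for pos in range(8):  # pos 0 is the leftmost slot, labelled pAddr 0
        # Native: lanes crossed, so BE0 feeds the leftmost slot.
        # Reverse-endian: lanes straight, so BE7 feeds the leftmost slot.
        lane = 7 - pos if reverse_endian else pos
        slots.append((pos, LANES[lane]))
    return slots


def paddr_of(byte, reverse_endian):
    """pAddr of the buffer slot holding the given byte value."""
    return next(p for p, b in buffer(reverse_endian) if b == byte)


print([hex(b) for _, b in buffer(False)])  # native: 0xfe ... 0x10
print([hex(b) for _, b in buffer(True)])   # reverse-endian: 0x10 ... 0xfe
print(paddr_of(LANES[0], False), paddr_of(LANES[0], True))  # 0 7
```

The last line shows the point made above: the byte on BE0 sits at pAddr 0 for the native access but at pAddr 7 for the reverse-endian one.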
For the little-endian memory interface mode things are reversed accordingly, and for the native mode we have:

BE:      BE7  BE6  BE5  BE4  BE3  BE2  BE1  BE0
memory:  0xfe 0xdc 0xba 0x98 0x76 0x54 0x32 0x10
          |    |    |    |    |    |    |    |
          |    |    |    |    |    |    |    |
          |    |    |    |    |    |    |    |
          |    |    |    |    |    |    |    |
          |    |    |    |    |    |    |    |
          |    |    |    |    |    |    |    |
          |    |    |    |    |    |    |    |
          |    |    |    |    |    |    |    |
          |    |    |    |    |    |    |    |
buffer:  0xfe 0xdc 0xba 0x98 0x76 0x54 0x32 0x10
pAddr:    7    6    5    4    3    2    1    0

and for the reverse-endian one:

BE:      BE7  BE6  BE5  BE4  BE3  BE2  BE1  BE0
memory:  0xfe 0xdc 0xba 0x98 0x76 0x54 0x32 0x10
          |    |    |    |    |    |    |    |
          \    \    \    \    /    /    /    /
           \    \    \    \  /    /    /    /
            ---------------\ /---------------
                            X
            ---------------/ \---------------
           /    /    /    /  \    \    \    \
          /    /    /    /    \    \    \    \
          |    |    |    |    |    |    |    |
buffer:  0x10 0x32 0x54 0x76 0x98 0xba 0xdc 0xfe
pAddr:    7    6    5    4    3    2    1    0

So again we have to use pAddr 7 to get at BE0/0x10.

Notice that this physical address adjustment is then cancelled by a reverse `XOR BigEndianCPU' adjustment made in calculating the byte offset of the sub-doubleword value to extract from the intermediate doubleword buffer used by the pseudocode (where `BigEndianCPU' is calculated as `BigEndianMem XOR ReverseEndian', as defined by Table 1.1 "Symbols Used in Instruction Operation Statements"). So, referring to the little-endian examples just above, the leftmost byte is extracted from the buffer in the reverse-endian mode rather than the rightmost one as in the native mode (and conversely for the big-endian examples). Again, this makes things work as intended, and makes any byte stream stored in memory endianness-agnostic.

So I don't exactly know what you did with the Octeon implementation, but I do hope you did the sane thing.

  Maciej
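To tie both adjustments together, here is a sketch of the LB pseudocode in Python under the model used in this message. It assumes an identity vAddr to pAddr mapping, a 64-bit bus whose lane BEn carries bus/memory address offset n, a memory interface that maps pAddr to lanes as in the four diagrams (flipped in the reverse-endian mode), and memdoubleword lanes crossed exactly when `BigEndianCPU' is set. `load_memory' and `lb' are hypothetical helpers, not the spec's `LoadMemory'.

```python
VALUE = 0xfedcba9876543210  # doubleword at bus/memory address 0x00000000


def lanes(big_endian_mem):
    """Bytes on BE0..BE7; lane n carries bus/memory address offset n."""
    return VALUE.to_bytes(8, "big" if big_endian_mem else "little")


def load_memory(paddr, big_endian_mem, reverse_endian):
    """Hypothetical LoadMemory() model returning (BE[7:0], memdoubleword)."""
    big_endian_cpu = big_endian_mem ^ reverse_endian
    bus = lanes(big_endian_mem)
    # pAddr -> lane as in the diagrams: direct in the native mode,
    # flipped in the reverse-endian mode (pAddr 7 selects BE0).
    lane = (paddr & 7) ^ (7 if reverse_endian else 0)
    # Byte b of memdoubleword (bits 8*b+7..8*b) comes from lane b when the
    # lanes are straight and from lane 7-b when crossed (BigEndianCPU set).
    dword = 0
    for b in range(8):
        dword |= bus[b ^ (7 if big_endian_cpu else 0)] << (8 * b)
    return 1 << lane, dword


def lb(vaddr, big_endian_mem, reverse_endian):
    """LB with both XOR adjustments; vAddr is assumed to map 1:1 to pAddr."""
    big_endian_cpu = big_endian_mem ^ reverse_endian
    paddr = vaddr ^ (7 if reverse_endian else 0)       # XOR ReverseEndian
    be, dword = load_memory(paddr, big_endian_mem, reverse_endian)
    byte = (vaddr & 7) ^ (7 if big_endian_cpu else 0)  # XOR BigEndianCPU
    return be, (dword >> (8 * byte)) & 0xff


for mem in (True, False):
    for rev in (False, True):
        be, value = lb(0, mem, rev)
        print("big" if mem else "little", "reverse" if rev else "native",
              format(be, "08b"), hex(value))
```

In all four cases it drives 0b00000001 on BE[7:0] and returns the byte at bus/memory address 0 (0xfe for the big-endian and 0x10 for the little-endian memory interface), because the two XOR adjustments cancel.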