Hi, Apologies for the late reply, I was rather busy and wanted to investigate your claims a bit more in detail. > Iwlwifi driver talks to firmware using so called host command > structures defined naturally in little endian. Yes, I know that. I actually read the code before saying it was ugly. > Statistically most of > the operations are setting and testing bits which is endianity > agnostic meaning no run time swap operations are needed. It's interesting that you say this, single bits don't need any endianness conversion at all if you simply define the container types as u8. Of course, that isn't always how the hardware is documented and not always the right thing to do, but it is possible. However, with much of the code it it would actually make sense: For example, RXON_FLG_* go from bit 0 to bit 7 (a single u8) and then comes the antenna selection mask. "HT flags" are called "flags" but really are two- and three-bit values mashed into a single field so the "flags" argument doesn't apply. Filter flags go from 0-6. Similar for QoS flags. Key "flags" really aren't completely flags, contain a three-bit field for the encryption type. etc etc. > The few computation including the shifting one visible in Johannes > example are the trade off of run time efficiency and 'ugliness'. It > was better bargain for us. Iwlwifi keeps that in host command > structures through the driver operation so we don't need to keep > shadow native layered structures and there is quite a few of them. I never said you should keep native layout structures. I merely said that it leads to better code if the data is converted to CPU endianness at the system boundary. While the actual system boundary is obviously the PCI(-E) bus, the logical system boundary may well be at the point where you load a certain value from the DMA memory into the CPU to operate with it (whether it's then pushed back out to memory on the stack is uninteresting). Yes, doing that can lead to larger C files, but the compile result won't suffer; given a sufficiently smart compiler it'll even be *completely* equivalent on little-endian platforms (because nothing is declared volatile so the extra variables containing the CPU byte order fields can be completely removed by the compiler by going back to the memory the data came from if not enough registers are available). > Also number swap operation is much lower then the other approach. As I will show, this is not true. At least not if you go away from "shadow" structures. > We rather think of all constants for host commands as liitle endian. > As we stick consistently to this paradigma it becomes natural. > I'm not sure if this was a good decision for a long run as we are > adding more hardware and we are need more abstraction, at that point > it looked correct. Maybe it's "natural" to Intel people to think in little endian, but I'd guess most other people don't want to be bothered thinking about endianness in every single line of code. Here's my argument for why this is inefficient not only in terms of comprehensibility but also in terms of runtime penalty. Let's consider STA flags. I don't know whether they're written or read more but it hardly matters. The field is defined as follows: | bits | 0-7 | 8 | 9-16 | 19-20 | 21 | 22 | 23-25 | ... | means| - | pwrsave | - | aggsz | fat | mimodisable | mpdu density | - In order to set all information in this field, you need code like this: --- >% --- __le32 staflags = 0; if (sta power save) staflags |= STA_FLG_PWR_SAVE_MSK; staflags |= cpu_to_le32(ampdu_factor << STA_FLG_MAX_AGG_SIZE_POS); staflags |= cpu_to_le32(ampdu_density << STA_FLG_AGG_MPDU_DENSITY_POS); [...] memstruct->sta_flags = staflags; --- %< --- (similar code is present in iwl4965_set_ht_add_station) Let's compare to my approach (assuming the constants are appropriately redefined): --- >% --- u32 staflags = 0; if (sta power save) staflags |= STA_FLG_PWR_SAVE_MSK; staflags |= ampdu_factor << STA_FLG_MAX_AGG_SIZE_POS; staflags |= ampdu_density << STA_FLG_AGG_MPDU_DENSITY_POS; [...] memstruct->sta_flags = cpu_to_le32(staflags); --- %< --- Notice how it is (a) easier to understand (especially when you look at the constants) and (b) more (run-time) efficient? Yes, you can make a case for "flags are nicer this way", but in fact even when you have flags it doesn't matter much at all because the flags value is loaded from memory which takes much longer than byte-swapping it anyway. [1] The above code becomes (in my opinion) even harder to understand when you read out these fields, because it requires doing something like this: ampdu_factor = le32_to_cpu(staflags) >> STA_FLG_MAX_AGG_SIZE_POS; [etc] which is similar to the example I quoted in my first mail and absolutely not intuitive. johannes [1] And you could even use the little-endian load macros (like le32_to_cpup, beware of alignment though) that powerpc for example optimises into a single "lwbrx" statement, sparc64 does a similar thing. For little endian processors it doesn't make a difference at all.
Attachment:
signature.asc
Description: This is a digitally signed message part