Re: coding style lesson: iwlwifi vs. endianness

Johannes Berg <johannes@xxxxxxxxxxxxxxxx> · Mon, 10 Dec 2007 12:42:29 +0100

Hi,

Apologies for the late reply, I was rather busy and wanted to
investigate your claims a bit more in detail.

> Iwlwifi driver talks to firmware using so called host command
> structures defined naturally in little endian. 

Yes, I know that. I actually read the code before saying it was ugly.

> Statistically most of
> the operations are setting and testing bits which is endianity
> agnostic meaning no run time swap operations are needed.

It's interesting that you say this, single bits don't need any
endianness conversion at all if you simply define the container types as
u8. Of course, that isn't always how the hardware is documented and not
always the right thing to do, but it is possible.

However, with much of the code it it would actually make sense: For
example, RXON_FLG_* go from bit 0 to bit 7 (a single u8) and then comes
the antenna selection mask. "HT flags" are called "flags" but really are
two- and three-bit values mashed into a single field so the "flags"
argument doesn't apply. Filter flags go from 0-6. Similar for QoS flags.
Key "flags" really aren't completely flags, contain a three-bit field
for the encryption type. etc etc.

> The few computation including the shifting one visible in Johannes
> example are the trade off of run time efficiency and 'ugliness'. It
> was better bargain for us.  Iwlwifi keeps that in host command
> structures through the driver operation so we don't need to keep
> shadow native layered structures and there is quite a few of them.

I never said you should keep native layout structures. I merely said
that it leads to better code if the data is converted to CPU endianness
at the system boundary. While the actual system boundary is obviously
the PCI(-E) bus, the logical system boundary may well be at the point
where you load a certain value from the DMA memory into the CPU to
operate with it (whether it's then pushed back out to memory on the
stack is uninteresting). Yes, doing that can lead to larger C files, but
the compile result won't suffer; given a sufficiently smart compiler
it'll even be *completely* equivalent on little-endian platforms
(because nothing is declared volatile so the extra variables containing
the CPU byte order fields can be completely removed by the compiler by
going back to the memory the data came from if not enough registers are
available).

> Also number swap operation is much lower then the other approach.

As I will show, this is not true. At least not if you go away from
"shadow" structures.

> We rather think of all constants for host commands as liitle endian.
> As we stick consistently to this paradigma it becomes natural.
> I'm not sure if this was a good decision for a long run as we are
> adding more hardware and we are need more abstraction, at that point
> it looked correct.

Maybe it's "natural" to Intel people to think in little endian, but I'd
guess most other people don't want to be bothered thinking about
endianness in every single line of code.

Here's my argument for why this is inefficient not only in terms of
comprehensibility but also in terms of runtime penalty.

Let's consider STA flags. I don't know whether they're written or read
more but it hardly matters.

The field is defined as follows:
| bits | 0-7 | 8       | 9-16 | 19-20 | 21  | 22          | 23-25        | ...
| means|  -  | pwrsave |  -   | aggsz | fat | mimodisable | mpdu density | -

In order to set all information in this field, you need code like this:

--- >% ---
__le32 staflags = 0;

if (sta power save)
	staflags |= STA_FLG_PWR_SAVE_MSK;

staflags |= cpu_to_le32(ampdu_factor << STA_FLG_MAX_AGG_SIZE_POS);
staflags |= cpu_to_le32(ampdu_density << STA_FLG_AGG_MPDU_DENSITY_POS);

[...]

memstruct->sta_flags = staflags;
--- %< ---

(similar code is present in iwl4965_set_ht_add_station)

Let's compare to my approach (assuming the constants are appropriately
redefined):
--- >% ---
u32 staflags = 0;

if (sta power save)
	staflags |= STA_FLG_PWR_SAVE_MSK;

staflags |= ampdu_factor << STA_FLG_MAX_AGG_SIZE_POS;
staflags |= ampdu_density << STA_FLG_AGG_MPDU_DENSITY_POS;

[...]

memstruct->sta_flags = cpu_to_le32(staflags);
--- %< ---

Notice how it is
 (a) easier to understand (especially when you look at the constants)
and
 (b) more (run-time) efficient?

Yes, you can make a case for "flags are nicer this way", but in fact
even when you have flags it doesn't matter much at all because the flags
value is loaded from memory which takes much longer than byte-swapping
it anyway. [1]

The above code becomes (in my opinion) even harder to understand when
you read out these fields, because it requires doing something like
this:

ampdu_factor = le32_to_cpu(staflags) >> STA_FLG_MAX_AGG_SIZE_POS;
[etc]

which is similar to the example I quoted in my first mail and absolutely
not intuitive.

johannes

[1] And you could even use the little-endian load macros (like
le32_to_cpup, beware of alignment though) that powerpc for example
optimises into a single "lwbrx" statement, sparc64 does a similar thing.
For little endian processors it doesn't make a difference at all.

Attachment:
signature.asc

Description: This is a digitally signed message part