On Fri, 2011-02-18 at 11:35 -0800, Michael Chan wrote: > On Fri, 2011-02-18 at 08:13 -0800, James Bottomley wrote: > > > The 32-bit buf_addr_hi/buf_addr_lo fields are the same. But the 16-bit fields > > > have to be reversed in their order. In the code, we reference these fields as- > > > is directly and the chip does the endian swapping when DMA'ing the whole > > > structure. > > > > > > If we want to get rid of even the #ifdef in the .h file, we'll need to redefine > > > all 8-bit and 16-bit fields and combine them as 32-bit fields. When assigning > > > 8-bit or 16-bit quantities, we'll have to mask and shift them in properly. > > > Again, no cpu_to_xxx is needed because of the DMA endian swap done by the chip. > > > > > > So we trade clean .h files for shifting and masking operations in the code. > > > These .h files are generated by automatic scripts to match the firmware > > > structures. > > > > I understand that, but you're creating a complete mess. Your hardware > > swapper clearly only operates on 32 bit quantities, hence the header > > nightmare. This is actually very similar to the qla1280 situation. The > > consequence is that every header needs two separate definitions for the > > DMA quantities which doubles the amount of testing and makes the headers > > very hard to verify. > > > > I also don't quite get what you think this buys (other than added > > complexity). The standard way to do this is to DMA bus endian > > structures (i.e. little endian for PCI). You do the swaps as you load > > using the cpu_to_leX() macros. These macros are pretty much free on all > > architectures (they're nops on the LE ones and the BE ones mostly have > > load and swap which costs the same as a load). > > > > May be the hardware guys didn't know that cpu_to_le32() is almost free > on BE. I've already started a discussion with the firmware guys on > this. They generate these .h files for us to match their structures. Great ... the relevant details are all in include/linux/byteorder with the machine specific part being in arch/<mach>/include/asm/swab.h PPC and Sparc are the architectures that do it all in a single instruction; Mips does a pipelined two instruction register swap and parisc does a pipelined extract/deposit; even for mips and parisc, where extra instructions are generated, the overhead is a small fraction of the load/store memory latency. > Can we do this optimization after merge? Or it needs to be done before > merge? Well ... normally I force this type of change before merging, because my power greatly diminishes after something gets in. However, in this case, I'll trust you to follow through ... lets say an opportunity to demonstrate good faith to the community. James -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html