Re: which way is faster?

Erik Mouw <mouw@xxxxxxxxxxxx> · Mon, 18 Dec 2006 23:36:40 +0100

On Mon, Dec 18, 2006 at 05:12:53PM -0500, Ming Zhang wrote:
> See this piece of code:
> 
> http://lxr.linux.no/source/drivers/scsi/libata-scsi.c?v=2.6.18#L1049
> 
>         lba |= ((u64)scsicmd[2]) << 24;
>         lba |= ((u64)scsicmd[3]) << 16;
>         lba |= ((u64)scsicmd[4]) << 8;
>         lba |= ((u64)scsicmd[5]);
> 
> it can also be written as 
> 
> lba = be32_to_cpu(*(u32 *)(&scsicmd[2]));
> 
> 
> I wrote a simple test for this and found that the second one is
> several times faster than the first one.

I guess the total time spent doing the conversion is orders of
magnitude smaller than the time needed to get a data block from the
drive. Don't bother optimising things that don't need to be optimised.

> Is there any pitfall with the second way?

AFAICS be32_to_cpu() works on naturally aligned 32-bit integers, and
&scsicmd[2] is not guaranteed to be 4-byte aligned. Yes, it works on
x86 because x86 can do unaligned memory accesses. I don't think your
second method will work reliably on a RISC architecture (ARM, MIPS,
Alpha, etc.).
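
To illustrate the alignment point, here is a minimal userspace sketch
(not kernel code) of two ways to read a big-endian 32-bit value from an
arbitrary byte offset without a misaligned load. The helper names
read_be32_bytes() and read_be32_memcpy() and the sample CDB contents are
made up for this example; the only thing taken from the snippet above is
that bytes 2..5 hold the LBA, most significant byte first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte-wise read: no alignment requirement and independent of host
 * endianness. This is what the shift-and-OR code in the driver does. */
static inline uint32_t read_be32_bytes(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}

/* memcpy-based read: copy the four bytes into an aligned local first,
 * then byte-swap only if the host turns out to be little-endian. */
static inline uint32_t read_be32_memcpy(const uint8_t *p)
{
    const uint16_t probe = 1;
    uint32_t v;

    memcpy(&v, p, sizeof(v));            /* safe at any alignment */
    if (*(const unsigned char *)&probe == 1) {
        /* little-endian host: reverse the byte order */
        v = ((v & 0x000000ffu) << 24) |
            ((v & 0x0000ff00u) << 8)  |
            ((v & 0x00ff0000u) >> 8)  |
            ((v & 0xff000000u) >> 24);
    }
    return v;
}

/* Hypothetical 10-byte CDB with the LBA 0x12345678 in bytes 2..5. */
static inline void example_read10(void)
{
    const uint8_t cdb[10] = { 0x28, 0x00, 0x12, 0x34, 0x56, 0x78,
                              0x00, 0x00, 0x01, 0x00 };

    assert(read_be32_bytes(&cdb[2]) == 0x12345678u);
    assert(read_be32_memcpy(&cdb[2]) == 0x12345678u);
}
```

Both helpers only ever touch the buffer one byte at a time (or through
memcpy), so they do not depend on the CPU tolerating unaligned 32-bit
loads. A compiler is free to turn either into a single load plus byte
swap on targets where that is safe.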

Erik

-- 
They're all fools. Don't worry. Darwin may be slow, but he'll
eventually get them. -- Matthew Lammers in alt.sysadmin.recovery

--
Kernelnewbies: Help each other learn about the Linux kernel.
Archive:       http://mail.nl.linux.org/kernelnewbies/
FAQ:           http://kernelnewbies.org/faq/