On Sat, Aug 20, 2011 at 12:46:25PM +0200, David Kuehling wrote: > this patch adds a MIPS assembler version of the twofish cipher > algorithm. x86(_64) had an assembler version of twofish for some time > now, giving it an "unfair" advantage against the not so common > architectures. > > The code wast put through various tests and benchmarks in a user-space > testbench, testing its correctness (also via qemu) for 32bit vs. 64bit > and big vs. little endian MIPS. > > http://mosquito.dyndns.tv/freesvn/trunk/linux/twofish-mips/ > > Also the same code backported to a 2.6.39 kernel (64bit) has been > working flawlessly on my loongson-2f machine (with a twofish dm-crypt > root) for a few weeks now. > > Performance gain for the pure crypto code in 32-bit mode is about 15% on > both Ingenic Xburst and Loongson-2f. With 64-bit it is 24-29%, as the C > code suffers from 32->64 bit type conversions inserted by gcc. The code > uses less unrolling than the C version, and should be smaller by at > least a factor of 8. Full results in file Benchres.txt at the SVN URL > given above. > > Signed-off-by: David Kühling <dvdkhlng@xxxxxx> Lots of trailing whitespace in that patch. scripts/checkpatch.pl would have warned about those … +#if __mips64 +# define HAVE_64BIT +#endif s/HAVE_64BIT/CONFIG_64BIT/ +#if _MIPS_SZPTR == 32 +# define PTRADD addu +# warning "detected pointer-size as 32-bit" +#else +# ifndef HAVE_64BIT +# error "something is broken: detected pointer size > GPR size" +# endif +# define PTRADD daddu +# warning "detected pointer-size as 64-bit" +#endif PTR_ADDU from <asm/asm.h> does the same thing as PTRADDU so the entire ifdefery above can go away. Finally the use of register names starting with $ is a bit obscure. Kernel code needs to build for N64 and O32 but of those only N64 has registers called $ta0 .. $ta3 which are the equivalent to registers $8 .. $11. And in O32 registers $t0 ... $t3 also aliases for $8 .. $11. I haven't fully analyzed the code to ensure that there is no register conflict arising from that. I was surprised that gas assembles the code at all. $ta0 .. $ta3 are N32 / N64 register names and consider gas permitting the use of these registers in O32 a bug - but see my other posting to the binutils list for that. Improvment suggestion for le32_fromto_cpu - MIPS 32/64 R2 CPUs can use the wsbh instruction to faster endianess swapping: static inline __attribute_const__ __u32 __arch_swab32(__u32 x) { __asm__( " wsbh %0, %1 \n" " rotr %0, %0, 16 \n" : "=r" (x) : "r" (x)); return x; } #define __arch_swab32 __arch_swab32 This code fragment is from <asm/swab.h>. + /* if we turned this into 64-bit ops, we get endianess issues on + big-endian mips, plus alignment problems */ Some CPUs (Cavium Octeon) handle unaligned loads in hardware, on others a combination of LDL / LDR and SDL / SDR could be used to handle the unaligned loads and DSBH / DSHD (see __arch_swab64 in swab.h) could be used on MIPS64 R2 CPUs to handle the alignment issues. Or a rotate - have to think about it. Ralf