On Wed, 2013-04-17 at 20:58 +0300, Jussi Kivilinna wrote:
> On 16.04.2013 19:20, Tim Chen wrote:
> > This is the x86_64 CRC T10 DIF transform accelerated with the PCLMULQDQ
> > instructions. Details discussing the implementation can be found in the
> > paper:
> >
> > "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction"
> > URL: http://download.intel.com/design/intarch/papers/323102.pdf
>
> URL does not work.

Thanks for catching this. Will update.

> >
> > Signed-off-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
> > Tested-by: Keith Busch <keith.busch@xxxxxxxxx>
> > ---
> >  arch/x86/crypto/crct10dif-pcl-asm_64.S | 659 +++++++++++++++++++++++++++++++++
> >  1 file changed, 659 insertions(+)
> >  create mode 100644 arch/x86/crypto/crct10dif-pcl-asm_64.S
> <snip>
> > +
> > +        # Allocate Stack Space
> > +        mov %rsp, %rcx
> > +        sub $16*10, %rsp
> > +        and $~(0x20 - 1), %rsp
> > +
> > +        # push the xmm registers into the stack to maintain
> > +        movdqa %xmm10, 16*2(%rsp)
> > +        movdqa %xmm11, 16*3(%rsp)
> > +        movdqa %xmm8 , 16*4(%rsp)
> > +        movdqa %xmm12, 16*5(%rsp)
> > +        movdqa %xmm13, 16*6(%rsp)
> > +        movdqa %xmm6, 16*7(%rsp)
> > +        movdqa %xmm7, 16*8(%rsp)
> > +        movdqa %xmm9, 16*9(%rsp)
>
> You don't need to store (and restore) these, as 'crc_t10dif_pcl' is called
> between kernel_fpu_begin/_end.

That's true. Will skip the xmm save/restore in update to the patch.

> > +
> > +
> > +        # check if smaller than 256
> > +        cmp $256, arg3
> > +
> <snip>
> > +_cleanup:
> > +        # scale the result back to 16 bits
> > +        shr $16, %eax
> > +        movdqa 16*2(%rsp), %xmm10
> > +        movdqa 16*3(%rsp), %xmm11
> > +        movdqa 16*4(%rsp), %xmm8
> > +        movdqa 16*5(%rsp), %xmm12
> > +        movdqa 16*6(%rsp), %xmm13
> > +        movdqa 16*7(%rsp), %xmm6
> > +        movdqa 16*8(%rsp), %xmm7
> > +        movdqa 16*9(%rsp), %xmm9
>
> Registers are overwritten by kernel_fpu_end.
>
> > +        mov %rcx, %rsp
> > +        ret
> > +ENDPROC(crc_t10dif_pcl)
> > +
>
> You should move ENDPROC at end of the full function.
>
> > +########################################################################
> > +
> > +.align 16
> > +_less_than_128:
> > +
> > +        # check if there is enough buffer to be able to fold 16B at a time
> > +        cmp $32, arg3
> <snip>
> > +        movdqa (%rsp), %xmm7
> > +        pshufb %xmm11, %xmm7
> > +        pxor %xmm0 , %xmm7 # xor the initial crc value
> > +
> > +        psrldq $7, %xmm7
> > +
> > +        jmp _barrett
>
> Move ENDPROC here.
>

Will do.

Tim
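
For anyone re-testing the routine once the xmm save/restore is dropped and ENDPROC is moved, a bit-at-a-time reference is a convenient thing to compare results against. The following is only a minimal sketch in C and is not taken from the patch: the name crc_t10dif_ref is hypothetical, and it assumes the usual T10 DIF parameters (16-bit polynomial 0x8BB7, most significant bit first, no bit reflection, no final XOR, initial value passed in by the caller).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical reference routine (not part of the patch under review).
 * Computes the T10 DIF CRC one bit at a time, assuming polynomial 0x8BB7,
 * MSB-first processing, no reflection and no final XOR.
 */
uint16_t crc_t10dif_ref(uint16_t crc, const unsigned char *buf, size_t len)
{
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        /* Fold the next input byte into the top 8 bits of the CRC. */
        crc ^= (uint16_t)(buf[i] << 8);

        /* Shift out 8 bits, reducing by the polynomial on each carry. */
        for (bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x8BB7);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

Running the same buffers (including lengths below 32, below 128 and below 256, which take different paths in the assembly) through both this routine and the accelerated one, starting from the same initial CRC, should give matching 16-bit results.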