Re: [PATCH 2/4] Accelerated CRC T10 DIF computation with PCLMULQDQ instruction

On 16.04.2013 19:20, Tim Chen wrote:
> This is the x86_64 CRC T10 DIF transform accelerated with the PCLMULQDQ
> instructions.  Details discussing the implementation can be found in the
> paper:
> 
> "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction"
> URL: http://download.intel.com/design/intarch/papers/323102.pdf

This URL does not work.
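
For readers who cannot reach the paper, here is a minimal bit-at-a-time
sketch of the checksum being accelerated. It assumes the standard
CRC T10 DIF parameters (16-bit, polynomial 0x8BB7, initial value 0, no bit
reflection, no final XOR). The function name crc_t10dif_bitwise is
hypothetical and is not part of the patch. It only illustrates what result
the PCLMULQDQ code should produce, not how the patch computes it.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical reference routine (not from the patch): processes one bit
 * at a time, MSB first, using the T10 DIF polynomial 0x8BB7.
 */
static uint16_t crc_t10dif_bitwise(uint16_t crc, const unsigned char *buf,
                                   size_t len)
{
    for (size_t i = 0; i < len; i++) {
        /* Fold the next input byte into the top 8 bits of the CRC. */
        crc ^= (uint16_t)(buf[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x8BB7);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

With these parameters the catalogued check value for the ASCII string
"123456789" is 0xD0DB, which makes a convenient sanity test for any
accelerated variant.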

> 
> Signed-off-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
> Tested-by: Keith Busch <keith.busch@xxxxxxxxx>
> ---
>  arch/x86/crypto/crct10dif-pcl-asm_64.S | 659 +++++++++++++++++++++++++++++++++
>  1 file changed, 659 insertions(+)
>  create mode 100644 arch/x86/crypto/crct10dif-pcl-asm_64.S
<snip>
> +
> +	# Allocate Stack Space
> +	mov     %rsp, %rcx
> +	sub	$16*10, %rsp
> +	and     $~(0x20 - 1), %rsp
> +
> +	# push the xmm registers into the stack to maintain
> +	movdqa %xmm10, 16*2(%rsp)
> +	movdqa %xmm11, 16*3(%rsp)
> +	movdqa %xmm8 , 16*4(%rsp)
> +	movdqa %xmm12, 16*5(%rsp)
> +	movdqa %xmm13, 16*6(%rsp)
> +	movdqa %xmm6,  16*7(%rsp)
> +	movdqa %xmm7,  16*8(%rsp)
> +	movdqa %xmm9,  16*9(%rsp)

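You don't need to save (and restore) these registers, as 'crc_t10dif_pcl'
is called between kernel_fpu_begin() and kernel_fpu_end(), which already
take care of saving and restoring the FPU/SSE register state.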
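
To illustrate the calling pattern I mean, here is a rough userspace sketch.
Everything in it is a stand-in: kernel_fpu_begin()/kernel_fpu_end() are
stubbed with counters, crc_t10dif_pcl is replaced by a dummy with an
assumed (crc, buf, len) signature, and crc_t10dif_update_sketch is a
hypothetical wrapper name. The point is only that the SIMD routine runs
entirely inside the begin/end bracket, so it is free to clobber xmm
registers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins so the sketch compiles outside the kernel. */
static int fpu_saves, fpu_restores;
static bool fpu_section_active;
static bool pcl_ran_inside_section;

/* Models the kernel saving the current FPU/SSE state. */
static void kernel_fpu_begin(void)
{
    fpu_saves++;
    fpu_section_active = true;
}

/* Models the kernel restoring (overwriting) the FPU/SSE state. */
static void kernel_fpu_end(void)
{
    fpu_restores++;
    fpu_section_active = false;
}

/*
 * Dummy replacement for the assembly routine under review; the signature
 * is an assumption. It returns the CRC unchanged and only records whether
 * it ran inside the FPU section.
 */
static uint16_t crc_t10dif_pcl(uint16_t crc, const unsigned char *buf,
                               size_t len)
{
    (void)buf;
    (void)len;
    pcl_ran_inside_section = fpu_section_active;
    return crc;
}

/* Hypothetical caller: brackets the SIMD routine with begin/end. */
static uint16_t crc_t10dif_update_sketch(uint16_t crc,
                                         const unsigned char *buf, size_t len)
{
    kernel_fpu_begin();
    crc = crc_t10dif_pcl(crc, buf, len);
    kernel_fpu_end();
    return crc;
}
```

Since the caller already pays for one save and one restore of the whole
register state, the per-register movdqa stores in the assembly are
redundant work.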

> +
> +
> +	# check if smaller than 256
> +	cmp	$256, arg3
> +
<snip>
> +_cleanup:
> +	# scale the result back to 16 bits
> +	shr	$16, %eax
> +	movdqa	16*2(%rsp), %xmm10
> +	movdqa	16*3(%rsp), %xmm11
> +	movdqa	16*4(%rsp), %xmm8
> +	movdqa	16*5(%rsp), %xmm12
> +	movdqa	16*6(%rsp), %xmm13
> +	movdqa	16*7(%rsp), %xmm6
> +	movdqa	16*8(%rsp), %xmm7
> +	movdqa	16*9(%rsp), %xmm9

These registers are overwritten by kernel_fpu_end() anyway, so restoring
them here is not needed either.

> +	mov     %rcx, %rsp
> +	ret
> +ENDPROC(crc_t10dif_pcl)
> +

You should move ENDPROC to the end of the full function.

> +########################################################################
> +
> +.align 16
> +_less_than_128:
> +
> +	# check if there is enough buffer to be able to fold 16B at a time
> +	cmp	$32, arg3
<snip>
> +	movdqa	(%rsp), %xmm7
> +	pshufb	%xmm11, %xmm7
> +	pxor	%xmm0 , %xmm7   # xor the initial crc value
> +
> +	psrldq	$7, %xmm7
> +
> +	jmp	_barrett

Move ENDPROC here.


 -Jussi