On Mon, Jan 02, 2023 at 11:44:35PM +0100, Lukasz Stelmach wrote:
> Hi,
>
> I am researching possibility to use xor_blocks() in crypto_xor() and
> crypto_xor_cpy(). What I've found already is that different architecture
> dependent xor functions work on different blocks between 16 and 512
> (Intel AVX) bytes long. There is a hint in the comment for
> async_xor_offs() that src_cnt (as passed to do_sync_xor_offs()) counts
> pages. Thus, it is assumed, that the smallest chunk xor_blocks() gets is
> a single page. Am I right?
>
> Do you think adding block_len field to struct xor_block_template (and
> maybe some information about required alignment) and using it to call
> do_2 from crypto_xor() may work? I am thinking especially about disk
> encryption where sectors of 512~4096 are handled.
>

Taking a step back, it sounds like you think the word-at-a-time XOR in
crypto_xor() is not performant enough, so you want to use a SIMD (e.g.
NEON, SSE, or AVX) implementation instead. Have you tested that this
would actually give a benefit on the input sizes in question, especially
considering that SIMD can only be used in the kernel if
kernel_fpu_begin() is executed first?

It also would be worth considering just optimizing crypto_xor() by
unrolling the word-at-a-time loop to 4x or so.

- Eric
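
To make the unrolling idea concrete, here is a rough userspace sketch of a
word-at-a-time XOR with the main loop unrolled 4x. The function name
xor_unrolled4() is hypothetical, and this is not the kernel's crypto_xor()
code; it only illustrates the loop shape, and it assumes that going through
memcpy() is an acceptable way to sidestep alignment requirements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace stand-in for crypto_xor(): dst ^= src over len bytes. */
static void xor_unrolled4(uint8_t *dst, const uint8_t *src, size_t len)
{
	const size_t w = sizeof(unsigned long);

	assert(dst != NULL && src != NULL);

	/*
	 * Main loop: four machine words per iteration. Going through
	 * memcpy() keeps the accesses alignment-safe; compilers usually
	 * lower these fixed-size copies to plain loads and stores.
	 */
	while (len >= 4 * w) {
		unsigned long d[4], s[4];

		memcpy(d, dst, sizeof(d));
		memcpy(s, src, sizeof(s));
		d[0] ^= s[0];
		d[1] ^= s[1];
		d[2] ^= s[2];
		d[3] ^= s[3];
		memcpy(dst, d, sizeof(d));

		dst += 4 * w;
		src += 4 * w;
		len -= 4 * w;
	}

	/* Remaining whole words, one at a time. */
	while (len >= w) {
		unsigned long d, s;

		memcpy(&d, dst, w);
		memcpy(&s, src, w);
		d ^= s;
		memcpy(dst, &d, w);

		dst += w;
		src += w;
		len -= w;
	}

	/* Tail: leftover bytes that do not fill a word. */
	while (len--)
		*dst++ ^= *src++;
}
```

Whether this beats the existing loop, or a SIMD path once the cost of
kernel_fpu_begin()/kernel_fpu_end() is included, would still have to be
measured on the sector sizes in question (512 to 4096 bytes).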