On Wednesday, December 20, 2017 1:12:54 PM PST Eric Biggers wrote:
> >
> > We do need both registers, though we could certainly swap their usage to make
> > r12 the temp register. The reason we need the second register is because we
> > need to keep the original length to perform the pshufb at the end. But, of
> > course, that will not be needed anymore if we avoid the pshufb by duplicating
> > the _read_last_lt8 block or utilizing pslldq some other way.
> >
>
> If READ_PARTIAL_BLOCK can clobber 'DLEN' that would simplify it even more (no
> need for 'TMP1'), but what I am talking about here is how INITIAL_BLOCKS_DEC and
> INITIAL_BLOCKS_ENC maintain two copies of the remaining length in lock-step in
> r11 and r12:
>
> _get_AAD_blocks\num_initial_blocks\operation:
>         movdqu     (%r10), %xmm\i
>         PSHUFB_XMM %xmm14, %xmm\i # byte-reflect the AAD data
>         pxor       %xmm\i, \XMM2
>         GHASH_MUL  \XMM2, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1
>         add        $16, %r10
>         sub        $16, %r12
>         sub        $16, %r11
>         cmp        $16, %r11
>         jge        _get_AAD_blocks\num_initial_blocks\operation
>
> The code which you are replacing with READ_PARTIAL_BLOCK actually needed the two
> copies, but now it seems that only one copy is needed, so it can be simplified
> by only using r11.
>

Sorry, I misunderstood earlier. I’ll remove the extra register from the
preceding code in INITIAL_BLOCKS_ENC/DEC.

Thanks,
Junaid