On 2/11/23 06:18, Samuel Neves wrote:
Hi Samuel,
Thank you so much for the review!
> On Fri, Feb 10, 2023 at 6:18 PM Taehee Yoo <ap420073@xxxxxxxxx> wrote:
>>
>> Also, vpbroadcastd is simply replaced by vmovdqa in it.
>>
>> #ifdef CONFIG_AS_GFNI
>> #define aria_sbox_8way_gfni(x0, x1, x2, x3, \
>> x4, x5, x6, x7, \
>> t0, t1, t2, t3, \
>> t4, t5, t6, t7) \
>> - vpbroadcastq .Ltf_s2_bitmatrix, t0; \
>> - vpbroadcastq .Ltf_inv_bitmatrix, t1; \
>> - vpbroadcastq .Ltf_id_bitmatrix, t2; \
>> - vpbroadcastq .Ltf_aff_bitmatrix, t3; \
>> - vpbroadcastq .Ltf_x2_bitmatrix, t4; \
>> + vmovdqa .Ltf_s2_bitmatrix, t0; \
>> + vmovdqa .Ltf_inv_bitmatrix, t1; \
>> + vmovdqa .Ltf_id_bitmatrix, t2; \
>> + vmovdqa .Ltf_aff_bitmatrix, t3; \
>> + vmovdqa .Ltf_x2_bitmatrix, t4; \
>
> You can use vmovddup to replicate the behavior of vpbroadcastq for xmm
> registers. It's as fast as a movdqa and does not require increasing
> the data fields to be 16 bytes.
Thanks for this suggestion!
I tested this driver with vmovddup in place of vpbroadcastq, and it
works well.
As you mentioned, vmovddup doesn't require 16-byte data, so the
constants can stay 8 bytes wide.
So, I will use the vmovddup instruction instead of vpbroadcastq.
I will send the v2 patch for it.
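For reference, here is a rough user-space model of why the substitution
is safe. It is only a sketch of the instruction semantics as I
understand them, not code from the driver: the xmm_model type, the
model_* helpers, and the constant value are all made up for
illustration, and the constant is not one of the real bitmatrix values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an xmm register: two 64-bit lanes. */
typedef struct { uint64_t lane[2]; } xmm_model;

/* Models vmovddup with a memory source: read 8 bytes, fill both lanes. */
xmm_model model_vmovddup(const void *m64)
{
	xmm_model r;
	uint64_t q;

	memcpy(&q, m64, sizeof(q));
	r.lane[0] = q;
	r.lane[1] = q;
	return r;
}

/* Models vpbroadcastq into an xmm register: same 8-byte read, same result. */
xmm_model model_vpbroadcastq(const void *m64)
{
	xmm_model r;
	uint64_t q;

	memcpy(&q, m64, sizeof(q));
	r.lane[0] = q;
	r.lane[1] = q;
	return r;
}

/* Models vmovdqa: a plain 16-byte load, so the data must be 16 bytes. */
xmm_model model_vmovdqa(const void *m128)
{
	xmm_model r;

	memcpy(&r, m128, sizeof(r));
	return r;
}

/* Returns 1 if all three ways of loading give the same register value. */
int loads_are_equivalent(void)
{
	/* Arbitrary placeholder value, NOT a real ARIA bitmatrix. */
	const uint64_t c8 = 0x0123456789abcdefULL;
	/* What vmovdqa would need: the same constant stored twice. */
	const uint64_t c16[2] = { c8, c8 };

	xmm_model a = model_vmovddup(&c8);
	xmm_model b = model_vpbroadcastq(&c8);
	xmm_model c = model_vmovdqa(c16);

	return memcmp(&a, &b, sizeof(a)) == 0 &&
	       memcmp(&a, &c, sizeof(a)) == 0;
}
```

Calling loads_are_equivalent() compares the three loads. The point is
that only the vmovdqa variant needs the constant duplicated in memory.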
Thank you so much,
Taehee Yoo