On Fri, Feb 10, 2023 at 6:18 PM Taehee Yoo <ap420073@xxxxxxxxx> wrote:
>
> Also, vpbroadcastd is simply replaced by vmovdqa in it.
>
>  #ifdef CONFIG_AS_GFNI
>  #define aria_sbox_8way_gfni(x0, x1, x2, x3,            \
>                              x4, x5, x6, x7,            \
>                              t0, t1, t2, t3,            \
>                              t4, t5, t6, t7)            \
> -       vpbroadcastq .Ltf_s2_bitmatrix, t0;             \
> -       vpbroadcastq .Ltf_inv_bitmatrix, t1;            \
> -       vpbroadcastq .Ltf_id_bitmatrix, t2;             \
> -       vpbroadcastq .Ltf_aff_bitmatrix, t3;            \
> -       vpbroadcastq .Ltf_x2_bitmatrix, t4;             \
> +       vmovdqa .Ltf_s2_bitmatrix, t0;                  \
> +       vmovdqa .Ltf_inv_bitmatrix, t1;                 \
> +       vmovdqa .Ltf_id_bitmatrix, t2;                  \
> +       vmovdqa .Ltf_aff_bitmatrix, t3;                 \
> +       vmovdqa .Ltf_x2_bitmatrix, t4;                  \

You can use vmovddup to replicate the behavior of vpbroadcastq for xmm
registers. It's as fast as a movdqa and does not require increasing
the data fields to be 16 bytes.
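
To make the equivalence concrete, here is a minimal userspace sketch in
plain C. It only models the register contents; it is not kernel code and
does not execute the instructions themselves. The names xmm_t,
model_vmovddup, model_vmovdqa and movddup_matches_padded_load are made up
for illustration, and the constant is an arbitrary placeholder, not one
of the ARIA bit matrices.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of a 128-bit xmm register as 16 raw bytes. */
typedef struct { uint8_t b[16]; } xmm_t;

/*
 * Model of "vmovddup mem64, xmm": read 8 bytes from memory and copy
 * them into both 64-bit halves of the destination. For an xmm
 * destination this is the same result vpbroadcastq would produce.
 */
static xmm_t model_vmovddup(const uint8_t src[8])
{
    xmm_t r;

    memcpy(&r.b[0], src, 8);
    memcpy(&r.b[8], src, 8);
    return r;
}

/* Model of "vmovdqa mem128, xmm": a plain 16-byte load. */
static xmm_t model_vmovdqa(const uint8_t src[16])
{
    xmm_t r;

    memcpy(r.b, src, 16);
    return r;
}

/*
 * Returns 1 if duplicating an 8-byte constant yields the same register
 * contents as a 16-byte load of that constant stored twice in memory.
 */
int movddup_matches_padded_load(uint64_t matrix)
{
    uint8_t narrow[8];   /* 8-byte data field, as vmovddup needs */
    uint8_t padded[16];  /* 16-byte data field, as vmovdqa needs */
    xmm_t a, b;

    memcpy(narrow, &matrix, 8);
    memcpy(&padded[0], &matrix, 8);
    memcpy(&padded[8], &matrix, 8);

    a = model_vmovddup(narrow);
    b = model_vmovdqa(padded);
    return memcmp(a.b, b.b, 16) == 0;
}

/* Quick self-check with an arbitrary placeholder value. */
void self_check(void)
{
    assert(movddup_matches_padded_load(0x0123456789abcdefULL));
}
```

In this model the only difference between the two approaches is the size
of the constant in memory: 8 bytes for the vmovddup form versus 16 bytes
for the vmovdqa form.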