-----Hongtao Liu <crazylht@xxxxxxxxx> wrote: -----
>To: Hong X <hongx@xxxxxxx>
>From: Hongtao Liu <crazylht@xxxxxxxxx>
>Date: 03/08/2020 22:54
>Cc: gcc-help@xxxxxxxxxxx
>Subject: [EXTERNAL] Re: Initializing a vector to zero leads to less
>efficient assemblies than manually assigning a vector to zero?
>
>On Sat, Mar 7, 2020 at 5:20 AM Hong X <hongx@xxxxxxx> wrote:
>>
>> Hi all,
>>
>> I tried to compile the following two code snippets with
>> "--std=c++14 -mavx2 -O3" options:
>>
>>   double tmp_values[4] = {0};
>>
>> and
>>
>>   double tmp_values[4];
>>
>>   for (auto i = 0; i < 4; ++i) {
>>     tmp_values[i] = 0.0;
>>   }
>>
>> The first code snippet leads to
>>
>>   vmovaps XMMWORD PTR [rsp], xmm0
>>   vmovaps XMMWORD PTR [rsp+16], xmm0
>>
>> But the second leads to only
>>
>>   vmovapd YMMWORD PTR [rsp], ymm0
>>
>> which is less efficient than the previous one. Am I missing
>> something?
>>
>Assume you're working on Skylake. The latency and throughput of
>vmovaps/vmovapd are:
>
>                    | lat     | throughput  | uops | port  |
>VMOVAPS (XMM, M128) | [≤4;≤7] | 0.50 / 0.50 | 1    | 1*p23 |
>VMOVAPS (YMM, M256) | [≤5;≤8] | 0.50 / 0.50 | 1    | 1*p23 |
>
>Refer to https://uops.info/table.html
>So the latter seems better.

Oops, I said it the other way around. I meant that the second is *more*
(not *less*, as in my original post) efficient than the first, even
though they are functionally equivalent, yet the first is the form an
average C++ programmer would likely prefer. This looks odd to me.

Thanks,
Hong
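
For anyone who wants to reproduce the comparison, here is a minimal sketch that wraps both snippets in functions, so that they can be compiled to assembly (for example with "g++ -std=c++14 -mavx2 -O3 -S") and checked for equivalence. The names init_with_braces, init_with_loop and variants_match are hypothetical and do not come from the thread. The memcpy into an output parameter is only an assumption made to keep the stores observable; the instructions actually emitted will depend on the GCC version and target.

```cpp
#include <cstring>

// Variant 1: aggregate initialization, as in the first snippet.
void init_with_braces(double (&out)[4]) {
    double tmp_values[4] = {0};
    // Copy out so the zeroing stores cannot be discarded as dead code.
    std::memcpy(out, tmp_values, sizeof tmp_values);
}

// Variant 2: explicit loop, as in the second snippet.
void init_with_loop(double (&out)[4]) {
    double tmp_values[4];
    for (auto i = 0; i < 4; ++i) {
        tmp_values[i] = 0.0;
    }
    std::memcpy(out, tmp_values, sizeof tmp_values);
}

// Returns true when both variants produce byte-identical arrays.
bool variants_match() {
    double a[4];
    double b[4];
    init_with_braces(a);
    init_with_loop(b);
    return std::memcmp(a, b, sizeof a) == 0;
}
```

Calling variants_match() from a small driver confirms the two forms are functionally equivalent; any difference shows up only in the generated stores, which is what the assembly listing is for.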