On Mon, Nov 7, 2022 at 2:26 PM <i.nixman@xxxxxxxxxxxxx> wrote:
>
> On 2022-11-07 03:32, Hongtao Liu wrote:
> > On Sun, Nov 6, 2022 at 6:54 PM i.nixman--- via Gcc-help
> > <gcc-help@xxxxxxxxxxx> wrote:
> >>
> >>
> >> Hello,
> >>
> >> look at this example(https://godbolt.org/z/TnGMsfMs6):
> >> ```
> >> auto foo(const char *p) {
> >>     const auto substr = _mm_loadu_si128((const __m128i *)p);
> >>     return _mm_cmplt_epi8(substr, _mm_set1_epi8('0'));
> >> }
> >> ```
> >> and to the generated asm:
> >> ```
> >> 1: foo(char const*):
> >> 2:         movdqu  xmm0, XMMWORD PTR [rdi]
> >> 3:         pxor    xmm1, xmm1
> >> 4:         pcmpgtb xmm0, XMMWORD PTR .LC0[rip]
> >> 5:         pcmpeqb xmm0, xmm1
> >> 6:         ret
> >> ```
> >> look at line 5.
> >> is there any reason for `pcmpeqb` instruction?
>
> hi,
>
> > Looks like a mis optimization from
> >
> > _4 = VIEW_CONVERT_EXPR<__v16qs>(_7);
> > _3 = _4 <= { 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47,
> > 47, 47 };
> > _5 = VIEW_CONVERT_EXPR<vector(16) signed char>(_3); --- this?
> >
> > Could you open a bugzilla for it
> > https://gcc.gnu.org/bugzilla/
>
> sure, but for which component?

Let's put it as rtl-optimization first.

>
> >
> >>
> >> just for info, clang's output(https://godbolt.org/z/MPnvEMdhr):
> >> ```
> >> 1: foo(char const*):
> >> 2:         movdqu  xmm1, xmmword ptr [rdi]
> >> 3:         movdqa  xmm0, xmmword ptr [rip + .LCPI0_0]
> >> 4:         pcmpgtb xmm0, xmm1
> >> 5:         ret
> >> ```
> >>
> >>
> >> best!

--
BR,
Hongtao
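
For reference, here is a rough scalar model of why the two sequences above should give the same result, and why the extra `pcmpeqb` is redundant rather than wrong. It assumes, going by the GIMPLE dump, that `.LC0` holds sixteen copies of 47, and that clang's `.LCPI0_0` holds sixteen copies of `'0'` (48). The helper names (`lane_cmpgt`, `lane_cmpeq`, `gcc_lane`, `clang_lane`, `sequences_agree`) are made up for illustration and model a single byte lane only:

```cpp
#include <cassert>
#include <cstdint>

// One byte lane of pcmpgtb: all ones (-1) if a > b as signed bytes, else 0.
std::int8_t lane_cmpgt(std::int8_t a, std::int8_t b) { return a > b ? -1 : 0; }

// One byte lane of pcmpeqb: all ones (-1) if a == b, else 0.
std::int8_t lane_cmpeq(std::int8_t a, std::int8_t b) { return a == b ? -1 : 0; }

// GCC sequence (assumed constant 47): computes x <= 47 as (x > 47) == 0,
// i.e. a pcmpgtb followed by a pcmpeqb against the zeroed register.
std::int8_t gcc_lane(std::int8_t x) {
    return lane_cmpeq(lane_cmpgt(x, 47), 0);
}

// clang sequence (assumed constant 48): computes x < 48 as 48 > x,
// a single pcmpgtb with the operands swapped.
std::int8_t clang_lane(std::int8_t x) {
    return lane_cmpgt(48, x);
}

// Exhaustively compare both models over every signed 8-bit value.
bool sequences_agree() {
    for (int v = -128; v <= 127; ++v) {
        const auto x = static_cast<std::int8_t>(v);
        if (gcc_lane(x) != clang_lane(x)) return false;
    }
    return true;
}
```

If the assumptions hold, the canonicalization of `x < 48` into `x <= 47` is what costs the extra instruction: SSE2 has no byte-wise "less than or equal" compare, so the `<=` form needs a compare plus an inversion, while the original `<` form maps onto one `pcmpgtb` with swapped operands.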