Nicolas Pitre <nicolas.pitre@xxxxxxxxxx> writes:

> On Thu, 19 Nov 2015, Måns Rullgård wrote:
>
>> Nicolas Pitre <nicolas.pitre@xxxxxxxxxx> writes:
>>
>> > +static inline uint64_t __arch_xprod_64(uint64_t m, uint64_t n, bool bias)
>> > +{
>> > +	unsigned long long res;
>> > +	unsigned int tmp = 0;
>> > +
>> > +	if (!bias) {
>> > +		asm ( "umull %Q0, %R0, %Q1, %Q2\n\t"
>> > +		      "mov %Q0, #0"
>> > +		      : "=&r" (res)
>> > +		      : "r" (m), "r" (n)
>> > +		      : "cc");
>> > +	} else if (!(m & ((1ULL << 63) | (1ULL << 31)))) {
>> > +		res = m;
>> > +		asm ( "umlal %Q0, %R0, %Q1, %Q2\n\t"
>> > +		      "mov %Q0, #0"
>> > +		      : "+&r" (res)
>> > +		      : "r" (m), "r" (n)
>> > +		      : "cc");
>> > +	} else {
>> > +		asm ( "umull %Q0, %R0, %Q2, %Q3\n\t"
>> > +		      "cmn %Q0, %Q2\n\t"
>> > +		      "adcs %R0, %R0, %R2\n\t"
>> > +		      "adc %Q0, %1, #0"
>> > +		      : "=&r" (res), "+&r" (tmp)
>> > +		      : "r" (m), "r" (n)
>>
>> Why is tmp using a +r constraint here? The register is not written, so
>> using an input-only operand could/should result in better code. That is
>> also what the old code did.
>
> No, it is worse. gcc allocates two registers because, somehow, it
> doesn't think that the first one still holds zero after the first usage.
> This way usage of only one temporary register is forced throughout,
> producing better code.

Makes sense. Thanks for explaining.

-- 
Måns Rullgård
mans@xxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-arch" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
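For readers following the asm without an ARM toolchain, here is a portable C sketch of what the helper appears to compute. It assumes the semantics of the generic `__arch_xprod_64()` fallback, namely `((bias ? m : 0) + m * n) >> 64` taken over the full 128-bit value; that reading is an assumption, not something stated in this thread. The names `xprod_64_ref()` and `xprod_64_limbs()` are hypothetical, and `unsigned __int128` is a GCC/Clang extension. The limb version follows the same staging as the quoted asm (low partial product plus bias, then the cross products, then the high product) but always tracks carries instead of special-casing small `m`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical reference: ((bias ? m : 0) + m * n) >> 64, computed on the
 * full 128-bit value. unsigned __int128 is a GCC/Clang extension.
 */
static uint64_t xprod_64_ref(uint64_t m, uint64_t n, bool bias)
{
	unsigned __int128 p = (unsigned __int128)m * n;

	if (bias)
		p += m;		/* cannot overflow: m * (n + 1) < 2^128 */
	return (uint64_t)(p >> 64);
}

/*
 * Hypothetical 32-bit-limb version, staged like the quoted ARM asm but
 * always tracking carries (no separate fast path for small m).
 */
static uint64_t xprod_64_limbs(uint64_t m, uint64_t n, bool bias)
{
	const uint64_t m_lo = (uint32_t)m, m_hi = m >> 32;
	const uint64_t n_lo = (uint32_t)n, n_hi = n >> 32;
	uint64_t acc, t, carry = 0;

	/* Stage 1: m_lo * n_lo plus the optional bias, then >> 32. Adding m
	 * may wrap; the carry becomes bit 32, which is what the cmn/adcs/adc
	 * sequence in the asm produces from the zero held in tmp. */
	acc = m_lo * n_lo;
	if (bias) {
		acc += m;
		carry = (acc < m);	/* unsigned wrap-around detection */
	}
	acc = (acc >> 32) | (carry << 32);

	/* Stage 2: add both cross products (weight 2^32), counting wraps. */
	t = m_lo * n_hi;
	acc += t;
	carry = (acc < t);
	t = m_hi * n_lo;
	acc += t;
	carry += (acc < t);
	assert(carry <= 1);	/* the stage-2 sum is always below 2^65 */
	acc = (acc >> 32) | (carry << 32);

	/* Stage 3: the high product already has weight 2^64. */
	return acc + m_hi * n_hi;
}
```

The middle branch of the quoted asm can skip carry handling because clearing bits 63 and 31 of `m` keeps `m_lo` and `m_hi` below 2^31, which bounds every intermediate sum below 2^64; the sketch above simply pays for the carry checks in all cases.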