On Tue, 3 Feb 2015, Daniel Sanders wrote:

> From: Toma Tabacu <toma.tabacu@xxxxxxxxxx>
>
> Change the type of csum_ipv6_magic's 'proto' argument from unsigned
> short to __u32.
>
> This fixes a type mismatch between the 'htonl(proto)' inline asm
> input, which is __u32, and the 'proto' output, which is unsigned
> short.
>
> This is the error message reported by clang:
> arch/mips/include/asm/checksum.h:285:27: error: unsupported inline asm: input with type '__be32' (aka 'unsigned int') matching output with type 'unsigned short'
>         "0" (htonl(len)), "1" (htonl(proto)), "r" (sum));
>                                ^~~~~~~~~~~~
>
> The changed code can be compiled successfully by both gcc and clang.

This definitely looks like a bug in clang to me.  What this construct
means is that input #5 and output #1 live in the same register: an
`__u32' value is taken on input (the result of the `htonl(proto)'
calculation) and an `unsigned short' value is produced in the same
register on output, which will be the value of the `proto' variable from
there on.  A perfectly valid arrangement.  This would be the right
arrangement to use with the MIPS16 SEH instruction, for example.

Has this bug been reported to the clang maintainers?

And I'd prefer to leave the declaration of `proto' alone, as IPv6 network
protocol numbers are 16-bit quantities.

That said, this code is indeed weird if not wrong, which is probably why
this arrangement resulted, in an attempt to prevent GCC from messing up
the registers used.

First and foremost, both outputs, and especially #1, lack an
earlyclobber.  This, I imagine, may have prompted GCC to overwrite one of
the inputs, which in turn is why whoever poked at this code decided to
alias input #5 to output #1.

But as you can see in the asm, there's no real aliasing between input #5
and output #1.  Input #5 is consumed early on (and is even referred to
with `%5' rather than `%1', which would be the norm in the case of actual
aliasing), and the containing register is then reused for something else.
So the two operands can be separated.

This is unlike input #4 vs output #0, which is both read and written
right away (and, just as one would expect, there's no reference to `%4'
anywhere).  Output #0 can do without an earlyclobber, as it is aliased to
input #4 and therefore cannot be assigned by GCC to another input.  But
it won't hurt to have one too; it will set a good practice and serve a
documentation purpose.

I suggest a fix like this then:

static __inline__ __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
					  const struct in6_addr *daddr,
					  __u32 len, unsigned short proto,
					  __wsum sum)
{
	__wsum tmp;

	__asm__(
	[...]
	: "=&r" (sum), "=&r" (tmp)
	: "r" (saddr), "r" (daddr),
	  "0" (htonl(len)), "r" (htonl(proto)), "r" (sum));

	return csum_fold(sum);
}

Try and see if it works for you.

I wonder why this is an asm in the first place though.  There's no rocket
science here that GCC couldn't handle.  I guess it must have been very
bad at optimising a C equivalent then.

  Maciej
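
For illustration of the "C equivalent" remark above, here is a minimal
userspace sketch of the computation, not the kernel code and not part of
the suggested fix.  It assumes the usual IPv6 pseudo-header sum: the two
128-bit addresses, htonl(len), htonl(proto) and the incoming partial sum
are added with end-around carry, folded to 16 bits and complemented.  The
names csum_ipv6_magic_sketch and fold_to_16 are hypothetical, plain byte
arrays stand in for struct in6_addr, and htonl is taken from the POSIX
<arpa/inet.h> header.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>	/* htonl(); assumes a POSIX system */

/* Fold a wide ones'-complement accumulator to 16 bits and invert it. */
static uint16_t fold_to_16(uint64_t acc)
{
	acc = (acc & 0xffffffffu) + (acc >> 32);	/* 64 -> 33 bits */
	acc = (acc & 0xffffffffu) + (acc >> 32);	/* 33 -> 32 bits */
	acc = (acc & 0xffffu) + (acc >> 16);		/* 32 -> 17 bits */
	acc = (acc & 0xffffu) + (acc >> 16);		/* 17 -> 16 bits */
	return (uint16_t)~acc;
}

/*
 * Hypothetical stand-in for csum_ipv6_magic.  The addresses are read as
 * four 32-bit words each, in memory (network) order, so the result is in
 * the same representation as the partial sum passed in.
 */
static uint16_t csum_ipv6_magic_sketch(const uint8_t saddr[16],
				       const uint8_t daddr[16],
				       uint32_t len, unsigned short proto,
				       uint32_t sum)
{
	uint64_t acc = sum;	/* 11 32-bit terms cannot overflow 64 bits */
	uint32_t w;
	int i;

	for (i = 0; i < 4; i++) {
		memcpy(&w, saddr + 4 * i, sizeof(w));	/* no alignment assumed */
		acc += w;
		memcpy(&w, daddr + 4 * i, sizeof(w));
		acc += w;
	}
	acc += htonl(len);
	acc += htonl(proto);

	return fold_to_16(acc);
}
```

Accumulating in a 64-bit variable and folding once at the end avoids
tracking the carry after every addition, which is what the asm has to do
by hand; whether a compiler turns this into code as tight as the asm on a
given MIPS target is not something this sketch demonstrates.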