I was researching different ways of writing unaligned load/store
macros, so I checked how the kernel does it: in the most general way
possible (see include/linux/unaligned.h). As a result, very bad code
is generated. On alpha with BWX, for example, we can implement all of
these functions with a single instruction, whereas the generic
functions produce sequences like this:
__get_unaligned_le32:
.frame $30,0,$26,0
.prologue 0
ldbu $0,1($16)
ldbu $1,2($16)
ldbu $2,3($16)
ldbu $3,0($16)
sll $1,16,$1
sll $0,8,$0
bis $0,$1,$0
sll $2,24,$2
bis $0,$3,$0
bis $0,$2,$0
addl $31,$0,$0
ret $31,($26),1
Four byte loads, three shifts, three ors, and a sign extend --
versus a single ldl_u instruction. The code is more than twice as
bad for le64.
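For reference, a byte-at-a-time accessor along the lines of the
generic one is what produces the sequence above; this is only a
sketch of the pattern, not the kernel's exact code:

```c
#include <stdint.h>

/* Sketch of a generic byte-wise little-endian 32-bit load: each
 * byte is fetched separately and merged with shifts and ors,
 * which is exactly the ldbu/sll/bis sequence shown above. */
static inline uint32_t sketch_get_unaligned_le32(const void *p)
{
	const uint8_t *b = p;

	return (uint32_t)b[0]
	     | ((uint32_t)b[1] << 8)
	     | ((uint32_t)b[2] << 16)
	     | ((uint32_t)b[3] << 24);
}
```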
Do we use the generic functions for a reason I'm not seeing? It looks
as though it would be easy enough to add architecture-specific
unaligned get/put functions in arch/*/include/asm/unaligned.h.
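One possible shape for such an override -- purely a sketch, with
illustrative names rather than the kernel's actual API -- is a
fixed-size memcpy, which compilers typically collapse into a single
load or store on targets where that is cheap:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical arch-override sketch: a fixed-size memcpy that the
 * compiler can fold into one load/store instruction on targets
 * with inexpensive unaligned (or BWX-style byte) access.
 * Returning the value directly assumes a little-endian target;
 * a big-endian arch would need a byte swap here. */
static inline uint32_t arch_get_unaligned_le32(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));	/* folded to a single load */
	return v;
}

static inline void arch_put_unaligned_le32(uint32_t v, void *p)
{
	memcpy(p, &v, sizeof(v));	/* folded to a single store */
}
```

Whether this beats a hand-written asm helper would of course need
checking against the generated code on each architecture.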