While working on mul_u64_u64_div_u64() improvements, I realized that there is a better way to perform a 64x64->128-bit multiplication that doesn't involve any conditionals to handle overflows. It even produces code that is equivalent to, or better than, the provided ARM assembly alternative. And the best part is:

 arch/arm/include/asm/div64.h |  52 -----------------
 include/asm-generic/div64.h  | 105 +++++++++++------------------------
 2 files changed, 31 insertions(+), 126 deletions(-)
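The message doesn't show the new code itself. The C below is only a sketch of one well-known way to get a conditional-free 64x64->128-bit multiply from 32x32->64-bit partial products. It is not claimed to be the patch's implementation, and `mul_u64_u64_128()` is a hypothetical name. The property it relies on is that a 32x32->64 product plus two 32-bit addends can never exceed 64 bits, because (2^32-1)^2 + 2*(2^32-1) = 2^64-1. Partial carries can therefore be folded into the next sum without any overflow checks.

```c
#include <stdint.h>
#include <assert.h> /* assert() is used by the usage checks */

/*
 * Hypothetical helper: compute the full 128-bit product of a and b,
 * returned as two 64-bit halves, using only 32x32->64 multiplies.
 * No branches or carry tests are needed: every intermediate sum is a
 * 32x32 product plus at most two 32-bit values, which fits in 64 bits.
 */
static void mul_u64_u64_128(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
	uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
	uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

	/* Low partial product: contributes bits 0..63. */
	uint64_t x = a_lo * b_lo;
	/* Cross term plus the upper half of x: cannot overflow. */
	uint64_t y = a_lo * b_hi + (x >> 32);
	/* Other cross term plus the lower half of y: cannot overflow. */
	uint64_t z = a_hi * b_lo + (uint32_t)y;
	/* High product plus both remaining carries: at most 2^64 - 1. */
	uint64_t w = a_hi * b_hi + (y >> 32) + (z >> 32);

	*lo = (z << 32) | (uint32_t)x;
	*hi = w;
}
```

For example, squaring UINT64_MAX yields hi = 0xFFFFFFFFFFFFFFFE and lo = 1, that is, 2^128 - 2^65 + 1.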