> -----Original Message-----
> From: Charlie Jenkins <charlie@xxxxxxxxxxxx>
> Sent: Wednesday, December 13, 2023 10:11 AM
> To: Palmer Dabbelt <palmer@xxxxxxxxxxx>; Conor Dooley
> <conor@xxxxxxxxxx>; Samuel Holland <samuel.holland@xxxxxxxxxx>; David
> Laight <David.Laight@xxxxxxxxxx>; Wang, Xiao W <xiao.w.wang@xxxxxxxxx>;
> Evan Green <evan@xxxxxxxxxxxx>; linux-riscv@xxxxxxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; linux-arch@xxxxxxxxxxxxxxx
> Cc: Paul Walmsley <paul.walmsley@xxxxxxxxxx>; Albert Ou
> <aou@xxxxxxxxxxxxxxxxx>; Arnd Bergmann <arnd@xxxxxxxx>; Conor Dooley
> <conor.dooley@xxxxxxxxxxxxx>
> Subject: Re: [PATCH v12 4/5] riscv: Add checksum library
>
> On Tue, Dec 12, 2023 at 05:18:41PM -0800, Charlie Jenkins wrote:
> > Provide a 32 and 64 bit version of do_csum. When compiled for 32-bit
> > will load from the buffer in groups of 32 bits, and when compiled for
> > 64-bit will load in groups of 64 bits.
> >
> > Additionally provide riscv optimized implementation of csum_ipv6_magic.
> >
> > Signed-off-by: Charlie Jenkins <charlie@xxxxxxxxxxxx>
> > Acked-by: Conor Dooley <conor.dooley@xxxxxxxxxxxxx>
> > Reviewed-by: Xiao Wang <xiao.w.wang@xxxxxxxxx>
> > ---
> >  arch/riscv/include/asm/checksum.h |  13 +-
> >  arch/riscv/lib/Makefile           |   1 +
> >  arch/riscv/lib/csum.c             | 326 ++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 339 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/riscv/include/asm/checksum.h b/arch/riscv/include/asm/checksum.h
> > index 2fcf864186e7..3fa04ff1eda8 100644
> > --- a/arch/riscv/include/asm/checksum.h
> > +++ b/arch/riscv/include/asm/checksum.h
> > @@ -12,6 +12,17 @@
> >
> >  #define ip_fast_csum ip_fast_csum
> >
> > +extern unsigned int do_csum(const unsigned char *buff, int len);
> > +#define do_csum do_csum
> > +
> > +/* Default version is sufficient for 32 bit */
> > +#ifndef CONFIG_32BIT
> > +#define _HAVE_ARCH_IPV6_CSUM
> > +__sum16 csum_ipv6_magic(const struct in6_addr *saddr,
> > +			const struct in6_addr *daddr,
> > +			__u32 len, __u8 proto, __wsum sum);
> > +#endif
> > +
> >  /* Define riscv versions of functions before importing asm-generic/checksum.h */
> >  #include <asm-generic/checksum.h>
> >
> > @@ -69,7 +80,7 @@ static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
> >  			.option pop"
> >  		: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp));
> >  	}
> > -	return csum >> 16;
> > +	return (__force __sum16) (csum >> 16);

I notice that this type conversion comes in after V10. This change should go to
patch 3/5.

BRs,
Xiao

[...]
> > +
> > +/*
> > + * Perform a checksum on an arbitrary memory address.
> > + * Will do a light-weight address alignment if buff is misaligned, unless
> > + * cpu supports fast misaligned accesses.
> > + */
> > +unsigned int do_csum(const unsigned char *buff, int len)
> > +{
> > +	if (unlikely(len <= 0))
> > +		return 0;
> > +
> > +	/*
> > +	 * Significant performance gains can be seen by not doing alignment
> > +	 * on machines with fast misaligned accesses.
> > +	 *
> > +	 * There is some duplicate code between the "with_alignment" and
> > +	 * "no_alignment" implementations, but the overlap is too awkward to be
> > +	 * able to fit in one function without introducing multiple static
> > +	 * branches. The largest chunk of overlap was delegated into the
> > +	 * do_csum_common function.
> > +	 */
> > +	if (static_branch_likely(&fast_misaligned_access_speed_key))
> > +		return do_csum_no_alignment(buff, len);
> > +
> > +	if (((unsigned long)buff & OFFSET_MASK) == 0)
> > +		return do_csum_no_alignment(buff, len);
> > +
> > +	return do_csum_with_alignment(buff, len);
> > +}
> >
> > --
> > 2.43.0
> >
>
> There is potentially a code size concern here. These changes do require
> alternatives, and as such it increases the resulting binary size. The
> bloat-o-meter script reports that the do_csum function grows to twice
> the size with this patch:
>
> Function                                     old     new   delta
> do_csum                                      238     514    +276
>
> The other functions are harder to measure because they get inlined or
> are not included in generic code. However the do_csum is the most
> impacted because of the misaligned access behavior.
>
> The performance improvements afforded by alternatives (with the Zbb
> extension) and with the misaligned access checking are significant. In
> my testing these optimizations alone contribute to over a 20% performance
> improvement.
>
> - Charlie
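
For readers following the do_csum discussion, here is a minimal userspace sketch of the word-at-a-time idea: accumulate 64-bit loads with end-around carry, then fold down to 16 bits. It is not the code from the patch. The names sketch_fold64, sketch_csum_words and sketch_csum_ref are hypothetical, and memcpy stands in for the kernel's aligned/misaligned load paths so the sketch is safe at any alignment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fold a 64-bit ones'-complement sum to 16 bits. */
static uint16_t sketch_fold64(uint64_t sum)
{
	sum = (sum & 0xffffffffu) + (sum >> 32);
	sum = (sum & 0xffffffffu) + (sum >> 32);
	sum = (sum & 0xffffu) + (sum >> 16);
	sum = (sum & 0xffffu) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Hypothetical word-at-a-time checksum: sums native-endian 64-bit words.
 * memcpy is an assumption of this sketch; it sidesteps the alignment
 * question that the patch handles with two separate code paths.
 */
static uint16_t sketch_csum_words(const unsigned char *buf, size_t len)
{
	uint64_t sum = 0, word;

	while (len >= sizeof(word)) {
		memcpy(&word, buf, sizeof(word));
		sum += word;
		sum += (sum < word);	/* end-around carry */
		buf += sizeof(word);
		len -= sizeof(word);
	}
	if (len) {
		word = 0;		/* zero-pad the trailing bytes */
		memcpy(&word, buf, len);
		sum += word;
		sum += (sum < word);
	}
	return sketch_fold64(sum);
}

/* Hypothetical reference: the same sum taken 16 bits at a time. */
static uint16_t sketch_csum_ref(const unsigned char *buf, size_t len)
{
	uint32_t sum = 0;
	uint16_t half;

	while (len >= 2) {
		memcpy(&half, buf, 2);
		sum += half;
		sum = (sum & 0xffffu) + (sum >> 16);
		buf += 2;
		len -= 2;
	}
	if (len) {
		half = 0;
		memcpy(&half, buf, 1);
		sum += half;
		sum = (sum & 0xffffu) + (sum >> 16);
	}
	return (uint16_t)sum;
}
```

Both functions should agree for any buffer, offset and length, because 2^16 is congruent to 1 modulo 0xffff, so wider words can be summed first and folded afterwards. The wider loads are where the speedup would come from on hardware with fast misaligned access.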