On Mon, 15 Apr 2024 09:21:36 +0800 Long Li <leo.lilong@xxxxxxxxxx> wrote: > The commit 63b1898fffcd ("XArray: Disallow sibling entries of nodes") > modified the xas_descend function in such a way that it was no longer > being compiled as an inline function, because it increased the size of > xas_descend(), and the compiler no longer optimizes it as inline. This > had a negative impact on performance, xas_descend is called frequently > to traverse downwards in the xarray tree, making it a hot function. > > Inlining xas_descend has been shown to significantly improve performance > by approximately 4.95% in the iozone write test. > > Machine: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz > #iozone i 0 -i 1 -s 64g -r 16m -f /test/tmptest > > Before this patch: > > kB reclen write rewrite read reread > 67108864 16384 2230080 3637689 6 315197 5496027 > > After this patch: > > kB reclen write rewrite read reread > 67108864 16384 2340360 3666175 6272401 5460782 > > Percentage change: > 4.95% 0.78% -0.68% -0.64% > > This patch introduces inlining to the xas_descend function. While this > change increases the size of lib/xarray.o, the performance gains in > critical workloads make this an acceptable trade-off. > > Size comparison before and after patch: > .text .data .bss file > 0x3502 0 0 lib/xarray.o.before > 0x3602 0 0 lib/xarray.o.after > > ... > > --- a/lib/xarray.c > +++ b/lib/xarray.c > @@ -200,7 +200,7 @@ static void *xas_start(struct xa_state *xas) > return entry; > } > > -static void *xas_descend(struct xa_state *xas, struct xa_node *node) > +static inline void *xas_descend(struct xa_state *xas, struct xa_node *node) > { > unsigned int offset = get_offset(xas->xa_index, node); > void *entry = xa_entry(xas->xa, node, offset); I thought gcc nowadays treats `inline' as avisory and still makes up its own mind? Perhaps we should use __always_inline here?