Jesper Dangaard Brouer <brouer@xxxxxxxxxx> writes:
> On Sun, 28 Feb 2021 23:27:25 +0100
> Lorenzo Bianconi <lorenzo.bianconi@xxxxxxxxxx> wrote:
>
>> > >          drops = bq->count - sent;
>> > > -out:
>> > > -        bq->count = 0;
>> > > +        if (unlikely(drops > 0)) {
>> > > +                /* If not all frames have been transmitted, it is our
>> > > +                 * responsibility to free them
>> > > +                 */
>> > > +                for (i = sent; i < bq->count; i++)
>> > > +                        xdp_return_frame_rx_napi(bq->q[i]);
>> > > +        }
>> >
>> > Wouldn't the logic above be the same even w/o the 'if'
>> > condition?
>>
>> It is just an optimization to avoid the for-loop instructions if
>> sent == bq->count.
>
> True, and I like this optimization.
> It will affect the code layout (and thereby I-cache usage).

I'm not sure what I-cache optimization you mean here. Compiling
the following C code:

    #define unlikely(x) __builtin_expect(!!(x), 0)
    extern void xdp_return_frame_rx_napi(int q);

    struct bq_stuff {
        int q[4];
        int count;
    };

    int test(int sent, struct bq_stuff *bq) {
        int i;
        int drops;

        drops = bq->count - sent;
        if (unlikely(drops > 0))
            for (i = sent; i < bq->count; i++)
                xdp_return_frame_rx_napi(bq->q[i]);
        return 2;
    }

using x86_64 gcc 10.2 with the -O3 flag on https://godbolt.org/ (which
shows the generated assembly for many different compilers) yields the
following assembly:

    test:
            mov     eax, DWORD PTR [rsi+16]
            mov     edx, eax
            sub     edx, edi
            test    edx, edx
            jg      .L10
    .L6:
            mov     eax, 2
            ret
    .L10:
            cmp     eax, edi
            jle     .L6
            push    rbp
            mov     rbp, rsi
            push    rbx
            movsx   rbx, edi
            sub     rsp, 8
    .L3:
            mov     edi, DWORD PTR [rbp+0+rbx*4]
            add     rbx, 1
            call    xdp_return_frame_rx_napi
            cmp     DWORD PTR [rbp+16], ebx
            jg      .L3
            add     rsp, 8
            mov     eax, 2
            pop     rbx
            pop     rbp
            ret

When dropping the 'if' completely, I get the following assembly
output:

    test:
            cmp     edi, DWORD PTR [rsi+16]
            jge     .L6
            push    rbp
            mov     rbp, rsi
            push    rbx
            movsx   rbx, edi
            sub     rsp, 8
    .L3:
            mov     edi, DWORD PTR [rbp+0+rbx*4]
            add     rbx, 1
            call    xdp_return_frame_rx_napi
            cmp     DWORD PTR [rbp+16], ebx
            jg      .L3
            add     rsp, 8
            mov     eax, 2
            pop     rbx
            pop     rbp
            ret
    .L6:
            mov     eax, 2
            ret

which exits the function earlier when 'drops <= 0' (nothing to free)
compared to the original code (the code leading into the 'for' loop
looks a little different, but this shouldn't affect the I-cache).
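A quick way to double-check that dropping the 'if' does not change behavior is a small userspace harness. This is only a sketch, not kernel code: struct bq_stuff follows the test snippet above, while record_free(), freed_log, same_frees() and count_mismatches() are hypothetical helpers, with record_free() standing in for xdp_return_frame_rx_napi(). It needs GCC or Clang for __builtin_expect.

```c
#include <assert.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

struct bq_stuff {
    int q[4];
    int count;
};

/* Hypothetical stand-in for xdp_return_frame_rx_napi(): it records the
 * frames it is handed so the two variants can be compared. */
#define MAX_LOG 16
static int freed_log[MAX_LOG];
static int freed_n;

static void record_free(int q)
{
    assert(freed_n < MAX_LOG);
    freed_log[freed_n++] = q;
}

/* Variant 1: loop guarded by unlikely(drops > 0), as in the patch. */
static void with_if(int sent, struct bq_stuff *bq)
{
    int drops = bq->count - sent;

    if (unlikely(drops > 0))
        for (int i = sent; i < bq->count; i++)
            record_free(bq->q[i]);
}

/* Variant 2: the same loop with the guard dropped. */
static void without_if(int sent, struct bq_stuff *bq)
{
    for (int i = sent; i < bq->count; i++)
        record_free(bq->q[i]);
}

/* Returns 1 if both variants free exactly the same frames, in order. */
static int same_frees(int sent, struct bq_stuff *bq)
{
    int first[MAX_LOG];
    int n_first;

    freed_n = 0;
    with_if(sent, bq);
    n_first = freed_n;
    for (int k = 0; k < n_first; k++)
        first[k] = freed_log[k];

    freed_n = 0;
    without_if(sent, bq);
    if (freed_n != n_first)
        return 0;
    for (int k = 0; k < n_first; k++)
        if (freed_log[k] != first[k])
            return 0;
    return 1;
}

/* Counts (count, sent) pairs in 0..4 x 0..4 where the variants differ.
 * sent > count is included only as an extra edge case. */
int count_mismatches(void)
{
    struct bq_stuff bq = { .q = { 10, 11, 12, 13 } };
    int mismatches = 0;

    for (bq.count = 0; bq.count <= 4; bq.count++)
        for (int sent = 0; sent <= 4; sent++)
            mismatches += !same_frees(sent, &bq);
    return mismatches;
}
```

Calling count_mismatches() from a test main() should return 0: when drops <= 0 the unguarded loop simply runs zero iterations.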
When removing the 'if' and wrapping the 'for' condition in an
'unlikely' statement:

    for (i = sent; unlikely(i < bq->count); i++)

I get the following assembly code:

    test:
            cmp     edi, DWORD PTR [rsi+16]
            jl      .L10
            mov     eax, 2
            ret
    .L10:
            push    rbx
            movsx   rbx, edi
            sub     rsp, 16
    .L3:
            mov     edi, DWORD PTR [rsi+rbx*4]
            mov     QWORD PTR [rsp+8], rsi
            add     rbx, 1
            call    xdp_return_frame_rx_napi
            mov     rsi, QWORD PTR [rsp+8]
            cmp     DWORD PTR [rsi+16], ebx
            jg      .L3
            add     rsp, 16
            mov     eax, 2
            pop     rbx
            ret

which is shorter than the other two (one line shorter than the
second and 7 lines shorter than the original code) and seems as
optimized as the second.
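To check that moving the hint into the loop condition does not change which frames get freed, here is a small userspace sketch comparing that variant with the original guarded loop. GCC documents that __builtin_expect returns the value of its first argument, so the hint should only influence block placement. fake_return_frame(), freed_sum, freed_calls, variants_agree() and count_disagreements() are hypothetical helpers, and the block needs GCC or Clang.

```c
#include <assert.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

struct bq_stuff {
    int q[4];
    int count;
};

/* Hypothetical stand-in for xdp_return_frame_rx_napi(): folds the frames
 * it receives into an order-sensitive polynomial hash and counts calls. */
static long freed_sum;
static int freed_calls;

static void fake_return_frame(int q)
{
    assert(freed_calls < 4);    /* the toy queue holds at most 4 frames */
    freed_sum = freed_sum * 31 + q;
    freed_calls++;
}

/* Variant 1: the patch's guarded loop. */
static void with_if(int sent, struct bq_stuff *bq)
{
    int drops = bq->count - sent;

    if (unlikely(drops > 0))
        for (int i = sent; i < bq->count; i++)
            fake_return_frame(bq->q[i]);
}

/* Variant 3: no 'if'; the hint moves into the loop condition. */
static void unlikely_cond(int sent, struct bq_stuff *bq)
{
    for (int i = sent; unlikely(i < bq->count); i++)
        fake_return_frame(bq->q[i]);
}

/* Returns 1 if both variants make the same calls in the same order. */
static int variants_agree(int sent, struct bq_stuff *bq)
{
    long sum1;
    int calls1;

    freed_sum = 0;
    freed_calls = 0;
    with_if(sent, bq);
    sum1 = freed_sum;
    calls1 = freed_calls;

    freed_sum = 0;
    freed_calls = 0;
    unlikely_cond(sent, bq);
    return freed_sum == sum1 && freed_calls == calls1;
}

/* Counts (count, sent) pairs in 0..4 x 0..4 where the variants differ. */
int count_disagreements(void)
{
    struct bq_stuff bq = { .q = { 10, 11, 12, 13 } };
    int bad = 0;

    for (bq.count = 0; bq.count <= 4; bq.count++)
        for (int sent = 0; sent <= 4; sent++)
            bad += !variants_agree(sent, &bq);
    return bad;
}
```

count_disagreements() should return 0. Such a harness only shows functional equivalence; whether one layout is friendlier to the I-cache than another is not something it can measure.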
I'm far from being an assembly expert, and I tested a code snippet
I wrote myself rather than the kernel's code (for the sake of
simplicity only).
Can you please elaborate on what makes the original 'if' essential?
(I took the time to do the assembly tests; please take the time on
your side to prove your point. I'm not trying to be grumpy here.)
Shay