> On Mon, 1 Mar 2021 13:23:06 +0200 > Shay Agroskin <shayagr@xxxxxxxxxx> wrote: > > > Jesper Dangaard Brouer <brouer@xxxxxxxxxx> writes: > > > > > On Sun, 28 Feb 2021 23:27:25 +0100 > > > Lorenzo Bianconi <lorenzo.bianconi@xxxxxxxxxx> wrote: > > > > > >> > > drops = bq->count - sent; > > >> > > -out: > > >> > > - bq->count = 0; > > >> > > + if (unlikely(drops > 0)) { > > >> > > + /* If not all frames have been > > >> > > transmitted, it is our > > >> > > + * responsibility to free them > > >> > > + */ > > >> > > + for (i = sent; i < bq->count; i++) > > >> > > + > > >> > > xdp_return_frame_rx_napi(bq->q[i]); > > >> > > + } > > >> > > > >> > Wouldn't the logic above be the same even w/o the 'if' > > >> > condition ? > > >> > > >> it is just an optimization to avoid the for loop instruction if > > >> sent = bq->count > > > > > > True, and I like this optimization. > > > It will affect how the code layout is (and thereby I-cache > > > usage). > > > > I'm not sure what I-cache optimization you mean here. Compiling > > the following C code: > > > > # define unlikely(x) __builtin_expect(!!(x), 0) > > > > extern void xdp_return_frame_rx_napi(int q); > > > > struct bq_stuff { > > int q[4]; > > int count; > > }; > > > > int test(int sent, struct bq_stuff *bq) { > > int i; > > int drops; > > > > drops = bq->count - sent; > > if(unlikely(drops > 0)) > > for (i = sent; i < bq->count; i++) > > xdp_return_frame_rx_napi(bq->q[i]); > > > > return 2; > > } > > > > with x86_64 gcc 10.2 with -O3 flag in https://godbolt.org/ (which > > provides the assembly code for different compilers) yields the > > following assembly: > > > > test: > > mov eax, DWORD PTR [rsi+16] > > mov edx, eax > > sub edx, edi > > test edx, edx > > jg .L10 > > .L6: > > mov eax, 2 > > ret > > This exactly shows my point. Notice how 'ret' happens earlier in this > function. This is the common case, thus the CPU don't have to load the > asm instruction below. > > > .L10: > > cmp eax, edi > > jle .L6 > > push rbp > > mov rbp, rsi > > push rbx > > movsx rbx, edi > > sub rsp, 8 > > .L3: > > mov edi, DWORD PTR [rbp+0+rbx*4] > > add rbx, 1 > > call xdp_return_frame_rx_napi > > cmp DWORD PTR [rbp+16], ebx > > jg .L3 > > add rsp, 8 > > mov eax, 2 > > pop rbx > > pop rbp > > ret > > > > > > When dropping the 'if' completely I get the following assembly > > output > > test: > > cmp edi, DWORD PTR [rsi+16] > > jge .L6 > > Jump to .L6 which is the common case. The code in between is not used > in common case, but the CPU will likely load this into I-cache, and > then jumps over the code in common case. > > > push rbp > > mov rbp, rsi > > push rbx > > movsx rbx, edi > > sub rsp, 8 > > .L3: > > mov edi, DWORD PTR [rbp+0+rbx*4] > > add rbx, 1 > > call xdp_return_frame_rx_napi > > cmp DWORD PTR [rbp+16], ebx > > jg .L3 > > add rsp, 8 > > mov eax, 2 > > pop rbx > > pop rbp > > ret > > .L6: > > mov eax, 2 > > ret > > > > which exits earlier from the function if 'drops > 0' compared to > > the original code (the 'for' loop looks a little different, but > > this shouldn't affect icache). > > > > When removing the 'if' and surrounding the 'for' condition with > > 'unlikely' statement: > > > > for (i = sent; unlikely(i < bq->count); i++) > > > > I get the following assembly code: > > > > test: > > cmp edi, DWORD PTR [rsi+16] > > jl .L10 > > mov eax, 2 > > ret > > .L10: > > push rbx > > movsx rbx, edi > > sub rsp, 16 > > .L3: > > mov edi, DWORD PTR [rsi+rbx*4] > > mov QWORD PTR [rsp+8], rsi > > add rbx, 1 > > call xdp_return_frame_rx_napi > > mov rsi, QWORD PTR [rsp+8] > > cmp DWORD PTR [rsi+16], ebx > > jg .L3 > > add rsp, 16 > > mov eax, 2 > > pop rbx > > ret > > > > which is shorter than the other two (one line compared to the > > second and 7 lines compared the original code) and seems as > > optimized as the second. > > You are also using unlikely() and get the earlier return, with less > instructions, which is great. Perhaps we can use this type of > unlikely() in the for-statement? WDYT Lorenzo? sure, we can do it..I will address it in v3. Thanks. Regards, Lorenzo > > > > I'm far from being an assembly expert, and I tested a code snippet > > I wrote myself rather than the kernel's code (for the sake of > > simplicity only). > > Can you please elaborate on what makes the original 'if' essential > > (I took the time to do the assembly tests, please take the time on > > your side to prove your point, I'm not trying to be grumpy here). > > > > Shay > > -- > Best regards, > Jesper Dangaard Brouer > MSc.CS, Principal Kernel Engineer at Red Hat > LinkedIn: http://www.linkedin.com/in/brouer >
Attachment:
signature.asc
Description: PGP signature