W dniu 18 kwietnia 2011 16:48 uÅytkownik Helmut Schaa <helmut.schaa@xxxxxxxxxxxxxx> napisaÅ: > Hi, > > Am Montag, 18. April 2011 schrieb Ivo Van Doorn: >> > Wouldn't this be better to create two pointers in struct rt2x00_dev. >> > One for writing function and one for reading function? Am I right >> > thinking calling functions by pointers is quite fast? Or is this still >> > noticeably slower than using proper functions directly? >> >> We already have the pointer inside struct rt2x00_dev which references >> the register access functions for rt2800pci/usb. These pointers are used >> by rt2800lib to access the common registers. What this patch does, is >> optimize the case where we exactly know which function we need, because >> we are in the actual driver. >> >> As for the performance, I'll let Helmut comment on that as he created patch 20, >> which introduced this change to rt2800pci. :) > > Sure, I was comparing some assembly in the rt2800pci hotpaths (on a 380Mhz > MIPS CPU btw). A register read/write on PCI is just a readl or writel, > nothing more but using the indirect wrappers we get something like this > (This is x86_64 as I didn't want to cross compile right now). For example > the register read + write in rt2800pci_enable_interrupt (which is called > in every tasklet invocation, which can happen for every rx'ed frame and > every tx'ed frame). > > movq  Â8(%rbx), %rax  # rt2x00dev_1(D)->ops, rt2x00dev_1(D)->ops > leaq  Â-36(%rbp), %rdx #, tmp82 > movq  Â%rbx, %rdi   Â# rt2x00dev, > movq  Â72(%rax), %rax Â# D.47612_27->drv, D.47612_27->drv > movl  Â$516, %esi   Â#, > call  Â*(%rax) # rt2800ops_29->register_read > movb  Â%r14b, %cl   Â#, > movq  Â8(%rbx), %rax  # rt2x00dev_1(D)->ops, rt2x00dev_1(D)->ops > movq  Â%rbx, %rdi   Â# rt2x00dev, > movq  Â72(%rax), %rax Â# D.47619_31->drv, D.47619_31->drv > movl  Â$516, %esi   Â#, > movl  Â$1, %edx    Â#, reg.119 > sall  Â%cl, %edx    #, reg.119 > andl  Â%r13d, %edx   # irq_field$bit_mask, reg.119 > notl  Â%r13d  # tmp89 > andl  Â-36(%rbp), %r13d    Â# reg, tmp89 > orl   %r13d, %edx   # tmp89, reg.119 > movl  Â%edx, -36(%rbp) # reg.119, reg > call  Â*16(%rax)    # rt2800ops_33->register_write > > Also, this will trigger rt2x00pci_register_read > > pushq  %rbp  Â# > mov   %esi, %esi   Â# offset, addr.27 > movq  Â%rsp, %rbp   Â#, > addq  Â1056(%rdi), %rsi    Â# rt2x00dev_1(D)->csr.base, addr.27 > movl  Â%eax, (%rdx)  Â# ret,* value > > And rt2x00pci_register_write: > > pushq  %rbp  Â# > mov   %esi, %esi   Â# offset, addr.26 > movq  Â%rsp, %rbp   Â#, > addq  Â1056(%rdi), %rsi    Â# rt2x00dev_1(D)->csr.base, addr.26 > movl  Â%edx,(%rsi)    Â# value,* addr.26 > > And here the same when using rt2x00pci_register_read/write directly: > > movq  Â1056(%rbx), %rax    Â# rt2x00dev_1(D)->csr.base, rt2x00dev_1(D)->csr.base > movl  Â516(%rax),%eax   #, reg.119 > movl  Â%r13d, %edx   # irq_field$bit_mask, tmp80 > movb  Â%r14b, %cl   Â#, > notl  Â%edx  Â# tmp80 > andl  Â%edx, %eax   Â# tmp80, reg.119 > movl  Â$1, %edx    Â#, tmp85 > sall  Â%cl, %edx    #, tmp85 > andl  Â%r13d, %edx   # irq_field$bit_mask, tmp85 > orl   %edx, %eax   Â# tmp85, reg.119 > movq  Â1056(%rbx), %rdx    Â# rt2x00dev_1(D)->csr.base, rt2x00dev_1(D)->csr.base > movl  Â%eax,516(%rdx)   # reg.119, > > As you can see we save more then just one indirect function call: > > 17 movs -> 7 movs > 2 calls -> 0 calls > 1 add -> 0 adds > > This happens because the compiler is able to apply a number of optimizations > that are only possible by inlining rt2x00pci_register_read/write. When using > the indirect function call the compiler is not able to inline them. > > So, I first thought about using direct calls only in the interrupt handler > and the RX/TX hotpaths but since using rt2800_register_read and > rt2x00pci_register_read in different locations in rt2800pci would be even > more confusing I just replaced every rt2800_register_read with > rt2x00pci_register_read in rt2800pci. > > One way to keep the abstraction and still improve the register_read/write > operations would be to introduce a inlined rt2800pci_register_read/write > which directly calls rt2x00pci_register_read/write and provide that via > rt2800_ops to rt2800lib. That way all calls in rt2800pci can directly > inline rt2x00_register_read/write while rt2800lib will still use indirect > calls to do the same. > > However, I didn't see any need for this. Huh, thanks a lot for great explanation :) -- RafaÅ -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html