Search Linux Wireless

Re: [PATCH 21/23] rt2x00: Optimize register access in rt2800usb

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



W dniu 18 kwietnia 2011 16:48 uÅytkownik Helmut Schaa
<helmut.schaa@xxxxxxxxxxxxxx> napisaÅ:
> Hi,
>
> Am Montag, 18. April 2011 schrieb Ivo Van Doorn:
>> > Wouldn't this be better to create two pointers in struct rt2x00_dev.
>> > One for writing function and one for reading function? Am I right
>> > thinking calling functions by pointers is quite fast? Or is this still
>> > noticeably slower than using proper functions directly?
>>
>> We already have the pointer inside struct rt2x00_dev which references
>> the register access functions for rt2800pci/usb. These pointers are used
>> by rt2800lib to access the common registers. What this patch does, is
>> optimize the case where we exactly know which function we need, because
>> we are in the actual driver.
>>
>> As for the performance, I'll let Helmut comment on that as he created patch 20,
>> which introduced this change to rt2800pci. :)
>
> Sure, I was comparing some assembly in the rt2800pci hotpaths (on a 380Mhz
> MIPS CPU btw). A register read/write on PCI is just a readl or writel,
> nothing more but using the indirect wrappers we get something like this
> (This is x86_64 as I didn't want to cross compile right now). For example
> the register read + write in rt2800pci_enable_interrupt (which is called
> in every tasklet invocation, which can happen for every rx'ed frame and
> every tx'ed frame).
>
> movq  Â8(%rbx), %rax  # rt2x00dev_1(D)->ops, rt2x00dev_1(D)->ops
> leaq  Â-36(%rbp), %rdx #, tmp82
> movq  Â%rbx, %rdi   Â# rt2x00dev,
> movq  Â72(%rax), %rax Â# D.47612_27->drv, D.47612_27->drv
> movl  Â$516, %esi   Â#,
> call  Â*(%rax) # rt2800ops_29->register_read
> movb  Â%r14b, %cl   Â#,
> movq  Â8(%rbx), %rax  # rt2x00dev_1(D)->ops, rt2x00dev_1(D)->ops
> movq  Â%rbx, %rdi   Â# rt2x00dev,
> movq  Â72(%rax), %rax Â# D.47619_31->drv, D.47619_31->drv
> movl  Â$516, %esi   Â#,
> movl  Â$1, %edx    Â#, reg.119
> sall  Â%cl, %edx    #, reg.119
> andl  Â%r13d, %edx   # irq_field$bit_mask, reg.119
> notl  Â%r13d  # tmp89
> andl  Â-36(%rbp), %r13d    Â# reg, tmp89
> orl   %r13d, %edx   # tmp89, reg.119
> movl  Â%edx, -36(%rbp) # reg.119, reg
> call  Â*16(%rax)    # rt2800ops_33->register_write
>
> Also, this will trigger rt2x00pci_register_read
>
> pushq  %rbp  Â#
> mov   %esi, %esi   Â# offset, addr.27
> movq  Â%rsp, %rbp   Â#,
> addq  Â1056(%rdi), %rsi    Â# rt2x00dev_1(D)->csr.base, addr.27
> movl  Â%eax, (%rdx)  Â# ret,* value
>
> And rt2x00pci_register_write:
>
> pushq  %rbp  Â#
> mov   %esi, %esi   Â# offset, addr.26
> movq  Â%rsp, %rbp   Â#,
> addq  Â1056(%rdi), %rsi    Â# rt2x00dev_1(D)->csr.base, addr.26
> movl  Â%edx,(%rsi)    Â# value,* addr.26
>
> And here the same when using rt2x00pci_register_read/write directly:
>
> movq  Â1056(%rbx), %rax    Â# rt2x00dev_1(D)->csr.base, rt2x00dev_1(D)->csr.base
> movl  Â516(%rax),%eax   #, reg.119
> movl  Â%r13d, %edx   # irq_field$bit_mask, tmp80
> movb  Â%r14b, %cl   Â#,
> notl  Â%edx  Â# tmp80
> andl  Â%edx, %eax   Â# tmp80, reg.119
> movl  Â$1, %edx    Â#, tmp85
> sall  Â%cl, %edx    #, tmp85
> andl  Â%r13d, %edx   # irq_field$bit_mask, tmp85
> orl   %edx, %eax   Â# tmp85, reg.119
> movq  Â1056(%rbx), %rdx    Â# rt2x00dev_1(D)->csr.base, rt2x00dev_1(D)->csr.base
> movl  Â%eax,516(%rdx)   # reg.119,
>
> As you can see we save more then just one indirect function call:
>
> 17 movs -> 7 movs
> 2 calls -> 0 calls
> 1 add -> 0 adds
>
> This happens because the compiler is able to apply a number of optimizations
> that are only possible by inlining rt2x00pci_register_read/write. When using
> the indirect function call the compiler is not able to inline them.
>
> So, I first thought about using direct calls only in the interrupt handler
> and the RX/TX hotpaths but since using rt2800_register_read and
> rt2x00pci_register_read in different locations in rt2800pci would be even
> more confusing I just replaced every rt2800_register_read with
> rt2x00pci_register_read in rt2800pci.
>
> One way to keep the abstraction and still improve the register_read/write
> operations would be to introduce a inlined rt2800pci_register_read/write
> which directly calls rt2x00pci_register_read/write and provide that via
> rt2800_ops to rt2800lib. That way all calls in rt2800pci can directly
> inline rt2x00_register_read/write while rt2800lib will still use indirect
> calls to do the same.
>
> However, I didn't see any need for this.

Huh, thanks a lot for great explanation :)

-- 
RafaÅ
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Host AP]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Linux Kernel]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Device Mapper]
  Powered by Linux