Re: [PATCH 21/23] rt2x00: Optimize register access in rt2800usb

Hi,

On Monday, 18 April 2011, Ivo Van Doorn wrote:
> > Wouldn't this be better to create two pointers in struct rt2x00_dev.
> > One for writing function and one for reading function? Am I right
> > thinking calling functions by pointers is quite fast? Or is this still
> > noticeably slower than using proper functions directly?
> 
> We already have the pointer inside struct rt2x00_dev which references
> the register access functions for rt2800pci/usb. These pointers are used
> by rt2800lib to access the common registers. What this patch does, is
> optimize the case where we exactly know which function we need, because
> we are in the actual driver.
> 
> As for the performance, I'll let Helmut comment on that as he created patch 20,
> which introduced this change to rt2800pci. :)

Sure. I was comparing some assembly in the rt2800pci hot paths (on a 380 MHz
MIPS CPU, by the way). A register read or write on PCI is just a readl or
writel, nothing more, but with the indirect wrappers we get something like
the listing below (this is x86_64, as I didn't want to cross-compile right
now). It shows the register read + write in rt2800pci_enable_interrupt, which
is called in every tasklet invocation, and that can happen for every rx'ed
and every tx'ed frame:

movq    8(%rbx), %rax   # rt2x00dev_1(D)->ops, rt2x00dev_1(D)->ops
leaq    -36(%rbp), %rdx #, tmp82
movq    %rbx, %rdi      # rt2x00dev,
movq    72(%rax), %rax  # D.47612_27->drv, D.47612_27->drv
movl    $516, %esi      #,
call    *(%rax) # rt2800ops_29->register_read
movb    %r14b, %cl      #,
movq    8(%rbx), %rax   # rt2x00dev_1(D)->ops, rt2x00dev_1(D)->ops
movq    %rbx, %rdi      # rt2x00dev,
movq    72(%rax), %rax  # D.47619_31->drv, D.47619_31->drv
movl    $516, %esi      #,
movl    $1, %edx        #, reg.119
sall    %cl, %edx       #, reg.119
andl    %r13d, %edx     # irq_field$bit_mask, reg.119
notl    %r13d   # tmp89
andl    -36(%rbp), %r13d        # reg, tmp89
orl     %r13d, %edx     # tmp89, reg.119
movl    %edx, -36(%rbp) # reg.119, reg
call    *16(%rax)       # rt2800ops_33->register_write

On top of that, the first call ends up in rt2x00pci_register_read:

pushq   %rbp    #
mov     %esi, %esi      # offset, addr.27
movq    %rsp, %rbp      #,
addq    1056(%rdi), %rsi        # rt2x00dev_1(D)->csr.base, addr.27
movl    %eax, (%rdx)    # ret,* value

And rt2x00pci_register_write:

pushq   %rbp    #
mov     %esi, %esi      # offset, addr.26
movq    %rsp, %rbp      #,
addq    1056(%rdi), %rsi        # rt2x00dev_1(D)->csr.base, addr.26
movl 	%edx,(%rsi)        # value,* addr.26

And here the same when using rt2x00pci_register_read/write directly:

movq    1056(%rbx), %rax        # rt2x00dev_1(D)->csr.base, rt2x00dev_1(D)->csr.base
movl 	516(%rax),%eax     #, reg.119
movl    %r13d, %edx     # irq_field$bit_mask, tmp80
movb    %r14b, %cl      #,
notl    %edx    # tmp80
andl    %edx, %eax      # tmp80, reg.119
movl    $1, %edx        #, tmp85
sall    %cl, %edx       #, tmp85
andl    %r13d, %edx     # irq_field$bit_mask, tmp85
orl     %edx, %eax      # tmp85, reg.119
movq    1056(%rbx), %rdx        # rt2x00dev_1(D)->csr.base, rt2x00dev_1(D)->csr.base
movl 	%eax,516(%rdx)     # reg.119,

As you can see, we save more than just one indirect function call:

17 movs -> 7 movs
2 calls -> 0 calls
1 add -> 0 adds

This happens because the compiler is able to apply a number of optimizations
that are only possible by inlining rt2x00pci_register_read/write. When using
the indirect function call the compiler is not able to inline them.
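
To make the difference concrete, here is a minimal user-space sketch of the
two call styles. It is not the rt2x00 code: struct fake_dev, struct fake_ops,
the fake_pci_* helpers and IRQ_MASK_OFFSET are hypothetical names, and a plain
byte array stands in for the MMIO window that readl/writel would touch. It
only illustrates why a call through a function pointer is generally opaque to
the compiler while a call by name to a static inline function is not.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the kernel's rt2x00 structures. */
struct fake_dev;

struct fake_ops {
	void (*register_read)(struct fake_dev *dev, unsigned int offset,
			      uint32_t *value);
	void (*register_write)(struct fake_dev *dev, unsigned int offset,
			       uint32_t value);
};

struct fake_dev {
	const struct fake_ops *ops;
	uint8_t csr[1024];	/* plain memory in place of the MMIO window */
};

/* 516 is the offset visible in the listings; the macro name is made up. */
#define IRQ_MASK_OFFSET 516

static inline void fake_pci_register_read(struct fake_dev *dev,
					  unsigned int offset, uint32_t *value)
{
	assert(offset + sizeof(*value) <= sizeof(dev->csr));
	memcpy(value, dev->csr + offset, sizeof(*value));
}

static inline void fake_pci_register_write(struct fake_dev *dev,
					   unsigned int offset, uint32_t value)
{
	assert(offset + sizeof(value) <= sizeof(dev->csr));
	memcpy(dev->csr + offset, &value, sizeof(value));
}

static const struct fake_ops fake_pci_ops = {
	.register_read = fake_pci_register_read,
	.register_write = fake_pci_register_write,
};

/* Indirect: load ops, load the pointer, then call through it. */
static void set_irq_bit_indirect(struct fake_dev *dev, unsigned int bit)
{
	uint32_t reg;

	dev->ops->register_read(dev, IRQ_MASK_OFFSET, &reg);
	reg |= UINT32_C(1) << bit;
	dev->ops->register_write(dev, IRQ_MASK_OFFSET, reg);
}

/* Direct: the callee is known by name, so the compiler may inline it. */
static void set_irq_bit_direct(struct fake_dev *dev, unsigned int bit)
{
	uint32_t reg;

	fake_pci_register_read(dev, IRQ_MASK_OFFSET, &reg);
	reg |= UINT32_C(1) << bit;
	fake_pci_register_write(dev, IRQ_MASK_OFFSET, reg);
}
```

Both functions produce the same register contents; only the generated code
differs. Comparing the output of gcc -O2 -S for the two should show the same
kind of difference as in the listings, although the exact instruction counts
depend on compiler and target.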

So, I first thought about using direct calls only in the interrupt handler
and the RX/TX hot paths. But since mixing rt2800_register_read and
rt2x00pci_register_read in different locations in rt2800pci would be even
more confusing, I just replaced every rt2800_register_read with
rt2x00pci_register_read in rt2800pci.

One way to keep the abstraction and still improve the register_read/write
operations would be to introduce an inlined rt2800pci_register_read/write
which directly calls rt2x00pci_register_read/write, and to provide that via
rt2800_ops to rt2800lib. That way all calls in rt2800pci can directly
inline rt2x00pci_register_read/write while rt2800lib will still use indirect
calls to do the same.

However, I didn't see any need for this.
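
For reference, the wrapper alternative described above could look roughly
like the following user-space sketch. All names here (struct sketch_dev,
struct sketch_ops, bus_register_*, drv_register_*, lib_read,
drv_hot_path_write) are hypothetical, and a byte array again replaces the
real MMIO access. The bus_* helpers play the role of
rt2x00pci_register_read/write, the drv_* wrappers the role of the proposed
rt2800pci_register_read/write, and lib_read the role of code in rt2800lib.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the kernel's rt2x00 structures. */
struct sketch_dev;

struct sketch_ops {
	void (*register_read)(struct sketch_dev *dev, unsigned int offset,
			      uint32_t *value);
	void (*register_write)(struct sketch_dev *dev, unsigned int offset,
			       uint32_t value);
};

struct sketch_dev {
	const struct sketch_ops *ops;
	uint8_t csr[1024];	/* plain memory in place of the MMIO window */
};

/* Bus layer: stands in for rt2x00pci_register_read/write. */
static inline void bus_register_read(struct sketch_dev *dev,
				     unsigned int offset, uint32_t *value)
{
	assert(offset + sizeof(*value) <= sizeof(dev->csr));
	memcpy(value, dev->csr + offset, sizeof(*value));
}

static inline void bus_register_write(struct sketch_dev *dev,
				      unsigned int offset, uint32_t value)
{
	assert(offset + sizeof(value) <= sizeof(dev->csr));
	memcpy(dev->csr + offset, &value, sizeof(value));
}

/* Driver layer: thin inline wrappers around the bus accessors. */
static inline void drv_register_read(struct sketch_dev *dev,
				     unsigned int offset, uint32_t *value)
{
	bus_register_read(dev, offset, value);
}

static inline void drv_register_write(struct sketch_dev *dev,
				      unsigned int offset, uint32_t value)
{
	bus_register_write(dev, offset, value);
}

/* The same wrappers are exported through the ops table for shared code. */
static const struct sketch_ops drv_ops = {
	.register_read = drv_register_read,
	.register_write = drv_register_write,
};

/* Shared library code: knows only the ops table, so it calls indirectly. */
static uint32_t lib_read(struct sketch_dev *dev, unsigned int offset)
{
	uint32_t value;

	dev->ops->register_read(dev, offset, &value);
	return value;
}

/* Driver hot path: calls the wrapper by name, so it can be inlined. */
static void drv_hot_path_write(struct sketch_dev *dev, unsigned int offset,
			       uint32_t value)
{
	drv_register_write(dev, offset, value);
}
```

The point of the extra layer is that the driver keeps a single accessor name
for all of its own code, while the shared library still reaches the same
functions through the ops table.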

Helmut