Re: [PATCH] xHCI: Remove duplicate functions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, 30 Oct 2011, Sarah Sharp wrote:

> On Fri, Oct 28, 2011 at 11:11:10AM -0400, Alan Stern wrote:
> > On Fri, 28 Oct 2011, Sarah Sharp wrote:
> > 
> > > On Fri, Oct 28, 2011 at 05:05:58PM +0800, Andiry Xu wrote:
> > > > There're two static xhci_urb_to_transfer_ring() implementations in xhci.c
> > > > and xhci-ring.c, and they do exactly the same things.
> > > > 
> > > > Remove one of them to get rid of duplicate codes.
> > > 
> > > NAK.  There are the same functions in two separate files for performance
> > > reasons.  This function is used very often, and allowing it to be static
> > > lets the compiler optimize it.
> > 
> > Optimize it how?  The only optimization for static functions that I'm 
> > aware of is to put them inline.  But this function is sufficiently long 
> > and complicated that putting in inline would yield worse object code, 
> > not better.
> > 
> > Have you measured the performance difference?
> 
> From the original commit that created the copies:
> 
> commit 021bff9179c2d19c26599dc3e9134d04cf1c8a3a
> Author: Sarah Sharp <sarah.a.sharp@xxxxxxxxxxxxxxx>
> Date:   Thu Jul 29 22:12:20 2010 -0700
> 
>     USB: xhci: Performance - move functions that find ep ring.
>     
>     I've been using perf to measure the top symbols while transferring 1GB of data
>     on a USB 3.0 drive with dd.  This is using the raw disk with /dev/sdb, with a
>     block size of 1K.
>     
>     During performance testing, the top symbol was xhci_triad_to_transfer_ring(), a
>     function that should return immediately if streams are not enabled for an
>     endpoint.  It turned out that the functions to find the endpoint ring was
>     defined in xhci-mem.c and used in xhci-ring.c and xhci-hcd.c.  I moved a copy of
>     xhci_triad_to_transfer_ring() and xhci_urb_to_transfer_ring() into xhci-ring.c
>     and declared them static.  I also made a static version of
>     xhci_urb_to_transfer_ring() in xhci.c.
>     
>     This improved throughput on a 1GB read of the raw disk with dd from
>     186MB/s to 195MB/s, and perf reported sampling the xhci_triad_to_transfer_ring()
>     0.06% of the time, rather than 9.26% of the time.
>     
>     Signed-off-by: Sarah Sharp <sarah.a.sharp@xxxxxxxxxxxxxxx>
>     Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxx>
> 
> 
> It's possible that the compiler is inlining the function automatically
> or doing some other sort of optimization that wasn't possible when the
> function was defined in a different file.

I see.  Well, it's hard to argue with success.  On the other hand, it
sounds like you don't really know exactly what the compiler is doing.  
It might be optimizing the function somehow, or there might be a
different reason for the performance benefit.

> The instance of xhci_urb_to_transfer_ring() in xhci.c is only used when
> an URB is canceled, which is very rare.  The instance in xhci-ring.c
> would be called again and again as URBs complete.  Wouldn't the function
> in xhci-ring.c be cached properly in the common case?

Maybe keeping a single copy of the function and putting it in
xhci-ring.c would provide the same performance benefit.  Or doing the
test for streams enabled before calling the function instead of within
the function.  But it seems silly to go back and revisit all that at
this point.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux