On Sun, 30 Oct 2011, Sarah Sharp wrote: > On Fri, Oct 28, 2011 at 11:11:10AM -0400, Alan Stern wrote: > > On Fri, 28 Oct 2011, Sarah Sharp wrote: > > > > > On Fri, Oct 28, 2011 at 05:05:58PM +0800, Andiry Xu wrote: > > > > There're two static xhci_urb_to_transfer_ring() implementations in xhci.c > > > > and xhci-ring.c, and they do exactly the same things. > > > > > > > > Remove one of them to get rid of duplicate codes. > > > > > > NAK. There are the same functions in two separate files for performance > > > reasons. This function is used very often, and allowing it to be static > > > lets the compiler optimize it. > > > > Optimize it how? The only optimization for static functions that I'm > > aware of is to put them inline. But this function is sufficiently long > > and complicated that putting in inline would yield worse object code, > > not better. > > > > Have you measured the performance difference? > > From the original commit that created the copies: > > commit 021bff9179c2d19c26599dc3e9134d04cf1c8a3a > Author: Sarah Sharp <sarah.a.sharp@xxxxxxxxxxxxxxx> > Date: Thu Jul 29 22:12:20 2010 -0700 > > USB: xhci: Performance - move functions that find ep ring. > > I've been using perf to measure the top symbols while transferring 1GB of data > on a USB 3.0 drive with dd. This is using the raw disk with /dev/sdb, with a > block size of 1K. > > During performance testing, the top symbol was xhci_triad_to_transfer_ring(), a > function that should return immediately if streams are not enabled for an > endpoint. It turned out that the functions to find the endpoint ring was > defined in xhci-mem.c and used in xhci-ring.c and xhci-hcd.c. I moved a copy of > xhci_triad_to_transfer_ring() and xhci_urb_to_transfer_ring() into xhci-ring.c > and declared them static. I also made a static version of > xhci_urb_to_transfer_ring() in xhci.c. > > This improved throughput on a 1GB read of the raw disk with dd from > 186MB/s to 195MB/s, and perf reported sampling the xhci_triad_to_transfer_ring() > 0.06% of the time, rather than 9.26% of the time. > > Signed-off-by: Sarah Sharp <sarah.a.sharp@xxxxxxxxxxxxxxx> > Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxx> > > > It's possible that the compiler is inlining the function automatically > or doing some other sort of optimization that wasn't possible when the > function was defined in a different file. I see. Well, it's hard to argue with success. On the other hand, it sounds like you don't really know exactly what the compiler is doing. It might be optimizing the function somehow, or there might be a different reason for the performance benefit. > The instance of xhci_urb_to_transfer_ring() in xhci.c is only used when > an URB is canceled, which is very rare. The instance in xhci-ring.c > would be called again and again as URBs complete. Wouldn't the function > in xhci-ring.c be cached properly in the common case? Maybe keeping a single copy of the function and putting it in xhci-ring.c would provide the same performance benefit. Or doing the test for streams enabled before calling the function instead of within the function. But it seems silly to go back and revisit all that at this point. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html