On Mon, Aug 19, 2013 at 10:00 PM, Greg Kroah-Hartman
<gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, Aug 19, 2013 at 07:04:18PM +0800, Ming Lei wrote:
>> Because usb_hcd_submit_urb is in the hotest path of usb core,
>> so use percpu counter to count URB instead of using atomic variable
>> because atomic operations are much slower than percpu operations.
>>
>> Cc: Oliver Neukum <oliver@xxxxxxxxxx>
>> Cc: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>
>> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxxxxx>
>> ---
>>  drivers/usb/core/hcd.c   |    4 ++--
>>  drivers/usb/core/sysfs.c |    7 ++++++-
>>  drivers/usb/core/usb.c   |    9 ++++++++-
>>  drivers/usb/core/usb.h   |    1 +
>>  include/linux/usb.h      |    2 +-
>>  5 files changed, 18 insertions(+), 5 deletions(-)
>
> And this really speeds things up?  Exactly what does it?
>
> And it's not that atomic operations are "slower", it's just that the

On SMP, atomic_inc/atomic_dec are much slower than a percpu variable
inc/dec; see section 4.1 ("Why Isn’t Concurrent Counting Trivial?")
of [1]:

    However, it is slower: on an Intel Core Duo laptop, it is about
    six times slower than non-atomic increment when a single thread
    is incrementing, and more than ten times slower if two threads
    are incrementing.

Most desktops and laptops are SMP now, and with USB 3.0 the number of
URBs submitted per second may reach tens of thousands or more. Since we
can remove the atomic inc/dec operations from the hot path, why not
do it?

> barriers involved can be slower, depending on what else is happening.
> If you look, you are already hitting atomic variables in the same path,
> so how can this change speed anything up?

No, no barriers are involved in atomic_inc/atomic_dec at all.

[1] Is Parallel Programming Hard, And, If So, What Can You Do About It?
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git

Thanks,
--
Ming Lei
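
For illustration only, here is a minimal user-space sketch of the counting scheme argued for above: one counter per slot instead of a single shared atomic. It is not the code from the patch. The names sharded_counter, NR_SLOTS and CACHE_LINE are made up for this example, the slot index stands in for the current CPU number, and the 64-byte cache line is an assumption. In the kernel the same idea would normally be expressed with alloc_percpu(), this_cpu_inc()/this_cpu_dec() and a for_each_possible_cpu() loop over per_cpu_ptr() on the read side.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SLOTS   4	/* hypothetical: stands in for the number of CPUs */
#define CACHE_LINE 64	/* assumed cache line size in bytes */

/* One counter per slot, padded so two slots never share a cache line. */
struct slot_counter {
	long count;
	char pad[CACHE_LINE - sizeof(long)];
};

struct sharded_counter {
	struct slot_counter slot[NR_SLOTS];
};

static inline void sharded_init(struct sharded_counter *c)
{
	size_t i;

	for (i = 0; i < NR_SLOTS; i++)
		c->slot[i].count = 0;
}

/*
 * Hot path: a plain increment of the caller's own slot. No locked
 * read-modify-write and no cache line bouncing between slots.
 */
static inline void sharded_inc(struct sharded_counter *c, unsigned int slot)
{
	assert(slot < NR_SLOTS);
	c->slot[slot].count++;
}

static inline void sharded_dec(struct sharded_counter *c, unsigned int slot)
{
	assert(slot < NR_SLOTS);
	c->slot[slot].count--;
}

/*
 * Slow path: walk every slot and add them up. A single slot may be
 * negative (incremented on one slot, decremented on another); only
 * the sum is meaningful.
 */
static inline long sharded_sum(const struct sharded_counter *c)
{
	long sum = 0;
	size_t i;

	for (i = 0; i < NR_SLOTS; i++)
		sum += c->slot[i].count;
	return sum;
}
```

The trade-off is that reads become a loop over all slots and the result is only approximate while other slots are being updated concurrently. That is usually acceptable for a statistic that is updated on every submission but read rarely.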