On Mon, Aug 19, 2013 at 11:14 PM, Greg Kroah-Hartman
<gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, Aug 19, 2013 at 11:06:31PM +0800, Ming Lei wrote:
>> On Mon, Aug 19, 2013 at 10:00 PM, Greg Kroah-Hartman
>> <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>> > On Mon, Aug 19, 2013 at 07:04:18PM +0800, Ming Lei wrote:
>> >> Because usb_hcd_submit_urb is in the hottest path of usb core,
>> >> so use percpu counter to count URB instead of using atomic variable
>> >> because atomic operations are much slower than percpu operations.
>> >>
>> >> Cc: Oliver Neukum <oliver@xxxxxxxxxx>
>> >> Cc: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>
>> >> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxxxxx>
>> >> ---
>> >>  drivers/usb/core/hcd.c   |    4 ++--
>> >>  drivers/usb/core/sysfs.c |    7 ++++++-
>> >>  drivers/usb/core/usb.c   |    9 ++++++++-
>> >>  drivers/usb/core/usb.h   |    1 +
>> >>  include/linux/usb.h      |    2 +-
>> >>  5 files changed, 18 insertions(+), 5 deletions(-)
>> >
>> > And this really speeds things up?  Exactly what does it?
>> >
>> > And it's not that atomic operations are "slower", it's just that the
>>
>> For SMP, atomic_inc/atomic_dec are much slower than percpu
>> variable inc/dec, see 4.1 (Why Isn’t Concurrent Counting Trivial?)
>> of [1].
>>
>>      However, it is slower: on a Intel Core Duo laptop, it is about six
>>      times slower than non-atomic increment when a single thread
>>      is incrementing, and more than ten times slower if two threads
>>      are incrementing.
>>
>> Considering that most desktops and laptops are SMP now, and that with
>> USB3.0 the submitted URBs per second may reach tens of thousands
>> or more, and that we can remove the atomic inc/dec operations in the hot
>> path, why not do it?
>
> Because you really didn't do it, there are lots of other atomic
> operations on that same path.

Not lots in the path of usbcore.

>
> And, tens of thousands of urbs should be trivial, did you measure this
> to see if it changed anything?  I'm not taking patches like this that
> are not quantifiable, sorry.

The difference may be too small to measure, but I will try to test
with perf.

>
> The gating problem in USB right now is the hardware, it's the slowest
> thing, not the kernel, from everything I have ever tested, or seen.

The patch may not speed up USB performance, but it might reduce CPU
utilization a bit, or cache misses.

>
> Well, bad host controller silicon is also a problem (i.e. raspberry pi),
> but there's not much we can do about braindead problems like that...
>
>> > barriers involved can be slower, depending on what else is happening.
>> > If you look, you are already hitting atomic variables in the same path,
>> > so how can this change speed anything up?
>>
>> No, no barriers are involved in atomic_inc/atomic_dec at all.
>
> None?  Hm, you might want to rethink that statement :)

Please see Documentation/memory-barriers.txt:

     The following also do _not_ imply memory barriers, and so may
     require explicit memory barriers under some circumstances
     (smp_mb__before_atomic_dec() for instance):

          atomic_add();
          atomic_sub();
          atomic_inc();
          atomic_dec();

Thanks,
--
Ming Lei
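
For anyone following the thread, here is a minimal userspace sketch of the
two counting strategies being argued about. It is not the patch and does
not use the kernel's atomic_t or percpu API; it assumes C11 atomics and
pthreads as stand-ins. All names (NTHREADS, NEVENTS, struct slot, worker,
sum_slots, run_workers) are hypothetical, and the 64-byte cache line is an
assumption about the CPU. The shared counter uses memory_order_relaxed,
which is only a rough analogue of an atomic_inc() that implies no barrier.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NTHREADS 4        /* hypothetical number of submitting threads */
#define NEVENTS  100000L  /* hypothetical "URB submissions" per thread */

/* Strategy 1: one shared counter. Every increment is an atomic
 * read-modify-write on a cache line that all threads contend for. */
static atomic_long shared_count;

/* Strategy 2: one slot per thread, padded to its own cache line
 * (64 bytes assumed) so increments never touch shared data. */
struct slot {
    _Alignas(64) long count;
};
static struct slot slots[NTHREADS];

static void *worker(void *arg)
{
    struct slot *mine = arg;

    for (long i = 0; i < NEVENTS; i++) {
        /* Relaxed ordering: atomic, but no ordering guarantee implied. */
        atomic_fetch_add_explicit(&shared_count, 1, memory_order_relaxed);
        /* Plain increment: only this thread ever writes this slot. */
        mine->count++;
    }
    return NULL;
}

/* Reading the split counter means summing every slot, so the read side
 * is slower. Only call this after the workers have been joined. */
long sum_slots(void)
{
    long total = 0;

    for (int i = 0; i < NTHREADS; i++)
        total += slots[i].count;
    return total;
}

/* Runs all workers to completion. Returns 0 on success, -1 if a thread
 * could not be created. */
int run_workers(void)
{
    pthread_t tid[NTHREADS];
    int started = 0, rc = 0;

    atomic_store(&shared_count, 0);
    for (int i = 0; i < NTHREADS; i++)
        slots[i].count = 0;

    for (; started < NTHREADS; started++) {
        if (pthread_create(&tid[started], NULL, worker,
                           &slots[started]) != 0) {
            rc = -1;
            break;
        }
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    /* Sanity check: both strategies must agree on the final count. */
    if (rc == 0)
        assert(atomic_load(&shared_count) == sum_slots());
    return rc;
}
```

Both counters end at the same value. The sketch only shows the structure:
the per-thread variant moves the cost from the increment to the read.
Whether that is measurable on the URB submission path is exactly what
would need to be checked with perf.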