On Mon, Aug 19, 2013 at 10:14:57AM -0700, Greg Kroah-Hartman wrote: > On Mon, Aug 19, 2013 at 11:40:50PM +0800, Ming Lei wrote: > > On Mon, Aug 19, 2013 at 11:14 PM, Greg Kroah-Hartman > > <gregkh@xxxxxxxxxxxxxxxxxxx> wrote: > > > On Mon, Aug 19, 2013 at 11:06:31PM +0800, Ming Lei wrote: > > >> On Mon, Aug 19, 2013 at 10:00 PM, Greg Kroah-Hartman > > >> <gregkh@xxxxxxxxxxxxxxxxxxx> wrote: > > >> > On Mon, Aug 19, 2013 at 07:04:18PM +0800, Ming Lei wrote: > > >> >> Because usb_hcd_submit_urb is in the hotest path of usb core, > > >> >> so use percpu counter to count URB instead of using atomic variable > > >> >> because atomic operations are much slower than percpu operations. > > >> >> > > >> >> Cc: Oliver Neukum <oliver@xxxxxxxxxx> > > >> >> Cc: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> > > >> >> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxxxxx> > > >> >> --- > > >> >> drivers/usb/core/hcd.c | 4 ++-- > > >> >> drivers/usb/core/sysfs.c | 7 ++++++- > > >> >> drivers/usb/core/usb.c | 9 ++++++++- > > >> >> drivers/usb/core/usb.h | 1 + > > >> >> include/linux/usb.h | 2 +- > > >> >> 5 files changed, 18 insertions(+), 5 deletions(-) > > >> > > > >> > And this really speeds things up? Exactly what does it? > > >> > > > >> > And it's not that atomic operations are "slower", it's just that the > > >> > > >> For SMP, atomic_inc/atomic_dec are much slower than percpu > > >> variable inc/dec, see 4.1(Why Isn’t Concurrent Count-ing Trivial?) > > >> of [1]. > > >> > > >> However, it is slower: on a Intel Core Duo laptop, it is about six > > >> times slower than non-atomic increment when a single thread > > >> is incrementing, and more than ten times slower if two threads > > >> are incrementing. > > >> > > >> Considered that most of desktop & laptop are SMP now, and with > > >> USB3.0, the submitted URBs per second may reach tens of thousand > > >> or more, and we can remove the atomic inc/dec operations in the hot > > >> path, so why don't do it? > > > > > > Because you really didn't do it, there are lots of other atomic > > > operations on that same path. > > > > Not lots in the path of usbcore. > > Did you look close? I see 2 more right there in the context of your > patch alone. One you try to take care of later (but just do the same > thing, no real change), the other you don't address at all. Ok, sorry, atomic_set() is, on some arches, "faster" than atomic_inc(), so you have sped up things there, my apologies. But given the other locks taken in this path, and other atomic operations, removing 1 seems like premature optimization. Please, use perf, and other things, to find out where real problems are in the USB stack. I'm sure there are locations that we can improve on, but until you get some real numbers, it's going to be hard to accept stuff like this. greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html