Re: [PATCH 1/3] USB: use percpu counter to count submitted URBs per device

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Aug 19, 2013 at 11:14 PM, Greg Kroah-Hartman
<gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, Aug 19, 2013 at 11:06:31PM +0800, Ming Lei wrote:
>> On Mon, Aug 19, 2013 at 10:00 PM, Greg Kroah-Hartman
>> <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>> > On Mon, Aug 19, 2013 at 07:04:18PM +0800, Ming Lei wrote:
>> >> Because usb_hcd_submit_urb is in the hotest path of usb core,
>> >> so use percpu counter to count URB instead of using atomic variable
>> >> because atomic operations are much slower than percpu operations.
>> >>
>> >> Cc: Oliver Neukum <oliver@xxxxxxxxxx>
>> >> Cc: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>
>> >> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxxxxx>
>> >> ---
>> >>  drivers/usb/core/hcd.c   |    4 ++--
>> >>  drivers/usb/core/sysfs.c |    7 ++++++-
>> >>  drivers/usb/core/usb.c   |    9 ++++++++-
>> >>  drivers/usb/core/usb.h   |    1 +
>> >>  include/linux/usb.h      |    2 +-
>> >>  5 files changed, 18 insertions(+), 5 deletions(-)
>> >
>> > And this really speeds things up?  Exactly what does it?
>> >
>> > And it's not that atomic operations are "slower", it's just that the
>>
>> For SMP, atomic_inc/atomic_dec are much slower than percpu
>> variable inc/dec, see 4.1(Why Isn’t Concurrent Count-ing Trivial?)
>> of [1].
>>
>>          However, it is slower: on a Intel Core Duo laptop, it is about six
>>          times slower than non-atomic increment when a single thread
>>          is incrementing, and more than ten times slower if two threads
>>          are incrementing.
>>
>> Considered that most of desktop & laptop are SMP now, and with
>> USB3.0, the submitted URBs per second may reach tens of thousand
>> or more, and we can remove the atomic inc/dec operations in the hot
>> path, so why don't do it?
>
> Because you really didn't do it, there are lots of other atomic
> operations on that same path.

Not lots in the path of usbcore.

>
> And, thens of thousands of urbs should be trivial, did you measure this
> to see if it changed anything?  I'm not taking patches like this that
> are not quantifiable, sorry.

The number may be too trivial to measure, but I will try to test
with perf.

>
> The gating problem in USB right now is the hardware, it's the slowest
> thing, not the kernel, from everything I have ever tested, or seen.

The problem may not speed up usb performance, but might decrease
CPU utilization a bit, or cache miss.

>
> Well, bad host controller silicon is also a problem (i.e. raspberry pi),
> but there's not much we can do about braindead problems like that...
>
>> > barriers involved can be slower, depending on what else is happening.
>> > If you look, you are already hitting atomic variables in the same path,
>> > so how can this change speed anything up?
>>
>> No, no barriers are involved in atomic_inc/atomic_dec at all.
>
> None?  Hm, you might want to rethink that statement :)

Please see Documentation/memory-barriers.txt:

The following also do _not_ imply memory barriers, and so may require explicit
memory barriers under some circumstances (smp_mb__before_atomic_dec() for
instance):

        atomic_add();
        atomic_sub();
        atomic_inc();
        atomic_dec();


Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux