Re: [PATCH v4 0/5] getcpu_cache system call for 4.6

----- On Feb 24, 2016, at 3:07 PM, H. Peter Anvin hpa@xxxxxxxxx wrote:

> On February 23, 2016 8:09:23 PM PST, Mathieu Desnoyers
> <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>>----- On Feb 23, 2016, at 8:36 PM, H. Peter Anvin hpa@xxxxxxxxx wrote:
>>
>>> On 02/23/2016 03:28 PM, Mathieu Desnoyers wrote:
>>>> Hi,
>>>> 
>>>> Here is a patchset implementing a cache for the CPU number of the
>>>> currently running thread in user-space.
>>>> 
>>>> Benchmarks comparing this approach to a getcpu based on a system
>>>> call on ARM show a 44x speedup. They show a 14x speedup on x86-64
>>>> compared to executing lsl from a vDSO through glibc.
>>>> 
>>>> I added a man page in the changelog of patch 1/3, which shows an
>>>> example usage of this new system call.
>>>> 
>>>> This series is based on v4.5-rc5, submitted for Linux 4.6.
>>>> 
>>>> Feedback is welcome,
>>>> 
>>> 
>>> What is the resulting context switch overhead?
>>
>>The getcpu_cache only adds code to the thread migration path and
>>to the resume notifier. The context switch path per se is
>>untouched. I would therefore expect the overhead on context
>>switch to be within the noise, unless something like hackbench is
>>so sensitive to the size of struct task_struct that a single
>>extra pointer added at its end would throw off the benchmarks.
>>
>>Is that what you are concerned about?
>>
>>Thanks,
>>
>>Mathieu
> 
> Yes, I'd like to see numbers.  It is all too easy to handwave small changes
> away, but they add up over time.  Without numbers it is a bit hard to
> quantify the pros vs. the cons.

- Speed

Running 10 runs of hackbench -l 100000 on a 2-socket, 8-core-per-socket
Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz (directly on hardware, no
virtualization), with hyperthreading enabled, on a 4.5-rc5
defconfig+localyesconfig kernel with the getcpu_cache series applied,
suggests that the sched switch impact of this new configuration option
is within the noise:

* CONFIG_GETCPU_CACHE=n

avg.:      26.63 s
std.dev.:   0.38 s

* CONFIG_GETCPU_CACHE=y

avg.:      26.52 s
std.dev.:   0.47 s


- Size

Going from CONFIG_GETCPU_CACHE=n to =y adds 704 bytes to the compressed
kernel zImage. The vmlinux text size grows by 512 bytes, and the vmlinux
data size also grows by 512 bytes (1024 bytes total, matching the delta
in the dec column below).

* CONFIG_GETCPU_CACHE=n
    text     data      bss       dec      hex  filename
16802349  2745968  1564672  21112989  142289d  vmlinux

* CONFIG_GETCPU_CACHE=y
    text     data      bss       dec      hex  filename
16802861  2746480  1564672  21114013  1422c9d  vmlinux

Am I missing anything? I plan to add this information to the
changelog for my next round (v5).
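
For reference, below is a minimal user-space sketch of the usage pattern
this series enables: register a per-thread cache once, then read the
current CPU number with a plain memory load. The syscall number, the
command constant, and the argument order used here are illustrative
assumptions only; the authoritative example is the man page included in
the changelog of patch 1/3.

/*
 * Illustrative sketch of getcpu_cache usage (not the real ABI).
 * __NR_getcpu_cache, GETCPU_CACHE_CMD_REGISTER, and the argument
 * layout below are assumptions made for this example.
 */
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef __NR_getcpu_cache
#define __NR_getcpu_cache		326	/* assumed syscall number */
#endif
#define GETCPU_CACHE_CMD_REGISTER	0	/* assumed command value */

/* Per-thread cache kept up to date by the kernel on migration/resume. */
static __thread volatile int32_t cpu_cache = -1;

static int getcpu_cache_register(void)
{
	return syscall(__NR_getcpu_cache, GETCPU_CACHE_CMD_REGISTER,
		       &cpu_cache, 0);
}

int main(void)
{
	if (getcpu_cache_register()) {
		perror("getcpu_cache");
		return 1;
	}
	/* After registration, reading the CPU number is a single load. */
	printf("Currently running on CPU %d\n", (int)cpu_cache);
	return 0;
}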

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
--