On 08/24/2012 09:58 AM, Eric Dumazet wrote:
On Friday, 24 August 2012 at 09:48 -0500, Nathan Zimmer wrote:
On Wed, Aug 22, 2012 at 11:42:58PM +0200, Eric Dumazet wrote:
On Wed, 2012-08-22 at 20:28 +0200, Eric Dumazet wrote:
That's interesting, but if you really want this to fly, an RCU
conversion would be much better ;)
pde_users would be an atomic_t and you would avoid the spinlock
contention.
Here is what I had in mind; I would be interested to know how it helps a 512-core machine ;)
Here are the results and they look great.
cpuinfo   baseline   moved kfree   RCU
tasks     read-sec   read-sec      read-sec
1         0.0141     0.0141        0.0141
2         0.0140     0.0140        0.0142
4         0.0140     0.0141        0.0141
8         0.0145     0.0145        0.0140
16        0.0553     0.0548        0.0168
32        0.1688     0.1622        0.0549
64        0.5017     0.3856        0.1690
128       1.7005     0.9710        0.5038
256       5.2513     2.6519        2.0804
512       8.0529     6.2976        3.0162
Indeed...
Could you describe the test you are actually running?
Thanks
It is a dead simple test.
The test starts by forking off X tasks,
assigning each to its own CPU.
Each task then allocates a bit of memory.
All tasks wait on a memory cell for the go order.
We measure the read time starting here.
Once the go order is given, they all read a chunk of the selected proc file.
I was using /proc/cpuinfo to test.
Once everyone has finished, we take the end read time.
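
The steps above could be sketched roughly as follows. This is not the actual test program, only a minimal userspace approximation of the description. The names (run_read_test, struct shared, the ready counter), the use of fork() with an anonymous shared mapping as the "memory cell", and the choice to open the file inside the timed region are all assumptions made for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for sched_setaffinity() and CPU_SET() */
#endif
#include <fcntl.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Shared between parent and children (hypothetical layout). */
struct shared {
    atomic_int ready;           /* children that reached the start line */
    atomic_int go;              /* the "memory cell" holding the go order */
};

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Child: pin to a cpu, allocate, wait for go, read one chunk. Never returns. */
static void child(struct shared *sh, const char *path, int cpu, size_t chunk)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    sched_setaffinity(0, sizeof(set), &set);    /* best effort, errors ignored */

    char *buf = malloc(chunk);
    atomic_fetch_add(&sh->ready, 1);
    while (!atomic_load(&sh->go))
        ;                                       /* spin on the memory cell */
    if (!buf)
        _exit(1);
    int fd = open(path, O_RDONLY);              /* assumption: open is timed too */
    if (fd < 0)
        _exit(1);
    ssize_t n = read(fd, buf, chunk);
    _exit(n < 0 ? 1 : 0);
}

/* Returns elapsed seconds from the go order until all tasks exit, or -1.0. */
double run_read_test(const char *path, int ntasks, size_t chunk)
{
    struct shared *sh = mmap(NULL, sizeof(*sh), PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (sh == MAP_FAILED)
        return -1.0;
    atomic_init(&sh->ready, 0);
    atomic_init(&sh->go, 0);

    long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
    if (ncpus < 1)
        ncpus = 1;

    int started = 0, failed = 0;
    for (int i = 0; i < ntasks; i++) {
        pid_t pid = fork();
        if (pid == 0)
            child(sh, path, (int)(i % ncpus), chunk);
        if (pid < 0) {
            failed = 1;
            break;
        }
        started++;
    }

    /* Wait until every child is parked on the go cell. */
    while (atomic_load(&sh->ready) < started)
        sched_yield();

    double start = now_sec();                   /* read time starts here */
    atomic_store(&sh->go, 1);
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failed = 1;
    }
    double end = now_sec();                     /* everyone has finished */

    munmap(sh, sizeof(*sh));
    return failed ? -1.0 : end - start;
}
```

Under these assumptions, a call such as run_read_test("/proc/cpuinfo", 64, 4096) would approximate one row of the table; the 4096-byte chunk size is a placeholder, since the description does not state the size that was read.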