Batch locking for rmap fork/exit processing

Andi Kleen <andi@xxxxxxxxxxxxxx> · Thu, 5 May 2011 12:32:48 -0700

Commit 012f18004da33ba67 in 2.6.36 caused a significant performance regression in
fork/exit intensive workloads with a lot of sharing. The problem is that 
fork/exit now contend heavily on the lock of the root anon_vma.

This patchkit attempts to lower this a bit by batching the lock acquisitions.
Right now the lock is taken for every shared vma individually. This
patchkit batches this and only reacquires the lock when actually needed.
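
As a rough userspace illustration of the batching idea (this is not the
patch code: struct root, struct item and all helper names below are made up
for the example, and a pthread mutex stands in for the root anon_vma lock),
the difference between per-item locking and batched locking could be
sketched like this:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a root anon_vma: one lock shared by many items. */
struct root {
	pthread_mutex_t lock;
	unsigned long acquisitions;	/* lock round trips, for illustration only */
};

/* Hypothetical stand-in for a shared vma that hangs off a root. */
struct item {
	struct root *root;
	int unlinked;
};

static void root_lock(struct root *r)
{
	pthread_mutex_lock(&r->lock);
	r->acquisitions++;		/* updated while holding the lock */
}

static void root_unlock(struct root *r)
{
	pthread_mutex_unlock(&r->lock);
}

/* Unbatched: one lock/unlock pair per item, even when the root is the same. */
static void unlink_each(struct item *items, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		root_lock(items[i].root);
		items[i].unlinked = 1;
		root_unlock(items[i].root);
	}
}

/* Batched: keep the lock held while consecutive items share the same root. */
static void unlink_batched(struct item *items, size_t n)
{
	struct root *held = NULL;

	for (size_t i = 0; i < n; i++) {
		struct root *r = items[i].root;

		assert(r != NULL);
		if (r != held) {
			/* Root changed: drop the old lock, take the new one. */
			if (held)
				root_unlock(held);
			root_lock(r);
			held = r;
		}
		items[i].unlinked = 1;
	}
	if (held)
		root_unlock(held);
}
```

With items that mostly share one root, the batched loop touches the lock
once per run of same-root items instead of once per item. The sketch
ignores everything that makes the real case hard (lock ordering, sleeping
allocations under the lock, long hold times).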

When multiple processes are doing this in parallel, they will now
spend much less time bouncing the lock cache line around. In addition,
there should also be lower overhead in the uncontended case, because
lock operations are relatively slow (not measured).

This doesn't completely fix the regression on a 4-socket system, but cuts
it down somewhat. One particular workload suffering from this gets
about 5% faster.

This is essentially a micro optimization that just tries to mitigate
the problem a bit.

It would be better to switch back to more local locking like 2.6.35 had, but I
guess then we would be back with the old deadlocks? I was also thinking of
adding some deadlock avoidance as an alternative.

-Andi
