On Thu, Oct 11, 2012 at 11:37:46AM -0700, Peter Fordham wrote:
> On 8 October 2012 18:10, Theodore Ts'o <tytso@xxxxxxx> wrote:
> > How expensive are memory barriers on ARM, anyway?
>
> The performance monitors seem to be telling me that a DMB just after a
> store which misses in the L1 & L2 (causing an eviction of a clean line
> and a line-fill, I assume) takes over 100 cycles.

If we assume a 1GHz clock, 100 cycles is 0.1 microseconds (100ns).  A
4k read on an eMMC device (which is what I assume you are using) takes
about 5ms.  A super-expensive PCIe-attached flash device has a read
latency of around 20-50 microseconds.  Read latency for an SSD is
around 1ms.

> I'm seeing a 20% slowdown in ext4 performance when enabling SMP on my
> device.  I'm starting to think there might be issues with the memory
> system.

So what are you measuring, and how are you measuring it?  If 0.1
microseconds is significant, it must be a workload where everything is
in the cache and you're never hitting the storage device.

More to the point, as Arjan pointed out to me on Google+, using a
mutex is going to add at least one, and probably more, memory barriers
(due to the locks needed by the scheduler), *plus* the scheduling
overhead.  So your claim that using a mutex is superior to using
spinlocks makes absolutely no sense.

						- Ted
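
P.S.  For anyone who wants to reproduce a measurement like the one
Peter describes, here is a minimal user-space sketch for ARMv7.  It is
only illustrative, not Peter's actual test: it assumes the kernel has
already enabled the PMU cycle counter and user-space access to it
(PMCR.E, PMCNTENSET, PMUSERENR), and the names read_ccnt and buf are
made up for this example.

    /* Illustrative sketch only -- not the test from the thread.
     * Assumes ARMv7 and that the kernel has enabled the PMU cycle
     * counter and user-space access to it; otherwise the mrc below
     * will fault. */
    #include <stdint.h>
    #include <stdio.h>

    static inline uint32_t read_ccnt(void)
    {
            uint32_t cc;
            asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(cc)); /* PMCCNTR */
            return cc;
    }

    /* 16MB: large enough that streaming over it evicts buf[0] from L1/L2 */
    static volatile uint32_t buf[4 * 1024 * 1024];

    int main(void)
    {
            uint32_t t0, t1;
            unsigned long i;

            /* Fault the pages in, then stream over the whole buffer so
             * that buf[0] is no longer cached when we do the timed store. */
            for (i = 0; i < sizeof(buf) / sizeof(buf[0]); i++)
                    buf[i] = i;

            t0 = read_ccnt();
            buf[0] = 1;                        /* store that misses in L1/L2 */
            asm volatile("dmb" ::: "memory");  /* stalls until the store drains */
            t1 = read_ccnt();

            /* Rough upper bound: also includes the store itself and the
             * overhead of the two counter reads. */
            printf("store + dmb: ~%u cycles\n", (unsigned)(t1 - t0));
            return 0;
    }

Build with something like "gcc -O2 -march=armv7-a"; the printed number
is a rough upper bound on the barrier cost, since it includes the
store and the counter reads as well.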
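P.P.S.  To spell out the arithmetic behind the latency comparison
above:

    100 cycles / 1GHz       = 100ns per barrier
    one 4k eMMC read (5ms)  = 50,000 barrier-times
    20% of one eMMC read    = 1ms = 10,000 barrier-times

That is, for barriers alone to explain a 20% slowdown on an I/O-bound
workload, each read would have to be accompanied by on the order of
ten thousand barriers -- which is why the workload has to be
cache-resident for 0.1 microseconds to matter at all.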
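And to illustrate the locking point, here is a sketch in kernel-style
C using the stock spin_lock/mutex_lock primitives.  The names
demo_lock, demo_mutex, spinlock_path, and mutex_path are hypothetical,
not code from ext4:

    #include <linux/spinlock.h>
    #include <linux/mutex.h>

    static DEFINE_SPINLOCK(demo_lock);
    static DEFINE_MUTEX(demo_mutex);

    static void spinlock_path(void)
    {
            spin_lock(&demo_lock);     /* one atomic RMW with acquire
                                        * semantics; spins (no context
                                        * switch) if contended */
            /* short critical section -- sleeping is not allowed here */
            spin_unlock(&demo_lock);   /* release barrier + store */
    }

    static void mutex_path(void)
    {
            mutex_lock(&demo_mutex);   /* fast path: one atomic RMW +
                                        * barrier; contended path: the
                                        * mutex's internal wait-list
                                        * spinlock, runqueue locks in the
                                        * scheduler (more barriers), and
                                        * a context switch */
            /* critical section -- may sleep */
            mutex_unlock(&demo_mutex); /* barrier; may wake a waiter,
                                        * which again goes through the
                                        * scheduler */
    }

Even uncontended, the mutex path has at least as many barriers as the
spinlock path, and strictly more whenever the scheduler gets involved
-- which is the point above.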