On Thu, Apr 13, 2006 at 05:26:39PM +0200, Laurent Vivier wrote: > Le lun 10/04/2006 à 10:24, Andrew Morton a écrit : > > Laurent Vivier <Laurent.Vivier@xxxxxxxx> wrote: > > > > > > Does the attached patch look like the thing you though about ? > > > > I guess so. But it'll need a lot of performance testing on big SMP > > to work out what the impact is. > > I made some tests with dbench: > > IBM x440: 8 CPUs hyperthreaded = 16 CPUs (Xeon at 1.4 Ghz) > I ran the same tests on a 16 core EM64T box very similar to the one you ran dbench on :). Dbench results on ext3 varies quite a bit. I couldn't get to a statistically significant conclusion For eg, With atomic counters, 32 clients, 3 runs Throughput 187.712 MB/sec 32 procs Throughput 197.059 MB/sec 32 procs Throughput 203.522 MB/sec 32 procs Without atomic counters (per-cpu counters), 32 clients, 3 runs Throughput 228.805 MB/sec 32 procs Throughput 155.831 MB/sec 32 procs Throughput 134.777 MB/sec 32 procs The oprofile profiles for the atomic counter case looks like this: CPU: P4 / Xeon with 2 hyper-threads, speed 3002.77 MHz (estimated) Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000 samples % app name symbol name 180505286 57.7844 vmlinux-t poll_idle 51944524 16.6288 vmlinux-t ext3_test_allocatable 43648955 13.9731 vmlinux-t bitmap_search_next_usable_block 2892251 0.9259 vmlinux-t copy_user_generic 2099969 0.6723 vmlinux-t do_get_write_access 1459523 0.4672 vmlinux-t journal_dirty_metadata 1393413 0.4461 vmlinux-t journal_stop So the atomic counters in question are not even hotspots on this workload, so IMHO, dbench cannot be used to come to any conclusion regarding per-cpu counters vs atomics here. Thanks, Kiran - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html