在 2008-12-08一的 17:12 -0500,Theodore Tso写道: > On Sun, Dec 07, 2008 at 08:52:50PM -0800, Andrew Morton wrote: > > > > The first patch which was added (pre-2.6.27) was "percpu_counter: new > > function percpu_counter_sum_and_set". This added the broken-by-design > > percpu_counter_sum_and_set() function, **and used it in ext4**. > > > > Mea culpa, I was the one who reviewed Mingming's patch, and missed > this. Part of the problem was that percpu_counter.c isn't well > documented, and I so saw the spinlock, but didn't realize it only > protected reference counter, and not the per-cpu array. I should have > read through code more thoroughly before approving the patch. > > I suppose if we wanted we could add a rw spinlock which mediates > access to a "foreign" cpu counter (i.e., percpu_counter_add gets a > shared lock, and percpu_counter_set needs an exclusive lock) but it's > probably not worth it. > > Actually, if all popular architectures had a hardware-implemented > atomic_t, I wonder how much ext4 really needs the percpu counter, > especially given ext4's multiblock allocator; Delayed allocation will makes multiple block allocation possible for buffered IO. However, we still need to check the percpu counter on write_begin() time for every single possible block allocation (this is to make sure fs is not overbooked), unless write_begin() could cluster the write requests and maps multiple blocks in a single shot. So in reality in ext4 the free blocks percpu_counter check and the s_dirty_blocks (percpu counter too, for delayed blocks) only takes 1 block at a time:( Mingming -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html