On Wed 23-06-10 08:29:32, Dave Chinner wrote:
> On Tue, Jun 22, 2010 at 04:02:59PM +0200, Jan Kara wrote:
> > > 2) most writeback will be submitted by one per-bdi-flusher, so no worry
> > > of cache bouncing (this also means the per CPU counter error is
> > > normally bounded by the batch size)
> >   Yes, writeback will be submitted by one flusher thread but the question
> > is rather where the writeback will be completed. And that depends on which
> > CPU that particular irq is handled. As far as my weak knowledge of HW goes,
> > this very much depends on the system configuration (i.e., irq affinity and
> > other things).
>
> And how many paths to the storage you are using, how threaded the
> underlying driver is, whether it is using MSI to direct interrupts to
> multiple CPUs instead of just one, etc.
>
> As we scale up we're more likely to see multiple CPUs doing IO
> completion for the same BDI because the storage configs are more
> complex in high end machines. Hence IMO preventing cacheline
> bouncing between submission and completion is a significant
> scalability concern.
  Thanks for the details. I'm wondering whether we could assume that
although IO completion can run on several CPUs, it will still be a fairly
limited number of CPUs. If this is the case, we could then implement a
per-cpu counter that would additionally track the number of CPUs modifying
the counter (the number of CPUs would get zeroed in ???_counter_sum). This
way the number of atomic operations won't be much higher (only one atomic
inc when a CPU updates the counter for the first time) and if only a few
CPUs modify the counter, we would be able to bound the error much better.

								Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR