On Thu, Feb 28, 2008 at 08:09:57AM -0500, Ric Wheeler wrote:
> One more thought - what we really want here is to have a sense of the
> latency of the device. In the S-ATA disk case, this optimization works
> well for batching, since we "spend" an extra 4ms worst case for the
> chance of combining multiple, slow 18ms operations.
>
> With the CLARiiON box we tested, the optimization fails badly, since
> the cost is only 1.3ms, so we optimize by waiting 3-4 times longer
> than it would take to do the operation immediately.
>
> This has also seemed to me to be the same problem that I/O schedulers
> face with plugging - we want to dynamically figure out when to plug
> and unplug here without hard-coding in device-specific tunings.
>
> If we bypass the snippet for multi-threaded writers, we would probably
> slow down this workload on normal S-ATA/ATA drives (or even
> higher-performance non-RAID disks).

It's the self-tuning aspect of this problem that makes it hard.

In XFS, the tuning works by looking at the state of the previous log
I/O buffer to check whether it is still syncing to disk. If it is, we
go to sleep waiting for that log buffer I/O to complete. This holds the
current buffer open to aggregate more transactions before syncing it to
disk, and hence allows parallel fsyncs to be issued in a single log
write. Because we wait for the previous log I/O to complete, the
mechanism self-tunes to the latency of the underlying storage medium.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
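
For illustration, a minimal sketch of the self-tuning batching Dave
describes might look like the following. This is not the actual XFS
code; the names (log_buffer, submit_io, flush_log_buffer) and the
pthread-based synchronisation are assumptions made for the example.

	/*
	 * Sketch of latency self-tuned log batching: hold the current
	 * log buffer open until the previous buffer's I/O completes.
	 */
	#include <pthread.h>
	#include <stdbool.h>
	#include <unistd.h>

	struct log_buffer {
		pthread_mutex_t lock;
		pthread_cond_t  io_done;
		bool            syncing;  /* I/O currently in flight? */
		int             batched;  /* transactions aggregated */
	};

	/* Stand-in for the device write; blocks for the sync latency. */
	void submit_io(struct log_buffer *buf)
	{
		(void)buf;
		usleep(1300);	/* e.g. ~1.3ms, as on the array Ric measured */
	}

	/*
	 * Sync the current buffer, but only after the previous buffer's
	 * I/O has completed.  While we sleep in the wait loop below,
	 * concurrent fsync callers keep adding transactions to 'cur',
	 * so the batch grows for exactly as long as one device write
	 * takes to finish.
	 */
	void flush_log_buffer(struct log_buffer *cur, struct log_buffer *prev)
	{
		pthread_mutex_lock(&prev->lock);
		while (prev->syncing)
			pthread_cond_wait(&prev->io_done, &prev->lock);
		pthread_mutex_unlock(&prev->lock);

		pthread_mutex_lock(&cur->lock);
		cur->syncing = true;
		pthread_mutex_unlock(&cur->lock);

		submit_io(cur);		/* one write covers the whole batch */

		pthread_mutex_lock(&cur->lock);
		cur->syncing = false;
		cur->batched = 0;
		pthread_cond_broadcast(&cur->io_done);
		pthread_mutex_unlock(&cur->lock);
	}

The point of the sketch is that the wait on prev->io_done is the only
timing mechanism: there is no hard-coded timeout, so the batching
window shrinks automatically on a low-latency array and grows on a slow
S-ATA disk - exactly the device-specific behaviour the thread is
discussing.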