Jamie Lokier <jamie@xxxxxxxxxxxxx> wrote on 01/21/2009 12:53:56 PM:

> Bryan Henderson wrote:
> > Nick Piggin <npiggin@xxxxxxx> wrote on 01/20/2009 05:36:06 PM:
> >
> > > On Tue, Jan 20, 2009 at 01:25:59PM -0800, Bryan Henderson wrote:
> > > > > For this, taking a vector of multiple ranges would be nice.
> > > > > Alternatively, issuing parallel fsync_range calls from multiple
> > > > > threads would approximate the same thing - if (big if) they aren't
> > > > > serialised by the kernel.
> > > >
> > > > That sounds like a job for fadvise().  A new FADV_WILLSYNC says you're
> > > > planning to sync that data soon.  The kernel responds by scheduling the
> > > > I/O immediately.  fsync_range() takes a single range and in this case is
> > > > just a wait.  I think it would be easier for the user as well as more
> > > > flexible for the kernel than a multi-range fsync_range() or multiple
> > > > threads.
> > >
> > > A problem is that the kernel will not always be able to schedule the
> > > IO without blocking (various mutexes or block device queues full etc).
> >
> > I don't really see the problem with that.  We're talking about a program
> > that is doing device-synchronous I/O.  Blocking is a way of life.  Plus,
> > the beauty of advice is that if it's hard occasionally, the kernel can
> > just ignore it.
>
> If you have 100 file regions, each one a few pages in size, and you do
> 100 fsync_range() calls, that results in potentially far from optimal
> I/O scheduling (e.g. all over the disk) *and* 100 low-level disk cache
> flushes (I/O barriers) instead of just one at the end.  100 head seeks
> and 100 cache flush ops can be very expensive.

You got lost in the thread here.  I proposed a fadvise() that would
result in I/O scheduling; Nick said the fadvise() might have to block; I
said, so what?  Now you seem to be talking about 100 fsync_range() calls,
each of which starts and then waits for a sync of one range.

Getting back to I/O scheduled as a result of an fadvise(): if it blocks
because the block device queue is full, then it's going to block with a
multi-range fsync_range() as well.  The other blocking cases are kind of
vague, but I assume they're rare and about the same as for a multi-range
fsync_range().

> This is the point of taking a vector of ranges to flush - or some
> other way to "plug" the I/O and only wait for it after submitting it
> all.

My fadvise-based proposal also waits for I/O only after it has all been
submitted (a rough sketch of the calling pattern is at the end of this
note).

But plugging (delaying the start of I/O even though it is ready to go and
the device is idle) is rarely a good idea.  It can help for short bursts
to a mostly idle device (typically saving half a seek per burst), but a
busy device provides a natural plug.  So plugging can't help throughput,
though it can improve the response time of a burst.

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Storage Systems
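
P.S.  To make the calling pattern concrete, here is a rough sketch.  To be
clear: FADV_WILLSYNC is only the flag I proposed above, fsync_range() is
the interface being discussed in this thread, and the (fd, offset, nbytes)
signature I give it here is purely an assumption for illustration; none of
this exists in a shipped kernel.

/*
 * Sketch of the calling pattern only.  FADV_WILLSYNC and fsync_range()
 * are the interfaces proposed in this thread, not real ones, so the
 * declarations below are placeholders just to make the sketch read.
 */
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <sys/types.h>

#define FADV_WILLSYNC 1001                     /* hypothetical advice value */
extern int fsync_range(int fd, off_t offset, off_t nbytes);  /* hypothetical */

struct range { off_t offset; off_t nbytes; };

int sync_ranges(int fd, const struct range *r, int n)
{
    int i;

    /* Pass 1: advise the kernel that each range will be synced soon,
       so it can start writeback now and schedule all the I/O together. */
    for (i = 0; i < n; i++)
        posix_fadvise(fd, r[i].offset, r[i].nbytes, FADV_WILLSYNC);

    /* Pass 2: only now wait.  Because the I/O was already submitted,
       each fsync_range() call is (ideally) just a wait rather than a
       fresh submission followed by a wait. */
    for (i = 0; i < n; i++)
        if (fsync_range(fd, r[i].offset, r[i].nbytes) != 0)
            return -1;

    return 0;
}

The point of the two passes is just what I said above: all of the I/O is
submitted before anything waits, so the kernel is free to schedule it as
one batch.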