Bryan Henderson wrote:
> > If you have 100 file regions, each one a few pages in size, and you do
> > 100 fsync_range() calls, that results in potentially far from optimal
> > I/O scheduling (e.g. all over the disk) *and* 100 low-level disk cache
> > flushes (I/O barriers) instead of just one at the end.  100 head seeks
> > and 100 cache flush ops can be very expensive.
>
> You got lost in the thread here.  I proposed an fadvise() that would
> result in I/O scheduling; Nick said the fadvise() might have to block;
> I said so what?  Now you seem to be talking about 100 fsync_range()
> calls, each of which starts and then waits for a sync of one range.
>
> Getting back to I/O scheduled as a result of an fadvise(): if it blocks
> because the block queue is full, then it's going to block with a
> multi-range fsync_range() as well.

No, why would it block?  The block queue has room for (say) 100 small
file ranges.  If you submit 1000 ranges, sure, the first 900 may block;
after that you've got 100 left in the queue.

Then you call fsync_range() 1000 times; the first 900 are NOPs, as you
say, because the data has already been written.  The remaining 100 (the
size of the block queue) are forced to write serially.  They're even
written to the disk platter in order.

> My fadvise-based proposal waits for I/O only after it has all been
> submitted.

Are you saying one call to fsync_range() should wait for all the writes
which have been queued by the fadvise() to different ranges?

> But plugging (delaying the start of I/O even though it is ready to go and
> the device is idle) is rarely a good idea.  It can help for short bursts
> to a mostly idle device (typically saves half a seek per burst), but a
> busy device provides a natural plug.  It thus can't help throughput, but
> can improve the response time of a burst.

I agree, plugging doesn't make a big difference.  However, letting the
disk or elevator reorder the writes it has room for does sometimes make
a big difference.  That's the point.
We're not talking about forcibly _delaying_ I/O; we're talking about
giving the block elevator, and the disk's own elevator, the freedom to
do their job by not forcibly _flushing_ and _waiting_ between each
individual request for the length of the queue.

-- Jamie