Jamie Lokier <jamie@xxxxxxxxxxxxx> wrote on 01/21/2009 12:53:56 PM:

> Bryan Henderson wrote:
> > Nick Piggin <npiggin@xxxxxxx> wrote on 01/20/2009 05:36:06 PM:
> >
> > > On Tue, Jan 20, 2009 at 01:25:59PM -0800, Bryan Henderson wrote:
> > > > > For this, taking a vector of multiple ranges would be nice.
> > > > > Alternatively, issuing parallel fsync_range calls from multiple
> > > > > threads would approximate the same thing - if (big if) they aren't
> > > > > serialised by the kernel.
> > > >
> > > > That sounds like a job for fadvise().  A new FADV_WILLSYNC says you're
> > > > planning to sync that data soon.  The kernel responds by scheduling the
> > > > I/O immediately.  fsync_range() takes a single range and in this case is
> > > > just a wait.  I think it would be easier for the user as well as more
> > > > flexible for the kernel than a multi-range fsync_range() or multiple
> > > > threads.
> > >
> > > A problem is that the kernel will not always be able to schedule the
> > > IO without blocking (various mutexes or block device queues full etc).
> >
> > I don't really see the problem with that.  We're talking about a program
> > that is doing device-synchronous I/O.  Blocking is a way of life.  Plus,
> > the beauty of advice is that if it's hard occasionally, the kernel can
> > just ignore it.
>
> If you have 100 file regions, each one a few pages in size, and you do
> 100 fsync_range() calls, that results in potentially far from optimal
> I/O scheduling (e.g. all over the disk) *and* 100 low-level disk cache
> flushes (I/O barriers) instead of just one at the end.  100 head seeks
> and 100 cache flush ops can be very expensive.

You got lost in the thread here.  I proposed a fadvise() that would
result in I/O scheduling; Nick said the fadvise() might have to block; I
said, so what?  Now you seem to be talking about 100 fsync_range() calls,
each of which starts and then waits for a sync of one range.

Getting back to I/O scheduled as a result of an fadvise(): if it blocks
because the block device queue is full, then it's going to block with a
multi-range fsync_range() as well.  The other blocking cases are kind of
vague, but I assume they're rare and about the same as for a multi-range
fsync_range().

> This is the point of taking a vector of ranges to flush - or some
> other way to "plug" the I/O and only wait for it after submitting it
> all.

My fadvise-based proposal also waits for I/O only after it has all been
submitted (a rough sketch of the calling pattern is at the end of this
note).

But plugging (delaying the start of I/O even though it is ready to go and
the device is idle) is rarely a good idea.  It can help for short bursts
to a mostly idle device (typically saving half a seek per burst), but a
busy device provides a natural plug.  So plugging can't help throughput,
though it can improve the response time of a burst.

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Storage Systems
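
P.S.  To make the calling pattern concrete, here is a rough sketch.  To be
clear: FADV_WILLSYNC is only the flag I proposed above, fsync_range() is
the interface being discussed in this thread, and the (fd, offset, nbytes)
signature I give it here is purely an assumption for illustration; none of
this exists in a shipped kernel.

/*
 * Sketch of the calling pattern only.  FADV_WILLSYNC and fsync_range()
 * are the interfaces proposed in this thread, not real ones, so the
 * declarations below are placeholders just to make the sketch read.
 */
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <sys/types.h>

#define FADV_WILLSYNC 1001                     /* hypothetical advice value */
extern int fsync_range(int fd, off_t offset, off_t nbytes);  /* hypothetical */

struct range { off_t offset; off_t nbytes; };

int sync_ranges(int fd, const struct range *r, int n)
{
    int i;

    /* Pass 1: advise the kernel that each range will be synced soon,
       so it can start writeback now and schedule all the I/O together. */
    for (i = 0; i < n; i++)
        posix_fadvise(fd, r[i].offset, r[i].nbytes, FADV_WILLSYNC);

    /* Pass 2: only now wait.  Because the I/O was already submitted,
       each fsync_range() call is (ideally) just a wait rather than a
       fresh submission followed by a wait. */
    for (i = 0; i < n; i++)
        if (fsync_range(fd, r[i].offset, r[i].nbytes) != 0)
            return -1;

    return 0;
}

The point of the two passes is just what I said above: all of the I/O is
submitted before anything waits, so the kernel is free to schedule it as
one batch.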