Re: background on the ext3 batching performance issue

Chris Mason <chris.mason@xxxxxxxxxx> · Thu, 28 Feb 2008 12:35:17 -0500

On Thursday 28 February 2008, Jan Kara wrote:
> > On Thursday 28 February 2008, Ric Wheeler wrote:
> >
> > [ fsync batching can be slow ]
> >
> > > One more thought - what we really want here is to have a sense of the
> > > latency of the device. In the S-ATA disk case, this optimization works
> > > well for batching since we "spend" an extra 4ms worst case in the
> > > chance of combining multiple, slow 18ms operations.
> > >
> > > With the clariion box we tested, the optimization fails badly since the
> > > cost is only 1.3 ms so we optimize by waiting 3-4 times longer than it
> > > would take to do the operation immediately.
> > >
> > > This has always seemed to me to be the same problem that IO
> > > schedulers face with plugging - we want to dynamically figure out when
> > > to plug and unplug here without hard coding device-specific tunings.
> > >
> > > If we bypass the snippet for multi-threaded writers, we would probably
> > > slow down this workload on normal S-ATA/ATA drives (or even higher
> > > performance non-RAID disks).
> >
> > It probably makes sense to keep track of the average number of writers we
> > are able to gather into a transaction.  There are lots of similar
> > workloads where we have a pool of procs doing fsyncs and the size of the
> > transaction or the number of times we joined a running transaction will
> > be fairly constant.
>
>   I'm probably missing something, but what are you trying to say? Either we
> wait for writers and the number of writes is higher, or we don't wait and
> the number of writes in a transaction is lower...

The common workload would be N mail server threads servicing incoming requests 
at a fairly constant rate.  Right now we sleep for a bit and wait for the 
number of writers to increase.  

My guess is that if we record the average number of times a writer joins an 
existing transaction, or if we record the average size of the transactions, 
we'll end up with a fairly constant number.

So, we can skip the sleep if the transaction has already grown close to that 
number.  This would avoid the latencies Ric is seeing.
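
A rough sketch of the idea, written as plain userspace C rather than real
jbd code.  Every name here (batch_stats, batch_stats_update,
batch_should_wait) is made up for illustration, and the 1/8 weighting and
the "at or above the average" cutoff are assumptions, not tested tunings:

```c
#include <stdbool.h>

/*
 * Hypothetical per-journal statistics.  These are NOT existing jbd
 * fields; they only illustrate the bookkeeping.
 */
struct batch_stats {
	/* Running average of writers per transaction, scaled by 8. */
	unsigned int avg_writers_x8;
};

/*
 * Fold the writer count of a transaction that just committed into the
 * running average: avg = 7/8 * avg + 1/8 * writers.
 * In the x8 fixed-point form that is avg_x8 - avg_x8/8 + writers.
 */
static void batch_stats_update(struct batch_stats *s, unsigned int writers)
{
	s->avg_writers_x8 = s->avg_writers_x8 - s->avg_writers_x8 / 8 + writers;
}

/*
 * Decide whether an fsync caller should sleep to let more writers join.
 * With no history we keep the current behaviour and wait.  Otherwise we
 * wait only while the transaction is still smaller than the average.
 */
static bool batch_should_wait(const struct batch_stats *s,
			      unsigned int current_writers)
{
	if (s->avg_writers_x8 == 0)
		return true;
	return current_writers * 8 < s->avg_writers_x8;
}
```

With a single fsync-ing thread the average settles at one writer, so the
sleep is skipped as soon as that writer has joined, which is the case Ric is
hitting on the clariion.  With a steady pool of N writers the sleep is kept
until roughly N have joined.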

-chris