Re: Performance Translators' Stability and Usefulness

Geoff Kassel <gkassel@xxxxxxxxxxxxxxxxxxxxx> · Sat, 4 Jul 2009 18:03:31 +1000

Hi Shehjar,
   I feel I should comment on part of your reply to Gordan's email.

> > Finally - which translators are deemed stable (no known issues -
> > memory leaks/bloat, crashes, corruption, etc.)?
>
> We can definitely vouch for a higher degree of stability of the
> releases. Otherwise, I don't think there is any performance translator we
> can call completely stable/mature because of the roadmap we have for
> constantly upgrading algorithms, functionality, etc.

When will the Gluster team be able to deliver a stable, mature, and reliable 
version of GlusterFS?

I have been using GlusterFS since the v1.3.x days, and I have yet to see a 
version since then that doesn't crash at least once a day from just load on 
even the simplest configurations.

Then there's the data corruption bug of the early 2.0.0 releases, which has 
kept me (and no doubt others) from upgrading to these releases.

I have read about the Gluster QA team, but quite frankly, I have yet to see 
the fruits of this team's work. Letting through a bug of that magnitude in a 
major release blew a lot of trust I had in the Gluster team's QA process.

When will regression tests be used? It's been months now since this bug, and 
still I don't see any sign of the use of this simple, industry-standard 
technique to minimise the risk of such issues slipping through again.

Why wasn't this prioritised after such a disastrous bug?

When will this even show up on the roadmap?

Geoff.

On Sat, 4 Jul 2009, Shehjar Tikoo wrote:
> Gordan Bobic wrote:
> > Just reading through the wiki on this and a few things are unclear,
> > so I'm hoping someone can clarify.
> >
> > 1) readahead
> >
> > - Is there any point in using this on systems where the interconnect
> > <= 1Gb/s? The wiki implies there is no point in this, but doesn't
> > quite state it explicitly.
>
> I am pretty sure it helps. Whether to use read-ahead depends more on
> the workload than on the interconnect; e.g. it will certainly be
> useful for sequential reads.
> Of course, there can be cases where excessive read-ahead chokes the
> 100 Mib/s link, but then read-ahead can be configured to reduce its
> utilization of the network by reducing the page-count option.
>
> > - Is there any point in using this on a server that is also its own
> > client when used with replicate/afr? I'm guessing there isn't since
> > the local fs will be doing its own read-ahead but I'd like some
> > confirmation on that.
>
> No. Generally, read-ahead is most beneficial on the client side,
> since it avoids a trip to the network when an application needs
> data that has already been read ahead. And yes, on the server
> side, the on-disk file system's own read-ahead already does it best.
>
> In your setup above, in case the system has more than a few CPUs/cores,
> it might be possible to get a little better performance while using
> io-threads on the client. That'll make it possible to offload the
> read-ahead to an io-thread without blocking the main glusterfs thread.
> Then, the benefit of read-ahead + io-threads might show up when the data
> is actually needed, and could be served without a kernel entry/exit for
> a file system call.
>
> > 2) io-threads
> >
> > Is this (usefully) applicable on the client side?
>
> It is. Using io-threads on the client side helps offload the processing
> of individual file operations onto a separate thread, freeing up
> the main thread to perform other tasks. This is especially applicable
> when using io-threads under the write-behind and/or read-ahead translators
> where the write-behind and read-ahead requests, i.e. background or
> asynchronous requests essentially, can be offloaded to the threads while
> freeing up the main glusterfs thread to handle sync requests, i.e.
> requests that could make the application block on a syscall.
>
> Also, using io-threads on the client side could help in performing network
> IO in a separate thread, again freeing up the main thread for other
> in-band tasks.
>
> Then again, if the workload is not concurrent in terms of number of
> processes or number of files/dirs, then io-threads might not help much.
>
> > 3) io-cache
> >
> > The wiki page has the same paragraph pasted for both io-threads and
> > io-cache. Are they the same thing, or is this a documentation bug?
>
> No, they're not the same. The documentation is still in flux. Hope
> this version will help:
> http://www.gluster.org/docs/index.php/Translators_options
>
> > What does io-cache do?
>
> io-cache is a translator that caches data from files so that future
> references do not lead to network requests. It is generally used along
> with read-ahead so that the data that gets read ahead or any data that
> gets read, for that matter, will be available from the local client
> cache. We're also working on incorporating support for write buffering
> in io-cache so that write operations can also benefit from local
> buffering until a point in time suitable for actual transmission to the
> server.
>
> > Finally - which translators are deemed stable (no known issues -
> > memory leaks/bloat, crashes, corruption, etc.)?
>
> We can definitely vouch for a higher degree of stability of the
> releases. Otherwise, I don't think there is any performance translator we
> can call completely stable/mature because of the roadmap we have for
> constantly upgrading algorithms, functionality, etc.
>
> > Any particular suggestions on which performance translator
> > combination would be good to apply for a shared root AFR over a WAN?
> > I already have read-subvolume set to the local mirror, but any
> > improvement is welcome when latencies soar to 100ms and b/w gets
> > hammered down to 1-2.5 Mb/s.
>
> WANs are generally characterised as having a large bandwidth-delay
> product. That basically means, for good throughput, we should be
> pipelining as much data as possible over the link, so that the long
> latency overhead can be mitigated or amortised by sending a larger amount
> of data for the same fixed overhead.
>
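
As a rough illustration of the bandwidth-delay product described above, here
is a minimal Python sketch. It assumes that the "100ms" figure is a round-trip
time and that "Mb/s" means megabits per second; neither is stated in the
thread. The function names and the 4096-byte request size are hypothetical and
are not GlusterFS values.

```python
def bdp_bytes(bandwidth_bits_per_s: int, rtt_ms: int) -> float:
    """Bytes that must be in flight to keep the link full (bandwidth x delay)."""
    # bits/s * ms = millibits; divide by 1000 for bits, then by 8 for bytes.
    return bandwidth_bits_per_s * rtt_ms / 8000


def stop_and_wait_bits_per_s(request_bytes: int, rtt_ms: int) -> float:
    """Throughput when only one request is outstanding per round trip."""
    return request_bytes * 8 * 1000 / rtt_ms


RTT_MS = 100  # assumption: the quoted 100ms is a round-trip time

# The two ends of the 1-2.5 Mb/s range quoted in the thread.
for bandwidth in (1_000_000, 2_500_000):
    in_flight = bdp_bytes(bandwidth, RTT_MS)
    print(f"{bandwidth / 1e6:.1f} Mb/s link: {in_flight:.0f} bytes in flight")

REQUEST_BYTES = 4096  # hypothetical request size, for illustration only
rate = stop_and_wait_bits_per_s(REQUEST_BYTES, RTT_MS)
print(f"one {REQUEST_BYTES}-byte request per round trip: {rate / 1e6:.2f} Mb/s")
```

Under these assumptions the link only stays full with roughly 12.5 kB to 31 kB
outstanding at any time, while a single 4096-byte request per round trip
reaches only about 0.33 Mb/s. That gap is the reason for pipelining more data
per round trip.
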
> That said, what particular workload is it that gives you a throughput of
> 1-2.5 Mb/s?
>
> When you say "latencies soar to 100ms", does that mean these are just
> unusual spikes, or is that the normal latency observed?
>
> It'd help to see your volfiles and how the performance translators are
> arranged.
>
> > Another thing - when a node works standalone in AFR, performance is
> > pretty good, but as soon as a peer node joins, even though the
> > original node is the primary, performance degrades on the primary
> > node quite significantly, even though the interconnect is direct
> > gigabit, which shouldn't be adding any particular latency (< 0.1ms)
> > or overheads, especially on the primary node. Is there any particular
> > reason for this degradation? It's OK in normal usage, but some
> > operations (e.g. building a big bootstrapping initrd, 50MB
> > compressed, including all the kernel drivers) take nearly 10x longer
> > when the peers join than when the node is standalone. I expected
> > some degradation, but only on the order of added network latency, and
> > this is way, way more. I tried with and without direct-io=off, and
> > that didn't make a great amount of difference. Which performance
> > translators are likely to help with this use case?
>
> I think Vikas will be able to answer that better.
>
> -Shehjar
>
> > Gordan
> >
> >
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel@xxxxxxxxxx
> > http://lists.nongnu.org/mailman/listinfo/gluster-devel