Re: Improving real world performance by moving files closer to their target workloads

Martin Fick <mogulguy@xxxxxxxxx> · Tue, 20 May 2008 10:11:06 -0700 (PDT)

--- Gordan Bobic <gordan@xxxxxxxxxx> wrote:

> > My point was that, as I understood your algorithm,
> > a client would not know which nodes contained a 
> > certain file until all nodes had been 
> > contacted.  So, while the actual bandwidth, even
> > to consult thousands of nodes, might be small 
> > relative to file transfer bandwidth, the client 
> > can't assume it has a complete answer until it
> > gets all the replies, meaning requests to downed 
> > nodes have timed out.
> 
> I agree that waiting for all nodes could be an issue
> in case of downed nodes, and I concur that quorum 
> would be a good work-around.

I think it might help to stay focused in these
types of discussions.  Luke was concerned with
increasing performance, not reliability.  Yes, the
idea of an AFR-like unify translator was brought up,
and it would be neat to have both HA and
performance.  However, for now it might be more
helpful to treat these as two separate issues.

Having said that, downed nodes already affect unify
today (in the non-migration scenario): a single
downed node will effectively shut down the whole
cluster.  The current way to "fix" this is to use
AFR underneath unify.  So,
if a unify translator were modified to be able to
migrate files for performance, we are no worse off
than we currently are with the unify translator if one
of those nodes goes down.  

So why try to solve this issue here?  What you are
really talking about is solving AFR's issues, not
issues with the migration solution.  I agree that
AFR could use some enhancements to deal with split brain, but
that seems out of the scope of a migration type
solution aimed at improving performance.  I suggest
that Luke should pursue increasing performance with
migration and making that work well without adding
additional constraints to his problem.

The simplest migration solution is to not tolerate
downed nodes!  If you do not tolerate them, you do not
have locking or split-brain issues to resolve.
Simply migrate a file to where it is needed and never
leave a copy behind where it can get out of sync.  If
you want HA, install AFR under each subvolume.  If you
want to solve split-brain issues with AFR (I hope we
can), start another thread. :)  Once AFR's split-brain
issues are resolved in glusterfs, merging AFR with
Luke's potential migrating unify translator should be
a much easier and better-defined task!
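To make the "move, never copy" idea a bit more
concrete, here is a minimal Python sketch.  It is a
toy model, not glusterfs code: the names Subvolume,
MigratingUnify and NodeDownError are hypothetical, and
it assumes that each file lives on exactly one
subvolume (as with unify) and that any lookup or
migration simply refuses to run while a subvolume is
down.

```python
from dataclasses import dataclass, field


class NodeDownError(Exception):
    """Hypothetical error: some subvolume is unreachable."""


@dataclass
class Subvolume:
    name: str
    up: bool = True
    files: dict = field(default_factory=dict)  # path -> file contents


class MigratingUnify:
    """Toy model: every path lives on exactly one subvolume."""

    def __init__(self, subvolumes):
        self.subvolumes = {s.name: s for s in subvolumes}

    def _require_all_up(self):
        # Do not tolerate downed nodes: refuse to act rather than guess.
        down = [n for n, s in self.subvolumes.items() if not s.up]
        if down:
            raise NodeDownError(f"subvolumes down: {down}")

    def locate(self, path):
        self._require_all_up()
        holders = [s for s in self.subvolumes.values() if path in s.files]
        # Move-only migration keeps the invariant of at most one copy.
        assert len(holders) <= 1, "invariant violated: multiple copies"
        return holders[0] if holders else None

    def migrate(self, path, target_name):
        src = self.locate(path)
        if src is None:
            raise FileNotFoundError(path)
        dst = self.subvolumes[target_name]
        if src is dst:
            return  # already where it is needed
        dst.files[path] = src.files[path]  # place the file on its new home
        del src.files[path]  # remove the old copy so nothing can go stale
```

Because no copy is ever left behind, there is nothing
to keep in sync and no split brain to resolve; the
price is that a single downed subvolume blocks
migration, which is no worse than unify behaves today.
The copy-then-delete step is not atomic here; a real
translator would need to handle a failure in between.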

Cheers,

-Martin