Re: Improving real world performance by moving files closer to their target workloads

Martin Fick <mogulguy@xxxxxxxxx> · Mon, 19 May 2008 15:56:43 -0700 (PDT)

--- Luke McGregor <luke@xxxxxxxxxxxxxxx> wrote:
> This could cause some serious problems, especially
> on a heavily accessed file. The problem would, I
> believe, be worsened, as the nodes which are hosting
> any heavily accessed file are the most likely to
> not respond quickly to any kind of multicast.

I think that this problem is exaggerated.
Although it is possible that a heavily accessed node
has a file on it that is needed, and that this node may
end up being the slowest node to respond, this situation
is no worse than if you did not migrate/replicate
files, is it?

Let's look at the problem this way.  Current unify
(scenario 1):

  Node A      Node B     Node C
  file A

If file A is on Node A and it is being heavily
accessed by Node B, Node A will be heavily accessed
also, right?  This means that when Node C requests file
A it will still be contending with Node B, so Node A
may be slow to respond.

Fast forward to a migration scenario (2).  Node B is
heavily accessing file A, so it gets duplicated to Node
B.

  Node A      Node B     Node C
  file A      file A

When Node C comes along and requests file A, it may
have to wait longer for Node B to respond to a metadata
query, but this should probably be shorter than a
whole file read from Node A in scenario 1, shouldn't
it?  Overall you have still potentially drastically
improved both the system latency and throughput in
scenario 2 over scenario 1.  I really don't think that
the fact that Node B is heavily accessed will make
your solution slower; it just becomes another potential
contention point after you have already scaled up
quite a bit with migration/caching.
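
To put rough numbers on the comparison above, here is a toy
model in Python.  Everything in it is an assumption made for
illustration: the function names, the load-inflation formula
and the millisecond figures are hypothetical and are not taken
from unify or from any measurement.  It only encodes the
argument that in scenario 2 Node C pays for one slow metadata
reply from the busy node but can then read from the idle copy.

```python
def service_time(base_ms, load):
    """Time for a node to serve a request, inflated by its load (0 <= load < 1).

    Hypothetical model: a node at load 0.9 is ten times slower than an idle one.
    """
    return base_ms / (1.0 - load)


def scenario1_latency(read_ms, meta_ms, load_a):
    """File A lives only on Node A, which Node B is hammering.

    Node C's metadata query and its whole-file read both hit the loaded node.
    """
    return service_time(meta_ms, load_a) + service_time(read_ms, load_a)


def scenario2_latency(read_ms, meta_ms, load_a, load_b):
    """File A lives on Node A and Node B.

    The lookup waits for the slowest metadata reply, then Node C reads
    from whichever holder is less loaded.
    """
    lookup = max(service_time(meta_ms, load_a), service_time(meta_ms, load_b))
    read = min(service_time(read_ms, load_a), service_time(read_ms, load_b))
    return lookup + read


# Made-up figures: 100 ms whole-file read, 1 ms metadata reply when idle.
before = scenario1_latency(read_ms=100.0, meta_ms=1.0, load_a=0.9)
# After migration Node B reads locally, so Node A is mostly idle and B is busy.
after = scenario2_latency(read_ms=100.0, meta_ms=1.0, load_a=0.1, load_b=0.9)
print(f"scenario 1: {before:.0f} ms, scenario 2: {after:.0f} ms")
```

With these invented numbers the slow metadata reply from Node B
is a small cost next to the contended whole-file read it
replaces.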

For this to become your blocking point, Node B must
also be heavily loaded and not be accessing file A!!
If Node B were accessing file A, it would still be a
drag on the accessibility of file A to Node C, so adding
a quorum solution may not even help in this case.
However, if Node B becomes heavily loaded and is no
longer accessing file A, then again your migration
solution will kick in and file A should migrate to
where it is actually being accessed, potentially Node C!

Perhaps the lesson here is that having the file on
fewer nodes, and only on the nodes that are actually
accessing it, is potentially better for latency!!  If
latency becomes an issue, then perhaps a heavy bias
towards migration should be considered?  And perhaps
even a heavier bias towards flushing files from loaded
nodes if that file is not being accessed on the loaded
node!
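
As a rough sketch of what such a bias could look like, the
Python below plans moves for a single file.  The function
name, its inputs and both thresholds (hot_rate, busy_load) are
hypothetical, chosen only to illustrate the idea; none of it is
existing unify behaviour.

```python
def plan_moves(holders, access_rate, node_load, hot_rate=10.0, busy_load=0.8):
    """Return (replicate_to, flush_from) for one file.

    holders:     set of nodes that currently hold a copy
    access_rate: dict node -> accesses/sec to this file from that node
    node_load:   dict node -> overall load of that node, 0..1
    hot_rate, busy_load: hypothetical tuning thresholds
    """
    # Bias towards migration: copy the file to nodes that use it heavily.
    replicate_to = {
        n for n, rate in access_rate.items()
        if rate >= hot_rate and n not in holders
    }
    # Heavier bias towards flushing: drop copies from loaded nodes
    # that are not accessing the file themselves.
    flush_from = {
        n for n in holders
        if node_load.get(n, 0.0) >= busy_load and access_rate.get(n, 0.0) == 0.0
    }
    # Never flush the last remaining copy: keep the least loaded holder.
    if flush_from and not ((holders | replicate_to) - flush_from):
        flush_from.discard(min(flush_from, key=lambda n: node_load.get(n, 0.0)))
    return replicate_to, flush_from


# Node B is busy with other work and no longer touches file A,
# while Node C is reading it heavily.
to_copy, to_flush = plan_moves(
    holders={"A", "B"},
    access_rate={"C": 50.0},
    node_load={"A": 0.2, "B": 0.95, "C": 0.3},
)
print("replicate to:", sorted(to_copy), "flush from:", sorted(to_flush))
```

Here the copy on the loaded Node B is dropped and a new copy
lands on Node C, so a metadata query for file A no longer has
to wait on Node B.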

All in all, I think that enhancing your
migration/flushing heuristics may be a better way to
deal with this latency than any centralized metadata
solution.

Cheers,

-Martin