Luke McGregor wrote:
We are currently experimenting with running GlusterFS over the nodes in the cluster to produce a single large filesystem. For my Honors research project I've been asked to look into making some improvements to GlusterFS, to try to improve performance by moving files within the GlusterFS volume closer to the node that is accessing them. What I was wondering is basically how hard it would be to write code to modify the metadata so that when a file is accessed, it is moved to the node it was accessed from and its location is updated in the metadata.
So, you want a unify/AFR hybrid translator that keeps track of which nodes use which files most often, and migrates each file to that node? Perhaps a probabilistic local caching approach would do well here. When a node accesses a file, there is a chance that it will replicate the file to local storage. If a node accesses a file repeatedly, the cumulative chance approaches unity. The problem is that you need some way of ensuring that files don't exist on more than XYZ nodes, and that when the local store fills up and you drop the least recently used file from a node, the file being dropped still exists somewhere else.
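To make that concrete, here is a minimal sketch in Python (not translator code) of the probabilistic caching and LRU eviction described above; the fetch_remote and replica_count hooks are hypothetical stand-ins for whatever remote read and cluster metadata lookup you would actually use:

```python
import random
from collections import OrderedDict

class ProbabilisticLocalCache:
    """Sketch: each access has a fixed chance of pulling the file onto
    local storage, so repeatedly-used files almost certainly end up local.
    Eviction is least-recently-used, but an entry is only dropped if at
    least one other node still holds a copy."""

    def __init__(self, capacity, cache_probability=0.2):
        self.capacity = capacity
        self.p = cache_probability
        self.local = OrderedDict()   # filename -> data, least recently used first

    def on_access(self, name, fetch_remote, replica_count):
        if name in self.local:
            self.local.move_to_end(name)       # refresh LRU position
            return self.local[name]

        data = fetch_remote(name)              # served by whichever node holds it
        if random.random() < self.p:           # probabilistic local replication
            self._make_room(replica_count)
            if len(self.local) < self.capacity:
                self.local[name] = data
        return data

    def _make_room(self, replica_count):
        # Evict least-recently-used entries, but never the last copy in the cluster.
        for victim in list(self.local):
            if len(self.local) < self.capacity:
                break
            if replica_count(victim) > 1:
                del self.local[victim]
```

With a per-access probability p, the chance that a file is cached locally after n accesses is 1 - (1 - p)^n, which is what pulls frequently-used files onto the nodes that use them.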
Interesting enough idea, but I'm not sure the book-keeping overhead would be outweighed by the speed benefits, especially on a fast network. You'd also not be able to route requests for a particular file easily, which might end up meaning a broadcast request to all nodes to establish who has the file available.
I suspect that designing an algorithm that does all this with sufficiently little overhead to keep you ahead in performance will be the most difficult part, not writing a GlusterFS plugin. You are almost looking at a variant of a probabilistically cached distributed hash table network, only without using hashes for routing (which makes it more difficult).
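For contrast, this is roughly what hash-based routing buys a conventional DHT: any node can compute a file's owner from the name alone, with no broadcast lookup. A small consistent-hashing sketch (nothing GlusterFS-specific, node names are just illustrative strings); once files migrate to wherever they are used most, this property is lost and you need a lookup or broadcast step instead:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map each file name to an owning node via a hash ring, so the owner
    is computable locally on any node without asking the rest of the cluster."""

    def __init__(self, nodes, vnodes=100):
        # Place several virtual points per node on the ring to spread load.
        self._ring = sorted(
            (self._hash(f"{node}:{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def owner(self, filename):
        # The first ring position clockwise from the file's hash owns the file.
        idx = bisect.bisect(self._keys, self._hash(filename)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["node1", "node2", "node3"])
print(ring.owner("/data/results.csv"))   # same answer on every node, no broadcast
```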
I'd _LOVE_ to see this done, though, it sounds like an awesome project. :) Gordan