Hi Luke,

Are you going to present this storage cluster to an outside network over the 10GigE uplink, or is the storage purely for local computation on the same nodes? If you are looking to build an integrated computing/storage system, then you should look at the NUFA scheduler. HPC jobs generate a lot of scratch data, and local disks are always faster than remote disks. The NUFA scheduler is aware of which disks are local and which are remote when it places files (see the example volume spec below).

NUFA decides disk affinity only at file-creation time. That is OK for scratch data, but for permanent data the I/O profile may change over time. For example, if node72 frequently reads a file stored on node21, then it makes sense to move that file to node72.

There are a lot of ways to optimize GlusterFS once we know the application's requirements. Here are a few ideas to explore (rough sketches of 1 and 3 follow below):

1) disk-io-cache: implement a new disk-based caching translator modeled on the current memory-based io-cache translator (or extend io-cache to support disks).

2) HSM (hierarchical storage management): frequently accessed files are pre-fetched into a faster/local cache volume of limited capacity.

3) glusterfs-defrag utility: optimize the volume by moving files around based on the I/O stat logs. It could do a number of useful things, such as leveling the volumes based on free disk space, read usage, write usage, file sizes, and so on.

On the legal side: your university's legal department should give a written statement agreeing to release the code under GPLv3 or later and the documentation under GNU FDL v1.2 or later. Your university will retain copyright ownership. If you need the Gluster team to defend your work legally, you can assign the copyright to Z RESEARCH instead. If you are not going to redistribute the code and will use it only internally, then there is no legal issue.
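To make the NUFA suggestion concrete, here is roughly what a client-side volume spec could look like. This is a sketch against the 1.3-style unify/NUFA options as I remember them, and the volume and host names (brick-local, node21, and so on) are placeholders for your own layout, so check the docs for the exact option names in your release:

# local brick: the SATA disk on this node
volume brick-local
  type storage/posix
  option directory /data/export
end-volume

# a brick exported by another node (define one of these per remote node)
volume brick-node21
  type protocol/client
  option transport-type tcp/client
  option remote-host node21
  option remote-subvolume brick
end-volume

# unify requires a separate namespace volume
volume ns
  type protocol/client
  option transport-type tcp/client
  option remote-host node1
  option remote-subvolume brick-ns
end-volume

# NUFA: create new files on the local brick whenever possible
volume unify0
  type cluster/unify
  option scheduler nufa
  option nufa.local-volume-name brick-local
  option namespace ns
  subvolumes brick-local brick-node21
end-volume

With a spec like this, every node creates its scratch files on its own disk while still seeing one unified filesystem.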
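For idea 1, the interesting part is the caching policy rather than the translator boilerplate. Below is a small illustrative sketch in Python of a disk-backed LRU page cache. A real translator would of course be C inside GlusterFS; the spool directory, capacity and fetch_remote callback here are my own assumptions, not GlusterFS APIs:

import os
import collections

class DiskIOCache:
    """LRU cache of file pages, spooled to a fast local disk."""

    def __init__(self, spool_dir, capacity_bytes):
        self.spool_dir = spool_dir
        self.capacity = capacity_bytes
        self.used = 0
        # (path, page_no) -> spool file path, in LRU order (oldest first)
        self.pages = collections.OrderedDict()
        os.makedirs(spool_dir, exist_ok=True)

    def _spool_path(self, path, page_no):
        # flatten the file path into a spool file name (sketch only)
        return os.path.join(self.spool_dir,
                            "%s.%d" % (path.replace("/", "_"), page_no))

    def read(self, path, page_no, fetch_remote):
        key = (path, page_no)
        if key in self.pages:
            # cache hit on local disk: refresh LRU position
            self.pages.move_to_end(key)
            with open(self.pages[key], "rb") as f:
                return f.read()
        # cache miss: fetch the page over the network
        data = fetch_remote(path, page_no)
        # evict least-recently-used pages until the new page fits
        while self.pages and self.used + len(data) > self.capacity:
            _, victim = self.pages.popitem(last=False)
            self.used -= os.path.getsize(victim)
            os.unlink(victim)
        spool = self._spool_path(path, page_no)
        with open(spool, "wb") as f:
            f.write(data)
        self.pages[key] = spool
        self.used += len(data)
        return data

An HSM translator (idea 2) would be similar in spirit, but would pre-fetch whole files into a local cache volume instead of caching pages on demand.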
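For idea 3, a first cut of glusterfs-defrag could be as simple as: move each file to the node that reads it the most, unless that would drop the receiving brick below a free-space floor. Again a hypothetical Python sketch; the stats format and the min_free_bytes threshold are made up for illustration, and the mechanics of the move itself would need support from the unify/namespace translators:

def plan_migrations(file_stats, file_home, file_size, free_space,
                    min_free_bytes=10 * 2**30):
    """Propose file moves toward their hottest reader.

    file_stats: {path: {node: read_count}}  (from the I/O stat logs)
    file_home:  {path: node currently holding the file}
    file_size:  {path: size in bytes}
    free_space: {node: free bytes on its brick}
    Returns a list of (path, src_node, dst_node) moves.
    """
    moves = []
    for path, reads in file_stats.items():
        # node that reads this file the most
        hottest = max(reads, key=reads.get)
        src = file_home[path]
        if hottest == src:
            continue  # already local to its main reader
        size = file_size[path]
        if free_space[hottest] - size < min_free_bytes:
            continue  # receiving brick would get too full
        moves.append((path, src, hottest))
        free_space[hottest] -= size
        free_space[src] += size
    return moves

# tiny demo: node72 reads a.dat far more often than its home node21 does
if __name__ == "__main__":
    print(plan_migrations(
        {"/scratch/a.dat": {"node21": 3, "node72": 40}},
        {"/scratch/a.dat": "node21"},
        {"/scratch/a.dat": 2**30},
        {"node21": 50 * 2**30, "node72": 200 * 2**30}))

Run periodically against the stat logs, something like this would give you most of the file-follows-reader behavior you describe below.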
Hope this helps.

--
Anand Babu Periasamy
GPG Key ID: 0x62E15A31
Blog [http://ab.freeshell.org]
The GNU Operating System [http://www.gnu.org]
Z RESEARCH Inc [http://www.zresearch.com]

Luke McGregor wrote:

Hi, I'm Luke McGregor, and I'm working on a project at the University of Waikato Computer Science Department to make some improvements to GlusterFS to improve performance for our specific application. We are implementing a fairly small cluster (currently 90 machines) to use for large-scale computing projects. The machine is being built from commodity hardware connected to a gigabit Ethernet backbone with 10G uplinks between switches. Each node in the cluster will be responsible for both storage and workload processing, using a single SATA disk per machine. We are currently experimenting with running GlusterFS over the nodes in the cluster to produce a single large filesystem.

For my Honours research project I have been asked to look into making some improvements to GlusterFS, to try to improve performance by moving files within the GlusterFS volume closer to the node that accesses them. What I was wondering is basically: how hard would it be to write code to modify the metadata so that when a file is accessed it is moved to the node it was accessed from, with its location updated in the metadata? Any help or advice on where to start would be much appreciated.

Thanks,
Luke McGregor