Hi Luke,

Are you going to present this storage cluster to an outside network over the 10GigE uplink, or is the storage purely for local computation on the same nodes? If you are looking to build an integrated computing/storage system, then you should look at the NUFA scheduler. HPC jobs generate a lot of scratch data, and local disks are always faster than remote disks. The NUFA scheduler is aware of which disks are local and which are remote when it places files (see the example volume spec below).

NUFA decides disk affinity only at file-creation time. That is OK for scratch data, but for permanent data the I/O profile may change over time. For example, if node72 frequently reads a file stored on node21, then it makes sense to move that file to node72.

There are a lot of ways to optimize GlusterFS once we know the application's requirements. Here are a few ideas to explore (rough sketches of 1 and 3 follow below):

1) disk-io-cache: implement a new disk-based caching translator modeled on the current memory-based io-cache translator (or extend io-cache to support disks).

2) HSM (hierarchical storage management): frequently accessed files are pre-fetched into a faster/local cache volume of limited capacity.

3) glusterfs-defrag utility: optimize the volume by moving files around based on the I/O stat logs. It could do a number of useful things, such as leveling the volumes based on free disk space, read usage, write usage, file sizes, and so on.

On the legal side: your university's legal department should give a written statement agreeing to release the code under GPLv3 or later and the documentation under GNU FDL v1.2 or later. Your university will retain copyright ownership. If you need the Gluster team to defend your work legally, you can assign the copyright to Z RESEARCH instead. If you are not going to redistribute the code and will use it only internally, then there is no legal issue.
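To make the NUFA suggestion concrete, here is roughly what a client-side volume spec could look like. This is a sketch against the 1.3-style unify/NUFA options as I remember them, and the volume and host names (brick-local, node21, and so on) are placeholders for your own layout, so check the docs for the exact option names in your release:

# local brick: the SATA disk on this node
volume brick-local
  type storage/posix
  option directory /data/export
end-volume

# a brick exported by another node (define one of these per remote node)
volume brick-node21
  type protocol/client
  option transport-type tcp/client
  option remote-host node21
  option remote-subvolume brick
end-volume

# unify requires a separate namespace volume
volume ns
  type protocol/client
  option transport-type tcp/client
  option remote-host node1
  option remote-subvolume brick-ns
end-volume

# NUFA: create new files on the local brick whenever possible
volume unify0
  type cluster/unify
  option scheduler nufa
  option nufa.local-volume-name brick-local
  option namespace ns
  subvolumes brick-local brick-node21
end-volume

With a spec like this, every node creates its scratch files on its own disk while still seeing one unified filesystem.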
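For idea 1, the interesting part is the caching policy rather than the translator boilerplate. Below is a small illustrative sketch in Python of a disk-backed LRU page cache. A real translator would of course be C inside GlusterFS; the spool directory, capacity and fetch_remote callback here are my own assumptions, not GlusterFS APIs:

import os
import collections

class DiskIOCache:
    """LRU cache of file pages, spooled to a fast local disk."""

    def __init__(self, spool_dir, capacity_bytes):
        self.spool_dir = spool_dir
        self.capacity = capacity_bytes
        self.used = 0
        # (path, page_no) -> spool file path, in LRU order (oldest first)
        self.pages = collections.OrderedDict()
        os.makedirs(spool_dir, exist_ok=True)

    def _spool_path(self, path, page_no):
        # flatten the file path into a spool file name (sketch only)
        return os.path.join(self.spool_dir,
                            "%s.%d" % (path.replace("/", "_"), page_no))

    def read(self, path, page_no, fetch_remote):
        key = (path, page_no)
        if key in self.pages:
            # cache hit on local disk: refresh LRU position
            self.pages.move_to_end(key)
            with open(self.pages[key], "rb") as f:
                return f.read()
        # cache miss: fetch the page over the network
        data = fetch_remote(path, page_no)
        # evict least-recently-used pages until the new page fits
        while self.pages and self.used + len(data) > self.capacity:
            _, victim = self.pages.popitem(last=False)
            self.used -= os.path.getsize(victim)
            os.unlink(victim)
        spool = self._spool_path(path, page_no)
        with open(spool, "wb") as f:
            f.write(data)
        self.pages[key] = spool
        self.used += len(data)
        return data

An HSM translator (idea 2) would be similar in spirit, but would pre-fetch whole files into a local cache volume instead of caching pages on demand.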
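For idea 3, a first cut of glusterfs-defrag could be as simple as: move each file to the node that reads it the most, unless that would drop the receiving brick below a free-space floor. Again a hypothetical Python sketch; the stats format and the min_free_bytes threshold are made up for illustration, and the mechanics of the move itself would need support from the unify/namespace translators:

def plan_migrations(file_stats, file_home, file_size, free_space,
                    min_free_bytes=10 * 2**30):
    """Propose file moves toward their hottest reader.

    file_stats: {path: {node: read_count}}  (from the I/O stat logs)
    file_home:  {path: node currently holding the file}
    file_size:  {path: size in bytes}
    free_space: {node: free bytes on its brick}
    Returns a list of (path, src_node, dst_node) moves.
    """
    moves = []
    for path, reads in file_stats.items():
        # node that reads this file the most
        hottest = max(reads, key=reads.get)
        src = file_home[path]
        if hottest == src:
            continue  # already local to its main reader
        size = file_size[path]
        if free_space[hottest] - size < min_free_bytes:
            continue  # receiving brick would get too full
        moves.append((path, src, hottest))
        free_space[hottest] -= size
        free_space[src] += size
    return moves

# tiny demo: node72 reads a.dat far more often than its home node21 does
if __name__ == "__main__":
    print(plan_migrations(
        {"/scratch/a.dat": {"node21": 3, "node72": 40}},
        {"/scratch/a.dat": "node21"},
        {"/scratch/a.dat": 2**30},
        {"node21": 50 * 2**30, "node72": 200 * 2**30}))

Run periodically against the stat logs, something like this would give you most of the file-follows-reader behavior you describe below.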
Hope this helps.

--
Anand Babu Periasamy
GPG Key ID: 0x62E15A31
Blog [http://ab.freeshell.org]
The GNU Operating System [http://www.gnu.org]
Z RESEARCH Inc [http://www.zresearch.com]

Luke McGregor wrote:

Hi, I'm Luke McGregor, and I'm working on a project at the University of Waikato Computer Science Department to make some improvements to GlusterFS to improve performance for our specific application. We are implementing a fairly small cluster (currently 90 machines) to use for large-scale computing projects. The machine is being built from commodity hardware connected to a gigabit Ethernet backbone with 10G uplinks between switches. Each node in the cluster will be responsible for both storage and workload processing, using a single SATA disk per machine. We are currently experimenting with running GlusterFS over the nodes in the cluster to produce a single large filesystem.

For my Honours research project I have been asked to look into making some improvements to GlusterFS, to try to improve performance by moving files within the GlusterFS volume closer to the node that accesses them. What I was wondering is basically: how hard would it be to write code to modify the metadata so that when a file is accessed it is moved to the node it was accessed from, with its location updated in the metadata? Any help or advice on where to start would be much appreciated.

Thanks,
Luke McGregor