Hi Luke,

Good to hear that your university is looking into GlusterFS. A few tips
inline.

On Thu, May 15, 2008 at 2:25 PM, Luke McGregor <luke@xxxxxxxxxxxxxxx> wrote:
> Hi
>
> Im Luke McGregor and im working on a project at the university of
> waikato computer science department to make some improvements to
> GLusterFS to improve performance for our specific application.

Understanding the I/O pattern of the application will generally help you
tune the filesystem for very good performance; that is worth looking into
first.

> We are
> implementing a fairly small cluster (90 machines currently) to use for
> large scale computing projects. This machine is being built using
> comodity hardware and backended into a gigabit ethernet backbone with
> 10G uplinks between switches. Each node in the cluster will be
> responsible for both storage and workload processing. This is to be
> achieved with single sata disks in the machines.
>

You may run a single process as both server and client to save the overhead
of context switching between the two (a rough spec-file sketch is appended
at the end of this mail).

> We are currently experimenting with running GLuster over the nodes in
> the cluster to produce a single large filesystem. For my Honors
> research project ive been asked to look into making some improvements
> to GLuster to try to improve performance by moving the files within
> the GLusterFS closer to the node which is accessing the file.
>

You may look at the NUFA scheduler, which creates new files on the local
brick (second sketch at the end of this mail). We are thinking of a way to
reduce the spec-file management overhead for NUFA; that may come soon.

> What i was wondering is basically how hard would it be to write code
> to modify the metadata so that when a file is accessed it is then
> moved to the node which it is accessed from and its location is
> updated in the metadata.
>

There is no metadata stored about the location of a file. But I am not sure
why you want to keep moving files :O If a file is moved to another node when
it is accessed, what guarantees that it is not being accessed by two nodes
at the same time? (That would mean two copies, and it may lead to I/O errors
from GlusterFS.) You will also have a lot of overhead in doing the moves.
You may think of using io-cache (third sketch at the end of this mail), or
of implementing HSM.

-Amar

--
Amar Tumballi
Gluster/GlusterFS Hacker
[bulde on #gluster/irc.gnu.org]
http://www.zresearch.com - Commoditizing Super Storage!
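
Sketch 1: a single process acting as both server and client. This is only a
rough illustration assuming GlusterFS 1.3.x spec-file syntax; the directory,
hostnames and volume names are placeholders, and the auth option name may
differ in your release, so check it against the version you actually run.

volume brick
  type storage/posix
  option directory /data/export          # placeholder: local backend directory
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  option auth.ip.brick.allow *           # 1.3.x-style option; restrict this in production
  subvolumes brick
end-volume

volume node2
  type protocol/client
  option transport-type tcp/client
  option remote-host node2.cluster       # placeholder hostname of another node
  option remote-subvolume brick
end-volume

Loading one such file (with a cluster translator like the one in the next
sketch on top) means the same process exports the local brick and provides
the client side, so there is no extra hop through a separate glusterfsd.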
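
Sketch 2: unify with the NUFA scheduler, reusing the volume names from the
first sketch. Again only a sketch; the namespace host and the exact nufa
option names are assumptions, so verify them against the scheduler
documentation for your release.

volume ns
  type protocol/client
  option transport-type tcp/client
  option remote-host ns-node.cluster     # placeholder: node holding the namespace brick
  option remote-subvolume ns-brick
end-volume

volume unify0
  type cluster/unify
  option namespace ns
  option scheduler nufa
  option nufa.local-volume-name brick    # the brick that lives on this node
  option nufa.limits.min-free-disk 10%   # example: use other bricks when the local disk is nearly full
  subvolumes brick node2                 # local brick plus one protocol/client volume per remote node
end-volume

With NUFA, files a node creates land on that node's own disk, which is close
to the locality you are after without moving anything afterwards.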
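
Sketch 3: io-cache on top of the cluster view, so data a node reads
repeatedly is cached in its own memory instead of the file being physically
relocated. The cache and page sizes below are arbitrary example values.

volume iocache
  type performance/io-cache
  option cache-size 64MB                 # example value; size it to the RAM you can spare
  option page-size 128KB                 # example value; granularity of cached pages
  subvolumes unify0
end-volume

Mount the top of the stack with something like
'glusterfs -f /etc/glusterfs/combined.vol /mnt/glusterfs' and repeated reads
on a node will then be served from its local cache.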