Re: Improving real world performance by moving files closer to their target workloads

"Luke McGregor" <luke@xxxxxxxxxxxxxxx> · Fri, 16 May 2008 12:15:02 +1200

Thanks for your quick replies :)

In response to Amar:
> There is no metadata stored about the location of the file. But I
> am not sure why you want to keep moving the file :O If a file is
> moved to another node when it's accessed, what is the guarantee that
> it's not accessed by two nodes at a time (hence two copies, which may
> lead to I/O errors from GlusterFS)?
> Also, you will have a lot of overhead in doing that. You may think of
> using io-cache, or implementing HSM.
 - We would like to retain the previous copy as well, as long as there
is free space on the system, and have references to both copies if this
is possible. The idea is that over time the nodes which use a file
regularly would all have copies of it (obviously there is a
synchronisation problem here, but this could be worked around). When
space is needed, the least-read copy should be deleted (assuming that
it isn't the last copy). Does this make sense? I'm not sure I have
explained it very well.
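
A minimal sketch of the replica policy described above, in Python. It is
only an illustration of the idea, not anything GlusterFS provides: the
names Replica, ReplicaTable, record_read and evict_one are hypothetical,
the read counters are assumed to be available, and the synchronisation
problem between copies is ignored entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    node: str        # node holding this copy
    reads: int = 0   # how often this particular copy has been read


@dataclass
class ReplicaTable:
    # Hypothetical bookkeeping: path -> list of replicas.
    # This is NOT how GlusterFS stores file locations.
    replicas: dict = field(default_factory=dict)

    def record_read(self, path, node):
        """Count a read from `node`; create a copy there if it has none."""
        copies = self.replicas.setdefault(path, [])
        for replica in copies:
            if replica.node == node:
                replica.reads += 1
                return replica
        replica = Replica(node, reads=1)
        copies.append(replica)
        return replica

    def evict_one(self):
        """Delete the least-read copy, never touching a file's last copy."""
        candidates = [
            (replica.reads, path, replica.node, replica)
            for path, copies in self.replicas.items()
            if len(copies) > 1          # files with one copy are protected
            for replica in copies
        ]
        if not candidates:
            return None                 # nothing can be freed safely
        # Ties are broken by path and node name to keep the choice stable.
        _, path, node, victim = min(candidates, key=lambda c: c[:3])
        self.replicas[path].remove(victim)
        return path, node
```

Under these assumptions, a node that reads a file gains a copy of it, and
evict_one() would be called whenever space is needed; it returns None
once every remaining file is down to its last copy.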

In terms of NUFA scheduling, this sounds promising. What is currently
implemented in terms of this? How does this scheduler decide where to
store files when they are created? By the sounds of it, what my
supervisors are thinking of is a scheme similar to NUFA, but updated
on a live basis rather than only at creation time. Does this sound
correct? I think they have previously looked into caching, but would
like to try to optimise this by updating the metadata of the system
to relocate/copy the file at this stage.

In terms of metadata, how does the Gluster client currently know where
to find a specific file? How easy would it be to update the location of
the file as it appears to Gluster, and can I have references in this
metadata to multiple copies?

Thanks
Luke McGregor