Re: Improving real world performance by moving files closer to their target workloads

gordan@xxxxxxxxxx · Fri, 16 May 2008 10:52:38 +0100 (BST)

On Fri, 16 May 2008, Luke McGregor wrote:

I think I'm getting a little closer to understanding this. So the
metadata (file attributes, AFR settings, etc.) is stored as a header on
the actual file, and the central namespace cache stores information to
do with file lookup, the node it's stored on, etc. Does this sound correct?

The look-up of file location is done by the hash. The namespace only 
serves to present a unified view of all the individual merged stores.
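
As a rough illustration of what "look-up by hash" means, here is a minimal
sketch. It is not the actual hashing scheme in use; the CRC32 hash, the node
names and the pick_node() helper are all assumptions made up for the example.
The point is only that the location is computed from the path, so no lookup
table has to be kept anywhere.

```python
import zlib

# Hypothetical node names, for illustration only.
NODES = ["node-a", "node-b", "node-c"]


def pick_node(path, nodes):
    """Map a file path to one node by hashing the path (illustrative only)."""
    digest = zlib.crc32(path.encode("utf-8"))  # unsigned 32-bit integer in Python 3
    return nodes[digest % len(nodes)]


if __name__ == "__main__":
    # Every peer that runs this gets the same answer for the same path.
    print(pick_node("/home/luke/data.bin", NODES))
```

Note that a plain modulo like this remaps most files whenever the node list
changes, so treat it as a picture of the idea and nothing more.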

If this is the case, is it possible to update this namespace info as
the file is accessed, or will that be difficult as they are currently
considered static? I can see this as a potential issue where the local
and central caches may have consistency issues.

There are no central caches. The nodes are all equal peers. You would have 
to keep them all in sync. At that rate, you might as well do a broadcast 
(or multicast) to establish who has a file when it's not available 
locally. I'm also not sure that this would be a big problem - the 
broadcast and the corresponding responses would only need to be done when:
1) A file is being opened and it isn't available locally (see if another 
node has it).
2) A file is being deleted due to local store filling up (see if the file 
is sufficiently redundant in the network to allow us to delete it from the 
local store).
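
The two cases above can be sketched as a small in-memory simulation. Nothing
here touches a real network: the "broadcast" is just a loop over the other
peers, and the Peer class, make_cluster() and the min_remote_copies threshold
are hypothetical names chosen for the example, not anything that exists in
the code base.

```python
class Peer:
    """One node; all nodes are equal peers, there is no central cache."""

    def __init__(self, name):
        self.name = name
        self.store = set()  # paths held in this node's local store
        self.peers = []     # every other node in the cluster

    def has(self, path):
        return path in self.store

    def who_has(self, path):
        """Stand-in for a broadcast: ask every other peer, collect who says yes."""
        return [p.name for p in self.peers if p.has(path)]

    def open(self, path):
        """Case 1: only ask the network when the file is not available locally."""
        if self.has(path):
            return self.name
        holders = self.who_has(path)
        return holders[0] if holders else None

    def can_evict(self, path, min_remote_copies=2):
        """Case 2: allow a local delete only if enough other nodes hold a copy."""
        return len(self.who_has(path)) >= min_remote_copies


def make_cluster(names):
    """Build peers that all know about each other."""
    peers = [Peer(n) for n in names]
    for p in peers:
        p.peers = [q for q in peers if q is not p]
    return peers


if __name__ == "__main__":
    a, b, c = make_cluster(["a", "b", "c"])
    b.store.add("/f")
    print(a.open("/f"))       # served by whichever peer answered
    print(a.can_evict("/f"))  # only one remote copy, so keep the local one
```

A real version would need timeouts and would have to decide what to do when
a peer does not answer, which this sketch ignores entirely.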

Would this problem be
lessened any by distributing the namespace cache? (I'm assuming a DHT-type
solution.) Would this just mean that consistency problems would
occur in the instance of a node failure?

I'm not sure that there is a namespace "cache" per se. I think the file 
open call is just routed according to the hash.

Gordan