Re: gfs2 v. zfs?

Steven Whitehouse <swhiteho@xxxxxxxxxx> · Wed, 26 Jan 2011 10:00:17 +0000

Hi,

On Tue, 2011-01-25 at 15:27 -0800, Wendy Cheng wrote:
> On Tue, Jan 25, 2011 at 11:34 AM, yvette hirth <yvette@xxxxxxxxxxxx> wrote:
> > Rafa Grimán wrote:
> >
> >> Yes that is true. It's a bit blurry because some file systems have
> >> features others have so "classifying" them is quite difficult.
> >
> > i'm amazed at the conversation that has taken place by me simply asking a
> > question.
> >
> > *Thank You* all for all of this info!
> 
> We purposely diverted your question to backup, since it is easier to
> have productive discussions (compared to directory list) :) ... In
> general, any "walk" operation on GFS2 can become a pain for various
> reasons. Directory listing is certainly one of them. It is an age old
> problem. Other than the inherited issues from the horrible stat()
> system call, it is also to do with the way GFS(1/2) likes to
> "distribute" its blocks all over the device upon write contention. I
> don't see how GFS2 can alleviate this pain w/out doing some sorts of
> block reallocation.
> 
> I'll let other capable people have another round of fun discussions
> .... Maybe some creative ideas can get popped out as the result ...
> 
> -- Wendy
> 
Although the block allocation can be an issue, the larger issue is that
of caching and when/whether the cache is flushed due to a write
operation on another node. Combining a workload which scans all files
with a write workload which updates the same file set can cause dramatic
slowdowns when the two are run from different nodes.

The solution is to try and partition the workload in a way which makes
the best use of the cache and reduces the number of invalidations which
are done.
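
To make the effect concrete, here is a toy model of it. It is only a
sketch and not GFS2 code: the function name simulate(), the file ids and
the "one write invalidates one cached entry" rule are all assumptions
made up for illustration, and real cache and lock behaviour is far more
involved. Node A scans a set of files every round, node B then writes a
set of files, and we count how often node A has to refill its cache.

```python
def simulate(scan_files, write_files, rounds):
    """Toy model: return (misses, invalidations) seen by the scanning node.

    Node A scans every file in scan_files once per round. Node B then
    writes every file in write_files. A write on node B drops node A's
    cached copy of that file (hypothetical rule, for illustration only).
    """
    cache_a = set()          # files node A currently has cached
    misses = 0               # scans that had to go back to disk
    invalidations = 0        # cached entries dropped by node B's writes

    for _ in range(rounds):
        # Node A: scan workload (e.g. a backup or a recursive listing).
        for f in scan_files:
            if f not in cache_a:
                misses += 1
                cache_a.add(f)
        # Node B: write workload.
        for f in write_files:
            if f in cache_a:
                cache_a.discard(f)
                invalidations += 1

    return misses, invalidations


# Both nodes work on the same 100 files: every round starts cold.
shared = simulate(range(100), range(100), rounds=10)

# Workload partitioned so the writer touches a different file set.
partitioned = simulate(range(100), range(100, 200), rounds=10)

print("shared:     ", shared)
print("partitioned:", partitioned)
```

In this model the shared case misses on every scan of every round, while
the partitioned case only pays for the first scan and sees no
invalidations at all.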

Steve.

> >
> > we've traced the response time slowdown to "number of subdirectories that
> > need to be listed when their parent directory is enumerated".
> >
> > btw, my usage of "enumeration" means, "list contents".  sorry for any
> > confusion.
> >
> > we've noticed that if we do:
> >
> > ls -lhad /foo
> > ls -lhad /foo/stuff
> > ls -lhad /foo/stuff/moreStuff
> >
> > response time is good, because , but
> >
> > ls -lhad /foo/stuff/moreStuff/*
> >
> > is where response time increases by an order of magnitude, because
> > moreStuff has ~260 directories.  enumerating moreStuff and other
> > "directories with many subdirectories" appears to be the culprit.
> >
> > for now, we'll be moving directories around, trying to reduce the number of
> > nested levels, and number of elements in each level.
> >
> > in human interaction there is a rule:  as the number of people interacting
> > increases linearly, the number of interactions between the people increases
> > exponentially.  is it true that as the number of nodes, "n", increases
> > linearly, the amount of metadata being passed around / inspected during disk
> > access increases geometrically?  does this "rule" apply?  or does metadata
> > processing increase linearly as well, because the querying is all done by
> > one node?
> >
> > thanks again - what a group!
> > yvette
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster@xxxxxxxxxx
> > https://www.redhat.com/mailman/listinfo/linux-cluster