Re: [Linux-cluster] GFS limits?

Brian Jackson wrote:


The code most people on this list are currently interested in is the code in CVS, which is for 2.6 only. 2.6 has a config option to enable devices larger than 2TB. I'm still reading through all the GFS code, but it's still architecturally the same as when it was closed source, so I'm pretty sure most of my knowledge from OpenGFS will still apply. GFS uses 64-bit values internally, so you can have very large filesystems (well beyond petabytes).


This is nice. I was specifically thinking of 64-bit machines, in which case I'd expect the limit to be 9EB or something.
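
For reference, here's the raw address-space arithmetic behind those numbers (ignoring whatever GFS's block size and on-disk format actually allow, so treat it as back-of-the-envelope only):

# Back-of-the-envelope math for the limits mentioned above; the real ceilings
# depend on GFS's block size and on-disk layout, not just the address width.

SECTOR = 512                                  # bytes per sector

# 32-bit sector counts are where the classic 2TB block-device limit comes from:
print(2**32 * SECTOR)                         # 2199023255552 bytes ~= 2.2 TB (2 TiB)

# Signed 64-bit byte offsets are where the "9EB or something" figure comes from:
print(2**63)                                  # 9223372036854775808 bytes ~= 9.2 EB (8 EiB)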



Our current (homegrown) solution will scale very well for quite some
time, but eventually we're going to get saturated with write requests to
individual head units.  Does GFS intelligently "spread the load" among
multiple storage entities for writing under high load?


No, each node that mounts has direct access to the storage. It writes
just like any other fs, when it can.


So, if I have a dozen separate arrays in a given cluster, it will write data linearly to array #1, then array #2, then array #3? If that's the case, GFS doesn't solve my biggest fear - write performance with a huge influx of data. I'd hoped it might somehow "stripe" the data across individual units so that we can aggregate the combined interface bandwidth to some extent.
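
To make concrete what I mean by striping (just a toy model of data layout, not a claim about how GFS actually allocates blocks):

# Toy layout comparison, not GFS's actual allocator.  With a linear layout a
# big write lands on one array at a time; striped, it is spread chunk-by-chunk
# across all of them, so it can use their combined interface bandwidth.

ARRAYS = 12                      # a dozen separate arrays
CHUNK = 64 * 1024                # stripe unit in bytes (arbitrary choice)
ARRAY_SIZE = 10 * 2**40          # pretend each array holds 10 TiB

def linear_target(offset):
    """Linear/concatenated layout: fill array 0, then array 1, and so on."""
    return offset // ARRAY_SIZE

def striped_target(offset):
    """Striped layout: consecutive chunks rotate across all the arrays."""
    return (offset // CHUNK) % ARRAYS

# Which arrays does a 1 MiB write starting at offset 0 touch?
offsets = range(0, 2**20, CHUNK)
print({linear_target(o) for o in offsets})   # {0}             -> one array absorbs it all
print({striped_target(o) for o in offsets})  # {0, 1, ..., 11} -> all twelve share the load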




Does it always
write to any available storage units, or are there thresholds where it
expands the pool of units it writes to?  (I'm not sure I'm making much
sense, but we'll see if any of you grok it :)


I think you may have a little misconception about just what GFS is.
You should check the WHATIS_OpenGFS doc at
http://opengfs.sourceforge.net/docs.php - it says OpenGFS, but for
the most part, the same stuff applies to GFS.


I've read it, along with quite a few other documents and whitepapers on GFS, several times over, but perhaps you're right - I must be missing something. More on this below...



I notice the pricing for GFS is $2200.  Is that per seat?  And if so,
what's a "seat"?  Each client?  Each server with storage participating
in the cluster?  Both?  Some other distinction?


Now I definitely know you have some misconception. GFS doesn't have
any concept of server and client. All nodes mount the fs directly
since they are all directly connected to the storage.


Hmm, yes, this is probably my sticking point. It was my understanding (or maybe just my hope?) that servers could participate as "storage units" in the cluster by exporting their block devices, alongside FC or iSCSI or whatever devices that aren't technically 'servers'.


In other words, I was thinking/hoping that the cluster consisted of block units aggregated into a filesystem, and that those units could be FC RAID devices, iSCSI solutions, and "dumb servers" that just export their local disks to the cluster FS.

Am I totally wrong? I guess it's GNBD I don't totally understand, so I'd better go read up on it.
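
For what it's worth, here's roughly the picture I have in my head - one box exporting a local disk over the network so other nodes can treat it like a block device. This is just a toy sketch with a made-up wire format, not GNBD's actual protocol or tools:

# Toy sketch of the "export a local disk over the network" idea (which is
# roughly what GNBD does for real).  The names and wire format here are
# invented for illustration; this is not the GNBD protocol or API.
import socket
import struct

REQ = struct.Struct("!cQI")       # op ('R' or 'W'), byte offset, length

def _recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed")
        buf += chunk
    return buf

def export_device(image_path, port=14789):
    """'Server' side: answer read/write requests against a local image file
    (stands in for a real disk; assumes the image is preallocated)."""
    dev = open(image_path, "r+b", buffering=0)
    srv = socket.create_server(("0.0.0.0", port))
    while True:
        conn, _ = srv.accept()
        with conn:
            try:
                while True:
                    op, off, length = REQ.unpack(_recv_exact(conn, REQ.size))
                    dev.seek(off)
                    if op == b"R":
                        conn.sendall(dev.read(length))
                    else:                         # 'W': the payload follows the header
                        dev.write(_recv_exact(conn, length))
            except ConnectionError:
                pass                              # importer went away; wait for the next one

class ImportedDevice:
    """'Client' side: looks like a local block device, but every read and
    write actually goes over the wire to the exporting box."""
    def __init__(self, host, port=14789):
        self.sock = socket.create_connection((host, port))

    def read(self, offset, length):
        self.sock.sendall(REQ.pack(b"R", offset, length))
        return _recv_exact(self.sock, length)

    def write(self, offset, data):
        self.sock.sendall(REQ.pack(b"W", offset, len(data)) + data)

Obviously a real shared-disk setup layers the cluster filesystem and its locking on top of devices like this; the sketch only covers the "remote block device" part I was asking about.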

Thanks,

Don

--
Don MacAskill
CEO, smugmug.com
3347 Shady Spring Lane, Mountain View, CA 94043, USA
fax: (650) 641-3125
don@xxxxxxxxxxx
http://www.smugmug.com/

