As someone else pointed out, it is possible to run diskless
workstations with their root on the GFS.  I haven't tried this
configuration, so I don't know what issues there may be.  There is a
security issue, though: since the workstations all run from the same
disk, a compromise of one can corrupt the entire cluster.

On my systems, I just have a small hard drive to hold the OS and
applications and then mount the GFS as a data partition.  (There's a
rough sketch of that setup at the bottom of this mail.)

Bowie

Greg Perry wrote:
> Also, after reviewing the GFS architecture it seems there would be
> significant security issues to consider, i.e. if one client/member
> of the GFS volume were compromised, that would lead to a full
> compromise of the filesystem across all nodes (and the ability to
> create special devices and modify the filesystem on any other GFS
> node member).  Are there any plans to include any form of
> discretionary or mandatory access controls for GFS in the upcoming
> v2 release?
>
> Greg
>
> Greg Perry wrote:
> > Thanks Bowie, I understand more now.  So within this architecture,
> > it would make more sense to utilize a RAID-5/10 SAN, then add
> > diskless workstations as needed for performance...?
> >
> > For said diskless workstations, does it make sense to run
> > Stateless Linux to keep the images the same across all of the
> > workstations/client machines?
> >
> > Regards
> >
> > Greg
> >
> > Bowie Bailey wrote:
> > > Greg Perry wrote:
> > > > I have been researching GFS for a few days, and I have some
> > > > questions that hopefully some seasoned users of GFS may be
> > > > able to answer.
> > > >
> > > > I am working on the design of a Linux cluster that needs to be
> > > > scalable.  It will be primarily an RDBMS-driven data warehouse
> > > > used for data mining and content indexing.  In an ideal world,
> > > > we would be able to start with a small (say, 4-node) cluster,
> > > > then add machines (and storage) as the various RDBMSes grow in
> > > > size, as well as use virtual IPs for load balancing across
> > > > multiple lighttpd instances.  All machines in the cluster need
> > > > to be able to talk to the same volume of information, and GFS
> > > > (in theory at least) would be used to aggregate the drives
> > > > from each machine into one huge shared logical volume.  With
> > > > that being said, here are some questions:
> > > >
> > > > 1) What is the preference on the RDBMS?  Will MySQL 5.x work,
> > > > and are there any locking issues to consider?  What would the
> > > > best open source RDBMS be (MySQL vs. PostgreSQL, etc.)?
> > >
> > > Someone more qualified than me will have to answer that
> > > question.
> > >
> > > > 2) If there was a 10-machine cluster, each with a 300GB SATA
> > > > drive, can you use GFS to aggregate all 10 drives into one big
> > > > logical 3000GB volume?  Would that scenario work similarly to
> > > > a RAID array?  If one or two nodes fail, but the GFS quorum is
> > > > maintained, can those nodes be replaced and repopulated just
> > > > like a RAID-5 array?  If this scenario is possible, how
> > > > difficult is it to "grow" the shared logical volume by adding
> > > > additional nodes (say I had two more machines, each with a
> > > > 300GB SATA drive)?
> > >
> > > GFS doesn't work that way.  GFS is just a fancy filesystem.  It
> > > takes an already shared volume and allows all of the nodes to
> > > access it at the same time.
> > >
> > > > 3) How stable is GFS currently, and is it used in many
> > > > production environments?
> > >
> > > It seems to be stable for me, but we are still in testing mode
> > > at the moment.
> > >
> > > > 4) How stable is the FC5 version, and does it include all of
> > > > the configuration utilities in the RH Enterprise Cluster
> > > > version?  (The idea would be to prove the point on FC5, then
> > > > migrate to RH Enterprise.)
> > >
> > > Haven't used that one.
> > >
> > > > 5) Would CentOS be preferred over FC5 for the initial proof of
> > > > concept and early adoption?
> > >
> > > If your eventual platform is RHEL, then CentOS would make more
> > > sense for a testing platform since it is almost identical to
> > > RHEL.  Fedora can be less stable and may introduce some issues
> > > that you wouldn't have with RHEL.  On the other hand, RHEL may
> > > have some problems that don't appear on Fedora because of
> > > updated packages.
> > >
> > > If you want bleeding edge, use Fedora.
> > > If you want stability, use CentOS or RHEL.
> > >
> > > > 6) Are there any restrictions or performance advantages of
> > > > using all drives with the same geometry, or can you mix and
> > > > match different size drives and just add to the aggregate
> > > > volume size?
> > >
> > > As I said earlier, GFS does not do the aggregation.
> > >
> > > What you get with GFS is the ability to share an already
> > > networked storage volume.  You can use iSCSI, AoE, GNBD, or
> > > others to connect the storage to all of the cluster nodes.  Then
> > > you format the volume with GFS so that it can be used with all
> > > of the nodes.
> > >
> > > I believe there is a project for the aggregate filesystem that
> > > you are looking for, but as far as I know, it is still beta.
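
To follow up on the formatting point in my reply quoted above: once
the shared device is visible on every node (over iSCSI, AoE, GNBD, or
whatever you pick), the GFS part itself is just a mkfs and a mount.
Roughly like this, from memory and untested here -- "mycluster", the
journal count, the device path, and the mount point are all
placeholders you would substitute (the -t value has to match the
cluster name in your cluster.conf, and you want at least one journal
per node that will mount the filesystem):

  # run once, from any one node, against the shared device
  gfs_mkfs -p lock_dlm -t mycluster:data01 -j 10 /dev/vg01/lvol0

  # then on every node
  mkdir -p /data
  mount -t gfs /dev/vg01/lvol0 /data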
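
And the "small local drive for the OS, GFS as a data partition" setup
I mentioned at the top is nothing special either.  The OS and
applications install on the local disk as usual, and each node just
gets an fstab entry along these lines (again, the device and mount
point are placeholders; the cluster services -- ccsd/cman/fencing, and
clvmd if you use it -- have to be up before the mount will succeed):

  /dev/vg01/lvol0   /data   gfs   defaults,noatime   0 0

Bowie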