Hello,

I've been testing an RHCS (CentOS 5.2) cluster with GFS1 for a while, and I'm about to transition the cluster to production, so I'd appreciate a quick review of the architecture and filesystem choices. I have some concerns about GFS (1 & 2) stability and performance vs. ext3, but the increased flexibility of a clustered filesystem has a lot of advantages. If there are fundamental stability advantages to a design that does not cluster the filesystems (i.e., one that uses GFS in lock_nolock mode, or ext3; see the mkfs/mount sketch below), that would override any performance consideration.

Assuming that stability is not an issue, my basic question in choosing an architecture is whether there is better performance in having multiple cluster nodes access the same data via GFS (gaining some CPU and network load balancing at the cost of the GFS locking penalty), or in serving each volume from a single server via NFS, using RHCS solely for failover. Obviously, I don't expect anyone to provide definitive answers or data that's unique to our environment, but I'd highly appreciate your view on the architecture choices.

Background:

Our lab does basic science research on software to process medical images. There are about 40 lab members, with about 15-25 logged in at any given time. Most people are logged into multiple servers at once, with their home directory and all data directories currently provided via NFS. The workload is divided between a software development environment (compile/test cycles) and image processing.

The software development process is interactive, and includes algorithm testing that requires reading and writing multi-MB files. There's a reasonably high performance expectation for interactive work, less so for the testing phase. Many lab members also mount filesystems from the servers to their desktop machines via Samba, for which there is a high performance expectation.

The image processing is very strongly CPU-bound, but involves reading many image files in the 1 to 50MB range and writing result files in the same range, along with smaller index and metadata files. The image processing is largely non-interactive, so its I/O performance is not critical.

The RHCS cluster will be used for infrastructure services only (not as a compute resource for image processing, not as login servers, not as compilation servers). The primary services to be run on the clustered machines are:

  - network file sharing (NFS, Samba)
  - SVN repository
  - backup server (Bacula, to a fibre-attached tape drive)
  - wiki
  - Nagios

None of those services require much CPU. The network file sharing could benefit from load balancing, so that the NFS and Samba clients have multiple network paths to the storage, but neither protocol is well suited to using RHCS as a load balancer (clients stay pinned to a single server address for the life of a mount), so this may not be possible. Using LVS or a front-end hardware load balancer is not an option at this time; HA is much more important than load balancing.
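For concreteness, here's roughly how I've been building the filesystems in testing; the cluster, volume, and mount-point names below are placeholders:

    # clustered GFS1: DLM locking, one journal per node that will mount it
    gfs_mkfs -p lock_dlm -t testcluster:images -j 6 /dev/vg_san/lv_images

    # single-node GFS1 with cluster locking disabled (the lock_nolock case)
    gfs_mkfs -p lock_nolock -j 1 /dev/vg_san/lv_scratch

    # noatime/nodiratime avoid taking locks just to update atimes on reads,
    # which is one of the cheaper ways to reduce GFS locking overhead
    mount -t gfs -o noatime,nodiratime /dev/vg_san/lv_images /export/images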
The goals of using RHCS and clustering those functions are, in order of importance:

  1. stability of applications
  2. high availability of applications
  3. performance
  4. expandability of filesystems (i.e., expand volumes at the SAN, LUN, LVM, and filesystem layers; the grow commands are sketched in the P.P.S. below)
  5. expandability of servers (add more servers to the cluster, with machines dedicated to functions, as a crude form of load balancing)

The computing environment consists of:

  - 2 RHCS servers, fibre-attached to storage and a backup tape device
  - ~15TB of EMC fibre-attached storage
  - ~14TB of fibre- and iSCSI-attached storage in the near future
  - 4 compute servers currently accessing storage via NFS; these could be fibre-attached and configured as cluster members
  - 35 compute servers with NFS-only access to storage, possibly iSCSI in the future, and no chance of fibre attachment

As I see it, there are 3 possible architecture choices:

[1] infrastructure-only GFS + NFS
    The 2 cluster nodes share storage via GFS and act as NFS servers to all compute servers.
    + load balancing of some services
    - complexity of GFS
    - performance of shared GFS storage

[2] shared storage / NFS
    The 2 cluster nodes and the 4 fibre-attached compute servers share storage via GFS (all 6 machines are RHCS nodes, but the compute nodes provide no infrastructure services; they use cluster membership only for GFS file access). Each GFS node is potentially an NFS server (via a VIP) to the 35 compute servers that are not on the fibre SAN.
    + potentially faster access to data for the 4 fibre-attached compute servers
    - potentially slower access to data for the 4 fibre-attached compute servers, due to GFS locking
    + increased stability over a 2-node cluster
    - increased complexity

[3] exclusive storage / NFS
    Filesystems are formatted as ext3 and exclusively mounted on one of the 2 infrastructure cluster nodes at a time. Each filesystem mount also includes a child (dependent) resource that makes the node an NFS server (a sketch of the service stanza is in the P.S. below). All compute nodes access data via NFS.
    + reliability of the filesystem
    + performance of the filesystem
    - potential for corruption in the case of non-exclusive access
    - decreased flexibility due to exclusive use
    - no potential for load balancing across cluster nodes

I'm very interested in your opinion of these choices, and would like to hear about other ideas that I may have overlooked.

Thanks,

Mark

----
Mark Bergman                                  voice: 215-662-7310
mark.bergman@xxxxxxxxxxxxxx                   fax:   215-614-0266
System Administrator
Section of Biomedical Image Analysis
Department of Radiology
University of Pennsylvania
PGP Key: https://www.rad.upenn.edu/sbia/bergman
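P.S. To make option [3] concrete, this is roughly the rgmanager service stanza I have in mind for one filesystem; the IP address, device, paths, failover domain, and resource names are placeholders:

    <service autostart="1" domain="infra_nodes" name="svc_nfs_home">
        <fs name="fs_home" device="/dev/vg_san/lv_home"
            mountpoint="/export/home" fstype="ext3"
            force_unmount="1">
            <nfsexport name="export_home">
                <nfsclient name="lab_net" target="192.168.10.0/24"
                           options="rw,sync"/>
            </nfsexport>
        </fs>
        <ip address="192.168.10.50" monitor_link="1"/>
    </service>

The fs -> nfsexport -> nfsclient nesting provides the child (dependent) ordering I described, and a manual failover is just "clusvcadm -r svc_nfs_home -m <other node>".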
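P.P.S. On the filesystem expandability goal, the grow path is the same at the LVM layer for either filesystem choice, and as I understand it both GFS1 and ext3 can be grown while mounted on CentOS 5; volume names are again placeholders:

    # grow the logical volume after the new LUN/PV has been added to the VG
    lvextend -L +500G /dev/vg_san/lv_images

    # GFS1: run on exactly one node, against the mounted filesystem
    gfs_grow /export/images

    # ext3: resize2fs grows a mounted filesystem online on CentOS 5
    resize2fs /dev/vg_san/lv_home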