On 04/02/2014 09:54 PM, Bill Oliver wrote:
>
> Just to see if I can do it, I thought I'd set up a small grid/cluster
> using fedora. Does anybody know of a good step-by-step guide for this
> (preferably free and online :-) )?

The word "cluster" has a very blurred meaning. You can have a cluster of
anything: a cluster of PostgreSQL servers, a cluster of Apache nodes,
etc. Anything made of distinct elements that work together toward a
computational goal is a cluster.

The word "grid" is simple: a cluster of clusters (geographically
scattered). An API that can access a cluster of clusters is a so-called
"grid middleware". Historically it started with Globus, and now there
are middlewares like EMI (formerly gLite), ARC, AliEN (experiment
specific) and others (I am only familiar with those used in the
experiment I am part of).

To return to clusters: a cluster is (usually) a bunch of nodes linked by
a resource manager (like Torque, LSF, SGE, Condor or Slurm, the last
being used by most of the Top500 supercomputers). I said "usually"
because there are also things like shared-memory clusters. These are
made of individual nodes with linked memory access, creating what is
called a NUMA machine; if you log on to such a machine you see a single
computer with a few thousand cores and some (many) TB of memory.

To return to resource-managed clusters: you can have two types of
processing, distributed and parallel.

In distributed computing you have some kind of atomic element of data,
so you can distribute chunks of data elements, process them
independently and merge the results at the end (see the first sketch
after my sign-off). Usually this happens in what are called batch
systems (like the resource managers enumerated above), because the
computing jobs are batched into queues and the system processes the
jobs sequentially (modulo the number of computing slots in the cluster).

Lately another kind of processing has appeared, in the form of
interactive processing. It started at CERN with the PROOF subsystem
(part of ROOT) around the year 2000 (maybe earlier), and more recently
there is Hadoop. The principle is that the data is distributed (if not
already present) across the nodes and processed interactively; through
the API it is as if you had a computer with many more cores than your
own machine. (It is a funny thing to use a laptop for some data
analysis and see 3 TB of data processed in a couple of minutes.)

Parallel processing is usually related to big-matrix computations
(solving many equations with many variables). The standard API is MPI
(Open MPI being one well-known implementation, but there are others).
The "parallel" part appears when computation steps depend on other
computation steps, so you need barriers, thread synchronization and (in
general) communication between threads (see the second sketch after my
sign-off). In my experience, MPI programs use a resource manager to get
at the required and available resources (I am used to seeing Torque
wrappers and integration for launching and running MPI-based programs).

The problem with parallel computing is that it depends heavily on node
interconnection. It works over Ethernet (and I would recommend drivers
and hardware that support RoCE, RDMA over Converged Ethernet), but a
dedicated InfiniBand network is recommended.

As for the answer to your initial inquiry, I would highly recommend
Rocks Clusters, which is based on CentOS. It will automatically install
and manage your nodes from a single point (the frontend server), with
an NFS-shared home.

HTH,
Adrian
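
PS: In case a concrete example helps, here is a minimal sketch of the
distributed model in Python: split the data into atomic chunks, process
each chunk independently, merge the results at the end. The data and
the per-chunk function are made up for illustration; on a real cluster
each chunk would typically be a separate job submitted to the resource
manager rather than a local worker process.

    # Distributed model: split / process independently / merge.
    from multiprocessing import Pool

    def process_chunk(chunk):
        # Stand-in for the real per-chunk analysis; here we just sum.
        return sum(chunk)

    if __name__ == "__main__":
        data = list(range(1000000))  # the full dataset (hypothetical)
        # Cut it into atomic chunks of 100000 elements each.
        chunks = [data[i:i + 100000]
                  for i in range(0, len(data), 100000)]
        with Pool() as pool:
            # Each chunk is processed independently, like a batch job.
            partial_results = pool.map(process_chunk, chunks)
        # Merge step: combine the partial results.
        print(sum(partial_results))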
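
And a matching sketch of the parallel model, using the mpi4py bindings
(this assumes you have the mpi4py package and an MPI runtime such as
Open MPI installed; the rank count and script name below are
arbitrary). Each rank computes a local piece, all ranks synchronize at
a barrier, and a reduction combines the results:

    # Parallel model: local work, a barrier, then a reduction.
    # Run with: mpirun -np 4 python parallel_sketch.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()   # this process's id within the job
    size = comm.Get_size()   # total number of ranks

    # Each rank computes its own share of the work (hypothetical).
    local = sum(range(rank * 1000, (rank + 1) * 1000))

    comm.Barrier()  # synchronization point: all ranks wait here
    # Combine the per-rank results on rank 0.
    total = comm.reduce(local, op=MPI.SUM, root=0)

    if rank == 0:
        print("total from", size, "ranks:", total)

This is of course a toy: real MPI programs also do point-to-point
communication (send/recv) between the synchronization points, which is
exactly where the fast interconnect (RoCE/InfiniBand) matters.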