Other cluster file systems include Terrascale (now RapidScale), Panasas, IBRIX, PVFS-2, and IBM's proprietary GPFS. Of these (and including the previously mentioned Lustre), Panasas and GPFS are the most mature. There are also distributed file systems like GFS and PolyServe, but these don't scale that well. NFS v4 is supposed to have some distribute-ability, but it will probably only work reliably in proprietary NetApp devices, and if NFS v3 is any indicator (and Linux's NFS is under the same management as for v3), it will take many years to stabilize.

The rule of thumb is: I/O performance, reliability, and price; pick any two. With a cluster file system, the best you can expect per node is your interconnect speed, which in the worst case (GigE) is about 100 MB/s... much better than a single local disk (~40 MB/s for SATA)... but getting that number to scale over a lot of nodes carries a hefty price tag. Local disks always scale best, if you can use them.

Your problem sounds like it's write-intensive. For read-intensive problems, multicast can be used to sync nodes with the main file server quickly and cheaply.

As stated before, if you can use local disk drives, they scale linearly across nodes (though not as the number of threads increases on the same node), and you can build a local software RAID cheaply (use Linux MD, not a hardware RAID controller, for best speed) to get the best local performance.

Also, many MPI-1 implementations ship the ROMIO extensions, part of the MPI-2 spec, which can hide the underlying implementation and are programmatically intuitive, but this is not always fastest: most (though not all) cluster file systems perform best when each thread/process writes to files in different directories.

Write speeds on cluster file systems will be better than read speeds, because all the caching layers between the application and the actual disk help writes more, unless you need synchronous operation (as a journaled DBMS does, or where messages are passed via files, as with distributed MATLAB).
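To make the MPI-IO/ROMIO point concrete, here is a minimal sketch of every rank writing its own block of one shared file; the file name, block size, and omitted error handling are placeholder assumptions on my part, and whether this beats one file per process in its own directory depends on the file system underneath:

/*
 * Minimal MPI-IO (ROMIO) sketch: every rank writes its own block of a
 * single shared file.  Build with an MPI-2 capable compiler, e.g.
 *   mpicc -o shared_write shared_write.c
 * File name and block size below are illustrative, not from this thread.
 */
#include <mpi.h>
#include <stdlib.h>

#define BLOCK_DOUBLES (1 << 20)          /* ~8 MB of doubles per rank */

int main(int argc, char **argv)
{
    int rank, i;
    MPI_File fh;
    MPI_Offset offset;
    double *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    buf = malloc(BLOCK_DOUBLES * sizeof(double));
    for (i = 0; i < BLOCK_DOUBLES; i++)
        buf[i] = rank + i * 1e-6;        /* stand-in for real results */

    /* One shared file; the MPI-IO layer hides the underlying file system. */
    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank writes at its own offset; the collective call lets the
     * library aggregate and reorder the actual disk I/O. */
    offset = (MPI_Offset)rank * BLOCK_DOUBLES * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, BLOCK_DOUBLES, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}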
On 4/3/07, Andreas Dilger <adilger@xxxxxxxxxxxxx> wrote:

On Apr 03, 2007 15:53 -0700, Larry McVoy wrote:
> On Tue, Apr 03, 2007 at 03:50:27PM -0700, David Schwartz wrote:
> > > I write scientific number-crunching codes which deal with large
> > > input and output files (GB, tens of GB, as much as hundreds of
> > > gigabytes on occasion). Now there are these multi-core setups
> > > like Intel Core 2 Duo becoming available at a low cost. Disk I/O
> > > is one of the biggest bottlenecks. I would very much like to put
> > > the multiple processors to use to read or write in parallel
> > > (probably using OMP directives in a C program).
> >
> > How do you think more processors are going to help? Disk I/O doesn't
> > take much processor. It's hard to imagine a realistic disk I/O
> > application that is somehow limited by available CPU.
>
> Indeed. S/he is mistaking CPUs for DMA engines. The way you make this
> go fast is a lot of disk controllers running in parallel. A single CPU
> can handle a boatload of interrupts.

Rather, I think the issue is that an HPC application which needs to do
frequent checkpoint and data dumps can be slowed down dramatically by a
lack of parallel I/O. Adding more CPUs to a single node generally speeds
up the computation part, but can slow down the I/O part.

That said, the most common I/O model for such applications is to have a
single input and/or output file for each CPU in the computation. That
avoids a lot of I/O serialization within the kernel and is efficient as
long as the amount of I/O per CPU is relatively large (as it sounds like
in the initial posting).

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
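For the file-per-CPU model Andreas describes, a minimal sketch could look like the following; the per-rank directory layout and checkpoint size are illustrative assumptions, not something taken from this thread:

/*
 * Sketch of the file-per-CPU checkpoint model: each MPI rank writes its
 * own file, ideally in its own directory, so the kernel and file system
 * see independent streams.  Paths and sizes are illustrative only.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_BYTES (64 * 1024 * 1024)   /* 64 MB per rank, assumed */

int main(int argc, char **argv)
{
    int rank;
    char path[256];
    char *buf;
    FILE *fp;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    buf = malloc(CHUNK_BYTES);
    memset(buf, rank & 0xff, CHUNK_BYTES);   /* stand-in for real data */

    /* One file per rank; the rank0/, rank1/, ... directories are assumed
     * to have been created beforehand, e.g. by the job script. */
    snprintf(path, sizeof(path), "rank%d/checkpoint.dat", rank);
    fp = fopen(path, "wb");
    if (fp != NULL) {
        fwrite(buf, 1, CHUNK_BYTES, fp);
        /* fflush(fp) + fsync(fileno(fp)) here if the dump must survive a crash */
        fclose(fp);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}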
_______________________________________________
Ext3-users mailing list
Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users