Hi :)

On Monday 24 January 2011 22:37 Wendy Cheng wrote
> I would love to get an education here. From usage model point of view,
> what is the difference between a "parallel file system" and a "cluster
> file system" ? i.e., when to use a parallel file system and when to
> use a cluster file system ?

Please don't top post :)

A parallel file system is a file system in which:
 - metadata and data servers are separated
 - a file's data is distributed/striped among the data servers (each
   data server has its own storage)

Because of the second point, a file is read or written in parallel, so you
get higher bandwidth. Each data server/node serves a chunk of each file.
This is similar to a RAID 0 spread across many servers.

Metadata is stored on a metadata server so it doesn't "get in the way" ;)
That is, the client node asks the metadata server where the file's chunks
are. The metadata server sends the client a list of the data nodes that
hold the chunks, and then the client talks directly to the data nodes
without having to talk to the metadata server again. Obviously, this is
oversimplified ;)

Also, take into account that metadata is IOPS intensive while data is
bandwidth/throughput intensive. If you separate the two, you can tune each
storage subsystem to get the best performance for IOPS or for bandwidth.

Parallel file systems are useful for high bandwidth/throughput systems
(HPC).

In clustered file systems:
 - metadata and data servers aren't usually separated (in CXFS they are)
 - a file's data is not striped among the data servers since there is a
   single storage array

Because of the second point, a file is not read/written in parallel. 1 file
is served by 1 data node/server. This means you can have 2 nodes serving 2
files at the same time, but each node serves 1 whole file, not chunks of
the same file.

Clustered file systems are useful for active/active HA/load-balancing
configurations.

This is a very simplified explanation of both.
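
To make the striping idea a bit more concrete, here is a toy sketch in Python. It is not how any of the real parallel file systems are implemented: the names (stripe, read_file, STRIPE_SIZE), the round-robin placement and the in-memory "servers" are all assumptions made up for illustration, and real file systems use far larger stripe sizes.

```python
# Toy model of RAID 0 style striping across data servers.
# Everything here is hypothetical and lives in memory only.

STRIPE_SIZE = 4  # bytes per chunk, tiny on purpose so the output is readable


def stripe(data, n_servers, stripe_size=STRIPE_SIZE):
    """Split data into chunks and place them round-robin on the servers.

    Returns (servers, layout):
      servers -- one dict per data server, mapping chunk index -> bytes
      layout  -- what the metadata server would remember: for each chunk
                 index, the number of the data server that holds it
    """
    servers = [dict() for _ in range(n_servers)]
    layout = []
    for offset in range(0, len(data), stripe_size):
        chunk_index = offset // stripe_size
        server = chunk_index % n_servers  # round-robin placement (assumed)
        servers[server][chunk_index] = data[offset:offset + stripe_size]
        layout.append(server)
    return servers, layout


def read_file(servers, layout):
    """Client side: use the layout obtained once from the metadata server,
    then fetch every chunk straight from the data server that holds it."""
    return b"".join(
        servers[server][chunk_index]
        for chunk_index, server in enumerate(layout)
    )


if __name__ == "__main__":
    servers, layout = stripe(b"abcdefghij", n_servers=3)
    print(layout)                      # which server holds each chunk
    print(servers)                     # what each data server stores
    print(read_file(servers, layout))  # the reassembled file
```

The chunk fetches in read_file are independent of each other, which is why a real client can issue them to the data servers at the same time. In the clustered case the layout would simply point every chunk of a file at the same node.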
For more in-depth explanations check Google ;) Look for GPFS, PVFS, Lustre,
PanFS (Panasas), CXFS, GFS, OCFS2, ...

HTH

   Rafa

> On Mon, Jan 24, 2011 at 1:10 PM, Rafa Grimán <rafagriman@xxxxxxxxx> wrote:
> > Hi :)
> >
> > On Monday 24 January 2011 21:25 Wendy Cheng wrote
> >
> >> Sometime ago, the following was advertised:
> >>
> >> "ZFS is not a native cluster, distributed, or parallel file system and
> >> cannot provide concurrent access from multiple hosts as ZFS is a local
> >> file system. Sun's Lustre distributed filesystem will adapt ZFS as
> >> back-end storage for both data and metadata in version 3.0, which is
> >> scheduled to be released in 2010."
> >>
> >> You can google "Lustre" to see whether their plan (built Lustre on top
> >> of ZFS) is panned out.
> >
> > But Lustre isn't a clustered filesystem, it's a parallel filesystem.
> > Similar to pNFS, PanFS, ... Comparing GFS to Lustre wouldn't be quite
> > right.
> >
> >    Rafa

--
"We cannot treat computers as Humans. Computers need love."

Happily using KDE 4.5.4 :)

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster