We currently run a commodity cluster that supports a few petabytes of data. Each node in the cluster has 4 drives, currently mounted as /0 through /3. We have been researching alternatives for managing the storage, Ceph being one possibility, iRODS being another. For preservation purposes, we would like each file to exist as one whole piece per drive (as opposed to being striped across multiple drives). It appears this is the default in Ceph.

Now, it has always been convenient for us to run distributed jobs over SSH to, for instance, compile a list of checksums of all files in the cluster:

    dsh -Mca 'find /{0..3}/items -name \*.warc.gz | xargs md5sum >/tmp/$HOSTNAME.md5sum'

And that nicely allows each node to process its own files using the local CPU.

Would this scenario still be possible where Ceph is managing the storage?

Thanks in advance for any feedback.

Youssef Eldakar
Bibliotheca Alexandrina