Re: Directly addressing files on individual OSD

Youssef Eldakar <Youssef.Eldakar@xxxxxxxxxxx> · Thu, 16 Mar 2017 07:26:57 +0000

Thanks for the reply, Anthony, and I am sorry my question did not give sufficient background.

This is the cluster behind archive.bibalex.org. Storage nodes keep archived webpages as multi-member GZIP files on the disks, each of which is formatted with XFS as a standalone file system. The access system consults an index that records where each URL is stored, then fetches it over HTTP from the storage node that holds it on one of its disks. So far, we have managed the storage with homegrown scripts that keep each GZIP file on 2 separate nodes. This has obviously required a good deal of manual work and has not been very effective.
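
To make the access pattern concrete, here is a minimal sketch of pulling a single record out of a multi-member GZIP file. It assumes the index stores a byte offset and length for each record, which is my illustration and not a description of our actual index format; the function name, file names, and URL are hypothetical.

```shell
# Sketch only: the index format (byte offset and length per record) is an
# assumption for illustration; it is not described in this thread.

# extract_member FILE OFFSET LENGTH
# Decompress the single gzip member that starts at byte OFFSET (0-based)
# and is LENGTH bytes long.
extract_member() {
    local file=$1 offset=$2 length=$3
    tail -c "+$((offset + 1))" "$file" | head -c "$length" | gunzip -c
}

# Build a small two-member file to demonstrate.
demo=$(mktemp -d)
printf 'record one\n' | gzip -c > "$demo/a.gz"
printf 'record two\n' | gzip -c > "$demo/b.gz"
cat "$demo/a.gz" "$demo/b.gz" > "$demo/multi.warc.gz"

# The second member starts where the first one ends.
offset=$(( $(wc -c < "$demo/a.gz") ))
length=$(( $(wc -c < "$demo/b.gz") ))
extract_member "$demo/multi.warc.gz" "$offset" "$length"   # prints: record two

# Over HTTP, the same byte range could be requested with a Range header,
# for example (hypothetical host and path):
#   curl -s -r "$offset-$((offset + length - 1))" \
#       "http://storage-node.example/0/items/file.warc.gz" | gunzip -c
```

Because each member is a complete gzip stream, only the requested bytes need to leave the storage node.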

Given that description, do you feel Ceph could be an appropriate choice?

Thanks once again for the reply.

Youssef Eldakar
Bibliotheca Alexandrina
________________________________
From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of Anthony D'Atri [aad@xxxxxxxxxxxxxx]
Sent: Thursday, March 16, 2017 01:37
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  Directly addressing files on individual OSD

As I parse Youssef’s message, I believe there are some misconceptions.  It might help if you could give a bit more info on what your existing ‘cluster’ is running.  NFS? CIFS/SMB?  Something else?

1) Ceph regularly runs scrubs to ensure that all copies of data are consistent.  The checksumming that you describe would be both infeasible and redundant.

2) It sounds as though your current back-end stores user files as-is and is either a traditional file server setup or perhaps a virtual filesystem aggregating multiple filesystems.  Ceph is not a file storage solution in this sense.  Your message below sounds as though you want each user file kept whole rather than sharded across multiple servers.  This is antithetical to how Ceph works and runs counter to data durability and availability, unless there is some replication that you haven’t described.  See this diagram:

http://docs.ceph.com/docs/master/_images/stack.png

Under the hood, Ceph operates on ‘objects’ that are not exposed to clients as such. Several different client interfaces are built on top of this object layer:

- RBD volumes: think in terms of a virtual disk drive attached to a VM
- RGW: like Amazon S3 or Swift
- CephFS: provides a mountable filesystem interface, somewhat like NFS or even SMB but with important distinctions in behavior and use-case

I had not heard of iRODS before but just looked it up.  It is a very different thing than any of the common interfaces to Ceph.

If your users need to mount the storage as a share / volume, in the sense of SMB or NFS, then Ceph may not be your best option.  If they can cope with an S3 / Swift type REST object interface, a cluster with RGW interfaces might do the job, or perhaps Swift or Gluster.   It’s hard to say for sure based on assumptions of what you need.

— Anthony

We currently run a commodity cluster that supports a few petabytes of data. Each node in the cluster has 4 drives, currently mounted as /0 through /3. We have been researching alternatives for managing the storage, Ceph being one possibility, iRODS being another. For preservation purposes, we would like each file to exist as one whole piece per drive (as opposed to being striped across multiple drives). It appears this is the default in Ceph.

Now, it has always been convenient for us to run distributed jobs over SSH to, for instance, compile a list of checksums of all files in the cluster:

dsh -Mca 'find /{0..3}/items -name \*.warc.gz | xargs md5sum >/tmp/$HOSTNAME.md5sum'

And that nicely allows each node to process its own files using the local CPU.
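
As a rough sketch of what can follow that command: assuming the per-node lists are afterwards copied into one directory (a step not shown above), a check that every file appears on exactly 2 nodes with matching checksums might look like this. The function name, directory layout, and sample data are hypothetical.

```shell
# Sketch only: assumes each node's md5sum output has been collected as
# DIR/<node>.md5sum; the collection step itself is not shown here.

# check_replicas DIR
# Report files whose copies have differing checksums (MISMATCH) or that
# are not present on exactly two nodes (NODES=<count>).
check_replicas() {
    local dir=$1
    awk '
        {
            sum = $1
            n = split($0, parts, "/")        # last path component is the file name
            name = parts[n]
            if (!((name, FILENAME) in seen)) {
                seen[name, FILENAME] = 1     # count each node once per file
                nodes[name]++
            }
            if (!(name in sum_of))           sum_of[name] = sum
            else if (sum_of[name] != sum)    mismatch[name] = 1
        }
        END {
            for (name in nodes) {
                if (name in mismatch)        print "MISMATCH", name
                else if (nodes[name] != 2)   print "NODES=" nodes[name], name
            }
        }
    ' "$dir"/*.md5sum | sort
}

# Made-up sample data: x is fine, y has one copy, z has differing copies.
demo=$(mktemp -d)
printf '%s  %s\n' aaa /0/items/x.warc.gz bbb /1/items/y.warc.gz > "$demo/node1.md5sum"
printf '%s  %s\n' aaa /2/items/x.warc.gz ccc /3/items/z.warc.gz > "$demo/node2.md5sum"
printf '%s  %s\n' cxx /0/items/z.warc.gz                        > "$demo/node3.md5sum"
check_replicas "$demo"
```

The heavy work (reading and hashing the files) still happens on each node; only the small checksum lists are compared centrally.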

Would this scenario still be possible where Ceph is managing the storage?