On 16.03.2017 08:26, Youssef Eldakar wrote:
Thanks for the reply, Anthony, and I am sorry my question did not give sufficient background.
This is the cluster behind archive.bibalex.org. Storage nodes keep archived webpages as multi-member GZIP files on the disks, which are formatted using XFS as standalone file systems. The access system consults an index that says where a URL is stored, which is then fetched over HTTP from the individual storage node that has the URL somewhere on one of the disks. So far, we have pretty much been managing the storage using homegrown scripts to have each GZIP file stored on 2 separate nodes. This obviously has been requiring a good deal of manual work and as such has not been very effective.
Given that description, do you feel Ceph could be an appropriate choice?
If you adapt your scripts to something like...
"Storage nodes archive webpages as gzip files, hash the URL to use as an
object name, and save the gzip files as objects in Ceph via the S3
interface. The access system gets a request for a URL, hashes the URL
into an object name, and fetches the gzip (object) using regular S3 GET
syntax"
Ceph would deal with replication; you would only put objects in and
fetch them out.
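A rough sketch of that put/get flow (the hashing and gzip parts run as-is; the S3 calls are shown as comments because the endpoint, bucket, and credentials are hypothetical placeholders, not anything from your cluster):

```python
import gzip
import hashlib

def object_name_for(url: str) -> str:
    """Deterministic object key: the same URL always hashes to the same name."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()

def pack(record: bytes) -> bytes:
    """Gzip a record before uploading it as an object body."""
    return gzip.compress(record)

# Upload/fetch would then be plain S3 calls against radosgw, e.g. with boto3
# (endpoint, bucket, and credentials below are made-up examples):
#
#   s3 = boto3.client("s3", endpoint_url="http://rgw.example:7480",
#                     aws_access_key_id="...", aws_secret_access_key="...")
#   s3.put_object(Bucket="archive", Key=object_name_for(url), Body=pack(page))
#   page = gzip.decompress(
#       s3.get_object(Bucket="archive", Key=object_name_for(url))["Body"].read())
```

Because the key is derived from the URL alone, the access system never needs to ask where anything lives; Ceph's CRUSH placement takes over the role of your homegrown placement scripts.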
If you need it, you could also store the list of URLs and hashes, as a
record of what you have stored.
This is just an example, though. You could also use CephFS, mounted on the
nodes, and serve the files as you do today.
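For the CephFS route, each access node would just mount the file system; a sketch of an /etc/fstab entry for the kernel client (monitor hostnames, mount point, and client name here are all invented examples):

```
# /etc/fstab -- kernel CephFS mount (hostnames and client name are examples)
mon1:6789,mon2:6789,mon3:6789:/  /mnt/archive  ceph  name=archive,secretfile=/etc/ceph/archive.secret,noatime,_netdev  0  2
```

Your existing HTTP access layer could then serve the gzip files from /mnt/archive unchanged, with replication handled underneath by Ceph.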
Ceph is just a storage tool; it could work very nicely for your needs.
But accessing the files directly on the OSDs will only bring pain.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com