On 2013-01-15 Gregory Farnum wrote:

> On Tue, Jan 15, 2013 at 9:38 AM, Dimitri Maziuk <dmaziuk <at> bmrb.wisc.edu> wrote:
> ...
>> (We've $APP running on the cluster, normally one instance/cpu core, that
>> mmap's (read only) ~30GB of binary files. I/O over NFS kills the cluster
>> even with a few hosts. Currently the files are rsync'ed to every host at
>> the start of the batch; that'll only scale to a few dozen hosts at best.)
>
> There's a "read from replicas" operation flag that allows reading data
> off the local node, although I don't think there's a way to turn it on
> in the standard filesystem clients right now. It wouldn't be hard for
> somebody to add. I'm not sure you actually need it though; Ceph unlike
> NFS distributes the data over all the OSDs in the cluster so you could
> scale the number of suppliers as you scale the number of consumers.

That's the theory; in practice I had 16 jobs sit there for almost an hour
before they started completing. We'll see how the full batch fares, but so
far this does not look good.

The rate with those 30GB files in /var/tmp is ~125 jobs/hour. The rate with
the same files on cephfs with a local osd is so far 3 jobs/hour.

(This is on centos 6 with elrepo's "kernel-lt" build of 3.0.63 -- I know
it's old but their 3.7 doesn't even boot 9 times out of 10 and rolling my
own is way more time and effort than I can put into this project. Ceph is
bobtail from rpms.)

:(
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
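[For readers following along: the access pattern under discussion -- many worker
processes on one host mmap'ing the same large read-only files -- can be sketched
roughly as below. The file name and size are illustrative stand-ins, not taken
from the actual $APP; the point is that read-only mappings share page cache
between processes on a host, which is why a local copy in /var/tmp behaves so
differently from a network filesystem that must fault pages over the wire.]

```python
import mmap
import os

# Hypothetical stand-in for one of $APP's ~30GB binaries; a tiny file
# keeps the sketch runnable.
path = "/tmp/demo.bin"
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)

# Read-only mapping: pages are faulted in on demand, and every process
# mapping the same file shares one copy in the host's page cache. With a
# local file the fault is a disk (or cache) read; over NFS/CephFS each
# cold fault is a network round trip, multiplied by one worker per core.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first_page = mm[:16]   # slice reads directly from the mapping
    mm.close()

os.remove(path)
print(len(first_page))  # 16
```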
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com