This is definitely something we've discussed, though I don't think
anyone has really planned out what a complete solution would look like,
including processor affinity, etc. Before I joined Inktank I worked at
a supercomputing institute, and one of the projects we worked on was
developing grid computing tools for bioinformatics research. Moving the
analytics to the data rather than moving the data was a big topic for
us too, since genomics data at least tends to be pretty big. Ceph could
potentially be a very interesting solution for that kind of thing.
Mark
On 03/30/2015 06:20 AM, Gurvinder Singh wrote:
One interesting use case of combining Ceph with compute is running big
data jobs on Ceph itself. With CephFS coming along, you can run
Hadoop/Spark jobs directly on Ceph, with data locality support, without
needing to move your data to the compute resources. I am wondering if
anyone in the community is looking at combining storage and compute
resources from this point of view.
Regards,
Gurvinder
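As a concrete illustration of the idea above, here is a minimal PySpark
sketch that runs a job against data already sitting on Ceph. It assumes
CephFS is kernel-mounted at /mnt/cephfs on every node that runs a Spark
executor; the mount point and dataset paths are hypothetical.

# Word count over a dataset that already lives on CephFS, read through a
# shared kernel mount rather than staged into HDFS first.
from pyspark import SparkContext

sc = SparkContext(appName="wordcount-on-cephfs")

# Every executor sees the same path through its local CephFS mount.
lines = sc.textFile("file:///mnt/cephfs/datasets/reads.txt")

counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

# Results land back on CephFS, visible to every node.
counts.saveAsTextFile("file:///mnt/cephfs/results/wordcount")

Note this sketch only uses CephFS as a shared filesystem; getting
locality-aware task placement would additionally need the Hadoop CephFS
bindings (cephfs-hadoop) wired into the Hadoop/Spark configuration.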
On 03/29/2015 09:19 PM, Nick Fisk wrote:
There's probably a middle ground where you get the best of both worlds.
Maybe 2-4 OSDs per compute node alongside dedicated Ceph nodes. That way
you get a bit of extra storage and can still use lower-end CPUs, but
don't have to worry so much about resource contention.
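For a rough feel of what 2-4 colocated OSDs cost in compute headroom,
here is a back-of-the-envelope sketch. The per-OSD reservations (1 core,
2 GB RAM) are illustrative assumptions, not Ceph requirements; measure
your own workload before sizing anything.

# Estimate the CPU/RAM left over for VMs on a hybrid compute+OSD node.
def leftover_for_compute(total_cores, total_ram_gb, osds,
                         cores_per_osd=1.0, ram_per_osd_gb=2.0):
    # Reserve an assumed slice of the node for each colocated OSD.
    return (total_cores - osds * cores_per_osd,
            total_ram_gb - osds * ram_per_osd_gb)

for osds in (2, 4):
    cores, ram = leftover_for_compute(total_cores=16, total_ram_gb=128,
                                      osds=osds)
    print("%d OSDs -> %.0f cores, %.0f GB RAM left for VMs"
          % (osds, cores, ram))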
-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
Martin Millnert
Sent: 29 March 2015 19:58
To: Mark Nelson
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: running Qemu / Hypervisor AND Ceph on the same
nodes
On Thu, Mar 26, 2015 at 12:36:53PM -0500, Mark Nelson wrote:
Having said that, small nodes are
absolutely more expensive per OSD as far as raw hardware and
power/cooling goes.
The smaller the volume a manufacturer moves on a unit, the worse the
margin typically is from the buyer's side. Also, CPUs typically command
a growing premium the higher up the range you go. I've found a lot of
local maxima, optimization-wise, over the past years, for instance in
12 OSD/U vs. 18 OSD/U dedicated storage node setups.
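To illustrate the kind of per-OSD cost arithmetic behind those local
maxima, a toy comparison follows; all prices and power figures are
invented placeholders, only the arithmetic is the point.

# Compare total cost per OSD slot for two hypothetical node densities.
nodes = {
    "12 OSD/U": {"chassis_cost": 6000.0,  "osds": 12, "watts": 400.0},
    "18 OSD/U": {"chassis_cost": 11000.0, "osds": 18, "watts": 650.0},
}

kwh_price, years = 0.10, 5
for name, n in nodes.items():
    # Lifetime power/cooling cost, then amortize everything per OSD slot.
    power_cost = n["watts"] / 1000.0 * 24 * 365 * years * kwh_price
    per_osd = (n["chassis_cost"] + power_cost) / n["osds"]
    print("%s: ~$%.0f per OSD slot over %d years" % (name, per_osd, years))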
There may be local maxima along the colocated low-scale storage/compute
node axis as well, but the one major problem with colocating storage and
compute is that you can't scale compute independently from storage
efficiently using that building block alone. There may be temporary
optimizations in doing so, however (e.g. before you have reached
sufficient scale).
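To make the scaling point concrete, here is a toy calculation; the node
shape and the demand figures are invented for illustration only.

# If compute and storage demand grow at different rates, a fixed-ratio
# converged building block over-provisions whichever dimension you need
# less of.
def ceil_div(a, b):
    return -(-a // b)

cores_per_node, tb_per_node = 24, 48   # hypothetical converged node shape
need_cores, need_tb = 2000, 1500       # hypothetical cluster demand

nodes = max(ceil_div(need_cores, cores_per_node),
            ceil_div(need_tb, tb_per_node))
print("nodes needed :", nodes)
print("surplus cores:", nodes * cores_per_node - need_cores)
print("surplus TB   :", nodes * tb_per_node - need_tb)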
There's no single optimal answer when you're dealing with 20+ variables to
consider... :)
BR,
Martin
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com