Re: running Qemu / Hypervisor AND Ceph on the same nodes


 



We have a related topic in CDS about Hadoop + Ceph:
https://wiki.ceph.com/Planning/Blueprints/Infernalis/rgw%3A_Hadoop_FileSystem_Interface_for_a_RADOS_Gateway_Caching_Tier
It doesn't directly solve the data locality problem, but it tries to avoid
data migration between different storage clusters.

It would be great if big data frameworks like Hadoop and Spark could export
an interface to make Ceph or other storage backends aware of the compute job
schedule. A newer project, Tachyon (tachyon-project.org), is doing something
along these lines.
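
As a side note, Hadoop already exposes a locality hook in the other
direction: the job scheduler asks the FileSystem for block locations and
tries to place tasks on the reported hosts. A Ceph-backed Hadoop FileSystem
can answer that call with the hostnames of the OSDs holding each extent.
The Java sketch below only illustrates that hook, it is not the actual
cephfs-hadoop code; the osdHostsFor() helper is hypothetical (in practice it
would be backed by libcephfs extent/OSD lookups).

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;

// Sketch: feed data-locality hints to the Hadoop/Spark scheduler by
// reporting OSD hostnames as the "hosts" of each block of a CephFS file.
public abstract class LocalityAwareCephFileSystem extends FileSystem {

    // Hypothetical helper: map a byte range of a file to the hosts of the
    // OSDs that store it.
    protected abstract String[] osdHostsFor(FileStatus file, long offset, long length)
            throws IOException;

    @Override
    public BlockLocation[] getFileBlockLocations(FileStatus file, long start, long len)
            throws IOException {
        if (file == null || len <= 0 || start >= file.getLen()) {
            return new BlockLocation[0];
        }
        long blockSize = Math.max(file.getBlockSize(), 1);
        long end = Math.min(start + len, file.getLen());
        List<BlockLocation> locations = new ArrayList<>();
        for (long off = start; off < end; off += blockSize) {
            long chunk = Math.min(blockSize, end - off);
            String[] hosts = osdHostsFor(file, off, chunk);
            // The first argument ("names", host:port pairs) is only a hint;
            // reusing the hostnames is good enough for scheduling purposes.
            locations.add(new BlockLocation(hosts, hosts, off, chunk));
        }
        return locations.toArray(new BlockLocation[0]);
    }
}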



On Mon, Mar 30, 2015 at 7:20 PM, Gurvinder Singh
<gurvindersinghdahiya@xxxxxxxxx> wrote:
> One interesting use case of combining Ceph with compute is running big
> data jobs on Ceph itself. With CephFS coming along, you can run
> Hadoop/Spark jobs directly on Ceph, with data locality support, without
> needing to move your data to the compute resources. I am wondering if
> anyone in the community is looking at combining storage and compute
> resources from this point of view.
>
> Regards,
> Gurvinder
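
For anyone who wants to try that, pointing an existing Hadoop/Spark install
at CephFS is mostly configuration. Below is a rough smoke test, assuming the
cephfs-hadoop bindings and the libcephfs JNI pieces are on the classpath.
The property names follow the cephfs-hadoop documentation, but double-check
them against your plugin version; the monitor address and client id are just
placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CephFsSmokeTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Use CephFS instead of HDFS as the default filesystem.
        conf.set("fs.defaultFS", "ceph://mon1.example.com:6789/");
        conf.set("fs.ceph.impl", "org.apache.hadoop.fs.ceph.CephFileSystem");
        conf.set("ceph.conf.file", "/etc/ceph/ceph.conf");
        conf.set("ceph.auth.id", "admin");

        // Listing the root is enough to confirm the bindings can reach the
        // cluster; a real job would simply read and write paths as usual.
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}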
> On 03/29/2015 09:19 PM, Nick Fisk wrote:
>> There's probably a middle ground where you get the best of both worlds.
>> Maybe 2-4 OSDs per compute node alongside dedicated Ceph nodes. That way
>> you get a bit of extra storage and can still use lower-end CPUs, but don't
>> have to worry so much about resource contention.
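
If you do colocate, it also helps to put a hard ceiling on what each OSD may
consume, so a deep scrub or recovery can't starve the guests. A rough sketch,
assuming the OSDs run under systemd as ceph-osd@<id>.service (directive names
per reasonably recent systemd; the numbers are placeholders, and older
sysvinit/upstart setups would need their cgroups configured by hand):

# /etc/systemd/system/ceph-osd@.service.d/limits.conf  (drop-in override)
[Service]
# Cap each OSD at roughly two cores' worth of CPU time.
CPUQuota=200%
# Keep OSDs off the cores reserved for VM vCPUs.
CPUAffinity=0 1 2 3
# De-prioritise OSD work relative to qemu under contention.
Nice=5

# apply with: systemctl daemon-reload && systemctl restart ceph-osd@<id>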
>>
>>> -----Original Message-----
>>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
>>> Martin Millnert
>>> Sent: 29 March 2015 19:58
>>> To: Mark Nelson
>>> Cc: ceph-users@xxxxxxxxxxxxxx
>>> Subject: Re:  running Qemu / Hypervisor AND Ceph on the same
>>> nodes
>>>
>>> On Thu, Mar 26, 2015 at 12:36:53PM -0500, Mark Nelson wrote:
>>>> Having said that, small nodes are
>>>> absolutely more expensive per OSD as far as raw hardware and
>>>> power/cooling goes.
>>>
>>> The smaller the volume manufacturers move on a unit, the worse the margin
>>> typically is (from the buyer's side).  Also, CPUs typically carry a premium
>>> the higher up the range you go.  I've found a lot of local maxima,
>>> optimization-wise, over the past years, both in 12 OSD/U and 18 OSD/U
>>> dedicated storage node setups, for instance.
>>>   There may be local maxima along colocated low-scale storage/compute
>>> nodes, but the one major problem with colocating storage with compute is
>>> that you can't scale compute independently from storage efficiently using
>>> that building block alone.  There may be temporal optimizations in doing
>>> so, however (e.g. before you have reached sufficient scale).
>>>
>>> There's no single optimal answer when you're dealing with 20+ variables to
>>> consider... :)
>>>
>>> BR,
>>> Martin
>>
>>
>>
>>
>



-- 
Best Regards,

Wheat
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



