ceph data locality

Hi All,
        I was reading more about Hadoop over Ceph. I heard from Noah that
tuning of Hadoop on Ceph is in progress. I am curious whether there is a
reason to keep the default object size at 64MB. Is it because it becomes
difficult to encode getBlockLocations if blocks are split across objects,
and to choose the best location for tasks if no node in the system has a
complete block?
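For what it's worth, if the object size is just the CephFS file layout default, it looks like it can be inspected and overridden per file or per directory through the layout virtual xattrs (a sketch, assuming a CephFS kernel or FUSE mount at /mnt/cephfs; the path and values are only examples):

```shell
# Inspect the current layout (stripe unit/count, object size, pool)
getfattr -n ceph.file.layout /mnt/cephfs/hadoop/data.0

# Set a 64MB object size on a directory so new files inherit it
setfattr -n ceph.dir.layout.object_size -v 67108864 /mnt/cephfs/hadoop
```

If that is the case, benchmarking different object sizes would just mean writing test files under directories with different layouts.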

I am wondering if anyone has benchmark results for various object sizes.
If you do, it would be helpful if you could share them.

I see that Ceph doesn't place objects considering the client location or
the distance between the client and the OSDs where the data is stored
(data locality), whereas data locality is the key idea behind HDFS block
placement and retrieval for maximum throughput. So, how does Ceph plan to
perform better than HDFS, given that Ceph relies on pseudo-random
placement using hashing, unlike HDFS's locality-aware block placement?
Can someone also point to performance results comparing Ceph's random
placement with HDFS's locality-aware placement?

Also, Sage wrote about a way to designate a node as primary for
Hadoop-like environments
(http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/1548). Is
this done through the primary affinity configuration?
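If it is primary affinity, I believe it is adjusted per OSD like this (a sketch; osd.0 and the weight are only examples, and I think older releases also needed the mon option allowing primary affinity to be enabled):

```shell
# Lower osd.0's chance of being chosen as primary (range 0.0 - 1.0)
ceph osd primary-affinity osd.0 0.5
```

But that only biases which replica acts as primary cluster-wide; it doesn't make placement client-aware, so I'd still like to understand how it helps a Hadoop-style workload.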

Thanks,
Johnu
