On Fri, Jul 19, 2013 at 8:09 AM, ker can <kercan74@xxxxxxxxx> wrote:
>
> With ceph is there any way to influence the data block placement for a
> single file ?

AFAIK, no... But this is an interesting twist. New files written out to
HDFS will, by default, store one local copy and two remote copies. This is
great for MapReduce, as the reducers can avoid an extra remote write, and
read locality doesn't matter at write time; it gets handled later, when a
map-reduce job is scheduled against the data (there's a rough sketch of the
relevant client API at the end of this message). For the sake of
simplicity, we were willing to pass over that optimization and accept the
33% extra remote writes (the one write in three that HDFS would have kept
local).

HBase, on the other hand, exploits this locality on a long-term basis, so
it is a different case. There might be some optimization tricks you could
play. For instance, a full table compaction followed by an HBase restart
should end up placing the regions on the servers where they will achieve
the most locality (second sketch below), but this obviously doesn't help
for new data coming in. A brief look at the HBase documentation shows a lot
of knobs related to caching, etc., so more tricks might be possible there.

Unfortunately, I'm not sure there is a solution to the general problem
without fundamental changes to cephfs. However, maybe someone else who has
more detailed knowledge can chime in.
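
For reference, here's roughly what I mean about the HDFS write path: the
client only really controls the replication count and block size; which
datanodes end up holding the replicas (one local, two remote by default) is
decided by the namenode's placement policy, not by the caller. Just a
sketch against the Hadoop FileSystem API; the path, buffer size and block
size are made up.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // The caller gets to pick the replication count and block size...
            Path path = new Path("/tmp/example.dat");   // made-up path
            short replication = 3;
            long blockSize = 64L * 1024 * 1024;
            FSDataOutputStream out =
                fs.create(path, true /* overwrite */, 4096, replication, blockSize);
            out.writeBytes("some data\n");
            out.close();

            // ...but where those replicas land (one local + two remote on a
            // stock HDFS install) is chosen by the namenode, not by this code.
            fs.close();
        }
    }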
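
And the "full table compaction" trick above is just a major compaction; you
can kick one off from the hbase shell (major_compact 'mytable') or from the
client API. Another sketch; the table name is made up and the admin classes
have moved around between HBase versions, so treat it as illustrative.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CompactSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);

            // Ask every region of the table to major-compact; the request is
            // asynchronous and the region servers do the work in the background.
            admin.majorCompact("mytable");   // made-up table name

            admin.close();
        }
    }

Note that the compaction itself just rewrites the store files; any locality
benefit depends on where the newly written copies end up, which is exactly
the part cephfs doesn't let you steer today.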