On Fri, Jul 19, 2013 at 8:09 AM, ker can <kercan74@xxxxxxxxx> wrote:
>
> With ceph is there any way to influence the data block placement for a
> single file ?

AFAIK, no... But this is an interesting twist. New files written out to
HDFS will, by default, store one local copy and two remote copies. This is
great for MapReduce, as the reducers can avoid an extra remote write, and
read locality doesn't matter at write time; it gets handled later, when a
map-reduce job is scheduled against the data (there's a rough sketch of the
relevant client API at the end of this message). For the sake of
simplicity, we were willing to pass over that optimization and accept the
33% extra remote writes (the one write in three that HDFS would have kept
local).

HBase, on the other hand, exploits this locality on a long-term basis, so
it is a different case. There might be some optimization tricks you could
play. For instance, a full table compaction followed by an HBase restart
should end up placing the regions on the servers where they will achieve
the most locality (second sketch below), but this obviously doesn't help
for new data coming in. A brief look at the HBase documentation shows a lot
of knobs related to caching, etc., so more tricks might be possible there.

Unfortunately, I'm not sure there is a solution to the general problem
without fundamental changes to cephfs. However, maybe someone else who has
more detailed knowledge can chime in.
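
For reference, here's roughly what I mean about the HDFS write path: the
client only really controls the replication count and block size; which
datanodes end up holding the replicas (one local, two remote by default) is
decided by the namenode's placement policy, not by the caller. Just a
sketch against the Hadoop FileSystem API; the path, buffer size and block
size are made up.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // The caller gets to pick the replication count and block size...
            Path path = new Path("/tmp/example.dat");   // made-up path
            short replication = 3;
            long blockSize = 64L * 1024 * 1024;
            FSDataOutputStream out =
                fs.create(path, true /* overwrite */, 4096, replication, blockSize);
            out.writeBytes("some data\n");
            out.close();

            // ...but where those replicas land (one local + two remote on a
            // stock HDFS install) is chosen by the namenode, not by this code.
            fs.close();
        }
    }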
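
And the "full table compaction" trick above is just a major compaction; you
can kick one off from the hbase shell (major_compact 'mytable') or from the
client API. Another sketch; the table name is made up and the admin classes
have moved around between HBase versions, so treat it as illustrative.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CompactSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);

            // Ask every region of the table to major-compact; the request is
            // asynchronous and the region servers do the work in the background.
            admin.majorCompact("mytable");   // made-up table name

            admin.close();
        }
    }

Note that the compaction itself just rewrites the store files; any locality
benefit depends on where the newly written copies end up, which is exactly
the part cephfs doesn't let you steer today.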