On Thu, Jul 18, 2013 at 3:13 PM, ker can <kercan74@xxxxxxxxx> wrote:
the hbase+hdfs throughput results were 38x better. Any thoughts on what might be going on?
Looks like this might be a data locality issue. After loading the table, when I look at the data block map of a region's store files, it's spread out across disks on all the nodes. For my test 'usertable' hbase table, OSDs 0-6 are on one node and OSDs 7-13 are on the other. This is the map of store file "0c43d345e3ea42abb5ce5a98b162218a" in region "da3b3bf6c0c5a9b387d23944122f208b":
hadoop@dmse-141:/mnt/mycephfs/hbase/usertable/da3b3bf6c0c5a9b387d23944122f208b/family$ cephfs 0c43d345e3ea42abb5ce5a98b162218a map
FILE OFFSET    OBJECT                  OFFSET    LENGTH      OSD
0              10000001abd.00000000    0         67108864    2
67108864       10000001abd.00000001    0         67108864    4
134217728      10000001abd.00000002    0         67108864    8
201326592      10000001abd.00000003    0         67108864    6
268435456      10000001abd.00000004    0         67108864    3
335544320      10000001abd.00000005    0         67108864    6
402653184      10000001abd.00000006    0         67108864    9
469762048      10000001abd.00000007    0         67108864    9
536870912      10000001abd.00000008    0         67108864    0
603979776      10000001abd.00000009    0         67108864    2
671088640      10000001abd.0000000a    0         67108864    8
738197504      10000001abd.0000000b    0         67108864    13
805306368      10000001abd.0000000c    0         67108864    1
872415232      10000001abd.0000000d    0         67108864    1
939524096      10000001abd.0000000e    0         67108864    3
1006632960     10000001abd.0000000f    0         67108864    7
1073741824     10000001abd.00000010    0         67108864    3
1140850688     10000001abd.00000011    0         67108864    13
1207959552     10000001abd.00000012    0         67108864    13
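As a quick way to quantify the spread, you can tally the OSD column of that map and then check which host each OSD sits under (just a rough sketch that counts the last field of the output):

cephfs 0c43d345e3ea42abb5ce5a98b162218a map | awk 'NR > 1 {n[$NF]++} END {for (osd in n) print "osd." osd, n[osd], "objects"}'
ceph osd tree    # shows which host each OSD lives on

For this store file that comes out to objects on 10 distinct OSDs spread across both nodes.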
For HBase+HDFS, all blocks within a single region were on the same region server/datanode, so in the region server stats with HDFS you see a 100% data locality index and much better cache hit ratios.
HBase + HDFS region server stats:
blockCacheSizeMB=201.31, blockCacheFreeMB=45.57, blockCacheCount=3013,
blockCacheHitCount=9464863, blockCacheMissCount=10633061, blockCacheEvictedCount=9305729, blockCacheHitRatio=47%, blockCacheHitCachingRatio=50%,
hdfsBlocksLocalityIndex=100,
HBase + Ceph region server stats:
blockCacheSizeMB=205.59, blockCacheFreeMB=41.29, blockCacheCount=2989,
blockCacheHitCount=1038372, blockCacheMissCount=1042117, blockCacheEvictedCount=397801, blockCacheHitRatio=49%, blockCacheHitCachingRatio=72%,
hdfsBlocksLocalityIndex=47
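On the HDFS side you can see the per-block locations that feed that locality index with fsck; roughly something like the following, where the region and store file names are placeholders to substitute with a real path under the HBase root dir:

hadoop fsck /hbase/usertable/<region>/family/<storefile> -files -blocks -locations    # substitute a real region + store file

When locality is 100%, every block of the store file lists the region server's own datanode as one of its replicas.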
With Ceph, is there any way to influence the data block placement for a single file?