I have an HPC cluster composed of 4 storage nodes (8x 24TB RAID6 bricks, 2 per node) and 62 compute nodes, interconnected via InfiniBand QDR.
NB: each brick provides around 1.2-1.5 GB/s of write performance.
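That figure is raw sequential write performance measured directly on the brick filesystems, with something along the lines of the dd command below (the target path, block size and count are just illustrative):

# example only: write a 10 GiB file with direct I/O onto one brick
dd if=/dev/zero of=/export/brick_home/brick1/ddtest.bin bs=1M count=10240 oflag=direct

Here is the volume configuration: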
Volume Name: vol_home
Type: Distributed-Replicate
Volume ID: f6ebcfc1-b735-4a0e-b1d7-47ed2d2e7af6
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp,rdma
Bricks:
Brick1: ib-storage1:/export/brick_home/brick1
Brick2: ib-storage2:/export/brick_home/brick1
Brick3: ib-storage3:/export/brick_home/brick1
Brick4: ib-storage4:/export/brick_home/brick1
Brick5: ib-storage1:/export/brick_home/brick2
Brick6: ib-storage2:/export/brick_home/brick2
Brick7: ib-storage3:/export/brick_home/brick2
Brick8: ib-storage4:/export/brick_home/brick2
Options Reconfigured:
features.quota: on
diagnostics.brick-log-level: CRITICAL
auth.allow: localhost,127.0.0.1,10.*
nfs.disable: on
performance.cache-size: 64MB
performance.write-behind-window-size: 1MB
performance.quick-read: on
performance.io-cache: on
performance.io-thread-count: 64
features.default-soft-limit: 90%
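For reference, the volume was created as distributed-replicate over both transports; reconstructing from the info above (this is just the form of the command, not a copy of my shell history), the create step looks like:

gluster volume create vol_home replica 2 transport tcp,rdma \
  ib-storage1:/export/brick_home/brick1 ib-storage2:/export/brick_home/brick1 \
  ib-storage3:/export/brick_home/brick1 ib-storage4:/export/brick_home/brick1 \
  ib-storage1:/export/brick_home/brick2 ib-storage2:/export/brick_home/brick2 \
  ib-storage3:/export/brick_home/brick2 ib-storage4:/export/brick_home/brick2

With replica 2 the bricks are paired in the order given, so each replica pair sits on two different storage nodes.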
But when I mount the volume on a cluster node and specify the RDMA transport type, I notice that all my communication still goes through the TCP stack (all the network packets show up on the ib0 IPoIB interface with the ifstat command) instead of going over RDMA:
[root@lucifer ~]# mount -t glusterfs -o transport=rdma,direct-io-mode=disable localhost:vol_home /home
[root@lucifer ~]# mount|grep vol_home.rdma
localhost:vol_home.rdma on /home type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
[root@lucifer ~]# ifstat -i ib0
       ib0
 KB/s in  KB/s out
25313.60   6776.44
26258.96   9064.92
28272.97  10034.15
23495.09   8504.84
21842.41   7161.69
^C
So the best throughput I have observed is around 400 MB/s, and it is typically closer to 200-250 MB/s, whereas from what I can read on the net I should be able to reach around 800-900 MB/s (sometimes more) with the RDMA transport type.
Can anyone help me get RDMA working?
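PS: a way to double-check whether traffic is really going over RDMA (sketch, assuming infiniband-diags is installed; mlx4_0 and port 1 are just examples, your HCA name may differ) is to watch the raw InfiniBand port counters next to ib0:

# raw IB port data counters (in units of 4 bytes); these grow for both RDMA
# and IPoIB traffic, while ifstat on ib0 only sees IPoIB (i.e. TCP) traffic
cat /sys/class/infiniband/mlx4_0/ports/1/counters/port_xmit_data
cat /sys/class/infiniband/mlx4_0/ports/1/counters/port_rcv_data
# or, with infiniband-diags:
perfquery -x

If GlusterFS were really using RDMA, ib0 should stay mostly quiet while the port_*_data counters keep climbing; in my case ib0 tracks the whole transfer, so everything is going through TCP.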