I have an HPC cluster composed of 4 storage nodes (8x 24TB RAID6 bricks, 2 per node) and 62 compute nodes, interconnected via InfiniBand QDR.
NB: each brick provides around 1.2-1.5 GB/s of write performance.
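(For reference, a per-brick figure like that can be reproduced with a simple streaming write run directly on the brick filesystem; a sketch, with an arbitrary test file name and size:)

  # sequential write to one brick, bypassing the page cache
  dd if=/dev/zero of=/export/brick_home/brick1/ddtest.bin bs=1M count=16384 oflag=direct
  rm -f /export/brick_home/brick1/ddtest.bin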
Volume Name: vol_home
Type: Distributed-Replicate
Volume ID: f6ebcfc1-b735-4a0e-b1d7-47ed2d2e7af6
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp,rdma
Bricks:
Brick1: ib-storage1:/export/brick_home/brick1
Brick2: ib-storage2:/export/brick_home/brick1
Brick3: ib-storage3:/export/brick_home/brick1
Brick4: ib-storage4:/export/brick_home/brick1
Brick5: ib-storage1:/export/brick_home/brick2
Brick6: ib-storage2:/export/brick_home/brick2
Brick7: ib-storage3:/export/brick_home/brick2
Brick8: ib-storage4:/export/brick_home/brick2
Options Reconfigured:
features.quota: on
diagnostics.brick-log-level: CRITICAL
auth.allow: localhost,127.0.0.1,10.*
nfs.disable: on
performance.cache-size: 64MB
performance.write-behind-window-size: 1MB
performance.quick-read: on
performance.io-cache: on
performance.io-thread-count: 64
features.default-soft-limit: 90%
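For completeness, a volume with this layout and both transports would have been created with something along these lines (a sketch reconstructed from the brick list above; the exact original command may have differed):

  gluster volume create vol_home replica 2 transport tcp,rdma \
      ib-storage1:/export/brick_home/brick1 ib-storage2:/export/brick_home/brick1 \
      ib-storage3:/export/brick_home/brick1 ib-storage4:/export/brick_home/brick1 \
      ib-storage1:/export/brick_home/brick2 ib-storage2:/export/brick_home/brick2 \
      ib-storage3:/export/brick_home/brick2 ib-storage4:/export/brick_home/brick2
  gluster volume start vol_home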
But when I mount the volume on a cluster node specifying the RDMA transport type, I notice that all the traffic goes through the TCP stack rather than RDMA: all the network packets are visible on the ib0 interface with the ifstat shell command.
[root@lucifer ~]# mount -t glusterfs -o transport=rdma,direct-io-mode=disable localhost:vol_home /home
[root@lucifer ~]# mount | grep vol_home.rdma
localhost:vol_home.rdma on /home type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
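To double-check which transport the FUSE client actually negotiated, the client log can be inspected (a sketch; the log file name is derived from the mount point, so /home is assumed to end up in /var/log/glusterfs/home.log):

  # connection and volfile messages mention the transport in use
  grep -iE 'rdma|transport' /var/log/glusterfs/home.log | tail -n 20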
[root@lucifer ~]# ifstat -i ib0
       ib0
 KB/s in  KB/s out
25313.60   6776.44
26258.96   9064.92
28272.97  10034.15
23495.09   8504.84
21842.41   7161.69
^C
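Native RDMA traffic would not appear on ib0 at all, so another cross-check is to watch the HCA port counters directly (a sketch; the device name mlx4_0 and port 1 are assumptions for a QDR HCA, and port_rcv_data/port_xmit_data count in units of 4 octets):

  # IPoIB traffic (TCP path) as counted by the kernel network stack
  cat /sys/class/net/ib0/statistics/rx_bytes /sys/class/net/ib0/statistics/tx_bytes
  # raw HCA port counters, which also include native RDMA traffic
  cat /sys/class/infiniband/mlx4_0/ports/1/counters/port_rcv_data \
      /sys/class/infiniband/mlx4_0/ports/1/counters/port_xmit_data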
So my best observed throughput is around 400 MB/s, and it is usually more like 200-250 MB/s, although from what I read I should be able to reach around 800-900 MB/s (sometimes more) with the RDMA transport type.
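The raw InfiniBand fabric itself can be ruled out with the perftest tools (a sketch; ib_write_bw comes from the perftest package, and ib-storage1 paired with a compute node is just one example):

  # on ib-storage1 (server side)
  ib_write_bw
  # on a compute node (client side)
  ib_write_bw ib-storage1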
Can anyone help me to make it work?