tips/best practices for gluster rdma?

matthew_nicholson at harvard.edu (Matthew Nicholson) · Tue, 9 Jul 2013 15:03:04 -0400

Hey guys,

So, we're testing Gluster RDMA storage, and are having some issues. Things
are working, just not as we expected them to. There isn't a whole lot in the
way of docs for gluster rdma that I've found, aside from basically
"install gluster-rdma", create a volume with transport=rdma, and mount w/
transport=rdma.

I've done that...and the IB fabric is known to be good...however, a volume
created with transport=rdma,tcp and mounted w/ transport=rdma, still seems
to go over tcp?

A little more info about the setup:

We've got 10 storage nodes/bricks, each of which has a single 1GbE NIC and an
FDR IB port. Likewise for the test clients. The 1GbE NIC is for
management only, and we have all of the systems on this fabric configured
with IPoIB, so there is eth0 and ib0 on each node.

All storage nodes are peer'd using the ib0 interface, ie:

gluster peer probe storage_node01-ib
etc

That's all well and good.

Volume was created:

gluster volume create holyscratch transport rdma,tcp \
    holyscratch01-ib:/holyscratch01/brick
for i in `seq -w 2 10` ; do
    gluster volume add-brick holyscratch holyscratch${i}-ib:/holyscratch${i}/brick
done

yielding:

Volume Name: holyscratch
Type: Distribute
Volume ID: 788e74dc-6ae2-4aa5-8252-2f30262f0141
Status: Started
Number of Bricks: 10
Transport-type: tcp,rdma
Bricks:
Brick1: holyscratch01-ib:/holyscratch01/brick
Brick2: holyscratch02-ib:/holyscratch02/brick
Brick3: holyscratch03-ib:/holyscratch03/brick
Brick4: holyscratch04-ib:/holyscratch04/brick
Brick5: holyscratch05-ib:/holyscratch05/brick
Brick6: holyscratch06-ib:/holyscratch06/brick
Brick7: holyscratch07-ib:/holyscratch07/brick
Brick8: holyscratch08-ib:/holyscratch08/brick
Brick9: holyscratch09-ib:/holyscratch09/brick
Brick10: holyscratch10-ib:/holyscratch10/brick
Options Reconfigured:
nfs.disable: on
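
For comparison, the same ten-brick layout could probably be built in a single create call instead of a create followed by nine add-brick calls. This is only a sketch: the helper name brick_list is made up for illustration, the hostnames and paths just follow the pattern shown above, and the command is printed rather than run so it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print the brick list for nodes FIRST..LAST,
# following the holyscratchNN-ib:/holyscratchNN/brick pattern used above.
brick_list() {
    for i in $(seq -w "$1" "$2"); do
        printf '%s ' "holyscratch${i}-ib:/holyscratch${i}/brick"
    done
}

# Print (do not run) the create command with all ten bricks.
echo gluster volume create holyscratch transport rdma,tcp $(brick_list 1 10)
```

The resulting volume should look the same as the one above; the only difference is that the bricks are all supplied at creation time.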

For testing, we wanted to see how rdma stacked up vs tcp using IPoIB, so we
mounted this like:

[root at holy2a01202 holyscratch.tcp]# df -h |grep holyscratch
holyscratch:/holyscratch
                      273T  4.1T  269T   2% /n/holyscratch.tcp
holyscratch:/holyscratch.rdma
                      273T  4.1T  269T   2% /n/holyscratch.rdma

so, 2 mounts, same volume different transports. fstab looks like:

holyscratch:/holyscratch  /n/holyscratch.tcp   glusterfs  transport=tcp,fetch-attempts=10,gid-timeout=2,acl,_netdev   0  0
holyscratch:/holyscratch  /n/holyscratch.rdma  glusterfs  transport=rdma,fetch-attempts=10,gid-timeout=2,acl,_netdev  0  0

where holyscratch is a round-robin DNS entry covering all the IPoIB
interfaces, used for fetching the volfile (something that, it seems, just
like peering, MUST be tcp?)
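
Since the volfile fetch goes over TCP either way, one rough way to tell the data paths apart might be to list the TCP peers held by each glusterfs client process. The sketch below assumes the usual `ss -tnp` column order (State, Recv-Q, Send-Q, Local, Peer, Process); the function name gluster_tcp_peers is made up for illustration.

```shell
#!/bin/sh
# Filter `ss -tnp` output on stdin down to "peer-address process-info"
# for sockets owned by a glusterfs client process. The quoted match
# skips glusterfsd (brick) and glusterd (management) processes.
gluster_tcp_peers() {
    awk '/"glusterfs"/ { print $5, $6 }'
}

# Hypothetical usage on a client (needs root to see process info):
#   ss -tnp | gluster_tcp_peers | sort | uniq -c
```

My expectation, not something I've confirmed on 3.4beta4, is that the process behind the rdma mount would hold only its management connection over TCP. If it shows one TCP connection per brick, the data is most likely going over tcp.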

But, again, when running just dumb, dumb, dumb tests (160 threads of dd over
8 nodes w/ each thread writing 64GB, so a 10TB throughput test), I'm seeing
all the traffic on the IPoIB interface for both the RDMA and TCP
transports, when I really shouldn't be seeing ANY tcp traffic on the IPoIB
interface when using RDMA as a transport, aside from volfile
fetches/management...right? As a result, from early tests (the bigger 10TB
ones are running now), the tcp and rdma speeds were basically the same, when
I would expect the RDMA one to be at least slightly faster.
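
One way to check which path the data actually takes is to compare the IPoIB netdev byte counter with the HCA port counter across a test run. IPoIB traffic shows up in both, while native RDMA traffic should show up only in the port counter. A minimal sketch, assuming the standard Linux sysfs counter files; the device name mlx4_0 and port 1 in the usage comment are placeholders, and the function names are made up for illustration.

```shell
#!/bin/sh
# Read one counter file; print 0 if it is missing or unreadable.
read_counter() {
    cat "$1" 2>/dev/null || echo 0
}

# port_xmit_data counts 4-byte words, so convert to bytes.
words_to_bytes() {
    echo $(( $1 * 4 ))
}

# Print "ipoib_bytes port_bytes" sent so far.
# $1 = netdev statistics dir, $2 = HCA port counters dir
snapshot() {
    ipoib=$(read_counter "$1/tx_bytes")
    port=$(words_to_bytes "$(read_counter "$2/port_xmit_data")")
    echo "$ipoib $port"
}

# Hypothetical usage (device and port names are assumptions):
#   snapshot /sys/class/net/ib0/statistics \
#            /sys/class/infiniband/mlx4_0/ports/1/counters
```

Take a snapshot before and after the dd run and subtract. If the two deltas are roughly equal, the data went over IPoIB; if the port delta is much larger, it went over native RDMA. On some stacks these port counters are 32-bit and saturate quickly, in which case the extended 64-bit counters (for example via perfquery from infiniband-diags) would be the better source.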

Oh, and this is all 3.4beta4, on both the clients and storage nodes.

So, I guess my questions are:

Is this expected/normal?
Is peering/volfile fetching always tcp based?
How should one peer nodes in a RDMA setup?
Should this be tried with only RDMA as a transport on the volume?
Are there more detailed docs for RDMA gluster coming w/ the 3.4 release?

--
Matthew Nicholson
Research Computing Specialist
Harvard FAS Research Computing
matthew_nicholson at harvard.edu