I am testing Infiniband for the first time. It seems that I should be able to get a lot more speed than I am with some pretty basic tests. Maybe someone running Infiniband can confirm that what I am seeing is way out of line, and/or help diagnose?

I have two systems connected using 3.1.2qa3. With 3.1.1, Infiniband wouldn't even start; it gave an error about being unable to initialize rdma. But with the latest version and an upgrade to OFED 1.5.2, everything starts up with no errors and I can create a volume and mount it.

The underlying Infiniband seems ok, and a basic ibv_rc_pingpong test shows I can move data pretty fast:

81920000 bytes in 0.23 seconds = 2858.45 Mbit/sec
10000 iters in 0.23 seconds = 22.93 usec/iter

So now I have two volumes created, one that uses tcp over a gig-e link and one that uses rdma. I mount them and do some file copy tests... and they are almost exactly the same? What?

gluster volume info

Volume Name: test2_volume
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: bravo:/cluster/shadow/test2
Brick2: backup:/cluster/shadow/test2

Volume Name: test_volume
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: rdma
Bricks:
Brick1: bravo:/cluster/shadow/test
Brick2: backup:/cluster/shadow/test

mount:
glusterfs#localhost:/test_volume on /mnt/test type fuse (rw,allow_other,default_permissions,max_read=131072)
glusterfs#localhost:/test2_volume on /mnt/test2 type fuse (rw,allow_other,default_permissions,max_read=131072)

time cp files.tar /mnt/test2/

real    0m11.159s
user    0m0.123s
sys     0m1.244s

files.tar is a single file, 390MB, so this is about 35MB/s. Fine for gig-e.

----------------------------

time cp files.tar /mnt/test/

real    0m5.656s
user    0m0.116s
sys     0m0.962s

69MB/s... ehhh. Faster at least. On a few runs, this was not any faster at all. Maybe a cache effect?

----------------------------

time cp -av /usr/src/kernels /mnt/test2/

real    0m49.605s
user    0m0.681s
sys     0m2.593s

The kernels dir is 34MB of small files.
The low latency of IB should really show an improvement here, I thought.

-----------------------------

time cp -av /usr/src/kernels /mnt/test/

real    0m56.046s
user    0m0.625s
sys     0m2.675s

It took LONGER? That can't be right.

------------------------------

And finally, this error is appearing in the rdma mount log every 3 seconds on both nodes:

[2011-01-10 19:46:56.728127] E [rdma.c:4428:tcp_connect_finish] test_volume-client-1: tcp connect to failed (Connection refused)
[2011-01-10 19:46:59.738291] E [rdma.c:4428:tcp_connect_finish] test_volume-client-1: tcp connect to failed (Connection refused)
[2011-01-10 19:47:02.748260] E [rdma.c:4428:tcp_connect_finish] test_volume-client-1: tcp connect to failed (Connection refused)
[2011-01-10 19:47:05.758256] E [rdma.c:4428:tcp_connect_finish] test_volume-client-1: tcp connect to failed (Connection refused)
[2011-01-10 19:47:08.768299] E [rdma.c:4428:tcp_connect_finish] test_volume-client-1: tcp connect to failed (Connection refused)
[2011-01-10 19:47:11.778308] E [rdma.c:4428:tcp_connect_finish] test_volume-client-1: tcp connect to failed (Connection refused)
[2011-01-10 19:47:14.788356] E [rdma.c:4428:tcp_connect_finish] test_volume-client-1: tcp connect to failed (Connection refused)
[2011-01-10 19:47:17.798381] E [rdma.c:4428:tcp_connect_finish] test_volume-client-1: tcp connect to failed (Connection refused)
[2011-01-10 19:47:20.808413] E [rdma.c:4428:tcp_connect_finish] test_volume-client-1: tcp connect to failed (Connection refused)

But there are no restrictions in the config. Everything is allow *.

So my questions are: can anyone else tell me what kind of basic file copy performance they see using IB? And what can I do to troubleshoot?

Thanks, list and devs,
Chris
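One thing I notice about that log message (just an observation, not a diagnosis): there is nothing between "connect to" and "failed", i.e. the peer address in the message looks empty. A quick sed over one of the lines makes it visible:

```shell
# Extract whatever sits between "tcp connect to" and "failed" in the
# rdma client's log message; if it is blank, the client apparently has
# no peer address for its tcp control connection.
line='[2011-01-10 19:46:56.728127] E [rdma.c:4428:tcp_connect_finish] test_volume-client-1: tcp connect to failed (Connection refused)'
echo "$line" | sed -n 's/.*tcp connect to\(.*\)failed.*/peer=[\1]/p'
# prints: peer=[ ]
```

If the address really is empty, that would explain "Connection refused" regardless of the allow * setting: the client never learned where to connect. Checking what glusterd and the brick processes are actually listening on (e.g. netstat -tlnp on both nodes) might be a useful next step.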