Hi Jeremy,

Yes, there might be some decrease in performance, but it should not affect the working of RDMA.

regards,

----- Original Message -----
From: "Jeremy Stout" <stout.jeremy at gmail.com>
To: gluster-users at gluster.org
Sent: Thursday, December 2, 2010 8:30:20 AM
Subject: Re: RDMA Problems with GlusterFS 3.1.1

As an update to my situation, I think I have GlusterFS 3.1.1 working now. I was able to create and mount RDMA volumes without any errors.

To fix the problem, I had to make the following changes on lines 3562 and 3563 in rdma.c:
options->send_count = 32;
options->recv_count = 32;

The values were set to 128.

I'll run some tests tomorrow to verify that it is working correctly. Assuming it does, what would be the expected side-effect of changing the values from 128 to 32? Will there be a decrease in performance?

On Wed, Dec 1, 2010 at 10:07 AM, Jeremy Stout <stout.jeremy at gmail.com> wrote:
> Here are the results of the test:
> submit-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs # ibv_srq_pingpong
>   local address:  LID 0x0002, QPN 0x000406, PSN 0x703b96, GID ::
>   local address:  LID 0x0002, QPN 0x000407, PSN 0x618cc8, GID ::
>   local address:  LID 0x0002, QPN 0x000408, PSN 0xd62272, GID ::
>   local address:  LID 0x0002, QPN 0x000409, PSN 0x5db5d9, GID ::
>   local address:  LID 0x0002, QPN 0x00040a, PSN 0xc51978, GID ::
>   local address:  LID 0x0002, QPN 0x00040b, PSN 0x05fd7a, GID ::
>   local address:  LID 0x0002, QPN 0x00040c, PSN 0xaa4a51, GID ::
>   local address:  LID 0x0002, QPN 0x00040d, PSN 0xb7a676, GID ::
>   local address:  LID 0x0002, QPN 0x00040e, PSN 0x56bde2, GID ::
>   local address:  LID 0x0002, QPN 0x00040f, PSN 0xa662bc, GID ::
>   local address:  LID 0x0002, QPN 0x000410, PSN 0xee27b0, GID ::
>   local address:  LID 0x0002, QPN 0x000411, PSN 0x89c683, GID ::
>   local address:  LID 0x0002, QPN 0x000412, PSN 0xd025b3, GID ::
>   local address:  LID 0x0002, QPN 0x000413, PSN 0xcec8e4, GID ::
>   local address:  LID 0x0002, QPN 0x000414, PSN 0x37e5d2, GID ::
>   local address:  LID 0x0002, QPN 0x000415, PSN 0x29562e, GID ::
>   remote address: LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID ::
>   remote address: LID 0x000b, QPN 0x000407, PSN 0x173320, GID ::
>   remote address: LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID ::
>   remote address: LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID ::
>   remote address: LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID ::
>   remote address: LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID ::
>   remote address: LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID ::
>   remote address: LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID ::
>   remote address: LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID ::
>   remote address: LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID ::
>   remote address: LID 0x000b, QPN 0x000410, PSN 0xd50228, GID ::
>   remote address: LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID ::
>   remote address: LID 0x000b, QPN 0x000412, PSN 0x0870eb, GID ::
>   remote address: LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID ::
>   remote address: LID 0x000b, QPN 0x000414, PSN 0x3eefca, GID ::
>   remote address: LID 0x000b, QPN 0x000415, PSN 0xbd64c6, GID ::
> 8192000 bytes in 0.01 seconds = 5917.47 Mbit/sec
> 1000 iters in 0.01 seconds = 11.07 usec/iter
>
> fs-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs # ibv_srq_pingpong submit-1
>   local address:  LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID ::
>   local address:  LID 0x000b, QPN 0x000407, PSN 0x173320, GID ::
>   local address:  LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID ::
>   local address:  LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID ::
>   local address:  LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID ::
>   local address:  LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID ::
>   local address:  LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID ::
>   local address:  LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID ::
>   local address:  LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID ::
>   local address:  LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID ::
>   local address:  LID 0x000b, QPN 0x000410, PSN 0xd50228, GID ::
>   local address:  LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID ::
>   local address:  LID 0x000b, QPN 0x000412, PSN 0x0870eb, GID ::
>   local address:  LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID ::
>   local address:  LID 0x000b, QPN 0x000414, PSN 0x3eefca, GID ::
>   local address:  LID 0x000b, QPN 0x000415, PSN 0xbd64c6, GID ::
>   remote address: LID 0x0002, QPN 0x000406, PSN 0x703b96, GID ::
>   remote address: LID 0x0002, QPN 0x000407, PSN 0x618cc8, GID ::
>   remote address: LID 0x0002, QPN 0x000408, PSN 0xd62272, GID ::
>   remote address: LID 0x0002, QPN 0x000409, PSN 0x5db5d9, GID ::
>   remote address: LID 0x0002, QPN 0x00040a, PSN 0xc51978, GID ::
>   remote address: LID 0x0002, QPN 0x00040b, PSN 0x05fd7a, GID ::
>   remote address: LID 0x0002, QPN 0x00040c, PSN 0xaa4a51, GID ::
>   remote address: LID 0x0002, QPN 0x00040d, PSN 0xb7a676, GID ::
>   remote address: LID 0x0002, QPN 0x00040e, PSN 0x56bde2, GID ::
>   remote address: LID 0x0002, QPN 0x00040f, PSN 0xa662bc, GID ::
>   remote address: LID 0x0002, QPN 0x000410, PSN 0xee27b0, GID ::
>   remote address: LID 0x0002, QPN 0x000411, PSN 0x89c683, GID ::
>   remote address: LID 0x0002, QPN 0x000412, PSN 0xd025b3, GID ::
>   remote address: LID 0x0002, QPN 0x000413, PSN 0xcec8e4, GID ::
>   remote address: LID 0x0002, QPN 0x000414, PSN 0x37e5d2, GID ::
>   remote address: LID 0x0002, QPN 0x000415, PSN 0x29562e, GID ::
> 8192000 bytes in 0.01 seconds = 7423.65 Mbit/sec
> 1000 iters in 0.01 seconds = 8.83 usec/iter
>
> Based on the output, I believe it ran correctly.
>
> On Wed, Dec 1, 2010 at 9:51 AM, Anand Avati <anand.avati at gmail.com> wrote:
>> Can you verify that ibv_srq_pingpong works from the server where this log
>> file is from?
>>
>> Thanks,
>> Avati
>>
>> On Wed, Dec 1, 2010 at 7:44 PM, Jeremy Stout <stout.jeremy at gmail.com> wrote:
>>>
>>> Whenever I try to start or mount a GlusterFS 3.1.1 volume that uses
>>> RDMA, I'm seeing the following error messages in the log file on the
>>> server:
>>> [2010-11-30 18:37:53.51270] I [nfs.c:652:init] nfs: NFS service started
>>> [2010-11-30 18:37:53.51362] W [dict.c:1204:data_to_str] dict: @data=(nil)
>>> [2010-11-30 18:37:53.51375] W [dict.c:1204:data_to_str] dict: @data=(nil)
>>> [2010-11-30 18:37:53.59628] E [rdma.c:2066:rdma_create_cq]
>>> rpc-transport/rdma: testdir-client-0: creation of send_cq failed
>>> [2010-11-30 18:37:53.59851] E [rdma.c:3771:rdma_get_device]
>>> rpc-transport/rdma: testdir-client-0: could not create CQ
>>> [2010-11-30 18:37:53.59925] E [rdma.c:3957:rdma_init]
>>> rpc-transport/rdma: could not create rdma device for mthca0
>>> [2010-11-30 18:37:53.60009] E [rdma.c:4789:init] testdir-client-0:
>>> Failed to initialize IB Device
>>> [2010-11-30 18:37:53.60030] E [rpc-transport.c:971:rpc_transport_load]
>>> rpc-transport: 'rdma' initialization failed
>>>
>>> On the client, I see:
>>> [2010-11-30 18:43:49.653469] W [io-stats.c:1644:init] testdir:
>>> dangling volume. check volfile
>>> [2010-11-30 18:43:49.653573] W [dict.c:1204:data_to_str] dict: @data=(nil)
>>> [2010-11-30 18:43:49.653607] W [dict.c:1204:data_to_str] dict: @data=(nil)
>>> [2010-11-30 18:43:49.736275] E [rdma.c:2066:rdma_create_cq]
>>> rpc-transport/rdma: testdir-client-0: creation of send_cq failed
>>> [2010-11-30 18:43:49.736651] E [rdma.c:3771:rdma_get_device]
>>> rpc-transport/rdma: testdir-client-0: could not create CQ
>>> [2010-11-30 18:43:49.736689] E [rdma.c:3957:rdma_init]
>>> rpc-transport/rdma: could not create rdma device for mthca0
>>> [2010-11-30 18:43:49.736805] E [rdma.c:4789:init] testdir-client-0:
>>> Failed to initialize IB Device
>>> [2010-11-30 18:43:49.736841] E
>>> [rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma'
>>> initialization failed
>>>
>>> This results in an unsuccessful mount.
>>>
>>> I created the mount using the following commands:
>>> /usr/local/glusterfs/3.1.1/sbin/gluster volume create testdir
>>> transport rdma submit-1:/exports
>>> /usr/local/glusterfs/3.1.1/sbin/gluster volume start testdir
>>>
>>> To mount the directory, I use:
>>> mount -t glusterfs submit-1:/testdir /mnt/glusterfs
>>>
>>> I don't think it is an Infiniband problem since GlusterFS 3.0.6 and
>>> GlusterFS 3.1.0 worked on the same systems. For GlusterFS 3.1.0, the
>>> commands listed above produced no error messages.
>>>
>>> If anyone can provide help with debugging these error messages, it
>>> would be appreciated.
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>
>>
>
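
Note on the send_count / recv_count change discussed in this thread: the "creation of send_cq failed" error on mthca0 went away once both values were lowered from 128 to 32. One plausible reading, which the thread does not confirm, is that the requested queue sizes were larger than the adapter could allocate. The C sketch below only illustrates the general idea of clamping a requested count to a device limit instead of hardcoding it. The function clamp_queue_count and its parameters are hypothetical and are not taken from rdma.c; on a real system the limit would have to come from the device attributes reported by libibverbs (for example the max_cqe field filled in by ibv_query_device).

```c
/* Hypothetical helper, NOT GlusterFS code: pick a queue count that does
 * not exceed a limit reported by the device.
 *
 * requested    - the count the transport would like to use (e.g. 128)
 * device_limit - the largest count the device accepts; a value below 1
 *                is treated here as "limit unknown" (an assumption of
 *                this sketch), in which case the request is left alone
 */
static int clamp_queue_count(int requested, int device_limit)
{
    if (requested < 1)
        return 1;                 /* always ask for at least one entry */
    if (device_limit >= 1 && requested > device_limit)
        return device_limit;      /* request is too large for the device */
    return requested;             /* request fits, or limit is unknown */
}
```

For example, clamp_queue_count(128, 32) yields 32, while clamp_queue_count(32, 128) leaves the request at 32. The thread does not state what the actual limit on the mthca0 adapter was, so the numbers here are illustrative only.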