Re: RDMA Problems with GlusterFS 3.1.1

The failure is that the number of completion queue elements we had requested
in ibv_create_cq (1024 * send_count = 1024 * 128 = 131072) exceeds the maximum
supported by the IB hardware (max_cqe = 131071). That also fits your earlier
observation that a count of 127 (130048 elements) works while 128 does not.
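
One way to avoid this failure is to clamp the requested CQE count to the
device limit before creating the completion queue. The sketch below is only an
illustration (it is not the attached patch, and the helper name is made up);
it just follows the 1024 * send_count sizing described above:

    #include <stddef.h>
    #include <infiniband/verbs.h>

    static struct ibv_cq *
    create_send_cq_clamped (struct ibv_context *ctx,
                            struct ibv_comp_channel *chan, int send_count)
    {
            struct ibv_device_attr attr;
            int cqe = 1024 * send_count;  /* 1024 * 128 = 131072 as requested */

            /* query the HCA and cap the request at its max_cqe (131071 here) */
            if (ibv_query_device (ctx, &attr) == 0 && cqe > attr.max_cqe)
                    cqe = attr.max_cqe;

            return ibv_create_cq (ctx, cqe, NULL, chan, 0);
    }

ibv_create_cq is allowed to allocate more entries than requested, so clamping
the request to max_cqe keeps the call from failing outright when send_count is
128.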

----- Original Message -----
From: "Jeremy Stout" <stout.jeremy at gmail.com>
To: "Raghavendra G" <raghavendra at gluster.com>
Cc: gluster-users at gluster.org
Sent: Friday, December 3, 2010 4:20:04 PM
Subject: Re: RDMA Problems with GlusterFS 3.1.1

I patched the source code and rebuilt GlusterFS. Here are the full logs:
Server:
[2010-12-03 07:08:55.945804] I [glusterd.c:275:init] management: Using
/etc/glusterd as working directory
[2010-12-03 07:08:55.947692] E [rdma.c:2047:rdma_create_cq]
rpc-transport/rdma: max_mr_size = 18446744073709551615, max_cq =
65408, max_cqe = 131071, max_mr = 131056
[2010-12-03 07:08:55.953226] E [rdma.c:2079:rdma_create_cq]
rpc-transport/rdma: rdma.management: creation of send_cq failed
[2010-12-03 07:08:55.953509] E [rdma.c:3785:rdma_get_device]
rpc-transport/rdma: rdma.management: could not create CQ
[2010-12-03 07:08:55.953582] E [rdma.c:3971:rdma_init]
rpc-transport/rdma: could not create rdma device for mthca0
[2010-12-03 07:08:55.953668] E [rdma.c:4803:init] rdma.management:
Failed to initialize IB Device
[2010-12-03 07:08:55.953691] E
[rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma'
initialization failed
[2010-12-03 07:08:55.953780] I [glusterd.c:96:glusterd_uuid_init]
glusterd: generated UUID: 4eb47ca7-227c-49c4-97bd-25ac177b2f6a
Given volfile:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option working-directory /etc/glusterd
  4:     option transport-type socket,rdma
  5:     option transport.socket.keepalive-time 10
  6:     option transport.socket.keepalive-interval 2
  7: end-volume
  8:

+------------------------------------------------------------------------------+
[2010-12-03 07:09:10.244790] I
[glusterd-handler.c:785:glusterd_handle_create_volume] glusterd:
Received create volume req
[2010-12-03 07:09:10.247646] I [glusterd-utils.c:232:glusterd_lock]
glusterd: Cluster lock held by 4eb47ca7-227c-49c4-97bd-25ac177b2f6a
[2010-12-03 07:09:10.247678] I
[glusterd-handler.c:2835:glusterd_op_txn_begin] glusterd: Acquired
local lock
[2010-12-03 07:09:10.247708] I
[glusterd3_1-mops.c:1091:glusterd3_1_cluster_lock] glusterd: Sent lock
req to 0 peers
[2010-12-03 07:09:10.248038] I
[glusterd3_1-mops.c:1233:glusterd3_1_stage_op] glusterd: Sent op req
to 0 peers
[2010-12-03 07:09:10.251970] I
[glusterd3_1-mops.c:1323:glusterd3_1_commit_op] glusterd: Sent op req
to 0 peers
[2010-12-03 07:09:10.252020] I
[glusterd3_1-mops.c:1145:glusterd3_1_cluster_unlock] glusterd: Sent
unlock req to 0 peers
[2010-12-03 07:09:10.252036] I
[glusterd-op-sm.c:4738:glusterd_op_txn_complete] glusterd: Cleared
local lock
[2010-12-03 07:09:22.11649] I
[glusterd-handler.c:936:glusterd_handle_cli_start_volume] glusterd:
Received start vol reqfor volume testdir
[2010-12-03 07:09:22.11724] I [glusterd-utils.c:232:glusterd_lock]
glusterd: Cluster lock held by 4eb47ca7-227c-49c4-97bd-25ac177b2f6a
[2010-12-03 07:09:22.11734] I
[glusterd-handler.c:2835:glusterd_op_txn_begin] glusterd: Acquired
local lock
[2010-12-03 07:09:22.11761] I
[glusterd3_1-mops.c:1091:glusterd3_1_cluster_lock] glusterd: Sent lock
req to 0 peers
[2010-12-03 07:09:22.12120] I
[glusterd3_1-mops.c:1233:glusterd3_1_stage_op] glusterd: Sent op req
to 0 peers
[2010-12-03 07:09:22.184403] I
[glusterd-utils.c:971:glusterd_volume_start_glusterfs] : About to
start glusterfs for brick pgh-submit-1:/mnt/gluster
[2010-12-03 07:09:22.229143] I
[glusterd3_1-mops.c:1323:glusterd3_1_commit_op] glusterd: Sent op req
to 0 peers
[2010-12-03 07:09:22.229198] I
[glusterd3_1-mops.c:1145:glusterd3_1_cluster_unlock] glusterd: Sent
unlock req to 0 peers
[2010-12-03 07:09:22.229218] I
[glusterd-op-sm.c:4738:glusterd_op_txn_complete] glusterd: Cleared
local lock
[2010-12-03 07:09:22.240157] I
[glusterd-pmap.c:281:pmap_registry_remove] pmap: removing brick (null)
on port 24009


Client:
[2010-12-03 07:09:00.82784] W [io-stats.c:1644:init] testdir: dangling
volume. check volfile
[2010-12-03 07:09:00.82824] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-12-03 07:09:00.82836] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-12-03 07:09:00.85980] E [rdma.c:2047:rdma_create_cq]
rpc-transport/rdma: max_mr_size = 18446744073709551615, max_cq =
65408, max_cqe = 131071, max_mr = 131056
[2010-12-03 07:09:00.92883] E [rdma.c:2079:rdma_create_cq]
rpc-transport/rdma: testdir-client-0: creation of send_cq failed
[2010-12-03 07:09:00.93156] E [rdma.c:3785:rdma_get_device]
rpc-transport/rdma: testdir-client-0: could not create CQ
[2010-12-03 07:09:00.93224] E [rdma.c:3971:rdma_init]
rpc-transport/rdma: could not create rdma device for mthca0
[2010-12-03 07:09:00.93313] E [rdma.c:4803:init] testdir-client-0:
Failed to initialize IB Device
[2010-12-03 07:09:00.93332] E [rpc-transport.c:971:rpc_transport_load]
rpc-transport: 'rdma' initialization failed
Given volfile:
+------------------------------------------------------------------------------+
  1: volume testdir-client-0
  2:     type protocol/client
  3:     option remote-host submit-1
  4:     option remote-subvolume /mnt/gluster
  5:     option transport-type rdma
  6: end-volume
  7:
  8: volume testdir-write-behind
  9:     type performance/write-behind
 10:     subvolumes testdir-client-0
 11: end-volume
 12:
 13: volume testdir-read-ahead
 14:     type performance/read-ahead
 15:     subvolumes testdir-write-behind
 16: end-volume
 17:
 18: volume testdir-io-cache
 19:     type performance/io-cache
 20:     subvolumes testdir-read-ahead
 21: end-volume
 22:
 23: volume testdir-quick-read
 24:     type performance/quick-read
 25:     subvolumes testdir-io-cache
 26: end-volume
 27:
 28: volume testdir-stat-prefetch
 29:     type performance/stat-prefetch
 30:     subvolumes testdir-quick-read
 31: end-volume
 32:
 33: volume testdir
 34:     type debug/io-stats
 35:     subvolumes testdir-stat-prefetch
 36: end-volume

+------------------------------------------------------------------------------+


On Fri, Dec 3, 2010 at 12:38 AM, Raghavendra G <raghavendra at gluster.com> wrote:
> Hi Jeremy,
>
> Can you apply the attached patch, rebuild and start glusterfs? Please make
> sure to send us the logs of glusterfs.
>
> regards,
> ----- Original Message -----
> From: "Jeremy Stout" <stout.jeremy at gmail.com>
> To: gluster-users at gluster.org
> Sent: Friday, December 3, 2010 6:38:00 AM
> Subject: Re: RDMA Problems with GlusterFS 3.1.1
>
> I'm currently using OFED 1.5.2.
>
> For the sake of testing, I just compiled GlusterFS 3.1.1 from source,
> without any modifications, on two systems that have a 2.6.33.7 kernel
> and OFED 1.5.2 built from source. Here are the results:
>
> Server:
> [2010-12-02 21:17:55.886563] I
> [glusterd-handler.c:936:glusterd_handle_cli_start_volume] glusterd:
> Received start vol reqfor volume testdir
> [2010-12-02 21:17:55.886597] I [glusterd-utils.c:232:glusterd_lock]
> glusterd: Cluster lock held by 7dd23af5-277e-4ea1-a495-2a9d882287ec
> [2010-12-02 21:17:55.886607] I
> [glusterd-handler.c:2835:glusterd_op_txn_begin] glusterd: Acquired
> local lock
> [2010-12-02 21:17:55.886628] I
> [glusterd3_1-mops.c:1091:glusterd3_1_cluster_lock] glusterd: Sent lock
> req to 0 peers
> [2010-12-02 21:17:55.887031] I
> [glusterd3_1-mops.c:1233:glusterd3_1_stage_op] glusterd: Sent op req
> to 0 peers
> [2010-12-02 21:17:56.60427] I
> [glusterd-utils.c:971:glusterd_volume_start_glusterfs] : About to
> start glusterfs for brick submit-1:/mnt/gluster
> [2010-12-02 21:17:56.104896] I
> [glusterd3_1-mops.c:1323:glusterd3_1_commit_op] glusterd: Sent op req
> to 0 peers
> [2010-12-02 21:17:56.104935] I
> [glusterd3_1-mops.c:1145:glusterd3_1_cluster_unlock] glusterd: Sent
> unlock req to 0 peers
> [2010-12-02 21:17:56.104953] I
> [glusterd-op-sm.c:4738:glusterd_op_txn_complete] glusterd: Cleared
> local lock
> [2010-12-02 21:17:56.114764] I
> [glusterd-pmap.c:281:pmap_registry_remove] pmap: removing brick (null)
> on port 24009
>
> Client:
> [2010-12-02 21:17:25.503395] W [io-stats.c:1644:init] testdir:
> dangling volume. check volfile
> [2010-12-02 21:17:25.503434] W [dict.c:1204:data_to_str] dict: @data=(nil)
> [2010-12-02 21:17:25.503447] W [dict.c:1204:data_to_str] dict: @data=(nil)
> [2010-12-02 21:17:25.543409] E [rdma.c:2066:rdma_create_cq]
> rpc-transport/rdma: testdir-client-0: creation of send_cq failed
> [2010-12-02 21:17:25.543660] E [rdma.c:3771:rdma_get_device]
> rpc-transport/rdma: testdir-client-0: could not create CQ
> [2010-12-02 21:17:25.543725] E [rdma.c:3957:rdma_init]
> rpc-transport/rdma: could not create rdma device for mthca0
> [2010-12-02 21:17:25.543812] E [rdma.c:4789:init] testdir-client-0:
> Failed to initialize IB Device
> [2010-12-02 21:17:25.543830] E
> [rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma'
> initialization failed
>
> Thank you for the help so far.
>
> On Thu, Dec 2, 2010 at 8:02 PM, Craig Carl <craig at gluster.com> wrote:
>> Jeremy -
>>   What version of OFED are you running? Would you mind installing version
>> 1.5.2 from source? We have seen this resolve several issues of this type.
>> http://www.openfabrics.org/downloads/OFED/ofed-1.5.2/
>>
>>
>> Thanks,
>>
>> Craig
>>
>> -->
>> Craig Carl
>> Senior Systems Engineer
>> Gluster
>>
>>
>> On 12/02/2010 10:05 AM, Jeremy Stout wrote:
>>>
>>> An another follow-up, I tested several compilations today with
>>> different values for send/receive count. I found the maximum value I
>>> could use for both variables was 127. With a value of 127, GlusterFS
>>> did not produce any errors. However, when I changed the value back to
>>> 128, the RDMA errors appeared again.
>>>
>>> I also tried setting soft/hard "memlock" to unlimited in the
>>> limits.conf file, but still ran into RDMA errors on the client side
>>> when the count variables were set to 128.
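>>>
>>> For reference, the limits.conf entries were the usual ones in
>>> /etc/security/limits.conf (the wildcard user is shown only as an example;
>>> per-user entries work the same way, and they apply to sessions started
>>> after the change):
>>>
>>>     *    soft    memlock    unlimited
>>>     *    hard    memlock    unlimited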
>>>
>>> On Thu, Dec 2, 2010 at 9:04 AM, Jeremy Stout <stout.jeremy at gmail.com>
>>> wrote:
>>>>
>>>> Thank you for the response. I've been testing GlusterFS 3.1.1 on two
>>>> different OpenSUSE 11.3 systems. Since both systems generated the same
>>>> error messages, I'll include the output for both.
>>>>
>>>> System #1:
>>>> fs-1:~ # cat /proc/meminfo
>>>> MemTotal:       16468756 kB
>>>> MemFree:        16126680 kB
>>>> Buffers:           15680 kB
>>>> Cached:           155860 kB
>>>> SwapCached:            0 kB
>>>> Active:            65228 kB
>>>> Inactive:         123100 kB
>>>> Active(anon):      18632 kB
>>>> Inactive(anon):       48 kB
>>>> Active(file):      46596 kB
>>>> Inactive(file):   123052 kB
>>>> Unevictable:        1988 kB
>>>> Mlocked:            1988 kB
>>>> SwapTotal:             0 kB
>>>> SwapFree:              0 kB
>>>> Dirty:             30072 kB
>>>> Writeback:             4 kB
>>>> AnonPages:         18780 kB
>>>> Mapped:            12136 kB
>>>> Shmem:               220 kB
>>>> Slab:              39592 kB
>>>> SReclaimable:      13108 kB
>>>> SUnreclaim:        26484 kB
>>>> KernelStack:        2360 kB
>>>> PageTables:         2036 kB
>>>> NFS_Unstable:          0 kB
>>>> Bounce:                0 kB
>>>> WritebackTmp:          0 kB
>>>> CommitLimit:     8234376 kB
>>>> Committed_AS:     107304 kB
>>>> VmallocTotal:   34359738367 kB
>>>> VmallocUsed:      314316 kB
>>>> VmallocChunk:   34349860776 kB
>>>> HardwareCorrupted:     0 kB
>>>> HugePages_Total:       0
>>>> HugePages_Free:        0
>>>> HugePages_Rsvd:        0
>>>> HugePages_Surp:        0
>>>> Hugepagesize:       2048 kB
>>>> DirectMap4k:        9856 kB
>>>> DirectMap2M:     3135488 kB
>>>> DirectMap1G:    13631488 kB
>>>>
>>>> fs-1:~ # uname -a
>>>> Linux fs-1 2.6.32.25-November2010 #2 SMP PREEMPT Mon Nov 1 15:19:55
>>>> EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
>>>>
>>>> fs-1:~ # ulimit -l
>>>> 64
>>>>
>>>> System #2:
>>>> submit-1:~ # cat /proc/meminfo
>>>> MemTotal:       16470424 kB
>>>> MemFree:        16197292 kB
>>>> Buffers:           11788 kB
>>>> Cached:            85492 kB
>>>> SwapCached:            0 kB
>>>> Active:            39120 kB
>>>> Inactive:          76548 kB
>>>> Active(anon):      18532 kB
>>>> Inactive(anon):       48 kB
>>>> Active(file):      20588 kB
>>>> Inactive(file):    76500 kB
>>>> Unevictable:           0 kB
>>>> Mlocked:               0 kB
>>>> SwapTotal:      67100656 kB
>>>> SwapFree:       67100656 kB
>>>> Dirty:                24 kB
>>>> Writeback:             0 kB
>>>> AnonPages:         18408 kB
>>>> Mapped:            11644 kB
>>>> Shmem:               184 kB
>>>> Slab:              34000 kB
>>>> SReclaimable:       8512 kB
>>>> SUnreclaim:        25488 kB
>>>> KernelStack:        2160 kB
>>>> PageTables:         1952 kB
>>>> NFS_Unstable:          0 kB
>>>> Bounce:                0 kB
>>>> WritebackTmp:          0 kB
>>>> CommitLimit:    75335868 kB
>>>> Committed_AS:     105620 kB
>>>> VmallocTotal:   34359738367 kB
>>>> VmallocUsed:       76416 kB
>>>> VmallocChunk:   34359652640 kB
>>>> HardwareCorrupted:     0 kB
>>>> HugePages_Total:       0
>>>> HugePages_Free:        0
>>>> HugePages_Rsvd:        0
>>>> HugePages_Surp:        0
>>>> Hugepagesize:       2048 kB
>>>> DirectMap4k:        7488 kB
>>>> DirectMap2M:    16769024 kB
>>>>
>>>> submit-1:~ # uname -a
>>>> Linux submit-1 2.6.33.7-November2010 #1 SMP PREEMPT Mon Nov 8 13:49:00
>>>> EST 2010 x86_64 x86_64 x86_64 GNU/Linux
>>>>
>>>> submit-1:~ # ulimit -l
>>>> 64
>>>>
>>>> I retrieved the memory information on each machine after starting the
>>>> glusterd process.
>>>>
>>>> On Thu, Dec 2, 2010 at 3:51 AM, Raghavendra G <raghavendra at gluster.com>
>>>> wrote:
>>>>>
>>>>> Hi Jeremy,
>>>>>
>>>>> can you also get the output of,
>>>>>
>>>>> #uname -a
>>>>>
>>>>> #ulimit -l
>>>>>
>>>>> regards,
>>>>> ----- Original Message -----
>>>>> From: "Raghavendra G"<raghavendra at gluster.com>
>>>>> To: "Jeremy Stout"<stout.jeremy at gmail.com>
>>>>> Cc: gluster-users at gluster.org
>>>>> Sent: Thursday, December 2, 2010 10:20:04 AM
>>>>> Subject: Re: RDMA Problems with GlusterFS 3.1.1
>>>>>
>>>>> Hi Jeremy,
>>>>>
>>>>> In order to diagnose why completion queue creation is failing (as
>>>>> indicated by the logs), we want to know how much free memory was
>>>>> available on your system when glusterfs was started.
>>>>>
>>>>> regards,
>>>>> ----- Original Message -----
>>>>> From: "Raghavendra G"<raghavendra at gluster.com>
>>>>> To: "Jeremy Stout"<stout.jeremy at gmail.com>
>>>>> Cc: gluster-users at gluster.org
>>>>> Sent: Thursday, December 2, 2010 10:11:18 AM
>>>>> Subject: Re: RDMA Problems with GlusterFS 3.1.1
>>>>>
>>>>> Hi Jeremy,
>>>>>
>>>>> Yes, there might be some performance decrease, but it should not affect
>>>>> the working of rdma.
>>>>>
>>>>> regards,
>>>>> ----- Original Message -----
>>>>> From: "Jeremy Stout"<stout.jeremy at gmail.com>
>>>>> To: gluster-users at gluster.org
>>>>> Sent: Thursday, December 2, 2010 8:30:20 AM
>>>>> Subject: Re: RDMA Problems with GlusterFS 3.1.1
>>>>>
>>>>> As an update to my situation, I think I have GlusterFS 3.1.1 working
>>>>> now. I was able to create and mount RDMA volumes without any errors.
>>>>>
>>>>> To fix the problem, I had to make the following changes on lines 3562
>>>>> and 3563 in rdma.c:
>>>>> options->send_count = 32;
>>>>> options->recv_count = 32;
>>>>>
>>>>> The values were originally set to 128.
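>>>>>
>>>>> (For reference, the device limit involved here can be read with
>>>>> ibv_devinfo from OFED, assuming the tool is installed; the value below is
>>>>> what this mthca reports:
>>>>>
>>>>>     # ibv_devinfo -v | grep max_cqe
>>>>>             max_cqe:                        131071
>>>>>
>>>>> Since each completion queue is sized at 1024 * count, 128 requests 131072
>>>>> elements, just over that limit, while 32 requests only 32768.)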
>>>>>
>>>>> I'll run some tests tomorrow to verify that it is working correctly.
>>>>> Assuming it does, what would be the expected side-effect of changing
>>>>> the values from 128 to 32? Will there be a decrease in performance?
>>>>>
>>>>>
>>>>> On Wed, Dec 1, 2010 at 10:07 AM, Jeremy Stout <stout.jeremy at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Here are the results of the test:
>>>>>> submit-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs #
>>>>>> ibv_srq_pingpong
>>>>>>  local address:  LID 0x0002, QPN 0x000406, PSN 0x703b96, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x000407, PSN 0x618cc8, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x000408, PSN 0xd62272, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x000409, PSN 0x5db5d9, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x00040a, PSN 0xc51978, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x00040b, PSN 0x05fd7a, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x00040c, PSN 0xaa4a51, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x00040d, PSN 0xb7a676, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x00040e, PSN 0x56bde2, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x00040f, PSN 0xa662bc, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x000410, PSN 0xee27b0, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x000411, PSN 0x89c683, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x000412, PSN 0xd025b3, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x000413, PSN 0xcec8e4, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x000414, PSN 0x37e5d2, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x000415, PSN 0x29562e, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x000407, PSN 0x173320, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x000410, PSN 0xd50228, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x000412, PSN 0x0870eb, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x000414, PSN 0x3eefca, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x000415, PSN 0xbd64c6, GID ::
>>>>>> 8192000 bytes in 0.01 seconds = 5917.47 Mbit/sec
>>>>>> 1000 iters in 0.01 seconds = 11.07 usec/iter
>>>>>>
>>>>>> fs-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs # ibv_srq_pingpong submit-1
>>>>>>  local address:  LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x000407, PSN 0x173320, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x000410, PSN 0xd50228, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x000412, PSN 0x0870eb, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x000414, PSN 0x3eefca, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x000415, PSN 0xbd64c6, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x000406, PSN 0x703b96, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x000407, PSN 0x618cc8, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x000408, PSN 0xd62272, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x000409, PSN 0x5db5d9, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x00040a, PSN 0xc51978, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x00040b, PSN 0x05fd7a, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x00040c, PSN 0xaa4a51, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x00040d, PSN 0xb7a676, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x00040e, PSN 0x56bde2, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x00040f, PSN 0xa662bc, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x000410, PSN 0xee27b0, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x000411, PSN 0x89c683, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x000412, PSN 0xd025b3, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x000413, PSN 0xcec8e4, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x000414, PSN 0x37e5d2, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x000415, PSN 0x29562e, GID ::
>>>>>> 8192000 bytes in 0.01 seconds = 7423.65 Mbit/sec
>>>>>> 1000 iters in 0.01 seconds = 8.83 usec/iter
>>>>>>
>>>>>> Based on the output, I believe it ran correctly.
>>>>>>
>>>>>> On Wed, Dec 1, 2010 at 9:51 AM, Anand Avati <anand.avati at gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Can you verify that ibv_srq_pingpong works from the server where this
>>>>>>> log file is from?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Avati
>>>>>>>
>>>>>>> On Wed, Dec 1, 2010 at 7:44 PM, Jeremy Stout <stout.jeremy at gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Whenever I try to start or mount a GlusterFS 3.1.1 volume that uses
>>>>>>>> RDMA, I'm seeing the following error messages in the log file on the
>>>>>>>> server:
>>>>>>>> [2010-11-30 18:37:53.51270] I [nfs.c:652:init] nfs: NFS service
>>>>>>>> started
>>>>>>>> [2010-11-30 18:37:53.51362] W [dict.c:1204:data_to_str] dict:
>>>>>>>> @data=(nil)
>>>>>>>> [2010-11-30 18:37:53.51375] W [dict.c:1204:data_to_str] dict:
>>>>>>>> @data=(nil)
>>>>>>>> [2010-11-30 18:37:53.59628] E [rdma.c:2066:rdma_create_cq]
>>>>>>>> rpc-transport/rdma: testdir-client-0: creation of send_cq failed
>>>>>>>> [2010-11-30 18:37:53.59851] E [rdma.c:3771:rdma_get_device]
>>>>>>>> rpc-transport/rdma: testdir-client-0: could not create CQ
>>>>>>>> [2010-11-30 18:37:53.59925] E [rdma.c:3957:rdma_init]
>>>>>>>> rpc-transport/rdma: could not create rdma device for mthca0
>>>>>>>> [2010-11-30 18:37:53.60009] E [rdma.c:4789:init] testdir-client-0:
>>>>>>>> Failed to initialize IB Device
>>>>>>>> [2010-11-30 18:37:53.60030] E
>>>>>>>> [rpc-transport.c:971:rpc_transport_load]
>>>>>>>> rpc-transport: 'rdma' initialization failed
>>>>>>>>
>>>>>>>> On the client, I see:
>>>>>>>> [2010-11-30 18:43:49.653469] W [io-stats.c:1644:init] testdir:
>>>>>>>> dangling volume. check volfile
>>>>>>>> [2010-11-30 18:43:49.653573] W [dict.c:1204:data_to_str] dict:
>>>>>>>> @data=(nil)
>>>>>>>> [2010-11-30 18:43:49.653607] W [dict.c:1204:data_to_str] dict:
>>>>>>>> @data=(nil)
>>>>>>>> [2010-11-30 18:43:49.736275] E [rdma.c:2066:rdma_create_cq]
>>>>>>>> rpc-transport/rdma: testdir-client-0: creation of send_cq failed
>>>>>>>> [2010-11-30 18:43:49.736651] E [rdma.c:3771:rdma_get_device]
>>>>>>>> rpc-transport/rdma: testdir-client-0: could not create CQ
>>>>>>>> [2010-11-30 18:43:49.736689] E [rdma.c:3957:rdma_init]
>>>>>>>> rpc-transport/rdma: could not create rdma device for mthca0
>>>>>>>> [2010-11-30 18:43:49.736805] E [rdma.c:4789:init] testdir-client-0:
>>>>>>>> Failed to initialize IB Device
>>>>>>>> [2010-11-30 18:43:49.736841] E
>>>>>>>> [rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma'
>>>>>>>> initialization failed
>>>>>>>>
>>>>>>>> This results in an unsuccessful mount.
>>>>>>>>
>>>>>>>> I created the mount using the following commands:
>>>>>>>> /usr/local/glusterfs/3.1.1/sbin/gluster volume create testdir
>>>>>>>> transport rdma submit-1:/exports
>>>>>>>> /usr/local/glusterfs/3.1.1/sbin/gluster volume start testdir
>>>>>>>>
>>>>>>>> To mount the directory, I use:
>>>>>>>> mount -t glusterfs submit-1:/testdir /mnt/glusterfs
>>>>>>>>
>>>>>>>> I don't think it is an Infiniband problem since GlusterFS 3.0.6 and
>>>>>>>> GlusterFS 3.1.0 worked on the same systems. For GlusterFS 3.1.0, the
>>>>>>>> commands listed above produced no error messages.
>>>>>>>>
>>>>>>>> If anyone can provide help with debugging these error messages, it
>>>>>>>> would be appreciated.
>>>>>>>> _______________________________________________
>>>>>>>> Gluster-users mailing list
>>>>>>>> Gluster-users at gluster.org
>>>>>>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>>>>>>
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>

