Re: RDMA Client Hang Problem

+Amar, +Rafi - Other maintainers and Peers of transport/rdma

* Can you attach logs from the client and the brick? Please set diagnostics.client-log-level and diagnostics.brick-log-level to TRACE before starting your tests.
* Does the fuse client recover from the hang?
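
For reference, the two log-level options above can be set with "gluster volume set". The following is only a sketch: it assumes the volume is named "db" (as in the create command quoted further down), and the helper name set_trace_logging and the RUN variable are made up for this example. By default it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: emits the "gluster volume set" commands that raise
# client and brick logging to TRACE for one volume.
# RUN is an assumption of this sketch: when unset, the commands are only
# printed (via echo); run with RUN= (empty) on a server node to execute them.
set_trace_logging() {
    volume=$1
    for opt in diagnostics.client-log-level diagnostics.brick-log-level; do
        ${RUN-echo} gluster volume set "$volume" "$opt" TRACE
    done
}

# "db" is the volume name used in this thread.
set_trace_logging db
```

TRACE output is very verbose, so it is probably worth restoring the defaults after the test, for example with "gluster volume reset db diagnostics.client-log-level" and the same for the brick option.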

I think we might not be handling the poll_err path correctly. The fact that the issues appear only after a brick reboots makes me suspect the error path.

regards,
Raghavendra

On Wed, Apr 25, 2018 at 6:05 PM, Necati E. SISECI <siseci@xxxxxxxxx> wrote:
Thank you for your mail.

ibv_rc_pingpong seems to be working between the servers and the client. udaddy, ucmatose, rping etc. are also working.

root@gluster1:~# ibv_rc_pingpong -d mlx5_0 -g 0
  local address:  LID 0x0000, QPN 0x0001e4, PSN 0x10090e, GID fe80::ee0d:9aff:fec0:1dc8
  remote address: LID 0x0000, QPN 0x00014c, PSN 0x09402b, GID fe80::ee0d:9aff:fec0:1b14
8192000 bytes in 0.01 seconds = 7964.03 Mbit/sec
1000 iters in 0.01 seconds = 8.23 usec/iter

root@cinder:~# ibv_rc_pingpong -g 0 -d mlx5_0 gluster1
  local address:  LID 0x0000, QPN 0x00014c, PSN 0x09402b, GID fe80::ee0d:9aff:fec0:1b14
  remote address: LID 0x0000, QPN 0x0001e4, PSN 0x10090e, GID fe80::ee0d:9aff:fec0:1dc8
8192000 bytes in 0.01 seconds = 8424.73 Mbit/sec
1000 iters in 0.01 seconds = 7.78 usec/iter


Thank you.

Necati.


On 25-04-2018 12:27, Raghavendra Gowdappa wrote:
Is InfiniBand itself working fine? You can run tools like ibv_rc_pingpong to find out.

On Wed, Apr 25, 2018 at 12:23 PM, Necati E. SISECI <siseci@xxxxxxxxx> wrote:

Dear Gluster-Users,

I am experiencing RDMA problems.

I have installed Ubuntu 16.04.4 with the 4.15.0-13-generic kernel and MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64 on 4 different servers. All of them have Mellanox ConnectX-4 LX dual-port NICs. The four servers are connected via a Mellanox SN2100 switch.

I have installed GlusterFS Server v3.10 (from the Ubuntu PPA) on 3 of the servers. These 3 boxes run as a Gluster cluster. Additionally, I have installed the GlusterFS client on the last one.

I created the Gluster volume with this command:

# gluster volume create db transport rdma replica 3 arbiter 1 gluster1:/storage/db/ gluster2:/storage/db/ cinder:/storage/db force

(network.ping-timeout is 3)

Then I mounted this volume using the mount command below.

mount -t glusterfs -o transport=rdma gluster1:/db /db

After mounting "/db", I can access the files.

The problem is that when I reboot one of the cluster nodes, the fuse client logs the error below and hangs.

[2018-04-17 07:42:55.506422] W [MSGID: 103070] [rdma.c:4284:gf_rdma_handle_failed_send_completion] 0-rpc-transport/rdma: send work request on `mlx5_0' returned error wc.status = 5, wc.vendor_err = 245, post->buf = 0x7f8b92016000, wc.byte_len = 0, post->reused = 135
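
A note on reading that line: in libibverbs' enum ibv_wc_status, the value 5 is IBV_WC_WR_FLUSH_ERR, which generally means the work request was flushed because the queue pair had already gone into the error state. So it is likely a symptom of the connection to the rebooted node going away rather than the root cause. The sketch below pulls the status out of such a log line and maps a few common values to their names. The function names are made up for this example and the table is deliberately partial. From C, ibv_wc_status_str() gives the full mapping.

```shell
#!/bin/sh
# Hypothetical helper: maps a numeric wc.status to its enum ibv_wc_status
# name. Only a few common values are listed here.
wc_status_name() {
    case "$1" in
        0)  echo IBV_WC_SUCCESS ;;
        1)  echo IBV_WC_LOC_LEN_ERR ;;
        2)  echo IBV_WC_LOC_QP_OP_ERR ;;
        4)  echo IBV_WC_LOC_PROT_ERR ;;
        5)  echo IBV_WC_WR_FLUSH_ERR ;;
        12) echo IBV_WC_RETRY_EXC_ERR ;;
        13) echo IBV_WC_RNR_RETRY_EXC_ERR ;;
        *)  echo "unlisted ($1), see enum ibv_wc_status in infiniband/verbs.h" ;;
    esac
}

# Hypothetical helper: reads log lines on stdin and prints the number that
# follows "wc.status = " on each line that has one.
extract_wc_status() {
    sed -n 's/.*wc\.status = \([0-9][0-9]*\).*/\1/p'
}

# Shortened excerpt of the log line above.
line='returned error wc.status = 5, wc.vendor_err = 245'
wc_status_name "$(printf '%s\n' "$line" | extract_wc_status)"
```

The vendor_err value is specific to the NIC driver and is not decoded here.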

When I change the transport mode from rdma to tcp, the fuse client works well, with no hangs.

I also tried Gluster 3.8, 3.10, 4.0.0 and 4.0.1 (from the Ubuntu PPAs) on Ubuntu 16.04.4 and CentOS 7.4, but the results were the same.

Thank you.

Necati.

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users


