Re: rping segfault with 4.9.28 on CentOS 7.3

ib_read_bw can run either with or without rdma_cm. By default (without
rdma_cm) it works between the nodes. If I specify -R or -z, it fails.
It seems that the context is not being set properly when using
rdma_cm.
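
For anyone triaging similar cores, here is a minimal sketch of a helper
that scans pasted gdb output for frames called with a pointer argument
printed as 0x0, which is the pattern in the traces below. The function
name null_pointer_args and both regular expressions are my own
illustration and not part of gdb, perftest, or libibverbs. It assumes
gdb's usual "#N  [address in] function (args)" frame layout and also
accepts lines carrying mail quote markers.

```python
import re

# Start of a gdb frame, e.g. "#0  __ibv_query_device (" or
# "#1  0x0000000000410518 in check_for_contig_pages_support".
FRAME_START = re.compile(
    r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([A-Za-z_][\w.]*)"
)
# An argument printed exactly as 0x0 (not 0x0123...).
NULL_ARG = re.compile(r"(\w+)(?:@entry)?=0x0(?![0-9a-fA-F])")


def null_pointer_args(backtrace):
    """Return unique (frame, function, argument) triples for NULL arguments.

    Hypothetical helper: frames that wrap over several lines are handled by
    attributing arguments to the most recent frame header seen.
    """
    hits = []
    seen = set()
    frame = None
    for line in backtrace.splitlines():
        # Drop leading mail quote markers such as ">> ".
        line = line.lstrip("> ").rstrip()
        match = FRAME_START.match(line)
        if match:
            frame = (int(match.group(1)), match.group(2))
        if frame is None:
            continue
        for arg in NULL_ARG.findall(line):
            hit = (frame[0], frame[1], arg)
            if hit not in seen:
                seen.add(hit)
                hits.append(hit)
    return hits
```

Run against the -R and -z traces below, this should flag only the
context argument of __ibv_query_device in frame 0. It does not explain
why the context is NULL; it only makes the pattern easy to spot across
several cores.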

"Server"
-----------

# ib_read_bw

************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
                   RDMA_Read BW Test
Dual-port       : OFF          Device         : mlx5_0
Number of qps   : 1            Transport type : IB
Connection type : RC           Using SRQ      : OFF
CQ Moderation   : 100
Mtu             : 1024[B]
Link type       : Ethernet
GID index       : 2
Outstand reads  : 16
rdma_cm QPs     : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0000 QPN 0x011a PSN 0xa0e9fd OUT 0x10 RKey 0x00175e
VAddr 0x007fc73fd6e000
GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:13:13
remote address: LID 0000 QPN 0x011a PSN 0xf7747b OUT 0x10 RKey
0x002797 VAddr 0x007fe5cccc5000
GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:13:14
---------------------------------------------------------------------------------------
#bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
65536      1000             2728.79            2728.77            0.043660
---------------------------------------------------------------------------------------

# ib_read_bw -R

************************************
* Waiting for client to connect... *
************************************
Segmentation fault (core dumped)

# gdb ib_read_bw core.8319
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/ib_read_bw...Reading symbols from
/usr/lib/debug/usr/bin/ib_read_bw.debug...done.
done.
[New LWP 8319]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `ib_read_bw -R'.
Program terminated with signal 11, Segmentation fault.
#0  __ibv_query_device (context=0x0, device_attr=0x7ffcd8fec160) at
src/verbs.c:135
135             return context->ops.query_device(context, device_attr);
(gdb) bt
#0  __ibv_query_device (context=0x0, device_attr=0x7ffcd8fec160) at
src/verbs.c:135
#1  0x0000000000410518 in check_for_contig_pages_support
(context=<optimized out>) at src/perftest_resources.c:262
#2  ctx_init (ctx=ctx@entry=0x110b000,
user_param=user_param@entry=0x110ad70) at
src/perftest_resources.c:1314
#3  0x000000000040585c in rdma_server_connect (ctx=0x110b000,
user_param=0x110ad70) at src/perftest_communication.c:1119
#4  0x0000000000405f53 in establish_connection
(comm=comm@entry=0x7ffcd8fec470) at src/perftest_communication.c:1244
#5  0x0000000000402b37 in main (argc=<optimized out>, argv=<optimized
out>) at src/read_bw.c:110
(gdb) f 0
#0  __ibv_query_device (context=0x0, device_attr=0x7ffcd8fec160) at
src/verbs.c:135
135             return context->ops.query_device(context, device_attr);
(gdb) list
130     }
131
132     int __ibv_query_device(struct ibv_context *context,
133                            struct ibv_device_attr *device_attr)
134     {
135             return context->ops.query_device(context, device_attr);
136     }
137     default_symver(__ibv_query_device, ibv_query_device);
138
139     int __ibv_query_port(struct ibv_context *context, uint8_t port_num,
(gdb) p context
$1 = (struct ibv_context *) 0x0

# ib_read_bw -z

************************************
* Waiting for client to connect... *
************************************
Segmentation fault (core dumped)

# gdb ib_read_bw core.8369
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/ib_read_bw...Reading symbols from
/usr/lib/debug/usr/bin/ib_read_bw.debug...done.
done.
[New LWP 8369]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `ib_read_bw -z'.
Program terminated with signal 11, Segmentation fault.
#0  __ibv_query_device (context=0x0, device_attr=0x7ffe5f5ee4b0) at
src/verbs.c:135
135             return context->ops.query_device(context, device_attr);
(gdb) bt
#0  __ibv_query_device (context=0x0, device_attr=0x7ffe5f5ee4b0) at
src/verbs.c:135
#1  0x0000000000410518 in check_for_contig_pages_support
(context=<optimized out>) at src/perftest_resources.c:262
#2  ctx_init (ctx=ctx@entry=0x1b3d000,
user_param=user_param@entry=0x1b3cd70) at
src/perftest_resources.c:1314
#3  0x000000000040585c in rdma_server_connect (ctx=0x1b3d000,
user_param=0x1b3cd70)
   at src/perftest_communication.c:1119
#4  0x0000000000405f53 in establish_connection
(comm=comm@entry=0x7ffe5f5ee7c0) at src/perftest_communication.c:1244
#5  0x0000000000402b37 in main (argc=<optimized out>, argv=<optimized
out>) at src/read_bw.c:110
(gdb) f 0
#0  __ibv_query_device (context=0x0, device_attr=0x7ffe5f5ee4b0) at
src/verbs.c:135
135             return context->ops.query_device(context, device_attr);
(gdb) list
130     }
131
132     int __ibv_query_device(struct ibv_context *context,
133                            struct ibv_device_attr *device_attr)
134     {
135             return context->ops.query_device(context, device_attr);
136     }
137     default_symver(__ibv_query_device, ibv_query_device);
138
139     int __ibv_query_port(struct ibv_context *context, uint8_t port_num,
(gdb) p context
$1 = (struct ibv_context *) 0x0


"Client"
----------
# ib_read_bw 192.168.13.13
---------------------------------------------------------------------------------------
                   RDMA_Read BW Test
Dual-port       : OFF          Device         : mlx5_0
Number of qps   : 1            Transport type : IB
Connection type : RC           Using SRQ      : OFF
TX depth        : 128
CQ Moderation   : 100
Mtu             : 1024[B]
Link type       : Ethernet
GID index       : 2
Outstand reads  : 16
rdma_cm QPs     : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0000 QPN 0x011a PSN 0xf7747b OUT 0x10 RKey 0x002797
VAddr 0x007fe5cccc5000
GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:13:14
remote address: LID 0000 QPN 0x011a PSN 0xa0e9fd OUT 0x10 RKey
0x00175e VAddr 0x007fc73fd6e000
GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:13:13
---------------------------------------------------------------------------------------
#bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
Conflicting CPU frequency values detected: 1200.024000 != 2600.000000.
CPU Frequency is not max.
65536      1000             2728.79            2728.77            0.043660
---------------------------------------------------------------------------------------

# ib_read_bw -R 192.168.13.13
Unexpected CM event bl blka 8
Unable to perform rdma_client function
Unable to init the socket connection

# ib_read_bw -z 192.168.13.13
Unexpected CM event bl blka 8
Unable to perform rdma_client function
Unable to init the socket connection
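
The "8" in the client message above is a numeric rdma_cm event type. As
a small sketch for decoding it, the table below lists the
rdma_cm_event_type values in the order they are declared in librdmacm's
<rdma/rdma_cma.h>. The helper name cm_event_name is my own and not part
of any library. Under that ordering, 8 is RDMA_CM_EVENT_REJECTED, which
matches what the rping client reports further down.

```python
# enum rdma_cm_event_type from <rdma/rdma_cma.h>, in declaration order,
# so the list index equals the numeric event value.
RDMA_CM_EVENTS = [
    "RDMA_CM_EVENT_ADDR_RESOLVED",
    "RDMA_CM_EVENT_ADDR_ERROR",
    "RDMA_CM_EVENT_ROUTE_RESOLVED",
    "RDMA_CM_EVENT_ROUTE_ERROR",
    "RDMA_CM_EVENT_CONNECT_REQUEST",
    "RDMA_CM_EVENT_CONNECT_RESPONSE",
    "RDMA_CM_EVENT_CONNECT_ERROR",
    "RDMA_CM_EVENT_UNREACHABLE",
    "RDMA_CM_EVENT_REJECTED",
    "RDMA_CM_EVENT_ESTABLISHED",
    "RDMA_CM_EVENT_DISCONNECTED",
    "RDMA_CM_EVENT_DEVICE_REMOVAL",
    "RDMA_CM_EVENT_MULTICAST_JOIN",
    "RDMA_CM_EVENT_MULTICAST_ERROR",
    "RDMA_CM_EVENT_ADDR_CHANGE",
    "RDMA_CM_EVENT_TIMEWAIT_EXIT",
]


def cm_event_name(number):
    """Map a numeric CM event (as printed by perftest) to its enum name."""
    if 0 <= number < len(RDMA_CM_EVENTS):
        return RDMA_CM_EVENTS[number]
    return "unknown event %d" % number
```

A rejected connect on the client is what one would expect if the server
process died before accepting, so this is most likely a symptom of the
server-side segfault rather than a separate problem.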
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Tue, May 16, 2017 at 2:50 PM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> I installed OFED 4.0-2.0.0.1 on a fresh snapshot with the stock kernel
> (3.10.0-514.16.1.el7.x86_64). I'm getting a segfault on the server
> side, but not on the client side. I don't see any debug packages in
> the OFED package to load the symbols.
>
> rping server:
>
> # gdb rping core.10405
> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /usr/bin/rping...Reading symbols from
> /usr/bin/rping...(no debugging symbols found)...done.
> (no debugging symbols found)...done.
> [New LWP 10405]
> [New LWP 10408]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `rping -s'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x00007f31883d45b4 in ibv_alloc_pd () from /usr/lib64/libibverbs.so.1
> Missing separate debuginfos, use: debuginfo-install
> librdmacm-utils-1.1.0mlnx-OFED.4.0.1.6.1.40200.x86_64
> (gdb) bt
> #0  0x00007f31883d45b4 in ibv_alloc_pd () from /usr/lib64/libibverbs.so.1
> #1  0x0000000000402fe6 in rping_setup_qp.isra.7 ()
> #2  0x0000000000401d04 in main ()
> (gdb) list
> No symbol table is loaded.  Use the "file" command.
>
> rping client:
>
> # rping -c -a 192.168.13.13
> cma event RDMA_CM_EVENT_REJECTED, error 28
> wait for CONNECTED state 4
> connect error -1
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Tue, May 16, 2017 at 1:23 PM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
>> This is using ConnectX-4 LX RoCE cards, using only in-box drivers.
>>
>> While trying to debug some iSER issues, I'm trying to do rping between
>> the two hosts, but I'm getting a segfault. Sagi suggested that there
>> may be something wrong with my kernel ABI. I did a make mrproper and
>> built the latest 4.9.28 kernel and installed the kernel headers.
>>
>> make -j 32 && sudo make modules_install && sudo make install && sudo
>> make headers_install INSTALL_HDR_PATH=/usr
>>
>> After booting into the new kernel, I kept getting the segfaults, so I
>> rebuilt the libibverbs, libibumad, and librdmacm packages in case they
>> weren't picking up the new kernel headers. Still no luck.
>>
>> Here is the rping server with the rebuilt packages:
>> # gdb rping core.22936
>> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7
>> Copyright (C) 2013 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "x86_64-redhat-linux-gnu".
>> For bug reporting instructions, please see:
>> <http://www.gnu.org/software/gdb/bugs/>...
>> Reading symbols from /usr/bin/rping...Reading symbols from
>> /usr/lib/debug/usr/bin/rping.debug...done.
>> done.
>> [New LWP 22936]
>> [New LWP 22939]
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> Core was generated by `rping -s'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  __ibv_alloc_pd (context=0x0) at src/verbs.c:196
>> 196             pd = context->ops.alloc_pd(context);
>> (gdb) bt
>> #0  __ibv_alloc_pd (context=0x0) at src/verbs.c:196
>> #1  0x000055f60331d5f6 in rping_setup_qp (cb=cb@entry=0x55f603d74780,
>> cm_id=<optimized out>) at examples/rping.c:519
>> #2  0x000055f60331be7e in rping_run_server (cb=0x55f603d74780) at
>> examples/rping.c:890
>> #3  main (argc=2, argv=0x7ffcd16aae88) at examples/rping.c:1268
>> (gdb) f 0
>> #0  __ibv_alloc_pd (context=0x0) at src/verbs.c:196
>> 196             pd = context->ops.alloc_pd(context);
>> (gdb) list
>> 191
>> 192     struct ibv_pd *__ibv_alloc_pd(struct ibv_context *context)
>> 193     {
>> 194             struct ibv_pd *pd;
>> 195
>> 196             pd = context->ops.alloc_pd(context);
>> 197             if (pd)
>> 198                     pd->context = context;
>> 199
>> 200             return pd;
>> (gdb) p context
>> $1 = (struct ibv_context *) 0x0
>>
>> Here is the rping client that does not have the rebuilt packages:
>> # gdb rping core.8253
>> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7
>> Copyright (C) 2013 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "x86_64-redhat-linux-gnu".
>> For bug reporting instructions, please see:
>> <http://www.gnu.org/software/gdb/bugs/>...
>> Reading symbols from /usr/bin/rping...Reading symbols from
>> /usr/lib/debug/usr/bin/rping.debug...done.
>> done.
>> [New LWP 8253]
>> [New LWP 8256]
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> Core was generated by `rping -c -a 192.168.13.13'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  __ibv_dereg_mr (mr=0x560e295e93b0) at src/verbs.c:299
>> 299             ret = mr->context->ops.dereg_mr(mr);
>> (gdb) bt
>> #0  __ibv_dereg_mr (mr=0x560e295e93b0) at src/verbs.c:299
>> #1  0x0000560e293cd917 in rping_free_buffers (cb=0x560e295e5780) at
>> examples/rping.c:470
>> #2  0x0000560e293cbf57 in rping_run_client (cb=<optimized out>) at
>> examples/rping.c:1111
>> #3  main (argc=<optimized out>, argv=<optimized out>) at examples/rping.c:1270
>> (gdb) f 9
>> #0  0x0000000000000000 in ?? ()
>> (gdb) f 0
>> #0  __ibv_dereg_mr (mr=0x560e295e93b0) at src/verbs.c:299
>> 299             ret = mr->context->ops.dereg_mr(mr);
>> (gdb) list
>> 294     {
>> 295             int ret;
>> 296             void *addr      = mr->addr;
>> 297             size_t length   = mr->length;
>> 298
>> 299             ret = mr->context->ops.dereg_mr(mr);
>> 300             if (!ret)
>> 301                     ibv_dofork_range(addr, length);
>> 302
>> 303             return ret;
>> (gdb) p mr
>> $1 = (struct ibv_mr *) 0x560e295e93b0
>> (gdb) p *mr
>> $2 = {context = 0x7fd423be5090, pd = 0x560e295e9960, addr =
>> 0x560e295e57e8, length = 16, handle = 0, lkey = 72829, rkey = 72829}
>> (gdb) p *mr->context
>> Cannot access memory at address 0x7fd423be5090
>>
>> Any ideas on what I'm doing wrong?
>>
>> Thanks,
>>
>> ----------------
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1