Asking for help: Is it possible to force the use of the rxe provider on top of a Mellanox ConnectX-5 Ethernet network?

Hi all,

I have access to a cloud-provided virtual machine with a "Mellanox
Technologies MT27800 Family [ConnectX-5 Virtual Function]" Ethernet
controller.  It seems that I cannot use hardware RoCE on it (BTW, do
you know how to check this?), so I decided to use rxe (Soft-RoCE) instead.
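One way to check whether the VF exposes a usable RoCE port is to look at the same fields `ibv_devinfo` prints (`link_layer` and `state`), which the kernel also publishes in sysfs under /sys/class/infiniband/<dev>/ports/<port>/. The sketch below assumes that layout and an example device name such as "mlx5_0"; the helper names are hypothetical. A passing check only shows that the kernel reports an active Ethernet (RoCE) port; it does not by itself prove that the cloud provider lets RDMA traffic through.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: decide from the contents of two sysfs files whether
 * a port looks like an active RoCE port. Typical contents:
 *   /sys/class/infiniband/<dev>/ports/<port>/link_layer -> "Ethernet"
 *   /sys/class/infiniband/<dev>/ports/<port>/state      -> "4: ACTIVE"
 */
static bool port_looks_like_active_roce(const char *link_layer, const char *state)
{
    /* RoCE ports report an Ethernet link layer (IB ports say "InfiniBand"). */
    if (strncmp(link_layer, "Ethernet", strlen("Ethernet")) != 0)
        return false;
    /* The state file starts with the numeric port state; 4 means ACTIVE. */
    return strncmp(state, "4: ACTIVE", strlen("4: ACTIVE")) == 0;
}

/* Read the first line of a small sysfs file into buf; false on error. */
static bool read_sysfs_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return false;
    bool ok = fgets(buf, (int)len, f) != NULL;
    fclose(f);
    return ok;
}

/* Hypothetical entry point: check port `port` of RDMA device `dev`. */
static bool check_roce_port(const char *dev, int port)
{
    char path[256], link_layer[64], state[64];

    snprintf(path, sizeof(path),
             "/sys/class/infiniband/%s/ports/%d/link_layer", dev, port);
    if (!read_sysfs_line(path, link_layer, sizeof(link_layer)))
        return false; /* no such device or port exposed by the kernel */

    snprintf(path, sizeof(path),
             "/sys/class/infiniband/%s/ports/%d/state", dev, port);
    if (!read_sysfs_line(path, state, sizeof(state)))
        return false;

    return port_looks_like_active_roce(link_layer, state);
}
```

For example, `check_roce_port("mlx5_0", 1)` would be called for the first port of the hardware device; running `ibv_devinfo` and reading its `link_layer` and `state` lines gives the same information interactively.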

However, a divide error occurs when I run my RDMA program:

trap divide error ip:7f8dd1802b8f sp:7ffc63e72a80 error:0 in libmlx5.so.1.12.28.0[7f8dd17eb000+46000]

The backtrace is:

(gdb) bt
#0  0x00007ffff63d3365 in __add_page (context=0x7ffff7f75010)
    at ~/src/rdma-core/providers/mlx5/dbrec.c:58
#1  0x00007ffff63d3587 in mlx5_alloc_dbrec (context=0x7ffff7f75010, pd=0x0, custom_alloc=0x5555557671e8)
    at ~/src/rdma-core/providers/mlx5/dbrec.c:119
#2  0x00007ffff6403320 in create_cq (context=0x7ffff7f75150, cq_attr=0x7fffffffe6d0, cq_alloc_flags=0, 
    mlx5cq_attr=0x0) at ~/src/rdma-core/providers/mlx5/verbs.c:1013
#3  0x00007ffff640388a in mlx5_create_cq (context=0x7ffff7f75150, cqe=3, channel=0x555555765ec0, comp_vector=0)
    at ~/src/rdma-core/providers/mlx5/verbs.c:1134
#4  0x00007ffff79acfb9 in __ibv_create_cq_1_1 (context=0x7ffff7f75150, cqe=3, 
    cq_context=0x55555575b440 <admin_qp>, channel=0x555555765ec0, comp_vector=0)
    at ~/src/rdma-core/libibverbs/verbs.c:520
#5  0x0000555555556e79 in fsr_process_admin_qp_cm_event_req (aqp=0x55555575b440 <admin_qp>, ev=0x555555765a50)
    at ../../../server/fmsgserver_rdma/fmsgserver_rdma.c:449
#6  0x0000555555557299 in fsr_process_admin_qp_cm_event (aqp=0x55555575b440 <admin_qp>, ec=0x55555575c410)
    at ../../../server/fmsgserver_rdma/fmsgserver_rdma.c:552
#7  0x00005555555573b7 in fsr_estab_admin_qp (aqp=0x55555575b440 <admin_qp>)
    at ../../../server/fmsgserver_rdma/fmsgserver_rdma.c:596
#8  0x0000555555557445 in fmsgserver_rdma_init () at ../../../server/fmsgserver_rdma/fmsgserver_rdma.c:619
#9  0x00005555555562cf in main (argc=1, argv=0x7fffffffe9b8) at driver.c:248
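On x86 Linux, "trap divide error" in the kernel log is typically how an integer division by zero in user space shows up (the process gets SIGFPE). One plausible reading of frame #0, not verified against this exact rdma-core version, is that __add_page() divides the page size by a cache-line size stored in the mlx5 context; if mlx5 never fully initialized that context because the device is really an rxe device, the divisor can be zero. The sketch below, using a hypothetical helper db_records_per_page(), only illustrates that failing arithmetic and a guarded variant of it.

```c
#include <stdio.h>

/* Hypothetical helper modelled on the kind of computation a doorbell-record
 * allocator performs: how many cache-line-sized records fit in one page.
 * An unguarded page_size / cache_line_size with cache_line_size == 0 is an
 * integer division by zero, which x86 reports as "trap divide error". */
static int db_records_per_page(int page_size, int cache_line_size)
{
    if (cache_line_size <= 0) {
        fprintf(stderr, "invalid cache line size %d\n", cache_line_size);
        return 0; /* refuse instead of trapping */
    }
    return page_size / cache_line_size;
}
```

A zero result here would point at an uninitialized provider context rather than at the application code in frames #5 to #9.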

I found that 'match_device' in rdma-core/libibverbs/init.c always
matches the mlx5 provider instead of the rxe provider, because the
PCI ID matches an entry in hca_table (rdma-core/providers/mlx5/mlx5.c).
As a result, the mlx5_xxx functions are invoked instead of the rxe_xxx functions.
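The matching behaviour described above can be modelled as "try providers in order, first match wins". The sketch below is a simplified, hypothetical model, not rdma-core's actual code: the real tables use struct verbs_match_ent and more match kinds, and the name-prefix fallback and all struct and function names here are assumptions for illustration. It shows why an rxe device that reports the VF's PCI ID (0x15b3:0x1018) ends up bound to mlx5 when mlx5 is tried first.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified model of provider matching. */
struct pci_id {
    uint16_t vendor;
    uint16_t device;
};

struct rdma_dev {              /* what the matcher knows about a device */
    const char *name;          /* e.g. "rxe0" */
    int has_pci_id;            /* nonzero if a PCI vendor/device ID was found */
    struct pci_id pci;
};

struct provider {
    const char *name;
    const struct pci_id *ids;  /* PCI IDs claimed by this provider */
    size_t n_ids;
    const char *name_prefix;   /* hypothetical fallback: match by device name */
};

/* 0x15b3 is the Mellanox vendor ID; 0x1017/0x1018 are ConnectX-5 PF/VF. */
static const struct pci_id mlx5_ids[] = { { 0x15b3, 0x1017 }, { 0x15b3, 0x1018 } };

/* Providers are tried in order and the first match wins. */
static const struct provider providers[] = {
    { "mlx5", mlx5_ids, sizeof(mlx5_ids) / sizeof(mlx5_ids[0]), NULL },
    { "rxe", NULL, 0, "rxe" },
};

static int provider_matches(const struct provider *p, const struct rdma_dev *d)
{
    if (d->has_pci_id)
        for (size_t i = 0; i < p->n_ids; i++)
            if (p->ids[i].vendor == d->pci.vendor &&
                p->ids[i].device == d->pci.device)
                return 1;
    return p->name_prefix != NULL &&
           strncmp(d->name, p->name_prefix, strlen(p->name_prefix)) == 0;
}

/* Return the name of the first matching provider, or NULL if none. */
static const char *pick_provider(const struct rdma_dev *d)
{
    for (size_t i = 0; i < sizeof(providers) / sizeof(providers[0]); i++)
        if (provider_matches(&providers[i], d))
            return providers[i].name;
    return NULL;
}
```

In this model, removing mlx5 from the table (the effect of the CMakeLists.txt workaround) is the only way the PCI-tagged rxe device falls through to the rxe entry.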

Do you know how to force the rxe provider to be used on top of the
Mellanox ConnectX-5 Ethernet network?  As a workaround, I currently
comment out the mlx5 subdirectories in CMakeLists.txt so that mlx5
is not tried in 'try_all_drivers'.

Best Regards,
Fan
