RE: mlx5 endpoint driver problem

Hi Joao,

mlx5 devices support DMA with 64-bit addresses, so the driver first tries to set a 64-bit DMA mask. That fails on your system because it cannot handle 64-bit addresses, so the driver falls back to a 32-bit mask, which succeeds. What you are then seeing is that the driver issues a command (ENABLE_HCA in your log) and the firmware apparently never responds. Most likely the firmware did respond, but the driver could not see the response because of a problem with DMA addressing in your system.

Long story short, the problem is in your system. To investigate further you may need heavy tools such as a PCIe analyzer.
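
Before reaching for an analyzer, it may help to confirm that a given boot log shows the same pattern: the 64-bit DMA mask warning, followed by a command timeout and a failed probe. The script below is only a rough sketch for that. The function name `summarize` and the regular expressions are illustrative assumptions modelled on the log lines quoted in this thread, not anything provided by the driver, and the message formats may differ between kernel versions.

```python
import re

# Patterns modelled on the log lines quoted in this thread (assumed formats;
# other kernel versions may word these messages differently).
DMA_WARN = re.compile(r"couldn't set 64-bit (consistent )?PCI DMA mask")
CMD_TIMEOUT = re.compile(r"(\w+)\(0x[0-9a-fA-F]+\) timeout")
PROBE_FAIL = re.compile(r"probe of (\S+) failed with error (-?\d+)")


def summarize(log_text):
    """Report which of the symptoms discussed above appear in log_text."""
    summary = {
        "dma64_fallback": False,    # driver could not set a 64-bit DMA mask
        "timed_out_commands": [],   # names of commands that timed out
        "probe_errors": {},         # PCI address -> probe error code
    }
    for line in log_text.splitlines():
        if DMA_WARN.search(line):
            summary["dma64_fallback"] = True
        match = CMD_TIMEOUT.search(line)
        if match:
            summary["timed_out_commands"].append(match.group(1))
        match = PROBE_FAIL.search(line)
        if match:
            summary["probe_errors"][match.group(1)] = int(match.group(2))
    return summary


# Three lines copied from the report in this thread.
SAMPLE = """\
mlx5_core 0000:01:00.0: Warning: couldn't set 64-bit PCI DMA mask
mlx5_core 0000:01:00.0: wait_func:882:(pid 1): ENABLE_HCA(0x104) timeout. Will cause a leak of a command resource
mlx5_core: probe of 0000:01:00.0 failed with error -110
"""

result = summarize(SAMPLE)
print(result)
```

If `dma64_fallback` is set and a command timeout appears for the same device, that matches the situation described above. The error code -110 is -ETIMEDOUT in the generic Linux errno table, which is consistent with the command timing out rather than being rejected.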

-----Original Message-----
From: linux-rdma-owner@xxxxxxxxxxxxxxx [mailto:linux-rdma-owner@xxxxxxxxxxxxxxx] On Behalf Of Joao Pinto
Sent: Tuesday, May 9, 2017 12:13 PM
To: linux-rdma@xxxxxxxxxxxxxxx
Subject: mlx5 endpoint driver problem


Hello,

I am making tests with a Mellanox MLX5 Endpoint, and I am getting kernel hangs when trying to enable the hca:

mlx5_core 0000:01:00.0: enabling device (0000 -> 0002)
mlx5_core 0000:01:00.0: Warning: couldn't set 64-bit PCI DMA mask
mlx5_core 0000:01:00.0: Warning: couldn't set 64-bit consistent PCI DMA mask
mlx5_core 0000:01:00.0: firmware version: 16.19.21102
INFO: task swapper:1 blocked for more than 10 seconds.
      Not tainted 4.11.0-BETAMSIX1 #51
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
swapper         D    0     1      0 0x00000000

Stack Trace:
  __switch_to+0x0/0x94
  __schedule+0x1da/0x8b0
  schedule+0x26/0x6c
  schedule_timeout+0x2da/0x380
  wait_for_completion+0x92/0x104
  mlx5_cmd_exec+0x70e/0xd60
  mlx5_load_one+0x1b4/0xad8
  init_one+0x404/0x600
  pci_device_probe+0x122/0x1f0
  really_probe+0x1ac/0x348
  __driver_attach+0xa8/0xd0
  bus_for_each_dev+0x3c/0x74
  bus_add_driver+0xc2/0x184
  driver_register+0x50/0xec
  init+0x40/0x60

(...)

Stack Trace:
  __switch_to+0x0/0x94
  __schedule+0x1da/0x8b0
  schedule+0x26/0x6c
  schedule_timeout+0x2da/0x380
  wait_for_completion+0x92/0x104
  mlx5_cmd_exec+0x70e/0xd60
  mlx5_load_one+0x1b4/0xad8
  init_one+0x404/0x600
  pci_device_probe+0x122/0x1f0
  really_probe+0x1ac/0x348
  __driver_attach+0xa8/0xd0
  bus_for_each_dev+0x3c/0x74
  bus_add_driver+0xc2/0x184
  driver_register+0x50/0xec
  init+0x40/0x60
mlx5_core 0000:01:00.0: wait_func:882:(pid 1): ENABLE_HCA(0x104) timeout. Will cause a leak of a command resource
mlx5_core 0000:01:00.0: enable hca failed
mlx5_core 0000:01:00.0: mlx5_load_one failed with error code -110
mlx5_core: probe of 0000:01:00.0 failed with error -110

Could you give me a clue as to what might be happening?

Thanks,
Joao