mlx5 driver unload hang

  Hi,

  I'm currently running tests with a Connect-IB board against the current daily build of OFED-3.12:

  - compat:      407b205 compat: Add kthread support for kernels <= 2.6.35
  - compat-rdma: b2bda9f Fixed nfsrdma backport patch name
  - linux-3.12:  f9e9918 Prepare Linux tree for OFED 3.12

  
  the board is:

# mstflint -d mlx5_0 q

-W- Running quick query - Skipping full image integrity checks.

Image type:      FS3
FW Version:      10.10.2000
Device ID:       4113
Chip Revision:   0
Description:     UID                GuidsNumber  Step
Base GUID1:      f4521403000bf580        8        1
Base GUID2:      f4521403000bf588        8        1
Base MAC1:       0000f452140bf580        8        1
Base MAC2:       0000f452140bf588        8        1
Image VSD:       
Device VSD:      
PSID:            MT_1220110019

  When trying to restart the openibd service:

# service openibd restart

  here is what I get:

INFO: task rmmod:22654 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rmmod         D 0000000000000001     0 22654  22653 0x00000000
 ffff88106f1b7b58 0000000000000082 0000000000000000 ffffffff81055f76
 ffff88106f1b7ae8 ffff88107b0bb500 ffff88106f1b7ae8 ffffffff810522fd
 ffff88107a8e9af8 ffff88106f1b7fd8 000000000000fb88 ffff88107a8e9af8
Call Trace:
 [<ffffffff81055f76>] ? enqueue_task+0x66/0x80
 [<ffffffff810522fd>] ? check_preempt_curr+0x6d/0x90
 [<ffffffff8150e555>] schedule_timeout+0x215/0x2e0
 [<ffffffff81096c96>] ? autoremove_wake_function+0x16/0x40
 [<ffffffff81051419>] ? __wake_up_common+0x59/0x90
 [<ffffffff8150e1d3>] wait_for_common+0x123/0x180
 [<ffffffff81063310>] ? default_wake_function+0x0/0x20
 [<ffffffff810912b1>] ? __queue_work+0x41/0x50
 [<ffffffff8150e2ed>] wait_for_completion+0x1d/0x20
 [<ffffffffa05a3d18>] mlx5_cmd_exec+0x2d8/0x790 [mlx5_core]
 [<ffffffffa05a583e>] mlx5_cmd_teardown_hca+0x5e/0x90 [mlx5_core]
 [<ffffffffa05a10f9>] mlx5_dev_cleanup+0x69/0xe0 [mlx5_core]
 [<ffffffffa05da3c9>] remove_one+0x59/0x70 [mlx5_ib]
 [<ffffffff8129a047>] pci_device_remove+0x37/0x70
 [<ffffffff8135e8bf>] __device_release_driver+0x6f/0xe0
 [<ffffffff8135e9f8>] driver_detach+0xc8/0xd0
 [<ffffffff8135d7fe>] bus_remove_driver+0x8e/0x110
 [<ffffffff8135f1e2>] driver_unregister+0x62/0xa0
 [<ffffffff8129a354>] pci_unregister_driver+0x44/0xb0
 [<ffffffffa05e7349>] __exit_compat+0x15/0xbe [mlx5_ib]
 [<ffffffff810b4814>] sys_delete_module+0x194/0x260
 [<ffffffff8151311e>] ? do_page_fault+0x3e/0xa0
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
0000:01:00.0:wait_func:618:(pid 22654): TEARDOWN_HCA(0x103) timeout. Will cause a leak of a command resource
0000:01:00.0:mlx5_reclaim_startup_pages:419:(pid 22654): FW did not return all pages. giving up...
0000:01:00.0:wait_func:618:(pid 22654): MLX5_CMD_OP_DISABLE_HCA(0x105) timeout. Will cause a leak of a command resource
Compat-rdma backport release: 435a602-c
Backport based on linux-3.12 385a572
compat.git: linux-3.12
mlx5_ib: Mellanox Connect-IB Infiniband driver v1.0 (June 2013)
mlx5_ib 0000:01:00.0: firmware version: 10.10.2000
0000:01:00.0:wait_func:618:(pid 25331): MLX5_CMD_OP_ENABLE_HCA(0x104) timeout. Will cause a leak of a command resource
mlx5_ib 0000:01:00.0: enable hca failed
mlx5_ib: probe of 0000:01:00.0 failed with error -110


  It looks like the driver fails to tear down the HCA, leaving the device unusable
until the machine is rebooted (the next ENABLE_HCA times out and the probe fails with error -110).

  This behaviour is fully reproducible, although the restart _may_ succeed once or twice right
after boot.
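
  For anyone who wants to check their own logs for the same pattern, here is a minimal
sketch that pulls the timed-out firmware commands out of kernel log text. The function
name and the regular expression are illustrative assumptions, derived only from the three
wait_func lines quoted above; other driver versions may format these messages differently.

```python
import re

# Assumed message shape, taken from the log lines quoted in this message:
#   0000:01:00.0:wait_func:618:(pid 22654): TEARDOWN_HCA(0x103) timeout. ...
TIMEOUT_RE = re.compile(
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]):wait_func:\d+:"
    r"\(pid (?P<pid>\d+)\): (?P<cmd>\w+)\((?P<opcode>0x[0-9a-f]+)\) timeout"
)


def find_cmd_timeouts(log_text):
    """Return (pci_address, pid, command, opcode) for each timed-out command."""
    hits = []
    for line in log_text.splitlines():
        # search() rather than match() so a leading timestamp does not matter
        m = TIMEOUT_RE.search(line)
        if m:
            hits.append(
                (m.group("pci"), int(m.group("pid")),
                 m.group("cmd"), int(m.group("opcode"), 16))
            )
    return hits


# The three timeout lines from the log above, plus one unrelated line.
SAMPLE = """\
0000:01:00.0:wait_func:618:(pid 22654): TEARDOWN_HCA(0x103) timeout. Will cause a leak of a command resource
0000:01:00.0:mlx5_reclaim_startup_pages:419:(pid 22654): FW did not return all pages. giving up...
0000:01:00.0:wait_func:618:(pid 22654): MLX5_CMD_OP_DISABLE_HCA(0x105) timeout. Will cause a leak of a command resource
0000:01:00.0:wait_func:618:(pid 25331): MLX5_CMD_OP_ENABLE_HCA(0x104) timeout. Will cause a leak of a command resource
"""

if __name__ == "__main__":
    for pci, pid, cmd, opcode in find_cmd_timeouts(SAMPLE):
        print(f"{pci} pid={pid} {cmd} opcode={opcode:#x}")
```

  To use it on a live system, the text could come from the output of dmesg instead of
SAMPLE. Listing the commands in order makes the sequence visible: the teardown and
disable commands time out during unload, then the enable command times out on reload.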

  Is this a firmware problem or a driver problem?

  thanks,

  Sébastien.








