[BUG] net/mlx5: missing sysfs hwmon entry for ConnectX-4 cards

Hello,

On our dual-port 100G ConnectX-4 cards (MT27700 Family), running Linux kernel 6.6.56 with the latest ConnectX-4 firmware (12.28.2302), there is no sysfs hwmon entry for reading temperature values. With kernel 6.6.32, the hwmon entry is present and I can read the temperature values of those cards. Strangely, this problem doesn't occur on our ConnectX-4 Lx cards (MT27710 Family), regardless of which kernel version I use.
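For anyone who wants to check their own systems, here is a minimal sketch of how the entry can be looked up. It assumes the mlx5 hwmon device registers under the name "mlx5" and that the temperature files follow the usual hwmon convention (temp*_input, in millidegrees Celsius). The function name find_hwmon_temps and its parameters are only illustrative.

```python
from pathlib import Path


def find_hwmon_temps(driver_name="mlx5", root="/sys/class/hwmon"):
    """Return {hwmon_dir: {sensor_file: degrees_celsius}} for matching devices.

    driver_name is compared against the contents of each hwmon*/name file.
    "mlx5" is an assumption about how the driver names its hwmon device.
    """
    results = {}
    root_path = Path(root)
    if not root_path.is_dir():
        return results

    for hwmon_dir in sorted(root_path.glob("hwmon*")):
        try:
            name = (hwmon_dir / "name").read_text().strip()
        except OSError:
            continue  # entry without a readable name file
        if name != driver_name:
            continue

        temps = {}
        for temp_file in sorted(hwmon_dir.glob("temp*_input")):
            try:
                millidegrees = int(temp_file.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor not readable right now
            temps[temp_file.name] = millidegrees / 1000.0
        results[str(hwmon_dir)] = temps

    return results


if __name__ == "__main__":
    found = find_hwmon_temps()
    if not found:
        print("no mlx5 hwmon entry found")
    for path, temps in found.items():
        for sensor, celsius in temps.items():
            print(f"{path}/{sensor}: {celsius:.1f} C")
```

On an affected card I would expect this to report that no entry was found; on the ConnectX-4 Lx cards it should list at least one sensor.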

I looked into the mlx5 core driver and noticed that it checks the MCAM register here: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/hwmon.c?h=v6.6.56#n380.
When I removed that check, the hwmon entry reappeared.

Looking through recent mlx5 commits that touch this MCAM register, I found this commit: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v6.6.56&id=fb035aa9a3f8fd327ab83b15a94929d2b9045995.
When I reverted this commit, the hwmon entry also reappeared.

I also found a firmware bug fix regarding that register in the ConnectX-4 Lx bug fix history (Ref. 2339971): https://docs.nvidia.com/networking/display/connectx4lxfirmwarev14321900/bug+fixes+history. I couldn't find a corresponding firmware fix for the non-Lx ConnectX-4 cards, so I'm unsure whether this is an mlx5 driver issue or a firmware issue.

Kind regards
Til



