On Mon, Apr 22, 2024 at 10:09:15AM +0000, Ewen Chan wrote:
> To Whom It May Concern:
>
> I am using a few Mellanox ConnectX-4 100 Gbps InfiniBand NICs that are connected together via a Mellanox MSB7890 externally managed switch.
>
> I have a dual Xeon E5-2697A v4 server running Proxmox 7.4-17 (Debian 11) that runs opensm, along with two AMD Ryzen 5950X compute nodes that also have ConnectX-4 cards and also run Proxmox 7.4-17.
>
> I have enabled SR-IOV on all three systems, and each system has 8 virtual functions configured on its ConnectX-4.
>
> I read in the NVIDIA/Mellanox documentation that I would need to add the parameter "virt_enabled 2" to /etc/opensm/opensm.conf so that the OpenSM subnet manager knows that virtual functions are enabled. However, it appears that the opensm that ships with Debian 11/linux-rdma either ignores that option or doesn't know what to do with it.
>
> I would prefer NOT to install the MLNX_OFED drivers for Debian 11 if I can avoid it.
>
> My two questions are: how do I get the Linux opensm to:
>
> Recognise that I am using virtual functions (so that it understands that multiple traffic streams are coming over the wire via one physical port)?
>
> Automatically assign the Node GUID and Port GUID so that I don't have to set them manually?
>
> (I've already set the Node GUID and Port GUID on my Ryzen compute node host, and I can see the Node GUID and Port GUID inside my CentOS 7.7.1908 VM (which I've updated to use the 5.4.247 kernel), but the VM still shows "Port 1, State: Down".)
>
>
> Your help is greatly appreciated.

Linux opensm does not support ConnectX-4+ SR-IOV. You can install the SM RPM from the NVIDIA website to enable ConnectX-4 SR-IOV.

https://network.nvidia.com/products/adapter-software/infiniband-management-and-monitoring-tools/

Thanks

>
> Thank you.
>