Thanks for the report.

+ more people.

On Fri, Apr 30, 2021 at 04:56:17PM -0400, Dennis Afanasev wrote:
> Dear Saeed and Leo,
> I am reporting a bug in the mlx5_core driver discovered by our team at
> Stateless while setting up SRIOV devices in eswitch mode. Below are the
> details and relevant files that relate to the bug. Please reach out to me
> if I can provide any further information.
>
> 1.
>
> Description of problem: When creating SRIOV devices off physical mlx5
> PCIe devices and then putting the physical devices into switchdev mode,
> adding a new VRF device with a default route will cause the mlx5_core
> driver to segfault (replicate_bug1.sh). In addition, attempting to set the
> physical devices to switchdev mode after adding a VRF with a default route
> will cause the mlx5_core driver to segfault (replicate_bug2.sh). The seg
> fault occurs in the function mlx5e_tc_tun_fib_event in both cases.
> 2.
>
> Keywords: mlx5, mlx5_core, mlx5e_tc_tun_fib_event, tc, netdev, 5.12-rc7
> 3.
>
> Kernel information: Linux version 5.12.0-rc7 (root@data) (gcc (Debian
> 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP
> 4.
>
> Kernel config file: File attached - config-5.12.0-rc7
> 5.
>
> Oops message: Files attached - dmesg_output_bug1 and dmesg_output_bug2
> 6.
>
> Shell script to replicate: Files attached - replicate_bug1.sh and
> replicate_bug2.sh
> 7.
>
> ver_linux output: File attached - ver_linux_output
> 8.
>
> Processor information: File attached - cpuinfo
> 9.
>
> Module information: File attached - modules
> 10.
>
> Loaded driver and hardware: Files attached - ioport and iomem
> 11.
>
> PCI information: File attached - pci_info
> 12.
>
> Other information - I hardcoded the values of the physical PCIe device
> and the address of the created SRIOV device. This will have to be adjusted
> depending on your machine.
> #!/bin/bash
>
> set -euxETo pipefail
>
> mst start
>
> # (Hardcoded) These need to be modified based on the host machine
> nic1_port0="0000:5e:00.0"
> nic1_port1="0000:5e:00.1"
>
> # Create 1 SRIOV device per NIC port
> echo 1 > /sys/bus/pci/drivers/mlx5_core/$nic1_port0/sriov_numvfs
> echo 1 > /sys/bus/pci/drivers/mlx5_core/$nic1_port1/sriov_numvfs
>
> # The SRIOV devices are given these addresses
> nic1_port0_vf="0000:5e:00.2"
> nic1_port1_vf="0000:5e:00.4"
>
> declare -ar PCIE_PHYSICAL_ADDRESSES=($nic1_port0 $nic1_port1)
> declare -ar PCIE_SRIOV_ADDRESSES=($nic1_port0_vf $nic1_port1_vf)
>
> # Unbind the driver from the SRIOV, required to activate the eswitch
> for pcie_address in "${PCIE_SRIOV_ADDRESSES[@]}"; do
>   echo "${pcie_address}" > /sys/bus/pci/drivers/mlx5_core/unbind
> done
>
> # Wait for the binds to disappear
> for pcie_address in "${PCIE_SRIOV_ADDRESSES[@]}"; do
>   declare sys_symlink_file="/sys/bus/pci/drivers/mlx5_core/${pcie_address}"
>   until [[ ! -h "${sys_symlink_file}" ]]; do
>     inotifywait --event delete_self --timeout 1 "${sys_symlink_file}" || true
>   done
> done
> sync --file-system /sys
> udevadm settle --timeout=30
> sleep 5
>
> # Set the cards to 'switchdev'
> for pcie_address in "${PCIE_PHYSICAL_ADDRESSES[@]}"; do
>   devlink dev eswitch set "pci/${pcie_address}" mode switchdev encap-mode basic
> done
>
> # Wait for the cards to be in switchdev mode
> for pcie_address in "${PCIE_PHYSICAL_ADDRESSES[@]}"; do
>   until [[ "$(devlink -j dev eswitch show "pci/${pcie_address}" |
>     jq --arg dev "pci/${pcie_address}" -r '.dev[$dev].mode' 2> /dev/null)" == "switchdev" ]]; do
>     sleep 1
>   done
> done
> sync --file-system /sys
> udevadm settle --timeout=30
> sleep 5
>
> for pcie_address in "${PCIE_SRIOV_ADDRESSES[@]}"; do
>   echo "${pcie_address}" > /sys/bus/pci/drivers/mlx5_core/bind
> done
>
> ip link set group default up
> ip link add vrf0 type vrf table 100
>
> # This will crash the kernel
> ip route add table 100 unreachable default
> #!/bin/bash
>
> set -euxETo pipefail
>
> mst start
>
> # Add the VRF device and a route
> ip link add vrf0 type vrf table 100
> ip route add table 100 unreachable default
>
> # (Hardcoded) These need to be modified based on the host machine
> nic1_port0="0000:5e:00.0"
> nic1_port1="0000:5e:00.1"
>
> # Create 1 SRIOV device per NIC port
> echo 1 > /sys/bus/pci/drivers/mlx5_core/$nic1_port0/sriov_numvfs
> echo 1 > /sys/bus/pci/drivers/mlx5_core/$nic1_port1/sriov_numvfs
>
> # The SRIOV devices are given these addresses
> nic1_port0_vf="0000:5e:00.2"
> nic1_port1_vf="0000:5e:00.4"
>
> declare -ar PCIE_PHYSICAL_ADDRESSES=($nic1_port0 $nic1_port1)
> declare -ar PCIE_SRIOV_ADDRESSES=($nic1_port0_vf $nic1_port1_vf)
>
> # Unbind the driver from the SRIOV, required to activate the eswitch
> for pcie_address in "${PCIE_SRIOV_ADDRESSES[@]}"; do
>   echo "${pcie_address}" > /sys/bus/pci/drivers/mlx5_core/unbind
> done
>
> # Wait for the binds to disappear
> for pcie_address in "${PCIE_SRIOV_ADDRESSES[@]}"; do
>   declare sys_symlink_file="/sys/bus/pci/drivers/mlx5_core/${pcie_address}"
>   until [[ ! -h "${sys_symlink_file}" ]]; do
>     inotifywait --event delete_self --timeout 1 "${sys_symlink_file}" || true
>   done
> done
> sync --file-system /sys
> udevadm settle --timeout=30
>
> # set the cards to 'switchdev'
> for pcie_address in "${PCIE_PHYSICAL_ADDRESSES[@]}"; do
>   # This will crash the kernel
>   devlink dev eswitch set "pci/${pcie_address}" mode switchdev encap-mode basic
> done
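
Both scripts hardcode the VF addresses, which the report says must be adjusted per machine. One way to avoid that might be to read the virtfnN symlinks that the kernel normally creates under each PF's sysfs device directory once sriov_numvfs is set. The following is only a sketch under that assumption: list_vf_addresses and the SYSFS_PCI_ROOT override are hypothetical names that are not part of the original scripts, and this has not been tried on the reporter's hardware.

```shell
# Sketch (assumption): print the PCI address of every VF that belongs to
# the given PF, one per line, by resolving the PF's virtfnN symlinks.
# SYSFS_PCI_ROOT is a hypothetical override so the function can be
# exercised against a fake directory tree; it defaults to the real sysfs path.
list_vf_addresses() (
    pf_address="$1"
    root="${SYSFS_PCI_ROOT:-/sys/bus/pci/devices}"
    for link in "${root}/${pf_address}"/virtfn*; do
        # An unmatched glob stays literal and is not a symlink, so skip it
        [ -h "${link}" ] || continue
        # Each link points at ../<vf address>; keep only the last component
        basename "$(readlink "${link}")"
    done
)

# Possible use in place of the hardcoded values (untested assumption):
#   nic1_port0_vf="$(list_vf_addresses "$nic1_port0" | head -n 1)"
```

The function body runs in a subshell so its variables do not leak into a script that uses `set -u`, and a PF with no VFs simply yields no output.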