Remove the erroneous unmap in case no DMA mapping was established

The multi-packet WQE transmit code attempts to obtain a DMA mapping for
the skb. This could fail, e.g. under memory pressure, when the IOMMU
driver just can't allocate more memory for page tables. While the code
tries to handle this in the path below the err_unmap label, it
erroneously unmaps one entry from the sq's FIFO list of active mappings.
Since the current map attempt failed, this unmap removes some unrelated
DMA mapping that might still be required. If the PCI function now
presents that IOVA, the IOMMU may assume a rogue DMA access and, e.g. on
s390, put the PCI function in error state.

The erroneous behavior was seen in a stress-test environment that
created memory pressure.

Fixes: 5af75c747e2a ("net/mlx5e: Enhanced TX MPWQE for SKBs")
Signed-off-by: Gerd Bayer <gbayer@xxxxxxxxxxxxx>
---
While running some stress tests that put our system under memory
pressure, we eventually observed the following splat:

[ 1350.038775] ------------[ cut here ]------------
[ 1350.038776] WARNING: CPU: 36 PID: 37194 at arch/s390/include/asm/pci_dma.h:136 dma_update_cpu_trans+0x66/0x70
[ 1350.038799] Modules linked in: macvtap macvlan vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables nfnetlink lcs ctcm fsm dasd_fba_mod mlx5_ib ib_uverbs ib_core mlx5_core mlxfw psample rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs tls dm_service_time 8021q garp mrp rfkill sd_mod t10_pi sg sunrpc zfcp scsi_transport_fc dm_multipath dm_mod vfio_ccw mdev vfio_iommu_type1 vfio eadm_sch iommufd kvm drm i2c_core drm_panel_orientation_quirks xfs libcrc32c qeth_l2 bridge stp llc ghash_s390 prng aes_s390 dasd_eckd_mod des_s390 libdes sha3_512_s390 qeth sha3_256_s390 dasd_mod ccwgroup qdio pkey zcrypt fuse
[ 1350.038880] CPU: 36 PID: 37194 Comm: vhost-37179 Kdump: loaded Tainted: G X ------- --- 5.14.0-427.20.1.el9_4.s390x #1
[ 1350.038884] Hardware name: IBM 3931 A01 400 (LPAR)
[ 1350.038886] Krnl PSW : 0704f00180000000 00000056803d1eba (dma_update_cpu_trans+0x6a/0x70)
[ 1350.038890]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3
[ 1350.038893] Krnl GPRS: 0000000000000000 0000000589eff400 0000003be2b477b0 0000000000000000
[ 1350.038895]            0000000000000400 0000000000001000 0000000000000400 ffffffbe8000a000
[ 1350.038897]            0000000000000001 0000000086d6bc00 0000000000000001 000000417fff7000
[ 1350.038900]            000000012d5baa00 0000000000000000 00000056803d1f3e 0000038016df75d8
[ 1350.038957] Krnl Code: 00000056803d1eae: af000000 mc 0,0
[ 1350.038963]            00000056803d1eb2: a7f4fff9 brc 15,00000056803d1ea4
[ 1350.038963]           #00000056803d1eb6: af000000 mc 0,0
[ 1350.038970]           >00000056803d1eba: a7f4ffd9 brc 15,00000056803d1e6c
[ 1350.038979]            00000056803d1ebe: 0707 bcr 0,%r7
[ 1350.038983]            00000056803d1ec0: c004004b3334 brcl 0,0000005680d38528
[ 1350.038983]            00000056803d1ec6: eb7ff0500024 stmg %r7,%r15,80(%r15)
[ 1350.038983]            00000056803d1ecc: b90400ef lgr %r14,%r15
[ 1350.038994] Call Trace:
[ 1350.038995]  [<00000056803d1eba>] dma_update_cpu_trans+0x6a/0x70
[ 1350.038998] ([<00000056803d1f22>] __dma_update_trans+0x62/0x150)
[ 1350.039001]  [<00000056803d2432>] s390_dma_unmap_pages+0x72/0x1c0
[ 1350.039003]  [<000000568047e70c>] dma_unmap_page_attrs+0x3c/0x190
[ 1350.039008]  [<000003ff807c5230>] mlx5e_sq_xmit_mpwqe+0x2b0/0x430 [mlx5_core]
[ 1350.039170]  [<000003ff807c589e>] mlx5e_xmit+0x20e/0x5a0 [mlx5_core]
[ 1350.039246]  [<0000005680aae326>] dev_hard_start_xmit+0xb6/0x210
[ 1350.039252]  [<0000005680b144d8>] sch_direct_xmit+0x88/0x420
[ 1350.039256]  [<0000005680aa9496>] __dev_xmit_skb+0x2c6/0x5c0
[ 1350.039259]  [<0000005680aae93e>] __dev_queue_xmit+0x36e/0x840
[ 1350.039262]  [<000003ff809e3b6a>] macvlan_start_xmit+0x6a/0x140 [macvlan]
[ 1350.039266]  [<0000005680aae326>] dev_hard_start_xmit+0xb6/0x210
[ 1350.039269]  [<0000005680aaeae8>] __dev_queue_xmit+0x518/0x840
[ 1350.039271]  [<000003ff809b40f4>] tap_get_user_xdp.isra.0+0x134/0x300 [tap]
[ 1350.039274]  [<000003ff809b4354>] tap_sendmsg+0x94/0xc0 [tap]
[ 1350.039277]  [<000003ff809d4f06>] vhost_tx_batch.constprop.0+0x66/0x1a0 [vhost_net]
[ 1350.039281]  [<000003ff809d6a5e>] handle_tx_copy+0x24e/0x340 [vhost_net]
[ 1350.039283]  [<000003ff809d6c0c>] handle_tx+0xbc/0x100 [vhost_net]
[ 1350.039286]  [<000003ff809bb6f2>] vhost_worker+0xa2/0x100 [vhost]
[ 1350.039294]  [<000000568040be98>] kthread+0x108/0x110
[ 1350.039299]  [<000000568038afdc>] __ret_from_fork+0x3c/0x60
[ 1350.039302]  [<0000005680d2e89a>] ret_from_fork+0xa/0x40
[ 1350.039307] Last Breaking-Event-Address:
[ 1350.039308]  [<00000056803d1e68>] dma_update_cpu_trans+0x18/0x70
[ 1350.039310] ---[ end trace a581115ebebd62f3 ]---

And here the IOMMU complains about the "rogue DMA attempt":

[ 1350.043079] zpci: 0037:00:00.0: Event 0x7 reports an error for PCI function 0x3932

With some instrumentation in mlx5e_sq_xmit_mpwqe() to mimic a DMA
mapping failure on every 1000th buffer, I was able to reproduce this
with recent upstream code, too. I think the error handling of that
routine has a bug, as it DMA unmaps a buffer/IOVA that might still be
in use.
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index b09e9abd39f3..f8c7912abe0e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -642,7 +642,6 @@ mlx5e_sq_xmit_mpwqe(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 	return;
 
 err_unmap:
-	mlx5e_dma_unmap_wqe_err(sq, 1);
 	sq->stats->dropped++;
 	dev_kfree_skb_any(skb);
 	mlx5e_tx_flush(sq);

---
base-commit: 8d53a5170c8677af9b3fbd9d0b75ae120fdefba2
change-id: 20240909-fix-mlx5_dma_unmap-e2a12e26e929

Best regards,
-- 
Gerd Bayer <gbayer@xxxxxxxxxxxxx>
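
For illustration only, the failure mode described in the commit message can be modeled in userspace. The sketch below is not driver code: struct toy_sq and all toy_* helpers are hypothetical stand-ins, and it merely assumes that the error-path unmap pops the most recently pushed entry from the FIFO of active mappings, as the commit message describes. A failed map pushes nothing, so an unconditional unmap in the error path tears down an older mapping that is still live.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_FIFO_SIZE 8

/* Hypothetical stand-in for the sq's FIFO of active DMA mappings. */
struct toy_sq {
	unsigned long iova[TOY_FIFO_SIZE];
	bool mapped[TOY_FIFO_SIZE];
	unsigned int pc; /* producer counter: next free slot */
};

/* Pretend to DMA map a buffer; 'fail' simulates memory pressure. */
static bool toy_map(struct toy_sq *sq, unsigned long iova, bool fail)
{
	if (fail)
		return false; /* nothing is pushed to the FIFO on failure */
	sq->iova[sq->pc % TOY_FIFO_SIZE] = iova;
	sq->mapped[sq->pc % TOY_FIFO_SIZE] = true;
	sq->pc++;
	return true;
}

/* Unmap the n most recently pushed FIFO entries. */
static void toy_unmap_last(struct toy_sq *sq, unsigned int n)
{
	while (n--) {
		assert(sq->pc > 0);
		sq->pc--;
		sq->mapped[sq->pc % TOY_FIFO_SIZE] = false;
	}
}

/* Transmit one buffer; returns false if the buffer was dropped. */
static bool toy_xmit(struct toy_sq *sq, unsigned long iova, bool fail,
		     bool buggy)
{
	if (!toy_map(sq, iova, fail)) {
		if (buggy)
			toy_unmap_last(sq, 1); /* hits an unrelated mapping */
		return false;
	}
	return true;
}

/* Two successful maps, then one failure; returns the live mapping count. */
static unsigned int toy_demo(bool buggy)
{
	struct toy_sq sq = { .pc = 0 };
	unsigned int i, live = 0;

	toy_xmit(&sq, 0x1000, false, buggy);
	toy_xmit(&sq, 0x2000, false, buggy);
	toy_xmit(&sq, 0x3000, true, buggy); /* simulated map failure */

	for (i = 0; i < TOY_FIFO_SIZE; i++)
		live += sq.mapped[i];
	return live;
}
```

In this model, toy_demo(true) leaves only one of the two in-flight mappings intact, while toy_demo(false), which corresponds to the error path without the unmap call, leaves both untouched.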