On Tue, 10 Sep 2024 at 10:53:51 +0200, Gerd Bayer wrote: > Remove the erroneous unmap in case no DMA mapping was established > > The multi-packet WQE transmit code attempts to obtain a DMA mapping for > the skb. This could fail, e.g. under memory pressure, when the IOMMU > driver just can't allocate more memory for page tables. While the code > tries to handle this in the path below the err_unmap label it erroneously > unmaps one entry from the sq's FIFO list of active mappings. The fix looks valid to me, thanks! Acked-by: Maxim Mikityanskiy <maxtram95@xxxxxxxxx> > Since the > current map attempt failed this unmap is removing some random DMA mapping > that might still be required. If the PCI function now presents that IOVA, > the IOMMU may assumes a rogue DMA access and e.g. on s390 puts the PCI > function in error state. > > The erroneous behavior was seen in a stress-test environment that created > memory pressure. > > Fixes: 5af75c747e2a ("net/mlx5e: Enhanced TX MPWQE for SKBs") > Signed-off-by: Gerd Bayer <gbayer@xxxxxxxxxxxxx> > --- > While running some stress tests that put our system under memory pressure > we observed the following splat, eventually: > > [ 1350.038775] ------------[ cut here ]------------ > [ 1350.038776] WARNING: CPU: 36 PID: 37194 at arch/s390/include/asm/pci_dma.h:136 dma_update_cpu_trans+0x66/0x70 > [ 1350.038799] Modules linked in: macvtap macvlan vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables nfnetlink lcs ctcm fsm dasd_fba_mod mlx5_ib ib_uverbs ib_core mlx5_core > " > "mlxfw psample rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs tls dm_service_time 8021q garp mrp rfkill sd_mod t10_pi sg sunrpc zfcp scsi_transport_fc dm_multipath dm_mod vfio_ccw mdev vfio_iommu_type1 vfio eadm_sch iommufd kvm drm i2c_core drm_panel_orientation_quirks xfs libcrc32c qeth_l2 > " > " bridge stp llc ghash_s390 prng aes_s390 dasd_eckd_mod des_s390 libdes sha3_512_s390 qeth sha3_256_s390 dasd_mod ccwgroup qdio pkey zcrypt fuse > [ 1350.038880] CPU: 36 PID: 37194 Comm: vhost-37179 Kdump: loaded Tainted: G X ------- --- 5.14.0-427.20.1.el9_4.s390x #1 > [ 1350.038884] Hardware name: IBM 3931 A01 400 (LPAR) > [ 1350.038886] Krnl PSW : 0704f00180000000 00000056803d1eba (dma_update_cpu_trans+0x6a/0x70) > [ 1350.038890] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3 > [ 1350.038893] Krnl GPRS: 0000000000000000 0000000589eff400 0000003be2b477b0 0000000000000000 > [ 1350.038895] 0000000000000400 0000000000001000 0000000000000400 ffffffbe8000a000 > [ 1350.038897] 0000000000000001 0000000086d6bc00 0000000000000001 000000417fff7000 > [ 1350.038900] 000000012d5baa00 0000000000000000 00000056803d1f3e 0000038016df75d8 > [ 1350.038957] Krnl Code: 00000056803d1eae: af000000 mc 0,0 > [ 1350.038963] 00000056803d1eb2: a7f4fff9 brc 15,00000056803d1ea4 > [ 1350.038963] #00000056803d1eb6: af000000 mc 0,0 > [ 1350.038970] >00000056803d1eba: a7f4ffd9 brc 15,00000056803d1e6c > [ 1350.038979] 00000056803d1ebe: 0707 bcr 0,%r7 > [ 1350.038983] 00000056803d1ec0: c004004b3334 brcl 0,0000005680d38528 > [ 1350.038983] 00000056803d1ec6: eb7ff0500024 stmg %r7,%r15,80(%r15) > [ 1350.038983] 00000056803d1ecc: b90400ef lgr %r14,%r15 > [ 1350.038994] Call Trace: > [ 1350.038995] [<00000056803d1eba>] dma_update_cpu_trans+0x6a/0x70 > [ 1350.038998] ([<00000056803d1f22>] __dma_update_trans+0x62/0x150) > [ 1350.039001] [<00000056803d2432>] s390_dma_unmap_pages+0x72/0x1c0 > [ 1350.039003] [<000000568047e70c>] dma_unmap_page_attrs+0x3c/0x190 > [ 1350.039008] [<000003ff807c5230>] mlx5e_sq_xmit_mpwqe+0x2b0/0x430 [mlx5_core] > [ 1350.039170] [<000003ff807c589e>] mlx5e_xmit+0x20e/0x5a0 [mlx5_core] > [ 1350.039246] [<0000005680aae326>] dev_hard_start_xmit+0xb6/0x210 > [ 1350.039252] [<0000005680b144d8>] sch_direct_xmit+0x88/0x420 > [ 1350.039256] [<0000005680aa9496>] __dev_xmit_skb+0x2c6/0x5c0 > [ 1350.039259] [<0000005680aae93e>] __dev_queue_xmit+0x36e/0x840 > [ 1350.039262] [<000003ff809e3b6a>] macvlan_start_xmit+0x6a/0x140 [macvlan] > [ 1350.039266] [<0000005680aae326>] dev_hard_start_xmit+0xb6/0x210 > [ 1350.039269] [<0000005680aaeae8>] __dev_queue_xmit+0x518/0x840 > [ 1350.039271] [<000003ff809b40f4>] tap_get_user_xdp.isra.0+0x134/0x300 [tap] > [ 1350.039274] [<000003ff809b4354>] tap_sendmsg+0x94/0xc0 [tap] > [ 1350.039277] [<000003ff809d4f06>] vhost_tx_batch.constprop.0+0x66/0x1a0 [vhost_net] > [ 1350.039281] [<000003ff809d6a5e>] handle_tx_copy+0x24e/0x340 [vhost_net] > [ 1350.039283] [<000003ff809d6c0c>] handle_tx+0xbc/0x100 [vhost_net] > [ 1350.039286] [<000003ff809bb6f2>] vhost_worker+0xa2/0x100 [vhost] > [ 1350.039294] [<000000568040be98>] kthread+0x108/0x110 > [ 1350.039299] [<000000568038afdc>] __ret_from_fork+0x3c/0x60 > [ 1350.039302] [<0000005680d2e89a>] ret_from_fork+0xa/0x40 > [ 1350.039307] Last Breaking-Event-Address: > [ 1350.039308] [<00000056803d1e68>] dma_update_cpu_trans+0x18/0x70 > [ 1350.039310] ---[ end trace a581115ebebd62f3 ]--- > > And here the IOMMU complains about the "rogue DMA attempt": > [ 1350.043079] zpci: 0037:00:00.0: Event 0x7 reports an error for PCI function 0x3932 > > With some instrumentation in mlx5e_sq_xmit_mpwqe() to mimic a failure > to DMA map every 1000th buffer, I was able to reproduce this with recent > upstream code, too. I think the error handling of that routine has a bug > as it DMA unmaps a buffer/IOVA that might be used, still. > --- > drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c > index b09e9abd39f3..f8c7912abe0e 100644 > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c > @@ -642,7 +642,6 @@ mlx5e_sq_xmit_mpwqe(struct mlx5e_txqsq *sq, struct sk_buff *skb, > return; > > err_unmap: > - mlx5e_dma_unmap_wqe_err(sq, 1); > sq->stats->dropped++; > dev_kfree_skb_any(skb); > mlx5e_tx_flush(sq); > > --- > base-commit: 8d53a5170c8677af9b3fbd9d0b75ae120fdefba2 > change-id: 20240909-fix-mlx5_dma_unmap-e2a12e26e929 > > Best regards, > -- > Gerd Bayer <gbayer@xxxxxxxxxxxxx> >