On 9/10/2024 7:34 PM, Andrew Halaney wrote: > Hey Suraj, > > Your email client didn't seem to quote my response in your latest reply, > so its difficult to figure out what you're writing vs me below. It also > seems to have messed with the line breaks so I'm manually redoing those. > > Please see if you can figure out how to make that happen for further > replies! > > More comments below... > > On Tue, Sep 10, 2024 at 12:47:08PM GMT, Suraj Jaiswal wrote: >> >> >> -----Original Message----- >> From: Andrew Halaney <ahalaney@xxxxxxxxxx> >> Sent: Wednesday, September 4, 2024 3:47 AM >> To: Suraj Jaiswal (QUIC) <quic_jsuraj@xxxxxxxxxxx> >> Cc: Vinod Koul <vkoul@xxxxxxxxxx>; bhupesh.sharma@xxxxxxxxxx; Andy Gross <agross@xxxxxxxxxx>; Bjorn Andersson <andersson@xxxxxxxxxx>; Konrad Dybcio <konrad.dybcio@xxxxxxxxxx>; David S. Miller <davem@xxxxxxxxxxxxx>; Eric Dumazet <edumazet@xxxxxxxxxx>; Jakub Kicinski <kuba@xxxxxxxxxx>; Rob Herring <robh+dt@xxxxxxxxxx>; Krzysztof Kozlowski <krzysztof.kozlowski+dt@xxxxxxxxxx>; Conor Dooley <conor+dt@xxxxxxxxxx>; Alexandre Torgue <alexandre.torgue@xxxxxxxxxxx>; Jose Abreu <joabreu@xxxxxxxxxxxx>; Maxime Coquelin <mcoquelin.stm32@xxxxxxxxx>; netdev@xxxxxxxxxxxxxxx; linux-arm-msm@xxxxxxxxxxxxxxx; devicetree@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-stm32@xxxxxxxxxxxxxxxxxxxxxxxxxxxx; Prasad Sodagudi <psodagud@xxxxxxxxxxx>; Rob Herring <robh@xxxxxxxxxx>; kernel <kernel@xxxxxxxxxxx> >> Subject: Re: [PATCH net] net: stmmac: Stop using a single dma_map() for multiple descriptors >> >> WARNING: This email originated from outside of Qualcomm. Please be wary of any links or attachments, and do not enable macros. >> >> On Mon, Sep 02, 2024 at 03:24:36PM GMT, Suraj Jaiswal wrote: >>> Currently same page address is shared >>> between multiple buffer addresses and causing smmu fault for other >>> descriptor if address hold by one descriptor got cleaned. >>> Allocate separate buffer address for each descriptor for TSO path so >>> that if one descriptor cleared it should not clean other descriptor >>> address. > > snip... > >>> >>> static void stmmac_flush_tx_descriptors(struct stmmac_priv *priv, int >>> queue) @@ -4351,25 +4380,17 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev) >>> pay_len = 0; >>> } >>> >>> - stmmac_tso_allocator(priv, des, tmp_pay_len, (nfrags == 0), queue); >>> + if (stmmac_tso_allocator(priv, (skb->data + proto_hdr_len), >>> + tmp_pay_len, nfrags == 0, queue, false)) >>> + goto dma_map_err; >> >> Changing the second argument here is subtly changing the dma_cap.addr64 <= 32 >> case right before this. Is that intentional? >> >> i.e., prior, pretend des = 0 (side note but des is a very confusing variable >> name for "dma address" when there's also mentions of desc meaning "descriptor" >> in the DMA ring). In the <= 32 case, we'd call stmmac_tso_allocator(priv, 0) >> and in the else case we'd call stmmac_tso_allocator(priv, 0 + proto_hdr_len). >> >> With this change in both cases its called with the (not-yet-dma-mapped) >> skb->data + proto_hdr_len always (i.e. like the else case). >> >> Honestly, the <= 32 case reads weird to me without this patch. It seems some >> of the buffer is filled but des is not properly incremented? >> >> I don't know how this hardware is supposed to be programmed (no databook >> access) but that seems fishy (and like a separate bug, which would be nice to >> squash if so in its own patch). Would you be able to explain the logic there >> to me if it does make sense to you? >> > >> <Suraj> des can not be 0 . des 0 means dma_map_single() failed and it will return. >> If we see if des calculation (first->des1 = cpu_to_le32(des + proto_hdr_len);) >> and else case des calculator ( des += proto_hdr_len;) it is adding proto_hdr_len >> to the memory that we after mapping skb->data using dma_map_single. >> Same way we added proto_hdr_len in second argument . > > > 0 was just an example, and a confusing one, sorry. Let me paste the original > fishy code that I think you've modified the behavior for. Here's the > original: > > if (priv->dma_cap.addr64 <= 32) { > first->des0 = cpu_to_le32(des); > > /* Fill start of payload in buff2 of first descriptor */ > if (pay_len) > first->des1 = cpu_to_le32(des + proto_hdr_len); > > /* If needed take extra descriptors to fill the remaining payload */ > tmp_pay_len = pay_len - TSO_MAX_BUFF_SIZE; > } else { > stmmac_set_desc_addr(priv, first, des); > tmp_pay_len = pay_len; > des += proto_hdr_len; > pay_len = 0; > } > > stmmac_tso_allocator(priv, des, tmp_pay_len, (nfrags == 0), queue); > > Imagine the <= 32 case. Let's say des is address 0 (just for simplicity > sake, let's assume that's valid). That means: > > first->des0 = des; > first->des1 = des + proto_hdr_len; > stmmac_tso_allocator(priv, des, tmp_pay_len, (nfrags == 0), queue) > > if des is 0, proto_hdr_len is 64, then that means > > first->des0 = 0 > first->des1 = 64 > stmmac_tso_allocator(priv, 0, tmp_pay_len, (nfrags == 0), queue) > > That seems fishy to me. We setup up the first descriptor with the > beginning of des, and then the code goes and sets up more descriptors > (stmmac_tso_allocator()) starting with the same des again? tso_alloc is checking if more descriptor needed for packet . it is adding offset to get next descriptor (curr_addr = des + (total_len - tmp_len)) and storing in des of next descriptor. > > Should we be adding the payload length (TSO_MAX_BUFF_SIZE I suppose > based on the tmp_pay_len = pay_len - TSO_MAX_BUFFSIZE above)? It seems > that <= 32 results in duplicate data in both the "first" descriptor > programmed above, and in the "second" descriptor programmed in > stmmac_tso_allocator(). curr_addr = des + (total_len - tmp_len) is used in while loop in tso_alloc to get address of all required descriptor . descriptor address will be updated finally in tso_alloc by below call . if (priv->dma_cap.addr64 <= 32) desc->des0 = cpu_to_le32(curr_addr); else stmmac_set_desc_addr(priv, desc, curr_addr); Also, since tmp_pay_len is decremented, but des > isn't, it seems that stmmac_tso_allocator() would not put all of the > buffer in the descriptors and would leave the last TSO_MAX_BUFF_SIZE > bytes out? > > I highlight all of this because with your change here we get the > following now in the <= 32 case: > > first->des0 = des > first->des1 = des + proto_hdr_len > stmmac_tso_allocator(priv, des + proto_hdr_len, ...) > > which is a subtle change in the call to stmmac_tso_allocator, meaning > a subtle change in the descriptor programming. > > Both seem wrong for the <= 32 case, but I'm "reading between the lines" > with how these descriptors are programmed (I don't have the docs to back > this up, I'm inferring from the code). It seems to me that in the <= 32 > case we should have: > > first->des0 = des > first->des1 = des + proto_hdr_len > stmmac_tso_allocator(priv, des + TSO_MAX_BUF_SIZE, ...) let me check <=32 case only on setup and get back. > > or similar depending on if that really makes sense with how des0/des1 is > used (the handling is different in stmmac_tso_allocator() for <= 32, > only des0 is used so I'm having a tough time figuring out how much of > the des is actually programmed in des0 + des1 above without knowing the > hardware better). > > Does that make sense? The prior code seems fishy to me, your change > seems to unintentionally change that fhsy part, but it still seems fishy > to me. I don't think you should be changing that code's behavior in that > patch, if you think it's right then we should continue with the current > behavior prior to your patch, and if you think its wrong we should > probably fix that *prior* to this patch in your series. > > Thanks, > Andrew >