On Tue, Sep 24, 2024 at 04:36:59PM GMT, Sarosh Hasan wrote: > > > On 9/10/2024 7:34 PM, Andrew Halaney wrote: > > Hey Suraj, > > > > Your email client didn't seem to quote my response in your latest reply, > > so its difficult to figure out what you're writing vs me below. It also > > seems to have messed with the line breaks so I'm manually redoing those. > > > > Please see if you can figure out how to make that happen for further > > replies! > > > > More comments below... > > > > On Tue, Sep 10, 2024 at 12:47:08PM GMT, Suraj Jaiswal wrote: > >> > >> > >> -----Original Message----- > >> From: Andrew Halaney <ahalaney@xxxxxxxxxx> > >> Sent: Wednesday, September 4, 2024 3:47 AM > >> To: Suraj Jaiswal (QUIC) <quic_jsuraj@xxxxxxxxxxx> > >> Cc: Vinod Koul <vkoul@xxxxxxxxxx>; bhupesh.sharma@xxxxxxxxxx; Andy Gross <agross@xxxxxxxxxx>; Bjorn Andersson <andersson@xxxxxxxxxx>; Konrad Dybcio <konrad.dybcio@xxxxxxxxxx>; David S. Miller <davem@xxxxxxxxxxxxx>; Eric Dumazet <edumazet@xxxxxxxxxx>; Jakub Kicinski <kuba@xxxxxxxxxx>; Rob Herring <robh+dt@xxxxxxxxxx>; Krzysztof Kozlowski <krzysztof.kozlowski+dt@xxxxxxxxxx>; Conor Dooley <conor+dt@xxxxxxxxxx>; Alexandre Torgue <alexandre.torgue@xxxxxxxxxxx>; Jose Abreu <joabreu@xxxxxxxxxxxx>; Maxime Coquelin <mcoquelin.stm32@xxxxxxxxx>; netdev@xxxxxxxxxxxxxxx; linux-arm-msm@xxxxxxxxxxxxxxx; devicetree@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-stm32@xxxxxxxxxxxxxxxxxxxxxxxxxxxx; Prasad Sodagudi <psodagud@xxxxxxxxxxx>; Rob Herring <robh@xxxxxxxxxx>; kernel <kernel@xxxxxxxxxxx> > >> Subject: Re: [PATCH net] net: stmmac: Stop using a single dma_map() for multiple descriptors > >> > >> WARNING: This email originated from outside of Qualcomm. Please be wary of any links or attachments, and do not enable macros. > >> > >> On Mon, Sep 02, 2024 at 03:24:36PM GMT, Suraj Jaiswal wrote: > >>> Currently same page address is shared > >>> between multiple buffer addresses and causing smmu fault for other > >>> descriptor if address hold by one descriptor got cleaned. > >>> Allocate separate buffer address for each descriptor for TSO path so > >>> that if one descriptor cleared it should not clean other descriptor > >>> address. > > > > snip... > > > >>> > >>> static void stmmac_flush_tx_descriptors(struct stmmac_priv *priv, int > >>> queue) @@ -4351,25 +4380,17 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev) > >>> pay_len = 0; > >>> } > >>> > >>> - stmmac_tso_allocator(priv, des, tmp_pay_len, (nfrags == 0), queue); > >>> + if (stmmac_tso_allocator(priv, (skb->data + proto_hdr_len), > >>> + tmp_pay_len, nfrags == 0, queue, false)) > >>> + goto dma_map_err; > >> > >> Changing the second argument here is subtly changing the dma_cap.addr64 <= 32 > >> case right before this. Is that intentional? > >> > >> i.e., prior, pretend des = 0 (side note but des is a very confusing variable > >> name for "dma address" when there's also mentions of desc meaning "descriptor" > >> in the DMA ring). In the <= 32 case, we'd call stmmac_tso_allocator(priv, 0) > >> and in the else case we'd call stmmac_tso_allocator(priv, 0 + proto_hdr_len). > >> > >> With this change in both cases its called with the (not-yet-dma-mapped) > >> skb->data + proto_hdr_len always (i.e. like the else case). > >> > >> Honestly, the <= 32 case reads weird to me without this patch. It seems some > >> of the buffer is filled but des is not properly incremented? > >> > >> I don't know how this hardware is supposed to be programmed (no databook > >> access) but that seems fishy (and like a separate bug, which would be nice to > >> squash if so in its own patch). Would you be able to explain the logic there > >> to me if it does make sense to you? > >> > > > >> <Suraj> des can not be 0 . des 0 means dma_map_single() failed and it will return. > >> If we see if des calculation (first->des1 = cpu_to_le32(des + proto_hdr_len);) > >> and else case des calculator ( des += proto_hdr_len;) it is adding proto_hdr_len > >> to the memory that we after mapping skb->data using dma_map_single. > >> Same way we added proto_hdr_len in second argument . > > > > > > 0 was just an example, and a confusing one, sorry. Let me paste the original > > fishy code that I think you've modified the behavior for. Here's the > > original: > > > > if (priv->dma_cap.addr64 <= 32) { > > first->des0 = cpu_to_le32(des); > > > > /* Fill start of payload in buff2 of first descriptor */ > > if (pay_len) > > first->des1 = cpu_to_le32(des + proto_hdr_len); > > > > /* If needed take extra descriptors to fill the remaining payload */ > > tmp_pay_len = pay_len - TSO_MAX_BUFF_SIZE; > > } else { > > stmmac_set_desc_addr(priv, first, des); > > tmp_pay_len = pay_len; > > des += proto_hdr_len; > > pay_len = 0; > > } > > > > stmmac_tso_allocator(priv, des, tmp_pay_len, (nfrags == 0), queue); > > > > Imagine the <= 32 case. Let's say des is address 0 (just for simplicity > > sake, let's assume that's valid). That means: > > > > first->des0 = des; > > first->des1 = des + proto_hdr_len; > > stmmac_tso_allocator(priv, des, tmp_pay_len, (nfrags == 0), queue) > > > > if des is 0, proto_hdr_len is 64, then that means > > > > first->des0 = 0 > > first->des1 = 64 > > stmmac_tso_allocator(priv, 0, tmp_pay_len, (nfrags == 0), queue) > > > > That seems fishy to me. We setup up the first descriptor with the > > beginning of des, and then the code goes and sets up more descriptors > > (stmmac_tso_allocator()) starting with the same des again? > tso_alloc is checking if more descriptor needed for packet . it is adding offset to get next > descriptor (curr_addr = des + (total_len - tmp_len)) and storing in des of next descriptor. Yes, so in stmmac_tso_allocator() we currently have: static void stmmac_tso_allocator(struct stmmac_priv *priv, dma_addr_t des, int total_len, bool last_segment, u32 queue) { struct stmmac_tx_queue *tx_q = &priv->dma_conf.tx_queue[queue]; struct dma_desc *desc; u32 buff_size; int tmp_len; tmp_len = total_len; while (tmp_len > 0) { dma_addr_t curr_addr; tx_q->cur_tx = STMMAC_GET_ENTRY(tx_q->cur_tx, priv->dma_conf.dma_tx_size); ... curr_addr = des + (total_len - tmp_len); if (priv->dma_cap.addr64 <= 32) desc->des0 = cpu_to_le32(curr_addr); so on the first loop you've got: tmp_len = total_len ... curr_addr = des + total_len - temp_len i.e. curr_addr = des meaning with the "first" handling I've highlighted we've got first->des0 = des "next"->des0 = des where "next" is the cur_tx descriptor in the first loop of stmmac_tso_allocator() (essentially the second descriptor). That seems broken to me, and was that way prior to this patch. You've modified the behavior in this patch unintentionally. I think it needs modifying, but it should be done so explicitly in its own patch prior to this one. I also think the current modification in this patch isn't a fix. See prior reply below where I highlighted the programming as I understand it with this patch applied, which would result in something like. first->des0 = des first->des1 = des + proto_hdr_len "next"->des0 = des + proto_hdr_len Which again seems wrong, two descriptors pointing to the same address isn't making sense to me. Sorry to sound like a broken record, but I want to make sure we're on the same page! Sounds like you're looking into it based on the below comment, but some of these comments here made me think I didn't explain the situation well enough. > > > > Should we be adding the payload length (TSO_MAX_BUFF_SIZE I suppose > > based on the tmp_pay_len = pay_len - TSO_MAX_BUFFSIZE above)? It seems > > that <= 32 results in duplicate data in both the "first" descriptor > > programmed above, and in the "second" descriptor programmed in > > stmmac_tso_allocator(). > curr_addr = des + (total_len - tmp_len) is used in while loop in tso_alloc to get address of all required descriptor . > descriptor address will be updated finally in tso_alloc by below call . > > if (priv->dma_cap.addr64 <= 32) > desc->des0 = cpu_to_le32(curr_addr); > else > stmmac_set_desc_addr(priv, desc, curr_addr); > > Also, since tmp_pay_len is decremented, but des > > isn't, it seems that stmmac_tso_allocator() would not put all of the > > buffer in the descriptors and would leave the last TSO_MAX_BUFF_SIZE > > bytes out? > > > > I highlight all of this because with your change here we get the > > following now in the <= 32 case: > > > > first->des0 = des > > first->des1 = des + proto_hdr_len > > stmmac_tso_allocator(priv, des + proto_hdr_len, ...) > > > > which is a subtle change in the call to stmmac_tso_allocator, meaning > > a subtle change in the descriptor programming. > > > > Both seem wrong for the <= 32 case, but I'm "reading between the lines" > > with how these descriptors are programmed (I don't have the docs to back > > this up, I'm inferring from the code). It seems to me that in the <= 32 > > case we should have: > > > > first->des0 = des > > first->des1 = des + proto_hdr_len > > stmmac_tso_allocator(priv, des + TSO_MAX_BUF_SIZE, ...) > > let me check <=32 case only on setup and get back. > > > > or similar depending on if that really makes sense with how des0/des1 is > > used (the handling is different in stmmac_tso_allocator() for <= 32, > > only des0 is used so I'm having a tough time figuring out how much of > > the des is actually programmed in des0 + des1 above without knowing the > > hardware better). > > > > Does that make sense? The prior code seems fishy to me, your change > > seems to unintentionally change that fhsy part, but it still seems fishy > > to me. I don't think you should be changing that code's behavior in that > > patch, if you think it's right then we should continue with the current > > behavior prior to your patch, and if you think its wrong we should > > probably fix that *prior* to this patch in your series. > > > > Thanks, > > Andrew > > >