Dear Vincent,

> -----Original Message-----
> From: Vincent Mailhol <mailhol.vincent@xxxxxxxxxx>
> Sent: Thursday, 27 February 2025 03:20
> To: Schmidt, Stefan <schmidtssstefan@xxxxxxxxxxxxxxxxxxxxxxxx>
> Cc: linux-can@xxxxxxxxxxxxxxx; Dario Binacchi <dario.binacchi@xxxxxxxxxxxxxxxxxxxx>
> Subject: Re: [Issue] Bosch D_CAN echo skb occupied error
>
> Hi Schmidt,
>
> How is it going? Do you need support to address the issue? Or would you
> like us to take over? Please let us know if there is anything we can
> help with!

Thank you for asking. Currently, this topic is paused from my side. I wanted
to pick it up again in May or even later, as the solution with the lock is
working very well for us and we currently need to do lots of other stuff ☹

Although I am not actively working on the problem, I still think about it
often. From my point of view, there will always be two contexts racing for
the same hardware: the task that puts packets into the hardware and the TX
IRQ. Unfortunately, I cannot yet imagine a lockless solution in a
pre-emptive / multi-core environment. Maybe it would be helpful if someone
else could also take a deeper look into it.

> On 30/01/2025 at 13:28, Vincent Mailhol wrote:
> > +CC: Dario
> >
> > On 28/01/2025 at 00:44, Schmidt, Stefan wrote:
> >> Dear Vincent,
> >>
> >>> -----Original Message-----
> >>> From: Vincent Mailhol <mailhol.vincent@xxxxxxxxxx>
> >>> Sent: Friday, 24 January 2025 15:16
> >>> To: Schmidt, Stefan <schmidtssstefan@xxxxxxxxxxxxxxxxxxxxxxxx>
> >>> Cc: linux-can@xxxxxxxxxxxxxxx
> >>> Subject: Re: [Issue] Bosch D_CAN echo skb occupied error
> >
> > (...)
> >
> >>>
> >>>> Context:
> >>>> Recently, we updated our kernel from 4.14 to 6.1.43 and our Debian
> >>>> from 10 to 12.
> >>>> The system runs on an Altera Cyclone V, which has two Bosch D_CAN
> >>>> CAN controllers on board. We use both. Additionally, it is a
> >>>> dual-core system with two Arm cores.
> >>>>
> >>>> Issue:
> >>>> After the update, I saw that both CAN devices sporadically print
> >>>> the message "BUG! echo_skb %d is occupied!" once into the syslog,
> >>>> with %d always being index 0.
> >>>> From there on, the affected device was unable to send any messages.
> >>>> A restart of the device (ifdown/up) resolved the problem.
> >>>>
> >>>> Steps Taken:
> >>>> After heavily polluting the driver with trace points, I found out
> >>>> that in this case the echo socket buffer is indeed still occupied
> >>>> by an old message at index 0.
> >>>> Looking at the preceding usage of index 0, I saw that both functions,
> >>>> c_can_start_xmit and c_can_do_tx
> >>>> (drivers/net/can/c_can/c_can_main.c), were executed on different
> >>>> cores at the same time.
> >>>> One was called by the TX interrupt of the CAN controller and was
> >>>> scheduled on CPU0.
> >>>> The other one was called by the application / network layer and was
> >>>> scheduled on CPU1.
> >>>> As both functions operate on the same resources (variables in RAM
> >>>> and CAN controller registers), I think that there should be some
> >>>> kind of synchronization mechanism, but I did not find any in the
> >>>> code.
> >>>>
> >>>> Theory:
> >>>> I think that the caching mechanism, which was introduced sometime
> >>>> after the 4.14 version of the kernel, can get compromised by this
> >>>> parallel execution.
> >>>> At the beginning of c_can_start_xmit, the function checks if there
> >>>> is a wrap-around in the buffer.
> >>>> If this is the case, the message just gets prepared and everything
> >>>> gets written to the chip, but the final transmission request is not
> >>>> set.
> >>>> ```c
> >>>> // (1) drivers/net/can/c_can/c_can_main.c - c_can_start_xmit
> >>>> cmd &= ~IF_COMM_TXRQST; /* Cache the message */
> >>>> ```
> >>>> At the end of c_can_start_xmit this command gets written to the CAN
> >>>> controller.
> >>>> ```c
> >>>> // (2) drivers/net/can/c_can/c_can_main.c - c_can_start_xmit
> >>>> c_can_object_put(dev, IF_TX, obj, cmd);
> >>>> ```
> >>>> The final trigger to send the message shall then be set in the TX
> >>>> interrupt, as soon as the message from object index 15 (the last
> >>>> index) has been processed, because otherwise the new message at
> >>>> index 0 would bypass the one at index 15 and change the order in
> >>>> which they are put on the bus.
> >>>> ```c
> >>>> // (3) drivers/net/can/c_can/c_can_main.c - c_can_do_tx
> >>>> c_can_object_put(dev, IF_NAPI, obj, IF_COMM_TXRQST);
> >>>> ```
> >>>> My theory is that, because both functions can run in parallel on
> >>>> different CPUs, the calls can happen in the order (1) -> (3) -> (2),
> >>>> which leads to the situation that the transmission request bit is
> >>>> reset immediately after being set.
> >>>> To support my theory, I put a spinlock around both critical code
> >>>> sections. As of now, this seems to properly solve the issue, but I
> >>>> am aware that this is probably the clumsiest way to do it.
> >>>>
> >>>> Request:
> >>>> I am reaching out to you because I have not found anyone else with
> >>>> a similar problem on the internet.
> >>>> It would be great if you could share your thoughts about this
> >>>> problem and maybe give some hints for a cleaner solution.
> >>>>
> >>>> Additional Details:
> >>>> Please find my clumsy fix with the spinlock below. I don't think
> >>>> that it is ready to be merged into any official repository, but
> >>>> maybe it is a short-term help for someone who is in a similar
> >>>> situation.
> >>>
> >>> So, theoretically, it should be possible to have a lockless queue
> >>> under the conditions that:
> >>>
> >>> - there is only one reader at a time
> >>> - there is only one writer at a time
> >>> - the queue size is a power of 2
> >>>
> >>
> >> Implementing a queue length other than a power of 2 in hardware is
> >> probably a pain, so I think this point is always true.
> >
> > Actually, the queue size is configured in software through
> > c_can_driver_data->msg_obj_num. Example:
> >
> > https://elixir.bootlin.com/linux/v6.13/source/drivers/net/can/c_can/c_can_platform.c#L196
> >
> > And all the values for D_CAN are either 32 or 64.
> >
> >> Maybe the problem can even occur on single-core systems. Is it
> >> possible that the TX soft-irq can interrupt start_xmit, or is this
> >> somehow scheduled atomically?
> >> If it is possible, then the lock is also needed on single-core
> >> systems.
> >
> > Not sure 100%, but I can imagine this appearing on single-core systems
> > as well, if the scheduler decides to preempt it in the middle of
> > c_can_do_tx().
> >
> > It is clear that c_can_start_xmit() and c_can_do_tx() are racing
> > against each other. So let's go back to the lockless queue topic and
> > see what could have gone wrong. As established, the queue size is not
> > the issue.
> >
> > So it means that c_can_start_xmit() and c_can_do_tx() are doing
> > concurrent reads and writes, as corroborated by your observations and
> > analysis.
> >
> > One important aspect of the lockless queue is self-discipline.
> > Because it is lockless, the writer has to monitor the occupancy of the
> > queue and stop writing when it is full. Similarly, the reader has to
> > free the slots *before* notifying the writer that these slots became
> > available again.
> >
> > If the writer stops the queue too late or if the reader resumes the
> > queue too early, bad things happen.
> >
> > Back to the network queue, this logic is controlled using
> > netif_wake_queue() and netif_stop_queue():
> >
> >   - just before the queue becomes full, c_can_start_xmit() must call
> >     netif_stop_queue() so that the network framework stops calling
> >     that function
> >
> >   - once c_can_do_tx() finishes reading the queue, it calls
> >     netif_wake_queue() to notify the network framework that it is safe
> >     again to call c_can_start_xmit().
> >
> >>> This design will then abuse the C unsigned integer wraparound
> >>> property. The c_can/d_can meets all of the above requirements, so
> >>> normally, no locks are needed here.
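> >>>
> >>> To make the wraparound point concrete, here is a toy user-space
> >>> model of the head/tail accounting such a ring relies on (this is
> >>> only an illustration, not the driver code; all names in it are made
> >>> up):
> >>>
> >>> ```c
> >>> #include <stdint.h>
> >>> #include <stdio.h>
> >>>
> >>> /* Must divide 256 (any power of 2 up to 256 does) so that the u8
> >>>  * counters and the slot index (head % RING_SIZE) stay consistent
> >>>  * when the counters wrap around.
> >>>  */
> >>> #define RING_SIZE 32u
> >>>
> >>> struct ring {
> >>>         uint8_t head;   /* only ever advanced by the single writer */
> >>>         uint8_t tail;   /* only ever advanced by the single reader */
> >>> };
> >>>
> >>> /* Free slots: correct even after head/tail wrap past 255, because
> >>>  * the unsigned subtraction wraps around as well.
> >>>  */
> >>> static uint8_t ring_free(const struct ring *r)
> >>> {
> >>>         return RING_SIZE - (uint8_t)(r->head - r->tail);
> >>> }
> >>>
> >>> int main(void)
> >>> {
> >>>         struct ring r = { .head = 254, .tail = 250 };
> >>>
> >>>         /* 4 messages in flight -> 28 free */
> >>>         printf("free: %u\n", (unsigned int)ring_free(&r));
> >>>
> >>>         r.head += 6;    /* head wraps from 254 to 4 */
> >>>
> >>>         /* 10 messages in flight -> 22 free */
> >>>         printf("free: %u\n", (unsigned int)ring_free(&r));
> >>>
> >>>         return 0;
> >>> }
> >>> ```
> >>>
> >>> As long as exactly one writer advances head and exactly one reader
> >>> advances tail, this arithmetic stays consistent without a lock.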
> >>>
> >>> So I think that the core issue is somewhere else. Looking at the
> >>> recent commits, [1] and [2] caught my attention:
> >>>
> >>> [1] commit 387da6bc7a82 ("can: c_can: cache frames to operate as a
> >>>     true FIFO")
> >>>     https://git.kernel.org/torvalds/c/387da6bc7a82
> >>>
> >>> [2] commit 81d192c2ce74 ("can: c_can: don't cache TX messages for
> >>>     C_CAN cores")
> >>>     https://git.kernel.org/torvalds/c/81d192c2ce74
> >>>
> >>> Those two are present in 6.1.x, on which you are having your problem.
> >>>
> >>> What triggers me the most is the fact that the D_CAN, on which you
> >>> are having your problems, has a special if condition:
> >>>
> >>> https://elixir.bootlin.com/linux/v6.12/source/drivers/net/can/c_can/c_can.h#L244
> >>>
> >>> In addition, [2] says that, I quote:
> >>>
> >>>     the optimization introduced in [1] doesn't properly work on
> >>>     C_CAN, but on D_CAN IP cores. The exact reasons are still
> >>>     unknown.
> >>>
> >>> The other part of the queue implementation looks good to me. Only
> >>> this c_can_get_tx_free() caught my attention.
> >>>
> >>> Can you try to revert the above commits [1] and [2] and tell us if
> >>> you still have the issue?
> >>
> >> Unfortunately, I don't have enough time for those tests,
> >
> > With the right git commands, reverting commits should be an easy task.
> > Example working on the stable tree:
> >
> > git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
> > cd linux
> > git checkout v6.1.43
> > git revert 81d192c2ce74                    # revert [2]
> > git revert 387da6bc7a82                    # revert [1]
> > editor drivers/net/can/c_can/c_can_main.c  # resolve the conflict
> > git add drivers/net/can/c_can/c_can_main.c # add conflict resolution
> > git revert --continue                      # continue and commit the revert of [1]
> >
> > (This is FYI; your analysis gave enough information so that this
> > revert is not necessary anymore.)
> >
> >> but I am quite sure that they are the reason, because the code looks
> >> almost like in 4.14 after reverting those changes.
> >> Those changes [1] and [2] introduced write access to the TX buffer in
> >> the TX IRQ (c_can_do_tx). This was not the case before those changes.
> >> Maybe it was not considered that the function c_can_do_tx can be
> >> called from another context.
> >> Maybe someone who uses the C_CAN had the same problem as me and
> >> therefore disabled the caching mechanism.
> >>
> >> From my point of view, these resources need to be protected against
> >> concurrent access (a rough sketch of what I mean follows the list):
> >>
> >> - tx_ring->tail: This is incremented by the TX IRQ. If it becomes 0
> >>   (the message at index 15 was sent), it checks whether other messages
> >>   were cached by looking at tx_ring->head. If head is greater than 0,
> >>   it will set the TX request bit for all cached messages by calling
> >>   c_can_object_put.
> >> - tx_ring->head: This is incremented by the sending routine. If it
> >>   becomes 0 and tail is still greater than 0 (there are still messages
> >>   in the last objects waiting to be sent), the message must be cached
> >>   (the TX request must not be set), because the controller sends the
> >>   message object with the lowest index first. If it weren't cached, it
> >>   would bypass the messages in the objects with the higher indices.
> >> - Any call to c_can_object_put / c_can_inval_tx_object /
> >>   c_can_object_get, because they all call c_can_obj_update, which sets
> >>   a register and busy-waits for the transaction to complete.
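> >>
> >> To sketch the shape of my workaround (this is not the exact patch;
> >> the tx_lock field name is made up for illustration, and the right
> >> lock flavour probably needs more thought), the idea is simply one
> >> spinlock in the private data that both paths take around their
> >> critical sections:
> >> ```c
> >> /* Rough sketch only, not the actual patch */
> >> struct c_can_priv {
> >>         /* ... existing members ... */
> >>         spinlock_t tx_lock;     /* serializes c_can_start_xmit() vs. c_can_do_tx() */
> >> };
> >>
> >> /* initialized once, e.g. when the device is set up */
> >> spin_lock_init(&priv->tx_lock);
> >>
> >> /* in c_can_start_xmit(): around the head update, the caching
> >>  * decision and c_can_object_put()
> >>  */
> >> spin_lock(&priv->tx_lock);
> >> /* ... critical section ... */
> >> spin_unlock(&priv->tx_lock);
> >>
> >> /* in c_can_do_tx() (NAPI context): around the tail update, the
> >>  * wake-up and the D_CAN "flush cached messages" loop
> >>  */
> >> spin_lock(&priv->tx_lock);
> >> /* ... critical section ... */
> >> spin_unlock(&priv->tx_lock);
> >> ```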
> >>
> >> Probably it is not necessary to protect the first half of the TX IRQ:
> >> ```c
> >> static void c_can_do_tx(struct net_device *dev)
> >> {
> >>         struct c_can_priv *priv = netdev_priv(dev);
> >>         struct c_can_tx_ring *tx_ring = &priv->tx;
> >>         struct net_device_stats *stats = &dev->stats;
> >>         u32 idx, obj, pkts = 0, bytes = 0, pend;
> >>         u8 tail;
> >>
> >>         if (priv->msg_obj_tx_last > 32)
> >>                 pend = priv->read_reg32(priv, C_CAN_INTPND3_REG);
> >>         else
> >>                 pend = priv->read_reg(priv, C_CAN_INTPND2_REG);
> >>
> >>         while ((idx = ffs(pend))) {
> >>                 idx--;
> >>                 pend &= ~BIT(idx);
> >>                 obj = idx + priv->msg_obj_tx_first;
> >>
> >>                 /* We use IF_NAPI interface instead of IF_TX because we
> >>                  * are called from c_can_poll(), which runs inside
> >>                  * NAPI. We are not transmitting.
> >>                  */
> >>                 c_can_inval_tx_object(dev, IF_NAPI, obj);
> >>                 bytes += can_get_echo_skb(dev, idx, NULL);
> >>                 pkts++;
> >>         }
> >>
> >>         if (!pkts)
> >>                 return;
> >>
> >>         ////////////// spinlock starting here
> >>
> >>         tx_ring->tail += pkts;
> >>         if (c_can_get_tx_free(priv, tx_ring)) {
> >>                 /* Make sure that anybody stopping the queue after
> >>                  * this sees the new tx_ring->tail.
> >>                  */
> >>                 smp_mb();
> >>                 ////////////// Do we get a problem here because this
> >>                 // might call the send routine?! I have no idea. Maybe
> >>                 // this should be moved to the end of the function and
> >>                 // taken out of the spinlock section.
> >>                 netif_wake_queue(priv->dev); // is this the counterpart of stop queue?
> >
> > Yes! It is netif_stop_queue().
> >
> >>         }
> >>
> >>         stats->tx_bytes += bytes;
> >>         stats->tx_packets += pkts;
> >>
> >>         tail = c_can_get_tx_tail(tx_ring);
> >>         if (priv->type == BOSCH_D_CAN && tail == 0) {
> >
> > This bothers me the most. This check occurs *after* the call to
> > netif_wake_queue(), so anything in this if block is not guarded anymore
> > and can race against c_can_start_xmit().
> >
> >>                 u8 head = c_can_get_tx_head(tx_ring);
> >>
> >>                 /* Start transmission for all cached messages */
> >>                 for (idx = tail; idx < head; idx++) {
> >>                         obj = idx + priv->msg_obj_tx_first;
> >>                         c_can_object_put(dev, IF_NAPI, obj, IF_COMM_TXRQST);
> >>                 }
> >>         }
> >> }
> >> ```
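> >>
> >> Just to illustrate what I mean by moving it to the end (untested, it
> >> is only meant to show the idea): the second half of the function could
> >> be reordered so that the queue is only woken up after the cached
> >> messages have been kicked off, roughly like this:
> >> ```c
> >>         tx_ring->tail += pkts;
> >>
> >>         stats->tx_bytes += bytes;
> >>         stats->tx_packets += pkts;
> >>
> >>         /* For D_CAN, kick off the cached messages before the queue
> >>          * is woken up (this is the reordering discussed above).
> >>          */
> >>         tail = c_can_get_tx_tail(tx_ring);
> >>         if (priv->type == BOSCH_D_CAN && tail == 0) {
> >>                 u8 head = c_can_get_tx_head(tx_ring);
> >>
> >>                 for (idx = tail; idx < head; idx++) {
> >>                         obj = idx + priv->msg_obj_tx_first;
> >>                         c_can_object_put(dev, IF_NAPI, obj, IF_COMM_TXRQST);
> >>                 }
> >>         }
> >>
> >>         /* Only now let the network stack call c_can_start_xmit() again */
> >>         if (c_can_get_tx_free(priv, tx_ring)) {
> >>                 /* Make sure that anybody stopping the queue after
> >>                  * this sees the new tx_ring->tail.
> >>                  */
> >>                 smp_mb();
> >>                 netif_wake_queue(priv->dev);
> >>         }
> >> ```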
> >>
> >> In the sending routine, I think most of the code needs to be
> >> protected. Some parts could be rearranged to reduce the amount of
> >> code in the critical section.
> >> ```c
> >> static netdev_tx_t c_can_start_xmit(struct sk_buff *skb,
> >>                                     struct net_device *dev)
> >> {
> >>         struct can_frame *frame = (struct can_frame *)skb->data;
> >>         struct c_can_priv *priv = netdev_priv(dev);
> >>         struct c_can_tx_ring *tx_ring = &priv->tx;
> >>         u32 idx, obj, cmd = IF_COMM_TX;
> >>
> >>         if (can_dev_dropped_skb(dev, skb))
> >>                 return NETDEV_TX_OK;
> >>
> >>         if (c_can_tx_busy(priv, tx_ring))
> >>                 return NETDEV_TX_BUSY;
> >>
> >>         /////// spinlock starting here
> >>         idx = c_can_get_tx_head(tx_ring); // probably doesn't need the
> >>         // spinlock yet, because it only reads and the TX IRQ does not
> >>         // modify head
> >>         tx_ring->head++; // needs to be protected because the TX IRQ
> >>         // uses it to decide how many TX request bits it needs to set
> >>         if (c_can_get_tx_free(priv, tx_ring) == 0) // if this weren't
> >>                 // protected, the queue could possibly be stopped even
> >>                 // though a message object just became free, which
> >>                 // would not be too bad
> >>                 netif_stop_queue(dev);
> >>
> >>         if (idx < c_can_get_tx_tail(tx_ring)) // needs to be protected
> >>                 cmd &= ~IF_COMM_TXRQST; /* Cache the message */
> >>
> >>         /* Store the message in the interface so we can call
> >>          * can_put_echo_skb(). We must do this before we enable
> >>          * transmit as we might race against do_tx().
> >>          */
> >>         c_can_setup_tx_object(dev, IF_TX, frame, idx); // needs to be
> >>         // protected because it calls c_can_inval_tx_object
> >>         can_put_echo_skb(skb, dev, idx, 0); // probably doesn't need
> >>         // protection
> >>         obj = idx + priv->msg_obj_tx_first;
> >>         c_can_object_put(dev, IF_TX, obj, cmd); // needs protection again
> >>
> >>         return NETDEV_TX_OK;
> >> }
> >> ```
> >>
> >> I have done extensive tests with my patch, and it works well in my
> >> system. For me, it would be sufficient :-D
> >> In the meantime, I found that another driver had the same problem.
> >> They chose a similar solution:
> >> https://lore.kernel.org/lkml/20180327162802.559928021@linuxfoundation.org/
> >
> > Putting in a spinlock will of course solve your issue, but it is just
> > papering over the root cause. Here we need to find where the queue
> > wake/stop logic went wrong.
> >
> >>>> Thank you in advance for your time and assistance. I look forward
> >>>> to hearing your suggestions or advice.
> >>>
> >>> Thank you for the detailed report. Let me ask you: do you want to
> >>> write the final patch? If yes, we can guide you through the process.
> >>
> >> It would be an honour to do this :-)
> >
> > Great. Have a look at these:
> >
> > https://nickdesaulniers.github.io/blog/2017/05/16/submitting-your-first-patch-to-the-linux-kernel-and-responding-to-feedback/
> >
> > https://www.kernel.org/doc/html/latest/process/submitting-patches.html
> >
> > When you describe your changes in your patch, go for conciseness. No
> > need for politeness, go straight to the point.
> >
> > A good way to describe the race condition is through an ASCII art
> > diagram, something like this:
> >
> >          CPU 1                     CPU 2
> >
> >            |                         |
> >       c_can_do_tx()                  |
> >            |                         |
> >       netif_wake_queue()             |
> >            |                         |
> >            |                   c_can_start_xmit()
> >            |                         |
> >       c_can_object_put()       can_put_echo_skb()
> >            |                         |
> >            |                         |
> >            v                         v
> >
> > If you have a dmesg error log, it is also good to attach it at the
> > bottom of your patch description.
> >
> > Have a look at:
> >
> >     git log --grep="race condition"
> >
> > to see how people usually describe their patch when fixing a race
> > condition.
>
> Yours sincerely,
> Vincent Mailhol

Best regards,
Stefan Schmidt