Hi,

On Wed, Jan 08, 2025 at 02:31:12PM +0530, subramanian.mohan@xxxxxxxxx wrote:
> From: Subramanian Mohan <subramanian.mohan@xxxxxxxxx>
>
> The prolonged testing of passing can messages between
> two Elkhartlake platforms resulted in message stuck
> i.e Message did not receive at receiver side

Can you please describe the reason for the stuck messages in your commit
message? I am reading this but I don't understand why this happens or
why your proposed solution helps.

>
> Contolling TX i.e TEFN bit helped to resolve the message
> stuck issue.
>
> The current solution is enhanced/optimized from the below patch:
> https://lore.kernel.org/lkml/20230623051124.64132-1-kumari.pallavi@xxxxxxxxx/T/
>
> Setup used to reproduce the issue:
>
> +---------------------+         +----------------------+
> |Intel ElkhartLake    |         |Intel ElkhartLake     |
> |      +--------+     |         |      +--------+      |
> |      |m_can 0 |     |<=======>|      |m_can 0 |      |
> |      +--------+     |         |      +--------+      |
> +---------------------+         +----------------------+
>
> Steps to be run on the two Elkhartlake HW:
> 1)Bus-Rate is 1 MBit/s
> 2)Busload during the test is about 40%
> 3)we initialize the CAN with following commands
> 4)ip link set can0 txqueuelen 100/1024/2048
> 5)ip link set can0 up type can bitrate 1000000
>
> Python scripts are used send and receive the can messages
> between the EHL systems.
>
> Signed-off-by: Hahn Matthias <matthias.hahn@xxxxxxxxx>
> Signed-off-by: Subramanian Mohan <subramanian.mohan@xxxxxxxxx>
> ---
>  drivers/net/can/m_can/m_can.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
> index 97cd8bbf2e32..0a2c9a622842 100644
> --- a/drivers/net/can/m_can/m_can.c
> +++ b/drivers/net/can/m_can/m_can.c
> @@ -1220,7 +1220,7 @@ static void m_can_coalescing_update(struct m_can_classdev *cdev, u32 ir)
>  static int m_can_interrupt_handler(struct m_can_classdev *cdev)
>  {
>  	struct net_device *dev = cdev->net;
> -	u32 ir = 0, ir_read;
> +	u32 ir = 0, ir_read, new_interrupts;
>  	int ret;
>
>  	if (pm_runtime_suspended(cdev->dev))
> @@ -1283,6 +1283,9 @@ static int m_can_interrupt_handler(struct m_can_classdev *cdev)
>  			ret = m_can_echo_tx_event(dev);
>  			if (ret != 0)
>  				return ret;
> +
> +			new_interrupts = cdev->active_interrupts & ~(IR_TEFN);
> +			m_can_interrupt_enable(cdev, new_interrupts);

Here is a theoretical situation of two messages being sent. The first is
being sent and handled in this interrupt handler. Then it would disable
the TEFN bit right? If the second message wasn't done sending yet, how
would it ever call the interrupt handler if the interrupt is disabled?

Also you are disabling this interrupt here regardless of the type of
mcan device and also regardless of the coalescing state. In the transmit
part you are only enabling it for non-peripheral devices. For peripheral
mcan devices this would also introduce an additional two transfers per
transmit.

In which situations is this really necessary? Does it help to implement
coalescing for non-peripheral devices?
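
To make the two-message scenario above concrete, here is a minimal
sketch in plain userspace C. It is not driver code and does not claim to
reproduce the Elkhart Lake behaviour: the `sim_*` names, the
`struct sim_mcan` fields and the `SIM_IR_TEFN` bit are all hypothetical,
and the model assumes only that an interrupt fires when a flag is both
pending and enabled. It mirrors the two changes in the patch (enable
TEFN on transmit, mask TEFN at the end of the handler) and counts how
many completed frames are left without a handler run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag for the simulation; stands in for the TEFN interrupt. */
#define SIM_IR_TEFN (1u << 12)

/* Toy model of the controller state (all fields are assumptions). */
struct sim_mcan {
	uint32_t ir;            /* pending interrupt flags */
	uint32_t ie;            /* enabled interrupt sources */
	unsigned int in_flight; /* frames queued, not yet sent */
	unsigned int tx_events; /* sent frames waiting in the TX event FIFO */
	unsigned int echoed;    /* completions the handler has processed */
};

/* Transmit path as in the patch: enable TEFN, then queue the frame. */
static void sim_start_xmit(struct sim_mcan *c)
{
	c->ie |= SIM_IR_TEFN;
	c->in_flight++;
}

/* Hardware finishes one frame: new TX event element, flag gets latched. */
static void sim_hw_tx_done(struct sim_mcan *c)
{
	assert(c->in_flight > 0);
	c->in_flight--;
	c->tx_events++;
	c->ir |= SIM_IR_TEFN;
}

/* The interrupt line is raised only for flags that are pending AND enabled. */
static bool sim_irq_asserted(const struct sim_mcan *c)
{
	return (c->ir & c->ie) != 0;
}

/* Handler as in the patch: acknowledge, process events, then mask TEFN. */
static void sim_irq_handler(struct sim_mcan *c)
{
	c->ir &= ~SIM_IR_TEFN;     /* acknowledge the flag */
	c->echoed += c->tx_events; /* process everything currently in the FIFO */
	c->tx_events = 0;
	c->ie &= ~SIM_IR_TEFN;     /* the change under review: disable TEFN */
}

/*
 * Two frames are queued back to back. The first completes and is handled
 * before the second completes. Returns the number of completed frames
 * that never got a handler run.
 */
static unsigned int sim_two_frame_scenario(void)
{
	struct sim_mcan c = {0};

	sim_start_xmit(&c);
	sim_start_xmit(&c);

	sim_hw_tx_done(&c);
	if (sim_irq_asserted(&c))
		sim_irq_handler(&c);

	sim_hw_tx_done(&c);
	if (sim_irq_asserted(&c))
		sim_irq_handler(&c);

	return c.tx_events;
}
```

In this model `sim_two_frame_scenario()` returns 1: the second
completion latches the flag, but with TEFN masked no handler runs until
some later transmit enables it again. Whether the real hardware and
driver can end up in the same state is the open question here.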

Best
Markus

>  		}
>  	}
>
> @@ -1989,6 +1992,7 @@ static netdev_tx_t m_can_start_xmit(struct sk_buff *skb,
>  	struct m_can_classdev *cdev = netdev_priv(dev);
>  	unsigned int frame_len;
>  	netdev_tx_t ret;
> +	u32 new_interrupts;
>
>  	if (can_dev_dropped_skb(dev, skb))
>  		return NETDEV_TX_OK;
> @@ -2008,8 +2012,11 @@ static netdev_tx_t m_can_start_xmit(struct sk_buff *skb,
>
>  	if (cdev->is_peripheral)
>  		ret = m_can_start_peripheral_xmit(cdev, skb);
> -	else
> +	else {
> +		new_interrupts = cdev->active_interrupts | IR_TEFN;
> +		m_can_interrupt_enable(cdev, new_interrupts);
>  		ret = m_can_tx_handler(cdev, skb);
> +	}
>
>  	if (ret != NETDEV_TX_OK)
>  		netdev_completed_queue(dev, 1, frame_len);
> --
> 2.35.3
>