Re: please re-send [RFC PATCH] can: isotp: fix poll() to not report false positive EPOLLOUT events

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I added the sockopt CAN_ISOTP_WAIT_TX_DONE in isotp-poll-test.c which fixes the problem:

(..)

struct sockaddr_can addr;
static struct can_isotp_options opts = { .flags = CAN_ISOTP_WAIT_TX_DONE};

(..)

sock = CHECK(socket(PF_CAN, SOCK_DGRAM, CAN_ISOTP));

CHECK(setsockopt(sock, SOL_CAN_ISOTP, CAN_ISOTP_OPTS, &opts, sizeof(opts)));

(..)

The sockopt uses a wait queue and returns from the write() syscall when tx.state becomes ISOTP_IDLE.

Using the isotp socket without CAN_ISOTP_WAIT_TX_DONE turned out to be not such good idea. But this results from the original API which had some kind of "fire-and-forget" mode.

Today the tx.state is set back to ISOTP_IDLE in isotp_rcv_echo() - and with this short 9 byte PDU the interaction with the receiving entity is really fast on vcan's. Maybe faster than the the write syscall returns.

I don't know what happens to the poll notification then.

Best regards,
Oliver


On 01.03.23 19:59, Oliver Hartkopp wrote:
Hi Michal,

just copied the text to answer your patch ...

On 28.02.23 21:49, Michal Sojka wrote:

 > When using select/poll/epoll() with a non-blocking ISOTP socket to
 > wait for when non-blocking write is possible, false EPOLLOUT event is
 > sometimes returned. This can happen at least after sending a message
 > which must be split to multiple CAN frames.
 >
 > The reason is that isotp_sendmsg() returns -EAGAIN when tx.state is
 > not equal to ISOTP_IDLE and this behavior is not reflected in
 > datagram_poll(), which is used in isotp_ops.
 >
 > This is fixed by introducing ISOTP-specific poll function, which
 > suppresses the EPOLLOUT events in that case.


Good improvement!

 > Below is a program that can trigger the problem on a vcan interface.
 > When running the program as:
 >
 >      ./isotp-poll-test -s 123 -d 321 -o
 >
 > it starts sending ISOTP messages that include increasing ASCII
 > numbers. poll() is used to wait before next transmission.
 >
 > With current mainline Linux, once the message length is greater than 7
 > bytes, write() returns -EAGAIN and the program terminates. This should
 > not happen, because the previous poll() reported that the write()
 > would not block.
 >
 > After applying this patch, the above command doesn't fail - if one
 > runs some ISOTP reader such as:
 >
 >      isotprecv -l -s 321 -d 123 vcan0
 >

Yes, I can confirm that the above setup works - but not reliable:

 > This test program can also show another problem. When running:
 >
 >      ./isotp-poll-test -s 321 -d 123 -i -a
 >
 > and then in another terminal:
 >
 >      ./isotp-poll-test -s 123 -d 321 -o
 >
 > The first program receives the messages and uses the counter values to
 > check for lost messages. After a random number of iterations a lost
 > message is always detected. I believe that ISOTP should be reliable
 > protocol, at least on vcan, shouldn't it?

The problem seems to occur with the new introduced POLL feature.

When using

     ./isotp-poll-test -s 123 -d 321 -o

together with

     isotprecv -l -s 321 -d 123 vcan0

the transmission of isotp PDUs stops at some point
#8179 #6204 #1787 #373 #69321 etc

I can not see problems or drops when using

./isotpsend vcan0 -s 123 -d 321 -D 9 -li

as data producer where I added a counter:

diff --git a/isotpsend.c b/isotpsend.c
index deac601..815c254 100644
--- a/isotpsend.c
+++ b/isotpsend.c
@@ -96,10 +96,11 @@ int main(int argc, char **argv)
      useconds_t usecs = 0; /* wait before sending the PDU */
      __u32 force_tx_stmin = 0;
      unsigned char buf[BUFSIZE];
      int buflen = 0;
      int datalen = 0;
+    u_int32_t datainc = 0;
      int retval = 0;

      addr.can_addr.tp.tx_id = addr.can_addr.tp.rx_id = NO_CAN_ID;

     while ((opt = getopt(argc, argv, "s:d:x:p:P:t:f:D:l:g:bSCL:?")) != -1) {
@@ -295,10 +296,13 @@ int main(int argc, char **argv)

  loop:
      if (usecs)
             usleep(usecs);

+    memcpy(&buf[datalen - 4], &datainc, 4);
+    datainc++;
+
      retval = write(s, buf, buflen);
      if (retval < 0) {
             perror("write");
             return retval;
      }


Maybe the 'poll' patch needs to check for something else. I will take a look at it.

Many thanks,
Oliver




[Index of Archives]     [Automotive Discussions]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [CAN Bus]

  Powered by Linux