can: isotp: epoll breaks isotp_sendmsg

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

There is something not clear happening with the non-blocking behavior
of ISO-TP sockets in the TX path, but more importantly, using epoll now
completely breaks isotp_sendmsg.
I believe it is related to
79e19fa79c ("can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events"),
but actually is probably deeper than that.

I don't completely understand what is exactly going on, so I am sharing
the problem I face:

With an ISO-TP socket in non-blocking mode, using epoll seems to make
isotp_sendmsg always return -EAGAIN.

I have a non-blocking socket + epoll version of can-utils isotpsend available for
testing at https://gist.github.com/MJayat/4857da43ab154e4ba644d2446b5fa46d
With this version I do the following test:

isotprecv -l -m 0x80 -s 7e8 -d 7e0 vcan0 &
echo "01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f" | strace ./isotpsend -l10 -s 7E0 -d 7E8 vcan0

I get:
...
15:37:02.456849 epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP, data={u32=0, u64=0}}) = 0 <0.000249> 15:37:02.457839 epoll_wait(4, [{events=EPOLLOUT, data={u32=0, u64=0}}], 1, 2000) = 1 <0.000200> 15:37:02.458838 write(3, "\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", 15) = -1 EAGAIN (Resource temporarily unavailable) <0.000278> 15:37:02.459908 epoll_wait(4, [{events=EPOLLOUT, data={u32=0, u64=0}}], 1, 2000) = 1 <0.000250> 15:37:02.460879 write(3, "\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", 15) = -1 EAGAIN (Resource temporarily unavailable) <0.000272> 15:37:02.461831 epoll_wait(4, [{events=EPOLLOUT, data={u32=0, u64=0}}], 1, 2000) = 1 <0.000199>
...
impossible to write on the socket, and busy-looping.

With a change to epoll_ctl flags, now including EPOLLET, I get:
...
15:36:22.443689 epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=0, u64=0}}) = 0 <0.000171> 15:36:22.444514 epoll_wait(4, [{events=EPOLLOUT, data={u32=0, u64=0}}], 1, 2000) = 1 <0.000188> 15:36:22.445413 write(3, "\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", 15) = -1 EAGAIN (Resource temporarily unavailable) <0.000175>
15:36:22.446335 epoll_wait(4, [], 1, 2000) = 0 <2.026006>
...
epoll_wait now blocks indefinitely.

By reverting 79e19fa79c, I get better results but still incorrect:
...
15:41:43.890880 epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP, data={u32=0, u64=0}}) = 0 <0.000200> 15:41:43.892011 epoll_wait(4, [{events=EPOLLOUT, data={u32=0, u64=0}}], 1, 2000) = 1 <0.000408> 15:41:43.893148 write(3, "\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", 15) = 15 <0.000458> 15:41:43.894405 epoll_wait(4, [{events=EPOLLOUT, data={u32=0, u64=0}}], 1, 2000) = 1 <0.000257> 15:41:43.895385 write(3, "\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", 15) = 15 <0.128429> 15:41:44.028757 epoll_wait(4, [{events=EPOLLOUT, data={u32=0, u64=0}}], 1, 2000) = 1 <0.001886> 15:41:44.040858 write(3, "\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", 15) = 15 <0.108069>
...
It is then possible to write on the socket but the write is blocking, which is
not the expected behavior for a non-blocking socket.

I don't know how to solve the problem. To me, using wq_has_sleeper seems weird. The implementation of isotp_poll feels weird too (calling both datagram_poll and
poll_wait?). But I am not sure what would be the correct implementation.

Note that this behavior is currently on all linux-stable branches. I am
currently testing on v6.1.36 but I know it is failing on v6.3 too.

My actual use-case is in Async Rust using tokio. Not using epoll is not an
option and a non-blocking socket that sometimes blocks can be terrible for
the performance of the whole application. Our embedded target runs with the
out-of-tree module on an older linux version, but my colleagues with up-to-date linux stable kernels have recently been prevented from running tests on their
PC.

Is there someone with a good idea of how to proceed to fix this?
I'll probably keep digging but I don't think I can spend so much time on this,
so any help is appreciated.

Thanks,
Maxime





[Index of Archives]     [Automotive Discussions]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [CAN Bus]

  Powered by Linux