Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?

On 24.06.2021 17:21:15, Harald Mommer wrote:
> Hello,
> 
> Am 18.06.21 um 11:16 schrieb Marc Kleine-Budde:
> > On 17.06.2021 14:22:03, Harald Mommer wrote:
> > > we are currently in the process of developing a draft specification for
> > > Virtio CAN. In the scope of this work I am developing a Virtio CAN Linux
> > > driver and a Virtio CAN Linux device
> > Oh that sounds interesting. Please keep the linux-can mailing list in
> > the loop. Do you have a first draft version for review, yet?
> 
> First draft went to virtio-comment@xxxxxxxxxxxxxxxxxxxx and
> virtio-dev@xxxxxxxxxxxxxxxxxxxx.
> 
> https://markmail.org/search/?q=virtio-can&q=list%3Aorg.oasis-open.lists.virtio-comment#query:virtio-can%20list%3Aorg.oasis-open.lists.virtio-comment+page:1+mid:hdxj35fsthypllkt+state:results
> 
> Link should reveal the short conversation. Currently working on the next
> draft which incorporates the review comments I got so far but the next draft
> will also address the "TX ACK" problem we are discussing here.
> 
> In the future I will put the Linux-CAN list in the loop.
> 
> > > running on top of our hypervisor solution.
> > > 
> > > The Virtio CAN Linux device forwards an existing SocketCAN CAN device
> > > (currently vcan) via Virtio to the Virtio driver guest so that the virtual
> > > driver guest can send and receive CAN frames via SocketCAN.
> > > 
> > > What was originally planned (probably with too much AUTOSAR CAN driver
> > > semantics in my head and too little SocketCAN knowledge) is to mark a
> > > transmission request as used (done) when it's sent finally on the CAN bus
> > > (vs. when it's given to SocketCAN not really done but still pending
> > > somewhere in the protocol stack).
> > Makes sense.
> 
> I read the "Makes sense". But reading the rest of the e-mail (and the
> thread) as well, I get the impression that making this timing requirement
> mandatory when using SocketCAN is asking for trouble.

It makes sense to have a TX done notification. You probably need this
for proper queue handling and throttling.
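
The queue handling and throttling mentioned above can be illustrated with a small in-flight counter. This is only a minimal sketch under stated assumptions: `struct tx_window`, `TX_MAX_IN_FLIGHT`, `tx_window_try_submit()` and `tx_window_done()` are hypothetical names, not part of SocketCAN or the Virtio CAN draft, and the limit of 4 is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limit on frames submitted but not yet confirmed as sent. */
#define TX_MAX_IN_FLIGHT 4u

/* Hypothetical bookkeeping for one TX queue. */
struct tx_window {
    unsigned int in_flight; /* submitted, no TX done notification yet */
};

/* Returns true if another frame may be submitted now. On success the
 * frame counts as in flight until tx_window_done() is called for it. */
bool tx_window_try_submit(struct tx_window *w)
{
    assert(w != NULL);
    if (w->in_flight >= TX_MAX_IN_FLIGHT)
        return false; /* throttle: wait for a TX done notification */
    w->in_flight++;
    return true;
}

/* Called for each TX done notification; frees one slot. */
void tx_window_done(struct tx_window *w)
{
    assert(w != NULL);
    if (w->in_flight > 0)
        w->in_flight--;
}
```

Without a TX done notification the sender has no event on which to call `tx_window_done()`, so it cannot tell when it is safe to submit more frames.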

> - Could remove the timing requirement. This is the easy solution. But there
> is the "Makes sense".
> 
> - The original strict timing requirement becomes an option so it's not a
> mandatory requirement.
> 
> The 2nd is my favorite (but I tend to over-engineer on the first attempt, so
> the first option may indeed be the better one).
> 
> Not having this timing behavior implies that some other things in the next
> virtio draft spec have to be changed, and in this case that means
> simplified.
> 
> > > Thought this was doable with some implementation effort using
> > > 
> > > setsockopt(..., SOL_CAN_RAW, CAN_RAW_RECV_OWN_MSGS, ...) and evaluating the
> > > MSG_CONFIRM bit on received messages.
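
The approach quoted above could look roughly like the following sketch. `setsockopt()`, `SOL_CAN_RAW`, `CAN_RAW_RECV_OWN_MSGS` and `MSG_CONFIRM` are the real SocketCAN interfaces named in the mail; the helper names `enable_recv_own_msgs()` and `is_own_tx_echo()` are made up for illustration. As discussed in this thread, the flag is only meaningful as a transmission confirmation if the underlying CAN driver echoes frames on TX completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <linux/can.h>
#include <linux/can/raw.h>

/* Ask a CAN_RAW socket to also deliver the frames it sent itself.
 * Returns the setsockopt() result: 0 on success, -1 on error. */
int enable_recv_own_msgs(int fd)
{
    int on = 1;

    return setsockopt(fd, SOL_CAN_RAW, CAN_RAW_RECV_OWN_MSGS,
                      &on, sizeof(on));
}

/* After recvmsg() has filled in msg: true if the received frame was
 * sent via this very socket, i.e. MSG_CONFIRM is set in msg_flags. */
bool is_own_tx_echo(const struct msghdr *msg)
{
    assert(msg != NULL);
    return (msg->msg_flags & MSG_CONFIRM) != 0;
}
```

Note that `msg_flags` is only available with `recvmsg()`, not with `read()` or `recv()`.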

> > Where does that code run? Would that be part of qemu running on the host
> > of an open source solution?

> The device application is closed source, runs under the COQOS hypervisor
> which is also closed source.

Ok

> A qemu device implementation is not planned as of now. The virtio CAN
> driver is a Linux device driver and will be open sourced at some point
> in time in the hope to get it upstreamed in a more far away future.

I suggest posting the code as early as possible, probably along with the
next round of the virtio-can spec RFC.

> Currently the driver is on an internal development branch, outsiders
> cannot see it (still better for everyone)

I doubt that :) I think the Linux community has seen a lot of code that
has been cooking for too long before trying to bring it mainline.

> and colleagues are reviewing it, helping to bring it into an
> acceptable shape.

You have to pass the review here anyways :D

> > Can you sketch a quick block diagram showing guest, host, Virtio device,
> > Virtio driver, etc...
> 
> I hope this arrives on the list as it was sent and not garbled:
> 
>      Guest 2                    | Guest3
> ----------------                | ----------------
> ! cangen,      !                | ! cangen,      !
> ! candump,     !                | ! candump,     !
> ! cansend      !                | ! cansend      !
> ! using vcan0  !                | ! using can0   !
> ----------------                | ----------------
>  ^                              |             ^
>  !  ---------------------       |             !
>  !  ! Service process   !       |             !
>  !  ! in user space     !       |             !

Oliver has already commented on this :) Getting feedback from the
community early could have saved you some work :)

>  !  ! virtio-can device !       |             !
>  !  ! forwarding vcan0  !       |             !
>  !  ---------------------       |             !
>  !    ^               ^         |             !
>  !    !               !         |             !
> --------------------------------------------------
>  !    !   Device side ! kernel  | Driver side ! kernel
>  v    v               v         |             v
> ---------------- -------------- | ----------------
> ! Device Linux ! ! HV support ! | ! Driver Linux !
> !    VCan      ! !   module   ! | !  Virtio CAN  !
> !    vcan0     ! ! on device  ! | !     can0     !
> !              ! !   side     ! | !              !
> ---------------- -------------- | ----------------
>        ^               ^        |        ^
>        !               !        |        !
> --------------------------------------------------
>        !               !                 ! Hypervisor
>        v               v                 v
> --------------------------------------------------
> !                     COQOS-HV                   !
> --------------------------------------------------
> 
> > > This works fine with
> > > 
> > > cangen -g 0 -i can0
> > > 
> > > on the driver side sending CAN messages to the device guest. No confirmation
> > > is lost testing for several minutes.
>
> > Where's the driver side? On the host or the guest?
> 
> Both sides are guests of the hypervisor in our architecture. There is no
> host in this sense, COQOS-HV is a type 1 hypervisor. The hypervisor does not
> provide devices directly on its own, the devices are provided with the
> support of a device (provider) guest which is also only a guest of the
> hypervisor.

I see. As I'm not interested in closed source solutions, I'd focus on the
qemu use case. The good thing is that virtio-can must handle both use cases
anyway.

> > Have you activated SO_RXQ_OVFL?
> > With recvmsg() you get the number of dropped messages in the socket.
> > Have a look at:
> > https://github.com/linux-can/can-utils/blob/master/cansequence.c
> 
> I had no idea about SO_RXQ_OVFL. This looks useful for implementing an
> emergency recovery mechanism so as not to get stuck: if loss of received
> frames is detected, the controller is still active and TX messages have been
> pending for too long, then mark the pending TX messages as used (done) to
> cope with the situation and avoid getting stuck (for too long). This might be
> acceptable if it is something which normally does not happen except in
> really exceptional situations.

Your user space bridge is the wrong solution here. See Oliver's mail.

> Nothing which should be done now, getting far too complicated for a 1st shot
> to implement a Virtio CAN device.
> 
> > We don't have a feature flag to query if the Linux driver supports proper
> > CAN echo on TX complete notification.
> 
> Not so nice. But the device integrator should know which backend is used, and
> with a command line option for the device application the issue can be
> handled. I need the command line switch anyway now to do experiments.

If needed we can add flags to the CAN drivers so that they are
introspectable, maybe via the ethtool interface.

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |
