Re: [PATCH v7 4/7] can: canxl: introduce CAN XL data structure

Oliver Hartkopp <socketcan@xxxxxxxxxxxx> · Sun, 31 Jul 2022 10:21:41 +0200

On 31.07.22 05:19, Vincent Mailhol wrote:
On Sun. 31 Jul. 2022 at 06:05, Oliver Hartkopp <socketcan@xxxxxxxxxxxx> wrote:
On 30.07.22 15:16, Vincent Mailhol wrote:
On Sat. 30 juil. 2022 at 21:06, Oliver Hartkopp <socketcan@xxxxxxxxxxxx> wrote:

Existing applications are exempted from this risk: the
CAN_RAW_XL_FRAMES socket option will prevent the collision from
happening. To summarise, the collision will only occur if below
criteria are met:

     * The user allocates a can_frame without zeroing it (typical use
cases are stack declaration or malloc()).
     * The user activates the CAN_RAW_XL_FRAMES socket option.
     * Garbage value results in can_frame::__pad & CANXL_XLF being set.

So if going in that direction, we would have to update the
documentation (header files and kernel doc) to point at this pitfall
and specify that can_frame::__pad shall always be zeroed for mix
usages.

But you would add this recommendation to a *very late point* - namely
after 14 years of SocketCAN in the mainline kernel.

If you think about the rule "never break user space" I wonder if the
risk turns out to be higher when people enable CAN_RAW_XL_FRAMES ("as
seen on Stackoverflow") and not read the kernel doc recommendations to
review/update their existing code.

In which aspect does this break the user space? It doesn't.

It does in the way that you need to zero-initialize structures now.

Enabling CAN_RAW_XL_FRAMES marks a clear transition between the 14
years old Classical CAN API and the new use case of mixed CAN(-FD) and
CAN XL frames.

You are seriously telling me that CAN_RAW_XL_FRAMES is a clear
transition and people will automatically review and rework their former
CAN/CANFD code to zero-initialize padding bytes?

There is a much higher risk that this does not happen than telling
people to check for CANXL_XLF first after reading from the socket!

For the issue to occur, developers have to incorrectly modify their
application, I do not call this a break in user space if only the
yet-to-be-created programs are impacted.

I don't know about your thresholds but I call this a break in user
space. Yet-to-be-created programs will likely copy old code.

The argument that people do not read the kernel doc apply to both
cases: some people will not initialise variables, some people will not
do the "if" checks in the right order.

To make it simple: The difference between CAN XL and (old) CAN/CANFD
handling is CANXL_XLF.

I'm really tired about your constructed case that people would be too
dumb to check for CANXL_XLF first.

This CANFD_FDF check is a new optional(!) possibility, which we do not
need to promote. This could become a power user trick for people that
are obviously "smart enough" to put if-statements in the right order.

The correct and safe check that should be documented and suggested looks
like this:

if (can.xl.flags & CANXL_XLF)
       /* Handle CAN XL frame */
else if (nbytes == CANFD_MTU)
       /* Handle CAN FD frame */
else if (nbytes == CAN_MTU) /* paranoid */
       /* Handle Classic CAN frame */
else
       /* error */

After a good night of sleep, I found a use case which destroys my idea:

int main()
{
     static union can can;

     can.xl.flags = CANXL_XLF;
     /* set other fields, open socket... */
     write(s, &can.xl, CANXL_HD_SIZE + can.xl.len);

     can.cc.len = 8;
     /* set other field but leave can.cc._pad untouched */
     write(s, &can.cc, sizeof(can.cc));

     /* ... */
}

Here, we have a static declaration so the frame is guaranteed to be
zero initialized. The user then sends a CAN XL frame (and thus sets
the CANXL_XLF flag). Then the user sends a Classic CAN frame. However,
can.cc._pad still contains the CANXL_XLF flag set. I stay in my
position that asking to zero initialize variables for mix usage is
acceptable. But requiring to reset the can.cc._pad to zero in the
middle of the run time is too much.
This is a very bad foot gun. So let me withdraw my idea. Using the
0x80 flag on the can(fd)_frame.len is the best.

Thanks.

Best regards,
Oliver