On 31.07.22 05:19, Vincent Mailhol wrote:
On Sun. 31 Jul. 2022 at 06:05, Oliver Hartkopp <socketcan@xxxxxxxxxxxx> wrote:
On 30.07.22 15:16, Vincent Mailhol wrote:
On Sat. 30 juil. 2022 at 21:06, Oliver Hartkopp <socketcan@xxxxxxxxxxxx> wrote:
Existing applications are exempted from this risk: the
CAN_RAW_XL_FRAMES socket option will prevent the collision from
happening. To summarise, the collision will only occur if below
criteria are met:
* The user allocates a can_frame without zeroing it (typical use
cases are stack declaration or malloc()).
* The user activates the CAN_RAW_XL_FRAMES socket option.
* Garbage value results in can_frame::__pad & CANXL_XLF being set.
So if going in that direction, we would have to update the
documentation (header files and kernel doc) to point at this pitfall
and specify that can_frame::__pad shall always be zeroed for mix
usages.
But you would add this recommendation to a *very late point* - namely
after 14 years of SocketCAN in the mainline kernel.
If you think about the rule "never break user space" I wonder if the
risk turns out to be higher when people enable CAN_RAW_XL_FRAMES ("as
seen on Stackoverflow") and not read the kernel doc recommendations to
review/update their existing code.
In which aspect does this break the user space? It doesn't.
It does in the way that you need to zero-initialize structures now.
Enabling CAN_RAW_XL_FRAMES marks a clear transition between the 14
years old Classical CAN API and the new use case of mixed CAN(-FD) and
CAN XL frames.
You are seriously telling me that CAN_RAW_XL_FRAMES is a clear
transition and people will automatically review and rework their former
CAN/CANFD code to zero-initialize padding bytes?
There is a much higher risk that this does not happen than telling
people to check for CANXL_XLF first after reading from the socket!
For the issue to occur, developers have to incorrectly modify their
application, I do not call this a break in user space if only the
yet-to-be-created programs are impacted.
I don't know about your thresholds but I call this a break in user
space. Yet-to-be-created programs will likely copy old code.
The argument that people do not read the kernel doc apply to both
cases: some people will not initialise variables, some people will not
do the "if" checks in the right order.
To make it simple: The difference between CAN XL and (old) CAN/CANFD
handling is CANXL_XLF.
I'm really tired about your constructed case that people would be too
dumb to check for CANXL_XLF first.
This CANFD_FDF check is a new optional(!) possibility, which we do not
need to promote. This could become a power user trick for people that
are obviously "smart enough" to put if-statements in the right order.
The correct and safe check that should be documented and suggested looks
like this:
if (can.xl.flags & CANXL_XLF)
/* Handle CAN XL frame */
else if (nbytes == CANFD_MTU)
/* Handle CAN FD frame */
else if (nbytes == CAN_MTU) /* paranoid */
/* Handle Classic CAN frame */
else
/* error */
After a good night of sleep, I found a use case which destroys my idea:
int main()
{
static union can can;
can.xl.flags = CANXL_XLF;
/* set other fields, open socket... */
write(s, &can.xl, CANXL_HD_SIZE + can.xl.len);
can.cc.len = 8;
/* set other field but leave can.cc._pad untouched */
write(s, &can.cc, sizeof(can.cc));
/* ... */
}
Here, we have a static declaration so the frame is guaranteed to be
zero initialized. The user then sends a CAN XL frame (and thus sets
the CANXL_XLF flag). Then the user sends a Classic CAN frame. However,
can.cc._pad still contains the CANXL_XLF flag set. I stay in my
position that asking to zero initialize variables for mix usage is
acceptable. But requiring to reset the can.cc._pad to zero in the
middle of the run time is too much.
This is a very bad foot gun. So let me withdraw my idea. Using the
0x80 flag on the can(fd)_frame.len is the best.
Thanks.
Best regards,
Oliver