On 09/08/2019 16.27, Marc Kleine-Budde wrote:
On 8/9/19 12:27 PM, Oliver Hartkopp wrote:
The problem in loopback and vcan is similar.
But the issues that rises up when sanitizing the FD length value in
loopback/vcan/af_can -> we start to fiddle inside the skb data.
When we do so, we need to skb_copy() the skb instead of working on
clones to prevent race conditions inside the data, see:
https://elixir.bootlin.com/linux/v5.2.6/source/net/can/gw.c#L387
skb_copy would have a performance impact and it would trigger a big
rewriting of the current code. Don't know if it's worth that.
Which race condition do you have in mind?
E.g. you create a can-gw job that forwards the incoming CAN frame to 10
different vcan's.
These 10 skb's have been cloned which means their skb->data points to
the exact same location. When now each vcan instance on a multicore
system reads and writes the length info you have a classical race on
that one memory location.
For loopback ("can_put_echo_skb()"), I see a race condition in some CAN
drivers: They first do a can_put_echo_skb(), then keep using the skb to
write ->data into the hardware.
If we modify the skb in can_put_echo_skb(), then the CAN driver will see
the sanitized struct canfd_frame::len. This means for dlc == 13, 14, 15
the driver will write more data into the hardware than needed. It
probably depends on the HW what the controller sends out if you have a
dlc == 14 (which means len up to 48 bytes) but only write 33 bytes into
the registers.
Then you have uninitialized content from the CAN registers and not from
the skb on the wire :-)
While other drivers don't touch the skb after can_put_echo_skb().
Can we modify the ->len in can_get_echo_skb() (or the __ variant) right
before pushing the skv into the networking stack? We have to look at the
data path for driver that cannot/don't use can_get_echo_skb() due to no
TX-complete interrupt.
Looking at vcan, where's the race condition when modifying the skb in
vcan_tx()?
As written above. Additionally the echo is mostly done in can_send() in
af_can.c. And vcan_tx() is just a /dev/null from the packet flow
perspective.
I tend to sanitize the CAN FD length values when they are introduced
into the system (CAN_RAW, CAN_BCM, CAN FD drivers*) and when they are
modified (CAN_GW).
* = already done
You miss the datapath that directly injects packets into the networking
stack with the mechanism that tcpdump uses but the sending into the
kernel instead of receiving (I don't remember the exact name). This path
skips all checks in CAN_RAW. This is why we have the
"can_dropped_invalid_skb()" in all drivers.
I don't know whether these tools can create ethertype CAN(FD) skbs which
is mandatory for the CAN stack to process skbs. But if so, you are right.
So from my point of view it makes no sense to sanitize the len value in
gw, as the drivers convert from len to dlc anyways.
What do I see when I attach a candump to the src and another one to the
dst interface? I suspenct:
src) the unmodified canfd_frame with sanitized len, as the controller
knowns only the dlc.
dst) the modified-by-gw canfd_frame, _after_ the loopback by the driver?
If there's neither sanitation in the gw not in the loopback, it's the
unsanitized len.
When I have another application sending on the dst interface, the
candump receives the un-sanitized, as there's no sanitation in the loopback.
For me the loopback is and vcan is the part to modify, as gw is just a
another source of unsanitized len information.
My idea was to fix things up when someone manipulates the CAN or CAN FD
length field with can-gw.
For CAN you don't have to do any sanitization, only check if the len is
<= 8.
Mapping [0..8] to [0..8] is pointless but correct :-)
Or do you think I should just omit it and let the data flow as-is?
ACK
Ok, will send a v2 with a removed sanitized CAN FD length. The
boundaries of the max length information (max 8 or 64) is tested anyway.
Regards,
Oliver