Re: [PATCH rework v1] canxl: add virtual CAN network identifier support

Oliver Hartkopp <socketcan@xxxxxxxxxxxx> · Mon, 12 Feb 2024 22:06:11 +0100

Hello Guy,

On 2024-02-12 20:27, Guy Harris wrote:
On Feb 12, 2024, at 7:26 AM, Oliver Hartkopp <socketcan@xxxxxxxxxxxx> wrote:

I'm currently in discussion with Guy Harris who's working on the Linux-CAN support in Wireshark.

*AND* in libpcap.  Note that anything done with the metadata headers of SocketCAN packets affects both of them - and any programs *other* than Wireshark that capture on SocketCAN adapters or read pcap or pcapng files with SocketCAN packets.

Right.

We current have two approaches how to integrate the VCID into the struct canxl_frame:

1. Add the VCID at a specific 16 bit offset (above the 11 bit Prio ID)
2. Create a endian-dependent data structure with a separate uint8 VCID

1: https://lore.kernel.org/linux-can/20240212151404.60828-1-socketcan@xxxxxxxxxxxx/
2: https://lore.kernel.org/linux-can/20240128183758.68401-1-socketcan@xxxxxxxxxxxx/

The endian-dependent data structure looks like this:

struct canxl_frame {
#if defined(__LITTLE_ENDIAN)
        __u16 prio;   /* 11 bit priority for arbitration */
        __u8  vcid;   /* virtual CAN network identifier */
        __u8  __res0; /* reserved / padding must be set to zero */
#elif defined(__BIG_ENDIAN)
        __u8  __res0; /* reserved / padding must be set to zero */
        __u8  vcid;   /* virtual CAN network identifier */
        __u16 prio;   /* 11 bit priority for arbitration */
#else
#error "Unknown endianness"
#endif
        __u8  flags;  /* additional flags for CAN XL */
        __u8  sdt;    /* SDU (service data unit) type */
        __u16 len;    /* frame payload length in byte */
        __u32 af;     /* acceptance field */
        __u8  data[CANXL_MAX_DLEN];
};

@Guy: Besides the fact that suggestion 2 does not win a design prize I'm not sure whether solution 1 or 2 are better to maintain over lifetime.

The first question to ask is "which solution breaks less existing source and binary code"?

Although the CAN XL support was introduced in Linux 6.1, there is no 
hardware CAN XL driver available today and only a low 1-digit number of 
developers working on PoC code for CAN CiA protocols on virtual CAN 
interfaces.

We would not break any user space programs BUT the ABI and the socket 
API should be maintained as we can not change the uapi defines down to 
Linux 6.1.

Currently, the SocketCAN metadata header for CAN XL frames begins with a 4-byte integral-valued field.  I'm not seeing code in the tip of the main branch, in either the git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git repository or the git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can.git repository, that sets or gets the prio field of the canxl_frame structure, so I'm guessing that the prio field in a canal_frame structure would be in *host* byte order when received or sent.

Presumably any software that sees that structure would either

	1) treat the prio field as containing the priority in the low-order 11 bits and zeroes in the upper 21 bits

or

	2) tread the prio field as containing the priority in the low-order 11 bits and would extract the priority by ANDing the prio field with CANXL_PRIO_MASK.

At the binary level, code that does 1) would be broken by putting the VCID in that field; the only way not to break it would be to add an additional field for the VCID, but *that* would change the length of the metadata header, which would break, at the binary level, any code that thinks the header is 12 bytes long, so there's nothing you can to to add the VCID without breaking that software, so that code shouldn't be taken into account.

At the *source* level, code that does 1) would be broken by putting the VCID in the 8 bits above the lower 16 bits, and wouldn't be broken by changing the structure to have the priority in a 16-bit field, with additional 8-bit vcid and reserved fields.

Code that does 2) would not be broken by *either* solution, at the source or binary level.

Having that as a 32-bit field named "priority" gives the field a name that doesn't indicate that it also includes the VCID.  Renaming of the field as "priority_vcid" would break source compatibility; adding

	#define priority	priority_vcid

would fix that, at the expense of possibly breaking or, at least, uglifying debugging of code that has a variable or function or structure name or... named "priority".

Having that as a 16-bit field named "priority", an 8-bit field named "vcid", and an 8-bit reserved field involves, for structure layout compatibility, two different sets of structure members, with byte-order tests at compile time, which is a bit ugly, but does make it clear that one field contains the priority (and 5 reserved bits) and another field contains the VCID.

*Both* solutions result in the metadata header's first 4 bytes being:

byte 0: lower 8 bits of the priority;
byte 1: 5 reserved bits and upper 3 bits of the priority;
byte 2: VCID
byte 3: 8 reserved bits

on little-endian machines and

byte 0: 8 reserved bits
byte 1: VCID
byte 3: 5 reserved bits and upper 3 bits of the priority;
byte 4: lower 8 bits of the priority

on big-endian machines, i.e. the layout of the data in memory is the same for *both* structures.

Many thanks for your feedback!!

I finally have a clear view on the requirements and would finally vote 
for this patch that maintains the 32 bit prio element and "shift" the 
vcid at bit position 16:

https://lore.kernel.org/linux-can/20240212151404.60828-1-socketcan@xxxxxxxxxxxx/

And here are the reasons why:

1. It doesn't break existing code and ABI
   - the applications can use the existing CAN XL frames
   - there is no VCID content visible by default
   - it is intended that applications do not necessarily know the VCID

2. So far there are no CAN XL hardware drivers which are the really 
relevant users of the VCID information

3. The user explicitly has to switch on the CAN_RAW_XL_VCID_OPTS 
sockopt() for CAN_RAW sockets to enable and handle the VCID content

By now the VCID place in the prio is always set to zero which means an 
"untagged" CAN XL frame, which is fine.

So there is a new feature which will likely emerge in Linux 6.9 which is 
not enabled by default and which does not break the existing ABI and code.

When a VCID capable application runs on an older kernel the 
CAN_RAW_XL_VCID_OPTS sockopt() will fail and there will be no VCID 
content. So it is backwards compatible.

And when the sockopt() succeeds, the (new) application knows how to 
handle the prio content.

Best regards,
Oliver

ps. @Marc: I found a small issue in the commit message of the above 
patch as I removed the obsolete CANXL_VCID from an early implementation. 
So I'll send a "rework v2" patch.