Really appreciate the quick turnaround! Answers in-line below. -- Kishen Maloor Intel Corporation On 6/21/21, 5:07 PM, "Jesper Dangaard Brouer" <brouer@xxxxxxxxxx> wrote: (Answer inlined below, please learn howto answer inline) On Mon, 21 Jun 2021 19:08:54 +0000 "Maloor, Kishen" <kishen.maloor@xxxxxxxxx> wrote: > Hi Jesper, > > I wanted to run some patches by you for your comments. Thanks a lot for reaching out, I really appreciate it. > Intel has a requirement to support an SO_TXTIME like capability with > AF_XDP sockets, to set a precise per-packet transmission time from > user space XDP applications. This is great! I also have a customer that have this requirement. Thus, I'm very interested in collaborating on getting this feature implemented in practice. Super! > It is also critical that this support be upstreamed into the Linux > mainline. I fully agree "upstream first". For this to happen, we should discuss the approach in a public forum as early as possible to get upstream feedback. Cc'ing bpf@xxxxxxxxxxxxxxx. Also adding our working group mailing list as not everybody follow the kernel list Cc'ed xdp-hints@xxxxxxxxxxxxxxx (https://lists.xdp-project.net/xdp-hints/). Multiple of your Intel colleagues (from other departments) have also reached out to me, and they are also on this xdp-hints mailing list. It would be great if you can keep this list Cc'ed to keep everybody in the loop. Consider subscribing: https://lists.xdp-project.net/postorius/lists/xdp-hints.xdp-project.net/ Yes, I am aware, although it seems (per my reading) the main focus there is on capturing metadata on the RX path and funneling that into the XDP program (and possibly higher). Whereas this change is solely focused on passing down TXTIME (in the TX path obv.) from the XDP application running in userspace. Said another way, the aim here is to accomplish something equivalent to SO_TXTIME with AF_XDP. > The implementation is roughly equivalent to that of SO_TXTIME for > AF_PACKET. It includes: > > - a new XSK bind flag, So, this feature is enabled for all packets on this AF_XDP socket. Yes, in setting the XDP_TXTIME bind flag while creating the AF_XDP socket, it will apply to all packets sent through that socket. > - libbpf API to set a TXTIME for a frame, > > - leverages the concept of UMEM headroom to store the per-packet > TXTIME (immediately preceding the XDP buff's data_hard_start), This sounds like the XDP metadata area? Or is this top of mem data-area (guess not, as you use +umem_headroom)? No. More specifically it is stored in the UMEM headroom area (it's default size per XSK_UMEM__DEFAULT_FRAME_HEADROOM is 0). Currently, the code in net/xdp adjusts for this headroom (pool->headroom) in one or more places so my understanding is that the entirety of the XDP frame (data_hard_start and beyond, including the metadata area) follows the UMEM headroom. What will happen when an XDP-prog use this same metadata area, and your feature (patch below) is enabled (via AF_XDP) ? So far as I understand, an XDP/BPF program that writes to the defined XDP metadata area will function as expected, i.e. it won't intrude in the UMEM headroom. E.g. can I set the TXTIME in XDP-prog (RX) and XDP_REDIRECT out an interface with TXTIME enabled? I believe that's possible. This is kind of why I wanted to loop in experts such as yourself as you'd have a more complete understanding (than myself) of the XDP code as it currently stands :) So, I'd appreciate all the feedback I can get to account for anything that's missed in the patches. > - internally signals the NIC driver via the XDP descriptor when > there's a TXTIME bound to a frame, and Looking at patch, if I don't call 'xsk_umem__set_txtime' (libbpf), then the kernel (xp_raw_get_txtime) will pickup random-data left over by previous user of the memory, right? Yes, though the expectation is that an XDP application that enables XDP_TXTIME on the socket will proceed to pass in a TXTIME with every frame it sends. > - provides a kernel API to retrieve it in the driver so it may proceed > to set the Launch Time on the NIC. > > It uses existing bitmasks and there's no overhead (i.e. not > allocating any more memory than is already done), and there are no > other dependencies. > > You may review the commit messages for all details. > > kernel: https://github.com/kmaloor/bpfnext/commit/b72a336bcb8df4a26646a29962ca95d58172147f > libbpf: https://github.com/kmaloor/libbpf/commit/eda68c1ad11ed60739837ad6e4fb256768231b55 > > We're currently testing these changes in conjunction with the igc > driver. Can you share the patches for the igc driver as well? That's currently in the works and being tested internally. We should hopefully share it soon. I want to see how you convert the "__s64 txtime" into the 30-bit LaunchTime used[0] by the hardware (i225). It would also be good to see how this is used for driver igb/i210, that have a 25-bit LaunchTime. Details on driver+hardware here: [0] https://github.com/xdp-project/xdp-project/blob/master/areas/tsn/code01_follow_qdisc_TSN_offload.org > Meanwhile, we'd appreciate any comments you may have, particularly as > an expert with the kernel XDP subsystem. I think this TXTIME work is just *one* use-case of "xdp-hints". IMHO we should use something like BTF to describe the metadata memory area in-order to support more use-cases. Being concrete .e.g. your XSK socket bind call could register a BTF-ID that it tells the kernel about the layout. I guess, we need some guidance from BTF experts on how kernel can check the driver supports this LaunchTime feature, Andrii? Other use-cases: (1) On TX we also want to support asking for TX-csum by hardware. (2) Support setting VLAN via TX-descriptor (hint TSN can use VLAN priority bits). Well, to be perfectly clear, the TXTIME in this case is an application supplied actionable quantity, so I don't interpret it as "metadata". Whereas the examples you cite above indeed seem like metadata. In this patch, the goal is simply to replicate SO_TXTIME with AF_XDP. I am aware of ongoing discussions for handling XDP metadata/hints and so far as I understand (correct me if I'm wrong), we're still a ways off from a solution that the community will uphold. Again thanks for reaching out, lets find a solution together with upstream. -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer Kernel: s64 xp_raw_get_txtime(struct xsk_buff_pool *pool, u64 addr) {  return *(s64 *)((char *)xp_raw_get_data(pool, addr) - XDP_TXTIME_LEN); } EXPORT_SYMBOL(xp_raw_get_txtime); libbpf: static inline void xsk_umem__set_txtime(void *umem_area, __u64 addr, __u32 umem_headroom, __s64 txtime) {  *(__s64 *)(&((char *)umem_area)[addr] + umem_headroom - XDP_TXTIME_LEN) = txtime; }