On Wed, Feb 26, 2025 at 09:30:18AM -0800, Alexei Starovoitov wrote: > On Tue, Feb 25, 2025 at 6:27 AM Maciej Fijalkowski > <maciej.fijalkowski@xxxxxxxxx> wrote: > > > > On Fri, Feb 21, 2025 at 05:55:57PM -0800, Alexei Starovoitov wrote: > > > On Thu, Feb 20, 2025 at 5:45 AM Maciej Fijalkowski > > > <maciej.fijalkowski@xxxxxxxxx> wrote: > > > > > > > > Hi! > > > > > > > > This patchset provides what is needed for storing skbs as kptrs in bpf > > > > maps. We start with necessary kernel change as discussed at [0] with > > > > Martin, then next patch adds kfuncs for handling skb refcount and on top > > > > of that a test case is added where one program stores skbs and then next > > > > program is able to retrieve them from map. > > > > > > > > Martin, regarding the kernel change I decided to go with boolean > > > > approach instead of what you initially suggested. Let me know if it > > > > works for you. > > > > > > > > Thanks, > > > > Maciej > > > > > > > > [0]: https://lore.kernel.org/bpf/Z0X%2F9PhIhvQwsgfW@boxer/ > > > > > > Before we go further we need a lot more details on "why" part. > > > In the old thread I was able to find: > > > > > > > On TC egress hook skb is stored in a map ... > > > > During TC ingress hook on the same interface, the skb that was previously > > > stored in map is retrieved ... > > > > > > This is too cryptic. I see several concerns with such use case > > > including netns crossing, L2/L3 mismatch, skb_scrub. > > > > > > I doubt we can make such "skb stash in a map" safe without > > > restricting the usage, so please provide detailed > > > description of the use case. > > > > Hi Alexei, > > > > We have a system with two nodes: one is a fully fledged Linux system (big node) > > and the other one a smaller embedded system. The big node runs Linux PTP for > > time synchronization, the smaller node we have no control over (might run Linux > > or something else). The problem is that we would like to use the Tx timestamps > > from the small node in the Linux PTP application on the big node. When a packet > > is sent out from the big node it arrives at the small node that send it out one > > of its interfaces. It then replies with another packet back to the big node > > with the Tx timestamp in it. > > > > Our current PoC for attacking this is to store the skb in a map (using this > > patch set) when it is sent out from the big node then retrieve it from the map > > when the reply from the small node is received. We then take the timestamp from > > the packet and put it in the skb and send it up to the socket error queue so > > that Linux PTP works out of the box. > > This sounds like you're actually xmit-ing the skb out of the big node > and storing it in a map via simple refcnt++. > That may work in some setups, but in general is not quite correct > from networking stack pov. > You need to skb_clone() it and keep the clone, so only cloned skb > can go into the socket error queue and up to user space. > xmit-ing the same skb and sending to user space > is going to cause issues. This skb only goes to errqueue and then gets its refcount decremented and dropped, we do not xmit that skb per-se. > > Cleaner design would probably involve bpf_clone_redirect() > and may be some form of bpf-qdisc where packet is waiting in the queue > until its hwtimestamp is adjusted and its released from the queue > into user space. We were working on clone initially but map storage turned out to be much cleaner approach, but we can re-iterate on that in case explanation provided above regarding not xmitting stored skb still feels off for you. > > What happens when small node doesn't send that timestamped packet? > The map will eventually overflow, right? > Overall it feels that stashing skb-s in a map isn't the right approach. We did not handle that in our demo but I believe BPF timers would be useful here, wouldn't they? > > > Note that for the purpose of setting skb->hwtstamp and sending it up to the > > error queue we are adding a kfunc (bpf_tx_tstamp) responsible for it, which is > > not included in this set, so I understand it is not obvious what we achieved > > with the current form of this patch set being discussed. > > > > We did not consider the restrictions that should be implemented from netns POV, > > so that is a good point. How would you see this being fixed? > >