On Wed, Aug 3, 2022 at 6:34 PM Jakub Kicinski <kuba@xxxxxxxxxx> wrote: > > On Wed, 3 Aug 2022 18:05:44 -0700 Joanne Koong wrote: > > I think the problem is that the skb may be cloned, so a write into any > > portion of the paged data requires a pull. If it weren't for this, > > then we could do the write and checksumming without pulling (eg kmap > > the page, get the csum_partial of the bytes you'll write over, do the > > write, get the csum_partial of the bytes you just wrote, then unkmap, > > then update skb->csum to be skb->csum - csum of the bytes you wrote > > over + csum of the bytes you wrote). I think we would even be able to > > provide a direct data slice to non-contiguous pages without needing > > the additional copy to a stack buffer (eg kmap the non-contiguous > > pages to a contiguous virtual address that we pass back to the bpf > > program, and then when the bpf program is finished do the cleanup for > > the mappings). > > The whole read/write/data concept is not a great match for packet > parsing. Primary use for packet parsing is that you want to read > a header and not have to deal with frags or pulling. In that case > you should get a direct pointer or a copy on the stack, transparently. The selftests [0] includes some examples of packet parsing using dynptrs. You're able to get a pointer to the headers (if it's in the head) directly, or you can use bpf_dynptr_read() to read the data from the frag into a buffer (without needing to pull; bpf_dynptr_read() essentially just calls bpf_skb_load_bytes()). Does this address the use cases you have in mind? I think the pull and unclone stuff only pertains to the cases for writes into the frags. [0] https://lore.kernel.org/bpf/20220726184706.954822-4-joannelkoong@xxxxxxxxx/ > > Maybe before I go on talking nonsense I should read up on what dynptr > is and what it's trying to achieve. Stan says its like unique_ptr in > C++ which tells me.. nothing :) > > $ git grep dynptr -- Documentation/ > $ > > Any pointers? Ooh thanks for the reminder, adding a page for the dynptr documentation is on my to-do list A dynptr (also known as fat pointers in other languages) is a pointer that stores extra metadata alongside the address it points to. In particular, the metadata the bpf dynptr stores includes the length of data it points to (so in the case of skb/xdp, the size of the packet), the type of dynptr, and properties like whether it's read-only. Dynptrs are an interface for bpf programs to dynamically access memory, because the helper calls enforce that the accesses are safe. For example, bpf_dynptr_read() allows reads at offsets and lengths not statically known at compile time (in bpf_dynptr_read, the kernel uses the metadata to check that the offset and length doesn't violate memory bounds); without dynptrs, the verifier can't guarantee that the offset or length of the read is safe since those values aren't statically known at compile time, so for example you can't directly read a dynamic number of bytes from a packet. With regards to skb + xdp, the main use case of dynptrs are to 1) allow dynamic-size accesses of packet data and 2) allow easier and simpler packet parsing (for example, accessing skb->data directly requires multiple if checks for ensuring it's within bounds of skb->data_end in order to satisfy the verifier; with the dynptr interface, you are able to get a direct data slice and access it without needing the checks. The selftests 3rd patch has some demonstrations of this). > > > Three ideas I'm thinking through as a possible solution: > > 1) Enforce that the skb is always uncloned for skb-type bpf progs (we > > currently do this just for the skb head, see bpf_unclone_prologue()), > > but I'm not sure if the trade-off (pulling all the packet data, even > > if it won't be used) is acceptable. > > > > 2) Don't support cloned skbs for bpf_dynptr_write/data and don't do > > any pulling. If the prog wants to use bpf_dynptr_write/data, then they > > have to pull it first > > I think all output skbs from TCP are cloned, so that's not gonna work. > > > 2) (uglier than #1 and #2) For bpf_dynptr_write()s, pull if the write > > is to a paged area and the skb is cloned, otherwise write to the paged > > area without pulling; if we do this, then we always have to invalidate > > all data slices associated with the skb (even for writes to the head) > > since at prog load time, the verifier doesn't know if the pull happens > > or not. For bpf_dynptr_data()s, follow the same policy. > > > > I'm leaning towards 2. What are your thoughts?