On Thu, Oct 10, 2019 at 02:07:29PM +0300, Ido Schimmel wrote:
> On Wed, Oct 09, 2019 at 09:15:03PM -0700, Alexei Starovoitov wrote:
> > +SEC("raw_tracepoint/kfree_skb")
> > +int trace_kfree_skb(struct trace_kfree_skb *ctx)
> > +{
> > +	struct sk_buff *skb = ctx->skb;
> > +	struct net_device *dev;
> > +	int ifindex;
> > +	struct callback_head *ptr;
> > +	void *func;
> > +
> > +	__builtin_preserve_access_index(({
> > +		dev = skb->dev;
> > +		ifindex = dev->ifindex;
>
> Hi Alexei,
>
> The patchset looks very useful. One question: Is it always safe to
> access 'skb->dev->ifindex' here? I'm asking because 'dev' is inside a
> union with 'dev_scratch' which is 'unsigned long' and therefore might
> not always be a valid memory address. Consider for example the following
> code path:
>
> ...
> __udp_queue_rcv_skb(sk, skb)
> __udp_enqueue_schedule_skb(sk, skb)
> udp_set_dev_scratch(skb)
> // returns error
> ...
> kfree_skb(skb) // ebpf program is invoked
>
> How is this handled by eBPF?

Excellent question. There are cases like this one where the verifier cannot
possibly track the semantics of kernel code, such as a union of a pointer
with a scratch area. That's why every access through a BTF pointer is a
hidden probe_read: a bad address produces a zero value instead of crashing
the kernel. With old-school tracing, all memory accesses were probe_read,
and the bpf prog was free to read anything and page fault everywhere. Now
the bpf prog will almost always access correct data, yet corner cases like
this are inevitable. I'm working on a few ideas for improving it further
with BTF-tagged slab allocations and KASAN-like memory shadowing.
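To make the failure mode concrete, here is a minimal userspace model in C; it is not kernel or BPF code. All names (toy_sk_buff, toy_net_device, toy_probe_read, toy_read_ifindex) are hypothetical. The union mimics how skb->dev shares storage with dev_scratch, and toy_probe_read only imitates the behavior of bpf_probe_read(), which zeroes the destination when the read fails. A simple address-range check stands in for the kernel's fault handling; that substitution is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for kernel structures. */
struct toy_net_device {
	char name[16];
	int ifindex;
};

struct toy_sk_buff {
	union {
		struct toy_net_device *dev;
		unsigned long dev_scratch; /* same width as a pointer on Linux */
	};
};

/* The only object in this toy "address space" that reads may touch. */
static struct toy_net_device toy_valid_dev = { .name = "eth0", .ifindex = 7 };

/*
 * Fault-tolerant read modeled on bpf_probe_read(): if the source range is
 * not inside a known-valid object, zero the destination and return -EFAULT
 * instead of dereferencing the address.
 */
static int toy_probe_read(void *dst, size_t size, uintptr_t src)
{
	uintptr_t lo = (uintptr_t)&toy_valid_dev;
	uintptr_t hi = lo + sizeof(toy_valid_dev);

	assert(dst != NULL);
	if (src < lo || src > hi || size > hi - src) {
		memset(dst, 0, size);
		return -EFAULT;
	}
	memcpy(dst, (const void *)src, size);
	return 0;
}

/*
 * Equivalent of "dev = skb->dev; ifindex = dev->ifindex;" where the second
 * load is a hidden probe read. If dev actually holds a dev_scratch value
 * (or is NULL), the read fails and ifindex comes back as 0.
 */
static int toy_read_ifindex(const struct toy_sk_buff *skb)
{
	struct toy_net_device *dev = skb->dev;
	int ifindex;

	toy_probe_read(&ifindex, sizeof(ifindex),
		       (uintptr_t)dev + offsetof(struct toy_net_device, ifindex));
	return ifindex;
}
```

With a real device pointer the model returns the stored ifindex; with a scalar left behind in dev_scratch, the "pointer" is non-NULL but the guarded read yields 0 rather than a crash, which is the trade-off described above.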
Your question made me wonder whether we have a long-standing issue with
dev_scratch, since even classic BPF has the SKF_AD_IFINDEX hack, which is
implemented as:

	*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_buff, dev),
			      BPF_REG_TMP, BPF_REG_CTX,
			      offsetof(struct sk_buff, dev));
	/* if (tmp != 0) goto pc + 1 */
	*insn++ = BPF_JMP_IMM(BPF_JNE, BPF_REG_TMP, 0, 1);
	*insn++ = BPF_EXIT_INSN();
	if (fp->k == SKF_AD_OFF + SKF_AD_IFINDEX)
		*insn = BPF_LDX_MEM(BPF_W, BPF_REG_A, BPF_REG_TMP,
				    offsetof(struct net_device, ifindex));

That means that for a long time, [c|e]BPF code has been checking skb->dev
only for NULL.

I've analyzed the code paths where socket filters can be called, and I
think they're fine: dev_scratch is used after sk_filter has run. But there
are other hooks: lwt and various cgroup hooks. I've checked lwt and cgroup
inet/egress, and I don't think dev_scratch is used in those paths, so they
should be fine as well.

Still, I think the whole idea of aliasing a scratch area onto the 'dev'
pointer is dangerous. There are plenty of tracepoints that do
skb->dev->foo, and it's hard to track where all of them are called. I
think the UDP code needs to move dev_scratch to some other place in the
skb.
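The classic BPF sequence quoted above can be modeled in plain C to show why the NULL test is the only guard before the ifindex load. This is a hypothetical userspace sketch: the toy_* types and toy_skf_ad_ifindex() are invented for illustration, and the function deliberately stops short of the final load, reporting the address it would read instead of dereferencing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; only the layout idea matters here. */
struct toy_net_device {
	char name[16];
	int ifindex;
};

struct toy_sk_buff {
	union {
		struct toy_net_device *dev;
		unsigned long dev_scratch; /* same width as a pointer on Linux */
	};
};

enum toy_outcome {
	TOY_EXIT, /* program stops at BPF_EXIT_INSN() */
	TOY_LOAD, /* program goes on to load ifindex */
};

/*
 * Walks the instructions generated for SKF_AD_IFINDEX without executing
 * the final load: on TOY_LOAD it stores the address that load would read.
 */
static enum toy_outcome toy_skf_ad_ifindex(const struct toy_sk_buff *skb,
					   uintptr_t *load_addr)
{
	assert(skb != NULL && load_addr != NULL);

	/* tmp = *(ctx + offsetof(struct sk_buff, dev)) */
	uintptr_t tmp = (uintptr_t)skb->dev;

	/* if (tmp != 0) goto pc + 1; otherwise fall through to exit */
	if (tmp == 0)
		return TOY_EXIT;

	/* A = *(u32 *)(tmp + offsetof(struct net_device, ifindex)) */
	*load_addr = tmp + offsetof(struct toy_net_device, ifindex);
	return TOY_LOAD;
}
```

A NULL dev exits early, while a real device and a leftover dev_scratch value both proceed to the ifindex load; in the second case the load would target a bogus address, which is why aliasing scratch data onto 'dev' is risky for any code that only checks for NULL.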