On 06/04/2019 01:54 AM, Alexei Starovoitov wrote:
> On Mon, Jun 3, 2019 at 4:48 PM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
>> On 06/04/2019 01:27 AM, Alexei Starovoitov wrote:
>>> On Mon, Jun 3, 2019 at 3:59 PM Matt Mullins <mmullins@xxxxxx> wrote:
>>>>
>>>> If these are invariably non-nested, I can easily keep bpf_misc_sd when
>>>> I resubmit. There was no technical reason other than keeping the two
>>>> codepaths as similar as possible.
>>>>
>>>> What resource gives you worry about doing this for the networking
>>>> codepath?
>>>
>>> my preference would be to keep tracing and networking the same.
>>> there is already minimal nesting in networking and probably we see
>>> more when reuseport progs will start running from xdp and clsbpf
>>>
>>>>> Aside from that it's also really bad to miss events like this as exporting
>>>>> through rb is critical. Why can't you have a per-CPU counter that selects a
>>>>> sample data context based on nesting level in tracing? (I don't see a discussion
>>>>> of this in your commit message.)
>>>>
>>>> This change would only drop messages if the same perf_event is
>>>> attempted to be used recursively (i.e. the same CPU on the same
>>>> PERF_EVENT_ARRAY map, as I haven't observed anything use index !=
>>>> BPF_F_CURRENT_CPU in testing).
>>>>
>>>> I'll try to accomplish the same with a percpu nesting level and
>>>> allocating 2 or 3 perf_sample_data per cpu. I think that'll solve the
>>>> same problem -- a local patch keeping track of the nesting level is how
>>>> I got the above stack trace, too.
>>>
>>> I don't think counter approach works. The amount of nesting is unknown.
>>> imo the approach taken in this patch is good.
>>> I don't see any issue when event_outputs will be dropped for valid progs.
>>> Only when user called the helper incorrectly without BPF_F_CURRENT_CPU.
>>> But that's an error anyway.
>>
>> My main worry with this xchg() trick is that we'll miss to export crucial
>> data with the EBUSY bailing out especially given nesting could increase in
>> future as you state, so users might have a hard time debugging this kind of
>> issue if they share the same perf event map among these programs, and no
>> option to get to this data otherwise. Supporting nesting up to a certain
>> level would still be better than a lost event which is also not reported
>> through the usual way aka perf rb.
>
> I simply don't see this 'miss to export data' in all but contrived conditions.
> Say two progs share the same perf event array.
> One prog calls event_output and while rb logic is working
> another prog needs to start executing and use the same event array

Correct.

> slot. Today it's only possible for tracing prog combined with networking,
> but having two progs use the same event output array is pretty much
> a user bug. Just like not passing BPF_F_CURRENT_CPU.

I don't see the user-bug part; why should that be a user bug? It's the
same as saying that sharing a BPF hash map between networking programs
attached to different hooks, or between networking and tracing programs,
would be a user bug, which it is not.

One concrete example would be the cilium monitor, where we currently
expose skb trace and drop events as well as debug data through the same
rb. This should be usable from any program type that has the
perf_event_output helper enabled (e.g. XDP and tc/BPF) without requiring
user space to walk yet another per-CPU mmap'ed rb.
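
To make the comparison concrete, the xchg() scheme has roughly the
following shape (my paraphrase in the context of kernel/trace/bpf_trace.c;
identifiers and the raw-sample setup are simplified, so this is not the
literal diff):

  /* Paraphrase of the xchg()-based scheme: the single per-CPU slot is
   * claimed atomically, so a nested call on the same CPU bails out
   * with -EBUSY and that event is simply lost.
   */
  static DEFINE_PER_CPU(struct perf_sample_data, bpf_misc_sd);
  static DEFINE_PER_CPU(struct pt_regs, bpf_misc_sd_regs);
  static DEFINE_PER_CPU(int, bpf_misc_sd_busy);

  u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta,
		       u64 meta_size, void *ctx, u64 ctx_size,
		       bpf_ctx_copy_t ctx_copy)
  {
	struct perf_sample_data *sd;
	struct pt_regs *regs;
	u64 ret;

	/* Claim the per-CPU slot; a non-zero old value means we
	 * interrupted another user of the slot on this CPU.
	 */
	if (xchg(this_cpu_ptr(&bpf_misc_sd_busy), 1))
		return -EBUSY;

	sd = this_cpu_ptr(&bpf_misc_sd);
	regs = this_cpu_ptr(&bpf_misc_sd_regs);
	perf_fetch_caller_regs(regs);

	/* ... build the raw sample from meta/ctx via ctx_copy ... */
	ret = __bpf_perf_event_output(regs, map, flags, sd);

	xchg(this_cpu_ptr(&bpf_misc_sd_busy), 0); /* release the slot */
	return ret;
  }

So the very first nested user on that CPU already loses its event, and
nothing in the rb tells the consumer that this happened.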
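
The per-CPU nesting-level alternative Matt describes would look more or
less like this instead (again only a sketch; the depth of 3 and the names
are illustrative, not from the patch):

  /* Each nesting depth gets its own sample data and pt_regs, so up to
   * three nested users per CPU still get their event out; only nesting
   * deeper than the preallocated contexts hits -EBUSY.
   */
  struct bpf_nest_sds {
	struct perf_sample_data sds[3];
  };
  struct bpf_nest_regs {
	struct pt_regs regs[3];
  };
  static DEFINE_PER_CPU(struct bpf_nest_sds, bpf_misc_sds);
  static DEFINE_PER_CPU(struct bpf_nest_regs, bpf_misc_regs);
  static DEFINE_PER_CPU(int, bpf_event_nest_level);

  u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta,
		       u64 meta_size, void *ctx, u64 ctx_size,
		       bpf_ctx_copy_t ctx_copy)
  {
	struct bpf_nest_sds *sds = this_cpu_ptr(&bpf_misc_sds);
	int nest_level = this_cpu_inc_return(bpf_event_nest_level);
	struct perf_sample_data *sd;
	struct pt_regs *regs;
	u64 ret;

	if (WARN_ON_ONCE(nest_level > ARRAY_SIZE(sds->sds))) {
		ret = -EBUSY;
		goto out;
	}
	/* Pick the context belonging to our nesting depth. */
	sd = &sds->sds[nest_level - 1];
	regs = &this_cpu_ptr(&bpf_misc_regs)->regs[nest_level - 1];

	perf_fetch_caller_regs(regs);
	/* ... build the raw sample from meta/ctx via ctx_copy ... */
	ret = __bpf_perf_event_output(regs, map, flags, sd);
  out:
	this_cpu_dec(bpf_event_nest_level);
	return ret;
  }

With something along these lines, e.g. a tracing prog firing inside a
networking prog's event_output still delivers both events to the rb, and
only nesting beyond the preallocated depth would ever see the -EBUSY.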