On Thu, 3 Oct 2024 20:26:29 -0400
Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:

> static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
> {
>         struct trace_array *tr = data;
>         struct trace_event_file *trace_file;
>         struct syscall_trace_enter *entry;
>         struct syscall_metadata *sys_data;
>         struct trace_event_buffer fbuffer;
>         unsigned long args[6];
>         int syscall_nr;
>         int size;
>
>         syscall_nr = trace_get_syscall_nr(current, regs);
>         if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
>                 return;
>
>         /* Here we're inside tp handler's rcu_read_lock_sched (__DO_TRACE) */
>         trace_file = rcu_dereference_sched(tr->enter_syscall_files[syscall_nr]);
>
> ^^^^ this function explicitly states that preempt needs to be disabled by
> tracepoints.

Ah, I should have known it was the syscall portion. I don't care for this
hidden dependency. I'd rather add a preempt disable here and not expect it
to be disabled when called.

>
>         if (!trace_file)
>                 return;
>
>         if (trace_trigger_soft_disabled(trace_file))
>                 return;
>
>         sys_data = syscall_nr_to_meta(syscall_nr);
>         if (!sys_data)
>                 return;
>
>         size = sizeof(*entry) + sizeof(unsigned long) * sys_data->nb_args;
>
>         entry = trace_event_buffer_reserve(&fbuffer, trace_file, size);
>
> ^^^^ it reserves space in the ring buffer without disabling preemption
> explicitly.
>
> And also:
>
> void *trace_event_buffer_reserve(struct trace_event_buffer *fbuffer,
>                                  struct trace_event_file *trace_file,
>                                  unsigned long len)
> {
>         struct trace_event_call *event_call = trace_file->event_call;
>
>         if ((trace_file->flags & EVENT_FILE_FL_PID_FILTER) &&
>             trace_event_ignore_this_pid(trace_file))
>                 return NULL;
>
>         /*
>          * If CONFIG_PREEMPTION is enabled, then the tracepoint itself disables
>          * preemption (adding one to the preempt_count). Since we are
>          * interested in the preempt_count at the time the tracepoint was
>          * hit, we need to subtract one to offset the increment.
>          */
>
> ^^^ This function also explicitly expects preemption to be disabled.
>
> So I rest my case. The change I'm introducing for tracepoints
> doesn't make any assumptions about whether or not each tracer requires
> preempt off: it keeps the behavior the _same_ as it was before.
>
> Then it's up to each tracer's developer to change the behavior of their
> own callbacks as they see fit. But I'm not introducing regressions in
> tracers with the "big switch" change of making syscall tracepoints
> faultable. This will belong to changes that are specific to each tracer.

I'd rather remove these dependencies at the source. So, IMHO, these places
should be "fixed" first. At least for the ftrace users. But I think the
same can be done for the other users as well. BPF already stated it just
needs "migrate_disable()". Let's see what perf has.

We can then audit all the tracepoint users to make sure they do not need
preemption disabled.

-- Steve
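
PS. Just to illustrate what I mean by removing the hidden dependency on the
ftrace side: the callback could take its own preempt disable instead of
inheriting it from the tracepoint wrapper. Completely untested sketch, most
of the body elided, and the preempt_disable_notrace() placement is only my
guess at where it would end up:

static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
{
        struct trace_array *tr = data;
        struct trace_event_file *trace_file;
        int syscall_nr;

        syscall_nr = trace_get_syscall_nr(current, regs);
        if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
                return;

        /*
         * Do not rely on the tracepoint wrapper having disabled preemption
         * for us. Take our own preempt disable so the sched-RCU dereference
         * and the ring buffer reserve stay safe even once the syscall
         * tracepoint itself becomes faultable/preemptible.
         */
        preempt_disable_notrace();

        trace_file = rcu_dereference_sched(tr->enter_syscall_files[syscall_nr]);
        if (!trace_file)
                goto out;

        /* ... filtering, trace_event_buffer_reserve() and commit as before ... */

out:
        preempt_enable_notrace();
}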