On Thu, Oct 8, 2020 at 12:40 AM Michel Lespinasse <walken@xxxxxxxxxx> wrote: > > On Wed, Oct 7, 2020 at 11:44 AM Axel Rasmussen <axelrasmussen@xxxxxxxxxx> wrote: > > The goal of these tracepoints is to be able to debug lock contention > > issues. This lock is acquired on most (all?) mmap / munmap / page fault > > operations, so a multi-threaded process which does a lot of these can > > experience significant contention. > > > > We trace just before we start acquisition, when the acquisition returns > > (whether it succeeded or not), and when the lock is released (or > > downgraded). The events are broken out by lock type (read / write). > > > > The events are also broken out by memcg path. For container-based > > workloads, users often think of several processes in a memcg as a single > > logical "task", so collecting statistics at this level is useful. > > > > The end goal is to get latency information. This isn't directly included > > in the trace events. Instead, users are expected to compute the time > > between "start locking" and "acquire returned", using e.g. synthetic > > events or BPF. The benefit we get from this is simpler code. > > > > Because we use tracepoint_enabled() to decide whether or not to trace, > > this patch has effectively no overhead unless tracepoints are enabled at > > runtime. If tracepoints are enabled, there is a performance impact, but > > how much depends on exactly what e.g. the BPF program does. > > > > Signed-off-by: Axel Rasmussen <axelrasmussen@xxxxxxxxxx> > > Thanks for working on this. > > I like that there is no overhead unless CONFIG_TRACING is set. Actually, it's even a bit better. Even if CONFIG_TRACING is set, the only overhead we add is a single static_key branch (much cheaper than a normal branch). The overhead is only introduced when the tracepoints are enabled at runtime, e.g. by writing to /sys/kernel/trace/events/(event)/enabled. > However, I think the __mmap_lock_traced_lock and similar functions are > the wrong level of abstraction, especially considering that we are > considering to switch away from using rwsem as the underlying lock > implementation. Would you consider something along the following lines > instead for include/linux/mmap_lock.h ? Sure, there's no reason that can't work. It's just a tradeoff: Breaking up these helpers the way you describe makes the locking functions more verbose (more repeated code; overall this file would be longer), but there are a few less lines to change when we switch from rwsem to something else (because we'd only need to change the lock functions vs. as-is we'd have to change a couple of lines in each of the helpers too). I prefer the way it is, but I don't feel that strongly about it. I'll refactor and send a v3 that looks as you describe. > > #ifdef CONFIG_TRACING > > DECLARE_TRACEPOINT(...); > > void __mmap_lock_do_trace_start_locking(struct mm_struct *mm, bool write); > > static inline void mmap_lock_trace_start_locking(struct mm_struct *mm, > bool write) > { > if (tracepoint_enabled(mmap_lock_start_locking)) > __mmap_lock_do_trace_start_locking(mm, write); > } > > #else > > static inline void mmap_lock_trace_start_locking(struct mm_struct *mm, > bool write) {} > > #endif > > static inline void mmap_write_lock(struct mm_struct *mm) > { > mmap_lock_trace_start_locking(mm, true); > down_write(&mm->mmap_lock); > mmap_lock_trace_acquire_returned(mm, true, true); > } > > I think this is more straightforward, and also the > mmap_lock_trace_start_locking and similar functions don't depend on > the underlying lock implementation. > > The changes to the other files look fine to me. > > -- > Michel "Walken" Lespinasse > A program is never fully debugged until the last user dies.