Re: Comments on new user events ABI

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



----- On Mar 29, 2022, at 5:54 PM, Beau Belgrave beaub@xxxxxxxxxxxxxxxxxxx wrote:

> On Tue, Mar 29, 2022 at 12:17:14PM -0400, Mathieu Desnoyers wrote:
>> ----- On Mar 28, 2022, at 8:29 PM, Beau Belgrave beaub@xxxxxxxxxxxxxxxxxxx
>> wrote:
>> 
>> >> ----- On Mar 28, 2022, at 4:24 PM, Mathieu Desnoyers
>> >> mathieu.desnoyers@xxxxxxxxxxxx wrote:
[...]
> 
>> > 
>> >> I would have rather thought that tracers implemented in user-space could
>> >> register
>> >> themselves, and then there could be one
>> >> /sys/kernel/debug/tracing/user_events_status
>> >> per tracer. Considering that all kernel tracers use the same ABI to write an
>> >> event,
>> >> and then dispatch this event internally within the kernel to each registered
>> >> tracer, I would expect to have a single memory mapping for all those (e.g. a
>> >> /sys/kernel/debug/tracing/user_events_status/kernel_tracers file).
>> >> 
>> >> Then eventually if we have other user-space tracers such as lttng-ust with its
>> >> their own user-space code performing tracing in a shared memory ring buffer, it
>> >> would make sense to allow it to register its own
>> >> /sys/kernel/debug/tracing/user_events_status/lttng_ust file, with its own
>> >> indexes.
>> >> 
>> > 
>> > I don't follow that. The intention is to get user processes to participate with
>> > trace_events and the built-in tooling. When would a user-space tracer be used
>> > instead of perf/ftrace?
>> > 
>> > It seems like a feature request?
>> 
>> It can very well be out of scope for the user events, and I'm fine with that.
>> I was merely considering how the user events could be leveraged by tracers
>> implemented purely in user-space. But if the stated goal of this feature is
>> really to call into kernel tracers through a writev(), then I suspect that
>> supporting purely user-space tracers is indeed out of scope.
>> 
> 
> That was the goal with this ABI, are there maybe ways we can change the
> ABI to accomodate this later without shutting that out?

I suspect that targeting this bit-enable memory mapping only as a gate for
kernel tracers is good enough for now. Let's consider other use-cases when
we have a clear picture of how it would be used. For the moment, I suspect
that I would prefer to manage my own bit-enable memory mapping within lttng-ust
rather than use a kernel-wide facility, which opens up questions about isolation
of containers, and ability to run multiple concurrent instances on a given kernel
without side-effects on each other.

I suspect that clarifying what this mmap'd based mechanism brings over the already
existing SDT semaphores will be important to justify this new ABI.

Ref. https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation

[...]

> 
>> [...]
>> 
>> >> kernel/trace/trace_events_user.c:
>> >> 
>> >> static int user_events_release(struct inode *node, struct file *file)
>> >> {
>> >> [...]
>> >>         /*
>> >>          * Ensure refs cannot change under any situation by taking the
>> >>          * register mutex during the final freeing of the references.
>> >>          */
>> >>         mutex_lock(&reg_mutex);
>> >> [...]
>> >>         mutex_unlock(&reg_mutex);
>> >> 
>> >>         kfree(refs);
>> >> 
>> >>         ^ AFAIU, the user_events_write() does not rely on reg_mutex to ensure mutual
>> >>         exclusion.
>> >>           Doing so would be prohibitive performance-wise. But I suspect that freeing
>> >>           "refs" here
>> >>           without waiting for a RCU grace period can be an issue if user_events_write_core
>> >>           is using
>> >>           refs concurrently with file descriptor close.
>> >> 
>> > 
>> > The refs is per-file, so if user_events_write() is running release cannot be
>> > called for that file instance, since it's being used. Did I miss something?
>> 
>> I'm possibly the one missing something here, but what prevents 2 threads
>> from doing user_event_write() and close() concurrently on the same file
>> descriptor ?
>> 
> 
> While nothing prevents it, my understanding is that the actual release()
> on the file_operations won't be invoked until the file struct has been
> ref'd down to zero. (fs/file_table.c:fput, fs/read_write.c:do_pwritev).
> 
> While the write call is being run, the file should have a non-zero ref
> count. I believe looking at do_pwritev the fdget/fdput pair are largely
> responsible for this, do you agree?

Yes, you are correct. So the "refs" lifetime for writev vs close is guaranteed
by the reference counting of the struct file when the process file descriptor
table has multiple users (e.g. multithreaded process).

[...]

Thanks,

Mathieu



-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com



[Index of Archives]     [Linux Kernel]     [Kernel Newbies]     [x86 Platform Driver]     [Netdev]     [Linux Wireless]     [Netfilter]     [Bugtraq]     [Linux Filesystems]     [Yosemite Discussion]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]

  Powered by Linux