Re: [PATCH v2 1/3] tracing/user_events: Fix incorrect return value for writing operation when events are disabled

sunliming <kelulanainsley@xxxxxxxxx> · Tue, 20 Jun 2023 17:07:30 +0800



Beau Belgrave <beaub@xxxxxxxxxxxxxxxxxxx> wrote on Tue, Jun 20, 2023 at 02:40:
>
> On Mon, Jun 19, 2023 at 04:51:56PM +0800, sunliming wrote:
> > Beau Belgrave <beaub@xxxxxxxxxxxxxxxxxxx> wrote on Sat, Jun 17, 2023 at 00:08:
> > >
> > > On Fri, Jun 09, 2023 at 11:03:00AM +0800, sunliming wrote:
> > > > The write operation returns the count of writes whether events are
> > > > enabled or disabled. This is incorrect when events are disabled. Fix
> > > > this by returning -ENOENT when events are disabled.
> > > >
> > >
> > > When testing this patch locally I found that we would occasionally get
> > > -ENOENT when events were enabled, but then become disabled, since writes
> > > do not have any locking around the tracepoint checks for performance
> > > reasons.
> > >
> > > I've asked a few peers of mine their thoughts on this, whether an error
> > > should result when there are no enabled events. The consensus I've heard
> > > back is that they would not consider this case an actual error, just as
> > > writing to /dev/null does not actually return an error.
> > >
> > > However, if you feel strongly we need this and have a good use case, it
> > > seems better to enable this logic behind a flag instead of having it
> > > default based on my conversations with others.
> > >
> > > Thanks,
> > > -Beau
> >
> >
> >
> > There is indeed a problem. Consider a write performed immediately
> > after the event is enabled.
> >
>
> The immediate write does work, and gets put into a buffer. The ftrace
> and perf self tests do the above case. So, no worries at this point.
>
> > Now, when the event is disabled, the trace record appears to be lost.
>
> I'm taking this to mean, if in between the time of the bit check and the
> actual write()/writev() syscall the event becomes disabled, the event
> won't write to the buffer. Yes, that is expected.
>
Yes, got it. Thank you for the explanation.
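
For reference, the check-then-write pattern described above can be
sketched roughly as below. This is only an illustration under
assumptions: emit_if_enabled() is a hypothetical helper name, the
enable word and bit stand in for whatever the process registered, and
the payload layout that a real user_events data file expects is
ignored.

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the user_events ABI: emit one
 * payload only if the enable word currently has the given bit set.
 *
 * Returns 0 if the event looked disabled (no syscall is made),
 * otherwise whatever write() returned.
 */
static ssize_t emit_if_enabled(int fd, const volatile uint32_t *enable_word,
			       unsigned int bit, const void *buf, size_t len)
{
	/* Unlocked check: the event may be disabled right after this test. */
	if (!(*enable_word & (UINT32_C(1) << bit)))
		return 0;

	/*
	 * If the event was disabled inside that window, the payload is
	 * simply not recorded. The caller should not retry in that case.
	 */
	return write(fd, buf, len);
}
```

With this shape, a caller that hits the window only pays for one
unnecessary syscall, which matches the "/dev/null" analogy above.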

> > In some situations where data timing is sensitive, it may cause
> > confusion. In this case, not returning an error (as mentioned in your
> > reply, this case is not considered an actual error) and returning 0
> > (meaning that 0 bytes were written) may be a good way to handle it?
>
> This is where I get a little lost. What would a user process do with a
> return of 0 bytes? It shouldn't retry, since it just hit that small
> timing window. In reality, it just incurred a temporary excessive
> syscall cost, but no real data loss (the operator/admin turned the event
> off).
>
> I'm missing why you feel it's important the user process know such a
> window was hit?
>
> Can you help me understand that?
>
I haven't encountered a specific scenario in which it is important for the
user process to know that such a window was hit. This may be a
misunderstanding on my part. When someone uses user events and checks an
event's output to confirm the execution status of a program, it may cause
confusion if someone else disables the event. This shouldn't be a serious
issue; this patch just makes things look better.

Thanks,
-Sunliming

> I do think returning 0 bytes is better than an error here, but I'd
> really like to know why the user process wants to know at all. Maybe
> they have user-space only logging and want to be able to mark there if
> it's in both spots (kernel and user buffers)?
>
> Thanks,
> -Beau
>
> > Thanks,
> > -Sunliming
> >
> > >
> > > > Signed-off-by: sunliming <sunliming@xxxxxxxxxx>
> > > > ---
> > > >  kernel/trace/trace_events_user.c | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c
> > > > index 1ac5ba5685ed..92204bbe79da 100644
> > > > --- a/kernel/trace/trace_events_user.c
> > > > +++ b/kernel/trace/trace_events_user.c
> > > > @@ -1957,7 +1957,8 @@ static ssize_t user_events_write_core(struct file *file, struct iov_iter *i)
> > > >
> > > >               if (unlikely(faulted))
> > > >                       return -EFAULT;
> > > > -     }
> > > > +     } else
> > > > +             return -ENOENT;
> > > >
> > > >       return ret;
> > > >  }
> > > > --
> > > > 2.25.1