On Tue, 14 Apr 2020 12:09:42 -0700, Dixit, Ashutosh wrote:
>
> On Tue, 14 Apr 2020 09:59:48 -0700, Umesh Nerlige Ramappa wrote:
> > On Mon, Apr 13, 2020 at 11:58:18PM -0700, Dixit, Ashutosh wrote:
> > > On Mon, 13 Apr 2020 22:08:47 -0700, Dixit, Ashutosh wrote:
> > >>
> > >> On Mon, 13 Apr 2020 08:48:20 -0700, Umesh Nerlige Ramappa wrote:
> > >> >
> > >> > A condition in wait_event_interruptible seems to be checked twice before
> > >> > waiting on the event to occur. These checks are redundant when hrtimer
> > >> > events will call oa_buffer_check_unlocked to update the oa_buffer tail
> > >> > pointers. The redundant checks add cpu overhead. Simplify the check
> > >> > to reduce cpu overhead when using blocking io to read oa buffer reports.
> > >> >
> > >> > Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@xxxxxxxxx>
> > >> > ---
> > >> >  drivers/gpu/drm/i915/i915_perf.c | 28 +++++++++++++++++++++++++++-
> > >> >  1 file changed, 27 insertions(+), 1 deletion(-)
> > >> >
> > >> > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> > >> > index 5cde3e4e7be6..e28a3ab83fde 100644
> > >> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > >> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > >> > @@ -541,6 +541,32 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
> > >> >  	return pollin;
> > >> >  }
> > >> >
> > >> > +/**
> > >> > + * oa_buffer_check_reports - quick check if reports are available
> > >> > + * @stream: i915 stream instance
> > >> > + *
> > >> > + * The return from this function is used as a condition for
> > >> > + * wait_event_interruptible in blocking read. This is used to detect
> > >> > + * available reports.
> > >> > + *
> > >> > + * A condition in wait_event_interruptible seems to be checked twice before
> > >> > + * waiting on an event to occur. These checks are redundant when hrtimer events
> > >> > + * will call oa_buffer_check_unlocked to update the oa_buffer tail pointers.
> > >> > + * The redundant checks add cpu overhead. We simplify the check to reduce cpu
> > >> > + * overhead.
> > >> > + */
> > >> > +static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
> > >> > +{
> > >> > +	unsigned long flags;
> > >> > +	bool available;
> > >> > +
> > >> > +	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
> > >> > +	available = stream->oa_buffer.tail != stream->oa_buffer.head;
> > >> > +	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
> > >> > +
> > >> > +	return available;
> > >> > +}
> > >> > +
> > >> >  /**
> > >> >   * append_oa_status - Appends a status record to a userspace read() buffer.
> > >> >   * @stream: An i915-perf stream opened for OA metrics
> > >> > @@ -1150,7 +1176,7 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
> > >> >  		return -EIO;
> > >> >
> > >> >  	return wait_event_interruptible(stream->poll_wq,
> > >> > -					oa_buffer_check_unlocked(stream));
> > >> > +					oa_buffer_check_reports(stream));
> > >> >  }
> > >>
> > >> I think the problem with this patch is that the original code had the
> > >> property that the condition for data availability is checked (and the OA
> > >> tail advanced) /before/ entering the wait. So the tail was advanced and if
> > >> there was data there was no need to wait at all. This change has lost that
> > >> property. With this change we will first always enter the wait and then get
> > >> unblocked after the timer interrupt which will advance the tail and wake us
> > >> up.
> > >>
> > >> I think if we want to do this, it is possible to do without losing the
> > >> original property. Just call oa_buffer_check_unlocked() first (outside
> > >> wait_event) and if there is data just return. If not, put yourself on
> > >> stream->poll_wq from which the timer callback will wake us up. I think this
> > >> is going to be something like the prepare_to_wait() / schedule() /
> > >> finish_wait() pattern, of which there are numerous examples in the
> > >> kernel.
> > >> So in this case we won't call wait_event_interruptible(), which
> > >> checks the condition upon waking up. No need to define
> > >> oa_buffer_check_reports() either.
> > >
> > > If on the other hand we say that this should actually be the desired
> > > behavior of the blocking read, that it should not unblock immediately but
> > > only after the next timer interrupt (so that an app can call the blocking
> > > read repeatedly without worrying that it will return a little bit of data
> > > in each call at a considerable amount of CPU load), then we may be able to
> > > do something like this, but even then we may have to think about what would
> > > be the correct way to do that. Though this may be that correct way and we
> > > may just need to change the commit message, but is that what we want?
> >
> > I am not quite clear on how the blocking read should behave in terms of the
> > API itself.
> >
> > The change here is solely to reduce cpu overhead from the additional 2
> > calls to oa_buffer_check_unlocked before blocking, and that would happen on
> > every call to the read. I agree that the read would block if the timer did
> > not update the tail yet, but that makes sense in a way that we don't want
> > read to constantly return data when the OA sampling rates are high (say 100
> > us). With this change the behavior becomes consistent irrespective of the
> > OA sampling rate and the user has the flexibility to set POLL_OA_PERIOD to
> > whatever value is suitable for the OA sampling rate chosen.
>
> I think the patch commit message should clearly state the reason for the
> patch. Afaics if the purpose of the patch is to reduce cpu overhead, the
> patch should not change API behavior. If on the other hand the purpose of
> the patch is to make the API behavior consistent, then please resubmit the
> patch after modifying the commit message to reflect the purpose of the
> patch so that the patch can be reviewed from that point of view.
>
> Also I think we should be able to just use stream->pollin instead of
> defining the new oa_buffer_check_reports(). E.g. isn't
> stream->pollin == (stream->oa_buffer.tail != stream->oa_buffer.head) ?

Actually yes, using stream->pollin will make the wait for both blocking and
non-blocking reads identical. So the blocking wait just becomes:

static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
{
	...

	return wait_event_interruptible(stream->poll_wq, stream->pollin);
}

That is, if we re-purpose the patch to make the API behavior consistent
across blocking and non-blocking reads.

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx