On Sat, 21 Mar 2020 16:26:42 -0700, Dixit, Ashutosh wrote: > > On Thu, 19 Mar 2020 15:52:01 -0700, Umesh Nerlige Ramappa wrote: > > > > From: Lionel Landwerlin <lionel.g.landwerlin@xxxxxxxxx> > > > @@ -477,16 +468,6 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream) > > */ > > spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags); > > > > hw_tail = stream->perf->ops.oa_hw_tail_read(stream); > > > > hw_tail &= ~(report_size - 1); > > > > @@ -496,64 +477,64 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream) > > > > now = ktime_get_mono_fast_ns(); > > > > + if (hw_tail == stream->oa_buffer.aging_tail && > > + (now - stream->oa_buffer.aging_timestamp) > OA_TAIL_MARGIN_NSEC) { > > + /* If the HW tail hasn't move since the last check and the HW > > + * tail has been aging for long enough, declare it the new > > + * tail. > > + */ > > + stream->oa_buffer.tail = stream->oa_buffer.aging_tail; > > + } else { > > + u32 head, tail; > > > > + /* NB: The head we observe here might effectively be a little > > + * out of date. If a read() is in progress, the head could be > > + * anywhere between this head and stream->oa_buffer.tail. > > + */ > > + head = stream->oa_buffer.head - gtt_offset; > > > > + hw_tail -= gtt_offset; > > + tail = hw_tail; > > > > + /* Walk the stream backward until we find a report with dword 0 > > + * & 1 not at 0. Since the circular buffer pointers progress by > > + * increments of 64 bytes and that reports can be up to 256 > > + * bytes long, we can't tell whether a report has fully landed > > + * in memory before the first 2 dwords of the following report > > + * have effectively landed. > > + * > > + * This is assuming that the writes of the OA unit land in > > + * memory in the order they were written to. > > + * If not : (╯°□°)╯︵ ┻━┻ > > */ > > + while (OA_TAKEN(tail, head) >= report_size) { > > + u32 previous_tail = (tail - report_size) & (OA_BUFFER_SIZE - 1); > > + u32 *report32 = (void *)(stream->oa_buffer.vaddr + previous_tail); > > Sorry, this is wrong. This should just be: > > tail = (tail - report_size) & (OA_BUFFER_SIZE - 1); > report32 = (void *)(stream->oa_buffer.vaddr + tail); > > Otherwise when we break out of the loop below tail is still set one > report_size ahead. previous_tail is not needed. (In the previous version of > the patch this used to work out correctly). > > > + > > + /* Head of the report indicated by the HW tail register has > > + * indeed landed into memory. > > + */ > > + if (report32[0] != 0 || report32[1] != 0) > > + break; > > + > > + tail = previous_tail; > > } Actually a couple of further improvements to the loop above are possible. First there is no reason to start at previous_tail, we can just start at the aligned hw_tail itself. Therefore the loop becomes: while (OA_TAKEN(tail, head) >= report_size) { u32 *report32 = (void *)(stream->oa_buffer.vaddr + tail); if (report32[0] != 0 || report32[1] != 0) break; tail = (tail - report_size) & (OA_BUFFER_SIZE - 1); } Further, there is no reason to go back to the head but only to the old tail. Therefore: head = stream->oa_buffer.head - gtt_offset; old_tail = stream->oa_buffer.tail - gtt_offset; hw_tail -= gtt_offset; tail = hw_tail; while (OA_TAKEN(tail, old_tail) >= report_size) { u32 *report32 = (void *)(stream->oa_buffer.vaddr + tail); if (report32[0] != 0 || report32[1] != 0) break; tail = (tail - report_size) & (OA_BUFFER_SIZE - 1); } Please review and see if these two improvements are possible. Thanks! _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx