Re: [PATCH 2/3] drm/i915/perf: Add support for report sizes that are not power of 2

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Sep 16, 2019 at 09:11:58PM -0700, Ashutosh Dixit wrote:
On Mon, 16 Sep 2019 12:17:54 -0700, Umesh Nerlige Ramappa wrote:

On Sun, Sep 15, 2019 at 02:24:41PM +0300, Lionel Landwerlin wrote:
> On 14/09/2019 02:06, Umesh Nerlige Ramappa wrote:
>> OA perf unit supports non-power of 2 report sizes. Enable support for
>> these sizes in the driver.
>>
>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@xxxxxxxxx>
>> ---
>>  drivers/gpu/drm/i915/i915_perf.c | 59 ++++++++++++--------------------
>>  1 file changed, 21 insertions(+), 38 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> index 50b6d154fd46..482fca3da7de 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -450,7 +450,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>>	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
>>	int report_size = stream->oa_buffer.format_size;
>>	unsigned long flags;
>> -	u32 hw_tail;
>> +	u32 hw_tail, aging_tail;
>>	u64 now;
>>	/* We have to consider the (unlikely) possibility that read() errors
>> @@ -459,16 +459,17 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>>	 */
>>	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>> -	hw_tail = dev_priv->perf.ops.oa_hw_tail_read(stream);
>> +	hw_tail = dev_priv->perf.ops.oa_hw_tail_read(stream) - gtt_offset;
>> +	aging_tail = stream->oa_buffer.aging_tail - gtt_offset;
>>	/* The tail pointer increases in 64 byte increments,
>>	 * not in report_size steps...
>>	 */
>> -	hw_tail &= ~(report_size - 1);
>> +	hw_tail = OA_TAKEN(hw_tail, (OA_TAKEN(hw_tail, aging_tail) % report_size));
>
>
> I'm struggling to parse this line above and I'm not 100% sure it's correct.
>
> Could add a comment to explain what is going on?

The aging tail is always pointing to the boundary of a report whereas
the hw_tail is advancing in 64 byte increments.

The innermost OA_TAKEN is returning the number of bytes between the
hw_tail and the aging_tail. The modulo is getting the size of the
partial report (if any).

The outermost OA_TAKEN is subtracting the size of partial report from
the hw_tail to get a hw_tail that points to the boundary of the last
full report.

The value of hw_tail would be the same as from the deleted line of code
above this line.

Looks like aging_tail should not be needed (it is complicating the
expression and is not there in the original expression). All we need is a
"circular modulus". For example would the following work?

original expression assumes all report sizes are powers of 2 and hence does not need a reference (like aging_tail) to rounddown the hw_tail.


   if (hw_tail < report_size)
       hw_tail += OA_BUFFER_SIZE;

Assuming that this condition is detecting a report split across the OA buffer boundary, the above check will not catch the split in cases where there are multiple reports generated before we read the hw_tail.

   hw_tail = rounddown(hw_tail, report_size);

Another way (if we want to avoid the division) would be to maintain a
cached copy of hw_tail in SW which is successively (and circularly)
incremented by report_size till it catches up with hw_tail read from
HW. But probably the first method above is simpler.

aging_tail is a cached copy of the hw_tail that advances only in report size increments.

Thanks,
Umesh
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux