On Wed, Aug 2, 2017 at 2:28 AM, Lionel Landwerlin <lionel.g.landwerlin@xxxxxxxxx> wrote:
Hi Sourab,On 01/08/17 19:05, sourab gupta wrote:
On Tue, Aug 1, 2017 at 2:59 PM, Kamble, Sagar A <sagar.a.kamble@xxxxxxxxx> wrote:
<Sagar> I will try to make this as just binary state. This enum is defining the state of the stream. I too got confused with purpose of IN_PROGRESS.
-----Original Message-----
From: Landwerlin, Lionel G
Sent: Monday, July 31, 2017 9:16 PM
To: Kamble, Sagar A <sagar.a.kamble@xxxxxxxxx>; intel-gfx@xxxxxxxxxxxxxxxxxxxxg
Cc: Sourab Gupta <sourab.gupta@xxxxxxxxx>
Subject: Re: [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.
On 31/07/17 08:59, Sagar Arun Kamble wrote:
> From: Sourab Gupta <sourab.gupta@xxxxxxxxx>
>
> This patch introduces a framework to capture OA counter reports associated
> with Render command stream. We can then associate the reports captured
> through this mechanism with their corresponding context id's. This can be
> further extended to associate any other metadata information with the
> corresponding samples (since the association with Render command stream
> gives us the ability to capture these information while inserting the
> corresponding capture commands into the command stream).
>
> The OA reports generated in this way are associated with a corresponding
> workload, and thus can be used the delimit the workload (i.e. sample the
> counters at the workload boundaries), within an ongoing stream of periodic
> counter snapshots.
>
> There may be usecases wherein we need more than periodic OA capture mode
> which is supported currently. This mode is primarily used for two usecases:
> - Ability to capture system wide metrics, alongwith the ability to map
> the reports back to individual contexts (particularly for HSW).
> - Ability to inject tags for work, into the reports. This provides
> visibility into the multiple stages of work within single context.
>
> The userspace will be able to distinguish between the periodic and CS based
> OA reports by the virtue of source_info sample field.
>
> The command MI_REPORT_PERF_COUNT can be used to capture snapshots of OA
> counters, and is inserted at BB boundaries.
> The data thus captured will be stored in a separate buffer, which will
> be different from the buffer used otherwise for periodic OA capture mode.
> The metadata information pertaining to snapshot is maintained in a list,
> which also has offsets into the gem buffer object per captured snapshot.
> In order to track whether the gpu has completed processing the node,
> a field pertaining to corresponding gem request is added, which is tracked
> for completion of the command.
>
> Both periodic and CS based reports are associated with a single stream
> (corresponding to render engine), and it is expected to have the samples
> in the sequential order according to their timestamps. Now, since these
> reports are collected in separate buffers, these are merge sorted at the
> time of forwarding to userspace during the read call.
>
> v2: Aligning with the non-perf interface (custom drm ioctl based). Also,
> few related patches are squashed together for better readability
>
> v3: Updated perf sample capture emit hook name. Reserving space upfront
> in the ring for emitting sample capture commands and using
> req->fence.seqno for tracking samples. Added SRCU protection for streams.
> Changed the stream last_request tracking to resv object. (Chris)
> Updated perf.sample_lock spin_lock usage to avoid softlockups. Moved
> stream to global per-engine structure. (Sagar)
> Update unpin and put in the free routines to i915_vma_unpin_and_release.
> Making use of perf stream cs_buffer vma resv instead of separate resv obj.
> Pruned perf stream vma resv during gem_idle. (Chris)
> Changed payload field ctx_id to u64 to keep all sample data aligned at 8
> bytes. (Lionel)
> stall/flush prior to sample capture is not added. Do we need to give this
> control to user to select whether to stall/flush at each sample?
>
> Signed-off-by: Sourab Gupta <sourab.gupta@xxxxxxxxx>
> Signed-off-by: Robert Bragg <robert@xxxxxxxxxxxxx>
> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@xxxxxxxxx>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 101 ++-
> drivers/gpu/drm/i915/i915_gem.c | 1 +
> drivers/gpu/drm/i915/i915_gem_execbuffer.c | 8 +
> drivers/gpu/drm/i915/i915_perf.c | 1185 ++++++++++++++++++++++------
> drivers/gpu/drm/i915/intel_engine_cs.c | 4 +
> drivers/gpu/drm/i915/intel_ringbuffer.c | 2 +
> drivers/gpu/drm/i915/intel_ringbuffer.h | 5 +
> include/uapi/drm/i915_drm.h | 15 +
> 8 files changed, 1073 insertions(+), 248 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_dr v.h
> index 2c7456f..8b1cecf 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1985,6 +1985,24 @@ struct i915_perf_stream_ops {
> * The stream will always be disabled before this is called.
> */
> void (*destroy)(struct i915_perf_stream *stream);
> +
> + /*
> + * @emit_sample_capture: Emit the commands in the command streamer
> + * for a particular gpu engine.
> + *
> + * The commands are inserted to capture the perf sample data at
> + * specific points during workload execution, such as before and after
> + * the batch buffer.
> + */
> + void (*emit_sample_capture)(struct i915_perf_stream *stream,
> + struct drm_i915_gem_request *request,
> + bool preallocate);
> +};
> +
It seems the motivation for this following enum is mostly to deal with
the fact that engine->perf_srcu is set before the OA unit is configured.
Would it possible to set it later so that we get rid of the enum?
SRCU is used for synchronizing stream state check.
IN_PROGRESS will enable us to not advertently try to access the stream vma for inserting the samples, but I guess depending on disabled/enabled should
suffice.
Hi Sagar/Lionel,
Thanks again for your input on this.
The purpose of the tristate was to workaround a particular kludge ofworking with just enabled/disabled boolean state. I'll explain below.
Let's say we have only boolean state.i915_perf_emit_sample_capture() function would depend on stream->enabled in order to insert the MI_RPC command in RCS.If you see i915_perf_enable_locked(), stream->enabled is set beforestream->ops->enable(). The stream->ops->enable() function actuallyenables the OA hardware to capture reports, and if MI_RPC commandsare submitted before OA hw is enabled, it may hang the gpu.
Do you remember if this is documented anywhere?
I couldn't find anything in the MI_RPC instruction.
Sorry, I don't happen to remember any documentation. Probably, you can check this out by submitting MI_RPC without enabling OA.
Also, we can't change the order of calling these operations insidei915_perf_enable_locked() since gen7_update_oacontrol_locked() function depends on stream->enabled flag to enable the OAhw unit (i.e. it needs the flag to be true).
We can probably work around that by passing some arguments.
To workaround this problem, I introduced a tristate here.If you can suggest some alternate solution to this problem,we can remove this tristate kludge here.Regards,Sourab
_______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx