Re: [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 01/08/17 19:05, sourab gupta wrote:


On Tue, Aug 1, 2017 at 2:59 PM, Kamble, Sagar A <sagar.a.kamble@xxxxxxxxx> wrote:


-----Original Message-----
From: Landwerlin, Lionel G
Sent: Monday, July 31, 2017 9:16 PM
To: Kamble, Sagar A <sagar.a.kamble@xxxxxxxxx>; intel-gfx@lists.freedesktop.org
Cc: Sourab Gupta <sourab.gupta@xxxxxxxxx>
Subject: Re: [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.

On 31/07/17 08:59, Sagar Arun Kamble wrote:
> From: Sourab Gupta <sourab.gupta@xxxxxxxxx>
>
> This patch introduces a framework to capture OA counter reports associated
> with Render command stream. We can then associate the reports captured
> through this mechanism with their corresponding context id's. This can be
> further extended to associate any other metadata information with the
> corresponding samples (since the association with Render command stream
> gives us the ability to capture these information while inserting the
> corresponding capture commands into the command stream).
>
> The OA reports generated in this way are associated with a corresponding
> workload, and thus can be used the delimit the workload (i.e. sample the
> counters at the workload boundaries), within an ongoing stream of periodic
> counter snapshots.
>
> There may be usecases wherein we need more than periodic OA capture mode
> which is supported currently. This mode is primarily used for two usecases:
>      - Ability to capture system wide metrics, alongwith the ability to map
>        the reports back to individual contexts (particularly for HSW).
>      - Ability to inject tags for work, into the reports. This provides
>        visibility into the multiple stages of work within single context.
>
> The userspace will be able to distinguish between the periodic and CS based
> OA reports by the virtue of source_info sample field.
>
> The command MI_REPORT_PERF_COUNT can be used to capture snapshots of OA
> counters, and is inserted at BB boundaries.
> The data thus captured will be stored in a separate buffer, which will
> be different from the buffer used otherwise for periodic OA capture mode.
> The metadata information pertaining to snapshot is maintained in a list,
> which also has offsets into the gem buffer object per captured snapshot.
> In order to track whether the gpu has completed processing the node,
> a field pertaining to corresponding gem request is added, which is tracked
> for completion of the command.
>
> Both periodic and CS based reports are associated with a single stream
> (corresponding to render engine), and it is expected to have the samples
> in the sequential order according to their timestamps. Now, since these
> reports are collected in separate buffers, these are merge sorted at the
> time of forwarding to userspace during the read call.
>
> v2: Aligning with the non-perf interface (custom drm ioctl based). Also,
> few related patches are squashed together for better readability
>
> v3: Updated perf sample capture emit hook name. Reserving space upfront
> in the ring for emitting sample capture commands and using
> req->fence.seqno for tracking samples. Added SRCU protection for streams.
> Changed the stream last_request tracking to resv object. (Chris)
> Updated perf.sample_lock spin_lock usage to avoid softlockups. Moved
> stream to global per-engine structure. (Sagar)
> Update unpin and put in the free routines to i915_vma_unpin_and_release.
> Making use of perf stream cs_buffer vma resv instead of separate resv obj.
> Pruned perf stream vma resv during gem_idle. (Chris)
> Changed payload field ctx_id to u64 to keep all sample data aligned at 8
> bytes. (Lionel)
> stall/flush prior to sample capture is not added. Do we need to give this
> control to user to select whether to stall/flush at each sample?
>
> Signed-off-by: Sourab Gupta <sourab.gupta@xxxxxxxxx>
> Signed-off-by: Robert Bragg <robert@xxxxxxxxxxxxx>
> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@xxxxxxxxx>
> ---
>   drivers/gpu/drm/i915/i915_drv.h            |  101 ++-
>   drivers/gpu/drm/i915/i915_gem.c            |    1 +
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c |    8 +
>   drivers/gpu/drm/i915/i915_perf.c           | 1185 ++++++++++++++++++++++------
>   drivers/gpu/drm/i915/intel_engine_cs.c     |    4 +
>   drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +
>   drivers/gpu/drm/i915/intel_ringbuffer.h    |    5 +
>   include/uapi/drm/i915_drm.h                |   15 +
>   8 files changed, 1073 insertions(+), 248 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 2c7456f..8b1cecf 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1985,6 +1985,24 @@ struct i915_perf_stream_ops {
>        * The stream will always be disabled before this is called.
>        */
>       void (*destroy)(struct i915_perf_stream *stream);
> +
> +     /*
> +      * @emit_sample_capture: Emit the commands in the command streamer
> +      * for a particular gpu engine.
> +      *
> +      * The commands are inserted to capture the perf sample data at
> +      * specific points during workload execution, such as before and after
> +      * the batch buffer.
> +      */
> +     void (*emit_sample_capture)(struct i915_perf_stream *stream,
> +                                 struct drm_i915_gem_request *request,
> +                                 bool preallocate);
> +};
> +

It seems the motivation for this following enum is mostly to deal with
the fact that engine->perf_srcu is set before the OA unit is configured.
Would it possible to set it later so that we get rid of the enum?

<Sagar> I will try to make this as just binary state. This enum is defining the state of the stream. I too got confused with purpose of IN_PROGRESS.
SRCU is used for synchronizing stream state check.
IN_PROGRESS will enable us to not advertently try to access the stream vma for inserting the samples, but I guess depending on disabled/enabled should
suffice.

Hi Sagar/Lionel,

Hi Sourab,

Thanks again for your input on this.


The purpose of the tristate was to workaround a particular kludge of
working with just enabled/disabled boolean state. I'll explain below.

Let's say we have only boolean state.
i915_perf_emit_sample_capture() function would depend on
stream->enabled in order to insert the MI_RPC command in RCS.
If you see i915_perf_enable_locked(), stream->enabled is set before
stream->ops->enable(). The stream->ops->enable() function actually
enables the OA hardware to capture reports, and if MI_RPC commands
are submitted before OA hw is enabled, it may hang the gpu.

Do you remember if this is documented anywhere?
I couldn't find anything in the MI_RPC instruction.


Also, we can't change the order of calling these operations inside
i915_perf_enable_locked() since gen7_update_oacontrol_locked()
function depends on stream->enabled flag to enable the OA
hw unit (i.e. it needs the flag to be true).

We can probably work around that by passing some arguments.

To workaround this problem, I introduced a tristate here.
If you can suggest some alternate solution to this problem,
we can remove this tristate kludge here.
Regards,
Sourab



_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux