Batch buffer checks on Intel Haswell & OpenCL performance on Linux

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hey guys,

The impact of disabling batch buffer security for Intel Haswell
platform on Linux for OpenCL kernels that need local shared memory
access is massive.

On a native kernel with and without the patch, the performance boost
is massive, especially ijn the Lu decomposition benchmark.

Do the math for yourself here. Note that the latest result is labelled
"Asus GTX 860M" and its' at the bottom of the list.

The test labelled i7 4700HQ should be identical to the Asus GTX 860M,
it was simply mis-labelled in another previous test.

The patchwork is here:
https://01.org/beignet/downloads/linux-kernel-patch-hsw-support

The results of the benchmark is right here:
http://openbenchmarking.org/result/1505138-DE-1409234KH90

Regards,

Dennis.





On 6 May 2014 at 21:56,  <intel-gfx-request@xxxxxxxxxxxxxxxxxxxxx> wrote:
> Send Intel-gfx mailing list submissions to
>         intel-gfx@xxxxxxxxxxxxxxxxxxxxx
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> or, via email, send a message with subject or body 'help' to
>         intel-gfx-request@xxxxxxxxxxxxxxxxxxxxx
>
> You can reach the person managing the list at
>         intel-gfx-owner@xxxxxxxxxxxxxxxxxxxxx
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Intel-gfx digest..."
>
>
> Today's Topics:
>
>    1. Re: [PATCH] drm/i915: remove user GTT mappings early during
>       runtime suspend (Imre Deak)
>    2. Re: [IGT PATCH] intel_bios_reader: make edp block decode
>       match kernel (Damien Lespiau)
>    3. Re: [IGT PATCH] intel_bios_reader: make edp block decode
>       match kernel (Jani Nikula)
>    4. Re: [IGT PATCH] intel_bios_reader: make edp block decode
>       match kernel (Damien Lespiau)
>    5. Re: [PATCH 1/3] drm/i915: consider the source max DP      lane
>       count too (Jani Nikula)
>    6. Re: [RFC] drm/i915 : Reduce the shmem page allocation     time by
>       using blitter engines for clearing pages. (Eric Anholt)
>    7. Re: [Mesa-dev] [rong.r.yang@xxxxxxxxx: How user space
>       applications load registers on HSW?] (Kenneth Graunke)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 06 May 2014 17:46:01 +0300
> From: Imre Deak <imre.deak@xxxxxxxxx>
> To: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> Subject: Re:  [PATCH] drm/i915: remove user GTT mappings
>         early during runtime suspend
> Message-ID: <1399387561.19761.31.camel@intelbox>
> Content-Type: text/plain; charset="utf-8"
>
> On Tue, 2014-05-06 at 12:59 +0100, Chris Wilson wrote:
>> On Tue, May 06, 2014 at 02:42:26PM +0300, Imre Deak wrote:
>> > On Tue, 2014-05-06 at 12:40 +0100, Chris Wilson wrote:
>> > > On Tue, May 06, 2014 at 02:28:50PM +0300, Imre Deak wrote:
>> > > > Currently user space can access GEM buffers mapped to GTT through
>> > > > existing mappings concurrently while the platform specific suspend
>> > > > handlers are running.  Since these handlers may change the HW state in a
>> > > > way that would break such accesses, remove the mappings before calling
>> > > > the handlers.
>> > >
>> > > Hmm, but you never locked the device, so what is preventing those
>> > > concurrent accesses from refaulting in GTT entires anyway. Please explain
>> > > the context under which the runtime suspend code executes, and leave
>> > > that explanation within easy reach of intel_runtime_suspend() -
>> > > preferrably with testing of those assumptions.
>> >
>> > During faulting we take an RPM reference, so that avoids concurrent
>> > re-faults. I could add a comment about this to the code.
>>
>> You are still iterating over lists that are not safe, right? Or are
>> there more magic rpm references that prevent ioctls whilst
>> intel_runtime_suspend is being called?
>
> Tbh I haven't checked this, since moving the unmapping earlier doesn't
> make a difference in this regard.
>
> But it's a good point, I tried to audit now those paths. Currently the
> assumption is that we hold an RPM reference everywhere where those lists
> are changed. On the exec buffer path this is true, but for example in
> i915_drop_caches_set() it's not.
>
> We could fix this by taking struct_mutex around
> i915_gem_release_all_mmaps() in intel_runtime_suspend(). Doing that
> needs some more auditing to make sure we can't deadlock somehow. At
> first glance it seems that the driver always schedules a deferred work
> and calls intel_runtime_suspend() from that, so I think it's fine.
>
> I suggest applying this patch regardless since the two issues are
> separate.
>
> --Imre
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: signature.asc
> Type: application/pgp-signature
> Size: 490 bytes
> Desc: This is a digitally signed message part
> URL: <http://lists.freedesktop.org/archives/intel-gfx/attachments/20140506/4700db0e/attachment-0001.sig>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 6 May 2014 16:58:36 +0100
> From: Damien Lespiau <damien.lespiau@xxxxxxxxx>
> To: Jani Nikula <jani.nikula@xxxxxxxxx>
> Cc: intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> Subject: Re:  [IGT PATCH] intel_bios_reader: make edp block
>         decode match kernel
> Message-ID: <20140506155836.GA8012@xxxxxxxxxxxxxxxxxxxxxxxxxx>
> Content-Type: text/plain; charset=us-ascii
>
> On Tue, May 06, 2014 at 02:27:23PM +0300, Jani Nikula wrote:
>> All the somewhat recent VBT specs and the kernel have different format
>> for the eDP block than what the tool decodes. What the tool does *may*
>> be correct for really old VBT, but I have no specs or other reference to
>> suppor this. Just do what the kernel does, that's what we're interested
>> in anyway.
>>
>> Signed-off-by: Jani Nikula <jani.nikula@xxxxxxxxx>
>
> Reviewed-by: Damien Lespiau <damien.lespiau@xxxxxxxxx>
>
> You may want to add checks on the VBT revision that introduce the
> different parameters.
>
> We're also missing the VswingPreEmphasisValue fields from rev. 173.
>
> --
> Damien
>
>> ---
>>  tools/intel_bios.h | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/intel_bios.h b/tools/intel_bios.h
>> index 128502039c66..832c580dd084 100644
>> --- a/tools/intel_bios.h
>> +++ b/tools/intel_bios.h
>> @@ -590,8 +590,11 @@ struct edp_link_params {
>>  struct bdb_edp {
>>       struct edp_power_seq power_seqs[16];
>>       uint32_t color_depth;
>> -     uint32_t sdrrs_msa_timing_delay;
>>       struct edp_link_params link_params[16];
>> +     uint32_t sdrrs_msa_timing_delay;
>> +
>> +     uint16_t edp_s3d_feature;
>> +     uint16_t edp_t3_optimization;
>>  } __attribute__ ((packed));
>>
>>  /*
>> --
>> 1.9.1
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
>
> ------------------------------
>
> Message: 3
> Date: Tue, 06 May 2014 19:04:21 +0300
> From: Jani Nikula <jani.nikula@xxxxxxxxx>
> To: Damien Lespiau <damien.lespiau@xxxxxxxxx>
> Cc: intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> Subject: Re:  [IGT PATCH] intel_bios_reader: make edp block
>         decode match kernel
> Message-ID: <87wqdyki62.fsf@xxxxxxxxx>
> Content-Type: text/plain
>
> On Tue, 06 May 2014, Damien Lespiau <damien.lespiau@xxxxxxxxx> wrote:
>> On Tue, May 06, 2014 at 02:27:23PM +0300, Jani Nikula wrote:
>>> All the somewhat recent VBT specs and the kernel have different format
>>> for the eDP block than what the tool decodes. What the tool does *may*
>>> be correct for really old VBT, but I have no specs or other reference to
>>> suppor this. Just do what the kernel does, that's what we're interested
>>> in anyway.
>>>
>>> Signed-off-by: Jani Nikula <jani.nikula@xxxxxxxxx>
>>
>> Reviewed-by: Damien Lespiau <damien.lespiau@xxxxxxxxx>
>>
>> You may want to add checks on the VBT revision that introduce the
>> different parameters.
>
> I *wanted* to. But the versions in the spec apparently get changed also
> when that part about the spec actually hasn't changed. At least my
> 140-something spec had some of the same stuff that was indicated as
> having been added at 150-something. :(
>
>> We're also missing the VswingPreEmphasisValue fields from rev. 173.
>
> Yup, need to be added later (when we actually start using those).
>
> Ok to push?
>
> BR,
> Jani.
>
>>
>> --
>> Damien
>>
>>> ---
>>>  tools/intel_bios.h | 5 ++++-
>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tools/intel_bios.h b/tools/intel_bios.h
>>> index 128502039c66..832c580dd084 100644
>>> --- a/tools/intel_bios.h
>>> +++ b/tools/intel_bios.h
>>> @@ -590,8 +590,11 @@ struct edp_link_params {
>>>  struct bdb_edp {
>>>      struct edp_power_seq power_seqs[16];
>>>      uint32_t color_depth;
>>> -    uint32_t sdrrs_msa_timing_delay;
>>>      struct edp_link_params link_params[16];
>>> +    uint32_t sdrrs_msa_timing_delay;
>>> +
>>> +    uint16_t edp_s3d_feature;
>>> +    uint16_t edp_t3_optimization;
>>>  } __attribute__ ((packed));
>>>
>>>  /*
>>> --
>>> 1.9.1
>>>
>>> _______________________________________________
>>> Intel-gfx mailing list
>>> Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
>>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
> --
> Jani Nikula, Intel Open Source Technology Center
>
>
> ------------------------------
>
> Message: 4
> Date: Tue, 6 May 2014 17:06:50 +0100
> From: Damien Lespiau <damien.lespiau@xxxxxxxxx>
> To: Jani Nikula <jani.nikula@xxxxxxxxx>
> Cc: intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> Subject: Re:  [IGT PATCH] intel_bios_reader: make edp block
>         decode match kernel
> Message-ID: <20140506160650.GB8012@xxxxxxxxxxxxxxxxxxxxxxxxxx>
> Content-Type: text/plain; charset=us-ascii
>
> On Tue, May 06, 2014 at 07:04:21PM +0300, Jani Nikula wrote:
>> On Tue, 06 May 2014, Damien Lespiau <damien.lespiau@xxxxxxxxx> wrote:
>> > On Tue, May 06, 2014 at 02:27:23PM +0300, Jani Nikula wrote:
>> >> All the somewhat recent VBT specs and the kernel have different format
>> >> for the eDP block than what the tool decodes. What the tool does *may*
>> >> be correct for really old VBT, but I have no specs or other reference to
>> >> suppor this. Just do what the kernel does, that's what we're interested
>> >> in anyway.
>> >>
>> >> Signed-off-by: Jani Nikula <jani.nikula@xxxxxxxxx>
>> >
>> > Reviewed-by: Damien Lespiau <damien.lespiau@xxxxxxxxx>
>> >
>> > You may want to add checks on the VBT revision that introduce the
>> > different parameters.
>>
>> I *wanted* to. But the versions in the spec apparently get changed also
>> when that part about the spec actually hasn't changed. At least my
>> 140-something spec had some of the same stuff that was indicated as
>> having been added at 150-something. :(
>>
>> > We're also missing the VswingPreEmphasisValue fields from rev. 173.
>>
>> Yup, need to be added later (when we actually start using those).
>>
>> Ok to push?
>
> Sure!
>
> --
> Damien
>
>
> ------------------------------
>
> Message: 5
> Date: Tue, 06 May 2014 19:59:45 +0300
> From: Jani Nikula <jani.nikula@xxxxxxxxx>
> To: Damien Lespiau <damien.lespiau@xxxxxxxxx>
> Cc: intel-gfx@xxxxxxxxxxxxxxxxxxxxx, Paulo Zanoni
>         <paulo.r.zanoni@xxxxxxxxx>
> Subject: Re:  [PATCH 1/3] drm/i915: consider the source max
>         DP      lane count too
> Message-ID: <87tx92kflq.fsf@xxxxxxxxx>
> Content-Type: text/plain
>
> On Tue, 06 May 2014, Damien Lespiau <damien.lespiau@xxxxxxxxx> wrote:
>> On Tue, May 06, 2014 at 02:56:50PM +0300, Jani Nikula wrote:
>>> From: Paulo Zanoni <paulo.r.zanoni@xxxxxxxxx>
>>>
>>> Even if the panel claims it can support 4 lanes, there's the
>>> possibility that the HW can't, so consider this while selecting the
>>> max lane count.
>>>
>>> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@xxxxxxxxx>
>>> Signed-off-by: Jani Nikula <jani.nikula@xxxxxxxxx>
>>
>> Reviewed-by: Damien Lespiau <damien.lespiau@xxxxxxxxx>
>
> Series pushed to -fixes, thanks for the swift review!
>
> BR,
> Jani.
>
>>
>> --
>> Damien
>>
>>> ---
>>>  drivers/gpu/drm/i915/intel_dp.c | 20 ++++++++++++++++++--
>>>  1 file changed, 18 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
>>> index 87b0a515d7a5..3dde9076d9d7 100644
>>> --- a/drivers/gpu/drm/i915/intel_dp.c
>>> +++ b/drivers/gpu/drm/i915/intel_dp.c
>>> @@ -121,6 +121,22 @@ intel_dp_max_link_bw(struct intel_dp *intel_dp)
>>>      return max_link_bw;
>>>  }
>>>
>>> +static u8 intel_dp_max_lane_count(struct intel_dp *intel_dp)
>>> +{
>>> +    struct intel_digital_port *intel_dig_port = dp_to_dig_port(intel_dp);
>>> +    struct drm_device *dev = intel_dig_port->base.base.dev;
>>> +    u8 source_max, sink_max;
>>> +
>>> +    source_max = 4;
>>> +    if (HAS_DDI(dev) && intel_dig_port->port == PORT_A &&
>>> +        (intel_dig_port->saved_port_bits & DDI_A_4_LANES) == 0)
>>> +            source_max = 2;
>>> +
>>> +    sink_max = drm_dp_max_lane_count(intel_dp->dpcd);
>>> +
>>> +    return min(source_max, sink_max);
>>> +}
>>> +
>>>  /*
>>>   * The units on the numbers in the next two are... bizarre.  Examples will
>>>   * make it clearer; this one parallels an example in the eDP spec.
>>> @@ -171,7 +187,7 @@ intel_dp_mode_valid(struct drm_connector *connector,
>>>      }
>>>
>>>      max_link_clock = drm_dp_bw_code_to_link_rate(intel_dp_max_link_bw(intel_dp));
>>> -    max_lanes = drm_dp_max_lane_count(intel_dp->dpcd);
>>> +    max_lanes = intel_dp_max_lane_count(intel_dp);
>>>
>>>      max_rate = intel_dp_max_data_rate(max_link_clock, max_lanes);
>>>      mode_rate = intel_dp_link_required(target_clock, 18);
>>> @@ -769,7 +785,7 @@ intel_dp_compute_config(struct intel_encoder *encoder,
>>>      struct intel_crtc *intel_crtc = encoder->new_crtc;
>>>      struct intel_connector *intel_connector = intel_dp->attached_connector;
>>>      int lane_count, clock;
>>> -    int max_lane_count = drm_dp_max_lane_count(intel_dp->dpcd);
>>> +    int max_lane_count = intel_dp_max_lane_count(intel_dp);
>>>      /* Conveniently, the link BW constants become indices with a shift...*/
>>>      int max_clock = intel_dp_max_link_bw(intel_dp) >> 3;
>>>      int bpp, mode_rate;
>>> --
>>> 1.9.1
>>>
>>> _______________________________________________
>>> Intel-gfx mailing list
>>> Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
>>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
> --
> Jani Nikula, Intel Open Source Technology Center
>
>
> ------------------------------
>
> Message: 6
> Date: Tue, 06 May 2014 10:56:21 -0700
> From: Eric Anholt <eric@xxxxxxxxxx>
> To: sourab.gupta@xxxxxxxxx, intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Akash Goel <akash.goel@xxxxxxxxx>, Sourab Gupta
>         <sourab.gupta@xxxxxxxxx>
> Subject: Re:  [RFC] drm/i915 : Reduce the shmem page
>         allocation      time by using blitter engines for clearing pages.
> Message-ID: <8761li94fu.fsf@xxxxxxxxxxxxxxxxxx>
> Content-Type: text/plain; charset="us-ascii"
>
> sourab.gupta@xxxxxxxxx writes:
>
>> From: Sourab Gupta <sourab.gupta@xxxxxxxxx>
>>
>> This patch is in continuation of and is dependent on earlier patch
>> series to 'reduce the time for which device mutex is kept locked'.
>> (http://lists.freedesktop.org/archives/intel-gfx/2014-May/044596.html)
>
> One of userspace's assumptions is that when you allocate a new BO, you
> can map it and start writing data into it without needing to wait on the
> GPU.  I expect this patch to mostly hurt performance on apps (and I note
> that the patch doesn't come with any actual performance data) that get
> more stalls as a result.
>
> More importantly, though, it breaks existing userspace that relies on
> buffers being idle on allocation, for the unsychronized maps used in
> intel_bufferobj_subdata() and
> intel_bufferobj_map_range(GL_INVALIDATE_BUFFER_BIT |
> GL_UNSYNCHRONIZED_BIT)
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: not available
> Type: application/pgp-signature
> Size: 818 bytes
> Desc: not available
> URL: <http://lists.freedesktop.org/archives/intel-gfx/attachments/20140506/d54fdc6c/attachment-0001.sig>
>
> ------------------------------
>
> Message: 7
> Date: Tue, 06 May 2014 11:57:24 -0700
> From: Kenneth Graunke <kenneth@xxxxxxxxxxxxx>
> To: rong.r.yang@xxxxxxxxx
> Cc: mesa-dev <mesa-dev@xxxxxxxxxxxxxxxxxxxxx>, Ben Widawsky
>         <ben@xxxxxxxxxxxx>, intel-gfx <intel-gfx@xxxxxxxxxxxxxxxxxxxxx>
> Subject: Re:  [Mesa-dev] [rong.r.yang@xxxxxxxxx: How user
>         space applications load registers on HSW?]
> Message-ID: <53693094.5060504@xxxxxxxxxxxxx>
> Content-Type: text/plain; charset="windows-1252"
>
> On 05/06/2014 08:26:15 AM, Yang, Rong R wrote:
>> Hi,
>>
>> I am developing the HSW?s OCL driver in the linux. I encounter a LRI
>> problem on HSW.
>>
>>
>> Some gpgpu's applications, which use the shared local memory, must load
>> the L3CTRLREG2 and L3CTRLREG3 registers to allocate the SLM in the L3
>> cache.
>>
>> So I add L3CTRLREG2 and L3CTRLREG3 to the gen7_render_regs to pass the
>> cmds parse when exec buffer. But it still don?t work.
>>
>> I notice that, on HSW, the commands that load the register, such as
>> MI_LOAD_REGISTER_IMM, will be converted to NOOP by the GPU if the batch
>> buffer's MI_BATCH_NON_SECURE_HSW bit is set. And after parse cmd, the
>> MI_BATCH_NON_SECURE_HSW still set in the kernel. So HSW don?t accept
>> LRI commands.
>>
>>
>> Can I load these registers in the user space? Or should I hack the
>> kernel?
>>
>>
>> Yang Rong
>
> I've been asking the kernel developers for the ability to LRI/LRM from
> userspace batches for around 1.5 years.  Unfortunately, we're still
> waiting, and I honestly have no idea when they're going to finish it.
>
> In the meantime, you can apply the attached patch to your kernel tree to
> disable the hardware scanner, letting you run whatever commands you
> want.  Obviously, we can't ship this on production systems, but it will
> allow you to do your development without having to wait for the kernel team.
>
> --Ken
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: disable-batchbuffer-security.patch
> Type: text/x-patch
> Size: 485 bytes
> Desc: not available
> URL: <http://lists.freedesktop.org/archives/intel-gfx/attachments/20140506/a4b2f40d/attachment.bin>
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: signature.asc
> Type: application/pgp-signature
> Size: 836 bytes
> Desc: OpenPGP digital signature
> URL: <http://lists.freedesktop.org/archives/intel-gfx/attachments/20140506/a4b2f40d/attachment.sig>
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
>
> ------------------------------
>
> End of Intel-gfx Digest, Vol 76, Issue 29
> *****************************************
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx





[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux