Re: [PATCH] Optimized division operation to shift operation

"Deucher, Alexander" <Alexander.Deucher@xxxxxxx> · Wed, 15 Apr 2020 14:08:06 +0000

[AMD Public Use]

I've gone ahead and dropped the patch.

Alex

From: Koenig, Christian <Christian.Koenig@xxxxxxx>

Sent: Wednesday, April 15, 2020 3:57 AM

To: Jani Nikula <jani.nikula@xxxxxxxxxxxxxxx>; Alex Deucher <alexdeucher@xxxxxxxxx>; Bernard Zhao <bernard@xxxxxxxx>

Cc: Sierra Guiza, Alejandro (Alex) <Alex.Sierra@xxxxxxx>; Zeng, Oak <Oak.Zeng@xxxxxxx>; Maling list - DRI developers <dri-devel@xxxxxxxxxxxxxxxxxxxxx>; David Airlie <airlied@xxxxxxxx>; Kuehling, Felix <Felix.Kuehling@xxxxxxx>; LKML <linux-kernel@xxxxxxxxxxxxxxx>;
 amd-gfx list <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>; kernel@xxxxxxxx <kernel@xxxxxxxx>; Huang, Ray <Ray.Huang@xxxxxxx>; Russell, Kent <Kent.Russell@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Sam Ravnborg <sam@xxxxxxxxxxxx>; Yuan, Xiaojie <Xiaojie.Yuan@xxxxxxx>

Subject: Re: [PATCH] Optimized division operation to shift operation

Am 15.04.20 um 09:41 schrieb Jani Nikula:

> On Tue, 14 Apr 2020, Alex Deucher <alexdeucher@xxxxxxxxx> wrote:

>> On Tue, Apr 14, 2020 at 9:05 AM Bernard Zhao <bernard@xxxxxxxx> wrote:

>>> On some processors, the / operate will call the compiler`s div lib,

>>> which is low efficient, We can replace the / operation with shift,

>>> so that we can replace the call of the division library with one

>>> shift assembly instruction.

> This was applied already, and it's not in a driver I look after... but

> to me this feels like something that really should be

> justified. Using >> instead of / for multiples of 2 division mattered 20

> years ago, I'd be surprised if it still did on modern compilers.

I have similar worries, especially since we replace the "/ (4 * 2)" with 

">> 3" it's making the code just a bit less readable.

And that the code runs exactly once while loading the driver and pushing 

the firmware into the hardware. So performance is completely irrelevant 

here.

Regards,

Christian.

>

> BR,

> Jani.

>

>

>>> Signed-off-by: Bernard Zhao <bernard@xxxxxxxx>

>> Applied.  thanks.

>>

>> Alex

>>

>>> ---

>>>   drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 4 ++--

>>>   drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 4 ++--

>>>   drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 4 ++--

>>>   3 files changed, 6 insertions(+), 6 deletions(-)

>>>

>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c

>>> index b205039..66cd078 100644

>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c

>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c

>>> @@ -175,10 +175,10 @@ static int gmc_v6_0_mc_load_microcode(struct amdgpu_device *adev)

>>>          amdgpu_ucode_print_mc_hdr(&hdr->header);

>>>

>>>          adev->gmc.fw_version = le32_to_cpu(hdr->header.ucode_version);

>>> -       regs_size = le32_to_cpu(hdr->io_debug_size_bytes) / (4 * 2);

>>> +       regs_size = le32_to_cpu(hdr->io_debug_size_bytes) >> 3;

>>>          new_io_mc_regs = (const __le32 *)

>>>                  (adev->gmc.fw->data + le32_to_cpu(hdr->io_debug_array_offset_bytes));

>>> -       ucode_size = le32_to_cpu(hdr->header.ucode_size_bytes) / 4;

>>> +       ucode_size = le32_to_cpu(hdr->header.ucode_size_bytes) >> 2;

>>>          new_fw_data = (const __le32 *)

>>>                  (adev->gmc.fw->data + le32_to_cpu(hdr->header.ucode_array_offset_bytes));

>>>

>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c

>>> index 9da9596..ca26d63 100644

>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c

>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c

>>> @@ -193,10 +193,10 @@ static int gmc_v7_0_mc_load_microcode(struct amdgpu_device *adev)

>>>          amdgpu_ucode_print_mc_hdr(&hdr->header);

>>>

>>>          adev->gmc.fw_version = le32_to_cpu(hdr->header.ucode_version);

>>> -       regs_size = le32_to_cpu(hdr->io_debug_size_bytes) / (4 * 2);

>>> +       regs_size = le32_to_cpu(hdr->io_debug_size_bytes) >> 3;

>>>          io_mc_regs = (const __le32 *)

>>>                  (adev->gmc.fw->data + le32_to_cpu(hdr->io_debug_array_offset_bytes));

>>> -       ucode_size = le32_to_cpu(hdr->header.ucode_size_bytes) / 4;

>>> +       ucode_size = le32_to_cpu(hdr->header.ucode_size_bytes) >> 2;

>>>          fw_data = (const __le32 *)

>>>                  (adev->gmc.fw->data + le32_to_cpu(hdr->header.ucode_array_offset_bytes));

>>>

>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c

>>> index 27d83204..295039c 100644

>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c

>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c

>>> @@ -318,10 +318,10 @@ static int gmc_v8_0_tonga_mc_load_microcode(struct amdgpu_device *adev)

>>>          amdgpu_ucode_print_mc_hdr(&hdr->header);

>>>

>>>          adev->gmc.fw_version = le32_to_cpu(hdr->header.ucode_version);

>>> -       regs_size = le32_to_cpu(hdr->io_debug_size_bytes) / (4 * 2);

>>> +       regs_size = le32_to_cpu(hdr->io_debug_size_bytes) >> 3;

>>>          io_mc_regs = (const __le32 *)

>>>                  (adev->gmc.fw->data + le32_to_cpu(hdr->io_debug_array_offset_bytes));

>>> -       ucode_size = le32_to_cpu(hdr->header.ucode_size_bytes) / 4;

>>> +       ucode_size = le32_to_cpu(hdr->header.ucode_size_bytes) >> 2;

>>>          fw_data = (const __le32 *)

>>>                  (adev->gmc.fw->data + le32_to_cpu(hdr->header.ucode_array_offset_bytes));

>>>

>>> --

>>> 2.7.4

>>>

>>> _______________________________________________

>>> amd-gfx mailing list

>>> amd-gfx@xxxxxxxxxxxxxxxxxxxxx

>>> 
https://nam11.safelinks.protection.outlook.com/?url="">

>> _______________________________________________

>> dri-devel mailing list

>> dri-devel@xxxxxxxxxxxxxxxxxxxxx

>> 
https://nam11.safelinks.protection.outlook.com/?url="">

_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx