Re: [PATCH v3 2/2] drm/amd/display: move remaining FPU code to dml folder

Ao Zhong <hacc1225@xxxxxxxxx> · Thu, 27 Oct 2022 18:48:49 +0200

There isn't much information on the internet about the Qingyun W510, as
it is not a retail machine. But I'm happy to provide any details about
this machine.

The Qingyun W510 is powered by Huawei's server SoC, the Kunpeng 920, and
is SBSA compatible.
Information about the Kunpeng 920 can be found here.
Link: https://en.wikichip.org/wiki/hisilicon/kunpeng/920-6426
But not all the functions provided by the Kunpeng 920 can be used on the
Qingyun W510, such as the SMMU (the IOMMU on ARM), the SAS controller or
the Encryption Acceleration Engine. This machine has an SFF form factor.
It has only two SO-DIMM memory slots and doesn't support ECC (some
Kunpeng desktop motherboards do), and it has 1x PCIe x4, 1x PCIe x16 and
2x M.2 slots (PCIe x4).
It also has 2 SATA 3.0 ports, one for the optical drive and the other
for the HDD. This machine ships with either AMD's RX550 or a Jingjia
Micro JM7201 GPU. My machine came with the JM7201, a GPU developed
independently in China. Unfortunately, since there is no open source
driver for it, I can only use the EFI framebuffer with the mainline
kernel. The Qingyun W510 also has a Huawei Hi1103LPC WiFi/Bluetooth
module and a power button with a Goodix fingerprint sensor. Since
none of them have open source drivers, I can't use them with the
mainline kernel.

There are also two similar-looking machines, the Qingyun W515 and the
Qingyun W525, which use the HiSilicon Kirin 990 SoC and the HiSilicon
Pangu M900 SoC; these are based on mobile platforms.

My workstation is probably a DVT-stage unit, because Huawei only
allows users to use PCIe 3.0 in the release version of the Qingyun W510.
Some machines may not be able to install more than 32G of memory due
to their firmware.

On Thu, 27 Oct 2022 at 17:38, Rodrigo Siqueira
<Rodrigo.Siqueira@xxxxxxx> wrote:
>
> Hi Ao,
>
> Could you share a link that describes your workstation?
>
> Thanks
>
> On 10/26/22 17:17, Ao Zhong wrote:
> > Hi Rodrigo,
> >
> > Thanks for your review! This is my first time submitting a patch to the kernel.
> > I'm not very good at using these tools yet. 😂
> >
> > Recently I got a Huawei Qingyun W510 (擎云 W510) ARM workstation
> > from the second-hand market in China. It's SBSA compatible and has a Kunpeng 920 (3211k) SoC
> > with 24 Huawei-customized TSV110 cores. Since it has an SFF form factor and my machine
> > supports PCIe 4.0 (it looks like some W510s have it disabled), I installed an RX 6400 in it
> > and use it as my daily driver. It has decent performance. I uploaded a benchmark result to Geekbench.
> >
> > Link: https://browser.geekbench.com/v5/cpu/18237269
> >
> > Ao
> >
> > On 26.10.22 at 18:12, Rodrigo Siqueira wrote:
> >>
> >>
> >> On 10/26/22 07:13, Ao Zhong wrote:
> >>> pipes[pipe_cnt].pipe.src.dcc_fraction_of_zs_req_luma = 0;
> >>> pipes[pipe_cnt].pipe.src.dcc_fraction_of_zs_req_chroma = 0;
> >>> These two operations in dcn32/dcn32_resource.c still need to use the FPU.
> >>> This causes compilation to fail on ARM64 platforms because
> >>> -mgeneral-regs-only is enabled by default to disable the hardware FPU.
> >>> Therefore, following the dcn31_zero_pipe_dcc_fraction function in
> >>> dml/dcn31/dcn31_fpu.c, add the dcn32_zero_pipe_dcc_fraction function
> >>> to dcn32_fpu.c, and move the above two operations into this function.
> >>>
> >>> Acked-by: Christian König <christian.koenig@xxxxxxx>
> >>> Signed-off-by: Ao Zhong <hacc1225@xxxxxxxxx>
> >>> ---
> >>>    drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c | 5 +++--
> >>>    drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c  | 8 ++++++++
> >>>    drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.h  | 3 +++
> >>>    3 files changed, 14 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c
> >>> index a88dd7b3d1c1..287b7fa9bf41 100644
> >>> --- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c
> >>> +++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c
> >>> @@ -1918,8 +1918,9 @@ int dcn32_populate_dml_pipes_from_context(
> >>>            timing = &pipe->stream->timing;
> >>>              pipes[pipe_cnt].pipe.src.gpuvm = true;
> >>> -        pipes[pipe_cnt].pipe.src.dcc_fraction_of_zs_req_luma = 0;
> >>> -        pipes[pipe_cnt].pipe.src.dcc_fraction_of_zs_req_chroma = 0;
> >>> +        DC_FP_START();
> >>> +        dcn32_zero_pipe_dcc_fraction(pipes, pipe_cnt);
> >>> +        DC_FP_END();
> >>>            pipes[pipe_cnt].pipe.dest.vfront_porch = timing->v_front_porch;
> >>>            pipes[pipe_cnt].pipe.src.gpuvm_min_page_size_kbytes = 256; // according to spreadsheet
> >>>            pipes[pipe_cnt].pipe.src.unbounded_req_mode = false;
> >>> diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
> >>> index 819de0f11012..58772fce6437 100644
> >>> --- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
> >>> +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
> >>> @@ -2521,3 +2521,11 @@ void dcn32_update_bw_bounding_box_fpu(struct dc *dc, struct clk_bw_params *bw_pa
> >>>        }
> >>>    }
> >>>    +void dcn32_zero_pipe_dcc_fraction(display_e2e_pipe_params_st *pipes,
> >>> +                  int pipe_cnt)
> >>> +{
> >>> +    dc_assert_fp_enabled();
> >>> +
> >>> +    pipes[pipe_cnt].pipe.src.dcc_fraction_of_zs_req_luma = 0;
> >>> +    pipes[pipe_cnt].pipe.src.dcc_fraction_of_zs_req_chroma = 0;
> >>> +}
> >>> diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.h b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.h
> >>> index 3a3dc2ce4c73..ab010e7e840b 100644
> >>> --- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.h
> >>> +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.h
> >>> @@ -73,4 +73,7 @@ int dcn32_find_dummy_latency_index_for_fw_based_mclk_switch(struct dc *dc,
> >>>      void dcn32_patch_dpm_table(struct clk_bw_params *bw_params);
> >>>    +void dcn32_zero_pipe_dcc_fraction(display_e2e_pipe_params_st *pipes,
> >>> +                  int pipe_cnt);
> >>> +
> >>>    #endif
> >>
> >> Hi Ao,
> >>
> >> First of all, thanks a lot for your patchset.
> >>
> >> For both patches:
> >>
> >> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@xxxxxxx>
> >>
> >> And I also applied them to amd-staging-drm-next.
> >>
> >> Btw, if you are using git-send-email for sending patches, I recommend the following options:
> >>
> >> git send-email --annotate --cover-letter --thread --no-chain-reply-to --to="EMAILS" --cc="mailing@xxxxxxxx" <SHA>
> >>
> >> Always add a cover letter; it makes the patchset easier to follow, and you can also describe each change in it.
> >>
> >> When you send that other patch enabling ARM64, please add as many details as possible in the cover letter. Keep in mind that we have been working on isolating the FPU code in a way that does not regress any of our ASICs, which means that every change was well tested on multiple devices. Anyway, maybe you can refer to this cover letter when writing the commit message:
> >>
> >> https://patchwork.freedesktop.org/series/93042/
> >>
> >> Finally, do you have a use case for this change? I mean, ARM64 + AMD dGPU.
> >>
> >> Thanks again!
> >> Siqueira
> >>
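
For readers following the quoted patch, here is a minimal userspace sketch of
the pattern it applies: the floating-point assignments live in a helper that
checks FP is enabled, and the caller wraps that helper in a start/end pair.
Everything below is a simplified stand-in and not the driver's code: the
FP_START()/FP_END() macros, assert_fp_enabled(), struct pipe_params and
populate_pipe() are hypothetical names, the two fields are assumed to be
doubles, and the real DC_FP_START()/DC_FP_END() manage kernel FPU state rather
than a counter.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's FPU bookkeeping: a plain
 * nesting counter instead of real FPU state handling. */
static int fp_depth;

#define FP_START() (fp_depth++)
#define FP_END()   (fp_depth--)

/* Stand-in for dc_assert_fp_enabled(): abort if called outside
 * an FP_START()/FP_END() region. */
static void assert_fp_enabled(void)
{
	assert(fp_depth > 0);
}

/* Simplified stand-in for display_e2e_pipe_params_st (assumed doubles). */
struct pipe_params {
	double dcc_fraction_of_zs_req_luma;
	double dcc_fraction_of_zs_req_chroma;
};

/* Would live in the file compiled with FP support (dcn32_fpu.c in the patch). */
void zero_pipe_dcc_fraction(struct pipe_params *pipes, int pipe_cnt)
{
	assert_fp_enabled();

	pipes[pipe_cnt].dcc_fraction_of_zs_req_luma = 0;
	pipes[pipe_cnt].dcc_fraction_of_zs_req_chroma = 0;
}

/* Would live in the file compiled without FP (dcn32_resource.c in the patch):
 * it never touches the double fields directly, only through the helper. */
void populate_pipe(struct pipe_params *pipes, int pipe_cnt)
{
	FP_START();
	zero_pipe_dcc_fraction(pipes, pipe_cnt);
	FP_END();
}
```

Calling populate_pipe(pipes, i) zeroes both fields of entry i and leaves the
counter balanced; calling zero_pipe_dcc_fraction() directly, outside the
start/end pair, trips the assertion.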