On 04/09, Peter Zijlstra wrote: > On Thu, Apr 09, 2020 at 08:15:57PM +0200, Christian König wrote: > > Am 09.04.20 um 19:09 schrieb Peter Zijlstra: > > > On Thu, Apr 09, 2020 at 05:59:56PM +0200, Peter Zijlstra wrote: > > > [SNIP] > > > > I'll need another approach, let me consider. > > > Christian; it says these files are generated, does that generator know > > > which functions are wholly in FPU context and which are not? > > > > Well that "generator" is still a human being :) > > > > It's just that the formulae for the calculation come from the hardware team > > and we are not able to easily transcript them to fixed point calculations. > > Well, if it's a human, can this human respect the kernel coding style a > bit more :-) Some of that stuff is atrocious. > > > > My current thinking is that if I annotate all functions that are wholly > > > inside kernel_fpu_start() with an __fpu function attribute, then I can > > > verify that any call from regular text to fpu text only happens inside > > > kernel_fpu_begin()/end(). And I can ensure that all !__fpu annotation > > > fuctions only contain !fpu instructions. > > > > Yeah, that sounds like a good idea to me and should be easily doable. > > > > > Can that generator add the __fpu function attribute or is that something > > > that would need to be done manually (which seems like it would be > > > painful, since it is quite a bit of code) ? > > > > We are currently in the process of moving all the stuff which requires > > floating point into a single C file(s) and then make sure that we only call > > those within kernel_fpu_begin()/end() blocks. > > Can you make the build system stick all those .o files in a single > archive? That's the only way I can do call validation; external > relocatoin records do not contain the section. > > > Annotating those function with __fpu or even saying to gcc that all code of > > those files should go into a special text.fpu segment shouldn't be much of a > > problem. > > Guess what the __fpu attribute does ;-) > > With the below patch (which is on to of newer versions of the objtool > patches send earlier, let me know if you want a full set) that only > converts a few files, but fully converts: > > drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c > > But building it (and this is an absolute pain; when you're reworking > this, can you pretty please also fix the Makefiles?), we get: Hi, I'm going to work on the FPU issues in the display code. In this sense, could you clarify a little bit more about the Makefile issues? Also, I applied the patch `[PATCH v4] x86: insn: Add insn_is_fpu()` and tried to reproduce the warning that you described but I didn't see it. Could you explain to me how I can check those warnings? Thanks > drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o: warning: objtool: dcn_validate_bandwidth()+0x34fa: FPU instruction outside of kernel_fpu_{begin,end}() > > $ ./scripts/faddr2line defconfig-build/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o dcn_validate_bandwidth+0x34fa > dcn_validate_bandwidth+0x34fa/0x57ce: > dcn_validate_bandwidth at /usr/src/linux-2.6/defconfig-build/../drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:1293 (discriminator 5) > > # ./objdump-func.sh defconfig-build/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o dcn_validate_bandwidth | grep 34fa > 34fa 50fa: f2 0f 10 b5 60 ff ff movsd -0xa0(%rbp),%xmm6 > > Which seems to indicate there's still problms with the current code. > > > > --- > arch/x86/include/asm/fpu/api.h | 12 +++++++++++ > arch/x86/kernel/vmlinux.lds.S | 1 + > .../gpu/drm/amd/display/dc/calcs/dcn_calc_math.c | 25 +++++++++++----------- > drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c | 4 ++-- > .../display/dc/dml/dcn20/display_rq_dlg_calc_20.c | 10 ++++----- > .../amd/display/dc/dml/display_rq_dlg_helpers.c | 2 +- > .../gpu/drm/amd/display/dc/dml/dml_common_defs.c | 2 +- > drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c | 2 +- > drivers/gpu/drm/amd/display/dc/dsc/rc_calc.c | 10 ++++----- > drivers/gpu/drm/amd/display/dc/dsc/rc_calc_dpi.c | 4 ++-- > drivers/gpu/drm/amd/display/dc/inc/dcn_calc_math.h | 2 ++ > tools/objtool/check.c | 7 +++++- > tools/objtool/elf.h | 2 +- > 13 files changed, 52 insertions(+), 31 deletions(-) > > diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h > index 64be4426fda9..19eaf98bbb0a 100644 > --- a/arch/x86/include/asm/fpu/api.h > +++ b/arch/x86/include/asm/fpu/api.h > @@ -12,11 +12,23 @@ > #define _ASM_X86_FPU_API_H > #include <linux/bottom_half.h> > > +#ifdef CONFIG_STACK_VALIDATION > + > +#define __fpu __section(".text.fpu") > + > #define _ASM_ANNOTATE_FPU(at) \ > ".pushsection .discard.fpu_safe\n" \ > ".long " #at " - .\n" \ > ".popsection\n" \ > > +#else > + > +#define __fpu > + > +#define _ASM_ANNOTATE_FPU(at) > + > +#endif /* CONFIG_STACK_VALIDATION */ > + > #define annotate_fpu() ({ \ > asm volatile("%c0:\n\t" \ > _ASM_ANNOTATE_FPU(%c0b) \ > diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S > index 1bf7e312361f..8442f8633d07 100644 > --- a/arch/x86/kernel/vmlinux.lds.S > +++ b/arch/x86/kernel/vmlinux.lds.S > @@ -139,6 +139,7 @@ SECTIONS > SOFTIRQENTRY_TEXT > *(.fixup) > *(.gnu.warning) > + *(.text.fpu) > > #ifdef CONFIG_RETPOLINE > __indirect_thunk_start = .; > diff --git a/drivers/gpu/drm/amd/display/dc/calcs/dcn_calc_math.c b/drivers/gpu/drm/amd/display/dc/calcs/dcn_calc_math.c > index 07d18e78de49..57ab3aafef5a 100644 > --- a/drivers/gpu/drm/amd/display/dc/calcs/dcn_calc_math.c > +++ b/drivers/gpu/drm/amd/display/dc/calcs/dcn_calc_math.c > @@ -36,7 +36,7 @@ > * remain as-is as it provides us with a guarantee from HW that it is correct. > */ > > -float dcn_bw_mod(const float arg1, const float arg2) > +__fpu float dcn_bw_mod(const float arg1, const float arg2) > { > if (isNaN(arg1)) > return arg2; > @@ -45,7 +45,7 @@ float dcn_bw_mod(const float arg1, const float arg2) > return arg1 - arg1 * ((int) (arg1 / arg2)); > } > > -float dcn_bw_min2(const float arg1, const float arg2) > +__fpu float dcn_bw_min2(const float arg1, const float arg2) > { > if (isNaN(arg1)) > return arg2; > @@ -58,7 +58,7 @@ unsigned int dcn_bw_max(const unsigned int arg1, const unsigned int arg2) > { > return arg1 > arg2 ? arg1 : arg2; > } > -float dcn_bw_max2(const float arg1, const float arg2) > +__fpu float dcn_bw_max2(const float arg1, const float arg2) > { > if (isNaN(arg1)) > return arg2; > @@ -67,25 +67,26 @@ float dcn_bw_max2(const float arg1, const float arg2) > return arg1 > arg2 ? arg1 : arg2; > } > > -float dcn_bw_floor2(const float arg, const float significance) > +__fpu float dcn_bw_floor2(const float arg, const float significance) > { > if (significance == 0) > return 0; > return ((int) (arg / significance)) * significance; > } > -float dcn_bw_floor(const float arg) > + > +__fpu float dcn_bw_floor(const float arg) > { > return ((int) (arg)); > } > > -float dcn_bw_ceil(const float arg) > +__fpu float dcn_bw_ceil(const float arg) > { > float flr = dcn_bw_floor2(arg, 1); > > return flr + 0.00001 >= arg ? arg : flr + 1; > } > > -float dcn_bw_ceil2(const float arg, const float significance) > +__fpu float dcn_bw_ceil2(const float arg, const float significance) > { > float flr = dcn_bw_floor2(arg, significance); > if (significance == 0) > @@ -93,17 +94,17 @@ float dcn_bw_ceil2(const float arg, const float significance) > return flr + 0.00001 >= arg ? arg : flr + significance; > } > > -float dcn_bw_max3(float v1, float v2, float v3) > +__fpu float dcn_bw_max3(float v1, float v2, float v3) > { > return v3 > dcn_bw_max2(v1, v2) ? v3 : dcn_bw_max2(v1, v2); > } > > -float dcn_bw_max5(float v1, float v2, float v3, float v4, float v5) > +__fpu float dcn_bw_max5(float v1, float v2, float v3, float v4, float v5) > { > return dcn_bw_max3(v1, v2, v3) > dcn_bw_max2(v4, v5) ? dcn_bw_max3(v1, v2, v3) : dcn_bw_max2(v4, v5); > } > > -float dcn_bw_pow(float a, float exp) > +__fpu float dcn_bw_pow(float a, float exp) > { > float temp; > /*ASSERT(exp == (int)exp);*/ > @@ -120,7 +121,7 @@ float dcn_bw_pow(float a, float exp) > } > } > > -double dcn_bw_fabs(double a) > +__fpu double dcn_bw_fabs(double a) > { > if (a > 0) > return (a); > @@ -129,7 +130,7 @@ double dcn_bw_fabs(double a) > } > > > -float dcn_bw_log(float a, float b) > +__fpu float dcn_bw_log(float a, float b) > { > int * const exp_ptr = (int *)(&a); > int x = *exp_ptr; > diff --git a/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c b/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c > index 3960a8db94cb..b3e305d9d1c9 100644 > --- a/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c > +++ b/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c > @@ -435,7 +435,7 @@ static void pipe_ctx_to_e2e_pipe_params ( > > } > > -static void dcn_bw_calc_rq_dlg_ttu( > +static __fpu void dcn_bw_calc_rq_dlg_ttu( > const struct dc *dc, > const struct dcn_bw_internal_vars *v, > struct pipe_ctx *pipe, > @@ -1388,7 +1388,7 @@ static unsigned int dcn_find_normalized_clock_vdd_Level( > return vdd_level; > } > > -unsigned int dcn_find_dcfclk_suits_all( > +__fpu unsigned int dcn_find_dcfclk_suits_all( > const struct dc *dc, > struct dc_clocks *clocks) > { > diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c > index ca807846032f..0cbb58fc7fb8 100644 > --- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c > +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c > @@ -115,7 +115,7 @@ static bool is_dual_plane(enum source_format_class source_format) > return ret_val; > } > > -static double get_refcyc_per_delivery(struct display_mode_lib *mode_lib, > +static __fpu double get_refcyc_per_delivery(struct display_mode_lib *mode_lib, > double refclk_freq_in_mhz, > double pclk_freq_in_mhz, > bool odm_combine, > @@ -162,7 +162,7 @@ static unsigned int get_blk_size_bytes(const enum source_macro_tile_size tile_si > return (4 * 1024); > } > > -static void extract_rq_sizing_regs(struct display_mode_lib *mode_lib, > +static __fpu void extract_rq_sizing_regs(struct display_mode_lib *mode_lib, > display_data_rq_regs_st *rq_regs, > const display_data_rq_sizing_params_st rq_sizing) > { > @@ -313,7 +313,7 @@ static void handle_det_buf_split(struct display_mode_lib *mode_lib, > full_swath_bytes_packed_c); > } > > -static void get_meta_and_pte_attr(struct display_mode_lib *mode_lib, > +static __fpu void get_meta_and_pte_attr(struct display_mode_lib *mode_lib, > display_data_rq_dlg_params_st *rq_dlg_param, > display_data_rq_misc_params_st *rq_misc_param, > display_data_rq_sizing_params_st *rq_sizing_param, > @@ -763,7 +763,7 @@ void dml20_rq_dlg_get_rq_reg(struct display_mode_lib *mode_lib, > > // Note: currently taken in as is. > // Nice to decouple code from hw register implement and extract code that are repeated for luma and chroma. > -static void dml20_rq_dlg_get_dlg_params(struct display_mode_lib *mode_lib, > +static __fpu void dml20_rq_dlg_get_dlg_params(struct display_mode_lib *mode_lib, > const display_e2e_pipe_params_st *e2e_pipe_param, > const unsigned int num_pipes, > const unsigned int pipe_idx, > @@ -1611,7 +1611,7 @@ void dml20_rq_dlg_get_dlg_reg(struct display_mode_lib *mode_lib, > dml_print("DML_DLG: Calculation for pipe[%d] end\n", pipe_idx); > } > > -static void calculate_ttu_cursor(struct display_mode_lib *mode_lib, > +static __fpu void calculate_ttu_cursor(struct display_mode_lib *mode_lib, > double *refcyc_per_req_delivery_pre_cur, > double *refcyc_per_req_delivery_cur, > double refclk_freq_in_mhz, > diff --git a/drivers/gpu/drm/amd/display/dc/dml/display_rq_dlg_helpers.c b/drivers/gpu/drm/amd/display/dc/dml/display_rq_dlg_helpers.c > index e2d82aacd3bc..36541cba3894 100644 > --- a/drivers/gpu/drm/amd/display/dc/dml/display_rq_dlg_helpers.c > +++ b/drivers/gpu/drm/amd/display/dc/dml/display_rq_dlg_helpers.c > @@ -133,7 +133,7 @@ void print__rq_dlg_params_st(struct display_mode_lib *mode_lib, display_rq_dlg_p > dml_print("DML_RQ_DLG_CALC: =====================================\n"); > } > > -void print__dlg_sys_params_st(struct display_mode_lib *mode_lib, display_dlg_sys_params_st dlg_sys_param) > +__fpu void print__dlg_sys_params_st(struct display_mode_lib *mode_lib, display_dlg_sys_params_st dlg_sys_param) > { > dml_print("DML_RQ_DLG_CALC: =====================================\n"); > dml_print("DML_RQ_DLG_CALC: DISPLAY_RQ_DLG_PARAM_ST\n"); > diff --git a/drivers/gpu/drm/amd/display/dc/dml/dml_common_defs.c b/drivers/gpu/drm/amd/display/dc/dml/dml_common_defs.c > index 723af0b2dda0..e86b1d0128cf 100644 > --- a/drivers/gpu/drm/amd/display/dc/dml/dml_common_defs.c > +++ b/drivers/gpu/drm/amd/display/dc/dml/dml_common_defs.c > @@ -28,7 +28,7 @@ > > #include "dml_inline_defs.h" > > -double dml_round(double a) > +__fpu double dml_round(double a) > { > double round_pt = 0.5; > double ceil = dml_ceil(a, 1); > diff --git a/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c b/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c > index 87d682d25278..a0b0eb2f0fe3 100644 > --- a/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c > +++ b/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c > @@ -323,7 +323,7 @@ static inline uint32_t calc_dsc_bpp_x16(uint32_t stream_bandwidth_kbps, uint32_t > /* Get DSC bandwidth range based on [min_bpp, max_bpp] target bitrate range, and timing's pixel clock > * and uncompressed bandwidth. > */ > -static void get_dsc_bandwidth_range( > +static __fpu void get_dsc_bandwidth_range( > const uint32_t min_bpp, > const uint32_t max_bpp, > const struct dsc_enc_caps *dsc_caps, > diff --git a/drivers/gpu/drm/amd/display/dc/dsc/rc_calc.c b/drivers/gpu/drm/amd/display/dc/dsc/rc_calc.c > index 03ae15946c6d..535770930343 100644 > --- a/drivers/gpu/drm/amd/display/dc/dsc/rc_calc.c > +++ b/drivers/gpu/drm/amd/display/dc/dsc/rc_calc.c > @@ -40,7 +40,7 @@ > break > > > -void get_qp_set(qp_set qps, enum colour_mode cm, enum bits_per_comp bpc, enum max_min max_min, float bpp) > +__fpu void get_qp_set(qp_set qps, enum colour_mode cm, enum bits_per_comp bpc, enum max_min max_min, float bpp) > { > int mode = MODE_SELECT(444, 422, 420); > int sel = table_hash(mode, bpc, max_min); > @@ -85,7 +85,7 @@ void get_qp_set(qp_set qps, enum colour_mode cm, enum bits_per_comp bpc, enum ma > memcpy(qps, table[index].qps, sizeof(qp_set)); > } > > -double dsc_roundf(double num) > +__fpu double dsc_roundf(double num) > { > if (num < 0.0) > num = num - 0.5; > @@ -95,7 +95,7 @@ double dsc_roundf(double num) > return (int)(num); > } > > -double dsc_ceil(double num) > +__fpu double dsc_ceil(double num) > { > double retval = (int)num; > > @@ -105,7 +105,7 @@ double dsc_ceil(double num) > return (int)retval; > } > > -void get_ofs_set(qp_set ofs, enum colour_mode mode, float bpp) > +__fpu void get_ofs_set(qp_set ofs, enum colour_mode mode, float bpp) > { > int *p = ofs; > > @@ -172,7 +172,7 @@ int median3(int a, int b, int c) > return b; > } > > -void calc_rc_params(struct rc_params *rc, enum colour_mode cm, enum bits_per_comp bpc, float bpp, int slice_width, int slice_height, int minor_version) > +__fpu void calc_rc_params(struct rc_params *rc, enum colour_mode cm, enum bits_per_comp bpc, float bpp, int slice_width, int slice_height, int minor_version) > { > float bpp_group; > float initial_xmit_delay_factor; > diff --git a/drivers/gpu/drm/amd/display/dc/dsc/rc_calc_dpi.c b/drivers/gpu/drm/amd/display/dc/dsc/rc_calc_dpi.c > index 1f6e63b71456..38b3c4ac96dd 100644 > --- a/drivers/gpu/drm/amd/display/dc/dsc/rc_calc_dpi.c > +++ b/drivers/gpu/drm/amd/display/dc/dsc/rc_calc_dpi.c > @@ -98,7 +98,7 @@ static void copy_rc_to_cfg(struct drm_dsc_config *dsc_cfg, const struct rc_param > dsc_cfg->rc_buf_thresh[i] = rc->rc_buf_thresh[i]; > } > > -int dscc_compute_dsc_parameters(const struct drm_dsc_config *pps, struct dsc_parameters *dsc_params) > +__fpu int dscc_compute_dsc_parameters(const struct drm_dsc_config *pps, struct dsc_parameters *dsc_params) > { > enum colour_mode mode = pps->convert_rgb ? CM_RGB : > (pps->simple_422 ? CM_444 : > @@ -115,7 +115,7 @@ int dscc_compute_dsc_parameters(const struct drm_dsc_config *pps, struct dsc_par > > double d_bytes_per_pixel = dsc_ceil(bpp * slice_width / 8.0) / slice_width; > > - // TODO: Make sure the formula for calculating this is precise (ceiling vs. floor, and at what point they should be applied) > + // TODO: Make sure the formula for calculating this is precise (ceiling vs. floor, and at what point they should be applied > if (pps->native_422 || pps->native_420) > d_bytes_per_pixel /= 2; > > diff --git a/drivers/gpu/drm/amd/display/dc/inc/dcn_calc_math.h b/drivers/gpu/drm/amd/display/dc/inc/dcn_calc_math.h > index 45a07eeffbb6..d2ea6cf65f7e 100644 > --- a/drivers/gpu/drm/amd/display/dc/inc/dcn_calc_math.h > +++ b/drivers/gpu/drm/amd/display/dc/inc/dcn_calc_math.h > @@ -26,6 +26,8 @@ > #ifndef _DCN_CALC_MATH_H_ > #define _DCN_CALC_MATH_H_ > > +#include <asm/fpu/api.h> > + > float dcn_bw_mod(const float arg1, const float arg2); > float dcn_bw_min2(const float arg1, const float arg2); > unsigned int dcn_bw_max(const unsigned int arg1, const unsigned int arg2); > diff --git a/tools/objtool/check.c b/tools/objtool/check.c > index 1607a698eccd..02a51fedd031 100644 > --- a/tools/objtool/check.c > +++ b/tools/objtool/check.c > @@ -252,6 +252,9 @@ static int decode_instructions(struct objtool_file *file) > strncmp(sec->name, ".discard.", 9)) > sec->text = true; > > + if (!strcmp(sec->name, ".text.fpu")) > + sec->fpu = true; > + > for (offset = 0; offset < sec->len; offset += insn->len) { > insn = malloc(sizeof(*insn)); > if (!insn) { > @@ -288,8 +291,10 @@ static int decode_instructions(struct objtool_file *file) > return -1; > } > > - sym_for_each_insn(file, func, insn) > + sym_for_each_insn(file, func, insn) { > insn->func = func; > + insn->fpu_safe = sec->fpu; > + } > } > } > > diff --git a/tools/objtool/elf.h b/tools/objtool/elf.h > index ebbb10c61e24..a743f2f28feb 100644 > --- a/tools/objtool/elf.h > +++ b/tools/objtool/elf.h > @@ -39,7 +39,7 @@ struct section { > char *name; > int idx; > unsigned int len; > - bool changed, text, rodata; > + bool changed, text, rodata, fpu; > }; > > struct symbol { > > _______________________________________________ > amd-gfx mailing list > amd-gfx@xxxxxxxxxxxxxxxxxxxxx > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=02%7C01%7CRodrigo.Siqueira%40amd.com%7C8224db5fb3164b0f857c08d7dcc217d8%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637220598535185983&sdata=Zp4IQeEXy56dJhG39TsIyvfC%2BYJYEoBuV%2Bna5VVvQpk%3D&reserved=0
Attachment:
signature.asc
Description: PGP signature
_______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx