[PATCH] drm/amdgpu: Add kernel parameter to manage memory error handling.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Michel Dänzer [mailto:michel at daenzer.net]
> Sent: Tuesday, April 11, 2017 10:12 PM
> To: Panariti, David <David.Panariti at amd.com>
> Cc: amd-gfx at lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: Add kernel parameter to manage
> memory error handling.
> 
> 
> Hi Dave,
> 
> 
> Please send patches inline instead of as attachments, ideally using git send-
> email.
> 
> 
> > @@ -212,6 +213,9 @@ module_param_named(cg_mask,
> amdgpu_cg_mask, uint,
> > 0444);  MODULE_PARM_DESC(pg_mask, "Powergating flags mask (0 =
> disable
> > power gating)");  module_param_named(pg_mask, amdgpu_pg_mask,
> uint,
> > 0444);
> >
> > +MODULE_PARM_DESC(ecc_mask, "ECC/EDC flags mask (0 = disable
> > +ECC/EDC)");
> 
> "0 = disable ECC/EDC" implies that they're enabled by default? Was that
> already the case before this patch?

[davep] Yes it was, and there was actually a problem in some cases where the CZ would hang which is why I added the param.
I was wondering if it would be better to default to them being off, but I wasn't sure how important maintaining original behavior is considered.
Actually, there are some bugs in the workaround function as it is, so it really should default to off.
I did some work implementing EDC for Vilas in research, but it was side tracked for a while.
I fixed up the workaround, but it really makes no sense to use it at all until the rest of the EDC code is in place, and isn't currently scheduled to happen.
Given that, I actually think it would be best to remove the workaround and leave EDC disabled until (if) it is scheduled to be completed.

> 
> 
> > @@ -1664,6 +1664,24 @@ static int
> gfx_v8_0_do_edc_gpr_workarounds(struct amdgpu_device *adev)
> >  	if (adev->asic_type != CHIP_CARRIZO)
> >  		return 0;
> >
> > +	DRM_INFO("gfx_v8_0_do_edc_gpr_workarounds(): ecc_flags:
> 0x%08x\n",
> > +		 adev->ecc_flags);
> > +
> > +	/*
> > +	 * Check if EDC has been requested.
> > +	 * For Carrizo, EDC is the best/safest mode WRT error handling.
> > +	 */
> > +	if (!(adev->ecc_flags
> > +	      & (AMD_ECC_SUPPORT_BEST | AMD_ECC_SUPPORT_EDC))) {
> > +		DRM_INFO("gfx_v8_0_do_edc_gpr_workarounds(): "
> > +			 "skipping workarounds and not enabling EDC.\n");
> > +
> > +		return 0;
> > +	}
> > +
> > +	DRM_INFO("gfx_v8_0_do_edc_gpr_workarounds(): "
> > +		 "running workarounds and enabling EDC.\n");
> 
> These DRM_INFOs are too chatty, maybe make them e.g.
> DRM_DEBUG_DRIVER.
> 
> 
> --
> Earthling Michel Dänzer               |               http://www.amd.com
> Libre software enthusiast             |             Mesa and X developer


[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux