[PATCH 3/3] drm/amdgpu: Add kernel parameter to control use of ECC/EDC.

> -----Original Message-----
> From: Bridgman, John
> Sent: Monday, June 26, 2017 3:12 PM
> To: Xie, AlexBin <AlexBin.Xie at amd.com>; Panariti, David
> <David.Panariti at amd.com>; Deucher, Alexander
> <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org
> Subject: RE: [PATCH 3/3] drm/amdgpu: Add kernel parameter to control use
> of ECC/EDC.
> 
> Agreed... one person's "best" is another person's "OMG I didn't want that".
[davep] However, I've questioned the sanity of many default choices.
> IMO we should have bits correspond to specific options as much as possible,
> modulo HW capabilities.
[davep] Yes, we can all agree that the word BEST wasn't.
You can throw tomatoes at me when we're all in Markham.
But it's specifically the "modulo HW capabilities" that makes me want to use a single value to specify BEST (ADVANCED, RELIABLEST, SAFEST, AGGRESSIVE, PURPLE, COMPUTE, PROFESSIONAL, etc.).
Please, suggestions for better than BEST.

To summarize, I propose:

DEFAULT: User need do nothing, no parameter.  As AlexBin suggested, this can be a conservative choice of options. Options defined per asic.
ADVANCED: More features than DEFAULT, but nothing known to break.  Things we're sure not everyone would want. Selected by a unique value e.g. (1 << 3).  Options defined per asic.
ALL: All compatible features, minus any that are mutually exclusive or otherwise incompatible with the HW.  Could be specified as 0xffffffff.
BITMASK: The ultimate catchall.  I'd even say no masking out of any bits, caveat usor.  Don't forget, we're users of this interface and who knows what will be useful for dev or testing.
NONE: ras_param=0

DEFAULT, ADVANCED, ALL and NONE could go into a 2-bit field since they're mutually exclusive.
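
A rough sketch of how such a 2-bit field could be packed and unpacked. Everything here is an assumption for illustration, not something from the patch: the names (ras_mode, ras_param_make, etc.), the choice of bits [31:30] for the mode, and the numeric value assigned to each mode. One side effect of this particular layout is that 0 decodes as NONE and 0xffffffff decodes as ALL, which lines up with the values suggested above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout (an assumption, not the patch's encoding):
 *   bits [31:30]  mutually exclusive mode
 *   bits [29:0]   explicit feature bits
 */
enum ras_mode {
	RAS_MODE_NONE     = 0,
	RAS_MODE_DEFAULT  = 1,
	RAS_MODE_ADVANCED = 2,
	RAS_MODE_ALL      = 3,
};

#define RAS_MODE_SHIFT	30
#define RAS_MODE_MASK	(UINT32_C(3) << RAS_MODE_SHIFT)
#define RAS_BITS_MASK	UINT32_C(0x3fffffff)

/* Extract the 2-bit mode from the parameter value. */
static enum ras_mode ras_param_mode(uint32_t param)
{
	return (enum ras_mode)((param & RAS_MODE_MASK) >> RAS_MODE_SHIFT);
}

/* Extract the explicit feature bits, with the mode field stripped. */
static uint32_t ras_param_bits(uint32_t param)
{
	return param & RAS_BITS_MASK;
}

/* Build a parameter value from a mode and explicit feature bits. */
static uint32_t ras_param_make(enum ras_mode mode, uint32_t bits)
{
	/* The explicit bits must not spill into the mode field. */
	assert((bits & ~RAS_BITS_MASK) == 0);
	return ((uint32_t)mode << RAS_MODE_SHIFT) | bits;
}
```

With this layout a pure bitmask is simply mode NONE plus whatever low bits the user sets, so no fifth mode value is needed for BITMASK.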

Examples:
CZ and eCZ:
DEFAULT: everything except PROP_FED (halt ip/reboot)
ADVANCED: DEFAULT + PROP_FED.
ALL: Same as ADVANCED.
BITMASK: PROP_FED, no counters (who knows why, but we can do it)

Vega10:
DEFAULT: No ECC
ADVANCED: No ECC.
ALL: ECC
BITMASK: ECC, don't count on your results.
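
The per-ASIC examples above could be expressed as a small lookup table, roughly as sketched below. The feature bit names and positions (RAS_FEAT_COUNTERS, RAS_FEAT_PROP_FED, RAS_FEAT_ECC), the enums, and ras_resolve() are all hypothetical. In particular, treating "everything except PROP_FED" on CZ as just the counters is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits; names and positions are illustrative only. */
#define RAS_FEAT_COUNTERS	(UINT32_C(1) << 0)
#define RAS_FEAT_PROP_FED	(UINT32_C(1) << 1)
#define RAS_FEAT_ECC		(UINT32_C(1) << 2)

enum ras_preset {
	RAS_PRESET_NONE,
	RAS_PRESET_DEFAULT,
	RAS_PRESET_ADVANCED,
	RAS_PRESET_ALL,
	RAS_PRESET_COUNT,
};

enum ras_asic {
	RAS_ASIC_CZ,
	RAS_ASIC_VEGA10,
	RAS_ASIC_COUNT,
};

/* Each preset is a per-ASIC macro that expands to a set of feature bits. */
static const uint32_t ras_presets[RAS_ASIC_COUNT][RAS_PRESET_COUNT] = {
	[RAS_ASIC_CZ] = {
		[RAS_PRESET_NONE]     = 0,
		[RAS_PRESET_DEFAULT]  = RAS_FEAT_COUNTERS,
		[RAS_PRESET_ADVANCED] = RAS_FEAT_COUNTERS | RAS_FEAT_PROP_FED,
		[RAS_PRESET_ALL]      = RAS_FEAT_COUNTERS | RAS_FEAT_PROP_FED,
	},
	[RAS_ASIC_VEGA10] = {
		[RAS_PRESET_NONE]     = 0,
		[RAS_PRESET_DEFAULT]  = 0,		/* no ECC */
		[RAS_PRESET_ADVANCED] = 0,		/* no ECC */
		[RAS_PRESET_ALL]      = RAS_FEAT_ECC,
	},
};

/*
 * Resolve the effective feature set. When use_raw is nonzero the caller's
 * bitmask is returned untouched (no masking, caveat usor); otherwise the
 * preset is looked up for the given ASIC.
 */
static uint32_t ras_resolve(enum ras_asic asic, enum ras_preset preset,
			    uint32_t raw_bits, int use_raw)
{
	assert(asic >= 0 && asic < RAS_ASIC_COUNT);
	assert(preset >= 0 && preset < RAS_PRESET_COUNT);

	if (use_raw)
		return raw_bits;
	return ras_presets[asic][preset];
}
```

Because the presets live in a table, a new driver release can change what ADVANCED or ALL expands to for an ASIC without the user changing the value they pass.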

To me, the benefit of ADVANCED and ALL is that I, as a user, would want them to change with new drivers.  I want that logical behavior and don't want to check ChangeLogs to see what new bits I need.
I think, for example, an HPC customer would want to use ADVANCED because that is what we think is the most reliable. 
Basically they're just macros so in the most common cases, users don't need to worry about the bits.

Providing a bitmask argument allows anything to be overridden, and for an advanced user (a developer, system evaluator, etc.), absolute flexibility is essential.

