>From: Keith Busch <kbusch@xxxxxxxxxx>
>Sent: Tuesday, January 17, 2023 5:55 PM
>To: Alexey Bogoslavsky <Alexey.Bogoslavsky@xxxxxxx>
>Cc: linux-pci@xxxxxxxxxxxxxxx; bhelgaas@xxxxxxxxxx; 'hch@xxxxxx' <hch@xxxxxx>
>Subject: Re: [PATCH 1/1] PCI/AER: Ignore correctable error reports for SN730 WD SSD
>
>On Mon, Jan 16, 2023 at 06:32:54PM +0000, Alexey Bogoslavsky wrote:
>> From: Alexey Bogoslavsky <Alexey.Bogoslavsky@xxxxxxx>
>>
>> A bug was found in SN730 WD SSD that causes occasional false AER reporting
>> of correctable errors. While functionally harmless, this causes error
>> messages to appear in the system log (dmesg) which, in turn, causes
>> problems in automated platform validation tests. Since the issue can not
>> be fixed by FW, customers asked for correctable error reporting to be
>> quirked out in the kernel for this particular device.
>>
>> The patch was manually verified. It was checked that correctable errors
>> are still detected but ignored for the target device (SN730), and are both
>> detected and reported for devices not affected by this quirk.
>
>If you're just going to have the kernel ignore these, are you not able
>to suppress the ERR_COR message at the source? Have the following
>options been tried?
>
>  a. Disabling Correctable Error Reporting Enable in Device Control
>     Register; i.e. mask out PCI_EXP_DEVCTL_CERE.
>
>  b. Setting AER Correctable Error Mask Register to all 1's
>
>I think it's usually possible for firmware to hardwire these. If the

I believe these options were discussed but deemed non-viable. I'll double-check anyway.

>If firmware can't do that, quirking the kernel to always disable reporting
>sounds like a better option. If either of the above fail to suppress the
>error messages, then I guess having the kernel ignore it is the only
>option.

This could probably work. I'll discuss it with our FW team to make sure the issue can be resolved this way.

Thank you
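For reference, the effect of options (a) and (b) on whether a device emits ERR_COR can be sketched in C. This is a simplified user-space model under stated assumptions, not kernel code and not tested against the SN730: it works on plain integers rather than real config space, the constant values are copied from Linux's include/uapi/linux/pci_regs.h, and the helper names (sends_err_cor, disable_cere, mask_all_cor) are hypothetical. It assumes the PCIe rule that a masked correctable error only sets its status bit without sending a message, and that an unmasked one is signaled only when Correctable Error Reporting Enable is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from <linux/pci_regs.h>; redefined so the sketch builds
 * without kernel headers. */
#define PCI_EXP_DEVCTL_CERE  0x0001u      /* Device Control: Correctable Error Reporting Enable */
#define PCI_ERR_COR_BAD_TLP  0x00000040u  /* AER: Bad TLP, one example correctable error */

/* Hypothetical helper: simplified model of whether a correctable error
 * results in an ERR_COR message being sent. */
static bool sends_err_cor(uint16_t devctl, uint32_t cor_mask, uint32_t err_bit)
{
    /* Option (b): an error masked in the AER Correctable Error Mask
     * register sets its status bit but sends no message. */
    if (cor_mask & err_bit)
        return false;

    /* Option (a): an unmasked error is signaled only if CERE is set. */
    return (devctl & PCI_EXP_DEVCTL_CERE) != 0;
}

/* Hypothetical helper for option (a): clear CERE and keep every other
 * Device Control bit unchanged (a read-modify-write on real hardware). */
static uint16_t disable_cere(uint16_t devctl)
{
    return (uint16_t)(devctl & ~PCI_EXP_DEVCTL_CERE);
}

/* Hypothetical helper for option (b): the "all 1's" mask suggested in the
 * thread. Bits a device does not implement would not stick on hardware. */
static uint32_t mask_all_cor(void)
{
    return 0xFFFFFFFFu;
}
```

With all four Device Control error-reporting enables set (0x000F) and nothing masked, `sends_err_cor(0x000F, 0, PCI_ERR_COR_BAD_TLP)` returns true; substituting `disable_cere(0x000F)` for the Device Control value, or `mask_all_cor()` for the mask, makes it return false. In an actual kernel quirk the same change would go through config-space accessors (for example `pcie_capability_clear_word()` for option (a)) instead of plain integers.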