RE: [PATCH 1/1] PCI/AER: Ignore correctable error reports for SN730 WD SSD

Hello Bjorn,

Sorry for not addressing your questions earlier. As you may have heard, WD experienced
a hacking attack which left us with no access to the company e-mail for weeks.
As for the patch, a FW change was not an option because the product causing the issue
was essentially at end of life, so I prepared a workaround that took into account
all the comments from the community.

Yet, at this point it seems like the company has lost interest in promoting this patch altogether.
So we could just drop it. Please let me know if there's anything I need to do to request that
officially.

Thank you,
Alexey

-----Original Message-----
From: Bjorn Helgaas <helgaas@xxxxxxxxxx> 
Sent: Wednesday, April 12, 2023 1:15 AM
To: Alexey Bogoslavsky <Alexey.Bogoslavsky@xxxxxxx>
Cc: Keith Busch <kbusch@xxxxxxxxxx>; linux-pci@xxxxxxxxxxxxxxx; Bjorn Helgaas <bhelgaas@xxxxxxxxxx>; Christoph Hellwig <hch@xxxxxx>; Grant Grundler <grundler@xxxxxxxxxxxx>; Rajat Khandelwal <rajat.khandelwal@xxxxxxxxxxxxxxx>
Subject: Re: [PATCH 1/1] PCI/AER: Ignore correctable error reports for SN730 WD SSD

[+cc Grant, Rajat]

On Tue, Jan 17, 2023 at 06:15:28PM +0000, Alexey Bogoslavsky wrote:
> >From: Keith Busch <kbusch@xxxxxxxxxx>
> >Sent: Tuesday, January 17, 2023 5:55 PM
> >To: Alexey Bogoslavsky <Alexey.Bogoslavsky@xxxxxxx>
> >Cc: linux-pci@xxxxxxxxxxxxxxx; bhelgaas@xxxxxxxxxx; 'hch@xxxxxx' <hch@xxxxxx>
> >Subject: Re: [PATCH 1/1] PCI/AER: Ignore correctable error reports for SN730 WD SSD
>
> >On Mon, Jan 16, 2023 at 06:32:54PM +0000, Alexey Bogoslavsky wrote:
> >> From: Alexey Bogoslavsky <mailto:Alexey.Bogoslavsky@xxxxxxx>
> >>
> >> A bug was found in SN730 WD SSD that causes occasional false AER reporting
> >> of correctable errors. While functionally harmless, this causes error
> >> messages to appear in the system log (dmesg) which, in turn, causes
> >> problems in automated platform validation tests. Since the issue cannot
> >> be fixed by FW, customers asked for correctable error reporting to be
> >> quirked out in the kernel for this particular device.
> >
> >> The patch was manually verified. It was checked that correctable errors
> >> are still detected but ignored for the target device (SN730), and are both
> >> detected and reported for devices not affected by this quirk.
>
> >If you're just going to have the kernel ignore these, are you not able
> >to suppress the ERR_COR message at the source? Have the following
> >options been tried?
>
> > a. Disabling Correctable Error Reporting Enable in Device Control
> >    Register; i.e. mask out PCI_EXP_DEVCTL_CERE.
> > b. Setting AER Correctable Error Mask Register to all 1's
>
> >I think it's usually possible for firmware to hardwire these. If the
>
> I believe these options were discussed but deemed non-viable. I'll
> double check anyway
>
> >If firmware can't do that, quirking the kernel to always disable
> >reporting sounds like a better option. If either of the above fail
> >to suppress the error messages, then I guess having the kernel
> >ignore it is the only option.
>
> This could probably work. I'll discuss this with our FW team to make
> sure the issue can be resolved this way. Thank you
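
For reference, options (a) and (b) quoted above each amount to a single
register update. The following is a minimal userspace sketch of the bit
manipulation only, not the proposed patch. The value of PCI_EXP_DEVCTL_CERE
mirrors include/uapi/linux/pci_regs.h; the helper names devctl_disable_cere()
and aer_cor_mask_all() are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Mirrors include/uapi/linux/pci_regs.h. */
#define PCI_EXP_DEVCTL_CERE 0x0001u /* Correctable Error Reporting Enable */

/*
 * Option (a), hypothetical helper: clear the Correctable Error Reporting
 * Enable bit in a Device Control register value, leaving other bits alone.
 */
static uint16_t devctl_disable_cere(uint16_t devctl)
{
	return (uint16_t)(devctl & ~PCI_EXP_DEVCTL_CERE);
}

/*
 * Option (b), hypothetical helper: set every bit of an AER Correctable
 * Error Mask register value so that all correctable sources are masked.
 */
static uint32_t aer_cor_mask_all(uint32_t cor_mask)
{
	return cor_mask | UINT32_MAX;
}
```

In a kernel quirk the same updates would presumably go through
pcie_capability_clear_word() for the Device Control register and
pci_write_config_dword() at the AER capability offset plus PCI_ERR_COR_MASK,
but whether the SN730 honors either setting is exactly the open question in
this thread.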

Any resolution on this FW possibility?

We have patches in progress to rate-limit correctable error messages
and make them KERN_INFO instead of KERN_WARN [1], but I don't think
that's going to be a good enough solution for you because nobody wants
to see even an informational message every 5 seconds if the message is
useless.
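
To illustrate why rate limiting alone does not help here, below is a minimal
sketch of a generic fixed-window rate limiter. It is not the code from [1]
and not the kernel's own ratelimit implementation; struct ratelimit,
ratelimit_allow(), and the choice of one message per 5000 ms window are
assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed-window limiter state. */
struct ratelimit {
	uint64_t interval_ms;  /* window length in milliseconds */
	unsigned burst;        /* messages allowed per window */
	uint64_t window_start; /* start of the current window, in ms */
	unsigned printed;      /* messages emitted in the current window */
};

/* Return true if a message arriving at now_ms may be printed. */
static bool ratelimit_allow(struct ratelimit *rl, uint64_t now_ms)
{
	/* Open a new window once the current one has expired. */
	if (now_ms - rl->window_start >= rl->interval_ms) {
		rl->window_start = now_ms;
		rl->printed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	return false;
}
```

With interval_ms = 5000 and burst = 1, a device that reports spurious errors
continuously still gets one message through at the start of every window,
so the log keeps filling, only more slowly.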

If firmware on the device can turn off these errors, that would be the
best solution.  If not, I think your quirk is a reasonable approach
and just needs a little polishing per the previous comments.

Bjorn

[1] https://lore.kernel.org/r/20230317175109.3859943-1-grundler@xxxxxxxxxxxx
