On Fri, Sep 05, 2014 at 05:34:21PM -0600, Bjorn Helgaas wrote: > Date: Fri, 5 Sep 2014 17:34:21 -0600 > From: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> > To: "Chen, Gong" <gong.chen@xxxxxxxxxxxxxxx> > Cc: rdunlap@xxxxxxxxxxxxx, bp@xxxxxxxxx, tony.luck@xxxxxxxxx, > linux-pci@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx > Subject: Re: [RESEND RFC 5/5] PCIe, AER: Update initial value of UC error > mask > User-Agent: Mutt/1.5.21 (2010-09-15) > > On Wed, Aug 13, 2014 at 02:22:41AM -0400, Chen, Gong wrote: > > In PCI-e SPEC r3.0, BIT 0 of Uncorrectable Error Status Register > > is redefined and it has an explicit requirement that when writing > > this field, a value of 1b is the only choice. So change previous > > initial maks from 0 to 1. > > > > Signed-off-by: Chen, Gong <gong.chen@xxxxxxxxxxxxxxx> > > --- > > NOTE: After scratching all use cases, this is the most obvious use > > case to violate the SPEC. Most of use cases just read first and > > then overwrite for clear purpose. Even so, such fix is obvious to > > not compatiable with previous SPEC definition. Do we need a dirty > > hack? > > > > arch/mips/pci/pci-octeon.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/arch/mips/pci/pci-octeon.c b/arch/mips/pci/pci-octeon.c > > index 59cccd95688b..f1bfdc201297 100644 > > --- a/arch/mips/pci/pci-octeon.c > > +++ b/arch/mips/pci/pci-octeon.c > > @@ -134,7 +134,7 @@ int pcibios_plat_dev_init(struct pci_dev *dev) > > dconfig); > > /* Enable reporting of all uncorrectable errors */ > > /* Uncorrectable Error Mask - turned on bits disable errors */ > > - pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_MASK, 0); > > + pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_MASK, 1); > > I see the text in the spec that says we should only write 1 to bit 0 (sec > 7.10.3, for anybody following along). It looks like that change was made > between PCIe r1.0 and r1.1. It would really be nice to have more context > about why the change was made, because if there's hardware in the field > that implements r1.0 behavior, this patch will change the way it works, and > I don't know how to verify that is safe. > > Does this actually change fix a problem? If it fixes a problem that > happens on real hardware, that's a much better reason to make a change than > just to comply with the spec. > > Sec 7.10.2 also says we should ignore the value of bit 0 in the > Uncorrectable Error Status register, and I don't see any place where we > follow that advice. > That's why I mark this patch as RFC. As you mentioned above, these are my concerns, too. I submit such a patch not for merging but throwing a potential issue. As I noted above, I don't know if it is deserved to fix all affected placed to comply with spec change. After all, no one reports such an issue (or maybe have happened :-))
Attachment:
signature.asc
Description: Digital signature