Re: [PATCH 2/2] PCI: mediatek: Add controller support for MT7629

Lorenzo Pieralisi <lorenzo.pieralisi@xxxxxxx> · Tue, 19 Feb 2019 15:03:53 +0000

On Tue, Feb 19, 2019 at 03:01:39PM +0800, Jianjun Wang wrote:
> On Wed, 2019-01-23 at 15:40 +0000, Lorenzo Pieralisi wrote:
> > On Mon, Dec 24, 2018 at 07:40:28PM +0800, Jianjun Wang wrote:
> > > On Thu, 2018-12-20 at 12:20 -0600, Bjorn Helgaas wrote:
> > > > On Tue, Dec 18, 2018 at 05:19:24PM +0800, Jianjun Wang wrote:
> > > > > On Mon, 2018-12-17 at 15:46 +0000, Lorenzo Pieralisi wrote:
> > > > > > On Mon, Dec 17, 2018 at 08:32:47AM -0600, Bjorn Helgaas wrote:
> > > > > > > On Mon, Dec 17, 2018 at 04:19:39PM +0800, Jianjun Wang wrote:
> > > > > > > > On Thu, 2018-12-13 at 08:55 -0600, Bjorn Helgaas wrote:
> > > > > > > > > On Thu, Dec 06, 2018 at 09:09:13AM +0800, Jianjun Wang wrote:
> > > > > > > > > > The value read from BAR0 is 0xffff_ffff; its size is
> > > > > > > > > > calculated as 4GB on arm64 but yields bogus alignment
> > > > > > > > > > values on arm32, so the PCIe device and the devices behind
> > > > > > > > > > this bridge are not enabled. Fix up the BAR0 resource size
> > > > > > > > > > to guarantee that the PCIe devices are enabled correctly.
> > > > > > > > > 
> > > > > > > > > So this is a hardware erratum?  Per spec, a memory BAR has
> > > > > > > > > bit 0 hardwired to 0, and an IO BAR has bit 1 hardwired to
> > > > > > > > > 0.
> > > > > > > > 
> > > > > > > > Yes, it only works properly on 64-bit platforms.
> > > > > > > 
> > > > > > > I don't understand.  BARs are supposed to work the same
> > > > > > > regardless of whether it's a 32- or 64-bit platform.  If this is
> > > > > > > a workaround for a hardware defect, please just say that
> > > > > > > explicitly.
> > > > > > 
> > > > > > I do not understand this either. First thing to do is to describe
> > > > > > the problem properly so that we can actually find a solution to
> > > > > > it.
> > > > > 
> > > > > BAR0 is a 64-bit memory BAR. The HW default value of this BAR is
> > > > > 0xffff_ffff_0000_0000, and it cannot be changed except by a
> > > > > config write operation.
> > > > 
> > > > If you literally get 0xffff_ffff_0000_0000 when reading the BAR, that
> > > > is out of spec because the low-order 4 bits of a 64-bit memory BAR
> > > > cannot all be zero.
> > > > 
> > > > A 64-bit BAR consumes two DWORDS in config space.  For a 64-bit BAR0,
> > > > the DWORD at 0x10 contains the low-order bits, and the DWORD at 0x14
> > > > contains the upper 32 bits.  Bits 0-3 of the low-order DWORD (the
> > > > one at 0x10) are read-only, and in this case should contain the value
> > > > 0b1100 (0xc).  That means the range is prefetchable (bit 3 == 1) and
> > > > the BAR is 64 bits (bits 2:1 == 10).
> > > 
> > > Sorry, I confused the HW default value with the read value of the BAR
> > > size. The hardware default value is 0xffff_ffff_0000_000c; it's a
> > > 64-bit BAR with a prefetchable range.
> > > 
> > > When we start decoding the BAR, the value read from BAR0 at 0x10 is
> > > 0x0c and the value at 0x14 is 0xffff_ffff, so the read value of the
> > > BAR size is 0xffff_ffff_0000_0000, which is decoded to 0xffff_ffff and
> > > set as the end value of the BAR0 resource in the pci_dev.
> > > > 
> > > > > The calculated BAR size will be 0 on a 32-bit platform, where
> > > > > phys_addr_t is a 32-bit value.
> > > > 
> > > > Either (1) this is a hardware defect that feeds incorrect data to the
> > > > BAR size calculation, or (2) there's a problem in the BAR size
> > > > calculation code.  We need to figure out which one and work around or
> > > > fix it correctly.
> > > 
> > > The code that calculates the BAR size (res->end - res->start + 1) is
> > > fine. I think it's a hardware defect, because we cannot change the
> > > hardware default value or simply disable the BAR, which we don't use.
> > 
> > Apologies for the delay in getting back to this.
> > 
> > This looks like a kernel defect, not a HW defect.
> > 
> > I need some time to make up my mind on what the right fix for this
> > is, but it is most certainly not this patch.
> > 
> > Lorenzo
> 
> Hi Lorenzo,
> 
> Is there any better idea about this patch?

Hi,

I did not have time to investigate the issue in core code that triggers
this defect, but this patch is not the solution to the problem; it is a
plaster that papers over it, so I won't merge it.

I would appreciate some help. If you could have a look at the core code
that triggers the failure, we can analyze what should be done to make it
work. I do not think it is a defect in your IP.

Lorenzo
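
For reference, the arithmetic discussed in this thread can be sketched in
plain C. This is only a minimal illustration under stated assumptions, not
the kernel's probe code: the helper names (bar_join, bar_extent, size_u64,
size_u32, check_thread_values) are made up for this note, the resource
start is assumed to be 0, and uint32_t merely stands in for a 32-bit
address type.

```c
#include <assert.h>
#include <stdint.h>

/* All names below are hypothetical; this is not the kernel's probe code. */

/* Bits 3:0 of the low DWORD of a memory BAR are read-only flags. */
static int bar_is_64bit(uint32_t lo)        { return ((lo >> 1) & 0x3) == 0x2; }
static int bar_is_prefetchable(uint32_t lo) { return (lo >> 3) & 0x1; }

/* Combine the DWORDs at 0x10 (low) and 0x14 (high) into one 64-bit value. */
static uint64_t bar_join(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}

/*
 * Turn a sizing readback into "size - 1": mask off the flag bits, then
 * isolate the lowest set address bit. Returns 0 if no address bit is set.
 */
static uint64_t bar_extent(uint64_t readback)
{
	uint64_t bits = readback & ~0xfULL;

	if (!bits)
		return 0;
	return (bits & ~(bits - 1)) - 1;
}

/* end - start + 1, once with a 64-bit and once with a 32-bit type. */
static uint64_t size_u64(uint64_t start, uint64_t end) { return end - start + 1; }
static uint32_t size_u32(uint32_t start, uint32_t end) { return end - start + 1; }

/* Self-check using the values quoted in this thread (start assumed 0). */
static void check_thread_values(void)
{
	uint32_t lo = 0x0000000c, hi = 0xffffffff;
	uint64_t end = bar_extent(bar_join(lo, hi));

	assert(bar_is_64bit(lo) && bar_is_prefetchable(lo));
	assert(end == 0xffffffffULL);                /* resource end value */
	assert(size_u64(0, end) == 0x100000000ULL);  /* 4GB with 64-bit math */
	assert(size_u32(0, (uint32_t)end) == 0);     /* wraps to 0 in 32 bits */
}
```

With the values from the thread, the decoded extent is 0xffff_ffff, the
64-bit size is 4GB, and the same expression evaluated in 32 bits wraps to
0, which matches the behaviour reported for the 32-bit platform.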