On Tue, Feb 19, 2019 at 03:01:39PM +0800, Jianjun Wang wrote:
> On Wed, 2019-01-23 at 15:40 +0000, Lorenzo Pieralisi wrote:
> > On Mon, Dec 24, 2018 at 07:40:28PM +0800, Jianjun Wang wrote:
> > > On Thu, 2018-12-20 at 12:20 -0600, Bjorn Helgaas wrote:
> > > > On Tue, Dec 18, 2018 at 05:19:24PM +0800, Jianjun Wang wrote:
> > > > > On Mon, 2018-12-17 at 15:46 +0000, Lorenzo Pieralisi wrote:
> > > > > > On Mon, Dec 17, 2018 at 08:32:47AM -0600, Bjorn Helgaas wrote:
> > > > > > > On Mon, Dec 17, 2018 at 04:19:39PM +0800, Jianjun Wang wrote:
> > > > > > > > On Thu, 2018-12-13 at 08:55 -0600, Bjorn Helgaas wrote:
> > > > > > > > > On Thu, Dec 06, 2018 at 09:09:13AM +0800, Jianjun Wang wrote:
> > > > > > > > > > The read value of BAR0 is 0xffff_ffff, it's size will be
> > > > > > > > > > calculated as 4GB in arm64 but bogus alignment values at
> > > > > > > > > > arm32, the pcie device and devices behind this bridge will
> > > > > > > > > > not be enabled. Fix it's BAR0 resource size to guarantee
> > > > > > > > > > the pcie devices will be enabled correctly.
> > > > > > > > > >
> > > > > > > > > So this is a hardware erratum? Per spec, a memory BAR has
> > > > > > > > > bit 0 hardwired to 0, and an IO BAR has bit 1 hardwired to
> > > > > > > > > 0.
> > > > > > > > >
> > > > > > > > Yes, it only works properly on 64bit platform.
> > > > > > > >
> > > > > > > I don't understand. BARs are supposed to work the same
> > > > > > > regardless of whether it's a 32- or 64-bit platform. If this is
> > > > > > > a workaround for a hardware defect, please just say that
> > > > > > > explicitly.
> > > > > > >
> > > > > > I do not understand this either. First thing to do is to describe
> > > > > > the problem properly so that we can actually find a solution to
> > > > > > it.
> > > > > >
> > > > > This BAR0 is a 64-bit memory BAR, the HW default values for this BAR
> > > > > is 0xffff_ffff_0000_0000 and it could not be changed except by
> > > > > config write operation.
> > > >
> > > > If you literally get 0xffff_ffff_0000_0000 when reading the BAR, that
> > > > is out of spec because the low-order 4 bits of a 64-bit memory BAR
> > > > cannot all be zero.
> > > >
> > > > A 64-bit BAR consumes two DWORDS in config space. For a 64-bit BAR0,
> > > > the DWORD at 0x10 contains the low-order bits, and the DWORD at 0x14
> > > > contains the upper 32 bits. Bits 0-3 of the low-order DWORD (the
> > > > one at 0x10) are read-only, and in this case should contain the value
> > > > 0b1100 (0xc). That means the range is prefetchable (bit 3 == 1) and
> > > > the BAR is 64 bits (bits 2:1 == 10).
> > > >
> > > Sorry, I have confused the HW default value and the read value of BAR
> > > size. The hardware default value is 0xffff_ffff_0000_000c, it's a 64-bit
> > > BAR with prefetchable range.
> > >
> > > When we start to decoding the BAR, the read value of BAR0 at 0x10 is
> > > 0x0c, and the value at 0x14 is 0xffff_ffff, so the read value of BAR
> > > size is 0xffff_ffff_0000_0000, which will be decoded to 0xffff_ffff, and
> > > it will be set to the end value of BAR0 resource in the pci_dev.
> > > >
> > > > > The calculated BAR size will be 0 in 32-bit platform since the
> > > > > phys_addr_t is a 32bit value in 32-bit platform.
> > > >
> > > > Either (1) this is a hardware defect that feeds incorrect data to the
> > > > BAR size calculation, or (2) there's a problem in the BAR size
> > > > calculation code. We need to figure out which one and work around or
> > > > fix it correctly.
> > > >
> > > The BAR size is calculated by the code (res->end - res->start + 1) is
> > > fine, I think it's a hardware defect because that we can not change the
> > > hardware default value or just disable it since we don't using it.
> >
> > Apologies for the delay in getting back to this.
> >
> > This looks like a kernel defect, not a HW defect.
> >
> > I need some time to make up my mind on what the right fix for this
> > but it is most certainly not this patch.
> >
> > Lorenzo
>
> Hi Lorenzo,
>
> Is there any better idea about this patch?

Hi,

I have not had time to investigate the issue in the core code that
triggers this defect, but this patch is not the solution to the problem.
It is a plaster that papers over it, so I won't merge it.

I would appreciate some help. If you could have a look at the core code
that triggers the failure, we can analyze what should be done to make it
work. I do not think it is a defect in your IP.

Lorenzo
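
For anyone following the arithmetic in this thread, here is a small
user-space sketch in C. It is only an illustration under stated
assumptions, not the kernel's code: the helper names (bar_combine,
bar_extent, res_size64, res_size32, bar0_example) are hypothetical, and
the sizing rule is the usual one of taking the lowest address bit that
reads back as 1. It feeds in the values quoted above (0x0c at 0x10,
0xffff_ffff at 0x14) and shows that the extent is 0xffff_ffff, and that
end - start + 1 is 4GB in a 64-bit type but wraps to 0 in a 32-bit one.

```c
#include <assert.h>
#include <stdint.h>

/* Low 4 bits of a memory BAR are flag bits, not address bits. */
#define MEM_BAR_FLAG_BITS 0xfULL

/* Bits 2:1 == 10 mark a 64-bit memory BAR. */
static int bar_is_64bit(uint32_t lo)
{
	return ((lo >> 1) & 0x3) == 0x2;
}

/* Bit 3 == 1 marks a prefetchable range. */
static int bar_is_prefetchable(uint32_t lo)
{
	return (lo >> 3) & 0x1;
}

/* Join the DWORD at 0x10 (low) and the DWORD at 0x14 (high). */
static uint64_t bar_combine(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}

/*
 * Extent (size - 1) implied by a read-back value: mask off the flag
 * bits, isolate the lowest set address bit, subtract one.
 * Returns 0 when no address bit is set.
 */
static uint64_t bar_extent(uint64_t readback)
{
	uint64_t bits = readback & ~MEM_BAR_FLAG_BITS;

	if (bits == 0)
		return 0;
	return (bits & ~(bits - 1)) - 1;
}

/* end - start + 1 when the resource type is 64 bits wide. */
static uint64_t res_size64(uint64_t start, uint64_t end)
{
	return end - start + 1;
}

/* The same expression when the resource type is only 32 bits wide. */
static uint32_t res_size32(uint32_t start, uint32_t end)
{
	return end - start + 1;
}

/* Walk through the values quoted in this thread. */
static void bar0_example(void)
{
	uint32_t lo = 0x0000000c, hi = 0xffffffff;
	uint64_t readback = bar_combine(lo, hi);
	uint64_t extent = bar_extent(readback);

	assert(bar_is_64bit(lo) && bar_is_prefetchable(lo));
	assert(readback == 0xffffffff0000000cULL);
	assert(extent == 0xffffffffULL);                   /* res->end */
	assert(res_size64(0, extent) == 0x100000000ULL);   /* 4GB */
	assert(res_size32(0, (uint32_t)extent) == 0);      /* wraps to 0 */
}
```

Calling bar0_example() from a test program runs the checks. The point
of the two res_size helpers is only to show that the same expression
gives different results depending on the width of the type holding it.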