Bjorn, please weigh in on this - please see below. On Wed, Jun 19, 2019 at 06:49:45PM -0600, Logan Gunthorpe wrote: > > > On 2019-06-19 6:44 p.m., Nicholas Johnson wrote: > >> This is all kinds of confusing... the bug report just seems to be a copy > >> of the patch set. The description of the actual symptoms of the problem > >> appears to be missing from all of it. > > I believe everything to be there, but I can take another look and add > > more details. It is possible I lost track of what I had written where. > > > > There are common elements which I borrowed from the patchset or > > vice-versa, like the pin diagram for using the Thunderbolt add-in card > > for testing. > > What's missing are symptoms of the bug or what you are actually seeing > with what hardware. The closest thing to that is the bug's title. But > it's not clear what the problem is with having a double size MMIO window. > > The pin diagram and stuff is just noise to me because I don't have that > hardware and am not going to buy it just to try to figure out if there > is a bug there or not. > > >> > >>> Currently, the kernel can sometimes assign the MMIO_PREF window > >>> additional size into the MMIO window, resulting in double the MMIO > >>> additional size, even if the MMIO_PREF window was successful. > >>> > >>> This happens if in the first pass, the MMIO_PREF succeeds but the MMIO > >>> fails. In the next pass, because MMIO_PREF is already assigned, the > >>> attempt to assign MMIO_PREF returns an error code instead of success > >>> (nothing more to do, already allocated). > >>> > >>> Example of problem (more context can be found in the bug report URL): > >>> > >>> Mainline kernel: > >>> pci 0000:06:01.0: BAR 14: assigned [mem 0x90100000-0xa00fffff] = 256M > >>> pci 0000:06:04.0: BAR 14: assigned [mem 0xa0200000-0xb01fffff] = 256M > >>> > >>> Patched kernel: > >>> pci 0000:06:01.0: BAR 14: assigned [mem 0x90100000-0x980fffff] = 128M > >>> pci 0000:06:04.0: BAR 14: assigned [mem 0x98200000-0xa01fffff] = 128M > >>> > >>> This was using pci=realloc,hpmemsize=128M,nocrs - on the same machine > >>> with the same configuration, with a Ubuntu mainline kernel and a kernel > >>> patched with this patch series. > >>> > >>> This patch is vital for the next patch in the series. The next patch > >>> allows the user to specify MMIO and MMIO_PREF independently. If the > >>> MMIO_PREF is set to be very large, this bug will end up more than > >>> doubling the MMIO size. The bug results in the MMIO_PREF being added to > >>> the MMIO window, which means doubling if MMIO_PREF size == MMIO size. > >>> With a large MMIO_PREF, without this patch, the MMIO window will likely > >>> fail to be assigned altogether due to lack of 32-bit address space. > >>> > >>> Patch notes > >>> ========================================================================== > >>> > >>> Change find_free_bus_resource() to not skip assigned resources with > >>> non-null parent. > >>> > >>> Add checks in pbus_size_io() and pbus_size_mem() to return success if > >>> resource returned from find_free_bus_resource() is already allocated. > >>> > >>> This avoids pbus_size_io() and pbus_size_mem() returning error code to > >>> __pci_bus_size_bridges() when a resource has been successfully assigned > >>> in a previous pass. This fixes the existing behaviour where space for a > >>> resource could be reserved multiple times in different parent bridge > >>> windows. This also greatly reduces the number of failed BAR messages in > >>> dmesg when Linux assigns resources. > >> > >> This patch looks like the same bug that I tracked down earlier but I > >> solved in a slightly different way. See this patch[1] which is still > >> under review. Can you maybe test it and see if it solves the same problem? > > > > I read [1] and it is definitely the same bug, without a doubt. This is > > fantastic because it means I have somebody to back me up on this. I will > > test the patch as soon as I can - perhaps after work today. > > > > My initial thoughts of [1] patch are that restricting 64-bit BARs to > > 64-bit windows might break assigning 64-bit BARs on bridges without the > > optional prefetchable window. My patch should not have that issue - but > > after I have tested [1], it might turn out to be fine. > > > > Correct me if I am wrong about assumptions about windows. My > > understanding cannot be perfect. As far as I know, 64-bit BARs should > > always be prefetchable, but I own the Aquantia AQC-107S NIC and it has > > three 64-bit non-pref BARs. It happens that they are assigned into the > > 32-bit window. I will see if [1] patch prevents that from happening or > > not. > > As best as I can tell the patches should have identical functionality. > My patch ignores the error returned by pbus_size_mem() your patch forces > the function from returning an error inside it for the same case. > > Logan I finally tested this (not rigorously) and it appears to work the same as my patch and did not do anything strange. It solves the problem that it set out to fix. If you want to add my tested-by then that is fine. I still slightly prefer my patch because it corrects the return codes instead of ignoring them. However, both patches have merits. Bjorn, please advise on what you think is better and / or easier to sign off on. In my PATCH v7, should I consider moving this to the end of the series so that any changes do not impact anything else? I.e. can remove the patch if we take Logan's version instead. Also, if you decide to take Logan's patch, which of the following do I do? a) drop the patch from my series and leave it at that (in this case I hope I can have a co-reported-by in Logan's patch when it gets accepted) b) merge Logan's patch into my series, giving credit c) something else If you decide to take mine then we will need to discuss Logan's concerns about my documentation and I will need to update some more information into the bug report (or link both bug reports into the notes). Cheers