On Thu, 1 Feb 2024, Jonathan Woithe wrote:

> On Mon, Jan 22, 2024 at 02:45:20PM +0100, Igor Mammedov wrote:
> > On Mon, 22 Jan 2024 14:37:32 +0200 (EET)
> > Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> wrote:
> >
> > > On Mon, 22 Jan 2024, Jonathan Woithe wrote:
> > >
> > > > On Sun, Jan 21, 2024 at 02:54:22PM +0200, Andy Shevchenko wrote:
> > > > > On Thu, Jan 18, 2024 at 05:18:45PM +1030, Jonathan Woithe wrote:
> > > > > > On Thu, Jan 11, 2024 at 06:30:22PM +1030, Jonathan Woithe wrote:
> > > > > > > On Thu, Jan 04, 2024 at 10:48:53PM +1030, Jonathan Woithe wrote:
> > > > > > > > On Thu, Jan 04, 2024 at 01:12:10PM +0100, Igor Mammedov wrote:
> > > > > > > > > On Thu, 28 Dec 2023 18:57:00 +0200
> > > > > > > > > Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> wrote:
> > > > > > > > >
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > Here's a series that contains two fixes to PCI bridge window sizing
> > > > > > > > > > algorithm. Together, they should enable remove & rescan cycle to work
> > > > > > > > > > for a PCI bus that has PCI devices with optional resources and/or
> > > > > > > > > > disparity in BAR sizes.
> > > > > > > > > >
> > > > > > > > > > For the second fix, I chose to expose find_empty_resource_slot() from
> > > > > > > > > > kernel/resource.c because it should increase accuracy of the cannot-fit
> > > > > > > > > > decision (currently that function is called find_resource()). In order
> > > > > > > > > > to do that sensibly, a few improvements seemed in order to make its
> > > > > > > > > > interface and name of the function sane before exposing it. Thus, the
> > > > > > > > > > few extra patches on resource side.
> > > > > > > > > >
> > > > > > > > > > Unfortunately I don't have a reason to suspect these would help with
> > > > > > > > > > the issues related to the currently ongoing resource regression
> > > > > > > > > > thread [1].
> > > > Thanks, and understood. In this case the request from Igor was
> > > >
> > > > can you test this series on affected machine with broken kernel to see if
> > > > it's of any help in your case?
> > > >
> > > > The latest vanilla kernel (6.7) has (AFAIK) had the offending commit
> > > > reverted, so it's not a "broken" kernel in this respect. Therefore, if I've
> > > > understood the request correctly, working with that kernel won't produce the
> > > > desired test.
> > >
> > > Well, you can revert the revert again to get back to the broken state.
> >
> > either this or just a hand patching as Ilpo has suggested earlier
> > would do.
>
> No problem. This was the easiest approach for me and I have now done this.
> Apologies for the delay in getting to this: I ran out of time last Thursday.
>
> > There is non zero chance that this series might fix issues
> > Jonathan is facing. i.e. failed resource reallocation which
> > offending patches trigger.
>
> I can confirm that as expected, this patch series has had no effect on the
> system which experiences the failed resource reallocation. From syslog,
> running a 5.15.141+ kernel[1]:
>
>   kernel: radeon 0000:4b:00.0: Fatal error during GPU init
>   kernel: radeon: probe of 0000:4b:00.0 failed with error -12
>
> This is unchanged from what is seen with the unaltered 5.15.141 kernel.
>
> In case it's important, can also confirm that the errors related to the
> thunderbolt device are are also still present in the patched 5.15.141+
> kernel:
>
>   thunderbolt 0000:04:00.0: interrupt for TX ring 0 is already enabled
>   :
>   thunderbolt 0000:04:00.0: interrupt for RX ring 0 is already enabled
>   :
>
> Like the GPU failure, they do not appear in the working kernels on this
> system.
>
> Let me know if you would like to me to run further tests.
>
> Regards
>   jonathan
>
> [1] This is 5.15.141, patched with the series of interest here and the hand
>     patch from Ilpo.

Hi Jonathan,

Thanks a lot for testing it regardless.
The end result was not a big surprise given how things looked in the logs,
but it was certainly worth a test, as Igor mentioned. The resource
allocation code isn't among the easiest to track.

-- 
 i.