Re: [PATCH v2 0/7] PCI: Solve two bridge window sizing issues

On Thu, 1 Feb 2024, Jonathan Woithe wrote:

> On Mon, Jan 22, 2024 at 02:45:20PM +0100, Igor Mammedov wrote:
> > On Mon, 22 Jan 2024 14:37:32 +0200 (EET)
> > Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> wrote:
> > 
> > > On Mon, 22 Jan 2024, Jonathan Woithe wrote:
> > > 
> > > > On Sun, Jan 21, 2024 at 02:54:22PM +0200, Andy Shevchenko wrote:  
> > > > > On Thu, Jan 18, 2024 at 05:18:45PM +1030, Jonathan Woithe wrote:  
> > > > > > On Thu, Jan 11, 2024 at 06:30:22PM +1030, Jonathan Woithe wrote:  
> > > > > > > On Thu, Jan 04, 2024 at 10:48:53PM +1030, Jonathan Woithe wrote:  
> > > > > > > > On Thu, Jan 04, 2024 at 01:12:10PM +0100, Igor Mammedov wrote:  
> > > > > > > > > On Thu, 28 Dec 2023 18:57:00 +0200
> > > > > > > > > Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> wrote:
> > > > > > > > >   
> > > > > > > > > > Hi all,
> > > > > > > > > > 
> > > > > > > > > > > Here's a series that contains two fixes to the PCI bridge window sizing
> > > > > > > > > > > algorithm. Together, they should enable a remove & rescan cycle to work
> > > > > > > > > > for a PCI bus that has PCI devices with optional resources and/or
> > > > > > > > > > disparity in BAR sizes.
> > > > > > > > > > 
> > > > > > > > > > For the second fix, I chose to expose find_empty_resource_slot() from
> > > > > > > > > > kernel/resource.c because it should increase accuracy of the cannot-fit
> > > > > > > > > > decision (currently that function is called find_resource()). In order
> > > > > > > > > > to do that sensibly, a few improvements seemed in order to make its
> > > > > > > > > > interface and name of the function sane before exposing it. Thus, the
> > > > > > > > > > few extra patches on resource side.
> > > > > > > > > > 
> > > > > > > > > > Unfortunately I don't have a reason to suspect these would help with
> > > > > > > > > > the issues related to the currently ongoing resource regression
> > > > > > > > > > thread [1].  

> > > > Thanks, and understood.  In this case the request from Igor was 
> > > > 
> > > >     can you test this series on affected machine with broken kernel to see if
> > > >     it's of any help in your case?
> > > > 
> > > > The latest vanilla kernel (6.7) has (AFAIK) had the offending commit
> > > > reverted, so it's not a "broken" kernel in this respect.  Therefore, if I've
> > > > understood the request correctly, working with that kernel won't produce the
> > > > desired test.  
> > > 
> > > Well, you can revert the revert again to get back to the broken state.
> > 
> > either this or just a hand patching as Ilpo has suggested earlier
> > would do.
> 
> No problem.  This was the easiest approach for me and I have now done this. 
> Apologies for the delay in getting to this: I ran out of time last Thursday.
> 
> > There is non zero chance that this series might fix issues
> > Jonathan is facing. i.e. failed resource reallocation which
> > offending patches trigger.
> 
> I can confirm that as expected, this patch series has had no effect on the
> system which experiences the failed resource reallocation.  From syslog,
> running a 5.15.141+ kernel[1]:
> 
>     kernel: radeon 0000:4b:00.0: Fatal error during GPU init
>     kernel: radeon: probe of 0000:4b:00.0 failed with error -12
> 
> This is unchanged from what is seen with the unaltered 5.15.141 kernel.
> 
> In case it's important, I can also confirm that the errors related to the
> thunderbolt device are also still present in the patched 5.15.141+
> kernel:
> 
>     thunderbolt 0000:04:00.0: interrupt for TX ring 0 is already enabled
>     :
>     thunderbolt 0000:04:00.0: interrupt for RX ring 0 is already enabled
>     :
> 
> Like the GPU failure, they do not appear in the working kernels on this
> system.
> 
> Let me know if you would like me to run further tests.
> 
> Regards
>   jonathan
> 
> [1] This is 5.15.141, patched with the series of interest here and the hand
>     patch from Ilpo.

Hi Jonathan,

Thanks a lot for testing it regardless. The end result was not a big
surprise given how things looked in the logs, but it was certainly
worth a test, as Igor mentioned. The resource allocation code isn't among
the easiest to follow.
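
For anyone repeating the comparison between kernels, here is a minimal
sketch of how the two failure signatures quoted above could be checked for
in a saved kernel log. The script, its function name and the sample input
are illustrative assumptions and not part of the patch series; the only
outside fact it relies on is that errno 12 is ENOMEM on Linux.

```python
import errno
import re

# Patterns modelled on the log lines quoted in this thread (assumed formats).
PROBE_FAIL = re.compile(r"(\w+): probe of (\S+) failed with error -(\d+)")
TB_RING = re.compile(
    r"thunderbolt (\S+): interrupt for (TX|RX) ring (\d+) is already enabled"
)


def scan_log(lines):
    """Return (probe_failures, ring_warnings) found in an iterable of log lines."""
    probe_failures = []
    ring_warnings = []
    for line in lines:
        m = PROBE_FAIL.search(line)
        if m:
            err = int(m.group(3))
            # Map the numeric errno to its symbolic name where one is known.
            name = errno.errorcode.get(err, "unknown")
            probe_failures.append((m.group(1), m.group(2), err, name))
            continue
        m = TB_RING.search(line)
        if m:
            ring_warnings.append((m.group(1), m.group(2), int(m.group(3))))
    return probe_failures, ring_warnings


if __name__ == "__main__":
    sample = [
        "kernel: radeon 0000:4b:00.0: Fatal error during GPU init",
        "kernel: radeon: probe of 0000:4b:00.0 failed with error -12",
        "thunderbolt 0000:04:00.0: interrupt for TX ring 0 is already enabled",
    ]
    failures, warnings = scan_log(sample)
    for driver, dev, err, name in failures:
        print(f"{driver} {dev}: probe failed with -{err} ({name})")
    for dev, direction, ring in warnings:
        print(f"thunderbolt {dev}: {direction} ring {ring} already enabled")
```

Running the same scan on logs from a working and a non-working kernel makes
it easy to see whether a given patch changes either symptom.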


-- 
 i.
