Re: [PATCH v2] platform/x86: p2sb: Defer P2SB device scan when P2SB device has func 0

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mar 03, 2024 / 20:35, Hans de Goede wrote:
> Hi Shinichiro,
> 
> On 3/3/24 00:37, Shinichiro Kawasaki wrote:
> > On Mar 02, 2024 / 12:43, Hans de Goede wrote:
> 
> <snip>
> 
> >> Can we not simply just skip scanning function 0 all together when
> >> on Goldmont? I don't think any drivers actually ask for the bar
> >> of function 0 on Goldmont ?
> > 
> > Hi Hans, thank you for the idea. If we can take this appraoch, the fix patch
> > will be simpler.
> > 
> >>
> >> This is likely also why we never had the issue with the old p2sb_bar()
> >> code, because that never touched function 0.
> >>
> >> I think this is actually what you did in one of your first test
> >> patches in the bugzilla, right ?
> > 
> > To be precise, the first test patch did P2SB scan only for the function 2.
> > To make sure your idea works, it's the better to test to scan all the function
> > numbers except 0, from 1 to 7.
> > 
> >>
> >> So maybe audit all the callers of p2sb_bar() and see if any
> >> caller asks for function 0 on goldmont ?
> >>
> >> Let me know if you need help with this audit.
> > 
> > Help for the audit will be appreciated.
> > 
> > With the quick grep for p2sb_bar() [2], there are five p2sb_bar() callers:
> 
> Ack, I have found the same 5 callers, let go over them one by one:
> 
> >  1) edac/pnd2_edac             devfn = 0
> 
> Hmm, ok so this one binds based on CPU-ids:
> 
> static const struct x86_cpu_id pnd2_cpuids[] = {
>         X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT,       &apl_ops),
>         X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT_D,     &dnv_ops),
>         { }
> };
> MODULE_DEVICE_TABLE(x86cpu, pnd2_cpuids);
> 
> And the 0 passed here will get replaced by PCI_DEVFN(13, 0),
> so there goes my theory of p2sb() never being called for
> function 0.

Thank you very much fot the audit. So, it was clarified that the p2sb_bar()
can be called for function 0 on goldmont.

> 
> So I have taken a quick look at your latest patch from:
> https://bugzilla.kernel.org/show_bug.cgi?id=218531
> 
> I think that skipping the caching at fs_initcall() on goldmont
> is a good idea.
> 
> But you still cache *all* the bars for goldmont on the first
> p2sb_bar(bus, 0, &res) call .
> 
> If we delay caching the bars till there first use, why not
> just do that for all the bars and also drop p2sb_scan_and_cache()
> which for non goldmont is equivalent to p2sb_scan_and_cache_devfn()
> but on goldmont caches all the functions.
> 
> Since you now delay caching (on goldmont) to the first p2sb_bar()
> call I think that you can just drop p2sb_scan_and_cache()
> altogether and just directly call p2sb_scan_and_cache_devfn()
> in its place.
> 
> This means that on goldmont where both the p2sb devfn
> PCI_DEVFN(13, 0) and the SPI controller PCI_DEVFN(13, 2)
> are used we end up going through p2sb_cache_resources()
> twice, assuming both are actually requested at least once.
> But with your current patch this will also happen when
> PCI_DEVFN(13, 2) gets requested first because then
> p2sb_scan_and_cache() will enter the "not function 0"
> path and only cache the one resource.
> 
> So I think that it would make things more KISS if
> p2sb_bar() always only caches the requested devfn bar0
> instead of treating function0 special as it does now.

Thank you again for looking into the patch. I agree that the "function 0" path
in p2sb_scan_and_cache() is not meaningful any more. When I prepare v3 patch,
I will modify the patch to call p2sb_scan_and_cache_devfn() in place of
p2sb_scan_and_cache().

> 
> Also talking about making things more KISS, how
> about completely dropping the fs_initcall and
> simply always delay the caching of a devfn until
> the first call of p2sb_bar() for that devfn ?
> 
> That way fixing the issue will also actually reduce /
> simplify the code :)

This will simplify the code more, but it has two drawabacks:

1) It still leaves the rare deadlock scenario. If the drivers which call
   p2sb_bar() are not probed during boot, and if they are probed afterwards by
   sysfs pci bus rescan, pci_rescan_remove_lock causes the deadlock.

2) It triggers lockdep splat for pci_rescan_remove_lock at sysfs pci bus rescan,
   even for devices unrelated to p2sb (This is what I regularly observe during
   kernel tests for storage sub-system.)

I suggest to limit these drawbacks only on goldmont.

On the other hand, it means that the drawbacks will be still left on goldmont.
I would like to follow your call. If you think the "no fs_initcall() and always
delay the caching" is the best, I'm willing to prepare the patch for it.




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux