On Thu, Oct 19, 2023 at 07:52:26PM +0300, Andy Shevchenko wrote: > On Wed, Oct 18, 2023 at 03:41:24PM -0700, Dmitry Torokhov wrote: > > On Wed, Oct 18, 2023 at 08:01:23AM +0300, Andy Shevchenko wrote: > > > On Tue, Oct 17, 2023 at 02:43:01PM -0700, Dmitry Torokhov wrote: > > > > On Tue, Oct 17, 2023 at 10:45:39PM +0300, Andy Shevchenko wrote: > > > > > > Thanks for your response. ... > > > > I wonder, could you please post entire dmesg for your system? > > > > > > Working, non-working or both? > > > > Non working, especially if you also enable debug logs in > > drivers/mmc/host/sdhci-pci-core.c. > > Here we are > https://paste.debian.net/hidden/5d778105/ For the sake of completeness https://paste.debian.net/hidden/149933ac/ the working case on the same codebase (the hash is different due to patch that changes couple of BUG*() to WARN*(), other than that the code is identical). > > What I do not quite understand is that I think we should not be hitting > > the case where pinctrl is already created for the device, which is the > > code path my patch was changing. IIUIC we should be mostly executing the > > "pinctrl not found" path and that did not really change. Maybe you could > > also put some more annotations to show how/at what exact point the probe > > order changed? Maybe log find_pinctrl() calls and compare? > > I see this order in dmesg > [ 48.429681] sdhci-pci 0000:00:01.2: Mapped GSI37 to IRQ79 > [ 48.436219] sdhci-pci 0000:00:01.0: Mapped GSI0 to IRQ80 > [ 48.450347] sdhci-pci 0000:00:01.3: Mapped GSI38 to IRQ81 > > which suggests that PCI enabling devices are happening in parallel > (pcim_enable_device() in SDHCI PCI driver) and whoever wins first gets > the ID via IDA (see mmc_alloc_host() implementation). But PCI itself > guarantees that function 0 has to be always present, so the PCI itself > enumerates it _always_ in the same order (and we are talking about exactly > BDF == x:y.0 in this case). > > > Linus, BTW, I think there are more problems there with pinctrl lookup, > > because, if we assume there are concurrent accesses to pinctrl_get(), > > the fact that we did not find an instance while scanning the list does > > not mean we will not find it when we go to insert a newly created one. > > > > Another problem, as far as I can see, that there is not really a defined > > owner of pinctrl structure, it is created on demand, and destroyed when > > last user is gone. So if we execute last pintctrl_put() and there is > > another pinctrl_get() running simultaneously, we may get and bump up the > > refcount, and then release (pinctrl_free) will acquire the mutex, and > > zap the structure. > > > > Given that there are more issues in that code, maybe we should revert > > the patch for now so Andy has a chance to convert to UUID/LABEL booting? > > I'm testing a PoC of the script, so looks promising, but needs more time to > check other possibilities (see below) and deploy. ... > > > > I think the right answer is "fix the userspace" really in this case. We > > > > could also try extend of_alias_get_id() to see if we could pass some > > > > preferred numbering on x86. But this will again be fragile if the > > > > knowledge resides in the driver and is not tied to a particular board > > > > (as it is in DT case): there could be multiple controllers, things will > > > > be shifting board to board... > > > > > > Any suggestion how should it be properly done in the minimum shell environment? > > > (Busybox uses mdev with static tables IIRC and there is no fancy udev or so) > > > > I'm not sure, so you have something like blkid running? You just need to > > locate the device and chroot there. This assumes you do have initramfs. > > blkid shows UUID for the partition of interest and it doesn't have any label, > OTOH I could parse it for the specific template, while it's less reliable than > going via sysfs from PCI device name, that's defined by hardware and may not be > changed. -- With Best Regards, Andy Shevchenko