On Thu, Aug 17, 2023 at 4:13 PM Shreeya Patel <shreeya.patel@xxxxxxxxxxxxx> wrote:
>
> Hi Geert, Saravana,
>
> On 18/08/23 00:03, Saravana Kannan wrote:
> > On Thu, Aug 17, 2023 at 4:37 AM Shreeya Patel
> > <shreeya.patel@xxxxxxxxxxxxx> wrote:
> >> Hi Greg,
> >>
> >> On 16/08/23 20:33, Greg Kroah-Hartman wrote:
> >>> On Wed, Aug 16, 2023 at 03:09:27PM +0530, Shreeya Patel wrote:
> >>>> On 13/06/22 15:40, Greg Kroah-Hartman wrote:
> >>>>> From: Saravana Kannan<saravanak@xxxxxxxxxx>
> >>>>>
> >>>>> [ Upstream commit 5ee76c256e928455212ab759c51d198fedbe7523 ]
> >>>>>
> >>>>> Mounting NFS rootfs was timing out when deferred_probe_timeout was
> >>>>> non-zero [1]. This was because ip_auto_config() initcall times out
> >>>>> waiting for the network interfaces to show up when
> >>>>> deferred_probe_timeout was non-zero. While ip_auto_config() calls
> >>>>> wait_for_device_probe() to make sure any currently running deferred
> >>>>> probe work or asynchronous probe finishes, that wasn't sufficient to
> >>>>> account for devices being deferred until deferred_probe_timeout.
> >>>>>
> >>>>> Commit 35a672363ab3 ("driver core: Ensure wait_for_device_probe() waits
> >>>>> until the deferred_probe_timeout fires") tried to fix that by making
> >>>>> sure wait_for_device_probe() waits for deferred_probe_timeout to expire
> >>>>> before returning.
> >>>>>
> >>>>> However, if wait_for_device_probe() is called from the kernel_init()
> >>>>> context:
> >>>>>
> >>>>> - Before deferred_probe_initcall() [2], it causes the boot process to
> >>>>> hang due to a deadlock.
> >>>>>
> >>>>> - After deferred_probe_initcall() [3], it blocks kernel_init() from
> >>>>> continuing till deferred_probe_timeout expires and beats the point of
> >>>>> deferred_probe_timeout that's trying to wait for userspace to load
> >>>>> modules.
> >>>>>
> >>>>> Neither of this is good. So revert the changes to
> >>>>> wait_for_device_probe().
> >>>>>
> >>>>> [1] -https://lore.kernel.org/lkml/TYAPR01MB45443DF63B9EF29054F7C41FD8C60@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
> >>>>> [2] -https://lore.kernel.org/lkml/YowHNo4sBjr9ijZr@dev-arch.thelio-3990X/
> >>>>> [3] -https://lore.kernel.org/lkml/Yo3WvGnNk3LvLb7R@xxxxxxxxxxxxx/
> >>>> Hi Saravana, Greg,
> >>>>
> >>>>
> >>>> KernelCI found this patch causes the baseline.bootrr.deferred-probe-empty test to fail on r8a77960-ulcb,
> >>>> see the following details for more information.
> >>>>
> >>>> KernelCI dashboard link:
> >>>> https://linux.kernelci.org/test/plan/id/64d2a6be8c1a8435e535b264/
> >>>>
> >>>> Error messages from the logs :-
> >>>>
> >>>> + UUID=11236495_1.5.2.4.5
> >>>> + set +x
> >>>> + export 'PATH=/opt/bootrr/libexec/bootrr/helpers:/lava-11236495/1/../bin:/sbin:/usr/sbin:/bin:/usr/bin'
> >>>> + cd /opt/bootrr/libexec/bootrr
> >>>> + sh helpers/bootrr-auto
> >>>> e6800000.ethernet
> >>>> e6700000.dma-controller
> >>>> e7300000.dma-controller
> >>>> e7310000.dma-controller
> >>>> ec700000.dma-controller
> >>>> ec720000.dma-controller
> >>>> fea20000.vsp
> >>>> feb00000.display
> >>>> fea28000.vsp
> >>>> fea30000.vsp
> >>>> fe9a0000.vsp
> >>>> fe9af000.fcp
> >>>> fea27000.fcp
> >>>> fea2f000.fcp
> >>>> fea37000.fcp
> >>>> sound
> >>>> ee100000.mmc
> >>>> ee140000.mmc
> >>>> ec500000.sound
> >>>> /lava-11236495/1/../bin/lava-test-case
> >>>> <8>[ 17.476741] <LAVA_SIGNAL_TESTCASE TEST_CASE_ID=deferred-probe-empty RESULT=fail>
> >>>>
> >>>> Test case failing :-
> >>>> Baseline Bootrr deferred-probe-empty test -https://github.com/kernelci/bootrr/blob/main/helpers/bootrr-generic-tests
> >>>>
> >>>> Regression Reproduced :-
> >>>>
> >>>> Lava job after reverting the commit 5ee76c256e92
> >>>> https://lava.collabora.dev/scheduler/job/11292890
> >>>>
> >>>>
> >>>> Bisection report from KernelCI can be found at the bottom of the email.
> >>>>
> >>>> Thanks,
> >>>> Shreeya Patel
> >>>>
> >>>> #regzbot introduced: 5ee76c256e92
> >>>> #regzbot title: KernelCI: Multiple devices deferring on r8a77960-ulcb
> >>>>
> >>>> ---------------------------------------------------------------------------------------------------------------------------------------------------
> >>>>
> >>>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * **
> >>>> * If you do send a fix, please include this trailer: *
> >>>> * Reported-by: "kernelci.org bot" <bot@...> *
> >>>> * *
> >>>> * Hope this helps! *
> >>>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> >>>>
> >>>> stable-rc/linux-5.10.y bisection: baseline.bootrr.deferred-probe-empty on
> >>>> r8a77960-ulcb
> >>> You are testing 5.10.y, yet the subject says 5.17?
> >>>
> >>> Which is it here?
> >> Sorry, I accidentally used the lore link for 5.17 while reporting this
> >> issue,
> >> but this test does fail on all the stable releases from 5.10 onwards.
> >>
> >> stable 5.15 :-
> >> https://linux.kernelci.org/test/case/id/64dd156a5ac58d0cf335b1ea/
> >> mainline :-
> >> https://linux.kernelci.org/test/case/id/64dc13d55cb51357a135b209/
> >>
> > Shreeya, can you try the patch Geert suggested and let us know if it
> > helps? If not, then I can try to take a closer look.
>
> I tried to test the kernel with 9be4cbd09da8 but it didn't change the
> result.
> https://lava.collabora.dev/scheduler/job/11311615
>
> Also, I am not sure if this can change things but just FYI, KernelCI
> adds some kernel parameters when running these tests and one of the
> parameter is deferred_probe_timeout=60.

Ah this is good to know.

> You can check this in the definition details given in the Lava job. I
> also tried to remove this parameter and rerun the test but again I got
> the same result.

How long does the test wait after boot before checking for the deferred
devices list?

> I will try to add 9be4cbd09da8 to mainline kernel and see what results I
> get.

Now I'm confused. What do you mean by mainline? Are you saying the tip
of tree of Linus's tree is also hitting this issue?

-Saravana
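
To make the timing question above concrete: the LAVA signal in the quoted log is stamped at roughly 17 s after boot, while the job passes deferred_probe_timeout=60. Below is a minimal sketch of a check that polls the deferred list until a deadline instead of sampling it once. It is not how bootrr implements deferred-probe-empty. It assumes debugfs is mounted at /sys/kernel/debug and that the list is exposed as devices_deferred there; the function names and the 60 s default are hypothetical and chosen only for illustration.

```python
import time
from pathlib import Path

# Assumption: debugfs is mounted at /sys/kernel/debug on the target.
DEFERRED_LIST = Path("/sys/kernel/debug/devices_deferred")


def read_deferred(path=DEFERRED_LIST):
    """Return the names of devices currently listed as deferred.

    Each non-empty line starts with the device name; some kernels append
    a tab and a reason string, so only the first field is kept.
    """
    names = []
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if fields:
            names.append(fields[0])
    return names


def wait_for_empty(path=DEFERRED_LIST, timeout_s=60.0, poll_s=1.0):
    """Poll the deferred list until it is empty or timeout_s has elapsed.

    Returns the devices still deferred at that point (an empty list
    means every deferred probe has completed).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        pending = read_deferred(path)
        if not pending or time.monotonic() >= deadline:
            return pending
        time.sleep(poll_s)
```

Run as root on the board, `wait_for_empty()` returning a non-empty list after the timeout would suggest that the devices stay deferred for good rather than being sampled too early. Whether that matches the r8a77960-ulcb case is something only the actual job can show.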