Hi Saravana,

On Wed, Jun 1, 2022 at 12:46 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> This series is based on linux-next + these 2 small patches applies on top:
> https://lore.kernel.org/lkml/20220526034609.480766-1-saravanak@xxxxxxxxxx/
>
> A lot of the deferred_probe_timeout logic is redundant with
> fw_devlink=on. Also, enabling deferred_probe_timeout by default breaks
> a few cases.
>
> This series tries to delete the redundant logic, simplify the frameworks
> that use driver_deferred_probe_check_state(), enable
> deferred_probe_timeout=10 by default, and fixes the nfsroot failure
> case.
>
> The overall idea of this series is to replace the global behavior of
> driver_deferred_probe_check_state() where all devices give up waiting on
> supplier at the same time with a more granular behavior:
>
> 1. Devices with all their suppliers successfully probed by late_initcall
>    probe as usual and avoid unnecessary deferred probe attempts.
>
> 2. At or after late_initcall, in cases where boot would break because of
>    fw_devlink=on being strict about the ordering, we
>
>    a. Temporarily relax the enforcement to probe any unprobed devices
>       that can probe successfully in the current state of the system.
>       For example, when we boot with a NFS rootfs and no network device
>       has probed.
>    b. Go back to enforcing the ordering for any devices that haven't
>       probed.
>
> 3. After deferred probe timeout expires, we permanently give up waiting
>    on supplier devices without drivers. At this point, whatever devices
>    can probe without some of their optional suppliers end up probing.
>
> In the case where module support is disabled, it's fairly
> straightforward and all device probes are completed before the initcalls
> are done.
>
> Patches 1 to 3 are fairly straightforward and can probably be applied
> right away.
>
> Patches 4 to 6 are for fixing the NFS rootfs issue and setting the
> default deferred_probe_timeout back to 10 seconds when modules are
> enabled.
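The three-step policy in the quoted cover letter can be illustrated with a toy model of the "keep deferring or call probe()?" decision. This is a minimal sketch, not the driver core's code. The enum, the struct and toy_keep_deferring() are hypothetical names. The model also assumes that during the relaxed window the driver's own probe() still decides whether it can cope, for example by returning -EPROBE_DEFER.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of the deferral policy in the cover letter above.
 * Hypothetical names throughout; this is NOT the kernel's
 * fw_devlink or deferred probe implementation.
 */
enum toy_phase {
	TOY_BEFORE_LATE_INITCALL, /* step 1: ordering strictly enforced */
	TOY_LATE_RELAXED,         /* step 2a: enforcement temporarily relaxed */
	TOY_LATE_ENFORCING,       /* step 2b: ordering enforced again */
	TOY_TIMEOUT_EXPIRED,      /* step 3: deferred probe timeout expired */
};

struct toy_supplier {
	bool probed;     /* supplier is bound to its driver */
	bool has_driver; /* a driver for the supplier is available */
};

/*
 * Returns true if the core should keep deferring the consumer instead of
 * calling its probe(). When this returns false, probe() runs and the
 * driver itself may still defer if a required supplier is missing.
 */
bool toy_keep_deferring(enum toy_phase phase,
			const struct toy_supplier *sup, size_t n)
{
	assert(sup != NULL || n == 0);

	/* Step 2a: let any device try to probe in the current system state. */
	if (phase == TOY_LATE_RELAXED)
		return false;

	for (size_t i = 0; i < n; i++) {
		if (sup[i].probed)
			continue;
		/* Step 3: permanently stop waiting on driverless suppliers. */
		if (phase == TOY_TIMEOUT_EXPIRED && !sup[i].has_driver)
			continue;
		/* Steps 1 and 2b: an unprobed supplier blocks the consumer. */
		return true;
	}

	/* Nothing left to wait for: probe as usual. */
	return false;
}
```

Consider a consumer whose only missing supplier has no driver. In this model it stays deferred before late_initcall and again once enforcement is restored. It is only let through during the relaxed window and, permanently, after the timeout. A consumer waiting on an unprobed supplier that does have a driver keeps deferring even after the timeout.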
>
> Patches 7 to 9 are further clean up of the deferred_probe_timeout logic
> so that no framework has to know/care about deferred_probe_timeout.
>
> Yoshihiro/Geert,
>
> If you can test this patch series and confirm that the NFS root case
> works, I'd really appreciate that.

Thanks, I gave this a try on various boards I have access to.
The results were quite positive. E.g. the compile error I saw on v1
(implicit declaration of fw_devlink_unblock_may_probe(), which is no
longer used in v2) is gone.

However, I'm seeing a weird error when userspace (Debian9 nfsroot) is
starting:

[ OK ] Started D-Bus System Message Bus.
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
Mem abort info:
  ESR = 0x0000000096000004
Mem abort info:
  ESR = 0x0000000096000004
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EC = 0x25: DABT (current EL), IL = 32 bits
  EA = 0, S1PTW = 0
  FSC = 0x04: level 0 translation fault
  SET = 0, FnV = 0
Data abort info:
  ISV = 0, ISS = 0x00000004
  EA = 0, S1PTW = 0
  FSC = 0x04: level 0 translation fault
  CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=000000004ec45000
[0000000000000000] pgd=0000000000000000, p4d=0000000000000000
Data abort info:
Internal error: Oops: 96000004 [#1] PREEMPT SMP
CPU: 0 PID: 374 Comm: v4l_id Tainted: G W 5.19.0-rc1-arm64-renesas-00799-gc13c3e49e8bd #1660
  ISV = 0, ISS = 0x00000004
Hardware name: Renesas Ebisu-4D board based on r8a77990 (DT)
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  CM = 0, WnR = 0
pc : subdev_open+0x8c/0x128
lr : subdev_open+0x78/0x128
sp : ffff80000aadba60
x29: ffff80000aadba60 x28: 0000000000000000 x27: ffff80000aadbc58
x26: 0000000000020000 x25: ffff00000b3aaf00 x24: 0000000000000000
x23: ffff00000c331c00 x22: ffff000009aa61b8 x21: ffff000009aa6000
x20: ffff000008bae3e8 x19: ffff00000c3fe200 x18: 0000000000000000
x17: ffff800076945000 x16: ffff800008004000
x15: 00008cc6bf550c7c
x14: 000000000000038f x13: 000000000000001a x12: ffff00007fba8618
x11: 0000000000000001 x10: 0000000000000000 x9 : ffff800009253954
x8 : ffff00000b3aaf00 x7 : 0000000000000004 x6 : 000000000000001a
x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000001
x2 : 0000000100000001 x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
 subdev_open+0x8c/0x128
 v4l2_open+0xa4/0x120
 chrdev_open+0x78/0x178
 do_dentry_open+0xfc/0x398
 vfs_open+0x28/0x30
 path_openat+0x584/0x9c8
 do_filp_open+0x80/0x108
 do_sys_openat2+0x20c/0x2d8
user pgtable: 4k pages, 48-bit VAs, pgdp=000000004ec53000
 do_sys_open+0x54/0xa0
 __arm64_sys_openat+0x20/0x28
 invoke_syscall+0x40/0xf8
 el0_svc_common.constprop.0+0xf0/0x110
 do_el0_svc+0x20/0x78
 el0_svc+0x48/0xd0
 el0t_64_sync_handler+0xb0/0xb8
 el0t_64_sync+0x148/0x14c
Code: f9405280 f9400400 b40000e0 f9400280 (f9400000)
---[ end trace 0000000000000000 ]---

This only happens on the Ebisu-4D board (r8a77990-ebisu.dts).
I do not see this on the Salvator-X(S) boards.

Bisection shows this starts to happen with "[PATCH v2 7/9] driver core:
Set fw_devlink.strict=1 by default".

Adding more debug info:

subdev_open:54: file v4l-subdev1
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
subdev_open:54: file v4l-subdev2
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000

Matching the subdev using sysfs gives:

/sys/devices/platform/soc/e6500000.i2c/i2c-0/0-0070/video4linux/v4l-subdev1
/sys/devices/platform/soc/e6500000.i2c/i2c-0/0-0070/video4linux/v4l-subdev2

The i2c device is the adi,adv7482 at address 0x70.
But now I'm lost...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds