Hi, Thanks for reporting this issue, I'll check and add a fix to handle defer probe. Best regards, Alice Guo > -----Original Message----- > From: Dominique MARTINET <dominique.martinet@xxxxxxxxxxxxxxxxx> > Sent: 2021年3月29日 17:09 > To: Alice Guo <alice.guo@xxxxxxx>; Shawn Guo <shawnguo@xxxxxxxxxx>; > Krzysztof Kozlowski <krzk@xxxxxxxxxx> > Cc: robh+dt@xxxxxxxxxx; devicetree@xxxxxxxxxxxxxxx; Peng Fan > <peng.fan@xxxxxxx>; dl-linux-imx <linux-imx@xxxxxxx>; > linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx > Subject: [EXT] regression due to soc_device_match not handling defer (Was: > [PATCH v4 4/4] soc: imx8m: change to use platform driver) > > Caution: EXT Email > > Hi, > > First thanks for the patch, it fixes the kexec hang I was looking at... > > Unfortunately it means the soc is now init much later and other piece of drivers > that depend on querying the soc will fail, I am getting a BUG in the caam crypto > driver from 7d981405d0fd ("soc: imx8m: change to use platform driver") + > ce58459d8c7f ("arm64: dts: imx8m: add SoC ID > compatible") on the imx8mp evk as follow: > > [ 2.575505] caam 30900000.crypto: caam pkc algorithms registered in > /proc/crypto > [ 2.588986] caam 30900000.crypto: registering rng-caam > [ 2.594168] caam_jr 30901000.jr: job ring error: irqstate: 00000103 > [ 2.600492] ------------[ cut here ]------------ > [ 2.605109] kernel BUG at drivers/crypto/caam/jr.c:187! > [ 2.610338] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP > [ 2.615829] Modules linked in: > [ 2.618895] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc5 #8 > [ 2.625168] Hardware name: NXP i.MX8MPlus EVK board (DT) > [ 2.630482] pstate: 60000085 (nZCv daIf -PAN -UAO -TCO BTYPE=--) > [ 2.636493] pc : caam_jr_interrupt+0x150/0x154 > [ 2.640944] lr : caam_jr_interrupt+0x150/0x154 > [ 2.645396] sp : ffff800010003e90 > [ 2.648709] x29: ffff800010003e90 x28: ffff800011f82e80 > [ 2.654035] x27: ffff800011cd2000 x26: ffff0000c1988400 > [ 2.659353] x25: ffff800011cd2000 x24: ffff800011f789e0 > [ 2.664674] x23: 000000000000002e x22: ffff800012261000 > [ 2.669994] x21: ffff0000c1979410 x20: ffff0000c1988a80 > [ 2.675315] x19: 0000000000000103 x18: 0000000000000000 > [ 2.680637] x17: 0000000000000007 x16: 000000000000000e > [ 2.685958] x15: 0000000000000030 x14: ffffffffffffffff > [ 2.691279] x13: ffff800090003aa7 x12: ffff800010003aaf > [ 2.696601] x11: 0000000000000003 x10: 0101010101010101 > [ 2.701921] x9 : ffff8000100039d0 x8 : ffff800011fa3830 > [ 2.707242] x7 : ffff800011ffb830 x6 : ffff800011ffb830 > [ 2.712560] x5 : 0000000000000000 x4 : 0000000000000000 > [ 2.717881] x3 : 0000000000000000 x2 : 0000000000000000 > [ 2.723203] x1 : 0000000000000000 x0 : ffff800011f82e80 > [ 2.728528] Call trace: > [ 2.730976] caam_jr_interrupt+0x150/0x154 > [ 2.735080] __handle_irq_event_percpu+0x6c/0x280 > [ 2.739791] handle_irq_event+0x70/0x160 > [ 2.743716] handle_fasteoi_irq+0xb0/0x200 > [ 2.747822] __handle_domain_irq+0x8c/0xf0 > [ 2.751924] gic_handle_irq+0xd8/0x160 > [ 2.755683] el1_irq+0xb4/0x180 > [ 2.758829] arch_cpu_idle+0x18/0x30 > [ 2.762412] default_idle_call+0x4c/0x1d0 > [ 2.766431] do_idle+0x238/0x2b0 > [ 2.769664] cpu_startup_entry+0x30/0x7c > [ 2.773595] rest_init+0xe4/0xf4 > [ 2.776832] arch_call_rest_init+0x1c/0x28 > [ 2.780937] start_kernel+0x570/0x5a8 > [ 2.784602] 0x0 > [ 2.786452] Code: aa1503e0 90005c41 91108021 940da95d (d4210000) > [ 2.792560] ---[ end trace 968b8515172abc2e ]--- > [ 2.797181] Kernel panic - not syncing: Oops - BUG: Fatal exception in > interrupt > [ 2.804580] SMP: stopping secondary CPUs > [ 2.808508] Kernel Offset: disabled > [ 2.811998] CPU features: 0x00240002,2000200c > [ 2.816360] Memory Limit: none > [ 2.819419] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception > in interrupt ]--- > > > This particular crash can be fixed by making the caam driver delay as well if the > device isn't inited yet, e.g. this works: > -------- > diff --git a/drivers/base/soc.c b/drivers/base/soc.c index > 0af5363a582c..f179f9e55b49 100644 > --- a/drivers/base/soc.c > +++ b/drivers/base/soc.c > @@ -36,6 +36,7 @@ static DEVICE_ATTR(family, 0444, > soc_info_show, NULL); > static DEVICE_ATTR(serial_number, 0444, soc_info_show, NULL); > static DEVICE_ATTR(soc_id, 0444, soc_info_show, NULL); > static DEVICE_ATTR(revision, 0444, soc_info_show, NULL); > +static int init_done; > > struct device *soc_device_to_device(struct soc_device *soc_dev) { @@ > -157,6 +158,7 @@ struct soc_device *soc_device_register(struct > soc_device_attribute *soc_dev_attr > return ERR_PTR(ret); > } > > + init_done = true; > return soc_dev; > > out3: > @@ -243,6 +245,9 @@ const struct soc_device_attribute > *soc_device_match( { > int ret = 0; > > + if (!init_done) > + return ERR_PTR(-EPROBE_DEFER); > + > if (!matches) > return NULL; > > diff --git a/drivers/crypto/caam/ctrl.c b/drivers/crypto/caam/ctrl.c index > ca0361b2dbb0..d08f8cc4131f 100644 > --- a/drivers/crypto/caam/ctrl.c > +++ b/drivers/crypto/caam/ctrl.c > @@ -635,6 +635,9 @@ static int caam_probe(struct platform_device *pdev) > nprop = pdev->dev.of_node; > > imx_soc_match = soc_device_match(caam_imx_soc_table); > + if (IS_ERR(imx_soc_match)) > + return PTR_ERR(imx_soc_match); > + > caam_imx = (bool)imx_soc_match; > > if (imx_soc_match) { > ------- > > But it obviously doesn't cover the ~50 (!) other soc_device_match users in the > code base which would all need to start handling potential error return code. > > (I also had problems with other drivers when trying to backport the patch to the > 5.4.70_2.3.0 imx kernel but I just gave up on that one) > > > I think having all drivers handle potential EPROBE_DEFER errors is the best way > forward, how do you propose to move on? > I can do some but have no way of testing most of these so am a bit reluctant > to... > > Thanks, > -- > Dominique