RE: [EXT] regression due to soc_device_match not handling defer (Was: [PATCH v4 4/4] soc: imx8m: change to use platform driver)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

Thanks for reporting this issue, I'll check and add a fix to handle defer probe.

Best regards,
Alice Guo

> -----Original Message-----
> From: Dominique MARTINET <dominique.martinet@xxxxxxxxxxxxxxxxx>
> Sent: 2021年3月29日 17:09
> To: Alice Guo <alice.guo@xxxxxxx>; Shawn Guo <shawnguo@xxxxxxxxxx>;
> Krzysztof Kozlowski <krzk@xxxxxxxxxx>
> Cc: robh+dt@xxxxxxxxxx; devicetree@xxxxxxxxxxxxxxx; Peng Fan
> <peng.fan@xxxxxxx>; dl-linux-imx <linux-imx@xxxxxxx>;
> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: [EXT] regression due to soc_device_match not handling defer (Was:
> [PATCH v4 4/4] soc: imx8m: change to use platform driver)
> 
> Caution: EXT Email
> 
> Hi,
> 
> First thanks for the patch, it fixes the kexec hang I was looking at...
> 
> Unfortunately it means the soc is now init much later and other piece of drivers
> that depend on querying the soc will fail, I am getting a BUG in the caam crypto
> driver from 7d981405d0fd ("soc: imx8m: change to use platform driver") +
> ce58459d8c7f ("arm64: dts: imx8m: add SoC ID
> compatible") on the imx8mp evk as follow:
> 
> [    2.575505] caam 30900000.crypto: caam pkc algorithms registered in
> /proc/crypto
> [    2.588986] caam 30900000.crypto: registering rng-caam
> [    2.594168] caam_jr 30901000.jr: job ring error: irqstate: 00000103
> [    2.600492] ------------[ cut here ]------------
> [    2.605109] kernel BUG at drivers/crypto/caam/jr.c:187!
> [    2.610338] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> [    2.615829] Modules linked in:
> [    2.618895] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc5 #8
> [    2.625168] Hardware name: NXP i.MX8MPlus EVK board (DT)
> [    2.630482] pstate: 60000085 (nZCv daIf -PAN -UAO -TCO BTYPE=--)
> [    2.636493] pc : caam_jr_interrupt+0x150/0x154
> [    2.640944] lr : caam_jr_interrupt+0x150/0x154
> [    2.645396] sp : ffff800010003e90
> [    2.648709] x29: ffff800010003e90 x28: ffff800011f82e80
> [    2.654035] x27: ffff800011cd2000 x26: ffff0000c1988400
> [    2.659353] x25: ffff800011cd2000 x24: ffff800011f789e0
> [    2.664674] x23: 000000000000002e x22: ffff800012261000
> [    2.669994] x21: ffff0000c1979410 x20: ffff0000c1988a80
> [    2.675315] x19: 0000000000000103 x18: 0000000000000000
> [    2.680637] x17: 0000000000000007 x16: 000000000000000e
> [    2.685958] x15: 0000000000000030 x14: ffffffffffffffff
> [    2.691279] x13: ffff800090003aa7 x12: ffff800010003aaf
> [    2.696601] x11: 0000000000000003 x10: 0101010101010101
> [    2.701921] x9 : ffff8000100039d0 x8 : ffff800011fa3830
> [    2.707242] x7 : ffff800011ffb830 x6 : ffff800011ffb830
> [    2.712560] x5 : 0000000000000000 x4 : 0000000000000000
> [    2.717881] x3 : 0000000000000000 x2 : 0000000000000000
> [    2.723203] x1 : 0000000000000000 x0 : ffff800011f82e80
> [    2.728528] Call trace:
> [    2.730976]  caam_jr_interrupt+0x150/0x154
> [    2.735080]  __handle_irq_event_percpu+0x6c/0x280
> [    2.739791]  handle_irq_event+0x70/0x160
> [    2.743716]  handle_fasteoi_irq+0xb0/0x200
> [    2.747822]  __handle_domain_irq+0x8c/0xf0
> [    2.751924]  gic_handle_irq+0xd8/0x160
> [    2.755683]  el1_irq+0xb4/0x180
> [    2.758829]  arch_cpu_idle+0x18/0x30
> [    2.762412]  default_idle_call+0x4c/0x1d0
> [    2.766431]  do_idle+0x238/0x2b0
> [    2.769664]  cpu_startup_entry+0x30/0x7c
> [    2.773595]  rest_init+0xe4/0xf4
> [    2.776832]  arch_call_rest_init+0x1c/0x28
> [    2.780937]  start_kernel+0x570/0x5a8
> [    2.784602]  0x0
> [    2.786452] Code: aa1503e0 90005c41 91108021 940da95d (d4210000)
> [    2.792560] ---[ end trace 968b8515172abc2e ]---
> [    2.797181] Kernel panic - not syncing: Oops - BUG: Fatal exception in
> interrupt
> [    2.804580] SMP: stopping secondary CPUs
> [    2.808508] Kernel Offset: disabled
> [    2.811998] CPU features: 0x00240002,2000200c
> [    2.816360] Memory Limit: none
> [    2.819419] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception
> in interrupt ]---
> 
> 
> This particular crash can be fixed by making the caam driver delay as well if the
> device isn't inited yet, e.g. this works:
> --------
> diff --git a/drivers/base/soc.c b/drivers/base/soc.c index
> 0af5363a582c..f179f9e55b49 100644
> --- a/drivers/base/soc.c
> +++ b/drivers/base/soc.c
> @@ -36,6 +36,7 @@ static DEVICE_ATTR(family,            0444,
> soc_info_show,  NULL);
>  static DEVICE_ATTR(serial_number,      0444, soc_info_show,  NULL);
>  static DEVICE_ATTR(soc_id,             0444, soc_info_show,  NULL);
>  static DEVICE_ATTR(revision,           0444, soc_info_show,  NULL);
> +static int init_done;
> 
>  struct device *soc_device_to_device(struct soc_device *soc_dev)  { @@
> -157,6 +158,7 @@ struct soc_device *soc_device_register(struct
> soc_device_attribute *soc_dev_attr
>                 return ERR_PTR(ret);
>         }
> 
> +       init_done = true;
>         return soc_dev;
> 
>  out3:
> @@ -243,6 +245,9 @@ const struct soc_device_attribute
> *soc_device_match(  {
>         int ret = 0;
> 
> +       if (!init_done)
> +               return ERR_PTR(-EPROBE_DEFER);
> +
>         if (!matches)
>                 return NULL;
> 
> diff --git a/drivers/crypto/caam/ctrl.c b/drivers/crypto/caam/ctrl.c index
> ca0361b2dbb0..d08f8cc4131f 100644
> --- a/drivers/crypto/caam/ctrl.c
> +++ b/drivers/crypto/caam/ctrl.c
> @@ -635,6 +635,9 @@ static int caam_probe(struct platform_device *pdev)
>         nprop = pdev->dev.of_node;
> 
>         imx_soc_match = soc_device_match(caam_imx_soc_table);
> +       if (IS_ERR(imx_soc_match))
> +               return PTR_ERR(imx_soc_match);
> +
>         caam_imx = (bool)imx_soc_match;
> 
>         if (imx_soc_match) {
> -------
> 
> But it obviously doesn't cover the ~50 (!) other soc_device_match users in the
> code base which would all need to start handling potential error return code.
> 
> (I also had problems with other drivers when trying to backport the patch to the
> 5.4.70_2.3.0 imx kernel but I just gave up on that one)
> 
> 
> I think having all drivers handle potential EPROBE_DEFER errors is the best way
> forward, how do you propose to move on?
> I can do some but have no way of testing most of these so am a bit reluctant
> to...
> 
> Thanks,
> --
> Dominique




[Index of Archives]     [Device Tree Compilter]     [Device Tree Spec]     [Linux Driver Backports]     [Video for Linux]     [Linux USB Devel]     [Linux PCI Devel]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Yosemite Backpacking]


  Powered by Linux