On 12/11/21 5:28 PM, Stefan Berger wrote:
> Fix the following crash on kexec by checking chip->ops for a NULL pointer
> in tpm_chip_start() and returning an error code if this is the case.
>
> BUG: Kernel NULL pointer dereference on read at 0x00000060
> Faulting instruction address: 0xc00000000099a06c
> Oops: Kernel access of bad area, sig: 11 [#1]
> ...
> NIP [c00000000099a06c] tpm_chip_start+0x2c/0x140
> LR [c00000000099a808] tpm_chip_unregister+0x108/0x170
> Call Trace:
> [c0000000188bfa00] [c000000002b03930] fw_devlink_strict+0x0/0x8 (unreliable)
> [c0000000188bfa30] [c00000000099a808] tpm_chip_unregister+0x108/0x170
> [c0000000188bfa70] [c0000000009a3874] tpm_ibmvtpm_remove+0x34/0x130
> [c0000000188bfae0] [c000000000110dbc] vio_bus_remove+0x5c/0xb0
> [c0000000188bfb20] [c0000000009bc154] device_shutdown+0x1d4/0x3a8
> [c0000000188bfbc0] [c000000000196e14] kernel_restart_prepare+0x54/0x70
>
> The referenced patch below introduced a function to shut down the VIO bus.
> The bus shutdown now calls tpm_del_char_device (via tpm_chip_unregister)
> after a call to tpm_class_shutdown, which already set chip->ops to NULL.
> The crash occurs when tpm_del_char_device calls tpm_chip_start with the
> chip->ops NULL pointer.

It was unclear to me at first, but IIUC the problem is that the ibmvtpm
device receives two shutdown calls: first a class shutdown call for TPM
devices, then a bus shutdown call for VIO devices. The class shutdown
routines appear to be meant as pre-shutdown routines, since they are
defined as class->shutdown_pre(), and it is clearly allowed to call
class->shutdown_pre() followed by one, but not both, of bus->shutdown()
or driver->shutdown(). The function comment of tpm_class_shutdown() even
says that a bus/device shutdown is to follow.
>
> Fixes: 39d0099f9439 ("powerpc/pseries: Add shutdown() to vio_driver and vio_bus")

This patch left implementing each VIO driver's shutdown routine as an
exercise for the respective maintainers, and instead just big-hammers any
driver that lacks a shutdown routine by calling its driver->remove(). If
tpm_class_shutdown() quiesces ibmvtpm enough, implementing a no-op
ibmvtpm->shutdown() with a comment saying so suffices.

However, the generic TPM code still has a bug: calling tpm_chip_unregister()
after tpm_class_shutdown() crashes as described above.

> Signed-off-by: Stefan Berger <stefanb@xxxxxxxxxxxxx>
> ---
>  drivers/char/tpm/tpm-chip.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
> index ddaeceb7e109..cca1bde296ee 100644
> --- a/drivers/char/tpm/tpm-chip.c
> +++ b/drivers/char/tpm/tpm-chip.c
> @@ -101,6 +101,9 @@ int tpm_chip_start(struct tpm_chip *chip)
>  {
>  	int ret;
>  
> +	if (!chip->ops)
> +		return -EINVAL;
> +

I wonder if a better fix would be to have tpm_del_char_device() check for
valid chip->ops and call tpm_class_shutdown() only when the ops are still
valid. Calling tpm_class_shutdown() would also allow some code
deduplication in tpm_del_char_device().

-Tyrel

>  	tpm_clk_enable(chip);
>  
>  	if (chip->locality == -1) {
>