Re: [PATCH 12/12] crypto: caam - change JR device ownership scheme

Andrey Smirnov <andrew.smirnov@xxxxxxxxx> · Tue, 17 Sep 2019 20:13:14 -0700

On Fri, Sep 13, 2019 at 12:16 PM Leonard Crestez
<leonard.crestez@xxxxxxx> wrote:
>
> On 04.09.2019 05:35, Andrey Smirnov wrote:
> > Returning -EBUSY from platform device's .remove() callback won't stop
> > the removal process, so the code in caam_jr_remove() is not going to
> > have the desired effect of preventing JR from being removed.
> >
> > In order to be able to deal with removal of the JR device, change the
> > code as follows:
> >
> >    1. To make sure that underlying struct device remains valid for as
> >       long as we have a reference to it, add appropriate device
> >       refcount management to caam_jr_alloc() and caam_jr_free()
> >
> >    2. To make sure that device removal doesn't happen in parallel to
> >        use using the device in caam_jr_enqueue() augment the latter to
> >        acquire/release device lock for the duration of the subroutine
> >
> >    3. In order to handle the case when caam_jr_enqueue() is executed
> >       right after corresponding caam_jr_remove(), add code to check
> >       that driver data has not been set to NULL.
>
> What happens if a driver is unbound while a transform is executing
> asynchronously?
>

AFAICT caam_jr_shutdown() is going to disable JR's interrupt and kill
corresponding tasklet, so I think that transform will never finish. We
probably could handle that better by walking the list of unfinished
jobs and calling their callbacks with an appropriate error code.

> Doing get_device and put_device in caam_jr_alloc and caam_jr_free
> doesn't protect you against an explicit unbind of caam_jr, that path
> doesn't care about device reference counts. For example on imx6qp:
>

My thinking with get_device() and put_device() was to prevent struct
device * from being freed while we are using it. The intent with
explicit unbind was to protect against it by
device_lock()/device_unlock() as a well as check that device data is
not NULL. I think I did miss code to do that in caam_jr_free() before
accessing tfm_count.

> # echo 2102000.jr1 > /sys/bus/platform/drivers/caam_jr/unbind
>

A bit of a silly question, but is this with my patches applied or
against the vanilla driver?

> CPU: 2 PID: 561 Comm: bash Not tainted 5.3.0-rc7-next-20190904
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> Backtrace:
> [<c010f8a4>] (dump_backtrace) from [<c010fc5c>] (show_stack+0x20/0x24)
> [<c010fc3c>] (show_stack) from [<c0d21600>] (dump_stack+0xdc/0x114)
> [<c0d21524>] (dump_stack) from [<c0955774>] (caam_jr_remove+0x24/0xf8)
> [<c0955750>] (caam_jr_remove) from [<c06e2d98>]
> (platform_drv_remove+0x34/0x4c)
> [<c06e2d64>] (platform_drv_remove) from [<c06e0b74>]
> (device_release_driver_internal+0xf8/0x1b0)
> [<c06e0a7c>] (device_release_driver_internal) from [<c06e0c70>]
> (device_driver_detach+0x20/0x24)
> [<c06e0c50>] (device_driver_detach) from [<c06de5a4>]
> (unbind_store+0x6c/0xe0)
> [<c06de538>] (unbind_store) from [<c06dd950>] (drv_attr_store+0x30/0x3c)
> [<c06dd920>] (drv_attr_store) from [<c0364c18>] (sysfs_kf_write+0x50/0x5c)
> [<c0364bc8>] (sysfs_kf_write) from [<c0363e0c>]
> (kernfs_fop_write+0x10c/0x1f8)
> [<c0363d00>] (kernfs_fop_write) from [<c02c3a30>] (__vfs_write+0x44/0x1d0)
> [<c02c39ec>] (__vfs_write) from [<c02c68ec>] (vfs_write+0xb0/0x194)
> [<c02c683c>] (vfs_write) from [<c02c6b7c>] (ksys_write+0x64/0xd8)
> [<c02c6b18>] (ksys_write) from [<c02c6c08>] (sys_write+0x18/0x1c)
> [<c02c6bf0>] (sys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
>
> I think you need to do some form of slow wait loop in jrpriv until
> jrpriv->tfm_count reaches zero.

Hmm, what do we do if it never does? Why do you think it would be
better than cancelling all outstanding jobs and resetting the HW?

> Preventing new operations from starting
> would help but isn't strictly necessary for correctness.

Wouldn't it be strictly necessary if you want to wait for tfm_count
reaching zero?

Thanks,
Andrey Smirnov