On Fri, Sep 13, 2019 at 12:16 PM Leonard Crestez <leonard.crestez@xxxxxxx> wrote: > > On 04.09.2019 05:35, Andrey Smirnov wrote: > > Returning -EBUSY from platform device's .remove() callback won't stop > > the removal process, so the code in caam_jr_remove() is not going to > > have the desired effect of preventing JR from being removed. > > > > In order to be able to deal with removal of the JR device, change the > > code as follows: > > > > 1. To make sure that underlying struct device remains valid for as > > long as we have a reference to it, add appropriate device > > refcount management to caam_jr_alloc() and caam_jr_free() > > > > 2. To make sure that device removal doesn't happen in parallel to > > use using the device in caam_jr_enqueue() augment the latter to > > acquire/release device lock for the duration of the subroutine > > > > 3. In order to handle the case when caam_jr_enqueue() is executed > > right after corresponding caam_jr_remove(), add code to check > > that driver data has not been set to NULL. > > What happens if a driver is unbound while a transform is executing > asynchronously? > AFAICT caam_jr_shutdown() is going to disable JR's interrupt and kill corresponding tasklet, so I think that transform will never finish. We probably could handle that better by walking the list of unfinished jobs and calling their callbacks with an appropriate error code. > Doing get_device and put_device in caam_jr_alloc and caam_jr_free > doesn't protect you against an explicit unbind of caam_jr, that path > doesn't care about device reference counts. For example on imx6qp: > My thinking with get_device() and put_device() was to prevent struct device * from being freed while we are using it. The intent with explicit unbind was to protect against it by device_lock()/device_unlock() as a well as check that device data is not NULL. I think I did miss code to do that in caam_jr_free() before accessing tfm_count. > # echo 2102000.jr1 > /sys/bus/platform/drivers/caam_jr/unbind > A bit of a silly question, but is this with my patches applied or against the vanilla driver? > CPU: 2 PID: 561 Comm: bash Not tainted 5.3.0-rc7-next-20190904 > Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) > Backtrace: > [<c010f8a4>] (dump_backtrace) from [<c010fc5c>] (show_stack+0x20/0x24) > [<c010fc3c>] (show_stack) from [<c0d21600>] (dump_stack+0xdc/0x114) > [<c0d21524>] (dump_stack) from [<c0955774>] (caam_jr_remove+0x24/0xf8) > [<c0955750>] (caam_jr_remove) from [<c06e2d98>] > (platform_drv_remove+0x34/0x4c) > [<c06e2d64>] (platform_drv_remove) from [<c06e0b74>] > (device_release_driver_internal+0xf8/0x1b0) > [<c06e0a7c>] (device_release_driver_internal) from [<c06e0c70>] > (device_driver_detach+0x20/0x24) > [<c06e0c50>] (device_driver_detach) from [<c06de5a4>] > (unbind_store+0x6c/0xe0) > [<c06de538>] (unbind_store) from [<c06dd950>] (drv_attr_store+0x30/0x3c) > [<c06dd920>] (drv_attr_store) from [<c0364c18>] (sysfs_kf_write+0x50/0x5c) > [<c0364bc8>] (sysfs_kf_write) from [<c0363e0c>] > (kernfs_fop_write+0x10c/0x1f8) > [<c0363d00>] (kernfs_fop_write) from [<c02c3a30>] (__vfs_write+0x44/0x1d0) > [<c02c39ec>] (__vfs_write) from [<c02c68ec>] (vfs_write+0xb0/0x194) > [<c02c683c>] (vfs_write) from [<c02c6b7c>] (ksys_write+0x64/0xd8) > [<c02c6b18>] (ksys_write) from [<c02c6c08>] (sys_write+0x18/0x1c) > [<c02c6bf0>] (sys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28) > > I think you need to do some form of slow wait loop in jrpriv until > jrpriv->tfm_count reaches zero. Hmm, what do we do if it never does? Why do you think it would be better than cancelling all outstanding jobs and resetting the HW? > Preventing new operations from starting > would help but isn't strictly necessary for correctness. Wouldn't it be strictly necessary if you want to wait for tfm_count reaching zero? Thanks, Andrey Smirnov