On Wed, Jan 25, 2017 at 06:15:42PM +0530, Imran Khan wrote: > Remove firmware buffer from pending list when it is freed. > Once the buffer is free kmalloc can allocate the same slab > object for some other user but as the buffer is still there > in the pending list, we end up with multiple users of the > same slab object using it in different contexts. Thanks for your patch! A few things: 0) The fallback mechanism seems to be getting quite a bit of love these days due to issues creeping up. Some of these are old issues, some are regressions. We should take advantage of the lib/test_firmware.c driver to help test against the issues folks have found, build a test case using the script in tools/testing/selftests/firmware/fw_userhelper.sh renamed now in the latest development tree [0] to: tools/testing/selftests/firmware/fw_fallback.sh and tests against it to ensure we build an atomic test unit against the issue to easily reproduce, demo the issue, and most importantly avoid further regressions. Can you add a test case for this against the latest development changes? Note there there is one patch missing to be applied yet to the series [1], its a fix which you should merge before testing with the fw_fallback.sh script. The fix is part of a series [2] which first addresses a test unit case against the issue expanding on lib/test_firmware.c and then the script to re-create the issue. Please review that series as an example, if you could do the same for this issue that would be greatly appreciated. [0] https://git.kernel.org/cgit/linux/kernel/git/gregkh/char-misc.git/log/?h=char-misc-next [1] https://lkml.kernel.org/r/20170123161111.5925-8-mcgrof@xxxxxxxxxx [2] https://lkml.kernel.org/r/20170123161111.5925-1-mcgrof@xxxxxxxxxx 1) Please inspect carefully the notes on impact of the issue and fix spelled out in the patch that fixes the issue [1], in particular see how the commit log describes what configurations are affected, and there is some effort to help describe the severity of the issue. If you could do the same for your patch that would be appreciated. > Signed-off-by: Imran Khan <kimran@xxxxxxxxxxxxxx> > --- > drivers/base/firmware_class.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c > index 4497d26..d09c1aa 100644 > --- a/drivers/base/firmware_class.c > +++ b/drivers/base/firmware_class.c > @@ -339,6 +339,9 @@ static void __fw_free_buf(struct kref *ref) > (unsigned int)buf->size); > > list_del(&buf->list); > +#ifdef CONFIG_FW_LOADER_USER_HELPER > + list_del(&buf->pending_list); > +#endif > spin_unlock(&fwc->lock); > > #ifdef CONFIG_FW_LOADER_USER_HELPER And finally patch review! The pending_list is only used for the fallback mechanism. The fallback mechanism uses a state machine to help us keep track of the state of the buffer since we are allowing userspace to take time to fill it up or cancel a request. The kernel can also cancel the request if we run out of memory as we setup things, reboot, or suspend. Lastly upon success the kernel will list_del_init(&fw_buf->pending_list) -- refer to firmware_loading_store() when a 0 is echo'd in. Instead of blindly list_del()'ing the buf I'd like to understand where we are missing a list_del for the rest of the cases to consider. When we cancel (see firmware_loading_store() when a -1 is echo'd in) we call fw_load_abort() which will list_del_init(&buf->pending_list) for us provided the state is not aborted (fw_state_is_done()). Upon shutdown fw_shutdown_notify() will also call __fw_load_abort() for all list elements. Upon suspend kill_requests_without_uevent() will fw_load_abort() only on requests which do not rely on uevents, this is because upon suspend firmware_class.c uses a firmware cache for requests which do rely on uevents for a fallback mechanism. We *cancel* non-uevents request to prevent stalling suspend. We *add* an element into the pending list only once we know we were able to create the temporary device used for the sysfs load on _request_firmware_load(). If the initial setup fails we call fw_load_abort(). Given your patch then when do we not remove the request from the pending list when we also fw_free_buf()? It would seem two cases: o release_firmware() -- but note that the state machine for the firmware should have coevered all cases: cancellation, successful loading o upon resume: __device_uncache_fw_images() -- likewise I cannot see where the issue is then, could the issue have been related to the state machine issue which we now have a fix for [1] ? Could you test / confirm ? Luis -- To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html