02.04.2020 17:00, Dmitry Osipenko пишет: > 02.04.2020 16:24, Dmitry Osipenko пишет: >> 02.04.2020 14:50, Wang, Jiada пишет: >>> Hi Dmitry >>> >>> On 2020/04/02 1:04, Dmitry Osipenko wrote: >>>> 31.03.2020 13:50, Jiada Wang пишет: >>>>> From: Sanjeev Chugh <sanjeev_chugh@xxxxxxxxxx> >>>>> >>>>> There could be scope of race conditions when sysfs is being handled >>>>> and at the same time, device removal is occurring. For example, >>>>> we don't want the device removal to begin if the Atmel device >>>>> cfg update is going on or firmware update is going on. In such >>>>> cases, wait for device update to be completed before the removal >>>>> continues. >>>>> >>>>> Thread Thread 2: >>>>> ========================= >>>>> ========================= >>>>> mxt_update_fw_store() mxt_remove() >>>>> mutex_lock(&data->lock) ... >>>>> mxt_initialize() //Tries to acquire lock >>>>> request_firmware_nowait() mutex_lock(&data->lock) >>>>> ... ==>waits for lock() >>>>> ... . >>>>> ... . >>>>> mutex_unlock(&data->lock) . >>>>> //Gets lock and >>>>> proceeds >>>>> >>>>> mxt_free_input_device(); >>>>> ... >>>>> >>>>> mutex_unlock(&data->lock) >>>>> //Frees atmel driver >>>>> data >>>>> kfree(data) >>>>> >>>>> If the request_firmware_nowait() completes after the driver removal, >>>>> and callback is triggered. But kernel crashes since the module is >>>>> already removed. >>>>> >>>>> This commit adds state machine to serialize such scenarios. >>>> >>>> Won't it be easier to bump driver's module use-count by __module_get() >>>> while firmware is updating? Or remove sysfs during of mxt_remove()? > >>> >>> thanks for your inspiration, I will replace state machine with module >>> use-count. >> >> I'm actually now thinking that the suggestion about the module-count >> wasn't very correct because this won't really help in regards to >> mxt_update_fw_store() / mxt_remove() racing. >> >> I see that mxt_remove() already invokes the mxt_sysfs_remove(), which >> should block until mxt_update_fw_store() is completed, shouldn't it? >> >> I guess the kfree(data) isn't the real cause of the problem and >> something like this should help: >> >> diff --git a/drivers/input/touchscreen/atmel_mxt_ts.c >> b/drivers/input/touchscreen/atmel_mxt_ts.c >> index b2edf51e1595..4e66106feeb9 100644 >> --- a/drivers/input/touchscreen/atmel_mxt_ts.c >> +++ b/drivers/input/touchscreen/atmel_mxt_ts.c >> @@ -4254,6 +4254,7 @@ static void mxt_sysfs_remove(struct mxt_data *data) >> struct i2c_client *client = data->client; >> >> sysfs_remove_group(&client->dev.kobj, &mxt_attr_group); >> + sysfs_remove_group(&client->dev.kobj, &mxt_fw_attr_group); >> } >> >> static void mxt_reset_slots(struct mxt_data *data) >> @@ -4649,31 +4650,19 @@ static int mxt_remove(struct i2c_client *client) >> { >> struct mxt_data *data = i2c_get_clientdata(client); >> >> - mutex_lock(&data->lock); >> - if (data->e_state == MXT_STATE_UPDATING_CONFIG_ASYNC || >> - data->e_state == MXT_STATE_UPDATING_CONFIG) { >> - data->e_state = MXT_STATE_GOING_AWAY; >> - mutex_unlock(&data->lock); >> - mxt_wait_for_completion(data, &data->update_cfg_completion, >> - MXT_CONFIG_TIMEOUT); >> - } else { >> - data->e_state = MXT_STATE_GOING_AWAY; >> - mutex_unlock(&data->lock); >> - } >> + mxt_sysfs_remove(data); >> >> - disable_irq(data->irq); >> - sysfs_remove_group(&client->dev.kobj, &mxt_fw_attr_group); >> if (data->reset_gpio) { >> sysfs_remove_link(&client->dev.kobj, "reset"); >> gpiod_unexport(data->reset_gpio); >> } >> + >> mxt_debug_msg_remove(data); >> - mxt_sysfs_remove(data); >> mxt_free_input_device(data); >> mxt_free_object_table(data); >> >> if (debug_state) >> cancel_delayed_work_sync(&data->watchdog_work); >> + disable_irq(data->irq); >> >> return 0; >> } >> > > I'm looking at this again and the original tear-down order of the > mxt_remove() looks okay, so no need to change it. > > Reading the commit message, it says that request_firmware_nowait() races > with kfree(data), but that can't happen because the data is > resource-managed and request_firmware_nowait() bumps device's use-count. > > https://elixir.bootlin.com/linux/v5.6.2/source/drivers/base/firmware_loader/main.c#L1043 > > I think this patch was ported from some very old kernel version and it's > simply not applicable to the upstream anymore. > Moreover, request_firmware_nowait() uses try_module_get(), hence driver's module can't be unloaded until firmware is loaded. This patch is wrong, please drop it.