Re: [RFC] Input: Remove unsafe device module references

David Herrmann <dh.herrmann@xxxxxxxxxxxxxx> · Tue, 1 Nov 2011 19:16:05 +0100

On Tue, Nov 1, 2011 at 7:05 PM, Dmitry Torokhov
<dmitry.torokhov@xxxxxxxxx> wrote:
> On Tue, Nov 01, 2011 at 06:52:11PM +0100, David Herrmann wrote:
>> Hi Greg
>>
>> On Tue, Nov 1, 2011 at 6:01 PM, Greg KH <gregkh@xxxxxxx> wrote:
>> > On Tue, Nov 01, 2011 at 04:41:40PM +0100, David Herrmann wrote:
>> >> Hi Dmitry and Greg
>> >>
>> >> It doesn't make sense to take a reference to our own module. When we call
>> >> module_put(THIS_MODULE) we cannot make sure that our module is still alive when
>> >> this function returns. Therefore, module_put() will return to invalid memory and
>> >> our input_dev_release() function is no longer available.
>> >>
>> >> It would be interesting if Greg could elaborate what else we could do to replace
>> >> this module-refcount as it is definitely needed here. However, "struct device"
>> >> doesn't provide an owner field so there is no way for us to let the device core
>> >> keep a reference to our module.
>> >
>> > For a bus module, yes, this is needed, so don't remove these calls, it's
>> > wrong to do so.
>> >
>> >> I have no clue what to do here but the current implementation is definitely
>> >> unsafe so this is marked as RFC. Currently, the device_attributes probably
>> >> already keep a reference to our module so applying this patch would probably not
>> >> break anything, however, this does not look like something we can trust on.
>> >
>> > Yes it is, why do you think it isn't?
>> >
>> >> My bug-thread kind of died (https://lkml.org/lkml/2011/10/29/75) so I now try to
>> >> show this with an example here.
>> >
>> > It died due to me traveling, sorry, I'll respond to them now.
>>
>> No problem. This is why I've resent this with an example.
>>
>> > I fail to see what the real problem you are trying to solve here is.  Is
>> > there something with the way the kernel works today that you are having
>> > problems with?  What is driving this?
>>
>> I am working on converting the hci stack to properly use sysfs APIs +
>> struct device. And my problem simply is the following:
>>
>> @@ -1417,8 +1417,6 @@ static void input_dev_release(struct device *device)
>>        input_mt_destroy_slots(dev);
>>        kfree(dev->absinfo);
>>        kfree(dev);
>> -
>> -       module_put(THIS_MODULE);
>>  }
>>
>> If this module_put(THIS_MODULE) is needed as you said, then I can be
>> sure that this call does not release the last module-reference, can I?
>> Otherwise, this call may return to invalid memory.
>>
>> But, if I can be sure that this doesn't release the last reference,
>> why take this reference at all?
>>
>> The only reason I can think of is, that some other code calls
>> __get_module() after I called it, and it calls put_module() after I
>> call it. In all other cases, taking/releasing this reference is
>> needless as we can trust that our caller protects us.
>>
>> In other words, which code does this module_get/put() protect? It
>> cannot protect input_dev_release() because module_put(THIS_MODULE) is
>> called *inside* input_dev_release(). I need some way to protect the
>> input_dev_release() callback-code outside of this callback.
>> Or can I go sure that the caller of the input_dev_callback() takes a
>> reference to my module before calling this and releases it after? (But
>> then I wonder how does it know what module I am?)
>>
>> If this is the recommended way to protect the device_release
>> callbacks, I will just copy it into hci_dev, but currently I really
>> don't get why these are needed.
>> If you can tell me an example why the input-core breaks if this patch
>> is applied, I can probably better explain to you, why I think it still
>> breaks without this patch applied.
>>
>>
>>
>> For instance see my example:
>>
>> 1)
>> input-core-module is loaded
>> 2)
>> input-core-module creates a new input device and increases
>> module-refcnt inside input_allocate_device()
>> 4)
>> another subsystem grabs the "struct device" and increases its refcnt
>> (for any reason...)
>> 5)
>> input-core-module destroys the input-device but it still stays alive
>> until the other subsystem releases its refcnt of the "struct device".
>> 6)
>> input-core-module is unloaded
>> This doesn't succeed as the still living input-device has a module-refcnt
>> 7)
>> the other subsystem releases its refcnt of the input-device
>> 8)
>> The input-device is destroyed and its _release_ function is called
>> The release function destroys the input-device *and* frees the last
>> module-refcnt. Then *boom*, the release function cannot return as it
>> is no longer available as described above.
>
> The analysis is right except that this condition is very unlikely to
> trigger. By removing __module_get/module_put you are making this problem
> much much easier to hit.

I was aware of that, this is why I marked it as RFC and included some
comment, that another locking is needed. I don't know any trivial way,
though.

>>
>> My solution: Some parent subsystem of us must take and release this
>> module-refcnt instead of us, so this bug doesn't occur.
>> Or: We simply wait for all these input_devices to be released before
>> exiting input_exit().
>
> How would you know that all input devices are released in the sense
> that all threads left (as in exited) all code in input.c completely?

I rethought this and all ideas (like completion_t or waitqueues) I
have will trigger the same race-condition as with the module_put()
thing we currently use.
So I am left with the parent-subsystem to take care of the module
refcount. Or just accepting this race-condition.

> --
> Dmitry
>

David
--
To unsubscribe from this list: send the line "unsubscribe linux-input" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html