RE: [PATCH v2 1/3] firmware loader: Introduce new API - request_firmware_abort()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> -----Original Message-----
> From: Greg Kroah-Hartman [mailto:gregkh@xxxxxxxxxxxxxxxxxxx]
> Sent: Sunday, November 09, 2014 3:07 AM
>
> >
> > Besides aborting through user helper interface, a new API
> > request_firmware_abort() allows kernel driver module to abort the
> > request_firmware() / request_firmware_nowait() when they are no longer
> > needed. It is useful for cancelling an outstanding firmware load if
> > initiated from inside a module where that module is in the process of
> > being unloaded, since the unload function may not have a handle to the
> > struct firmware_buf.
> >
> > Note for people who use request_firmware_nowait():
> > You are required to do your own synchronization after you call
> > request_firmware_abort() in order to continue unloading the module.
> > The example synchronization code shows below:
> >
> > while (THIS_MODULE && module_refcount(THIS_MODULE))
> > 	msleep(20);
> 
> As others have pointed out, this isn't ok, and is totally racy and should never
> end up in a driver.  Never mess with THIS_MODULE from within the same
> module, otherwise it's racy and broken code.
> 
> So can you please rework this to not require this?
> 
> thanks,
> 
> greg k-h

Hi everyone,

First of all, I would like to apologize if my commit message gives you guys an impression
that to use request_firmware_abort(), you guys MUST do the synchronization on your own. 
But the fact is, it is not a MUST. Below will provide more detail.

Regarding this synchronization topic, I would like to open a discussion to get a
better approach to handle this problem. Before jumping onto the design, I would
like to give a background of why I am doing in this way.

- Only doing module unload is required to be aware of this synchronization
	-> Ensuring the call back does not fall into unloaded code which may cause
	     undefined behavior.
	-> Ensuring the put_device() & module_put() code have finished in firmware_class.c
	     function request_firmware_work_func() before the device is unregistered
	     and module unloaded happen.

- Not all the situations are required to do this synchronization:
	-> Implementation that use request_firmware() do not need this synchronization
	     due to it will get synced by returning at the same place the caller call
	     request_firmware().
	-> Implementation that will not unload the call back code and without relying
	     device refcount / module refcount do not need this synchronization (for e.g.
	     calling the request_firmware_nowait() without passing the MODULE and DEVICE
	     parameters as showed below:
	     request_firmware_nowait(NULL, FW_ACTION_NOHOTPLUG, "xxx", NULL,
				             GFP_KERNEL, NULL, callbackfn);).

- Following the original request_firmware_nowait() design thought
	-> Strongly believe the original design of request_firmware_nowait() that using the get_device(),
	     put_device(), try_get_module() & module_put() also expect the caller side to do the
	     synchronization themselves if they have relying on these counter to continue their works.

	     Let's take out this newly introduce API (request_firmware_abort()) and focus only on the
	     original design (request_firmware_nowait()). If there is a caller design that after the callback
	     happen and it need to do platform_device_unregister(), the original design also expect the
	    caller do its own synchronization before it can do the device unregister.

- Use cases that do not need the synchronization:
	-> Depend on a volatile condition in order to expose firmware upload interface: Like a design
	      only in a particular window period then open the interface, when outside of the window
	      then it need to disable the interface. This design also does not need the synchronization
	      as it do not unload the callback module code and not relying on the device refcount or
	      module refcount to do it stuff.
	-> System reboot: When system reboot, the system would not unload the module code and
	      also would not unregister the device. So it does not meet the conditions (unload call back
	      code and have dependency on device refcount / module refcount). This does not need
	      the synchronization.

Due to the above reasons, in summary, I think the caller has the responsibility to take care of this
synchronization whenever needed on the caller side. What do you guys think?


Come back to the design if really need to implement it in firmware_class. Below shows some design
thought:

- Complexity of implementing the synchronization inside firmware_class
	-> Cannot use module refcount or device refcount to be the synchronization method
	     due to:
	     (1) firmware_work data struct will get freed  at request_firmware_work_func()
	            after calling the __fw_load_abort() just before you could do the synchronization
	            at the end of request_firmware_abort().
	     (2) each different caller may have different refcount number in their device/module
	            data struct where it is impossible for firmware_class to take a number and wait
	            for it. Only caller know better what is the number they should wait for.
	      (3) As mentioned above, caller who use request_firmware_nowait() may not passing
	             the module or device parameters.
	-> firmware_buf data struct also will get freed after calling the __fw_load_abort() just
                     before you could do the synchronization at the end of request_firmware_abort().
	-> If implement wait completion for the synchronization, below are something we need
	     to watch out/take care:
	     (1) completion struct cannot tight to firmware_buf & firmware_work data structs as they
	            are freed before you can use it.
	     (2) one global completion struct may not work due to request_firmware_nowait() can
	            be called multiple time.
	     (3) it may have chances to race with userland issuing "echo 0 > loading" which eventually
	            happened in such scenario that complete_all() being called before wait_for_completion()
	            being called.

I am thinking that I will stick back with wait completion method for the synchronization instead of using
semaphore. Trying to overcome the above complexity, I need to introduce another struct that won't be
freed and must able to link with firmware_buf data struct. And I also need to implement some kind of
lock to prevent race condition. 

Do you guys think this is feasible? Or the most easiest way to do the synchronization is at the
platform_device_unregister() or module unload code which before they unregister/unload, they need
to wait until refcount = 0 then they can do their job.

Looking forward to hear from you guys. Thanks. :-)


Regards,
Wilson

--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux