On 2024-04-09 06:08, Xu Yilun wrote: > On Wed, Apr 03, 2024 at 03:34:22PM +0200, Marco Pagani wrote: >> >> >> On 2024-04-01 11:34, Xu Yilun wrote: >>> On Wed, Mar 27, 2024 at 05:00:20PM +0100, Marco Pagani wrote: >>>> The current implementation of the fpga region assumes that the low-level >>>> module registers a driver for the parent device and uses its owner pointer >>>> to take the module's refcount. This approach is problematic since it can >>>> lead to a null pointer dereference while attempting to get the region >>>> during programming if the parent device does not have a driver. >>>> >>>> To address this problem, add a module owner pointer to the fpga_region >>>> struct and use it to take the module's refcount. Modify the functions for >>>> registering a region to take an additional owner module parameter and >>>> rename them to avoid conflicts. Use the old function names for helper >>>> macros that automatically set the module that registers the region as the >>>> owner. This ensures compatibility with existing low-level control modules >>>> and reduces the chances of registering a region without setting the owner. >>>> >>>> Also, update the documentation to keep it consistent with the new interface >>>> for registering an fpga region. >>>> >>>> Other changes: unlock the mutex before calling put_device() in >>>> fpga_region_put() to avoid potential use after release issues. >>> >>> Please try not to mix different changes in one patch, especially for >>> a "bug fix" as you said. >> >> You are right. I'll split out the change and eventually send it as a >> separate patch. >> >>> And I do have concern about the fix, see below. >>> >>> [...] >>> >>>> @@ -53,7 +53,7 @@ static struct fpga_region *fpga_region_get(struct fpga_region *region) >>>> } >>>> >>>> get_device(dev); >>>> - if (!try_module_get(dev->parent->driver->owner)) { >>>> + if (!try_module_get(region->br_owner)) { >>>> put_device(dev); >>>> mutex_unlock(®ion->mutex); >>>> return ERR_PTR(-ENODEV); >>>> @@ -75,9 +75,9 @@ static void fpga_region_put(struct fpga_region *region) >>>> >>>> dev_dbg(dev, "put\n"); >>>> >>>> - module_put(dev->parent->driver->owner); >>>> - put_device(dev); >>>> + module_put(region->br_owner); >>>> mutex_unlock(®ion->mutex); >>> >>> If there is concern the region would be freed after put_device(), then >>> why still keep the sequence in fpga_region_get()? >> >> Ouch, sorry, I forgot to make the change also in fpga_region_get(). >> >>> And is it possible region is freed before get_device() in >>> fpga_region_get()? >> >> If the user follows the usual pattern (i.e., waiting for > > I can see the only safe way is fpga_region_program_fpga() or fpga_region_get() > should be included in: > > region = fpga_region_class_find(); > ... > put_device(®ion->dev); > > That is to say, fpga_region_get() should not be called when there is no > region dev reference hold beforehand. In this case, no use after release > risk. That's why I was thinking about some documentation. > > Another concern is we'd better keep the get/put operations symmetrical > for easy maintaining, as long as it doesn't cause problem. Now I see your point. So, you suggest changing only the docs to clarify that the region must be taken with fpga_region_class_find() before programming it with fpga_region_program_fpga()? That's fine by me. However, this made me wonder why we need to take the region dev with get_device() in fpga_region_program_fpga()->fpga_region_get(). If we assume that the user must always call fpga_region_class_find() before programming with fpga_region_program_fpga(), why do we need the double get? Thanks, Marco >> fpga_region_program_fpga() to complete before calling >> fpga_region_unregister()) there should be no problem. However, I think >> releasing the device before unlocking the mutex contained in the context >> associated with the device makes the code brittle and more prone to >> problems. >> >>> Or we should clearly document how/when to use these functions? >> >> I think it is not necessary to change the documentation since the >> in-kernel programming API will not be affected by the change. >> [...]