On 2023-12-21 09:22, Greg Kroah-Hartman wrote: > On Wed, Dec 20, 2023 at 11:24:20PM +0100, Marco Pagani wrote: >> >> >> On 19/12/23 19:11, Greg Kroah-Hartman wrote: >>> On Tue, Dec 19, 2023 at 06:17:20PM +0100, Marco Pagani wrote: >>>> >>>> On 2023-12-19 16:10, Greg Kroah-Hartman wrote: >>>>> On Tue, Dec 19, 2023 at 03:54:25PM +0100, Marco Pagani wrote: >>>>>> >>>>>> >>>>>> On 2023-12-18 21:33, Greg Kroah-Hartman wrote: >>>>>>> On Mon, Dec 18, 2023 at 09:28:09PM +0100, Marco Pagani wrote: >>>>>>>> This patch tentatively set the owner field of fpga_manager_ops to >>>>>>>> THIS_MODULE for existing fpga manager low-level control modules. >>>>>>>> >>>>>>>> Signed-off-by: Marco Pagani <marpagan@xxxxxxxxxx> >>>>>>>> --- >>>>>>>> drivers/fpga/altera-cvp.c | 1 + >>>>>>>> drivers/fpga/altera-pr-ip-core.c | 1 + >>>>>>>> drivers/fpga/altera-ps-spi.c | 1 + >>>>>>>> drivers/fpga/dfl-fme-mgr.c | 1 + >>>>>>>> drivers/fpga/ice40-spi.c | 1 + >>>>>>>> drivers/fpga/lattice-sysconfig.c | 1 + >>>>>>>> drivers/fpga/machxo2-spi.c | 1 + >>>>>>>> drivers/fpga/microchip-spi.c | 1 + >>>>>>>> drivers/fpga/socfpga-a10.c | 1 + >>>>>>>> drivers/fpga/socfpga.c | 1 + >>>>>>>> drivers/fpga/stratix10-soc.c | 1 + >>>>>>>> drivers/fpga/tests/fpga-mgr-test.c | 1 + >>>>>>>> drivers/fpga/tests/fpga-region-test.c | 1 + >>>>>>>> drivers/fpga/ts73xx-fpga.c | 1 + >>>>>>>> drivers/fpga/versal-fpga.c | 1 + >>>>>>>> drivers/fpga/xilinx-spi.c | 1 + >>>>>>>> drivers/fpga/zynq-fpga.c | 1 + >>>>>>>> drivers/fpga/zynqmp-fpga.c | 1 + >>>>>>>> 18 files changed, 18 insertions(+) >>>>>>>> >>>>>>>> diff --git a/drivers/fpga/altera-cvp.c b/drivers/fpga/altera-cvp.c >>>>>>>> index 4ffb9da537d8..aeb913547dd8 100644 >>>>>>>> --- a/drivers/fpga/altera-cvp.c >>>>>>>> +++ b/drivers/fpga/altera-cvp.c >>>>>>>> @@ -520,6 +520,7 @@ static const struct fpga_manager_ops altera_cvp_ops = { >>>>>>>> .write_init = altera_cvp_write_init, >>>>>>>> .write = altera_cvp_write, >>>>>>>> .write_complete = altera_cvp_write_complete, >>>>>>>> + .owner = THIS_MODULE, >>>>>>> >>>>>>> Note, this is not how to do this, force the compiler to set this for you >>>>>>> automatically, otherwise everyone will always forget to do it. Look at >>>>>>> how functions like usb_register_driver() works. >>>>>>> >>>>>>> Also, are you _sure_ that you need a module owner in this structure? I >>>>>>> still don't know why... >>>>>>> >>>>>> >>>>>> Do you mean moving the module owner field to the manager context and setting >>>>>> it during registration with a helper macro? >>>>> >>>>> I mean set it during registration with a helper macro. >>>>> >>>>>> Something like: >>>>>> >>>>>> struct fpga_manager { >>>>>> ... >>>>>> struct module *owner; >>>>>> }; >>>>>> >>>>>> #define fpga_mgr_register(parent, ...) \ >>>>>> __fpga_mgr_register(parent,..., THIS_MODULE) >>>>>> >>>>>> struct fpga_manager * >>>>>> __fpga_mgr_register(struct device *parent, ..., struct module *owner) >>>>>> { >>>>>> ... >>>>>> mgr->owner = owner; >>>>>> } >>>>> >>>>> Yes. >>>>> >>>>> But again, is a module owner even needed? I don't think you all have >>>>> proven that yet... >>>> >>>> Programming an FPGA involves a potentially lengthy sequence of interactions >>>> with the reconfiguration engine. The manager conceptually organizes these >>>> interactions as a sequence of ops. Low-level modules implement these ops/steps >>>> for a specific device. If we don't protect the low-level module, someone might >>>> unload it right when we are in the middle of a low-level op programming the >>>> FPGA. As far as I know, the kernel would crash in that case. >>> >>> The only way an unload of a module can happen is if a user explicitly >>> asks for it to be unloaded. So they get what they ask for, right? >>> >> >> Right, the user should get what he asked for, including hanging the >> hardware. My only concern is that the kernel should not crash. >> >>> How do you "know" it is active? And why doesn't the normal >>> "driver/device" bindings prevent unloading from being a problem? When >>> you unload a module, you stop all ops on the driver, and then unregister >>> it, which causes any future ones to fail. >>> >>> Or am I missing something here? >>> >> >> I think the problem is that the ops are not directly tied to the driver >> of the manager's parent device. > > Then that needs to be fixed right there, as that is obviously not using > the driver model properly. > > Why aren't the "ops" a driver that is bound to this device? If it is > the one responsible for controlling it, then it should be a driver and > as such, the driver model logic will handle things if/when a module is > unloaded to tear things down better. > >> It is not even required to have a driver >> to register a manager. The only way to know if the fpga manager is >> active (i.e., someone is running one op) is by poking manager->state. > > That too seems wrong, why is this? I don't know. I was not around when the fpga subsystem was laid down. > >> One possibility that comes into my mind, excluding a major reworking, >> is waiting in fpga_mgr_unregister() until the manager reaches a steady >> state (no ops are running) before unregistering the device. However, it >> feels questionable because if one of the ops hangs, the module removal >> will also hang. > > You never know when a new operand will come in, so there's no way to > know "all is quiet", sorry. > > Try fixing this properly, buy using the driver model correctly, that > should help resolve these issues automatically instead of hacked up > module reference count attempts. > > Remember, this is the whole reason why the driver model was created all > those 20+ years ago, to move away from these module reference count > issues, let's not forget history please. > I do not entirely understand this part. The subsystem only provides an in-kernel API for programming the fpga that in-kernel consumers can use. The ops that the low-level module implements are used only internally by the manager in a predefined order. There is no standard interface for programming the fpga exposed to userspace using file_operations or attributes exported via sysfs. The manager only exports read-only attributes for status. On top of that, there is only the support for device tree overlays. Would it be correct to assume that the responsibility of keeping the low-level module in while programming the fpga is on the kernel component that consumes the subsystem's in-kernel API and (eventually) exports a programming interface to userspace? If we consider the case where the programming is done through a userspace interface exported by the same module that implements the ops, then we should be good even without taking the low-level module in the manager. However, I guess the decision to take the low-level module in the manager was meant to address the case where the module implementing the ops and the consumer of the in-kernel API (that may optionally export a userspace interface for programming) are two separate entities. Thanks, Marco