On 2/12/25 04:54, Bjorn Andersson wrote: > On Tue, Dec 10, 2024 at 11:33:31AM +0100, Arnaud POULIQUEN wrote: >> >> >> On 12/10/24 00:14, Bjorn Andersson wrote: >>> On Thu, Nov 28, 2024 at 09:42:10AM GMT, Arnaud Pouliquen wrote: >>>> This patch updates the rproc_ops structures to include two new optional >>>> operations. >>>> >>>> - The load_fw() op is responsible for loading the remote processor >>>> non-ELF firmware image before starting the boot sequence. This ops will >>>> be used, for instance, to call OP-TEE to authenticate an load the firmware >>>> image before accessing to its resources (a.e the resource table) >>>> >>>> - The release_fw op is responsible for releasing the remote processor >>>> firmware image. For instance to clean memories. >>>> The ops is called in the following cases: >>>> - An error occurs between the loading of the firmware image and the >>>> start of the remote processor. >>>> - after stopping the remote processor. >>>> >>> >>> Why does this difference need to be encoded in rproc_ops? I think we >>> should strive for having a single, simple high level flow of operations >>> through the remoteproc core for which the specifics of each remoteproc >>> instance can be encoded in that driver. >>> >>> >>> Perhaps there's a good reason for this, but if so please read and follow >>> https://docs.kernel.org/process/submitting-patches.html#describe-your-changes >>> to make that reasoning clear in the commit message. >>> >> >> The actual sequence to load a remoteproc firmware is >> - get firmware from file system and store the firmware image in Linux kernel memory > > This sounds like "load" in remoteproc terminology. it is the request_firmware() > >> - get resource table from the firmware image and make a copy( >> - parse the resource table and handle the resources > >> - load the firmware >> - start the firmware > > And these two are "start". yes done in rproc_start() > >> >> >> In OP-TEE we support not only one ELF image but n images (for instance a TF-M + >> a zephyr), the segments can be encrypted the OP-TEE load sequence is >> - copy header and meta data of the signed image in a secure memory >> - verify it >> - copy segments in remote processor memory and authenticate segments in place. >> - optionally decrypt the segments >> >> Only at this step the resource table as been authenticated (and decrypted) > > "this step" meaning TA_RPROC_FW_CMD_LOAD_FW? yes >Above you say that happens > after you parse the resource table. The sequence above staring by "the actual sequence to load a remoteproc firmware is" describes the existing legacy Linux kernel sequence without my series > >> >> So the point is that we need to load the firmware before getting the resource table >> >> >>>> Signed-off-by: Arnaud Pouliquen <arnaud.pouliquen@xxxxxxxxxxx> >>>> --- >>>> Update vs version V13: >>>> - Rework the commit to introduce load_fw() op. >>>> - remove rproc_release_fw() call from rproc_start() as called in >>>> rproc_boot() and rproc_boot_recovery() in case of error. >>>> - create rproc_load_fw() and rproc_release_fw() internal functions. >>>> --- >>>> drivers/remoteproc/remoteproc_core.c | 16 +++++++++++++++- >>>> drivers/remoteproc/remoteproc_internal.h | 14 ++++++++++++++ >>>> include/linux/remoteproc.h | 6 ++++++ >>>> 3 files changed, 35 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c >>>> index ace11ea17097..8df4b2c59bb6 100644 >>>> --- a/drivers/remoteproc/remoteproc_core.c >>>> +++ b/drivers/remoteproc/remoteproc_core.c >>>> @@ -1488,6 +1488,7 @@ static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw) >>>> kfree(rproc->cached_table); >>>> rproc->cached_table = NULL; >>>> rproc->table_ptr = NULL; >>>> + rproc_release_fw(rproc); >>>> unprepare_rproc: >>>> /* release HW resources if needed */ >>>> rproc_unprepare_device(rproc); >>>> @@ -1855,8 +1856,14 @@ static int rproc_boot_recovery(struct rproc *rproc) >>>> return ret; >>>> } >>>> >>>> + ret = rproc_load_fw(rproc, firmware_p); >>> >>> It is not clear to me why in the case of OP-TEE we need to invoke the >>> "load operation" here, and in the case of "legacy" ELF loading we do it >>> first thing in rproc_start() (i.e. on the very next line of code being >>> executed). >> >> For the OP-TEE, please refer to my comment above. >> >> The only reason I can see for the legacy ELF is that the resource table could >> contain information to be able to configure some resources to load the firmware. >> In case of OP-TEE this would be managed in OP-TEE. >> > > Sure, but as I say...inline rproc_start() here and the very next > operation we perform is something we call "load". > > Why do we need to differentiate "load" and "load firmware", when we call > them at the same step in the sequence? A simplified sequence showing the implementation after this commit would be 1) request_firmware() 2) rproc_load_fw() (optional) 3) rproc_parse_fw() 4) rproc_handle_resources() 5) rproc_load_segments() (optional) 6) rproc->ops->start() Here we introduce rproc_load_fw() because we need to load the firmware and authenticate it before parsing it for the resource table calling rproc_parse_fw(). legacy Elf format sequence: 1) request_firmware() 3) rproc_parse_fw() 4) rproc_handle_resources() 5) rproc_load_segments() 6) rproc->ops->start() Requested remoteproc TEE sequence 1) request_firmware() 2) rproc_load_fw() 3) rproc_parse_fw() 4) rproc_handle_resources() 6) rproc->ops->start() > >>> >>> >>> Should we start by renaming rproc_load_segments() rproc_load() and move >>> it out of rproc_start()? (I.e. here?) >>> >>> Perhaps define that rproc_load() is responsible for "loading firmware" >>> (whatever that means) and establishing rproc->cached_table, and >>> rproc->table_ptr? >>> >>> (Note that this seems like a good cleanup of the spaghetti regardless) >>> >> >> It's something that crossed my mind, but I don't know the legacy well enough to >> guarantee that it will work in all drivers. >> > > Looking at this again, if we move it, we need to duplicate it across a > few different call sites. So might need to take another look at that. > > Still flatten the code and you seem to do: > > if (load_fw) > load_fw(); > if (load) > load(); > > Why do we need two different things named "load". Right, the terminology can be confusing, but both perform a load. We arrived at this terminology with Mathieu, but if you have a suggestion, don't hesitate to share. > >> If you want to go in this direction, perhaps this is something that could be >> addressed in a dedicated pull request? In this case, the ops could become >> load_fw and load_fw_new, similar to how it is done for platform_driver::remove. >> > > No need to make it more complicated than it is, change the symbol name > and fix the 3 places that calls the function. > >> >>>> + if (ret) >>>> + return ret; >>>> + >>>> /* boot the remote processor up again */ >>>> ret = rproc_start(rproc, firmware_p); >>>> + if (ret) >>>> + rproc_release_fw(rproc); >>> >>> The fact that you rproc_release_fw() in the error path here, right >>> before we unconditionally release_firmware() the actual firmware means >>> that you have 2 different life cycles with very very similar names. >>> >>> This will contain bugs, sooner or later. >> >> So we need to find a better way for the ops if we continue in this direction. >> What about introducing rproc_load_new and rproc_release? >> > > You need to help me understand why load and load_new are both needed. > > And no, "load_new" is not an ok name. > >>> >>>> >>>> release_firmware(firmware_p); >>>> >>>> @@ -1997,7 +2004,13 @@ int rproc_boot(struct rproc *rproc) >>>> goto downref_rproc; >>>> } >>>> >>>> + ret = rproc_load_fw(rproc, firmware_p); >>>> + if (ret) >>>> + goto downref_rproc; >>>> + >>>> ret = rproc_fw_boot(rproc, firmware_p); >>>> + if (ret) >>>> + rproc_release_fw(rproc); >>>> >>>> release_firmware(firmware_p); >>>> } >>>> @@ -2071,6 +2084,7 @@ int rproc_shutdown(struct rproc *rproc) >>>> kfree(rproc->cached_table); >>>> rproc->cached_table = NULL; >>>> rproc->table_ptr = NULL; >>>> + rproc_release_fw(rproc); >>>> out: >>>> mutex_unlock(&rproc->lock); >>>> return ret; >>>> @@ -2471,7 +2485,7 @@ static int rproc_alloc_ops(struct rproc *rproc, const struct rproc_ops *ops) >>>> if (!rproc->ops->coredump) >>>> rproc->ops->coredump = rproc_coredump; >>>> >>>> - if (rproc->ops->load) >>>> + if (rproc->ops->load || rproc->ops->load_fw) >>>> return 0; >>>> >>>> /* Default to ELF loader if no load function is specified */ >>>> diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h >>>> index 0cd09e67ac14..2104ca449178 100644 >>>> --- a/drivers/remoteproc/remoteproc_internal.h >>>> +++ b/drivers/remoteproc/remoteproc_internal.h >>>> @@ -221,4 +221,18 @@ bool rproc_u64_fit_in_size_t(u64 val) >>>> return (val <= (size_t) -1); >>>> } >>>> >>>> +static inline void rproc_release_fw(struct rproc *rproc) >>>> +{ >>>> + if (rproc->ops->release_fw) >>>> + rproc->ops->release_fw(rproc); >>>> +} >>>> + >>>> +static inline int rproc_load_fw(struct rproc *rproc, const struct firmware *fw) >>>> +{ >>>> + if (rproc->ops->load_fw) >>>> + return rproc->ops->load_fw(rproc, fw); >>>> + >>>> + return 0; >>>> +} >>>> + >>>> #endif /* REMOTEPROC_INTERNAL_H */ >>>> diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h >>>> index 2e0ddcb2d792..ba6fd560f7ba 100644 >>>> --- a/include/linux/remoteproc.h >>>> +++ b/include/linux/remoteproc.h >>>> @@ -381,6 +381,10 @@ enum rsc_handling_status { >>>> * @panic: optional callback to react to system panic, core will delay >>>> * panic at least the returned number of milliseconds >>>> * @coredump: collect firmware dump after the subsystem is shutdown >>>> + * @load_fw: optional function to load non-ELF firmware image to memory, where the remote >>>> + * processor expects to find it. >>> >>> Why does it matter if it's an ELF or not? >> >> No matter. It was more to differentiate from the legacy one, but it does not >> make sense and adds to the argument that the ops naming is not accurate. >> > > I'm probably misunderstanding your intention here, but I can't help > feeling that you add this code path to avoid understanding or fixing the > existing code. > >>> >>> In the Qualcomm case, firmware comes in ELF format, Linux loads the >>> LOAD segments and the trusted world then authenticates the content and >>> start the remote processor. >>> >>> >>> I think the difference in your case is that you have memory reserved >>> elsewhere, and you want the "load" operation to pass the firmware to the >>> TEE - which means that you need rproc_release_fw() to eventually clean >>> up the state if rproc_start() fails - and upon shutdown. >> >> Yes the OP-TEE is make more stuff: >> - authenticate several firmware images > > Please tell me how that work. I'm not able to see the multiple calls to > request_firmware()...