Re: [PATCH v15 3/8] remoteproc: Introduce load_fw and release_fw optional operation

Arnaud POULIQUEN <arnaud.pouliquen@xxxxxxxxxxx> · Wed, 12 Feb 2025 14:48:30 +0100

On 2/12/25 04:54, Bjorn Andersson wrote:
> On Tue, Dec 10, 2024 at 11:33:31AM +0100, Arnaud POULIQUEN wrote:
>>
>>
>> On 12/10/24 00:14, Bjorn Andersson wrote:
>>> On Thu, Nov 28, 2024 at 09:42:10AM GMT, Arnaud Pouliquen wrote:
>>>> This patch updates the rproc_ops structures to include two new optional
>>>> operations.
>>>>
>>>> - The load_fw() op is responsible for loading the remote processor
>>>> non-ELF firmware image before starting the boot sequence. This ops will
>>>> be used, for instance, to call OP-TEE to  authenticate an load the firmware
>>>> image before accessing to its resources (a.e the resource table)
>>>>
>>>> - The release_fw op is responsible for releasing the remote processor
>>>> firmware image. For instance to clean memories.
>>>> The ops is called in the following cases:
>>>>  - An error occurs between the loading of the firmware image and the
>>>>    start of the remote processor.
>>>>  - after stopping the remote processor.
>>>>
>>>
>>> Why does this difference need to be encoded in rproc_ops? I think we
>>> should strive for having a single, simple high level flow of operations
>>> through the remoteproc core for which the specifics of each remoteproc
>>> instance can be encoded in that driver.
>>>
>>>
>>> Perhaps there's a good reason for this, but if so please read and follow
>>> https://docs.kernel.org/process/submitting-patches.html#describe-your-changes
>>> to make that reasoning clear in the commit message.
>>>
>>
>> The actual sequence to load a remoteproc firmware is
>> - get firmware from file system and store the firmware image in Linux kernel memory
> 
> This sounds like "load" in remoteproc terminology.

it is the request_firmware()

> 
>> - get resource table from the firmware image and make a copy(
>> - parse the resource table and handle the resources
> 
>> - load the firmware
>> - start the firmware
> 
> And these two are "start".

yes done in rproc_start()

> 
>>
>>
>> In OP-TEE we support not only one ELF image but n images (for instance a TF-M +
>> a zephyr), the segments can be encrypted the OP-TEE load sequence is
>>  - copy header and meta data of the signed image in a secure memory
>>  - verify it
>>  - copy segments in remote processor memory and authenticate segments in place.
>>  - optionally decrypt the segments
>>
>> Only at this step the resource table as been authenticated (and decrypted)
> 
> "this step" meaning TA_RPROC_FW_CMD_LOAD_FW? 

 yes

>Above you say that happens
> after you parse the resource table.

The sequence above staring by "the actual sequence to load a remoteproc firmware
is" describes the existing legacy Linux kernel sequence without my series

> 
>>
>> So the point is that we need to load the firmware before getting the resource table
>>
>>
>>>> Signed-off-by: Arnaud Pouliquen <arnaud.pouliquen@xxxxxxxxxxx>
>>>> ---
>>>> Update vs version V13:
>>>> - Rework the commit to introduce load_fw() op.
>>>> - remove rproc_release_fw() call from  rproc_start() as called in
>>>>   rproc_boot() and rproc_boot_recovery() in case of error.
>>>> - create rproc_load_fw() and rproc_release_fw() internal functions.
>>>> ---
>>>>  drivers/remoteproc/remoteproc_core.c     | 16 +++++++++++++++-
>>>>  drivers/remoteproc/remoteproc_internal.h | 14 ++++++++++++++
>>>>  include/linux/remoteproc.h               |  6 ++++++
>>>>  3 files changed, 35 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
>>>> index ace11ea17097..8df4b2c59bb6 100644
>>>> --- a/drivers/remoteproc/remoteproc_core.c
>>>> +++ b/drivers/remoteproc/remoteproc_core.c
>>>> @@ -1488,6 +1488,7 @@ static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
>>>>  	kfree(rproc->cached_table);
>>>>  	rproc->cached_table = NULL;
>>>>  	rproc->table_ptr = NULL;
>>>> +	rproc_release_fw(rproc);
>>>>  unprepare_rproc:
>>>>  	/* release HW resources if needed */
>>>>  	rproc_unprepare_device(rproc);
>>>> @@ -1855,8 +1856,14 @@ static int rproc_boot_recovery(struct rproc *rproc)
>>>>  		return ret;
>>>>  	}
>>>>  
>>>> +	ret = rproc_load_fw(rproc, firmware_p);
>>>
>>> It is not clear to me why in the case of OP-TEE we need to invoke the
>>> "load operation" here, and in the case of "legacy" ELF loading we do it
>>> first thing in rproc_start() (i.e. on the very next line of code being
>>> executed).
>>
>> For the OP-TEE, please refer to my comment above.
>>
>> The only reason I can see for the legacy ELF is that the resource table could
>> contain information to be able to configure some resources to load the firmware.
>> In case of OP-TEE this would be managed in OP-TEE.
>>
> 
> Sure, but as I say...inline rproc_start() here and the very next
> operation we perform is something we call "load".
> 
> Why do we need to differentiate "load" and "load firmware", when we call
> them at the same step in the sequence?

A simplified sequence showing the implementation after this commit would be

1) request_firmware()
2) rproc_load_fw() (optional)
3) rproc_parse_fw()
4) rproc_handle_resources()
5) rproc_load_segments() (optional)
6) rproc->ops->start()

Here we introduce rproc_load_fw() because we need to load the firmware and
authenticate it before parsing it for the resource table calling rproc_parse_fw().

legacy Elf format sequence:
1) request_firmware()
3) rproc_parse_fw()
4) rproc_handle_resources()
5) rproc_load_segments()
6) rproc->ops->start()

Requested remoteproc TEE sequence
1) request_firmware()
2) rproc_load_fw()
3) rproc_parse_fw()
4) rproc_handle_resources()
6) rproc->ops->start()

> 
>>>
>>>
>>> Should we start by renaming rproc_load_segments() rproc_load() and move
>>> it out of rproc_start()? (I.e. here?)
>>>
>>> Perhaps define that rproc_load() is responsible for "loading firmware"
>>> (whatever that means) and establishing rproc->cached_table, and
>>> rproc->table_ptr?
>>>
>>> (Note that this seems like a good cleanup of the spaghetti regardless)
>>>
>>
>> It's something that crossed my mind, but I don't know the legacy well enough to
>> guarantee that it will work in all drivers.
>>
> 
> Looking at this again, if we move it, we need to duplicate it across a
> few different call sites. So might need to take another look at that.
> 
> Still flatten the code and you seem to do:
> 
> if (load_fw)
> 	load_fw();
> if (load)
> 	load();
> 
> Why do we need two different things named "load".

Right, the terminology can be confusing, but both perform a load. We arrived at
this terminology with Mathieu, but if you have a suggestion, don't hesitate to
share.

> 
>> If you want to go in this direction, perhaps this is something that could be
>> addressed in a dedicated pull request? In this case, the ops could become
>> load_fw and load_fw_new, similar to how it is done for platform_driver::remove.
>>
> 
> No need to make it more complicated than it is, change the symbol name
> and fix the 3 places that calls the function.
> 
>>
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>>  	/* boot the remote processor up again */
>>>>  	ret = rproc_start(rproc, firmware_p);
>>>> +	if (ret)
>>>> +		rproc_release_fw(rproc);
>>>
>>> The fact that you rproc_release_fw() in the error path here, right
>>> before we unconditionally release_firmware() the actual firmware means
>>> that you have 2 different life cycles with very very similar names.
>>>
>>> This will contain bugs, sooner or later.
>>
>> So we need to find a better way for the ops if we continue in this direction.
>> What about introducing rproc_load_new and rproc_release?
>>
> 
> You need to help me understand why load and load_new are both needed.
> 
> And no, "load_new" is not an ok name.
> 
>>>
>>>>  
>>>>  	release_firmware(firmware_p);
>>>>  
>>>> @@ -1997,7 +2004,13 @@ int rproc_boot(struct rproc *rproc)
>>>>  			goto downref_rproc;
>>>>  		}
>>>>  
>>>> +		ret = rproc_load_fw(rproc, firmware_p);
>>>> +		if (ret)
>>>> +			goto downref_rproc;
>>>> +
>>>>  		ret = rproc_fw_boot(rproc, firmware_p);
>>>> +		if (ret)
>>>> +			rproc_release_fw(rproc);
>>>>  
>>>>  		release_firmware(firmware_p);
>>>>  	}
>>>> @@ -2071,6 +2084,7 @@ int rproc_shutdown(struct rproc *rproc)
>>>>  	kfree(rproc->cached_table);
>>>>  	rproc->cached_table = NULL;
>>>>  	rproc->table_ptr = NULL;
>>>> +	rproc_release_fw(rproc);
>>>>  out:
>>>>  	mutex_unlock(&rproc->lock);
>>>>  	return ret;
>>>> @@ -2471,7 +2485,7 @@ static int rproc_alloc_ops(struct rproc *rproc, const struct rproc_ops *ops)
>>>>  	if (!rproc->ops->coredump)
>>>>  		rproc->ops->coredump = rproc_coredump;
>>>>  
>>>> -	if (rproc->ops->load)
>>>> +	if (rproc->ops->load || rproc->ops->load_fw)
>>>>  		return 0;
>>>>  
>>>>  	/* Default to ELF loader if no load function is specified */
>>>> diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
>>>> index 0cd09e67ac14..2104ca449178 100644
>>>> --- a/drivers/remoteproc/remoteproc_internal.h
>>>> +++ b/drivers/remoteproc/remoteproc_internal.h
>>>> @@ -221,4 +221,18 @@ bool rproc_u64_fit_in_size_t(u64 val)
>>>>  	return (val <= (size_t) -1);
>>>>  }
>>>>  
>>>> +static inline void rproc_release_fw(struct rproc *rproc)
>>>> +{
>>>> +	if (rproc->ops->release_fw)
>>>> +		rproc->ops->release_fw(rproc);
>>>> +}
>>>> +
>>>> +static inline int rproc_load_fw(struct rproc *rproc, const struct firmware *fw)
>>>> +{
>>>> +	if (rproc->ops->load_fw)
>>>> +		return rproc->ops->load_fw(rproc, fw);
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>>  #endif /* REMOTEPROC_INTERNAL_H */
>>>> diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
>>>> index 2e0ddcb2d792..ba6fd560f7ba 100644
>>>> --- a/include/linux/remoteproc.h
>>>> +++ b/include/linux/remoteproc.h
>>>> @@ -381,6 +381,10 @@ enum rsc_handling_status {
>>>>   * @panic:	optional callback to react to system panic, core will delay
>>>>   *		panic at least the returned number of milliseconds
>>>>   * @coredump:	  collect firmware dump after the subsystem is shutdown
>>>> + * @load_fw:	optional function to load non-ELF firmware image to memory, where the remote
>>>> + *		processor expects to find it.
>>>
>>> Why does it matter if it's an ELF or not?
>>
>> No matter. It was more to differentiate from the legacy one, but it does not
>> make sense and adds to the argument that the ops naming is not accurate.
>>
> 
> I'm probably misunderstanding your intention here, but I can't help
> feeling that you add this code path to avoid understanding or fixing the
> existing code.
> 
>>>
>>> In the Qualcomm case, firmware comes in ELF format, Linux loads the
>>> LOAD segments and the trusted world then authenticates the content and
>>> start the remote processor.
>>>
>>>
>>> I think the difference in your case is that you have memory reserved
>>> elsewhere, and you want the "load" operation to pass the firmware to the
>>> TEE - which means that you need rproc_release_fw() to eventually clean
>>> up the state if rproc_start() fails - and upon shutdown.
>>
>> Yes the OP-TEE is make more stuff:
>> - authenticate several firmware images
> 
> Please tell me how that work. I'm not able to see the multiple calls to
> request_firmware()...