Hi Mathieu, On 11/13/20 10:27 PM, Mathieu Poirier wrote: > Hi, > > On Thu, Nov 12, 2020 at 02:56:20PM +0100, Arnaud POULIQUEN wrote: >> Hi Mathieu, >> >> Thanks for initiating the discussion! >> >> Waiting feedback from other, please find my feedback on our proposal below. > > The first version of this set has been released on August 26th and since then, > only you and Peng have given me feedback. As such I suggest that we move > forward with the decision you and I settle on. As usual with open source > development, people can submit new patches to enhance our solution as they see > fit. > >> >> On 10/30/20 8:57 PM, Mathieu Poirier wrote: >>> Adding flags to dictate how to handle a platform driver being removed >>> or the remote processor crashing while in RPROC_ATTACHED state. >>> >>> Signed-off-by: Mathieu Poirier <mathieu.poirier@xxxxxxxxxx> >>> --- >>> drivers/remoteproc/remoteproc_core.c | 25 +++++++++++++++++++++++++ >>> include/linux/remoteproc.h | 5 +++++ >>> 2 files changed, 30 insertions(+) >>> >>> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c >>> index 229fa2cad0bd..d024367c63e5 100644 >>> --- a/drivers/remoteproc/remoteproc_core.c >>> +++ b/drivers/remoteproc/remoteproc_core.c >>> @@ -2227,6 +2227,29 @@ static int rproc_alloc_ops(struct rproc *rproc, const struct rproc_ops *ops) >>> return 0; >>> } >>> >>> +static void rproc_set_automation_flags(struct rproc *rproc) >>> +{ >>> + struct device *dev = rproc->dev.parent; >>> + struct device_node *np = dev->of_node; >>> + bool core_reboot, remote_crash; >>> + >>> + /* >>> + * When function rproc_cdev_release() or rproc_del() are called and >>> + * the remote processor has been attached to, it will be detached from >>> + * (rather than turned off) if "autonomous_on_core_reboot" is specified >>> + * in the DT. >>> + */ >>> + core_reboot = of_property_read_bool(np, "autonomous_on_core_reboot"); >>> + rproc->autonomous_on_core_reboot = core_reboot; >>> + >>> + /* >>> + * When the remote processor crashes it will be detached from, and >>> + * attached to, if "autonomous_on_remote_crash" is specified in the DT. >>> + */ >>> + remote_crash = of_property_read_bool(np, "autonomous_on_remote_crash"); >>> + rproc->autonomous_on_core_reboot = core_reboot; >>> +} >>> + >> >> I wonder if the naming is not too restrictive. > > I'm happy to have this conversation, which is really the point of this second > revision. I turned names and ideas around in my head for a long time and the above is > the best I came up with. Your insight gave me food for thought - see below. > >> >> I think here we probably need first to identify the use cases we want to support >> to determine which use cases should be addressed and deduce DT fields. >> >> Please find my view below: >> >> 1) Attach to a remote processor on boot. >> This is the "attach" you introduced in a previous series. I wonder here if a DT >> field should not be introduce for platform which are not able to dynamically >> determines the remote processor state. Something like "remote-boot-on" or >> "autonomous-boot-on". > > Right - I think "autonomous-on-core-boot" would be best as it really spells out > what is going on. I did not include it in the "attach" patchset because there > wasn't a need for it. Both ST and NXP are able to determine the state of the > remote processor from a platform driver. My initial strategy was to introduce > the functionality when the need for it comes up. I can revisit if you feel > strongly about adding it immediately. No problem to add it later. I mentioned it here only trying to have a complete view of the use cases. > >> >> 2) Detach from a remote processor on Linux kernel shutdown >> Two possible actions: shutdown the remote processor or detach from it. >> A DT field could be used to determine the expected behavior. >> > > That is what the "autonomous-on-core-reboot" was for but reading your > description I think "autonomous-on-core-shutdown" is best to describe the > scenario. > >> 3) Linux core reboot on crash >> Two possible actions: shutdown and restart the remote processor or >> detach/re-attach from/to it. >> Is same DT field than 2) can be used for this . Or should be determine by a >> new sysfs recovery option [1]? > > As far as I can tell nothing happens to drivers when the kernel crashes. To > take action when the kernel crashes each driver needs to register explicitly > with the panic notifier, which the remoteproc doesn't currently do. That's a > different feature that I would like to delay for another time. > > If and when that time comes we can either reuse "autonomous-on-core-shutdown" > or introduce "autonomous-on-core-crash", depending on the level of granularity > needed. Right, i don't know if it possible to determine boot reason on remoteproc probe. (first boot or re-boot after crash). This information will be needed to determine if the firmware has to be re-attached or re-started. > >> >> [1] >> https://elixir.bootlin.com/linux/v5.10-rc2/source/drivers/remoteproc/remoteproc_sysfs.c#L45 >> >> 4) The remote processor need to reboot on crash. >> 3 possible actions: >> - shutdown and restart the remote processor > > That is currently the default behavior _if_ recovery is enabled. > >> - detach and re-attach from/to it. > > That is how I intend to use "autonomous-on-remote-crash", _if_ recovery is > enabled. > >> - Just shutdown, as no recovery possible without a system reset. > > That is the current behavior if recovery is _not_ enabled. > The recovery is a user interface, for my pov the recovery sysfs/debugfs is used to allow userland to chose behavior on crash - automatic recovery - asynchronous recovery ( recovery trigged by the userspace) - no recovery As the userspace should be hardware independent, the platform driver should provide information on the capabilities I list above. That's why i proposed to not have a boolean but an enum to handle the crash > Dealing with crash scenarios is a little more complex and requires some > refactoring. That is why I wanted to solely concentrate on the shutdown scenario > in this set. Yes, my feedback was mainly to determine if the properties allow to cover the use cases, to avoid to refactor them later. let start with the shutdown. > >> >> 5) Detach/re-attach on Linux suspend/resume >> Perhaps better to manage this in platform drivers without a generic DT field? > > I think that falls in the same category as power management and is too specific > to be handled in the remoteproc core. As you suggest, it is probably best to > leave that to platform drivers for the time being. > >> >> If i try to apply this on the remote proc boot and shutdown sequences: >> >> 1) on remoteproc device add: >> - Need to determine if the remote processor is already running: >> - started by another entity >> - Linux reboot after crash >> >> 2) On remoteproc device release. >> - Need to determine if the remote processor need to be shutdown or detached: >> - Linux kernel crash >> - Linux kernel graceful shutdown with remote processor keeping ON. >> >> 3) On remote processor crash >> - Need to determine if the remote processor will be restarted by an external >> entity or by the remoteproc framework, or if simply not possible to recover >> without a system reset. >> >> Regarding these use cases here is an alternative proposal(inspired by regulator >> framework): >> - "remote-boot-on": determine on probe if the remoteproc firmware is already >> booted. This field is optional, use by a platform driver which can not >> determine the state of the remote processor. Could be dynamically updated by the >> platform driver to manage Kernel crash... >> >> - "remote-always-on": means that the detach has to be privileged on >> shutdown. Need also to be managed by platform driver as it can be >> compared to the remote processor current state. >> >> - "remoteproc-crash-recovery": crash recovery mode: >> possible value: "SHUTDOWN", "DETACH", "DISABLED" > > I think all of the above scenarios can be managed with a combination of the > proposed bindings , i.e "autonomous-on-core-shutdown" and > "autonomous-on-remote-crash". The latter would be used in conjuction with the > recovery mechanic already available. > > Let me know what you think. "autonomous-on-core-shutdown" is fine and sufficient for this series. So as you suggest let's start with that. As explained above for the "autonomous-on-remote-crash" property i would prefer an enum to also be able to prohibit the recovering as well, especially if the recovering is migrating to the sysfs. let's discuss this in a second step. Thanks, Arnaud > > Mathieu > >> >> >> Regards, >> Arnaud >> >>> /** >>> * rproc_alloc() - allocate a remote processor handle >>> * @dev: the underlying device >>> @@ -2285,6 +2308,8 @@ struct rproc *rproc_alloc(struct device *dev, const char *name, >>> if (rproc_alloc_ops(rproc, ops)) >>> goto put_device; >>> >>> + rproc_set_automation_flags(rproc); >>> + >>> /* Assign a unique device index and name */ >>> rproc->index = ida_simple_get(&rproc_dev_index, 0, 0, GFP_KERNEL); >>> if (rproc->index < 0) { >>> diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h >>> index 71d4d4873164..9a6e79ef35d7 100644 >>> --- a/include/linux/remoteproc.h >>> +++ b/include/linux/remoteproc.h >>> @@ -516,6 +516,9 @@ struct rproc_dump_segment { >>> * @nb_vdev: number of vdev currently handled by rproc >>> * @char_dev: character device of the rproc >>> * @cdev_put_on_release: flag to indicate if remoteproc should be shutdown on @char_dev release >>> + * @autonomous_on_core_reboot: true if the remote processor should be detached from >>> + * (rather than turned off) when the remoteproc core >>> + * goes away. >>> */ >>> struct rproc { >>> struct list_head node; >>> @@ -554,6 +557,8 @@ struct rproc { >>> u16 elf_machine; >>> struct cdev cdev; >>> bool cdev_put_on_release; >>> + bool autonomous_on_core_reboot : 1, >>> + autonomous_on_remote_crash : 1; >>> }; >>> >>> /** >>>