> -----Original Message----- > From: xiang xiao <xiaoxiang781216@xxxxxxxxx> > Sent: jeudi 8 novembre 2018 19:11 > To: Loic PALLARDY <loic.pallardy@xxxxxx> > Cc: bjorn.andersson@xxxxxxxxxx; linux-remoteproc@xxxxxxxxxxxxxxx; > spjoshi@xxxxxxxxxxxxxx > Subject: Re: Regression by commit > 7e83cab824a86704cdbd7735c19d34e0ce423dc5 > > Loic, The patch can't fix my issue:(, how do you trigger virtio device > destroy process? > 1.Go through shutdown/start circle > 2.Or trigger some crash manually > My problem happen in the second case, maybe you can try it to see what > happen on your platform. > I run stress test doing start/shutdown circle on my platform and found issue I reported. I'll do the same with crash and recovery using proposed debugfs interface for that and let you know Regards, Loic > On Thu, Nov 8, 2018 at 4:16 PM Loic PALLARDY <loic.pallardy@xxxxxx> wrote: > > > > > > > > > -----Original Message----- > > > From: linux-remoteproc-owner@xxxxxxxxxxxxxxx <linux-remoteproc- > > > owner@xxxxxxxxxxxxxxx> On Behalf Of Bjorn Andersson > > > Sent: jeudi 8 novembre 2018 08:26 > > > To: xiang xiao <xiaoxiang781216@xxxxxxxxx> > > > Cc: linux-remoteproc@xxxxxxxxxxxxxxx; spjoshi@xxxxxxxxxxxxxx > > > Subject: Re: Regression by commit > > > 7e83cab824a86704cdbd7735c19d34e0ce423dc5 > > > > > > On Wed 07 Nov 22:36 PST 2018, xiang xiao wrote: > > > > > > > On Thu, Nov 8, 2018 at 2:18 PM Bjorn Andersson > > > > <bjorn.andersson@xxxxxxxxxx> wrote: > > > > > > > > > > On Wed 07 Nov 06:25 PST 2018, xiang xiao wrote: > > > > > > > > > > > This commit replace rproc_{shutdown,boot}() with > rproc_{stop,start}(), > > > > > > which skip destroy the virtio device at stop but reinitialize it again at > > > > > > start: > > > > > > [ 603.446805] remoteproc remoteproc0: crash detected in > > > > > > f9210000.toppwr:tl421-rproc: type mmufault > > > > > > [ 603.456883] remoteproc remoteproc0: handling crash #1 in > > > > > > f9210000.toppwr:tl421-rproc > > > > > > [ 603.469593] remoteproc remoteproc0: recovering > > > > > > f9210000.toppwr:tl421-rproc > > > > > > [ 603.483172] remoteproc remoteproc0: stopped remote processor > > > > > > f9210000.toppwr:tl421-rproc > > > > > > [ 603.495999] kobject (ffffffc0b8c51098): tried to init an initialized > > > > > > object, something is seriously wrong. > > > > > > > > > > > > > > > > I thought this issue was fixed. > > > > > > > > > Which patch fix this issue? I am using 4.19 kernel. > > > > > > > > > > I'm unable to find such commit right now. And given your report I > > > believe this still is a bug then. > > > > I send fix [1] for a similar issue in July. After investigation, I found an issue > at rpmsg level in remove procedure. > > In some case virtio device can't be destroyed as some child are still > registered. > > Could you please test my patch to see if it fix your issue? > > > > Regards, > > Loic > > > > 1: https://patchwork.kernel.org/patch/10544757/ > > > > > > > > > > > ^^^^^^^^^^^^^^^^^^^^^ > > > > > > [ 603.506868] CPU: 5 PID: 198 Comm: kworker/5:1 Tainted: G W > > > > > > 4.9.27-04454-gd4c1829-dirty #255 > > > > > > [ 603.517468] Hardware name: Banks (DT) > > > > > > [ 603.521581] Workqueue: events rproc_crash_handler_work > > > > > > [ 603.527342] Call trace: > > > > > > [ 603.530086] [<ffffff800808bd9c>] dump_backtrace+0x0/0x1cc > > > > > > [ 603.536115] [<ffffff800808bf7c>] show_stack+0x14/0x1c > > > > > > [ 603.541771] [<ffffff80083fef08>] dump_stack+0xa8/0xe0 > > > > > > [ 603.547423] [<ffffff8008402b24>] kobject_init+0x8c/0x9c > > > > > > [ 603.553280] [<ffffff800853758c>] device_initialize+0x3c/0xe8 > > > > > > [ 603.559609] [<ffffff80085397d4>] device_register+0x14/0x28 > > > > > > [ 603.565750] [<ffffff80084b777c>] > register_virtio_device+0xc4/0x114 > > > > > > [ 603.572669] [<ffffff8008878b20>] > rproc_add_virtio_dev+0x7c/0x108 > > > > > > [ 603.579390] [<ffffff8008875cbc>] > rproc_vdev_do_probe+0x14/0x1c > > > > > > [ 603.585911] [<ffffff8008875a60>] rproc_start+0xac/0x1ac > > > > > > [ 603.591754] [<ffffff8008877a68>] > > > rproc_trigger_recovery+0x2f8/0x324 > > > > > > [ 603.598763] [<ffffff8008877b24>] > > > rproc_crash_handler_work+0x90/0xb0 > > > > > > [ 603.605778] [<ffffff80080cd570>] > process_one_work+0x204/0x704 > > > > > > [ 603.612202] [<ffffff80080cdac4>] worker_thread+0x54/0x4a8 > > > > > > [ 603.618248] [<ffffff80080d4aec>] kthread+0xec/0x100 > > > > > > [ 603.623703] [<ffffff8008083890>] ret_from_fork+0x10/0x40 > > > > > > > > > > > > When the crash happen, is it better to destroy and recreate all virtio > > > > > > device and it's children(rpmsg device) again to match the remote > side > > > state > > > > > > like the original behavior? > > > > > > > > > > > > > > > > Yes, it's likely that the protocols on top does share some state, so we > > > > > do not have any choice but to report this up to the virtio device. > > > > > > > > > > Removing and re-probing the devices - rather than having some other > > > form > > > > > of notification of this event - makes the code simpler. > > > > > > > > > > > > > > > But it seems we're trying to re-register the same device the second > > > > > time, rather than initialize a new one. > > > > > > > > > If we use one new here, the old need to be destroyed to avoid the > leak. > > > > Basically, it's become the old approach again. > > > > > > > > > > You're right, for vdevs we must tear them down and bring them up again. > > > > > > The reason for introducing the offending commit was for carveouts to > > > be allocated past the stop() call, so that we can stop the core and > > > provide a coredump for post mortem debugging. > > > > > > > We are building many rpmsg devices on top of rproc/virtio, each one > > > > has the internal state sync with the remote side. > > > > If remote side crash and reboot again, all state is stale and need to > > > > reset to the default state. > > > > Removing and re-probing is the clean and simple solution, I think too. > > > > > > > > > > Regards, > > > Bjorn