Dne sobota, 05. februar 2022 ob 08:40:21 CET je Jernej Škrabec napisal(a): > Dne petek, 04. februar 2022 ob 10:09:56 CET je Paul Kocialkowski napisal(a): > > Hi Jernej, > > > > On Tue 01 Feb 22, 19:33, Jernej Skrabec wrote: > > > Currently, if job is not completed for whatever reason, userspace > > > application can hang on ioctl and thus become unkillable. > > > > > > In order to prevent that, implement watchdog, which will complete job > > > after 2 seconds with error state. > > > > > > Concept is borrowed from hantro driver. > > > > Good idea to implement a watchdog here, thanks! > > See comments below. > > > > > Signed-off-by: Jernej Skrabec <jernej.skrabec@xxxxxxxxx> > > > --- > > > > > > drivers/staging/media/sunxi/cedrus/cedrus.c | 2 ++ > > > drivers/staging/media/sunxi/cedrus/cedrus.h | 3 +++ > > > .../staging/media/sunxi/cedrus/cedrus_dec.c | 4 +++ > > > .../staging/media/sunxi/cedrus/cedrus_hw.c | 25 +++++++++++++++++++ > > > .../staging/media/sunxi/cedrus/cedrus_hw.h | 2 ++ > > > 5 files changed, 36 insertions(+) > > > > > > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.c > > > b/drivers/staging/media/sunxi/cedrus/cedrus.c index > > > 4a4b714b0f26..68b3dcdb5df3 100644 > > > --- a/drivers/staging/media/sunxi/cedrus/cedrus.c > > > +++ b/drivers/staging/media/sunxi/cedrus/cedrus.c > > > @@ -439,6 +439,8 @@ static int cedrus_probe(struct platform_device > > > *pdev) > > > > > > mutex_init(&dev->dev_mutex); > > > > > > + INIT_DELAYED_WORK(&dev->watchdog_work, cedrus_watchdog); > > > + > > > > > > ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev); > > > if (ret) { > > > > > > dev_err(&pdev->dev, "Failed to register V4L2 > > device\n"); > > > > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.h > > > b/drivers/staging/media/sunxi/cedrus/cedrus.h index > > > c345f2984041..3bc094eb497f 100644 > > > --- a/drivers/staging/media/sunxi/cedrus/cedrus.h > > > +++ b/drivers/staging/media/sunxi/cedrus/cedrus.h > > > @@ -24,6 +24,7 @@ > > > > > > #include <linux/iopoll.h> > > > #include <linux/platform_device.h> > > > > > > +#include <linux/workqueue.h> > > > > > > #define CEDRUS_NAME "cedrus" > > > > > > @@ -194,6 +195,8 @@ struct cedrus_dev { > > > > > > struct reset_control *rstc; > > > > > > unsigned int capabilities; > > > > > > + > > > + struct delayed_work watchdog_work; > > > > > > }; > > > > > > extern struct cedrus_dec_ops cedrus_dec_ops_mpeg2; > > > > > > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c > > > b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c index > > > a16c1422558f..9c7200299465 100644 > > > --- a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c > > > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c > > > @@ -97,4 +97,8 @@ void cedrus_device_run(void *priv) > > > > > > v4l2_ctrl_request_complete(src_req, &ctx->hdl); > > > > > > dev->dec_ops[ctx->current_codec]->trigger(ctx); > > > > > > + > > > + /* Start the watchdog timer. */ > > > + schedule_delayed_work(&dev->watchdog_work, > > > + msecs_to_jiffies(2000)); > > > > > > } > > > > > > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c > > > b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c index > > > 2d7663726467..a6470a89851e 100644 > > > --- a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c > > > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c > > > @@ -118,6 +118,13 @@ static irqreturn_t cedrus_irq(int irq, void *data) > > > > > > enum vb2_buffer_state state; > > > enum cedrus_irq_status status; > > > > > > + /* > > > + * If cancel_delayed_work returns false it means watchdog already > > > + * executed and finished the job. > > > + */ > > > + if (!cancel_delayed_work(&dev->watchdog_work)) > > > + return IRQ_HANDLED; > > > + > > > > > > ctx = v4l2_m2m_get_curr_priv(dev->m2m_dev); > > > if (!ctx) { > > > > > > v4l2_err(&dev->v4l2_dev, > > > > > > @@ -143,6 +150,24 @@ static irqreturn_t cedrus_irq(int irq, void *data) > > > > > > return IRQ_HANDLED; > > > > > > } > > > > > > +void cedrus_watchdog(struct work_struct *work) > > > +{ > > > + struct cedrus_dev *dev; > > > + struct cedrus_ctx *ctx; > > > + > > > + dev = container_of(to_delayed_work(work), > > > + struct cedrus_dev, watchdog_work); > > > + > > > + ctx = v4l2_m2m_get_curr_priv(dev->m2m_dev); > > > + if (!ctx) > > > + return; > > > + > > > + v4l2_err(&dev->v4l2_dev, "frame processing timed out!\n"); > > > + reset_control_reset(dev->rstc); > > > > I don't think playing with the reset is the right approach here. > > First we don't really know if the reset is shared or not, so this might > > have no effect. > > AFAIK only few reset lines are shared in all Allwinner SoC, never for Cedrus > and even then, this is considered as HW issue. So, I'm good with using > reset line. This principle is also taken from Hantro driver. > > > Then even if it does, wouldn't this just reset the state of the > > registers to an unconfigured state, making it impossible to decode any > > future frame in the same context? > > Being stateless core, all context is held in auxiliary buffers, reference > frames and controls, which are not reset with pulsing reset line, so no, > state is not lost. Anyway, if decoding fails, you're generally screwed > until next key frame. You'll have to deal with decoding issues/artefacts > nevertheless. > > Honestly I'm not sure what a good process would be to get back on track > > here so I would be tempted to just do nothing an return errors. > > > > That's already better than being stuck. > > Doing nothing will solve only current job, but HW will still be stuck in > decoding state. I doubt reprogramming registers and triggering new decoding > will actually do anything. > > I'll check BSP lib sources again. Maybe selecting non-existing decoding mode > would reset the core. That is already suggested as good thing to do in > order to put core in low power mode. BSP kernel driver also pulses reset line: https://github.com/orangepi-xunlong/linux-orangepi/blob/orange-pi-4.9-sun50iw9/drivers/media/cedar-ve/cedar_ve.c#L848-L851 I still think this is the way to go. Best regards, Jernej > > IMO we have to do something. Doing nothing will probably just lock up the > core until next reboot or maybe until trying different decoding mode. > > Anyway, I have to find another way to cause decoding job to time out. > Currently I'm doing this with IOMMU on H6, but that brings down several > other things, which requires reboot anyway. > > Best regards, > Jernej > > > Paul > > > > > + v4l2_m2m_buf_done_and_job_finish(ctx->dev->m2m_dev, ctx- > > > >fh.m2m_ctx, > > > > > + VB2_BUF_STATE_ERROR); > > > +} > > > + > > > > > > int cedrus_hw_suspend(struct device *device) > > > { > > > > > > struct cedrus_dev *dev = dev_get_drvdata(device); > > > > > > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_hw.h > > > b/drivers/staging/media/sunxi/cedrus/cedrus_hw.h index > > > 45f641f0bfa2..7c92f00e36da 100644 > > > --- a/drivers/staging/media/sunxi/cedrus/cedrus_hw.h > > > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_hw.h > > > @@ -28,4 +28,6 @@ int cedrus_hw_resume(struct device *device); > > > > > > int cedrus_hw_probe(struct cedrus_dev *dev); > > > void cedrus_hw_remove(struct cedrus_dev *dev); > > > > > > +void cedrus_watchdog(struct work_struct *work); > > > + > > > > > > #endif