On Tuesday, 30 April 2024 at 18:39 -0300, Daniel Almeida wrote:
> Hi Nicolas,
> 
> > There is one use case that isn't covered here and that we really need in
> > order to move forward on RPi4/5: cores that can execute multiple tasks at
> > once.
> > 
> > In the case of the Argon HEVC decoder on the Pi, the Entropy decoding and
> > the Reconstruction run in parallel, but the two functions use the same
> > trigger/irq pair.
> > 
> > In short, we need to be able (if there is enough data in the vb2 queue) to
> > schedule two consecutive jobs at once. On a timeline:
> > 
> > ----------------------------------------------------->
> > [entropy0][no decoder]
> >           [entropy1][decode0]
> >                     [entropy2][decode1]
> > 
> > Perhaps it already fits in the RFC, but it wasn't expressed clearly as a
> > use case. For real-time reasons, it's not really the driver's
> > responsibility to wait for buffers to be queued, and a no-op can happen in
> > either of the two functions. Also, I believe you can mix entropy decoding
> > from one stream with decoding a frame from another stream (another video
> > session / m2m ctx).
> > 
> > Nicolas
> 
> I assume that the cores can be programmed separately, and that you can find
> which of the two cores is now idle when processing the interrupt? i.e.: this
> is effectively the same scenario we have with Mediatek vcodec?

No, there is only one core, which implements two features. Scheduling that
single core is still complex in this case, since it should be fed with
multiple jobs whenever possible.

> If so, this is already covered.
> 
> Basically, whenever a core is done with a job, that will signal the pipeline
> to try and make progress.

In the current model, a job represents the execution of a task on a single
core, and that task is limited to one mem2mem ctx. In MTK, to fill the
pipeline, you'd need to pick work from possibly multiple mem2mem ctxs.

> i.e: you push `entropy0` and `entropy1` at the beginning of the pipeline,
> that will cause the entropy decoder to start running. Whenever the entropy
> decoder is done, it will try to schedule the reconstruction core with
> `decode0` and start working on `entropy1`.
> 
> When the reconstruction core is done, it will push `decode0` to the
> pipeline's output queue and grab `decode1` (from the queue it shares with
> the upstream core) to work on.
> 
> That way, all cores run concurrently, so long as there is work to do.
> 
> — Daniel
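
[Editorial sketch] To make the pipelining discussed above a little more
concrete, here is a minimal userspace sketch of the scheduling model for a
single core that runs an entropy phase and a reconstruction phase in
parallel behind one trigger/IRQ. All names (struct argon_job,
argon_schedule(), ...) are hypothetical, and this is not actual
V4L2/mem2mem driver code; it only illustrates the idea that each programmed
job pairs entropy decoding of frame N with reconstruction of frame N-1,
with either slot allowed to be a no-op.

/*
 * Hypothetical sketch of two-phase scheduling on a single core.
 * Each programmed "job" runs entropy(N) alongside decode(N-1).
 */
#include <stdbool.h>
#include <stdio.h>

#define NO_FRAME (-1)

struct argon_job {
	int entropy_frame;	/* frame whose bitstream is entropy-decoded, or NO_FRAME */
	int decode_frame;	/* frame being reconstructed, or NO_FRAME */
};

/* Frames queued by userspace (stand-in for the vb2 OUTPUT queue). */
static int pending_frames = 5;
static int next_frame;
/* Frame whose entropy pass completed but which is not reconstructed yet. */
static int entropy_done_frame = NO_FRAME;

/* Build the next job: entropy(N) runs alongside decode(N-1). */
static bool argon_schedule(struct argon_job *job)
{
	job->entropy_frame = NO_FRAME;
	job->decode_frame = NO_FRAME;

	if (pending_frames > 0) {
		job->entropy_frame = next_frame++;
		pending_frames--;
	}
	/* Reconstruct whatever the previous job entropy-decoded. */
	job->decode_frame = entropy_done_frame;

	/* Nothing to do on either phase: the core stays idle. */
	return job->entropy_frame != NO_FRAME || job->decode_frame != NO_FRAME;
}

/* Pretend-IRQ handler: the shared interrupt fires once per programmed job. */
static void argon_job_done(const struct argon_job *job)
{
	if (job->decode_frame != NO_FRAME)
		printf("frame %d fully decoded\n", job->decode_frame);
	entropy_done_frame = job->entropy_frame;
}

int main(void)
{
	struct argon_job job;

	/* Keep the core busy as long as either phase has work. */
	while (argon_schedule(&job)) {
		printf("job: entropy(%d) decode(%d)\n",
		       job.entropy_frame, job.decode_frame);
		argon_job_done(&job);
	}
	return 0;
}

Running this prints the same staggered timeline as in the quoted email:
job0 is entropy0 with no decode, job1 is entropy1 plus decode0, and the
last job is a decode-only pass.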
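
[Editorial sketch] On the point about a job being tied to a single mem2mem
ctx: this second, equally hypothetical sketch shows what "picking work from
possibly multiple mem2mem ctx" could look like, as a trivial round-robin
over whichever contexts currently have queued work. The names are invented
for illustration and do not correspond to the existing v4l2-mem2mem API.

/* Hypothetical round-robin job picker across several mem2mem contexts. */
#include <stdio.h>

#define NUM_CTX 3

/* Stand-in for a mem2mem context: just a count of queued jobs. */
struct m2m_ctx {
	int id;
	int queued_jobs;
};

static struct m2m_ctx ctxs[NUM_CTX] = {
	{ .id = 0, .queued_jobs = 2 },
	{ .id = 1, .queued_jobs = 1 },
	{ .id = 2, .queued_jobs = 3 },
};
static int last_ctx = -1;

/* Pick the next context with pending work, round-robin from the last one. */
static struct m2m_ctx *pick_next_ctx(void)
{
	int i;

	for (i = 1; i <= NUM_CTX; i++) {
		struct m2m_ctx *ctx = &ctxs[(last_ctx + i) % NUM_CTX];

		if (ctx->queued_jobs > 0) {
			last_ctx = ctx->id;
			return ctx;
		}
	}
	return NULL; /* the pipeline stage goes idle */
}

int main(void)
{
	struct m2m_ctx *ctx;

	/* Each iteration stands in for "a core became free". */
	while ((ctx = pick_next_ctx())) {
		printf("running a job from ctx %d\n", ctx->id);
		ctx->queued_jobs--;
	}
	return 0;
}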