Discuss the multi-core media scheduler

Daniel Almeida <daniel.almeida@xxxxxxxxxxxxx> · Sun, 28 Apr 2024 15:26:35 -0300

Hi everyone,

There seem to be a few unsolved problems in the mem2mem framework, one of
which is the lack of support for architectures with multiple heterogeneous
cores. For example, it is currently impossible to describe MediaTek's LAT and
CORE cores to the framework as two independent units to be scheduled. This means
that, at all times, one unit is idle while the other one is working.

I know that this is not the only problem with m2m, but it is where I'd like to
start the discussion. Feel free to add your own requirements to the thread.

My proposed solution is to add a new iteration of mem2mem, which I have named
the Multi-core Media Scheduler for lack of a better term.

Please note that I will use the terms input/output queues in place of
output/capture for the sake of readability.

-------------------------------------------------------------------------------

The basic idea is to have a core as the basic entity to be scheduled, with its
own input and output VB2 queues. This default will be identical to what we have
today in m2m.

 input        output
<----- core ----->

In all cases, this will be the only interface that the framework will expose to
the outside world. The complexity to handle multiple cores will be hidden from
callers. This will also allow us to keep the implementation compatible with
the current mem2mem interfaces, which expose only two queues.

To support multiple cores, each core can connect to another core to establish a
data dependency, in which case, they will communicate through a new type of
queue, here described as "shared".

 input           shared         output
<----- core0 -------> core1 ------>

This arrangement is basically an extension of the mem2mem idea, like so:

mem2mem2mem2mem

...with as many links as there are cores.
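
As a rough illustration only, a linear pipeline description could be modelled
as a list of links, where each link is a queue between a producer and a
consumer. Every name below (mms_link, mms_pipeline_desc, mms_desc_linear and
so on) is hypothetical and invented for this sketch; none of it is an existing
kernel API, and it is plain userspace C rather than kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define MMS_MAX_CORES 8

/* Hypothetical queue kinds: only INPUT and OUTPUT are visible to callers. */
enum mms_queue_kind { MMS_Q_INPUT, MMS_Q_SHARED, MMS_Q_OUTPUT };

/* One queue in the pipeline, described by its producer and consumer. */
struct mms_link {
	int src_core;	/* -1: filled by the caller (framework input queue) */
	int dst_core;	/* -1: drained by the caller (framework output queue) */
};

struct mms_pipeline_desc {
	size_t num_cores;
	size_t num_links;
	struct mms_link links[MMS_MAX_CORES + 1];
};

/* Describe input -> core0 -> core1 -> ... -> output. */
static int mms_desc_linear(struct mms_pipeline_desc *d, size_t num_cores)
{
	size_t i;

	assert(d != NULL);
	if (num_cores == 0 || num_cores > MMS_MAX_CORES)
		return -1;

	d->num_cores = num_cores;
	d->num_links = num_cores + 1;
	for (i = 0; i <= num_cores; i++) {
		d->links[i].src_core = (int)i - 1;
		d->links[i].dst_core = (i == num_cores) ? -1 : (int)i;
	}
	return 0;
}

/* A link with cores on both ends is a "shared" queue. */
static enum mms_queue_kind mms_link_kind(const struct mms_link *l)
{
	if (l->src_core < 0)
		return MMS_Q_INPUT;
	if (l->dst_core < 0)
		return MMS_Q_OUTPUT;
	return MMS_Q_SHARED;
}
```

With one core this degenerates to the two queues m2m has today; with two cores
there is exactly one shared queue between them.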

The key idea is that now, cores can be scheduled independently through a call
to schedule(core_number, work) to indicate that they should start processing
the work. They can also be marked as idle independently through a
job_done(core_number) call.

It will be the driver's responsibility to describe the pipeline to the
framework, indicating how cores are connected. The driver will also have to
implement the logic for schedule() and job_done() for a given core.
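
To make the split between driver and framework more concrete, here is a
minimal userspace sketch of per-core scheduling state. The schedule() and
job_done() names come from the text above; the struct layout, the mcs_ prefix,
the fixed core count and the demo callback are all assumptions made for
illustration, not a proposed API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MCS_NUM_CORES 2

struct mcs_work { int id; };

/* Implemented by the driver: kick the hardware for one core. */
struct mcs_core_ops {
	int (*schedule)(unsigned int core_number, struct mcs_work *work);
};

/* Framework-side bookkeeping (hypothetical layout). */
struct mcs_ctx {
	const struct mcs_core_ops *ops;
	bool busy[MCS_NUM_CORES];
	struct mcs_work *active[MCS_NUM_CORES];
};

/* Hand work to one core; fails if that core is already busy. */
static int mcs_schedule(struct mcs_ctx *ctx, unsigned int core,
			struct mcs_work *work)
{
	int ret;

	if (core >= MCS_NUM_CORES || ctx->busy[core])
		return -1;

	ret = ctx->ops->schedule(core, work);
	if (ret)
		return ret;

	ctx->busy[core] = true;
	ctx->active[core] = work;
	return 0;
}

/* Mark one core idle again and return the work it just finished. */
static struct mcs_work *mcs_job_done(struct mcs_ctx *ctx, unsigned int core)
{
	struct mcs_work *done;

	assert(core < MCS_NUM_CORES && ctx->busy[core]);
	done = ctx->active[core];
	ctx->active[core] = NULL;
	ctx->busy[core] = false;
	return done;
}

/* Stand-in for a driver callback; a real one would program the hardware. */
static int demo_hw_schedule(unsigned int core_number, struct mcs_work *work)
{
	(void)core_number;
	(void)work;
	return 0;
}

static const struct mcs_core_ops demo_ops = { .schedule = demo_hw_schedule };
```

The point of the sketch is only that busy/idle state is tracked per core, so
core1 can accept work while core0 is still running.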

Queuing buffers into the framework's input queue will push the work into the
pipeline. Whenever a core finishes a job, the framework will push the job into
the queue that is shared with the downstream core and attempt to schedule that
core. It will also attempt to pull the next work item from the upstream queue
for the core that has just become idle.

When the job is processed by the last core in the pipeline, it will be marked
as done and pushed into the framework's output queue.
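
The flow described above can be simulated in a few lines of userspace C. This
is only a sketch under simplifying assumptions: jobs are plain integers, queues
are small fixed-size FIFOs, there is no locking, and every identifier (pl_fifo,
pl_pipe, pl_job_done, ...) is made up for the example. What happens when a
downstream queue is full is one of the open details; here the core simply stays
busy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PL_NCORES 2
#define PL_QCAP   8

struct pl_fifo { int items[PL_QCAP]; size_t head, len; };

static bool pl_push(struct pl_fifo *q, int job)
{
	if (q->len == PL_QCAP)
		return false;
	q->items[(q->head + q->len) % PL_QCAP] = job;
	q->len++;
	return true;
}

static bool pl_pop(struct pl_fifo *q, int *job)
{
	if (q->len == 0)
		return false;
	*job = q->items[q->head];
	q->head = (q->head + 1) % PL_QCAP;
	q->len--;
	return true;
}

struct pl_pipe {
	/* q[0] is the input queue, q[PL_NCORES] the output queue,
	 * everything in between is a shared queue. */
	struct pl_fifo q[PL_NCORES + 1];
	bool busy[PL_NCORES];
	int running[PL_NCORES];
};

/* If the core is idle, pull one job from its upstream queue and start it. */
static void pl_try_run(struct pl_pipe *p, size_t core)
{
	if (!p->busy[core] && pl_pop(&p->q[core], &p->running[core]))
		p->busy[core] = true;
}

/* Caller queues a job on the framework's input queue. */
static bool pl_queue_input(struct pl_pipe *p, int job)
{
	if (!pl_push(&p->q[0], job))
		return false;
	pl_try_run(p, 0);
	return true;
}

/* A core finished: pass the job downstream, then refill both cores. */
static bool pl_job_done(struct pl_pipe *p, size_t core)
{
	assert(core < PL_NCORES && p->busy[core]);

	/* Downstream queue full: the job stays on this core for now. */
	if (!pl_push(&p->q[core + 1], p->running[core]))
		return false;

	p->busy[core] = false;
	if (core + 1 < PL_NCORES)
		pl_try_run(p, core + 1);	/* schedule the downstream core */
	pl_try_run(p, core);			/* pull from the upstream queue */
	return true;
}
```

After two jobs are queued and core0 reports its first completion, core1 is
working on the first job while core0 has already started the second, which is
the overlap that is missing today.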

At all times, a buffer should have an owner, and the framework will ensure that
cores cannot touch buffers belonging to other cores.
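
One simple way to picture the ownership rule is an owner tag on every buffer
that is checked on access and changed only through an explicit hand-over. The
sketch below is an assumption about how this could look, not part of the
proposal itself; ob_buf, OB_OWNER_FRAMEWORK and the helper names are all
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The buffer sits in a queue and no core may touch it. */
#define OB_OWNER_FRAMEWORK (-1)

struct ob_buf {
	int owner;	/* core number, or OB_OWNER_FRAMEWORK */
	int payload;	/* stand-in for the real buffer contents */
};

static bool ob_may_access(const struct ob_buf *b, int core)
{
	return b->owner == core;
}

/* Hand the buffer over; only the current owner can give it away. */
static int ob_transfer(struct ob_buf *b, int from, int to)
{
	assert(b != NULL);
	if (b->owner != from)
		return -1;
	b->owner = to;
	return 0;
}

/* Any access by a core goes through the ownership check first. */
static int ob_write(struct ob_buf *b, int core, int value)
{
	if (!ob_may_access(b, core))
		return -1;
	b->payload = value;
	return 0;
}
```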

This workflow can be expanded to account for a group of identical cores, here
denoted as "clusters". In such a case, each core will have its own input and
output queues:

 input      output           input      output      output
<---- core0 ----->          <---- core1 ----->     ------->
                            <---- core2 ----->
                             input      output

Ideally, the framework will dispatch work from the output queue with the most
items to the input queue with the fewest items to balance the load. This way,
clusters and cores can compose to describe complex architectures.
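
The balancing rule can be sketched as follows, with queue depths reduced to
plain counters. This is an illustration under assumptions: the lb_ names are
invented, ties are broken in favour of the lowest index (the text does not say
how ties should be handled), and no real buffers are moved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Index of the deepest queue; ties go to the lowest index. */
static size_t lb_fullest(const size_t *depth, size_t n)
{
	size_t best = 0;

	for (size_t i = 1; i < n; i++)
		if (depth[i] > depth[best])
			best = i;
	return best;
}

/* Index of the shallowest queue; ties go to the lowest index. */
static size_t lb_emptiest(const size_t *depth, size_t n)
{
	size_t best = 0;

	for (size_t i = 1; i < n; i++)
		if (depth[i] < depth[best])
			best = i;
	return best;
}

/*
 * Move one item from the fullest upstream output queue to the emptiest
 * downstream input queue. Returns false if there is nothing to move.
 */
static bool lb_dispatch_one(size_t *out_depth, size_t n_out,
			    size_t *in_depth, size_t n_in,
			    size_t *src, size_t *dst)
{
	assert(n_out > 0 && n_in > 0);

	*src = lb_fullest(out_depth, n_out);
	*dst = lb_emptiest(in_depth, n_in);
	if (out_depth[*src] == 0)
		return false;

	out_depth[*src]--;
	in_depth[*dst]++;
	return true;
}
```

A real implementation would probably also need to respect frame ordering and
per-context state, which this counter-only model ignores.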

Of course, this is a rough sketch, and there are lots of unexplained minutiae to
sort out, but I hope that the general idea is enough to get a discussion going.

-- Daniel