Re: Discuss the multi-core media scheduler

Laurent Pinchart <laurent.pinchart@xxxxxxxxxxxxxxxx> · Fri, 3 May 2024 06:25:14 +0300

Hi Daniel,

On Sun, Apr 28, 2024 at 03:26:35PM -0300, Daniel Almeida wrote:
> Hi everyone,
> 
> There seem to be a few unsolved problems in the mem2mem framework, one of
> which is the lack of support for architectures with multiple heterogeneous
> cores. For example, it is currently impossible to describe MediaTek's LAT and
> CORE cores to the framework as two independent units to be scheduled. This means
> that, at all times, one unit is idle while the other one is working.
> 
> I know that this is not the only problem with m2m, but it is where I'd like to
> start the discussion. Feel free to add your own requirements to the thread.

I'll add a comment, which doesn't solve your problem, but is possibly
still relevant.

We need to serve multiple clients and schedule their jobs on
memory-to-memory ISPs. Those devices don't use the M2M framework, as
they have more than just one input and one output queue, and need to
handle formats and selection rectangles in addition to controls and
buffer queues. A few out-of-tree drivers currently create multiple
"virtual" device instances to address this need. I don't like this
solution much, as it creates a lot of video devices and imposes an
arbitrary limit on the number of clients.

We're instead considering solving the issue by exposing the ability to
submit a job through the media controller device. Similarly to the M2M
framework, we would use multiple opens with one file handle per client.
This is similar to the request API, but instead of setting per-request
parameters through video devices and subdevs, we would pass them all in
one go through the media controller device.
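
To make the idea a bit more concrete, here is a minimal user-space
model of it, not a proposed kernel API. Every name in it (ClientHandle,
Device, submit, run_next) is hypothetical, and the round-robin policy
is only an assumption for illustration, we haven't settled on a
scheduling policy. It shows one context per open, with all per-job
parameters submitted in one go.

```python
from collections import deque


class ClientHandle:
    """Hypothetical per-open context: one per client, like one file handle."""

    def __init__(self, name):
        self.name = name
        self.jobs = deque()  # jobs submitted but not yet run

    def submit(self, params):
        # All per-job parameters arrive together in one submission.
        self.jobs.append(dict(params))


class Device:
    """Hypothetical device that serves any number of open handles."""

    def __init__(self):
        self.handles = deque()

    def open(self, name):
        handle = ClientHandle(name)
        self.handles.append(handle)
        return handle

    def run_next(self):
        # Round-robin (an assumed policy): visit each handle at most once,
        # run the first pending job found, and move that client to the back.
        for _ in range(len(self.handles)):
            handle = self.handles[0]
            self.handles.rotate(-1)
            if handle.jobs:
                return handle.name, handle.jobs.popleft()
        return None
```

Since the number of handles is not fixed up front, this avoids the
arbitrary bound that the "virtual" device approach imposes.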

At this point we don't foresee the need to support multi-core ISPs, but
there's clearly a need for scheduling multiple clients.

> My proposed solution is to add a new iteration of mem2mem, which I have named
> the Multi-core Media Scheduler for lack of a better term.
> 
> Please note that I will use the terms input/output queues in place of
> output/capture for the sake of readability.
> 
> -------------------------------------------------------------------------------
> 
> The basic idea is to have a core as the basic entity to be scheduled, with its
> own input and output VB2 queues. This default will be identical to what we have
> today in m2m.
> 
>  input        output
> <----- core ----->
> 
> In all cases, this will be the only interface that the framework will expose to
> the outside world. The complexity to handle multiple cores will be hidden from
> callers. This will also allow us to keep the implementation compatible with
> the current mem2mem interfaces, which expose only two queues.
> 
> To support multiple cores, each core can connect to another core to establish a
> data dependency, in which case they will communicate through a new type of
> queue, here described as "shared".
> 
>  input           shared         output
> <----- core0 -------> core1 ------>
> 
> This arrangement is basically an extension of the mem2mem idea, like so:
> 
> mem2mem2mem2mem
> 
> ...with as many links as there are cores.
> 
> The key idea is that now, cores can be scheduled independently through a call
> to schedule(core_number, work) to indicate that they should start processing
> the work. They can also be marked as idle independently through a
> job_done(core_number) call.
> 
> It will be the driver's responsibility to describe the pipeline to the
> framework, indicating how cores are connected. The driver will also have to
> implement the logic for schedule() and job_done() for a given core.
> 
> Queuing buffers into the framework's input queue will push the work into the
> pipeline. Whenever a core finishes a job, the framework will push the job into
> the queue shared with the downstream core and attempt to schedule that core.
> It will also attempt to pull a new work item from the upstream queue.
> 
> When the job is processed by the last core in the pipeline, it will be marked
> as done and pushed into the framework's output queue.
> 
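
The flow described above can be modelled in a few lines of user-space
code. This is only a sketch under assumptions: the Pipeline class and
its method names are made up, the chain is strictly linear, and
schedule() is simplified to take just the core number, with the work
pulled from the core's upstream queue rather than passed in.

```python
from collections import deque


class Pipeline:
    """Toy model of a linear chain of cores linked by queues."""

    def __init__(self, num_cores):
        # queues[i] feeds core i; queues[num_cores] is the final output queue.
        # queues[0] is the input queue, the ones in between are "shared".
        self.queues = [deque() for _ in range(num_cores + 1)]
        self.running = [None] * num_cores  # job each core is processing

    def queue_input(self, job):
        self.queues[0].append(job)
        self.schedule(0)

    def schedule(self, core):
        # Start the core only if it is idle and has pending work upstream.
        if self.running[core] is None and self.queues[core]:
            self.running[core] = self.queues[core].popleft()

    def job_done(self, core):
        # Hand the finished job downstream and mark the core idle.
        job = self.running[core]
        if job is None:
            raise RuntimeError(f"core {core} is idle")
        self.running[core] = None
        self.queues[core + 1].append(job)
        # Try to start the downstream core, if there is one.
        if core + 1 < len(self.running):
            self.schedule(core + 1)
        # Pull the next work item from upstream for this core.
        self.schedule(core)

    def output(self):
        return list(self.queues[-1])
```

With two cores and two queued jobs, a single job_done(0) leaves both
cores busy at once, which is the property the current m2m framework
cannot express.
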
> At all times, a buffer should have an owner, and the framework will ensure that
> cores cannot touch buffers belonging to other cores.
> 
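
One way to picture the ownership rule is a registry that records a
single owner per buffer and refuses access from anyone else. The
sketch below is purely illustrative: BufferTracker, OwnershipError and
the string identifiers are hypothetical, and a real implementation
would track vb2 buffers rather than names.

```python
class OwnershipError(Exception):
    pass


class BufferTracker:
    """Hypothetical registry mapping each buffer to exactly one owner."""

    def __init__(self):
        self.owner = {}

    def give(self, buf, owner):
        # First assignment, e.g. when userspace queues the buffer.
        if buf in self.owner:
            raise OwnershipError(f"{buf} already owned by {self.owner[buf]}")
        self.owner[buf] = owner

    def transfer(self, buf, src, dst):
        # Only the current owner may pass the buffer on.
        self.check(buf, src)
        self.owner[buf] = dst

    def check(self, buf, who):
        # Called before a core touches a buffer.
        if self.owner.get(buf) != who:
            raise OwnershipError(f"{who} does not own {buf}")
```
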
> This workflow can be expanded to account for a group of identical cores, here
> denoted as "clusters". In such a case, each core will have its own input and
> output queues:
> 
>  input      output           input      output      output
> <---- core0 ----->          <---- core1 ---->     ------->
>                             <---- core2 ---->
>                              input      output
> 
> Ideally, the framework will dispatch work from the output queue holding the
> most items to the input queue holding the fewest items, in order to balance
> the load. This way, clusters and cores can compose to describe complex
> architectures.
> 
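
The balancing rule is simple enough to sketch directly. The function
names below are hypothetical, queues are plain deques, and both lists
are assumed to be non-empty. Ties are broken by list order, which is
an arbitrary choice on my side.

```python
from collections import deque


def dispatch_one(upstream_outputs, cluster_inputs):
    """Move one item from the fullest output queue to the emptiest input queue.

    Returns True if an item was moved, False if there was nothing to move.
    Both arguments are assumed to be non-empty lists of deques.
    """
    src = max(upstream_outputs, key=len)
    if not src:
        return False
    dst = min(cluster_inputs, key=len)
    dst.append(src.popleft())
    return True


def balance(upstream_outputs, cluster_inputs):
    # Drain the upstream queues completely, one item at a time.
    while dispatch_one(upstream_outputs, cluster_inputs):
        pass
```

Queue length is only a rough proxy for load, as it assumes all work
items cost about the same to process.
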
> Of course, this is a rough sketch, and there are lots of unexplained minutiae to
> sort out, but I hope that the general idea is enough to get a discussion going.

-- 
Regards,

Laurent Pinchart