Hi everyone,

There seem to be a few unsolved problems in the mem2mem framework, one of
which is the lack of support for architectures with multiple heterogeneous
cores.

For example, it is currently impossible to describe MediaTek's LAT and CORE
cores to the framework as two independent units to be scheduled. This means
that, at all times, one unit is idle while the other one is working.

I know that this is not the only problem with m2m, but it is where I'd like
to start the discussion. Feel free to add your own requirements to the
thread.

My proposed solution is to add a new iteration of mem2mem, which I have
named the Multi-core Media Scheduler for lack of a better term.

Please note that I will use the terms input/output queues in place of
output/capture for the sake of readability.

-------------------------------------------------------------------------------

The basic idea is to make the core the basic entity to be scheduled, with
its own input and output VB2 queues. This default arrangement will be
identical to what we have today in m2m.

 input         output
<----- core ----->

In all cases, this will be the only interface that the framework exposes to
the outside world. The complexity of handling multiple cores will be hidden
from callers. This will also allow us to keep the implementation compatible
with the current mem2mem interfaces, which expose only two queues.

To support multiple cores, each core can connect to another core to
establish a data dependency, in which case they will communicate through a
new type of queue, here described as "shared".

 input         shared        output
<----- core0 -------> core1 ------>

This arrangement is basically an extension of the mem2mem idea, like so:

mem2mem2mem2mem

...with as many links as there are cores.

The key idea is that cores can now be scheduled independently through a call
to schedule(core_number, work) to indicate that they should start processing
the work.
They can also be marked as idle independently through a job_done(core_number)
call.

It will be the driver's responsibility to describe the pipeline to the
framework, indicating how cores are connected. The driver will also have to
implement the logic for schedule() and job_done() for a given core.

Queuing buffers into the framework's input queue will push the work into the
pipeline. Whenever a job is done, the framework will push the job into the
queue that is shared with the downstream core and attempt to schedule that
core. It will also attempt to pull a work item from the upstream queue.

When the job is processed by the last core in the pipeline, it will be marked
as done and pushed into the framework's output queue.

At all times, a buffer should have an owner, and the framework will ensure
that cores cannot touch buffers belonging to other cores.

This workflow can be expanded to account for groups of identical cores, here
denoted as "clusters". In such a case, each core will have its own input and
output queues:

                        input      output
  input      output                            output
                      <---- core0 ----->
<---- core1 ---->                             ------->
                      <---- core2 ---->
                        input      output

Ideally, the framework will dispatch work from the output queue with the most
items to the input queue with the fewest items to balance the load.

This way, clusters and cores can compose to describe complex architectures.

Of course, this is a rough sketch, and there are lots of unexplained minutiae
to sort out, but I hope that the general idea is enough to get a discussion
going.

-- Daniel
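
P.S. In case it helps the discussion, here is a minimal user-space sketch in
C of the linear (non-cluster) workflow described above. Nothing in it exists
in the kernel today: struct mms_pipeline, mms_qbuf(), mms_try_schedule() and
mms_job_done() are hypothetical names, plain integer ids stand in for VB2
buffers, and the fixed-size FIFO is only a placeholder for a real queue. It
assumes a pipeline of N cores with N + 1 queues, where queue 0 is the
framework's input queue, queue N is its output queue, and the ones in between
are the "shared" queues.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MMS_MAX_CORES 4
#define MMS_QUEUE_LEN 8

/* Hypothetical fixed-size FIFO of buffer ids; stands in for a VB2 queue. */
struct mms_queue {
	int items[MMS_QUEUE_LEN];
	size_t head, count;
};

struct mms_core {
	bool busy;
	int current;	/* buffer owned by this core while it is busy */
};

/* Linear pipeline: queues[i] feeds core i, queues[i + 1] takes its result. */
struct mms_pipeline {
	size_t num_cores;
	struct mms_core cores[MMS_MAX_CORES];
	struct mms_queue queues[MMS_MAX_CORES + 1];
};

bool mms_push(struct mms_queue *q, int buf)
{
	if (q->count == MMS_QUEUE_LEN)
		return false;
	q->items[(q->head + q->count) % MMS_QUEUE_LEN] = buf;
	q->count++;
	return true;
}

bool mms_pop(struct mms_queue *q, int *buf)
{
	if (q->count == 0)
		return false;
	*buf = q->items[q->head];
	q->head = (q->head + 1) % MMS_QUEUE_LEN;
	q->count--;
	return true;
}

/* Start an idle core on the next item of its upstream queue, if any. */
bool mms_try_schedule(struct mms_pipeline *p, size_t core)
{
	struct mms_core *c;

	assert(core < p->num_cores);
	c = &p->cores[core];
	if (c->busy || !mms_pop(&p->queues[core], &c->current))
		return false;
	/* A driver's schedule(core, work) callback would be invoked here. */
	c->busy = true;
	return true;
}

/* Queue a buffer on the framework's input queue and kick the first core. */
bool mms_qbuf(struct mms_pipeline *p, int buf)
{
	if (!mms_push(&p->queues[0], buf))
		return false;
	mms_try_schedule(p, 0);
	return true;
}

/* A core finished: hand its buffer downstream, then pull more work. */
bool mms_job_done(struct mms_pipeline *p, size_t core)
{
	struct mms_core *c;

	if (core >= p->num_cores)
		return false;
	c = &p->cores[core];
	if (!c->busy)
		return false;
	if (!mms_push(&p->queues[core + 1], c->current))
		return false;	/* downstream queue full: core keeps the buffer */
	c->busy = false;
	if (core + 1 < p->num_cores)
		mms_try_schedule(p, core + 1);
	mms_try_schedule(p, core);
	return true;
}
```

With a zero-initialized struct mms_pipeline whose num_cores is 2, queuing two
buffers with mms_qbuf() starts core 0 on the first one; calling
mms_job_done(p, 0) then moves that buffer to core 1 and lets core 0 pick up
the second, so both cores are busy at once. Each buffer id lives in exactly
one queue or one core's "current" slot at any time, which is one simple way
to get the single-owner property. Locking, error handling and clusters are
left out on purpose.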