On Thu, Apr 06, 2023 at 12:14:36PM +0200, Christian König wrote:
> On 06.04.23 at 08:37, Daniel Vetter wrote:
> > On Thu, Apr 06, 2023 at 02:08:10AM +0000, Matthew Brost wrote:
> > > On Wed, Apr 05, 2023 at 12:12:27PM +0200, Daniel Vetter wrote:
> > > > On Wed, 5 Apr 2023 at 11:57, Christian König <christian.koenig@xxxxxxx> wrote:
> > > > > On 05.04.23 at 11:07, Daniel Vetter wrote:
> > > > > > [SNIP]
> > > > > > > I would approach it from the opposite side. This component here is a
> > > > > > > tool to decide what job should run next.
> > > > > > >
> > > > > > > How that is then signaled and run should not be part of the scheduler, but
> > > > > > > of another, higher-level component.
> > > > > > >
> > > > > > > This way you also don't have a problem with not using DMA-fences as
> > > > > > > dependencies, nor as constraints for running more jobs.
> > > > > > I think we're talking about two things here and mixing them up.
> > > > > >
> > > > > > For the dependencies I agree with you, and imo that higher-level tool
> > > > > > should probably just be an on-demand submit thread in userspace for the
> > > > > > rare case where the kernel would otherwise need to sort out a dependency
> > > > > > (due to running out of ringspace in the per-ctx ringbuffer).
> > > > > >
> > > > > > The other thing is the message passing stuff, and this is what I was
> > > > > > talking about above. This has nothing to do with handling dependencies,
> > > > > > but with talking to the gpu fw. Here the intel design issue is that the fw
> > > > > > only provides a single queue, and it's in-order. Which means it
> > > > > > fundamentally has the stalling issue you describe as a point against a
> > > > > > message passing design. And fundamentally we need to be able to talk to
> > > > > > the fw in the scheduler ->run_job callback.
> > > > > >
> > > > > > The proposal here for the message passing part is that since it has the
> > > > > > stalling issue already anyway, and the scheduler needs to be involved
> > > > > > anyway, it makes sense to integrate this (as an optional thing, only for
> > > > > > drivers which have this kind of fw interface) into the scheduler.
> > > > > > Otherwise you just end up with two layers for no reason and more ping-pong
> > > > > > delay, because ->run_job needs to kick off the subordinate driver layer
> > > > > > first. Note that for this case the optional message passing support in
> > > > > > the drm/scheduler actually makes things better, because it allows you to
> > > > > > cut out one layer.
> > > > > >
> > > > > > Of course if a driver with a better fw interface uses this message passing
> > > > > > support, then that's bad. Hence the big warning in the kerneldoc.
> > > > > Well what I wanted to say is that if you design the dependency handling
> > > > > / scheduler properly you don't need the message passing through it.
> > > > >
> > > > > For example, if the GPU scheduler component uses a work item to do its
> > > > > handling instead of a kthread, you could also let the driver specify the
> > > > > workqueue on which this work item is executed.
> > > > >
> > > > > When you design it like this the driver specifies the thread context of
> > > > > execution for its job. In other words it can specify a single-threaded
> > > > > firmware workqueue as well.
> > > > >
> > > > > When you then have other messages which need to be passed to the
> > > > > firmware you can also use the same single-threaded workqueue for this.
> > > > >
> > > > > Drivers which have a different firmware interface would just use one of
> > > > > the system workqueues instead.
> > > > >
> > > > > This approach basically decouples the GPU scheduler component from the
> > > > > message passing functionality.
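
FWIW, to make this concrete, a rough sketch of the ordered-workqueue
version (completely untested, and all the fw_sched_* names below are
made up for illustration, not actual drm/sched API):

#include <linux/errno.h>
#include <linux/workqueue.h>

struct fw_sched {
	struct workqueue_struct *submit_wq;	/* ordered: one item at a time */
	struct work_struct run_job_work;	/* pushes jobs into the fw ring */
	struct work_struct process_msg_work;	/* non-job fw messages */
};

static void run_job_worker(struct work_struct *w)
{
	/* write the next job into the single in-order fw queue */
}

static void process_msg_worker(struct work_struct *w)
{
	/* write a non-job message into the same fw queue */
}

static int fw_sched_init(struct fw_sched *s)
{
	/* ordered workqueue: max_active is fixed at 1, items run in order */
	s->submit_wq = alloc_ordered_workqueue("fw-submit", 0);
	if (!s->submit_wq)
		return -ENOMEM;

	INIT_WORK(&s->run_job_work, run_job_worker);
	INIT_WORK(&s->process_msg_work, process_msg_worker);
	return 0;
}

/*
 * Jobs and messages funnel through the same single thread of
 * execution, so run_job vs. process_msg mutual exclusion comes for
 * free, and a reset path only has to stop a single workqueue.
 */
static void fw_sched_queue_job(struct fw_sched *s)
{
	queue_work(s->submit_wq, &s->run_job_work);
}

static void fw_sched_queue_msg(struct fw_sched *s)
{
	queue_work(s->submit_wq, &s->process_msg_work);
}

A driver with a saner fw interface would just hand in a normal (e.g.
system) workqueue instead, which is exactly the decoupling you describe.
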
> > > > Hm I guess we've been talking past each other big time, because
> > > > that's really what I thought was under discussion? Essentially the
> > > > current rfc, but implemented with some polish.
> > > >
> > > I think Daniel pretty much nailed it here (thanks), to recap:
> > >
> > > 1. I want the messages in the same worker so that run_job / free_job /
> > > process_msg execution is mutually exclusive, and also so that during
> > > reset paths, if the worker is stopped, none of the entry points can be
> > > entered.
> > >
> > > If this is a NAK, then another worker is fine I guess. A lock between
> > > run_job / free_job + process_msg should solve the exclusion issue, and
> > > the reset paths can stop this new worker too. That being said, I'd
> > > rather leave this as is, but will not fight this point.
> > >
> > > 2. process_msg is just used to communicate with the firmware using the
> > > same queue as submission. Waiting for space in this queue is the only
> > > place this function can block (same as submission). Well, actually we
> > > have the concept of a preempt time slice, but that sleeps for 10 ms by
> > > default. Also, preempt is only used in LR entities, so I don't think it
> > > is relevant in this case either.
> > >
> > > 3. Agree this is in the dma-fence signaling path (if process_msg is in
> > > the submission worker) so we can't block indefinitely or for an
> > > unreasonable period of time (i.e. we must obey dma-fence rules).
> > Just to hammer this in: not just process_msg is in the dma_fence signaling
> > path, but the entire fw queue that everything is being funneled through,
> > including whatever the fw is doing to process these.
> >
> > Yes this is terrible and blew up a few times already :-/
> >
> > But also, probably something that the docs really need to hammer in, to
> > make sure people don't look at this and think "hey, this seems to be the
> > recommended way to do this on linux". We don't want hw people to build
> > more of these designs; they're an absolute pain to deal with under Linux'
> > dma_fence signalling and gpu job scheduling rules.
> >
> > It's just that if you're stuck with such fw, then integrating the flow
> > into drm/sched instead of having an extra layer of workers seems the
> > better of two pretty bad solutions.
>
> Yeah, and if you have such fw limitations, make sure that you use something
> which is understood by lockdep to feed into it.
>
> In other words, either locks or a work item/queue, and not some message
> passing functionality through the scheduler.
>
> As far as I can see the approach with the work item/queue should fit your
> needs here.

dma_fence signalling annotations would also make the scheduler thread
suitable to catch issues with lockdep, just to make double-sure it can
catch issues.
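
Roughly like this, i.e. just wrap the entire worker function in the
begin/end annotations (untested sketch; submit_worker is a stand-in
name for wherever the scheduler work item ends up):

#include <linux/dma-fence.h>
#include <linux/workqueue.h>

/*
 * Everything in this worker runs in the dma_fence signalling critical
 * path, so annotate the whole thing: lockdep will then complain about
 * any lock taken here that is also held while waiting on a dma_fence.
 */
static void submit_worker(struct work_struct *w)
{
	bool fence_cookie = dma_fence_begin_signalling();

	/*
	 * run_job / free_job / process_msg handling goes here; nothing
	 * in this section may block on reclaim or wait on other fences
	 * beyond what the dma-fence rules allow.
	 */

	dma_fence_end_signalling(fence_cookie);
}
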
-Daniel

>
> Christian.
>
> > -Daniel
> >
> > > 4. Agree the documentation for the usage of the messaging interface
> > > needs to be clear.
> > >
> > > 5. Agree that my code could always use polishing.
> > >
> > > Let's close on #1 then: can I get a general Ack on this part of the RFC
> > > and apply the polish in the full review process?
> > >
> > > Matt
> > >
> > > > iow I agree with you (I think at least).
> > > > -Daniel
> > > > --
> > > > Daniel Vetter
> > > > Software Engineer, Intel Corporation
> > > > http://blog.ffwll.ch
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch