On Wed, Apr 28, 2021 at 02:21:54PM +0200, Daniel Vetter wrote:
> On Wed, Apr 28, 2021 at 12:31:09PM +0200, Christian König wrote:
> > On 28.04.21 at 12:05, Daniel Vetter wrote:
> > > On Tue, Apr 27, 2021 at 02:01:20PM -0400, Alex Deucher wrote:
> > > > On Tue, Apr 27, 2021 at 1:35 PM Simon Ser <contact@xxxxxxxxxxx> wrote:
> > > > > On Tuesday, April 27th, 2021 at 7:31 PM, Lucas Stach <l.stach@xxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > Ok. So that would only make the following use cases broken for now:
> > > > > > >
> > > > > > > - amd render -> external gpu
> > > > > > > - amd video encode -> network device
> > > > > > FWIW, "only" breaking amd render -> external gpu will make us pretty
> > > > > > unhappy
> > > > > I concur. I have quite a few users with a multi-GPU setup involving
> > > > > AMD hardware.
> > > > >
> > > > > Note, if this brokenness can't be avoided, I'd prefer to get a clear
> > > > > error, and not bad results on screen because nothing is synchronized
> > > > > anymore.
> > > > It's an upcoming requirement for Windows[1], so you are likely to
> > > > start seeing this across all GPU vendors that support Windows. I
> > > > think the timing depends on how long the legacy hardware support
> > > > sticks around for each vendor.
> > > Yeah but hw scheduling doesn't mean the hw has to be constructed to not
> > > support isolating the ringbuffer at all.
> > >
> > > E.g. even if the hw loses the bit to put the ringbuffer outside of the
> > > userspace gpu vm, if you have pagetables I'm seriously hoping you have r/o
> > > pte flags. Otherwise the entire "share address space with cpu side,
> > > seamlessly" thing is out of the window.
> > >
> > > And with that r/o bit on the ringbuffer you can once more force submit
> > > through kernel space, and all the legacy dma_fence based stuff keeps
> > > working. And we don't have to invent some horrendous userspace fence based
> > > implicit sync mechanism in the kernel, but can instead do this transition
> > > properly with drm_syncobj timeline explicit sync and protocol revving.
> > >
> > > At least I think you'd have to work extra hard to create a gpu which
> > > cannot possibly be intercepted by the kernel, even when it's designed to
> > > support userspace direct submit only.
> > >
> > > Or are your hw engineers more creative here and we're screwed?
> >
> > The upcoming hardware generation will have this hardware scheduler as a
> > must have, but there are certain ways we can still stick to the old
> > approach:
> >
> > 1. The new hardware scheduler currently still supports kernel queues, which
> > are essentially the same as the old hardware ring buffer.
> >
> > 2. Mapping the top level ring buffer into the VM at least partially solves
> > the problem. This way you can't manipulate the ring buffer content, but the
> > location for the fence must still be writeable.
>
> Yeah, allowing userspace to lie about completion fences in this model is
> ok. I haven't thought through the full consequences of that, but I think
> it's not any worse than userspace lying about which buffers/addresses it
> uses in the current model - we rely on hw vm ptes to catch that stuff.
>
> Also it might be good to switch to a non-recoverable ctx model for these.
> That's already what we do in i915 (opt-in, but all current umds use that
> mode). So any hang/watchdog just kills the entire ctx and you don't have
> to worry about userspace doing something funny with its ringbuffer.
> Simplifies everything.
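(Aside for anyone following along: the i915 opt-in above is the
RECOVERABLE context param. A rough sketch of the userspace side against
the 2021 i915 uapi headers - error handling trimmed, the helper name is
made up, fd is an open render node:

    #include <i915_drm.h>
    #include <xf86drm.h>

    static int create_nonrecoverable_ctx(int fd)
    {
        struct drm_i915_gem_context_create create = {};
        struct drm_i915_gem_context_param param = {};

        /* create a fresh context, ctx_id is filled in on success */
        if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE, &create))
            return -1;

        param.ctx_id = create.ctx_id;
        param.param = I915_CONTEXT_PARAM_RECOVERABLE;
        param.value = 0; /* any gpu hang permanently bans this ctx */

        if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &param))
            return -1;

        return create.ctx_id;
    }

Subsequent execbufs on a banned ctx then just fail with -EIO instead of
the context getting quietly re-enabled behind userspace's back.)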
>
> Also ofc userspace fencing is still disallowed, but since userspace would
> queue up all writes to its ringbuffer through the drm/scheduler, we'd
> handle dependencies through that still. Not great, but workable.
>
> Thinking about this, not even mapping the ringbuffer r/o is required, it's
> just that we must queue things through the kernel to resolve dependencies
> and everything without breaking dma_fence. If userspace lies, tdr will
> shoot it and the kernel stops running that context entirely.
>
> So I think even if we have hw with a 100% userspace-submit-only model we
> should still be fine. It's ofc silly, because instead of using userspace
> fences and gpu semaphores the hw scheduler understands, we still take the
> detour through drm/scheduler, but at least it's not a break-the-world
> event.

Also no page fault support, userptr invalidates still stall until
end-of-batch instead of just preempting it, and all that too. But I mean
there needs to be some motivation to fix this and roll out explicit
sync :-)
-Daniel

> Or do I miss something here?
>
> > For now and the next hardware we are safe to support the old submission
> > model, but the functionality of kernel queues will sooner or later go
> > away if it is only for Linux.
> >
> > So we need to work on something which works in the long term and gets us
> > away from this implicit sync.
>
> Yeah I think we have pretty clear consensus on that goal, just no one has
> yet volunteered to get going with the winsys/wayland work to plumb
> drm_syncobj through, and the kernel/mesa work to make that optionally a
> userspace fence underneath. And it's for sure a lot of work.
> -Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
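PS: For anyone tempted to pick up that winsys work: the kernel side of
drm_syncobj timelines has been in place for a while, it's "only" the
protocol and compositor plumbing that's missing. A hedged sketch of the
handshake using only libdrm wrappers that exist today - the helper names
are invented, and how the fd and point travel over the wayland wire is
entirely hand-waved:

    #include <stdint.h>
    #include <xf86drm.h>

    /* client: create a timeline syncobj once and export it as an fd
     * that can be passed to the compositor */
    static int export_timeline(int drm_fd, uint32_t *handle, int *syncobj_fd)
    {
        if (drmSyncobjCreate(drm_fd, 0, handle))
            return -1;
        return drmSyncobjHandleToFD(drm_fd, *handle, syncobj_fd);
    }

    /* compositor: wait for the client's point before texturing from
     * the buffer; WAIT_FOR_SUBMIT also covers the fence not having
     * materialized yet */
    static int wait_for_point(int drm_fd, int syncobj_fd, uint64_t point)
    {
        uint32_t handle;

        if (drmSyncobjFDToHandle(drm_fd, syncobj_fd, &handle))
            return -1;
        return drmSyncobjTimelineWait(drm_fd, &handle, &point, 1,
                                      INT64_MAX,
                                      DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,
                                      NULL);
    }

The client bumps the point for every submission and attaches (fd, point)
to its buffers; making that pair optionally a userspace fence underneath
is then a kernel problem rather than another protocol rev.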