Re: About the iGVT-g's requirement to pin guest contexts in VM

Daniel Vetter <daniel@xxxxxxxx> · Wed, 2 Sep 2015 10:19:03 +0200

On Thu, Aug 27, 2015 at 09:50:03AM +0800, Zhiyuan Lv wrote:
> Hi Daniel,
> 
> On Wed, Aug 26, 2015 at 10:56:00AM +0200, Daniel Vetter wrote:
> > On Tue, Aug 25, 2015 at 08:17:05AM +0800, Zhiyuan Lv wrote:
> > > Hi Chris,
> > > 
> > > On Mon, Aug 24, 2015 at 11:23:13AM +0100, Chris Wilson wrote:
> > > > On Mon, Aug 24, 2015 at 06:04:28PM +0800, Zhiyuan Lv wrote:
> > > > > Hi Chris,
> > > > > 
> > > > > On Thu, Aug 20, 2015 at 09:36:00AM +0100, Chris Wilson wrote:
> > > > > > On Thu, Aug 20, 2015 at 03:45:21PM +0800, Zhiyuan Lv wrote:
> > > > > > > Intel GVT-g will perform EXECLIST context shadowing and ring buffer
> > > > > > > shadowing. The shadow copy is created when guest creates a context.
> > > > > > > > If a context changes its LRCA address, it is hard for the hypervisor to
> > > > > > > > know whether it is a new context or not. We always pin context objects to
> > > > > > > global GTT to make life easier.
> > > > > > 
> > > > > > Nak. Please explain why we need to workaround a bug in the host. We
> > > > > > cannot pin the context as that breaks userspace (e.g. synmark) who can
> > > > > > and will try to use more contexts than we have room.
> > > > > 
> > > > > Could you have a look at below reasons and kindly give us your inputs?
> > > > > 
> > > > > 1, Due to the GGTT partitioning, the global graphics memory available
> > > > > inside virtual machines is much smaller than native case. We cannot
> > > > > support some graphics memory intensive workloads anyway. So it looks
> > > > > affordable to just pin contexts which do not take much GGTT.
> > > > 
> > > > Wrong. It exposes the guest to a trivial denial-of-service attack. A
> > > 
> > > Inside a VM, indeed.
> > > 
> > > > smaller GGTT does not actually limit clients (there is greater aperture
> > > > pressure and some paths are less likely but an individual client will
> > > > function just fine).
> > > >  
> > > > > 2, Our hypervisor needs to change i915 guest context in the shadow
> > > > > context implementation. That part will be tricky if the context is not
> > > > > always pinned. One scenario is that when a context finishes running,
> > > > > we need to copy shadow context, which has been updated by hardware, to
> > > > > > guest context. The hypervisor learns that a context has finished from the
> > > > > > context interrupt, but by that time the shrinker may have unpinned the
> > > > > > context and its backing storage may have been swapped out. Such a copy may fail.
> > > > 
> > > > That is just a bug in your code. Firstly allowing swapout on an object
> > > > you still are using, secondly not being able to swapin.
> > > 
> > > As Zhi replied in another email, we do not have the knowledge of guest
> > > driver's swap operations. If we cannot pin context, we may have to ask
> > > guest driver not to swap out context pages. Do you think that would be
> > > the right way to go? Thanks!
> > 
> > It doesn't matter at all - if the guest unpins the ctx and puts something
> > else in there before the host tells it that the ctx is completed, that's a
> > bug in the guest. Same with real hw, we guarantee that the context stays
> > around for long enough.
> 
> You are right. Previously I did not realize that the shrinker checks
> not only the seqno but also the "ACTIVE_TO_IDLE" context interrupt before
> unpinning a context, hence the above concern. Thanks for the
> explanation!
> 
> > 
> > Also you obviously have to complete the copying from shadow->guest ctx
> > before you send the irq to the guest to signal ctx completion. Which means
> > there's really no overall problem here from a design pov, the only thing
> 
> Right. We cannot control when guest driver sees seqno change, but we
> can control when guest sees context interrupts. The guest CSB update
> and interrupt injection will be after we finish writing guest
> contexts.
> 
> So right now we have two options of context shadowing: one is to track
> the whole life-cycle of guest context, and another is to do the shadow
> work in context schedule-in/schedule-out time. Zhi draws a nice
> picture of them.
> 
> Currently we do not have concrete performance comparison of the two
> approaches. We will have a try and see. And about this patchset, I
> will remove the "context notification" part and send out an updated
> version. Thanks!
> 
> > you have to do is fix up bugs in the host code (probably you should just
> > write through the ggtt).
> 
> Sorry could you elaborate a little more about this? Guest context may
> not always be in aperture right?

Yeah, the high-level problem is that the global gtt is contended (we already
have trouble with that on xengt, and there's the ongoing but unfinished
partial mmap support for that). And permanently pinning execlist contexts
will cause lots of trouble.

Windows can do this because it segments the global gtt into different
parts (at least last time I looked at their memory manager), which means
execlist will never sit in the middle of the range used for mmaps. But
linux has a unified memory manager, which means execlist can sit anywhere,
and therefore badly fragment the global gtt. If we pin them then that will
cause trouble after long system uptime. And afaiui xengt is mostly aimed
at servers, where the uptime assumption should be "runs forever".
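
To make the fragmentation point concrete, here is a toy sketch (not i915 code).
It models the global gtt as an array of slots and reports the largest
contiguous free run. The function name and the slot model are made up for
illustration.

```c
#include <stddef.h>

/*
 * Toy model, not i915 code: pinned[i] != 0 means slot i holds an object
 * that can never be evicted. Returns the longest run of free slots, i.e.
 * the biggest object that could still be bound contiguously.
 */
size_t largest_free_run(const unsigned char *pinned, size_t n)
{
    size_t best = 0, run = 0;

    for (size_t i = 0; i < n; i++) {
        if (pinned[i]) {
            run = 0;            /* a pinned slot breaks the current run */
        } else {
            run++;
            if (run > best)
                best = run;
        }
    }
    return best;
}
```

With 16 slots and 4 pinned contexts packed at one end (the segmented layout),
the largest free run is 12 slots. With the same 4 contexts scattered at slots
2, 6, 10 and 14, it drops to 3, even though the amount of free space is the
same.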

A compounding factor is that, even though I raised this in the original
review, execlist is still not using the active list in upstream and
instead does short-time pinning. That's better than pinning forever, but it
still breaks the eviction logic.

What Chris Wilson and I have talked about forever is adding an object-specific
global_gtt_unbind hook. The idea is that it would be called when
unbinding/evicting a special object (e.g. a hw context), and you could use
that to do the host signalling. That would be the perfect combination of
both approaches:

- Fast: host signalling (and therefore shadow context recreation) would
  only be done when the execlist context has actually moved around. That
  almost never happens, and hence per-execbuf overhead would be as low as
  with your pinning solution.

- Flexible: The i915 memory manager is still in full control since we
  don't pin any objects unnecessarily.
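
A rough sketch of the hook idea in plain C, to show where the host signalling
would sit. None of this is existing i915 API. The struct layout, gtt_bind(),
gtt_unbind() and the notification counter are all hypothetical stand-ins; only
the hook name global_gtt_unbind comes from the proposal above.

```c
#include <stddef.h>

struct gtt_object;

/* Hypothetical per-object ops; only special objects (e.g. hw contexts)
 * would set the hook, everything else leaves it NULL. */
struct gtt_object_ops {
    void (*global_gtt_unbind)(struct gtt_object *obj);
};

struct gtt_object {
    const struct gtt_object_ops *ops;
    int bound;                  /* currently has a global gtt address */
    unsigned long gtt_offset;
    int host_notifications;     /* stand-in for real host signalling */
};

/* Hook for context objects: tell the host the shadow copy is now stale. */
static void ctx_unbind_notify(struct gtt_object *obj)
{
    obj->host_notifications++;
}

const struct gtt_object_ops ctx_ops = {
    .global_gtt_unbind = ctx_unbind_notify,
};

void gtt_bind(struct gtt_object *obj, unsigned long offset)
{
    obj->gtt_offset = offset;
    obj->bound = 1;
}

/* Eviction path: the memory manager stays free to unbind anything, and
 * the hook fires only when an object actually loses its address. */
void gtt_unbind(struct gtt_object *obj)
{
    if (!obj->bound)
        return;
    if (obj->ops && obj->ops->global_gtt_unbind)
        obj->ops->global_gtt_unbind(obj);
    obj->bound = 0;
}
```

Nothing in the submission path touches the hook, so the per-execbuf cost is
zero as long as the context stays where it is. The host only hears about it
when the context is really evicted.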

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx