On Thu, 12 May 2022 at 13:36, Dmitry Osipenko <dmitry.osipenko@xxxxxxxxxxxxx> wrote:
>
> On 5/11/22 22:09, Daniel Vetter wrote:
> > On Wed, May 11, 2022 at 07:06:18PM +0300, Dmitry Osipenko wrote:
> >> On 5/11/22 16:09, Daniel Vetter wrote:
> >>>>>>> I'd like to ask you to reduce the scope of the patchset and build the
> >>>>>>> shrinker only for virtio-gpu. I know that I first suggested to build
> >>>>>>> upon shmem helpers, but it seems that it's easier to do that in a later
> >>>>>>> patchset.
> >>>>>> The first version of the VirtIO shrinker didn't support memory eviction.
> >>>>>> Memory eviction support requires the page fault handler to be aware of
> >>>>>> the evicted pages, so what should we do about it? The page fault handling
> >>>>>> is part of memory management, hence to me drm-shmem is already kinda an MM.
> >>>>> Hm I still don't get that part, why does that also not go through the
> >>>>> shmem helpers?
> >>>> The drm_gem_shmem_vm_ops includes the page fault handling; it's a
> >>>> helper by itself that is used by DRM drivers.
> >>>>
> >>>> I could try to move all the shrinker logic to the VirtIO driver and
> >>>> re-invent virtio_gem_shmem_vm_ops, but what is the point of doing this
> >>>> for each driver if we could have it once and for all in the common
> >>>> drm-shmem code?
> >>>>
> >>>> Maybe I should try to factor out all the shrinker logic from drm-shmem
> >>>> into a new drm-shmem-shrinker that could be shared by drivers? Will you
> >>>> be okay with this option?
> >>> I think we're talking past each other a bit. I'm only bringing up the
> >>> purge vs eviction topic we discussed in the other subthread again.
> >>
> >> Thomas asked to move the whole shrinker code to the VirtIO driver and
> >> I was saying that this is not a great idea to me, or am I misunderstanding
> >> Thomas' suggestion? Thomas?
> >
> > I think it was just me creating confusion here.
> >
> > fwiw I do also think that a shrinker in shmem helpers makes sense, just in
> > case that was also lost in the confusion.
>
> Okay, good that we're on the same page now.
>
> >>>>> I'm still confused why drivers need to know the difference
> >>>>> between eviction and purging. Or maybe I'm confused again.
> >>>> Example:
> >>>>
> >>>> If userspace uses IOV addresses, then these addresses must be kept
> >>>> reserved while the buffer is evicted.
> >>>>
> >>>> If a BO is purged, then we don't need to retain the IOV space allocated
> >>>> for the purged BO.
> >>> Yeah but is that actually needed by anyone? If userspace fails to allocate
> >>> another bo because of lack of gpu address space then it's very easy to
> >>> handle that:
> >>>
> >>> 1. Make a rule that "out of gpu address space" gives you a special errno
> >>> code like ENOSPC.
> >>>
> >>> 2. If userspace gets that, it walks the list of all buffers it marked as
> >>> purgeable and nukes them (whether they have been evicted or not). Then it
> >>> retries the bo allocation.
> >>>
> >>> Alternatively you can also do step 2 directly from the bo alloc ioctl in
> >>> step 1.
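
To make steps 1 and 2 concrete, here's a rough userspace sketch (the
ioctl request, the struct and the destroy_bo() helper are all made up
for illustration, they're not real uapi):

    #include <errno.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/ioctl.h>

    /* Hypothetical bo-create ioctl, for illustration only. */
    struct example_bo_create {
            uint64_t size;
            uint32_t handle; /* out */
    };
    #define DRM_IOCTL_EXAMPLE_BO_CREATE \
            _IOWR('d', 0x40, struct example_bo_create)

    void destroy_bo(int fd, uint32_t handle); /* hypothetical helper */

    /*
     * Step 1: the kernel returns ENOSPC when gpu address space runs
     * out. Step 2: on ENOSPC, nuke every buffer userspace marked as
     * purgeable (whether it has been purged already or not), then
     * retry the allocation once.
     */
    int bo_create_with_retry(int fd, struct example_bo_create *args,
                             const uint32_t *purgeable, size_t count)
    {
            if (ioctl(fd, DRM_IOCTL_EXAMPLE_BO_CREATE, args) == 0)
                    return 0;
            if (errno != ENOSPC)
                    return -errno;

            for (size_t i = 0; i < count; i++)
                    destroy_bo(fd, purgeable[i]);

            return ioctl(fd, DRM_IOCTL_EXAMPLE_BO_CREATE, args) ? -errno : 0;
    }
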
> >>> Either way you clean up va space, and actually a lot more (you
> >>> potentially nuke all buffers marked as purgeable, not just the ones that
> >>> have been purged already), and only when va cleanup is actually needed.
> >>>
> >>> Trying to solve this problem at eviction time otoh means:
> >>> - we have this difference between eviction and purging
> >>> - it's still not complete, you still need to glue step 2 above into your
> >>> driver somehow, and once step 2 above is glued in, doing additional
> >>> cleanup in the purge function is just duplicated logic
> >>>
> >>> So at least in my opinion this isn't the justification we need. And we
> >>> should definitely not just add that complication "in case, for the
> >>> future", if we don't have a real need right now. Adding it later on is
> >>> easy; removing it later on because it just gets in the way and confuses
> >>> people is much harder.
> >>
> >> The IOVA space is only one example.
> >>
> >> In the case of the VirtIO driver, we may have two memory allocations for
> >> a BO. One is the shmem allocation in the guest and the other is in the
> >> host's vram. If we only release the guest's memory on purge, then the
> >> vram will remain allocated until the BO is destroyed, which is
> >> unnecessarily sub-optimal.
> >
> > Hm but why don't you just nuke the memory on the host side too when you
> > evict? Allowing the guest memory to be swapped out while keeping the host
> > memory allocation alive also doesn't make a lot of sense to me. Both can
> > be recreated (I guess at least?) on swap-in.
>
> That shouldn't be very doable, or at least isn't worth the effort. It's
> userspace that manages data uploading; the kernel only provides transport
> for the virtio-gpu commands.
>
> Drivers are free to use the same function for both the purge() and evict()
> callbacks if they want. Getting rid of the purge() callback creates more
> problems than it solves, IMO.

Hm this still sounds pretty funny and defeats the point of
purgeable/evictable buffers a bit, I think. But we've also pushed this
bikeshed to the max, so if you make ->purge optional and just call
->evict if that's not present, and document it all in the kerneldoc,
then I think that's good. I just don't think that encouraging drivers
to distinguish between evict and purge is a good idea for almost all of
them.
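
To make the optional ->purge fallback concrete, roughly like this (the
ops struct and function are just a sketch of the idea, not existing
drm_gem_shmem code):

    #include <drm/drm_gem_shmem_helper.h>

    /*
     * Sketch: purge() is optional. A driver that doesn't care about
     * the purge/evict distinction leaves it NULL and the shrinker
     * falls back to evict(). Names here are illustrative.
     */
    struct example_shmem_shrinker_ops {
            void (*evict)(struct drm_gem_shmem_object *shmem);
            void (*purge)(struct drm_gem_shmem_object *shmem); /* optional */
    };

    static void
    example_shrinker_purge_object(struct drm_gem_shmem_object *shmem,
                                  const struct example_shmem_shrinker_ops *ops)
    {
            if (ops->purge)
                    ops->purge(shmem); /* driver wants the distinction */
            else
                    ops->evict(shmem); /* default: purge behaves like evict */
    }

-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch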