On Mon, Jun 20, 2022 at 08:18:04AM -0700, Rob Clark wrote: > On Mon, Jun 20, 2022 at 7:09 AM Dmitry Osipenko > <dmitry.osipenko@xxxxxxxxxxxxx> wrote: > > > > On 6/19/22 20:53, Rob Clark wrote: > > ... > > >> +static unsigned long > > >> +drm_gem_shmem_shrinker_count_objects(struct shrinker *shrinker, > > >> + struct shrink_control *sc) > > >> +{ > > >> + struct drm_gem_shmem_shrinker *gem_shrinker = to_drm_shrinker(shrinker); > > >> + struct drm_gem_shmem_object *shmem; > > >> + unsigned long count = 0; > > >> + > > >> + if (!mutex_trylock(&gem_shrinker->lock)) > > >> + return 0; > > >> + > > >> + list_for_each_entry(shmem, &gem_shrinker->lru_evictable, madv_list) { > > >> + count += shmem->base.size; > > >> + > > >> + if (count >= SHRINK_EMPTY) > > >> + break; > > >> + } > > >> + > > >> + mutex_unlock(&gem_shrinker->lock); > > > > > > As I mentioned on other thread, count_objects, being approximate but > > > lockless and fast is the important thing. Otherwise when you start > > > hitting the shrinker on many threads, you end up serializing them all, > > > even if you have no pages to return to the system at that point. > > > > Daniel's point for dropping the lockless variant was that we're already > > in trouble if we're hitting shrinker too often and extra optimizations > > won't bring much benefits to us. > > At least with zram swap (which I highly recommend using even if you > are not using a physical swap file/partition), swapin/out is actually > quite fast. And if you are leaning on zram swap to fit 8GB of chrome > browser on a 4GB device, the shrinker gets hit quite a lot. Lower > spec (4GB RAM) chromebooks can be under constant memory pressure and > can quite easily get into a situation where you are hitting the > shrinker on many threads simultaneously. So it is pretty important > for all shrinkers in the system (not just drm driver) to be as > concurrent as possible. As long as you avoid serializing reclaim on > all the threads, performance can still be quite good, but if you don't > performance will fall off a cliff. > > jfwiw, we are seeing pretty good results (iirc 40-70% increase in open > tab counts) with the combination of eviction + multigen LRU[1] + > sizing zram swap to be 2x physical RAM > > [1] https://lwn.net/Articles/856931/ > > > Alright, I'll add back the lockless variant (or will use yours > > drm_gem_lru) in the next revision. The code difference is very small > > after all. > > > > ... > > >> + /* prevent racing with the dma-buf importing/exporting */ > > >> + if (!mutex_trylock(&gem_shrinker->dev->object_name_lock)) { > > >> + *lock_contention |= true; > > >> + goto resv_unlock; > > >> + } > > > > > > I'm not sure this is a good idea to serialize on object_name_lock. > > > Purgeable buffers should never be shared (imported or exported). So > > > at best you are avoiding evicting and immediately swapping back in, in > > > a rare case, at the cost of serializing multiple threads trying to > > > reclaim pages in parallel. > > > > The object_name_lock shouldn't cause contention in practice. But objects > > are also pinned on attachment, hence maybe this lock is indeed > > unnecessary.. I'll re-check it. > > I'm not worried about contention with export/import/etc, but > contention between multiple threads hitting the shrinker in parallel. > I guess since you are using trylock, it won't *block* the other > threads hitting shrinker, but they'll just end up looping in > do_shrink_slab() because they are hitting contention. > > I'd have to do some experiments to see how it works out in practice, > but my gut feel is that it isn't a good idea Yeah trylock on anything else than the object lock is No Good in the shrinker. And it really shouldn't be needed, since import/export should pin stuff as needed. Which should be protected by the dma_resv object lock. If not, we need to fix that. Picking a random drm-internal lock like this is definitely no good design. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch