i915_gem_free_object() is called by multiple threads/processes, and they
all add objects onto the same free_list. The free_list processing worker
becomes the bottleneck. I see that the worker is mostly a single thread
(with one particular thread ID), but sometimes multiple threads are
launched to process the 'free_list' work concurrently. Even so, the
processing speed is still slower than the rate at which the multiple
processes feed the list, so 'free_list' holds more and more memory.

The worker launch is also delayed a lot. We call queue_work() when we add
the first object onto the empty 'free_list', but by the time the worker is
launched, the 'free_list' has sometimes accumulated 1M objects. Maybe it is
waiting for the currently running worker to finish?

This happens with a direct call to __i915_gem_free_object_rcu() and no
cond_resched().

--CQ

> -----Original Message-----
> From: Intel-gfx <intel-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Tang, CQ
> Sent: Tuesday, October 13, 2020 9:41 AM
> To: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>; intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [PATCH] drm/i915: Make the GEM reclaim workqueue high priority
>
> > -----Original Message-----
> > From: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Sent: Tuesday, October 13, 2020 9:25 AM
> > To: Tang, CQ <cq.tang@xxxxxxxxx>; intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> > Subject: Re: [PATCH] drm/i915: Make the GEM reclaim workqueue high priority
> >
> > Quoting Tang, CQ (2020-10-13 17:19:27)
> > > Chris,
> > > I tested this patch. It is still not enough, I keep running out of
> > > lmem. Every worker invocation takes a larger and larger freeing
> > > object count.
> > >
> >
> > Was that with the immediate call (not via call_rcu) to
> > __i915_gem_free_object_rcu?
> >
> > If this brings the freelist under control, the next item is judicious
> > use of cond_synchronize_rcu(). We just have to make sure we penalize
> > the right hog.
> >
> > Otherwise, we have to shotgun apply i915_gem_flush_free_objects() and
> > still find somewhere to put the rcu sync.
>
> This is with call_rcu().
>
> Then I removed cond_resched(), and it does not help. Then I called
> __i915_gem_free_object_rcu() directly, and still got the same error.
> However, I noticed that sometimes 'queue_work()' returns false, which
> means the work is already queued. How? The worker had been called, so
> 'free_list' is empty:
>
> [ 117.381888] queue_work: 107967, 107930; 1
> [ 119.180230] queue_work: 125531, 125513; 1
> [ 121.349308] queue_work: 155017, 154996; 1
> [ 124.214885] queue_work: 193918, 193873; 1
> [ 127.967260] queue_work: 256838, 256776; 1
> [ 133.281045] queue_work: 345753, 345734; 1
> [ 141.457995] queue_work: 516943, 516859; 1
> [ 156.264420] queue_work: 863622, 863516; 1
> [ 156.322619] queue_work: 865849, 3163; 0
> [ 156.448551] queue_work: 865578, 7141; 0
> [ 156.882985] queue_work: 866984, 24138; 0
> [ 157.952163] queue_work: 862902, 53365; 0
> [ 159.838412] queue_work: 842522, 95504; 0
> [ 174.321508] queue_work: 937179, 657323; 0
>
> --CQ
>
> > -Chris
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx