On Thu, Sep 23, 2021 at 03:07:06PM +0200, Janusz Krzysztofik wrote: > Hi Matt, > > Thanks for review. > > On czwartek, 23 września 2021 00:24:29 CEST Matt Roper wrote: > > On Fri, Sep 03, 2021 at 04:23:20PM +0200, Janusz Krzysztofik wrote: > > > In preparation for clean driver release, attempts to drain work queues > > > and release freed objects are taken at driver remove time. However, GT > > > buffer pools are now not flushed before the driver release phase. > > > Since unused objects may stay there for up to one second, some may > > > survive until driver release is attempted. That can potentially > > > explain sporadic then hardly reproducible issues observed at driver > > > release time, like non-zero shrink counter or outstanding address space > > > > So just to make sure I'm understanding the description here: > > - We currently do an explicit flush of the buffer pools within the call > > path of drm_driver.release(); this removes all buffers, regardless of > > their age. > > And also triggers release of the buffers' underlying resources (objects, > address space areas). > > > - However there may be other code that runs *earlier* within the > > drm_driver.release() call chain > > Yes, within the drm_driver.release() call chain, but not necessarily earlier > -- that's irrelevant, I believe, ... > > > that expects buffer pools have > > already been flushed and are already empty. > > ... since that other code expects not just buffer pools but resource > categories they consume (objects, address space areas) to be flushed already, > while some may still be busy with old buffers not auto-flushed yet. > > > - Since buffer pools auto-flush old buffers once per second in a worker > > thread, there's a small window where if we remove the driver while > > there are still buffers with an age of less than one second, the > > assumptions of the other release code may be violated. > > Correct. > > > So by moving the flush to driver remove (which executes earlier via the > > pci_driver.remove() flow) you're ensuring that all buffers are flushed > > before _any_ code in drm_driver.release() executes. > > And also flushed before some other code in pci_driver.remove() flushes those > other resource categories released on buffer pools flush, then completeness of > all those flushes is checked in drm_driver.release(). > > May I copy-paste some of you wording while rephrasing my commit description? Sure go ahead. Thanks. Matt > > Thanks, > Janusz > > > > > I found the wording of the commit message here somewhat confusing since > > it's talking about flushes we do in driver release, but mentions > > problems that arise during driver release due to lack of flushing. You > > might want to reword the commit message somewhat to help clarify. > > Otherwise, the code change itself looks reasonable to me. > > > > BTW, I do notice that drm_driver.release() in general is technically > > deprecated at this point (with a suggestion in the drm_drv.h comments to > > switch to using drmm_add_action(), drmm_kmalloc(), etc. to manage the > > cleanup of resources). At some point in the future me may want to > > rework the i915 cleanup in general according to that guidance. > > > > > > Matt > > > > > areas. > > > > > > Flush buffer pools on GT remove as a fix. On driver release, don't > > > flush the pools again, just assert that the flush was called and > > > nothing added more in between. > > > > > > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@xxxxxxxxxxxxxxx> > > > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > > --- > > > Resending with Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx as requested, and a > > > typo in commit description fixed. > > > > > > Thanks, > > > Janusz > > > > > > drivers/gpu/drm/i915/gt/intel_gt.c | 2 ++ > > > drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c | 2 -- > > > 2 files changed, 2 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c > > > index 62d40c986642..8f322a4ecd87 100644 > > > --- a/drivers/gpu/drm/i915/gt/intel_gt.c > > > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c > > > @@ -737,6 +737,8 @@ void intel_gt_driver_remove(struct intel_gt *gt) > > > intel_uc_driver_remove(>->uc); > > > > > > intel_engines_release(gt); > > > + > > > + intel_gt_flush_buffer_pool(gt); > > > } > > > > > > void intel_gt_driver_unregister(struct intel_gt *gt) > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c b/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c > > > index aa0a59c5b614..acc49c56a9f3 100644 > > > --- a/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c > > > +++ b/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c > > > @@ -245,8 +245,6 @@ void intel_gt_fini_buffer_pool(struct intel_gt *gt) > > > struct intel_gt_buffer_pool *pool = >->buffer_pool; > > > int n; > > > > > > - intel_gt_flush_buffer_pool(gt); > > > - > > > for (n = 0; n < ARRAY_SIZE(pool->cache_list); n++) > > > GEM_BUG_ON(!list_empty(&pool->cache_list[n])); > > > } > > > > > > > > -- Matt Roper Graphics Software Engineer VTT-OSGC Platform Enablement Intel Corporation (916) 356-2795