On Fri, Aug 23, 2024 at 03:08:40PM +0200, Andi Shyti wrote: > Hi, > > This patch series introduces static load balancing for GPUs with > multiple compute engines. It's a lengthy series, and some > challenging aspects still need to be resolved. Do we have an actual user for this, where just reloading the entire driver (or well-rebinding, if you only want to change the value for a specific device) with a new module option isn't enough? There's some really gnarly locking and lifetime fun in there, and it needs a corresponding justification. Which needs to be enormous for this case, meaning actual customers willing to shout on dri-devel that they really, absolutely need this, or their machines will go up in flames. Otherwise this is a nack from me. Thanks, Sima > > I have tried to split the work as much as possible to facilitate > the review process. > > To summarize, in patches 1 to 14, no functional changes occur > except for the addition of the num_cslices interface. The > significant changes happen in patch 15, which is the core part of > the CCS mode setting, utilizing the groundwork laid in the > earlier patches. > > In this updated approach, the focus is now on managing the UABI > engine list, which controls the engines exposed to userspace. > Instead of manipulating phuscal engines and their memory, we now > handle engine exposure through this list. > > I would greatly appreciate further input from all reviewers who > have already assisted with the previous work. > > IGT tests have also been developed, but I haven't sent them yet. > > Thank you Chris for the offline reviews. > > Thanks, > Andi > > Changelog: > ========== > PATCHv2 -> PATCHv3 > ------------------ > - Fix a NULL pointer dereference during module unload. > In i915_gem_driver_remove() I was accessing the gt after the > gt was removed. Use the dev_priv, instead (obviously!). > - Fix a lockdep issue: Some of the uabi_engines_mutex unlocks > were not correctly placed in the exit paths. > - Fix a checkpatch error for spaces after and before parenthesis > in the for_each_enabled_engine() definition. > > PATCHv1 -> PATCHv2 > ------------------ > - Use uabi_mutex to protect the uabi_engines, not the engine > itself. Rename it to uabi_engines_mutex. > - Use kobject_add/kobject_del for adding and removing > interfaces, this way we don't need to destroy and recreate the > engines, anymore. Refactor intel_engine_add_single_sysfs() to > reflect this scenario. > - After adding engines to the rb_tree check that they have been > added correctly. > - Fix rb_find_add() compare function to take into accoung also > the class, not just the instance. > > RFCv2 -> PATCHv1 > ---------------- > - Removed gt->ccs.mutex > - Rename m -> width, ccs_id -> engine in > intel_gt_apply_ccs_mode(). > - In the CCS register value calculation > (intel_gt_apply_ccs_mode()) the engine (ccs_id) needs to move > along the ccs_mask (set by the user) instead of the > cslice_mask. > - Add GEM_BUG_ON after calculating the new ccs_mask > (update_ccs_mask()) to make sure all angines have been > evaluated (i.e. ccs_mask must be '0' at the end of the > algorithm). > - move wakeref lock before evaluating intel_gt_pm_is_awake() and > fix exit path accordingly. > - Use a more compact form in intel_gt_sysfs_ccs_init() and > add_uabi_ccs_engines() when evaluating sysfs_create_file(): no > need to store the return value to the err variable which is > unused. Get rid of err. > - Print a warnging instead of a debug message if we fail to > create the sysfs files. > - If engine files creation fails in > intel_engine_add_single_sysfs(), print a warning, not an > error. > - Rename gt->ccs.ccs_mask to gt->ccs.id_mask and add a comment > to explain its purpose. > - During uabi engine creation, in > intel_engines_driver_register(), the uabi_ccs_instance is > redundant because the ccs_instances is already tracked in > engine->uabi_instance. > - Mark add_uabi_ccs_engines() and remove_uabi_ccs_engines() as > __maybe_unused not to break bisectability. They wouldn't > compile in their own commit. They will be used in the next > patch and the __maybe_unused is removed. > - Update engine's workaround every time a new mode is set in > update_ccs_mask(). > - Mark engines as valid or invalid using their status as > rb_node. Invalid engines are marked as invalid using > RB_CLEAR_NODE(). Execbufs will check for their validity when > selecting the engine to be combined to a context. > - Create for_each_enabled_engine() which skips the non valid > engines and use it in selftests. > > RFCv1 -> RFCv2 > -------------- > Compared to the first version I've taken a completely different > approach to adding and removing engines. in v1 physical engines > were directly added and removed, along with the memory allocated > to them, each time the user changed the CCS mode (from the > previous cover letter). > > Andi Shyti (15): > drm/i915/gt: Avoid using masked workaround for CCS_MODE setting > drm/i915/gt: Move the CCS mode variable to a global position > drm/i915/gt: Allow the creation of multi-mode CCS masks > drm/i915/gt: Refactor uabi engine class/instance list creation > drm/i915/gem: Mark and verify UABI engine validity > drm/i915/gt: Introduce for_each_enabled_engine() and apply it in > selftests > drm/i915/gt: Manage CCS engine creation within UABI exposure > drm/i915/gt: Remove cslices mask value from the CCS structure > drm/i915/gt: Expose the number of total CCS slices > drm/i915/gt: Store engine-related sysfs kobjects > drm/i915/gt: Store active CCS mask > drm/i915: Protect access to the UABI engines list with a mutex > drm/i915/gt: Isolate single sysfs engine file creation > drm/i915/gt: Implement creation and removal routines for CCS engines > drm/i915/gt: Allow the user to change the CCS mode through sysfs > > drivers/gpu/drm/i915/gem/i915_gem_context.c | 3 + > .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 28 +- > drivers/gpu/drm/i915/gt/intel_engine_cs.c | 23 -- > drivers/gpu/drm/i915/gt/intel_engine_types.h | 2 + > drivers/gpu/drm/i915/gt/intel_engine_user.c | 62 ++- > drivers/gpu/drm/i915/gt/intel_gt.c | 3 + > drivers/gpu/drm/i915/gt/intel_gt.h | 12 + > drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 353 +++++++++++++++++- > drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h | 5 +- > drivers/gpu/drm/i915/gt/intel_gt_sysfs.c | 2 + > drivers/gpu/drm/i915/gt/intel_gt_types.h | 19 +- > drivers/gpu/drm/i915/gt/intel_workarounds.c | 8 +- > drivers/gpu/drm/i915/gt/selftest_context.c | 6 +- > drivers/gpu/drm/i915/gt/selftest_engine_cs.c | 4 +- > .../drm/i915/gt/selftest_engine_heartbeat.c | 6 +- > drivers/gpu/drm/i915/gt/selftest_engine_pm.c | 6 +- > drivers/gpu/drm/i915/gt/selftest_execlists.c | 52 +-- > drivers/gpu/drm/i915/gt/selftest_gt_pm.c | 2 +- > drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 22 +- > drivers/gpu/drm/i915/gt/selftest_lrc.c | 18 +- > drivers/gpu/drm/i915/gt/selftest_mocs.c | 6 +- > drivers/gpu/drm/i915/gt/selftest_rc6.c | 4 +- > drivers/gpu/drm/i915/gt/selftest_reset.c | 8 +- > .../drm/i915/gt/selftest_ring_submission.c | 2 +- > drivers/gpu/drm/i915/gt/selftest_rps.c | 14 +- > drivers/gpu/drm/i915/gt/selftest_timeline.c | 14 +- > drivers/gpu/drm/i915/gt/selftest_tlb.c | 2 +- > .../gpu/drm/i915/gt/selftest_workarounds.c | 14 +- > drivers/gpu/drm/i915/gt/sysfs_engines.c | 79 ++-- > drivers/gpu/drm/i915/gt/sysfs_engines.h | 2 + > drivers/gpu/drm/i915/i915_cmd_parser.c | 2 + > drivers/gpu/drm/i915/i915_debugfs.c | 4 + > drivers/gpu/drm/i915/i915_drv.h | 5 + > drivers/gpu/drm/i915/i915_gem.c | 4 + > drivers/gpu/drm/i915/i915_perf.c | 8 +- > drivers/gpu/drm/i915/i915_pmu.c | 11 +- > drivers/gpu/drm/i915/i915_query.c | 21 +- > 37 files changed, 643 insertions(+), 193 deletions(-) > > -- > 2.45.2 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch