On Wed, 2024-02-07 at 11:34 -0800, John Harrison wrote: > On 2/7/2024 10:49, Tvrtko Ursulin wrote: > > On 07/02/2024 18:12, John Harrison wrote: > > > On 2/7/2024 03:56, Tvrtko Ursulin wrote: > > > > From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > > > > > > > > Add a new query to the GuC submission interface version. > > > > > > > > Mesa intends to use this information to check for old firmware versions > > > > with a known bug where using the render and compute command streamers > > > > simultaneously can cause GPU hangs due issues in firmware scheduling. > > > > > > > > Based on patches from Vivaik and Joonas. > > > > > > > > There is a little bit of an open around the width required for > > > > versions. > > > > While the GuC FW iface tells they are u8, i915 GuC code uses u32: > > > > > > > > #define CSS_SW_VERSION_UC_MAJOR (0xFF << 16) > > > > #define CSS_SW_VERSION_UC_MINOR (0xFF << 8) > > > > #define CSS_SW_VERSION_UC_PATCH (0xFF << 0) > > > > ... > > > > struct intel_uc_fw_ver { > > > > u32 major; > > > > u32 minor; > > > > u32 patch; > > > > u32 build; > > > > }; > > > This is copied from generic code which supports firmwares other than > > > GuC. Only GuC promises to use 8-bit version components. Other > > > firmwares very definitely do not. There is no open. > > > > Ack. > > > > > > > > > > So we could make the query u8, and refactor the struct intel_uc_fw_ver > > > > to use u8, or not. To avoid any doubts on why are we assigning u32 to > > > > u8 I simply opted to use u64. Which avoids the need to add any padding > > > > too. > > > I don't follow how potential 8 vs 32 confusion means jump to 64?! > > > > Suggestion was to use u8 in the uapi in order to align with GuC FW ABI > > (or however it's called), in which case there would be: > > > > ver.major = guc->submission_version.major; > > > > which would be: > > > > (u8) = (u32) > > > > And I was anticipating someone not liking that either. Using too wide > > u64 simply avoids the need to add a padding element to the uapi struct. > > > > If you are positive we need to include a branch number, even though it > > does not seem to be implemented in the code even(*) then I can make > > uapi 4x u32 and achieve the same. > It's not implemented in the code because we've never had to, and it is > yet another train wreck waiting to happen. There are a bunch of issues > at different levels that need to be resolved. But that is all in the > kernel and/or firmware and so can be added by a later kernel update when > necessary. However, if the UMDs are not already taking it into account > or its not even in the UAPI, then we can't back fill in the kernel > later, we are just broken. This sounds to me like a firmware version for internal testing or for pre-production HW, would any branched firmware be released to customers? > > > > > (*) > > static void uc_unpack_css_version(struct intel_uc_fw_ver *ver, u32 > > css_value) > > { > > /* Get version numbers from the CSS header */ > > ver->major = FIELD_GET(CSS_SW_VERSION_UC_MAJOR, css_value); > > ver->minor = FIELD_GET(CSS_SW_VERSION_UC_MINOR, css_value); > > ver->patch = FIELD_GET(CSS_SW_VERSION_UC_PATCH, css_value); > > } > > > > No branch field in the CSS header? > I think there is, it's just not officially implemented yet. > > > > > And Why is UMD supposed to reject a non-zero branch? Like how would > > 1.1.3.0 be fine and 1.1.3.1 be bad? I don't get it. But anyway, I can > > respin if you definitely confirm. > Because that is backwards. The branch number goes at the front. > > So, for example (using made up numbers, I don't recall offhand what > versions we have where) say we currently have 0.1.3.0 in tip and 0.1.1.0 > in the last LTS. We then need to ship a critical security fix and back > port it to the LTS. Tip becomes 0.1.3.1 but the LTS can't become 0.1.1.1 > because that version already exists in the history of tip and does not > contain the fix. So the LTS gets branched to 1.1.0.0. We then have both > branches potentially moving forwards with completely independent versioning. > > Exactly the same as 5.8.x, 5.9,y, 6.0.z, etc in the Linux kernel > versioning. You cannot make any assumptions about what might be in > 1.4.5.6 compared to 0.1.2.3. 1.4.5.6 could actually 0.1.0.3 with a stack > of security fixes but none of the features, workarounds or bug fixes > that are in 0.1.2.3. > > Hence, if the branch number changes then all bets are off. You have to > start over and reject anything you do not explicitly know about. > > This is why we were saying that exposing version numbers to UMDs breaks > down horribly as soon as we have to start branching. There is no clean > or simple way to do this. > > John. > > > > > > Regards, > > > > Tvrtko > > > > > > > > > > Compile tested only. > > > > > > > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > > > > Cc: Kenneth Graunke <kenneth@xxxxxxxxxxxxx> > > > > Cc: Jose Souza <jose.souza@xxxxxxxxx> > > > > Cc: Sagar Ghuge <sagar.ghuge@xxxxxxxxx> > > > > Cc: Paulo Zanoni <paulo.r.zanoni@xxxxxxxxx> > > > > Cc: John Harrison <John.C.Harrison@xxxxxxxxx> > > > > Cc: Rodrigo Vivi <rodrigo.vivi@xxxxxxxxx> > > > > Cc: Jani Nikula <jani.nikula@xxxxxxxxx> > > > > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > > > > Cc: Vivaik Balasubrawmanian <vivaik.balasubrawmanian@xxxxxxxxx> > > > > --- > > > > drivers/gpu/drm/i915/i915_query.c | 32 > > > > +++++++++++++++++++++++++++++++ > > > > include/uapi/drm/i915_drm.h | 11 +++++++++++ > > > > 2 files changed, 43 insertions(+) > > > > > > > > diff --git a/drivers/gpu/drm/i915/i915_query.c > > > > b/drivers/gpu/drm/i915/i915_query.c > > > > index 00871ef99792..999687f6a3d4 100644 > > > > --- a/drivers/gpu/drm/i915/i915_query.c > > > > +++ b/drivers/gpu/drm/i915/i915_query.c > > > > @@ -551,6 +551,37 @@ static int query_hwconfig_blob(struct > > > > drm_i915_private *i915, > > > > return hwconfig->size; > > > > } > > > > +static int > > > > +query_guc_submission_version(struct drm_i915_private *i915, > > > > + struct drm_i915_query_item *query) > > > > +{ > > > > + struct drm_i915_query_guc_submission_version __user *query_ptr = > > > > + u64_to_user_ptr(query->data_ptr); > > > > + struct drm_i915_query_guc_submission_version ver; > > > > + struct intel_guc *guc = &to_gt(i915)->uc.guc; > > > > + const size_t size = sizeof(ver); > > > > + int ret; > > > > + > > > > + if (!intel_uc_uses_guc_submission(&to_gt(i915)->uc)) > > > > + return -ENODEV; > > > > + > > > > + ret = copy_query_item(&ver, size, size, query); > > > > + if (ret != 0) > > > > + return ret; > > > > + > > > > + if (ver.major || ver.minor || ver.patch) > > > > + return -EINVAL; > > > > + > > > > + ver.major = guc->submission_version.major; > > > > + ver.minor = guc->submission_version.minor; > > > > + ver.patch = guc->submission_version.patch; > > > This needs to include the branch version (currently set to zero) in > > > the definition. And the UMD needs to barf if branch comes back as > > > non-zero. I.e. there is no guarantee that a branched version will > > > have the w/a + fix that they are wanting. > > > > > > John. > > > > > > > > > > + > > > > + if (copy_to_user(query_ptr, &ver, size)) > > > > + return -EFAULT; > > > > + > > > > + return 0; > > > > +} > > > > + > > > > static int (* const i915_query_funcs[])(struct drm_i915_private > > > > *dev_priv, > > > > struct drm_i915_query_item *query_item) = { > > > > query_topology_info, > > > > @@ -559,6 +590,7 @@ static int (* const i915_query_funcs[])(struct > > > > drm_i915_private *dev_priv, > > > > query_memregion_info, > > > > query_hwconfig_blob, > > > > query_geometry_subslices, > > > > + query_guc_submission_version, > > > > }; > > > > int i915_query_ioctl(struct drm_device *dev, void *data, struct > > > > drm_file *file) > > > > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h > > > > index 550c496ce76d..d80d9b5e1eda 100644 > > > > --- a/include/uapi/drm/i915_drm.h > > > > +++ b/include/uapi/drm/i915_drm.h > > > > @@ -3038,6 +3038,7 @@ struct drm_i915_query_item { > > > > * - %DRM_I915_QUERY_MEMORY_REGIONS (see struct > > > > drm_i915_query_memory_regions) > > > > * - %DRM_I915_QUERY_HWCONFIG_BLOB (see `GuC HWCONFIG blob > > > > uAPI`) > > > > * - %DRM_I915_QUERY_GEOMETRY_SUBSLICES (see struct > > > > drm_i915_query_topology_info) > > > > + * - %DRM_I915_QUERY_GUC_SUBMISSION_VERSION (see struct > > > > drm_i915_query_guc_submission_version) > > > > */ > > > > __u64 query_id; > > > > #define DRM_I915_QUERY_TOPOLOGY_INFO 1 > > > > @@ -3046,6 +3047,7 @@ struct drm_i915_query_item { > > > > #define DRM_I915_QUERY_MEMORY_REGIONS 4 > > > > #define DRM_I915_QUERY_HWCONFIG_BLOB 5 > > > > #define DRM_I915_QUERY_GEOMETRY_SUBSLICES 6 > > > > +#define DRM_I915_QUERY_GUC_SUBMISSION_VERSION 7 > > > > /* Must be kept compact -- no holes and well documented */ > > > > /** > > > > @@ -3591,6 +3593,15 @@ struct drm_i915_query_memory_regions { > > > > struct drm_i915_memory_region_info regions[]; > > > > }; > > > > +/** > > > > +* struct drm_i915_query_guc_submission_version - query GuC > > > > submission interface version > > > > +*/ > > > > +struct drm_i915_query_guc_submission_version { > > > > + __u64 major; > > > > + __u64 minor; > > > > + __u64 patch; > > > > +}; > > > > + > > > > /** > > > > * DOC: GuC HWCONFIG blob uAPI > > > > * > > > >