On 09.11.2022 18:46, John Harrison wrote: > On 11/9/2022 03:05, Tvrtko Ursulin wrote: >> On 08/11/2022 20:15, John Harrison wrote: >>> On 11/8/2022 01:01, Tvrtko Ursulin wrote: >>>> On 07/11/2022 19:14, John Harrison wrote: >>>>> On 11/7/2022 08:17, Tvrtko Ursulin wrote: >>>>>> On 07/11/2022 09:33, Tvrtko Ursulin wrote: >>>>>>> On 05/11/2022 01:03, Ceraolo Spurio, Daniele wrote: >>>>>>>> On 11/4/2022 10:25 AM, John.C.Harrison@xxxxxxxxx wrote: >>>>>>>>> From: John Harrison <John.C.Harrison@xxxxxxxxx> >>>>>>>>> >>>>>>>>> When trying to analyse bug reports from CI, customers, etc. it >>>>>>>>> can be >>>>>>>>> difficult to work out exactly what is happening on which GT in a >>>>>>>>> multi-GT system. So add GT oriented debug/error message >>>>>>>>> wrappers. If >>>>>>>>> used instead of the drm_ equivalents, you get the same output >>>>>>>>> but with >>>>>>>>> a GT# prefix on it. >>>>>>>>> >>>>>>>>> Signed-off-by: John Harrison <John.C.Harrison@xxxxxxxxx> >>>>>>>> >>>>>>>> The only downside to this is that we'll print "GT0: " even on >>>>>>>> single-GT devices. We could introduce a gt->info.name and print >>>>>>>> that, so we could have it different per-platform, but IMO it's >>>>>>>> not worth the effort. >>>>>>>> >>>>>>>> Reviewed-by: Daniele Ceraolo Spurio >>>>>>>> <daniele.ceraolospurio@xxxxxxxxx> >>>>>>>> >>>>>>>> I think it might be worth getting an ack from one of the >>>>>>>> maintainers to make sure we're all aligned on transitioning to >>>>>>>> these new logging macro for gt code. >>>>>>> >>>>>>> Idea is I think a very good one. First I would suggest >>>>>>> standardising to lowercase GT in logs because: >>>>>>> >>>>>>> $ grep "GT%" i915/ -r >>>>>>> $ grep "gt%" i915/ -r >>>>>>> i915/gt/intel_gt_sysfs.c: gt->i915->sysfs_gt, "gt%d", gt->info.id)) >>>>>>> i915/gt/intel_gt_sysfs.c: "failed to initialize >>>>>>> gt%d sysfs root\n", gt->info.id); >>>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to >>>>>>> create gt%u RC6 sysfs files (%pe)\n", >>>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u RC6p sysfs >>>>>>> files (%pe)\n", >>>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to >>>>>>> create gt%u RPS sysfs files (%pe)", >>>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to >>>>>>> create gt%u punit_req_freq_mhz sysfs (%pe)", >>>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u throttle >>>>>>> sysfs files (%pe)", >>>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u >>>>>>> media_perf_power_attrs sysfs (%pe)\n", >>>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to add >>>>>>> gt%u rps defaults (%pe)\n", >>>>>>> i915/i915_driver.c: drm_err(>->i915->drm, "gt%d: >>>>>>> intel_pcode_init failed %d\n", id, ret); >>>>>>> i915/i915_hwmon.c: snprintf(ddat_gt->name, sizeof(ddat_gt->name), >>>>>>> "i915_gt%u", i); >>>>>>> >>>>> >>>>> Just because there are 11 existing instances of one form doesn't >>>>> mean that the 275 instances that are waiting to be converted should >>>>> be done incorrectly. GT is an acronym and should be capitalised. >>>> >>>> Okay just make it consistent then. >>>> >>>>> Besides: >>>>> grep -r "GT " i915 | grep '"' >>>>> i915/vlv_suspend.c: drm_err(&i915->drm, "timeout >>>>> disabling GT waking\n"); >>>>> i915/vlv_suspend.c: "timeout waiting for GT >>>>> wells to go %s\n", >>>>> i915/vlv_suspend.c: drm_dbg(&i915->drm, "GT register access >>>>> while GT waking disabled\n"); >>>>> i915/i915_gpu_error.c: err_printf(m, "GT awake: %s\n", >>>>> str_yes_no(gt->awake)); >>>>> i915/i915_debugfs.c: seq_printf(m, "GT awake? %s [%d], %llums\n", >>>>> i915/selftests/i915_gem_evict.c: pr_err("Failed to idle GT (on >>>>> %s)", engine->name); >>>>> i915/intel_uncore.c: "GT thread status wait timed >>>>> out\n"); >>>>> i915/gt/uc/selftest_guc_multi_lrc.c: drm_err(>->i915->drm, "GT >>>>> failed to idle: %d\n", ret); >>>>> i915/gt/uc/selftest_guc.c: drm_err(>->i915->drm, "GT failed to >>>>> idle: %d\n", ret); >>>>> i915/gt/uc/selftest_guc.c: drm_err(>->i915->drm, "GT failed to >>>>> idle: %d\n", ret); >>>>> i915/gt/intel_gt_mcr.c: * Some GT registers are designed as >>>>> "multicast" or "replicated" registers: >>>>> i915/gt/selftest_rps.c: pr_info("%s: rps counted %d >>>>> C0 cycles [%lldns] in %lldns [%d cycles], using GT clock frequency >>>>> of %uKHz\n", >>>>> i915/gt/selftest_hangcheck.c: pr_err("[%s] GT is >>>>> wedged!\n", engine->name); >>>>> i915/gt/selftest_hangcheck.c: pr_err("GT is wedged!\n"); >>>>> i915/gt/intel_gt_clock_utils.c: "GT clock frequency >>>>> changed, was %uHz, now %uHz!\n", >>>>> i915/gt/selftest_engine_pm.c: pr_err("Unable to flush GT >>>>> pm before test\n"); >>>>> i915/gt/selftest_engine_pm.c: pr_err("GT failed to idle\n"); >>>>> i915/i915_sysfs.c: "failed to register GT >>>>> sysfs directory\n"); >>>>> i915/intel_uncore.h: * of the basic non-engine GT registers >>>>> (referred to as "GSI" on >>>>> i915/intel_uncore.h: * newer platforms, or "GT block" on older >>>>> platforms)? If so, we'll >>>>> >>>>> >>>>> >>>>>>> Then there is a question of naming. Are we okay with GT_XXX or, >>>>>>> do we want intel_gt_, or something completely different. I don't >>>>>>> have a strong opinion at the moment so I'll add some more folks >>>>>>> to Cc. >>>>>> >>>>> You mean GT_ERR("msg") vs intel_gt_err("msg")? Personally, I would >>>>> prefer just gt_err("msg") to keep it as close to the official drm_* >>>>> versions as possible. Print lines tend to be excessively long >>>>> already. Taking a 'gt' parameter instead of a '>->i915->drm' >>>>> parameter does help with that but it seems like calling the wrapper >>>>> intel_gt_* is shooting ourselves in the foot on that one. And >>>>> GT_ERR vs gt_err just comes down to the fact that it is a macro >>>>> wrapper and therefore is required to be in upper case. >>>>> >>>>>> There was a maintainer level mini-discussion on this topic which I >>>>>> will try to summarise. >>>>>> >>>>>> Main contention point was the maintenance cost and generally an >>>>>> undesirable pattern of needing to add many >>>>>> subsystem/component/directory specific macros. Which then >>>>>> typically need extra flavours and so on. But over verbosity of the >>>>> How many versions are you expecting to add? Beyond the tile >>>>> instance, what further addressing requirements are there? The card >>>>> instance is already printed as part of the PCI address. The only >>>>> other reason to add per component wrappers would be to wrap the >>>>> mechanism for getting from some random per component object back to >>>>> the intel_gt structure. But that is hardware a new issue being >>>>> added by this wrapper. It is also not a requirement. Much of the >>>>> code has a gt pointer already. For the parts that don't, some of it >>>>> would be a trivial engine->gt type dereference, some of it is a >>>>> more complex container_of type construction. But for those, the >>>>> given file will already have multiple instances of that already >>>>> (usually as the first or second line of the function - 'intel_gt >>>>> *gt = fancy_access_method(my_obj)' so adding one or two more of >>>>> those as necessary is not making the code harder to read. >>>>> >>>>>> code is obviously also bad, so one compromise idea was to add a >>>>>> macro which builds the GT string and use drm logging helpers >>>>>> directly. This would be something like: >>>>>> >>>>>> drm_err(GT_LOG("something went wrong ret=%d\n", gt), ret); >>>>>> drm_info(GT_LOG(...same...)); >>>>> Seriously? As above, some of these lines are already way too long, >>>>> this version makes them even longer with no obvious benefit. Worse, >>>>> it makes it harder to read what is going on. It is much less >>>>> intuitive to read than just replacing the drm_err itself. And >>>>> having two sets of parenthesis with some parameters inside the >>>>> first and some only inside the second is really horrid! Also, >>>>> putting the 'gt' parameter in the middle just confuses it with the >>>>> rest of the printf arguments even though there is no %d in the >>>>> string for it. So now a quick glances tells you that your code is >>>>> wrong because you have three format specifiers but four parameters. >>>>> >>>>> Whereas, just replacing drm_err with gt_err (or GT_ERR or >>>>> intel_gt_err) keeps everything else consistent. The first parameter >>>>> changes from 'drm' to 'gt' but is still the master object parameter >>>>> and it matches the function/macro prefix so inherently looks >>>>> correct. Then you have your message plus parameters. No confusing >>>>> orders, no confusing parenthesis, no excessive macro levels, no >>>>> confusion at all. Just nice simple, easy to read, easy to maintain >>>>> code. >>>> >>>> I am personally okay with gt_err/GT_ERR some other folks might >>>> object though. And I can also understand the argument why it is >>>> better to not have to define gt_err, gt_warn, gt_info, gt_notice, >>>> gt_debug, gt_err_ratelimited, gt_warn_once.. and instead have only >>>> one macro. >>> A small set of trivial macro definitions vs a complicated and >>> unreadable construct on every single print? Erm, isn't that the very >>> definition of abstracting to helpers as generally required by every >>> code review ever? >>> >>> And what 'other folks might object'? People already CC'd? People >>> outside of i915? >>> >>> >>>> >>>> Because of that I was passing on to you the compromise option. >>>> >>>> It maybe still has net space savings since we wouldn't have to be >>>> repeating the gt->i915->drm whatever and gt->info.id on every line. >>>> >>>> You are free to try the most compact one and see how hard those >>>> objections will be. >>> Um. I already did. This patch. And you are the only person to have >>> objected in any manner at all. >> >> Where I have objected? > Only in everything you have written. Or are you saying that actually I > could have just taken the r-b from Daniele and merged it while > completely ignoring everything you have said because you didn't say > 'NACK' at the top of your email? > >> >> I was a) asking to convert all gt/ within one kernel release, b) >> transferring the maintainer discussion from IRC to this email chain to >> outlay one alternative, for which I said I could see the pros and cons >> of both, and c) raised the naming question early since that can >> usually become a churn point later on when we have large scale code >> transformations. >> >> As said, FWIW you have my ack for GT_XXX naming and approach, but >> please do convert the whole of gt/ so we don't ship with a mish-mash >> of log messages. >> > That sounds like a quite a lot of objections to me - don't do it that > way, do it this way; fine do it that way but expect lots of complaints > when you do; oh all right, do it that way but re-write the entire driver > all at once. > > I really don't get the 'we must not ship with a mish-mash of log > messages' argument. It's hardly like the current total mish-mash of > DRM_DEBUG, DRM_DEBUG_DRIVER, pr_dbg, drm_dbg, ... is even remotely > consistent. It's also not like the patch is massively changing the > format of the output messages. So some prints get a useful extra three > characters in advance of others. Does anyone care? Would anyone except > us even notice? > > Is it really a problem to merge this patch now to get the process > started? And other sub-components get updated as and when people get the > time to do them? You could maybe even help rather than posting > completely conflicting patch sets that basically duplicate all the > effort for no actual benefit. Instead of merging this patch now, oriented on GT only, I would rather wait until we discuss and plan solution for the all sub-components. Once that's done (with agreement on naming and output) we can start converting exiting messages. My proposal would be: - use wrappers per component - use lower case names - don't add colon #define i915_xxx(_i915, _fmt, ...) \ drm_xxx(&(_i915)->drm, _fmt, ##__VA_ARGS__) #define gt_xxx(_gt, _fmt, ...) \ i915_xxx((_gt)->i915, "GT%u " _fmt, (_gt)->info.id, .. #define guc_xxx(_guc, _fmt, ...) \ gt_xxx(guc_to_gt(_guc), "GuC " _fmt, .. #define ct_xxx(_ct, _fmt, ...) \ guc_xxx(ct_to_guc(_ct), "CTB " _fmt, .. where xxx = { err, warn, notice, info, dbg } and then for calls like: i915_err(i915, "Foo failed (%pe)\n", ERR_PTR(err)); gt_err(gt, "Foo failed (%pe)\n", ERR_PTR(err)); guc_err(guc, "Foo failed (%pe)\n", ERR_PTR(err)); ct_err(ct, "Foo failed (%pe)\n", ERR_PTR(err)); the output would look like: [ ] i915 0000:00:02.0: [drm] *ERROR* Foo failed (-EIO) [ ] i915 0000:00:02.0: [drm] *ERROR* GT0 Foo failed (-EIO) [ ] i915 0000:00:02.0: [drm] *ERROR* GT0 GuC Foo failed (-EIO) [ ] i915 0000:00:02.0: [drm] *ERROR* GT0 GuC CTB Foo failed (-EIO) Would that work for all of you? Michal > > John. > > >> Regards, >> >> Tvrtko >