On Tue, Apr 11, 2023 at 09:41:04AM -0700, John Harrison wrote:
> On 4/11/2023 07:41, Rodrigo Vivi wrote:
> > On Mon, Apr 10, 2023 at 12:25:21PM -0700, John.C.Harrison@xxxxxxxxx wrote:
> > > From: John Harrison <John.C.Harrison@xxxxxxxxx>
> > >
> > > Sometimes, the only effective way to debug an issue is to dump all the
> > > interesting information at the point of failure. So add support for
> > > doing that.
> > No! Please no!
> > We have some of this on Xe and I'm hating it. I'm going to try to remove
> > it from there soon. It is horrible when you lose the ability to use dmesg
> > directly because it goes over the number of lines it saves... or even
> > with dmesg -w it goes over the number of lines of your terminal...
> > or the ssh and serial slowness when printing a bunch of information.
> >
> > We probably want to be able to capture multiple error states and be
> > able to cross-reference them with a kernel timeline, but definitely not
> > flood our log terminals.
> I think you are missing the point.
>
> This is the emergency backup plan for when nothing else works. It is not on
> by default. It should never happen on an end user system unless we
> specifically request them to run with a patched kernel to enable a dump at a
> specific point.
>
> But there are (many) times when nothing else works. In those instances, it
> is extremely useful to be able to dump the system state in this manner.
>
> It is code we have been using internally for some time and it has helped
> resolve a number of different difficult-to-debug bugs. As our Xe generation
> platforms are now out in the wild and no longer just internal, it is also
> proving important to have this facility available in upstream trees as well.
> And having it merged rather than floating around as random patches passed
> from person to person is far easier to manage and would also help reduce the
> internal tree burden.

Note that Xe needs to move over to devcoredump infrastructure, so if you
need dumping straight to dmesg that would be a patch for that subsystem in
the future.

Not sure how much you want to add fun here in the i915-gem dead end; I'll
leave that up to the i915 maintainers. Just figured this is a good place to
drop this aside :-)
-Daniel

>
> John.
>
> > >
> > > Signed-off-by: John Harrison <John.C.Harrison@xxxxxxxxx>
> > >
> > >
> > > John Harrison (2):
> > >   drm/i915: Dump error capture to kernel log
> > >   drm/i915/guc: Dump error capture to dmesg on CTB error
> > >
> > >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  53 +++++++++
> > >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |   6 +
> > >  drivers/gpu/drm/i915/i915_gpu_error.c     | 130 ++++++++++++++++++++++
> > >  drivers/gpu/drm/i915/i915_gpu_error.h     |   8 ++
> > >  4 files changed, 197 insertions(+)
> > >
> > > --
> > > 2.39.1
> > >
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch