Re: [PATCH v2 1/2] drm: Add GPU reset sysfs event

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 3/10/2022 8:35 PM, Rob Clark wrote:
On Thu, Mar 10, 2022 at 11:14 AM Sharma, Shashank
<shashank.sharma@xxxxxxx> wrote:



On 3/10/2022 7:33 PM, Abhinav Kumar wrote:


On 3/10/2022 9:40 AM, Rob Clark wrote:
On Thu, Mar 10, 2022 at 9:19 AM Sharma, Shashank
<shashank.sharma@xxxxxxx> wrote:



On 3/10/2022 6:10 PM, Rob Clark wrote:
On Thu, Mar 10, 2022 at 8:21 AM Sharma, Shashank
<shashank.sharma@xxxxxxx> wrote:



On 3/10/2022 4:24 PM, Rob Clark wrote:
On Thu, Mar 10, 2022 at 1:55 AM Christian König
<ckoenig.leichtzumerken@xxxxxxxxx> wrote:



Am 09.03.22 um 19:12 schrieb Rob Clark:
On Tue, Mar 8, 2022 at 11:40 PM Shashank Sharma
<contactshashanksharma@xxxxxxxxx> wrote:
From: Shashank Sharma <shashank.sharma@xxxxxxx>

This patch adds a new sysfs event, which will indicate
the userland about a GPU reset, and can also provide
some information like:
- process ID of the process involved with the GPU reset
- process name of the involved process
- the GPU status info (using flags)

This patch also introduces the first flag of the flags
bitmap, which can be appended as and when required.
Why invent something new, rather than using the already existing
devcoredump?

Yeah, that's a really valid question.

I don't think we need (or should encourage/allow) something drm
specific when there is already an existing solution used by both
drm
and non-drm drivers.  Userspace should not have to learn to support
yet another mechanism to do the same thing.

Question is how is userspace notified about new available core
dumps?

I haven't looked into it too closely, as the CrOS userspace
crash-reporter already had support for devcoredump, so it "just
worked" out of the box[1].  I believe a udev event is what triggers
the crash-reporter to go read the devcore dump out of sysfs.

I had a quick look at the devcoredump code, and it doesn't look like
that is sending an event to the user, so we still need an event to
indicate a GPU reset.

There definitely is an event to userspace, I suspect somewhere down
the device_add() path?


Let me check that out as well, hope that is not due to a driver-private
event for GPU reset, coz I think I have seen some of those in a few DRM
drivers.


Definitely no driver private event for drm/msm .. I haven't dug
through it all but this is the collector for devcoredump, triggered
somehow via udev.  Most likely from event triggered by device_add()

https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fchromium.googlesource.com%2Fchromiumos%2Fplatform2%2F%2B%2FHEAD%2Fcrash-reporter%2Fudev_collector.cc&amp;data=04%7C01%7Cshashank.sharma%40amd.com%7Cb4e920f125ae4d7de29708da02cd3112%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637825377562005233%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=M4xHPErex4vn7l3lNPgniiMp%2BKb3SpOHQo2QLAndxDQ%3D&amp;reserved=0


Yes, that is correct. the uevent for devcoredump is from device_add()

Yes, I could confirm in the code that device_add() sends a uevent.

kobject_uevent(&dev->kobj, KOBJ_ADD);

I was trying to map the ChromiumOs's udev event rules with the event
being sent from device_add(), what I could see is there is only one udev
rule for any DRM subsystem events in ChromiumOs's 99-crash-reporter.rules:

ACTION=="change", SUBSYSTEM=="drm", KERNEL=="card0", ENV{ERROR}=="1", \
    RUN+="/sbin/crash_reporter
--udev=KERNEL=card0:SUBSYSTEM=drm:ACTION=change"

Can someone confirm that this is the rule which gets triggered when a
devcoredump is generated ? I could not find an ERROR=1 string in the
env[] while sending this event from dev_add();

I think it is actually this rule:

ACTION=="add", SUBSYSTEM=="devcoredump", \
   RUN+="/sbin/crash_reporter
--udev=SUBSYSTEM=devcoredump:ACTION=add:KERNEL_NUMBER=%n"

It is something non-drm specific because it supports devcore dumps
from non drm drivers.  I know at least some of the wifi and remoteproc
drivers use it.


Ah, this seems like a problem for me. I understand it will work for a reset/recovery app well, but if a DRM client (like a compositor), who wants to listen only to DRM events (like a GPU reset), wouldn't this create a lot of noise for it ? Like every time any subsystem produces this coredump, there will be a change in devcoresump subsystem, and the client will have to parse the core file, and then will have to decide if it wants to react to this, or ignore.

Wouldn't a GPU reset event, specific to DRM subsystem server better in such case ?

- Shashank

BR,
-R



[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux