On Thu, Mar 10, 2022 at 11:14 AM Sharma, Shashank <shashank.sharma@xxxxxxx> wrote: > > > > On 3/10/2022 7:33 PM, Abhinav Kumar wrote: > > > > > > On 3/10/2022 9:40 AM, Rob Clark wrote: > >> On Thu, Mar 10, 2022 at 9:19 AM Sharma, Shashank > >> <shashank.sharma@xxxxxxx> wrote: > >>> > >>> > >>> > >>> On 3/10/2022 6:10 PM, Rob Clark wrote: > >>>> On Thu, Mar 10, 2022 at 8:21 AM Sharma, Shashank > >>>> <shashank.sharma@xxxxxxx> wrote: > >>>>> > >>>>> > >>>>> > >>>>> On 3/10/2022 4:24 PM, Rob Clark wrote: > >>>>>> On Thu, Mar 10, 2022 at 1:55 AM Christian König > >>>>>> <ckoenig.leichtzumerken@xxxxxxxxx> wrote: > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> Am 09.03.22 um 19:12 schrieb Rob Clark: > >>>>>>>> On Tue, Mar 8, 2022 at 11:40 PM Shashank Sharma > >>>>>>>> <contactshashanksharma@xxxxxxxxx> wrote: > >>>>>>>>> From: Shashank Sharma <shashank.sharma@xxxxxxx> > >>>>>>>>> > >>>>>>>>> This patch adds a new sysfs event, which will indicate > >>>>>>>>> the userland about a GPU reset, and can also provide > >>>>>>>>> some information like: > >>>>>>>>> - process ID of the process involved with the GPU reset > >>>>>>>>> - process name of the involved process > >>>>>>>>> - the GPU status info (using flags) > >>>>>>>>> > >>>>>>>>> This patch also introduces the first flag of the flags > >>>>>>>>> bitmap, which can be appended as and when required. > >>>>>>>> Why invent something new, rather than using the already existing > >>>>>>>> devcoredump? > >>>>>>> > >>>>>>> Yeah, that's a really valid question. > >>>>>>> > >>>>>>>> I don't think we need (or should encourage/allow) something drm > >>>>>>>> specific when there is already an existing solution used by both > >>>>>>>> drm > >>>>>>>> and non-drm drivers. Userspace should not have to learn to support > >>>>>>>> yet another mechanism to do the same thing. > >>>>>>> > >>>>>>> Question is how is userspace notified about new available core > >>>>>>> dumps? > >>>>>> > >>>>>> I haven't looked into it too closely, as the CrOS userspace > >>>>>> crash-reporter already had support for devcoredump, so it "just > >>>>>> worked" out of the box[1]. I believe a udev event is what triggers > >>>>>> the crash-reporter to go read the devcore dump out of sysfs. > >>>>> > >>>>> I had a quick look at the devcoredump code, and it doesn't look like > >>>>> that is sending an event to the user, so we still need an event to > >>>>> indicate a GPU reset. > >>>> > >>>> There definitely is an event to userspace, I suspect somewhere down > >>>> the device_add() path? > >>>> > >>> > >>> Let me check that out as well, hope that is not due to a driver-private > >>> event for GPU reset, coz I think I have seen some of those in a few DRM > >>> drivers. > >>> > >> > >> Definitely no driver private event for drm/msm .. I haven't dug > >> through it all but this is the collector for devcoredump, triggered > >> somehow via udev. Most likely from event triggered by device_add() > >> > >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fchromium.googlesource.com%2Fchromiumos%2Fplatform2%2F%2B%2FHEAD%2Fcrash-reporter%2Fudev_collector.cc&data=04%7C01%7Cshashank.sharma%40amd.com%7C86146416b717420501fc08da02c4785b%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637825340130157925%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=LncI%2F5mIpeG1Avj2YXLmbZ5f1ONUfpf6TzJZH3%2Fs8%2Fw%3D&reserved=0 > >> > > > > Yes, that is correct. the uevent for devcoredump is from device_add() > > > Yes, I could confirm in the code that device_add() sends a uevent. > > kobject_uevent(&dev->kobj, KOBJ_ADD); > > I was trying to map the ChromiumOs's udev event rules with the event > being sent from device_add(), what I could see is there is only one udev > rule for any DRM subsystem events in ChromiumOs's 99-crash-reporter.rules: > > ACTION=="change", SUBSYSTEM=="drm", KERNEL=="card0", ENV{ERROR}=="1", \ > RUN+="/sbin/crash_reporter > --udev=KERNEL=card0:SUBSYSTEM=drm:ACTION=change" > > Can someone confirm that this is the rule which gets triggered when a > devcoredump is generated ? I could not find an ERROR=1 string in the > env[] while sending this event from dev_add(); I think it is actually this rule: ACTION=="add", SUBSYSTEM=="devcoredump", \ RUN+="/sbin/crash_reporter --udev=SUBSYSTEM=devcoredump:ACTION=add:KERNEL_NUMBER=%n" It is something non-drm specific because it supports devcore dumps from non drm drivers. I know at least some of the wifi and remoteproc drivers use it. BR, -R