On Mon, Aug 21, 2023 at 8:00 PM Limonciello, Mario
<mario.limonciello@xxxxxxx> wrote:
>
>
>
> On 8/21/2023 12:52 PM, Rafael J. Wysocki wrote:
> > On Mon, Aug 21, 2023 at 7:35 PM Limonciello, Mario
> > <mario.limonciello@xxxxxxx> wrote:
> >>
> >>
> >>
> >> On 8/21/2023 12:29 PM, Rafael J. Wysocki wrote:
> >>> On Mon, Aug 21, 2023 at 7:17 PM Limonciello, Mario
> >>> <mario.limonciello@xxxxxxx> wrote:
> >>>>
> >>>> On 8/21/2023 12:12 PM, Rafael J. Wysocki wrote:
> >>>> <snip>
> >>>>>> I was just talking to some colleagues about PHAT recently as well.
> >>>>>>
> >>>>>> The use case that jumps out is "system randomly rebooted while I was
> >>>>>> doing XYZ". You don't know what happened, but you keep using your
> >>>>>> system. Then it happens again.
> >>>>>>
> >>>>>> If the reason for the random reboot is captured to dmesg you can cross
> >>>>>> reference your journal from the next boot after any random reboot and
> >>>>>> get the reason for it. If a user reports this to a Gitlab issue tracker
> >>>>>> or Bugzilla it can be helpful in establishing a pattern.
> >>>>>>
> >>>>>>>> The below location may be appropriate in that case:
> >>>>>>>> /sys/firmware/acpi/
> >>>>>>>
> >>>>>>> Yes, it may.
> >>>>>>>
> >>>>>>>> We already have FPDT and BGRT being exported from there.
> >>>>>>>
> >>>>>>> In fact, all of the ACPI tables can be retrieved verbatim from
> >>>>>>> /sys/firmware/acpi/tables/ already, so why exactly do you want the
> >>>>>>> kernel to parse PHAT in particular?
> >>>>>>>
> >>>>>>
> >>>>>> It's not to say that /sys/firmware/acpi/PHAT isn't useful, but having
> >>>>>> something internal to the kernel "automatically" parsing it and saving
> >>>>>> information to a place like the kernel log that is already captured by
> >>>>>> existing userspace tools I think is "more" useful.
> >>>>>
> >>>>> What existing user space tools do you mean? Is there anything already
> >>>>> making use of the kernel's PHAT output?
> >>>>>
> >>>>
> >>>> I was meaning things like systemd already capture the kernel log
> >>>> ringbuffer. If you save stuff like this into the kernel log, it's going
> >>>> to be indexed and easier to grep for boots that had it.
> >>>>
> >>>>> And why can't user space simply parse PHAT by itself?
> >>>>>
> >>>>> There are multiple ACPI tables that could be dumped into the kernel
> >>>>> log, but they aren't. Guess why.
> >>>>
> >>>> Right; there's no reason it can't be done by userspace directly.
> >>>>
> >>>> Another way to approach this problem could be to modify tools that
> >>>> excavate records from a reboot to also get PHAT. For example
> >>>> systemd-pstore will get any kernel panics from the previous boot from
> >>>> the EFI pstore and put them into /var/lib/systemd/pstore.
> >>>>
> >>>> No reason that couldn't be done automatically for PHAT too.
> >>>
> >>> I'm not sure about the connection between the PHAT dump in the kernel
> >>> log and pstore.
> >>>
> >>> The PHAT dump would be from the time before the failure, so it is
> >>> unclear to me how useful it can be for diagnosing it. However, after
> >>> a reboot one should be able to retrieve PHAT data from the table
> >>> directly and that may include some information regarding the failure.
> >>
> >> Right so the thought is that at bootup you get the last entry from PHAT
> >> and save that into the log.
> >>
> >> Let's say you have 3 boots:
> >> X - Triggered a random reboot
> >> Y - Cleanly shut down
> >> Z - Boot after a clean shut down
> >>
> >> So on boot Y you would have in your logs the reason that boot X rebooted.
> >
> > Yes, and the same can be retrieved from the PHAT directly from user
> > space at that time, can't it?
>
> Yes it can.
>
> >
> >> On boot Z you would see something about boot Y's reason.
> >>
> >>>
> >>> With pstore, the assumption is that there will be some information
> >>> relevant for diagnosing the failure in the kernel buffer, but I'm not
> >>> sure how the PHAT dump from before the failure can help here?
> >>
> >> Alone it's not useful.
> >> I had figured if you can put it together with other data it's useful.
> >> For example if you had some thermal data in the logs showing which
> >> component overheated or if you looked at pstore and found a NULL pointer
> >> dereference.
> >
> > IIUC, the current PHAT content can be useful. The PHAT content from
> > boot X (before the failure) which is what will be there in pstore
> > after the random reboot, is of limited value AFAICS.
>
> Right, you would need to have the pstore logs from your bad boot and
> then the dmesg from your current (good) boot to get the info. And
> you're right at that point you could just run a userspace tool that gets
> the info instead.

And it will get the information from the source without any (arguably
redundant) intermediate processing (which may introduce noise into it).

> I don't think any of this is necessary in the kernel, I just am
> describing the use case.
>
> FWIW on the patch series IMO I think that the boots that don't show
> useful unexpected things (power button, cold boot, warm boot, cold
> reset) shouldn't be INFO either. I think these should default to debug,
> and just the unexpected ones should show up.

I would still prefer user space to deal with this as it sees fit.
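
To illustrate the user-space route discussed above, here is a minimal sketch of a tool that takes the raw PHAT bytes (for example, read as root from /sys/firmware/acpi/tables/PHAT) and walks its records. It is not taken from the patch series. The 36-byte header is the standard ACPI table header; the 5-byte record header (type, length, revision) is an assumption based on my reading of the PHAT description in the ACPI specification, and the names parse_phat, ACPI_HEADER and RECORD_HEADER are made up for this example. Record payloads are returned undecoded.

```python
import struct

# Standard ACPI table header (36 bytes, little-endian):
# signature, length, revision, checksum, OEM ID, OEM table ID,
# OEM revision, creator ID, creator revision.
ACPI_HEADER = struct.Struct("<4sIBB6s8sI4sI")

# Assumed PHAT record header: record type (u16), record length (u16),
# revision (u8). The length is taken to include this 5-byte header.
RECORD_HEADER = struct.Struct("<HHB")


def parse_phat(data: bytes) -> dict:
    """Split a raw PHAT blob into its header fields and raw records."""
    if len(data) < ACPI_HEADER.size:
        raise ValueError("table is shorter than an ACPI header")

    (signature, length, revision, _checksum, oem_id, oem_table_id,
     _oem_rev, _creator_id, _creator_rev) = ACPI_HEADER.unpack_from(data, 0)

    if signature != b"PHAT":
        raise ValueError("not a PHAT table: %r" % signature)
    if length < ACPI_HEADER.size or length > len(data):
        raise ValueError("table length field is inconsistent with the data")

    table = data[:length]
    records = []
    offset = ACPI_HEADER.size
    while offset + RECORD_HEADER.size <= length:
        rtype, rlen, rrev = RECORD_HEADER.unpack_from(table, offset)
        # Stop on a malformed record rather than looping forever.
        if rlen < RECORD_HEADER.size or offset + rlen > length:
            break
        payload = table[offset + RECORD_HEADER.size:offset + rlen]
        records.append((rtype, rrev, payload))
        offset += rlen

    return {
        "revision": revision,
        "oem_id": oem_id.decode("ascii", "replace").strip(),
        "oem_table_id": oem_table_id.decode("ascii", "replace").strip(),
        # All bytes of a valid ACPI table sum to zero modulo 256.
        "checksum_ok": sum(table) % 256 == 0,
        "records": records,
    }
```

A caller would read the file in binary mode and pass the bytes to parse_phat(). What it does with the records afterwards (log them, attach them to a bug report, store them next to the pstore dumps) is left to user space.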