Re: [PATCH 4/4] drm/xe/guc: Expose raw access to GuC log over debugfs

Michal Wajdeczko <michal.wajdeczko@xxxxxxxxx> · Tue, 14 May 2024 23:15:55 +0200

On 14.05.2024 22:31, Matthew Brost wrote:
> On Tue, May 14, 2024 at 11:13:14AM -0700, John Harrison wrote:
>> On 5/14/2024 07:58, Michal Wajdeczko wrote:
>>> On 13.05.2024 18:53, John Harrison wrote:
>>>> On 5/12/2024 08:36, Michal Wajdeczko wrote:
>>>>> We already provide the content of the GuC log in debugsfs, but it
>>>>> is in a text format where each log dword is printed as hexadecimal
>>>>> number, which does not scale well with large GuC log buffers.
>>>>>
>>>>> To allow more efficient access to the GuC log, which could benefit
>>>>> our CI systems, expose raw binary log data.  In addition to less
>>>>> overhead in preparing text based GuC log file, the new GuC log file
>>>>> in binary format is also almost 3x smaller.
>>>>>
>>>>> Any existing script that expects the GuC log buffer in text format
>>>>> can use command like below to convert from new binary format:
>>>>>
>>>>>      hexdump -e '4/4 "0x%08x " "\n"'
>>>>>
>>>>> but this shouldn't be the case as most decoders expect GuC log data
>>>>> in binary format.
>>>> I strongly disagree with this.
>>>>
>>>> Efficiency and file size is not an issue when accessing the GuC log via
>>>> debugfs on actual hardware.
>>> to some extend it is as CI team used to refuse to collect GuC logs after
>>> each executed test just because of it's size
>> I've never heard that argument. I've heard many different arguments but not
>> one about file size. The default GuC log size is pretty tiny. So size really
>> is not an issue.
>>
>>>
>>>> It is an issue when dumping via dmesg but
>>>> you definitely should not be dumping binary data to dmesg. Whereas,
>>> not following here - this is debugfs specific, not a dmesg printer
>> Except that it is preferable to have common code for both if at all
>> possible.
>>
>>>
>>>> dumping in binary data is much more dangerous and liable to corruption
>>>> because some tool along the way tries to convert to ASCII, or truncates
>>>> at the first zero, etc. We request GuC logs be sent by end users,
>>>> customer bug reports, etc. all doing things that we have no control over.
>>> hmm, how "cp gt0/uc/guc_log_raw FILE" could end with a corrupted file ?
>> Because someone then tries to email it, or attach it or copy it via Windows
>> or any number of other ways in which a file can get munged.
>>
>>>
>>>> Converting the hexdump back to binary is trivial for those tools which
>>>> require it. If you follow the acquisition and decoding instructions on
>>>> the wiki page then it is all done for you automatically.
>>> I'm afraid I don't know where this wiki page is, but I do know that hex
>>> conversion dance is not needed for me to get decoded GuC log the way I
>>> used to do
>> Look for the 'GuC Debug Logs' page on the developer wiki. It's pretty easy
>> to find.
>>
>>>> These patches are trying to solve a problem which does not exist and are
>>>> going to make working with GuC logs harder and more error prone.
>>> it at least solves the problem of currently super inefficient way of
>>> generating the GuC log in text format.
>>>
>>> it also opens other opportunities to develop tools that could monitor or
>>> capture GuC log independently on  top of what driver is able to offer
>>> today (on i915 there was guc-log-relay, but it was broken for long time,
>>> not sure what are the plans for Xe)
>>>
>>> also still not sure how it can be more error prone.
>> As already explained, the plan is move to LFD - an extensible, streamable,
>> logging format. Any non-trivial effort that is not helping to move to LFD is
>> not worth the effort.
>>
>>>
>>>> On the other hand, there are many other issues with GuC logs that it
>>>> would be useful to solves - including extra meta data, reliable output
>>>> via dmesg, continuous streaming, pre-sizing the debugfs file to not have
>>>> to generate it ~12 times for a single read, etc.
>>> this series actually solves last issue but in a bit different way (we
>>> even don't need to generate full GuC log dump at all if we would like to
>>> capture only part of the log if we know where to look)
>> No, it doesn't solve it. Your comment below suggests it will be read in 4KB
>> chunks. Which means your 16MB buffer now requires 4096 separate reads! And
>> you only doing partial reads of the section you think you need is never
>> going to be reliable on live system. Not sure why you would want to anyway.
>> It is just making things much more complex. You now need an intelligent user
>> land program to read the log out and decode at least the header section to
>> know what data section to read. You can't just dump the whole thing with
>> 'cat' or 'dd'.
>>
> 
> Briefly have read this thread but seconding John's opinion that anything
> which breaks GuC log collection via a simple command like cat is a no

hexdump or cp is also a simple command and likely we can assume that
either user will know what command to use or will just type the command
that we say.

> go. Also anything that can allow windows to mangle the output would be
> less than idle too. Lastly, GuC log collection is not a critical path
> operation so it likely does not need to optimized for speed.

but please remember that this patch does not change anything to the
existing debugfs files, the guc_log stays as is, this new raw access is
defined as another guc_log_raw file that would allow develop other use
cases, beyond what is possible with naive text snapshots, like live
monitor for errors, all implemented above kernel driver

> 
> Matt
> 
>>>
>>> for reliable output via dmesg - see my proposal at [1]
>>>
>>> [1] https://patchwork.freedesktop.org/series/133613/
>>
>>>
>>>> Hmm. Actually, is this interface allowing the filesystem layers to issue
>>>> multiple read calls to read the buffer out in small chunks? That is also
>>>> going to break things. If the GuC is still writing to the log as the
>>>> user is reading from it, there is the opportunity for each chunk to not
>>>> follow on from the previous chunk because the data has just been
>>>> overwritten. This is already a problem at the moment that causes issues
>>>> when decoding the logs, even with an almost atomic copy of the log into
>>>> a temporary buffer before reading it out. Doing the read in separate
>>>> chunks is only going to make that problem even worse.
>>> current solution, that converts data into hex numbers, reads log buffer
>>> in chunks of 128 dwords, how proposed here solution that reads in 4K
>>> chunks could be "even worse" ?
>> See above, 4KB chunks means 4096 separate reads for a 16M buffer. And each
>> one of those reads is a full round trip to user land and back. If you want
>> to get at all close to an atomic read of the log then it needs to be done as
>> a single call that copies the log into a locally allocated kernel buffer and
>> then allows user land to read out from that buffer rather than from the live
>> log. Which can be trivially done with the current method (at the expense of
>> a large memory allocation) but would be much more difficult with random
>> access reader like this as you would need to say the copied buffer around
>> until the reads have all been done. Which would presumably mean adding
>> open/close handlers to allocate and free that memory.
>>
>>>
>>> and in case of some smart tool, that would understands the layout of the
>>> GuC log buffer, we can even fully eliminate problem of reading stale
>>> data, so why not to choose a more scalable solution ?
>> You cannot eliminate the problem of stale data. You read the header, you
>> read the data it was pointing to, you re-read the header and find that the
>> GuC has moved on. That is an infinite loop of continuously updating
>> pointers.
>>
>> John.
>>
>>>
>>>> John.
>>>>
>>>>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@xxxxxxxxx>
>>>>> Cc: Lucas De Marchi <lucas.demarchi@xxxxxxxxx>
>>>>> Cc: John Harrison <John.C.Harrison@xxxxxxxxx>
>>>>> ---
>>>>> Cc: linux-fsdevel@xxxxxxxxxxxxxxx
>>>>> Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx
>>>>> ---
>>>>>    drivers/gpu/drm/xe/xe_guc_debugfs.c | 26 ++++++++++++++++++++++++++
>>>>>    1 file changed, 26 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/xe/xe_guc_debugfs.c
>>>>> b/drivers/gpu/drm/xe/xe_guc_debugfs.c
>>>>> index d3822cbea273..53fea952344d 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_guc_debugfs.c
>>>>> +++ b/drivers/gpu/drm/xe/xe_guc_debugfs.c
>>>>> @@ -8,6 +8,7 @@
>>>>>    #include <drm/drm_debugfs.h>
>>>>>    #include <drm/drm_managed.h>
>>>>>    +#include "xe_bo.h"
>>>>>    #include "xe_device.h"
>>>>>    #include "xe_gt.h"
>>>>>    #include "xe_guc.h"
>>>>> @@ -52,6 +53,29 @@ static const struct drm_info_list debugfs_list[] = {
>>>>>        {"guc_log", guc_log, 0},
>>>>>    };
>>>>>    +static ssize_t guc_log_read(struct file *file, char __user *buf,
>>>>> size_t count, loff_t *pos)
>>>>> +{
>>>>> +    struct dentry *dent = file_dentry(file);
>>>>> +    struct dentry *uc_dent = dent->d_parent;
>>>>> +    struct dentry *gt_dent = uc_dent->d_parent;
>>>>> +    struct xe_gt *gt = gt_dent->d_inode->i_private;
>>>>> +    struct xe_guc_log *log = &gt->uc.guc.log;
>>>>> +    struct xe_device *xe = gt_to_xe(gt);
>>>>> +    ssize_t ret;
>>>>> +
>>>>> +    xe_pm_runtime_get(xe);
>>>>> +    ret = xe_map_read_from(xe, buf, count, pos, &log->bo->vmap,
>>>>> log->bo->size);
>>>>> +    xe_pm_runtime_put(xe);
>>>>> +
>>>>> +    return ret;
>>>>> +}
>>>>> +
>>>>> +static const struct file_operations guc_log_ops = {
>>>>> +    .owner        = THIS_MODULE,
>>>>> +    .read        = guc_log_read,
>>>>> +    .llseek        = default_llseek,
>>>>> +};
>>>>> +
>>>>>    void xe_guc_debugfs_register(struct xe_guc *guc, struct dentry *parent)
>>>>>    {
>>>>>        struct drm_minor *minor = guc_to_xe(guc)->drm.primary;
>>>>> @@ -72,4 +96,6 @@ void xe_guc_debugfs_register(struct xe_guc *guc,
>>>>> struct dentry *parent)
>>>>>        drm_debugfs_create_files(local,
>>>>>                     ARRAY_SIZE(debugfs_list),
>>>>>                     parent, minor);
>>>>> +
>>>>> +    debugfs_create_file("guc_log_raw", 0600, parent, NULL,
>>>>> &guc_log_ops);
>>>>>    }
>>