On Tue, May 14, 2024 at 11:13:14AM -0700, John Harrison wrote: > On 5/14/2024 07:58, Michal Wajdeczko wrote: > > On 13.05.2024 18:53, John Harrison wrote: > > > On 5/12/2024 08:36, Michal Wajdeczko wrote: > > > > We already provide the content of the GuC log in debugsfs, but it > > > > is in a text format where each log dword is printed as hexadecimal > > > > number, which does not scale well with large GuC log buffers. > > > > > > > > To allow more efficient access to the GuC log, which could benefit > > > > our CI systems, expose raw binary log data. In addition to less > > > > overhead in preparing text based GuC log file, the new GuC log file > > > > in binary format is also almost 3x smaller. > > > > > > > > Any existing script that expects the GuC log buffer in text format > > > > can use command like below to convert from new binary format: > > > > > > > > hexdump -e '4/4 "0x%08x " "\n"' > > > > > > > > but this shouldn't be the case as most decoders expect GuC log data > > > > in binary format. > > > I strongly disagree with this. > > > > > > Efficiency and file size is not an issue when accessing the GuC log via > > > debugfs on actual hardware. > > to some extend it is as CI team used to refuse to collect GuC logs after > > each executed test just because of it's size > I've never heard that argument. I've heard many different arguments but not > one about file size. The default GuC log size is pretty tiny. So size really > is not an issue. > > > > > > It is an issue when dumping via dmesg but > > > you definitely should not be dumping binary data to dmesg. Whereas, > > not following here - this is debugfs specific, not a dmesg printer > Except that it is preferable to have common code for both if at all > possible. > > > > > > dumping in binary data is much more dangerous and liable to corruption > > > because some tool along the way tries to convert to ASCII, or truncates > > > at the first zero, etc. We request GuC logs be sent by end users, > > > customer bug reports, etc. all doing things that we have no control over. > > hmm, how "cp gt0/uc/guc_log_raw FILE" could end with a corrupted file ? > Because someone then tries to email it, or attach it or copy it via Windows > or any number of other ways in which a file can get munged. > > > > > > Converting the hexdump back to binary is trivial for those tools which > > > require it. If you follow the acquisition and decoding instructions on > > > the wiki page then it is all done for you automatically. > > I'm afraid I don't know where this wiki page is, but I do know that hex > > conversion dance is not needed for me to get decoded GuC log the way I > > used to do > Look for the 'GuC Debug Logs' page on the developer wiki. It's pretty easy > to find. > > > > These patches are trying to solve a problem which does not exist and are > > > going to make working with GuC logs harder and more error prone. > > it at least solves the problem of currently super inefficient way of > > generating the GuC log in text format. > > > > it also opens other opportunities to develop tools that could monitor or > > capture GuC log independently on top of what driver is able to offer > > today (on i915 there was guc-log-relay, but it was broken for long time, > > not sure what are the plans for Xe) > > > > also still not sure how it can be more error prone. > As already explained, the plan is move to LFD - an extensible, streamable, > logging format. Any non-trivial effort that is not helping to move to LFD is > not worth the effort. > > > > > > On the other hand, there are many other issues with GuC logs that it > > > would be useful to solves - including extra meta data, reliable output > > > via dmesg, continuous streaming, pre-sizing the debugfs file to not have > > > to generate it ~12 times for a single read, etc. > > this series actually solves last issue but in a bit different way (we > > even don't need to generate full GuC log dump at all if we would like to > > capture only part of the log if we know where to look) > No, it doesn't solve it. Your comment below suggests it will be read in 4KB > chunks. Which means your 16MB buffer now requires 4096 separate reads! And > you only doing partial reads of the section you think you need is never > going to be reliable on live system. Not sure why you would want to anyway. > It is just making things much more complex. You now need an intelligent user > land program to read the log out and decode at least the header section to > know what data section to read. You can't just dump the whole thing with > 'cat' or 'dd'. > Briefly have read this thread but seconding John's opinion that anything which breaks GuC log collection via a simple command like cat is a no go. Also anything that can allow windows to mangle the output would be less than idle too. Lastly, GuC log collection is not a critical path operation so it likely does not need to optimized for speed. Matt > > > > for reliable output via dmesg - see my proposal at [1] > > > > [1] https://patchwork.freedesktop.org/series/133613/ > > > > > > Hmm. Actually, is this interface allowing the filesystem layers to issue > > > multiple read calls to read the buffer out in small chunks? That is also > > > going to break things. If the GuC is still writing to the log as the > > > user is reading from it, there is the opportunity for each chunk to not > > > follow on from the previous chunk because the data has just been > > > overwritten. This is already a problem at the moment that causes issues > > > when decoding the logs, even with an almost atomic copy of the log into > > > a temporary buffer before reading it out. Doing the read in separate > > > chunks is only going to make that problem even worse. > > current solution, that converts data into hex numbers, reads log buffer > > in chunks of 128 dwords, how proposed here solution that reads in 4K > > chunks could be "even worse" ? > See above, 4KB chunks means 4096 separate reads for a 16M buffer. And each > one of those reads is a full round trip to user land and back. If you want > to get at all close to an atomic read of the log then it needs to be done as > a single call that copies the log into a locally allocated kernel buffer and > then allows user land to read out from that buffer rather than from the live > log. Which can be trivially done with the current method (at the expense of > a large memory allocation) but would be much more difficult with random > access reader like this as you would need to say the copied buffer around > until the reads have all been done. Which would presumably mean adding > open/close handlers to allocate and free that memory. > > > > > and in case of some smart tool, that would understands the layout of the > > GuC log buffer, we can even fully eliminate problem of reading stale > > data, so why not to choose a more scalable solution ? > You cannot eliminate the problem of stale data. You read the header, you > read the data it was pointing to, you re-read the header and find that the > GuC has moved on. That is an infinite loop of continuously updating > pointers. > > John. > > > > > > John. > > > > > > > Signed-off-by: Michal Wajdeczko <michal.wajdeczko@xxxxxxxxx> > > > > Cc: Lucas De Marchi <lucas.demarchi@xxxxxxxxx> > > > > Cc: John Harrison <John.C.Harrison@xxxxxxxxx> > > > > --- > > > > Cc: linux-fsdevel@xxxxxxxxxxxxxxx > > > > Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx > > > > --- > > > > drivers/gpu/drm/xe/xe_guc_debugfs.c | 26 ++++++++++++++++++++++++++ > > > > 1 file changed, 26 insertions(+) > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_guc_debugfs.c > > > > b/drivers/gpu/drm/xe/xe_guc_debugfs.c > > > > index d3822cbea273..53fea952344d 100644 > > > > --- a/drivers/gpu/drm/xe/xe_guc_debugfs.c > > > > +++ b/drivers/gpu/drm/xe/xe_guc_debugfs.c > > > > @@ -8,6 +8,7 @@ > > > > #include <drm/drm_debugfs.h> > > > > #include <drm/drm_managed.h> > > > > +#include "xe_bo.h" > > > > #include "xe_device.h" > > > > #include "xe_gt.h" > > > > #include "xe_guc.h" > > > > @@ -52,6 +53,29 @@ static const struct drm_info_list debugfs_list[] = { > > > > {"guc_log", guc_log, 0}, > > > > }; > > > > +static ssize_t guc_log_read(struct file *file, char __user *buf, > > > > size_t count, loff_t *pos) > > > > +{ > > > > + struct dentry *dent = file_dentry(file); > > > > + struct dentry *uc_dent = dent->d_parent; > > > > + struct dentry *gt_dent = uc_dent->d_parent; > > > > + struct xe_gt *gt = gt_dent->d_inode->i_private; > > > > + struct xe_guc_log *log = >->uc.guc.log; > > > > + struct xe_device *xe = gt_to_xe(gt); > > > > + ssize_t ret; > > > > + > > > > + xe_pm_runtime_get(xe); > > > > + ret = xe_map_read_from(xe, buf, count, pos, &log->bo->vmap, > > > > log->bo->size); > > > > + xe_pm_runtime_put(xe); > > > > + > > > > + return ret; > > > > +} > > > > + > > > > +static const struct file_operations guc_log_ops = { > > > > + .owner = THIS_MODULE, > > > > + .read = guc_log_read, > > > > + .llseek = default_llseek, > > > > +}; > > > > + > > > > void xe_guc_debugfs_register(struct xe_guc *guc, struct dentry *parent) > > > > { > > > > struct drm_minor *minor = guc_to_xe(guc)->drm.primary; > > > > @@ -72,4 +96,6 @@ void xe_guc_debugfs_register(struct xe_guc *guc, > > > > struct dentry *parent) > > > > drm_debugfs_create_files(local, > > > > ARRAY_SIZE(debugfs_list), > > > > parent, minor); > > > > + > > > > + debugfs_create_file("guc_log_raw", 0600, parent, NULL, > > > > &guc_log_ops); > > > > } >