Hi Hans, Apologies for the long delay. On 9/18/2023 5:57 PM, Hans de Goede wrote: > Hi Shyam, > > On 9/10/23 16:20, Shyam Sundar S K wrote: >> There have been instances when the default size (1M) of the STB is not >> sufficient to get the complete traces of the failure. In such scenarios >> we can use a module_param to enable full trace that shall contain more >> debugging data. This is not a regular case and hence not enabling this >> capability by default. >> >> With this change, there will be two cases on how the driver fetches the >> stb data: >> 1) A special case (proposed now) - which is required only for certain >> platforms. Here, a new module param will be supplied to the driver that >> will have a special PMFW supporting enhanced dram sizes for getting >> the stb data. Without the special PMFW support, just setting the module >> param will not help to get the enhanced stb data. >> >> 2) Current code branch which fetches the stb data based on the parameters >> like the num_samples, fsize and the r/w pointer. >> >> Co-developed-by: Harsh Jain <Harsh.Jain@xxxxxxx> >> Signed-off-by: Harsh Jain <Harsh.Jain@xxxxxxx> >> Signed-off-by: Sanket Goswami <Sanket.Goswami@xxxxxxx> >> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@xxxxxxx> >> --- >> v3->v4: >> - Update code branches and commit-msg as per Ilpo's remark. >> >> v2->v3: >> - no change >> >> v1->v2: >> - rebase to 'review-hans' branch >> - drop 2/4 of v1 >> (https://patchwork.kernel.org/project/platform-driver-x86/list/?series=775324&state=%2A&archive=both) >> >> drivers/platform/x86/amd/pmc/pmc.c | 43 +++++++++++++++++++++--------- >> drivers/platform/x86/amd/pmc/pmc.h | 1 + >> 2 files changed, 32 insertions(+), 12 deletions(-) >> >> diff --git a/drivers/platform/x86/amd/pmc/pmc.c b/drivers/platform/x86/amd/pmc/pmc.c >> index 443bb78ea5f4..7e907cb50787 100644 >> --- a/drivers/platform/x86/amd/pmc/pmc.c >> +++ b/drivers/platform/x86/amd/pmc/pmc.c >> @@ -53,6 +53,7 @@ >> >> /* STB Spill to DRAM Parameters */ >> #define S2D_TELEMETRY_BYTES_MAX 0x100000 >> +#define S2D_TELEMETRY_FSIZE_MAX 0x200000 >> #define S2D_TELEMETRY_DRAMBYTES_MAX 0x1000000 >> >> /* STB Spill to DRAM Message Definition */ >> @@ -160,6 +161,10 @@ static bool disable_workarounds; >> module_param(disable_workarounds, bool, 0644); >> MODULE_PARM_DESC(disable_workarounds, "Disable workarounds for platform bugs"); >> >> +static bool dump_custom_stb; >> +module_param(dump_custom_stb, bool, 0644); >> +MODULE_PARM_DESC(dump_custom_stb, "Enable to dump full STB buffer"); >> + >> static struct amd_pmc_dev pmc; >> static int amd_pmc_send_cmd(struct amd_pmc_dev *dev, u32 arg, u32 *data, u8 msg, bool ret); >> static int amd_pmc_read_stb(struct amd_pmc_dev *dev, u32 *buf); >> @@ -239,7 +244,7 @@ static const struct file_operations amd_pmc_stb_debugfs_fops = { >> static int amd_pmc_stb_debugfs_open_v2(struct inode *inode, struct file *filp) >> { >> struct amd_pmc_dev *dev = filp->f_inode->i_private; >> - u32 *buf, fsize, num_samples, val, stb_rdptr_offset = 0; >> + u32 *buf, num_samples, val, stb_rdptr_offset = 0; >> int ret; >> >> /* Write dummy postcode while reading the STB buffer */ >> @@ -247,10 +252,6 @@ static int amd_pmc_stb_debugfs_open_v2(struct inode *inode, struct file *filp) >> if (ret) >> dev_err(dev->dev, "error writing to STB: %d\n", ret); >> >> - buf = kzalloc(S2D_TELEMETRY_BYTES_MAX, GFP_KERNEL); >> - if (!buf) >> - return -ENOMEM; >> - >> /* Spill to DRAM num_samples uses separate SMU message port */ >> dev->msg_port = 1; >> >> @@ -264,20 +265,36 @@ static int amd_pmc_stb_debugfs_open_v2(struct inode *inode, struct file *filp) >> dev->msg_port = 0; >> if (ret) { >> dev_err(dev->dev, "error: S2D_NUM_SAMPLES not supported : %d\n", ret); >> - kfree(buf); >> return ret; >> } >> >> /* Start capturing data from the last push location */ >> - if (num_samples > S2D_TELEMETRY_BYTES_MAX) { >> - fsize = S2D_TELEMETRY_BYTES_MAX; >> - stb_rdptr_offset = num_samples - fsize; >> + if (dump_custom_stb && >> + (dev->dram_size - S2D_TELEMETRY_BYTES_MAX <= S2D_TELEMETRY_FSIZE_MAX)) { >> + /* >> + * we have a custom stb size and the PMFW is supposed to give >> + * the enhanced dram size. Note that we land here only for the >> + * platforms that support enhanced dram size reporting. >> + */ >> + dev->fsize = dev->dram_size - S2D_TELEMETRY_BYTES_MAX; >> + stb_rdptr_offset = 0; > > I don't understand this part. Why is num_samples not taken into account > anymore ? and why substract S2D_TELEMETRY_BYTES_MAX from dram_size ? Like I tried to capture this information in the commit-msg; this is a special case where the standard STB size reported by the SMU FW will not be useful to debug the failures. In those identified platforms, there be a custom SMU FW running and it will have the enhanced FW reporting capability. So, in those "identified" platforms, we don't need to look at the num_samples. That's an agreed protocol between the driver and FW for this case. not the entire DRAM size reported by the FW is usable, it has a reserved space of 1M. Hence we have to subtract that while accouting the dev->fsize here. > > > >> + } else if (num_samples > S2D_TELEMETRY_BYTES_MAX) { >> + /* >> + * This is for general cases, where the stb limits are meant for >> + * standard usage >> + */ >> + dev->fsize = S2D_TELEMETRY_BYTES_MAX; >> + stb_rdptr_offset = num_samples - dev->fsize; > > This assumes that num_samples is in the S2D_TELEMETRY_BYTES_MAX+1 .. 2*S2D_TELEMETRY_BYTES_MAX > rang, what if it is more ? > > I think that what you want here is: > > stb_rdptr_offset = num_samples % S2D_TELEMETRY_BYTES_MAX; No, this does not work. Let's take an example: dev->fsize = S2D_TELEMETRY_BYTES_MAX (i.e. 0x100000) num_samples = 0x8218b8 existing: stb_rdptr_offset = num_samples - dev->fsize; we will get => 0x7218b8 but if we do: stb_rdptr_offset = num_samples % S2D_TELEMETRY_BYTES_MAX; we will get => 0x21ea8 You can see that r/w pointer would get corrupted if we do % This is only one example and the same holds true for any other examples too. I spoke to our FW team too to confirm if the driver interpretation is as per expectation and their answer was YES. So I feel the current code is actually doing the right thing to calculate the r/w offset. Thanks, Shyam > > Note this is a pre-existing problem but I just noticed this now. > > Regards, > > Hans > >> } else { >> - fsize = num_samples; >> + dev->fsize = num_samples; >> stb_rdptr_offset = 0; >> } >> >> - memcpy_fromio(buf, dev->stb_virt_addr + stb_rdptr_offset, fsize); >> + buf = kzalloc(dev->fsize, GFP_KERNEL); >> + if (!buf) >> + return -ENOMEM; >> + >> + memcpy_fromio(buf, dev->stb_virt_addr + stb_rdptr_offset, dev->fsize); >> filp->private_data = buf; >> >> return 0; >> @@ -286,11 +303,13 @@ static int amd_pmc_stb_debugfs_open_v2(struct inode *inode, struct file *filp) >> static ssize_t amd_pmc_stb_debugfs_read_v2(struct file *filp, char __user *buf, size_t size, >> loff_t *pos) >> { >> + struct amd_pmc_dev *dev = filp->f_inode->i_private; >> + >> if (!filp->private_data) >> return -EINVAL; >> >> return simple_read_from_buffer(buf, size, pos, filp->private_data, >> - S2D_TELEMETRY_BYTES_MAX); >> + dev->fsize); >> } >> >> static int amd_pmc_stb_debugfs_release_v2(struct inode *inode, struct file *filp) >> diff --git a/drivers/platform/x86/amd/pmc/pmc.h b/drivers/platform/x86/amd/pmc/pmc.h >> index c27bd6a5642f..f73d265430b8 100644 >> --- a/drivers/platform/x86/amd/pmc/pmc.h >> +++ b/drivers/platform/x86/amd/pmc/pmc.h >> @@ -26,6 +26,7 @@ struct amd_pmc_dev { >> u32 dram_size; >> u32 num_ips; >> u32 s2d_msg_id; >> + u32 fsize; >> /* SMU version information */ >> u8 smu_program; >> u8 major; >