On 8/30/2023 3:39 PM, Ilpo Järvinen wrote: > On Wed, 30 Aug 2023, Shyam Sundar S K wrote: >> On 8/29/2023 8:00 PM, Ilpo Järvinen wrote: >>> On Tue, 29 Aug 2023, Shyam Sundar S K wrote: >>>> On 8/29/2023 4:49 PM, Ilpo Järvinen wrote: >>>>> On Tue, 29 Aug 2023, Shyam Sundar S K wrote: >>>>>> On 8/29/2023 3:09 PM, Ilpo Järvinen wrote: >>>>>>> On Tue, 29 Aug 2023, Shyam Sundar S K wrote: >>>>>>> >>>>>>>> There have been instances when the default size (1M) of the STB is not >>>>>>>> sufficient to get the complete traces of the failure. In such scenarios >>>>>>>> we can use a module_param to enable full trace that shall contain more >>>>>>>> debugging data. This is not a regular case and hence not enabling this >>>>>>>> capability by default. >>>>>>>> >>>>>>>> Co-developed-by: Harsh Jain <Harsh.Jain@xxxxxxx> >>>>>>>> Signed-off-by: Harsh Jain <Harsh.Jain@xxxxxxx> >>>>>>>> Signed-off-by: Sanket Goswami <Sanket.Goswami@xxxxxxx> >>>>>>>> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@xxxxxxx> >>>>>>>> --- >>>>>>>> v2->v3: >>>>>>>> - no change >>>>>>>> >>>>>>>> v1->v2: >>>>>>>> - rebase to 'review-hans' branch >>>>>>>> - drop 2/4 of v1 >>>>>>>> (https://patchwork.kernel.org/project/platform-driver-x86/list/?series=775324&state=%2A&archive=both) >>>>>>>> >>>>>>>> drivers/platform/x86/amd/pmc/pmc.c | 34 +++++++++++++++++++----------- >>>>>>>> drivers/platform/x86/amd/pmc/pmc.h | 1 + >>>>>>>> 2 files changed, 23 insertions(+), 12 deletions(-) >>>>>>>> >>>>>>>> diff --git a/drivers/platform/x86/amd/pmc/pmc.c b/drivers/platform/x86/amd/pmc/pmc.c >>>>>>>> index 443bb78ea5f4..0e61ae74f1a9 100644 >>>>>>>> --- a/drivers/platform/x86/amd/pmc/pmc.c >>>>>>>> +++ b/drivers/platform/x86/amd/pmc/pmc.c >>>>>>>> @@ -53,6 +53,7 @@ >>>>>>>> >>>>>>>> /* STB Spill to DRAM Parameters */ >>>>>>>> #define S2D_TELEMETRY_BYTES_MAX 0x100000 >>>>>>>> +#define S2D_TELEMETRY_FSIZE_MAX 0x200000 >>>>>>>> #define S2D_TELEMETRY_DRAMBYTES_MAX 0x1000000 >>>>>>>> >>>>>>>> /* STB Spill to DRAM Message Definition */ >>>>>>>> @@ -160,6 +161,10 @@ static bool disable_workarounds; >>>>>>>> module_param(disable_workarounds, bool, 0644); >>>>>>>> MODULE_PARM_DESC(disable_workarounds, "Disable workarounds for platform bugs"); >>>>>>>> >>>>>>>> +static bool dump_custom_stb; >>>>>>>> +module_param(dump_custom_stb, bool, 0644); >>>>>>>> +MODULE_PARM_DESC(dump_custom_stb, "Enable to dump full STB buffer"); >>>>>>>> + >>>>>>>> static struct amd_pmc_dev pmc; >>>>>>>> static int amd_pmc_send_cmd(struct amd_pmc_dev *dev, u32 arg, u32 *data, u8 msg, bool ret); >>>>>>>> static int amd_pmc_read_stb(struct amd_pmc_dev *dev, u32 *buf); >>>>>>>> @@ -239,7 +244,7 @@ static const struct file_operations amd_pmc_stb_debugfs_fops = { >>>>>>>> static int amd_pmc_stb_debugfs_open_v2(struct inode *inode, struct file *filp) >>>>>>>> { >>>>>>>> struct amd_pmc_dev *dev = filp->f_inode->i_private; >>>>>>>> - u32 *buf, fsize, num_samples, val, stb_rdptr_offset = 0; >>>>>>>> + u32 *buf, num_samples, val, stb_rdptr_offset = 0; >>>>>>>> int ret; >>>>>>>> >>>>>>>> /* Write dummy postcode while reading the STB buffer */ >>>>>>>> @@ -247,10 +252,6 @@ static int amd_pmc_stb_debugfs_open_v2(struct inode *inode, struct file *filp) >>>>>>>> if (ret) >>>>>>>> dev_err(dev->dev, "error writing to STB: %d\n", ret); >>>>>>>> >>>>>>>> - buf = kzalloc(S2D_TELEMETRY_BYTES_MAX, GFP_KERNEL); >>>>>>>> - if (!buf) >>>>>>>> - return -ENOMEM; >>>>>>>> - >>>>>>>> /* Spill to DRAM num_samples uses separate SMU message port */ >>>>>>>> dev->msg_port = 1; >>>>>>>> >>>>>>>> @@ -264,20 +265,27 @@ static int amd_pmc_stb_debugfs_open_v2(struct inode *inode, struct file *filp) >>>>>>>> dev->msg_port = 0; >>>>>>>> if (ret) { >>>>>>>> dev_err(dev->dev, "error: S2D_NUM_SAMPLES not supported : %d\n", ret); >>>>>>>> - kfree(buf); >>>>>>>> return ret; >>>>>>>> } >>>>>>>> >>>>>>>> /* Start capturing data from the last push location */ >>>>>>>> - if (num_samples > S2D_TELEMETRY_BYTES_MAX) { >>>>>>>> - fsize = S2D_TELEMETRY_BYTES_MAX; >>>>>>>> - stb_rdptr_offset = num_samples - fsize; >>>>>>>> + if (dump_custom_stb && >>>>>>>> + (dev->dram_size - S2D_TELEMETRY_BYTES_MAX <= S2D_TELEMETRY_FSIZE_MAX)) { >>>>>>>> + dev->fsize = dev->dram_size - S2D_TELEMETRY_BYTES_MAX; >>>>>>>> + stb_rdptr_offset = 0; >>>>>>>> + } else if (num_samples > S2D_TELEMETRY_BYTES_MAX) { >>>>>>> >>>>>>> I find the logic here somewhat suspicious in the first if check (I fail >>>>>>> to follow the reasoning here despite spending considerable amount of time >>>>>>> staring at it). >>>>>>> >>>>>>> Lets assume dump_custom_stb is set. If dev->dram_size is large, the first >>>>>>> if condition will be false because LHS of <= is large. Thus, we go to the >>>>>>> second branch and dump exactly as many samples as before this patch. >>>>>>> ...And that is opposite of what this patch claims to do? >>>>>>> >>>>>> >>>>>> Let me explain: >>>>>> >>>>>> We have cases where the S2D (Spill to DRAM) STB (Smart Trace Buffer - >>>>>> which is meant to have debug data that gives information about the >>>>>> system conditions, be it success or failure.) may not fit in the current >>>>>> stb buffer size of 1M (i.e. S2D_TELEMETRY_BYTES_MAX). The stb buffer >>>>>> size of 1M is sufficient for most of the platforms, but there could be >>>>>> exceptions. >>>>>> >>>>>> In those cases, the dram_size would be updated by the PMFW (Power >>>>>> Management Firmware) coupled with "dump_custom_stb" to fit into a custom >>>>>> requirement. >>>>>> >>>>>> As an example (with this proposed change): >>>>>> >>>>>> when dump_custom_stb is SET and dev->dram_size is 3M (received from >>>>>> PMFW) the first if() will be successful and it takes the corresponding >>>>>> code path. >>>>> >>>>> Heh, I can see the numbers work in that case, however, the above still >>>>> doesn't really explain why only S2D_TELEMETRY_BYTES_MAX is copied if >>>>> dram_size - S2D_TELEMETRY_BYTES_MAX > S2D_TELEMETRY_FSIZE_MAX and not >>>>> S2D_TELEMETRY_FSIZE_MAX as I would expect? >>>>> >>>> >>>> I guess if your question is on: >>>> >>>> -->the if() case: >>>> >>>> why: >>>> >>>> dev->fsize = dev->dram_size - S2D_TELEMETRY_BYTES_MAX; >>>> >>>> and not >>>> >>>> dev->fsize = S2D_TELEMETRY_FSIZE_MAX; >>>> >>>> if yes, I think the intention of the code change is correct. >>> >>> I agree with you here, the statement itself inside the first if block is >>> not wrong. >>> >>>> The entire >>>> buffer has multiple segments out of which the driver can only access the >>>> STB buffer region. In general cases its S2D_TELEMETRY_BYTES_MAX(1M) and >>>> with the proposed patch a custom STB buffer range that cannot exceed >>>> S2D_TELEMETRY_FSIZE_MAX. >>>> >>>> taking the above example, assume >>>> >>>> case 1: >>>> dev->dram_size is 3M, S2D_TELEMETRY_BYTES_MAX is 1M, so the dev->fsize >>>> is 2M. >>>> >>>> case 2: >>>> dev->dram_size is 2M, S2D_TELEMETRY_BYTES_MAX is 1M, so the dev->fsize >>>> is 1M. >>>> >>>> so, it depends on the incoming dev->dram_size from the PMFW based on >>>> which the dev->fsize is calculated and cannot be directly >>>> S2D_TELEMETRY_FSIZE_MAX (always). >>> >>> Again your examples leave the most interesting case out... So what if >>> dev->dram_size above 3M, do you want dev->fsize to be >>> S2D_TELEMETRY_FSIZE_MAX or up to S2D_TELEMETRY_BYTES_MAX (the latter >>> depends on the num_samples)? >>> >>> Or are saying that configuration never happens? If that's the case, then >>> the entire <= condition looks unnecessary (which is why it looked >>> suspicious) and you could simply do: >>> if (dump_custom_stb) { >>> dev->fsize = dev->dram_size - S2D_TELEMETRY_BYTES_MAX; >>> ... >>> >>> ...But then, dev->dram_size is assigned S2D_TELEMETRY_DRAMBYTES_MAX if >>> DRAM size wasn't provided so it doesn't look convincing to me >>> dev->dram_size would be limited to 3M. >> >> The examples what I gave was for reference purpose only and lets not >> stick to 3M. > > Those examples failed to cover the case I was interested in and that 3M > just happens to exactly be the boundary above which the cases I was asking > about reside which is why I mentioned it (that 3M is simply derived from > the numbers in your code, not directly related to the example you gave). > >> There are two cases: >> 1: A special case (proposed now), which is required only for certain >> platforms. Here, a new module param will be supplied to the driver that >> will have a special PMFW supporting enchanced dram sizes for getting the >> stb data. >> >> like you mentioned, we can just guard that with a simple: >> >> if (dump_custom_stb) { >> dev->fsize = dev->dram_size - S2D_TELEMETRY_BYTES_MAX; >> >> >> but think of a scenario where any end user can just reload the driver >> with this new module param on regular platform? that would result in >> getting junk stb data to the userspace or in some cases the entire >> behavior is undefined. >> >> Hence would like to add extra conditions to make sure that even if the >> new module param is supplied, the underlying PMFW should also be ready >> to handle such thing and that is the reason you see: >> >> if (dump_custom_stb && (dev->dram_size - S2D_TELEMETRY_BYTES_MAX <= >> S2D_TELEMETRY_FSIZE_MAX)) >> >> 2: A general purpose regular one - (current existing code), which says >> about where to get the stb data. That is decided on the parameters like >> the num_samples, fsize and the r/w pointer. >> >> the crux is: the additional conditions are for protection only and >> definately not a overkill. > > Okay, thanks. I can accept that as you seem to think there's nothing > wrong with the logic. Not that I'm still entirely convinced since I > couldn't get an answer to those simple questions I asked about the case > where dram_size > 3M but it doesn't really matter. (I still need to read > the answer between the lines for the case where dram_size > 3M and that > message is conflicted.) > > But please do explain either in the commit message and/or with a comment > the purpose of that check because that condition together with the > surrounding logic looks very suspicious, keeping in mind that when > somebody reads this bit of code 10 years from now he/she should be able > understand why you added it there. > Ok. I will add this information as the comment within the code branches. Thanks, Shyam