Re: [PATCH v3 2/2] platform/x86/amd/pmc: Add dump_custom_stb module parameter

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 8/30/2023 3:39 PM, Ilpo Järvinen wrote:
> On Wed, 30 Aug 2023, Shyam Sundar S K wrote:
>> On 8/29/2023 8:00 PM, Ilpo Järvinen wrote:
>>> On Tue, 29 Aug 2023, Shyam Sundar S K wrote:
>>>> On 8/29/2023 4:49 PM, Ilpo Järvinen wrote:
>>>>> On Tue, 29 Aug 2023, Shyam Sundar S K wrote:
>>>>>> On 8/29/2023 3:09 PM, Ilpo Järvinen wrote:
>>>>>>> On Tue, 29 Aug 2023, Shyam Sundar S K wrote:
>>>>>>>
>>>>>>>> There have been instances when the default size (1M) of the STB is not
>>>>>>>> sufficient to get the complete traces of the failure. In such scenarios
>>>>>>>> we can use a module_param to enable full trace that shall contain more
>>>>>>>> debugging data. This is not a regular case and hence not enabling this
>>>>>>>> capability by default.
>>>>>>>>
>>>>>>>> Co-developed-by: Harsh Jain <Harsh.Jain@xxxxxxx>
>>>>>>>> Signed-off-by: Harsh Jain <Harsh.Jain@xxxxxxx>
>>>>>>>> Signed-off-by: Sanket Goswami <Sanket.Goswami@xxxxxxx>
>>>>>>>> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@xxxxxxx>
>>>>>>>> ---
>>>>>>>> v2->v3:
>>>>>>>>  - no change
>>>>>>>>
>>>>>>>> v1->v2:
>>>>>>>>  - rebase to 'review-hans' branch
>>>>>>>>  - drop 2/4 of v1
>>>>>>>>    (https://patchwork.kernel.org/project/platform-driver-x86/list/?series=775324&state=%2A&archive=both)
>>>>>>>>
>>>>>>>>  drivers/platform/x86/amd/pmc/pmc.c | 34 +++++++++++++++++++-----------
>>>>>>>>  drivers/platform/x86/amd/pmc/pmc.h |  1 +
>>>>>>>>  2 files changed, 23 insertions(+), 12 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/platform/x86/amd/pmc/pmc.c b/drivers/platform/x86/amd/pmc/pmc.c
>>>>>>>> index 443bb78ea5f4..0e61ae74f1a9 100644
>>>>>>>> --- a/drivers/platform/x86/amd/pmc/pmc.c
>>>>>>>> +++ b/drivers/platform/x86/amd/pmc/pmc.c
>>>>>>>> @@ -53,6 +53,7 @@
>>>>>>>>  
>>>>>>>>  /* STB Spill to DRAM Parameters */
>>>>>>>>  #define S2D_TELEMETRY_BYTES_MAX		0x100000
>>>>>>>> +#define S2D_TELEMETRY_FSIZE_MAX		0x200000
>>>>>>>>  #define S2D_TELEMETRY_DRAMBYTES_MAX	0x1000000
>>>>>>>>  
>>>>>>>>  /* STB Spill to DRAM Message Definition */
>>>>>>>> @@ -160,6 +161,10 @@ static bool disable_workarounds;
>>>>>>>>  module_param(disable_workarounds, bool, 0644);
>>>>>>>>  MODULE_PARM_DESC(disable_workarounds, "Disable workarounds for platform bugs");
>>>>>>>>  
>>>>>>>> +static bool dump_custom_stb;
>>>>>>>> +module_param(dump_custom_stb, bool, 0644);
>>>>>>>> +MODULE_PARM_DESC(dump_custom_stb, "Enable to dump full STB buffer");
>>>>>>>> +
>>>>>>>>  static struct amd_pmc_dev pmc;
>>>>>>>>  static int amd_pmc_send_cmd(struct amd_pmc_dev *dev, u32 arg, u32 *data, u8 msg, bool ret);
>>>>>>>>  static int amd_pmc_read_stb(struct amd_pmc_dev *dev, u32 *buf);
>>>>>>>> @@ -239,7 +244,7 @@ static const struct file_operations amd_pmc_stb_debugfs_fops = {
>>>>>>>>  static int amd_pmc_stb_debugfs_open_v2(struct inode *inode, struct file *filp)
>>>>>>>>  {
>>>>>>>>  	struct amd_pmc_dev *dev = filp->f_inode->i_private;
>>>>>>>> -	u32 *buf, fsize, num_samples, val, stb_rdptr_offset = 0;
>>>>>>>> +	u32 *buf, num_samples, val, stb_rdptr_offset = 0;
>>>>>>>>  	int ret;
>>>>>>>>  
>>>>>>>>  	/* Write dummy postcode while reading the STB buffer */
>>>>>>>> @@ -247,10 +252,6 @@ static int amd_pmc_stb_debugfs_open_v2(struct inode *inode, struct file *filp)
>>>>>>>>  	if (ret)
>>>>>>>>  		dev_err(dev->dev, "error writing to STB: %d\n", ret);
>>>>>>>>  
>>>>>>>> -	buf = kzalloc(S2D_TELEMETRY_BYTES_MAX, GFP_KERNEL);
>>>>>>>> -	if (!buf)
>>>>>>>> -		return -ENOMEM;
>>>>>>>> -
>>>>>>>>  	/* Spill to DRAM num_samples uses separate SMU message port */
>>>>>>>>  	dev->msg_port = 1;
>>>>>>>>  
>>>>>>>> @@ -264,20 +265,27 @@ static int amd_pmc_stb_debugfs_open_v2(struct inode *inode, struct file *filp)
>>>>>>>>  	dev->msg_port = 0;
>>>>>>>>  	if (ret) {
>>>>>>>>  		dev_err(dev->dev, "error: S2D_NUM_SAMPLES not supported : %d\n", ret);
>>>>>>>> -		kfree(buf);
>>>>>>>>  		return ret;
>>>>>>>>  	}
>>>>>>>>  
>>>>>>>>  	/* Start capturing data from the last push location */
>>>>>>>> -	if (num_samples > S2D_TELEMETRY_BYTES_MAX) {
>>>>>>>> -		fsize  = S2D_TELEMETRY_BYTES_MAX;
>>>>>>>> -		stb_rdptr_offset = num_samples - fsize;
>>>>>>>> +	if (dump_custom_stb &&
>>>>>>>> +	    (dev->dram_size - S2D_TELEMETRY_BYTES_MAX <= S2D_TELEMETRY_FSIZE_MAX)) {
>>>>>>>> +		dev->fsize = dev->dram_size - S2D_TELEMETRY_BYTES_MAX;
>>>>>>>> +		stb_rdptr_offset = 0;
>>>>>>>> +	} else if (num_samples > S2D_TELEMETRY_BYTES_MAX) {
>>>>>>>
>>>>>>> I find the logic here somewhat suspicious in the first if check (I fail 
>>>>>>> to follow the reasoning here despite spending considerable amount of time 
>>>>>>> staring at it).
>>>>>>>
>>>>>>> Lets assume dump_custom_stb is set. If dev->dram_size is large, the first 
>>>>>>> if condition will be false because LHS of <= is large. Thus, we go to the 
>>>>>>> second branch and dump exactly as many samples as before this patch. 
>>>>>>> ...And that is opposite of what this patch claims to do?
>>>>>>>
>>>>>>
>>>>>> Let me explain:
>>>>>>
>>>>>> We have cases where the S2D (Spill to DRAM) STB (Smart Trace Buffer -
>>>>>> which is meant to have debug data that gives information about the
>>>>>> system conditions, be it success or failure.) may not fit in the current
>>>>>> stb buffer size of 1M (i.e. S2D_TELEMETRY_BYTES_MAX). The stb buffer
>>>>>> size of 1M is sufficient for most of the platforms, but there could be
>>>>>> exceptions.
>>>>>>
>>>>>> In those cases, the dram_size would be updated by the PMFW (Power
>>>>>> Management Firmware) coupled with "dump_custom_stb" to fit into a custom
>>>>>> requirement.
>>>>>>
>>>>>> As an example (with this proposed change):
>>>>>>
>>>>>> when dump_custom_stb is SET and dev->dram_size is 3M (received from
>>>>>> PMFW) the first if() will be successful and it takes the corresponding
>>>>>> code path.
>>>>>
>>>>> Heh, I can see the numbers work in that case, however, the above still 
>>>>> doesn't really explain why only S2D_TELEMETRY_BYTES_MAX is copied if 
>>>>> dram_size - S2D_TELEMETRY_BYTES_MAX > S2D_TELEMETRY_FSIZE_MAX and not
>>>>> S2D_TELEMETRY_FSIZE_MAX as I would expect?
>>>>>
>>>>
>>>> I guess if your question is on:
>>>>
>>>> -->the if() case:
>>>>
>>>> why:
>>>>
>>>> dev->fsize = dev->dram_size - S2D_TELEMETRY_BYTES_MAX;
>>>>
>>>> and not
>>>>
>>>> dev->fsize = S2D_TELEMETRY_FSIZE_MAX;
>>>>
>>>> if yes, I think the intention of the code change is correct.
>>>
>>> I agree with you here, the statement itself inside the first if block is 
>>> not wrong.
>>>
>>>> The entire
>>>> buffer has multiple segments out of which the driver can only access the
>>>> STB buffer region. In general cases its S2D_TELEMETRY_BYTES_MAX(1M) and
>>>> with the proposed patch a custom STB buffer range that cannot exceed
>>>> S2D_TELEMETRY_FSIZE_MAX.
>>>>
>>>> taking the above example, assume
>>>>
>>>> case 1:
>>>> dev->dram_size is 3M, S2D_TELEMETRY_BYTES_MAX is 1M, so the dev->fsize
>>>> is 2M.
>>>>
>>>> case 2:
>>>> dev->dram_size is 2M, S2D_TELEMETRY_BYTES_MAX is 1M, so the dev->fsize
>>>> is 1M.
>>>>
>>>> so, it depends on the incoming dev->dram_size from the PMFW based on
>>>> which the dev->fsize is calculated and cannot be directly
>>>> S2D_TELEMETRY_FSIZE_MAX (always).
>>>
>>> Again your examples leave the most interesting case out... So what if 
>>> dev->dram_size above 3M, do you want dev->fsize to be
>>> S2D_TELEMETRY_FSIZE_MAX or up to S2D_TELEMETRY_BYTES_MAX (the latter 
>>> depends on the num_samples)?
>>>
>>> Or are saying that configuration never happens? If that's the case, then 
>>> the entire <= condition looks unnecessary (which is why it looked 
>>> suspicious) and you could simply do:
>>> 	if (dump_custom_stb) {
>>> 		dev->fsize = dev->dram_size - S2D_TELEMETRY_BYTES_MAX;
>>> 		...
>>>
>>> ...But then, dev->dram_size is assigned S2D_TELEMETRY_DRAMBYTES_MAX if 
>>> DRAM size wasn't provided so it doesn't look convincing to me 
>>> dev->dram_size would be limited to 3M.
>>
>> The examples what I gave was for reference purpose only and lets not
>> stick to 3M.
> 
> Those examples failed to cover the case I was interested in and that 3M 
> just happens to exactly be the boundary above which the cases I was asking 
> about reside which is why I mentioned it (that 3M is simply derived from 
> the numbers in your code, not directly related to the example you gave).
> 
>> There are two cases:
>> 1: A special case (proposed now), which is required only for certain
>> platforms. Here, a new module param will be supplied to the driver that
>> will have a special PMFW supporting enchanced dram sizes for getting the
>> stb data.
>>
>> like you mentioned, we can just guard that with a simple:
>>
>> if (dump_custom_stb) {
>>  		dev->fsize = dev->dram_size - S2D_TELEMETRY_BYTES_MAX;
>>
>>
>> but think of a scenario where any end user can just reload the driver
>> with this new module param on regular platform? that would result in
>> getting junk stb data to the userspace or in some cases the entire
>> behavior is undefined.
>>
>> Hence would like to add extra conditions to make sure that even if the
>> new module param is supplied, the underlying PMFW should also be ready
>> to handle such thing and that is the reason you see:
>>
>> if (dump_custom_stb && (dev->dram_size - S2D_TELEMETRY_BYTES_MAX <=
>> S2D_TELEMETRY_FSIZE_MAX))
>>
>> 2: A general purpose regular one - (current existing code), which says
>> about where to get the stb data. That is decided on the parameters like
>> the num_samples, fsize and the r/w pointer.
>>
>> the crux is: the additional conditions are for protection only and
>> definately not a overkill.
> 
> Okay, thanks. I can accept that as you seem to think there's nothing 
> wrong with the logic. Not that I'm still entirely convinced since I 
> couldn't get an answer to those simple questions I asked about the case 
> where dram_size > 3M but it doesn't really matter. (I still need to read
> the answer between the lines for the case where dram_size > 3M and that 
> message is conflicted.)
> 
> But please do explain either in the commit message and/or with a comment 
> the purpose of that check because that condition together with the 
> surrounding logic looks very suspicious, keeping in mind that when 
> somebody reads this bit of code 10 years from now he/she should be able 
> understand why you added it there.
> 

Ok. I will add this information as the comment within the code branches.

Thanks,
Shyam



[Index of Archives]     [Linux Kernel Development]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux