Hi Dave,

On 2/20/25 09:44, Dave Martin wrote:
> Hi,
>
> On Wed, Feb 19, 2025 at 03:09:51PM -0600, Moger, Babu wrote:
>> Hi Dave,
>>
>> On 2/19/25 07:53, Dave Martin wrote:
>>> On Wed, Jan 22, 2025 at 02:20:30PM -0600, Babu Moger wrote:
>>>> Provide the interface to list the assignment states of all the resctrl
>>>> groups in mbm_cntr_assign mode.
>
> [...]
>
>>>> +static int resctrl_mbm_assign_control_show(struct kernfs_open_file *of,
>>>> +					    struct seq_file *s, void *v)
>>>> +{
>
> [...]
>
>>> Unlike the other resctrl files, it looks like the total size of this
>>> data will scale up with the number of existing monitoring groups
>>> and the lengths of the group names (in addition to the number of
>>> monitoring domains).
>>>
>>> So, this can easily be more than a page, overflowing internal limits
>>> in the seq_file and kernfs code.
>>>
>>> Do we need to track some state between read() calls?  This can be done
>>> by overriding the kernfs .open() and .release() methods and hanging
>>> some state data (or an rdtgroup_file pointer) on of->priv.
>>>
>>> Also, if we allow the data to be read out in chunks, then we would
>>> either have to snapshot all the data in one go and stash the unread
>>> tail in the kernel, or we would need to move over to RCU-based
>>> enumeration or similar -- otherwise releasing rdtgroup_mutex in the
>>> middle of the enumeration in order to return data to userspace is going
>>> to be a problem...
>>
>> Good catch.
>>
>> I see similar buffer overflow is handled by calling seq_buf_clear()
>> (look at process_durations() or in show_user_instructions()).
>>
>> How about handling this by calling rdt_last_cmd_clear() before printing
>> each group?
>
> Does this work?
>
> Once seq_buf_has_overflowed() returns nonzero, data has been lost, no?
> So far as I can see, show_user_instructions() just gives up on printing
> the affected line, while process_durations() tries to anticipate
> overflow and prints out the accumulated text to dmesg before clearing
> the buffer.

Yes, agreed.

> In our case, we cannot send more data to userspace than was requested
> in the read() call, so we might have nowhere to drain the seq_buf
> contents to in order to free up space.
>
> sysfs "expects" userspace to do a big enough read() that this problem
> doesn't happen.  In practice this is OK because people usually read
> through a buffered I/O layer like stdio, and in realistic
> implementations the user-side I/O buffer is large enough to hide this
> issue.
>
> But mbm_assign_control data is dynamically generated and potentially
> much bigger than a typical sysfs file.

I don't have a good way to handle this case yet. We may have to live with
this limitation. Please let me know if you have any ideas.

>
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 484d6009869f..1828f59eb723 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1026,6 +1026,7 @@ static int resctrl_mbm_assign_control_show(struct
>> kernfs_open_file *of,
>>  	}
>>
>>  	list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
>> +		rdt_last_cmd_clear();
>>  		seq_printf(s, "%s//", rdtg->kn->name);
>>
>>  		sep = false;
>> @@ -1041,6 +1042,7 @@ static int resctrl_mbm_assign_control_show(struct
>> kernfs_open_file *of,
>>  		seq_putc(s, '\n');
>>
>>  		list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
>> mon.crdtgrp_list) {
>> +			rdt_last_cmd_clear();
>
> I don't see how this helps.
>
> Surely last_cmd_status has nothing to do with s?

Correct. Clearly, I misunderstood the problem; rdt_last_cmd_clear() only
resets the last_cmd_status buffer and does nothing about the seq_file
running out of space.
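
That said, to check that I understand the snapshot-at-open() idea you
describe above: is something like the rough sketch below what you had in
mind? It is untested, the names (mbm_assign_snapshot, the
*_open/_show/_release helpers, the 4 * PAGE_SIZE estimate) are made up
just for illustration, and it still needs the plumbing to let this one
file override the kernfs .open()/.release() methods. The idea is to
format the whole output into a kvzalloc()'d buffer under rdtgroup_mutex
at open() time, hang it off of->priv, have the show routine only copy
from that snapshot, and free it in release(), so rdtgroup_mutex is never
held while data is being returned to userspace:

	/* Untested sketch; hypothetical names, error handling trimmed. */
	struct mbm_assign_snapshot {
		size_t	size;		/* bytes of formatted output in buf[] */
		char	buf[];
	};

	static int mbm_assign_control_open(struct kernfs_open_file *of)
	{
		size_t bufsz = 4 * PAGE_SIZE;	/* placeholder; needs a real bound or a retry loop */
		struct mbm_assign_snapshot *snap;
		struct rdtgroup *rdtg;
		struct seq_buf sb;

		snap = kvzalloc(struct_size(snap, buf, bufsz), GFP_KERNEL);
		if (!snap)
			return -ENOMEM;

		seq_buf_init(&sb, snap->buf, bufsz);

		mutex_lock(&rdtgroup_mutex);
		list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
			seq_buf_printf(&sb, "%s//", rdtg->kn->name);
			/*
			 * ... same per-domain and per-child-group formatting
			 * as resctrl_mbm_assign_control_show(), but into sb ...
			 */
		}
		mutex_unlock(&rdtgroup_mutex);

		if (seq_buf_has_overflowed(&sb)) {
			kvfree(snap);
			return -E2BIG;	/* or grow the buffer and try again */
		}

		snap->size = seq_buf_used(&sb);
		of->priv = snap;	/* freed in .release() */
		return 0;
	}

	static int mbm_assign_control_show(struct kernfs_open_file *of,
					   struct seq_file *s, void *v)
	{
		struct mbm_assign_snapshot *snap = of->priv;

		/*
		 * seq_file copies this out to userspace across however many
		 * read() calls it takes; no resctrl lock is held here.
		 */
		seq_write(s, snap->buf, snap->size);
		return 0;
	}

	static void mbm_assign_control_release(struct kernfs_open_file *of)
	{
		kvfree(of->priv);
		of->priv = NULL;
	}

The obvious downsides are that userspace sees a snapshot taken at open()
time, and that the buffer size estimate has to be dealt with somehow
(either a worst case computed from the group and domain counts, or
retrying with a larger buffer on overflow).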
>
> [...]
>
> Cheers
> ---Dave
>

--
Thanks
Babu Moger