Re: [PATCH v11 22/23] x86/resctrl: Introduce interface to list assignment states of all the groups

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Mar 03, 2025 at 07:30:48PM +0000, Luck, Tony wrote:
> > After having spent a bit of time looking into this, I think we are probably
> > OK, at least for reading these files.
> >
> > seq_file will loop over the file's show() callback, growing the seq_file
> > buffer until show() can run without overrunning the buffer.
> >
> > This means that the show() callback receives a buffer that is magically big
> > enough, but there may be some "speculative" calls whose output never goes
> > to userspace.  Once seq_file has the data, it deals with the userspace-
> > facing I/O buffering internally, so we shouldn't have to worry about that.
> 
> Doesn't this depend on the size of the user read(2) syscall request?

Yes and no.

If I've understood correctly:

To service a given read() call, seq_file calls down into the backend to
generate some whole record, then copies it out to userspace, then
repeats this process so long as there is any space left in the user
buffer.

For resctrl files, we don't implement a seq_file iterator: there is no
.next(), no .llseek(), and we don't implement any notion of file
position.  So our _show() functions generate a single big record that
contains the whole dump -- frequently multiple lines of text.

(This might or might not be desirable, but it is at least simple.)

If a _show() function in resctrl holds rdtgroup_mutex throughout, then
whatever it dumps will be dumped atomically with respect to other
resctrl operations that take this mutex.


So, to flesh out your scenario:

> 
> If the total size of the resctrl file is very large, we have a potential issue:

Let's say it's 5KB.

> 1) User asks for 4KB, owns the resctrl mutex.

(Note, rdtgroup_mutex is only held temporarily inside the resctrlfs
backend to these operations; at the start of the process, it is not
held.)

> 2) resctrl uses seq_file and fills with more than 4KB

(It's actually seq_file that uses resctrl here via callbacks: seq_file
sits in between the VFS layer and resctrl.)

When a .show() callback is called, resctrl doesn't know how much data
to generate; it just writes stuff out with seq_printf() etc.

If there's too much to fit in the default seq_file buffer, the data
gets truncated and the seq_file will get internally marked as having
overflowed.  resctrl could check for this condition in order to avoid
formatting text that will get thrown away due to truncation, but this
is not required.  When the .show() callback returns, the seq_file
implementation will respond to the overflow by growing the buffer and
retrying the whole thing until this doesn't occur (see the loop
preceding the "Fill" label in seq_file.c:seq_read_iter().)

This terminates with a seq_file buffer that contains all the output
(untruncated), or with an -ENOMEM failure (which would be punted to
userspace).

So, assuming nothing went wrong, the seq_file buffer now has the 5KB of
data.  rdtgroup_mutex is not held (it was only held in the _show()
callback).

> 3) User gets the first 4KB, releases the resctrl mutex

Userspace gets the first 4KB, and seq_file's notion of the file
position is advanced by this amount, and the generated text is
kept in the seq_file's buffer.

> 4) Some other pending resctrl operation now gets the mutex and makes changes that affect the contents of this file

The un-read data remains buffered in seq_file.  Other resctrl
operations can happen, so the buffered data may become stale, but it is
still an atomic snapshot.

> 5) User asks for next 4K (when it reaquires resctrl mutex)

If an iterator is implemented, seq_file might try to generate another
record to fill the requested space.  But we don't have an iterator, so
the generated data remains as-is.

> 6) resctrl uses seq_file() to construct new image of file incorporating changes because of step 4

I think this happens only if the file is reopened or lseek()'d, and
only if .llseek() is wired up in struct file_operations.  Resctrl
doesn't seem to do this (whether by accident or by design).

So userspace just sees a non-seekable file.

> 7) User gets the second 4KB from the seq_file buffer (which doesn't fit cleanly next to data it got in step 3).

Userspace gets the final 1K of the data that was generated in response
to the original read() call.

If userspace tries to read again, it will get EOF (again, because we
don't have an iterator -- meaning that no additional records can be
generated).


I haven't traced in detail through the code, but that's my
understanding.

Cheers
---Dave




[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux