On 2023/3/1 13:09, Gao Xiang wrote:
On 2023/3/1 13:01, Matthew Wilcox wrote:
On Wed, Mar 01, 2023 at 12:49:10PM +0800, Gao Xiang wrote:
The only problem is that the readahead code doesn't tell the filesystem
whether the request is sync or async. This should be a simple matter
of adding a new 'bool async' to the readahead_control and then setting
REQ_RAHEAD based on that, rather than on whether the request came in
through readahead() or read_folio() (eg see mpage_readahead()).
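
A rough userspace sketch of that idea, not kernel code. Here
struct rac_mock, REQ_RAHEAD_MOCK and rac_bio_flags() are hypothetical
stand-ins for readahead_control, REQ_RAHEAD and whatever helper would
compute the bio flags; only the "async" field reflects the proposal.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's REQ_RAHEAD bio flag. */
#define REQ_RAHEAD_MOCK (1u << 0)

/* Hypothetical, stripped-down stand-in for struct readahead_control. */
struct rac_mock {
	unsigned long index;   /* first page index of the request */
	unsigned int nr_pages; /* number of pages requested */
	bool async;            /* proposed: true if no task waits on these pages */
};

/*
 * Derive the bio flags from the request itself rather than from the
 * entry point (readahead vs. read_folio) it arrived through.
 */
static unsigned int rac_bio_flags(const struct rac_mock *rac)
{
	return rac->async ? REQ_RAHEAD_MOCK : 0;
}
```

With this shape, a sync request that happens to come in through the
readahead path would no longer be tagged as readahead I/O.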
Great! In addition to that, just (somewhat) off topic, if we have a
"bool async" now, I think it will immediately have some users (such as
EROFS), since we'd like to do post-processing (such as decompression)
immediately in the same context with sync readahead (due to missing
pages) and leave it to another kworker for async readahead (I think
it's almost the same for decryption and verification).
So "bool async" would be quite useful on my side if it could be passed
down to the fs side. I'd like to raise my hand to have it.
That's a really interesting use-case; thanks for bringing it up.
Ideally, we'd have the waiting task do the
decompression/decryption/verification for proper accounting of CPU.
Unfortunately, if the folio isn't uptodate, the task doesn't even hold
a reference to the folio while it waits, so there's no way to wake the
task and let it know that it has work to do. At least not at the moment
... let me think about that a bit (and if you see a way to do it, feel
free to propose it)
Honestly, I'd like to keep the folio locked until all post-processing
is done, then mark it uptodate and unlock it, so that all we need is to
pass locked-folio requests to kworkers for the async case, or handle
them synchronously in the original context.
If we unlocked these folios in advance without marking them uptodate,
we would have to lock them again (which could mean more lock
contention) and would need a way to track folios whose I/O is done but
which are not yet post-processed, in addition to folios with no I/O
done.
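
The two orderings being compared can be modelled in a toy userspace
sketch. struct folio_mock and both helpers are hypothetical and only
count lock acquisitions; they are not the kernel folio API.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a folio's lock and uptodate state. */
struct folio_mock {
	bool locked;
	bool uptodate;
	int relocks; /* times the lock was re-taken after I/O completion */
};

/* Keep the lock held from I/O submission until post-processing ends. */
static void complete_locked(struct folio_mock *f)
{
	/* I/O is done and the folio is still locked: post-process here. */
	f->uptodate = true;
	f->locked = false;
}

/* Unlock at I/O completion, then re-lock later to post-process. */
static void complete_unlock_early(struct folio_mock *f)
{
	f->locked = false; /* I/O done, but the data is not usable yet */
	f->locked = true;  /* second lock acquisition for post-processing */
	f->relocks++;
	f->uptodate = true;
	f->locked = false;
}
```

Both paths end with an unlocked, uptodate folio; the second one pays
for an extra lock round trip and has a window where the folio is
unlocked but not uptodate.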
I'm not sure which way is better for proper accounting of CPU, but I
think an individual fs knows more than mm about post-processing
handling, so I think we could just provide some accounting APIs to
filesystems for this.
Currently I think core-MM just needs to export the "async" bool in rac.
EROFS now just does sync decompression for <= 4 pages in
z_erofs_readahead(), and I think it can be done better, see:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/erofs/zdata.c?h=v6.2#n832
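
To illustrate the difference, here is a minimal userspace sketch, not
the actual zdata.c code. The enum, MAX_SYNC_PAGES_MOCK and both helpers
are hypothetical names; the threshold of 4 is taken from the
description above.

```c
#include <stdbool.h>

/* Where post-processing (e.g. decompression) would run. */
enum decomp_ctx { DECOMP_INLINE, DECOMP_KWORKER };

/* Hypothetical constant mirroring the "<= 4 pages" rule described above. */
#define MAX_SYNC_PAGES_MOCK 4u

/* Size heuristic: guess sync vs. async from the request size. */
static enum decomp_ctx pick_by_size(unsigned int nr_pages)
{
	return nr_pages <= MAX_SYNC_PAGES_MOCK ? DECOMP_INLINE : DECOMP_KWORKER;
}

/* With an exported "async" bool: no guessing needed. */
static enum decomp_ctx pick_by_flag(bool async)
{
	return async ? DECOMP_KWORKER : DECOMP_INLINE;
}
```

For example, a 32-page sync readahead would go to a kworker under the
size heuristic, but would be handled inline if the flag were available.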
Thanks,
Gao Xiang