On 03/12/2020 22:36, Johannes Weiner wrote:
> On Tue, Dec 01, 2020 at 01:32:26PM +0000, Christoph Hellwig wrote:
>> On Tue, Dec 01, 2020 at 01:17:49PM +0000, Pavel Begunkov wrote:
>>> I was thinking about memcpy bvec instead of iterating as a first step,
>>> and then try to reuse passed in bvec.
>>>
>>> A thing that doesn't play nice with that is setting BIO_WORKINGSET in
>>> __bio_add_page(), which requires to iterate all pages anyway. I have no
>>> clue what it is, so rather to ask if we can optimise it out somehow?
>>> Apart from pre-computing for specific cases...
>>>
>>> E.g. can pages of a single bvec segment be both in and out of a working
>>> set? (i.e. PageWorkingset(page)).
>>
>> Adding Johannes for the PageWorkingset logic, which keeps confusing me
>> everytime I look at it. I think it is intended to deal with pages
>> being swapped out and in, and doesn't make much sense to look at in
>> any form for direct I/O, but as said I'm rather confused by this code.
>
> Correct, it's only interesting for pages under LRU management - page
> cache and swap pages. It should not matter for direct IO.
>
> The VM uses the page flag to tell the difference between cold faults
> (empty cache startup e.g.), and thrashing pages which are being read
> back not long after they have been reclaimed. This influences reclaim
> behavior, but can also indicate a general lack of memory.
>
> The BIO_WORKINGSET flag is for the latter. To calculate the time
> wasted by a lack of memory (memory pressure), we measure the total
> time processes wait for thrashing pages. Usually that time is
> dominated by waiting for in-flight io to complete and pages to become
> uptodate. These waits are annotated on the page cache side.
>
> However, in some cases, the IO submission path itself can block for
> extended periods - if the device is congested or submissions are
> throttled due to cgroup policy. To capture those waits, the bio is
> flagged when it's for thrashing pages, and then submit_bio() will
> report submission time of that bio as a thrashing-related delay.

TIL, thanks Johannes

>
> [ Obviously, in theory bios could have a mix of thrashing and
>   non-thrashing pages, and the submission stall could have occurred
>   even without the thrashing pages. But in practice we have locality,
>   where groups of pages tend to be accessed/reclaimed/refaulted
>   together. The assumption that the whole bio is due to thrashing when
>   we see the first thrashing page is a workable simplification. ]

Great, then the last piece left before hacking this up is killing off
mutating bio_for_each_segment_all(). But don't think anyone will be sad
for it.

-- 
Pavel Begunkov
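
For reference, a rough userspace sketch of the rule Johannes describes: the bio is flagged as soon as the first thrashing page is added, and only flagged bios have their submission time charged as thrashing delay. This is not the kernel code. The `model_*` types and functions are hypothetical stand-ins invented for illustration, and the real `__bio_add_page()` / `submit_bio()` paths and their accounting are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page; 'workingset' models PageWorkingset(page). */
struct model_page {
    bool workingset;
};

/* Hypothetical stand-in for a bio; 'workingset' models the BIO_WORKINGSET flag. */
struct model_bio {
    bool workingset;
    size_t nr_pages;
};

/*
 * Add one page to the bio. The bio is flagged on the first thrashing
 * page and stays flagged, even if later pages are not thrashing.
 */
static void model_bio_add_page(struct model_bio *bio,
                               const struct model_page *page)
{
    assert(bio && page);
    if (!bio->workingset && page->workingset)
        bio->workingset = true;
    bio->nr_pages++;
}

/*
 * Model of the submission-side accounting: time spent submitting a
 * flagged bio is added to the thrashing delay total; unflagged bios
 * contribute nothing. Returns the amount charged.
 */
static unsigned long model_submit_bio(const struct model_bio *bio,
                                      unsigned long submit_ns,
                                      unsigned long *thrash_delay_ns)
{
    assert(bio && thrash_delay_ns);
    if (!bio->workingset)
        return 0;
    *thrash_delay_ns += submit_ns;
    return submit_ns;
}
```

In this model a direct I/O bio built only from non-LRU pages never gets flagged, so skipping the per-page check for it changes nothing in the accounting, which matches the "should not matter for direct IO" statement above.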