On Tue, Dec 24, 2024 at 12:28 PM Yu Zhao <yuzhao@xxxxxxxxxx> wrote: > > On Mon, Dec 23, 2024 at 11:45 PM Chen Ridong <chenridong@xxxxxxxxxxxxxxx> wrote: > > > > > > > > On 2024/12/24 12:19, Yu Zhao wrote: > > > On Mon, Dec 23, 2024 at 1:30 AM Chen Ridong <chenridong@xxxxxxxxxxxxxxx> wrote: > > >> > > >> From: Chen Ridong <chenridong@xxxxxxxxxx> > > >> > > >> The page reclaim isolates a batch of folios from the tail of one of the > > >> LRU lists and works on those folios one by one. For a suitable > > >> swap-backed folio, if the swap device is async, it queues that folio for > > >> writeback. After the page reclaim finishes an entire batch, it puts back > > >> the folios it queued for writeback to the head of the original LRU list. > > >> > > >> In the meantime, the page writeback flushes the queued folios also by > > >> batches. Its batching logic is independent from that of the page reclaim. > > >> For each of the folios it writes back, the page writeback calls > > >> folio_rotate_reclaimable() which tries to rotate a folio to the tail. > > >> > > >> folio_rotate_reclaimable() only works for a folio after the page reclaim > > >> has put it back. If an async swap device is fast enough, the page > > >> writeback can finish with that folio while the page reclaim is still > > >> working on the rest of the batch containing it. In this case, that folio > > >> will remain at the head and the page reclaim will not retry it before > > >> reaching there. > > > > > > For starters, copying & pasting others' commit messages as your own is > > > plagiarism. You need to quote them. > > > > Hi, Yu, Thank you for reminding me, I did not mean any plagiarism. I am > > a beginner, and I do not know much about that. > > I didn't think you were intentional but please also understand this is > a very basic rule that applies everywhere, not just this case. > > > I wrote the message in my v1 and v2 to describe what issue I was fixing, > > which is wordy. What you wrote is much clearer in the commit > > 359a5e1416ca, so I pasted it. I am sorry. Should resend a new patch to > > modify the message? > > > > I have sent all patch versions to you, and I don't know whether you have > > noticed. How I wish you could point it out at the first I pasted it, so > > I wouldn't have made this mistake again and again. > > Here is an example of how I quoted from another commit message: > https://lore.kernel.org/20241107202033.2721681-5-yuzhao@xxxxxxxxxx/ > > > >> The commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back > > >> while isolated") only fixed the issue for mglru. However, this issue > > >> also exists in the traditional active/inactive LRU. > > > > > > You need to prove it with some numbers. > > > > Do you mean I should prove it with some info in my message? Actually, I > > encountered this in the traditional active/inactive LRU, I did not know > > you had fixed for mglru until Barry told me. I offered how to reproduce > > this with Link. > > Yes, the first link does describe how to repro, but you also need to > show the results after your patch, i.e., the improvements from the > change being reviewed. > > > >> This issue will be > > >> worse if THP is split, which makes the list longer and needs longer time > > >> to finish a batch of folios reclaim. > > >> > > >> This issue should be fixed in the same way for the traditional LRU. > > >> Therefore, the common logic was extracted to the 'find_folios_written_back' > > >> function firstly, which is then reused in the 'shrink_inactive_list' > > >> function. Finally, retry reclaiming those folios that may have missed the > > >> rotation for traditional LRU. > > >> > > >> Link: https://lore.kernel.org/linux-kernel/20241010081802.290893-1-chenridong@xxxxxxxxxxxxxxx/ > > >> Link: https://lore.kernel.org/linux-kernel/CAGsJ_4zqL8ZHNRZ44o_CC69kE7DBVXvbZfvmQxMGiFqRxqHQdA@xxxxxxxxxxxxxx/ > > >> Signed-off-by: Chen Ridong <chenridong@xxxxxxxxxx> > > >> Reviewed-by: Barry Song <baohua@xxxxxxxxxx> > > >> --- > > >> > > >> v5->v6: > > >> - fix compile error(implicit declaration of function 'lru_gen_distance') > > >> when CONFIG_LRU_GEN is disable. > > > > > > Did you build-test it this time? I don't think LRU_REFS_FLAGS is > > > defined when CONFIG_LRU_GEN=y. > > > > > > > Yes, I tested. I didn't test when CONFIG_LRU_GEN=n in patch v5. > > I tested with CONFIG_LRU_GEN=n and CONFIG_LRU_GEN=y in patch v6. I am > > using the next. > > I see. I would base my patches on mm-unstable instead -next, i.e., > git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable > > >> - rename 'is_retried' to is_retrying suggested by Barry Song. Also there is a subtle difference here: `skip_retry` implies "skip retry after it has retried N times", and currently N=1 for MGLRU. The change you made implies "only retry once", i.e., you ruled out the later possibility for N=0 or 2, etc.