On Wed, 14 Feb 2024, Charan Teja Kalla wrote: > Hello Hugh, > > Based on offline discussion with some folks in the list, it seems that > this syscall can be helpful. This patch might have forgotten and I hope > this ping helps in resurrecting this thread. Charan, it's not forgotten, but it was relayed to you through another channel a month ago, that I did not expect to have time to think about this for 3 months. Countdown says 2 months to go now. I realize that it's frustrating for you; it's unpleasant for me too. > > On 5/18/2023 6:16 PM, Charan Teja Kalla wrote: > > On 5/17/2023 5:02 PM, Hugh Dickins wrote: > >>> Sure, will include those range calculations for shmem pages too. > >> Oh, I forgot this issue, you would have liked me to look at V8 by now, > >> to see whether I agree with your resolution there. Sorry, no, I've > >> not been able to divert my concentration to it yet. > >> > >> And it's quite likely that I shall disagree, because I've a history of > >> disagreeing even with myself on such range widening/narrowing issues - > >> reconciling conflicting precedents is difficult 🙁 > >> > > If you can at least help by commenting which part of the patch you > > disagree with, I can try hard to convince you there:) . > > > >>> Please let me know if I'm missing something where I should be counting > >>> these as NR_ISOLATED. > >> Please grep for NR_ISOLATED, to see where and how they get manipulated > >> already, and follow the existing examples. The case that sticks in my > >> mind is in mm/mempolicy.c, where the migrate_pages() syscall can build > >> up a gigantic quantity of transiently isolated pages: your syscall can > >> do the same, so should account for itself in the same way. > > Based on the grep, it seems almost all the call stacks that isolates the > folios is for migrating the pages where after migration the NR_ISOLATED > is decremented (in migrate_folio_done()). The call paths are(compaction, > memory hotplug, mempolicy). > > The another call path is reclaim where we isolate 'nr' pages belongs to > a pgdat, account/unaccount them in NR_ISOLATED across the reclaim. > > I think it is easy to account for the above call paths as we know "which > folio corresponds to which pgdat". > > Where as in this patch, we are isolating a set of folios(can corresponds > to different nodes) and relying on the reclaim_pages() to do the swap > out. It is straightforward to account NR_ISOLATED while isolating, but > it requires unaccounting changes in the shrink_folio_list() where folio > is being freed after swap out. Doing so requires changes in all the > code places(eg: shrink_inactive_list()), where it now requires to > account NR_ISOLATED while isolating and the shrink_folio_list() > unaccounts it. > > So, accounting NR_ISOLATED requires changes in other code places where > this patch has not touched. That surprises me, though I do recall that there's an irritating asymmetry in where NR_ISOLATED is accounted and unaccounted. I have not checked what you say there, may do so in 2 months. > > If isolating a large amount of pages and not being recorded in > NR_ISOLATED is really a problem, then may I please know your opinion on > isolating(with out accounting) and reclaiming in small batches? The > batch size can be considered as SWAP_CLUSTER_MAX of pages. In most circumstances, omitting to account NR_ISOLATED wouldn't show up as a problem; in low memory it would. Splitting into small batches without accounting might be an easier and better way; but whatever I say in a hurried unthoughtful reply is likely to be wrong. I am not convinced that isolating is even appropriate: I think I hinted before that I would want to compare what you do here with what shmem_swapin_range() does in mm/madvise.c, and the shmem_collapse_swapin() I'll be proposing to avoid swapin while building up THP in collapse_file(). But it may well be that you've found the switching of LRUs essential: I'm not prejudging, just saying I cannot rush to judgment. And this is also a new UAPI for tmpfs, so should not be rushed then regretted. But if you can find another champion to force this into mm/shmem.c for you faster than I can manage, well, I don't own any Linux source. It's not unusual for me to limp along later and rearrange things to suit my preference. Hugh > > > I had a V8 posted without this into accounting. Let me make the changes > > to account for the NR_ISOLATED too. > > Thanks, > Charan