Re: [PATCH V7 2/2] mm: shmem: implement POSIX_FADV_[WILL|DONT]NEED for shmem

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 14 Feb 2024, Charan Teja Kalla wrote:

> Hello Hugh,
> 
> Based on offline discussion with some folks in the list, it seems that
> this syscall can be helpful. This patch might have forgotten and I hope
> this ping helps in resurrecting this thread.

Charan, it's not forgotten, but it was relayed to you through another
channel a month ago, that I did not expect to have time to think about
this for 3 months.  Countdown says 2 months to go now.

I realize that it's frustrating for you; it's unpleasant for me too.

> 
> On 5/18/2023 6:16 PM, Charan Teja Kalla wrote:
> > On 5/17/2023 5:02 PM, Hugh Dickins wrote:
> >>> Sure, will include those range calculations for shmem pages too.
> >> Oh, I forgot this issue, you would have liked me to look at V8 by now,
> >> to see whether I agree with your resolution there.  Sorry, no, I've
> >> not been able to divert my concentration to it yet.
> >>
> >> And it's quite likely that I shall disagree, because I've a history of
> >> disagreeing even with myself on such range widening/narrowing issues -
> >> reconciling conflicting precedents is difficult 🙁
> >>
> > If you can at least help by commenting which part of the patch you
> > disagree with, I can try hard to convince you there:) .
> > 
> >>> Please let me know if I'm missing something where I should be counting
> >>> these as NR_ISOLATED.
> >> Please grep for NR_ISOLATED, to see where and how they get manipulated
> >> already, and follow the existing examples.  The case that sticks in my
> >> mind is in mm/mempolicy.c, where the migrate_pages() syscall can build
> >> up a gigantic quantity of transiently isolated pages: your syscall can
> >> do the same, so should account for itself in the same way.
> 
> Based on the grep, it seems almost all the call stacks that isolates the
> folios is for migrating the pages where after migration the NR_ISOLATED
> is decremented (in migrate_folio_done()). The call paths are(compaction,
> memory hotplug, mempolicy).
> 
> The another call path is reclaim where we isolate 'nr' pages belongs to
> a pgdat, account/unaccount them in NR_ISOLATED across the reclaim.
> 
> I think it is easy to account for the above call paths as we know "which
> folio corresponds to which pgdat".
> 
> Where as in this patch, we are isolating a set of folios(can corresponds
> to different nodes) and relying on the reclaim_pages() to do the swap
> out. It is straightforward to account NR_ISOLATED while isolating, but
> it requires unaccounting changes in the shrink_folio_list() where folio
> is being freed after swap out.  Doing so requires changes in all the
> code places(eg: shrink_inactive_list()), where it now requires to
> account NR_ISOLATED while isolating and the shrink_folio_list()
> unaccounts it.
> 
> So, accounting NR_ISOLATED requires changes in other code places where
> this patch has not touched.

That surprises me, though I do recall that there's an irritating
asymmetry in where NR_ISOLATED is accounted and unaccounted.
I have not checked what you say there, may do so in 2 months.

> 
> If isolating a large amount of pages and not being recorded in
> NR_ISOLATED is really a problem, then may I please know your opinion on
> isolating(with out accounting) and reclaiming in small batches? The
> batch size can be considered as SWAP_CLUSTER_MAX of pages.

In most circumstances, omitting to account NR_ISOLATED wouldn't show
up as a problem; in low memory it would.  Splitting into small batches
without accounting might be an easier and better way; but whatever I
say in a hurried unthoughtful reply is likely to be wrong.

I am not convinced that isolating is even appropriate: I think I
hinted before that I would want to compare what you do here with what
shmem_swapin_range() does in mm/madvise.c, and the shmem_collapse_swapin()
I'll be proposing to avoid swapin while building up THP in collapse_file().

But it may well be that you've found the switching of LRUs essential:
I'm not prejudging, just saying I cannot rush to judgment.  And this
is also a new UAPI for tmpfs, so should not be rushed then regretted.

But if you can find another champion to force this into mm/shmem.c
for you faster than I can manage, well, I don't own any Linux source.
It's not unusual for me to limp along later and rearrange things to
suit my preference.

Hugh

> 
> > I had a V8 posted without this into accounting. Let me make the changes
> > to account for the NR_ISOLATED too.
> 
> Thanks,
> Charan

[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux