On 02/08/2023 12:36, David Hildenbrand wrote:
> On 02.08.23 13:20, Ryan Roberts wrote:
>> On 02/08/2023 11:48, David Hildenbrand wrote:
>>> On 02.08.23 12:27, Ryan Roberts wrote:
>>>> On 28/07/2023 17:13, Yin Fengwei wrote:
>>>>> In madvise_cold_or_pageout_pte_range() and madvise_free_pte_range(),
>>>>> folio_mapcount() is used to check whether the folio is shared. But it's
>>>>> not correct as folio_mapcount() returns total mapcount of large folio.
>>>>>
>>>>> Use folio_estimated_sharers() here as the estimated number is enough.
>>>>>
>>>>> Yin Fengwei (2):
>>>>>   madvise: don't use mapcount() against large folio for sharing check
>>>>>   madvise: don't use mapcount() against large folio for sharing check
>>>>>
>>>>>  mm/huge_memory.c | 2 +-
>>>>>  mm/madvise.c     | 6 +++---
>>>>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>>>>
>>>>
>>>> As a set of fixes, I agree this is definitely an improvement, so:
>>>>
>>>> Reviewed-By: Ryan Roberts
>>>>
>>>>
>>>> But I have a couple of comments around further improvements:
>>>>
>>>> Once we have the scheme that David is working on to be able to provide precise
>>>> exclusive vs shared info, we will probably want to move to that. Although that
>>>> scheme will need access to the mm_struct of a process known to be mapping the
>>>
>>> There are probably ways to work around lack of mm_struct, but it would not be
>>> completely for free. But passing the mm_struct should probably be an easy
>>> refactoring.
>>>
>>>> folio. We have that info, but it's not passed to folio_estimated_sharers() so we
>>>> can't just reimplement folio_estimated_sharers() - we will need to rework these
>>>> call sites again.
>>>
>>> We should probably just have a
>>>
>>> folio_maybe_mapped_shared()
>>>
>>> with proper documentation. Nobody should care about the exact number.
>>>
>>>
>>> If my scheme for anon pages makes it in, that would be precise for anon pages
>>> and we could document that.
>>> Once we can handle pagecache pages as well to get a
>>> precise answer, we could change to folio_mapped_shared() and adjust the
>>> documentation.
>>
>> Makes sense to me. I'm assuming your change would allow us to get rid of
>> PG_anon_exclusive too? In which case we would also want a precise API
>> specifically for anon folios for the CoW case, without waiting for pagecache
>> page support.
>
> Not necessarily, and I'm currently not planning that.
>
> On the COW path, I'm planning on using it only when PG_anon_exclusive is clear
> for a compound page, combined with a check that there are no other page
> references besides from mappings: all mappings from me and #refs == #mappings ->
> reuse (set PG_anon_exclusive). That keeps the default (no fork) as fast as
> possible and simple.
>
>>>
>>> I just saw
>>>
>>> https://lkml.kernel.org/r/20230802095346.87449-1-wangkefeng.wang@xxxxxxxxxx
>>>
>>> that converts a lot of code to folio_estimated_sharers().
>>>
>>>
>>> That patchset, for example, also does
>>>
>>> total_mapcount(page) > 1 -> folio_estimated_sharers(folio) > 1
>>>
>>> I'm not 100% sure what to think about that at this point. We eventually add
>>> false negatives (actually shared but we fail to detect it) all over the place,
>>> instead of having false positives (actually exclusive, but we fail to detect
>>> it).
>>>
>>> And that patch set doesn't even spell that out.
>>>
>>>
>>> Maybe it's as good as we will get, especially if my scheme doesn't make it in.
>>
>> I've been working on the assumption that your scheme is plan A, and I'm waiting
>> for it to unblock forward progress on large anon folios. Is this the right
>> approach, or do you think your scheme is sufficiently risky and/or far out that
>> I should aim not to depend on it?
>
> It is plan A. IMHO, it does not feel too risky and/or far out at this point --
> and the implementation should not end up too complicated.
> But as always, I
> cannot promise anything before it's been implemented and discussed upstream.

OK, good we are on the same folio... (stolen from Hugh; if a joke is worth
telling once, it's worth telling 1000 times ;-)

>
> Hopefully, we know more soon. I'll get to implementing it fairly soon.
>