On 11 Aug 2023, at 11:03, Peter Xu wrote:

> On Thu, Aug 10, 2023 at 11:59:25PM +0200, David Hildenbrand wrote:
>> On 10.08.23 23:54, Matthew Wilcox wrote:
>>> On Thu, Aug 10, 2023 at 05:48:19PM -0400, Peter Xu wrote:
>>>>> Yes, that comment from Hugh primarily discusses how we could possibly
>>>>> optimize the loop, and if relying on folio_nr_pages_mapped() to reduce the
>>>>> iterations would be racy. As far as I can see, there are cases where "it
>>>>> would be certainly a bad idea" :)
>>>>
>>>> Is the race described about mapcount being changed right after it's read?
>>>> Are you aware of anything specific that will be broken, and will be fixed
>>>> with this patch?
>>>
>>> The problem is that people check the mapcount while holding no locks;
>>> not the PTL, not the page lock. So it's an unfixable race.
>>>
>>>> Having a total mapcount does sound helpful if partial folio is common
>>>> indeed.
>>>>
>>>> I'm curious whether that'll be so common after the large anon folio work -
>>>> isn't it be sad if partial folio will be a norm? It sounds to me that's
>>>> the case when small page sizes should be used.. and it's prone to waste?
>>>
>>> The problem is that entire_mapcount isn't really entire_mapcount.
>>> It's pmd_mapcount. I have had thoughts about using it as entire_mapcount,
>>> but it gets gnarly when people do partial unmaps. So the _usual_ case
>>> ends up touching every struct page. Which sucks. Also it's one of the
>>> things which stands in the way of shrinking struct page.
>>
>> Right, so one current idea is to have a single total_mapcount and look into
>> removing the subpage mapcounts (which will require first removing
>> _nr_pages_mapped, because that's still one of the important users).
>>
>> Until we get there, also rmap code has to do eventually "more tracking" and
>> might, unfortunately, end up slower.
>>
>>>
>>> But it's kind of annoying to explain all of this to you individually.
>>> There have been hundreds of emails about it over the last months on
>>> this mailing list. It would be nice if you could catch up instead of
>>> jumping in.
>>
>> To be fair, a lot of the details are not readily available and in the heads
>> of selected people :)
>>
>> Peter, if you're interested, we can discuss the current plans, issues and
>> ideas offline!
>
> Thanks for offering help, David.
>
> Personally I still am unclear yet on why entire_mapcount cannot be used as
> full-folio mapcounts, and why "partial unmap" can happen a lot (I don't
> expect), but yeah I can try to catch up to educate myself first.

Separate entire_mapcount and per-page mapcount are needed to maintain precise
NR_{ANON,FILE}_MAPPED and NR_ANON_THPS. I wrote some explanation (third paragraph)
at: https://lore.kernel.org/linux-mm/A28053D6-E158-4726-8BE1-B9F4960AD570@xxxxxxxxxx/.
Let me know if it helps.

>
> The only issue regarding an offline sync-up is that even if David will help
> Peter on catching up the bits, it'll not scale when another Peter2 had the
> same question.. So David, rather than I waste your time on helping one
> person, let me try to catch up with the public threads - I'm not sure how
> far I can go myself; otoh thread links will definitely be helpful to be
> replied here, so anyone else can reference too.
> I collected a list (which
> can be enriched) of few threads that might be related, just in case helpful
> to anyone besides myself:
>
> [PATCH 0/2] don't use mapcount() to check large folio sharing
> https://lore.kernel.org/r/20230728161356.1784568-1-fengwei.yin@xxxxxxxxx
>
> [PATCH v1-v2 0/3] support large folio for mlock
> https://lore.kernel.org/r/20230728070929.2487065-1-fengwei.yin@xxxxxxxxx
> https://lore.kernel.org/r/20230809061105.3369958-1-fengwei.yin@xxxxxxxxx
>
> [PATCH v1 0/4] Optimize mmap_exit for large folios
> https://lore.kernel.org/r/20230810103332.3062143-1-ryan.roberts@xxxxxxx
>
> [PATCH v4-v5 0/5] variable-order, large folios for anonymous memory
> https://lore.kernel.org/linux-mm/20230726095146.2826796-1-ryan.roberts@xxxxxxx/
> https://lore.kernel.org/r/20230810142942.3169679-1-ryan.roberts@xxxxxxx
>
> [PATCH v3-v4 0/3] Optimize large folio interaction with deferred split
> (I assumed Ryan's this one goes into the previous set v5 finally, so just
> the discussions as reference)
> https://lore.kernel.org/r/20230720112955.643283-1-ryan.roberts@xxxxxxx
> https://lore.kernel.org/r/20230727141837.3386072-1-ryan.roberts@xxxxxxx
>
> [RFC PATCH v2 0/4] fix large folio for madvise_cold_or_pageout()
> https://lore.kernel.org/r/20230721094043.2506691-1-fengwei.yin@xxxxxxxxx
>
> I'm not sure how far I'll go; maybe I'll start working on something else
> before I finish all of them. I'll see..
>
> Not allowing people to jump in will definitely cause less interactions and
> less involvement/open-ness for the mm community, as sometimes people can't
> easily judge when it's proper to jump in.
>
> IMHO the ideal solution is always keep all discussions public (either
> meetings with recordings, or shared online documents, always use on-list
> discussions, etc.), then share the links.
>
> --
> Peter Xu

--
Best Regards,
Yan, Zi