On Tue, Nov 05, 2024 at 10:00:59AM +0800, Huang, Ying wrote:
> Hi, Gregory,
>
> Gregory Price <gourry@xxxxxxxxxx> writes:
>
> > My observations between these 3 proposals:
> >
> > - The page-lock state is complex while trying to interpose in mark_folio_accessed,
> >   meaning inline promotion inside that interface is a non-starter.
> >
> >   We found one deadlock during task exit due to the PTL being held.
> >
> >   This worries me more generally, but we did find some success changing certain
> >   calls to mark_folio_accessed to mark_folio_accessed_and_promote - rather than
> >   modifying mark_folio_accessed. This ends up changing code in similar places
> >   to your hook - but catches more conditions that mark a page accessed.
> >
> > - For Keith's proposal, promotion via LRU requires memory pressure on the lower
> >   tier to cause a shrink and therefore promotions. I'm not well versed in
> >   LRU semantics, but it seems we could try proactive reclaim here.
> >
> >   Doing promote-reclaim and demote/swap/evict reclaim on the same triggers
> >   seems counter-intuitive.
>
> IIUC, in TPP paper (https://arxiv.org/abs/2206.02878), a similar method
> is proposed for page promoting. I guess that it works together with
> proactive reclaiming.
>

Each process is responsible for doing page table scanning for NUMA hint
faults and producing a promotion. Since the structure used there is the
page tables themselves, there isn't an existing recording mechanism for
us to piggy-back on to defer migrations to later.

> > - Doing promotions inline with access creates overhead. I've seen some research
> >   suggesting 60us+ per migration - so aggressiveness could harm performance.
> >
> >   Doing it async would alleviate inline access overheads - but it could also make
> >   promotion pointless if time-to-promote is too far from liveliness of the pages.
>
> Async promotion needs to deal with the resource (CPU/memory) charging
> too. You do some work for a task, so you need to charge the consumed
> resource for the task.
>

This is a good point, and would heavily complicate things. Simple is
better, let's avoid that.

> > - Doing async-promotion may also require something like PG_PROMOTABLE (as proposed
> >   by Keith's patch), which will obviously be a very contentious topic.
>
> Some additional data structure can be used to record pages.
>

I have an idea inspired by these three sets, I'll bumble my way through
a prototype.

> > Reading more into the code surrounding this and other migration logic, I also
> > think we should explore an optimization to mempolicy that tries to aggressively
> > keep certain classes of memory on the local node (RX memory and stack
> > for example).
> >
> > Other areas of reclaim try to actively prevent demoting this type of memory, so we
> > should try not to allocate it there in the first place.
>
> We have already used DRAM first allocation policy. So, we need to
> measure its effect first.
>

Yes, but also as the weighted interleave patch set demonstrated, it can
be beneficial to change this to distribute allocations from the outset -
however, distributing all allocations led to less reliable performance
than just distributing the heap.

Another topic for another thread.

~Gregory