On Sat, Jun 01, 2024 at 09:22:17AM +0200, David Hildenbrand wrote:
> On 31.05.24 23:46, Dave Hansen wrote:
> > On 5/31/24 11:04, Byungchul Park wrote:
> > ...
> > > I don't believe you do not agree with the concept itself. Thing is
> > > the current version is not good enough. I will do my best by doing
> > > what I can do.
> >
> > More performance is good.  I agree with that.
> >
> > But it has to be weighed against the risk and the complexity.  The more
> > I look at this approach, the more I think this is not a good trade off.
> > There's a lot of risk and a lot of complexity and we haven't seen the
> > full complexity picture.  The gaps are being fixed by adding complexity
> > in new subsystems (the VFS in this case).
> >
> > There are going to be winners and losers, and this version for example
> > makes file writes lose performance.
> >
> > Just to be crystal clear: I disagree with the concept of leaving stale
> > TLB entries in place in an attempt to gain performance.
>
> There is the inherent problem that a CPU reading from such (unmapped but
> not flushed yet) memory will not get a page fault, which I think is the
> most controversial part here (besides interaction with other deferred
> TLB flushing, and how this glues into the buddy).
>
> What we used to do so far was limiting the timeframe where that could
> happen, under well-controlled circumstances. On the common unmap/zap
> path, we perform the batched TLB flush before any page faults / VMA
> changes would have been possible and munmap() would have returned with
> "success". Now that time frame could be significantly longer.
>
> So in current code, at the point in time where we would process a page
> fault, mmap()/munmap()/... the TLB would have been flushed already.
>
> To "mimic" the old behavior, we'd essentially have to force any page
> faults/mmap/whatsoever to perform the deferred flush such that the CPU
> will see the "reality" again.
> Not sure how that could be done in a *consistent* way (check whenever
> we take the mmap/vma lock etc ...) and if there would still be a
> performance win.

From luf's point of view, the points where the deferred flush should be
performed are simply:

   1. when changing the vma maps, that might be luf'ed.
   2. when updating data of the pages, that might be luf'ed.

All we need to do is to identify those points:

   1. when changing the vma maps, that might be luf'ed.
      a) mmap and munmap, i.e. the fault handler or unmap_region().
      b) changing permission to writable, i.e. mprotect or the fault
         handler.
      c) whatever I'm missing.
   2. when updating data of the pages, that might be luf'ed.
      a) updating files through the vfs, e.g. file_end_write().
      b) updating files through writable maps, i.e. 1-a) or 1-b).
      c) whatever I'm missing.

Some of these paths already perform the necessary TLB flush and the
others do not. luf has to handle the others, which is what I've been
focusing on. Of course, there might be things I'm missing.

Worth noting again, luf currently works only on *migration* and
*reclaim*. The thing is when to stop the pending flush initiated by luf
from migration or reclaim.

	Byungchul

> --
> Cheers,
>
> David / dhildenb