On 31.05.24 23:46, Dave Hansen wrote:
> On 5/31/24 11:04, Byungchul Park wrote:
> ...
>> I don't believe you do not agree with the concept itself. Thing is
>> the current version is not good enough. I will do my best by doing
>> what I can do.
> 
> More performance is good. I agree with that.
> 
> But it has to be weighed against the risk and the complexity. The more
> I look at this approach, the more I think this is not a good trade off.
> There's a lot of risk and a lot of complexity and we haven't seen the
> full complexity picture. The gaps are being fixed by adding complexity
> in new subsystems (the VFS in this case).
> 
> There are going to be winners and losers, and this version for example
> makes file writes lose performance.
Just to be crystal clear: I disagree with the concept of leaving stale
TLB entries in place in an attempt to gain performance.
There is the inherent problem that a CPU reading from such (unmapped but
not yet flushed) memory will not get a page fault, which I think is the
most controversial part here (besides the interaction with other deferred
TLB flushing, and how this glues into the buddy allocator).
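
To illustrate the contract at stake with a minimal, purely hypothetical
userspace example (nothing from the series, just ordinary mmap/munmap):
today, once munmap() has returned, any access to the region faults; a
deferred flush is exactly what would allow a CPU that still holds a stale
translation to read old data without faulting.

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>

static sigjmp_buf env;

static void segv_handler(int sig)
{
	(void)sig;
	siglongjmp(env, 1);		/* recover from the expected fault */
}

int main(void)
{
	size_t len = 4096;
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	p[0] = 42;		/* populate the PTE and the TLB entry */
	munmap(p, len);		/* with current behavior, the TLB is
				 * flushed before this returns */

	signal(SIGSEGV, segv_handler);
	if (sigsetjmp(env, 1) == 0)
		printf("read after munmap: %d (stale data!)\n", p[0]);
	else
		printf("read after munmap faulted, as expected\n");
	return 0;
}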
What we have done so far is limit the time frame where that could
happen, under well-controlled circumstances. On the common unmap/zap
path, we perform the batched TLB flush before any page faults / VMA
changes would have been possible and before munmap() would have returned
with "success". Now that time frame could become significantly longer.
So in the current code, at the point in time where we would process a page
fault, mmap()/munmap()/..., the TLB would already have been flushed.
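
As a toy model of that ordering (all names below are made up for
illustration, this is not the kernel's actual unmap/zap code): the flush
happens after the PTEs are cleared, but before the pages can be reused
and before userspace sees "success".

#include <stddef.h>

struct mm { int dummy; };

static void clear_ptes(struct mm *mm, unsigned long start, size_t len) { }
static void flush_tlb_on_all_cpus(struct mm *mm) { }
static void free_backing_pages(struct mm *mm, unsigned long start, size_t len) { }

static int model_munmap(struct mm *mm, unsigned long start, size_t len)
{
	clear_ptes(mm, start, len);		/* 1. unmap: PTEs are gone */
	flush_tlb_on_all_cpus(mm);		/* 2. flush: no CPU keeps a
						 *    stale translation ... */
	free_backing_pages(mm, start, len);	/* 3. ... before the pages
						 *    can be reused */
	return 0;				/* 4. only now does munmap()
						 *    report "success" */
}

int main(void)
{
	struct mm mm = { 0 };
	return model_munmap(&mm, 0x1000, 4096);
}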
To "mimic" the old behavior, we'd essentially have to force any page
faults/mmap/whatsoever to perform the deferred flush such that the CPU
will see the "reality" again. Not sure how that could be done in a
*consistent* way (check whenever we take the mmap/vma lock etc ...) and
if there would still be a performance win.
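
Purely as a sketch of the kind of hook that would be needed (hypothetical
names throughout, nothing like this exists in the series as posted):
resolve the deferred flush whenever the address space is about to be
observed or changed, e.g. when taking the mmap/VMA lock or entering the
page fault path. Finding all such entry points consistently, and keeping
any performance win once they are all covered, is the open question.

#include <stdbool.h>

struct mm {
	bool tlb_flush_deferred;	/* hypothetical: set when an unmap
					 * left stale TLB entries behind */
};

static void flush_tlb_on_all_cpus(struct mm *mm)
{
	mm->tlb_flush_deferred = false;	/* stand-in for the real flush/IPIs */
}

/* Would have to be called from every mmap/VMA-lock acquisition, page
 * fault, etc. -- anywhere the address space is observed or changed. */
static void resolve_deferred_tlb_flush(struct mm *mm)
{
	if (mm->tlb_flush_deferred)
		flush_tlb_on_all_cpus(mm);
}

int main(void)
{
	struct mm mm = { .tlb_flush_deferred = true };
	resolve_deferred_tlb_flush(&mm);	/* e.g. on mmap-lock acquisition */
	return mm.tlb_flush_deferred;		/* 0: flush has been performed */
}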
--
Cheers,
David / dhildenb