On Mon, Aug 24, 2020 at 8:38 AM Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> wrote:
>
> Sure, KSM does not increment page counter, when a page becomes PageKsm().
> Is patch comment about that? Even if so, I don't understand what this
> comment is about. "PageKsm() does not take additional counter" is not
> a reason the page can't be reused there.

No, the reason is that we don't want to reuse a KSM page, and the
page_count() check apparently isn't sufficient in all circumstances.

So the comment is there to explain why a plain "page_count()"
apparently isn't sufficient.

> The reason is that readers
> of this page may increase a counter without taking the lock, so
> this page_count() == 1 under the lock does not guarantee anything.

The intent is to get rid of

 (a) all the locking costs. The "lock_page()" we had here used to be
     very expensive.

     It's shown up several times in the page lock problems, and the
     reason seems to be simply that this is _the_ hottest non-IO path
     there is, so it's somewhat easy to generate lots of contention on
     a shared page.

 (b) the problems with GUP - because GUP (and some other page sharing
     cases) don't increase the page_mapcount(), GUP was "invisible" to
     the re-use code, and as a result the reuse code was a buggy mess.

 (c) the complete and pointless complexity of this path, that isn't
     actually done anywhere else.

The GUP issue was the immediate - and currently existing - bug caused
by this, but the locking costs are another example.

So the page reuse is simply wrong. It's almost certainly also
pointless and entirely historical. The _reason_ for trying to reuse
the KSM pages was documented not as performance, but simply to match
the other (also pointless) complexity of the swap cache reuse.

So the intent is to do the "page_count()" test early, to get rid of
the locking issues with any shared pages.

So the logic is "if this page is marked PageKsm(), or if it has an
elevated page count, don't even try - just copy".
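The early check described above can be sketched as a small userspace
model. This is not the kernel code and not the actual patch: "struct
model_page", "enum wp_action" and "wp_decide()" are hypothetical names
made up for illustration, with "refcount" standing in for page_count()
and "ksm" for PageKsm().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's struct page. */
struct model_page {
	int refcount;	/* models page_count(): every reference, GUP included */
	bool ksm;	/* models PageKsm() */
};

enum wp_action { WP_REUSE, WP_COPY };

/*
 * Decide how to handle a write fault on a write-protected anonymous
 * page. No lock is taken for the decision: a KSM page, or any
 * reference beyond the faulting mapping's own, means "just copy".
 */
static enum wp_action wp_decide(const struct model_page *page)
{
	/* The faulting mapping itself always holds one reference. */
	assert(page->refcount >= 1);

	if (page->ksm || page->refcount != 1)
		return WP_COPY;

	return WP_REUSE;
}
```

In this model a page with refcount 1 and no KSM flag is the only one
that gets reused; a second reference of any kind (another mapping, a
GUP pin) or the KSM flag sends the fault down the copy path without
serializing on anything.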

To make a very concrete example: it's not unusual at all to basically
have simultaneous page faults on a dirty page because it's COW-shared
in both parent and child. Trivial to trigger, with the child and
parent running on different CPUs and just writing to the same page
right after a fork.

And there is absolutely _zero_ reason that should be serialized by
anything at all. The parent and child are complete share-nothing
things: taking the page lock was and is simply wrong.

Solution: don't do it. Just notice "Oh, this page has other users"
(and page_count() is the correct thing to do for that, not
page_mapcount(), since GUP is also another user), and actively *avoid*
any serialization. Just copy the damn thing.

I'll take full blame for the historical stupidity. This was a bigger
deal back in the days when 4MB of RAM was considered normal.

Plus page locking wasn't even an issue back then. In fact, no locking
at all was needed back when the "try to reuse" code was originally
written. Things were simpler back then.

It's just that I'm 100% convinced that that historical legacy is very
very wrong these days. That "serialize on page lock if COW fault in
parent and child" is just an example of where this is fundamentally
wrong. But the whole complexity in the map count logic is just wholly
and totally wrong too. I dare anybody to read the swapfile code for
"total_map_swapcount" and tell me they understand it fully.

So my theory is that this code - that is meant to *improve*
performance by sharing pages aggressively after a fork(), because that
used to be a primary issue - is now in fact making performance *much
worse*, because it's trying to optimize for a case that doesn't even
matter any more (does anybody truly believe that swap cache and shared
COW pages are a major source of performance?) and it does so at a huge
complexity _and_ performance cost.

So ripping out the KSM reuse code is just another "this is pointless
and wrong" issue.
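
The parent/child pattern from the concrete example above is easy to
reproduce from userspace. The following is only a sketch under stated
assumptions: "cow_after_fork_demo()" is a hypothetical helper name, it
assumes a POSIX system with MAP_ANONYMOUS, and it does not force the
two write faults to overlap in time. It only shows the share-nothing
semantics that make serializing them unnecessary.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Dirty one private anonymous page, fork, and let parent and child
 * both write to it. Returns 0 if each side sees only its own write
 * afterwards, -1 on failure.
 */
static int cow_after_fork_demo(void)
{
	long page_size = sysconf(_SC_PAGESIZE);
	assert(page_size > 0);

	unsigned char *page = mmap(NULL, (size_t)page_size,
				   PROT_READ | PROT_WRITE,
				   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (page == MAP_FAILED)
		return -1;

	/* Dirty the page before the fork so it becomes COW-shared. */
	memset(page, 0xAA, (size_t)page_size);

	pid_t pid = fork();
	if (pid < 0) {
		munmap(page, (size_t)page_size);
		return -1;
	}

	if (pid == 0) {
		/* Child: write fault on the COW-shared page. */
		page[0] = 'C';
		_exit(page[0] == 'C' && page[1] == 0xAA ? 0 : 1);
	}

	/* Parent: write fault on the same page, possibly concurrently. */
	page[0] = 'P';

	int status = 0;
	int ok = waitpid(pid, &status, 0) == pid &&
		 WIFEXITED(status) && WEXITSTATUS(status) == 0 &&
		 page[0] == 'P';	/* the child's 'C' never shows up here */

	munmap(page, (size_t)page_size);
	return ok ? 0 : -1;
}
```

Calling cow_after_fork_demo() from a main() and checking for a zero
return is enough to exercise it. Neither side ever observes the
other's write, so nothing in the result depends on the order in which
the two faults are handled.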

If you seriously try to KSM share a page that now only has _one_
single user left, and that one single user writes to it and is
modifying it, then the problem is absolutely *NOT* that we should try
to re-use the page. No, the problem is that the KSM code picked a
horribly bad page to try to share.

Will that happen _occasionally_? Sure. But if it happens once in a
blue moon, we really shouldn't have that code to deal with it.

It's really that simple. All that reuse code is pointless and wrong.
It has historical roots, and it made sense at the time, but in this
day and age I'm convinced it's completely wrong.

Now, I'm _also_ admittedly convinced that I am occasionally completely
wrong, and people do odd things, and maybe there are loads where it
really matters. I doubt it in this case, but I think what we should do
is rip out all the existing historical code, and _if_ somebody has a
case where it matters, we can look at THAT case, and people can show

 (a) what the exact pattern is that we actually care about

 (b) numbers

and then maybe we can re-introduce some sort of re-use code with -
hopefully - a much more targeted and documented "this is why this
matters" approach.

So the intent is to get rid of the page lock thing, but I also hope
that long-term, we can get rid of reuse_swap_page() and some of that
mapcount stuff entirely.

                Linus