On Friday, 3 December 2021 2:21:41 PM AEDT Peter Xu wrote:
> On Thu, Dec 02, 2021 at 10:06:46PM +1100, Alistair Popple wrote:
> > On Tuesday, 16 November 2021 12:49:50 AM AEDT Peter Xu wrote:
> > > This check existed since the 1st git commit of Linux repository, but at that
> > > time there's no page migration yet so I think it's okay.
> > >
> > > With page migration enabled, it should logically be possible that we zap some
> > > shmem pages during migration. When that happens, IIUC the old code could have
> > > the RSS counter accounted wrong on MM_SHMEMPAGES because we will zap the ptes
> > > without decreasing the counters for the migrating entries. I have no unit test
> > > to prove it as I don't know an easy way to trigger this condition, though.
> > >
> > > Besides, the optimization itself is already confusing IMHO to me in a few points:
>
> > I've spent a bit of time looking at this and think it would be good to get
> > cleaned up as I've found it hard to follow in the past. What I haven't been
> > able to confirm is if anything relies on skipping swap entries or not. From
> > you're description it sounds like skipping swap entries was done as an
> > optimisation rather than for some functional reason is that correct?
>
> Thanks again for looking into this patch, Alistair. I appreciate it a lot.
>
> I should say that it's how I understand this, and I could be wrong, that's the

That makes two of us!

> major reason why I marked this patch as RFC.
>
> As I mentioned this behavior existed in the 1st commit of git history of Linux,
> that's the time when there's no special swap entries at all but all the swap
> entries are "real" swap entries for anonymous.
>
> That's why I think it should be an optimization because when previously
> zap_details (along with zap_details->mapping in the old code) is non-null, and
> that's definitely not an anonymous page. Then skipping swap entry for file
> backed memory sounds like a good optimization.

Thanks.
That was the detail I was trying to figure out, i.e. why something might want to
skip swap entries. I will spend some more time looking to be sure though.

> However after that we've got all kinds of swap entries introduced, and as you
> spotted at least the migration entry should be able to exist to some file
> backed memory type (shmem).
>
> >
> > >   - The wording "skip swap entries" is confusing, because we're not skipping all
> > >     swap entries - we handle device private/exclusive pages before that.
> > >
> > >   - The skip behavior is enabled as long as zap_details pointer passed over.
> > >     It's very hard to figure that out for a new zap caller because it's unclear
> > >     why we should skip swap entries when we have zap_details specified.
> > >
> > >   - With modern systems, especially performance critical use cases, swap
> > >     entries should be rare, so I doubt the usefulness of this optimization
> > >     since it should be on a slow path anyway.
> > >
> > >   - It is not aligned with what we do with huge pmd swap entries, where in
> > >     zap_huge_pmd() we'll do the accounting unconditionally.
> > >
> > > This patch drops that trick, so we handle swap ptes coherently. Meanwhile we
> > > should do the same mapping check upon migration entries too.
> >
> > I agree, and I'm not convinced the current handling is very good - if we
> > skip zapping a migration entry then the page mapping might get restored when
> > the migration entry is removed.
> >
> > In practice I don't think that is a problem as the migration entry target page
> > will be locked, and if I'm understanding things correctly callers of
> > unmap_mapping_*() need to have the page(s) locked anyway if they want to be
> > sure the page is unmapped. But it seems removing the migration entries better
> > matches the intent and I can't think of a reason why they should be skipped.
>
> Exactly, that's what I see this too.
>
> I used to think there is a bug for shmem migration (if you still remember I
> mentioned it in some of my previous patchset cover letters), but then I found
> migration requires page lock then it's probably not a real bug at all. However
> that's never a convincing reason to ignore swap entries.

Right, it also took me a while to convince myself there wasn't a bug there so
if for some reason this patch doesn't end up going in I think we should still
treat migration entries the same way as device-private entries.

> I wanted to "ignore" this problem by the "adding a flag to skip swap entry"
> patch, but as you saw it was very not welcomed anyway, so I have no choice to
> try find the fundamental reason for skipping swap entries. When I figured I
> cannot really find any good reason and skipping seems to be even buggy, hence
> this patch. If this is the right way, the zap pte path can be simplified quite
> a lot after patch 2 of this series.

Yep, I think it's definitely worth trying to figure out. And if it turns out
there is some good reason for skipping we better make sure to document it in a
comment somewhere so none of this good research is lost. However I haven't yet
come up with a reason why they need to be skipped either.

 - Alistair