Re: [RFC PATCH v1 0/4] Reduce cost of ptep_get_lockless on arm64

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 26.03.24 18:51, Ryan Roberts wrote:
On 26/03/2024 17:39, David Hildenbrand wrote:
On 26.03.24 18:32, Ryan Roberts wrote:
On 26/03/2024 17:04, David Hildenbrand wrote:

Likely, we just want to read "the real deal" on both sides of the pte_same()
handling.

Sorry I'm not sure I understand? You mean read the full pte including
access/dirty? That's the same as dropping the patch, right? Of course if
we do
that, we still have to keep pte_get_lockless() around for this case. In an
ideal
world we would convert everything over to ptep_get_lockless_norecency() and
delete ptep_get_lockless() to remove the ugliness from arm64.

Yes, agreed. Patch #3 does not look too crazy and it wouldn't really affect
any
architecture.

I do wonder if pte_same_norecency() should be defined per architecture and the
default would be pte_same(). So we could avoid the mkold etc on all other
architectures.

Wouldn't that break it's semantics? The "norecency" of
ptep_get_lockless_norecency() means "recency information in the returned pte
may
be incorrect". But the "norecency" of pte_same_norecency() means "ignore the
access and dirty bits when you do the comparison".

My idea was that ptep_get_lockless_norecency() would return the actual result on
these architectures. So e.g., on x86, there would be no actual change in
generated code.

I think this is a bad plan... You'll end up with subtle differences between
architectures.


But yes, the documentation of these functions would have to be improved.

Now I wonder if ptep_get_lockless_norecency() should actively clear
dirty/accessed bits to more easily find any actual issues where the bits still
matter ...

I did a version that took that approach. Decided it was not as good as this way
though. Now for the life of me, I can't remember my reasoning.

Maybe because there are some code paths that check accessed/dirty without
"correctness" implications? For example, if the PTE is already dirty, no need to
set it dirty etc?

I think I decided I was penalizing the architectures that don't care because all
their ptep_get_norecency() and ptep_get_lockless_norecency() need to explicitly
clear access/dirty. And I would have needed ptep_get_norecency() from day 1 so
that I could feed its result into pte_same().

True. With ptep_get_norecency() you're also penalizing other architectures. Therefore my original thought about making the behavior arch-specific, but the arch has to make sure to get the combination of ptep_get_lockless_norecency()+ptep_same_norecency() is right.

So if an arch decide to ignore bits in ptep_get_lockless_norecency(), it must make sure to also ignore them in ptep_same_norecency(), and must be able to handle access/dirty bit changes differently.

Maybe one could have one variant for "hw-managed access/dirty" vs. "sw managed accessed or dirty". Only the former would end up ignoring stuff here, the latter would not.

But again, just some random thoughts how this affects other architectures and how we could avoid it. The issue I describe in patch #3 would be gone if ptep_same_norecency() would just do a ptep_same() check on other architectures -- and would make it easier to sell :)

--
Cheers,

David / dhildenb





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux