On Tue, 23 Jul 2024 at 18:05, Will Deacon <will@xxxxxxxxxx> wrote: > > On Tue, Jul 23, 2024 at 05:02:15PM +0200, Ard Biesheuvel wrote: > > On Tue, 23 Jul 2024 at 16:52, Will Deacon <will@xxxxxxxxxx> wrote: > > > On Fri, Jul 19, 2024 at 11:02:29AM -0700, Ard Biesheuvel wrote: ... > > > > > > > > We might add > > > > > > > > if (pgtable_l4_enabled()) > > > > pgdp = &pgd; > > > > > > > > here to preserve the existing 'lockless' behavior when PUDs are not > > > > folded. > > > > > > The code still needs to be 'lockless' for the 5-level case, so I don't > > > think this is necessary. > > > > The 5-level case is never handled here. > > Urgh, yes, sorry. I've done a fantasticly bad job of explaining myself. > > > There is the 3-level case, where the runtime PUD folding needs the > > actual address in order to recalculate the descriptor address using > > the correct shift. In this case, we don't dereference the pointer > > anyway so the 'lockless' thing doesn't matter (afaict) > > > > In the 4-level case, we want to preserve the original behavior, where > > pgd is not reloaded from pgdp. Setting pgdp to &pgd achieves that. > > Right. What I'm trying to get at is the case where we have folding. For > example, with my patch applied, if we have 3 levels then the lockless > GUP walk looks like: > > > pgd_t pgd = READ_ONCE(*pgdp); > > p4dp = p4d_offset_lockless(pgdp, pgd, addr); > => Returns pgdp > p4d_t p4d = READ_ONCE(*p4dp); > > pudp = pud_offset_lockless(p4dp, p4d, addr); > => Returns &p4d, which is again the pgdp > pud_t pud = READ_ONCE(*pudp); > > > So here we're reloading the same pointer multiple times and my argument > is that if we need to add logic to avoid this for the > pgtable_l4_enabled() case, then we have bigger problems. > The 3-level case is not relevant here. My suggestion only affects the 4-level case: if (pgtable_l4_enabled()) pgdp = &pgd; which prevents us from evaluating *pgdp twice, which seems to me to be the reason these routines exist in the first place. Given that the 3-level runtime-folded case is the one we are trying to fix here, I'd argue that keeping the 4-level case the same as before is important. > > > Yes, we'll load the same entry multiple times, > > > but it should be fine because they're in the context of a different > > > (albeit folded) level. > > > > > > > I don't understand what you are saying here. Why is that fine? > > I think it's fine because (a) the CPU guarantees same address > read-after-read ordering and (b) We only evaluate the most recently read > value. It would be a problem if we mixed data from different reads but, > because the use is confined to that 'level', we don't end up doing that. > > Dunno, am I making any sense? > So what is the point of p?d_offset_lockless()? Is it a performance optimization that we don't care about on arm64? Or does this reasoning still only apply to the folded case?