Peter Zijlstra wrote:
On Thu, 2008-04-17 at 08:25 -0700, Linus Torvalds wrote:
On Thu, 17 Apr 2008, Peter Zijlstra wrote:
Would this be sufficient to address that comment's conern?
It would be nicer to just add a "native_get_pte()" to x86, to match the
already-existing "native_set_pte()".
See, I _knew_ I was missing something obvious :-/
And that "barrier()" should b "smp_rmb()". They may be the same code
sequence, but from a conceptual angle, "smp_rmb()" makes a whole lot more
sense.
Finally, I don't think that comment is correct in the first place. It's
not that simple. The thing is, even *with* the memory barrier in place, we
may have:
CPU#1 CPU#2
===== =====
fast_gup:
- read low word
native_set_pte_present:
- set low word to 0
- set high word to new value
- read high word
- set low word to new value
and so you read a low word that is associated with a *different* high
word! Notice?
So trivial memory ordering is _not_ enough.
So I think the code literally needs to be something like this
#ifdef CONFIG_X86_PAE
static inline pte_t native_get_pte(pte_t *ptep)
{
pte_t pte;
retry:
pte.pte_low = ptep->pte_low;
smp_rmb();
pte.pte_high = ptep->pte_high;
smp_rmb();
if (unlikely(pte.pte_low != ptep->pte_low)
goto retry;
return pte;
}
#else
#define native_get_pte(ptep) (*(ptep))
#endif
but I have admittedly not really thought it fully through.
Looks sane here; Clark can you give this a spin?
Jeremy, did I get the paravirt stuff right?
You shouldn't need to do anything special for paravirt. set_pte is
necessary because it may have side-effects (like a hypervisor call), but
get_pte should be side-effect free. There's no other need for it; any
special processing on the pte value itself is done in pte_val().
J
--
To unsubscribe from this list: send the line "unsubscribe linux-arch" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html