On 06.09.22 16:30, Jason Gunthorpe wrote:
On Tue, Sep 06, 2022 at 03:57:30PM +0200, David Hildenbrand wrote:
READ_ONCE primarily is a marker that the data being read is unstable
and that the compiler must avoid all instability when reading it. eg
in this case the compiler could insanely double read the value, even
though the 'if' requires only a single read. This would result in
corrupt calculation.
As we have a full memory barrier + compile barrier, the compiler might
indeed do double reads and all that stuff. BUT, it has to re-read after we
incremented the refcount, and IMHO that's the important part to detect the
change.
Yes, it is important, but it is not the only important part.
The compiler still has to exectute "if (*a != b)" *correctly*.
This is what READ_ONCE is for. It doesn't set order, it doesn't
implement a barrier, it tells the compiler that '*a' is unstable data
and the compiler cannot make assumptions based on the idea that
reading '*a' multiple times will always return the same value.
If the compiler makes those assumptions then maybe even though 'if (*a
!= b)' is the reality, it could mis-compute '*a == b'. You enter into
undefined behavior here.
Though it is all very unlikely, the general memory model standard is
to annotate with READ_ONCE.
The only thing I could see going wrong in the comparison once the stars
alingn would be something like the following:
if (*a != b)
implemented as
if ((*a).lower != b.lower && (*a).higher != b.higher)
This could only go wrong if we have more than one change such that:
Original:
*a = 0x00000000ffffffffull;
First modification:
*a = 0xffffffffffffffffull;
Second modification:
*a = 0x00000000eeeeeeeeull;
If we race with both modifications, we could see that ffffffff matches,
and could see that 00000000 matches as well.
So I agree that we should change it, but not necessarily as an urgent
fix and not necessarily in this patch. It's best to adjust all gup_*
functions in one patch.
... I do wonder if we want to reuse ptep_get_lockless() instead of the
READ_ONCE(). CONFIG_GUP_GET_PTE_LOW_HIGH is confusing.
--
Thanks,
David / dhildenb