On Fri, 01 Mar 2019, Andrea Arcangeli wrote:
On Fri, Mar 01, 2019 at 02:04:38PM +0100, Vlastimil Babka wrote:
On 3/1/19 10:37 AM, Kirill A. Shutemov wrote:
> On Thu, Feb 28, 2019 at 10:55:48PM -0500, Andrea Arcangeli wrote:
>> Hello,
>>
>> This was a well known issue for more than a decade, but until a few
>> months ago we relied on the compiler to stick to atomic accesses and
>> updates while walking and updating pagetables.
>>
>> However now the 64bit native_set_pte finally uses WRITE_ONCE and
>> gup_pmd_range uses READ_ONCE as well.
>>
>> This convert more racy VM places to avoid depending on the expected
>> compiler behavior to achieve kernel runtime correctness.
>>
>> It mostly guarantees gcc to do atomic updates at 64bit granularity
>> (practically not needed) and it also prevents gcc to emit code that
>> risks getting confused if the memory unexpectedly changes under it
>> (unlikely to ever be needed).
>>
>> The list of vm_start/end/pgoff to update isn't complete, I covered the
>> most obvious places, but before wasting too much time at doing a full
>> audit I thought it was safer to post it and get some comment. More
>> updates can be posted incrementally anyway.
>
> The intention is described well to my eyes.
>
> Do I understand correctly, that it's attempt to get away with modifying
> vma's fields under down_read(mmap_sem)?
The issue is that we already get away with it, but we do it without
READ/WRITE_ONCE. The patch should changes nothing, it should only
reduce the dependency on the compiler to do what we expect.
If that's the intention, then IMHO it's not that well described. It
talks about "racy VM places" but e.g. the __mm_populate() changes are
for code protected by down_read(). So what's going on here?
expand_stack can move anonymous vma vm_end up or vm_start/pgoff down,
while we hold the mmap_sem for writing. See the location of the three
WRITE_ONCE in the patch.
You mean for reading, right? Yes, with expand_stack being held for read
such members were never really serialized by the mmap_sem and thus we
should not be computing stale values.
Acked-by: Davidlohr Bueso <dbueso@xxxxxxx>
Thanks.