Re: [PATCH 0/2] RFC: READ/WRITE_ONCE vma/mm cleanups

"Kirill A. Shutemov" <kirill@xxxxxxxxxxxxx> · Mon, 4 Mar 2019 13:12:10 +0300

On Fri, Mar 01, 2019 at 11:54:52AM -0500, Andrea Arcangeli wrote:
> Hello Kirill and Vlastimil,
> 
> On Fri, Mar 01, 2019 at 02:04:38PM +0100, Vlastimil Babka wrote:
> > On 3/1/19 10:37 AM, Kirill A. Shutemov wrote:
> > > On Thu, Feb 28, 2019 at 10:55:48PM -0500, Andrea Arcangeli wrote:
> > >> Hello,
> > >>
> > >> This was a well known issue for more than a decade, but until a few
> > >> months ago we relied on the compiler to stick to atomic accesses and
> > >> updates while walking and updating pagetables.
> > >>
> > >> However now the 64bit native_set_pte finally uses WRITE_ONCE and
> > >> gup_pmd_range uses READ_ONCE as well.
> > >>
> > >> This convert more racy VM places to avoid depending on the expected
> > >> compiler behavior to achieve kernel runtime correctness.
> > >>
> > >> It mostly guarantees gcc to do atomic updates at 64bit granularity
> > >> (practically not needed) and it also prevents gcc to emit code that
> > >> risks getting confused if the memory unexpectedly changes under it
> > >> (unlikely to ever be needed).
> > >>
> > >> The list of vm_start/end/pgoff to update isn't complete, I covered the
> > >> most obvious places, but before wasting too much time at doing a full
> > >> audit I thought it was safer to post it and get some comment. More
> > >> updates can be posted incrementally anyway.
> > > 
> > > The intention is described well to my eyes.
> > > 
> > > Do I understand correctly, that it's attempt to get away with modifying
> > > vma's fields under down_read(mmap_sem)?
> 
> The issue is that we already get away with it, but we do it without
> READ/WRITE_ONCE. The patch should changes nothing, it should only
> reduce the dependency on the compiler to do what we expect.

Yes, it is pre-existing problem. And yes, complier may screw this up.
The patch may reduce dependency on the compiler, but it doesn't mean it
reduces chance of race.

Consider your changes into __mm_populate() and populate_vma_page_range().
You put READ_ONCE() in both functions. But populate_vma_page_range() gets
called from __mm_populate(). Before your change compiler may optimize the
code and load from the memory once for a field. With your changes complier
will issue two loads.

It *increases* chances of the race, not reduces them.

The current locking scheme doesn't allow modifying VMA field without
down_write(mmap_sem).

We do have hacks[1] that try to bypass the limitation, but AFAIK we never
had a solid explanation why this should work. Sparkling READ_ONCE()
doesn't help with this, but makes it appears legitimate.

[1] I believe we also touch vm_flags without proper locking to set/clear
VM_LOCKED.

-- 
 Kirill A. Shutemov