Re: [RFC 1/1] mm: introduce mmap_lock_speculation_{start|end}

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Aug 9, 2024 at 12:36 AM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
> On Thu, Aug 8, 2024 at 3:16 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
> >
> > On Fri, Aug 9, 2024 at 12:05 AM Andrii Nakryiko
> > <andrii.nakryiko@xxxxxxxxx> wrote:
> > > On Thu, Aug 8, 2024 at 2:43 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
> > > >
> > > > On Thu, Aug 8, 2024 at 11:11 PM Andrii Nakryiko
> > > > <andrii.nakryiko@xxxxxxxxx> wrote:
> > > > > On Thu, Aug 8, 2024 at 2:02 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Thu, Aug 8, 2024 at 8:19 PM Andrii Nakryiko
> > > > > > <andrii.nakryiko@xxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Wed, Aug 7, 2024 at 11:23 AM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > Add helper functions to speculatively perform operations without
> > > > > > > > read-locking mmap_lock, expecting that mmap_lock will not be
> > > > > > > > write-locked and mm is not modified from under us.
> > > > > > > >
> > > > > > > > Signed-off-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>
> > > > > > > > Suggested-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > > > > > > > Cc: Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx>
> > > > > > > > ---
> > > > > > >
> > > > > > > This change makes sense and makes mm's seq a bit more useful and
> > > > > > > meaningful. I've also tested it locally with uprobe stress-test, and
> > > > > > > it seems to work great, I haven't run into any problems with a
> > > > > > > multi-hour stress test run so far. Thanks!
> > > > > >
> > > > > > Thanks for testing and feel free to include this patch into your set.
> > > > >
> > > > > Will do!
> > > > >
> > > > > >
> > > > > > I've been thinking about this some more and there is a very unlikely
> > > > > > corner case if between mmap_lock_speculation_start() and
> > > > > > mmap_lock_speculation_end() mmap_lock is write-locked/unlocked so many
> > > > > > times that mm->mm_lock_seq (int) overflows and just happen to reach
> > > > > > the same value as we recorded in mmap_lock_speculation_start(). This
> > > > > > would generate a false positive, which would show up as if the
> > > > > > mmap_lock was never touched. Such overflows are possible for vm_lock
> > > > > > as well (see: https://elixir.bootlin.com/linux/v6.10.3/source/include/linux/mm_types.h#L688)
> > > > > > but they are not critical because a false result would simply lead to
> > > > > > a retry under mmap_lock. However for your case this would be a
> > > > > > critical issue. This is an extremely low probability scenario but
> > > > > > should we still try to handle it?
> > > > > >
> > > > >
> > > > > No, I think it's fine.
> > > >
> > > > Modern computers don't take *that* long to count to 2^32, even when
> > > > every step involves one or more syscalls. I've seen bugs where, for
> > > > example, a 32-bit refcount is not decremented where it should, making
> > > > it possible to overflow the refcount with 2^32 operations of some
> > > > kind, and those have taken something like 3 hours to trigger in one
> > > > case (https://bugs.chromium.org/p/project-zero/issues/detail?id=2478),
> > > > 14 hours in another case. Or even cases where, if you have enough RAM,
> > > > you can create 2^32 legitimate references to an object and overflow a
> > > > refcount that way
> > > > (https://bugs.chromium.org/p/project-zero/issues/detail?id=809 if you
> > > > had more than 32 GiB of RAM, taking only 25 minutes to overflow the
> > > > 32-bit counter - and that is with every step allocating memory).
> > > > So I'd expect 2^32 simple operations that take the mmap lock for
> > > > writing to be faster than 25 minutes on a modern desktop machine.
> > > >
> > > > So for a reader of some kinda 32-bit sequence count, if it is
> > > > conceivably possible for the reader to take at least maybe a couple
> > > > minutes or so between the sequence count reads (also counting time
> > > > during which the reader is preempted or something like that), there
> > > > could be a problem. At that point in the analysis, if you wanted to
> > > > know whether it's actually exploitable, I guess you'd have to look at
> > > > what kinda context you're running in, and what kinda events can
> > > > interrupt/preempt you (like whether someone can send a sufficiently
> > > > dense flood of IPIs to completely prevent you making forward progress,
> > > > like in https://www.vusec.net/projects/ghostrace/), and for how long
> > > > those things can delay you (maybe including what the pessimal
> > > > scheduler behavior looks like if you're in preemptible context, or how
> > > > long clock interrupts can take to execute when processing a giant pile
> > > > of epoll watches), and so on...
> > > >
> > >
> > > And here we are talking about *lockless* *speculative* VMA usage that
> > > will last what, at most on the order of a few microseconds?
> >
> > Are you talking about time spent in task context, or time spent while
> > the task is on the CPU (including time in interrupt context), or about
> > wall clock time?
>
> We are doing, roughly:
>
> mmap_lock_speculation_start();
> rcu_read_lock();
> vma_lookup();
> rb_find();
> rcu_read_unlock();
> mmap_lock_speculation_end();
>
>
> On non-RT kernel this can be prolonged only by having an NMI somewhere
> in the middle.

I don't think you're running with interrupts off here? Even on kernels
without any preemption support, normal interrupts (like timers,
incoming network traffic, TLB flush IPIs) should still be able to
interrupt here. And in CONFIG_PREEMPT kernels (which enable
CONFIG_PREEMPT_RCU by default), rcu_read_lock() doesn't block
preemption, so you can even get preempted here - I don't think you
need RT for that.

My understanding is that the main difference between normal
CONFIG_PREEMPT and RT is whether spin_lock() blocks preemption.

> On RT it can get preempted even within RCU locked
> region, if I understand correctly. If you manage to make this part run
> sufficiently long to overflow 31-bit counter, it's probably a bigger
> problem than mmap's sequence wrapping over, no?


[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux