Re: [PATCH] thp, mm: remove comments on serialization of THP split vs. gup_fast

On Thu, Mar 10, 2016 at 05:10:35PM +0100, Andrea Arcangeli wrote:
> On Thu, Feb 25, 2016 at 10:50:14PM -0800, Hugh Dickins wrote:
> > It's a useful suggestion from Gerald, and your THP rework may have
> > brought us closer to being able to rely on RCU locking rather than
> > IRQ disablement there; but you were right just to delete the comment,
> > there are other reasons why fast GUP still depends on IRQs disabled.
> > 
> > For example, see the fallback tlb_remove_table_one() in mm/memory.c:
> > that one uses smp_call_function() sending IPI to all CPUs concerned,
> > without waiting an RCU grace period at all.
> 
> I fully agree, the refcounting change just drops the THP splitting from
> the equation, but everything else remains. It's not like x86 is using
> RCU for gup_fast when CONFIG_TRANSPARENT_HUGEPAGE=n.
> 
> The main issue Peter also pointed out is how it can be faster to wait
> an RCU grace period than sending an IPI to only the CPUs that have an
> active_mm matching the one the page belongs to

Typically RCU (sched) grace periods take a relative 'forever' compared
to sending IPIs. That is, synchronize_sched() is typically slower.

But, on the upside, not sending IPIs will not perturb those other
CPUs, which is something HPC/RT people like.

> and I'm not exactly
> sure the cost of disabling irqs in gup_fast is going to pay off.

Entirely depends on the workload of course, but you can do a lot of
gup_fast compared to munmap()s. So making gup_fast faster seems like a
potential win. Also, is anybody really interested in munmap()
performance?

> It's
> not just swap, large munmap should be able to free up pagetables or
> pagetables would get a footprint out of proportion with the Rss of the
> process, and in turn it'll have to either block synchronously for long
> before returning to userland, or return to userland when the pagetable
> memory is still not free, and userland may mmap again and munmap again
> in a loop and being legit doing so too, with unclear side effects with
> regard to false positive OOM.

I'm not seeing that. The only point where this matters at all is if the
batch alloc fails; otherwise the RCU_TABLE_FREE stuff uses
call_rcu_sched(), and what you write above is true already.

Now, RCU already has an oom_notifier to push work harder if we approach
that.

> Then there's another issue with synchronize_sched(),
> __get_user_pages_fast has to be safe to run from irq (note the
> local_irq_save instead of local_irq_disable) and KVM leverages it.

This is unchanged. synchronize_sched() serializes against anything that
disables preemption, and having IRQs disabled is very much included in
that.

So there should be no problem running this from IRQ context.
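
As a rough illustration of the local_irq_save vs. local_irq_disable
point quoted above, here is a minimal userspace model of why the
save/restore pairing is the one that stays correct when the caller
already has IRQs off. It is only a sketch: the single boolean stands in
for the per-CPU interrupt state, and every model_* name is hypothetical,
not a kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of one CPU's "IRQs enabled" state. */
static bool model_irqs_enabled = true;

/* Modelled on local_irq_save(): remember the previous state, then disable. */
static bool model_irq_save(void)
{
	bool was_enabled = model_irqs_enabled;

	model_irqs_enabled = false;
	return was_enabled;
}

/* Modelled on local_irq_restore(): put back whatever the caller had. */
static void model_irq_restore(bool was_enabled)
{
	model_irqs_enabled = was_enabled;
}

/* Modelled on local_irq_disable()/local_irq_enable(): unconditional. */
static void model_irq_disable(void)
{
	model_irqs_enabled = false;
}

static void model_irq_enable(void)
{
	model_irqs_enabled = true;
}

/* Lockless walk bracketed by save/restore: entry state is preserved. */
static void model_walk_save_restore(void)
{
	bool flags = model_irq_save();

	/* The lockless page-table walk would go here; it relies on IRQs off. */
	assert(!model_irqs_enabled);
	model_irq_restore(flags);
}

/* Same walk with disable/enable: the caller's state is clobbered. */
static void model_walk_disable_enable(void)
{
	model_irq_disable();
	assert(!model_irqs_enabled);
	model_irq_enable();
}
```

Entering model_walk_save_restore() with the flag already false leaves it
false on return, whereas model_walk_disable_enable() always returns with
it true, which is what would make that variant unsafe to call from a
context that needs IRQs to stay off.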

> KVM
> just requires it to be atomic so it can run from inside a preempt
> disabled section (i.e. inside a spinlock), I'm fairly certain the
> irq-safe guarantee could be dropped without pain and
> rcu_read_lock_sched() would be enough, but the documentation of the
> IRQ-safe guarantees provided by __get_user_pages_fast should be also
> altered if we were to use synchronize_sched() and that's a symbol
> exported to GPL modules too.

No changes needed.

> Overall my main concern in switching x86 to RCU gup-fast is the
> performance of synchronize_sched in large munmap pagetable teardown.

Normally, as already established by Martin, you should not actually ever
encounter the synchronize_sched() call. Only under severe memory
pressure, when the batch alloc in tlb_remove_table() fails, is this ever
an issue.

And at the point where such allocations fail, performance typically
isn't a concern anymore.
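
To make the "batch alloc fails" case concrete, here is a minimal
userspace sketch of the control flow described above. It is a model
under stated assumptions, not the kernel's code: every model_* name,
MODEL_BATCH_MAX and the simulate_alloc_failure flag are made up for
illustration, and the slow path only counts calls at the point where the
real fallback would have to wait (an IPI today, or a grace period in the
scheme being discussed) before freeing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MODEL_BATCH_MAX 4

struct model_batch {
	size_t nr;
	void *tables[MODEL_BATCH_MAX];
};

struct model_gather {
	struct model_batch *batch;
	bool simulate_alloc_failure;   /* stand-in for severe memory pressure */
	unsigned long deferred_frees;  /* tables released via the batched path */
	unsigned long sync_fallbacks;  /* tables released via the slow path */
};

/*
 * Batched path. In the scheme discussed in the thread, the free would be
 * deferred through call_rcu_sched(); the model just counts the tables.
 */
static void model_flush(struct model_gather *g)
{
	if (!g->batch)
		return;
	g->deferred_frees += g->batch->nr;
	free(g->batch);
	g->batch = NULL;
}

/* Slow path: one table at a time, the caller waits before freeing. */
static void model_remove_one(struct model_gather *g, void *table)
{
	(void)table;
	g->sync_fallbacks++;
}

static void model_remove_table(struct model_gather *g, void *table)
{
	if (!g->batch) {
		g->batch = g->simulate_alloc_failure
			? NULL : calloc(1, sizeof(*g->batch));
		if (!g->batch) {
			/* Only reached when the batch allocation fails. */
			model_remove_one(g, table);
			return;
		}
	}
	assert(g->batch->nr < MODEL_BATCH_MAX);
	g->batch->tables[g->batch->nr++] = table;
	if (g->batch->nr == MODEL_BATCH_MAX)
		model_flush(g);
}
```

With allocations succeeding, sync_fallbacks stays at zero however many
tables go through model_remove_table(); the slow path is only taken once
simulate_alloc_failure is set and no batch is currently held.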
