Re: Strange mmap_lock tracepoint

* Vlastimil Babka <vbabka@xxxxxxx> [210901 06:52]:
> On 8/31/21 22:40, Liam Howlett wrote:
> > 
> > Hello,
> > 
> > I've been trying to trace the mmap_lock calls using tracepoints and
> > captured this behaviour which I cannot explain.  This was with the maple
> > tree v2 patches running ebizzy with multiple threads in an x86_64 KVM
> > using 8 virtual CPUs.
> > 
> > AFAICT, there are zero callers that use the mmap_lock directly besides a
> > prefetchw(&mm->mmap_lock);
> > 
> > 
> >           ebizzy-803     [000] ....  5376.655366: lock_release: 00000000de8cf25e lock
> >           ebizzy-803     [000] ....  5376.655366: lock_release: 00000000b5e38448 ptlock_ptr(page)
> >           ebizzy-803     [000] ....  5376.655367: lock_release: 000000006b581afd &mm->mmap_lock
> >           ebizzy-803     [000] ....  5376.655367: mmap_lock_released: mm=000000001de7b122 memcg_path= write=false
> >  
> >           ebizzy-803     [000] ....  5376.655369: mmap_lock_start_locking: mm=000000001de7b122 memcg_path= write=false
> >  
> >           ebizzy-803     [000] ....  5376.655369: mmap_lock_acquire_returned: mm=000000001de7b122 memcg_path= write=false success=false
> >  
> >           ebizzy-803     [000] ....  5376.655369: mmap_lock_start_locking: mm=000000001de7b122 memcg_path= write=false
> 
> I was thinking khugepaged interference, but this seems to be system-wide
> tracing (judging from <idle>-0), so you would see khugepaged too, right?

Correct.  From what I can tell, all calls go through Michel's
tracepoints.
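
For reference, the wrappers in include/linux/mmap_lock.h look roughly
like this (a sketch from memory, not a verbatim copy of the header);
the __mmap_lock_trace_*() helpers are what emit the mmap_lock_* events
in the trace:

	/* Sketch of the mmap_lock read-side wrappers.  Every lock,
	 * trylock and unlock goes through a trace helper, so nothing
	 * should be able to take the mmap_lock without showing up. */
	static inline void mmap_read_lock(struct mm_struct *mm)
	{
		__mmap_lock_trace_start_locking(mm, false);
		down_read(&mm->mmap_lock);
		__mmap_lock_trace_acquire_returned(mm, false, true);
	}

	static inline int mmap_read_trylock(struct mm_struct *mm)
	{
		bool ret;

		__mmap_lock_trace_start_locking(mm, false);
		ret = down_read_trylock(&mm->mmap_lock) != 0;
		__mmap_lock_trace_acquire_returned(mm, false, ret);
		return ret;
	}

	static inline void mmap_read_unlock(struct mm_struct *mm)
	{
		__mmap_lock_trace_released(mm, false);
		up_read(&mm->mmap_lock);
	}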

> 
> On the other hand, it seems strange to have a long list of just CPU 0 here.
> Are other CPUs missing, or is the interleaving just imperfect because
> timestamps are not perfectly in sync between CPUs, and in fact another
> CPU took the lock?

Yes, it could be the clock.  I had used the default clock, which is
local to each CPU.  I was searching the documentation for 'time stamp'
when I should have searched for 'clock'.  I've re-run the test with the
counter clock, which should remove the possibility of incorrect
ordering.  I've also imported the trace into a spreadsheet and sorted by
the counter to make sure interleaved printing isn't causing issues; this
is why the output below has slightly different whitespace than the
original.

ebizzy-1437    [003] ....	83650150	: mmap_lock_released: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437    [003] ....	83650157	: mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437    [003] ....	83650158	: mmap_lock_acquire_returned: mm=00000000ce5bd903 memcg_path= write=false success=false
ebizzy-1437    [003] ....	83650159	: mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437    [003] ....	83650160	: lock_acquire: 0000000055bc357b read &mm->mmap_lock
ebizzy-1437    [003] d...	83650161	: lock_acquire: 00000000b3904b60 &sem->wait_lock
ebizzy-1437    [003] d...	83650163	: lock_release: 00000000b3904b60 &sem->wait_lock
ebizzy-1437    [003] d...	83650164	: lock_acquire: 00000000b5963f30 &rq->__lock
ebizzy-1437    [003] d...	83650168	: lock_acquire: 000000002d23d506 read rcu_read_lock
ebizzy-1437    [003] d...	83650169	: lock_release: 000000002d23d506 rcu_read_lock
ebizzy-1437    [003] d...	83650171	: lock_acquire: 000000002d23d506 read rcu_read_lock
ebizzy-1437    [003] d...	83650173	: lock_release: 000000002d23d506 rcu_read_lock
ebizzy-1437    [003] d...	83650174	: lock_acquire: 000000002d23d506 read rcu_read_lock
ebizzy-1437    [003] d...	83650175	: lock_release: 000000002d23d506 rcu_read_lock
ebizzy-1437    [003] d...	83650178	: lock_release: 00000000b5963f30 &rq->__lock
<idle>-0       [003] d...	83650181	: lock_acquire: 00000000b5963f30 &rq->__lock
<idle>-0       [003] d...	83650182	: lock_release: 00000000b5963f30 &rq->__lock
<idle>-0       [003] d...	83650183	: lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0       [003] d...	83650184	: lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0       [003] d...	83650185	: lock_acquire: 00000000f5623f3e read jiffies_seq.seqcount
<idle>-0       [003] d...	83650186	: lock_release: 00000000f5623f3e jiffies_seq.seqcount
<idle>-0       [003] d...	83650187	: lock_acquire: 000000006705bb4e &base->lock
<idle>-0       [003] d...	83650188	: lock_release: 000000006705bb4e &base->lock
<idle>-0       [003] d...	83650189	: lock_acquire: 0000000009614e04 hrtimer_bases.lock
<idle>-0       [003] d...	83650190	: lock_release: 0000000009614e04 hrtimer_bases.lock
<idle>-0       [003] d...	83650191	: lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0       [003] d...	83650192	: lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0       [003] d...	83650193	: lock_acquire: 0000000009614e04 hrtimer_bases.lock
<idle>-0       [003] d...	83650194	: lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0       [003] d...	83650195	: lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0       [003] d...	83650196	: lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0       [003] d...	83650197	: lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0       [003] d...	83650198	: lock_release: 0000000009614e04 hrtimer_bases.lock
<idle>-0       [003] d.h.	83651382	: lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0       [003] d.h.	83651386	: lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0       [003] d.h.	83651391	: lock_acquire: 00000000b5963f30 &rq->__lock
<idle>-0       [003] dNh.	83651403	: lock_release: 00000000b5963f30 &rq->__lock
<idle>-0       [003] dN..	83651406	: lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0       [003] dN..	83651409	: lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0       [003] dN..	83651411	: lock_acquire: 0000000009614e04 hrtimer_bases.lock
<idle>-0       [003] dN..	83651415	: lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0       [003] dN..	83651417	: lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0       [003] dN..	83651427	: lock_release: 0000000009614e04 hrtimer_bases.lock
<idle>-0       [003] dN..	83651429	: lock_acquire: 0000000009614e04 hrtimer_bases.lock
<idle>-0       [003] dN..	83651431	: lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0       [003] dN..	83651433	: lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0       [003] dN..	83651441	: lock_release: 0000000009614e04 hrtimer_bases.lock
<idle>-0       [003] dN..	83651443	: lock_acquire: 00000000b5963f30 &rq->__lock
<idle>-0       [003] d...	83651448	: lock_release: 00000000b5963f30 &rq->__lock
ebizzy-1437    [003] d...	83651453	: lock_acquire: 00000000b5963f30 &rq->__lock
ebizzy-1437    [003] d...	83651455	: lock_release: 00000000b5963f30 &rq->__lock
ebizzy-1437    [003] ....	83651458	: mmap_lock_acquire_returned: mm=00000000ce5bd903 memcg_path= write=false success=true
ebizzy-1437    [003] ....	83651462	: lock_acquire: 000000005ada5d35 fs_reclaim
ebizzy-1437    [003] ....	83651464	: lock_acquire: 000000009ff6ca04 mmu_notifier_invalidate_range_start
ebizzy-1437    [003] ....	83651466	: lock_release: 000000009ff6ca04 mmu_notifier_invalidate_range_start
ebizzy-1437    [003] ....	83651467	: lock_release: 000000005ada5d35 fs_reclaim
ebizzy-1437    [003] d...	83651470	: lock_acquire: 0000000002c996e9 lock
ebizzy-1437    [003] d...	83651472	: lock_release: 0000000002c996e9 lock
ebizzy-1437    [003] ....	83651475	: lock_acquire: 00000000aae3fceb ptlock_ptr(page)
ebizzy-1437    [003] ....	83651476	: lock_acquire: 0000000064812dba lock
ebizzy-1437    [003] ....	83651477	: lock_release: 0000000064812dba lock
ebizzy-1437    [003] ....	83651478	: lock_release: 00000000aae3fceb ptlock_ptr(page)
ebizzy-1437    [003] ....	83651479	: lock_release: 0000000055bc357b &mm->mmap_lock
ebizzy-1437    [003] ....	83651480	: mmap_lock_released: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437    [003] ....	83651489	: mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437    [003] ....	83651491	: mmap_lock_acquire_returned: mm=00000000ce5bd903 memcg_path= write=false success=false
ebizzy-1437    [003] ....	83651492	: mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437    [003] ....	83651494	: lock_acquire: 0000000055bc357b read &mm->mmap_lock
ebizzy-1437    [003] d...	83651496	: lock_acquire: 00000000b3904b60 &sem->wait_lock
ebizzy-1437    [003] d...	83651497	: lock_release: 00000000b3904b60 &sem->wait_lock
ebizzy-1437    [003] d...	83651499	: lock_acquire: 00000000b5963f30 &rq->__lock
ebizzy-1437    [003] d...	83651501	: lock_acquire: 000000002d23d506 read rcu_read_lock
ebizzy-1437    [003] d...	83651504	: lock_release: 000000002d23d506 rcu_read_lock
ebizzy-1437    [003] d...	83651505	: lock_acquire: 000000002d23d506 read rcu_read_lock
ebizzy-1437    [003] d...	83651507	: lock_release: 000000002d23d506 rcu_read_lock
ebizzy-1437    [003] d...	83651510	: lock_acquire: 000000002d23d506 read rcu_read_lock
ebizzy-1437    [003] d...	83651512	: lock_release: 000000002d23d506 rcu_read_lock
ebizzy-1437    [003] d...	83651515	: lock_release: 00000000b5963f30 &rq->__lock
<idle>-0       [003] d...	83651517	: lock_acquire: 00000000b5963f30 &rq->__lock
<idle>-0       [003] d...	83651520	: lock_release: 00000000b5963f30 &rq->__lock

Please note that this run is on plain next-20210811, with no maple tree
patches applied.


As for the grouping by thread, I don't really have an answer.  There are
places in the trace where different CPUs and threads do run interleaved:

ebizzy-1445    [000] ....	83693711	: lock_release: 0000000055bc357b &mm->mmap_lock
ebizzy-1445    [000] ....	83693715	: mmap_lock_released: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437    [003] ....	83693727	: mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437    [003] ....	83693730	: lock_acquire: 0000000055bc357b try read &mm->mmap_lock
ebizzy-1437    [003] ....	83693733	: mmap_lock_acquire_returned: mm=00000000ce5bd903 memcg_path= write=false success=true
ebizzy-1437    [003] ....	83693738	: lock_acquire: 000000005ada5d35 fs_reclaim
ebizzy-1437    [003] ....	83693742	: lock_acquire: 000000009ff6ca04 mmu_notifier_invalidate_range_start
ebizzy-1445    [000] ....	83693743	: mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437    [003] ....	83693747	: lock_release: 000000009ff6ca04 mmu_notifier_invalidate_range_start


>  <removed old log>
> > 
> > The following tracepoints were enabled:
> >   events/mmap_lock/enable
> >   events/lock/enable
> > 
> > My reading of the above trace is that the ebizzy thread dropped the lock,
> > immediately attempted to reacquire it, and detected contention, so the
> > thread added itself to the wait list and went to sleep, only to wake up
> > and get the lock?
> 
> Which path does this mmap_read_trylock() followed by an immediate
> mmap_read_lock() come from, anyway? I mean, down_read() is implemented
> like this internally, but that wouldn't generate these mmap_lock_ events.

I think it's do_user_addr_fault()?
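
If I'm reading it right, the fault path tries the lock opportunistically
first and only then takes the sleeping lock; paraphrased from
do_user_addr_fault() in arch/x86/mm/fault.c (a sketch, not the exact
code):

	/* Try the read lock first; only fall back to the sleeping
	 * mmap_read_lock() when the trylock fails.  Both calls go
	 * through the tracepoint wrappers, which would explain the
	 * back-to-back mmap_lock_start_locking events with a
	 * success=false in between. */
	if (unlikely(!mmap_read_trylock(mm))) {
		if (!user_mode(regs) && !search_exception_tables(regs->ip)) {
			/* Kernel fault with no exception fixup: give up. */
			bad_area_nosemaphore(regs, error_code, address);
			return;
		}
		mmap_read_lock(mm);
	} else {
		might_sleep();
	}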

> 
> > Doesn't that mean the mmap_lock is _not_ contended?  The lack of
> > contention makes sense since there is no tracepoint of an attempt to
> > acquire the lock between the dropping of the lock and the same MM waking
> > up to get the lock.
> > 
> > What is even stranger is if I change the value of _Q_PENDING_LOOPS from
> > (1 << 9) to (1 << 10), the benchmark performs better.
> > 
> > From the above trace and the effects of the _Q_PENDING_LOOPS change, it
> > looks like the lock is somehow remaining in a state that causes a
> > failure to acquire the lock even when it is not held or contended?

So, to summarize: the new log is with trace_clock set to counter, on
next-20210811, running ./ebizzy -m (mmap only) in a KVM guest with 8 CPUs.
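
On the _Q_PENDING_LOOPS question from the earlier mail: as far as I
understand it, that constant only bounds how long a waiter spins on the
pending bit in the qspinlock slowpath, roughly like this (sketch based
on kernel/locking/qspinlock.c):

	/* Pending-bit wait in queued_spin_lock_slowpath().  On x86,
	 * _Q_PENDING_LOOPS is (1 << 9); raising it to (1 << 10) lets a
	 * waiter spin longer on a transient pending state before
	 * falling through to the MCS queueing path. */
	if (val == _Q_PENDING_VAL) {
		int cnt = _Q_PENDING_LOOPS;
		val = atomic_cond_read_relaxed(&lock->val,
					       (VAL != _Q_PENDING_VAL) || !cnt--);
	}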

Thanks,
Liam



