* Vlastimil Babka <vbabka@xxxxxxx> [210901 06:52]:
> On 8/31/21 22:40, Liam Howlett wrote:
> >
> > Hello,
> >
> > I've been trying to trace the mmap_lock calls using tracepoints and
> > captured this behaviour which I cannot explain.  This was with the
> > maple tree v2 patches running ebizzy with multiple threads in an
> > x86_64 KVM using 8 virtual CPUs.
> >
> > AFAICT, there are zero callers that use the mmap_lock directly besides a
> > prefetchw(&mm->mmap_lock);
> >
> >
> > ebizzy-803   [000] ....  5376.655366: lock_release: 00000000de8cf25e lock
> > ebizzy-803   [000] ....  5376.655366: lock_release: 00000000b5e38448 ptlock_ptr(page)
> > ebizzy-803   [000] ....  5376.655367: lock_release: 000000006b581afd &mm->mmap_lock
> > ebizzy-803   [000] ....  5376.655367: mmap_lock_released: mm=000000001de7b122 memcg_path= write=false
> >
> > ebizzy-803   [000] ....  5376.655369: mmap_lock_start_locking: mm=000000001de7b122 memcg_path= write=false
> >
> > ebizzy-803   [000] ....  5376.655369: mmap_lock_acquire_returned: mm=000000001de7b122 memcg_path= write=false success=false
> >
> > ebizzy-803   [000] ....  5376.655369: mmap_lock_start_locking: mm=000000001de7b122 memcg_path= write=false
>
> I was thinking khugepaged interference, but this seems to be a system-wide
> tracing (judging from <idle>-0) so you would see khugepaged too, right?

Correct.  All calls will go through Michel's tracepoints from what I can
tell.

> In the other hand it seems strange to have a long list of just cpu 0 here.
> Are other CPU's missing or just the interleaving is imperfect because
> timestamps are not perfectly in sync between cpus, and in fact there was
> another CPU who took the lock?

Yes, it could be the clock.  I had used the default clock, which is
local to each CPU.  I was looking for documentation on 'time stamp' but
should have looked for 'clock'.  I've re-run the test with the counter
clock, which should remove the potential for incorrect ordering.  I've
also imported the trace into a spreadsheet and sorted it by the counter
to make sure that interleaved printing isn't causing issues.  This is
why the output below has slightly different whitespace than the
original.

ebizzy-1437  [003] ....  83650150 : mmap_lock_released: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437  [003] ....  83650157 : mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437  [003] ....  83650158 : mmap_lock_acquire_returned: mm=00000000ce5bd903 memcg_path= write=false success=false
ebizzy-1437  [003] ....  83650159 : mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437  [003] ....  83650160 : lock_acquire: 0000000055bc357b read &mm->mmap_lock
ebizzy-1437  [003] d...  83650161 : lock_acquire: 00000000b3904b60 &sem->wait_lock
ebizzy-1437  [003] d...  83650163 : lock_release: 00000000b3904b60 &sem->wait_lock
ebizzy-1437  [003] d...  83650164 : lock_acquire: 00000000b5963f30 &rq->__lock
ebizzy-1437  [003] d...  83650168 : lock_acquire: 000000002d23d506 read rcu_read_lock
ebizzy-1437  [003] d...  83650169 : lock_release: 000000002d23d506 rcu_read_lock
ebizzy-1437  [003] d...  83650171 : lock_acquire: 000000002d23d506 read rcu_read_lock
ebizzy-1437  [003] d...  83650173 : lock_release: 000000002d23d506 rcu_read_lock
ebizzy-1437  [003] d...  83650174 : lock_acquire: 000000002d23d506 read rcu_read_lock
ebizzy-1437  [003] d...  83650175 : lock_release: 000000002d23d506 rcu_read_lock
ebizzy-1437  [003] d...  83650178 : lock_release: 00000000b5963f30 &rq->__lock
<idle>-0     [003] d...  83650181 : lock_acquire: 00000000b5963f30 &rq->__lock
<idle>-0     [003] d...  83650182 : lock_release: 00000000b5963f30 &rq->__lock
<idle>-0     [003] d...  83650183 : lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0     [003] d...  83650184 : lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0     [003] d...  83650185 : lock_acquire: 00000000f5623f3e read jiffies_seq.seqcount
<idle>-0     [003] d...  83650186 : lock_release: 00000000f5623f3e jiffies_seq.seqcount
<idle>-0     [003] d...  83650187 : lock_acquire: 000000006705bb4e &base->lock
<idle>-0     [003] d...  83650188 : lock_release: 000000006705bb4e &base->lock
<idle>-0     [003] d...  83650189 : lock_acquire: 0000000009614e04 hrtimer_bases.lock
<idle>-0     [003] d...  83650190 : lock_release: 0000000009614e04 hrtimer_bases.lock
<idle>-0     [003] d...  83650191 : lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0     [003] d...  83650192 : lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0     [003] d...  83650193 : lock_acquire: 0000000009614e04 hrtimer_bases.lock
<idle>-0     [003] d...  83650194 : lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0     [003] d...  83650195 : lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0     [003] d...  83650196 : lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0     [003] d...  83650197 : lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0     [003] d...  83650198 : lock_release: 0000000009614e04 hrtimer_bases.lock
<idle>-0     [003] d.h.  83651382 : lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0     [003] d.h.  83651386 : lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0     [003] d.h.  83651391 : lock_acquire: 00000000b5963f30 &rq->__lock
<idle>-0     [003] dNh.  83651403 : lock_release: 00000000b5963f30 &rq->__lock
<idle>-0     [003] dN..  83651406 : lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0     [003] dN..  83651409 : lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0     [003] dN..  83651411 : lock_acquire: 0000000009614e04 hrtimer_bases.lock
<idle>-0     [003] dN..  83651415 : lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0     [003] dN..  83651417 : lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0     [003] dN..  83651427 : lock_release: 0000000009614e04 hrtimer_bases.lock
<idle>-0     [003] dN..  83651429 : lock_acquire: 0000000009614e04 hrtimer_bases.lock
<idle>-0     [003] dN..  83651431 : lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0     [003] dN..  83651433 : lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0     [003] dN..  83651441 : lock_release: 0000000009614e04 hrtimer_bases.lock
<idle>-0     [003] dN..  83651443 : lock_acquire: 00000000b5963f30 &rq->__lock
<idle>-0     [003] d...  83651448 : lock_release: 00000000b5963f30 &rq->__lock
ebizzy-1437  [003] d...  83651453 : lock_acquire: 00000000b5963f30 &rq->__lock
ebizzy-1437  [003] d...  83651455 : lock_release: 00000000b5963f30 &rq->__lock
ebizzy-1437  [003] ....  83651458 : mmap_lock_acquire_returned: mm=00000000ce5bd903 memcg_path= write=false success=true
ebizzy-1437  [003] ....  83651462 : lock_acquire: 000000005ada5d35 fs_reclaim
ebizzy-1437  [003] ....  83651464 : lock_acquire: 000000009ff6ca04 mmu_notifier_invalidate_range_start
ebizzy-1437  [003] ....  83651466 : lock_release: 000000009ff6ca04 mmu_notifier_invalidate_range_start
ebizzy-1437  [003] ....  83651467 : lock_release: 000000005ada5d35 fs_reclaim
ebizzy-1437  [003] d...  83651470 : lock_acquire: 0000000002c996e9 lock
ebizzy-1437  [003] d...  83651472 : lock_release: 0000000002c996e9 lock
ebizzy-1437  [003] ....  83651475 : lock_acquire: 00000000aae3fceb ptlock_ptr(page)
ebizzy-1437  [003] ....  83651476 : lock_acquire: 0000000064812dba lock
ebizzy-1437  [003] ....  83651477 : lock_release: 0000000064812dba lock
ebizzy-1437  [003] ....  83651478 : lock_release: 00000000aae3fceb ptlock_ptr(page)
ebizzy-1437  [003] ....  83651479 : lock_release: 0000000055bc357b &mm->mmap_lock
ebizzy-1437  [003] ....  83651480 : mmap_lock_released: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437  [003] ....  83651489 : mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437  [003] ....  83651491 : mmap_lock_acquire_returned: mm=00000000ce5bd903 memcg_path= write=false success=false
ebizzy-1437  [003] ....  83651492 : mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437  [003] ....  83651494 : lock_acquire: 0000000055bc357b read &mm->mmap_lock
ebizzy-1437  [003] d...  83651496 : lock_acquire: 00000000b3904b60 &sem->wait_lock
ebizzy-1437  [003] d...  83651497 : lock_release: 00000000b3904b60 &sem->wait_lock
ebizzy-1437  [003] d...  83651499 : lock_acquire: 00000000b5963f30 &rq->__lock
ebizzy-1437  [003] d...  83651501 : lock_acquire: 000000002d23d506 read rcu_read_lock
ebizzy-1437  [003] d...  83651504 : lock_release: 000000002d23d506 rcu_read_lock
ebizzy-1437  [003] d...  83651505 : lock_acquire: 000000002d23d506 read rcu_read_lock
ebizzy-1437  [003] d...  83651507 : lock_release: 000000002d23d506 rcu_read_lock
ebizzy-1437  [003] d...  83651510 : lock_acquire: 000000002d23d506 read rcu_read_lock
ebizzy-1437  [003] d...  83651512 : lock_release: 000000002d23d506 rcu_read_lock
ebizzy-1437  [003] d...  83651515 : lock_release: 00000000b5963f30 &rq->__lock
<idle>-0     [003] d...  83651517 : lock_acquire: 00000000b5963f30 &rq->__lock
<idle>-0     [003] d...  83651520 : lock_release: 00000000b5963f30 &rq->__lock

Please note that this is now next-20210811 directly, with no maple tree
patches applied.

As for the grouping of each thread, I don't really have an answer.
There are places in the trace that do have different CPUs and threads
running interleaved:

ebizzy-1445  [000] ....  83693711 : lock_release: 0000000055bc357b &mm->mmap_lock
ebizzy-1445  [000] ....  83693715 : mmap_lock_released: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437  [003] ....  83693727 : mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437  [003] ....  83693730 : lock_acquire: 0000000055bc357b try read &mm->mmap_lock
ebizzy-1437  [003] ....  83693733 : mmap_lock_acquire_returned: mm=00000000ce5bd903 memcg_path= write=false success=true
ebizzy-1437  [003] ....  83693738 : lock_acquire: 000000005ada5d35 fs_reclaim
ebizzy-1437  [003] ....  83693742 : lock_acquire: 000000009ff6ca04 mmu_notifier_invalidate_range_start
ebizzy-1445  [000] ....  83693743 : mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437  [003] ....  83693747 : lock_release: 000000009ff6ca04 mmu_notifier_invalidate_range_start

> <removed old log>
> >
> > The following tracepoints were enabled:
> > events/mmap_lock/enable
> > events/lock/enable
> >
> > My reading of the above trace is that the ebizzy thread dropped the lock
> > and immediately attempted to reacquire and detected it was in contention
> > so the thread added itself to the list and went to sleep only to wake up
> > and get the lock?
>
> Which path does this mmap_read_trylock() followed by immediate
> mmap_read_lock() anyway? I mean down_read() is implemented like this
> internally, but that wouldn't generate these mmap_lock_ events.

I think it's do_user_addr_fault()?
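
For reference, here is a rough sketch of the code path I have in mind,
paraphrased from memory of do_user_addr_fault() in arch/x86/mm/fault.c
around this kernel version (not verbatim; the retry and error handling
are elided).  As I understand it, both the failed mmap_read_trylock()
and the blocking mmap_read_lock() go through the mmap_lock_* tracepoint
wrappers, which would produce exactly the start_locking,
acquire_returned success=false, start_locking sequence seen above:

	/*
	 * Rough sketch only -- paraphrased from memory of
	 * do_user_addr_fault(), not copied verbatim.
	 */
	if (unlikely(!mmap_read_trylock(mm))) {
		if (!user_mode(regs) && !search_exception_tables(regs->ip)) {
			/*
			 * Kernel-mode fault with no exception fixup:
			 * fail the fault instead of waiting for the lock.
			 */
			bad_area_nosemaphore(regs, error_code, address);
			return;
		}
		/* Fall back to a blocking read lock. */
		mmap_read_lock(mm);
	} else {
		/* Trylock succeeded; continue under the read lock. */
		might_sleep();
	}

That at least explains where the back-to-back events come from; it does
not explain why the trylock fails when nothing else appears to hold the
lock.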
>
> > Doesn't that mean the mmap_lock is _not_ contended?  The lack of
> > contention makes sense since there is no tracepoint of an attempt to
> > acquire the lock between the dropping of the lock and the same MM waking
> > up to get the lock.
> >
> > What is even stranger is if I change the value of _Q_PENDING_LOOPS from
> > (1 << 9) to (1 << 10), the benchmark performs better.
> >
> > From the above trace and the effects of the _Q_PENDING_LOOPS change, it
> > looks like the lock is somehow remaining in a state that causes a
> > failure to acquire the lock even when it is not held or contended?

So the new log is with trace_clock set to counter, on next-20210811,
running ./ebizzy -m (only use mmap) in a KVM guest with 8 CPUs.

Thanks,
Liam
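
P.S. For anyone reading along who wants the context for the
_Q_PENDING_LOOPS experiment quoted above: as far as I understand it, the
constant only bounds how long a contender spins waiting for a
pending->locked hand-over in the qspinlock slowpath before it gives up
and queues; on x86 it is defined as (1 << 9).  Roughly, from memory of
kernel/locking/qspinlock.c (treat this as a sketch, not the exact code):

	if (val == _Q_PENDING_VAL) {
		int cnt = _Q_PENDING_LOOPS;

		/*
		 * Spin a bounded number of times waiting for the current
		 * owner to finish the pending -> locked transition before
		 * falling back to queueing.
		 */
		val = atomic_cond_read_relaxed(&lock->val,
				(VAL != _Q_PENDING_VAL) || !(cnt--));
	}

Doubling it to (1 << 10) only lets a waiter spin a little longer before
queueing, which is part of why the performance difference surprises me
when the lock doesn't look contended.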