* Liam Howlett <liam.howlett@xxxxxxxxxx> [210901 13:05]:
> * Vlastimil Babka <vbabka@xxxxxxx> [210901 06:52]:
> > On 8/31/21 22:40, Liam Howlett wrote:
> > >
> > > Hello,
> > >
> > > I've been trying to trace the mmap_lock calls using tracepoints and
> > > captured this behaviour which I cannot explain.  This was with the
> > > maple tree v2 patches running ebizzy with multiple threads in an
> > > x86_64 KVM using 8 virtual CPUs.
> > >
> > > AFAICT, there are zero callers that use the mmap_lock directly besides
> > > a prefetchw(&mm->mmap_lock);
> > >
> > >
> > > ebizzy-803 [000] .... 5376.655366: lock_release: 00000000de8cf25e lock
> > > ebizzy-803 [000] .... 5376.655366: lock_release: 00000000b5e38448 ptlock_ptr(page)
> > > ebizzy-803 [000] .... 5376.655367: lock_release: 000000006b581afd &mm->mmap_lock
> > > ebizzy-803 [000] .... 5376.655367: mmap_lock_released: mm=000000001de7b122 memcg_path= write=false
> > >
> > > ebizzy-803 [000] .... 5376.655369: mmap_lock_start_locking: mm=000000001de7b122 memcg_path= write=false
> > >
> > > ebizzy-803 [000] .... 5376.655369: mmap_lock_acquire_returned: mm=000000001de7b122 memcg_path= write=false success=false
> > >
> > > ebizzy-803 [000] .... 5376.655369: mmap_lock_start_locking: mm=000000001de7b122 memcg_path= write=false
> >
> > I was thinking khugepaged interference, but this seems to be system-wide
> > tracing (judging from <idle>-0) so you would see khugepaged too, right?
>
> Correct.  All calls will go through Michel's tracepoints from what I can
> tell.
>
> > On the other hand, it seems strange to have a long list of just cpu 0
> > here.  Are other CPUs missing, or is the interleaving just imperfect
> > because timestamps are not perfectly in sync between cpus, and in fact
> > another CPU took the lock?
>
> Yes, it could be the clock.  I had used the default clock, which is
> local to each CPU.  I was looking for documentation under 'time stamp'
> but should have looked for 'clock'.
> I've re-run the test with counter, which should remove the possibility
> of incorrect ordering.  I've also imported the trace into a spreadsheet
> and sorted by the counter to ensure there isn't interleaving of the
> printing causing issues.  This is why the output below has slightly
> different whitespace than the original.
>
> <dropping old log since it's not necessary>
>
> Please note that this is now next-20210811 directly.  No maple tree
> patches.
>
> As for the grouping of each thread, I don't really have an answer.
> There are places in the trace which do have different CPUs and threads
> running interleaved:
>
> ebizzy-1445 [000] .... 83693711 : lock_release: 0000000055bc357b &mm->mmap_lock
> ebizzy-1445 [000] .... 83693715 : mmap_lock_released: mm=00000000ce5bd903 memcg_path= write=false
> ebizzy-1437 [003] .... 83693727 : mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
> ebizzy-1437 [003] .... 83693730 : lock_acquire: 0000000055bc357b try read &mm->mmap_lock
> ebizzy-1437 [003] .... 83693733 : mmap_lock_acquire_returned: mm=00000000ce5bd903 memcg_path= write=false success=true
> ebizzy-1437 [003] .... 83693738 : lock_acquire: 000000005ada5d35 fs_reclaim
> ebizzy-1437 [003] .... 83693742 : lock_acquire: 000000009ff6ca04 mmu_notifier_invalidate_range_start
> ebizzy-1445 [000] .... 83693743 : mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
> ebizzy-1437 [003] .... 83693747 : lock_release: 000000009ff6ca04 mmu_notifier_invalidate_range_start
>
> > <removed old log>
> >
> > > The following tracepoints were enabled:
> > > events/mmap_lock/enable
> > > events/lock/enable
> > >
> > > My reading of the above trace is that the ebizzy thread dropped the
> > > lock and immediately attempted to reacquire it, detected it was
> > > contended, added itself to the wait list, and went to sleep, only to
> > > wake up and get the lock?
> >
> > Which path does this mmap_read_trylock() followed by an immediate
> > mmap_read_lock() anyway?
> > I mean, down_read() is implemented like this internally, but that
> > wouldn't generate these mmap_lock_ events.
>
> I think it's do_user_addr_fault()?
>
> > > Doesn't that mean the mmap_lock is _not_ contended?  The lack of
> > > contention makes sense since there is no tracepoint of an attempt to
> > > acquire the lock between the dropping of the lock and the same MM
> > > waking up to get the lock.
> > >
> > > What is even stranger is that if I change the value of
> > > _Q_PENDING_LOOPS from (1 << 9) to (1 << 10), the benchmark performs
> > > better.
> > >
> > > From the above trace and the effects of the _Q_PENDING_LOOPS change,
> > > it looks like the lock is somehow remaining in a state that causes a
> > > failure to acquire the lock even when it is not held or contended?

So the new log was taken with trace_clock set to counter, on
next-20210811, running ./ebizzy -m (only use mmap) in a KVM guest with
8 CPUs.

I was able to get to the bottom of what was happening with a lot of help
from Steven Rostedt.  The logs can be explained by the following:

- Sequence counters may be missed if a given CPU overruns its ring
  buffer.  These gaps are marked by "#### CPU N buffer started ####" or
  something similar.

- There may also be a jump in the sequence counter caused by interrupts.

- The mmap_lock logging was racy with other threads' logging of the
  mmap_lock.  I have sent out a patch [1] to fix the race.

- When downgrading the writer, the log will show the lock being acquired
  as a read lock (write=false) without a 'start locking' log.

1. https://lore.kernel.org/linux-mm/20210903013521.1802774-1-Liam.Howlett@xxxxxxxxxx/

Thanks everyone for the help in tracking down the odd behaviour.  It was
nice to find out it wasn't entirely user error :)

Cheers,
Liam
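[Editor's note: the tracing setup discussed in this thread (the
mmap_lock and lock event groups, plus the counter trace_clock) can be
reproduced roughly as below.  This is a sketch, not something quoted
from the thread: the tracefs mount point and the dependence of the lock
events on lockdep are assumptions about the reader's configuration.]

```shell
# Sketch of the ftrace setup described above.  The tracefs mount point
# is an assumption: newer kernels use /sys/kernel/tracing, older setups
# expose it at /sys/kernel/debug/tracing.
cd /sys/kernel/tracing

# Enable the tracepoint groups mentioned in the thread.  Note that the
# events/lock group (lock_acquire/lock_release) is only available on
# kernels built with CONFIG_LOCKDEP.
echo 1 > events/mmap_lock/enable
echo 1 > events/lock/enable

# Switch from the default per-CPU 'local' clock to the globally ordered
# 'counter' clock, so events from different CPUs can be ordered reliably.
echo counter > trace_clock

# Stream the trace.
cat trace_pipe
```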
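[Editor's note: the "import into a spreadsheet and sort by the counter"
step can also be done with a few lines of script.  Below is a minimal
sketch assuming the field layout visible in the excerpts above
(task, [cpu], flags, counter, event); the regex and the function name
`sort_by_counter` are illustrative, not anything used in the thread.]

```python
import re

# Matches lines like:
#   ebizzy-1437 [003] .... 83693730 : lock_acquire: 0000000055bc357b ...
# The field layout (task, [cpu], flags, counter, event) is inferred from
# the log excerpts quoted in this thread.
TRACE_RE = re.compile(
    r"^\s*(?P<task>\S+)\s+\[(?P<cpu>\d+)\]\s+\S+\s+"
    r"(?P<counter>\d+)\s*:\s*(?P<event>.*)$"
)

def sort_by_counter(lines):
    """Return trace lines ordered by the global 'counter' clock value.

    With trace_clock=counter each event carries a globally ordered
    sequence number, so sorting on it recovers the true event order even
    when the per-CPU ring buffers are printed out of order.
    Unparseable lines are dropped.
    """
    parsed = []
    for line in lines:
        m = TRACE_RE.match(line)
        if m:
            parsed.append((int(m.group("counter")), line))
    return [line for _, line in sorted(parsed)]

if __name__ == "__main__":
    log = [
        "ebizzy-1437 [003] .... 83693730 : lock_acquire: 0000000055bc357b try read &mm->mmap_lock",
        "ebizzy-1445 [000] .... 83693711 : lock_release: 0000000055bc357b &mm->mmap_lock",
        "ebizzy-1445 [000] .... 83693715 : mmap_lock_released: mm=00000000ce5bd903 memcg_path= write=false",
    ]
    for line in sort_by_counter(log):
        print(line)
```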