On Thu, Jul 18, 2024 at 11:00 AM Bharata B Rao <bharata@xxxxxxx> wrote:
>
> On 17-Jul-24 4:59 PM, Mateusz Guzik wrote:
> > As for clear_shadow_entry mentioned in the opening mail, the content is:
> >         spin_lock(&mapping->host->i_lock);
> >         xa_lock_irq(&mapping->i_pages);
> >         __clear_shadow_entry(mapping, index, entry);
> >         xa_unlock_irq(&mapping->i_pages);
> >         if (mapping_shrinkable(mapping))
> >                 inode_add_lru(mapping->host);
> >         spin_unlock(&mapping->host->i_lock);
> >
> > so for all I know it's all about the xarray thing, not the i_lock per se.
>
> The soft lockup signature has _raw_spin_lock and not _raw_spin_lock_irq
> and hence concluded it to be i_lock.

I'm not disputing that it was i_lock. I am claiming that i_pages is taken
immediately after, and it may be that in your workload that is the lock
with the actual contention problem, making i_lock a red herring.

I tried to match up the offsets to my own kernel binary, but things went
haywire. Can you please resolve a bunch of symbols, like this:

./scripts/faddr2line vmlinux clear_shadow_entry+92

and then paste the source code from the reported lines? (I presume you
are running with some local patches, so opening the relevant files in my
repo may still give bogus results.)

Addresses are:
clear_shadow_entry+92
__remove_mapping+98
__filemap_add_folio+332

Most notably, in __remove_mapping the i_lock is conditional:

        if (!folio_test_swapcache(folio))
                spin_lock(&mapping->host->i_lock);
        xa_lock_irq(&mapping->i_pages);

and the disasm of the offset in my case does not match either acquire.
For all I know, i_lock in this routine is *not* taken, and all the
queued up __remove_mapping callers increase the i_lock -> i_pages wait
times in clear_shadow_entry.

On a cursory reading, the i_lock in clear_shadow_entry can be hacked
away with some effort, but should that happen the contention is going to
shift to i_pages, presumably with more soft lockups (just on that lock
instead). I am not convinced messing with it is justified. From looking
around, the i_lock does not appear to be a problem in other spots, fwiw.

All that said, even if it is i_lock in both cases *and* someone whacks
it, the mm folk should look into what happens while the (maybe
i_lock ->) i_pages lock is held.

To that end, perhaps you could provide a flamegraph or the output of
perf record -a -g, whichever is preferred.

-- 
Mateusz Guzik <mjguzik gmail.com>
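
[For reference, one way to gather both pieces of data requested above; the
single faddr2line invocation with all three addresses matches the script's
usage, while the 30-second capture window and the FlameGraph scripts are
just an assumed example setup, not something prescribed in the thread:]

    # resolve all three reported offsets against the running kernel's vmlinux
    ./scripts/faddr2line vmlinux clear_shadow_entry+92 __remove_mapping+98 __filemap_add_folio+332

    # system-wide stack sampling for ~30 seconds while the workload runs
    perf record -a -g -- sleep 30

    # plain-text call-graph report
    perf report --stdio > perf-report.txt

    # or render a flamegraph (assumes Brendan Gregg's FlameGraph scripts are in PATH)
    perf script | stackcollapse-perf.pl | flamegraph.pl > lockup.svg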