On Fri, Jul 19, 2024 at 8:16 AM Bharata B Rao <bharata@xxxxxxx> wrote:
>
> On 18-Jul-24 5:41 PM, Mateusz Guzik wrote:
> > On Thu, Jul 18, 2024 at 11:00 AM Bharata B Rao <bharata@xxxxxxx> wrote:
> >>
> >> On 17-Jul-24 4:59 PM, Mateusz Guzik wrote:
> >>> As for clear_shadow_entry mentioned in the opening mail, the content is:
> >>>         spin_lock(&mapping->host->i_lock);
> >>>         xa_lock_irq(&mapping->i_pages);
> >>>         __clear_shadow_entry(mapping, index, entry);
> >>>         xa_unlock_irq(&mapping->i_pages);
> >>>         if (mapping_shrinkable(mapping))
> >>>                 inode_add_lru(mapping->host);
> >>>         spin_unlock(&mapping->host->i_lock);
> >>>
> >>> so for all I know it's all about the xarray thing, not the i_lock per se.
> >>
> >> The soft lockup signature has _raw_spin_lock and not _raw_spin_lock_irq
> >> and hence concluded it to be i_lock.
> >
> > I'm not disputing it was i_lock. I am claiming that the i_pages is
> > taken immediately after and it may be that in your workload this is
> > the thing with the actual contention problem, making i_lock a red
> > herring.
> >
> > I tried to match up offsets to my own kernel binary, but things went haywire.
> >
> > Can you please resolve a bunch of symbols, like this:
> > ./scripts/faddr2line vmlinux clear_shadow_entry+92
> >
> > and then paste the source code from reported lines? (I presume you are
> > running with some local patches, so opening relevant files in my repo
> > may still give bogus results)
> >
> > Addresses are: clear_shadow_entry+92 __remove_mapping+98 __filemap_add_folio+332
>
> clear_shadow_entry+92
>
> $ ./scripts/faddr2line vmlinux clear_shadow_entry+92
> clear_shadow_entry+92/0x180:
> spin_lock_irq at include/linux/spinlock.h:376
> (inlined by) clear_shadow_entry at mm/truncate.c:51
>
> 42 static void clear_shadow_entry(struct address_space *mapping,
> 43                                struct folio_batch *fbatch, pgoff_t *indices)
> 44 {
> 45         int i;
> 46
> 47         if (shmem_mapping(mapping) || dax_mapping(mapping))
> 48                 return;
> 49
> 50         spin_lock(&mapping->host->i_lock);
> 51         xa_lock_irq(&mapping->i_pages);
>
> __remove_mapping+98
>
> $ ./scripts/faddr2line vmlinux __remove_mapping+98
> __remove_mapping+98/0x230:
> spin_lock_irq at include/linux/spinlock.h:376
> (inlined by) __remove_mapping at mm/vmscan.c:695
>
> 684 static int __remove_mapping(struct address_space *mapping, struct folio *folio,
> 685                             bool reclaimed, struct mem_cgroup *target_memcg)
> 686 {
> 687         int refcount;
> 688         void *shadow = NULL;
> 689
> 690         BUG_ON(!folio_test_locked(folio));
> 691         BUG_ON(mapping != folio_mapping(folio));
> 692
> 693         if (!folio_test_swapcache(folio))
> 694                 spin_lock(&mapping->host->i_lock);
> 695         xa_lock_irq(&mapping->i_pages);
>
> __filemap_add_folio+332
>
> $ ./scripts/faddr2line vmlinux __filemap_add_folio+332
> __filemap_add_folio+332/0x480:
> spin_lock_irq at include/linux/spinlock.h:377
> (inlined by) __filemap_add_folio at mm/filemap.c:878
>
> 851 noinline int __filemap_add_folio(struct address_space *mapping,
> 852                 struct folio *folio, pgoff_t index, gfp_t gfp, void **shadowp)
> 853 {
> 854         XA_STATE(xas, &mapping->i_pages, index);
> ...
> 874         for (;;) {
> 875                 int order = -1, split_order = 0;
> 876                 void *entry, *old = NULL;
> 877
> 878                 xas_lock_irq(&xas);
> 879                 xas_for_each_conflict(&xas, entry) {
>
> > Most notably in __remove_mapping i_lock is conditional:
> >         if (!folio_test_swapcache(folio))
> >                 spin_lock(&mapping->host->i_lock);
> >         xa_lock_irq(&mapping->i_pages);
> >
> > and the disasm of the offset in my case does not match either acquire.
> > For all I know i_lock in this routine is *not* taken and all the
> > queued up __remove_mapping callers increase i_lock -> i_pages wait
> > times in clear_shadow_entry.
>
> So the first two are on i_pages lock and the last one is xa_lock.

bottom line though messing with i_lock removal is not justified afaics

> > To my cursory reading i_lock in clear_shadow_entry can be hacked away
> > with some effort, but should this happen the contention is going to
> > shift to i_pages presumably with more soft lockups (except on that
> > lock). I am not convinced messing with it is justified. From looking
> > at other places the i_lock is not a problem in other spots fwiw.
> >
> > All that said even if it is i_lock in both cases *and* someone whacks
> > it, the mm folk should look into what happens when (maybe i_lock ->)
> > i_pages lock is held. To that end perhaps you could provide a
> > flamegraph or output of perf record -a -g, I don't know what's
> > preferred.
>
> I have attached the flamegraph but this is for the kernel that has been
> running with all the accumulated fixes so far. The original one (w/o
> fixes) did show considerable time spent on
> native_queued_spin_lock_slowpath but unfortunately unable to locate it now.

So I think the problems at this point are all mm, so I'm kicking the ball
to that side.

--
Mateusz Guzik <mjguzik gmail.com>
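[Editor's note, not part of the original mail: a minimal sketch of one common way to capture the data requested above and turn it into a flamegraph, using perf together with Brendan Gregg's FlameGraph scripts (https://github.com/brendangregg/FlameGraph). The 30-second duration and the ./FlameGraph checkout path are assumptions for illustration, not details from this thread.]

    # system-wide sampling with call graphs while the FIO/LTP workload is running;
    # the sleep only bounds how long perf records
    perf record -a -g -- sleep 30

    # fold the sampled stacks and render an interactive SVG flamegraph
    perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > flame.svg

The raw "perf record -a -g" data (inspected with "perf report") and the rendered flame.svg show the same samples; the flamegraph simply makes heavy paths such as native_queued_spin_lock_slowpath easier to spot at a glance.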