On Thu, Jul 18, 2024 at 11:00 AM Bharata B Rao <bharata@xxxxxxx> wrote:
>
> On 17-Jul-24 4:59 PM, Mateusz Guzik wrote:
> > As for clear_shadow_entry mentioned in the opening mail, the content is:
> >         spin_lock(&mapping->host->i_lock);
> >         xa_lock_irq(&mapping->i_pages);
> >         __clear_shadow_entry(mapping, index, entry);
> >         xa_unlock_irq(&mapping->i_pages);
> >         if (mapping_shrinkable(mapping))
> >                 inode_add_lru(mapping->host);
> >         spin_unlock(&mapping->host->i_lock);
> >
> > so for all I know it's all about the xarray thing, not the i_lock per se.
>
> The soft lockup signature has _raw_spin_lock and not _raw_spin_lock_irq
> and hence concluded it to be i_lock.

I'm not disputing that it was i_lock. I am claiming that i_pages is taken
immediately after, and it may be that in your workload that is the lock
with the actual contention problem, making i_lock a red herring.

I tried to match up the offsets to my own kernel binary, but things went
haywire. Can you please resolve a bunch of symbols, like this:

./scripts/faddr2line vmlinux clear_shadow_entry+92

and then paste the source code from the reported lines? (I presume you
are running with some local patches, so opening the relevant files in my
repo may still give bogus results.)

Addresses are:
clear_shadow_entry+92
__remove_mapping+98
__filemap_add_folio+332

Most notably, in __remove_mapping the i_lock is conditional:

        if (!folio_test_swapcache(folio))
                spin_lock(&mapping->host->i_lock);
        xa_lock_irq(&mapping->i_pages);

and the disasm of the offset in my case does not match either acquire.
For all I know, i_lock in this routine is *not* taken, and all the
queued up __remove_mapping callers increase the i_lock -> i_pages wait
times in clear_shadow_entry.

On a cursory reading, the i_lock in clear_shadow_entry can be hacked
away with some effort, but should that happen the contention is going to
shift to i_pages, presumably with more soft lockups (just on that lock
instead). I am not convinced messing with it is justified. From looking
around, the i_lock does not appear to be a problem in other spots, fwiw.

All that said, even if it is i_lock in both cases *and* someone whacks
it, the mm folk should look into what happens while the (maybe
i_lock ->) i_pages lock is held.

To that end, perhaps you could provide a flamegraph or the output of
perf record -a -g, whichever is preferred.

-- 
Mateusz Guzik <mjguzik gmail.com>
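
[For reference, one way to gather both pieces of data requested above; the
single faddr2line invocation with all three addresses matches the script's
usage, while the 30-second capture window and the FlameGraph scripts are
just an assumed example setup, not something prescribed in the thread:]

    # resolve all three reported offsets against the running kernel's vmlinux
    ./scripts/faddr2line vmlinux clear_shadow_entry+92 __remove_mapping+98 __filemap_add_folio+332

    # system-wide stack sampling for ~30 seconds while the workload runs
    perf record -a -g -- sleep 30

    # plain-text call-graph report
    perf report --stdio > perf-report.txt

    # or render a flamegraph (assumes Brendan Gregg's FlameGraph scripts are in PATH)
    perf script | stackcollapse-perf.pl | flamegraph.pl > lockup.svg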