[PATCH 0/6 v3] xfs: lockless buffer lookups

Hi folks,

Current work to merge the XFS inode life cycle with the VFS inode
life cycle is finding some interesting issues. If we have a path
that hits buffer trylocks fairly hard (e.g. a non-blocking
background inode freeing function), we end up hitting massive
contention on the buffer cache hash locks:

-   92.71%     0.05%  [kernel]                  [k] xfs_inodegc_worker
   - 92.67% xfs_inodegc_worker
      - 92.13% xfs_inode_unlink
         - 91.52% xfs_inactive_ifree
            - 85.63% xfs_read_agi
               - 85.61% xfs_trans_read_buf_map
                  - 85.59% xfs_buf_read_map
                     - xfs_buf_get_map
                        - 85.55% xfs_buf_find
                           - 72.87% _raw_spin_lock
                              - do_raw_spin_lock
                                   71.86% __pv_queued_spin_lock_slowpath
                           - 8.74% xfs_buf_rele
                              - 7.88% _raw_spin_lock
                                 - 7.88% do_raw_spin_lock
                                      7.63% __pv_queued_spin_lock_slowpath
                           - 1.70% xfs_buf_trylock
                              - 1.68% down_trylock
                                 - 1.41% _raw_spin_lock_irqsave
                                    - 1.39% do_raw_spin_lock
                                         __pv_queued_spin_lock_slowpath
                           - 0.76% _raw_spin_unlock
                                0.75% do_raw_spin_unlock

This is basically hammering the pag->pag_buf_lock from lots of CPUs
doing trylocks at the same time. Most of the buffer trylock
operations ultimately fail after we've done the lookup, so we're
really hammering the buf hash lock whilst making no progress.

We can also see significant spinlock traffic on the same lock just
under normal operation when lots of tasks are accessing metadata
from the same AG, so let's avoid all this by creating a lookup fast
path which leverages the rhashtable's ability to do RCU-protected
lookups.
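
To illustrate the general shape of such a fast path, here is a minimal
userspace sketch. It is not code from this series: struct demo_buf,
demo_buf_try_hold() and demo_buf_lookup() are hypothetical names, C11
atomics stand in for the kernel's atomic_t, and a fixed array stands in
for the rhashtable. It assumes the usual pattern for lockless lookups,
where the lookup takes no lock and only takes a reference if the
object's hold count has not already dropped to zero.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached buffer; not the real struct xfs_buf. */
struct demo_buf {
	unsigned long	key;	/* lookup key, e.g. a disk address */
	atomic_int	hold;	/* reference count; 0 means being torn down */
};

/*
 * Take a reference only if the object is still live. A hold count of
 * zero means another thread is freeing it, so the caller must treat
 * the lookup as a miss rather than resurrect the object.
 */
static bool demo_buf_try_hold(struct demo_buf *bp)
{
	int old = atomic_load(&bp->hold);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&bp->hold, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Lockless lookup over a fixed table (assumption: the table memory
 * outlives all lookups, which is the guarantee RCU provides in the
 * kernel). No spinlock is taken on the hit path.
 */
static struct demo_buf *demo_buf_lookup(struct demo_buf *table, size_t n,
					unsigned long key)
{
	for (size_t i = 0; i < n; i++) {
		if (table[i].key == key && demo_buf_try_hold(&table[i]))
			return &table[i];
	}
	return NULL;	/* miss: a caller would fall back to a locked slow path */
}
```

In this sketch a lookup that races with teardown simply reports a miss,
so only misses and inserts would need to serialise on a lock.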

This is a rework of the initial lockless buffer lookup patch I sent
here:

https://lore.kernel.org/linux-xfs/20220328213810.1174688-1-david@xxxxxxxxxxxxx/

And the alternative cleanup sent by Christoph here:

https://lore.kernel.org/linux-xfs/20220403120119.235457-1-hch@xxxxxx/

This version isn't quite as short as Christoph's, but it does roughly
the same thing in killing the two-phase _xfs_buf_find() call
mechanism. It separates the fast and slow paths a little more
cleanly and doesn't have context dependent buffer return state from
the slow path that the caller needs to handle. It also picks up the
rhashtable insert optimisation that Christoph added.

This series passes fstests under several different configs and does
not cause any obvious regressions in scalability testing that has
been performed. Hence I'm proposing this as potential 5.20 cycle
material.

Thoughts, comments?

Version 3:
- rebased onto linux-xfs/for-next
- rearranged some of the changes to avoid repeated shuffling of code
  to different locations
- fixed typos in commits
- s/xfs_buf_find_verify/xfs_buf_map_verify/
- s/xfs_buf_find_fast/xfs_buf_lookup/

Version 2:
- https://lore.kernel.org/linux-xfs/20220627060841.244226-1-david@xxxxxxxxxxxxx/
- based on 5.19-rc2
- high speed collision of original proposals.

Initial versions:
- https://lore.kernel.org/linux-xfs/20220403120119.235457-1-hch@xxxxxx/
- https://lore.kernel.org/linux-xfs/20220328213810.1174688-1-david@xxxxxxxxxxxxx/
