xfs: byte-based grant head reservation tracking v4

[Note: I've taken over from Dave on this to push it over the finish line]

One of the significant limitations of the log reservation code is
that it uses physical tracking of the reservation space to account
for both the space used in the journal as well as the reservations
held in memory by the CIL and active running transactions. Because
this in-memory reservation tracking requires byte-level granularity,
the "LSN" that the grant head stores its location in is split into
32 bits for the log cycle and 32 bits for the grant head offset into
the log.
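
To make the split concrete, here is a minimal userspace sketch of a
64-bit word carrying a 32-bit cycle and a 32-bit byte offset. The
helper names (grant_head_pack() and friends) are made up for
illustration and are not the kernel's functions; the wrap rule is an
assumption about how such a head would advance.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only; these are not the
 * kernel's functions. The cycle lives in the upper 32 bits and the
 * byte offset in the lower 32 bits.
 */
static inline uint64_t grant_head_pack(uint32_t cycle, uint32_t offset)
{
	return ((uint64_t)cycle << 32) | offset;
}

static inline void grant_head_unpack(uint64_t head, uint32_t *cycle,
				     uint32_t *offset)
{
	*cycle = (uint32_t)(head >> 32);
	*offset = (uint32_t)(head & 0xffffffffu);
}

/* Advance the head by @bytes, wrapping into the next cycle at @logsize. */
static inline uint64_t grant_head_advance(uint64_t head, uint32_t bytes,
					  uint32_t logsize)
{
	uint32_t cycle, offset, room;

	grant_head_unpack(head, &cycle, &offset);
	assert(offset < logsize && bytes <= logsize);

	room = logsize - offset;	/* bytes left before the log wraps */
	if (bytes < room) {
		offset += bytes;
	} else {
		offset = bytes - room;
		cycle++;
	}
	return grant_head_pack(cycle, offset);
}
```

Because the offset only has 32 bits, it cannot address more than 4GB
no matter how the rest of the code is arranged.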

Storing a byte count as the grant head offset into the log means
that we can only index 4GB of space with the grant head. This is one
of the primary limiting factors preventing us from increasing the
physical log size beyond 2GB. Hence to increase the physical log
size, we have to increase the space available for storing the grant
head.

Needing more physical space to store the grant head is an issue
because we use lockless atomic accounting for the grant head to
minimise the overhead of new incoming transaction reservations.
These have unbounded concurrency, and hence any lock in the
reservation path will cause serious scalability issues. The lockless
accounting fast path was the solution to these scalability problems
that we had over a decade ago, and hence we know we cannot go back
to a lock based solution.


The simplest way I can describe how we track the log space is
as follows:

   l_tail_lsn           l_last_sync_lsn         grant head lsn
	|-----------------------|+++++++++++++++++++++|
	|    physical space     |   in memory space   |
	| - - - - - - xlog_space_left() - - - - - - - |

It is simple for the AIL to track the maximum LSN that has been
inserted into the AIL. If we do this, we no longer need to track
log->l_last_sync_lsn in the journal itself and we can always get the
physical space tracked by the journal directly from the AIL. The AIL
functions can calculate the "log tail space" dynamically when either
the log tail or the max LSN seen changes, thereby removing all need
for the log itself to track this state. Hence we now have:

   l_tail_lsn           ail_max_lsn_seen        grant head lsn
	|-----------------------|+++++++++++++++++++++|
	|    log->l_tail_space  |   in memory space   |
	| - - - - - - xlog_space_left() - - - - - - - |

And we've solved the problem of efficiently calculating the amount
of physical space the log is consuming. All that remains is to
calculate how much space we are consuming in memory.

Luckily for us, we've just added all the update hooks needed to do
this. From the above diagram, two things are obvious:

1. when the tail moves, only log->l_tail_space reduces
2. when the ail_max_lsn_seen increases, log->l_tail_space increases
   and "in memory space" reduces by the same amount.
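
The two rules above can be sketched as a pair of update functions.
This is a simplified single-threaded model; the struct and function
names are hypothetical and only stand in for the real accounting.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two regions in the diagram. */
struct log_space_sketch {
	int64_t tail_space;		/* stands in for log->l_tail_space */
	int64_t in_memory_space;	/* reservations not yet in the AIL */
};

/* Rule 1: the tail moves forward, so only the tail space shrinks. */
static inline void sketch_tail_moved(struct log_space_sketch *s,
				     int64_t bytes)
{
	assert(bytes >= 0 && bytes <= s->tail_space);
	s->tail_space -= bytes;
}

/*
 * Rule 2: the max LSN seen by the AIL advances, so the same number of
 * bytes moves from in-memory space to tail space. The sum of the two
 * is unchanged.
 */
static inline void sketch_max_lsn_advanced(struct log_space_sketch *s,
					   int64_t bytes)
{
	assert(bytes >= 0 && bytes <= s->in_memory_space);
	s->tail_space += bytes;
	s->in_memory_space -= bytes;
}
```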

IOWs, we now have a mechanism that can transfer the in-memory
reservation space directly to the on-disk tail space accounting. At
this point, we can change the grant head from tracking physical
location to tracking a simple byte count:

   l_tail_lsn           ail_max_lsn_seen        grant head bytes
	|-----------------------|+++++++++++++++++++++|
	|    log->l_tail_space  |     grant space     |
	| - - - - - - xlog_space_left() - - - - - - - |

and xlog_space_left() simply changes to:

space left = log->l_logsize - grant space - log->l_tail_space;
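
As a rough userspace sketch of that calculation (the struct below is
hypothetical and only mirrors the three quantities named above;
clamping the result at zero is my assumption for the sketch, not
something stated here):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the three values used in the formula. */
struct xlog_sketch {
	int64_t logsize;	/* stands in for log->l_logsize */
	int64_t tail_space;	/* stands in for log->l_tail_space */
	int64_t grant_space;	/* grant head byte count */
};

static inline int64_t sketch_space_left(const struct xlog_sketch *log)
{
	int64_t free_bytes;

	assert(log->logsize > 0);
	free_bytes = log->logsize - log->grant_space - log->tail_space;

	/* Assumption: report "no space" rather than a negative value. */
	return free_bytes > 0 ? free_bytes : 0;
}
```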

All of the complex grant head cracking, combining and
compare/exchange code gets replaced by simple atomic add/sub
operations, and the grant heads can now track a full 64-bit byte
count. The fastpath reservation accounting is also much faster
because it is much simpler.
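
To illustrate the difference in shape, here is a userspace sketch
using C11 atomics rather than the kernel's atomic64_t API. Both
functions are hypothetical: the first shows the general
crack/combine/cmpxchg pattern for a packed cycle:offset head, the
others show the plain byte-count form.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Packed cycle:offset head: crack, modify, combine, retry on races. */
static inline void sketch_cmpxchg_grant_add(_Atomic uint64_t *head,
					    uint32_t bytes, uint32_t logsize)
{
	uint64_t old = atomic_load(head);
	uint64_t next;

	do {
		uint32_t cycle = (uint32_t)(old >> 32);
		uint32_t offset = (uint32_t)old;
		uint32_t room;

		assert(offset < logsize && bytes <= logsize);
		room = logsize - offset;
		if (bytes < room) {
			offset += bytes;
		} else {
			offset = bytes - room;
			cycle++;
		}
		next = ((uint64_t)cycle << 32) | offset;
		/* On failure, old is reloaded and the loop recomputes. */
	} while (!atomic_compare_exchange_weak(head, &old, next));
}

/* Byte-count head: a single atomic add or subtract, no retry loop. */
static inline void sketch_byte_grant_add(_Atomic int64_t *grant,
					 int64_t bytes)
{
	atomic_fetch_add(grant, bytes);
}

static inline void sketch_byte_grant_sub(_Atomic int64_t *grant,
					 int64_t bytes)
{
	atomic_fetch_sub(grant, bytes);
}
```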

There's one little problem, though. The transaction reservation code
has to set the LSN target for the AIL to push to ensure that the log
tail keeps moving forward (xlog_grant_push_ail()), and the deferred
intent logging code also tries to keep abreast of the amount of
space available in the log via xlog_grant_push_threshold().

The AIL pushing problem is actually easy to solve - we don't need to
push the AIL from the transaction reservation code as the AIL
already tracks all the space used by the journal. All the
transaction reservation code does is try to keep 25% of the journal
physically free, and there's no reason why the AIL can't do that
itself.
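
As a sketch of the idea, the AIL can work out from the tail space
alone how far it is from the 25% free goal. The function below is
hypothetical and works purely in bytes; the real code has to turn
this into an LSN push target, which is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many bytes of tail space would need to be
 * freed to get back to 25% of the journal being physically free.
 * Returns 0 if enough of the journal is already free.
 */
static inline int64_t sketch_push_bytes_needed(int64_t logsize,
					       int64_t tail_space)
{
	int64_t free_target = logsize / 4;
	int64_t free_now;

	assert(tail_space >= 0 && tail_space <= logsize);
	free_now = logsize - tail_space;

	if (free_now >= free_target)
		return 0;
	return free_target - free_now;
}
```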

Hence before we start changing any of the grant head accounting, we
remove all the AIL pushing hooks from the reservation code and let
the AIL determine for itself the target it needs to push to. We also
allow the deferred intent logging code to check whether the AIL
should be tail pushing, similar to how it currently checks whether
we are running out of log space, so the intent relogging still works
as it should.

With these changes in place, there is no external code that is
dependent on the grant heads tracking physical space, and hence we
can then implement the change to pure in-memory reservation space
tracking in the grant heads.

This all passes fstests for default and rmapbt enabled configs.
Performance tests also show good improvements where the transaction
accounting is the bottleneck.

Changes since v3:
 - fix all review comments (Dave)
 - add a new patch to skip flushing AIL items (Dave)
 - rework XFS_AIL_OPSTATE_PUSH_ALL handling (Dave)
 - misc checkpatch and minor coding style fixups (Christoph)
 - clean up the grant head manipulation helpers (Christoph)
 - rename the sysfs files so that xfstests can autodetect the new format
   (Christoph)
 - fix the contact address for xfs sysfs ABI entries (Christoph)

Changes since v2:
  - rebase on 6.6-rc2 + linux-xfs/for-next
  - cleaned up static warnings from build bot.
  - fixed comment about minimum AIL push target.
  - fixed whitespace problems in multiple patches.

Changes since v1:
  - https://lore.kernel.org/linux-xfs/20220809230353.3353059-1-david@xxxxxxxxxxxxx/
  - reorder moving xfs_trans_bulk_commit() patch to start of series
  - fix failure to consider NULLCOMMITLSN push target in AIL
  - grant space release based on ctx->start_lsn fails to release the space used in
    the checkpoint that was just committed. Release needs to be based on
    ctx->commit_lsn which is the end of the region that the checkpoint consumes in
    the log

Diffstat:
 Documentation/ABI/testing/sysfs-fs-xfs |   26 -
 fs/xfs/libxfs/xfs_defer.c              |    4 
 fs/xfs/xfs_inode.c                     |    1 
 fs/xfs/xfs_inode_item.c                |    6 
 fs/xfs/xfs_log.c                       |  511 +++++++--------------------------
 fs/xfs/xfs_log.h                       |    1 
 fs/xfs/xfs_log_cil.c                   |  177 +++++++++++
 fs/xfs/xfs_log_priv.h                  |   61 +--
 fs/xfs/xfs_log_recover.c               |   23 -
 fs/xfs/xfs_sysfs.c                     |   29 -
 fs/xfs/xfs_trace.c                     |    1 
 fs/xfs/xfs_trace.h                     |   42 +-
 fs/xfs/xfs_trans.c                     |  129 --------
 fs/xfs/xfs_trans.h                     |    4 
 fs/xfs/xfs_trans_ail.c                 |  244 ++++++++-------
 fs/xfs/xfs_trans_priv.h                |   44 ++
 16 files changed, 552 insertions(+), 751 deletions(-)



